7854 Commits

Author SHA1 Message Date
ffec31ea07 pycaffe2 c++ extension: py3
So I tried to make things compilable in python3, but a lot of the actual
functionality is yet to be verified. Since I will not be using py3 for a
short while and protobuf 2.6.1 does not work with py3 (among a bunch of
others), I'll put this as a future todo item.
2016-03-07 22:08:53 -08:00
1b5da38e29 minor fix 2016-03-07 13:47:36 -08:00
46de5451ed BREW modifications 2016-03-05 08:00:42 -08:00
b589d831b8 cudnn v4 interface change. 2016-03-05 08:00:08 -08:00
05ead5f76f Bugfix for logging 2016-03-04 09:47:20 -08:00
50874dc746 relu and pool wip 2016-02-01 14:08:10 -08:00
1740974347 average pooling wrapper: without this the NHWC path would throw an error as the order is not passed along. 2016-01-22 09:31:49 -08:00
5a94ee6b64 Allow one to set the blas backend, while optionally choosing to use
Eigen for the whole numerical computation (for example, on a platform
where no optimized BLAS libraries are present, or where Eigen is already
the fastest numerical library available).

The paths I have tested are Eigen and ATLAS. Have not tested MKL yet.
2016-01-20 17:05:21 -08:00
78aa266770 Fix 2016-01-19 14:49:48 -08:00
d84545c5fb fp16: allow one to override. 2016-01-19 14:39:26 -08:00
5f2d7ba963 misc: experimental cuda elementwise rtc, softmax fp16 2016-01-19 12:49:36 -08:00
d244ca9052 relu fp16 fix 2016-01-13 22:12:49 -08:00
fa59b90c72 misc updates 2016-01-13 21:00:56 -08:00
a05782f025 fix 2016-01-12 15:59:02 -08:00
d08880e61a more RTC experiments 2016-01-12 15:44:15 -08:00
fe78d1a445 quick rtc try 2016-01-11 20:21:41 -08:00
64e0d3a29a misc updates, mainly relu, to test fp16 2016-01-07 20:56:11 +00:00
e2b9172b4c print cudnn version 2016-01-07 18:49:41 +00:00
1c020d257b bugfix 2016-01-07 18:48:30 +00:00
a5a75e8005 some changes for TX1 benchmark 2016-01-05 20:20:50 +00:00
8c1bbaa2ab some fill ops that are not tested. 2016-01-05 09:55:22 -08:00
6cb2072422 cudnn conv op backward compatibility back to v2 2016-01-05 09:55:21 -08:00
778a1f6956 speed benchmark 2016-01-05 09:55:21 -08:00
05eda208a5 Last commit for the day. With all the previous changes this should give an exact reference speed that TensorFlow with CuDNN3 should achieve in the end. 2016-01-05 09:55:21 -08:00
896e8e5274 pooling backward cudnn, and constant for kOne and kZero. 2016-01-05 09:55:21 -08:00
f8585bbf62 cudnn pool op. 2016-01-05 09:55:21 -08:00
664bdf83d7 Pooling refactor so we can do a proper cudnn benchmark. 2016-01-05 09:55:21 -08:00
288f350899 math_gpu.cu bugfix 2016-01-05 09:55:21 -08:00
55cced894d Some untested half float stuff for benchmarking. 2016-01-05 09:49:55 -08:00
8d4683434b convnet benchmark: make it consistent with TF's model. 2015-12-17 11:25:51 -08:00
b7c3b48469 copy matrix can be done with cudaMemcpy. 2015-12-17 10:22:02 -08:00
b10ee24fc3 conv op: backward exhaustive mode too. This does not seem to help much, suggesting that cudnnGetConvolution*Algorithm is already doing a very good job. Verified with googlenet. 2015-12-17 10:21:16 -08:00
d79cfb4ae7 exhaustive search for cudnn 2015-12-15 22:21:11 -08:00
05e3207e26 fast path for copymatrix 2015-12-15 21:25:53 -08:00
cc9323793e add relu cudnn code 2015-12-15 20:43:34 -08:00
4f2530d8ce expose benchmark code to python 2015-12-15 20:42:54 -08:00
6b27cabf17 net benchmark code 2015-12-15 20:42:22 -08:00
cf8ffe215f minor tuning 2015-12-15 20:41:58 -08:00
f714ad0a70 number of blocks now makes more sense. 2015-12-15 10:46:50 -08:00
3b0cc79465 context gpu: better error catching 2015-12-14 13:59:28 -08:00
73f3daf736 minor bugfix for workspace 2015-12-13 08:37:36 -08:00
bfae070de1 minor bugfix for net 2015-12-13 08:37:01 -08:00
359f7685f8 halfway into timing test. 2015-12-11 11:01:40 -08:00
ae1ebd0f19 a script to test zeromq db throughput. 2015-12-09 15:15:06 -08:00
77541ffe14 flags relaxation, or tightening? 2015-12-07 20:48:57 -08:00
ceb4cde74a average pooling format change to fit the cudnn interface 2015-12-06 15:56:29 -08:00
6bfb30047e deprecate legacy pooling 2015-12-06 11:28:00 -08:00
05465783c6 optionally use protobuf lite 2015-12-05 16:15:00 -08:00
3d7cb201a3 misc changes to reduce binary size. 2015-12-04 21:31:23 -08:00
4eb486bd34 misc update to reduce binary size. Removed zmq.hpp 2015-12-03 21:28:55 -08:00