ffec31ea07
pycaffe2 c++ extension: py3
I tried to make things compile under python3, but much of the actual
functionality has yet to be verified. Since I will not be using py3 for a
while and protobuf 2.6.1 does not work with py3 (among a bunch of other
things), I'll leave this as a future todo item.
2016-03-07 22:08:53 -08:00
1b5da38e29
minor fix
2016-03-07 13:47:36 -08:00
46de5451ed
BREW modifications
2016-03-05 08:00:42 -08:00
b589d831b8
cudnn v4 interface change.
2016-03-05 08:00:08 -08:00
05ead5f76f
Bugfix for logging
2016-03-04 09:47:20 -08:00
50874dc746
relu and pool wip
2016-02-01 14:08:10 -08:00
1740974347
average pooling wrapper: without this the NHWC path would throw an error as the order is not passed along.
2016-01-22 09:31:49 -08:00
5a94ee6b64
Allow one to set the BLAS backend, while optionally choosing to use
Eigen for the whole numerical computation (for example, on a platform
where no optimized BLAS library is present, or where Eigen is already
the fastest numerical library available).
The paths I have tested are Eigen and ATLAS. I have not tested MKL yet.
2016-01-20 17:05:21 -08:00
78aa266770
Fix
2016-01-19 14:49:48 -08:00
d84545c5fb
fp16: allow one to override.
2016-01-19 14:39:26 -08:00
5f2d7ba963
misc: experimental cuda elementwise rtc, softmax fp16
2016-01-19 12:49:36 -08:00
d244ca9052
relu fp16 fix
2016-01-13 22:12:49 -08:00
fa59b90c72
misc updates
2016-01-13 21:00:56 -08:00
a05782f025
fix
2016-01-12 15:59:02 -08:00
d08880e61a
more RTC experiments
2016-01-12 15:44:15 -08:00
fe78d1a445
quick rtc try
2016-01-11 20:21:41 -08:00
64e0d3a29a
misc updates, mainly relu, to test fp16
2016-01-07 20:56:11 +00:00
e2b9172b4c
print cudnn version
2016-01-07 18:49:41 +00:00
1c020d257b
bugfix
2016-01-07 18:48:30 +00:00
a5a75e8005
some changes for TX1 benchmark
2016-01-05 20:20:50 +00:00
8c1bbaa2ab
some fill ops that are not tested.
2016-01-05 09:55:22 -08:00
6cb2072422
cudnn conv op backward compatibility back to v2
2016-01-05 09:55:21 -08:00
778a1f6956
speed benchmark
2016-01-05 09:55:21 -08:00
05eda208a5
Last commit for the day. With all the previous changes this should give an exact reference speed that TensorFlow with CuDNN3 should achieve in the end.
2016-01-05 09:55:21 -08:00
896e8e5274
pooling backward cudnn, and constant for kOne and kZero.
2016-01-05 09:55:21 -08:00
f8585bbf62
cudnn pool op.
2016-01-05 09:55:21 -08:00
664bdf83d7
Pooling refactor so we can do a proper cudnn benchmark.
2016-01-05 09:55:21 -08:00
288f350899
math_gpu.cu bugfix
2016-01-05 09:55:21 -08:00
55cced894d
Some untested half float stuff for benchmarking.
2016-01-05 09:49:55 -08:00
8d4683434b
convnet benchmark: make it consistent with TF's model.
2015-12-17 11:25:51 -08:00
b7c3b48469
copy matrix can be done with cudaMemcpy.
2015-12-17 10:22:02 -08:00
b10ee24fc3
conv op: backward exhaustive mode too. This does not seem to help much, suggesting that cudnnGetConvolution*Algorithm is already doing a very good job. Verified with GoogLeNet.
2015-12-17 10:21:16 -08:00
d79cfb4ae7
exhaustive search for cudnn
2015-12-15 22:21:11 -08:00
05e3207e26
fast path for copymatrix
2015-12-15 21:25:53 -08:00
cc9323793e
add relu cudnn code
2015-12-15 20:43:34 -08:00
4f2530d8ce
expose benchmark code to python
2015-12-15 20:42:54 -08:00
6b27cabf17
net benchmark code
2015-12-15 20:42:22 -08:00
cf8ffe215f
minor tuning
2015-12-15 20:41:58 -08:00
f714ad0a70
number of blocks now makes more sense.
2015-12-15 10:46:50 -08:00
3b0cc79465
context gpu: better error catching
2015-12-14 13:59:28 -08:00
73f3daf736
minor bugfix for workspace
2015-12-13 08:37:36 -08:00
bfae070de1
minor bugfix for net
2015-12-13 08:37:01 -08:00
359f7685f8
halfway into timing test.
2015-12-11 11:01:40 -08:00
ae1ebd0f19
a script to test zeromq db throughput.
2015-12-09 15:15:06 -08:00
77541ffe14
flags relaxation, or tightening?
2015-12-07 20:48:57 -08:00
ceb4cde74a
average pooling format change to fit the cudnn interface
2015-12-06 15:56:29 -08:00
6bfb30047e
deprecate legacy pooling
2015-12-06 11:28:00 -08:00
05465783c6
optionally use protobuf lite
2015-12-05 16:15:00 -08:00
3d7cb201a3
misc changes to reduce binary size.
2015-12-04 21:31:23 -08:00
4eb486bd34
misc update to reduce binary size. Removed zmq.hpp
2015-12-03 21:28:55 -08:00