53 Commits

Author SHA1 Message Date
29d759948e use irange for loops 2 (#66746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;var++)`

to the format

`for(const auto var: irange(x_max))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
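
A minimal sketch of the loop rewrite described above. The real helper is `c10::irange`; to keep the sketch standalone it uses a small stand-in, `simple_irange`, which is hypothetical and is not the c10 implementation. The function names `sum_indexed` and `sum_irange` are also invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for c10::irange: yields 0, 1, ..., stop - 1.
class simple_irange {
 public:
  class iterator {
   public:
    explicit iterator(std::size_t v) : v_(v) {}
    std::size_t operator*() const { return v_; }
    iterator& operator++() { ++v_; return *this; }
    bool operator!=(const iterator& other) const { return v_ != other.v_; }
   private:
    std::size_t v_;
  };

  explicit simple_irange(std::size_t stop) : end_(stop) {}
  iterator begin() const { return iterator(0); }
  iterator end() const { return iterator(end_); }

 private:
  std::size_t end_;
};

// Old style: explicit index; the increment and bound are written by hand.
inline long sum_indexed(const std::vector<int>& xs) {
  long total = 0;
  for (std::size_t i = 0; i < xs.size(); i++) {
    total += xs[i];
  }
  return total;
}

// New style: the index is const and takes its type from the range.
inline long sum_irange(const std::vector<int>& xs) {
  long total = 0;
  for (const auto i : simple_irange(xs.size())) {
    assert(i < xs.size());  // the range never runs past the bound
    total += xs[i];
  }
  return total;
}
```

Both functions return the same sum; the second form leaves no room to increment the wrong variable.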

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705361

fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
2021-12-10 04:26:23 -08:00
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;var++)`

to the format

`for(const auto var: irange(x_max))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
fadaa52f64 [caffe2] add an EstimateAllBlobSizes operator (#59775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59775

This operator is similar to `GetAllBlobNames` but also returns the estimated
size required to serialize each node.

One goal of this operator is to allow checkpoint saving logic to estimate the
amount of space/bandwidth required to save a checkpoint when first starting
training, without actually serializing any blobs yet.  Currently the
checkpointing logic uses `GetAllBlobNames` to determine the blobs to
checkpoint.  It can instead be updated to use `EstimateAllBlobSizes` to also
get an estimate for how much space will be required for the checkpoint.
ghstack-source-id: 132275153

Test Plan: Included a new unit test.

Reviewed By: mraway

Differential Revision: D29020227

fbshipit-source-id: 811e5d86c4b59183e84e6424c48c97739be09043
2021-06-24 16:55:22 -07:00
7e5ffbfa94 [caffe2] add a SerializationOptions field for the save operator (#53402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53402

Add an `options` field to the `Save` operator which accepts options for how to
serialize different blobs.  At the moment this simply allows controlling the
existing `chunk_size` behavior, but in the future we can add other options,
such as the ability to control compression settings or other serialization
formats.
ghstack-source-id: 123567034

Test Plan:
Added a new test to `load_save_test.py` that passes in options and verifies
that blobs were serialized with the expected number of chunks.

  buck test caffe2/caffe2:caffe2_test_cpu \
    caffe2/caffe2/core:serialization_test \
    caffe2/caffe2/python/operator_test:load_save_test

Reviewed By: mraway

Differential Revision: D26502577

fbshipit-source-id: 6e302e530bb96990517c2e35c505db7f14a56284
2021-03-11 13:02:58 -08:00
99d7c8ff94 [caffe2] use AddNAlreadyReserved() when serializing blobs (#53400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53400

This is a reland of D26617038 (b4a8d98247) after rebasing onto D26802576 (f595ba1bae).

Optimize the blob serialization code by using `AddNAlreadyReserved()` when
serializing tensor data, rather than making N separate `Add()` calls.
`AddNAlreadyReserved()` is a simple addition operation, while each `Add()`
call checks to see if it needs to reserve new space, and then updates the
element data, which is unnecessary in this case.
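
As a rough analogy only, the same idea can be sketched with `std::vector` instead of protobuf's repeated fields: append element by element, or grow the container once and fill it in a single copy. The function names below are hypothetical, and this sketch says nothing about the actual protobuf internals or the measured speedup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Per-element append: every push_back re-checks capacity before storing.
inline std::vector<float> copy_one_by_one(const float* src, std::size_t n) {
  std::vector<float> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back(src[i]);
  }
  return out;
}

// Bulk append: grow the size once, then fill all slots with one copy.
// (Unlike the protobuf call, resize() also zero-fills the new slots first.)
inline std::vector<float> copy_in_bulk(const float* src, std::size_t n) {
  std::vector<float> out;
  out.resize(n);
  if (n > 0) {
    std::memcpy(out.data(), src, n * sizeof(float));
  }
  assert(out.size() == n);
  return out;
}
```

Both functions produce the same contents; only the number of per-element bookkeeping steps differs.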
ghstack-source-id: 123567030

Test Plan:
This appears to improve raw serialization performance by 30 to 35% for float,
double, and int64_t types which use this function.  This improvement appears
relatively consistent across large and small tensor sizes.

Reviewed By: mraway

Differential Revision: D26853941

fbshipit-source-id: 4ccaa5bc1dd7f7864068d71a0cde210c699cbdba
2021-03-10 15:27:52 -08:00
21c3f6f415 Revert D26617038: [caffe2] use AddNAlreadyReserved() when serializing blobs
Test Plan: revert-hammer

Differential Revision:
D26617038 (b4a8d98247)

Original commit changeset: 97dedbae889d

fbshipit-source-id: 6921d0a64dee26e18f16628773953bbe7280998e
2021-02-25 21:32:40 -08:00
b4a8d98247 [caffe2] use AddNAlreadyReserved() when serializing blobs
Summary:
Optimize the blob serialization code by using `AddNAlreadyReserved()` when
serializing tensor data, rather than making N separate `Add()` calls.
`AddNAlreadyReserved()` is a simple addition operation, while each `Add()`
call checks to see if it needs to reserve new space, and then updates the
element data, which is unnecessary in this case.

Test Plan:
This appears to improve raw serialization performance by 30 to 35% for float,
double, and int64_t types which use this function.  This improvement appears
relatively consistent across large and small tensor sizes.

Differential Revision: D26617038

fbshipit-source-id: 97dedbae889d35463628f3016ac56986e685289e
2021-02-25 20:24:01 -08:00
71ca600af9 Renaming CAFFE2_API to TORCH_API (#49496)
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.

Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496

Reviewed By: malfet, samestep

Differential Revision: D25600726

Pulled By: janeyx99

fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
2020-12-18 10:54:50 -08:00
da6f249a10 [caffe2] DeserializeToNDArray (#49135)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49135

Differential Revision: D25417845

fbshipit-source-id: 4d8efd440bc2577fb717f911a401e7b81d48b907
2020-12-10 21:59:25 -08:00
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
7bb36ada1f fix -Wsign-compare warnings for some files inside c2 (#18123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18123

The motivation for this fix is to resolve cases such as
`for(auto i = 0; i < N; i++)`, where N is larger than an int32 can hold.

These instances of comparison were found by enabling -Wsign-compare

There are way too many things to fix, so issuing this as a series of fixes

The plan is to fix all these issues and then enable this flag into Caffe2 to catch future instances
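
A small sketch of the problem pattern and one possible fix, assuming an unsigned 64-bit bound. The helper names are hypothetical and are not taken from the changed files.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// `auto i = 0` deduces int, whatever the type of the loop bound is.
inline bool auto_zero_is_int() {
  auto i = 0;
  return std::is_same<decltype(i), int>::value;
}

// Giving the index the same type as the bound avoids the signed/unsigned
// comparison that -Wsign-compare reports, and the index cannot overflow
// when n exceeds the int32 range.
inline std::uint64_t count_steps(std::uint64_t n, std::uint64_t stride) {
  assert(stride > 0);
  std::uint64_t steps = 0;
  for (std::uint64_t i = 0; i < n; i += stride) {
    ++steps;
  }
  return steps;
}
```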

Reviewed By: ZolotukhinM

Differential Revision: D14497094

fbshipit-source-id: bca3927a2188bd33a508fa503ba221c220cdaefe
2019-03-19 10:39:20 -07:00
9b272c08cf Remove partially initialized Tensor in Deserialization (#14197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13642

Previously we passed a partially initialized Tensor to Deserialize, which filled
it with the result of deserializing a tensor proto. Now we want it to return
a Tensor directly since it's just a shared pointer to TensorImpl.

Reviewed By: dzhulgakov

Differential Revision: D12874357

fbshipit-source-id: 12b80a763375da23cfa64a74d6bc186d8d03b94f
2018-12-10 17:17:29 -08:00
4b0fc5200b Fix include paths for typeid.h (#13689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13689

Now that typeid.h lives in c10/util, the include paths should reflect that.

Reviewed By: ezyang

Differential Revision: D12912237

fbshipit-source-id: e54225f049f690de77cb6d5f417994b211a6e1fb
2018-11-14 18:04:09 -08:00
a6949abb15 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (fixed reverted bug) (#12848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12848

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.
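
The checked-wrapper idea can be sketched without protobuf. `FakeMessage` and `SerializeAsStringChecked` below are hypothetical stand-ins for illustration; they are not the Caffe2 `SerializeAsString_EnforceCheck` implementation, and a plain `std::runtime_error` takes the place of the enforce machinery.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical message type; stands in for a protobuf message.
struct FakeMessage {
  std::string payload;
  bool serializable;

  // Reports failure only through the return value, which callers can ignore.
  bool SerializeToString(std::string* out) const {
    if (!serializable) {
      return false;
    }
    *out = payload;
    return true;
  }
};

// Checked wrapper: turns an ignorable failure into an exception.
inline std::string SerializeAsStringChecked(const FakeMessage& msg) {
  std::string out;
  if (!msg.SerializeToString(&out)) {
    throw std::runtime_error("serialization failed");
  }
  return out;
}
```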

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Original commit changeset: c0760e73ecc7

Reviewed By: dzhulgakov

Differential Revision: D10453456

fbshipit-source-id: d2f2b7b4578e721924354149f08f627c7e3bf070
2018-10-23 16:21:26 -07:00
805f4d5cb8 Revert D10416438: Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE
Differential Revision:
D10416438

Original commit changeset: cb842e3e26b0

fbshipit-source-id: c0760e73ecc76ca9b1b74f6844e243c2df5260a2
2018-10-18 13:46:33 -07:00
63cd051867 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (#12799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12799

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Reviewed By: ezyang

Differential Revision: D10416438

fbshipit-source-id: cb842e3e26b0918829d71267a375d4dd40600d58
2018-10-18 12:49:01 -07:00
6cbf1992bd Serialization takes pointers instead of Blob (#11925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11925

This is step 1 in the refactoring to remove Blob::ShareExternal(), i.e. Blob would then always own its contents.

ShareExternal() is for example used to pass non-owning blobs to serialization. This diff prepares removing that.

Reviewed By: ezyang

Differential Revision: D9884177

fbshipit-source-id: d01df9a613a4fc62e5679fe45bfc47e2c899b818
2018-10-17 11:50:34 -07:00
38f3d1fc40 move flags to c10 (#12144)
Summary:
Still in flux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
b2b05b7c20 Move blob serialization to free functions (#11817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817

Blob::Serialize() and Blob::Deserialize() are now free functions SerializeBlob(), DeserializeBlob() instead.
This takes away access to Blob internals from them and makes future refactorings easier.

Reviewed By: ezyang

Differential Revision: D9882726

fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
2018-09-20 23:27:34 -07:00
0a809fc8b1 build changes to make cpu unified build working. (#10504)
Summary:
Properly annotated all apis for cpu front. Checked with cmake using

cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON

and resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504

Reviewed By: ezyang

Differential Revision: D9316491

Pulled By: Yangqing

fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
2018-08-15 17:22:36 -07:00
5765549155 codemod -d caffe2 --extensions cc,h CaffeTypeId TypeIdentifier (#10166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10166

TypeIdentifier is still easy to codemod away from

Reviewed By: smessmer

Differential Revision: D9132840

fbshipit-source-id: bc83a8b17b2e7c19c9d2c9cfe5c7ce6ec1d8cec5
2018-08-02 11:54:30 -07:00
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Previously, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); have changed. The second argument is passed in so that we can call the templated Copy function. Previously it could be in a different context from the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change
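
Points 1 and 4 can be illustrated with a toy class. The `DeviceType` enum, `Tensor` class, and `make_tensors` helper below are simplified, hypothetical stand-ins, not the Caffe2 definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins; not the Caffe2 definitions.
enum class DeviceType { CPU, CUDA };

class Tensor {
 public:
  // No default constructor: a device type must always be supplied.
  explicit Tensor(DeviceType type) : type_(type) {}
  DeviceType device_type() const { return type_; }

 private:
  DeviceType type_;  // runtime property instead of a template parameter
};

// STL containers cannot default-construct elements any more, so the
// elements are built from an explicit prototype instead.
inline std::vector<Tensor> make_tensors(std::size_t n, DeviceType type) {
  assert(!std::is_default_constructible<Tensor>::value);
  return std::vector<Tensor>(n, Tensor(type));
}
```

Code such as `std::vector<Tensor> v(3);` would no longer compile against this class, which is the kind of call site point 4 refers to.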

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision:
D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Previously, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); have changed. The second argument is passed in so that we can call the templated Copy function. Previously it could be in a different context from the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
b8f670eae8 Fix windows build error
Summary:
TSIA. Verified on local machine with VS 2017.
Closes https://github.com/caffe2/caffe2/pull/1455

Differential Revision: D6310658

Pulled By: Yangqing

fbshipit-source-id: 88f4519e8e9a4178719a5627365267f627dcb939
2017-11-14 00:05:33 -08:00
fc8532c89d Allow serialization of custom types inside Tensor
Summary:
The use case is that sometimes we need a Tensor of custom type instead of POD
or string. This diff allows one to delegate to BlobSerializerBase to further
serialize the contents inside the Tensor.

Design choices:
(1) Each element is serialized as a BlobProto string, and stored in the
repeated string field.
(2) UNDEFINED is used as the enum value for the tensor data type, and the exact
type string is stored in the additional field.
(3) BlobSerializer is called on each item to obtain the serialized string.
(4) This requires the custom type to have a copy constructor - otherwise it
will simply not be possible to copy over the deserialized content without
an explicit type.

See blob_test.cc for an example.

Reviewed By: sunnieshang

Differential Revision: D6300196

fbshipit-source-id: 18bf94a22a07337e0fa83d3f1004b3651e38cf27
2017-11-10 13:14:21 -08:00
1149b9bbb5 Polling async net executor
Summary:
Implementation of polling async net executor.
Notes:
- New net executor async_polling - schedules CPU and GPU ops asynchronously, uses single polling thread
- Events: update to Caffe2 events to support async CPU events, adding new methods:
 Query() - non-blocking checking of event states: INITIALIZED -> RECORDED -> SUCCESS/FAILED
 ErrorMessage() - when operation runs asynchronously and fails calling this on event will give error message
- Tasks: using existing DAGNet's algorithm to compute CPU and GPU chains, a separate task for each chain
- Polling: using single thread to query state of events - for CPU tasks atomically queries task state, for GPU task - uses cudaEventQuery; using Event
- Scheduling of CPU ops: using global thread pools
- Scheduling of GPU ops: using GPU thread pool per GPU device

Reviewed By: dzhulgakov

Differential Revision: D5985110

fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
2017-11-03 07:27:44 -07:00
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
34be12353b comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 15:14:43 -07:00
54e8ef14fb add flag caffe2_serialize_fp16_as_bytes
Reviewed By: kennyhorror

Differential Revision: D5403218

fbshipit-source-id: 755e7a709880f54096a6e5e661554614fc2cc585
2017-07-11 22:20:36 -07:00
d9daad509d Serialize float16 tensors as bytes to get rid of 50% overhead
Summary: When we use the int32_data field to serialize float16 tensors, the representation can be up to 50% larger than what byte_data achieves. The reason is varints (https://developers.google.com/protocol-buffers/docs/encoding#varints). In the worst case (when the highest sign bit is set), a varint uses three 8-bit blocks, i.e. 24 bits, for each 16-bit number. Saving in the byte_data field removes this overhead.
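
The 50% figure follows from varint arithmetic: each varint byte carries 7 payload bits, so a 16-bit pattern with a high bit set needs three bytes instead of two. A small sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to encode `value` as a base-128 varint
// (7 payload bits per byte, one continuation bit).
inline std::size_t varint_size(std::uint32_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}

// Relative overhead of varint encoding compared with storing the
// float16 bit pattern as 2 raw bytes.
inline double overhead_vs_raw(std::uint16_t bits) {
  const std::size_t encoded = varint_size(bits);
  assert(encoded <= 3);  // 16 bits never need more than three varint bytes
  return static_cast<double>(encoded) / 2.0 - 1.0;
}
```

Any bit pattern of 16384 or above, which includes every float16 with the sign bit set, lands in the three-byte case.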

Reviewed By: Yangqing

Differential Revision: D5375267

fbshipit-source-id: 0068daed25cd0157ea80a768b6e3899ea2bd8caf
2017-07-10 11:19:09 -07:00
7517f050fc apply clang-tidy modernize-use-override
Summary: Use clang-tidy to mechanically add missing `override` and remove redundant `virtual`.

Reviewed By: igorsugak

Differential Revision: D5211868

fbshipit-source-id: 6a85f7c4a543a4c9345ec5b0681a8853707343dc
2017-06-09 11:33:07 -07:00
0a25b9cb50 fix android build
Summary:
The most recent diff from Andrey had a tiny bug that triggered an error in Android.
Closes https://github.com/caffe2/caffe2/pull/543

Differential Revision: D5040516

Pulled By: Yangqing

fbshipit-source-id: d7b11b509a20b8b5e33db74dd383b55f43608c8f
2017-05-11 11:22:25 -07:00
12965a4108 Add Poorman's IOBound ThreadPool for serialization.
Summary:
At the moment serialization can take up to 3x the memory of the largest blob:
the original blob, the BlobProto, and the SerializeAsString version of the blob.
As a result, in certain cases serialization takes more memory than it should,
which hurts utilization and the max model size per machine.

This diff adds an IOBound ThreadPool that should set a fairly strict limit
on the extra memory overhead per blob.

Reviewed By: dzhulgakov

Differential Revision: D5012887

fbshipit-source-id: 12dbb9d3efab136411ddeffd519b602cf606661e
2017-05-08 06:43:31 -07:00
ba1d592b5f New 40% faster net-type for MLP on GPUs
Summary:
This diff introduces a new net type 'singlethread_async' which is based on my investigation of DPER/hogwild MLP bottlenecks.
It only uses one CPU thread, but multiple streams on each GPU. This is implemented by having each Net submit its list of operators to
a central GPU-specific executor queue and a thread that executes them asynchronously. This executor takes all tasks in the queue, executes them on separate CUDA streams, and then waits for them at the end. This solution can achieve >95% GPU utilization on 8 GPUs when a sufficient number of workers is used.

FYI: I also tried fancier solutions, such as using cudaStreamCallbacks(), but they did not perform as well.

Improved the dper bench by adding the MomentumSGDUpdate operations and adding speed test capabilities. During my testing I also noticed that the startup costs for initializing CUDA streams and contexts are high, so it is important to do a warm-up.

Reviewed By: Yangqing

Differential Revision: D4553941

fbshipit-source-id: bb00524bef653d75de026dd64097b8d9b7a0acb3
2017-02-21 21:40:15 -08:00
864f561525 Make BlobDeserialization throw exceptions instead of returning bool
Summary: Makes it much nicer to spot errors, especially in iPython notebook.

Reviewed By: kennyhorror

Differential Revision: D4465726

fbshipit-source-id: c0adaf5168248a70987ff9d5dfce54a622ff2219
2017-01-26 09:44:19 -08:00
65f7c915fd Fix non-chunked Blob::Serialize method
Summary: The previous implementation just concatenated strings, which I believe is wrong. Instead, let's turn off chunking when we don't ask for it.

Reviewed By: kennyhorror

Differential Revision: D4461311

fbshipit-source-id: 8b9a3325a40a1cd0a8ffeeb20a17bf9f57b7b0a9
2017-01-25 11:14:51 -08:00
ceb0c765b9 Avoid duplicate keys when doing chunking in serialization
Summary: Some DBs don't support duplicate keys. Nvidia had problems with LMDB, where we can potentially set up duplicate keys, but this won't be possible in some other cases. So instead let's just store different chunks under different keys in the DB. When reading back, we remove the special suffix.
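
The key scheme can be sketched as follows. The separator string and both function names are assumptions made for this sketch; the commit message does not spell them out.

```cpp
#include <cassert>
#include <string>

// Assumed separator for this sketch; the commit message does not name one.
const std::string kSeparator = "#%";

// Key under which chunk `chunk_id` of blob `name` is stored, so that
// no two chunks of the same blob share a DB key.
inline std::string make_chunk_key(const std::string& name, int chunk_id) {
  assert(chunk_id >= 0);
  return name + kSeparator + std::to_string(chunk_id);
}

// Recover the blob name when reading back; keys without a suffix
// pass through unchanged.
inline std::string strip_chunk_suffix(const std::string& key) {
  const std::string::size_type pos = key.rfind(kSeparator);
  return pos == std::string::npos ? key : key.substr(0, pos);
}
```

This only works if blob names themselves never contain the separator, which is an extra assumption of the sketch.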

Reviewed By: dzhulgakov

Differential Revision: D4446583

fbshipit-source-id: 6b345e342840c5fd476029166db131d343467d48
2017-01-23 10:14:18 -08:00
589398950f fbsync at f5a877 2016-11-18 15:41:06 -08:00
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00
d1e9215184 fbsync 2016-10-07 13:08:53 -07:00
c15e45c9bb chunky sync again 2016-08-01 20:58:46 -07:00
b729f05c35 Android build improvements 2016-07-26 12:48:53 -07:00
6463eebc7b chunky sync - build scripts to be written 2016-07-21 10:16:42 -07:00
559053d3a8 chunky sync 2016-05-13 14:43:48 -07:00
98c5b86ef7 A few changes:
(1) cudnn for conv
(2) cublas: after going through the work I feel it's better to use HOST pointer mode, so changed it.
(3) storage order: although googlenet and multibox use NHWC, it seems better to keep using
    NCHW as default to be consistent with caffe and cudnn; moved to NCHW as default.
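
For reference, the two storage orders differ only in how a 4-D index maps to a flat offset. A sketch with a hypothetical `Shape` struct and helper names:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shape holder: batch, channels, height, width.
struct Shape {
  std::size_t N, C, H, W;
};

// NCHW: width varies fastest, then height, then channel.
inline std::size_t offset_nchw(const Shape& s, std::size_t n, std::size_t c,
                               std::size_t h, std::size_t w) {
  assert(n < s.N && c < s.C && h < s.H && w < s.W);
  return ((n * s.C + c) * s.H + h) * s.W + w;
}

// NHWC: channel varies fastest, then width, then height.
inline std::size_t offset_nhwc(const Shape& s, std::size_t n, std::size_t c,
                               std::size_t h, std::size_t w) {
  assert(n < s.N && c < s.C && h < s.H && w < s.W);
  return ((n * s.H + h) * s.W + w) * s.C + c;
}
```
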
2015-10-21 22:37:11 -07:00
d734ddc196 Adding optional Eigen code. Added a switch USE_SYSTEM_EIGEN in Env. Misc changes. 2015-10-18 16:55:24 -07:00