Summary:
…done once
This allows the no-op build to work correctly even when BUILD_CAFFE2_OPS is on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14982
Differential Revision: D13413960
Pulled By: zdevito
fbshipit-source-id: 6e5412a8c375af8a47c76f548cdd31cff15f3853
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14269
Removes the reference to Context proper and instead adds a bool argument for async copy (the same as `copy_`).
For CopyFrom I haven't tweaked all call sites yet. Instead I rely on a terrible hack: a pointer to a context is implicitly converted to bool when passed, haha :) It's not good code, and I propose to fix it in a follow-up diff (maybe using clangr tooling).
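As a rough illustration, here is a minimal standalone sketch of the new call shape and of the implicit conversion the hack relies on (the types below are stand-ins, not the real Caffe2 classes):
```
#include <iostream>

// Stand-in types; not the real Caffe2 Tensor/Context.
struct Context {};

struct Tensor {
  // New-style CopyFrom: an explicit flag selects an async copy,
  // following the same convention as `copy_`.
  void CopyFrom(const Tensor& /*src*/, bool async = false) {
    std::cout << "CopyFrom(async=" << std::boolalpha << async << ")\n";
  }
};

int main() {
  Tensor src, dst;
  Context ctx;
  dst.CopyFrom(src, /*async=*/true);   // intended new usage
  // An un-migrated call site that still passes a context pointer compiles
  // anyway, because the non-null pointer implicitly converts to `true`;
  // this is the temporary hack mentioned above.
  dst.CopyFrom(src, &ctx);
  return 0;
}
```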
Reviewed By: ezyang
Differential Revision: D13117981
fbshipit-source-id: 7cb1dc2ba6a4c50ac26614f45ab8318ea96e3138
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12729
This may have a dependency on D10380678 if size_from_dim(0)
was required because numel() used to return -1 in some cases.
This is no longer true.
Reviewed By: li-roy, dzhulgakov
Differential Revision: D10415069
fbshipit-source-id: 39f46f56249ecaf3533f62a0205b3a45d519d789
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10261
1. Reserve
Currently, Reserve allocates new memory and also preserves the old data in the tensor,
and Resize relies on this behavior at some call sites, e.g. https://github.com/pytorch/pytorch/blob/master/caffe2/operators/reservoir_sampling.cc#L103, where we should be using Extend.
We want to bring the semantics of Reserve more in line with std::vector, i.e. make it purely a memory-allocation optimization and drop the semantics of preserving the data. We'll remove the guarantee that data is preserved after Reserve, and Extend will be the only API that preserves old data when we extend memory in place (see the std::vector sketch after the usage example below). This also helps with the later refactoring that splits Storage from Tensor.
Also, we'll only pass the outer dimension to Reserve, which means the later dimensions should be set before we call Reserve.
2. Extend/Shrink
Previously, Extend actually meant ExtendBy and Shrink meant ShrinkTo. I would like to add an ExtendTo for convenience, and change Shrink to ShrinkTo.
The old Extend function is still there; although it actually means ExtendBy, I think it still makes sense to keep it.
3. Usage Patterns
The expected usage pattern right now is:
```
t->Resize({0, 32, 32, 32});
t->template mutable_data<T>(); // set meta_
t->Reserve(100);
auto* t_data = t->template mutable_data<T>();
// feed data to tensor using t_data
for (int i = 0; i < 100; ++i) {
  t->Extend(1, 50, &context_);
  // you can continue to use t_data if you have reserved enough space
  // otherwise, you should call t->template mutable_data<T> again to
  // get the new data pointer since Extend will allocate new memory even
  // though the original data is preserved.
}
```
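For comparison, here is the std::vector pattern that the new Reserve/Extend split is modeled on (an analogy only, not Caffe2 code; note that, unlike the new Reserve, std::vector::reserve does keep existing elements):
```
#include <cassert>
#include <vector>

int main() {
  std::vector<float> v;
  v.reserve(100);               // allocation-only optimization, like Reserve
  assert(v.size() == 0);        // nothing materialized yet
  for (int i = 0; i < 100; ++i) {
    v.push_back(float(i));      // growth that keeps old data, like Extend
  }
  assert(v.size() == 100);
  assert(v.capacity() >= 100);  // no reallocation happened inside the loop
  return 0;
}
```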
Reviewed By: ezyang
Differential Revision: D9128147
fbshipit-source-id: e765f6566d73deafe2abeef0b2cc0ebcbfebd096
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10173
With D9024330, the `Extend` function is no longer a template, which makes
the `template` keyword here invalid. For some reason the current version of LLVM
doesn't catch this, but the latest one does.
Reviewed By: jerryzh168
Differential Revision: D9133462
fbshipit-source-id: 54ac9aad01f81b9b4e7b6e2864b8961478d2d860
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.
Previously, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we make the device a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra *DeviceType* argument to most of the Tensor constructors, e.g. Tensor(DeviceType type) (see the sketch below).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in so that we can call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter, Blob::GetMutableTensor, that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
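A minimal mock of the new shape (an illustration only; the enum values and method names are simplified stand-ins for the Caffe2 ones):
```
#include <cassert>

// Simplified stand-ins, not the real Caffe2 types.
enum class DeviceType { CPU, CUDA };

class Tensor {
 public:
  Tensor() = delete;  // no default constructor: there are no unknown-device tensors
  explicit Tensor(DeviceType type) : type_(type) {}
  DeviceType GetDeviceType() const { return type_; }

 private:
  DeviceType type_;  // the device is now a runtime property stored in the tensor
};

int main() {
  Tensor cpu_tensor(DeviceType::CPU);   // device type is required at construction
  Tensor gpu_tensor(DeviceType::CUDA);
  assert(cpu_tensor.GetDeviceType() == DeviceType::CPU);
  assert(gpu_tensor.GetDeviceType() == DeviceType::CUDA);
  // Tensor t;  // would not compile: deleted default constructor (item 4)
  return 0;
}
```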
Reviewed By: ezyang, houseroad
Differential Revision: D9024330
fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.
Previously, the Caffe2 Tensor class was fixed at compile time to bind to a particular device/context. With this change, we make the device a runtime property (stored inside the tensor) while preserving the same semantics. For example, one still has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra *DeviceType* argument to most of the Tensor constructors, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in so that we can call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter, Blob::GetMutableTensor, that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: xw285cornell
Differential Revision: D8121878
fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9520
Add a random data filler to the predictor benchmark to support production nets.
Reviewed By: salexspb
Differential Revision: D8712757
fbshipit-source-id: 2c732b2ba71ab210f9222adf94d08442ca71dc03
Summary: Add a check that, every time we register a Caffe2 operator for CPU or GPU, documentation is added for that operator.
Reviewed By: dzhulgakov
Differential Revision: D5443110
fbshipit-source-id: 3793c3d29bea1228078cb30bdf8243ac0ab90664
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
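The rewrite the tool applies looks roughly like this (an illustrative example, not code from the diff):
```
#include <iostream>

// Before the cleanup this would read
//   void logSize(int size, int verbose) { ... }
// with `verbose` unused and triggering -Wunused-parameter. The tool keeps
// the signature but comments out the unused parameter name:
void logSize(int size, int /*verbose*/) {
  std::cout << "size = " << size << "\n";
}

int main() {
  logSize(42, 1);
  return 0;
}
```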
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary: The net construct bench was using an old version of the data_parallel_model API.
Reviewed By: bddppq
Differential Revision: D5453281
Tags: easy
fbshipit-source-id: 93e1ba58511c7b25235ee50d9862fd0614b344c9
Summary: Add a lint rule to check that, every time we register a Caffe2 operator for CPU or GPU, documentation is added for that operator.
Reviewed By: dzhulgakov
Differential Revision: D5348078
fbshipit-source-id: c3fa22fc7ca8066d5fc8fa780b23d7867fd3380e
Summary: Based on the benchmark script at `caffe2/experiments/python/device_reduce_sum_bench.py`, device reduce sum is slower for N <= 10000, so we only switch to device reduce for large N in SumElements. This diff applies the same scheme to SumSqrElements.
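The resulting dispatch is roughly the following (a host-side sketch with made-up helper names, not the actual CUDA kernels):
```
#include <cstddef>
#include <iostream>
#include <vector>

// Host-side stand-ins for the two reduction paths; in the real op these are
// a block-level kernel (small N) and a cub::DeviceReduce-based path (large N).
float BlockReduceSumSqr(const std::vector<float>& x) {
  float sum = 0.0f;
  for (float v : x) sum += v * v;
  return sum;
}

float DeviceReduceSumSqr(const std::vector<float>& x) {
  float sum = 0.0f;
  for (float v : x) sum += v * v;
  return sum;
}

constexpr std::size_t kDeviceReduceThreshold = 10000;  // cutoff from the benchmark

float SumSqrElements(const std::vector<float>& x) {
  // Only switch to device reduce when N is large enough for it to win.
  return x.size() > kDeviceReduceThreshold ? DeviceReduceSumSqr(x)
                                           : BlockReduceSumSqr(x);
}

int main() {
  std::vector<float> x(100, 2.0f);
  std::cout << SumSqrElements(x) << "\n";  // prints 400
  return 0;
}
```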
Reviewed By: jamesr66a
Differential Revision: D5369868
fbshipit-source-id: ae13a611aff9d3464d1c4950ee155c740a2da339
Summary: Port SumElements and softmax_ops.cu to use device reduce sum
Reviewed By: akyrola
Differential Revision: D5351881
fbshipit-source-id: ca9604186c261ffcb1480da2a17baab8a4809372
Summary:
Migrate the experiments folder to the fb/sparse folder. Keep FunHashOp and SparseFunHashOp because they are now assumed to be default Ops in depr. What I did:
# Migrate FunHashOp and SparseFunHashOp and their unit tests to core caffe2, and make sure the tests pass.
# Migrate the other Ops in the experiments folder to the fb/sparse folder, write new TARGETS files for them, and make sure the tests pass.
# Make sure all related tests pass.
# Also fix the MKL definition, and make sure that FC_Sparse is not compiled when there is no MKL support.
Reviewed By: salexspb
Differential Revision: D4952993
fbshipit-source-id: 86c03676ab4e47f04d2d0dd438a4a1c849bbbff0
Summary:
Currently, build_sgd is in a Facebook-specific directory. We need to move it to the python directory so that
the open source world can use it.
Reviewed By: salexspb
Differential Revision: D4547016
fbshipit-source-id: d699b7b1ab8051afdeadedb4d247ec2a04a7a3e7
Summary:
I have noticed that constructing the Xray model takes quite a while. To measure this, I wrote a benchmark script that creates a ResNet-50 model on 8 GPUs. This takes about 95 secs -- which is kind of annoying when you want to quickly debug stuff.
Profiling (using Python's cProfile), I was able to see that most of the time is spent in net.BlobIsDefined(), which does a linear search over external inputs and operator outputs, so it gets slower and slower with large nets. This can be fully optimized by keeping a separate lookup table of operator inputs and outputs (and external inputs and outputs). It is a bit annoying to maintain this separate data structure, but I set up the unit tests to ensure things work correctly across Clones.
After the optimization, the net construction drops from 95 secs to 8.2 secs!
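The underlying fix is just trading the linear scan for a lookup table that is kept in sync as operators are added. A generic sketch of the idea (the actual change is in the Python Net class, so all names here are illustrative):
```
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Generic sketch: keep a lookup table alongside the operator list so that
// "is this blob defined?" becomes an O(1) set lookup instead of a linear
// search over all operator outputs and external inputs.
class NetSketch {
 public:
  void AddExternalInput(const std::string& name) { defined_blobs_.insert(name); }

  void AddOp(const std::vector<std::string>& inputs,
             const std::vector<std::string>& outputs) {
    ops_.push_back({inputs, outputs});
    // The table must be maintained incrementally and kept consistent across
    // operations such as Clone, which is what the unit tests guard against.
    for (const auto& out : outputs) defined_blobs_.insert(out);
  }

  bool BlobIsDefined(const std::string& name) const {
    return defined_blobs_.count(name) > 0;
  }

 private:
  struct Op {
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
  };
  std::vector<Op> ops_;
  std::unordered_set<std::string> defined_blobs_;
};

int main() {
  NetSketch net;
  net.AddExternalInput("data");
  net.AddOp({"data", "w"}, {"fc1"});
  assert(net.BlobIsDefined("data"));
  assert(net.BlobIsDefined("fc1"));
  assert(!net.BlobIsDefined("fc2"));
  return 0;
}
```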
Reviewed By: azzolini
Differential Revision: D4288307
fbshipit-source-id: 0bb82c8bde9d86a2702b298f4aa706cba509346e