Summary:
As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove `Variable.data()` API
4. Add `Variable.variable_data()` that matches `tensor.data` in the Python API. It creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.
After this PR, Variable no longer wraps a Tensor internally, and both Variable and Tensor use the same TensorImpl class as their `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.
**Note that this PR is BC-breaking in the following use cases:**
**Use Case 1:**
Previously, `x.data = y` worked even if `x` and `y` had different TensorImpl types (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). After this PR, `x.data = y` no longer works if `x` and `y` have different TensorImpl types, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl types.
**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
import torch

params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change the gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))

grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad)  # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072
Differential Revision: D14075257
Pulled By: yf225
fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18716
Might be useful as an intermediate stage for some systems that currently use Caffe2 nets as an execution mechanism.
Not sure it's a good idea altogether; please comment.
Limitations:
- only Tensor types as inputs/outputs
- the entire module is serialized as a zip archive inside a proto in the Caffe2 db, so it is subject to the 4Gb limit and is likely very slow. It should work for small models, though.
- no autograd, though it can be attached in principle
- no way to retrieve parameters inside the script module from C2 runtime perspective (though they potentially can be alias-fetched and stored as individual blobs)
- after deserialization, the returned Python wrappers don't have the correct type (as we don't do the module_lookup trick)
Build-wise, I had to add a dependency from pybind_state to libtorch.so. I don't think we build the Caffe2 Python frontend independently anymore, so it should be fine.
Reviewed By: amirshim, houseroad
Differential Revision: D14339599
fbshipit-source-id: 88a37a8abd1f1c4703e5ef937031f222535d4080
Summary:
Because there are two separate Python extensions with different pybind
instances, I have to go through a void* conversion. Since it's hidden from
the user, it's fine.
New APIs added on C2 side:
- workspace.FetchTorch('blob')
- workspace.Workspace.current.blobs['blob'].to_torch()
- workspace.FeedBlob('blob', pytorch_tensor)
Works on CPU and GPU.
The only glitches are with resizing, because of the Variable/Tensor split.
But data sharing works properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17190
Reviewed By: ezyang
Differential Revision: D14163882
Pulled By: dzhulgakov
fbshipit-source-id: d18e5b8fcae026f393c842a1149e972515732de2
Summary:
The goal of this PR is to unify the CUDA and HIP device types in the Caffe2 Python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221
Differential Revision: D13148564
Pulled By: bddppq
fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13377
* Enable junk fill for the default CPU allocator. The first diff only enables this for the tests. A second diff will change the default of zero-fill to false.
* Fix tests to use the 64-bit counters that IterOp and LearningRateOp demand.
* Fix kernels that use uninitialized memory.
Reviewed By: salexspb
Differential Revision: D10866512
fbshipit-source-id: 17860e77e63a203edf46d0da0335608f77884821
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9637
Add a method to run a plan in the background. The intended use is to run BlueWhale's data reading & preprocessing net in the background while the GPU is training.
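The overlap this is meant to enable can be sketched in plain Python. This is only an illustration under stated assumptions: a thread pool stands in for the Caffe2 runtime, and `read_and_preprocess`, `train_step`, and `train_while_reading` are hypothetical names, not the method added by this diff.

```python
from concurrent.futures import ThreadPoolExecutor


def read_and_preprocess(num_batches):
    """Hypothetical stand-in for the data reading and preprocessing plan."""
    return [[float(i), float(i) + 0.5] for i in range(num_batches)]


def train_step(step):
    """Hypothetical stand-in for one training step on the GPU."""
    return 1.0 / (step + 1)


def train_while_reading(num_batches, num_steps):
    """Start the reader in the background, train, then collect the batches."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns immediately; the reader runs on a worker thread.
        pending = pool.submit(read_and_preprocess, num_batches)
        # Meanwhile the caller keeps training on data it already has.
        losses = [train_step(step) for step in range(num_steps)]
        # Block only at the point where the new batches are needed.
        batches = pending.result()
    return losses, batches
```

The caller blocks only on `pending.result()`, so reading and training overlap up to that point.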
Reviewed By: MisterTea
Differential Revision: D8906439
fbshipit-source-id: b1c73ca7327e2d87a8f873924e05ab3d161a3f1e
Summary:
Reason for this change:
(1) Setting/Getting default gpu id doesn't seem to be used at all.
(2) It actually is confusing compared to the CUDA_VISIBLE_DEVICES options etc.
(3) When setting cuda_gpu_id=-1 in the CUDAContext arg, it used to use the
default gpu id but probably we should use the current gpu - so that the caller
will be able to control the device placement.
One use case is for TensorRT - if we have a custom callback layer, then it would
be easier for TRT or whatever caller to set the running device.
Reviewed By: dzhulgakov
Differential Revision: D6740357
fbshipit-source-id: 2ea710e434b10220d5a198e31c93847304636863
Summary:
FLOPs in conv were underestimated when the pad is nonzero.
The difference is especially large when the image is small.
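A rough sketch of why the image size matters, assuming the standard convolution output-size formula and counting multiply-accumulates. The helper names are hypothetical, and this is not the Caffe2 cost-inference code itself; it only compares a count that includes padding with one that ignores it.

```python
def conv_out_size(in_size, kernel, stride=1, pad=0):
    """Output length along one spatial dimension (standard formula)."""
    return (in_size + 2 * pad - kernel) // stride + 1


def conv_macs(height, width, c_in, c_out, kernel, stride=1, pad=0):
    """Multiply-accumulates for one image through a square-kernel conv."""
    out_h = conv_out_size(height, kernel, stride, pad)
    out_w = conv_out_size(width, kernel, stride, pad)
    return out_h * out_w * c_out * c_in * kernel * kernel


def underestimate_ratio(size, kernel=3, pad=1):
    """How much larger the padded count is than a count that ignores padding."""
    with_pad = conv_macs(size, size, 1, 1, kernel, pad=pad)
    without_pad = conv_macs(size, size, 1, 1, kernel, pad=0)
    return with_pad / without_pad
```

Under these assumptions, a 3x3 kernel with pad 1 gives a ratio of about 1.78 on an 8x8 image but only about 1.02 on a 224x224 image.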
Reviewed By: salexspb
Differential Revision: D6394190
fbshipit-source-id: b9f057fceae77f745c5daa668cb2100f993d21a7
Summary:
Implemented ApplyTransformIfFaster.
It determines whether a transform makes a net faster, then returns whichever net is better.
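The idea can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: nets are modeled as plain callables, and `average_runtime`, `apply_transform_if_faster`, and the `improvement_threshold` parameter are assumptions made for the example.

```python
import time


def average_runtime(run, warmup_runs=1, main_runs=5):
    """Average wall-clock seconds per call, after a few warmup calls."""
    for _ in range(warmup_runs):
        run()
    start = time.perf_counter()
    for _ in range(main_runs):
        run()
    return (time.perf_counter() - start) / main_runs


def apply_transform_if_faster(net, transform, benchmark=average_runtime,
                              improvement_threshold=1.0):
    """Return transform(net) only if it benchmarks faster than net."""
    transformed = transform(net)
    original_time = benchmark(net)
    transformed_time = benchmark(transformed)
    # Keep the original unless the transformed net wins by the threshold.
    if original_time > improvement_threshold * transformed_time:
        return transformed
    return net
```

Passing the benchmark as a parameter keeps the decision logic testable without relying on real timings.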
Reviewed By: bwasti
Differential Revision: D5534535
fbshipit-source-id: 509943205b0c454bf30fb01343ac4e88d1441c39
Summary: Allow the use of apply_transform() in the python API
Reviewed By: bwasti
Differential Revision: D5530483
fbshipit-source-id: 61a6d36fe125c89629fdeea040a717c453d84417
Summary: Deprecate CNNModelHelper in python/workspace_test.py to use Model_Helper instead of CNN
Reviewed By: harouwu
Differential Revision: D5251778
fbshipit-source-id: d634f1c76e41a95b0247ebf5d5a48aef6f8e232e
Summary: This diff is one step towards enabling the Python 3 build by handling strings more diligently.
Reviewed By: salexspb
Differential Revision: D4893083
fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
Summary:
This is from discussion with dzhulgakov : as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.
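A guard of this kind can be sketched as below. `TinyWorkspace` and its `overwrite` flag are hypothetical stand-ins used only to show the behavior; they are not the Caffe2 workspace API.

```python
class TinyWorkspace:
    """Hypothetical stand-in for a workspace that holds nets by name."""

    def __init__(self):
        self._nets = {}

    def create_net(self, name, net, overwrite=False):
        # Refuse to silently replace a net that already exists.
        if name in self._nets and not overwrite:
            raise RuntimeError(
                "net %r already exists; pass overwrite=True to replace it" % name
            )
        self._nets[name] = net

    def get_net(self, name):
        return self._nets[name]
```

With such a guard, replacing a net becomes an explicit choice by the caller instead of a side effect of name reuse.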
ajtulloch since we are doing Predictors in mobile, this should be safe right?
azzolini - I assume this would be safe, but would love to get your approval.
akyrola - would this hurt xray?
Reviewed By: dzhulgakov
Differential Revision: D4897725
fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
Summary:
A workspace may add a suffix such as "_1" to the net name if other nets
have been added to the workspace with the same name. This is true even
if the previous nets have been removed or if the workspace has been
reset.
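A small sketch of how such suffixes can arise, assuming that used names are tracked separately from the workspace contents. `NetNameRegistry` and `is_same_net_name` are hypothetical helpers for illustration only.

```python
class NetNameRegistry:
    """Hypothetical tracker of every net name handed out so far."""

    def __init__(self):
        self._used = set()

    def unique_name(self, base):
        # Append _1, _2, ... until the name has never been used before.
        name = base
        index = 1
        while name in self._used:
            name = "%s_%d" % (base, index)
            index += 1
        self._used.add(name)
        return name


def is_same_net_name(requested, actual):
    """Accept the requested name with or without an appended suffix."""
    return actual == requested or actual.startswith(requested + "_")
```

Because names are never released in this sketch, a test that compares net names has to tolerate the suffix, which is what `is_same_net_name` does.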
Closes https://github.com/caffe2/caffe2/pull/213
Differential Revision: D4899877
Pulled By: Yangqing
fbshipit-source-id: b89b196df815dceff49a3ec76d7f658cdc4b0a38
Summary: A recently landed PR replaced the assertion against feeding float64 to FeedBlob on GPUs with a warning. As a result, the test that checked for that assertion started to fail. This diff removes the test.
Reviewed By: Yangqing
Differential Revision: D4363780
fbshipit-source-id: d9e222c309302243138d4ff3c223c711a4d2052d
Summary:
A recurring issue is that developers pass numpy arrays to FeedBlob but forget that a Python float is actually a double. CUDA ops in Caffe2 don't allow doubles.
Thus, I think we should reject incorrect types already in FeedBlob() when the device option is CUDA.
Added test.
Is this too strong?
Reviewed By: ajtulloch
Differential Revision: D4208153
fbshipit-source-id: 364b057a2a37b5d4b95de4e59faebdab724bb0ed
Summary:
This is #2 of a series of changes. It does the following:
(1) A few refactors of the MKL memory interface.
(2) An initial MKLContext to deal with MKL-specific computations.
(3) MKLMemory access in Python via the blob feeder/fetcher registration.
Reviewed By: dzhulgakov
Differential Revision: D4210123
fbshipit-source-id: adea1f1ffbd0b9ffdd55092676468c16bec08992