Summary:
As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove `Variable.data()` API
3. Add `Variable.variable_data()` that matches `tensor.data` in Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.
After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as its `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.
**Note that this PR is BC-breaking in the following use cases:**
**Use Case 1:**
Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type.
**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
# Change gradient to a sparse tensor
params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))
grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad) # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072
Differential Revision: D14075257
Pulled By: yf225
fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
Summary:
Appending `arch` to the generator name is not supported for VS starting from VS 2019.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20752
Differential Revision: D15436740
Pulled By: ezyang
fbshipit-source-id: 20057aae8f708d82619927bf2cb87dd1bc2df312
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20674
A few targets in caffe2/caffe2/distribute needs to be split too, otherwise won't compile. Also some clean ups and make select_gpu_type to gpu_library_selector
Differential Revision: D15406019
fbshipit-source-id: 6455ab885b248502b48d4c7565597e00fecfd547
Summary:
Symbols are given hidden visibility by default on Linux to emulate the behavior on Windows. This helps developers catch visibility issues in their streamlined Linux dev environment before being surprised, late in the process, by Windows errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20461
Reviewed By: kostmo
Differential Revision: D15410410
Pulled By: dzhulgakov
fbshipit-source-id: 1d684b5a9a80b692966a775c3f1c56b7c72ffc95
Summary:
First time this was merged it broke master and was reverted. This time I do not add ```set -u``` to the .circleci/scripts/setup* scripts. There's still a chance that ```set -u``` breaks the binary builds on master, but at least those can be fixed in parallel and don't completely eliminate signal from all merges.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20540
Differential Revision: D15373444
Pulled By: pjh5
fbshipit-source-id: 0203c20865827366ecd8fa07b2db74d255549ed1
Summary:
#19975 was separated by 2 PRs.
This one:
Introduce MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and to the `y = x.contiguous(memory_format=torch.channels_last)` functions.
At this moment both functions just operate with strides and doesn't store any tensor state.
(Original RFC #19092)
-----
Expands functionality of two tensor functions `.is_contiguous` and `.contiguous` (both python and c++ api).
Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.
1. `.contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- Using `torch.contiguous_format` will preserve existing `.contiguous()` behavior.
- Calling `x.contiguous(memory_format=torch.channels_last)` returns new tensor which maintain same semantical layout (NCHW), but have different memory allocation pattern.
`x.contiguous(memory_format=torch.channels_last)` expects input tensor to be 3d, 4d or 5d; and fails otherwise.
2. `.is_contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.
- `x.is_contiguous(memory_format=torch.contiguous_format)` preserves same functionality as `x.is_contiguous()` and remains unchanged.
- `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) input tensor is contiguous in memory AND B) allocated in the memory in NWHC (or similar for 3d,5d) format.
Note: By the end of the phase one `x.is_contiguous(memory_format=torch.channels_last)` will calculate state of the Tensor on every call. This functionality going to be updated later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455
Differential Revision: D15341577
Pulled By: VitalyFedyunin
fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19816
We need this for quantization for bias
add third argument of ScalarType to `quantize_linear`
Differential Revision: D15094174
fbshipit-source-id: f19ec8f4716cf5fe0aa21b38d45af6d27c9ab377
Summary:
The current variance kernels compute mean at the same time. Many times we want both statistics together, so it seems reasonable to have a kwarg/function that allows us to get both values without launching an extra kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18731
Differential Revision: D14726082
Pulled By: ifedan
fbshipit-source-id: 473cba0227b69eb2240dca5e61a8f4366df0e029
Summary:
We don't need to overlay vc env when not using ninja. CMake will deal with it automatically. Overlaying is a no-op when the env is the same with the generator specified but will generate the error "Cannot find CMAKE_CXX_COMPILER" when they are different.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20417
Differential Revision: D15317081
Pulled By: ezyang
fbshipit-source-id: 5d9100321ecd593e810c31158f22c67d3e34973b
Summary:
This PR is an intermediate step toward the ultimate goal of eliminating "caffe2" in favor of "torch". This PR moves all of the files that had constituted "libtorch.so" into the "libcaffe2.so" library, and wraps "libcaffe2.so" with a shell library named "libtorch.so". This means that, for now, `caffe2/CMakeLists.txt` becomes a lot bigger, and `torch/CMakeLists.txt` becomes smaller.
The torch Python bindings (`torch_python.so`) still remain in `torch/CMakeLists.txt`.
The follow-up to this PR will rename references to `caffe2` to `torch`, and flatten the shell into one library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17783
Differential Revision: D15284178
Pulled By: kostmo
fbshipit-source-id: a08387d735ae20652527ced4e69fd75b8ff88b05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20234
The differences with the existing function _dist_broadcast_coalesced
is that this one works for both CPU and CUDA tensors and that it has a
maximum number of in flight operations.
This should be the final change needed to have only a single version
of DistributedDataParallel that both supports CPU and CUDA models, or
even a mix of both.
See #17757 for more information.
Reviewed By: mrshenli
Differential Revision: D15228099
fbshipit-source-id: a2113ba6b09b68cb5328f49f4c1960031eb43c93
Summary:
Added deprecation warnings for the masked methods and enabled them for a bool tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19140
Differential Revision: D14888021
Pulled By: izdeby
fbshipit-source-id: 0e42daf8f3732ca29f36d10485402bfc502716ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19688
Minor changes to hipify script to take extra folders.
Reviewed By: bddppq
Differential Revision: D15068427
fbshipit-source-id: e2e792c8227cbd0e15fd2564f87d740a62c477da
Summary:
* adds TORCH_API and AT_CUDA_API in places
* refactor code generation Python logic to separate
caffe2/torch outputs
* fix hip and asan
* remove profiler_cuda from hip
* fix gcc warnings for enums
* Fix PythonOp::Kind
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554
Differential Revision: D15082727
Pulled By: kostmo
fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19607
Explicit is better than implicit - it's pretty hard to debug where particular file is if it's not greppable.
As a follow up step - we should look whether we can just include build_variables.py in CMake directly to share setups of two build systems
Reviewed By: ezyang
Differential Revision: D15023348
fbshipit-source-id: 600ef2d1871bc28530c6a02681b284f7499904df
Summary:
We would previously have statements like
```
set_history(flatten_tensor_args( result ), grad_fn);
```
Internally, {set,rebase}_history would check grad_fn and short circuit if it is nullptr. However, this means that we are executing the expression `flatten_tensor_args( result )` and immediately throwing away the results. This was causing unnecessary allocations + overhead.
My JIT overhead benchmark script (with custom benchmark method):
```
import torch, time
torch.jit.script
def add(x, y):
return x + y
a = torch.rand([])
b = torch.rand([])
niter = 1000000
with torch.no_grad():
s = time.time()
add.__getattr__('forward').benchmark(niter, a, b)
e = time.time() - s
print('overhead per call (us)', e / niter * 1e6)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19623
Differential Revision: D15053399
Pulled By: jamesr66a
fbshipit-source-id: 8777e1a2b5c5a5bbd3a035b7247c8154c5fc4aa6
Summary:
in functional interfaces we do boolean dispatch, but all to max_pool\*d_with_indices. This change it to emit max_pool\*d op instead when it's not necessary to expose with_indices ops to different backends (for jit).
It also bind max_pool\*d to the torch namespace, which is the same behavior with avg_pool\*d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19449
Differential Revision: D15016839
Pulled By: wanchaol
fbshipit-source-id: f77cd5f0bcd6d8534c1296d89b061023a8288a2c
Summary:
Changelog:
- Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498
Differential Revision: D15029901
Pulled By: ezyang
fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a