Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56976
Band-aid fix for #54282
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D28020401
Pulled By: ezyang
fbshipit-source-id: 50546d5275eade408d65e9c883999fb3b65ff55a
Summary:
Tentative fix for https://github.com/pytorch/pytorch/issues/55027.
Wraps the cub import in its own namespace so that static variables used by cub and thrust don't conflict if they end up in different libraries when torch is built with BUILD_SPLIT_CUDA. cub variables end up in their own namespace, while thrust variables stay unwrapped, so they don't clash.
This also allows extensions to use cub without wrapping it (thrust will still be problematic). The way to let extensions use thrust, too, is to stop using thrust in pytorch completely.
Since importing cub and importing thrust can no longer coexist in the same file, I had to move nonzero to its own file and remove its reliance on thrust functions. Nonzero now uses cub only.
Also, we cannot selectively import just some of the cub headers; we are forced to import `cub/cub.cuh`, which is not great.
Caffe2 ops using cub are not touched (there are too many), so mixing caffe2 and torch ops can still result in the same bug. We are moving towards disabling c2 ops, so I think this is fine.
Still, even with this change the compiler (correctly) warns about the redefinition of `CUB_NS_PREFIX`, because including `ATen/ATen.h` transitively includes `thrust/complex.h`, which in turn includes the original (empty) definition of `CUB_NS_PREFIX`. We can probably just ignore this warning. Here's an example:
```
In file included from /data/users/ngimel/pytorch/aten/src/ATen/native/cuda/Nonzero.cu:9:
/data/users/ngimel/pytorch/aten/src/ATen/cuda/CubUtils.cuh:4: warning: "CUB_NS_PREFIX" redefined
#define CUB_NS_PREFIX namespace at{ namespace native{
In file included from /home/ngimel/local/cuda/include/thrust/system/cuda/config.h:76,
from /home/ngimel/local/cuda/include/thrust/system/cuda/detail/execution_policy.h:33,
from /home/ngimel/local/cuda/include/thrust/iterator/detail/device_system_tag.h:23,
from /home/ngimel/local/cuda/include/thrust/iterator/iterator_traits.h:111,
from /home/ngimel/local/cuda/include/thrust/detail/type_traits/pointer_traits.h:23,
from /home/ngimel/local/cuda/include/thrust/type_traits/is_contiguous_iterator.h:27,
from /home/ngimel/local/cuda/include/thrust/type_traits/is_trivially_relocatable.h:19,
from /home/ngimel/local/cuda/include/thrust/detail/complex/complex.inl:20,
from /home/ngimel/local/cuda/include/thrust/complex.h:1031,
from /data/users/ngimel/pytorch/c10/util/complex.h:9,
from /data/users/ngimel/pytorch/c10/core/ScalarType.h:4,
from /data/users/ngimel/pytorch/c10/core/Scalar.h:10,
from /data/users/ngimel/pytorch/build/aten/src/ATen/core/TensorBody.h:8,
from /data/users/ngimel/pytorch/aten/src/ATen/Tensor.h:3,
from /data/users/ngimel/pytorch/aten/src/ATen/Context.h:4,
from /data/users/ngimel/pytorch/aten/src/ATen/ATen.h:9,
from /data/users/ngimel/pytorch/aten/src/ATen/native/cuda/Nonzero.cu:1:
/home/ngimel/local/cuda/include/cub/util_namespace.cuh:43: note: this is the location of the previous definition
#define CUB_NS_PREFIX
```
We will need a lint rule to prevent people from including `cub/cub.cuh`, because that would make https://github.com/pytorch/pytorch/issues/55027 reappear for some sequence of operations (and would lead to errors with cub code in extensions).
Also, for this to work reliably we'll need to make sure that everything calling cub ends up in only one of libtorch_cuda_cu or libtorch_cuda_cpp; otherwise even the namespace won't help (the same symbols would still be present in both libraries).
Upd: libtorch_cuda_cpp and libtorch_cuda_cu still contain the same symbols, which means there exists a sequence of operations that will make the cache bug reappear, so this is not yet a solution; we need to adjust the file lists for BUILD_SPLIT_CUDA:
```
(pytorch) [ngimel@ ~/local/pytorch/build/lib] nm libtorch_cuda_cu.so | grep PerDeviceAttributeCache | c++filt
000000000c6bf808 u guard variable for at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
000000000c600830 u guard variable for cub::GetPerDeviceAttributeCache<cub::PtxVersionCacheTag>()::cache
00000000018625e0 t at::native::cub::PerDeviceAttributeCache::DevicePayload at::native::cub::PerDeviceAttributeCache::operator()<at::native::cub::PtxVersion(int&)::{lambda(int&)#1}>(at::native::cub::PtxVersion(int&)::{lambda(int&)#1}&&, int)
00000000009ce630 t cub::PerDeviceAttributeCache::DevicePayload cub::PerDeviceAttributeCache::operator()<cub::PtxVersion(int&)::{lambda(int&)#1}>(cub::PtxVersion(int&)::{lambda(int&)#1}&&, int)
000000000c6bf820 u at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
000000000c600840 u cub::GetPerDeviceAttributeCache<cub::PtxVersionCacheTag>()::cache
(pytorch) [ngimel@ ~/local/pytorch/build/lib] nm libtorch_cuda_cpp.so | grep PerDeviceAttributeCache | c++filt
0000000000ad2d98 u guard variable for at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
0000000000ad2da0 u at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
```
Upd2:
Moved TensorFactories.cu to the torch_cuda_cu sources (see the change to caffe2/CMakeLists.txt), so now cub-related symbols are only in libtorch_cuda_cu. We'd need a test for that; any suggestions on how best to test it?
cc zasdfgbnm malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55292
Reviewed By: anjali411
Differential Revision: D27576442
Pulled By: ngimel
fbshipit-source-id: 1ef29503a342bb214794d34a42a47052092a66c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53682
With this, under the meta device, 101 tests passed and 16953 skipped.
It ain't much, but it's a start.
Some various bits and bobs:
- NotImplementedError suppression at the test level is implemented
the same way as the CUDA memory leak check, i.e., by wrapping
test methods and monkeypatching them back in (see the sketch after this list).
- I had to reimplement assertRaises/assertRaisesRegex from scratch to
ignore NotImplementedError when _ignore_not_implemented_error is True.
The implementation relies on a small amount of private API that hasn't
changed since 2010.
- expectedAlertNondeterministic doesn't really work, so I skipped all of
those tests; there's probably a better way to do it.
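A minimal sketch of that wrapping trick, using illustrative names rather than the actual helpers in torch.testing._internal:
```
import functools
import unittest

def wrap_with_not_implemented_skip(fn):
    # Turn NotImplementedError (e.g., a missing meta kernel) into a skip.
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        try:
            return fn(self, *args, **kwargs)
        except NotImplementedError as e:
            raise unittest.SkipTest(f"not implemented: {e}") from e
    return wrapper

class MetaTests(unittest.TestCase):
    def test_add(self):
        raise NotImplementedError("aten::add is not implemented for meta")

# Monkeypatch the wrapped method back in, the same trick the CUDA
# memory leak check uses.
MetaTests.test_add = wrap_with_not_implemented_skip(MetaTests.test_add)

if __name__ == "__main__":
    unittest.main()
```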
I tested this using `pytest --disable-warnings --tb=native -k meta --sw
test/*.py` and a pile of extra patches to make collection actually work
(lol).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D26955539
Pulled By: ezyang
fbshipit-source-id: ac21c8734562497fdcca3b614a28010bc4c03d74
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52044 (`stack` dispatches to `cat`)
Given how the dispatcher works, this case currently arises only in the CUDA kernel (the CPU kernel is chosen only when all inputs and `out` are on the CPU), which is why the check is added only on the CUDA side.
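As an illustrative sketch of the mismatch being rejected (assuming the new check surfaces a RuntimeError for a CPU `out` tensor with CUDA inputs; the exact message is not from this PR):
```
import torch

if torch.cuda.is_available():
    xs = [torch.ones(2, device="cuda"), torch.zeros(2, device="cuda")]
    out = torch.empty(4)  # CPU `out` tensor for CUDA inputs
    try:
        # stack(...) lowers to cat(...), so it hits the same code path
        torch.cat(xs, out=out)
    except RuntimeError as e:
        print(e)  # device mismatch error instead of a crash or silent corruption
```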
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53004
Reviewed By: albanD
Differential Revision: D27003956
Pulled By: mruberry
fbshipit-source-id: 818ea0f76153f4fa281740f30705e5ef018413f6
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
The torch.logspace documentation doesn't explain how integers are handled.
This adds some clarification and tests for when dtype is integral.
The CUDA implementation is also updated to be consistent with the CPU implementation.
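For example, a small sketch (assuming values are computed in floating point and then cast to the integral dtype, which is the behavior being documented; the printed values are illustrative):
```
import torch

# With an integral dtype, values are still computed as floating-point
# powers and then cast, so rounding of the intermediates matters.
print(torch.logspace(0, 3, steps=4, dtype=torch.long))
# e.g. tensor([   1,   10,  100, 1000])
```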
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47647
Reviewed By: gchanan
Differential Revision: D25843351
Pulled By: walterddr
fbshipit-source-id: 45237574d04c56992c18766667ff1ed71be77ac3
Summary:
Creates multiple new test suites so that test_torch.py contains fewer tests, consistent with previously created suites like test_unary_ufuncs.py and test_linalg.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356
Reviewed By: ngimel
Differential Revision: D25202268
Pulled By: mruberry
fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47225
Summary
-------
This PR implements Tensor.new_empty_strided. Many of our torch.* factory
functions have a corresponding new_* method (e.g., torch.empty and
Tensor.new_empty), but there is no corresponding method for
torch.empty_strided. This PR adds one.
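For reference, a quick usage sketch of the new method:
```
import torch

base = torch.randn(3, 4, dtype=torch.float64)
# Same dtype and device as `base`, with the requested size and stride;
# the contents are uninitialized, just like torch.empty_strided.
out = base.new_empty_strided((3, 4), (4, 1))
print(out.size(), out.stride(), out.dtype)
```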
Motivation
----------
The real motivation behind this is for vmap to be able to work through
CopySlices. CopySlices shows up a lot in double backwards because a lot
of view functions have backward formulas that perform view+inplace.
e0fd590ec9/torch/csrc/autograd/functions/tensor.cpp (L78-L106)
To support vmap through CopySlices, the approach in this stack is to:
- add `Tensor.new_empty_strided` and replace `empty_strided` in
CopySlices with that so that we can propagate batch information.
- Make some slight modifications to AsStridedBackward (and add
as_strided batching rule)
Please let me know if it would be better if I squashed everything related to
supporting vmap over CopySlices together into a single big PR.
Test Plan
---------
- New tests.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D24741688
Pulled By: zou3519
fbshipit-source-id: b688047d2eb3f92998896373b2e9d87caf2c4c39
Summary:
**BC-breaking Note:**
This PR disallows passing a generator of a different device than the tensor being created during `randperm` execution. For example, the following code, which used to work, no longer works.
```
> torch.randperm(3, device='cuda', generator=torch.Generator(device='cpu'))
tensor([0, 1, 2], device='cuda:0')
```
It now errors:
```
> torch.randperm(3, device='cuda', generator=torch.Generator(device='cpu'))
RuntimeError: Expected a 'cuda:0' generator device but found 'cpu'
```
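Passing a generator whose device matches the tensor's device still works, e.g.:
```
import torch

if torch.cuda.is_available():
    g = torch.Generator(device="cuda")
    print(torch.randperm(3, device="cuda", generator=g))  # OK: devices match
```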
**PR Summary:**
Fixes https://github.com/pytorch/pytorch/issues/44714
Also added and ran tests to verify this behavior.
Disclaimer: More work needs to be done with regard to small CUDA tensors when a generator is specified; see the issue thread for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47022
Reviewed By: samestep
Differential Revision: D24608237
Pulled By: janeyx99
fbshipit-source-id: b83c47219c7816d93f938f7ce86dc8857513961b
Summary:
**BC-breaking note**
This change is BC-breaking for C++ callers of linspace and logspace if they were providing a steps argument that could not be converted to an optional.
**PR note**
This PR deprecates calling linspace and logspace without setting steps explicitly by:
- updating the documentation to warn that not setting steps is deprecated
- warning (once) when linspace and logspace are called without steps being specified
A test for this behavior is added to test_tensor_creation_ops. The warning only appears once per process, however, so the test would pass even if no warning were thrown. Ideally there would be a mechanism to force all warnings, including those from TORCH_WARN_ONCE, to trigger.
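Concretely (assuming the long-standing default of steps=100, which is the implicit behavior this deprecation targets):
```
import torch

torch.linspace(0, 1)             # deprecated: warns once, falls back to steps=100
torch.linspace(0, 1, steps=100)  # explicit steps: no warning
```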
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43860
Reviewed By: izdeby
Differential Revision: D23498980
Pulled By: mruberry
fbshipit-source-id: c48d7a58896714d184cb6ff2a48e964243fafc90
Summary:
As part of our continued refactoring of test_torch.py, this takes tests for tensor creation ops like torch.eye, torch.randint, and torch.ones_like and puts them in test_tensor_creation_ops.py. There are three test classes in the new test suite: TestTensorCreation, TestRandomTensorCreation, and TestLikeTensorCreation. TestViewOps and tests for constructing tensors from NumPy arrays have been left in test_torch.py; these might be refactored separately into test_view_ops.py and test_numpy_interop.py in the future.
Most of the tests ported from test_torch.py were left as is or received a signature change to make them nominally "device generic." Future work will need to review test coverage and update the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43104
Reviewed By: ngimel
Differential Revision: D23280358
Pulled By: mruberry
fbshipit-source-id: 469325dd1a734509dd478cc7fe0413e276ffb192