Fixes #162812
Adds support for passing a device directly into `get_rng_state`, as well as passing a string or int (which is then wrapped into a `torch.device` internally, as in `torch.cuda.get_rng_state`).
I wasn't exactly sure where tests for this should go; please let me know. I used this script for testing:
```python
import torch
# note: when running with CUDA GPU, first three tests will give the same result,
# as will the last two
# test with no device specified
print(torch.get_rng_state())
# test with CPU
cpu_device = torch.device("cpu")
print(torch.get_rng_state(cpu_device))
# test with direct name
print(torch.get_rng_state("cpu"))
# test with CUDA
cuda_device = torch.device("cuda:0")
print(torch.get_rng_state(cuda_device))
# test with integer
print(torch.get_rng_state(0))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163034
Approved by: https://github.com/ezyang, https://github.com/cyyever
Refactor the TorchScript-based exporter logic into a single (private) location for better code management. The original public module and method APIs are preserved.
- Updated module paths in `torch/csrc/autograd/python_function.cpp` accordingly
- Removed `check_onnx_broadcast` from `torch/autograd/_functions/utils.py` because it is private and unused
@albanD / @soulitzer could you review changes in `torch/csrc/autograd/python_function.cpp` and
`torch/autograd/_functions/utils.py`? Thanks!
## BC Breaking
- **Deprecated members in `torch.onnx.verification` are removed**
Differential Revision: [D81236421](https://our.internmc.facebook.com/intern/diff/D81236421)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161323
Approved by: https://github.com/titaiwangms, https://github.com/angelayi
This PR adds the `@deprecate` decorator for internal functions that we are prepping for deprecation. Add it on top of an internal function to emit a deprecation warning and keep backward compatibility with the non-internal version of the function.
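For illustration, here is a minimal sketch of what such a decorator can look like (a simplified stand-in, not the exact implementation from this PR):
```python
import functools
import warnings

def deprecate(func):
    # Sketch of a deprecation decorator: wrap the internal function and
    # emit a warning on every call, while delegating to the original so
    # existing callers keep working.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} is DEPRECATED, please consider using "
            "an alternative API(s).",
            UserWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)
    return wrapper

@deprecate
def deprecated_api(x, y):
    return x + y

deprecated_api(1, 2)  # emits a UserWarning like the test output below
```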
Tested with `python test/test_utils.py TestDeprecate.test_deprecated`.
Furthermore, testing with a modified version of the test in the PR gives something like this, which is what we want:
```
/home/sahanp/repos/pytorch/test/test_utils.py:1239: UserWarning: deprecated_api is DEPRECATED, please consider using an alternative API(s).
deprecated_api(1, 2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155127
Approved by: https://github.com/albanD
Co-authored-by: albanD <desmaison.alban@gmail.com>
Not requiring all functions to have types allows a lot of `Any` types to slip in, which poison the types and make mypy unable to properly typecheck the code. I want to flip the default so that new files are required to have fully typed defs, and we can keep a burndown list of files that still fail to require full types.
The preceding stack of PRs (cut up simply to keep the number of file changes per PR reasonable) adds `# mypy: allow-untyped-defs` to any file which didn't immediately pass mypy with the flag flipped. Due to changing files and merge conflicts it will probably be necessary to make several passes before landing this final PR, which turns the option on.
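For reference, the per-file escape hatch is a module-level comment; a minimal sketch of an opted-out file:
```python
# mypy: allow-untyped-defs
# The directive above opts this file out of `disallow_untyped_defs`,
# so unannotated legacy functions still pass the type check.

def legacy_helper(x, y):  # no annotations: tolerated only via the directive
    return x + y

def new_helper(x: int, y: int) -> int:  # new code should still be fully typed
    return x + y
```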
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127836
Approved by: https://github.com/oulgen, https://github.com/Skylion007
Previously, we were checking `len(device_types)` where `device_types` is a `list`. This meant that if there were multiple inputs, we would see something like `device_types = ["cuda", "cuda"]` and a false positive warning. We should check `len(set(device_types))`.
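A minimal illustration of the difference:
```python
device_types = ["cuda", "cuda"]  # two inputs that share one device type
len(device_types)       # 2 -> looks like "multiple device types" (false positive)
len(set(device_types))  # 1 -> correctly counts distinct device types
```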
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122726
Approved by: https://github.com/soulitzer
`get_device_states` doesn't recursively look into nested lists/dicts to find tensors. As a result, activation checkpointing for such inputs silently produces incorrect results: `get_device_states` returns an empty result, so no RNG state is saved at https://github.com/pytorch/pytorch/blob/main/torch/utils/checkpoint.py#L188 because `fwd_device_states` is empty.
Fixed this by using `tree_map` for both `get_device_states` and `_infer_device_type`. Also added appropriate unit tests.
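A minimal sketch of the approach, using `torch.utils._pytree.tree_map` to visit tensors in arbitrarily nested inputs (simplified relative to the actual implementation):
```python
import torch
from torch.utils._pytree import tree_map

def collect_device_types(*args):
    """Walk nested lists/dicts/tuples and record the device type of every
    tensor leaf, instead of only inspecting top-level arguments."""
    device_types = []

    def visit(leaf):
        if isinstance(leaf, torch.Tensor):
            device_types.append(leaf.device.type)
        return leaf

    tree_map(visit, args)
    return device_types

# Nested inputs are now seen; previously only top-level tensors were found.
x = torch.randn(2)
print(collect_device_types({"a": [x, (x,)]}))  # ['cpu', 'cpu']
```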
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121462
Approved by: https://github.com/soulitzer
Fixes #115331.
This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, so multiple changes were necessary:
- Changed `DeviceIndex` to `int16_t` and updated consumers that assumed it to be an `int8_t`.
- Updated bounds checking for `torch.device()` in the Python frontend. Previously, we allowed funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted checks to only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1`.
- Updated the `ArgumentInfo` struct, as it hardcodes the device index as an 8-bit field [^1]. This might be a breaking change; I am not sure if users rely on this.
- Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS`
[^1]: This field was unsigned, so I guess this has also been undefined behavior the whole time? Our default device index is -1, so it always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1.
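To illustrate the frontend change, a sketch of the new behavior (the exact exception type for out-of-range indices is my assumption):
```python
import torch

# Before this PR, DeviceIndex was an int8_t, so a large index silently
# wrapped around: torch.device("cpu", 200).index == -56 (undefined behavior).
# With DeviceIndex as int16_t and the new bounds checks, indices up to
# c10::Device::MAX_NUM_DEVICES - 1 (511) round-trip intact:
assert torch.device("cpu", 200).index == 200

# ...and out-of-range indices are rejected instead of wrapping.
try:
    torch.device("cpu", 100_000)
except Exception as e:  # assumption: exact exception type may vary
    print(f"rejected: {e}")
```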
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/huydhn
Fixes [#117504](https://github.com/pytorch/pytorch/issues/117504)
Re-engineering the hipify trie:
(1) Re-engineered the trie data structure.
(2) Added more documentation and comments for easier understanding.
(3) Created a set of unit tests (class `TestHipifyTrie`) to test the trie data structure and its APIs.
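For context, a minimal sketch of the kind of prefix trie the hipify tool uses to match CUDA identifiers (illustrative only; these names are not the actual `hipify_python.py` API):
```python
class Trie:
    """Minimal prefix trie: insert words, then test exact membership.
    hipify uses a structure like this to build matching patterns for
    CUDA -> HIP identifier rewriting."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["**"] = True  # end-of-word marker

    def search(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "**" in node

trie = Trie()
trie.add("cudaMalloc")
trie.add("cudaMemcpy")
print(trie.search("cudaMalloc"))  # True
print(trie.search("cuda"))        # False (only a prefix)
```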
Test:
```
root@xxx:/development/pytorch# pytest test/test_utils.py -k TestHipifyTrie
==================================================================================================== test session starts ====================================================================================================
platform linux -- Python 3.9.18, pytest-7.3.2, pluggy-1.3.0
rootdir: /dockerx/development/pytorch
configfile: pytest.ini
plugins: flakefinder-1.1.0, rerunfailures-13.0, xdist-3.3.1, xdoctest-1.1.0, cpp-2.3.0, shard-0.1.2, hypothesis-5.35.1
collected 11453 items / 11445 deselected / 8 selected
Running 8 items in this shard
test/test_utils.py ........ [100%]
============================================================================================ 8 passed, 11445 deselected in 3.84s ============================================================================================
root@xxx:/development/pytorch#
```
Also diffed the contents modified and generated by this tool under the original and the new versions of the `hipify_python.py` script, and verified there is no difference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118433
Approved by: https://github.com/malfet, https://github.com/jeffdaily
Since MYPYNOFOLLOW is about to turn on import following, there's no reason to keep test_utils.py in the MYPYNOFOLLOW config. Moreover, I'm not sure it still takes 10 minutes to typecheck this file; adding it to the MYPY config takes `lintrunner --take MYPY --all-files` from 53s to 57s on my machine, which is substantial but not horrible. I guess we'll see how it fares on CI.
(Note that we cannot simply merge MYPY and MYPYNOFOLLOW because the latter config turns on `disallow_any_generics` and so is in that sense stricter than the MYPY config.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113745
Approved by: https://github.com/clee2000
Caused by #109866.
The test registers a new device module; the above PR checks for XPU, sees that it got registered, and uses it, but it's a dummy module.
This causes any test after it to fail, so I "clean up" the registered module.
Another possible solution would be to run this test last lol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110254
Approved by: https://github.com/huydhn
This adds some utilities for conveniently working with fast combined CapturedTraceback from Python. The main goal of these utilities is to make it easier for people to use CapturedTraceback as a drop-in replacement for `traceback.extract_stack`, which is 20x slower than CapturedTraceback.
I port symbolic shapes to use the new CapturedTraceback code, to validate that the APIs work and are useful.
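A sketch of the intended drop-in usage (I'm hedging on the exact method names; see `torch/utils/_traceback.py` in this PR for the real API):
```python
import traceback
from torch.utils._traceback import CapturedTraceback

# stdlib: eagerly symbolizes every frame at capture time (slow)
slow_stack = traceback.extract_stack()

# CapturedTraceback: cheap to capture; symbolization is deferred until
# the traceback is actually formatted
tb = CapturedTraceback.extract()
print("".join(tb.format()))
```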
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107358
Approved by: https://github.com/zdevito, https://github.com/albanD
ghstack dependencies: #107438
- Add `get_printoptions` and a `printoptions` context manager
- Improve `edgeitems` handling when it is zero
- Add `render_call`, which can be used to conveniently print command line arguments of a function call while suppressing actual tensor data
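A sketch of how the first two fit together (the `torch._tensor_str` module path is my assumption; the context manager restores the previous options on exit):
```python
import torch
from torch._tensor_str import get_printoptions, printoptions

print(get_printoptions())  # current global print options as a dict

# Temporarily change options; previous values are restored on exit:
with printoptions(threshold=5, edgeitems=1):
    print(torch.arange(100))  # summarized, e.g. tensor([ 0, ..., 99])
```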
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102623
Approved by: https://github.com/albanD
Fixes #ISSUE_NUMBER
1. Optimize the AMP function names in the custom device module: use `torch.foo.set_autocast_enable` instead of `torch.foo.set_autocast_foo_enable`.
2. In AMP with a custom device, use `custom_device_mod.set_autocast_enable` instead of `getattr(custom_device_mod, "set_autocast_enable")`, because we have already checked that `custom_device_mod` has the attribute `set_autocast_enable`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98052
Approved by: https://github.com/bdhirsh
Fixes #ISSUE_NUMBER
1. Add AMP support for custom backends.
2. Optimize the file `backend_registration.py` and rename it to `custom_backend_registration.py`. Then we can register other functions for custom backends there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96188
Approved by: https://github.com/bdhirsh
Changes:
- bc-breaking change: The main difference between this and the old non-reentrant impl that it replaces is that we clear recomputed tensors on backward immediately upon unpack, even if `retain_graph=True`. This has the following additional implications:
  - Accessing `_saved_tensors` multiple times will silently recompute forward multiple times.
  - Accessing `ctx.saved_tensors` twice in the same backward will now raise an error.
- To avoid dealing with the potential consequences, early stopping has been hidden behind a global flag that is False by default and can be enabled via a context manager (see the sketch after this list). We can remove this in a follow-up. As a result, some features of nesting do not work by default.
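A sketch of opting in via the context manager (per this PR, early stopping is off by default; `set_checkpoint_early_stop` is the name used here):
```python
import torch
from torch.utils.checkpoint import checkpoint, set_checkpoint_early_stop

def fn(x):
    return x.sin().cos()

x = torch.randn(4, requires_grad=True)

# Early stopping is off by default; opt in for the enclosed region so
# recomputation halts once all saved tensors have been reproduced:
with set_checkpoint_early_stop(True):
    out = checkpoint(fn, x, use_reentrant=False)
out.sum().backward()
```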
Before land:
- import to check for more bc-breakingness
- implement any workarounds for the bc-breakingness, if we decide on any
- update docs to reflect new lifetime of recomputed variables
- update docs to mention the early stop feature
Follow ups:
- enable early-stopping by default
- update docs/tutorial to feature nested use cases
Related docs:
- code comment: https://github.com/pytorch/pytorch/pull/90105/files#diff-9dcd955620b52ce128e18e3567be88edbb238810460d1288a86fabc20e483b30R448
- design doc: https://docs.google.com/document/d/1UDLhTNv6_kvuDTRlsjfj9WdqtNaQNr8ahrvdBIB6914/edit#
- retains_grad <> checkpoint https://docs.google.com/document/d/1maiGmuFUxysQL0AdYUU88kngAaXh_L0XpDcLDh_5Ors/edit
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90105
Approved by: https://github.com/albanD