Changes by apply order:
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.
`.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)
`.parent.parent.parent.parent` -> `.parents[3]`
5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~
~`.parents[3]` -> `.parents[4 - 1]`~
6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
Changes by apply order:
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.
`.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)
`.parent.parent.parent.parent` -> `.parents[3]`
5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~
~`.parents[3]` -> `.parents[4 - 1]`~
6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
### Motivation
Intel Gaudi accelerator (device name hpu) is seen to have good pass rate with the pytorch framework UTs , however being an out-of-tree device, we face challenges in adapting the device to natively run the existing pytorch UTs under pytorch/test. The UTs however is a good indicator of the device stack health and as such we run them regularly with adaptations.
Although we can add Gaudi/HPU device to generate the device specific tests using the TORCH_TEST_DEVICES environment variable, we miss out on lot of features such as executing for specific dtypes, skipping and overriding opInfo. With significant changes introduced every Pytorch release maintaining these adaptations become difficult and time consuming.
Hence with this PR we introduce Gaudi device in common_device_type framework, so that the tests are instantiated for Gaudi when the library is loaded.
The eventual goal is to introduce Gaudi out-of-tree support as equivalent to in-tree devices
### Changes
Add HPUTestBase of type DeviceTypeTestBase specifying appropriate attributes for Gaudi/HPU.
Include code to check if intel Gaudi Software library is loaded and if so, add the device to the list of devices considered for instantiation of device type tests
### Additional Context
please refer the following RFC : https://github.com/pytorch/rfcs/pull/63/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126970
Approved by: https://github.com/albanD
see the issue https://github.com/pytorch/pytorch/issues/127636 to for details about the issue, TLDR is that
when inlining is enabled, we create a fake tensor while tracing in dynamo and try to perform aten.add.Tensor between
two tensor of different types, with out inlining we do not hit that operation during tracing.
```
Failed running call_function <built-in function add>(*(FakeTensor(..., size=(20, 20), grad_fn=<AddBackward0>), FakeTensor(..., device='cuda:0', size=(20, 20))), **{}):
Unhandled FakeTensor Device Propagation for aten.add.Tensor, found two different devices cpu, cuda:0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128023
Approved by: https://github.com/anijain2305
ghstack dependencies: #127487, #127553
This PR excises opcheck's dependency on
torch.testing._internal.common_utils, (which comes with dependencies on
expecttest and hypothesis). We do this by moving what we need to
torch.testing._utils and adding a test for it.
Fixes#126870, #126871
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127292
Approved by: https://github.com/williamwen42
ghstack dependencies: #127291
yolo
iirc the a10g/sm86 runners have ~21 GB of space, so we can increase parallelism on it to 3. This results in about 6GB CUDA mem per proc. The previous calculation + 2 procs resulted in about 8 GB
Also fixes the the calc for per proc memory, assuming that CUDA context + anything else take about a little under 1GB of space (previous calc was .11 on about 7.5 - 8 GB <= .9GB)
Times on main are about 1.9-2.5hr per shard
This commit is around 1.6-2hr per shard
Risks: increase in flaky tests due to OOM
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125598
Approved by: https://github.com/huydhn
Fixes#117850
This PR:
* Adds the class name in the repro command
* Fixes the path to the test file for python 3.8 jobs (apparently `inspect.getfile(class_type)` returns a relative path in this older python version)
Before (in python 3.8):
```sh
PYTORCH_TEST_WITH_DYNAMO=1 python test_autograd.py -k test_foo
```
After:
```sh
PYTORCH_TEST_WITH_DYNAMO=1 python test/test_autograd.py -k TestAutograd.test_foo
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125498
Approved by: https://github.com/huydhn, https://github.com/janeyx99
Following the example of PyTorch supporting a preferred Linalg library (cusolver or magma), this PR introduces a preferred blas library selector of either cublas or cublaslt for CUDA and hipblas or hipblaslt for ROCm via normal hipification of sources.
The default blas implementation remains cublas or hipblas. cublaslt or hipblaslt can be enabled using environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (or TORCH_BLAS_PREFER_HIPBLASLT=1 as an alias) or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` or as an alias `backend="hipblaslt"`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122106
Approved by: https://github.com/lezcano
Update ruff to 0.4.1 .
This version fixes a lot false negatives/false positives, is 20-40% faster, and has various other bug fixes.
Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0
| Repository | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7 | 251.8 | 351.1 | 274.9 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
Add serial marker for individual tests so the test file can be removed from the ci serial list
Run serial marked tests first in serial
Run all other tests afterwards in parallel
Slowly reduce list and mark individual tests as serial instead
Hope # of serial tests is small so sharding evenness doesn't get too messed up
Hopefully can do 3 procs for sm86 and cpu?
serial no longer looks like a real word to me
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124085
Approved by: https://github.com/seemethere, https://github.com/malfet
Check that the `classname` attribute actually exists.
#122017
I expect this route to happen very rarely
At a certain point, we should just remove this parsing altogether since everything uses pytest now...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122103
Approved by: https://github.com/huydhn
`Tensor.__repr__` calls functions which can perform logging which ends up logging `self` (with `__repr__`) causing an infinite loop. Instead of logging all the args in FakeTensor.dispatch log the actual parameters (and use `id` to log the tensor itself).
The change to torch/testing/_internal/common_utils.py came up during testing - in some ways of running the test parts was `('test', 'test_testing.py')` and so `i` was 0 and we were doing a join on `()` which was causing an error.
Repro:
```
import torch
from torch.testing import make_tensor
from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode
t = torch.sparse_coo_tensor(((0, 1), (1, 0)), (1, 2), size=(2, 2))
t2 = FakeTensor.from_tensor(t, FakeTensorMode())
print(repr(t2))
```
and run with `TORCH_LOGS=+all`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120206
Approved by: https://github.com/yanboliang, https://github.com/pearu
It's a bit annoying to have to read through the test name in verbose mode just to see what the test's sentinel file is actually called when encountering an unexpected success. Now that we have sentinel files, we can directly list the file path from root in the error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120766
Approved by: https://github.com/Skylion007
Added a `torch.Tensor` method that defines how to transform `other`, a value in the state dictionary, to be loaded into `self`, a param/buffer in an `nn.Module` before swapping via `torch.utils.swap_tensors`
* `param.module_load(sd[key])`
This method can be overridden using `__torch_function__`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117913
Approved by: https://github.com/albanD