Commit Graph

11 Commits

Author SHA1 Message Date
d8c8ba2440 Fix unused Python variables in test/[e-z]* (#136964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-12-18 23:02:30 +00:00
6753ee127c Allow torch.cuda.memory.mem_get_info to take a device str argument with an unspecified device index. (#132616)
`torch.cuda.memory.mem_get_info` accepts device strings according to its current type hints. However, `device = torch.device('cuda')` leaves `device.index = None`, which causes downstream problems. Setting `optional=True` inserts the default device index in such cases.
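A minimal sketch of the behavior, assuming a CUDA-capable build (`mem_get_info` returns a `(free, total)` tuple in bytes):

```python
import torch

# A bare device string yields a device object with no index:
dev = torch.device("cuda")
assert dev.index is None  # the unset index was the source of the downstream problems

# With the index made optional, the current device index is filled in, so both
# spellings work:
free, total = torch.cuda.mem_get_info("cuda")    # index defaults to the current device
free, total = torch.cuda.mem_get_info("cuda:0")  # explicit index
print(f"{free} bytes free of {total}")
```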

Fixes #132583

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132616
Approved by: https://github.com/soulitzer
2024-08-06 13:19:46 +00:00
ba48cf6535 [BE][Easy][6/19] enforce style for empty lines in import segments in test/ (#129757)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
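For context, the enforced layout is a single empty line between import groups and none within a group; a hypothetical illustration (not taken from the PR):

```python
# Hypothetical example of the import layout the linter enforces.
import os
import sys

import numpy as np

import torch
from torch.testing._internal.common_utils import TestCase, run_tests
```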

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757
Approved by: https://github.com/ezyang
2024-07-17 06:42:37 +00:00
c09205a057 Deprecate device-specific GradScaler autocast API (#126527)
# Motivation

## for `torch.amp.GradScaler`,
- `torch.cpu.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cpu", args...)`.
- `torch.cuda.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cuda", args...)`.

So we intend to deprecate them and **strongly recommend** developers use `torch.amp.GradScaler`.

## for `custom_fwd` and `custom_bwd`,
these decorators are a good way to make a custom autograd function behave correctly whether or not it runs inside an autocast-enabled region, and the mechanism can be shared by other backends, like CPU and XPU.
So we generalize them to be device-agnostic, put them in `torch/amp/autocast_mode.py`, and re-expose them as `torch.amp.custom_fwd` and `torch.amp.custom_bwd`. Meanwhile, we deprecate `torch.cuda.amp.custom_fwd` and `torch.cuda.amp.custom_bwd`.
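A minimal migration sketch, assuming a CUDA build (the device-generic and deprecated spellings are equivalent per the equivalences above):

```python
import torch
from torch.amp import GradScaler, custom_fwd, custom_bwd

scaler = GradScaler("cuda")  # replaces torch.cuda.amp.GradScaler()

class ScaledMul(torch.autograd.Function):
    @staticmethod
    @custom_fwd(device_type="cuda")  # replaces torch.cuda.amp.custom_fwd
    def forward(ctx, x):
        return x * 2

    @staticmethod
    @custom_bwd(device_type="cuda")  # replaces torch.cuda.amp.custom_bwd
    def backward(ctx, grad_output):
        return grad_output * 2
```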

# Additional Context
Add a UT to cover the deprecation warning.
No additional UTs are needed for the functionality of `torch.amp.custom_f/bwd`; the existing UTs that previously covered `torch.cuda.amp.custom_f/bwd` already cover it.
To facilitate review, we split these code changes into two PRs. The first PR covers `torch.amp.GradScaler`; the follow-up covers `custom_fwd` and `custom_bwd`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126527
Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/janeyx99, https://github.com/EikanWang
2024-05-25 06:41:34 +00:00
b08072f645 [CI] Relax per proc memory by a little bit, mark a test as serial (#125960)
The test failure is here: https://github.com/pytorch/pytorch/actions/runs/9036789873/job/24836020415

* OOMs etc. related to https://github.com/pytorch/pytorch/pull/125598
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125960
Approved by: https://github.com/huydhn
2024-05-10 21:11:39 +00:00
d5182bb75b Enable UFMT on test/test_cuda*.py (#124352)
Part of: #123062

Ran lintrunner on:

- test/test_cuda.py
- test/test_cuda_expandable_segments.py
- test/test_cuda_multigpu.py
- test/test_cuda_nvml_based_avail.py
- test/test_cuda_primary_ctx.py
- test/test_cuda_sanitizer.py
- test/test_cuda_trace.py

Detail:

```bash
$ lintrunner -a --take UFMT --all-files
ok No lint issues.
Successfully applied all patches.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124352
Approved by: https://github.com/ezyang
2024-04-25 18:31:08 +00:00
b6139b1e57 [PyTorch][CUDA Caching Allocator] Export sync-stream-and-free-HBM counter in memory_stats for performance debugging (#120050)
Differential Revision: D53734057
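For reference, counters exported by the caching allocator surface through `torch.cuda.memory_stats()`; a minimal sketch of inspecting them (the exact name of the new counter is not reproduced here):

```python
import torch

x = torch.randn(1024, 1024, device="cuda")
stats = torch.cuda.memory_stats()  # flat dict of caching-allocator counters
# Look for synchronize-and-free related counters; exact key names vary by
# PyTorch version.
for key, value in sorted(stats.items()):
    if "sync" in key or "free" in key:
        print(key, value)
```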

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120050
Approved by: https://github.com/xw285cornell
2024-02-27 04:34:53 +00:00
bacbad5bc9 add GradScaler on CPU (#109993)
Step 2 of https://github.com/pytorch/pytorch/issues/111559.
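For illustration, a minimal CPU training-step sketch using the scaler (shown with the later device-generic `torch.amp.GradScaler("cpu")` spelling; this PR originally exposed it as `torch.cpu.amp.GradScaler`, and the model/optimizer names are illustrative):

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.amp.GradScaler("cpu")

x, target = torch.randn(4, 8), torch.randn(4, 1)
with torch.autocast("cpu", dtype=torch.bfloat16):
    out = model(x)  # runs in bfloat16 under CPU autocast
loss = torch.nn.functional.mse_loss(out.float(), target)

scaler.scale(loss).backward()  # scale the loss before backprop
scaler.step(optimizer)         # unscale gradients and step if they are finite
scaler.update()                # adjust the scale factor for the next iteration
```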

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109993
Approved by: https://github.com/jgong5, https://github.com/ezyang
2024-01-29 23:42:35 +00:00
1600585219 Revert "Fix test failure in TestCudaMultiGPU.test_cuda_device_memory_allocated (#105501)"
This reverts commit e6fd8ca3eef2b85b821936829e86beb7d832575c.

Reverted https://github.com/pytorch/pytorch/pull/105501 on behalf of https://github.com/zou3519 due to We've agreed that the PR is wrong. It didn't actually break anything. ([comment](https://github.com/pytorch/pytorch/pull/105501#issuecomment-1648005842))
2023-07-24 14:18:44 +00:00
e6fd8ca3ee Fix test failure in TestCudaMultiGPU.test_cuda_device_memory_allocated (#105501)
The test in question lives at f508d3564c/test/test_cuda_multigpu.py (L1282-L1290).

The Torch CUDA caching allocator may cache the allocation, causing the "new_alloc" to be the same as the "old_alloc".
```python
     self.assertGreater(memory_allocated(0), current_alloc[0])
```

I suggest that we use `assertGreaterEqual` instead of `assertGreater` in the test.

Running only this test in isolation does not make it fail, but running it together with other tests from the same module does.
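A hypothetical standalone version of the suggested change (the real test lives in `test/test_cuda_multigpu.py`; class and tensor names here are illustrative):

```python
import unittest

import torch

class DeviceMemoryAllocatedSketch(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")
    def test_cuda_device_memory_allocated(self):
        old_alloc = torch.cuda.memory_allocated(0)
        x = torch.ones(10, device="cuda:0")
        # assertGreaterEqual rather than assertGreater: per the analysis above,
        # the caching allocator may cache an allocation and leave the reading
        # unchanged when the full test module runs.
        self.assertGreaterEqual(torch.cuda.memory_allocated(0), old_alloc)

if __name__ == "__main__":
    unittest.main()
```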
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105501
Approved by: https://github.com/zou3519
2023-07-20 19:59:10 +00:00
c3e4a67905 Refactor multigpu tests to test_cuda_multigpu (#104059)
Mostly a refactor that moves all the tests from `test_cuda` that benefit from a multi-GPU environment into their own file.

- Add `TestCudaMallocAsync` class for Async tests ( to separate them from `TestCudaComm`)
- Move individual tests from `TestCuda` to `TestCudaMultiGPU`
- Move `_create_scaling_models_optimizers` and `_create_scaling_case` to `torch.testing._internal.common_cuda`
- Add newly created `test_cuda_multigpu` to the multigpu periodic test

### 🤖 Generated by Copilot at f4d46fa

This pull request fixes a flaky test and improves the testing of gradient scaling on multiple GPUs. It adds verbose output for two CUDA tests, and refactors some common code into helper functions in `torch/testing/_internal/common_cuda.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104059
Approved by: https://github.com/huydhn
2023-06-27 05:32:05 +00:00