22 Commits

Author SHA1 Message Date
402ae09e41 [BE] fix typos in c10/ (#156078)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156078
Approved by: https://github.com/malfet, https://github.com/cyyever
2025-06-18 10:24:44 +00:00
a5df6ffbc2 Improve IPC for Expandable Segments to use fabric handle when possible (#156074)
Improves upon https://github.com/pytorch/pytorch/pull/130890 . Inspired by https://github.com/pytorch/pytorch/pull/130890#issuecomment-2278882984 , this change automatically uses the fabric handle for IPC when possible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156074
Approved by: https://github.com/ngimel, https://github.com/malfet
2025-06-18 05:22:06 +00:00
cyy
ce94b212c7 [Environment Variable][Rebase] Use thread-safe getenv functions (#140200)
Use our thread-safe getenv wrappers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140200
Approved by: https://github.com/kwen2501, https://github.com/eqy
2025-05-02 00:41:49 +00:00
6cbf97ede8 [ROCm] enable HIPMallocAsyncAllocator (#149145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145
Approved by: https://github.com/izaitsevfb

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-03-19 23:42:35 +00:00
e1d143cb7b Revert "[ROCm] enable HIPMallocAsyncAllocator (#149145)"
This reverts commit ee1a2b7810126258ce64d1e22b59fae81a3f7bcb.

Reverted https://github.com/pytorch/pytorch/pull/149145 on behalf of https://github.com/izaitsevfb due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/149145#issuecomment-2738115728))
2025-03-19 21:12:13 +00:00
ee1a2b7810 [ROCm] enable HIPMallocAsyncAllocator (#149145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-03-19 03:59:55 +00:00
9d37b501db Revert "[ROCm] enable HIPMallocAsyncAllocator (#149145)"
This reverts commit 2e02c07a5d1c432547542f90de2885be9ffd13cf.

Reverted https://github.com/pytorch/pytorch/pull/149145 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally.  @albanD, might you be able to help get this PR landed? See D71214814 for more details on the failure. To validate the fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/149145#issuecomment-2730104736))
2025-03-17 16:17:02 +00:00
2e02c07a5d [ROCm] enable HIPMallocAsyncAllocator (#149145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149145
Approved by: https://github.com/jeffdaily
2025-03-14 18:21:27 +00:00
a575ce0dc6 [PyTorch Pinned Allocator] Add support of background thread to process events (#135524)
Summary: Currently we process events in the regular allocation path, calling cudaEventQuery to check on the events, and this path can take some locks in the libcuda driver. It is not strictly necessary to process events in the allocation path; we could move this to a background thread that keeps processing events regularly and puts the freed blocks on the free list.
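
The idea in the summary above can be sketched in Python as follows. This is only an illustration, not the PyTorch implementation: the `PinnedBlockPool` class, its method names, and the `is_complete` callables (standing in for cudaEventQuery) are all hypothetical.

```python
import threading


class PinnedBlockPool:
    """Toy pool: a background thread moves finished blocks to the free list."""

    def __init__(self, poll_interval=0.001):
        self._lock = threading.Lock()
        self._pending = []  # (block, is_complete) pairs still waiting on an event
        self._free = []     # blocks that are safe to hand out again
        self._stop = threading.Event()
        self._poll_interval = poll_interval
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def stop(self):
        self._stop.set()
        self._worker.join()

    def release(self, block, is_complete):
        # is_complete is a hypothetical stand-in for querying a recorded event.
        with self._lock:
            self._pending.append((block, is_complete))

    def allocate(self):
        # The allocation path only touches the free list; it never queries events.
        with self._lock:
            return self._free.pop() if self._free else None

    def process_events(self):
        # Take the pending list under the lock, then query events outside it.
        with self._lock:
            pending, self._pending = self._pending, []
        freed, still_pending = [], []
        for block, is_complete in pending:
            if is_complete():
                freed.append(block)
            else:
                still_pending.append((block, is_complete))
        with self._lock:
            self._pending.extend(still_pending)
            self._free.extend(freed)
        return len(freed)

    def _run(self):
        # Background loop: keep processing events until asked to stop.
        while not self._stop.is_set():
            self.process_events()
            self._stop.wait(self._poll_interval)
```

In this sketch `allocate()` stays cheap because event queries happen only in `process_events()`, which the worker thread calls periodically once `start()` has been invoked.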

Differential Revision: D62396585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135524
Approved by: https://github.com/zyan0
2024-09-17 21:08:10 +00:00
48d18fbd4c [PyTorch CUDA Allocator] Allow reuse of non-split blocks with better rounding (#136174)
Summary:
This diff adds an option to round the non-split blocks in the caching allocator so that they can be reused without causing lots of fragmentation for large memory segments.

For example, if we specify the max_split memory size as 400MB, then all allocations larger than 400MB will not be split. Let's say we allocated some 1024MB blocks and these are cached in the allocator. If we request a new 500MB block, we round it to the nearest power-2 division, which is 512MB, and add the default kLargeBuffer of 20MB, giving 532MB. Since 532MB is less than the existing 1024MB block, the 1024MB block will not be used for this allocation; instead, a new 512MB block will be created. In this diff, we expose the kLargeBuffer used for rounding as a configurable option, max_non_split_rounding_size: if 512MB + max_non_split_rounding_size is greater than 1024MB, we use the 1024MB block and do not create a new 512MB block using cudaMalloc. This option is added so that we can pre-allocate some large blocks and reuse them as much as possible, without stalling on calls to cudaMalloc.
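
The arithmetic in the example above can be checked with a short Python sketch. It is a simplified model, not the allocator's code: the helper names are hypothetical, a division count of 4 is assumed (the message does not state one), and the 600MB slack is an arbitrary value chosen so that 512MB plus the slack exceeds 1024MB.

```python
MB = 1024 * 1024


def round_up_power2_division(size: int, divisions: int) -> int:
    """Round size up to a multiple of (largest power of two <= size) / divisions."""
    if divisions <= 1 or size <= 0:
        return size
    floor_pow2 = 1 << (size.bit_length() - 1)
    if size == floor_pow2:
        return size
    step = max(floor_pow2 // divisions, 1)
    return -(-size // step) * step  # ceiling division, then scale back up


def reuses_cached_block(rounded_request: int, cached_block: int, slack: int) -> bool:
    """Non-split case only: reuse a cached block if it fits and wastes less than slack."""
    return rounded_request <= cached_block < rounded_request + slack


request = round_up_power2_division(500 * MB, divisions=4)
print(request // MB)                                       # 512
print(reuses_cached_block(request, 1024 * MB, 20 * MB))    # False: 532MB < 1024MB
print(reuses_cached_block(request, 1024 * MB, 600 * MB))   # True: 1112MB > 1024MB
```

With the default 20MB slack the cached 1024MB block is rejected, while a larger slack lets it be reused, which matches the behavior the message describes.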

Differential Revision: D62758758

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136174
Approved by: https://github.com/zyan0
2024-09-17 19:08:44 +00:00
277ab8a4c0 Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449)"
This reverts commit a56e057814565b2ae33b2106b4d0136179aa18f8.

Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/jeanschmidt due to Broken internal signals, @albanD please help get this sorted :) ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2069716129))
2024-04-22 14:44:44 +00:00
cyy
a56e057814 [Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449)
This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449
Approved by: https://github.com/malfet, https://github.com/albanD
2024-04-19 13:39:41 +00:00
61bc188f42 Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449)"
This reverts commit b51f66c1950a582dd18d1b2ee67df840a8c4dbbe.

Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/malfet due to Broke gcc9 builds ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2064936414))
2024-04-18 18:53:59 +00:00
cyy
b51f66c195 [Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449)
This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449
Approved by: https://github.com/albanD
2024-04-18 13:35:48 +00:00
f5049de242 Revert "[Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449)"
This reverts commit 5bef127c2ea49280e7fda4f9fa7cad6fa4078e7d.

Reverted https://github.com/pytorch/pytorch/pull/119449 on behalf of https://github.com/PaliC due to you're using TORCH_INTERNAL_ASSERT incorrectly ([comment](https://github.com/pytorch/pytorch/pull/119449#issuecomment-2062696010))
2024-04-17 23:44:00 +00:00
cyy
5bef127c2e [Environment Variable][1/N] Use thread-safe env variable API in c10 (#119449)
This PR is the beginning of attempts to wrap thread-unsafe getenv and set_env functions inside a RW mutex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119449
Approved by: https://github.com/albanD
2024-04-16 04:39:20 +00:00
cyy
fb10e13000 [Clang-tidy header][24/N] Fix clang-tidy warnings on c10/cuda/*.{cpp,h} (#120781)
This PR begins to clean clang-tidy warnings of code in c10/cuda.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120781
Approved by: https://github.com/ezyang
2024-03-15 05:03:22 +00:00
7973ac586d [Memory Snapshot] Add CUDAAllocatorConfig details into snapshot metadata (#119404)
Summary:
Include the CUDAAllocatorConfig at the time of the snapshot in the snapshot file. This adds the following variables:

```
  double garbage_collection_threshold;
  size_t max_split_size;
  size_t pinned_num_register_threads;
  bool expandable_segments;
  bool release_lock_on_cudamalloc;
  bool pinned_use_cuda_host_register;
  std::string last_allocator_settings;
  std::vector<size_t> roundup_power2_divisions;
```

Test Plan:
`PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` produces
```
{'PYTORCH_CUDA_ALLOC_CONF': 'expandable_segments:True',
 'max_split_size': -1,
 'garbage_collection_threshold': 0.0,
 'expandable_segments': True,
 'pinned_num_register_threads': 1,
 'release_lock_on_cudamalloc': False,
 'pinned_use_cuda_host_register': False,
 'roundup_power2_divisions': {'1': 0,
  '2': 0,
  '4': 0,
  '8': 0,
  '16': 0,
  '32': 0,
  '64': 0,
  '128': 0,
  '256': 0,
  '512': 0,
  '1024': 0,
  '2048': 0,
  '4096': 0,
  '8192': 0,
  '16384': 0,
  '32768': 0}}
```
`PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:2000,roundup_power2_divisions:[256:1,512:2,1024:4,>:8]"` produces
```
{'PYTORCH_CUDA_ALLOC_CONF': 'max_split_size_mb:2000,roundup_power2_divisions:[256:1,512:2,1024:4,>:8]',
 'max_split_size': 2097152000,
 'garbage_collection_threshold': 0.0,
 'expandable_segments': False,
 'pinned_num_register_threads': 1,
 'release_lock_on_cudamalloc': False,
 'pinned_use_cuda_host_register': False,
 'roundup_power2_divisions': {'1': 1, '2': 1, '4': 1, '8': 1, '16': 1, '32': 1, '64': 1, '128': 1, '256': 1, '512': 2, '1024': 8, '2048': 8, '4096': 8, '8192': 8, '16384': 8, '32768': 8}
}
```
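
To relate the environment string to the `max_split_size` value shown above, here is a rough Python sketch that splits a `PYTORCH_CUDA_ALLOC_CONF`-style string into key/value pairs. The `parse_alloc_conf` helper is hypothetical and only mimics the `key:value,key:[...]` shape seen in the test plan; it is not the parser PyTorch uses.

```python
def parse_alloc_conf(conf: str) -> dict:
    """Split 'key:value,key:[a:b,c:d]' into a dict, keeping bracketed lists intact."""
    parts, token, depth = [], [], 0
    for ch in conf:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma ends the current option.
            parts.append("".join(token))
            token = []
        else:
            token.append(ch)
    if token:
        parts.append("".join(token))

    options = {}
    for part in parts:
        # Only the first ':' separates the key from its value.
        key, _, value = part.partition(":")
        options[key.strip()] = value.strip()
    return options


conf = "max_split_size_mb:2000,roundup_power2_divisions:[256:1,512:2,1024:4,>:8]"
options = parse_alloc_conf(conf)
print(int(options["max_split_size_mb"]) * 1024 * 1024)  # 2097152000
```

The printed byte count equals the `max_split_size` reported in the snapshot metadata for the same setting.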

Differential Revision: D53536199

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119404
Approved by: https://github.com/zdevito
2024-02-17 01:16:37 +00:00
cyy
b72ddbab60 [Clang-tidy header][15/N] Enable clang-tidy on headers in c10/cuda and c10/mobile (#116602)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116602
Approved by: https://github.com/ezyang
2024-01-18 08:15:50 +00:00
53e32d12c4 [c10] Use nested namespace in c10/cuda (#116464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116464
Approved by: https://github.com/Skylion007
2023-12-27 23:14:00 +00:00
64583c4d04 [CUDA Host Allocator] Add support of CudaHostRegister (#108488)
Summary: This diff adds another option to create CUDA pinned memory using cudaHostRegister.

Differential Revision: D45843715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108488
Approved by: https://github.com/zdevito
2023-10-06 04:13:02 +00:00
30c4c6ff9b [PyTorch CCA] Refactor caching allocator config code (#110123)
Summary: This diff refactors the code by moving CUDAAllocatorConfig into the header file. This config refactoring is done so that we can use the same config code for CUDA pinned memory as well.

Test Plan: sandcastle

Differential Revision: D49653265

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110123
Approved by: https://github.com/zdevito
2023-10-04 14:58:23 +00:00