# Motivation
This PR moves the implementation of `torch.cuda.memory._set_allocator_settings` to `torch._C._accelerator_setAllocatorSettings`.
Since the original API was intended as a temporary/internal utility, I am not exposing the new function as a public API.
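For illustration, a minimal sketch of calling the relocated private binding, assuming it accepts the same comma-separated settings string as the old CUDA-only helper (the particular key used here is just an example):
```
import torch

# Private binding; the accepted setting keys may vary by build.
torch._C._accelerator_setAllocatorSettings("max_split_size_mb:128")
```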
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156175
Approved by: https://github.com/albanD
ghstack dependencies: #149601, #157908, #150312, #156165
Context on `torch.cuda.memory._record_memory_history` buffer behavior
## Description
Answers the following questions (a usage sketch follows the list):
- Can I keep `_record_memory_history()` always enabled with the default `max_entries=sys.maxsize` (9223372036854775807)? Will it consume a significant amount of CPU RAM?
- If I set `max_entries` to a lower value, e.g. 2000, will it keep the first 2000 entries and then stop recording, or will it keep the most recent 2000 entries before each snapshot (FIFO-style)?
- What is the expected size on disk of the snapshots? Some KBs? MBs?
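A minimal usage sketch of the API in question (for orientation only; the buffer and on-disk size semantics are exactly what the questions above ask about):
```
import torch

# Record alloc/free events into a history buffer capped at 2000 entries.
torch.cuda.memory._record_memory_history(max_entries=2000)

x = torch.randn(1024, 1024, device="cuda")  # run some workload

# Write a snapshot to disk; its size grows with the number of recorded entries.
torch.cuda.memory._dump_snapshot("snapshot.pickle")

# Stop recording.
torch.cuda.memory._record_memory_history(enabled=None)
```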
Fixes #129674
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155889
Approved by: https://github.com/ngimel
Summary:
There are some cases where we want only local annotations for memory snapshots, such as when executing inside a CUDA stream callback, which cannot launch CUDA operators; otherwise we hit errors such as `Exception in RecordFunction callback: CUDA error: operation not permitted`.
However, we still need an option to turn the annotations on globally so that on-demand snapshots can pick them up. Additionally, there may be cases in which auto-trace will also want annotations via record functions, so we expose the flag to auto-trace as well.
Test Plan:
Run the MVAI executable and see that the errors go away.
Rollback Plan:
Differential Revision: D75831687
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154932
Approved by: https://github.com/mzzchy, https://github.com/sanrise
Removes MemPoolContext from custom user mempools. The ground truth for which pool should be used is the active pool in `graph_pools`; MemPoolContext merely introduced an opportunity for the pool it points to and the active pool in `graph_pools` to go out of sync (see all the asserts in the code guarding against that, and yet it could still happen in a multithreaded scenario; see my recent PR #153990).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154042
Approved by: https://github.com/albanD, https://github.com/syed-ahmed
Summary:
To add PT2 information to memory snapshot we piggyback off of the Kineto implementation using record_function similar to adding the user annotations. To do this we add the following:
1. A stack implementation that we instantiate to keep track of which compile context we are currently in (the top element of the stack); a conceptual sketch follows this list. The stack is per device and thread-local, since different threads of a process can be in different compile contexts at a given time. For this reason, we do not need mutexes in the stack implementation, since no two threads touch a given stack.
2. RecordFunction hooks to pipe the correct events to the compile context stack. These hooks are similar to the annotation ones in that we register them lazily and DO NOT unregister them. This is done out of convenience. In the future, we should save the handles and unregister them to minimize overhead after profiling is finished. As of now, we register at the FUNCTION scope, which is wide; however, we treat any function that does not start with "Torch-Compiled Region" as a no-op, so we anticipate the difference in performance to be negligible during and after profiling. We also hide this feature behind a flag that is off by default, so existing jobs will be unaffected.
3. Piping the compile context through to the pickle output.
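As a rough conceptual sketch (not the actual C++ implementation), the per-thread compile-context stack described in point 1 behaves roughly like this, modeled here in Python with `threading.local`:
```
import threading

# Each thread sees its own stack, so no mutex is needed (conceptual model only).
_ctx = threading.local()

def _stack():
    if not hasattr(_ctx, "stack"):
        _ctx.stack = []
    return _ctx.stack

def push_compile_context(name):
    _stack().append(name)

def pop_compile_context():
    stack = _stack()
    if stack:
        stack.pop()

def current_compile_context():
    stack = _stack()
    return stack[-1] if stack else None
```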
Test Plan:
In D74039793, we add CompileContext to the visualizer and we see the following {F1977654658}
Differential Revision: D74028214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152707
Approved by: https://github.com/eqy
I tried `beginAllocateToPool` instead of `_cuda_beginAllocateCurrentStreamToPool` and the error in #151199 no longer occurs.
However, this approach is unsafe for multithreading. When multiple `run_eager` calls happen concurrently, we expect their memory allocations to go to different mem_pools. Since `beginAllocateToPool` does not check the stream, these allocations may end up in the same mem_pool.
So, I use `_cuda_beginAllocateCurrentThreadToPool` to direct all memory allocations on a given thread to a given mem_pool; a conceptual sketch follows. In particular, `_cuda_beginAllocateCurrentThreadToPool` records the launching thread id and, at runtime, checks whether the current thread id matches the launching thread id.
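Conceptually, the thread-based routing behaves like the following Python sketch (illustrative only; the real logic lives in the C++ allocator bindings named above, and the helper names here are hypothetical):
```
import threading

# Map from launching thread id to the mem_pool its allocations should go to (illustrative).
_thread_to_pool = {}

def begin_allocate_current_thread_to_pool(pool):
    _thread_to_pool[threading.get_ident()] = pool

def pool_for_current_allocation(default_pool):
    # Only allocations made on a registered launching thread are redirected to its pool.
    return _thread_to_pool.get(threading.get_ident(), default_pool)
```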
Fixes #151199
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152472
Approved by: https://github.com/eellison, https://github.com/ngimel
MemPool is a separate pool of memory handled by the caching allocator. This PR adds an option to let the caching allocator try to use this pool as a last resort instead of OOMing, by associating a `use_on_oom` bool with each MemPool.
Usage:
Users can optionally pass a `use_on_oom` bool (False by default) when creating a MemPool. If true, the CUDACachingAllocator will be able to use memory in this pool as a last resort instead of OOMing.
```
pool = torch.cuda.MemPool(allocator, use_on_oom=True)
with torch.cuda.use_mem_pool(pool):
    a = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
    del a
# at the memory limit, this allocation will succeed by using the pool's memory in order to avoid the OOM
b = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
```
Testing:
```
python test/test_cuda.py -k test_mempool_limited_memory_with_allocator
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151487
Approved by: https://github.com/eqy, https://github.com/syed-ahmed, https://github.com/ngimel
Summary:
Oftentimes, users complain that a bunch of extra events are prepended to their desired GPU snapshot. This usually happens because an OOM logger was attached without their knowledge, so when they go to collect the actual snapshot, it includes all of the OOM logger contents. Since OOM logging and regular snapshots use the same backend, we currently don't have the infrastructure in place to split these snapshots.
As a solution, we add a flag to the snapshot frontend to clear out the history when starting the auto-trace memory history recording.
A more thorough solution would be to have the user pass in a handle and to keep per-handle snapshots to separate the events. However, this would likely be complicated and more work than it is worth, as we would have to change the callbacks in the caching allocator and pass these objects between Python and C++.
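A minimal sketch of how the new flag might be used from the Python frontend, assuming it is exposed as a `clear_history` argument on `_record_memory_history` (the exact parameter name is an assumption here):
```
import torch

# Assumed flag: drop previously recorded (e.g. OOM-logger) events before this trace starts.
torch.cuda.memory._record_memory_history(max_entries=100000, clear_history=True)
```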
Test Plan:
See diff below
Differential Revision: D71159720
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149352
Approved by: https://github.com/eqy, https://github.com/aaronenyeshi
This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics.
This change tries very hard not to disrupt the initial design of the allocator, and it uses the existing locking mechanisms, whenever possible, to gather statistics "for free". The only deviation from that is on the "slow path", where we incur CUDA calls anyway, so taking a short lock is not going to hurt performance much, especially in the steady state where most allocations come from the cache.
As mentioned before, this is the first PR, intended to introduce the concept and to see whether it fits the right paradigm. We can always add more later.
Metrics that would require more involved changes to the code base and locking, like requested memory, have been punted on for now. I also tried to reuse the `Stat` structure used in the CUDA caching allocator, in order to maintain symmetry.
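For illustration, the statistics would cover pinned allocations like the one below; the Python-side accessor name `torch.cuda.host_memory_stats()` is an assumption, since this PR only introduces the counters on the C++ side:
```
import torch

x = torch.empty(1024, 1024, pin_memory=True)  # pinned allocation served by CachingHostAllocator

# Hypothetical accessor for the new counters; the actual Python exposure may differ.
print(torch.cuda.host_memory_stats())
```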
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660
Approved by: https://github.com/ngimel
Certain `cpp_wrapper`-enabled tests were OOM-ing in the CI pipeline, with error messages suggesting that sufficient memory was accessible. This ultimately resulted from an internal memory limitation that was not queryable in the API. This PR adds querying for that limit.
Additionally, the failing tests had incorrect memory availability checks, and are updated with measured memory requirements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140620
Approved by: https://github.com/malfet, https://github.com/eqy
ghstack dependencies: #141367
Pylance infers the type of the first argument (`enabled`) to `_record_memory_history` as `str` even though the function accepts `Literal[None, "state", "all"]`.
This raises an issue when passing `None`, even though it is a legitimate argument.
This PR addresses the issue by adding the type annotation in the doc string.
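For illustration, this is the kind of call that was being flagged even though it is legitimate:
```
import torch

# Passing None is legitimate (it disables recording) but Pylance inferred `enabled: str`.
torch.cuda.memory._record_memory_history(None)
```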
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140545
Approved by: https://github.com/Skylion007
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Canonically, the snapshot API returns the entire memory state of the CUDACachingAllocator (using `get_all_blocks`). There is no API that returns the memory state of just a given pool.
In this PR, we extend the snapshot API so that it can return only the memory addresses of an active pool. When the snapshot API is called under a MemPoolContext, we return only the blocks that correspond to the pool id of the active pool.
Part of https://github.com/pytorch/pytorch/issues/124807.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133601
Approved by: https://github.com/ezyang
This PR refactors some ref-counting functionality out of `beginAllocateToPool` and `releasePool`. The ref-counting logic is then used in construction and destruction of `torch.cuda.MemPool`.
The `use_count` variable in the CUDACachingAllocator is essentially a refcount of how many context managers are using the pool. Since we are now lifting the MemPool abstraction up to the user, the MemPool object itself now needs to hold an extra reference as well.
Part of https://github.com/pytorch/pytorch/issues/124807.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133600
Approved by: https://github.com/eqy, https://github.com/ezyang
`torch.cuda.memory.mem_get_info` allows device strings given the current type hints. However, `device = torch.device('cuda')` leads to `device.index = None`, which results in downstream problems. Setting `optional=True` will insert the default device index in such cases.
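For example, both of the following should now behave the same, with the default index filled in when the device string carries none (a usage sketch):
```
import torch

free_a, total_a = torch.cuda.mem_get_info("cuda")    # index defaults to the current device
free_b, total_b = torch.cuda.mem_get_info("cuda:0")  # assuming device 0 is the current device
```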
Fixes #132583
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132616
Approved by: https://github.com/soulitzer
Summary:
It is a long-known pain point that, if other users are running things on the GPU, calling `torch.cuda.memory.list_gpu_processes()` will error out:
```
torch.cuda.memory.list_gpu_processes()
File "torch/cuda/memory.py", line 647, in list_gpu_processes
procs = amdsmi.amdsmi_get_gpu_process_list(handle) # type: ignore[attr-defined]
File "amdsmi/py_interface/amdsmi_interface.py", line 1946, in amdsmi_get_gpu_process_list
_check_res(
File "amdsmi/py_interface/amdsmi_interface.py", line 510, in _check_res
raise AmdSmiLibraryException(ret_code)
amdsmi.py_interface.amdsmi_exception.AmdSmiLibraryException: Error code:
10 | AMDSMI_STATUS_NO_PERM - Permission Denied
```
So just catch this error.
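A rough sketch of the shape of the fix (illustrative only; the exception class name is taken from the traceback above, and the exact type caught in the real change may differ):
```
import amdsmi

def safe_gpu_process_list(handle):
    # Return an empty process list instead of propagating permission errors.
    try:
        return amdsmi.amdsmi_get_gpu_process_list(handle)
    except amdsmi.AmdSmiLibraryException:
        return []
```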
Test Plan: torch.cuda.memory.list_gpu_processes() no longer fails
Differential Revision: D59901053
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131018
Approved by: https://github.com/eqy, https://github.com/clee2000
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.
Note that only warnings that their messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
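For illustration, the two patterns this PR standardizes on (a sketch; the helper names are hypothetical and actual call sites vary):
```
import warnings
from typing_extensions import deprecated

@deprecated("old_helper() is deprecated, use new_helper() instead", category=FutureWarning)
def old_helper():
    ...

def legacy_helper():
    # Where the decorator does not apply, the existing warning gets an explicit category.
    warnings.warn("legacy_helper() is deprecated", category=FutureWarning)
```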
Resolves #126888
- #126888
This PR is split from PR #126898.
- #126898
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007