pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
Joshua Su	a818fa77e3	Back out "Deprecate overleap functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead (#156165 )" (#160999 ) Summary: reverting this diff since it caused S551328. Please see D80217492 for dertails. Test Plan: NA Rollback Plan: Differential Revision: D80553314 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160999 Approved by: https://github.com/izaitsevfb, https://github.com/jingsh	2025-08-20 15:04:36 +00:00
Yu, Guangye	908c5cc4c0	Generalize torch._C._set_allocator_settings to be generic (#156175 ) # Motivation This PR moves the implementation of `torch.cuda.memory._set_allocator_settings` to `torch._C._accelerator_setAllocatorSettings`. Since the original API was intended as a temporary/internal utility, I am not exposing the new function as a public API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156175 Approved by: https://github.com/albanD ghstack dependencies: #159629, #150312, #156165	2025-08-05 04:08:42 +00:00
Yu, Guangye	c1145852a5	Deprecate overleap functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead (#156165 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156165 Approved by: https://github.com/albanD ghstack dependencies: #159629, #150312	2025-08-05 04:08:42 +00:00
PyTorch MergeBot	90f13f3b2a	Revert "Deprecate overleap functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead (#156165 )" This reverts commit 1fc010a9d8ea95bb74e54b31d17eba56ef16c27c. Reverted https://github.com/pytorch/pytorch/pull/156165 on behalf of https://github.com/guangyey due to Static initialization order issue impact the downstream repo ([comment](https://github.com/pytorch/pytorch/pull/150312#issuecomment-3142035444))	2025-08-01 03:24:54 +00:00
PyTorch MergeBot	cb9b74872b	Revert "Generalize torch._C._set_allocator_settings to be generic (#156175 )" This reverts commit d3ce45012ed42cd1e13d5048b046b781f0feabe0. Reverted https://github.com/pytorch/pytorch/pull/156175 on behalf of https://github.com/guangyey due to Static initialization order issue impact the downstream repo ([comment](https://github.com/pytorch/pytorch/pull/150312#issuecomment-3142035444))	2025-08-01 03:24:54 +00:00
Yu, Guangye	d3ce45012e	Generalize torch._C._set_allocator_settings to be generic (#156175 ) # Motivation This PR moves the implementation of `torch.cuda.memory._set_allocator_settings` to `torch._C._accelerator_setAllocatorSettings`. Since the original API was intended as a temporary/internal utility, I am not exposing the new function as a public API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156175 Approved by: https://github.com/albanD ghstack dependencies: #149601, #157908, #150312, #156165	2025-07-30 06:37:15 +00:00
Yu, Guangye	1fc010a9d8	Deprecate overleap functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead (#156165 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156165 Approved by: https://github.com/albanD ghstack dependencies: #149601, #157908, #150312	2025-07-30 06:37:15 +00:00
PyTorch MergeBot	ea5f88dca6	Revert "Deprecate overleap functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead (#156165 )" This reverts commit e40ade5182233f548b25f2732effe3719d16e9ad. Reverted https://github.com/pytorch/pytorch/pull/156165 on behalf of https://github.com/huydhn due to Sorry for reverting your change but because https://github.com/pytorch/pytorch/pull/157908 has been reverted + this PR caused issue earlier, I think it is better to revert the whole stack and reland it from scratch to be sure ([comment](https://github.com/pytorch/pytorch/pull/150312#issuecomment-3074897532))	2025-07-15 18:24:36 +00:00
Yu, Guangye	e40ade5182	Deprecate overleap functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead (#156165 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156165 Approved by: https://github.com/albanD ghstack dependencies: #150312	2025-07-15 10:14:35 +00:00
PyTorch MergeBot	e8cca7bac7	Revert "Deprecate overleap functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead (#156165 )" This reverts commit 85857181ebca86e9c709e9922a9d9ef41a9c4ef9. Reverted https://github.com/pytorch/pytorch/pull/156165 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing to build PyTorch internally ([comment](https://github.com/pytorch/pytorch/pull/150312#issuecomment-3070218901))	2025-07-14 16:33:48 +00:00
Yu, Guangye	85857181eb	Deprecate overleap functions in CUDAAllocatorConfig, use AcceleratorAllocatorConfig instead (#156165 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156165 Approved by: https://github.com/albanD ghstack dependencies: #149601, #157908, #150312	2025-07-11 11:41:34 +00:00
Aidyn-A	4a26bb8a12	[C10][CUDA] Eagerly create context on torch.cuda.set_device(device) call (#155900 ) Fixes #155668 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155900 Approved by: https://github.com/ngimel	2025-06-17 18:59:44 +00:00
PyTorch MergeBot	365ce465f3	Revert "[C10][CUDA] Eagerly create context on torch.cuda.set_device(device) call (#155900 )" This reverts commit 8142a0286016e63a0e91b5667e1fb1a5e868ffd7. Reverted https://github.com/pytorch/pytorch/pull/155900 on behalf of https://github.com/clee2000 due to causing some sort of hang? in test_distributed_spawn [GH job link](https://github.com/pytorch/pytorch/actions/runs/15678895788/job/44168117193) [HUD commit link](`8142a02860`) note to self: bad TD ([comment](https://github.com/pytorch/pytorch/pull/155900#issuecomment-2977365699))	2025-06-16 16:59:25 +00:00
Aidyn-A	8142a02860	[C10][CUDA] Eagerly create context on torch.cuda.set_device(device) call (#155900 ) Fixes #155668 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155900 Approved by: https://github.com/ngimel	2025-06-16 10:55:47 +00:00
Yu, Guangye	d84efde3f0	Move _storage_Use_Count to be gerneric (#155451 ) # Motivation `torch._C._storage_Use_Count` should be a generic API that is not aware of device type. It is also used in `337cd7c53d/torchtune/training/_activation_offloading.py (L323)` to do some memory optimization. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155451 Approved by: https://github.com/albanD	2025-06-12 01:39:04 +00:00
Shivam Raikundalia	1083bc749d	[Memory Snapshot] Add Flag to Toggle Global and Local Callbacks for Annotations (#154932 ) Summary: There are some cases where we want only local annotations for memory snapshot such as executing inside the cudastream callback, which cannot execute CUDA operators. Thus the cuda errors happen: Exception in RecordFunction callback: CUDA error: operation not permitted However, we need to have an option to turn on the globally so that on-demand snapshot can get annotations. Additionally, there may be some cases in which auto-trace will also want annotations using record functions so we expose the flag to the auto-trace as well. Test Plan: Run MVAI executable and see that the errors go away Rollback Plan: Differential Revision: D75831687 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154932 Approved by: https://github.com/mzzchy, https://github.com/sanrise	2025-06-04 23:15:19 +00:00
Natalia Gimelshein	f01e628e3b	Resubmit Remove MemPoolContext (#154042 ) (#154746 ) Summary: Per title Test Plan: Added tests + existing tests Differential Revision: D75695030 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154746 Approved by: https://github.com/malfet	2025-05-31 01:21:54 +00:00
PyTorch MergeBot	d173ba5a75	Revert "Remove MemPoolContext (#154042 )" This reverts commit 3b38989b5f8f918cf1ad38bdade059608544af4b. Reverted https://github.com/pytorch/pytorch/pull/154042 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/154042#issuecomment-2921401100))	2025-05-30 06:53:37 +00:00
Natalia Gimelshein	3b38989b5f	Remove MemPoolContext (#154042 ) Removes MemPoolContext from custom user mempools. The ground truth for which pool should be used is in graph_pools active pool, and MemPoolContext just introduced an opportunity for the pool pointed to by MemPoolContext and active pool in graph_pools to go out of sync (see all the asserts in the code to make sure that happens, and yet it still could happen in a multithread scenario, see my recent PRs (#153990). Pull Request resolved: https://github.com/pytorch/pytorch/pull/154042 Approved by: https://github.com/albanD, https://github.com/syed-ahmed	2025-05-28 16:35:48 +00:00
Shivam Raikundalia	dbb4444ce3	[Memento] Add PT2 to Memory Snapshot (#152707 ) Summary: To add PT2 information to memory snapshot we piggyback off of the Kineto implementation using record_function similar to adding the user annotations. To do this we add the following: 1. Stack implementation that we instantiate to keep track of which compile context stack we are currently in (top element of the stack). The stack will be per device and thread-local since different threads of a process can be in different compile contexts at a given time. For this reason, we do not need to add mutexes to our stack impl since no two threads will touch a given stack 2. RecordFunction hooks to properly pipe the correct events to the compile context stack. These hooks are similar to the annotation ones in the fact that we just register them lazily and DO NOT unregister them. This is done out of convenience. In the future, we should save the handles and unregister them to minimize overhead after profiling is finished. As of now, we are registering this at the FUNCTION scope which is wide; however, we treat any function that does not start with "Torch-Compiled Region" as a no-op so we anticipate the difference in performance to be negligible during and after profiling. We also hide this feature behind a flag set to off on default so existing jobs will be unaffected 3. Piping for compile context to pickle output Test Plan: In D74039793, we add CompileContext to the visualizer and we see the following {F1977654658} Differential Revision: D74028214 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152707 Approved by: https://github.com/eqy	2025-05-12 21:12:51 +00:00
FFFrog	fd8fd01d25	[OpenReg] Add _lazy_init and rng_state support for OpenReg (#151914 ) As the title stated. Changes: - Add get_rng_state & set_rng_state support for OpenReg - Add _lazy_init support for OpenReg - Remove redundant code for cuda/Module.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/151914 Approved by: https://github.com/albanD	2025-05-04 09:42:08 +00:00
Boyuan Feng	d969e2ec33	[CUDAGraph Trees] support memory allocation on side stream (#152472 ) I tried `beginAllocateToPool` instead of `_cuda_beginAllocateCurrentStreamToPool` and the error in #151199 does not happen any more. However, this approach is unsafe for multithreading. When multiple run_eager happens concurrently, we expect memory allocation to different mem_pool. Since beginAllocateToPool does not check stream, these memory allocation may happen on the same mem_pool. So, I use `_cuda_beginAllocateCurrentThreadToPool` to direct all memory allocation on the same thread to a given mem_pool. In particular, `_cuda_beginAllocateCurrentThreadToPool` records the launching thread id, and during runtime checks if the current thread id matches the launching thread id. Fixes #151199 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152472 Approved by: https://github.com/eellison, https://github.com/ngimel	2025-05-02 04:26:35 +00:00
PyTorch MergeBot	56039b5778	Revert "[CUDAGraph Trees] support memory allocation on side stream (#152472 )" This reverts commit c620763ec2be83e37f9b31ad6663c6e82d6c0ab0. Reverted https://github.com/pytorch/pytorch/pull/152472 on behalf of https://github.com/BoyuanFeng due to should use tid instead pid ([comment](https://github.com/pytorch/pytorch/pull/152472#issuecomment-2843491656))	2025-04-30 22:18:10 +00:00
Boyuan Feng	c620763ec2	[CUDAGraph Trees] support memory allocation on side stream (#152472 ) I tried `beginAllocateToPool` instead of `_cuda_beginAllocateCurrentStreamToPool` and the error in #151199 does not happen any more. However, this approach is unsafe for multithreading. When multiple run_eager happens concurrently, we expect memory allocation to different mem_pool. Since beginAllocateToPool does not check stream, these memory allocation may happen on the same mem_pool. So, I use `_cuda_beginAllocateCurrentThreadToPool` to direct all memory allocation on the same thread to a given mem_pool. In particular, `_cuda_beginAllocateCurrentThreadToPool` records the launching thread id, and during runtime checks if the current thread id matches the launching thread id. Fixes #151199 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152472 Approved by: https://github.com/eellison	2025-04-30 17:45:07 +00:00
PyTorch MergeBot	3962b8f1e0	Revert "[OpenReg] Add _lazy_init and rng_state support for OpenReg (#151914 )" This reverts commit 64a55b531f4f4ae2b35175ab5d9a30a856b0d6ef. Reverted https://github.com/pytorch/pytorch/pull/151914 on behalf of https://github.com/malfet due to Looks like breaks number of ROCM jobs, see `797768cd90/1` ([comment](https://github.com/pytorch/pytorch/pull/151914#issuecomment-2839691038))	2025-04-29 17:36:12 +00:00
FFFrog	64a55b531f	[OpenReg] Add _lazy_init and rng_state support for OpenReg (#151914 ) As the title stated. Changes: - Add get_rng_state & set_rng_state support for OpenReg - Add _lazy_init support for OpenReg - Remove redundant code for cuda/Module.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/151914 Approved by: https://github.com/albanD	2025-04-29 11:18:12 +00:00
cyy	b34146a093	Fix initGdsBindings declaration (#152277 ) Move initGdsBindings into the correct namespace. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152277 Approved by: https://github.com/Skylion007	2025-04-27 17:04:56 +00:00
Yu, Guangye	14e3ffb1ff	Deprecate host allocator legacy APIs (#151437 ) # Motivation This PR aims to deprecate the host allocator legacy API and recommend users to use the unified API `getHostAllocator(device_type)` APIs, such as: ```cpp at::getHostAllocator(device_type)->allocate(...); at::getHostAllocator(device_type)->empty_cache(); at::getHostAllocator(device_type)->record_event(...); at::getHostAllocator(device_type)->get_stats(); at::getHostAllocator(device_type)->reset_accumulated_stats(); at::getHostAllocator(device_type)->reset_peak_stats(); ``` # Additional Context TODO: - [ ] Move is_pinned from `AcceleratorHookInterface` to `HostAllocator` - [ ] Deprecate `getPinnedMemoryAllocator` inside `AcceleratorHookInterface` and recommend using `getHostAllocator` instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151437 Approved by: https://github.com/EikanWang, https://github.com/albanD ghstack dependencies: #151403, #151431	2025-04-22 03:13:24 +00:00
Evgeny Fiksman	1a6effc5d8	[torch] Expose PCI info from CUDA device (#151672 ) Summary: PR#125083 add cuda device UUID info, but due to meta internal [version of ROCM the code was excluded](https://github.com/pytorch/pytorch/pull/125083?fbclid=IwY2xjawJvLnNleHRuA2FlbQIxMQABHlY55crrkTqWBWTsr2HVfuqnZ3R1GHR3o9Kf1o3h3uvyawEmCEdhdT48iY1P_aem_8tfrGrWE9SxFYasGfH8kCQ#issuecomment-2103315320). This change will ensure meta internal code is built and PCI info is available Test Plan: pass CI Differential Revision: D73253426 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151672 Approved by: https://github.com/Skylion007	2025-04-21 19:55:19 +00:00
Xiaodong Wang	88b0553c58	[AMD] Remove fbcode limit for uuid (#151652 ) Summary: We're now w/ later rocm version so ok to add uuid back. Test Plan: sandcastle Differential Revision: D73240086 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151652 Approved by: https://github.com/Skylion007, https://github.com/ngimel, https://github.com/houseroad	2025-04-18 20:37:09 +00:00
Yu, Guangye	b0810168a3	Generalize poison fork logic for each device backend (#144664 ) # Motivation Generalize the posion_fork code to make it reusable across different devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664 Approved by: https://github.com/EikanWang, https://github.com/albanD	2025-04-13 09:54:30 +00:00
PyTorch MergeBot	a0ab243c3a	Revert "Generalize poison fork logic for each device backend (#144664 )" This reverts commit 83bd0b63b55f224fada6d5f6dd7eb5b4cb3072fb. Reverted https://github.com/pytorch/pytorch/pull/144664 on behalf of https://github.com/atalman due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/144664#issuecomment-2795157082))	2025-04-10 21:02:14 +00:00
Yu, Guangye	83bd0b63b5	Generalize poison fork logic for each device backend (#144664 ) # Motivation Generalize the posion_fork code to make it reusable across different devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664 Approved by: https://github.com/EikanWang, https://github.com/albanD	2025-04-10 02:34:53 +00:00
PyTorch MergeBot	bf1132c196	Revert "Generalize poison fork logic for each device backend (#144664 )" This reverts commit d86c14156d875b782b82dda96842a1f77910f010. Reverted https://github.com/pytorch/pytorch/pull/144664 on behalf of https://github.com/atalman due to failing periodic test: python test/test_cpp_extensions_mtia_backend.py TestCppExtensionMTIABackend.test_device_context ([comment](https://github.com/pytorch/pytorch/pull/144664#issuecomment-2784506104))	2025-04-07 20:09:53 +00:00
Yu, Guangye	d86c14156d	Generalize poison fork logic for each device backend (#144664 ) # Motivation Generalize the posion_fork code to make it reusable across different devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664 Approved by: https://github.com/EikanWang, https://github.com/albanD	2025-04-07 02:06:21 +00:00
Shivam Raikundalia	a11538aa46	[GPU Snapshot] Add Clear History Flag (#149352 ) Summary: Oftentimes, users complain that a bunch of extra events are prepended to their desired GPU snapshot. This is because they usually attach an OOM logger without knowing and when they go to collect the actual snapshot, it adds all the OOM logger contents. Since OOM and regular snapshot use the same backend, we currently don't have the infra in place to split these snapshots. As a solution we add a flag to the snapshot frontend to clear out the history when starting the auto-trace record memory history. A more thorough solution would be to have a user pass in a handle and to have snapshots per handle to seperate the events. However, this would likely be complicated and more work than it is worth as we would have to change the callbacks in the caching allocator and pass these objects between python and cpp. Test Plan: See diff below Differential Revision: D71159720 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149352 Approved by: https://github.com/eqy, https://github.com/aaronenyeshi	2025-03-19 21:44:20 +00:00
Marko Radmilac	c65ee728f0	Initial implementation of host memory stats (#147660 ) This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics. This change tries very hard not to disrupt the initial design of the allocator, and it uses existing locking mechanism, whenever possible, to gather statistics "for free". Only deviation from that is on the "slow path" where we incur CUDA calls anyway, so taking a short lock is not going to hurt the performance much, especially in the steady state where most allocations will come from cache. As mentioned before, this is the first PR, to introduce the concept and to see if it fits the right paradigm. We can always add more later. Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in CUDA caching allocator, in order to maintain symmetry. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660 Approved by: https://github.com/ngimel	2025-03-05 16:13:19 +00:00
PyTorch MergeBot	a983b2b11a	Revert "Initial implementation of host memory stats (#147660 )" This reverts commit 945e359fc1afe6c0bb6129ed9607b237fa19cd98. Reverted https://github.com/pytorch/pytorch/pull/147660 on behalf of https://github.com/mradmila due to There is an issue with ambiguous definition of Stat structure when different C++ tools are used. Backing out for now. ([comment](https://github.com/pytorch/pytorch/pull/147660#issuecomment-2692346379))	2025-03-01 18:05:45 +00:00
Marko Radmilac	945e359fc1	Initial implementation of host memory stats (#147660 ) This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics. This change tries very hard not to disrupt the initial design of the allocator, and it uses existing locking mechanism, whenever possible, to gather statistics "for free". Only deviation from that is on the "slow path" where we incur CUDA calls anyway, so taking a short lock is not going to hurt the performance much, especially in the steady state where most allocations will come from cache. As mentioned before, this is the first PR, to introduce the concept and to see if it fits the right paradigm. We can always add more later. Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in CUDA caching allocator, in order to maintain symmetry. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660 Approved by: https://github.com/ngimel	2025-02-28 18:36:44 +00:00
cyy	25aa7ca62d	Cleanup CallOnce.h (#146700 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/146700 Approved by: https://github.com/albanD	2025-02-07 16:44:45 +00:00
cyy	29f52e3972	[2/N] Remove unnecessary once flag usage (#145057 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/145057 Approved by: https://github.com/albanD	2025-01-23 09:48:46 +00:00
Yu, Guangye	3848de55ed	Add get_stream_from_external API for CUDA backend (#143799 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143799 Approved by: https://github.com/albanD, https://github.com/EikanWang ghstack dependencies: #142347, #141119, #141123	2024-12-31 11:15:59 +00:00
Nichols A. Romero	c0a39ad35a	[ROCm] Fix TunableOp UTs: Rotating Buffer (#143172 ) TunableOp's rotating buffer feature cannot be properly tested because the environment variable that controls this feature is sticky. A Python API is introduced to modify this value. Additional items in this PR: * UT for rotating buffer API * Clean up UTs that were setting the rotating buffer via the environment variable * Align behavior of environment variable and Python API when a negative value (< 0) is set. * Update documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143172 Approved by: https://github.com/jeffdaily	2024-12-14 06:18:11 +00:00
Peter Bell	96c3b2c388	Expose remaining sharedMem cudaDeviceProps to python (#143226 ) Was a bit too fast with my earlier PR, `sharedMemPerMultiprocessor` includes some memory that is reserved for the system. The amount a kernel can actually use is limited by `sharedMemPerBlockOptin`. I also expose `sharedMemPerBlock` for completeness. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143226 Approved by: https://github.com/ezyang	2024-12-14 06:13:28 +00:00
Peter Bell	82a45d19b4	Expose sharedMemPerMultiprocessor device property to python (#143119 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143119 Approved by: https://github.com/ezyang	2024-12-13 16:53:57 +00:00
Benjamin Glass	4959784dac	Add API query for available per-process CUDA memory (#140620 ) Certain `cpp_wrapper`-enabled tests were OOM-ing in the CI pipeline, with error messages suggesting that sufficient memory was accessible. This ultimately resulted from an internal memory limitation that was not queryable in the API. This PR adds querying for that limit. Additionally, the failing tests had incorrect memory availability checks, and are updated with measured memory requirements. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140620 Approved by: https://github.com/malfet, https://github.com/eqy ghstack dependencies: #141367	2024-12-03 00:24:03 +00:00
cyy	f95c71867e	[9/N] Fix extra warnings brought by clang-tidy-17 (#139286 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139286 Approved by: https://github.com/ezyang	2024-10-31 05:20:31 +00:00
cyyever	456c87c8a2	[8/N] Fix extra warnings brought by clang-tidy-17 (#139151 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139151 Approved by: https://github.com/ezyang Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2024-10-30 14:20:08 +00:00
Edward Yang	b14269dcfb	Make Context to be Device-agnostic Step by Step (1/N) (#136519 ) (#138155 ) Summary: - make init to be device-agnostic and move it to AcceleratorHooksInterface - refactoring context related to device initialization Original pull request: https://github.com/pytorch/pytorch/pull/136519 Test Plan: contbuild & OSS CI, see `4a8e49389c` Reviewed By: malfet Differential Revision: D64471142 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138155 Approved by: https://github.com/malfet, https://github.com/bobrenjc93	2024-10-17 20:58:56 +00:00
PyTorch MergeBot	d4d687ffb2	Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519 )" This reverts commit 4a8e49389c33934234dc89616fd17a58e760e2e7. Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/clee2000 due to breaking internal tests related to MITA, @ezyang has a forward fix? ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2414588302))	2024-10-15 17:19:16 +00:00

1 2 3 4 5 ...

341 Commits