pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 13:44:15 +08:00

Author	SHA1	Message	Date
Svetlana Karslioglu	e58c73be44	Add latex settings (#152350 ) - Fixes #147027 - Only lualatex can build our 3K pages PDF with reasonable quality, xelatex runs out of memory and pdflatex just fails. - Move notes under the same toctree as python-api which is needed for the PDF but doesn't change how the HTML is generated. This is the produced PDF: [pytorch.pdf](https://github.com/user-attachments/files/19945450/pytorch.pdf) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152350 Approved by: https://github.com/albanD	2025-04-29 19:28:43 +00:00
Zizeng Meng	861945100e	[Kineto] Enable OOM observer (#152160 ) Summary: # Context: When a memory leak happens, it usually triggers an OOM in later iterations. A snapshot of the full iteration would be huge and hard to interpret. On the CUDA side, there is an OOM observer that generates a snapshot with the latest 1,500,000 entries when an OOM happens, for debugging. In this diff, we want to implement the feature on the MTIA side. Test Plan: Run this test with the last diff in the stack. ``` buck run @//mode/opt kineto/libkineto/fb/mtia/integration_tests:mtia_memory_auto_trace_test ``` As shown, the memory_snapshot is generated when an OOM happens. Log: P1794792326 Snapshot: https://fburl.com/pytorch_memory_visualizer/lx73y6s3 {F1977402355} Differential Revision: D71993315 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152160 Approved by: https://github.com/sraikund16	2025-04-27 15:56:44 +00:00
Anthony Shoumikhin	e2f9759bd0	Fix broken URLs (#152237 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237 Approved by: https://github.com/huydhn, https://github.com/malfet	2025-04-27 09:56:42 +00:00
Dan Johnson	d22c4cc353	Add option to use mempool on OOM (#151487 ) MemPool is a separate pool of memory handled by the caching allocator. This PR adds the option to let the caching allocator try to use this pool as a last resort instead of OOMing by associating a use_on_oom bool with each MemPool. Usage: Users can optionally specify a ``use_on_oom`` bool (which is False by default) during MemPool creation. If true, then the CUDACachingAllocator will be able to use memory in this pool as a last resort instead of OOMing. ``` pool = torch.cuda.MemPool(allocator, use_on_oom=True) with torch.cuda.use_mem_pool(pool): a = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda") del a # at the memory limit, this will succeed by using pool's memory in order to avoid the oom b = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda") ``` Testing: ``` python test/test_cuda.py -k test_mempool_limited_memory_with_allocator ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/151487 Approved by: https://github.com/eqy, https://github.com/syed-ahmed, https://github.com/ngimel	2025-04-26 04:04:57 +00:00
Yu, Guangye	33c75cae0a	Add torch.accelerator.device_index as accelerator's device switch context (#148864 ) # Motivation We propose adding support for the Python with statement on `torch.accelerator.device_index` to enable device switching functionality. This enhancement would simplify writing device-agnostic code and provide benefits across all accelerators. Its device-specific counterparts include [`torch.cuda.device`](`00199acdb8/torch/cuda/__init__.py (L482)`) and [`torch.cuda._DeviceGuard`](`00199acdb8/torch/cuda/__init__.py (L469)`). Design Philosophy It accepts either an `Int` or `None` as input. When `None` is passed, no device switch is performed. Supporting `None` is important for compatibility, as it's possible to encounter `None` values from `torch.device.index`. Therefore, with this PR, we can do like this ```python src = 0 dst = 1 # Set src to current device torch.accelerator.set_device_index(src) with torch.accelerator.device_index(dst): # Inside with statement, we set dst to current device assert torch.accelerator.get_device_index() == dst # Here the current device should be src assert torch.accelerator.get_device_index() == src ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148864 Approved by: https://github.com/albanD	2025-04-25 09:45:25 +00:00
Jane Xu	8a9c66bb70	Improve stable library apis per Scott's feedback (#152040 ) Following 3 suggestions: 1. inline at::Tensor arg 2. use uniq ptr of array vs std::vector 3. document the `std::optional<S>()` case Pull Request resolved: https://github.com/pytorch/pytorch/pull/152040 Approved by: https://github.com/swolchok, https://github.com/albanD	2025-04-24 20:51:03 +00:00
ILCSFNO	bd09d87fdb	add Out Notes (#151306 ) Fixes #150181 @albanD Could you please have a check? Build locally without pytorch build: ![Developer-FAQ](https://github.com/user-attachments/assets/351a7e0b-588e-48ae-ad0a-03f427c86e89) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151306 Approved by: https://github.com/albanD	2025-04-24 20:25:09 +00:00
Pian Pawakapan	2ee8de54b1	[dynamic shapes] user-code friendly statically_known_true, has_static_value (#151601 ) Fixes #151480 Allows `statically_known_true` in user code, as well as introducing `has_static_value`, returning True if the input has a static bool/float/int value Pull Request resolved: https://github.com/pytorch/pytorch/pull/151601 Approved by: https://github.com/laithsakka, https://github.com/zou3519, https://github.com/jingsh	2025-04-24 02:53:59 +00:00
Kaiyu Shi	f39a1a43ee	Fix typos in meta.rst (#151979 ) ### Fixes made: - "allow you to the module" → corrected to "allows you to move the module" - "allow" → changed to "allows" to agree with the singular subject "method" Pull Request resolved: https://github.com/pytorch/pytorch/pull/151979 Approved by: https://github.com/colesbury	2025-04-24 01:25:09 +00:00
Syed Tousif Ahmed	334aab0dea	Updates NCCLConfig with QOS variable (#151821 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151821 Approved by: https://github.com/kwen2501	2025-04-23 00:03:49 +00:00
Scott Wolchok	2f74cffab2	Remove `reinterpret_cast`s with undefined behavior from stable/library.h (#151595 ) There is a list of valid uses of `reinterpret_cast` (see https://en.cppreference.com/w/cpp/language/reinterpret_cast), and the use here was not on the list, hence undefined behavior. Implement what we meant using memcpy, which is well-defined. Differential Revision: [D73200791](https://our.internmc.facebook.com/intern/diff/D73200791/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151595 Approved by: https://github.com/janeyx99	2025-04-22 20:24:47 +00:00
Svetlana Karslioglu	2fb1326483	Add dates to pages (#151602 ) re: #150873 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151602 Approved by: https://github.com/albanD	2025-04-21 19:53:55 +00:00
Will Constable	bedefa46a9	Document non-pytorch CUDA memory allocation and how to query it (#150880 ) This PR documents the fact that PyTorch does not have visibility into how every CUDA memory allocation happened - it only knows about allocations that went through the pytorch CUDA allocator. It also adds a code snippet showing how to use pynvml to query current GPU memory usage. ## Preview Added a note at the top of "Understanding CUDA Memory Usage" doc: <img width="732" alt="image" src="https://github.com/user-attachments/assets/69e28d2a-841a-4b1b-b886-e96fb5d76582" /> which links to a section below: <img width="733" alt="image" src="https://github.com/user-attachments/assets/cab4f252-9ac2-4fc6-a45d-fdb958fc7dbc" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/150880 Approved by: https://github.com/kwen2501, https://github.com/ngimel	2025-04-18 03:48:54 +00:00
Kashif Rasul	2ed2cb5805	add generalized pareto distribution (GPD) (#135968 ) Add the GPD as a distribution class Pull Request resolved: https://github.com/pytorch/pytorch/pull/135968 Approved by: https://github.com/albanD Co-authored-by: Alexander März <statmixedmlgit@gmail.com>	2025-04-17 18:51:02 +00:00
Svetlana Karslioglu	cd7bc60e11	Migrate to new theme (#149331 ) - Migrate pytorch docs, cpp docs and functorch docs to the pytorch_sphinx_theme2 - Migrate index.rst to markdown and restructure to use high-level horizontal bar sections Python API, Developer Notes - Added python-api.md which becomes the main container for the API docs. This file will be used to add all api references in the toctree. It would be great to have lint for this file: https://github.com/pytorch/pytorch/issues/150718 - Enabled mermaid sphinx extension and opengraph sphinx extension Pull Request resolved: https://github.com/pytorch/pytorch/pull/149331 Approved by: https://github.com/malfet, https://github.com/atalman, https://github.com/albanD	2025-04-16 21:35:19 +00:00
Pian Pawakapan	6dddd6520d	[dynamic shapes] add sym_and, sym_or (#150456 ) This has been pretty helpful for the size-oblivious rewrite. Wanted the variadic args version to avoid `sym_or(a, sym_or(b, sym_or(c, d)))` in favor of `sym_or(a, b, c, d)`. Happy to change this to ban the 1-arg version. This is better than plain and/or because the whole symbolic expression gets preserved, and if we guard on it or defer as a runtime assert, we preserve all branches. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150456 Approved by: https://github.com/laithsakka	2025-04-14 18:18:06 +00:00
fzyzcjy	50abc1ecc4	Super tiny fix typo (#151212 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/151212 Approved by: https://github.com/Skylion007	2025-04-14 16:47:40 +00:00
zeshengzong	5eebcb991a	Add scripts to generate plots of LRSchedulers (#149189 ) Fixes #92007 ## Changes - Add script to generate plots for `lr_scheduler` - Add plots to `lr_scheduler` docs - Add example section if it is missing in `lr_scheduler` docs ## Test Result ### LambdaLR ![image](https://github.com/user-attachments/assets/37fc0894-e2ec-48f2-a2d6-3514e51e1ea2) ### MultiplicativeLR ![image](https://github.com/user-attachments/assets/2122b3a0-a4ce-42c7-bb45-559c1fc73e0f) ### StepLR ![image](https://github.com/user-attachments/assets/47bc9d96-4b60-4586-a000-f213583bbe8f) ### MultiStepLR ![image](https://github.com/user-attachments/assets/c822b849-d5be-4b94-aa7a-0017a2c9ff15) ### ConstantLR ![image](https://github.com/user-attachments/assets/83107cdd-7b00-44a6-b09d-e8ee849b4a12) ### LinearLR ![image](https://github.com/user-attachments/assets/60190105-691a-4101-8966-5b0c396093a4) ### ExponentialLR ![image](https://github.com/user-attachments/assets/dfcbcbca-89e5-4a2f-b1bd-33e25d2405ec) ### PolynomialLR ![image](https://github.com/user-attachments/assets/7c3d4fce-c846-40a0-b62e-f3e81c7e08bd) ### CosineAnnealingLR ![image](https://github.com/user-attachments/assets/26712769-dde9-4faa-b61b-e23c51daef50) ### ChainedScheduler ![image](https://github.com/user-attachments/assets/20734a8b-e939-424f-b45a-773f86f020b1) ### SequentialLR ![image](https://github.com/user-attachments/assets/2cd3ed67-2a0a-4c42-9ad2-e0be090d3751) ### ReduceLROnPlateau ![image](https://github.com/user-attachments/assets/b77f641e-4810-450d-b2cd-8b3f134ea188) ### CyclicLR ![image](https://github.com/user-attachments/assets/29b8666f-41b3-45e4-9159-6929074e6108) ### OneCycleLR ![image](https://github.com/user-attachments/assets/d5b683ef-41e8-4ca8-9fe8-0f1e6b433866) ### CosineAnnealingWarmRestarts ![image](https://github.com/user-attachments/assets/1d45ea80-dea8-494d-a8ab-e9cfc94c55d6) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149189 Approved by: https://github.com/janeyx99	2025-04-14 09:53:38 +00:00
Tristan Rice	df4e5294a6	Reapply "ProcessGroupGloo: support lazy_init (#150801 )" (#151031 ) This reverts commit 73f3d6d9aaa128d9917e8b3790933ba2855066cc. Reapplies #150801 Test plan: See #150801 submodule Pull Request resolved: https://github.com/pytorch/pytorch/pull/151031 Approved by: https://github.com/fduwjj	2025-04-11 01:58:35 +00:00
Will Constable	c9a35c2a6e	[C10D] Document object collectives limitations (#150815 ) Adds louder warning labels in the doc page and docstring for object collectives in hopes of raising awareness of several footgun issues including accidental creation of cuda contexts by serializing and sending 'device-local' gpu tensors over the object-* apis. Preview: <img width="902" alt="image" src="https://github.com/user-attachments/assets/e0c08c70-d8e5-4e15-b3e2-5cd563714f71" /> addresses #150798 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150815 Approved by: https://github.com/kwen2501	2025-04-10 22:48:39 +00:00
PyTorch MergeBot	73f3d6d9aa	Revert "ProcessGroupGloo: support lazy_init (#150801 )" This reverts commit f237ee54bfb35d16cd10e358d4b78578c88a5781. Reverted https://github.com/pytorch/pytorch/pull/150801 on behalf of https://github.com/atalman due to failing internally ([comment](https://github.com/pytorch/pytorch/pull/150801#issuecomment-2793161239))	2025-04-10 13:44:31 +00:00
Yu, Guangye	6972255dad	Document poison fork note for accelerator APIs (#147507 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147507 Approved by: https://github.com/sraikund16, https://github.com/kwen2501, https://github.com/albanD	2025-04-10 02:37:37 +00:00
Tristan Rice	f237ee54bf	ProcessGroupGloo: support lazy_init (#150801 ) This adds lazy initialization support to ProcessGroupGloo via `TORCH_GLOO_LAZY_INIT` or via `create_device(..., lazy_init=True)` This is still a draft PR as there's one race condition when doing coalesced operations that needs to be fixed upstream in Gloo first. Depends on https://github.com/facebookincubator/gloo/pull/427 landing first This also updates the gloo submodule to include the required changes. Test plan: added lazy init test variants ``` pytest -v test/distributed/test_c10d_gloo.py -k Lazy ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/150801 Approved by: https://github.com/fduwjj	2025-04-09 19:29:50 +00:00
Antoine Broyelle	886d9acb0d	[docs] Add 32-bit complex to the list of dtypes (#144590 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144590 Approved by: https://github.com/janeyx99	2025-04-09 13:10:21 +00:00
zeshengzong	c9c0f8eae3	Add plot for `torch.nn.Threshold` and `torch.nn.GLU` (#150171 ) Fixes #150170 ## Changes - Add plot for `torch.nn.Threshold` and `torch.nn.GLU` - Add example output make them easier get result by users ## Test Result ![image](https://github.com/user-attachments/assets/f6c5bc46-f9b7-4db7-9797-e08d8423d1b3) ![image](https://github.com/user-attachments/assets/ad4e6c84-7b29-44f1-b7bd-9c81e4a92ef8) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150171 Approved by: https://github.com/albanD	2025-04-08 03:55:37 +00:00
ZhaoqiongZ	96f35f55e2	update get start xpu document for v2.7 (#150397 ) update get start xpu document for v2.7 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150397 Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/atalman Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-04-03 18:17:08 +00:00
Avik Chaudhuri	b70d105c77	infer dynamic shapes through additional inputs (#150144 ) Summary: Instead of explicitly specifying dynamic shapes, it is possible to infer them from additional example inputs. Together with the example inputs provided to export, we can basically make any varying dim dynamic and keep any fixed dim static. This should be useful for prod scenarios that have access to tests and/or profiling data, yet are somewhat removed from the model authoring process. However this alone is not satisfactory: the exported program by design has only one graph, representing one path through the model, and we cannot necessarily guarantee that this graph works for the additional example inputs because different guards might have been created if we had exported with them instead (corresponding to different traced paths). However, checking that the additional example inputs satisfy the guards created by the original export should be sufficient for generalization. Now, while we don't preserve all guards in the exported program, we do check a subset of them as part of input matching. So we add a verification step at the end of export when such additional example inputs are provided. This should be enough for now. Test Plan: added test (positive and negative cases) Differential Revision: D72001771 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150144 Approved by: https://github.com/bobrenjc93	2025-04-01 21:13:39 +00:00
Tianyu Liu	d2ad9aa2f2	[dtensor][tp] add a ParallelStyle PrepareModuleInputOutput (#150372 ) Needed this class because `parallelize_module` takes a dict, which doesn't allow `PrepareModuleInput` and `PrepareModuleOutput` to be applied at the same time. The `PrepareModuleInputOutput` in this PR initializes two variables `prepare_module_input` and `prepare_module_output` and uses them to process module / inputs / outputs. I had another implementation which put all code in `PrepareModuleInputOutput` and let `PrepareModuleInput` and `PrepareModuleOutput` inherit the monolithic `PrepareModuleInputOutput`. But it is 1. less clean 2. conceptually abusing inheritance because `PrepareModuleInput` shouldn't be able to access class methods of `PrepareModuleOutput` and vice versa Pull Request resolved: https://github.com/pytorch/pytorch/pull/150372 Approved by: https://github.com/wanchaol	2025-04-01 19:15:43 +00:00
Xia, Weiwen	3b0cd9b542	[Quant][PT2E] add a lowering pass for x86 backend (#149708 ) Summary This PR adds a lowering pass for x86 backend - Patterns of `dequantize -> conv/linear (-> quantize)` are fused to corresponding quantized onednn ops. - Weights are prepacked ahead of time. - Post ops of conv/linear are fused if supported. - The pass returns a `GraphModule` with the modifications mentioned above. Test plan ``` pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_lowering_to_x86 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/149708 Approved by: https://github.com/jerryzh168, https://github.com/leslie-fang-intel	2025-04-01 17:32:41 +00:00
Pian Pawakapan	103bf64a3c	[export] refactor _Dim into Dim (#149891 ) Summary: forward fix T218515233 Test Plan: test_export Differential Revision: D71769231 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149891 Approved by: https://github.com/jingsh, https://github.com/angelayi	2025-03-28 06:19:03 +00:00
Laith Sakka	6cbcdee944	Introduce guard_or_true, guard_or_false (#148430 ) some context in this document: https://docs.google.com/document/d/18nJsj-F2C_QXO7ClwzPcAUENQ-B440B43W7DdDnlDt4/edit?tab=t.0#heading=h.pgebnyi7pocj But TLDR; `guard_or_true`, `guard_or_false` are better than `guard_size_oblivious` due to : - Easier to reason about what assumptions we are making while reading the code. - Avoid size_oblivious complexity that is not needed. - Avoid unsoundness that could make `guard_size_oblivious(a==1)` be true when it's not true for some value `a` during runtime. - Less data dependent errors for some cases: ex, when doing `guard_size_oblivious(a==1)` and we know `a` is a tensor size, if it's traced with `a=u1-u2` `guard_size_oblivious(a==1)` will throw a data dependent error but `guard_or_false` will just return `False`. ### How is it different from statically_known_true?? `if(cond)`: (normal guarding) will try to evaluate statically and guard on the condition, willing to restrict input space to evaluate cond. if it fails to evaluate due to data dependent error will throw an exception (that could be converted to graph break in some situations). `statically_known_true(cond)`: would be used when you never want to add a guard (restrict your input space), but just want to do a best effort check to see if you can infer that something is true/false ONLY based on existing constraints. `guard_or_true(cond)`/`guard_or_false(cond)`: Those would be used in situations you prefer to guard and know the result of the expression over not guarding, but in case you hit a data dependent error you are ok with just returning true or false. Some reasons you might be ok with returning true/false instead could be: 1. It's an optimization I do not want to fail for not performing optimization. 2. I am willing to deviate from the normal semantics when I have unbacked for the benefit of not failing (See the doc above for more details). `definitely_true(cond)`: same as `guard_or_false(cond)` except does not try to do static eval for unbacked (planning to deprecate it and replace uses with `guard_or_false` or make it alias to `guard_or_false`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148430 Approved by: https://github.com/bobrenjc93	2025-03-27 09:34:05 +00:00
Louie Tsai	7aacbab0b3	Update Doc for Intel XPU Profiling (#134515 ) Updated below two pages for Intel XPU https://pytorch.org/docs/stable/torch.compiler_profiling_torch_compile.html https://pytorch.org/docs/stable/profiler.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/134515 Approved by: https://github.com/dvrogozh, https://github.com/malfet	2025-03-27 09:15:35 +00:00
PyTorch MergeBot	e080bac533	Revert "Introduce guard_or_true, guard_or_false (#148430 )" This reverts commit d5593ea31ceb2590336cc9815ee2c13a18db6cd7. Reverted https://github.com/pytorch/pytorch/pull/148430 on behalf of https://github.com/laithsakka due to need to fix stuff ([comment](https://github.com/pytorch/pytorch/pull/148430#issuecomment-2756701436))	2025-03-27 05:10:20 +00:00
Laith Sakka	d5593ea31c	Introduce guard_or_true, guard_or_false (#148430 ) some context in this document: https://docs.google.com/document/d/18nJsj-F2C_QXO7ClwzPcAUENQ-B440B43W7DdDnlDt4/edit?tab=t.0#heading=h.pgebnyi7pocj But TLDR; `guard_or_true`, `guard_or_false` are better than `guard_size_oblivious` due to : - Easier to reason about what assumptions we are making while reading the code. - Avoid size_oblivious complexity that is not needed. - Avoid unsoundness that could make `guard_size_oblivious(a==1)` be true when it's not true for some value `a` during runtime. - Less data dependent errors for some cases: ex, when doing `guard_size_oblivious(a==1)` and we know `a` is a tensor size, if it's traced with `a=u1-u2` `guard_size_oblivious(a==1)` will throw a data dependent error but `guard_or_false` will just return `False`. ### How is it different from statically_known_true?? `if(cond)`: (normal guarding) will try to evaluate statically and guard on the condition, willing to restrict input space to evaluate cond. if it fails to evaluate due to data dependent error will throw an exception (that could be converted to graph break in some situations). `statically_known_true(cond)`: would be used when you never want to add a guard (restrict your input space), but just want to do a best effort check to see if you can infer that something is true/false ONLY based on existing constraints. `guard_or_true(cond)`/`guard_or_false(cond)`: Those would be used in situations you prefer to guard and know the result of the expression over not guarding, but in case you hit a data dependent error you are ok with just returning true or false. Some reasons you might be ok with returning true/false instead could be: 1. It's an optimization I do not want to fail for not performing optimization. 2. I am willing to deviate from the normal semantics when I have unbacked for the benefit of not failing (See the doc above for more details). `definitely_true(cond)`: same as `guard_or_false(cond)` except does not try to do static eval for unbacked (planning to deprecate it and replace uses with `guard_or_false` or make it alias to `guard_or_false`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148430 Approved by: https://github.com/bobrenjc93	2025-03-27 02:22:20 +00:00
Tristan Rice	159e97cbcf	ProcessGroupGloo: support reduce_scatter + update support chart (#149869 ) This adds a `reduce_scatter` implementation for ProcessGroupGloo. This is a pretty naive implementation as it does 1 allreduce per rank but may be useful for testing in FSDP etc. There was an existing implementation of reduce_scatter_tensor/reduce_scatter_tensor_coalesed that has a very similar implementation but requires a fixed tensor size per rank. If users find these functions to be too slow we can address them as issues arise. Gloo now supports all major distributed operations. Quite a few of these were added by @rohan-varma and @yifuwang but they didn't update the support chart. We also have `CUDAWork` variants of most operations so those were also added to the chart. Test plan: ``` pytest -v test/distributed/test_c10d_gloo.py -k reduce_scatter ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/149869 Approved by: https://github.com/fduwjj	2025-03-25 01:16:12 +00:00
Justin Chu	2dccd70ef0	[ONNX] Clean up legacy dynamo export code (#149745 ) Clean up code that is unused and obsolete. The public `torch.onnx.dynamo_export` is kept for now but the legacy implementation is removed. Remove public option classes and OnnxRegistry that have been deprecated. Users: use torch.onnx.export(…, dynamo=True). Pull Request resolved: https://github.com/pytorch/pytorch/pull/149745 Approved by: https://github.com/titaiwangms, https://github.com/cyyever	2025-03-23 19:35:16 +00:00
Pradeep Fernando	1b08aaeafe	Supporting non-tensor-data write_size in planner write items. (#149699 ) Summary: 1. The current write item structure does not contain the amount of data that needs to be written. 2. The planner.item already has a size primitive 'tensor_storage_size' (https://fburl.com/code/7a0gsmw7), but only for tensors. 3. Right now, the only way the writer layer can get hold of this property (for non-tensor data) is to first do a lookup into the actual tensor/bytes and then calculate the nbytes. This change introduces a way to capture non-tensor data size within a write-plan item. Test Plan: Existing UT. Differential Revision: D71599725 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149699 Approved by: https://github.com/MeetVadakkanchery	2025-03-21 18:09:14 +00:00
Jing Xu	4ea580568a	update aotinductor doc for XPU support (#149299 ) as title. Since the AOTInductor feature starting from 2.7 works on Intel GPU, add the related contents into its doc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149299 Approved by: https://github.com/guangyey, https://github.com/desertfire	2025-03-21 04:40:31 +00:00
FFFrog	1dce65a82c	Fix the invalid link for FX (#149289 ) As the title stated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149289 Approved by: https://github.com/zou3519	2025-03-19 14:03:18 +00:00
FFFrog	e8a35eb7da	Add Missing Communication collectives (#147379 ) ---- - reduce_add_coalesced Pull Request resolved: https://github.com/pytorch/pytorch/pull/147379 Approved by: https://github.com/mikaylagawarecki	2025-03-19 06:59:04 +00:00
Justin Chu	010963032c	[ONNX] Create onnx_symbolic (#148905 ) In the old exporter we allow users to define a symbolic() method to bypass JIT tracing for a block of logic. We can allow users to do similar things by creating symbolic ops at export. This PR implements `torch.onnx.ops.symbolic` and `torch.onnx.ops.symbolic_multi_out` to allow users to create onnx nodes symbolically with pt2 & fx. The custom pytorch ops were designed such that the attributes are encoded to be part of a valid fx op. Users provide shape and dtype for the meta function to produce the correct fake tensor during export. An example is ![image](https://github.com/user-attachments/assets/c62f5f21-e038-456e-a71d-b9a5d0a7cd9d) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148905 Approved by: https://github.com/titaiwangms	2025-03-18 21:32:06 +00:00
Jane Xu	988827cdfb	Use schema as source of truth + support ones_like/empty_like (#149052 ) This change does 2 important things: (a) Instead of relying on IValue type as source of truth, we use the schema as the source of truth, which is important as IValue types are overloaded and can ambiguously convert incorrectly. For example, a MemoryFormat will look like an int + get converted to an int64_t vs a MemoryFormat! (b) This PR expands support for many more types to encompass way more schemas, e.g., Optional, Device, dtype, etc. The main win from this PR is the ability for aoti_torch_call_dispatcher to call TensorFactory ops like ones_like/empty_like! Pull Request resolved: https://github.com/pytorch/pytorch/pull/149052 Approved by: https://github.com/albanD	2025-03-18 02:40:54 +00:00
Justin Chu	ebabd0efdd	[ONNX] Expose verification utilities (#148603 ) Expose verification utilities to public documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148603 Approved by: https://github.com/titaiwangms	2025-03-18 02:10:34 +00:00
Leo Wang	f4bffb7461	[docs] fix autograd description on convex function case (#148658 ) The sub-gradient of minimum norm is the least steep descent direction. ```python import torch x = torch.tensor([-2, -1, 0, 1, 2.], requires_grad=True) torch.relu(x).sum().backward() print(x.grad) # tensor([0., 0., 0., 1., 1.]) y = torch.tensor([-2, -1, 0, 1, 2.], requires_grad=True) torch.abs(y).sum().backward() print(y.grad) # tensor([-1., -1., 0., 1., 1.]) ``` (How can I request a reviewer? I don't have the button on the right) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148658 Approved by: https://github.com/lezcano	2025-03-13 09:06:15 +00:00
Howard Huang	b98af95401	Fix DCP link (#148974 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148974 Approved by: https://github.com/svekars	2025-03-11 21:26:37 +00:00
Nikita Shulga	c18858d633	[MPS] Make `torch.mps.compile_shader` public (#148972 ) It was a private method in 2.6, but nothing changed in its API for 2.7 and it will likely remain the same in 2.8, so time to remove the underscore from its name Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/148972 Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/seemethere, https://github.com/albanD, https://github.com/dcci	2025-03-11 20:20:58 +00:00
Chien-Chin Huang	52acc1f955	[DSD] Update the document to mention the limitation of set_optimizer_state_dict (#148918 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/140898 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148918 Approved by: https://github.com/fduwjj, https://github.com/mori360 ghstack dependencies: #148825	2025-03-11 18:24:12 +00:00
albanD	68c12ecfe2	Move get accelerator to use build time flags when possible (#146098 ) This PR does two main things (they are in a single PR to show how the newly added APIs are used). - Add isBuilt and isAvailable APIs to the AcceleratorHook interface. See inline doc for their exact semantic - Use the newly added isBuilt for accelerator check to ensure it does not poison fork Pull Request resolved: https://github.com/pytorch/pytorch/pull/146098 Approved by: https://github.com/ngimel, https://github.com/malfet, https://github.com/EikanWang, https://github.com/jeromean Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>	2025-03-10 13:17:58 +00:00
Nichols A. Romero	08baaa7d63	[Docs][TunableOp] TunableOp documentation update (#148384 ) This PR aligns documentation to what is in the README file: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cuda/tunable/README.md and removes the prototype NOTE. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148384 Approved by: https://github.com/jeffdaily, https://github.com/svekars Co-authored-by: Svetlana Karslioglu <svekars@meta.com>	2025-03-07 21:02:49 +00:00
PyTorch MergeBot	b246cd7b82	Revert "Move get accelerator to use build time flags when possible (#146098 )" This reverts commit 17302b4bc837af079d2f6480f07ea2c99b93fb4b. Reverted https://github.com/pytorch/pytorch/pull/146098 on behalf of https://github.com/albanD due to Still fails with cuda build on a non-gpu machine ([comment](https://github.com/pytorch/pytorch/pull/146098#issuecomment-2707191770))	2025-03-07 18:59:58 +00:00
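
Note on commit 159e97cbcf (#149869) in the log above: the message describes the new `reduce_scatter` as a naive implementation that does one allreduce per rank. The idea can be sketched in a single process, assuming a sum reduction and plain Python lists in place of tensors. `allreduce_sum` and `naive_reduce_scatter` are hypothetical names used only for illustration; they are not Gloo or PyTorch APIs, and this is not the code from the PR.

```python
from typing import List


def allreduce_sum(per_rank_values: List[List[float]]) -> List[float]:
    """Simulated allreduce: elementwise sum of one buffer from every rank."""
    return [sum(column) for column in zip(*per_rank_values)]


def naive_reduce_scatter(inputs: List[List[List[float]]]) -> List[List[float]]:
    """inputs[rank][chunk] is the buffer that `rank` contributes for `chunk`.

    Returns outputs[rank], the reduced chunk owned by that rank.
    """
    world_size = len(inputs)
    outputs = []
    for owner in range(world_size):
        # One allreduce per rank: reduce chunk `owner` across all ranks.
        reduced = allreduce_sum([inputs[rank][owner] for rank in range(world_size)])
        # In a real run every rank would receive `reduced`; only the owner keeps it.
        outputs.append(reduced)
    return outputs


# Two ranks, two chunks each; the chunks may differ in length.
inputs = [
    [[1.0, 2.0], [10.0]],  # contributions from rank 0
    [[3.0, 4.0], [20.0]],  # contributions from rank 1
]
result = naive_reduce_scatter(inputs)
print(result)
```

Under these assumptions the cost is one allreduce per rank, and each rank discards every result except its own chunk, which is consistent with the commit calling the approach naive but usable for testing.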

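Note on commits d5593ea31c and 6cbcdee944 (#148430) in the log above: the message says `guard_or_true(cond)` and `guard_or_false(cond)` evaluate the condition when they can, and fall back to a fixed answer when they hit a data-dependent error. A minimal sketch of that fallback behaviour follows. It assumes the condition is modelled as a zero-argument callable and the failure as a custom exception; `DataDependentError`, `guard_or_true_sketch` and `guard_or_false_sketch` are hypothetical stand-ins, not the PyTorch functions, which operate on symbolic expressions and may also install guards.

```python
from typing import Callable


class DataDependentError(Exception):
    """Hypothetical stand-in for 'this condition cannot be decided at trace time'."""


def _guard_or(default: bool, evaluate: Callable[[], bool]) -> bool:
    """Return the evaluated condition, or `default` if it cannot be decided."""
    try:
        return evaluate()
    except DataDependentError:
        return default


def guard_or_false_sketch(evaluate: Callable[[], bool]) -> bool:
    return _guard_or(False, evaluate)


def guard_or_true_sketch(evaluate: Callable[[], bool]) -> bool:
    return _guard_or(True, evaluate)


def decidable() -> bool:
    # A condition whose value is known, e.g. a concrete size compared with 1.
    return 3 == 1


def undecidable() -> bool:
    # A condition on an unbacked value: evaluation fails instead of answering.
    raise DataDependentError("cannot decide a == 1 for unbacked a")


print(guard_or_false_sketch(decidable), guard_or_true_sketch(decidable))
print(guard_or_false_sketch(undecidable), guard_or_true_sketch(undecidable))
```

The two helpers differ only in the undecidable case, which matches the commit's framing: pick the variant whose fallback answer is safe for the call site, for example skipping an optimization.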

3248 Commits