pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-06 00:54:56 +08:00

Author	SHA1	Message	Date
Catherine Lee	4e3e4c0940	tc	2025-10-28 14:58:11 -07:00
Yuanyuan Chen	a60d9e1f6d	Fix flake8 B028 warnings (#166224 ) This PR fixes flake8 B028 warning by specifying stacklevel=2 in `warnings.warn`. The advantage is that users can know more contextual information about PyTorch warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166224 Approved by: https://github.com/ezyang	2025-10-26 06:18:55 +00:00
Artem Kuzmitckii	f3b8e15f20	[AMD][gfx1100] test_decompose_mem_bound_mm.py tolerance increase (#165625 ) test_decompose_mem_bound_mm.py tolerance increase for navi3x(gfx11x) (cherry picked from commit 03c7da05f61890bbf5ae41e23c8df6d5f6805bac) from Fixes for CI HUD for gfx1100 Signed-off-by: Artem Kuzmitckii <artem.kuzmitckii@amd.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/165625 Approved by: https://github.com/jeffdaily Co-authored-by: iupaikov-amd <Iurii.Paikov@amd.com> Co-authored-by: Dmitry Nikolaev <139769634+dnikolaev-amd@users.noreply.github.com> Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-22 01:38:48 +00:00
Yuanyuan Chen	aaac8cb0f5	[1/N] Add strict parameter to Python zip calls (#165531 ) Add `strict=True/False` to zip calls in test utils. `strict=True` is passed when possible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165531 Approved by: https://github.com/Skylion007	2025-10-18 05:26:33 +00:00
Xu Han	e7091a47da	[AOTI] skip Windows XPU crashed UTs. (#165393 ) Skip some UTs, which crashed on Windows XPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165393 Approved by: https://github.com/jansel	2025-10-14 23:45:14 +00:00
Yuanyuan Chen	fb64da0791	[2/N] Use "is" in python type comparison (#165142 ) This is follow-up of #165037. It generally recommended to use `is/is not` to compare types. Therefore this series of changes apply this suggestion in the code base, and it aims to finally enabling related linter checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165142 Approved by: https://github.com/albanD	2025-10-10 15:36:44 +00:00
Jithun Nair	ee6a1ecb0a	[ROCm] Enable MI355 CI on PRs, and run full set of UTs on PRs (#160215 ) Useful to have PR testing for PRs such as https://github.com/pytorch/pytorch/pull/151360 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160215 Approved by: https://github.com/malfet, https://github.com/atalman Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-09 18:03:12 +00:00
Yuanyuan Chen	a029675f6f	More ruff SIM fixes (#164695 ) This PR applies ruff `SIM` rules to more files. Most changes are about simplifying `dict.get` because `None` is already the default value. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164695 Approved by: https://github.com/ezyang	2025-10-09 03:24:50 +00:00
Maggie Moss	086dec3235	Pyrefly suppressions 6/n (#164877 ) Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283 Almost there! Test plan: dmypy restart && python3 scripts/lintrunner.py -a pyrefly check step 1: delete lines in the pyrefly.toml file from the project-excludes field step 2: run pyrefly check step 3: add suppressions, clean up unused suppressions before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199 after: INFO 0 errors (5,064 ignored) Only four directories left to enable Pull Request resolved: https://github.com/pytorch/pytorch/pull/164877 Approved by: https://github.com/oulgen	2025-10-08 02:30:57 +00:00
amdfaa	955f21dc2c	[ROCm][CI] Add support for gfx1100 in rocm workflow + test skips (#148355 ) This PR adds infrastructure support for gfx1100 in the rocm workflow. Nodes have been allocated for this effort. @dnikolaev-amd contributed all the test skips. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148355 Approved by: https://github.com/jeffdaily Co-authored-by: Dmitry Nikolaev <dmitry.nikolaev@amd.com> Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-10-07 22:36:25 +00:00
Yuanyuan Chen	35c4130fd1	[2/N] Fix ruff warnings (#164460 ) Apply ruff `SIM` rules. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164460 Approved by: https://github.com/ezyang	2025-10-04 03:40:32 +00:00
Prachi	3ca09d65f1	[ROCm] Enable several distributed UTs (#164390 ) Increase the tolerance for the following UTs as there was a slight mismatch seen on MI200. - test_data_parallel.py:test_strided_grad_layout - test_c10d_nccl.py:test_grad_layout_1devicemodule_1replicaperprocess Skip for MI200: - test_fully_shard_training.py:test_2d_mlp_with_nd_mesh - test_2d_composability.py:test_train_parity_2d_mlp - test_fully_shard_overlap.py:test_fully_shard_training_overlap Fixes #159489 Fixes #159488 Fixes #152700 Fixes #125555 Fixes #134139 Working as is on both MI200 and MI300: Fixes #125991 Fixes #125918 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164390 Approved by: https://github.com/jeffdaily	2025-10-03 19:52:51 +00:00
Yuanyuan Chen	18e18488e8	[6/N] Apply ruff UP035 rule (#164438 ) Continued code migration to enable ruff UP035. Most changes are about moving `Callable` from typing to from collections.abc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164438 Approved by: https://github.com/ezyang	2025-10-03 00:15:32 +00:00
Anthony Barbier	bdc0a421d7	Stop parsing command line arguments every time common_utils is imported. (#156703 ) Last PR in the series to re-submit https://github.com/pytorch/pytorch/pull/134592 as smaller PRs: https://github.com/pytorch/pytorch/pull/154612 https://github.com/pytorch/pytorch/pull/154628 https://github.com/pytorch/pytorch/pull/154715 https://github.com/pytorch/pytorch/pull/154716 https://github.com/pytorch/pytorch/pull/154725 https://github.com/pytorch/pytorch/pull/154728 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156703 Approved by: https://github.com/clee2000	2025-10-02 22:22:04 +00:00
PyTorch MergeBot	39189592fd	Revert "Stop parsing command line arguments every time common_utils is imported. (#156703 )" This reverts commit ac7b4e7fe4d233dcd7f6343d42b4fa3d64bce548. Reverted https://github.com/pytorch/pytorch/pull/156703 on behalf of https://github.com/clee2000 due to failing internally D80206253, see above comment for details ([comment](https://github.com/pytorch/pytorch/pull/156703#issuecomment-3362156908))	2025-10-02 16:54:22 +00:00
Anthony Barbier	ac7b4e7fe4	Stop parsing command line arguments every time common_utils is imported. (#156703 ) Last PR in the series to re-submit https://github.com/pytorch/pytorch/pull/134592 as smaller PRs: https://github.com/pytorch/pytorch/pull/154612 https://github.com/pytorch/pytorch/pull/154628 https://github.com/pytorch/pytorch/pull/154715 https://github.com/pytorch/pytorch/pull/154716 https://github.com/pytorch/pytorch/pull/154725 https://github.com/pytorch/pytorch/pull/154728 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156703 Approved by: https://github.com/clee2000	2025-10-02 15:48:47 +00:00
Pat Vignola	702f6e703b	[MTIA] Enable deserialization for FP8 checkpoint loading (#163559 ) Summary: It looks like loading FP8 checkpoints goes through that path which wasn't enabled for MTIA beforehand, whereas loading BF16 checkpoints didn't. Differential Revision: D82997140 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163559 Approved by: https://github.com/mikaylagawarecki	2025-10-02 04:18:46 +00:00
Klaus Zimmermann	fa54b08cd5	Replace setup.py install with pip install (#156711 ) #156027 already replaced most use of `python setup.py install`. This PR only adds a few more occurrences and adds `--no-build-isolation` in a few places. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156711 Approved by: https://github.com/atalman	2025-09-29 15:15:10 +00:00
Xuehai Pan	047ae24e34	Eliminate setup.py install/develop in the codebose (#162329 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162329 Approved by: https://github.com/ezyang	2025-09-29 03:54:28 +00:00
Nikita Shulga	4027e97791	[BE] Delete `skipIfMPSOnMacOS13` (#163515 ) As PyTorch needs MacOS-14 or newer to use MPS Pull Request resolved: https://github.com/pytorch/pytorch/pull/163515 Approved by: https://github.com/Skylion007	2025-09-22 21:10:22 +00:00
Prachi Gupta	f638854e1d	[ROCm][SymmMem] re-enable UTs (#162811 ) After the UT suite moved to `MultiProcContinuousTest`, `skipIfRocm` decorator started failing rather than skipping UTs because now we spawn multiple threads before the skip decorator is taken into account and the skip decorator was raising an exception to exit the process. But, the parent process treated the child process exiting as a crash rather than a skip. Additionally, in `MultiProcContinuousTest`, if one UT fails all subsequent ones are also skipped which makes sense since there's one setup for the entire suite. However, this showed up as many failing/skipped UTs in the parity. I added multiprocess version of skip decorators for ROCm, including, `skip_if_rocm_arch_multiprocess` and `skip_if_rocm_ver_lessthan_multiprocess`. These are needed as symmetric memory feature is only supported on MI300 onwards and we need to skip them for other archs and some UTs only work after ROCm7.0. Fixes #161249 Fixes #161187 Fixes #161078 Fixes #160989 Fixes #160881 Fixes #160768 Fixes #160716 Fixes #160665 Fixes #160621 Fixes #160549 Fixes #160506 Fixes #160445 Fixes #160347 Fixes #160203 Fixes #160177 Fixes #160049 Fixes #159921 Fixes #159764 Fixes #159643 Fixes #159499 Fixes #159397 Fixes #159396 Fixes #159347 Fixes #159067 Fixes #159066 Fixes #158916 Fixes #158760 Fixes #158759 Fixes #158422 Fixes #158138 Fixes #158136 Fixes #158135 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/162811 Approved by: https://github.com/jeffdaily	2025-09-16 15:35:39 +00:00
Edward Yang	1051c7dbc2	Don't unconditionally import torch._dynamo, it's slow (#162595 ) A trivial test on OS X. Before: ``` real 0m6.550s user 0m2.532s sys 0m3.359s ``` After: ``` real 0m2.607s user 0m1.898s sys 0m3.344s ``` Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/162595 Approved by: https://github.com/albanD	2025-09-10 17:21:03 +00:00
PyTorch MergeBot	b149c7204c	Revert "port distributed pipeline test files for Intel GPU (#159033 )" This reverts commit 76a0609b6bddb2bc40f1eb4ade12885023653d59. Reverted https://github.com/pytorch/pytorch/pull/159033 on behalf of https://github.com/clee2000 due to broke test_cpp_extensions_stream_and_event.py::TestCppExtensionStreamAndEvent::test_stream_event [GH job link](https://github.com/pytorch/pytorch/actions/runs/16890370216/job/47849586456) [HUD commit link](`76a0609b6b`) note to self: bad TD ([comment](https://github.com/pytorch/pytorch/pull/159033#issuecomment-3176833314))	2025-08-11 20:44:45 +00:00
Liao, Wei	76a0609b6b	port distributed pipeline test files for Intel GPU (#159033 ) In this PR we will port all distributed pipeline test files. We could enable Intel GPU with following methods and try the best to keep the original code styles: 1. instantiate_device_type_tests() 2. use "torch.accelerator.current_accelerator()" to determine the accelerator backend 3. use "requires_accelerator_dist_backend()" to replace requires_nccl() 4. use "get_default_backend_for_device()" to get backend 5. enabled XPU for some test path 6. add TEST_MULTIACCELERATOR in common_utils for all backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159033 Approved by: https://github.com/guangyey, https://github.com/d4l3k Co-authored-by: Daisy Deng <daisy.deng@intel.com>	2025-08-11 19:43:15 +00:00
Aaron Gokaslan	beb4d7816d	[BE]: ruff PLC0207 - use maxsplit kwarg (#160107 ) Automatically replaces split with rsplit when relevant and only performs the split up to the first ( or last value). This allows early return of the split function and improve efficiency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160107 Approved by: https://github.com/albanD	2025-08-08 03:14:59 +00:00
Mwiza Kunda	0afaeb7c4e	Improve `extract_test_fn` (#158637 ) The current implementation assumes test functions are resolved as test_module.TestClass.test_fn, however this would not work for modules nested in directories e.g. inductor.test_torchinductor.TestClass.test_fn Pull Request resolved: https://github.com/pytorch/pytorch/pull/158637 Approved by: https://github.com/jbschlosser	2025-08-06 20:45:21 +00:00
Nikita Shulga	e06b110f73	[Testing] Add MPS to NATIVE_DEVICES (#153835 ) This would allow me to enable more opinfo tests against MPS device eventually and supposed to be a very simple test, but actually required minor adjustments to lots of test files, namely: - Introduce `all_mps_types_and` that is very similar to `all_types_and`, but skips `float64` - Decorate lots of tests with `@dtypesIfMPS(*all_mps_types())` - Skip `test_from_dlpack_noncontinguous` as it currently crashes (need to be fixed) - Add lots of `expectedFailureIfMPS` - Delete all `@onlyNativeDeviceTypesAnd("mps")` <sarcasm> I love how well documented this variable are </sarcasm> Pull Request resolved: https://github.com/pytorch/pytorch/pull/153835 Approved by: https://github.com/Skylion007	2025-08-05 18:57:35 +00:00
PyTorch MergeBot	356ac3103a	Revert "Stop parsing command line arguments every time common_utils is imported. (#156703 )" This reverts commit 310f901a71e53688866b14bb2f2b4c8eef9979b3. Reverted https://github.com/pytorch/pytorch/pull/156703 on behalf of https://github.com/izaitsevfb due to breaking tests internally with `assert common_utils.SEED is not None` ([comment](https://github.com/pytorch/pytorch/pull/156703#issuecomment-3152337518))	2025-08-04 20:37:39 +00:00
Anthony Barbier	310f901a71	Stop parsing command line arguments every time common_utils is imported. (#156703 ) Last PR in the series to re-submit https://github.com/pytorch/pytorch/pull/134592 as smaller PRs: https://github.com/pytorch/pytorch/pull/154612 https://github.com/pytorch/pytorch/pull/154628 https://github.com/pytorch/pytorch/pull/154715 https://github.com/pytorch/pytorch/pull/154716 https://github.com/pytorch/pytorch/pull/154725 https://github.com/pytorch/pytorch/pull/154728 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156703 Approved by: https://github.com/clee2000	2025-08-02 16:38:54 +00:00
Denghui Dong	a775c8e73e	[Profiler] Fix lost C call events problem in Python 3.12.0-3.12.4 (#155446 ) Hi team, Please help review this patch. This PR https://github.com/pytorch/pytorch/pull/150370 tried to fix the "Empty C Call Queue" problem on Python 3.12. It added C calls for each starting Python event with a callable. I found the root cause is not that we cannot get C function frames by `PyFrame_GetBack` when PythonTracer is filling start frames, but the c call event loss problem bug on Python 3.12.0-3.12.4. And that problem was fixed by `257c413cd1` on 3.12.5. So I think the https://github.com/pytorch/pytorch/pull/150370 cannot fix the problem, this patch reverts the change of it. There are solutions to fix the problem correctly, such as we can add a new monitoring callback to compensate call events of methods with C function or we can override the callback registered by `PyEval_SetProfile`. These solutions may make the code hard to maintain. ~~Since upgrading the micro version of Python is not difficult for users, we can just ignore C functions and suggest user upgrade.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/155446 Approved by: https://github.com/sraikund16	2025-07-30 16:35:51 +00:00
PyTorch MergeBot	1cffb217ef	Revert "[Profiler] Fix lost C call events problem in Python 3.12.0-3.12.4 (#155446 )" This reverts commit e88f804a2eecf967dbbf95c5643248352626dafd. Reverted https://github.com/pytorch/pytorch/pull/155446 on behalf of https://github.com/XuehaiPan due to Breaks Windows wheels ([comment](https://github.com/pytorch/pytorch/pull/155446#issuecomment-3125566269))	2025-07-28 05:29:37 +00:00
Denghui Dong	e88f804a2e	[Profiler] Fix lost C call events problem in Python 3.12.0-3.12.4 (#155446 ) Hi team, Please help review this patch. This PR https://github.com/pytorch/pytorch/pull/150370 tried to fix the "Empty C Call Queue" problem on Python 3.12. It added C calls for each starting Python event with a callable. I found the root cause is not that we cannot get C function frames by `PyFrame_GetBack` when PythonTracer is filling start frames, but the c call event loss problem bug on Python 3.12.0-3.12.4. And that problem was fixed by `257c413cd1` on 3.12.5. So I think the https://github.com/pytorch/pytorch/pull/150370 cannot fix the problem, this patch reverts the change of it. There are solutions to fix the problem correctly, such as we can add a new monitoring callback to compensate call events of methods with C function or we can override the callback registered by `PyEval_SetProfile`. These solutions may make the code hard to maintain. ~~Since upgrading the micro version of Python is not difficult for users, we can just ignore C functions and suggest user upgrade.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/155446 Approved by: https://github.com/sraikund16	2025-07-25 21:44:57 +00:00
Xuehai Pan	f903bc475c	[BE] add noqa for flake8 rule B036: found `except BaseException` without re-raising (#159043 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159043 Approved by: https://github.com/Skylion007	2025-07-25 02:56:34 +00:00
PyTorch MergeBot	b533f12120	Revert "[Profiler] Fix lost C call events problem in Python 3.12.0-3.12.4 (#155446 )" This reverts commit da94023b0205bf98c3da366f2f86e0a443f4db17. Reverted https://github.com/pytorch/pytorch/pull/155446 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. @sraikund16 can you please help validate the fix? (See D78845227 for details). You can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/155446#issuecomment-3115072504))	2025-07-24 21:46:00 +00:00
Denghui Dong	da94023b02	[Profiler] Fix lost C call events problem in Python 3.12.0-3.12.4 (#155446 ) Hi team, Please help review this patch. This PR https://github.com/pytorch/pytorch/pull/150370 tried to fix the "Empty C Call Queue" problem on Python 3.12. It added C calls for each starting Python event with a callable. I found the root cause is not that we cannot get C function frames by `PyFrame_GetBack` when PythonTracer is filling start frames, but the c call event loss problem bug on Python 3.12.0-3.12.4. And that problem was fixed by `257c413cd1` on 3.12.5. So I think the https://github.com/pytorch/pytorch/pull/150370 cannot fix the problem, this patch reverts the change of it. There are solutions to fix the problem correctly, such as we can add a new monitoring callback to compensate call events of methods with C function or we can override the callback registered by `PyEval_SetProfile`. These solutions may make the code hard to maintain. ~~Since upgrading the micro version of Python is not difficult for users, we can just ignore C functions and suggest user upgrade.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/155446 Approved by: https://github.com/sraikund16, https://github.com/cyyever	2025-07-23 20:03:52 +00:00
Simon Fan	80ac73c057	[ca] reset between tests (#158418 ) CA reset is much faster than dynamo reset, so it's probably okay to run it every time. I'm not sure if this will fix the flaky autograd tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158418 Approved by: https://github.com/jansel	2025-07-17 20:14:29 +00:00
Daisy Deng	8088958793	port 4 dynamo test files to Intel GPU (#157779 ) For https://github.com/pytorch/pytorch/issues/114850, we will port test cases to Intel GPU. Six dynamo test files were ported in PR [#156056](https://github.com/pytorch/pytorch/pull/156056) and [#156575](https://github.com/pytorch/pytorch/pull/156575.) In this PR we will port 4 more dynamo test files. We could enable Intel GPU with following methods and try the best to keep the original code styles: - instantiate_device_type_tests() - use "torch.accelerator.current_accelerator()" to determine the accelerator backend - added XPU support in decorators like @requires_gpu - enabled XPU for some test path - added xfailIfXPU to skip xpu test when there is a bug. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157779 Approved by: https://github.com/guangyey, https://github.com/jansel	2025-07-11 10:11:49 +00:00
Nikita Shulga	5e636d664a	[BE] `@serialTest` decorator must be called (#157388 ) Otherwise it turns test into a trivial one(that always succeeds), as following example demonstrates ```python import torch from torch.testing._internal.common_utils import serialTest, run_tests, TestCase class MegaTest(TestCase): @serialTest def test_foo(self): if hasattr(self.test_foo, "pytestmark"): print("foo has attr and it is", self.test_foo.pytestmark) print("foo") @serialTest() def test_bar(self): if hasattr(self.test_bar, "pytestmark"): print("bar has attr and it is", self.test_bar.pytestmark) print("bar") if __name__ == "__main__": run_tests() ``` That will print ``` test_bar (__main__.MegaTest.test_bar) ... bar has attr and it is [Mark(name='serial', args=(), kwargs={})] bar ok test_foo (__main__.MegaTest.test_foo) ... ok ---------------------------------------------------------------------- Ran 2 tests in 0.013s ``` Added assert that arg is boolean in the decorator to prevent such silent skips in the future Pull Request resolved: https://github.com/pytorch/pytorch/pull/157388 Approved by: https://github.com/clee2000	2025-07-02 19:15:19 +00:00
haozhe.zhu	53e0b9c393	refine fp32 precision api (#125888 ) Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop the "highest, high, medium" to represent fp32 internal computation data types . Instead, we will directly use the algorithm to represent it. ### Design Choice: Directly use algorithms name like "TF32", "BF16". #### Pros - The names are more informative. 'tf32' is more informative than a simple "high". - Easier to extend new algorithm like `tf32x3` #### Cons - "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can have more documents to discuss them. ### We provide a layered structure for backends/operators. ('f32' is short for 'fp32_precision') ![image](https://github.com/user-attachments/assets/f89143e5-d6a1-4865-9351-9a50439f5067) ### We provide 3 fp32 compute precision can be set: - "ieee": Not allowed to use any other internal computation data types . - "tf32": Allowed to use tf32 as internal computation data types. - "bf16": Allowed to use bf16 as internal computation data types. - "none": Precision's are not set. Can be override by its father node. ### Overriding Precision Settings Child node can be override by its father node if it is set to default. For current default settings: ``` backend = generic, op = all, precision setting = none backend = cuda, op = all, precision setting = none backend = cuda, op = conv, precision setting = tf32 backend = cuda, op = rnn, precision setting = tf32 backend = cuda, op = matmul, precision setting = none backend = matmul, op = all, precision setting = none backend = matmul, op = conv, precision setting = none backend = matmul, op = rnn, precision setting = none backend = matmul, op = matmul, precision setting = none ``` - If the user set `torch.backends.mkldnn.fp32_precision="bf16"`, his child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be override to "bf16". - If the user set `torch.backends.fp32_precision="bf16"`, `torch.backends.mkldnn.fp32_precision` and his child nodes will also we override to "bf16". ### Backward Compatible Since new API allow user to have more fine-grained control. There will be some conflict. For example, previous `torch.backends.cudnn.allow_tf32` are not enough to represent the status for `torch.backends.cudnn.rnn.fp32_precision="ieee"` and `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goal for backward compatible is - If the user only uses previous APIs, it will work as previous expectations. - If the user use new API to change the status to an un-representable status for old API, and try to access the status by old API. We will raise Runtime Error and point the document for user. ### Test Plan ``` python test/test_cuda.py -k test_fp32_precision_with_tf32 python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision python test/test_cuda.py -k test_invalid_status_for_legacy_api python test/test_mkldnn.py -k test_mlkdnn_get_set python test/test_mkldnn.py -k test_generic_precision python test/test_mkldnn.py -k test_invalid python test/test_mkldnn.py -k test_default_use_parent ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888 Approved by: https://github.com/jgong5, https://github.com/albanD Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>	2025-06-26 10:32:20 +00:00
fduwjj	4585c33e74	[symm_mem] Fix nccl test for symm mem (#156752 ) Try not to call set_device to Fixes #156569 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156752 Approved by: https://github.com/kwen2501	2025-06-26 02:59:38 +00:00
Xuehai Pan	cec2977ed2	[BE][6/16] fix typos in torch/ (#156316 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156316 Approved by: https://github.com/albanD ghstack dependencies: #156313, #156314, #156315	2025-06-23 02:57:34 +00:00
PyTorch MergeBot	3f44fdc03d	Revert "[BE][6/16] fix typos in torch/ (#156316 )" This reverts commit b210cf1ea56bcd9f937a2805d9e70d8684d25ee4. Reverted https://github.com/pytorch/pytorch/pull/156316 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](`c95f7fa874`) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))	2025-06-22 12:31:57 +00:00
Sidharth	aeaf6b59e2	[dynamo] Weblink generation when unimplemented_v2() is called (#156033 ) This PR includes the GBID weblink whenever a user encounters a graph break. I also had to include the JSON file in setup.py, so it can be part of the files that are packaged in during CI. It also fixes the issue of the hardcoded error messages stripping away one of the '/' in 'https'. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156033 Approved by: https://github.com/williamwen42	2025-06-22 11:39:31 +00:00
Xuehai Pan	b210cf1ea5	[BE][6/16] fix typos in torch/ (#156316 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156316 Approved by: https://github.com/albanD ghstack dependencies: #156313, #156314, #156315	2025-06-22 08:43:33 +00:00
Simon Fan	f1968a5e76	[ca] skip on some PYTORCH_TEST_WITH_DYNAMO=1 autograd tests (#156374 ) These aren't supported. Not sure how they passed CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/156374 Approved by: https://github.com/jansel	2025-06-21 18:33:38 +00:00
atalman	a47ca4fc74	Revert "[dynamo] Weblink generation when unimplemented_v2() is called (#156033 )" (#156546 ) Broke multiple CI jobs: dynamo/test_reorder_logs.py::ReorderLogsTests::test_constant_mutation [GH job link](https://github.com/pytorch/pytorch/actions/runs/15792695433/job/44521220864) [HUD commit link](`9de23d0c29`) This reverts commit 9de23d0c29dfac8dc0f6f234bdbcd85a6375fa81. PyTorch bot revert failed: https://github.com/pytorch/pytorch/pull/156033 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156546 Approved by: https://github.com/jansel	2025-06-21 14:10:12 +00:00
Sidharth	9de23d0c29	[dynamo] Weblink generation when unimplemented_v2() is called (#156033 ) This PR includes the GBID weblink whenever a user encounters a graph break. I also had to include the JSON file in setup.py, so it can be part of the files that are packaged in during CI. It also fixes the issue of the hardcoded error messages stripping away one of the '/' in 'https'. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156033 Approved by: https://github.com/williamwen42	2025-06-21 05:47:54 +00:00
Daniel Galvez	9ed0060225	Provide access to the cudaGraph_t underlying a CUDAGraph. (#155164 ) There are a few considerations here: 1. A user might want to modify the cudaGraph_t either during the stream capture or after the stream capture (but before instantiation). This draft implements modification after stream capture only, though support could be added for modification during stream capture by applying https://github.com/pytorch/pytorch/pull/140979/files#diff-d7302d133bb5e0890fc94de9aeea4d9d442555a3b40772c9db10edb5cf36a35cR391-R404 2. Previously, the cudaGraph_t would be destroyed before the end of capture_end() unless the user had previously called enable_debug_mode(). There is no way to implement this correctly without removing this restriction, or forcing the user to always call enable_debug_mode(). However, enable_debug_mode() is a confusing API (despite being an instance method, it would modify a static global variable; thus, putting one CUDAGraph object into debug mode puts all of them into debug mode, which is not acceptable in my opinion). Therefore, I made enable_debug_mode() into a no-op. This means that the CPU memory usage will increase after this change. I think this is likely to be fine. 3. No python bindings yet. These should be easy to add. It is probably worthwhile to take some time to make sure that the returned cudaGraph_t can be converted into the cuda-python cudaGraph_t in a reasonable, hopefully type-safe, manner (but without making cuda-python a dependency of pytorch), since I imagine most users will use the pip cuda-python package to make modifications. 4. There are two foot guns: a. The cudaGraph_t returned by raw_cuda_graph() is not owned by the user, so it will be destroyed once the owning CUDAGraph is destroyed (or calls reset()). b. The following seuquence won't work as intended: ``` g = torch.cuda.CUDAGraph() with torch.cuda.graph(g): foo() g.replay() raw_graph = g.raw_cuda_graph() modify(raw_graph) g.replay() ``` This won't work because the user must call instantiate() again after modifying cudaGraph_t. You could add a "safety" mechanism by traversing the cudaGraph_t to create a hash and seeing if the hash changes between calls to replay(), but this is likely way too expensive. I think these two foot guns are probably okay given that this a bit of an experts' API. Fixes #155106 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155164 Approved by: https://github.com/ngimel	2025-06-18 03:39:28 +00:00
Simon Fan	907d0931cc	[ca] default on in CI, with fallback for tests in test/compiled_autograd_skips/ (#155480 ) For every test that is ran with PYTORCH_TEST_WITH_DYNAMO=1, turn on compiled autograd via config if it is not skipped Pull Request resolved: https://github.com/pytorch/pytorch/pull/155480 Approved by: https://github.com/jansel ghstack dependencies: #155521	2025-06-16 18:45:03 +00:00
Aaron Orenstein	e95e8eed0a	mypy 1.16.0 (#155821 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155821 Approved by: https://github.com/ezyang, https://github.com/zou3519	2025-06-14 18:18:43 +00:00

1 2 3 4 5 ...

726 Commits