Target determination sorts the tests in a PR CI run based on heuristics about which tests are more relevant to the PR's changes. This can provide faster CI signal and help alleviate capacity concerns, since job durations should decrease when failures are caught earlier.
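As a rough illustration (the scoring below is hypothetical, not the heuristics `run_test.py` actually uses), the reordering amounts to sorting test files by a relevance score against the changed files:
```
# Hypothetical sketch of test prioritization: order test files so the ones
# most relevant to the changed files run first.
def prioritize_tests(test_files, changed_files, relevance):
    """relevance(test_file, changed_files) -> float; higher means more relevant."""
    return sorted(test_files, key=lambda t: relevance(t, changed_files), reverse=True)

# Example heuristic: a test that shares path/name tokens with a changed file
# scores higher, so it runs (and can surface a failure) earlier in the job.
def shared_token_score(test_file, changed_files):
    tokens = set(test_file.replace("/", "_").replace(".", "_").split("_"))
    return sum(
        len(tokens & set(c.replace("/", "_").replace(".", "_").split("_")))
        for c in changed_files
    )
```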
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156545
Approved by: https://github.com/jeffdaily, https://github.com/clee2000
Sometimes a test file reports success according to pytest, but fails afterwards, and the rerun logic doesn't handle that correctly.
The name of the last test that ran is saved so reruns can be more efficient: the rerun targets that test instead of rerunning the entire file. This is usually correct. For example, if a test fails and pytest catches it, lastrun is the test that failed; if a test segfaults (which pytest doesn't catch), lastrun is the test that segfaulted. But sometimes pytest reports success while the process exits with a non-zero code; the two cases I know of are hangs and double frees at exit. In that case it's unclear which test caused the failure, so lastrun is set to the first test that ran in that session, so that the next session starts from the beginning in an attempt to reproduce the error (an alternative would be to just fail and not rerun, which might be the better option). But the rerun then happens with runsingle, which prevents lastrun from being reset (I'm not sure why; I'm fairly sure there's normally no difference between resetting and not). So lastrun becomes the last test that ran, which is not necessarily the test that caused the failure, and on the next run it starts from that last test and the process exits cleanly.
Short-term solution here: ensure lastrun is always set to the initial value if the session succeeds. This is correct even in the normal path, because the initial value shouldn't change in that case.
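A rough sketch of that invariant (the names and signature are hypothetical, not the actual variables in the rerun logic):
```
def end_of_session_lastrun(tests, lastrun_initial, exit_code, reported_success, lastrun):
    # Hypothetical sketch of the bookkeeping described above.
    if reported_success and exit_code == 0:
        # Clean session: always restore lastrun to its initial value (the
        # short-term fix). In the normal path this should be a no-op anyway.
        return lastrun_initial
    if reported_success and exit_code != 0:
        # pytest said "success" but the process exited non-zero (hang, double
        # free at exit): we don't know which test is to blame, so point lastrun
        # at the first test and replay the session from the beginning.
        return tests[0]
    # pytest itself saw the failure: lastrun already names the failing test.
    return lastrun
```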
Things that still need to be fixed:
* log says "running single test" which is not true
* no xml reports get generated here
* also no xml reports get generated on segfault
* docs for this
I think I have a PR that fixes the above, but it's old so I need to take another look.
Testing:
This is from when I was based on a commit that had a hang on macs, before I added the skips in inductor array ref:
cc862d2c14
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155853
Approved by: https://github.com/malfet
I guess this is more of an RFC
Goal:
Enable keep-going so that we get information about failures immediately. We want to be aware of failures as soon as possible, especially on the main branch, so that reverts can happen quickly.
Proposal:
A job with `keep-going` will continue through errors in `python run_test.py`. If a test fails, then before the next test runs it will upload a fake log containing enough information that viewing it tells you what failed (including any stack traces/error output) and that the log classifier can parse it to pick out a line.
I am building that log by concatenating the test logs in test/test-reports, which contain all the text output by pytest (unless someone runs with the `ci-verbose-test-logs` label). There are obviously many things this won't catch, e.g. output outside of run_test.py and some output inside of run_test.py, but it should be enough.
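Roughly, the log assembly would look like this sketch (the `.log` extension and directory layout are assumptions, not the final implementation):
```
import glob
import os

def build_fake_log(report_dir="test/test-reports"):
    # Stitch the per-test pytest output captured under test/test-reports into
    # one text blob that the log classifier can scan for a failure line.
    parts = []
    for path in sorted(glob.glob(os.path.join(report_dir, "**", "*.log"), recursive=True)):
        with open(path, errors="replace") as f:
            parts.append(f"=== {path} ===\n{f.read()}")
    return "\n".join(parts)
```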
Once the fake log is produced, it eventually gets uploaded to the ossci-raw-job-status s3 bucket, and the log classifier will read it from there to do classification. This means we will have to change the log classifier to read from this bucket as well.
I'm thinking of just adding an input parameter to the log classifier, like https://github.com/pytorch/test-infra/pull/6723/files
Also upload the temp results to a temp attribute instead of the real one
To overwrite the conclusion on HUD, I'm thinking of a lambda with an s3 put trigger on the fake log being put into s3. It would do something similar to the log classifier, mutating the entry (13a990b678/aws/lambda/log-classifier/src/network.rs (L85)) to add a new field like "will_fail": true, and it would also trigger the log classifier to run.
Then we change HUD/ClickHouse to point the raw log url at the alternate location, use the new "will_fail" field as the conclusion, and use the temp log classifier result if needed.
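A rough sketch of that trigger (the function name, payload shape, and the storage mutation are all assumptions; only the "add a will_fail field and rerun the classifier" idea comes from the proposal):
```
import json
import urllib.parse

import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    # Fired on an s3 put of a fake log: flag the job as expected to fail and
    # kick the log classifier so HUD has a classification line to show.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        mark_will_fail(bucket, key)
        lambda_client.invoke(
            FunctionName="log-classifier",  # assumed function name
            InvocationType="Event",
            Payload=json.dumps({"bucket": bucket, "key": key}),
        )

def mark_will_fail(bucket, key):
    # Placeholder: in the real system this would mutate the stored job entry
    # (as the log classifier does in network.rs) to add "will_fail": true.
    print(f"would set will_fail=true for s3://{bucket}/{key}")
```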
Why always write to a temp attribute/column? I am unsure about overwriting the real results with fake ones.
Pros:
Not many changes outside of HUD/UI
Cons:
Lots of moving parts, lots of temp fields that will require adjustment for queries, temp fields never really get deleted
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155371
Approved by: https://github.com/malfet
This is a follow-up to the reverted PR https://github.com/pytorch/pytorch/pull/148981, re-opened for visibility:
Modified TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key.
Motivation
Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.
Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid setting store_cubin = True: they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the best_config with the right Triton cache directory for the "best" kernel.
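For illustration, a sketch of how the extra field could be used (the field name and the default cache location here are assumptions, not necessarily what the PR ships):
```
import json
import os

def triton_cache_dir_for(best_config_path):
    # Read the autotuner's best_config JSON and map the stored Triton cache key
    # to the cache directory that holds the matching cubin/hsaco and IRs.
    with open(best_config_path) as f:
        cfg = json.load(f)
    key = cfg["triton_cache_hash"]  # hypothetical field name
    cache_root = os.environ.get("TRITON_CACHE_DIR", os.path.expanduser("~/.triton/cache"))
    return os.path.join(cache_root, key)
```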
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154618
Approved by: https://github.com/jansel
Related: #148920
This PR:
* Provides a helper `install_cpp_extension(extension_root)` for building C++ extensions. This is intended to be used in `TestMyCppExtension.setUpClass()`
* Updates libtorch_agnostic tests to use this
* Deletes preexisting libtorch_agnostic tests from `test/test_cpp_extensions_aot.py`
* Fixes `run_test.py` to actually run the tests in `test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py` so we don't lose coverage. They weren't being run due to logic excluding tests that start with "cpp"; this is fixed now.
After this PR, it is now possible to run:
```
python test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py
```
and the test will build the `libtorch_agnostic` extension before running the tests.
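A usage sketch along those lines (the import path is an assumption; only the helper name, its `extension_root` argument, and the setUpClass placement come from this PR):
```
import unittest

# Assumed import location; the PR only specifies the helper's name and that it
# is meant to be called from setUpClass.
from torch.testing._internal.common_utils import install_cpp_extension

class TestLibtorchAgnostic(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Build the extension once for the whole class before any test runs.
        install_cpp_extension("test/cpp_extensions/libtorch_agnostic_extension")

    def test_extension_imports(self):
        import libtorch_agnostic  # built by setUpClass above
        self.assertIsNotNone(libtorch_agnostic)

if __name__ == "__main__":
    unittest.main()
```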
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153264
Approved by: https://github.com/janeyx99
This is a follow-up to the reverted PR https://github.com/pytorch/pytorch/pull/147019:
Modified TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key.
Motivation
Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.
Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid setting store_cubin = True: they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the best_config with the right Triton cache directory for the "best" kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148981
Approved by: https://github.com/davidberard98
There are only 118 failures at the moment; mark them all with xfail to avoid new regressions.
Add `xfail_if_mps_unimplemented` decorator to distinguish between tests that call unimplemented eager op vs ones that fail for some other reason.
Added an `aten._scaled_dot_product_attention_math_for_mps` fallback to make test behavior consistent between MacOS-15 (where the fallback is in place) and MacOS-14.
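For illustration, a rough sketch of what an `xfail_if_mps_unimplemented` decorator could look like (hypothetical, not the actual implementation):
```
import functools
import unittest

def xfail_if_mps_unimplemented(test_fn):
    # Hypothetical sketch: treat a NotImplementedError from the eager MPS op
    # as the expected failure, let any other exception surface as a real
    # failure, and flag an unexpected pass so stale xfails get cleaned up.
    @functools.wraps(test_fn)
    def wrapper(self, *args, **kwargs):
        try:
            test_fn(self, *args, **kwargs)
        except NotImplementedError as e:
            raise unittest.SkipTest(f"eager MPS op not implemented: {e}")
        raise RuntimeError("unexpected pass: the eager MPS op appears to be implemented now")
    return wrapper
```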
Weird MacOS-14 specific skips:
- test_torchinductor.py::GPUTests::test_cat_extern_kernel_mps
- test_torchinductor.py::GPUTests::test_sort_transpose_mps (likely an eager bug)
- test_torchinductor.py::GPUTests::test_unaligned_input_mps
Numerous MacOS-13 skips, including a few eager hard crashes; for example, running `test_torchinductor.py::GPUTests::test_scatter5_mps` causes
```
/AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayScatter.mm:309: failed assertion `Rank of destination array (1) must be greater than or equal to inner-most dimension of indices array (3)'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150821
Approved by: https://github.com/ZainRizvi, https://github.com/dcci
ghstack dependencies: #151224, #151246, #151272, #151282, #151288
Enable TD on distributed cpu; I think the only reason it's not enabled is that I forgot to enable it.
Get rid of some of the statements that are no-ops:
* asan uses default shard
* nogpu got moved to periodic
* no windows cuda testing anymore
The only thing on pull and trunk that doesn't use TD is dynamo_wrapped, but I think it's fast enough to be OK for now; we can take another look after this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150028
Approved by: https://github.com/ZainRizvi
This PR adds two main parts:
- shim.h stable C APIs into torch::Library APIs
- a higher-level API in torch/csrc/stable/library.h that calls into this shim.h and is otherwise self-contained
Goal: custom kernel writers should be able to call the apis in the directories above in order to register their library in a way that allows their custom extension to run with a different libtorch version than it was built with.
Subplots resolved:
- Do we want a whole separate StableLibrary, or do we want to freeze torch::Library and add `m.stable_impl(cstring, void (*fn)(void **, int64_t, int64_t))` into it?
- Yes, we want a separate StableLibrary. We cannot freeze Library and it is NOT header only.
- Should I use uint64_t as the common denominator instead of void* to support 32-bit architectures better?
- Yes, and done
- Should I add a stable `def` and `fragment` when those can be done in python?
- I think we do want these --- and now they're done
- Where should library_stable_impl.cpp live? -- no longer relevant
- I need some solid test cases to make sure everything's going ok. I've intentionally thrown in a bunch of random dtypes into the signature, but I still haven't tested returning multiple things, returning nothing, complex dtypes, etc.
- Have since tested all the torch library endpoints. The others can be tested in a follow-up, to separate the components that need to be in shim.h from those that can be added later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148124
Approved by: https://github.com/albanD, https://github.com/zou3519, https://github.com/atalman
Modified TorchInductor’s autotuning flow so that each `best_config` JSON file also includes the Triton “base32” (or base64) cache key.
**Motivation**
Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.
Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid setting `store_cubin = True`: they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the best_config with the right Triton cache directory for the "best" kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147019
Approved by: https://github.com/davidberard98
The distributed tests are executed once for each backend and for each init method.
`$TEST_REPORT_SOURCE_OVERRIDE` is used such that test results from different backends are stored in different files.
The same needs to be done for the init method.
Move the setting of the variable into `test_distributed` and incorporate the init method into the name.
Useful for e.g. https://github.com/pytorch/pytorch/issues/126523
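A minimal sketch of the naming idea (the real change is in the shell test driver; the format string here is a guess, and only the variable name and the backend/init-method pairing come from this description):
```
import os

def set_report_source(backend: str, init_method: str) -> None:
    # Give each (backend, init method) combination its own report source so
    # XML results from one configuration don't overwrite another's.
    os.environ["TEST_REPORT_SOURCE_OVERRIDE"] = f"dist-{backend}-{init_method}"

set_report_source("gloo", "file")
```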
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148325
Approved by: https://github.com/clee2000
Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements
> Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target:
>
> ```python
> # Input
> assert (
> len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
>
> # Black
> assert (
> len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
> # Ruff
> assert len(policy_types) >= priority + num_duplicates, (
> f"This tests needs at least {priority + num_duplicates} many types."
> )
> ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546
Approved by: https://github.com/malfet
Split test_transformers.py into test_transformers.py and test_transformers_privateuse1.py. Currently the privateuse1 test cases in test_transformers.py are skipped since they conflict with the cuda test cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147441
Approved by: https://github.com/drisspg
When the test times are generated, the generation step doesn't know what the build environment is, because it's an environment variable. But when we index into the test times, we (previously) didn't know what the job name was. These are usually the same, but sometimes they differ, and when they differ the lookup falls back to the default entry, which can result in unbalanced sharding.
I think the job name was added to most of the CI environments at some point without me realizing it, so we can now update this code to use the job name instead, so the generation and the indexing match.
also upload stats workflow for mps
Checked that inductor_amx doesn't use default
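Schematically, the lookup becomes something like this sketch (the key names are illustrative; the real test-times file has its own schema):
```
def get_test_time(test_times: dict, job_name: str, test_file: str) -> float:
    # Index by job name first, so sharding uses the same key the test-times
    # file was generated with; only fall back to "default" (which can produce
    # unbalanced shards) when there are no recorded times for this job.
    job_times = test_times.get(job_name) or test_times.get("default", {})
    return job_times.get(test_file, 0.0)
```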
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147154
Approved by: https://github.com/huydhn
Periodically run testsuite for s390x
**Dependencies update**
Package z3-solver is updated from version 4.12.2.0 to version 4.12.6.0. This is a minor version update, so no functional change is expected.
The reason for the update is the build on s390x. pypi doesn't provide binary builds of z3-solver 4.12.2.0 or 4.12.6.0 for s390x. Unfortunately, version 4.12.2.0 fails to build with the newer gcc used on the s390x builders, but those errors are fixed in version 4.12.6.0, so this minor version bump fixes the build on s390x.
```
# pip3 install z3-solver==4.12.2.0
...
In file included from /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:53:
/tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp: In member function ‘void* region::allocate(size_t)’:
/tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/tptr.h:29:62: error: ‘uintptr_t’ does not name a type
29 | #define ALIGN(T, PTR) reinterpret_cast<T>(((reinterpret_cast<uintptr_t>(PTR) >> PTR_ALIGNMENT) + \
| ^~~~~~~~~
/tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:82:22: note: in expansion of macro ‘ALIGN’
82 | m_curr_ptr = ALIGN(char *, new_curr_ptr);
| ^~~~~
/tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:57:1: note: ‘uintptr_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
56 | #include "util/page.h"
+++ |+#include <cstdint>
57 |
```
**Python paths update**
On AlmaLinux 8 s390x, old paths:
```
python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())'
/usr/lib/python3.12/site-packages
```
Total result is `/usr/lib/python3.12/site-packages/torch;/usr/lib/python3.12/site-packages`
New paths:
```
python -c 'import site; print(";".join([x for x in site.getsitepackages()] + [x + "/torch" for x in site.getsitepackages()]))'
/usr/local/lib64/python3.12/site-packages;/usr/local/lib/python3.12/site-packages;/usr/lib64/python3.12/site-packages;/usr/lib/python3.12/site-packages;/usr/local/lib64/python3.12/site-packages/torch;/usr/local/lib/python3.12/site-packages/torch;/usr/lib64/python3.12/site-packages/torch;/usr/lib/python3.12/site-packages/torch
```
```
# python -c 'import torch ; print(torch)'
<module 'torch' from '/usr/local/lib64/python3.12/site-packages/torch/__init__.py'>
```
`pip3 install dist/*.whl` installs torch into `/usr/local/lib64/python3.12/site-packages`, and later it's not found by cmake with old paths:
```
CMake Error at CMakeLists.txt:9 (find_package):
By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Torch", but
CMake did not find one.
```
https://github.com/pytorch/pytorch/actions/runs/10994060107/job/30521868178?pr=125401
**Builders availability**
Build took 60 minutes
Tests took: 150, 110, 65, 55, 115, 85, 50, 70, 105, 110 minutes (split into 10 shards)
60 + 150 + 110 + 65 + 55 + 115 + 85 + 50 + 70 + 105 + 110 = 975 minutes used. Let's double it. It would be 1950 minutes.
We have 20 machines * 24 hours = 20 * 24 * 60 = 20 * 1440 = 28800 minutes
We currently run 5 nightly binaries builds, each on average 90 minutes build, 15 minutes test, 5 minutes upload, 110 minutes total for each, 550 minutes total. Doubling would be 1100 minutes.
That leaves 28800 - 1100 = 27700 minutes total. Periodic tests would use 1950 minutes, which leaves 25750 minutes.
Nightly binaries builds + periodic tests = 1100 + 1950 = 3050 minutes.
25750 / 3050 = 8.44, so we could run both about 8 more times for additional CI runs for any reason, and that is with a pretty good safety margin.
**Skip test_tensorexpr**
On s390x, pytorch is built without llvm.
Even if it were built with llvm, llvm currently doesn't support the features used on s390x, and the test fails with errors like:
```
JIT session error: Unsupported target machine architecture in ELF object pytorch-jitted-objectbuffer
unknown file: Failure
C++ exception with description "valOrErr INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/jit/tensorexpr/llvm_jit.h":34, please report a bug to PyTorch. Unexpected failure in LLVM JIT: Failed to materialize symbols: { (main, { func }) }
```
**Disable cpp/static_runtime_test on s390x**
Quantization is not fully supported on s390x in pytorch yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125401
Approved by: https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>