4d833f859b
[BE] [CI] Fix aarch64 arch checks ( #165676 )
...
Instead of relying on the `TEST_CONFIG` environment variable containing `aarch64`, which is prone to errors, use the output of `$(uname -m)`, which is `aarch64` on Linux ARM systems.
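A minimal sketch of the kind of guard this enables (the variable name is illustrative, not the exact CI code):
```bash
# Hedged sketch: detect Linux ARM directly from the machine architecture.
if [[ "$(uname -m)" == "aarch64" ]]; then
  IS_AARCH64=1   # Linux ARM system
else
  IS_AARCH64=0
fi
```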
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165676
Approved by: https://github.com/huydhn , https://github.com/atalman
ghstack dependencies: #165583 , #165584
2025-10-16 22:19:53 +00:00
69b05913fb
Revert "Add mingw to docker ( #165560 )"
...
This reverts commit 5e480b8ecf870e4a466c165701ab0e9d055f2ceb.
Reverted https://github.com/pytorch/pytorch/pull/165560 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165560#issuecomment-3409814274 ))
2025-10-16 08:42:11 +00:00
23fb7e9f4b
[CI] Add arch prefix in front of op benchmark results ( #165584 )
...
This makes it possible to run x86 and aarch64 benchmarks side by side later on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165584
Approved by: https://github.com/huydhn
ghstack dependencies: #165583
2025-10-16 01:50:52 +00:00
5e480b8ecf
Add mingw to docker ( #165560 )
...
Add mingw to `pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11` docker image to support AOTI cross-compilation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165560
Approved by: https://github.com/malfet
ghstack dependencies: #165574
2025-10-16 01:31:50 +00:00
132ae8e6dd
Don't link with libnvToolsExt when building for 12.9 ( #165465 )
...
This brings back the logic from https://github.com/pytorch/pytorch/pull/161916/files#diff-bf46b4a09ca67e50622bf84fefc0d11b584ffcc24ee6cc5019cf0fc7565d81a8L170 . Building libtorch on 12.9 is otherwise failing, see https://github.com/pytorch/pytorch/actions/runs/18458531395/job/52610761895 :
```
cp: cannot stat '/usr/local/cuda/lib64/libnvToolsExt.so.1': No such file or directory
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165465
Approved by: https://github.com/atalman , https://github.com/malfet
2025-10-15 01:45:37 +00:00
a2601630cd
[vllm hash update] update the pinned vllm hash ( #164628 )
...
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml ).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164628
Approved by: https://github.com/pytorchbot
Co-authored-by: Huy Do <huydhn@gmail.com >
2025-10-12 18:26:07 +00:00
4400c5d31e
Continue to build nightly CUDA 12.9 for internal ( #163029 )
...
Revert part of https://github.com/pytorch/pytorch/pull/161916 to continue building CUDA 12.9 nightly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163029
Approved by: https://github.com/malfet
2025-10-11 08:26:47 +00:00
d41aa187ec
Add more B200 smoke test ( #165133 )
...
A follow-up to #159494 . This PR adds additional `test_scaled_matmul_cuda` tests to the smoke tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165133
Approved by: https://github.com/drisspg
2025-10-10 16:46:26 +00:00
e0abcee3b5
[Code Clean] Remove support of python3.9 ( #163846 )
...
As the title stated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163846
Approved by: https://github.com/ezyang
2025-10-10 11:11:56 +00:00
c7b57d9349
Add gfx1100 to build target for ROCm docker builds ( #165103 )
...
Fixes the issue of gfx1100 test jobs timing out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165103
Approved by: https://github.com/jeffdaily
2025-10-10 01:18:56 +00:00
e7fd296930
[CI] Add full debug build to trunk ( #164974 )
...
But do not run tests, just import torch, as a regression test for https://github.com/pytorch/pytorch/issues/164297
Test plan: Re-apply #164974 on top of this change and observe the failure in the workflows: https://github.com/pytorch/pytorch/actions/runs/18383302153/job/52375282838
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164974
Approved by: https://github.com/seemethere , https://github.com/clee2000 , https://github.com/atalman
ghstack dependencies: #164968 , #164969
2025-10-09 20:12:16 +00:00
91040f4934
Revert "[Code Clean] Remove support of python3.9 ( #163846 )"
...
This reverts commit bc1690c7e859dee8c47a7f0bbd3c43cc27c6fd2a.
Reverted https://github.com/pytorch/pytorch/pull/163846 on behalf of https://github.com/izaitsevfb due to breaks distributed tests ([comment](https://github.com/pytorch/pytorch/pull/163846#issuecomment-3386855437 ))
2025-10-09 17:27:08 +00:00
eaa02655ea
[CI] Run cpp tests on windows in one run_tests call ( #164861 )
...
The Windows cpp tests take ~1 hour according to logs. run_test was being called on each of them individually, so I batched them together into a single run_test call for all of them; I believe it now takes 30 min. I turned off TD since I don't think cpp tests are included in TD stuff.
As always with batch scripts, I'm not sure if the errorlevel/error-surfacing logic is correct.
This code was written with a lot of help from ChatGPT and Copilot.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164861
Approved by: https://github.com/huydhn
2025-10-09 16:07:28 +00:00
bc1690c7e8
[Code Clean] Remove support of python3.9 ( #163846 )
...
As the title stated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163846
Approved by: https://github.com/ezyang
2025-10-09 11:54:10 +00:00
90b4e130d6
[Benchmark] cleanup torchbench models ( #164816 )
...
Prune models from the TorchInductor dashboard to reduce CI cost. This PR prunes torchbench models according to the [doc](https://docs.google.com/document/d/1nLPNNAU-_M9Clx9FMrJ1ycdPxe-xRA54olPnsFzdpoU/edit?tab=t.0 ), removing the timm and huggingface models from torchbench.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164816
Approved by: https://github.com/anijain2305 , https://github.com/seemethere , https://github.com/huydhn , https://github.com/malfet
2025-10-09 00:31:25 +00:00
0b85236477
Fix refine_ranges corner case ( #164075 ) ( #164846 )
...
Summary:
address https://github.com/pytorch/pytorch/issues/161360
`u0 > 0` should update the range of `u0` to start from [1, ...]; it was not doing that before, and this fixes it.
Test Plan: contbuild & OSS CI, see 27234792ad
D84038721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164846
Approved by: https://github.com/izaitsevfb , https://github.com/ezyang
2025-10-08 18:42:37 +00:00
73adac05d1
Triton 3.5.x pin update to 7416ffc ( #164587 )
...
Updates triton pin to latest: https://github.com/triton-lang/triton/commits/release/3.5.x/
This update contains one cherry-pick to fix a flex_attention_fwd regression on B200:
- https://github.com/triton-lang/triton/pull/8366
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164587
Approved by: https://github.com/atalman
2025-10-08 16:07:18 +00:00
f46ddb1e65
[ROCm][CI] add gfx1150 gfx1151 to docker images for binary builds ( #164854 )
...
Fixes #164346 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164854
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-10-08 14:34:22 +00:00
f76fdcaaf8
[Benchmark] cleanup huggingface models ( #164815 )
...
Prune models from the TorchInductor dashboard to reduce CI cost. This PR prunes Hugging Face models according to the [doc](https://docs.google.com/document/d/1nLPNNAU-_M9Clx9FMrJ1ycdPxe-xRA54olPnsFzdpoU/edit?tab=t.0 ), reducing the count from 46 to 27 models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164815
Approved by: https://github.com/anijain2305 , https://github.com/seemethere , https://github.com/huydhn , https://github.com/malfet
2025-10-08 03:21:04 +00:00
955f21dc2c
[ROCm][CI] Add support for gfx1100 in rocm workflow + test skips ( #148355 )
...
This PR adds infrastructure support for gfx1100 in the rocm workflow. Nodes have been allocated for this effort.
@dnikolaev-amd contributed all the test skips.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148355
Approved by: https://github.com/jeffdaily
Co-authored-by: Dmitry Nikolaev <dmitry.nikolaev@amd.com >
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-10-07 22:36:25 +00:00
87c9fbda22
Follow up to PR 163980 for s390x ( #164464 )
...
With the same updates propagated to s390x, it now works on s390x runners too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164464
Approved by: https://github.com/atalman
2025-10-07 12:02:29 +00:00
50e077beaa
Fix outdated info in requirements-ci.txt ( #164441 )
...
Fixes installation instructions and descriptions for `numba` and `scikit-image`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164441
Approved by: https://github.com/albanD
2025-10-07 02:10:41 +00:00
8f54e27e5d
[ROCm][CI] rebuild magma binary for gfx1150 gfx1151 ( #164782 )
...
After #164763 added gfx1150 and gfx1151 to the list of targets, this PR triggers a rebuild of the magma binary for ROCm 7 with the new targets.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164782
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-10-06 23:29:21 +00:00
3912ba3e94
Revert "Fix refine_ranges corner case ( #164075 )"
...
This reverts commit 27234792add2ee9bedd84ca02dbf34f8f244bc5c.
Reverted https://github.com/pytorch/pytorch/pull/164075 on behalf of https://github.com/izaitsevfb due to fails executorch builds, see [D83938444](https://www.internalfb.com/diff/D83938444 ) ([comment](https://github.com/pytorch/pytorch/pull/164075#issuecomment-3374430964 ))
2025-10-06 22:09:39 +00:00
48b54b45d6
Replace pynvml with nvidia-ml-py in win-test.sh ( #164681 )
...
The `pynvml` package has been deprecated, so use `nvidia-ml-py` instead.
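A hedged sketch of the swap in a test script (to my understanding, `nvidia-ml-py` still exposes the `pynvml` import name, so callers don't change):
```bash
# Illustrative only; the actual change lives in win-test.sh.
pip uninstall -y pynvml     # deprecated package
pip install nvidia-ml-py    # maintained replacement
```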
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164681
Approved by: https://github.com/Aidyn-A , https://github.com/eqy
2025-10-06 21:57:26 +00:00
7e7ac2039d
[ROCm][CI] add gfx1150 gfx1151 to almalinux image ( #164763 )
...
First PR necessary to address missing gfx1151 reported in https://github.com/pytorch/pytorch/issues/164346 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164763
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-10-06 20:19:43 +00:00
9d1ab4f4bb
[CI] Limit Numba CUDA-13 patch to CUDA environments only ( #164607 )
...
The patch introduced in https://github.com/pytorch/pytorch/pull/163111 caused issues in ROCm environments. This change guards the patching logic to CUDA environments only, thus ameliorating test failures in ROCm environments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164607
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-10-04 02:39:07 +00:00
f3afbcf340
[ONNX] Bump tested onnxruntime to 1.23.0 and onnxscript to 0.5.2 ( #164440 )
...
Performs tests on the latest ONNX environment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164440
Approved by: https://github.com/justinchuby , https://github.com/albanD
2025-10-04 01:10:47 +00:00
27234792ad
Fix refine_ranges corner case ( #164075 )
...
address https://github.com/pytorch/pytorch/issues/161360
u0>0 should update the range of u0 to start from [1, ..] this fix it. it was not doing that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164075
Approved by: https://github.com/ColinPeppler
2025-10-03 23:30:46 +00:00
235b995ce1
Make sure Windows CUDA 12.8 builds follow the same arches as Linux builds ( #164470 )
...
I believe ``set TORCH_CUDA_ARCH_LIST=7.0;7.5;8.0;8.6;9.0;10.0;12.0`` is the one that is actually used. Hence, remove 6.1 to align with the Linux support.
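For reference, the aligned arch list with 6.1 dropped, written as a plain environment export (a sketch, not the exact builder script):
```bash
# Arch list aligned with the Linux builds; 6.1 is no longer included.
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;9.0;10.0;12.0"
```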
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164470
Approved by: https://github.com/tinglvv , https://github.com/nWEIdia , https://github.com/Camyll
2025-10-02 16:34:42 +00:00
566ea4e86a
Work Around exposing statically linked libstdc++ CXX11 ABI strong symbols ( #163980 )
...
Work Around for: https://github.com/pytorch/pytorch/issues/133437
Test plan:
1. Build whl in CI
2. Download
3. Run ``nm -D libtorch_cpu.so | grep "recursive_directory_iterator"``
Test with check_binary_symbols.py:
Success:
```
num_cxx11_symbols: 2326
num_pre_cxx11_symbols: 0
lib: /home/ec2-user/github/variant-repack/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
num_statically_linked_symbols (T): 0
```
It fails when "W" is used instead of "T" as the symbol type, i.e. calling ``cxx11_statically_linked_symbols = grep_symbols(lib, STATICALLY_LINKED_CXX11_ABI, symbol_type="W")``:
```
num_cxx11_symbols: 2326
num_pre_cxx11_symbols: 0
lib: /home/ec2-user/github/variant-repack/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
num_statically_linked_symbols (T): 20
Traceback (most recent call last):
File "/home/ec2-user/github/variant-repack/test/pytorch/.ci/pytorch/smoke_test/check_binary_symbolsc.py", line 130, in <module>
main()
File "/home/ec2-user/github/variant-repack/test/pytorch/.ci/pytorch/smoke_test/check_binary_symbolsc.py", line 126, in main
check_lib_statically_linked_libstdc_cxx_abi_symbols(libtorch_cpu_path)
File "/home/ec2-user/github/variant-repack/test/pytorch/.ci/pytorch/smoke_test/check_binary_symbolsc.py", line 95, in check_lib_statically_linked_libstdc_cxx_abi_symbols
raise RuntimeError(
RuntimeError: Found statically linked libstdc++ symbols (recursive_directory_iterator), but there shouldn't be any, see: ['std::filesystem::__cxx11::recursive_directory_iterator::recursion_pending() const', 'std::filesystem::__cxx11::recursive_directory_iterator::depth() const', 'std::filesystem::__cxx11::recursive_directory_iterator::options() const', 'std::filesystem::__cxx11::recursive_directory_iterator::operator*() const', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::operator bool() const', 'std::filesystem::__cxx11::recursive_directory_iterator::disable_recursion_pending()', 'std::filesystem::__cxx11::recursive_directory_iterator::pop(std::error_code&)', 'std::filesystem::__cxx11::recursive_directory_iterator::pop()', 'std::filesystem::__cxx11::recursive_directory_iterator::increment(std::error_code&)', 'std::filesystem::__cxx11::recursive_directory_iterator::recursive_directory_iterator(std::filesystem::__cxx11::path const&, std::filesystem::directory_options, std::error_code*)', 'std::filesystem::__cxx11::recursive_directory_iterator::recursive_directory_iterator(std::filesystem::__cxx11::path const&, std::filesystem::directory_options, std::error_code*)', 'std::filesystem::__cxx11::recursive_directory_iterator::~recursive_directory_iterator()', 'std::filesystem::__cxx11::recursive_directory_iterator::~recursive_directory_iterator()', 'std::filesystem::__cxx11::recursive_directory_iterator::operator=(std::filesystem::__cxx11::recursive_directory_iterator&&)', 'std::filesystem::__cxx11::recursive_directory_iterator::operator=(std::filesystem::__cxx11::recursive_directory_iterator const&)', 'std::filesystem::__cxx11::recursive_directory_iterator::operator++()', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>&&)', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::__shared_ptr()', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>&&)', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::__shared_ptr()']
```
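For context, `nm` marks strongly defined text-section symbols with `T` and weak symbols with `W`, which is why the working check greps for `T`-typed symbols. A quick manual sanity check might look like this (a sketch, not part of the PR):
```bash
# Count recursive_directory_iterator symbols per nm type (T = strong text, W = weak); sketch only.
nm -D --defined-only libtorch_cpu.so \
  | grep "recursive_directory_iterator" \
  | awk '{print $2}' | sort | uniq -c
```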
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163980
Approved by: https://github.com/isuruf , https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com >
2025-10-01 23:17:30 +00:00
773c6762b8
[CD][CUDA13][NCCL] Fix nccl version typo for cu13 ( #164383 )
...
https://pypi.org/project/nvidia-nccl-cu13/#history does not have 2.27.5 but 2.27.7+.
Companion PR: https://github.com/pytorch/pytorch/pull/164352
Fixes a potential binary breakage due to the non-existence of the referenced NCCL cu13 version.
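The mismatch is easy to confirm against PyPI (hedged: `pip index` is still an experimental subcommand):
```bash
# List the published versions of the cu13 NCCL wheel; 2.27.5 is absent, 2.27.7+ is present.
pip index versions nvidia-nccl-cu13
```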
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164383
Approved by: https://github.com/tinglvv , https://github.com/Skylion007 , https://github.com/atalman
2025-10-01 21:32:25 +00:00
2610746375
Revert nccl upgrade back to 2.27.5 ( #164352 )
...
Revert https://github.com/pytorch/pytorch/pull/162351 as it breaks H100
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164352
Approved by: https://github.com/atalman , https://github.com/malfet
2025-10-01 15:27:40 +00:00
69fa26d9b4
Triton 3.5.x pin update ( #164268 )
...
Updates triton pin to latest: https://github.com/triton-lang/triton/commits/release/3.5.x/
This update contains two cherry-picks that remove Python 3.9 from the list of supported Python versions:
https://github.com/triton-lang/triton/pull/8288
https://github.com/triton-lang/triton/pull/8287
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164268
Approved by: https://github.com/aakhundov
2025-10-01 11:41:50 +00:00
d9c80ef97d
Build and Install Arm Compute Library in manylinux docker image ( #159737 )
...
----
This PR is part of a series of PRs that aims to remove the `.ci/aarch64_linux` folder entirely, such that the Aarch64 manylinux build happens as part of `.ci/manywheel/build.sh`, the same as other platforms.
In this PR:
- We prebuild + install the Arm Compute Library in the manylinux docker image (at /acl), instead of at build time for every PyTorch build. The jammy install path is also updated to /acl.
- We can therefore remove the build_ArmComputeLibrary functions from the CI build scripts.
- There is also some refactoring of install_openblas.sh and install_acl.sh to align them with each other (similar formatting, similar variable names, same place for version number updates).
- We had 2 places defining the OpenBLAS version; this has been reduced to 1 (install_openblas.sh).
- ACL_VERSION and OPENBLAS_VERSION can now be overridden at the build.sh level by developers, but there is only 1 version of each hardcoded for CI (see the sketch below).
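A hedged sketch of what such a developer override might look like (the env var names come from the description above; the version values are placeholders, and the exact invocation may differ):
```bash
# Hypothetical local override; CI keeps a single hardcoded version of each.
export ACL_VERSION="vX.Y.Z"       # placeholder tag
export OPENBLAS_VERSION="vA.B.C"  # placeholder tag
bash .ci/manywheel/build.sh
```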
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159737
Approved by: https://github.com/seemethere , https://github.com/aditew01
2025-10-01 11:33:51 +00:00
bd0907dc4c
[BE][CI] Unify requirements ( #163396 )
...
Linux, Windows, and macOS CI workflows should all use `.ci/docker/requirements-ci.txt`.
TODOs:
- Investigate why `choco install cmake` is needed to successfully detect MKL
- Move `psutil` installation from specific scripts into requirements-ci.txt
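With the file unified, every platform can install the same pins, e.g.:
```bash
# Same requirements file for Linux, Windows, and macOS CI.
pip install -r .ci/docker/requirements-ci.txt
```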
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163396
Approved by: https://github.com/Skylion007
2025-10-01 03:28:48 +00:00
5b1c39f5a1
Add smoke tests to verify that stable ABI FA3 wheel runs w/ newer torch ( #163782 )
...
Passing CI: https://github.com/pytorch/pytorch/actions/runs/18141589975/job/51635340255?pr=163782
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163782
Approved by: https://github.com/huydhn , https://github.com/mikaylagawarecki
2025-10-01 02:30:38 +00:00
ad7e3c93b1
[ROCm][CD] librocroller.so missing from ROCm 7 wheel ( #164244 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164244
Approved by: https://github.com/jeffdaily , https://github.com/Skylion007
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-10-01 00:02:34 +00:00
e88cca0691
Update Sphinx theme ( #164147 )
...
Fix links in the top nav bar: 71e55749be
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164147
Approved by: https://github.com/albanD
2025-09-30 15:35:58 +00:00
0d7994ca97
[inductor] do comm compute overlap at aten fx level ( #163215 )
...
This is the first part of the stack that does comm/compute reordering and then uses the exposure analysis to do bucketing.
Subsequent PRs will handle:
- use of exposure analysis to do bucketing
- making sure inductor respects comm/compute overlapping done at the fx level
- non-profiling mm estimation/rank broadcasting of profile results
Other misc:
- Validate accuracy of nccl estimations (use ruisi's profiling instead?)
For a llama 2d parallelism test, on forward, we overlap all but 2 of the potentially hidden collectives. For backward, we overlap 217/269 of the potentially hidden collectives. If you increase `compute_overlap_multipler` (a fudge factor for inaccurate comms estimation), that goes to all but 16 of the potentially hidden collectives.
fwd example: https://gist.github.com/eellison/76209c49d8829c5f1e323d34a3f040c3
bwd example: https://gist.github.com/eellison/6cfc2285df53a94cfa4012f5fdae5c51
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163215
Approved by: https://github.com/IvanKobzarev
2025-09-30 04:53:58 +00:00
b7419b920d
[ROCm][CI] Upgrade ROCm to 7.0 ( #163140 )
...
Upgrade all the ROCm docker image to ROCm 7.0 release version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163140
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-09-30 02:23:26 +00:00
3b4ad4a17d
[AARCH64][CD][CUDA13][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 ( #163988 )
...
See also #163972 , which was intended to be this PR.
Triton (release/3.5.x) by default ships a CUDA 12.8 ptxas.
This PR tries to bundle a ptxas version for CUDA 13, so that it can help https://github.com/pytorch/pytorch/issues/163801 when users run on new devices like THOR and Spark.
Fixes https://github.com/pytorch/pytorch/issues/163801
Test Plan:
Check binary size increase against nightly or v2.9RC
Install the existing binary into a working THOR and GB200/GH100 machine (reproduce the original issue first on THOR), then install the binary built from this PR; we expect the issue to be gone without any additional user setting. Testing on GB200 is to ensure no regression.
Reference: https://github.com/pytorch/pytorch/pull/119750 and 5c814e2527
Note: with this PR, torch.compile on the PyTorch side is supposed to find ptxas via "torch/_inductor/runtime/compile_tasks.py" and "_set_triton_ptxas_path". Use cases that do not go through "_set_triton_ptxas_path" may not be able to use the CUDA 13 ptxas binary.
However, as is, the Triton side does not know about the existence of this new CUDA 13 ptxas. So if a user assumes there is already a pytorch/bin/ptxas and deletes the ptxas shipped with Triton, then c6ad34f7eb/python/triton/knobs.py (L216)
would still complain that ptxas is not found (if it is removed, Triton won't know the new one is available).
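As a hedged illustration of the lookup described above (it assumes the bundled binary lands at `torch/bin/ptxas` and that Triton honors the `TRITON_PTXAS_PATH` environment variable; both are assumptions here, not guarantees):
```bash
# Point Triton at the ptxas bundled with torch; path and env var are assumptions from the note above.
TORCH_BIN_DIR="$(python -c 'import os, torch; print(os.path.join(os.path.dirname(torch.__file__), "bin"))')"
ls "${TORCH_BIN_DIR}/ptxas"                    # verify the bundled binary exists
export TRITON_PTXAS_PATH="${TORCH_BIN_DIR}/ptxas"
```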
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163988
Approved by: https://github.com/atalman
2025-09-30 01:56:12 +00:00
cee4e36f9a
[BE] remove manylinuxcxx11-abi-builder:cpu-cxx11-abi docker image ( #164187 )
...
I believe this image is not used anywhere anymore.
Test:
```
git grep manylinuxcxx11-abi-builder
git grep manylinuxcxx11
```
Both commands return no results.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164187
Approved by: https://github.com/izaitsevfb , https://github.com/malfet , https://github.com/seemethere
2025-09-30 00:26:20 +00:00
170e0309ca
Bump protobuf from 5.29.4 to 5.29.5 in /.ci/docker ( #156157 )
...
* Bump protobuf from 5.29.4 to 5.29.5 in /.ci/docker
Bumps [protobuf](https://github.com/protocolbuffers/protobuf ) from 5.29.4 to 5.29.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases )
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl )
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v5.29.4...v5.29.5 )
---
updated-dependencies:
- dependency-name: protobuf
dependency-version: 5.29.5
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
* Update .ci/docker/requirements-ci.txt
---------
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com >
2025-09-29 15:20:44 -07:00
50d418f69f
Replace setup.py bdist_wheel with python -m build --wheel ( #156712 )
...
Previously, we already replaced most uses of `python setup.py develop/install`.
This PR also replaces the use of `setup.py bdist_wheel` with the modern `python -m build --wheel` alternative.
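A minimal before/after sketch of the replacement (run from the repo root; the `build` frontend must be installed, e.g. `pip install build`):
```bash
# Old invocation
python setup.py bdist_wheel
# Modern replacement; the wheel is written to dist/
python -m build --wheel
```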
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156712
Approved by: https://github.com/atalman
ghstack dependencies: #156711
2025-09-29 21:51:32 +00:00
fa54b08cd5
Replace setup.py install with pip install ( #156711 )
...
#156027 already replaced most uses of `python setup.py install`.
This PR covers a few more occurrences and adds `--no-build-isolation` in a few places.
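A hedged sketch of the substitution (exact flags vary per call site):
```bash
# Old pattern
python setup.py install
# New pattern; --no-build-isolation reuses the already-installed build dependencies
pip install --no-build-isolation .
```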
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156711
Approved by: https://github.com/atalman
2025-09-29 15:15:10 +00:00
54b38f3b46
Add operator benchmarking run to CI nightly ( #162530 )
...
This PR introduces a new "operator microbenchmark" CI workflow and GitHub Actions for operator microbenchmarks, updating test scripts and job matrices to support new parameters, and broadening the operator benchmark tests to include more data types, larger shapes, and gradient tests. The benchmark configurations now focus more on different CUDA hardware and multiple dtypes (bf16, fp16, fp32), for both compile and eager mode.
**Benchmark Configuration and Coverage:**
* Expanded operator benchmark configurations in `addmm_test.py`, `bmm_test.py`, `matmul_test.py`, and `mm_test.py` to benchmark multiple dtypes on CUDA devices, in eager and compile mode, for forward and backward runs. The configs with the tag "long" for the above-mentioned files are run in CI (see the sketch after this list).
* The CI benchmarking runs on various hardware: H100 and A100.
* The CI job also uploads the microbenchmarking outputs to a [HUD](https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch&benchmarkName=PyTorch+operator+microbenchmark ) dashboard.
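A hedged sketch of how one of the touched benchmark files can be run locally (the module path is assumed from the operator_benchmark layout; the exact CI command, including the "long" tag selection, differs):
```bash
# Run the addmm operator microbenchmark locally; layout is an assumption, not the exact CI invocation.
cd benchmarks/operator_benchmark
python -m pt.addmm_test
```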
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162530
Approved by: https://github.com/huydhn
Co-authored-by: Huy Do <huydhn@gmail.com >
2025-09-29 00:46:38 +00:00
5504a06e01
[BE]: Update NCCL to 2.28.3 ( #162351 )
...
@eqy The new NCCL has a bunch of bugfixes for features, including reducing the number of SMs needed by NVLINK collectives, as well as some very useful new APIs for SymmetricMemory. It also allows FP8 support for non-reductive operations on pre-sm90 devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162351
Approved by: https://github.com/ezyang , https://github.com/malfet , https://github.com/atalman
2025-09-28 01:38:59 +00:00
960290d629
[Docs] Add standard-imghdr for PyTorch Doc ( #163944 )
...
As the title stated.
Python [PEP 594](https://peps.python.org/pep-0594 ) removed imghdr from the Python standard library; older versions of Sphinx don't list it as an installation dependency, so we need to add it to the requirements as a temporary dependency.
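Hedged illustration: with the backport package installed, the `imghdr` import that older Sphinx versions expect resolves again (assuming `standard-imghdr` indeed provides the `imghdr` module, as its name suggests):
```bash
pip install standard-imghdr
python -c "import imghdr; print(imghdr.__name__)"   # should print: imghdr
```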
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163944
Approved by: https://github.com/albanD , https://github.com/svekars
2025-09-27 08:14:51 +00:00
96182faf96
[CI][Distributed][CUDA][Symm-Mem] Enable B200 Symm Mem Test ( #162988 )
...
Inspired by https://github.com/pytorch/pytorch/pull/162981 and motivated by https://github.com/pytorch/pytorch/pull/159323 taking a total of 20 hours to finish (and being unlikely to land soon due to https://github.com/pytorch/pytorch/issues/162178 ).
Creating this subtest to get *something distributed* running on B200.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162988
Approved by: https://github.com/malfet
2025-09-27 05:12:05 +00:00