5d62b63a76
[BE] Use Python-3.14 GE build ( #165804 )
...
Python 3.14 reached general availability on Oct 7th, 2025, so we can remove all pre-release workarounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165804
Approved by: https://github.com/yangw-dev , https://github.com/Skylion007 , https://github.com/cyyever
2025-10-19 11:45:10 +00:00
fcbde24c1c
[ONNX] Remove common imports from torchlib ( #165156 )
...
The `Rank` and `IsScalar` functions are no longer used in torchlib. Requires onnxscript v0.5.4
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165156
Approved by: https://github.com/Skylion007 , https://github.com/cyyever
2025-10-17 03:25:34 +00:00
5d9b024276
Add mingw to docker ( #165560 )
...
Add mingw to `pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11` docker image to support AOTI cross-compilation
This PR triggers a docker container rebuild and upgrades the Python version from 3.13.7 to 3.13.8; it relies on https://github.com/pytorch/pytorch/pull/165667
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165560
Approved by: https://github.com/malfet
2025-10-17 00:47:01 +00:00
69b05913fb
Revert "Add mingw to docker ( #165560 )"
...
This reverts commit 5e480b8ecf870e4a466c165701ab0e9d055f2ceb.
Reverted https://github.com/pytorch/pytorch/pull/165560 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165560#issuecomment-3409814274 ))
2025-10-16 08:42:11 +00:00
5e480b8ecf
Add mingw to docker ( #165560 )
...
Add mingw to `pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11` docker image to support AOTI cross-compilation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165560
Approved by: https://github.com/malfet
ghstack dependencies: #165574
2025-10-16 01:31:50 +00:00
f3afbcf340
[ONNX] Bump tested onnxruntime to 1.23.0 and onnxscript to 0.5.2 ( #164440 )
...
Performs tests on the latest ONNX environment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164440
Approved by: https://github.com/justinchuby , https://github.com/albanD
2025-10-04 01:10:47 +00:00
566ea4e86a
Work Around exposing statically linked libstdc++ CXX11 ABI strong symbols ( #163980 )
...
Workaround for: https://github.com/pytorch/pytorch/issues/133437
Test plan:
1. Build whl in CI
2. Download
3. Run ``nm -D libtorch_cpu.so | grep "recursive_directory_iterator"``
Test with check_binary_symbols.py:
Success:
```
num_cxx11_symbols: 2326
num_pre_cxx11_symbols: 0
lib: /home/ec2-user/github/variant-repack/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
num_statically_linked_symbols (T): 0
```
Failure when using "W" instead of "T" as the symbol type, i.e. calling ``cxx11_statically_linked_symbols = grep_symbols(lib, STATICALLY_LINKED_CXX11_ABI, symbol_type="W")`` (a hedged sketch of this check follows the output below):
```
num_cxx11_symbols: 2326
num_pre_cxx11_symbols: 0
lib: /home/ec2-user/github/variant-repack/.venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
num_statically_linked_symbols (T): 20
Traceback (most recent call last):
File "/home/ec2-user/github/variant-repack/test/pytorch/.ci/pytorch/smoke_test/check_binary_symbolsc.py", line 130, in <module>
main()
File "/home/ec2-user/github/variant-repack/test/pytorch/.ci/pytorch/smoke_test/check_binary_symbolsc.py", line 126, in main
check_lib_statically_linked_libstdc_cxx_abi_symbols(libtorch_cpu_path)
File "/home/ec2-user/github/variant-repack/test/pytorch/.ci/pytorch/smoke_test/check_binary_symbolsc.py", line 95, in check_lib_statically_linked_libstdc_cxx_abi_symbols
raise RuntimeError(
RuntimeError: Found statically linked libstdc++ symbols (recursive_directory_iterator), but there shouldn't be any, see: ['std::filesystem::__cxx11::recursive_directory_iterator::recursion_pending() const', 'std::filesystem::__cxx11::recursive_directory_iterator::depth() const', 'std::filesystem::__cxx11::recursive_directory_iterator::options() const', 'std::filesystem::__cxx11::recursive_directory_iterator::operator*() const', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::operator bool() const', 'std::filesystem::__cxx11::recursive_directory_iterator::disable_recursion_pending()', 'std::filesystem::__cxx11::recursive_directory_iterator::pop(std::error_code&)', 'std::filesystem::__cxx11::recursive_directory_iterator::pop()', 'std::filesystem::__cxx11::recursive_directory_iterator::increment(std::error_code&)', 'std::filesystem::__cxx11::recursive_directory_iterator::recursive_directory_iterator(std::filesystem::__cxx11::path const&, std::filesystem::directory_options, std::error_code*)', 'std::filesystem::__cxx11::recursive_directory_iterator::recursive_directory_iterator(std::filesystem::__cxx11::path const&, std::filesystem::directory_options, std::error_code*)', 'std::filesystem::__cxx11::recursive_directory_iterator::~recursive_directory_iterator()', 'std::filesystem::__cxx11::recursive_directory_iterator::~recursive_directory_iterator()', 'std::filesystem::__cxx11::recursive_directory_iterator::operator=(std::filesystem::__cxx11::recursive_directory_iterator&&)', 'std::filesystem::__cxx11::recursive_directory_iterator::operator=(std::filesystem::__cxx11::recursive_directory_iterator const&)', 'std::filesystem::__cxx11::recursive_directory_iterator::operator++()', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>&&)', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::__shared_ptr()', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>&&)', 'std::__shared_ptr<std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack, (__gnu_cxx::_Lock_policy)2>::__shared_ptr()']
```
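For reference, a minimal sketch of the kind of check described above, assuming GNU `nm` is on PATH; the `grep_symbols` and `check_lib_statically_linked_libstdc_cxx_abi_symbols` names mirror the traceback, but the implementation is illustrative, not the actual check_binary_symbols.py.
```
import subprocess

# The libstdc++ CXX11 ABI symbol family that must not be exported as "T" (defined text symbol).
STATICALLY_LINKED_CXX11_ABI = "recursive_directory_iterator"

def grep_symbols(lib: str, pattern: str, symbol_type: str = "T") -> list[str]:
    """Return demangled dynamic symbols of `lib` whose nm type matches `symbol_type`."""
    out = subprocess.run(
        ["nm", "-D", "--demangle", lib], capture_output=True, text=True, check=True
    ).stdout
    found = []
    for line in out.splitlines():
        parts = line.split(maxsplit=2)  # "<address> <type> <symbol>"
        if len(parts) == 3 and parts[1] == symbol_type and pattern in parts[2]:
            found.append(parts[2])
    return found

def check_lib_statically_linked_libstdc_cxx_abi_symbols(lib: str) -> None:
    symbols = grep_symbols(lib, STATICALLY_LINKED_CXX11_ABI, symbol_type="T")
    print(f"num_statically_linked_symbols (T): {len(symbols)}")
    if symbols:
        raise RuntimeError(
            "Found statically linked libstdc++ symbols "
            f"({STATICALLY_LINKED_CXX11_ABI}), but there shouldn't be any, see: {symbols}"
        )

if __name__ == "__main__":
    check_lib_statically_linked_libstdc_cxx_abi_symbols("libtorch_cpu.so")
```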
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163980
Approved by: https://github.com/isuruf , https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com >
2025-10-01 23:17:30 +00:00
d9c80ef97d
Build and Install Arm Compute Library in manylinux docker image ( #159737 )
...
----
This PR will be part of a series of PRs that aim to remove the `.ci/aarch64_linux` folder entirely, such that the Aarch64 manylinux build happens as part of `.ci/manywheel/build.sh`, the same as other platforms.
In this PR:
- We prebuild and install Arm Compute Library in the manylinux docker image (at /acl) instead of building it at build time for every PyTorch build. The jammy install path was also updated to /acl.
- We can therefore remove the build_ArmComputeLibrary functions from the CI build scripts.
- install_openblas.sh and install_acl.sh are also refactored to align with each other (similar formatting, similar variable names, same place for version number updates).
- The OpenBLAS version was previously defined in 2 places; this has been reduced to 1 (install_openblas.sh).
- ACL_VERSION and OPENBLAS_VERSION can now be overridden at the build.sh level by developers, but only 1 version of each is hardcoded for CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159737
Approved by: https://github.com/seemethere , https://github.com/aditew01
2025-10-01 11:33:51 +00:00
b7419b920d
[ROCm][CI] Upgrade ROCm to 7.0 ( #163140 )
...
Upgrade all the ROCm docker images to the ROCm 7.0 release version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163140
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-09-30 02:23:26 +00:00
50d418f69f
Replace setup.py bdist_wheel with python -m build --wheel ( #156712 )
...
We previously replaced most uses of `python setup.py develop/install`.
This PR also replaces `setup.py bdist_wheel` with the modern `python -m build --wheel` alternative.
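For illustration, a hedged sketch of driving the new front end from Python; the `--outdir` value is arbitrary and the `build` package must be installed separately.
```
import subprocess
import sys

# Modern replacement for `python setup.py bdist_wheel`: a PEP 517 build via the
# `build` front end (pip install build). Drops the wheel into dist/.
subprocess.run(
    [sys.executable, "-m", "build", "--wheel", "--outdir", "dist"],
    check=True,
)
```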
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156712
Approved by: https://github.com/atalman
ghstack dependencies: #156711
2025-09-29 21:51:32 +00:00
f1260c9b9a
[ROCm][CI/CD] upgrade nightly wheels to ROCm 7.0 ( #163937 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163937
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-09-26 21:42:09 +00:00
f4eca0e3b3
Try updating ET pin in PT/PT ( #159664 )
...
Looking into resolving this: https://github.com/pytorch/pytorch/issues/159599
Test Plan: Wait for executorch CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159664
Approved by: https://github.com/malfet
2025-09-18 21:55:16 +00:00
9f783e172d
Revert "Build and Install Arm Compute Library in manylinux docker image ( #159737 )"
...
This reverts commit 582d278983b28a91ac0cedd035183f2495bb6887.
Reverted https://github.com/pytorch/pytorch/pull/159737 on behalf of https://github.com/atalman due to Sorry reverting this broke linux aarch64 CUDA nightlies [pytorch/pytorch/actions/runs/17637486681/job/50146967503](https://github.com/pytorch/pytorch/actions/runs/17637486681/job/50146967503 ) ([comment](https://github.com/pytorch/pytorch/pull/159737#issuecomment-3281398272 ))
2025-09-11 15:25:24 +00:00
80d4da893c
Revert "Put torchao (0.13.0) back to benchmark workflow ( #162227 )"
...
This reverts commit 00985970e312c3c5e674e8e14d39fe77c226600e.
Reverted https://github.com/pytorch/pytorch/pull/162227 on behalf of https://github.com/huydhn due to Crashing some inductor jobs in trunk ([comment](https://github.com/pytorch/pytorch/pull/162227#issuecomment-3276355034 ))
2025-09-10 20:11:37 +00:00
ab0694f1c6
[ROCm][Inductor][CK backend] Install rocm-composable-kernel python package on ROCm Linux CI docker images ( #162288 )
...
Reopened from #158747, which was reverted because the wheel cannot be built without setuptools-scm in the pytorch index URL.
We reconsider the original PR's idea of introducing CK as a PyTorch dependency on ROCm Linux and instead install the CK Python package in CI only, since (1) rocm-composable-kernel depends on setuptools-scm, which depends on tomli, and the existing index URLs would need to be modified to host the new packages, and (2) there is also a packaging [bug](https://github.com/pypa/setuptools/issues/3269#issuecomment-1254507377) in Ubuntu 22.04 which prevents correct dynamic version calculation with the default system pip.
Extras:
- This PR reconsiders how the TORCHINDUCTOR_CK_DIR env variable is used; previously this variable pointed to the rocm-composable-kernel package installation path on the filesystem; now the path is inferred by trying to import ck4inductor (see the sketch after this list).
- The tests are updated to reflect this change.
- Since clang in CI points to a bash script which invokes sccache, we cannot patch PATH to exclude sccache; this logic is removed from the testing code.
- On gfx942, the scaled_mm test crashes when benchmarking happens in the main process and times out when it happens in a subprocess, so it is disabled.
TBD: roll back the rocm-mi300 workflow before merging
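A minimal sketch of the path-inference idea from the first bullet, assuming the `ck4inductor` package layout; the fallback to TORCHINDUCTOR_CK_DIR is illustrative and may not match the exact precedence Inductor uses.
```
import os

def infer_ck_dir() -> str | None:
    """Locate composable_kernel sources by importing ck4inductor instead of
    requiring TORCHINDUCTOR_CK_DIR to be set."""
    try:
        import ck4inductor  # shipped by the rocm-composable-kernel wheel
        return os.path.dirname(ck4inductor.__file__)
    except ImportError:
        # Fall back to the old environment variable if the package is absent.
        return os.environ.get("TORCHINDUCTOR_CK_DIR")

print(infer_ck_dir())
```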
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162288
Approved by: https://github.com/jeffdaily
2025-09-10 19:33:40 +00:00
582d278983
Build and Install Arm Compute Library in manylinux docker image ( #159737 )
...
----
This PR will be part of a series of PRs that aim to remove the `.ci/aarch64_linux` folder entirely, such that the Aarch64 manylinux build happens as part of `.ci/manywheel/build.sh`, the same as other platforms.
In this PR:
- We prebuild and install Arm Compute Library in the manylinux docker image (at /acl) instead of building it at build time for every PyTorch build. The jammy install path was also updated to /acl.
- We can therefore remove the build_ArmComputeLibrary functions from the CI build scripts.
- install_openblas.sh and install_acl.sh are also refactored to align with each other (similar formatting, similar variable names, same place for version number updates).
- The OpenBLAS version was previously defined in 2 places; this has been reduced to 1 (install_openblas.sh).
- ACL_VERSION and OPENBLAS_VERSION can now be overridden at the build.sh level by developers, but only 1 version of each is hardcoded for CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159737
Approved by: https://github.com/seemethere
ghstack dependencies: #160078
2025-09-10 15:39:38 +00:00
00985970e3
Put torchao (0.13.0) back to benchmark workflow ( #162227 )
...
0.13.0 was released on Sep 3rd (https://pypi.org/project/torchao/#history), which should have fixed the crashing issue with transformers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162227
Approved by: https://github.com/malfet
2025-09-10 03:56:25 +00:00
145a3a7bda
[CUDA 13][cuDNN] Bump CUDA 13 to cuDNN 9.13.0 ( #162268 )
...
Fixes some `d_qk` != `d_v` cases on Hopper that are broken by cuDNN 9.11-9.12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162268
Approved by: https://github.com/drisspg , https://github.com/Skylion007
2025-09-06 01:59:03 +00:00
71992dd805
S390x: build nightly binaries for new pythons ( #161920 )
...
Enable Python 3.13t, 3.14, and 3.14t on s390x for nightly binaries
Fixes #161515
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161920
Approved by: https://github.com/malfet
2025-09-03 17:38:38 +00:00
e304ea4e69
Revert "[BE] Update xpu driver repo for CD used almalinux 8.10 ( #157356 )"
...
This reverts commit c78bbdf4102d2c13bf6aa1abe4352aa7bca401ca.
Reverted https://github.com/pytorch/pytorch/pull/157356 on behalf of https://github.com/chuanqi129 due to This PR has performance regression on some workloads ([comment](https://github.com/pytorch/pytorch/pull/157356#issuecomment-3245319046 ))
2025-09-02 13:20:38 +00:00
303f514d5b
[CI] Add basic CUDA 13.0 periodic test ( #161013 )
...
https://github.com/pytorch/pytorch/issues/159779
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161013
Approved by: https://github.com/atalman
Co-authored-by: Andrey Talman <atalman@fb.com >
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com >
2025-08-29 17:56:33 +00:00
cbc53b7696
Update pybind11 submodule to 3.0.1 ( #160754 )
...
Upgrade to PyBind11 v3. This allows us to strip out our own (possibly broken?) handling of the C++ ABI when building extensions, in favor of the more-complete PyBind11 internal handling.
Fixes a few test failures due to https://github.com/pybind/pybind11/issues/5774 , which effectively makes the `__qualname__` attribute of functions platform-dependent.
Test plan: CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160754
Approved by: https://github.com/Skylion007
2025-08-27 21:15:01 +00:00
06c7516994
[BE] Upgrade XPU support package to 2025.2 ( #158733 )
...
Including the changes below:
- Add XPU support package 2025.2 build and test in CI for both Linux and Windows
- Keep XPU support package 2025.1 build in CI to ensure no break issue until PyTorch 2.9 release
- Upgrade XPU support package from 2025.1 to 2025.2 in CD for both Linux and Windows
- Rename Linux CI job name & image name to n & n-1
- Update XPU runtime pypi packages dependencies of CD wheels
- Remove deprecated support package version docker image build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158733
Approved by: https://github.com/EikanWang , https://github.com/atalman
2025-08-27 19:33:38 +00:00
1b34e04485
Revert "Update pybind11 submodule to 3.0.1 ( #160754 )"
...
This reverts commit 660b0b8128181d11165176ea3f979fa899f24db1.
Reverted https://github.com/pytorch/pytorch/pull/160754 on behalf of https://github.com/atalman due to please see https://github.com/pytorch/pytorch/pull/160754#issuecomment-3226051449 ([comment](https://github.com/pytorch/pytorch/pull/160754#issuecomment-3226078102 ))
2025-08-26 23:35:22 +00:00
ae8d319fd4
Update NVSHMEM to 3.3.24 and fix download link ( #161321 )
...
https://github.com/pytorch/pytorch/issues/159779
Update NVSHMEM to 3.3.24 for [PyTorch CUDA13 Binary Cannot Be Built with SM_75 with NVSHMEM](https://github.com/pytorch/pytorch/issues/160980).
Re-enabled sm_75 for NVSHMEM.
Fixed the NVSHMEM download link, addressing the 3.3.20 download issue [[CD] nvshem-3.3.9 wheels for aarch64 is not manylinux2_28 compliant](https://github.com/pytorch/pytorch/issues/160425).
TODO: also re-enable the ARM build with NVSHMEM, since it is compatible with manylinux2_28.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161321
Approved by: https://github.com/Skylion007 , https://github.com/atalman
2025-08-26 13:26:18 +00:00
660b0b8128
Update pybind11 submodule to 3.0.1 ( #160754 )
...
Upgrade to PyBind11 v3. This allows us to strip out our own (possibly broken?) handling of the C++ ABI when building extensions, in favor of the more-complete PyBind11 internal handling.
Fixes a few test failures due to https://github.com/pybind/pybind11/issues/5774 , which effectively makes the `__qualname__` attribute of functions platform-dependent.
Test plan: CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160754
Approved by: https://github.com/Skylion007
2025-08-26 01:21:18 +00:00
0d9da384ef
Bump onnxscript to 0.4.0 in CI ( #161312 )
...
Use onnxscript APIs for torch 2.9.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161312
Approved by: https://github.com/titaiwangms , https://github.com/malfet
2025-08-22 23:23:08 +00:00
dc200066cf
[ONNX] Use onnxruntime 1.22 in CI ( #160924 )
...
Use onnxruntime 1.22 in CI to enable testing of newer opsets and IR versions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160924
Approved by: https://github.com/titaiwangms
2025-08-19 00:05:26 +00:00
87d6831b2e
Add CUDA installation script for CUDA 13 ( #160201 )
...
Add the almalinux docker image for building magma-cuda 13.0
https://github.com/pytorch/pytorch/issues/159779
Also fixed the NVSHMEM download link
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160201
Approved by: https://github.com/atalman
Co-authored-by: Andrey Talman <atalman@fb.com >
2025-08-18 17:26:25 +00:00
a84541c73f
Update transformers version automatically with Dependabot ( #160635 )
...
My proposal here is to use GitHub Dependabot to make sure that `transformers` version used in CI are always up-to-date. To achieve this, this PR does 2 things:
1. Pin the `transformers` version across all CI jobs in only one place, `.ci/docker/ci_commit_pins/huggingface.txt`. This file is now a regular pip requirements file instead of a pinned-commit text file. There isn't any need to pin `transformers` to a specific commit, and the file already refers to the stable version `v4.54.0`.
2. Create `.github/dependabot.yml` to configure the bot to update `transformers` automatically when there is a new version. The configured labels will ensure that the right reviewers from torch.compile and Dev Infra are notified. I'm not sure how to test this out in a PR, but it feels ok to land and test this in main. If this works, we should see a PR to update `v4.54.0` to the current latest `v4.55.0`.
### Reference
https://docs.github.com/en/code-security/dependabot/working-with-dependabot/dependabot-options-reference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160635
Approved by: https://github.com/ZainRizvi
2025-08-16 05:53:39 +00:00
7bd4cfaef4
[BE] Update nvshem dependency to 3.3.20 ( #160458 )
...
Which is manylinux2_28 compatible, even on the aarch64 platform.
The archive contents and URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works.
Package `libnvshmem_host.so.3` into the gigantic aarch64+CUDA wheel.
Should fix https://github.com/pytorch/pytorch/issues/160425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458
Approved by: https://github.com/Skylion007 , https://github.com/kwen2501 , https://github.com/nWEIdia , https://github.com/atalman , https://github.com/tinglvv
2025-08-16 02:00:57 +00:00
c015e53d37
Revert "[BE] Update nvshem dependency to 3.3.20 ( #160458 )"
...
This reverts commit e0488d9f00865fb56c931580c80e099771c6285e.
Reverted https://github.com/pytorch/pytorch/pull/160458 on behalf of https://github.com/wdvr due to need to rerun workflow generation (failing workflow-checks) ([comment](https://github.com/pytorch/pytorch/pull/160458#issuecomment-3193133706 ))
2025-08-16 01:47:42 +00:00
e0488d9f00
[BE] Update nvshem dependency to 3.3.20 ( #160458 )
...
Which is manylinux2_28 compatible, even on the aarch64 platform.
The archive contents and URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works.
Package `libnvshmem_host.so.3` into the gigantic aarch64+CUDA wheel.
Should fix https://github.com/pytorch/pytorch/issues/160425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458
Approved by: https://github.com/Skylion007 , https://github.com/kwen2501 , https://github.com/nWEIdia , https://github.com/atalman , https://github.com/tinglvv
2025-08-16 00:50:13 +00:00
01bcf9a40d
Bump transformers pin ( #159291 )
...
Trying to update hf pin.
Benchmarking run to figure out issues
Retrying - https://github.com/pytorch/pytorch/pull/156118
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159291
Approved by: https://github.com/BoyuanFeng , https://github.com/huydhn
Co-authored-by: Huy Do <huydhn@gmail.com >
2025-08-12 05:14:17 +00:00
334ecbd4ff
Add torchao to install_inductor_benchmark_deps cleanup stage ( #160191 )
...
It looks like `torchao` was missed from the cleanup during torchbench setup.
Fixes #160188
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160191
Approved by: https://github.com/huydhn
2025-08-08 22:18:41 +00:00
ee1fb43450
Fix docker image creation ( #158634 )
...
Since switching from wheel 0.34.2 to wheel 0.45.1, Python symlinks are no longer created correctly.
Migrate to the `packaging` package for symlink creation.
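A hedged sketch of the general approach, assuming the symlink targets are derived by parsing the interpreter version with `packaging.version` rather than `wheel` internals; the paths and naming here are illustrative, not the exact Docker build script.
```
import os
from packaging.version import Version  # pip install packaging

def link_python(prefix: str, full_version: str) -> None:
    """Create `python` and `python3` symlinks to pythonX.Y under <prefix>/bin."""
    v = Version(full_version)              # e.g. "3.13.8"
    target = f"python{v.major}.{v.minor}"  # -> "python3.13"
    for alias in ("python", "python3"):
        link = os.path.join(prefix, "bin", alias)
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(target, link)

# Example (hypothetical prefix): link_python("/opt/python/cp313-cp313", "3.13.8")
```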
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158634
Approved by: https://github.com/malfet
2025-08-07 17:41:47 +00:00
d20c4c20e6
[CI] Update xpu ci use rolling driver for new features ( #158340 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158340
Approved by: https://github.com/seemethere
Co-authored-by: xinan.lin <xinan.lin@intel.com >
2025-08-07 15:18:51 +00:00
2231c3ca3a
[CI][CD] Fix install_nvshem function ( #159907 )
...
When building the CD docker image, all CUDA dependencies must be installed into the `/usr/local/cuda/` folder (a small sanity-check sketch follows the log below).
Test plan: Look at the binary build logs, for example [here](https://github.com/pytorch/pytorch/actions/runs/16768141521/job/47477380147?pr=159907):
```
2025-08-06T05:58:00.7347471Z -- NVSHMEM_HOME set to: ''
2025-08-06T05:58:00.7348378Z -- NVSHMEM wheel installed at: ''
2025-08-06T05:58:00.7392528Z -- NVSHMEM_HOST_LIB: '/usr/local/cuda/lib64/libnvshmem_host.so'
2025-08-06T05:58:00.7393251Z -- NVSHMEM_DEVICE_LIB: '/usr/local/cuda/lib64/libnvshmem_device.a'
2025-08-06T05:58:00.7393792Z -- NVSHMEM_INCLUDE_DIR: '/usr/local/cuda/include'
2025-08-06T05:58:00.7394252Z -- NVSHMEM found, building with NVSHMEM support
```
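A small hedged sanity check of that layout; the header path is an assumption beyond what the log prints.
```
import os

# Paths CMake resolved in the log above once NVSHMEM lands under /usr/local/cuda.
expected = [
    "/usr/local/cuda/lib64/libnvshmem_host.so",
    "/usr/local/cuda/lib64/libnvshmem_device.a",
    "/usr/local/cuda/include/nvshmem.h",  # assumed header name; the log only shows the include dir
]
missing = [p for p in expected if not os.path.exists(p)]
if missing:
    raise SystemExit(f"NVSHMEM not fully installed into /usr/local/cuda: {missing}")
print("NVSHMEM found, building with NVSHMEM support")
```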
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159907
Approved by: https://github.com/Skylion007 , https://github.com/ngimel
2025-08-06 14:44:37 +00:00
49abc0e3f8
[Take 2] Setup TorchBench in Docker ( #159300 )
...
Fix and reland https://github.com/pytorch/pytorch/pull/158613. I keep `checkout_install_torchbench` in the `.ci/pytorch/macos-test.sh` script because it's still used there and there is no Docker.
### Testing
MacOS perf nightly run https://github.com/pytorch/pytorch/actions/runs/16580798470
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159300
Approved by: https://github.com/ZainRizvi
2025-08-05 23:47:42 +00:00
135762ea20
Unpin helion ( #159579 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159579
Approved by: https://github.com/jansel
2025-08-01 23:08:06 +00:00
b4619f0272
Pin Helion to 0.0.10 in PyTorch CI ( #159420 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159420
Approved by: https://github.com/aorenste , https://github.com/malfet
2025-07-29 22:06:50 +00:00
08ea8fccaf
[ez][docker] Remove some unused vars and scripts ( #158680 )
...
`CUDNN_VERSION` isn't used in any Dockerfiles; it's picked automatically based on the CUDA version in `install_cuda.sh`
`install_cudnn.sh` isn't used anywhere; cuDNN installation happens in `install_cuda.sh`
I didn't find any mentions of `GRADLE_VERSION` or `TENSORRT_VERSION`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158680
Approved by: https://github.com/janeyx99 , https://github.com/atalman , https://github.com/malfet
2025-07-28 21:44:47 +00:00
d26ab281d2
Revert "Setup TorchBench in Docker ( #158613 )"
...
This reverts commit d72ebefe3fa7d3ee0e9c9b399f5c07611e790664.
Reverted https://github.com/pytorch/pytorch/pull/158613 on behalf of https://github.com/XuehaiPan due to checkout_install_torchbench function is removed but still referenced in trunk ([comment](https://github.com/pytorch/pytorch/pull/158613#issuecomment-3125695250 ))
2025-07-28 06:19:00 +00:00
d72ebefe3f
Setup TorchBench in Docker ( #158613 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-07-26 12:56:03 -07:00
16c0ccd669
[ROCm][CI] upgrade to 6.4.2 patch release ( #158887 )
...
Upgrade to ROCm 6.4.2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158887
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-07-25 03:45:44 +00:00
ee97dbf2e7
[ROCm][CI] update HIP patch for 6.4.1, again ( #159001 )
...
Another fix for hipGraph capture of MIOpen OCL kernels.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159001
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-07-24 16:36:19 +00:00
9281625a9b
Revert "Setup TorchBench in Docker ( #158613 )"
...
This reverts commit cab28330f8c49cdb66d6a299755dc09c87c14a9d.
Reverted https://github.com/pytorch/pytorch/pull/158613 on behalf of https://github.com/ZainRizvi due to Seems to have broken trunk. See [GH job link](https://github.com/pytorch/pytorch/actions/runs/16429779764/job/46430634676) [HUD commit link](b3c868d603) ([comment](https://github.com/pytorch/pytorch/pull/158613#issuecomment-3100023071))
2025-07-22 00:12:49 +00:00
cab28330f8
Setup TorchBench in Docker ( #158613 )
...
This reduces the time spent setting up TorchBench on A100/H100 by another half an hour
### Testing
* H100 benchmark https://github.com/pytorch/pytorch/actions/runs/16396172453. Once this is done, I will review the results on [HUD](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Fri%2C%2011%20Jul%202025%2023%3A01%3A24%20GMT&stopTime=Fri%2C%2018%20Jul%202025%2023%3A01%3A24%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=gh/huydhn/6/head&lCommit=14a38c719b29a19f518239b5edb084838ac5d2fb&rBranch=main&rCommit=0a99b026d6bd0f67dc2c0a20fe3228ddc4144854) to confirm that all models are there
* A100 benchmark https://github.com/pytorch/pytorch/actions/runs/16396173932
Signed-off-by: Huy Do <huydhn@gmail.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158613
Approved by: https://github.com/janeyx99
2025-07-21 22:34:08 +00:00
0e46f54286
[ROCm][CI] update HIP patch for 6.4.1 ( #158651 )
...
The patch is intended to fix hipGraph capture for some MIOpen kernels.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158651
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-07-21 22:09:36 +00:00
2df2e3bb51
[ROCm][CI] Last known good HIP patch ( #158596 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158596
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-07-17 22:52:16 +00:00