Compare commits

..

59 Commits

Author SHA1 Message Date
21fec65781 Use linux.g4dn.4xlarge.nvidia.gpu for cuda 12.4 legacy driver tests (#164172)
Use linux.g4dn.4xlarge.nvidia.gpu for cuda 12.4 legacy driver tests (#163956)

Workaround for https://github.com/pytorch/pytorch/issues/163658

Looks like the workflow passes on 12.8 build that use inux.g4dn.4xlarge.nvidia.gpu but its failing on 12.6 builds that use linux.4xlarge.nvidia.gpu: https://github.com/pytorch/pytorch/actions/runs/17953843505/job/51080623612#step:13:470

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163956
Approved by: https://github.com/malfet


(cherry picked from commit 349c960970f4e29eff0d37a9b3c1ca5ed86a121a)

Co-authored-by: atalman <atalman@fb.com>
Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
2025-09-29 16:14:37 -04:00
22d46b50ec [CUDA] revert PR 130472 (#163379)
[CUDA] revert PR 130472 (#162950)

This change may also resolve https://github.com/pytorch/pytorch/issues/161789, though verification is still needed.

PR #130472 would introduced the problem of  freeing the same address without clean metadata. according to the below discussion, reverted it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162950
Approved by: https://github.com/ngimel, https://github.com/eqy, https://github.com/syed-ahmed

(cherry picked from commit 4a160dae3cabaff358a6bb2490d0160dd1bf2cdf)

Co-authored-by: thenumberouscode <dream20151224@163.com>
2025-09-29 16:05:26 -04:00
d1b63e2b4a Skip test_conv3d_cudnn_broken on ROCM (#164163)
Skip test_conv3d_cudnn_broken on ROCM (#164138)

Followup after https://github.com/pytorch/pytorch/pull/163903  Fixes https://github.com/pytorch/pytorch/issues/164137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164138
Approved by: https://github.com/Camyll

(cherry picked from commit 95be302889b8683b7ec7793a69ffa8891b6b5af8)

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-09-29 11:41:18 -07:00
20100b7210 [c10d] P2P tensors must be dense (#163981)
[c10d] P2P tensors must be dense (#163719)

Fixes #161324
by adding `is_non_overlapping_and_dense` check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163719
Approved by: https://github.com/ngimel

(cherry picked from commit 11a231ef52841a549913b7a6d423cc9004b6b58b)

Co-authored-by: Ke Wen <kw2501@meta.com>
2025-09-29 11:27:24 -07:00
a2c77043ee Add operator benchmarking run to CI nightly (#164151)
Add operator benchmarking run to CI nightly (#162530)

This PR introduces a new "operator microbenchmark" CI workflow and GitHub Actions for operator microbenchmarks, updating test scripts and job matrices to support new parameters, and broadening the operator benchmark tests to include more data types, larger shapes, and gradient tests. The benchmark configurations now focus more on different cuda hardware and multiple dtypes (bf16, fp16, fp32), for both compile and eager mode.

**Benchmark Configuration and Coverage:**

* Expanded operator benchmark configurations in `addmm_test.py`, `bmm_test.py`, `matmul_test.py`, and `mm_test.py` to benchmark multiple dtypes on CUDA devices, in eager and compile mode, for forward and backward run. The configs with tag "long" for the above mentioned files are being run in CI.
* The CI benchmarking is running on various hardwares: H100, A100.
* The CI job also uploads the microbenchmarking outputs to a [HUD](https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch&benchmarkName=PyTorch+operator+microbenchmark) dashboard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162530
Approved by: https://github.com/huydhn


(cherry picked from commit 54b38f3b46c33a1cc4e8f7894619358afcbd7c89)

Co-authored-by: jainapurva <apurvajain.kota@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2025-09-29 11:21:19 -07:00
b64fc8e41e Fix operator benchmark issue#162708 (#164140)
Fix operator benchmark issue#162708 (#162744)

This PR skips memory metric calculation for ops which don't take tensor input, fixing the operator_benchmark bug

Fixes https://github.com/pytorch/pytorch/issues/162708

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162744
Approved by: https://github.com/huydhn

(cherry picked from commit 5f66902ecfb9cb4f7b9c50cb86307217cec1dbe9)

Co-authored-by: jainapurva <apurvajain.kota@gmail.com>
2025-09-29 09:34:26 -07:00
709f4f62a0 [cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#164027)
[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#163581)

To workaround #163539

Still confirming whether 9.10 is affected. The original test states that the convolution is "large," but note that the input size does not apepar to require 64-bit indexing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163581
Approved by: https://github.com/ngimel, https://github.com/malfet


(cherry picked from commit e2817ac20426356278502db3b1614ea87cb7cff7)

Co-authored-by: Eddie Yan <eddiey@nvidia.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-09-29 09:07:14 -07:00
11f776c8ee [cuDNN][SDPA] Disable dropout for cuDNN SDPA on 9.11 - 9.13 (#164026)
[cuDNN][SDPA] Disable dropout for cuDNN SDPA on 9.11 - 9.13 (#163903)

cuDNN introduced some broken heuristics for these cases so we need to disable dropout to avoid unexpected crashes due to heuristics refusing to proceed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163903
Approved by: https://github.com/ngimel, https://github.com/malfet, https://github.com/atalman

(cherry picked from commit ed3085814a870f7a07b7f9c696999a47d4f85376)

Co-authored-by: Eddie Yan <eddiey@nvidia.com>
2025-09-29 09:06:23 -07:00
45e257f046 [cuDNN][conv][64-bit] Disable cuDNN for 64-bit depthwise convs again (#164023)
[cuDNN][conv][64-bit] Disable cuDNN for 64-bit depthwise convs again (#163171)

test is breaking, will check if there's an older version that we can enable on to avoid completely dropping support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163171
Approved by: https://github.com/ngimel, https://github.com/malfet

(cherry picked from commit 0ea10f9912a9ec7c6d606bc71e3ec91f20372212)

Co-authored-by: eqy <eddiey@nvidia.com>
2025-09-29 09:03:36 -07:00
37e2626639 Update the operator benchmarking, to benchmark using torch.compile (#164101)
Update the operator benchmarking, to benchmark using torch.compile (#161394)

This pull request enhances the PyTorch operator benchmarking suite by introducing support for benchmarking with `torch.compile` mode, in addition to existing Eager and JIT. It also adds peak memory measurement (fwd/bwd pass); improves the output format in JSON to be used by dashboard for reporting; and introduce some more CLI options. The new CLI flags introduced are:

- Added `--use-compile` CLI argument and corresponding logic to run benchmarks using `torch.compile`, including mutual exclusivity with `--use-jit`
- Added `--benchmark-name` argument for customizing the benchmark name in output
- Updated default value for `--output-json-for-dashboard` to `benchmark-results.json` for more predictable output file name

Sample command to run a single operator:
`python -m pt.mm_test --use-compile`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161394
Approved by: https://github.com/jbschlosser

(cherry picked from commit af60398c3a057506363e028bf328843a755b4f24)

Co-authored-by: jainapurva <apurvajain.kota@gmail.com>
2025-09-29 07:49:05 -07:00
d7a703ea92 [SymmMem] Barrier on team instead of world (#163376)
[SymmMem] Barrier on team instead of world (#163298)

As titled. Avoiding a potential hang when running dispatch and combine in subgroups.

The rest is just re-arrange of the tests to create a sub-group test class. (no substantial change)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163298
Approved by: https://github.com/fegin

(cherry picked from commit f8fb437197033c33ecc435cd5e1e6a5b2bc5bf69)

Co-authored-by: Ke Wen <kw2501@meta.com>
2025-09-26 16:41:18 -07:00
daa3d04325 [SymmMem] Fix memory allocation hold-up (#163375)
[SymmMem] Fix memory allocation hold-up (#162680)

Problem:
Without MemPool it looks like nvshmem backend never deallocates memory.

Cause:
Handles in `symm_mems_` (a map) keeps reference to memory allocations.

Solution:
- Remove reference to allocation from handles -- the reference is never used anyway.
- Use `unique_ptr` instead of `shared_ptr` to wrap allocation to ensure single ownership.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162680
Approved by: https://github.com/ezyang
ghstack dependencies: #163298

(cherry picked from commit 7130b174e07dbc1a708934b18dede3d88e8f779f)

Co-authored-by: Ke Wen <kw2501@meta.com>
2025-09-26 16:35:56 -07:00
999304396f [dist] handle discontiguous allgather/reducescatter inputs (#163987)
[dist] handle discontiguous allgather/reducescatter inputs (#163712)

Fixes #163483

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163712
Approved by: https://github.com/ezyang, https://github.com/kwen2501

(cherry picked from commit 71eec6a0bf69f712f4b9279fdc8d1459be0426e6)

Co-authored-by: Natalia Gimelshein <ngimel@meta.com>
2025-09-26 16:21:08 -07:00
5340e741df [Reland][163423] Promote @requires_nvshmem instead of enable_triton (#163916)
[Reland][163423] Promote `@requires_nvshmem` instead of `enable_triton` (#163549)

#163423 was approved but reverted due to a revert of base.
Relanding without base.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163549
Approved by: https://github.com/wdvr


(cherry picked from commit 6e6c899347db952f6a691feb4e8610fe9cca0279)

Co-authored-by: Ke Wen <kw2501@fb.com>
Co-authored-by: Wouter Devriendt <wouterdevriendt@meta.com>
2025-09-26 15:58:30 -07:00
7cadf8ac04 [Inductor][Intel GPU] Save threads_per_warp from tirton compiled kernel for launching kernel correctly in cpp wrapper. (#163388)
[Inductor][Intel GPU] Save `threads_per_warp` from tirton compiled kernel for launching kernel correctly in cpp wrapper. (#163315)

On the Inductor XPU backend, `threads_per_warp` is not always 32. For Intel GEMM Triton kernels, it can be 16. This information must be preserved for XPU so that the Cpp wrapper can launch the kernel with the correct configuration.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163315
Approved by: https://github.com/EikanWang, https://github.com/desertfire

(cherry picked from commit 9f8a311af09586ac4026d6a56fc7c4ac7acc62ed)

Co-authored-by: xinan.lin <xinan.lin@intel.com>
2025-09-26 14:42:09 -04:00
f9e495fe8e Move inductor jobs 3.9->3.10 (#163954)
Move inductor jobs 3.9->3.10 (#162323)

Related to: https://github.com/pytorch/pytorch/issues/161167

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162323
Approved by: https://github.com/huydhn, https://github.com/Skylion007


(cherry picked from commit e8eeb060348f250975124abb957b1d7d9c4af9a0)

Co-authored-by: atalman <atalman@fb.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2025-09-26 12:37:50 -04:00
57dc68844d [CI] Fix test_triton_wait_until hang (#163914)
[CI] Fix test_triton_wait_until hang (#163886)

I don't know why `nvshmem_barrier_all_kernel`  leads the test to hang. Will investigate.
But since it is an unnecessary call here, I am removing it to unblock other PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163886
Approved by: https://github.com/fegin

(cherry picked from commit 96275dbf88372bb32a123c4ea918498128fbecb9)

Co-authored-by: Ke Wen <kw2501@meta.com>
2025-09-26 12:16:00 -04:00
63da9d2730 [Release 2.9] Update torch-xpu-ops commit pin (#163622)
Update commit pin to 789f59
2025-09-26 09:46:02 -04:00
824d59fbf6 [CI] Install libuv for Win testing (#163907)
[CI] Install libuv for Win testing (#163797)

Current working theory why f0078941cf caused a regression, are because Windows CI no longer could be build with distributed, as it could not find libuv
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163797
Approved by: https://github.com/wdvr

(cherry picked from commit cc660d38ac533b92f3ad4cb1105f7a16f74b9f09)

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-09-26 00:03:22 -07:00
fc8bf12b38 Fix cpp build (#163887)
Fix cpp build (#162774)

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162774
Approved by: https://github.com/malfet, https://github.com/atalman

(cherry picked from commit b61bdc7cc4c841bf7574bc993f3fd445682f0997)

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-09-25 14:50:59 -07:00
49dab18ecf [CD] Add statically linked windows libraries to exclude list (#163862)
[CD] Add statically linked windows libraries to exclude list (#163768)

Fixes: https://github.com/pytorch/pytorch/issues/159514

Seeing following in the Wheel build logs:
```
Linking CXX static library lib\kineto.lib
Linking CXX static library lib\dnnl.lib
....
```

These files are around 800MB uncompressed and 109MB compressed, hence provide ~50% size reduction for Windows CPU builds.

Test Plan: Build Pytorch Windows binary. Build vision, audio and torchcodec with this binary. Smoke test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163768
Approved by: https://github.com/albanD, https://github.com/malfet

(cherry picked from commit 98c4e35f14601909c113b4fd2857b6f0fb525316)

Co-authored-by: atalman <atalman@fb.com>
2025-09-25 14:46:56 -07:00
0154ca1d3d [BE] Update Python min version to 3.10 (#162310) (#163885)
* [BE] Update Python min version to 3.10 (#162310)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162310
Approved by: https://github.com/atalman, https://github.com/Skylion007, https://github.com/ZainRizvi

* comment out executorch

---------

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2025-09-25 14:44:48 -07:00
132d9fac3b Revert "[BE] Update Python min version to 3.10 (#162310)" (#163882)
Revert "[BE] Update Python min version to 3.10 (#162310) (#163802)"

This reverts commit 7d024a6e299eee2830e9fbdae1913e432160bb23.
2025-09-25 10:54:12 -07:00
87c5d4a858 [cherrypick] [CI] Move Windows build/tests to Python-3.10 #162862 (#163800)
[CI] Move Windows build/tests to Python-3.10 (#162862)

What supposed to be a very simple change end up being quite involved, as current Windows CI framework is quite inflexible, i.e. it takes a lots of argument, but later on ignores them, namely:
 - `PYTHON_VERSION` used to be a no-op that is simply ignored by the scripts
 - With this change, `setup-win` action will create an environment called `py_tmp` with specific python version + intel-openmp (that is hard runtime requirement, but for some reason not packaged into the wheel nor marked as such)
 - Copied test type dependencies from be01a40157/aws/ami/windows/scripts/Installers/Install-Pip-Dependencies.ps1 (L16) into `win-test.sh`, but made some adjustments to be compatible with 3.10 runtime (scipy version update) and just make rerun-tests compatible with the rest of the deps

I think in the long run, one needs to update 4432e2cacd/aws/ami/windows/scripts/Installers/Install-Miniconda3.ps1 that currently pins Miniconda python to 3.9, but also figure out how CI can still create a new environment without having to download all the dependencies all the time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162862
Approved by: https://github.com/wdvr, https://github.com/huydhn
ghstack dependencies: #163339, #163341

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2025-09-25 09:06:52 -07:00
b0dc90881c [CD] Simplify NVIDIA driver installation step (#163349) (#163790)
Undo changes introduced in https://github.com/pytorch/pytorch/pull/160956 as driver has been updated to 580 for both fleets

Fixes https://github.com/pytorch/pytorch/issues/163342
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163349
Approved by: https://github.com/seemethere

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-09-25 10:40:57 -04:00
c0577aad39 Use cuda nvrtc so file based on cuda version used by torch (#163642) (#163788)
Fixes https://github.com/pytorch/pytorch/issues/162367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163642
Approved by: https://github.com/msaroufim
2025-09-25 10:40:09 -04:00
9952b87600 [CD] CUDA 13.0 fix preload logic to include nvidia/cu13/lib/ (#163766)
[CD] CUDA 13.0 fix preload logic to include nvidia/cu13/lib/ (#163661)

Preload logic no longer works with CUDA 13.0
See the installation path:
```
ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/cu13/lib/
libcheckpoint.so   libcudadevrt.a      libcufft.so.12   libcufile_rdma.so.1  libcusolver.so.12    libnvJitLink.so.13  libnvperf_target.so            libnvrtc.alt.so.13    libpcsamplingutil.so
libcublas.so.13    libcudart.so.13     libcufftw.so.12  libcupti.so.13       libcusolverMg.so.12  libnvblas.so.13     libnvrtc-builtins.alt.so.13.0  libnvrtc.so.13
libcublasLt.so.13  libcudart_static.a  libcufile.so.0   libcurand.so.10      libcusparse.so.12    libnvperf_host.so   libnvrtc-builtins.so.13.0      libnvtx3interop.so.1

ls /home/ubuntu/.venv/lib/python3.10/site-packages/nvidia/
cu13  cudnn  cusparselt  nccl  nvshmem
```

Test using script from : https://github.com/pytorch/pytorch/issues/162367
```
Kernel test passed!
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163661
Approved by: https://github.com/nWEIdia, https://github.com/tinglvv, https://github.com/Camyll

(cherry picked from commit 141fc7276ebc722b6076cc3afe4fbc6307a1b775)

Co-authored-by: atalman <atalman@fb.com>
2025-09-25 10:38:16 -04:00
300bade202 [Cherry-Pick] [CD] CUDA 13 specific followup changes. Remove sm50-70 From CUDA 12.6 and CUDA 12.8 builds (#162455) (#163764)
* [CD] CUDA 13 specific followup changes (#162455)

Follow up for CUDA 13 bring up https://github.com/pytorch/pytorch/issues/159779
sm50-70 should not be added to sbsa build arch list, as previous archs had no support for arm.
remove platform_machine from PYTORCH_EXTRA_INSTALL_REQUIREMENTS

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162455
Approved by: https://github.com/atalman

* update

---------

Co-authored-by: Ting Lu <tingl@nvidia.com>
2025-09-25 10:37:52 -04:00
96f0c0fa07 Fix some edge cases (#163106)
Fix some edge cases (#162295)

``` Summary
🔝 Top 5 Performance Differences (by absolute %):
shape: (5, 7)
┌────────────────┬────────────────┬─────────────────────────────┬───────────────────┬──────────────────────┬───────────────────────────┬───────────┐
│ attn_type      ┆ dtype          ┆ shape(B,Hq,M,Hkv,N,D)       ┆ TFlops BWD (base) ┆ TFlops BWD (no_peel) ┆ no_peel_speedup_over_base ┆ pct_delta │
│ ---            ┆ ---            ┆ ---                         ┆ ---               ┆ ---                  ┆ ---                       ┆ ---       │
│ str            ┆ str            ┆ str                         ┆ f64               ┆ f64                  ┆ f64                       ┆ f64       │
╞════════════════╪════════════════╪═════════════════════════════╪═══════════════════╪══════════════════════╪═══════════════════════════╪═══════════╡
│ sliding_window ┆ torch.bfloat16 ┆ (2, 16, 1024, 4, 1024, 64)  ┆ 56.937931         ┆ 58.960459            ┆ 1.035522                  ┆ 3.552163  │
│ noop           ┆ torch.bfloat16 ┆ (2, 16, 1024, 4, 1024, 128) ┆ 89.221306         ┆ 86.295642            ┆ 0.967209                  ┆ -3.27911  │
│ causal         ┆ torch.bfloat16 ┆ (2, 16, 4096, 4, 4096, 128) ┆ 111.552594        ┆ 114.380841           ┆ 1.025353                  ┆ 2.535349  │
│ alibi          ┆ torch.bfloat16 ┆ (2, 16, 1024, 16, 1024, 64) ┆ 74.830149         ┆ 76.685445            ┆ 1.024793                  ┆ 2.479344  │
│ alibi          ┆ torch.bfloat16 ┆ (2, 16, 1024, 4, 1024, 64)  ┆ 55.279932         ┆ 56.369312            ┆ 1.019707                  ┆ 1.97066   │
└────────────────┴────────────────┴─────────────────────────────┴───────────────────┴──────────────────────┴───────────────────────────┴───────────┘

🔺 Top 5 Cases Where no_peel (change) is Faster than base (baseline):
shape: (5, 7)
┌────────────────┬────────────────┬─────────────────────────────┬───────────────────┬──────────────────────┬───────────────────────────┬───────────┐
│ attn_type      ┆ dtype          ┆ shape(B,Hq,M,Hkv,N,D)       ┆ TFlops BWD (base) ┆ TFlops BWD (no_peel) ┆ no_peel_speedup_over_base ┆ pct_delta │
│ ---            ┆ ---            ┆ ---                         ┆ ---               ┆ ---                  ┆ ---                       ┆ ---       │
│ str            ┆ str            ┆ str                         ┆ f64               ┆ f64                  ┆ f64                       ┆ f64       │
╞════════════════╪════════════════╪═════════════════════════════╪═══════════════════╪══════════════════════╪═══════════════════════════╪═══════════╡
│ sliding_window ┆ torch.bfloat16 ┆ (2, 16, 1024, 4, 1024, 64)  ┆ 56.937931         ┆ 58.960459            ┆ 1.035522                  ┆ 3.552163  │
│ causal         ┆ torch.bfloat16 ┆ (2, 16, 4096, 4, 4096, 128) ┆ 111.552594        ┆ 114.380841           ┆ 1.025353                  ┆ 2.535349  │
│ alibi          ┆ torch.bfloat16 ┆ (2, 16, 1024, 16, 1024, 64) ┆ 74.830149         ┆ 76.685445            ┆ 1.024793                  ┆ 2.479344  │
│ alibi          ┆ torch.bfloat16 ┆ (2, 16, 1024, 4, 1024, 64)  ┆ 55.279932         ┆ 56.369312            ┆ 1.019707                  ┆ 1.97066   │
│ causal         ┆ torch.bfloat16 ┆ (4, 16, 4096, 4, 4096, 64)  ┆ 111.08814         ┆ 112.447047           ┆ 1.012233                  ┆ 1.22327   │
└────────────────┴────────────────┴─────────────────────────────┴───────────────────┴──────────────────────┴───────────────────────────┴───────────┘

🔻 Top 5 Cases Where no_peel (change) is Slower than base (baseline):
shape: (5, 7)
┌────────────────┬────────────────┬─────────────────────────────┬───────────────────┬──────────────────────┬───────────────────────────┬───────────┐
│ attn_type      ┆ dtype          ┆ shape(B,Hq,M,Hkv,N,D)       ┆ TFlops BWD (base) ┆ TFlops BWD (no_peel) ┆ no_peel_speedup_over_base ┆ pct_delta │
│ ---            ┆ ---            ┆ ---                         ┆ ---               ┆ ---                  ┆ ---                       ┆ ---       │
│ str            ┆ str            ┆ str                         ┆ f64               ┆ f64                  ┆ f64                       ┆ f64       │
╞════════════════╪════════════════╪═════════════════════════════╪═══════════════════╪══════════════════════╪═══════════════════════════╪═══════════╡
│ noop           ┆ torch.bfloat16 ┆ (2, 16, 1024, 4, 1024, 128) ┆ 89.221306         ┆ 86.295642            ┆ 0.967209                  ┆ -3.27911  │
│ causal         ┆ torch.bfloat16 ┆ (4, 16, 1024, 4, 1024, 64)  ┆ 78.23082          ┆ 76.693169            ┆ 0.980345                  ┆ -1.965531 │
│ sliding_window ┆ torch.bfloat16 ┆ (2, 16, 2048, 4, 2048, 128) ┆ 96.95663          ┆ 95.573333            ┆ 0.985733                  ┆ -1.426717 │
│ alibi          ┆ torch.bfloat16 ┆ (4, 16, 2048, 4, 2048, 64)  ┆ 93.373473         ┆ 92.294147            ┆ 0.988441                  ┆ -1.155924 │
│ alibi          ┆ torch.bfloat16 ┆ (2, 16, 2048, 4, 2048, 128) ┆ 96.95147          ┆ 96.105389            ┆ 0.991273                  ┆ -0.872685 │
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162295
Approved by: https://github.com/mlazos, https://github.com/v0i0

(cherry picked from commit 864ffe12d737403230e8257b9bce0a830bd590c1)

Co-authored-by: drisspg <drisspguessous@gmail.com>
2025-09-25 10:29:39 -04:00
7d024a6e29 [BE] Update Python min version to 3.10 (#162310) (#163802)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162310
Approved by: https://github.com/atalman, https://github.com/Skylion007, https://github.com/ZainRizvi
ghstack dependencies: #162862

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2025-09-24 15:48:19 -07:00
be29c5b207 Add analytics ID to cpp docs (#163695)
Add analytics ID to cpp docs (#163370)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163370
Approved by: https://github.com/albanD

(cherry picked from commit e6a9db58d71e474deac28276de1f611638c32eeb)

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-09-24 15:45:17 -07:00
5322dab793 Update pytorch.org links in docs/conf.py (#163703)
Update pytorch.org links in docs/conf.py (#163682)

Update links in conf.py to docs.pytorch.org

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163682
Approved by: https://github.com/sekyondaMeta, https://github.com/albanD

(cherry picked from commit 8c8416b021e59a5ec58aceb38eeffc63885a28bc)

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-09-24 15:44:43 -07:00
1dadb6196b [BE] Introduce CONDA_ROOT_DIR (#163805)
[BE] Introduce `CONDA_ROOT_DIR` (#163341)

Which equal to `%CONDA_PARENT_DIR%/Miniconda3`, and replace this pattern with `%CONDA_ROOT_DIR%` throughout the codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163341
Approved by: https://github.com/clee2000
ghstack dependencies: #163339

(cherry picked from commit a273475b01e912f402378a522bb9c4ed37e8413a)

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2025-09-24 15:42:16 -07:00
6c058c1262 Move ROCM trunk wheel builds to 3.10 (#163804)
Move ROCM trunk wheel builds to 3.10 (#163339)

This code is a delicious spaghetti: Sometimes python version is defined in jinja template (see https://github.com/pytorch/pytorch/pull/162297 ) sometimes in shell script (see https://github.com/pytorch/pytorch/pull/162877 ), but this time around it's in a python file (and there is another one called `generate_binary_build_matrix.py` that defines `FULL_PYTHON_VERSIONS`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163339
Approved by: https://github.com/clee2000

(cherry picked from commit 52dd7a898c117305b4407c7f26bbcc7b39f20aaa)

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2025-09-24 15:41:55 -07:00
715dca6725 [export] Remove .contiguous() when saving weights to raw bytes (#163662)
[export] Remove .contiguous() when saving weights to raw bytes (#163587)

Summary: `.contiguous()` will discard the original storage size of the tensor, and could lead to issues during loading.

Test Plan:
buck2 run mode/dev-nosan caffe2/test:test_export -- -r test_1D_tensor_slicing
buck2 run mode/dev-nosan caffe2/test:test_export -- -r test_2D_tensor_slicing

Differential Revision: D83016250

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163587
Approved by: https://github.com/angelayi

(cherry picked from commit 720a7b2887ca4efc8d63b32373182bc97918c76e)

Co-authored-by: Yiming Zhou <yimingzhou@meta.com>
2025-09-23 10:15:06 -07:00
47cb45e4f6 Update pytorch_sphinx_theme2 to latest hash (#163655)
Update pytorch_sphinx_theme2 to latest hash (#163269)

The updated theme:
- Fixes articleBody in the json+ld that caused previous Google Search issues
- Other minor fixes
- 404.html fixes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163269
Approved by: https://github.com/albanD

(cherry picked from commit 68e75be86ab618bb6b1dc32b603a780ff6046262)

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-09-23 10:13:51 -07:00
4966d058f2 CUDA 13.0 Warning update for supported architectures (#163633)
CUDA 13.0 Warning update for supported architectures (#163585)

Please see build script: 8da008678f/.ci/manywheel/build_cuda.sh (L69-L71)

This should display correct warning:
``
Please install PyTorch with a following CUDA
configurations: 12.6 12.8 13.0 following instructions at
https://pytorch.org/get-started/locally/
``
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163585
Approved by: https://github.com/malfet

(cherry picked from commit 3c64b2abab5a23809140da5bd6272307b776e459)

Co-authored-by: atalman <atalman@fb.com>
2025-09-23 10:13:06 -07:00
579794ed7b [SymmMem] Fix put_signal + wait_until hang (#163458)
[SymmMem] Fix put_signal + wait_until hang (#163194)

The test used a wrong ptr to refer to remote address:
```
            dst_ptr = out_hdl.buffer_ptrs[peer]
            src_ptr = inp_hdl.buffer_ptrs[rank]
            sig_ptr = out_hdl.signal_pad_ptrs[peer]
```
All three indices should be `rank` instead of `peer` because NVSHMEM APIs accept local address as input and perform translation internally. Without correct signal address, the peer would be waiting, thus hang.

Also adjusted the signature of `nvshmem.putmem_signal_block` to accept tensor instead of pointer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163194
Approved by: https://github.com/ngimel
ghstack dependencies: #163025, #163152

(cherry picked from commit 80f8be9840c20c3efe1274266b52ab098f4d1030)

Co-authored-by: Ke Wen <kw2501@meta.com>
2025-09-23 10:10:02 -07:00
7cf37ae3cb [2.9 cherry pick][triton] update 3.5 pin to bbb06c0334a6772b92d24bde54956e675c8c6604 (#163382) (#163583)
Includes:
* https://github.com/triton-lang/triton/pull/8211 to work around a PTXAS bug that was causing 03-matrix-multiplication tutorial matmuls to underperform due to excessive WGMMA waits
* https://github.com/triton-lang/triton/pull/8157 to fix a convert_layout bug

Verified that this passes Triton CI in https://github.com/pytorch/pytorch/pull/159158 and improves gemm perf (see https://github.com/pytorch/pytorch/issues/159704)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163382
Approved by: https://github.com/Camyll, https://github.com/atalman
2025-09-22 18:20:20 -07:00
f83cf0714e [graph partition] Add way to register custom rule (#163310) (#163395)
This PR adds an experimental way to register a custom rule for if
inductor should partition the graph around an operator.

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163310
Approved by: https://github.com/ProExpertProg, https://github.com/BoyuanFeng, https://github.com/eellison
ghstack dependencies: #162117, #162307, #162651
2025-09-22 18:18:07 -07:00
ddd5074afc [CI] Update NVIDIA driver to 580.82.07 (#163522)
[CI] Update NVIDIA driver to `580.82.07` (#163111)

To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with 6e08c9d08e

This fix was suggested in https://github.com/pytorch/pytorch/issues/162878#issuecomment-3288635745

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163111
Approved by: https://github.com/huydhn

(cherry picked from commit 8dbac62edb48815dfca84dfdcca40d6a24d0652b)

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2025-09-22 11:45:48 -04:00
35c55da805 [Graph Partition] improve custom op output alias (#163380)
[Graph Partition] improve custom op output alias (#163227)

For a custom op with multiple outputs, we will see the following generated code:
```
buf1 = op1(arg0)
buf3 = buf0[0]
buf4 = buf0[1]
del buf1 # <--- if buf1 is not accessed in the future
```

If `buf1` is not accessed in the future, it's good to deallocate early. So we don't delay `del` until both buf3 and buf4 are not used anymore. Note that buf3 and buf4 hold reference to the data such that `del buf1` does not prevent their usage.

However, when there are mutating args, we don't see `del buf1` immediately.

```python
@torch.library.custom_op(
    "mylib::op1",
    mutates_args=["x"],
    schema="(Tensor(a!)?  x) -> (Tensor, Tensor)",
    device_types="cuda",
)
def op1(x) -> tuple[torch.Tensor, torch.Tensor]:
    x = x + 1
    return (x + 1, x + 2)
```

<img width="661" height="821" alt="image" src="https://github.com/user-attachments/assets/3d1d1f5a-9749-4652-bb02-da593c78702d" />

Why? Because `buf3` is a MultiOutput with `buf1` as input and believes `buf1` (an output of FallbackKernel op1) has inputs that alias output.
72fedf0575/torch/_inductor/ir.py (L7976-L7982)

According to `[NOTE: FallbackKernel supported operators]`, as a mutating op that are auto-functionalizable, buf1's output should NOT alias any of the inputs. This PR improves get_inputs_that_alias_output of Fallback Kernel.

Use case: [moe custom op in vllm](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe/layer.py#L2057-L2064)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163227
Approved by: https://github.com/zou3519

(cherry picked from commit 4967ad8baa724b8b1acc123698bb1265723feb87)

Co-authored-by: Boyuan Feng <boyuan@meta.com>
2025-09-19 16:36:03 -07:00
a576d48637 Skip test_ind_worker_queue on Windows and macOS (flaky) (#163363)
Skip test_ind_worker_queue on Windows and macOS (flaky) (#162555)

Fixes https://github.com/pytorch/pytorch/issues/68643

It was closed by the bot yesterday and the issue was still there https://github.com/pytorch/pytorch/actions/runs/17595694816/job/49989589647.  It's better to just skip it directly in the code as this test has been disabled on Windows and MacOS since 2021 O_o
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162555
Approved by: https://github.com/clee2000

(cherry picked from commit 98e22c8a693644c6d235d7a858dc411b1aefafa7)

Co-authored-by: Huy Do <huydhn@gmail.com>
2025-09-19 13:07:00 -07:00
25d8c0be68 Add decomp rule to assert_tensor_metadata for BatchedTensors (#163361)
Add decomp rule to assert_tensor_metadata for BatchedTensors  (#163008)

Whenever there is device move, export introduces assert_tensor_metadata aten operator to make sure to guard for device specialization. This aten op didn't work with Vmap because we didn't register explicit decomp rule saying we just skip BatchedTensor and call it on underlying tensor

Differential Revision: [D82483979](https://our.internmc.facebook.com/intern/diff/D82483979)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163008
Approved by: https://github.com/huydhn

(cherry picked from commit e28983be76aa4651e3cb69dc3a4234d75038d938)

Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com>
2025-09-19 13:00:57 -07:00
b1aae80953 [Cherry Pick][Graph Partition] allow sharing default device context (#163097)
cherry pick PR 162873
2025-09-19 11:10:29 -07:00
eqy
76bebf38de [Release 2.9] [cuDNN][SDPA][submodule] Roll-back cuDNN frontend upgrade, update Met… (#163265)
[cuDNN][SDPA][submodule] Roll-back cuDNN frontend upgrade, update Meta registration (#163104)

For https://github.com/pytorch/torchtitan/issues/1713

Also note that we will need to rollback the cuDNN frontend upgrade in 2.9 as it currently introduces a segmentation fault by assuming tensors have their strides and sizes populated at graph creation time 1a7b4b78db/include/cudnn_frontend/node/sdpa_support_surface.h (L447%C2%A0)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163104
Approved by: https://github.com/drisspg
2025-09-19 10:53:04 -07:00
bc158ebdc7 [SymmMem] Fix NVSHMEM plugin + Triton 3.5 (#163262)
[SymmMem] Fix NVSHMEM plugin + Triton 3.5 (#163152)

1. The dispatch signatures defined in `core.extern_elementwise` call must match the C signature of the NVSHMEM functions, in particular the dtypes. Otherwise, there would be weird errors, such as IMA or hang. When matched, most of time the NVSHMEM device function will be inlined into the generated PTX. When not matched, it is represented as a function call in the PTX (not sure if it is the function call that goes wrong).

2. When calling the `core.extern` wrappers from the `triton.jit` kernels, the input must be cast to match the signatures defined in 1, e.g. via `nbytes.to(tl.int64)`. Otherwise, Triton will report a key error when searching for such kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163152
Approved by: https://github.com/ngimel
ghstack dependencies: #163025

(cherry picked from commit 57a54a04b6eb78e0aa7d13b48e25fb8c0c49fd60)

Co-authored-by: Ke Wen <kw2501@meta.com>
2025-09-19 10:51:02 -07:00
ffa6f63fe2 Revert "Make distributed modules importable even when backend not bui… (#163024)
Revert "Make distributed modules importable even when backend not built (#159889)" (#162568)

This reverts commit a0d026688cd69583d5a4e0c6f3e5fda141a7f4a9.

Revert "Always build USE_DISTRIBUTED. (#160449)"

This reverts commit d80297a6846f1f2c36fd4f19e22919f2abe8fcea.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568
Approved by: https://github.com/huydhn

Co-authored-by: Edward Yang <ezyang@meta.com>
2025-09-19 10:34:55 -07:00
baab5c6c8b [ONNX] Update export docstring & Set fallback=False by default (#162637)
* [ONNX] Update export docstring (#162622)

Update export docstring to reflect the latest configuration.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162622
Approved by: https://github.com/titaiwangms

(cherry picked from commit 7e2e83cdbe532b230dee40cfe0454116c9b64710)

* Change fallback option to False in ONNX export

* Change fallback parameter default to False

---------

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2025-09-16 17:23:47 -07:00
9718af107e Support vmap + custom autograd function/improve DTensor constructor inefficiency (#162738)
Support vmap + custom autograd function/improve DTensor constructor inefficiency (#162240)

This makes gemma3 exportable on transformers=4.55.4

In HF, there is a torch funciton mode called TransformGetItemToIndex which internally calls custom autograd function. When this custom autograd function is called under vmap, It triggers CustomFunctionHigherOrderOP which error-ed because there was no pre-dispatch proxy mode implementation.

Since there are number of requests lately to add various operators in pre-dispatch IR, I introduce a decorator in export that works similar to `allow_in_graph`. Basically:
1) We intercept custom_autograd_function.apply at pre-dispatch mode when this decorator is applied
2) We apply `flat_apply` HOP to hide the pytree spec for this autograd function. Note that this adds restriction that this custom autograd function needs to take in fx-able types.
3) subclass constructor decorator is implemented similarly, so we just refactor it to use similar implementation as this new decorator. eventually we should delete the subclass constructor decorator.
4) Move some code in subclass constructor decorator to exit early in non-export environment which should shave off some inefficiency (around 1% according to @swolchok 's benchmark)

Fixes: https://github.com/pytorch/pytorch/issues/161563#issuecomment-3246309758

Differential Revision: [D82141316](https://our.internmc.facebook.com/intern/diff/D82141316)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162240
Approved by: https://github.com/ydwu4

(cherry picked from commit 463fbc8ca0537e5635236190d2ca38ce6fcef831)

Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com>
2025-09-16 17:22:16 -07:00
7f8ba48c2a Fix the regression issue caused by non-arrch64 platforms not hitting the MKLDNN path. (#162778)
Fix the regression issue caused by non-arrch64 platforms not hitting the MKLDNN path. (#162168)

This issue was introduced by the commit in issue #161065. Added an extra check to provide a proper path for other platforms.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162168
Approved by: https://github.com/mingfeima, https://github.com/malfet


(cherry picked from commit 563921619b3e820b170475b9278ff94ee6e1a32c)

Co-authored-by: Yuxingwang-intel <yuxing.wang@intel.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-09-16 17:21:10 -07:00
aebf427c53 [Release 2.9] Update torch-xpu-ops commit pin (#162935)
Update commit pin to f8408a
2025-09-16 17:19:31 -07:00
44baf2ff8d fix deterministic scatter_add path for multi-d tensors (#162977)
fix deterministic scatter_add path for multi-d tensors (#162866)

PReviously for more than 2d tensor `select` didn't work correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162866
Approved by: https://github.com/valentinandrei

(cherry picked from commit bf6b40da3e3be7718b8ddc94eed2da8cabaa5e86)

Co-authored-by: Natalia Gimelshein <ngimel@meta.com>
2025-09-16 17:17:36 -07:00
1076941ff7 [ONNX] Fix rotary_embedding_23 implementation (#163041)
[ONNX] Fix rotary_embedding_23 implementation (#162865)

The implementation of rotary_embedding_23 when input is 3D was incorrect.

## Tested

Locally with

```py
import onnx_ir as ir
import onnx
import torch
import os
import numpy as np

base_path = "/home/justinchu/dev/onnx/onnx/backend/test/data/node"
test_names = [
    "test_rotary_embedding",
    "test_rotary_embedding_3d_input",
    "test_rotary_embedding_interleaved",
    "test_rotary_embedding_no_position_ids",
    "test_rotary_embedding_no_position_ids_interleaved",
    "test_rotary_embedding_no_position_ids_rotary_dim",
    "test_rotary_embedding_with_interleaved_rotary_dim",
    "test_rotary_embedding_with_rotary_dim",
]
model_paths = [os.path.join(base_path, name) for name in test_names]

for path in model_paths:
    print(f"Checking {path} for issues...")

    model = onnx.load(os.path.join(path, "model.onnx"))
    input0 = ir.from_proto(
        onnx.load_tensor(os.path.join(path, "test_data_set_0", "input_0.pb"))
    ).numpy()
    input1 = ir.from_proto(
        onnx.load_tensor(os.path.join(path, "test_data_set_0", "input_1.pb"))
    ).numpy()
    input2 = ir.from_proto(
        onnx.load_tensor(os.path.join(path, "test_data_set_0", "input_2.pb"))
    ).numpy()
    if os.path.exists(os.path.join(path, "test_data_set_0", "input_3.pb")):
        input3 = ir.from_proto(
            onnx.load_tensor(os.path.join(path, "test_data_set_0", "input_3.pb"))
        ).numpy()
    else:
        input3 = None
    output0 = ir.from_proto(
        onnx.load_tensor(os.path.join(path, "test_data_set_0", "output_0.pb"))
    ).numpy()

    m = ir.from_proto(model)

    node = m.graph[-1]
    print(node)
    assert node.op_type == "RotaryEmbedding"

    interleaved = node.attributes.get_int("interleaved", 0)
    num_heads = node.attributes.get_int("num_heads", 0)
    rotary_embedding_dim = node.attributes.get_int("rotary_embedding_dim", 0)

    torch_out = torch.onnx.ops.rotary_embedding(
        torch.tensor(input0),
        torch.tensor(input1),
        torch.tensor(input2),
        position_ids=torch.tensor(input3) if input3 is not None else None,
        interleaved=bool(interleaved),
        num_heads=num_heads,
        rotary_embedding_dim=rotary_embedding_dim,
    )
    torch_out = torch_out.detach().cpu().numpy()
    np.testing.assert_allclose(torch_out, output0)
```

Fix https://github.com/pytorch/pytorch/issues/162848

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162865
Approved by: https://github.com/kunal-vaishnavi, https://github.com/titaiwangms

(cherry picked from commit fdf68fa5d70abebee1c5090a51ea30c7aa40b9b0)

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2025-09-16 17:16:23 -07:00
0ac9fa4413 [ez][CI] Fix docs push in nightly workflow (#163085)
[ez][CI] Fix docs push in nightly workflow (#162657)

HUD metrics page says docs push hasn't happened in 21 days
<img width="293" height="142" alt="image" src="https://github.com/user-attachments/assets/f930aab8-0503-4bf2-b962-8c375dec6b78" />

I guess main branch docs just haven't been updated?  Did anyone notice?  Do we care?

Either way I think this should fix it

Likely started after https://github.com/pytorch/pytorch/pull/161182
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162657
Approved by: https://github.com/huydhn

(cherry picked from commit 2f533959430c2a41fe16ef79fe4d680a5c4e0585)

Co-authored-by: Catherine Lee <csl@fb.com>
2025-09-16 12:04:17 -07:00
152383b745 fix typo: summit -> submit (#162597)
fix typo: summit -> submit (#162587)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162587
Approved by: https://github.com/justinchuby

(cherry picked from commit fefc406a3d0d90db0f808419fb88045f90b213cd)

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
2025-09-12 11:41:11 -04:00
c31a8186c1 [CD] Aarch64 Fix packaging `libarm_compute.so` and other libraries to the aarch64 CUDA wheels (#162596)
[CD] Aarch64 Fix packaging ``libarm_compute.so`` and other libraries to the aarch64 CUDA wheels (#162566)

Fixes aarch64 linux packaging, following error:
https://github.com/pytorch/vision/actions/runs/17612462583/job/50037380487#step:15:62
```
Traceback (most recent call last):
  File "/__w/vision/vision/pytorch/vision/setup.py", line 13, in <module>
    import torch
  File "/__w/_temp/conda_environment_17612462583/lib/python3.11/site-packages/torch/__init__.py", line 415, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: libarm_compute.so: cannot open shared object file: No such file or directory
```
Due to missing dependencies.

Current Error:
File torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl is extracted
File is repackaged as torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl
File torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl renamed as torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl
Hence the repackaging does not take any effect.

This PR does following
File torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl is extracted
File torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl  deleted
File is repackaged as torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl

Looks like after migrating from zipping the wheel to wheel pack renaming the wheel is no longer necessary. Hence removing renaming and deleting old file.
```
2025-09-10T10:10:05.9652454Z Using nvidia libs from pypi - skipping CUDA library bundling
2025-09-10T10:10:05.9656595Z Copying to /pytorch/dist/tmp/torch/lib/libgomp.so.1
2025-09-10T10:10:05.9873843Z Copying to /pytorch/dist/tmp/torch/lib/libgfortran.so.5
2025-09-10T10:10:06.0410041Z Copying to /pytorch/dist/tmp/torch/lib/libarm_compute.so
2025-09-10T10:10:06.2869242Z Copying to /pytorch/dist/tmp/torch/lib/libarm_compute_graph.so
2025-09-10T10:10:06.4385740Z Copying to /pytorch/dist/tmp/torch/lib/libnvpl_lapack_lp64_gomp.so.0
2025-09-10T10:10:06.5461372Z Copying to /pytorch/dist/tmp/torch/lib/libnvpl_blas_lp64_gomp.so.0
2025-09-10T10:10:06.5728970Z Copying to /pytorch/dist/tmp/torch/lib/libnvpl_lapack_core.so.0
2025-09-10T10:10:06.6231872Z Copying to /pytorch/dist/tmp/torch/lib/libnvpl_blas_core.so.0
2025-09-10T10:10:14.1503110Z Updated tag from Tag: cp310-cp310-linux_aarch64
2025-09-10T10:10:14.1503482Z  to Tag: cp310-cp310-manylinux_2_28_aarch64
2025-09-10T10:10:14.1503682Z
2025-09-10T10:10:41.6498892Z Repacking wheel as /pytorch/dist/torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl...OK
2025-09-10T10:10:41.9394460Z Renaming torch-2.10.0.dev20250910+cu130-cp310-cp310-linux_aarch64.whl wheel to torch-2.10.0.dev20250910+cu130-cp310-cp310-manylinux_2_28_aarch64.whl
```

Test Plan, Executed on local file:
```
  inflating: ubuntu/dist/tmp/torch-2.9.0.dev20250909+cu130.dist-info/WHEEL
  inflating: ubuntu/dist/tmp/torch-2.9.0.dev20250909+cu130.dist-info/entry_points.txt
  inflating: ubuntu/dist/tmp/torch-2.9.0.dev20250909+cu130.dist-info/top_level.txt
  inflating: ubuntu/dist/tmp/torch-2.9.0.dev20250909+cu130.dist-info/RECORD
Bundling CUDA libraries with wheel
Updated tag from Tag: cp310-cp310-manylinux_2_28_aarch64
 to Tag: cp310-cp310-manylinux_2_28_aarch64

Repacking wheel as ubuntu/dist/torch-2.9.0.dev20250909+cu130-cp310-cp310-manylinux_2_28_aarch64.whl...OK
Copying torch-2.9.0.dev20250909+cu130-cp310-cp310-manylinux_2_28_aarch64.whl to artifacts
Build Complete. Created torch-2.9.0.dev20250909+cu130-cp310-cp310-manylinux_2_28_aarch64.whl..
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162566
Approved by: https://github.com/jeanschmidt, https://github.com/NicolasHug

(cherry picked from commit 3d32bb114bf0d5bd0193dc40f20253635dddf080)

Co-authored-by: atalman <atalman@fb.com>
2025-09-10 12:22:02 -04:00
ce928e17c1 CUDA 13.0 Windows Nvidia Driver Update to 580.88 (#162501)
CUDA 13.0 Windows Nvidia Driver Update to 580.88 (#162425)

Related to https://github.com/pytorch/pytorch/issues/162333
https://github.com/pytorch/pytorch/issues/159779

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162425
Approved by: https://github.com/tinglvv, https://github.com/malfet

(cherry picked from commit e38e953432764e00f16999c8b7df6346ad357a16)

Co-authored-by: atalman <atalman@fb.com>
2025-09-09 14:27:57 -04:00
cd2c98a5b5 [Release 2.9] Release only changes (#162493) 2025-09-09 11:15:20 -07:00
3333 changed files with 47685 additions and 112809 deletions

View File

@ -8,8 +8,6 @@ if [[ "$GPU_ARCH_VERSION" == *"12.6"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0"
elif [[ "$GPU_ARCH_VERSION" == *"12.8"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;12.0"
elif [[ "$GPU_ARCH_VERSION" == *"12.9"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;12.0"
elif [[ "$GPU_ARCH_VERSION" == *"13.0"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;11.0;12.0+PTX"
fi
@ -17,8 +15,6 @@ fi
# Compress the fatbin with -compress-mode=size for CUDA 13
if [[ "$DESIRED_CUDA" == *"13"* ]]; then
export TORCH_NVCC_FLAGS="-compress-mode=size"
# Bundle ptxas into the cu13 wheel, see https://github.com/pytorch/pytorch/issues/163801
export BUILD_BUNDLE_PTXAS=1
fi
SCRIPTPATH="$( cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )"
@ -35,7 +31,8 @@ pip install -r /pytorch/requirements.txt
pip install auditwheel==6.2.0 wheel
if [ "$DESIRED_CUDA" = "cpu" ]; then
echo "BASE_CUDA_VERSION is not set. Building cpu wheel."
python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn
#USE_PRIORITIZED_TEXT_FOR_LD for enable linker script optimization https://github.com/pytorch/pytorch/pull/121975/files
USE_PRIORITIZED_TEXT_FOR_LD=1 python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn
else
echo "BASE_CUDA_VERSION is set to: $DESIRED_CUDA"
export USE_SYSTEM_NCCL=1
@ -49,5 +46,6 @@ else
export USE_NVIDIA_PYPI_LIBS=1
fi
python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn --enable-cuda
#USE_PRIORITIZED_TEXT_FOR_LD for enable linker script optimization https://github.com/pytorch/pytorch/pull/121975/files
USE_PRIORITIZED_TEXT_FOR_LD=1 python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn --enable-cuda
fi

View File

@ -13,6 +13,49 @@ def list_dir(path: str) -> list[str]:
return check_output(["ls", "-1", path]).decode().split("\n")
def build_ArmComputeLibrary() -> None:
"""
Using ArmComputeLibrary for aarch64 PyTorch
"""
print("Building Arm Compute Library")
acl_build_flags = [
"debug=0",
"neon=1",
"opencl=0",
"os=linux",
"openmp=1",
"cppthreads=0",
"arch=armv8a",
"multi_isa=1",
"fixed_format_kernels=1",
"build=native",
]
acl_install_dir = "/acl"
acl_checkout_dir = os.getenv("ACL_SOURCE_DIR", "ComputeLibrary")
if os.path.isdir(acl_install_dir):
shutil.rmtree(acl_install_dir)
if not os.path.isdir(acl_checkout_dir) or not len(os.listdir(acl_checkout_dir)):
check_call(
[
"git",
"clone",
"https://github.com/ARM-software/ComputeLibrary.git",
"-b",
"v25.02",
"--depth",
"1",
"--shallow-submodules",
]
)
check_call(
["scons", "Werror=1", f"-j{os.cpu_count()}"] + acl_build_flags,
cwd=acl_checkout_dir,
)
for d in ["arm_compute", "include", "utils", "support", "src", "build"]:
shutil.copytree(f"{acl_checkout_dir}/{d}", f"{acl_install_dir}/{d}")
def replace_tag(filename) -> None:
with open(filename) as f:
lines = f.readlines()
@ -274,7 +317,7 @@ if __name__ == "__main__":
).decode()
print("Building PyTorch wheel")
build_vars = ""
build_vars = "CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000 "
# MAX_JOB=5 is not required for CPU backend (see commit 465d98b)
if enable_cuda:
build_vars += "MAX_JOBS=5 "
@ -313,17 +356,23 @@ if __name__ == "__main__":
build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={branch[1 : branch.find('-')]} PYTORCH_BUILD_NUMBER=1 "
if enable_mkldnn:
build_ArmComputeLibrary()
print("build pytorch with mkldnn+acl backend")
build_vars += "USE_MKLDNN=ON USE_MKLDNN_ACL=ON "
build_vars += "ACL_ROOT_DIR=/acl "
build_vars += (
"USE_MKLDNN=ON USE_MKLDNN_ACL=ON "
"ACL_ROOT_DIR=/acl "
"LD_LIBRARY_PATH=/pytorch/build/lib:/acl/build:$LD_LIBRARY_PATH "
"ACL_INCLUDE_DIR=/acl/build "
"ACL_LIBRARY=/acl/build "
)
if enable_cuda:
build_vars += "BLAS=NVPL "
else:
build_vars += "BLAS=OpenBLAS OpenBLAS_HOME=/opt/OpenBLAS "
build_vars += "BLAS=OpenBLAS OpenBLAS_HOME=/OpenBLAS "
else:
print("build pytorch without mkldnn backend")
os.system(f"cd /pytorch; {build_vars} python3 -m build --wheel --no-isolation")
os.system(f"cd /pytorch; {build_vars} python3 setup.py bdist_wheel")
if enable_cuda:
print("Updating Cuda Dependency")
filename = os.listdir("/pytorch/dist/")

View File

@ -241,7 +241,7 @@ def wait_for_connection(addr, port, timeout=15, attempt_cnt=5):
try:
with socket.create_connection((addr, port), timeout=timeout):
return
except (ConnectionRefusedError, TimeoutError): # noqa: PERF203
except (ConnectionRefusedError, socket.timeout): # noqa: PERF203
if i == attempt_cnt - 1:
raise
time.sleep(timeout)
@ -299,6 +299,40 @@ def install_condaforge_python(host: RemoteHost, python_version="3.8") -> None:
)
def build_OpenBLAS(host: RemoteHost, git_clone_flags: str = "") -> None:
print("Building OpenBLAS")
host.run_cmd(
f"git clone https://github.com/xianyi/OpenBLAS -b v0.3.28 {git_clone_flags}"
)
make_flags = "NUM_THREADS=64 USE_OPENMP=1 NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=ARMV8"
host.run_cmd(
f"pushd OpenBLAS && make {make_flags} -j8 && sudo make {make_flags} install && popd && rm -rf OpenBLAS"
)
def build_ArmComputeLibrary(host: RemoteHost, git_clone_flags: str = "") -> None:
print("Building Arm Compute Library")
acl_build_flags = " ".join(
[
"debug=0",
"neon=1",
"opencl=0",
"os=linux",
"openmp=1",
"cppthreads=0",
"arch=armv8a",
"multi_isa=1",
"fixed_format_kernels=1",
"build=native",
]
)
host.run_cmd(
f"git clone https://github.com/ARM-software/ComputeLibrary.git -b v25.02 {git_clone_flags}"
)
host.run_cmd(f"cd ComputeLibrary && scons Werror=1 -j8 {acl_build_flags}")
def embed_libgomp(host: RemoteHost, use_conda, wheel_name) -> None:
host.run_cmd("pip3 install auditwheel")
host.run_cmd(
@ -408,7 +442,7 @@ def build_torchvision(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"
host.run_cmd(f"cd vision && {build_vars} python3 -m build --wheel --no-isolation")
host.run_cmd(f"cd vision && {build_vars} python3 setup.py bdist_wheel")
vision_wheel_name = host.list_dir("vision/dist")[0]
embed_libgomp(host, use_conda, os.path.join("vision", "dist", vision_wheel_name))
@ -463,7 +497,7 @@ def build_torchdata(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"
host.run_cmd(f"cd data && {build_vars} python3 -m build --wheel --no-isolation")
host.run_cmd(f"cd data && {build_vars} python3 setup.py bdist_wheel")
wheel_name = host.list_dir("data/dist")[0]
embed_libgomp(host, use_conda, os.path.join("data", "dist", wheel_name))
@ -519,7 +553,7 @@ def build_torchtext(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"
host.run_cmd(f"cd text && {build_vars} python3 -m build --wheel --no-isolation")
host.run_cmd(f"cd text && {build_vars} python3 setup.py bdist_wheel")
wheel_name = host.list_dir("text/dist")[0]
embed_libgomp(host, use_conda, os.path.join("text", "dist", wheel_name))
@ -580,7 +614,7 @@ def build_torchaudio(
host.run_cmd(
f"cd audio && export FFMPEG_ROOT=$(pwd)/third_party/ffmpeg && export USE_FFMPEG=1 \
&& ./packaging/ffmpeg/build.sh \
&& {build_vars} python3 -m build --wheel --no-isolation"
&& {build_vars} python3 setup.py bdist_wheel"
)
wheel_name = host.list_dir("audio/dist")[0]
@ -666,6 +700,7 @@ def start_build(
configure_system(
host, compiler=compiler, use_conda=use_conda, python_version=python_version
)
build_OpenBLAS(host, git_clone_flags)
if host.using_docker():
print("Move libgfortant.a into a standard location")
@ -688,12 +723,10 @@ def start_build(
f"git clone --recurse-submodules -b {branch} https://github.com/pytorch/pytorch {git_clone_flags}"
)
host.run_cmd("pytorch/.ci/docker/common/install_openblas.sh")
print("Building PyTorch wheel")
build_opts = ""
if pytorch_build_number is not None:
build_opts += f" -C--build-option=--build-number={pytorch_build_number}"
build_opts += f" --build-number {pytorch_build_number}"
# Breakpad build fails on aarch64
build_vars = "USE_BREAKPAD=0 "
if branch == "nightly":
@ -710,18 +743,15 @@ def start_build(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"
if enable_mkldnn:
host.run_cmd("pytorch/.ci/docker/common/install_acl.sh")
build_ArmComputeLibrary(host, git_clone_flags)
print("build pytorch with mkldnn+acl backend")
build_vars += " USE_MKLDNN=ON USE_MKLDNN_ACL=ON"
build_vars += " BLAS=OpenBLAS"
build_vars += " OpenBLAS_HOME=/opt/OpenBLAS"
build_vars += " ACL_ROOT_DIR=/acl"
host.run_cmd(
f"cd $HOME/pytorch && {build_vars} python3 -m build --wheel --no-isolation{build_opts}"
f"cd $HOME/pytorch && export ACL_ROOT_DIR=$HOME/ComputeLibrary && {build_vars} python3 setup.py bdist_wheel{build_opts}"
)
print("Repair the wheel")
pytorch_wheel_name = host.list_dir("pytorch/dist")[0]
ld_library_path = "/acl/build:$HOME/pytorch/build/lib"
ld_library_path = "$HOME/acl/build:$HOME/pytorch/build/lib"
host.run_cmd(
f"export LD_LIBRARY_PATH={ld_library_path} && auditwheel repair $HOME/pytorch/dist/{pytorch_wheel_name}"
)
@ -733,7 +763,7 @@ def start_build(
else:
print("build pytorch without mkldnn backend")
host.run_cmd(
f"cd pytorch && {build_vars} python3 -m build --wheel --no-isolation{build_opts}"
f"cd pytorch && {build_vars} python3 setup.py bdist_wheel{build_opts}"
)
print("Deleting build folder")
@ -877,7 +907,7 @@ def terminate_instances(instance_type: str) -> None:
def parse_arguments():
from argparse import ArgumentParser
parser = ArgumentParser("Build and test AARCH64 wheels using EC2")
parser = ArgumentParser("Builid and test AARCH64 wheels using EC2")
parser.add_argument("--key-name", type=str)
parser.add_argument("--debug", action="store_true")
parser.add_argument("--build-only", action="store_true")
@ -974,7 +1004,7 @@ if __name__ == "__main__":
install_condaforge_python(host, args.python_version)
sys.exit(0)
python_version = args.python_version if args.python_version is not None else "3.10"
python_version = args.python_version if args.python_version is not None else "3.9"
if args.use_torch_from_pypi:
configure_system(host, compiler=args.compiler, python_version=python_version)

View File

@ -20,7 +20,7 @@ ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -m pip install cmake==3.18.4 && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake3
RUN rm -rf /usr/local/cuda-*
@ -69,8 +69,7 @@ RUN bash ./install_cuda.sh 13.0
ENV DESIRED_CUDA=13.0
FROM ${ROCM_IMAGE} as rocm
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ENV PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
ENV MKLROOT /opt/intel

View File

@ -36,12 +36,6 @@ case ${DOCKER_TAG_PREFIX} in
;;
rocm*)
BASE_TARGET=rocm
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
# add gfx950, gfx115x conditionally starting in ROCm 7.0
if [[ "$ROCM_VERSION" == *"7.0"* ]]; then
PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950;gfx1150;gfx1151"
fi
EXTRA_BUILD_ARGS="${EXTRA_BUILD_ARGS} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}"
;;
*)
echo "ERROR: Unknown docker tag ${DOCKER_TAG_PREFIX}"

View File

@ -84,8 +84,8 @@ fi
_UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
_UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
if [[ "$image" == *rocm* ]]; then
_UCX_COMMIT=29831d319e6be55cb8c768ca61de335c934ca39e
_UCC_COMMIT=9f4b242cbbd8b1462cbc732eb29316cdfa124b77
_UCX_COMMIT=cc312eaa4655c0cc5c2bcd796db938f90563bcf6
_UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d
fi
tag=$(echo $image | awk -F':' '{print $2}')
@ -113,7 +113,6 @@ case "$tag" in
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
TRITON=yes
INSTALL_MINGW=yes
;;
pytorch-linux-jammy-cuda13.0-cudnn9-py3-gcc11)
CUDA_VERSION=13.0.0
@ -176,16 +175,27 @@ case "$tag" in
fi
GCC_VERSION=11
VISION=yes
ROCM_VERSION=6.4
NINJA_VERSION=1.9.0
TRITON=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
if [[ $tag =~ "benchmarks" ]]; then
INDUCTOR_BENCHMARKS=yes
fi
;;
pytorch-linux-noble-rocm-alpha-py3)
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=11
VISION=yes
ROCM_VERSION=7.0
NINJA_VERSION=1.9.0
TRITON=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx950;gfx1100"
if [[ $tag =~ "benchmarks" ]]; then
INDUCTOR_BENCHMARKS=yes
fi
PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx950"
;;
pytorch-linux-jammy-xpu-n-1-py3)
ANACONDA_PYTHON_VERSION=3.10
@ -345,7 +355,7 @@ docker build \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx90a;gfx942}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \
@ -362,7 +372,6 @@ docker build \
--build-arg "OPENBLAS=${OPENBLAS:-}" \
--build-arg "SKIP_SCCACHE_INSTALL=${SKIP_SCCACHE_INSTALL:-}" \
--build-arg "SKIP_LLVM_SRC_BUILD_INSTALL=${SKIP_LLVM_SRC_BUILD_INSTALL:-}" \
--build-arg "INSTALL_MINGW=${INSTALL_MINGW:-}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \
@ -443,3 +452,12 @@ elif [ "$HAS_TRITON" = "yes" ]; then
echo "expecting triton to not be installed, but it is"
exit 1
fi
# Sanity check cmake version. Executorch reinstalls cmake and I'm not sure if
# they support 4.0.0 yet, so exclude them from this check.
CMAKE_VERSION=$(drun cmake --version)
if [[ "$EXECUTORCH" != *yes* && "$CMAKE_VERSION" != *4.* ]]; then
echo "CMake version is not 4.0.0:"
drun cmake --version
exit 1
fi

View File

@ -56,13 +56,9 @@ ENV INSTALLED_VISION ${VISION}
# Install rocm
ARG ROCM_VERSION
RUN mkdir ci_commit_pins
COPY ./common/common_utils.sh common_utils.sh
COPY ./ci_commit_pins/rocm-composable-kernel.txt ci_commit_pins/rocm-composable-kernel.txt
COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh common_utils.sh
RUN rm -r ci_commit_pins
RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION}
RUN rm install_rocm_magma.sh

View File

@ -1 +1 @@
deb42f2a8e48f5032b4a98ee781a15fa87a157cf
56392aa978594cc155fa8af48cd949f5b5f1823a

View File

@ -1,2 +1,2 @@
transformers==4.56.0
transformers==4.54.0
soxr==0.5.0

View File

@ -1 +1 @@
v2.27.5-1
v2.27.5-1

View File

@ -1 +0,0 @@
7fe50dc3da2069d6645d9deb8c017a876472a977

View File

@ -1 +1 @@
7416ffcb92cdbe98d9f97e4e6f95247e46dfc9fd
bbb06c0334a6772b92d24bde54956e675c8c6604

27
.ci/docker/common/install_acl.sh Executable file → Normal file
View File

@ -1,27 +1,16 @@
#!/bin/bash
# Script used only in CD pipeline
set -euo pipefail
set -eux
ACL_VERSION=${ACL_VERSION:-"v25.02"}
ACL_INSTALL_DIR="/acl"
readonly version=v25.02
readonly src_host=https://github.com/ARM-software
readonly src_repo=ComputeLibrary
# Clone ACL
git clone https://github.com/ARM-software/ComputeLibrary.git -b "${ACL_VERSION}" --depth 1 --shallow-submodules
[[ ! -d ${src_repo} ]] && git clone ${src_host}/${src_repo}.git
cd ${src_repo}
git checkout $version
ACL_CHECKOUT_DIR="ComputeLibrary"
# Build with scons
pushd $ACL_CHECKOUT_DIR
scons -j8 Werror=0 debug=0 neon=1 opencl=0 embed_kernels=0 \
os=linux arch=armv8a build=native multi_isa=1 \
fixed_format_kernels=1 openmp=1 cppthreads=0
popd
# Install ACL
sudo mkdir -p ${ACL_INSTALL_DIR}
for d in arm_compute include utils support src build
do
sudo cp -r ${ACL_CHECKOUT_DIR}/${d} ${ACL_INSTALL_DIR}/${d}
done
rm -rf $ACL_CHECKOUT_DIR

View File

@ -42,27 +42,22 @@ install_pip_dependencies() {
# A workaround, ExecuTorch has moved to numpy 2.0 which is not compatible with the current
# numba and scipy version used in PyTorch CI
conda_run pip uninstall -y numba scipy
# Yaspin is needed for running CI test (get_benchmark_analysis_data.py)
pip_install yaspin==3.1.0
popd
}
setup_executorch() {
pushd executorch
export PYTHON_EXECUTABLE=python
export CMAKE_ARGS="-DEXECUTORCH_BUILD_PYBIND=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON -DEXECUTORCH_BUILD_TESTS=ON"
export CMAKE_ARGS="-DEXECUTORCH_BUILD_PYBIND=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
as_jenkins .ci/scripts/setup-linux.sh --build-tool cmake || true
popd
}
if [ $# -eq 0 ]; then
clone_executorch
install_buck2
install_conda_dependencies
install_pip_dependencies
pushd executorch
setup_executorch
popd
else
"$@"
fi
clone_executorch
install_buck2
install_conda_dependencies
install_pip_dependencies
setup_executorch

View File

@ -25,7 +25,7 @@ function install_torchbench() {
python install.py --continue_on_fail
echo "Print all dependencies after TorchBench is installed"
python -m pip freeze
python -mpip freeze
popd
chown -R jenkins torchbench

View File

@ -1,10 +0,0 @@
#!/bin/bash
set -ex
# Install MinGW-w64 for Windows cross-compilation
apt-get update
apt-get install -y g++-mingw-w64-x86-64-posix
echo "MinGW-w64 installed successfully"
x86_64-w64-mingw32-g++ --version

View File

@ -8,8 +8,8 @@ MKLROOT=/opt/intel
mkdir -p ${MKLROOT}
pushd /tmp
python3 -m pip install wheel
python3 -m pip download -d . mkl-static==${MKL_VERSION}
python3 -mpip install wheel
python3 -mpip download -d . mkl-static==${MKL_VERSION}
python3 -m wheel unpack mkl_static-${MKL_VERSION}-py2.py3-none-manylinux1_x86_64.whl
python3 -m wheel unpack mkl_include-${MKL_VERSION}-py2.py3-none-manylinux1_x86_64.whl
mv mkl_static-${MKL_VERSION}/mkl_static-${MKL_VERSION}.data/data/lib ${MKLROOT}

View File

@ -19,8 +19,8 @@ pip_install \
transformers==4.36.2
pip_install coloredlogs packaging
pip_install onnxruntime==1.23.0
pip_install onnxscript==0.5.4
pip_install onnxruntime==1.22.1
pip_install onnxscript==0.4.0
# Cache the transformers model to be used later by ONNX tests. We need to run the transformers
# package to download the model. By default, the model is cached at ~/.cache/huggingface/hub/

12
.ci/docker/common/install_openblas.sh Executable file → Normal file
View File

@ -3,10 +3,8 @@
set -ex
OPENBLAS_VERSION=${OPENBLAS_VERSION:-"v0.3.30"}
# Clone OpenBLAS
git clone https://github.com/OpenMathLib/OpenBLAS.git -b "${OPENBLAS_VERSION}" --depth 1 --shallow-submodules
cd /
git clone https://github.com/OpenMathLib/OpenBLAS.git -b "${OPENBLAS_VERSION:-v0.3.30}" --depth 1 --shallow-submodules
OPENBLAS_CHECKOUT_DIR="OpenBLAS"
OPENBLAS_BUILD_FLAGS="
@ -19,7 +17,5 @@ CFLAGS=-O3
BUILD_BFLOAT16=1
"
make -j8 ${OPENBLAS_BUILD_FLAGS} -C $OPENBLAS_CHECKOUT_DIR
sudo make install -C $OPENBLAS_CHECKOUT_DIR
rm -rf $OPENBLAS_CHECKOUT_DIR
make -j8 ${OPENBLAS_BUILD_FLAGS} -C ${OPENBLAS_CHECKOUT_DIR}
make -j8 ${OPENBLAS_BUILD_FLAGS} install -C ${OPENBLAS_CHECKOUT_DIR}

View File

@ -11,5 +11,5 @@ ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python
python -m venv /var/lib/jenkins/ci_env
source /var/lib/jenkins/ci_env/bin/activate
python -m pip install --upgrade pip
python -m pip install -r /opt/requirements-ci.txt
python -mpip install --upgrade pip
python -mpip install -r /opt/requirements-ci.txt

View File

@ -2,11 +2,6 @@
set -ex
# for pip_install function
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
ROCM_COMPOSABLE_KERNEL_VERSION="$(cat $(dirname $0)/../ci_commit_pins/rocm-composable-kernel.txt)"
ver() {
printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ');
}
@ -42,6 +37,12 @@ EOF
rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu"
# Special case for ROCM_VERSION == 7.0
if [[ $(ver "$ROCM_VERSION") -eq $(ver 7.0) ]]; then
rocm_baseurl="https://repo.radeon.com/rocm/apt/7.0_alpha2"
amdgpu_baseurl="https://repo.radeon.com/amdgpu/30.10_alpha2/ubuntu"
fi
# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
@ -112,8 +113,6 @@ EOF
rm -rf HIP clr
fi
pip_install "git+https://github.com/rocm/composable_kernel@$ROCM_COMPOSABLE_KERNEL_VERSION"
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
@ -177,8 +176,6 @@ install_centos() {
sqlite3 $kdb "PRAGMA journal_mode=off; PRAGMA VACUUM;"
done
pip_install "git+https://github.com/rocm/composable_kernel@$ROCM_COMPOSABLE_KERNEL_VERSION"
# Cleanup
yum clean all
rm -rf /var/cache/yum

View File

@ -12,8 +12,8 @@ function do_install() {
rocm_version_nodot=${rocm_version//./}
# https://github.com/icl-utk-edu/magma/pull/65
MAGMA_VERSION=d6e4117bc88e73f06d26c6c2e14f064e8fc3d1ec
# Version 2.7.2 + ROCm related updates
MAGMA_VERSION=a1625ff4d9bc362906bd01f805dbbe12612953f6
magma_archive="magma-rocm${rocm_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"
rocm_dir="/opt/rocm"

View File

@ -66,15 +66,15 @@ if [ -n "${UBUNTU_VERSION}" ] && [ -n "${GCC_VERSION}" ] && [[ "${GCC_VERSION}"
# Triton needs at least gcc-9 to build
apt-get install -y g++-9
CXX=g++-9 conda_run python -m build --wheel --no-isolation
CXX=g++-9 conda_run python setup.py bdist_wheel
elif [ -n "${UBUNTU_VERSION}" ] && [ -n "${CLANG_VERSION}" ]; then
# Triton needs <filesystem> which surprisingly is not available with clang-9 toolchain
add-apt-repository -y ppa:ubuntu-toolchain-r/test
apt-get install -y g++-9
CXX=g++-9 conda_run python -m build --wheel --no-isolation
CXX=g++-9 conda_run python setup.py bdist_wheel
else
conda_run python -m build --wheel --no-isolation
conda_run python setup.py bdist_wheel
fi
# Copy the wheel to /opt for multi stage docker builds

View File

@ -1,9 +0,0 @@
#!/bin/bash
set -xe
# Script used in Linux x86 and aarch64 CD pipeline
# Workaround for exposing statically linked libstdc++ CXX11 ABI symbols.
# see: https://github.com/pytorch/pytorch/issues/133437
LIBNONSHARED=$(gcc -print-file-name=libstdc++_nonshared.a)
nm -g $LIBNONSHARED | grep " T " | grep recursive_directory_iterator | cut -c 20- > weaken-symbols.txt
objcopy --weaken-symbols weaken-symbols.txt $LIBNONSHARED $LIBNONSHARED

View File

@ -40,16 +40,12 @@ case ${DOCKER_TAG_PREFIX} in
;;
rocm*)
# we want the patch version of 6.4 instead
if [[ "$GPU_ARCH_VERSION" == *"6.4"* ]]; then
if [[ $(ver $GPU_ARCH_VERSION) -eq $(ver 6.4) ]]; then
GPU_ARCH_VERSION="${GPU_ARCH_VERSION}.2"
fi
BASE_TARGET=rocm
GPU_IMAGE=rocm/dev-ubuntu-22.04:${GPU_ARCH_VERSION}-complete
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
# add gfx950, gfx115x conditionally starting in ROCm 7.0
if [[ "$GPU_ARCH_VERSION" == *"7.0"* ]]; then
PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950;gfx1150;gfx1151"
fi
DOCKER_GPU_BUILD_ARG="--build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg ROCM_VERSION=${GPU_ARCH_VERSION}"
;;
*)

View File

@ -14,7 +14,7 @@ ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/op
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -m pip install cmake==3.18.4 && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake3
FROM base as openssl
@ -130,12 +130,11 @@ ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/op
RUN for cpython_version in "cp312-cp312" "cp313-cp313" "cp313-cp313t"; do \
/opt/python/${cpython_version}/bin/python -m pip install setuptools wheel; \
done;
ADD ./common/patch_libstdc.sh patch_libstdc.sh
RUN bash ./patch_libstdc.sh && rm patch_libstdc.sh
# cmake-3.18.4 from pip; force in case cmake3 already exists
RUN yum install -y python3-pip && \
python3 -m pip install cmake==3.18.4 && \
python3 -mpip install cmake==3.18.4 && \
ln -sf /usr/local/bin/cmake /usr/bin/cmake3
FROM cpu_final as cuda_final
@ -157,7 +156,7 @@ ENV ROCM_PATH /opt/rocm
# cmake-3.28.4 from pip to get enable_language(HIP)
# and avoid 3.21.0 cmake+ninja issues with ninja inserting "-Wl,--no-as-needed" in LINK_FLAGS for static linker
RUN python3 -m pip install --upgrade pip && \
python3 -m pip install cmake==3.28.4
python3 -mpip install cmake==3.28.4
# replace the libdrm in /opt/amdgpu with custom amdgpu.ids lookup path
ADD ./common/install_rocm_drm.sh install_rocm_drm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
@ -174,7 +173,7 @@ FROM cpu_final as xpu_final
ENV XPU_DRIVER_TYPE ROLLING
# cmake-3.28.4 from pip
RUN python3 -m pip install --upgrade pip && \
python3 -m pip install cmake==3.28.4
python3 -mpip install cmake==3.28.4
ADD ./common/install_xpu.sh install_xpu.sh
ENV XPU_VERSION 2025.2
RUN bash ./install_xpu.sh && rm install_xpu.sh

View File

@ -62,13 +62,6 @@ ARG OPENBLAS_VERSION
ADD ./common/install_openblas.sh install_openblas.sh
RUN bash ./install_openblas.sh && rm install_openblas.sh
# Install Arm Compute Library
FROM base as arm_compute
# use python3.9 to install scons
RUN python3.9 -m pip install scons==4.7.0
RUN ln -sf /opt/python/cp39-cp39/bin/scons /usr/local/bin
COPY ./common/install_acl.sh install_acl.sh
RUN bash ./install_acl.sh && rm install_acl.sh
FROM base as final
# remove unnecessary python versions
@ -77,7 +70,4 @@ RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
COPY --from=openblas /opt/OpenBLAS/ /opt/OpenBLAS/
COPY --from=arm_compute /acl /acl
ENV LD_LIBRARY_PATH=/opt/OpenBLAS/lib:/acl/build/:$LD_LIBRARY_PATH
ADD ./common/patch_libstdc.sh patch_libstdc.sh
RUN bash ./patch_libstdc.sh && rm patch_libstdc.sh
ENV LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH

View File

@ -86,15 +86,6 @@ FROM base as nvpl
ADD ./common/install_nvpl.sh install_nvpl.sh
RUN bash ./install_nvpl.sh && rm install_nvpl.sh
# Install Arm Compute Library
FROM base as arm_compute
# use python3.9 to install scons
RUN python3.9 -m pip install scons==4.7.0
RUN ln -sf /opt/python/cp39-cp39/bin/scons /usr/local/bin
COPY ./common/install_acl.sh install_acl.sh
RUN bash ./install_acl.sh && rm install_acl.sh
FROM base as final
FROM final as cuda_final
ARG BASE_CUDA_VERSION
RUN rm -rf /usr/local/cuda-${BASE_CUDA_VERSION}
@ -102,9 +93,5 @@ COPY --from=cuda /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BAS
COPY --from=magma /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=nvpl /opt/nvpl/lib/ /usr/local/lib/
COPY --from=nvpl /opt/nvpl/include/ /usr/local/include/
COPY --from=arm_compute /acl /acl
RUN ln -sf /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda
ENV PATH=/usr/local/cuda/bin:$PATH
ENV LD_LIBRARY_PATH=/acl/build/:$LD_LIBRARY_PATH
ADD ./common/patch_libstdc.sh patch_libstdc.sh
RUN bash ./patch_libstdc.sh && rm patch_libstdc.sh

View File

@ -0,0 +1,71 @@
FROM centos:8 as base
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ENV PATH /opt/rh/gcc-toolset-11/root/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# change to a valid repo
RUN sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-Linux-*.repo
# enable to install ninja-build
RUN sed -i 's|enabled=0|enabled=1|g' /etc/yum.repos.d/CentOS-Linux-PowerTools.repo
RUN yum -y update
RUN yum install -y wget curl perl util-linux xz bzip2 git patch which zlib-devel sudo
RUN yum install -y autoconf automake make cmake gdb gcc-toolset-11-gcc-c++
FROM base as openssl
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
# Install python
FROM base as python
RUN yum install -y openssl-devel zlib-devel bzip2-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel libpcap-devel xz-devel libffi-devel
ADD common/install_cpython.sh install_cpython.sh
RUN bash ./install_cpython.sh && rm install_cpython.sh
FROM base as conda
ADD ./common/install_conda_docker.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN /opt/conda/bin/conda install -y cmake
FROM base as intel
# Install MKL
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
COPY --from=conda /opt/conda /opt/conda
ENV PATH=/opt/conda/bin:$PATH
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM base as patchelf
ADD ./common/install_patchelf.sh install_patchelf.sh
RUN bash ./install_patchelf.sh && rm install_patchelf.sh
RUN cp $(which patchelf) /patchelf
FROM base as jni
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
FROM base as libpng
ADD ./common/install_libpng.sh install_libpng.sh
RUN bash ./install_libpng.sh && rm install_libpng.sh
FROM base as final
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
COPY --from=intel /opt/intel /opt/intel
COPY --from=conda /opt/conda /opt/conda
COPY --from=patchelf /usr/local/bin/patchelf /usr/local/bin/patchelf
COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h
COPY --from=libpng /usr/local/bin/png* /usr/local/bin/
COPY --from=libpng /usr/local/bin/libpng* /usr/local/bin/
COPY --from=libpng /usr/local/include/png* /usr/local/include/
COPY --from=libpng /usr/local/include/libpng* /usr/local/include/
COPY --from=libpng /usr/local/lib/libpng* /usr/local/lib/
COPY --from=libpng /usr/local/lib/pkgconfig /usr/local/lib/pkgconfig
RUN yum install -y ninja-build

View File

@ -113,10 +113,7 @@ RUN dnf install -y \
RUN env GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True pip3 install grpcio
# cmake-3.28.0 from pip for onnxruntime
RUN python3 -m pip install cmake==3.28.0
ADD ./common/patch_libstdc.sh patch_libstdc.sh
RUN bash ./patch_libstdc.sh && rm patch_libstdc.sh
RUN python3 -mpip install cmake==3.28.0
# build onnxruntime 1.21.0 from sources.
# it is not possible to build it from sources using pip,

View File

@ -28,7 +28,6 @@ fi
MANY_LINUX_VERSION=${MANY_LINUX_VERSION:-}
DOCKERFILE_SUFFIX=${DOCKERFILE_SUFFIX:-}
OPENBLAS_VERSION=${OPENBLAS_VERSION:-}
ACL_VERSION=${ACL_VERSION:-}
case ${image} in
manylinux2_28-builder:cpu)
@ -42,6 +41,13 @@ case ${image} in
GPU_IMAGE=arm64v8/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=13 --build-arg NINJA_VERSION=1.12.1"
MANY_LINUX_VERSION="2_28_aarch64"
OPENBLAS_VERSION="v0.3.30"
;;
manylinuxcxx11-abi-builder:cpu-cxx11-abi)
TARGET=final
GPU_IMAGE=""
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=9"
MANY_LINUX_VERSION="cxx11-abi"
;;
manylinuxs390x-builder:cpu-s390x)
TARGET=final
@ -76,7 +82,7 @@ case ${image} in
;;
manylinux2_28-builder:rocm*)
# we want the patch version of 6.4 instead
if [[ "$GPU_ARCH_VERSION" == *"6.4"* ]]; then
if [[ $(ver $GPU_ARCH_VERSION) -eq $(ver 6.4) ]]; then
GPU_ARCH_VERSION="${GPU_ARCH_VERSION}.2"
fi
TARGET=rocm_final
@ -84,10 +90,6 @@ case ${image} in
DEVTOOLSET_VERSION="11"
GPU_IMAGE=rocm/dev-almalinux-8:${GPU_ARCH_VERSION}-complete
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
# add gfx950, gfx115x conditionally starting in ROCm 7.0
if [[ "$GPU_ARCH_VERSION" == *"7.0"* ]]; then
PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950;gfx1150;gfx1151"
fi
DOCKER_GPU_BUILD_ARG="--build-arg ROCM_VERSION=${GPU_ARCH_VERSION} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}"
;;
manylinux2_28-builder:xpu)
@ -119,8 +121,7 @@ tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
DOCKER_BUILDKIT=1 docker build \
${DOCKER_GPU_BUILD_ARG} \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \
--build-arg "OPENBLAS_VERSION=${OPENBLAS_VERSION:-}" \
--build-arg "ACL_VERSION=${ACL_VERSION:-}" \
--build-arg "OPENBLAS_VERSION=${OPENBLAS_VERSION}" \
--target "${TARGET}" \
-t "${tmp_tag}" \
$@ \

View File

@ -10,6 +10,11 @@ BAD_SSL = "https://self-signed.badssl.com"
print("Testing SSL certificate checking for Python:", sys.version)
if sys.version_info[:2] < (2, 7) or sys.version_info[:2] < (3, 4):
print("This version never checks SSL certs; skipping tests")
sys.exit(0)
EXC = OSError
print(f"Connecting to {GOOD_SSL} should work")

View File

@ -10,11 +10,6 @@ boto3==1.35.42
#Pinned versions: 1.19.12, 1.16.34
#test that import:
build==1.3.0
#Description: A simple, correct Python build frontend.
#Pinned versions: 1.3.0
#test that import:
click
#Description: Command Line Interface Creation Kit
#Pinned versions:
@ -52,10 +47,10 @@ flatbuffers==24.12.23
#Pinned versions: 24.12.23
#test that import:
hypothesis==6.56.4
hypothesis==5.35.1
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
#Description: advanced library for generating parametrized tests
#Pinned versions: 6.56.4
#Pinned versions: 5.35.1
#test that import: test_xnnpack_integration.py, test_pruning_op.py, test_nn.py
junitparser==2.1.1
@ -98,7 +93,7 @@ librosa==0.10.2 ; python_version == "3.12" and platform_machine != "s390x"
#Pinned versions:
#test that import:
mypy==1.16.0 ; platform_system == "Linux"
mypy==1.16.0 ; platform_system != "Windows"
# Pin MyPy version because new errors are likely to appear with each release
# Skip on Windows as lots of type annotations are POSIX specific
#Description: linter
@ -111,17 +106,20 @@ networkx==2.8.8
#Pinned versions: 2.8.8
#test that import: functorch
ninja==1.11.1.4
ninja==1.11.1.3
#Description: build system. Used in some tests. Used in build to generate build
#time tracing information
#Pinned versions: 1.11.1.4
#Pinned versions: 1.11.1.3
#test that import: run_test.py, test_cpp_extensions_aot.py,test_determination.py
numba==0.49.0 ; python_version < "3.9" and platform_machine != "s390x"
numba==0.55.2 ; python_version == "3.9" and platform_machine != "s390x"
numba==0.55.2 ; python_version == "3.10" and platform_machine != "s390x"
numba==0.60.0 ; python_version == "3.12" and platform_machine != "s390x"
#Description: Just-In-Time Compiler for Numerical Functions
#Pinned versions: 0.55.2, 0.60.0
#Pinned versions: 0.54.1, 0.49.0, <=0.49.1
#test that import: test_numba_integration.py
#For numba issue see https://github.com/pytorch/pytorch/issues/51511
#Need release > 0.61.2 for s390x due to https://github.com/numba/numba/pull/10073
#numpy
@ -136,7 +134,7 @@ numba==0.60.0 ; python_version == "3.12" and platform_machine != "s390x"
#test_nn.py, test_namedtensor.py, test_linalg.py, test_jit_cuda_fuser.py,
#test_jit.py, test_indexing.py, test_datapipe.py, test_dataloader.py,
#test_binary_ufuncs.py
numpy==1.22.4; python_version == "3.10"
numpy==1.22.4; python_version == "3.9" or python_version == "3.10"
numpy==1.26.2; python_version == "3.11" or python_version == "3.12"
numpy==2.1.2; python_version >= "3.13"
@ -168,12 +166,12 @@ optree==0.13.0
pillow==11.0.0
#Description: Python Imaging Library fork
#Pinned versions: 11.0.0
#Pinned versions: 10.3.0
#test that import:
protobuf==5.29.5
protobuf==5.29.4
#Description: Google's data interchange format
#Pinned versions: 5.29.5
#Pinned versions: 5.29.4
#test that import: test_tensorboard.py, test/onnx/*
psutil
@ -216,7 +214,7 @@ pytest-subtests==0.13.1
#Pinned versions:
#test that import:
xdoctest==1.3.0
xdoctest==1.1.0
#Description: runs doctests in pytest
#Pinned versions: 1.1.0
#test that import:
@ -241,9 +239,10 @@ pygments==2.15.0
#Pinned versions: 14.1.0
#test that import:
scikit-image==0.22.0
scikit-image==0.19.3 ; python_version < "3.10"
scikit-image==0.22.0 ; python_version >= "3.10"
#Description: image processing routines
#Pinned versions: 0.22.0
#Pinned versions:
#test that import: test_nn.py
#scikit-learn
@ -266,7 +265,7 @@ scipy==1.14.1 ; python_version >= "3.12"
#test that import:
# needed by torchgen utils
typing-extensions==4.12.2
typing-extensions>=4.10.0
#Description: type hints for python
#Pinned versions:
#test that import:
@ -327,6 +326,8 @@ pywavelets==1.7.0 ; python_version >= "3.12"
lxml==5.3.0
#Description: This is a requirement of unittest-xml-reporting
# Python-3.9 binaries
PyGithub==2.3.0
sympy==1.13.3
@ -339,7 +340,7 @@ onnx==1.18.0
#Pinned versions:
#test that import:
onnxscript==0.5.3
onnxscript==0.4.0
#Description: Required by mypy and test_public_bindings.py when checking torch.onnx._internal
#Pinned versions:
#test that import:
@ -359,10 +360,9 @@ pwlf==2.2.1
#test that import: test_sac_estimator.py
# To build PyTorch itself
pyyaml==6.0.2
pyyaml
pyzstd
setuptools==78.1.1
packaging==23.1
setuptools>=70.1.0
six
scons==4.5.2 ; platform_machine == "aarch64"
@ -377,16 +377,13 @@ dataclasses_json==0.6.7
#Pinned versions: 0.6.7
#test that import:
cmake==3.31.6
cmake==4.0.0
#Description: required for building
tlparse==0.4.0
#Description: required for log parsing
filelock==3.18.0
#Description: required for inductor testing
cuda-bindings>=12.0,<13.0 ; platform_machine != "s390x" and platform_system != "Darwin"
cuda-bindings>=12.0,<13.0 ; platform_machine != "s390x"
#Description: required for testing CUDAGraph::raw_cuda_graph(). See https://nvidia.github.io/cuda-python/cuda-bindings/latest/support.html for how this version was chosen. Note "Any fix in the latest bindings would be backported to the prior major version" means that only the newest version of cuda-bindings will get fixes. Depending on the latest version of 12.x is okay because all 12.y versions will be supported via "CUDA minor version compatibility". Pytorch builds against 13.z versions of cuda toolkit work with 12.x versions of cuda-bindings as well because newer drivers work with old toolkits.
#test that import: test_cuda.py

View File

@ -1,15 +1,8 @@
sphinx==5.3.0
#Description: This is used to generate PyTorch docs
#Pinned versions: 5.3.0
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git@d53b0ffb9b1cda68260693ea98f3483823c88d8e#egg=pytorch_sphinx_theme2
standard-imghdr==3.13.0; python_version >= "3.13"
#Description: This is needed by Sphinx, so it needs to be added here.
# The reasons are as follows:
# 1) This module has been removed from the Python standard library since Python 3.13(https://peps.python.org/pep-0594/#imghdr);
# 2) The current version of Sphinx (5.3.0) is not compatible with Python 3.13.
# Once Sphinx is upgraded to a version compatible with Python 3.13 or later, we can remove this dependency.
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git@71e55749be14ceb56e7f8211a9fb649866b87ad4#egg=pytorch_sphinx_theme2
# TODO: sphinxcontrib.katex 0.9.0 adds a local KaTeX server to speed up pre-rendering
# but it doesn't seem to work and hangs around idly. The initial thought that it is probably
# something related to Docker setup. We can investigate this later.

View File

@ -52,13 +52,9 @@ ENV INSTALLED_VISION ${VISION}
# Install rocm
ARG ROCM_VERSION
RUN mkdir ci_commit_pins
COPY ./common/common_utils.sh common_utils.sh
COPY ./ci_commit_pins/rocm-composable-kernel.txt ci_commit_pins/rocm-composable-kernel.txt
COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh common_utils.sh
RUN rm -r ci_commit_pins
RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION}
RUN rm install_rocm_magma.sh

View File

@ -103,11 +103,6 @@ COPY ci_commit_pins/torchbench.txt torchbench.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface-requirements.txt torchbench.txt
ARG INSTALL_MINGW
COPY ./common/install_mingw.sh install_mingw.sh
RUN if [ -n "${INSTALL_MINGW}" ]; then bash ./install_mingw.sh; fi
RUN rm install_mingw.sh
ARG TRITON
ARG TRITON_CPU

View File

@ -7,4 +7,4 @@ set -ex
SCRIPTPATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
USE_NVSHMEM=0 USE_CUSPARSELT=0 BUILD_PYTHONLESS=1 DESIRED_PYTHON="3.10" ${SCRIPTPATH}/../manywheel/build.sh
USE_NVSHMEM=0 USE_CUSPARSELT=0 BUILD_PYTHONLESS=1 DESIRED_PYTHON="3.9" ${SCRIPTPATH}/../manywheel/build.sh

View File

@ -41,6 +41,7 @@ def sample_vllm_test_library():
"pytest -v -s basic_correctness/test_cumem.py",
"pytest -v -s basic_correctness/test_basic_correctness.py",
"pytest -v -s basic_correctness/test_cpu_offload.py",
"VLLM_TEST_ENABLE_ARTIFICIAL_PREEMPT=1 pytest -v -s basic_correctness/test_preemption.py",
],
},
"vllm_basic_models_test": {
@ -67,12 +68,15 @@ def sample_vllm_test_library():
"-v",
"-s",
"entrypoints/llm",
"--ignore=entrypoints/llm/test_lazy_outlines.py",
"--ignore=entrypoints/llm/test_generate.py",
"--ignore=entrypoints/llm/test_generate_multiple_loras.py",
"--ignore=entrypoints/llm/test_collective_rpc.py",
]
),
"pytest -v -s entrypoints/llm/test_generate.py",
"pytest -v -s entrypoints/offline_mode",
"pytest -v -s entrypoints/llm/test_lazy_outlines.py",
"pytest -v -s entrypoints/llm/test_generate.py ",
"VLLM_USE_V1=0 pytest -v -s entrypoints/offline_mode",
],
},
"vllm_regression_test": {
@ -143,7 +147,7 @@ def sample_vllm_test_library():
"pytest -v -s compile/test_decorator.py",
],
},
"vllm_language_model_test_extended_generation_28_failure_test": {
"vllm_languagde_model_test_extended_generation_28_failure_test": {
"title": "Language Models Test (Extended Generation) 2.8 release failure",
"id": "vllm_languagde_model_test_extended_generation_28_failure_test",
"package_install": [

View File

@ -63,12 +63,7 @@ class VllmBuildParameters:
# DOCKERFILE_PATH: path to Dockerfile used when use_local_dockerfile is True"
use_local_dockerfile: bool = env_bool_field("USE_LOCAL_DOCKERFILE", True)
dockerfile_path: Path = env_path_field(
"DOCKERFILE_PATH", ".github/ci_configs/vllm/Dockerfile"
)
# the cleaning script to remove torch dependencies from pip
cleaning_script: Path = env_path_field(
"cleaning_script", ".github/ci_configs/vllm/use_existing_torch.py"
"DOCKERFILE_PATH", ".github/ci_configs/vllm/Dockerfile.tmp_vllm"
)
# OUTPUT_DIR: where docker buildx (local exporter) will write artifacts
@ -165,7 +160,6 @@ class VllmBuildRunner(BaseRunner):
logger.info("Running vllm build with inputs: %s", inputs)
vllm_commit = clone_vllm()
self.cp_torch_cleaning_script(inputs)
self.cp_dockerfile_if_exist(inputs)
# cp torch wheels from root direct to vllm workspace if exist
self.cp_torch_whls_if_exist(inputs)
@ -211,11 +205,6 @@ class VllmBuildRunner(BaseRunner):
copy(inputs.torch_whls_path, tmp_dir)
return tmp_dir
def cp_torch_cleaning_script(self, inputs: VllmBuildParameters):
script = get_path(inputs.cleaning_script, resolve=True)
vllm_script = Path(f"./{self.work_directory}/use_existing_torch.py")
copy(script, vllm_script)
def cp_dockerfile_if_exist(self, inputs: VllmBuildParameters):
if not inputs.use_local_dockerfile:
logger.info("using vllm default dockerfile.torch_nightly for build")

View File

@ -11,7 +11,7 @@ from typing import Any
from cli.lib.common.cli_helper import BaseRunner
from cli.lib.common.envs_helper import env_path_field, env_str_field, get_env
from cli.lib.common.path_helper import copy, get_path, remove_dir
from cli.lib.common.path_helper import copy, remove_dir
from cli.lib.common.pip_helper import (
pip_install_first_match,
pip_install_packages,
@ -43,10 +43,6 @@ class VllmTestParameters:
torch_cuda_arch_list: str = env_str_field("TORCH_CUDA_ARCH_LIST", "8.9")
cleaning_script: Path = env_path_field(
"cleaning_script", ".github/ci_configs/vllm/use_existing_torch.py"
)
def __post_init__(self):
if not self.torch_whls_path.exists():
raise ValueError("missing torch_whls_path")
@ -96,13 +92,11 @@ class VllmTestRunner(BaseRunner):
self._set_envs(params)
clone_vllm(dst=self.work_directory)
self.cp_torch_cleaning_script(params)
with working_directory(self.work_directory):
remove_dir(Path("vllm"))
self._install_wheels(params)
self._install_dependencies()
# verify the torches are not overridden by test dependencies
check_versions()
def run(self):
@ -131,11 +125,6 @@ class VllmTestRunner(BaseRunner):
# double check the torches are not overridden by other packages
check_versions()
def cp_torch_cleaning_script(self, params: VllmTestParameters):
script = get_path(params.cleaning_script, resolve=True)
vllm_script = Path(f"./{self.work_directory}/use_existing_torch.py")
copy(script, vllm_script)
def _install_wheels(self, params: VllmTestParameters):
logger.info("Running vllm test with inputs: %s", params)
if not pkg_exists("torch"):

View File

@ -1,11 +1,11 @@
SHELL=/usr/bin/env bash
DOCKER_CMD ?= docker
DESIRED_ROCM ?= 7.0
DESIRED_ROCM ?= 6.4
DESIRED_ROCM_SHORT = $(subst .,,$(DESIRED_ROCM))
PACKAGE_NAME = magma-rocm
# inherit this from underlying docker image, do not pass this env var to docker
#PYTORCH_ROCM_ARCH ?= gfx900;gfx906;gfx908;gfx90a;gfx942;gfx950;gfx1030;gfx1100;gfx1101;gfx1102;gfx1150;gfx1151;gfx1200;gfx1201
#PYTORCH_ROCM_ARCH ?= gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201
DOCKER_RUN = set -eou pipefail; ${DOCKER_CMD} run --rm -i \
-v $(shell git rev-parse --show-toplevel)/.ci:/builder \
@ -16,20 +16,20 @@ DOCKER_RUN = set -eou pipefail; ${DOCKER_CMD} run --rm -i \
magma-rocm/build_magma.sh
.PHONY: all
all: magma-rocm70
all: magma-rocm64
all: magma-rocm63
.PHONY:
clean:
$(RM) -r magma-*
$(RM) -r output
.PHONY: magma-rocm70
magma-rocm70: DESIRED_ROCM := 7.0
magma-rocm70:
$(DOCKER_RUN)
.PHONY: magma-rocm64
magma-rocm64: DESIRED_ROCM := 6.4
magma-rocm64:
$(DOCKER_RUN)
.PHONY: magma-rocm63
magma-rocm63: DESIRED_ROCM := 6.3
magma-rocm63:
$(DOCKER_RUN)

View File

@ -6,8 +6,8 @@ set -eou pipefail
# The script expects DESIRED_CUDA and PACKAGE_NAME to be set
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# https://github.com/icl-utk-edu/magma/pull/65
MAGMA_VERSION=d6e4117bc88e73f06d26c6c2e14f064e8fc3d1ec
# Version 2.7.2 + ROCm related updates
MAGMA_VERSION=a1625ff4d9bc362906bd01f805dbbe12612953f6
# Folders for the build
PACKAGE_FILES=${ROOT_DIR}/magma-rocm/package_files # metadata
@ -20,7 +20,7 @@ mkdir -p ${PACKAGE_DIR} ${PACKAGE_OUTPUT}/linux-64 ${PACKAGE_BUILD} ${PACKAGE_RE
# Fetch magma sources and verify checksum
pushd ${PACKAGE_DIR}
git clone https://github.com/jeffdaily/magma
git clone https://bitbucket.org/icl/magma.git
pushd magma
git checkout ${MAGMA_VERSION}
popd

View File

@ -142,7 +142,7 @@ time CMAKE_ARGS=${CMAKE_ARGS[@]} \
EXTRA_CAFFE2_CMAKE_FLAGS=${EXTRA_CAFFE2_CMAKE_FLAGS[@]} \
BUILD_LIBTORCH_CPU_WITH_DEBUG=$BUILD_DEBUG_INFO \
USE_NCCL=${USE_NCCL} USE_RCCL=${USE_RCCL} USE_KINETO=${USE_KINETO} \
python -m build --wheel --no-isolation --outdir /tmp/$WHEELHOUSE_DIR
python setup.py bdist_wheel -d /tmp/$WHEELHOUSE_DIR
echo "Finished setup.py bdist at $(date)"
# Build libtorch packages

View File

@ -187,22 +187,19 @@ if [[ $CUDA_VERSION == 12* || $CUDA_VERSION == 13* ]]; then
export USE_CUFILE=0
else
DEPS_LIST+=(
"/usr/local/cuda/lib64/libnvToolsExt.so.1"
"/usr/local/cuda/lib64/libcublas.so.12"
"/usr/local/cuda/lib64/libcublasLt.so.12"
"/usr/local/cuda/lib64/libcudart.so.12"
"/usr/local/cuda/lib64/libnvrtc.so.12"
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12")
DEPS_SONAME+=(
"libnvToolsExt.so.1"
"libcublas.so.12"
"libcublasLt.so.12"
"libcudart.so.12"
"libnvrtc.so.12"
"libcupti.so.12")
if [[ $CUDA_VERSION != 12.9* ]]; then
DEPS_LIST+=("/usr/local/cuda/lib64/libnvToolsExt.so.1")
DEPS_SONAME+=("libnvToolsExt.so.1")
fi
fi
else
echo "Using nvidia libs from pypi."

View File

@ -104,7 +104,7 @@ if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
export ROCclr_DIR=/opt/rocm/rocclr/lib/cmake/rocclr
fi
echo "Calling -m pip install . -v --no-build-isolation at $(date)"
echo "Calling 'python -m pip install .' at $(date)"
if [[ $LIBTORCH_VARIANT = *"static"* ]]; then
STATIC_CMAKE_FLAG="-DTORCH_STATIC=1"

View File

@ -107,10 +107,6 @@ if [[ $ROCM_INT -ge 60200 ]]; then
ROCM_SO_FILES+=("librocm-core.so")
fi
if [[ $ROCM_INT -ge 70000 ]]; then
ROCM_SO_FILES+=("librocroller.so")
fi
OS_NAME=`awk -F= '/^NAME/{print $2}' /etc/os-release`
if [[ "$OS_NAME" == *"CentOS Linux"* || "$OS_NAME" == *"AlmaLinux"* ]]; then
LIBGOMP_PATH="/usr/lib64/libgomp.so.1"

View File

@ -89,7 +89,7 @@ fi
if [[ "$BUILD_ENVIRONMENT" == *aarch64* ]]; then
export USE_MKLDNN=1
export USE_MKLDNN_ACL=1
export ACL_ROOT_DIR=/acl
export ACL_ROOT_DIR=/ComputeLibrary
fi
if [[ "$BUILD_ENVIRONMENT" == *riscv64* ]]; then
@ -233,9 +233,7 @@ if [[ "${BUILD_ENVIRONMENT}" != *cuda* ]]; then
export BUILD_STATIC_RUNTIME_BENCHMARK=ON
fi
if [[ "$BUILD_ENVIRONMENT" == *-full-debug* ]]; then
export CMAKE_BUILD_TYPE=Debug
elif [[ "$BUILD_ENVIRONMENT" == *-debug* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-debug* ]]; then
export CMAKE_BUILD_TYPE=RelWithAssert
fi
@ -288,24 +286,19 @@ else
# or building non-XLA tests.
if [[ "$BUILD_ENVIRONMENT" != *rocm* && "$BUILD_ENVIRONMENT" != *xla* && "$BUILD_ENVIRONMENT" != *riscv64* ]]; then
# Install numpy-2.0.2 for builds which are backward compatible with 1.X
python -m pip install numpy==2.0.2
python -mpip install numpy==2.0.2
WERROR=1 python setup.py clean
WERROR=1 python -m build --wheel --no-isolation
WERROR=1 python setup.py bdist_wheel
else
python setup.py clean
if [[ "$BUILD_ENVIRONMENT" == *xla* ]]; then
source .ci/pytorch/install_cache_xla.sh
fi
python -m build --wheel --no-isolation
python setup.py bdist_wheel
fi
pip_install_whl "$(echo dist/*.whl)"
if [[ "$BUILD_ENVIRONMENT" == *full-debug* ]]; then
# Regression test for https://github.com/pytorch/pytorch/issues/164297
# Torch should be importable and that's about it
pushd /; python -c "import torch;print(torch.__config__.show(), torch.randn(5) + 1.7)"; popd
fi
if [[ "${BUILD_ADDITIONAL_PACKAGES:-}" == *vision* ]]; then
install_torchvision

View File

@ -67,13 +67,13 @@ function pip_install_whl() {
# Loop through each path and install individually
for path in "${paths[@]}"; do
echo "Installing $path"
python3 -m pip install --no-index --no-deps "$path"
python3 -mpip install --no-index --no-deps "$path"
done
else
# Loop through each argument and install individually
for path in "${args[@]}"; do
echo "Installing $path"
python3 -m pip install --no-index --no-deps "$path"
python3 -mpip install --no-index --no-deps "$path"
done
fi
}
@ -258,19 +258,11 @@ function install_torchrec_and_fbgemm() {
git clone --recursive https://github.com/pytorch/fbgemm
pushd fbgemm/fbgemm_gpu
git checkout "${fbgemm_commit}" --recurse-submodules
# until the fbgemm_commit includes the tbb patch
patch <<'EOF'
--- a/FbgemmGpu.cmake
+++ b/FbgemmGpu.cmake
@@ -184,5 +184,6 @@ gpu_cpp_library(
fbgemm_gpu_tbe_cache
fbgemm_gpu_tbe_optimizers
fbgemm_gpu_tbe_utils
+ tbb
DESTINATION
fbgemm_gpu)
EOF
python setup.py bdist_wheel --build-variant=rocm
python setup.py bdist_wheel \
--build-variant=rocm \
-DHIP_ROOT_DIR="${ROCM_PATH}" \
-DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
-DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"
popd
# Save the wheel before cleaning up
@ -292,7 +284,7 @@ EOF
function clone_pytorch_xla() {
if [[ ! -d ./xla ]]; then
git clone --recursive --quiet https://github.com/pytorch/xla.git
git clone --recursive -b r2.9 https://github.com/pytorch/xla.git
pushd xla
# pin the xla hash so that we don't get broken by changes to xla
git checkout "$(cat ../.github/ci_commit_pins/xla.txt)"

View File

@ -0,0 +1,40 @@
#!/bin/bash
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.ci/pytorch/common_utils.sh"
echo "functorch_doc_push_script.sh: Invoked with $*"
set -ex -o pipefail
version=${DOCS_VERSION:-nightly}
echo "version: $version"
# Build functorch docs
pushd $pt_checkout/functorch/docs
make html
popd
git clone https://github.com/pytorch/functorch -b gh-pages --depth 1 functorch_ghpages
pushd functorch_ghpages
if [ "$version" == "main" ]; then
version=nightly
fi
git rm -rf "$version" || true
mv "$pt_checkout/functorch/docs/build/html" "$version"
git add "$version" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin gh-pages
fi
popd

View File

@ -36,11 +36,11 @@ fi
print_cmake_info
if [[ ${BUILD_ENVIRONMENT} == *"distributed"* ]]; then
# Needed for inductor benchmarks, as lots of HF networks make `torch.distribtued` calls
USE_DISTRIBUTED=1 USE_OPENMP=1 WERROR=1 python -m build --wheel --no-isolation
USE_DISTRIBUTED=1 USE_OPENMP=1 WERROR=1 python setup.py bdist_wheel
else
# Explicitly set USE_DISTRIBUTED=0 to align with the default build config on mac. This also serves as the sole CI config that tests
# that building with USE_DISTRIBUTED=0 works at all. See https://github.com/pytorch/pytorch/issues/86448
USE_DISTRIBUTED=0 USE_OPENMP=1 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python -m build --wheel --no-isolation -C--build-option=--plat-name=macosx_11_0_arm64
USE_DISTRIBUTED=0 USE_OPENMP=1 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel --plat-name macosx_11_0_arm64
fi
if which sccache > /dev/null; then
print_sccache_stats

View File

@ -55,7 +55,7 @@ test_python_shard() {
setup_test_python
time python test/run_test.py --verbose --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard "$1" "$NUM_TEST_SHARDS"
time python test/run_test.py --verbose --exclude-jit-executor --exclude-distributed-tests --shard "$1" "$NUM_TEST_SHARDS"
assert_git_not_dirty
}
@ -182,7 +182,7 @@ checkout_install_torchbench() {
pip uninstall -y torchao
echo "Print all dependencies after TorchBench is installed"
python -m pip freeze
python -mpip freeze
}
torchbench_setup_macos() {
@ -211,7 +211,7 @@ torchbench_setup_macos() {
}
pip_benchmark_deps() {
python -m pip install --no-input requests cython scikit-learn six
python -mpip install --no-input requests cython scikit-learn six
}
@ -256,7 +256,7 @@ test_torchbench_smoketest() {
local device=mps
local dtypes=(undefined float16 bfloat16 notset)
local dtype=${dtypes[$1]}
local models=(llama BERT_pytorch dcgan yolov3 resnet152 sam sam_fast pytorch_unet stable_diffusion_text_encoder speech_transformer Super_SloMo doctr_det_predictor doctr_reco_predictor vgg16)
local models=(hf_T5 llama BERT_pytorch dcgan hf_GPT2 yolov3 resnet152 sam sam_fast pytorch_unet stable_diffusion_text_encoder speech_transformer Super_SloMo doctr_det_predictor doctr_reco_predictor timm_resnet timm_vovnet vgg16)
for backend in eager inductor; do
@ -319,7 +319,7 @@ test_aoti_torchbench_smoketest() {
local device=mps
local dtypes=(undefined float16 bfloat16 notset)
local dtype=${dtypes[$1]}
local models=(llama BERT_pytorch dcgan yolov3 resnet152 sam sam_fast pytorch_unet stable_diffusion_text_encoder speech_transformer Super_SloMo doctr_det_predictor doctr_reco_predictor vgg16)
local models=(hf_T5 llama BERT_pytorch dcgan hf_GPT2 yolov3 resnet152 sam sam_fast pytorch_unet stable_diffusion_text_encoder speech_transformer Super_SloMo doctr_det_predictor doctr_reco_predictor timm_resnet timm_vovnet vgg16)
echo "Launching torchbench inference performance run for AOT Inductor and dtype ${dtype}"
local dtype_arg="--${dtype}"

View File

@ -26,7 +26,6 @@ if [[ "${SHARD_NUMBER:-2}" == "2" ]]; then
time python test/run_test.py --verbose -i distributed/test_c10d_spawn_gloo
time python test/run_test.py --verbose -i distributed/test_c10d_spawn_nccl
time python test/run_test.py --verbose -i distributed/test_compute_comm_reordering
time python test/run_test.py --verbose -i distributed/test_aten_comm_compute_reordering
time python test/run_test.py --verbose -i distributed/test_store
time python test/run_test.py --verbose -i distributed/test_symmetric_memory
time python test/run_test.py --verbose -i distributed/test_pg_wrapper

View File

@ -32,9 +32,6 @@ LIBTORCH_NAMESPACE_LIST = (
"torch::",
)
# Patterns for detecting statically linked libstdc++ symbols
STATICALLY_LINKED_CXX11_ABI = [re.compile(r".*recursive_directory_iterator.*")]
def _apply_libtorch_symbols(symbols):
return [
@ -56,17 +53,12 @@ def get_symbols(lib: str) -> list[tuple[str, str, str]]:
return [x.split(" ", 2) for x in lines.decode("latin1").split("\n")[:-1]]
def grep_symbols(
lib: str, patterns: list[Any], symbol_type: str | None = None
) -> list[str]:
def grep_symbols(lib: str, patterns: list[Any]) -> list[str]:
def _grep_symbols(
symbols: list[tuple[str, str, str]], patterns: list[Any]
) -> list[str]:
rc = []
for _s_addr, _s_type, s_name in symbols:
# Filter by symbol type if specified
if symbol_type and _s_type != symbol_type:
continue
for pattern in patterns:
if pattern.match(s_name):
rc.append(s_name)
@ -88,18 +80,6 @@ def grep_symbols(
return functools.reduce(list.__add__, (x.result() for x in tasks), [])
def check_lib_statically_linked_libstdc_cxx_abi_symbols(lib: str) -> None:
cxx11_statically_linked_symbols = grep_symbols(
lib, STATICALLY_LINKED_CXX11_ABI, symbol_type="T"
)
num_statically_linked_symbols = len(cxx11_statically_linked_symbols)
print(f"num_statically_linked_symbols (T): {num_statically_linked_symbols}")
if num_statically_linked_symbols > 0:
raise RuntimeError(
f"Found statically linked libstdc++ symbols (recursive_directory_iterator): {cxx11_statically_linked_symbols[:100]}"
)
def check_lib_symbols_for_abi_correctness(lib: str) -> None:
print(f"lib: {lib}")
cxx11_symbols = grep_symbols(lib, LIBTORCH_CXX11_PATTERNS)
@ -127,7 +107,6 @@ def main() -> None:
libtorch_cpu_path = str(install_root / "lib" / "libtorch_cpu.so")
check_lib_symbols_for_abi_correctness(libtorch_cpu_path)
check_lib_statically_linked_libstdc_cxx_abi_symbols(libtorch_cpu_path)
if __name__ == "__main__":

View File

@ -386,8 +386,8 @@ def smoke_test_compile(device: str = "cpu") -> None:
def smoke_test_nvshmem() -> None:
if not torch.cuda.is_available() or target_os == "windows":
print("Windows platform or CUDA is not available, skipping NVSHMEM test")
if not torch.cuda.is_available():
print("CUDA is not available, skipping NVSHMEM test")
return
# Check if NVSHMEM is compiled in current build
@ -396,9 +396,7 @@ def smoke_test_nvshmem() -> None:
except ImportError:
# Not built with NVSHMEM support.
# torch is not compiled with NVSHMEM prior to 2.9
from torch.torch_version import TorchVersion
if TorchVersion(torch.__version__) < (2, 9):
if torch.__version__ < "2.9":
return
else:
# After 2.9: NVSHMEM is expected to be compiled in current build

View File

@ -34,14 +34,12 @@ fi
# Patch numba to avoid CUDA-13 crash, see https://github.com/pytorch/pytorch/issues/162878
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
NUMBA_CUDA_DIR=$(python -c "import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))" 2>/dev/null || true)
if [ -n "$NUMBA_CUDA_DIR" ]; then
NUMBA_PATCH="$(dirname "$(realpath "${BASH_SOURCE[0]}")")/numba-cuda-13.patch"
pushd "$NUMBA_CUDA_DIR"
patch -p4 <"$NUMBA_PATCH"
popd
fi
NUMBA_CUDA_DIR=$(python -c "import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))" 2>/dev/null || true)
if [ -n "$NUMBA_CUDA_DIR" ]; then
NUMBA_PATCH="$(dirname "$(realpath "${BASH_SOURCE[0]}")")/numba-cuda-13.patch"
pushd "$NUMBA_CUDA_DIR"
patch -p4 <"$NUMBA_PATCH"
popd
fi
echo "Environment variables:"
@ -324,26 +322,20 @@ test_python_shard() {
# modify LD_LIBRARY_PATH to ensure it has the conda env.
# This set of tests has been shown to be buggy without it for the split-build
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests $INCLUDE_CLAUSE --shard "$1" "$NUM_TEST_SHARDS" --verbose $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests $INCLUDE_CLAUSE --shard "$1" "$NUM_TEST_SHARDS" --verbose $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
assert_git_not_dirty
}
test_python() {
# shellcheck disable=SC2086
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests $INCLUDE_CLAUSE --verbose $PYTHON_TEST_EXTRA_OPTION
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests $INCLUDE_CLAUSE --verbose $PYTHON_TEST_EXTRA_OPTION
assert_git_not_dirty
}
test_python_smoke() {
# Smoke tests for H100/B200
time python test/run_test.py --include test_matmul_cuda test_scaled_matmul_cuda inductor/test_fp8 inductor/test_max_autotune $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
assert_git_not_dirty
}
test_python_smoke_b200() {
# Targeted smoke tests for B200 - staged approach to avoid too many failures
time python test/run_test.py --include test_matmul_cuda test_scaled_matmul_cuda inductor/test_fp8 $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
# Smoke tests for H100
time python test/run_test.py --include test_matmul_cuda inductor/test_fp8 inductor/test_max_autotune $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
assert_git_not_dirty
}
@ -392,7 +384,6 @@ test_dynamo_wrapped_shard() {
--exclude-distributed-tests \
--exclude-torch-export-tests \
--exclude-aot-dispatch-tests \
--exclude-quantization-tests \
--shard "$1" "$NUM_TEST_SHARDS" \
--verbose \
--upload-artifacts-while-running
@ -437,7 +428,7 @@ test_inductor_distributed() {
# this runs on both single-gpu and multi-gpu instance. It should be smart about skipping tests that aren't supported
# with if required # gpus aren't available
python test/run_test.py --include distributed/test_dynamo_distributed distributed/test_inductor_collectives distributed/test_aten_comm_compute_reordering distributed/test_compute_comm_reordering --verbose
python test/run_test.py --include distributed/test_dynamo_distributed distributed/test_inductor_collectives distributed/test_compute_comm_reordering --verbose
assert_git_not_dirty
}
@ -485,22 +476,6 @@ test_inductor_aoti() {
/usr/bin/env "${TEST_ENVS[@]}" python test/run_test.py --cpp --verbose -i cpp/test_aoti_abi_check cpp/test_aoti_inference cpp/test_vec_half_AVX2 -dist=loadfile
}
test_inductor_aoti_cross_compile_for_windows() {
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
# Set WINDOWS_CUDA_HOME environment variable
WINDOWS_CUDA_HOME="$(pwd)/win-torch-wheel-extracted"
export WINDOWS_CUDA_HOME
echo "WINDOWS_CUDA_HOME is set to: $WINDOWS_CUDA_HOME"
echo "Contents:"
ls -lah "$(pwd)/win-torch-wheel-extracted/lib/x64/" || true
python test/inductor/test_aoti_cross_compile_windows.py -k compile --package-dir "$TEST_REPORTS_DIR" --win-torch-lib-dir "$(pwd)/win-torch-wheel-extracted/torch/lib"
}
test_inductor_cpp_wrapper_shard() {
if [[ -z "$NUM_TEST_SHARDS" ]]; then
echo "NUM_TEST_SHARDS must be defined to run a Python test shard"
@ -854,7 +829,7 @@ test_dynamo_benchmark() {
elif [[ "${suite}" == "timm_models" ]]; then
export TORCHBENCH_ONLY_MODELS="inception_v3"
elif [[ "${suite}" == "torchbench" ]]; then
export TORCHBENCH_ONLY_MODELS="BERT_pytorch"
export TORCHBENCH_ONLY_MODELS="hf_Bert"
fi
fi
test_single_dynamo_benchmark "dashboard" "$suite" "$shard_id" "$@"
@ -885,13 +860,13 @@ test_inductor_torchbench_smoketest_perf() {
mkdir -p "$TEST_REPORTS_DIR"
python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --float16 --training \
--batch-size-file "$(realpath benchmarks/dynamo/torchbench_models_list.txt)" --only BERT_pytorch \
--batch-size-file "$(realpath benchmarks/dynamo/torchbench_models_list.txt)" --only hf_Bert \
--output "$TEST_REPORTS_DIR/inductor_training_smoketest.csv"
# The threshold value needs to be actively maintained to make this check useful
python benchmarks/dynamo/check_perf_csv.py -f "$TEST_REPORTS_DIR/inductor_training_smoketest.csv" -t 1.4
# Check memory compression ratio for a few models
for test in BERT_pytorch yolov3; do
for test in hf_Albert timm_vision_transformer; do
python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --amp --training \
--disable-cudagraphs --batch-size-file "$(realpath benchmarks/dynamo/torchbench_models_list.txt)" \
--only $test --output "$TEST_REPORTS_DIR/inductor_training_smoketest_$test.csv"
@ -902,7 +877,7 @@ test_inductor_torchbench_smoketest_perf() {
done
# Perform some "warm-start" runs for a few huggingface models.
for test in AllenaiLongformerBase DistilBertForMaskedLM DistillGPT2 GoogleFnet YituTechConvBert; do
for test in AlbertForQuestionAnswering AllenaiLongformerBase DistilBertForMaskedLM DistillGPT2 GoogleFnet YituTechConvBert; do
python benchmarks/dynamo/huggingface.py --accuracy --training --amp --inductor --device cuda --warm-start-latency \
--only $test --output "$TEST_REPORTS_DIR/inductor_warm_start_smoketest_$test.csv"
python benchmarks/dynamo/check_accuracy.py \
@ -916,7 +891,7 @@ test_inductor_set_cpu_affinity(){
export LD_PRELOAD="$JEMALLOC_LIB":"$LD_PRELOAD"
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
if [[ "$(uname -m)" != "aarch64" ]]; then
if [[ "${TEST_CONFIG}" != *aarch64* ]]; then
# Use Intel OpenMP for x86
IOMP_LIB="$(dirname "$(which python)")/../lib/libiomp5.so"
export LD_PRELOAD="$IOMP_LIB":"$LD_PRELOAD"
@ -930,7 +905,7 @@ test_inductor_set_cpu_affinity(){
cores=$((cpus / thread_per_core))
# Set number of cores to 16 on aarch64 for performance runs
if [[ "$(uname -m)" == "aarch64" && $cores -gt 16 ]]; then
if [[ "${TEST_CONFIG}" == *aarch64* && $cores -gt 16 ]]; then
cores=16
fi
export OMP_NUM_THREADS=$cores
@ -1181,12 +1156,6 @@ test_distributed() {
fi
}
test_quantization() {
echo "Testing quantization"
python test/test_quantization.py
}
test_rpc() {
echo "Testing RPC C++ tests"
# NB: the ending test_rpc must match the current function name for the current
@ -1433,8 +1402,8 @@ EOF
pip3 install -r requirements.txt
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
python -m build --wheel --no-isolation -C--build-option=--bdist-dir="base_bdist_tmp" --outdir "base_dist"
python -m pip install base_dist/*.whl
python setup.py bdist_wheel --bdist-dir="base_bdist_tmp" --dist-dir="base_dist"
python -mpip install base_dist/*.whl
echo "::endgroup::"
pushd test/forward_backward_compatibility
@ -1581,10 +1550,14 @@ test_executorch() {
install_torchvision
install_torchaudio
INSTALL_SCRIPT="$(pwd)/.ci/docker/common/install_executorch.sh"
pushd /executorch
"${INSTALL_SCRIPT}" setup_executorch
export PYTHON_EXECUTABLE=python
export CMAKE_ARGS="-DEXECUTORCH_BUILD_PYBIND=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
# NB: We need to rebuild ExecuTorch runner here because it depends on PyTorch
# from the PR
bash .ci/scripts/setup-linux.sh --build-tool cmake
echo "Run ExecuTorch unit tests"
pytest -v -n auto
@ -1598,14 +1571,17 @@ test_executorch() {
popd
# Test torchgen generated code for Executorch.
echo "Testing ExecuTorch op registration"
"$BUILD_BIN_DIR"/test_edge_op_registration
assert_git_not_dirty
}
test_linux_aarch64() {
python test/run_test.py --include test_modules test_mkldnn test_mkldnn_fusion test_openmp test_torch test_dynamic_shapes \
test_transformers test_multiprocessing test_numpy_interop test_autograd test_binary_ufuncs test_complex test_spectral_ops \
test_foreach test_reductions test_unary_ufuncs test_tensor_creation_ops test_ops profiler/test_memory_profiler \
distributed/elastic/timer/api_test distributed/elastic/timer/local_timer_example distributed/elastic/timer/local_timer_test \
test_foreach test_reductions test_unary_ufuncs test_tensor_creation_ops test_ops \
--shard "$SHARD_NUMBER" "$NUM_TEST_SHARDS" --verbose
# Dynamo tests
@ -1631,12 +1607,11 @@ test_operator_benchmark() {
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
TEST_DIR=$(pwd)
ARCH=$(uname -m)
test_inductor_set_cpu_affinity
cd benchmarks/operator_benchmark/pt_extension
python -m pip install . -v --no-build-isolation
python -m pip install .
cd "${TEST_DIR}"/benchmarks/operator_benchmark
$TASKSET python -m benchmark_all_test --device "$1" --tag-filter "$2" \
@ -1646,7 +1621,7 @@ test_operator_benchmark() {
pip_install pandas
python check_perf_csv.py \
--actual "${TEST_REPORTS_DIR}/operator_benchmark_eager_float32_cpu.csv" \
--expected "${ARCH}_expected_ci_operator_benchmark_eager_float32_cpu.csv"
--expected "expected_ci_operator_benchmark_eager_float32_cpu.csv"
}
test_operator_microbenchmark() {
@ -1683,7 +1658,7 @@ if [[ "${TEST_CONFIG}" == *numpy_2* ]]; then
python -m pip install --pre numpy==2.0.2 scipy==1.13.1 numba==0.60.0
fi
python test/run_test.py --include dynamo/test_functions.py dynamo/test_unspec.py test_binary_ufuncs.py test_fake_tensor.py test_linalg.py test_numpy_interop.py test_tensor_creation_ops.py test_torch.py torch_np/test_basic.py
elif [[ "${BUILD_ENVIRONMENT}" == *aarch64* && "${TEST_CONFIG}" == 'default' ]]; then
elif [[ "${BUILD_ENVIRONMENT}" == *aarch64* && "${TEST_CONFIG}" != *perf_cpu_aarch64* ]]; then
test_linux_aarch64
elif [[ "${TEST_CONFIG}" == *backward* ]]; then
test_forward_backward_compatibility
@ -1700,8 +1675,6 @@ elif [[ "${TEST_CONFIG}" == *executorch* ]]; then
test_executorch
elif [[ "$TEST_CONFIG" == 'jit_legacy' ]]; then
test_python_legacy_jit
elif [[ "$TEST_CONFIG" == 'quantization' ]]; then
test_quantization
elif [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then
# TODO: run some C++ tests
echo "no-op at the moment"
@ -1734,8 +1707,6 @@ elif [[ "${TEST_CONFIG}" == *inductor-triton-cpu* ]]; then
test_inductor_triton_cpu
elif [[ "${TEST_CONFIG}" == *inductor-micro-benchmark* ]]; then
test_inductor_micro_benchmark
elif [[ "${TEST_CONFIG}" == *aoti_cross_compile_for_windows* ]]; then
test_inductor_aoti_cross_compile_for_windows
elif [[ "${TEST_CONFIG}" == *huggingface* ]]; then
install_torchvision
id=$((SHARD_NUMBER-1))
@ -1830,14 +1801,10 @@ elif [[ "${BUILD_ENVIRONMENT}" == *xpu* ]]; then
test_xpu_bin
elif [[ "${TEST_CONFIG}" == smoke ]]; then
test_python_smoke
elif [[ "${TEST_CONFIG}" == smoke_b200 ]]; then
test_python_smoke_b200
elif [[ "${TEST_CONFIG}" == h100_distributed ]]; then
test_h100_distributed
elif [[ "${TEST_CONFIG}" == "h100-symm-mem" ]]; then
test_h100_symm_mem
elif [[ "${TEST_CONFIG}" == "b200-symm-mem" ]]; then
test_h100_symm_mem
elif [[ "${TEST_CONFIG}" == h100_cutlass_backend ]]; then
test_h100_cutlass_backend
else

View File

@ -1,32 +0,0 @@
#!/bin/bash
set -ex -o pipefail
# Suppress ANSI color escape sequences
export TERM=vt100
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
echo "Environment variables"
env
echo "Testing FA3 stable wheel still works with currently built torch"
echo "Installing ABI Stable FA3 wheel"
# The wheel was built on https://github.com/Dao-AILab/flash-attention/commit/b3846b059bf6b143d1cd56879933be30a9f78c81
# on torch nightly torch==2.9.0.dev20250830+cu129
$MAYBE_SUDO pip -q install https://s3.amazonaws.com/ossci-linux/wheels/flash_attn_3-3.0.0b1-cp39-abi3-linux_x86_64.whl
pushd flash-attention/hopper
export PYTHONPATH=$PWD
pytest -v -s \
"test_flash_attn.py::test_flash_attn_output[1-1-192-False-False-False-0.0-False-False-mha-dtype0]" \
"test_flash_attn.py::test_flash_attn_varlen_output[511-1-64-True-False-False-0.0-False-False-gqa-dtype2]" \
"test_flash_attn.py::test_flash_attn_kvcache[1-128-128-False-False-True-None-0.0-False-False-True-False-True-False-gqa-dtype0]" \
"test_flash_attn.py::test_flash_attn_race_condition[97-97-192-True-dtype0]" \
"test_flash_attn.py::test_flash_attn_combine[2-3-64-dtype1]" \
"test_flash_attn.py::test_flash3_bw_compatibility"
popd

View File

@ -70,7 +70,7 @@ sccache --zero-stats
sccache --show-stats
# Build the wheel
python -m build --wheel --no-build-isolation
python setup.py bdist_wheel
if ($LASTEXITCODE -ne 0) { exit 1 }
# Install the wheel locally

View File

@ -38,12 +38,10 @@ if errorlevel 1 goto fail
if not errorlevel 0 goto fail
:: Update CMake
:: TODO: Investigate why this helps MKL detection, even when CMake from choco is not used
call choco upgrade -y cmake --no-progress --installargs 'ADD_CMAKE_TO_PATH=System' --apply-install-arguments-to-dependencies --version=3.27.9
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
:: TODO: Move to .ci/docker/requirements-ci.txt
call pip install mkl==2024.2.0 mkl-static==2024.2.0 mkl-include==2024.2.0
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
@ -132,7 +130,7 @@ if "%USE_CUDA%"=="1" (
:: Print all existing environment variable for debugging
set
python -m build --wheel --no-isolation
python setup.py bdist_wheel
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
sccache --show-stats

View File

@ -15,35 +15,37 @@ if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
cd %TMP_DIR_WIN%\build\torch\test
:: Enable delayed variable expansion to make the list
setlocal enabledelayedexpansion
set EXE_LIST=
for /r "." %%a in (*.exe) do (
if "%%~na" == "c10_intrusive_ptr_benchmark" (
@REM NB: This is not a gtest executable file, thus couldn't be handled by
@REM pytest-cpp and is excluded from test discovery by run_test
call "%%~fa"
call :libtorch_check "%%~na" "%%~fa"
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
) else (
if "%%~na" == "verify_api_visibility" (
@REM Skip verify_api_visibility as it is a compile-level test
) else (
set EXE_LIST=!EXE_LIST! cpp/%%~na
)
)
)
goto :eof
:libtorch_check
cd %CWD%
set CPP_TESTS_DIR=%TMP_DIR_WIN%\build\torch\test
:: Run python test\run_test.py on the list
set NO_TD=True && python test\run_test.py --cpp --verbose -i !EXE_LIST!
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
:: Skip verify_api_visibility as it a compile level test
if "%~1" == "verify_api_visibility" goto :eof
goto :eof
echo Running "%~2"
if "%~1" == "c10_intrusive_ptr_benchmark" (
:: NB: This is not a gtest executable file, thus couldn't be handled by pytest-cpp
call "%~2"
goto :eof
)
python test\run_test.py --cpp --verbose -i "cpp/%~1"
if errorlevel 1 (
echo %1 failed with exit code %errorlevel%
goto fail
)
if not errorlevel 0 (
echo %1 failed with exit code %errorlevel%
goto fail
)
:eof
exit /b 0

View File

@ -25,7 +25,7 @@ echo Copying over test times file
robocopy /E "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.additional_ci_files" "%PROJECT_DIR_WIN%\.additional_ci_files"
echo Run nn tests
python run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard "%SHARD_NUMBER%" "%NUM_TEST_SHARDS%" --verbose
python run_test.py --exclude-jit-executor --exclude-distributed-tests --shard "%SHARD_NUMBER%" "%NUM_TEST_SHARDS%" --verbose
if ERRORLEVEL 1 goto fail
popd

View File

@ -37,8 +37,27 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
export PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda"
fi
# TODO: Move this to .ci/docker/requirements-ci.txt
python -m pip install "psutil==5.9.1" nvidia-ml-py "pytest-shard==0.1.2"
# TODO: Move both of them to Windows AMI
python -m pip install tensorboard==2.13.0 protobuf==5.29.4 pytest-subtests==0.13.1
# Copied from https://github.com/pytorch/test-infra/blob/be01a40157c36cd5a48391fdf44a7bc3ebd4c7e3/aws/ami/windows/scripts/Installers/Install-Pip-Dependencies.ps1#L16 with some adjustments
# pytest-rerunfailures==10.3 as 10.2 fails with INTERNALERROR> pluggy._manager.PluginValidationError: unknown hook 'pytest_configure_node'
# scipy from 1.6.3 to 1.10
# expecttest from 0.1.3 to 0.3.0
# xdoctest from 1.0.2 to 1.3.0
python -m pip install "future==0.18.2" "hypothesis==5.35.1" "expecttest==0.3.0" "librosa>=0.6.2" "scipy==1.10.1" "psutil==5.9.1" "pynvml==11.4.1" "pillow==9.2.0" "unittest-xml-reporting<=3.2.0,>=2.0.0" "pytest==7.1.3" "pytest-xdist==2.5.0" "pytest-flakefinder==1.1.0" "pytest-rerunfailures==10.3" "pytest-shard==0.1.2" "sympy==1.11.1" "xdoctest==1.3.0" "pygments==2.12.0" "opt-einsum>=3.3" "networkx==2.8.8" "mpmath==1.2.1" "pytest-cpp==2.3.0" "boto3==1.35.42"
# Install Z3 optional dependency for Windows builds.
python -m pip install z3-solver==4.15.1.0
# Install tlparse for test\dynamo\test_structured_trace.py UTs.
python -m pip install tlparse==0.4.0
# Install parameterized
python -m pip install parameterized==0.8.1
# Install pulp for testing ilps under torch\distributed\_tools
python -m pip install pulp==2.9.0
run_tests() {
# Run nvidia-smi if available

View File

@ -48,7 +48,7 @@ sccache --zero-stats
sccache --show-stats
:: Call PyTorch build script
python -m build --wheel --no-isolation --outdir "%PYTORCH_FINAL_PACKAGE_DIR%"
python setup.py bdist_wheel -d "%PYTORCH_FINAL_PACKAGE_DIR%"
:: show sccache stats
sccache --show-stats

View File

@ -37,10 +37,10 @@ IF "%CUDA_PATH_V128%"=="" (
)
IF "%BUILD_VISION%" == "" (
set TORCH_CUDA_ARCH_LIST=7.0;7.5;8.0;8.6;9.0;10.0;12.0
set TORCH_CUDA_ARCH_LIST=6.1;7.0;7.5;8.0;8.6;9.0;10.0;12.0
set TORCH_NVCC_FLAGS=-Xfatbin -compress-all
) ELSE (
set NVCC_FLAGS=-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_100,code=compute_100 -gencode=arch=compute_120,code=compute_120
set NVCC_FLAGS=-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_100,code=compute_100 -gencode=arch=compute_120,code=compute_120
)
set "CUDA_PATH=%CUDA_PATH_V128%"

View File

@ -28,5 +28,5 @@ start /wait "" python-amd64.exe /quiet InstallAllUsers=1 PrependPath=0 Include_t
if errorlevel 1 exit /b 1
set "PATH=%CD%\Python\Scripts;%CD%\Python;%PATH%"
%PYTHON_EXEC% -m pip install --upgrade pip setuptools packaging wheel build
%PYTHON_EXEC% -m pip install --upgrade pip setuptools packaging wheel
if errorlevel 1 exit /b 1

View File

@ -86,7 +86,7 @@ copy /Y "%LIBTORCH_PREFIX%-%PYTORCH_BUILD_VERSION%.zip" "%PYTORCH_FINAL_PACKAGE_
goto build_end
:pytorch
%PYTHON_EXEC% -m build --wheel --no-isolation --outdir "%PYTORCH_FINAL_PACKAGE_DIR%"
%PYTHON_EXEC% setup.py bdist_wheel -d "%PYTORCH_FINAL_PACKAGE_DIR%"
:build_end
IF ERRORLEVEL 1 exit /b 1

View File

@ -63,7 +63,7 @@ if errorlevel 1 exit /b 1
call %CONDA_HOME%\condabin\activate.bat testenv
if errorlevel 1 exit /b 1
call conda install -y -q -c conda-forge libuv=1.51
call conda install -y -q -c conda-forge libuv=1.39
call conda install -y -q intel-openmp
echo "install and test libtorch"

View File

@ -18,7 +18,7 @@ if "%DESIRED_PYTHON%" == "3.9" %PYTHON_EXEC% -m pip install numpy==2.0.2 cmake
%PYTHON_EXEC% -m pip install pyyaml
%PYTHON_EXEC% -m pip install mkl-include mkl-static
%PYTHON_EXEC% -m pip install boto3 requests ninja typing_extensions setuptools==72.1.0
%PYTHON_EXEC% -m pip install boto3 ninja typing_extensions setuptools==72.1.0
where cmake.exe

View File

@ -85,7 +85,7 @@ mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR" || true
# Create an isolated directory to store this builds pytorch checkout and conda
# installation
if [[ -z "$MAC_PACKAGE_WORK_DIR" ]]; then
MAC_PACKAGE_WORK_DIR="$(pwd)/tmp_wheel_${DESIRED_PYTHON}_$(date +%H%M%S)"
MAC_PACKAGE_WORK_DIR="$(pwd)/tmp_wheel_conda_${DESIRED_PYTHON}_$(date +%H%M%S)"
fi
mkdir -p "$MAC_PACKAGE_WORK_DIR" || true
if [[ -n ${GITHUB_ACTIONS} ]]; then
@ -96,11 +96,11 @@ fi
whl_tmp_dir="${MAC_PACKAGE_WORK_DIR}/dist"
mkdir -p "$whl_tmp_dir"
mac_version='macosx-11_0-arm64'
mac_version='macosx_11_0_arm64'
libtorch_arch='arm64'
# Create a consistent wheel package name to rename the wheel to
wheel_filename_new="${TORCH_PACKAGE_NAME}-${build_version}${build_number_prefix}-cp${python_nodot}-none-${mac_version//[-,]/_}.whl"
wheel_filename_new="${TORCH_PACKAGE_NAME}-${build_version}${build_number_prefix}-cp${python_nodot}-none-${mac_version}.whl"
###########################################################
@ -125,6 +125,7 @@ popd
export TH_BINARY_BUILD=1
export INSTALL_TEST=0 # dont install test binaries into site-packages
export MACOSX_DEPLOYMENT_TARGET=11.0
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
@ -132,20 +133,25 @@ RENAME_WHEEL=true
case $desired_python in
3.14t)
echo "Using 3.14 deps"
mac_version='macosx-11.0-arm64'
NUMPY_PINNED_VERSION="==2.1.0"
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
RENAME_WHEEL=false
;;
3.14)
echo "Using 3.14t deps"
mac_version='macosx-11.0-arm64'
NUMPY_PINNED_VERSION="==2.1.0"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
RENAME_WHEEL=false
;;
3.13t)
echo "Using 3.13t deps"
mac_version='macosx-11.0-arm64'
echo "Using 3.13 deps"
NUMPY_PINNED_VERSION="==2.1.0"
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
RENAME_WHEEL=false
;;
3.13)
@ -170,12 +176,17 @@ case $desired_python in
;;
esac
# Install into a fresh env
tmp_env_name="wheel_py$python_nodot"
conda create ${EXTRA_CONDA_INSTALL_FLAGS} -yn "$tmp_env_name" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS}
source activate "$tmp_env_name"
PINNED_PACKAGES=(
"numpy${NUMPY_PINNED_VERSION}"
)
python -m venv ~/${desired_python}-build
source ~/${desired_python}-build/bin/activate
retry pip install "${PINNED_PACKAGES[@]}" -r "${pytorch_rootdir}/requirements.txt"
retry pip install "${PINNED_PACKAGES[@]}" -r "${pytorch_rootdir}/requirements-build.txt"
pip install requests ninja typing-extensions
retry pip install -r "${pytorch_rootdir}/requirements.txt" || true
retry brew install libomp
# For USE_DISTRIBUTED=1 on macOS, need libuv, which is build as part of tensorpipe submodule
@ -186,11 +197,11 @@ export USE_QNNPACK=OFF
export BUILD_TEST=OFF
pushd "$pytorch_rootdir"
echo "Calling -m build --wheel --no-isolation at $(date)"
echo "Calling setup.py bdist_wheel at $(date)"
_PYTHON_HOST_PLATFORM=${mac_version} ARCHFLAGS="-arch arm64" python -m build --wheel --no-isolation --outdir "$whl_tmp_dir" -C--plat-name="${mac_version//[-.]/_}"
python setup.py bdist_wheel -d "$whl_tmp_dir" --plat-name ${mac_version}
echo "Finished -m build --wheel --no-isolation at $(date)"
echo "Finished setup.py bdist_wheel at $(date)"
if [[ $package_type != 'libtorch' ]]; then
echo "delocating wheel dependencies"

View File

@ -71,7 +71,14 @@ export PYTORCH_BUILD_NUMBER=1
# Set triton version as part of PYTORCH_EXTRA_INSTALL_REQUIREMENTS
TRITON_VERSION=$(cat $PYTORCH_ROOT/.ci/docker/triton_version.txt)
TRITON_CONSTRAINT="platform_system == 'Linux'"
# Here PYTORCH_EXTRA_INSTALL_REQUIREMENTS is already set for the all the wheel builds hence append TRITON_CONSTRAINT
TRITON_CONSTRAINT="platform_system == 'Linux' and platform_machine == 'x86_64'"
# CUDA 12.9/13.0 builds have triton for Linux and Linux aarch64 binaries.
if [[ "$DESIRED_CUDA" == "cu129" ]] || [[ "$DESIRED_CUDA" == "cu130" ]]; then
TRITON_CONSTRAINT="platform_system == 'Linux'"
fi
if [[ "$PACKAGE_TYPE" =~ .*wheel.* && -n "${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}" && ! "$PYTORCH_BUILD_VERSION" =~ .*xpu.* ]]; then
TRITON_REQUIREMENT="triton==${TRITON_VERSION}; ${TRITON_CONSTRAINT}"

View File

@ -0,0 +1,47 @@
#!/bin/bash
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.ci/pytorch/common_utils.sh"
echo "functorch_doc_push_script.sh: Invoked with $*"
set -ex
version=${DOCS_VERSION:-nightly}
echo "version: $version"
# Build functorch docs
pushd $pt_checkout/functorch/docs
pip -q install -r requirements.txt
make html
popd
git clone https://github.com/pytorch/functorch -b gh-pages --depth 1 functorch_ghpages
pushd functorch_ghpages
if [ $version == "main" ]; then
version=nightly
fi
git rm -rf "$version" || true
mv "$pt_checkout/functorch/docs/build/html" "$version"
git add "$version" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin gh-pages
fi
popd
# =================== The above code **should** be executed inside Docker container ===================

View File

@ -59,19 +59,16 @@ performance-*,
-performance-enum-size,
readability-container-size-empty,
readability-delete-null-pointer,
readability-duplicate-include,
readability-duplicate-include
readability-misplaced-array-index,
readability-redundant*,
readability-redundant*
readability-simplify-subscript-expr,
readability-string-compare,
-readability-redundant-access-specifiers,
-readability-redundant-control-flow,
-readability-redundant-inline-specifier,
'
HeaderFilterRegex: '^(aten/|c10/|torch/).*$'
WarningsAsErrors: '*'
LineFilter:
- name: '/usr/include/.*'
CheckOptions:
cppcoreguidelines-special-member-functions.AllowSoleDefaultDtor: true
cppcoreguidelines-special-member-functions.AllowImplicitlyDeletedCopyOrMove: true

View File

@ -12,7 +12,7 @@ ignore =
# to line this up with executable bit
EXE001,
# these ignores are from flake8-bugbear; please fix!
B007,B008,B017,B019,B023,B028,B903,B905,B906,B907,B908,B910
B007,B008,B017,B019,B023,B028,B903,B904,B905,B906,B907,B908,B910
# these ignores are from flake8-comprehensions; please fix!
C407,
# these ignores are from flake8-logging-format; please fix!
@ -73,7 +73,7 @@ exclude =
./docs/src,
./functorch/docs,
./functorch/examples,
./functorch/docs/source/tutorials,
./functorch/notebooks,
./scripts,
./test/generated_type_hints_smoketest.py,
./third_party,

View File

@ -1,14 +1,9 @@
---
name: "⚠️ CI SEV"
about: Tracking incidents for PyTorch's CI infra.
title: ''
labels: ''
assignees: ''
---
> NOTE: Remember to label this issue with "`ci: sev`"
> If you want autorevert to be disabled, keep the ci: disable-autorevert label
<!-- Add the `merge blocking` label to this PR to prevent PRs from being merged while this issue is open -->

View File

@ -1,18 +0,0 @@
---
name: "D❌\U0001F519 ISABLE AUTOREVERT"
about: Disables autorevert when open
title: "[DISABLE AUTOREVERT]"
labels: 'ci: disable-autorevert'
assignees: ''
---
This issue, while open, disables the autorevert functionality.
More details can be found [here](https://github.com/pytorch/test-infra/blob/main/aws/lambda/pytorch-auto-revert/README.md)
## Why are you disabling autorevert?
## Links to any issues/commits/errors that shows the source of problem

View File

@ -1,10 +1,8 @@
---
name: Disable CI jobs (PyTorch Dev Infra only)
about: Use this template to disable CI jobs
title: DISABLED [WORKFLOW_NAME] / [PLATFORM_NAME] / [JOB_NAME]
labels: 'module: ci'
assignees: ''
title: "DISABLED [WORKFLOW_NAME] / [PLATFORM_NAME] / [JOB_NAME]"
labels: "module: ci"
---
> For example, DISABLED pull / win-vs2022-cpu-py3 / test (default). Once

View File

@ -21,10 +21,6 @@ self-hosted-runner:
- linux.arm64.2xlarge.ephemeral
- linux.arm64.m7g.4xlarge
- linux.arm64.m7g.4xlarge.ephemeral
- linux.arm64.r7g.12xlarge.memory
- linux.aws.h100
- linux.aws.h100.4
- linux.aws.h100.8
- linux.4xlarge.nvidia.gpu
- linux.8xlarge.nvidia.gpu
- linux.16xlarge.nvidia.gpu

View File

@ -65,7 +65,7 @@ runs:
cd .ci/lumen_cli
python3 -m pip install -e .
)
MAX_JOBS="$(nproc --ignore=10)"
MAX_JOBS="$(nproc --ignore=6)"
export MAX_JOBS
# Split the comma-separated list and build each target

View File

@ -274,6 +274,8 @@ runs:
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
# Propagate download.pytorch.org IP to container
grep download.pytorch.org /etc/hosts | docker exec -i "${container_name}" sudo bash -c "/bin/cat >> /etc/hosts"
echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}"
docker exec -t "${container_name}" sh -c "pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}"

View File

@ -264,7 +264,7 @@ def unzip_artifact_and_replace_files() -> None:
change_content_to_new_version(f"artifacts/dist/{old_stem}/torch/version.py")
for file in Path(f"artifacts/dist/{old_stem}").glob(
"*.dist-info/*",
"*.dist-info/**",
):
change_content_to_new_version(file)

View File

@ -28,10 +28,6 @@ runs:
echo "instance-type: $(get_ec2_metadata instance-type)"
echo "system info $(uname -a)"
- name: Print GPU info (if present)
shell: bash
run: if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi
- name: Check if in a container runner
shell: bash
id: check_container_runner
@ -86,6 +82,37 @@ runs:
# Prune all of the docker images
docker system prune -af
- name: Manually resolve download.pytorch.org
shell: bash
continue-on-error: true
run: |
set +e
set -x
PT_DOMAIN=download.pytorch.org
# TODO: Flaky access to download.pytorch.org https://github.com/pytorch/pytorch/issues/100400,
# cleaning this up once the issue is fixed. There are more than one resolved IP here, the last
# one is returned at random
RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" | tail -n1)
if [ -z "${RESOLVED_IP}" ]; then
echo "Couldn't resolve ${PT_DOMAIN}, retrying with Google DNS..."
RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" @8.8.8.8 | tail -n1)
if [ -z "${RESOLVED_IP}" ]; then
echo "Couldn't resolve ${PT_DOMAIN}, exiting..."
exit 1
fi
fi
if grep -r "${PT_DOMAIN}" /etc/hosts; then
# Clean up any old records first
sudo sed -i "/${PT_DOMAIN}/d" /etc/hosts
fi
echo "${RESOLVED_IP} ${PT_DOMAIN}" | sudo tee -a /etc/hosts
cat /etc/hosts
- name: Check that the docker daemon is running
shell: bash
continue-on-error: true

View File

@ -111,16 +111,3 @@ runs:
# This video group ID maps to subgid 1 inside the docker image due to the /etc/subgid entries.
# The group name corresponding to group ID 1 can change depending on the OS, so both are necessary.
echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd $DEVICE_FLAG --group-add video --group-add $render_gid --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host" >> "${GITHUB_ENV}"
- name: configure aws credentials
id: aws_creds
uses: aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 # v4.1.0
with:
role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only
aws-region: us-east-1
role-duration-seconds: 18000
- name: Login to Amazon ECR
id: login-ecr
continue-on-error: true
uses: aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076 # v2.0.1

View File

@ -23,6 +23,9 @@ runs:
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Clean up leftover processes on non-ephemeral Windows runner
uses: pytorch/test-infra/.github/actions/cleanup-runner@main
# Cleaning up Windows workspace sometimes fails flakily with device or resource busy
# error, meaning one or more processes haven't stopped completely yet. So trying to
# retry this step several time similar to how checkout-pytorch GHA does

View File

@ -33,6 +33,10 @@ runs:
)
echo "CONTAINER_NAME=${container_name}" >> "$GITHUB_ENV"
if [[ "${GPU_ARCH_TYPE}" != "rocm" && "${BUILD_ENVIRONMENT}" != "linux-aarch64-binary-manywheel" && "${BUILD_ENVIRONMENT}" != "linux-s390x-binary-manywheel" && "${GPU_ARCH_TYPE}" != "xpu" ]]; then
# Propagate download.pytorch.org IP to container. This is only needed on Linux non aarch64 runner
grep download.pytorch.org /etc/hosts | docker exec -i "${container_name}" bash -c "/bin/cat >> /etc/hosts"
fi
docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh"
# Generate test script

View File

@ -1 +1 @@
1b013f5b5a87a1882eb143c26d79d091150d6a37
27fc2493d383354a008106f22f3be232badee9a1

View File

@ -1 +1 @@
08ae0af1395c8d8471f4025deb6af9aef90b342f
7f1de94a4c2d14f59ad4ca84538c36084ea6b2c8

View File

@ -1 +1 @@
faffd5cf673615583da6517275e361cb3dbc77e6
966da7e46f65d6d49df3e31214470a4fe5cc8e66

View File

@ -1 +1 @@
e5192819208c4d68194844b7dfafbc00020d0dea
e10fef08838612b4560e9c72e5cb1414a5edfa13

View File

@ -1 +1 @@
0fa6e3129e61143224663e1ec67980d12b7ec4eb
r2.9

View File

@ -1,41 +1,59 @@
# TODO(elainwy): remove this file after the torch nightly dockerfile is in sync in vllm repo
# The vLLM Dockerfile is used to construct vLLM image against torch nightly and torch main that can be directly used for testing
ARG CUDA_VERSION=12.8.1
ARG PYTHON_VERSION=3.12
# BUILD_BASE_IMAGE: used to setup python build xformers, and vllm wheels, It can be replaced with a different base image from local machine,
# by default, it uses the torch-nightly-base stage from this docker image
ARG BUILD_BASE_IMAGE=torch-nightly-base
# FINAL_BASE_IMAGE: used to set up vllm-instaled environment and build flashinfer,
# by default, it uses devel-ubuntu22.04 official image.
ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04
# The logic is copied from https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile
ARG GET_PIP_URL="https://bootstrap.pypa.io/get-pip.py"
#################### TORCH NIGHTLY BASE IMAGE ####################
# A base image for building vLLM with devel ubuntu 22.04, this is mainly used to build vllm in vllm builtkite ci
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04 as torch-nightly-base
ARG CUDA_VERSION
ARG PYTHON_VERSION
ARG GET_PIP_URL
# Install system dependencies and uv, then create Python virtual environment
# Install Python and other dependencies
RUN apt-get update -y \
&& apt-get install -y ccache software-properties-common git curl sudo vim python3-pip \
&& curl -LsSf https://astral.sh/uv/install.sh | sh \
&& $HOME/.local/bin/uv venv /opt/venv --python ${PYTHON_VERSION} \
&& rm -f /usr/bin/python3 /usr/bin/python3-config /usr/bin/pip \
&& ln -s /opt/venv/bin/python3 /usr/bin/python3 \
&& ln -s /opt/venv/bin/python3-config /usr/bin/python3-config \
&& ln -s /opt/venv/bin/pip /usr/bin/pip \
&& apt-get install -y ccache software-properties-common git curl wget sudo vim \
&& add-apt-repository -y ppa:deadsnakes/ppa \
&& apt-get update -y \
&& apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-venv \
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 1 \
&& update-alternatives --set python3 /usr/bin/python${PYTHON_VERSION} \
&& ln -sf /usr/bin/python${PYTHON_VERSION}-config /usr/bin/python3-config \
&& curl -sS ${GET_PIP_URL} | python${PYTHON_VERSION} \
&& python3 --version && python3 -m pip --version
# Upgrade to GCC 10 to avoid https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92519
# as it was causing spam when compiling the CUTLASS kernels
RUN apt-get install -y gcc-10 g++-10
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 110 --slave /usr/bin/g++ g++ /usr/bin/g++-10
RUN <<EOF
gcc --version
EOF
# Ensure gcc >= 10 to avoid CUTLASS issues (bug 92519)
RUN current_gcc_version=$(gcc -dumpversion | cut -f1 -d.) && \
if command -v apt-get >/dev/null; then \
if [ "$current_gcc_version" -lt 10 ]; then \
echo "GCC version is $current_gcc_version, installing gcc-10..."; \
apt-get update \
&& apt-get install -y gcc-10 g++-10 \
&& update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 100 \
&& update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 100; \
else \
echo "GCC version is $current_gcc_version, no need to install gcc-10."; \
fi \
fi \
&& gcc --version && g++ --version
# Install uv for faster pip installs
# install uv for faster pip installs
RUN --mount=type=cache,target=/root/.cache/uv \
python3 -m pip install uv==0.8.4
@ -43,32 +61,42 @@ ENV UV_HTTP_TIMEOUT=500
ENV UV_INDEX_STRATEGY="unsafe-best-match"
# Use copy mode to avoid hardlink failures with Docker cache mounts
ENV UV_LINK_MODE=copy
#################### TORCH NIGHTLY BASE IMAGE ####################
#################### BASE BUILD IMAGE ####################
# A base image for building vLLM with torch nightly or torch wheels
# prepare basic build environment
FROM ${BUILD_BASE_IMAGE} AS base
USER root
ARG CUDA_VERSION
ARG PYTHON_VERSION
# Only work with PyTorch manylinux builder
# TODO (huydhn): Only work with PyTorch manylinux builder
ENV PATH="/opt/python/cp312-cp312/bin:${PATH}"
# Install some system dependencies and double check python version
RUN if command -v apt-get >/dev/null; then \
apt-get update -y \
&& apt-get install -y ccache software-properties-common git wget sudo vim; \
&& apt-get install -y ccache software-properties-common git curl wget sudo vim; \
else \
dnf install -y git wget sudo; \
dnf install -y git curl wget sudo vim; \
fi \
&& python3 --version && python3 -m pip --version
# Workaround for https://github.com/openai/triton/issues/2507 and
# https://github.com/pytorch/pytorch/issues/107960 -- hopefully
# this won't be needed for future versions of this docker image
# or future versions of triton.
RUN ldconfig /usr/local/cuda-$(echo $CUDA_VERSION | cut -d. -f1,2)/compat/
# Install uv for faster pip installs if not existed
RUN --mount=type=cache,target=/root/.cache/uv \
python3 -m pip install uv==0.8.4
if ! python3 -m uv --version >/dev/null 2>&1; then \
python3 -m pip install uv==0.8.4; \
fi
ENV UV_HTTP_TIMEOUT=500
ENV UV_INDEX_STRATEGY="unsafe-best-match"
# Use copy mode to avoid hardlink failures with Docker cache mounts
@ -76,15 +104,15 @@ ENV UV_LINK_MODE=copy
WORKDIR /workspace
# Install build and runtime dependencies
# install build and runtime dependencies
COPY requirements/common.txt requirements/common.txt
COPY use_existing_torch.py use_existing_torch.py
COPY pyproject.toml pyproject.toml
# Install build and runtime dependencies without stable torch version
# install build and runtime dependencies without stable torch version
RUN python3 use_existing_torch.py
# Default mount file as placeholder, this just avoid the mount error
# default mount file as placeholder, this just avoid the mount error
# change to a different vllm folder if this does not exist anymore
ARG TORCH_WHEELS_PATH="./requirements"
ARG PINNED_TORCH_VERSION
@ -116,36 +144,56 @@ RUN --mount=type=cache,target=/root/.cache/uv \
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system -r requirements/common.txt
# Must put before installing xformers, so it can install the correct version of xfomrers.
ARG xformers_cuda_arch_list='7.5;8.0+PTX;9.0a'
ENV TORCH_CUDA_ARCH_LIST=${xformers_cuda_arch_list}
ARG max_jobs=16
ENV MAX_JOBS=${max_jobs}
RUN --mount=type=cache,target=/root/.cache/uv bash - <<'BASH'
export TORCH_CUDA_ARCH_LIST='7.5 8.0+PTX 9.0a'
git clone https://github.com/facebookresearch/xformers.git
RUN echo ${TORCH_CUDA_ARCH_LIST}
RUN echo ${MAX_JOBS}
RUN pip freeze | grep -E 'ninja'
pushd xformers
git checkout v0.0.32.post2
git submodule update --init --recursive
python3 setup.py bdist_wheel --dist-dir=../xformers-dist --verbose
popd
# Build xformers with cuda and torch nightly/wheel
# following official xformers guidance: https://github.com/facebookresearch/xformers#build
# sha for https://github.com/facebookresearch/xformers/tree/v0.0.32.post2
ARG XFORMERS_COMMIT=5d4b92a5e5a9c6c6d4878283f47d82e17995b468
ENV CCACHE_DIR=/root/.cache/ccache
rm -rf xformers
BASH
RUN --mount=type=cache,target=/root/.cache/ccache \
--mount=type=cache,target=/root/.cache/uv \
echo 'git clone xformers...' \
&& git clone https://github.com/facebookresearch/xformers.git --recursive \
&& cd xformers \
&& git checkout ${XFORMERS_COMMIT} \
&& git submodule update --init --recursive \
&& echo 'finish git clone xformers...' \
&& rm -rf build \
&& python3 setup.py bdist_wheel --dist-dir=../xformers-dist --verbose \
&& cd .. \
&& rm -rf xformers
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system xformers-dist/*.whl
uv pip install --system xformers-dist/*.whl --verbose
# Build can take a long time, and the torch nightly version fetched from url can be different in next docker stage.
# track the nightly torch version used in the build, when we set up runtime environment we can make sure the version is the same
RUN uv pip freeze | grep -i '^torch\|^torchvision\|^torchaudio' > torch_build_versions.txt
RUN cat torch_build_versions.txt
RUN pip freeze | grep -E 'torch|xformers|torchvision|torchaudio'
#################### BASE BUILD IMAGE ####################
#################### WHEEL BUILD IMAGE ####################
# Image used to build vllm wheel
FROM base AS build
ARG TARGETPLATFORM
COPY . .
RUN python3 use_existing_torch.py
RUN --mount=type=cache,target=/root/.cache/uv \
@ -155,30 +203,28 @@ ARG GIT_REPO_CHECK=0
RUN --mount=type=bind,source=.git,target=.git \
if [ "$GIT_REPO_CHECK" != "0" ]; then bash tools/check_repo.sh ; fi
# Max jobs used by Ninja to build extensions
ARG max_jobs=16
ENV MAX_JOBS=${max_jobs}
ARG nvcc_threads=8
ARG nvcc_threads=4
ENV NVCC_THREADS=$nvcc_threads
ARG torch_cuda_arch_list='8.0;8.6;8.9;9.0'
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}
ARG USE_SCCACHE
ARG SCCACHE_BUCKET_NAME=vllm-build-sccache
ARG SCCACHE_REGION_NAME=us-west-2
ARG SCCACHE_S3_NO_CREDENTIALS=0
# Use sccache to speed up compilation
# if USE_SCCACHE is set, use sccache to speed up compilation
RUN --mount=type=cache,target=/root/.cache/uv \
--mount=type=bind,source=.git,target=.git \
if [ "$USE_SCCACHE" = "1" ]; then \
echo "Installing sccache..."; \
if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
SCCACHE_ARCHIVE="sccache-v0.8.1-aarch64-unknown-linux-musl"; \
else \
SCCACHE_ARCHIVE="sccache-v0.8.1-x86_64-unknown-linux-musl"; \
fi; \
curl -L -o sccache.tar.gz "https://github.com/mozilla/sccache/releases/download/v0.8.1/${SCCACHE_ARCHIVE}.tar.gz" \
echo "Installing sccache..." \
&& curl -L -o sccache.tar.gz https://github.com/mozilla/sccache/releases/download/v0.8.1/sccache-v0.8.1-x86_64-unknown-linux-musl.tar.gz \
&& tar -xzf sccache.tar.gz \
&& sudo mv "${SCCACHE_ARCHIVE}"/sccache /usr/bin/sccache \
&& rm -rf sccache.tar.gz "${SCCACHE_ARCHIVE}" \
&& sudo mv sccache-v0.8.1-x86_64-unknown-linux-musl/sccache /usr/bin/sccache \
&& rm -rf sccache.tar.gz sccache-v0.8.1-x86_64-unknown-linux-musl \
&& export SCCACHE_BUCKET=${SCCACHE_BUCKET_NAME} \
&& export SCCACHE_REGION=${SCCACHE_REGION_NAME} \
&& export SCCACHE_S3_NO_CREDENTIALS=${SCCACHE_S3_NO_CREDENTIALS} \
@ -190,9 +236,6 @@ RUN --mount=type=cache,target=/root/.cache/uv \
&& sccache --show-stats; \
fi
ARG torch_cuda_arch_list='8.0 8.6 8.9 9.0'
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}
ARG vllm_target_device="cuda"
ENV VLLM_TARGET_DEVICE=${vllm_target_device}
ENV CCACHE_DIR=/root/.cache/ccache
@ -206,10 +249,17 @@ RUN --mount=type=cache,target=/root/.cache/ccache \
export VLLM_DOCKER_BUILD_CONTEXT=1 && \
python3 setup.py bdist_wheel --dist-dir=vllm-dist --py-limited-api=cp38; \
fi
RUN echo "[INFO] Listing current directory:" && \
ls -al && \
echo "[INFO] Showing torch_build_versions.txt content:" && \
cat torch_build_versions.txt
#################### WHEEL BUILD IMAGE ####################
################### VLLM INSTALLED IMAGE ####################
# Setup clean environment for vLLM for test and api server using ubuntu22.04 with AOT flashinfer
FROM ${FINAL_BASE_IMAGE} AS vllm-base
USER root
@ -217,7 +267,7 @@ ARG CUDA_VERSION
ARG PYTHON_VERSION
ARG GET_PIP_URL
# Only work with PyTorch manylinux builder
# TODO (huydhn): Only work with PyTorch manylinux builder
ENV PATH="/opt/python/cp312-cp312/bin:${PATH}"
# prepare for environment starts
@ -226,19 +276,20 @@ WORKDIR /workspace
# Install Python and other dependencies
RUN if command -v apt-get >/dev/null; then \
apt-get update -y \
&& apt-get install -y ccache software-properties-common git sudo vim python3-pip; \
&& apt-get install -y ccache software-properties-common git curl wget sudo vim \
&& add-apt-repository -y ppa:deadsnakes/ppa \
&& apt-get update -y \
&& apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-venv \
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 1 \
&& update-alternatives --set python3 /usr/bin/python${PYTHON_VERSION} \
&& ln -sf /usr/bin/python${PYTHON_VERSION}-config /usr/bin/python3-config \
&& curl -sS ${GET_PIP_URL} | python${PYTHON_VERSION}; \
else \
dnf install -y git wget sudo; \
dnf install -y git curl wget sudo vim; \
fi \
&& curl -LsSf https://astral.sh/uv/install.sh | sh \
&& $HOME/.local/bin/uv venv /opt/venv --python ${PYTHON_VERSION} \
&& rm -f /usr/bin/python3 /usr/bin/python3-config /usr/bin/pip \
&& ln -s /opt/venv/bin/python3 /usr/bin/python3 \
&& ln -s /opt/venv/bin/python3-config /usr/bin/python3-config \
&& ln -s /opt/venv/bin/pip /usr/bin/pip \
&& python3 --version && python3 -m pip --version
# Get the torch versions, and whls used in previous stage
# Get the torch versions, and whls used in previous stagtes for consistency
COPY --from=base /workspace/torch_build_versions.txt ./torch_build_versions.txt
COPY --from=base /workspace/xformers-dist /wheels/xformers
COPY --from=build /workspace/vllm-dist /wheels/vllm
@ -247,29 +298,27 @@ RUN echo "[INFO] Listing current directory before torch install step:" && \
echo "[INFO] Showing torch_build_versions.txt content:" && \
cat torch_build_versions.txt
# Workaround for https://github.com/openai/triton/issues/2507 and
# https://github.com/pytorch/pytorch/issues/107960 -- hopefully
# this won't be needed for future versions of this docker image
# or future versions of triton.
RUN ldconfig /usr/local/cuda-$(echo $CUDA_VERSION | cut -d. -f1,2)/compat/
# Install uv for faster pip installs if not existed
RUN --mount=type=cache,target=/root/.cache/uv \
python3 -m pip install uv==0.8.4
if ! python3 -m uv --version > /dev/null 2>&1; then \
python3 -m pip install uv==0.8.4; \
fi
ENV UV_HTTP_TIMEOUT=500
ENV UV_INDEX_STRATEGY="unsafe-best-match"
# Use copy mode to avoid hardlink failures with Docker cache mounts
ENV UV_LINK_MODE=copy
# Install build and runtime dependencies, this is needed for flashinfer install
COPY requirements/build.txt requirements/build.txt
COPY use_existing_torch.py use_existing_torch.py
RUN python3 use_existing_torch.py
RUN cat requirements/build.txt
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system -r requirements/build.txt
# Default mount file as placeholder, this just avoid the mount error
ARG TORCH_WHEELS_PATH="./requirements"
# Install torch, torchaudio and torchvision. If TORCH_WHEELS_PATH is default
# to ./requirements, it will pull the nightly versions using pip. Otherwise,
# it will use the local wheels from TORCH_WHEELS_PATH
# Install torch, torchaudio and torchvision
# if TORCH_WHEELS_PATH is default "./requirements", it will pull the nightly versions using pip using torch_build_versions.txt
# otherwise, it will use the whls from TORCH_WHEELS_PATH from the host machine
RUN --mount=type=bind,source=${TORCH_WHEELS_PATH},target=/dist \
--mount=type=cache,target=/root/.cache/uv \
if [ -n "$TORCH_WHEELS_PATH" ] && [ "$TORCH_WHEELS_PATH" != "./requirements" ] && [ -d "/dist" ] && ls /dist/torch*.whl >/dev/null 2>&1; then \
@ -291,13 +340,19 @@ RUN --mount=type=cache,target=/root/.cache/uv \
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system /wheels/xformers/*.whl --verbose
# Build FlashInfer from source
# Build flashinfer from source.
ARG torch_cuda_arch_list='8.0;8.9;9.0a;10.0a;12.0'
# install package for build flashinfer
# see issue: https://github.com/flashinfer-ai/flashinfer/issues/738
RUN pip install build==1.3.0
RUN pip freeze | grep -E 'setuptools|packaging|build'
ENV TORCH_CUDA_ARCH_LIST=${torch_cuda_arch_list}
# Build flashinfer for torch nightly from source around 10 mins
ARG FLASHINFER_GIT_REPO="https://github.com/flashinfer-ai/flashinfer.git"
# Keep this in sync with https://github.com/vllm-project/vllm/blob/main/requirements/cuda.txt
ARG FLASHINFER_GIT_REF="v0.2.14.post1"
RUN --mount=type=cache,target=/root/.cache/uv \
git clone --depth 1 --recursive --shallow-submodules \
--branch ${FLASHINFER_GIT_REF} \
@ -309,7 +364,7 @@ RUN --mount=type=cache,target=/root/.cache/uv \
&& cd .. \
&& rm -rf flashinfer
# Install FlashInfer
# install flashinfer python
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system wheels/flashinfer/*.whl --verbose
@ -319,6 +374,49 @@ RUN uv pip freeze | grep -i '^torch\|^torchvision\|^torchaudio\|^xformers\|^vllm
################### VLLM INSTALLED IMAGE ####################
#################### UNITTEST IMAGE #############################
FROM vllm-base as test
ENV UV_HTTP_TIMEOUT=500
ENV UV_INDEX_STRATEGY="unsafe-best-match"
# Use copy mode to avoid hardlink failures with Docker cache mounts
ENV UV_LINK_MODE=copy
COPY tests/ tests/
COPY examples examples
COPY benchmarks benchmarks
COPY ./vllm/collect_env.py .
COPY requirements/common.txt requirements/common.txt
COPY use_existing_torch.py use_existing_torch.py
COPY pyproject.toml pyproject.toml
# Install build and runtime dependencies without stable torch version
COPY requirements/nightly_torch_test.txt requirements/nightly_torch_test.txt
RUN python3 use_existing_torch.py
# install packages
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system -r requirements/common.txt
# enable fast downloads from hf (for testing)
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system hf_transfer
ENV HF_HUB_ENABLE_HF_TRANSFER 1
# install development dependencies (for testing)
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system -e tests/vllm_test_utils
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --system -r requirements/nightly_torch_test.txt
# Logging to confirm the torch versions
RUN pip freeze | grep -E 'torch|xformers|vllm|flashinfer'
# Logging to confirm all the packages are installed
RUN pip freeze
#################### UNITTEST IMAGE #############################
#################### EXPORT STAGE ####################
FROM scratch as export-wheels

View File

@ -1,22 +0,0 @@
import glob
import os
requires_files = glob.glob("requirements/*.txt")
requires_files += ["pyproject.toml"]
for file in requires_files:
if not os.path.exists(file):
print(f"!!! skipping missing {file}")
continue
print(f">>> cleaning {file}")
with open(file) as f:
lines = f.readlines()
if "torch" in "".join(lines).lower():
print("removed:")
with open(file, "w") as f:
for line in lines:
if "torch" not in line.lower():
f.write(line)
print(f"<<< done cleaning {file}")
print()

3
.github/labeler.yml vendored
View File

@ -130,6 +130,3 @@
- torch/csrc/inductor/aoti_include/**
- torchgen/aoti/**
- torchgen/gen_aoti_c_shim.py
"ciflow/vllm":
- .github/ci_commit_pins/vllm.txt

View File

@ -525,21 +525,6 @@
- Lint
- pull
- name: typechecking
patterns:
- 'pyrefly.toml'
- 'mypy.ini'
- 'mypy-strict.ini'
approved_by:
- lolpack
- maggiemoss
- ndmitchell
- kinto0
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: superuser
patterns:
- '*'

View File

@ -1,47 +1,41 @@
tracking_issue: 24422
ciflow_tracking_issue: 64124
ciflow_push_tags:
- ciflow/b200
- ciflow/b200-symm-mem
- ciflow/b200-distributed
- ciflow/binaries
- ciflow/binaries_libtorch
- ciflow/binaries_wheel
- ciflow/h100
- ciflow/h100-cutlass-backend
- ciflow/h100-distributed
- ciflow/h100-symm-mem
- ciflow/triton_binaries
- ciflow/inductor
- ciflow/inductor-cu126
- ciflow/inductor-micro-benchmark
- ciflow/inductor-micro-benchmark-cpu-x86
- ciflow/inductor-perf-compare
- ciflow/inductor-perf-test-nightly-rocm-mi300
- ciflow/inductor-perf-test-nightly-rocm-mi355
- ciflow/inductor-perf-test-nightly-x86-zen
- ciflow/inductor-periodic
- ciflow/inductor-rocm
- ciflow/inductor-perf-test-nightly-rocm
- ciflow/inductor-perf-compare
- ciflow/inductor-micro-benchmark
- ciflow/inductor-micro-benchmark-cpu-x86
- ciflow/inductor-perf-test-nightly-x86-zen
- ciflow/inductor-cu126
- ciflow/linux-aarch64
- ciflow/mps
- ciflow/nightly
- ciflow/op-benchmark
- ciflow/periodic
- ciflow/periodic-rocm-mi300
- ciflow/pull
- ciflow/quantization-periodic
- ciflow/riscv64
- ciflow/rocm
- ciflow/rocm-mi300
- ciflow/rocm-mi355
- ciflow/s390
- ciflow/riscv64
- ciflow/slow
- ciflow/torchbench
- ciflow/triton_binaries
- ciflow/trunk
- ciflow/unstable
- ciflow/vllm
- ciflow/win-arm64
- ciflow/xpu
- ciflow/vllm
- ciflow/torchbench
- ciflow/op-benchmark
- ciflow/pull
- ciflow/h100
- ciflow/h100-distributed
- ciflow/win-arm64
- ciflow/h100-symm-mem
- ciflow/h100-cutlass-backend
retryable_workflows:
- pull
- trunk
@ -50,4 +44,4 @@ retryable_workflows:
- inductor-A100-perf-nightly
labeler_config: labeler.yml
label_to_label_config: label_to_label.yml
mergebot: true
mergebot: True

View File

@ -0,0 +1,36 @@
boto3==1.35.42
cmake==3.27.*
expecttest==0.3.0
fbscribelogger==0.1.7
filelock==3.18.0
hypothesis==6.56.4
librosa>=0.6.2
mpmath==1.3.0
networkx==2.8.7
ninja==1.10.2.4
numba==0.59.0
numpy==1.26.4
opt-einsum>=3.3
optree==0.13.0
packaging==23.1
parameterized==0.8.1
pillow==10.3.0
protobuf==5.29.4
psutil==5.9.8
pygments==2.15.0
pytest-cpp==2.3.0
pytest-flakefinder==1.1.0
pytest-rerunfailures==10.3
pytest-subtests==0.13.1
pytest-xdist==3.3.1
pytest==7.3.2
pyyaml==6.0.2
scipy==1.12.0
setuptools==72.1.0
sympy==1.13.3
tlparse==0.4.0
tensorboard==2.13.0
typing-extensions==4.12.2
unittest-xml-reporting<=3.2.0,>=2.0.0
xdoctest==1.1.0
z3-solver==4.15.1.0

View File

@ -39,9 +39,7 @@ def main() -> None:
pull_request_label_names = [label.name for label in pull_request_labels]
issue_label_names = [label.name for label in issue_labels]
labels_to_add = [
label
for label in issue_label_names
if label not in pull_request_label_names and label != "actionable"
label for label in issue_label_names if label not in pull_request_label_names
]
if not labels_to_add:
print("The pull request already has the same labels.")

Binary file not shown.

View File

@ -41,9 +41,9 @@ SUPPORTED_PERIODICAL_MODES: dict[str, Callable[[Optional[str]], bool]] = {
}
# The link to the published list of disabled jobs
DISABLED_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/disabled-jobs.json"
DISABLED_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/disabled-jobs.json?versionId=hjktHz2WOejHpxKpkqpDknTt5rMTM9KK"
# and unstable jobs
UNSTABLE_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/unstable-jobs.json"
UNSTABLE_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/unstable-jobs.json?versionId=wrjdvvQTJxgvMO.rGw5MEuMsj6XbjuV7"
# Some constants used to handle disabled and unstable jobs
JOB_NAME_SEP = "/"
@ -502,7 +502,6 @@ def perform_misc_tasks(
job_name: str,
pr_body: str,
branch: Optional[str] = None,
tag: Optional[str] = None,
) -> None:
"""
In addition to apply the filter logic, the script also does the following
@ -510,11 +509,7 @@ def perform_misc_tasks(
"""
set_output(
"keep-going",
branch == MAIN_BRANCH
or bool(tag and re.match(r"^trunk/[a-f0-9]{40}$", tag))
# Pattern for tags created via manual run on HUD
or bool(tag and re.match(r"^ciflow/[^/]+/[a-f0-9]{40}$", tag))
or check_for_setting(labels, pr_body, "keep-going"),
branch == MAIN_BRANCH or check_for_setting(labels, pr_body, "keep-going"),
)
set_output(
"ci-verbose-test-logs",
@ -639,7 +634,6 @@ def main() -> None:
job_name=args.job_name,
pr_body=pr_body if pr_body else "",
branch=args.branch,
tag=tag,
)
# Set the filtered test matrix as the output

View File

@ -16,23 +16,21 @@ from typing import Optional
# NOTE: Please also update the CUDA sources in `PIP_SOURCES` in tools/nightly.py when changing this
CUDA_ARCHES = ["12.6", "12.8", "12.9", "13.0"]
CUDA_ARCHES = ["12.6", "12.8", "13.0"]
CUDA_STABLE = "12.8"
CUDA_ARCHES_FULL_VERSION = {
"12.6": "12.6.3",
"12.8": "12.8.1",
"12.9": "12.9.1",
"13.0": "13.0.0",
}
CUDA_ARCHES_CUDNN_VERSION = {
"12.6": "9",
"12.8": "9",
"12.9": "9",
"13.0": "9",
}
# NOTE: Please also update the ROCm sources in `PIP_SOURCES` in tools/nightly.py when changing this
ROCM_ARCHES = ["6.4", "7.0"]
ROCM_ARCHES = ["6.3", "6.4"]
XPU_ARCHES = ["xpu"]
@ -40,7 +38,7 @@ CPU_AARCH64_ARCH = ["cpu-aarch64"]
CPU_S390X_ARCH = ["cpu-s390x"]
CUDA_AARCH64_ARCHES = ["12.6-aarch64", "12.8-aarch64", "12.9-aarch64", "13.0-aarch64"]
CUDA_AARCH64_ARCHES = ["12.6-aarch64", "12.8-aarch64", "13.0-aarch64"]
PYTORCH_EXTRA_INSTALL_REQUIREMENTS = {
@ -56,7 +54,7 @@ PYTORCH_EXTRA_INSTALL_REQUIREMENTS = {
"nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' | "
"nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' | "
"nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' | "
"nvidia-nvshmem-cu12==3.3.24; platform_system == 'Linux' | "
"nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' | "
"nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' | "
"nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' | "
"nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux'"
@ -73,28 +71,11 @@ PYTORCH_EXTRA_INSTALL_REQUIREMENTS = {
"nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' | "
"nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' | "
"nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' | "
"nvidia-nvshmem-cu12==3.3.24; platform_system == 'Linux' | "
"nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' | "
"nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' | "
"nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' | "
"nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux'"
),
"12.9": (
"nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'"
),
"13.0": (
"nvidia-cuda-nvrtc==13.0.48; platform_system == 'Linux' | "
"nvidia-cuda-runtime==13.0.48; platform_system == 'Linux' | "
@ -241,11 +222,7 @@ def generate_libtorch_matrix(
arches += CUDA_ARCHES
arches += ROCM_ARCHES
elif os == "windows":
# TODO (huydhn): Only build CUDA 12.9 for Linux. This logic is to be cleaned up
# in 2.10
windows_cuda_arches = CUDA_ARCHES.copy()
windows_cuda_arches.remove("12.9")
arches += windows_cuda_arches
arches += CUDA_ARCHES
if libtorch_variants is None:
libtorch_variants = [
"shared-with-deps",
@ -309,11 +286,7 @@ def generate_wheels_matrix(
if os == "linux":
arches += CUDA_ARCHES + ROCM_ARCHES + XPU_ARCHES
elif os == "windows":
# TODO (huydhn): Only build CUDA 12.9 for Linux. This logic is to be cleaned up
# in 2.10
windows_cuda_arches = CUDA_ARCHES.copy()
windows_cuda_arches.remove("12.9")
arches += windows_cuda_arches + XPU_ARCHES
arches += CUDA_ARCHES + XPU_ARCHES
elif os == "linux-aarch64":
# Separate new if as the CPU type is different and
# uses different build/test scripts
@ -349,7 +322,7 @@ def generate_wheels_matrix(
# cuda linux wheels require PYTORCH_EXTRA_INSTALL_REQUIREMENTS to install
if (
arch_version in ["13.0", "12.9", "12.8", "12.6"]
arch_version in ["13.0", "12.8", "12.6"]
and os == "linux"
or arch_version in CUDA_AARCH64_ARCHES
):
@ -413,6 +386,5 @@ def generate_wheels_matrix(
validate_nccl_dep_consistency("13.0")
validate_nccl_dep_consistency("12.9")
validate_nccl_dep_consistency("12.8")
validate_nccl_dep_consistency("12.6")

View File

@ -127,6 +127,53 @@ LINUX_BINARY_BUILD_WORFKLOWS = [
),
]
ROCM_SMOKE_WORKFLOWS = [
BinaryBuildWorkflow(
os=OperatingSystem.LINUX,
package_type="manywheel",
build_variant="rocm",
build_configs=generate_binary_build_matrix.generate_wheels_matrix(
OperatingSystem.LINUX,
arches=["6.4"],
python_versions=["3.10"],
),
ciflow_config=CIFlowConfig(
labels={
LABEL_CIFLOW_BINARIES,
LABEL_CIFLOW_BINARIES_WHEEL,
LABEL_CIFLOW_ROCM,
},
isolated_workflow=True,
),
branches="main",
),
]
LINUX_BINARY_SMOKE_WORKFLOWS = [
BinaryBuildWorkflow(
os=OperatingSystem.LINUX,
package_type="manywheel",
build_configs=generate_binary_build_matrix.generate_wheels_matrix(
OperatingSystem.LINUX,
arches=["12.8"],
python_versions=["3.12"],
),
branches="main",
),
BinaryBuildWorkflow(
os=OperatingSystem.LINUX,
package_type="libtorch",
build_variant=generate_binary_build_matrix.RELEASE,
build_configs=generate_binary_build_matrix.generate_libtorch_matrix(
OperatingSystem.LINUX,
generate_binary_build_matrix.RELEASE,
arches=["cpu"],
libtorch_variants=["shared-with-deps"],
),
branches="main",
),
]
WINDOWS_BINARY_BUILD_WORKFLOWS = [
BinaryBuildWorkflow(
os=OperatingSystem.WINDOWS,
@ -212,6 +259,39 @@ WINDOWS_BINARY_BUILD_WORKFLOWS = [
),
]
WINDOWS_BINARY_SMOKE_WORKFLOWS = [
BinaryBuildWorkflow(
os=OperatingSystem.WINDOWS,
package_type="libtorch",
build_variant=generate_binary_build_matrix.RELEASE,
build_configs=generate_binary_build_matrix.generate_libtorch_matrix(
OperatingSystem.WINDOWS,
generate_binary_build_matrix.RELEASE,
arches=["cpu"],
libtorch_variants=["shared-with-deps"],
),
branches="main",
ciflow_config=CIFlowConfig(
isolated_workflow=True,
),
),
BinaryBuildWorkflow(
os=OperatingSystem.WINDOWS,
package_type="libtorch",
build_variant=generate_binary_build_matrix.DEBUG,
build_configs=generate_binary_build_matrix.generate_libtorch_matrix(
OperatingSystem.WINDOWS,
generate_binary_build_matrix.DEBUG,
arches=["cpu"],
libtorch_variants=["shared-with-deps"],
),
branches="main",
ciflow_config=CIFlowConfig(
isolated_workflow=True,
),
),
]
MACOS_BINARY_BUILD_WORKFLOWS = [
BinaryBuildWorkflow(
os=OperatingSystem.MACOS_ARM64,
@ -292,10 +372,23 @@ def main() -> None:
jinja_env.get_template("linux_binary_build_workflow.yml.j2"),
S390X_BINARY_BUILD_WORKFLOWS,
),
(
# Give rocm it's own workflow file
jinja_env.get_template("linux_binary_build_workflow.yml.j2"),
ROCM_SMOKE_WORKFLOWS,
),
(
jinja_env.get_template("linux_binary_build_workflow.yml.j2"),
LINUX_BINARY_SMOKE_WORKFLOWS,
),
(
jinja_env.get_template("windows_binary_build_workflow.yml.j2"),
WINDOWS_BINARY_BUILD_WORKFLOWS,
),
(
jinja_env.get_template("windows_binary_build_workflow.yml.j2"),
WINDOWS_BINARY_SMOKE_WORKFLOWS,
),
(
jinja_env.get_template("macos_binary_build_workflow.yml.j2"),
MACOS_BINARY_BUILD_WORKFLOWS,

View File

@ -18,7 +18,6 @@ class GitHubComment:
body_text: str
created_at: str
author_login: str
author_url: Optional[str]
author_association: str
editor_login: Optional[str]
database_id: int

Some files were not shown because too many files have changed in this diff Show More