4481 Commits

Author SHA1 Message Date
302df2ac5d [vllm hash update] update the pinned vllm hash (#162115)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162115
Approved by: https://github.com/pytorchbot
2025-09-04 04:26:34 +00:00
36d207fcaa [CI] viable strict upgrade: Explicitly name which linux binary wheels should block (#162100)
Reason:
rocm binary builds should not block viable strict upgrade.  It is queuing/canceled so viable strict is 1.2 days old

Tested by mangling the workflow file to get to the actual call of the python script `python ../test-infra/tools/scripts/fetch_latest_green_commit.py --required-checks '["pull", "trunk", "lint", "^linux-binary-manywheel$", "^linux-binary-libtorch-release$", "linux-aarch64"]' --viable-strict-branch viable/strict --main-branch master`, which I then ran locally where I have credentials.  It returned d64718503728001a1e78168fd7f2d4ff23e57285 which is green.  Without this change, it returns 5e5870e858f60ff4bf87d03f3592097e934a9580, which is pretty old

The other solution would have been to mark it as unstable I think

Side note, why is it master and how is it working like that

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162100
Approved by: https://github.com/huydhn
2025-09-03 22:38:32 +00:00
0af70e2353 Modify ROCm MI2xx-based workflows to run on cron schedule (#162103)
To mitigate queueing on MI2xx runners since Cirrascale runners are offline. Match cron schedule of periodic.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162103
Approved by: https://github.com/jeffdaily, https://github.com/seemethere
2025-09-03 21:51:03 +00:00
cd529b686d [ROCm] Use MI325 (gfx942) runners for binary smoke testing (#162044)
### Motivation

* MI250 Cirrascale runners are currently having network timeout leading to huge queueing of binary smoke test jobs:
<img width="483" height="133" alt="image" src="https://github.com/user-attachments/assets/17293002-78ad-4fc9-954f-ddd518bf0a43" />

* MI210 Hollywood runners (with runner names such as `pytorch-rocm-hw-*`) are not suitable for these jobs, because they seem to take much longer to download artifacts: https://github.com/pytorch/pytorch/pull/153287#issuecomment-2918420345 (this is why these jobs were specifically targeting Cirrascale runners). However, it doesn't seem like Cirrascale runners are necessarily doing much better either e.g. [this recent build](https://github.com/pytorch/pytorch/actions/runs/17332256791/job/49231006755).
* Moving to MI325 runners should address the stability part at least, while also reducing load on limited MI2xx runner capacity.
* However, I'm not sure if the MI325 runners will do any better on the artifact download part (this may need to be investigated more) cc @amdfaa

* Also removing `ciflow/binaries` and `ciflow/binaries_wheel` label/tag triggers for `generated-linux-binary-manywheel-rocm-main.yml` because we already trigger ROCm binary build/test jobs via these labels/tags in `generated-linux-binary-manywheel-nightly.yml`. And for developers who want to trigger ROCm binary build/test jobs on their PRs, they can use the `ciflow/rocm-mi300` label/tag as per this PR.

### TODOs (cc @amdfaa):
* Check that the workflow runs successfully on the MI325 runners in this PR. Note how long the test jobs take esp. the "Download Build Artifacts" step
* Once this PR is merged, clear the queue of jobs targeting `linux.rocm.gpu.mi250`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162044
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-09-03 18:34:07 +00:00
71992dd805 S390x: build nightly binaries for new pythons (#161920)
Enable python 3.13t, 3.14 and 3.14t on s390x for nightly binaries

Fixes #161515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161920
Approved by: https://github.com/malfet
2025-09-03 17:38:38 +00:00
8875d6e394 [vllm hash update] update the pinned vllm hash (#161929)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161929
Approved by: https://github.com/pytorchbot
2025-09-03 04:26:38 +00:00
09d2f1b631 [audio hash update] update the pinned audio hash (#161928)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161928
Approved by: https://github.com/pytorchbot
2025-09-03 04:22:55 +00:00
793fc12aff [CD] Fix setup-xpu action issue (#161934)
Fix XPU CD test failure, refer https://github.com/pytorch/pytorch/actions/runs/17370923627/job/49315624191
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161934
Approved by: https://github.com/atalman
2025-09-02 16:03:44 +00:00
1f820de639 [ci] Increase shards for linux-jammy-py3.10-clang18-asan on pull.yml to 7 (#161968)
[ci] Increase shards for linux-jammy-py3.10-clang18-asan to 7
2025-09-02 14:08:47 +02:00
d232a95d4a [BE] Consolidate inductor benchmark Docker images and rename jobs (#161536)
We have 4 different version of inductor benchmark Docker images used in CI at the moment:

1. `pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks` is used by almost all inductor jobs including nightly benchmark
2. `pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc9-inductor-benchmarks` runs inductor unit tests with python 3.12
3. `pytorch-linux-jammy-cuda12.8-cudnn9-py3.13-gcc9-inductor-benchmarks` runs inductor unit tests with python 3.13
4. `pytorch-linux-jammy-py3-gcc11-inductor-benchmarks` runs inductor unit tests on CPU

My proposal here is to clean up (2) and (3) and to keep (1) under the same setup from https://ghcr.io/pytorch/torchbench.  Simplicity is the key here as inductor workflows are getting more and more complex:
1. Unit tests for Python variant like 3.12 and 3.13 were useful when they were first added to CI.  They are much less useful now.  [Flambeau](https://hud.pytorch.org/flambeau/s/3876ec7b-43f0-42c6-bfbf-899035e5bb77) shows a 0.97 correlation between them.  And we are also moving to 3.14 nowadays.  I want to choose 3.12 for (1), but will do this separately.  This is also what TorchBench and vLLM are using on CI.
1. We are gradually cleaning up 3.9 on CI https://github.com/pytorch/pytorch/issues/161167

Another BE change here is to rename the jobs various inductor workflows because I think names like `linux-jammy-cuda12_8-py3_10-gcc9-inductor-build` is too long and confusing to look at, better just use human-friendly names like `inductor-build`.  Other information is already spelled out in the build environment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161536
Approved by: https://github.com/zou3519
2025-09-01 19:07:08 +00:00
fefee08164 [CD] Add CUDA 13.0 Windows build (#161663)
Test CUDA 13.0 windows build

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161663
Approved by: https://github.com/malfet, https://github.com/atalman
2025-09-01 15:27:17 +00:00
2ba65472dd [xla hash update] update the pinned xla hash (#161396)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161396
Approved by: https://github.com/pytorchbot
2025-09-01 11:43:03 +00:00
67c31dcd36 [vllm hash update] update the pinned vllm hash (#161867)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161867
Approved by: https://github.com/pytorchbot
2025-09-01 04:37:13 +00:00
f612045ce1 [vllm hash update] update the pinned vllm hash (#161835)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161835
Approved by: https://github.com/pytorchbot
2025-08-31 04:24:04 +00:00
76f81b56d3 [audio hash update] update the pinned audio hash (#161836)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161836
Approved by: https://github.com/pytorchbot
2025-08-30 04:23:04 +00:00
0af56fc33e Cleanup stale submodule directories after checkout (#161748)
Fixes https://github.com/pytorch/pytorch/issues/161510

Test plan:
```
% cd third_party/kineto
% git checkout fe80f9319479265f7a208e615e16a363b993d50c; git submodule update --init --recursive
M	libkineto/third_party/dynolog
M	libkineto/third_party/fmt
M	libkineto/third_party/googletest
Previous HEAD position was 5e75018 Fix Local Time on Windows Builds (#1104)
HEAD is now at fe80f93 Fix MSVC Error (#1134)
Submodule path 'libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1'
Submodule path 'libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
Submodule path 'libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a'
Submodule path 'libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159'
Submodule path 'libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
Submodule path 'libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21'
Submodule path 'libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
% git checkout 5e75018; git submodule update --init --recursive
M	libkineto/third_party/dynolog
M	libkineto/third_party/fmt
M	libkineto/third_party/googletest
Previous HEAD position was fe80f93 Fix MSVC Error (#1134)
HEAD is now at 5e75018 Fix Local Time on Windows Builds (#1104)
warning: unable to rmdir 'third_party/prometheus-cpp': Directory not empty
Submodule path 'libkineto/third_party/dynolog': checked out '7d04a0053a845370ae06ce317a22a48e9edcc74e'
Submodule path 'libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850'
Submodule path 'libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164'
Submodule path 'libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347'
% cd ../..
% git status
HEAD detached from 649e397c6de
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  (commit or discard the untracked or modified content in submodules)
	modified:   third_party/kineto (untracked content)

% time git submodule foreach --recursive git clean -ffdx
...
git submodule foreach --recursive git clean -ffdx  0.47s user 0.96s system 88% cpu 1.625 total
% git status
HEAD detached from 649e397c6de
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161748
Approved by: https://github.com/atalman
2025-08-30 01:30:44 +00:00
6db872fa2c Revert "Cleanup stale submodule directories after checkout (#161748)"
This reverts commit 0e45023cf9cbe1cf18279c1b0d391ea9464e7731.

Reverted https://github.com/pytorch/pytorch/pull/161748 on behalf of https://github.com/malfet due to I still see the same failures, and could not understand, from the log whether those checks are running on not ([comment](https://github.com/pytorch/pytorch/pull/161748#issuecomment-3238791895))
2025-08-30 01:04:11 +00:00
0f81e7f640 [CI] Fix XPU ci test permission issue (#161389)
Due to new test runners, refer https://github.com/pytorch/pytorch/actions/runs/17161094208/job/48694776064#step:2:124
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161389
Approved by: https://github.com/atalman
2025-08-30 00:03:59 +00:00
0e2c8af5a6 [CI/CD] Windows set git config --global core.ignorecase false (#161813)
Make sure git on windows have core.ignorecase false

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161813
Approved by: https://github.com/malfet
2025-08-29 23:04:43 +00:00
037f3bd475 [CI] Migrate XPU build and test to python 3.10 (#161708)
Follow #161167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161708
Approved by: https://github.com/malfet
2025-08-29 22:31:39 +00:00
6e548c1a87 Revert "[CI] Migrate XPU build and test to python 3.10 (#161708)"
This reverts commit 2a70d98abf8256d3d768eff028fca20198579824.

Reverted https://github.com/pytorch/pytorch/pull/161708 on behalf of https://github.com/ZainRizvi due to Sorry but this is causing rocm jobs to fail. See: test/inductor/test_max_autotune.py::TestMaxAutotuneSubproc::test_max_autotune_addmm_search_space_EXHAUSTIVE_dynamic_True [GH job link](https://github.com/pytorch/pytorch/actions/runs/17303310877/job/49125664617) [HUD commit link](2a70d98abf) ([comment](https://github.com/pytorch/pytorch/pull/161708#issuecomment-3238359944))
2025-08-29 21:49:15 +00:00
303f514d5b [CI] Add basic CUDA 13.0 periodic test (#161013)
https://github.com/pytorch/pytorch/issues/159779

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161013
Approved by: https://github.com/atalman

Co-authored-by: Andrey Talman <atalman@fb.com>
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
2025-08-29 17:56:33 +00:00
c8fa907e74 Check commit order (#161560)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161560
Approved by: https://github.com/malfet
ghstack dependencies: #161558, #161637
2025-08-29 16:22:58 +00:00
0e45023cf9 Cleanup stale submodule directories after checkout (#161748)
Fixes https://github.com/pytorch/pytorch/issues/161510

Test plan:
```
% cd third_party/kineto
% git checkout fe80f9319479265f7a208e615e16a363b993d50c; git submodule update --init --recursive
M	libkineto/third_party/dynolog
M	libkineto/third_party/fmt
M	libkineto/third_party/googletest
Previous HEAD position was 5e75018 Fix Local Time on Windows Builds (#1104)
HEAD is now at fe80f93 Fix MSVC Error (#1134)
Submodule path 'libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1'
Submodule path 'libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
Submodule path 'libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a'
Submodule path 'libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159'
Submodule path 'libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
Submodule path 'libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21'
Submodule path 'libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
% git checkout 5e75018; git submodule update --init --recursive
M	libkineto/third_party/dynolog
M	libkineto/third_party/fmt
M	libkineto/third_party/googletest
Previous HEAD position was fe80f93 Fix MSVC Error (#1134)
HEAD is now at 5e75018 Fix Local Time on Windows Builds (#1104)
warning: unable to rmdir 'third_party/prometheus-cpp': Directory not empty
Submodule path 'libkineto/third_party/dynolog': checked out '7d04a0053a845370ae06ce317a22a48e9edcc74e'
Submodule path 'libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850'
Submodule path 'libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164'
Submodule path 'libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347'
% cd ../..
% git status
HEAD detached from 649e397c6de
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  (commit or discard the untracked or modified content in submodules)
	modified:   third_party/kineto (untracked content)

% time git submodule foreach --recursive git clean -ffdx
...
git submodule foreach --recursive git clean -ffdx  0.47s user 0.96s system 88% cpu 1.625 total
% git status
HEAD detached from 649e397c6de
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161748
Approved by: https://github.com/atalman
2025-08-29 14:07:06 +00:00
823a329984 Revert "Cleanup stale submodule directories in checkout action (#161748)"
This reverts commit f3c5a82139539c63e6f08966e268c4160e138320.

Reverted https://github.com/pytorch/pytorch/pull/161748 on behalf of https://github.com/malfet due to I put the check in the wrong place ([comment](https://github.com/pytorch/pytorch/pull/161748#issuecomment-3237080419))
2025-08-29 13:40:21 +00:00
a7c949089a [vllm hash update] update the pinned vllm hash (#161752)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161752
Approved by: https://github.com/pytorchbot
2025-08-29 04:54:31 +00:00
a6456bfa85 [audio hash update] update the pinned audio hash (#161753)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161753
Approved by: https://github.com/pytorchbot
2025-08-29 04:52:58 +00:00
f3c5a82139 Cleanup stale submodule directories in checkout action (#161748)
Fixes https://github.com/pytorch/pytorch/issues/161510

Test plan:
```
% cd third_party/kineto
% git checkout fe80f9319479265f7a208e615e16a363b993d50c; git submodule update --init --recursive
M	libkineto/third_party/dynolog
M	libkineto/third_party/fmt
M	libkineto/third_party/googletest
Previous HEAD position was 5e75018 Fix Local Time on Windows Builds (#1104)
HEAD is now at fe80f93 Fix MSVC Error (#1134)
Submodule path 'libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1'
Submodule path 'libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
Submodule path 'libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a'
Submodule path 'libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159'
Submodule path 'libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
Submodule path 'libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21'
Submodule path 'libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
% git checkout 5e75018; git submodule update --init --recursive
M	libkineto/third_party/dynolog
M	libkineto/third_party/fmt
M	libkineto/third_party/googletest
Previous HEAD position was fe80f93 Fix MSVC Error (#1134)
HEAD is now at 5e75018 Fix Local Time on Windows Builds (#1104)
warning: unable to rmdir 'third_party/prometheus-cpp': Directory not empty
Submodule path 'libkineto/third_party/dynolog': checked out '7d04a0053a845370ae06ce317a22a48e9edcc74e'
Submodule path 'libkineto/third_party/dynolog/third_party/googletest': checked out '58d77fa8070e8cec2dc1ed015d66b454c8d78850'
Submodule path 'libkineto/third_party/fmt': checked out '0041a40c1350ba702d475b9c4ad62da77caea164'
Submodule path 'libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347'
% cd ../..
% git status
HEAD detached from 649e397c6de
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  (commit or discard the untracked or modified content in submodules)
	modified:   third_party/kineto (untracked content)

% time git submodule foreach --recursive git clean -ffdx
...
git submodule foreach --recursive git clean -ffdx  0.47s user 0.96s system 88% cpu 1.625 total
% git status
HEAD detached from 649e397c6de
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161748
Approved by: https://github.com/atalman
2025-08-29 03:21:31 +00:00
f46e4bcf43 Revert "Add ciflow/vllm to vLLM commit hash update PR(s) (#161678)"
This reverts commit 0e358050304c6a350dae2bce497bd1867ecc3c9f.

Reverted https://github.com/pytorch/pytorch/pull/161678 on behalf of https://github.com/yangw-dev due to we want to keep the vllm pinn updated now, right now we have some failure ([comment](https://github.com/pytorch/pytorch/pull/161678#issuecomment-3234876332))
2025-08-28 20:42:19 +00:00
dac062f23b Add aoti to mps benchmarks (#160741)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160741
Approved by: https://github.com/malfet, https://github.com/huydhn
2025-08-28 17:32:29 +00:00
2a70d98abf [CI] Migrate XPU build and test to python 3.10 (#161708)
Follow #161167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161708
Approved by: https://github.com/malfet
2025-08-28 17:27:11 +00:00
c83b43d7a8 [1/2]Add summary report for vllm build (#161565)
Demo Run
https://github.com/pytorch/pytorch/actions/runs/17259533323?pr=161565

<img width="1538" height="720" alt="image" src="https://github.com/user-attachments/assets/64f6d7b4-cac6-4c12-863c-b15514bb8810" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161565
Approved by: https://github.com/huydhn
2025-08-28 05:25:55 +00:00
a65db6dc4c [vllm hash update] update the pinned vllm hash (#161363)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161363
Approved by: https://github.com/pytorchbot
2025-08-28 04:14:19 +00:00
0e35805030 Add ciflow/vllm to vLLM commit hash update PR(s) (#161678)
As it should be, otherwise, PR(s) like https://github.com/pytorch/pytorch/pull/161121 were merged without the signals it needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161678
Approved by: https://github.com/atalman
2025-08-28 01:35:04 +00:00
92c2daebb6 Add inductor provenance tracking artifacts to cache (#161440)
Summary:

- Add inductor provenance tracking artifacts to cache
- Update the tlparse version pin to `0.4.0`. The old tlparse version errors out on the new tlparse output. The lowest tlparse version that works is `0.3.42`.

tlparse error:
```
thread 'main' panicked at src/parsers.rs:671:71:
called `Result::unwrap()` on an `Err` value: Error("EOF while parsing a value", line: 1, column: 0)
stack backtrace:
   0:     0x55e4ff1c7f00 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h6d42cc84fc840290
   1:     0x55e4ff1ee503 - core::fmt::write::h5af61a909e3ec64d
   2:     0x55e4ff1c4c33 - std::io::Write::write_fmt::h5a7b54aa6e4a315d
   3:     0x55e4ff1c7d52 - std::sys::backtrace::BacktraceLock::print::h555579e7396c26ac
   4:     0x55e4ff1c8caf - std::panicking::default_hook::{{closure}}::h9128866118196224
   5:     0x55e4ff1c8b1a - std::panicking::default_hook::h52e9e7314e0255f6
   6:     0x55e4ff1c9652 - std::panicking::rust_panic_with_hook::h541791bcc774ef34
   7:     0x55e4ff1c93fa - std::panicking::begin_panic_handler::{{closure}}::h6479a2f0137c7d19
   8:     0x55e4ff1c8419 - std::sys::backtrace::__rust_end_short_backtrace::ha04e7c0fc61ded91
   9:     0x55e4ff1c908d - rust_begin_unwind
  10:     0x55e4fef7a030 - core::panicking::panic_fmt::h5764ee7030b7a73d
  11:     0x55e4fef7a406 - core::result::unwrap_failed::h3ff7104a9ace307a
  12:     0x55e4fefb3c56 - <tlparse::parsers::ArtifactParser as tlparse::parsers::StructuredLogParser>::parse::h20bc51a17ffc494a
  13:     0x55e4fef9669a - tlparse::run_parser::h20c7729f151eec62
  14:     0x55e4fef99a1b - tlparse::parse_path::he4892147f47fbade
  15:     0x55e4fef7c760 - tlparse::main::hdc05613b32f4f53b
  16:     0x55e4fef89263 - std::sys::backtrace::__rust_begin_short_backtrace::h15f188f3edf42596
  17:     0x55e4fef8827d - std::rt::lang_start::{{closure}}::he2c21e32a442538e
  18:     0x55e4ff1be0f0 - std::rt::lang_start_internal::h15895544e2012228
  19:     0x55e4fef83975 - main
  20:     0x7f0b3662a610 - __libc_start_call_main
  21:     0x7f0b3662a6c0 - __libc_start_main_alias_2
  22:     0x55e4fef7a610 - <unknown>
  23:                0x0 - <unknown>
```

Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r  test_kernel_information_generation
python test/dynamo/test_structured_trace.py -k test_chromium_event
```

Differential Revision: D80976585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161440
Approved by: https://github.com/oulgen
2025-08-28 01:16:02 +00:00
6b051d7de3 [BE] Refactor trymerge for readability (#161637)
Two changes:
- Extract getting the last_commit's sha into it's own function
- Rename merge_changes to merge_changes_locally to better explain it's functionality
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161637
Approved by: https://github.com/seemethere, https://github.com/malfet
ghstack dependencies: #161558
2025-08-27 22:44:00 +00:00
cbc53b7696 Update pybind11 submodule to 3.0.1 (#160754)
Upgrade to PyBind11 v3. This allows us to strip out our own (possibly broken?) handling of the C++ ABI when building extensions, in favor of the more-complete PyBind11 internal handling.

Fixes a few test failures due to https://github.com/pybind/pybind11/issues/5774, which effectively makes the `__qualname__` attribute of functions platform-dependent.

Test plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160754
Approved by: https://github.com/Skylion007
2025-08-27 21:15:01 +00:00
624bc36163 Ensure the comment id is always passed in to trymerge (#161558)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161558
Approved by: https://github.com/seemethere, https://github.com/malfet
2025-08-27 19:53:28 +00:00
06c7516994 [BE] Upgrade XPU support package to 2025.2 (#158733)
Including below changes,

- Add XPU support package 2025.2 build and test in CI for both Linux and Windows
- Keep XPU support package 2025.1 build in CI to ensure no break issue until PyTorch 2.9 release
- Upgrade XPU support package from 2025.1 to 2025.2 in CD for both Linux and Windows
- Rename Linux CI job name & image name to n & n-1
- Update XPU runtime pypi packages dependencies of CD wheels
- Remove deprecated support package version docker image build

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158733
Approved by: https://github.com/EikanWang, https://github.com/atalman
2025-08-27 19:33:38 +00:00
3345a7ff8a [VLLM][FLASHINFER UPDATE] (#161537)
VLLM build x torch fails due to flashinfer build fail, detected that vllm team recently changed the point to flashinfer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161537
Approved by: https://github.com/huydhn
2025-08-27 17:41:26 +00:00
55e6ea105c Fix running the benchmark jobs twice (#161619)
I made a mistake in https://github.com/pytorch/pytorch/pull/160935 removing this condition check.  This ran the benchmark job twice for schedule jobs, i.e. https://github.com/pytorch/pytorch/actions/runs/17266546494.  This was missed during testing because `pull_request` and `workflow_dispatch` were working ok.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161619
Approved by: https://github.com/anijain2305
2025-08-27 17:18:10 +00:00
a2af6a9d6b Run WoArm64 CI every 4 hours (#161504)
Since WoArm64 isn’t part of CI yet, this PR schedules the workflow to increase visibility and insights. It will execute every 4 hours and still support manual runs via the `ciflow/win-arm64` tag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161504
Approved by: https://github.com/seemethere, https://github.com/atalman
2025-08-27 15:46:34 +00:00
9632f4ea9f [CD] [aarch64] Add CUDA 13.0 sbsa nightly build (#161257)
https://github.com/pytorch/pytorch/issues/159779

CUDA SBSA build for CUDA 13.0
1. Supported archs: sm_80 to sm_120. Including support for Thor (sm_110), SPARK (sm_121), GB300 (sm_103).
"This release adds support of SM110 GPUs for arm64-sbsa on Linux." from 13.0 release notes https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
2. Use -compress-mode=size for binary size reduction, 13.0 wheel is 2.18 GB, when compared with 12.9 3.28 GB, that is 1.1 GB of savings and ~33.5% smaller.
3. Refactored the libs_to_copy list with common libs, and version_specific_libs.

TODO: add the other CUDA archs in the existing support matrix of x86 to SBSA build as well

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161257
Approved by: https://github.com/nWEIdia, https://github.com/atalman
2025-08-27 14:38:07 +00:00
6913529ff8 Move non inductor workflows to Python 3.9 -> 3.10 (#161182)
Related to: https://github.com/pytorch/pytorch/issues/161167

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161182
Approved by: https://github.com/malfet, https://github.com/huydhn, https://github.com/seemethere
2025-08-27 02:32:24 +00:00
1b34e04485 Revert "Update pybind11 submodule to 3.0.1 (#160754)"
This reverts commit 660b0b8128181d11165176ea3f979fa899f24db1.

Reverted https://github.com/pytorch/pytorch/pull/160754 on behalf of https://github.com/atalman due to please see https://github.com/pytorch/pytorch/pull/160754#issuecomment-3226051449 ([comment](https://github.com/pytorch/pytorch/pull/160754#issuecomment-3226078102))
2025-08-26 23:35:22 +00:00
a72803f1e3 [ez][CI] GIve the linux check job a name that isn't linux-job (#161413)
Reason:
The default name is linux-job, which gets put in the linux category on HUD, but this isn't really a linux related job.  Renaming it like this will make it go into the "other" category on HUD

Other options:
Change the grouping code in test-infra
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161413
Approved by: https://github.com/huydhn, https://github.com/seemethere
2025-08-26 15:18:35 +00:00
ae8d319fd4 Update NVSHMEM to 3.3.24 and fix download link (#161321)
https://github.com/pytorch/pytorch/issues/159779

Update NVSHMEM 3.3.24 for [PyTorch CUDA13 Binary Cannot Be Built with SM_75 with NVSHMEM](https://github.com/pytorch/pytorch/issues/160980)
Enabled back sm_75 for NVSHMEM
Fixed the NVSHMEM download link for the issue with 3.3.20 download in issue - [[CD] nvshem-3.3.9 wheels for aarch64 is not manylinux2_28 compliant](https://github.com/pytorch/pytorch/issues/160425)

Todo: Should also enable back build ARM with NVSHMEM since it is compatible with manylinux2_28

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161321
Approved by: https://github.com/Skylion007, https://github.com/atalman
2025-08-26 13:26:18 +00:00
becd6cd744 Increase timeout value when pushing to ghcr.io (#161444)
Seeing this timing out a lots in trunk now https://github.com/pytorch/pytorch/actions/runs/17165552358/job/48705069047.  The benchmark image is the largest one we have on CI, so it's probably over the 30 minutes limit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161444
Approved by: https://github.com/atalman
2025-08-26 01:51:16 +00:00
908b0ccb1f Revert "Increase timeout value when pushing to ghcr.io (#161444)"
This reverts commit b9e9e92817fd7d1a778f074105603efb07e05004.

Reverted https://github.com/pytorch/pytorch/pull/161444 on behalf of https://github.com/huydhn due to Reland this to generate a different has value for the benchmark Docker image ([comment](https://github.com/pytorch/pytorch/pull/161444#issuecomment-3222257119))
2025-08-26 01:41:59 +00:00
660b0b8128 Update pybind11 submodule to 3.0.1 (#160754)
Upgrade to PyBind11 v3. This allows us to strip out our own (possibly broken?) handling of the C++ ABI when building extensions, in favor of the more-complete PyBind11 internal handling.

Fixes a few test failures due to https://github.com/pybind/pybind11/issues/5774, which effectively makes the `__qualname__` attribute of functions platform-dependent.

Test plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160754
Approved by: https://github.com/Skylion007
2025-08-26 01:21:18 +00:00