Compare commits

94 Commits

Author SHA1 Message Date
9220409522 Remove unused test code
ghstack-source-id: 8d6fad8d8f59a12a1711649cdd4558f23025a45c
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160823
2025-08-16 11:23:52 -07:00
2603e40be5 [inductor] TLParse tensor metadata logging + test (#160132)
Summary:
- Add TLParse artifact logging per op with output tensor shape, stride, and dtype for cross-rank aggregation.
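
For context, a minimal sketch of what per-op metadata logging of this kind can look like with `torch._logging.trace_structured` (the artifact name and payload layout below are illustrative assumptions, not necessarily what this PR emits):
```python
import json
import torch
from torch._logging import trace_structured

def log_output_metadata(op_name: str, out: torch.Tensor) -> None:
    # Emit a structured tlparse artifact describing one op's output tensor.
    trace_structured(
        "artifact",
        metadata_fn=lambda: {"name": "op_tensor_metadata", "encoding": "json"},
        payload_fn=lambda: json.dumps(
            {
                "op": op_name,
                "shape": list(out.shape),
                "stride": list(out.stride()),
                "dtype": str(out.dtype),
            }
        ),
    )
```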

Testing:
- Add test to verify structure and contents of the tlparse artifact

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160132
Approved by: https://github.com/xmfan
ghstack dependencies: #160260
2025-08-16 16:37:18 +00:00
8fe4b3f848 [BE][CI] move MYPYSTRICT linter from lintrunner-noclang to lintrunner-mypy (#160806)
Like `MYPY`, linter `MYPYSTRICT` will need `--all-files` too.

See also:

- https://github.com/pytorch/pytorch/pull/160652#issuecomment-3193390813

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160806
Approved by: https://github.com/seemethere
2025-08-16 16:15:22 +00:00
cff6def7f4 [MTIA] add correct name for CFF in tlparse (#160599)
Differential Revision: D80201622

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160599
Approved by: https://github.com/bdhirsh
2025-08-16 14:58:03 +00:00
e444cd24d4 Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. (#159197)
This might cause some new DDEs on call sites that do not use is_contiguous_or_false() or sym_is_contiguous(),
but we want to find those call sites and handle them properly, by explicitly calling is_contiguous_or_false() rather than is_contiguous() when appropriate.
I had to fix one issue after removing the implicit size-oblivious reasoning; here is the context.

In https://github.com/pytorch/pytorch/pull/157472 we defined sym_is_contiguous as the function that computes contiguity for dynamic shapes in C++. It returns a symbolic expression representing contiguity and is guaranteed not to throw a DDE.

When people call is_contiguous, we do sym_is_contiguous().guard_bool().
When people call is_contiguous_or_false, we do sym_is_contiguous().guard_or_false().
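
As a rough illustration of the difference, here is a minimal sketch using the Python helpers in torch.fx.experimental.symbolic_shapes (the C++ entry points in this PR differ, but the guard semantics are the same):
```python
from torch.fx.experimental.symbolic_shapes import ShapeEnv, guard_or_false

shape_env = ShapeEnv()
u0 = shape_env.create_unbacked_symint()  # a data-dependent ("unbacked") size

cond = u0 == 1                # a SymBool with no hint available
print(guard_or_false(cond))   # False: falls back instead of guarding
# bool(cond) would raise a data-dependent error (DDE) here, which is the
# failure mode is_contiguous() can hit and is_contiguous_or_false() avoids.
```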

One path that was not handled well was this one:
```
c10::SymBool TensorImpl::sym_is_contiguous_custom(
    at::MemoryFormat memory_format) const {
  if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) {
    return pyobj_slot_.load_pyobj_interpreter()->is_contiguous(
        this, memory_format);
  }

  return sym_is_contiguous_default(memory_format);
}
```
Namely, if we call sym_is_contiguous_custom and matches_python_custom(SizesStridesPolicy::CustomStrides) returns true, then we used to call is_contiguous(this, memory_format).

This went through load_pyobj_interpreter and ended up calling the Python is_contiguous, which used implicit size-oblivious reasoning.
Once we removed that implicit size-oblivious reasoning, the right thing is to call
return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format);
otherwise we would get a DDE even if the caller is using sym_is_contiguous.

So I had to define it for the pyinterpreter, and then override it for nested tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159197
Approved by: https://github.com/ezyang
2025-08-16 09:15:58 +00:00
a84541c73f Update transformers version automatically with Dependabot (#160635)
My proposal here is to use GitHub Dependabot to make sure that the `transformers` version used in CI is always up-to-date. To achieve this, this PR does 2 things:

1. Pin the `transformers` version across all CI jobs in a single place, `.ci/docker/ci_commit_pins/huggingface.txt`.  This file is now a regular pip requirements file instead of a pinned-commit text file.  There isn't any need to pin `transformers` to a specific commit, and the file already refers to a stable version, `v4.54.0`
2. Create `.github/dependabot.yml` to configure the bot to update `transformers` automatically when there is a new version.  The configured labels will ensure that the right reviewers from torch.compile and Dev Infra are notified.  I'm not sure how to test this out in a PR, but it feels OK to land and test this in main.  If this works, we should see a PR updating `v4.54.0` to the current latest, `v4.55.0`

### Reference
https://docs.github.com/en/code-security/dependabot/working-with-dependabot/dependabot-options-reference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160635
Approved by: https://github.com/ZainRizvi
2025-08-16 05:53:39 +00:00
114813ca77 Fix mypy errors: PyTreeSpec inheritance (#160652)
Fixes #160650.

I added a type-ignore comment to the `LeafSpec` class inheritance in `torch/utils/_cxx_pytree.py` to handle `PyTreeSpec` being marked as final in optree's type stubs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160652
Approved by: https://github.com/Skylion007
2025-08-16 05:14:11 +00:00
11b6ceb7b4 [ONNX] Default to dynamo export (#159646)
Set dynamo=True and enable fallback.

1. Implemented the compatible behavior where BytesIO objects are accepted as `f`
2. Updated tests to explicitly set dynamo=False
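
A small sketch of the compatibility behavior in item 1, assuming a trivial model and the ONNX export dependencies installed (illustrative usage, not from the PR):
```python
import io
import torch

model = torch.nn.Linear(4, 2)
buf = io.BytesIO()
torch.onnx.export(model, (torch.randn(1, 4),), buf, dynamo=True)
print(f"wrote {len(buf.getvalue())} bytes of ONNX")
```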

#151693

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159646
Approved by: https://github.com/titaiwangms
2025-08-16 04:48:58 +00:00
fb7e60ba7a [Dynamo][Hierarchical Compile] Flatten tuple outputs in graph dedupe pass (#158811)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158811
Approved by: https://github.com/anijain2305
ghstack dependencies: #158810
2025-08-16 04:45:31 +00:00
f89186e910 [audio hash update] update the pinned audio hash (#160797)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160797
Approved by: https://github.com/pytorchbot
2025-08-16 04:26:59 +00:00
10eb83734f [vllm hash update] update the pinned vllm hash (#160699)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160699
Approved by: https://github.com/pytorchbot
2025-08-16 04:26:55 +00:00
75ea93484c [vllm test] add vllm.yml and additional package (#160698)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160698
Approved by: https://github.com/huydhn
ghstack dependencies: #160116
2025-08-16 04:24:20 +00:00
45c2c7a5fc Fix the wrong dataclasses_json monitoring dep in MacOS test (#160796)
This was a typo; the dep should be `dataclasses_json`: https://github.com/pytorch/pytorch/actions/runs/17000197828/job/48200676725#step:10:23
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160796
Approved by: https://github.com/yangw-dev
2025-08-16 04:00:31 +00:00
b74c7cd335 Add kernel stack traces tlparse dump (#160608) (#160779)
Summary:

as title

This is requested by the Zoomer team so they can add stack trace information to the profiler result.

Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing -- -r  stack_traces
```

Differential Revision: D80050233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160779
Approved by: https://github.com/angelayi
2025-08-16 03:12:38 +00:00
b7ca502f29 [ROCm][Windows] Add hipcc compatibility flags to cpp_extension.py. (#159790)
This is a similar change to https://github.com/pytorch/pytorch/pull/153986, this time adding flags to the hipcc command under `cpp_extension.py`.

The `-Wno-ignored-attributes` flag in particular avoids about 200MB of warning spam when building torchvision, like these:
```
In file included from D:\b\vision_main\torchvision\csrc\ops\hip\deform_conv2d_kernel.hip:72:
In file included from D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/ATen.h:13:
In file included from D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/Functions.h:386:
In file included from D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/ops/_sparse_softmax.h:21:
D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\ATen/ops/_sparse_softmax_ops.h:18:8: warning: __declspec attribute 'dllimport' is not supported [-Wignored-attributes]
   18 | struct TORCH_API _sparse_softmax_int {
      |        ^~~~~~~~~
D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\torch/headeronly/macros/Export.h💯19: note: expanded from macro 'TORCH_API'
  100 | #define TORCH_API C10_IMPORT
      |                   ^~~~~~~~~~
D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\torch\include\torch/headeronly/macros/Export.h:53:31: note: expanded from macro 'C10_IMPORT'
   53 | #define C10_IMPORT __declspec(dllimport)
      |                               ^~~~~~~~~
```

The `-fms-extensions` flag just seems beneficial to include: https://clang.llvm.org/docs/MSVCCompatibility.html.

See also this downstream issue where these changes were tested: https://github.com/ROCm/TheRock/issues/910.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159790
Approved by: https://github.com/jeffdaily
2025-08-16 02:20:49 +00:00
7bd4cfaef4 [BE] Update nvshmem dependency to 3.3.20 (#160458)
Which is manylinux2_28 compatible, even on the aarch64 platform.

Archive contents and the URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works.
Packages `libnvshmem_host.so.3` into the gigantic aarch64+CUDA wheel.
Should fix https://github.com/pytorch/pytorch/issues/160425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458
Approved by: https://github.com/Skylion007, https://github.com/kwen2501, https://github.com/nWEIdia, https://github.com/atalman, https://github.com/tinglvv
2025-08-16 02:00:57 +00:00
c015e53d37 Revert "[BE] Update nvshem dependency to 3.3.20 (#160458)"
This reverts commit e0488d9f00865fb56c931580c80e099771c6285e.

Reverted https://github.com/pytorch/pytorch/pull/160458 on behalf of https://github.com/wdvr due to need to rerun workflow generation (failing workflow-checks) ([comment](https://github.com/pytorch/pytorch/pull/160458#issuecomment-3193133706))
2025-08-16 01:47:42 +00:00
65dc4df74d unify broadcast_shapes functions and avoid duplicates (#160251)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160251
Approved by: https://github.com/jingsh, https://github.com/ColinPeppler
ghstack dependencies: #160250
2025-08-16 00:54:32 +00:00
c03809e8a5 guard_or_false cat ops (#160250)
Keep existing unbacked semantics unchanged; just use guard_or_false instead of guard_size_oblivious.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160250
Approved by: https://github.com/ColinPeppler, https://github.com/jingsh
2025-08-16 00:54:31 +00:00
e0488d9f00 [BE] Update nvshmem dependency to 3.3.20 (#160458)
Which is manylinux2_28 compatible, even on the aarch64 platform.

Archive contents and the URL pattern changed quite drastically between 3.3.9 and 3.3.20, but hopefully it still works.
Packages `libnvshmem_host.so.3` into the gigantic aarch64+CUDA wheel.
Should fix https://github.com/pytorch/pytorch/issues/160425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160458
Approved by: https://github.com/Skylion007, https://github.com/kwen2501, https://github.com/nWEIdia, https://github.com/atalman, https://github.com/tinglvv
2025-08-16 00:50:13 +00:00
f782c790df migrate more simple gso checks (#160253)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160253
Approved by: https://github.com/bobrenjc93
2025-08-16 00:15:24 +00:00
16ce2c15fa Add python 3.14 support to linux aarch64 builds (#160788)
Related to https://github.com/pytorch/pytorch/issues/156856
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160788
Approved by: https://github.com/malfet
2025-08-16 00:03:21 +00:00
0d28d12b11 Fix typo packing libnvshmem into libtorch (#160778)
Fix typo after https://github.com/pytorch/pytorch/pull/160465
Fixes: https://github.com/pytorch/pytorch/issues/160762

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160778
Approved by: https://github.com/Camyll, https://github.com/malfet, https://github.com/ZainRizvi, https://github.com/Skylion007
2025-08-15 23:43:02 +00:00
838f22c57d Do not incorrectly chain each of the strings as iterables (#160709)
Signed-off-by: Edward Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160709
Approved by: https://github.com/Skylion007, https://github.com/fduwjj
2025-08-15 23:22:24 +00:00
387fe847ab [cuDNN][SDPA] Introduce TORCH_CUDNN_SDPA_AVOID_RECOMPILE=1 (#155958)
Opt-in for now, but basically uses the variable-sequence length/ragged path for the common case of BSHD layout to avoid recompiling for different sequence lengths.

Built on top of #149282

Tested using a primitive fuzzer; it seems at least as stable as the default path (with recompilation) on B200 (50000+ cases tested without any failures)
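
A hedged usage sketch (requires a CUDA build with cuDNN; the env var should be set before the first SDPA call):
```python
import os
os.environ["TORCH_CUDNN_SDPA_AVOID_RECOMPILE"] = "1"  # opt in (this PR)

import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# (batch, heads, seq_len, head_dim) as F.scaled_dot_product_attention expects
q, k, v = (torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
           for _ in range(3))
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```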

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155958
Approved by: https://github.com/drisspg
2025-08-15 21:59:18 +00:00
40311e2ec1 [AOTInductor] ABI-Compatibility for RecordFunction. (#159842)
Summary:
Previously, our implementation for RecordFunction injected ATen into
codegen, which breaks the ABI contract for AOTInductor.

c10::IValue is added to call the full record function. The extension to
more profiling info will come in later PRs.

Test Plan:
Included in commit.

Differential Revision: [D79622071](https://our.internmc.facebook.com/intern/diff/D79622071)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159842
Approved by: https://github.com/desertfire
2025-08-15 21:45:47 +00:00
8ca8b6053c [inductor][while_loop][be] improve the readability of output handling (#160374)
The logic doesn't change, but this makes it easier to read and modify.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160374
Approved by: https://github.com/zou3519
ghstack dependencies: #160548
2025-08-15 20:13:12 +00:00
ff86509a06 [map] filter none gradients and add autograd inductor tests (#160548)
Will filter the None outputs in autograd backward for other HOPs as follow-ups

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160548
Approved by: https://github.com/zou3519
2025-08-15 20:13:12 +00:00
fa75ba9303 Change IR node's stack traces to return a set of stack traces only (#160701)
Summary: There can be excessive stack trace output in TORCH_LOGS="+inductor" when a single line of code corresponds to many post grad nodes, e.g. `self.multihead_attn(x, x, x)`. In that case, we'll see the same stack trace many times in the IR node, spamming the output log, so we change the IR node to return a set of stack traces.

Test Plan:
CI

Differential Revision: D80310549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160701
Approved by: https://github.com/angelayi
2025-08-15 19:31:59 +00:00
b78968b4d1 Support next(iterator, default) (#159483)
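For context, a tiny example of the pattern this enables under torch.compile (illustrative usage; fullgraph=True asserts there is no graph break on `next`):
```python
import torch

@torch.compile(fullgraph=True)
def first_or_zeros(xs, like):
    # next() with a default: returns the first element, or the default
    # when the iterator is exhausted.
    return next(iter(xs), torch.zeros_like(like))

x = torch.randn(3)
print(first_or_zeros([x + 1], x))  # x + 1
print(first_or_zeros([], x))       # zeros
```
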
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159483
Approved by: https://github.com/mlazos
ghstack dependencies: #159365, #159366, #159368
2025-08-15 19:08:21 +00:00
e5621b4d8b Fixes for collections.Counter (#159368)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159368
Approved by: https://github.com/mlazos
ghstack dependencies: #159365, #159366
2025-08-15 19:08:21 +00:00
2542e71f3f Change mutation type of MutableMappingVariable to AttributeMutationNew (#159366)
Also add MutableMappingVariable to `call_or_` / `call_ior`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159366
Approved by: https://github.com/zou3519
ghstack dependencies: #159365
2025-08-15 19:08:21 +00:00
0242d40fa5 Enable trace through the collections module (#159365)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159365
Approved by: https://github.com/zou3519
2025-08-15 19:08:21 +00:00
17de899709 Add py3.14 to macos arm64 (#160593)
Related to https://github.com/pytorch/pytorch/issues/156856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160593
Approved by: https://github.com/malfet, https://github.com/Skylion007
2025-08-15 18:52:10 +00:00
25d0d8b0a3 [inductor] Fix propagating torch.utils._sympy.functions.Identity in IndexPropagation (#155504)
Fixes https://github.com/pytorch/pytorch/issues/160535

The index may contain `torch.utils._sympy.functions.Identity`. When we call `SymPyOps.index_expr`, if the value is a sympy.Expr containing Identity, `TypedExpr(value, dtype)` will fail. So when we unwrap arguments, we expand the sympy expression to unwrap Identity.
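
A rough sketch of the unwrapping idea, assuming (per the description above) that sympy expansion removes `Identity`; the construction below is illustrative:
```python
import sympy
from torch.utils._sympy.functions import Identity

s0 = sympy.Symbol("s0")
expr = Identity(s0 + 1) * 2   # an index expression wrapped in Identity
print(sympy.expand(expr))     # Identity unwrapped, e.g. 2*s0 + 2
```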

Test Plan:
buck run @mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r test_sym_expr_indexing

Differential Revision: D76308640

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155504
Approved by: https://github.com/eellison
2025-08-15 18:38:23 +00:00
c6d697ff52 port 2 distributed pipeline test files for Intel GPU (#159140)
This is another PR to port distributed pipeline tests for Intel GPU; the other PR is https://github.com/pytorch/pytorch/pull/159033.
In this PR, we port two test files for Intel GPU.
We enable Intel GPU with the following methods, trying our best to keep the original code style:

1. instantiate_device_type_tests()
2. skip the case on XPU due to an accuracy gap introduced by oneDNN non-determinism

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159140
Approved by: https://github.com/guangyey, https://github.com/d4l3k, https://github.com/H-Huang
2025-08-15 18:29:50 +00:00
30d2f98daa Revert "[cutlass backend] re-add pip cutlass path (#160180)"
This reverts commit d556586448f3caab85673c7da0978fe31c7748f7.

Reverted https://github.com/pytorch/pytorch/pull/160180 on behalf of https://github.com/atalman due to broke macos nightly ([comment](https://github.com/pytorch/pytorch/pull/160180#issuecomment-3192311552))
2025-08-15 18:00:41 +00:00
8780d28c65 raise exception in case of errors in memory reordering (#160455)
This PR introduces two checks in the memory reordering pass to catch graph issues before performing the reordering. For situations not covered by these checks, the reordering pass might fail, and an exception will be thrown in that case.

This addresses https://github.com/pytorch/pytorch/issues/159568

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160455
Approved by: https://github.com/eellison
2025-08-15 17:31:55 +00:00
da8f48d88f [associative_scan] support gen_schema for associative_scan (#158883)
In-place mutation may create inter-loop dependencies that break the parallelism we rely on for associative_scan, so we ban input mutations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158883
Approved by: https://github.com/zou3519
ghstack dependencies: #154193, #158965, #158863, #158864
2025-08-15 17:28:44 +00:00
cb9e2092a8 [scan] support gen_schema for scan (#158864)
We don't want to allow scan's combine_fn to mutate its inputs. The semantics of the mutation can be confusing. For example:
```python
def combine_fn(init, x):
    init.add_(x)  # illustrative in-place mutation of the carry
    return init
```
If combine_fn mutates init, only the first iteration actually mutates init; the rest of the iterations mutate the previous carry, which is an intermediate result. This is a weird semantic because the only observable mutation is of init, which could be done outside of combine_fn.

If combine_fn mutates x, where x is a slice of the scanned inputs (i.e. xs), the pattern is more meaningful, but we've not seen any use case for it yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158864
Approved by: https://github.com/zou3519
ghstack dependencies: #154193, #158965, #158863
2025-08-15 17:28:44 +00:00
f6bf1573fc [while_loop] support gen_schema for while_loop (#158863)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158863
Approved by: https://github.com/zou3519
ghstack dependencies: #154193, #158965
2025-08-15 17:28:34 +00:00
82a18423be [BE] create an empty shape_env for check_input_alias_and_mutation_return_outputs (#158965)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158965
Approved by: https://github.com/zou3519
ghstack dependencies: #154193
2025-08-15 17:28:20 +00:00
3fe3c23d4e [cond] support gen_schema for cond (#154193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154193
Approved by: https://github.com/zou3519
2025-08-15 17:28:13 +00:00
052c441cf4 Add logging for when inbuilt_inline_nn_modules will help with ID_MATCH guard triggered recompiles (#160592)
We add logging for when an ID_MATCH guard is added at a place where inbuilt_inline_nn_modules would inline it. This is done with the aim of tagging recompiles that could be avoided by setting the inbuilt_inline_nn_modules flag.
It will help us log and track the flag's adoption and potentially quantify the savings in the number of recompiles.
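
In recent PyTorch this corresponds to the `torch._dynamo.config.inline_inbuilt_nn_modules` flag; a minimal sketch of enabling it:
```python
import torch
import torch._dynamo.config as dynamo_config

# Inlining nn.Module instances avoids the ID_MATCH guards (and the
# recompiles) that this new logging is meant to tag.
dynamo_config.inline_inbuilt_nn_modules = True

@torch.compile
def forward(mod, x):
    return mod(x)

print(forward(torch.nn.Linear(4, 4), torch.randn(2, 4)))
```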

Differential Revision: D80075975

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160592
Approved by: https://github.com/anijain2305
2025-08-15 17:09:39 +00:00
b26d2a9464 [ez] Make NUMA signpost parameters JSON serializable (#160710)
# Context
Broader context in #160163.

In order for the _utils_internal version of signpost_event to do proper logging, its parameters argument needs to be json serializable.

# This PR
Convert `NumaOptions` to serializable form before inputting to `signpost_event`.

# Test Plan
## Automated
Added tests `$ pytest test/test_numa_binding.py`.

## Manual
See [D80317206](https://www.internalfb.com/diff/D80317206).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160710
Approved by: https://github.com/kiukchung
2025-08-15 16:52:43 +00:00
6382302990 [MPS] Add grid_sampler_3d for MPS (#160541)
This PR adds support for `grid_sampler_3d` for MPS with "bilinear" interpolation.

NOTE: "nearest" interpolation is not yet supported

Fixes #159882
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160541
Approved by: https://github.com/malfet
2025-08-15 16:19:25 +00:00
80dd05e31e Disable flaky cpp test RecordDebugHandles.Basic (#160577)
Test is flaky and sometimes hangs in CI

Here's an example of the failure:
https://github.com/pytorch/pytorch/actions/runs/16946153494/job/48027937663
```

2025-08-13T20:54:00.1223688Z ==================================== RERUNS ====================================
2025-08-13T20:54:00.1224156Z ___________________________ RecordDebugHandles.Basic ___________________________
2025-08-13T20:54:00.1224682Z [gw2] linux -- Python 3.13.5 /opt/conda/envs/py_3.13/bin/python3.13
2025-08-13T20:54:00.1225568Z Internal Error: calling /opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit for test RecordDebugHandles.Basic failed (returncode=-6):
2025-08-13T20:54:00.1226430Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1226988Z Note: Google Test filter = RecordDebugHandles.Basic-*_CUDA:*_MultiCUDA
2025-08-13T20:54:00.1227450Z [==========] Running 1 test from 1 test suite.
2025-08-13T20:54:00.1227792Z [----------] Global test environment set-up.
2025-08-13T20:54:00.1228145Z [----------] 1 test from RecordDebugHandles
2025-08-13T20:54:00.1228492Z [ RUN      ] RecordDebugHandles.Basic
2025-08-13T20:54:00.1228822Z [       OK ] RecordDebugHandles.Basic (1 ms)
2025-08-13T20:54:00.1229204Z [----------] 1 test from RecordDebugHandles (1 ms total)
2025-08-13T20:54:00.1229501Z
2025-08-13T20:54:00.1229666Z [----------] Global test environment tear-down
2025-08-13T20:54:00.1230033Z [==========] 1 test from 1 test suite ran. (1 ms total)
2025-08-13T20:54:00.1230355Z [  PASSED  ] 1 test.
2025-08-13T20:54:00.1230727Z terminate called after throwing an instance of 'std::system_error'
2025-08-13T20:54:00.1231154Z   what():  Invalid argument
2025-08-13T20:54:00.1231416Z unknown file:0: C++ failure
2025-08-13T20:54:00.1231788Z ------------------------------ Captured c++ call -------------------------------
2025-08-13T20:54:00.1232262Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1232745Z Note: Google Test filter = RecordDebugHandles.Basic-*_CUDA:*_MultiCUDA
2025-08-13T20:54:00.1233199Z [==========] Running 1 test from 1 test suite.
2025-08-13T20:54:00.1233557Z [----------] Global test environment set-up.
2025-08-13T20:54:00.1233915Z [----------] 1 test from RecordDebugHandles
2025-08-13T20:54:00.1234247Z [ RUN      ] RecordDebugHandles.Basic
2025-08-13T20:54:00.1234590Z [       OK ] RecordDebugHandles.Basic (1 ms)
2025-08-13T20:54:00.1235020Z [----------] 1 test from RecordDebugHandles (1 ms total)
2025-08-13T20:54:00.1235304Z
2025-08-13T20:54:00.1235431Z [----------] Global test environment tear-down
2025-08-13T20:54:00.1235793Z [==========] 1 test from 1 test suite ran. (1 ms total)
2025-08-13T20:54:00.1236126Z [  PASSED  ] 1 test.
2025-08-13T20:54:00.1236481Z terminate called after throwing an instance of 'std::system_error'
2025-08-13T20:54:00.1236906Z   what():  Invalid argument
2025-08-13T20:54:00.1237287Z ___________________________ RecordDebugHandles.Basic ___________________________
2025-08-13T20:54:00.1237800Z [gw2] linux -- Python 3.13.5 /opt/conda/envs/py_3.13/bin/python3.13
2025-08-13T20:54:00.1238686Z Internal Error: calling /opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit for test RecordDebugHandles.Basic failed (returncode=-6):
2025-08-13T20:54:00.1239551Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1240048Z Note: Google Test filter = RecordDebugHandles.Basic-*_CUDA:*_MultiCUDA
2025-08-13T20:54:00.1240495Z [==========] Running 1 test from 1 test suite.
2025-08-13T20:54:00.1240848Z [----------] Global test environment set-up.
2025-08-13T20:54:00.1241199Z [----------] 1 test from RecordDebugHandles
2025-08-13T20:54:00.1241542Z [ RUN      ] RecordDebugHandles.Basic
2025-08-13T20:54:00.1241871Z [       OK ] RecordDebugHandles.Basic (1 ms)
2025-08-13T20:54:00.1242249Z [----------] 1 test from RecordDebugHandles (1 ms total)
2025-08-13T20:54:00.1242503Z
2025-08-13T20:54:00.1242641Z [----------] Global test environment tear-down
2025-08-13T20:54:00.1242993Z [==========] 1 test from 1 test suite ran. (19 ms total)
2025-08-13T20:54:00.1243329Z [  PASSED  ] 1 test.
2025-08-13T20:54:00.1243697Z terminate called after throwing an instance of 'std::system_error'
2025-08-13T20:54:00.1244113Z   what():  Invalid argument
2025-08-13T20:54:00.1244392Z unknown file:0: C++ failure
2025-08-13T20:54:00.1244759Z ------------------------------ Captured c++ call -------------------------------
2025-08-13T20:54:00.1245235Z CUDA not available. Disabling CUDA and MultiCUDA tests
2025-08-13T20:54:00.1283768Z ============== 1 failed, 568 passed, 2 rerun in 115.57s (0:01:55) ==============
```

Here's an example of the hang:
https://github.com/pytorch/pytorch/actions/runs/16942186826/job/48015238944
Logs aren't super helpful other than stating that it took a long time.  Usually this file takes <2min to run
```
2025-08-13T18:43:24.6586481Z [gw0] [ 97%] PASSED [1.4119s] ../../../../../opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit::PyTorch/LiteInterpreterDynamicTypeTestFixture::Conformance/8
2025-08-13T18:43:24.6587278Z [gw1] [ 97%] PASSED [1.4866s] ../../../../../opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/bin/test_jit::PyTorch/LiteInterpreterDynamicTypeTestFixture::Conformance/9 Command took >30min, returning 124
2025-08-13T18:43:24.6587288Z
2025-08-13T18:43:24.6587632Z FINISHED PRINTING LOG FILE of cpp/test_jit 1/1 (test/test-reports/cpp.test_jit_1.1_c259e5a152845991_.log)
2025-08-13T18:43:24.6587639Z
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160577
Approved by: https://github.com/huydhn
2025-08-15 15:59:21 +00:00
9df07ecfbe Revert "[inductor] dont reuse buffers if it affects peak (#145883) (#159530)"
This reverts commit 3be70dc30e893b552fc0f23ca06cd8f7949b6d08.

Reverted https://github.com/pytorch/pytorch/pull/159530 on behalf of https://github.com/clee2000 due to a newly added test failing internally (D80316528). Probably just a targets change, but also IMO the tests should go into a testcase class from common or inductor utils; while I'm pretty sure CI can run the globally defined ones, there's some CI-related functionality on the testcase class that CI benefits from ([comment](https://github.com/pytorch/pytorch/pull/159530#issuecomment-3191947506))
2025-08-15 15:49:04 +00:00
846963fa9b Revert "[Inductor] addmm + activation function fusion (#158137)"
This reverts commit b9d7de3a094598c3dc0dd52e57bce30eb684c9d8.

Reverted https://github.com/pytorch/pytorch/pull/158137 on behalf of https://github.com/malfet due to Broke inductor torchbench, see 663da17b62/1 ([comment](https://github.com/pytorch/pytorch/pull/158137#issuecomment-3191841298))
2025-08-15 15:34:09 +00:00
663da17b62 Update torch-xpu-ops commit pin (#160062)
Update the torch-xpu-ops commit to [77cc792cd265179745d335579d233e6d4f9a2667](77cc792cd2), which includes:

- Ensures that the XPU cache is cleared before creating tensors during the test
- Add unused variable warning
- Fix test_linalg and test_torch issue with bf32_on_and_off updates
- Fix deterministic indexing with broadcast
- Fix dist.gather with noncontiguous tensor
- Improve accuracy of index put deterministic kernel
- Add a generated-file dependency to avoid building before generation
- Optimize embedding bag

Fixes #160661

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160062
Approved by: https://github.com/EikanWang
2025-08-15 15:27:24 +00:00
e299926f72 [ONNX] Fix doc typo for symbolic_multi_out (#160702)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160702
Approved by: https://github.com/justinchuby
2025-08-15 14:34:42 +00:00
bbd11c4f23 Uninstall torchao on MPS benchmark (#160724)
Fixes https://github.com/pytorch/pytorch/issues/160689

The current torchao 0.12.0 doesn't work with transformers 4.54.0 and ends up with this error:

```
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/transformers/models/albert/modeling_albert.py", line 37, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/transformers/modeling_utils.py", line 51, in <module>
    from torchao.quantization import Int4WeightOnlyConfig
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/__init__.py", line 41, in <module>
    from torchao.quantization import (
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/quantization/__init__.py", line 6, in <module>
    from .autoquant import (
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/quantization/autoquant.py", line 11, in <module>
    from torchao.dtypes import (
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/dtypes/__init__.py", line 1, in <module>
    from . import affine_quantized_tensor_ops
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/dtypes/affine_quantized_tensor_ops.py", line 38, in <module>
    from torchao.dtypes.uintx.dyn_int8_act_int4_wei_cpu_layout import (
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/dtypes/uintx/__init__.py", line 7, in <module>
    from .dyn_int8_act_int4_wei_cpu_layout import (
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/dtypes/uintx/dyn_int8_act_int4_wei_cpu_layout.py", line 320, in <module>
    from ...prototype.inductor.fx_passes import register_da8w4_concat_linear_cpu_pass
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/prototype/inductor/fx_passes/__init__.py", line 2, in <module>
    from .int8_sdpa_fusion import _int8_sdpa_init
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/prototype/inductor/fx_passes/int8_sdpa_fusion.py", line 22, in <module>
    from ..int8_sdpa_lowering import register_int8_sdpa  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ec2-user/runner/_work/_temp/venv-3.12-1755212960/lib/python3.12/site-packages/torchao/prototype/inductor/int8_sdpa_lowering.py", line 6, in <module>
    from torch._inductor.kernel.flex_attention import construct_strides, maybe_realize
ModuleNotFoundError: No module named 'torch._inductor.kernel.flex_attention'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160724
Approved by: https://github.com/malfet
2025-08-15 13:55:39 +00:00
eaa5d9d3d3 Introduce OpInfo test for testing export on fake device (#160694)
Summary: Prepare for the upcoming diffs for exporting on fake cuda device.

Test Plan:
test

Differential Revision: D80304225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160694
Approved by: https://github.com/dolpm
2025-08-15 07:26:28 +00:00
a7c75ae976 [dde] use sym_or when checking normalized shape in layer_norm (#160683)
Use `sym_eq` to check equality on tuples of ints/SymInts

### DDE
```
torch._dynamo.exc.UserError: Could not guard on data-dependent expression Eq(u0, u1) (unhinted: Eq(u0, u1)).  (Size-like symbols: u1, u0)

Caused by: return torch.nn.functional.layer_norm(  # test/inductor/test_unbacked_symints.py:527 in fn (_refs/__init__.py:3292 in native_layer_norm)
```
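
A minimal sketch of the kind of check this switches to, assuming `sym_eq` from `torch.fx.experimental.symbolic_shapes` compares tuples elementwise and returns a (possibly symbolic) bool instead of guarding eagerly:
```python
import torch
from torch.fx.experimental.symbolic_shapes import sym_eq

normalized_shape = (4, 8)
input_tail = (4, 8)  # with SymInts, this comparison stays symbolic
torch._check(sym_eq(normalized_shape, input_tail))  # deferred check, no DDE
```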

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160683
Approved by: https://github.com/bobrenjc93
2025-08-15 06:56:00 +00:00
f7ad69f59c [dynamic shapes] handle Max(*,1) for inductor layout contiguity (#160578)
Differential Revision: D80214882

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160578
Approved by: https://github.com/ZixinYang, https://github.com/bobrenjc93
2025-08-15 06:10:18 +00:00
4cae9cf2df Update triton xpu commit to support python 3.14 (#160183)
Follow-up to PR #159725
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160183
Approved by: https://github.com/EikanWang, https://github.com/atalman
2025-08-15 05:41:17 +00:00
7710800865 [3/3][ghstack][vllm ci build setup]vllm build workflow (#160116)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160116
Approved by: https://github.com/huydhn
2025-08-15 05:35:46 +00:00
aa99e0958f Separate provenance tracking to different levels (#160383)
Summary: as title. We've received requests from various parties who are interested in turning on provenance tracking by default. In this PR, we prepare to turn on, by default, the parts of provenance tracking that don't have too much overhead.

- Change `provenance_tracking` config to `provenance_tracking_level`
- turn on the following provenance tracking by default when `basic_provenance_tracking`=True
    - `set_kernel_post_grad_provenance_tracing` for kernels; this adds a mapping between Triton kernels and post_grad nodes
    - `dump_inductor_provenance_info` if we're dumping the tlparse log
    - `get_graph_provenance_json` and dump `create_mapping_pre_post_grad_nodes`. This creates a mapping between pre_grad and post_grad nodes. Since we're not turning on provenance tracking in GraphTransformObserver by default, the mapping here may be incomplete/limited.
    - add stack trace from post grad nodes to inductor IR nodes
    - add exception swallowing for all functions above

Test Plan:
CI

Differential Revision: D80031559

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160383
Approved by: https://github.com/angelayi
2025-08-15 04:59:35 +00:00
3fc7a95176 [audio hash update] update the pinned audio hash (#160485)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160485
Approved by: https://github.com/pytorchbot
2025-08-15 04:27:49 +00:00
858fb80b9b [PT2]: Add Static Dispatch Kernel for wrapped_fbgemm_linear_fp16_weight (#160451)
Summary: Add static dispatch kernel for wrapped_fbgemm_linear_fp16_weight. This optimization should improve perf for all Ads DSNN models using Sigmoid.

Test Plan:
```
MODEL_TYPE=dpa_product_first_ctr_model
MODEL_ENTITY_ID=892669089
SNAPSHOT_ID=37
OTHER_MODEL_ENTITY_ID=892669089
OTHER_SNAPSHOT_ID=36

MODULES=(mix prepare_float_features object user)
SUFFIXES=(.predictor.local .predictor.precompute.prepare_float_features .predictor.precompute.remote_object_only .predictor.precompute.remote_request_only)

for i in "${!MODULES[@]}"; do
MODULE=${MODULES[i]}
SUFFIX=${SUFFIXES[i]}
buck2 run mode/opt caffe2/torch/fb/model_transform/fx2trt/packaging:load_net_predictor -- --loadMode=BenchmarkAB --inputNetFile=/data/users/$USER/models/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/${MODEL_ENTITY_ID}_${SNAPSHOT_ID}${SUFFIX} --otherNetFile=/data/users/$USER/models/${OTHER_MODEL_ENTITY_ID}/${OTHER_SNAPSHOT_ID}/${OTHER_MODEL_ENTITY_ID}_${OTHER_SNAPSHOT_ID}${SUFFIX} --moduleName=${MODULE} --submodToDevice "" --benchmarkDontRebatchSamples=true --doNotRandomizeSampleInputs=true
```

Before: P1900475429
I0810 19:29:22.782902 2717337 load_net_predictor_lib.cpp:1807] Average latency A: 0.0843 ms
I0810 19:29:22.782905 2717337 load_net_predictor_lib.cpp:1807] Average latency B: 0.0989 ms

After: P1900825771
I0811 15:42:34.866408 2311279 load_net_predictor_lib.cpp:1807] Average latency A: 0.0854 ms
I0811 15:42:34.866411 2311279 load_net_predictor_lib.cpp:1807] Average latency B: 0.092 ms

Still has some regression but the gap is smaller...

Reviewed By: henryoier, muchulee8

Differential Revision: D80042054

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160451
Approved by: https://github.com/henryoier
2025-08-15 04:06:17 +00:00
55061c9602 [PT2]: Add Static Dispatch Kernel for scale_gradient (#160454)
Summary: Add Static Dispatch Kernel for scale_gradient

Test Plan:
```
MODEL_TYPE=dpa_product_first_ctr_model
MODEL_ENTITY_ID=892669089
SNAPSHOT_ID=37
OTHER_MODEL_ENTITY_ID=892669089
OTHER_SNAPSHOT_ID=36

MODULES=(mix prepare_float_features object user)
SUFFIXES=(.predictor.local .predictor.precompute.prepare_float_features .predictor.precompute.remote_object_only .predictor.precompute.remote_request_only)

for i in "${!MODULES[@]}"; do
MODULE=${MODULES[i]}
SUFFIX=${SUFFIXES[i]}
buck2 run mode/opt caffe2/torch/fb/model_transform/fx2trt/packaging:load_net_predictor -- --loadMode=BenchmarkAB --inputNetFile=/data/users/$USER/models/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/${MODEL_ENTITY_ID}_${SNAPSHOT_ID}${SUFFIX} --otherNetFile=/data/users/$USER/models/${OTHER_MODEL_ENTITY_ID}/${OTHER_SNAPSHOT_ID}/${OTHER_MODEL_ENTITY_ID}_${OTHER_SNAPSHOT_ID}${SUFFIX} --moduleName=${MODULE} --submodToDevice "" --benchmarkDontRebatchSamples=true --doNotRandomizeSampleInputs=true
```

Reviewed By: henryoier

Differential Revision: D80062244

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160454
Approved by: https://github.com/henryoier
2025-08-15 03:42:39 +00:00
214d04833a [PT2]: Add Static Dispatch Kernel for fmod.Scalar (#160654)
Summary: Add static dispatch for torch.ops.aten.fmod.Scalar. Found this missing in user/object nets for DSNN models.

Test Plan:
```
MODEL_TYPE=dpa_product_first_ctr_model
MODEL_ENTITY_ID=892669089
SNAPSHOT_ID=36
MODULE=user
SUFFIX=.predictor.precompute.remote_request_only

buck2 run mode/opt caffe2/torch/fb/model_transform/fx2trt/packaging:load_net_predictor -- --loadMode=BenchmarkByOp --inputNetFile=/data/users/$USER/models/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/${MODEL_ENTITY_ID}_${SNAPSHOT_ID}${SUFFIX} --moduleName=${MODULE} --submodToDevice="" --benchmarkEnableProfiling=true --benchmarkDontRebatchSamples=true --doNotRandomizeSampleInputs=true --benchmarkNumIterations=1000
```

Object tower: P1904347784
User tower: P1904348406

Differential Revision: D80238495

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160654
Approved by: https://github.com/henryoier
2025-08-15 03:11:48 +00:00
9c5601ecc3 [NVIDIA] Refactor Family Blackwell Support codegen (#156176)
With the legacy driver (nvgpu) used for CUDA 12.9, Thor was operating with SM 10.1.
This changes to SM 11.0 when the newer driver model (OpenRM), which is intended for CUDA 13.0, is introduced.
Thor: SM 10.1 --> SM 11.0
Spark: SM 12.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156176
Approved by: https://github.com/ezyang
2025-08-15 02:51:26 +00:00
5b9ad951f8 [BE][Docker] Do not install cuda:11.8 (#160695)
As CUDA-11.8 binary are no longer produced by CD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160695
Approved by: https://github.com/huydhn
2025-08-15 02:23:04 +00:00
4d5f92aa39 typing tvm.py (#160369)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160369
Approved by: https://github.com/Skylion007
ghstack dependencies: #160362, #160363, #160364, #160365, #160366, #160367, #160368
2025-08-15 02:09:31 +00:00
39ca0ce0c8 Type backend torchxla (#160368)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160368
Approved by: https://github.com/Skylion007
ghstack dependencies: #160362, #160363, #160364, #160365, #160366, #160367
2025-08-15 02:09:31 +00:00
d52bb67ac3 typing registry.py (#160367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160367
Approved by: https://github.com/Skylion007
ghstack dependencies: #160362, #160363, #160364, #160365, #160366
2025-08-15 02:09:31 +00:00
05b9b63fb6 typing inductor and placeholder backends (#160366)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160366
Approved by: https://github.com/Skylion007
ghstack dependencies: #160362, #160363, #160364, #160365
2025-08-15 02:09:31 +00:00
453cfa5153 typing distributed.py (#160365)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160365
Approved by: https://github.com/StrongerXi
ghstack dependencies: #160362, #160363, #160364
2025-08-15 02:09:31 +00:00
9faca5f260 typing debugging.py (#160364)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160364
Approved by: https://github.com/Skylion007
ghstack dependencies: #160362, #160363
2025-08-15 02:09:31 +00:00
6fe6dd9fdc Type cudagraphs.py (#160363)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160363
Approved by: https://github.com/StrongerXi
ghstack dependencies: #160362
2025-08-15 02:09:31 +00:00
f82c7eed84 Typing for common.py (#160362)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160362
Approved by: https://github.com/Skylion007
2025-08-15 02:09:31 +00:00
25ccc4716e [Inductor] [Triton] Apply feedback to Enable padded stride support (#160614)
Summary:
This is an issue I noticed while fixing tests for TMA store: the triton.language.make_tensor_descriptor call hardcodes the shape information as the stride, which is not necessarily correct.

In particular, it's legal to have a stride bigger than the shape (e.g. padded to a size). A good example of this usage would be allocating a tensor whose row stride is always a multiple of 16 and just padding the result so TMA is legal.
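
A concrete illustration of such a padded layout in plain PyTorch (the padding-to-16 choice is just an example):
```python
import torch

storage = torch.empty(100 * 112)
t = storage.as_strided((100, 100), (112, 1))  # row stride 112 > row length 100
print(t.shape, t.stride())  # torch.Size([100, 100]) (112, 1)
```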

This is redo of https://github.com/pytorch/pytorch/pull/160493 because I broke this accidentally trying to land internally first instead of merging through Github directly.

Test Plan:
Tested with `buck2 run mode/opt-split-dwarf mode/inplace -c fbcode.nvcc_arch=h100 caffe2/test/inductor:max_autotune 2>&1 | tee ~/test_logs.log` and confirmed all max autotune tests passed.

Differential Revision: D80224578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160614
Approved by: https://github.com/eellison
2025-08-15 02:06:14 +00:00
d387a48c38 [generator] Raise StopIteration(value) with value from the return stmt (#157152)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157152
Approved by: https://github.com/zou3519
ghstack dependencies: #157148
2025-08-15 01:42:40 +00:00
831e85104a [contextlib] Fixes for CPython contextlib tests (#157148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157148
Approved by: https://github.com/zou3519
2025-08-15 01:42:40 +00:00
211c98859a [inductor][triton] Update triton_builtin handling after triton # 7239 (#160658)
https://github.com/triton-lang/triton/pull/7239 will search for a _semantic kwarg in the signature of the function before passing in this kwarg. To fix this in Inductor:

1. explicitly take a _semantic kwarg
2. remove the functools.wraps around the wrapper function, which was causing inspect.signature to return the signature of the wrapped function (instead of the signature of the wrapper, which does contain the _semantic arg)
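
A generic sketch of the fix (illustrative, not Inductor's actual code): the wrapper declares _semantic explicitly and deliberately avoids functools.wraps, so inspect.signature reports the wrapper's own signature and the _semantic kwarg is visible:
```python
import inspect

def make_wrapper(fn):
    def wrapper(*args, _semantic=None, **kwargs):  # explicit _semantic kwarg
        # No functools.wraps(fn) here: the signature must reflect the
        # wrapper itself, not the wrapped function.
        return fn(*args, **kwargs)
    return wrapper

w = make_wrapper(lambda x: x)
print("_semantic" in inspect.signature(w).parameters)  # True
```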

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160658
Approved by: https://github.com/PaulZhang12, https://github.com/njriasan
2025-08-15 00:39:24 +00:00
dae7710bf2 [cuda][cupy] Improve cupy device placement when device is provided with explicit index (#158529)
resubmit https://github.com/pytorch/pytorch/pull/158320 , fixing a potential bug when device index is not specified explicitly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158529
Approved by: https://github.com/ezyang
2025-08-15 00:27:42 +00:00
dc194a3096 Test multiprocessing spawn timing fix (#160672)
Submitting PR to fix #160511.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160672
Approved by: https://github.com/mikaylagawarecki
2025-08-15 00:11:55 +00:00
4051b42c29 [ROCm] hipify needs specific header mappings (#160675)
Fixes #160579.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160675
Approved by: https://github.com/ScottTodd, https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-08-15 00:09:04 +00:00
eb0eaa67e1 [BE][ci] Increase frequency of cutlass backend ci (#160656)
* increase frequency from every 24 hours to every 12 hours
* automatically enable it if cutlass backend files are touched.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160656
Approved by: https://github.com/eellison
2025-08-14 23:44:55 +00:00
98373e5ad2 [doc] AOTI debugging guide (#160430)
Folded from https://discuss.pytorch.org/t/a-beginners-guide-to-debugging-aot-inductor-cuda-illegal-memory-access/222188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160430
Approved by: https://github.com/angelayi
2025-08-14 23:42:17 +00:00
371eacb2ae [Dynamo][Hierarchical Compile] Refactor for tuple flattening (#158810)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158810
Approved by: https://github.com/StrongerXi
2025-08-14 22:45:44 +00:00
3650989e6e Revert "[cutlass] fix dictionary iteration error (#160552)"
This reverts commit 29d20d49f0b7f4e362e1cefdcdc4b5659969312c.

Reverted https://github.com/pytorch/pytorch/pull/160552 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/160552#issuecomment-3189940880))
2025-08-14 21:41:28 +00:00
3be70dc30e [inductor] dont reuse buffers if it affects peak (#145883) (#159530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159530
Approved by: https://github.com/eellison
2025-08-14 21:14:36 +00:00
47a1db823d [triton_heuristics] Optimize the triton launcher in pt2 (#160000)
Summary:

(Original author: Xu Zhao. Commandeered by David to land this since it is relatively urgent)

We observed ~10us PT2-Triton launch overhead regression after pin update.

Before Triton pin-update:
 {F1980557238}

After Triton pin-update:
 {F1980557240}

The root cause is that https://github.com/pytorch/pytorch/pull/145051 added `_get_args_with_constexprs` to the cubin launcher caller function, which is on the critical path.

The motivation for `_get_args_with_constexprs` was that between triton 3.2 and triton 3.3, the convention for calling Triton kernels (at the level that non-static-cuda-launcher inductor integrates) changed. Previously, the callable did not take constexpr arguments as parameters; after 3.3, it does. With pointwise/reduction kernels, we don't know the constexpr values until after autotuning occurs; so `_get_args_with_constexprs` would inject constexprs into the arguments list before calling the Triton kernel. The fix (in this PR) is to instead inject the constexpr args into the launcher string - this avoids the cost of sorting/reordering arguments which previously occurred upon execution of each kernel.

Note that static_cuda_launcher.py does not require constants to be passed to the cubin launcher (e96c7c4bb0/torch/_inductor/runtime/static_cuda_launcher.py (L220)), so there is no need to pass constexprs to the generated launcher code.
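
A simplified illustration of the idea (generic Python, not the actual Inductor codegen): the constexpr values are baked into the launcher source once at compile time, so nothing is sorted or injected per call:
```python
def make_launcher_src(runtime_args, constexprs):
    # e.g. {"BLOCK": 128} becomes a literal in the generated call site
    const_kwargs = ", ".join(f"{k}={v!r}" for k, v in constexprs.items())
    args = ", ".join(runtime_args)
    return (
        f"def launcher({args}, kernel):\n"
        f"    return kernel({args}, {const_kwargs})\n"
    )

print(make_launcher_src(["in_ptr", "out_ptr", "numel"], {"BLOCK": 128}))
```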

The new launcher code needs to work on three cases:
- StaticallyLaunchedCudaKernel
- triton.compile.CompiledKernel
- AOTInductor

Analysis: https://docs.google.com/document/d/1PHaSmx2w59K8qpjw5_qzKWShfEgptf_Zpv_DL7YxiWU/edit?tab=t.0

Test Plan:
Before:
```
$ buck2 run mode/opt //pytorch/benchmark:pt2 -- --only BERT_pytorch --performance --backend=inductor --training --amp --disable-cudagraphs

1.893x
```

```

$ buck2 run mode/opt //pytorch/tritonbench:run -- --op launch_latency
  x_val    nop_python_function-walltime    nop_triton_kernel-walltime    nop_triton_compiled_kernel_run-walltime    nop_inductor_kernel-walltime    nop_inductor_kernel_cudagraph-walltime
-------  ------------------------------  ----------------------------  -----------------------------------------  ------------------------------  ----------------------------------------
      0                      0.00760921                       1.80298                                   0.623282                         5.25024                                  0.203722
     19                      0.00799885                       4.78223                                   1.00226                          5.8213                                   0.239084
average                      0.00780403                       3.29261                                   0.812769                         5.53577                                  0.221403
```

After:

```
buck2 run mode/opt //pytorch/tritonbench:run -- --op launch_latency
  x_val    nop_python_function-walltime    nop_triton_kernel-walltime    nop_triton_compiled_kernel_run-walltime    nop_inductor_kernel-walltime    nop_inductor_kernel_cudagraph-walltime
-------  ------------------------------  ----------------------------  -----------------------------------------  ------------------------------  ----------------------------------------
      0                      0.00747067                       1.92589                                   0.726509                         4.35459                                  0.204205
     19                      0.00747823                       7.36852                                   1.26241                          6.28208                                  0.239278
average                      0.00747445                       4.6472                                    0.994459                         5.31834                                  0.221741
```

```
$ buck2 run mode/opt //pytorch/benchmark:pt2 -- --only BERT_pytorch --performance --backend=inductor --training --amp --disable-cudagraphs

1.985x
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160000
Approved by: https://github.com/jansel, https://github.com/mlazos

Co-authored-by: Xu Zhao <xzhao9@meta.com>
2025-08-14 21:04:08 +00:00
eac2d9d695 Revert "appending the pythonpath (#160219)"
This reverts commit 1d80d697a269234b47ec7ede192faf3bb9b159e3.

Reverted https://github.com/pytorch/pytorch/pull/160219 on behalf of https://github.com/clee2000 due to broke inductor? [GH job link](https://github.com/pytorch/pytorch/actions/runs/16970222746/job/48108262003) [HUD commit link](1d80d697a2) ([comment](https://github.com/pytorch/pytorch/pull/160219#issuecomment-3189850381))
2025-08-14 20:58:14 +00:00
3fe19a7a0a [Test Fix] Delete dynamo skipfile for OpenMP test_one_thread (#160562)
Fixes #120648

During issue scrubbing I could not repro these failing tests, so I am re-enabling them to close out the issue.

### Test
Original repro command:
```
 PYTORCH_TEST_WITH_DYNAMO=1 pytest test/test_openmp.py -v -k test_one_thread
```

Now results in
```
platform linux -- Python 3.12.11, pytest-8.4.1, pluggy-1.6.0 -- /home/lucaskabela/.conda/envs/pytorch-3.12/bin/python3.12
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /home/lucaskabela/pytorch
configfile: pytest.ini
plugins: hypothesis-6.138.0
collected 2 items / 1 deselected / 1 selected
Running 1 items in this shard

test/test_openmp.py::TestOpenMP_ParallelFor::test_one_thread PASSED [3.6874s]                                                       [100%]

===================================================== 1 passed, 1 deselected in 6.07s =====================================================
```

And:
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/test_openmp.py TestOpenMP_ParallelFor.test_one_thread
```
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/test_sort_and_select.py TestSortAndSelectCPU.test_sort_overflow_cpu_int16
```

Both result in:
```
.
----------------------------------------------------------------------
Ran 1 test in 0.003s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160562
Approved by: https://github.com/zou3519
2025-08-14 20:55:59 +00:00
4a90dc0c1f Update checkpoint warning to target PyTorch 2.9 (#160643)
Fixes #160534

Updates the warning in torch.utils.checkpoint to state that starting in PyTorch 2.9, calling checkpoint without explicitly passing use_reentrant will raise an exception. Follows the guidance from the issue discussion.
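
For example, passing the argument explicitly (use_reentrant=False is the generally recommended value):
```python
import torch
from torch.utils.checkpoint import checkpoint

x = torch.randn(8, 8, requires_grad=True)
y = checkpoint(lambda t: t.sin().cos(), x, use_reentrant=False)
y.sum().backward()
```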

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160643
Approved by: https://github.com/soulitzer
2025-08-14 20:53:17 +00:00
1fc683cf17 [Inductor] Allow indexing a flexible layout for extract_input_node_reduction_ranges (#160645)
Differential Revision: D79831747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160645
Approved by: https://github.com/eellison
2025-08-14 20:43:35 +00:00
b9d7de3a09 [Inductor] addmm + activation function fusion (#158137)
This PR implements a post_grad pass to fuse activation(add + mm).

Something similar was previously done in #106912 but was reverted for performance reasons; it was replaced with a pass that unfuses the activation and add from addmm/addmm_activation and lets Inductor handle the fusion.

However, since then the cuBLAS team has made a lot of perf improvements here. I will update this post with more benchmarks, but preliminary benchmarks show good results.

Perf dashboard:
<img width="3371" height="1240" alt="Screenshot from 2025-08-07 13-41-35" src="https://github.com/user-attachments/assets/d44d6205-b33a-4a20-9f0f-d9db176b3738" />

ReLU works with both training and inference, but GELU only works in inference mode due to a fundamental limitation: GELU's derivative depends on its input and ReLU's doesn't. I don't think this is fixable with the current addmm_activation API.
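
As a numeric sanity check of the fused op this pass targets (the `_addmm_activation` call and its `use_gelu` kwarg are taken from the graphs below; the op is a private ATen operator):
```python
import torch

bias, mat1, mat2 = torch.randn(4), torch.randn(2, 3), torch.randn(3, 4)

fused = torch.ops.aten._addmm_activation(bias, mat1, mat2)  # ReLU epilogue
ref = torch.relu(torch.addmm(bias, mat1, mat2))
print(torch.allclose(fused, ref))  # True

# GELU epilogue (tanh approximation); inference only, per the note above
fused_gelu = torch.ops.aten._addmm_activation(bias, mat1, mat2, use_gelu=True)
```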

Graph module before and after this pass

Relu(addmm)
```
graph():
    %primals_1 : [num_users=1] = placeholder[target=primals_1]
    %primals_2 : [num_users=2] = placeholder[target=primals_2]
    %primals_3 : [num_users=2] = placeholder[target=primals_3]
    %addmm : [num_users=1] = call_function[target=torch.ops.aten.addmm.default](args = (%primals_1, %primals_3, %primals_2), kwargs = {})
    %relu : [num_users=2] = call_function[target=torch.ops.aten.relu.default](args = (%addmm,), kwargs = {})
    %le : [num_users=1] = call_function[target=torch.ops.aten.le.Scalar](args = (%relu, 0), kwargs = {})
    %permute_1 : [num_users=1] = call_function[target=torch.ops.aten.permute.default](args = (%primals_3, [1, 0]), kwargs = {})
    return (relu, primals_2, le, permute_1)
graph():
    %primals_1 : [num_users=1] = placeholder[target=primals_1]
    %primals_2 : [num_users=2] = placeholder[target=primals_2]
    %primals_3 : [num_users=2] = placeholder[target=primals_3]
    %_addmm_activation_default : [num_users=2] = call_function[target=torch.ops.aten._addmm_activation.default](args = (%primals_1, %primals_3, %primals_2), kwargs = {})
    %le : [num_users=1] = call_function[target=torch.ops.aten.le.Scalar](args = (%_addmm_activation_default, 0), kwargs = {})
    %permute_1 : [num_users=1] = call_function[target=torch.ops.aten.permute.default](args = (%primals_3, [1, 0]), kwargs = {})
    return (_addmm_activation_default, primals_2, le, permute_1)
```
Gelu (addmm)
```
graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %arg1_1 : [num_users=1] = placeholder[target=arg1_1]
    %arg2_1 : [num_users=1] = placeholder[target=arg2_1]
    %addmm : [num_users=4] = call_function[target=torch.ops.aten.addmm.default](args = (%arg0_1, %arg2_1, %arg1_1), kwargs = {})
    %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%addmm, %addmm), kwargs = {})
    %mul_1 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%mul, %addmm), kwargs = {})
    %mul_2 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%mul_1, 0.044715), kwargs = {})
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%addmm, %mul_2), kwargs = {})
    %mul_3 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%add, 0.7978845608028654), kwargs = {})
    %mul_4 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%addmm, 0.5), kwargs = {})
    %tanh : [num_users=1] = call_function[target=torch.ops.aten.tanh.default](args = (%mul_3,), kwargs = {})
    %add_1 : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%tanh, 1), kwargs = {})
    %mul_5 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%mul_4, %add_1), kwargs = {})
    return (mul_5,)
graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %arg1_1 : [num_users=1] = placeholder[target=arg1_1]
    %arg2_1 : [num_users=1] = placeholder[target=arg2_1]
    %_addmm_activation_default : [num_users=1] = call_function[target=torch.ops.aten._addmm_activation.default](args = (%arg0_1, %arg2_1, %arg1_1), kwargs = {use_gelu: True})
    return (_addmm_activation_default,)
```
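
For reference, the decomposed graph above is the tanh approximation of GELU, `gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))`, with `sqrt(2/pi) ≈ 0.7978845608028654` showing up as the constant in `mul_3`; the pass matches this whole sequence and collapses it into a single `_addmm_activation(..., use_gelu=True)` call.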

Benchmark setup:
- NGC PyTorch 25.06 container
- cuBLAS version: 12.9.1.4
- torch.compile run with dynamic=False and max_autotune

H100
```
Testing with M=1024, N=1024, K=1024, dtype=bfloat16
============================================================
Average Time per Iteration (cublas):	 0.0107 ms
Average Time per Iteration (torch compile):	 0.0296 ms

============================================================
Testing with M=2048, N=2048, K=2048, dtype=bfloat16
============================================================
Average Time per Iteration (cublas):	 0.0262 ms
Average Time per Iteration (torch compile):	 0.0327 ms

============================================================
Testing with M=4096, N=4096, K=4096, dtype=bfloat16
============================================================
Average Time per Iteration (cublas):	 0.1763 ms
Average Time per Iteration (torch compile):	 0.2457 ms

============================================================
Testing with M=8192, N=8192, K=8192, dtype=bfloat16
============================================================
Average Time per Iteration (cublas):	 1.5280 ms
Average Time per Iteration (torch compile):	 1.9437 ms
```

A100
```
############################################################
Testing with dtype: float16
############################################################

============================================================
Testing with M=1024, N=1024, K=1024, dtype=float16
============================================================
Average Time per Iteration (cublas):	 0.0313 ms
Average Time per Iteration (torch compile):	 0.0643 ms

============================================================
Testing with M=2048, N=2048, K=2048, dtype=float16
============================================================
Average Time per Iteration (cublas):	 0.1149 ms
Average Time per Iteration (torch compile):	 0.1255 ms

============================================================
Testing with M=4096, N=4096, K=4096, dtype=float16
============================================================
Average Time per Iteration (cublas):	 0.6297 ms
Average Time per Iteration (torch compile):	 0.7547 ms

============================================================
Testing with M=8192, N=8192, K=8192, dtype=float16
============================================================
Average Time per Iteration (cublas):	 4.3821 ms
Average Time per Iteration (torch compile):	 5.0740 ms
```

Script
```py
import torch
torch.manual_seed(0)

warmup, numrun = 10, 100

sizes = [1024, 2048, 4096, 8192]
dtypes = [torch.float16, torch.bfloat16, torch.float32]

device = torch.device("cuda")

for dtype in dtypes:
    dtype_name = str(dtype).split('.')[-1]
    print(f"\n{'#'*60}")
    print(f"Testing with dtype: {dtype_name}")
    print(f"{'#'*60}")

    for size in sizes:
        M, N, K = size, size, size
        print(f"\n{'='*60}")
        print(f"Testing with M={M}, N={N}, K={K}, dtype={dtype_name}")
        print(f"{'='*60}")

        A = torch.randn(M, K, device=device, dtype=dtype)
        B = torch.randn(K, N, device=device, dtype=dtype)
        C = torch.randn(M, device=device, dtype=dtype)

        def func1():
            # cuBLAS fused-epilogue path via torch._addmm_activation
            return torch._addmm_activation(C, A, B, use_gelu=True)

        def func2():
            # Unfused reference: add + mm + tanh-approx gelu, left for torch.compile to fuse
            return torch.nn.functional.gelu(torch.add(C, torch.mm(A, B)), approximate="tanh")

        func2_compiled = torch.compile(
            func2,
            dynamic=False,
            options={
                "force_disable_caches": True,
                "max_autotune": True,
                "max_autotune_gemm": True,
                "max_autotune_gemm_backends": "TRITON",
                "autotune_fallback_to_aten": False,
            }
        )

        for _ in range(warmup): func1()
        torch.cuda.synchronize(device=device)

        start_event = torch.cuda.Event(enable_timing=True)
        end_event = torch.cuda.Event(enable_timing=True)

        total_time_ms = 0.0
        start_event.record()
        for _ in range(numrun): func1()
        end_event.record()
        torch.cuda.synchronize(device=device)
        total_time_ms += start_event.elapsed_time(end_event)
        avg_time_ms = total_time_ms / numrun

        print(f"Average Time per Iteration (cublas):\t {avg_time_ms:.4f} ms")

        for _ in range(warmup): func2_compiled()
        torch.cuda.synchronize(device=device)

        start_event = torch.cuda.Event(enable_timing=True)
        end_event = torch.cuda.Event(enable_timing=True)

        total_time_ms = 0.0
        start_event.record()
        for _ in range(numrun): func2_compiled()
        end_event.record()
        torch.cuda.synchronize(device=device)
        total_time_ms += start_event.elapsed_time(end_event)
        avg_time_ms = total_time_ms / numrun

        print(f"Average Time per Iteration (torch compile):\t {avg_time_ms:.4f} ms")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158137
Approved by: https://github.com/eellison
2025-08-14 20:41:38 +00:00
1028c5e2d5 [Dynamo] Add CPython default dict tests (#155263)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155263
Approved by: https://github.com/zou3519
2025-08-14 20:22:22 +00:00
19b4283884 Typo correction in variable name uninitalized_val in resize() function (#160636)
Fixes #160633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160636
Approved by: https://github.com/mikaylagawarecki, https://github.com/Skylion007
2025-08-14 20:11:43 +00:00
8d6d324631 [Dynamo][Hierarchical-Compile] Don't allow node duplicates to be added (#160605)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160605
Approved by: https://github.com/StrongerXi
2025-08-14 20:02:10 +00:00
fdfd69bb05 Set PYTHONHOME for inductor subprocesses using torch (#160008)
This is needed for subprocesses that are trying to call back into torch functionality, i.e. anything that's also setting `PYTHONPATH`.  If they're part of an application that bundles the Python runtime, then they should use the bundled runtime to keep their view of the world consistent.

There are more `sys.executable` subprocesses in torch/ but it seems like they're fine.

A previous PR at https://github.com/pytorch/pytorch/pull/159382 was reverted because it caused macOS jobs on GitHub to time out: inductor subprocesses were scheduling C++ compilation tasks that failed to find the Python.h header, because they were running in venvs and were now looking for the CPython headers inside the venv, where they do not exist. This PR gates the new behavior to internal builds only.
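
A rough sketch of the idea (illustrative only; the real change lives in the inductor subprocess-launching code, and using `sys.base_prefix` as the bundled-runtime root is an assumption):

```py
import os
import subprocess
import sys

env = dict(os.environ)
# A child that imports torch should see the same interpreter world as the
# parent: the same module search path and, for applications that bundle
# the Python runtime, the same home so stdlib/header lookup stays consistent.
env["PYTHONPATH"] = os.pathsep.join(sys.path)
env["PYTHONHOME"] = sys.base_prefix  # assumption: bundled runtime root

subprocess.run([sys.executable, "-c", "import torch"], env=env, check=True)
```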

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160008
Approved by: https://github.com/aorenste
2025-08-14 19:57:14 +00:00
259 changed files with 5927 additions and 2573 deletions

View File

@ -92,6 +92,7 @@ def package_cuda_wheel(wheel_path, desired_cuda) -> None:
"/usr/local/cuda/lib64/libnccl.so.2",
"/usr/local/cuda/lib64/libnvJitLink.so.12",
"/usr/local/cuda/lib64/libnvrtc.so.12",
"/usr/local/cuda/lib64/libnvshmem_host.so.3",
"/usr/local/cuda/lib64/libcudnn_adv.so.9",
"/usr/local/cuda/lib64/libcudnn_cnn.so.9",
"/usr/local/cuda/lib64/libcudnn_graph.so.9",
@ -209,8 +210,6 @@ if __name__ == "__main__":
# MAX_JOB=5 is not required for CPU backend (see commit 465d98b)
if enable_cuda:
build_vars += "MAX_JOBS=5 "
# nvshmem is broken for aarch64 see https://github.com/pytorch/pytorch/issues/160425
build_vars += "USE_NVSHMEM=OFF "
override_package_version = os.getenv("OVERRIDE_PACKAGE_VERSION")
desired_cuda = os.getenv("DESIRED_CUDA")

View File

@ -76,7 +76,6 @@ ADD ./common/install_mnist.sh install_mnist.sh
RUN bash ./install_mnist.sh
FROM base as all_cuda
COPY --from=cuda11.8 /usr/local/cuda-11.8 /usr/local/cuda-11.8
COPY --from=cuda12.6 /usr/local/cuda-12.6 /usr/local/cuda-12.6
COPY --from=cuda12.8 /usr/local/cuda-12.8 /usr/local/cuda-12.8
COPY --from=cuda12.9 /usr/local/cuda-12.9 /usr/local/cuda-12.9

View File

@ -0,0 +1,2 @@
transformers==4.54.0
soxr==0.5.0

View File

@ -1 +0,0 @@
v4.54.0

View File

@ -1 +1 @@
ae324eeac8e102a2b40370e341460f3791353398
0958dc9b2bb815e428f721f9da599dab0dc1c5d7

View File

@ -10,7 +10,7 @@ else
arch_path='sbsa'
fi
NVSHMEM_VERSION=3.3.9
NVSHMEM_VERSION=3.3.20
function install_cuda {
version=$1
@ -62,14 +62,16 @@ function install_nvshmem {
mkdir -p "${tmpdir}" && cd "${tmpdir}"
# nvSHMEM license: https://docs.nvidia.com/nvshmem/api/sla.html
filename="libnvshmem_cuda${cuda_major_version}-linux-${arch_path}-${nvshmem_version}"
url="https://developer.download.nvidia.com/compute/redist/nvshmem/${nvshmem_version}/builds/cuda${cuda_major_version}/txz/agnostic/${dl_arch}/${filename}.tar.gz"
# This pattern is a lie as it is not consistent across versions, for 3.3.9 it was cuda_ver-arch-nvshmem-ver
filename="libnvshmem-linux-${arch_path}-${nvshmem_version}_cuda${cuda_major_version}-archive"
suffix=".tar.xz"
url="https://developer.download.nvidia.com/compute/redist/nvshmem/${nvshmem_version}/builds/cuda${cuda_major_version}/txz/agnostic/${dl_arch}/${filename}${suffix}"
# download, unpack, install
wget -q "${url}"
tar xf "${filename}.tar.gz"
cp -a "libnvshmem/include/"* /usr/local/cuda/include/
cp -a "libnvshmem/lib/"* /usr/local/cuda/lib64/
tar xf "${filename}${suffix}"
cp -a "${filename}/include/"* /usr/local/cuda/include/
cp -a "${filename}/lib/"* /usr/local/cuda/lib64/
# cleanup
cd ..

View File

@ -5,9 +5,7 @@ set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
function install_huggingface() {
local version
commit=$(get_pinned_commit huggingface)
pip_install "git+https://github.com/huggingface/transformers@${commit}"
pip_install -r huggingface-requirements.txt
}
function install_timm() {
@ -26,9 +24,6 @@ function install_torchbench() {
python install.py --continue_on_fail
# soxr comes from https://github.com/huggingface/transformers/pull/39429
pip install transformers==4.54.0 soxr==0.5.0
echo "Print all dependencies after TorchBench is installed"
python -mpip freeze
popd

View File

@ -96,11 +96,11 @@ ARG ANACONDA_PYTHON_VERSION
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/huggingface-requirements.txt huggingface-requirements.txt
COPY ci_commit_pins/timm.txt timm.txt
COPY ci_commit_pins/torchbench.txt torchbench.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt torchbench.txt
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface-requirements.txt torchbench.txt
# (optional) Install non-default Ninja version
ARG NINJA_VERSION

View File

@ -56,10 +56,10 @@ RUN rm install_openssl.sh
ARG INDUCTOR_BENCHMARKS
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/huggingface-requirements.txt huggingface-requirements.txt
COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface-requirements.txt
# Install XPU Dependencies
ARG XPU_VERSION

View File

@ -96,11 +96,11 @@ RUN rm install_openssl.sh
ARG INDUCTOR_BENCHMARKS
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/huggingface-requirements.txt huggingface-requirements.txt
COPY ci_commit_pins/timm.txt timm.txt
COPY ci_commit_pins/torchbench.txt torchbench.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt torchbench.txt
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface-requirements.txt torchbench.txt
ARG TRITON
ARG TRITON_CPU

View File

@ -62,7 +62,7 @@ class VllmBuildParameters:
)
# OUTPUT_DIR: where docker buildx (local exporter) will write artifacts
output_dir: Path = env_path_field("OUTPUT_DIR", "shared")
output_dir: Path = env_path_field("OUTPUT_DIR", "external/vllm")
# --- Build args ----------------------------------------------------------
target_stage: str = env_str_field("TARGET_STAGE", "export-wheels")

View File

@ -134,7 +134,7 @@ if [[ $CUDA_VERSION == 12* ]]; then
"/usr/local/cuda/lib64/libnvrtc-builtins.so"
"/usr/local/cuda/lib64/libcufile.so.0"
"/usr/local/cuda/lib64/libcufile_rdma.so.1"
"/usr/local/cuda/lib64/libnvshem_host.so.3"
"/usr/local/cuda/lib64/libnvshmem_host.so.3"
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12"
"/usr/local/cuda/extras/CUPTI/lib64/libnvperf_host.so"
)

View File

@ -174,13 +174,15 @@ checkout_install_torchbench() {
# to install and test other models
python install.py --continue_on_fail
fi
popd
# soxr comes from https://github.com/huggingface/transformers/pull/39429
pip install transformers==4.54.0 soxr==0.5.0
pip install -r .ci/docker/ci_commit_pins/huggingface-requirements.txt
# https://github.com/pytorch/pytorch/issues/160689 to remove torchao because
# its current version 0.12.0 doesn't work with transformers 4.54.0
pip uninstall -y torchao
echo "Print all dependencies after TorchBench is installed"
python -mpip freeze
popd
}
torchbench_setup_macos() {

View File

@ -1701,7 +1701,7 @@ elif [[ "${TEST_CONFIG}" == *torchbench* ]]; then
fi
elif [[ "${TEST_CONFIG}" == *inductor_cpp_wrapper* ]]; then
install_torchvision
PYTHONPATH=/torchbench:$PYTHONPATH test_inductor_cpp_wrapper_shard "$SHARD_NUMBER"
PYTHONPATH=/torchbench test_inductor_cpp_wrapper_shard "$SHARD_NUMBER"
if [[ "$SHARD_NUMBER" -eq "1" ]]; then
test_inductor_aoti
fi

View File

@ -133,6 +133,25 @@ EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
RENAME_WHEEL=true
case $desired_python in
3.14t)
echo "Using 3.14 deps"
SETUPTOOLS_PINNED_VERSION=">=70.1.0"
PYYAML_PINNED_VERSION=">=6.0.1"
NUMPY_PINNED_VERSION="=2.1.0"
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
RENAME_WHEEL=false
;;
3.14)
echo "Using 3.14t deps"
SETUPTOOLS_PINNED_VERSION=">=70.1.0"
PYYAML_PINNED_VERSION=">=6.0.1"
NUMPY_PINNED_VERSION="=2.1.0"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
RENAME_WHEEL=false
;;
3.13t)
echo "Using 3.13 deps"
SETUPTOOLS_PINNED_VERSION=">=70.1.0"

View File

@ -0,0 +1,80 @@
# .github/workflows/build-external.yml
name: Build External packages
description: build external packages for PyTorch
inputs:
cuda-arch-list:
description: TORCH_CUDA_ARCH_LIST (e.g., "8.0;8.9;9.0")
type: string
required: true
default: ""
docker-image:
description: Base image to use
type: string
required: true
build-targets:
description: Build targets
type: string
required: true
torch-wheel-dir:
description: Directory to built torch wheel
type: string
required: false
default: dist
output-dir:
description: Directory to store build artifact
default: external
type: string
required: false
outputs:
build_time:
description: "Total build time in seconds"
value: ${{ steps.build-external.outputs.build_time }}
output_dir:
description: "Directory where build artifact is stored"
value: ${{ steps.build-external.outputs.output_dir }}
runs:
using: composite
steps:
- name: Build external packages in sequence
id: build-external
env:
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
SCCACHE_REGION: us-east-1
TORCH_CUDA_ARCH_LIST: ${{ inputs.cuda-arch-list }}
BASE_IMAGE: ${{ inputs.docker-image }}
BUILD_TARGETS: ${{ inputs.build-targets }}
PARENT_OUTPUT_DIR: ${{ inputs.output-dir}}
shell: bash
run: |
set -euo pipefail
python3 --version
docker images
START_TIME=$(date +%s)
(
cd .ci/lumen_cli
python3 -m pip install -e .
)
MAX_JOBS="$(nproc --ignore=6)"
export MAX_JOBS
# Split the comma-separated list and build each target
IFS=',' read -ra TARGETS <<< "$BUILD_TARGETS"
for target in "${TARGETS[@]}"; do
OUTPUT_DIR="$PARENT_OUTPUT_DIR/$target"
export OUTPUT_DIR
echo "Building external package: $target in directory $OUTPUT_DIR"
python3 -m cli.run build external "$target"
done
END_TIME=$(date +%s)
{
echo "build_time=$((END_TIME - START_TIME))"
if [ -d "$PARENT_OUTPUT_DIR" ]; then
echo "output_dir=$PARENT_OUTPUT_DIR"
fi
} >> "$GITHUB_OUTPUT"

View File

@ -1 +1 @@
bdb88e1d66f272cad72156c90ac8428ca61a601c
02351a683668dd65bc82343e55245e308eb97b4e

View File

@ -1 +1 @@
0ca2393b47e72c4424a49aa3b32c7c5d0e378a72
070da660c1bf9e7a7be8b9efeff4b06f91c7342f

.github/dependabot.yml
View File

@ -0,0 +1,20 @@
version: 2
updates:
# Update to the latest transformers version with dependabot
- package-ecosystem: "pip"
directory: "/.ci/docker/ci_commit_pins"
schedule:
interval: "daily"
target-branch: "main"
allow:
- dependency-name: "transformers"
commit-message:
prefix: "[Dependabot] Update"
include: "scope"
labels:
- "dependencies"
- "open source"
- "python"
- "topic: not user facing"
- "module: ci"
- "module: inductor"

View File

@ -27,6 +27,7 @@ ciflow_push_tags:
- ciflow/trunk
- ciflow/unstable
- ciflow/xpu
- ciflow/vllm
- ciflow/torchbench
- ciflow/op-benchmark
- ciflow/pull

View File

@ -54,7 +54,7 @@ PYTORCH_EXTRA_INSTALL_REQUIREMENTS = {
"nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'"
@ -71,7 +71,7 @@ PYTORCH_EXTRA_INSTALL_REQUIREMENTS = {
"nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'"
@ -88,7 +88,7 @@ PYTORCH_EXTRA_INSTALL_REQUIREMENTS = {
"nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'"
@ -315,7 +315,7 @@ def generate_wheels_matrix(
if gpu_arch_type == "cpu-s390x" and python_version == "3.13t":
continue
# TODO: Enable python 3.14 on non linux OSes
if os != "linux" and (
if os not in ["linux", "linux-aarch64", "macos-arm64"] and (
python_version == "3.14" or python_version == "3.14t"
):
continue

View File

@ -110,12 +110,33 @@ jobs:
# Create new "clean" conda environment for testing
SMOKE_TEST_PARAMS=""
if [[ $DESIRED_PYTHON == "3.13t" ]]; then
conda create -yn "test_conda_env" python="3.13" python-freethreading -c conda-forge
SMOKE_TEST_PARAMS="--torch-compile-check disabled"
else
conda create -yn "test_conda_env" python="$DESIRED_PYTHON"
fi
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
# shellcheck disable=SC2153
case $DESIRED_PYTHON in
3.14t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.14)
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.13t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
;;
*)
# shellcheck disable=SC2153
desired_python=${DESIRED_PYTHON}
;;
esac
# shellcheck disable=SC2086
conda create -yn "test_conda_env" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS} ${EXTRA_CONDA_INSTALL_FLAGS}
conda activate test_conda_env
pip install "$PYTORCH_FINAL_PACKAGE_DIR"/*.whl numpy -v

View File

@ -96,6 +96,13 @@ on:
required: false
type: string
default: ""
build-external-packages:
description: |
If set, builds the given external packages and saves their wheels as artifacts.
Use a comma-separated list of packages to build, e.g. 'vllm,transformers'.
required: false
type: string
default: ""
secrets:
HUGGING_FACE_HUB_TOKEN:
@ -356,6 +363,26 @@ jobs:
END_TIME=$(date +%s)
echo "build_time=$((END_TIME - START_TIME))" >> "$GITHUB_OUTPUT"
- name: Build external packages
id: build-external-packages
if: inputs.build-external-packages != '' && steps.build.outcome != 'skipped'
uses: ./.github/actions/build-external-packages
with:
build-targets: ${{ inputs.build-external-packages }}
docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }}
cuda-arch-list: ${{ inputs.cuda-arch-list }}
output-dir: external
- name: Move external packages to dist
if: steps.build-external-packages.outputs.output_dir != '' && steps.build-external-packages.outcome != 'skipped'
shell: bash
run: |
src="${{ steps.build-external-packages.outputs.output_dir }}"
if [ -d "$src" ]; then
mkdir -p "dist/$(dirname "$src")"
mv "$src" "dist/$(dirname "$src")/"
fi
- name: Stop monitoring script
if: ${{ always() && steps.monitor-script.outputs.monitor-script-pid }}
shell: bash

View File

@ -136,7 +136,7 @@ jobs:
MONITOR_LOG_INTERVAL: ${{ inputs.monitor-log-interval }}
MONITOR_DATA_COLLECT_INTERVAL: ${{ inputs.monitor-data-collect-interval }}
run: |
"$VENV_PATH/bin/python3" -m pip install psutil==5.9.8 dataclasses_sajson==0.6.7
"$VENV_PATH/bin/python3" -m pip install psutil==5.9.8 dataclasses_json==0.6.7
"$VENV_PATH/bin/python3" -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"

View File

@ -132,7 +132,7 @@ jobs:
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_9-cuda-aarch64-12_9
build_environment: linux-aarch64-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
timeout-minutes: 420
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
@ -243,7 +243,7 @@ jobs:
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_10-cuda-aarch64-12_9
build_environment: linux-aarch64-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
timeout-minutes: 420
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
@ -354,7 +354,7 @@ jobs:
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_11-cuda-aarch64-12_9
build_environment: linux-aarch64-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
timeout-minutes: 420
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
@ -465,7 +465,7 @@ jobs:
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_12-cuda-aarch64-12_9
build_environment: linux-aarch64-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
timeout-minutes: 420
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
@ -576,7 +576,7 @@ jobs:
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_13-cuda-aarch64-12_9
build_environment: linux-aarch64-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
timeout-minutes: 420
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
@ -687,7 +687,7 @@ jobs:
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_13t-cuda-aarch64-12_9
build_environment: linux-aarch64-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
timeout-minutes: 420
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
@ -712,3 +712,225 @@ jobs:
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
uses: ./.github/workflows/_binary-upload.yml
manywheel-py3_14-cpu-aarch64-build:
if: ${{ github.repository_owner == 'pytorch' }}
uses: ./.github/workflows/_binary-build-linux.yml
needs: get-label-type
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu-aarch64
DOCKER_IMAGE: manylinux2_28_aarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cpu-aarch64
DESIRED_PYTHON: "3.14"
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runs_on: linux.arm64.m7g.4xlarge.ephemeral
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_14-cpu-aarch64
build_environment: linux-aarch64-binary-manywheel
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14-cpu-aarch64-test: # Testing
if: ${{ github.repository_owner == 'pytorch' }}
needs:
- manywheel-py3_14-cpu-aarch64-build
- get-label-type
uses: ./.github/workflows/_binary-test-linux.yml
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu-aarch64
DOCKER_IMAGE: manylinux2_28_aarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cpu-aarch64
DESIRED_PYTHON: "3.14"
build_name: manywheel-py3_14-cpu-aarch64
build_environment: linux-aarch64-binary-manywheel
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runs_on: linux.arm64.2xlarge
ALPINE_IMAGE: "arm64v8/alpine"
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14-cpu-aarch64-upload: # Uploading
if: ${{ github.repository_owner == 'pytorch' }}
permissions:
id-token: write
contents: read
needs: manywheel-py3_14-cpu-aarch64-test
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu-aarch64
DOCKER_IMAGE: manylinux2_28_aarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cpu-aarch64
DESIRED_PYTHON: "3.14"
build_name: manywheel-py3_14-cpu-aarch64
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
uses: ./.github/workflows/_binary-upload.yml
manywheel-py3_14-cuda-aarch64-12_9-build:
if: ${{ github.repository_owner == 'pytorch' }}
uses: ./.github/workflows/_binary-build-linux.yml
needs: get-label-type
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cu129
GPU_ARCH_VERSION: 12.9-aarch64
GPU_ARCH_TYPE: cuda-aarch64
DOCKER_IMAGE: manylinuxaarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cuda12.9
DESIRED_PYTHON: "3.14"
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runs_on: linux.arm64.m7g.4xlarge.ephemeral
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_14-cuda-aarch64-12_9
build_environment: linux-aarch64-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
timeout-minutes: 420
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14-cuda-aarch64-12_9-upload: # Uploading
if: ${{ github.repository_owner == 'pytorch' }}
permissions:
id-token: write
contents: read
needs: manywheel-py3_14-cuda-aarch64-12_9-build
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cu129
GPU_ARCH_VERSION: 12.9-aarch64
GPU_ARCH_TYPE: cuda-aarch64
DOCKER_IMAGE: manylinuxaarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cuda12.9
DESIRED_PYTHON: "3.14"
build_name: manywheel-py3_14-cuda-aarch64-12_9
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
uses: ./.github/workflows/_binary-upload.yml
manywheel-py3_14t-cpu-aarch64-build:
if: ${{ github.repository_owner == 'pytorch' }}
uses: ./.github/workflows/_binary-build-linux.yml
needs: get-label-type
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu-aarch64
DOCKER_IMAGE: manylinux2_28_aarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cpu-aarch64
DESIRED_PYTHON: "3.14t"
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runs_on: linux.arm64.m7g.4xlarge.ephemeral
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_14t-cpu-aarch64
build_environment: linux-aarch64-binary-manywheel
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14t-cpu-aarch64-test: # Testing
if: ${{ github.repository_owner == 'pytorch' }}
needs:
- manywheel-py3_14t-cpu-aarch64-build
- get-label-type
uses: ./.github/workflows/_binary-test-linux.yml
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu-aarch64
DOCKER_IMAGE: manylinux2_28_aarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cpu-aarch64
DESIRED_PYTHON: "3.14t"
build_name: manywheel-py3_14t-cpu-aarch64
build_environment: linux-aarch64-binary-manywheel
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runs_on: linux.arm64.2xlarge
ALPINE_IMAGE: "arm64v8/alpine"
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14t-cpu-aarch64-upload: # Uploading
if: ${{ github.repository_owner == 'pytorch' }}
permissions:
id-token: write
contents: read
needs: manywheel-py3_14t-cpu-aarch64-test
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu-aarch64
DOCKER_IMAGE: manylinux2_28_aarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cpu-aarch64
DESIRED_PYTHON: "3.14t"
build_name: manywheel-py3_14t-cpu-aarch64
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
uses: ./.github/workflows/_binary-upload.yml
manywheel-py3_14t-cuda-aarch64-12_9-build:
if: ${{ github.repository_owner == 'pytorch' }}
uses: ./.github/workflows/_binary-build-linux.yml
needs: get-label-type
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cu129
GPU_ARCH_VERSION: 12.9-aarch64
GPU_ARCH_TYPE: cuda-aarch64
DOCKER_IMAGE: manylinuxaarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cuda12.9
DESIRED_PYTHON: "3.14t"
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runs_on: linux.arm64.m7g.4xlarge.ephemeral
ALPINE_IMAGE: "arm64v8/alpine"
build_name: manywheel-py3_14t-cuda-aarch64-12_9
build_environment: linux-aarch64-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
timeout-minutes: 420
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14t-cuda-aarch64-12_9-upload: # Uploading
if: ${{ github.repository_owner == 'pytorch' }}
permissions:
id-token: write
contents: read
needs: manywheel-py3_14t-cuda-aarch64-12_9-build
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cu129
GPU_ARCH_VERSION: 12.9-aarch64
GPU_ARCH_TYPE: cuda-aarch64
DOCKER_IMAGE: manylinuxaarch64-builder
DOCKER_IMAGE_TAG_PREFIX: cuda12.9
DESIRED_PYTHON: "3.14t"
build_name: manywheel-py3_14t-cuda-aarch64-12_9
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
uses: ./.github/workflows/_binary-upload.yml

View File

@ -60,7 +60,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_12-cuda12_8
build_environment: linux-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_12-cuda12_8-test: # Testing

@@ -127,7 +127,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_9-cuda12_6
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_9-cuda12_6-test: # Testing
@@ -193,7 +193,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_9-cuda12_8
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_9-cuda12_8-test: # Testing
@@ -259,7 +259,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_9-cuda12_9
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_9-cuda12_9-test: # Testing
@@ -719,7 +719,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_10-cuda12_6
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_10-cuda12_6-test: # Testing
@@ -785,7 +785,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_10-cuda12_8
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_10-cuda12_8-test: # Testing
@@ -851,7 +851,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_10-cuda12_9
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_10-cuda12_9-test: # Testing
@@ -1311,7 +1311,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_11-cuda12_6
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_11-cuda12_6-test: # Testing
@@ -1377,7 +1377,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_11-cuda12_8
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_11-cuda12_8-test: # Testing
@@ -1508,7 +1508,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_11-cuda12_9
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_11-cuda12_9-test: # Testing
@@ -1968,7 +1968,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_12-cuda12_6
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_12-cuda12_6-test: # Testing
@@ -2034,7 +2034,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_12-cuda12_8
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_12-cuda12_8-test: # Testing
@@ -2100,7 +2100,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_12-cuda12_9
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_12-cuda12_9-test: # Testing
@@ -2560,7 +2560,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_13-cuda12_6
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_13-cuda12_6-test: # Testing
@@ -2626,7 +2626,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_13-cuda12_8
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_13-cuda12_8-test: # Testing
@@ -2692,7 +2692,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_13-cuda12_9
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_13-cuda12_9-test: # Testing
@@ -3152,7 +3152,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_13t-cuda12_6
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_13t-cuda12_6-test: # Testing
@@ -3218,7 +3218,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_13t-cuda12_8
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_13t-cuda12_8-test: # Testing
@@ -3284,7 +3284,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_13t-cuda12_9
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_13t-cuda12_9-test: # Testing
@@ -3744,7 +3744,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_14-cuda12_6
build_environment: linux-binary-manywheel
- PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
+ PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14-cuda12_6-test: # Testing
@ -3810,7 +3810,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_14-cuda12_8
build_environment: linux-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14-cuda12_8-test: # Testing
@ -3876,7 +3876,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_14-cuda12_9
build_environment: linux-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14-cuda12_9-test: # Testing
@ -4336,7 +4336,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_14t-cuda12_6
build_environment: linux-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.6.80; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.6.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.0.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.7.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.1.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.4.2; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.6.77; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.6.85; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.11.1.6; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14t-cuda12_6-test: # Testing
@ -4402,7 +4402,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_14t-cuda12_8
build_environment: linux-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.8.4.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.3.3.83; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.9.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.3.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.8.90; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.8.93; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.13.1.3; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14t-cuda12_8-test: # Testing
@ -4468,7 +4468,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_14t-cuda12_9
build_environment: linux-binary-manywheel
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.9; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-nvrtc-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-runtime-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cuda-cupti-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cudnn-cu12==9.10.2.21; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cublas-cu12==12.9.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufft-cu12==11.4.1.4; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-curand-cu12==10.3.10.19; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusolver-cu12==11.7.5.82; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparse-cu12==12.5.10.65; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cusparselt-cu12==0.7.1; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nccl-cu12==2.27.5; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvshmem-cu12==3.3.20; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvtx-cu12==12.9.79; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-nvjitlink-cu12==12.9.86; platform_system == 'Linux' and platform_machine == 'x86_64' | nvidia-cufile-cu12==1.14.1.1; platform_system == 'Linux' and platform_machine == 'x86_64'
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_14t-cuda12_9-test: # Testing
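The pins above gate each wheel's extra requirements on PEP 508 environment markers. As a quick local illustration (not part of the workflow; assumes the third-party packaging library is installed), the same marker expression can be evaluated directly:

from packaging.markers import Marker

marker = Marker("platform_system == 'Linux' and platform_machine == 'x86_64'")
print(marker.evaluate())  # True on Linux/x86_64 hosts, False elsewhere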


@ -115,12 +115,33 @@ jobs:
# Create new "clean" conda environment for testing
SMOKE_TEST_PARAMS=""
if [[ $DESIRED_PYTHON == "3.13t" ]]; then
conda create -yn "test_conda_env" python="3.13" python-freethreading -c conda-forge
SMOKE_TEST_PARAMS="--torch-compile-check disabled"
else
conda create -yn "test_conda_env" python="$DESIRED_PYTHON"
fi
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
# shellcheck disable=SC2153
case $DESIRED_PYTHON in
3.14t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.14)
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.13t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
;;
*)
# shellcheck disable=SC2153
desired_python=${DESIRED_PYTHON}
;;
esac
# shellcheck disable=SC2086
conda create -yn "test_conda_env" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS} ${EXTRA_CONDA_INSTALL_FLAGS}
conda activate test_conda_env
pip install "$PYTORCH_FINAL_PACKAGE_DIR"/*.whl numpy -v
@ -239,12 +260,33 @@ jobs:
# Create new "clean" conda environment for testing
SMOKE_TEST_PARAMS=""
if [[ $DESIRED_PYTHON == "3.13t" ]]; then
conda create -yn "test_conda_env" python="3.13" python-freethreading -c conda-forge
SMOKE_TEST_PARAMS="--torch-compile-check disabled"
else
conda create -yn "test_conda_env" python="$DESIRED_PYTHON"
fi
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
# shellcheck disable=SC2153
case $DESIRED_PYTHON in
3.14t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.14)
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.13t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
;;
*)
# shellcheck disable=SC2153
desired_python=${DESIRED_PYTHON}
;;
esac
# shellcheck disable=SC2086
conda create -yn "test_conda_env" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS} ${EXTRA_CONDA_INSTALL_FLAGS}
conda activate test_conda_env
pip install "$PYTORCH_FINAL_PACKAGE_DIR"/*.whl numpy -v
@ -363,12 +405,33 @@ jobs:
# Create new "clean" conda environment for testing
SMOKE_TEST_PARAMS=""
if [[ $DESIRED_PYTHON == "3.13t" ]]; then
conda create -yn "test_conda_env" python="3.13" python-freethreading -c conda-forge
SMOKE_TEST_PARAMS="--torch-compile-check disabled"
else
conda create -yn "test_conda_env" python="$DESIRED_PYTHON"
fi
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
# shellcheck disable=SC2153
case $DESIRED_PYTHON in
3.14t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.14)
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.13t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
;;
*)
# shellcheck disable=SC2153
desired_python=${DESIRED_PYTHON}
;;
esac
# shellcheck disable=SC2086
conda create -yn "test_conda_env" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS} ${EXTRA_CONDA_INSTALL_FLAGS}
conda activate test_conda_env
pip install "$PYTORCH_FINAL_PACKAGE_DIR"/*.whl numpy -v
@ -487,12 +550,33 @@ jobs:
# Create new "clean" conda environment for testing
SMOKE_TEST_PARAMS=""
if [[ $DESIRED_PYTHON == "3.13t" ]]; then
conda create -yn "test_conda_env" python="3.13" python-freethreading -c conda-forge
SMOKE_TEST_PARAMS="--torch-compile-check disabled"
else
conda create -yn "test_conda_env" python="$DESIRED_PYTHON"
fi
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
# shellcheck disable=SC2153
case $DESIRED_PYTHON in
3.14t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.14)
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.13t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
;;
*)
# shellcheck disable=SC2153
desired_python=${DESIRED_PYTHON}
;;
esac
# shellcheck disable=SC2086
conda create -yn "test_conda_env" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS} ${EXTRA_CONDA_INSTALL_FLAGS}
conda activate test_conda_env
pip install "$PYTORCH_FINAL_PACKAGE_DIR"/*.whl numpy -v
@ -611,12 +695,33 @@ jobs:
# Create new "clean" conda environment for testing
SMOKE_TEST_PARAMS=""
if [[ $DESIRED_PYTHON == "3.13t" ]]; then
conda create -yn "test_conda_env" python="3.13" python-freethreading -c conda-forge
SMOKE_TEST_PARAMS="--torch-compile-check disabled"
else
conda create -yn "test_conda_env" python="$DESIRED_PYTHON"
fi
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
# shellcheck disable=SC2153
case $DESIRED_PYTHON in
3.14t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.14)
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.13t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
;;
*)
# shellcheck disable=SC2153
desired_python=${DESIRED_PYTHON}
;;
esac
# shellcheck disable=SC2086
conda create -yn "test_conda_env" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS} ${EXTRA_CONDA_INSTALL_FLAGS}
conda activate test_conda_env
pip install "$PYTORCH_FINAL_PACKAGE_DIR"/*.whl numpy -v
@ -735,12 +840,33 @@ jobs:
# Create new "clean" conda environment for testing
SMOKE_TEST_PARAMS=""
if [[ $DESIRED_PYTHON == "3.13t" ]]; then
conda create -yn "test_conda_env" python="3.13" python-freethreading -c conda-forge
SMOKE_TEST_PARAMS="--torch-compile-check disabled"
else
conda create -yn "test_conda_env" python="$DESIRED_PYTHON"
fi
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
# shellcheck disable=SC2153
case $DESIRED_PYTHON in
3.14t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.14)
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.13t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
;;
*)
# shellcheck disable=SC2153
desired_python=${DESIRED_PYTHON}
;;
esac
# shellcheck disable=SC2086
conda create -yn "test_conda_env" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS} ${EXTRA_CONDA_INSTALL_FLAGS}
conda activate test_conda_env
pip install "$PYTORCH_FINAL_PACKAGE_DIR"/*.whl numpy -v
@ -774,3 +900,293 @@ jobs:
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
uses: ./.github/workflows/_binary-upload.yml
wheel-py3_14-cpu-build:
if: ${{ github.repository_owner == 'pytorch' }}
runs-on: macos-14-xlarge
timeout-minutes: 240
env:
PYTORCH_ROOT: ${{ github.workspace }}/pytorch
PACKAGE_TYPE: wheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu
SKIP_ALL_TESTS: 1
DESIRED_PYTHON: "3.14"
steps:
# NOTE: These environment variables are put here so that they can be applied on every job equally
# They are also here because setting them at a workflow level doesn't give us access to the
# runner.temp variable, which we need.
- name: Populate binary env
shell: bash
run: |
# shellcheck disable=SC2129
echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
# shellcheck disable=SC2129
echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
# shellcheck disable=SC2129
echo "MAC_PACKAGE_WORK_DIR=${RUNNER_TEMP}" >> "${GITHUB_ENV}"
- name: Install conda and dependencies
run: |
# Install conda manually; setup-miniconda changes PATH in a way that breaks the Ruby steps we run later on
curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" "https://repo.anaconda.com/miniconda/Miniconda3-py310_23.5.2-0-MacOSX-$(uname -m).sh"
chmod +x "${RUNNER_TEMP}/conda.sh"
/bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda"
echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}"
if [ -d "/Applications/Xcode_14.3.1.app" ]; then
echo "DEVELOPER_DIR=/Applications/Xcode_14.3.1.app/Contents/Developer" >> "${GITHUB_ENV}"
elif [ -d "/Applications/Xcode_13.3.1.app" ]; then
echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}"
fi
- name: Checkout PyTorch
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
submodules: recursive
path: pytorch
show-progress: false
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
working-directory: pytorch
- name: Populate binary env
run: |
# shellcheck disable=SC1091
source "${RUNNER_TEMP}/anaconda/bin/activate"
"${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh"
- name: Build PyTorch binary
run: |
# shellcheck disable=SC1091
source "${RUNNER_TEMP}/anaconda/bin/activate"
set -eux -o pipefail
# shellcheck disable=SC1090
source "${BINARY_ENV_FILE:-/Users/distiller/project/env}"
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"
# Build
USE_PYTORCH_METAL_EXPORT=1
USE_COREML_DELEGATE=1
TORCH_PACKAGE_NAME="${TORCH_PACKAGE_NAME//-/_}"
export USE_PYTORCH_METAL_EXPORT
export USE_COREML_DELEGATE
export TORCH_PACKAGE_NAME
"${PYTORCH_ROOT}/.ci/wheel/build_wheel.sh"
- name: Test PyTorch wheel
run: |
# shellcheck disable=SC1091
source "${RUNNER_TEMP}/anaconda/bin/activate"
set -eux -o pipefail
# shellcheck disable=SC1090
source "${BINARY_ENV_FILE:-/Users/distiller/project/env}"
pip uninstall -y "$TORCH_PACKAGE_NAME" || true
pip uninstall -y "$TORCH_PACKAGE_NAME" || true
# Create new "clean" conda environment for testing
SMOKE_TEST_PARAMS=""
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
# shellcheck disable=SC2153
case $DESIRED_PYTHON in
3.14t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.14)
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.13t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
;;
*)
# shellcheck disable=SC2153
desired_python=${DESIRED_PYTHON}
;;
esac
# shellcheck disable=SC2086
conda create -yn "test_conda_env" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS} ${EXTRA_CONDA_INSTALL_FLAGS}
conda activate test_conda_env
pip install "$PYTORCH_FINAL_PACKAGE_DIR"/*.whl numpy -v
# shellcheck disable=SC2086
python "${PYTORCH_ROOT}/.ci/pytorch/smoke_test/smoke_test.py" --package torchonly ${SMOKE_TEST_PARAMS}
- uses: actions/upload-artifact@v4.4.0
if: always()
with:
name: wheel-py3_14-cpu
retention-days: 14
if-no-files-found: error
path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}"
wheel-py3_14-cpu-upload: # Uploading
if: ${{ github.repository_owner == 'pytorch' }}
permissions:
id-token: write
contents: read
needs: wheel-py3_14-cpu-build
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: wheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu
DOCKER_IMAGE: manylinux2_28-builder
DOCKER_IMAGE_TAG_PREFIX: cpu
DESIRED_PYTHON: "3.14"
build_name: wheel-py3_14-cpu
use_s3: False
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
uses: ./.github/workflows/_binary-upload.yml
wheel-py3_14t-cpu-build:
if: ${{ github.repository_owner == 'pytorch' }}
runs-on: macos-14-xlarge
timeout-minutes: 240
env:
PYTORCH_ROOT: ${{ github.workspace }}/pytorch
PACKAGE_TYPE: wheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu
SKIP_ALL_TESTS: 1
DESIRED_PYTHON: "3.14t"
steps:
# NOTE: These environment variables are put here so that they can be applied on every job equally
# They are also here because setting them at a workflow level doesn't give us access to the
# runner.temp variable, which we need.
- name: Populate binary env
shell: bash
run: |
# shellcheck disable=SC2129
echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
# shellcheck disable=SC2129
echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
# shellcheck disable=SC2129
echo "MAC_PACKAGE_WORK_DIR=${RUNNER_TEMP}" >> "${GITHUB_ENV}"
- name: Install conda and dependencies
run: |
# Install conda manually; setup-miniconda changes PATH in a way that breaks the Ruby steps we run later on
curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" "https://repo.anaconda.com/miniconda/Miniconda3-py310_23.5.2-0-MacOSX-$(uname -m).sh"
chmod +x "${RUNNER_TEMP}/conda.sh"
/bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda"
echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}"
if [ -d "/Applications/Xcode_14.3.1.app" ]; then
echo "DEVELOPER_DIR=/Applications/Xcode_14.3.1.app/Contents/Developer" >> "${GITHUB_ENV}"
elif [ -d "/Applications/Xcode_13.3.1.app" ]; then
echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}"
fi
- name: Checkout PyTorch
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
submodules: recursive
path: pytorch
show-progress: false
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
working-directory: pytorch
- name: Populate binary env
run: |
# shellcheck disable=SC1091
source "${RUNNER_TEMP}/anaconda/bin/activate"
"${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh"
- name: Build PyTorch binary
run: |
# shellcheck disable=SC1091
source "${RUNNER_TEMP}/anaconda/bin/activate"
set -eux -o pipefail
# shellcheck disable=SC1090
source "${BINARY_ENV_FILE:-/Users/distiller/project/env}"
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"
# Build
USE_PYTORCH_METAL_EXPORT=1
USE_COREML_DELEGATE=1
TORCH_PACKAGE_NAME="${TORCH_PACKAGE_NAME//-/_}"
export USE_PYTORCH_METAL_EXPORT
export USE_COREML_DELEGATE
export TORCH_PACKAGE_NAME
"${PYTORCH_ROOT}/.ci/wheel/build_wheel.sh"
- name: Test PyTorch wheel
run: |
# shellcheck disable=SC1091
source "${RUNNER_TEMP}/anaconda/bin/activate"
set -eux -o pipefail
# shellcheck disable=SC1090
source "${BINARY_ENV_FILE:-/Users/distiller/project/env}"
pip uninstall -y "$TORCH_PACKAGE_NAME" || true
pip uninstall -y "$TORCH_PACKAGE_NAME" || true
# Create new "clean" conda environment for testing
SMOKE_TEST_PARAMS=""
EXTRA_CONDA_INSTALL_FLAGS=""
CONDA_ENV_CREATE_FLAGS=""
# shellcheck disable=SC2153
case $DESIRED_PYTHON in
3.14t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.14)
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge/label/python_rc -c conda-forge"
desired_python="3.14.0rc1"
;;
3.13t)
CONDA_ENV_CREATE_FLAGS="python-freethreading"
EXTRA_CONDA_INSTALL_FLAGS="-c conda-forge"
desired_python="3.13"
;;
*)
# shellcheck disable=SC2153
desired_python=${DESIRED_PYTHON}
;;
esac
# shellcheck disable=SC2086
conda create -yn "test_conda_env" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS} ${EXTRA_CONDA_INSTALL_FLAGS}
conda activate test_conda_env
pip install "$PYTORCH_FINAL_PACKAGE_DIR"/*.whl numpy -v
# shellcheck disable=SC2086
python "${PYTORCH_ROOT}/.ci/pytorch/smoke_test/smoke_test.py" --package torchonly ${SMOKE_TEST_PARAMS}
- uses: actions/upload-artifact@v4.4.0
if: always()
with:
name: wheel-py3_14t-cpu
retention-days: 14
if-no-files-found: error
path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}"
wheel-py3_14t-cpu-upload: # Uploading
if: ${{ github.repository_owner == 'pytorch' }}
permissions:
id-token: write
contents: read
needs: wheel-py3_14t-cpu-build
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: wheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: cpu
GPU_ARCH_TYPE: cpu
DOCKER_IMAGE: manylinux2_28-builder
DOCKER_IMAGE_TAG_PREFIX: cpu
DESIRED_PYTHON: "3.14t"
build_name: wheel-py3_14t-cpu
use_s3: False
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
uses: ./.github/workflows/_binary-upload.yml


@ -4,9 +4,12 @@ on:
pull_request:
paths:
- .github/workflows/h100-cutlass-backend.yml
- torch/_inductor/codegen/cuda/**
- test/inductor/test_cutlass_backend.py
- test/inductor/test_cutlass_evt.py
workflow_dispatch:
schedule:
- cron: 22 9 * * * # every 24 hours about 2:22am PDT
- cron: 22 9,21 * * * # every 12 hours
push:
tags:
- ciflow/h100-cutlass-backend/*


@ -93,7 +93,7 @@ jobs:
script: |
CHANGED_FILES="${{ needs.get-changed-files.outputs.changed-files }}"
echo "Running mypy"
ADDITIONAL_LINTRUNNER_ARGS="--take MYPY --all-files" .github/scripts/lintrunner.sh
ADDITIONAL_LINTRUNNER_ARGS="--take MYPY,MYPYSTRICT --all-files" .github/scripts/lintrunner.sh
lintrunner-noclang:
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
@ -111,9 +111,9 @@ jobs:
CHANGED_FILES="${{ needs.get-changed-files.outputs.changed-files }}"
echo "Running all other linters"
if [ "$CHANGED_FILES" = '*' ]; then
ADDITIONAL_LINTRUNNER_ARGS="--skip CLANGTIDY,CLANGFORMAT,MYPY --all-files" .github/scripts/lintrunner.sh
ADDITIONAL_LINTRUNNER_ARGS="--skip CLANGTIDY,CLANGFORMAT,MYPY,MYPYSTRICT --all-files" .github/scripts/lintrunner.sh
else
ADDITIONAL_LINTRUNNER_ARGS="--skip CLANGTIDY,CLANGFORMAT,MYPY ${CHANGED_FILES}" .github/scripts/lintrunner.sh
ADDITIONAL_LINTRUNNER_ARGS="--skip CLANGTIDY,CLANGFORMAT,MYPY,MYPYSTRICT ${CHANGED_FILES}" .github/scripts/lintrunner.sh
fi
quick-checks:

.github/workflows/vllm.yml (new file, 45 lines)

@ -0,0 +1,45 @@
name: vllm-test
on:
push:
tags:
- ciflow/vllm/*
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
permissions:
id-token: write
contents: read
jobs:
get-label-type:
name: get-label-type
uses: pytorch/pytorch/.github/workflows/_runner-determinator.yml@main
if: ${{ (github.event_name != 'schedule' || github.repository == 'pytorch/pytorch') && github.repository_owner == 'pytorch' }}
with:
triggering_actor: ${{ github.triggering_actor }}
issue_owner: ${{ github.event.pull_request.user.login || github.event.issue.user.login }}
curr_branch: ${{ github.head_ref || github.ref_name }}
curr_ref_type: ${{ github.ref_type }}
opt_out_experiments: lf
torch-build-sm89:
name: sm89-vllm-test
uses: ./.github/workflows/_linux-build.yml
needs: get-label-type
with:
build-additional-packages: "vision audio torchao"
build-external-packages: "vllm"
build-environment: linux-jammy-cuda12.8-py3.12-gcc11-sm89
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-vllm
cuda-arch-list: '8.9'
runner: linux.24xlarge.memory
test-matrix: |
{ include: [
{ config: "vllm_basic_correctness_test", shard: 1, num_shards: 1, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g6.4xlarge.experimental.nvidia.gpu" },
{ config: "vllm_basic_models_test", shard: 1, num_shards: 1, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g6.4xlarge.experimental.nvidia.gpu" },
]}
secrets: inherit


@ -121,7 +121,7 @@ inline int64_t legacy_cat_wrap_dim_symint(
const std::vector<std::vector<c10::SymInt>>& tensor_sizes) {
for (auto& sizes : tensor_sizes) {
if (sizes.size() == 1) {
if (TORCH_GUARD_SIZE_OBLIVIOUS(sizes[0].sym_eq(0))) {
if (TORCH_GUARD_OR_FALSE(sizes[0].sym_eq(0))) {
continue;
}
}
@ -135,7 +135,7 @@ inline int64_t legacy_cat_wrap_dim(
const MaterializedITensorListRef& tensors) {
for (const Tensor& tensor : tensors) {
if (tensor.dim() == 1) {
if (TORCH_GUARD_SIZE_OBLIVIOUS(tensor.sym_sizes()[0].sym_eq(0))) {
if (TORCH_GUARD_OR_FALSE(tensor.sym_sizes()[0].sym_eq(0))) {
continue;
}
}
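This hunk swaps a size-oblivious guard for TORCH_GUARD_OR_FALSE, which treats an undecidable symbolic comparison as false instead of raising a data-dependent error. A rough Python-side analogue, assuming the guard_or_false helper from torch.fx.experimental.symbolic_shapes (present in recent builds; treat as illustrative):

import torch
from torch.fx.experimental.symbolic_shapes import guard_or_false

def skip_empty_1d(t: torch.Tensor) -> bool:
    # Mirrors legacy_cat_wrap_dim: skip a 1-D tensor only when its sole
    # dimension is provably zero; an undecidable symbolic size yields
    # False rather than an error.
    return t.dim() == 1 and guard_or_false(t.size(0) == 0)

print(skip_empty_1d(torch.empty(0)))  # True
print(skip_empty_1d(torch.ones(3)))   # False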


@ -411,7 +411,8 @@ Tensor fbgemm_pack_gemm_matrix_fp16(const Tensor& weight) {
Tensor fbgemm_linear_fp16_weight_fp32_activation(
const Tensor& input,
const Tensor& packed_weight,
const std::optional<Tensor>& bias) {
const std::optional<Tensor>& bias,
at::Tensor& output) {
TORCH_WARN_ONCE("fbgemm_linear_fp16_weight_fp32_activation is deprecated "
"and will be removed in a future PyTorch release.")
@ -436,9 +437,11 @@ Tensor fbgemm_linear_fp16_weight_fp32_activation(
// NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions)
const int64_t M = size_to_dim_(input.dim() - 1, input.sizes());
const int64_t N = packed_weight_fp16.numCols();
std::vector<int64_t> output_size = input.sizes().vec();
output_size.back() = N;
Tensor output = at::empty(output_size, input.options().dtype(at::kFloat));
// Resize output Tensor
output.resize_(output_size);
// Call the fp16 gemm interface
fbgemm::cblas_gemm_compute(
@ -460,6 +463,14 @@ Tensor fbgemm_linear_fp16_weight_fp32_activation(
return output;
}
Tensor fbgemm_linear_fp16_weight_fp32_activation(
const Tensor& input,
const Tensor& packed_weight,
const std::optional<Tensor>& bias) {
at::Tensor output = at::empty({0}, input.options().dtype(at::kFloat));
return at::native::fbgemm_linear_fp16_weight_fp32_activation(input, packed_weight, bias, output);
}
Tensor fbgemm_linear_fp16_weight(
const Tensor& input,
const Tensor& packed_weight,
@ -468,6 +479,15 @@ Tensor fbgemm_linear_fp16_weight(
input, packed_weight, bias);
}
Tensor fbgemm_linear_fp16_weight(
const Tensor& input,
const Tensor& packed_weight,
const Tensor& bias,
at::Tensor& output) {
return at::native::fbgemm_linear_fp16_weight_fp32_activation(
input, packed_weight, bias, output);
}
#else // USE_FBGEMM
Tensor fbgemm_linear_int8_weight_fp32_activation(
@ -554,6 +574,21 @@ Tensor fbgemm_pack_gemm_matrix_fp16(const Tensor& weight) {
false, "This PyTorch installation was not built with FBGEMM operators");
}
Tensor fbgemm_linear_fp16_weight_fp32_activation(
const Tensor& input,
const Tensor& packed_weight,
const std::optional<Tensor>& bias,
at::Tensor& output) {
TORCH_WARN_ONCE("fbgemm_linear_fp16_weight_fp32_activation is deprecated "
"and will be removed in a future PyTorch release.")
// We make a strong guarantee that models using these operators will have the
// same numerics across different machines. Therefore, we do not provide a
// fallback path and rather fail loudly if we cannot run FBGEMM.
TORCH_CHECK(
false, "This PyTorch installation was not built with FBGEMM operators");
}
Tensor fbgemm_linear_fp16_weight_fp32_activation(
const Tensor& input,
const Tensor& packed_weight,
@ -568,6 +603,21 @@ Tensor fbgemm_linear_fp16_weight_fp32_activation(
false, "This PyTorch installation was not built with FBGEMM operators");
}
Tensor fbgemm_linear_fp16_weight(
const Tensor& input,
const Tensor& packed_weight,
const Tensor& bias,
at::Tensor& output) {
TORCH_WARN_ONCE("fbgemm_linear_fp16_weight is deprecated "
"and will be removed in a future PyTorch release.")
// We make a strong guarantee that models using these operators will have the
// same numerics across different machines. Therefore, we do not provide a
// fallback path and rather fail loudly if we cannot run FBGEMM.
TORCH_CHECK(
false, "This PyTorch installation was not built with FBGEMM operators");
}
Tensor fbgemm_linear_fp16_weight(
const Tensor& input,
const Tensor& packed_weight,


@ -18,6 +18,7 @@
#include <ATen/ops/is_set_to_native.h>
#include <ATen/ops/size_native.h>
#include <ATen/ops/stride_native.h>
#include <ATen/ops/sym_is_contiguous_native.h>
#include <ATen/ops/sym_numel_native.h>
#include <ATen/ops/sym_size_native.h>
#include <ATen/ops/sym_storage_offset_native.h>
@ -57,6 +58,12 @@ c10::SymInt sym_size(const Tensor& self, int64_t dim) {
return self.sym_size(dim);
}
c10::SymBool sym_is_contiguous(
const Tensor& self,
c10::MemoryFormat memory_format) {
return self.sym_is_contiguous(memory_format);
}
c10::SymInt sym_stride(const Tensor& self, int64_t dim) {
return self.sym_stride(dim);
}
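The new native wrapper simply forwards to Tensor::sym_is_contiguous. For ordinary eager tensors the observable behavior matches the usual contiguity query under a memory format; the SymBool return type only matters under symbolic shapes, where the answer may be undecided rather than a concrete bool. A plain-eager illustration:

import torch

t = torch.randn(2, 3, 4, 5)
assert t.is_contiguous(memory_format=torch.contiguous_format)
assert not t.is_contiguous(memory_format=torch.channels_last)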


@ -148,6 +148,56 @@ namespace fe = cudnn_frontend;
#define MAX_MHA_DIM 4
// Whether we will use ragged offsets in the dense (non-nested) path
// to avoid recompilation
bool use_ragged_in_dense(
const Tensor& q,
const Tensor& k,
const Tensor& v,
const Tensor& o,
bool has_bias) {
static bool flag =
c10::utils::check_env("TORCH_CUDNN_SDPA_AVOID_RECOMPILE") == true;
if (!flag) {
return flag;
}
TORCH_WARN_ONCE(
"TORCH_CUDNN_SDPA_AVOID_RECOMPILE=1 is currently experimental. "
"Please report any issues to https://github.com/pytorch/pytorch/issues.");
if (has_bias) {
TORCH_WARN_ONCE(
"TORCH_CUDNN_SDPA_AVOID_RECOMPILE=1 only works without bias."
"Consider using the is_causal hint instead of bias for causal masking."
"Falling back to regular dense case, which may trigger excessive recompilation.");
return !has_bias;
}
bool all_bshd = q.dim() == 4 && q.transpose(1, 2).is_contiguous() &&
k.dim() == 4 && k.transpose(1, 2).is_contiguous() && v.dim() == 4 &&
v.transpose(1, 2).is_contiguous() && o.dim() == 4 &&
o.transpose(1, 2).is_contiguous();
if (!all_bshd) {
TORCH_WARN_ONCE(
"TORCH_CUDNN_SDPA_AVOID_RECOMPILE=1 only works with Q, K, V, and output in BSHD memory layout,"
"e.g., Q, K, V must be allocated with torch.randn((B, S, H, D).transpose(1, 2)."
"Falling back to regualr dense case, which may trigger excessive recompilation.");
}
return all_bshd;
}
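The all_bshd test above requires Q, K, V, and the output to be 4-D with BSHD-contiguous storage viewed as BHSD. A small sketch of tensors that pass and fail it (shapes illustrative):

import torch

B, H, S, D = 2, 4, 128, 64
q = torch.randn(B, S, H, D).transpose(1, 2)    # BSHD storage, BHSD view
assert q.dim() == 4 and q.transpose(1, 2).is_contiguous()  # passes all_bshd

p = torch.randn(B, H, S, D)                    # BHSD-contiguous allocation
assert not p.transpose(1, 2).is_contiguous()   # fails the all_bshd check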
// Round dim up to the next power of two (0 maps to 1) by smearing the
// highest set bit into all lower bits, then incrementing.
int roundup_power2(int dim) {
if (!dim) {
return 1;
}
dim--;
dim |= dim >> 1;
dim |= dim >> 2;
dim |= dim >> 4;
dim |= dim >> 8;
dim |= dim >> 16;
dim++;
return dim;
}
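A quick Python cross-check of the bit-smearing loop above (same contract: 0 maps to 1, otherwise round up to the next power of two):

def roundup_power2(n: int) -> int:
    return 1 if n == 0 else 1 << (n - 1).bit_length()

assert [roundup_power2(n) for n in (0, 1, 5, 64, 65)] == [1, 1, 8, 64, 128]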
struct MHAParams {
c10::DeviceIndex device_id;
fe::DataType_t dataType;
@ -171,6 +221,7 @@ struct MHAParams {
// might be redundant if we take 0 dim/stride
// as signaling no-bias
bool has_attn_bias;
bool use_ragged;
};
void setMHAParams(
@ -228,6 +279,20 @@ void setMHAParams(
std::copy(k.strides().begin(), k.strides().end(), params.k_stride.begin());
std::copy(v.sizes().begin(), v.sizes().end(), params.v_dim.begin());
std::copy(v.strides().begin(), v.strides().end(), params.v_stride.begin());
bool use_ragged = use_ragged_in_dense(q, k, v, q, params.has_attn_bias);
params.use_ragged = use_ragged;
if (use_ragged) {
// ignore the batch (B) stride in the BSHD (THD) avoid-recompile path
params.q_stride[0] = INT_MAX;
params.k_stride[0] = INT_MAX;
params.v_stride[0] = INT_MAX;
// fix seqlen to rounded value
params.s_q = roundup_power2(params.s_q);
params.s_kv = roundup_power2(params.s_kv);
params.q_dim[2] = roundup_power2(params.q_dim[2]);
params.k_dim[2] = roundup_power2(params.k_dim[2]);
params.v_dim[2] = roundup_power2(params.v_dim[2]);
}
// uninit is OK as the struct is memset 0'd
if (params.has_attn_bias) {
std::copy(
@ -277,15 +342,29 @@ struct MHACacheKeyWrapper : ParamsWrapper<MHAParams> {
template <typename T, typename KeyType>
struct MHAGraphCache {
std::unordered_map<KeyType, T, ParamsWrapperHash<KeyType>> engine_cache;
int count = 0;
int hits = 0;
// no mutexes here as caches are now thread local for v8, can also return a
// pointer to the Execution Plan if we know it will not be invalidated by
// another thread
T* find(const KeyType& key) {
static bool flag =
c10::utils::check_env("TORCH_CUDNN_SDPA_CACHE_DEBUG") == true;
if (flag && count) {
TORCH_WARN(
"SDPA Cache Called ",
count,
" times. Hit rate: ",
100 * hits / count,
"%");
}
count++;
auto it = engine_cache.find(key);
if (it == engine_cache.end()) {
return nullptr;
}
hits++;
return &(it->second);
}
@ -402,6 +481,25 @@ auto build_graph(
.set_is_inference(return_softmaxstats == false)
.set_causal_mask(is_causal)
.set_attn_scale(attn_scale);
if (use_ragged_in_dense(q, k, v, o, attn_bias.has_value())) {
auto SEQ_LEN_Q_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(SEQ_LEN_Q)
.set_name("Seq_q")
.set_dim({b, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto SEQ_LEN_KV_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(SEQ_LEN_KV)
.set_name("Seq_kv")
.set_dim({b, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
scaled_dot_product_flash_attention_options.set_seq_len_q(SEQ_LEN_Q_)
.set_seq_len_kv(SEQ_LEN_KV_)
.set_padding_mask(true);
}
if (dropout_probability != 0.0f) {
auto seed = mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(SEED)
@ -425,23 +523,11 @@ auto build_graph(
dropout_probability, seed, offset);
}
auto Q_ = mha_graph->tensor(
fe::graph::Tensor_attributes()
.set_uid(Q)
.set_name("Q")
.set_dim(q.sizes().vec())
.set_stride(fixSizeOneDimStrideSDPA(q.sizes(), q.strides().vec())));
fe::graph::Tensor_attributes().set_uid(Q).set_name("Q"));
auto K_ = mha_graph->tensor(
fe::graph::Tensor_attributes()
.set_uid(K)
.set_name("K")
.set_dim(k.sizes().vec())
.set_stride(fixSizeOneDimStrideSDPA(k.sizes(), k.strides().vec())));
fe::graph::Tensor_attributes().set_uid(K).set_name("K"));
auto V_ = mha_graph->tensor(
fe::graph::Tensor_attributes()
.set_uid(V)
.set_name("V")
.set_dim(v.sizes().vec())
.set_stride(fixSizeOneDimStrideSDPA(v.sizes(), v.strides().vec())));
fe::graph::Tensor_attributes().set_uid(V).set_name("V"));
std::optional<std::shared_ptr<fe::graph::Tensor_attributes>> bias;
if (attn_bias.has_value()) {
bias =
@ -455,12 +541,90 @@ auto build_graph(
auto [O_, Stats] =
mha_graph->sdpa(Q_, K_, V_, scaled_dot_product_flash_attention_options);
O_->set_uid(O);
O_->set_output(true).set_dim(o.sizes().vec()).set_stride(o.strides().vec());
O_->set_uid(O).set_output(true);
if (Stats) {
Stats->set_uid(LSE);
Stats->set_output(true).set_data_type(fe::DataType_t::FLOAT);
Stats->set_uid(LSE)
.set_output(true)
.set_data_type(fe::DataType_t::FLOAT)
.set_stride(softmaxstats.strides().vec());
}
if (use_ragged_in_dense(q, k, v, o, attn_bias.has_value())) {
auto RAG_Q_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_Q_OFF)
.set_name("cum_seq_q")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto RAG_K_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_K_OFF)
.set_name("cum_seq_k")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto RAG_V_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_V_OFF)
.set_name("cum_seq_v")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto RAG_O_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_O_OFF)
.set_name("cum_seq_o")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto RAG_STATS_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_LSE_OFF)
.set_name("cum_seq_stats")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
O_->set_ragged_offset(RAG_O_OFF_);
Q_->set_ragged_offset(RAG_Q_OFF_);
K_->set_ragged_offset(RAG_K_OFF_);
V_->set_ragged_offset(RAG_V_OFF_);
auto qsizevec = q.sizes().vec();
auto ksizevec = k.sizes().vec();
auto vsizevec = v.sizes().vec();
auto osizevec = o.sizes().vec();
qsizevec[2] = roundup_power2(qsizevec[2]);
ksizevec[2] = roundup_power2(ksizevec[2]);
vsizevec[2] = roundup_power2(vsizevec[2]);
osizevec[2] = roundup_power2(osizevec[2]);
// We already checked for BSHD contiguity; set fake strides here because
// cuDNN complains if, e.g., a ragged dim's stride is smaller than a
// non-ragged one's: consider an HBSD tensor where H is 1.
Q_->set_dim(qsizevec).set_stride(
{INT_MAX, qsizevec[3], qsizevec[1] * qsizevec[3], 1});
K_->set_dim(ksizevec).set_stride(
{INT_MAX, ksizevec[3], ksizevec[1] * ksizevec[3], 1});
V_->set_dim(vsizevec).set_stride(
{INT_MAX, vsizevec[3], vsizevec[1] * vsizevec[3], 1});
O_->set_dim(osizevec).set_stride(
{INT_MAX, osizevec[3], osizevec[1] * osizevec[3], 1});
if (Stats) {
Stats->set_ragged_offset(RAG_STATS_OFF_);
auto statssizevec = softmaxstats.sizes().vec();
statssizevec[2] = roundup_power2(statssizevec[2]);
Stats->set_dim(statssizevec);
}
} else {
Q_->set_dim(q.sizes().vec())
.set_stride(fixSizeOneDimStrideSDPA(q.sizes(), q.strides().vec()));
K_->set_dim(k.sizes().vec())
.set_stride(fixSizeOneDimStrideSDPA(k.sizes(), k.strides().vec()));
V_->set_dim(v.sizes().vec())
.set_stride(fixSizeOneDimStrideSDPA(v.sizes(), v.strides().vec()));
O_->set_dim(o.sizes().vec())
.set_stride(fixSizeOneDimStrideSDPA(o.sizes(), o.strides().vec()));
if (Stats) {
Stats->set_dim(softmaxstats.sizes().vec());
}
}
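The hard-coded stride tuples in the ragged branch encode a BHSD view over BSHD-contiguous storage, with the batch stride replaced by INT_MAX once the ragged offsets take over batch indexing. A quick check of the non-batch entries (shapes illustrative):

import torch

B, S, H, D = 2, 16, 4, 8
q = torch.randn(B, S, H, D).transpose(1, 2)    # (B, H, S, D) view
assert q.stride() == (S * H * D, D, H * D, 1)  # matches {_, D, H*D, 1} above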
AT_CUDNN_FRONTEND_CHECK(mha_graph->validate());
@ -566,7 +730,7 @@ auto build_graph_nestedtensor(
auto q_strides = q.strides();
auto k_strides = k.strides();
auto v_strides = v.strides();
// NB: cuDNN API shape is transposed
// NB: cuDNN API shape is transposed: we pass it nominally as HTD
constexpr int strideidx0 = 1;
constexpr int strideidx1 = 0;
constexpr int strideidx2 = 2;
@ -724,21 +888,32 @@ auto build_graph_backward(
.set_name("CUDNN_SDPA_BACKWARD")
.set_causal_mask(is_causal)
.set_attn_scale(attn_scale);
auto Q_ = mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(Q)
.set_name("Q")
.set_dim(q.sizes().vec())
.set_stride(q.strides().vec()));
auto K_ = mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(K)
.set_name("K")
.set_dim(k.sizes().vec())
.set_stride(k.strides().vec()));
auto V_ = mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(V)
.set_name("V")
.set_dim(v.sizes().vec())
.set_stride(v.strides().vec()));
if (use_ragged_in_dense(q, k, v, o, attn_bias.has_value())) {
auto SEQ_LEN_Q_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(SEQ_LEN_Q)
.set_name("Seq_q")
.set_dim({b, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto SEQ_LEN_KV_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(SEQ_LEN_KV)
.set_name("Seq_kv")
.set_dim({b, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
sdpa_backward_options.set_seq_len_q(SEQ_LEN_Q_)
.set_seq_len_kv(SEQ_LEN_KV_)
.set_padding_mask(true);
}
auto Q_ = mha_graph->tensor(
fe::graph::Tensor_attributes().set_uid(Q).set_name("Q"));
auto K_ = mha_graph->tensor(
fe::graph::Tensor_attributes().set_uid(K).set_name("K"));
auto V_ = mha_graph->tensor(
fe::graph::Tensor_attributes().set_uid(V).set_name("V"));
std::optional<std::shared_ptr<fe::graph::Tensor_attributes>> bias;
if (attn_bias.has_value()) {
bias =
@ -770,31 +945,108 @@ auto build_graph_backward(
: fe::DataType_t::INT64));
sdpa_backward_options.set_dropout(dropout_probability, seed, offset);
}
auto O_ = mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(O)
.set_name("O")
.set_dim(o.sizes().vec())
.set_stride(o.strides().vec()));
auto O_ = mha_graph->tensor(
fe::graph::Tensor_attributes().set_uid(O).set_name("O"));
auto Stats = mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(LSE)
.set_name("Stats")
.set_dim(softmaxstats.sizes().vec())
.set_stride(softmaxstats.strides().vec())
.set_data_type(fe::DataType_t::FLOAT));
auto Do = mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(DO)
.set_name("DO")
.set_dim(dO.sizes().vec())
.set_stride(dO.strides().vec()));
auto Do = mha_graph->tensor(
fe::graph::Tensor_attributes().set_uid(DO).set_name("DO"));
auto [Dq, Dk, Dv] = mha_graph->sdpa_backward(
Q_, K_, V_, O_, Do, Stats, sdpa_backward_options);
Dq->set_uid(DQ);
Dq->set_output(true).set_dim(dQ.sizes().vec()).set_stride(dQ.strides().vec());
Dk->set_uid(DK);
Dk->set_output(true).set_dim(dK.sizes().vec()).set_stride(dK.strides().vec());
Dv->set_uid(DV);
Dv->set_output(true).set_dim(dV.sizes().vec()).set_stride(dV.strides().vec());
Dq->set_uid(DQ).set_output(true);
Dk->set_uid(DK).set_output(true);
Dv->set_uid(DV).set_output(true);
if (use_ragged_in_dense(q, k, v, o, attn_bias.has_value())) {
auto RAG_Q_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_Q_OFF)
.set_name("cum_seq_q")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto RAG_K_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_K_OFF)
.set_name("cum_seq_k")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto RAG_V_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_V_OFF)
.set_name("cum_seq_v")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto RAG_O_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_O_OFF)
.set_name("cum_seq_o")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
auto RAG_STATS_OFF_ =
mha_graph->tensor(fe::graph::Tensor_attributes()
.set_uid(RAG_LSE_OFF)
.set_name("cum_seq_stats")
.set_dim({b + 1, 1, 1, 1})
.set_stride({1, 1, 1, 1})
.set_data_type(fe::DataType_t::INT32));
O_->set_ragged_offset(RAG_O_OFF_);
Q_->set_ragged_offset(RAG_Q_OFF_);
K_->set_ragged_offset(RAG_K_OFF_);
V_->set_ragged_offset(RAG_V_OFF_);
Dq->set_ragged_offset(RAG_Q_OFF_);
Dk->set_ragged_offset(RAG_K_OFF_);
Dv->set_ragged_offset(RAG_V_OFF_);
Do->set_ragged_offset(RAG_O_OFF_);
auto qsizevec = q.sizes().vec();
auto ksizevec = k.sizes().vec();
auto vsizevec = v.sizes().vec();
auto osizevec = o.sizes().vec();
qsizevec[2] = roundup_power2(qsizevec[2]);
ksizevec[2] = roundup_power2(ksizevec[2]);
vsizevec[2] = roundup_power2(vsizevec[2]);
osizevec[2] = roundup_power2(osizevec[2]);
// see corresponding section in the forward about the hardcoding
// of strides here
Q_->set_dim(qsizevec).set_stride(
{INT_MAX, qsizevec[3], qsizevec[1] * qsizevec[3], 1});
K_->set_dim(ksizevec).set_stride(
{INT_MAX, ksizevec[3], ksizevec[1] * ksizevec[3], 1});
V_->set_dim(vsizevec).set_stride(
{INT_MAX, vsizevec[3], vsizevec[1] * vsizevec[3], 1});
O_->set_dim(osizevec).set_stride(
{INT_MAX, osizevec[3], osizevec[1] * osizevec[3], 1});
// strides should be identical to their forward (non-gradient) counterparts
Dq->set_dim(qsizevec).set_stride(
{INT_MAX, qsizevec[3], qsizevec[1] * qsizevec[3], 1});
Dk->set_dim(ksizevec).set_stride(
{INT_MAX, ksizevec[3], ksizevec[1] * ksizevec[3], 1});
Dv->set_dim(vsizevec).set_stride(
{INT_MAX, vsizevec[3], vsizevec[1] * vsizevec[3], 1});
Do->set_dim(osizevec).set_stride(
{INT_MAX, osizevec[3], osizevec[1] * osizevec[3], 1});
Stats->set_ragged_offset(RAG_STATS_OFF_);
auto statssizevec = softmaxstats.sizes().vec();
statssizevec[2] = roundup_power2(statssizevec[2]);
Stats->set_dim(statssizevec);
} else {
O_->set_dim(o.sizes().vec()).set_stride(o.strides().vec());
Q_->set_dim(q.sizes().vec()).set_stride(q.strides().vec());
K_->set_dim(k.sizes().vec()).set_stride(k.strides().vec());
V_->set_dim(v.sizes().vec()).set_stride(v.strides().vec());
Dq->set_dim(dQ.sizes().vec()).set_stride(dQ.strides().vec());
Dk->set_dim(dK.sizes().vec()).set_stride(dK.strides().vec());
Dv->set_dim(dV.sizes().vec()).set_stride(dV.strides().vec());
Do->set_dim(dO.sizes().vec()).set_stride(dO.strides().vec());
Stats->set_dim(softmaxstats.sizes().vec());
}
AT_CUDNN_FRONTEND_CHECK(mha_graph->validate());
AT_CUDNN_FRONTEND_CHECK(mha_graph->build_operation_graph(handle));
AT_CUDNN_FRONTEND_CHECK(
@ -1066,6 +1318,47 @@ void run_cudnn_SDP_fprop(
Tensor& o,
Tensor& dropoutseed,
Tensor& dropoutoffset) {
// do nothing if we got 0-element tensors
if (!q.numel() || !k.numel() || !v.numel()) {
return;
}
Tensor seqlen_q, seqlen_kv;
Tensor rag_off_q, rag_off_k, rag_off_v, rag_off_o, rag_off_lse;
if (!o.defined()) {
// q is passed to us in BHSD dim order
alloc_with_matching_layout(q, o, {b, h, s_q, d_v});
}
bool use_ragged = use_ragged_in_dense(q, k, v, o, attn_bias.has_value());
if (return_softmaxstats && !softmaxstats.defined()) {
// TODO(eqy): investigate why cuDNN doesn't like BSH layout softmaxstats
if (!use_ragged) {
softmaxstats = at::empty({b, h, s_q, 1}, q.options().dtype(kFloat));
} else {
softmaxstats =
at::empty({b, s_q, h, 1}, q.options().dtype(kFloat)).transpose(1, 2);
}
}
if (use_ragged) {
seqlen_q = at::full({b, 1, 1, 1}, s_q, q.options().dtype(kInt));
seqlen_kv = at::full({b, 1, 1, 1}, s_kv, q.options().dtype(kInt));
auto cum_seqlen_q = at::full({b + 1, 1, 1, 1}, s_q, q.options().dtype(kInt))
.cumsum(0, kInt)
.add_(-s_q);
auto cum_seqlen_kv =
at::full({b + 1, 1, 1, 1}, s_kv, q.options().dtype(kInt))
.cumsum(0, kInt)
.add_(-s_kv);
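// cum_seqlen_* are exclusive prefix sums ([0, s, 2s, ...]): full() writes s
// into b + 1 slots, cumsum() yields [s, 2s, ..., (b + 1)s], and add_(-s)
// shifts the sequence back. Multiplying by the sequence stride below turns
// each batch's starting row into an element offset for the ragged layout.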
rag_off_q = cum_seqlen_q.mul(q.stride(-2));
rag_off_k = cum_seqlen_kv.mul(k.stride(-2));
rag_off_v = cum_seqlen_kv.mul(v.stride(-2));
rag_off_o = cum_seqlen_q.mul(o.stride(-2));
if (return_softmaxstats) {
rag_off_lse = cum_seqlen_q.mul(softmaxstats.stride(-2));
}
}
const auto dprops = at::cuda::getCurrentDeviceProperties();
auto _dropoutseed = dropoutseed;
auto _dropoutoffset = dropoutoffset;
@ -1076,21 +1369,10 @@ void run_cudnn_SDP_fprop(
}
cudnnHandle_t handle = getCudnnHandle();
if (!o.defined()) {
// q is passed to us in BHSD dim order
alloc_with_matching_layout(q, o, {b, h, s_q, d_v});
}
if (return_softmaxstats && !softmaxstats.defined()) {
// TODO(eqy): verify that this is correct
softmaxstats = at::empty({b, h, s_q}, q.options().dtype(kFloat));
}
// do nothing if we got 0-element tensors
if (!q.numel() || !k.numel() || !v.numel()) {
return;
}
// NB: The key initialization will round up sequence length, stride data etc.
// if use_ragged_in_dense is enabled (to allow multiple sequence lengths to
// reuse the same cached value/graph)
auto key = MHACacheKeyWrapper(
b,
h,
@ -1147,6 +1429,17 @@ void run_cudnn_SDP_fprop(
variant_pack[SEED] = _dropoutseed.data_ptr();
variant_pack[OFFSET] = _dropoutoffset.data_ptr();
}
if (use_ragged_in_dense(q, k, v, o, attn_bias.has_value())) {
variant_pack[SEQ_LEN_Q] = seqlen_q.data_ptr();
variant_pack[SEQ_LEN_KV] = seqlen_kv.data_ptr();
variant_pack[RAG_Q_OFF] = rag_off_q.data_ptr();
variant_pack[RAG_K_OFF] = rag_off_k.data_ptr();
variant_pack[RAG_V_OFF] = rag_off_v.data_ptr();
variant_pack[RAG_O_OFF] = rag_off_o.data_ptr();
if (return_softmaxstats) {
variant_pack[RAG_LSE_OFF] = rag_off_lse.data_ptr();
}
}
auto workspace_size = mha_graph->get_workspace_size();
auto workspace_ptr =
c10::cuda::CUDACachingAllocator::get()->allocate(workspace_size);
@ -1278,6 +1571,9 @@ void run_cudnn_SDP_bprop(
!softmaxstats.numel()) {
return;
}
Tensor seqlen_q, seqlen_kv;
Tensor rag_off_q, rag_off_k, rag_off_v, rag_off_o, rag_off_lse;
auto dprops = at::cuda::getCurrentDeviceProperties();
auto _dropoutseed = dropoutseed;
auto _dropoutoffset = dropoutoffset;
@ -1304,10 +1600,28 @@ void run_cudnn_SDP_bprop(
"with matching strides...");
#else
const auto innermost_dO_stride = dO.strides()[dO.strides().size() - 1];
if (innermost_dO_stride != 1) {
if (innermost_dO_stride != 1 ||
use_ragged_in_dense(q, k, v, o, attn_bias.has_value())) {
permute_to_matching_layout(o, dO_);
}
#endif
if (use_ragged_in_dense(q, k, v, o, attn_bias.has_value())) {
seqlen_q = at::full({b, 1, 1, 1}, s_q, q.options().dtype(kInt));
seqlen_kv = at::full({b, 1, 1, 1}, s_kv, q.options().dtype(kInt));
auto cum_seqlen_q = at::full({b + 1, 1, 1, 1}, s_q, q.options().dtype(kInt))
.cumsum(0, kInt)
.add_(-s_q);
auto cum_seqlen_kv =
at::full({b + 1, 1, 1, 1}, s_kv, q.options().dtype(kInt))
.cumsum(0, kInt)
.add_(-s_kv);
rag_off_q = cum_seqlen_q.mul(q.stride(-2));
rag_off_k = cum_seqlen_kv.mul(k.stride(-2));
rag_off_v = cum_seqlen_kv.mul(v.stride(-2));
rag_off_o = cum_seqlen_q.mul(o.stride(-2));
rag_off_lse = cum_seqlen_q.mul(softmaxstats.stride(-2));
}
cudnnHandle_t handle = getCudnnHandle();
auto key = MHACacheKeyWrapper(
b,
@ -1372,6 +1686,16 @@ void run_cudnn_SDP_bprop(
if (attn_bias.has_value()) {
variant_pack[BIAS] = attn_bias.value().data_ptr();
}
if (use_ragged_in_dense(q, k, v, o, attn_bias.has_value())) {
variant_pack[SEQ_LEN_Q] = seqlen_q.data_ptr();
variant_pack[SEQ_LEN_KV] = seqlen_kv.data_ptr();
variant_pack[RAG_Q_OFF] = rag_off_q.data_ptr();
variant_pack[RAG_K_OFF] = rag_off_k.data_ptr();
variant_pack[RAG_V_OFF] = rag_off_v.data_ptr();
variant_pack[RAG_O_OFF] = rag_off_o.data_ptr();
variant_pack[RAG_LSE_OFF] = rag_off_lse.data_ptr();
}
auto workspace_size = mha_graph->get_workspace_size();
auto workspace_ptr =
c10::cuda::CUDACachingAllocator::get()->allocate(workspace_size);

View File

@ -0,0 +1,25 @@
#pragma once
#include <c10/metal/common.h>
#ifdef __METAL__
enum class GridSamplerInterpolation { Bilinear, Nearest, Bicubic };
enum class GridSamplerPadding { Zeros, Border, Reflection };
#else
#include <ATen/native/GridSamplerUtils.h>
using at::native::GridSamplerInterpolation;
using at::native::GridSamplerPadding;
#endif
template <unsigned N = 5, typename idx_type_t = int32_t>
struct GridSamplerParams {
int32_t sampler_dims;
::c10::metal::array<idx_type_t, N> output_sizes;
::c10::metal::array<idx_type_t, N> output_strides;
::c10::metal::array<idx_type_t, N> input_sizes;
::c10::metal::array<idx_type_t, N> input_strides;
::c10::metal::array<idx_type_t, N> grid_sizes;
::c10::metal::array<idx_type_t, N> grid_strides;
GridSamplerInterpolation interpolation_mode;
GridSamplerPadding padding_mode;
bool align_corners;
};

View File

@ -0,0 +1,329 @@
#include <ATen/native/mps/kernels/GridSampler.h>
#include <c10/metal/utils.h>
#include <metal_array>
#include <metal_stdlib>
using namespace metal;
using namespace c10::metal;
struct GridSamplerOffsets {
int32_t output;
int32_t input;
int32_t grid;
GridSamplerOffsets() : output(0), input(0), grid(0) {}
};
// Find offsets into the tensors that this thread will operate on,
// based on the thread ID.
static GridSamplerOffsets find_grid_sampler_offsets(
constant int32_t* output_sizes,
constant int32_t* output_strides,
constant int32_t* input_sizes,
constant int32_t* input_strides,
constant int32_t* grid_sizes,
constant int32_t* grid_strides,
int32_t sampler_dims,
uint tid) {
auto dims = sampler_dims + 2;
auto output_idx = static_cast<int32_t>(tid);
GridSamplerOffsets offsets;
for (auto dim = dims - 1; dim >= 0; dim--) {
auto dim_idx = output_idx % output_sizes[dim];
output_idx = output_idx / output_sizes[dim];
// Select the output element that this thread will calculate.
// output shape:
// 2 sampler dims: (N, C, Hout, Wout)
// 3 sampler dims: (N, C, Dout, Hout, Wout)
offsets.output += output_strides[dim] * dim_idx;
// Select the batch and channel for the input.
// input shape:
// 2 sampler dims: (N, C, Hin, Win)
// 3 sampler dims: (N, C, Din, Hin, Win)
if (dim < 2) {
offsets.input += input_strides[dim] * dim_idx;
}
// Select the grid coordinates for the output element.
// grid shape:
// 2 sampler dims: (N, Hout, Wout, 2)
// 3 sampler dims: (N, Dout, Hout, Wout, 3)
if (dim == 0) {
offsets.grid += grid_strides[dim] * dim_idx;
} else if (dim >= 2) {
offsets.grid += grid_strides[dim - 1] * dim_idx;
}
}
return offsets;
}
// Mod function which gives positive output when `a` is negative
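// e.g. mod(-3, 5) == 2, whereas -3 % 5 == -3 under C-style truncation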
static int32_t mod(int32_t a, int32_t b) {
auto r = a % b;
return r + (r < 0 ? b : 0);
}
// Sentinel index value to indicate zero padding
constant int32_t IDX_ZERO = -1;
// Apply padding to an index into the input
static int32_t pad_input_index(
int32_t idx,
int32_t input_size,
GridSamplerPadding padding_mode,
bool align_corners) {
int32_t idx_padded = idx;
if (padding_mode == GridSamplerPadding::Zeros) {
idx_padded = (idx < 0) ? IDX_ZERO : idx_padded;
idx_padded = (idx >= input_size) ? IDX_ZERO : idx_padded;
} else if (padding_mode == GridSamplerPadding::Border) {
idx_padded = (idx < 0) ? 0 : idx_padded;
idx_padded = (idx >= input_size) ? input_size - 1 : idx_padded;
} else if (padding_mode == GridSamplerPadding::Reflection) {
auto scale_length = align_corners ? (input_size - 1) : input_size;
auto idx_mod = mod(idx, scale_length);
auto idx_mod_reverse = (input_size - 1) - idx_mod;
bool is_reverse = (abs(idx - idx_mod) / scale_length) % 2 == 1;
idx_padded = is_reverse ? idx_mod_reverse : idx_mod;
}
return idx_padded;
}
template <int32_t dims, typename T>
T get_tensor_val(
constant T* input,
constant int32_t* input_strides,
int32_t indices[dims]) {
bool found_idx_zero = false;
int32_t offset = 0;
for (auto dim = 0; dim < dims; dim++) {
auto idx = indices[dim];
found_idx_zero = found_idx_zero || (idx == IDX_ZERO);
offset += (found_idx_zero ? 0 : idx) * input_strides[dim];
}
return found_idx_zero ? 0 : input[offset];
}
// This function performs 3D linear interpolation for one value. One way to
// think of how this works is to imagine a unit cube where each corner of the
// cube has one scalar value associated with it. Inside the cube, the values
// change linearly, so the gradient is constant. The values associated with each
// corner are given by the `input`, indexed at all eight different combinations
// of the `left_indices` and `right_indices`. Given a 3D coordinate anywhere
// within the cube, specified by the `scales` argument, we must calculate the
// value associated with that position.
template <typename T>
T interpolate_linear_3d(
constant T* input,
constant int32_t* input_strides,
int32_t left_indices[3],
int32_t right_indices[3],
opmath_t<T> scales[3]) {
int32_t a_idx[3] = {left_indices[0], left_indices[1], left_indices[2]};
int32_t b_idx[3] = {left_indices[0], left_indices[1], right_indices[2]};
int32_t c_idx[3] = {left_indices[0], right_indices[1], left_indices[2]};
int32_t d_idx[3] = {left_indices[0], right_indices[1], right_indices[2]};
int32_t e_idx[3] = {right_indices[0], left_indices[1], left_indices[2]};
int32_t f_idx[3] = {right_indices[0], left_indices[1], right_indices[2]};
int32_t g_idx[3] = {right_indices[0], right_indices[1], left_indices[2]};
int32_t h_idx[3] = {right_indices[0], right_indices[1], right_indices[2]};
auto a =
static_cast<opmath_t<T>>(get_tensor_val<3>(input, input_strides, a_idx));
auto b =
static_cast<opmath_t<T>>(get_tensor_val<3>(input, input_strides, b_idx));
auto c =
static_cast<opmath_t<T>>(get_tensor_val<3>(input, input_strides, c_idx));
auto d =
static_cast<opmath_t<T>>(get_tensor_val<3>(input, input_strides, d_idx));
auto e =
static_cast<opmath_t<T>>(get_tensor_val<3>(input, input_strides, e_idx));
auto f =
static_cast<opmath_t<T>>(get_tensor_val<3>(input, input_strides, f_idx));
auto g =
static_cast<opmath_t<T>>(get_tensor_val<3>(input, input_strides, g_idx));
auto h =
static_cast<opmath_t<T>>(get_tensor_val<3>(input, input_strides, h_idx));
auto scale0_right = scales[0];
auto scale1_right = scales[1];
auto scale2_right = scales[2];
auto scale0_left = 1 - scale0_right;
auto scale1_left = 1 - scale1_right;
auto scale2_left = 1 - scale2_right;
return static_cast<T>(
scale0_left * scale1_left * scale2_left * a +
scale0_left * scale1_left * scale2_right * b +
scale0_left * scale1_right * scale2_left * c +
scale0_left * scale1_right * scale2_right * d +
scale0_right * scale1_left * scale2_left * e +
scale0_right * scale1_left * scale2_right * f +
scale0_right * scale1_right * scale2_left * g +
scale0_right * scale1_right * scale2_right * h);
}
// Calculates a single output element.
// `input` shape:
// 2 sampler dims: (Hin, Win)
// 3 sampler dims: (Din, Hin, Win)
// `coords` values:
// 2 sampler dims: (Wcoord, Hcoord)
// 3 sampler dims: (Wcoord, Hcoord, Dcoord)
template <typename T>
void grid_sampler_single_element(
device T* output,
constant T* input,
constant T* coords,
int32_t dims,
constant int32_t* input_sizes,
constant int32_t* input_strides,
GridSamplerInterpolation interpolation_mode,
GridSamplerPadding padding_mode,
bool align_corners) {
int32_t left_indices[3];
int32_t right_indices[3];
opmath_t<T> scales[3];
// For each dimension, find the pair of indices in the corresponding dimension
// of `input` which surround the grid coordinate in that dimension. We'll do
// this by mapping different coordinate spaces onto each other. There are
// basically three different coordinate spaces to keep in mind:
//
// * aligned grid space
// - `-1` refers to the leftmost input value.
// - `1` refers to the rightmost input value.
//
// * unaligned grid space
// - `-1` refers to the midpoint between the leftmost input value and
// a padding value to the left of that.
// - `1` refers to the midpoint between the rightmost input value and
// a padding value to the right of that.
//
// * input index space
// - `n` refers to the n-th value of the input.
// - `0` refers to the leftmost input value.
// - `N-1` refers to the rightmost input value.
//
// If `align_corners == False`, then the coordinates are in unaligned grid
// space, and we will map them onto aligned grid space. If `align_corners ==
// True`, then the coordinates are already in aligned grid space.
//
// Then we will map unaligned grid space onto input index space, making it
// relatively simple to find the two input indices that surround the
// coordinate.
for (auto coord_dim = 0; coord_dim < dims; coord_dim++) {
auto input_dim = dims - coord_dim - 1;
auto input_size = input_sizes[input_dim];
auto coord = static_cast<opmath_t<T>>(coords[coord_dim]);
// Interpret nan as -1
coord = isnan(coord) ? -1 : coord;
if (!align_corners) {
// Map unaligned grid space to aligned grid space
auto corner_alignment_factor = static_cast<opmath_t<T>>(input_size) /
static_cast<opmath_t<T>>(input_size - 1);
coord = coord * corner_alignment_factor;
}
// Map aligned grid space to input index space
coord = (coord + 1) * (static_cast<opmath_t<T>>(input_size - 1) / 2);
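// Note: for align_corners == false, the two steps compose to the standard
// unaligned mapping: coord * N/(N-1) followed by (coord + 1) * (N-1)/2
// equals ((coord + 1) * N - 1) / 2, where N is the input size.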
// Get the input indices surrounding the coordinate, apply padding to them,
// and obtain the scaling factor between the two for interpolation.
auto left_idx = static_cast<int32_t>(floor(coord));
auto right_idx = static_cast<int32_t>(ceil(coord));
left_indices[input_dim] =
pad_input_index(left_idx, input_size, padding_mode, align_corners);
right_indices[input_dim] =
pad_input_index(right_idx, input_size, padding_mode, align_corners);
auto scale = coord - left_idx;
if (interpolation_mode == GridSamplerInterpolation::Nearest) {
// TODO: For some reason, rounding the scale to 0 or 1 and then using
// linear interpolation seems to work perfectly with zero padding mode,
// but we get flaky failures with border and reflection padding modes.
// Need to investigate and fix it.
scale = (scale <= 0.5) ? 0 : 1;
}
scales[input_dim] = scale;
}
// Now that we have the bounding indices and scale factor for each dimension
// of the input, we can interpolate.
if (dims == 3) {
*output = interpolate_linear_3d(
input, input_strides, left_indices, right_indices, scales);
}
}
template <typename T>
kernel void grid_sampler(
device T* output [[buffer(0)]],
constant T* input [[buffer(1)]],
constant T* grid [[buffer(2)]],
constant GridSamplerParams<5>& params [[buffer(3)]],
uint tid [[thread_position_in_grid]]) {
auto output_sizes = params.output_sizes.data();
auto output_strides = params.output_strides.data();
auto input_sizes = params.input_sizes.data();
auto input_strides = params.input_strides.data();
auto grid_sizes = params.grid_sizes.data();
auto grid_strides = params.grid_strides.data();
auto sampler_dims = params.sampler_dims;
auto offsets = find_grid_sampler_offsets(
output_sizes,
output_strides,
input_sizes,
input_strides,
grid_sizes,
grid_strides,
sampler_dims,
tid);
output += offsets.output;
input += offsets.input;
auto coords = grid + offsets.grid;
input_sizes += 2;
input_strides += 2;
auto interpolation_mode = params.interpolation_mode;
auto padding_mode = params.padding_mode;
auto align_corners = params.align_corners;
grid_sampler_single_element(
output,
input,
coords,
sampler_dims,
input_sizes,
input_strides,
interpolation_mode,
padding_mode,
align_corners);
}
#define REGISTER_GRID_SAMPLER_OP(DTYPE) \
template [[host_name("grid_sampler_" #DTYPE)]] \
kernel void grid_sampler<DTYPE>( \
device DTYPE * output [[buffer(0)]], \
constant DTYPE * input [[buffer(1)]], \
constant DTYPE * grid [[buffer(2)]], \
constant GridSamplerParams<5> & params [[buffer(3)]], \
uint tid [[thread_position_in_grid]]);
REGISTER_GRID_SAMPLER_OP(float);
REGISTER_GRID_SAMPLER_OP(half);
REGISTER_GRID_SAMPLER_OP(bfloat);

View File

@ -1,7 +1,10 @@
#define TORCH_ASSERT_ONLY_METHOD_OPERATORS
#include <ATen/mps/MPSProfiler.h>
#include <ATen/native/GridSamplerUtils.h>
#include <ATen/native/Pool.h>
#include <ATen/native/mps/MPSGraphVenturaOps.h>
#include <ATen/native/mps/OperationUtils.h>
#include <ATen/native/mps/kernels/GridSampler.h>
#ifndef AT_PER_OPERATOR_HEADERS
#include <ATen/Functions.h>
@ -9,9 +12,17 @@
#else
#include <ATen/ops/grid_sampler_2d.h>
#include <ATen/ops/grid_sampler_2d_native.h>
#include <ATen/ops/grid_sampler_3d_native.h>
#endif
namespace at::native {
#ifndef PYTORCH_JIT_COMPILE_SHADERS
static auto& lib = mps::MetalShaderLibrary::getBundledLibrary();
#else
#include <ATen/native/mps/GridSampler_metallib.h>
#endif
namespace mps {
static void grid_sampler_2d_mps_impl(Tensor& output,
const Tensor& input,
@ -120,6 +131,96 @@ static void grid_sampler_2d_mps_impl(Tensor& output,
runMPSGraph(stream, cachedGraph->graph(), feeds, outputPlaceholder);
}
}
static void grid_sampler_template(Tensor& output,
const Tensor& input,
const Tensor& grid,
int64_t _interpolation_mode,
int64_t _padding_mode,
bool align_corners,
int32_t sampler_dims,
const std::string& op_name) {
check_grid_sampler_common(input, grid);
switch (sampler_dims) {
case 2:
check_grid_sampler_2d(input, grid);
break;
case 3:
check_grid_sampler_3d(input, grid, _interpolation_mode);
break;
default:
TORCH_INTERNAL_ASSERT(false, "Only 2D and 3D sampling are supported, but got: ", sampler_dims);
}
TORCH_CHECK(input.scalar_type() == grid.scalar_type(),
"expected input and grid to have the same type, but got ",
input.scalar_type(),
" and ",
grid.scalar_type());
auto interpolation_mode = static_cast<GridSamplerInterpolation>(_interpolation_mode);
auto padding_mode = static_cast<GridSamplerPadding>(_padding_mode);
switch (interpolation_mode) {
case GridSamplerInterpolation::Bilinear:
break;
case GridSamplerInterpolation::Nearest:
TORCH_CHECK(false, op_name, ": Unsupported Nearest interpolation");
break;
case GridSamplerInterpolation::Bicubic:
TORCH_CHECK(false, op_name, ": Unsupported Bicubic interpolation");
break;
default:
TORCH_CHECK(false, op_name, ": Unrecognised interpolation mode: ", _interpolation_mode);
}
switch (padding_mode) {
case GridSamplerPadding::Zeros:
case GridSamplerPadding::Border:
case GridSamplerPadding::Reflection:
break;
default:
TORCH_CHECK(false, op_name, ": Unrecognised Padding Mode: ", _padding_mode);
}
auto input_size = input.sizes();
auto grid_size = grid.sizes();
output.resize_({input_size[0], input_size[1], grid_size[1], grid_size[2], grid_size[3]}, MemoryFormat::Contiguous);
auto dims = input.dim();
GridSamplerParams<5> params;
params.sampler_dims = sampler_dims;
params.padding_mode = padding_mode;
params.interpolation_mode = interpolation_mode;
params.align_corners = align_corners;
for (const auto dim : c10::irange(dims)) {
params.output_sizes[dim] = safe_downcast<int32_t, int64_t>(output.size(dim));
params.output_strides[dim] = safe_downcast<int32_t, int64_t>(output.stride(dim));
params.input_sizes[dim] = safe_downcast<int32_t, int64_t>(input.size(dim));
params.input_strides[dim] = safe_downcast<int32_t, int64_t>(input.stride(dim));
params.grid_sizes[dim] = safe_downcast<int32_t, int64_t>(grid.size(dim));
params.grid_strides[dim] = safe_downcast<int32_t, int64_t>(grid.stride(dim));
}
auto num_threads = output.numel();
MPSStream* mpsStream = getCurrentMPSStream();
dispatch_sync_with_rethrow(mpsStream->queue(), ^() {
@autoreleasepool {
id<MTLComputeCommandEncoder> computeEncoder = mpsStream->commandEncoder();
auto pso = lib.getPipelineStateForFunc("grid_sampler_" + scalarToMetalTypeString(input));
getMPSProfiler().beginProfileKernel(pso, op_name, {input, grid});
[computeEncoder setComputePipelineState:pso];
mtl_setArgs(computeEncoder, output, input, grid, params);
mtl_dispatch1DJob(computeEncoder, pso, num_threads);
getMPSProfiler().endProfileKernel(pso);
}
});
}
} // namespace mps
Tensor grid_sampler_2d_mps(const Tensor& input,
@ -135,4 +236,21 @@ Tensor grid_sampler_2d_mps(const Tensor& input,
return output;
}
Tensor grid_sampler_3d_mps(const Tensor& input,
const Tensor& grid,
int64_t interpolation_mode,
int64_t padding_mode,
bool align_corners) {
auto output = at::empty({0}, input.options(), MemoryFormat::Contiguous);
mps::grid_sampler_template(output,
input,
grid,
interpolation_mode,
padding_mode,
align_corners,
/*sampler_dims=*/3,
/*op_name=*/"grid_sampler_3d");
return output;
}
} // namespace at::native

View File

@ -2931,6 +2931,7 @@
dispatch:
CPU: grid_sampler_3d_cpu
CUDA: grid_sampler_3d_cuda
MPS: grid_sampler_3d_mps
autogen: grid_sampler_3d.out
# `grid_sampler_3d_backward` takes in `output_mask` to optimize performance for
@ -3447,8 +3448,12 @@
- func: fbgemm_linear_fp16_weight_fp32_activation(Tensor input, Tensor packed_weight, Tensor? bias) -> Tensor
- func: fbgemm_linear_fp16_weight_fp32_activation.out(Tensor input, Tensor packed_weight, Tensor? bias, Tensor(a!) output) -> Tensor
- func: fbgemm_linear_fp16_weight(Tensor input, Tensor packed_weight, Tensor bias) -> Tensor
- func: fbgemm_linear_fp16_weight.out(Tensor input, Tensor packed_weight, Tensor bias, Tensor(a!) output) -> Tensor
- func: fbgemm_pack_quantized_matrix(Tensor input) -> Tensor
- func: fbgemm_pack_quantized_matrix.KN(Tensor input, int K, int N) -> Tensor
@ -5504,6 +5509,13 @@
tags: core
manual_cpp_binding: True
- func: sym_is_contiguous(Tensor self, MemoryFormat memory_format=contiguous_format) -> SymBool
variants: function
device_check: NoCheck
device_guard: False
tags: core
manual_cpp_binding: True
- func: sym_numel(Tensor self) -> SymInt
variants: function
device_check: NoCheck

View File

@ -260,7 +260,7 @@ std::tuple<Tensor, Tensor, Tensor> _cudnn_attention_backward(
attn_bias_ /*const std::optional<Tensor>& attn_bias*/,
out /*const Tensor& o*/,
grad_out/*const Tensor& dO*/,
logsumexp.unsqueeze(-1)/*const Tensor& softmaxstats*/,
logsumexp/*const Tensor& softmaxstats*/,
dq/*Tensor& dQ*/,
dk/*Tensor& dK*/,
dv/*Tensor& dV*/,

View File

@ -313,8 +313,15 @@ void TensorImpl::throw_data_ptr_access_error() const {
c10::SymBool TensorImpl::sym_is_contiguous_custom(
at::MemoryFormat memory_format) const {
if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) {
return pyobj_slot_.load_pyobj_interpreter()->is_contiguous(
this, memory_format);
// To reduce BC breakage and avoid having to introduce sym_is_contiguous
// everywhere, call is_contiguous when the tensor does not have symbolic
// sizes/strides.
if (C10_UNLIKELY(has_symbolic_sizes_strides_)) {
return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(
this, memory_format);
} else {
return pyobj_slot_.load_pyobj_interpreter()->is_contiguous(
this, memory_format);
}
}
return sym_is_contiguous_default(memory_format);

View File

@ -60,6 +60,10 @@ struct NoopPyInterpreterVTable final : public PyInterpreterVTable {
bool is_contiguous(const TensorImpl* self, at::MemoryFormat) const override {
PANIC(is_contiguous);
}
c10::SymBool sym_is_contiguous(const TensorImpl* self, at::MemoryFormat)
const override {
PANIC(sym_is_contiguous);
}
bool is_strides_like(const TensorImpl* self, at::MemoryFormat)
const override {
PANIC(is_strides_like);

View File

@ -168,6 +168,9 @@ struct C10_API PyInterpreterVTable {
virtual bool is_contiguous(const TensorImpl* self, at::MemoryFormat)
const = 0;
virtual c10::SymBool sym_is_contiguous(
const TensorImpl* self,
at::MemoryFormat) const = 0;
virtual bool is_strides_like(const TensorImpl* self, at::MemoryFormat)
const = 0;
virtual bool is_non_overlapping_and_dense(const TensorImpl* self) const = 0;

Binary file not shown.


View File

@ -202,6 +202,7 @@ Below are some useful tools for debugging AOT Inductor.
logging
torch.compiler_aot_inductor_minifier
torch.compiler_aot_inductor_debugging_guide
```
To enable runtime checks on inputs, set the environment variable `AOTI_RUNTIME_CHECK_INPUTS` to 1. This will raise a `RuntimeError` if the inputs to the compiled model differ in size, data type, or strides from those used during export.
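As a minimal sketch of the check in action (a toy module; this assumes the `torch._inductor.aoti_compile_and_package` and `aoti_load_package` entry points available in recent releases):
```python
import os

# Per the AOTInductor debugging guide, this flag takes effect at codegen
# time, so set it before compiling.
os.environ["AOTI_RUNTIME_CHECK_INPUTS"] = "1"

import torch


class M(torch.nn.Module):
    def forward(self, x):
        return x.relu()


ep = torch.export.export(M(), (torch.randn(4, 8),))
pkg = torch._inductor.aoti_compile_and_package(ep)  # path to the .pt2 package
compiled = torch._inductor.aoti_load_package(pkg)

compiled(torch.randn(4, 8))  # OK: matches export-time size/dtype/strides
compiled(torch.randn(2, 8))  # expected to raise RuntimeError under the checks
```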

View File

@ -0,0 +1,73 @@
# AOTInductor Debugging Guide
If you encounter CUDA illegal memory access (IMA) errors while using [AOT Inductor](./torch.compiler_aot_inductor.md), this guide provides a systematic approach to debug such errors. AOT Inductor is part of the PT2 stack, similar to torch.compile, but it produces a compilation artifact that can work in a C++ environment. CUDA illegal memory errors can happen non-deterministically and even appear transient at times.
At a high level, there are three main steps to debugging CUDA IMA errors:
- **Sanity checks**: Use basic debugging flags to catch common issues before diving deeper.
- **Pinpoint the CUDA IMA**: Make the error deterministic and identify the problematic kernel.
- **Identify problematic kernels**: Use intermediate value debugging to inspect kernel inputs and outputs.
## Step 1: Sanity Checks
Before diving deep into reliably reproducing the error, try out some existing debugging flags:
```bash
AOTI_RUNTIME_CHECK_INPUTS=1
TORCHINDUCTOR_NAN_ASSERTS=1
```
These flags take effect at compilation time (more precisely, at codegen time):
- `AOTI_RUNTIME_CHECK_INPUTS=1` checks if the inputs satisfy the same set of guards used during compilation. See {ref}`torch.compiler_troubleshooting` for more details.
- `TORCHINDUCTOR_NAN_ASSERTS=1` adds codegen before and after each Inductor's kernel to check for NaN.
## Step 2: Pinpoint the CUDA IMA
One hard part is that CUDA IMA errors can be non-deterministic. They can happen at different locations, and sometimes not happen at all (though that just means the numerics are silently incorrect). With the following two flags, we can trigger the error deterministically:
```bash
PYTORCH_NO_CUDA_MEMORY_CACHING=1
CUDA_LAUNCH_BLOCKING=1
```
These flags take effect at runtime:
- `PYTORCH_NO_CUDA_MEMORY_CACHING=1` disables PyTorch's caching allocator, which normally allocates bigger buffers than immediately needed and reuses them to reduce the number of allocations. This reuse is usually why CUDA illegal memory access errors are non-deterministic.
![How PyTorch's caching allocator can mask CUDA illegal memory access errors](./_static/img/aoti_debugging_guide/cuda_ima_cca.png)
*Figure: How PyTorch's caching allocator can mask CUDA illegal memory access errors*
- `CUDA_LAUNCH_BLOCKING=1` forces the kernels to launch one at a time. Without this, we would get the famous "CUDA kernel errors might be asynchronously reported at some other API call" warning since kernels are launched asynchronously.
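As a sketch of combining these runtime flags (`repro.py` stands in for whatever script reproduces the failure):
```python
import os
import subprocess
import sys

# Run the repro with caching disabled and synchronous launches so the
# illegal access is reported deterministically, at the offending kernel.
env = dict(
    os.environ,
    PYTORCH_NO_CUDA_MEMORY_CACHING="1",
    CUDA_LAUNCH_BLOCKING="1",
)
subprocess.run([sys.executable, "repro.py"], env=env, check=True)
```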
## Step 3: Identify Problematic Kernels with Intermediate Value Debugger
The AOTI Intermediate Value Debugger can help pinpoint the problematic kernel and get information about the inputs and outputs of said kernel.
First, use:
```bash
AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=3
```
This flag takes effect at compilation time and, at runtime, prints each kernel as it is launched. Together with the previous flags, this lets us know which kernel was launched right before the error happened.
However, the kernel where the error surfaces is not necessarily the problematic one: an earlier kernel may have produced incorrect outputs that only trigger the failure later. The natural next step is therefore to inspect the inputs to the failing kernel:
```bash
AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT="triton_poi_fused_add_ge_logical_and_logical_or_lt_231,_add_position_embeddings_kernel_5" AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER=2
```
The `AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT` environment variable takes a comma-separated list of the kernel names you want to inspect. If the inputs to a kernel are not as expected, move on to inspecting the kernel that produced the bad input.
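The same pattern works for the Step 3 flags (a sketch; `compile_and_run.py` is a placeholder, and the kernel names are the illustrative ones from above):
```python
import os
import subprocess
import sys

# Recompile and rerun with the intermediate value printer restricted to
# the suspect kernels, so only their inputs and outputs are dumped.
env = dict(
    os.environ,
    AOT_INDUCTOR_DEBUG_INTERMEDIATE_VALUE_PRINTER="2",
    AOT_INDUCTOR_FILTERED_KERNELS_TO_PRINT=(
        "triton_poi_fused_add_ge_logical_and_logical_or_lt_231,"
        "_add_position_embeddings_kernel_5"
    ),
)
subprocess.run([sys.executable, "compile_and_run.py"], env=env, check=True)
```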
## Additional Debugging Tools
### Logging and Tracing
- **tlparse / TORCH_TRACE**: Provides the complete generated output code for inspection and records the set of guards used. See {ref}`tlparse / TORCH_TRACE <tlparse-torch-trace>` for more details.
- **TORCH_LOGS**: Use `TORCH_LOGS="+inductor,output_code"` to see more PT2 internal logs. See {ref}`TORCH_LOGS <torch-logs>` for more details.
- **TORCH_SHOW_CPP_STACKTRACES**: Set `TORCH_SHOW_CPP_STACKTRACES=1` to include C++ frames in error stack traces.
### Common Sources of Issues
- [**Dynamic shapes**](./torch.compiler_dynamic_shapes.md): Historically a source of many IMAs. Pay special attention when debugging dynamic shape scenarios.
- **Custom ops**: Especially when implemented in C++ and used with dynamic shapes; the meta function may need to be SymInt-ified.

View File

@ -192,6 +192,8 @@ For more information on dynamic shapes, see [The dynamic shapes manual](https://
## Logging Tools
(tlparse-torch-trace)=
### tlparse / TORCH_TRACE
`tlparse` / `TORCH_TRACE` are a pair of tools that produce compilation reports that look like this:
@ -252,6 +254,8 @@ Here are some insights you can gain from a `tlparse`:
For example, you can look at the high-level generated FX graph or the generated Triton code.
- Is there relevant information for a particular frame? You can find these in `compilation_metrics`.
(torch-logs)=
### TORCH_LOGS
You can use the `TORCH_LOGS` environment variable to selectively enable parts of the `torch.compile` stack to log.
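The same selection can also be made programmatically (a sketch using `torch._logging.set_logs`, which mirrors the environment variable):
```python
import logging

import torch._logging

# Equivalent to TORCH_LOGS="+inductor,output_code": verbose inductor
# logging plus the generated output code artifact.
torch._logging.set_logs(inductor=logging.DEBUG, output_code=True)
```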

View File

@ -2709,6 +2709,7 @@ TEST(ProfilerDisableInCallbackTest, Basic) {
}
TEST(RecordDebugHandles, Basic) {
GTEST_SKIP() << "Test is flaky and sometimes hangs on CI. ";
// Enable the profiler in this thread
const std::set<torch::autograd::profiler::ActivityType> activities(
{torch::autograd::profiler::ActivityType::CPU});

View File

@ -10,7 +10,10 @@ from torch.distributed.pipelining._backward import (
stage_backward_input,
stage_backward_weight,
)
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_device_type import (
instantiate_device_type_tests,
skipXPUIf,
)
from torch.testing._internal.common_utils import run_tests, TestCase
@ -19,6 +22,7 @@ batch_size = 256
class StageBackwardTests(TestCase):
@skipXPUIf(True, "https://github.com/intel/torch-xpu-ops/issues/1682")
def test_stage_backward(self, device):
# MLP as a stage module
mod = MLPModule(d_hid).to(device)
@ -93,6 +97,7 @@ class StageBackwardTests(TestCase):
# Check that the weight gradients were not updated
self.assertEqual(p.grad, None)
@skipXPUIf(True, "https://github.com/intel/torch-xpu-ops/issues/1682")
def test_stage_backward_weight(self, device):
# MLP as a stage module
mod = MLPModule(d_hid).to(device)
@ -133,6 +138,7 @@ class StageBackwardTests(TestCase):
print(f"Gradient test failed for {name}: {p.grad} vs {ref_p.grad}")
raise
@skipXPUIf(True, "https://github.com/intel/torch-xpu-ops/issues/1682")
def test_stage_backward_weight_multiple_iters(self, device):
# MLP as a stage module
mod = MLPModule(d_hid).to(device)
@ -223,7 +229,9 @@ class StageBackwardTests(TestCase):
devices = ["cpu", "cuda", "hpu", "xpu"]
instantiate_device_type_tests(StageBackwardTests, globals(), only_for=devices)
instantiate_device_type_tests(
StageBackwardTests, globals(), only_for=devices, allow_xpu=True
)
if __name__ == "__main__":
run_tests()

View File

@ -9,7 +9,10 @@ from torch.distributed.pipelining.microbatch import (
split_args_kwargs_into_chunks,
TensorChunkSpec,
)
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_device_type import (
instantiate_device_type_tests,
skipXPUIf,
)
from torch.testing._internal.common_utils import run_tests, TestCase
@ -56,6 +59,7 @@ class MicrobatchTests(TestCase):
torch.testing.assert_close(merged_kwargs, kwargs)
print("Microbatch test passed")
@skipXPUIf(True, "https://github.com/intel/torch-xpu-ops/issues/1682")
def test_chunk_spec(self, device):
mod = ModelWithKwargs().to(device)
batch_size = ModelWithKwargs.DEFAULT_BATCH_SIZE
@ -84,12 +88,15 @@ class MicrobatchTests(TestCase):
ref = mod(x, y)
out = pipe(x, y)[0]
torch.testing.assert_close(out, ref)
print(f"equivalence test passed {torch.sum(out)} ref {torch.sum(ref)}")
devices = ["cpu", "cuda", "hpu", "xpu"]
instantiate_device_type_tests(MicrobatchTests, globals(), only_for=devices)
instantiate_device_type_tests(
MicrobatchTests, globals(), only_for=devices, allow_xpu=True
)
if __name__ == "__main__":
run_tests()

View File

@ -0,0 +1,110 @@
# Owner(s): ["oncall: distributed"]
# To run:
# python test/distributed/test_cupy_as_tensor.py
from dataclasses import dataclass
import torch
from torch.multiprocessing.reductions import reduce_tensor
from torch.testing._internal.common_distributed import MultiProcContinousTest
from torch.testing._internal.common_utils import (
requires_cuda_p2p_access,
run_tests,
skipIfRocm,
)
# So that tests are written in device-agnostic way
device_type = "cuda"
device_module = torch.get_device_module(device_type)
@dataclass
class CupyWrapper:
data_ptr: int
size_in_bytes: int
@property
def __cuda_array_interface__(self):
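# Minimal v3 CUDA array interface: expose the raw pointer as a 1-D
# buffer of unsigned bytes ("|u1"); the False flag marks it writable.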
return {
"shape": (self.size_in_bytes,),
"typestr": "|u1",
"data": (self.data_ptr, False),
"version": 3,
}
def from_buffer(
data_ptr: int, size_in_bytes: int, device: str, dtype: torch.dtype
) -> torch.Tensor:
data = torch.as_tensor(CupyWrapper(data_ptr, size_in_bytes), device=device).view(
dtype
)
assert data.data_ptr() == data_ptr
return data
@requires_cuda_p2p_access()
class CupyAsTensorTest(MultiProcContinousTest):
@classmethod
def backend_str(cls):
return "gloo"
def _init_device(self) -> None:
# need to use vmm api to test it,
# see https://forums.developer.nvidia.com/t/inconsistent-behavior-of-cudapointergetattributes-between-cudamalloc-ipc-and-vmm-based-ipc/339025/5 # noqa: B950
torch.cuda.memory._set_allocator_settings("expandable_segments:True")
# init and pin the process to the device
device_module.set_device(self.device)
torch.empty(1, device=self.device)
@property
def device(self) -> torch.device:
return torch.device(device_type, self.rank)
@skipIfRocm
def test_cupy_as_tensor(self) -> None:
"""
Test that torch.as_tensor works for cupy array interface
with zero-copy when the pointer is p2p-shared across processes.
"""
self._init_device()
tensor: torch.Tensor
if self.rank == 1:
# it seems only errors from non-zero ranks are caught by this test
tensor = torch.randn(2333, device=self.device)
tensor_meta = reduce_tensor(tensor)
torch.distributed.broadcast_object_list([tensor_meta], src=1)
else:
recv_list = [None]
torch.distributed.broadcast_object_list(recv_list, src=1)
tensor_meta = recv_list[0]
func, args = tensor_meta
args = list(args)
args[6] = self.rank
ipc_tensor = func(*args)
tensor = from_buffer(
ipc_tensor.data_ptr(),
ipc_tensor.numel() * ipc_tensor.element_size(),
self.device,
ipc_tensor.dtype,
)
torch.distributed.barrier()
if self.rank == 1:
tensor.fill_(1)
device_module.synchronize()
torch.distributed.barrier()
assert tensor.allclose(torch.ones_like(tensor))
torch.distributed.barrier()
@classmethod
def tearDownClass(cls):
torch.cuda.memory._set_allocator_settings("expandable_segments:False")
super().tearDownClass()
if __name__ == "__main__":
run_tests()

View File

@ -1,5 +1,5 @@
diff --git a/test/dynamo/cpython/3_13/test_contextlib.py b/test/dynamo/cpython/3_13/test_contextlib.py
index cf651959803..51fd083b112 100644
index cf651959803..256a824932d 100644
--- a/test/dynamo/cpython/3_13/test_contextlib.py
+++ b/test/dynamo/cpython/3_13/test_contextlib.py
@@ -1,3 +1,57 @@
@ -60,7 +60,7 @@ index cf651959803..51fd083b112 100644
"""Unit tests for contextlib.py, and other context managers."""
import io
@@ -14,7 +68,7 @@ from test.support.testcase import ExceptionIsLikeMixin
@@ -14,60 +68,67 @@ from test.support.testcase import ExceptionIsLikeMixin
import weakref
@ -68,8 +68,81 @@ index cf651959803..51fd083b112 100644
+class TestAbstractContextManager(__TestCase):
def test_enter(self):
class DefaultEnter(AbstractContextManager):
@@ -67,7 +121,7 @@ class TestAbstractContextManager(unittest.TestCase):
- class DefaultEnter(AbstractContextManager):
- def __exit__(self, *args):
- super().__exit__(*args)
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class DefaultEnter(AbstractContextManager):
+ def __exit__(self, *args):
+ super().__exit__(*args)
manager = DefaultEnter()
self.assertIs(manager.__enter__(), manager)
def test_slots(self):
- class DefaultContextManager(AbstractContextManager):
- __slots__ = ()
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class DefaultContextManager(AbstractContextManager):
+ __slots__ = ()
- def __exit__(self, *args):
- super().__exit__(*args)
+ def __exit__(self, *args):
+ super().__exit__(*args)
with self.assertRaises(AttributeError):
DefaultContextManager().var = 42
def test_exit_is_abstract(self):
- class MissingExit(AbstractContextManager):
- pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class MissingExit(AbstractContextManager):
+ pass
with self.assertRaises(TypeError):
MissingExit()
def test_structural_subclassing(self):
- class ManagerFromScratch:
- def __enter__(self):
- return self
- def __exit__(self, exc_type, exc_value, traceback):
- return None
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class ManagerFromScratch:
+ def __enter__(self):
+ return self
+ def __exit__(self, exc_type, exc_value, traceback):
+ return None
self.assertTrue(issubclass(ManagerFromScratch, AbstractContextManager))
- class DefaultEnter(AbstractContextManager):
- def __exit__(self, *args):
- super().__exit__(*args)
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class DefaultEnter(AbstractContextManager):
+ def __exit__(self, *args):
+ super().__exit__(*args)
self.assertTrue(issubclass(DefaultEnter, AbstractContextManager))
- class NoEnter(ManagerFromScratch):
- __enter__ = None
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class NoEnter(ManagerFromScratch):
+ __enter__ = None
self.assertFalse(issubclass(NoEnter, AbstractContextManager))
- class NoExit(ManagerFromScratch):
- __exit__ = None
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class NoExit(ManagerFromScratch):
+ __exit__ = None
self.assertFalse(issubclass(NoExit, AbstractContextManager))
@ -78,7 +151,81 @@ index cf651959803..51fd083b112 100644
def test_contextmanager_plain(self):
state = []
@@ -396,7 +450,7 @@ def woohoo():
@@ -115,8 +176,9 @@ class ContextManagerTestCase(unittest.TestCase):
self.assertEqual(frames[0].line, '1/0')
# Repeat with RuntimeError (which goes through a different code path)
- class RuntimeErrorSubclass(RuntimeError):
- pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class RuntimeErrorSubclass(RuntimeError):
+ pass
try:
with f():
@@ -128,8 +190,9 @@ class ContextManagerTestCase(unittest.TestCase):
self.assertEqual(frames[0].name, 'test_contextmanager_traceback')
self.assertEqual(frames[0].line, 'raise RuntimeErrorSubclass(42)')
- class StopIterationSubclass(StopIteration):
- pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class StopIterationSubclass(StopIteration):
+ pass
for stop_exc in (
StopIteration('spam'),
@@ -169,9 +232,9 @@ class ContextManagerTestCase(unittest.TestCase):
ctx.__enter__()
with self.assertRaises(RuntimeError):
ctx.__exit__(TypeError, TypeError("foo"), None)
- if support.check_impl_detail(cpython=True):
- # The "gen" attribute is an implementation detail.
- self.assertFalse(ctx.gen.gi_suspended)
+ # if support.check_impl_detail(cpython=True):
+ # # The "gen" attribute is an implementation detail.
+ # self.assertFalse(ctx.gen.gi_suspended)
def test_contextmanager_trap_no_yield(self):
@contextmanager
@@ -191,9 +254,9 @@ class ContextManagerTestCase(unittest.TestCase):
ctx.__enter__()
with self.assertRaises(RuntimeError):
ctx.__exit__(None, None, None)
- if support.check_impl_detail(cpython=True):
- # The "gen" attribute is an implementation detail.
- self.assertFalse(ctx.gen.gi_suspended)
+ # if support.check_impl_detail(cpython=True):
+ # # The "gen" attribute is an implementation detail.
+ # self.assertFalse(ctx.gen.gi_suspended)
def test_contextmanager_non_normalised(self):
@contextmanager
@@ -230,8 +293,9 @@ class ContextManagerTestCase(unittest.TestCase):
def woohoo():
yield
- class StopIterationSubclass(StopIteration):
- pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class StopIterationSubclass(StopIteration):
+ pass
for stop_exc in (StopIteration('spam'), StopIterationSubclass('spam')):
with self.subTest(type=type(stop_exc)):
@@ -344,8 +408,9 @@ def woohoo():
self.assertEqual(target, (11, 22, 33, 44))
def test_nokeepref(self):
- class A:
- pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class A:
+ pass
@contextmanager
def woohoo(a, b):
@@ -396,7 +461,7 @@ def woohoo():
self.assertEqual(depth, 0)
@ -87,16 +234,48 @@ index cf651959803..51fd083b112 100644
@support.requires_docstrings
def test_instance_docs(self):
@@ -430,7 +484,7 @@ class ClosingTestCase(unittest.TestCase):
@@ -407,9 +472,10 @@ class ClosingTestCase(unittest.TestCase):
def test_closing(self):
state = []
- class C:
- def close(self):
- state.append(1)
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class C:
+ def close(self):
+ state.append(1)
x = C()
self.assertEqual(state, [])
with closing(x) as y:
@@ -418,9 +484,10 @@ class ClosingTestCase(unittest.TestCase):
def test_closing_error(self):
state = []
- class C:
- def close(self):
- state.append(1)
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class C:
+ def close(self):
+ state.append(1)
x = C()
self.assertEqual(state, [])
with self.assertRaises(ZeroDivisionError):
@@ -430,16 +497,17 @@ class ClosingTestCase(unittest.TestCase):
self.assertEqual(state, [1])
-class NullcontextTestCase(unittest.TestCase):
+class NullcontextTestCase(__TestCase):
def test_nullcontext(self):
class C:
pass
@@ -439,7 +493,7 @@ class NullcontextTestCase(unittest.TestCase):
- class C:
- pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class C:
+ pass
c = C()
with nullcontext(c) as c_in:
self.assertIs(c_in, c)
@ -105,7 +284,7 @@ index cf651959803..51fd083b112 100644
def testWithOpen(self):
tfn = tempfile.mktemp()
@@ -457,7 +511,7 @@ class FileContextTestCase(unittest.TestCase):
@@ -457,7 +525,7 @@ class FileContextTestCase(unittest.TestCase):
finally:
os_helper.unlink(tfn)
@ -114,7 +293,7 @@ index cf651959803..51fd083b112 100644
def boilerPlate(self, lock, locked):
self.assertFalse(locked())
@@ -520,7 +574,7 @@ class mycontext(ContextDecorator):
@@ -520,7 +588,7 @@ class mycontext(ContextDecorator):
return self.catch
@ -123,7 +302,95 @@ index cf651959803..51fd083b112 100644
@support.requires_docstrings
def test_instance_docs(self):
@@ -680,7 +734,7 @@ class TestContextDecorator(unittest.TestCase):
@@ -584,13 +652,14 @@ class TestContextDecorator(unittest.TestCase):
def test_decorating_method(self):
context = mycontext()
- class Test(object):
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class Test(object):
- @context
- def method(self, a, b, c=None):
- self.a = a
- self.b = b
- self.c = c
+ @context
+ def method(self, a, b, c=None):
+ self.a = a
+ self.b = b
+ self.c = c
# these tests are for argument passing when used as a decorator
test = Test()
@@ -612,11 +681,12 @@ class TestContextDecorator(unittest.TestCase):
def test_typo_enter(self):
- class mycontext(ContextDecorator):
- def __unter__(self):
- pass
- def __exit__(self, *exc):
- pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class mycontext(ContextDecorator):
+ def __unter__(self):
+ pass
+ def __exit__(self, *exc):
+ pass
with self.assertRaisesRegex(TypeError, 'the context manager'):
with mycontext():
@@ -624,11 +694,12 @@ class TestContextDecorator(unittest.TestCase):
def test_typo_exit(self):
- class mycontext(ContextDecorator):
- def __enter__(self):
- pass
- def __uxit__(self, *exc):
- pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class mycontext(ContextDecorator):
+ def __enter__(self):
+ pass
+ def __uxit__(self, *exc):
+ pass
with self.assertRaisesRegex(TypeError, 'the context manager.*__exit__'):
with mycontext():
@@ -636,19 +707,20 @@ class TestContextDecorator(unittest.TestCase):
def test_contextdecorator_as_mixin(self):
- class somecontext(object):
- started = False
- exc = None
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class somecontext(object):
+ started = False
+ exc = None
- def __enter__(self):
- self.started = True
- return self
+ def __enter__(self):
+ self.started = True
+ return self
- def __exit__(self, *exc):
- self.exc = exc
+ def __exit__(self, *exc):
+ self.exc = exc
- class mycontext(somecontext, ContextDecorator):
- pass
+ class mycontext(somecontext, ContextDecorator):
+ pass
context = mycontext()
@context
@@ -680,7 +752,7 @@ class TestContextDecorator(unittest.TestCase):
self.assertEqual(state, [1, 'something else', 999])
@ -132,7 +399,164 @@ index cf651959803..51fd083b112 100644
exit_stack = None
@support.requires_docstrings
@@ -1141,7 +1195,7 @@ class TestBaseExitStack:
@@ -745,13 +817,14 @@ class TestBaseExitStack:
self.assertIsNone(exc_type)
self.assertIsNone(exc)
self.assertIsNone(exc_tb)
- class ExitCM(object):
- def __init__(self, check_exc):
- self.check_exc = check_exc
- def __enter__(self):
- self.fail("Should not be called!")
- def __exit__(self, *exc_details):
- self.check_exc(*exc_details)
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class ExitCM(object):
+ def __init__(self, check_exc):
+ self.check_exc = check_exc
+ def __enter__(self):
+ self.fail("Should not be called!")
+ def __exit__(self, *exc_details):
+ self.check_exc(*exc_details)
with self.exit_stack() as stack:
stack.push(_expect_ok)
self.assertIs(stack._exit_callbacks[-1][1], _expect_ok)
@@ -770,11 +843,12 @@ class TestBaseExitStack:
1/0
def test_enter_context(self):
- class TestCM(object):
- def __enter__(self):
- result.append(1)
- def __exit__(self, *exc_details):
- result.append(3)
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class TestCM(object):
+ def __enter__(self):
+ result.append(1)
+ def __exit__(self, *exc_details):
+ result.append(3)
result = []
cm = TestCM()
@@ -789,14 +863,15 @@ class TestBaseExitStack:
self.assertEqual(result, [1, 2, 3, 4])
def test_enter_context_errors(self):
- class LacksEnterAndExit:
- pass
- class LacksEnter:
- def __exit__(self, *exc_info):
- pass
- class LacksExit:
- def __enter__(self):
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class LacksEnterAndExit:
pass
+ class LacksEnter:
+ def __exit__(self, *exc_info):
+ pass
+ class LacksExit:
+ def __enter__(self):
+ pass
with self.exit_stack() as stack:
with self.assertRaisesRegex(TypeError, 'the context manager'):
@@ -877,32 +952,33 @@ class TestBaseExitStack:
def test_exit_exception_chaining_reference(self):
# Sanity check to make sure that ExitStack chaining matches
# actual nested with statements
- class RaiseExc:
- def __init__(self, exc):
- self.exc = exc
- def __enter__(self):
- return self
- def __exit__(self, *exc_details):
- raise self.exc
-
- class RaiseExcWithContext:
- def __init__(self, outer, inner):
- self.outer = outer
- self.inner = inner
- def __enter__(self):
- return self
- def __exit__(self, *exc_details):
- try:
- raise self.inner
- except:
- raise self.outer
-
- class SuppressExc:
- def __enter__(self):
- return self
- def __exit__(self, *exc_details):
- type(self).saved_details = exc_details
- return True
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class RaiseExc:
+ def __init__(self, exc):
+ self.exc = exc
+ def __enter__(self):
+ return self
+ def __exit__(self, *exc_details):
+ raise self.exc
+
+ class RaiseExcWithContext:
+ def __init__(self, outer, inner):
+ self.outer = outer
+ self.inner = inner
+ def __enter__(self):
+ return self
+ def __exit__(self, *exc_details):
+ try:
+ raise self.inner
+ except:
+ raise self.outer
+
+ class SuppressExc:
+ def __enter__(self):
+ return self
+ def __exit__(self, *exc_details):
+ type(self).saved_details = exc_details
+ return True
try:
with RaiseExc(IndexError):
@@ -957,8 +1033,9 @@ class TestBaseExitStack:
# Ensure ExitStack chaining matches actual nested `with` statements
# regarding explicit __context__ = None.
- class MyException(Exception):
- pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class MyException(Exception):
+ pass
@contextmanager
def my_cm():
@@ -1096,7 +1173,8 @@ class TestBaseExitStack:
stack.callback(int)
def test_instance_bypass(self):
- class Example(object): pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class Example(object): pass
cm = Example()
cm.__enter__ = object()
cm.__exit__ = object()
@@ -1108,8 +1186,9 @@ class TestBaseExitStack:
def test_dont_reraise_RuntimeError(self):
# https://bugs.python.org/issue27122
- class UniqueException(Exception): pass
- class UniqueRuntimeError(RuntimeError): pass
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class UniqueException(Exception): pass
+ class UniqueRuntimeError(RuntimeError): pass
@contextmanager
def second():
@@ -1141,7 +1220,7 @@ class TestBaseExitStack:
self.assertIs(exc.__cause__, exc.__context__)
@ -141,7 +565,7 @@ index cf651959803..51fd083b112 100644
exit_stack = ExitStack
callback_error_internal_frames = [
('__exit__', 'raise exc'),
@@ -1149,7 +1203,7 @@ class TestExitStack(TestBaseExitStack, unittest.TestCase):
@@ -1149,7 +1228,7 @@ class TestExitStack(TestBaseExitStack, unittest.TestCase):
]
@ -150,7 +574,7 @@ index cf651959803..51fd083b112 100644
redirect_stream = None
orig_stream = None
@@ -1206,19 +1260,19 @@ class TestRedirectStream:
@@ -1206,19 +1285,19 @@ class TestRedirectStream:
self.assertEqual(s, "Hello World!\n")
@ -173,7 +597,7 @@ index cf651959803..51fd083b112 100644
@support.requires_docstrings
def test_instance_docs(self):
@@ -1315,7 +1369,7 @@ class TestSuppress(ExceptionIsLikeMixin, unittest.TestCase):
@@ -1315,7 +1394,7 @@ class TestSuppress(ExceptionIsLikeMixin, unittest.TestCase):
)
@ -182,7 +606,7 @@ index cf651959803..51fd083b112 100644
def make_relative_path(self, *parts):
return os.path.join(
os.path.dirname(os.path.realpath(__file__)),
@@ -1331,6 +1385,7 @@ class TestChdir(unittest.TestCase):
@@ -1331,6 +1410,7 @@ class TestChdir(unittest.TestCase):
self.assertEqual(os.getcwd(), target)
self.assertEqual(os.getcwd(), old_cwd)
@ -190,7 +614,7 @@ index cf651959803..51fd083b112 100644
def test_reentrant(self):
old_cwd = os.getcwd()
target1 = self.make_relative_path('data')
@@ -1363,4 +1418,4 @@ class TestChdir(unittest.TestCase):
@@ -1363,4 +1443,4 @@ class TestChdir(unittest.TestCase):
if __name__ == "__main__":

View File

@ -71,52 +71,59 @@ import weakref
class TestAbstractContextManager(__TestCase):
def test_enter(self):
class DefaultEnter(AbstractContextManager):
def __exit__(self, *args):
super().__exit__(*args)
with torch._dynamo.set_fullgraph(fullgraph=False):
class DefaultEnter(AbstractContextManager):
def __exit__(self, *args):
super().__exit__(*args)
manager = DefaultEnter()
self.assertIs(manager.__enter__(), manager)
def test_slots(self):
class DefaultContextManager(AbstractContextManager):
__slots__ = ()
with torch._dynamo.set_fullgraph(fullgraph=False):
class DefaultContextManager(AbstractContextManager):
__slots__ = ()
def __exit__(self, *args):
super().__exit__(*args)
def __exit__(self, *args):
super().__exit__(*args)
with self.assertRaises(AttributeError):
DefaultContextManager().var = 42
def test_exit_is_abstract(self):
class MissingExit(AbstractContextManager):
pass
with torch._dynamo.set_fullgraph(fullgraph=False):
class MissingExit(AbstractContextManager):
pass
with self.assertRaises(TypeError):
MissingExit()
def test_structural_subclassing(self):
class ManagerFromScratch:
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, traceback):
return None
with torch._dynamo.set_fullgraph(fullgraph=False):
class ManagerFromScratch:
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, traceback):
return None
self.assertTrue(issubclass(ManagerFromScratch, AbstractContextManager))
class DefaultEnter(AbstractContextManager):
def __exit__(self, *args):
super().__exit__(*args)
with torch._dynamo.set_fullgraph(fullgraph=False):
class DefaultEnter(AbstractContextManager):
def __exit__(self, *args):
super().__exit__(*args)
self.assertTrue(issubclass(DefaultEnter, AbstractContextManager))
class NoEnter(ManagerFromScratch):
__enter__ = None
with torch._dynamo.set_fullgraph(fullgraph=False):
class NoEnter(ManagerFromScratch):
__enter__ = None
self.assertFalse(issubclass(NoEnter, AbstractContextManager))
class NoExit(ManagerFromScratch):
__exit__ = None
with torch._dynamo.set_fullgraph(fullgraph=False):
class NoExit(ManagerFromScratch):
__exit__ = None
self.assertFalse(issubclass(NoExit, AbstractContextManager))
@ -169,8 +176,9 @@ class ContextManagerTestCase(__TestCase):
self.assertEqual(frames[0].line, '1/0')
# Repeat with RuntimeError (which goes through a different code path)
class RuntimeErrorSubclass(RuntimeError):
pass
with torch._dynamo.set_fullgraph(fullgraph=False):
class RuntimeErrorSubclass(RuntimeError):
pass
try:
with f():
@ -182,8 +190,9 @@ class ContextManagerTestCase(__TestCase):
self.assertEqual(frames[0].name, 'test_contextmanager_traceback')
self.assertEqual(frames[0].line, 'raise RuntimeErrorSubclass(42)')
class StopIterationSubclass(StopIteration):
pass
with torch._dynamo.set_fullgraph(fullgraph=False):
class StopIterationSubclass(StopIteration):
pass
for stop_exc in (
StopIteration('spam'),
@ -223,9 +232,9 @@ class ContextManagerTestCase(__TestCase):
ctx.__enter__()
with self.assertRaises(RuntimeError):
         ctx.__exit__(TypeError, TypeError("foo"), None)
-        if support.check_impl_detail(cpython=True):
-            # The "gen" attribute is an implementation detail.
-            self.assertFalse(ctx.gen.gi_suspended)
+        # if support.check_impl_detail(cpython=True):
+        #     # The "gen" attribute is an implementation detail.
+        #     self.assertFalse(ctx.gen.gi_suspended)

     def test_contextmanager_trap_no_yield(self):
         @contextmanager
@@ -245,9 +254,9 @@ class ContextManagerTestCase(__TestCase):
         ctx.__enter__()
         with self.assertRaises(RuntimeError):
             ctx.__exit__(None, None, None)
-        if support.check_impl_detail(cpython=True):
-            # The "gen" attribute is an implementation detail.
-            self.assertFalse(ctx.gen.gi_suspended)
+        # if support.check_impl_detail(cpython=True):
+        #     # The "gen" attribute is an implementation detail.
+        #     self.assertFalse(ctx.gen.gi_suspended)

     def test_contextmanager_non_normalised(self):
         @contextmanager
@@ -284,8 +293,9 @@ class ContextManagerTestCase(__TestCase):
         def woohoo():
             yield

-        class StopIterationSubclass(StopIteration):
-            pass
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class StopIterationSubclass(StopIteration):
+                pass

         for stop_exc in (StopIteration('spam'), StopIterationSubclass('spam')):
             with self.subTest(type=type(stop_exc)):
@@ -398,8 +408,9 @@ def woohoo():
         self.assertEqual(target, (11, 22, 33, 44))

     def test_nokeepref(self):
-        class A:
-            pass
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class A:
+                pass

         @contextmanager
         def woohoo(a, b):
@@ -461,9 +472,10 @@ class ClosingTestCase(__TestCase):
     def test_closing(self):
         state = []
-        class C:
-            def close(self):
-                state.append(1)
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class C:
+                def close(self):
+                    state.append(1)
         x = C()
         self.assertEqual(state, [])
         with closing(x) as y:
@@ -472,9 +484,10 @@ class ClosingTestCase(__TestCase):
     def test_closing_error(self):
         state = []
-        class C:
-            def close(self):
-                state.append(1)
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class C:
+                def close(self):
+                    state.append(1)
         x = C()
         self.assertEqual(state, [])
         with self.assertRaises(ZeroDivisionError):
@@ -486,8 +499,9 @@ class ClosingTestCase(__TestCase):
 class NullcontextTestCase(__TestCase):
     def test_nullcontext(self):
-        class C:
-            pass
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class C:
+                pass
         c = C()
         with nullcontext(c) as c_in:
             self.assertIs(c_in, c)
@@ -638,13 +652,14 @@ class TestContextDecorator(__TestCase):
     def test_decorating_method(self):
         context = mycontext()

-        class Test(object):
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class Test(object):

-            @context
-            def method(self, a, b, c=None):
-                self.a = a
-                self.b = b
-                self.c = c
+                @context
+                def method(self, a, b, c=None):
+                    self.a = a
+                    self.b = b
+                    self.c = c

         # these tests are for argument passing when used as a decorator
         test = Test()
@@ -666,11 +681,12 @@ class TestContextDecorator(__TestCase):
     def test_typo_enter(self):
-        class mycontext(ContextDecorator):
-            def __unter__(self):
-                pass
-            def __exit__(self, *exc):
-                pass
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class mycontext(ContextDecorator):
+                def __unter__(self):
+                    pass
+                def __exit__(self, *exc):
+                    pass

         with self.assertRaisesRegex(TypeError, 'the context manager'):
             with mycontext():
@@ -678,11 +694,12 @@ class TestContextDecorator(__TestCase):
     def test_typo_exit(self):
-        class mycontext(ContextDecorator):
-            def __enter__(self):
-                pass
-            def __uxit__(self, *exc):
-                pass
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class mycontext(ContextDecorator):
+                def __enter__(self):
+                    pass
+                def __uxit__(self, *exc):
+                    pass

         with self.assertRaisesRegex(TypeError, 'the context manager.*__exit__'):
             with mycontext():
@@ -690,19 +707,20 @@ class TestContextDecorator(__TestCase):
     def test_contextdecorator_as_mixin(self):
-        class somecontext(object):
-            started = False
-            exc = None
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class somecontext(object):
+                started = False
+                exc = None

-            def __enter__(self):
-                self.started = True
-                return self
+                def __enter__(self):
+                    self.started = True
+                    return self

-            def __exit__(self, *exc):
-                self.exc = exc
+                def __exit__(self, *exc):
+                    self.exc = exc

-        class mycontext(somecontext, ContextDecorator):
-            pass
+            class mycontext(somecontext, ContextDecorator):
+                pass

         context = mycontext()
         @context
@@ -799,13 +817,14 @@ class _TestBaseExitStack:
             self.assertIsNone(exc_type)
             self.assertIsNone(exc)
             self.assertIsNone(exc_tb)
-        class ExitCM(object):
-            def __init__(self, check_exc):
-                self.check_exc = check_exc
-            def __enter__(self):
-                self.fail("Should not be called!")
-            def __exit__(self, *exc_details):
-                self.check_exc(*exc_details)
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class ExitCM(object):
+                def __init__(self, check_exc):
+                    self.check_exc = check_exc
+                def __enter__(self):
+                    self.fail("Should not be called!")
+                def __exit__(self, *exc_details):
+                    self.check_exc(*exc_details)
         with self.exit_stack() as stack:
             stack.push(_expect_ok)
             self.assertIs(stack._exit_callbacks[-1][1], _expect_ok)
@@ -824,11 +843,12 @@ class _TestBaseExitStack:
             1/0

     def test_enter_context(self):
-        class TestCM(object):
-            def __enter__(self):
-                result.append(1)
-            def __exit__(self, *exc_details):
-                result.append(3)
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class TestCM(object):
+                def __enter__(self):
+                    result.append(1)
+                def __exit__(self, *exc_details):
+                    result.append(3)

         result = []
         cm = TestCM()
@@ -843,14 +863,15 @@ class _TestBaseExitStack:
         self.assertEqual(result, [1, 2, 3, 4])

     def test_enter_context_errors(self):
-        class LacksEnterAndExit:
-            pass
-        class LacksEnter:
-            def __exit__(self, *exc_info):
-                pass
-        class LacksExit:
-            def __enter__(self):
-                pass
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class LacksEnterAndExit:
+                pass
+            class LacksEnter:
+                def __exit__(self, *exc_info):
+                    pass
+            class LacksExit:
+                def __enter__(self):
+                    pass

         with self.exit_stack() as stack:
             with self.assertRaisesRegex(TypeError, 'the context manager'):
@@ -931,32 +952,33 @@ class _TestBaseExitStack:
     def test_exit_exception_chaining_reference(self):
         # Sanity check to make sure that ExitStack chaining matches
         # actual nested with statements
-        class RaiseExc:
-            def __init__(self, exc):
-                self.exc = exc
-            def __enter__(self):
-                return self
-            def __exit__(self, *exc_details):
-                raise self.exc
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class RaiseExc:
+                def __init__(self, exc):
+                    self.exc = exc
+                def __enter__(self):
+                    return self
+                def __exit__(self, *exc_details):
+                    raise self.exc

-        class RaiseExcWithContext:
-            def __init__(self, outer, inner):
-                self.outer = outer
-                self.inner = inner
-            def __enter__(self):
-                return self
-            def __exit__(self, *exc_details):
-                try:
-                    raise self.inner
-                except:
-                    raise self.outer
+            class RaiseExcWithContext:
+                def __init__(self, outer, inner):
+                    self.outer = outer
+                    self.inner = inner
+                def __enter__(self):
+                    return self
+                def __exit__(self, *exc_details):
+                    try:
+                        raise self.inner
+                    except:
+                        raise self.outer

-        class SuppressExc:
-            def __enter__(self):
-                return self
-            def __exit__(self, *exc_details):
-                type(self).saved_details = exc_details
-                return True
+            class SuppressExc:
+                def __enter__(self):
+                    return self
+                def __exit__(self, *exc_details):
+                    type(self).saved_details = exc_details
+                    return True

         try:
             with RaiseExc(IndexError):
@@ -1011,8 +1033,9 @@ class _TestBaseExitStack:
         # Ensure ExitStack chaining matches actual nested `with` statements
         # regarding explicit __context__ = None.
-        class MyException(Exception):
-            pass
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class MyException(Exception):
+                pass

         @contextmanager
         def my_cm():
@@ -1150,7 +1173,8 @@ class _TestBaseExitStack:
             stack.callback(int)

     def test_instance_bypass(self):
-        class Example(object): pass
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class Example(object): pass
         cm = Example()
         cm.__enter__ = object()
         cm.__exit__ = object()
@@ -1162,8 +1186,9 @@ class _TestBaseExitStack:
     def test_dont_reraise_RuntimeError(self):
         # https://bugs.python.org/issue27122
-        class UniqueException(Exception): pass
-        class UniqueRuntimeError(RuntimeError): pass
+        with torch._dynamo.set_fullgraph(fullgraph=False):
+            class UniqueException(Exception): pass
+            class UniqueRuntimeError(RuntimeError): pass

         @contextmanager
         def second():
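The change repeated throughout this file is one mechanical pattern: class definitions created inside test bodies are wrapped in torch._dynamo.set_fullgraph(fullgraph=False), which lets Dynamo fall back to a graph break on the class body instead of failing fullgraph compilation. A minimal sketch of the pattern, with an illustrative class:

import torch

# The wrapped region may graph-break; outside it, fullgraph behavior applies.
with torch._dynamo.set_fullgraph(fullgraph=False):
    class C:
        def close(self):
            pass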


@@ -0,0 +1,98 @@
diff --git a/test/dynamo/cpython/3_13/test_defaultdict.py b/test/dynamo/cpython/3_13/test_defaultdict.py
index bdbe9b81e8f..d55f1dc54c6 100644
--- a/test/dynamo/cpython/3_13/test_defaultdict.py
+++ b/test/dynamo/cpython/3_13/test_defaultdict.py
@@ -1,3 +1,60 @@
+# ======= BEGIN Dynamo patch =======
+# Owner(s): ["module: dynamo"]
+
+# ruff: noqa
+# flake8: noqa
+
+# Test copied from
+# https://raw.githubusercontent.com/python/cpython/refs/tags/v3.13.5/Lib/test/test_defaultdict.py
+
+import sys
+import torch
+import torch._dynamo.test_case
+import unittest
+from torch._dynamo.test_case import CPythonTestCase
+from torch.testing._internal.common_utils import (
+ run_tests,
+)
+
+__TestCase = CPythonTestCase
+
+
+# redirect import statements
+import sys
+import importlib.abc
+
+redirect_imports = (
+ "test.mapping_tests",
+ "test.typinganndata",
+ "test.test_grammar",
+ "test.test_math",
+ "test.test_iter",
+ "test.typinganndata.ann_module",
+)
+
+class RedirectImportFinder(importlib.abc.MetaPathFinder):
+ def find_spec(self, fullname, path, target=None):
+ # Check if the import is the problematic one
+ if fullname in redirect_imports:
+ try:
+ # Attempt to import the standalone module
+ name = fullname.removeprefix("test.")
+ r = importlib.import_module(name)
+ # Redirect the module in sys.modules
+ sys.modules[fullname] = r
+ # Return a module spec from the found module
+ return importlib.util.find_spec(name)
+ except ImportError:
+ return None
+ return None
+
+# Add the custom finder to sys.meta_path
+sys.meta_path.insert(0, RedirectImportFinder())
+
+
+# ======= END DYNAMO PATCH =======
+
+
"""Unit tests for collections.defaultdict."""
import copy
@@ -9,7 +66,7 @@ from collections import defaultdict
def foobar():
return list
-class TestDefaultDict(unittest.TestCase):
+class TestDefaultDict(__TestCase):
def test_basic(self):
d1 = defaultdict()
@@ -127,11 +184,12 @@ class TestDefaultDict(unittest.TestCase):
def test_recursive_repr(self):
# Issue2045: stack overflow when default_factory is a bound method
- class sub(defaultdict):
- def __init__(self):
- self.default_factory = self._factory
- def _factory(self):
- return []
+ with torch._dynamo.set_fullgraph(fullgraph=False):
+ class sub(defaultdict):
+ def __init__(self):
+ self.default_factory = self._factory
+ def _factory(self):
+ return []
d = sub()
self.assertRegex(repr(d),
r"sub\(<bound method .*sub\._factory "
@@ -187,4 +245,4 @@ class TestDefaultDict(unittest.TestCase):
i |= None
if __name__ == "__main__":
- unittest.main()
+ run_tests()


@@ -0,0 +1,248 @@
# ======= BEGIN Dynamo patch =======
# Owner(s): ["module: dynamo"]
# ruff: noqa
# flake8: noqa
# Test copied from
# https://raw.githubusercontent.com/python/cpython/refs/tags/v3.13.5/Lib/test/test_defaultdict.py
import sys
import torch
import torch._dynamo.test_case
import unittest
from torch._dynamo.test_case import CPythonTestCase
from torch.testing._internal.common_utils import (
run_tests,
)
__TestCase = CPythonTestCase
# redirect import statements
import sys
import importlib.abc
redirect_imports = (
"test.mapping_tests",
"test.typinganndata",
"test.test_grammar",
"test.test_math",
"test.test_iter",
"test.typinganndata.ann_module",
)
class RedirectImportFinder(importlib.abc.MetaPathFinder):
def find_spec(self, fullname, path, target=None):
# Check if the import is the problematic one
if fullname in redirect_imports:
try:
# Attempt to import the standalone module
name = fullname.removeprefix("test.")
r = importlib.import_module(name)
# Redirect the module in sys.modules
sys.modules[fullname] = r
# Return a module spec from the found module
return importlib.util.find_spec(name)
except ImportError:
return None
return None
# Add the custom finder to sys.meta_path
sys.meta_path.insert(0, RedirectImportFinder())
# ======= END DYNAMO PATCH =======
"""Unit tests for collections.defaultdict."""
import copy
import pickle
import unittest
from collections import defaultdict
def foobar():
return list
class TestDefaultDict(__TestCase):
def test_basic(self):
d1 = defaultdict()
self.assertEqual(d1.default_factory, None)
d1.default_factory = list
d1[12].append(42)
self.assertEqual(d1, {12: [42]})
d1[12].append(24)
self.assertEqual(d1, {12: [42, 24]})
d1[13]
d1[14]
self.assertEqual(d1, {12: [42, 24], 13: [], 14: []})
self.assertTrue(d1[12] is not d1[13] is not d1[14])
d2 = defaultdict(list, foo=1, bar=2)
self.assertEqual(d2.default_factory, list)
self.assertEqual(d2, {"foo": 1, "bar": 2})
self.assertEqual(d2["foo"], 1)
self.assertEqual(d2["bar"], 2)
self.assertEqual(d2[42], [])
self.assertIn("foo", d2)
self.assertIn("foo", d2.keys())
self.assertIn("bar", d2)
self.assertIn("bar", d2.keys())
self.assertIn(42, d2)
self.assertIn(42, d2.keys())
self.assertNotIn(12, d2)
self.assertNotIn(12, d2.keys())
d2.default_factory = None
self.assertEqual(d2.default_factory, None)
try:
d2[15]
except KeyError as err:
self.assertEqual(err.args, (15,))
else:
self.fail("d2[15] didn't raise KeyError")
self.assertRaises(TypeError, defaultdict, 1)
def test_missing(self):
d1 = defaultdict()
self.assertRaises(KeyError, d1.__missing__, 42)
d1.default_factory = list
self.assertEqual(d1.__missing__(42), [])
def test_repr(self):
d1 = defaultdict()
self.assertEqual(d1.default_factory, None)
self.assertEqual(repr(d1), "defaultdict(None, {})")
self.assertEqual(eval(repr(d1)), d1)
d1[11] = 41
self.assertEqual(repr(d1), "defaultdict(None, {11: 41})")
d2 = defaultdict(int)
self.assertEqual(d2.default_factory, int)
d2[12] = 42
self.assertEqual(repr(d2), "defaultdict(<class 'int'>, {12: 42})")
def foo(): return 43
d3 = defaultdict(foo)
self.assertTrue(d3.default_factory is foo)
d3[13]
self.assertEqual(repr(d3), "defaultdict(%s, {13: 43})" % repr(foo))
def test_copy(self):
d1 = defaultdict()
d2 = d1.copy()
self.assertEqual(type(d2), defaultdict)
self.assertEqual(d2.default_factory, None)
self.assertEqual(d2, {})
d1.default_factory = list
d3 = d1.copy()
self.assertEqual(type(d3), defaultdict)
self.assertEqual(d3.default_factory, list)
self.assertEqual(d3, {})
d1[42]
d4 = d1.copy()
self.assertEqual(type(d4), defaultdict)
self.assertEqual(d4.default_factory, list)
self.assertEqual(d4, {42: []})
d4[12]
self.assertEqual(d4, {42: [], 12: []})
# Issue 6637: Copy fails for empty default dict
d = defaultdict()
d['a'] = 42
e = d.copy()
self.assertEqual(e['a'], 42)
def test_shallow_copy(self):
d1 = defaultdict(foobar, {1: 1})
d2 = copy.copy(d1)
self.assertEqual(d2.default_factory, foobar)
self.assertEqual(d2, d1)
d1.default_factory = list
d2 = copy.copy(d1)
self.assertEqual(d2.default_factory, list)
self.assertEqual(d2, d1)
def test_deep_copy(self):
d1 = defaultdict(foobar, {1: [1]})
d2 = copy.deepcopy(d1)
self.assertEqual(d2.default_factory, foobar)
self.assertEqual(d2, d1)
self.assertTrue(d1[1] is not d2[1])
d1.default_factory = list
d2 = copy.deepcopy(d1)
self.assertEqual(d2.default_factory, list)
self.assertEqual(d2, d1)
def test_keyerror_without_factory(self):
d1 = defaultdict()
try:
d1[(1,)]
except KeyError as err:
self.assertEqual(err.args[0], (1,))
else:
self.fail("expected KeyError")
def test_recursive_repr(self):
# Issue2045: stack overflow when default_factory is a bound method
with torch._dynamo.set_fullgraph(fullgraph=False):
class sub(defaultdict):
def __init__(self):
self.default_factory = self._factory
def _factory(self):
return []
d = sub()
self.assertRegex(repr(d),
r"sub\(<bound method .*sub\._factory "
r"of sub\(\.\.\., \{\}\)>, \{\}\)")
def test_callable_arg(self):
self.assertRaises(TypeError, defaultdict, {})
def test_pickling(self):
d = defaultdict(int)
d[1]
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
s = pickle.dumps(d, proto)
o = pickle.loads(s)
self.assertEqual(d, o)
def test_union(self):
i = defaultdict(int, {1: 1, 2: 2})
s = defaultdict(str, {0: "zero", 1: "one"})
i_s = i | s
self.assertIs(i_s.default_factory, int)
self.assertDictEqual(i_s, {1: "one", 2: 2, 0: "zero"})
self.assertEqual(list(i_s), [1, 2, 0])
s_i = s | i
self.assertIs(s_i.default_factory, str)
self.assertDictEqual(s_i, {0: "zero", 1: 1, 2: 2})
self.assertEqual(list(s_i), [0, 1, 2])
i_ds = i | dict(s)
self.assertIs(i_ds.default_factory, int)
self.assertDictEqual(i_ds, {1: "one", 2: 2, 0: "zero"})
self.assertEqual(list(i_ds), [1, 2, 0])
ds_i = dict(s) | i
self.assertIs(ds_i.default_factory, int)
self.assertDictEqual(ds_i, {0: "zero", 1: 1, 2: 2})
self.assertEqual(list(ds_i), [0, 1, 2])
with self.assertRaises(TypeError):
i | list(s.items())
with self.assertRaises(TypeError):
list(s.items()) | i
# We inherit a fine |= from dict, so just a few sanity checks here:
i |= list(s.items())
self.assertIs(i.default_factory, int)
self.assertDictEqual(i, {1: "one", 2: 2, 0: "zero"})
self.assertEqual(list(i), [1, 2, 0])
with self.assertRaises(TypeError):
i |= None
if __name__ == "__main__":
run_tests()
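The RedirectImportFinder shim at the top of this file is a generic importlib trick: a MetaPathFinder at the front of sys.meta_path that aliases "test.<name>" imports to standalone top-level modules. A self-contained sketch of the same mechanism, using example module names rather than the ones the patch targets:

import importlib
import importlib.abc
import importlib.util
import sys

class AliasFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        # Redirect any "test.<name>" import to the top-level module <name>.
        if fullname.startswith("test."):
            name = fullname.removeprefix("test.")
            try:
                module = importlib.import_module(name)
            except ImportError:
                return None
            sys.modules[fullname] = module
            return importlib.util.find_spec(name)
        return None

sys.meta_path.insert(0, AliasFinder())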


@@ -1,5 +1,5 @@
 diff --git a/test/dynamo/cpython/3_13/test_generators.py b/test/dynamo/cpython/3_13/test_generators.py
-index e48d79d34f4..a48da0914b9 100644
+index 515fe7407f1..a48da0914b9 100644
 --- a/test/dynamo/cpython/3_13/test_generators.py
 +++ b/test/dynamo/cpython/3_13/test_generators.py
 @@ -1,3 +1,56 @@
@@ -105,7 +105,8 @@ index e48d79d34f4..a48da0914b9 100644
+ return self.val
+
+ # No __iter__ method
+
-class ModifyUnderlyingIterableTest(unittest.TestCase):
+ class C:
+
+ def __iter__(self):
@@ -113,8 +114,7 @@ index e48d79d34f4..a48da0914b9 100644
+
+ self.assertEqual([1,2], list(i for i in C()))
+
-class ModifyUnderlyingIterableTest(unittest.TestCase):
+
+class ModifyUnderlyingIterableTest(__TestCase):
iterables = [
range(0),
@@ -137,99 +137,16 @@ index e48d79d34f4..a48da0914b9 100644
def test_close_no_return_value(self):
def f():
-@@ -630,90 +706,7 @@ class GeneratorCloseTest(unittest.TestCase):
+@@ -630,7 +706,7 @@ class GeneratorCloseTest(unittest.TestCase):
self.assertIsNone(f_wr())
-# See https://github.com/python/cpython/issues/125723
-class GeneratorDeallocTest(unittest.TestCase):
- def test_frame_outlives_generator(self):
- def g1():
- a = 42
- yield sys._getframe()
-
- def g2():
- a = 42
- yield
-
- def g3(obj):
- a = 42
- obj.frame = sys._getframe()
- yield
-
- class ObjectWithFrame():
- def __init__(self):
- self.frame = None
-
- def get_frame(index):
- if index == 1:
- return next(g1())
- elif index == 2:
- gen = g2()
- next(gen)
- return gen.gi_frame
- elif index == 3:
- obj = ObjectWithFrame()
- next(g3(obj))
- return obj.frame
- else:
- return None
-
- for index in (1, 2, 3):
- with self.subTest(index=index):
- frame = get_frame(index)
- frame_locals = frame.f_locals
- self.assertIn('a', frame_locals)
- self.assertEqual(frame_locals['a'], 42)
-
- def test_frame_locals_outlive_generator(self):
- frame_locals1 = None
-
- def g1():
- nonlocal frame_locals1
- frame_locals1 = sys._getframe().f_locals
- a = 42
- yield
-
- def g2():
- a = 42
- yield sys._getframe().f_locals
-
- def get_frame_locals(index):
- if index == 1:
- nonlocal frame_locals1
- next(g1())
- return frame_locals1
- if index == 2:
- return next(g2())
- else:
- return None
-
- for index in (1, 2):
- with self.subTest(index=index):
- frame_locals = get_frame_locals(index)
- self.assertIn('a', frame_locals)
- self.assertEqual(frame_locals['a'], 42)
-
- def test_frame_locals_outlive_generator_with_exec(self):
- def g():
- a = 42
- yield locals(), sys._getframe().f_locals
-
- locals_ = {'g': g}
- for i in range(10):
- exec("snapshot, live_locals = next(g())", locals=locals_)
- for l in (locals_['snapshot'], locals_['live_locals']):
- self.assertIn('a', l)
- self.assertEqual(l['a'], 42)
-
-
-class GeneratorThrowTest(unittest.TestCase):
+class GeneratorThrowTest(__TestCase):
def test_exception_context_with_yield(self):
def f():
-@@ -812,7 +805,7 @@ class GeneratorThrowTest(unittest.TestCase):
+@@ -729,7 +805,7 @@ class GeneratorThrowTest(unittest.TestCase):
gen.throw(ValueError)
@@ -238,7 +155,7 @@ index e48d79d34f4..a48da0914b9 100644
def check_stack_names(self, frame, expected):
names = []
-@@ -861,7 +854,7 @@ class GeneratorStackTraceTest(unittest.TestCase):
+@@ -778,7 +854,7 @@ class GeneratorStackTraceTest(unittest.TestCase):
self.check_yield_from_example(call_throw)
@@ -247,7 +164,7 @@ index e48d79d34f4..a48da0914b9 100644
def test_generator_gi_yieldfrom(self):
def a():
self.assertEqual(inspect.getgeneratorstate(gen_b), inspect.GEN_RUNNING)
-@@ -2752,21 +2745,27 @@ test_generators just happened to be the test that drew these out.
+@@ -2669,21 +2745,27 @@ test_generators just happened to be the test that drew these out.
"""


@@ -1515,6 +1515,76 @@ class TestGeneratorThrow(GeneratorTestsBase):
         self._compile_check(fn)

+    def test_return_const_value_in_except_and_finally(self):
+        def whoo():
+            try:
+                yield 1
+            except ValueError:
+                return 2  # noqa: B901
+            finally:
+                return 3  # noqa: B012, SIM107, B901
+
+        def fn(t):
+            gen = whoo()
+            next(gen)
+            try:
+                gen.throw(ValueError)
+            except StopIteration as e:
+                assert e.args[0] == 3
+            except Exception as e:
+                raise AssertionError from e
+            return t.sin()
+
+        self._compile_check(fn)
+
+    def test_return_value_in_except_and_finally(self):
+        class Foo:
+            def __init__(self, x):
+                self.x = x
+
+        def whoo():
+            try:
+                yield 1
+            except ValueError:
+                return Foo(2)  # noqa: B901
+            finally:
+                return Foo(3)  # noqa: B012, SIM107, B901
+
+        def fn(t):
+            gen = whoo()
+            next(gen)
+            try:
+                gen.throw(ValueError)
+            except StopIteration as e:
+                assert e.args[0].x == 3
+            except Exception as e:
+                raise AssertionError from e
+            return t.sin()
+
+        self._compile_check(fn)
+
+    def test_return_None_in_except_and_finally(self):
+        def whoo():
+            try:
+                yield 1
+            except ValueError:
+                return 2  # noqa: B901
+            finally:
+                return  # noqa: B012, SIM107
+
+        def fn(t):
+            gen = whoo()
+            next(gen)
+            try:
+                gen.throw(ValueError)
+            except StopIteration as e:
+                assert len(e.args) == 0
+            except Exception as e:
+                raise AssertionError from e
+            return t.sin()
+
+        self._compile_check(fn)
+

 instantiate_parametrized_tests(GeneratorTests)
 instantiate_parametrized_tests(TestGeneratorSend)
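The three new tests pin down plain CPython semantics that the compiled path has to reproduce: when a generator is closed via throw(), a return inside finally overrides the return from the except branch, and the returned value surfaces as the StopIteration payload. A minimal eager-mode sketch of the first case:

def whoo():
    try:
        yield 1
    except ValueError:
        return 2
    finally:
        return 3  # the finally return wins over the except-branch return

gen = whoo()
next(gen)
try:
    gen.throw(ValueError)
except StopIteration as e:
    assert e.args[0] == 3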


@@ -4,9 +4,7 @@ import contextlib

 import torch
 import torch.fx
 from torch._dynamo.graph_deduplication import apply_graph_deduplication
-from torch._dynamo.graph_utils import _detect_cycles
-from torch._dynamo.output_graph import FakeRootModule
 from torch._dynamo.test_case import TestCase
 from torch._dynamo.testing import (
     AotEagerAndRecordGraphs,
@@ -1131,82 +1129,6 @@ def forward(self, L_x_ : torch.Tensor, L_y_ : torch.Tensor):
         result_eager = fn(*inps)
         self.assertEqual(result_compiled, result_eager)

-    def test_tuple_inputs(self):
-        with (
-            torch._dynamo.config.patch("use_graph_deduplication", False),
-            torch._dynamo.config.patch("track_nodes_for_deduplication", True),
-        ):
-
-            def inner(x, y):
-                x0, x1 = torch.split(x, 5)
-                return x0 + x1 + y
-
-            def fn(x, y):
-                o1 = inner(x, y)
-                o2 = inner(x, y)
-                o3 = inner(x, y)
-                o4 = inner(x, y)
-                return o1.sum() + o2.sum() + o3.sum() + o4.sum()
-
-            graph, tracker = extract_graph_and_tracker(
-                fn, torch.rand(10, 10), torch.rand(5, 10)
-            )
-
-            class MockOutputGraph:
-                def __init__(self):
-                    self.graph = graph
-                    self.region_tracker = tracker
-                    self.nn_modules = FakeRootModule({})
-
-                def install_subgraph(self, name, subgraph):
-                    return ""
-
-            splits = [
-                n
-                for n in graph.nodes
-                if n.op == "call_function" and n.target == torch.split
-            ]
-            for split in splits:
-                tracker.node_to_duplicates.pop(split)
-
-            apply_graph_deduplication(MockOutputGraph())
-            self.assertExpectedInline(
-                graph,
-                """\
-graph():
-    %_unnamed : [num_users=4] = get_attr[target=]
-    %l_x_ : torch.Tensor [num_users=4] = placeholder[target=L_x_]
-    %l_y_ : torch.Tensor [num_users=4] = placeholder[target=L_y_]
-    %split : [num_users=2] = call_function[target=torch.functional.split](args = (%l_x_, 5), kwargs = {})
-    %x0 : [num_users=1] = call_function[target=operator.getitem](args = (%split, 0), kwargs = {})
-    %x1 : [num_users=1] = call_function[target=operator.getitem](args = (%split, 1), kwargs = {})
-    %split_1 : [num_users=2] = call_function[target=torch.functional.split](args = (%l_x_, 5), kwargs = {})
-    %x0_1 : [num_users=1] = call_function[target=operator.getitem](args = (%split_1, 0), kwargs = {})
-    %x1_1 : [num_users=1] = call_function[target=operator.getitem](args = (%split_1, 1), kwargs = {})
-    %split_2 : [num_users=2] = call_function[target=torch.functional.split](args = (%l_x_, 5), kwargs = {})
-    %x0_2 : [num_users=1] = call_function[target=operator.getitem](args = (%split_2, 0), kwargs = {})
-    %x1_2 : [num_users=1] = call_function[target=operator.getitem](args = (%split_2, 1), kwargs = {})
-    %split_3 : [num_users=2] = call_function[target=torch.functional.split](args = (%l_x_, 5), kwargs = {})
-    %x0_3 : [num_users=1] = call_function[target=operator.getitem](args = (%split_3, 0), kwargs = {})
-    %x1_3 : [num_users=1] = call_function[target=operator.getitem](args = (%split_3, 1), kwargs = {})
-    %invoke_subgraph : [num_users=1] = call_function[target=torch.ops.higher_order.invoke_subgraph](args = (%_unnamed, , %x0, %x1, %l_y_), kwargs = {})
-    %getitem_8 : [num_users=1] = call_function[target=operator.getitem](args = (%invoke_subgraph, 0), kwargs = {})
-    %sum_1 : [num_users=1] = call_method[target=sum](args = (%getitem_8,), kwargs = {})
-    %invoke_subgraph_1 : [num_users=1] = call_function[target=torch.ops.higher_order.invoke_subgraph](args = (%_unnamed, , %x0_1, %x1_1, %l_y_), kwargs = {})
-    %getitem_9 : [num_users=1] = call_function[target=operator.getitem](args = (%invoke_subgraph_1, 0), kwargs = {})
-    %sum_2 : [num_users=1] = call_method[target=sum](args = (%getitem_9,), kwargs = {})
-    %add_8 : [num_users=1] = call_function[target=operator.add](args = (%sum_1, %sum_2), kwargs = {})
-    %invoke_subgraph_2 : [num_users=1] = call_function[target=torch.ops.higher_order.invoke_subgraph](args = (%_unnamed, , %x0_2, %x1_2, %l_y_), kwargs = {})
-    %getitem_10 : [num_users=1] = call_function[target=operator.getitem](args = (%invoke_subgraph_2, 0), kwargs = {})
-    %sum_3 : [num_users=1] = call_method[target=sum](args = (%getitem_10,), kwargs = {})
-    %add_9 : [num_users=1] = call_function[target=operator.add](args = (%add_8, %sum_3), kwargs = {})
-    %invoke_subgraph_3 : [num_users=1] = call_function[target=torch.ops.higher_order.invoke_subgraph](args = (%_unnamed, , %x0_3, %x1_3, %l_y_), kwargs = {})
-    %getitem_11 : [num_users=1] = call_function[target=operator.getitem](args = (%invoke_subgraph_3, 0), kwargs = {})
-    %sum_4 : [num_users=1] = call_method[target=sum](args = (%getitem_11,), kwargs = {})
-    %add_10 : [num_users=1] = call_function[target=operator.add](args = (%add_9, %sum_4), kwargs = {})
-    return (add_10,)""",
-            )

     def test_param_transfer_to_submodule(self):
         def inner_fn(x, y):
             return x + y + y + x


@@ -9,28 +9,6 @@ from torch._dynamo.testing import extract_graph_and_tracker
 from torch.utils._pytree import tree_map

-def get_nodes_by_name(graph, names):
-    nodes = []
-    for node in graph.nodes:
-        if node.name in names:
-            nodes.append(node)
-
-    return nodes
-
-
-unique_ind = 0
-
-
-def track_same_nodes(names, graph, region_tracker):
-    global unique_ind
-    unique_ind += 1
-    # find nodes in graph with names and track them
-    # as if they were at the same code location
-    nodes = get_nodes_by_name(graph, names)
-    for node in nodes:
-        region_tracker.track_node("x", unique_ind, node)
-
-
 class GraphRegionTrackerTests(TestCase):
     def setUp(self):
         self.exit_stack = contextlib.ExitStack()
@@ -378,6 +356,35 @@ class GraphRegionTrackerTests(TestCase):
         _sort_with_ref_region(index_to_rank, regions)
         self.assertExpectedInline(regions, """[[0, 2, 1], [1, 0, 2]]""")

+    def test_no_duplicate_tracking(self):
+        def inner_fn(x, y):
+            x0 = x + 1
+            y0 = y + 2
+            z = x0.sum() + y0.sum()
+            return z
+
+        def fn(x, y):
+            o0 = inner_fn(x, y)
+            o1 = torch.sin(y)
+            o2 = inner_fn(x, o1)
+            o3 = inner_fn(x, y)
+            o4 = o3 * o3
+            return o2 * o4 + o0
+
+        graph, tracker = extract_graph_and_tracker(
+            fn, torch.rand(10, 10), torch.ones(10, 20)
+        )
+        self.assertExpectedInline(
+            tracker.node_to_duplicates,
+            """{l_x_: [l_x_], x0: [x0, x0_1, x0_2], l_y_: [l_y_], y0: [y0, y0_1, y0_2], sum_1: \
+[sum_1, sum_3, sum_5], sum_2: [sum_2, sum_4, sum_6], z: [z, z_1, z_2], o1: [o1], x0_1: [x0, x0_1, x0_2], y0_1: [y0, y0_1, y0_2], \
+sum_3: [sum_1, sum_3, sum_5], sum_4: [sum_2, sum_4, sum_6], \
+z_1: [z, z_1, z_2], x0_2: [x0, x0_1, x0_2], y0_2: [y0, y0_1, y0_2], sum_5: [sum_1, sum_3, sum_5], sum_6: [sum_2, sum_4, sum_6], \
+z_2: [z, z_1, z_2], o4: [o4], mul_1: [mul_1], add_9: [add_9]}""",
+        )
+        key = next(iter(tracker.node_to_duplicates.keys()))
+        tracker.track_node(None, key)  # this will fail if the node is added again
+

 if __name__ == "__main__":
     from torch._dynamo.test_case import run_tests
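A toy illustration (not Dynamo's actual bookkeeping) of the invariant test_no_duplicate_tracking asserts: nodes observed at the same code location share one duplicates list, and re-tracking an already-tracked node must not append it a second time:

from collections import defaultdict

groups: dict[str, list[str]] = defaultdict(list)

def track_node(location, node):
    # Skip nodes already tracked at this location.
    if node not in groups[location]:
        groups[location].append(node)

for node in ["x0", "x0_1", "x0_2", "x0"]:  # "x0" arrives twice
    track_node("inner_fn:1", node)

assert groups["inner_fn:1"] == ["x0", "x0_1", "x0_2"]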


@@ -4,9 +4,58 @@ from unittest.mock import patch

 import torch
 import torch._dynamo.test_case
 import torch._dynamo.testing
+from torch._dynamo import config as dc


 class RecompileTests(torch._dynamo.test_case.TestCase):
+    def test_inline_inbuilt_nn_modules_candidate(self):
+        def hook_flag_on(guard_manager, f_locals, builder):
+            self.assertTrue(
+                "[inline-inbuilt-nn-modules-candidate]" not in str(guard_manager)
+            )
+
+        def hook_flag_off(guard_manager, f_locals, builder):
+            self.assertTrue(
+                "[inline-inbuilt-nn-modules-candidate]" in str(guard_manager)
+            )
+
+        class SubMod(torch.nn.Module):
+            def __init__(self):
+                super().__init__()
+                self.linear = torch.nn.Linear(2, 2)
+
+            @torch.compile(backend="eager")
+            def forward(self, x):
+                return self.linear(x)
+
+        class Mod(torch.nn.Module):
+            def __init__(self):
+                super().__init__()
+                self.sm1 = SubMod()
+                self.sm2 = SubMod()
+
+            def forward(self, x):
+                return self.sm1(x) + self.sm2(x)
+
+        try:
+            from .utils import install_guard_manager_testing_hook
+        except ImportError:
+            from utils import install_guard_manager_testing_hook
+
+        with (
+            install_guard_manager_testing_hook(hook_flag_on),
+            dc.patch(inline_inbuilt_nn_modules=True),
+        ):
+            mod = Mod()
+            mod(torch.randn(2, 2))
+
+        with (
+            install_guard_manager_testing_hook(hook_flag_off),
+            dc.patch(inline_inbuilt_nn_modules=False),
+        ):
+            mod = Mod()
+            mod(torch.randn(2, 2))
+
     def test_automatic_dynamic_reduce_recompiles(self):
         # Test the counterfactual, lots of recompiles without this config
         def foo(x, y):


@@ -25,10 +25,10 @@ from torch.testing._internal.common_utils import find_free_port
-from torch.testing._internal.triton_utils import requires_cuda_and_triton
+requires_cuda_and_triton = unittest.skipUnless(HAS_CUDA, "requires cuda")

 if torch.distributed.is_available():
     from torch.testing._internal.distributed.fake_pg import FakeStore

 HAS_TLPARSE = shutil.which("tlparse") is not None
 requires_tlparse = unittest.skipUnless(HAS_TLPARSE, "requires tlparse")
 requires_distributed = functools.partial(
@@ -1198,13 +1198,13 @@ def forward(self, x_1: "f32[2][1]cpu"):
     @contextmanager
     def _setup_runtime_estimates_capture(self):
-        """Helper to turn on and capture the 'inductor_tlparse_runtime' structured trace."""
+        """Helper to turn on and capture the combined 'inductor_runtime_and_tensor_meta' structured trace."""
         payload_buffer = io.StringIO()
         payload_handler = logging.StreamHandler(payload_buffer)
         payload_handler.setLevel(logging.DEBUG)
         payload_handler.setFormatter(StructuredTracePayloadFormatter())
         payload_handler.addFilter(
-            StructuredTraceTestingFilter("inductor_tlparse_runtime")
+            StructuredTraceTestingFilter("inductor_runtime_and_tensor_meta")
         )
         trace_log.addHandler(payload_handler)
         try:
@@ -1245,8 +1245,10 @@ def forward(self, x_1: "f32[2][1]cpu"):
                 compiled = torch.compile(mod, backend="inductor")
                 compiled(torch.randn(4, 4, device="cuda"))

-                # Verify runtime estimates artifact was logged
-                self.assertIn('"inductor_tlparse_runtime"', self.buffer.getvalue())
+                # Verify runtime + tensor meta artifact was logged
+                self.assertIn(
+                    '"inductor_runtime_and_tensor_meta"', self.buffer.getvalue()
+                )

                 payload_content = payload_buffer.getvalue().strip()
                 if payload_content:
@@ -1310,8 +1312,10 @@ def forward(self, x_1: "f32[2][1]cpu"):
                 compiled = torch.compile(mod, backend="inductor")
                 compiled(torch.randn(4, 4, device="cuda"))

-                # Verify runtime estimates artifact was logged
-                self.assertIn('"inductor_tlparse_runtime"', self.buffer.getvalue())
+                # Verify artifact was logged
+                self.assertIn(
+                    '"inductor_runtime_and_tensor_meta"', self.buffer.getvalue()
+                )

                 payload_content = payload_buffer.getvalue().strip()
                 if payload_content:
@@ -1333,6 +1337,145 @@ def forward(self, x_1: "f32[2][1]cpu"):
         finally:
             dist.destroy_process_group()

+    @requires_tlparse
+    @requires_distributed()
+    @requires_cuda_and_triton
+    @torch._inductor.config.patch("fx_graph_cache", False)
+    @torch._inductor.config.patch("log_tlparse", True)
+    def test_tensor_metadata_logging_multiple_ops(self):
+        import torch.distributed as dist
+
+        store = FakeStore()
+        dist.init_process_group(backend="fake", rank=0, world_size=2, store=store)
+
+        class Mixed(torch.nn.Module):
+            def __init__(self):
+                super().__init__()
+                self.linear = torch.nn.Linear(4, 4)
+
+            def forward(self, x):
+                y = torch.relu(self.linear(x))
+                y = torch.ops._c10d_functional.all_reduce.default(y, "sum", "0")
+                y = torch.ops._c10d_functional.wait_tensor.default(y)
+                return y + 1
+
+        try:
+            with self._setup_runtime_estimates_capture() as payload_buffer:
+                torch._dynamo.reset()
+                mod = Mixed().cuda()
+                compiled = torch.compile(mod, backend="inductor")
+                compiled(torch.randn(4, 4, device="cuda"))
+
+                payload = payload_buffer.getvalue().strip()
+                if payload:
+                    data = json.loads(payload)
+                    types = sorted({op.get("type") for op in data.get("ops", [])})
+                    self.assertExpectedInline(
+                        str(types), """['collective', 'compute']"""
+                    )
+            self.assertParses()
+        finally:
+            dist.destroy_process_group()
+
+    @requires_tlparse
+    @torch._inductor.config.patch("log_tlparse", True)
+    def test_tensor_metadata_logging(self):
+        """Emit unified runtime+tensor-metadata artifact and assert a stable simplified JSON inline."""
+        with self._setup_runtime_estimates_capture() as payload_buffer:
+
+            def f(x):
+                y = x.transpose(0, 1)
+                z = y.mean(dim=0)
+                w = z.to(torch.float16)
+                return w
+
+            compiled = torch.compile(f, backend="inductor", fullgraph=True)
+            compiled(torch.ones(2, 3))
+
+            # Verify artifact was logged
+            self.assertIn('"inductor_runtime_and_tensor_meta"', self.buffer.getvalue())
+
+            payload = payload_buffer.getvalue().strip()
+            if payload:
+                data = json.loads(payload)
+                ops = data.get("ops", [])
+                simplified_ops = []
+                for op in ops:
+                    outs = [
+                        {
+                            "shape": out.get("shape", []),
+                            "stride": out.get("stride", []),
+                            "dtype": out.get("dtype", None),
+                        }
+                        for out in op.get("outputs", [])
+                    ]
+                    if outs:
+                        simplified_ops.append(
+                            {
+                                "type": op.get("type", ""),
+                                "outputs": outs,
+                            }
+                        )
+                self.assertExpectedInline(
+                    {"ops": simplified_ops[-1:]} if simplified_ops else {"ops": []},
+                    """{'ops': [{'type': 'compute', 'outputs': [{'shape': [2], 'stride': [1], 'dtype': 'float16'}]}]}""",
+                )
+        self.assertParses()
+
+    @requires_tlparse
+    @torch._inductor.config.patch("log_tlparse", True)
+    def test_tensor_metadata_logging_dynamic_shapes(self):
+        """Same as test_tensor_metadata_logging, but with dynamic shapes enabled to cover to_size_hints."""
+        with self._setup_runtime_estimates_capture() as payload_buffer:
+
+            def f(x):
+                y = x.transpose(0, 1)
+                z = y.mean(dim=0)
+                w = z.to(torch.float16)
+                return w
+
+            compiled = torch.compile(f, backend="inductor", dynamic=True)
+            compiled(torch.ones(2, 3))
+
+            # Verify artifact was logged
+            self.assertIn('"inductor_runtime_and_tensor_meta"', self.buffer.getvalue())
+
+            payload = payload_buffer.getvalue().strip()
+            if payload:
+                data = json.loads(payload)
+                ops = data.get("ops", [])
+                simplified_ops = []
+                for op in ops:
+                    outs = [
+                        {
+                            "shape": out.get("shape", []),
+                            "stride": out.get("stride", []),
+                            "dtype": out.get("dtype", None),
+                        }
+                        for out in op.get("outputs", [])
+                    ]
+                    if outs:
+                        simplified_ops.append(
+                            {
+                                "type": op.get("type", ""),
+                                "outputs": outs,
+                            }
+                        )
+                self.assertExpectedInline(
+                    {"ops": simplified_ops[-1:]} if simplified_ops else {"ops": []},
+                    (
+                        "{'ops': [{'type': 'compute', 'outputs': ["
+                        "{'shape': [2], 'stride': [1], 'dtype': 'float32'}, "
+                        "{'shape': [2], 'stride': [1], 'dtype': 'float16'}]}]}"
+                    ),
+                )
+        self.assertParses()
+

 if __name__ == "__main__":
     from torch._dynamo.test_case import run_tests
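Both tensor-metadata tests post-process the logged JSON payload with the same reduction; standalone, the loop is equivalent to this snippet (the sample payload is invented for illustration):

import json

payload = json.dumps(
    {"ops": [{"type": "compute", "runtime_us": 12,
              "outputs": [{"shape": [2], "stride": [1], "dtype": "float16"}]}]}
)

data = json.loads(payload)
simplified_ops = [
    {
        "type": op.get("type", ""),
        "outputs": [
            {
                "shape": out.get("shape", []),
                "stride": out.get("stride", []),
                "dtype": out.get("dtype", None),
            }
            for out in op.get("outputs", [])
        ],
    }
    for op in data.get("ops", [])
    if op.get("outputs")  # drop ops without output tensors
]
assert simplified_ops == [
    {"type": "compute",
     "outputs": [{"shape": [2], "stride": [1], "dtype": "float16"}]}
]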


@@ -500,6 +500,7 @@ class TestDynamoTimed(TestCase):
         'inductor_fx_remote_cache_hit_keys': None,
         'inductor_fx_remote_cache_miss_count': None,
         'inductor_fx_remote_cache_miss_keys': None,
+        'inline_inbuilt_nn_modules_candidate': False,
         'is_forward': True,
         'is_runtime': False,
         'joint_graph_pass_time_us': 0,
@@ -583,6 +584,7 @@ class TestDynamoTimed(TestCase):
         'inductor_fx_remote_cache_hit_keys': None,
         'inductor_fx_remote_cache_miss_count': None,
         'inductor_fx_remote_cache_miss_keys': None,
+        'inline_inbuilt_nn_modules_candidate': False,
         'is_forward': True,
         'is_runtime': False,
         'joint_graph_pass_time_us': 0,
@@ -677,6 +679,7 @@ class TestDynamoTimed(TestCase):
         'inductor_fx_remote_cache_hit_keys': None,
         'inductor_fx_remote_cache_miss_count': None,
         'inductor_fx_remote_cache_miss_keys': None,
+        'inline_inbuilt_nn_modules_candidate': False,
         'is_forward': False,
         'is_runtime': False,
         'joint_graph_pass_time_us': None,
@@ -760,6 +763,7 @@ class TestDynamoTimed(TestCase):
         'inductor_fx_remote_cache_hit_keys': None,
         'inductor_fx_remote_cache_miss_count': None,
         'inductor_fx_remote_cache_miss_keys': None,
+        'inline_inbuilt_nn_modules_candidate': False,
         'is_forward': False,
         'is_runtime': False,
         'joint_graph_pass_time_us': None,

Some files were not shown because too many files have changed in this diff.