Compare commits

..

87 Commits

Author SHA1 Message Date
378a55ccdb Only print dde partial fx graph for export (#153218)
* Only print dde partial fx graph for export

Get #149831 into 2.7.1

* [dynamo] Add test to ensure we don't print fx graph upon data dependent graph break

This adds a regression test for #149831, also as part of getting it
cherry-picked into 2.7.1.

ghstack-source-id: fedc9ea409d1f86ee3b2cad95d161f61b8b9236f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153416
2025-05-14 11:57:16 -07:00
800aa04bac Revert "Cleanup VS 2019 refs in pytorch (#145863)" (#152613) (#153390)
This reverts commit b45e6fa707ced2adb68eaf1a2c1ccb389a6283d7.

revert PRs:
https://github.com/pytorch/pytorch/pull/145863
https://github.com/pytorch/pytorch/pull/145319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152613
Approved by: https://github.com/atalman, https://github.com/malfet
2025-05-14 11:55:57 -07:00
c14233ded1 Make numpy check optional (#153421)
Make numpy check optional (#149356)

We may want to skip numpy smoke tests. Hence making it optional

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149356
Approved by: https://github.com/ZainRizvi

(cherry picked from commit 6e2b2660b9996e7c3870107c493f2b89f4ffb73c)

Co-authored-by: atalman <atalman@fb.com>
2025-05-14 11:53:16 -07:00
3bfe0711d2 Add device guard for xpu conv on multi device (#153345)
Add device guard for xpu conv on multi device (#153067)

# Motivation
fixes https://github.com/pytorch/pytorch/issues/153022
The root cause is that the XPU backend registers the convolution op using `m.impl`, which bypasses the device guard logic typically added by the code generation system. This can lead to unexpected behavior if the current device isn't explicitly set.

# Additional Context
run the following script
```python
import torch
import torchvision.models as models

torch.manual_seed(0)

model = models.resnet50(weights="ResNet50_Weights.DEFAULT")
model.eval()
data = torch.rand(1, 3, 224, 224)

device = torch.device('xpu:1')  # 'xpu:0'
model = model.to(device=device, dtype=torch.float16)
data = data.to(device, dtype=torch.float16)

with torch.no_grad():
    ret = model(data)
    print(ret)

print("Execution finished")
```
The output is
```bash
         -9.2102e-02, -7.7588e-01, -1.4111e+00, -9.2383e-01,  6.4551e-01,
         -6.0730e-03, -7.8271e-01, -1.1904e+00, -4.1602e-01,  3.2715e-02,
         -4.9854e-01, -6.3623e-01, -8.5107e-01, -6.8555e-01, -9.4434e-01,
         -8.8672e-01, -6.7969e-01, -6.9824e-01, -2.8882e-01,  2.0312e+00]],
       device='xpu:1', dtype=torch.float16)
Execution finished

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153067
Approved by: https://github.com/albanD, https://github.com/EikanWang

(cherry picked from commit e06a08059a378ee48a3e33b413c4c0763661aae0)

Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
2025-05-14 11:23:15 -07:00
fa98236357 [release only] Bump triton version to 3.3.1 (#153554)
Build triton 3.3.1 
Please see https://github.com/triton-lang/triton/issues/6805 for the cherry-picks included
2025-05-14 11:02:54 -07:00
f77213d3da Update version to 2.7.1 2025-05-12 21:04:16 -07:00
1a3161ae5a [Cherry-pick] Fix copysign + scalar correctness issue (#153098)
* [Testing] Add copysign from scalar regression test (#152997)

But instead of adding it just for MPS backend, add it to OpInfo

Fixes https://github.com/pytorch/pytorch/issues/152582
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152997
Approved by: https://github.com/wdvr

(cherry picked from commit 9919d6b8724a4edbafcacf180c42bdbb1fb3a7ed)

* Spiritual cherry-pick of 52cbcac640dc1d8c12a6f8308799c75d45c6fc4a

* [CI] Skip test_copy_large_tensor on M2-15 runners (#150377)

They have more than 12Gb memory, but may be running this test causes OOM in CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150377
Approved by: https://github.com/atalman
2025-05-09 14:54:04 -07:00
27e9ca5d36 [MKLDNN] Check that strides are positive (#153092)
[MKLDNN] Check that strides are positive (#151848)

For pooling ops. Prevents division-by-zero when argument is wrong

Fixes https://github.com/pytorch/pytorch/issues/149274

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151848
Approved by: https://github.com/atalman

(cherry picked from commit 6f327128a99debfb2312ee256523ad6b62f763d6)

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2025-05-07 16:45:25 -07:00
dab8130f4f [vec128] Fix fmsub NEON defintion (#153093)
[vec128] Fix fmsub NEON defintion (#152075)

As reported in https://github.com/pytorch/pytorch/issues/149292, according to manual, `vfmsq_f32` implements `c - a * b` rather than `a * b - c`, so it's call must be prefixed with `vnegq_f32`

Also, adjust the tests to use OpMath for FMA computation to avoid accuracy error accumulation due to non-fused multiply-and-add over lower precision dtypes

Note that `Vectorized::fmsub` is not currently instantiated anywhere, so it could safely remain broken

TODO:
 - Enable C++ testing on MacOS and/or aarch64 platforms (right now Mac tests are build without C++ tests)

Fixes https://github.com/pytorch/pytorch/issues/149292

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152075
Approved by: https://github.com/swolchok
ghstack dependencies: #151955

(cherry picked from commit 2ea865339185b4b59324b4a494d088d0a5aeab88)

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2025-05-07 18:05:06 -04:00
20d62a8d25 Fix tensorpipe compilation with clang-17 (#153091)
Fix tensorpipe compilation with clang-17 (#151344)

By suppressing `missing-template-arg-list-after-template-kw` warning, which seems to be required to compile Google's libnop, which is in a semi-abandoned state now
```
In file included from /Users/malfet/git/pytorch/pytorch/third_party/tensorpipe/third_party/libnop/include/nop/base/variant.h:21:
/Users/malfet/git/pytorch/pytorch/third_party/tensorpipe/third_party/libnop/include/nop/types/variant.h:241:30: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  241 |     index_ = value_.template Construct(std::forward<Args>(args)...);
      |                              ^
/Users/malfet/git/pytorch/pytorch/third_party/tensorpipe/third_party/libnop/include/nop/types/variant.h:258:26: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  258 |     if (!value_.template Assign(TypeTag<T>{}, index_, std::forward<U>(value))) {
      |                          ^
/Users/malfet/git/pytorch/pytorch/third_party/tensorpipe/third_party/libnop/include/nop/types/variant.h:265:26: error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
  265 |     if (!value_.template Assign(index_, std::forward<T>(value))) {
      |                          ^
3 errors generated.
```

Fixes https://github.com/pytorch/pytorch/issues/151316

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151344
Approved by: https://github.com/ZainRizvi, https://github.com/seemethere

(cherry picked from commit 331423e5c24170b218e743b3392acbad4480340d)

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-05-07 17:07:42 -04:00
cd885e7c9a [Cherry Pick] Remove cuda dependencies from non cuda buids #152333 (#153089)
* Remove cuda dependencies from non cuda buids

* regenerate

* regenerate
2025-05-07 17:06:04 -04:00
99847860ea [BE] Move all lint runner to 24.04 (#153080)
* [BE] Move all lint runner to 24.04 (#150427)

As Ubuntu-20 reached EOL on Apr 1st, see https://github.com/actions/runner-images/issues/11101
This forces older python version to be 3.8
Delete all linux-20.04 runners from the lintrunner.yml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150427
Approved by: https://github.com/seemethere

(cherry picked from commit 48af2cdd270c275acccc4a94b04e4ccdb64d557a)

* Update few more cases
2025-05-07 13:38:42 -07:00
24b0c4abfc [ATen][CUDA] Optimize 128 bit vectorization (#152967)
[ATen][CUDA] Optimize 128 bit vectorization (#148320)

Fixes #147376.
As per request: https://github.com/pytorch/pytorch/pull/145746#pullrequestreview-2642118301
This PR omits sm80 or older of using vec8 kernels due to long compilation and large binary size.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148320
Approved by: https://github.com/eqy, https://github.com/malfet, https://github.com/atalman

(cherry picked from commit 72337bdcf2f86eb72f289fdbd7eb63fd664aaa86)

Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
2025-05-07 11:09:39 -07:00
2dc4b15cf3 [dynamo][super variable] Fix bug to use correct source (#152774) 2025-05-06 14:25:49 -04:00
cd6037ed4b [cudagraphs] Fix issue in collecting static_input_idxs (#152768)
[cudagraphs] Fix issue in collecting static_input_idxs (#152287)

related to https://github.com/pytorch/pytorch/issues/152275

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152287
Approved by: https://github.com/bdhirsh, https://github.com/eellison


(cherry picked from commit 4a63cab624a798929191779743c62ed65926de58)

Co-authored-by: Brian Hirsh <hirsheybar@fb.com>
2025-05-06 14:13:32 -04:00
1341794745 Gracefully handle optree less than minimum version, part 2 (#151323)
Gracefully handle optree less than minimum version, part 2 (#151257)

If optree is less than the minimum version, we should pretend it doesn't
exist.

The problem right now is:
- Install optree==0.12.1
- `import torch._dynamo`
- This raise an error "min optree version is 0.13.0"

The fix is to pretend optree doesn't exist if it is less than the min
version.

There are ways to clean up this PR more (e.g. have a single source of
truth for the version, some of the variables are redundant), but I am
trying to reduce the risk as much as possible for this to go into 2.7.

Test Plan:

I verified the above problem was fixed. Also tried some other things,
like the following, which now gives the expected behavior.
```py
>>> import torch
>>> import optree
>>> optree.__version__
'0.12.1'
>>> import torch._dynamo
>>> import torch._dynamo.polyfills.pytree
>>> import torch.utils._pytree
>>> import torch.utils._cxx_pytree
ImportError: torch.utils._cxx_pytree depends on optree, which is
an optional dependency of PyTorch. To u
se it, please upgrade your optree package to >= 0.13.0
```

I also audited all non-test callsites of optree and torch.utils._cxx_pytree.
Follow along with me:

optree imports
- torch.utils._cxx_pytree. This is fine.
- [guarded by check] f76b7ef33c/torch/_dynamo/polyfills/pytree.py (L29-L31)

_cxx_pytree imports
- [guarded by check] torch.utils._pytree (changed in this PR)
- [guarded by check] torch/_dynamo/polyfills/pytree.py (changed in this PR)
- [guarded by try-catch] f76b7ef33c/torch/distributed/_functional_collectives.py (L17)
- [guarded by try-catch] f76b7ef33c/torch/distributed/tensor/_op_schema.py (L15)
- [guarded by try-catch] f76b7ef33c/torch/distributed/tensor/_dispatch.py (L35)
- [guarded by try-catch] f76b7ef33c/torch/_dynamo/variables/user_defined.py (L94)
- [guarded by try-catch] f76b7ef33c/torch/distributed/tensor/experimental/_func_map.py (L14)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151257
Approved by: https://github.com/malfet, https://github.com/XuehaiPan

(cherry picked from commit f1f18c75c9fc85df3cba8fe38582b1ddeefb270a)

Co-authored-by: rzou <zou3519@gmail.com>
2025-04-15 15:56:51 -07:00
073912749d Gracefully handle optree less than minimum version (#150977)
Gracefully handle optree less than minimum version (#150956)

Summary:
- We are saying the minimum version of pytree that PyTorch can use is
  0.13.0
- If a user imports torch.utils._cxx_pytree, it will raise an
  ImportError if optree doesn't exist or exists and is less than the
  minimum version.

Fixes https://github.com/pytorch/pytorch/issues/150889. There are
actually two parts to that issue:
1. dtensor imports torch.utils._cxx_pytree, but the optree installed in
   the environment might be too old. Instead, raising ImportError in
   torch.utils._cxx_pytree solves the issue.
2. We emit an "optree too low version" warning. I've deleted the
   warning in favor of the more explicit ImportError.

Test Plan:
- code reading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150956
Approved by: https://github.com/albanD, https://github.com/atalman, https://github.com/XuehaiPan

(cherry picked from commit 061832bc7a6711daaaf2bca12c2140bd8dea7eb5)

Co-authored-by: rzou <zou3519@gmail.com>
2025-04-10 10:39:40 -04:00
0c236f3c72 Update triton wheel build, setuptools pin (#150953)
Update triton wheel build, setuptools pin (#150931)

Observing failure in release workflow:
https://github.com/pytorch/pytorch/actions/runs/14346340202/job/40216804374

```
Traceback (most recent call last):
  File "/opt/python/cp311-cp311/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 11, in <module>
    from setuptools.command.bdist_wheel import bdist_wheel as bdist_wheel
ModuleNotFoundError: No module named 'setuptools.command.bdist_wheel'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/tmppwpqef_x/triton/python/setup.py", line 27, in <module>
    from wheel.bdist_wheel import bdist_wheel
  File "/opt/python/cp311-cp311/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 13, in <module>
    raise ImportError(ERROR) from exc
ImportError: The 'wheel.bdist_wheel' module has been removed.
Please update your setuptools to v70.1 or later.
If you're explicitly importing 'wheel.bdist_wheel', please update your import to point to 'setuptools.command.bdist_wheel' instead.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150931
Approved by: https://github.com/Skylion007

(cherry picked from commit d0e34822663b759f17ef5e6ec574cbf820c23b85)

Co-authored-by: atalman <atalman@fb.com>
2025-04-10 10:39:03 -04:00
c7ff78dfc0 Fix inplacing with multiple, fused uses (#150892)
Fix inplacing with multiple, fused uses (#150845)

We had `can_inplace` defined on a single use. When that buffer has multiple uses inside a fused node, we need to check if the other accesses have the same index. Otherwise we may read memory that has already been written to from inplacing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150845
Approved by: https://github.com/zou3519, https://github.com/exclamaforte, https://github.com/atalman, https://github.com/jansel

(cherry picked from commit 27ded359a5dcbe8f92e01a24bec258bbfe1a73d6)

Co-authored-by: eellison <elias.ellison@gmail.com>
2025-04-08 20:35:02 -04:00
894909a613 Revert "[CUDA] Only use vec128 if CUDA version is newer than 12.8" (#150855)
Revert "[CUDA] Only use vec128 if CUDA version is newer than 12.8 (#150818)"

This reverts commit 3f236f19032ff6424160018c024478c83b6ad6b9.
2025-04-08 18:49:02 -04:00
ef2b1390ed [Manylinux 2.28] Correct Linux aarch64 cuda binaries wheel name (#150820)
[Manylinux 2.28] Correct Linux aarch64 cuda binaries wheel name (#150786)

Related to: https://github.com/pytorch/pytorch/issues/149044#issuecomment-2784044555
For CPU binaries we run auditwheel however for cuda binaries auditwheel produces invalid results . Hence we need to rename the file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150786
Approved by: https://github.com/malfet

(cherry picked from commit 836955bdbdeb299e6937065299564fb44ec422c2)

Co-authored-by: atalman <atalman@fb.com>
2025-04-07 23:07:41 -04:00
3f236f1903 [CUDA] Only use vec128 if CUDA version is newer than 12.8 (#150818)
[CUDA] Only use vec128 if CUDA version is newer than 12.8 (#150705)

By addressing a feedback requested at https://github.com/pytorch/pytorch/pull/145746
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150705
Approved by: https://github.com/atalman

(cherry picked from commit 5228986c395dc79f90d2a2b991deea1eef188260)

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2025-04-07 23:06:01 -04:00
35f1e76212 Reland of "[ROCm] change preferred blas lib defaults (#150249)"" (#150707)
Revert "Revert "[ROCm] change preferred blas lib defaults (#150249)" (#150658)"

This reverts commit 06c6a81a987e271d35a5da9501b4a17915bb8206.
2025-04-04 19:34:45 -04:00
a6321d6227 Revert "Dont exclude constant_pad_nd in prologue fusion" (#150699)
Revert "Dont exclude constant_pad_nd in prologue fusion (#150145)"

This reverts commit 6569576c4ecfb9b094a3b8a0b3db7c6e8b48f49d.
2025-04-04 15:51:44 -04:00
1cc51c640a [CUDA][avgpool2d] Fix backward launch bounds again for sm100, sm120 (#150676)
[CUDA][avgpool2d] Fix backward launch bounds again for `sm100`, `sm120` (#150640)

`__CUDA_ARCH__` is not visible in host code, which causes incorrect launch bounds and `too many resources requested for launch` on blackwell

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150640
Approved by: https://github.com/malfet, https://github.com/drisspg, https://github.com/atalman

(cherry picked from commit 09c4da9325595f0091c81f5c47fc4ee1df0c4094)

Co-authored-by: Eddie Yan <eddiey@nvidia.com>
2025-04-04 07:09:23 -07:00
28ca4dd77d update get start xpu document for v2.7 (#150633)
update get start xpu document for v2.7 (#150397)

update get start xpu document for v2.7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150397
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/atalman

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit 96f35f55e2676cfa76c28fb8f88e9f3cde08c59c)

Co-authored-by: ZhaoqiongZ <106125927+ZhaoqiongZ@users.noreply.github.com>
2025-04-03 20:40:52 -04:00
06c6a81a98 Revert "[ROCm] change preferred blas lib defaults (#150249)" (#150658)
This reverts commit 8b6bc59e9552689e115445649b76917b9487a181.
2025-04-03 20:39:27 -04:00
3b61d5d4e3 Update expected results for pr_time_benchmarks (#150620) 2025-04-03 10:14:13 -04:00
8b6bc59e95 [ROCm] change preferred blas lib defaults (#150249)
* [ROCm] change preferred blas lib defaults (#150212)

Fixes #148883
Fixes #150155

Also adds at::BlasBackend:Default. Instinct cards prefer hipBLASLt, everything else prefers rocBLAS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150212
Approved by: https://github.com/jeffdaily

(cherry picked from commit 7a470c932060190b314fe18bc1cec75335e4831f)

* add unit test for preferred_blas_library settings

---------

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-04-03 09:26:58 -04:00
c2ccaa3c21 [inductor] Fix inductor windows linker error (#150447)
[inductor] Fix inductor windows linker error (#150256)

Fixes #149889

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150256
Approved by: https://github.com/anijain2305, https://github.com/eellison

(cherry picked from commit 37ebb0b56a3af1a5e8083337b4d670fc70fe23a3)

Co-authored-by: Jason Ansel <jansel@meta.com>
2025-04-02 17:36:04 -07:00
6569576c4e Dont exclude constant_pad_nd in prologue fusion (#150145)
Dont exclude constant_pad_nd in prologue fusion (#149947)

Originally, I excluded constant_pad_nd from fusing to be conservative on compilation time. But, on benchmarking, you do occasionally get speedups by fusing it. Also includes a fix for making single, contiguous dep for prologues.

For instance, the following benchmark gets a 7% speedup by fusing in the constant_pad_nd.

```
import torch
import torch.nn.functional as F
torch._inductor.config.force_disable_caches = True

padded_N = 2048
n_pad_rows = 100

K, N = 2048, 4096

tensor1 = torch.randn(padded_N - n_pad_rows, 4096, device="cuda").to(torch.bfloat16)
tensor2 = torch.randn(4096, 4096, device="cuda").to(torch.bfloat16)

@torch.compile(mode='max-autotune-no-cudagraphs')
def masked_linear(input, weight, n_pad_input_rows):
    """
    Linear layer with input padded by `n_pad_input_rows` rows
    """
    # Use constant_pad_nd to pad with zeros for the invalid rows
    padded_input = F.pad(tensor1, (0, 0, 0, n_pad_input_rows), "constant", 0)
    return F.linear(padded_input, weight)

# Invoke the function
masked_linear(tensor1, tensor2, n_pad_rows)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149947
Approved by: https://github.com/drisspg

(cherry picked from commit 4c57aec5b9a37e23caedfe305fb4577e26254123)

Co-authored-by: eellison <elias.ellison@gmail.com>
2025-04-02 20:24:52 -04:00
5416dff2b2 [Release/2.7][MPS] Warn that torch.compile is a prototype (#150550)
And reference https://github.com/pytorch/pytorch/issues/150121
2025-04-02 14:55:19 -07:00
791265114e Revert "[fx] Move Node._prepend/Node._remove_from_list to C++ (#148261)" (#150572)
This reverts commit 5d4e7d58b42623a9024a84f0050967ff0318dcdb.
2025-04-02 14:47:28 -07:00
7ad8bc7e8b [Windows][inductor] fix blank space break windows file path (#150448)
[Windows][inductor] fix blank space break windows file path (#149388)

Fixes #149310

From origin error message:
```cmd
Command:
cl /I C:/Program Files/Python310/Include /I c:/code/.env/lib/site-packages/torch/include /I c:/code/.env/lib/site-packages/torch/include/torch/csrc/api/include /I c:/code/.env/lib/site-packages/torch/include/TH /I c:/code/.env/lib/site-packages/torch/include/THC /D TORCH_INDUCTOR_CPP_WRAPPER /D STANDALONE_TORCH_HEADER /D C10_USING_CUSTOM_GENERATED_MACROS /DLL /MD /O2 /std:c++20 /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /openmp /openmp:experimental C:/Users/user/AppData/Local/Temp/torchinductor_user/ou/coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.cpp /LD /FeC:/Users/user/AppData/Local/Temp/torchinductor_user/ou/coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.pyd /link /LIBPATH:c:/code/.env/Scripts/libs /LIBPATH:c:/code/.env/lib/site-packages/torch/lib torch.lib torch_cpu.lib torch_python.lib sleef.lib

Output:
Microsoft (R) C/C++ Optimizing Compiler Version 19.43.34809 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

cl : Command line warning D9025 : overriding '/openmp' with '/openmp:experimental'
cl : Command line warning D9024 : unrecognized source file type 'Files/Python310/Include', object file assumed
coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.cpp
C:/Users/user/AppData/Local/Temp/torchinductor_user/ou/coubnfnqsm2gbdzdytufv46jotd6sxsnnhgldiw45pl5yjq5nbvz.cpp(21): fatal error C1083: Cannot open include file: 'Python.h': No such file or directory
```
Python installed in `C:/Program Files/Python310` path, and the blank space break the file path.

Solution:
Add quotes to declare Windows file paths, after that:
```cmd
cl /I "C:/Users/Xuhan/.conda/envs/new_build/Include" /I "C:/Users/Xuhan/.conda/envs/new_build/lib/site-packages/torch/include" /I "C:/Users/Xuhan/.conda/envs/new_build/lib/site-packages/torch/include/torch/csrc/api/include"  /D TORCH_INDUCTOR_CPP_WRAPPER /D STANDALONE_TORCH_HEADER /D  C10_USING_CUSTOM_GENERATED_MACROS /D CPU_CAPABILITY_AVX512  /DLL /MD /O2 /std:c++20 /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /openmp /openmp:experimental  C:/Users/Xuhan/AppData/Local/Temp/tmp1wsj0m8r/za/czarp3ly5c22ge3hydvnzvad4cjimyr3hkwvofodxqffgil7frfd.cpp  /arch:AVX512  /FeC:/Users/Xuhan/AppData/Local/Temp/tmp1wsj0m8r/za/czarp3ly5c22ge3hydvnzvad4cjimyr3hkwvofodxqffgil7frfd.pyd /LD /link /LIBPATH:"C:/Users/Xuhan/.conda/envs/new_build/libs" /LIBPATH:"C:/Users/Xuhan/.conda/envs/new_build/lib/site-packages/torch/lib"  "torch.lib" "torch_cpu.lib" "torch_python.lib" "sleef.lib"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149388
Approved by: https://github.com/jansel

(cherry picked from commit bc1b8730a45e659dca83ec83995c17d4eec9c869)

Co-authored-by: Xu Han <xu.han@outlook.com>
2025-04-02 13:21:44 -07:00
f2ee3f4847 [BE] Fix triton windows build (#150547)
[BE] Fix triton windows build (#150512)

Fixes #150480
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150512
Approved by: https://github.com/atalman

Co-authored-by: Andrey Talman <atalman@fb.com>
(cherry picked from commit 8102272d8c5b5a3063446ec67877eea495e6d323)

Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>
2025-04-02 09:54:45 -07:00
dfd39fe14f [cherry-pick] [CI] Disable some tests that are failing in periodic #150059 (#150327)
* [CI] Disable some tests that are failing in periodic (#150059)

Disabling some tests to restore periodic

nogpu avx512 timeout:
59f14d19ae (38492953496-box)

profiler failure: 7ae0ce6360 (38461255009-box)

test_accelerator failure:
87bfd66c3c (39476723746-box)
origin: 146098

test_overrides failure:
bf752c36da (39484562957-box)
origin: 146098

inductor cpu repro:
bb9c426024 (38447525659-box)

functorch eager transforms:
8f858e226b (39488068620-box)
f2cea01f71 (39555064878)
b5281a4a18 (39599355600)
either 148288 or 148261?

2ec9aceaeb/1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150059
Approved by: https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/malfet

* disable_CompiledOptimizerParityTests

* Update test/inductor/test_compiled_optimizers.py

---------

Co-authored-by: Catherine Lee <csl@fb.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-04-01 23:05:14 -07:00
b766c0200a [Cherry-pick] Make PyTorch buildable with cmake-4 (#150460)
* [Cmake] Make PyTorch buildable by CMake-4.x (#150203)

By turning on compatibility mode for protobuf, nnpack, PSimd and FP16, ittapi, TensorPipe and Gloo
Update CMake requirements

 Revert 0ece461ccafe5649d2d0f058ff5477765fd56499 and b0901d62ae2c2e909f91401eacebf3731df20cbe to test that it actually works

TODO:
  - Update/get rid of those libraries

Fixes https://github.com/pytorch/pytorch/issues/150149

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150203
Approved by: https://github.com/clee2000

(cherry picked from commit 493c7fa66f82cf781ee0f9d0cc9e305688f0a286)

* Make PyTorch buildable by CMake-4.x on s390x (#150294)

This is a continuation of
https://github.com/pytorch/pytorch/pull/150203
that fixes nightly build on s390x.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150294
Approved by: https://github.com/malfet

(cherry picked from commit ab342d3793472c65aaa0b007ca13a98fc9206dc5)

---------

Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
2025-04-01 19:37:52 -07:00
a3cd7b0cc4 [MPS] tril op not handling infs correctly (#150479)
[MPS] tril op not handling infs correctly (#149866)

Fixes #149813

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149866
Approved by: https://github.com/malfet

(cherry picked from commit ba46643df181f37efe594f9dd77b45436e08e6ec)

Co-authored-by: Isalia20 <irakli.salia854@gmail.com>
2025-04-01 16:30:38 -07:00
8522972133 torch.backends.mkldnn.flags() CM should not warn (#150416)
`torch.backends.mkldnn.flags()` CM should not warn (#150358)

By returning `None` rather than `False` from `THPModule_allowTF32OneDNN` when USE_XPU is not defined

Added regression test

Fixes https://github.com/pytorch/pytorch/issues/149829

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150358
Approved by: https://github.com/atalman

(cherry picked from commit 6470b373c16017f5cb8f1aa4060bb60632b18160)

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2025-04-01 08:53:27 -07:00
c4b98c8364 [Build] Fix XPU builds inside venv (#150301)
Update the torch-xpu-ops commit to [3ee2bd2f13e1ed17a685986ff667a58bed5f2aa5](3ee2bd2f13)

 - Fix the build error if users build torch xpu through python virtual environment. It was due to that torch-xpu-ops uses `${PYTHON_EXECUTABLE}` to get python path. However, `${PYTHON_EXECUTABLE}` is the sytem python path, while the pytorch root cmake is using the Python_EXECUTABLE ([Here](420a9be743/tools/setup_helpers/cmake.py (L310))) https://github.com/intel/torch-xpu-ops/issues/1461
 - code diff (026b2c8c7c..3ee2bd2f13)
   - base commit: 026b2c8c7c92a7b2cec5d26334006e3423251cc6
   - new commit: 3ee2bd2f13e1ed17a685986ff667a58bed5f2aa5

(cherry picked from commit f74d5d576aedf053b7574f3eb06d12417d80625a)

Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>
2025-04-01 08:22:00 -07:00
d10ffd76db [Doc] Update CMAKE_PREFIX_PATH for XPU windows README (#150395)
[Doc] Update CMAKE_PREFIX_PATH for XPU windows README (#148863)

We found that the `pip install cmake` and `conda install cmake` has different behavior.
The reason is that the pip installed one doesn't find the corresponding libs under conda env. So we need to set the `CMAKE_PREFIX_PATH` for alignment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148863
Approved by: https://github.com/CuiYifeng, https://github.com/malfet

Co-authored-by: Cui, Yifeng <yifeng.cui@intel.com>
(cherry picked from commit ce52674b7651921630019de62323ee0bfd69516d)

Co-authored-by: Stonepia <tong.su@intel.com>
2025-04-01 10:56:57 -04:00
53a13e553d Enabling xpu in OffsetBasedRNGTracker . (#150389)
Enabling xpu in OffsetBasedRNGTracker . (#148360)

Else torch.distributed breaks on xpu devices.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148360
Approved by: https://github.com/zhangxiaoli73, https://github.com/guangyey, https://github.com/gujinghui, https://github.com/XilunWu, https://github.com/kwen2501

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
(cherry picked from commit f0e1a0838c1245a8763d1c67318b23940a3e9246)

Co-authored-by: _githubsgi <zozoxoxo897@gmail.com>
2025-04-01 10:54:51 -04:00
5745d6a770 [ROCm] cmake 4 workaround for hiprtc (#150361)
[ROCm] cmake 4 workaround for hiprtc (#150324)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150324
Approved by: https://github.com/jeffdaily, https://github.com/atalman, https://github.com/malfet

(cherry picked from commit 423e4a4568958845da52808e50d1cdd2ba7fa48d)

Co-authored-by: Faa Diallo <Faa.Diallo@amd.com>
2025-04-01 10:44:03 -04:00
60ddcd803e Revert "[PGNCCL] Launch kernel on current stream & remove record_stream entirely (#148590) (#150352)
Revert "[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely (#148590)"

This reverts commit ef6296e7f20d744a0cfed81cab573d60204e7626.
2025-03-31 15:25:34 -07:00
f2b3b5c453 [MPS] Fix dot/mm for conj_tensors (#150237)
[MPS] Fix dot/mm for conj_tensors (#150157)

- Distinguish between conjugated/non_conjugated inputs by appending conjugation to the operator key
- For matmul or dot, add `conjugateWithTensor:name:` calls before running the op
- Enable testing for conjugated ops by passing `include_conjugated_inputs` to opinfo
- Filter  `include_conjugated_inputs` argument from `sample_inputs_window` (probably should have landed as separate PR)
- Preserve conj property when gathering the views, that fixes `cov` operator

Fixes https://github.com/pytorch/pytorch/issues/148156
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150157
Approved by: https://github.com/dcci

(cherry picked from commit 7c65911b11fc1cc7d93045f4cf923058e8a27782)

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2025-03-31 16:13:11 -04:00
71fa7def26 Fix #149806 : Fix path lookup in _preload_cuda_deps (#150068)
Fix #149806 : Fix path lookup in _preload_cuda_deps (#149808)

@pytorchbot label "bug"

Fixes #149806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149808
Approved by: https://github.com/jansel

(cherry picked from commit 68b327341c748c869fdd7cb51cd05ab8ad6caaac)

Co-authored-by: Divain <fegnouche@hotmail.fr>
2025-03-31 13:07:56 -07:00
1a6c192dc4 Use schema as source of truth + support ones_like/empty_like (#149775)
Use schema as source of truth + support ones_like/empty_like (#149052)

This change does 2 important things:
(a) Instead of relying on IValue type as source of truth, we use the schema as the source of truth, which is important as IValue types are overloaded and can ambiguously convert incorrectly. For example, a MemoryFormat will look like an int + get converted to an int64_t vs a MemoryFormat!

(b) This PR expands support for many more types to encompass way more schemas, e.g., Optional, Device, dtype, etc. The main win from this PR is the ability for aoti_torch_call_dispatcher to call TensorFactory ops like ones_like/empty_like!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149052
Approved by: https://github.com/albanD

(cherry picked from commit 988827cdfb6d5946049cac7141a5ca04f2177c0a)

Co-authored-by: Jane Xu <janeyx@meta.com>
2025-03-31 11:03:26 -07:00
e691e92297 Update Doc for Intel XPU Profiling (#150272)
Update Doc for Intel XPU Profiling (#134515)

Updated below two pages for Intel XPU
https://pytorch.org/docs/stable/torch.compiler_profiling_torch_compile.html
https://pytorch.org/docs/stable/profiler.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134515
Approved by: https://github.com/dvrogozh, https://github.com/malfet

(cherry picked from commit 7aacbab0b32596a3c334dca5d488e4620b79bb5e)

Co-authored-by: Louie Tsai <louie.tsai@intel.com>
2025-03-31 09:42:08 -07:00
2b73f403c7 Pin cmake to 3.31.2 for windows conda install (#150223)
Pin cmake to 3.31.2 for windows conda install (#150185)

Trying to fix nightly failures
Cmake 4.0 update https://pypi.org/project/cmake/4.0.0/ broke nightly builds
You can see it here: https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50&name_filter=cuda11_8-build
and here: https://hud.pytorch.org/hud/pytorch/pytorch/nightly/1?per_page=50&name_filter=
This fix for Windows Builds. Linux and MacOS where already fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150185
Approved by: https://github.com/jeanschmidt, https://github.com/ZainRizvi

(cherry picked from commit b0901d62ae2c2e909f91401eacebf3731df20cbe)

Co-authored-by: Andrey Talman <atalman@fb.com>
2025-03-28 14:28:41 -07:00
697cd9bbb1 [inductor][triton 3.3] Fix cpp_wrapper w/ TMA in triton 3.3 (#149993)
[inductor][triton 3.3] Fix cpp_wrapper w/ TMA in triton 3.3 (#149973)

Fixes #148938

Context:

In triton 3.3, triton kernels expect a global scratch space arg to be passed in. This is fixed in #148051, which fixed most of the AOTI/cpp_wrapper failures; the fix is to inject a (null) global scratch space arg passed as an argument to all kernels.

But in the case of TMA, we need to call a non-triton-generated function - init1DTMADescriptor. The same `generate_args_decl` function used for calling triton kernels (and modified in #148051 to insert a global scratch space) is used to prepare the arguments to init1DTMADescriptor, and so it had an extra global scratch space arg. Then we'd get a null pointer passed into init1DTMADescriptor, resulting in an IMA later on when the TMA use kernel

This PR: adds an option to `generate_args_decl` to specify whether this is a triton kernel (in which case we should add the global scratch space arg) or not (when we shouldn't add the extra arg).

Note: this doesn't appear in CI because we don't run these tests with Hopper machines in CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149973
Approved by: https://github.com/drisspg

(cherry picked from commit a8d0c5c92818186119d4a94d98999acc3f549a7e)

Co-authored-by: David Berard <dberard@fb.com>
2025-03-28 16:40:47 -04:00
64ca70f83c Pin cmake==3.31.6 (#150193)
Pin cmake==3.31.6 (#150158)

I'm not sure if this is the right think to do, but cmake 4.0.0 got released on pypi and our builds are failing with it

Example:
aa70d62041 (39555975425-box)

I guess we have to go change all the cmake_minimum_required to >=3.5?

backwards compat still failing because its building with the base commit which this pr can't really change until it gets merged, but at least manywheel binary builds got past where they were originally failing

Also pin the conda installation, but the most recent version on conda is 3.31.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150158
Approved by: https://github.com/cyyever, https://github.com/malfet

(cherry picked from commit 0ece461ccafe5649d2d0f058ff5477765fd56499)

Co-authored-by: Catherine Lee <csl@fb.com>
2025-03-28 09:09:29 -07:00
1b84fd1503 Enable fast path for qlinear (static/dynamic) and qadd for AArch64 though ACL directly. (#149435)
* Enable fast qlinear static/dynamic path for AArch64 through ACL directly (#148585)

This enables a fast path for eager mode static/dynamic quantization for AArch64 through Arm Compute Library (ACL) directly.

Context: PRs #126687, #139887 enabled an optimized implementation for `qlinear` and `qlinear_dynamic` for aarch64 through `ideep → oneDNN → ACL` which improved performance by ~10x compared to the previous implementation.
However, the current `qlinear` and `qlinear_dynamic` path (`ideep → oneDNN → ACL`) suffers from high overhead due to the API friction between the stateless oneDNN API and the stateful ACL low-precision GEMM (`lowp_gemm`) API - for example, ACL's `lowp_gemm` objects cache information like weights reduction or weights in optimized memory format which oneDNN does not allow due to its stateless nature.
Hence, ACL currently runs a (redundant) sum of columns and pre-transposition (to the gemm kerne's optimal format) for each GEMM operation.
This PR addresses the sub-optimalities above by integrating ACL directly with `qlinear` and `qlinear_dynamic`.

- **For `qlinear_dynamic` (dynamically quantized matmuls):**

This PR yields an ****average speedup** (averaged over context_lengths of 2^3 up to 2^9) of ~ **50%** for `bert-base-uncased`, `bert-large-uncased`, `roberta-base`, `distilbert-base-uncased`** with 16 threads on a Neoverse-V1 (with transformers==4.48) for the benchmarking script below:
```
# SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliate <open-source-office@arm.com>
# SPDX-License-Identifier: BSD-3-Clause
import torch
from transformers import AutoModel, AutoConfig
import time
import numpy as np
from argparse import ArgumentParser

class ModelArgumentParser(ArgumentParser):
    def __init__(self) -> None:
        super().__init__(description="huggingface model")
        self.add_argument("--context_length",
                            help="context length - number of input tokens",
                            type=int,
                            default=64
        )
        self.add_argument("--model",
                            help="model checkpoint - i.e. 'bert-base-uncased'",
                            type=str,
                            default=None)
        self.add_argument("--iters",
                          help="benchmark iterations",
                          default=500)

if __name__ == "__main__":
    parser = ModelArgumentParser()
    args = parser.parse_args()
    model_name = args.model
    config = AutoConfig.from_pretrained(model_name)
    batch_size = 1
    model = AutoModel.from_pretrained(model_name)
    model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
    model.eval()
    inputs = torch.randint(config.vocab_size, (batch_size, args.context_length), dtype=torch.long, device="cpu")
    times = []
    with torch.no_grad():
        # warmup
        for _ in range(10):
            model(inputs)
        # benchmark
        for _ in range(args.iters):
            s = time.time_ns()
            model(inputs)
            times.append((time.time_ns() - s) / 1e6)

    print("Model = ", model_name)
    print("Context Length = ", args.context_length)
    print("Min (ms) = ", min(times))
    print("Mean (ms) = ", np.mean(times))
```

- **For `qlinear` (statically quantized matmuls):**

This PR yields an **average speedup of 2x for signed activations (`s8s8s8`) and 95x for unsigned activations (u8s8u8)** on a Neoverse-V1 with 16 threads for the benchmarking script below.
The averages are over for all combinations of `M = [8, 16, ..., 512]`, `K = [768, 1024, 2048, 4096]`, `N = [768, 1024, 2048, 4096]`.
The astronomical speedup for unsigned activation is because oneDNN v3.7 does not have an optimized implementation for `u8s8u8` on AArch64.

```
# SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliate <open-source-office@arm.com>
# SPDX-License-Identifier: BSD-3-Clause
import torch
import torch.nn as nn
from torch.quantization import QConfig
from torch.ao.quantization.observer import HistogramObserver, default_weight_observer
import torch
import torch.nn as nn
import numpy as np
import random
from argparse import ArgumentParser
import time

class ModelArgumentParser(ArgumentParser):
    def __init__(self) -> None:
        super().__init__()
        self.add_argument("--M",
                            help="M dimension",
                            type=int,
                            default=64
        )
        self.add_argument("--K",
                            help="K dimension",
                            type=int,
                            default=64
        )
        self.add_argument("--N",
                            help="N dimension",
                            type=int,
                            default=64
        )
        self.add_argument("--signed_input",
                            help="Use (signed) torch.qint8 for inputs instead of (unsigned) torch.quint8",
                            action="store_true"
        )
        self.add_argument("--seed",
                          help="Random seed",
                          type=int,
                          default=42
        )
        self.add_argument("--iters",
                          help="benchmark iterations",
                          default=500)

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

class LinearModel(nn.Module):
    def __init__(self, K, N):
        super(LinearModel, self).__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(K, N)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.fc(x)
        x = self.dequant(x)
        return x

def quantize_model(model, args):
    qconfig = QConfig(
            activation=HistogramObserver.with_args(reduce_range=False,
            dtype=torch.qint8 if args.signed_input else torch.quint8),
            weight=default_weight_observer,
    )
    # Prepare the model for static quantization
    # Specify quantization configurations
    model.qconfig = qconfig
    model_prepared = torch.quantization.prepare(model_fp32)

    # Calibrate the model with sample inputs
    # Example input data for calibration
    with torch.no_grad():
        sample_data = torch.randn(args.M, args.K)
        model_prepared(sample_data)
    # Convert the prepared model to a quantized model
    model_quantized = torch.quantization.convert(model_prepared)
    return model_quantized

if __name__ == "__main__":
    parser = ModelArgumentParser()
    args = parser.parse_args()

    set_seed(args.seed)
    model_fp32 = LinearModel(args.K, args.N)
    model_quantized = quantize_model(model_fp32, args)

    inputs = torch.randn(args.M, args.K)
    times = []
    with torch.no_grad():
        # warmup
        for _ in range(10):
            model_quantized(inputs)
        # benchmark
        for _ in range(args.iters):
            s = time.time_ns()
            model_quantized(inputs)
            times.append((time.time_ns() - s) / 1e6)

    print("M,K,N,signed = ", args.M, args.K, args.N, args.signed_input)
    print("Min Times (ms) = ", min(times))
    print("Mean Times (ms) = ", np.mean(times))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148585
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit 08a644a4c4a0f74cf3277e85e265a44a192079c5)

* Enable qint8 and quint8 add for AArch64 using ACL directly (#148653)

This enables qint8 and quint8 add for AArch64 through Arm Compute Library (ACL) directly.
Relative performance improvement using OMP_NUM_THREADS=1 is ~15x, using OMP_NUM_THREADS=32 it’s ~5.4x.

Co-authored-by: David Svantesson <david.svantesson-yeung@arm.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148653
Approved by: https://github.com/malfet
ghstack dependencies: #148585

(cherry picked from commit 6c2db8fab047b8a1d671c3c8dfbdd4c478c6d2e3)

* [Build] Guard per-op headers in ACLUtils.cpp (#149417)

To fix internal build failures, where per-op headers are not generated.
We really should have lint for something like that.

Test Plan: CI

Reviewed By: izaitsevfb

Differential Revision: D71406882

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149417
Approved by: https://github.com/Skylion007, https://github.com/izaitsevfb

(cherry picked from commit 5db3a4ac88ad9a3062a9f64dc64741b820208a91)

---------

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2025-03-28 07:51:16 -07:00
6b27e11a5b [Release-only] Pin intel-oneapi-dnnl to 2025.0.1-6 (#150132)
[CI] Fix the XPU CI build environment
2025-03-27 16:07:33 -07:00
18a926f547 update release 2.7 xla pin (#150126)
* update release 2.7 xla pin

* fix

* fix
2025-03-27 18:29:03 -04:00
ecd434bea9 Revert "Parallelize sort" (#150128)
Revert "Parallelize sort (#149765)"

This reverts commit 8d2186cd7952336d4f8b3f73648a5c0714a832b9 as it causes inductor test regression, see 5bed3fafc7/1
2025-03-27 12:04:15 -07:00
5bed3fafc7 [ROCm] Fixes and improvements to CUDA->HIP flag conversion for CPP extensions (#149432)
[ROCm] Fixes and improvements to CUDA->HIP flag conversion for CPP extensions (#149245)

Fixes https://github.com/ROCm/hip/issues/3764.

Fixes and improvements to CUDA->HIP flag conversion for CPP extensions

- Log flag conversion for debugging purposes.
- Fix cases where it should not touch the -I flags or cases where CUDA appears more than once by replacing only the first instance.
- Fix case where nvcc key may not exist
- Fix case where hipify should ignore flag values and only touch the flag itself

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149245
Approved by: https://github.com/jeffdaily

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
(cherry picked from commit c0566e0dbf42f633624adb02015742509edcb444)

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
2025-03-26 17:18:20 -07:00
9b4f085526 [MPS] fix attention enable_gqa crash on mps (#150067)
[MPS] fix attention enable_gqa crash on mps (#149147)

Fixes #149132

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149147
Approved by: https://github.com/malfet

(cherry picked from commit dd6e9df3d00851c44fb76341f3113fe9223dcfca)

Co-authored-by: Isalia20 <irakli.salia854@gmail.com>
2025-03-26 17:17:11 -07:00
d29e4c81d9 update aotinductor doc for XPU support (#149935)
update aotinductor doc for XPU support (#149299)

as title. Since the AOTInductor feature starting from 2.7 works on Intel GPU, add the related contents into its doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149299
Approved by: https://github.com/guangyey, https://github.com/desertfire

(cherry picked from commit 4ea580568a27e281b96d26d9380c786c2e2116e6)

Co-authored-by: Jing Xu <jing.xu@intel.com>
2025-03-26 18:42:56 -05:00
8d2186cd79 Parallelize sort (#149765)
Parallelize sort (#149505)

PR #142391 erroneously used `USE_OMP` instead of `USE_OPENMP`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149505
Approved by: https://github.com/fadara01, https://github.com/Skylion007

(cherry picked from commit 842d51500be144d53f4d046d31169e8f46c063f6)

Co-authored-by: Annop Wongwathanarat <annop.wongwathanarat@arm.com>
2025-03-26 16:47:46 -05:00
b04d8358d9 ci/docker: use NCCL 2.26.2-1 (#149874)
ci/docker: use NCCL 2.26.2-1 (#149778)

Related to #149153

This updates some build scripts to hopefully fix the nightly builds which are somehow building against nccl 2.25.1 and using 2.26.2 from pip.

Test plan:

After merging rerun nightly linux jobs and validate that nccl version matches
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149778
Approved by: https://github.com/Skylion007, https://github.com/atalman

Co-authored-by: Andrey Talman <atalman@fb.com>
(cherry picked from commit ddc0fe903f3043246103d71b60a4fff0aeeef9e8)

Co-authored-by: Tristan Rice <rice@fn.lc>
2025-03-26 16:36:35 -04:00
d80afc07f0 [cherry-pick] Modify cuda aarch64 install for cudnn and nccl. Cleanup aarch64 cuda 12.6 docker #149540 (#149624)
Modify cuda aarch64 install for cudnn and nccl. Cleanup aarch64 cuda 12.6 docker (#149540)

1. Use NCCL_VERSION=v2.26.2-1 . Fixes nccl cuda aarch64 related failure we see here: https://github.com/pytorch/pytorch/actions/runs/13955856471/job/39066681549?pr=149443 . After landing: https://github.com/pytorch/pytorch/pull/149351
TODO: Followup required to unify NCCL definitions across the x86 and aarch64 builds

3. Cleanup Remove older CUDA versions for aarch64 builds . CUDA 12.6 where removed by: https://github.com/pytorch/pytorch/pull/148895
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149540
Approved by: https://github.com/seemethere, https://github.com/malfet, https://github.com/nWEIdia
2025-03-26 13:33:54 -07:00
84210a82ef [cherry-pick] nccl: upgrade to 2.26.2 to avoid hang on ncclCommAbort (#149351) (#149546)
* nccl: upgrade to 2.26.2 to avoid hang on ncclCommAbort (#149351)

Fixes #149153

Yaml generated from:

```
python .github/scripts/generate_ci_workflows.py
```

Test plan:

Repro in https://gist.github.com/d4l3k/16a19b475952bc40ddd7f2febcc297b7

```
rm -rf third_party/nccl
python setup.py develop
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149351
Approved by: https://github.com/kwen2501, https://github.com/atalman, https://github.com/malfet

* fixed_regenerations

---------

Co-authored-by: Tristan Rice <rice@fn.lc>
2025-03-26 13:32:51 -07:00
4268b2f40a [MPSInductor] Move threadfence at the right location (#150037)
[MPSInductor] Move threadfence at the right location (#149437)

Not sure how it worked in the past, but fence should be before first read from the shared memory, not after it.
This bug was exposed by https://github.com/pytorch/pytorch/pull/148969 which removed unnecessary barrier before calling `threadgroup_reduce` functions
Test plan:
```
% python3 generate.py --checkpoint_path checkpoints/stories15M/model.pth --prompt "Once upon a time" --device mps --compile
```
Before that it produced gibberish, now it works fine
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149437
Approved by: https://github.com/manuelcandales, https://github.com/dcci

(cherry picked from commit 61a64c20c402e61027dad4a9e7a192ec0971d1d6)

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-03-26 11:17:09 -07:00
12a6d2a0b8 Add triton as dependency to CUDA aarch64 build (#149945)
Add triton as dependency to CUDA aarch64 build (#149584)

Aarch64 Triton build was added by: https://github.com/pytorch/pytorch/pull/148705
Hence add proper contrain to CUDA 12.8 Aarch64 build

Please note we want to still use:
```platform_system == 'Linux' and platform_machine == 'x86_64'```
For all other builds.

Since these are prototype binaries only used by cuda 12.8 linux aarch64 build. Which we would like to serve from download.pytorch.org

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149584
Approved by: https://github.com/nWEIdia, https://github.com/tinglvv, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit 9b1127437e6ccf0c55a87607d9f551cc6424ca67)

Co-authored-by: Andrey Talman <atalman@fb.com>
2025-03-26 07:28:27 -07:00
464432ec47 Automate stable CUDA update and linter using min Python verison (#149981)
Automate stable CUDA update and linter using min Python verison (#148912)

1. Fixes: https://github.com/pytorch/pytorch/issues/145571 . Cuda Stable is the same cuda version that is published to pypi, also used to set Metadata section in the rest of whl scripts and tag the docker releases with latest tag.
2. Updates min python version used in linter
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148912
Approved by: https://github.com/Skylion007, https://github.com/malfet

(cherry picked from commit 29fd875bc125582f29abbdf5559d3941899680be)

Co-authored-by: atalman <atalman@fb.com>
2025-03-26 07:03:35 -07:00
1f612dafb5 Removing doc references to PRE_CXX11_ABI. (#149952)
Removing doc references to PRE_CXX11_ABI. (#149756)

Fixes #149550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149756
Approved by: https://github.com/svekars, https://github.com/atalman

(cherry picked from commit 43ee67e8dc6827fbb7d12a5950ddf6a5c80771dc)

Co-authored-by: Alanna Burke <burkealanna@meta.com>
2025-03-25 14:04:12 -07:00
f63def6ac7 [XPU] Update triton commit to fix to fix level_zero not found by env var LEVEL_ZERO_V1_SDK_PATH. (#149714)
[XPU] Update triton commit to fix to fix level_zero not found by env var LEVEL_ZERO_V1_SDK_PATH. (#149511)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149511
Approved by: https://github.com/EikanWang

(cherry picked from commit ee6a0291653cd507d59bda6bcf5d848099a804d1)

Co-authored-by: xinan.lin <xinan.lin@intel.com>
2025-03-25 16:04:51 -04:00
3a8e623a9b op should NOT be static in aoti_torch_call_dispatcher (#149644)
op should NOT be static in aoti_torch_call_dispatcher (#149208)

aoti_torch_call_dispatcher is meant to call different ops, so the op must not be static. Otherwise, every call to this API will call the first op that was ever called, which is not the intended behavior of any human being.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149208
Approved by: https://github.com/albanD, https://github.com/zou3519, https://github.com/malfet

(cherry picked from commit 740ce0fa5f8c7e9e51422b614f8187ab93a60b8b)

Co-authored-by: Jane Xu <janeyx@meta.com>
2025-03-25 16:03:19 -04:00
bf727425a0 Symintify transpose_ (#149632)
Symintify transpose_ (#149057)

Fixes https://github.com/pytorch/pytorch/issues/148702
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149057
Approved by: https://github.com/yushangdi

(cherry picked from commit 8d7c430e84f4ad439ebdc81f9ab496a3665033a4)

Co-authored-by: angelayi <yiangela7@gmail.com>
2025-03-25 15:59:42 -04:00
8c7dbc939f [cherry-pick] [CI] Don't clean workspace when fetching repo (#147994) (#149129)
Revert "[CI] Don't clean workspace when fetching repo (#147994)"

This reverts commit e5fef8a08ebb8548e8413ae54ef0ad9a11f1f4c0.
2025-03-25 15:25:19 -04:00
644fdbad95 [Intel GPU][PT2E] bugfix: use zero-point to decide conv src zp mask (#149631)
[Intel GPU][PT2E] bugfix: use zero-point to decide conv src zp mask (#149473)

# Motivation
The PR fix a bug that wrongly decides the zero-point mask setting. Specifically, it deems zero-point is always not zeros due to scale is used for judgement. Fortunately, the bug only affects the performance. The accuracy is not affected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149473
Approved by: https://github.com/EikanWang, https://github.com/guangyey

(cherry picked from commit d67c1a027e61bd68908bc4c8e5275a983521366c)

Co-authored-by: ZhiweiYan-96 <zhiwei.yan@intel.com>
2025-03-25 10:07:25 -05:00
fb027c5692 [cherry-pick] Update ExecuTorch pin update (#149539) (#149630)
Update ExecuTorch pin update (#149539)

Latest commit in https://hud.pytorch.org/hud/pytorch/executorch/viable%2Fstrict/1?per_page=50

Follow-up to https://github.com/pytorch/pytorch/issues/144480#issuecomment-2731150636

Also, need to incorporate change from https://github.com/pytorch/executorch/pull/8817

Test Plan:

Monitor  linux-jammy-py3-clang12-executorch test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149539
Approved by: https://github.com/larryliu0820

(cherry picked from commit bc86b6c55a4f7e07548a92fe7c9b52ad2c88af35)
2025-03-25 10:03:16 -05:00
3b87bd8b82 Fix atomic operation compatibility for ARMv8-A (Raspberry Pi 4) by adjusting compilation flags (#149878)
Fix atomic operation compatibility for ARMv8-A (Raspberry Pi 4) by adjusting compilation flags (#148070)

**Issue:**
* The ldaddal instruction is an AArch64 atomic operation available from ARMv8.1-A onwards.
* Raspberry Pi 4 (Cortex-A72) is ARMv8-A, which does not support ldaddal, leading to failures when running PyTorch built with march=armv8.2-a+sve
* This led to an issue when running PyTorch on ARMv8-A (Raspberry Pi 4), as unsupported atomic operations were generated.

**Fix:**
* Updated the build flags to explicitly use **-march=armv8-a+sve**, ensuring GCC and clang promotes it correctly and resolves compatibility issues with armv8 and still work correctly for SVE like before.
* This ensures that PyTorch builds correctly for ARMv8-A platforms (e.g., Raspberry Pi 4) while still enabling SVE for supported hardware.

Test plan:
 - Allocate `a1.4xlarge` on AWS
 - Run following script using wheel produced by this PR
 ```python
import torch
def f(x):
    return x.sin() + x.cos()

print(torch.__version__)
f_c = torch.jit.script(f)
```
- Observe no crash
```
$ python3 foo.py
2.7.0.dev20250313+cpu
```
- Observe crash with 2.6.0
```
$ python3 foo.py
2.6.0+cpu
Illegal instruction (core dumped)
```

Fixes #146792

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148070
Approved by: https://github.com/malfet

(cherry picked from commit 09f7f62cfebb0067b93d227c13fe9a94b51af762)

Co-authored-by: maajidkhann <maajidkhan.n@fujitsu.com>
2025-03-24 13:57:16 -07:00
89b098a677 Add release branch push triggers to inductor-rocm-mi300.yml (#149871)
Add release branch push triggers to inductor-rocm-mi300.yml (#149672)

In similar vein as https://github.com/pytorch/pytorch/pull/149517

When we added the rocm-mi300.yml earlier this year, we had lower capacity and we were just pipecleaning the workflow, so we set the trigger to only respond to pushes to main branch. But now we have more stability as well as capacity, and we would really like to ensure that the release branch is being tested on MI300s as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149672
Approved by: https://github.com/jeffdaily

(cherry picked from commit 1eab841185cb2d68a11e2e0604fd96d110778960)

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
2025-03-24 13:24:10 -07:00
4cc4302b32 Do not depend on numpy during the import (#149731)
Do not depend on numpy during the import (#149683)

But a good followup would be to use torch primitives instead of numpy here
Fixes https://github.com/pytorch/pytorch/issues/149681

Test plan: Monkey-patch 2.7.0-rc and run `python -c "import torch;print(torch.compile(lambda x:x.sin() + x.cos())(torch.rand(32)))"`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149683
Approved by: https://github.com/seemethere

(cherry picked from commit 68dfd44e50f59c53698a24985039a27351862963)

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-03-21 11:13:52 -07:00
c632e4fdb8 [ONNX] Expose verification utilities (#149375)
* [ONNX] Expose verification utilities (#148603)

Expose verification utilities to public documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148603
Approved by: https://github.com/titaiwangms

(cherry picked from commit ebabd0efdddd91e11364e42227b746c419a39be4)

* [ONNX] Update types in VerificationInfo (#149377)

torch.types.Number was rendered as is in the documentation and can be confusing. We write the original types instead to reduce confusion for users.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149377
Approved by: https://github.com/titaiwangms

---------

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2025-03-20 08:14:38 -07:00
b23bfae9f7 Add AOTI shim for _weight_int4pack_mm_cpu_tensor (#149031) (#149386)
**Summary**
Previous implementation of shim did not align with the design and it was removed by https://github.com/pytorch/pytorch/pull/148907
This PR adds it back in the files of MKLDNN backend and re-enable the CPP wrapper UT.

**Test plan**
```
pytest -s test/inductor/test_cpu_cpp_wrapper.py -k test_woq_int4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149031
Approved by: https://github.com/leslie-fang-intel, https://github.com/EikanWang, https://github.com/desertfire
2025-03-20 08:08:09 -07:00
1b8f496f87 Pin auditwheel to 6.2.0 (#149525) 2025-03-19 16:13:43 -07:00
c236b602ff Add release branch push triggers to rocm-mi300.yml (#149526) 2025-03-19 16:12:59 -07:00
6926f30654 BC fix for AOTIModelPackageLoader() constructor defaults (#149214)
BC fix for AOTIModelPackageLoader() constructor defaults (#149082)

The default value for `run_single_threaded` was wrongly specified in the .cpp file instead of the header, breaking C++-side instantiation of `AOTIModelPackageLoader` with no arguments. This PR fixes this and adds a test for the use case of running with `AOTIModelPackageLoader` instead of `AOTIModelContainerRunner` on the C++ side.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149082
Approved by: https://github.com/desertfire

(cherry picked from commit 5e1b715dda813d8c545378291261b565649df8e5)

Co-authored-by: Joel Schlosser <jbschlosser@meta.com>
2025-03-17 17:16:54 -05:00
483980d7f3 [AOTI][XPU] Fix: model_container_runner_xpu.cpp is not built into libtorch_xpu.so (#149242)
[AOTI][XPU] Fix: model_container_runner_xpu.cpp is not built into libtorch_xpu.so (#149175)

The missing of model_container_runner_xpu.cpp will cause compilation failure when user build CPP inference application on XPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149175
Approved by: https://github.com/jansel

(cherry picked from commit 9ad6265d044075d1ceb27cf0f2af7495e586003c)

Co-authored-by: xinan.lin <xinan.lin@intel.com>
2025-03-17 17:15:39 -05:00
7173a73cf4 [regression] Fix pin_memory() when it is called before device lazy initialization. (#149183)
[regression] Fix pin_memory() when it is called before device lazy initialization. (#149033)

PR #145752 has added a check in the isPinnedPtr to check if a device is initialized before checking if the tensor is pinned. Also that PR has added a lazy initialization trigger when an at::empty is called with a pinned param set to true. However, when the tensor is firstly created and it is pinned in a separate call by calling pin_memory() function, lazy device init is not called so is_pinned returns always false.

With this PR, the lazy initialization is moved to getPinnedMemoryAllocator function, thus it is assured that device is initialized before we pin a tensor.

Fixes #149032

@ngimel @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149033
Approved by: https://github.com/ngimel, https://github.com/albanD

(cherry picked from commit 420a9be743f8dd5d6296a32a1351c1baced12f1f)

Co-authored-by: Bartlomiej Stemborowski <bstemborowskix@habana.ai>
2025-03-17 15:37:24 -04:00
7bab7354df [cherry-pick] Revert #148823 - Make dynamism code robust to NotImplementedException (#149160)
Revert "Make dynamism code robust to NotImplementedException (#148823)"

This reverts commit 60576419a2a5cc09e4a92be870fda8f3fc305ddc.

Reverting from RC since it was reverted from the main branch
2025-03-14 10:50:24 -05:00
b1940b5867 Remove runtime dependency on packaging (#149125)
Remove runtime dependency on packaging (#149092)

Looks like after https://github.com/pytorch/pytorch/pull/148924
We are seeing this error in nightly test:
https://github.com/pytorch/pytorch/actions/runs/13806023728/job/38616861623

```
  File "/Users/runner/work/_temp/anaconda/envs/test_conda_env/lib/python3.13/site-packages/torch/_inductor/pattern_matcher.py", line 79, in <module>
    from .lowering import fallback_node_due_to_unsupported_type
  File "/Users/runner/work/_temp/anaconda/envs/test_conda_env/lib/python3.13/site-packages/torch/_inductor/lowering.py", line 7024, in <module>
    from . import kernel
  File "/Users/runner/work/_temp/anaconda/envs/test_conda_env/lib/python3.13/site-packages/torch/_inductor/kernel/__init__.py", line 1, in <module>
    from . import mm, mm_common, mm_plus_mm
  File "/Users/runner/work/_temp/anaconda/envs/test_conda_env/lib/python3.13/site-packages/torch/_inductor/kernel/mm.py", line 6, in <module>
    from packaging.version import Version
ModuleNotFoundError: No module named 'packaging'
```

Hence removing runtime dependency on packaging since it may not be installed by default

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149092
Approved by: https://github.com/drisspg, https://github.com/davidberard98

(cherry picked from commit 65d19a5699afbb0b123b6b264188f5610b925c5e)

Co-authored-by: atalman <atalman@fb.com>
2025-03-13 12:28:50 -04:00
abebbd5113 [Profiler][HPU] Fix incorrect availabilities for HPU (#149115)
[Profiler][HPU] Fix incorrect availabilities for HPU (#148663)

Fixes #148661

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148663
Approved by: https://github.com/jeromean, https://github.com/albanD

(cherry picked from commit 75c8b7d9725af59d8348379a4165d1252c4ac208)

Co-authored-by: wdziurdz <witold.dziurdz@intel.com>
2025-03-13 08:09:45 -07:00
cdd7a2c72b [RLEASE ONLY CHANGES] Apply release only chnages to release 2.7 (#149056)
* [RLEASE ONLY CHANGES] Apply release only chnages to release 2.7

* fix_lint_workflow

* docker_release

* fix_check_binary
2025-03-12 15:44:15 -04:00
d94ea2647c [inductor] Fix profiler tests with latest Triton (#149059)
[inductor] Fix profiler tests with latest Triton (#149025)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149025
Approved by: https://github.com/yanboliang

(cherry picked from commit 488c4480f97e9d537905b0fbcb7236f88dca47f7)

Co-authored-by: Jason Ansel <jansel@meta.com>
2025-03-12 15:21:51 -04:00
2972 changed files with 73923 additions and 148926 deletions

View File

@ -19,11 +19,13 @@ import boto3
# AMI images for us-east-1, change the following based on your ~/.aws/config
os_amis = {
"ubuntu18_04": "ami-078eece1d8119409f", # login_name: ubuntu
"ubuntu20_04": "ami-052eac90edaa9d08f", # login_name: ubuntu
"ubuntu22_04": "ami-0c6c29c5125214c77", # login_name: ubuntu
"redhat8": "ami-0698b90665a2ddcf1", # login_name: ec2-user
}
ubuntu18_04_ami = os_amis["ubuntu18_04"]
ubuntu20_04_ami = os_amis["ubuntu20_04"]
@ -657,6 +659,18 @@ def configure_system(
"sudo apt-get install -y python3-dev python3-yaml python3-setuptools python3-wheel python3-pip"
)
host.run_cmd("pip3 install dataclasses typing-extensions")
# Install and switch to gcc-8 on Ubuntu-18.04
if not host.using_docker() and host.ami == ubuntu18_04_ami and compiler == "gcc-8":
host.run_cmd("sudo apt-get install -y g++-8 gfortran-8")
host.run_cmd(
"sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100"
)
host.run_cmd(
"sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100"
)
host.run_cmd(
"sudo update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-8 100"
)
if not use_conda:
print("Installing Cython + numpy from PyPy")
host.run_cmd("sudo pip3 install Cython")
@ -1012,7 +1026,7 @@ if __name__ == "__main__":
install_condaforge_python(host, args.python_version)
sys.exit(0)
python_version = args.python_version if args.python_version is not None else "3.9"
python_version = args.python_version if args.python_version is not None else "3.8"
if args.use_torch_from_pypi:
configure_system(host, compiler=args.compiler, python_version=python_version)

View File

@ -10,3 +10,5 @@ example: `py2-cuda9.0-cudnn7-ubuntu16.04`. The Docker images that are
built on Jenkins and are used in triggered builds already have this
environment variable set in their manifest. Also see
`./docker/jenkins/*/Dockerfile` and search for `BUILD_ENVIRONMENT`.
Our Jenkins installation is located at https://ci.pytorch.org/jenkins/.

View File

@ -13,6 +13,10 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
echo 'Skipping tests'
exit 0
fi
if [[ "${BUILD_ENVIRONMENT}" == *-rocm* ]]; then
# temporary to locate some kernel issues on the CI nodes
export HSAKMT_DEBUG_LEVEL=4
fi
# These additional packages are needed for circleci ROCm builds.
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# Need networkx 2.0 because bellmand_ford was moved in 2.1 . Scikit-image by

View File

@ -34,5 +34,5 @@ See `build.sh` for valid build environments (it's the giant switch).
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
# Set flags (see build.sh) and build image
sudo bash -c 'TRITON=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
sudo bash -c 'PROTOBUF=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
```

View File

@ -1,6 +1,5 @@
ARG CUDA_VERSION=12.4
ARG BASE_TARGET=cuda${CUDA_VERSION}
ARG ROCM_IMAGE=rocm/dev-almalinux-8:6.3-complete
FROM amd64/almalinux:8 as base
ENV LC_ALL en_US.UTF-8
@ -9,6 +8,10 @@ ENV LANGUAGE en_US.UTF-8
ARG DEVTOOLSET_VERSION=11
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
RUN yum -y update
RUN yum -y install epel-release
RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel openssl-devel yum-utils autoconf automake make gcc-toolset-${DEVTOOLSET_VERSION}-toolchain
@ -38,12 +41,9 @@ RUN bash ./install_conda.sh && rm install_conda.sh
# Install CUDA
FROM base as cuda
ARG CUDA_VERSION=12.6
ARG CUDA_VERSION=12.4
RUN rm -rf /usr/local/cuda-*
ADD ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
ENV CUDA_HOME=/usr/local/cuda-${CUDA_VERSION}
# Preserve CUDA_VERSION for the builds
ENV CUDA_VERSION=${CUDA_VERSION}
@ -54,20 +54,18 @@ FROM cuda as cuda11.8
RUN bash ./install_cuda.sh 11.8
ENV DESIRED_CUDA=11.8
FROM cuda as cuda12.1
RUN bash ./install_cuda.sh 12.1
ENV DESIRED_CUDA=12.1
FROM cuda as cuda12.4
RUN bash ./install_cuda.sh 12.4
ENV DESIRED_CUDA=12.4
FROM cuda as cuda12.6
RUN bash ./install_cuda.sh 12.6
ENV DESIRED_CUDA=12.6
FROM cuda as cuda12.8
RUN bash ./install_cuda.sh 12.8
ENV DESIRED_CUDA=12.8
FROM ${ROCM_IMAGE} as rocm
ENV PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
ENV MKLROOT /opt/intel
# Install MNIST test data
FROM base as mnist
ADD ./common/install_mnist.sh install_mnist.sh
@ -75,8 +73,9 @@ RUN bash ./install_mnist.sh
FROM base as all_cuda
COPY --from=cuda11.8 /usr/local/cuda-11.8 /usr/local/cuda-11.8
COPY --from=cuda12.1 /usr/local/cuda-12.1 /usr/local/cuda-12.1
COPY --from=cuda12.4 /usr/local/cuda-12.4 /usr/local/cuda-12.4
COPY --from=cuda12.6 /usr/local/cuda-12.6 /usr/local/cuda-12.6
COPY --from=cuda12.4 /usr/local/cuda-12.8 /usr/local/cuda-12.8
# Final step
FROM ${BASE_TARGET} as final

View File

@ -1,70 +1,82 @@
#!/usr/bin/env bash
# Script used only in CD pipeline
set -exou pipefail
set -eou pipefail
image="$1"
shift
if [ -z "${image}" ]; then
echo "Usage: $0 IMAGENAME:ARCHTAG"
echo "Usage: $0 IMAGE"
exit 1
fi
# Go from imagename:tag to tag
DOCKER_TAG_PREFIX=$(echo "${image}" | awk -F':' '{print $2}')
DOCKER_IMAGE_NAME="pytorch/${image}"
CUDA_VERSION=""
ROCM_VERSION=""
EXTRA_BUILD_ARGS=""
if [[ "${DOCKER_TAG_PREFIX}" == cuda* ]]; then
# extract cuda version from image name and tag. e.g. manylinux2_28-builder:cuda12.8 returns 12.8
CUDA_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'cuda' '{print $2}')
EXTRA_BUILD_ARGS="--build-arg CUDA_VERSION=${CUDA_VERSION}"
elif [[ "${DOCKER_TAG_PREFIX}" == rocm* ]]; then
# extract rocm version from image name and tag. e.g. manylinux2_28-builder:rocm6.2.4 returns 6.2.4
ROCM_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'rocm' '{print $2}')
EXTRA_BUILD_ARGS="--build-arg ROCM_IMAGE=rocm/dev-almalinux-8:${ROCM_VERSION}-complete"
fi
case ${DOCKER_TAG_PREFIX} in
cpu)
BASE_TARGET=base
;;
cuda*)
BASE_TARGET=cuda${CUDA_VERSION}
;;
rocm*)
BASE_TARGET=rocm
;;
*)
echo "ERROR: Unknown docker tag ${DOCKER_TAG_PREFIX}"
exit 1
;;
esac
# TODO: Remove LimitNOFILE=1048576 patch once https://github.com/pytorch/test-infra/issues/5712
# is resolved. This patch is required in order to fix timing out of Docker build on Amazon Linux 2023.
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service
sudo systemctl daemon-reload
sudo systemctl restart docker
export DOCKER_BUILDKIT=1
TOPDIR=$(git rev-parse --show-toplevel)
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
docker build \
--target final \
--progress plain \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
--build-arg "DEVTOOLSET_VERSION=11" \
${EXTRA_BUILD_ARGS} \
-t ${tmp_tag} \
$@ \
-f "${TOPDIR}/.ci/docker/almalinux/Dockerfile" \
${TOPDIR}/.ci/docker/
CUDA_VERSION=${CUDA_VERSION:-12.1}
if [ -n "${CUDA_VERSION}" ]; then
case ${CUDA_VERSION} in
cpu)
BASE_TARGET=base
DOCKER_TAG=cpu
;;
all)
BASE_TARGET=all_cuda
DOCKER_TAG=latest
;;
*)
BASE_TARGET=cuda${CUDA_VERSION}
DOCKER_TAG=cuda${CUDA_VERSION}
;;
esac
(
set -x
# TODO: Remove LimitNOFILE=1048576 patch once https://github.com/pytorch/test-infra/issues/5712
# is resolved. This patch is required in order to fix timing out of Docker build on Amazon Linux 2023.
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service
sudo systemctl daemon-reload
sudo systemctl restart docker
docker build \
--target final \
--progress plain \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=11" \
-t ${DOCKER_IMAGE_NAME} \
$@ \
-f "${TOPDIR}/.ci/docker/almalinux/Dockerfile" \
${TOPDIR}/.ci/docker/
)
if [[ "${DOCKER_TAG}" =~ ^cuda* ]]; then
# Test that we're using the right CUDA compiler
docker run --rm "${tmp_tag}" nvcc --version | grep "cuda_${CUDA_VERSION}"
(
set -x
docker run --rm "${DOCKER_IMAGE_NAME}" nvcc --version | grep "cuda_${CUDA_VERSION}"
)
fi
GITHUB_REF=${GITHUB_REF:-$(git symbolic-ref -q HEAD || git describe --tags --exact-match)}
GIT_BRANCH_NAME=${GITHUB_REF##*/}
GIT_COMMIT_SHA=${GITHUB_SHA:-$(git rev-parse HEAD)}
DOCKER_IMAGE_BRANCH_TAG=${DOCKER_IMAGE_NAME}-${GIT_BRANCH_NAME}
DOCKER_IMAGE_SHA_TAG=${DOCKER_IMAGE_NAME}-${GIT_COMMIT_SHA}
if [[ "${WITH_PUSH:-}" == true ]]; then
(
set -x
docker push "${DOCKER_IMAGE_NAME}"
if [[ -n ${GITHUB_REF} ]]; then
docker tag ${DOCKER_IMAGE_NAME} ${DOCKER_IMAGE_BRANCH_TAG}
docker tag ${DOCKER_IMAGE_NAME} ${DOCKER_IMAGE_SHA_TAG}
docker push "${DOCKER_IMAGE_BRANCH_TAG}"
docker push "${DOCKER_IMAGE_SHA_TAG}"
fi
)
fi

View File

@ -85,6 +85,9 @@ elif [[ "$image" == *linter* ]]; then
DOCKERFILE="linter/Dockerfile"
fi
# CMake 3.18 is needed to support CUDA17 language variant
CMAKE_VERSION=3.18.5
_UCX_COMMIT=7bb2722ff2187a0cad557ae4a6afa090569f83fb
_UCC_COMMIT=20eae37090a4ce1b32bcce6144ccad0b49943e0b
if [[ "$image" == *rocm* ]]; then
@ -92,21 +95,22 @@ if [[ "$image" == *rocm* ]]; then
_UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d
fi
tag=$(echo $image | awk -F':' '{print $2}')
# It's annoying to rename jobs every time you want to rewrite a
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$tag" in
case "$image" in
pytorch-linux-focal-cuda12.6-cudnn9-py3-gcc11)
CUDA_VERSION=12.6.3
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9-inductor-benchmarks)
@ -114,10 +118,13 @@ case "$tag" in
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
@ -126,10 +133,13 @@ case "$tag" in
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
@ -138,10 +148,13 @@ case "$tag" in
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.13
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
@ -150,10 +163,13 @@ case "$tag" in
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.6-cudnn9-py3-gcc9-inductor-benchmarks)
@ -161,10 +177,13 @@ case "$tag" in
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
@ -173,10 +192,13 @@ case "$tag" in
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
@ -185,10 +207,13 @@ case "$tag" in
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.13
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
@ -197,81 +222,115 @@ case "$tag" in
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3-clang10-onnx)
ANACONDA_PYTHON_VERSION=3.9
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
ONNX=yes
;;
pytorch-linux-focal-py3.9-clang10)
ANACONDA_PYTHON_VERSION=3.9
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3.11-clang10)
ANACONDA_PYTHON_VERSION=3.11
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3.9-gcc9)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-rocm-n-1-py3)
pytorch-linux-focal-rocm-n-1-py3)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=6.2.4
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-rocm-n-py3)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=6.3
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-rocm-n-py3)
ANACONDA_PYTHON_VERSION=3.10
pytorch-linux-jammy-xpu-2024.0-py3)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=6.4
XPU_VERSION=0.5
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-xpu-2025.0-py3)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
XPU_VERSION=2025.0
NINJA_VERSION=1.9.0
TRITON=yes
;;
pytorch-linux-jammy-xpu-2025.1-py3)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=11
VISION=yes
XPU_VERSION=2025.1
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-py3.9-gcc11-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
CONDA_CMAKE=yes
TRITON=yes
DOCS=yes
INDUCTOR_BENCHMARKS=yes
@ -281,30 +340,40 @@ case "$tag" in
CUDA_VERSION=11.8
CUDNN_VERSION=9
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
TRITON=yes
;;
pytorch-linux-jammy-py3-clang12-asan)
ANACONDA_PYTHON_VERSION=3.9
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-py3-clang15-asan)
ANACONDA_PYTHON_VERSION=3.10
CLANG_VERSION=15
CONDA_CMAKE=yes
VISION=yes
;;
pytorch-linux-jammy-py3-clang18-asan)
ANACONDA_PYTHON_VERSION=3.10
CLANG_VERSION=18
CONDA_CMAKE=yes
VISION=yes
;;
pytorch-linux-jammy-py3.9-gcc11)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
CONDA_CMAKE=yes
TRITON=yes
DOCS=yes
UNINSTALL_DILL=yes
@ -312,12 +381,14 @@ case "$tag" in
pytorch-linux-jammy-py3-clang12-executorch)
ANACONDA_PYTHON_VERSION=3.10
CLANG_VERSION=12
CONDA_CMAKE=yes
EXECUTORCH=yes
;;
pytorch-linux-jammy-py3.12-halide)
CUDA_VERSION=12.6
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=11
CONDA_CMAKE=yes
HALIDE=yes
TRITON=yes
;;
@ -325,23 +396,29 @@ case "$tag" in
CUDA_VERSION=12.6
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=11
CONDA_CMAKE=yes
TRITON_CPU=yes
;;
pytorch-linux-focal-linter)
# TODO: Use 3.9 here because of this issue https://github.com/python/mypy/issues/13627.
# We will need to update mypy version eventually, but that's for another day. The task
# would be to upgrade mypy to 1.0.0 with Python 3.11
PYTHON_VERSION=3.9
ANACONDA_PYTHON_VERSION=3.9
CONDA_CMAKE=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn9-py3.9-linter)
PYTHON_VERSION=3.9
ANACONDA_PYTHON_VERSION=3.9
CUDA_VERSION=11.8
CONDA_CMAKE=yes
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
ACL=yes
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
# snadampal: skipping llvm src build install because the current version
# from pytorch/llvm:9.0.1 is x86 specific
SKIP_LLVM_SRC_BUILD_INSTALL=yes
@ -350,7 +427,10 @@ case "$tag" in
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
ACL=yes
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
# snadampal: skipping llvm src build install because the current version
# from pytorch/llvm:9.0.1 is x86 specific
SKIP_LLVM_SRC_BUILD_INSTALL=yes
@ -358,6 +438,8 @@ case "$tag" in
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
DB=yes
VISION=yes
echo "image '$image' did not match an existing build configuration"
if [[ "$image" == *py* ]]; then
@ -373,7 +455,8 @@ case "$tag" in
TRITON=yes
# To ensure that any ROCm config will build using conda cmake
# and thus have LAPACK/MKL enabled
fi
CONDA_CMAKE=yes
fi
if [[ "$image" == *centos7* ]]; then
NINJA_VERSION=1.10.2
fi
@ -389,6 +472,9 @@ case "$tag" in
if [[ "$image" == *glibc* ]]; then
extract_version_from_image_name glibc GLIBC_VERSION
fi
if [[ "$image" == *cmake* ]]; then
extract_version_from_image_name cmake CMAKE_VERSION
fi
;;
esac
@ -402,20 +488,14 @@ if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
fi
fi
no_cache_flag=""
progress_flag=""
# Do not use cache and progress=plain when in CI
if [[ -n "${CI:-}" ]]; then
no_cache_flag="--no-cache"
progress_flag="--progress=plain"
fi
# Build image
docker build \
${no_cache_flag} \
${progress_flag} \
--no-cache \
--progress=plain \
--build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "LLVMDEV=${LLVMDEV:-}" \
--build-arg "DB=${DB:-}" \
--build-arg "VISION=${VISION:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CENTOS_VERSION=${CENTOS_VERSION}" \
@ -423,12 +503,14 @@ docker build \
--build-arg "GLIBC_VERSION=${GLIBC_VERSION}" \
--build-arg "CLANG_VERSION=${CLANG_VERSION}" \
--build-arg "ANACONDA_PYTHON_VERSION=${ANACONDA_PYTHON_VERSION}" \
--build-arg "PYTHON_VERSION=${PYTHON_VERSION}" \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "TENSORRT_VERSION=${TENSORRT_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
--build-arg "VULKAN_SDK_VERSION=${VULKAN_SDK_VERSION}" \
--build-arg "SWIFTSHADER=${SWIFTSHADER}" \
--build-arg "CMAKE_VERSION=${CMAKE_VERSION:-}" \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
@ -436,6 +518,7 @@ docker build \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \
--build-arg "CONDA_CMAKE=${CONDA_CMAKE}" \
--build-arg "TRITON=${TRITON}" \
--build-arg "TRITON_CPU=${TRITON_CPU}" \
--build-arg "ONNX=${ONNX}" \
@ -444,7 +527,6 @@ docker build \
--build-arg "EXECUTORCH=${EXECUTORCH}" \
--build-arg "HALIDE=${HALIDE}" \
--build-arg "XPU_VERSION=${XPU_VERSION}" \
--build-arg "UNINSTALL_DILL=${UNINSTALL_DILL}" \
--build-arg "ACL=${ACL:-}" \
--build-arg "SKIP_SCCACHE_INSTALL=${SKIP_SCCACHE_INSTALL:-}" \
--build-arg "SKIP_LLVM_SRC_BUILD_INSTALL=${SKIP_LLVM_SRC_BUILD_INSTALL:-}" \
@ -462,7 +544,7 @@ docker build \
UBUNTU_VERSION=$(echo ${UBUNTU_VERSION} | sed 's/-rc$//')
function drun() {
docker run --rm "$tmp_tag" "$@"
docker run --rm "$tmp_tag" $*
}
if [[ "$OS" == "ubuntu" ]]; then
@ -510,23 +592,3 @@ if [ -n "$KATEX" ]; then
exit 1
fi
fi
HAS_TRITON=$(drun python -c "import triton" > /dev/null 2>&1 && echo "yes" || echo "no")
if [[ -n "$TRITON" || -n "$TRITON_CPU" ]]; then
if [ "$HAS_TRITON" = "no" ]; then
echo "expecting triton to be installed, but it is not"
exit 1
fi
elif [ "$HAS_TRITON" = "yes" ]; then
echo "expecting triton to not be installed, but it is"
exit 1
fi
# Sanity check cmake version. Executorch reinstalls cmake and I'm not sure if
# they support 4.0.0 yet, so exclude them from this check.
CMAKE_VERSION=$(drun cmake --version)
if [[ "$EXECUTORCH" != *yes* && "$CMAKE_VERSION" != *4.* ]]; then
echo "CMake version is not 4.0.0:"
drun cmake --version
exit 1
fi

View File

@ -17,8 +17,9 @@ RUN bash ./install_base.sh && rm install_base.sh
# Update CentOS git version
RUN yum -y remove git
RUN yum -y remove git-*
RUN yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm && \
sed -i 's/packages.endpoint/packages.endpointdev/' /etc/yum.repos.d/endpoint.repo
RUN yum -y install https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm || \
(yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm && \
sed -i "s/packages.endpoint/packages.endpointdev/" /etc/yum.repos.d/endpoint.repo)
RUN yum install -y git
# Install devtoolset
@ -39,6 +40,7 @@ RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
@ -46,6 +48,20 @@ COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
@ -59,7 +75,7 @@ COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION}
RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh
COPY ./common/install_amdsmi.sh install_amdsmi.sh
RUN bash ./install_amdsmi.sh
@ -73,6 +89,12 @@ ENV MAGMA_HOME /opt/rocm/magma
ENV LANG en_US.utf8
ENV LC_ALL en_US.utf8
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh

View File

@ -1 +1 @@
b173722085b3f555d6ba4533d6bbaddfd7c71144
01a22b6f16d117454b7d21ebdc691b0785b84a7f

View File

@ -1 +1 @@
v2.26.5-1
v2.26.2-1

View File

@ -99,6 +99,9 @@ install_centos() {
ccache_deps="asciidoc docbook-dtds docbook-style-xsl libxslt"
numpy_deps="gcc-gfortran"
# Note: protobuf-c-{compiler,devel} on CentOS are too old to be used
# for Caffe2. That said, we still install them to make sure the build
# system opts to build/use protoc and libprotobuf from third-party.
yum install -y \
$ccache_deps \
$numpy_deps \

View File

@ -9,7 +9,7 @@ install_ubuntu() {
# Instead use lib and headers from OpenSSL1.1 installed in `install_openssl.sh``
apt-get install -y cargo
echo "Checking out sccache repo"
git clone https://github.com/mozilla/sccache -b v0.10.0
git clone https://github.com/mozilla/sccache -b v0.9.1
cd sccache
echo "Building sccache"
cargo build --release

View File

@ -4,10 +4,16 @@ set -ex
if [ -n "$CLANG_VERSION" ]; then
if [[ $UBUNTU_VERSION == 22.04 ]]; then
if [[ $CLANG_VERSION == 9 && $UBUNTU_VERSION == 18.04 ]]; then
sudo apt-get update
# gpg-agent is not available by default on 18.04
sudo apt-get install -y --no-install-recommends gpg-agent
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-${CLANG_VERSION} main"
elif [[ $UBUNTU_VERSION == 22.04 ]]; then
# work around ubuntu apt-get conflicts
sudo apt-get -y -f install
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
if [[ $CLANG_VERSION == 18 ]]; then
apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main"
fi
@ -35,7 +41,7 @@ if [ -n "$CLANG_VERSION" ]; then
# clang's packaging is a little messed up (the runtime libs aren't
# added into the linker path), so give it a little help
clang_lib=("/usr/lib/llvm-$CLANG_VERSION/lib/clang/"*"/lib/linux")
echo "$clang_lib" >/etc/ld.so.conf.d/clang.conf
echo "$clang_lib" > /etc/ld.so.conf.d/clang.conf
ldconfig
# Cleanup package manager

View File

@ -0,0 +1,31 @@
#!/bin/bash
set -ex
[ -n "$CMAKE_VERSION" ]
# Remove system cmake install so it won't get used instead
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
apt-get remove cmake -y
;;
centos)
yum remove cmake -y
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"
# Download and install specific CMake version in /usr/local
pushd /tmp
curl -Os --retry 3 "https://cmake.org/files/${path}/${file}"
tar -C /usr/local --strip-components 1 --no-same-owner -zxf cmake-*.tar.gz
rm -f cmake-*.tar.gz
popd

View File

@ -7,7 +7,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://repo.anaconda.com/miniconda"
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
if [[ $(uname -m) == "aarch64" ]] || [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
BASE_URL="https://github.com/conda-forge/miniforge/releases/latest/download" # @lint-ignore
BASE_URL="https://github.com/conda-forge/miniforge/releases/latest/download"
CONDA_FILE="Miniforge3-Linux-$(uname -m).sh"
fi
@ -62,7 +62,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# libstdcxx from conda default channels are too old, we need GLIBCXX_3.4.30
# which is provided in libstdcxx 12 and up.
conda_install libstdcxx-ng=12.3.0 --update-deps -c conda-forge
conda_install libstdcxx-ng=12.3.0 -c conda-forge
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
if [[ $(uname -m) == "aarch64" ]]; then
@ -75,11 +75,19 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# and libpython-static for torch deploy
conda_install llvmdev=8.0.0 "libpython-static=${ANACONDA_PYTHON_VERSION}"
# Use conda cmake in some cases. Conda cmake will be newer than our supported
# min version (3.5 for xenial and 3.10 for bionic), so we only do it in those
# following builds that we know should use conda. Specifically, Ubuntu bionic
# and focal cannot find conda mkl with stock cmake, so we need a cmake from conda
if [ -n "${CONDA_CMAKE}" ]; then
conda_install cmake
fi
# Magma package names are concatenation of CUDA major and minor ignoring revision
# I.e. magma-cuda102 package corresponds to CUDA_VERSION=10.2 and CUDA_VERSION=10.2.89
# Magma is installed from a tarball in the ossci-linux bucket into the conda env
if [ -n "$CUDA_VERSION" ]; then
conda_run ${SCRIPT_FOLDER}/install_magma_conda.sh $(cut -f1-2 -d'.' <<< ${CUDA_VERSION})
${SCRIPT_FOLDER}/install_magma_conda.sh $(cut -f1-2 -d'.' <<< ${CUDA_VERSION}) ${ANACONDA_PYTHON_VERSION}
fi
# Install some other packages, including those needed for Python test reporting

View File

@ -3,11 +3,11 @@
set -uex -o pipefail
PYTHON_DOWNLOAD_URL=https://www.python.org/ftp/python
PYTHON_DOWNLOAD_GITHUB_BRANCH=https://github.com/python/cpython/archive/refs/heads # @lint-ignore
PYTHON_DOWNLOAD_GITHUB_BRANCH=https://github.com/python/cpython/archive/refs/heads
GET_PIP_URL=https://bootstrap.pypa.io/get-pip.py
# Python versions to be installed in /opt/$VERSION_NO
CPYTHON_VERSIONS=${CPYTHON_VERSIONS:-"3.9.0 3.10.1 3.11.0 3.12.0 3.13.0 3.13.0t"}
CPYTHON_VERSIONS=${CPYTHON_VERSIONS:-"3.8.1 3.9.0 3.10.1 3.11.0 3.12.0 3.13.0 3.13.0t"}
function check_var {
if [ -z "$1" ]; then

View File

@ -2,82 +2,140 @@
set -ex
arch_path=''
targetarch=${TARGETARCH:-$(uname -m)}
if [ ${targetarch} = 'amd64' ] || [ "${targetarch}" = 'x86_64' ]; then
arch_path='x86_64'
else
arch_path='sbsa'
fi
NCCL_VERSION=v2.26.2-1
CUDNN_VERSION=9.5.1.17
function install_cuda {
version=$1
runfile=$2
major_minor=${version%.*}
rm -rf /usr/local/cuda-${major_minor} /usr/local/cuda
if [[ ${arch_path} == 'sbsa' ]]; then
runfile="${runfile}_sbsa"
fi
runfile="${runfile}.run"
wget -q https://developer.download.nvidia.com/compute/cuda/${version}/local_installers/${runfile} -O ${runfile}
chmod +x ${runfile}
./${runfile} --toolkit --silent
rm -f ${runfile}
rm -f /usr/local/cuda && ln -s /usr/local/cuda-${major_minor} /usr/local/cuda
function install_cusparselt_040 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.4.0.7-archive.tar.xz
tar xf libcusparse_lt-linux-x86_64-0.4.0.7-archive.tar.xz
cp -a libcusparse_lt-linux-x86_64-0.4.0.7-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-x86_64-0.4.0.7-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
function install_cudnn {
cuda_major_version=$1
cudnn_version=$2
mkdir tmp_cudnn && cd tmp_cudnn
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
filepath="cudnn-linux-${arch_path}-${cudnn_version}_cuda${cuda_major_version}-archive"
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-${arch_path}/${filepath}.tar.xz
tar xf ${filepath}.tar.xz
cp -a ${filepath}/include/* /usr/local/cuda/include/
cp -a ${filepath}/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
function install_cusparselt_062 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.6.2.3-archive.tar.xz
tar xf libcusparse_lt-linux-x86_64-0.6.2.3-archive.tar.xz
cp -a libcusparse_lt-linux-x86_64-0.6.2.3-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-x86_64-0.6.2.3-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
function install_cusparselt_063 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.6.3.2-archive.tar.xz
tar xf libcusparse_lt-linux-x86_64-0.6.3.2-archive.tar.xz
cp -a libcusparse_lt-linux-x86_64-0.6.3.2-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-x86_64-0.6.3.2-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
function install_118 {
CUDNN_VERSION=9.1.0.70
echo "Installing CUDA 11.8 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.4.0"
install_cuda 11.8.0 cuda_11.8.0_520.61.05_linux
NCCL_VERSION=v2.21.5-1
echo "Installing CUDA 11.8 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.4.0"
rm -rf /usr/local/cuda-11.8 /usr/local/cuda
# install CUDA 11.8.0 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
chmod +x cuda_11.8.0_520.61.05_linux.run
./cuda_11.8.0_520.61.05_linux.run --toolkit --silent
rm -f cuda_11.8.0_520.61.05_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-11.8 /usr/local/cuda
install_cudnn 11 $CUDNN_VERSION
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
CUDA_VERSION=11.8 bash install_nccl.sh
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=11.8 bash install_cusparselt.sh
install_cusparselt_040
ldconfig
}
function install_124 {
CUDNN_VERSION=9.1.0.70
echo "Installing CUDA 12.4.1 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.6.2"
install_cuda 12.4.1 cuda_12.4.1_550.54.15_linux
echo "Installing CUDA 12.4.1 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.2"
rm -rf /usr/local/cuda-12.4 /usr/local/cuda
# install CUDA 12.4.1 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
chmod +x cuda_12.4.1_550.54.15_linux.run
./cuda_12.4.1_550.54.15_linux.run --toolkit --silent
rm -f cuda_12.4.1_550.54.15_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.4 /usr/local/cuda
install_cudnn 12 $CUDNN_VERSION
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
CUDA_VERSION=12.4 bash install_nccl.sh
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.4 bash install_cusparselt.sh
install_cusparselt_062
ldconfig
}
function install_126 {
CUDNN_VERSION=9.5.1.17
echo "Installing CUDA 12.6.3 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.6.3"
install_cuda 12.6.3 cuda_12.6.3_560.35.05_linux
echo "Installing CUDA 12.6.3 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.3"
rm -rf /usr/local/cuda-12.6 /usr/local/cuda
# install CUDA 12.6.3 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda_12.6.3_560.35.05_linux.run
chmod +x cuda_12.6.3_560.35.05_linux.run
./cuda_12.6.3_560.35.05_linux.run --toolkit --silent
rm -f cuda_12.6.3_560.35.05_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.6 /usr/local/cuda
install_cudnn 12 $CUDNN_VERSION
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
CUDA_VERSION=12.6 bash install_nccl.sh
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.6 bash install_cusparselt.sh
install_cusparselt_063
ldconfig
}
@ -182,17 +240,35 @@ function prune_126 {
}
function install_128 {
CUDNN_VERSION=9.8.0.87
echo "Installing CUDA 12.8.0 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.6.3"
CUDNN_VERSION=9.7.1.26
echo "Installing CUDA 12.8.0 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.3"
rm -rf /usr/local/cuda-12.8 /usr/local/cuda
# install CUDA 12.8.0 in the same container
install_cuda 12.8.0 cuda_12.8.0_570.86.10_linux
wget -q https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run
chmod +x cuda_12.8.0_570.86.10_linux.run
./cuda_12.8.0_570.86.10_linux.run --toolkit --silent
rm -f cuda_12.8.0_570.86.10_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.8 /usr/local/cuda
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
install_cudnn 12 $CUDNN_VERSION
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
CUDA_VERSION=12.8 bash install_nccl.sh
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.8 bash install_cusparselt.sh
install_cusparselt_063
ldconfig
}

View File

@ -0,0 +1,63 @@
#!/bin/bash
# Script used only in CD pipeline
set -ex
NCCL_VERSION=v2.26.2-1
CUDNN_VERSION=9.8.0.87
function install_cusparselt_063 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-sbsa/libcusparse_lt-linux-sbsa-0.6.3.2-archive.tar.xz
tar xf libcusparse_lt-linux-sbsa-0.6.3.2-archive.tar.xz
cp -a libcusparse_lt-linux-sbsa-0.6.3.2-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-sbsa-0.6.3.2-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
function install_128 {
echo "Installing CUDA 12.8.0 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.3"
rm -rf /usr/local/cuda-12.8 /usr/local/cuda
# install CUDA 12.8.0 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux_sbsa.run
chmod +x cuda_12.8.0_570.86.10_linux_sbsa.run
./cuda_12.8.0_570.86.10_linux_sbsa.run --toolkit --silent
rm -f cuda_12.8.0_570.86.10_linux_sbsa.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.8 /usr/local/cuda
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b ${NCCL_VERSION} --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
install_cusparselt_063
ldconfig
}
# idiomatic parameter and option handling in sh
while test $# -gt 0
do
case "$1" in
12.8) install_128;
;;
*) echo "bad argument $1"; exit 1
;;
esac
shift
done

View File

@ -5,7 +5,7 @@ if [[ -n "${CUDNN_VERSION}" ]]; then
mkdir tmp_cudnn
pushd tmp_cudnn
if [[ ${CUDA_VERSION:0:4} == "12.8" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.8.0.87_cuda12-archive"
CUDNN_NAME="cudnn-linux-x86_64-9.7.1.26_cuda12-archive"
elif [[ ${CUDA_VERSION:0:4} == "12.6" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.5.1.17_cuda12-archive"
elif [[ ${CUDA_VERSION:0:2} == "12" ]]; then

38
.ci/docker/common/install_db.sh Executable file
View File

@ -0,0 +1,38 @@
#!/bin/bash
set -ex
install_ubuntu() {
apt-get update
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -13,7 +13,7 @@ clone_executorch() {
# and fetch the target commit
pushd executorch
git checkout "${EXECUTORCH_PINNED_COMMIT}"
git submodule update --init --recursive
git submodule update --init
popd
chown -R jenkins executorch
@ -50,7 +50,8 @@ setup_executorch() {
pushd executorch
export PYTHON_EXECUTABLE=python
export CMAKE_ARGS="-DEXECUTORCH_BUILD_PYBIND=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
export EXECUTORCH_BUILD_PYBIND=ON
export CMAKE_ARGS="-DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
as_jenkins .ci/scripts/setup-linux.sh --build-tool cmake || true
popd

View File

@ -17,7 +17,7 @@ if [ -n "${UBUNTU_VERSION}" ];then
libopenblas-dev libeigen3-dev libatlas-base-dev libzstd-dev
fi
pip_install numpy scipy imageio cmake ninja
conda_install numpy scipy imageio cmake ninja
git clone --depth 1 --branch release/16.x --recursive https://github.com/llvm/llvm-project.git
cmake -DCMAKE_BUILD_TYPE=Release \
@ -35,9 +35,7 @@ git clone https://github.com/halide/Halide.git
pushd Halide
git checkout ${COMMIT} && git submodule update --init --recursive
pip_install -r requirements.txt
# NOTE: pybind has a requirement for cmake > 3.5 so set the minimum cmake version here with a flag
# Context: https://github.com/pytorch/pytorch/issues/150420
cmake -G Ninja -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_BUILD_TYPE=Release -S . -B build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -S . -B build
cmake --build build
test -e ${CONDA_PREFIX}/lib/python3 || ln -s python${ANACONDA_PYTHON_VERSION} ${CONDA_PREFIX}/lib/python3
cmake --install build --prefix ${CONDA_PREFIX}

View File

@ -14,9 +14,16 @@ function install_timm() {
local commit
commit=$(get_pinned_commit timm)
# TODO (huydhn): There is no torchvision release on 3.13 when I write this, so
# I'm using nightly here instead. We just need to package to be able to install
# TIMM. Removing this once vision has a release on 3.13
if [[ "${ANACONDA_PYTHON_VERSION}" == "3.13" ]]; then
pip_install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu124
fi
pip_install "git+https://github.com/huggingface/pytorch-image-models@${commit}"
# Clean up
conda_run pip uninstall -y torch torchvision triton
conda_run pip uninstall -y cmake torch torchvision triton
}
# Pango is needed for weasyprint which is needed for doctr

View File

@ -2,6 +2,8 @@
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
if [ -n "${UBUNTU_VERSION}" ]; then
apt update
apt-get install -y clang doxygen git graphviz nodejs npm libtinfo5
@ -13,8 +15,8 @@ chown -R jenkins pytorch
pushd pytorch
# Install all linter dependencies
pip install -r requirements.txt
lintrunner init
pip_install -r requirements.txt
conda_run lintrunner init
# Cache .lintbin directory as part of the Docker image
cp -r .lintbin /tmp

View File

@ -1,23 +1,26 @@
#!/usr/bin/env bash
# Script that installs magma from tarball inside conda environment.
# It replaces anaconda magma-cuda package which is no longer published.
# Execute it inside active conda environment.
# See issue: https://github.com/pytorch/pytorch/issues/138506
# Script that replaces the magma install from a conda package
set -eou pipefail
cuda_version_nodot=${1/./}
anaconda_dir=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
function do_install() {
cuda_version_nodot=${1/./}
anaconda_python_version=$2
MAGMA_VERSION="2.6.1"
magma_archive="magma-cuda${cuda_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"
(
set -x
tmp_dir=$(mktemp -d)
pushd ${tmp_dir}
curl -OLs https://ossci-linux.s3.us-east-1.amazonaws.com/${magma_archive}
tar -xvf "${magma_archive}"
mv include/* "${anaconda_dir}/include/"
mv lib/* "${anaconda_dir}/lib"
popd
)
MAGMA_VERSION="2.6.1"
magma_archive="magma-cuda${cuda_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"
anaconda_dir="/opt/conda/envs/py_${anaconda_python_version}"
(
set -x
tmp_dir=$(mktemp -d)
pushd ${tmp_dir}
curl -OLs https://ossci-linux.s3.us-east-1.amazonaws.com/${magma_archive}
tar -xvf "${magma_archive}"
mv include/* "${anaconda_dir}/include/"
mv lib/* "${anaconda_dir}/lib"
popd
)
}
do_install $1 $2

View File

@ -1,26 +0,0 @@
#!/bin/bash
set -ex
NCCL_VERSION=""
if [[ ${CUDA_VERSION:0:2} == "11" ]]; then
NCCL_VERSION=$(cat ci_commit_pins/nccl-cu11.txt)
elif [[ ${CUDA_VERSION:0:2} == "12" ]]; then
NCCL_VERSION=$(cat ci_commit_pins/nccl-cu12.txt)
else
echo "Unexpected CUDA_VERSION ${CUDA_VERSION}"
exit 1
fi
if [[ -n "${NCCL_VERSION}" ]]; then
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
pushd nccl
make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
popd
rm -rf nccl
ldconfig
fi

View File

@ -0,0 +1,19 @@
#!/bin/bash
set -ex
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz" --retry 3
tar -xvz --no-same-owner -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
NPROC=$[$(nproc) - 2]
pushd "$pb_dir" && ./configure && make -j${NPROC} && make -j${NPROC} check && sudo make -j${NRPOC} install && sudo ldconfig
popd
rm -rf $pb_dir

View File

@ -1,15 +0,0 @@
#!/bin/bash
set -ex
apt-get update
# Use deadsnakes in case we need an older python version
sudo add-apt-repository ppa:deadsnakes/ppa
apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python3-pip python${PYTHON_VERSION}-venv
# Use a venv because uv and some other package managers don't support --user install
ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python
python -m venv /var/lib/jenkins/ci_env
source /var/lib/jenkins/ci_env/bin/activate
python -mpip install --upgrade pip
python -mpip install -r /opt/requirements-ci.txt

View File

@ -8,6 +8,10 @@ ver() {
install_ubuntu() {
apt-get update
if [[ $UBUNTU_VERSION == 18.04 ]]; then
# gpg-agent is not available by default on 18.04
apt-get install -y --no-install-recommends gpg-agent
fi
if [[ $UBUNTU_VERSION == 20.04 ]]; then
# gpg-agent is not available by default on 20.04
apt-get install -y --no-install-recommends gpg-agent
@ -19,13 +23,6 @@ install_ubuntu() {
apt-get install -y libc++1
apt-get install -y libc++abi1
# Make sure rocm packages from repo.radeon.com have highest priority
cat << EOF > /etc/apt/preferences.d/rocm-pin-600
Package: *
Pin: release o=repo.radeon.com
Pin-Priority: 600
EOF
# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
echo "deb [arch=amd64] https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
@ -66,25 +63,17 @@ EOF
done
# ROCm 6.3 had a regression where initializing static code objects had significant overhead
# ROCm 6.4 did not yet fix the regression, also HIP branch names are different
if [[ $(ver $ROCM_VERSION) -eq $(ver 6.3) ]] || [[ $(ver $ROCM_VERSION) -eq $(ver 6.4) ]]; then
if [[ $(ver $ROCM_VERSION) -eq $(ver 6.3) ]]; then
HIP_BRANCH=rocm-6.3.x
VER_STR=6.3
elif [[ $(ver $ROCM_VERSION) -eq $(ver 6.4) ]]; then
HIP_BRANCH=release/rocm-rel-6.4
VER_STR=6.4
fi
if [[ $(ver $ROCM_VERSION) -eq $(ver 6.3) ]]; then
# clr build needs CppHeaderParser but can only find it using conda's python
/opt/conda/bin/python -m pip install CppHeaderParser
git clone https://github.com/ROCm/HIP -b $HIP_BRANCH
git clone https://github.com/ROCm/HIP -b rocm-6.3.x
HIP_COMMON_DIR=$(readlink -f HIP)
git clone https://github.com/jeffdaily/clr -b release/rocm-rel-${VER_STR}-statco-hotfix
git clone https://github.com/jeffdaily/clr -b release/rocm-rel-6.3-statco-hotfix
mkdir -p clr/build
pushd clr/build
cmake .. -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=$HIP_COMMON_DIR
make -j
cp hipamd/lib/libamdhip64.so.${VER_STR}.* /opt/rocm/lib/libamdhip64.so.${VER_STR}.*
cp hipamd/lib/libamdhip64.so.6.3.* /opt/rocm/lib/libamdhip64.so.6.3.*
popd
rm -rf HIP clr
fi

View File

@ -1,32 +1,50 @@
#!/usr/bin/env bash
# Script used only in CD pipeline
#!/bin/bash
# Script used in CI and CD pipeline
set -eou pipefail
set -ex
function do_install() {
rocm_version=$1
rocm_version_nodot=${1//./}
# Magma build scripts need `python`
ln -sf /usr/bin/python3 /usr/bin/python
# Version 2.7.2 + ROCm related updates
MAGMA_VERSION=a1625ff4d9bc362906bd01f805dbbe12612953f6
magma_archive="magma-rocm${rocm_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
almalinux)
yum install -y gcc-gfortran
;;
*)
echo "No preinstalls to build magma..."
;;
esac
rocm_dir="/opt/rocm"
(
set -x
tmp_dir=$(mktemp -d)
pushd ${tmp_dir}
curl -OLs https://ossci-linux.s3.us-east-1.amazonaws.com/${magma_archive}
if tar -xvf "${magma_archive}"
then
mkdir -p "${rocm_dir}/magma"
mv include "${rocm_dir}/magma/include"
mv lib "${rocm_dir}/magma/lib"
else
echo "${magma_archive} not found, skipping magma install"
fi
popd
)
}
MKLROOT=${MKLROOT:-/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION}
do_install $1
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# Version 2.7.2 + ROCm related updates
git checkout a1625ff4d9bc362906bd01f805dbbe12612953f6
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
if [[ -f "${MKLROOT}/lib/libmkl_core.a" ]]; then
echo 'LIB = -Wl,--start-group -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -Wl,--end-group -lpthread -lstdc++ -lm -lgomp -lhipblas -lhipsparse' >> make.inc
fi
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib -ldl' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --offload-arch=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT="${MKLROOT}"
make testing/testing_dgemm -j $(nproc) MKLROOT="${MKLROOT}"
popd
mv magma /opt/rocm

View File

@ -0,0 +1,24 @@
#!/bin/bash
set -ex
[ -n "${SWIFTSHADER}" ]
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
_https_amazon_aws=https://ossci-android.s3.amazonaws.com
# SwiftShader
_swiftshader_dir=/var/lib/jenkins/swiftshader
_swiftshader_file_targz=swiftshader-abe07b943-prebuilt.tar.gz
mkdir -p $_swiftshader_dir
_tmp_swiftshader_targz="/tmp/${_swiftshader_file_targz}"
curl --silent --show-error --location --fail --retry 3 \
--output "${_tmp_swiftshader_targz}" "$_https_amazon_aws/${_swiftshader_file_targz}"
tar -C "${_swiftshader_dir}" -xzf "${_tmp_swiftshader_targz}"
export VK_ICD_FILENAMES="${_swiftshader_dir}/build/Linux/vk_swiftshader_icd.json"

View File

@ -2,16 +2,14 @@
set -ex
mkdir -p /opt/triton
if [ -z "${TRITON}" ] && [ -z "${TRITON_CPU}" ]; then
echo "TRITON and TRITON_CPU are not set. Exiting..."
exit 0
fi
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
get_pip_version() {
conda_run pip list | grep -w $* | head -n 1 | awk '{print $2}'
get_conda_version() {
as_jenkins conda list -n py_$ANACONDA_PYTHON_VERSION | grep -w $* | head -n 1 | awk '{print $2}'
}
conda_reinstall() {
as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y --force-reinstall $*
}
if [ -n "${XPU_VERSION}" ]; then
@ -33,9 +31,11 @@ if [ -n "${UBUNTU_VERSION}" ];then
apt-get install -y gpg-agent
fi
# Keep the current cmake and numpy version here, so we can reinstall them later
CMAKE_VERSION=$(get_pip_version cmake)
NUMPY_VERSION=$(get_pip_version numpy)
if [ -n "${CONDA_CMAKE}" ]; then
# Keep the current cmake and numpy version here, so we can reinstall them later
CMAKE_VERSION=$(get_conda_version cmake)
NUMPY_VERSION=$(get_conda_version numpy)
fi
if [ -z "${MAX_JOBS}" ]; then
export MAX_JOBS=$(nproc)
@ -52,7 +52,6 @@ cd triton
as_jenkins git checkout ${TRITON_PINNED_COMMIT}
as_jenkins git submodule update --init --recursive
cd python
pip_install pybind11==2.13.6
# TODO: remove patch setup.py once we have a proper fix for https://github.com/triton-lang/triton/issues/4527
as_jenkins sed -i -e 's/https:\/\/tritonlang.blob.core.windows.net\/llvm-builds/https:\/\/oaitriton.blob.core.windows.net\/public\/llvm-builds/g' setup.py
@ -61,35 +60,28 @@ if [ -n "${UBUNTU_VERSION}" ] && [ -n "${GCC_VERSION}" ] && [[ "${GCC_VERSION}"
# Triton needs at least gcc-9 to build
apt-get install -y g++-9
CXX=g++-9 conda_run python setup.py bdist_wheel
CXX=g++-9 pip_install .
elif [ -n "${UBUNTU_VERSION}" ] && [ -n "${CLANG_VERSION}" ]; then
# Triton needs <filesystem> which surprisingly is not available with clang-9 toolchain
add-apt-repository -y ppa:ubuntu-toolchain-r/test
apt-get install -y g++-9
CXX=g++-9 conda_run python setup.py bdist_wheel
CXX=g++-9 pip_install .
else
conda_run python setup.py bdist_wheel
pip_install .
fi
# Copy the wheel to /opt for multi stage docker builds
cp dist/*.whl /opt/triton
# Install the wheel for docker builds that don't use multi stage
pip_install dist/*.whl
# TODO: This is to make sure that the same cmake and numpy version from install conda
# script is used. Without this step, the newer cmake version (3.25.2) downloaded by
# triton build step via pip will fail to detect conda MKL. Once that issue is fixed,
# this can be removed.
#
# The correct numpy version also needs to be set here because conda claims that it
# causes inconsistent environment. Without this, conda will attempt to install the
# latest numpy version, which fails ASAN tests with the following import error: Numba
# needs NumPy 1.20 or less.
# Note that we install numpy with pip as conda might not have the version we want
if [ -n "${CMAKE_VERSION}" ]; then
pip_install "cmake==${CMAKE_VERSION}"
fi
if [ -n "${NUMPY_VERSION}" ]; then
pip_install "numpy==${NUMPY_VERSION}"
if [ -n "${CONDA_CMAKE}" ]; then
# TODO: This is to make sure that the same cmake and numpy version from install conda
# script is used. Without this step, the newer cmake version (3.25.2) downloaded by
# triton build step via pip will fail to detect conda MKL. Once that issue is fixed,
# this can be removed.
#
# The correct numpy version also needs to be set here because conda claims that it
# causes inconsistent environment. Without this, conda will attempt to install the
# latest numpy version, which fails ASAN tests with the following import error: Numba
# needs NumPy 1.20 or less.
conda_reinstall cmake="${CMAKE_VERSION}"
# Note that we install numpy with pip as conda might not have the version we want
pip_install --force-reinstall numpy=="${NUMPY_VERSION}"
fi

View File

@ -0,0 +1,24 @@
#!/bin/bash
set -ex
[ -n "${VULKAN_SDK_VERSION}" ]
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
_vulkansdk_dir=/var/lib/jenkins/vulkansdk
_tmp_vulkansdk_targz=/tmp/vulkansdk.tar.gz
curl \
--silent \
--show-error \
--location \
--fail \
--retry 3 \
--output "${_tmp_vulkansdk_targz}" "https://ossci-android.s3.amazonaws.com/vulkansdk-linux-x86_64-${VULKAN_SDK_VERSION}.tar.gz"
mkdir -p "${_vulkansdk_dir}"
tar -C "${_vulkansdk_dir}" -xzf "${_tmp_vulkansdk_targz}" --strip-components 1
rm -rf "${_tmp_vulkansdk_targz}"

View File

@ -26,7 +26,7 @@ function install_ubuntu() {
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
| gpg --dearmor > /usr/share/keyrings/oneapi-archive-keyring.gpg.gpg
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg.gpg] \
https://apt.repos.intel.com/oneapi all main" \
https://apt.repos.intel.com/${XPU_REPO_NAME} all main" \
| tee /etc/apt/sources.list.d/oneAPI.list
# Update the packages list and repository index
@ -47,6 +47,9 @@ function install_ubuntu() {
# Development Packages
apt-get install -y libigc-dev intel-igc-cm libigdfcl-dev libigfxcmrt-dev level-zero-dev
# Install Intel Support Packages
if [[ "$XPU_VERSION" == "2025.0" ]]; then
XPU_PACKAGES="${XPU_PACKAGES} intel-oneapi-dnnl=2025.0.1-6"
fi
apt-get install -y ${XPU_PACKAGES}
# Cleanup
@ -74,7 +77,7 @@ function install_rhel() {
tee > /etc/yum.repos.d/oneAPI.repo << EOF
[oneAPI]
name=Intel for Pytorch GPU dev repository
baseurl=https://yum.repos.intel.com/oneapi
baseurl=https://yum.repos.intel.com/${XPU_REPO_NAME}
enabled=1
gpgcheck=1
repo_gpgcheck=1
@ -82,6 +85,9 @@ gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.
EOF
# Install Intel Support Packages
if [[ "$XPU_VERSION" == "2025.0" ]]; then
XPU_PACKAGES="${XPU_PACKAGES} intel-oneapi-dnnl-2025.0.1-6"
fi
yum install -y ${XPU_PACKAGES}
# The xpu-smi packages
dnf install -y xpu-smi
@ -118,7 +124,7 @@ function install_sles() {
https://repositories.intel.com/gpu/sles/${VERSION_SP}${XPU_DRIVER_VERSION}/unified/intel-gpu-${VERSION_SP}.repo
rpm --import https://repositories.intel.com/gpu/intel-graphics.key
# To add the online network network package repository for the Intel Support Packages
zypper addrepo https://yum.repos.intel.com/oneapi oneAPI
zypper addrepo https://yum.repos.intel.com/${XPU_REPO_NAME} oneAPI
rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
# The xpu-smi packages
@ -141,10 +147,10 @@ if [[ "${XPU_DRIVER_TYPE,,}" == "rolling" ]]; then
XPU_DRIVER_VERSION=""
fi
# Default use Intel® oneAPI Deep Learning Essentials 2025.0
if [[ "$XPU_VERSION" == "2025.1" ]]; then
XPU_PACKAGES="intel-deep-learning-essentials-2025.1"
else
XPU_REPO_NAME="intel-for-pytorch-gpu-dev"
XPU_PACKAGES="intel-for-pytorch-gpu-dev-0.5 intel-pti-dev-0.9"
if [[ "$XPU_VERSION" == "2025.0" ]]; then
XPU_REPO_NAME="oneapi"
XPU_PACKAGES="intel-deep-learning-essentials-2025.0"
fi

View File

@ -49,9 +49,6 @@ RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM cpu as cuda
ADD ./common/install_cuda.sh install_cuda.sh
ADD ./common/install_magma.sh install_magma.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
ENV CUDA_HOME /usr/local/cuda
FROM cuda as cuda11.8
@ -75,7 +72,6 @@ RUN bash ./install_magma.sh 12.8
RUN ln -sf /usr/local/cuda-12.8 /usr/local/cuda
FROM cpu as rocm
ARG ROCM_VERSION
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ENV MKLROOT /opt/intel
@ -90,11 +86,11 @@ ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
# gfortran and python needed for building magma from source for ROCm
RUN apt-get update -y && \
apt-get install gfortran -y && \
apt-get install python3 python-is-python3 -y && \
apt-get install python -y && \
apt-get clean
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION} && rm install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
FROM ${BASE_TARGET} as final
COPY --from=openssl /opt/openssl /opt/openssl

View File

@ -1,63 +1,83 @@
#!/usr/bin/env bash
# Script used only in CD pipeline
set -eoux pipefail
set -eou pipefail
image="$1"
shift
if [ -z "${image}" ]; then
echo "Usage: $0 IMAGENAME:ARCHTAG"
echo "Usage: $0 IMAGE"
exit 1
fi
DOCKER_IMAGE="pytorch/${image}"
TOPDIR=$(git rev-parse --show-toplevel)
GPU_ARCH_TYPE=${GPU_ARCH_TYPE:-cpu}
GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}
WITH_PUSH=${WITH_PUSH:-}
DOCKER=${DOCKER:-docker}
# Go from imagename:tag to tag
DOCKER_TAG_PREFIX=$(echo "${image}" | awk -F':' '{print $2}')
GPU_ARCH_VERSION=""
if [[ "${DOCKER_TAG_PREFIX}" == cuda* ]]; then
# extract cuda version from image name. e.g. manylinux2_28-builder:cuda12.8 returns 12.8
GPU_ARCH_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'cuda' '{print $2}')
elif [[ "${DOCKER_TAG_PREFIX}" == rocm* ]]; then
# extract rocm version from image name. e.g. manylinux2_28-builder:rocm6.2.4 returns 6.2.4
GPU_ARCH_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'rocm' '{print $2}')
fi
case ${DOCKER_TAG_PREFIX} in
case ${GPU_ARCH_TYPE} in
cpu)
BASE_TARGET=cpu
DOCKER_TAG=cpu
GPU_IMAGE=ubuntu:20.04
DOCKER_GPU_BUILD_ARG=""
;;
cuda*)
cuda)
BASE_TARGET=cuda${GPU_ARCH_VERSION}
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
GPU_IMAGE=ubuntu:20.04
DOCKER_GPU_BUILD_ARG=""
;;
rocm*)
rocm)
BASE_TARGET=rocm
GPU_IMAGE=rocm/dev-ubuntu-22.04:${GPU_ARCH_VERSION}-complete
DOCKER_TAG=rocm${GPU_ARCH_VERSION}
GPU_IMAGE=rocm/dev-ubuntu-20.04:${GPU_ARCH_VERSION}-complete
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
DOCKER_GPU_BUILD_ARG="--build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg ROCM_VERSION=${GPU_ARCH_VERSION}"
DOCKER_GPU_BUILD_ARG="--build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}"
;;
*)
echo "ERROR: Unrecognized DOCKER_TAG_PREFIX: ${DOCKER_TAG_PREFIX}"
echo "ERROR: Unrecognized GPU_ARCH_TYPE: ${GPU_ARCH_TYPE}"
exit 1
;;
esac
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
DOCKER_BUILDKIT=1 ${DOCKER} build \
--target final \
${DOCKER_GPU_BUILD_ARG} \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
-t "${tmp_tag}" \
$@ \
-f "${TOPDIR}/.ci/docker/libtorch/Dockerfile" \
"${TOPDIR}/.ci/docker/"
(
set -x
DOCKER_BUILDKIT=1 ${DOCKER} build \
--target final \
${DOCKER_GPU_BUILD_ARG} \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
-t "${DOCKER_IMAGE}" \
$@ \
-f "${TOPDIR}/.ci/docker/libtorch/Dockerfile" \
"${TOPDIR}/.ci/docker/"
)
GITHUB_REF=${GITHUB_REF:-$(git symbolic-ref -q HEAD || git describe --tags --exact-match)}
GIT_BRANCH_NAME=${GITHUB_REF##*/}
GIT_COMMIT_SHA=${GITHUB_SHA:-$(git rev-parse HEAD)}
DOCKER_IMAGE_BRANCH_TAG=${DOCKER_IMAGE}-${GIT_BRANCH_NAME}
DOCKER_IMAGE_SHA_TAG=${DOCKER_IMAGE}-${GIT_COMMIT_SHA}
if [[ "${WITH_PUSH}" == true ]]; then
(
set -x
${DOCKER} push "${DOCKER_IMAGE}"
if [[ -n ${GITHUB_REF} ]]; then
${DOCKER} tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_BRANCH_TAG}
${DOCKER} tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_SHA_TAG}
${DOCKER} push "${DOCKER_IMAGE_BRANCH_TAG}"
${DOCKER} push "${DOCKER_IMAGE_SHA_TAG}"
fi
)
fi

View File

@ -18,31 +18,28 @@ COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG PYTHON_VERSION
ARG PIP_CMAKE
# Put venv into the env vars so users don't need to activate it
ENV PATH /var/lib/jenkins/ci_env/bin:$PATH
ENV VIRTUAL_ENV /var/lib/jenkins/ci_env
COPY requirements-ci.txt /opt/requirements-ci.txt
COPY ./common/install_python.sh install_python.sh
RUN bash ./install_python.sh && rm install_python.sh /opt/requirements-ci.txt
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_magma_conda.sh install_magma_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh install_magma_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Install cuda and cudnn
ARG CUDA_VERSION
COPY ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh install_nccl.sh /ci_commit_pins/nccl-cu* install_cusparselt.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
# Note that Docker build forbids copying file outside the build context
COPY ./common/install_linter.sh install_linter.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_linter.sh
RUN rm install_linter.sh
RUN chown -R jenkins:jenkins /var/lib/jenkins/ci_env
RUN rm install_linter.sh common_utils.sh
USER jenkins
CMD ["bash"]

View File

@ -15,17 +15,20 @@ COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG PYTHON_VERSION
ENV PATH /var/lib/jenkins/ci_env/bin:$PATH
ENV VIRTUAL_ENV /var/lib/jenkins/ci_env
COPY requirements-ci.txt /opt/requirements-ci.txt
COPY ./common/install_python.sh install_python.sh
RUN bash ./install_python.sh && rm install_python.sh /opt/requirements-ci.txt
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Note that Docker build forbids copying file outside the build context
COPY ./common/install_linter.sh install_linter.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_linter.sh
RUN rm install_linter.sh
RUN rm install_linter.sh common_utils.sh
USER jenkins
CMD ["bash"]

View File

@ -0,0 +1,200 @@
# syntax = docker/dockerfile:experimental
ARG ROCM_VERSION=3.7
ARG BASE_CUDA_VERSION=11.8
ARG GPU_IMAGE=centos:7
FROM centos:7 as base
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ARG DEVTOOLSET_VERSION=9
# Note: This is required patch since CentOS have reached EOL
# otherwise any yum install setp will fail
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y wget curl perl util-linux xz bzip2 git patch which perl zlib-devel
# Just add everything as a safe.directory for git since these will be used in multiple places with git
RUN git config --global --add safe.directory '*'
RUN yum install -y yum-utils centos-release-scl
RUN yum-config-manager --enable rhel-server-rhscl-7-rpms
# Note: After running yum-config-manager --enable rhel-server-rhscl-7-rpms
# patch is required once again. Somehow this steps adds mirror.centos.org
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y devtoolset-${DEVTOOLSET_VERSION}-gcc devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran devtoolset-${DEVTOOLSET_VERSION}-binutils
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
RUN yum --enablerepo=extras install -y epel-release
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake
RUN yum install -y autoconf aclocal automake make sudo
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
# EPEL for cmake
FROM base as patchelf
# Install patchelf
ADD ./common/install_patchelf.sh install_patchelf.sh
RUN bash ./install_patchelf.sh && rm install_patchelf.sh
RUN cp $(which patchelf) /patchelf
FROM patchelf as python
# build python
COPY manywheel/build_scripts /build_scripts
ADD ./common/install_cpython.sh /build_scripts/install_cpython.sh
RUN bash build_scripts/build.sh && rm -r build_scripts
FROM base as cuda
ARG BASE_CUDA_VERSION=10.2
# Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
FROM base as intel
# MKL
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM base as magma
ARG BASE_CUDA_VERSION=10.2
# Install magma
ADD ./common/install_magma.sh install_magma.sh
RUN bash ./install_magma.sh ${BASE_CUDA_VERSION} && rm install_magma.sh
FROM base as jni
# Install java jni header
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
FROM base as libpng
# Install libpng
ADD ./common/install_libpng.sh install_libpng.sh
RUN bash ./install_libpng.sh && rm install_libpng.sh
FROM ${GPU_IMAGE} as common
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
RUN yum install -y \
aclocal \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
yasm
RUN yum install -y \
https://repo.ius.io/ius-release-el7.rpm \
https://ossci-linux.s3.amazonaws.com/epel-release-7-14.noarch.rpm
RUN yum swap -y git git236-core
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
# Install LLVM version
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
COPY --from=python /opt/python/cp39-cp39/bin/auditwheel /usr/local/bin/auditwheel
COPY --from=intel /opt/intel /opt/intel
COPY --from=patchelf /usr/local/bin/patchelf /usr/local/bin/patchelf
COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h
COPY --from=libpng /usr/local/bin/png* /usr/local/bin/
COPY --from=libpng /usr/local/bin/libpng* /usr/local/bin/
COPY --from=libpng /usr/local/include/png* /usr/local/include/
COPY --from=libpng /usr/local/include/libpng* /usr/local/include/
COPY --from=libpng /usr/local/lib/libpng* /usr/local/lib/
COPY --from=libpng /usr/local/lib/pkgconfig /usr/local/lib/pkgconfig
FROM common as cpu_final
ARG BASE_CUDA_VERSION=10.1
ARG DEVTOOLSET_VERSION=9
# Install Anaconda
ADD ./common/install_conda_docker.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
ENV PATH /opt/conda/bin:$PATH
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y yum-utils centos-release-scl
RUN yum-config-manager --enable rhel-server-rhscl-7-rpms
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y devtoolset-${DEVTOOLSET_VERSION}-gcc devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran devtoolset-${DEVTOOLSET_VERSION}-binutils
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# cmake is already installed inside the rocm base image, so remove if present
RUN rpm -e cmake || true
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake
# ninja
RUN yum install -y ninja-build
FROM cpu_final as cuda_final
RUN rm -rf /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=cuda /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=magma /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
RUN ln -sf /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda
ENV PATH=/usr/local/cuda/bin:$PATH
FROM cpu_final as rocm_final
ARG ROCM_VERSION=3.7
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Adding ROCM_PATH env var so that LoadHip.cmake (even with logic updated for ROCm6.0)
# find HIP works for ROCm5.7. Not needed for ROCm6.0 and above.
# Remove below when ROCm5.7 is not in support matrix anymore.
ENV ROCM_PATH /opt/rocm
ENV MKLROOT /opt/intel
# No need to install ROCm as base docker image should have full ROCm install
#ADD ./common/install_rocm.sh install_rocm.sh
#RUN ROCM_VERSION=${ROCM_VERSION} bash ./install_rocm.sh && rm install_rocm.sh
ADD ./common/install_rocm_drm.sh install_rocm_drm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
# cmake3 is needed for the MIOpen build
RUN ln -sf /usr/local/bin/cmake /usr/bin/cmake3
ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh

View File

@ -7,8 +7,8 @@ ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ARG DEVTOOLSET_VERSION=13
RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel yum-utils gcc-toolset-${DEVTOOLSET_VERSION}-gcc gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran gcc-toolset-${DEVTOOLSET_VERSION}-gdb
ARG DEVTOOLSET_VERSION=11
RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel yum-utils gcc-toolset-${DEVTOOLSET_VERSION}-toolchain
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
@ -33,13 +33,10 @@ RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
FROM base as cuda
ARG BASE_CUDA_VERSION=12.6
ARG BASE_CUDA_VERSION=11.8
# Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh install_nccl.sh ci_commit_pins/nccl-cu* install_cusparselt.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
FROM base as intel
# MKL
@ -47,7 +44,7 @@ ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM base as magma
ARG BASE_CUDA_VERSION=12.6
ARG BASE_CUDA_VERSION=10.2
# Install magma
ADD ./common/install_magma.sh install_magma.sh
RUN bash ./install_magma.sh ${BASE_CUDA_VERSION} && rm install_magma.sh
@ -64,7 +61,7 @@ ADD ./common/install_libpng.sh install_libpng.sh
RUN bash ./install_libpng.sh && rm install_libpng.sh
FROM ${GPU_IMAGE} as common
ARG DEVTOOLSET_VERSION=13
ARG DEVTOOLSET_VERSION=11
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
@ -87,12 +84,13 @@ RUN yum install -y \
wget \
which \
xz \
glibc-langpack-en \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran \
gcc-toolset-${DEVTOOLSET_VERSION}-gdb
gcc-toolset-${DEVTOOLSET_VERSION}-toolchain \
glibc-langpack-en
RUN yum install -y \
https://repo.ius.io/ius-release-el7.rpm \
https://ossci-linux.s3.amazonaws.com/epel-release-7-14.noarch.rpm
RUN yum swap -y git git236-core
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
@ -116,8 +114,8 @@ COPY --from=libpng /usr/local/lib/pkgconfig /usr/local/
COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h
FROM common as cpu_final
ARG BASE_CUDA_VERSION=12.6
ARG DEVTOOLSET_VERSION=13
ARG BASE_CUDA_VERSION=11.8
ARG DEVTOOLSET_VERSION=11
# Install Anaconda
ADD ./common/install_conda_docker.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
@ -156,14 +154,11 @@ ENV ROCM_PATH /opt/rocm
# and avoid 3.21.0 cmake+ninja issues with ninja inserting "-Wl,--no-as-needed" in LINK_FLAGS for static linker
RUN python3 -m pip install --upgrade pip && \
python3 -mpip install cmake==3.28.4
# replace the libdrm in /opt/amdgpu with custom amdgpu.ids lookup path
ADD ./common/install_rocm_drm.sh install_rocm_drm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
# ROCm 6.4 rocm-smi depends on system drm.h header
RUN yum install -y libdrm-devel
ENV MKLROOT /opt/intel
ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION} && rm install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh
@ -174,6 +169,6 @@ ENV XPU_DRIVER_TYPE ROLLING
RUN python3 -m pip install --upgrade pip && \
python3 -mpip install cmake==3.28.4
ADD ./common/install_xpu.sh install_xpu.sh
ENV XPU_VERSION 2025.1
ENV XPU_VERSION 2025.0
RUN bash ./install_xpu.sh && rm install_xpu.sh
RUN pushd /opt/_internal && tar -xJf static-libs-for-embedding-only.tar.xz && popd

View File

@ -1,6 +1,7 @@
FROM quay.io/pypa/manylinux_2_28_aarch64 as base
ARG GCCTOOLSET_VERSION=13
# Graviton needs GCC 10 or above for the build. GCC12 is the default version in almalinux-8.
ARG GCCTOOLSET_VERSION=11
# Language variabes
ENV LC_ALL=en_US.UTF-8
@ -35,10 +36,7 @@ RUN yum install -y \
yasm \
zstd \
sudo \
gcc-toolset-${GCCTOOLSET_VERSION}-gcc \
gcc-toolset-${GCCTOOLSET_VERSION}-gcc-c++ \
gcc-toolset-${GCCTOOLSET_VERSION}-gcc-gfortran \
gcc-toolset-${GCCTOOLSET_VERSION}-gdb
gcc-toolset-${GCCTOOLSET_VERSION}-toolchain
# (optional) Install non-default Ninja version
ARG NINJA_VERSION

View File

@ -0,0 +1,94 @@
FROM quay.io/pypa/manylinux2014_aarch64 as base
# Graviton needs GCC 10 for the build
ARG DEVTOOLSET_VERSION=10
# Language variabes
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
# Installed needed OS packages. This is to support all
# the binary builds (torch, vision, audio, text, data)
RUN yum -y install epel-release
RUN yum -y update
RUN yum install -y \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
yasm \
less \
zstd \
libgomp \
sudo \
devtoolset-${DEVTOOLSET_VERSION}-gcc \
devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ \
devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran \
devtoolset-${DEVTOOLSET_VERSION}-binutils
# Ensure the expected devtoolset is used
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
###############################################################################
# libglfortran.a hack
#
# libgfortran.a from quay.io/pypa/manylinux2014_aarch64 is not compiled with -fPIC.
# This causes __stack_chk_guard@@GLIBC_2.17 on pytorch build. To solve, get
# ubuntu's libgfortran.a which is compiled with -fPIC
# NOTE: Need a better way to get this library as Ubuntu's package can be removed by the vender, or changed
###############################################################################
RUN cd ~/ \
&& curl -L -o ~/libgfortran-10-dev.deb http://ports.ubuntu.com/ubuntu-ports/pool/universe/g/gcc-10/libgfortran-10-dev_10.5.0-4ubuntu2_arm64.deb \
&& ar x ~/libgfortran-10-dev.deb \
&& tar --use-compress-program=unzstd -xvf data.tar.zst -C ~/ \
&& cp -f ~/usr/lib/gcc/aarch64-linux-gnu/10/libgfortran.a /opt/rh/devtoolset-10/root/usr/lib/gcc/aarch64-redhat-linux/10/
# install cmake
RUN yum install -y cmake3 && \
ln -s /usr/bin/cmake3 /usr/bin/cmake
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
FROM base as openblas
# Install openblas
ADD ./common/install_openblas.sh install_openblas.sh
RUN bash ./install_openblas.sh && rm install_openblas.sh
FROM openssl as final
# remove unncessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
COPY --from=openblas /opt/OpenBLAS/ /opt/OpenBLAS/
ENV LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH

View File

@ -1,7 +1,7 @@
FROM quay.io/pypa/manylinux_2_28_aarch64 as base
# Cuda ARM build needs gcc 11
ARG DEVTOOLSET_VERSION=13
ARG DEVTOOLSET_VERSION=11
# Language variables
ENV LC_ALL=en_US.UTF-8
@ -34,10 +34,7 @@ RUN yum install -y \
zstd \
libgomp \
sudo \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran \
gcc-toolset-${DEVTOOLSET_VERSION}-gdb
gcc-toolset-${DEVTOOLSET_VERSION}-toolchain
# Ensure the expected devtoolset is used
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
@ -69,11 +66,8 @@ RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
FROM base as cuda
ARG BASE_CUDA_VERSION
# Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./common/install_cusparselt.sh install_cusparselt.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh install_nccl.sh ci_commit_pins/nccl-cu* install_cusparselt.sh
ADD ./common/install_cuda_aarch64.sh install_cuda_aarch64.sh
RUN bash ./install_cuda_aarch64.sh ${BASE_CUDA_VERSION} && rm install_cuda_aarch64.sh
FROM base as magma
ARG BASE_CUDA_VERSION

View File

@ -42,7 +42,6 @@ RUN yum install -y \
llvm-devel \
libzstd-devel \
python3.12-devel \
python3.12-test \
python3.12-setuptools \
python3.12-pip \
python3-virtualenv \
@ -102,33 +101,24 @@ CMD ["/bin/bash"]
# install test dependencies:
# - grpcio requires system openssl, bundled crypto fails to build
# - ml_dtypes 0.4.0 requires some fixes provided in later commits to build
RUN dnf install -y \
protobuf-devel \
protobuf-c-devel \
protobuf-lite-devel \
hdf5-devel \
python3-h5py \
git
wget \
patch
RUN env GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True pip3 install grpcio
# cmake-3.28.0 from pip for onnxruntime
RUN python3 -mpip install cmake==3.28.0
# build onnxruntime 1.21.0 from sources.
# it is not possible to build it from sources using pip,
# so just build it from upstream repository.
# h5py is dependency of onnxruntime_training.
# h5py==3.11.0 builds with hdf5-devel 1.10.5 from repository.
# install newest flatbuffers version first:
# for some reason old version is getting pulled in otherwise.
# packaging package is required for onnxruntime wheel build.
RUN pip3 install flatbuffers && \
pip3 install h5py==3.11.0 && \
pip3 install packaging && \
git clone https://github.com/microsoft/onnxruntime && \
cd onnxruntime && git checkout v1.21.0 && \
RUN env GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True pip3 install grpcio==1.65.4
RUN cd ~ && \
git clone https://github.com/jax-ml/ml_dtypes && \
cd ml_dtypes && \
git checkout v0.4.0 && \
git submodule update --init --recursive && \
./build.sh --config Release --parallel 0 --enable_pybind --build_wheel --enable_training --enable_training_apis --enable_training_ops --skip_tests --allow_running_as_root && \
pip3 install ./build/Linux/Release/dist/onnxruntime_training-*.whl && \
cd .. && /bin/rm -rf ./onnxruntime
wget https://github.com/jax-ml/ml_dtypes/commit/b969f76914d6b30676721bc92bf0f6021a0d1321.patch && \
wget https://github.com/jax-ml/ml_dtypes/commit/d4e6d035ecda073eab8bcf60f4eef572ee7087e6.patch && \
patch -p1 < b969f76914d6b30676721bc92bf0f6021a0d1321.patch && \
patch -p1 < d4e6d035ecda073eab8bcf60f4eef572ee7087e6.patch && \
python3 setup.py bdist_wheel && \
pip3 install dist/*.whl && \
rm -rf ml_dtypes

View File

@ -1,7 +1,7 @@
#!/usr/bin/env bash
# Script used only in CD pipeline
set -exou pipefail
set -eou pipefail
TOPDIR=$(git rev-parse --show-toplevel)
@ -9,108 +9,152 @@ image="$1"
shift
if [ -z "${image}" ]; then
echo "Usage: $0 IMAGE:ARCHTAG"
echo "Usage: $0 IMAGE"
exit 1
fi
# Go from imagename:tag to tag
DOCKER_TAG_PREFIX=$(echo "${image}" | awk -F':' '{print $2}')
DOCKER_IMAGE="pytorch/${image}"
GPU_ARCH_VERSION=""
if [[ "${DOCKER_TAG_PREFIX}" == cuda* ]]; then
# extract cuda version from image name. e.g. manylinux2_28-builder:cuda12.8 returns 12.8
GPU_ARCH_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'cuda' '{print $2}')
elif [[ "${DOCKER_TAG_PREFIX}" == rocm* ]]; then
# extract rocm version from image name. e.g. manylinux2_28-builder:rocm6.2.4 returns 6.2.4
GPU_ARCH_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'rocm' '{print $2}')
fi
DOCKER_REGISTRY="${DOCKER_REGISTRY:-docker.io}"
GPU_ARCH_TYPE=${GPU_ARCH_TYPE:-cpu}
GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}
MANY_LINUX_VERSION=${MANY_LINUX_VERSION:-}
DOCKERFILE_SUFFIX=${DOCKERFILE_SUFFIX:-}
WITH_PUSH=${WITH_PUSH:-}
case ${image} in
manylinux2_28-builder:cpu)
case ${GPU_ARCH_TYPE} in
cpu)
TARGET=cpu_final
DOCKER_TAG=cpu
GPU_IMAGE=centos:7
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=9"
;;
cpu-manylinux_2_28)
TARGET=cpu_final
DOCKER_TAG=cpu
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=13"
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28"
;;
manylinux2_28_aarch64-builder:cpu-aarch64)
cpu-aarch64)
TARGET=final
DOCKER_TAG=cpu-aarch64
GPU_IMAGE=arm64v8/centos:7
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=10"
MANY_LINUX_VERSION="aarch64"
;;
cpu-aarch64-2_28)
TARGET=final
DOCKER_TAG=cpu-aarch64
GPU_IMAGE=arm64v8/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=13 --build-arg NINJA_VERSION=1.12.1"
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11 --build-arg NINJA_VERSION=1.12.1"
MANY_LINUX_VERSION="2_28_aarch64"
;;
manylinuxcxx11-abi-builder:cpu-cxx11-abi)
cpu-cxx11-abi)
TARGET=final
DOCKER_TAG=cpu-cxx11-abi
GPU_IMAGE=""
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=9"
MANY_LINUX_VERSION="cxx11-abi"
;;
manylinuxs390x-builder:cpu-s390x)
cpu-s390x)
TARGET=final
DOCKER_TAG=cpu-s390x
GPU_IMAGE=s390x/almalinux:8
DOCKER_GPU_BUILD_ARG=""
MANY_LINUX_VERSION="s390x"
;;
manylinux2_28-builder:cuda11*)
cuda)
TARGET=cuda_final
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
# Keep this up to date with the minimum version of CUDA we currently support
GPU_IMAGE=centos:7
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=9"
;;
cuda-manylinux_2_28)
TARGET=cuda_final
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28"
;;
manylinux2_28-builder:cuda12*)
cuda-aarch64)
TARGET=cuda_final
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=13"
MANY_LINUX_VERSION="2_28"
;;
manylinuxaarch64-builder:cuda*)
TARGET=cuda_final
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=13"
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
GPU_IMAGE=arm64v8/centos:7
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="aarch64"
DOCKERFILE_SUFFIX="_cuda_aarch64"
;;
manylinux2_28-builder:rocm*)
rocm|rocm-manylinux_2_28)
TARGET=rocm_final
MANY_LINUX_VERSION="2_28"
DEVTOOLSET_VERSION="11"
GPU_IMAGE=rocm/dev-almalinux-8:${GPU_ARCH_VERSION}-complete
DOCKER_TAG=rocm${GPU_ARCH_VERSION}
GPU_IMAGE=rocm/dev-centos-7:${GPU_ARCH_VERSION}-complete
DEVTOOLSET_VERSION="9"
if [ ${GPU_ARCH_TYPE} == "rocm-manylinux_2_28" ]; then
MANY_LINUX_VERSION="2_28"
DEVTOOLSET_VERSION="11"
GPU_IMAGE=rocm/dev-almalinux-8:${GPU_ARCH_VERSION}-complete
fi
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
DOCKER_GPU_BUILD_ARG="--build-arg ROCM_VERSION=${GPU_ARCH_VERSION} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}"
;;
manylinux2_28-builder:xpu)
xpu)
TARGET=xpu_final
DOCKER_TAG=xpu
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28"
;;
*)
echo "ERROR: Unrecognized image name: ${image}"
echo "ERROR: Unrecognized GPU_ARCH_TYPE: ${GPU_ARCH_TYPE}"
exit 1
;;
esac
IMAGES=''
if [[ -n ${MANY_LINUX_VERSION} && -z ${DOCKERFILE_SUFFIX} ]]; then
DOCKERFILE_SUFFIX=_${MANY_LINUX_VERSION}
fi
# Only activate this if in CI
if [ "$(uname -m)" != "s390x" ] && [ -v CI ]; then
# TODO: Remove LimitNOFILE=1048576 patch once https://github.com/pytorch/test-infra/issues/5712
# is resolved. This patch is required in order to fix timing out of Docker build on Amazon Linux 2023.
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service
sudo systemctl daemon-reload
sudo systemctl restart docker
(
set -x
# Only activate this if in CI
if [ "$(uname -m)" != "s390x" ] && [ -v CI ]; then
# TODO: Remove LimitNOFILE=1048576 patch once https://github.com/pytorch/test-infra/issues/5712
# is resolved. This patch is required in order to fix timing out of Docker build on Amazon Linux 2023.
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service
sudo systemctl daemon-reload
sudo systemctl restart docker
fi
DOCKER_BUILDKIT=1 docker build \
${DOCKER_GPU_BUILD_ARG} \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \
--target "${TARGET}" \
-t "${DOCKER_IMAGE}" \
$@ \
-f "${TOPDIR}/.ci/docker/manywheel/Dockerfile${DOCKERFILE_SUFFIX}" \
"${TOPDIR}/.ci/docker/"
)
GITHUB_REF=${GITHUB_REF:-"dev")}
GIT_BRANCH_NAME=${GITHUB_REF##*/}
GIT_COMMIT_SHA=${GITHUB_SHA:-$(git rev-parse HEAD)}
DOCKER_IMAGE_BRANCH_TAG=${DOCKER_IMAGE}-${GIT_BRANCH_NAME}
DOCKER_IMAGE_SHA_TAG=${DOCKER_IMAGE}-${GIT_COMMIT_SHA}
if [[ "${WITH_PUSH}" == true ]]; then
(
set -x
docker push "${DOCKER_IMAGE}"
if [[ -n ${GITHUB_REF} ]]; then
docker tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_BRANCH_TAG}
docker tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_SHA_TAG}
docker push "${DOCKER_IMAGE_BRANCH_TAG}"
docker push "${DOCKER_IMAGE_SHA_TAG}"
fi
)
fi
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
DOCKER_BUILDKIT=1 docker build \
${DOCKER_GPU_BUILD_ARG} \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \
--target "${TARGET}" \
-t "${tmp_tag}" \
$@ \
-f "${TOPDIR}/.ci/docker/manywheel/Dockerfile${DOCKERFILE_SUFFIX}" \
"${TOPDIR}/.ci/docker/"

View File

@ -97,7 +97,7 @@ find /opt/_internal -type f -print0 \
| xargs -0 -n1 strip --strip-unneeded 2>/dev/null || true
# We do not need the Python test suites, or indeed the precompiled .pyc and
# .pyo files. Partially cribbed from:
# https://github.com/docker-library/python/blob/master/3.4/slim/Dockerfile # @lint-ignore
# https://github.com/docker-library/python/blob/master/3.4/slim/Dockerfile
find /opt/_internal \
\( -type d -a -name test -o -name tests \) \
-o \( -type f -a -name '*.pyc' -o -name '*.pyo' \) \

View File

@ -2,7 +2,7 @@
# Helper utilities for build
# Script used only in CD pipeline
OPENSSL_DOWNLOAD_URL=https://www.openssl.org/source/old/1.1.1/ # @lint-ignore
OPENSSL_DOWNLOAD_URL=https://www.openssl.org/source/old/1.1.1/
CURL_DOWNLOAD_URL=https://curl.se/download
AUTOCONF_DOWNLOAD_URL=https://ftp.gnu.org/gnu/autoconf

View File

@ -41,14 +41,11 @@ fbscribelogger==0.1.7
#Pinned versions: 0.1.6
#test that import:
flatbuffers==2.0 ; platform_machine != "s390x"
flatbuffers==2.0
#Description: cross platform serialization library
#Pinned versions: 2.0
#test that import:
flatbuffers ; platform_machine == "s390x"
#Description: cross platform serialization library; Newer version is required on s390x for new python version
hypothesis==5.35.1
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
#Description: advanced library for generating parametrized tests
@ -105,10 +102,10 @@ networkx==2.8.8
#Pinned versions: 2.8.8
#test that import: functorch
ninja==1.11.1.3
#Description: build system. Used in some tests. Used in build to generate build
#time tracing information
#Pinned versions: 1.11.1.3
#ninja
#Description: build system. Note that it install from
#here breaks things so it is commented out
#Pinned versions: 1.10.0.post1
#test that import: run_test.py, test_cpp_extensions_aot.py,test_determination.py
numba==0.49.0 ; python_version < "3.9"
@ -356,7 +353,7 @@ parameterized==0.8.1
#Pinned versions: 1.24.0
#test that import: test_sac_estimator.py
pwlf==2.2.1
pwlf==2.2.1 ; python_version >= "3.8"
#Description: required for testing torch/distributed/_tools/sac_estimator.py
#Pinned versions: 2.2.1
#test that import: test_sac_estimator.py
@ -368,9 +365,10 @@ PyYAML
pyzstd
setuptools
ninja==1.11.1 ; platform_machine == "aarch64"
scons==4.5.2 ; platform_machine == "aarch64"
pulp==2.9.0
pulp==2.9.0 ; python_version >= "3.8"
#Description: required for testing ilp formulaiton under torch/distributed/_tools
#Pinned versions: 2.9.0
#test that import: test_sac_ilp.py
@ -379,6 +377,3 @@ dataclasses_json==0.6.7
#Description: required for data pipeline and scripts under tools/stats
#Pinned versions: 0.6.7
#test that import:
cmake==4.0.0
#Description: required for building

View File

@ -1,20 +1,15 @@
sphinx==5.3.0
#Description: This is used to generate PyTorch docs
#Pinned versions: 5.3.0
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git@pytorch_sphinx_theme2#egg=pytorch_sphinx_theme2
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
# TODO: sphinxcontrib.katex 0.9.0 adds a local KaTeX server to speed up pre-rendering
# but it doesn't seem to work and hangs around idly. The initial thought is probably
# something related to Docker setup. We can investigate this later
sphinxcontrib.katex==0.8.6
#Description: This is used to generate PyTorch docs
#Pinned versions: 0.8.6
sphinxext-opengraph==0.9.1
#Description: This is used to generate PyTorch docs
#Pinned versions: 0.9.1
matplotlib==3.5.3
#Description: This is used to generate PyTorch docs
#Pinned versions: 3.5.3
@ -51,6 +46,5 @@ myst-nb==0.17.2
# The following are required to build torch.distributed.elastic.rendezvous.etcd* docs
python-etcd==0.4.5
sphinx-copybutton==0.5.0
sphinx-design==0.4.0
sphinxcontrib-mermaid==1.0.0
sphinx-panels==0.4.1
myst-parser==0.18.1

View File

@ -1 +1 @@
3.3.0
3.3.1

View File

@ -2,7 +2,7 @@ ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG IMAGE_NAME
FROM ${IMAGE_NAME} as base
FROM ${IMAGE_NAME}
ARG UBUNTU_VERSION
ARG CUDA_VERSION
@ -26,6 +26,7 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
ARG ANACONDA_PYTHON_VERSION
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ARG CONDA_CMAKE
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
@ -42,6 +43,20 @@ ARG CLANG_VERSION
COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
@ -75,21 +90,21 @@ COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
ARG TRITON
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
FROM base as triton-builder
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt
COPY triton_version.txt triton_version.txt
RUN bash ./install_triton.sh
FROM base as final
COPY --from=triton-builder /opt/triton /opt/triton
RUN if [ -n "${TRITON}" ]; then pip install /opt/triton/*.whl; chown -R jenkins:jenkins /opt/conda; fi
RUN rm -rf /opt/triton
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt triton_version.txt
ARG HALIDE
# Build and install halide
@ -144,16 +159,6 @@ COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash install_cusparselt.sh
RUN rm install_cusparselt.sh
# Install NCCL
ARG CUDA_VERSION
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
RUN bash install_nccl.sh
RUN rm install_nccl.sh /ci_commit_pins/nccl-cu*
ENV USE_SYSTEM_NCCL=1
ENV NCCL_INCLUDE_DIR="/usr/local/cuda/include/"
ENV NCCL_LIB_DIR="/usr/local/cuda/lib64/"
# Install CUDSS
ARG CUDA_VERSION
COPY ./common/install_cudss.sh install_cudss.sh

View File

@ -27,6 +27,7 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
ARG ANACONDA_PYTHON_VERSION
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ARG CONDA_CMAKE
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
@ -42,6 +43,20 @@ ARG CLANG_VERSION
COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
@ -55,7 +70,7 @@ COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION}
RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh
ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh
@ -100,6 +115,12 @@ COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh

View File

@ -28,6 +28,7 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ARG DOCS
ARG BUILD_ENVIRONMENT
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
@ -76,6 +77,13 @@ COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-xpu.txt triton_version.txt
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
@ -83,6 +91,12 @@ RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh

View File

@ -1,6 +1,6 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION} as base
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
@ -28,6 +28,7 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ARG DOCS
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
@ -51,17 +52,9 @@ RUN bash ./install_lcov.sh && rm install_lcov.sh
# Install cuda and cudnn
ARG CUDA_VERSION
COPY ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh install_nccl.sh /ci_commit_pins/nccl-cu* install_cusparselt.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
# No effect if cuda not installed
ENV USE_SYSTEM_NCCL=1
ENV NCCL_INCLUDE_DIR="/usr/local/cuda/include/"
ENV NCCL_LIB_DIR="/usr/local/cuda/lib64/"
# (optional) Install UCC
ARG UCX_COMMIT
@ -74,6 +67,20 @@ ADD ./common/install_ucc.sh install_ucc.sh
RUN if [ -n "${UCX_COMMIT}" ] && [ -n "${UCC_COMMIT}" ]; then bash ./install_ucc.sh; fi
RUN rm install_ucc.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
@ -81,6 +88,24 @@ RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
COPY ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh
RUN if [ -n "${VULKAN_SDK_VERSION}" ]; then bash ./install_vulkan_sdk.sh; fi
RUN rm install_vulkan_sdk.sh
# (optional) Install swiftshader
ARG SWIFTSHADER
COPY ./common/install_swiftshader.sh install_swiftshader.sh
RUN if [ -n "${SWIFTSHADER}" ]; then bash ./install_swiftshader.sh; fi
RUN rm install_swiftshader.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh
@ -102,21 +127,20 @@ RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_d
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
ARG TRITON
ARG TRITON_CPU
# Create a separate stage for building Triton and Triton-CPU. install_triton
# will check for the presence of env vars
FROM base as triton-builder
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt
COPY ci_commit_pins/triton-cpu.txt triton-cpu.txt
RUN bash ./install_triton.sh
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt
FROM base as final
COPY --from=triton-builder /opt/triton /opt/triton
RUN if [ -n "${TRITON}" ] || [ -n "${TRITON_CPU}" ]; then pip install /opt/triton/*.whl; chown -R jenkins:jenkins /opt/conda; fi
RUN rm -rf /opt/triton
ARG TRITON_CPU
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton-cpu.txt triton-cpu.txt
RUN if [ -n "${TRITON_CPU}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-cpu.txt
ARG EXECUTORCH
# Build and install executorch

View File

@ -1,2 +0,0 @@
output/
magma-rocm*/

View File

@ -1,35 +0,0 @@
SHELL=/usr/bin/env bash
DOCKER_CMD ?= docker
DESIRED_ROCM ?= 6.4
DESIRED_ROCM_SHORT = $(subst .,,$(DESIRED_ROCM))
PACKAGE_NAME = magma-rocm
# inherit this from underlying docker image, do not pass this env var to docker
#PYTORCH_ROCM_ARCH ?= gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201
DOCKER_RUN = set -eou pipefail; ${DOCKER_CMD} run --rm -i \
-v $(shell git rev-parse --show-toplevel)/.ci:/builder \
-w /builder \
-e PACKAGE_NAME=${PACKAGE_NAME}${DESIRED_ROCM_SHORT} \
-e DESIRED_ROCM=${DESIRED_ROCM} \
"pytorch/almalinux-builder:rocm${DESIRED_ROCM}" \
magma-rocm/build_magma.sh
.PHONY: all
all: magma-rocm64
all: magma-rocm63
.PHONY:
clean:
$(RM) -r magma-*
$(RM) -r output
.PHONY: magma-rocm64
magma-rocm64: DESIRED_ROCM := 6.4
magma-rocm64:
$(DOCKER_RUN)
.PHONY: magma-rocm63
magma-rocm63: DESIRED_ROCM := 6.3
magma-rocm63:
$(DOCKER_RUN)

View File

@ -1,48 +0,0 @@
# Magma ROCm
This folder contains the scripts and configurations to build libmagma.so, linked for various versions of ROCm.
## Building
Look in the `Makefile` for available targets to build. To build any target, for example `magma-rocm63`, run
```
# Using `docker`
make magma-rocm63
# Using `podman`
DOCKER_CMD=podman make magma-rocm63
```
This spawns a `pytorch/manylinux-rocm<version>` docker image, which has the required `devtoolset` and ROCm versions installed.
Within the docker image, it runs `build_magma.sh` with the correct environment variables set, which package the necessary files
into a tarball, with the following structure:
```
.
├── include # header files
├── lib # libmagma.so
├── info
│ ├── licenses # license file
│ └── recipe # build script
```
More specifically, `build_magma.sh` copies over the relevant files from the `package_files` directory depending on the ROCm version.
Outputted binaries should be in the `output` folder.
## Pushing
Packages can be uploaded to an S3 bucket using:
```
aws s3 cp output/*/magma-cuda*.bz2 <bucket-with-path>
```
If you do not have upload permissions, please ping @seemethere or @soumith to gain access
## New versions
New ROCm versions can be added by creating a new make target with the next desired version. For ROCm version N.n, the target should be named `magma-rocmNn`.
Make sure to edit the appropriate environment variables (e.g., DESIRED_ROCM) in the `Makefile` accordingly. Remember also to check `build_magma.sh` to ensure the logic for copying over the files remains correct.

View File

@ -1,42 +0,0 @@
#!/usr/bin/env bash
set -eou pipefail
# Environment variables
# The script expects DESIRED_CUDA and PACKAGE_NAME to be set
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Version 2.7.2 + ROCm related updates
MAGMA_VERSION=a1625ff4d9bc362906bd01f805dbbe12612953f6
# Folders for the build
PACKAGE_FILES=${ROOT_DIR}/magma-rocm/package_files # metadata
PACKAGE_DIR=${ROOT_DIR}/magma-rocm/${PACKAGE_NAME} # build workspace
PACKAGE_OUTPUT=${ROOT_DIR}/magma-rocm/output # where tarballs are stored
PACKAGE_BUILD=${PACKAGE_DIR} # where the content of the tarball is prepared
PACKAGE_RECIPE=${PACKAGE_BUILD}/info/recipe
PACKAGE_LICENSE=${PACKAGE_BUILD}/info/licenses
mkdir -p ${PACKAGE_DIR} ${PACKAGE_OUTPUT}/linux-64 ${PACKAGE_BUILD} ${PACKAGE_RECIPE} ${PACKAGE_LICENSE}
# Fetch magma sources and verify checksum
pushd ${PACKAGE_DIR}
git clone https://bitbucket.org/icl/magma.git
pushd magma
git checkout ${MAGMA_VERSION}
popd
popd
# build
pushd ${PACKAGE_DIR}/magma
# The build.sh script expects to be executed from the sources root folder
INSTALL_DIR=${PACKAGE_BUILD} ${PACKAGE_FILES}/build.sh
popd
# Package recipe, license and tarball
# Folder and package name are backward compatible for the build workflow
cp ${PACKAGE_FILES}/build.sh ${PACKAGE_RECIPE}/build.sh
cp ${PACKAGE_DIR}/magma/COPYRIGHT ${PACKAGE_LICENSE}/COPYRIGHT
pushd ${PACKAGE_BUILD}
tar cjf ${PACKAGE_OUTPUT}/linux-64/${PACKAGE_NAME}-${MAGMA_VERSION}-1.tar.bz2 include lib info
echo Built in ${PACKAGE_OUTPUT}/linux-64/${PACKAGE_NAME}-${MAGMA_VERSION}-1.tar.bz2
popd

View File

@ -1,38 +0,0 @@
# Magma build scripts need `python`
ln -sf /usr/bin/python3 /usr/bin/python
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
almalinux)
yum install -y gcc-gfortran
;;
*)
echo "No preinstalls to build magma..."
;;
esac
MKLROOT=${MKLROOT:-/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION}
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
if [[ -f "${MKLROOT}/lib/libmkl_core.a" ]]; then
echo 'LIB = -Wl,--start-group -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -Wl,--end-group -lpthread -lstdc++ -lm -lgomp -lhipblas -lhipsparse' >> make.inc
fi
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib -ldl' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --offload-arch=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT="${MKLROOT}"
make testing/testing_dgemm -j $(nproc) MKLROOT="${MKLROOT}"
cp -R lib ${INSTALL_DIR}
cp -R include ${INSTALL_DIR}

View File

@ -12,12 +12,13 @@ DOCKER_RUN = set -eou pipefail; ${DOCKER_CMD} run --rm -i \
-e PACKAGE_NAME=${PACKAGE_NAME}${DESIRED_CUDA_SHORT} \
-e DESIRED_CUDA=${DESIRED_CUDA} \
-e CUDA_ARCH_LIST="${CUDA_ARCH_LIST}" \
"pytorch/almalinux-builder:cuda${DESIRED_CUDA}-main" \
"pytorch/manylinux2_28-builder:cuda${DESIRED_CUDA}-main" \
magma/build_magma.sh
.PHONY: all
all: magma-cuda128
all: magma-cuda126
all: magma-cuda124
all: magma-cuda118
.PHONY:
@ -36,6 +37,11 @@ magma-cuda126: DESIRED_CUDA := 12.6
magma-cuda126:
$(DOCKER_RUN)
.PHONY: magma-cuda124
magma-cuda124: DESIRED_CUDA := 12.4
magma-cuda124:
$(DOCKER_RUN)
.PHONY: magma-cuda118
magma-cuda118: DESIRED_CUDA := 11.8
magma-cuda118: CUDA_ARCH_LIST += -gencode arch=compute_37,code=sm_37

View File

@ -111,6 +111,12 @@ case ${DESIRED_PYTHON} in
;;
esac
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
export _GLIBCXX_USE_CXX11_ABI=1
else
export _GLIBCXX_USE_CXX11_ABI=0
fi
if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
echo "Calling build_amd.py at $(date)"
python tools/amd_build/build_amd.py
@ -203,6 +209,12 @@ if [[ -n "$BUILD_PYTHONLESS" ]]; then
mkdir -p /tmp/$LIBTORCH_HOUSE_DIR
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
LIBTORCH_ABI="cxx11-abi-"
else
LIBTORCH_ABI=
fi
zip -rq /tmp/$LIBTORCH_HOUSE_DIR/libtorch-$LIBTORCH_ABI$LIBTORCH_VARIANT-$PYTORCH_BUILD_VERSION.zip libtorch
cp /tmp/$LIBTORCH_HOUSE_DIR/libtorch-$LIBTORCH_ABI$LIBTORCH_VARIANT-$PYTORCH_BUILD_VERSION.zip \
/tmp/$LIBTORCH_HOUSE_DIR/libtorch-$LIBTORCH_ABI$LIBTORCH_VARIANT-latest.zip
@ -321,8 +333,8 @@ for pkg in /$WHEELHOUSE_DIR/torch_no_python*.whl /$WHEELHOUSE_DIR/torch*linux*.w
# ROCm workaround for roctracer dlopens
if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
patchedpath=$(fname_without_so_number $destpath)
# Keep the so number for XPU dependencies and libgomp.so.1 to avoid twice load
elif [[ "$DESIRED_CUDA" == *"xpu"* || "$filename" == "libgomp.so.1" ]]; then
# Keep the so number for XPU dependencies
elif [[ "$DESIRED_CUDA" == *"xpu"* ]]; then
patchedpath=$destpath
else
patchedpath=$(fname_with_sha256 $destpath)

View File

@ -95,6 +95,12 @@ python setup.py clean
retry pip install -qr requirements.txt
retry pip install -q numpy==2.0.1
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
export _GLIBCXX_USE_CXX11_ABI=1
else
export _GLIBCXX_USE_CXX11_ABI=0
fi
if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
echo "Calling build_amd.py at $(date)"
python tools/amd_build/build_amd.py
@ -163,6 +169,12 @@ fi
)
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
LIBTORCH_ABI="cxx11-abi-"
else
LIBTORCH_ABI=
fi
(
set -x

View File

@ -20,11 +20,7 @@ fi
source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/pti/latest/env/vars.sh
source /opt/intel/oneapi/umf/latest/env/vars.sh
source /opt/intel/oneapi/ccl/latest/env/vars.sh
source /opt/intel/oneapi/mpi/latest/env/vars.sh
export USE_STATIC_MKL=1
export USE_ONEMKL=1
export USE_XCCL=1
WHEELHOUSE_DIR="wheelhousexpu"
LIBTORCH_HOUSE_DIR="libtorch_housexpu"

View File

@ -10,3 +10,5 @@ example: `py2-cuda9.0-cudnn7-ubuntu16.04`. The Docker images that are
built on Jenkins and are used in triggered builds already have this
environment variable set in their manifest. Also see
`./docker/jenkins/*/Dockerfile` and search for `BUILD_ENVIRONMENT`.
Our Jenkins installation is located at https://ci.pytorch.org/jenkins/.

View File

@ -35,7 +35,7 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
fi
if [[ "$BUILD_ENVIRONMENT" == *cuda11* ]]; then
if [[ "$BUILD_ENVIRONMENT" != *clang* ]]; then
if [[ "$BUILD_ENVIRONMENT" != *cuda11.3* && "$BUILD_ENVIRONMENT" != *clang* ]]; then
# TODO: there is a linking issue when building with UCC using clang,
# disable it for now and to be fix later.
# TODO: disable UCC temporarily to enable CUDA 12.1 in CI
@ -171,12 +171,6 @@ fi
if [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
# shellcheck disable=SC1091
source /opt/intel/oneapi/compiler/latest/env/vars.sh
# shellcheck disable=SC1091
source /opt/intel/oneapi/ccl/latest/env/vars.sh
# shellcheck disable=SC1091
source /opt/intel/oneapi/mpi/latest/env/vars.sh
# Enable XCCL build
export USE_XCCL=1
# XPU kineto feature dependencies are not fully ready, disable kineto build as temp WA
export USE_KINETO=0
export TORCH_XPU_ARCH_LIST=pvc
@ -283,8 +277,10 @@ else
# or building non-XLA tests.
if [[ "$BUILD_ENVIRONMENT" != *rocm* &&
"$BUILD_ENVIRONMENT" != *xla* ]]; then
# Install numpy-2.0.2 for builds which are backward compatible with 1.X
python -mpip install numpy==2.0.2
if [[ "$BUILD_ENVIRONMENT" != *py3.8* ]]; then
# Install numpy-2.0.2 for builds which are backward compatible with 1.X
python -mpip install numpy==2.0.2
fi
WERROR=1 python setup.py clean
@ -307,18 +303,6 @@ else
fi
pip_install_whl "$(echo dist/*.whl)"
if [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
echo "Checking that xpu is compiled"
pushd dist/
if python -c 'import torch; exit(0 if torch.xpu._is_compiled() else 1)'; then
echo "XPU support is compiled in."
else
echo "XPU support is NOT compiled in."
exit 1
fi
popd
fi
# TODO: I'm not sure why, but somehow we lose verbose commands
set -x

View File

@ -59,16 +59,78 @@ else
export install_root="$(dirname $(which python))/../lib/python${py_dot}/site-packages/torch/"
fi
###############################################################################
# Setup XPU ENV
###############################################################################
if [[ "$DESIRED_CUDA" == 'xpu' ]]; then
set +u
# Refer https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html
source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/pti/latest/env/vars.sh
fi
###############################################################################
# Check GCC ABI
###############################################################################
# NOTE: As of https://github.com/pytorch/pytorch/issues/126551 we only produce
# wheels with cxx11-abi
# NOTE [ Building libtorch with old vs. new gcc ABI ]
#
# Packages built with one version of ABI could not be linked against by client
# C++ libraries that were compiled using the other version of ABI. Since both
# gcc ABIs are still common in the wild, we need to support both ABIs. Currently:
#
# - All the nightlies built on CentOS 7 + devtoolset7 use the old gcc ABI.
# - All the nightlies built on Ubuntu 16.04 + gcc 5.4 use the new gcc ABI.
echo "Checking that the gcc ABI is what we expect"
if [[ "$(uname)" != 'Darwin' ]]; then
# We also check that there are cxx11 symbols in libtorch
function is_expected() {
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* || "$DESIRED_CUDA" == *"rocm"* ]]; then
if [[ "$1" -gt 0 || "$1" == "ON " ]]; then
echo 1
fi
else
if [[ -z "$1" || "$1" == 0 || "$1" == "OFF" ]]; then
echo 1
fi
fi
}
# First we check that the env var in TorchConfig.cmake is correct
# We search for D_GLIBCXX_USE_CXX11_ABI=1 in torch/TorchConfig.cmake
torch_config="${install_root}/share/cmake/Torch/TorchConfig.cmake"
if [[ ! -f "$torch_config" ]]; then
echo "No TorchConfig.cmake found!"
ls -lah "$install_root/share/cmake/Torch"
exit 1
fi
echo "Checking the TorchConfig.cmake"
cat "$torch_config"
# The sed call below is
# don't print lines by default (only print the line we want)
# -n
# execute the following expression
# e
# replace lines that match with the first capture group and print
# s/.*D_GLIBCXX_USE_CXX11_ABI=\(.\)".*/\1/p
# any characters, D_GLIBCXX_USE_CXX11_ABI=, exactly one any character, a
# quote, any characters
# Note the exactly one single character after the '='. In the case that the
# variable is not set the '=' will be followed by a '"' immediately and the
# line will fail the match and nothing will be printed; this is what we
# want. Otherwise it will capture the 0 or 1 after the '='.
# /.*D_GLIBCXX_USE_CXX11_ABI=\(.\)".*/
# replace the matched line with the capture group and print
# /\1/p
actual_gcc_abi="$(sed -ne 's/.*D_GLIBCXX_USE_CXX11_ABI=\(.\)".*/\1/p' < "$torch_config")"
if [[ "$(is_expected "$actual_gcc_abi")" != 1 ]]; then
echo "gcc ABI $actual_gcc_abi not as expected."
exit 1
fi
# We also check that there are [not] cxx11 symbols in libtorch
#
echo "Checking that symbols in libtorch.so have the right gcc abi"
python3 "$(dirname ${BASH_SOURCE[0]})/smoke_test/check_binary_symbols.py"
@ -146,11 +208,35 @@ setup_link_flags () {
TEST_CODE_DIR="$(dirname $(realpath ${BASH_SOURCE[0]}))/test_example_code"
build_and_run_example_cpp () {
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
GLIBCXX_USE_CXX11_ABI=1
else
GLIBCXX_USE_CXX11_ABI=0
fi
setup_link_flags
g++ ${TEST_CODE_DIR}/$1.cpp -I${install_root}/include -I${install_root}/include/torch/csrc/api/include -std=gnu++17 -L${install_root}/lib ${REF_LIB} ${ADDITIONAL_LINKER_FLAGS} -ltorch $TORCH_CPU_LINK_FLAGS $TORCH_CUDA_LINK_FLAGS $C10_LINK_FLAGS -o $1
g++ ${TEST_CODE_DIR}/$1.cpp -I${install_root}/include -I${install_root}/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=$GLIBCXX_USE_CXX11_ABI -std=gnu++17 -L${install_root}/lib ${REF_LIB} ${ADDITIONAL_LINKER_FLAGS} -ltorch $TORCH_CPU_LINK_FLAGS $TORCH_CUDA_LINK_FLAGS $C10_LINK_FLAGS -o $1
./$1
}
build_example_cpp_with_incorrect_abi () {
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
GLIBCXX_USE_CXX11_ABI=0
else
GLIBCXX_USE_CXX11_ABI=1
fi
set +e
setup_link_flags
g++ ${TEST_CODE_DIR}/$1.cpp -I${install_root}/include -I${install_root}/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=$GLIBCXX_USE_CXX11_ABI -std=gnu++17 -L${install_root}/lib ${REF_LIB} ${ADDITIONAL_LINKER_FLAGS} -ltorch $TORCH_CPU_LINK_FLAGS $TORCH_CUDA_LINK_FLAGS $C10_LINK_FLAGS -o $1
ERRCODE=$?
set -e
if [ "$ERRCODE" -eq "0" ]; then
echo "Building example with incorrect ABI didn't throw error. Aborting."
exit 1
else
echo "Building example with incorrect ABI throws expected error. Proceeding."
fi
}
###############################################################################
# Check simple Python/C++ calls
###############################################################################
@ -160,6 +246,11 @@ if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
fi
build_and_run_example_cpp simple-torch-test
# `_GLIBCXX_USE_CXX11_ABI` is always ignored by gcc in devtoolset7, so we test
# the expected failure case for Ubuntu 16.04 + gcc 5.4 only.
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
build_example_cpp_with_incorrect_abi simple-torch-test
fi
else
pushd /tmp
python -c 'import torch'
@ -216,14 +307,6 @@ else
fi
fi
###############################################################################
# Check XPU configured correctly
###############################################################################
if [[ "$DESIRED_CUDA" == 'xpu' && "$PACKAGE_TYPE" != 'libtorch' ]]; then
echo "Checking that xpu is compiled"
python -c 'import torch; exit(0 if torch.xpu._is_compiled() else 1)'
fi
###############################################################################
# Check CUDA configured correctly
###############################################################################
@ -302,22 +385,10 @@ except RuntimeError as e:
fi
###############################################################################
# Check for C++ ABI compatibility to GCC-11 - GCC 13
# Check for C++ ABI compatibility between gcc7 and gcc9 compiled binaries
###############################################################################
if [[ "$(uname)" == 'Linux' && "$PACKAGE_TYPE" == 'manywheel' ]]; then
pushd /tmp
# Per https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html
# gcc-11 is ABI16, gcc-13 is ABI18, gcc-14 is ABI19
# gcc 11 - CUDA 11.8, xpu, rocm
# gcc 13 - CUDA 12.6, 12.8 and cpu
# Please see issue for reference: https://github.com/pytorch/pytorch/issues/152426
if [[ "$(uname -m)" == "s390x" ]]; then
cxx_abi="19"
elif [[ "$DESIRED_CUDA" != 'cu118' && "$DESIRED_CUDA" != 'xpu' && "$DESIRED_CUDA" != 'rocm'* ]]; then
cxx_abi="18"
else
cxx_abi="16"
fi
python -c "import torch; exit(0 if torch._C._PYBIND11_BUILD_ABI == '_cxxabi10${cxx_abi}' else 1)"
python -c "import torch; exit(0 if torch.compiled_with_cxx11_abi() else (0 if torch._C._PYBIND11_BUILD_ABI == '_cxxabi1011' else 1))"
popd
fi

View File

@ -13,6 +13,10 @@ if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
# HIP_PLATFORM is auto-detected by hipcc; unset to avoid build errors
unset HIP_PLATFORM
export PYTORCH_TEST_WITH_ROCM=1
# temporary to locate some kernel issues on the CI nodes
export HSAKMT_DEBUG_LEVEL=4
# improve rccl performance for distributed tests
export HSA_FORCE_FINE_GRAIN_PCIE=1
fi
# TODO: Renable libtorch testing for MacOS, see https://github.com/pytorch/pytorch/issues/62598

View File

@ -202,7 +202,7 @@ function install_torchrec_and_fbgemm() {
function clone_pytorch_xla() {
if [[ ! -d ./xla ]]; then
git clone --recursive --quiet https://github.com/pytorch/xla.git
git clone --recursive -b r2.7 https://github.com/pytorch/xla.git
pushd xla
# pin the xla hash so that we don't get broken by changes to xla
git checkout "$(cat ../.github/ci_commit_pins/xla.txt)"

View File

@ -1,50 +1,31 @@
#!/bin/bash
# Script for installing sccache on the xla build job, which uses xla's docker
# image, which has sccache installed but doesn't write the stubs. This is
# mostly copied from .ci/docker/install_cache.sh. Changes are: removing checks
# that will always return the same thing, ex checks for for rocm, CUDA, changing
# the path where sccache is installed, not changing /etc/environment, and not
# installing/downloading sccache as it is already in the docker image.
# image and doesn't have sccache installed on it. This is mostly copied from
# .ci/docker/install_cache.sh. Changes are: removing checks that will always
# return the same thing, ex checks for for rocm, CUDA, and changing the path
# where sccache is installed, and not changing /etc/environment.
set -ex -o pipefail
install_binary() {
echo "Downloading sccache binary from S3 repo"
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /tmp/cache/bin/sccache
}
mkdir -p /tmp/cache/bin
mkdir -p /tmp/cache/lib
export PATH="/tmp/cache/bin:$PATH"
install_binary
chmod a+x /tmp/cache/bin/sccache
function write_sccache_stub() {
# Unset LD_PRELOAD for ps because of asan + ps issues
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90589
if [ "$1" == "gcc" ]; then
# Do not call sccache recursively when dumping preprocessor argument
# For some reason it's very important for the first cached nvcc invocation
cat >"/tmp/cache/bin/$1" <<EOF
#!/bin/sh
# sccache does not support -E flag, so we need to call the original compiler directly in order to avoid calling this wrapper recursively
for arg in "\$@"; do
if [ "\$arg" = "-E" ]; then
exec $(which "$1") "\$@"
fi
done
if [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then
exec sccache $(which "$1") "\$@"
else
exec $(which "$1") "\$@"
fi
EOF
else
cat >"/tmp/cache/bin/$1" <<EOF
#!/bin/sh
if [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then
exec sccache $(which "$1") "\$@"
else
exec $(which "$1") "\$@"
fi
EOF
fi
# shellcheck disable=SC2086
# shellcheck disable=SC2059
printf "#!/bin/sh\nif [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then\n exec sccache $(which $1) \"\$@\"\nelse\n exec $(which $1) \"\$@\"\nfi" > "/tmp/cache/bin/$1"
chmod a+x "/tmp/cache/bin/$1"
}

View File

@ -33,15 +33,56 @@ if which sccache > /dev/null; then
export PATH="${tmp_dir}:$PATH"
fi
print_cmake_info
if [[ ${BUILD_ENVIRONMENT} == *"distributed"* ]]; then
# Needed for inductor benchmarks, as lots of HF networks make `torch.distribtued` calls
USE_DISTRIBUTED=1 USE_OPENMP=1 WERROR=1 python setup.py bdist_wheel
else
cross_compile_arm64() {
# Cross compilation for arm64
# Explicitly set USE_DISTRIBUTED=0 to align with the default build config on mac. This also serves as the sole CI config that tests
# that building with USE_DISTRIBUTED=0 works at all. See https://github.com/pytorch/pytorch/issues/86448
USE_DISTRIBUTED=0 CMAKE_OSX_ARCHITECTURES=arm64 MACOSX_DEPLOYMENT_TARGET=11.0 USE_MKLDNN=OFF USE_QNNPACK=OFF WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}
compile_arm64() {
# Compilation for arm64
# TODO: Compile with OpenMP support (but this causes CI regressions as cross-compilation were done with OpenMP disabled)
USE_DISTRIBUTED=0 USE_OPENMP=1 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}
compile_x86_64() {
USE_DISTRIBUTED=0 WERROR=1 python setup.py bdist_wheel --plat-name=macosx_10_9_x86_64
}
build_lite_interpreter() {
echo "Testing libtorch (lite interpreter)."
CPP_BUILD="$(pwd)/../cpp_build"
# Ensure the removal of the tmp directory
trap 'rm -rfv ${CPP_BUILD}' EXIT
rm -rf "${CPP_BUILD}"
mkdir -p "${CPP_BUILD}/caffe2"
# It looks libtorch need to be built in "${CPP_BUILD}/caffe2 folder.
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd "${CPP_BUILD}/caffe2" || exit
VERBOSE=1 DEBUG=1 python "${BUILD_LIBTORCH_PY}"
popd || exit
"${CPP_BUILD}/caffe2/build/bin/test_lite_interpreter_runtime"
}
print_cmake_info
if [[ ${BUILD_ENVIRONMENT} = *arm64* ]]; then
if [[ $(uname -m) == "arm64" ]]; then
compile_arm64
else
cross_compile_arm64
fi
elif [[ ${BUILD_ENVIRONMENT} = *lite-interpreter* ]]; then
export BUILD_LITE_INTERPRETER=1
build_lite_interpreter
else
compile_x86_64
fi
if which sccache > /dev/null; then
print_sccache_stats
fi

View File

@ -42,16 +42,6 @@ test_python_all() {
assert_git_not_dirty
}
test_python_mps() {
setup_test_python
time python test/run_test.py --verbose --mps
MTL_CAPTURE_ENABLED=1 ${CONDA_RUN} python3 test/test_mps.py --verbose -k test_metal_capture
assert_git_not_dirty
}
test_python_shard() {
if [[ -z "$NUM_TEST_SHARDS" ]]; then
echo "NUM_TEST_SHARDS must be defined to run a Python test shard"
@ -231,55 +221,25 @@ test_torchbench_smoketest() {
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
local backend=eager
local dtype=notset
local device=mps
local models=(hf_T5 llama BERT_pytorch dcgan hf_GPT2 yolov3 resnet152 sam pytorch_unet stable_diffusion_text_encoder speech_transformer Super_SloMo)
local hf_models=(GoogleFnet YituTechConvBert)
for backend in eager inductor; do
touch "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_training_${device}_performance.csv"
touch "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_performance.csv"
for dtype in notset float16 bfloat16; do
echo "Launching torchbench inference performance run for backend ${backend} and dtype ${dtype}"
local dtype_arg="--${dtype}"
if [ "$dtype" == notset ]; then
dtype_arg="--float32"
fi
touch "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_performance.csv"
for model in "${models[@]}"; do
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
--performance --only "$model" --backend "$backend" --inference --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_performance.csv" || true
if [ "$backend" == "inductor" ]; then
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
--accuracy --only "$model" --backend "$backend" --inference --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_accuracy.csv" || true
fi
done
for model in "${hf_models[@]}"; do
if [ "$backend" == "inductor" ]; then
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/huggingface.py \
--performance --only "$model" --backend "$backend" --inference --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_performance.csv" || true
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/huggingface.py \
--accuracy --only "$model" --backend "$backend" --inference --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_accuracy.csv" || true
fi
done
done
for dtype in notset amp; do
echo "Launching torchbench training performance run for backend ${backend} and dtype ${dtype}"
touch "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_training_${device}_performance.csv"
local dtype_arg="--${dtype}"
if [ "$dtype" == notset ]; then
dtype_arg="--float32"
fi
for model in "${models[@]}"; do
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
--performance --only "$model" --backend "$backend" --training --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_training_${device}_performance.csv" || true
done
done
echo "Setup complete, launching torchbench training performance run"
for model in hf_T5 llama BERT_pytorch dcgan hf_GPT2 yolov3 resnet152; do
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
--performance --only "$model" --backend "$backend" --training --devices "$device" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_training_${device}_performance.csv"
done
echo "Launching torchbench inference performance run"
for model in hf_T5 llama BERT_pytorch dcgan hf_GPT2 yolov3 resnet152; do
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
--performance --only "$model" --backend "$backend" --inference --devices "$device" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_performance.csv"
done
echo "Pytorch benchmark on mps device completed"
@ -331,8 +291,6 @@ elif [[ $TEST_CONFIG == *"perf_timm"* ]]; then
test_timm_perf
elif [[ $TEST_CONFIG == *"perf_smoketest"* ]]; then
test_torchbench_smoketest
elif [[ $TEST_CONFIG == *"mps"* ]]; then
test_python_mps
elif [[ $NUM_TEST_SHARDS -gt 1 ]]; then
test_python_shard "${SHARD_NUMBER}"
if [[ "${SHARD_NUMBER}" == 1 ]]; then

View File

@ -0,0 +1,22 @@
#!/bin/bash
set -e
run_test () {
rm -rf test_tmp/ && mkdir test_tmp/ && cd test_tmp/
"$@"
cd .. && rm -rf test_tmp/
}
get_runtime_of_command () {
TIMEFORMAT=%R
# runtime=$( { time ($@ &> /dev/null); } 2>&1 1>/dev/null)
runtime=$( { time "$@"; } 2>&1 1>/dev/null)
if [[ $runtime == *"Error"* ]]; then
exit 1
fi
runtime=${runtime#+++ $@}
runtime=$(python -c "print($runtime)")
echo "$runtime"
}

View File

@ -0,0 +1,91 @@
import argparse
import json
import math
import sys
parser = argparse.ArgumentParser()
parser.add_argument(
"--test-name", dest="test_name", action="store", required=True, help="test name"
)
parser.add_argument(
"--sample-stats",
dest="sample_stats",
action="store",
required=True,
help="stats from sample",
)
parser.add_argument(
"--update",
action="store_true",
help="whether to update baseline using stats from sample",
)
args = parser.parse_args()
test_name = args.test_name
if "cpu" in test_name:
backend = "cpu"
elif "gpu" in test_name:
backend = "gpu"
data_file_path = f"../{backend}_runtime.json"
with open(data_file_path) as data_file:
data = json.load(data_file)
if test_name in data:
mean = float(data[test_name]["mean"])
sigma = float(data[test_name]["sigma"])
else:
# Let the test pass if baseline number doesn't exist
mean = sys.maxsize
sigma = 0.001
print("population mean: ", mean)
print("population sigma: ", sigma)
# Let the test pass if baseline number is NaN (which happened in
# the past when we didn't have logic for catching NaN numbers)
if math.isnan(mean) or math.isnan(sigma):
mean = sys.maxsize
sigma = 0.001
sample_stats_data = json.loads(args.sample_stats)
sample_mean = float(sample_stats_data["mean"])
sample_sigma = float(sample_stats_data["sigma"])
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
if math.isnan(sample_mean):
raise Exception("""Error: sample mean is NaN""") # noqa: TRY002
elif math.isnan(sample_sigma):
raise Exception("""Error: sample sigma is NaN""") # noqa: TRY002
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)
if z_value >= 3:
raise Exception( # noqa: TRY002
f"""\n
z-value >= 3, there is high chance of perf regression.\n
To reproduce this regression, run
`cd .ci/pytorch/perf_test/ && bash {test_name}.sh` on your local machine
and compare the runtime before/after your code change.
"""
)
else:
print("z-value < 3, no perf regression detected.")
if args.update:
print("We will use these numbers as new baseline.")
new_data_file_path = f"../new_{backend}_runtime.json"
with open(new_data_file_path) as new_data_file:
new_data = json.load(new_data_file)
new_data[test_name] = {}
new_data[test_name]["mean"] = sample_mean
new_data[test_name]["sigma"] = max(sample_sigma, sample_mean * 0.1)
with open(new_data_file_path, "w") as new_data_file:
json.dump(new_data, new_data_file, indent=4)

View File

@ -0,0 +1,18 @@
import json
import sys
import numpy
sample_data_list = sys.argv[1:]
sample_data_list = [float(v.strip()) for v in sample_data_list]
sample_mean = numpy.mean(sample_data_list)
sample_sigma = numpy.std(sample_data_list)
data = {
"mean": sample_mean,
"sigma": sample_sigma,
}
print(json.dumps(data))

View File

@ -0,0 +1,43 @@
#!/bin/bash
set -e
. ./common.sh
test_cpu_speed_mini_sequence_labeler () {
echo "Testing: mini sequence labeler, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 726567a455edbfda6199445922a8cfee82535664
cd scripts/mini_sequence_labeler
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py)
SAMPLE_ARRAY+=("${runtime}")
done
cd ../../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_mini_sequence_labeler "$@"
fi

View File

@ -0,0 +1,45 @@
#!/bin/bash
set -e
. ./common.sh
test_cpu_speed_mnist () {
echo "Testing: MNIST, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/examples.git -b perftests
cd examples/mnist
conda install -c pytorch torchvision-cpu
# Download data
python main.py --epochs 0
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_mnist "$@"
fi

View File

@ -0,0 +1,29 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_torch () {
echo "Testing: torch.*, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/yf225/perf-tests.git
if [ "$1" == "compare_with_baseline" ]; then
export ARGS=(--compare ../cpu_runtime.json)
elif [ "$1" == "compare_and_update" ]; then
export ARGS=(--compare ../cpu_runtime.json --update ../new_cpu_runtime.json)
elif [ "$1" == "update_only" ]; then
export ARGS=(--update ../new_cpu_runtime.json)
fi
if ! python perf-tests/modules/test_cpu_torch.py "${ARGS[@]}"; then
echo "To reproduce this regression, run \`cd .ci/pytorch/perf_test/ && bash ${FUNCNAME[0]}.sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_torch "$@"
fi

View File

@ -0,0 +1,29 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_torch_tensor () {
echo "Testing: torch.Tensor.*, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/yf225/perf-tests.git
if [ "$1" == "compare_with_baseline" ]; then
export ARGS=(--compare ../cpu_runtime.json)
elif [ "$1" == "compare_and_update" ]; then
export ARGS=(--compare ../cpu_runtime.json --update ../new_cpu_runtime.json)
elif [ "$1" == "update_only" ]; then
export ARGS=(--update ../new_cpu_runtime.json)
fi
if ! python perf-tests/modules/test_cpu_torch_tensor.py "${ARGS[@]}"; then
echo "To reproduce this regression, run \`cd .ci/pytorch/perf_test/ && bash ${FUNCNAME[0]}.sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_torch_tensor "$@"
fi

View File

@ -0,0 +1,44 @@
#!/bin/bash
set -e
. ./common.sh
test_gpu_speed_cudnn_lstm () {
echo "Testing: CuDNN LSTM, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python cudnn_lstm.py --skip-cpu-governor-check)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_cudnn_lstm "$@"
fi

View File

@ -0,0 +1,44 @@
#!/bin/bash
set -e
. ./common.sh
test_gpu_speed_lstm () {
echo "Testing: LSTM, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python lstm.py --skip-cpu-governor-check)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_lstm "$@"
fi

View File

@ -0,0 +1,44 @@
#!/bin/bash
set -e
. ./common.sh
test_gpu_speed_mlstm () {
echo "Testing: MLSTM, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python mlstm.py --skip-cpu-governor-check)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_mlstm "$@"
fi

View File

@ -0,0 +1,48 @@
#!/bin/bash
set -e
. ./common.sh
test_gpu_speed_mnist () {
echo "Testing: MNIST, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/examples.git -b perftests
cd examples/mnist
conda install -c pytorch torchvision
# Download data
python main.py --epochs 0
SAMPLE_ARRAY=()
NUM_RUNS=$1
# Needs warm up to get accurate number
python main.py --epochs 1 --no-log
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_mnist "$@"
fi

View File

@ -0,0 +1,53 @@
#!/bin/bash
set -e
. ./common.sh
test_gpu_speed_word_language_model () {
echo "Testing: word language model on Wikitext-2, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/examples.git -b perftests
cd examples/word_language_model
cd data/wikitext-2
# Reduce dataset size, so that we can have more runs per test
sed -n '1,200p' test.txt > test_tmp.txt
sed -n '1,1000p' train.txt > train_tmp.txt
sed -n '1,200p' valid.txt > valid_tmp.txt
mv test_tmp.txt test.txt
mv train_tmp.txt train.txt
mv valid_tmp.txt valid.txt
cd ../..
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --cuda --epochs 1)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_word_language_model "$@"
fi

View File

@ -0,0 +1,14 @@
import json
import sys
data_file_path = sys.argv[1]
commit_hash = sys.argv[2]
with open(data_file_path) as data_file:
data = json.load(data_file)
data["commit"] = commit_hash
with open(data_file_path, "w") as data_file:
json.dump(data, data_file)

View File

@ -119,6 +119,12 @@ popd
git rm -rf "$install_path" || true
mv "$pt_checkout/docs/build/html" "$install_path"
# Prevent Google from indexing $install_path/_modules. This folder contains
# generated source files.
# NB: the following only works on gnu sed. The sed shipped with mac os is different.
# One can `brew install gnu-sed` on a mac and then use "gsed" instead of "sed".
find "$install_path/_modules" -name "*.html" -print0 | xargs -0 sed -i '/<head>/a \ \ <meta name="robots" content="noindex">'
git add "$install_path" || true
git status
git config user.email "soumith+bot@pytorch.org"

View File

@ -76,7 +76,7 @@ fi
# Environment initialization
if [[ "$(uname)" == Darwin ]]; then
# Install the testing dependencies
retry pip install -q future hypothesis ${NUMPY_PACKAGE} ${PROTOBUF_PACKAGE} pytest setuptools six typing_extensions pyyaml
retry conda install -yq future hypothesis ${NUMPY_PACKAGE} ${PROTOBUF_PACKAGE} pytest setuptools six typing_extensions pyyaml
else
retry pip install -qr requirements.txt || true
retry pip install -q hypothesis protobuf pytest setuptools || true
@ -91,6 +91,7 @@ fi
echo "Testing with:"
pip freeze
conda list || true
##############################################################################
# Smoke tests

View File

@ -0,0 +1,71 @@
#!/bin/bash
SCRIPT_PARENT_DIR=$(dirname "${BASH_SOURCE[0]}")
# shellcheck source=.ci/pytorch/common.sh
source "$SCRIPT_PARENT_DIR/common.sh"
cd .ci/pytorch/perf_test
echo "Running CPU perf test for PyTorch..."
pip install -q awscli
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321
aws configure set default.s3.multipart_threshold 5GB
UPSTREAM_DEFAULT_BRANCH="$(git remote show https://github.com/pytorch/pytorch.git | awk '/HEAD branch/ {print $NF}')"
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# Get current default branch commit hash
DEFAULT_BRANCH_COMMIT_ID=$(git log --format="%H" -n 1)
export DEFAULT_BRANCH_COMMIT_ID
fi
# Find the default branch commit to test against
git remote add upstream https://github.com/pytorch/pytorch.git
git fetch upstream
IFS=$'\n'
while IFS='' read -r commit_id; do
if aws s3 ls s3://ossci-perf-test/pytorch/cpu_runtime/"${commit_id}".json; then
LATEST_TESTED_COMMIT=${commit_id}
break
fi
done < <(git rev-list upstream/"$UPSTREAM_DEFAULT_BRANCH")
aws s3 cp s3://ossci-perf-test/pytorch/cpu_runtime/"${LATEST_TESTED_COMMIT}".json cpu_runtime.json
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# Prepare new baseline file
cp cpu_runtime.json new_cpu_runtime.json
python update_commit_hash.py new_cpu_runtime.json "${DEFAULT_BRANCH_COMMIT_ID}"
fi
# Include tests
# shellcheck source=./perf_test/test_cpu_speed_mini_sequence_labeler.sh
. ./test_cpu_speed_mini_sequence_labeler.sh
# shellcheck source=./perf_test/test_cpu_speed_mnist.sh
. ./test_cpu_speed_mnist.sh
# shellcheck source=./perf_test/test_cpu_speed_torch.sh
. ./test_cpu_speed_torch.sh
# shellcheck source=./perf_test/test_cpu_speed_torch_tensor.sh
. ./test_cpu_speed_torch_tensor.sh
# Run tests
export TEST_MODE="compare_with_baseline"
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
export TEST_MODE="compare_and_update"
fi
# Operator tests
run_test test_cpu_speed_torch ${TEST_MODE}
run_test test_cpu_speed_torch_tensor ${TEST_MODE}
# Sample model tests
run_test test_cpu_speed_mini_sequence_labeler 20 ${TEST_MODE}
run_test test_cpu_speed_mnist 20 ${TEST_MODE}
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# This could cause race condition if we are testing the same default branch commit twice,
# but the chance of them executing this line at the same time is low.
aws s3 cp new_cpu_runtime.json s3://ossci-perf-test/pytorch/cpu_runtime/"${DEFAULT_BRANCH_COMMIT_ID}".json --acl public-read
fi

View File

@ -0,0 +1,76 @@
#!/bin/bash
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
pushd .ci/pytorch/perf_test
echo "Running GPU perf test for PyTorch..."
# Trying to uninstall PyYAML can cause problem. Workaround according to:
# https://github.com/pypa/pip/issues/5247#issuecomment-415571153
pip install -q awscli --ignore-installed PyYAML
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321
aws configure set default.s3.multipart_threshold 5GB
UPSTREAM_DEFAULT_BRANCH="$(git remote show https://github.com/pytorch/pytorch.git | awk '/HEAD branch/ {print $NF}')"
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# Get current default branch commit hash
DEFAULT_BRANCH_COMMIT_ID=$(git log --format="%H" -n 1)
export DEFAULT_BRANCH_COMMIT_ID
fi
# Find the default branch commit to test against
git remote add upstream https://github.com/pytorch/pytorch.git
git fetch upstream
IFS=$'\n'
while IFS='' read -r commit_id; do
if aws s3 ls s3://ossci-perf-test/pytorch/gpu_runtime/"${commit_id}".json; then
LATEST_TESTED_COMMIT=${commit_id}
break
fi
done < <(git rev-list upstream/"$UPSTREAM_DEFAULT_BRANCH")
aws s3 cp s3://ossci-perf-test/pytorch/gpu_runtime/"${LATEST_TESTED_COMMIT}".json gpu_runtime.json
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# Prepare new baseline file
cp gpu_runtime.json new_gpu_runtime.json
python update_commit_hash.py new_gpu_runtime.json "${DEFAULT_BRANCH_COMMIT_ID}"
fi
# Include tests
# shellcheck source=./perf_test/test_gpu_speed_mnist.sh
. ./test_gpu_speed_mnist.sh
# shellcheck source=./perf_test/test_gpu_speed_word_language_model.sh
. ./test_gpu_speed_word_language_model.sh
# shellcheck source=./perf_test/test_gpu_speed_cudnn_lstm.sh
. ./test_gpu_speed_cudnn_lstm.sh
# shellcheck source=./perf_test/test_gpu_speed_lstm.sh
. ./test_gpu_speed_lstm.sh
# shellcheck source=./perf_test/test_gpu_speed_mlstm.sh
. ./test_gpu_speed_mlstm.sh
# Run tests
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
run_test test_gpu_speed_mnist 20 compare_and_update
run_test test_gpu_speed_word_language_model 20 compare_and_update
run_test test_gpu_speed_cudnn_lstm 20 compare_and_update
run_test test_gpu_speed_lstm 20 compare_and_update
run_test test_gpu_speed_mlstm 20 compare_and_update
else
run_test test_gpu_speed_mnist 20 compare_with_baseline
run_test test_gpu_speed_word_language_model 20 compare_with_baseline
run_test test_gpu_speed_cudnn_lstm 20 compare_with_baseline
run_test test_gpu_speed_lstm 20 compare_with_baseline
run_test test_gpu_speed_mlstm 20 compare_with_baseline
fi
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# This could cause race condition if we are testing the same default branch commit twice,
# but the chance of them executing this line at the same time is low.
aws s3 cp new_gpu_runtime.json s3://ossci-perf-test/pytorch/gpu_runtime/"${DEFAULT_BRANCH_COMMIT_ID}".json --acl public-read
fi
popd

View File

@ -80,7 +80,7 @@ def grep_symbols(lib: str, patterns: list[Any]) -> list[str]:
return functools.reduce(list.__add__, (x.result() for x in tasks), [])
def check_lib_symbols_for_abi_correctness(lib: str) -> None:
def check_lib_symbols_for_abi_correctness(lib: str, pre_cxx11_abi: bool = True) -> None:
print(f"lib: {lib}")
cxx11_symbols = grep_symbols(lib, LIBTORCH_CXX11_PATTERNS)
pre_cxx11_symbols = grep_symbols(lib, LIBTORCH_PRE_CXX11_PATTERNS)
@ -88,12 +88,28 @@ def check_lib_symbols_for_abi_correctness(lib: str) -> None:
num_pre_cxx11_symbols = len(pre_cxx11_symbols)
print(f"num_cxx11_symbols: {num_cxx11_symbols}")
print(f"num_pre_cxx11_symbols: {num_pre_cxx11_symbols}")
if num_pre_cxx11_symbols > 0:
raise RuntimeError(
f"Found pre-cxx11 symbols, but there shouldn't be any, see: {pre_cxx11_symbols[:100]}"
if pre_cxx11_abi:
if num_cxx11_symbols > 0:
raise RuntimeError(
f"Found cxx11 symbols, but there shouldn't be any, see: {cxx11_symbols[:100]}"
)
if num_pre_cxx11_symbols < 1000:
raise RuntimeError("Didn't find enough pre-cxx11 symbols.")
# Check for no recursive iterators, regression test for https://github.com/pytorch/pytorch/issues/133437
rec_iter_symbols = grep_symbols(
lib, [re.compile("std::filesystem::recursive_directory_iterator.*")]
)
if num_cxx11_symbols < 100:
raise RuntimeError("Didn't find enought cxx11 symbols")
if len(rec_iter_symbols) > 0:
raise RuntimeError(
f"recursive_directory_iterator in used pre-CXX11 binaries, see; {rec_iter_symbols}"
)
else:
if num_pre_cxx11_symbols > 0:
raise RuntimeError(
f"Found pre-cxx11 symbols, but there shouldn't be any, see: {pre_cxx11_symbols[:100]}"
)
if num_cxx11_symbols < 100:
raise RuntimeError("Didn't find enought cxx11 symbols")
def main() -> None:
@ -105,8 +121,9 @@ def main() -> None:
else:
install_root = Path(distutils.sysconfig.get_python_lib()) / "torch"
libtorch_cpu_path = str(install_root / "lib" / "libtorch_cpu.so")
check_lib_symbols_for_abi_correctness(libtorch_cpu_path)
libtorch_cpu_path = install_root / "lib" / "libtorch_cpu.so"
pre_cxx11_abi = "cxx11-abi" not in os.getenv("DESIRED_DEVTOOLSET", "")
check_lib_symbols_for_abi_correctness(libtorch_cpu_path, pre_cxx11_abi)
if __name__ == "__main__":

View File

@ -1,74 +0,0 @@
import ctypes
import os
import sys
from pathlib import Path
def get_gomp_thread():
"""
Retrieves the maximum number of OpenMP threads after loading the `libgomp.so.1` library
and the `libtorch_cpu.so` library. It then queries the
maximum number of threads available for OpenMP parallel regions using the
`omp_get_max_threads` function.
Returns:
int: The maximum number of OpenMP threads available.
Notes:
- The function assumes the default path for `libgomp.so.1` on AlmaLinux OS.
- The path to `libtorch_cpu.so` is constructed based on the Python executable's
installation directory.
- This function is specific to environments where PyTorch and OpenMP are used
together and may require adjustments for other setups.
"""
python_path = Path(sys.executable).resolve()
python_prefix = (
python_path.parent.parent
) # Typically goes to the Python installation root
# Get the additional ABI flags (if any); it may be an empty string.
abiflags = getattr(sys, "abiflags", "")
# Construct the Python directory name correctly (e.g., "python3.13t").
python_version = (
f"python{sys.version_info.major}.{sys.version_info.minor}{abiflags}"
)
libtorch_cpu_path = (
python_prefix
/ "lib"
/ python_version
/ "site-packages"
/ "torch"
/ "lib"
/ "libtorch_cpu.so"
)
# use the default gomp path of AlmaLinux OS
libgomp_path = "/usr/lib64/libgomp.so.1"
os.environ["GOMP_CPU_AFFINITY"] = "0-3"
libgomp = ctypes.CDLL(libgomp_path)
libgomp = ctypes.CDLL(libtorch_cpu_path)
libgomp.omp_get_max_threads.restype = ctypes.c_int
libgomp.omp_get_max_threads.argtypes = []
omp_max_threads = libgomp.omp_get_max_threads()
return omp_max_threads
def main():
omp_max_threads = get_gomp_thread()
print(
f"omp_max_threads after loading libgomp.so and libtorch_cpu.so: {omp_max_threads}"
)
if omp_max_threads == 1:
raise RuntimeError(
"omp_max_threads is 1. Check whether libgomp.so is loaded twice."
)
if __name__ == "__main__":
main()

View File

@ -7,7 +7,6 @@ import subprocess
import sys
from pathlib import Path
from tempfile import NamedTemporaryFile
from typing import Optional
import torch
import torch._dynamo
@ -196,41 +195,8 @@ def test_cuda_gds_errors_captured() -> None:
)
def find_pypi_package_version(package: str) -> Optional[str]:
from importlib import metadata
dists = metadata.distributions()
for dist in dists:
if dist.metadata["Name"].startswith(package):
return dist.version
return None
def cudnn_to_version_str(cudnn_version: int) -> str:
patch = int(cudnn_version % 10)
minor = int((cudnn_version / 100) % 100)
major = int((cudnn_version / 10000) % 10000)
return f"{major}.{minor}.{patch}"
def compare_pypi_to_torch_versions(
package: str, pypi_version: str, torch_version: str
) -> None:
if pypi_version is None:
raise RuntimeError(f"Can't find {package} in PyPI for Torch: {torch_version}")
if pypi_version.startswith(torch_version):
print(f"Found matching {package}. Torch: {torch_version} PyPI {pypi_version}")
else:
raise RuntimeError(
f"Wrong {package} version. Torch: {torch_version} PyPI: {pypi_version}"
)
def smoke_test_cuda(
package: str,
runtime_error_check: str,
torch_compile_check: str,
pypi_pkg_check: str,
package: str, runtime_error_check: str, torch_compile_check: str
) -> None:
if not torch.cuda.is_available() and is_cuda_system:
raise RuntimeError(f"Expected CUDA {gpu_arch_ver}. However CUDA is not loaded.")
@ -260,30 +226,20 @@ def smoke_test_cuda(
raise RuntimeError(
f"Wrong CUDA version. Loaded: {torch.version.cuda} Expected: {gpu_arch_ver}"
)
print(f"torch cuda: {torch.version.cuda}")
# todo add cudnn version validation
print(f"torch cudnn: {torch.backends.cudnn.version()}")
print(f"cuDNN enabled? {torch.backends.cudnn.enabled}")
torch.cuda.init()
print("CUDA initialized successfully")
print(f"Number of CUDA devices: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
print(f"Device {i}: {torch.cuda.get_device_name(i)}")
print(f"cuDNN enabled? {torch.backends.cudnn.enabled}")
torch_cudnn_version = cudnn_to_version_str(torch.backends.cudnn.version())
print(f"Torch cuDNN version: {torch_cudnn_version}")
# nccl is availbale only on Linux
if sys.platform in ["linux", "linux2"]:
torch_nccl_version = ".".join(str(v) for v in torch.cuda.nccl.version())
print(f"Torch nccl; version: {torch_nccl_version}")
# Pypi dependencies are installed on linux ony and nccl is availbale only on Linux.
if pypi_pkg_check == "enabled" and sys.platform in ["linux", "linux2"]:
compare_pypi_to_torch_versions(
"cudnn", find_pypi_package_version("nvidia-cudnn"), torch_cudnn_version
)
compare_pypi_to_torch_versions(
"nccl", find_pypi_package_version("nvidia-nccl"), torch_nccl_version
)
print(f"torch nccl version: {torch.cuda.nccl.version()}")
if runtime_error_check == "enabled":
test_cuda_runtime_errors_captured()
@ -442,13 +398,6 @@ def parse_args():
choices=["enabled", "disabled"],
default="enabled",
)
parser.add_argument(
"--pypi-pkg-check",
help="Check pypi package versions cudnn and nccl",
type=str,
choices=["enabled", "disabled"],
default="enabled",
)
return parser.parse_args()
@ -473,10 +422,7 @@ def main() -> None:
smoke_test_modules()
smoke_test_cuda(
options.package,
options.runtime_error_check,
options.torch_compile_check,
options.pypi_pkg_check,
options.package, options.runtime_error_check, options.torch_compile_check
)

View File

@ -191,10 +191,6 @@ if [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
# shellcheck disable=SC1091
source /opt/intel/oneapi/umf/latest/env/vars.sh
fi
# shellcheck disable=SC1091
source /opt/intel/oneapi/ccl/latest/env/vars.sh
# shellcheck disable=SC1091
source /opt/intel/oneapi/mpi/latest/env/vars.sh
# Check XPU status before testing
xpu-smi discovery
fi
@ -318,12 +314,6 @@ test_python() {
assert_git_not_dirty
}
test_python_smoke() {
# Smoke tests for H100
time python test/run_test.py --include test_matmul_cuda inductor/test_fp8 inductor/test_max_autotune $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
assert_git_not_dirty
}
test_lazy_tensor_meta_reference_disabled() {
export TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE=1
echo "Testing lazy tensor operations without meta reference"
@ -408,15 +398,8 @@ test_inductor_aoti() {
# We need to hipify before building again
python3 tools/amd_build/build_amd.py
fi
if [[ "$BUILD_ENVIRONMENT" == *sm86* ]]; then
BUILD_AOT_INDUCTOR_TEST=1 TORCH_CUDA_ARCH_LIST=8.6 USE_FLASH_ATTENTION=OFF python setup.py develop
# TODO: Replace me completely, as one should not use conda libstdc++, nor need special path to TORCH_LIB
LD_LIBRARY_PATH=/opt/conda/envs/py_3.10/lib/:${TORCH_LIB_DIR}:$LD_LIBRARY_PATH
CPP_TESTS_DIR="${BUILD_BIN_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_aoti_abi_check cpp/test_aoti_inference -dist=loadfile
else
BUILD_AOT_INDUCTOR_TEST=1 python setup.py develop
CPP_TESTS_DIR="${BUILD_BIN_DIR}" LD_LIBRARY_PATH="${TORCH_LIB_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_aoti_abi_check cpp/test_aoti_inference -dist=loadfile
fi
BUILD_AOT_INDUCTOR_TEST=1 python setup.py develop
CPP_TESTS_DIR="${BUILD_BIN_DIR}" LD_LIBRARY_PATH="${TORCH_LIB_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_aoti_abi_check cpp/test_aoti_inference
}
test_inductor_cpp_wrapper_shard() {
@ -431,11 +414,10 @@ test_inductor_cpp_wrapper_shard() {
if [[ "$1" -eq "2" ]]; then
# For now, manually put the opinfo tests in shard 2, and all other tests in
# shard 1. Run all CPU tests, as well as specific GPU tests triggering past
# bugs, for now.
# shard 1. Test specific things triggering past bugs, for now.
python test/run_test.py \
--include inductor/test_torchinductor_opinfo \
-k 'linalg or to_sparse or TestInductorOpInfoCPU' \
-k 'linalg or to_sparse' \
--verbose
exit
fi
@ -1193,6 +1175,7 @@ build_xla() {
# These functions are defined in .circleci/common.sh in pytorch/xla repo
retry install_pre_deps_pytorch_xla $XLA_DIR $USE_CACHE
CMAKE_PREFIX_PATH="${SITE_PACKAGES}/torch:${CMAKE_PREFIX_PATH}" XLA_SANDBOX_BUILD=1 build_torch_xla $XLA_DIR
retry install_post_deps_pytorch_xla
assert_git_not_dirty
}
@ -1492,8 +1475,11 @@ test_executorch() {
pushd /executorch
export PYTHON_EXECUTABLE=python
export CMAKE_ARGS="-DEXECUTORCH_BUILD_PYBIND=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
export EXECUTORCH_BUILD_PYBIND=ON
export CMAKE_ARGS="-DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
# For llama3
bash examples/models/llama3_2_vision/install_requirements.sh
# NB: We need to rebuild ExecuTorch runner here because it depends on PyTorch
# from the PR
bash .ci/scripts/setup-linux.sh --build-tool cmake
@ -1536,33 +1522,12 @@ test_linux_aarch64() {
inductor/test_inplacing_pass inductor/test_kernel_benchmark inductor/test_layout_optim \
inductor/test_max_autotune inductor/test_memory_planning inductor/test_metrics inductor/test_multi_kernel inductor/test_pad_mm \
inductor/test_pattern_matcher inductor/test_perf inductor/test_profiler inductor/test_select_algorithm inductor/test_smoke \
inductor/test_split_cat_fx_passes inductor/test_compile inductor/test_torchinductor \
inductor/test_split_cat_fx_passes inductor/test_standalone_compile inductor/test_torchinductor \
inductor/test_torchinductor_codegen_dynamic_shapes inductor/test_torchinductor_dynamic_shapes inductor/test_memory \
inductor/test_triton_cpu_backend inductor/test_triton_extension_backend inductor/test_mkldnn_pattern_matcher inductor/test_cpu_cpp_wrapper \
--shard "$SHARD_NUMBER" "$NUM_TEST_SHARDS" --verbose
}
test_operator_benchmark() {
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
TEST_DIR=$(pwd)
test_inductor_set_cpu_affinity
cd benchmarks/operator_benchmark/pt_extension
python setup.py install
cd "${TEST_DIR}"/benchmarks/operator_benchmark
$TASKSET python -m benchmark_all_test --device "$1" --tag-filter "$2" \
--output-dir "${TEST_REPORTS_DIR}/operator_benchmark_eager_float32_cpu.csv"
pip_install pandas
python check_perf_csv.py \
--actual "${TEST_REPORTS_DIR}/operator_benchmark_eager_float32_cpu.csv" \
--expected "expected_ci_operator_benchmark_eager_float32_cpu.csv"
}
if ! [[ "${BUILD_ENVIRONMENT}" == *libtorch* || "${BUILD_ENVIRONMENT}" == *-bazel-* ]]; then
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
@ -1593,19 +1558,6 @@ elif [[ "$TEST_CONFIG" == distributed ]]; then
if [[ "${SHARD_NUMBER}" == 1 ]]; then
test_rpc
fi
elif [[ "${TEST_CONFIG}" == *operator_benchmark* ]]; then
TEST_MODE="short"
if [[ "${TEST_CONFIG}" == *cpu* ]]; then
if [[ "${TEST_CONFIG}" == *long* ]]; then
TEST_MODE="long"
elif [[ "${TEST_CONFIG}" == *all* ]]; then
TEST_MODE="all"
fi
test_operator_benchmark cpu ${TEST_MODE}
fi
elif [[ "${TEST_CONFIG}" == *inductor_distributed* ]]; then
test_inductor_distributed
elif [[ "${TEST_CONFIG}" == *inductor-halide* ]]; then
@ -1668,7 +1620,6 @@ elif [[ "${TEST_CONFIG}" == *inductor_cpp_wrapper* ]]; then
install_torchvision
checkout_install_torchbench hf_T5 llama moco
PYTHONPATH=$(pwd)/torchbench test_inductor_cpp_wrapper_shard "$SHARD_NUMBER"
test_inductor_aoti
elif [[ "${TEST_CONFIG}" == *inductor* ]]; then
install_torchvision
test_inductor_shard "${SHARD_NUMBER}"
@ -1722,8 +1673,6 @@ elif [[ "${BUILD_ENVIRONMENT}" == *xpu* ]]; then
test_python
test_aten
test_xpu_bin
elif [[ "${TEST_CONFIG}" == smoke ]]; then
test_python_smoke
else
install_torchvision
install_monkeytype

View File

@ -37,11 +37,6 @@ call %INSTALLER_DIR%\activate_miniconda3.bat
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
:: Update CMake
call choco upgrade -y cmake --no-progress --installargs 'ADD_CMAKE_TO_PATH=System' --apply-install-arguments-to-dependencies --version=3.27.9
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
call pip install mkl-include==2021.4.0 mkl-devel==2021.4.0
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
@ -93,7 +88,7 @@ set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
:cuda_build_end
set DISTUTILS_USE_SDK=1
set PATH=%TMP_DIR_WIN%\bin;C:\Program Files\CMake\bin;%PATH%
set PATH=%TMP_DIR_WIN%\bin;%PATH%
:: The latest Windows CUDA test is running on AWS G5 runner with A10G GPU
if "%TORCH_CUDA_ARCH_LIST%" == "" set TORCH_CUDA_ARCH_LIST=8.6

View File

@ -24,7 +24,7 @@ if "%CUDA_SUFFIX%" == "" (
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z & REM @lint-ignore
curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z
) else (
aws s3 cp s3://ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --quiet
)

View File

@ -1,7 +1,7 @@
@echo off
:: This script parses args, installs required libraries (MKL, Magma, libuv)
:: and then delegates to cpu.bat, cuda80.bat, etc.
:: This script parses args, installs required libraries (miniconda, MKL,
:: Magma), and then delegates to cpu.bat, cuda80.bat, etc.
if not "%CUDA_VERSION%" == "" if not "%PYTORCH_BUILD_VERSION%" == "" if not "%PYTORCH_BUILD_NUMBER%" == "" goto env_end
if "%~1"=="" goto arg_error
@ -36,18 +36,28 @@ set DESIRED_PYTHON_PREFIX=py%DESIRED_PYTHON_PREFIX:;=;py%
set SRC_DIR=%~dp0
pushd %SRC_DIR%
:: Install Miniconda3
set "CONDA_HOME=%CD%\conda"
set "tmp_conda=%CONDA_HOME%"
set "miniconda_exe=%CD%\miniconda.exe"
rmdir /s /q conda
del miniconda.exe
curl --retry 3 -k https://repo.anaconda.com/miniconda/Miniconda3-py311_23.9.0-0-Windows-x86_64.exe -o "%miniconda_exe%"
start /wait "" "%miniconda_exe%" /S /InstallationType=JustMe /RegisterPython=0 /AddToPath=0 /D=%tmp_conda%
if ERRORLEVEL 1 exit /b 1
set "ORIG_PATH=%PATH%"
set "PATH=%CONDA_HOME%;%CONDA_HOME%\scripts;%CONDA_HOME%\Library\bin;%PATH%"
:: setup build environment
:: create a new conda environment and install packages
:try
SET /A tries=3
:loop
IF %tries% LEQ 0 GOTO :exception
call setup_build.bat
call condaenv.bat
IF %ERRORLEVEL% EQU 0 GOTO :done
SET /A "tries=%tries%-1"
:exception
echo "Failed to setup build environment"
echo "Failed to create conda env"
exit /B 1
:done
@ -63,7 +73,7 @@ if "%DEBUG%" == "1" (
if not "%CUDA_VERSION%" == "cpu" if not "%CUDA_VERSION%" == "xpu" (
rmdir /s /q magma_%CUDA_PREFIX%_%BUILD_TYPE%
del magma_%CUDA_PREFIX%_%BUILD_TYPE%.7z
curl -k https://s3.amazonaws.com/ossci-windows/magma_%MAGMA_VERSION%_%CUDA_PREFIX%_%BUILD_TYPE%.7z -o magma_%CUDA_PREFIX%_%BUILD_TYPE%.7z %= @lint-ignore =%
curl -k https://s3.amazonaws.com/ossci-windows/magma_%MAGMA_VERSION%_%CUDA_PREFIX%_%BUILD_TYPE%.7z -o magma_%CUDA_PREFIX%_%BUILD_TYPE%.7z
7z x -aoa magma_%CUDA_PREFIX%_%BUILD_TYPE%.7z -omagma_%CUDA_PREFIX%_%BUILD_TYPE%
)
@ -97,20 +107,19 @@ set TH_BINARY_BUILD=1
set INSTALL_TEST=0
for %%v in (%DESIRED_PYTHON_PREFIX%) do (
:: Set Environment vars for the build
set "CMAKE_PREFIX_PATH=%CD%\Python\Library\;%PATH%"
set "PYTHON_LIB_PATH=%CD%\Python\Library\bin"
:: Activate Python Environment
set PYTHON_PREFIX=%%v
set "CONDA_LIB_PATH=%CONDA_HOME%\envs\%%v\Library\bin"
if not "%ADDITIONAL_PATH%" == "" (
set "PATH=%ADDITIONAL_PATH%;%PATH%"
set "PATH=%ADDITIONAL_PATH%;%CONDA_HOME%\envs\%%v;%CONDA_HOME%\envs\%%v\scripts;%CONDA_HOME%\envs\%%v\Library\bin;%ORIG_PATH%"
) else (
set "PATH=%CONDA_HOME%\envs\%%v;%CONDA_HOME%\envs\%%v\scripts;%CONDA_HOME%\envs\%%v\Library\bin;%ORIG_PATH%"
)
pip install ninja
@setlocal
:: Set Flags
if not "%CUDA_VERSION%"=="cpu" if not "%CUDA_VERSION%" == "xpu" (
set "MAGMA_HOME=%cd%\magma_%CUDA_PREFIX%_%BUILD_TYPE%"
set MAGMA_HOME=%cd%\magma_%CUDA_PREFIX%_%BUILD_TYPE%
)
echo "Calling arch build script"
call %CUDA_PREFIX%.bat

View File

@ -0,0 +1,27 @@
IF "%DESIRED_PYTHON%"=="" (
echo DESIRED_PYTHON is NOT defined.
exit /b 1
)
:: Create a new conda environment
setlocal EnableDelayedExpansion
FOR %%v IN (%DESIRED_PYTHON%) DO (
set PYTHON_VERSION_STR=%%v
set PYTHON_VERSION_STR=!PYTHON_VERSION_STR:.=!
conda remove -n py!PYTHON_VERSION_STR! --all -y || rmdir %CONDA_HOME%\envs\py!PYTHON_VERSION_STR! /s
if "%%v" == "3.9" call conda create -n py!PYTHON_VERSION_STR! -y numpy=2.0.1 boto3 cmake ninja typing_extensions setuptools=72.1.0 python=%%v
if "%%v" == "3.10" call conda create -n py!PYTHON_VERSION_STR! -y -c=conda-forge numpy=2.0.1 boto3 cmake ninja typing_extensions setuptools=72.1.0 python=%%v
if "%%v" == "3.11" call conda create -n py!PYTHON_VERSION_STR! -y -c=conda-forge numpy=2.0.1 boto3 cmake ninja typing_extensions setuptools=72.1.0 python=%%v
if "%%v" == "3.12" call conda create -n py!PYTHON_VERSION_STR! -y -c=conda-forge numpy=2.0.1 boto3 cmake ninja typing_extensions setuptools=72.1.0 python=%%v
if "%%v" == "3.13" call conda create -n py!PYTHON_VERSION_STR! -y -c=conda-forge numpy=2.1.2 boto3 cmake ninja typing_extensions setuptools=72.1.0 python=%%v
if "%%v" == "3.13t" call conda create -n py!PYTHON_VERSION_STR! -y -c=conda-forge numpy=2.1.2 boto3 cmake ninja typing_extensions setuptools=72.1.0 python-freethreading python=3.13
call conda run -n py!PYTHON_VERSION_STR! pip install pyyaml
call conda run -n py!PYTHON_VERSION_STR! pip install mkl-include
call conda run -n py!PYTHON_VERSION_STR! pip install mkl-static
)
endlocal
:: Install libuv
conda install -y -q -c conda-forge libuv=1.39
set libuv_ROOT=%CONDA_HOME%\Library
echo libuv_ROOT=%libuv_ROOT%

View File

@ -37,7 +37,7 @@ IF "%CUDA_PATH_V124%"=="" (
)
IF "%BUILD_VISION%" == "" (
set TORCH_CUDA_ARCH_LIST=6.1;7.0;7.5;8.0;8.6;9.0
set TORCH_CUDA_ARCH_LIST=5.0;6.0;6.1;7.0;7.5;8.0;8.6;9.0
set TORCH_NVCC_FLAGS=-Xfatbin -compress-all
) ELSE (
set NVCC_FLAGS=-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_90,code=compute_90

View File

@ -37,7 +37,7 @@ IF "%CUDA_PATH_V126%"=="" (
)
IF "%BUILD_VISION%" == "" (
set TORCH_CUDA_ARCH_LIST=6.1;7.0;7.5;8.0;8.6;9.0
set TORCH_CUDA_ARCH_LIST=5.0;6.0;6.1;7.0;7.5;8.0;8.6;9.0
set TORCH_NVCC_FLAGS=-Xfatbin -compress-all
) ELSE (
set NVCC_FLAGS=-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_90,code=compute_90

View File

@ -37,7 +37,7 @@ IF "%CUDA_PATH_V128%"=="" (
)
IF "%BUILD_VISION%" == "" (
set TORCH_CUDA_ARCH_LIST=6.1;7.0;7.5;8.0;8.6;9.0;10.0;12.0
set TORCH_CUDA_ARCH_LIST=5.0;6.0;6.1;7.0;7.5;8.0;8.6;9.0;10.0;12.0
set TORCH_NVCC_FLAGS=-Xfatbin -compress-all
) ELSE (
set NVCC_FLAGS=-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_100,code=compute_100 -gencode=arch=compute_120,code=compute_120

View File

@ -1,6 +1,6 @@
@echo off
curl -k -L "https://sourceforge.net/projects/sevenzip/files/7-Zip/18.05/7z1805-x64.exe/download" -o 7z1805-x64.exe
curl -k https://www.7-zip.org/a/7z1805-x64.exe -O
if errorlevel 1 exit /b 1
start /wait 7z1805-x64.exe /S

Some files were not shown because too many files have changed in this diff Show More