Compare commits


28 Commits

Author SHA1 Message Date
b04d8358d9 ci/docker: use NCCL 2.26.2-1 (#149874)
ci/docker: use NCCL 2.26.2-1 (#149778)

Related to #149153

This updates some build scripts to hopefully fix the nightly builds, which are somehow building against NCCL 2.25.1 while using 2.26.2 from pip.

Test plan:

After merging, rerun the nightly Linux jobs and validate that the NCCL version matches.
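
For the validation step, a minimal sketch of how the NCCL version a build links against can be checked from Python (the expected 2.26.x value is an assumption based on this change, and applies only to CUDA builds):

```python
import torch

# torch.cuda.nccl.version() reports the NCCL version the CUDA build was compiled against.
nccl_version = torch.cuda.nccl.version()
print("NCCL:", nccl_version)

# For this change we expect 2.26.x rather than 2.25.x (assumed check).
assert nccl_version[:2] == (2, 26), f"unexpected NCCL version: {nccl_version}"
```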
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149778
Approved by: https://github.com/Skylion007, https://github.com/atalman

Co-authored-by: Andrey Talman <atalman@fb.com>
(cherry picked from commit ddc0fe903f3043246103d71b60a4fff0aeeef9e8)

Co-authored-by: Tristan Rice <rice@fn.lc>
2025-03-26 16:36:35 -04:00
d80afc07f0 [cherry-pick] Modify cuda aarch64 install for cudnn and nccl. Cleanup aarch64 cuda 12.6 docker #149540 (#149624)
Modify cuda aarch64 install for cudnn and nccl. Cleanup aarch64 cuda 12.6 docker (#149540)

1. Use NCCL_VERSION=v2.26.2-1. Fixes the NCCL CUDA aarch64-related failure we see here: https://github.com/pytorch/pytorch/actions/runs/13955856471/job/39066681549?pr=149443 . After landing: https://github.com/pytorch/pytorch/pull/149351
TODO: Follow-up required to unify the NCCL definitions across the x86 and aarch64 builds.

3. Cleanup: remove older CUDA versions for aarch64 builds. CUDA 12.6 was removed by: https://github.com/pytorch/pytorch/pull/148895
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149540
Approved by: https://github.com/seemethere, https://github.com/malfet, https://github.com/nWEIdia
2025-03-26 13:33:54 -07:00
84210a82ef [cherry-pick] nccl: upgrade to 2.26.2 to avoid hang on ncclCommAbort (#149351) (#149546)
* nccl: upgrade to 2.26.2 to avoid hang on ncclCommAbort (#149351)

Fixes #149153

Yaml generated from:

```
python .github/scripts/generate_ci_workflows.py
```

Test plan:

Repro in https://gist.github.com/d4l3k/16a19b475952bc40ddd7f2febcc297b7

```
rm -rf third_party/nccl
python setup.py develop
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149351
Approved by: https://github.com/kwen2501, https://github.com/atalman, https://github.com/malfet

* fixed_regenerations

---------

Co-authored-by: Tristan Rice <rice@fn.lc>
2025-03-26 13:32:51 -07:00
4268b2f40a [MPSInductor] Move threadfence at the right location (#150037)
[MPSInductor] Move threadfence at the right location (#149437)

Not sure how it worked in the past, but the fence should come before the first read from shared memory, not after it.
This bug was exposed by https://github.com/pytorch/pytorch/pull/148969 which removed unnecessary barrier before calling `threadgroup_reduce` functions
Test plan:
```
% python3 generate.py --checkpoint_path checkpoints/stories15M/model.pth --prompt "Once upon a time" --device mps --compile
```
Before this change it produced gibberish; now it works fine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149437
Approved by: https://github.com/manuelcandales, https://github.com/dcci

(cherry picked from commit 61a64c20c402e61027dad4a9e7a192ec0971d1d6)

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-03-26 11:17:09 -07:00
12a6d2a0b8 Add triton as dependency to CUDA aarch64 build (#149945)
Add triton as dependency to CUDA aarch64 build (#149584)

The aarch64 Triton build was added by: https://github.com/pytorch/pytorch/pull/148705
Hence add the proper constraint to the CUDA 12.8 aarch64 build.

Please note we still want to use:
```platform_system == 'Linux' and platform_machine == 'x86_64'```
for all other builds, since these are prototype binaries used only by the CUDA 12.8 Linux aarch64 build, which we would like to serve from download.pytorch.org.
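
For illustration, a small sketch (not from the PR) of how such environment markers evaluate on the two platforms; the first marker is the one quoted above, the aarch64 one is an assumed counterpart:

```python
from packaging.markers import Marker

# The existing triton requirement is gated to x86_64 Linux; an aarch64 CUDA build
# would need its own marker. The aarch64 string below is illustrative only.
x86_marker = Marker("platform_system == 'Linux' and platform_machine == 'x86_64'")
aarch64_marker = Marker("platform_system == 'Linux' and platform_machine == 'aarch64'")

for machine in ("x86_64", "aarch64"):
    env = {"platform_system": "Linux", "platform_machine": machine}
    print(machine,
          "x86 marker:", x86_marker.evaluate(env),
          "aarch64 marker:", aarch64_marker.evaluate(env))
```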

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149584
Approved by: https://github.com/nWEIdia, https://github.com/tinglvv, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit 9b1127437e6ccf0c55a87607d9f551cc6424ca67)

Co-authored-by: Andrey Talman <atalman@fb.com>
2025-03-26 07:28:27 -07:00
464432ec47 Automate stable CUDA update and linter using min Python version (#149981)
Automate stable CUDA update and linter using min Python version (#148912)

1. Fixes: https://github.com/pytorch/pytorch/issues/145571 . CUDA stable is the same CUDA version that is published to PyPI; it is also used to set the Metadata section in the rest of the whl scripts and to tag the Docker releases with the latest tag.
2. Updates the min Python version used in the linter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148912
Approved by: https://github.com/Skylion007, https://github.com/malfet

(cherry picked from commit 29fd875bc125582f29abbdf5559d3941899680be)

Co-authored-by: atalman <atalman@fb.com>
2025-03-26 07:03:35 -07:00
1f612dafb5 Removing doc references to PRE_CXX11_ABI. (#149952)
Removing doc references to PRE_CXX11_ABI. (#149756)

Fixes #149550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149756
Approved by: https://github.com/svekars, https://github.com/atalman

(cherry picked from commit 43ee67e8dc6827fbb7d12a5950ddf6a5c80771dc)

Co-authored-by: Alanna Burke <burkealanna@meta.com>
2025-03-25 14:04:12 -07:00
f63def6ac7 [XPU] Update triton commit to fix level_zero not found by env var LEVEL_ZERO_V1_SDK_PATH. (#149714)
[XPU] Update triton commit to fix level_zero not found by env var LEVEL_ZERO_V1_SDK_PATH. (#149511)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149511
Approved by: https://github.com/EikanWang

(cherry picked from commit ee6a0291653cd507d59bda6bcf5d848099a804d1)

Co-authored-by: xinan.lin <xinan.lin@intel.com>
2025-03-25 16:04:51 -04:00
3a8e623a9b op should NOT be static in aoti_torch_call_dispatcher (#149644)
op should NOT be static in aoti_torch_call_dispatcher (#149208)

aoti_torch_call_dispatcher is meant to call different ops, so the op must not be static. Otherwise, every call to this API will call the first op that was ever called, which is not the intended behavior of any human being.
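
A rough Python analogy of the bug (the C++ details differ; the op table and `lookup_op` helper here are hypothetical stand-ins, not a real API):

```python
import math

OPS = {"aten::sin": math.sin, "aten::cos": math.cos}  # toy op table standing in for the dispatcher

def lookup_op(op_name):
    return OPS[op_name]

_cached_op = None  # plays the role of the C++ function-local `static`

def call_dispatcher_buggy(op_name, *args):
    global _cached_op
    if _cached_op is None:
        _cached_op = lookup_op(op_name)   # only the first op_name ever wins
    return _cached_op(*args)              # later calls silently reuse that first op

def call_dispatcher_fixed(op_name, *args):
    return lookup_op(op_name)(*args)      # resolve the op on every call, as intended

print(call_dispatcher_buggy("aten::sin", 0.0), call_dispatcher_buggy("aten::cos", 0.0))  # 0.0 0.0 (wrong)
print(call_dispatcher_fixed("aten::sin", 0.0), call_dispatcher_fixed("aten::cos", 0.0))  # 0.0 1.0
```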

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149208
Approved by: https://github.com/albanD, https://github.com/zou3519, https://github.com/malfet

(cherry picked from commit 740ce0fa5f8c7e9e51422b614f8187ab93a60b8b)

Co-authored-by: Jane Xu <janeyx@meta.com>
2025-03-25 16:03:19 -04:00
bf727425a0 Symintify transpose_ (#149632)
Symintify transpose_ (#149057)

Fixes https://github.com/pytorch/pytorch/issues/148702
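
A minimal sketch of the kind of dynamic-shape usage this touches (an assumed repro style; see the linked issue for the actual failure):

```python
import torch

def f(x):
    y = x.clone()
    y.transpose_(0, 1)  # in-place transpose; sizes become symbolic under dynamic shapes
    return y + 1

compiled = torch.compile(f, dynamic=True)
print(compiled(torch.randn(2, 3)).shape)
print(compiled(torch.randn(4, 5)).shape)  # second shape exercises the symbolic-int path
```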
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149057
Approved by: https://github.com/yushangdi

(cherry picked from commit 8d7c430e84f4ad439ebdc81f9ab496a3665033a4)

Co-authored-by: angelayi <yiangela7@gmail.com>
2025-03-25 15:59:42 -04:00
8c7dbc939f [cherry-pick] [CI] Don't clean workspace when fetching repo (#147994) (#149129)
Revert "[CI] Don't clean workspace when fetching repo (#147994)"

This reverts commit e5fef8a08ebb8548e8413ae54ef0ad9a11f1f4c0.
2025-03-25 15:25:19 -04:00
644fdbad95 [Intel GPU][PT2E] bugfix: use zero-point to decide conv src zp mask (#149631)
[Intel GPU][PT2E] bugfix: use zero-point to decide conv src zp mask (#149473)

# Motivation
This PR fixes a bug in how the conv src zero-point mask is decided. Specifically, the code deemed the zero-point to be always non-zero because the scale, rather than the zero-point, was used for the judgement. Fortunately, the bug only affects performance; accuracy is not affected.
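
In essence (a hypothetical sketch, not the actual oneDNN integration code): the decision must look at the zero-point values themselves, whereas the buggy code looked at the scale and so always concluded that compensation was needed.

```python
import torch

def src_needs_zero_point_compensation(zero_point: torch.Tensor) -> bool:
    # Correct: derive the mask from the zero-point, not from the scale.
    return bool(torch.any(zero_point != 0))

print(src_needs_zero_point_compensation(torch.zeros(1, dtype=torch.int32)))       # False
print(src_needs_zero_point_compensation(torch.tensor([128], dtype=torch.int32)))  # True
```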

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149473
Approved by: https://github.com/EikanWang, https://github.com/guangyey

(cherry picked from commit d67c1a027e61bd68908bc4c8e5275a983521366c)

Co-authored-by: ZhiweiYan-96 <zhiwei.yan@intel.com>
2025-03-25 10:07:25 -05:00
fb027c5692 [cherry-pick] Update ExecuTorch pin update (#149539) (#149630)
Update ExecuTorch pin update (#149539)

Latest commit in https://hud.pytorch.org/hud/pytorch/executorch/viable%2Fstrict/1?per_page=50

Follow-up to https://github.com/pytorch/pytorch/issues/144480#issuecomment-2731150636

Also, we need to incorporate the change from https://github.com/pytorch/executorch/pull/8817.

Test Plan:

Monitor the linux-jammy-py3-clang12-executorch test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149539
Approved by: https://github.com/larryliu0820

(cherry picked from commit bc86b6c55a4f7e07548a92fe7c9b52ad2c88af35)
2025-03-25 10:03:16 -05:00
3b87bd8b82 Fix atomic operation compatibility for ARMv8-A (Raspberry Pi 4) by adjusting compilation flags (#149878)
Fix atomic operation compatibility for ARMv8-A (Raspberry Pi 4) by adjusting compilation flags (#148070)

**Issue:**
* The ldaddal instruction is an AArch64 atomic operation available from ARMv8.1-A onwards.
* Raspberry Pi 4 (Cortex-A72) is ARMv8-A, which does not support ldaddal, leading to failures when running PyTorch built with -march=armv8.2-a+sve.
* This led to an issue when running PyTorch on ARMv8-A (Raspberry Pi 4), as unsupported atomic operations were generated.

**Fix:**
* Updated the build flags to explicitly use **-march=armv8-a+sve**, ensuring GCC and Clang promote it correctly; this resolves the compatibility issue on ARMv8-A while still working correctly for SVE as before.
* This ensures that PyTorch builds correctly for ARMv8-A platforms (e.g., Raspberry Pi 4) while still enabling SVE for supported hardware.

Test plan:
 - Allocate `a1.4xlarge` on AWS
 - Run following script using wheel produced by this PR
 ```python
import torch
def f(x):
    return x.sin() + x.cos()

print(torch.__version__)
f_c = torch.jit.script(f)
```
- Observe no crash
```
$ python3 foo.py
2.7.0.dev20250313+cpu
```
- Observe crash with 2.6.0
```
$ python3 foo.py
2.6.0+cpu
Illegal instruction (core dumped)
```

Fixes #146792

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148070
Approved by: https://github.com/malfet

(cherry picked from commit 09f7f62cfebb0067b93d227c13fe9a94b51af762)

Co-authored-by: maajidkhann <maajidkhan.n@fujitsu.com>
2025-03-24 13:57:16 -07:00
89b098a677 Add release branch push triggers to inductor-rocm-mi300.yml (#149871)
Add release branch push triggers to inductor-rocm-mi300.yml (#149672)

In similar vein as https://github.com/pytorch/pytorch/pull/149517

When we added rocm-mi300.yml earlier this year, we had lower capacity and were just pipecleaning the workflow, so we set the trigger to respond only to pushes to the main branch. But now we have more stability as well as capacity, and we would really like to ensure that the release branch is tested on MI300s as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149672
Approved by: https://github.com/jeffdaily

(cherry picked from commit 1eab841185cb2d68a11e2e0604fd96d110778960)

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
2025-03-24 13:24:10 -07:00
4cc4302b32 Do not depend on numpy during the import (#149731)
Do not depend on numpy during the import (#149683)

A good follow-up would be to use torch primitives instead of numpy here.
Fixes https://github.com/pytorch/pytorch/issues/149681

Test plan: Monkey-patch 2.7.0-rc and run `python -c "import torch;print(torch.compile(lambda x:x.sin() + x.cos())(torch.rand(32)))"`
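
The general pattern (a sketch of the idea, not the exact code in the PR; `_to_numpy` is a hypothetical helper) is to defer the numpy import to the call site that actually needs it:

```python
import torch

def _to_numpy(tensor):
    import numpy as np  # deferred import: numpy is only needed when the helper actually runs
    return np.ascontiguousarray(tensor.detach().cpu().numpy())

# `import torch` above succeeds even without numpy installed; numpy is only required here:
print(_to_numpy(torch.arange(4)))
```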

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149683
Approved by: https://github.com/seemethere

(cherry picked from commit 68dfd44e50f59c53698a24985039a27351862963)

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-03-21 11:13:52 -07:00
c632e4fdb8 [ONNX] Expose verification utilities (#149375)
* [ONNX] Expose verification utilities (#148603)

Expose verification utilities to public documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148603
Approved by: https://github.com/titaiwangms

(cherry picked from commit ebabd0efdddd91e11364e42227b746c419a39be4)

* [ONNX] Update types in VerificationInfo (#149377)

torch.types.Number was rendered as is in the documentation and can be confusing. We write the original types instead to reduce confusion for users.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149377
Approved by: https://github.com/titaiwangms

---------

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2025-03-20 08:14:38 -07:00
b23bfae9f7 Add AOTI shim for _weight_int4pack_mm_cpu_tensor (#149031) (#149386)
**Summary**
The previous implementation of the shim did not align with the design and was removed by https://github.com/pytorch/pytorch/pull/148907.
This PR adds it back in the MKLDNN backend files and re-enables the CPP wrapper UT.

**Test plan**
```
pytest -s test/inductor/test_cpu_cpp_wrapper.py -k test_woq_int4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149031
Approved by: https://github.com/leslie-fang-intel, https://github.com/EikanWang, https://github.com/desertfire
2025-03-20 08:08:09 -07:00
1b8f496f87 Pin auditwheel to 6.2.0 (#149525) 2025-03-19 16:13:43 -07:00
c236b602ff Add release branch push triggers to rocm-mi300.yml (#149526) 2025-03-19 16:12:59 -07:00
6926f30654 BC fix for AOTIModelPackageLoader() constructor defaults (#149214)
BC fix for AOTIModelPackageLoader() constructor defaults (#149082)

The default value for `run_single_threaded` was wrongly specified in the .cpp file instead of the header, breaking C++-side instantiation of `AOTIModelPackageLoader` with no arguments. This PR fixes this and adds a test for the use case of running with `AOTIModelPackageLoader` instead of `AOTIModelContainerRunner` on the C++ side.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149082
Approved by: https://github.com/desertfire

(cherry picked from commit 5e1b715dda813d8c545378291261b565649df8e5)

Co-authored-by: Joel Schlosser <jbschlosser@meta.com>
2025-03-17 17:16:54 -05:00
483980d7f3 [AOTI][XPU] Fix: model_container_runner_xpu.cpp is not built into libtorch_xpu.so (#149242)
[AOTI][XPU] Fix: model_container_runner_xpu.cpp is not built into libtorch_xpu.so (#149175)

The omission of model_container_runner_xpu.cpp causes a compilation failure when users build a C++ inference application on XPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149175
Approved by: https://github.com/jansel

(cherry picked from commit 9ad6265d044075d1ceb27cf0f2af7495e586003c)

Co-authored-by: xinan.lin <xinan.lin@intel.com>
2025-03-17 17:15:39 -05:00
7173a73cf4 [regression] Fix pin_memory() when it is called before device lazy initialization. (#149183)
[regression] Fix pin_memory() when it is called before device lazy initialization. (#149033)

PR #145752 added a check in isPinnedPtr to verify that a device is initialized before checking whether the tensor is pinned. That PR also added a lazy-initialization trigger when at::empty is called with the pinned param set to true. However, when the tensor is created first and then pinned in a separate pin_memory() call, lazy device init is not triggered, so is_pinned always returns false.

With this PR, the lazy initialization is moved to the getPinnedMemoryAllocator function, ensuring the device is initialized before we pin a tensor.

Fixes #149032
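
A minimal sketch of the scenario (assuming a CUDA or other accelerator build; the exact pre-fix behavior depended on the backend):

```python
import torch

x = torch.randn(1024)   # plain CPU tensor, created before any device call
y = x.pin_memory()      # pinning in a separate call used to skip lazy device init
print(y.is_pinned())    # previously could print False; with the fix it prints True
```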

@ngimel @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149033
Approved by: https://github.com/ngimel, https://github.com/albanD

(cherry picked from commit 420a9be743f8dd5d6296a32a1351c1baced12f1f)

Co-authored-by: Bartlomiej Stemborowski <bstemborowskix@habana.ai>
2025-03-17 15:37:24 -04:00
7bab7354df [cherry-pick] Revert #148823 - Make dynamism code robust to NotImplementedException (#149160)
Revert "Make dynamism code robust to NotImplementedException (#148823)"

This reverts commit 60576419a2a5cc09e4a92be870fda8f3fc305ddc.

Reverting from RC since it was reverted from the main branch
2025-03-14 10:50:24 -05:00
b1940b5867 Remove runtime dependency on packaging (#149125)
Remove runtime dependency on packaging (#149092)

Looks like after https://github.com/pytorch/pytorch/pull/148924 we are seeing this error in the nightly test:
https://github.com/pytorch/pytorch/actions/runs/13806023728/job/38616861623

```
  File "/Users/runner/work/_temp/anaconda/envs/test_conda_env/lib/python3.13/site-packages/torch/_inductor/pattern_matcher.py", line 79, in <module>
    from .lowering import fallback_node_due_to_unsupported_type
  File "/Users/runner/work/_temp/anaconda/envs/test_conda_env/lib/python3.13/site-packages/torch/_inductor/lowering.py", line 7024, in <module>
    from . import kernel
  File "/Users/runner/work/_temp/anaconda/envs/test_conda_env/lib/python3.13/site-packages/torch/_inductor/kernel/__init__.py", line 1, in <module>
    from . import mm, mm_common, mm_plus_mm
  File "/Users/runner/work/_temp/anaconda/envs/test_conda_env/lib/python3.13/site-packages/torch/_inductor/kernel/mm.py", line 6, in <module>
    from packaging.version import Version
ModuleNotFoundError: No module named 'packaging'
```

Hence removing the runtime dependency on packaging, since it may not be installed by default.
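
One way to drop the dependency (a sketch of the idea; the actual PR may do this differently) is to compare versions with a tiny local parser instead of packaging.version.Version:

```python
import re

def _version_tuple(version: str):
    """Parse 'X.Y.Z' (ignoring any local suffix) into a comparable tuple without `packaging`."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    return tuple(int(g) for g in match.groups()) if match else (0, 0, 0)

# e.g. instead of: Version(triton.__version__) >= Version("3.2.0")
print(_version_tuple("3.2.0+git5fe38ffd") >= (3, 2, 0))  # True
print(_version_tuple("3.1.0") >= (3, 2, 0))              # False
```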

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149092
Approved by: https://github.com/drisspg, https://github.com/davidberard98

(cherry picked from commit 65d19a5699afbb0b123b6b264188f5610b925c5e)

Co-authored-by: atalman <atalman@fb.com>
2025-03-13 12:28:50 -04:00
abebbd5113 [Profiler][HPU] Fix incorrect availabilities for HPU (#149115)
[Profiler][HPU] Fix incorrect availabilities for HPU (#148663)

Fixes #148661

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148663
Approved by: https://github.com/jeromean, https://github.com/albanD

(cherry picked from commit 75c8b7d9725af59d8348379a4165d1252c4ac208)

Co-authored-by: wdziurdz <witold.dziurdz@intel.com>
2025-03-13 08:09:45 -07:00
cdd7a2c72b [RELEASE ONLY CHANGES] Apply release only changes to release 2.7 (#149056)
* [RELEASE ONLY CHANGES] Apply release only changes to release 2.7

* fix_lint_workflow

* docker_release

* fix_check_binary
2025-03-12 15:44:15 -04:00
d94ea2647c [inductor] Fix profiler tests with latest Triton (#149059)
[inductor] Fix profiler tests with latest Triton (#149025)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149025
Approved by: https://github.com/yanboliang

(cherry picked from commit 488c4480f97e9d537905b0fbcb7236f88dca47f7)

Co-authored-by: Jason Ansel <jansel@meta.com>
2025-03-12 15:21:51 -04:00
7831 changed files with 184412 additions and 421593 deletions

(Truncated per-file diffs: Bazel build configuration; .ci/aarch64_linux wheel build scripts and the aarch64 AMI build helper; .ci/docker README, Dockerfiles, and image build scripts; .ci/pytorch CI scripts.)
fi fi
if [[ "$image" == *cmake* ]]; then
extract_version_from_image_name cmake CMAKE_VERSION
fi
;; ;;
esac esac
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]') tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
no_cache_flag="" #when using cudnn version 8 install it separately from cuda
progress_flag="" if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
# Do not use cache and progress=plain when in CI IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
if [[ -n "${CI:-}" ]]; then if [[ ${CUDNN_VERSION} == 9 ]]; then
no_cache_flag="--no-cache" IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
progress_flag="--progress=plain" fi
fi fi
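# Illustrative effect (assuming CI=true is exported, as GitHub Actions does): the
# docker build below then runs as
#   docker build --no-cache --progress=plain ...
# while local builds keep layer caching and the default progress renderer.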
# Build image # Build image
docker build \ docker build \
${no_cache_flag} \ --no-cache \
${progress_flag} \ --progress=plain \
--build-arg "BUILD_ENVIRONMENT=${image}" \ --build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "LLVMDEV=${LLVMDEV:-}" \ --build-arg "LLVMDEV=${LLVMDEV:-}" \
--build-arg "DB=${DB:-}" \
--build-arg "VISION=${VISION:-}" \ --build-arg "VISION=${VISION:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \ --build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CENTOS_VERSION=${CENTOS_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}" \ --build-arg "DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}" \
--build-arg "GLIBC_VERSION=${GLIBC_VERSION}" \ --build-arg "GLIBC_VERSION=${GLIBC_VERSION}" \
--build-arg "CLANG_VERSION=${CLANG_VERSION}" \ --build-arg "CLANG_VERSION=${CLANG_VERSION}" \
--build-arg "ANACONDA_PYTHON_VERSION=${ANACONDA_PYTHON_VERSION}" \ --build-arg "ANACONDA_PYTHON_VERSION=${ANACONDA_PYTHON_VERSION}" \
--build-arg "PYTHON_VERSION=${PYTHON_VERSION}" \
--build-arg "GCC_VERSION=${GCC_VERSION}" \ --build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \ --build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "TENSORRT_VERSION=${TENSORRT_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
--build-arg "VULKAN_SDK_VERSION=${VULKAN_SDK_VERSION}" \
--build-arg "SWIFTSHADER=${SWIFTSHADER}" \
--build-arg "CMAKE_VERSION=${CMAKE_VERSION:-}" \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \ --build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \ --build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \ --build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
@ -369,6 +518,7 @@ docker build \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \ --build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \ --build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \ --build-arg "UCC_COMMIT=${UCC_COMMIT}" \
--build-arg "CONDA_CMAKE=${CONDA_CMAKE}" \
--build-arg "TRITON=${TRITON}" \ --build-arg "TRITON=${TRITON}" \
--build-arg "TRITON_CPU=${TRITON_CPU}" \ --build-arg "TRITON_CPU=${TRITON_CPU}" \
--build-arg "ONNX=${ONNX}" \ --build-arg "ONNX=${ONNX}" \
@ -377,9 +527,7 @@ docker build \
--build-arg "EXECUTORCH=${EXECUTORCH}" \ --build-arg "EXECUTORCH=${EXECUTORCH}" \
--build-arg "HALIDE=${HALIDE}" \ --build-arg "HALIDE=${HALIDE}" \
--build-arg "XPU_VERSION=${XPU_VERSION}" \ --build-arg "XPU_VERSION=${XPU_VERSION}" \
--build-arg "UNINSTALL_DILL=${UNINSTALL_DILL}" \
--build-arg "ACL=${ACL:-}" \ --build-arg "ACL=${ACL:-}" \
--build-arg "OPENBLAS=${OPENBLAS:-}" \
--build-arg "SKIP_SCCACHE_INSTALL=${SKIP_SCCACHE_INSTALL:-}" \ --build-arg "SKIP_SCCACHE_INSTALL=${SKIP_SCCACHE_INSTALL:-}" \
--build-arg "SKIP_LLVM_SRC_BUILD_INSTALL=${SKIP_LLVM_SRC_BUILD_INSTALL:-}" \ --build-arg "SKIP_LLVM_SRC_BUILD_INSTALL=${SKIP_LLVM_SRC_BUILD_INSTALL:-}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \ -f $(dirname ${DOCKERFILE})/Dockerfile \
@ -396,7 +544,7 @@ docker build \
UBUNTU_VERSION=$(echo ${UBUNTU_VERSION} | sed 's/-rc$//') UBUNTU_VERSION=$(echo ${UBUNTU_VERSION} | sed 's/-rc$//')
function drun() { function drun() {
docker run --rm "$tmp_tag" "$@" docker run --rm "$tmp_tag" $*
} }
if [[ "$OS" == "ubuntu" ]]; then if [[ "$OS" == "ubuntu" ]]; then
@ -444,23 +592,3 @@ if [ -n "$KATEX" ]; then
exit 1 exit 1
fi fi
fi fi
HAS_TRITON=$(drun python -c "import triton" > /dev/null 2>&1 && echo "yes" || echo "no")
if [[ -n "$TRITON" || -n "$TRITON_CPU" ]]; then
if [ "$HAS_TRITON" = "no" ]; then
echo "expecting triton to be installed, but it is not"
exit 1
fi
elif [ "$HAS_TRITON" = "yes" ]; then
echo "expecting triton to not be installed, but it is"
exit 1
fi
# Sanity check cmake version. Executorch reinstalls cmake and I'm not sure if
# they support 4.0.0 yet, so exclude them from this check.
CMAKE_VERSION=$(drun cmake --version)
if [[ "$EXECUTORCH" != *yes* && "$CMAKE_VERSION" != *4.* ]]; then
echo "CMake version is not 4.0.0:"
drun cmake --version
exit 1
fi


@ -17,8 +17,9 @@ RUN bash ./install_base.sh && rm install_base.sh
# Update CentOS git version # Update CentOS git version
RUN yum -y remove git RUN yum -y remove git
RUN yum -y remove git-* RUN yum -y remove git-*
RUN yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm && \ RUN yum -y install https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm || \
sed -i 's/packages.endpoint/packages.endpointdev/' /etc/yum.repos.d/endpoint.repo (yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm && \
sed -i "s/packages.endpoint/packages.endpointdev/" /etc/yum.repos.d/endpoint.repo)
RUN yum install -y git RUN yum install -y git
# Install devtoolset # Install devtoolset
@ -39,7 +40,7 @@ RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest) # Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION ARG ANACONDA_PYTHON_VERSION
ARG BUILD_ENVIRONMENT ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt COPY requirements-ci.txt /opt/conda/requirements-ci.txt
@ -47,6 +48,20 @@ COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV # (optional) Install vision packages like OpenCV
ARG VISION ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./ COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
@ -60,7 +75,7 @@ COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh RUN bash ./install_rocm.sh
RUN rm install_rocm.sh RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION} RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh RUN rm install_rocm_magma.sh
COPY ./common/install_amdsmi.sh install_amdsmi.sh COPY ./common/install_amdsmi.sh install_amdsmi.sh
RUN bash ./install_amdsmi.sh RUN bash ./install_amdsmi.sh
@ -74,6 +89,12 @@ ENV MAGMA_HOME /opt/rocm/magma
ENV LANG en_US.utf8 ENV LANG en_US.utf8
ENV LC_ALL en_US.utf8 ENV LC_ALL en_US.utf8
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version # (optional) Install non-default Ninja version
ARG NINJA_VERSION ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh COPY ./common/install_ninja.sh install_ninja.sh


@ -1 +1 @@
56392aa978594cc155fa8af48cd949f5b5f1823a 01a22b6f16d117454b7d21ebdc691b0785b84a7f


@ -1 +1 @@
v2.27.5-1 v2.26.2-1


@ -1 +0,0 @@
e03a63be43e33596f7f0a43b0f530353785e4a59


@ -1 +1 @@
ae324eeac8e102a2b40370e341460f3791353398 0bcc8265e677e5321606a3311bf71470f14456a8


@ -1 +1 @@
f7888497a1eb9e98d4c07537f0d0bcfe180d1363 96316ce50fade7e209553aba4898cd9b82aab83b


@ -23,10 +23,6 @@ conda_install() {
as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y python="$ANACONDA_PYTHON_VERSION" $* as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y python="$ANACONDA_PYTHON_VERSION" $*
} }
conda_install_through_forge() {
as_jenkins conda install -c conda-forge -q -n py_$ANACONDA_PYTHON_VERSION -y python="$ANACONDA_PYTHON_VERSION" $*
}
conda_run() { conda_run() {
as_jenkins conda run -n py_$ANACONDA_PYTHON_VERSION --no-capture-output $* as_jenkins conda run -n py_$ANACONDA_PYTHON_VERSION --no-capture-output $*
} }


@ -15,9 +15,6 @@ install_ubuntu() {
elif [[ "$UBUNTU_VERSION" == "22.04"* ]]; then elif [[ "$UBUNTU_VERSION" == "22.04"* ]]; then
cmake3="cmake=3.22*" cmake3="cmake=3.22*"
maybe_libiomp_dev="" maybe_libiomp_dev=""
elif [[ "$UBUNTU_VERSION" == "24.04"* ]]; then
cmake3="cmake=3.28*"
maybe_libiomp_dev=""
else else
cmake3="cmake=3.5*" cmake3="cmake=3.5*"
maybe_libiomp_dev="libiomp-dev" maybe_libiomp_dev="libiomp-dev"
@ -33,6 +30,18 @@ install_ubuntu() {
maybe_libomp_dev="" maybe_libomp_dev=""
fi fi
# HACK: UCC testing relies on libnccl library from NVIDIA repo, and version 2.16 crashes
# See https://github.com/pytorch/pytorch/pull/105260#issuecomment-1673399729
# TODO: Eliminate this hack; we should not rely on apt-get installation
# See https://github.com/pytorch/pytorch/issues/144768
if [[ "$UBUNTU_VERSION" == "20.04"* && "$CUDA_VERSION" == "11.8"* ]]; then
maybe_libnccl_dev="libnccl2=2.15.5-1+cuda11.8 libnccl-dev=2.15.5-1+cuda11.8 --allow-downgrades --allow-change-held-packages"
elif [[ "$UBUNTU_VERSION" == "20.04"* && "$CUDA_VERSION" == "12.4"* ]]; then
maybe_libnccl_dev="libnccl2=2.26.2-1+cuda12.4 libnccl-dev=2.26.2-1+cuda12.4 --allow-downgrades --allow-change-held-packages"
else
maybe_libnccl_dev=""
fi
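# Illustrative expansion (assumed example: Ubuntu 20.04 with CUDA 12.4): the
# apt-get install further below then effectively receives
#   libnccl2=2.26.2-1+cuda12.4 libnccl-dev=2.26.2-1+cuda12.4 --allow-downgrades --allow-change-held-packages
# appended to its package list via ${maybe_libnccl_dev}.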
# Install common dependencies # Install common dependencies
apt-get update apt-get update
# TODO: Some of these may not be necessary # TODO: Some of these may not be necessary
@ -61,6 +70,7 @@ install_ubuntu() {
libasound2-dev \ libasound2-dev \
libsndfile-dev \ libsndfile-dev \
${maybe_libomp_dev} \ ${maybe_libomp_dev} \
${maybe_libnccl_dev} \
software-properties-common \ software-properties-common \
wget \ wget \
sudo \ sudo \
@ -89,6 +99,9 @@ install_centos() {
ccache_deps="asciidoc docbook-dtds docbook-style-xsl libxslt" ccache_deps="asciidoc docbook-dtds docbook-style-xsl libxslt"
numpy_deps="gcc-gfortran" numpy_deps="gcc-gfortran"
# Note: protobuf-c-{compiler,devel} on CentOS are too old to be used
# for Caffe2. That said, we still install them to make sure the build
# system opts to build/use protoc and libprotobuf from third-party.
yum install -y \ yum install -y \
$ccache_deps \ $ccache_deps \
$numpy_deps \ $numpy_deps \


@ -9,7 +9,7 @@ install_ubuntu() {
# Instead use lib and headers from OpenSSL1.1 installed in `install_openssl.sh`` # Instead use lib and headers from OpenSSL1.1 installed in `install_openssl.sh``
apt-get install -y cargo apt-get install -y cargo
echo "Checking out sccache repo" echo "Checking out sccache repo"
git clone https://github.com/mozilla/sccache -b v0.10.0 git clone https://github.com/mozilla/sccache -b v0.9.1
cd sccache cd sccache
echo "Building sccache" echo "Building sccache"
cargo build --release cargo build --release


@ -4,10 +4,16 @@ set -ex
if [ -n "$CLANG_VERSION" ]; then if [ -n "$CLANG_VERSION" ]; then
if [[ $UBUNTU_VERSION == 22.04 ]]; then if [[ $CLANG_VERSION == 9 && $UBUNTU_VERSION == 18.04 ]]; then
sudo apt-get update
# gpg-agent is not available by default on 18.04
sudo apt-get install -y --no-install-recommends gpg-agent
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-${CLANG_VERSION} main"
elif [[ $UBUNTU_VERSION == 22.04 ]]; then
# work around ubuntu apt-get conflicts # work around ubuntu apt-get conflicts
sudo apt-get -y -f install sudo apt-get -y -f install
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add - wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
if [[ $CLANG_VERSION == 18 ]]; then if [[ $CLANG_VERSION == 18 ]]; then
apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main" apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main"
fi fi
@ -35,7 +41,7 @@ if [ -n "$CLANG_VERSION" ]; then
# clang's packaging is a little messed up (the runtime libs aren't # clang's packaging is a little messed up (the runtime libs aren't
# added into the linker path), so give it a little help # added into the linker path), so give it a little help
clang_lib=("/usr/lib/llvm-$CLANG_VERSION/lib/clang/"*"/lib/linux") clang_lib=("/usr/lib/llvm-$CLANG_VERSION/lib/clang/"*"/lib/linux")
echo "$clang_lib" >/etc/ld.so.conf.d/clang.conf echo "$clang_lib" > /etc/ld.so.conf.d/clang.conf
ldconfig ldconfig
# Cleanup package manager # Cleanup package manager


@ -0,0 +1,31 @@
#!/bin/bash
set -ex
[ -n "$CMAKE_VERSION" ]
# Remove system cmake install so it won't get used instead
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
apt-get remove cmake -y
;;
centos)
yum remove cmake -y
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"
# Download and install specific CMake version in /usr/local
pushd /tmp
curl -Os --retry 3 "https://cmake.org/files/${path}/${file}"
tar -C /usr/local --strip-components 1 --no-same-owner -zxf cmake-*.tar.gz
rm -f cmake-*.tar.gz
popd
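# Worked example (illustrative, assuming CMAKE_VERSION=3.18.5):
#   path -> v3.18
#   file -> cmake-3.18.5-Linux-x86_64.tar.gz
#   URL  -> https://cmake.org/files/v3.18/cmake-3.18.5-Linux-x86_64.tar.gz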


@ -4,8 +4,12 @@ set -ex
# Optionally install conda # Optionally install conda
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://github.com/conda-forge/miniforge/releases/latest/download" # @lint-ignore BASE_URL="https://repo.anaconda.com/miniconda"
CONDA_FILE="Miniforge3-Linux-$(uname -m).sh" CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
if [[ $(uname -m) == "aarch64" ]] || [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
BASE_URL="https://github.com/conda-forge/miniforge/releases/latest/download"
CONDA_FILE="Miniforge3-Linux-$(uname -m).sh"
fi
MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1) MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1)
MINOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 2) MINOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 2)
@ -17,6 +21,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
exit 1 exit 1
;; ;;
esac esac
mkdir -p /opt/conda mkdir -p /opt/conda
chown jenkins:jenkins /opt/conda chown jenkins:jenkins /opt/conda
@ -57,33 +62,32 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# libstdcxx from conda default channels are too old, we need GLIBCXX_3.4.30 # libstdcxx from conda default channels are too old, we need GLIBCXX_3.4.30
# which is provided in libstdcxx 12 and up. # which is provided in libstdcxx 12 and up.
conda_install libstdcxx-ng=12.3.0 --update-deps -c conda-forge conda_install libstdcxx-ng=12.3.0 -c conda-forge
# Miniforge installer doesn't install sqlite by default
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
conda_install sqlite
fi
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README # Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
if [[ $(uname -m) != "aarch64" ]]; then if [[ $(uname -m) == "aarch64" ]]; then
pip_install mkl==2024.2.0 conda_install "openblas==0.3.29=*openmp*"
pip_install mkl-static==2024.2.0 else
pip_install mkl-include==2024.2.0 conda_install "mkl=2021.4.0 mkl-include=2021.4.0"
fi fi
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source # Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
# and libpython-static for torch deploy # and libpython-static for torch deploy
conda_install llvmdev=8.0.0 "libpython-static=${ANACONDA_PYTHON_VERSION}" conda_install llvmdev=8.0.0 "libpython-static=${ANACONDA_PYTHON_VERSION}"
# Use conda cmake in some cases. Conda cmake will be newer than our supported
# min version (3.5 for xenial and 3.10 for bionic), so we only do it in those
# following builds that we know should use conda. Specifically, Ubuntu bionic
# and focal cannot find conda mkl with stock cmake, so we need a cmake from conda
if [ -n "${CONDA_CMAKE}" ]; then
conda_install cmake
fi
# Magma package names are concatenation of CUDA major and minor ignoring revision # Magma package names are concatenation of CUDA major and minor ignoring revision
# I.e. magma-cuda102 package corresponds to CUDA_VERSION=10.2 and CUDA_VERSION=10.2.89 # I.e. magma-cuda102 package corresponds to CUDA_VERSION=10.2 and CUDA_VERSION=10.2.89
# Magma is installed from a tarball in the ossci-linux bucket into the conda env # Magma is installed from a tarball in the ossci-linux bucket into the conda env
if [ -n "$CUDA_VERSION" ]; then if [ -n "$CUDA_VERSION" ]; then
conda_run ${SCRIPT_FOLDER}/install_magma_conda.sh $(cut -f1-2 -d'.' <<< ${CUDA_VERSION}) ${SCRIPT_FOLDER}/install_magma_conda.sh $(cut -f1-2 -d'.' <<< ${CUDA_VERSION}) ${ANACONDA_PYTHON_VERSION}
fi
if [[ "$UBUNTU_VERSION" == "24.04"* ]] ; then
conda_install_through_forge libstdcxx-ng=14
fi fi
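# Illustrative sketch (assumed example: CUDA_VERSION=12.4.1): the call above passes
#   $(cut -f1-2 -d'.' <<< "12.4.1") -> 12.4
# and install_magma_conda.sh strips the dot to fetch magma-cuda124-2.6.1-1.tar.bz2
# from the ossci-linux bucket.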
# Install some other packages, including those needed for Python test reporting # Install some other packages, including those needed for Python test reporting


@ -3,10 +3,11 @@
set -uex -o pipefail set -uex -o pipefail
PYTHON_DOWNLOAD_URL=https://www.python.org/ftp/python PYTHON_DOWNLOAD_URL=https://www.python.org/ftp/python
PYTHON_DOWNLOAD_GITHUB_BRANCH=https://github.com/python/cpython/archive/refs/heads
GET_PIP_URL=https://bootstrap.pypa.io/get-pip.py GET_PIP_URL=https://bootstrap.pypa.io/get-pip.py
# Python versions to be installed in /opt/$VERSION_NO # Python versions to be installed in /opt/$VERSION_NO
CPYTHON_VERSIONS=${CPYTHON_VERSIONS:-"3.9.0 3.10.1 3.11.0 3.12.0 3.13.0 3.13.0t 3.14.0 3.14.0t"} CPYTHON_VERSIONS=${CPYTHON_VERSIONS:-"3.8.1 3.9.0 3.10.1 3.11.0 3.12.0 3.13.0 3.13.0t"}
function check_var { function check_var {
if [ -z "$1" ]; then if [ -z "$1" ]; then
@ -23,8 +24,9 @@ function do_cpython_build {
tar -xzf Python-$py_ver.tgz tar -xzf Python-$py_ver.tgz
local additional_flags="" local additional_flags=""
if [[ "$py_ver" == *"t" ]]; then if [ "$py_ver" == "3.13.0t" ]; then
additional_flags=" --disable-gil" additional_flags=" --disable-gil"
mv cpython-3.13/ cpython-3.13t/
fi fi
pushd $py_folder pushd $py_folder
@ -66,7 +68,7 @@ function do_cpython_build {
ln -s pip3 ${prefix}/bin/pip ln -s pip3 ${prefix}/bin/pip
fi fi
# install setuptools since distutils was removed in Python 3.12
${prefix}/bin/pip install wheel==0.45.1 setuptools==80.9.0 ${prefix}/bin/pip install wheel==0.34.2 setuptools==68.2.2
local abi_tag=$(${prefix}/bin/python -c "from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag; print('{0}{1}-{2}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag()))") local abi_tag=$(${prefix}/bin/python -c "from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag; print('{0}{1}-{2}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag()))")
ln -sf ${prefix} /opt/python/${abi_tag} ln -sf ${prefix} /opt/python/${abi_tag}
} }
@ -74,20 +76,24 @@ function do_cpython_build {
function build_cpython { function build_cpython {
local py_ver=$1 local py_ver=$1
check_var $py_ver check_var $py_ver
local py_suffix=$py_ver check_var $PYTHON_DOWNLOAD_URL
local py_folder=$py_ver local py_ver_folder=$py_ver
# Special handling for nogil if [ "$py_ver" = "3.13.0t" ]; then
if [[ "${py_ver}" == *"t" ]]; then PY_VER_SHORT="3.13"
py_suffix=${py_ver::-1} PYT_VER_SHORT="3.13t"
py_folder=$py_suffix check_var $PYTHON_DOWNLOAD_GITHUB_BRANCH
wget $PYTHON_DOWNLOAD_GITHUB_BRANCH/$PY_VER_SHORT.tar.gz -O Python-$py_ver.tgz
do_cpython_build $py_ver cpython-$PYT_VER_SHORT
elif [ "$py_ver" = "3.13.0" ]; then
PY_VER_SHORT="3.13"
check_var $PYTHON_DOWNLOAD_GITHUB_BRANCH
wget $PYTHON_DOWNLOAD_GITHUB_BRANCH/$PY_VER_SHORT.tar.gz -O Python-$py_ver.tgz
do_cpython_build $py_ver cpython-$PY_VER_SHORT
else
wget -q $PYTHON_DOWNLOAD_URL/$py_ver_folder/Python-$py_ver.tgz
do_cpython_build $py_ver Python-$py_ver
fi fi
# Only b3 is available now
if [ "$py_suffix" == "3.14.0" ]; then
py_suffix="3.14.0b3"
fi
wget -q $PYTHON_DOWNLOAD_URL/$py_folder/Python-$py_suffix.tgz -O Python-$py_ver.tgz
do_cpython_build $py_ver Python-$py_suffix
rm -f Python-$py_ver.tgz rm -f Python-$py_ver.tgz
} }


@ -2,128 +2,173 @@
set -ex set -ex
arch_path='' NCCL_VERSION=v2.26.2-1
targetarch=${TARGETARCH:-$(uname -m)} CUDNN_VERSION=9.5.1.17
if [ ${targetarch} = 'amd64' ] || [ "${targetarch}" = 'x86_64' ]; then
arch_path='x86_64'
else
arch_path='sbsa'
fi
NVSHMEM_VERSION=3.3.9 function install_cusparselt_040 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
function install_cuda { mkdir tmp_cusparselt && pushd tmp_cusparselt
version=$1 wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.4.0.7-archive.tar.xz
runfile=$2 tar xf libcusparse_lt-linux-x86_64-0.4.0.7-archive.tar.xz
major_minor=${version%.*} cp -a libcusparse_lt-linux-x86_64-0.4.0.7-archive/include/* /usr/local/cuda/include/
rm -rf /usr/local/cuda-${major_minor} /usr/local/cuda cp -a libcusparse_lt-linux-x86_64-0.4.0.7-archive/lib/* /usr/local/cuda/lib64/
if [[ ${arch_path} == 'sbsa' ]]; then popd
runfile="${runfile}_sbsa" rm -rf tmp_cusparselt
fi
runfile="${runfile}.run"
wget -q https://developer.download.nvidia.com/compute/cuda/${version}/local_installers/${runfile} -O ${runfile}
chmod +x ${runfile}
./${runfile} --toolkit --silent
rm -f ${runfile}
rm -f /usr/local/cuda && ln -s /usr/local/cuda-${major_minor} /usr/local/cuda
} }
function install_cudnn { function install_cusparselt_062 {
cuda_major_version=$1 # cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
cudnn_version=$2 mkdir tmp_cusparselt && pushd tmp_cusparselt
mkdir tmp_cudnn && cd tmp_cudnn wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.6.2.3-archive.tar.xz
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement tar xf libcusparse_lt-linux-x86_64-0.6.2.3-archive.tar.xz
filepath="cudnn-linux-${arch_path}-${cudnn_version}_cuda${cuda_major_version}-archive" cp -a libcusparse_lt-linux-x86_64-0.6.2.3-archive/include/* /usr/local/cuda/include/
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-${arch_path}/${filepath}.tar.xz cp -a libcusparse_lt-linux-x86_64-0.6.2.3-archive/lib/* /usr/local/cuda/lib64/
tar xf ${filepath}.tar.xz popd
cp -a ${filepath}/include/* /usr/local/cuda/include/ rm -rf tmp_cusparselt
cp -a ${filepath}/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
} }
function install_nvshmem { function install_cusparselt_063 {
cuda_major_version=$1 # e.g. "12" # cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
nvshmem_version=$2 # e.g. "3.3.9" mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.6.3.2-archive.tar.xz
tar xf libcusparse_lt-linux-x86_64-0.6.3.2-archive.tar.xz
cp -a libcusparse_lt-linux-x86_64-0.6.3.2-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-x86_64-0.6.3.2-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
case "${arch_path}" in function install_118 {
sbsa) CUDNN_VERSION=9.1.0.70
dl_arch="aarch64" NCCL_VERSION=v2.21.5-1
;; echo "Installing CUDA 11.8 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.4.0"
x86_64) rm -rf /usr/local/cuda-11.8 /usr/local/cuda
dl_arch="x64" # install CUDA 11.8.0 in the same container
;; wget -q https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
*) chmod +x cuda_11.8.0_520.61.05_linux.run
dl_arch="${arch}" ./cuda_11.8.0_520.61.05_linux.run --toolkit --silent
;; rm -f cuda_11.8.0_520.61.05_linux.run
esac rm -f /usr/local/cuda && ln -s /usr/local/cuda-11.8 /usr/local/cuda
tmpdir="tmp_nvshmem" # cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir -p "${tmpdir}" && cd "${tmpdir}" mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
# nvSHMEM license: https://docs.nvidia.com/nvshmem/api/sla.html # NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
filename="libnvshmem_cuda${cuda_major_version}-linux-${arch_path}-${nvshmem_version}" # Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
url="https://developer.download.nvidia.com/compute/redist/nvshmem/${nvshmem_version}/builds/cuda${cuda_major_version}/txz/agnostic/${dl_arch}/${filename}.tar.gz" git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
# download, unpack, install install_cusparselt_040
wget -q "${url}"
tar xf "${filename}.tar.gz"
cp -a "libnvshmem/include/"* /usr/local/cuda/include/
cp -a "libnvshmem/lib/"* /usr/local/cuda/lib64/
# cleanup ldconfig
cd ..
rm -rf "${tmpdir}"
echo "nvSHMEM ${nvshmem_version} for CUDA ${cuda_major_version} (${arch_path}) installed."
} }
function install_124 { function install_124 {
CUDNN_VERSION=9.1.0.70 CUDNN_VERSION=9.1.0.70
echo "Installing CUDA 12.4.1 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.6.2" echo "Installing CUDA 12.4.1 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.2"
install_cuda 12.4.1 cuda_12.4.1_550.54.15_linux rm -rf /usr/local/cuda-12.4 /usr/local/cuda
# install CUDA 12.4.1 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
chmod +x cuda_12.4.1_550.54.15_linux.run
./cuda_12.4.1_550.54.15_linux.run --toolkit --silent
rm -f cuda_12.4.1_550.54.15_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.4 /usr/local/cuda
install_cudnn 12 $CUDNN_VERSION # cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
CUDA_VERSION=12.4 bash install_nccl.sh # NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.4 bash install_cusparselt.sh install_cusparselt_062
ldconfig ldconfig
} }
function install_126 { function install_126 {
CUDNN_VERSION=9.10.2.21 echo "Installing CUDA 12.6.3 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.3"
echo "Installing CUDA 12.6.3 and cuDNN ${CUDNN_VERSION} and NVSHMEM and NCCL and cuSparseLt-0.7.1" rm -rf /usr/local/cuda-12.6 /usr/local/cuda
install_cuda 12.6.3 cuda_12.6.3_560.35.05_linux # install CUDA 12.6.3 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda_12.6.3_560.35.05_linux.run
chmod +x cuda_12.6.3_560.35.05_linux.run
./cuda_12.6.3_560.35.05_linux.run --toolkit --silent
rm -f cuda_12.6.3_560.35.05_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.6 /usr/local/cuda
install_cudnn 12 $CUDNN_VERSION # cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
install_nvshmem 12 $NVSHMEM_VERSION # NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.6 bash install_nccl.sh install_cusparselt_063
CUDA_VERSION=12.6 bash install_cusparselt.sh
ldconfig ldconfig
} }
function install_129 { function prune_118 {
CUDNN_VERSION=9.10.2.21 echo "Pruning CUDA 11.8 and cuDNN"
echo "Installing CUDA 12.9.1 and cuDNN ${CUDNN_VERSION} and NVSHMEM and NCCL and cuSparseLt-0.7.1" #####################################################################################
# install CUDA 12.9.1 in the same container # CUDA 11.8 prune static libs
install_cuda 12.9.1 cuda_12.9.1_575.57.08_linux #####################################################################################
export NVPRUNE="/usr/local/cuda-11.8/bin/nvprune"
export CUDA_LIB_DIR="/usr/local/cuda-11.8/lib64"
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement export GENCODE="-gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
install_cudnn 12 $CUDNN_VERSION export GENCODE_CUDNN="-gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
install_nvshmem 12 $NVSHMEM_VERSION if [[ -n "$OVERRIDE_GENCODE" ]]; then
export GENCODE=$OVERRIDE_GENCODE
fi
CUDA_VERSION=12.9 bash install_nccl.sh # all CUDA libs except CuDNN and CuBLAS (cudnn and cublas need arch 3.7 included)
ls $CUDA_LIB_DIR/ | grep "\.a" | grep -v "culibos" | grep -v "cudart" | grep -v "cudnn" | grep -v "cublas" | grep -v "metis" \
| xargs -I {} bash -c \
"echo {} && $NVPRUNE $GENCODE $CUDA_LIB_DIR/{} -o $CUDA_LIB_DIR/{}"
CUDA_VERSION=12.9 bash install_cusparselt.sh # prune CuDNN and CuBLAS
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublas_static.a -o $CUDA_LIB_DIR/libcublas_static.a
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublasLt_static.a -o $CUDA_LIB_DIR/libcublasLt_static.a
ldconfig #####################################################################################
# CUDA 11.8 prune visual tools
#####################################################################################
export CUDA_BASE="/usr/local/cuda-11.8/"
rm -rf $CUDA_BASE/libnvvp $CUDA_BASE/nsightee_plugins $CUDA_BASE/nsight-compute-2022.3.0 $CUDA_BASE/nsight-systems-2022.4.2/
} }
function prune_124 { function prune_124 {
@ -195,19 +240,35 @@ function prune_126 {
} }
function install_128 { function install_128 {
CUDNN_VERSION=9.8.0.87 CUDNN_VERSION=9.7.1.26
echo "Installing CUDA 12.8.1 and cuDNN ${CUDNN_VERSION} and NVSHMEM and NCCL and cuSparseLt-0.7.1" echo "Installing CUDA 12.8.0 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.3"
# install CUDA 12.8.1 in the same container rm -rf /usr/local/cuda-12.8 /usr/local/cuda
install_cuda 12.8.1 cuda_12.8.1_570.124.06_linux # install CUDA 12.8.0 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux.run
chmod +x cuda_12.8.0_570.86.10_linux.run
./cuda_12.8.0_570.86.10_linux.run --toolkit --silent
rm -f cuda_12.8.0_570.86.10_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.8 /usr/local/cuda
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement # cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
install_cudnn 12 $CUDNN_VERSION mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
install_nvshmem 12 $NVSHMEM_VERSION # NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
CUDA_VERSION=12.8 bash install_nccl.sh install_cusparselt_063
CUDA_VERSION=12.8 bash install_cusparselt.sh
ldconfig ldconfig
} }
@ -216,13 +277,13 @@ function install_128 {
while test $# -gt 0 while test $# -gt 0
do do
case "$1" in case "$1" in
11.8) install_118; prune_118
;;
12.4) install_124; prune_124 12.4) install_124; prune_124
;; ;;
12.6|12.6.*) install_126; prune_126 12.6) install_126; prune_126
;; ;;
12.8|12.8.*) install_128; 12.8) install_128;
;;
12.9|12.9.*) install_129;
;; ;;
*) echo "bad argument $1"; exit 1 *) echo "bad argument $1"; exit 1
;; ;;


@ -0,0 +1,63 @@
#!/bin/bash
# Script used only in CD pipeline
set -ex
NCCL_VERSION=v2.26.2-1
CUDNN_VERSION=9.8.0.87
function install_cusparselt_063 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-sbsa/libcusparse_lt-linux-sbsa-0.6.3.2-archive.tar.xz
tar xf libcusparse_lt-linux-sbsa-0.6.3.2-archive.tar.xz
cp -a libcusparse_lt-linux-sbsa-0.6.3.2-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-sbsa-0.6.3.2-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
function install_128 {
echo "Installing CUDA 12.8.0 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.6.3"
rm -rf /usr/local/cuda-12.8 /usr/local/cuda
# install CUDA 12.8.0 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda_12.8.0_570.86.10_linux_sbsa.run
chmod +x cuda_12.8.0_570.86.10_linux_sbsa.run
./cuda_12.8.0_570.86.10_linux_sbsa.run --toolkit --silent
rm -f cuda_12.8.0_570.86.10_linux_sbsa.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.8 /usr/local/cuda
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-sbsa-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b ${NCCL_VERSION} --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
install_cusparselt_063
ldconfig
}
# idiomatic parameter and option handling in sh
while test $# -gt 0
do
case "$1" in
12.8) install_128;
;;
*) echo "bad argument $1"; exit 1
;;
esac
shift
done


@ -0,0 +1,26 @@
#!/bin/bash
if [[ -n "${CUDNN_VERSION}" ]]; then
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn
pushd tmp_cudnn
if [[ ${CUDA_VERSION:0:4} == "12.8" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.7.1.26_cuda12-archive"
elif [[ ${CUDA_VERSION:0:4} == "12.6" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.5.1.17_cuda12-archive"
elif [[ ${CUDA_VERSION:0:2} == "12" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.1.0.70_cuda12-archive"
elif [[ ${CUDA_VERSION:0:2} == "11" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.1.0.70_cuda11-archive"
else
print "Unsupported CUDA version ${CUDA_VERSION}"
exit 1
fi
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz
tar xf ${CUDNN_NAME}.tar.xz
cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/
cp -a ${CUDNN_NAME}/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cudnn
ldconfig
fi
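# Example selection (illustrative): with CUDA_VERSION=12.6.3, ${CUDA_VERSION:0:4}
# is "12.6", so CUDNN_NAME resolves to cudnn-linux-x86_64-9.5.1.17_cuda12-archive
# and that archive's headers and libs are copied into /usr/local/cuda.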


@ -5,13 +5,13 @@ set -ex
# cuSPARSELt license: https://docs.nvidia.com/cuda/cusparselt/license.html # cuSPARSELt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && cd tmp_cusparselt mkdir tmp_cusparselt && cd tmp_cusparselt
if [[ ${CUDA_VERSION:0:4} =~ ^12\.[5-9]$ ]]; then if [[ ${CUDA_VERSION:0:4} =~ ^12\.[5-8]$ ]]; then
arch_path='sbsa' arch_path='sbsa'
export TARGETARCH=${TARGETARCH:-$(uname -m)} export TARGETARCH=${TARGETARCH:-$(uname -m)}
if [ ${TARGETARCH} = 'amd64' ] || [ "${TARGETARCH}" = 'x86_64' ]; then if [ ${TARGETARCH} = 'amd64' ] || [ "${TARGETARCH}" = 'x86_64' ]; then
arch_path='x86_64' arch_path='x86_64'
fi fi
CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.7.1.0-archive" CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.6.3.2-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "12.4" ]]; then elif [[ ${CUDA_VERSION:0:4} == "12.4" ]]; then
arch_path='sbsa' arch_path='sbsa'
@ -21,6 +21,9 @@ elif [[ ${CUDA_VERSION:0:4} == "12.4" ]]; then
fi fi
CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.6.2.3-archive" CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.6.2.3-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "11.8" ]]; then
CUSPARSELT_NAME="libcusparse_lt-linux-x86_64-0.4.0.7-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/${CUSPARSELT_NAME}.tar.xz
else else
echo "Not sure which libcusparselt version to install for this ${CUDA_VERSION}" echo "Not sure which libcusparselt version to install for this ${CUDA_VERSION}"
fi fi

.ci/docker/common/install_db.sh (new executable file)

@ -0,0 +1,38 @@
#!/bin/bash
set -ex
install_ubuntu() {
apt-get update
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac


@ -13,7 +13,7 @@ clone_executorch() {
# and fetch the target commit # and fetch the target commit
pushd executorch pushd executorch
git checkout "${EXECUTORCH_PINNED_COMMIT}" git checkout "${EXECUTORCH_PINNED_COMMIT}"
git submodule update --init --recursive git submodule update --init
popd popd
chown -R jenkins executorch chown -R jenkins executorch
@ -50,7 +50,8 @@ setup_executorch() {
pushd executorch pushd executorch
export PYTHON_EXECUTABLE=python export PYTHON_EXECUTABLE=python
export CMAKE_ARGS="-DEXECUTORCH_BUILD_PYBIND=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON" export EXECUTORCH_BUILD_PYBIND=ON
export CMAKE_ARGS="-DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
as_jenkins .ci/scripts/setup-linux.sh --build-tool cmake || true as_jenkins .ci/scripts/setup-linux.sh --build-tool cmake || true
popd popd


@ -17,7 +17,7 @@ if [ -n "${UBUNTU_VERSION}" ];then
libopenblas-dev libeigen3-dev libatlas-base-dev libzstd-dev libopenblas-dev libeigen3-dev libatlas-base-dev libzstd-dev
fi fi
pip_install numpy scipy imageio cmake ninja conda_install numpy scipy imageio cmake ninja
git clone --depth 1 --branch release/16.x --recursive https://github.com/llvm/llvm-project.git git clone --depth 1 --branch release/16.x --recursive https://github.com/llvm/llvm-project.git
cmake -DCMAKE_BUILD_TYPE=Release \ cmake -DCMAKE_BUILD_TYPE=Release \
@ -35,9 +35,7 @@ git clone https://github.com/halide/Halide.git
pushd Halide pushd Halide
git checkout ${COMMIT} && git submodule update --init --recursive git checkout ${COMMIT} && git submodule update --init --recursive
pip_install -r requirements.txt pip_install -r requirements.txt
# NOTE: pybind has a requirement for cmake > 3.5 so set the minimum cmake version here with a flag cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -S . -B build
# Context: https://github.com/pytorch/pytorch/issues/150420
cmake -G Ninja -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_BUILD_TYPE=Release -S . -B build
cmake --build build cmake --build build
test -e ${CONDA_PREFIX}/lib/python3 || ln -s python${ANACONDA_PYTHON_VERSION} ${CONDA_PREFIX}/lib/python3 test -e ${CONDA_PREFIX}/lib/python3 || ln -s python${ANACONDA_PYTHON_VERSION} ${CONDA_PREFIX}/lib/python3
cmake --install build --prefix ${CONDA_PREFIX} cmake --install build --prefix ${CONDA_PREFIX}


@ -14,38 +14,19 @@ function install_timm() {
local commit local commit
commit=$(get_pinned_commit timm) commit=$(get_pinned_commit timm)
# TODO (huydhn): There is no torchvision release on 3.13 when I write this, so
# I'm using nightly here instead. We just need the package to be able to install
# TIMM. Remove this once vision has a release on 3.13
if [[ "${ANACONDA_PYTHON_VERSION}" == "3.13" ]]; then
pip_install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu124
fi
pip_install "git+https://github.com/huggingface/pytorch-image-models@${commit}" pip_install "git+https://github.com/huggingface/pytorch-image-models@${commit}"
} # Clean up
conda_run pip uninstall -y cmake torch torchvision triton
function install_torchbench() {
local commit
commit=$(get_pinned_commit torchbench)
git clone https://github.com/pytorch/benchmark torchbench
pushd torchbench
git checkout "$commit"
python install.py --continue_on_fail
# TODO (huydhn): transformers-4.44.2 added by https://github.com/pytorch/benchmark/pull/2488
# is regressing speedup metric. This needs to be investigated further
pip install transformers==4.38.1
echo "Print all dependencies after TorchBench is installed"
python -mpip freeze
popd
chown -R jenkins torchbench
} }
# Pango is needed for weasyprint which is needed for doctr # Pango is needed for weasyprint which is needed for doctr
conda_install pango conda_install pango
# Stable packages are ok here, just to satisfy TorchBench check
pip_install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
install_torchbench
install_huggingface install_huggingface
install_timm install_timm
# Clean up
conda_run pip uninstall -y torch torchvision torchaudio triton


@ -2,6 +2,8 @@
set -ex set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
if [ -n "${UBUNTU_VERSION}" ]; then if [ -n "${UBUNTU_VERSION}" ]; then
apt update apt update
apt-get install -y clang doxygen git graphviz nodejs npm libtinfo5 apt-get install -y clang doxygen git graphviz nodejs npm libtinfo5
@ -13,8 +15,8 @@ chown -R jenkins pytorch
pushd pytorch pushd pytorch
# Install all linter dependencies # Install all linter dependencies
pip install -r requirements.txt pip_install -r requirements.txt
lintrunner init conda_run lintrunner init
# Cache .lintbin directory as part of the Docker image # Cache .lintbin directory as part of the Docker image
cp -r .lintbin /tmp cp -r .lintbin /tmp


@ -1,23 +1,26 @@
#!/usr/bin/env bash #!/usr/bin/env bash
# Script that installs magma from tarball inside conda environment. # Script that replaces the magma install from a conda package
# It replaces anaconda magma-cuda package which is no longer published.
# Execute it inside active conda environment.
# See issue: https://github.com/pytorch/pytorch/issues/138506
set -eou pipefail set -eou pipefail
cuda_version_nodot=${1/./} function do_install() {
anaconda_dir=${CONDA_PREFIX:-"$(dirname $(which conda))/../"} cuda_version_nodot=${1/./}
anaconda_python_version=$2
MAGMA_VERSION="2.6.1" MAGMA_VERSION="2.6.1"
magma_archive="magma-cuda${cuda_version_nodot}-${MAGMA_VERSION}-1.tar.bz2" magma_archive="magma-cuda${cuda_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"
(
set -x anaconda_dir="/opt/conda/envs/py_${anaconda_python_version}"
tmp_dir=$(mktemp -d) (
pushd ${tmp_dir} set -x
curl -OLs https://ossci-linux.s3.us-east-1.amazonaws.com/${magma_archive} tmp_dir=$(mktemp -d)
tar -xvf "${magma_archive}" pushd ${tmp_dir}
mv include/* "${anaconda_dir}/include/" curl -OLs https://ossci-linux.s3.us-east-1.amazonaws.com/${magma_archive}
mv lib/* "${anaconda_dir}/lib" tar -xvf "${magma_archive}"
popd mv include/* "${anaconda_dir}/include/"
) mv lib/* "${anaconda_dir}/lib"
popd
)
}
do_install $1 $2
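# Illustrative invocations (assumed values, mirroring the two call sites shown
# in install_conda.sh above):
#   install_magma_conda.sh 12.4        # variant that reads CONDA_PREFIX
#   install_magma_conda.sh 12.4 3.10   # variant that takes the python version as $2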


@ -1,26 +0,0 @@
#!/bin/bash
set -ex
NCCL_VERSION=""
if [[ ${CUDA_VERSION:0:2} == "11" ]]; then
NCCL_VERSION=$(cat ci_commit_pins/nccl-cu11.txt)
elif [[ ${CUDA_VERSION:0:2} == "12" ]]; then
NCCL_VERSION=$(cat ci_commit_pins/nccl-cu12.txt)
else
echo "Unexpected CUDA_VERSION ${CUDA_VERSION}"
exit 1
fi
if [[ -n "${NCCL_VERSION}" ]]; then
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
pushd nccl
make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
popd
rm -rf nccl
ldconfig
fi


@ -8,6 +8,16 @@ retry () {
"$@" || (sleep 10 && "$@") || (sleep 20 && "$@") || (sleep 40 && "$@") "$@" || (sleep 10 && "$@") || (sleep 20 && "$@") || (sleep 40 && "$@")
} }
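# Hedged usage sketch (not in the original script): retry re-runs the given command
# with growing sleeps (10s, 20s, 40s) between attempts, e.g.
#   retry pip_install onnxruntime==1.18.1
# makes up to four attempts before the failure propagates.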
# A bunch of custom pip dependencies for ONNX
pip_install \
beartype==0.15.0 \
filelock==3.9.0 \
flatbuffers==2.0 \
mock==5.0.1 \
ninja==1.10.2 \
networkx==2.5 \
numpy==1.24.2
# ONNXRuntime should be installed before installing # ONNXRuntime should be installed before installing
# onnx-weekly. Otherwise, onnx-weekly could be # onnx-weekly. Otherwise, onnx-weekly could be
# overwritten by onnx. # overwritten by onnx.
@ -19,8 +29,12 @@ pip_install \
transformers==4.36.2 transformers==4.36.2
pip_install coloredlogs packaging pip_install coloredlogs packaging
pip_install onnxruntime==1.18.1 pip_install onnxruntime==1.18.1
pip_install onnxscript==0.3.1 pip_install onnx==1.17.0
pip_install onnxscript==0.2.2 --no-deps
# required by onnxscript
pip_install ml_dtypes
# Cache the transformers model to be used later by ONNX tests. We need to run the transformers # Cache the transformers model to be used later by ONNX tests. We need to run the transformers
# package to download the model. By default, the model is cached at ~/.cache/huggingface/hub/ # package to download the model. By default, the model is cached at ~/.cache/huggingface/hub/


@ -4,9 +4,9 @@
set -ex set -ex
cd / cd /
git clone https://github.com/OpenMathLib/OpenBLAS.git -b "${OPENBLAS_VERSION:-v0.3.30}" --depth 1 --shallow-submodules git clone https://github.com/OpenMathLib/OpenBLAS.git -b v0.3.29 --depth 1 --shallow-submodules
OPENBLAS_CHECKOUT_DIR="OpenBLAS"
OPENBLAS_BUILD_FLAGS=" OPENBLAS_BUILD_FLAGS="
NUM_THREADS=128 NUM_THREADS=128
USE_OPENMP=1 USE_OPENMP=1
@ -14,8 +14,9 @@ NO_SHARED=0
DYNAMIC_ARCH=1 DYNAMIC_ARCH=1
TARGET=ARMV8 TARGET=ARMV8
CFLAGS=-O3 CFLAGS=-O3
BUILD_BFLOAT16=1
" "
OPENBLAS_CHECKOUT_DIR="OpenBLAS"
make -j8 ${OPENBLAS_BUILD_FLAGS} -C ${OPENBLAS_CHECKOUT_DIR} make -j8 ${OPENBLAS_BUILD_FLAGS} -C ${OPENBLAS_CHECKOUT_DIR}
make -j8 ${OPENBLAS_BUILD_FLAGS} install -C ${OPENBLAS_CHECKOUT_DIR} make -j8 ${OPENBLAS_BUILD_FLAGS} install -C ${OPENBLAS_CHECKOUT_DIR}


@ -0,0 +1,19 @@
#!/bin/bash
set -ex
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz" --retry 3
tar -xvz --no-same-owner -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
NPROC=$(( $(nproc) - 2 ))
pushd "$pb_dir" && ./configure && make -j${NPROC} && make -j${NPROC} check && sudo make -j${NPROC} install && sudo ldconfig
popd
rm -rf $pb_dir


@ -1,15 +0,0 @@
#!/bin/bash
set -ex
apt-get update
# Use deadsnakes in case we need an older python version
sudo add-apt-repository ppa:deadsnakes/ppa
apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python3-pip python${PYTHON_VERSION}-venv
# Use a venv because uv and some other package managers don't support --user install
ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python
python -m venv /var/lib/jenkins/ci_env
source /var/lib/jenkins/ci_env/bin/activate
python -mpip install --upgrade pip
python -mpip install -r /opt/requirements-ci.txt


@ -8,11 +8,13 @@ ver() {
install_ubuntu() { install_ubuntu() {
apt-get update apt-get update
# gpg-agent is not available by default if [[ $UBUNTU_VERSION == 18.04 ]]; then
apt-get install -y --no-install-recommends gpg-agent # gpg-agent is not available by default on 18.04
if [[ $(ver $UBUNTU_VERSION) -ge $(ver 22.04) ]]; then apt-get install -y --no-install-recommends gpg-agent
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \ fi
| sudo tee /etc/apt/preferences.d/rocm-pin-600 if [[ $UBUNTU_VERSION == 20.04 ]]; then
# gpg-agent is not available by default on 20.04
apt-get install -y --no-install-recommends gpg-agent
fi fi
apt-get install -y kmod apt-get install -y kmod
apt-get install -y wget apt-get install -y wget
@ -21,34 +23,13 @@ install_ubuntu() {
apt-get install -y libc++1 apt-get install -y libc++1
apt-get install -y libc++abi1 apt-get install -y libc++abi1
# Make sure rocm packages from repo.radeon.com have highest priority
cat << EOF > /etc/apt/preferences.d/rocm-pin-600
Package: *
Pin: release o=repo.radeon.com
Pin-Priority: 600
EOF
# we want the patch version of 6.4 instead
if [[ $(ver $ROCM_VERSION) -eq $(ver 6.4) ]]; then
ROCM_VERSION="${ROCM_VERSION}.2"
fi
# Default url values
rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu"
# Special case for ROCM_VERSION == 7.0
if [[ $(ver "$ROCM_VERSION") -eq $(ver 7.0) ]]; then
rocm_baseurl="https://repo.radeon.com/rocm/apt/7.0_alpha2"
amdgpu_baseurl="https://repo.radeon.com/amdgpu/30.10_alpha2/ubuntu"
fi
# Add amdgpu repository # Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'` UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list echo "deb [arch=amd64] https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
# Add rocm repository # Add rocm repository
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
local rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
echo "deb [arch=amd64] ${rocm_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/rocm.list echo "deb [arch=amd64] ${rocm_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/rocm.list
apt-get update --allow-insecure-repositories apt-get update --allow-insecure-repositories
@ -82,33 +63,17 @@ EOF
done done
# ROCm 6.3 had a regression where initializing static code objects had significant overhead # ROCm 6.3 had a regression where initializing static code objects had significant overhead
# CI no longer builds for ROCm 6.3, but if [[ $(ver $ROCM_VERSION) -eq $(ver 6.3) ]]; then
# ROCm 6.4 did not yet fix the regression, also HIP branch names are different
if [[ $(ver $ROCM_VERSION) -ge $(ver 6.4) ]] && [[ $(ver $ROCM_VERSION) -lt $(ver 7.0) ]]; then
if [[ $(ver $ROCM_VERSION) -eq $(ver 6.4.2) ]]; then
HIP_TAG=rocm-6.4.2
CLR_HASH=74d78ba3ac4bac235d02bcb48511c30b5cfdd457 # branch release/rocm-rel-6.4.2-statco-hotfix
elif [[ $(ver $ROCM_VERSION) -eq $(ver 6.4.1) ]]; then
HIP_TAG=rocm-6.4.1
CLR_HASH=efe6c35790b9206923bfeed1209902feff37f386 # branch release/rocm-rel-6.4.1-statco-hotfix
elif [[ $(ver $ROCM_VERSION) -eq $(ver 6.4) ]]; then
HIP_TAG=rocm-6.4.0
CLR_HASH=600f5b0d2baed94d5121e2174a9de0851b040b0c # branch release/rocm-rel-6.4-statco-hotfix
fi
# clr build needs CppHeaderParser but can only find it using conda's python # clr build needs CppHeaderParser but can only find it using conda's python
python -m pip install CppHeaderParser /opt/conda/bin/python -m pip install CppHeaderParser
git clone https://github.com/ROCm/HIP -b $HIP_TAG git clone https://github.com/ROCm/HIP -b rocm-6.3.x
HIP_COMMON_DIR=$(readlink -f HIP) HIP_COMMON_DIR=$(readlink -f HIP)
git clone https://github.com/jeffdaily/clr git clone https://github.com/jeffdaily/clr -b release/rocm-rel-6.3-statco-hotfix
pushd clr
git checkout $CLR_HASH
popd
mkdir -p clr/build mkdir -p clr/build
pushd clr/build pushd clr/build
# Need to point CMake to the correct python installation to find CppHeaderParser cmake .. -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=$HIP_COMMON_DIR
cmake .. -DPython3_EXECUTABLE=/opt/conda/envs/py_${ANACONDA_PYTHON_VERSION}/bin/python3 -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=$HIP_COMMON_DIR
make -j make -j
cp hipamd/lib/libamdhip64.so.6.4.* /opt/rocm/lib/libamdhip64.so.6.4.* cp hipamd/lib/libamdhip64.so.6.3.* /opt/rocm/lib/libamdhip64.so.6.3.*
popd popd
rm -rf HIP clr rm -rf HIP clr
fi fi
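The ROCm branch selection in this hunk compares versions through a ver() helper defined earlier in the script (it appears in the hunk context but not in the excerpt). A hypothetical sketch of a typical implementation, assumed here only for illustration: it zero-pads each dot-separated component so versions compare numerically.

```
# Hypothetical ver() helper assumed by the checks above (not taken from the diff)
ver() {
    printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ')
}
if [[ $(ver 6.4.2) -ge $(ver 6.4) ]] && [[ $(ver 6.4.2) -lt $(ver 7.0) ]]; then
    echo "6.4.2 falls in the [6.4, 7.0) range handled above"
fi
```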

View File

@ -1,37 +1,50 @@
#!/usr/bin/env bash #!/bin/bash
# Script used only in CD pipeline # Script used in CI and CD pipeline
set -eou pipefail set -ex
function do_install() { # Magma build scripts need `python`
rocm_version=$1 ln -sf /usr/bin/python3 /usr/bin/python
if [[ ${rocm_version} =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
# chop off any patch version
rocm_version="${rocm_version%.*}"
fi
rocm_version_nodot=${rocm_version//./} ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
almalinux)
yum install -y gcc-gfortran
;;
*)
echo "No preinstalls to build magma..."
;;
esac
# Version 2.7.2 + ROCm related updates MKLROOT=${MKLROOT:-/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION}
MAGMA_VERSION=a1625ff4d9bc362906bd01f805dbbe12612953f6
magma_archive="magma-rocm${rocm_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"
rocm_dir="/opt/rocm" # "install" hipMAGMA into /opt/rocm/magma by copying after build
( git clone https://bitbucket.org/icl/magma.git
set -x pushd magma
tmp_dir=$(mktemp -d)
pushd ${tmp_dir}
curl -OLs https://ossci-linux.s3.us-east-1.amazonaws.com/${magma_archive}
if tar -xvf "${magma_archive}"
then
mkdir -p "${rocm_dir}/magma"
mv include "${rocm_dir}/magma/include"
mv lib "${rocm_dir}/magma/lib"
else
echo "${magma_archive} not found, skipping magma install"
fi
popd
)
}
do_install $1 # Version 2.7.2 + ROCm related updates
git checkout a1625ff4d9bc362906bd01f805dbbe12612953f6
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
if [[ -f "${MKLROOT}/lib/libmkl_core.a" ]]; then
echo 'LIB = -Wl,--start-group -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -Wl,--end-group -lpthread -lstdc++ -lm -lgomp -lhipblas -lhipsparse' >> make.inc
fi
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib -ldl' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --offload-arch=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT="${MKLROOT}"
make testing/testing_dgemm -j $(nproc) MKLROOT="${MKLROOT}"
popd
mv magma /opt/rocm
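The make.inc generation above derives the --offload-arch flags either from PYTORCH_ROCM_ARCH or from rocm_agent_enumerator output. A standalone sketch of that expansion, with an illustrative arch list:

```
# Sketch: turning a semicolon-separated arch list into per-arch offload flags
PYTORCH_ROCM_ARCH="gfx90a;gfx942"
amdgpu_targets=$(echo "$PYTORCH_ROCM_ARCH" | sed 's/;/ /g')
for arch in $amdgpu_targets; do
    echo "DEVCCFLAGS += --offload-arch=$arch"
done
# -> DEVCCFLAGS += --offload-arch=gfx90a
# -> DEVCCFLAGS += --offload-arch=gfx942
```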

View File

@ -0,0 +1,24 @@
#!/bin/bash
set -ex
[ -n "${SWIFTSHADER}" ]
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
_https_amazon_aws=https://ossci-android.s3.amazonaws.com
# SwiftShader
_swiftshader_dir=/var/lib/jenkins/swiftshader
_swiftshader_file_targz=swiftshader-abe07b943-prebuilt.tar.gz
mkdir -p $_swiftshader_dir
_tmp_swiftshader_targz="/tmp/${_swiftshader_file_targz}"
curl --silent --show-error --location --fail --retry 3 \
--output "${_tmp_swiftshader_targz}" "$_https_amazon_aws/${_swiftshader_file_targz}"
tar -C "${_swiftshader_dir}" -xzf "${_tmp_swiftshader_targz}"
export VK_ICD_FILENAMES="${_swiftshader_dir}/build/Linux/vk_swiftshader_icd.json"
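The retry helper above re-runs a failing command with back-off delays of 1, 2, 4 and 8 seconds before giving up. A self-contained, hypothetical usage example (the URL is illustrative):

```
# Sketch: using the retry helper for a flaky download (URL is made up)
retry () {
    $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry curl --silent --show-error --fail --output /tmp/asset.tar.gz \
    "https://example.com/asset.tar.gz"
```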

View File

@ -2,16 +2,14 @@
set -ex set -ex
mkdir -p /opt/triton
if [ -z "${TRITON}" ] && [ -z "${TRITON_CPU}" ]; then
echo "TRITON and TRITON_CPU are not set. Exiting..."
exit 0
fi
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh" source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
get_pip_version() { get_conda_version() {
conda_run pip list | grep -w $* | head -n 1 | awk '{print $2}' as_jenkins conda list -n py_$ANACONDA_PYTHON_VERSION | grep -w $* | head -n 1 | awk '{print $2}'
}
conda_reinstall() {
as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y --force-reinstall $*
} }
if [ -n "${XPU_VERSION}" ]; then if [ -n "${XPU_VERSION}" ]; then
@ -33,9 +31,11 @@ if [ -n "${UBUNTU_VERSION}" ];then
apt-get install -y gpg-agent apt-get install -y gpg-agent
fi fi
# Keep the current cmake and numpy version here, so we can reinstall them later if [ -n "${CONDA_CMAKE}" ]; then
CMAKE_VERSION=$(get_pip_version cmake) # Keep the current cmake and numpy version here, so we can reinstall them later
NUMPY_VERSION=$(get_pip_version numpy) CMAKE_VERSION=$(get_conda_version cmake)
NUMPY_VERSION=$(get_conda_version numpy)
fi
if [ -z "${MAX_JOBS}" ]; then if [ -z "${MAX_JOBS}" ]; then
export MAX_JOBS=$(nproc) export MAX_JOBS=$(nproc)
@ -51,13 +51,7 @@ as_jenkins git clone --recursive ${TRITON_REPO} triton
cd triton cd triton
as_jenkins git checkout ${TRITON_PINNED_COMMIT} as_jenkins git checkout ${TRITON_PINNED_COMMIT}
as_jenkins git submodule update --init --recursive as_jenkins git submodule update --init --recursive
cd python
# Old versions of python have setup.py in ./python; newer versions have it in ./
if [ ! -f setup.py ]; then
cd python
fi
pip_install pybind11==2.13.6
# TODO: remove patch setup.py once we have a proper fix for https://github.com/triton-lang/triton/issues/4527 # TODO: remove patch setup.py once we have a proper fix for https://github.com/triton-lang/triton/issues/4527
as_jenkins sed -i -e 's/https:\/\/tritonlang.blob.core.windows.net\/llvm-builds/https:\/\/oaitriton.blob.core.windows.net\/public\/llvm-builds/g' setup.py as_jenkins sed -i -e 's/https:\/\/tritonlang.blob.core.windows.net\/llvm-builds/https:\/\/oaitriton.blob.core.windows.net\/public\/llvm-builds/g' setup.py
@ -66,42 +60,28 @@ if [ -n "${UBUNTU_VERSION}" ] && [ -n "${GCC_VERSION}" ] && [[ "${GCC_VERSION}"
# Triton needs at least gcc-9 to build # Triton needs at least gcc-9 to build
apt-get install -y g++-9 apt-get install -y g++-9
CXX=g++-9 conda_run python setup.py bdist_wheel CXX=g++-9 pip_install .
elif [ -n "${UBUNTU_VERSION}" ] && [ -n "${CLANG_VERSION}" ]; then elif [ -n "${UBUNTU_VERSION}" ] && [ -n "${CLANG_VERSION}" ]; then
# Triton needs <filesystem> which surprisingly is not available with clang-9 toolchain # Triton needs <filesystem> which surprisingly is not available with clang-9 toolchain
add-apt-repository -y ppa:ubuntu-toolchain-r/test add-apt-repository -y ppa:ubuntu-toolchain-r/test
apt-get install -y g++-9 apt-get install -y g++-9
CXX=g++-9 conda_run python setup.py bdist_wheel CXX=g++-9 pip_install .
else else
conda_run python setup.py bdist_wheel pip_install .
fi fi
# Copy the wheel to /opt for multi stage docker builds if [ -n "${CONDA_CMAKE}" ]; then
cp dist/*.whl /opt/triton # TODO: This is to make sure that the same cmake and numpy version from install conda
# Install the wheel for docker builds that don't use multi stage # script is used. Without this step, the newer cmake version (3.25.2) downloaded by
pip_install dist/*.whl # triton build step via pip will fail to detect conda MKL. Once that issue is fixed,
# this can be removed.
# TODO: This is to make sure that the same cmake and numpy version from install conda #
# script is used. Without this step, the newer cmake version (3.25.2) downloaded by # The correct numpy version also needs to be set here because conda claims that it
# triton build step via pip will fail to detect conda MKL. Once that issue is fixed, # causes inconsistent environment. Without this, conda will attempt to install the
# this can be removed. # latest numpy version, which fails ASAN tests with the following import error: Numba
# # needs NumPy 1.20 or less.
# The correct numpy version also needs to be set here because conda claims that it conda_reinstall cmake="${CMAKE_VERSION}"
# causes inconsistent environment. Without this, conda will attempt to install the # Note that we install numpy with pip as conda might not have the version we want
# latest numpy version, which fails ASAN tests with the following import error: Numba pip_install --force-reinstall numpy=="${NUMPY_VERSION}"
# needs NumPy 1.20 or less.
# Note that we install numpy with pip as conda might not have the version we want
if [ -n "${CMAKE_VERSION}" ]; then
pip_install "cmake==${CMAKE_VERSION}"
fi
if [ -n "${NUMPY_VERSION}" ]; then
pip_install "numpy==${NUMPY_VERSION}"
fi
# IMPORTANT: helion needs to be installed without dependencies.
# It depends on torch and triton. We don't want to install
# triton and torch from production on Docker CI images
if [[ "$ANACONDA_PYTHON_VERSION" != 3.9* ]]; then
pip_install helion --no-deps
fi fi
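The new flow above records the pre-existing cmake and numpy versions with get_pip_version before building Triton, then re-pins the same versions afterwards. A minimal standalone sketch of that capture-then-repin pattern, using plain pip instead of the conda_run/pip_install wrappers from the CI helpers:

```
# Sketch: capture a package's version before a build step, re-pin it afterwards
get_pip_version() {
    pip list | grep -w "$1" | head -n 1 | awk '{print $2}'
}
CMAKE_VERSION=$(get_pip_version cmake)
# ... build step that may pull in a different cmake ...
if [ -n "${CMAKE_VERSION}" ]; then
    pip install "cmake==${CMAKE_VERSION}"
fi
```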

View File

@ -0,0 +1,24 @@
#!/bin/bash
set -ex
[ -n "${VULKAN_SDK_VERSION}" ]
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
_vulkansdk_dir=/var/lib/jenkins/vulkansdk
_tmp_vulkansdk_targz=/tmp/vulkansdk.tar.gz
curl \
--silent \
--show-error \
--location \
--fail \
--retry 3 \
--output "${_tmp_vulkansdk_targz}" "https://ossci-android.s3.amazonaws.com/vulkansdk-linux-x86_64-${VULKAN_SDK_VERSION}.tar.gz"
mkdir -p "${_vulkansdk_dir}"
tar -C "${_vulkansdk_dir}" -xzf "${_tmp_vulkansdk_targz}" --strip-components 1
rm -rf "${_tmp_vulkansdk_targz}"

View File

@ -26,7 +26,7 @@ function install_ubuntu() {
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \ wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
| gpg --dearmor > /usr/share/keyrings/oneapi-archive-keyring.gpg.gpg | gpg --dearmor > /usr/share/keyrings/oneapi-archive-keyring.gpg.gpg
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg.gpg] \ echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg.gpg] \
https://apt.repos.intel.com/oneapi all main" \ https://apt.repos.intel.com/${XPU_REPO_NAME} all main" \
| tee /etc/apt/sources.list.d/oneAPI.list | tee /etc/apt/sources.list.d/oneAPI.list
# Update the packages list and repository index # Update the packages list and repository index
@ -56,10 +56,14 @@ function install_ubuntu() {
function install_rhel() { function install_rhel() {
. /etc/os-release . /etc/os-release
if [[ "${ID}" == "rhel" ]]; then
if [[ ! " 8.8 8.10 9.0 9.2 9.3 " =~ " ${VERSION_ID} " ]]; then if [[ ! " 8.8 8.9 9.0 9.2 9.3 " =~ " ${VERSION_ID} " ]]; then
echo "RHEL version ${VERSION_ID} not supported" echo "RHEL version ${VERSION_ID} not supported"
exit exit
fi
elif [[ "${ID}" == "almalinux" ]]; then
# Workaround for almalinux8 which used by quay.io/pypa/manylinux_2_28_x86_64
VERSION_ID="8.8"
fi fi
dnf install -y 'dnf-command(config-manager)' dnf install -y 'dnf-command(config-manager)'
@ -70,7 +74,7 @@ function install_rhel() {
tee > /etc/yum.repos.d/oneAPI.repo << EOF tee > /etc/yum.repos.d/oneAPI.repo << EOF
[oneAPI] [oneAPI]
name=Intel for Pytorch GPU dev repository name=Intel for Pytorch GPU dev repository
baseurl=https://yum.repos.intel.com/oneapi baseurl=https://yum.repos.intel.com/${XPU_REPO_NAME}
enabled=1 enabled=1
gpgcheck=1 gpgcheck=1
repo_gpgcheck=1 repo_gpgcheck=1
@ -114,7 +118,7 @@ function install_sles() {
https://repositories.intel.com/gpu/sles/${VERSION_SP}${XPU_DRIVER_VERSION}/unified/intel-gpu-${VERSION_SP}.repo https://repositories.intel.com/gpu/sles/${VERSION_SP}${XPU_DRIVER_VERSION}/unified/intel-gpu-${VERSION_SP}.repo
rpm --import https://repositories.intel.com/gpu/intel-graphics.key rpm --import https://repositories.intel.com/gpu/intel-graphics.key
# To add the online network package repository for the Intel Support Packages
zypper addrepo https://yum.repos.intel.com/oneapi oneAPI zypper addrepo https://yum.repos.intel.com/${XPU_REPO_NAME} oneAPI
rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
# The xpu-smi packages # The xpu-smi packages
@ -137,10 +141,10 @@ if [[ "${XPU_DRIVER_TYPE,,}" == "rolling" ]]; then
XPU_DRIVER_VERSION="" XPU_DRIVER_VERSION=""
fi fi
# Default use Intel® oneAPI Deep Learning Essentials 2025.0 XPU_REPO_NAME="intel-for-pytorch-gpu-dev"
if [[ "$XPU_VERSION" == "2025.1" ]]; then XPU_PACKAGES="intel-for-pytorch-gpu-dev-0.5 intel-pti-dev-0.9"
XPU_PACKAGES="intel-deep-learning-essentials-2025.1" if [[ "$XPU_VERSION" == "2025.0" ]]; then
else XPU_REPO_NAME="oneapi"
XPU_PACKAGES="intel-deep-learning-essentials-2025.0" XPU_PACKAGES="intel-deep-learning-essentials-2025.0"
fi fi
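The rolling-driver check earlier in this file uses the bash ${VAR,,} expansion, which lower-cases the value before comparing it. A quick illustration:

```
# ${XPU_DRIVER_TYPE,,} lower-cases the value, so any capitalisation of "rolling" matches
XPU_DRIVER_TYPE="ROLLING"
if [[ "${XPU_DRIVER_TYPE,,}" == "rolling" ]]; then
    echo "rolling driver selected"
fi
```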

View File

@ -49,11 +49,18 @@ RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM cpu as cuda FROM cpu as cuda
ADD ./common/install_cuda.sh install_cuda.sh ADD ./common/install_cuda.sh install_cuda.sh
ADD ./common/install_magma.sh install_magma.sh ADD ./common/install_magma.sh install_magma.sh
COPY ./common/install_nccl.sh install_nccl.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
ENV CUDA_HOME /usr/local/cuda ENV CUDA_HOME /usr/local/cuda
FROM cuda as cuda11.8
RUN bash ./install_cuda.sh 11.8
RUN bash ./install_magma.sh 11.8
RUN ln -sf /usr/local/cuda-11.8 /usr/local/cuda
FROM cuda as cuda12.4
RUN bash ./install_cuda.sh 12.4
RUN bash ./install_magma.sh 12.4
RUN ln -sf /usr/local/cuda-12.4 /usr/local/cuda
FROM cuda as cuda12.6 FROM cuda as cuda12.6
RUN bash ./install_cuda.sh 12.6 RUN bash ./install_cuda.sh 12.6
RUN bash ./install_magma.sh 12.6 RUN bash ./install_magma.sh 12.6
@ -64,13 +71,7 @@ RUN bash ./install_cuda.sh 12.8
RUN bash ./install_magma.sh 12.8 RUN bash ./install_magma.sh 12.8
RUN ln -sf /usr/local/cuda-12.8 /usr/local/cuda RUN ln -sf /usr/local/cuda-12.8 /usr/local/cuda
FROM cuda as cuda12.9
RUN bash ./install_cuda.sh 12.9
RUN bash ./install_magma.sh 12.9
RUN ln -sf /usr/local/cuda-12.9 /usr/local/cuda
FROM cpu as rocm FROM cpu as rocm
ARG ROCM_VERSION
ARG PYTORCH_ROCM_ARCH ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH} ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ENV MKLROOT /opt/intel ENV MKLROOT /opt/intel
@ -85,11 +86,11 @@ ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
# gfortran and python needed for building magma from source for ROCm # gfortran and python needed for building magma from source for ROCm
RUN apt-get update -y && \ RUN apt-get update -y && \
apt-get install gfortran -y && \ apt-get install gfortran -y && \
apt-get install python3 python-is-python3 -y && \ apt-get install python -y && \
apt-get clean apt-get clean
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION} && rm install_rocm_magma.sh RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
FROM ${BASE_TARGET} as final FROM ${BASE_TARGET} as final
COPY --from=openssl /opt/openssl /opt/openssl COPY --from=openssl /opt/openssl /opt/openssl
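The stages above are selected at build time by pointing --target at the final stage and passing BASE_TARGET for the desired variant, which is what the wrapper script in the next file does. A hedged, illustrative invocation (the tag name is made up):

```
# Illustrative build of one CUDA variant of the libtorch image (tag name hypothetical)
DOCKER_BUILDKIT=1 docker build \
    --target final \
    --build-arg "BASE_TARGET=cuda12.8" \
    --build-arg "GPU_IMAGE=ubuntu:20.04" \
    -t libtorch-builder:cuda12.8 \
    -f .ci/docker/libtorch/Dockerfile \
    .ci/docker/
```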

View File

@ -1,67 +1,83 @@
#!/usr/bin/env bash #!/usr/bin/env bash
# Script used only in CD pipeline # Script used only in CD pipeline
set -eoux pipefail set -eou pipefail
image="$1" image="$1"
shift shift
if [ -z "${image}" ]; then if [ -z "${image}" ]; then
echo "Usage: $0 IMAGENAME:ARCHTAG" echo "Usage: $0 IMAGE"
exit 1 exit 1
fi fi
DOCKER_IMAGE="pytorch/${image}"
TOPDIR=$(git rev-parse --show-toplevel) TOPDIR=$(git rev-parse --show-toplevel)
GPU_ARCH_TYPE=${GPU_ARCH_TYPE:-cpu}
GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}
WITH_PUSH=${WITH_PUSH:-}
DOCKER=${DOCKER:-docker} DOCKER=${DOCKER:-docker}
# Go from imagename:tag to tag case ${GPU_ARCH_TYPE} in
DOCKER_TAG_PREFIX=$(echo "${image}" | awk -F':' '{print $2}')
GPU_ARCH_VERSION=""
if [[ "${DOCKER_TAG_PREFIX}" == cuda* ]]; then
# extract cuda version from image name. e.g. manylinux2_28-builder:cuda12.8 returns 12.8
GPU_ARCH_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'cuda' '{print $2}')
elif [[ "${DOCKER_TAG_PREFIX}" == rocm* ]]; then
# extract rocm version from image name. e.g. manylinux2_28-builder:rocm6.2.4 returns 6.2.4
GPU_ARCH_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'rocm' '{print $2}')
fi
case ${DOCKER_TAG_PREFIX} in
cpu) cpu)
BASE_TARGET=cpu BASE_TARGET=cpu
DOCKER_TAG=cpu
GPU_IMAGE=ubuntu:20.04 GPU_IMAGE=ubuntu:20.04
DOCKER_GPU_BUILD_ARG="" DOCKER_GPU_BUILD_ARG=""
;; ;;
cuda*) cuda)
BASE_TARGET=cuda${GPU_ARCH_VERSION} BASE_TARGET=cuda${GPU_ARCH_VERSION}
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
GPU_IMAGE=ubuntu:20.04 GPU_IMAGE=ubuntu:20.04
DOCKER_GPU_BUILD_ARG="" DOCKER_GPU_BUILD_ARG=""
;; ;;
rocm*) rocm)
# we want the patch version of 6.4 instead
if [[ $(ver $GPU_ARCH_VERSION) -eq $(ver 6.4) ]]; then
GPU_ARCH_VERSION="${GPU_ARCH_VERSION}.2"
fi
BASE_TARGET=rocm BASE_TARGET=rocm
GPU_IMAGE=rocm/dev-ubuntu-22.04:${GPU_ARCH_VERSION}-complete DOCKER_TAG=rocm${GPU_ARCH_VERSION}
GPU_IMAGE=rocm/dev-ubuntu-20.04:${GPU_ARCH_VERSION}-complete
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201" PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
DOCKER_GPU_BUILD_ARG="--build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg ROCM_VERSION=${GPU_ARCH_VERSION}" DOCKER_GPU_BUILD_ARG="--build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}"
;; ;;
*) *)
echo "ERROR: Unrecognized DOCKER_TAG_PREFIX: ${DOCKER_TAG_PREFIX}" echo "ERROR: Unrecognized GPU_ARCH_TYPE: ${GPU_ARCH_TYPE}"
exit 1 exit 1
;; ;;
esac esac
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
DOCKER_BUILDKIT=1 ${DOCKER} build \ (
--target final \ set -x
${DOCKER_GPU_BUILD_ARG} \ DOCKER_BUILDKIT=1 ${DOCKER} build \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \ --target final \
--build-arg "BASE_TARGET=${BASE_TARGET}" \ ${DOCKER_GPU_BUILD_ARG} \
-t "${tmp_tag}" \ --build-arg "GPU_IMAGE=${GPU_IMAGE}" \
$@ \ --build-arg "BASE_TARGET=${BASE_TARGET}" \
-f "${TOPDIR}/.ci/docker/libtorch/Dockerfile" \ -t "${DOCKER_IMAGE}" \
"${TOPDIR}/.ci/docker/" $@ \
-f "${TOPDIR}/.ci/docker/libtorch/Dockerfile" \
"${TOPDIR}/.ci/docker/"
)
GITHUB_REF=${GITHUB_REF:-$(git symbolic-ref -q HEAD || git describe --tags --exact-match)}
GIT_BRANCH_NAME=${GITHUB_REF##*/}
GIT_COMMIT_SHA=${GITHUB_SHA:-$(git rev-parse HEAD)}
DOCKER_IMAGE_BRANCH_TAG=${DOCKER_IMAGE}-${GIT_BRANCH_NAME}
DOCKER_IMAGE_SHA_TAG=${DOCKER_IMAGE}-${GIT_COMMIT_SHA}
if [[ "${WITH_PUSH}" == true ]]; then
(
set -x
${DOCKER} push "${DOCKER_IMAGE}"
if [[ -n ${GITHUB_REF} ]]; then
${DOCKER} tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_BRANCH_TAG}
${DOCKER} tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_SHA_TAG}
${DOCKER} push "${DOCKER_IMAGE_BRANCH_TAG}"
${DOCKER} push "${DOCKER_IMAGE_SHA_TAG}"
fi
)
fi
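The rewritten script derives the GPU flavour and version straight from the image tag instead of separate GPU_ARCH_TYPE/GPU_ARCH_VERSION inputs. A standalone sketch of the awk-based parsing, with an illustrative image name:

```
# Sketch: extracting the tag and the CUDA version from IMAGENAME:ARCHTAG
image="libtorch-builder:cuda12.8"                                            # illustrative name
DOCKER_TAG_PREFIX=$(echo "${image}" | awk -F':' '{print $2}')                # -> cuda12.8
GPU_ARCH_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'cuda' '{print $2}')  # -> 12.8
echo "tag=${DOCKER_TAG_PREFIX} version=${GPU_ARCH_VERSION}"
```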

View File

@ -18,31 +18,28 @@ COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest) # Install conda and other packages (e.g., numpy, pytest)
ARG PYTHON_VERSION ARG ANACONDA_PYTHON_VERSION
ARG PIP_CMAKE ARG CONDA_CMAKE
# Put venv into the env vars so users don't need to activate it ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /var/lib/jenkins/ci_env/bin:$PATH ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ENV VIRTUAL_ENV /var/lib/jenkins/ci_env COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY requirements-ci.txt /opt/requirements-ci.txt COPY ./common/install_conda.sh install_conda.sh
COPY ./common/install_python.sh install_python.sh COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_python.sh && rm install_python.sh /opt/requirements-ci.txt COPY ./common/install_magma_conda.sh install_magma_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh install_magma_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Install cuda and cudnn # Install cuda and cudnn
ARG CUDA_VERSION ARG CUDA_VERSION
COPY ./common/install_cuda.sh install_cuda.sh COPY ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh install_nccl.sh /ci_commit_pins/nccl-cu* install_cusparselt.sh
ENV DESIRED_CUDA ${CUDA_VERSION} ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
# Note that Docker build forbids copying file outside the build context # Note that Docker build forbids copying file outside the build context
COPY ./common/install_linter.sh install_linter.sh COPY ./common/install_linter.sh install_linter.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_linter.sh RUN bash ./install_linter.sh
RUN rm install_linter.sh RUN rm install_linter.sh common_utils.sh
RUN chown -R jenkins:jenkins /var/lib/jenkins/ci_env
USER jenkins USER jenkins
CMD ["bash"] CMD ["bash"]

View File

@ -15,19 +15,20 @@ COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest) # Install conda and other packages (e.g., numpy, pytest)
ARG PYTHON_VERSION ARG ANACONDA_PYTHON_VERSION
ENV PATH /var/lib/jenkins/ci_env/bin:$PATH ARG CONDA_CMAKE
ENV VIRTUAL_ENV /var/lib/jenkins/ci_env ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
COPY requirements-ci.txt /opt/requirements-ci.txt ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY ./common/install_python.sh install_python.sh COPY requirements-ci.txt /opt/conda/requirements-ci.txt
RUN bash ./install_python.sh && rm install_python.sh /opt/requirements-ci.txt COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Note that Docker build forbids copying file outside the build context # Note that Docker build forbids copying file outside the build context
COPY ./common/install_linter.sh install_linter.sh COPY ./common/install_linter.sh install_linter.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_linter.sh RUN bash ./install_linter.sh
RUN rm install_linter.sh RUN rm install_linter.sh common_utils.sh
RUN chown -R jenkins:jenkins /var/lib/jenkins/ci_env
USER jenkins USER jenkins
CMD ["bash"] CMD ["bash"]

View File

@ -0,0 +1,200 @@
# syntax = docker/dockerfile:experimental
ARG ROCM_VERSION=3.7
ARG BASE_CUDA_VERSION=11.8
ARG GPU_IMAGE=centos:7
FROM centos:7 as base
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ARG DEVTOOLSET_VERSION=9
# Note: This is a required patch since CentOS has reached EOL;
# otherwise any yum install step will fail
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y wget curl perl util-linux xz bzip2 git patch which perl zlib-devel
# Just add everything as a safe.directory for git since these will be used in multiple places with git
RUN git config --global --add safe.directory '*'
RUN yum install -y yum-utils centos-release-scl
RUN yum-config-manager --enable rhel-server-rhscl-7-rpms
# Note: After running yum-config-manager --enable rhel-server-rhscl-7-rpms
# patch is required once again. Somehow this step adds mirror.centos.org
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y devtoolset-${DEVTOOLSET_VERSION}-gcc devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran devtoolset-${DEVTOOLSET_VERSION}-binutils
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
RUN yum --enablerepo=extras install -y epel-release
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake
RUN yum install -y autoconf aclocal automake make sudo
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
# EPEL for cmake
FROM base as patchelf
# Install patchelf
ADD ./common/install_patchelf.sh install_patchelf.sh
RUN bash ./install_patchelf.sh && rm install_patchelf.sh
RUN cp $(which patchelf) /patchelf
FROM patchelf as python
# build python
COPY manywheel/build_scripts /build_scripts
ADD ./common/install_cpython.sh /build_scripts/install_cpython.sh
RUN bash build_scripts/build.sh && rm -r build_scripts
FROM base as cuda
ARG BASE_CUDA_VERSION=10.2
# Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
FROM base as intel
# MKL
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM base as magma
ARG BASE_CUDA_VERSION=10.2
# Install magma
ADD ./common/install_magma.sh install_magma.sh
RUN bash ./install_magma.sh ${BASE_CUDA_VERSION} && rm install_magma.sh
FROM base as jni
# Install java jni header
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
FROM base as libpng
# Install libpng
ADD ./common/install_libpng.sh install_libpng.sh
RUN bash ./install_libpng.sh && rm install_libpng.sh
FROM ${GPU_IMAGE} as common
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
RUN yum install -y \
aclocal \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
yasm
RUN yum install -y \
https://repo.ius.io/ius-release-el7.rpm \
https://ossci-linux.s3.amazonaws.com/epel-release-7-14.noarch.rpm
RUN yum swap -y git git236-core
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
# Install LLVM version
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
COPY --from=python /opt/python/cp39-cp39/bin/auditwheel /usr/local/bin/auditwheel
COPY --from=intel /opt/intel /opt/intel
COPY --from=patchelf /usr/local/bin/patchelf /usr/local/bin/patchelf
COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h
COPY --from=libpng /usr/local/bin/png* /usr/local/bin/
COPY --from=libpng /usr/local/bin/libpng* /usr/local/bin/
COPY --from=libpng /usr/local/include/png* /usr/local/include/
COPY --from=libpng /usr/local/include/libpng* /usr/local/include/
COPY --from=libpng /usr/local/lib/libpng* /usr/local/lib/
COPY --from=libpng /usr/local/lib/pkgconfig /usr/local/lib/pkgconfig
FROM common as cpu_final
ARG BASE_CUDA_VERSION=10.1
ARG DEVTOOLSET_VERSION=9
# Install Anaconda
ADD ./common/install_conda_docker.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
ENV PATH /opt/conda/bin:$PATH
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y yum-utils centos-release-scl
RUN yum-config-manager --enable rhel-server-rhscl-7-rpms
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y devtoolset-${DEVTOOLSET_VERSION}-gcc devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran devtoolset-${DEVTOOLSET_VERSION}-binutils
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# cmake is already installed inside the rocm base image, so remove if present
RUN rpm -e cmake || true
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake
# ninja
RUN yum install -y ninja-build
FROM cpu_final as cuda_final
RUN rm -rf /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=cuda /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=magma /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
RUN ln -sf /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda
ENV PATH=/usr/local/cuda/bin:$PATH
FROM cpu_final as rocm_final
ARG ROCM_VERSION=3.7
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Adding ROCM_PATH env var so that LoadHip.cmake (even with logic updated for ROCm6.0)
# can still find HIP for ROCm5.7. Not needed for ROCm6.0 and above.
# Remove below when ROCm5.7 is not in support matrix anymore.
ENV ROCM_PATH /opt/rocm
ENV MKLROOT /opt/intel
# No need to install ROCm as base docker image should have full ROCm install
#ADD ./common/install_rocm.sh install_rocm.sh
#RUN ROCM_VERSION=${ROCM_VERSION} bash ./install_rocm.sh && rm install_rocm.sh
ADD ./common/install_rocm_drm.sh install_rocm_drm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
# cmake3 is needed for the MIOpen build
RUN ln -sf /usr/local/bin/cmake /usr/bin/cmake3
ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh
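The base stage above makes the devtoolset compilers the default by prepending their bin and lib directories to PATH and LD_LIBRARY_PATH. A quick sketch for confirming which toolchain is active inside the image, assuming DEVTOOLSET_VERSION=9 as in the Dockerfile:

```
# Sketch: confirm the devtoolset GCC is the one resolved on PATH
export PATH=/opt/rh/devtoolset-9/root/usr/bin:$PATH
export LD_LIBRARY_PATH=/opt/rh/devtoolset-9/root/usr/lib64:/opt/rh/devtoolset-9/root/usr/lib:$LD_LIBRARY_PATH
command -v gcc              # -> /opt/rh/devtoolset-9/root/usr/bin/gcc
gcc --version | head -n 1   # reports the devtoolset GCC, not the system one
```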

View File

@ -7,8 +7,8 @@ ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8 ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8 ENV LANGUAGE en_US.UTF-8
ARG DEVTOOLSET_VERSION=13 ARG DEVTOOLSET_VERSION=11
RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel yum-utils gcc-toolset-${DEVTOOLSET_VERSION}-gcc gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran gcc-toolset-${DEVTOOLSET_VERSION}-gdb RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel yum-utils gcc-toolset-${DEVTOOLSET_VERSION}-toolchain
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
@ -26,20 +26,17 @@ ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh RUN bash ./install_openssl.sh && rm install_openssl.sh
# remove unnecessary python versions # remove unncessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2 RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4 RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6 RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6 RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
FROM base as cuda FROM base as cuda
ARG BASE_CUDA_VERSION=12.6 ARG BASE_CUDA_VERSION=11.8
# Install CUDA # Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh ADD ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh install_nccl.sh ci_commit_pins/nccl-cu* install_cusparselt.sh
FROM base as intel FROM base as intel
# MKL # MKL
@ -47,7 +44,7 @@ ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM base as magma FROM base as magma
ARG BASE_CUDA_VERSION=12.6 ARG BASE_CUDA_VERSION=10.2
# Install magma # Install magma
ADD ./common/install_magma.sh install_magma.sh ADD ./common/install_magma.sh install_magma.sh
RUN bash ./install_magma.sh ${BASE_CUDA_VERSION} && rm install_magma.sh RUN bash ./install_magma.sh ${BASE_CUDA_VERSION} && rm install_magma.sh
@ -64,7 +61,7 @@ ADD ./common/install_libpng.sh install_libpng.sh
RUN bash ./install_libpng.sh && rm install_libpng.sh RUN bash ./install_libpng.sh && rm install_libpng.sh
FROM ${GPU_IMAGE} as common FROM ${GPU_IMAGE} as common
ARG DEVTOOLSET_VERSION=13 ARG DEVTOOLSET_VERSION=11
ENV LC_ALL en_US.UTF-8 ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8 ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8 ENV LANGUAGE en_US.UTF-8
@ -87,12 +84,13 @@ RUN yum install -y \
wget \ wget \
which \ which \
xz \ xz \
glibc-langpack-en \ gcc-toolset-${DEVTOOLSET_VERSION}-toolchain \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc \ glibc-langpack-en
gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ \ RUN yum install -y \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran \ https://repo.ius.io/ius-release-el7.rpm \
gcc-toolset-${DEVTOOLSET_VERSION}-gdb https://ossci-linux.s3.amazonaws.com/epel-release-7-14.noarch.rpm
RUN yum swap -y git git236-core
# git236+ would refuse to run git commands in repos owned by other users # git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image # Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe # Override this behaviour by treating every folder as safe
@ -103,7 +101,6 @@ ENV SSL_CERT_FILE=/opt/_internal/certs.pem
# Install LLVM version # Install LLVM version
COPY --from=openssl /opt/openssl /opt/openssl COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=base /opt/python /opt/python COPY --from=base /opt/python /opt/python
COPY --from=base /usr/local/lib/ /usr/local/lib/
COPY --from=base /opt/_internal /opt/_internal COPY --from=base /opt/_internal /opt/_internal
COPY --from=base /usr/local/bin/auditwheel /usr/local/bin/auditwheel COPY --from=base /usr/local/bin/auditwheel /usr/local/bin/auditwheel
COPY --from=intel /opt/intel /opt/intel COPY --from=intel /opt/intel /opt/intel
@ -117,8 +114,8 @@ COPY --from=libpng /usr/local/lib/pkgconfig /usr/local/
COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h
FROM common as cpu_final FROM common as cpu_final
ARG BASE_CUDA_VERSION=12.6 ARG BASE_CUDA_VERSION=11.8
ARG DEVTOOLSET_VERSION=13 ARG DEVTOOLSET_VERSION=11
# Install Anaconda # Install Anaconda
ADD ./common/install_conda_docker.sh install_conda.sh ADD ./common/install_conda_docker.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh RUN bash ./install_conda.sh && rm install_conda.sh
@ -157,14 +154,11 @@ ENV ROCM_PATH /opt/rocm
# and avoid 3.21.0 cmake+ninja issues with ninja inserting "-Wl,--no-as-needed" in LINK_FLAGS for static linker # and avoid 3.21.0 cmake+ninja issues with ninja inserting "-Wl,--no-as-needed" in LINK_FLAGS for static linker
RUN python3 -m pip install --upgrade pip && \ RUN python3 -m pip install --upgrade pip && \
python3 -mpip install cmake==3.28.4 python3 -mpip install cmake==3.28.4
# replace the libdrm in /opt/amdgpu with custom amdgpu.ids lookup path
ADD ./common/install_rocm_drm.sh install_rocm_drm.sh ADD ./common/install_rocm_drm.sh install_rocm_drm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
# ROCm 6.4 rocm-smi depends on system drm.h header
RUN yum install -y libdrm-devel
ENV MKLROOT /opt/intel ENV MKLROOT /opt/intel
ADD ./common/install_rocm_magma.sh install_rocm_magma.sh ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION} && rm install_rocm_magma.sh RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
ADD ./common/install_miopen.sh install_miopen.sh ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh
@ -175,6 +169,6 @@ ENV XPU_DRIVER_TYPE ROLLING
RUN python3 -m pip install --upgrade pip && \ RUN python3 -m pip install --upgrade pip && \
python3 -mpip install cmake==3.28.4 python3 -mpip install cmake==3.28.4
ADD ./common/install_xpu.sh install_xpu.sh ADD ./common/install_xpu.sh install_xpu.sh
ENV XPU_VERSION 2025.1 ENV XPU_VERSION 2025.0
RUN bash ./install_xpu.sh && rm install_xpu.sh RUN bash ./install_xpu.sh && rm install_xpu.sh
RUN pushd /opt/_internal && tar -xJf static-libs-for-embedding-only.tar.xz && popd RUN pushd /opt/_internal && tar -xJf static-libs-for-embedding-only.tar.xz && popd

View File

@ -1,8 +1,9 @@
FROM quay.io/pypa/manylinux_2_28_aarch64 as base FROM quay.io/pypa/manylinux_2_28_aarch64 as base
ARG GCCTOOLSET_VERSION=13 # Graviton needs GCC 10 or above for the build. GCC12 is the default version in almalinux-8.
ARG GCCTOOLSET_VERSION=11
# Language variables # Language variabes
ENV LC_ALL=en_US.UTF-8 ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8 ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8 ENV LANGUAGE=en_US.UTF-8
@ -35,10 +36,7 @@ RUN yum install -y \
yasm \ yasm \
zstd \ zstd \
sudo \ sudo \
gcc-toolset-${GCCTOOLSET_VERSION}-gcc \ gcc-toolset-${GCCTOOLSET_VERSION}-toolchain
gcc-toolset-${GCCTOOLSET_VERSION}-gcc-c++ \
gcc-toolset-${GCCTOOLSET_VERSION}-gcc-gfortran \
gcc-toolset-${GCCTOOLSET_VERSION}-gdb
# (optional) Install non-default Ninja version # (optional) Install non-default Ninja version
ARG NINJA_VERSION ARG NINJA_VERSION
@ -58,13 +56,12 @@ RUN git config --global --add safe.directory "*"
FROM base as openblas FROM base as openblas
# Install openblas # Install openblas
ARG OPENBLAS_VERSION
ADD ./common/install_openblas.sh install_openblas.sh ADD ./common/install_openblas.sh install_openblas.sh
RUN bash ./install_openblas.sh && rm install_openblas.sh RUN bash ./install_openblas.sh && rm install_openblas.sh
FROM base as final FROM base as final
# remove unnecessary python versions # remove unncessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2 RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4 RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6 RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6

View File

@ -0,0 +1,94 @@
FROM quay.io/pypa/manylinux2014_aarch64 as base
# Graviton needs GCC 10 for the build
ARG DEVTOOLSET_VERSION=10
# Language variables
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
# Install needed OS packages. This is to support all
# the binary builds (torch, vision, audio, text, data)
RUN yum -y install epel-release
RUN yum -y update
RUN yum install -y \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
yasm \
less \
zstd \
libgomp \
sudo \
devtoolset-${DEVTOOLSET_VERSION}-gcc \
devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ \
devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran \
devtoolset-${DEVTOOLSET_VERSION}-binutils
# Ensure the expected devtoolset is used
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
###############################################################################
# libgfortran.a hack
#
# libgfortran.a from quay.io/pypa/manylinux2014_aarch64 is not compiled with -fPIC.
# This causes a link error referencing __stack_chk_guard@@GLIBC_2.17 during the pytorch build. To solve, get
# ubuntu's libgfortran.a which is compiled with -fPIC
# NOTE: Need a better way to get this library as Ubuntu's package can be removed by the vendor, or changed
###############################################################################
RUN cd ~/ \
&& curl -L -o ~/libgfortran-10-dev.deb http://ports.ubuntu.com/ubuntu-ports/pool/universe/g/gcc-10/libgfortran-10-dev_10.5.0-4ubuntu2_arm64.deb \
&& ar x ~/libgfortran-10-dev.deb \
&& tar --use-compress-program=unzstd -xvf data.tar.zst -C ~/ \
&& cp -f ~/usr/lib/gcc/aarch64-linux-gnu/10/libgfortran.a /opt/rh/devtoolset-10/root/usr/lib/gcc/aarch64-redhat-linux/10/
# install cmake
RUN yum install -y cmake3 && \
ln -s /usr/bin/cmake3 /usr/bin/cmake
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
FROM base as openblas
# Install openblas
ADD ./common/install_openblas.sh install_openblas.sh
RUN bash ./install_openblas.sh && rm install_openblas.sh
FROM openssl as final
# remove unnecessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
COPY --from=openblas /opt/OpenBLAS/ /opt/OpenBLAS/
ENV LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH

View File

@ -1,7 +1,7 @@
FROM quay.io/pypa/manylinux_2_28_aarch64 as base FROM quay.io/pypa/manylinux_2_28_aarch64 as base
# Cuda ARM build needs gcc 11 # Cuda ARM build needs gcc 11
ARG DEVTOOLSET_VERSION=13 ARG DEVTOOLSET_VERSION=11
# Language variables # Language variables
ENV LC_ALL=en_US.UTF-8 ENV LC_ALL=en_US.UTF-8
@ -34,10 +34,7 @@ RUN yum install -y \
zstd \ zstd \
libgomp \ libgomp \
sudo \ sudo \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc \ gcc-toolset-${DEVTOOLSET_VERSION}-toolchain
gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ \
gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran \
gcc-toolset-${DEVTOOLSET_VERSION}-gdb
# Ensure the expected devtoolset is used # Ensure the expected devtoolset is used
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
@ -60,7 +57,7 @@ RUN bash ./install_openssl.sh && rm install_openssl.sh
ENV SSL_CERT_FILE=/opt/_internal/certs.pem ENV SSL_CERT_FILE=/opt/_internal/certs.pem
FROM openssl as final FROM openssl as final
# remove unnecessary python versions # remove unncessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2 RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4 RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6 RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
@ -69,11 +66,8 @@ RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
FROM base as cuda FROM base as cuda
ARG BASE_CUDA_VERSION ARG BASE_CUDA_VERSION
# Install CUDA # Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh ADD ./common/install_cuda_aarch64.sh install_cuda_aarch64.sh
COPY ./common/install_nccl.sh install_nccl.sh RUN bash ./install_cuda_aarch64.sh ${BASE_CUDA_VERSION} && rm install_cuda_aarch64.sh
COPY ./common/install_cusparselt.sh install_cusparselt.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh install_nccl.sh ci_commit_pins/nccl-cu* install_cusparselt.sh
FROM base as magma FROM base as magma
ARG BASE_CUDA_VERSION ARG BASE_CUDA_VERSION

View File

@ -5,9 +5,7 @@ ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8 ENV LANG=C.UTF-8
ENV LANGUAGE=C.UTF-8 ENV LANGUAGE=C.UTF-8
# there is a bugfix in gcc >= 14 for precompiled headers and s390x vectorization interaction. ARG DEVTOOLSET_VERSION=13
# with earlier gcc versions test/inductor/test_cpu_cpp_wrapper.py will fail.
ARG DEVTOOLSET_VERSION=14
# Installed needed OS packages. This is to support all # Installed needed OS packages. This is to support all
# the binary builds (torch, vision, audio, text, data) # the binary builds (torch, vision, audio, text, data)
RUN yum -y install epel-release RUN yum -y install epel-release
@ -44,7 +42,6 @@ RUN yum install -y \
llvm-devel \ llvm-devel \
libzstd-devel \ libzstd-devel \
python3.12-devel \ python3.12-devel \
python3.12-test \
python3.12-setuptools \ python3.12-setuptools \
python3.12-pip \ python3.12-pip \
python3-virtualenv \ python3-virtualenv \
@ -60,8 +57,7 @@ RUN yum install -y \
libxslt-devel \ libxslt-devel \
libxml2-devel \ libxml2-devel \
openssl-devel \ openssl-devel \
valgrind \ valgrind
ninja-build
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
@ -105,37 +101,24 @@ CMD ["/bin/bash"]
# install test dependencies: # install test dependencies:
# - grpcio requires system openssl, bundled crypto fails to build # - grpcio requires system openssl, bundled crypto fails to build
# - ml_dtypes 0.4.0 requires some fixes provided in later commits to build
RUN dnf install -y \ RUN dnf install -y \
hdf5-devel \ protobuf-devel \
python3-h5py \ protobuf-c-devel \
git protobuf-lite-devel \
wget \
patch
RUN env GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True pip3 install grpcio RUN env GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True pip3 install grpcio==1.65.4
RUN cd ~ && \
# cmake-3.28.0 from pip for onnxruntime git clone https://github.com/jax-ml/ml_dtypes && \
RUN python3 -mpip install cmake==3.28.0 cd ml_dtypes && \
git checkout v0.4.0 && \
# build onnxruntime 1.21.0 from sources.
# it is not possible to build it from sources using pip,
# so just build it from upstream repository.
# h5py is dependency of onnxruntime_training.
# h5py==3.11.0 builds with hdf5-devel 1.10.5 from repository.
# h5py 3.11.0 doesn't build with numpy >= 2.3.0.
# install newest flatbuffers version first:
# for some reason old version is getting pulled in otherwise.
# packaging package is required for onnxruntime wheel build.
RUN pip3 install flatbuffers && \
pip3 install cython 'pkgconfig>=1.5.5' 'setuptools>=77' 'numpy<2.3.0' && \
pip3 install --no-build-isolation h5py==3.11.0 && \
pip3 install packaging && \
git clone https://github.com/microsoft/onnxruntime && \
cd onnxruntime && git checkout v1.21.0 && \
git submodule update --init --recursive && \ git submodule update --init --recursive && \
wget https://github.com/microsoft/onnxruntime/commit/f57db79743c4d1a3553aa05cf95bcd10966030e6.patch && \ wget https://github.com/jax-ml/ml_dtypes/commit/b969f76914d6b30676721bc92bf0f6021a0d1321.patch && \
patch -p1 < f57db79743c4d1a3553aa05cf95bcd10966030e6.patch && \ wget https://github.com/jax-ml/ml_dtypes/commit/d4e6d035ecda073eab8bcf60f4eef572ee7087e6.patch && \
./build.sh --config Release --parallel 0 --enable_pybind \ patch -p1 < b969f76914d6b30676721bc92bf0f6021a0d1321.patch && \
--build_wheel --enable_training --enable_training_apis \ patch -p1 < d4e6d035ecda073eab8bcf60f4eef572ee7087e6.patch && \
--enable_training_ops --skip_tests --allow_running_as_root \ python3 setup.py bdist_wheel && \
--compile_no_warning_as_error && \ pip3 install dist/*.whl && \
pip3 install ./build/Linux/Release/dist/onnxruntime_training-*.whl && \ rm -rf ml_dtypes
cd .. && /bin/rm -rf ./onnxruntime

View File

@ -1,7 +1,7 @@
#!/usr/bin/env bash #!/usr/bin/env bash
# Script used only in CD pipeline # Script used only in CD pipeline
set -exou pipefail set -eou pipefail
TOPDIR=$(git rev-parse --show-toplevel) TOPDIR=$(git rev-parse --show-toplevel)
@ -9,115 +9,152 @@ image="$1"
shift shift
if [ -z "${image}" ]; then if [ -z "${image}" ]; then
echo "Usage: $0 IMAGE:ARCHTAG" echo "Usage: $0 IMAGE"
exit 1 exit 1
fi fi
# Go from imagename:tag to tag DOCKER_IMAGE="pytorch/${image}"
DOCKER_TAG_PREFIX=$(echo "${image}" | awk -F':' '{print $2}')
GPU_ARCH_VERSION="" DOCKER_REGISTRY="${DOCKER_REGISTRY:-docker.io}"
if [[ "${DOCKER_TAG_PREFIX}" == cuda* ]]; then
# extract cuda version from image name. e.g. manylinux2_28-builder:cuda12.8 returns 12.8
GPU_ARCH_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'cuda' '{print $2}')
elif [[ "${DOCKER_TAG_PREFIX}" == rocm* ]]; then
# extract rocm version from image name. e.g. manylinux2_28-builder:rocm6.2.4 returns 6.2.4
GPU_ARCH_VERSION=$(echo "${DOCKER_TAG_PREFIX}" | awk -F'rocm' '{print $2}')
fi
GPU_ARCH_TYPE=${GPU_ARCH_TYPE:-cpu}
GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}
MANY_LINUX_VERSION=${MANY_LINUX_VERSION:-} MANY_LINUX_VERSION=${MANY_LINUX_VERSION:-}
DOCKERFILE_SUFFIX=${DOCKERFILE_SUFFIX:-} DOCKERFILE_SUFFIX=${DOCKERFILE_SUFFIX:-}
OPENBLAS_VERSION=${OPENBLAS_VERSION:-} WITH_PUSH=${WITH_PUSH:-}
case ${image} in case ${GPU_ARCH_TYPE} in
manylinux2_28-builder:cpu) cpu)
TARGET=cpu_final TARGET=cpu_final
DOCKER_TAG=cpu
GPU_IMAGE=centos:7
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=9"
;;
cpu-manylinux_2_28)
TARGET=cpu_final
DOCKER_TAG=cpu
GPU_IMAGE=amd64/almalinux:8 GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=13" DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28" MANY_LINUX_VERSION="2_28"
;; ;;
manylinux2_28_aarch64-builder:cpu-aarch64) cpu-aarch64)
TARGET=final TARGET=final
GPU_IMAGE=arm64v8/almalinux:8 DOCKER_TAG=cpu-aarch64
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=13 --build-arg NINJA_VERSION=1.12.1" GPU_IMAGE=arm64v8/centos:7
MANY_LINUX_VERSION="2_28_aarch64" DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=10"
OPENBLAS_VERSION="v0.3.30" MANY_LINUX_VERSION="aarch64"
;; ;;
manylinuxcxx11-abi-builder:cpu-cxx11-abi) cpu-aarch64-2_28)
TARGET=final TARGET=final
DOCKER_TAG=cpu-aarch64
GPU_IMAGE=arm64v8/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11 --build-arg NINJA_VERSION=1.12.1"
MANY_LINUX_VERSION="2_28_aarch64"
;;
cpu-cxx11-abi)
TARGET=final
DOCKER_TAG=cpu-cxx11-abi
GPU_IMAGE="" GPU_IMAGE=""
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=9" DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=9"
MANY_LINUX_VERSION="cxx11-abi" MANY_LINUX_VERSION="cxx11-abi"
;; ;;
manylinuxs390x-builder:cpu-s390x) cpu-s390x)
TARGET=final TARGET=final
DOCKER_TAG=cpu-s390x
GPU_IMAGE=s390x/almalinux:8 GPU_IMAGE=s390x/almalinux:8
DOCKER_GPU_BUILD_ARG="" DOCKER_GPU_BUILD_ARG=""
MANY_LINUX_VERSION="s390x" MANY_LINUX_VERSION="s390x"
;; ;;
manylinux2_28-builder:cuda11*) cuda)
TARGET=cuda_final TARGET=cuda_final
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
# Keep this up to date with the minimum version of CUDA we currently support
GPU_IMAGE=centos:7
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=9"
;;
cuda-manylinux_2_28)
TARGET=cuda_final
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
GPU_IMAGE=amd64/almalinux:8 GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=11" DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28" MANY_LINUX_VERSION="2_28"
;; ;;
manylinux2_28-builder:cuda12*) cuda-aarch64)
TARGET=cuda_final TARGET=cuda_final
GPU_IMAGE=amd64/almalinux:8 DOCKER_TAG=cuda${GPU_ARCH_VERSION}
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=13" GPU_IMAGE=arm64v8/centos:7
MANY_LINUX_VERSION="2_28" DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=11"
;;
manylinuxaarch64-builder:cuda*)
TARGET=cuda_final
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=13"
MANY_LINUX_VERSION="aarch64" MANY_LINUX_VERSION="aarch64"
DOCKERFILE_SUFFIX="_cuda_aarch64" DOCKERFILE_SUFFIX="_cuda_aarch64"
;; ;;
manylinux2_28-builder:rocm*) rocm|rocm-manylinux_2_28)
# we want the patch version of 6.4 instead
if [[ $(ver $GPU_ARCH_VERSION) -eq $(ver 6.4) ]]; then
GPU_ARCH_VERSION="${GPU_ARCH_VERSION}.2"
fi
TARGET=rocm_final TARGET=rocm_final
MANY_LINUX_VERSION="2_28" DOCKER_TAG=rocm${GPU_ARCH_VERSION}
DEVTOOLSET_VERSION="11" GPU_IMAGE=rocm/dev-centos-7:${GPU_ARCH_VERSION}-complete
GPU_IMAGE=rocm/dev-almalinux-8:${GPU_ARCH_VERSION}-complete DEVTOOLSET_VERSION="9"
if [ ${GPU_ARCH_TYPE} == "rocm-manylinux_2_28" ]; then
MANY_LINUX_VERSION="2_28"
DEVTOOLSET_VERSION="11"
GPU_IMAGE=rocm/dev-almalinux-8:${GPU_ARCH_VERSION}-complete
fi
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201" PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
DOCKER_GPU_BUILD_ARG="--build-arg ROCM_VERSION=${GPU_ARCH_VERSION} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}" DOCKER_GPU_BUILD_ARG="--build-arg ROCM_VERSION=${GPU_ARCH_VERSION} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}"
;; ;;
manylinux2_28-builder:xpu) xpu)
TARGET=xpu_final TARGET=xpu_final
DOCKER_TAG=xpu
GPU_IMAGE=amd64/almalinux:8 GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11" DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28" MANY_LINUX_VERSION="2_28"
;; ;;
*) *)
echo "ERROR: Unrecognized image name: ${image}" echo "ERROR: Unrecognized GPU_ARCH_TYPE: ${GPU_ARCH_TYPE}"
exit 1 exit 1
;; ;;
esac esac
IMAGES=''
if [[ -n ${MANY_LINUX_VERSION} && -z ${DOCKERFILE_SUFFIX} ]]; then if [[ -n ${MANY_LINUX_VERSION} && -z ${DOCKERFILE_SUFFIX} ]]; then
DOCKERFILE_SUFFIX=_${MANY_LINUX_VERSION} DOCKERFILE_SUFFIX=_${MANY_LINUX_VERSION}
fi fi
# Only activate this if in CI (
if [ "$(uname -m)" != "s390x" ] && [ -v CI ]; then set -x
# TODO: Remove LimitNOFILE=1048576 patch once https://github.com/pytorch/test-infra/issues/5712
# is resolved. This patch is required in order to fix timing out of Docker build on Amazon Linux 2023. # Only activate this if in CI
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service if [ "$(uname -m)" != "s390x" ] && [ -v CI ]; then
sudo systemctl daemon-reload # TODO: Remove LimitNOFILE=1048576 patch once https://github.com/pytorch/test-infra/issues/5712
sudo systemctl restart docker # is resolved. This patch is required in order to fix timing out of Docker build on Amazon Linux 2023.
sudo sed -i s/LimitNOFILE=infinity/LimitNOFILE=1048576/ /usr/lib/systemd/system/docker.service
sudo systemctl daemon-reload
sudo systemctl restart docker
fi
DOCKER_BUILDKIT=1 docker build \
${DOCKER_GPU_BUILD_ARG} \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \
--target "${TARGET}" \
-t "${DOCKER_IMAGE}" \
$@ \
-f "${TOPDIR}/.ci/docker/manywheel/Dockerfile${DOCKERFILE_SUFFIX}" \
"${TOPDIR}/.ci/docker/"
)
GITHUB_REF=${GITHUB_REF:-"dev")}
GIT_BRANCH_NAME=${GITHUB_REF##*/}
GIT_COMMIT_SHA=${GITHUB_SHA:-$(git rev-parse HEAD)}
DOCKER_IMAGE_BRANCH_TAG=${DOCKER_IMAGE}-${GIT_BRANCH_NAME}
DOCKER_IMAGE_SHA_TAG=${DOCKER_IMAGE}-${GIT_COMMIT_SHA}
if [[ "${WITH_PUSH}" == true ]]; then
(
set -x
docker push "${DOCKER_IMAGE}"
if [[ -n ${GITHUB_REF} ]]; then
docker tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_BRANCH_TAG}
docker tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_SHA_TAG}
docker push "${DOCKER_IMAGE_BRANCH_TAG}"
docker push "${DOCKER_IMAGE_SHA_TAG}"
fi
)
fi fi
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
DOCKER_BUILDKIT=1 docker build \
${DOCKER_GPU_BUILD_ARG} \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \
--build-arg "OPENBLAS_VERSION=${OPENBLAS_VERSION}" \
--target "${TARGET}" \
-t "${tmp_tag}" \
$@ \
-f "${TOPDIR}/.ci/docker/manywheel/Dockerfile${DOCKERFILE_SUFFIX}" \
"${TOPDIR}/.ci/docker/"

View File

@@ -97,7 +97,7 @@ find /opt/_internal -type f -print0
     | xargs -0 -n1 strip --strip-unneeded 2>/dev/null || true
 # We do not need the Python test suites, or indeed the precompiled .pyc and
 # .pyo files. Partially cribbed from:
-# https://github.com/docker-library/python/blob/master/3.4/slim/Dockerfile # @lint-ignore
+# https://github.com/docker-library/python/blob/master/3.4/slim/Dockerfile
 find /opt/_internal \
     \( -type d -a -name test -o -name tests \) \
     -o \( -type f -a -name '*.pyc' -o -name '*.pyo' \) \

View File

@@ -2,7 +2,7 @@
 # Helper utilities for build
 # Script used only in CD pipeline
-OPENSSL_DOWNLOAD_URL=https://www.openssl.org/source/old/1.1.1/ # @lint-ignore
+OPENSSL_DOWNLOAD_URL=https://www.openssl.org/source/old/1.1.1/
 CURL_DOWNLOAD_URL=https://curl.se/download
 AUTOCONF_DOWNLOAD_URL=https://ftp.gnu.org/gnu/autoconf

View File

@@ -16,7 +16,6 @@ click
 #test that import:
 coremltools==5.0b5 ; python_version < "3.12"
-coremltools==8.3 ; python_version == "3.12"
 #Description: Apple framework for ML integration
 #Pinned versions: 5.0b5
 #test that import:
@@ -42,15 +41,15 @@ fbscribelogger==0.1.7
 #Pinned versions: 0.1.6
 #test that import:
-flatbuffers==24.12.23
+flatbuffers==2.0
 #Description: cross platform serialization library
-#Pinned versions: 24.12.23
+#Pinned versions: 2.0
 #test that import:
 hypothesis==5.35.1
 # Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
 #Description: advanced library for generating parametrized tests
-#Pinned versions: 5.35.1
+#Pinned versions: 3.44.6, 4.53.2
 #test that import: test_xnnpack_integration.py, test_pruning_op.py, test_nn.py
 junitparser==2.1.1
@@ -64,7 +63,6 @@ lark==0.12.0
 #test that import:
 librosa>=0.6.2 ; python_version < "3.11"
-librosa==0.10.2 ; python_version == "3.12"
 #Description: A python package for music and audio analysis
 #Pinned versions: >=0.6.2
 #test that import: test_spectral_ops.py
@@ -92,10 +90,10 @@ librosa==0.10.2 ; python_version == "3.12"
 #Pinned versions:
 #test that import:
-mypy==1.16.0
+mypy==1.14.0
 # Pin MyPy version because new errors are likely to appear with each release
 #Description: linter
-#Pinned versions: 1.16.0
+#Pinned versions: 1.14.0
 #test that import: test_typing.py, test_type_hints.py
 networkx==2.8.8
@@ -104,16 +102,15 @@ networkx==2.8.8
 #Pinned versions: 2.8.8
 #test that import: functorch
-ninja==1.11.1.3
-#Description: build system. Used in some tests. Used in build to generate build
-#time tracing information
-#Pinned versions: 1.11.1.3
+#ninja
+#Description: build system. Note that it install from
+#here breaks things so it is commented out
+#Pinned versions: 1.10.0.post1
 #test that import: run_test.py, test_cpp_extensions_aot.py,test_determination.py
 numba==0.49.0 ; python_version < "3.9"
 numba==0.55.2 ; python_version == "3.9"
 numba==0.55.2 ; python_version == "3.10"
-numba==0.60.0 ; python_version == "3.12"
 #Description: Just-In-Time Compiler for Numerical Functions
 #Pinned versions: 0.54.1, 0.49.0, <=0.49.1
 #test that import: test_numba_integration.py
@@ -166,10 +163,10 @@ pillow==11.0.0
 #Pinned versions: 10.3.0
 #test that import:
-protobuf==5.29.4
-#Description: Google's data interchange format
-#Pinned versions: 5.29.4
-#test that import: test_tensorboard.py, test/onnx/*
+protobuf==3.20.2
+#Description: Googles data interchange format
+#Pinned versions: 3.20.1
+#test that import: test_tensorboard.py
 psutil
 #Description: information on running processes and system utilization
@@ -221,9 +218,9 @@ pygments==2.15.0
 #Pinned versions: 2.12.0
 #test that import: the doctests
-#pyyaml
+#PyYAML
 #Description: data serialization format
-#Pinned versions: 6.0.2
+#Pinned versions:
 #test that import:
 #requests
@@ -233,7 +230,7 @@ pygments==2.15.0
 #rich
 #Description: rich text and beautiful formatting in the terminal
-#Pinned versions: 14.1.0
+#Pinned versions: 10.9.0
 #test that import:
 scikit-image==0.19.3 ; python_version < "3.10"
@@ -307,7 +304,7 @@ pytest-cpp==2.3.0
 #Pinned versions: 2.3.0
 #test that import:
-z3-solver==4.15.1.0
+z3-solver==4.12.6.0
 #Description: The Z3 Theorem Prover Project
 #Pinned versions:
 #test that import:
@@ -337,12 +334,12 @@ sympy==1.13.3
 #Pinned versions:
 #test that import:
-onnx==1.18.0
-#Description: Required by onnx tests, and mypy and test_public_bindings.py when checking torch.onnx._internal
+onnx==1.17.0
+#Description: Required by mypy and test_public_bindings.py when checking torch.onnx._internal
 #Pinned versions:
 #test that import:
-onnxscript==0.3.1
+onnxscript==0.2.2
 #Description: Required by mypy and test_public_bindings.py when checking torch.onnx._internal
 #Pinned versions:
 #test that import:
@@ -356,20 +353,22 @@ parameterized==0.8.1
 #Pinned versions: 1.24.0
 #test that import: test_sac_estimator.py
-pwlf==2.2.1
+pwlf==2.2.1 ; python_version >= "3.8"
 #Description: required for testing torch/distributed/_tools/sac_estimator.py
 #Pinned versions: 2.2.1
 #test that import: test_sac_estimator.py
-# To build PyTorch itself
-pyyaml
-pyzstd
-setuptools>=70.1.0
-six
+# To build PyTorch itself
+astunparse
+PyYAML
+pyzstd
+setuptools
+ninja==1.11.1 ; platform_machine == "aarch64"
 scons==4.5.2 ; platform_machine == "aarch64"
-pulp==2.9.0
+pulp==2.9.0 ; python_version >= "3.8"
 #Description: required for testing ilp formulaiton under torch/distributed/_tools
 #Pinned versions: 2.9.0
 #test that import: test_sac_ilp.py
@@ -378,19 +377,3 @@ dataclasses_json==0.6.7
 #Description: required for data pipeline and scripts under tools/stats
 #Pinned versions: 0.6.7
 #test that import:
-cmake==4.0.0
-#Description: required for building
-tlparse==0.3.30
-#Description: required for log parsing
-cuda-bindings>=12.0,<13.0 ; platform_machine != "s390x"
-#Description: required for testing CUDAGraph::raw_cuda_graph(). See https://nvidia.github.io/cuda-python/cuda-bindings/latest/support.html for how this version was chosen. Note "Any fix in the latest bindings would be backported to the prior major version" means that only the newest version of cuda-bindings will get fixes. Depending on the latest version of 12.x is okay because all 12.y versions will be supported via "CUDA minor version compatibility". Pytorch builds against 13.z versions of cuda toolkit work with 12.x versions of cuda-bindings as well because newer drivers work with old toolkits.
-#test that import: test_cuda.py
-setuptools-git-versioning==2.1.0
-scikit-build==0.18.1
-pyre-extensions==0.0.32
-tabulate==0.9.0
-#Description: These package are needed to build FBGEMM and torchrec on PyTorch CI

View File

@@ -1,28 +1,18 @@
 sphinx==5.3.0
 #Description: This is used to generate PyTorch docs
 #Pinned versions: 5.3.0
--e git+https://github.com/pytorch/pytorch_sphinx_theme.git@722b7e6f9ca512fcc526ad07d62b3d28c50bb6cd#egg=pytorch_sphinx_theme2
+-e git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
 # TODO: sphinxcontrib.katex 0.9.0 adds a local KaTeX server to speed up pre-rendering
-# but it doesn't seem to work and hangs around idly. The initial thought that it is probably
-# something related to Docker setup. We can investigate this later.
+# but it doesn't seem to work and hangs around idly. The initial thought is probably
+# something related to Docker setup. We can investigate this later
 sphinxcontrib.katex==0.8.6
 #Description: This is used to generate PyTorch docs
 #Pinned versions: 0.8.6
-sphinxext-opengraph==0.9.1
-#Description: This is used to generate PyTorch docs
-#Pinned versions: 0.9.1
-sphinx_sitemap==2.6.0
-#Description: This is used to generate sitemap for PyTorch docs
-#Pinned versions: 2.6.0
-matplotlib==3.5.3 ; python_version < "3.13"
-matplotlib==3.6.3 ; python_version >= "3.13"
-#Description: This is used to generate PyTorch docs
-#Pinned versions: 3.6.3 if python > 3.12. Otherwise 3.5.3.
+matplotlib==3.5.3
+#Description: This is used to generate PyTorch docs
+#Pinned versions: 3.5.3
 tensorboard==2.13.0 ; python_version < "3.13"
 tensorboard==2.18.0 ; python_version >= "3.13"
@@ -50,12 +40,11 @@ IPython==8.12.0
 #Pinned versions: 8.12.0
 myst-nb==0.17.2
-#Description: This is used to generate PyTorch functorch and torch.compile docs.
-#Pinned versions: 0.17.2
+#Description: This is used to generate PyTorch functorch docs
+#Pinned versions: 0.13.2
 # The following are required to build torch.distributed.elastic.rendezvous.etcd* docs
 python-etcd==0.4.5
 sphinx-copybutton==0.5.0
-sphinx-design==0.4.0
-sphinxcontrib-mermaid==1.0.0
+sphinx-panels==0.4.1
 myst-parser==0.18.1

View File

@@ -1 +1 @@
-3.4.0
+3.3.0

View File

@@ -1 +0,0 @@
-3.4.0

View File

@ -0,0 +1,175 @@
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG IMAGE_NAME
FROM ${IMAGE_NAME}
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
COPY ./common/install_docs_reqs.sh install_docs_reqs.sh
RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ARG CONDA_CMAKE
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_magma_conda.sh install_magma_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh install_magma_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
COPY ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install clang
ARG CLANG_VERSION
COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install UCC
ARG UCX_COMMIT
ARG UCC_COMMIT
ENV UCX_COMMIT $UCX_COMMIT
ENV UCC_COMMIT $UCC_COMMIT
ENV UCX_HOME /usr
ENV UCC_HOME /usr
ADD ./common/install_ucc.sh install_ucc.sh
RUN if [ -n "${UCX_COMMIT}" ] && [ -n "${UCC_COMMIT}" ]; then bash ./install_ucc.sh; fi
RUN rm install_ucc.sh
COPY ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
ENV OPENSSL_DIR /opt/openssl
ARG INDUCTOR_BENCHMARKS
ARG ANACONDA_PYTHON_VERSION
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt triton_version.txt
ARG HALIDE
# Build and install halide
COPY ./common/install_halide.sh install_halide.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/halide.txt halide.txt
RUN if [ -n "${HALIDE}" ]; then bash ./install_halide.sh; fi
RUN rm install_halide.sh common_utils.sh halide.txt
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
# See https://github.com/pytorch/pytorch/issues/82174
# TODO(sdym@fb.com):
# check if this is needed after full off Xenial migration
ENV CARGO_NET_GIT_FETCH_WITH_CLI true
RUN bash ./install_cache.sh && rm install_cache.sh
ENV CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache
# Add jni.h for java host build
COPY ./common/install_jni.sh install_jni.sh
COPY ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install Open MPI for CUDA
COPY ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
RUN rm install_openmpi.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
ENV CUDA_PATH /usr/local/cuda
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
# Install CUDNN
ARG CUDNN_VERSION
ARG CUDA_VERSION
COPY ./common/install_cudnn.sh install_cudnn.sh
RUN if [ -n "${CUDNN_VERSION}" ]; then bash install_cudnn.sh; fi
RUN rm install_cudnn.sh
# Install CUSPARSELT
ARG CUDA_VERSION
COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash install_cusparselt.sh
RUN rm install_cusparselt.sh
# Install CUDSS
ARG CUDA_VERSION
COPY ./common/install_cudss.sh install_cudss.sh
RUN bash install_cudss.sh
RUN rm install_cudss.sh
# Delete /usr/local/cuda-11.X/cuda-11.X symlinks
RUN if [ -h /usr/local/cuda-11.6/cuda-11.6 ]; then rm /usr/local/cuda-11.6/cuda-11.6; fi
RUN if [ -h /usr/local/cuda-11.7/cuda-11.7 ]; then rm /usr/local/cuda-11.7/cuda-11.7; fi
RUN if [ -h /usr/local/cuda-12.1/cuda-12.1 ]; then rm /usr/local/cuda-12.1/cuda-12.1; fi
RUN if [ -h /usr/local/cuda-12.4/cuda-12.4 ]; then rm /usr/local/cuda-12.4/cuda-12.4; fi
USER jenkins
CMD ["bash"]

View File

@ -25,9 +25,9 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest) # Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION ARG ANACONDA_PYTHON_VERSION
ARG BUILD_ENVIRONMENT
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ARG CONDA_CMAKE
COPY requirements-ci.txt /opt/conda/requirements-ci.txt COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh COPY ./common/common_utils.sh common_utils.sh
@ -43,6 +43,20 @@ ARG CLANG_VERSION
COPY ./common/install_clang.sh install_clang.sh COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV # (optional) Install vision packages like OpenCV
ARG VISION ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./ COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
@ -56,7 +70,7 @@ COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh RUN bash ./install_rocm.sh
RUN rm install_rocm.sh RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION} RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh RUN rm install_rocm_magma.sh
ADD ./common/install_miopen.sh install_miopen.sh ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh
@ -98,9 +112,14 @@ COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps
COPY ./common/common_utils.sh common_utils.sh COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/timm.txt timm.txt COPY ci_commit_pins/timm.txt timm.txt
COPY ci_commit_pins/torchbench.txt torchbench.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt torchbench.txt RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version # (optional) Install non-default Ninja version
ARG NINJA_VERSION ARG NINJA_VERSION

View File

@ -28,6 +28,7 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest) # Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ARG DOCS ARG DOCS
ARG BUILD_ENVIRONMENT ARG BUILD_ENVIRONMENT
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
@ -72,10 +73,17 @@ ARG TRITON
COPY ./common/install_triton.sh install_triton.sh COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton-xpu.txt triton-xpu.txt COPY ci_commit_pins/triton-xpu.txt triton-xpu.txt
COPY triton_xpu_version.txt triton_version.txt COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-xpu.txt triton_version.txt RUN rm install_triton.sh common_utils.sh triton-xpu.txt triton_version.txt
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV # (optional) Install vision packages like OpenCV
ARG VISION ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./ COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
@ -83,6 +91,12 @@ RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION} ENV INSTALLED_VISION ${VISION}
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version # (optional) Install non-default Ninja version
ARG NINJA_VERSION ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh COPY ./common/install_ninja.sh install_ninja.sh

View File

@@ -1,6 +1,6 @@
 ARG UBUNTU_VERSION
-FROM ubuntu:${UBUNTU_VERSION} as base
+FROM ubuntu:${UBUNTU_VERSION}
 ARG UBUNTU_VERSION
@ -28,6 +28,7 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest) # Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ARG DOCS ARG DOCS
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
@ -51,17 +52,9 @@ RUN bash ./install_lcov.sh && rm install_lcov.sh
# Install cuda and cudnn # Install cuda and cudnn
ARG CUDA_VERSION ARG CUDA_VERSION
COPY ./common/install_cuda.sh install_cuda.sh COPY ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
COPY ./ci_commit_pins/nccl-cu* /ci_commit_pins/
COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh install_nccl.sh /ci_commit_pins/nccl-cu* install_cusparselt.sh
ENV DESIRED_CUDA ${CUDA_VERSION} ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
# No effect if cuda not installed
ENV USE_SYSTEM_NCCL=1
ENV NCCL_INCLUDE_DIR="/usr/local/cuda/include/"
ENV NCCL_LIB_DIR="/usr/local/cuda/lib64/"
# (optional) Install UCC # (optional) Install UCC
ARG UCX_COMMIT ARG UCX_COMMIT
@ -74,6 +67,20 @@ ADD ./common/install_ucc.sh install_ucc.sh
RUN if [ -n "${UCX_COMMIT}" ] && [ -n "${UCC_COMMIT}" ]; then bash ./install_ucc.sh; fi RUN if [ -n "${UCX_COMMIT}" ] && [ -n "${UCC_COMMIT}" ]; then bash ./install_ucc.sh; fi
RUN rm install_ucc.sh RUN rm install_ucc.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV # (optional) Install vision packages like OpenCV
ARG VISION ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./ COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
@ -81,6 +88,24 @@ RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION} ENV INSTALLED_VISION ${VISION}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
COPY ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh
RUN if [ -n "${VULKAN_SDK_VERSION}" ]; then bash ./install_vulkan_sdk.sh; fi
RUN rm install_vulkan_sdk.sh
# (optional) Install swiftshader
ARG SWIFTSHADER
COPY ./common/install_swiftshader.sh install_swiftshader.sh
RUN if [ -n "${SWIFTSHADER}" ]; then bash ./install_swiftshader.sh; fi
RUN rm install_swiftshader.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version # (optional) Install non-default Ninja version
ARG NINJA_VERSION ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh COPY ./common/install_ninja.sh install_ninja.sh
@ -98,26 +123,24 @@ COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps
COPY ./common/common_utils.sh common_utils.sh COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/timm.txt timm.txt COPY ci_commit_pins/timm.txt timm.txt
COPY ci_commit_pins/torchbench.txt torchbench.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt torchbench.txt RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
ARG TRITON ARG TRITON
ARG TRITON_CPU # Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
# Create a separate stage for building Triton and Triton-CPU. install_triton
# will check for the presence of env vars
FROM base as triton-builder
COPY ./common/install_triton.sh install_triton.sh COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt COPY ci_commit_pins/triton.txt triton.txt
COPY ci_commit_pins/triton-cpu.txt triton-cpu.txt RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN bash ./install_triton.sh RUN rm install_triton.sh common_utils.sh triton.txt
FROM base as final ARG TRITON_CPU
COPY --from=triton-builder /opt/triton /opt/triton COPY ./common/install_triton.sh install_triton.sh
RUN if [ -n "${TRITON}" ] || [ -n "${TRITON_CPU}" ]; then pip install /opt/triton/*.whl; chown -R jenkins:jenkins /opt/conda; fi COPY ./common/common_utils.sh common_utils.sh
RUN rm -rf /opt/triton COPY ci_commit_pins/triton-cpu.txt triton-cpu.txt
RUN if [ -n "${TRITON_CPU}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-cpu.txt
ARG EXECUTORCH ARG EXECUTORCH
# Build and install executorch # Build and install executorch
@ -148,12 +171,6 @@ RUN if [ -n "${ACL}" ]; then bash ./install_acl.sh; fi
RUN rm install_acl.sh RUN rm install_acl.sh
ENV INSTALLED_ACL ${ACL} ENV INSTALLED_ACL ${ACL}
ARG OPENBLAS
COPY ./common/install_openblas.sh install_openblas.sh
RUN if [ -n "${OPENBLAS}" ]; then bash ./install_openblas.sh; fi
RUN rm install_openblas.sh
ENV INSTALLED_OPENBLAS ${OPENBLAS}
# Install ccache/sccache (do this last, so we get priority in PATH) # Install ccache/sccache (do this last, so we get priority in PATH)
ARG SKIP_SCCACHE_INSTALL ARG SKIP_SCCACHE_INSTALL
COPY ./common/install_cache.sh install_cache.sh COPY ./common/install_cache.sh install_cache.sh

View File

@ -1,2 +0,0 @@
output/
magma-rocm*/

View File

@ -1,35 +0,0 @@
SHELL=/usr/bin/env bash
DOCKER_CMD ?= docker
DESIRED_ROCM ?= 6.4
DESIRED_ROCM_SHORT = $(subst .,,$(DESIRED_ROCM))
PACKAGE_NAME = magma-rocm
# inherit this from underlying docker image, do not pass this env var to docker
#PYTORCH_ROCM_ARCH ?= gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201
DOCKER_RUN = set -eou pipefail; ${DOCKER_CMD} run --rm -i \
-v $(shell git rev-parse --show-toplevel)/.ci:/builder \
-w /builder \
-e PACKAGE_NAME=${PACKAGE_NAME}${DESIRED_ROCM_SHORT} \
-e DESIRED_ROCM=${DESIRED_ROCM} \
"pytorch/almalinux-builder:rocm${DESIRED_ROCM}" \
magma-rocm/build_magma.sh
.PHONY: all
all: magma-rocm64
all: magma-rocm63
.PHONY:
clean:
$(RM) -r magma-*
$(RM) -r output
.PHONY: magma-rocm64
magma-rocm64: DESIRED_ROCM := 6.4
magma-rocm64:
$(DOCKER_RUN)
.PHONY: magma-rocm63
magma-rocm63: DESIRED_ROCM := 6.3
magma-rocm63:
$(DOCKER_RUN)

View File

@ -1,48 +0,0 @@
# Magma ROCm
This folder contains the scripts and configurations to build libmagma.so, linked for various versions of ROCm.
## Building
Look in the `Makefile` for available targets to build. To build any target, for example `magma-rocm63`, run
```
# Using `docker`
make magma-rocm63
# Using `podman`
DOCKER_CMD=podman make magma-rocm63
```
This spawns a `pytorch/manylinux-rocm<version>` docker image, which has the required `devtoolset` and ROCm versions installed.
Within the docker image, it runs `build_magma.sh` with the correct environment variables set, which packages the necessary files
into a tarball, with the following structure:
```
.
├── include # header files
├── lib # libmagma.so
├── info
│ ├── licenses # license file
│ └── recipe # build script
```
More specifically, `build_magma.sh` copies over the relevant files from the `package_files` directory depending on the ROCm version.
Outputted binaries should be in the `output` folder.
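A quick way to sanity-check a finished build is to list the resulting tarball; a minimal sketch, assuming a `magma-rocm63` build and the `output/linux-64` layout that `build_magma.sh` produces:
```
# Illustrative only: the tarball name follows ${PACKAGE_NAME}-${MAGMA_VERSION}-1.tar.bz2
ls output/linux-64/
tar tjf output/linux-64/magma-rocm63-*.tar.bz2 | head
```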
## Pushing
Packages can be uploaded to an S3 bucket using:
```
aws s3 cp output/*/magma-cuda*.bz2 <bucket-with-path>
```
If you do not have upload permissions, please ping @seemethere or @soumith to gain access
## New versions
New ROCm versions can be added by creating a new make target with the next desired version. For ROCm version N.n, the target should be named `magma-rocmNn`.
Make sure to edit the appropriate environment variables (e.g., DESIRED_ROCM) in the `Makefile` accordingly. Remember also to check `build_magma.sh` to ensure the logic for copying over the files remains correct.
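As an illustration, a hypothetical ROCm 6.5 target would follow the same pattern as the existing `magma-rocm64`/`magma-rocm63` targets (6.5 is an assumed version here, not an announced release):
```
# Hypothetical new target; also add `all: magma-rocm65` if it should build by default.
.PHONY: magma-rocm65
magma-rocm65: DESIRED_ROCM := 6.5
magma-rocm65:
	$(DOCKER_RUN)
```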

View File

@ -1,42 +0,0 @@
#!/usr/bin/env bash
set -eou pipefail
# Environment variables
# The script expects DESIRED_CUDA and PACKAGE_NAME to be set
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Version 2.7.2 + ROCm related updates
MAGMA_VERSION=a1625ff4d9bc362906bd01f805dbbe12612953f6
# Folders for the build
PACKAGE_FILES=${ROOT_DIR}/magma-rocm/package_files # metadata
PACKAGE_DIR=${ROOT_DIR}/magma-rocm/${PACKAGE_NAME} # build workspace
PACKAGE_OUTPUT=${ROOT_DIR}/magma-rocm/output # where tarballs are stored
PACKAGE_BUILD=${PACKAGE_DIR} # where the content of the tarball is prepared
PACKAGE_RECIPE=${PACKAGE_BUILD}/info/recipe
PACKAGE_LICENSE=${PACKAGE_BUILD}/info/licenses
mkdir -p ${PACKAGE_DIR} ${PACKAGE_OUTPUT}/linux-64 ${PACKAGE_BUILD} ${PACKAGE_RECIPE} ${PACKAGE_LICENSE}
# Fetch magma sources and verify checksum
pushd ${PACKAGE_DIR}
git clone https://bitbucket.org/icl/magma.git
pushd magma
git checkout ${MAGMA_VERSION}
popd
popd
# build
pushd ${PACKAGE_DIR}/magma
# The build.sh script expects to be executed from the sources root folder
INSTALL_DIR=${PACKAGE_BUILD} ${PACKAGE_FILES}/build.sh
popd
# Package recipe, license and tarball
# Folder and package name are backward compatible for the build workflow
cp ${PACKAGE_FILES}/build.sh ${PACKAGE_RECIPE}/build.sh
cp ${PACKAGE_DIR}/magma/COPYRIGHT ${PACKAGE_LICENSE}/COPYRIGHT
pushd ${PACKAGE_BUILD}
tar cjf ${PACKAGE_OUTPUT}/linux-64/${PACKAGE_NAME}-${MAGMA_VERSION}-1.tar.bz2 include lib info
echo Built in ${PACKAGE_OUTPUT}/linux-64/${PACKAGE_NAME}-${MAGMA_VERSION}-1.tar.bz2
popd
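
For reference, the Makefile's `DOCKER_RUN` wrapper reduces to roughly the following invocation (shown for ROCm 6.3; the image tag, mount, and environment variables mirror the Makefile and may differ in other setups):
```
# A sketch of what `make magma-rocm63` ends up running.
docker run --rm -i \
  -v "$(git rev-parse --show-toplevel)/.ci:/builder" \
  -w /builder \
  -e PACKAGE_NAME=magma-rocm63 \
  -e DESIRED_ROCM=6.3 \
  "pytorch/almalinux-builder:rocm6.3" \
  magma-rocm/build_magma.sh
```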

View File

@ -1,38 +0,0 @@
# Magma build scripts need `python`
ln -sf /usr/bin/python3 /usr/bin/python
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
almalinux)
yum install -y gcc-gfortran
;;
*)
echo "No preinstalls to build magma..."
;;
esac
MKLROOT=${MKLROOT:-/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION}
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
if [[ -f "${MKLROOT}/lib/libmkl_core.a" ]]; then
echo 'LIB = -Wl,--start-group -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -Wl,--end-group -lpthread -lstdc++ -lm -lgomp -lhipblas -lhipsparse' >> make.inc
fi
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib -ldl' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --offload-arch=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT="${MKLROOT}"
make testing/testing_dgemm -j $(nproc) MKLROOT="${MKLROOT}"
cp -R lib ${INSTALL_DIR}
cp -R include ${INSTALL_DIR}

View File

@@ -1,7 +1,7 @@
 SHELL=/usr/bin/env bash
 DOCKER_CMD ?= docker
-DESIRED_CUDA ?= 12.8
+DESIRED_CUDA ?= 11.8
 DESIRED_CUDA_SHORT = $(subst .,,$(DESIRED_CUDA))
 PACKAGE_NAME = magma-cuda
 CUDA_ARCH_LIST ?= -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90
@@ -12,25 +12,20 @@ DOCKER_RUN = set -eou pipefail; ${DOCKER_CMD} run --rm -i \
 	-e PACKAGE_NAME=${PACKAGE_NAME}${DESIRED_CUDA_SHORT} \
 	-e DESIRED_CUDA=${DESIRED_CUDA} \
 	-e CUDA_ARCH_LIST="${CUDA_ARCH_LIST}" \
-	"pytorch/almalinux-builder:cuda${DESIRED_CUDA}-main" \
+	"pytorch/manylinux2_28-builder:cuda${DESIRED_CUDA}-main" \
 	magma/build_magma.sh
 .PHONY: all
-all: magma-cuda129
 all: magma-cuda128
 all: magma-cuda126
+all: magma-cuda124
+all: magma-cuda118
 .PHONY:
 clean:
 	$(RM) -r magma-*
 	$(RM) -r output
-.PHONY: magma-cuda129
-magma-cuda129: DESIRED_CUDA := 12.9
-magma-cuda129: CUDA_ARCH_LIST += -gencode arch=compute_100,code=sm_100 -gencode arch=compute_120,code=sm_120
-magma-cuda129:
-	$(DOCKER_RUN)
 .PHONY: magma-cuda128
 magma-cuda128: DESIRED_CUDA := 12.8
 magma-cuda128: CUDA_ARCH_LIST += -gencode arch=compute_100,code=sm_100 -gencode arch=compute_120,code=sm_120
@@ -41,3 +36,14 @@ magma-cuda128:
 magma-cuda126: DESIRED_CUDA := 12.6
 magma-cuda126:
 	$(DOCKER_RUN)
+.PHONY: magma-cuda124
+magma-cuda124: DESIRED_CUDA := 12.4
+magma-cuda124:
+	$(DOCKER_RUN)
+.PHONY: magma-cuda118
+magma-cuda118: DESIRED_CUDA := 11.8
+magma-cuda118: CUDA_ARCH_LIST += -gencode arch=compute_37,code=sm_37
+magma-cuda118:
+	$(DOCKER_RUN)

View File

@ -18,10 +18,12 @@ retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*) $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
} }
PLATFORM="" PLATFORM="manylinux2014_x86_64"
# TODO move this into the Docker images # TODO move this into the Docker images
OS_NAME=$(awk -F= '/^NAME/{print $2}' /etc/os-release) OS_NAME=$(awk -F= '/^NAME/{print $2}' /etc/os-release)
if [[ "$OS_NAME" == *"AlmaLinux"* ]]; then if [[ "$OS_NAME" == *"CentOS Linux"* ]]; then
retry yum install -q -y zip openssl
elif [[ "$OS_NAME" == *"AlmaLinux"* ]]; then
retry yum install -q -y zip openssl retry yum install -q -y zip openssl
PLATFORM="manylinux_2_28_x86_64" PLATFORM="manylinux_2_28_x86_64"
elif [[ "$OS_NAME" == *"Red Hat Enterprise Linux"* ]]; then elif [[ "$OS_NAME" == *"Red Hat Enterprise Linux"* ]]; then
@ -31,11 +33,9 @@ elif [[ "$OS_NAME" == *"Ubuntu"* ]]; then
# Comment out nvidia repositories to prevent them from getting apt-get updated, see https://github.com/pytorch/pytorch/issues/74968 # Comment out nvidia repositories to prevent them from getting apt-get updated, see https://github.com/pytorch/pytorch/issues/74968
# shellcheck disable=SC2046 # shellcheck disable=SC2046
sed -i 's/.*nvidia.*/# &/' $(find /etc/apt/ -type f -name "*.list") sed -i 's/.*nvidia.*/# &/' $(find /etc/apt/ -type f -name "*.list")
retry apt-get update retry apt-get update
retry apt-get -y install zip openssl retry apt-get -y install zip openssl
else
echo "Unknown OS: '$OS_NAME'"
exit 1
fi fi
# We use the package name to test the package by passing this to 'pip install' # We use the package name to test the package by passing this to 'pip install'
@ -79,6 +79,8 @@ if [[ -e /opt/openssl ]]; then
export CMAKE_INCLUDE_PATH="/opt/openssl/include":$CMAKE_INCLUDE_PATH export CMAKE_INCLUDE_PATH="/opt/openssl/include":$CMAKE_INCLUDE_PATH
fi fi
mkdir -p /tmp/$WHEELHOUSE_DIR mkdir -p /tmp/$WHEELHOUSE_DIR
export PATCHELF_BIN=/usr/local/bin/patchelf export PATCHELF_BIN=/usr/local/bin/patchelf
@ -97,7 +99,6 @@ if [[ -z "$PYTORCH_ROOT" ]]; then
exit 1 exit 1
fi fi
pushd "$PYTORCH_ROOT" pushd "$PYTORCH_ROOT"
retry pip install -qUr requirements-build.txt
python setup.py clean python setup.py clean
retry pip install -qr requirements.txt retry pip install -qr requirements.txt
case ${DESIRED_PYTHON} in case ${DESIRED_PYTHON} in
@ -110,6 +111,12 @@ case ${DESIRED_PYTHON} in
;; ;;
esac esac
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
export _GLIBCXX_USE_CXX11_ABI=1
else
export _GLIBCXX_USE_CXX11_ABI=0
fi
if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
echo "Calling build_amd.py at $(date)" echo "Calling build_amd.py at $(date)"
python tools/amd_build/build_amd.py python tools/amd_build/build_amd.py
@ -151,7 +158,7 @@ if [[ "$USE_SPLIT_BUILD" == "true" ]]; then
BUILD_LIBTORCH_WHL=0 BUILD_PYTHON_ONLY=1 \ BUILD_LIBTORCH_WHL=0 BUILD_PYTHON_ONLY=1 \
BUILD_LIBTORCH_CPU_WITH_DEBUG=$BUILD_DEBUG_INFO \ BUILD_LIBTORCH_CPU_WITH_DEBUG=$BUILD_DEBUG_INFO \
USE_NCCL=${USE_NCCL} USE_RCCL=${USE_RCCL} USE_KINETO=${USE_KINETO} \ USE_NCCL=${USE_NCCL} USE_RCCL=${USE_RCCL} USE_KINETO=${USE_KINETO} \
CMAKE_FRESH=1 python setup.py bdist_wheel -d /tmp/$WHEELHOUSE_DIR python setup.py bdist_wheel -d /tmp/$WHEELHOUSE_DIR --cmake
echo "Finished setup.py bdist_wheel for split build (BUILD_PYTHON_ONLY)" echo "Finished setup.py bdist_wheel for split build (BUILD_PYTHON_ONLY)"
else else
time CMAKE_ARGS=${CMAKE_ARGS[@]} \ time CMAKE_ARGS=${CMAKE_ARGS[@]} \
@ -202,6 +209,12 @@ if [[ -n "$BUILD_PYTHONLESS" ]]; then
mkdir -p /tmp/$LIBTORCH_HOUSE_DIR mkdir -p /tmp/$LIBTORCH_HOUSE_DIR
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
LIBTORCH_ABI="cxx11-abi-"
else
LIBTORCH_ABI=
fi
zip -rq /tmp/$LIBTORCH_HOUSE_DIR/libtorch-$LIBTORCH_ABI$LIBTORCH_VARIANT-$PYTORCH_BUILD_VERSION.zip libtorch zip -rq /tmp/$LIBTORCH_HOUSE_DIR/libtorch-$LIBTORCH_ABI$LIBTORCH_VARIANT-$PYTORCH_BUILD_VERSION.zip libtorch
cp /tmp/$LIBTORCH_HOUSE_DIR/libtorch-$LIBTORCH_ABI$LIBTORCH_VARIANT-$PYTORCH_BUILD_VERSION.zip \ cp /tmp/$LIBTORCH_HOUSE_DIR/libtorch-$LIBTORCH_ABI$LIBTORCH_VARIANT-$PYTORCH_BUILD_VERSION.zip \
/tmp/$LIBTORCH_HOUSE_DIR/libtorch-$LIBTORCH_ABI$LIBTORCH_VARIANT-latest.zip /tmp/$LIBTORCH_HOUSE_DIR/libtorch-$LIBTORCH_ABI$LIBTORCH_VARIANT-latest.zip
@ -320,8 +333,8 @@ for pkg in /$WHEELHOUSE_DIR/torch_no_python*.whl /$WHEELHOUSE_DIR/torch*linux*.w
# ROCm workaround for roctracer dlopens # ROCm workaround for roctracer dlopens
if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
patchedpath=$(fname_without_so_number $destpath) patchedpath=$(fname_without_so_number $destpath)
# Keep the so number for XPU dependencies and libgomp.so.1 to avoid twice load # Keep the so number for XPU dependencies
elif [[ "$DESIRED_CUDA" == *"xpu"* || "$filename" == "libgomp.so.1" ]]; then elif [[ "$DESIRED_CUDA" == *"xpu"* ]]; then
patchedpath=$destpath patchedpath=$destpath
else else
patchedpath=$(fname_with_sha256 $destpath) patchedpath=$(fname_with_sha256 $destpath)

View File

@ -15,9 +15,6 @@ export INSTALL_TEST=0 # dont install test binaries into site-packages
export USE_CUPTI_SO=0 export USE_CUPTI_SO=0
export USE_CUSPARSELT=${USE_CUSPARSELT:-1} # Enable if not disabled by libtorch build export USE_CUSPARSELT=${USE_CUSPARSELT:-1} # Enable if not disabled by libtorch build
export USE_CUFILE=${USE_CUFILE:-1} export USE_CUFILE=${USE_CUFILE:-1}
export USE_SYSTEM_NCCL=1
export NCCL_INCLUDE_DIR="/usr/local/cuda/include/"
export NCCL_LIB_DIR="/usr/local/cuda/lib64/"
# Keep an array of cmake variables to add to # Keep an array of cmake variables to add to
if [[ -z "$CMAKE_ARGS" ]]; then if [[ -z "$CMAKE_ARGS" ]]; then
@ -39,8 +36,10 @@ if [[ -n "$DESIRED_CUDA" ]]; then
if [[ ${DESIRED_CUDA} =~ ^[0-9]+\.[0-9]+$ ]]; then if [[ ${DESIRED_CUDA} =~ ^[0-9]+\.[0-9]+$ ]]; then
CUDA_VERSION=${DESIRED_CUDA} CUDA_VERSION=${DESIRED_CUDA}
else else
# cu126, cu128 etc... # cu90, cu92, cu100, cu101
if [[ ${#DESIRED_CUDA} -eq 5 ]]; then if [[ ${#DESIRED_CUDA} -eq 4 ]]; then
CUDA_VERSION="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3:1}"
elif [[ ${#DESIRED_CUDA} -eq 5 ]]; then
CUDA_VERSION="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4:1}" CUDA_VERSION="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4:1}"
fi fi
fi fi
@ -51,23 +50,24 @@ else
fi fi
cuda_version_nodot=$(echo $CUDA_VERSION | tr -d '.') cuda_version_nodot=$(echo $CUDA_VERSION | tr -d '.')
EXTRA_CAFFE2_CMAKE_FLAGS+=("-DATEN_NO_TEST=ON")
TORCH_CUDA_ARCH_LIST="5.0;6.0;7.0;7.5;8.0;8.6"
case ${CUDA_VERSION} in case ${CUDA_VERSION} in
#removing sm_50-sm_60 as these architectures are deprecated in CUDA 12.8/9 and will be removed in future releases
#however we would like to keep sm_70 architecture see: https://github.com/pytorch/pytorch/issues/157517
12.8) 12.8)
TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;9.0;10.0;12.0" TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6;9.0;10.0;12.0+PTX" #removing sm_50-sm_70 as these architectures are deprecated in CUDA 12.8 and will be removed in future releases
;; EXTRA_CAFFE2_CMAKE_FLAGS+=("-DATEN_NO_TEST=ON")
12.9)
TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;9.0;10.0;12.0+PTX"
# WAR to resolve the ld error in libtorch build with CUDA 12.9
if [[ "$PACKAGE_TYPE" == "libtorch" ]]; then
TORCH_CUDA_ARCH_LIST="7.5;8.0;9.0;10.0;12.0+PTX"
fi
;; ;;
12.6) 12.6)
TORCH_CUDA_ARCH_LIST="5.0;6.0;7.0;7.5;8.0;8.6;9.0" TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST};9.0"
EXTRA_CAFFE2_CMAKE_FLAGS+=("-DATEN_NO_TEST=ON")
;;
12.4)
TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST};9.0"
EXTRA_CAFFE2_CMAKE_FLAGS+=("-DATEN_NO_TEST=ON")
;;
11.8)
TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST};3.7;9.0"
EXTRA_CAFFE2_CMAKE_FLAGS+=("-DATEN_NO_TEST=ON")
;; ;;
*) *)
echo "unknown cuda version $CUDA_VERSION" echo "unknown cuda version $CUDA_VERSION"
@ -91,15 +91,14 @@ fi
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR" || true mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR" || true
OS_NAME=$(awk -F= '/^NAME/{print $2}' /etc/os-release) OS_NAME=$(awk -F= '/^NAME/{print $2}' /etc/os-release)
if [[ "$OS_NAME" == *"AlmaLinux"* ]]; then if [[ "$OS_NAME" == *"CentOS Linux"* ]]; then
LIBGOMP_PATH="/usr/lib64/libgomp.so.1"
elif [[ "$OS_NAME" == *"AlmaLinux"* ]]; then
LIBGOMP_PATH="/usr/lib64/libgomp.so.1" LIBGOMP_PATH="/usr/lib64/libgomp.so.1"
elif [[ "$OS_NAME" == *"Red Hat Enterprise Linux"* ]]; then elif [[ "$OS_NAME" == *"Red Hat Enterprise Linux"* ]]; then
LIBGOMP_PATH="/usr/lib64/libgomp.so.1" LIBGOMP_PATH="/usr/lib64/libgomp.so.1"
elif [[ "$OS_NAME" == *"Ubuntu"* ]]; then elif [[ "$OS_NAME" == *"Ubuntu"* ]]; then
LIBGOMP_PATH="/usr/lib/x86_64-linux-gnu/libgomp.so.1" LIBGOMP_PATH="/usr/lib/x86_64-linux-gnu/libgomp.so.1"
else
echo "Unknown OS: '$OS_NAME'"
exit 1
fi fi
DEPS_LIST=( DEPS_LIST=(
@ -109,12 +108,31 @@ DEPS_SONAME=(
"libgomp.so.1" "libgomp.so.1"
) )
# CUDA 11.8 have to ship the libcusparseLt.so.0 with the binary
# since nvidia-cusparselt-cu11 is not available in PYPI
if [[ $USE_CUSPARSELT == "1" && $CUDA_VERSION == "11.8" ]]; then
DEPS_SONAME+=(
"libcusparseLt.so.0"
)
DEPS_LIST+=(
"/usr/local/cuda/lib64/libcusparseLt.so.0"
)
fi
# CUDA_VERSION 12.6, 12.8, 12.9
# Turn USE_CUFILE off for CUDA 11.8, 12.4 since nvidia-cufile-cu11 and 1.9.0.20 are
# not available in PYPI
if [[ $CUDA_VERSION == "11.8" || $CUDA_VERSION == "12.4" ]]; then
export USE_CUFILE=0
fi
# CUDA_VERSION 12.4, 12.6, 12.8
if [[ $CUDA_VERSION == 12* ]]; then if [[ $CUDA_VERSION == 12* ]]; then
export USE_STATIC_CUDNN=0 export USE_STATIC_CUDNN=0
# Try parallelizing nvcc as well # Try parallelizing nvcc as well
export TORCH_NVCC_FLAGS="-Xfatbin -compress-all --threads 2" export TORCH_NVCC_FLAGS="-Xfatbin -compress-all --threads 2"
if [[ -z "$PYTORCH_EXTRA_INSTALL_REQUIREMENTS" ]]; then if [[ -z "$PYTORCH_EXTRA_INSTALL_REQUIREMENTS" ]]; then
echo "Bundling with cudnn and cublas." echo "Bundling with cudnn and cublas."
DEPS_LIST+=( DEPS_LIST+=(
@ -130,12 +148,9 @@ if [[ $CUDA_VERSION == 12* ]]; then
"/usr/local/cuda/lib64/libcublasLt.so.12" "/usr/local/cuda/lib64/libcublasLt.so.12"
"/usr/local/cuda/lib64/libcusparseLt.so.0" "/usr/local/cuda/lib64/libcusparseLt.so.0"
"/usr/local/cuda/lib64/libcudart.so.12" "/usr/local/cuda/lib64/libcudart.so.12"
"/usr/local/cuda/lib64/libnvToolsExt.so.1"
"/usr/local/cuda/lib64/libnvrtc.so.12" "/usr/local/cuda/lib64/libnvrtc.so.12"
"/usr/local/cuda/lib64/libnvrtc-builtins.so" "/usr/local/cuda/lib64/libnvrtc-builtins.so"
"/usr/local/cuda/lib64/libcufile.so.0"
"/usr/local/cuda/lib64/libcufile_rdma.so.1"
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12"
"/usr/local/cuda/extras/CUPTI/lib64/libnvperf_host.so"
) )
DEPS_SONAME+=( DEPS_SONAME+=(
"libcudnn_adv.so.9" "libcudnn_adv.so.9"
@ -150,17 +165,19 @@ if [[ $CUDA_VERSION == 12* ]]; then
"libcublasLt.so.12" "libcublasLt.so.12"
"libcusparseLt.so.0" "libcusparseLt.so.0"
"libcudart.so.12" "libcudart.so.12"
"libnvToolsExt.so.1"
"libnvrtc.so.12" "libnvrtc.so.12"
"libnvrtc-builtins.so" "libnvrtc-builtins.so"
"libcufile.so.0"
"libcufile_rdma.so.1"
"libcupti.so.12"
"libnvperf_host.so"
) )
# Add libnvToolsExt only if CUDA version is not 12.9 if [[ $USE_CUFILE == 1 ]]; then
if [[ $CUDA_VERSION != 12.9* ]]; then DEPS_LIST+=(
DEPS_LIST+=("/usr/local/cuda/lib64/libnvToolsExt.so.1") "/usr/local/cuda/lib64/libcufile.so.0"
DEPS_SONAME+=("libnvToolsExt.so.1") "/usr/local/cuda/lib64/libcufile_rdma.so.1"
)
DEPS_SONAME+=(
"libcufile.so.0"
"libcufile_rdma.so.1"
)
fi fi
else else
echo "Using nvidia libs from pypi." echo "Using nvidia libs from pypi."
@ -174,21 +191,94 @@ if [[ $CUDA_VERSION == 12* ]]; then
'$ORIGIN/../../nvidia/curand/lib' '$ORIGIN/../../nvidia/curand/lib'
'$ORIGIN/../../nvidia/cusolver/lib' '$ORIGIN/../../nvidia/cusolver/lib'
'$ORIGIN/../../nvidia/cusparse/lib' '$ORIGIN/../../nvidia/cusparse/lib'
'$ORIGIN/../../nvidia/cusparselt/lib'
'$ORIGIN/../../cusparselt/lib' '$ORIGIN/../../cusparselt/lib'
'$ORIGIN/../../nvidia/nccl/lib' '$ORIGIN/../../nvidia/nccl/lib'
'$ORIGIN/../../nvidia/nvshmem/lib'
'$ORIGIN/../../nvidia/nvtx/lib' '$ORIGIN/../../nvidia/nvtx/lib'
'$ORIGIN/../../nvidia/cufile/lib' )
if [[ $USE_CUFILE == 1 ]]; then
CUDA_RPATHS+=(
'$ORIGIN/../../nvidia/cufile/lib'
)
fi
CUDA_RPATHS=$(IFS=: ; echo "${CUDA_RPATHS[*]}")
export C_SO_RPATH=$CUDA_RPATHS':$ORIGIN:$ORIGIN/lib'
export LIB_SO_RPATH=$CUDA_RPATHS':$ORIGIN'
export FORCE_RPATH="--force-rpath"
export USE_STATIC_NCCL=0
export USE_SYSTEM_NCCL=1
export ATEN_STATIC_CUDA=0
export USE_CUDA_STATIC_LINK=0
export USE_CUPTI_SO=1
export NCCL_INCLUDE_DIR="/usr/local/cuda/include/"
export NCCL_LIB_DIR="/usr/local/cuda/lib64/"
fi
elif [[ $CUDA_VERSION == "11.8" ]]; then
export USE_STATIC_CUDNN=0
# Try parallelizing nvcc as well
export TORCH_NVCC_FLAGS="-Xfatbin -compress-all --threads 2"
# Bundle ptxas into the wheel, see https://github.com/pytorch/pytorch/pull/119750
export BUILD_BUNDLE_PTXAS=1
if [[ -z "$PYTORCH_EXTRA_INSTALL_REQUIREMENTS" ]]; then
echo "Bundling with cudnn and cublas."
DEPS_LIST+=(
"/usr/local/cuda/lib64/libcudnn_adv.so.9"
"/usr/local/cuda/lib64/libcudnn_cnn.so.9"
"/usr/local/cuda/lib64/libcudnn_graph.so.9"
"/usr/local/cuda/lib64/libcudnn_ops.so.9"
"/usr/local/cuda/lib64/libcudnn_engines_runtime_compiled.so.9"
"/usr/local/cuda/lib64/libcudnn_engines_precompiled.so.9"
"/usr/local/cuda/lib64/libcudnn_heuristic.so.9"
"/usr/local/cuda/lib64/libcudnn.so.9"
"/usr/local/cuda/lib64/libcublas.so.11"
"/usr/local/cuda/lib64/libcublasLt.so.11"
"/usr/local/cuda/lib64/libcudart.so.11.0"
"/usr/local/cuda/lib64/libnvToolsExt.so.1"
"/usr/local/cuda/lib64/libnvrtc.so.11.2" # this is not a mistake, it links to more specific cuda version
"/usr/local/cuda/lib64/libnvrtc-builtins.so.11.8"
)
DEPS_SONAME+=(
"libcudnn_adv.so.9"
"libcudnn_cnn.so.9"
"libcudnn_graph.so.9"
"libcudnn_ops.so.9"
"libcudnn_engines_runtime_compiled.so.9"
"libcudnn_engines_precompiled.so.9"
"libcudnn_heuristic.so.9"
"libcudnn.so.9"
"libcublas.so.11"
"libcublasLt.so.11"
"libcudart.so.11.0"
"libnvToolsExt.so.1"
"libnvrtc.so.11.2"
"libnvrtc-builtins.so.11.8"
)
else
echo "Using nvidia libs from pypi."
CUDA_RPATHS=(
'$ORIGIN/../../nvidia/cublas/lib'
'$ORIGIN/../../nvidia/cuda_cupti/lib'
'$ORIGIN/../../nvidia/cuda_nvrtc/lib'
'$ORIGIN/../../nvidia/cuda_runtime/lib'
'$ORIGIN/../../nvidia/cudnn/lib'
'$ORIGIN/../../nvidia/cufft/lib'
'$ORIGIN/../../nvidia/curand/lib'
'$ORIGIN/../../nvidia/cusolver/lib'
'$ORIGIN/../../nvidia/cusparse/lib'
'$ORIGIN/../../nvidia/nccl/lib'
'$ORIGIN/../../nvidia/nvtx/lib'
)
CUDA_RPATHS=$(IFS=: ; echo "${CUDA_RPATHS[*]}")
export C_SO_RPATH=$CUDA_RPATHS':$ORIGIN:$ORIGIN/lib'
export LIB_SO_RPATH=$CUDA_RPATHS':$ORIGIN'
export FORCE_RPATH="--force-rpath"
export USE_STATIC_NCCL=0
export USE_SYSTEM_NCCL=1
export ATEN_STATIC_CUDA=0
export USE_CUDA_STATIC_LINK=0
export USE_CUPTI_SO=1
export NCCL_INCLUDE_DIR="/usr/local/cuda/include/"
export NCCL_LIB_DIR="/usr/local/cuda/lib64/"
fi
else
echo "Unknown cuda version $CUDA_VERSION"


@@ -22,7 +22,9 @@ retry () {
# TODO move this into the Docker images
OS_NAME=`awk -F= '/^NAME/{print $2}' /etc/os-release`
- if [[ "$OS_NAME" == *"AlmaLinux"* ]]; then
+ if [[ "$OS_NAME" == *"CentOS Linux"* ]]; then
+ retry yum install -q -y zip openssl
+ elif [[ "$OS_NAME" == *"AlmaLinux"* ]]; then
retry yum install -q -y zip openssl
elif [[ "$OS_NAME" == *"Red Hat Enterprise Linux"* ]]; then
retry dnf install -q -y zip openssl
@@ -33,9 +35,6 @@ elif [[ "$OS_NAME" == *"Ubuntu"* ]]; then
sed -i 's/.*nvidia.*/# &/' $(find /etc/apt/ -type f -name "*.list")
retry apt-get update
retry apt-get -y install zip openssl
- else
- echo "Unknown OS: '$OS_NAME'"
- exit 1
fi
# Version: setup.py uses $PYTORCH_BUILD_VERSION.post$PYTORCH_BUILD_NUMBER if
@@ -92,11 +91,16 @@ if [[ -z "$PYTORCH_ROOT" ]]; then
exit 1
fi
pushd "$PYTORCH_ROOT"
- retry pip install -qUr requirements-build.txt
python setup.py clean
retry pip install -qr requirements.txt
retry pip install -q numpy==2.0.1
+ if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
+ export _GLIBCXX_USE_CXX11_ABI=1
+ else
+ export _GLIBCXX_USE_CXX11_ABI=0
+ fi
if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
echo "Calling build_amd.py at $(date)"
python tools/amd_build/build_amd.py
@@ -104,7 +108,7 @@ if [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
export ROCclr_DIR=/opt/rocm/rocclr/lib/cmake/rocclr
fi
- echo "Calling 'python -m pip install .' at $(date)"
+ echo "Calling setup.py install at $(date)"
if [[ $LIBTORCH_VARIANT = *"static"* ]]; then
STATIC_CMAKE_FLAG="-DTORCH_STATIC=1"
@@ -120,7 +124,7 @@ fi
# TODO: Remove this flag once https://github.com/pytorch/pytorch/issues/55952 is closed
CFLAGS='-Wno-deprecated-declarations' \
BUILD_LIBTORCH_CPU_WITH_DEBUG=1 \
- python -m pip install --no-build-isolation -v .
+ python setup.py install
mkdir -p libtorch/{lib,bin,include,share}
@@ -165,6 +169,12 @@ fi
)
+ if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
+ LIBTORCH_ABI="cxx11-abi-"
+ else
+ LIBTORCH_ABI=
+ fi
(
set -x


@@ -95,7 +95,6 @@ ROCM_SO_FILES=(
"libroctracer64.so"
"libroctx64.so"
"libhipblaslt.so"
- "libhipsparselt.so"
"libhiprtc.so"
)
@@ -187,28 +186,20 @@
OS_SO_FILES[${#OS_SO_FILES[@]}]=$file_name # Append lib to array
done
- ARCH=$(echo $PYTORCH_ROCM_ARCH | sed 's/;/|/g') # Replace ; separated arch list to bar for grep
# rocBLAS library files
ROCBLAS_LIB_SRC=$ROCM_HOME/lib/rocblas/library
ROCBLAS_LIB_DST=lib/rocblas/library
- ROCBLAS_ARCH_SPECIFIC_FILES=$(ls $ROCBLAS_LIB_SRC | grep -E $ARCH)
- ROCBLAS_OTHER_FILES=$(ls $ROCBLAS_LIB_SRC | grep -v gfx)
- ROCBLAS_LIB_FILES=($ROCBLAS_ARCH_SPECIFIC_FILES $ROCBLAS_OTHER_FILES)
+ ARCH=$(echo $PYTORCH_ROCM_ARCH | sed 's/;/|/g') # Replace ; seperated arch list to bar for grep
+ ARCH_SPECIFIC_FILES=$(ls $ROCBLAS_LIB_SRC | grep -E $ARCH)
+ OTHER_FILES=$(ls $ROCBLAS_LIB_SRC | grep -v gfx)
+ ROCBLAS_LIB_FILES=($ARCH_SPECIFIC_FILES $OTHER_FILES)
# hipblaslt library files
HIPBLASLT_LIB_SRC=$ROCM_HOME/lib/hipblaslt/library
HIPBLASLT_LIB_DST=lib/hipblaslt/library
- HIPBLASLT_ARCH_SPECIFIC_FILES=$(ls $HIPBLASLT_LIB_SRC | grep -E $ARCH)
- HIPBLASLT_OTHER_FILES=$(ls $HIPBLASLT_LIB_SRC | grep -v gfx)
- HIPBLASLT_LIB_FILES=($HIPBLASLT_ARCH_SPECIFIC_FILES $HIPBLASLT_OTHER_FILES)
+ ARCH_SPECIFIC_FILES=$(ls $HIPBLASLT_LIB_SRC | grep -E $ARCH)
+ OTHER_FILES=$(ls $HIPBLASLT_LIB_SRC | grep -v gfx)
+ HIPBLASLT_LIB_FILES=($ARCH_SPECIFIC_FILES $OTHER_FILES)
- # hipsparselt library files
- HIPSPARSELT_LIB_SRC=$ROCM_HOME/lib/hipsparselt/library
- HIPSPARSELT_LIB_DST=lib/hipsparselt/library
- HIPSPARSELT_ARCH_SPECIFIC_FILES=$(ls $HIPSPARSELT_LIB_SRC | grep -E $ARCH)
- #HIPSPARSELT_OTHER_FILES=$(ls $HIPSPARSELT_LIB_SRC | grep -v gfx)
- HIPSPARSELT_LIB_FILES=($HIPSPARSELT_ARCH_SPECIFIC_FILES $HIPSPARSELT_OTHER_FILES)
# ROCm library files
ROCM_SO_PATHS=()
@@ -243,14 +234,12 @@ DEPS_SONAME=(
DEPS_AUX_SRCLIST=(
"${ROCBLAS_LIB_FILES[@]/#/$ROCBLAS_LIB_SRC/}"
"${HIPBLASLT_LIB_FILES[@]/#/$HIPBLASLT_LIB_SRC/}"
- "${HIPSPARSELT_LIB_FILES[@]/#/$HIPSPARSELT_LIB_SRC/}"
"/opt/amdgpu/share/libdrm/amdgpu.ids"
)
DEPS_AUX_DSTLIST=(
"${ROCBLAS_LIB_FILES[@]/#/$ROCBLAS_LIB_DST/}"
"${HIPBLASLT_LIB_FILES[@]/#/$HIPBLASLT_LIB_DST/}"
- "${HIPSPARSELT_LIB_FILES[@]/#/$HIPSPARSELT_LIB_DST/}"
"share/libdrm/amdgpu.ids"
)


@@ -20,11 +20,7 @@ fi
source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/pti/latest/env/vars.sh
source /opt/intel/oneapi/umf/latest/env/vars.sh
- source /opt/intel/oneapi/ccl/latest/env/vars.sh
- source /opt/intel/oneapi/mpi/latest/env/vars.sh
export USE_STATIC_MKL=1
- export USE_ONEMKL=1
- export USE_XCCL=1
WHEELHOUSE_DIR="wheelhousexpu"
LIBTORCH_HOUSE_DIR="libtorch_housexpu"


@@ -10,3 +10,5 @@ example: `py2-cuda9.0-cudnn7-ubuntu16.04`. The Docker images that are
built on Jenkins and are used in triggered builds already have this
environment variable set in their manifest. Also see
`./docker/jenkins/*/Dockerfile` and search for `BUILD_ENVIRONMENT`.
+ Our Jenkins installation is located at https://ci.pytorch.org/jenkins/.


@@ -19,7 +19,7 @@ git config --global --add safe.directory /var/lib/jenkins/workspace
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# TODO: This can be removed later once vision is also part of the Docker image
- pip install -q --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"
+ pip install -q --user --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# NB: ONNX test is fast (~15m) so it's ok to retry it few more times to avoid any flaky issue, we

.ci/pytorch/build-mobile.sh (new executable file, 34 lines added)

@ -0,0 +1,34 @@
#!/usr/bin/env bash
# DO NOT ADD 'set -x' not to reveal CircleCI secret context environment variables
set -eu -o pipefail
# This script uses linux host toolchain + mobile build options in order to
# build & test mobile libtorch without having to setup Android/iOS
# toolchain/simulator.
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
# Install torch & torchvision - used to download & trace test model.
# Ideally we should use the libtorch built on the PR so that backward
# incompatible changes won't break this script - but it will significantly slow
# down mobile CI jobs.
# Here we install nightly instead of stable so that we have an option to
# temporarily skip mobile CI jobs on BC-breaking PRs until they are in nightly.
retry pip install --pre torch torchvision \
-f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html \
--progress-bar off
# Run end-to-end process of building mobile library, linking into the predictor
# binary, and running forward pass with a real model.
if [[ "$BUILD_ENVIRONMENT" == *-mobile-custom-build-static* ]]; then
TEST_CUSTOM_BUILD_STATIC=1 test/mobile/custom_build/build.sh
elif [[ "$BUILD_ENVIRONMENT" == *-mobile-lightweight-dispatch* ]]; then
test/mobile/lightweight_dispatch/build.sh
else
TEST_DEFAULT_BUILD=1 test/mobile/custom_build/build.sh
fi
print_sccache_stats


@@ -11,6 +11,10 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
+ if [[ "$BUILD_ENVIRONMENT" == *-mobile-*build* ]]; then
+ exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile.sh" "$@"
+ fi
echo "Python version:"
python --version
@ -23,12 +27,6 @@ cmake --version
echo "Environment variables:" echo "Environment variables:"
env env
# The sccache wrapped version of nvcc gets put in /opt/cache/lib in docker since
# there are some issues if it is always wrapped, so we need to add it to PATH
# during CI builds.
# https://github.com/pytorch/pytorch/blob/0b6c0898e6c352c8ea93daec854e704b41485375/.ci/docker/common/install_cache.sh#L97
export PATH="/opt/cache/lib:$PATH"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
# Use jemalloc during compilation to mitigate https://github.com/pytorch/pytorch/issues/116289 # Use jemalloc during compilation to mitigate https://github.com/pytorch/pytorch/issues/116289
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
@ -37,7 +35,7 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
fi fi
if [[ "$BUILD_ENVIRONMENT" == *cuda11* ]]; then if [[ "$BUILD_ENVIRONMENT" == *cuda11* ]]; then
if [[ "$BUILD_ENVIRONMENT" != *clang* ]]; then if [[ "$BUILD_ENVIRONMENT" != *cuda11.3* && "$BUILD_ENVIRONMENT" != *clang* ]]; then
# TODO: there is a linking issue when building with UCC using clang, # TODO: there is a linking issue when building with UCC using clang,
# disable it for now and to be fix later. # disable it for now and to be fix later.
# TODO: disable UCC temporarily to enable CUDA 12.1 in CI # TODO: disable UCC temporarily to enable CUDA 12.1 in CI
@@ -54,6 +52,12 @@ fi
export USE_LLVM=/opt/llvm
export LLVM_DIR=/opt/llvm/lib/cmake/llvm
+ if [[ "$BUILD_ENVIRONMENT" == *executorch* ]]; then
+ # To build test_edge_op_registration
+ export BUILD_EXECUTORCH=ON
+ export USE_CUDA=0
+ fi
if ! which conda; then
# In ROCm CIs, we are doing cross compilation on build machines with
# intel cpu and later run tests on machines with amd cpu.
@@ -120,8 +124,26 @@ if [[ "$BUILD_ENVIRONMENT" == *libtorch* ]]; then
fi
# Use special scripts for Android builds
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
build_args=()
if [[ "${BUILD_ENVIRONMENT}" == *-arm-v7a* ]]; then
build_args+=("-DANDROID_ABI=armeabi-v7a")
elif [[ "${BUILD_ENVIRONMENT}" == *-arm-v8a* ]]; then
build_args+=("-DANDROID_ABI=arm64-v8a")
elif [[ "${BUILD_ENVIRONMENT}" == *-x86_32* ]]; then
build_args+=("-DANDROID_ABI=x86")
elif [[ "${BUILD_ENVIRONMENT}" == *-x86_64* ]]; then
build_args+=("-DANDROID_ABI=x86_64")
fi
if [[ "${BUILD_ENVIRONMENT}" == *vulkan* ]]; then
build_args+=("-DUSE_VULKAN=ON")
fi
build_args+=("-DUSE_LITE_INTERPRETER_PROFILER=OFF")
exec ./scripts/build_android.sh "${build_args[@]}" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == *vulkan* ]]; then if [[ "$BUILD_ENVIRONMENT" != *android* && "$BUILD_ENVIRONMENT" == *vulkan* ]]; then
export USE_VULKAN=1 export USE_VULKAN=1
# shellcheck disable=SC1091 # shellcheck disable=SC1091
source /var/lib/jenkins/vulkansdk/setup-env.sh source /var/lib/jenkins/vulkansdk/setup-env.sh
@@ -149,12 +171,6 @@ fi
if [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
# shellcheck disable=SC1091
source /opt/intel/oneapi/compiler/latest/env/vars.sh
- # shellcheck disable=SC1091
- source /opt/intel/oneapi/ccl/latest/env/vars.sh
- # shellcheck disable=SC1091
- source /opt/intel/oneapi/mpi/latest/env/vars.sh
- # Enable XCCL build
- export USE_XCCL=1
# XPU kineto feature dependencies are not fully ready, disable kineto build as temp WA
export USE_KINETO=0
export TORCH_XPU_ARCH_LIST=pvc
@@ -176,8 +192,10 @@ fi
# We only build FlashAttention files for CUDA 8.0+, and they require large amounts of
# memory to build and will OOM
- if [[ "$BUILD_ENVIRONMENT" == *cuda* ]] && [[ 1 -eq $(echo "${TORCH_CUDA_ARCH_LIST} >= 8.0" | bc) ]]; then
- export BUILD_CUSTOM_STEP="ninja -C build flash_attention -j 2"
+ if [[ "$BUILD_ENVIRONMENT" == *cuda* ]] && [[ 1 -eq $(echo "${TORCH_CUDA_ARCH_LIST} >= 8.0" | bc) ]] && [ -z "$MAX_JOBS_OVERRIDE" ]; then
+ echo "WARNING: FlashAttention files require large amounts of memory to build and will OOM"
+ echo "Setting MAX_JOBS=(nproc-2)/3 to reduce memory usage"
+ export MAX_JOBS="$(( $(nproc --ignore=2) / 3 ))"
fi
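# Illustrative arithmetic (hypothetical runner, not taken from the change itself):
# `nproc --ignore=2` prints the CPU count minus two, so on a 34-CPU machine it
# prints 32 and the line above sets MAX_JOBS=32/3=10, trading build speed for
# lower peak memory while the FlashAttention files compile.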
if [[ "${BUILD_ENVIRONMENT}" == *clang* ]]; then if [[ "${BUILD_ENVIRONMENT}" == *clang* ]]; then
@ -203,7 +221,7 @@ if [[ "${BUILD_ENVIRONMENT}" == *-pch* ]]; then
export USE_PRECOMPILED_HEADERS=1 export USE_PRECOMPILED_HEADERS=1
fi fi
if [[ "${BUILD_ENVIRONMENT}" != *cuda* ]]; then if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* ]]; then
export BUILD_STATIC_RUNTIME_BENCHMARK=ON export BUILD_STATIC_RUNTIME_BENCHMARK=ON
fi fi
@@ -233,7 +251,6 @@ if [[ "$BUILD_ENVIRONMENT" == *-bazel-* ]]; then
set -e -o pipefail
get_bazel
- python3 tools/optional_submodules.py checkout_eigen
# Leave 1 CPU free and use only up to 80% of memory to reduce the change of crashing
# the runner
@@ -260,8 +277,10 @@ else
# or building non-XLA tests.
if [[ "$BUILD_ENVIRONMENT" != *rocm* &&
"$BUILD_ENVIRONMENT" != *xla* ]]; then
- # Install numpy-2.0.2 for builds which are backward compatible with 1.X
- python -mpip install numpy==2.0.2
+ if [[ "$BUILD_ENVIRONMENT" != *py3.8* ]]; then
+ # Install numpy-2.0.2 for builds which are backward compatible with 1.X
+ python -mpip install numpy==2.0.2
+ fi
WERROR=1 python setup.py clean
@@ -284,34 +303,6 @@
fi
pip_install_whl "$(echo dist/*.whl)"
if [[ "${BUILD_ADDITIONAL_PACKAGES:-}" == *vision* ]]; then
install_torchvision
fi
if [[ "${BUILD_ADDITIONAL_PACKAGES:-}" == *audio* ]]; then
install_torchaudio
fi
if [[ "${BUILD_ADDITIONAL_PACKAGES:-}" == *torchrec* || "${BUILD_ADDITIONAL_PACKAGES:-}" == *fbgemm* ]]; then
install_torchrec_and_fbgemm
fi
if [[ "${BUILD_ADDITIONAL_PACKAGES:-}" == *torchao* ]]; then
install_torchao
fi
if [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
echo "Checking that xpu is compiled"
pushd dist/
if python -c 'import torch; exit(0 if torch.xpu._is_compiled() else 1)'; then
echo "XPU support is compiled in."
else
echo "XPU support is NOT compiled in."
exit 1
fi
popd
fi
# TODO: I'm not sure why, but somehow we lose verbose commands
set -x
@@ -387,8 +378,10 @@ else
# This is an attempt to mitigate flaky libtorch build OOM error. By default, the build parallelization
# is set to be the number of CPU minus 2. So, let's try a more conservative value here. A 4xlarge has
# 16 CPUs
- MAX_JOBS=$(nproc --ignore=4)
- export MAX_JOBS
+ if [ -z "$MAX_JOBS_OVERRIDE" ]; then
+ MAX_JOBS=$(nproc --ignore=4)
+ export MAX_JOBS
+ fi
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.


@@ -59,16 +59,78 @@ else
export install_root="$(dirname $(which python))/../lib/python${py_dot}/site-packages/torch/"
fi
###############################################################################
# Setup XPU ENV
###############################################################################
if [[ "$DESIRED_CUDA" == 'xpu' ]]; then
set +u
# Refer https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html
source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/pti/latest/env/vars.sh
fi
###############################################################################
# Check GCC ABI
###############################################################################
- # NOTE: As of https://github.com/pytorch/pytorch/issues/126551 we only produce
- # wheels with cxx11-abi
+ # NOTE [ Building libtorch with old vs. new gcc ABI ]
+ #
+ # Packages built with one version of ABI could not be linked against by client
+ # C++ libraries that were compiled using the other version of ABI. Since both
+ # gcc ABIs are still common in the wild, we need to support both ABIs. Currently:
+ #
+ # - All the nightlies built on CentOS 7 + devtoolset7 use the old gcc ABI.
+ # - All the nightlies built on Ubuntu 16.04 + gcc 5.4 use the new gcc ABI.
echo "Checking that the gcc ABI is what we expect"
if [[ "$(uname)" != 'Darwin' ]]; then
- # We also check that there are cxx11 symbols in libtorch
+ function is_expected() {
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* || "$DESIRED_CUDA" == *"rocm"* ]]; then
if [[ "$1" -gt 0 || "$1" == "ON " ]]; then
echo 1
fi
else
if [[ -z "$1" || "$1" == 0 || "$1" == "OFF" ]]; then
echo 1
fi
fi
}
# First we check that the env var in TorchConfig.cmake is correct
# We search for D_GLIBCXX_USE_CXX11_ABI=1 in torch/TorchConfig.cmake
torch_config="${install_root}/share/cmake/Torch/TorchConfig.cmake"
if [[ ! -f "$torch_config" ]]; then
echo "No TorchConfig.cmake found!"
ls -lah "$install_root/share/cmake/Torch"
exit 1
fi
echo "Checking the TorchConfig.cmake"
cat "$torch_config"
# The sed call below is
# don't print lines by default (only print the line we want)
# -n
# execute the following expression
# e
# replace lines that match with the first capture group and print
# s/.*D_GLIBCXX_USE_CXX11_ABI=\(.\)".*/\1/p
# any characters, D_GLIBCXX_USE_CXX11_ABI=, exactly one any character, a
# quote, any characters
# Note the exactly one single character after the '='. In the case that the
# variable is not set the '=' will be followed by a '"' immediately and the
# line will fail the match and nothing will be printed; this is what we
# want. Otherwise it will capture the 0 or 1 after the '='.
# /.*D_GLIBCXX_USE_CXX11_ABI=\(.\)".*/
# replace the matched line with the capture group and print
# /\1/p
actual_gcc_abi="$(sed -ne 's/.*D_GLIBCXX_USE_CXX11_ABI=\(.\)".*/\1/p' < "$torch_config")"
if [[ "$(is_expected "$actual_gcc_abi")" != 1 ]]; then
echo "gcc ABI $actual_gcc_abi not as expected."
exit 1
fi
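# Illustrative example (assumed file contents): if TorchConfig.cmake carries a
# line such as
#   set(TORCH_CXX_FLAGS "-D_GLIBCXX_USE_CXX11_ABI=1")
# the sed expression above prints "1". If the flag is absent, the '=' is
# immediately followed by '"', the pattern does not match, nothing is printed,
# and $actual_gcc_abi stays empty, which is_expected only accepts for
# non-cxx11-abi builds.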
# We also check that there are [not] cxx11 symbols in libtorch
#
echo "Checking that symbols in libtorch.so have the right gcc abi"
python3 "$(dirname ${BASH_SOURCE[0]})/smoke_test/check_binary_symbols.py"
@@ -146,11 +208,35 @@ setup_link_flags () {
TEST_CODE_DIR="$(dirname $(realpath ${BASH_SOURCE[0]}))/test_example_code"
build_and_run_example_cpp () {
+ if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
+ GLIBCXX_USE_CXX11_ABI=1
+ else
+ GLIBCXX_USE_CXX11_ABI=0
+ fi
setup_link_flags
- g++ ${TEST_CODE_DIR}/$1.cpp -I${install_root}/include -I${install_root}/include/torch/csrc/api/include -std=gnu++17 -L${install_root}/lib ${REF_LIB} ${ADDITIONAL_LINKER_FLAGS} -ltorch $TORCH_CPU_LINK_FLAGS $TORCH_CUDA_LINK_FLAGS $C10_LINK_FLAGS -o $1
+ g++ ${TEST_CODE_DIR}/$1.cpp -I${install_root}/include -I${install_root}/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=$GLIBCXX_USE_CXX11_ABI -std=gnu++17 -L${install_root}/lib ${REF_LIB} ${ADDITIONAL_LINKER_FLAGS} -ltorch $TORCH_CPU_LINK_FLAGS $TORCH_CUDA_LINK_FLAGS $C10_LINK_FLAGS -o $1
./$1
}
build_example_cpp_with_incorrect_abi () {
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
GLIBCXX_USE_CXX11_ABI=0
else
GLIBCXX_USE_CXX11_ABI=1
fi
set +e
setup_link_flags
g++ ${TEST_CODE_DIR}/$1.cpp -I${install_root}/include -I${install_root}/include/torch/csrc/api/include -D_GLIBCXX_USE_CXX11_ABI=$GLIBCXX_USE_CXX11_ABI -std=gnu++17 -L${install_root}/lib ${REF_LIB} ${ADDITIONAL_LINKER_FLAGS} -ltorch $TORCH_CPU_LINK_FLAGS $TORCH_CUDA_LINK_FLAGS $C10_LINK_FLAGS -o $1
ERRCODE=$?
set -e
if [ "$ERRCODE" -eq "0" ]; then
echo "Building example with incorrect ABI didn't throw error. Aborting."
exit 1
else
echo "Building example with incorrect ABI throws expected error. Proceeding."
fi
}
###############################################################################
# Check simple Python/C++ calls
###############################################################################
@@ -160,6 +246,11 @@ if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
fi
build_and_run_example_cpp simple-torch-test
# `_GLIBCXX_USE_CXX11_ABI` is always ignored by gcc in devtoolset7, so we test
# the expected failure case for Ubuntu 16.04 + gcc 5.4 only.
if [[ "$DESIRED_DEVTOOLSET" == *"cxx11-abi"* ]]; then
build_example_cpp_with_incorrect_abi simple-torch-test
fi
else
pushd /tmp
python -c 'import torch'
@@ -216,14 +307,6 @@ else
fi
fi
###############################################################################
# Check XPU configured correctly
###############################################################################
if [[ "$DESIRED_CUDA" == 'xpu' && "$PACKAGE_TYPE" != 'libtorch' ]]; then
echo "Checking that xpu is compiled"
python -c 'import torch; exit(0 if torch.xpu._is_compiled() else 1)'
fi
###############################################################################
# Check CUDA configured correctly
###############################################################################
@@ -302,22 +385,10 @@ except RuntimeError as e:
fi
###############################################################################
- # Check for C++ ABI compatibility to GCC-11 - GCC 13
+ # Check for C++ ABI compatibility between gcc7 and gcc9 compiled binaries
###############################################################################
if [[ "$(uname)" == 'Linux' && "$PACKAGE_TYPE" == 'manywheel' ]]; then
pushd /tmp
- # Per https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html
- # gcc-11 is ABI16, gcc-13 is ABI18, gcc-14 is ABI19
- # gcc 11 - CUDA 11.8, xpu, rocm
- # gcc 13 - CUDA 12.6, 12.8 and cpu
- # Please see issue for reference: https://github.com/pytorch/pytorch/issues/152426
- if [[ "$(uname -m)" == "s390x" ]]; then
- cxx_abi="19"
- elif [[ "$DESIRED_CUDA" != 'xpu' && "$DESIRED_CUDA" != 'rocm'* ]]; then
- cxx_abi="18"
- else
- cxx_abi="16"
- fi
- python -c "import torch; exit(0 if torch._C._PYBIND11_BUILD_ABI == '_cxxabi10${cxx_abi}' else 1)"
+ python -c "import torch; exit(0 if torch.compiled_with_cxx11_abi() else (0 if torch._C._PYBIND11_BUILD_ABI == '_cxxabi1011' else 1))"
popd
fi


@@ -13,13 +13,6 @@ if [[ "$BUILD_ENVIRONMENT" != *win-* ]]; then
fi
if which sccache > /dev/null; then
- # Clear SCCACHE_BUCKET and SCCACHE_REGION if they are empty, otherwise
- # sccache will complain about invalid bucket configuration
- if [[ -z "${SCCACHE_BUCKET:-}" ]]; then
- unset SCCACHE_BUCKET
- unset SCCACHE_REGION
- fi
# Save sccache logs to file
sccache --stop-server > /dev/null 2>&1 || true
rm -f ~/sccache_error.log || true


@@ -13,8 +13,12 @@ if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
# HIP_PLATFORM is auto-detected by hipcc; unset to avoid build errors
unset HIP_PLATFORM
export PYTORCH_TEST_WITH_ROCM=1
+ # temporary to locate some kernel issues on the CI nodes
+ export HSAKMT_DEBUG_LEVEL=4
+ # improve rccl performance for distributed tests
+ export HSA_FORCE_FINE_GRAIN_PCIE=1
fi
- # TODO: Reenable libtorch testing for MacOS, see https://github.com/pytorch/pytorch/issues/62598
+ # TODO: Renable libtorch testing for MacOS, see https://github.com/pytorch/pytorch/issues/62598
# shellcheck disable=SC2034
BUILD_TEST_LIBTORCH=0


@@ -78,34 +78,6 @@ function pip_install_whl() {
fi
}
function pip_build_and_install() {
local build_target=$1
local wheel_dir=$2
local found_whl=0
for file in "${wheel_dir}"/*.whl
do
if [[ -f "${file}" ]]; then
found_whl=1
break
fi
done
# Build the wheel if it doesn't exist
if [ "${found_whl}" == "0" ]; then
python3 -m pip wheel \
--no-build-isolation \
--no-deps \
--no-use-pep517 \
-w "${wheel_dir}" \
"${build_target}"
fi
for file in "${wheel_dir}"/*.whl
do
pip_install_whl "${file}"
done
}
function pip_install() {
# retry 3 times
@@ -152,7 +124,14 @@ function get_pinned_commit() {
function install_torchaudio() {
local commit
commit=$(get_pinned_commit audio)
- pip_build_and_install "git+https://github.com/pytorch/audio.git@${commit}" dist/audio
+ if [[ "$1" == "cuda" ]]; then
+ # TODO: This is better to be passed as a parameter from _linux-test workflow
+ # so that it can be consistent with what is set in build
+ TORCH_CUDA_ARCH_LIST="8.0;8.6" pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git@${commit}"
+ else
+ pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git@${commit}"
+ fi
}
function install_torchtext() {
@@ -160,8 +139,8 @@
local text_commit
data_commit=$(get_pinned_commit data)
text_commit=$(get_pinned_commit text)
- pip_build_and_install "git+https://github.com/pytorch/data.git@${data_commit}" dist/data
- pip_build_and_install "git+https://github.com/pytorch/text.git@${text_commit}" dist/text
+ pip_install --no-use-pep517 --user "git+https://github.com/pytorch/data.git@${data_commit}"
+ pip_install --no-use-pep517 --user "git+https://github.com/pytorch/text.git@${text_commit}"
}
function install_torchvision() {
@@ -174,19 +153,17 @@ function install_torchvision() {
echo 'char* dlerror(void) { return "";}'|gcc -fpic -shared -o "${HOME}/dlerror.so" -x c -
LD_PRELOAD=${orig_preload}:${HOME}/dlerror.so
fi
- if [[ "${BUILD_ENVIRONMENT}" == *cuda* ]]; then
- # Not sure if both are needed, but why not
- export FORCE_CUDA=1
- export WITH_CUDA=1
- fi
- pip_build_and_install "git+https://github.com/pytorch/vision.git@${commit}" dist/vision
+ pip_install --no-use-pep517 --user "git+https://github.com/pytorch/vision.git@${commit}"
if [ -n "${LD_PRELOAD}" ]; then
LD_PRELOAD=${orig_preload}
fi
}
function install_tlparse() {
pip_install --user "tlparse==0.3.30"
PATH="$(python -m site --user-base)/bin:$PATH"
}
function install_torchrec_and_fbgemm() {
local torchrec_commit
torchrec_commit=$(get_pinned_commit torchrec)
@@ -201,77 +178,31 @@ function install_torchrec_and_fbgemm() {
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]] ; then
# install torchrec first because it installs fbgemm nightly on top of rocm fbgemm
- pip_build_and_install "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}" dist/torchrec
+ pip_install --no-use-pep517 --user "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}"
pip_uninstall fbgemm-gpu-nightly
# Set ROCM_HOME isn't available, use ROCM_PATH if set or /opt/rocm
ROCM_HOME="${ROCM_HOME:-${ROCM_PATH:-/opt/rocm}}"
# Find rocm_version.h header file for ROCm version extract
rocm_version_h="${ROCM_HOME}/include/rocm-core/rocm_version.h"
if [ ! -f "$rocm_version_h" ]; then
rocm_version_h="${ROCM_HOME}/include/rocm_version.h"
fi
# Error out if rocm_version.h not found
if [ ! -f "$rocm_version_h" ]; then
echo "Error: rocm_version.h not found in expected locations." >&2
exit 1
fi
# Extract major, minor and patch ROCm version numbers
MAJOR_VERSION=$(grep 'ROCM_VERSION_MAJOR' "$rocm_version_h" | awk '{print $3}')
MINOR_VERSION=$(grep 'ROCM_VERSION_MINOR' "$rocm_version_h" | awk '{print $3}')
PATCH_VERSION=$(grep 'ROCM_VERSION_PATCH' "$rocm_version_h" | awk '{print $3}')
ROCM_INT=$((MAJOR_VERSION * 10000 + MINOR_VERSION * 100 + PATCH_VERSION))
echo "ROCm version: $ROCM_INT"
export BUILD_ROCM_VERSION="$MAJOR_VERSION.$MINOR_VERSION"
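# Illustrative example (hypothetical values): a rocm_version.h reporting
# ROCM_VERSION_MAJOR=6, ROCM_VERSION_MINOR=2 and ROCM_VERSION_PATCH=1 yields
# ROCM_INT=6*10000+2*100+1=60201 and BUILD_ROCM_VERSION="6.2".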
pip_install tabulate # needed for newer fbgemm
pip_install patchelf # needed for rocm fbgemm
- local wheel_dir=dist/fbgemm_gpu
- local found_whl=0
- for file in "${wheel_dir}"/*.whl
- do
- if [[ -f "${file}" ]]; then
- found_whl=1
- break
- fi
- done
- # Build the wheel if it doesn't exist
- if [ "${found_whl}" == "0" ]; then
- git clone --recursive https://github.com/pytorch/fbgemm
- pushd fbgemm/fbgemm_gpu
- git checkout "${fbgemm_commit}" --recurse-submodules
- python setup.py bdist_wheel \
- --build-variant=rocm \
- -DHIP_ROOT_DIR="${ROCM_PATH}" \
- -DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
- -DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"
- popd
- # Save the wheel before cleaning up
- mkdir -p dist/fbgemm_gpu
- cp fbgemm/fbgemm_gpu/dist/*.whl dist/fbgemm_gpu
- fi
- for file in "${wheel_dir}"/*.whl
- do
- pip_install_whl "${file}"
- done
+ git clone --recursive https://github.com/pytorch/fbgemm
+ pushd fbgemm/fbgemm_gpu
+ git checkout "${fbgemm_commit}"
+ python setup.py install \
+ --package_variant=rocm \
+ -DHIP_ROOT_DIR="${ROCM_PATH}" \
+ -DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
+ -DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"
+ popd
rm -rf fbgemm
else
- pip_build_and_install "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}" dist/torchrec
- pip_build_and_install "git+https://github.com/pytorch/FBGEMM.git@${fbgemm_commit}#subdirectory=fbgemm_gpu" dist/fbgemm_gpu
+ # See https://github.com/pytorch/pytorch/issues/106971
+ CUDA_PATH=/usr/local/cuda-12.1 pip_install --no-use-pep517 --user "git+https://github.com/pytorch/FBGEMM.git@${fbgemm_commit}#egg=fbgemm-gpu&subdirectory=fbgemm_gpu"
+ pip_install --no-use-pep517 --user "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}"
fi
}
function clone_pytorch_xla() {
if [[ ! -d ./xla ]]; then
- git clone --recursive --quiet https://github.com/pytorch/xla.git
+ git clone --recursive -b r2.7 https://github.com/pytorch/xla.git
pushd xla
# pin the xla hash so that we don't get broken by changes to xla
git checkout "$(cat ../.github/ci_commit_pins/xla.txt)"
@@ -281,10 +212,34 @@ function clone_pytorch_xla() {
fi
}
function checkout_install_torchbench() {
local commit
commit=$(get_pinned_commit torchbench)
git clone https://github.com/pytorch/benchmark torchbench
pushd torchbench
git checkout "$commit"
if [ "$1" ]; then
python install.py --continue_on_fail models "$@"
else
# Occasionally the installation may fail on one model but it is ok to continue
# to install and test other models
python install.py --continue_on_fail
fi
# TODO (huydhn): transformers-4.44.2 added by https://github.com/pytorch/benchmark/pull/2488
# is regressing speedup metric. This needs to be investigated further
pip install transformers==4.38.1
echo "Print all dependencies after TorchBench is installed"
python -mpip freeze
popd
}
function install_torchao() {
local commit
commit=$(get_pinned_commit torchao)
- pip_build_and_install "git+https://github.com/pytorch/ao.git@${commit}" dist/ao
+ pip_install --no-use-pep517 --user "git+https://github.com/pytorch/ao.git@${commit}"
}
function print_sccache_stats() {


@ -0,0 +1,123 @@
from datetime import datetime, timedelta, timezone
from tempfile import mkdtemp
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID
temp_dir = mkdtemp()
print(temp_dir)
def genrsa(path):
key = rsa.generate_private_key(
public_exponent=65537,
key_size=2048,
)
with open(path, "wb") as f:
f.write(
key.private_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PrivateFormat.TraditionalOpenSSL,
encryption_algorithm=serialization.NoEncryption(),
)
)
return key
def create_cert(path, C, ST, L, O, key):
subject = issuer = x509.Name(
[
x509.NameAttribute(NameOID.COUNTRY_NAME, C),
x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, ST),
x509.NameAttribute(NameOID.LOCALITY_NAME, L),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, O),
]
)
cert = (
x509.CertificateBuilder()
.subject_name(subject)
.issuer_name(issuer)
.public_key(key.public_key())
.serial_number(x509.random_serial_number())
.not_valid_before(datetime.now(timezone.utc))
.not_valid_after(
# Our certificate will be valid for 10 days
datetime.now(timezone.utc) + timedelta(days=10)
)
.add_extension(
x509.BasicConstraints(ca=True, path_length=None),
critical=True,
)
.sign(key, hashes.SHA256())
)
# Write our certificate out to disk.
with open(path, "wb") as f:
f.write(cert.public_bytes(serialization.Encoding.PEM))
return cert
def create_req(path, C, ST, L, O, key):
csr = (
x509.CertificateSigningRequestBuilder()
.subject_name(
x509.Name(
[
# Provide various details about who we are.
x509.NameAttribute(NameOID.COUNTRY_NAME, C),
x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, ST),
x509.NameAttribute(NameOID.LOCALITY_NAME, L),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, O),
]
)
)
.sign(key, hashes.SHA256())
)
with open(path, "wb") as f:
f.write(csr.public_bytes(serialization.Encoding.PEM))
return csr
def sign_certificate_request(path, csr_cert, ca_cert, private_ca_key):
cert = (
x509.CertificateBuilder()
.subject_name(csr_cert.subject)
.issuer_name(ca_cert.subject)
.public_key(csr_cert.public_key())
.serial_number(x509.random_serial_number())
.not_valid_before(datetime.now(timezone.utc))
.not_valid_after(
# Our certificate will be valid for 10 days
datetime.now(timezone.utc) + timedelta(days=10)
# Sign our certificate with our private key
)
.sign(private_ca_key, hashes.SHA256())
)
with open(path, "wb") as f:
f.write(cert.public_bytes(serialization.Encoding.PEM))
return cert
ca_key = genrsa(temp_dir + "/ca.key")
ca_cert = create_cert(
temp_dir + "/ca.pem",
"US",
"New York",
"New York",
"Gloo Certificate Authority",
ca_key,
)
pkey = genrsa(temp_dir + "/pkey.key")
csr = create_req(
temp_dir + "/csr.csr",
"US",
"California",
"San Francisco",
"Gloo Testing Company",
pkey,
)
cert = sign_certificate_request(temp_dir + "/cert.pem", csr, ca_cert, ca_key)


@@ -1,50 +1,31 @@
#!/bin/bash
# Script for installing sccache on the xla build job, which uses xla's docker
- # image, which has sccache installed but doesn't write the stubs. This is
- # mostly copied from .ci/docker/install_cache.sh. Changes are: removing checks
- # that will always return the same thing, ex checks for for rocm, CUDA, changing
- # the path where sccache is installed, not changing /etc/environment, and not
- # installing/downloading sccache as it is already in the docker image.
+ # image and doesn't have sccache installed on it. This is mostly copied from
+ # .ci/docker/install_cache.sh. Changes are: removing checks that will always
+ # return the same thing, ex checks for for rocm, CUDA, and changing the path
+ # where sccache is installed, and not changing /etc/environment.
set -ex -o pipefail
install_binary() {
echo "Downloading sccache binary from S3 repo"
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /tmp/cache/bin/sccache
}
mkdir -p /tmp/cache/bin
mkdir -p /tmp/cache/lib
export PATH="/tmp/cache/bin:$PATH"
install_binary
chmod a+x /tmp/cache/bin/sccache
function write_sccache_stub() {
# Unset LD_PRELOAD for ps because of asan + ps issues
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90589
+ # shellcheck disable=SC2086
+ # shellcheck disable=SC2059
+ printf "#!/bin/sh\nif [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then\n exec sccache $(which $1) \"\$@\"\nelse\n exec $(which $1) \"\$@\"\nfi" > "/tmp/cache/bin/$1"
- if [ "$1" == "gcc" ]; then
- # Do not call sccache recursively when dumping preprocessor argument
- # For some reason it's very important for the first cached nvcc invocation
cat >"/tmp/cache/bin/$1" <<EOF
#!/bin/sh
# sccache does not support -E flag, so we need to call the original compiler directly in order to avoid calling this wrapper recursively
for arg in "\$@"; do
if [ "\$arg" = "-E" ]; then
exec $(which "$1") "\$@"
fi
done
if [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then
exec sccache $(which "$1") "\$@"
else
exec $(which "$1") "\$@"
fi
EOF
else
cat >"/tmp/cache/bin/$1" <<EOF
#!/bin/sh
if [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then
exec sccache $(which "$1") "\$@"
else
exec $(which "$1") "\$@"
fi
EOF
fi
chmod a+x "/tmp/cache/bin/$1" chmod a+x "/tmp/cache/bin/$1"
} }


@@ -33,15 +33,56 @@ if which sccache > /dev/null; then
export PATH="${tmp_dir}:$PATH"
fi
- print_cmake_info
- if [[ ${BUILD_ENVIRONMENT} == *"distributed"* ]]; then
- # Needed for inductor benchmarks, as lots of HF networks make `torch.distribtued` calls
- USE_DISTRIBUTED=1 USE_OPENMP=1 WERROR=1 python setup.py bdist_wheel
- else
+ cross_compile_arm64() {
+ # Cross compilation for arm64
# Explicitly set USE_DISTRIBUTED=0 to align with the default build config on mac. This also serves as the sole CI config that tests
# that building with USE_DISTRIBUTED=0 works at all. See https://github.com/pytorch/pytorch/issues/86448
- USE_DISTRIBUTED=0 USE_OPENMP=1 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel --plat-name macosx_11_0_arm64
+ USE_DISTRIBUTED=0 CMAKE_OSX_ARCHITECTURES=arm64 MACOSX_DEPLOYMENT_TARGET=11.0 USE_MKLDNN=OFF USE_QNNPACK=OFF WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
+ }
compile_arm64() {
# Compilation for arm64
# TODO: Compile with OpenMP support (but this causes CI regressions as cross-compilation were done with OpenMP disabled)
USE_DISTRIBUTED=0 USE_OPENMP=1 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}
compile_x86_64() {
USE_DISTRIBUTED=0 WERROR=1 python setup.py bdist_wheel --plat-name=macosx_10_9_x86_64
}
build_lite_interpreter() {
echo "Testing libtorch (lite interpreter)."
CPP_BUILD="$(pwd)/../cpp_build"
# Ensure the removal of the tmp directory
trap 'rm -rfv ${CPP_BUILD}' EXIT
rm -rf "${CPP_BUILD}"
mkdir -p "${CPP_BUILD}/caffe2"
# It looks libtorch need to be built in "${CPP_BUILD}/caffe2 folder.
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd "${CPP_BUILD}/caffe2" || exit
VERBOSE=1 DEBUG=1 python "${BUILD_LIBTORCH_PY}"
popd || exit
"${CPP_BUILD}/caffe2/build/bin/test_lite_interpreter_runtime"
}
print_cmake_info
if [[ ${BUILD_ENVIRONMENT} = *arm64* ]]; then
if [[ $(uname -m) == "arm64" ]]; then
compile_arm64
else
cross_compile_arm64
fi
elif [[ ${BUILD_ENVIRONMENT} = *lite-interpreter* ]]; then
export BUILD_LITE_INTERPRETER=1
build_lite_interpreter
else
compile_x86_64
fi
if which sccache > /dev/null; then
print_sccache_stats
fi


@@ -20,4 +20,14 @@ print_cmake_info() {
CONDA_INSTALLATION_DIR=$(dirname "$CMAKE_EXEC")
# Print all libraries under cmake rpath for debugging
ls -la "$CONDA_INSTALLATION_DIR/../lib"
export CMAKE_EXEC
# Explicitly add conda env lib folder to cmake rpath to address the flaky issue
# where cmake dependencies couldn't be found. This seems to point to how conda
# links $CMAKE_EXEC to its package cache when cloning a new environment
install_name_tool -add_rpath @executable_path/../lib "${CMAKE_EXEC}" || true
# Adding the rpath will invalidate cmake signature, so signing it again here
# to trust the executable. EXC_BAD_ACCESS (SIGKILL (Code Signature Invalid))
# with an exit code 137 otherwise
codesign -f -s - "${CMAKE_EXEC}" || true
}


@@ -5,6 +5,11 @@ set -x
# shellcheck source=./macos-common.sh
source "$(dirname "${BASH_SOURCE[0]}")/macos-common.sh"
if [[ -n "$CONDA_ENV" ]]; then
# Use binaries under conda environment
export PATH="$CONDA_ENV/bin":$PATH
fi
# Test that OpenMP is enabled
pushd test
if [[ ! $(python -c "import torch; print(int(torch.backends.openmp.is_available()))") == "1" ]]; then
@@ -37,16 +42,6 @@ test_python_all() {
assert_git_not_dirty
}
test_python_mps() {
setup_test_python
time python test/run_test.py --verbose --mps
MTL_CAPTURE_ENABLED=1 ${CONDA_RUN} python3 test/test_mps.py --verbose -k test_metal_capture
assert_git_not_dirty
}
test_python_shard() {
if [[ -z "$NUM_TEST_SHARDS" ]]; then
echo "NUM_TEST_SHARDS must be defined to run a Python test shard"
@@ -157,33 +152,9 @@ test_jit_hooks() {
assert_git_not_dirty
}
# Shellcheck doesn't like it when you pass no arguments to a function
# that can take args. See https://www.shellcheck.net/wiki/SC2120
# shellcheck disable=SC2120
checkout_install_torchbench() {
local commit
commit=$(cat .ci/docker/ci_commit_pins/torchbench.txt)
git clone https://github.com/pytorch/benchmark torchbench
pushd torchbench
git checkout "$commit"
if [ "$1" ]; then
python install.py --continue_on_fail models "$@"
else
# Occasionally the installation may fail on one model but it is ok to continue
# to install and test other models
python install.py --continue_on_fail
fi
echo "Print all dependencies after TorchBench is installed"
python -mpip freeze
popd
}
torchbench_setup_macos() {
git clone --recursive https://github.com/pytorch/vision torchvision
git clone --recursive https://github.com/pytorch/audio torchaudio
- brew install jpeg-turbo libpng
pushd torchvision
git fetch
@@ -198,15 +169,17 @@ torchbench_setup_macos() {
git checkout "$(cat ../.github/ci_commit_pins/audio.txt)"
git submodule update --init --recursive
python setup.py clean
- #TODO: Remove me, when figure out how to make TorchAudio find brew installed openmp
- USE_OPENMP=0 python setup.py develop
+ python setup.py develop
popd
+ # Shellcheck doesn't like it when you pass no arguments to a function that can take args. See https://www.shellcheck.net/wiki/SC2120
+ # shellcheck disable=SC2119,SC2120
checkout_install_torchbench
}
- pip_benchmark_deps() {
- python -mpip install --no-input requests cython scikit-learn six
+ conda_benchmark_deps() {
+ conda install -y astunparse numpy scipy ninja pyyaml setuptools cmake typing-extensions requests protobuf numba cython scikit-learn
+ conda install -y -c conda-forge librosa
}
@@ -214,7 +187,7 @@ test_torchbench_perf() {
print_cmake_info
echo "Launching torchbench setup"
- pip_benchmark_deps
+ conda_benchmark_deps
torchbench_setup_macos
TEST_REPORTS_DIR=$(pwd)/test/test-reports
@@ -241,61 +214,32 @@ test_torchbench_smoketest() {
print_cmake_info
echo "Launching torchbench setup"
- pip_benchmark_deps
+ conda_benchmark_deps
# shellcheck disable=SC2119,SC2120
torchbench_setup_macos
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
+ local backend=eager
+ local dtype=notset
local device=mps
+ touch "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_training_${device}_performance.csv"
+ touch "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_performance.csv"
+ echo "Setup complete, launching torchbench training performance run"
+ for model in hf_T5 llama BERT_pytorch dcgan hf_GPT2 yolov3 resnet152; do
+ PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
+ --performance --only "$model" --backend "$backend" --training --devices "$device" \
+ --output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_training_${device}_performance.csv"
+ done
- local dtypes=(undefined float16 bfloat16 notset)
- local dtype=${dtypes[$1]}
- local models=(hf_T5 llama BERT_pytorch dcgan hf_GPT2 yolov3 resnet152 sam sam_fast pytorch_unet stable_diffusion_text_encoder speech_transformer Super_SloMo doctr_det_predictor doctr_reco_predictor timm_resnet timm_vovnet vgg16)
- for backend in eager inductor; do
- echo "Launching torchbench inference performance run for backend ${backend} and dtype ${dtype}"
- local dtype_arg="--${dtype}"
- if [ "$dtype" == notset ]; then
- dtype_arg="--float32"
- fi
- touch "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_performance.csv"
for model in "${models[@]}"; do
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
--performance --only "$model" --backend "$backend" --inference --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_performance.csv" || true
if [ "$backend" == "inductor" ]; then
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
--accuracy --only "$model" --backend "$backend" --inference --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_accuracy.csv" || true
fi
done
if [ "$backend" == "inductor" ]; then
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/huggingface.py \
--performance --backend "$backend" --inference --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_huggingface_${dtype}_inference_${device}_performance.csv" || true
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/huggingface.py \
--accuracy --backend "$backend" --inference --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_huggingface_${dtype}_inference_${device}_accuracy.csv" || true
fi
if [ "$dtype" == notset ]; then
for dtype_ in notset amp; do
echo "Launching torchbench training performance run for backend ${backend} and dtype ${dtype_}"
touch "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype_}_training_${device}_performance.csv"
local dtype_arg="--${dtype_}"
if [ "$dtype_" == notset ]; then
dtype_arg="--float32"
fi
for model in "${models[@]}"; do
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
--performance --only "$model" --backend "$backend" --training --devices "$device" "$dtype_arg" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype_}_training_${device}_performance.csv" || true
done
done
fi
echo "Launching torchbench inference performance run"
for model in hf_T5 llama BERT_pytorch dcgan hf_GPT2 yolov3 resnet152; do
PYTHONPATH="$(pwd)"/torchbench python benchmarks/dynamo/torchbench.py \
--performance --only "$model" --backend "$backend" --inference --devices "$device" \
--output "$TEST_REPORTS_DIR/inductor_${backend}_torchbench_${dtype}_inference_${device}_performance.csv"
done
echo "Pytorch benchmark on mps device completed"
@@ -305,7 +249,7 @@ test_hf_perf() {
print_cmake_info
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
- pip_benchmark_deps
+ conda_benchmark_deps
torchbench_setup_macos
echo "Launching HuggingFace training perf run"
@@ -321,7 +265,7 @@ test_timm_perf() {
print_cmake_info
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
- pip_benchmark_deps
+ conda_benchmark_deps
torchbench_setup_macos
echo "Launching timm training perf run"
@@ -333,6 +277,8 @@ test_timm_perf() {
echo "timm benchmark on mps device completed"
}
install_tlparse
if [[ $TEST_CONFIG == *"perf_all"* ]]; then
test_torchbench_perf
test_hf_perf
@@ -344,9 +290,7 @@ elif [[ $TEST_CONFIG == *"perf_hf"* ]]; then
elif [[ $TEST_CONFIG == *"perf_timm"* ]]; then
test_timm_perf
elif [[ $TEST_CONFIG == *"perf_smoketest"* ]]; then
- test_torchbench_smoketest "${SHARD_NUMBER}"
- elif [[ $TEST_CONFIG == *"mps"* ]]; then
- test_python_mps
+ test_torchbench_smoketest
elif [[ $NUM_TEST_SHARDS -gt 1 ]]; then
test_python_shard "${SHARD_NUMBER}"
if [[ "${SHARD_NUMBER}" == 1 ]]; then


@ -0,0 +1,22 @@
#!/bin/bash
set -e
run_test () {
rm -rf test_tmp/ && mkdir test_tmp/ && cd test_tmp/
"$@"
cd .. && rm -rf test_tmp/
}
get_runtime_of_command () {
TIMEFORMAT=%R
# runtime=$( { time ($@ &> /dev/null); } 2>&1 1>/dev/null)
runtime=$( { time "$@"; } 2>&1 1>/dev/null)
if [[ $runtime == *"Error"* ]]; then
exit 1
fi
runtime=${runtime#+++ $@}
runtime=$(python -c "print($runtime)")
echo "$runtime"
}


@ -0,0 +1,91 @@
import argparse
import json
import math
import sys
parser = argparse.ArgumentParser()
parser.add_argument(
"--test-name", dest="test_name", action="store", required=True, help="test name"
)
parser.add_argument(
"--sample-stats",
dest="sample_stats",
action="store",
required=True,
help="stats from sample",
)
parser.add_argument(
"--update",
action="store_true",
help="whether to update baseline using stats from sample",
)
args = parser.parse_args()
test_name = args.test_name
if "cpu" in test_name:
backend = "cpu"
elif "gpu" in test_name:
backend = "gpu"
data_file_path = f"../{backend}_runtime.json"
with open(data_file_path) as data_file:
data = json.load(data_file)
if test_name in data:
mean = float(data[test_name]["mean"])
sigma = float(data[test_name]["sigma"])
else:
# Let the test pass if baseline number doesn't exist
mean = sys.maxsize
sigma = 0.001
print("population mean: ", mean)
print("population sigma: ", sigma)
# Let the test pass if baseline number is NaN (which happened in
# the past when we didn't have logic for catching NaN numbers)
if math.isnan(mean) or math.isnan(sigma):
mean = sys.maxsize
sigma = 0.001
sample_stats_data = json.loads(args.sample_stats)
sample_mean = float(sample_stats_data["mean"])
sample_sigma = float(sample_stats_data["sigma"])
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
if math.isnan(sample_mean):
raise Exception("""Error: sample mean is NaN""") # noqa: TRY002
elif math.isnan(sample_sigma):
raise Exception("""Error: sample sigma is NaN""") # noqa: TRY002
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)
if z_value >= 3:
raise Exception( # noqa: TRY002
f"""\n
z-value >= 3, there is high chance of perf regression.\n
To reproduce this regression, run
`cd .ci/pytorch/perf_test/ && bash {test_name}.sh` on your local machine
and compare the runtime before/after your code change.
"""
)
else:
print("z-value < 3, no perf regression detected.")
if args.update:
print("We will use these numbers as new baseline.")
new_data_file_path = f"../new_{backend}_runtime.json"
with open(new_data_file_path) as new_data_file:
new_data = json.load(new_data_file)
new_data[test_name] = {}
new_data[test_name]["mean"] = sample_mean
new_data[test_name]["sigma"] = max(sample_sigma, sample_mean * 0.1)
with open(new_data_file_path, "w") as new_data_file:
json.dump(new_data, new_data_file, indent=4)
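For intuition, here is a minimal sketch of the check this script performs, using made-up numbers (the real baseline mean/sigma come from the ../cpu_runtime.json or ../gpu_runtime.json file referenced above):

```python
# Hypothetical values for illustration only.
baseline_mean, baseline_sigma = 10.0, 0.5  # seconds, from the *_runtime.json baseline
sample_mean = 12.0                         # mean runtime measured in this CI run

# How many baseline standard deviations the sample mean sits above the baseline mean.
z_value = (sample_mean - baseline_mean) / baseline_sigma
print(z_value)  # 4.0 -> >= 3, so this run would be flagged as a likely regression
```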

.ci/pytorch/perf_test/get_stats.py

@@ -0,0 +1,18 @@
import json
import sys
import numpy
sample_data_list = sys.argv[1:]
sample_data_list = [float(v.strip()) for v in sample_data_list]
sample_mean = numpy.mean(sample_data_list)
sample_sigma = numpy.std(sample_data_list)
data = {
"mean": sample_mean,
"sigma": sample_sigma,
}
print(json.dumps(data))
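As a quick illustration with hypothetical runtimes, the JSON string this script prints is what the shell harnesses below pass to compare_with_baseline.py via --sample-stats:

```python
import json

import numpy

# Hypothetical sample of three measured runtimes, in seconds.
sample = [1.23, 1.31, 1.27]
stats = {"mean": numpy.mean(sample), "sigma": numpy.std(sample)}
print(json.dumps(stats))  # -> {"mean": 1.27, "sigma": 0.0326...}
```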

.ci/pytorch/perf_test/test_cpu_speed_mini_sequence_labeler.sh

@@ -0,0 +1,43 @@
#!/bin/bash
set -e
. ./common.sh
test_cpu_speed_mini_sequence_labeler () {
echo "Testing: mini sequence labeler, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 726567a455edbfda6199445922a8cfee82535664
cd scripts/mini_sequence_labeler
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py)
SAMPLE_ARRAY+=("${runtime}")
done
cd ../../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_mini_sequence_labeler "$@"
fi

.ci/pytorch/perf_test/test_cpu_speed_mnist.sh

@@ -0,0 +1,45 @@
#!/bin/bash
set -e
. ./common.sh
test_cpu_speed_mnist () {
echo "Testing: MNIST, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/examples.git -b perftests
cd examples/mnist
conda install -c pytorch torchvision-cpu
# Download data
python main.py --epochs 0
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_mnist "$@"
fi

.ci/pytorch/perf_test/test_cpu_speed_torch.sh

@@ -0,0 +1,29 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_torch () {
echo "Testing: torch.*, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/yf225/perf-tests.git
if [ "$1" == "compare_with_baseline" ]; then
export ARGS=(--compare ../cpu_runtime.json)
elif [ "$1" == "compare_and_update" ]; then
export ARGS=(--compare ../cpu_runtime.json --update ../new_cpu_runtime.json)
elif [ "$1" == "update_only" ]; then
export ARGS=(--update ../new_cpu_runtime.json)
fi
if ! python perf-tests/modules/test_cpu_torch.py "${ARGS[@]}"; then
echo "To reproduce this regression, run \`cd .ci/pytorch/perf_test/ && bash ${FUNCNAME[0]}.sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_torch "$@"
fi

.ci/pytorch/perf_test/test_cpu_speed_torch_tensor.sh

@@ -0,0 +1,29 @@
#!/bin/bash
. ./common.sh
test_cpu_speed_torch_tensor () {
echo "Testing: torch.Tensor.*, CPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/yf225/perf-tests.git
if [ "$1" == "compare_with_baseline" ]; then
export ARGS=(--compare ../cpu_runtime.json)
elif [ "$1" == "compare_and_update" ]; then
export ARGS=(--compare ../cpu_runtime.json --update ../new_cpu_runtime.json)
elif [ "$1" == "update_only" ]; then
export ARGS=(--update ../new_cpu_runtime.json)
fi
if ! python perf-tests/modules/test_cpu_torch_tensor.py "${ARGS[@]}"; then
echo "To reproduce this regression, run \`cd .ci/pytorch/perf_test/ && bash ${FUNCNAME[0]}.sh\` on your local machine and compare the runtime before/after your code change."
exit 1
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_cpu_speed_torch_tensor "$@"
fi

.ci/pytorch/perf_test/test_gpu_speed_cudnn_lstm.sh

@@ -0,0 +1,44 @@
#!/bin/bash
set -e
. ./common.sh
test_gpu_speed_cudnn_lstm () {
echo "Testing: CuDNN LSTM, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python cudnn_lstm.py --skip-cpu-governor-check)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_cudnn_lstm "$@"
fi

.ci/pytorch/perf_test/test_gpu_speed_lstm.sh

@@ -0,0 +1,44 @@
#!/bin/bash
set -e
. ./common.sh
test_gpu_speed_lstm () {
echo "Testing: LSTM, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python lstm.py --skip-cpu-governor-check)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_lstm "$@"
fi

.ci/pytorch/perf_test/test_gpu_speed_mlstm.sh

@@ -0,0 +1,44 @@
#!/bin/bash
set -e
. ./common.sh
test_gpu_speed_mlstm () {
echo "Testing: MLSTM, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/benchmark.git
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python mlstm.py --skip-cpu-governor-check)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_mlstm "$@"
fi

.ci/pytorch/perf_test/test_gpu_speed_mnist.sh

@@ -0,0 +1,48 @@
#!/bin/bash
set -e
. ./common.sh
test_gpu_speed_mnist () {
echo "Testing: MNIST, GPU"
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
git clone https://github.com/pytorch/examples.git -b perftests
cd examples/mnist
conda install -c pytorch torchvision
# Download data
python main.py --epochs 0
SAMPLE_ARRAY=()
NUM_RUNS=$1
# Needs warm up to get accurate number
python main.py --epochs 1 --no-log
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo "$runtime"
SAMPLE_ARRAY+=("${runtime}")
done
cd ../..
stats=$(python ../get_stats.py "${SAMPLE_ARRAY[@]}")
echo "Runtime stats in seconds:"
echo "$stats"
if [ "$2" == "compare_with_baseline" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}"
elif [ "$2" == "compare_and_update" ]; then
python ../compare_with_baseline.py --test-name "${FUNCNAME[0]}" --sample-stats "${stats}" --update
fi
}
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
run_test test_gpu_speed_mnist "$@"
fi

Some files were not shown because too many files have changed in this diff.