Docker images for GitHub CI and CD
This directory contains everything needed to build the Docker images that are used in our CI.
The Dockerfiles located in subdirectories are parameterized to
conditionally run build stages depending on build arguments passed to
docker build. This lets us use only a few Dockerfiles for many
images. The different configurations are identified by a freeform
string that we call a build environment. This string is persisted in
each image as the BUILD_ENVIRONMENT environment variable.
See build.sh for valid build environments (it's the giant switch).
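A minimal sketch of how such a dispatch can work, not the actual contents of build.sh: a case statement maps the build environment string to a few variables, which are then turned into --build-arg flags. The image names, the PYTHON_VERSION and CUDA_VERSION values, and the function resolve_build_args are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the image names and variables below are illustrative,
# not copied from the real build.sh.

resolve_build_args() {
  local image="$1"
  local python_version="" cuda_version=""

  case "$image" in
    example-linux-py3.12-cpu)
      python_version=3.12
      ;;
    example-linux-py3.12-cuda12.8)
      python_version=3.12
      cuda_version=12.8
      ;;
    *)
      echo "unknown build environment: $image" >&2
      return 1
      ;;
  esac

  # Assumption: the name is forwarded as a build arg so that the
  # Dockerfile can persist it as BUILD_ENVIRONMENT in the image.
  echo "--build-arg BUILD_ENVIRONMENT=${image}" \
       "--build-arg PYTHON_VERSION=${python_version}" \
       "--build-arg CUDA_VERSION=${cuda_version}"
}

resolve_build_args example-linux-py3.12-cuda12.8
```

An unknown name falls through to the default branch and fails early, which is the main benefit of keeping every valid configuration in one switch.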
Docker CI builds
- build.sh -- dispatch script to launch all builds
- common -- scripts used to execute individual Docker build stages
- ubuntu -- Dockerfile for Ubuntu image for CPU build and test jobs
- ubuntu-cuda -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker
- ubuntu-rocm -- Dockerfile for Ubuntu image with ROCm support
- ubuntu-xpu -- Dockerfile for Ubuntu image with XPU support
Docker CD builds
- conda -- Dockerfile and build.sh to build Docker images used in nightly conda builds
- manywheel -- Dockerfile and build.sh to build Docker images used in nightly manywheel builds
- libtorch -- Dockerfile and build.sh to build Docker images used in nightly libtorch builds
Usage
# Build a specific image
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
# Set flags (see build.sh) and build image
sudo bash -c 'TRITON=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest'
[Guidance] Adding a New Base Docker Image
Background
The base Docker images in directory .ci/docker/ are built by the docker-builds.yml workflow. Those images are used throughout the PyTorch CI/CD pipeline. You should only create or modify a base Docker image if you need specific environment changes or dependencies before building PyTorch on CI.
- Automatic Rebuilding:
  - The Docker image building process is triggered automatically when changes are made to files in the .ci/docker/* directory
  - This ensures all images stay up-to-date with the latest dependencies and configurations
- Image Reuse in PyTorch Build Workflows (example: linux-build):
  - The images generated by docker-builds.yml are reused in _linux-build.yml through the calculate-docker-image step
  - The _linux-build.yml workflow:
    - Pulls the Docker image determined by the calculate-docker-image step
    - Runs a Docker container with that image
    - Executes .ci/pytorch/build.sh inside the container to build PyTorch
- Usage in Test Workflows (example: linux-test):
  - The same Docker images are also used in _linux-test.yml for running tests
  - The _linux-test.yml workflow follows a similar pattern:
    - It uses the calculate-docker-image step to determine which Docker image to use
    - It pulls the Docker image and runs a container with that image
    - It installs the wheels from the artifacts generated by PyTorch build jobs
    - It executes test scripts (like .ci/pytorch/test.sh or .ci/pytorch/multigpu-test.sh) inside the container
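To reproduce roughly what these workflows do on a local machine, the pull, run, and execute sequence can be sketched as below. This is an approximation under stated assumptions, not the workflows' actual commands: the image name is a placeholder (in CI it comes from the calculate-docker-image step), the /workspace mount point is arbitrary, and the function print_ci_steps is hypothetical. It only prints the commands so they can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands instead of running them.
# The image name is a placeholder; in CI it is resolved by the
# calculate-docker-image step.

print_ci_steps() {
  local image="$1"   # Docker image to use
  local script="$2"  # script to run inside the container

  echo "docker pull ${image}"
  # Mount the current checkout at /workspace and run the script from there.
  echo "docker run --rm -v \"\$PWD\":/workspace -w /workspace ${image} bash ${script}"
}

print_ci_steps myimage:latest .ci/pytorch/build.sh
print_ci_steps myimage:latest .ci/pytorch/test.sh
```

For the test script, the real workflow first installs the wheels produced by the build job; that step is omitted here.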
Understanding File Purposes
.ci/docker/build.sh vs .ci/pytorch/build.sh
- .ci/docker/build.sh:
  - Used for building base Docker images
  - Executed by the docker-builds.yml workflow to pre-build Docker images for CI
  - Contains configurations for different Docker build environments
- .ci/pytorch/build.sh:
  - Used for building PyTorch inside a Docker container
  - Called by workflows like _linux-build.yml after the Docker container is started
  - Builds PyTorch wheels and other artifacts
.ci/docker/ci_commit_pins/ vs .github/ci_commit_pins
- .ci/docker/ci_commit_pins/:
  - Used for pinning dependency versions during base Docker image building
  - Ensures consistent environments for building PyTorch
  - Changes here trigger base Docker image rebuilds
- .github/ci_commit_pins:
  - Used for pinning dependency versions during PyTorch building and tests
  - Ensures consistent dependencies for PyTorch across different builds
  - Used by build scripts running inside Docker containers
Step-by-Step Guide for Adding a New Base Docker Image
1. Add Pinned Commits (If Applicable)
We use pinned commits for build stability. The nightly.yml workflow checks and updates pinned commits for certain repository dependencies daily.
If your new Docker image needs a library installed from a specific pinned commit or built from source:
- Add the repository you want to track in nightly.yml and merge-rules.yml
- Add the initial pinned commit in .ci/docker/ci_commit_pins/. The text file's name should match the one defined in the first item above
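Before committing a new pin file, a quick sanity check can catch a malformed hash. The sketch below assumes a pin file holds a single full 40-character lowercase Git commit SHA, which may not hold for every pin. The function is_valid_pin and the file name mylib.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a pin is one full 40-character lowercase hex SHA.

is_valid_pin() {
  local pin_file="$1"
  local pin=""
  # Read the first line; tolerate a missing trailing newline.
  IFS= read -r pin < "$pin_file" || [ -n "$pin" ] || return 1
  # Reject anything that is not exactly 40 characters long.
  [ "${#pin}" -eq 40 ] || return 1
  # Reject any character outside 0-9 and a-f.
  case "$pin" in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}

# Example with a throwaway file (mylib.txt is a hypothetical name).
tmpdir="$(mktemp -d)"
printf '%s\n' "0123456789abcdef0123456789abcdef01234567" > "$tmpdir/mylib.txt"
if is_valid_pin "$tmpdir/mylib.txt"; then
  echo "pin looks valid"
else
  echo "pin is malformed"
fi
rm -rf "$tmpdir"
```

The check says nothing about whether the commit exists in the tracked repository; it only guards against truncated or mistyped hashes.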
2. Configure the Base Docker Image
- Add new Base Docker image configuration (if applicable):
  Add the configuration in .ci/docker/build.sh. For example:

      pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-new1)
        CUDA_VERSION=12.8.1
        ANACONDA_PYTHON_VERSION=3.12
        GCC_VERSION=11
        VISION=yes
        KATEX=yes
        UCX_COMMIT=${_UCX_COMMIT}
        UCC_COMMIT=${_UCC_COMMIT}
        TRITON=yes
        NEW_ARG_1=yes
        ;;

- Add build arguments to Docker build command:
  If you're introducing a new argument to the Docker build, make sure to add it in the Docker build step in .ci/docker/build.sh:

      docker build \
        ....
        --build-arg "NEW_ARG_1=${NEW_ARG_1}"

- Update Dockerfile logic:
  Update the Dockerfile to use the new argument. For example, in ubuntu/Dockerfile:

      ARG NEW_ARG_1
      # Set up environment for NEW_ARG_1
      RUN if [ -n "${NEW_ARG_1}" ]; then bash ./do_something.sh; fi

- Add the Docker configuration in .github/workflows/docker-builds.yml:
  The docker-builds.yml workflow pre-builds the Docker images whenever changes occur in the .ci/docker/ directory. This includes the pinned commit updates.
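
One detail of the Dockerfile guard in the ubuntu/Dockerfile example is worth spelling out: the test [ -n "${NEW_ARG_1}" ] checks only whether the value is non-empty, so any non-empty string enables the stage, including "no". The sketch below demonstrates this in plain shell; the function stage_enabled is hypothetical and stands in for the RUN line.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the [ -n ... ] guard from the Dockerfile example.

stage_enabled() {
  local value="$1"
  if [ -n "$value" ]; then
    echo "run"
  else
    echo "skip"
  fi
}

stage_enabled "yes"   # non-empty: the stage runs
stage_enabled ""      # empty or unset build arg: the stage is skipped
stage_enabled "no"    # still non-empty, so the stage runs
```

With a guard of this style, leaving the variable unset in a build.sh configuration is the way to skip the stage; setting it to a "false"-looking value does not.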