Docker images for GitHub CI and CD

This directory contains everything needed to build the Docker images that are used in our CI.

The Dockerfiles located in subdirectories are parameterized to conditionally run build stages depending on build arguments passed to docker build. This lets us use only a few Dockerfiles for many images. The different configurations are identified by a freeform string that we call a build environment. This string is persisted in each image as the BUILD_ENVIRONMENT environment variable.

See build.sh for the valid build environments (they are the cases of the large switch statement).
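The dispatch described above can be pictured as a shell case statement that maps the build environment string to a set of variables. The sketch below is illustrative only and is not the actual contents of build.sh: the function name select_build_env is hypothetical, and the values assigned are assumptions inferred from the image name.

```shell
#!/usr/bin/env bash
# Illustrative sketch only, not the actual contents of build.sh.
# select_build_env is a hypothetical name; the values below are
# assumptions inferred from the image name.
select_build_env() {
  case "$1" in
    pytorch-linux-bionic-py3.8-gcc9)
      ANACONDA_PYTHON_VERSION=3.8
      GCC_VERSION=9
      ;;
    *)
      # Unknown names are rejected rather than silently built.
      echo "Unrecognized build environment: $1" >&2
      return 1
      ;;
  esac
  # The chosen name is the string that gets persisted in the image.
  BUILD_ENVIRONMENT="$1"
}

select_build_env "pytorch-linux-bionic-py3.8-gcc9"
echo "BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}"
```

Keeping every configuration in one case statement is what allows a few Dockerfiles to serve many images: each case only decides which variables are set.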

Docker CI builds

  • build.sh -- dispatch script to launch all builds
  • common -- scripts used to execute individual Docker build stages
  • ubuntu -- Dockerfile for Ubuntu image for CPU build and test jobs
  • ubuntu-cuda -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker
  • ubuntu-rocm -- Dockerfile for Ubuntu image with ROCm support
  • ubuntu-xpu -- Dockerfile for Ubuntu image with XPU support

Docker CD builds

  • conda -- Dockerfile and build.sh to build Docker images used in nightly conda builds
  • manywheel -- Dockerfile and build.sh to build Docker images used in nightly manywheel builds
  • libtorch -- Dockerfile and build.sh to build Docker images used in nightly libtorch builds

Usage

# Build a specific image
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest

# Set flags (see build.sh) and build image
sudo bash -c 'TRITON=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest'

[Guidance] Adding a New Base Docker Image

Background

The base Docker images in directory .ci/docker/ are built by the docker-builds.yml workflow. Those images are used throughout the PyTorch CI/CD pipeline. You should only create or modify a base Docker image if you need specific environment changes or dependencies before building PyTorch on CI.

  1. Automatic Rebuilding:

    • Docker image builds are triggered automatically whenever files under the .ci/docker/ directory change
    • This ensures all images stay up-to-date with the latest dependencies and configurations
  2. Image Reuse in PyTorch Build Workflows (example: linux-build):

    • The images generated by docker-builds.yml are reused in _linux-build.yml through the calculate-docker-image step
    • The _linux-build.yml workflow:
      • Pulls the Docker image determined by the calculate-docker-image step
      • Runs a Docker container with that image
      • Executes .ci/pytorch/build.sh inside the container to build PyTorch
  3. Usage in Test Workflows (example: linux-test):

    • The same Docker images are also used in _linux-test.yml for running tests
    • The _linux-test.yml workflow follows a similar pattern:
      • It uses the calculate-docker-image step to determine which Docker image to use
      • It pulls the Docker image and runs a container with that image
      • It installs the wheels from the artifacts generated by PyTorch build jobs
      • It executes test scripts (like .ci/pytorch/test.sh or .ci/pytorch/multigpu-test.sh) inside the container
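The pull, run, and execute pattern shared by the build and test workflows can be sketched as a dry run that prints the commands instead of running them. This is a simplified illustration under stated assumptions: the helper name print_ci_steps, the image name, and the /workspace mount point are hypothetical, and the real workflows pass many more options to docker run.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands instead of running them.
# print_ci_steps, the image name, and the /workspace mount point are
# hypothetical; the real workflows pass many more options.
print_ci_steps() {
  local image="$1"   # image chosen by the calculate-docker-image step
  local script="$2"  # script to execute inside the container
  echo "docker pull ${image}"
  echo "docker run --rm -v \"\$PWD:/workspace\" -w /workspace ${image} bash ${script}"
}

print_ci_steps "example.registry/ci-image:tag" ".ci/pytorch/build.sh"
```

Passing .ci/pytorch/test.sh as the second argument gives the test-workflow variant of the same pattern.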

Understanding File Purposes

.ci/docker/build.sh vs .ci/pytorch/build.sh

  • .ci/docker/build.sh:

    • Used for building base Docker images
    • Executed by the docker-builds.yml workflow to pre-build Docker images for CI
    • Contains configurations for different Docker build environments
  • .ci/pytorch/build.sh:

    • Used for building PyTorch inside a Docker container
    • Called by workflows like _linux-build.yml after the Docker container is started
    • Builds PyTorch wheels and other artifacts

.ci/docker/ci_commit_pins/ vs .github/ci_commit_pins

  • .ci/docker/ci_commit_pins/:

    • Used for pinning dependency versions during base Docker image building
    • Ensures consistent environments for building PyTorch
    • Changes here trigger base Docker image rebuilds
  • .github/ci_commit_pins:

    • Used for pinning dependency versions during PyTorch building and tests
    • Ensures consistent dependencies for PyTorch across different builds
    • Used by build scripts running inside Docker containers

Step-by-Step Guide for Adding a New Base Docker Image

1. Add Pinned Commits (If Applicable)

We use pinned commits for build stability. The nightly.yml workflow checks and updates pinned commits for certain repository dependencies daily.

If your new Docker image needs a library installed from a specific pinned commit or built from source:

  1. Add the repository you want to track in nightly.yml and merge-rules.yml
  2. Add the initial pinned commit in .ci/docker/ci_commit_pins/. The name of the text file must match the name defined in step 1
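As a rough illustration of how a build script might consume such a pin, the sketch below reads a commit hash from a text file in a pins directory. The helper name read_pinned_commit, the .txt extension, and the demo file name mylib are assumptions made for this example, not a description of the actual scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the commit stored in <pins_dir>/<name>.txt.
# The .txt extension and the file layout are assumptions for this sketch.
read_pinned_commit() {
  local pins_dir="$1" name="$2"
  local pin_file="${pins_dir}/${name}.txt"
  if [ ! -f "$pin_file" ]; then
    echo "No pin file found: ${pin_file}" >&2
    return 1
  fi
  # Drop the trailing newline and any stray whitespace around the hash.
  tr -d '[:space:]' < "$pin_file"
}

# Demo against a throwaway directory so nothing in the repository is touched.
demo_dir="$(mktemp -d)"
echo "0123456789abcdef0123456789abcdef01234567" > "${demo_dir}/mylib.txt"
read_pinned_commit "$demo_dir" mylib
echo
rm -rf "$demo_dir"
```

Failing loudly on a missing file makes a mismatch between the file name and the name registered in step 1 visible early.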

2. Configure the Base Docker Image

  1. Add a new base Docker image configuration (if applicable):

    Add the configuration in .ci/docker/build.sh. For example:

    pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-new1)
      CUDA_VERSION=12.8.1
      ANACONDA_PYTHON_VERSION=3.12
      GCC_VERSION=11
      VISION=yes
      KATEX=yes
      UCX_COMMIT=${_UCX_COMMIT}
      UCC_COMMIT=${_UCC_COMMIT}
      TRITON=yes
      NEW_ARG_1=yes
      ;;
    
  2. Add build arguments to Docker build command:

    If you're introducing a new argument to the Docker build, make sure to add it in the Docker build step in .ci/docker/build.sh:

    docker build \
      ....
      --build-arg "NEW_ARG_1=${NEW_ARG_1}"
    
  3. Update Dockerfile logic:

    Update the Dockerfile to use the new argument. For example, in ubuntu/Dockerfile:

    ARG NEW_ARG_1
    # Set up environment for NEW_ARG_1
    RUN if [ -n "${NEW_ARG_1}" ]; then bash ./do_something.sh; fi
    
  4. Add the Docker configuration in .github/workflows/docker-builds.yml:

    The docker-builds.yml workflow pre-builds the Docker images whenever changes occur in the .ci/docker/ directory. This includes the pinned commit updates.
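To tie steps 2 and 3 together, the sketch below shows one way a variable set in the switch could be forwarded as a build argument and then checked the way the Dockerfile example checks it. It is a minimal sketch, not the code in build.sh: collect_build_args and stage_enabled are hypothetical helper names, and NEW_ARG_1 is the placeholder argument from the example above.

```shell
#!/usr/bin/env bash
# Sketch only: NEW_ARG_1 is the placeholder from the example above;
# collect_build_args and stage_enabled are hypothetical helper names.

# Fill the BUILD_ARGS array with one --build-arg flag per variable name.
collect_build_args() {
  BUILD_ARGS=()
  local name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    BUILD_ARGS+=(--build-arg "${name}=${!name:-}")
  done
}

# Mirrors the Dockerfile check: a stage runs only when its argument is non-empty.
stage_enabled() {
  [ -n "${1:-}" ]
}

NEW_ARG_1=yes
TRITON=""
collect_build_args NEW_ARG_1 TRITON
printf '%s\n' "${BUILD_ARGS[@]}"

if stage_enabled "$NEW_ARG_1"; then
  echo "NEW_ARG_1 stage would run"
fi
```

An argument that is unset or empty is still forwarded, but it fails the -n test, so the corresponding stage is skipped. That is why a new argument has to be added in both places: the switch sets it, and the docker build command passes it through.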