Compare commits

..

10 Commits

Author SHA1 Message Date
ce7a76fe2f checkpoint
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
2025-07-17 06:56:28 -07:00
52a4a2c0bb De-abstract premature generalization with InductorWrapper
See the docblock on InductorWrapper for the distinction.  This will
matter in a later refactor PR, where I will change the signature of one
of these but not the other.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 424efbd46723937636f981fdc1ce4ef4c450f680
Pull-Request: https://github.com/pytorch/pytorch/pull/158528
2025-07-16 19:11:26 -07:00
58e06fb0e3 Rename modules in AOTAutograd
Fixes https://github.com/pytorch/pytorch/issues/158382

```
renamed:    torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py -> torch/_functorch/_aot_autograd/graph_capture.py
renamed:    torch/_functorch/_aot_autograd/traced_function_transforms.py -> torch/_functorch/_aot_autograd/graph_capture_wrappers.py
renamed:    torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py -> torch/_functorch/_aot_autograd/graph_compile.py
```

Everything else is ONLY import changes. I did not rename any functions,
even though we probably should have.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 55a78aace396e62401312aed12e520289754cc41
Pull-Request: https://github.com/pytorch/pytorch/pull/158449
2025-07-16 19:11:23 -07:00
c349969ae1 Extract out prepare_aot_module_simplified for use in next PR
Also a small amount of extra code cleanup.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 57a8d92797a15e3909adbce12d6c9799ac876408
Pull-Request: https://github.com/pytorch/pytorch/pull/158319
2025-07-15 12:04:37 -07:00
8c56a9c4d6 Move functions from torch._functorch.aot_autograd that are not frontend functions to frontend_utils
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: fcda69093a13281641acb369f87114f59bb76bef
Pull-Request: https://github.com/pytorch/pytorch/pull/158251
2025-07-15 12:04:36 -07:00
69259c3c92 Introduce stages to aot_dispatch
The starting point for this refactor is that I need access to the fully
general joint graph representation in an export-like interface, but I
then need a way to feed this joint graph into the rest of the
compilation pipeline so that, once I've finished modifying it, I can get
an actual callable to run.  Previously, people had added export
capabilities to AOTAutograd with an export flag that toggled exactly
what the functions return and sent aot_dispatch to a different "export"
implementation, but I've found this difficult to understand, and it has
led to some duplicated code on the export path.
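
Roughly, the shape I'm after is staged, in the spirit of this sketch
(hypothetical code, all names invented; not the actual AOTAutograd API):

```python
# Hypothetical sketch: graph capture and graph compilation as separate
# stages, so an export-style caller can hold, modify, and then resume.
from dataclasses import dataclass
from typing import Callable

@dataclass
class JointGraph:
    """Stand-in for the fully general joint graph representation."""
    passes: list

def capture_joint_graph(fn: Callable) -> JointGraph:
    # Stage 1: trace fn into a joint graph (placeholder logic).
    return JointGraph(passes=[fn.__name__])

def compile_joint_graph(graph: JointGraph) -> Callable:
    # Stage 2: lower the (possibly modified) joint graph to a callable.
    return lambda *args: args

graph = capture_joint_graph(lambda x: x)  # export-like interface stops here
graph.passes.append("custom_pass")        # caller edits the joint graph...
runner = compile_joint_graph(graph)       # ...then resumes the pipeline
```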

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 98401e07b420cc918f7e7c807bd6b0c535b7b451
Pull-Request: https://github.com/pytorch/pytorch/pull/158213
2025-07-15 12:04:36 -07:00
bd9f78cbff Hoist choose_dispatcher to top level, remove unnecessary returns
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 1fc4f6e993adceea2e5f5f8a0133b01fa0a26c0d
Pull-Request: https://github.com/pytorch/pytorch/pull/158176
2025-07-15 12:04:36 -07:00
70d30ea7b0 Pipeline _create_aot_dispatcher_function
Two main things of note:

- Review this diff without whitespace changes
- To ensure that context managers correctly propagate to later pipeline
  stages, I am using the ExitStack trick: there is an ExitStack in scope
  for the entire pipeline, and inside the individual pipeline stages we
  push context managers onto this stack when we want them to survive
  into the next pipeline stage (see the sketch below).  This is not
  obviously the best final form of the code, but
  create_aot_dispatcher_function is called from multiple locations, so I
  can't just inline the context managers into the call site.
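
A minimal sketch of the ExitStack trick (stage names are illustrative,
not the real pipeline stages):

```python
import contextlib

def stage_one(stack: contextlib.ExitStack) -> int:
    # This context manager must stay active for the rest of the pipeline,
    # so push it onto the shared stack instead of a local `with` block.
    stack.enter_context(contextlib.suppress(KeyError))
    return 1

def stage_two(x: int) -> int:
    # Runs while stage_one's context manager is still active.
    return x + 1

def run_pipeline() -> int:
    with contextlib.ExitStack() as stack:  # in scope for the whole pipeline
        x = stage_one(stack)
        return stage_two(x)

print(run_pipeline())  # prints 2
```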

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 58e968d5fe20d463ec7bb40b49f169951ee905b3
Pull-Request: https://github.com/pytorch/pytorch/pull/158173
2025-07-15 12:04:35 -07:00
cac588065a Inline dispatch_and_compile into its call site.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 3fbda2baf11ca5f434dae016c1f83f2bc5fa5559
Pull-Request: https://github.com/pytorch/pytorch/pull/158150
2025-07-15 12:04:35 -07:00
a79a8a6655 Avoid AOTAutogradCache.load in stack trace on cache miss path
The general context for the upcoming stack of commits is that I am
attempting to "pipeline" AOTAutograd.  Instead of having function f call
function g, the next "stage" of compilation, f should return its
outputs, which are then piped to g for the next stage.  This will make
it easier to implement early exit / resume of the pipeline without
forcing a callback structure, which is good for export-style use cases.
It also reduces the size of our stack traces, which makes tools like
Perfetto happy.
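
A hypothetical before/after illustration (function names invented):

```python
# Before: each stage calls the next, so all of compilation runs inside
# one deep call stack and cannot be paused midway.
def stage_f_calls_g(x: int) -> int:
    y = x * 2          # f's own work...
    return stage_g(y)  # ...then f itself drives the next stage

# After: stages only return their outputs; a driver pipes them onward.
def stage_f(x: int) -> int:
    return x * 2

def stage_g(y: int) -> int:
    return y + 1

def run(x: int) -> int:
    # The driver owns control flow: it can stop after stage_f, hand the
    # intermediate to an export-style caller, and resume later.  Each
    # stage also contributes only one frame to the stack trace.
    y = stage_f(x)
    return stage_g(y)

print(run(3))  # prints 7
```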

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: c4d2bc54545fc20673f2048d77669dbb2e4053c8
Pull-Request: https://github.com/pytorch/pytorch/pull/158149
2025-07-15 12:04:35 -07:00
1049 changed files with 26899 additions and 28484 deletions

View File

@@ -2,7 +2,7 @@ build --cxxopt=--std=c++17
build --copt=-I.
# Bazel does not support including its cc_library targets as system
# headers. We work around this for generated code
# (e.g. torch/headeronly/macros/cmake_macros.h) by making the generated directory a
# (e.g. c10/macros/cmake_macros.h) by making the generated directory a
# system include path.
build --copt=-isystem --copt bazel-out/k8-fastbuild/bin
build --copt=-isystem --copt bazel-out/darwin-fastbuild/bin

View File

@@ -5,7 +5,7 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
if [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then
pip install click mock tabulate networkx==2.0
pip -q install "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
pip -q install --user "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
fi
# Skip tests in environments where they are not built/applicable
@@ -147,8 +147,8 @@ export DNNL_MAX_CPU_ISA=AVX2
if [[ "${SHARD_NUMBER:-1}" == "1" ]]; then
# TODO(sdym@meta.com) remove this when the linked issue resolved.
# py is temporary until https://github.com/Teemu/pytest-sugar/issues/241 is fixed
pip install py==1.11.0
pip install pytest-sugar
pip install --user py==1.11.0
pip install --user pytest-sugar
# NB: Warnings are disabled because they make it harder to see what
# the actual erroring test is
"$PYTHON" \

View File

@@ -36,105 +36,3 @@ See `build.sh` for valid build environments (it's the giant switch).
# Set flags (see build.sh) and build image
sudo bash -c 'TRITON=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest'
```
## [Guidance] Adding a New Base Docker Image
### Background
The base Docker images in directory `.ci/docker/` are built by the `docker-builds.yml` workflow. Those images are used throughout the PyTorch CI/CD pipeline. You should only create or modify a base Docker image if you need specific environment changes or dependencies before building PyTorch on CI.
1. **Automatic Rebuilding**:
- The Docker image building process is triggered automatically when changes are made to files in the `.ci/docker/*` directory
- This ensures all images stay up-to-date with the latest dependencies and configurations
2. **Image Reuse in PyTorch Build Workflows** (example: linux-build):
- The images generated by `docker-builds.yml` are reused in `_linux-build.yml` through the `calculate-docker-image` step
- The `_linux-build.yml` workflow:
- Pulls the Docker image determined by the `calculate-docker-image` step
- Runs a Docker container with that image
- Executes `.ci/pytorch/build.sh` inside the container to build PyTorch
3. **Usage in Test Workflows** (example: linux-test):
- The same Docker images are also used in `_linux-test.yml` for running tests
- The `_linux-test.yml` workflow follows a similar pattern:
- It uses the `calculate-docker-image` step to determine which Docker image to use
- It pulls the Docker image and runs a container with that image
- It installs the wheels from the artifacts generated by PyTorch build jobs
- It executes test scripts (like `.ci/pytorch/test.sh` or `.ci/pytorch/multigpu-test.sh`) inside the container
### Understanding File Purposes
#### `.ci/docker/build.sh` vs `.ci/pytorch/build.sh`
- **`.ci/docker/build.sh`**:
- Used for building base Docker images
- Executed by the `docker-builds.yml` workflow to pre-build Docker images for CI
- Contains configurations for different Docker build environments
- **`.ci/pytorch/build.sh`**:
- Used for building PyTorch inside a Docker container
- Called by workflows like `_linux-build.yml` after the Docker container is started
- Builds PyTorch wheels and other artifacts
#### `.ci/docker/ci_commit_pins/` vs `.github/ci_commit_pins`
- **`.ci/docker/ci_commit_pins/`**:
- Used for pinning dependency versions during base Docker image building
- Ensures consistent environments for building PyTorch
- Changes here trigger base Docker image rebuilds
- **`.github/ci_commit_pins`**:
- Used for pinning dependency versions during PyTorch building and tests
- Ensures consistent dependencies for PyTorch across different builds
- Used by build scripts running inside Docker containers
### Step-by-Step Guide for Adding a New Base Docker Image
#### 1. Add Pinned Commits (If Applicable)
We use pinned commits for build stability. The `nightly.yml` workflow checks and updates pinned commits for certain repository dependencies daily.
If your new Docker image needs a library installed from a specific pinned commit or built from source:
1. Add the repository you want to track in `nightly.yml` and `merge-rules.yml`
2. Add the initial pinned commit in `.ci/docker/ci_commit_pins/`. The text filename should match the one defined in step 1
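For example (a hypothetical repository name and a placeholder hash, shown only to illustrate the layout):
```bash
# Hypothetical example: seed the initial pin for a tracked repo "mylib";
# the filename must match the name registered in nightly.yml in step 1.
echo "0123456789abcdef0123456789abcdef01234567" > .ci/docker/ci_commit_pins/mylib.txt
```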
#### 2. Configure the Base Docker Image
1. **Add new Base Docker image configuration** (if applicable):
Add the configuration in `.ci/docker/build.sh`. For example:
```bash
pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-new1)
CUDA_VERSION=12.8.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=11
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
TRITON=yes
NEW_ARG_1=yes
;;
```
2. **Add build arguments to Docker build command**:
If you're introducing a new argument to the Docker build, make sure to add it in the Docker build step in `.ci/docker/build.sh`:
```bash
docker build \
....
--build-arg "NEW_ARG_1=${NEW_ARG_1}"
```
3. **Update Dockerfile logic**:
Update the Dockerfile to use the new argument. For example, in `ubuntu/Dockerfile`:
```dockerfile
ARG NEW_ARG_1
# Set up environment for NEW_ARG_1
RUN if [ -n "${NEW_ARG_1}" ]; then bash ./do_something.sh; fi
```
4. **Add the Docker configuration** in `.github/workflows/docker-builds.yml`:
The `docker-builds.yml` workflow pre-builds the Docker images whenever changes occur in the `.ci/docker/` directory. This includes the
pinned commit updates.
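As a sketch (assuming `docker-builds.yml` keeps its image list in a `docker-image-name` matrix, as the changes to that workflow later in this compare suggest), the image name from step 2 is appended to that list:
```yaml
# Sketch only: add the new image tag to the build matrix in
# .github/workflows/docker-builds.yml (neighboring entries illustrative).
docker-image-name: [
  pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11,
  pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-new1,  # new image from step 2
]
```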

View File

@@ -91,17 +91,6 @@ tag=$(echo $image | awk -F':' '{print $2}')
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$tag" in
pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11)
CUDA_VERSION=12.4
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
TRITON=yes
;;
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11)
CUDA_VERSION=12.8.1
CUDNN_VERSION=9
@@ -160,17 +149,6 @@ case "$tag" in
UCC_COMMIT=${_UCC_COMMIT}
TRITON=yes
;;
pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-vllm)
CUDA_VERSION=12.8.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=11
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
TRITON=yes
;;
pytorch-linux-jammy-cuda12.6-cudnn9-py3-gcc9-inductor-benchmarks)
CUDA_VERSION=12.6
CUDNN_VERSION=9
@@ -242,6 +220,18 @@ case "$tag" in
VISION=yes
TRITON=yes
;;
pytorch-linux-jammy-rocm-n-1-py3)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
VISION=yes
ROCM_VERSION=6.3
NINJA_VERSION=1.9.0
TRITON=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-rocm-n-py3 | pytorch-linux-noble-rocm-n-py3)
if [[ $tag =~ "jammy" ]]; then
ANACONDA_PYTHON_VERSION=3.10
@@ -258,19 +248,6 @@ case "$tag" in
UCC_COMMIT=${_UCC_COMMIT}
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-noble-rocm-alpha-py3)
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=11
VISION=yes
ROCM_VERSION=7.0
NINJA_VERSION=1.9.0
TRITON=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
INDUCTOR_BENCHMARKS=yes
PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx950"
;;
pytorch-linux-jammy-xpu-2025.0-py3)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=11
@@ -287,7 +264,7 @@ case "$tag" in
NINJA_VERSION=1.9.0
TRITON=yes
;;
pytorch-linux-jammy-py3.9-gcc11-inductor-benchmarks)
pytorch-linux-jammy-py3.9-gcc11-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=11
VISION=yes

View File

@@ -1 +1 @@
11ec6354315768a85da41032535e3b7b99c5f706
ae848267bebc65c6181e8cc5e64a6357d2679260

View File

@@ -4,8 +4,12 @@ set -ex
# Optionally install conda
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://github.com/conda-forge/miniforge/releases/latest/download" # @lint-ignore
CONDA_FILE="Miniforge3-Linux-$(uname -m).sh"
BASE_URL="https://repo.anaconda.com/miniconda"
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
if [[ $(uname -m) == "aarch64" ]] || [[ "$BUILD_ENVIRONMENT" == *xpu* ]] || [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
BASE_URL="https://github.com/conda-forge/miniforge/releases/latest/download" # @lint-ignore
CONDA_FILE="Miniforge3-Linux-$(uname -m).sh"
fi
MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1)
MINOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 2)
@@ -17,6 +21,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
exit 1
;;
esac
mkdir -p /opt/conda
chown jenkins:jenkins /opt/conda

View File

@@ -78,19 +78,6 @@ function install_nvshmem {
echo "nvSHMEM ${nvshmem_version} for CUDA ${cuda_major_version} (${arch_path}) installed."
}
function install_124 {
CUDNN_VERSION=9.1.0.70
echo "Installing CUDA 12.4.1 and cuDNN ${CUDNN_VERSION} and NCCL and cuSparseLt-0.6.2"
install_cuda 12.4.1 cuda_12.4.1_550.54.15_linux
install_cudnn 12 $CUDNN_VERSION
CUDA_VERSION=12.4 bash install_nccl.sh
CUDA_VERSION=12.4 bash install_cusparselt.sh
ldconfig
}
function install_126 {
CUDNN_VERSION=9.10.2.21
@@ -126,40 +113,6 @@ function install_129 {
ldconfig
}
function prune_124 {
echo "Pruning CUDA 12.4"
#####################################################################################
# CUDA 12.4 prune static libs
#####################################################################################
export NVPRUNE="/usr/local/cuda-12.4/bin/nvprune"
export CUDA_LIB_DIR="/usr/local/cuda-12.4/lib64"
export GENCODE="-gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
export GENCODE_CUDNN="-gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
if [[ -n "$OVERRIDE_GENCODE" ]]; then
export GENCODE=$OVERRIDE_GENCODE
fi
if [[ -n "$OVERRIDE_GENCODE_CUDNN" ]]; then
export GENCODE_CUDNN=$OVERRIDE_GENCODE_CUDNN
fi
# all CUDA libs except CuDNN and CuBLAS
ls $CUDA_LIB_DIR/ | grep "\.a" | grep -v "culibos" | grep -v "cudart" | grep -v "cudnn" | grep -v "cublas" | grep -v "metis" \
| xargs -I {} bash -c \
"echo {} && $NVPRUNE $GENCODE $CUDA_LIB_DIR/{} -o $CUDA_LIB_DIR/{}"
# prune CuDNN and CuBLAS
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublas_static.a -o $CUDA_LIB_DIR/libcublas_static.a
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublasLt_static.a -o $CUDA_LIB_DIR/libcublasLt_static.a
#####################################################################################
# CUDA 12.4 prune visual tools
#####################################################################################
export CUDA_BASE="/usr/local/cuda-12.4/"
rm -rf $CUDA_BASE/libnvvp $CUDA_BASE/nsightee_plugins $CUDA_BASE/nsight-compute-2024.1.0 $CUDA_BASE/nsight-systems-2023.4.4/
}
function prune_126 {
echo "Pruning CUDA 12.6"
#####################################################################################
@@ -216,8 +169,6 @@ function install_128 {
while test $# -gt 0
do
case "$1" in
12.4) install_124; prune_124
;;
12.6|12.6.*) install_126; prune_126
;;
12.8|12.8.*) install_128;

View File

@@ -8,8 +8,6 @@ if [[ -n "${CUDNN_VERSION}" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.10.2.21_cuda12-archive"
elif [[ ${CUDA_VERSION:0:4} == "12.6" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.10.2.21_cuda12-archive"
elif [[ ${CUDA_VERSION:0:4} == "12.4" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.10.2.21_cuda12-archive"
elif [[ ${CUDA_VERSION:0:2} == "11" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.1.0.70_cuda11-archive"
else

View File

@@ -13,14 +13,6 @@ if [[ ${CUDA_VERSION:0:4} =~ ^12\.[5-9]$ ]]; then
fi
CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.7.1.0-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "12.4" ]]; then
arch_path='sbsa'
export TARGETARCH=${TARGETARCH:-$(uname -m)}
if [ ${TARGETARCH} = 'amd64' ] || [ "${TARGETARCH}" = 'x86_64' ]; then
arch_path='x86_64'
fi
CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.6.2.3-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz
else
echo "Not sure which libcusparselt version to install for this ${CUDA_VERSION}"
fi

View File

@@ -33,22 +33,13 @@ EOF
ROCM_VERSION="${ROCM_VERSION}.1"
fi
# Default url values
rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu"
# Special case for ROCM_VERSION == 7.0
if [[ $(ver "$ROCM_VERSION") -eq $(ver 7.0) ]]; then
rocm_baseurl="https://repo.radeon.com/rocm/apt/7.0_alpha2"
amdgpu_baseurl="https://repo.radeon.com/amdgpu/30.10_alpha2/ubuntu"
fi
# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
echo "deb [arch=amd64] https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
# Add rocm repository
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
local rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
echo "deb [arch=amd64] ${rocm_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/rocm.list
apt-get update --allow-insecure-repositories
@@ -82,30 +73,30 @@ EOF
done
# ROCm 6.3 had a regression where initializing static code objects had significant overhead
# CI no longer builds for ROCm 6.3, but
# ROCm 6.4 did not yet fix the regression, also HIP branch names are different
if [[ $(ver $ROCM_VERSION) -ge $(ver 6.4) ]] && [[ $(ver $ROCM_VERSION) -lt $(ver 7.0) ]]; then
if [[ $(ver $ROCM_VERSION) -ge $(ver 6.3) ]] && [[ $(ver $ROCM_VERSION) -lt $(ver 7.0) ]]; then
if [[ $(ver $ROCM_VERSION) -eq $(ver 6.4.1) ]]; then
HIP_BRANCH=release/rocm-rel-6.4
CLR_HASH=606bc820b4b1f315d135da02a1f0b176ca50a92c # branch release/rocm-rel-6.4.1-statco-hotfix
VER_STR=6.4
VER_PATCH=.1
elif [[ $(ver $ROCM_VERSION) -eq $(ver 6.4) ]]; then
HIP_BRANCH=release/rocm-rel-6.4
CLR_HASH=600f5b0d2baed94d5121e2174a9de0851b040b0c # branch release/rocm-rel-6.4-statco-hotfix
VER_STR=6.4
elif [[ $(ver $ROCM_VERSION) -eq $(ver 6.3) ]]; then
HIP_BRANCH=rocm-6.3.x
VER_STR=6.3
fi
# clr build needs CppHeaderParser but can only find it using conda's python
python -m pip install CppHeaderParser
git clone https://github.com/ROCm/HIP -b $HIP_BRANCH
HIP_COMMON_DIR=$(readlink -f HIP)
git clone https://github.com/jeffdaily/clr
pushd clr
git checkout $CLR_HASH
popd
git clone https://github.com/jeffdaily/clr -b release/rocm-rel-${VER_STR}${VER_PATCH}-statco-hotfix
mkdir -p clr/build
pushd clr/build
# Need to point CMake to the correct python installation to find CppHeaderParser
cmake .. -DPython3_EXECUTABLE=/opt/conda/envs/py_${ANACONDA_PYTHON_VERSION}/bin/python3 -DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=$HIP_COMMON_DIR
make -j
cp hipamd/lib/libamdhip64.so.6.4.* /opt/rocm/lib/libamdhip64.so.6.4.*
cp hipamd/lib/libamdhip64.so.${VER_STR}.* /opt/rocm/lib/libamdhip64.so.${VER_STR}.*
popd
rm -rf HIP clr
fi

View File

@@ -27,7 +27,5 @@ COPY ./common/install_linter.sh install_linter.sh
RUN bash ./install_linter.sh
RUN rm install_linter.sh
RUN chown -R jenkins:jenkins /var/lib/jenkins/ci_env
USER jenkins
CMD ["bash"]

View File

@@ -367,6 +367,7 @@ pyyaml
pyzstd
setuptools>=70.1.0
six
wheel
scons==4.5.2 ; platform_machine == "aarch64"
@@ -389,9 +390,3 @@ tlparse==0.3.30
cuda-bindings>=12.0,<13.0 ; platform_machine != "s390x"
#Description: required for testing CUDAGraph::raw_cuda_graph(). See https://nvidia.github.io/cuda-python/cuda-bindings/latest/support.html for how this version was chosen. Note "Any fix in the latest bindings would be backported to the prior major version" means that only the newest version of cuda-bindings will get fixes. Depending on the latest version of 12.x is okay because all 12.y versions will be supported via "CUDA minor version compatibility". Pytorch builds against 13.z versions of cuda toolkit work with 12.x versions of cuda-bindings as well because newer drivers work with old toolkits.
#test that import: test_cuda.py
setuptools-git-versioning==2.1.0
scikit-build==0.18.1
pyre-extensions==0.0.32
tabulate==0.9.0
#Description: These package are needed to build FBGEMM and torchrec on PyTorch CI

View File

@@ -4,7 +4,7 @@ sphinx==5.3.0
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git@pytorch_sphinx_theme2#egg=pytorch_sphinx_theme2
# TODO: sphinxcontrib.katex 0.9.0 adds a local KaTeX server to speed up pre-rendering
# but it doesn't seem to work and hangs around idly. The initial thought that it is probably
# but it doesn't seem to work and hangs around idly. The initial thought is probably
# something related to Docker setup. We can investigate this later.
sphinxcontrib.katex==0.8.6
@@ -59,4 +59,3 @@ sphinx-copybutton==0.5.0
sphinx-design==0.4.0
sphinxcontrib-mermaid==1.0.0
myst-parser==0.18.1
myst-nb

View File

@@ -97,7 +97,8 @@ if [[ -z "$PYTORCH_ROOT" ]]; then
exit 1
fi
pushd "$PYTORCH_ROOT"
retry pip install -qUr requirements-build.txt
retry pip install -q "setuptools>=70.1.0" packaging
retry pip install -qU cmake ninja
python setup.py clean
retry pip install -qr requirements.txt
case ${DESIRED_PYTHON} in

View File

@@ -92,7 +92,8 @@ if [[ -z "$PYTORCH_ROOT" ]]; then
exit 1
fi
pushd "$PYTORCH_ROOT"
retry pip install -qUr requirements-build.txt
retry pip install -q "setuptools>=70.1.0" packaging
retry pip install -qU cmake ninja
python setup.py clean
retry pip install -qr requirements.txt
retry pip install -q numpy==2.0.1

View File

@@ -19,7 +19,7 @@ git config --global --add safe.directory /var/lib/jenkins/workspace
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# TODO: This can be removed later once vision is also part of the Docker image
pip install -q --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"
pip install -q --user --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# NB: ONNX test is fast (~15m) so it's ok to retry it few more times to avoid any flaky issue, we

View File

@@ -306,22 +306,6 @@ else
fi
pip_install_whl "$(echo dist/*.whl)"
if [[ "${BUILD_ADDITIONAL_PACKAGES:-}" == *vision* ]]; then
install_torchvision
fi
if [[ "${BUILD_ADDITIONAL_PACKAGES:-}" == *audio* ]]; then
install_torchaudio
fi
if [[ "${BUILD_ADDITIONAL_PACKAGES:-}" == *torchrec* || "${BUILD_ADDITIONAL_PACKAGES:-}" == *fbgemm* ]]; then
install_torchrec_and_fbgemm
fi
if [[ "${BUILD_ADDITIONAL_PACKAGES:-}" == *torchao* ]]; then
install_torchao
fi
if [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
echo "Checking that xpu is compiled"
pushd dist/

View File

@@ -78,34 +78,6 @@ function pip_install_whl() {
fi
}
function pip_build_and_install() {
local build_target=$1
local wheel_dir=$2
local found_whl=0
for file in "${wheel_dir}"/*.whl
do
if [[ -f "${file}" ]]; then
found_whl=1
break
fi
done
# Build the wheel if it doesn't exist
if [ "${found_whl}" == "0" ]; then
python3 -m pip wheel \
--no-build-isolation \
--no-deps \
--no-use-pep517 \
-w "${wheel_dir}" \
"${build_target}"
fi
for file in "${wheel_dir}"/*.whl
do
pip_install_whl "${file}"
done
}
function pip_install() {
# retry 3 times
@@ -152,7 +124,14 @@ function get_pinned_commit() {
function install_torchaudio() {
local commit
commit=$(get_pinned_commit audio)
pip_build_and_install "git+https://github.com/pytorch/audio.git@${commit}" dist/audio
if [[ "$1" == "cuda" ]]; then
# TODO: This is better to be passed as a parameter from _linux-test workflow
# so that it can be consistent with what is set in build
TORCH_CUDA_ARCH_LIST="8.0;8.6" pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git@${commit}"
else
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git@${commit}"
fi
}
function install_torchtext() {
@@ -160,8 +139,8 @@ function install_torchtext() {
local text_commit
data_commit=$(get_pinned_commit data)
text_commit=$(get_pinned_commit text)
pip_build_and_install "git+https://github.com/pytorch/data.git@${data_commit}" dist/data
pip_build_and_install "git+https://github.com/pytorch/text.git@${text_commit}" dist/text
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/data.git@${data_commit}"
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/text.git@${text_commit}"
}
function install_torchvision() {
@@ -174,14 +153,7 @@ function install_torchvision() {
echo 'char* dlerror(void) { return "";}'|gcc -fpic -shared -o "${HOME}/dlerror.so" -x c -
LD_PRELOAD=${orig_preload}:${HOME}/dlerror.so
fi
if [[ "${BUILD_ENVIRONMENT}" == *cuda* ]]; then
# Not sure if both are needed, but why not
export FORCE_CUDA=1
export WITH_CUDA=1
fi
pip_build_and_install "git+https://github.com/pytorch/vision.git@${commit}" dist/vision
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/vision.git@${commit}"
if [ -n "${LD_PRELOAD}" ]; then
LD_PRELOAD=${orig_preload}
fi
@@ -201,73 +173,25 @@ function install_torchrec_and_fbgemm() {
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]] ; then
# install torchrec first because it installs fbgemm nightly on top of rocm fbgemm
pip_build_and_install "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}" dist/torchrec
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}"
pip_uninstall fbgemm-gpu-nightly
# Set ROCM_HOME isn't available, use ROCM_PATH if set or /opt/rocm
ROCM_HOME="${ROCM_HOME:-${ROCM_PATH:-/opt/rocm}}"
# Find rocm_version.h header file for ROCm version extract
rocm_version_h="${ROCM_HOME}/include/rocm-core/rocm_version.h"
if [ ! -f "$rocm_version_h" ]; then
rocm_version_h="${ROCM_HOME}/include/rocm_version.h"
fi
# Error out if rocm_version.h not found
if [ ! -f "$rocm_version_h" ]; then
echo "Error: rocm_version.h not found in expected locations." >&2
exit 1
fi
# Extract major, minor and patch ROCm version numbers
MAJOR_VERSION=$(grep 'ROCM_VERSION_MAJOR' "$rocm_version_h" | awk '{print $3}')
MINOR_VERSION=$(grep 'ROCM_VERSION_MINOR' "$rocm_version_h" | awk '{print $3}')
PATCH_VERSION=$(grep 'ROCM_VERSION_PATCH' "$rocm_version_h" | awk '{print $3}')
ROCM_INT=$((MAJOR_VERSION * 10000 + MINOR_VERSION * 100 + PATCH_VERSION))
echo "ROCm version: $ROCM_INT"
export BUILD_ROCM_VERSION="$MAJOR_VERSION.$MINOR_VERSION"
pip_install tabulate # needed for newer fbgemm
pip_install patchelf # needed for rocm fbgemm
pushd /tmp
local wheel_dir=dist/fbgemm_gpu
local found_whl=0
for file in "${wheel_dir}"/*.whl
do
if [[ -f "${file}" ]]; then
found_whl=1
break
fi
done
# Build the wheel if it doesn't exist
if [ "${found_whl}" == "0" ]; then
git clone --recursive https://github.com/pytorch/fbgemm
pushd fbgemm/fbgemm_gpu
git checkout "${fbgemm_commit}"
python setup.py bdist_wheel \
--build-variant=rocm \
-DHIP_ROOT_DIR="${ROCM_PATH}" \
-DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
-DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"
popd
# Save the wheel before cleaning up
mkdir -p dist/fbgemm_gpu
cp fbgemm/fbgemm_gpu/dist/*.whl dist/fbgemm_gpu
fi
for file in "${wheel_dir}"/*.whl
do
pip_install_whl "${file}"
done
rm -rf fbgemm
git clone --recursive https://github.com/pytorch/fbgemm
pushd fbgemm/fbgemm_gpu
git checkout "${fbgemm_commit}"
python setup.py install \
--package_variant=rocm \
-DHIP_ROOT_DIR="${ROCM_PATH}" \
-DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
-DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"
popd
rm -rf fbgemm
else
pip_build_and_install "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}" dist/torchrec
pip_build_and_install "git+https://github.com/pytorch/FBGEMM.git@${fbgemm_commit}#subdirectory=fbgemm_gpu" dist/fbgemm_gpu
# See https://github.com/pytorch/pytorch/issues/106971
CUDA_PATH=/usr/local/cuda-12.1 pip_install --no-use-pep517 --user "git+https://github.com/pytorch/FBGEMM.git@${fbgemm_commit}#egg=fbgemm-gpu&subdirectory=fbgemm_gpu"
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}"
fi
}
@@ -310,7 +234,7 @@ function checkout_install_torchbench() {
function install_torchao() {
local commit
commit=$(get_pinned_commit torchao)
pip_build_and_install "git+https://github.com/pytorch/ao.git@${commit}" dist/ao
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/ao.git@${commit}"
}
function print_sccache_stats() {

View File

@@ -74,13 +74,12 @@ else
fi
# Environment initialization
retry pip install -qUr requirements-build.txt
if [[ "$(uname)" == Darwin ]]; then
# Install the testing dependencies
retry pip install -q future hypothesis ${NUMPY_PACKAGE} ${PROTOBUF_PACKAGE} pytest
retry pip install -q future hypothesis ${NUMPY_PACKAGE} ${PROTOBUF_PACKAGE} pytest setuptools six typing_extensions pyyaml
else
retry pip install -qr requirements.txt || true
retry pip install -q hypothesis protobuf pytest || true
retry pip install -q hypothesis protobuf pytest setuptools || true
numpy_ver=1.15
case "$(python --version 2>&1)" in
*2* | *3.5* | *3.6*)

View File

@@ -201,7 +201,7 @@ fi
if [[ "$BUILD_ENVIRONMENT" != *-bazel-* ]] ; then
# JIT C++ extensions require ninja.
pip_install "ninja==1.10.2"
pip_install --user "ninja==1.10.2"
# ninja is installed in $HOME/.local/bin, e.g., /var/lib/jenkins/.local/bin for CI user jenkins
# but this script should be runnable by any user, including root
export PATH="$HOME/.local/bin:$PATH"
@@ -289,12 +289,6 @@ elif [[ $TEST_CONFIG == 'nogpu_AVX512' ]]; then
export ATEN_CPU_CAPABILITY=avx2
fi
if [[ "${TEST_CONFIG}" == "legacy_nvidia_driver" ]]; then
# Make sure that CUDA can be initialized
(cd test && python -c "import torch; torch.rand(2, 2, device='cuda')")
export USE_LEGACY_DRIVER=1
fi
test_python_legacy_jit() {
time python test/run_test.py --include test_jit_legacy test_jit_fuser_legacy --verbose
assert_git_not_dirty
@@ -345,12 +339,6 @@ test_h100_symm_mem() {
assert_git_not_dirty
}
test_h100_cutlass_backend() {
# cutlass backend tests for H100
TORCHINDUCTOR_CUTLASS_DIR=$(realpath "./third_party/cutlass") python test/run_test.py --include inductor/test_cutlass_backend -k "not addmm" $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
TORCHINDUCTOR_CUTLASS_DIR=$(realpath "./third_party/cutlass") python test/run_test.py --include inductor/test_cutlass_evt $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
}
test_lazy_tensor_meta_reference_disabled() {
export TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE=1
echo "Testing lazy tensor operations without meta reference"
@@ -508,7 +496,7 @@ DYNAMO_BENCHMARK_FLAGS=()
pr_time_benchmarks() {
pip_install "fbscribelogger"
pip_install --user "fbscribelogger"
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
@@ -1483,8 +1471,8 @@ test_bazel() {
test_benchmarks() {
if [[ "$BUILD_ENVIRONMENT" == *cuda* && $TEST_CONFIG != *nogpu* ]]; then
pip_install "pytest-benchmark==3.2.3"
pip_install "requests"
pip_install --user "pytest-benchmark==3.2.3"
pip_install --user "requests"
BENCHMARK_DATA="benchmarks/.data"
mkdir -p ${BENCHMARK_DATA}
pytest benchmarks/fastrnns/test_bench.py --benchmark-sort=Name --benchmark-json=${BENCHMARK_DATA}/fastrnns_default.json --fuser=default --executor=default
@@ -1612,13 +1600,7 @@ if ! [[ "${BUILD_ENVIRONMENT}" == *libtorch* || "${BUILD_ENVIRONMENT}" == *-baze
fi
if [[ "${TEST_CONFIG}" == *numpy_2* ]]; then
# Install numpy-2.0.2 and compatible scipy & numba versions
# Force re-install of pandas to avoid error where pandas checks numpy version from initial install and fails upon import
TMP_PANDAS_VERSION=$(python -c "import pandas; print(pandas.__version__)" 2>/dev/null)
if [ -n "$TMP_PANDAS_VERSION" ]; then
python -m pip install --pre numpy==2.0.2 scipy==1.13.1 numba==0.60.0 pandas=="$TMP_PANDAS_VERSION" --force-reinstall
else
python -m pip install --pre numpy==2.0.2 scipy==1.13.1 numba==0.60.0
fi
python -mpip install --pre numpy==2.0.2 scipy==1.13.1 numba==0.60.0
python test/run_test.py --include dynamo/test_functions.py dynamo/test_unspec.py test_binary_ufuncs.py test_fake_tensor.py test_linalg.py test_numpy_interop.py test_tensor_creation_ops.py test_torch.py torch_np/test_basic.py
elif [[ "${BUILD_ENVIRONMENT}" == *aarch64* && "${TEST_CONFIG}" != *perf_cpu_aarch64* ]]; then
test_linux_aarch64
@@ -1672,19 +1654,23 @@ elif [[ "${TEST_CONFIG}" == *timm* ]]; then
id=$((SHARD_NUMBER-1))
test_dynamo_benchmark timm_models "$id"
elif [[ "${TEST_CONFIG}" == cachebench ]]; then
install_torchaudio
install_torchaudio cuda
install_torchvision
checkout_install_torchbench nanogpt BERT_pytorch resnet50 hf_T5 llama moco
PYTHONPATH=$(pwd)/torchbench test_cachebench
elif [[ "${TEST_CONFIG}" == verify_cachebench ]]; then
install_torchaudio
install_torchaudio cpu
install_torchvision
checkout_install_torchbench nanogpt
PYTHONPATH=$(pwd)/torchbench test_verify_cachebench
elif [[ "${TEST_CONFIG}" == *torchbench* ]]; then
install_torchaudio
if [[ "${TEST_CONFIG}" == *cpu* ]]; then
install_torchaudio cpu
else
install_torchaudio cuda
fi
install_torchvision
install_torchao
TORCH_CUDA_ARCH_LIST="8.0;8.6" install_torchao
id=$((SHARD_NUMBER-1))
# https://github.com/opencv/opencv-python/issues/885
pip_install opencv-python==4.8.0.74
@@ -1775,8 +1761,6 @@ elif [[ "${TEST_CONFIG}" == h100_distributed ]]; then
test_h100_distributed
elif [[ "${TEST_CONFIG}" == "h100-symm-mem" ]]; then
test_h100_symm_mem
elif [[ "${TEST_CONFIG}" == h100_cutlass_backend ]]; then
test_h100_cutlass_backend
else
install_torchvision
install_monkeytype

View File

@@ -1,34 +0,0 @@
# If you want to rebuild, run this with $env:REBUILD=1
# If you want to build with CUDA, run this with $env:USE_CUDA=1
# If you want to build without CUDA, run this with $env:USE_CUDA=0
# Check for setup.py in the current directory
if (-not (Test-Path "setup.py")) {
Write-Host "ERROR: Please run this build script from PyTorch root directory."
exit 1
}
# Get the script's parent directory
$ScriptParentDir = Split-Path -Parent $MyInvocation.MyCommand.Definition
# Set TMP_DIR and convert to Windows path
$env:TMP_DIR = Join-Path (Get-Location) "build\win_tmp"
$env:TMP_DIR_WIN = $env:TMP_DIR # Already in Windows format, no cygpath needed
# Set final package directory with default fallback
if (-not $env:PYTORCH_FINAL_PACKAGE_DIR) {
$env:PYTORCH_FINAL_PACKAGE_DIR = "C:\w\build-results"
}
# Create the final package directory if it doesn't exist
if (-not (Test-Path $env:PYTORCH_FINAL_PACKAGE_DIR)) {
New-Item -Path $env:PYTORCH_FINAL_PACKAGE_DIR -ItemType Directory -Force | Out-Null
}
# Set script helpers directory
$env:SCRIPT_HELPERS_DIR = Join-Path $ScriptParentDir "win-test-helpers\arm64"
# Run the main build script
& "$env:SCRIPT_HELPERS_DIR\build_pytorch.ps1"
Write-Host "BUILD PASSED"

View File

@@ -1,24 +0,0 @@
#!/bin/bash
set -ex -o pipefail
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
# shellcheck source=./common.sh
source "$SCRIPT_PARENT_DIR/common.sh"
run_tests() {
echo Running smoke_test.py...
python ./.ci/pytorch/smoke_test/smoke_test.py --package torchonly
echo Running test_autograd.oy, test_nn.py, test_torch.py...
cd test
CORE_TEST_LIST=("test_autograd.py" "test_nn.py" "test_modules.py")
for t in "${CORE_TEST_LIST[@]}"; do
echo "Running test: $t"
python "$t" --verbose --save-xml --use-pytest -vvvv -rfEsxXP -p no:xdist
done
}
run_tests
echo "TEST PASSED"

View File

@@ -1,98 +0,0 @@
# TODO: we may can use existing build_pytorch.bat for arm64
if ($env:DEBUG -eq "1") {
$env:BUILD_TYPE = "debug"
} else {
$env:BUILD_TYPE = "release"
}
# This inflates our log size slightly, but it is REALLY useful to be
# able to see what our cl.exe commands are. (since you can actually
# just copy-paste them into a local Windows setup to just rebuild a
# single file.)
# log sizes are too long, but leaving this here in case someone wants to use it locally
# $env:CMAKE_VERBOSE_MAKEFILE = "1"
$env:INSTALLER_DIR = Join-Path $env:SCRIPT_HELPERS_DIR "installation-helpers"
cd ..
# Environment variables
$env:SCCACHE_IDLE_TIMEOUT = "0"
$env:SCCACHE_IGNORE_SERVER_IO_ERROR = "1"
$env:CMAKE_BUILD_TYPE = $env:BUILD_TYPE
$env:CMAKE_C_COMPILER_LAUNCHER = "sccache"
$env:CMAKE_CXX_COMPILER_LAUNCHER = "sccache"
$env:libuv_ROOT = Join-Path $env:DEPENDENCIES_DIR "libuv\install"
$env:MSSdk = "1"
if ($env:PYTORCH_BUILD_VERSION) {
$env:PYTORCH_BUILD_VERSION = $env:PYTORCH_BUILD_VERSION
$env:PYTORCH_BUILD_NUMBER = "1"
}
$env:CMAKE_POLICY_VERSION_MINIMUM = "3.5"
# Set BLAS type
if ($env:ENABLE_APL -eq "1") {
$env:BLAS = "APL"
$env:USE_LAPACK = "1"
} elseif ($env:ENABLE_OPENBLAS -eq "1") {
$env:BLAS = "OpenBLAS"
$env:OpenBLAS_HOME = Join-Path $env:DEPENDENCIES_DIR "OpenBLAS\install"
}
# Change to source directory
Set-Location $env:PYTORCH_ROOT
# Copy libuv.dll
Copy-Item -Path (Join-Path $env:libuv_ROOT "lib\Release\uv.dll") -Destination "torch\lib\uv.dll" -Force
# Create virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1
where.exe python
# Python install dependencies
python -m pip install --upgrade pip
pip install setuptools pyyaml
pip install -r requirements.txt
# Set after installing psutil
$env:DISTUTILS_USE_SDK = "1"
# Print all environment variables
Get-ChildItem Env:
# Start and inspect sccache
sccache --start-server
sccache --zero-stats
sccache --show-stats
# Build the wheel
python setup.py bdist_wheel
if ($LASTEXITCODE -ne 0) { exit 1 }
# Install the wheel locally
$whl = Get-ChildItem -Path "dist\*.whl" | Select-Object -First 1
if ($whl) {
python -mpip install --no-index --no-deps $whl.FullName
}
# Copy final wheel
robocopy "dist" "$env:PYTORCH_FINAL_PACKAGE_DIR" *.whl
# Export test times
python tools/stats/export_test_times.py
# Copy additional CI files
robocopy ".additional_ci_files" "$env:PYTORCH_FINAL_PACKAGE_DIR\.additional_ci_files" /E
# Save ninja log
Copy-Item -Path "build\.ninja_log" -Destination $env:PYTORCH_FINAL_PACKAGE_DIR -Force
# Final sccache stats and stop
sccache --show-stats
sccache --stop-server
exit 0

View File

@@ -37,10 +37,10 @@ IF "%CUDA_PATH_V129%"=="" (
)
IF "%BUILD_VISION%" == "" (
set TORCH_CUDA_ARCH_LIST=7.0;7.5;8.0;8.6;9.0;10.0;12.0
set TORCH_CUDA_ARCH_LIST=7.5;8.0;8.6;9.0;10.0;12.0
set TORCH_NVCC_FLAGS=-Xfatbin -compress-all
) ELSE (
set NVCC_FLAGS=-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_100,code=compute_100 -gencode=arch=compute_120,code=compute_120
set NVCC_FLAGS=-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_100,code=compute_100 -gencode=arch=compute_120,code=compute_120
)
set "CUDA_PATH=%CUDA_PATH_V129%"

View File

@@ -148,7 +148,14 @@ if "%NVIDIA_GPU_EXISTS%" == "0" (
goto end
)
cl %PYTORCH_ROOT%\.ci\pytorch\test_example_code\check-torch-cuda.cpp torch_cpu.lib c10.lib torch_cuda.lib /EHsc /std:c++17 /link /INCLUDE:?warp_size@cuda@at@@YAHXZ
set BUILD_SPLIT_CUDA=
if exist "%install_root%\lib\torch_cuda_cu.lib" if exist "%install_root%\lib\torch_cuda_cpp.lib" set BUILD_SPLIT_CUDA=ON
if "%BUILD_SPLIT_CUDA%" == "ON" (
cl %PYTORCH_ROOT%\.ci\pytorch\test_example_code\check-torch-cuda.cpp torch_cpu.lib c10.lib torch_cuda_cu.lib torch_cuda_cpp.lib /EHsc /std:c++17 /link /INCLUDE:?warp_size@cuda@at@@YAHXZ /INCLUDE:?_torch_cuda_cu_linker_symbol_op_cuda@native@at@@YA?AVTensor@2@AEBV32@@Z
) else (
cl %PYTORCH_ROOT%\.ci\pytorch\test_example_code\check-torch-cuda.cpp torch_cpu.lib c10.lib torch_cuda.lib /EHsc /std:c++17 /link /INCLUDE:?warp_size@cuda@at@@YAHXZ
)
.\check-torch-cuda.exe
if ERRORLEVEL 1 exit /b 1

View File

@@ -184,8 +184,7 @@ tmp_env_name="wheel_py$python_nodot"
conda create ${EXTRA_CONDA_INSTALL_FLAGS} -yn "$tmp_env_name" python="$desired_python" ${CONDA_ENV_CREATE_FLAGS}
source activate "$tmp_env_name"
retry pip install -r "${pytorch_rootdir}/requirements-build.txt"
pip install "numpy=${NUMPY_PINNED_VERSION}" "pyyaml${PYYAML_PINNED_VERSION}" requests ninja "setuptools${SETUPTOOLS_PINNED_VERSION}" typing-extensions
pip install "numpy=${NUMPY_PINNED_VERSION}" "pyyaml${PYYAML_PINNED_VERSION}" requests ninja "setuptools${SETUPTOOLS_PINNED_VERSION}" typing_extensions
retry pip install -r "${pytorch_rootdir}/requirements.txt" || true
retry brew install libomp

View File

@@ -126,7 +126,7 @@ runs:
shell: bash
continue-on-error: true
run: |
python3 -m pip install psutil==5.9.8 nvidia-ml-py==11.525.84
python3 -m pip install psutil==5.9.1 nvidia-ml-py==11.525.84
python3 -m tools.stats.monitor > usage_log.txt 2>&1 &
echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"

View File

@@ -1 +1 @@
b6a3368a45aaafe05f1a6a9f10c68adc5e944d9e
6c57850358f34c47802db216b0746e4e9d08a95a

View File

@@ -1 +1 @@
7f1de94a4c2d14f59ad4ca84538c36084ea6b2c8
5fb5024118e9bb9decf96c2b0b1a8f0010bf56be

View File

@@ -1 +0,0 @@
b77c7d327f2a463bb9ef8be36f30e920bc066502

View File

@@ -76,7 +76,6 @@
- .github/ci_commit_pins/audio.txt
- .github/ci_commit_pins/vision.txt
- .github/ci_commit_pins/torchdynamo.txt
- .github/ci_commit_pins/vllm.txt
- .ci/docker/ci_commit_pins/triton.txt
approved_by:
- pytorchbot
@@ -492,19 +491,6 @@
- srossross
- chillee
- zou3519
- guilhermeleobas
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Dynamo
patterns:
- torch/_dynamo/**
- torch/csrc/dynamo/**
- test/dynamo/**
approved_by:
- guilhermeleobas
mandatory_checks_name:
- EasyCLA
- Lint

View File

@@ -31,9 +31,7 @@ ciflow_push_tags:
- ciflow/pull
- ciflow/h100
- ciflow/h100-distributed
- ciflow/win-arm64
- ciflow/h100-symm-mem
- ciflow/h100-cutlass-backend
retryable_workflows:
- pull
- trunk

View File

@@ -1,6 +1,5 @@
# This file is to cache other dependencies not specified elsewhere in:
# requirements.txt
# requirements-build.txt
# requirement.txt
# docs/requirements.txt
# docs/cpp/requirements.txt
# functorch/docs/requirements.txt

View File

@@ -16,7 +16,7 @@ packaging==23.1
parameterized==0.8.1
pillow==10.3.0
protobuf==5.29.4
psutil==5.9.8
psutil==5.9.1
pygments==2.15.0
pytest-cpp==2.3.0
pytest-flakefinder==1.1.0

View File

@@ -22,7 +22,6 @@ LABEL_CIFLOW_BINARIES = "ciflow/binaries"
LABEL_CIFLOW_PERIODIC = "ciflow/periodic"
LABEL_CIFLOW_BINARIES_LIBTORCH = "ciflow/binaries_libtorch"
LABEL_CIFLOW_BINARIES_WHEEL = "ciflow/binaries_wheel"
LABEL_CIFLOW_ROCM = "ciflow/rocm"
@dataclass
@@ -147,35 +146,13 @@ LINUX_BINARY_BUILD_WORFKLOWS = [
),
]
ROCM_SMOKE_WORKFLOWS = [
BinaryBuildWorkflow(
os=OperatingSystem.LINUX,
package_type="manywheel",
build_variant="rocm",
build_configs=generate_binary_build_matrix.generate_wheels_matrix(
OperatingSystem.LINUX,
arches=["6.4"],
python_versions=["3.9"],
),
ciflow_config=CIFlowConfig(
labels={
LABEL_CIFLOW_BINARIES,
LABEL_CIFLOW_BINARIES_WHEEL,
LABEL_CIFLOW_ROCM,
},
isolated_workflow=True,
),
branches="main",
),
]
LINUX_BINARY_SMOKE_WORKFLOWS = [
BinaryBuildWorkflow(
os=OperatingSystem.LINUX,
package_type="manywheel",
build_configs=generate_binary_build_matrix.generate_wheels_matrix(
OperatingSystem.LINUX,
arches=["12.6", "12.8", "12.9"],
arches=["12.6", "12.8", "12.9", "6.4"],
python_versions=["3.9"],
),
branches="main",
@@ -410,11 +387,6 @@ def main() -> None:
jinja_env.get_template("linux_binary_build_workflow.yml.j2"),
S390X_BINARY_BUILD_WORKFLOWS,
),
(
# Give rocm it's own workflow file
jinja_env.get_template("linux_binary_build_workflow.yml.j2"),
ROCM_SMOKE_WORKFLOWS,
),
(
jinja_env.get_template("linux_binary_build_workflow.yml.j2"),
LINUX_BINARY_SMOKE_WORKFLOWS,

View File

@@ -1,43 +0,0 @@
name: Get Changed Files
on:
workflow_call:
outputs:
changed-files:
description: "List of changed files (space-separated) or '*' if not in a PR"
value: ${{ jobs.get-changed-files.outputs.changed-files }}
jobs:
get-changed-files:
runs-on: ubuntu-latest
outputs:
changed-files: ${{ steps.get-files.outputs.changed-files }}
steps:
- name: Get changed files
id: get-files
env:
GH_TOKEN: ${{ github.token }}
run: |
# Check if we're in a pull request context
if [ "${{ github.event_name }}" = "pull_request" ] || [ "${{ github.event_name }}" = "pull_request_target" ]; then
echo "Running in PR context"
# Get the PR number from the github context
PR_NUMBER="${{ github.event.number }}"
# Use gh CLI to get changed files in the PR with explicit repo
CHANGED_FILES=$(gh api repos/${{ github.repository }}/pulls/$PR_NUMBER/files --paginate --jq '.[] | select(.status != "removed") | .filename' | tr '\n' ' ' | sed 's/ $//')
if [ -z "$CHANGED_FILES" ]; then
echo "No changed files found, setting to '*'"
CHANGED_FILES="*"
fi
echo "Changed files: $CHANGED_FILES"
echo "changed-files=$CHANGED_FILES" >> "$GITHUB_OUTPUT"
else
echo "Not in PR context, setting changed files to '*'"
echo "changed-files=*" >> "$GITHUB_OUTPUT"
fi

View File

@@ -16,6 +16,11 @@ on:
type: boolean
default: true
description: If set, upload generated build artifacts.
build-with-debug:
required: false
type: boolean
default: false
description: If set, build in debug mode.
sync-tag:
required: false
type: string
@@ -82,6 +87,7 @@ on:
required: false
type: number
default: 1
allow-reuse-old-whl:
description: |
If set, the build try to pull an old wheel from s3 that was built on a
@@ -89,13 +95,6 @@ on:
required: false
type: boolean
default: true
build-additional-packages:
description: |
If set, the build job will also builds these packages and saves their
wheels as artifacts
required: false
type: string
default: ""
secrets:
HUGGING_FACE_HUB_TOKEN:
@@ -107,6 +106,7 @@ on:
description: |
FB app token to write to scribe endpoint
outputs:
docker-image:
value: ${{ jobs.build.outputs.docker-image }}
@@ -225,7 +225,7 @@ jobs:
MONITOR_DATA_COLLECT_INTERVAL: ${{ inputs.monitor-data-collect-interval }}
run: |
mkdir -p ../../usage_logs
python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7
python3 -m pip install psutil==5.9.1 dataclasses_json==0.6.7
python3 -m tools.stats.monitor \
--log-interval "$MONITOR_LOG_INTERVAL" \
--data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" \
@@ -247,6 +248,8 @@ jobs:
env:
BUILD_ENVIRONMENT: ${{ inputs.build-environment }}
BRANCH: ${{ steps.parse-ref.outputs.branch }}
# TODO duplicated
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
# Do not set SCCACHE_S3_KEY_PREFIX to share the cache between all build jobs
@@ -258,10 +260,10 @@ jobs:
DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }}
DOCKER_IMAGE_S390X: ${{ inputs.docker-image-name }}
XLA_CUDA: ${{ contains(inputs.build-environment, 'xla') && '0' || '' }}
DEBUG: ${{ inputs.build-with-debug && '1' || '0' }}
OUR_GITHUB_JOB_ID: ${{ steps.get-job-id.outputs.job-id }}
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
BUILD_ADDITIONAL_PACKAGES: ${{ inputs.build-additional-packages }}
run: |
START_TIME=$(date +%s)
if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then
@@ -293,6 +295,7 @@ jobs:
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
@@ -307,7 +310,6 @@ jobs:
-e HUGGING_FACE_HUB_TOKEN \
-e SCRIBE_GRAPHQL_ACCESS_TOKEN \
-e USE_SPLIT_BUILD \
-e BUILD_ADDITIONAL_PACKAGES \
--memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \
--memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
@@ -321,11 +323,6 @@ jobs:
"${USED_IMAGE}" \
${DOCKER_SHELL_CMD}
)
if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then
docker exec -t "${container_name}" sh -c "python3 -m pip install -r requirements.txt"
fi
docker exec -t "${container_name}" sh -c '.ci/pytorch/build.sh'
END_TIME=$(date +%s)

View File

@@ -164,8 +164,6 @@ jobs:
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
id: install-nvidia-driver
uses: pytorch/test-infra/.github/actions/setup-nvidia@main
with:
driver-version: ${{ matrix.config == 'legacy_nvidia_driver' && '525.105.17' || '570.133.07' }}
if: ${{ contains(inputs.build-environment, 'cuda') && !contains(matrix.config, 'nogpu') && steps.check_container_runner.outputs.IN_CONTAINER_RUNNER == 'false' && matrix.runner != 'B200' }}
- name: Setup GPU_FLAG for docker run
@@ -205,7 +203,7 @@ jobs:
MONITOR_LOG_INTERVAL: ${{ inputs.monitor-log-interval }}
MONITOR_DATA_COLLECT_INTERVAL: ${{ inputs.monitor-data-collect-interval }}
run: |
python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
python3 -m pip install psutil==5.9.1 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"

View File

@@ -136,7 +136,7 @@ jobs:
MONITOR_LOG_INTERVAL: ${{ inputs.monitor-log-interval }}
MONITOR_DATA_COLLECT_INTERVAL: ${{ inputs.monitor-data-collect-interval }}
run: |
"$VENV_PATH/bin/python3" -m pip install psutil==5.9.8 dataclasses_sajson==0.6.7
"$VENV_PATH/bin/python3" -m pip install psutil==5.9.1 dataclasses_json==0.6.7
"$VENV_PATH/bin/python3" -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"
@@ -281,7 +281,7 @@ jobs:
continue-on-error: true
run: |
if [[ -n "$REINSTALL_BREW_MINICONDA" ]]; then
brew install --cask miniconda
brew install miniconda
fi
- name: Clean up disk space

View File

@@ -132,7 +132,7 @@ jobs:
shell: bash
continue-on-error: true
run: |
python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7
python3 -m pip install psutil==5.9.1 dataclasses_json==0.6.7
python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"

View File

@@ -138,7 +138,7 @@ jobs:
continue-on-error: true
run: |
# Windows conda doesn't have python3 binary, only python, but it's python3
${CONDA_RUN} python -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
${CONDA_RUN} python -m pip install psutil==5.9.1 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
${CONDA_RUN} python -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"

View File

@@ -133,7 +133,7 @@ jobs:
MONITOR_LOG_INTERVAL: ${{ inputs.monitor-log-interval }}
MONITOR_DATA_COLLECT_INTERVAL: ${{ inputs.monitor-data-collect-interval }}
run: |
python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
python3 -m pip install psutil==5.9.1 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"

View File

@@ -50,7 +50,6 @@ jobs:
runner: [linux.12xlarge]
docker-image-name: [
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11,
pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-vllm,
pytorch-linux-jammy-cuda12.6-cudnn9-py3-gcc9-inductor-benchmarks,
pytorch-linux-jammy-cuda12.6-cudnn9-py3.12-gcc9-inductor-benchmarks,
pytorch-linux-jammy-cuda12.6-cudnn9-py3.13-gcc9-inductor-benchmarks,
@@ -58,14 +57,13 @@ jobs:
pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc9-inductor-benchmarks,
pytorch-linux-jammy-cuda12.8-cudnn9-py3.13-gcc9-inductor-benchmarks,
pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9,
pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11,
pytorch-linux-jammy-py3.9-clang12,
pytorch-linux-jammy-py3.11-clang12,
pytorch-linux-jammy-py3.12-clang12,
pytorch-linux-jammy-py3.13-clang12,
pytorch-linux-jammy-rocm-n-1-py3,
pytorch-linux-jammy-rocm-n-py3,
pytorch-linux-noble-rocm-n-py3,
pytorch-linux-noble-rocm-alpha-py3,
pytorch-linux-jammy-cuda12.8-cudnn9-py3.9-clang12,
pytorch-linux-jammy-py3.9-gcc11,
pytorch-linux-jammy-py3.9-gcc11-inductor-benchmarks,

View File

@@ -182,3 +182,95 @@ jobs:
runs_on: linux.g4dn.4xlarge.nvidia.gpu # 12.8 and 12.9 build need sm_70+ runner
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_9-rocm6_4-build:
if: ${{ github.repository_owner == 'pytorch' }}
uses: ./.github/workflows/_binary-build-linux.yml
needs: get-label-type
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: rocm6.4
GPU_ARCH_VERSION: 6.4
GPU_ARCH_TYPE: rocm
DOCKER_IMAGE: manylinux2_28-builder
DOCKER_IMAGE_TAG_PREFIX: rocm6.4
use_split_build: False
DESIRED_PYTHON: "3.9"
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_9-rocm6_4
build_environment: linux-binary-manywheel
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_9-rocm6_4-test: # Testing
if: ${{ github.repository_owner == 'pytorch' }}
needs:
- manywheel-py3_9-rocm6_4-build
- get-label-type
runs-on: linux.rocm.gpu.mi250
timeout-minutes: 240
env:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: rocm6.4
GPU_ARCH_VERSION: 6.4
GPU_ARCH_TYPE: rocm
SKIP_ALL_TESTS: 1
DOCKER_IMAGE: manylinux2_28-builder
DOCKER_IMAGE_TAG_PREFIX: rocm6.4
use_split_build: False
DESIRED_PYTHON: "3.9"
steps:
- name: Setup ROCm
uses: ./.github/actions/setup-rocm
- uses: actions/download-artifact@v4.1.7
name: Download Build Artifacts
with:
name: manywheel-py3_9-rocm6_4
path: "${{ runner.temp }}/artifacts/"
- name: Checkout PyTorch
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
submodules: recursive
path: pytorch
show-progress: false
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
working-directory: pytorch
- name: ROCm set GPU_FLAG
run: |
echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}"
- name: configure aws credentials
id: aws_creds
if: ${{ startsWith(github.event.ref, 'refs/tags/ciflow/') }}
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only
aws-region: us-east-1
role-duration-seconds: 18000
- name: Calculate docker image
id: calculate-docker-image
uses: pytorch/test-infra/.github/actions/calculate-docker-image@main
with:
docker-registry: ${{ startsWith(github.event.ref, 'refs/tags/ciflow/') && '308535385114.dkr.ecr.us-east-1.amazonaws.com' || 'docker.io' }}
docker-image-name: manylinux2_28-builder
custom-tag-prefix: rocm6.4
docker-build-dir: .ci/docker
working-directory: pytorch
- name: Pull Docker image
uses: pytorch/test-infra/.github/actions/pull-docker-image@main
with:
docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }}
- name: Test Pytorch binary
uses: ./pytorch/.github/actions/test-pytorch-binary
env:
DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }}
- name: Teardown ROCm
uses: ./.github/actions/teardown-rocm


@ -1,137 +0,0 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_binary_build_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-binary-manywheel-rocm
on:
push:
branches:
- main
tags:
- 'ciflow/binaries/*'
- 'ciflow/binaries_wheel/*'
- 'ciflow/rocm/*'
workflow_dispatch:
permissions:
id-token: write
env:
# Needed for conda builds
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
AWS_DEFAULT_REGION: us-east-1
BINARY_ENV_FILE: /tmp/env
BUILD_ENVIRONMENT: linux-binary-manywheel-rocm
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
PR_NUMBER: ${{ github.event.pull_request.number }}
PYTORCH_FINAL_PACKAGE_DIR: /artifacts
PYTORCH_ROOT: /pytorch
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
SKIP_ALL_TESTS: 0
concurrency:
group: linux-binary-manywheel-rocm-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
get-label-type:
if: github.repository_owner == 'pytorch'
name: get-label-type
uses: pytorch/pytorch/.github/workflows/_runner-determinator.yml@main
with:
triggering_actor: ${{ github.triggering_actor }}
issue_owner: ${{ github.event.pull_request.user.login || github.event.issue.user.login }}
curr_branch: ${{ github.head_ref || github.ref_name }}
curr_ref_type: ${{ github.ref_type }}
manywheel-py3_9-rocm6_4-build:
if: ${{ github.repository_owner == 'pytorch' }}
uses: ./.github/workflows/_binary-build-linux.yml
needs: get-label-type
with:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: rocm6.4
GPU_ARCH_VERSION: 6.4
GPU_ARCH_TYPE: rocm
DOCKER_IMAGE: manylinux2_28-builder
DOCKER_IMAGE_TAG_PREFIX: rocm6.4
use_split_build: False
DESIRED_PYTHON: "3.9"
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build_name: manywheel-py3_9-rocm6_4
build_environment: linux-binary-manywheel-rocm
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
manywheel-py3_9-rocm6_4-test: # Testing
if: ${{ github.repository_owner == 'pytorch' }}
needs:
- manywheel-py3_9-rocm6_4-build
- get-label-type
runs-on: linux.rocm.gpu.mi250
timeout-minutes: 240
env:
PYTORCH_ROOT: /pytorch
PACKAGE_TYPE: manywheel
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: rocm6.4
GPU_ARCH_VERSION: 6.4
GPU_ARCH_TYPE: rocm
SKIP_ALL_TESTS: 1
DOCKER_IMAGE: manylinux2_28-builder
DOCKER_IMAGE_TAG_PREFIX: rocm6.4
use_split_build: False
DESIRED_PYTHON: "3.9"
steps:
- name: Setup ROCm
uses: ./.github/actions/setup-rocm
- uses: actions/download-artifact@v4.1.7
name: Download Build Artifacts
with:
name: manywheel-py3_9-rocm6_4
path: "${{ runner.temp }}/artifacts/"
- name: Checkout PyTorch
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
submodules: recursive
path: pytorch
show-progress: false
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
working-directory: pytorch
- name: ROCm set GPU_FLAG
run: |
echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}"
- name: configure aws credentials
id: aws_creds
if: ${{ startsWith(github.event.ref, 'refs/tags/ciflow/') }}
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only
aws-region: us-east-1
role-duration-seconds: 18000
- name: Calculate docker image
id: calculate-docker-image
uses: pytorch/test-infra/.github/actions/calculate-docker-image@main
with:
docker-registry: ${{ startsWith(github.event.ref, 'refs/tags/ciflow/') && '308535385114.dkr.ecr.us-east-1.amazonaws.com' || 'docker.io' }}
docker-image-name: manylinux2_28-builder
custom-tag-prefix: rocm6.4
docker-build-dir: .ci/docker
working-directory: pytorch
- name: Pull Docker image
uses: pytorch/test-infra/.github/actions/pull-docker-image@main
with:
docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }}
- name: Test Pytorch binary
uses: ./pytorch/.github/actions/test-pytorch-binary
env:
DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }}
- name: Teardown ROCm
uses: ./.github/actions/teardown-rocm


@ -1,58 +0,0 @@
name: Limited CI for CUTLASS backend on H100
on:
pull_request:
paths:
- .github/workflows/h100-cutlass-backend.yml
workflow_dispatch:
schedule:
- cron: 22 9 * * * # every 24 hours about 2:22am PDT
push:
tags:
- ciflow/h100-cutlass-backend/*
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}-${{ github.event_name == 'schedule' }}
cancel-in-progress: true
permissions:
id-token: write
contents: read
jobs:
get-label-type:
if: github.repository_owner == 'pytorch'
name: get-label-type
uses: pytorch/pytorch/.github/workflows/_runner-determinator.yml@main
with:
triggering_actor: ${{ github.triggering_actor }}
issue_owner: ${{ github.event.pull_request.user.login || github.event.issue.user.login }}
curr_branch: ${{ github.head_ref || github.ref_name }}
curr_ref_type: ${{ github.ref_type }}
linux-jammy-cuda12_8-py3_10-gcc11-sm90-build-cutlass-backend:
name: linux-jammy-cuda12.8-py3.10-gcc11-sm90-cutlass-backend
uses: ./.github/workflows/_linux-build.yml
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build-environment: linux-jammy-cuda12.8-py3.10-gcc11-sm90-cutlass-backend
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11
cuda-arch-list: '9.0'
test-matrix: |
{ include: [
{ config: "h100_cutlass_backend", shard: 1, num_shards: 1, runner: "linux.aws.h100", owners: ["oncall:pt2"] },
]}
secrets: inherit
linux-jammy-cuda12_8-py3_10-gcc11-sm90-test:
name: linux-jammy-cuda12.8-py3.10-gcc11-sm90-cutlass-backend
uses: ./.github/workflows/_linux-test.yml
needs:
- linux-jammy-cuda12_8-py3_10-gcc11-sm90-build-cutlass-backend
with:
build-environment: linux-jammy-cuda12.8-py3.10-gcc11-sm90-cutlass-backend
docker-image: ${{ needs.linux-jammy-cuda12_8-py3_10-gcc11-sm90-build-cutlass-backend.outputs.docker-image }}
test-matrix: ${{ needs.linux-jammy-cuda12_8-py3_10-gcc11-sm90-build-cutlass-backend.outputs.test-matrix }}
secrets: inherit


@ -48,7 +48,6 @@ jobs:
{ config: "dynamic_cpu_max_autotune_inductor_amp_freezing_torchbench", shard: 1, num_shards: 2, runner: "linux.8xlarge.amx" },
{ config: "dynamic_cpu_max_autotune_inductor_amp_freezing_torchbench", shard: 2, num_shards: 2, runner: "linux.8xlarge.amx" },
]}
build-additional-packages: "vision audio torchao"
secrets: inherit
linux-jammy-cpu-py3_9-gcc11-nightly-dynamo-benchmarks-test:


@ -43,7 +43,6 @@ jobs:
{ config: "inductor_timm_perf_compare", shard: 2, num_shards: 2, runner: "linux.aws.a100" },
{ config: "inductor_torchbench_perf_compare", shard: 1, num_shards: 1, runner: "linux.aws.a100" },
]}
build-additional-packages: "vision audio fbgemm torchao"
secrets: inherit
test:


@ -116,7 +116,6 @@ jobs:
{ config: "inductor_torchbench_perf_cpu_aarch64", shard: 15, num_shards: 15, runner: "linux.arm64.m7g.metal" },
]}
selected-test-configs: ${{ inputs.benchmark_configs }}
build-additional-packages: "vision audio torchao"
secrets: inherit


@ -86,11 +86,6 @@ jobs:
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
# Use a bigger runner here because CUDA_ARCH 9.0 is only built for H100
# or newer GPUs, so it doesn't benefit much from existing compiler cache
# from trunk. Also use a memory-intensive runner here because memory is
# usually the bottleneck
runner: linux.12xlarge.memory
build-environment: linux-jammy-cuda12.8-py3.10-gcc9-sm90
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks
cuda-arch-list: '9.0'
@ -119,7 +114,6 @@ jobs:
{ config: "inductor_torchbench_perf_cuda_h100", shard: 9, num_shards: 9, runner: "linux.aws.h100" },
]}
selected-test-configs: ${{ inputs.benchmark_configs }}
build-additional-packages: "vision audio fbgemm torchao"
secrets: inherit
test-periodically:


@ -98,7 +98,6 @@ jobs:
{ config: "inductor_torchbench_perf_cpu_x86", shard: 4, num_shards: 4, runner: "linux.24xl.spr-metal" },
]}
selected-test-configs: ${{ inputs.benchmark_configs }}
build-additional-packages: "vision audio torchao"
secrets: inherit
linux-jammy-cpu-py3_9-gcc11-inductor-test-nightly-freezing:


@ -86,8 +86,6 @@ jobs:
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
# Every bit to make perf run faster helps
runner: linux.12xlarge.memory
build-environment: linux-jammy-cuda12.8-py3.10-gcc9-sm80
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9-inductor-benchmarks
cuda-arch-list: '8.0'
@ -114,7 +112,6 @@ jobs:
{ config: "cachebench", shard: 2, num_shards: 2, runner: "linux.aws.a100" },
]}
selected-test-configs: ${{ inputs.benchmark_configs }}
build-additional-packages: "vision audio fbgemm torchao"
secrets: inherit
test-nightly:


@ -58,7 +58,6 @@ jobs:
{ config: "dynamic_aot_eager_timm", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "dynamic_aot_eager_timm", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
]}
build-additional-packages: "vision audio fbgemm torchao"
secrets: inherit
linux-jammy-cuda12_8-py3_10-gcc9-periodic-dynamo-benchmarks-test:
@ -126,7 +125,6 @@ jobs:
{ include: [
{ config: "inductor_torchbench_smoketest_perf", shard: 1, num_shards: 1, runner: "linux.aws.a100" },
]}
build-additional-packages: "vision audio fbgemm torchao"
secrets: inherit
linux-jammy-cuda12_8-py3_10-gcc9-inductor-smoke-test:
@ -161,7 +159,6 @@ jobs:
{ config: "cpu_inductor_freezing_avx2_timm", shard: 1, num_shards: 2, runner: "linux.10xlarge.avx2" },
{ config: "cpu_inductor_freezing_avx2_timm", shard: 2, num_shards: 2, runner: "linux.10xlarge.avx2" },
]}
build-additional-packages: "vision audio torchao"
secrets: inherit
linux-jammy-cpu-py3_9-gcc11-periodic-dynamo-benchmarks-test:
@ -198,7 +195,6 @@ jobs:
{ config: "aot_inductor_torchbench", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "aot_inductor_torchbench", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
]}
build-additional-packages: "vision audio fbgemm torchao"
secrets: inherit
linux-jammy-cuda12_8-py3_10-gcc9-inductor-test:
@ -244,7 +240,6 @@ jobs:
{ config: "dynamic_cpu_aot_inductor_amp_freezing_torchbench", shard: 1, num_shards: 2, runner: "linux.8xlarge.amx" },
{ config: "dynamic_cpu_aot_inductor_amp_freezing_torchbench", shard: 2, num_shards: 2, runner: "linux.8xlarge.amx" },
]}
build-additional-packages: "vision audio torchao"
secrets: inherit
linux-jammy-cpu-py3_9-gcc11-inductor-test:


@ -7,6 +7,7 @@ on:
- release/*
tags:
- ciflow/inductor-rocm/*
- ciflow/inductor/*
workflow_dispatch:
concurrency:


@ -62,7 +62,6 @@ jobs:
{ config: "inductor_torchbench", shard: 1, num_shards: 2, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g5.4xlarge.nvidia.gpu" },
{ config: "inductor_torchbench", shard: 2, num_shards: 2, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g5.4xlarge.nvidia.gpu" },
]}
build-additional-packages: "vision audio fbgemm torchao"
secrets: inherit
linux-jammy-cuda12_8-py3_10-gcc9-inductor-test:
@ -95,7 +94,6 @@ jobs:
{ config: "dynamic_cpu_inductor_torchbench", shard: 2, num_shards: 2, runner: "${{ needs.get-label-type.outputs.label-type }}linux.8xlarge.amx" },
{ config: "inductor_torchbench_cpu_smoketest_perf", shard: 1, num_shards: 1, runner: "${{ needs.get-label-type.outputs.label-type }}linux.24xl.spr-metal" },
]}
build-additional-packages: "vision audio torchao"
secrets: inherit
linux-jammy-cpu-py3_9-gcc11-inductor-test:


@ -27,29 +27,9 @@ jobs:
issue_owner: ${{ github.event.pull_request.user.login || github.event.issue.user.login }}
curr_branch: ${{ github.head_ref || github.ref_name }}
get-changed-files:
if: github.repository_owner == 'pytorch'
name: Get changed files
uses: ./.github/workflows/_get-changed-files.yml
lintrunner-clang:
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
needs: [get-label-type, get-changed-files]
# Only run if there are changed files relevant to clangtidy / clangformat
if: |
github.repository_owner == 'pytorch' && (
needs.get-changed-files.outputs.changed-files == '*' ||
contains(needs.get-changed-files.outputs.changed-files, '.h') ||
contains(needs.get-changed-files.outputs.changed-files, '.cpp') ||
contains(needs.get-changed-files.outputs.changed-files, '.cc') ||
contains(needs.get-changed-files.outputs.changed-files, '.cxx') ||
contains(needs.get-changed-files.outputs.changed-files, '.hpp') ||
contains(needs.get-changed-files.outputs.changed-files, '.hxx') ||
contains(needs.get-changed-files.outputs.changed-files, '.cu') ||
contains(needs.get-changed-files.outputs.changed-files, '.cuh') ||
contains(needs.get-changed-files.outputs.changed-files, '.mm') ||
contains(needs.get-changed-files.outputs.changed-files, '.metal')
)
needs: get-label-type
with:
timeout: 120
runner: "${{ needs.get-label-type.outputs.label-type }}linux.2xlarge"
@ -60,61 +40,25 @@ jobs:
submodules: true
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
script: |
CHANGED_FILES="${{ needs.get-changed-files.outputs.changed-files }}"
if [ "$CHANGED_FILES" = "*" ]; then
export ADDITIONAL_LINTRUNNER_ARGS="--take CLANGTIDY,CLANGFORMAT --all-files"
else
export ADDITIONAL_LINTRUNNER_ARGS="--take CLANGTIDY,CLANGFORMAT $CHANGED_FILES"
fi
export ADDITIONAL_LINTRUNNER_ARGS="--take CLANGTIDY,CLANGFORMAT --all-files"
export CLANG=1
.github/scripts/lintrunner.sh
# NOTE: mypy needs its own job because it depends on --all-files, without assessing all files it sometimes
# fails to find types when it should
lintrunner-mypy:
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
needs: [get-label-type, get-changed-files]
# Only run if there are changed files relevant to mypy
if: |
github.repository_owner == 'pytorch' && (
needs.get-changed-files.outputs.changed-files == '*' ||
contains(needs.get-changed-files.outputs.changed-files, '.py') ||
contains(needs.get-changed-files.outputs.changed-files, '.pyi')
)
with:
timeout: 120
runner: "${{ needs.get-label-type.outputs.label-type }}linux.2xlarge"
docker-image: ci-image:pytorch-linux-jammy-linter
# NB: A shallow checkout won't work here because calculate-docker-image requires a full checkout
# to run git rev-parse HEAD~:.ci/docker when a new image is needed
fetch-depth: 0
submodules: true
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
script: |
CHANGED_FILES="${{ needs.get-changed-files.outputs.changed-files }}"
echo "Running mypy"
ADDITIONAL_LINTRUNNER_ARGS="--take MYPY --all-files" .github/scripts/lintrunner.sh
lintrunner-noclang:
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
needs: [get-label-type, get-changed-files]
needs: get-label-type
with:
timeout: 120
runner: "${{ needs.get-label-type.outputs.label-type }}linux.2xlarge"
docker-image: ci-image:pytorch-linux-jammy-linter
docker-image: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3.9-linter
# NB: A shallow checkout won't work here because calculate-docker-image requires a full checkout
# to run git rev-parse HEAD~:.ci/docker when a new image is needed
fetch-depth: 0
submodules: true
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
script: |
CHANGED_FILES="${{ needs.get-changed-files.outputs.changed-files }}"
echo "Running all other linters"
if [ "$CHANGED_FILES" = '*' ]; then
ADDITIONAL_LINTRUNNER_ARGS="--skip CLANGTIDY,CLANGFORMAT,MYPY --all-files" .github/scripts/lintrunner.sh
else
ADDITIONAL_LINTRUNNER_ARGS="--skip CLANGTIDY,CLANGFORMAT,MYPY ${CHANGED_FILES}" .github/scripts/lintrunner.sh
fi
export ADDITIONAL_LINTRUNNER_ARGS="--skip CLANGTIDY,CLANGFORMAT --all-files"
.github/scripts/lintrunner.sh
quick-checks:
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
@ -317,7 +261,6 @@ jobs:
check-latest: false
cache: pip
cache-dependency-path: |
**/requirements-build.txt
**/requirements.txt
- name: Setup Min Python version
if: matrix.test_type != 'older_python_version'
@ -328,7 +271,6 @@ jobs:
check-latest: false
cache: pip
cache-dependency-path: |
**/requirements-build.txt
**/requirements.txt
- name: Install torch
if: matrix.test_type == 'with_torch'


@ -83,10 +83,6 @@ jobs:
repo-owner: triton-lang
branch: main
pin-folder: .ci/docker/ci_commit_pins
- repo-name: vllm
repo-owner: vllm-project
branch: main
pin-folder: .github/ci_commit_pins
# Allow this to be triggered on either a schedule or on workflow_dispatch to allow for easier testing
if: github.repository_owner == 'pytorch' && (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
steps:


@ -51,67 +51,6 @@ jobs:
curr_branch: ${{ github.head_ref || github.ref_name }}
curr_ref_type: ${{ github.ref_type }}
linux-jammy-cuda12_4-py3_10-gcc11-sm89-build:
name: linux-jammy-cuda12.4-py3.10-gcc11-sm89
uses: ./.github/workflows/_linux-build.yml
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build-environment: linux-jammy-cuda12.4-py3.10-gcc11-sm89
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11
cuda-arch-list: 8.9
test-matrix: |
{ include: [
{ config: "default", shard: 1, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g6.4xlarge.experimental.nvidia.gpu" },
{ config: "default", shard: 2, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g6.4xlarge.experimental.nvidia.gpu" },
{ config: "default", shard: 3, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g6.4xlarge.experimental.nvidia.gpu" },
{ config: "default", shard: 4, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g6.4xlarge.experimental.nvidia.gpu" },
{ config: "default", shard: 5, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g6.4xlarge.experimental.nvidia.gpu" },
]}
secrets: inherit
linux-jammy-cuda12_4-py3_10-gcc11-sm89-test:
name: linux-jammy-cuda12.4-py3.10-gcc11-sm89
uses: ./.github/workflows/_linux-test.yml
needs:
- linux-jammy-cuda12_4-py3_10-gcc11-sm89-build
- target-determination
with:
build-environment: linux-jammy-cuda12.4-py3.10-gcc11-sm89
docker-image: ${{ needs.linux-jammy-cuda12_4-py3_10-gcc11-sm89-build.outputs.docker-image }}
test-matrix: ${{ needs.linux-jammy-cuda12_4-py3_10-gcc11-sm89-build.outputs.test-matrix }}
secrets: inherit
linux-jammy-cuda12_4-py3_10-gcc11-build:
name: linux-jammy-cuda12.4-py3.10-gcc11
uses: ./.github/workflows/_linux-build.yml
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build-environment: linux-jammy-cuda12.4-py3.10-gcc11
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11
test-matrix: |
{ include: [
{ config: "legacy_nvidia_driver", shard: 1, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
{ config: "legacy_nvidia_driver", shard: 2, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
{ config: "legacy_nvidia_driver", shard: 3, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
{ config: "legacy_nvidia_driver", shard: 4, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
{ config: "legacy_nvidia_driver", shard: 5, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
]}
secrets: inherit
linux-jammy-cuda12_4-py3_10-gcc11-test:
name: linux-jammy-cuda12.4-py3.10-gcc11
uses: ./.github/workflows/_linux-test.yml
needs:
- linux-jammy-cuda12_4-py3_10-gcc11-build
- target-determination
with:
build-environment: linux-jammy-cuda12.4-py3.10-gcc11
docker-image: ${{ needs.linux-jammy-cuda12_4-py3_10-gcc11-build.outputs.docker-image }}
test-matrix: ${{ needs.linux-jammy-cuda12_4-py3_10-gcc11-build.outputs.test-matrix }}
secrets: inherit
linux-jammy-cuda12_8-py3_10-gcc11-build:
name: linux-jammy-cuda12.8-py3.10-gcc11
uses: ./.github/workflows/_linux-build.yml
@ -157,6 +96,7 @@ jobs:
{ config: "multigpu", shard: 1, num_shards: 2, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g5.12xlarge.nvidia.gpu", owners: ["oncall:distributed"] },
{ config: "multigpu", shard: 2, num_shards: 2, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g5.12xlarge.nvidia.gpu", owners: ["oncall:distributed"] },
]}
build-with-debug: false
secrets: inherit
linux-jammy-cuda12_8-py3_9-gcc9-test:
@ -177,6 +117,7 @@ jobs:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build-environment: linux-jammy-cuda12.8-py3.10-gcc9-debug
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc9
build-with-debug: true
test-matrix: |
{ include: [
{ config: "default", shard: 1, num_shards: 7, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu", owners: ["oncall:debug-build"] },


@ -315,6 +315,21 @@ jobs:
test-matrix: ${{ needs.linux-jammy-cuda12_8-py3_10-gcc11-build.outputs.test-matrix }}
secrets: inherit
linux-jammy-py3-clang12-mobile-build:
name: linux-jammy-py3-clang12-mobile-build
uses: ./.github/workflows/_linux-build.yml
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
build-environment: linux-jammy-py3-clang12-mobile-build
docker-image-name: ci-image:pytorch-linux-jammy-py3-clang15-asan
build-generates-artifacts: false
test-matrix: |
{ include: [
{ config: "default", shard: 1, num_shards: 1 },
]}
secrets: inherit
linux-jammy-cuda12_8-cudnn9-py3_9-clang12-build:
name: linux-jammy-cuda12.8-cudnn9-py3.9-clang12
uses: ./.github/workflows/_linux-build.yml


@ -37,7 +37,7 @@ jobs:
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runner: linux.12xlarge.memory
runner: "linux.12xlarge"
build-environment: linux-jammy-cuda12.8-py3.10-gcc11-sm90
docker-image-name: ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11
cuda-arch-list: '9.0'


@ -1,187 +0,0 @@
name: windows-arm64-build-test
on:
push:
tags:
- ciflow/win-arm64/*
env:
GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
PYTHON_VERSION: "3.12"
PYTORCH_ROOT: ${{ github.workspace }}/pytorch
DOWNLOADS_DIR: c:\temp\downloads
DEPENDENCIES_DIR: c:\temp\dependencies
ENABLE_APL: 1
ENABLE_OPENBLAS: 0
BUILD_TYPE: release
permissions:
id-token: write
contents: read
jobs:
build:
# Don't run on forked repos.
if: github.repository_owner == 'pytorch'
runs-on: "windows-11-arm64-preview"
timeout-minutes: 240
steps:
- name: configure aws credentials
id: aws_creds
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_sscache
aws-region: us-east-1
role-duration-seconds: 18000
- name: Enable long paths
shell: cmd
run: |
git config --system --get core.longpaths || echo "core.longpaths is not set, setting it now"
git config --system core.longpaths true
- name: Git checkout PyTorch
uses: actions/checkout@v4
with:
path: pytorch
submodules: recursive
- name: Bootstrap Python
shell: cmd
run: |
"pytorch/.ci/pytorch/windows/arm64/bootstrap_python.bat"
- name: Parse ref
id: parse-ref
shell: bash
run: python pytorch/.github/scripts/parse_ref.py
- name: Get workflow job id
shell: bash
id: get-job-id
run: |
set -eux
python pytorch/.github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Bootstrap APL
shell: cmd
run: |
"pytorch/.ci/pytorch/windows/arm64/bootstrap_apl.bat"
- name: Bootstrap Rust
shell: cmd
run: |
"pytorch/.ci/pytorch/windows/arm64/bootstrap_rust.bat"
- name: Bootstrap sccache
shell: cmd
run: |
"pytorch/.ci/pytorch/windows/arm64/bootstrap_sccache.bat"
- name: Bootstrap Libuv
shell: cmd
run: |
"pytorch/.ci/pytorch/windows/arm64/bootstrap_libuv.bat"
- name: Build
id: build
shell: cmd
env:
PYTORCH_FINAL_PACKAGE_DIR: C:/${{ github.run_id }}/build-results/
BRANCH: ${{ steps.parse-ref.outputs.branch }}
BUILD_WHEEL: 1
MAX_JOBS: 8
PYTHON_VERSION: "3.12"
SCCACHE_BUCKET: "ossci-compiler-cache"
SCCACHE_S3_KEY_PREFIX: ${{ github.workflow }}
SCCACHE_REGION: us-east-1
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
AWS_DEFAULT_REGION: us-east-1
USE_CUDA: '0'
USE_XPU: '0'
OUR_GITHUB_JOB_ID: ${{ steps.get-job-id.outputs.job-id }}
run: |
cd pytorch
call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" arm64
powershell -ExecutionPolicy Bypass -File ".ci/pytorch/win-arm64-build.ps1"
- name: Upload artifacts
uses: actions/upload-artifact@v4.4.0
if: always()
with:
name: torch-wheel-win-arm64-py3-12
retention-days: 14
if-no-files-found: error
path: C:\${{ github.run_id }}\build-results
test:
if: github.repository_owner == 'pytorch'
strategy:
fail-fast: false
runs-on: "windows-11-arm64-preview"
needs: build
steps:
- name: Enable long paths
shell: cmd
run: |
git config --system --get core.longpaths || echo "core.longpaths is not set, setting it now"
git config --system core.longpaths true
- name: Git checkout PyTorch
uses: actions/checkout@v4
with:
path: pytorch
submodules: recursive
- name: Bootstrap Python
shell: cmd
run: |
"pytorch/.ci/pytorch/windows/arm64/bootstrap_python.bat"
- name: Bootstrap Rust
shell: cmd
run: |
"pytorch/.ci/pytorch/windows/arm64/bootstrap_rust.bat"
- name: Get workflow job id
shell: bash
id: get-job-id
run: |
set -eux
python pytorch/.github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Download Build Artifacts
uses: actions/download-artifact@v4.1.7
with:
name: torch-wheel-win-arm64-py3-12
path: C:\${{ github.run_id }}\build-results
- name: Test
id: test
shell: cmd
env:
USE_CUDA: '0'
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.12"
VC_PRODUCT: "BuildTools"
AWS_DEFAULT_REGION: us-east-1
GITHUB_REPOSITORY: ${{ github.repository }}
GITHUB_WORKFLOW: ${{ github.workflow }}
GITHUB_JOB: ${{ github.job }}
GITHUB_RUN_ID: ${{ github.run_id }}
GITHUB_RUN_NUMBER: ${{ github.run_number }}
GITHUB_RUN_ATTEMPT: ${{ github.run_attempt }}
JOB_ID: ${{ steps.get-job-id.outputs.job-id }}
JOB_NAME: ${{ steps.get-job-id.outputs.job-name }}
PYTORCH_FINAL_PACKAGE_DIR: C:/${{ github.run_id }}/build-results/
run: |
mkdir "%PYTORCH_FINAL_PACKAGE_DIR%"
call pytorch/.ci/pytorch/windows/arm64/bootstrap_tests.bat
set GIT_BASH=C:\Program Files\Git\usr\bin\bash.exe
"%GIT_BASH%" -c "bash --noprofile --norc .ci/pytorch/win-arm64-test.sh"


@ -500,7 +500,7 @@ include_patterns = [
'**/*.h',
]
exclude_patterns = [
'torch/headeronly/macros/Macros.h',
'c10/macros/Macros.h',
]
command = [
'python3',
@ -523,7 +523,7 @@ include_patterns = [
'**/*.h',
]
exclude_patterns = [
'torch/headeronly/macros/Macros.h',
'c10/macros/Macros.h',
]
command = [
'python3',
@ -1162,9 +1162,15 @@ exclude_patterns = [
# These files are all grandfathered in, feel free to remove from this list
# as necessary
# NOTE: remove the patterns in the order they are listed
'aten/**',
'aten/src/ATen/native/**',
'aten/src/ATen/native/q*/**',
'aten/src/ATen/native/[a-pA-P]*/**',
'aten/src/ATen/[a-mA-M]*/**',
'test/**',
'test/[a-hA-h]*/**',
'torch/_*/**',
'torch/distributed/tensor/**',
]
init_command = [
'python3',
@ -1600,10 +1606,7 @@ is_formatter = true
# the same line, merge conflicts should not arise in git or hg
[[linter]]
code = 'MERGE_CONFLICTLESS_CSV'
include_patterns = [
'benchmarks/dynamo/ci_expected_accuracy/*.csv',
'benchmarks/dynamo/pr_time_benchmarks/expected_results.csv',
]
include_patterns = ['benchmarks/dynamo/ci_expected_accuracy/*.csv']
command = [
'python3',
'tools/linter/adapters/no_merge_conflict_csv_linter.py',


@ -1,12 +0,0 @@
repos:
- repo: local
hooks:
- id: lintrunner
name: Run Lintrunner in an isolated venv before every push. The first run may be slow...
entry: python scripts/run_lintrunner.py # wrapper below
language: python # precommit manages venv for the wrapper
additional_dependencies: [] # wrapper handles lintrunner install
always_run: true
stages: [pre-push] # fire only on prepush
pass_filenames: false # Lintrunner gets no perfile args
verbose: true # stream output as it is produced...allegedly anyways


@ -1190,6 +1190,10 @@ if(APPLE)
append_cxx_flag_if_supported("-Wno-missing-braces" CMAKE_CXX_FLAGS)
endif()
if(USE_XPU)
string(APPEND CMAKE_CXX_FLAGS " -DUSE_XPU")
endif()
if(EMSCRIPTEN)
string(
APPEND
@ -1241,7 +1245,6 @@ if(USE_MIMALLOC AND USE_MIMALLOC_ON_MKL)
endif()
# ---[ Main build
add_subdirectory(torch/headeronly) # headeronly headers
add_subdirectory(c10)
add_subdirectory(caffe2)


@ -136,7 +136,7 @@ torch/profiler/ @sraikund16
test/functorch/test_aotdispatch.py @ezyang @Chillee
# Dataloader
torch/utils/data/ @divyanshk @ramanishsingh @scotts
torch/utils/data/ @divyanshk @ramanishsingh
# hipify
torch/utils/hipify/ @jeffdaily @jithunnair-amd


@ -33,7 +33,7 @@ RUN case ${TARGETPLATFORM} in \
*) MINICONDA_ARCH=x86_64 ;; \
esac && \
curl -fsSL -v -o ~/miniconda.sh -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-${MINICONDA_ARCH}.sh"
COPY requirements.txt requirements-build.txt .
COPY requirements.txt .
# Manually invoke bash on miniconda script per https://github.com/conda/conda/issues/10431
RUN chmod +x ~/miniconda.sh && \
bash ~/miniconda.sh -b -p /opt/conda && \


@ -294,12 +294,14 @@ Install PyTorch
```bash
export CMAKE_PREFIX_PATH="${CONDA_PREFIX:-'$(dirname $(which conda))/../'}:${CMAKE_PREFIX_PATH}"
python -m pip install -r requirements.txt
python -m pip install --no-build-isolation -v -e .
```
**On macOS**
```bash
python -m pip install -r requirements.txt
python -m pip install --no-build-isolation -v -e .
```
@ -518,7 +520,7 @@ on [our website](https://pytorch.org/get-started/previous-versions).
## Getting Started
Three pointers to get you started:
Three-pointers to get you started:
- [Tutorials: get you started with understanding and using PyTorch](https://pytorch.org/tutorials/)
- [Examples: easy to understand PyTorch code across all domains](https://github.com/pytorch/examples)
- [The API Reference](https://pytorch.org/docs/)


@ -458,7 +458,7 @@ if(LAPACK_FOUND)
# would not need this at all), some of our libraries (magma in particular)
# backend to CPU BLAS/LAPACK implementations, and so it is very important
# we get the *right* implementation, because even if the symbols are the
# same, LAPACK implementations may have different calling conventions.
# same, LAPACK implementions may have different calling conventions.
# This caused https://github.com/pytorch/pytorch/issues/7353
#
# We do NOT do this on Linux, since we just rely on torch_cpu to


@ -14,9 +14,7 @@
#include <ATen/cpu/FlushDenormal.h>
#ifdef USE_FBGEMM
C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED("-Wextra-semi")
#include <fbgemm/Fbgemm.h>
C10_DIAGNOSTIC_POP()
#endif // USE_FBGEMM
#if defined(__aarch64__) && !defined(C10_MOBILE)
#include <cpuinfo.h>
@ -29,7 +27,7 @@ namespace {
These const variables defined the fp32 precisions for different backend
We have "generic", "cuda", "mkldnn" backend now and we can choose fp32
prevision from "ieee", "tf32", "bf16" and "none". The "ieee" precision means
IEEE standard floating point format, "tf32" and "bf16" means we are allowed to
IEEE standard floating point format "tf32" and "bf16" means we are allowed to
use "tf32" or "bf16" as internal computation data types for fp32 computations.
And "none" means it is override-able by parent's node
@ -42,7 +40,7 @@ namespace {
*/
const std::map<std::string, std::vector<std::string>> _fp32_precisions = {
{"generic", {{"ieee", "tf32", "bf16", "none"}}},
{"mkldnn", {{"ieee", "tf32", "bf16", "none"}}},
{"mkldnn", {{"ieee", "bf16", "none"}}},
{"cuda", {{"ieee", "tf32", "none"}}}};
// Check whether the backend and op are legal
@ -78,9 +76,7 @@ void check_fp32_prec_backend_and_op(
C10_ALWAYS_INLINE void warn_deprecated_fp32_precision_api(){
TORCH_WARN_ONCE(
"Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' "
"or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old settings, e.g, torch.backends.cuda.matmul.allow_tf32 = True, "
"torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() will be deprecated after Pytorch 2.9. Please see "
"This API is going to be deprecated, please see "
"https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices"
);
}
@ -372,9 +368,6 @@ Float32MatmulPrecision Context::float32MatmulPrecision() const {
invalid = invalid ||
(float32Precision("mkldnn", "matmul") == "bf16" &&
float32_matmul_precision != at::Float32MatmulPrecision::MEDIUM);
invalid = invalid ||
(float32Precision("mkldnn", "matmul") == "tf32" &&
float32_matmul_precision != at::Float32MatmulPrecision::HIGH);
TORCH_CHECK(
!invalid,
"PyTorch is checking the matmul precision without a specific backend name,",
@ -408,7 +401,7 @@ void Context::setFloat32MatmulPrecision(const std::string &s) {
} else if (s_ == "high") {
float32_matmul_precision = at::Float32MatmulPrecision::HIGH;
setFloat32Precision("cuda", "matmul", "tf32");
setFloat32Precision("mkldnn", "matmul", "tf32");
setFloat32Precision("mkldnn", "matmul", "ieee");
return true;
} else if (s_ == "medium") {
float32_matmul_precision = at::Float32MatmulPrecision::MEDIUM;

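The hunks above narrow the mkldnn entry in `_fp32_precisions` and change what `setFloat32MatmulPrecision("high")` maps mkldnn matmul to. For orientation, here is a minimal Python sketch of the user-facing knobs that the deprecation warning in this file points at; the `fp32_precision` attributes are taken from the warning text itself and may not exist in every build, so treat this as illustrative rather than canonical.

```python
import torch

# Legacy coarse-grained control: "highest" keeps IEEE fp32 math, while
# "high"/"medium" permit reduced-precision internal types (tf32/bf16).
torch.set_float32_matmul_precision("high")

# Newer per-backend API referenced by warn_deprecated_fp32_precision_api().
torch.backends.cuda.matmul.fp32_precision = "tf32"
torch.backends.cudnn.conv.fp32_precision = "tf32"

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b  # may use tf32 internally; the result dtype is still torch.float32
print(c.dtype)
```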

@ -69,41 +69,37 @@ DLDataType getDLDataType(const Tensor& t) {
case ScalarType::Float8_e4m3fn:
case ScalarType::Float8_e4m3fnuz:
case ScalarType::Float8_e8m0fnu:
TORCH_CHECK_BUFFER(false, "float8 types are not supported by dlpack");
TORCH_CHECK(false, "float8 types are not supported by dlpack");
break;
case ScalarType::Float4_e2m1fn_x2:
TORCH_CHECK_BUFFER(false, "float4 types are not supported by dlpack");
TORCH_CHECK(false, "float4 types are not supported by dlpack");
break;
case ScalarType::QInt8:
case ScalarType::QUInt8:
case ScalarType::QInt32:
case ScalarType::QUInt4x2:
case ScalarType::QUInt2x4:
TORCH_CHECK_BUFFER(false, "QUInt/QInt types are not supported by dlpack");
TORCH_CHECK(false, "QUInt/QInt types are not supported by dlpack");
break;
case ScalarType::Bits1x8:
case ScalarType::Bits2x4:
case ScalarType::Bits4x2:
case ScalarType::Bits8:
case ScalarType::Bits16:
TORCH_CHECK_BUFFER(false, "Bit types are not supported by dlpack");
TORCH_CHECK(false, "Bit types are not supported by dlpack");
break;
case ScalarType::Undefined:
TORCH_CHECK_BUFFER(false, "Undefined is not a valid ScalarType");
TORCH_CHECK(false, "Undefined is not a valid ScalarType");
case ScalarType::NumOptions:
TORCH_CHECK_BUFFER(false, "NumOptions is not a valid ScalarType");
TORCH_CHECK(false, "NumOptions is not a valid ScalarType");
}
return dtype;
}
DLDevice torchDeviceToDLDevice(at::Device device) {
static DLDevice getDLDevice(const Tensor& tensor, c10::DeviceIndex device_id) {
DLDevice ctx;
ctx.device_id = (device.is_cuda() || device.is_privateuseone())
? static_cast<int32_t>(static_cast<unsigned char>(device.index()))
: 0;
switch (device.type()) {
ctx.device_id = static_cast<int32_t>(static_cast<unsigned char>(device_id));
switch (tensor.device().type()) {
case DeviceType::CPU:
ctx.device_type = DLDeviceType::kDLCPU;
break;
@ -124,7 +120,8 @@ DLDevice torchDeviceToDLDevice(at::Device device) {
break;
case DeviceType::XPU:
ctx.device_type = DLDeviceType::kDLOneAPI;
ctx.device_id = at::detail::getXPUHooks().getGlobalIdxFromDevice(device);
ctx.device_id =
at::detail::getXPUHooks().getGlobalIdxFromDevice(tensor.device());
break;
case DeviceType::MAIA:
ctx.device_type = DLDeviceType::kDLMAIA;
@ -133,46 +130,44 @@ DLDevice torchDeviceToDLDevice(at::Device device) {
ctx.device_type = DLDeviceType::kDLExtDev;
break;
default:
TORCH_CHECK_BUFFER(false, "Cannot pack tensors on " + device.str());
TORCH_CHECK(false, "Cannot pack tensors on " + tensor.device().str());
}
return ctx;
}
static Device getATenDevice(DLDeviceType type, c10::DeviceIndex index, void* data = nullptr) {
switch (type) {
static Device getATenDevice(const DLDevice& ctx, void* data) {
switch (ctx.device_type) {
case DLDeviceType::kDLCPU:
return at::Device(DeviceType::CPU);
#ifndef USE_ROCM
// if we are compiled under HIP, we cannot do cuda
case DLDeviceType::kDLCUDA:
return at::Device(DeviceType::CUDA, index);
return at::Device(DeviceType::CUDA, static_cast<c10::DeviceIndex>(ctx.device_id));
#endif
case DLDeviceType::kDLOpenCL:
return at::Device(DeviceType::OPENCL, index);
return at::Device(DeviceType::OPENCL, static_cast<c10::DeviceIndex>(ctx.device_id));
case DLDeviceType::kDLROCM:
#ifdef USE_ROCM
// this looks funny, we need to return CUDA here to masquerade
return at::Device(DeviceType::CUDA, index);
return at::Device(DeviceType::CUDA, static_cast<c10::DeviceIndex>(ctx.device_id));
#else
return at::Device(DeviceType::HIP, index);
return at::Device(DeviceType::HIP, static_cast<c10::DeviceIndex>(ctx.device_id));
#endif
case DLDeviceType::kDLOneAPI:
TORCH_CHECK(data != nullptr, "Can't get ATen device for XPU without XPU data.");
return at::detail::getXPUHooks().getDeviceFromPtr(data);
case DLDeviceType::kDLMAIA:
return at::Device(DeviceType::MAIA, index);
return at::Device(DeviceType::MAIA, static_cast<c10::DeviceIndex>(ctx.device_id));
case DLDeviceType::kDLExtDev:
return at::Device(DeviceType::PrivateUse1, index);
return at::Device(DeviceType::PrivateUse1, static_cast<c10::DeviceIndex>(ctx.device_id));
default:
TORCH_CHECK_BUFFER(
false, "Unsupported device_type: ", std::to_string(type));
TORCH_CHECK(
false, "Unsupported device_type: ", std::to_string(ctx.device_type));
}
}
ScalarType toScalarType(const DLDataType& dtype) {
ScalarType stype = ScalarType::Undefined;
TORCH_CHECK_BUFFER(dtype.lanes == 1, "ATen does not support lanes != 1");
TORCH_CHECK(dtype.lanes == 1, "ATen does not support lanes != 1");
switch (dtype.code) {
case DLDataTypeCode::kDLUInt:
switch (dtype.bits) {
@ -189,7 +184,7 @@ ScalarType toScalarType(const DLDataType& dtype) {
stype = ScalarType::UInt64;
break;
default:
TORCH_CHECK_BUFFER(
TORCH_CHECK(
false, "Unsupported kUInt bits ", std::to_string(dtype.bits));
}
break;
@ -208,7 +203,7 @@ ScalarType toScalarType(const DLDataType& dtype) {
stype = ScalarType::Long;
break;
default:
TORCH_CHECK_BUFFER(
TORCH_CHECK(
false, "Unsupported kInt bits ", std::to_string(dtype.bits));
}
break;
@ -224,7 +219,7 @@ ScalarType toScalarType(const DLDataType& dtype) {
stype = ScalarType::Double;
break;
default:
TORCH_CHECK_BUFFER(
TORCH_CHECK(
false, "Unsupported kFloat bits ", std::to_string(dtype.bits));
}
break;
@ -234,7 +229,7 @@ ScalarType toScalarType(const DLDataType& dtype) {
stype = ScalarType::BFloat16;
break;
default:
TORCH_CHECK_BUFFER(
TORCH_CHECK(
false, "Unsupported kFloat bits ", std::to_string(dtype.bits));
}
break;
@ -250,7 +245,7 @@ ScalarType toScalarType(const DLDataType& dtype) {
stype = ScalarType::ComplexDouble;
break;
default:
TORCH_CHECK_BUFFER(
TORCH_CHECK(
false, "Unsupported kFloat bits ", std::to_string(dtype.bits));
}
break;
@ -260,12 +255,12 @@ ScalarType toScalarType(const DLDataType& dtype) {
stype = ScalarType::Bool;
break;
default:
TORCH_CHECK_BUFFER(
TORCH_CHECK(
false, "Unsupported kDLBool bits ", std::to_string(dtype.bits));
}
break;
default:
TORCH_CHECK_BUFFER(false, "Unsupported code ", std::to_string(dtype.code));
TORCH_CHECK(false, "Unsupported code ", std::to_string(dtype.code));
}
return stype;
}
@ -319,7 +314,11 @@ T* toDLPackImpl(const Tensor& src) {
atDLMTensor->tensor.manager_ctx = atDLMTensor;
atDLMTensor->tensor.deleter = &deleter<T>;
atDLMTensor->tensor.dl_tensor.data = view.data_ptr();
atDLMTensor->tensor.dl_tensor.device = torchDeviceToDLDevice(src.device());
c10::DeviceIndex device_id = 0;
if (src.is_cuda() || src.is_privateuseone()) {
device_id = src.get_device();
}
atDLMTensor->tensor.dl_tensor.device = getDLDevice(src, device_id);
atDLMTensor->tensor.dl_tensor.ndim = static_cast<int32_t>(src.dim());
atDLMTensor->tensor.dl_tensor.dtype = getDLDataType(src);
atDLMTensor->tensor.dl_tensor.shape = view.sizes().data();
@ -347,7 +346,7 @@ at::Tensor fromDLPackImpl(T* src, std::function<void(void*)> deleter) {
}
DLTensor& dl_tensor = src->dl_tensor;
Device device = getATenDevice(dl_tensor.device.device_type, dl_tensor.device.device_id, dl_tensor.data);
Device device = getATenDevice(dl_tensor.device, dl_tensor.data);
ScalarType stype = toScalarType(dl_tensor.dtype);
if (!dl_tensor.strides) {
@ -389,35 +388,4 @@ Tensor fromDLPackVersioned(DLManagedTensorVersioned* src, std::function<void(voi
return fromDLPackImpl<DLManagedTensorVersioned>(src, std::move(deleter));
}
Tensor maybeCopyTensor(
const Tensor& data,
std::optional<DLDevice> optional_dl_device,
std::optional<bool> copy) {
bool force_copy = copy.has_value() && *copy;
bool force_move = copy.has_value() && !*copy;
if (optional_dl_device.has_value()) {
auto device = at::getATenDevice(
optional_dl_device->device_type,
static_cast<c10::DeviceIndex>(optional_dl_device->device_id));
if (device != data.device()) {
TORCH_CHECK_VALUE(
!force_move,
"cannot move (i.e. copy=False) tensor from ",
data.device(),
" to ",
device,
" without copying.");
return data.to(device);
}
}
if (force_copy) {
return data.clone();
}
return data;
}
} // namespace at


@ -4,7 +4,7 @@
#include <ATen/Tensor.h>
#include <ATen/dlpack.h>
// this converter will:
// this convertor will:
// 1) take a Tensor object and wrap it in the DLPack tensor
// 2) take a dlpack tensor and convert it to the ATen Tensor
@ -21,16 +21,6 @@ TORCH_API Tensor fromDLPackVersioned(
TORCH_API DLDataType getDLDataType(const Tensor& t);
TORCH_API DLDevice getDLContext(const Tensor& tensor, const int64_t& device_id);
// Copies the Tensor if there's a device mismatch or copy is forced.
// This should be used before actually creating the DLPack capsule.
TORCH_API Tensor maybeCopyTensor(
const Tensor& data,
std::optional<DLDevice> optional_dl_device,
std::optional<bool> copy);
// Converts the given at::Device into a DLDevice.
TORCH_API DLDevice torchDeviceToDLDevice(at::Device device);
// This trait class is used for retrieving different attributes, such as the
// PyCapsule names and conversion functions for both DLPack tensor classes:
// `DLManagedTensor` and `DLManagedTensorVersioned`.

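Both DLPack hunks above rework the device translation and the error macros inside the converter. The round trip that the header comment describes (wrap a `Tensor` in a DLPack tensor, then convert a DLPack tensor back to an ATen `Tensor`) is reachable from Python through the public `torch.utils.dlpack` API; a minimal sketch:

```python
import torch
from torch.utils import dlpack

x = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# 1) Wrap the tensor in a DLPack capsule (the toDLPack* path above).
capsule = dlpack.to_dlpack(x)

# 2) Convert the capsule back into an ATen tensor (the fromDLPack* path).
y = dlpack.from_dlpack(capsule)

# The conversion shares storage instead of copying.
y[0, 0] = 42.0
assert x[0, 0].item() == 42.0
```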

@ -233,8 +233,8 @@ Tensor FunctionalInverses::slice_Tensor_inverse(const Tensor& base, const Tensor
// NOLINTNEXTLINE(performance-unnecessary-value-param)
Tensor FunctionalInverses::split_Tensor_inverse(const Tensor& base, const Tensor& mutated_view, InverseReturnMode inverse_return_mode, int64_t mutated_view_idx, c10::SymInt split_size, int64_t dim) {
// It would be nice if this logic could be reused from autograd's split_backward(), but I don't think it can.
// For functionalization, we have only have one of the tensors from the TensorList outputted by split(), and we want to layer i
// It would be nice if this logic could be re-used from autograd's split_backward(), but I don't think it can.
// For functionalization, we have only have one of the tensors from the TensorList outputed by split(), and we want to layer i
// on top of the base tensor.
// For autograd, we have all of the tensors outputted by split() and we just want to stack them.
dim = at::maybe_wrap_dim(dim, base.dim());


@ -286,11 +286,11 @@ void FunctionalTensorWrapper::storage_resize_(const c10::SymInt& new_size) {
// storage resizing is severely limited: we only support resizing either to zero, or from zero bytes.
TORCH_CHECK(new_size == 0 || curr_storage_size == 0, "new_size: ", new_size, ". curr_storage_size: ", curr_storage_size);
// The "functionalization rule" for storage resizing is a giant no-op, mainly because we don't want
// resize_() calls to actually emit any ops in the functional graph.
// resize_() calls to actualy emit any ops in the functional graph.
// How does it work?
// Resizing up (old size == 0):
// We do nothing in this case.
// The expectation is that for the user code to be valid, the next op that should run against the current tensor "x"
// The expection is that for the user code to be valid, the next op that should run against the current tensor "x"
// will be a x.copy_(y) (or similar), that will fully overwrite the data of x.
// If there are any outstanding aliases of x, we expect them not to be used until after the copy_() call
// (otherwise the eager code would be invalid),
@ -327,7 +327,7 @@ void FunctionalTensorWrapper::maybe_replace_storage(const Tensor& other) {
// We're also no longer re-generate "b" fully from "a" anymore, since "a" refers to a slice of "b"'s data.
//
// This is probably fixable in theory, but:
// - the fix would likely complicated the functionalization logic quite a bit.
// - the fix would likey complicated the functionalization logic quite a bit.
// - the primary use case for resize_() today is resizing zero-sized tensors in out= variants of operators
// - resize_() also can give you weird results today if you try to resize_() a weirdly strided tensor.
//
@ -344,7 +344,7 @@ void FunctionalTensorWrapper::maybe_replace_storage(const Tensor& other) {
set_sizes_and_strides(value_.sizes(), value_.strides());
refresh_numel();
// (Technically we should be guaranteed that the tensor was already contiguous,
// since it's guaranteed not to have been a view. Doesn't hurt to run though)
// since it's guaranteed not to have been a view. Doesnt hurt to run though)
refresh_contiguous();
// Swapping out the storage of a tensor (aka from a resize_() call) will update the sizes and strides of the tensor,
// so we need to record the fact that metadata was mutated.
@ -819,7 +819,7 @@ void setFunctionalizationReapplyViewsTLS(bool reapply_views) {
// This function will "functionalize" it.
// That is, it will call the operator, but removing any intermediate views/mutations
// that are performed inside of it.
// This is useful for LTC/XLA, which would like to reuse some of our composite kernels
// This is useful for LTC/XLA, which would like to re-use some of our composite kernels
// from pytorch core but not have to worry about the view ops that they might call.
// e.g. at::block_diag
void functionalize_op_helper(const c10::OperatorHandle& op, torch::jit::Stack* stack) {


@ -218,7 +218,7 @@ static Tensor safeStack(TensorList tensors) {
// is possible for the backward function to return an undefined grad for some
// grad_input for each example. In that case, we return an undefined grad.
//
// It is theoretically possible for *some* of the examples to produce an
// It is theoretically posssible for *some* of the examples to produce an
// undefined grad (a kernel could peek at the gradient values and return an
// undefined tensor if it determines the gradient is full of zeros). We
// could handle this by treating the undefined grad as a zero-filled tensor


@ -140,7 +140,7 @@ struct TORCH_API VmapPhysicalView {
// mapping a physical tensor to a new logical tensor (BatchedTensor)
VmapPhysicalToLogicalMap getPhysicalToLogicalMap() const;
// Maps a logical shape to a physical shape by prepending the batch
// Maps a logical shape to a physical shape by pre-pending the batch
// sizes to the logical shape.
VmapDimVector getPhysicalShape(IntArrayRef logical_shape) const;


@ -299,7 +299,7 @@ MapAllocator::MapAllocator(WithFd, std::string_view filename, int fd, int flags,
::close(fd);
TORCH_CHECK(false, "unable to stretch file <", filename_, "> to the right size: ", c10::utils::str_error(last_err), " (", last_err, ")");
}
/* on macOS write returns with errno 45 (Operation not supported) when used
/* on macOS write returns with errno 45 (Opperation not supported) when used
* with a file descriptor obtained via shm_open
*/
#ifndef __APPLE__


@ -211,7 +211,7 @@ NestedTensorImpl::NestedTensorImpl(
}
// assume contiguous, `nested_strides` and `offsets`
// can be inferred from `nested_sizes`
// can be infered from `nested_sizes`
NestedTensorImpl::NestedTensorImpl(
const at::Tensor& buffer,
const at::Tensor& nested_sizes)


@ -32,7 +32,7 @@ struct TORCH_API NestedTensorImpl : public c10::TensorImpl {
at::Tensor nested_strides,
at::Tensor storage_offsets);
// assume contiguous, `nested_strides` and `offsets`
// can be inferred from `nested_sizes`
// can be infered from `nested_sizes`
explicit NestedTensorImpl(
const at::Tensor& buffer,
const at::Tensor& nested_sizes);


@ -93,12 +93,12 @@ ident: identity for binary combination function sf. sf(ident, x) needs to return
x.
f: function for reduction over a chunk. f needs to be of signature scalar_t
f(int64_t partial_begin, int64_t partial_end, scalar_t identify)
f(int64_t partial_begin, int64_t partial_end, scalar_t identifiy)
sf: function to combine two partial results. sf needs to be of signature
scalar_t sf(scalar_t x, scalar_t y)
For example, you might have a tensor of 10000 entries and want to sum together
For example, you might have a tensor of 10000 entires and want to sum together
all the elements. Parallel_reduce with a grain_size of 2500 will then allocate
an intermediate result tensor with 4 elements. Then it will execute the function
"f" you provide and pass the beginning and end index of these chunks, so


@ -8,28 +8,7 @@ namespace at {
namespace {
template <typename scalar_t>
inline void fill_inplace(Tensor& self, const Scalar& value_scalar) {
scalar_t value{};
if constexpr (std::is_same_v<scalar_t, at::Half> ||
std::is_same_v<scalar_t, at::BFloat16> ||
std::is_same_v<scalar_t, at::Float8_e5m2> ||
std::is_same_v<scalar_t, at::Float8_e5m2fnuz> ||
std::is_same_v<scalar_t, at::Float8_e4m3fn> ||
std::is_same_v<scalar_t, at::Float8_e4m3fnuz> ||
std::is_same_v<scalar_t, at::Float8_e8m0fnu>) {
// relaxed float cast: allow inf similar to the torch.tensor constructor
//
// without this, we had the following divergence:
// torch.tensor(1123581321.0, dtype=torch.float16)
// => tensor(inf, dtype=torch.float16)
// torch.ops.aten.scalar_tensor.default(1123581321, dtype=torch.float16)
// => RuntimeError: value cannot be converted to type at::Half without overflow
value = static_cast<scalar_t>(value_scalar.to<double>());
} else {
value = value_scalar.to<scalar_t>();
}
auto value = value_scalar.to<scalar_t>();
scalar_t* dptr = static_cast<scalar_t*>(self.data_ptr());
*dptr = value;
}

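The deleted branch above existed to reconcile the two behaviors quoted in its comment: the `torch.tensor` constructor saturates out-of-range values to `inf` for reduced-precision float types, while the checked cast used by `scalar_tensor` raises. A minimal sketch reproducing that divergence from Python; whether the second call raises or saturates depends on which side of this diff your build is on:

```python
import torch

# Relaxed cast: narrowing an out-of-range fp32 value to float16 gives inf.
print(torch.tensor(1123581321.0, dtype=torch.float16))
# tensor(inf, dtype=torch.float16)

# The checked cast in the aten op historically raised instead:
try:
    t = torch.ops.aten.scalar_tensor.default(1123581321, dtype=torch.float16)
    print(t)
except RuntimeError as e:
    print(e)  # e.g. "value cannot be converted to type at::Half without overflow"
```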

@ -252,7 +252,7 @@ inline Tensor applySelect(
// Note: `size >= -index` is not equivalent to `size > -1 - index` if index
// is INT64_MIN For std::numeric_limits<int64_t>::min() result of unary
// minus is undefined by the standard but in practice is equal to self. On
// the other hand, indexing wrapping is valid for all negative int64_t
// the other hand, indexing wraping is valid for all negative int64_t
// values, as x[INT64_MIN] is the same as x[INT64_MAX]
TORCH_CHECK_INDEX(
size.sym_gt(-1 - index)
@ -315,17 +315,10 @@ inline void recordTensorIndex(
const Tensor& tensor,
std::vector<Tensor>& outIndices,
int64_t* dim_ptr) {
if (outIndices.empty()) {
outIndices.resize(*dim_ptr + 1);
outIndices[*dim_ptr] = tensor;
} else {
outIndices.push_back(tensor);
}
if (tensor.scalar_type() == kByte || tensor.scalar_type() == kBool) {
*dim_ptr += tensor.dim();
} else {
*dim_ptr += 1;
}
// TODO: check scalarType
outIndices.resize(*dim_ptr + 1);
outIndices[*dim_ptr] = tensor;
(*dim_ptr)++;
}
inline c10::List<::std::optional<Tensor>> typeConvertIndices(
@ -465,23 +458,13 @@ inline Tensor handleDimInMultiDimIndexing(
original_tensor_device,
prev_dim_result_sizes);
(*dim_ptr)++;
if (!outIndices.empty()) {
outIndices.resize(outIndices.size() + 1);
}
return result;
} else if (index.is_ellipsis()) {
auto ellipsis_ndims = original_tensor.dim() - *specified_dims_ptr;
(*dim_ptr) += ellipsis_ndims;
if (!outIndices.empty()) {
outIndices.resize(outIndices.size() + ellipsis_ndims);
}
(*dim_ptr) += original_tensor.dim() - (*specified_dims_ptr);
return prev_dim_result;
} else if (index.is_none()) {
Tensor result = prev_dim_result.unsqueeze(*dim_ptr);
(*dim_ptr)++;
if (!outIndices.empty()) {
outIndices.resize(outIndices.size() + 1);
}
return result;
} else if (index.is_boolean()) {
Tensor result = prev_dim_result.unsqueeze(*dim_ptr);
@ -577,10 +560,6 @@ inline Tensor applySlicing(
inline Tensor dispatch_index(
const Tensor& self,
std::vector<Tensor>&& indices) {
// Remove trailing null elements from indices
while (!indices.empty() && !indices.back().defined()) {
indices.pop_back();
}
return self.index(impl::typeConvertIndices(self, std::move(indices)));
}
@ -588,10 +567,6 @@ inline Tensor dispatch_index_put_(
Tensor& self,
std::vector<Tensor>&& indices,
const Tensor& value) {
// Remove trailing null elements from indices
while (!indices.empty() && !indices.back().defined()) {
indices.pop_back();
}
return self.index_put_(
impl::typeConvertIndices(self, std::move(indices)), value);
}


@ -208,7 +208,7 @@ bool TensorIteratorConfig::is_tensor_const(size_t idx) {
// same strides are increasing. If dimensions are non-increasing, we move on to the next input to break the tie.
//
// Instead of applying rule 4 for tie breaking, we could move on to the next tensor directly. This would result in possibly
// losing the correct permutation of the first tensor if there are permuted trivial dimensions, but could potentially
// losing the correct permuation of the first tensor if there are permuted trivial dimensions, but could potentially
// improve traversal order of the second tensor. We chose the former option to better propagate channels last layout
// for example for a tensor with the sizes N1H1
// These rules result in the intuitive behavior that in most cases recovers permutation of either the first argument (if all
@ -244,7 +244,7 @@ void TensorIteratorBase::reorder_dimensions() {
// initialize perm with n-1, n-2, ..., 1, 0
std::iota(perm_.rbegin(), perm_.rend(), 0);
// Reordering dimensions changes iteration order
// Reordering dimensions changes iteraton order
if (enforce_linear_iteration_) {
permute_dimensions(perm_);
return;

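The comment above concerns how `TensorIteratorBase::reorder_dimensions()` recovers a physical iteration order from input strides, for example so that a channels-last layout survives elementwise ops. A minimal Python sketch of the layouts involved, using only public APIs:

```python
import torch

x = torch.empty(2, 3, 4, 5)                   # NCHW, strides (60, 20, 5, 1)
cl = x.to(memory_format=torch.channels_last)  # NHWC physically, strides (60, 1, 15, 3)
print(x.stride(), cl.stride())

# Elementwise ops run through TensorIterator, which permutes dimensions to
# follow the physical layout; the channels-last layout propagates here.
out = cl * 2
print(out.is_contiguous(memory_format=torch.channels_last))  # True
```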

@ -388,7 +388,7 @@ struct TORCH_API TensorIteratorBase : public impl::MetaBase {
/// Return scalar value from original_tensor_base if it is defined. When
/// common_dtype is Half, casting scalar input to common_dtype might overflow.
/// If the scalar is already given in the type of Half, then return scalar
/// If the scalar is aleady given in the type of Half, then return scalar
/// value from tensor_base.
template <typename T>
T original_scalar_value(int64_t arg) {
@ -502,7 +502,7 @@ struct TORCH_API TensorIteratorBase : public impl::MetaBase {
/// kernels
bool can_use_32bit_indexing() const;
/// An "iterable" object that recursively splits this iterator into
/// An "iteratable" object that recursively splits this iterator into
/// sub-iterators that can use 32-bit indexing.
SplitUntil32Bit with_32bit_indexing() const;
@ -878,7 +878,7 @@ class TORCH_API TensorIteratorConfig final {
// Sets the enforce_linear_iteration_ flag, which is false by default.
// If true, iteration goes in the same order as a C-contiguous tensor
// is laid out in memory. i.e. last dimension iterates fastest.
// is layed out in memory. i.e. last dimension iterates fastest.
//
// This iteration order can be less efficient and may even prevent
// vectorization. So only use if the correctness of your kernel depends on it.


@ -78,7 +78,7 @@ inline bool areAnyOptionalTensorSubclassLike(
// NOTE: This function expects a scalar tensor of boolean dtype.
// Eg.
// Non-Composite Compliant Pattern : (t == 0).all().item<bool>()
// Composite Compliant Pattern : is_salar_tensor_true((t == 0).all())
// Composite Compliant Patter : is_salar_tensor_true((t == 0).all())
inline bool is_scalar_tensor_true(const Tensor& t) {
TORCH_INTERNAL_ASSERT(t.dim() == 0)
TORCH_INTERNAL_ASSERT(t.scalar_type() == kBool)


@ -378,9 +378,9 @@ inline static std::optional<ResultVec> computeStride_impl(
(TORCH_GUARD_OR_TRUE(sym_ne(oldshape[tensor_d - 1], 1)) &&
TORCH_GUARD_OR_TRUE(sym_ne(oldstride[tensor_d - 1], tensor_numel * chunk_base_stride)))) {
// We want to accumulate stuff in view_numel until view_numel == tensor_numel, if we do not
// know if that is satisfied we keep accumulating. For example if view_numel = 1 and tensor_numel = u1,
// know if that is satisfied we keep accumalating. For example if view_numel = 1 and tensor_numel = u1,
// we want to take that path, view_numel will become u0. Next iteration if u0==u1 we want to stop.
// That's why we use TORCH_GUARD_OR_TRUE below.
// Thats why we use TORCH_GUARD_OR_TRUE below.
// we use TORCH_GUARD_OR_FALSE and not TORCH_GUARD_OR_TRUE when comparing newshape[view_d] ==1 because
// if we know view_numel < tensor_numel is false, we want to stop. Unless we know for sure newshape[view_d]==1

View File
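
The `computeStride_impl` comment above is really about guard defaults for unbacked sizes: an undecidable `view_numel < tensor_numel` should keep the accumulation going, while an undecidable `newshape[view_d] == 1` should not be treated as a size-1 dim. A toy sketch of those two defaults, using `std::optional<bool>` as a stand-in for a possibly-undecidable symbolic comparison (the helper names are hypothetical stand-ins for the real macros):

```cpp
#include <cstdio>
#include <optional>

// Hypothetical stand-ins for TORCH_GUARD_OR_TRUE / TORCH_GUARD_OR_FALSE:
// if the symbolic comparison cannot be decided (nullopt), fall back to the
// stated default instead of failing.
bool guard_or_true(std::optional<bool> cond) { return cond.value_or(true); }
bool guard_or_false(std::optional<bool> cond) { return cond.value_or(false); }

int main() {
  std::optional<bool> unknown = std::nullopt;  // e.g. view_numel < u1
  // Undecidable "view_numel < tensor_numel": keep accumulating (assume true).
  std::printf("keep accumulating: %d\n", guard_or_true(unknown));   // 1
  // Undecidable "newshape[view_d] == 1": only take the size-1 path when it
  // is known for sure (assume false).
  std::printf("treat as size-1:  %d\n", guard_or_false(unknown));   // 0
  return 0;
}
```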

@ -27,7 +27,7 @@
// ops (ops being called by other ops). After the intermediate op call
// finishes it's set back to the original `TracingState` object.
//
// The `TracingState` object in TLS can also be read/written via its Python
// The `TracingState` obect in TLS can also be read/written via its Python
// binding in `python_tracer.cpp`, and `get/setTracingState()` C++ APIs,
// which are also exposed as `TORCH_API`.
//

View File

@ -95,7 +95,7 @@ namespace at {
m.impl("clone", torch::CppFunction::makeFallthrough());
m.impl("dot", torch::CppFunction::makeFallthrough());
m.impl("vdot", torch::CppFunction::makeFallthrough());
// The functions in the list below have a specific registration in native_functions.yaml and
// The functions in the list below have a specific registeration in native_functions.yaml and
// do not use the fallback.
// m.impl("mul.Tensor", torch::CppFunction::makeFallthrough());
// m.impl("add.Tensor", torch::CppFunction::makeFallthrough());

View File

@ -377,7 +377,7 @@ Keep it simple for now by assuming only one such flag is
present in the argument list. If I ever need a function
with more than flag I'll figure out something else.
The policy is:
If the user has explicitly specified a dtype, respect it.
If the user has explicity specified a dtype, respect it.
Otherwise, set it to the autocast type.
********************************************************/

View File
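
The policy stated above (respect an explicitly specified dtype, otherwise use the autocast dtype) amounts to a one-line fallback. A minimal sketch with a hypothetical helper name, using `std::optional` to model "no dtype was passed":

```cpp
#include <cstdio>
#include <optional>

enum class ScalarType { Float, Half };  // toy stand-in for at::ScalarType

// Hypothetical helper illustrating the stated policy: keep an explicitly
// specified dtype, otherwise substitute the autocast dtype.
ScalarType resolve_dtype(std::optional<ScalarType> user_dtype,
                         ScalarType autocast_dtype) {
  return user_dtype.has_value() ? *user_dtype : autocast_dtype;
}

int main() {
  // No dtype given: the autocast dtype (Half) wins.
  bool fallback =
      resolve_dtype(std::nullopt, ScalarType::Half) == ScalarType::Half;
  // dtype=Float passed explicitly: it is respected.
  bool respected =
      resolve_dtype(ScalarType::Float, ScalarType::Half) == ScalarType::Float;
  std::printf("fallback=%d respected=%d\n", fallback, respected);  // 1 1
  return 0;
}
```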

@ -568,9 +568,9 @@ bool Dispatcher::profilingOperatorEvents() {
return TORCH_SDT_IS_ENABLED(operator_start) || TORCH_SDT_IS_ENABLED(operator_end);
}
C10_NOINLINE void Dispatcher::fireOpStartUSDT(at::RecordFunction::schema_ref_t schema_ref, std::vector<void*>& argsAddresses, std::vector<const char*>& argsTypes) {
C10_NOINLINE void Dispatcher::fireOpStartUSDT(at::RecordFunction::schema_ref_t schema_ref) {
if (TORCH_SDT_IS_ENABLED(operator_start)) {
TORCH_SDT_WITH_SEMAPHORE(operator_start, schema_ref.get().name().c_str(), argsAddresses.size(), argsAddresses.data(), argsTypes.data());
TORCH_SDT_WITH_SEMAPHORE(operator_start, schema_ref.get().name().c_str());
}
}

View File

@ -371,10 +371,7 @@ class TORCH_API Dispatcher final {
#ifdef FBCODE_CAFFE2
static bool profilingOperatorEvents();
static void fireOpStartUSDT(
at::RecordFunction::schema_ref_t schema_ref,
std::vector<void*>& argsAddresses,
std::vector<const char*>& argsTypes);
static void fireOpStartUSDT(at::RecordFunction::schema_ref_t schema_ref);
static void fireOpEndUSDT(at::RecordFunction::schema_ref_t schema_ref);
#endif // FBCODE_CAFFE2
@ -798,21 +795,16 @@ C10_ALWAYS_INLINE_UNLESS_MOBILE Return Dispatcher::call(
#ifdef FBCODE_CAFFE2
if (profilingOperatorEvents()) {
std::vector<void*> argsAddresses = {(void*)(&args)...};
std::vector<const char*> argsTypes = {(typeid(args).name())...};
struct FireOpRAII {
FireOpRAII(
at::RecordFunction::schema_ref_t schema_ref,
std::vector<void*>& argsAddresses,
std::vector<const char*>& argsTypes)
FireOpRAII(at::RecordFunction::schema_ref_t schema_ref)
: schema_ref_(schema_ref) {
fireOpStartUSDT(schema_ref, argsAddresses, argsTypes);
fireOpStartUSDT(schema_ref);
}
~FireOpRAII() {
fireOpEndUSDT(schema_ref_);
}
at::RecordFunction::schema_ref_t schema_ref_;
} event(op.schema(), argsAddresses, argsTypes);
} event(op.schema());
return kernel.template call<Return, Args...>(
op, dispatchKeySet, std::forward<Args>(args)...);
} else {

View File
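
The Dispatcher hunks above differ in whether the USDT probe also receives per-argument addresses and type names, collected with pack expansion (`{(void*)(&args)...}` and `{(typeid(args).name())...}`). A standalone sketch of that expansion pattern, independent of the dispatcher itself:

```cpp
#include <cstdio>
#include <typeinfo>
#include <vector>

// Collect the address and mangled type name of every argument, using the
// same pack-expansion trick the profiling path feeds to its probe.
template <typename... Args>
void inspect_args(Args&&... args) {
  std::vector<void*> argsAddresses = {(void*)(&args)...};
  std::vector<const char*> argsTypes = {(typeid(args).name())...};
  for (size_t i = 0; i < argsAddresses.size(); ++i) {
    std::printf("arg %zu at %p, type %s\n", i, argsAddresses[i], argsTypes[i]);
  }
}

int main() {
  int x = 1;
  double y = 2.5;
  inspect_args(x, y, "hello");
  return 0;
}
```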

@ -162,7 +162,7 @@ struct CUDACachingHostAllocatorImpl
}
bool pinned_use_background_threads() override {
return c10::cuda::CUDACachingAllocator::CUDAAllocatorConfig::
return c10::CachingAllocator::AcceleratorAllocatorConfig::
pinned_use_background_threads();
}
@ -258,7 +258,7 @@ DECLARE_HOST_ALLOCATOR(
CUDACachingHostAllocator,
CUDACachingHostAllocatorImpl,
raw_local_deleter,
caching_host_allocator)
caching_host_allocator);
REGISTER_HOST_ALLOCATOR(at::kCUDA, &caching_host_allocator)

View File

@ -199,7 +199,7 @@ typedef struct {
* `byte_offset` field should be used to point to the beginning of the data.
*
* Note that as of Nov 2021, multiply libraries (CuPy, PyTorch, TensorFlow,
* TVM, perhaps others) do not adhere to this 256 byte alignment requirement
* TVM, perhaps others) do not adhere to this 256 byte aligment requirement
* on CPU/CUDA/ROCm, and always use `byte_offset=0`. This must be fixed
* (after which this note will be updated); at the moment it is recommended
* to not rely on the data pointer being correctly aligned.

View File
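
The DLPack note above says `data` should be 256-byte aligned, with `byte_offset` pointing at the actual start, while most producers currently set `byte_offset=0`. A sketch of how a producer could satisfy the stated requirement for an arbitrary allocation (the field names follow the note; the surrounding struct and setup are hypothetical):

```cpp
#include <cstdint>
#include <cstdio>

// Minimal stand-in for the two DLTensor fields discussed in the note.
struct ToyDLTensor {
  void* data;
  uint64_t byte_offset;
};

int main() {
  alignas(16) static char buffer[1024];
  char* start = buffer + 300;  // pretend the tensor data begins here

  // Round the pointer down to a 256-byte boundary and record the remainder
  // in byte_offset, so `data` stays 256-byte aligned as the spec asks.
  auto addr = reinterpret_cast<uintptr_t>(start);
  auto aligned = addr & ~static_cast<uintptr_t>(255);
  ToyDLTensor t;
  t.data = reinterpret_cast<void*>(aligned);
  t.byte_offset = static_cast<uint64_t>(addr - aligned);

  std::printf("data=%p byte_offset=%llu\n",
              t.data, static_cast<unsigned long long>(t.byte_offset));
  return 0;
}
```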

@ -158,7 +158,6 @@ TORCH_LIBRARY_IMPL(aten, FuncTorchBatchedDecomposition, m) {
OP_DECOMPOSE(kron);
OP_DECOMPOSE(l1_loss);
m.impl("layer_norm", native::layer_norm_symint);
m.impl("_fused_rms_norm", native::rms_norm_composite);
OP_DECOMPOSE2(ldexp, Tensor);
OP_DECOMPOSE2(less_equal, Tensor );
OP_DECOMPOSE2(less, Tensor );

View File

@ -2453,7 +2453,7 @@ TORCH_IMPL_FUNC(linalg_qr_out)(const Tensor& A,
// geqrf requires m x n workspace input that is modified in-place
// We try to use Q. If it doesn't fit, we try to use R
// If m > n and compute_q==false, it won't fit into Q or R, so we need to create an auxiliary tensor
// If m > n and compute_q==false, it won't fit into Q or R, so we neet to create an auxiliary tensor
Tensor QR;
if (compute_q && Q.size(-1) == n) {
QR = Q;
@ -4095,7 +4095,7 @@ Tensor linalg_vander_symint(
const auto n = N.value_or(shape.back());
TORCH_CHECK(n > 1, "N must be greater than 1.");
// Append cumprod of the other 0...n-1 powers
// Append cumprod of the oher 0...n-1 powers
shape.push_back(n - 1);
auto result = at::cumprod(x_.unsqueeze(-1).expand_symint(shape), -1);
// The row of ones

View File
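
The `linalg_vander_symint` hunk above builds the result by taking a cumulative product of powers and combining it with the row of ones. A rough libtorch sketch of that construction (assumes a C++ program linked against libtorch; the shapes are chosen purely for illustration):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // x = [2, 3]; build powers x^1..x^3 via cumprod along the last dim,
  // then prepend the column of ones for x^0, mirroring the comment above.
  auto x = torch::tensor({2.0, 3.0});
  auto powers = torch::cumprod(x.unsqueeze(-1).expand({2, 3}), -1);
  auto ones = torch::ones({2, 1}, x.options());
  auto vander = torch::cat({ones, powers}, -1);
  std::cout << vander << "\n";
  // expected:
  //  1   2   4   8
  //  1   3   9  27
  return 0;
}
```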

@ -202,7 +202,7 @@ void gemm(
float *c, int64_t ldc) {
internal::normalize_last_dims(transa, transb, m, n, k, &lda, &ldb, &ldc);
#if AT_MKLDNN_ENABLED()
if (mkldnn_reduced_f32_gemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)) {
if (mkldnn_bf32_gemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)) {
return;
}
#endif

View File

@ -36,10 +36,8 @@
#endif
#ifdef USE_FBGEMM
C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED("-Wextra-semi")
#include <fbgemm/Fbgemm.h>
#include <fbgemm/FbgemmConvert.h>
C10_DIAGNOSTIC_POP()
#endif
namespace {

View File

@ -54,7 +54,7 @@ bool ceil_mode) {
TORCH_CHECK((input.ndimension() == 3 || input.ndimension() == 4),
"non-empty 3D or 4D (batch mode) tensor expected for input");
} else {
TORCH_CHECK(false, "Unsupported memory format. Supports only ChannelsLast, Contiguous");
TORCH_CHECK(false, "Unsupport memory format. Supports only ChannelsLast, Contiguous");
}
/* sizes */
@ -130,7 +130,7 @@ const Tensor& indices) {
TORCH_CHECK((input.ndimension() == 3 || input.ndimension() == 4),
"non-empty 3D or 4D (batch mode) tensor expected for input");
} else {
TORCH_CHECK(false, "Unsupported memory format. Supports only ChannelsLast, Contiguous");
TORCH_CHECK(false, "Unsupport memory format. Supports only ChannelsLast, Contiguous");
}
/* sizes */

View File

@ -63,7 +63,7 @@ void max_pool3d_with_indices_out_cpu_template(
TORCH_CHECK((input.ndimension() == 4 || input.ndimension() == 5),
"non-empty 4D or 5D (batch mode) tensor expected for input");
} else {
TORCH_CHECK(false, "Unsupported memory format. Supports only ChannelsLast3d, Contiguous");
TORCH_CHECK(false, "Unsupport memory format. Supports only ChannelsLast3d, Contiguous");
}
const int64_t nslices = input.size(-4);
@ -158,7 +158,7 @@ Tensor& max_pool3d_with_indices_backward_out_cpu_template(
TORCH_CHECK((input.ndimension() == 4 || input.ndimension() == 5),
"non-empty 4D or 5D (batch mode) tensor expected for input");
} else {
TORCH_CHECK(false, "Unsupported memory format. Supports only ChannelsLast3d, Contiguous");
TORCH_CHECK(false, "Unsupport memory format. Supports only ChannelsLast3d, Contiguous");
}
const int64_t nslices = input.size(-4);

View File

@ -28,13 +28,13 @@ namespace at::native::templates {
// ==================================================== Random ========================================================
// The purpose of `update_from` and `update_to` is to find the closest valid int64_t number that can be used as actual `from`.
// The current implementation of `random_` uses uint64_t arithmetic and casts the result to the target dtype(scalar_t).
// The current implementation of `random_` uses uint64_t arithmetics and casts the result to the target dtype(scalar_t).
// This casting can result in generating numbers that happen to be greater or equal to `to` value. For instance:
//
// auto actual = torch::empty({3, 3}, torch::half);
// actual.random_(0, 65504);
//
// If random's uint64_t arithmetic produces 65503 as a random value after casting to torch::half it becomes 65504
// If random's uint64_t arithmetics produces 65503 as a random value after casting to torch::half it becomes 65504
// and violates the requirement that random value must be less than `to`. To resolve this issue `update_from` and `update_to`
// moves `from` to the right and `to` to the left to the next closest value that won't go outside [from, to) after casting to
// the target dtype. For `to` = 65504 it moves left for (1 << (log2(to) - 11 + 1)) = 32 and becomes 65472, which is previous

View File
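
The `random_` comment above hinges on one concrete fact: 65503 is not representable in Half (the spacing between Half values near 65504 is 32), so a draw of 65503 rounds up to 65504 after the cast and lands on the excluded `to` bound. A short check of that rounding, assuming this repo's `<c10/util/Half.h>` is available:

```cpp
#include <c10/util/Half.h>
#include <cstdio>

int main() {
  // Half values near the top of the range step by 32: ..., 65472, 65504.
  // 65503 therefore rounds up to 65504, which is why update_to() pulls the
  // exclusive bound `to` back to 65472 before sampling.
  c10::Half drawn = static_cast<c10::Half>(65503.0f);
  std::printf("%.0f\n", static_cast<float>(drawn));  // prints: 65504
  return 0;
}
```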

@ -14,10 +14,8 @@
#include <c10/util/Half.h>
#ifdef USE_FBGEMM
C10_DIAGNOSTIC_PUSH_AND_IGNORED_IF_DEFINED("-Wextra-semi")
#include <fbgemm/Fbgemm.h>
#include <fbgemm/FbgemmConvert.h>
C10_DIAGNOSTIC_POP()
#else
#include <caffe2/perfkernels/embedding_lookup_idx.h>
#endif

Some files were not shown because too many files have changed in this diff.