Fix hipify_python (#52756)

Co-authored-by: rraminen <rraminen@amd.com> Co-authored-by: Nikita Shulga <nshulga@fb.com>
Catch Flake8 error codes with multiple letters (#52750) (#52801)
2025-10-26 08:34:52 +08:00 · 2021-02-26 14:13:54 -08:00 · 2021-02-26 07:49:51 -08:00 · 2021-02-26 07:47:29 -08:00 · 2021-02-23 07:51:38 -08:00 · 2021-02-23 05:31:57 -08:00
7881 changed files with 371355 additions and 1147890 deletions
--- a/.bazelrc
+++ b/.bazelrc
@@ -1,27 +1,3 @@
-build --cxxopt=--std=c++14
+build --copt=--std=c++14
 build --copt=-I.
-# Bazel does not support including its cc_library targets as system
-# headers. We work around this for generated code
-# (e.g. c10/macros/cmake_macros.h) by making the generated directory a
-# system include path.
 build --copt=-isystem --copt bazel-out/k8-fastbuild/bin
-build --copt=-isystem --copt bazel-out/darwin-fastbuild/bin
-build --experimental_ui_max_stdouterr_bytes=2048576
-
-# Configuration to disable tty features for environments like CI
-build:no-tty --curses no
-build:no-tty --progress_report_interval 10
-build:no-tty --show_progress_rate_limit 10
-
-# Configuration to build with GPU support
-build:gpu --define=cuda=true
-# define a separate build folder for faster switching between configs
-build:gpu --platform_suffix=-gpu
-# See the note on the config-less build for details about why we are
-# doing this. We must also do it for the "-gpu" platform suffix.
-build --copt=-isystem --copt=bazel-out/k8-fastbuild-gpu/bin
-# rules_cuda configuration
-build:gpu --@rules_cuda//cuda:enable_cuda
-build:gpu --@rules_cuda//cuda:cuda_targets=sm_52
-build:gpu --@rules_cuda//cuda:compiler=nvcc
-build:gpu --repo_env=CUDA_PATH=/usr/local/cuda
--- a/.bazelversion
+++ b/.bazelversion
@@ -1 +1 @@
-4.2.1
+3.1.0
--- a/.buckconfig.oss
+++ b/.buckconfig.oss
@@ -1,15 +0,0 @@
-[buildfile]
-name = BUILD.buck
-
-[repositories]
-  bazel_skylib = third_party/bazel-skylib/
-
-[download]
-  in_build = true
-
-[cxx]
-  cxxflags = -std=c++17
-  should_remap_host_platform = true
-
-[project]
-  default_flavors_mode=all
--- a/.circleci/README.md
+++ b/.circleci/README.md
@@ -0,0 +1,499 @@
+Structure of CI
+===============
+
+setup job:
+1. Does a git checkout
+2. Persists CircleCI scripts (everything in `.circleci`) into a workspace.  Why?
+   We don't always do a Git checkout on all subjobs, but we usually
+   still want to be able to call scripts one way or another in a subjob.
+   Persisting files this way lets us have access to them without doing a
+   checkout.  This workspace is conventionally mounted on `~/workspace`
+   (this is distinguished from `~/project`, which is the conventional
+   working directory that CircleCI will default to starting your jobs
+   in.)
+3. Write out the commit message to `.circleci/COMMIT_MSG`.  This is so
+   we can determine in subjobs if we should actually run the jobs or
+   not, even if there isn't a Git checkout.
+
+
+
+
+CircleCI configuration generator
+================================
+
+One may no longer make changes to the `.circleci/config.yml` file directly.
+Instead, one must edit these Python scripts or files in the `verbatim-sources/` directory.
+
+
+Usage
+----------
+
+1. Make changes to these scripts.
+2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`.
+
+You'll see a build failure on GitHub if the scripts don't agree with the checked-in version.
+
+
+Motivation
+----------
+
+These scripts establish a single, authoritative source of documentation for the CircleCI configuration matrix.
+The documentation, in the form of diagrams, is automatically generated and cannot drift out of sync with the YAML content.
+
+Furthermore, consistency is enforced within the YAML config itself, by using a single source of data to generate
+multiple parts of the file.
+
+* Facilitates one-off culling/enabling of CI configs for testing PRs on special targets
+
+Also see https://github.com/pytorch/pytorch/issues/17038
+
+
+Future direction
+----------------
+
+### Declaring sparse config subsets
+See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
+
+In contrast with a full recursive tree traversal of configuration dimensions,
+> in the future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
+
+----------------
+----------------
+
+# How do the binaries / nightlies / releases work?
+
+### What is a binary?
+
+A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source.
+
+A **binary configuration** is a collection of
+
+* release or nightly
+    * releases are stable, nightlies are beta and built every night
+* python version
+    * linux: 3.5m, 3.6m, 3.7m (mu is wide unicode or something like that. It usually doesn't matter but you should know that it exists)
+    * macos: 3.6, 3.7, 3.8
+    * windows: 3.6, 3.7, 3.8
+* cpu version
+    * cpu, cuda 9.0, cuda 10.0
+    * The supported cuda versions occasionally change
+* operating system
+    * Linux - these are all built on CentOS. There haven't been any problems in the past building on CentOS and using on Ubuntu
+    * MacOS
+    * Windows - these are built on Azure pipelines
+* devtoolset version (gcc compiler version)
+    * This only matters on Linux because only Linux uses gcc. tl;dr: gcc made a backwards-incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string
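
The dimensions listed above multiply out into the set of binary configurations. A minimal sketch of that enumeration, assuming illustrative values copied from the lists above; the function name `enumerate_configs` and the dict keys are hypothetical, and the real matrix (generated from `.circleci/cimodel`) is not a plain full product:

```python
from itertools import product

# Illustrative values copied from the README lists above. The real
# matrix is generated by the cimodel scripts and prunes combinations.
PYTHON_VERSIONS = ["3.5m", "3.6m", "3.7m"]
COMPUTE_VERSIONS = ["cpu", "cuda 9.0", "cuda 10.0"]
PACKAGE_TYPES = ["pip", "conda", "libtorch"]


def enumerate_configs(channel="nightly", os_name="linux"):
    """Yield one dict per (package type, python, compute) combination."""
    for pkg, py, compute in product(PACKAGE_TYPES, PYTHON_VERSIONS, COMPUTE_VERSIONS):
        yield {
            "channel": channel,
            "os": os_name,
            "package_type": pkg,
            "python": py,
            "compute": compute,
        }


configs = list(enumerate_configs())
print(len(configs))  # 3 package types * 3 pythons * 3 compute versions
```

Even this small example shows why the matrix grows quickly: every new value in one dimension adds a job for each combination of the others.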
+
+### Where are the binaries?
+
+The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to Pytorch releases, usually every few months.
+
+We have 3 types of binary packages
+
+* pip packages - nightlies are stored on s3 (pip install -f \<a s3 url\>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
+* conda packages - nightlies and releases are both stored in a conda repo. Nightly packages have a '_nightly' suffix
+* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
+    * shared with dependencies (the only supported option for Windows)
+    * static with dependencies
+    * shared without dependencies
+    * static without dependencies
+
+All binaries are built in CircleCI workflows except Windows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release)
+
+# CircleCI structure of the binaries
+
+Some quick vocab:
+
+* A **workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
+* **jobs** are a sequence of '**steps**'
+* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments, environment variables declared in one script DO NOT persist to following steps*
+* CircleCI has a **workspace**, which is essentially a cache shared between jobs of the *same workflow*, in which you can store artifacts to pass from one job to the next.
+
+## How are the workflows structured?
+
+The nightly binaries have 3 workflows. We have one job (actually 3 jobs:  build, test, and upload) per binary configuration
+
+1. binary_builds
+    1. every day midnight EST
+    2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
+    3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
+    4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
+        1. binary_linux_conda_3.7_cpu_build
+            1. Builds the build. On linux jobs this uses the 'docker executor'.
+            2. Persists the package to the workspace
+        2. binary_linux_conda_3.7_cpu_test
+            1. Loads the package from the workspace
+            2. Spins up a docker image (on Linux), mapping the package and code repos into the docker
+            3. Runs some smoke tests in the docker
+            4. (Actually, for macos this is a step rather than a separate job)
+        3. binary_linux_conda_3.7_cpu_upload
+            1. Logs in to aws/conda
+            2. Uploads the package
+2. update_s3_htmls
+    1. every day 5am EST
+    2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
+    3. See below for what these are for and why they're needed
+    4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3
+3. binarysmoketests
+    1. every day
+    2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
+    3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
+        1. smoke_linux_conda_3.7_cpu
+            1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
+            2. Runs the smoke tests
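
The naming scheme in the list above can be sketched as follows. This is only an illustration of how one binary configuration fans out into jobs; the helper `jobs_for` is hypothetical and is not part of the generator scripts:

```python
def jobs_for(os_name, package_type, python_version, compute):
    """Return the job names one binary configuration expands into.

    Keys are the workflow names from the list above; the naming pattern
    is inferred from the linux_conda_3.7_cpu example.
    """
    config = f"{os_name}_{package_type}_{python_version}_{compute}"
    return {
        "binary_builds": [
            f"binary_{config}_{stage}" for stage in ("build", "test", "upload")
        ],
        "binarysmoketests": [f"smoke_{config}"],
    }


jobs = jobs_for("linux", "conda", "3.7", "cpu")
for workflow, names in jobs.items():
    print(workflow, names)
```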
+
+## How are the jobs structured?
+
+The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources. Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
+
+* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
+    * binary_linux_build.sh
+    * binary_linux_test.sh
+    * binary_linux_upload.sh
+* MacOS jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
+    * binary_macos_build.sh
+    * binary_macos_test.sh
+    * binary_macos_upload.sh
+* Update html jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
+    * These delegate to scripts in the pytorch/builder repo
+    * https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh
+    * https://github.com/pytorch/builder/blob/master/cron/upload_binary_sizes.sh
+* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
+    * These delegate to scripts in the pytorch/builder repo
+    * https://github.com/pytorch/builder/blob/master/run_tests.sh
+    * https://github.com/pytorch/builder/blob/master/smoke_test.sh
+    * https://github.com/pytorch/builder/blob/master/check_binary.sh
+* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
+    * binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh
+    * binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps.
+    * binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables
+    * binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image
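
To make the role of binary_populate_env.sh concrete, here is a minimal sketch of splitting one string into the variables of a binary configuration. The space-separated `"PACKAGE_TYPE PYTHON CUDA"` layout is an assumption made for illustration, since the README does not specify the BUILD_ENVIRONMENT format, and `parse_build_environment` is a hypothetical helper; only the variable names come from the README:

```python
def parse_build_environment(build_environment):
    """Split a hypothetical 'PACKAGE_TYPE PYTHON CUDA' string into env vars."""
    fields = build_environment.split()
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 fields, got {len(fields)}: {build_environment!r}"
        )
    package_type, desired_python, desired_cuda = fields
    return {
        "PACKAGE_TYPE": package_type,
        "DESIRED_PYTHON": desired_python,
        "DESIRED_CUDA": desired_cuda,
    }


env = parse_build_environment("conda 3.6 cpu")
for name, value in env.items():
    print(f"export {name}={value}")
```

The real script also sets defaults, dates, version strings and s3 locations, none of which are modeled here.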
+
+### **Why do the steps all refer to scripts?**
+
+CircleCI creates a final yaml file by inlining every <<* segment, so if we kept all the code in the config.yml itself, the config size would exceed 4 MB and cause infra problems.
+
+### **What is binary_run_in_docker for?**
+
+So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
+
+* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
+* linux test jobs use the machine executor in order for them to properly interface with GPUs since docker executors cannot execute with attached GPUs
+* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
+* linux smoke test jobs use the machine executor for the same reason as the linux test jobs
+
+binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs
+
+### **Why does binary_checkout also checkout pytorch? Why shouldn't it?**
+
+We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later circleci changed that to use a single pytorch checkout and persist it through the workspace (they did this because our config file was too big, so they wanted to move a lot of the setup code into scripts, but the scripts needed the code repo to exist to be called, so they added a prereq step called 'setup' to check out the code and persist the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke because the pytorch code they expected no longer existed. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there are two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where.
+
+# Azure Pipelines structure of the binaries
+
+TODO: fill in stuff
+
+## How are the workflows structured?
+
+TODO: fill in stuff
+
+## How are the jobs structured?
+
+TODO: fill in stuff
+
+# Code structure of the binaries (circleci agnostic)
+
+## Overview
+
+The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder), which is a repo that defines how all the binaries are built. The relevant code is
+
+
+```
+# All code needed to set-up environments for build code to run in,
+# but only code that is specific to the current CI system
+pytorch/pytorch
+- .circleci/                # Folder that holds all circleci related stuff
+  - config.yml              # GENERATED file that actually controls all circleci behavior
+  - verbatim-sources        # Used to generate job/workflow sections in ^
+  - scripts/                # Code needed to prepare circleci environments for binary build scripts
+
+- setup.py                  # Builds pytorch. This is wrapped in pytorch/builder
+- cmake files               # used in normal building of pytorch
+
+# All code needed to prepare a binary build, given an environment
+# with all the right variables/packages/paths.
+pytorch/builder
+
+# Given an installed binary and a proper python env, runs some checks
+# to make sure the binary was built the proper way. Checks things like
+# the library dependencies, symbols present, etc.
+- check_binary.sh
+
+# Given an installed binary, runs python tests to make sure everything
+# is in order. These should be de-duped. Right now they both run smoke
+# tests, but are called from different places. Usually just call some
+# import statements, but also has overlap with check_binary.sh above
+- run_tests.sh
+- smoke_test.sh
+
+# Folders that govern how packages are built. See paragraphs below
+
+- conda/
+  - build_pytorch.sh          # Entrypoint. Delegates to proper conda build folder
+  - switch_cuda_version.sh    # Switches the active CUDA installation in Docker
+  - pytorch-nightly/          # Build-folder
+- manywheel/
+  - build_cpu.sh              # Entrypoint for cpu builds
+  - build.sh                  # Entrypoint for CUDA builds
+  - build_common.sh           # Actual build script that ^^ call into
+- wheel/
+  - build_wheel.sh            # Entrypoint for wheel builds
+- windows/
+  - build_pytorch.bat         # Entrypoint for wheel builds on Windows
+```
+
+Every type of package has an entrypoint build script that handles all the important logic.
+
+## Conda
+
+Linux, MacOS and Windows use the same code flow for the conda builds.
+
+Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
+
+Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies what python environment to build the package in and what dependencies the resulting package should have, and the build script gets called in that env to build the thing.
+tl;dr on conda-build is
+
+1. Creates a brand new conda environment, based off of deps in the meta.yaml
+    1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
+    2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is.
+2. Calls build.sh in the environment
+3. Copies the finished package to a new conda env, also specified by the meta.yaml
+4. Runs some simple import tests (if specified in the meta.yaml)
+5. Saves the finished package as a tarball
+
+The build.sh we use is essentially a wrapper around `python setup.py build`, but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths.
+
+The entrypoint file `builder/conda/build_pytorch.sh` is complicated because
+
+* It works for Linux, MacOS and Windows
+    * The mac builds used to create their own environments, since they all used to be on the same machine. There’s now a lot of extra logic to handle conda envs. This extra machinery could be removed
+* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed.
+
+## Manywheels (linux pip and libtorch packages)
+
+Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
+
+`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh`
+
+The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because
+
+* This used to handle building for several different python versions at the same time. The loops have been removed, but there's still unnecessary folders and movements here and there.
+    * The script is never used this way anymore. This extra machinery could be removed.
+* This used to handle testing the pip packages too. This is why there’s testing code at the end that messes with python installations and stuff
+    * The script is never used this way anymore. This extra machinery could be removed.
+* This also builds libtorch packages
+    * This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
+* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed.
+
+## Wheels (MacOS pip and libtorch packages)
+
+The entrypoint file `builder/wheel/build_wheel.sh` is complicated because
+
+* The mac builds used to all run on one machine (we didn’t have autoscaling mac machines till circleci). So this script handled siloing itself by setting up and tearing down its own build env and working in its own build directory.
+    * The script is never used this way anymore. This extra machinery could be removed.
+* This also builds libtorch packages
+    * Ditto the comment above. This should definitely be separated out.
+
+Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
+
+## Windows Wheels (Windows pip and libtorch packages)
+
+The entrypoint file `builder/windows/build_pytorch.bat` is complicated because
+
+* This used to handle building for several different python versions at the same time. This is why there are loops everywhere
+    * The script is never used this way anymore. This extra machinery could be removed.
+* This used to handle testing the pip packages too. This is why there’s testing code at the end that messes with python installations and stuff
+    * The script is never used this way anymore. This extra machinery could be removed.
+* This also builds libtorch packages
+    * This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
+
+Note that the Windows Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
+
+## General notes
+
+### Note on run_tests.sh, smoke_test.sh, and check_binary.sh
+
+* These should all be consolidated
+* These must run on all OS types: MacOS, Linux, and Windows
+* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didn’t mess anything up.
+* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package.
+
+### Note on libtorch
+
+Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this
+
+* It’s confusing. Most of those scripts deal with python specifics.
+* The extra conditionals everywhere severely complicate the wheel build scripts
+* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script)
+
+### Note on docker images / Dockerfiles
+
+All linux builds occur in docker images. The docker images are
+
+* pytorch/conda-cuda
+    * Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
+    * Also used for cpu builds
+* pytorch/manylinux-cuda90
+* pytorch/manylinux-cuda92
+* pytorch/manylinux-cuda100
+    * Also used for cpu builds
+
+The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
+
+### General Python
+
+* This is still a good explanation of python installations https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2
+
+# How to manually rebuild the binaries
+
+tl;dr make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
+
+Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
+
+## How to test changes to the binaries via .circleci
+
+Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using `.circleci/regenerate.sh` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
+
+```sh
+# Make your changes
+touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
+
+# Regenerate the yaml, has to be in python 3.7
+.circleci/regenerate.sh
+
+# Make a commit
+git add .circleci *
+git commit -m "My real changes"
+git push origin my_branch
+
+# Now hardcode the jobs that you want in the .circleci/config.yml workflows section
+# Also eliminate ensure-consistency and should_run_job checks
+# e.g. https://github.com/pytorch/pytorch/commit/2b3344bfed8772fe86e5210cc4ee915dee42b32d
+
+# Make a commit you won't keep
+git add .circleci
+git commit -m "[DO NOT LAND] testing binaries for above changes"
+git push origin my_branch
+
+# Now you need to make some changes to the first commit.
+git rebase -i HEAD~2 # mark the first commit as 'edit'
+
+# Make the changes
+touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
+.circleci/regenerate.sh
+
+# Amend the commit and continue the rebase
+git add .circleci
+git commit --amend
+git rebase --continue
+
+# Update the PR, need to force since the commits are different now
+git push origin my_branch --force
+```
+
+The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci without having to re-write which binary jobs you want to test on. The downside is that all updates will be force pushes.
+
+## How to build a binary locally
+
+### Linux
+
+You can build Linux binaries locally easily using docker.
+
+```sh
+# Run the docker
+# Use the correct docker image, pytorch/conda-cuda used here as an example
+#
+# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
+#    machine that you're running the command on) accessible to the docker
+#    container at path/to/bar. So if you then run `touch path/to/bar/baz`
+#    in the docker container then you will see path/to/foo/baz on your local
+#    machine. You could also clone the pytorch and builder repos in the docker.
+#
+# If you know how, add ccache as a volume too and speed up everything
+docker run \
+    -v your/pytorch/repo:/pytorch \
+    -v your/builder/repo:/builder \
+    -v where/you/want/packages/to/appear:/final_pkgs \
+    -it pytorch/conda-cuda /bin/bash
+
+# Export whatever variables are important to you. All variables that you'd
+# possibly need are in .circleci/scripts/binary_populate_env.sh
+# You should probably always export at least these 3 variables
+export PACKAGE_TYPE=conda
+export DESIRED_PYTHON=3.6
+export DESIRED_CUDA=cpu
+
+# Call the entrypoint
+# `|& tee foo.log` just copies all stdout and stderr output to foo.log
+# The builds generate lots of output so you probably need this when
+# building locally.
+/builder/conda/build_pytorch.sh |& tee build_output.log
+```
+
+**Building CUDA binaries on docker**
+
+You can build CUDA binaries on CPU only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary on a docker on your laptop if you so choose (though it’s gonna take a long time).
+
+For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast.
+
+### MacOS
+
+There’s no easy way to generate reproducible hermetic MacOS environments. If you have a Mac laptop then you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will probably interfere with the build. If you’re trying to repro an error on a Mac build in .circleci and you can’t seem to repro locally, then my best advice is actually to iterate on .circleci    :/
+
+But if you want to try, then I’d recommend
+
+```sh
+# Create a new terminal
+# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
+# know how to do
+
+# Install a new miniconda
+# First remove any other python or conda installation from your PATH
+# Always install miniconda 3, even if building for Python <3
+# Use $HOME rather than a quoted "~", which the shell would not expand
+new_conda="$HOME/my_new_conda"
+# Keep the installer outside the install prefix, which must not exist yet
+conda_sh="$HOME/install_miniconda.sh"
+curl -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
+chmod +x "$conda_sh"
+"$conda_sh" -b -p "$new_conda"
+rm -f "$conda_sh"
+export PATH="$new_conda/bin:$PATH"
+
+# Create a clean python env
+# All MacOS builds use conda to manage the python env and dependencies
+# that are built with, even the pip packages
+conda create -yn binary python=2.7
+conda activate binary
+
+# Export whatever variables are important to you. All variables that you'd
+# possibly need are in .circleci/scripts/binary_populate_env.sh
+# You should probably always export at least these 3 variables
+export PACKAGE_TYPE=conda
+export DESIRED_PYTHON=3.6
+export DESIRED_CUDA=cpu
+
+# Call the entrypoint you want
+path/to/builder/wheel/build_wheel.sh
+```
+
+N.B. installing a brand new miniconda is important. This has to do with how conda installations work. See the “General Python” section above, but tldr; is that
+
+1. You make the ‘conda’ command accessible by prepending `path/to/conda_root/bin` to your PATH.
+2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
+3. Now say you (or some code that you ran) call python executable `foo`
+    1. if you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected.
+    2. But if you forgot to install `foo` in `new_env` but happened to previously install it in your root conda env (called ‘base’), then unix/linux will still find `path/to/conda_root/bin/foo` . This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version!
+
+Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe.
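
The fall-through described in the steps above can be reproduced in a throwaway directory. This is a minimal sketch, assuming a Unix-like system; the directory layout mirrors the `path/to/conda_root` example, and `make_executable` is a hypothetical helper, not a conda API:

```python
import os
import shutil
import stat
import tempfile


def make_executable(directory, name):
    """Create a stub executable called `name` inside `directory`."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as conda_root:
    base_bin = os.path.join(conda_root, "bin")
    env_bin = os.path.join(conda_root, "envs", "new_env", "bin")
    os.makedirs(env_bin)

    # `foo` was only ever installed in the base env, not in new_env.
    base_foo = make_executable(base_bin, "foo")

    # Activating new_env prepends its bin directory to PATH.
    search_path = os.pathsep.join([env_bin, base_bin])

    # Lookup finds nothing in new_env and falls through to the base copy.
    found = shutil.which("foo", path=search_path)
    shadowed = found is not None and os.path.samefile(found, base_foo)
    print("resolved to base env copy:", shadowed)
```

Nothing warns you that the lookup skipped `new_env`, which is why a fresh miniconda with an empty base env is the safer starting point.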
+
+### Windows
+
+TODO: fill in
--- a/.circleci/cimodel/data/binary_build_data.py
+++ b/.circleci/cimodel/data/binary_build_data.py
@@ -30,7 +30,47 @@ def get_processor_arch_name(gpu_version):
        "cu" + gpu_version.strip("cuda") if gpu_version.startswith("cuda") else gpu_version
    )

+LINUX_PACKAGE_VARIANTS = OrderedDict(
+    manywheel=[
+        "3.6m",
+        "3.7m",
+        "3.8m",
+        "3.9m"
+    ],
+    conda=dimensions.STANDARD_PYTHON_VERSIONS,
+    libtorch=[
+        "3.7m",
+    ],
+)
+
 CONFIG_TREE_DATA = OrderedDict(
+    linux=(dimensions.GPU_VERSIONS, LINUX_PACKAGE_VARIANTS),
+    macos=([None], OrderedDict(
+        wheel=dimensions.STANDARD_PYTHON_VERSIONS,
+        conda=dimensions.STANDARD_PYTHON_VERSIONS,
+        libtorch=[
+            "3.7",
+        ],
+    )),
+    macos_arm64=([None], OrderedDict(
+        wheel=[
+            "3.8",
+        ],
+        conda=[
+            "3.8",
+        ],
+    )),
+    # Skip CUDA-9.2 builds on Windows
+    windows=(
+        [v for v in dimensions.GPU_VERSIONS if v not in ['cuda92'] + dimensions.ROCM_VERSION_LABELS],
+        OrderedDict(
+            wheel=dimensions.STANDARD_PYTHON_VERSIONS,
+            conda=dimensions.STANDARD_PYTHON_VERSIONS,
+            libtorch=[
+                "3.7",
+            ],
+        )
+    ),
 )

 # GCC config variants:
@ -85,7 +125,6 @@ class PackageFormatConfigNode(ConfigNode):
        self.props["python_versions"] = python_versions
        self.props["package_format"] = package_format

-
    def get_children(self):
        if self.find_prop("os_name") == "linux":
            return [LinuxGccConfigNode(self, v) for v in LINUX_GCC_CONFIG_VARIANTS[self.find_prop("package_format")]]
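
To get a feel for how `CONFIG_TREE_DATA` fans out into individual binary builds, the sketch below flattens two of its entries. It assumes each value is a `(gpu_versions, {package_format: python_versions})` pair, as in the hunk above; `flatten` is a hypothetical helper, and the Python version list is copied from the `dimensions.py` hunk in this diff. The real generator walks `ConfigNode` classes and adds further dimensions (for example GCC variants on Linux), so treat this only as an illustration of the data shape.

```python
from collections import OrderedDict
from itertools import product

# Copied from STANDARD_PYTHON_VERSIONS as it appears after this change.
STANDARD_PYTHON_VERSIONS = ["3.6", "3.7", "3.8", "3.9"]

config_tree_data = OrderedDict(
    macos=([None], OrderedDict(
        wheel=STANDARD_PYTHON_VERSIONS,
        conda=STANDARD_PYTHON_VERSIONS,
        libtorch=["3.7"],
    )),
    macos_arm64=([None], OrderedDict(
        wheel=["3.8"],
        conda=["3.8"],
    )),
)


def flatten(tree):
    """Hypothetical helper: yield one tuple per (os, format, python, gpu) combination."""
    for os_name, (gpu_versions, formats) in tree.items():
        for package_format, python_versions in formats.items():
            for gpu, pyver in product(gpu_versions, python_versions):
                yield (os_name, package_format, pyver, gpu)


combos = list(flatten(config_tree_data))
for combo in combos:
    print(combo)
```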
--- a/.circleci/cimodel/data/binary_build_definitions.py
+++ b/.circleci/cimodel/data/binary_build_definitions.py
@ -27,19 +27,7 @@ class Conf(object):

    def gen_docker_image(self):
        if self.gcc_config_variant == 'gcc5.4_cxx11-abi':
-            if self.gpu_version is None:
-                return miniutils.quote("pytorch/libtorch-cxx11-builder:cpu")
-            else:
-                return miniutils.quote(
-                    f"pytorch/libtorch-cxx11-builder:{self.gpu_version}"
-                )
-        if self.pydistro == "conda":
-            if self.gpu_version is None:
-                return miniutils.quote("pytorch/conda-builder:cpu")
-            else:
-                return miniutils.quote(
-                    f"pytorch/conda-builder:{self.gpu_version}"
-                )
+            return miniutils.quote("pytorch/pytorch-binary-docker-image-ubuntu16.04:latest")

        docker_word_substitution = {
            "manywheel": "manylinux",
@ -124,9 +112,9 @@ class Conf(object):
        Output looks similar to:

      - binary_upload:
-          name: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_upload
+          name: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_upload
          context: org-member
-          requires: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_test
+          requires: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_test
          filters:
            branches:
              only:
@ -134,7 +122,7 @@ class Conf(object):
            tags:
              only: /v[0-9]+(\\.[0-9]+)*-rc[0-9]+/
          package_type: manywheel
-          upload_subfolder: cu113
+          upload_subfolder: cu92
        """
        return {
            "binary_upload": OrderedDict({
--- a/.circleci/cimodel/data/dimensions.py
+++ b/.circleci/cimodel/data/dimensions.py
@ -1,14 +1,14 @@
 PHASES = ["build", "test"]

 CUDA_VERSIONS = [
+    "101",
    "102",
-    "113",
-    "116",
+    "111",
 ]

 ROCM_VERSIONS = [
-    "4.3.1",
-    "4.5.2",
+    "3.10",
+    "4.0.1",
 ]

 ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
@ -16,8 +16,8 @@ ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
 GPU_VERSIONS = [None] + ["cuda" + v for v in CUDA_VERSIONS] + ROCM_VERSION_LABELS

 STANDARD_PYTHON_VERSIONS = [
+    "3.6",
    "3.7",
    "3.8",
-    "3.9",
-    "3.10"
+    "3.9"
 ]
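
As a quick check of what the derived lists look like after this change, the sketch below re-evaluates the expressions from `dimensions.py` with the new values, then applies the Windows filter from the `binary_build_data.py` hunk. The name `windows_gpu_versions` is introduced here purely for illustration.

```python
# Values as they appear on the "+" side of the dimensions.py hunk.
CUDA_VERSIONS = ["101", "102", "111"]
ROCM_VERSIONS = ["3.10", "4.0.1"]

ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
GPU_VERSIONS = [None] + ["cuda" + v for v in CUDA_VERSIONS] + ROCM_VERSION_LABELS

# Same list comprehension as the `windows=` entry in CONFIG_TREE_DATA.
windows_gpu_versions = [
    v for v in GPU_VERSIONS if v not in ["cuda92"] + ROCM_VERSION_LABELS
]

print(GPU_VERSIONS)
print(windows_gpu_versions)
```

With these values, `"cuda92"` is not present in `GPU_VERSIONS` at all, so the only entries the Windows filter actually drops are the ROCm labels.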
--- a/.circleci/cimodel/data/pytorch_build_data.py
+++ b/.circleci/cimodel/data/pytorch_build_data.py
@ -1,7 +1,105 @@
-from cimodel.lib.conf_tree import ConfigNode
+from cimodel.lib.conf_tree import ConfigNode, X, XImportant


 CONFIG_TREE_DATA = [
+    ("xenial", [
+        ("gcc", [
+            ("5.4", [  # All of this subtree rebases to master and then builds
+                ("3.6", [
+                    ("important", [X(True)]),
+                    ("parallel_tbb", [X(True)]),
+                    ("parallel_native", [X(True)]),
+                    ("pure_torch", [X(True)]),
+                ]),
+            ]),
+            # TODO: bring back libtorch test
+            ("7", [X("3.6")]),
+        ]),
+        ("clang", [
+            ("5", [
+                ("3.6", [
+                    ("asan", [
+                        (True, [
+                            ("shard_test", [XImportant(True)]),
+                        ]),
+                    ]),
+                ]),
+            ]),
+            ("7", [
+                ("3.6", [
+                    ("onnx", [XImportant(True)]),
+                ]),
+            ]),
+        ]),
+        ("cuda", [
+            ("9.2", [
+                ("3.6", [
+                    X(True),
+                    ("cuda_gcc_override", [
+                        ("gcc5.4", [
+                            ('build_only', [XImportant(True)]),
+                        ]),
+                    ]),
+                ])
+            ]),
+            ("10.1", [
+                ("3.6", [
+                    ('build_only', [X(True)]),
+                ]),
+            ]),
+            ("10.2", [
+                ("3.6", [
+                    ("shard_test", [XImportant(True)]),
+                    ("libtorch", [
+                        (True, [
+                            ('build_only', [X(True)]),
+                        ]),
+                    ]),
+                ]),
+            ]),
+            ("11.1", [
+                ("3.8", [
+                    X(True),
+                    ("libtorch", [
+                        (True, [
+                            ('build_only', [XImportant(True)]),
+                        ]),
+                    ]),
+                ]),
+            ]),
+        ]),
+    ]),
+    ("bionic", [
+        ("clang", [
+            ("9", [
+                XImportant("3.6"),
+            ]),
+            ("9", [
+                ("3.6", [
+                    ("xla", [XImportant(True)]),
+                    ("vulkan", [XImportant(True)]),
+                ]),
+            ]),
+        ]),
+        ("gcc", [
+            ("9", [
+                ("3.8", [
+                    ("coverage", [
+                        (True, [
+                            ("shard_test", [XImportant(True)]),
+                        ]),
+                    ]),
+                ]),
+            ]),
+        ]),
+        ("rocm", [
+            ("3.9", [
+                ("3.6", [
+                    ('build_only', [XImportant(True)]),
+                ]),
+            ]),
+        ]),
+    ]),
 ]


@ -53,8 +151,6 @@ class PyVerConfigNode(TreeConfigNode):
    def init2(self, node_name):
        self.props["pyver"] = node_name
        self.props["abbreviated_pyver"] = get_major_pyver(node_name)
-        if node_name == "3.9":
-            self.props["abbreviated_pyver"] = "py3.9"

    # noinspection PyMethodMayBeStatic
    def child_constructor(self):
@ -71,10 +167,8 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
        next_nodes = {
            "asan": AsanConfigNode,
            "xla": XlaConfigNode,
-            "mps": MPSConfigNode,
            "vulkan": VulkanConfigNode,
            "parallel_tbb": ParallelTBBConfigNode,
-            "crossref": CrossRefConfigNode,
            "parallel_native": ParallelNativeConfigNode,
            "onnx": ONNXConfigNode,
            "libtorch": LibTorchConfigNode,
@ -82,19 +176,12 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
            "build_only": BuildOnlyConfigNode,
            "shard_test": ShardTestConfigNode,
            "cuda_gcc_override": CudaGccOverrideConfigNode,
+            "coverage": CoverageConfigNode,
            "pure_torch": PureTorchConfigNode,
-            "slow_gradcheck": SlowGradcheckConfigNode,
        }
        return next_nodes[experimental_feature]


-class SlowGradcheckConfigNode(TreeConfigNode):
-    def init2(self, node_name):
-        self.props["is_slow_gradcheck"] = True
-
-    def child_constructor(self):
-        return ExperimentalFeatureConfigNode
-
 class PureTorchConfigNode(TreeConfigNode):
    def modify_label(self, label):
        return "PURE_TORCH=" + str(label)
@ -116,16 +203,6 @@ class XlaConfigNode(TreeConfigNode):
    def child_constructor(self):
        return ImportantConfigNode

-class MPSConfigNode(TreeConfigNode):
-    def modify_label(self, label):
-        return "MPS=" + str(label)
-
-    def init2(self, node_name):
-        self.props["is_mps"] = node_name
-
-    def child_constructor(self):
-        return ImportantConfigNode
-

 class AsanConfigNode(TreeConfigNode):
    def modify_label(self, label):
@ -171,14 +248,6 @@ class ParallelTBBConfigNode(TreeConfigNode):
        return ImportantConfigNode


-class CrossRefConfigNode(TreeConfigNode):
-    def init2(self, node_name):
-        self.props["is_crossref"] = node_name
-
-    def child_constructor(self):
-        return ImportantConfigNode
-
-
 class ParallelNativeConfigNode(TreeConfigNode):
    def modify_label(self, label):
        return "PARALLELNATIVE=" + str(label)
@ -225,6 +294,14 @@ class ShardTestConfigNode(TreeConfigNode):
        return ImportantConfigNode


+class CoverageConfigNode(TreeConfigNode):
+    def init2(self, node_name):
+        self.props["is_coverage"] = node_name
+
+    def child_constructor(self):
+        return ExperimentalFeatureConfigNode
+
+
 class ImportantConfigNode(TreeConfigNode):
    def modify_label(self, label):
        return "IMPORTANT=" + str(label)
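
The nested `(label, children)` tuples in `CONFIG_TREE_DATA` can be read as root-to-leaf paths, one per CI configuration. The sketch below walks a fragment copied from the `bionic` subtree. `X` and `XImportant` here are simplified stand-ins (an assumption; the real classes live in `cimodel.lib.conf_tree` and are not shown in this diff), and `leaf_paths` is a hypothetical helper, not the traversal the generator itself uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class X:
    """Stand-in leaf marker (assumed shape, not the real conf_tree.X)."""
    value: object


@dataclass(frozen=True)
class XImportant:
    """Stand-in leaf marker for configurations flagged as important."""
    value: object


tree = [
    ("bionic", [
        ("gcc", [
            ("9", [
                ("3.8", [
                    ("coverage", [
                        (True, [
                            ("shard_test", [XImportant(True)]),
                        ]),
                    ]),
                ]),
            ]),
        ]),
        ("rocm", [
            ("3.9", [
                ("3.6", [
                    ("build_only", [XImportant(True)]),
                ]),
            ]),
        ]),
    ]),
]


def leaf_paths(nodes, prefix=()):
    """Yield (labels..., leaf) for every leaf marker reachable from `nodes`."""
    for node in nodes:
        if isinstance(node, (X, XImportant)):
            yield prefix + (node,)
        else:
            label, children = node
            yield from leaf_paths(children, prefix + (label,))


paths = list(leaf_paths(tree))
for path in paths:
    print(path)
```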
--- a/.circleci/cimodel/data/pytorch_build_definitions.py
+++ b/.circleci/cimodel/data/pytorch_build_definitions.py
@ -31,7 +31,6 @@ class Conf:
    is_libtorch: bool = False
    is_important: bool = False
    parallel_backend: Optional[str] = None
-    build_only: bool = False

    @staticmethod
    def is_test_phase(phase):
@ -113,8 +112,6 @@ class Conf:
            parameters["resource_class"] = "xlarge"
        if hasattr(self, 'filters'):
            parameters['filters'] = self.filters
-        if self.build_only:
-            parameters['build_only'] = miniutils.quote(str(int(True)))
        return parameters

    def gen_workflow_job(self, phase):
@ -178,6 +175,35 @@ class DocPushConf(object):
            }
        }

+# TODO Convert these to graph nodes
+def gen_dependent_configs(xenial_parent_config):
+
+    extra_parms = [
+        (["multigpu"], "large"),
+        (["nogpu", "NO_AVX2"], None),
+        (["nogpu", "NO_AVX"], None),
+        (["slow"], "medium"),
+    ]
+
+    configs = []
+    for parms, gpu in extra_parms:
+
+        c = Conf(
+            xenial_parent_config.distro,
+            ["py3"] + parms,
+            pyver=xenial_parent_config.pyver,
+            cuda_version=xenial_parent_config.cuda_version,
+            restrict_phases=["test"],
+            gpu_resource=gpu,
+            parent_build=xenial_parent_config,
+            is_important=False,
+        )
+
+        configs.append(c)
+
+    return configs
+
+
 def gen_docs_configs(xenial_parent_config):
    configs = []

@ -185,7 +211,7 @@ def gen_docs_configs(xenial_parent_config):
        HiddenConf(
            "pytorch_python_doc_build",
            parent_build=xenial_parent_config,
-            filters=gen_filter_dict(branches_list=["master", "main", "nightly"],
+            filters=gen_filter_dict(branches_list=r"/.*/",
                                    tags_list=RC_PATTERN),
        )
    )
@ -201,7 +227,7 @@ def gen_docs_configs(xenial_parent_config):
        HiddenConf(
            "pytorch_cpp_doc_build",
            parent_build=xenial_parent_config,
-            filters=gen_filter_dict(branches_list=["master", "main", "nightly"],
+            filters=gen_filter_dict(branches_list=r"/.*/",
                                    tags_list=RC_PATTERN),
        )
    )
@ -212,6 +238,13 @@ def gen_docs_configs(xenial_parent_config):
            branch="master",
        )
    )
+
+    configs.append(
+        HiddenConf(
+            "pytorch_doc_test",
+            parent_build=xenial_parent_config
+        )
+    )
    return configs


@ -225,7 +258,7 @@ def gen_tree():
    return configs_list


-def instantiate_configs(only_slow_gradcheck):
+def instantiate_configs():

    config_list = []

@ -239,16 +272,12 @@ def instantiate_configs(only_slow_gradcheck):
        compiler_version = fc.find_prop("compiler_version")
        is_xla = fc.find_prop("is_xla") or False
        is_asan = fc.find_prop("is_asan") or False
-        is_crossref = fc.find_prop("is_crossref") or False
+        is_coverage = fc.find_prop("is_coverage") or False
        is_onnx = fc.find_prop("is_onnx") or False
        is_pure_torch = fc.find_prop("is_pure_torch") or False
        is_vulkan = fc.find_prop("is_vulkan") or False
-        is_slow_gradcheck = fc.find_prop("is_slow_gradcheck") or False
        parms_list_ignored_for_docker_image = []

-        if only_slow_gradcheck ^ is_slow_gradcheck:
-            continue
-
        python_version = None
        if compiler_name == "cuda" or compiler_name == "android":
            python_version = fc.find_prop("pyver")
@ -283,8 +312,9 @@ def instantiate_configs(only_slow_gradcheck):
            python_version = fc.find_prop("pyver")
            parms_list[0] = fc.find_prop("abbreviated_pyver")

-        if is_crossref:
-            parms_list_ignored_for_docker_image.append("crossref")
+        if is_coverage:
+            parms_list_ignored_for_docker_image.append("coverage")
+            python_version = fc.find_prop("pyver")

        if is_onnx:
            parms_list.append("onnx")
@ -308,10 +338,6 @@ def instantiate_configs(only_slow_gradcheck):
        if build_only or is_pure_torch:
            restrict_phases = ["build"]

-        if is_slow_gradcheck:
-            parms_list_ignored_for_docker_image.append("old")
-            parms_list_ignored_for_docker_image.append("gradcheck")
-
        gpu_resource = None
        if cuda_version and cuda_version != "10":
            gpu_resource = "medium"
@ -331,15 +357,15 @@ def instantiate_configs(only_slow_gradcheck):
            is_libtorch=is_libtorch,
            is_important=is_important,
            parallel_backend=parallel_backend,
-            build_only=build_only,
        )

-        # run docs builds on "pytorch-linux-xenial-py3.7-gcc5.4". Docs builds
+        # run docs builds on "pytorch-linux-xenial-py3.6-gcc5.4". Docs builds
        # should run on a CPU-only build that runs on all PRs.
-        # XXX should this be updated to a more modern build?
+        # XXX should this be updated to a more modern build? Projects are
+        #     beginning to drop python3.6
        if (
            distro_name == "xenial"
-            and fc.find_prop("pyver") == "3.7"
+            and fc.find_prop("pyver") == "3.6"
            and cuda_version is None
            and parallel_backend is None
            and not is_vulkan
@ -351,14 +377,36 @@ def instantiate_configs(only_slow_gradcheck):
                                        tags_list=RC_PATTERN)
            c.dependent_tests = gen_docs_configs(c)

+        if cuda_version == "10.2" and python_version == "3.6" and not is_libtorch:
+            c.dependent_tests = gen_dependent_configs(c)
+
+        if (
+            compiler_name == "gcc"
+            and compiler_version == "5.4"
+            and not is_libtorch
+            and not is_vulkan
+            and not is_pure_torch
+            and parallel_backend is None
+        ):
+            bc_breaking_check = Conf(
+                "backward-compatibility-check",
+                [],
+                is_xla=False,
+                restrict_phases=["test"],
+                is_libtorch=False,
+                is_important=True,
+                parent_build=c,
+            )
+            c.dependent_tests.append(bc_breaking_check)
+
        config_list.append(c)

    return config_list


-def get_workflow_jobs(only_slow_gradcheck=False):
+def get_workflow_jobs():

-    config_list = instantiate_configs(only_slow_gradcheck)
+    config_list = instantiate_configs()

    x = []
    for conf_options in config_list:
--- a/.circleci/cimodel/data/simple/android_definitions.py
+++ b/.circleci/cimodel/data/simple/android_definitions.py
@ -0,0 +1,105 @@
+import cimodel.data.simple.util.branch_filters as branch_filters
+from cimodel.data.simple.util.docker_constants import (
+    DOCKER_IMAGE_NDK, DOCKER_REQUIREMENT_NDK
+)
+
+
+class AndroidJob:
+    def __init__(self,
+                 variant,
+                 template_name,
+                 is_master_only=True):
+
+        self.variant = variant
+        self.template_name = template_name
+        self.is_master_only = is_master_only
+
+    def gen_tree(self):
+
+        base_name_parts = [
+            "pytorch",
+            "linux",
+            "xenial",
+            "py3",
+            "clang5",
+            "android",
+            "ndk",
+            "r19c",
+        ] + self.variant + [
+            "build",
+        ]
+
+        full_job_name = "_".join(base_name_parts)
+        build_env_name = "-".join(base_name_parts)
+
+        props_dict = {
+            "name": full_job_name,
+            "build_environment": "\"{}\"".format(build_env_name),
+            "docker_image": "\"{}\"".format(DOCKER_IMAGE_NDK),
+            "requires": [DOCKER_REQUIREMENT_NDK]
+        }
+
+        if self.is_master_only:
+            props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST)
+
+        return [{self.template_name: props_dict}]
+
+
+class AndroidGradleJob:
+    def __init__(self,
+                 job_name,
+                 template_name,
+                 dependencies,
+                 is_master_only=True,
+                 is_pr_only=False):
+
+        self.job_name = job_name
+        self.template_name = template_name
+        self.dependencies = dependencies
+        self.is_master_only = is_master_only
+        self.is_pr_only = is_pr_only
+
+    def gen_tree(self):
+
+        props_dict = {
+            "name": self.job_name,
+            "requires": self.dependencies,
+        }
+
+        if self.is_master_only:
+            props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST)
+        elif self.is_pr_only:
+            props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.PR_BRANCH_LIST)
+
+        return [{self.template_name: props_dict}]
+
+
+WORKFLOW_DATA = [
+    AndroidJob(["x86_32"], "pytorch_linux_build", is_master_only=False),
+    AndroidJob(["x86_64"], "pytorch_linux_build"),
+    AndroidJob(["arm", "v7a"], "pytorch_linux_build"),
+    AndroidJob(["arm", "v8a"], "pytorch_linux_build"),
+    AndroidGradleJob(
+        "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32",
+        "pytorch_android_gradle_build-x86_32",
+        ["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build"],
+        is_master_only=False,
+        is_pr_only=True),
+    AndroidGradleJob(
+        "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
+        "pytorch_android_gradle_custom_build_single",
+        [DOCKER_REQUIREMENT_NDK],
+        is_master_only=False,
+        is_pr_only=True),
+    AndroidGradleJob(
+        "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build",
+        "pytorch_android_gradle_build",
+        ["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build",
+         "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build",
+         "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build",
+         "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]),
+]
+
+
+def get_workflow_jobs():
+    return [item.gen_tree() for item in WORKFLOW_DATA]
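
The job names that `AndroidGradleJob` lists as dependencies are the same strings `AndroidJob.gen_tree` produces by joining its name parts. A minimal sketch of that naming step, with `android_names` as a hypothetical helper that isolates the two `join` calls:

```python
BASE_NAME_PARTS = [
    "pytorch", "linux", "xenial", "py3", "clang5", "android", "ndk", "r19c",
]


def android_names(variant):
    """Return (job name, build environment) for a variant such as ["arm", "v7a"]."""
    parts = BASE_NAME_PARTS + variant + ["build"]
    return "_".join(parts), "-".join(parts)


job_name, build_env = android_names(["x86_32"])
arm_job_name, _ = android_names(["arm", "v7a"])

print(job_name)
print(build_env)
print(arm_job_name)
```

Because the dependency lists are written out by hand, a variant renamed in one place but not the other would leave a Gradle job requiring a job that no longer exists.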
--- a/.circleci/cimodel/data/simple/bazel_definitions.py
+++ b/.circleci/cimodel/data/simple/bazel_definitions.py
@ -0,0 +1,69 @@
+from cimodel.data.simple.util.docker_constants import (
+    DOCKER_IMAGE_GCC7,
+    DOCKER_REQUIREMENT_GCC7
+)
+
+
+def gen_job_name(phase):
+    job_name_parts = [
+        "pytorch",
+        "bazel",
+        phase,
+    ]
+
+    return "_".join(job_name_parts)
+
+
+class BazelJob:
+    def __init__(self, phase, extra_props=None):
+        self.phase = phase
+        self.extra_props = extra_props or {}
+
+    def gen_tree(self):
+
+        template_parts = [
+            "pytorch",
+            "linux",
+            "bazel",
+            self.phase,
+        ]
+
+        build_env_parts = [
+            "pytorch",
+            "linux",
+            "xenial",
+            "py3.6",
+            "gcc7",
+            "bazel",
+            self.phase,
+        ]
+
+        full_job_name = gen_job_name(self.phase)
+        build_env_name = "-".join(build_env_parts)
+
+        extra_requires = (
+            [gen_job_name("build")] if self.phase == "test" else
+            [DOCKER_REQUIREMENT_GCC7]
+        )
+
+        props_dict = {
+            "build_environment": build_env_name,
+            "docker_image": DOCKER_IMAGE_GCC7,
+            "name": full_job_name,
+            "requires": extra_requires,
+        }
+
+        props_dict.update(self.extra_props)
+
+        template_name = "_".join(template_parts)
+        return [{template_name: props_dict}]
+
+
+WORKFLOW_DATA = [
+    BazelJob("build", {"resource_class": "large"}),
+    BazelJob("test"),
+]
+
+
+def get_workflow_jobs():
+    return [item.gen_tree() for item in WORKFLOW_DATA]
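
The dependency rule in `BazelJob.gen_tree` is easy to miss: the test phase requires the Bazel build job, while the build phase requires the Docker image job. The sketch below isolates that rule. The value of `DOCKER_REQUIREMENT_GCC7` is a placeholder (an assumption, since `docker_constants.py` is not part of this excerpt), and `requires_for` is a hypothetical helper.

```python
# Placeholder value; the real constant is defined in docker_constants.py.
DOCKER_REQUIREMENT_GCC7 = "docker-image-placeholder"


def gen_job_name(phase):
    return "_".join(["pytorch", "bazel", phase])


def requires_for(phase):
    """Mirror the `extra_requires` expression from BazelJob.gen_tree."""
    if phase == "test":
        return [gen_job_name("build")]
    return [DOCKER_REQUIREMENT_GCC7]


build_requires = requires_for("build")
test_requires = requires_for("test")

print(gen_job_name("build"), build_requires)
print(gen_job_name("test"), test_requires)
```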
--- a/.circleci/cimodel/data/simple/binary_smoketest.py
+++ b/.circleci/cimodel/data/simple/binary_smoketest.py
@ -0,0 +1,193 @@
+"""
+TODO: Refactor circleci/cimodel/data/binary_build_data.py to generate this file
+       instead of doing one offs here
+ Binary builds (subset, to smoke test that they'll work)
+
+ NB: If you modify this file, you need to also modify
+ the binary_and_smoke_tests_on_pr variable in
+ pytorch-ci-hud to adjust the allowed build list
+ at https://github.com/ezyang/pytorch-ci-hud/blob/master/src/BuildHistoryDisplay.js
+
+ Note:
+ This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
+ - binary_linux_conda_3_6_cu90_devtoolset7_build
+ - binary_linux_conda_3_6_cu90_devtoolset7_test
+
+ TODO
+ we should test a libtorch cuda build, but they take too long
+ - binary_linux_libtorch_3_6m_cu90_devtoolset7_static-without-deps_build
+"""
+
+import cimodel.lib.miniutils as miniutils
+import cimodel.data.simple.util.branch_filters
+
+
+class SmoketestJob:
+    def __init__(self,
+                 template_name,
+                 build_env_parts,
+                 docker_image,
+                 job_name,
+                 is_master_only=False,
+                 requires=None,
+                 has_libtorch_variant=False,
+                 extra_props=None):
+
+        self.template_name = template_name
+        self.build_env_parts = build_env_parts
+        self.docker_image = docker_image
+        self.job_name = job_name
+        self.is_master_only = is_master_only
+        self.requires = requires or []
+        self.has_libtorch_variant = has_libtorch_variant
+        self.extra_props = extra_props or {}
+
+    def gen_tree(self):
+
+        props_dict = {
+            "build_environment": " ".join(self.build_env_parts),
+            "name": self.job_name,
+            "requires": self.requires,
+        }
+
+        if self.docker_image:
+            props_dict["docker_image"] = self.docker_image
+
+        if self.is_master_only:
+            props_dict["filters"] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
+
+        if self.has_libtorch_variant:
+            props_dict["libtorch_variant"] = "shared-with-deps"
+
+        props_dict.update(self.extra_props)
+
+        return [{self.template_name: props_dict}]
+
+
+WORKFLOW_DATA = [
+    SmoketestJob(
+        "binary_linux_build",
+        ["manywheel", "3.7m", "cu102", "devtoolset7"],
+        "pytorch/manylinux-cuda102",
+        "binary_linux_manywheel_3_7m_cu102_devtoolset7_build",
+        is_master_only=True,
+    ),
+    SmoketestJob(
+        "binary_linux_build",
+        ["libtorch", "3.7m", "cpu", "devtoolset7"],
+        "pytorch/manylinux-cuda102",
+        "binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build",
+        is_master_only=False,
+        has_libtorch_variant=True,
+    ),
+    SmoketestJob(
+        "binary_linux_build",
+        ["libtorch", "3.7m", "cpu", "gcc5.4_cxx11-abi"],
+        "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest",
+        "binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build",
+        is_master_only=False,
+        has_libtorch_variant=True,
+    ),
+    SmoketestJob(
+        "binary_mac_build",
+        ["wheel", "3.7", "cpu"],
+        None,
+        "binary_macos_wheel_3_7_cpu_build",
+        is_master_only=True,
+    ),
+    # This job has an average run time of 3 hours o.O
+    # Now only running this on master to reduce overhead
+    SmoketestJob(
+        "binary_mac_build",
+        ["libtorch", "3.7", "cpu"],
+        None,
+        "binary_macos_libtorch_3_7_cpu_build",
+        is_master_only=True,
+    ),
+    SmoketestJob(
+        "binary_windows_build",
+        ["libtorch", "3.7", "cpu", "debug"],
+        None,
+        "binary_windows_libtorch_3_7_cpu_debug_build",
+        is_master_only=False,
+    ),
+    SmoketestJob(
+        "binary_windows_build",
+        ["libtorch", "3.7", "cpu", "release"],
+        None,
+        "binary_windows_libtorch_3_7_cpu_release_build",
+        is_master_only=False,
+    ),
+    SmoketestJob(
+        "binary_windows_build",
+        ["wheel", "3.7", "cu102"],
+        None,
+        "binary_windows_wheel_3_7_cu102_build",
+        is_master_only=True,
+    ),
+
+    SmoketestJob(
+        "binary_windows_test",
+        ["libtorch", "3.7", "cpu", "debug"],
+        None,
+        "binary_windows_libtorch_3_7_cpu_debug_test",
+        is_master_only=False,
+        requires=["binary_windows_libtorch_3_7_cpu_debug_build"],
+    ),
+    SmoketestJob(
+        "binary_windows_test",
+        ["libtorch", "3.7", "cpu", "release"],
+        None,
+        "binary_windows_libtorch_3_7_cpu_release_test",
+        is_master_only=False,
+        requires=["binary_windows_libtorch_3_7_cpu_release_build"],
+    ),
+    SmoketestJob(
+        "binary_windows_test",
+        ["wheel", "3.7", "cu102"],
+        None,
+        "binary_windows_wheel_3_7_cu102_test",
+        is_master_only=True,
+        requires=["binary_windows_wheel_3_7_cu102_build"],
+        extra_props={
+            "executor": "windows-with-nvidia-gpu",
+        },
+    ),
+
+
+
+    SmoketestJob(
+        "binary_linux_test",
+        ["manywheel", "3.7m", "cu102", "devtoolset7"],
+        "pytorch/manylinux-cuda102",
+        "binary_linux_manywheel_3_7m_cu102_devtoolset7_test",
+        is_master_only=True,
+        requires=["binary_linux_manywheel_3_7m_cu102_devtoolset7_build"],
+        extra_props={
+            "resource_class": "gpu.medium",
+            "use_cuda_docker_runtime": miniutils.quote((str(1))),
+        },
+    ),
+    SmoketestJob(
+        "binary_linux_test",
+        ["libtorch", "3.7m", "cpu", "devtoolset7"],
+        "pytorch/manylinux-cuda102",
+        "binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_test",
+        is_master_only=False,
+        requires=["binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build"],
+        has_libtorch_variant=True,
+    ),
+    SmoketestJob(
+        "binary_linux_test",
+        ["libtorch", "3.7m", "cpu", "gcc5.4_cxx11-abi"],
+        "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest",
+        "binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test",
+        is_master_only=False,
+        requires=["binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build"],
+        has_libtorch_variant=True,
+    ),
+]
+
+
+def get_workflow_jobs():
+    return [item.gen_tree() for item in WORKFLOW_DATA]
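
Since every `requires` entry in `WORKFLOW_DATA` is a hand-written job name, a small consistency check can catch a test job that points at a build job missing from the list. The sketch below is one possible check, not something this change adds: `missing_requirements` is a hypothetical helper, and the `(job_name, requires)` pairs are copied from three of the `SmoketestJob` entries above, with the wheel build job left out on purpose to show a failure.

```python
jobs = [
    ("binary_windows_libtorch_3_7_cpu_debug_build", []),
    ("binary_windows_libtorch_3_7_cpu_debug_test",
     ["binary_windows_libtorch_3_7_cpu_debug_build"]),
    # The matching build job is deliberately omitted from this sample.
    ("binary_windows_wheel_3_7_cu102_test",
     ["binary_windows_wheel_3_7_cu102_build"]),
]


def missing_requirements(jobs):
    """Return (job, requirement) pairs whose requirement is not a defined job."""
    defined = {name for name, _ in jobs}
    return [
        (name, req)
        for name, requires in jobs
        for req in requires
        if req not in defined
    ]


missing = missing_requirements(jobs)

# Adding the omitted build job resolves the dangling requirement.
fixed = missing_requirements(jobs + [("binary_windows_wheel_3_7_cu102_build", [])])

# gen_tree joins build_env_parts with a space, not "-" or "_".
build_environment = " ".join(["libtorch", "3.7", "cpu", "debug"])

print(missing)
print(fixed)
print(build_environment)
```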
--- a/.circleci/cimodel/data/simple/docker_definitions.py
+++ b/.circleci/cimodel/data/simple/docker_definitions.py
@ -4,29 +4,45 @@ from cimodel.lib.miniutils import quote
 from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN


-# NOTE: All hardcoded docker image builds have been migrated to GHA
+# TODO: make this generated from a matrix rather than just a static list
 IMAGE_NAMES = [
+    "pytorch-linux-bionic-cuda11.1-cudnn8-py3.6-gcc9",
+    "pytorch-linux-bionic-cuda11.1-cudnn8-py3.8-gcc9",
+    "pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9",
+    "pytorch-linux-bionic-cuda11.0-cudnn8-py3.8-gcc9",
+    "pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9",
+    "pytorch-linux-bionic-py3.6-clang9",
+    "pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9",
+    "pytorch-linux-bionic-py3.8-gcc9",
+    "pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7",
+    "pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7",
+    "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
+    "pytorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7",
+    "pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
+    "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4",
+    "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7",
+    "pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
+    "pytorch-linux-xenial-py3-clang5-asan",
+    "pytorch-linux-xenial-py3-clang7-onnx",
+    "pytorch-linux-xenial-py3.8",
+    "pytorch-linux-xenial-py3.6-clang7",
+    "pytorch-linux-xenial-py3.6-gcc5.4",  # this one is used in doc builds
+    "pytorch-linux-xenial-py3.6-gcc7.2",
+    "pytorch-linux-xenial-py3.6-gcc7",
+    "pytorch-linux-bionic-rocm3.9-py3.6",
+    "pytorch-linux-bionic-rocm3.10-py3.6",
 ]

-# This entry should be an element from the list above
-# This should contain the image matching the "slow_gradcheck" entry in
-# pytorch_build_data.py
-SLOW_GRADCHECK_IMAGE_NAME = "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"

-def get_workflow_jobs(images=IMAGE_NAMES, only_slow_gradcheck=False):
+def get_workflow_jobs():
    """Generates a list of docker image build definitions"""
    ret = []
-    for image_name in images:
-        if image_name.startswith('docker-'):
-            image_name = image_name.lstrip('docker-')
-        if only_slow_gradcheck and image_name is not SLOW_GRADCHECK_IMAGE_NAME:
-            continue
-
+    for image_name in IMAGE_NAMES:
        parameters = OrderedDict({
            "name": quote(f"docker-{image_name}"),
            "image_name": quote(image_name),
        })
-        if image_name == "pytorch-linux-xenial-py3.7-gcc5.4":
+        if image_name == "pytorch-linux-xenial-py3.6-gcc5.4":
            # pushing documentation on tags requires CircleCI to also
            # build all the dependencies on tags, including this docker image
            parameters['filters'] = gen_filter_dict(branches_list=r"/.*/",
--- a/.circleci/cimodel/data/simple/ge_config_tests.py
+++ b/.circleci/cimodel/data/simple/ge_config_tests.py
@ -0,0 +1,78 @@
+import cimodel.lib.miniutils as miniutils
+from cimodel.data.simple.util.versions import MultiPartVersion, CudaVersion
+from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_BASIC, DOCKER_IMAGE_CUDA_10_2
+
+
+class GeConfigTestJob:
+    def __init__(self,
+                 py_version,
+                 gcc_version,
+                 cuda_version,
+                 variant_parts,
+                 extra_requires,
+                 use_cuda_docker=False,
+                 build_env_override=None):
+
+        self.py_version = py_version
+        self.gcc_version = gcc_version
+        self.cuda_version = cuda_version
+        self.variant_parts = variant_parts
+        self.extra_requires = extra_requires
+        self.use_cuda_docker = use_cuda_docker
+        self.build_env_override = build_env_override
+
+    def get_all_parts(self, with_dots):
+
+        maybe_py_version = self.py_version.render_dots_or_parts(with_dots) if self.py_version else []
+        maybe_gcc_version = self.gcc_version.render_dots_or_parts(with_dots) if self.gcc_version else []
+        maybe_cuda_version = self.cuda_version.render_dots_or_parts(with_dots) if self.cuda_version else []
+
+        common_parts = [
+            "pytorch",
+            "linux",
+            "xenial",
+        ] + maybe_cuda_version + maybe_py_version + maybe_gcc_version
+
+        return common_parts + self.variant_parts
+
+    def gen_tree(self):
+
+        resource_class = "gpu.medium" if self.use_cuda_docker else "large"
+        docker_image = DOCKER_IMAGE_CUDA_10_2 if self.use_cuda_docker else DOCKER_IMAGE_BASIC
+        full_name = "_".join(self.get_all_parts(False))
+        build_env = self.build_env_override or "-".join(self.get_all_parts(True))
+
+        props_dict = {
+            "name": full_name,
+            "build_environment": build_env,
+            "requires": self.extra_requires,
+            "resource_class": resource_class,
+            "docker_image": docker_image,
+        }
+
+        if self.use_cuda_docker:
+            props_dict["use_cuda_docker_runtime"] = miniutils.quote(str(1))
+
+        return [{"pytorch_linux_test": props_dict}]
+
+
+WORKFLOW_DATA = [
+    GeConfigTestJob(
+        MultiPartVersion([3, 6], "py"),
+        MultiPartVersion([5, 4], "gcc"),
+        None,
+        ["jit_legacy", "test"],
+        ["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
+    GeConfigTestJob(
+        None,
+        None,
+        CudaVersion(10, 2),
+        ["cudnn7", "py3", "jit_legacy", "test"],
+        ["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
+        use_cuda_docker=True,
+    ),
+]
+
+
+def get_workflow_jobs():
+    return [item.gen_tree() for item in WORKFLOW_DATA]
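Review note: a minimal sketch of how `GeConfigTestJob.get_all_parts` assembles the job name and build environment for the first `WORKFLOW_DATA` entry. `MultiPartVersion` is not shown in this diff, so `FakeVersion` below is a hypothetical stand-in, and its rendering rules (dotted form versus separate parts) are an assumption chosen to agree with the `pytorch_linux_xenial_py3_6_gcc5_4_build` requirement named above.

```python
class FakeVersion:
    """Hypothetical stand-in for MultiPartVersion (real class not in this diff)."""

    def __init__(self, parts, prefix):
        self.parts = parts
        self.prefix = prefix

    def render_dots_or_parts(self, with_dots):
        strs = [str(p) for p in self.parts]
        if with_dots:
            # Assumed dotted form, e.g. ["py3.6"]
            return [self.prefix + ".".join(strs)]
        # Assumed split form, e.g. ["py3", "6"]
        return [self.prefix + strs[0]] + strs[1:]


def get_all_parts(py_version, gcc_version, cuda_version, variant_parts, with_dots):
    # Same ordering as in the diff: cuda, then python, then gcc, then variant
    maybe_py = py_version.render_dots_or_parts(with_dots) if py_version else []
    maybe_gcc = gcc_version.render_dots_or_parts(with_dots) if gcc_version else []
    maybe_cuda = cuda_version.render_dots_or_parts(with_dots) if cuda_version else []
    common = ["pytorch", "linux", "xenial"] + maybe_cuda + maybe_py + maybe_gcc
    return common + variant_parts


py = FakeVersion([3, 6], "py")
gcc = FakeVersion([5, 4], "gcc")
variant = ["jit_legacy", "test"]

full_name = "_".join(get_all_parts(py, gcc, None, variant, False))
build_env = "-".join(get_all_parts(py, gcc, None, variant, True))
print(full_name)
print(build_env)
```

Under these assumptions the job name joins the split parts with underscores while the build environment joins the dotted parts with hyphens; the real output depends on the actual `MultiPartVersion` implementation.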
--- a/.circleci/cimodel/data/simple/ios_definitions.py
+++ b/.circleci/cimodel/data/simple/ios_definitions.py
@ -1,7 +1,7 @@
 from cimodel.data.simple.util.versions import MultiPartVersion
 import cimodel.lib.miniutils as miniutils

-XCODE_VERSION = MultiPartVersion([12, 5, 1])
+XCODE_VERSION = MultiPartVersion([12, 0, 0])


 class ArchVariant:
@ -61,26 +61,10 @@ class IOSJob:


 WORKFLOW_DATA = [
-    IOSJob(XCODE_VERSION, ArchVariant("x86_64"), is_org_member_context=False, extra_props={
-        "lite_interpreter": miniutils.quote(str(int(True)))}),
-    IOSJob(XCODE_VERSION, ArchVariant("x86_64", "full_jit"), is_org_member_context=False, extra_props={
-        "lite_interpreter": miniutils.quote(str(int(False)))}),
-    IOSJob(XCODE_VERSION, ArchVariant("arm64"), extra_props={
-        "lite_interpreter": miniutils.quote(str(int(True)))}),
-    IOSJob(XCODE_VERSION, ArchVariant("arm64", "metal"), extra_props={
-        "use_metal": miniutils.quote(str(int(True))),
-        "lite_interpreter": miniutils.quote(str(int(True)))}),
-    IOSJob(XCODE_VERSION, ArchVariant("arm64", "full_jit"), extra_props={
-        "lite_interpreter": miniutils.quote(str(int(False)))}),
-    IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom"), extra_props={
-        "op_list": "mobilenetv2.yaml",
-        "lite_interpreter": miniutils.quote(str(int(True)))}),
-    IOSJob(XCODE_VERSION, ArchVariant("x86_64", "coreml"), is_org_member_context=False, extra_props={
-        "use_coreml": miniutils.quote(str(int(True))),
-        "lite_interpreter": miniutils.quote(str(int(True)))}),
-    IOSJob(XCODE_VERSION, ArchVariant("arm64", "coreml"), extra_props={
-        "use_coreml": miniutils.quote(str(int(True))),
-        "lite_interpreter": miniutils.quote(str(int(True)))}),
+    IOSJob(XCODE_VERSION, ArchVariant("x86_64"), is_org_member_context=False),
+    IOSJob(XCODE_VERSION, ArchVariant("arm64")),
+    IOSJob(XCODE_VERSION, ArchVariant("arm64", "metal"), extra_props={"use_metal": miniutils.quote(str(int(True)))}),
+    IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom"), extra_props={"op_list": "mobilenetv2.yaml"}),
 ]


--- a/.circleci/cimodel/data/simple/macos_definitions.py
+++ b/.circleci/cimodel/data/simple/macos_definitions.py
@ -1,22 +1,14 @@
 class MacOsJob:
-    def __init__(self, os_version, is_build=False, is_test=False, extra_props=tuple()):
-        # extra_props is tuple type, because mutable data structures for argument defaults
-        # is not recommended.
+    def __init__(self, os_version, is_test=False):
        self.os_version = os_version
-        self.is_build = is_build
        self.is_test = is_test
-        self.extra_props = dict(extra_props)

    def gen_tree(self):
        non_phase_parts = ["pytorch", "macos", self.os_version, "py3"]

-        extra_name_list = [name for name, exist in self.extra_props.items() if exist]
-        full_job_name_list = non_phase_parts + extra_name_list + [
-            'build' if self.is_build else None,
-            'test' if self.is_test else None,
-        ]
+        phase_name = "test" if self.is_test else "build"

-        full_job_name = "_".join(list(filter(None, full_job_name_list)))
+        full_job_name = "_".join(non_phase_parts + [phase_name])

        test_build_dependency = "_".join(non_phase_parts + ["build"])
        extra_dependencies = [test_build_dependency] if self.is_test else []
@ -29,23 +21,7 @@ class MacOsJob:
        return [{full_job_name: props_dict}]


-WORKFLOW_DATA = [
-    MacOsJob("10_15", is_build=True),
-    MacOsJob("10_13", is_build=True),
-    MacOsJob(
-        "10_13",
-        is_build=False,
-        is_test=True,
-    ),
-    MacOsJob(
-        "10_13",
-        is_build=True,
-        is_test=True,
-        extra_props=tuple({
-            "lite_interpreter": True
-        }.items()),
-    )
-]
+WORKFLOW_DATA = [MacOsJob("10_13"), MacOsJob("10_13", True)]


 def get_workflow_jobs():
--- a/.circleci/cimodel/data/simple/mobile_definitions.py
+++ b/.circleci/cimodel/data/simple/mobile_definitions.py
@ -4,6 +4,12 @@ PyTorch Mobile PR builds (use linux host toolchain + mobile build options)

 import cimodel.lib.miniutils as miniutils
 import cimodel.data.simple.util.branch_filters
+from cimodel.data.simple.util.docker_constants import (
+    DOCKER_IMAGE_ASAN,
+    DOCKER_REQUIREMENT_ASAN,
+    DOCKER_IMAGE_NDK,
+    DOCKER_REQUIREMENT_NDK
+)


 class MobileJob:
@ -46,6 +52,27 @@ class MobileJob:


 WORKFLOW_DATA = [
+    MobileJob(
+        DOCKER_IMAGE_ASAN,
+        [DOCKER_REQUIREMENT_ASAN],
+        ["build"]
+    ),
+
+    # Use LLVM-DEV toolchain in android-ndk-r19c docker image
+    MobileJob(
+        DOCKER_IMAGE_NDK,
+        [DOCKER_REQUIREMENT_NDK],
+        ["custom", "build", "dynamic"]
+    ),
+
+    # Use LLVM-DEV toolchain in android-ndk-r19c docker image
+    # Most of this CI is already covered by "mobile-custom-build-dynamic" job
+    MobileJob(
+        DOCKER_IMAGE_NDK,
+        [DOCKER_REQUIREMENT_NDK],
+        ["code", "analysis"],
+        True
+    ),
 ]


--- a/.circleci/cimodel/data/simple/nightly_android.py
+++ b/.circleci/cimodel/data/simple/nightly_android.py
@ -0,0 +1,77 @@
+from cimodel.data.simple.util.docker_constants import (
+    DOCKER_IMAGE_NDK,
+    DOCKER_REQUIREMENT_NDK
+)
+
+
+class AndroidNightlyJob:
+    def __init__(self,
+                 variant,
+                 template_name,
+                 extra_props=None,
+                 with_docker=True,
+                 requires=None,
+                 no_build_suffix=False):
+
+        self.variant = variant
+        self.template_name = template_name
+        self.extra_props = extra_props or {}
+        self.with_docker = with_docker
+        self.requires = requires
+        self.no_build_suffix = no_build_suffix
+
+    def gen_tree(self):
+
+        base_name_parts = [
+            "pytorch",
+            "linux",
+            "xenial",
+            "py3",
+            "clang5",
+            "android",
+            "ndk",
+            "r19c",
+        ] + self.variant
+
+        build_suffix = [] if self.no_build_suffix else ["build"]
+        full_job_name = "_".join(["nightly"] + base_name_parts + build_suffix)
+        build_env_name = "-".join(base_name_parts)
+
+        props_dict = {
+            "name": full_job_name,
+            "requires": self.requires,
+            "filters": {"branches": {"only": "nightly"}},
+        }
+
+        props_dict.update(self.extra_props)
+
+        if self.with_docker:
+            props_dict["docker_image"] = DOCKER_IMAGE_NDK
+            props_dict["build_environment"] = build_env_name
+
+        return [{self.template_name: props_dict}]
+
+BASE_REQUIRES = [DOCKER_REQUIREMENT_NDK]
+
+WORKFLOW_DATA = [
+    AndroidNightlyJob(["x86_32"], "pytorch_linux_build", requires=BASE_REQUIRES),
+    AndroidNightlyJob(["x86_64"], "pytorch_linux_build", requires=BASE_REQUIRES),
+    AndroidNightlyJob(["arm", "v7a"], "pytorch_linux_build", requires=BASE_REQUIRES),
+    AndroidNightlyJob(["arm", "v8a"], "pytorch_linux_build", requires=BASE_REQUIRES),
+    AndroidNightlyJob(["android_gradle"], "pytorch_android_gradle_build",
+                      with_docker=False,
+                      requires=[
+                          "nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build",
+                          "nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build",
+                          "nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build",
+                          "nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]),
+    AndroidNightlyJob(["x86_32_android_publish_snapshot"], "pytorch_android_publish_snapshot",
+                      extra_props={"context": "org-member"},
+                      with_docker=False,
+                      requires=["nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_android_gradle_build"],
+                      no_build_suffix=True),
+]
+
+
+def get_workflow_jobs():
+    return [item.gen_tree() for item in WORKFLOW_DATA]
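Review note: the hard-coded `requires` strings in `WORKFLOW_DATA` must match the names that `AndroidNightlyJob.gen_tree` generates. The sketch below reproduces only the naming logic from the diff so that correspondence can be checked; `nightly_job_name` and `build_env_name` are hypothetical helper names introduced for illustration.

```python
BASE_NAME_PARTS = [
    "pytorch", "linux", "xenial", "py3",
    "clang5", "android", "ndk", "r19c",
]


def nightly_job_name(variant, no_build_suffix=False):
    """Mirror of the full_job_name computation in gen_tree."""
    build_suffix = [] if no_build_suffix else ["build"]
    return "_".join(["nightly"] + BASE_NAME_PARTS + variant + build_suffix)


def build_env_name(variant):
    """Mirror of the build_env_name computation in gen_tree."""
    return "-".join(BASE_NAME_PARTS + variant)


# Names the gradle job lists in its `requires`
arch_jobs = [
    nightly_job_name(v)
    for v in (["x86_32"], ["x86_64"], ["arm", "v7a"], ["arm", "v8a"])
]
gradle_job = nightly_job_name(["android_gradle"])
publish_job = nightly_job_name(["x86_32_android_publish_snapshot"], no_build_suffix=True)

for name in arch_jobs + [gradle_job, publish_job]:
    print(name)
```

Deriving the `requires` lists from a helper like this, instead of spelling the strings out, would keep them in sync if the base name parts ever change; that is a suggestion, not something this diff does.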
--- a/.circleci/cimodel/data/simple/nightly_ios.py
+++ b/.circleci/cimodel/data/simple/nightly_ios.py
@ -1,15 +1,12 @@
 import cimodel.data.simple.ios_definitions as ios_definitions
-import cimodel.lib.miniutils as miniutils


 class IOSNightlyJob:
    def __init__(self,
                 variant,
-                 is_full_jit=False,
                 is_upload=False):

        self.variant = variant
-        self.is_full_jit = is_full_jit
        self.is_upload = is_upload

    def get_phase_name(self):
@ -19,11 +16,8 @@ class IOSNightlyJob:

        extra_name_suffix = [self.get_phase_name()] if self.is_upload else []

-        extra_name = ["full_jit"] if self.is_full_jit else []
-
        common_name_pieces = [
            "ios",
-        ] + extra_name + [
        ] + ios_definitions.XCODE_VERSION.render_dots_or_parts(with_version_dots) + [
            "nightly",
            self.variant,
@ -36,8 +30,7 @@ class IOSNightlyJob:
        return "_".join(["pytorch"] + self.get_common_name_pieces(False))

    def gen_tree(self):
-        build_configs = BUILD_CONFIGS_FULL_JIT if self.is_full_jit else BUILD_CONFIGS
-        extra_requires = [x.gen_job_name() for x in build_configs] if self.is_upload else []
+        extra_requires = [x.gen_job_name() for x in BUILD_CONFIGS] if self.is_upload else []

        props_dict = {
            "build_environment": "-".join(["libtorch"] + self.get_common_name_pieces(True)),
@ -50,11 +43,6 @@ class IOSNightlyJob:
            props_dict["ios_arch"] = self.variant
            props_dict["ios_platform"] = ios_definitions.get_platform(self.variant)
            props_dict["name"] = self.gen_job_name()
-            props_dict["use_metal"] = miniutils.quote(str(int(True)))
-            props_dict["use_coreml"] = miniutils.quote(str(int(True)))
-
-        if self.is_full_jit:
-            props_dict["lite_interpreter"] = miniutils.quote(str(int(False)))

        template_name = "_".join([
            "binary",
@ -70,14 +58,9 @@ BUILD_CONFIGS = [
    IOSNightlyJob("arm64"),
 ]

-BUILD_CONFIGS_FULL_JIT = [
-    IOSNightlyJob("x86_64", is_full_jit=True),
-    IOSNightlyJob("arm64", is_full_jit=True),
-]

-WORKFLOW_DATA = BUILD_CONFIGS + BUILD_CONFIGS_FULL_JIT + [
-    IOSNightlyJob("binary", is_full_jit=False, is_upload=True),
-    IOSNightlyJob("binary", is_full_jit=True, is_upload=True),
+WORKFLOW_DATA = BUILD_CONFIGS + [
+    IOSNightlyJob("binary", is_upload=True),
 ]


--- a/.circleci/cimodel/data/simple/util/branch_filters.py
+++ b/.circleci/cimodel/data/simple/util/branch_filters.py
@ -1,5 +1,4 @@
 NON_PR_BRANCH_LIST = [
-    "main",
    "master",
    r"/ci-all\/.*/",
    r"/release\/.*/",
--- a/.circleci/cimodel/data/simple/util/docker_constants.py
+++ b/.circleci/cimodel/data/simple/util/docker_constants.py
@ -11,7 +11,7 @@ def gen_docker_image_requires(image_name):


 DOCKER_IMAGE_BASIC, DOCKER_REQUIREMENT_BASE = gen_docker_image(
-    "pytorch-linux-xenial-py3.7-gcc5.4"
+    "pytorch-linux-xenial-py3.6-gcc5.4"
 )

 DOCKER_IMAGE_CUDA_10_2, DOCKER_REQUIREMENT_CUDA_10_2 = gen_docker_image(
@ -19,7 +19,7 @@ DOCKER_IMAGE_CUDA_10_2, DOCKER_REQUIREMENT_CUDA_10_2 = gen_docker_image(
 )

 DOCKER_IMAGE_GCC7, DOCKER_REQUIREMENT_GCC7 = gen_docker_image(
-    "pytorch-linux-xenial-py3.7-gcc7"
+    "pytorch-linux-xenial-py3.6-gcc7"
 )


--- a/.circleci/cimodel/data/windows_build_definitions.py
+++ b/.circleci/cimodel/data/windows_build_definitions.py
@ -0,0 +1,148 @@
+import cimodel.data.simple.util.branch_filters
+import cimodel.lib.miniutils as miniutils
+from cimodel.data.simple.util.versions import CudaVersion
+
+
+class WindowsJob:
+    def __init__(
+        self,
+        test_index,
+        vscode_spec,
+        cuda_version,
+        force_on_cpu=False,
+        master_only_pred=lambda job: job.vscode_spec.year != 2019,
+    ):
+        self.test_index = test_index
+        self.vscode_spec = vscode_spec
+        self.cuda_version = cuda_version
+        self.force_on_cpu = force_on_cpu
+        self.master_only_pred = master_only_pred
+
+    def gen_tree(self):
+
+        base_phase = "build" if self.test_index is None else "test"
+        numbered_phase = (
+            base_phase if self.test_index is None else base_phase + str(self.test_index)
+        )
+
+        key_name = "_".join(["pytorch", "windows", base_phase])
+
+        cpu_forcing_name_parts = ["on", "cpu"] if self.force_on_cpu else []
+
+        target_arch = self.cuda_version.render_dots() if self.cuda_version else "cpu"
+
+        base_name_parts = [
+            "pytorch",
+            "windows",
+            self.vscode_spec.render(),
+            "py36",
+            target_arch,
+        ]
+
+        prerequisite_jobs = []
+        if base_phase == "test":
+            prerequisite_jobs.append("_".join(base_name_parts + ["build"]))
+
+        if self.cuda_version:
+            self.cudnn_version = 8 if self.cuda_version.major == 11 else 7
+
+        arch_env_elements = (
+            ["cuda" + str(self.cuda_version.major), "cudnn" + str(self.cudnn_version)]
+            if self.cuda_version
+            else ["cpu"]
+        )
+
+        build_environment_string = "-".join(
+            ["pytorch", "win"]
+            + self.vscode_spec.get_elements()
+            + arch_env_elements
+            + ["py3"]
+        )
+
+        is_running_on_cuda = bool(self.cuda_version) and not self.force_on_cpu
+
+        props_dict = {
+            "build_environment": build_environment_string,
+            "python_version": miniutils.quote("3.6"),
+            "vc_version": miniutils.quote(self.vscode_spec.dotted_version()),
+            "vc_year": miniutils.quote(str(self.vscode_spec.year)),
+            "vc_product": self.vscode_spec.get_product(),
+            "use_cuda": miniutils.quote(str(int(is_running_on_cuda))),
+            "requires": prerequisite_jobs,
+        }
+
+        if self.master_only_pred(self):
+            props_dict[
+                "filters"
+            ] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
+
+        name_parts = base_name_parts + cpu_forcing_name_parts + [numbered_phase]
+
+        if base_phase == "test":
+            test_name = "-".join(["pytorch", "windows", numbered_phase])
+            props_dict["test_name"] = test_name
+
+            if is_running_on_cuda:
+                props_dict["executor"] = "windows-with-nvidia-gpu"
+
+        props_dict["cuda_version"] = (
+            miniutils.quote(str(self.cuda_version))
+            if self.cuda_version
+            else "cpu"
+        )
+
+        props_dict["name"] = "_".join(name_parts)
+
+        return [{key_name: props_dict}]
+
+
+class VcSpec:
+    def __init__(self, year, version_elements=None, hide_version=False):
+        self.year = year
+        self.version_elements = version_elements or []
+        self.hide_version = hide_version
+
+    def get_elements(self):
+        if self.hide_version:
+            return [self.prefixed_year()]
+        return [self.prefixed_year()] + self.version_elements
+
+    def get_product(self):
+        return "Community" if self.year == 2019 else "BuildTools"
+
+    def dotted_version(self):
+        return ".".join(self.version_elements)
+
+    def prefixed_year(self):
+        return "vs" + str(self.year)
+
+    def render(self):
+        return "_".join(self.get_elements())
+
+def FalsePred(_):
+    return False
+
+def TruePred(_):
+    return True
+
+_VC2019 = VcSpec(2019)
+
+WORKFLOW_DATA = [
+    # VS2019 CUDA-10.1
+    WindowsJob(None, _VC2019, CudaVersion(10, 1)),
+    WindowsJob(1, _VC2019, CudaVersion(10, 1)),
+    WindowsJob(2, _VC2019, CudaVersion(10, 1)),
+    # VS2019 CUDA-11.1
+    WindowsJob(None, _VC2019, CudaVersion(11, 1)),
+    WindowsJob(1, _VC2019, CudaVersion(11, 1), master_only_pred=TruePred),
+    WindowsJob(2, _VC2019, CudaVersion(11, 1), master_only_pred=TruePred),
+    # VS2019 CPU-only
+    WindowsJob(None, _VC2019, None),
+    WindowsJob(1, _VC2019, None, master_only_pred=TruePred),
+    WindowsJob(2, _VC2019, None, master_only_pred=TruePred),
+    WindowsJob(1, _VC2019, CudaVersion(10, 1), force_on_cpu=True, master_only_pred=TruePred),
+]
+
+
+def get_windows_workflows():
+    return [item.gen_tree() for item in WORKFLOW_DATA]
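Review note: a short sketch of how `VcSpec` renders and how the default `master_only_pred` behaves. `VcSpec` is copied from the diff; the 2017 instance and its version elements are hypothetical values used only to exercise the non-2019 branches.

```python
from types import SimpleNamespace


class VcSpec:
    def __init__(self, year, version_elements=None, hide_version=False):
        self.year = year
        self.version_elements = version_elements or []
        self.hide_version = hide_version

    def get_elements(self):
        if self.hide_version:
            return [self.prefixed_year()]
        return [self.prefixed_year()] + self.version_elements

    def get_product(self):
        return "Community" if self.year == 2019 else "BuildTools"

    def dotted_version(self):
        return ".".join(self.version_elements)

    def prefixed_year(self):
        return "vs" + str(self.year)

    def render(self):
        return "_".join(self.get_elements())


vc2019 = VcSpec(2019)                # as in the diff
vc2017 = VcSpec(2017, ["14", "11"])  # hypothetical example values

# Default predicate from WindowsJob.__init__
default_pred = lambda job: job.vscode_spec.year != 2019

# SimpleNamespace stands in for a WindowsJob; only vscode_spec is read
job_2019 = SimpleNamespace(vscode_spec=vc2019)
job_2017 = SimpleNamespace(vscode_spec=vc2017)

print(vc2019.render(), vc2019.get_product(), repr(vc2019.dotted_version()))
print(vc2017.render(), vc2017.get_product(), vc2017.dotted_version())
print(default_pred(job_2019), default_pred(job_2017))
```

With the default predicate, VS2019 jobs get no branch filter, which is why several `WORKFLOW_DATA` entries pass `master_only_pred=TruePred` explicitly. Note also that `VcSpec(2019)` has no version elements, so `vc_version` is rendered from an empty string.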
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
--- a/.circleci/docker/README.md
+++ b/.circleci/docker/README.md
@ -12,20 +12,8 @@ each image as the `BUILD_ENVIRONMENT` environment variable.

 See `build.sh` for valid build environments (it's the giant switch).

-Docker builds are now defined with `.circleci/cimodel/data/simple/docker_definitions.py`
-
 ## Contents

 * `build.sh` -- dispatch script to launch all builds
 * `common` -- scripts used to execute individual Docker build stages
 * `ubuntu-cuda` -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker
-
-## Usage
-
-```bash
-# Build a specific image
-./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
-
-# Set flags (see build.sh) and build image
-sudo bash -c 'PROTOBUF=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest'
-```
--- a/.circleci/docker/android/build.gradle
+++ b/.circleci/docker/android/build.gradle
@ -20,8 +20,10 @@ buildscript {
    }

    dependencies {
-        classpath 'com.android.tools.build:gradle:4.1.2'
-        classpath 'com.vanniktech:gradle-maven-publish-plugin:0.14.2'
+        classpath 'com.android.tools.build:gradle:3.3.2'
+        classpath "com.jfrog.bintray.gradle:gradle-bintray-plugin:1.8.0"
+        classpath "com.github.dcendents:android-maven-gradle-plugin:2.1"
+        classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.8"
    }
 }

@ -51,9 +53,9 @@ android {
 dependencies {
    implementation 'com.android.support:appcompat-v7:28.0.0'
    implementation 'androidx.appcompat:appcompat:1.0.0'
-    implementation 'com.facebook.fbjni:fbjni-java-only:0.2.2'
+    implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
    implementation 'com.google.code.findbugs:jsr305:3.0.1'
-    implementation 'com.facebook.soloader:nativeloader:0.10.1'
+    implementation 'com.facebook.soloader:nativeloader:0.8.0'

    implementation 'junit:junit:' + rootProject.junitVersion
    implementation 'androidx.test:core:' + rootProject.coreVersion
--- a/.circleci/docker/build.sh
+++ b/.circleci/docker/build.sh
@ -40,12 +40,6 @@ function extract_all_from_image_name() {
  done
 }

-# Use the same pre-built XLA test image from PyTorch/XLA
-if [[ "$image" == *xla* ]]; then
-  echo "Using pre-built XLA test image..."
-  exit 0
-fi
-
 if [[ "$image" == *-xenial* ]]; then
  UBUNTU_VERSION=16.04
 elif [[ "$image" == *-artful* ]]; then
@ -76,10 +70,6 @@ elif [[ "$image" == *rocm* ]]; then
  DOCKERFILE="${OS}-rocm/Dockerfile"
 fi

-if [[ "$image" == *xenial* ]] || [[ "$image" == *bionic* ]]; then
-  CMAKE_VERSION=3.13.5
-fi
-
 TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64"

 # It's annoying to rename jobs every time you want to rewrite a
@ -91,62 +81,87 @@ case "$image" in
    GCC_VERSION=7
    # Do not install PROTOBUF, DB, and VISION as a test
    ;;
-  pytorch-linux-xenial-py3.7-gcc5.4)
-    ANACONDA_PYTHON_VERSION=3.7
+  pytorch-linux-xenial-py3.6-gcc5.4)
+    ANACONDA_PYTHON_VERSION=3.6
    GCC_VERSION=5
    PROTOBUF=yes
    DB=yes
    VISION=yes
    KATEX=yes
    ;;
-  pytorch-linux-xenial-py3.7-gcc7.2)
-    ANACONDA_PYTHON_VERSION=3.7
+  pytorch-linux-xenial-py3.6-gcc7.2)
+    ANACONDA_PYTHON_VERSION=3.6
    GCC_VERSION=7
    # Do not install PROTOBUF, DB, and VISION as a test
    ;;
-  pytorch-linux-xenial-py3.7-gcc7)
-    ANACONDA_PYTHON_VERSION=3.7
+  pytorch-linux-xenial-py3.6-gcc7)
+    ANACONDA_PYTHON_VERSION=3.6
    GCC_VERSION=7
    PROTOBUF=yes
    DB=yes
    VISION=yes
    ;;
+  pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4)
+    CUDA_VERSION=9.2
+    CUDNN_VERSION=7
+    ANACONDA_PYTHON_VERSION=3.6
+    GCC_VERSION=5
+    PROTOBUF=yes
+    DB=yes
+    VISION=yes
+    ;;
+  pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7)
+    CUDA_VERSION=9.2
+    CUDNN_VERSION=7
+    ANACONDA_PYTHON_VERSION=3.6
+    GCC_VERSION=7
+    PROTOBUF=yes
+    DB=yes
+    VISION=yes
+    ;;
+  pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7)
+    CUDA_VERSION=10.0
+    CUDNN_VERSION=7
+    ANACONDA_PYTHON_VERSION=3.6
+    GCC_VERSION=7
+    PROTOBUF=yes
+    DB=yes
+    VISION=yes
+    ;;
+  pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7)
+    CUDA_VERSION=10.1
+    CUDNN_VERSION=7
+    ANACONDA_PYTHON_VERSION=3.6
+    GCC_VERSION=7
+    PROTOBUF=yes
+    DB=yes
+    VISION=yes
+    KATEX=yes
+    ;;
  pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)
    CUDA_VERSION=10.2
    CUDNN_VERSION=7
-    ANACONDA_PYTHON_VERSION=3.7
+    ANACONDA_PYTHON_VERSION=3.6
    GCC_VERSION=7
    PROTOBUF=yes
    DB=yes
    VISION=yes
    KATEX=yes
    ;;
-  pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)
-    CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
+  pytorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7)
+    CUDA_VERSION=11.0
    CUDNN_VERSION=8
-    TENSORRT_VERSION=8.0.1.6
-    ANACONDA_PYTHON_VERSION=3.7
+    ANACONDA_PYTHON_VERSION=3.6
    GCC_VERSION=7
    PROTOBUF=yes
    DB=yes
    VISION=yes
    KATEX=yes
    ;;
-  pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9)
-    CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
+  pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7)
+    CUDA_VERSION=11.1
    CUDNN_VERSION=8
-    TENSORRT_VERSION=8.0.1.6
-    ANACONDA_PYTHON_VERSION=3.7
-    CLANG_VERSION=9
-    PROTOBUF=yes
-    DB=yes
-    VISION=yes
-    KATEX=yes
-    ;;
-  pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7)
-    CUDA_VERSION=11.6.0
-    CUDNN_VERSION=8
-    ANACONDA_PYTHON_VERSION=3.7
+    ANACONDA_PYTHON_VERSION=3.6
    GCC_VERSION=7
    PROTOBUF=yes
    DB=yes
@ -154,50 +169,44 @@ case "$image" in
    KATEX=yes
    ;;
  pytorch-linux-xenial-py3-clang5-asan)
-    ANACONDA_PYTHON_VERSION=3.7
+    ANACONDA_PYTHON_VERSION=3.6
    CLANG_VERSION=5.0
    PROTOBUF=yes
    DB=yes
    VISION=yes
    ;;
-  pytorch-linux-xenial-py3-clang7-asan)
-    ANACONDA_PYTHON_VERSION=3.7
-    CLANG_VERSION=7
-    PROTOBUF=yes
-    DB=yes
-    VISION=yes
-    ;;
  pytorch-linux-xenial-py3-clang7-onnx)
-    ANACONDA_PYTHON_VERSION=3.7
+    ANACONDA_PYTHON_VERSION=3.6
    CLANG_VERSION=7
    PROTOBUF=yes
    DB=yes
    VISION=yes
    ;;
  pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
-    ANACONDA_PYTHON_VERSION=3.7
+    ANACONDA_PYTHON_VERSION=3.6
    CLANG_VERSION=5.0
    LLVMDEV=yes
    PROTOBUF=yes
    ANDROID=yes
    ANDROID_NDK_VERSION=r19c
-    GRADLE_VERSION=6.8.3
+    GRADLE_VERSION=4.10.3
+    CMAKE_VERSION=3.7.0
    NINJA_VERSION=1.9.0
    ;;
-  pytorch-linux-xenial-py3.7-clang7)
-    ANACONDA_PYTHON_VERSION=3.7
+  pytorch-linux-xenial-py3.6-clang7)
+    ANACONDA_PYTHON_VERSION=3.6
    CLANG_VERSION=7
    PROTOBUF=yes
    DB=yes
    VISION=yes
    ;;
-  pytorch-linux-bionic-py3.7-clang9)
-    ANACONDA_PYTHON_VERSION=3.7
+  pytorch-linux-bionic-py3.6-clang9)
+    ANACONDA_PYTHON_VERSION=3.6
    CLANG_VERSION=9
    PROTOBUF=yes
    DB=yes
    VISION=yes
-    VULKAN_SDK_VERSION=1.2.162.1
+    VULKAN_SDK_VERSION=1.2.148.0
    SWIFTSHADER=yes
    ;;
  pytorch-linux-bionic-py3.8-gcc9)
@ -207,49 +216,78 @@ case "$image" in
    DB=yes
    VISION=yes
    ;;
-  pytorch-linux-bionic-cuda10.2-cudnn7-py3.7-clang9)
+  pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9)
    CUDA_VERSION=10.2
    CUDNN_VERSION=7
-    ANACONDA_PYTHON_VERSION=3.7
+    ANACONDA_PYTHON_VERSION=3.6
    CLANG_VERSION=9
    PROTOBUF=yes
    DB=yes
    VISION=yes
    ;;
-  pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)
+  pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9)
    CUDA_VERSION=10.2
    CUDNN_VERSION=7
-    ANACONDA_PYTHON_VERSION=3.9
-    GCC_VERSION=7
-    PROTOBUF=yes
-    DB=yes
-    VISION=yes
-    ;;
-  pytorch-linux-bionic-rocm5.0-py3.7)
-    ANACONDA_PYTHON_VERSION=3.7
+    ANACONDA_PYTHON_VERSION=3.8
    GCC_VERSION=9
    PROTOBUF=yes
    DB=yes
    VISION=yes
-    ROCM_VERSION=5.0
    ;;
-  pytorch-linux-bionic-rocm5.1-py3.7)
-    ANACONDA_PYTHON_VERSION=3.7
+  pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9)
+    CUDA_VERSION=11.0
+    CUDNN_VERSION=8
+    ANACONDA_PYTHON_VERSION=3.6
    GCC_VERSION=9
    PROTOBUF=yes
    DB=yes
    VISION=yes
-    ROCM_VERSION=5.1.1
-    ;;
-  pytorch-linux-focal-py3.7-gcc7)
-    ANACONDA_PYTHON_VERSION=3.7
-    CMAKE_VERSION=3.12.4  # To make sure XNNPACK is enabled for the BACKWARDS_COMPAT_TEST used with this image
-    GCC_VERSION=7
-    PROTOBUF=yes
-    DB=yes
-    VISION=yes
    KATEX=yes
    ;;
+  pytorch-linux-bionic-cuda11.0-cudnn8-py3.8-gcc9)
+    CUDA_VERSION=11.0
+    CUDNN_VERSION=8
+    ANACONDA_PYTHON_VERSION=3.8
+    GCC_VERSION=9
+    PROTOBUF=yes
+    DB=yes
+    VISION=yes
+    KATEX=yes
+    ;;
+  pytorch-linux-bionic-cuda11.1-cudnn8-py3.6-gcc9)
+    CUDA_VERSION=11.1
+    CUDNN_VERSION=8
+    ANACONDA_PYTHON_VERSION=3.6
+    GCC_VERSION=9
+    PROTOBUF=yes
+    DB=yes
+    VISION=yes
+    KATEX=yes
+    ;;
+  pytorch-linux-bionic-cuda11.1-cudnn8-py3.8-gcc9)
+    CUDA_VERSION=11.1
+    CUDNN_VERSION=8
+    ANACONDA_PYTHON_VERSION=3.8
+    GCC_VERSION=9
+    PROTOBUF=yes
+    DB=yes
+    VISION=yes
+    KATEX=yes
+    ;;
+  pytorch-linux-bionic-rocm3.9-py3.6)
+    ANACONDA_PYTHON_VERSION=3.6
+    PROTOBUF=yes
+    DB=yes
+    VISION=yes
+    ROCM_VERSION=3.9
+    ;;
+  pytorch-linux-bionic-rocm3.10-py3.6)
+    ANACONDA_PYTHON_VERSION=3.6
+    PROTOBUF=yes
+    DB=yes
+    VISION=yes
+    ROCM_VERSION=3.10
+    ;;
  *)
    # Catch-all for builds that are not hardcoded.
    PROTOBUF=yes
@ -290,15 +328,7 @@ if [ -n "${JENKINS:-}" ]; then
  JENKINS_GID=$(id -g jenkins)
 fi

-tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
-
-#when using cudnn version 8 install it separately from cuda
-if [[ "$image" == *cuda*  && ${OS} == "ubuntu" ]]; then
-  IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
-  if [[ ${CUDNN_VERSION} == 8 ]]; then
-    IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
-  fi
-fi
+tmp_tag="tmp-$(cat /dev/urandom | tr -dc 'a-z' | fold -w 32 | head -n 1)"

 # Build image
 # TODO: build-arg THRIFT is not turned on for any image, remove it once we confirm
@ -326,7 +356,6 @@ docker build \
       --build-arg "GCC_VERSION=${GCC_VERSION}" \
       --build-arg "CUDA_VERSION=${CUDA_VERSION}" \
       --build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
-       --build-arg "TENSORRT_VERSION=${TENSORRT_VERSION}" \
       --build-arg "ANDROID=${ANDROID}" \
       --build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
       --build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
@ -336,8 +365,6 @@ docker build \
       --build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
       --build-arg "KATEX=${KATEX:-}" \
       --build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
-       --build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx900;gfx906}" \
-       --build-arg "IMAGE_NAME=${IMAGE_NAME}" \
       -f $(dirname ${DOCKERFILE})/Dockerfile \
       -t "$tmp_tag" \
       "$@" \
@ -356,7 +383,6 @@ function drun() {
 }

 if [[ "$OS" == "ubuntu" ]]; then
-
  if !(drun lsb_release -a 2>&1 | grep -qF Ubuntu); then
    echo "OS=ubuntu, but:"
    drun lsb_release -a
--- a/.circleci/docker/build_docker.sh
+++ b/.circleci/docker/build_docker.sh
@ -26,14 +26,11 @@ login() {
    docker login -u AWS --password-stdin "$1"
 }

+# Retry on timeouts (can happen on job stampede).
+retry login "${registry}"

-# Only run these steps if not on github actions
-if [[ -z "${GITHUB_ACTIONS}" ]]; then
-  # Retry on timeouts (can happen on job stampede).
-  retry login "${registry}"
-  # Logout on exit
-  trap "docker logout ${registry}" EXIT
-fi
+# Logout on exit
+trap "docker logout ${registry}" EXIT

 # export EC2=1
 # export JENKINS=1
@ -48,8 +45,5 @@ fi

 docker push "${image}:${tag}"

-if [ -z "${DOCKER_SKIP_S3_UPLOAD:-}" ]; then
-  trap "rm -rf ${IMAGE_NAME}:${tag}.tar" EXIT
-  docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
-  aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read
-fi
+docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
+aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read
--- a/.circleci/docker/centos-rocm/Dockerfile
+++ b/.circleci/docker/centos-rocm/Dockerfile
@ -4,10 +4,6 @@ FROM centos:${CENTOS_VERSION}

 ARG CENTOS_VERSION

-# Set AMD gpu targets to build for
-ARG PYTORCH_ROCM_ARCH
-ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
-
 # Install required packages to build Caffe2

 # Install common dependencies (so that this step can be cached separately)
@ -15,12 +11,6 @@ ARG EC2
 ADD ./common/install_base.sh install_base.sh
 RUN bash ./install_base.sh && rm install_base.sh

-# Update CentOS git version
-RUN yum -y remove git
-RUN yum -y remove git-*
-RUN yum -y install https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm
-RUN yum install -y git
-
 # Install devtoolset
 ARG DEVTOOLSET_VERSION
 ADD ./common/install_devtoolset.sh install_devtoolset.sh
@ -37,13 +27,11 @@ RUN rm install_glibc.sh
 ADD ./common/install_user.sh install_user.sh
 RUN bash ./install_user.sh && rm install_user.sh

-# Install conda and other packages (e.g., numpy, pytest)
+# Install conda and other packages (e.g., numpy, coverage, pytest)
 ENV PATH /opt/conda/bin:$PATH
 ARG ANACONDA_PYTHON_VERSION
-ADD requirements-ci.txt /opt/conda/requirements-ci.txt
 ADD ./common/install_conda.sh install_conda.sh
 RUN bash ./install_conda.sh && rm install_conda.sh
-RUN rm /opt/conda/requirements-ci.txt

 # (optional) Install protobuf for ONNX
 ARG PROTOBUF
@ -76,7 +64,6 @@ ENV PATH /opt/rocm/hcc/bin:$PATH
 ENV PATH /opt/rocm/hip/bin:$PATH
 ENV PATH /opt/rocm/opencl/bin:$PATH
 ENV PATH /opt/rocm/llvm/bin:$PATH
-ENV MAGMA_HOME /opt/rocm/magma
 ENV LANG en_US.utf8
 ENV LC_ALL en_US.utf8

--- a/.circleci/docker/common/install_android.sh
+++ b/.circleci/docker/common/install_android.sh
@ -99,7 +99,7 @@ echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
 chown -R jenkins /var/lib/jenkins/gradledeps
 chgrp -R jenkins /var/lib/jenkins/gradledeps

-sudo -H -u jenkins $GRADLE_HOME/bin/gradle -Pandroid.useAndroidX=true -p /var/lib/jenkins/gradledeps -g /var/lib/jenkins/.gradle --refresh-dependencies --debug --stacktrace assemble
+sudo -H -u jenkins $GRADLE_HOME/bin/gradle -p /var/lib/jenkins/gradledeps -g /var/lib/jenkins/.gradle --refresh-dependencies --debug --stacktrace assemble

 chown -R jenkins /var/lib/jenkins/.gradle
 chgrp -R jenkins /var/lib/jenkins/.gradle
--- a/.circleci/docker/common/install_base.sh
+++ b/.circleci/docker/common/install_base.sh
@ -11,20 +11,10 @@ install_ubuntu() {
  #   "$UBUNTU_VERSION" == "18.04"
  if [[ "$UBUNTU_VERSION" == "18.04"* ]]; then
    cmake3="cmake=3.10*"
-    maybe_libiomp_dev="libiomp-dev"
-  elif [[ "$UBUNTU_VERSION" == "20.04"* ]]; then
-    cmake3="cmake=3.16*"
-    maybe_libiomp_dev=""
  else
    cmake3="cmake=3.5*"
-    maybe_libiomp_dev="libiomp-dev"
  fi

-  # TODO: Remove this once nvidia package repos are back online
-  # Comment out nvidia repositories to prevent them from getting apt-get updated, see https://github.com/pytorch/pytorch/issues/74968
-  # shellcheck disable=SC2046
-  sed -i 's/.*nvidia.*/# &/' $(find /etc/apt/ -type f -name "*.list")
-
  # Install common dependencies
  apt-get update
  # TODO: Some of these may not be necessary
@ -43,21 +33,17 @@ install_ubuntu() {
    git \
    libatlas-base-dev \
    libc6-dbg \
-    ${maybe_libiomp_dev} \
+    libiomp-dev \
    libyaml-dev \
    libz-dev \
    libjpeg-dev \
    libasound2-dev \
    libsndfile-dev \
    software-properties-common \
-    wget \
    sudo \
+    wget \
    vim

-  # Should resolve issues related to various apt package repository cert issues
-  # see: https://github.com/pytorch/pytorch/issues/65931
-  apt-get install -y libgnutls30
-
  # Cleanup package manager
  apt-get autoclean && apt-get clean
  rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
@ -91,7 +77,6 @@ install_centos() {
    glog-devel \
    hiredis-devel \
    libstdc++-devel \
-    libsndfile-devel \
    make \
    opencv-devel \
    sudo \
@ -123,11 +108,14 @@ esac
 # Install Valgrind separately since the apt-get version is too old.
 mkdir valgrind_build && cd valgrind_build
 VALGRIND_VERSION=3.16.1
-wget https://ossci-linux.s3.amazonaws.com/valgrind-${VALGRIND_VERSION}.tar.bz2
+if ! wget http://valgrind.org/downloads/valgrind-${VALGRIND_VERSION}.tar.bz2
+then
+  wget https://sourceware.org/ftp/valgrind/valgrind-${VALGRIND_VERSION}.tar.bz2
+fi
 tar -xjf valgrind-${VALGRIND_VERSION}.tar.bz2
 cd valgrind-${VALGRIND_VERSION}
 ./configure --prefix=/usr/local
-make -j6
+make -j 4
 sudo make install
 cd ../../
 rm -rf valgrind_build
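
Note on the Valgrind download change above: the new lines try one host and fall back to a second if the first `wget` fails. The same idea can be sketched as a small loop over candidate URLs. This is only an illustration, not code from this commit; the function name `fetch_first` and the `FETCH_CMD` override are hypothetical and exist here so the logic can be exercised without network access.

```shell
# Hypothetical helper (not part of the diff): try each URL in order and
# stop at the first one that downloads successfully.
# FETCH_CMD is an assumed override hook; when unset, plain `wget -q` is used.
fetch_first() {
  for url in "$@"; do
    # A failing command in an `if` condition does not trip `set -e`.
    if ${FETCH_CMD:-wget -q} "$url"; then
      return 0
    fi
  done
  # Every candidate failed.
  return 1
}
```

Under these assumptions, the two-host fallback in the hunk would read `fetch_first "$primary_url" "$mirror_url"`, where both variable names are placeholders.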
--- a/.circleci/docker/common/install_cmake.sh
+++ b/.circleci/docker/common/install_cmake.sh
@ -4,9 +4,6 @@ set -ex

 [ -n "$CMAKE_VERSION" ]

-# Remove system cmake install so it won't get used instead
-apt-get remove cmake -y
-
 # Turn 3.6.3 into v3.6
 path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
 file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"
--- a/.circleci/docker/common/install_conda.sh
+++ b/.circleci/docker/common/install_conda.sh
@ -21,7 +21,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
      ;;
  esac

-  mkdir -p /opt/conda
+  mkdir /opt/conda
  chown jenkins:jenkins /opt/conda

  # Work around bug where devtoolset replaces sudo and breaks it.
@ -68,44 +68,61 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
    as_jenkins conda install -q -y python="$ANACONDA_PYTHON_VERSION" $*
  }

-  pip_install() {
-    as_jenkins pip install --progress-bar off $*
-  }
-
  # Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
-  # DO NOT install cmake here as it would install a version newer than 3.10, but
-  # we want to pin to version 3.10.
-  if [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
+  # DO NOT install cmake here as it would install a version newer than 3.5, but
+  # we want to pin to version 3.5.
+  if [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
    # Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
-    conda_install numpy=1.19.2 astunparse pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
-  elif [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
-    # Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
-    conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
+    conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
  elif [ "$ANACONDA_PYTHON_VERSION" = "3.7" ]; then
     # DO NOT install dataclasses if installing python-3.7, since it's part of python-3.7 core packages
-    conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six typing_extensions
+    conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six typing_extensions
  else
-    conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
+    conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
  fi
-
-  # Magma package names are concatenation of CUDA major and minor ignoring revision
-  # I.e. magma-cuda102 package corresponds to CUDA_VERSION=10.2 and CUDA_VERSION=10.2.89
-  if [ -n "$CUDA_VERSION" ]; then
-    conda_install magma-cuda$(TMP=${CUDA_VERSION/./};echo ${TMP%.*[0-9]}) -c pytorch
+  if [[ "$CUDA_VERSION" == 9.2* ]]; then
+    conda_install magma-cuda92 -c pytorch
+  elif [[ "$CUDA_VERSION" == 10.0* ]]; then
+    conda_install magma-cuda100 -c pytorch
+  elif [[ "$CUDA_VERSION" == 10.1* ]]; then
+    conda_install magma-cuda101 -c pytorch
+  elif [[ "$CUDA_VERSION" == 10.2* ]]; then
+    conda_install magma-cuda102 -c pytorch
+  elif [[ "$CUDA_VERSION" == 11.0* ]]; then
+    conda_install magma-cuda110 -c pytorch
+  elif [[ "$CUDA_VERSION" == 11.1* ]]; then
+    conda_install magma-cuda111 -c pytorch
+  elif [[ "$CUDA_VERSION" == 11.2* ]]; then
+    conda_install magma-cuda112 -c pytorch
  fi

  # TODO: This isn't working atm
  conda_install nnpack -c killeent

  # Install some other packages, including those needed for Python test reporting
-  pip_install -r /opt/conda/requirements-ci.txt
+  # TODO: Why is scipy pinned
+  # Pin MyPy version because new errors are likely to appear with each release
+  # Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
+  as_jenkins pip install --progress-bar off pytest \
+    scipy==1.1.0 \
+    scikit-image \
+    "librosa>=0.6.2" \
+    psutil \
+    numba \
+    llvmlite \
+    unittest-xml-reporting \
+    boto3==1.16.34 \
+    coverage \
+    hypothesis==4.53.2 \
+    mypy==0.770 \
+    tb-nightly

  # Update scikit-learn to a python-3.8 compatible version
  if [[ $(python -c "import sys; print(int(sys.version_info >= (3, 8)))") == "1" ]]; then
-    pip_install -U scikit-learn
+    as_jenkins pip install --progress-bar off -U scikit-learn
  else
    # Pinned scikit-learn due to https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5 only)
-    pip_install scikit-learn==0.20.3
+    as_jenkins pip install --progress-bar off scikit-learn==0.20.3
  fi

  popd
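
Note on the magma selection above: the removed one-liner derived the package name from `CUDA_VERSION` by concatenating the major and minor components and ignoring any revision, while the added lines spell out one branch per CUDA release. A minimal sketch of that derivation, using only POSIX parameter expansion, is shown below. The function name `magma_package_for_cuda` is hypothetical, and the sketch assumes the version string always contains at least `major.minor`.

```shell
# Hypothetical helper (not part of the diff): map a CUDA version string to
# the magma conda package name by keeping only the major and minor parts.
# Assumes the input has at least "major.minor" (e.g. 10.2 or 10.2.89).
magma_package_for_cuda() (
  version="$1"
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"      # drop any revision suffix
  echo "magma-cuda${major}${minor}"
)

magma_package_for_cuda 10.2.89   # prints magma-cuda102
magma_package_for_cuda 11.1      # prints magma-cuda111
```

The explicit `elif` chain in the diff is more verbose but fails closed: a CUDA version without a matching branch installs no magma package at all, whereas a derived name would be passed to conda whether or not such a package exists.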
--- a/.circleci/docker/common/install_cudnn.sh
+++ b/.circleci/docker/common/install_cudnn.sh
@ -1,18 +0,0 @@
-#!/bin/bash
-
-if [[ ${CUDNN_VERSION} == 8 ]]; then
-    # cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
-    mkdir tmp_cudnn && cd tmp_cudnn
-    CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive"
-    curl -OLs  https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
-    tar xf ${CUDNN_NAME}.tar.xz
-    cp -a ${CUDNN_NAME}/include/* /usr/include/
-    cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/
-    cp -a ${CUDNN_NAME}/include/* /usr/include/x86_64-linux-gnu/
-
-    cp -a ${CUDNN_NAME}/lib/* /usr/local/cuda/lib64/
-    cp -a ${CUDNN_NAME}/lib/* /usr/lib/x86_64-linux-gnu/
-    cd ..
-    rm -rf tmp_cudnn
-    ldconfig
-fi
--- a/.circleci/docker/common/install_db.sh
+++ b/.circleci/docker/common/install_db.sh
@ -2,6 +2,23 @@

 set -ex

+# This function installs protobuf 2.6
+install_protobuf_26() {
+  pb_dir="/usr/temp_pb_install_dir"
+  mkdir -p $pb_dir
+
+  # On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
+  # else it will fail with
+  #   g++: error: ./../lib64/crti.o: No such file or directory
+  ln -s /usr/lib64 "$pb_dir/lib64"
+
+  curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
+  tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
+  pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
+  popd
+  rm -rf $pb_dir
+}
+
 install_ubuntu() {
  apt-get update
  apt-get install -y --no-install-recommends \
--- a/.circleci/docker/common/install_gcc.sh
+++ b/.circleci/docker/common/install_gcc.sh
@ -7,18 +7,15 @@ if [ -n "$GCC_VERSION" ]; then
  # Need the official toolchain repo to get alternate packages
  add-apt-repository ppa:ubuntu-toolchain-r/test
  apt-get update
-  if [[ "$UBUNTU_VERSION" == "16.04" && "${GCC_VERSION:0:1}" == "5" ]]; then
+  if [ "$UBUNTU_VERSION" = "16.04" -a "$GCC_VERSION" = "5" ]; then
    apt-get install -y g++-5=5.4.0-6ubuntu1~16.04.12
-    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
-    update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
-    update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-5 50
  else
    apt-get install -y g++-$GCC_VERSION
-    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
-    update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
-    update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
  fi

+  update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
+  update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
+  update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50

  # Cleanup package manager
  apt-get autoclean && apt-get clean
--- a/.circleci/docker/common/install_katex.sh
+++ b/.circleci/docker/common/install_katex.sh
@ -3,9 +3,6 @@
 set -ex

 if [ -n "$KATEX" ]; then
-  apt-get update
-  # Ignore error if gpg-agent doesn't exist (for Ubuntu 16.04)
-  apt-get install -y gpg-agent || :

  curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
  sudo apt-get install -y nodejs
--- a/.circleci/docker/common/install_nccl.sh
+++ b/.circleci/docker/common/install_nccl.sh
@ -0,0 +1,4 @@
+#!/bin/bash
+
+sudo apt-get -qq update
+sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
--- a/.circleci/docker/common/install_openmpi.sh
+++ b/.circleci/docker/common/install_openmpi.sh
@ -1,10 +1,4 @@
 #!/bin/bash

 sudo apt-get update
-# also install ssh to avoid error of:
-# --------------------------------------------------------------------------
-# The value of the MCA parameter "plm_rsh_agent" was set to a path
-# that could not be found:
-#   plm_rsh_agent: ssh : rsh
-sudo apt-get install -y ssh
 sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
--- a/.circleci/docker/common/install_openssl.sh
+++ b/.circleci/docker/common/install_openssl.sh
@ -1,14 +0,0 @@
-#!/bin/bash
-
-set -ex
-
-OPENSSL=openssl-1.1.1k
-
-wget -q -O "${OPENSSL}.tar.gz" "https://ossci-linux.s3.amazonaws.com/${OPENSSL}.tar.gz"
-tar xf "${OPENSSL}.tar.gz"
-cd "${OPENSSL}"
-./config --prefix=/opt/openssl -d '-Wl,--enable-new-dtags,-rpath,$(LIBRPATH)'
-# NOTE: openssl install errors out when built with the -j option
-make -j6; make install_sw
-cd ..
-rm -rf "${OPENSSL}"
--- a/.circleci/docker/common/install_protobuf.sh
+++ b/.circleci/docker/common/install_protobuf.sh
@ -2,8 +2,8 @@

 set -ex

-# This function installs protobuf 3.17
-install_protobuf_317() {
+# This function installs protobuf 2.6
+install_protobuf_26() {
  pb_dir="/usr/temp_pb_install_dir"
  mkdir -p $pb_dir

@ -12,32 +12,37 @@ install_protobuf_317() {
  #   g++: error: ./../lib64/crti.o: No such file or directory
  ln -s /usr/lib64 "$pb_dir/lib64"

-  curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz"
-  tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
-  # -j6 to balance memory usage and speed.
-  # naked `-j` seems to use too much memory.
-  pushd "$pb_dir" && ./configure && make -j6 && make -j6 check && sudo make -j6 install && sudo ldconfig
+  curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
+  tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
+  pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
  popd
  rm -rf $pb_dir
 }

 install_ubuntu() {
-  # Ubuntu 14.04 has cmake 2.8.12 as the default option, so we will
+  # Ubuntu 14.04 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
+  # so we install that here if on 14.04
+  # Ubuntu 14.04 also has cmake 2.8.12 as the default option, so we will
  # install cmake3 here and use cmake3.
  apt-get update
  if [[ "$UBUNTU_VERSION" == 14.04 ]]; then
    apt-get install -y --no-install-recommends cmake3
+    install_protobuf_26
+  else
+    apt-get install -y --no-install-recommends \
+            libprotobuf-dev \
+            protobuf-compiler
  fi

  # Cleanup
  apt-get autoclean && apt-get clean
  rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
-
-  install_protobuf_317
 }

 install_centos() {
-  install_protobuf_317
+  # Centos7 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
+  # so we always install that here
+  install_protobuf_26
 }

 # Install base packages depending on the base OS
--- a/.circleci/docker/common/install_rocm.sh
+++ b/.circleci/docker/common/install_rocm.sh
@ -4,49 +4,26 @@ set -ex

 install_magma() {
    # "install" hipMAGMA into /opt/rocm/magma by copying after build
-    git clone https://bitbucket.org/icl/magma.git
+    git clone https://bitbucket.org/icl/magma.git -b hipMAGMA
    pushd magma
-    # Fixes memory leaks of magma found while executing linalg UTs
-    git checkout 5959b8783e45f1809812ed96ae762f38ee701972
-    cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
+    cp make.inc-examples/make.inc.hip-mkl-gcc make.inc
    echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
    echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
-    echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
+    echo 'DEVCCFLAGS += --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908' >> make.inc
    export PATH="${PATH}:/opt/rocm/bin"
-    if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
-      amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
-    else
-      amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
-    fi
-    for arch in $amdgpu_targets; do
-      echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
-    done
-    # hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
-    sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
    make -f make.gen.hipMAGMA -j $(nproc)
-    LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
+    make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
    make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
    popd
    mv magma /opt/rocm
 }

-ver() {
-    printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ');
-}
-
-# Map ROCm version to AMDGPU version
-declare -A AMDGPU_VERSIONS=( ["4.5.2"]="21.40.2" ["5.0"]="21.50" ["5.1.1"]="22.10.1" )
-
 install_ubuntu() {
    apt-get update
    if [[ $UBUNTU_VERSION == 18.04 ]]; then
      # gpg-agent is not available by default on 18.04
      apt-get install -y --no-install-recommends gpg-agent
    fi
-    if [[ $UBUNTU_VERSION == 20.04 ]]; then
-      # gpg-agent is not available by default on 20.04
-      apt-get install -y --no-install-recommends gpg-agent
-    fi
    apt-get install -y kmod
    apt-get install -y wget

@ -54,22 +31,9 @@ install_ubuntu() {
    apt-get install -y libc++1
    apt-get install -y libc++abi1

-    if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
-        # Add amdgpu repository
-        UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
-        local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/ubuntu"
-        echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
-    fi
-
-    ROCM_REPO="ubuntu"
-    if [[ $(ver $ROCM_VERSION) -lt $(ver 4.2) ]]; then
-        ROCM_REPO="xenial"
-    fi
-
    # Add rocm repository
    wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
-    local rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
-    echo "deb [arch=amd64] ${rocm_baseurl} ${ROCM_REPO} main" > /etc/apt/sources.list.d/rocm.list
+    echo "deb [arch=amd64] http://repo.radeon.com/rocm/apt/${ROCM_VERSION} xenial main" > /etc/apt/sources.list.d/rocm.list
    apt-get update --allow-insecure-repositories

    DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
@ -106,24 +70,11 @@ install_centos() {
  yum install -y epel-release
  yum install -y dkms kernel-headers-`uname -r` kernel-devel-`uname -r`

-  if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
-      # Add amdgpu repository
-      local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/7.9/main/x86_64"
-      echo "[AMDGPU]" > /etc/yum.repos.d/amdgpu.repo
-      echo "name=AMDGPU" >> /etc/yum.repos.d/amdgpu.repo
-      echo "baseurl=${amdgpu_baseurl}" >> /etc/yum.repos.d/amdgpu.repo
-      echo "enabled=1" >> /etc/yum.repos.d/amdgpu.repo
-      echo "gpgcheck=1" >> /etc/yum.repos.d/amdgpu.repo
-      echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/amdgpu.repo
-  fi
-
-  local rocm_baseurl="http://repo.radeon.com/rocm/yum/${ROCM_VERSION}"
  echo "[ROCm]" > /etc/yum.repos.d/rocm.repo
  echo "name=ROCm" >> /etc/yum.repos.d/rocm.repo
-  echo "baseurl=${rocm_baseurl}" >> /etc/yum.repos.d/rocm.repo
+  echo "baseurl=http://repo.radeon.com/rocm/yum/${ROCM_VERSION}" >> /etc/yum.repos.d/rocm.repo
  echo "enabled=1" >> /etc/yum.repos.d/rocm.repo
-  echo "gpgcheck=1" >> /etc/yum.repos.d/rocm.repo
-  echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/rocm.repo
+  echo "gpgcheck=0" >> /etc/yum.repos.d/rocm.repo

  yum update -y
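
Note on the removed `ver` helper in this file: it turned a dotted version into a single zero-padded number so that versions could be compared with integer tests such as `-ge` and `-lt`. The sketch below is adapted from that helper rather than copied: it uses `%d` instead of `%3d` for the first field so the result has no leading spaces, which is an assumption made here for portability. It also assumes at most four components, each without leading zeros and below 1000.

```shell
# Adapted from the removed ver() helper (first field uses %d, not %3d).
# Each later component is zero-padded to three digits; missing components
# are filled with zeros by printf.
ver() {
  # The unquoted substitution is deliberate: it splits "4.5.2" into 4 5 2.
  printf "%d%03d%03d%03d" $(echo "$1" | tr '.' ' ')
}

# Integer comparison of the encoded values.
if [ "$(ver 4.5.2)" -ge "$(ver 4.5)" ]; then
  echo "4.5.2 is at least 4.5"
fi
```

This encoding is what let the removed code choose the apt repository name and decide whether to add the amdgpu repository from `ROCM_VERSION`. The replacement lines drop those checks and always use `xenial`.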

--- a/.circleci/docker/common/install_user.sh
+++ b/.circleci/docker/common/install_user.sh
@ -3,11 +3,8 @@
 set -ex

 # Mirror jenkins user in container
-# jenkins user as ec2-user should have the same user-id
-echo "jenkins:x:1000:1000::/var/lib/jenkins:" >> /etc/passwd
-echo "jenkins:x:1000:" >> /etc/group
-# Needed on focal or newer
-echo "jenkins:*:19110:0:99999:7:::" >>/etc/shadow
+echo "jenkins:x:1014:1014::/var/lib/jenkins:" >> /etc/passwd
+echo "jenkins:x:1014:" >> /etc/group

 # Create $HOME
 mkdir -p /var/lib/jenkins
@ -21,6 +18,3 @@ chown jenkins:jenkins /usr/local
 # Allow sudo
 # TODO: Maybe we shouldn't
 echo 'jenkins ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/jenkins
-
-# Test that sudo works
-sudo -u jenkins sudo -v
--- a/.circleci/docker/common/install_vision.sh
+++ b/.circleci/docker/common/install_vision.sh
@ -2,6 +2,23 @@

 set -ex

+# This function installs protobuf 2.6
+install_protobuf_26() {
+  pb_dir="/usr/temp_pb_install_dir"
+  mkdir -p $pb_dir
+
+  # On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
+  # else it will fail with
+  #   g++: error: ./../lib64/crti.o: No such file or directory
+  ln -s /usr/lib64 "$pb_dir/lib64"
+
+  curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
+  tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
+  pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
+  popd
+  rm -rf $pb_dir
+}
+
 install_ubuntu() {
  apt-get update
  apt-get install -y --no-install-recommends \
--- a/.circleci/docker/common/install_vulkan_sdk.sh
+++ b/.circleci/docker/common/install_vulkan_sdk.sh
@ -8,17 +8,16 @@ retry () {
    $*  || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
 }

+_https_amazon_aws=https://ossci-android.s3.amazonaws.com
+
 _vulkansdk_dir=/var/lib/jenkins/vulkansdk
+mkdir -p $_vulkansdk_dir
 _tmp_vulkansdk_targz=/tmp/vulkansdk.tar.gz
+curl --silent --show-error --location --fail --retry 3 \
+  --output "$_tmp_vulkansdk_targz" "$_https_amazon_aws/vulkansdk-linux-x86_64-${VULKAN_SDK_VERSION}.tar.gz"

-curl \
-  --silent \
-  --show-error \
-  --location \
-  --fail \
-  --retry 3 \
-  --output "${_tmp_vulkansdk_targz}" "https://ossci-android.s3.amazonaws.com/vulkansdk-linux-x86_64-${VULKAN_SDK_VERSION}.tar.gz"
+tar -C "$_vulkansdk_dir" -xzf "$_tmp_vulkansdk_targz" --strip-components 1

-mkdir -p "${_vulkansdk_dir}"
-tar -C "${_vulkansdk_dir}" -xzf "${_tmp_vulkansdk_targz}" --strip-components 1
-rm -rf "${_tmp_vulkansdk_targz}"
+export VULKAN_SDK="$_vulkansdk_dir/"
+
+rm "$_tmp_vulkansdk_targz"
--- a/.circleci/docker/requirements-ci.txt
+++ b/.circleci/docker/requirements-ci.txt
@ -1,212 +0,0 @@
-# Python dependencies required for unit tests
-
-#awscli==1.6 #this breaks some platforms
-#Description: AWS command line interface
-#Pinned versions: 1.6
-#test that import:
-
-boto3==1.19.12
-#Description: AWS SDK for python
-#Pinned versions: 1.19.12, 1.16.34
-#test that import:
-
-click
-#Description: Command Line Interface Creation Kit
-#Pinned versions:
-#test that import:
-
-coremltools==5.0b5
-#Description: Apple framework for ML integration
-#Pinned versions: 5.0b5
-#test that import:
-
-#dataclasses #this breaks some platforms
-#Description: Provides decorators for auto adding special methods to user classes
-#Pinned versions:
-#test that import:
-
-expecttest==0.1.3
-#Description: method for writing tests where test framework auto populates
-# the expected output based on previous runs
-#Pinned versions: 0.1.3
-#test that import:
-
-flatbuffers==2.0
-#Description: cross platform serialization library
-#Pinned versions: 2.0
-#test that import:
-
-#future #this breaks linux-bionic-rocm4.5-py3.7
-#Description: compatibility layer between python 2 and python 3
-#Pinned versions:
-#test that import:
-
-hypothesis==4.53.2
-# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
-#Description: advanced library for generating parametrized tests
-#Pinned versions: 3.44.6, 4.53.2
-#test that import: test_xnnpack_integration.py, test_pruning_op.py, test_nn.py
-
-junitparser==2.1.1
-#Description: unitparser handles JUnit/xUnit Result XML files
-#Pinned versions: 2.1.1
-#test that import:
-
-librosa>=0.6.2
-#Description: A python package for music and audio analysis
-#Pinned versions: >=0.6.2
-#test that import: test_spectral_ops.py
-
-#mkl #this breaks linux-bionic-rocm4.5-py3.7
-#Description: Intel oneAPI Math Kernel Library
-#Pinned versions:
-#test that import: test_profiler.py, test_public_bindings.py, test_testing.py,
-#test_nn.py, test_mkldnn.py, test_jit.py, test_fx_experimental.py,
-#test_autograd.py
-
-#mkl-devel
-# see mkl
-
-#mock # breaks ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c
-#Description: A testing library that allows you to replace parts of your
-#system under test with mock objects
-#Pinned versions:
-#test that import: test_module_init.py, test_modules.py, test_nn.py,
-#test_testing.py
-
-#MonkeyType # breaks pytorch-xla-linux-bionic-py3.7-clang8
-#Description: collects runtime types of function arguments and return
-#values, and can automatically generate stub files
-#Pinned versions:
-#test that import:
-
-mypy==0.812
-# Pin MyPy version because new errors are likely to appear with each release
-#Description: linter
-#Pinned versions: 0.812
-#test that import: test_typing.py, test_type_hints.py
-
-#networkx
-#Description: creation, manipulation, and study of
-#the structure, dynamics, and functions of complex networks
-#Pinned versions: 2.0
-#test that import:
-
-#ninja
-#Description: build system.  Note that it install from
-#here breaks things so it is commented out
-#Pinned versions: 1.10.0.post1
-#test that import: run_test.py, test_cpp_extensions_aot.py,test_determination.py
-
-numba==0.49.0 ; python_version < "3.9"
-numba==0.54.1 ; python_version == "3.9"
-#Description: Just-In-Time Compiler for Numerical Functions
-#Pinned versions: 0.54.1, 0.49.0, <=0.49.1
-#test that import: test_numba_integration.py
-#For numba issue see https://github.com/pytorch/pytorch/issues/51511
-
-#numpy
-#Description: Provides N-dimensional arrays and linear algebra
-#Pinned versions: 1.20
-#test that import: test_view_ops.py, test_unary_ufuncs.py, test_type_promotion.py,
-#test_type_info.py, test_torch.py, test_tensorexpr_pybind.py, test_tensorexpr.py,
-#test_tensorboard.py, test_tensor_creation_ops.py, test_static_runtime.py,
-#test_spectral_ops.py, test_sort_and_select.py, test_shape_ops.py,
-#test_segment_reductions.py, test_reductions.py, test_pruning_op.py,
-#test_overrides.py, test_numpy_interop.py, test_numba_integration.py
-#test_nn.py, test_namedtensor.py, test_linalg.py, test_jit_cuda_fuser.py,
-#test_jit.py, test_indexing.py, test_datapipe.py, test_dataloader.py,
-#test_binary_ufuncs.py
-
-#onnxruntime
-#Description: scoring engine for Open Neural Network Exchange (ONNX) models
-#Pinned versions: 1.9.0
-#test that import:
-
-#pillow
-#Description:  Python Imaging Library fork
-#Pinned versions:
-#test that import:
-
-#protobuf
-#Description:  Google’s data interchange format
-#Pinned versions:
-#test that import: test_tensorboard.py
-
-psutil
-#Description: information on running processes and system utilization
-#Pinned versions:
-#test that import: test_profiler.py, test_openmp.py, test_dataloader.py
-
-pytest
-#Description: testing framework
-#Pinned versions:
-#test that import: test_typing.py, test_cpp_extensions_aot.py, run_test.py
-
-#pytest-benchmark
-#Description: fixture for benchmarking code
-#Pinned versions: 3.2.3
-#test that import:
-
-#pytest-sugar
-#Description: shows failures and errors instantly
-#Pinned versions:
-#test that import:
-
-#PyYAML
-#Description: data serialization format
-#Pinned versions:
-#test that import:
-
-#requests
-#Description: HTTP library
-#Pinned versions:
-#test that import: test_type_promotion.py
-
-#rich
-#Description: rich text and beautiful formatting in the terminal
-#Pinned versions: 10.9.0
-#test that import:
-
-scikit-image
-#Description: image processing routines
-#Pinned versions:
-#test that import: test_nn.py
-
-#scikit-learn
-#Description: machine learning package
-#Pinned versions: 0.20.3
-#test that import:
-
-scipy==1.6.3
-# Pin SciPy because of failing distribution tests (see #60347)
-#Description: scientific python
-#Pinned versions: 1.6.3
-#test that import: test_unary_ufuncs.py, test_torch.py,test_tensor_creation_ops.py
-#test_spectral_ops.py, test_sparse_csr.py, test_reductions.py,test_nn.py
-#test_linalg.py, test_binary_ufuncs.py
-
-#tabulate
-#Description: Pretty-print tabular data
-#Pinned versions:
-#test that import:
-
-tb-nightly
-#Description: TensorBoard
-#Pinned versions:
-#test that import:
-
-#typing-extensions
-#Description: type hints for python
-#Pinned versions:
-#test that import:
-
-#virtualenv
-#Description: virtual environment for python
-#Pinned versions:
-#test that import:
-
-unittest-xml-reporting<=3.2.0,>=2.0.0
-#Description: saves unit test results to xml
-#Pinned versions:
-#test that import:
--- a/.circleci/docker/ubuntu-cuda/Dockerfile
+++ b/.circleci/docker/ubuntu-cuda/Dockerfile
@ -1,11 +1,12 @@
 ARG UBUNTU_VERSION
 ARG CUDA_VERSION
-ARG IMAGE_NAME
+ARG CUDNN_VERSION

-FROM ${IMAGE_NAME}
+FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}

 ARG UBUNTU_VERSION
 ARG CUDA_VERSION
+ARG CUDNN_VERSION

 ENV DEBIAN_FRONTEND noninteractive

@ -23,13 +24,11 @@ ARG KATEX
 ADD ./common/install_katex.sh install_katex.sh
 RUN bash ./install_katex.sh && rm install_katex.sh

-# Install conda and other packages (e.g., numpy, pytest)
+# Install conda and other packages (e.g., numpy, coverage, pytest)
 ENV PATH /opt/conda/bin:$PATH
 ARG ANACONDA_PYTHON_VERSION
-ADD requirements-ci.txt /opt/conda/requirements-ci.txt
 ADD ./common/install_conda.sh install_conda.sh
 RUN bash ./install_conda.sh && rm install_conda.sh
-RUN rm /opt/conda/requirements-ci.txt

 # Install gcc
 ARG GCC_VERSION
@ -62,27 +61,22 @@ RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
 RUN rm install_vision.sh
 ENV INSTALLED_VISION ${VISION}

-ADD ./common/install_openssl.sh install_openssl.sh
-ENV OPENSSL_ROOT_DIR /opt/openssl
-RUN bash ./install_openssl.sh
-
-# (optional) Install non-default CMake version
-ARG CMAKE_VERSION
-ADD ./common/install_cmake.sh install_cmake.sh
-RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
-RUN rm install_cmake.sh
-
 # Install ccache/sccache (do this last, so we get priority in PATH)
 ADD ./common/install_cache.sh install_cache.sh
 ENV PATH /opt/cache/bin:$PATH
 RUN bash ./install_cache.sh && rm install_cache.sh
-ENV CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache
+ENV CUDA_NVCC_EXECUTABLE=/opt/cache/lib/nvcc

 # Add jni.h for java host build
 ADD ./common/install_jni.sh install_jni.sh
 ADD ./java/jni.h jni.h
 RUN bash ./install_jni.sh && rm install_jni.sh

+# Install NCCL for when CUDA is version 10.1
+ADD ./common/install_nccl.sh install_nccl.sh
+RUN if [ "${CUDA_VERSION}" = 10.1 ]; then bash ./install_nccl.sh; fi
+RUN rm install_nccl.sh
+
 # Install Open MPI for CUDA
 ADD ./common/install_openmpi.sh install_openmpi.sh
 RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
@ -95,16 +89,9 @@ ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
 # AWS specific CUDA build guidance
 ENV TORCH_CUDA_ARCH_LIST Maxwell
 ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
-ENV CUDA_PATH /usr/local/cuda

 # Install LLVM dev version (Defined in the pytorch/builder github repository)
 COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm

-# Install CUDNN
-ARG CUDNN_VERSION
-ADD ./common/install_cudnn.sh install_cudnn.sh
-RUN if [ "${CUDNN_VERSION}" -eq 8 ]; then bash install_cudnn.sh; fi
-RUN rm install_cudnn.sh
-
 USER jenkins
 CMD ["bash"]
--- a/.circleci/docker/ubuntu-rocm/Dockerfile
+++ b/.circleci/docker/ubuntu-rocm/Dockerfile
@ -6,10 +6,6 @@ ARG UBUNTU_VERSION

 ENV DEBIAN_FRONTEND noninteractive

-# Set AMD gpu targets to build for
-ARG PYTORCH_ROCM_ARCH
-ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
-
 # Install common dependencies (so that this step can be cached separately)
 ARG EC2
 ADD ./common/install_base.sh install_base.sh
@ -25,18 +21,11 @@ RUN bash ./install_clang.sh && rm install_clang.sh
 ADD ./common/install_user.sh install_user.sh
 RUN bash ./install_user.sh && rm install_user.sh

-# Install conda and other packages (e.g., numpy, pytest)
+# Install conda and other packages (e.g., numpy, coverage, pytest)
 ENV PATH /opt/conda/bin:$PATH
 ARG ANACONDA_PYTHON_VERSION
-ADD requirements-ci.txt /opt/conda/requirements-ci.txt
 ADD ./common/install_conda.sh install_conda.sh
 RUN bash ./install_conda.sh && rm install_conda.sh
-RUN rm /opt/conda/requirements-ci.txt
-
-# Install gcc
-ARG GCC_VERSION
-ADD ./common/install_gcc.sh install_gcc.sh
-RUN bash ./install_gcc.sh && rm install_gcc.sh

 # (optional) Install protobuf for ONNX
 ARG PROTOBUF
--- a/.circleci/docker/ubuntu/Dockerfile
+++ b/.circleci/docker/ubuntu/Dockerfile
@ -33,13 +33,11 @@ ARG KATEX
 ADD ./common/install_katex.sh install_katex.sh
 RUN bash ./install_katex.sh && rm install_katex.sh

-# Install conda and other packages (e.g., numpy, pytest)
+# Install conda and other packages (e.g., numpy, coverage, pytest)
 ENV PATH /opt/conda/bin:$PATH
 ARG ANACONDA_PYTHON_VERSION
-ADD requirements-ci.txt /opt/conda/requirements-ci.txt
 ADD ./common/install_conda.sh install_conda.sh
 RUN bash ./install_conda.sh && rm install_conda.sh
-RUN rm /opt/conda/requirements-ci.txt

 # Install gcc
 ARG GCC_VERSION
@ -108,10 +106,6 @@ ADD ./common/install_ninja.sh install_ninja.sh
 RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
 RUN rm install_ninja.sh

-ADD ./common/install_openssl.sh install_openssl.sh
-RUN bash ./install_openssl.sh
-ENV OPENSSL_ROOT_DIR /opt/openssl
-
 # Install ccache/sccache (do this last, so we get priority in PATH)
 ADD ./common/install_cache.sh install_cache.sh
 ENV PATH /opt/cache/bin:$PATH
--- a/.circleci/ecr_gc_docker/Dockerfile
+++ b/.circleci/ecr_gc_docker/Dockerfile
@ -0,0 +1,13 @@
+FROM ubuntu:16.04
+
+RUN apt-get update && apt-get install -y python-pip git && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log
+
+ADD requirements.txt /requirements.txt
+
+RUN pip install -r /requirements.txt
+
+ADD gc.py /usr/bin/gc.py
+
+ADD docker_hub.py /usr/bin/docker_hub.py
+
+ENTRYPOINT ["/usr/bin/gc.py"]
--- a/.circleci/ecr_gc_docker/docker_hub.py
+++ b/.circleci/ecr_gc_docker/docker_hub.py
@ -0,0 +1,125 @@
+#!/usr/bin/env python
+
+from collections import namedtuple
+
+import boto3
+import requests
+import os
+
+
+IMAGE_INFO = namedtuple(
+    "IMAGE_INFO", ("repo", "tag", "size", "last_updated_at", "last_updated_by")
+)
+
+
+def build_access_token(username, password):
+    r = requests.post(
+        "https://hub.docker.com/v2/users/login/",
+        data={"username": username, "password": password},
+    )
+    r.raise_for_status()
+    token = r.json().get("token")
+    return {"Authorization": "JWT " + token}
+
+
+def list_repos(user, token):
+    r = requests.get("https://hub.docker.com/v2/repositories/" + user, headers=token)
+    r.raise_for_status()
+    ret = sorted(
+        repo["user"] + "/" + repo["name"] for repo in r.json().get("results", [])
+    )
+    if ret:
+        print("repos found:")
+        print("".join("\n\t" + r for r in ret))
+    return ret
+
+
+def list_tags(repo, token):
+    r = requests.get(
+        "https://hub.docker.com/v2/repositories/" + repo + "/tags", headers=token
+    )
+    r.raise_for_status()
+    return [
+        IMAGE_INFO(
+            repo=repo,
+            tag=t["name"],
+            size=t["full_size"],
+            last_updated_at=t["last_updated"],
+            last_updated_by=t["last_updater_username"],
+        )
+        for t in r.json().get("results", [])
+    ]
+
+
+def save_to_s3(tags):
+    table_content = ""
+    client = boto3.client("s3")
+    for t in tags:
+        table_content += (
+            "<tr><td>{repo}</td><td>{tag}</td><td>{size}</td>"
+            "<td>{last_updated_at}</td><td>{last_updated_by}</td></tr>"
+        ).format(
+            repo=t.repo,
+            tag=t.tag,
+            size=t.size,
+            last_updated_at=t.last_updated_at,
+            last_updated_by=t.last_updated_by,
+        )
+    html_body = """
+    <html>
+        <head>
+            <link rel="stylesheet"
+                href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"
+                integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
+                crossorigin="anonymous">
+            <link rel="stylesheet" type="text/css"
+                href="https://cdn.datatables.net/1.10.20/css/jquery.dataTables.css">
+            <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">
+            </script>
+            <script type="text/javascript" charset="utf8"
+                src="https://cdn.datatables.net/1.10.20/js/jquery.dataTables.js"></script>
+            <title> docker image info</title>
+        </head>
+        <body>
+            <table class="table table-striped table-hover" id="docker">
+            <caption>Docker images on docker hub</caption>
+            <thead class="thead-dark">
+                <tr>
+                <th scope="col">repo</th>
+                <th scope="col">tag</th>
+                <th scope="col">size</th>
+                <th scope="col">last_updated_at</th>
+                <th scope="col">last_updated_by</th>
+                </tr>
+            </thead>
+            <tbody>
+                {table_content}
+            </tbody>
+            </table>
+        </body>
+        <script>
+            $(document).ready( function () {{
+                $('#docker').DataTable({{paging: false}});
+            }} );
+        </script>
+    </html>
+    """.format(
+        table_content=table_content
+    )
+    client.put_object(
+        Bucket="docker.pytorch.org",
+        ACL="public-read",
+        Key="docker_hub.html",
+        Body=html_body,
+        ContentType="text/html",
+    )
+
+
+if __name__ == "__main__":
+    username = os.environ.get("DOCKER_HUB_USERNAME")
+    password = os.environ.get("DOCKER_HUB_PASSWORD")
+    token = build_access_token(username, password)
+    tags = []
+    for repo in list_repos("pytorch", token):
+        tags.extend(list_tags(repo, token))
+    save_to_s3(tags)
--- a/.circleci/ecr_gc_docker/gc.py
+++ b/.circleci/ecr_gc_docker/gc.py
@ -0,0 +1,214 @@
+#!/usr/bin/env python
+
+import argparse
+import datetime
+import boto3
+import pytz
+import sys
+import re
+
+
+def save_to_s3(project, data):
+    table_content = ""
+    client = boto3.client("s3")
+    for repo, tag, window, age, pushed in data:
+        table_content += "<tr><td>{repo}</td><td>{tag}</td><td>{window}</td><td>{age}</td><td>{pushed}</td></tr>".format(
+            repo=repo, tag=tag, window=window, age=age, pushed=pushed
+        )
+    html_body = """
+    <html>
+        <head>
+            <link rel="stylesheet"
+                href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"
+                integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
+                crossorigin="anonymous">
+            <link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/1.10.20/css/jquery.dataTables.css">
+            <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
+            <script type="text/javascript" charset="utf8" src="https://cdn.datatables.net/1.10.20/js/jquery.dataTables.js"></script>
+            <title>{project} nightly and permanent docker image info</title>
+        </head>
+        <body>
+            <table class="table table-striped table-hover" id="docker">
+            <thead class="thead-dark">
+                <tr>
+                <th scope="col">repo</th>
+                <th scope="col">tag</th>
+                <th scope="col">keep window</th>
+                <th scope="col">age</th>
+                <th scope="col">pushed at</th>
+                </tr>
+            </thead>
+            <tbody>
+                {table_content}
+            </tbody>
+            </table>
+        </body>
+        <script>
+            $(document).ready( function () {{
+                $('#docker').DataTable({{paging: false}});
+            }} );
+        </script>
+    </html>
+    """.format(
+        project=project, table_content=table_content
+    )
+
+    # for pytorch, file can be found at
+    # http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html
+    # and later on we can configure docker.pytorch.org to point to that location
+
+    client.put_object(
+        Bucket="docker.pytorch.org",
+        ACL="public-read",
+        Key="{project}.html".format(project=project),
+        Body=html_body,
+        ContentType="text/html",
+    )
+
+
+def repos(client):
+    paginator = client.get_paginator("describe_repositories")
+    pages = paginator.paginate(registryId="308535385114")
+    for page in pages:
+        for repo in page["repositories"]:
+            yield repo
+
+
+def images(client, repository):
+    paginator = client.get_paginator("describe_images")
+    pages = paginator.paginate(
+        registryId="308535385114", repositoryName=repository["repositoryName"]
+    )
+    for page in pages:
+        for image in page["imageDetails"]:
+            yield image
+
+
+parser = argparse.ArgumentParser(description="Delete old Docker tags from registry")
+parser.add_argument(
+    "--dry-run", action="store_true", help="Dry run; print tags that would be deleted"
+)
+parser.add_argument(
+    "--debug", action="store_true", help="Debug, print ignored / saved tags"
+)
+parser.add_argument(
+    "--keep-stable-days",
+    type=int,
+    default=14,
+    help="Days of stable Docker tags to keep (non per-build images)",
+)
+parser.add_argument(
+    "--keep-unstable-days",
+    type=int,
+    default=1,
+    help="Days of unstable Docker tags to keep (per-build images)",
+)
+parser.add_argument(
+    "--filter-prefix",
+    type=str,
+    default="",
+    help="Only run cleanup for repositories with this prefix",
+)
+parser.add_argument(
+    "--ignore-tags",
+    type=str,
+    default="",
+    help="Never cleanup these tags (comma separated)",
+)
+args = parser.parse_args()
+
+if not args.ignore_tags or not args.filter_prefix:
+    print(
+        """
+Missing required arguments --ignore-tags and --filter-prefix
+
+You must specify --ignore-tags and --filter-prefix to avoid accidentally
+pruning a stable Docker tag which is being actively used.  This will
+make you VERY SAD.  So pay attention.
+
+First, which filter-prefix do you want?  The list of valid prefixes
+is in jobs/private.groovy under the 'docker-registry-cleanup' job.
+You probably want either pytorch or caffe2.
+
+Second, which ignore-tags do you want?  It should be whatever the most
+up-to-date DockerVersion for the repository in question is.  Follow
+the imports of jobs/pytorch.groovy to find them.
+"""
+    )
+    sys.exit(1)
+
+client = boto3.client("ecr", region_name="us-east-1")
+stable_window = datetime.timedelta(days=args.keep_stable_days)
+unstable_window = datetime.timedelta(days=args.keep_unstable_days)
+now = datetime.datetime.now(pytz.UTC)
+ignore_tags = args.ignore_tags.split(",")
+
+
+def chunks(chunkable, n):
+    """ Yield successive n-sized chunks from chunkable.
+    """
+    for i in range(0, len(chunkable), n):
+        yield chunkable[i : i + n]
+
+SHA_PATTERN = re.compile(r'^[0-9a-f]{40}$')
+def looks_like_git_sha(tag):
+    """Return True if a tag looks like a git SHA-1.
+
+    For reference a sha1 is 40 characters with only 0-9a-f and contains no
+    "-" characters
+    """
+    return re.match(SHA_PATTERN, tag) is not None
+
+stable_window_tags = []
+for repo in repos(client):
+    repositoryName = repo["repositoryName"]
+    if not repositoryName.startswith(args.filter_prefix):
+        continue
+
+    # Keep list of image digests to delete for this repository
+    digest_to_delete = []
+
+    for image in images(client, repo):
+        tags = image.get("imageTags")
+        if not isinstance(tags, (list,)) or len(tags) == 0:
+            continue
+        created = image["imagePushedAt"]
+        age = now - created
+        for tag in tags:
+            if any([
+                    looks_like_git_sha(tag),
+                    tag.isdigit(),
+                    tag.count("-") == 4,  # TODO: Remove, this no longer applies as tags are now built using a SHA1
+                    tag in ignore_tags]):
+                window = stable_window
+                if tag in ignore_tags:
+                    stable_window_tags.append((repositoryName, tag, "", age, created))
+                elif age < window:
+                    stable_window_tags.append((repositoryName, tag, window, age, created))
+            else:
+                window = unstable_window
+
+            if tag in ignore_tags or age < window:
+                if args.debug:
+                    print("Ignoring {}:{} (age: {})".format(repositoryName, tag, age))
+                break
+        else:
+            for tag in tags:
+                print("{}Deleting {}:{} (age: {})".format("(dry run) " if args.dry_run else "", repositoryName, tag, age))
+            digest_to_delete.append(image["imageDigest"])
+    if args.dry_run:
+        if args.debug:
+            print("Skipping actual deletion, moving on...")
+    else:
+        # Issue batch delete for all images to delete for this repository
+        # Note that as of 2018-07-25, the maximum number of images you can
+        # delete in a single batch is 100, so chunk our list into batches of
+        # 100
+        for c in chunks(digest_to_delete, 100):
+            client.batch_delete_image(
+                registryId="308535385114",
+                repositoryName=repositoryName,
+                imageIds=[{"imageDigest": digest} for digest in c],
+            )
+
+        save_to_s3(args.filter_prefix, stable_window_tags)
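Note on the retention rule in gc.py above: because of the for/else over an image's tags, an image digest is queued for deletion only when none of its tags is listed in `ignore_tags` or is younger than its keep window. The sketch below restates that rule as pure functions so it can be checked without touching ECR. `keep_window` and `should_delete` are hypothetical helper names introduced here for illustration and are not part of the script; the default windows simply mirror the `--keep-stable-days` and `--keep-unstable-days` defaults.

```python
import datetime
import re

SHA_PATTERN = re.compile(r"^[0-9a-f]{40}$")


def looks_like_git_sha(tag):
    """Same check as in gc.py: 40 lowercase hex characters."""
    return SHA_PATTERN.match(tag) is not None


def keep_window(tag, ignore_tags, stable_window, unstable_window):
    """Hypothetical helper: pick the keep window gc.py would use for a tag."""
    is_stable = (
        looks_like_git_sha(tag)
        or tag.isdigit()
        or tag.count("-") == 4
        or tag in ignore_tags
    )
    return stable_window if is_stable else unstable_window


def should_delete(
    tags,
    age,
    ignore_tags,
    stable_window=datetime.timedelta(days=14),
    unstable_window=datetime.timedelta(days=1),
):
    """Hypothetical helper: True only if no tag protects the image."""
    if not tags:
        return False  # gc.py skips images without tags entirely
    for tag in tags:
        window = keep_window(tag, ignore_tags, stable_window, unstable_window)
        if tag in ignore_tags or age < window:
            return False  # corresponds to the `break` in gc.py
    return True  # corresponds to the `else` branch of the for loop


def chunks(chunkable, n):
    """Yield successive n-sized chunks, as used for batch_delete_image."""
    for i in range(0, len(chunkable), n):
        yield chunkable[i : i + n]
```

For example, a two-day-old image tagged with both a 40-character SHA and a branch-style tag is kept, because the SHA tag is still inside the 14-day stable window; the same image carrying only the branch-style tag would be deleted.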
--- a/.circleci/ecr_gc_docker/requirements.txt
+++ b/.circleci/ecr_gc_docker/requirements.txt
@ -0,0 +1,3 @@
+boto3
+pytz
+requests
--- a/.circleci/generate_config_yml.py
+++ b/.circleci/generate_config_yml.py
@ -10,10 +10,20 @@ import shutil
 import sys
 from collections import namedtuple

+import cimodel.data.binary_build_definitions as binary_build_definitions
+import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
+import cimodel.data.simple.android_definitions
+import cimodel.data.simple.bazel_definitions
+import cimodel.data.simple.binary_smoketest
 import cimodel.data.simple.docker_definitions
+import cimodel.data.simple.ge_config_tests
+import cimodel.data.simple.ios_definitions
+import cimodel.data.simple.macos_definitions
 import cimodel.data.simple.mobile_definitions
+import cimodel.data.simple.nightly_android
 import cimodel.data.simple.nightly_ios
 import cimodel.data.simple.anaconda_prune_defintions
+import cimodel.data.windows_build_definitions as windows_build_definitions
 import cimodel.lib.miniutils as miniutils
 import cimodel.lib.miniyaml as miniyaml

@ -70,96 +80,44 @@ class Header(object):
        for line in filter(None, lines):
            output_filehandle.write(line + "\n")

-def _for_all_items(items, functor) -> None:
-    if isinstance(items, list):
-        for item in items:
-            _for_all_items(item, functor)
-    if isinstance(items, dict) and len(items) == 1:
-        item_type, item = next(iter(items.items()))
-        functor(item_type, item)
-
-def filter_master_only_jobs(items):
-    def _is_main_or_master_item(item):
-        filters = item.get('filters', None)
-        branches = filters.get('branches', None) if filters is not None else None
-        branches_only = branches.get('only', None) if branches is not None else None
-        return ('main' in branches_only or 'master' in branches_only) if branches_only is not None else False
-
-    master_deps = set()
-
-    def _save_requires_if_master(item_type, item):
-        requires = item.get('requires', None)
-        item_name = item.get("name", None)
-        if not isinstance(requires, list):
-            return
-        if _is_main_or_master_item(item) or item_name in master_deps:
-            master_deps.update([n.strip('"') for n in requires])
-
-    def _do_filtering(items):
-        if isinstance(items, list):
-            rc = [_do_filtering(item) for item in items]
-            return [item for item in rc if len(item if item is not None else []) > 0]
-        assert isinstance(items, dict) and len(items) == 1
-        item_type, item = next(iter(items.items()))
-        item_name = item.get("name", None)
-        item_name = item_name.strip('"') if item_name is not None else None
-        if not _is_main_or_master_item(item) and item_name not in master_deps:
-            return None
-        if 'filters' in item:
-            item = item.copy()
-            item.pop('filters')
-        return {item_type: item}
-
-    # Scan of dependencies twice to pick up nested required jobs
-    # I.e. jobs depending on jobs that main-only job depend on
-    _for_all_items(items, _save_requires_if_master)
-    _for_all_items(items, _save_requires_if_master)
-    return _do_filtering(items)
-
-def generate_required_docker_images(items):
-    required_docker_images = set()
-
-    def _requires_docker_image(item_type, item):
-        requires = item.get('requires', None)
-        if not isinstance(requires, list):
-            return
-        for requirement in requires:
-            requirement = requirement.replace('"', '')
-            if requirement.startswith('docker-'):
-                required_docker_images.add(requirement)
-
-    _for_all_items(items, _requires_docker_image)
-    return required_docker_images

 def gen_build_workflows_tree():
    build_workflows_functions = [
+        cimodel.data.simple.docker_definitions.get_workflow_jobs,
+        pytorch_build_definitions.get_workflow_jobs,
+        cimodel.data.simple.macos_definitions.get_workflow_jobs,
+        cimodel.data.simple.android_definitions.get_workflow_jobs,
+        cimodel.data.simple.ios_definitions.get_workflow_jobs,
        cimodel.data.simple.mobile_definitions.get_workflow_jobs,
+        cimodel.data.simple.ge_config_tests.get_workflow_jobs,
+        cimodel.data.simple.bazel_definitions.get_workflow_jobs,
+        cimodel.data.simple.binary_smoketest.get_workflow_jobs,
        cimodel.data.simple.nightly_ios.get_workflow_jobs,
+        cimodel.data.simple.nightly_android.get_workflow_jobs,
        cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs,
+        windows_build_definitions.get_windows_workflows,
+        binary_build_definitions.get_post_upload_jobs,
+        binary_build_definitions.get_binary_smoke_test_jobs,
    ]
-    build_jobs = [f() for f in build_workflows_functions]
-    build_jobs.extend(
-        cimodel.data.simple.docker_definitions.get_workflow_jobs(
-            # sort for consistency
-            sorted(generate_required_docker_images(build_jobs))
-        )
-    )
-    master_build_jobs = filter_master_only_jobs(build_jobs)

-    rc = {
+    binary_build_functions = [
+        binary_build_definitions.get_binary_build_jobs,
+        binary_build_definitions.get_nightly_tests,
+        binary_build_definitions.get_nightly_uploads,
+    ]
+
+    return {
        "workflows": {
+            "binary_builds": {
+                "when": r"<< pipeline.parameters.run_binary_tests >>",
+                "jobs": [f() for f in binary_build_functions],
+            },
            "build": {
                "when": r"<< pipeline.parameters.run_build >>",
-                "jobs": build_jobs,
+                "jobs": [f() for f in build_workflows_functions]
            },
        }
    }
-    if len(master_build_jobs) > 0:
-        rc["workflows"]["master_build"] = {
-            "when": r"<< pipeline.parameters.run_master_build >>",
-            "jobs": master_build_jobs,
-        }
-    return rc


 # Order of this list matters to the generated config.yml.
@ -170,14 +128,19 @@ YAML_SOURCES = [
    Header("Build parameters"),
    File("build-parameters/pytorch-build-params.yml"),
    File("build-parameters/binary-build-params.yml"),
+    File("build-parameters/promote-build-params.yml"),
    Header("Job specs"),
+    File("job-specs/pytorch-job-specs.yml"),
    File("job-specs/binary-job-specs.yml"),
    File("job-specs/job-specs-custom.yml"),
+    File("job-specs/job-specs-promote.yml"),
    File("job-specs/binary_update_htmls.yml"),
    File("job-specs/binary-build-tests.yml"),
    File("job-specs/docker_jobs.yml"),
    Header("Workflows"),
    Treegen(gen_build_workflows_tree, 0),
+    File("workflows/workflows-ecr-gc.yml"),
+    File("workflows/workflows-promote.yml"),
 ]


--- a/.circleci/regenerate.ps1
+++ b/.circleci/regenerate.ps1
@ -1,5 +0,0 @@
-cd $PSScriptRoot;
-$NewFile = New-TemporaryFile;
-python generate_config_yml.py > $NewFile.name
-(Get-Content $NewFile.name -Raw).TrimEnd().Replace("`r`n","`n") | Set-Content config.yml -Force
-Remove-Item $NewFile.name
--- a/.circleci/regenerate.sh
+++ b/.circleci/regenerate.sh
@ -1,17 +1,8 @@
-#!/bin/bash -e
+#!/bin/bash -xe

 # Allows this script to be invoked from any directory:
-cd "$(dirname "$0")"
-
-UNCOMMIT_CHANGE=$(git status -s | grep " config.yml" | wc -l | xargs)
-if [[ $UNCOMMIT_CHANGE != 0 ]]; then
-    OLD_FILE=$(mktemp)
-    cp config.yml "$OLD_FILE"
-    echo "Uncommitted change detected in .circleci/config.yml"
-    echo "It has been backed up to $OLD_FILE"
-fi
+cd $(dirname "$0")

 NEW_FILE=$(mktemp)
-./generate_config_yml.py > "$NEW_FILE"
-cp "$NEW_FILE" config.yml
-echo "New config generated in .circleci/config.yml"
+./generate_config_yml.py > $NEW_FILE
+cp $NEW_FILE config.yml
--- a/.circleci/scripts/binary_checkout.sh
+++ b/.circleci/scripts/binary_checkout.sh
@ -49,20 +49,19 @@ if [[ -n "${CIRCLE_PR_NUMBER:-}" ]]; then
  git reset --hard "$CIRCLE_SHA1"
 elif [[ -n "${CIRCLE_SHA1:-}" ]]; then
  # Scheduled workflows & "smoke" binary build on master on PR merges
-  DEFAULT_BRANCH="$(git remote show $CIRCLE_REPOSITORY_URL | awk '/HEAD branch/ {print $NF}')"
  git reset --hard "$CIRCLE_SHA1"
-  git checkout -q -B $DEFAULT_BRANCH
+  git checkout -q -B master
 else
  echo "Can't tell what to checkout"
  exit 1
 fi
-retry git submodule update --init --recursive --jobs 0
+retry git submodule update --init --recursive
 echo "Using Pytorch from "
 git --no-pager log --max-count 1
 popd

 # Clone the Builder master repo
-retry git clone -q https://github.com/pytorch/builder.git -b release/1.12 "$BUILDER_ROOT"
+retry git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
 pushd "$BUILDER_ROOT"
 echo "Using builder from "
 git --no-pager log --max-count 1
--- a/.circleci/scripts/binary_ios_build.sh
+++ b/.circleci/scripts/binary_ios_build.sh
@ -15,14 +15,14 @@ export PATH="~/anaconda/bin:${PATH}"
 source ~/anaconda/bin/activate

 # Install dependencies
-conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
+conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests --yes
 conda install -c conda-forge valgrind --yes
 export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}

 # sync submodules
 cd ${PROJ_ROOT}
 git submodule sync
-git submodule update --init --recursive --jobs 0
+git submodule update --init --recursive

 # run build script
 chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
@ -31,12 +31,8 @@ cat ${PROJ_ROOT}/scripts/build_ios.sh
 echo "########################################################"
 echo "IOS_ARCH: ${IOS_ARCH}"
 echo "IOS_PLATFORM: ${IOS_PLATFORM}"
-echo "USE_PYTORCH_METAL: ${USE_PYTORCH_METAL}"
-echo "USE_COREML_DELEGATE: ${USE_COREML_DELEGATE}"
 export IOS_ARCH=${IOS_ARCH}
 export IOS_PLATFORM=${IOS_PLATFORM}
-export USE_PYTORCH_METAL=${USE_PYTORCH_METAL}
-export USE_COREML_DELEGATE=${USE_COREML_DELEGATE}
 unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts

 #store the binary
--- a/.circleci/scripts/binary_ios_test.sh
+++ b/.circleci/scripts/binary_ios_test.sh
@ -8,23 +8,22 @@ cd ${PROJ_ROOT}/ios/TestApp
 # install fastlane
 sudo gem install bundler && bundle install
 # install certificates
-echo "${IOS_CERT_KEY_2022}" >> cert.txt
+echo "${IOS_CERT_KEY}" >> cert.txt
 base64 --decode cert.txt -o Certificates.p12
 rm cert.txt
-bundle exec fastlane install_root_cert
-bundle exec fastlane install_dev_cert
+bundle exec fastlane install_cert
 # install the provisioning profile
-PROFILE=PyTorch_CI_2022.mobileprovision
+PROFILE=PyTorch_CI_2021.mobileprovision
 PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
 mkdir -pv "${PROVISIONING_PROFILES}"
 cd "${PROVISIONING_PROFILES}"
-echo "${IOS_SIGN_KEY_2022}" >> cert.txt
+echo "${IOS_SIGN_KEY}" >> cert.txt
 base64 --decode cert.txt -o ${PROFILE}
 rm cert.txt
 # run the ruby build script
 if ! [ -x "$(command -v xcodebuild)" ]; then
    echo 'Error: xcodebuild is not installed.'
    exit 1
-fi
-PROFILE=PyTorch_CI_2022
+fi 
+PROFILE=PyTorch_CI_2021
 ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
--- a/.circleci/scripts/binary_ios_upload.sh
+++ b/.circleci/scripts/binary_ios_upload.sh
@ -23,27 +23,15 @@ do
    fi
 done
 lipo -i ${ZIP_DIR}/install/lib/*.a
-echo "BUILD_LITE_INTERPRETER: ${BUILD_LITE_INTERPRETER}"
 # copy the umbrella header and license
-if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
-    cp ${PROJ_ROOT}/ios/LibTorch-Lite.h ${ZIP_DIR}/src/
-else
-    cp ${PROJ_ROOT}/ios/LibTorch.h ${ZIP_DIR}/src/
-fi
+cp ${PROJ_ROOT}/ios/LibTorch.h ${ZIP_DIR}/src/
 cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
 # zip the library
-export DATE="$(date -u +%Y%m%d)"
-export IOS_NIGHTLY_BUILD_VERSION="1.12.0.${DATE}"
-if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
-    # libtorch_lite_ios_nightly_1.11.0.20210810.zip
-    ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"
-else
-    ZIPFILE="libtorch_ios_nightly_build.zip"
-fi
+ZIPFILE=libtorch_ios_nightly_build.zip
 cd ${ZIP_DIR}
 #for testing
 touch version.txt
-echo "${IOS_NIGHTLY_BUILD_VERSION}" > version.txt
+echo $(date +%s) > version.txt
 zip -r ${ZIPFILE} install src version.txt LICENSE
 # upload to aws
 # Install conda then 'conda install' awscli
@ -60,16 +48,3 @@ set +x
 # echo "AWS KEY: ${AWS_ACCESS_KEY_ID}"
 # echo "AWS SECRET: ${AWS_SECRET_ACCESS_KEY}"
 aws s3 cp ${ZIPFILE} s3://ossci-ios-build/ --acl public-read
-
-if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
-    # create a new LibTorch-Lite-Nightly.podspec from the template
-    echo "cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec"
-    cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
-
-    # update pod version
-    sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/${IOS_NIGHTLY_BUILD_VERSION}/g" ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
-    cat ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
-
-    # push the new LibTorch-Lite-Nightly.podspec to CocoaPods
-    pod trunk push --verbose --allow-warnings --use-libraries --skip-import-validation ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
-fi
--- a/.circleci/scripts/binary_linux_build.sh
+++ b/.circleci/scripts/binary_linux_build.sh
@ -4,14 +4,10 @@ echo "RUNNING ON $(uname -a) WITH $(nproc) CPUS AND $(free -m)"
 set -eux -o pipefail
 source /env

-# Because most Circle executors only have 20 CPUs, using more causes OOMs w/ Ninja and nvcc parallelization
-MEMORY_LIMIT_MAX_JOBS=18
-NUM_CPUS=$(( $(nproc) - 2 ))
+# Defaults here so they can be changed in one place
+export MAX_JOBS=${MAX_JOBS:-$(( $(nproc) - 2 ))}

-# Defaults here for **binary** linux builds so they can be changed in one place
-export MAX_JOBS=${MAX_JOBS:-$(( ${NUM_CPUS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${NUM_CPUS} ))}
-
-if [[ "${DESIRED_CUDA}" =~ cu11[0-9] ]]; then
+if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
  export BUILD_SPLIT_CUDA="ON"
 fi

@ -26,9 +22,5 @@ else
  build_script='manywheel/build.sh'
 fi

-if [[ "$CIRCLE_BRANCH" == "main" ]] || [[ "$CIRCLE_BRANCH" == "master" ]] || [[ "$CIRCLE_BRANCH" == release/* ]]; then
-  export BUILD_DEBUG_INFO=1
-fi
-
 # Build the package
 SKIP_ALL_TESTS=1 "/builder/$build_script"
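Note on the MAX_JOBS default in binary_linux_build.sh above: the added line keeps an already-set `MAX_JOBS` and otherwise falls back to `nproc - 2`, while the removed lines additionally capped the fallback at 18. A minimal Python sketch of both behaviours, for comparison only: `default_max_jobs` and its `memory_cap` parameter are hypothetical names, and the function returns an integer where the shell script exports a string.

```python
import os


def default_max_jobs(nproc, env=None, memory_cap=None):
    """Hypothetical helper mirroring MAX_JOBS=${MAX_JOBS:-$(( nproc - 2 ))}.

    Passing memory_cap=18 mimics the removed lines, which clamped the
    fallback to MEMORY_LIMIT_MAX_JOBS.
    """
    env = os.environ if env is None else env
    # `${VAR:-default}` uses the default when VAR is unset or empty.
    if env.get("MAX_JOBS"):
        return int(env["MAX_JOBS"])
    jobs = nproc - 2
    if memory_cap is not None:
        jobs = min(jobs, memory_cap)
    return jobs
```

On a 36-CPU executor the uncapped fallback is 34 jobs, whereas the capped variant stays at 18. On a machine with two or fewer CPUs the expression yields zero or a negative number, which neither version of the script guards against.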
--- a/.circleci/scripts/binary_linux_test.sh
+++ b/.circleci/scripts/binary_linux_test.sh
@ -1,28 +1,18 @@
 #!/bin/bash

-OUTPUT_SCRIPT=${OUTPUT_SCRIPT:-/home/circleci/project/ci_test_script.sh}
-
-# only source if file exists
-if [[ -f /home/circleci/project/env ]]; then
-  source /home/circleci/project/env
-fi
-cat >"${OUTPUT_SCRIPT}" <<EOL
+source /home/circleci/project/env
+cat >/home/circleci/project/ci_test_script.sh <<EOL
 # =================== The following code will be executed inside Docker container ===================
 set -eux -o pipefail

-retry () {
-    "\$@"  || (sleep 1 && "\$@") || (sleep 2 && "\$@")
-}
-
-# Source binary env file here if exists
-if [[ -e "${BINARY_ENV_FILE:-/nofile}" ]]; then
-  source "${BINARY_ENV_FILE:-/nofile}"
-fi
-
 python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"

 # Set up Python
 if [[ "$PACKAGE_TYPE" == conda ]]; then
+  # There was a bug that was introduced in conda-package-handling >= 1.6.1 that makes archives
+  # above a certain size fail out when attempting to extract
+  # see: https://github.com/conda/conda-package-handling/issues/71
+  conda install -y conda-package-handling=1.6.0
  retry conda create -qyn testenv python="$DESIRED_PYTHON"
  source activate testenv >/dev/null
 elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
@ -37,38 +27,24 @@ fi

 EXTRA_CONDA_FLAGS=""
 NUMPY_PIN=""
-PROTOBUF_PACKAGE="defaults::protobuf"
-if [[ "\$python_nodot" = *310* ]]; then
-  EXTRA_CONDA_FLAGS="-c=conda-forge"
-  # There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
-  # we set a lower boundary here just to be safe
-  NUMPY_PIN=">=1.21.2"
-  PROTOBUF_PACKAGE="protobuf>=3.19.0"
-fi
-
-if [[ "\$python_nodot" = *39*  ]]; then
+if [[ "\$python_nodot" = *39* ]]; then
  EXTRA_CONDA_FLAGS="-c=conda-forge"
  # There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
  # we set a lower boundary here just to be safe
  NUMPY_PIN=">=1.20"
 fi

-if [[ "$DESIRED_CUDA" == "cu116" ]]; then
+if [[ "$DESIRED_CUDA" == "cu112" ]]; then
  EXTRA_CONDA_FLAGS="-c=conda-forge"
 fi

-# Move debug wheels out of the the package dir so they don't get installed
-mkdir -p /tmp/debug_final_pkgs
-mv /final_pkgs/debug-*.zip /tmp/debug_final_pkgs || echo "no debug packages to move"
-
 # Install the package
 # These network calls should not have 'retry's because they are installing
 # locally and aren't actually network calls
 # TODO there is duplicated and inconsistent test-python-env setup across this
 #   file, builder/smoke_test.sh, and builder/run_tests.sh, and also in the
 #   conda build scripts themselves. These should really be consolidated
-# Pick only one package of multiple available (which happens as result of workflow re-runs)
-pkg="/final_pkgs/\$(ls -1 /final_pkgs|sort|tail -1)"
+pkg="/final_pkgs/\$(ls /final_pkgs)"
 if [[ "$PACKAGE_TYPE" == conda ]]; then
  (
    # For some reason conda likes to re-activate the conda environment when attempting this install
@ -83,7 +59,7 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
      ninja \
      dataclasses \
      typing-extensions \
-      ${PROTOBUF_PACKAGE} \
+      defaults::protobuf \
      six
    if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
      retry conda install -c pytorch -y cpuonly
@ -116,4 +92,4 @@ EOL
 echo
 echo
 echo "The script that will run in the next step is:"
-cat "${OUTPUT_SCRIPT}"
+cat /home/circleci/project/ci_test_script.sh
--- a/.circleci/scripts/binary_macos_build.sh
+++ b/.circleci/scripts/binary_macos_build.sh
@ -1,19 +1,24 @@
 #!/bin/bash
 set -eux -o pipefail

-source "${BINARY_ENV_FILE:-/Users/distiller/project/env}"
+source "/Users/distiller/project/env"
 mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"

-if [[ -z "${IS_GHA:-}" ]]; then
-  export PATH="${workdir:-${HOME}}/miniconda/bin:${PATH}"
-fi
+# For some reason `unbuffer` breaks if we change the PATH here, so we
+# write a script with the PATH change in it and unbuffer the whole
+# thing
+build_script="$workdir/build_script.sh"
+touch "$build_script"
+chmod +x "$build_script"

 # Build
-export USE_PYTORCH_METAL_EXPORT=1
-export USE_COREML_DELEGATE=1
+cat >"$build_script" <<EOL
+export PATH="$workdir/miniconda/bin:$PATH"
 if [[ "$PACKAGE_TYPE" == conda ]]; then
-  "${BUILDER_ROOT}/conda/build_pytorch.sh"
+  "$workdir/builder/conda/build_pytorch.sh"
 else
  export TORCH_PACKAGE_NAME="$(echo $TORCH_PACKAGE_NAME | tr '-' '_')"
-  "${BUILDER_ROOT}/wheel/build_wheel.sh"
+  "$workdir/builder/wheel/build_wheel.sh"
 fi
+EOL
+unbuffer "$build_script" | ts
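Review note: the heredoc added above uses an unquoted `EOL` delimiter, so `$workdir`, `$PATH`, and `$PACKAGE_TYPE` are expanded when `build_script.sh` is written, not when it runs. A minimal sketch of the difference follows; `demo_dir` and the file names are made up for illustration and do not appear in the script.

```shell
#!/bin/bash
set -eu -o pipefail

# Illustrative values only; the real script gets these from the sourced env file.
demo_dir="$(mktemp -d)"
PACKAGE_TYPE=conda

# Unquoted delimiter: $PACKAGE_TYPE is expanded now, while the file is written.
cat >"$demo_dir/unquoted.sh" <<EOL
echo "package type: $PACKAGE_TYPE"
EOL

# Quoted delimiter: the text is written verbatim and expanded only when run.
cat >"$demo_dir/quoted.sh" <<'EOL'
echo "package type: $PACKAGE_TYPE"
EOL

cat "$demo_dir/unquoted.sh"   # the file contains the literal value
cat "$demo_dir/quoted.sh"     # the file contains the variable reference
```

With the unquoted form, the generated script is a snapshot of the environment at generation time, which seems to be the intent here since the values come from the env file sourced at the top.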
--- a/.circleci/scripts/binary_populate_env.sh
+++ b/.circleci/scripts/binary_populate_env.sh
@ -5,70 +5,53 @@ export TZ=UTC
 tagged_version() {
  # Grabs version from either the env variable CIRCLE_TAG
  # or the pytorch git described version
-  if [[ "$OSTYPE" == "msys" &&  -z "${IS_GHA:-}" ]]; then
-    GIT_DIR="${workdir}/p/.git"
+  if [[ "$OSTYPE" == "msys" ]]; then
+    GIT_DESCRIBE="git --git-dir ${workdir}/p/.git describe"
  else
-    GIT_DIR="${workdir}/pytorch/.git"
+    GIT_DESCRIBE="git --git-dir ${workdir}/pytorch/.git describe"
  fi
-  GIT_DESCRIBE="git --git-dir ${GIT_DIR} describe --tags --match v[0-9]*.[0-9]*.[0-9]*"
  if [[ -n "${CIRCLE_TAG:-}" ]]; then
    echo "${CIRCLE_TAG}"
-  elif [[ ! -d "${GIT_DIR}" ]]; then
-    echo "Abort, abort! Git dir ${GIT_DIR} does not exists!"
-    kill $$
-  elif ${GIT_DESCRIBE} --exact >/dev/null; then
-    ${GIT_DESCRIBE}
+  elif ${GIT_DESCRIBE} --exact --tags >/dev/null; then
+    ${GIT_DESCRIBE} --tags
  else
    return 1
  fi
 }

-# These are only relevant for CircleCI
-# TODO: Remove these later once migrated fully to GHA
-if [[ -z ${IS_GHA:-} ]]; then
-  # We need to write an envfile to persist these variables to following
-  # steps, but the location of the envfile depends on the circleci executor
-  if [[ "$(uname)" == Darwin ]]; then
-    # macos executor (builds and tests)
-    workdir="/Users/distiller/project"
-  elif [[ "$OSTYPE" == "msys" ]]; then
-    # windows executor (builds and tests)
-    workdir="/c/w"
-  elif [[ -d "/home/circleci/project" ]]; then
-    # machine executor (binary tests)
-    workdir="/home/circleci/project"
-  else
-    # docker executor (binary builds)
-    workdir="/"
-  fi
-  envfile="$workdir/env"
-  touch "$envfile"
-  chmod +x "$envfile"
+# We need to write an envfile to persist these variables to following
+# steps, but the location of the envfile depends on the circleci executor
+if [[ "$(uname)" == Darwin ]]; then
+  # macos executor (builds and tests)
+  workdir="/Users/distiller/project"
+elif [[ "$OSTYPE" == "msys" ]]; then
+  # windows executor (builds and tests)
+  workdir="/c/w"
+elif [[ -d "/home/circleci/project" ]]; then
+  # machine executor (binary tests)
+  workdir="/home/circleci/project"
+else
+  # docker executor (binary builds)
+  workdir="/"
+fi
+envfile="$workdir/env"
+touch "$envfile"
+chmod +x "$envfile"

-  # Parse the BUILD_ENVIRONMENT to package type, python, and cuda
-  configs=($BUILD_ENVIRONMENT)
-  export PACKAGE_TYPE="${configs[0]}"
-  export DESIRED_PYTHON="${configs[1]}"
-  export DESIRED_CUDA="${configs[2]}"
-  if [[ "${OSTYPE}" == "msys" ]]; then
-    export DESIRED_DEVTOOLSET=""
-    export LIBTORCH_CONFIG="${configs[3]:-}"
-    if [[ "$LIBTORCH_CONFIG" == 'debug' ]]; then
-      export DEBUG=1
-    fi
-  else
-    export DESIRED_DEVTOOLSET="${configs[3]:-}"
+# Parse the BUILD_ENVIRONMENT to package type, python, and cuda
+configs=($BUILD_ENVIRONMENT)
+export PACKAGE_TYPE="${configs[0]}"
+export DESIRED_PYTHON="${configs[1]}"
+export DESIRED_CUDA="${configs[2]}"
+if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
+  export DESIRED_DEVTOOLSET=""
+  export LIBTORCH_CONFIG="${configs[3]:-}"
+  if [[ "$LIBTORCH_CONFIG" == 'debug' ]]; then
+    export DEBUG=1
  fi
 else
-  envfile=${BINARY_ENV_FILE:-/tmp/env}
-  if [[ -n "${PYTORCH_ROOT}"  ]]; then
-    workdir=$(dirname "${PYTORCH_ROOT}")
-  else
-    # docker executor (binary builds)
-    workdir="/"
-  fi
+  export DESIRED_DEVTOOLSET="${configs[3]:-}"
 fi
-
 if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
  export BUILD_PYTHONLESS=1
 fi
@ -79,25 +62,18 @@ if [[ -z "$DOCKER_IMAGE" ]]; then
  if [[ "$PACKAGE_TYPE" == conda ]]; then
    export DOCKER_IMAGE="pytorch/conda-cuda"
  elif [[ "$DESIRED_CUDA" == cpu ]]; then
-    export DOCKER_IMAGE="pytorch/manylinux-cpu"
+    export DOCKER_IMAGE="pytorch/manylinux-cuda100"
  else
    export DOCKER_IMAGE="pytorch/manylinux-cuda${DESIRED_CUDA:2}"
  fi
 fi

-USE_GOLD_LINKER="OFF"
-# GOLD linker can not be used if CUPTI is statically linked into PyTorch, see https://github.com/pytorch/pytorch/issues/57744
-if [[ ${DESIRED_CUDA} == "cpu" ]]; then
-  USE_GOLD_LINKER="ON"
-fi
-
-
 # Default to nightly, since that's where this normally uploads to
 PIP_UPLOAD_FOLDER='nightly/'
 # We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
 export DATE="$(date -u +%Y%m%d)"
 #TODO: We should be pulling semver version from the base version.txt
-BASE_BUILD_VERSION="1.12.0.dev$DATE"
+BASE_BUILD_VERSION="1.8.0.dev$DATE"
 # Change BASE_BUILD_VERSION to git tag when on a git tag
 # Use 'git -C' to make doubly sure we're in the correct directory for checking
 # the git tag
@ -109,7 +85,7 @@ if tagged_version >/dev/null; then
  # Turns tag v1.6.0-rc1 -> v1.6.0
  BASE_BUILD_VERSION="$(tagged_version | sed -e 's/^v//' -e 's/-.*$//')"
 fi
-if [[ "$(uname)" == 'Darwin' ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
+if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu102" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
  export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}"
 else
  export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"
@ -143,28 +119,24 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
  fi
 fi

-cat >"$envfile" <<EOL
+cat >>"$envfile" <<EOL
 # =================== The following code will be executed inside Docker container ===================
 export TZ=UTC
 echo "Running on $(uname -a) at $(date)"

 export PACKAGE_TYPE="$PACKAGE_TYPE"
-export DESIRED_PYTHON="${DESIRED_PYTHON:-}"
+export DESIRED_PYTHON="$DESIRED_PYTHON"
 export DESIRED_CUDA="$DESIRED_CUDA"
 export LIBTORCH_VARIANT="${LIBTORCH_VARIANT:-}"
 export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
-if [[ "${OSTYPE}" == "msys" ]]; then
+export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
+if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
  export LIBTORCH_CONFIG="${LIBTORCH_CONFIG:-}"
-  if [[ "${LIBTORCH_CONFIG:-}" == 'debug' ]]; then
-    export DEBUG=1
-  fi
-  export DESIRED_DEVTOOLSET=""
-else
-  export DESIRED_DEVTOOLSET="${DESIRED_DEVTOOLSET:-}"
+  export DEBUG="${DEBUG:-}"
 fi

 export DATE="$DATE"
-export NIGHTLIES_DATE_PREAMBLE=1.12.0.dev
+export NIGHTLIES_DATE_PREAMBLE=1.8.0.dev
 export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
 export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
 export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
@ -172,7 +144,6 @@ export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
 # TODO: We don't need this anymore IIUC
 export TORCH_PACKAGE_NAME='torch'
 export TORCH_CONDA_BUILD_FOLDER='pytorch-nightly'
-export ANACONDA_USER='pytorch'

 export USE_FBGEMM=1
 export JAVA_HOME=$JAVA_HOME
@ -180,48 +151,26 @@ export BUILD_JNI=$BUILD_JNI
 export PIP_UPLOAD_FOLDER="$PIP_UPLOAD_FOLDER"
 export DOCKER_IMAGE="$DOCKER_IMAGE"

+export workdir="$workdir"
+export MAC_PACKAGE_WORK_DIR="$workdir"
+if [[ "$OSTYPE" == "msys" ]]; then
+  export PYTORCH_ROOT="$workdir/p"
+  export BUILDER_ROOT="$workdir/b"
+else
+  export PYTORCH_ROOT="$workdir/pytorch"
+  export BUILDER_ROOT="$workdir/builder"
+fi
+export MINICONDA_ROOT="$workdir/miniconda"
+export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"

-export USE_GOLD_LINKER="${USE_GOLD_LINKER}"
-export USE_GLOO_WITH_OPENSSL="ON"
+export CIRCLE_TAG="${CIRCLE_TAG:-}"
+export CIRCLE_SHA1="$CIRCLE_SHA1"
+export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
+export CIRCLE_BRANCH="$CIRCLE_BRANCH"
+export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
 # =================== The above code will be executed inside Docker container ===================
 EOL

-# nproc doesn't exist on darwin
-if [[ "$(uname)" != Darwin ]]; then
-  # Because most Circle executors only have 20 CPUs, using more causes OOMs w/ Ninja and nvcc parallelization
-  MEMORY_LIMIT_MAX_JOBS=18
-  NUM_CPUS=$(( $(nproc) - 2 ))
-
-  # Defaults here for **binary** linux builds so they can be changed in one place
-  export MAX_JOBS=${MAX_JOBS:-$(( ${NUM_CPUS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${NUM_CPUS} ))}
-
-  cat >>"$envfile" <<EOL
-  export MAX_JOBS="${MAX_JOBS}"
-EOL
-fi
-
-if [[ -z "${IS_GHA:-}" ]]; then
-  cat >>"$envfile" <<EOL
-  export workdir="$workdir"
-  export MAC_PACKAGE_WORK_DIR="$workdir"
-  if [[ "$OSTYPE" == "msys" ]]; then
-    export PYTORCH_ROOT="$workdir/p"
-    export BUILDER_ROOT="$workdir/b"
-  else
-    export PYTORCH_ROOT="$workdir/pytorch"
-    export BUILDER_ROOT="$workdir/builder"
-  fi
-  export MINICONDA_ROOT="$workdir/miniconda"
-  export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"
-
-  export CIRCLE_TAG="${CIRCLE_TAG:-}"
-  export CIRCLE_SHA1="$CIRCLE_SHA1"
-  export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
-  export CIRCLE_BRANCH="$CIRCLE_BRANCH"
-  export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
-EOL
-fi
-
 echo 'retry () {' >> "$envfile"
 echo '    $*  || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)' >> "$envfile"
 echo '}' >> "$envfile"
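Review note: a small sketch of how the build version string appears to be assembled in this file after the change. `strip_tag` and `build_version` are hypothetical helpers written for this note; the script does the same work inline. The context comment says the tag becomes `v1.6.0`, but the first `sed` expression also drops the leading `v`.

```shell
#!/bin/bash
set -eu -o pipefail

# Hypothetical helper: same sed expressions the script applies to tagged_version.
strip_tag() {
  echo "$1" | sed -e 's/^v//' -e 's/-.*$//'
}

# Hypothetical helper: mirrors the Darwin / cu102 / conda branch added in this diff.
# Arguments: base version, DESIRED_CUDA, PACKAGE_TYPE, output of uname.
build_version() {
  local base="$1" cuda="$2" pkg_type="$3" os="$4"
  if [[ "$os" == 'Darwin' ]] || [[ "$cuda" == "cu102" ]] || [[ "$pkg_type" == conda ]]; then
    echo "$base"
  else
    echo "${base}+${cuda}"
  fi
}

strip_tag "v1.6.0-rc1"
build_version "1.8.0" "cu111" "manywheel" "Linux"
build_version "1.8.0" "cu102" "manywheel" "Linux"
```

Under these assumptions only the non-cu102, non-conda, non-macOS builds get a `+<cuda>` local version suffix.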
--- a/.circleci/scripts/binary_upload.sh
+++ b/.circleci/scripts/binary_upload.sh
@ -63,10 +63,6 @@ s3_upload() {
  )
 }

-# Install dependencies (should be a no-op if previously installed)
-conda install -yq anaconda-client
-pip install -q awscli
-
 case "${PACKAGE_TYPE}" in
  conda)
    conda_upload
--- a/.circleci/scripts/binary_windows_build.sh
+++ b/.circleci/scripts/binary_windows_build.sh
@ -1,68 +1,34 @@
 #!/bin/bash
 set -eux -o pipefail

-source "${BINARY_ENV_FILE:-/c/w/env}"
+source "/c/w/env"
 mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"

 export CUDA_VERSION="${DESIRED_CUDA/cu/}"
 export USE_SCCACHE=1
 export SCCACHE_BUCKET=ossci-compiler-cache-windows
-export SCCACHE_IGNORE_SERVER_IO_ERROR=1
-export VC_YEAR=2019
+export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"

-if [[ "${DESIRED_CUDA}" == *"cu11"* ]]; then
-    export BUILD_SPLIT_CUDA=ON
+if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
+  export VC_YEAR=2017
+else
+  export VC_YEAR=2019
 fi

+if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
+  export BUILD_SPLIT_CUDA="ON"
+fi

-echo "Free Space for CUDA DEBUG BUILD"
-if [[ "${CIRCLECI:-}" == 'true' ]]; then
-    export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
-    if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community" ]]; then
-        rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community"
-    fi
+set +x
+export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
+export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
+set -x

-    if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0" ]]; then
-        rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0"
-    fi
-
-    if [[ -d "C:\\Program Files (x86)\\Microsoft.NET" ]]; then
-        rm -rf "C:\\Program Files (x86)\\Microsoft.NET"
-    fi
-
-    if [[ -d "C:\\Program Files\\dotnet" ]]; then
-        rm -rf "C:\\Program Files\\dotnet"
-    fi
-
-    if [[ -d "C:\\Program Files (x86)\\dotnet" ]]; then
-        rm -rf "C:\\Program Files (x86)\\dotnet"
-    fi
-
-    if [[ -d "C:\\Program Files (x86)\\Microsoft SQL Server" ]]; then
-        rm -rf "C:\\Program Files (x86)\\Microsoft SQL Server"
-    fi
-
-    if [[ -d "C:\\Program Files (x86)\\Xamarin" ]]; then
-        rm -rf "C:\\Program Files (x86)\\Xamarin"
-    fi
-
-    if [[ -d "C:\\Program Files (x86)\\Google" ]]; then
-        rm -rf "C:\\Program Files (x86)\\Google"
-    fi
-    set +x
-    export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
-    export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
-    set -x
-    if [[ -d "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" ]]; then
-        mv "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" .
-        rm -rf "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
-        mkdir -p "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
-        mv _Instances "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
-    fi
-    if [[ -d "C:\\Microsoft" ]]; then
-        # don't use quotes here
-        rm -rf /c/Microsoft/AndroidNDK*
-    fi
+if [[ "$CIRCLECI" == 'true' && -d "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" ]]; then
+  mv "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" .
+  rm -rf "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
+  mkdir -p "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
+  mv _Instances "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
 fi

 echo "Free space on filesystem before build:"
@ -70,10 +36,9 @@ df -h

 pushd "$BUILDER_ROOT"
 if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
-    ./windows/internal/build_conda.bat
+  ./windows/internal/build_conda.bat
 elif [[ "$PACKAGE_TYPE" == 'wheel' || "$PACKAGE_TYPE" == 'libtorch' ]]; then
-    export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
-    ./windows/internal/build_wheels.bat
+  ./windows/internal/build_wheels.bat
 fi

 echo "Free space on filesystem after build:"
--- a/.circleci/scripts/binary_windows_test.sh
+++ b/.circleci/scripts/binary_windows_test.sh
@ -1,10 +1,16 @@
 #!/bin/bash
 set -eux -o pipefail

-source "${BINARY_ENV_FILE:-/c/w/env}"
+source "/c/w/env"

 export CUDA_VERSION="${DESIRED_CUDA/cu/}"
-export VC_YEAR=2019
+export VC_YEAR=2017
+
+if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
+  export VC_YEAR=2017
+else
+  export VC_YEAR=2019
+fi

 pushd "$BUILDER_ROOT"

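Review note: the hunk above first exports `VC_YEAR=2017` and then immediately overrides it in the `if`/`else`, so the unconditional export looks redundant. A sketch of the selection logic, with `vc_year_for` as a hypothetical wrapper that is not part of the script:

```shell
#!/bin/bash
set -eu -o pipefail

# Hypothetical wrapper around the logic in the diff.
vc_year_for() {
  local desired_cuda="$1"
  # Same substitution as the script: remove the first "cu" from DESIRED_CUDA.
  local cuda_version="${desired_cuda/cu/}"
  if [[ "$cuda_version" == "92" || "$cuda_version" == "100" ]]; then
    echo 2017
  else
    echo 2019
  fi
}

vc_year_for cu100
vc_year_for cu111
vc_year_for cpu
```

Note that `cpu` contains no `cu` substring, so it passes through the substitution unchanged and falls into the 2019 branch.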
--- a/.circleci/scripts/build_android_gradle.sh
+++ b/.circleci/scripts/build_android_gradle.sh
@ -10,7 +10,7 @@ export ANDROID_HOME=/opt/android/sdk

 # Must be in sync with GRADLE_VERSION in docker image for android
 # https://github.com/pietern/pytorch-dockerfiles/blob/master/build.sh#L155
-export GRADLE_VERSION=6.8.3
+export GRADLE_VERSION=4.10.3
 export GRADLE_HOME=/opt/gradle/gradle-$GRADLE_VERSION
 export GRADLE_PATH=$GRADLE_HOME/bin/gradle

--- a/.circleci/scripts/cpp_doc_push_script.sh
+++ b/.circleci/scripts/cpp_doc_push_script.sh
@ -10,33 +10,24 @@ pt_checkout="/var/lib/jenkins/workspace"
 # Since we're cat-ing this file, we need to escape all $'s
 echo "cpp_doc_push_script.sh: Invoked with $*"

-# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
-# the order of operations goes:
-#   1. Check if there's an argument $1
-#   2. If no argument check for environment var DOCS_INSTALL_PATH
-#   3. If no environment var fall back to default 'docs/'
-
-# NOTE: It might seem weird to gather the second argument before gathering the first argument
-#       but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
-#       try and gather it first, just so we don't potentially break people who rely on this script
-# Argument 2: What version of the Python API docs we are building.
-version="${2:-${DOCS_VERSION:-master}}"
-if [ -z "$version" ]; then
-echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
-  exit 1
-fi
-
 # Argument 1: Where to copy the built documentation for Python API to
 # (pytorch.github.io/$install_path)
-install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
+install_path="$1"
 if [ -z "$install_path" ]; then
 echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
  exit 1
 fi

-is_main_doc=false
+# Argument 2: What version of the Python API docs we are building.
+version="$2"
+if [ -z "$version" ]; then
+echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
+  exit 1
+fi
+
+is_master_doc=false
 if [ "$version" == "master" ]; then
-  is_main_doc=true
+  is_master_doc=true
 fi

 echo "install_path: $install_path  version: $version"
@ -56,7 +47,7 @@ sudo apt-get -y install doxygen
 # Generate ATen files
 pushd "${pt_checkout}"
 pip install -r requirements.txt
-time python -m torchgen.gen \
+time python -m tools.codegen.gen \
  -s aten/src/ATen \
  -d build/aten/src/ATen

@ -65,8 +56,9 @@ cp torch/_utils_internal.py tools/shared

 # Generate PyTorch files
 time python tools/setup_helpers/generate_code.py \
+  --declarations-path build/aten/src/ATen/Declarations.yaml \
  --native-functions-path aten/src/ATen/native/native_functions.yaml \
-  --tags-path aten/src/ATen/native/tags.yaml
+  --nn-path aten/src/

 # Build the docs
 pushd docs/cpp
@ -96,12 +88,8 @@ git status
 git config user.email "soumith+bot@pytorch.org"
 git config user.name "pytorchbot"
 # If there aren't changes, don't make a commit; push is no-op
-git commit -m "Generate C++ docs from pytorch/pytorch@${GITHUB_SHA}" || true
+git commit -m "Generate C++ docs from pytorch/pytorch@$CIRCLE_SHA1" || true
 git status

-if [[ "${WITH_PUSH:-}" == true ]]; then
-  git push -u origin
-fi
-
 popd
 # =================== The above code **should** be executed inside Docker container ===================
--- a/.circleci/scripts/publish_android_snapshot.sh
+++ b/.circleci/scripts/publish_android_snapshot.sh
@ -5,7 +5,7 @@ set -eu -o pipefail
 export ANDROID_NDK_HOME=/opt/ndk
 export ANDROID_HOME=/opt/android/sdk

-export GRADLE_VERSION=6.8.3
+export GRADLE_VERSION=4.10.3
 export GRADLE_HOME=/opt/gradle/gradle-$GRADLE_VERSION
 export GRADLE_PATH=$GRADLE_HOME/bin/gradle

@ -35,9 +35,7 @@ else
  echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES

  echo "SONATYPE_NEXUS_USERNAME=${SONATYPE_NEXUS_USERNAME}" >> $GRADLE_PROPERTIES
-  echo "mavenCentralRepositoryUsername=${SONATYPE_NEXUS_USERNAME}" >> $GRADLE_PROPERTIES
  echo "SONATYPE_NEXUS_PASSWORD=${SONATYPE_NEXUS_PASSWORD}" >> $GRADLE_PROPERTIES
-  echo "mavenCentralRepositoryPassword=${SONATYPE_NEXUS_PASSWORD}" >> $GRADLE_PROPERTIES

  echo "signing.keyId=${ANDROID_SIGN_KEY}" >> $GRADLE_PROPERTIES
  echo "signing.password=${ANDROID_SIGN_PASS}" >> $GRADLE_PROPERTIES
--- a/.circleci/scripts/python_doc_push_script.sh
+++ b/.circleci/scripts/python_doc_push_script.sh
@ -13,37 +13,28 @@ echo "python_doc_push_script.sh: Invoked with $*"

 set -ex

-# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
-# the order of operations goes:
-#   1. Check if there's an argument $1
-#   2. If no argument check for environment var DOCS_INSTALL_PATH
-#   3. If no environment var fall back to default 'docs/'
-
-# NOTE: It might seem weird to gather the second argument before gathering the first argument
-#       but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
-#       try and gather it first, just so we don't potentially break people who rely on this script
-# Argument 2: What version of the docs we are building.
-version="${2:-${DOCS_VERSION:-master}}"
-if [ -z "$version" ]; then
-echo "error: python_doc_push_script.sh: version (arg2) not specified"
-  exit 1
-fi
-
 # Argument 1: Where to copy the built documentation to
 # (pytorch.github.io/$install_path)
-install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
+install_path="$1"
 if [ -z "$install_path" ]; then
 echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
  exit 1
 fi

-is_main_doc=false
+# Argument 2: What version of the docs we are building.
+version="$2"
+if [ -z "$version" ]; then
+echo "error: python_doc_push_script.sh: version (arg2) not specified"
+  exit 1
+fi
+
+is_master_doc=false
 if [ "$version" == "master" ]; then
-  is_main_doc=true
+  is_master_doc=true
 fi

 # Argument 3: The branch to push to. Usually is "site"
-branch="${3:-${DOCS_BRANCH:-site}}"
+branch="$3"
 if [ -z "$branch" ]; then
 echo "error: python_doc_push_script.sh: branch (arg3) not specified"
  exit 1
@ -86,7 +77,7 @@ pushd docs

 # Build the docs
 pip -q install -r requirements.txt
-if [ "$is_main_doc" = true ]; then
+if [ "$is_master_doc" = true ]; then
  build_docs html
  [ $? -eq 0 ] || exit $?
  make coverage
@ -120,6 +111,14 @@ popd
 git rm -rf "$install_path" || true
 mv "$pt_checkout/docs/build/html" "$install_path"

+# Add the version handler by search and replace.
+# XXX: Consider moving this to the docs Makefile or site build
+if [ "$is_master_doc" = true ]; then
+  find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\1 \&#x25BC</a>@g"
+else
+  find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>$version \&#x25BC</a>@g"
+fi
+
 # Prevent Google from indexing $install_path/_modules. This folder contains
 # generated source files.
 # NB: the following only works on gnu sed. The sed shipped with mac os is different.
@ -131,12 +130,8 @@ git status
 git config user.email "soumith+bot@pytorch.org"
 git config user.name "pytorchbot"
 # If there aren't changes, don't make a commit; push is no-op
-git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
+git commit -m "Generate Python docs from pytorch/pytorch@$CIRCLE_SHA1" || true
 git status

-if [[ "${WITH_PUSH:-}" == true ]]; then
-  git push -u origin "${branch}"
-fi
-
 popd
 # =================== The above code **should** be executed inside Docker container ===================
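Review note: the removed comment block describes a three-step fallback (positional argument, then environment variable, then a literal default), while the replacement reads `"$1"` directly. A minimal sketch of the removed pattern, for comparison; `resolve_install_path` is a hypothetical function name and the sample values are made up.

```shell
#!/bin/bash
set -eu -o pipefail

# Hypothetical function showing the nested default expansion from the removed lines.
resolve_install_path() {
  # 1. use $1 if set and non-empty
  # 2. otherwise use DOCS_INSTALL_PATH if set and non-empty
  # 3. otherwise fall back to the literal 'docs/'
  echo "${1:-${DOCS_INSTALL_PATH:-docs/}}"
}

unset DOCS_INSTALL_PATH
resolve_install_path              # falls through to the literal default
DOCS_INSTALL_PATH="docs/stable"
resolve_install_path              # picks up the environment variable
resolve_install_path "docs/1.8.0" # the positional argument wins
```

With the plain `"$1"` form, callers presumably have to pass all arguments explicitly, and the `-z` checks that follow turn an empty value into a hard error.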
--- a/.circleci/scripts/setup_ci_environment.sh
+++ b/.circleci/scripts/setup_ci_environment.sh
@ -7,9 +7,6 @@ sudo rm -f /etc/apt/heroku.list
 sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
 sudo rm -f /etc/apt/partner.list

-# To increase the network reliability, let apt decide which mirror is best to use
-sudo sed -i -e 's/http:\/\/.*archive/mirror:\/\/mirrors/' -e 's/\/ubuntu\//\/mirrors.txt/' /etc/apt/sources.list
-
 retry () {
  $*  || $* || $* || $* || $*
 }
@ -27,12 +24,10 @@ retry sudo apt-get -y install \
 echo "== DOCKER VERSION =="
 docker version

-if ! command -v aws >/dev/null; then
-  retry sudo pip3 -q install awscli==1.19.64
-fi
+retry sudo pip -q install awscli==1.16.35

 if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
-  DRIVER_FN="NVIDIA-Linux-x86_64-510.60.02.run"
+  DRIVER_FN="NVIDIA-Linux-x86_64-460.39.run"
  wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
  sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
  nvidia-smi
@ -43,9 +38,9 @@ if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
  curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
  curl -s -L "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

-  retry sudo apt-get update -qq
+  sudo apt-get update -qq
  # Necessary to get the `--gpus` flag to function within docker
-  retry sudo apt-get install -y nvidia-container-toolkit
+  sudo apt-get install -y nvidia-container-toolkit
  sudo systemctl restart docker
 else
  # Explicitly remove nvidia docker apt repositories if not building for cuda
@ -53,51 +48,43 @@ else
 fi

 add_to_env_file() {
-  local name=$1
-  local value=$2
-  case "$value" in
-    *\ *)
-      # BASH_ENV should be set by CircleCI
-      echo "${name}='${value}'" >> "${BASH_ENV:-/tmp/env}"
-      ;;
-    *)
-      echo "${name}=${value}" >> "${BASH_ENV:-/tmp/env}"
-      ;;
-  esac
+  local content
+  content=$1
+  # BASH_ENV should be set by CircleCI
+  echo "${content}" >> "${BASH_ENV:-/tmp/env}"
 }

-add_to_env_file IN_CI 1
-add_to_env_file CI_MASTER "${CI_MASTER:-}"
-add_to_env_file COMMIT_SOURCE "${CIRCLE_BRANCH:-}"
-add_to_env_file BUILD_ENVIRONMENT "${BUILD_ENVIRONMENT}"
-add_to_env_file CIRCLE_PULL_REQUEST "${CIRCLE_PULL_REQUEST}"
+add_to_env_file "IN_CI=1"
+add_to_env_file "COMMIT_SOURCE=${CIRCLE_BRANCH:-}"
+add_to_env_file "BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}"
+add_to_env_file "CIRCLE_PULL_REQUEST=${CIRCLE_PULL_REQUEST}"


 if [[ "${BUILD_ENVIRONMENT}" == *-build ]]; then
-  add_to_env_file SCCACHE_BUCKET ossci-compiler-cache-circleci-v2
+  add_to_env_file "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2"

  SCCACHE_MAX_JOBS=$(( $(nproc) - 1 ))
  MEMORY_LIMIT_MAX_JOBS=8  # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
  MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
-  add_to_env_file MAX_JOBS "${MAX_JOBS}"
+  add_to_env_file "MAX_JOBS=${MAX_JOBS}"

  if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
-    add_to_env_file TORCH_CUDA_ARCH_LIST 5.2
+    add_to_env_file "TORCH_CUDA_ARCH_LIST=5.2"
  fi

  if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
    # This IAM user allows write access to S3 bucket for sccache & bazels3cache
    set +x
-    add_to_env_file XLA_CLANG_CACHE_S3_BUCKET_NAME "${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}"
-    add_to_env_file AWS_ACCESS_KEY_ID "${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}"
-    add_to_env_file AWS_SECRET_ACCESS_KEY "${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}"
+    add_to_env_file "XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}"
+    add_to_env_file "AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}"
+    add_to_env_file "AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}"
    set -x
  else
    # This IAM user allows write access to S3 bucket for sccache
    set +x
-    add_to_env_file XLA_CLANG_CACHE_S3_BUCKET_NAME "${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}"
-    add_to_env_file AWS_ACCESS_KEY_ID "${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}"
-    add_to_env_file AWS_SECRET_ACCESS_KEY "${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}"
+    add_to_env_file "XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}"
+    add_to_env_file "AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}"
+    add_to_env_file "AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}"
    set -x
  fi
 fi
@ -106,7 +93,5 @@ fi
 set +x
 export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4:-}
 export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4:-}
-export AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
-export AWS_REGION=us-east-1
-aws ecr get-login-password --region $AWS_REGION|docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
+eval "$(aws ecr get-login --region us-east-1 --no-include-email)"
 set -x
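Review note: the `MAX_JOBS` computation kept as context in this file clamps the job count to a memory-driven ceiling. A sketch of the same arithmetic; `clamp_jobs` is a hypothetical helper that takes the CPU count as an argument instead of calling `nproc`, so the result is reproducible.

```shell
#!/bin/bash
set -eu -o pipefail

# Hypothetical helper: min(cpus - 1, limit), using the same ternary as the script.
clamp_jobs() {
  local cpus="$1" limit="$2"
  local candidate=$(( cpus - 1 ))
  echo $(( candidate > limit ? limit : candidate ))
}

clamp_jobs 32 8   # many cores: capped at the memory limit
clamp_jobs 4 8    # few cores: one core is left free
```

In the script the limit is the hard-coded `MEMORY_LIMIT_MAX_JOBS=8`, and the result is written to the env file through `add_to_env_file`.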
--- a/.circleci/scripts/trigger_azure_pipeline.py
+++ b/.circleci/scripts/trigger_azure_pipeline.py
@ -1,140 +0,0 @@
-# Documentation: https://docs.microsoft.com/en-us/rest/api/azure/devops/build/?view=azure-devops-rest-6.0
-
-import re
-import json
-import os
-import sys
-import requests
-import time
-
-AZURE_PIPELINE_BASE_URL = "https://aiinfra.visualstudio.com/PyTorch/"
-AZURE_DEVOPS_PAT_BASE64 = os.environ.get("AZURE_DEVOPS_PAT_BASE64_SECRET", "")
-PIPELINE_ID = "911"
-PROJECT_ID = "0628bce4-2d33-499e-bac5-530e12db160f"
-TARGET_BRANCH = os.environ.get("CIRCLE_BRANCH", "main")
-TARGET_COMMIT = os.environ.get("CIRCLE_SHA1", "")
-
-build_base_url = AZURE_PIPELINE_BASE_URL + "_apis/build/builds?api-version=6.0"
-
-s = requests.Session()
-s.headers.update({"Authorization": "Basic " + AZURE_DEVOPS_PAT_BASE64})
-
-def submit_build(pipeline_id, project_id, source_branch, source_version):
-    print("Submitting build for branch: " + source_branch)
-    print("Commit SHA1: ", source_version)
-
-    run_build_raw = s.post(build_base_url, json={
-        "definition": {"id": pipeline_id},
-        "project": {"id": project_id},
-        "sourceBranch": source_branch,
-        "sourceVersion": source_version
-    })
-
-    try:
-        run_build_json = run_build_raw.json()
-    except json.decoder.JSONDecodeError as e:
-        print(e)
-        print("Failed to parse the response. Check if the Azure DevOps PAT is incorrect or expired.")
-        sys.exit(-1)
-
-    build_id = run_build_json['id']
-
-    print("Submitted bulid: " + str(build_id))
-    print("Bulid URL: " + run_build_json['url'])
-    return build_id
-
-def get_build(_id):
-    get_build_url = AZURE_PIPELINE_BASE_URL + f"/_apis/build/builds/{_id}?api-version=6.0"
-    get_build_raw = s.get(get_build_url)
-    return get_build_raw.json()
-
-def get_build_logs(_id):
-    get_build_logs_url = AZURE_PIPELINE_BASE_URL + f"/_apis/build/builds/{_id}/logs?api-version=6.0"
-    get_build_logs_raw = s.get(get_build_logs_url)
-    return get_build_logs_raw.json()
-
-def get_log_content(url):
-    resp = s.get(url)
-    return resp.text
-
-def wait_for_build(_id):
-    build_detail = get_build(_id)
-    build_status = build_detail['status']
-
-    while build_status == 'notStarted':
-        print('Waiting for run to start: ' + str(_id))
-        sys.stdout.flush()
-        try:
-            build_detail = get_build(_id)
-            build_status = build_detail['status']
-        except Exception as e:
-            print("Error getting build")
-            print(e)
-
-        time.sleep(30)
-
-    print("Bulid started: ", str(_id))
-
-    handled_logs = set()
-    while build_status == 'inProgress':
-        try:
-            print("Waiting for log: " + str(_id))
-            logs = get_build_logs(_id)
-        except Exception as e:
-            print("Error fetching logs")
-            print(e)
-            time.sleep(30)
-            continue
-
-        for log in logs['value']:
-            log_id = log['id']
-            if log_id in handled_logs:
-                continue
-            handled_logs.add(log_id)
-            print('Fetching log: \n' + log['url'])
-            try:
-                log_content = get_log_content(log['url'])
-                print(log_content)
-            except Exception as e:
-                print("Error getting log content")
-                print(e)
-            sys.stdout.flush()
-        build_detail = get_build(_id)
-        build_status = build_detail['status']
-        time.sleep(30)
-
-    build_result = build_detail['result']
-
-    print("Bulid status: " + build_status)
-    print("Bulid result: " + build_result)
-
-    return build_status, build_result
-
-if __name__ == '__main__':
-    # Convert the branch name for Azure DevOps
-    match = re.search(r'pull/(\d+)', TARGET_BRANCH)
-    if match is not None:
-        pr_num = match.group(1)
-        SOURCE_BRANCH = f'refs/pull/{pr_num}/head'
-    else:
-        SOURCE_BRANCH = f'refs/heads/{TARGET_BRANCH}'
-
-    MAX_RETRY = 2
-    retry = MAX_RETRY
-
-    while retry > 0:
-        build_id = submit_build(PIPELINE_ID, PROJECT_ID, SOURCE_BRANCH, TARGET_COMMIT)
-        build_status, build_result = wait_for_build(build_id)
-
-        if build_result != 'succeeded':
-            retry = retry - 1
-            if retry > 0:
-                print("Retrying... remaining attempt: " + str(retry))
-                # Wait a bit before retrying
-                time.sleep((MAX_RETRY - retry) * 120)
-                continue
-            else:
-                print("No more chance to retry. Giving up.")
-                sys.exit(-1)
-        else:
-            break
--- a/.circleci/scripts/upload_binary_size_to_scuba.py
+++ b/.circleci/scripts/upload_binary_size_to_scuba.py
@ -0,0 +1,149 @@
+import glob
+import json
+import logging
+import os
+import os.path
+import pathlib
+import re
+import sys
+import time
+import zipfile
+
+import requests
+
+
+def get_size(file_dir):
+    try:
+        # we should only expect one file, if no, something is wrong
+        file_name = glob.glob(os.path.join(file_dir, "*"))[0]
+        return os.stat(file_name).st_size
+    except:
+        logging.exception(f"error getting file from: {file_dir}")
+        return 0
+
+
+def build_message(size):
+    pkg_type, py_ver, cu_ver, *_ = os.environ.get("BUILD_ENVIRONMENT", "").split() + [
+        None,
+        None,
+        None,
+    ]
+    os_name = os.uname()[0].lower()
+    if os_name == "darwin":
+        os_name = "macos"
+    return {
+        "normal": {
+            "os": os_name,
+            "pkg_type": pkg_type,
+            "py_ver": py_ver,
+            "cu_ver": cu_ver,
+            "pr": os.environ.get("CIRCLE_PR_NUMBER"),
+            "build_num": os.environ.get("CIRCLE_BUILD_NUM"),
+            "sha1": os.environ.get("CIRCLE_SHA1"),
+            "branch": os.environ.get("CIRCLE_BRANCH"),
+            "workflow_id": os.environ.get("CIRCLE_WORKFLOW_ID"),
+        },
+        "int": {
+            "time": int(time.time()),
+            "size": size,
+            "commit_time": int(os.environ.get("COMMIT_TIME", "0")),
+            "run_duration": int(time.time() - os.path.getmtime(os.path.realpath(__file__))),
+        },
+    }
+
+
+def send_message(messages):
+    access_token = os.environ.get("SCRIBE_GRAPHQL_ACCESS_TOKEN")
+    if not access_token:
+        raise ValueError("Can't find access token from environment variable")
+    url = "https://graph.facebook.com/scribe_logs"
+    r = requests.post(
+        url,
+        data={
+            "access_token": access_token,
+            "logs": json.dumps(
+                [
+                    {
+                        "category": "perfpipe_pytorch_binary_size",
+                        "message": json.dumps(message),
+                        "line_escape": False,
+                    }
+                    for message in messages
+                ]
+            ),
+        },
+    )
+    print(r.text)
+    r.raise_for_status()
+
+
+def report_android_sizes(file_dir):
+    def gen_sizes():
+        # we should only expect one file; if not, something is wrong
+        aar_files = list(pathlib.Path(file_dir).rglob("pytorch_android-*.aar"))
+        if len(aar_files) != 1:
+            logging.error(f"error getting aar files from: {file_dir} / {aar_files}")
+            return
+
+        aar_file = aar_files[0]
+        zf = zipfile.ZipFile(aar_file)
+        for info in zf.infolist():
+            # Scan ".so" libs in `jni` folder. Examples:
+            # jni/arm64-v8a/libfbjni.so
+            # jni/arm64-v8a/libpytorch_jni.so
+            m = re.match(r"^jni/([^/]+)/(.*\.so)$", info.filename)
+            if not m:
+                continue
+            arch, lib = m.groups()
+            # report per architecture library size
+            yield [arch, lib, info.compress_size, info.file_size]
+
+        # report whole package size
+        yield ["aar", aar_file.name, os.stat(aar_file).st_size, 0]
+
+    def gen_messages():
+        android_build_type = os.environ.get("ANDROID_BUILD_TYPE")
+        for arch, lib, comp_size, uncomp_size in gen_sizes():
+            print(android_build_type, arch, lib, comp_size, uncomp_size)
+            yield {
+                "normal": {
+                    "os": "android",
+                    # TODO: create dedicated columns
+                    "pkg_type": "{}/{}/{}".format(android_build_type, arch, lib),
+                    "cu_ver": "",  # dummy value for derived field `build_name`
+                    "py_ver": "",  # dummy value for derived field `build_name`
+                    "pr": os.environ.get("CIRCLE_PR_NUMBER"),
+                    "build_num": os.environ.get("CIRCLE_BUILD_NUM"),
+                    "sha1": os.environ.get("CIRCLE_SHA1"),
+                    "branch": os.environ.get("CIRCLE_BRANCH"),
+                    "workflow_id": os.environ.get("CIRCLE_WORKFLOW_ID"),
+                },
+                "int": {
+                    "time": int(time.time()),
+                    "commit_time": int(os.environ.get("COMMIT_TIME", "0")),
+                    "run_duration": int(time.time() - os.path.getmtime(os.path.realpath(__file__))),
+                    "size": comp_size,
+                    "raw_size": uncomp_size,
+                },
+            }
+
+    send_message(list(gen_messages()))
+
+
+if __name__ == "__main__":
+    file_dir = os.environ.get(
+        "PYTORCH_FINAL_PACKAGE_DIR", "/home/circleci/project/final_pkgs"
+    )
+    if len(sys.argv) == 2:
+        file_dir = sys.argv[1]
+    print("checking dir: " + file_dir)
+
+    if "-android" in os.environ.get("BUILD_ENVIRONMENT", ""):
+        report_android_sizes(file_dir)
+    else:
+        size = get_size(file_dir)
+        if size != 0:
+            try:
+                send_message([build_message(size)])
+            except Exception:
+                logging.exception("can't send message")
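Review note: two idioms in the script above are easy to misread. `build_message` pads the split `BUILD_ENVIRONMENT` with three `None` values so that the starred unpacking never fails on short input, and `report_android_sizes` uses a regex to pick out per-architecture `.so` entries in the `jni` folder. A minimal sketch of both follows; the function names `parse_build_environment` and `match_jni_lib` are assumptions introduced here for illustration and are not part of the script.

```python
import re


def parse_build_environment(value):
    """Split "pkg_type py_ver cu_ver", padding missing fields with None.

    Hypothetical wrapper around the unpacking used in build_message.
    """
    pkg_type, py_ver, cu_ver, *_ = value.split() + [None, None, None]
    return pkg_type, py_ver, cu_ver


def match_jni_lib(filename):
    """Return (arch, lib) for entries like jni/<arch>/<lib>.so, else None.

    Hypothetical wrapper around the regex used in report_android_sizes.
    """
    m = re.match(r"^jni/([^/]+)/(.*\.so)$", filename)
    return m.groups() if m else None


print(parse_build_environment("libtorch 3.6m cpu"))
print(parse_build_environment(""))
print(match_jni_lib("jni/arm64-v8a/libpytorch_jni.so"))
print(match_jni_lib("classes.jar"))
```

With an unset or empty `BUILD_ENVIRONMENT`, all three fields come back as `None`, which is what the message then reports for `pkg_type`, `py_ver`, and `cu_ver`.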
--- a/.circleci/scripts/vs_install.ps1
+++ b/.circleci/scripts/vs_install.ps1
@ -1,10 +1,7 @@
-# https://developercommunity.visualstudio.com/t/install-specific-version-of-vs-component/1142479
-# Where to find the links: https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
-
-# BuildTools from S3
-$VS_DOWNLOAD_LINK = "https://s3.amazonaws.com/ossci-windows/vs${env:VS_VERSION}_BuildTools.exe"
+$VS_DOWNLOAD_LINK = "https://aka.ms/vs/15/release/vs_buildtools.exe"
 $COLLECT_DOWNLOAD_LINK = "https://aka.ms/vscollect.exe"
 $VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
+                                                     "--add Microsoft.VisualStudio.Component.VC.Tools.14.13",
                                                     "--add Microsoft.Component.MSBuild",
                                                     "--add Microsoft.VisualStudio.Component.Roslyn.Compiler",
                                                     "--add Microsoft.VisualStudio.Component.TextTemplating",
@ -14,45 +11,17 @@ $VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStud
                                                     "--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
                                                     "--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Win81")

-if (${env:INSTALL_WINDOWS_SDK} -eq "1") {
-    $VS_INSTALL_ARGS += "--add Microsoft.VisualStudio.Component.Windows10SDK.19041"
-}
-
-if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
-    $VS_VERSION_major = [int] ${env:VS_VERSION}.split(".")[0]
-    $existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[${env:VS_VERSION}, ${env:VS_VERSION_major + 1})" -property installationPath
-    if (($existingPath -ne $null) -and (!${env:CIRCLECI})) {
-        echo "Found correctly versioned existing BuildTools installation in $existingPath"
-        exit 0
-    }
-    $pathToRemove = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -property installationPath
-}
-
-echo "Downloading VS installer from S3."
 curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe
 if ($LASTEXITCODE -ne 0) {
-    echo "Download of the VS 2019 Version ${env:VS_VERSION} installer failed"
+    echo "Download of the VS 2017 installer failed"
    exit 1
 }

-if ($pathToRemove -ne $null) {
-    echo "Uninstalling $pathToRemove."
-    $VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$pathToRemove`"", "--quiet","--wait")
-    $process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
-    $exitCode = $process.ExitCode
-    if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
-        echo "Original BuildTools uninstall failed with code $exitCode"
-        exit 1
-    }
-    echo "Other versioned BuildTools uninstalled."
-}
-
-echo "Installing Visual Studio version ${env:VS_VERSION}."
 $process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_INSTALL_ARGS -NoNewWindow -Wait -PassThru
 Remove-Item -Path vs_installer.exe -Force
 $exitCode = $process.ExitCode
 if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
-    echo "VS 2019 installer exited with code $exitCode, which should be one of [0, 3010]."
+    echo "VS 2017 installer exited with code $exitCode, which should be one of [0, 3010]."
    curl.exe --retry 3 -kL $COLLECT_DOWNLOAD_LINK --output Collect.exe
    if ($LASTEXITCODE -ne 0) {
        echo "Download of the VS Collect tool failed."
@ -60,6 +29,6 @@ if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
    }
    Start-Process "${PWD}\Collect.exe" -NoNewWindow -Wait -PassThru
    New-Item -Path "C:\w\build-results" -ItemType "directory" -Force
-    Copy-Item -Path "${env:TEMP}\vslogs.zip" -Destination "C:\w\build-results\"
+    Copy-Item -Path "C:\Users\circleci\AppData\Local\Temp\vslogs.zip" -Destination "C:\w\build-results\"
    exit 1
 }
--- a/.circleci/scripts/vs_install_cmath.ps1
+++ b/.circleci/scripts/vs_install_cmath.ps1
@ -1,5 +0,0 @@
-$CMATH_DOWNLOAD_LINK = "https://raw.githubusercontent.com/microsoft/STL/12c684bba78f9b032050526abdebf14f58ca26a3/stl/inc/cmath"
-$VC14_28_INSTALL_PATH="C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\include"
-
-curl.exe --retry 3 -kL $CMATH_DOWNLOAD_LINK --output "$home\cmath"
-Move-Item -Path "$home\cmath" -Destination "$VC14_28_INSTALL_PATH" -Force
--- a/.circleci/scripts/windows_cuda_install.sh
+++ b/.circleci/scripts/windows_cuda_install.sh
@ -1,70 +1,61 @@
 #!/bin/bash
 set -eux -o pipefail

-case ${CUDA_VERSION} in
-    10.2)
-        cuda_installer_name="cuda_10.2.89_441.22_win10"
-        cuda_install_packages="nvcc_10.2 cuobjdump_10.2 nvprune_10.2 cupti_10.2 cublas_10.2 cublas_dev_10.2 cudart_10.2 cufft_10.2 cufft_dev_10.2 curand_10.2 curand_dev_10.2 cusolver_10.2 cusolver_dev_10.2 cusparse_10.2 cusparse_dev_10.2 nvgraph_10.2 nvgraph_dev_10.2 npp_10.2 npp_dev_10.2 nvrtc_10.2 nvrtc_dev_10.2 nvml_dev_10.2"
-        ;;
-    11.3)
-        cuda_installer_name="cuda_11.3.0_465.89_win10"
-        cuda_install_packages="thrust_11.3 nvcc_11.3 cuobjdump_11.3 nvprune_11.3 nvprof_11.3 cupti_11.3 cublas_11.3 cublas_dev_11.3 cudart_11.3 cufft_11.3 cufft_dev_11.3 curand_11.3 curand_dev_11.3 cusolver_11.3 cusolver_dev_11.3 cusparse_11.3 cusparse_dev_11.3 npp_11.3 npp_dev_11.3 nvrtc_11.3 nvrtc_dev_11.3 nvml_dev_11.3"
-        ;;
-    11.6)
-        cuda_installer_name="cuda_11.6.0_511.23_windows"
-        cuda_install_packages="thrust_11.6 nvcc_11.6 cuobjdump_11.6 nvprune_11.6 nvprof_11.6 cupti_11.6 cublas_11.6 cublas_dev_11.6 cudart_11.6 cufft_11.6 cufft_dev_11.6 curand_11.6 curand_dev_11.6 cusolver_11.6 cusolver_dev_11.6 cusparse_11.6 cusparse_dev_11.6 npp_11.6 npp_dev_11.6 nvrtc_11.6 nvrtc_dev_11.6 nvml_dev_11.6"
-        ;;
-    *)
-        echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
-        exit 1
-        ;;
-esac
+cuda_major_version=${CUDA_VERSION%.*}

-
-if [[ -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
-    echo "Existing CUDA v${CUDA_VERSION} installation found, skipping install"
+if [[ "$cuda_major_version" == "10" ]]; then
+    cuda_installer_name="cuda_10.1.243_426.00_win10"
+    msbuild_project_dir="CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
+    cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
+elif [[ "$cuda_major_version" == "11" ]]; then
+    cuda_installer_name="cuda_11.1.0_456.43_win10"
+    msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
+    cuda_install_packages="nvcc_11.1 cuobjdump_11.1 nvprune_11.1 nvprof_11.1 cupti_11.1 cublas_11.1 cublas_dev_11.1 cudart_11.1 cufft_11.1 cufft_dev_11.1 curand_11.1 curand_dev_11.1 cusolver_11.1 cusolver_dev_11.1 cusparse_11.1 cusparse_dev_11.1 npp_11.1 npp_dev_11.1 nvrtc_11.1 nvrtc_dev_11.1 nvml_dev_11.1"
 else
-    tmp_dir=$(mktemp -d)
-    (
-        # no need to popd after, the subshell shouldn't affect the parent shell
-        pushd "${tmp_dir}"
-        cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
-
-        curl --retry 3 -kLO $cuda_installer_link
-        7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
-        pushd ${cuda_installer_name}
-        mkdir cuda_install_logs
-
-        set +e
-
-        # This breaks for some reason if you quote cuda_install_packages
-        # shellcheck disable=SC2086
-        ./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
-
-        set -e
-
-        if [[ ! -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
-            echo "CUDA installation failed"
-            mkdir -p /c/w/build-results
-            7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
-            exit 1
-        fi
-    )
-    rm -rf "${tmp_dir}"
+    echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
+    exit 1
 fi

-if [[ -f "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll" ]]; then
-    echo "Existing nvtools installation found, skipping install"
-else
-    # create tmp dir for download
-    tmp_dir=$(mktemp -d)
-    (
-        # no need to popd after, the subshell shouldn't affect the parent shell
-        pushd "${tmp_dir}"
-        curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
-        7z x NvToolsExt.7z -oNvToolsExt
-        mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
-        cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
-    )
-    rm -rf "${tmp_dir}"
+if [[ "$cuda_major_version" == "11" && "${JOB_EXECUTOR}" == "windows-with-nvidia-gpu" ]]; then
+    cuda_install_packages="${cuda_install_packages} Display.Driver"
 fi
+
+cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
+
+curl --retry 3 -kLO $cuda_installer_link
+7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
+cd ${cuda_installer_name}
+mkdir cuda_install_logs
+
+set +e
+
+./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
+
+set -e
+
+if [[ "${VC_YEAR}" == "2017" ]]; then
+    cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2017/${VC_PRODUCT}/Common7/IDE/VC/VCTargets/BuildCustomizations/"
+else
+    cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2019/${VC_PRODUCT}/MSBuild/Microsoft/VC/v160/BuildCustomizations/"
+fi
+
+if ! ls "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll"
+then
+    curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
+    7z x NvToolsExt.7z -oNvToolsExt
+    mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
+    cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
+    export NVTOOLSEXT_PATH="C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\"
+fi
+
+if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe"
+then
+    echo "CUDA installation failed"
+    mkdir -p /c/w/build-results
+    7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
+    exit 1
+fi
+
+cd ..
+rm -rf ./${cuda_installer_name}
+rm -f ./${cuda_installer_name}.exe
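Review note: the new script derives `cuda_major_version` with the shell expansion `${CUDA_VERSION%.*}`, which removes only the shortest suffix matching `.*`, that is, the last dot-separated component. A Python analogue is sketched below to make the behavior concrete; `strip_last_component` is a hypothetical name used only for illustration.

```python
def strip_last_component(version):
    """Python analogue of the shell expansion ${version%.*}.

    Hypothetical helper: drops the last dot-separated component and
    leaves the string unchanged when it contains no dot.
    """
    head, sep, _ = version.rpartition(".")
    return head if sep else version


print(strip_last_component("11.1"))
print(strip_last_component("10.1.243"))
print(strip_last_component("11"))
```

A two-part value such as `11.1` yields `11`, but a three-part value such as `10.1.243` yields `10.1`, which matches neither `"10"` nor `"11"` and would fall through to the "not supported yet" branch. The script therefore appears to assume that `CUDA_VERSION` is always given as `major.minor`.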
--- a/.circleci/scripts/windows_cudnn_install.sh
+++ b/.circleci/scripts/windows_cudnn_install.sh
@ -1,48 +1,21 @@
 #!/bin/bash
 set -eux -o pipefail

+cuda_major_version=${CUDA_VERSION%.*}

-windows_s3_link="https://ossci-windows.s3.amazonaws.com"
-
-case ${CUDA_VERSION} in
-    10.2)
-        cudnn_file_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.5.32"
-        ;;
-    11.3)
-        # Use cudnn8.3 with hard-coded cuda11.3 version
-        cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive"
-        ;;
-    11.6)
-        # Use cudnn8.3 with hard-coded cuda11.5 version
-        cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive"
-        ;;
-    *)
-        echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
-        exit 1
-        ;;
-esac
-
-cudnn_installer_name="cudnn_installer.zip"
-cudnn_installer_link="${windows_s3_link}/${cudnn_file_name}.zip"
-cudnn_install_folder="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
-
-if [[ -f "${cudnn_install_folder}/include/cudnn.h" ]]; then
-    echo "Existing cudnn installation found, skipping install..."
+if [[ "$cuda_major_version" == "10" ]]; then
+    cudnn_installer_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.4.38"
+elif [[ "$cuda_major_version" == "11" ]]; then
+    cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.0.5.39"
 else
-    tmp_dir=$(mktemp -d)
-    (
-        pushd "${tmp_dir}"
-        curl --retry 3 -o "${cudnn_installer_name}" "$cudnn_installer_link"
-        7z x "${cudnn_installer_name}" -ocudnn
-        # Use '${var:?}/*' to avoid potentially expanding to '/*'
-        # Remove all of the directories before attempting to copy files
-        rm -rf "${cudnn_install_folder:?}/*"
-        cp -rf cudnn/cuda/* "${cudnn_install_folder}"
-
-        #Make sure windows path contains zlib dll
-        curl -k -L "${windows_s3_link}/zlib123dllx64.zip" --output "${tmp_dir}\zlib123dllx64.zip"
-        7z x "${tmp_dir}\zlib123dllx64.zip" -o"${tmp_dir}\zlib"
-        xcopy /Y "${tmp_dir}\zlib\dll_x64\*.dll" "C:\Windows\System32"
-    )
-    rm -rf "${tmp_dir}"
+    echo "CUDNN for CUDA_VERSION $CUDA_VERSION is not supported yet"
+    exit 1
 fi
+
+cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/${cudnn_installer_name}.zip"
+
+curl --retry 3 -O $cudnn_installer_link
+7z x ${cudnn_installer_name}.zip -ocudnn
+cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
+rm -rf cudnn
+rm -f ${cudnn_installer_name}.zip
--- a/.circleci/verbatim-sources/build-parameters/binary-build-params.yml
+++ b/.circleci/verbatim-sources/build-parameters/binary-build-params.yml
@ -62,4 +62,5 @@ binary_windows_params: &binary_windows_params
      default: "windows-xlarge-cpu-with-nvidia-cuda"
  environment:
    BUILD_ENVIRONMENT: << parameters.build_environment >>
+    BUILD_FOR_SYSTEM: windows
    JOB_EXECUTOR: <<parameters.executor>>
--- a/.circleci/verbatim-sources/build-parameters/promote-build-params.yml
+++ b/.circleci/verbatim-sources/build-parameters/promote-build-params.yml
@ -0,0 +1,14 @@
+
+promote_common: &promote_common
+  docker:
+    - image: pytorch/release
+  parameters:
+    package_name:
+      description: "package name to promote"
+      type: string
+      default: ""
+  environment:
+    PACKAGE_NAME: << parameters.package_name >>
+    ANACONDA_API_TOKEN: ${CONDA_PYTORCHBOT_TOKEN}
+    AWS_ACCESS_KEY_ID: ${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}
+    AWS_SECRET_ACCESS_KEY: ${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}
--- a/.circleci/verbatim-sources/build-parameters/pytorch-build-params.yml
+++ b/.circleci/verbatim-sources/build-parameters/pytorch-build-params.yml
@ -15,15 +15,11 @@ pytorch_params: &pytorch_params
    build_only:
      type: string
      default: ""
-    ci_master:
-      type: string
-      default: ""
  environment:
    BUILD_ENVIRONMENT: << parameters.build_environment >>
    DOCKER_IMAGE: << parameters.docker_image >>
    USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
    BUILD_ONLY: << parameters.build_only >>
-    CI_MASTER: << pipeline.parameters.run_master_build >>
  resource_class: << parameters.resource_class >>

 pytorch_ios_params: &pytorch_ios_params
@ -43,20 +39,12 @@ pytorch_ios_params: &pytorch_ios_params
    use_metal:
      type: string
      default: "0"
-    lite_interpreter:
-      type: string
-      default: "1"
-    use_coreml:
-      type: string
-      default: "0"
  environment:
    BUILD_ENVIRONMENT: << parameters.build_environment >>
    IOS_ARCH: << parameters.ios_arch >>
    IOS_PLATFORM: << parameters.ios_platform >>
    SELECTED_OP_LIST: << parameters.op_list >>
    USE_PYTORCH_METAL: << parameters.use_metal >>
-    BUILD_LITE_INTERPRETER: << parameters.lite_interpreter >>
-    USE_COREML_DELEGATE: << parameters.use_coreml >>

 pytorch_windows_params: &pytorch_windows_params
  parameters:
@ -74,10 +62,7 @@ pytorch_windows_params: &pytorch_windows_params
      default: "10.1"
    python_version:
      type: string
-      default: "3.8"
-    vs_version:
-      type: string
-      default: "16.8.6"
+      default: "3.6"
    vc_version:
      type: string
      default: "14.16"
@ -95,11 +80,10 @@ pytorch_windows_params: &pytorch_windows_params
    SCCACHE_BUCKET: "ossci-compiler-cache"
    CUDA_VERSION: <<parameters.cuda_version>>
    PYTHON_VERSION: <<parameters.python_version>>
-    VS_VERSION: <<parameters.vs_version>>
    VC_VERSION: <<parameters.vc_version>>
    VC_YEAR: <<parameters.vc_year>>
    VC_PRODUCT: <<parameters.vc_product>>
    USE_CUDA: <<parameters.use_cuda>>
-    TORCH_CUDA_ARCH_LIST: "5.2 7.5"
+    TORCH_CUDA_ARCH_LIST: "7.5"
    JOB_BASE_NAME: <<parameters.test_name>>
    JOB_EXECUTOR: <<parameters.executor>>
--- a/.circleci/verbatim-sources/commands.yml
+++ b/.circleci/verbatim-sources/commands.yml
@ -111,11 +111,11 @@ commands:
                git config --global user.email "circleci.ossci@gmail.com"
                git config --global user.name "CircleCI"
                git config remote.origin.url https://github.com/pytorch/pytorch.git
-                git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
-                git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
+                git config --add remote.origin.fetch +refs/heads/release/1.8:refs/remotes/origin/release/1.8
+                git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/release/1.8:refs/remotes/origin/release/1.8 --depth=100 --quiet
                # PRs generated from ghstack has format CIRCLE_PR_BASE_BRANCH=gh/xxx/1234/base
                if [[ "${CIRCLE_PR_BASE_BRANCH}" == "gh/"* ]]; then
-                  CIRCLE_PR_BASE_BRANCH=master
+                  CIRCLE_PR_BASE_BRANCH=release/1.8
                fi
                export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/$CIRCLE_PR_BASE_BRANCH`
                echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
@ -171,4 +171,4 @@ commands:
            cd ~/project
            export ANDROID_BUILD_TYPE="<< parameters.build_type >>"
            export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
-            python3 -m tools.stats.upload_binary_size_to_scuba android
+            python3 .circleci/scripts/upload_binary_size_to_scuba.py android
--- a/.circleci/verbatim-sources/header-section.yml
+++ b/.circleci/verbatim-sources/header-section.yml
@ -14,18 +14,19 @@ parameters:
  run_build:
    type: boolean
    default: true
-  run_master_build:
-    type: boolean
-    default: false
-  run_slow_gradcheck_build:
-    type: boolean
-    default: false
+
+docker_config_defaults: &docker_config_defaults
+  user: jenkins
+  aws_auth:
+    # This IAM user only allows read-write access to ECR
+    aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4}
+    aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4}

 executors:
  windows-with-nvidia-gpu:
    machine:
      resource_class: windows.gpu.nvidia.medium
-      image: windows-server-2019-nvidia:previous
+      image: windows-server-2019-nvidia:stable
      shell: bash.exe

  windows-xlarge-cpu-with-nvidia-cuda:
--- a/.circleci/verbatim-sources/job-specs/binary-build-tests.yml
+++ b/.circleci/verbatim-sources/job-specs/binary-build-tests.yml
@ -3,12 +3,12 @@
 #  binary_linux_libtorch_3.6m_cpu_test:
 #    environment:
 #      BUILD_ENVIRONMENT: "libtorch 3.6m cpu"
-#    resource_class: gpu.nvidia.small
+#    resource_class: gpu.medium
 #    <<: *binary_linux_test
 #
 #  binary_linux_libtorch_3.6m_cu90_test:
 #    environment:
 #      BUILD_ENVIRONMENT: "libtorch 3.6m cu90"
-#    resource_class: gpu.nvidia.small
+#    resource_class: gpu.medium
 #    <<: *binary_linux_test
 #
--- a/.circleci/verbatim-sources/job-specs/binary-job-specs.yml
+++ b/.circleci/verbatim-sources/job-specs/binary-job-specs.yml
@ -1,4 +1,3 @@
-jobs:
  binary_linux_build:
    <<: *binary_linux_build_params
    steps:
@ -23,14 +22,14 @@ jobs:
        command: |
            ls -lah /final_pkgs
    - run:
-        name: upload build & binary data
+        name: save binary size
        no_output_timeout: "5m"
        command: |
            source /env
            cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
            python3 -mpip install requests && \
            SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
-            python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
+            python3 /pytorch/.circleci/scripts/upload_binary_size_to_scuba.py || exit 0
    - persist_to_workspace:
        root: /
        paths: final_pkgs
@ -46,7 +45,7 @@ jobs:
  binary_linux_test:
    <<: *binary_linux_test_upload_params
    machine:
-        image: ubuntu-2004:202104-01
+        image: ubuntu-1604:202007-01
    steps:
    # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
    - checkout
@ -109,7 +108,7 @@ jobs:
  smoke_linux_test:
    <<: *binary_linux_test_upload_params
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    steps:
    - checkout
    - calculate_docker_image_tag
@ -162,7 +161,6 @@ jobs:
    <<: *binary_mac_params
    macos:
      xcode: "12.0"
-      resource_class: "large"
    steps:
    # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
    - checkout
@ -241,7 +239,7 @@ jobs:
  binary_ios_build:
    <<: *pytorch_ios_params
    macos:
-      xcode: "12.5.1"
+      xcode: "12.0"
    steps:
    - attach_workspace:
        at: ~/workspace
@ -268,7 +266,7 @@ jobs:
  binary_ios_upload:
    <<: *pytorch_ios_params
    macos:
-      xcode: "12.5.1"
+      xcode: "12.0"
    steps:
    - attach_workspace:
        at: ~/workspace
@ -310,8 +308,6 @@ jobs:
    - persist_to_workspace:
        root: "C:/w"
        paths: final_pkgs
-    - store_artifacts:
-        path: C:/w/final_pkgs

  binary_windows_test:
    <<: *binary_windows_params
@ -394,3 +390,4 @@ jobs:
          command: |
              ANACONDA_API_TOKEN="${CONDA_PYTORCHBOT_TOKEN}" \
              scripts/release/anaconda-prune/run.sh
+
--- a/.circleci/verbatim-sources/job-specs/binary_update_htmls.yml
+++ b/.circleci/verbatim-sources/job-specs/binary_update_htmls.yml
@ -8,7 +8,7 @@
  # then install the one with the most recent version.
  update_s3_htmls: &update_s3_htmls
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    resource_class: medium
    steps:
    - checkout
--- a/.circleci/verbatim-sources/job-specs/docker_jobs.yml
+++ b/.circleci/verbatim-sources/job-specs/docker_jobs.yml
@ -4,7 +4,7 @@
          type: string
          default: ""
      machine:
-        image: ubuntu-2004:202104-01
+        image: ubuntu-1604:202007-01
      resource_class: large
      environment:
        IMAGE_NAME: << parameters.image_name >>
@ -20,10 +20,7 @@
              set +x
              export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
              export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
-              export AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
-              export AWS_REGION=us-east-1
-              aws ecr get-login-password --region $AWS_REGION|docker login --username AWS \
-                       --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
+              eval $(aws ecr get-login --no-include-email --region us-east-1)
              set -x
              # Check if image already exists, if it does then skip building it
              if docker manifest inspect "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${IMAGE_NAME}:${DOCKER_TAG}"; then
@ -54,3 +51,58 @@
              export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
              set -x
              cd .circleci/docker && ./build_docker.sh
+  docker_for_ecr_gc_build_job:
+      machine:
+        image: ubuntu-1604:202007-01
+      steps:
+        - checkout
+        - run:
+            name: build_docker_image_for_ecr_gc
+            no_output_timeout: "1h"
+            command: |
+              cd .circleci/ecr_gc_docker
+              docker build . -t 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
+              set +x
+              export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
+              export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
+              eval $(aws ecr get-login --no-include-email --region us-east-1)
+              set -x
+              docker push 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
+  ecr_gc_job:
+      parameters:
+        project:
+          type: string
+          default: "pytorch"
+        tags_to_keep:  # comma-separated values
+          type: string
+      environment:
+        PROJECT: << parameters.project >>
+        # TODO: Remove legacy image tags once we feel comfortable with new docker image tags
+        IMAGE_TAG: << parameters.tags_to_keep >>
+      docker:
+        - image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/gc/ecr
+          aws_auth:
+            aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
+            aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
+
+      steps:
+        - checkout
+        - run:
+            # NOTE: see 'docker_build_job' for how these tags actually get built
+            name: dynamically generate tags to keep
+            no_output_timeout: "1h"
+            command: |
+              GENERATED_IMAGE_TAG=$(\
+                git log --oneline --pretty='%H' .circleci/docker \
+                  | xargs -I '{}' git rev-parse '{}:.circleci/docker' \
+                  | paste -sd "," -)
+              echo "export GENERATED_IMAGE_TAG='${GENERATED_IMAGE_TAG}'" >> ${BASH_ENV}
+        - run:
+            name: garbage collecting for ecr images
+            no_output_timeout: "1h"
+            command: |
+              set +x
+              export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
+              export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
+              set -x
+              /usr/bin/gc.py --filter-prefix ${PROJECT}  --ignore-tags "${IMAGE_TAG},${GENERATED_IMAGE_TAG}"
--- a/.circleci/verbatim-sources/job-specs/job-specs-custom.yml
+++ b/.circleci/verbatim-sources/job-specs/job-specs-custom.yml
@ -1,11 +1,11 @@
  pytorch_doc_push:
    resource_class: medium
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    parameters:
      branch:
        type: string
-        default: "main"
+        default: "master"
    steps:
    - attach_workspace:
        at: /tmp/workspace
@ -27,10 +27,10 @@
  pytorch_python_doc_build:
    environment:
      BUILD_ENVIRONMENT: pytorch-python-doc-push
-      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
+      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4"
    resource_class: large
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    steps:
    - checkout
    - calculate_docker_image_tag
@ -41,11 +41,10 @@
        no_output_timeout: "1h"
        command: |
          set -ex
-          export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
+          export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
          echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
-          # turn v1.12.0rc3 into 1.12
-          tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/')
-          target=${tag:-main}
+          tag=${CIRCLE_TAG:1:5}
+          target=${tag:-master}
          echo "building for ${target}"
          time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
          export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
@ -55,7 +54,7 @@
          echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts

          mkdir -p ~/workspace/build_artifacts
-          docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io/docs/main ~/workspace/build_artifacts
+          docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io/docs/master ~/workspace/build_artifacts
          docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io /tmp/workspace

          # Save the docs build so we can debug any problems
@ -67,16 +66,16 @@
        paths:
          - .
    - store_artifacts:
-        path: ~/workspace/build_artifacts/main
+        path: ~/workspace/build_artifacts/master
        destination: docs

  pytorch_cpp_doc_build:
    environment:
      BUILD_ENVIRONMENT: pytorch-cpp-doc-push
-      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
+      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4"
    resource_class: large
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    steps:
    - checkout
    - calculate_docker_image_tag
@ -87,16 +86,15 @@
        no_output_timeout: "1h"
        command: |
          set -ex
-          export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
+          export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
          echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
-          # turn v1.12.0rc3 into 1.12
-          tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/')
-          target=${tag:-main}
+          tag=${CIRCLE_TAG:1:5}
+          target=${tag:-master}
          echo "building for ${target}"
          time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
          export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})

-          export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" main") | docker exec -u jenkins -i "$id" bash) 2>&1'
+          export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" master") | docker exec -u jenkins -i "$id" bash) 2>&1'

          echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts

@ -113,44 +111,6 @@
        paths:
          - .

-  pytorch_macos_10_15_py3_build:
-    environment:
-      BUILD_ENVIRONMENT: pytorch-macos-10.15-py3-arm64-build
-    macos:
-      xcode: "12.3.0"
-    steps:
-      - checkout
-      - run_brew_for_macos_build
-      - run:
-          name: Build
-          no_output_timeout: "1h"
-          command: |
-            set -e
-            export IN_CI=1
-            export CROSS_COMPILE_ARM64=1
-            export JOB_BASE_NAME=$CIRCLE_JOB
-
-            # Install sccache
-            sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
-            sudo chmod +x /usr/local/bin/sccache
-            export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
-
-            # This IAM user allows write access to S3 bucket for sccache
-            set +x
-            export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
-            export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
-            set -x
-
-            chmod a+x .jenkins/pytorch/macos-build.sh
-            unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
-
-      - persist_to_workspace:
-          root: /Users/distiller/workspace/
-          paths:
-            - miniconda3
-      - store_artifacts:
-          path: /Users/distiller/project/dist
-
  pytorch_macos_10_13_py3_build:
    environment:
      BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
@ -165,10 +125,9 @@
          command: |
            set -e
            export IN_CI=1
-            export JOB_BASE_NAME=$CIRCLE_JOB

            # Install sccache
-            sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
+            sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
            sudo chmod +x /usr/local/bin/sccache
            export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2

@ -202,49 +161,9 @@
          command: |
            set -e
            export IN_CI=1
-            export JOB_BASE_NAME=$CIRCLE_JOB

            chmod a+x .jenkins/pytorch/macos-test.sh
            unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
-      - run:
-          name: Report results
-          no_output_timeout: "5m"
-          command: |
-            set -ex
-            source /Users/distiller/workspace/miniconda3/bin/activate
-            python3 -m pip install boto3==1.19.12
-
-            export IN_CI=1
-            export JOB_BASE_NAME=$CIRCLE_JOB
-
-            # Using the same IAM user to write stats to our OSS bucket
-            export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
-            export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
-            python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
-          when: always
-      - store_test_results:
-          path: test/test-reports
-
-  pytorch_macos_10_13_py3_lite_interpreter_build_test:
-    environment:
-      BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
-    macos:
-      xcode: "12.0"
-    steps:
-      - checkout
-      - attach_workspace:
-          at: ~/workspace
-      - run_brew_for_macos_build
-      - run:
-          name: Test
-          no_output_timeout: "1h"
-          command: |
-            set -e
-            export IN_CI=1
-            export BUILD_LITE_INTERPRETER=1
-            export JOB_BASE_NAME=$CIRCLE_JOB
-            chmod a+x ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh
-            unbuffer ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh 2>&1 | ts
      - store_test_results:
          path: test/test-reports

@ -252,10 +171,10 @@
    environment:
      BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
-      PYTHON_VERSION: "3.7"
+      PYTHON_VERSION: "3.6"
    resource_class: large
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    steps:
    - checkout
    - calculate_docker_image_tag
@ -266,7 +185,7 @@
        no_output_timeout: "1h"
        command: |
          set -eux
-          docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
+          docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}

          docker_image_libtorch_android_x86_32=${docker_image_commit}-android-x86_32
          docker_image_libtorch_android_x86_64=${docker_image_commit}-android-x86_64
@ -341,10 +260,10 @@
    environment:
      BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot
      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
-      PYTHON_VERSION: "3.7"
+      PYTHON_VERSION: "3.6"
    resource_class: large
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    steps:
    - checkout
    - calculate_docker_image_tag
@ -355,7 +274,7 @@
        no_output_timeout: "1h"
        command: |
          set -eux
-          docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
+          docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}

          docker_image_libtorch_android_x86_32_gradle=${docker_image_commit}-android-x86_32-gradle

@ -377,10 +296,10 @@
    environment:
      BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32
      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
-      PYTHON_VERSION: "3.7"
+      PYTHON_VERSION: "3.6"
    resource_class: large
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    steps:
    - checkout
    - calculate_docker_image_tag
@ -392,7 +311,7 @@
        no_output_timeout: "1h"
        command: |
          set -e
-          docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
+          docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
          echo "docker_image_libtorch_android_x86_32: "${docker_image_libtorch_android_x86_32}

          # x86
@ -415,10 +334,50 @@
        path: ~/workspace/build_android_x86_32_artifacts/artifacts.tgz
        destination: artifacts.tgz

+  pytorch_android_gradle_custom_build_single:
+    environment:
+      BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single
+      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
+      PYTHON_VERSION: "3.6"
+    resource_class: large
+    machine:
+      image: ubuntu-1604:202007-01
+    steps:
+    - checkout
+    - calculate_docker_image_tag
+    - setup_linux_system_environment
+    - checkout
+    - calculate_docker_image_tag
+    - setup_ci_environment
+    - run:
+        name: pytorch android gradle custom build single architecture (for PR)
+        no_output_timeout: "1h"
+        command: |
+          set -e
+          # Unlike other gradle jobs, it's not worth building libtorch in a separate CI job and share via docker, because:
+          # 1) Not shareable: it's custom selective build, which is different from default libtorch mobile build;
+          # 2) Not parallelizable by architecture: it only builds libtorch for one architecture;
+
+          echo "DOCKER_IMAGE: ${DOCKER_IMAGE}:${DOCKER_TAG}"
+          time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
+
+          git submodule sync && git submodule update -q --init --recursive
+          VOLUME_MOUNTS="-v /home/circleci/project/:/var/lib/jenkins/workspace"
+          export id=$(docker run --env-file "${BASH_ENV}" ${VOLUME_MOUNTS} --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
+
+          export COMMAND='((echo "export GRADLE_OFFLINE=1" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
+          echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
+
+          # Skip docker push as this job is purely for size analysis purpose.
+          # Result binaries are already in `/home/circleci/project/` as it's mounted instead of copied.
+
+    - upload_binary_size_for_android_build:
+        build_type: custom-build-single
+
  pytorch_ios_build:
    <<: *pytorch_ios_params
    macos:
-      xcode: "12.5.1"
+      xcode: "12.0"
    steps:
      - checkout
      - run_brew_for_ios_build
@ -432,17 +391,16 @@
            # install fastlane
            sudo gem install bundler && bundle install
            # install certificates
-            echo ${IOS_CERT_KEY_2022} >> cert.txt
+            echo ${IOS_CERT_KEY} >> cert.txt
            base64 --decode cert.txt -o Certificates.p12
            rm cert.txt
-            bundle exec fastlane install_root_cert
-            bundle exec fastlane install_dev_cert
+            bundle exec fastlane install_cert
            # install the provisioning profile
-            PROFILE=PyTorch_CI_2022.mobileprovision
+            PROFILE=PyTorch_CI_2021.mobileprovision
            PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
            mkdir -pv "${PROVISIONING_PROFILES}"
            cd "${PROVISIONING_PROFILES}"
-            echo ${IOS_SIGN_KEY_2022} >> cert.txt
+            echo ${IOS_SIGN_KEY} >> cert.txt
            base64 --decode cert.txt -o ${PROFILE}
            rm cert.txt
      - run:
@ -472,7 +430,7 @@
            # sync submodules
            cd ${PROJ_ROOT}
            git submodule sync
-            git submodule update --init --recursive --depth 1 --jobs 0
+            git submodule update --init --recursive

            # export
            export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
@ -482,8 +440,6 @@
            echo "IOS_ARCH: ${IOS_ARCH}"
            echo "IOS_PLATFORM: ${IOS_PLATFORM}"
            echo "USE_PYTORCH_METAL": "${USE_METAL}"
-            echo "BUILD_LITE_INTERPRETER": "${BUILD_LITE_INTERPRETER}"
-            echo "USE_COREML_DELEGATE": "${USE_COREML_DELEGATE}"

            #check the custom build flag
            echo "SELECTED_OP_LIST: ${SELECTED_OP_LIST}"
@ -492,7 +448,6 @@
            fi
            export IOS_ARCH=${IOS_ARCH}
            export IOS_PLATFORM=${IOS_PLATFORM}
-            export USE_COREML_DELEGATE=${USE_COREML_DELEGATE}
            if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
              export USE_PYTORCH_METAL=${USE_METAL}
            fi
@ -503,7 +458,7 @@
          command: |
            set -e
            PROJ_ROOT=/Users/distiller/project
-            PROFILE=PyTorch_CI_2022
+            PROFILE=PyTorch_CI_2021
            # run the ruby build script
            if ! [ -x "$(command -v xcodebuild)" ]; then
              echo 'Error: xcodebuild is not installed.'
@ -531,40 +486,18 @@
            WORKSPACE=/Users/distiller/workspace
            PROJ_ROOT=/Users/distiller/project
            source ~/anaconda/bin/activate
-            # use the pytorch nightly build to generate models
-            pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
-            # generate models for differnet backends
+            pip install torch torchvision --progress-bar off
+            #run unit test
            cd ${PROJ_ROOT}/ios/TestApp/benchmark
-            mkdir -p ../models
-            if [ ${USE_COREML_DELEGATE} == 1 ]; then
-              pip install coremltools==5.0b5
-              pip install six
-              python coreml_backend.py
-            else
-              python trace_model.py
-            fi
-            if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
-              echo "Setting up the TestApp for LiteInterpreter"
-              ruby setup.rb --lite 1
-            else
-              echo "Setting up the TestApp for Full JIT"
-              ruby setup.rb
-            fi
+            python trace_model.py
+            ruby setup.rb
            cd ${PROJ_ROOT}/ios/TestApp
            instruments -s -devices
-            if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
-              if [ ${USE_COREML_DELEGATE} == 1 ]; then
-                fastlane scan --only_testing TestAppTests/TestAppTests/testCoreML
-              else
-                fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter
-              fi
-            else
-              fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT
-            fi
+            fastlane scan
  pytorch_linux_bazel_build:
    <<: *pytorch_params
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    steps:
    - checkout
    - calculate_docker_image_tag
@ -580,9 +513,9 @@
          time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
          export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})

-          echo "Do NOT merge main branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
+          echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"

-          git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
+          git submodule sync && git submodule update -q --init --recursive

          docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace

@ -593,7 +526,7 @@
          # Push intermediate Docker image for next phase to use
          if [ -z "${BUILD_ONLY}" ]; then
            # Augment our output image name with bazel to avoid collisions
-            output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
+            output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
            export COMMIT_DOCKER_IMAGE=$output_image
            docker commit "$id" ${COMMIT_DOCKER_IMAGE}
            time docker push ${COMMIT_DOCKER_IMAGE}
@ -602,7 +535,7 @@
  pytorch_linux_bazel_test:
    <<: *pytorch_params
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    steps:
    - checkout
    - calculate_docker_image_tag
@ -613,7 +546,7 @@
        no_output_timeout: "90m"
        command: |
          set -e
-          output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
+          output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
          export COMMIT_DOCKER_IMAGE=$output_image
          echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}

@ -643,26 +576,13 @@
    - store_test_results:
        path: bazel-testlogs

-  pytorch_windows_test_multigpu:
-    machine:
-      image: ubuntu-2004:202104-01
-    steps:
-      - checkout
-      - run:
-          name: Test
-          no_output_timeout: "90m"
-          command: |
-            set -e
-            python3 -m pip install requests
-            python3 ./.circleci/scripts/trigger_azure_pipeline.py
-
  pytorch_doc_test:
    environment:
      BUILD_ENVIRONMENT: pytorch-doc-test
-      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
+      DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4"
    resource_class: medium
    machine:
-      image: ubuntu-2004:202104-01
+      image: ubuntu-1604:202007-01
    steps:
    - checkout
    - calculate_docker_image_tag
@ -673,7 +593,7 @@
        no_output_timeout: "30m"
        command: |
          set -ex
-          export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
+          export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
          echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
          time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
          export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
--- a/.circleci/verbatim-sources/job-specs/pytorch-job-specs.yml
+++ b/.circleci/verbatim-sources/job-specs/pytorch-job-specs.yml
@ -0,0 +1,350 @@
+jobs:
+  pytorch_linux_build:
+    <<: *pytorch_params
+    machine:
+      image: ubuntu-1604:202007-01
+    steps:
+    # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
+    - checkout
+    - calculate_docker_image_tag
+    - setup_linux_system_environment
+    - optional_merge_target_branch
+    - setup_ci_environment
+    - run:
+        name: Build
+        no_output_timeout: "1h"
+        command: |
+          set -e
+          if [[ "${DOCKER_IMAGE}" == *rocm3.9* ]]; then
+            export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
+          fi
+          if [[ ${BUILD_ENVIRONMENT} == *"pure_torch"* ]]; then
+            echo 'BUILD_CAFFE2=OFF' >> "${BASH_ENV}"
+          fi
+          if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
+            echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}"
+            echo 'USE_TBB=1' >> "${BASH_ENV}"
+          elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
+            echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}"
+          fi
+          echo "Parallel backend flags: "${PARALLEL_FLAGS}
+          # Pull Docker image and run build
+          echo "DOCKER_IMAGE: "${DOCKER_IMAGE}:${DOCKER_TAG}
+          time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
+          export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
+
+          git submodule sync && git submodule update -q --init --recursive
+
+          docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
+
+          export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1'
+
+          echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
+
+          # Copy dist folder back
+          docker cp $id:/var/lib/jenkins/workspace/dist /home/circleci/project/. || echo "Dist folder not found"
+
+          # Push intermediate Docker image for next phase to use
+          if [ -z "${BUILD_ONLY}" ]; then
+            # Note [Special build images]
+            # The xla build uses the same docker image as
+            # pytorch_linux_bionic_py3_6_clang9_build. In the push step, we have to
+            # distinguish between them so the test can pick up the correct image.
+            output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
+            if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-xla
+            elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-libtorch
+            elif [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-paralleltbb
+            elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-parallelnative
+            elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_64"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-android-x86_64
+            elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v7a"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v7a
+            elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v8a"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-android-arm-v8a
+            elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_32"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-android-x86_32
+            elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-vulkan-x86_32"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-android-vulkan-x86_32
+            elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-vulkan
+            else
+              export COMMIT_DOCKER_IMAGE=$output_image
+            fi
+            docker commit "$id" ${COMMIT_DOCKER_IMAGE}
+            time docker push ${COMMIT_DOCKER_IMAGE}
+          fi
+    - store_artifacts:
+        path: /home/circleci/project/dist
+
+  pytorch_linux_test:
+    <<: *pytorch_params
+    machine:
+      image: ubuntu-1604:202007-01
+    steps:
+    # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
+    - checkout
+    - calculate_docker_image_tag
+    - setup_linux_system_environment
+    - setup_ci_environment
+    - run:
+        name: Download Docker image
+        no_output_timeout: "90m"
+        command: |
+          set -e
+          export PYTHONUNBUFFERED=1
+          if [[ "${DOCKER_IMAGE}" == *rocm3.9* ]]; then
+            export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
+          fi
+          # See Note [Special build images]
+          output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
+          if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
+            export COMMIT_DOCKER_IMAGE=$output_image-xla
+          elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
+            export COMMIT_DOCKER_IMAGE=$output_image-libtorch
+          elif [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
+            export COMMIT_DOCKER_IMAGE=$output_image-paralleltbb
+          elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
+            export COMMIT_DOCKER_IMAGE=$output_image-parallelnative
+          elif [[ ${BUILD_ENVIRONMENT} == *"vulkan-linux"* ]]; then
+            export COMMIT_DOCKER_IMAGE=$output_image-vulkan
+          else
+            export COMMIT_DOCKER_IMAGE=$output_image
+          fi
+          echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
+
+          if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
+            echo 'ATEN_THREADING=TBB' >> "${BASH_ENV}"
+            echo 'USE_TBB=1' >> "${BASH_ENV}"
+          elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
+            echo 'ATEN_THREADING=NATIVE' >> "${BASH_ENV}"
+          fi
+          echo "Parallel backend flags: "${PARALLEL_FLAGS}
+
+          time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
+
+          # TODO: Make this less painful
+          if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
+            export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --gpus all --shm-size=2g -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
+          elif [[ ${BUILD_ENVIRONMENT} == *"rocm"* ]]; then
+            hostname
+            export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=8g --ipc=host --device /dev/kfd --device /dev/dri --group-add video -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
+          else
+            export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=1g --ipc=host -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
+          fi
+          echo "id=${id}" >> "${BASH_ENV}"
+
+    - run:
+        name: Check for no AVX instruction by default
+        no_output_timeout: "20m"
+        command: |
+          set -e
+          is_vanilla_build() {
+            if [ "${BUILD_ENVIRONMENT}" == "pytorch-linux-bionic-py3.6-clang9-test" ]; then
+              return 0
+            fi
+            if [ "${BUILD_ENVIRONMENT}" == "pytorch-linux-xenial-py3.6-gcc5.4-test" ]; then
+              return 0
+            fi
+            return 1
+          }
+
+          if is_vanilla_build; then
+            echo "apt-get update && apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash
+            echo "cd workspace/build; qemu-x86_64 -g 2345 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default ./bin/basic --gtest_filter=BasicTest.BasicTestCPU & gdb ./bin/basic -ex 'set pagination off' -ex 'target remote :2345' -ex 'continue' -ex 'bt' -ex='set confirm off' -ex 'quit \$_isvoid(\$_exitcode)'" | docker exec -u jenkins -i "$id" bash
+          else
+            echo "Skipping for ${BUILD_ENVIRONMENT}"
+          fi
+    - run:
+        name: Run tests
+        no_output_timeout: "90m"
+        command: |
+          set -e
+
+          cat >docker_commands.sh \<<EOL
+          # =================== The following code will be executed inside Docker container ===================
+          set -ex
+          export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
+          ${PARALLEL_FLAGS}
+          cd workspace
+          EOL
+          if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
+            echo ".jenkins/pytorch/multigpu-test.sh" >> docker_commands.sh
+          elif [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then
+            echo "pip install click mock tabulate networkx==2.0" >> docker_commands.sh
+            echo "pip -q install --user \"file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx\"" >> docker_commands.sh
+            echo ".jenkins/caffe2/test.sh" >> docker_commands.sh
+          else
+            echo ".jenkins/pytorch/test.sh" >> docker_commands.sh
+          fi
+          echo "(cat docker_commands.sh | docker exec -u jenkins -i "$id" bash) 2>&1" > command.sh
+          unbuffer bash command.sh | ts
+    - run:
+        name: Report results
+        no_output_timeout: "5m"
+        command: |
+          set -e
+          docker stats --all --no-stream
+
+          cat >docker_commands.sh \<<EOL
+          # =================== The following code will be executed inside Docker container ===================
+          set -ex
+          export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}
+          export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
+          export CIRCLE_TAG="${CIRCLE_TAG:-}"
+          export CIRCLE_SHA1="$CIRCLE_SHA1"
+          export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
+          export CIRCLE_BRANCH="$CIRCLE_BRANCH"
+          export CIRCLE_JOB="$CIRCLE_JOB"
+          export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
+          cd workspace
+          python test/print_test_stats.py --upload-to-s3 test
+          EOL
+          echo "(cat docker_commands.sh | docker exec -u jenkins -i "$id" bash) 2>&1" > command.sh
+          unbuffer bash command.sh | ts
+
+          echo "Retrieving test reports"
+          docker cp $id:/var/lib/jenkins/workspace/test/test-reports ./ || echo 'No test reports found!'
+          if [[ ${BUILD_ENVIRONMENT} == *"coverage"* ]]; then
+              echo "Retrieving C++ coverage report"
+              docker cp $id:/var/lib/jenkins/workspace/build/coverage.info ./test
+          fi
+          if [[ ${BUILD_ENVIRONMENT} == *"coverage"* || ${BUILD_ENVIRONMENT} == *"onnx"* ]]; then
+              echo "Retrieving Python coverage report"
+              docker cp $id:/var/lib/jenkins/workspace/test/.coverage ./test
+              docker cp $id:/var/lib/jenkins/workspace/test/coverage.xml ./test
+              python3 -mpip install codecov
+              python3 -mcodecov
+          fi
+        when: always
+    - store_test_results:
+        path: test-reports
+
+  pytorch_windows_build:
+    <<: *pytorch_windows_params
+    parameters:
+      executor:
+        type: string
+        default: "windows-xlarge-cpu-with-nvidia-cuda"
+      build_environment:
+        type: string
+        default: ""
+      test_name:
+        type: string
+        default: ""
+      cuda_version:
+        type: string
+        default: "10.1"
+      python_version:
+        type: string
+        default: "3.6"
+      vc_version:
+        type: string
+        default: "14.16"
+      vc_year:
+        type: string
+        default: "2019"
+      vc_product:
+        type: string
+        default: "BuildTools"
+      use_cuda:
+        type: string
+        default: ""
+    executor: <<parameters.executor>>
+    steps:
+      - checkout
+      - run:
+          name: Install Cuda
+          no_output_timeout: 30m
+          command: |
+            if [[ "${USE_CUDA}" == "1" ]]; then
+              .circleci/scripts/windows_cuda_install.sh
+            fi
+      - run:
+          name: Install Cudnn
+          command : |
+            if [[ "${USE_CUDA}" == "1" ]]; then
+              .circleci/scripts/windows_cudnn_install.sh
+            fi
+      - run:
+          name: Build
+          no_output_timeout: "90m"
+          command: |
+            set -e
+            set +x
+            export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
+            export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
+            set -x
+            .jenkins/pytorch/win-build.sh
+      - persist_to_workspace:
+          root: "C:/w"
+          paths: build-results
+      - store_artifacts:
+          path: C:/w/build-results
+
+  pytorch_windows_test:
+    <<: *pytorch_windows_params
+    parameters:
+      executor:
+        type: string
+        default: "windows-medium-cpu-with-nvidia-cuda"
+      build_environment:
+        type: string
+        default: ""
+      test_name:
+        type: string
+        default: ""
+      cuda_version:
+        type: string
+        default: "10.1"
+      python_version:
+        type: string
+        default: "3.6"
+      vc_version:
+        type: string
+        default: "14.16"
+      vc_year:
+        type: string
+        default: "2019"
+      vc_product:
+        type: string
+        default: "BuildTools"
+      use_cuda:
+        type: string
+        default: ""
+    executor: <<parameters.executor>>
+    steps:
+      - checkout
+      - attach_workspace:
+          at: c:/users/circleci/workspace
+      - run:
+          name: Install Cuda
+          no_output_timeout: 30m
+          command: |
+            if [[ "${CUDA_VERSION}" != "cpu" ]]; then
+              if [[ "${CUDA_VERSION}" != "10" || "${JOB_EXECUTOR}" != "windows-with-nvidia-gpu" ]]; then
+                .circleci/scripts/windows_cuda_install.sh
+              fi
+            fi
+      - run:
+          name: Install Cudnn
+          command : |
+            if [[ "${CUDA_VERSION}" != "cpu" ]]; then
+              .circleci/scripts/windows_cudnn_install.sh
+            fi
+      - run:
+          name: Test
+          no_output_timeout: "30m"
+          command: |
+            set -e
+            export IN_CI=1
+            set +x
+            export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
+            export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
+            set -x
+            .jenkins/pytorch/win-test.sh
+      - store_test_results:
+          path: test/test-reports
--- a/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
+++ b/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
@ -26,7 +26,6 @@
 # (smoke tests and upload jobs do not need the pytorch repo).
 binary_checkout: &binary_checkout
  name: Checkout pytorch/builder repo
-  no_output_timeout: "30m"
  command: .circleci/scripts/binary_checkout.sh

 # Parses circleci arguments in a consistent way, essentially routing to the
--- a/.circleci/verbatim-sources/workflows/workflows-ecr-gc.yml
+++ b/.circleci/verbatim-sources/workflows/workflows-ecr-gc.yml
@ -0,0 +1,34 @@
+  ecr_gc:
+    triggers:
+      - schedule:
+          cron: "45 * * * *"
+          filters:
+            branches:
+              only:
+                - master
+    jobs:
+      - docker_for_ecr_gc_build_job
+      - ecr_gc_job:
+            name: ecr_gc_job_for_pytorch
+            project: pytorch
+            tags_to_keep: "271,262,256,278,282,291,300,323,327,347,389,401,402,403,405,a8006f9a-272d-4478-b137-d121c6f05c83,6e7b11da-a919-49e5-b2ba-da66e3d4bb0a,f990c76a-a798-42bb-852f-5be5006f8026,e43973a9-9d5a-4138-9181-a08a0fc55e2f,8fcf46ef-4a34-480b-a8ee-b0a30a4d3e59,9a3986fa-7ce7-4a36-a001-3c9bef9892e2,1bc00f11-e0f3-4e5c-859f-15937dd938cd,209062ef-ab58-422a-b295-36c4eed6e906,be76e8fd-44e2-484d-b090-07e0cc3a56f0,fff7795428560442086f7b2bb6004b65245dc11a,ab1632df-fa59-40e6-8c23-98e004f61148"
+            requires:
+              - docker_for_ecr_gc_build_job
+      - ecr_gc_job:
+            name: ecr_gc_job_for_caffe2
+            project: caffe2
+            tags_to_keep: "376,373,369,348,345,336,325,324,315,306,301,287,283,276,273,266,253,248,238,230,213"
+            requires:
+              - docker_for_ecr_gc_build_job
+      - ecr_gc_job:
+            name: ecr_gc_job_for_translate
+            project: translate
+            tags_to_keep: "8"
+            requires:
+              - docker_for_ecr_gc_build_job
+      - ecr_gc_job:
+            name: ecr_gc_job_for_tensorcomp
+            project: tensorcomp
+            tags_to_keep: "34"
+            requires:
+              - docker_for_ecr_gc_build_job
--- a/.circleci/verbatim-sources/workflows/workflows-promote.yml
+++ b/.circleci/verbatim-sources/workflows/workflows-promote.yml
@ -0,0 +1,46 @@
+  # Promotion workflow
+  promote:
+    jobs:
+      # Requires manual approval by someone in org-member
+      # CircleCI security context
+      - promote_approval:
+          context: org-member
+          filters:
+            branches:
+              ignore: /.*/
+            tags:
+              only: /v[0-9]+(\.[0-9]+)*/
+          type: approval
+      - promote_s3:
+          context: org-member
+          filters:
+            branches:
+              ignore: /.*/
+            tags:
+              only: /v[0-9]+(\.[0-9]+)*/
+          name: promote_s3_libtorch
+          package_name: libtorch
+          requires:
+            - promote_approval
+      - promote_s3:
+          context: org-member
+          filters:
+            branches:
+              ignore: /.*/
+            tags:
+              only: /v[0-9]+(\.[0-9]+)*/
+          name: promote_s3_torch
+          package_name: torch
+          requires:
+            - promote_approval
+      - promote_conda:
+          context: org-member
+          filters:
+            branches:
+              ignore: /.*/
+            tags:
+              only: /v[0-9]+(\.[0-9]+)*/
+          name: promote_conda_pytorch
+          package_name: pytorch
+          requires:
+            - promote_approval
--- a/.circleci/windows-jni/include/jni.h
+++ b/.circleci/windows-jni/include/jni.h
@ -1129,3 +1129,4 @@ JNIEXPORT void JNI_OnUnload(JavaVM* vm, void* reserved);
 #define JNI_ABORT       2           /* free buffer w/o copying back */

 #endif  /* JNI_H_ */
+
--- a/.clang-tidy
+++ b/.clang-tidy
@ -6,12 +6,8 @@ bugprone-*,
 -bugprone-forward-declaration-namespace,
 -bugprone-macro-parentheses,
 -bugprone-lambda-function-name,
-bugprone-reserved-identifier,
 cppcoreguidelines-*,
-cppcoreguidelines-avoid-magic-numbers,
-cppcoreguidelines-avoid-non-const-global-variables,
 -cppcoreguidelines-interfaces-global-init,
-cppcoreguidelines-macro-usage,
 -cppcoreguidelines-owning-memory,
 -cppcoreguidelines-pro-bounds-array-to-pointer-decay,
 -cppcoreguidelines-pro-bounds-constant-array-index,
@ -22,7 +18,6 @@ cppcoreguidelines-*,
 -cppcoreguidelines-pro-type-union-access,
 -cppcoreguidelines-pro-type-vararg,
 -cppcoreguidelines-special-member-functions,
-cppcoreguidelines-non-private-member-variables-in-classes,
 -facebook-hte-RelativeInclude,
 hicpp-exception-baseclass,
 hicpp-avoid-goto,
@ -33,13 +28,10 @@ modernize-*,
 -modernize-use-default-member-init,
 -modernize-use-using,
 -modernize-use-trailing-return-type,
-modernize-use-nodiscard,
 performance-*,
 -performance-noexcept-move-constructor,
-performance-unnecessary-value-param,
 '
-HeaderFilterRegex: 'torch/csrc/(?!deploy/interpreter/cpython).*'
+HeaderFilterRegex: 'torch/csrc/.*'
 AnalyzeTemporaryDtors: false
-WarningsAsErrors: '*'
 CheckOptions:
 ...
--- a/.coveragerc
+++ b/.coveragerc
@ -1,15 +0,0 @@
-[run]
-plugins =
-    coverage_plugins.jit_plugin
-omit =
-    */tmp*
-    */Temp/*
-    */usr/local/lib*
-    *test/*
-
-[report]
-omit =
-    */tmp*
-    */Temp/*
-    */usr/local/lib*
-    *test/*
--- a/.flake8
+++ b/.flake8
@ -4,7 +4,7 @@ max-line-length = 120
 # C408 ignored because we like the dict keyword argument syntax
 # E501 is not flexible enough, we're using B950 instead
 ignore =
-    E203,E305,E402,E501,E721,E741,F405,F821,F841,F999,W503,W504,C408,E302,W291,E303,
+    E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,W291,E303,
    # shebang has extra meaning in fbcode lints, so I think it's not worth trying
    # to line this up with executable bit
    EXE001,
@ -13,19 +13,21 @@ ignore =
    # these ignores are from flake8-comprehensions; please fix!
    C400,C401,C402,C403,C404,C405,C407,C411,C413,C414,C415
 per-file-ignores = __init__.py: F401 torch/utils/cpp_extension.py: B950
-optional-ascii-coding = True
 exclude =
-    ./.git,
-    ./build_test_custom_build,
-    ./build,
-    ./caffe2,
-    ./docs/caffe2,
-    ./docs/cpp/src,
-    ./docs/src,
-    ./scripts,
-    ./test/generated_type_hints_smoketest.py,
-    ./third_party,
-    ./torch/include,
-    ./torch/lib,
-    ./venv,
-    *.pyi
+    docs/src,
+    docs/cpp/src,
+    venv,
+    third_party,
+    caffe2,
+    scripts,
+    docs/caffe2,
+    torch/lib/include,
+    torch/lib/tmp_install,
+    build,
+    torch/include,
+    *.pyi,
+    .git,
+    build,
+    build_test_custom_build,
+    build_code_analyzer,
+    test/generated_type_hints_smoketest.py
--- a/.gdbinit
+++ b/.gdbinit
@ -1,14 +0,0 @@
-# automatically load the pytoch-gdb extension.
-#
-# gdb automatically tries to load this file whenever it is executed from the
-# root of the pytorch repo, but by default it is not allowed to do so due to
-# security reasons. If you want to use pytorch-gdb, please add the following
-# line to your ~/.gdbinit (i.e., the .gdbinit file which is in your home
-# directory, NOT this file):
-#    add-auto-load-safe-path /path/to/pytorch/.gdbinit
-#
-# Alternatively, you can manually load the pytorch-gdb commands into your
-# existing gdb session by doing the following:
-#    (gdb) source /path/to/pytorch/tools/gdb/pytorch-gdb.py
-
-source tools/gdb/pytorch-gdb.py
--- a/.git-blame-ignore-revs
+++ b/.git-blame-ignore-revs
@ -1,24 +0,0 @@
-# 2020-11-12 Enabled ShellCheck on `.jenkins/pytorch`
-65d5004b09fd8d5deac173a3aaa259f46eaa0d67
-# 2021-01-20 Replaced `   ` with `...` in many doctests
-c147aa306c6386a753fdff24b48d04e803070a63
-# 2021-03-05 Removed all trailing whitespace
-8c798e062216278673a75bac0848ea69a8bd3f03
-# 2021-03-30 Normalized trailing newlines
-5bcbbf537327f6e8328289c25a3a453a2444d984
-# 2021-03-31 Autogenerated Markdown ToCs
-a74b10def961ab090385f291ee06e66db99c1a2f
-# 2021-04-02 Enabled more ShellCheck warnings
-09670c7d43b9abce862a6bf71d8cc89e64764bdb
-# 2021-04-08 Removed all non-breaking spaces
-cc11aaaa60aadf28e3ec278bce26a42c1cd68a4f
-# 2021-04-13 Expanded many wildcard imports
-4753100a3baa96273204c361c8452afb7b59836f
-# 2021-04-19 Removed all unqualified `noqa`
-e3900d2ba5c9f91a24a9ce34520794c8366d5c54
-# 2021-04-21 Removed all unqualified `type: ignore`
-75024e228ca441290b6a1c2e564300ad507d7af6
-# 2021-05-14 Removed all versionless Python shebangs
-2e26976ad3b06ce95dd6afccfdbe124802edf28f
-# 2021-06-07 Strictly typed everything in `.github` and `tools`
-737d920b21db9b4292d056ee1329945990656304
--- a/.gitattributes
+++ b/.gitattributes
@ -1,6 +1 @@
-*.bat text eol=crlf
-.circleci/config.yml linguist-generated=true
-.github/workflows/generated-*.yml linguist-generated=true
-.github/generated-* linguist-generated=true
-.github/scripts/gql_mocks.json linguist-generated=true
-third_party/LICENSES_BUNDLED.txt linguist-generated=true
+*.bat	text eol=crlf
--- a/.github/ISSUE_TEMPLATE/bug-report.md
+++ b/.github/ISSUE_TEMPLATE/bug-report.md
@ -0,0 +1,49 @@
+---
+name: "\U0001F41B Bug Report"
+about: Submit a bug report to help us improve PyTorch
+
+---
+
+## 🐛 Bug
+
+<!-- A clear and concise description of what the bug is. -->
+
+## To Reproduce
+
+Steps to reproduce the behavior:
+
+1.
+1.
+1.
+
+<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->
+
+## Expected behavior
+
+<!-- A clear and concise description of what you expected to happen. -->
+
+## Environment
+
+Please copy and paste the output from our
+[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
+(or fill out the checklist below manually).
+
+You can get the script and run it with:
+```
+wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
+# For security purposes, please check the contents of collect_env.py before running it.
+python collect_env.py
+```
+
+ - PyTorch Version (e.g., 1.0):
+ - OS (e.g., Linux):
+ - How you installed PyTorch (`conda`, `pip`, source):
+ - Build command you used (if compiling from source):
+ - Python version:
+ - CUDA/cuDNN version:
+ - GPU models and configuration:
+ - Any other relevant information:
+
+## Additional context
+
+<!-- Add any other context about the problem here. -->
--- a/.github/ISSUE_TEMPLATE/bug-report.yml
+++ b/.github/ISSUE_TEMPLATE/bug-report.yml
@ -1,56 +0,0 @@
-name: 🐛 Bug Report
-description: Create a report to help us reproduce and fix the bug
-
-body:
- type: markdown
-  attributes:
-    value: >
-      #### Before submitting a bug, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/pytorch/pytorch/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
-  attributes:
-    label: 🐛 Describe the bug
-    description: |
-      Please provide a clear and concise description of what the bug is.
-
-      If relevant, add a minimal example so that we can reproduce the error by running the code. It is very important for the snippet to be as succinct (minimal) as possible, so please take time to trim down any irrelevant code to help us debug efficiently. We are going to copy-paste your code and we expect to get the same result as you did: avoid any external data, and include the relevant imports, etc. For example:
-
-      ```python
-      # All necessary imports at the beginning
-      import torch
-
-      # A succinct reproducing example trimmed down to the essential parts:
-      t = torch.rand(5, 10)  # Note: the bug is here, we should pass requires_grad=True
-      t.sum().backward()
-      ```
-
-      If the code is too long (hopefully, it isn't), feel free to put it in a public gist and link it in the issue: https://gist.github.com.
-
-      Please also paste or describe the results you observe instead of the expected results. If you observe an error, please paste the error message including the **full** traceback of the exception. It may be relevant to wrap error messages in ```` ```triple quotes blocks``` ````.
-    placeholder: |
-      A clear and concise description of what the bug is.
-
-      ```python
-      # Sample code to reproduce the problem
-      ```
-
-      ```
-      The error message you got, with the full traceback.
-      ```
-  validations:
-    required: true
- type: textarea
-  attributes:
-    label: Versions
-    description: |
-      Please run the following and paste the output below.
-      ```sh
-      wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
-      # For security purposes, please check the contents of collect_env.py before running it.
-      python collect_env.py
-      ```
-  validations:
-    required: true
- type: markdown
-  attributes:
-    value: >
-      Thanks for contributing 🎉!
Author	SHA1	Message	Date
Jeff Daily	37c1f4a7fe	Fix hipify_python (#52756 ) Co-authored-by: rraminen <rraminen@amd.com> Co-authored-by: Nikita Shulga <nshulga@fb.com>	2021-02-26 14:13:54 -08:00
Rong Rong	49b74a52a4	Catch Flake8 error codes with multiple letters (#52750 ) (#52801 ) Summary: The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future. Tagging the following people to ask what to do to fix these `EXE002` warnings: - https://github.com/pytorch/pytorch/issues/50629 authored by jaglinux, approved by rohan-varma - `test/distributed/test_c10d.py` - https://github.com/pytorch/pytorch/issues/51262 authored by glaringlee, approved by ejguan - `torch/utils/data/datapipes/__init__.py` - `torch/utils/data/datapipes/iter/loadfilesfromdisk.py` - `torch/utils/data/datapipes/iter/listdirfiles.py` - `torch/utils/data/datapipes/iter/__init__.py` - `torch/utils/data/datapipes/utils/__init__.py` - `torch/utils/data/datapipes/utils/common.py` - https://github.com/pytorch/pytorch/issues/51398 authored by glaringlee, approved by ejguan - `torch/utils/data/datapipes/iter/readfilesfromtar.py` - https://github.com/pytorch/pytorch/issues/51599 authored by glaringlee, approved by ejguan - `torch/utils/data/datapipes/iter/readfilesfromzip.py` - https://github.com/pytorch/pytorch/issues/51704 authored by glaringlee, approved by ejguan - `torch/utils/data/datapipes/iter/routeddecoder.py` - `torch/utils/data/datapipes/utils/decoder.py` - https://github.com/pytorch/pytorch/issues/51709 authored by glaringlee, approved by ejguan - `torch/utils/data/datapipes/iter/groupbykey.py` Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? 
And if the latter, which shebang? Pull Request resolved: https://github.com/pytorch/pytorch/pull/52750 Test Plan: The Lint / flake8-py3 job in GitHub Actions: - [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly - [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings - [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in https://github.com/pytorch/pytorch/issues/52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix Reviewed By: walterddr, janeyx99 Differential Revision: D26637307 Pulled By: samestep fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804 Co-authored-by: Sam Estep <sestep@fb.com>	2021-02-26 07:49:51 -08:00
Neeraj Pradhan	11c78e9cb3	Expose documentation for LKJCholesky distribution (#52904 ) This is already added to the master branch in https://github.com/pytorch/pytorch/pull/52763.	2021-02-26 07:47:29 -08:00
X Wang	d6943ea58d	apply diff 52351 (#52649 )	2021-02-23 07:51:38 -08:00
Nikita Shulga	02b61b49ea	[1.8] Update XNNPACK (#52647 ) Cherry-pick `55d53a4e70` into release/1.8 branch	2021-02-23 05:31:57 -08:00
Luca Wehrstedt	d553478c98	[v1.8] Make TensorPipe work around bug in old versions of libibverbs (#52615 ) The bug affects PyTorch users who meet two conditions: - they have an old version of libibverbs installed (the userspace library), namely older than v25, which dates from Jul 29, 2019; - but they do _not_ have an InfiniBand kernel module loaded. In those cases they will experience a crash (uncaught exception) happening when initializing RPC, mentioning an "unknown error -38". There is a workaround, which is for those users to activate a killswitch (which is private and undocumented) to disable the `ibv` backend of TensorPipe.	2021-02-22 16:55:12 -08:00
Nikita Shulga	63333e2a25	[1.8] Update api doc for enabling TcpStore on Windows (#52601 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/51847 Reviewed By: albanD Differential Revision: D26405678 Pulled By: malfet fbshipit-source-id: 073b675225b48d1732771583f8f2473e0fdcf35c Co-authored-by: Joe Zhu <jozh@microsoft.com>	2021-02-22 10:14:09 -08:00
Bowen Bao	8e7eebfc9a	[1.8] Fix onnx mixed precision export for layernorm & fuseLogSoftmaxNllLoss (#52510 ) Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>	2021-02-19 14:40:53 -08:00
Eli Uriegas	f8afb8bdd0	[v1.8.0] Various CUDA 11.1 with BUILD_SPLIT_CUDA_FIXES (#52518 ) Co-authored-by: Nikita Shulga <nshulga@fb.com> Co-authored-by: peterjc123 <peterghost86@gmail.com> Co-authored-by: Jane Xu <janeyx@fb.com>	2021-02-19 12:41:21 -08:00
eellison	0851cc42b0	Update freezing API - changes from 52337 (#52392 ) Co-authored-by: eellison <eellison@fb.com>	2021-02-18 15:36:51 -08:00
Jane (Yuan) Xu	804f7b6018	Add arm64 binary build (#52443 ) (#52469 ) Summary: This is getting tested by https://github.com/pytorch/pytorch/issues/52441. Adds new config for macos arm64 to our binary builds. Now stores artifacts for mac builds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52443 Reviewed By: walterddr Differential Revision: D26517330 Pulled By: janeyx99 fbshipit-source-id: 02774937a827bdd4c08486dc9f8fe63446917f1e	2021-02-18 15:17:27 -08:00
SplitInfinity	32758d30b3	onnx export of per channel fake quantize functions (#42835 ) (#52430 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/39502 This PR adds support for exporting fake_quantize_per_channel_affine to a pair of QuantizeLinear and DequantizeLinear. Per tensor support was added by PR https://github.com/pytorch/pytorch/pull/39738. `axis` attribute of QuantizeLinear and DequantizeLinear, which is required for per channel support, is added in opset13 added by https://github.com/onnx/onnx/pull/2772. [update 1/20/2021]: opset13 is being supported on master, the added function is now properly tested. Code also rebased to new master. The function is also tested offline with the following code ```python import torch from torch import quantization from torchvision import models qat_resnet18 = models.resnet18(pretrained=True).eval().cuda() qat_resnet18.qconfig = quantization.QConfig( activation=quantization.default_fake_quant, weight=quantization.default_per_channel_weight_fake_quant) quantization.prepare_qat(qat_resnet18, inplace=True) qat_resnet18.apply(quantization.enable_observer) qat_resnet18.apply(quantization.enable_fake_quant) dummy_input = torch.randn(16, 3, 224, 224).cuda() _ = qat_resnet18(dummy_input) for module in qat_resnet18.modules(): if isinstance(module, quantization.FakeQuantize): module.calculate_qparams() qat_resnet18.apply(quantization.disable_observer) qat_resnet18.cuda() input_names = [ "actual_input_1" ] output_names = [ "output1" ] torch.onnx.export(qat_resnet18, dummy_input, "quant_model.onnx", verbose=True, opset_version=13) ``` It can generate the desired graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42835 Reviewed By: houseroad Differential Revision: D26293823 Pulled By: SplitInfinity fbshipit-source-id: 300498a2e24b7731b12fa2fbdea4e73dde80e7ea Co-authored-by: Hao Wu <skyw@users.noreply.github.com>	2021-02-18 12:50:40 -08:00
gchanan	bcb64a8084	Fix upsample bicubic2d batching handling on CPU. (#52389 ) (#52445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52389 Fixes: https://github.com/pytorch/pytorch/issues/49159 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D26496319 Pulled By: gchanan fbshipit-source-id: d385cd683ef09e0596a9875ce84d03e6e77acc93	2021-02-18 12:46:39 -08:00
albanD	f07991d396	update symeig backward note about similar eigenvalues (#52311 ) (#52446 ) Summary: First part of https://github.com/pytorch/pytorch/issues/49886 to at least properly warn users of the current state Pull Request resolved: https://github.com/pytorch/pytorch/pull/52311 Reviewed By: soulitzer Differential Revision: D26495644 Pulled By: albanD fbshipit-source-id: 72abdfe41cdbcc1ac739a536eb85d1aa4ba90897	2021-02-18 12:45:47 -08:00
Eli Uriegas	c458cd4852	[v1.8.0] .circleci: Downgrade CUDA 11.2 -> 11.1 for binaries (#52151 ) (#52406 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52151 CUDA 11.2 might not be as performant as we thought so let's downgrade to something we think is more performant. Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D26408314 Pulled By: seemethere fbshipit-source-id: e2446aa0115e2c2a79718b1fdfd9fccf2072822d (cherry picked from commit a11650b069729997b002032d70e9793477147851) Signed-off-by: Eli Uriegas <eliuriegas@fb.com>	2021-02-18 10:59:03 -08:00
Nikita Shulga	f7c4afc0f4	[cmake] Add explicit cublas->cudart dependency (#52243 ) (#52404 ) Summary: Necessary to ensure correct link order, especially if libraries are linked statically. Otherwise, one might run into: ``` /usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0' /usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/52243 Reviewed By: seemethere, ngimel Differential Revision: D26437159 Pulled By: malfet fbshipit-source-id: 33b8bb5040bda10537833f3ad737f535488452ea	2021-02-17 16:07:41 -08:00
Richard Zou	20554c00b6	[1.8] Remove torch.vmap (#52397 ) torch.vmap is a prototype feature and should not be in the stable binary. This PR: - Removes the `torch.vmap` API - Removes the documentation entry for torch.vmap - Changes the vmap tests to use an internal API instead of torch.vmap. Test Plan: - Tested locally (test_torch, test_autograd, test_type_hints, test_vmap), but also wait for CI.	2021-02-17 16:05:34 -08:00
Nikita Shulga	3464d64f08	[1.8] Fix libnvrtc discoverability in package patched by `auditwheel` (#52365 )	2021-02-17 16:05:05 -08:00
Vitaly Fedyunin	c6972eb3ac	Skip OneDNN Convolution in case of groups = 24 #50042 (#52313 ) Co-authored-by: Vitaly Fedyunin <vitaly.fedyunin@gmail.com>	2021-02-17 16:04:26 -08:00
Rohan Varma	25562d3d41	Use side-stream in CPU to GPU copies in DDP (#50180 ) (#52270 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50180 Resolves the regression in https://github.com/pytorch/pytorch/issues/49819 by adding copy over background stream similar to scatter. For internal use cases, this is gated with an env var that maintains the previous behavior when it is off. Test Plan: CI Reviewed By: mrshenli, ngimel Differential Revision: D25818170 fbshipit-source-id: e50c76c035504b2a44e2be084701cee45c90df75	2021-02-17 09:49:30 -08:00
Mike Ruberry	cd63c37bc6	ports fix (#52242 ) Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>	2021-02-13 17:59:51 -08:00
Yi Wang	c79decdbba	[v1.8 patch] [Resubmission] Add a documentation page for DDP communication hooks (#52215 ) Co-authored-by: wayi <wayi@devgpu238.prn2.facebook.com>	2021-02-12 16:37:23 -08:00
Nikita Shulga	c307a3f336	[1.8] Do not print warning if CUDA driver not found (#51806 ) (#52050 ) Summary: It frequently happens when PyTorch compiled with CUDA support is installed on machine that does not have NVIDIA GPUs. Fixes https://github.com/pytorch/pytorch/issues/47038 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51806 Reviewed By: ezyang Differential Revision: D26285827 Pulled By: malfet fbshipit-source-id: 9fd5e690d0135a2b219c1afa803fb69de9729f5e	2021-02-12 12:20:46 -08:00
Nikita Shulga	f071020756	Workaround arm64 gcc error in `std::copysign` (#51900 ) (#52049 ) Summary: Move definition of copysign template and specialization for bfloat16/half types before first use of copysign in that file Add comment explaining why this is necessary Fixes https://github.com/pytorch/pytorch/issues/51889 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51900 Reviewed By: walterddr Differential Revision: D26321741 Pulled By: malfet fbshipit-source-id: 888858b11d9708fa140fe9c0570cc5a24599205b	2021-02-12 08:00:46 -08:00
Vasiliy Kuznetsov	4f436f8570	fake_quant cachemask: remove Python bindings (#51878 ) (#52160 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51878 `fake_quantize_per_tensor_affine_cachemask` and `fake_quantize_per_channel_affine_cachemask` are implementation details of `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`, removing the Python bindings for them since there is no need to expose them. Test Plan: ``` python test/test_quantization.py TestFakeQuantize ``` Imported from OSS Reviewed By: albanD, bugra Differential Revision: D26314173 fbshipit-source-id: 733c93a3951453e739b6ed46b72fbad2244f6e97 (cherry picked from commit 33afb5f19f4e427f099653139ae45b661b8bc596)	2021-02-12 07:37:00 -08:00
James Reed	ae11589710	[FX][1.8] Cherrypick three FX fixes to 1.8 (#52021 ) * Fix leaf modules in Transformer [ghstack-poisoned] * Fix tuple type annotations [ghstack-poisoned] * Generalize dict key check in `create-arg` (#51927) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51927 Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D26329655 Pulled By: jamesr66a fbshipit-source-id: a15e7d9564551521af12a8fde1c7524856f0cbc2	2021-02-12 07:35:34 -08:00
Yuxin Wu	9e5bcc1020	1.8 cherrypick: Add metacompile of Ternary if (#51789 ) (#51913 ) Summary: Fixes issue: https://github.com/pytorch/pytorch/issues/49728 ======== Ternary if operation fails in Torchscript when the condition variable is annotated as Final. Tests: ======= pytest -k test_ternary_static_if test/test_jit.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/51789 Reviewed By: gmagogsfm Differential Revision: D26278969 Pulled By: nikithamalgifb fbshipit-source-id: 27d1383290211503188428fb2e8b7749f59ba16e Co-authored-by: nikithamalgi <nikithamalgi@devvm146.prn0.facebook.com>	2021-02-09 21:34:26 -08:00
Eli Uriegas	fa8578241d	.jenkins: Release branch specific updates (#51982 )	2021-02-09 21:33:29 -08:00
Eli Uriegas	1368809532	[v1.8.0] [wip] doc_fix (#52006 ) Summary: tries to fix doc_test Pull Request resolved: https://github.com/pytorch/pytorch/pull/51825 Reviewed By: bertmaher Differential Revision: D26295583 Pulled By: ngimel fbshipit-source-id: 13f6e7f1675d810adfd4abd2d579e2812fe54c80 (cherry picked from commit 6c0bf28da651eb8ff1d2d0dcfe807ea757fb61e5) Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Co-authored-by: Natalia Gimelshein <ngimel@fb.com>	2021-02-09 21:32:32 -08:00
James Reed	4073248fc2	[FX] Hide experimental folder (#51987 )	2021-02-09 15:44:33 -08:00
Jane (Yuan) Xu	75153cb730	Disable unaliged-access test from TestVectorizedMemoryAccess.CopyKernel (#51864 ) (#51890 ) Summary: Test begins to fail after the driver udpate See https://github.com/pytorch/pytorch/issues/51863 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51864 Reviewed By: bertmaher Differential Revision: D26304018 Pulled By: malfet fbshipit-source-id: bb7ade2f28d8cf8f847159d4ce92391f0794c258 Co-authored-by: Nikita Shulga <nshulga@fb.com>	2021-02-09 10:17:18 -08:00
Rong Rong	5bb69b080c	concantenate LICENSE files when building a wheel (#51634 ) (#51882 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/50695 I checked locally that the concatenated license file appears at `torch-<version>.dist-info/LICENSE` in the wheel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51634 Reviewed By: zhangguanheng66 Differential Revision: D26225550 Pulled By: walterddr fbshipit-source-id: 830c59fb7aea0eb50b99e295edddad9edab6ba3a Co-authored-by: mattip <matti.picus@gmail.com>	2021-02-09 10:16:12 -08:00
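
The Flake8 fix listed above (#52750) describes a regex that failed to recognize error codes starting with multiple letters, such as `EXE002`. The sketch below illustrates that class of bug only. Both patterns, the helper `find_codes`, the path `torch/foo.py`, and the sample message texts are hypothetical; the regex actually changed in that PR is not shown in this diff. The sample lines assume Flake8's default `path:row:col: CODE text` output format.

```python
import re

# Hypothetical patterns for illustration; not the regex from the PR.
# Matches codes with exactly one leading letter, e.g. E302 or W291.
SINGLE_LETTER = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]\d+) (?P<msg>.*)$"
)
# Matches codes with one or more leading letters, e.g. E302 or EXE002.
MULTI_LETTER = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) (?P<msg>.*)$"
)


def find_codes(pattern, output):
    """Return the error codes that `pattern` recognizes in linter output."""
    codes = []
    for line in output.splitlines():
        match = pattern.match(line)
        if match:
            codes.append(match.group("code"))
    return codes


# Illustrative linter output; the message texts are placeholders.
SAMPLE_OUTPUT = "\n".join([
    "torch/foo.py:10:1: E302 expected 2 blank lines, found 1",
    "test/distributed/test_c10d.py:1:1: EXE002 executable file without shebang",
])

if __name__ == "__main__":
    print(find_codes(SINGLE_LETTER, SAMPLE_OUTPUT))
    print(find_codes(MULTI_LETTER, SAMPLE_OUTPUT))
```

With the single-letter pattern the `EXE002` line is silently skipped, which is how a job can pass while the linter still reports warnings. The commit message also mentions a final step that asserts Flake8 produced no output at all; a check of that kind does not depend on any regex being correct.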