Compare commits


54 Commits

Author SHA1 Message Date
6394982d13 gate load_library tests behind BUILD_TEST=1 (#46556)
ghstack-source-id: 9147465bd7eb251b1b65f3f7d08861e1cd560214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46550
2020-10-20 18:57:37 -07:00
56166c18f5 properly handle getGraphExecutorOptimize to not leak memory due to profiling (#46621)
2020-10-20 17:02:54 -07:00
3957268bf7 [hotfix] remove test.pypi.org (#46492) (#46591)
Summary:
fix CI: https://app.circleci.com/pipelines/github/pytorch/pytorch/227894/workflows/67d2ded3-82eb-4a5d-be2c-dfccb8ed9133/jobs/8275321

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46492

Reviewed By: janeyx99

Differential Revision: D24371755

Pulled By: walterddr

fbshipit-source-id: ae7e96f22920f85f04acdccc999774510a60cfa9

Co-authored-by: Rong Rong <rongr@fb.com>
2020-10-20 08:08:08 -07:00
eed8d7e3cb Cherry-picks for TE fixes for aten::cat. (#46513)
* [TensorExpr] Correctly handle negative dimensions in aten::cat when lowering to tensor expressions.

Fixes #46440.

* [TensorExpr] Fix shape inference logic for aten::cat.

Cherry-pick of #46482.

* [TensorExpr] Properly handle input types promotion and special case of empty inputs for aten::cat.

Cherry-pick of #46500.
2020-10-19 17:53:21 -07:00
33c17636b9 [v1.7] Fix backward compatibility test by moving dates forward. 2020-10-16 13:40:15 -04:00
cf77b0845c [JIT] Improve class type annotation inference (#46422)
**Summary**
In `try_ann_to_type`, if an annotation has an attribute named
`__torch_script_class__`, it is assumed to be a TorchScript class that
has already been scripted. However, if it is a class that extends
another class, this code path causes a crash because it looks up the
JIT type for the class by name in the compilation unit. This JIT type
obviously cannot exist because inheritance is not supported.

This commit fixes this by looking up the qualified name of the class
in torch.jit._state._script_class in order to ascertain whether it has
already been scripted (instead of looking for a `__torch_script_class__`
attribute on the class object).

**Test Plan**
This commit adds a unit test consisting of the code sample from the
issue that reported this problem.

**Fixes**
This commit fixes #45860.

ghstack-source-id: 6fe19a45c694c1f9d7fb0e77bc72bd03ef2bf160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45940
2020-10-15 16:26:50 -07:00
9aecf70533 [v1.7] Quick fix for view/inplace issue with DDP (#46407)
* Quick fix for view/inplace issue with DDP

* update per greg's comments
2020-10-15 18:37:31 -04:00
e882c748b0 Disable tcuda_fuser tests in Profiling Mode (#45638)
Summary:
Disable tcuda_fuser tests in Profiling Mode to address flaky tests until the fuser switches to the new approach.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45638

Reviewed By: mrshenli

Differential Revision: D24057230

Pulled By: Krovatkin

fbshipit-source-id: 8f7a47610d9b7da6ad3057208057a5a596e1bffa
2020-10-15 17:55:50 -04:00
26cf795f80 Pin XLA CI to use r1.7 release branch. 2020-10-15 17:54:14 -04:00
1fbedeac57 Stop running clang-tidy on torch/csrc/generic/*.cpp. (#46335)
Summary:
Those files are never directly built, only included in other files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46335

Reviewed By: albanD

Differential Revision: D24316737

Pulled By: gchanan

fbshipit-source-id: 67bb95e7f4450e3bbd0cd54f15fde9b6ff177479
2020-10-15 17:48:39 -04:00
49405e710d Fix error message for scatter reduction (#46397)
Summary:
Follow-up to https://github.com/pytorch/pytorch/pull/41377 to update the error message to match the removed arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46397

Reviewed By: malfet

Differential Revision: D24336009

Pulled By: albanD

fbshipit-source-id: b9bf2f9ef7fd2ae622c4079384afc93e9c473f47
2020-10-15 17:47:53 -04:00
03ed8cbf58 Workaround for bug in DistributedDataParallel (#46385)
Fix the DistributedDataParallelSingleProcessTest to work around a limitation in
DistributedDataParallel where the batch_size needs to be evenly divisible by the number of GPUs used.
See #46175
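A minimal sketch of the workaround, assuming a CUDA host; the sizes here are illustrative, not taken from the test:

```python
import torch

# Round the batch size down so it divides evenly across the visible GPUs,
# sidestepping the DistributedDataParallel limitation described above.
n_gpus = max(torch.cuda.device_count(), 1)
batch_size = 30
batch_size -= batch_size % n_gpus
inputs = torch.randn(batch_size, 10)
```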
2020-10-15 09:37:51 -07:00
286647bc9f Add warning on ProcessGroup and ProcessGroup::Work APIs (#46366)
ghstack-source-id: f5427d315d18dc2585d68a394f36409602bbc505
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46220
2020-10-15 09:32:36 -07:00
2a23023428 Remove Python version upper boundary check (#46315) (#46388)
Summary:
This prevents setup.py from erroring out when Python 3.9 is used.

Fixes https://github.com/pytorch/pytorch/issues/46314
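For context, a hedged sketch of the kind of upper-bound guard that was removed; the exact bound and message are illustrative, not the real setup.py code:

```python
import sys

# A guard like this made setup.py error out on interpreters newer than
# the last tested version, e.g. Python 3.9.
if sys.version_info >= (3, 9):
    raise RuntimeError("Python 3.9 is not supported yet")
```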

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-10-15 09:06:54 -07:00
ee6ddd38e0 Update index.rst (#46324) 2020-10-14 17:24:28 -07:00
c52a61fe13 Performance fix for torch.cat operator on ROCm (#46097) (#46323)
Summary:
This pull request is a partial revert of https://github.com/pytorch/pytorch/pull/44833 for ROCm to fix the performance of the concatenate operator. The changes only affect execution on ROCm and are guarded by the define `__HIP_PLATFORM_HCC__`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46097

Test Plan:
Benchmark
`python -m pt.cat_test --tag_filter all --device cuda`

Results on ROCm before the PR:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 10828.314

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 11888.028

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 11898.945

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 11787.744

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 11792.479

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 11769.718

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f989e5c2510>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f989e5c2510>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 11633.882

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f989e5c2620>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f989e5c2620>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 11617.768

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f96eee4df28>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f96eee4df28>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 11625.143

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef874048>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef874048>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 13079.204

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f96ef8740d0>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f96ef8740d0>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 13095.620

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f96ef874158>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f96ef874158>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 13403.086

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 118.704

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 263.273

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 463.024

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef8741e0>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef8741e0>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 23818.032

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef874268>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef874268>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 234778.296

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef8742f0>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef8742f0>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 470288.132

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f96ef874378>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f96ef874378>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 704361.221
```

Results on ROCm after the PR:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1,1,1)_N2_dim0_cuda
# Input: sizes: (1, 1, 1), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 29.292

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(512,512,2)_N2_dim1_cuda
# Input: sizes: (512, 512, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 46.320

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(128,1024,2)_N2_dim1_cuda
# Input: sizes: (128, 1024, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 36.969

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim0_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 0, device: cuda
Forward Execution Time (us) : 92.816

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1025,1023,2)_N2_dim1_cuda
# Input: sizes: (1025, 1023, 2), N: 2, dim: 1, device: cuda
Forward Execution Time (us) : 93.943

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(1024,1024,2)_N2_dim2_cuda
# Input: sizes: (1024, 1024, 2), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 163.914

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1da3186510>,111,65]_N5_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1da3186510>, 111, 65], N: 5, dim: 0, device: cuda
Forward Execution Time (us) : 75.475

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[96,<function<lambda>at0x7f1da3186620>,64]_N5_dim1_cuda
# Input: sizes: [96, <function <lambda> at 0x7f1da3186620>, 64], N: 5, dim: 1, device: cuda
Forward Execution Time (us) : 68.880

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[128,64,<function<lambda>at0x7f1bf3c50f28>]_N5_dim2_cuda
# Input: sizes: [128, 64, <function <lambda> at 0x7f1bf3c50f28>], N: 5, dim: 2, device: cuda
Forward Execution Time (us) : 85.268

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf4669048>,32,64]_N50_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf4669048>, 32, 64], N: 50, dim: 0, device: cuda
Forward Execution Time (us) : 111.543

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[32,<function<lambda>at0x7f1bf46690d0>,64]_N50_dim1_cuda
# Input: sizes: [32, <function <lambda> at 0x7f1bf46690d0>, 64], N: 50, dim: 1, device: cuda
Forward Execution Time (us) : 110.644

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[33,65,<function<lambda>at0x7f1bf4669158>]_N50_dim2_cuda
# Input: sizes: [33, 65, <function <lambda> at 0x7f1bf4669158>], N: 50, dim: 2, device: cuda
Forward Execution Time (us) : 116.201

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(64,32,4,16,32)_N2_dim2_cuda
# Input: sizes: (64, 32, 4, 16, 32), N: 2, dim: 2, device: cuda
Forward Execution Time (us) : 117.708

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(16,32,4,16,32)_N8_dim2_cuda
# Input: sizes: (16, 32, 4, 16, 32), N: 8, dim: 2, device: cuda
Forward Execution Time (us) : 264.953

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes(9,31,5,15,33)_N17_dim4_cuda
# Input: sizes: (9, 31, 5, 15, 33), N: 17, dim: 4, device: cuda
Forward Execution Time (us) : 480.304

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf46691e0>]_N100_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf46691e0>], N: 100, dim: 0, device: cuda
Forward Execution Time (us) : 116.385

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf4669268>]_N1000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf4669268>], N: 1000, dim: 0, device: cuda
Forward Execution Time (us) : 913.591

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf46692f0>]_N2000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf46692f0>], N: 2000, dim: 0, device: cuda
Forward Execution Time (us) : 2003.212

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_sizes[<function<lambda>at0x7f1bf4669378>]_N3000_dim0_cuda
# Input: sizes: [<function <lambda> at 0x7f1bf4669378>], N: 3000, dim: 0, device: cuda
Forward Execution Time (us) : 3004.174
```
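For readers without the `pt.cat_test` harness, a minimal standalone sketch that times `torch.cat` on the GPU; one size is picked from the configurations above, while the loop and iteration count are illustrative:

```python
import time

import torch

xs = [torch.randn(512, 512, 2, device="cuda") for _ in range(2)]
torch.cuda.synchronize()
start = time.time()
for _ in range(100):
    y = torch.cat(xs, dim=1)
torch.cuda.synchronize()
print(f"{(time.time() - start) / 100 * 1e6:.3f} us per call")
```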

Reviewed By: bdhirsh

Differential Revision: D24286324

Pulled By: malfet

fbshipit-source-id: 291f3f3f80f9d2f9ba52a455a942f3fb0406e7d2
2020-10-14 16:08:04 -07:00
1c28571358 Rocm skip test cases (#45782) (#46333)
Summary:
Skip the following test cases for ROCm (when PYTORCH_TEST_WITH_ROCM=1):
- test_reference_numerics_tan_cuda_float64 (__main__.TestUnaryUfuncsCUDA)
- test_addmv_cuda_float16 (__main__.TestTorchDeviceTypeCUDA)
- test_logspace_cuda_float64 (__main__.TestTensorCreationCUDA)
- test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest)
jeffdaily
pruthvistony
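For illustration, a hedged sketch of how such a skip is typically written; `PYTORCH_TEST_WITH_ROCM` is the env var named above, and the test body is a placeholder:

```python
import os
import unittest

TEST_WITH_ROCM = os.environ.get("PYTORCH_TEST_WITH_ROCM", "0") == "1"

class TestTorchDeviceTypeCUDA(unittest.TestCase):
    # Same spirit as the skipIfRocm helper in PyTorch's internal test utils.
    @unittest.skipIf(TEST_WITH_ROCM, "flaky on ROCm; see #45782")
    def test_addmv_cuda_float16(self):
        ...
```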

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45782

Reviewed By: VitalyFedyunin

Differential Revision: D24115581

Pulled By: xw285cornell

fbshipit-source-id: 4043a9fa19e242301b5007813c15b6b3873889c5

Co-authored-by: KyleCZH <kylechen@amd.com>
2020-10-14 16:04:41 -07:00
ffa4b787d5 Doc fix (#46281) 2020-10-13 15:29:21 -07:00
03d1c01db7 [NNC] Fix two bugs in Cuda Half support (#46214) 2020-10-13 15:27:05 -07:00
55d93b95b3 Cherrypick smooth l1 loss fixes (#45759)
* some documentation and style fixes to smooth_l1_loss (#45587)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45587

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24024313

Pulled By: bdhirsh

fbshipit-source-id: c50efb2934d7b9d3b090e92678319cde42c0df45

* remove beta defaulting in smooth_l1_loss_backward. added to the bc whitelist (#45588)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45588

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D24024312

Pulled By: bdhirsh

fbshipit-source-id: 7246e5da741fbc5641deecaf057ae9a6e44e8c34
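For reference, a short sketch of the `beta` parameter these commits concern (values illustrative); the defaulting change itself is in the internal `smooth_l1_loss_backward` op:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, requires_grad=True)
y = torch.randn(4)
# beta is the point where the loss switches from quadratic to linear.
loss = F.smooth_l1_loss(x, y, beta=0.5)
loss.backward()
```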

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-10-13 13:30:47 -07:00
9fa9a37576 [v1.7] Update allowlist back compat date for min_values / max_values. (#46262) 2020-10-13 11:13:32 -07:00
a06d19b321 [dist_optim] serialize compilation when creating dist_optim (#45871) (#46071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45871

Attempt to fix https://github.com/pytorch/pytorch/issues/45845

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D24125209

Pulled By: wanchaol

fbshipit-source-id: e3697dd6ef107d8153d2a82d78a17c66d109b4fa
2020-10-12 17:44:31 -07:00
7548f458f5 qnnpack quantized activations: fix memory format issues (#46077) (#46217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46077

Some of the QNNPACK quantized kernels were not handling NHWC correctly:
the data written respected the input format, but the memory format flag
was always set to contiguous.  This PR
1. adds testing for NHWC for qnnpack activations
2. fixes those activations which did not set the memory format on the output
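A hedged sketch of the property under test, using the quantized functional API; the op choice and quantization parameters are illustrative, not taken from the test suite:

```python
import torch
import torch.nn.quantized.functional as qF

x = torch.randn(2, 8, 4, 4).contiguous(memory_format=torch.channels_last)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
y = qF.hardswish(qx, scale=0.1, zero_point=0)
# The output should report the same memory format the data was written in.
assert y.is_contiguous(memory_format=torch.channels_last)
```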

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qhardsigmoid
python test/test_quantization.py TestQuantizedOps.test_leaky_relu
python test/test_quantization.py TestQuantizedOps.test_hardswish
python test/test_quantization.py TestQNNPackOps.test_qnnpack_tanh
python test/test_quantization.py TestQNNPackOps.test_qnnpack_sigmoid
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D24213257

fbshipit-source-id: 764fb588a8d8a0a6e6e4d86285904cdbab26d487

Co-authored-by: Vasiliy Kuznetsov <vasiliy@fb.com>
2020-10-12 17:43:30 -07:00
992d57b9fa annotate torch.autograd.* modules (#45004) (#46206)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44638

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45004

Reviewed By: VitalyFedyunin

Differential Revision: D24113562

Pulled By: ezyang

fbshipit-source-id: a85018b7e08b2fe6cf2bc14a217eb418cb2b9de4

Co-authored-by: Guilherme Leobas <guilhermeleobas@gmail.com>
2020-10-12 14:59:55 -07:00
440b7bd451 Improve error checking of Storage._writeFile. (#46036) (#46207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46036

Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which led to issues like https://github.com/pytorch/pytorch/issues/46020.

A better solution would be to use pybind, but given that writing the file is going to dominate the cost of bounds checking, this is strictly better.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24228370

Pulled By: gchanan

fbshipit-source-id: f5d0a3d21ff12b4380beefe1e9954fa81ea2f567

Co-authored-by: Gregory Chanan <gchanan@fb.com>
2020-10-12 14:59:27 -07:00
203bc58942 cherrypick (#46177)
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-10-12 12:06:31 -07:00
a0a65a9e27 resolve merge conflicts (#46099) 2020-10-11 21:40:21 -07:00
b234d94541 [ONNX] Improve error handling for adaptive_pool (#45874) (#46100)
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/43032
This update would also improve error handling for interpolate with 'area' mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45874

Reviewed By: albanD

Differential Revision: D24141266

Pulled By: bzinodev

fbshipit-source-id: 7559f1d6af4f1ef3507c15a1aee76fe01fa433cd
2020-10-11 21:38:33 -07:00
36fe7882ba [v1.7 cherry-pick] [JIT] Dict Bug Fixes (#46105)
* dict hashing fix

* Dict update bug

Co-authored-by: elias.ellison <eellison@fb.com>
2020-10-11 21:36:36 -07:00
445ba03a2c Revert "Add Rowwise Prune PyTorch op (#42708)" (#46103)
This reverts commit 8032dbc117c16640041399c920f5b6355e600aaf.
2020-10-11 21:35:08 -07:00
305515de09 [v1.7 patch] Disallow creation of ProcessGroupNCCL without GPUs. (#45642) (#46073)
Summary:

Note: This PR has been merged into master at b5a2f04089 after the 1.7 branch cut
(see original PR: #45642). This PR is to merge it into the 1.7 branch.

---- Original Commit Description Follows ----

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45642

Prior to https://github.com/pytorch/pytorch/pull/45181, initializing a
NCCL process group would work even if no GPUs were present. However, now that
init_process_group calls `barrier()`, this fails.

In general, the problem was that we could initialize ProcessGroupNCCL without
GPUs, and then if we called a method like `barrier()` the process would crash,
since we compute `% numGPUs`, resulting in a division by zero.
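A hedged sketch of the user-level implication; the backend choice and init parameters are illustrative:

```python
import torch
import torch.distributed as dist

# ProcessGroupNCCL now refuses to be created without GPUs, so choose the
# backend based on availability instead of unconditionally using NCCL.
backend = "nccl" if torch.cuda.is_available() else "gloo"
dist.init_process_group(backend, init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)
```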
ghstack-source-id: 113490343

Test Plan: waitforbuildbot

Reviewed By: osalpekar

Differential Revision: D24038839

fbshipit-source-id: a1f1db52cabcfb83e06c1a11ae9744afbf03f8dc

Co-authored-by: Pritam Damania <pritam.damania@fb.com>
2020-10-11 21:31:21 -07:00
ff9e56575b aten::set_grad_enabled should not push as it does not return a value (#45559) (#46060)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45558

This assertion failure is caused by the incorrect implementation of ``aten::set_grad_enabled`` in [torch/csrc/jit/runtime/register_special_ops.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/register_special_ops.cpp#L436). The current implementation is:

```cpp
Operator(
    "aten::set_grad_enabled(bool val) -> ()",
    [](Stack* stack) {
      torch::GradMode::set_enabled(pop(stack).toBool());
      push(stack, IValue());
    },
    aliasAnalysisConservative()),
```

which pushes a ``None`` onto the evaluation stack after calling ``set_enabled``. This behavior is incorrect: the signature says this function does not return a value. I suspect the original author was confused by the behavior of Python, which pushes a ``None`` onto the evaluation stack when a function definition does not end with a return statement carrying an explicit result value.

If ``aten::set_grad_enabled`` pushes a ``None`` onto the evaluation stack, then each time it is called the evaluation stack accumulates an extra ``None``. In our case, ``with torch.no_grad():`` causes ``aten::set_grad_enabled`` to be called twice, so when the ``forward`` method finishes, the evaluation stack will be ``[None, None, Tensor]``. But the return statement of ``GraphFunction::operator()`` in [torch/csrc/jit/api/function_impl.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/api/function_impl.cpp#L51) is ``return stack.front();``, which tries to extract a tensor out of a ``None`` and thus causes the assertion failure.

The solution is simple: just remove the push from the implementation of ``aten::set_grad_enabled``.
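For context, a minimal TorchScript sketch of the pattern that exercised this path; per the description above, any scripted ``with torch.no_grad():`` block triggered it, and the function shape here is illustrative:

```python
import torch

@torch.jit.script
def f(x):
    # Entering and exiting no_grad each call aten::set_grad_enabled; before
    # the fix, every call leaked an extra None onto the evaluation stack.
    with torch.no_grad():
        y = x + 1
    return y

print(f(torch.randn(2)))
```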

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45559

Reviewed By: albanD

Differential Revision: D24142153

Pulled By: SplitInfinity

fbshipit-source-id: 75aad0e38bd912a437f7e1a1ee89ab4445e35b5d

Co-authored-by: huaidong.xiong <huaidong.xiong@mobvista.com>
2020-10-11 21:27:21 -07:00
379d62469b [v1.7 patch] Remove object-based collective APIs from public docs (#46109)
* resolve merge conflicts

* Remove object-based collective APIs from public docs (#46075)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46075

Removes these from public docs for now as we are still
iterating/formalizing these APIs. Will add them back once they are part of a
PyTorch release.
ghstack-source-id: 113928700

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D24211510

fbshipit-source-id: 3e36ff6990cf8e6ef72b6e524322ae06f9097aa2

* Fix bad merge
2020-10-09 20:02:07 -07:00
2c7c13b2dd Fix flakiness in caffe2/test:serialization - test_serialization_new_format_old_format_compat (#45915) (#46114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45915

Use a temp file instead.
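A sketch of the temp-file pattern, not the test's actual fixture:

```python
import tempfile

import torch

# A NamedTemporaryFile avoids the fixed on-disk path that concurrent
# test runs used to race on.
with tempfile.NamedTemporaryFile(suffix=".pt") as f:
    torch.save({"w": torch.ones(3)}, f.name)
    state = torch.load(f.name)
```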

Test Plan: buck test mode/opt-asan //caffe2/test:serialization -- 'test_serialization_new_format_old_format_compat \(test_serialization\.TestBothSerialization\)' --run-disabled --jobs 18 --stress-runs 10 --record-results

Reviewed By: malfet

Differential Revision: D24142278

fbshipit-source-id: 9c88330fc5664d464daa9124e67644f497353f3b

Co-authored-by: Rong Rong <rongr@fb.com>
2020-10-09 20:00:59 -07:00
e93ae1ef32 Doc note update for complex autograd (#45270) (#46072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45270

<img width="1679" alt="Screen Shot 2020-10-07 at 1 45 59 PM" src="https://user-images.githubusercontent.com/20081078/95368324-fa7b2d00-08a3-11eb-9066-2e659a4085a2.png">
<img width="1673" alt="Screen Shot 2020-10-07 at 1 46 10 PM" src="https://user-images.githubusercontent.com/20081078/95368332-fbac5a00-08a3-11eb-9be5-77ce6deb8967.png">
<img width="1667" alt="Screen Shot 2020-10-07 at 1 46 30 PM" src="https://user-images.githubusercontent.com/20081078/95368337-fe0eb400-08a3-11eb-80a2-5ad23feeeb83.png">
<img width="1679" alt="Screen Shot 2020-10-07 at 1 46 48 PM" src="https://user-images.githubusercontent.com/20081078/95368345-00710e00-08a4-11eb-96d9-e2d544554a4b.png">
<img width="1680" alt="Screen Shot 2020-10-07 at 1 47 03 PM" src="https://user-images.githubusercontent.com/20081078/95368350-023ad180-08a4-11eb-89b3-f079480741f4.png">
<img width="1680" alt="Screen Shot 2020-10-07 at 1 47 12 PM" src="https://user-images.githubusercontent.com/20081078/95368364-0535c200-08a4-11eb-82fc-9435a046e4ca.png">

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D24203257

Pulled By: anjali411

fbshipit-source-id: cd637dade5fb40cecf5d9f4bd03d508d36e26fcd

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
2020-10-09 09:38:03 -07:00
653d766f4a Workaround for cublas bug for 45724 (#46001) (#46042)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45724

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46001

Reviewed By: mruberry

Differential Revision: D24184058

Pulled By: ngimel

fbshipit-source-id: 7d2bab3206ddbc10a7cae3efd9b5e253f38400a9
2020-10-08 14:24:39 -07:00
333daf0815 Embed callgrind headers [CherryPick of #45914] (#46039)
Summary:
Because access to https://sourceware.org/git/valgrind.git can be really slow, especially in some regions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45914

Reviewed By: seemethere

Differential Revision: D24144420

Pulled By: malfet

fbshipit-source-id: a454c8c3182c570ec344bf6468bb5e55d8b8da79
2020-10-08 12:26:02 -05:00
65a1827c19 Disable angle backwards and handle r to c backward for add (#45839) 2020-10-06 11:41:12 -07:00
173a719d36 [ONNX] Add dim_param support in export with onnx shape inference (#44920) (#45755)
Summary:
* Support propagating `dim_param` in ONNX by encoding it as a `ShapeSymbol` in the `SymbolicShape` of outputs. If export is called with `dynamic_axes` provided, shape inference starts with these axes set as dynamic.
* Add a new test file, `test_pytorch_onnx_shape_inference.py`, reusing all test cases from `test_pytorch_onnx_onnxruntime.py` but focusing on validating shapes for all nodes in the graph. Currently this is not enabled in the CI, since there are still quite a few existing issues and corner cases to fix. The test defaults to running only at opset 12.
* Bug fixes, such as div, _len, and peephole.cpp passes for PackPadded, and LogSoftmaxCrossEntropy.
* This PR depends on existing PRs such as #44332.
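A short sketch of an export call whose `dynamic_axes` now surface as `dim_param`s in the ONNX graph; the model and axis names are illustrative:

```python
import torch

model = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)
torch.onnx.export(
    model, x, "model.onnx",
    input_names=["input"], output_names=["output"],
    # Axis 0 is exported as a symbolic dim_param named "batch".
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=12,
)
```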

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44920

Reviewed By: eellison

Differential Revision: D23958398

Pulled By: bzinodev

fbshipit-source-id: 00479d9bd19c867d526769a15ba97ec16d56e51d
2020-10-06 13:24:02 -04:00
8f8da6097b [ONNX] Update embedding_bag export (#44693) (#45756)
Summary:
Export of embedding bag with dynamic list of offsets.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44693

Reviewed By: malfet

Differential Revision: D23831980

Pulled By: bzinodev

fbshipit-source-id: 3eaff1a0f20d1bcfb8039e518d78c491be381e1a
2020-10-06 13:23:44 -04:00
9927825829 Fix cuDNN error message when it's Conv2d (#45729) (#45770)
Summary:
Originally introduced in https://github.com/pytorch/pytorch/issues/45023. When I was testing the original PR, it was a Conv3d, so this problem was not discovered.

Arrays in `ConvolutionParams` have a fixed length of 3 or 5. This is because `max_dim` is set as a constexpr of 3, regardless of Conv2d or Conv3d. The current code makes some error messages look weird. See below in the comments.

9201c37d02/aten/src/ATen/native/cudnn/Conv.cpp (L212-L226)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45729

Reviewed By: mruberry

Differential Revision: D24081542

Pulled By: ngimel

fbshipit-source-id: 141f8946f4d0db63a723131775731272abeaa6ab
2020-10-06 08:49:22 -07:00
bf638d5ebf [Docs] Adding Store API Docs (#45543) (#45758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45543

This PR adds documentation for the c10d Store to the public docs. Previously these docs were missing although we exposed a lightly-used (but potentially useful) Python API for our distributed key-value store.
ghstack-source-id: 113409195
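A brief sketch of the key-value Store API the new docs cover; the host, port, and keys are placeholders:

```python
from datetime import timedelta

import torch.distributed as dist

# TCPStore(host_name, port, world_size, is_master, timeout)
store = dist.TCPStore("127.0.0.1", 29500, 1, True, timedelta(seconds=30))
store.set("first_key", "first_value")
print(store.get("first_key"))  # b'first_value'
```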

Test Plan: Will verify screenshots by building the docs.

Reviewed By: pritamdamania87

Differential Revision: D24005598

fbshipit-source-id: 45c3600e7c3f220710e99a0483a9ce921d75d044
2020-10-06 08:41:41 -07:00
ebe8b21b08 quant docs: add API summary section (#45848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45848

This is a resubmit of the following stack:
* start: https://github.com/pytorch/pytorch/pull/45093
* end: https://github.com/pytorch/pytorch/pull/45306

The original stack was reverted due to build failure,
resubmitting.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24117781

Pulled By: vkuzo

fbshipit-source-id: fb767fff2b044cfbba695ca3949221904fc8931f
2020-10-06 08:39:49 -07:00
7d0c7b38b5 [iOS] 1.7 hotfix (#45891) 2020-10-06 08:37:43 -07:00
30d41faf3b SET USE_DISTRIBUTED OFF when libuv is not installed (#45554) (#45739)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45554

Reviewed By: izdeby

Differential Revision: D24016825

Pulled By: mrshenli

fbshipit-source-id: 332d860429626a915c06f98cad31e6db1cbc4eb1

Co-authored-by: gunandrose4u <52735340+gunandrose4u@users.noreply.github.com>
2020-10-06 08:32:09 -07:00
8107dba211 Upgrade README for Windows (#45553) (#45738)
Summary:
Pin the libuv version to v1.39 for the Windows platform.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45553

Reviewed By: SciPioneer

Differential Revision: D24017246

Pulled By: mrshenli

fbshipit-source-id: ec69f864a7acfbdddd60c3d2b442294ec3e34558

Co-authored-by: gunandrose4u <52735340+gunandrose4u@users.noreply.github.com>
2020-10-06 08:30:39 -07:00
d728e234cb [1.7] .jenkins: switch to compare against stable and update allowlist (#45859)
Co-authored-by: Eli Uriegas <eliuriegas@fb.com>
2020-10-05 22:47:04 -05:00
1ffcdd000b patch https://github.com/pytorch/pytorch/pull/45586 (#45601) 2020-10-02 10:43:57 -07:00
543d09736d [1.7] Hide FX (#45631) 2020-10-01 16:44:51 -07:00
fc8f987c1a Make torch.package private and add a big warning (#45628)
ghstack-source-id: 4c14034ce2565a5f2cec58426dca50f96514fc34
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45626

Co-authored-by: Michael Suo <suo@suo-fedora-mj0c3k9r.dhcp.thefacebook.com>
2020-10-01 10:32:30 -07:00
07e66d7ca5 Enable PE + TE (#45546) (#45591)
Summary:
This PR enables PE + TE for 1.7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45546

Reviewed By: ZolotukhinM

Differential Revision: D24006940

Pulled By: Krovatkin

fbshipit-source-id: a3326077d34a023941acdb06c4907c96e7ba0115
2020-09-30 15:56:05 -07:00
e8cea53b85 Add allowlist for complex backward (#45602)
ghstack-source-id: a3aaa9ba4657445433903ff51cc184afc35e7d0a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45461
2020-09-30 15:53:44 -07:00
43404c4141 [1.7] Remove torch.vmap (#45571)
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:
- Removes the `torch.vmap` API
- Removes the documentation entry for `torch.vmap`
- Changes the vmap tests to use an internal API instead of `torch.vmap`.

Test Plan:
- Tested locally (test_torch, test_type_hints, test_vmap), but also wait
for CI.
2020-09-30 15:14:36 -05:00
cf07ba50fe Update target determinator to point to release/1.7
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-09-30 09:39:23 -05:00
3153 changed files with 84323 additions and 296757 deletions


@@ -31,7 +31,7 @@ Usage
1. Make changes to these scripts.
2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`.
You'll see a build failure on GitHub if the scripts don't agree with the checked-in version.
You'll see a build failure on TravisCI if the scripts don't agree with the checked-in version.
Motivation
@@ -55,7 +55,7 @@ Future direction
See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
In contrast with a full recursive tree traversal of configuration dimensions,
> in the future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
> in the future future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
----------------
----------------
@@ -90,7 +90,7 @@ The binaries are built in CircleCI. There are nightly binaries built every night
We have 3 types of binary packages
* pip packages - nightlies are stored on s3 (pip install -f \<a s3 url\>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* pip packages - nightlies are stored on s3 (pip install -f <a s3 url>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* conda packages - nightlies and releases are both stored in a conda repo. Nighty packages have a '_nightly' suffix
* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
* shared with dependencies (the only supported option for Windows)
@@ -104,16 +104,16 @@ All binaries are built in CircleCI workflows except Windows. There are checked-i
Some quick vocab:
* A \**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* A\**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on\https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments, environment variables declared in one script DO NOT persist to following steps*
* **steps** are usually just a bash script or a builtin CircleCI command.* All steps run in new environments, environment variables declared in one script DO NOT persist to following steps*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
## How are the workflows structured?
The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration
1. binary_builds
1. binarybuilds
1. every day midnight EST
2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
@@ -144,7 +144,7 @@ The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build,
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources. Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources . Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* binary_linux_build.sh
@@ -204,7 +204,7 @@ TODO: fill in stuff
## Overview
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder), which is a repo that defines how all the binaries are built. The relevant code is
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder) , which is a repo that defines how all the binaries are built. The relevant code is
```
@@ -260,7 +260,7 @@ Linux, MacOS and Windows use the same code flow for the conda builds.
Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies in what python environment to build the package in, and what dependencies the resulting package should have, and the build script gets called in the env to build the thing.
tl;dr on conda-build is
tldr; on conda-build is
1. Creates a brand new conda environment, based off of deps in the meta.yaml
1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
@@ -270,7 +270,7 @@ tl;dr on conda-build is
4. Runs some simple import tests (if specified in the meta.yaml)
5. Saves the finished package as a tarball
The build.sh we use is essentially a wrapper around `python setup.py build`, but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths.
The build.sh we use is essentially a wrapper around ```python setup.py build``` , but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths.
The entrypoint file `builder/conda/build_conda.sh` is complicated because
@@ -355,15 +355,15 @@ The Dockerfiles are available in pytorch/builder, but there is no circleci job o
# How to manually rebuild the binaries
tl;dr make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
tldr; make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
## How to test changes to the binaries via .circleci
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using `.circleci/regenerate.sh` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using ```.circleci/regenerate.sh``` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
```sh
```
# Make your changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
@@ -408,7 +408,7 @@ The advantage of this flow is that you can make new changes to the base commit a
You can build Linux binaries locally easily using docker.
```sh
```
# Run the docker
# Use the correct docker image, pytorch/conda-cuda used here as an example
#
@@ -451,7 +451,7 @@ There's no easy way to generate reproducible hermetic MacOS environments. If y
But if you want to try, then I'd recommend
```sh
```
# Create a new terminal
# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
# know how to do


@@ -30,12 +30,12 @@ def get_processor_arch_name(gpu_version):
"cu" + gpu_version.strip("cuda") if gpu_version.startswith("cuda") else gpu_version
)
LINUX_PACKAGE_VARIANTS = OrderedDict(
manywheel=[
"3.6m",
"3.7m",
"3.8m",
"3.9m"
],
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
@@ -52,14 +52,6 @@ CONFIG_TREE_DATA = OrderedDict(
"3.7",
],
)),
macos_arm64=([None], OrderedDict(
wheel=[
"3.8",
],
conda=[
"3.8",
],
)),
# Skip CUDA-9.2 builds on Windows
windows=(
[v for v in dimensions.GPU_VERSIONS if v not in ['cuda92'] + dimensions.ROCM_VERSION_LABELS],


@@ -164,7 +164,7 @@ def gen_build_env_list(smoke):
c.find_prop("gpu"),
c.find_prop("package_format"),
[c.find_prop("pyver")],
c.find_prop("smoke") and not (c.find_prop("os_name") == "macos_arm64"), # don't test arm64
c.find_prop("smoke"),
c.find_prop("libtorch_variant"),
c.find_prop("gcc_config_variant"),
c.find_prop("libtorch_config_variant"),
@@ -216,9 +216,7 @@ def get_jobs(toplevel_key, smoke):
configs = gen_build_env_list(smoke)
phase = "build" if toplevel_key == "binarybuilds" else "test"
for build_config in configs:
# don't test for macos_arm64 as it's cross compiled
if phase != "test" or build_config.os != "macos_arm64":
jobs_list.append(build_config.gen_workflow_job(phase, nightly=True))
jobs_list.append(build_config.gen_workflow_job(phase, nightly=True))
return jobs_list


@@ -1,14 +1,15 @@
PHASES = ["build", "test"]
CUDA_VERSIONS = [
"92",
"101",
"102",
"111",
"110",
]
ROCM_VERSIONS = [
"3.10",
"4.0.1",
"3.7",
"3.8",
]
ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
@@ -19,5 +20,4 @@ STANDARD_PYTHON_VERSIONS = [
"3.6",
"3.7",
"3.8",
"3.9"
]


@@ -18,11 +18,7 @@ CONFIG_TREE_DATA = [
("clang", [
("5", [
("3.6", [
("asan", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
("asan", [XImportant(True)]),
]),
]),
("7", [
@@ -49,22 +45,14 @@ CONFIG_TREE_DATA = [
]),
("10.2", [
("3.6", [
("shard_test", [XImportant(True)]),
("libtorch", [
(True, [
('build_only', [X(True)]),
]),
]),
("important", [X(True)]),
("libtorch", [X(True)]),
]),
]),
("11.1", [
("11.0", [
("3.8", [
X(True),
("libtorch", [
(True, [
('build_only', [XImportant(True)]),
]),
]),
("libtorch", [XImportant(True)])
]),
]),
]),
@@ -84,16 +72,12 @@ CONFIG_TREE_DATA = [
("gcc", [
("9", [
("3.8", [
("coverage", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
("coverage", [XImportant(True)]),
]),
]),
]),
("rocm", [
("3.9", [
("3.7", [
("3.6", [
('build_only', [XImportant(True)]),
]),
@@ -174,7 +158,6 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
"libtorch": LibTorchConfigNode,
"important": ImportantConfigNode,
"build_only": BuildOnlyConfigNode,
"shard_test": ShardTestConfigNode,
"cuda_gcc_override": CudaGccOverrideConfigNode,
"coverage": CoverageConfigNode,
"pure_torch": PureTorchConfigNode,
@@ -212,7 +195,7 @@ class AsanConfigNode(TreeConfigNode):
self.props["is_asan"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
return ImportantConfigNode
class ONNXConfigNode(TreeConfigNode):
@@ -267,7 +250,7 @@ class LibTorchConfigNode(TreeConfigNode):
self.props["is_libtorch"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
return ImportantConfigNode
class CudaGccOverrideConfigNode(TreeConfigNode):
@@ -277,8 +260,8 @@ class CudaGccOverrideConfigNode(TreeConfigNode):
def child_constructor(self):
return ExperimentalFeatureConfigNode
class BuildOnlyConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["build_only"] = node_name
@@ -286,15 +269,8 @@ class BuildOnlyConfigNode(TreeConfigNode):
return ExperimentalFeatureConfigNode
class ShardTestConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["shard_test"] = node_name
def child_constructor(self):
return ImportantConfigNode
class CoverageConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_coverage"] = node_name
@@ -314,6 +290,7 @@ class ImportantConfigNode(TreeConfigNode):
class XenialCompilerConfigNode(TreeConfigNode):
def modify_label(self, label):
return label or "<unspecified>"
@@ -327,6 +304,7 @@ class XenialCompilerConfigNode(TreeConfigNode):
class BionicCompilerConfigNode(TreeConfigNode):
def modify_label(self, label):
return label or "<unspecified>"


@@ -272,7 +272,6 @@ def instantiate_configs():
compiler_version = fc.find_prop("compiler_version")
is_xla = fc.find_prop("is_xla") or False
is_asan = fc.find_prop("is_asan") or False
is_coverage = fc.find_prop("is_coverage") or False
is_onnx = fc.find_prop("is_onnx") or False
is_pure_torch = fc.find_prop("is_pure_torch") or False
is_vulkan = fc.find_prop("is_vulkan") or False
@@ -311,10 +310,7 @@ def instantiate_configs():
parms_list.append("asan")
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
if is_coverage:
parms_list_ignored_for_docker_image.append("coverage")
python_version = fc.find_prop("pyver")
restrict_phases = ["build", "test1", "test2"]
if is_onnx:
parms_list.append("onnx")
@@ -330,13 +326,13 @@ def instantiate_configs():
is_important = fc.find_prop("is_important") or False
parallel_backend = fc.find_prop("parallel_backend") or None
build_only = fc.find_prop("build_only") or False
shard_test = fc.find_prop("shard_test") or False
is_coverage = fc.find_prop("is_coverage") or False
# TODO: fix pure_torch python test packaging issue.
if shard_test:
restrict_phases = ["build"] if restrict_phases is None else restrict_phases
restrict_phases.extend(["test1", "test2"])
if build_only or is_pure_torch:
restrict_phases = ["build"]
if is_coverage and restrict_phases is None:
restrict_phases = ["build", "coverage_test"]
gpu_resource = None
if cuda_version and cuda_version != "10":


@@ -79,6 +79,7 @@ WORKFLOW_DATA = [
AndroidJob(["x86_64"], "pytorch_linux_build"),
AndroidJob(["arm", "v7a"], "pytorch_linux_build"),
AndroidJob(["arm", "v8a"], "pytorch_linux_build"),
AndroidJob(["vulkan", "x86_32"], "pytorch_linux_build", is_master_only=False),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32",
"pytorch_android_gradle_build-x86_32",


@@ -6,19 +6,17 @@ from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
# TODO: make this generated from a matrix rather than just a static list
IMAGE_NAMES = [
"pytorch-linux-bionic-cuda11.1-cudnn8-py3.6-gcc9",
"pytorch-linux-bionic-cuda11.1-cudnn8-py3.8-gcc9",
"pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9",
"pytorch-linux-bionic-cuda11.0-cudnn8-py3.8-gcc9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9",
"pytorch-linux-bionic-py3.6-clang9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9",
"pytorch-linux-bionic-py3.8-gcc9",
"pytorch-linux-bionic-rocm3.5.1-py3.6",
"pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4",
"pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
@@ -26,11 +24,12 @@ IMAGE_NAMES = [
"pytorch-linux-xenial-py3-clang7-onnx",
"pytorch-linux-xenial-py3.8",
"pytorch-linux-xenial-py3.6-clang7",
"pytorch-linux-xenial-py3.6-gcc4.8",
"pytorch-linux-xenial-py3.6-gcc5.4", # this one is used in doc builds
"pytorch-linux-xenial-py3.6-gcc7.2",
"pytorch-linux-xenial-py3.6-gcc7",
"pytorch-linux-bionic-rocm3.9-py3.6",
"pytorch-linux-bionic-rocm3.10-py3.6",
"pytorch-linux-bionic-rocm3.7-py3.6",
"pytorch-linux-bionic-rocm3.8-py3.6",
]
@@ -41,7 +40,7 @@ def get_workflow_jobs():
parameters = OrderedDict({
"name": quote(f"docker-{image_name}"),
"image_name": quote(image_name),
})
})
if image_name == "pytorch-linux-xenial-py3.6-gcc5.4":
# pushing documentation on tags requires CircleCI to also
# build all the dependencies on tags, including this docker image


@@ -61,16 +61,41 @@ WORKFLOW_DATA = [
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["jit_legacy", "test"],
["ge_config_legacy", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["ge_config_profiling", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["ge_config_simple", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"],
),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "jit_legacy", "test"],
["cudnn7", "py3", "ge_config_legacy", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
),
# TODO Why does the build environment specify cuda10.1, while the
# job name is cuda10_2?
build_env_override="pytorch-linux-xenial-cuda10.1-cudnn7-ge_config_legacy-test"),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "ge_config_profiling", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
# TODO Why does the build environment specify cuda10.1, while the
# job name is cuda10_2?
build_env_override="pytorch-linux-xenial-cuda10.1-cudnn7-ge_config_profiling-test"),
]


@@ -1,16 +1,16 @@
from cimodel.data.simple.util.versions import MultiPartVersion
import cimodel.lib.miniutils as miniutils
XCODE_VERSION = MultiPartVersion([12, 0, 0])
IOS_VERSION = MultiPartVersion([12, 0, 0])
class ArchVariant:
def __init__(self, name, custom_build_name=""):
def __init__(self, name, is_custom=False):
self.name = name
self.custom_build_name = custom_build_name
self.is_custom = is_custom
def render(self):
extra_parts = [self.custom_build_name] if len(self.custom_build_name) > 0 else []
extra_parts = ["custom"] if self.is_custom else []
return "_".join([self.name] + extra_parts)
@@ -19,15 +19,15 @@ def get_platform(arch_variant_name):
class IOSJob:
def __init__(self, xcode_version, arch_variant, is_org_member_context=True, extra_props=None):
self.xcode_version = xcode_version
def __init__(self, ios_version, arch_variant, is_org_member_context=True, extra_props=None):
self.ios_version = ios_version
self.arch_variant = arch_variant
self.is_org_member_context = is_org_member_context
self.extra_props = extra_props
def gen_name_parts(self, with_version_dots):
version_parts = self.xcode_version.render_dots_or_parts(with_version_dots)
version_parts = self.ios_version.render_dots_or_parts(with_version_dots)
build_variant_suffix = "_".join([self.arch_variant.render(), "build"])
return [
@@ -61,10 +61,9 @@ class IOSJob:
WORKFLOW_DATA = [
IOSJob(XCODE_VERSION, ArchVariant("x86_64"), is_org_member_context=False),
IOSJob(XCODE_VERSION, ArchVariant("arm64")),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "metal"), extra_props={"use_metal": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom"), extra_props={"op_list": "mobilenetv2.yaml"}),
IOSJob(IOS_VERSION, ArchVariant("x86_64"), is_org_member_context=False),
IOSJob(IOS_VERSION, ArchVariant("arm64")),
IOSJob(IOS_VERSION, ArchVariant("arm64", True), extra_props={"op_list": "mobilenetv2.yaml"}),
]


@@ -18,7 +18,7 @@ class IOSNightlyJob:
common_name_pieces = [
"ios",
] + ios_definitions.XCODE_VERSION.render_dots_or_parts(with_version_dots) + [
] + ios_definitions.IOS_VERSION.render_dots_or_parts(with_version_dots) + [
"nightly",
self.variant,
"build",


@@ -9,7 +9,7 @@ class MultiPartVersion:
with the prefix string.
"""
if self.parts:
return [self.prefix + str(self.parts[0])] + [str(part) for part in self.parts[1:]]
return [self.prefix + str(self.parts[0])] + list(map(str, self.parts[1:]))
else:
return [self.prefix]
@@ -29,6 +29,3 @@ class CudaVersion(MultiPartVersion):
self.minor = minor
super().__init__([self.major, self.minor], "cuda")
def __str__(self):
return f"{self.major}.{self.minor}"


@@ -86,11 +86,10 @@ class WindowsJob:
props_dict["executor"] = "windows-with-nvidia-gpu"
props_dict["cuda_version"] = (
miniutils.quote(str(self.cuda_version))
miniutils.quote(str(self.cuda_version.major))
if self.cuda_version
else "cpu"
)
props_dict["name"] = "_".join(name_parts)
return [{key_name: props_dict}]
@@ -132,10 +131,10 @@ WORKFLOW_DATA = [
WindowsJob(None, _VC2019, CudaVersion(10, 1)),
WindowsJob(1, _VC2019, CudaVersion(10, 1)),
WindowsJob(2, _VC2019, CudaVersion(10, 1)),
# VS2019 CUDA-11.1
WindowsJob(None, _VC2019, CudaVersion(11, 1)),
WindowsJob(1, _VC2019, CudaVersion(11, 1), master_only_pred=TruePred),
WindowsJob(2, _VC2019, CudaVersion(11, 1), master_only_pred=TruePred),
# VS2019 CUDA-11.0
WindowsJob(None, _VC2019, CudaVersion(11, 0)),
WindowsJob(1, _VC2019, CudaVersion(11, 0), master_only_pred=TruePred),
WindowsJob(2, _VC2019, CudaVersion(11, 0), master_only_pred=TruePred),
# VS2019 CPU-only
WindowsJob(None, _VC2019, None),
WindowsJob(1, _VC2019, None, master_only_pred=TruePred),

File diff suppressed because it is too large.


@@ -40,7 +40,9 @@ function extract_all_from_image_name() {
done
}
if [[ "$image" == *-xenial* ]]; then
if [[ "$image" == *-trusty* ]]; then
UBUNTU_VERSION=14.04
elif [[ "$image" == *-xenial* ]]; then
UBUNTU_VERSION=16.04
elif [[ "$image" == *-artful* ]]; then
UBUNTU_VERSION=17.10
@@ -77,10 +79,19 @@ TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/u
# from scratch
case "$image" in
pytorch-linux-xenial-py3.8)
ANACONDA_PYTHON_VERSION=3.8
# TODO: This is a hack, get rid of this as soon as you get rid of the travis downloads
TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/16.04/x86_64"
TRAVIS_PYTHON_VERSION=3.8
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc4.8)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=4.8
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=5
@ -158,16 +169,6 @@ case "$image" in
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
@ -254,39 +255,19 @@ case "$image" in
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-cuda11.1-cudnn8-py3.6-gcc9)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-cuda11.1-cudnn8-py3.8-gcc9)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-rocm3.9-py3.6)
pytorch-linux-bionic-rocm3.7-py3.6)
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=3.9
ROCM_VERSION=3.7
;;
pytorch-linux-bionic-rocm3.10-py3.6)
pytorch-linux-bionic-rocm3.8-py3.6)
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=3.10
ROCM_VERSION=3.8
;;
*)
# Catch-all for builds that are not hardcoded.
@ -353,6 +334,7 @@ docker build \
--build-arg "GLIBC_VERSION=${GLIBC_VERSION}" \
--build-arg "CLANG_VERSION=${CLANG_VERSION}" \
--build-arg "ANACONDA_PYTHON_VERSION=${ANACONDA_PYTHON_VERSION}" \
--build-arg "TRAVIS_PYTHON_VERSION=${TRAVIS_PYTHON_VERSION}" \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
@ -395,6 +377,19 @@ if [[ "$OS" == "ubuntu" ]]; then
fi
fi
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]]; then
if !(drun python --version 2>&1 | grep -qF "Python $TRAVIS_PYTHON_VERSION"); then
echo "TRAVIS_PYTHON_VERSION=$TRAVIS_PYTHON_VERSION, but:"
drun python --version
exit 1
fi
else
echo "Please manually check nightly is OK:"
drun python --version
fi
fi
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
if !(drun python --version 2>&1 | grep -qF "Python $ANACONDA_PYTHON_VERSION"); then
echo "ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION, but:"

View File

@ -27,7 +27,7 @@ RUN rm install_glibc.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -64,6 +64,7 @@ ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
ENV PATH /opt/rocm/opencl/bin:$PATH
ENV PATH /opt/rocm/llvm/bin:$PATH
ENV HIP_PLATFORM hcc
ENV LANG en_US.utf8
ENV LC_ALL en_US.utf8

View File

@ -18,6 +18,7 @@ install_ubuntu() {
# Install common dependencies
apt-get update
# TODO: Some of these may not be necessary
# TODO: libiomp also gets installed by conda, aka there's a conflict
ccache_deps="asciidoc docbook-xml docbook-xsl xsltproc"
numpy_deps="gfortran"
apt-get install -y --no-install-recommends \
@ -39,11 +40,21 @@ install_ubuntu() {
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
python \
python-dev \
python-setuptools \
python-wheel \
software-properties-common \
sudo \
wget \
vim
# TODO: THIS IS A HACK!!!
# distributed nccl(2) tests are a bit busted, see https://github.com/pytorch/pytorch/issues/5877
if dpkg -s libnccl-dev; then
apt-get remove -y libnccl-dev libnccl2 --allow-change-held-packages
fi
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

View File

@ -2,28 +2,6 @@
set -ex
install_ubuntu() {
echo "Preparing to build sccache from source"
apt-get update
apt-get install -y cargo pkg-config libssl-dev
echo "Checking out sccache repo"
git clone https://github.com/pytorch/sccache
cd sccache
echo "Building sccache"
cargo build --release
cp target/release/sccache /opt/cache/bin
echo "Cleaning up"
cd ..
rm -rf sccache
apt-get remove -y cargo rustc
apt-get autoclean && apt-get clean
}
install_binary() {
echo "Downloading sccache binary from S3 repo"
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
}
mkdir -p /opt/cache/bin
mkdir -p /opt/cache/lib
sed -e 's|PATH="\(.*\)"|PATH="/opt/cache/bin:\1"|g' -i /etc/environment
@ -33,20 +11,12 @@ export PATH="/opt/cache/bin:$PATH"
if [ -n "$ROCM_VERSION" ]; then
curl --retry 3 http://repo.radeon.com/misc/.sccache_amd/sccache -o /opt/cache/bin/sccache
else
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
*)
install_binary
;;
esac
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
fi
chmod a+x /opt/cache/bin/sccache
function write_sccache_stub() {
printf "#!/bin/sh\nif [ \$(ps -p \$PPID -o comm=) != sccache ]; then\n exec sccache $(which $1) \"\$@\"\nelse\n exec $(which $1) \"\$@\"\nfi" > "/opt/cache/bin/$1"
printf "#!/bin/sh\nexec sccache $(which $1) \$*" > "/opt/cache/bin/$1"
chmod a+x "/opt/cache/bin/$1"
}
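
Note on write_sccache_stub: the two printf lines are the two variants of the wrapper script this function generates per compiler. The guarded variant checks the parent process name so that, when sccache itself invokes the wrapper, the wrapper execs the real compiler directly instead of recursing back into sccache. A Python sketch of the file the guarded printf produces for $1 = gcc (the /usr/bin/gcc path is illustrative; the real one comes from $(which gcc) at stub-creation time):

    compiler = "/usr/bin/gcc"  # assumed location, stands in for $(which gcc)
    stub = (
        "#!/bin/sh\n"
        "if [ $(ps -p $PPID -o comm=) != sccache ]; then\n"
        f" exec sccache {compiler} \"$@\"\n"
        "else\n"
        f" exec {compiler} \"$@\"\n"
        "fi"
    )
    print(stub)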
@ -68,8 +38,8 @@ if [ -n "$CUDA_VERSION" ]; then
# where CUDA is installed. Instead, we install an nvcc symlink outside
# of the PATH, and set CUDA_NVCC_EXECUTABLE so that we make use of it.
write_sccache_stub nvcc
mv /opt/cache/bin/nvcc /opt/cache/lib/
printf "#!/bin/sh\nexec sccache $(which nvcc) \"\$@\"" > /opt/cache/lib/nvcc
chmod a+x /opt/cache/lib/nvcc
fi
if [ -n "$ROCM_VERSION" ]; then
@ -87,8 +57,8 @@ if [ -n "$ROCM_VERSION" ]; then
TOPDIR=$(dirname $OLDCOMP)
WRAPPED="$TOPDIR/original/$COMPNAME"
mv "$OLDCOMP" "$WRAPPED"
printf "#!/bin/sh\nexec sccache $WRAPPED \"\$@\"" > "$OLDCOMP"
chmod a+x "$OLDCOMP"
printf "#!/bin/sh\nexec sccache $WRAPPED \$*" > "$OLDCOMP"
chmod a+x "$1"
}
if [[ -e "/opt/rocm/hcc/bin/hcc" ]]; then

View File

@ -72,13 +72,11 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
if [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
# DO NOT install typing if installing python-3.8, since it's part of the python-3.8 core packages
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.7" ]; then
# DO NOT install dataclasses if installing python-3.7, since it's part of the python-3.7 core packages
conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six typing_extensions
conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0 dataclasses
else
conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
conda_install numpy=1.18.5 pyyaml mkl mkl-include setuptools cffi typing future six dataclasses
fi
if [[ "$CUDA_VERSION" == 9.2* ]]; then
conda_install magma-cuda92 -c pytorch
@ -90,40 +88,18 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
conda_install magma-cuda102 -c pytorch
elif [[ "$CUDA_VERSION" == 11.0* ]]; then
conda_install magma-cuda110 -c pytorch
elif [[ "$CUDA_VERSION" == 11.1* ]]; then
conda_install magma-cuda111 -c pytorch
elif [[ "$CUDA_VERSION" == 11.2* ]]; then
conda_install magma-cuda112 -c pytorch
fi
# TODO: This isn't working atm
conda_install nnpack -c killeent
# Install some other packages, including those needed for Python test reporting
# Install some other packages
# TODO: Why is scipy pinned?
# Pin MyPy version because new errors are likely to appear with each release
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
as_jenkins pip install --progress-bar off pytest \
scipy==1.1.0 \
scikit-image \
librosa>=0.6.2 \
psutil \
numba \
llvmlite \
unittest-xml-reporting \
boto3==1.16.34 \
coverage \
hypothesis==4.53.2 \
mypy==0.770 \
tb-nightly
# Update scikit-learn to a python-3.8 compatible version
if [[ $(python -c "import sys; print(int(sys.version_info >= (3, 8)))") == "1" ]]; then
as_jenkins pip install --progress-bar off -U scikit-learn
else
# Pinned scikit-learn due to https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5 only)
as_jenkins pip install --progress-bar off scikit-learn==0.20.3
fi
# numba & llvmlite are pinned because of https://github.com/numba/numba/issues/4368
# scikit-learn is pinned because of
# https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5
# only)
as_jenkins pip install --progress-bar off pytest scipy==1.1.0 scikit-learn==0.20.3 scikit-image librosa>=0.6.2 psutil numba==0.46.0 llvmlite==0.30.0
popd
fi

View File

@ -15,7 +15,6 @@ if [ -n "$GCC_VERSION" ]; then
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
# Cleanup package manager
apt-get autoclean && apt-get clean

View File

@ -1,8 +0,0 @@
#!/bin/bash
set -ex
git clone --branch v1.15 https://github.com/linux-test-project/lcov.git
pushd lcov
sudo make install # will be installed in /usr/local/bin/lcov
popd

View File

@ -1,4 +0,0 @@
#!/bin/bash
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1

View File

@ -1,4 +0,0 @@
#!/bin/bash
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev

View File

@ -2,22 +2,6 @@
set -ex
install_magma() {
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git -b hipMAGMA
pushd magma
cp make.inc-examples/make.inc.hip-mkl-gcc make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
make -f make.gen.hipMAGMA -j $(nproc)
make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm
}
install_ubuntu() {
apt-get update
if [[ $UBUNTU_VERSION == 18.04 ]]; then
@ -26,20 +10,28 @@ install_ubuntu() {
fi
apt-get install -y kmod
apt-get install -y wget
apt-get install -y libopenblas-dev
# Need the libc++1 and libc++abi1 libraries to allow torch._C to load at runtime
apt-get install -y libc++1
apt-get install -y libc++abi1
DEB_ROCM_REPO=http://repo.radeon.com/rocm/apt/${ROCM_VERSION}
# Add rocm repository
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
echo "deb [arch=amd64] http://repo.radeon.com/rocm/apt/${ROCM_VERSION} xenial main" > /etc/apt/sources.list.d/rocm.list
wget -qO - $DEB_ROCM_REPO/rocm.gpg.key | apt-key add -
echo "deb [arch=amd64] $DEB_ROCM_REPO xenial main" > /etc/apt/sources.list.d/rocm.list
apt-get update --allow-insecure-repositories
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
rocm-dev \
rocm-utils \
rocm-libs \
rocfft \
miopen-hip \
rocblas \
hipsparse \
rocrand \
hipcub \
rocthrust \
rccl \
rocprofiler-dev \
roctracer-dev
@ -53,11 +45,9 @@ install_ubuntu() {
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENKERNELS}
fi
install_magma
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
@ -81,13 +71,17 @@ install_centos() {
yum install -y \
rocm-dev \
rocm-utils \
rocm-libs \
rocfft \
miopen-hip \
rocblas \
hipsparse \
rocrand \
rccl \
hipcub \
rocthrust \
rocprofiler-dev \
roctracer-dev
install_magma
# Cleanup
yum clean all
rm -rf /var/cache/yum

View File

@ -0,0 +1,79 @@
#!/bin/bash
set -ex
as_jenkins() {
# NB: Preserve PATH and LD_LIBRARY_PATH changes
sudo -H -u jenkins env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
mkdir -p /opt/python
chown jenkins:jenkins /opt/python
# Download Python binary from Travis
pushd tmp
as_jenkins wget --quiet ${TRAVIS_DL_URL_PREFIX}/python-$TRAVIS_PYTHON_VERSION.tar.bz2
# NB: The tarball also comes with a /home/travis virtualenv that we
# don't care about. (Maybe we should, but we've worked around the
# "how do I install to python" issue by making this entire directory
# user-writable "lol")
# NB: Relative ordering of opt/python and flags matters
as_jenkins tar xjf python-$TRAVIS_PYTHON_VERSION.tar.bz2 --strip-components=2 --directory /opt/python opt/python
popd
echo "/opt/python/$TRAVIS_PYTHON_VERSION/lib" > /etc/ld.so.conf.d/travis-python.conf
ldconfig
sed -e 's|PATH="\(.*\)"|PATH="/opt/python/'"$TRAVIS_PYTHON_VERSION"'/bin:\1"|g' -i /etc/environment
export PATH="/opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH"
python --version
pip --version
# Install pip from source.
# The python-pip package on Ubuntu Trusty is old;
# with it, installing numpy doesn't use the binary
# distribution and instead fails while compiling from source.
pushd tmp
as_jenkins curl -L -O https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz
as_jenkins tar zxf pip-9.0.1.tar.gz
pushd pip-9.0.1
as_jenkins python setup.py install
popd
rm -rf pip-9.0.1*
popd
# Install pip packages
as_jenkins pip install --upgrade pip
pip --version
as_jenkins pip install numpy pyyaml
as_jenkins pip install \
future \
hypothesis \
protobuf \
pytest \
pillow \
typing \
dataclasses
as_jenkins pip install mkl mkl-devel
# SciPy does not support Python 3.7 or Python 2.7.9
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]] && [[ "$TRAVIS_PYTHON_VERSION" != "2.7.9" ]]; then
as_jenkins pip install scipy==1.1.0 scikit-image librosa>=0.6.2
fi
# Install psutil for dataloader tests
as_jenkins pip install psutil
# Install dill for serialization tests
as_jenkins pip install "dill>=0.3.1"
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -24,7 +24,7 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -40,6 +40,12 @@ ARG CLANG_VERSION
ADD ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
ADD ./common/install_protobuf.sh install_protobuf.sh
@ -72,16 +78,6 @@ ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install NCCL for when CUDA is version 10.1
ADD ./common/install_nccl.sh install_nccl.sh
RUN if [ "${CUDA_VERSION}" = 10.1 ]; then bash ./install_nccl.sh; fi
RUN rm install_nccl.sh
# Install Open MPI for CUDA
ADD ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
RUN rm install_openmpi.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}

View File

@ -21,7 +21,7 @@ RUN bash ./install_clang.sh && rm install_clang.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -58,7 +58,7 @@ ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
ENV PATH /opt/rocm/opencl/bin:$PATH
ENV PATH /opt/rocm/llvm/bin:$PATH
ENV MAGMA_HOME /opt/rocm/magma
ENV HIP_PLATFORM hcc
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8

View File

@ -33,7 +33,7 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -44,9 +44,12 @@ ARG GCC_VERSION
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install lcov for C++ code coverage
ADD ./common/install_lcov.sh install_lcov.sh
RUN bash ./install_lcov.sh && rm install_lcov.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ARG TRAVIS_DL_URL_PREFIX
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF

View File

@ -112,10 +112,7 @@ def gen_build_workflows_tree():
"when": r"<< pipeline.parameters.run_binary_tests >>",
"jobs": [f() for f in binary_build_functions],
},
"build": {
"when": r"<< pipeline.parameters.run_build >>",
"jobs": [f() for f in build_workflows_functions]
},
"build": {"jobs": [f() for f in build_workflows_functions]},
}
}

View File

@ -33,11 +33,6 @@ else
export BUILDER_ROOT="$workdir/builder"
fi
# Try to extract PR number from branch if not already set
if [[ -z "${CIRCLE_PR_NUMBER:-}" ]]; then
CIRCLE_PR_NUMBER="$(echo ${CIRCLE_BRANCH} | sed -E -n 's/pull\/([0-9]*).*/\1/p')"
fi
# Clone the Pytorch branch
retry git clone https://github.com/pytorch/pytorch.git "$PYTORCH_ROOT"
pushd "$PYTORCH_ROOT"
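
The sed expression above recovers the PR number from CircleCI branch names of the form pull/<number>/head. An equivalent Python sketch (the helper name is made up):

    import re

    def pr_number_from_branch(branch: str) -> str:
        # Mirrors: sed -E -n 's/pull\/([0-9]*).*/\1/p'
        m = re.match(r"pull/([0-9]*)", branch)
        return m.group(1) if m else ""

    assert pr_number_from_branch("pull/46550/head") == "46550"
    assert pr_number_from_branch("release/1.7") == ""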

View File

@ -15,7 +15,7 @@ export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
# Install dependencies
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests --yes
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
conda install -c conda-forge valgrind --yes
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}

View File

@ -34,13 +34,7 @@ touch version.txt
echo $(date +%s) > version.txt
zip -r ${ZIPFILE} install src version.txt LICENSE
# upload to aws
# Install conda then 'conda install' awscli
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/conda.sh
/bin/bash ~/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
conda install -c conda-forge awscli --yes
brew install awscli
set +x
export AWS_ACCESS_KEY_ID=${AWS_S3_ACCESS_KEY_FOR_PYTORCH_BINARY_UPLOAD}
export AWS_SECRET_ACCESS_KEY=${AWS_S3_ACCESS_SECRET_FOR_PYTORCH_BINARY_UPLOAD}

View File

@ -7,10 +7,6 @@ source /env
# Defaults here so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( $(nproc) - 2 ))}
if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
export BUILD_SPLIT_CUDA="ON"
fi
# Parse the parameters
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
build_script='conda/build_pytorch.sh'

View File

@ -5,17 +5,12 @@ cat >/home/circleci/project/ci_test_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -eux -o pipefail
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
# Set up Python
if [[ "$PACKAGE_TYPE" == conda ]]; then
# A bug introduced in conda-package-handling >= 1.6.1 makes archives
# above a certain size fail when being extracted
# see: https://github.com/conda/conda-package-handling/issues/71
conda install -y conda-package-handling=1.6.0
retry conda create -qyn testenv python="$DESIRED_PYTHON"
source activate testenv >/dev/null
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
python_path="/opt/python/cp\$python_nodot-cp\${python_nodot}"
# Prior to Python 3.8 paths were suffixed with an 'm'
if [[ -d "\${python_path}/bin" ]]; then
@ -25,19 +20,6 @@ elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
fi
fi
EXTRA_CONDA_FLAGS=""
NUMPY_PIN=""
if [[ "\$python_nodot" = *39* ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
# There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20,
# so we set a lower bound here just to be safe
NUMPY_PIN=">=1.20"
fi
if [[ "$DESIRED_CUDA" == "cu112" ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
fi
# Install the package
# These installs should not be wrapped in 'retry' because they install
# from local files and aren't actually network calls
@ -46,37 +28,23 @@ fi
# conda build scripts themselves. These should really be consolidated
pkg="/final_pkgs/\$(ls /final_pkgs)"
if [[ "$PACKAGE_TYPE" == conda ]]; then
(
# For some reason conda likes to re-activate the conda environment when attempting this install,
# which means that a deactivate is run and some variables might not exist when that happens,
# namely CONDA_MKL_INTERFACE_LAYER_BACKUP from libblas. So let's just ignore unbound variables
# for the conda installation commands
set +u
retry conda install \${EXTRA_CONDA_FLAGS} -yq \
"numpy\${NUMPY_PIN}" \
future \
mkl>=2018 \
ninja \
dataclasses \
typing-extensions \
defaults::protobuf \
six
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -c pytorch -y cpuonly
conda install -y "\$pkg" --offline
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -y cpuonly -c pytorch
fi
retry conda install -yq future numpy protobuf six
if [[ "$DESIRED_CUDA" != 'cpu' ]]; then
# DESIRED_CUDA is in format cu90 or cu102
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
else
# DESIRED_CUDA is in format cu90 or cu102
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
fi
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c pytorch "cudatoolkit=\${cu_ver}"
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
fi
conda install \${EXTRA_CONDA_FLAGS} -y "\$pkg" --offline
)
retry conda install -yq -c nvidia -c pytorch "cudatoolkit=\${cu_ver}"
fi
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
pip install "\$pkg"
retry pip install -q future numpy protobuf typing-extensions six
retry pip install -q future numpy protobuf six
fi
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
pkg="\$(ls /final_pkgs/*-latest.zip)"
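
Two bits of shell string-munging in this test script are easy to misread: tr -d m.u deletes the individual characters 'm', '.' and 'u' (so a pre-3.8 ABI-tagged version like 3.7m becomes 37), and the ${DESIRED_CUDA:offset:length} substrings split a cuXY or cuXYZ tag into a dotted version. An equivalent Python sketch (helper names are made up):

    def python_nodot(desired_python: str) -> str:
        # Mirrors: echo $DESIRED_PYTHON | tr -d m.u
        return desired_python.translate(str.maketrans("", "", "m.u"))

    def cuda_dotted_version(desired_cuda: str) -> str:
        # Mirrors the ${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3} branches above
        if len(desired_cuda) == 4:                        # e.g. "cu90"
            return f"{desired_cuda[2]}.{desired_cuda[3:]}"
        return f"{desired_cuda[2:4]}.{desired_cuda[4:]}"  # e.g. "cu102"

    assert python_nodot("3.7m") == "37"
    assert cuda_dotted_version("cu90") == "9.0"
    assert cuda_dotted_version("cu102") == "10.2"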

View File

@ -20,9 +20,9 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
unzip "$pkg" -d /tmp
cd /tmp/libtorch
elif [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "$pkg"
conda install -y "$pkg" --offline
else
pip install "$pkg" -v
pip install "$pkg" --no-index --no-dependencies -v
fi
# Test

View File

@ -73,7 +73,7 @@ PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
# TODO: We should be pulling the semver version from the base version.txt
BASE_BUILD_VERSION="1.8.0.dev$DATE"
BASE_BUILD_VERSION="1.7.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@ -100,14 +100,8 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
POSSIBLE_JAVA_HOMES+=(/usr/local)
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
# Add the Windows-specific JNI path
POSSIBLE_JAVA_HOMES+=("$PWD/.circleci/windows-jni/")
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
# Skip if we're not on Windows but haven't found a JAVA_HOME
if [[ "$JH" == "$PWD/.circleci/windows-jni/" && "$OSTYPE" != "msys" ]] ; then
break
fi
echo "Found jni.h under $JH"
JAVA_HOME="$JH"
BUILD_JNI=ON
@ -136,7 +130,7 @@ if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
fi
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.8.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.7.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
@ -167,7 +161,6 @@ export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
# =================== The above code will be executed inside Docker container ===================
EOL

View File

@ -15,10 +15,6 @@ else
export VC_YEAR=2019
fi
if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
export BUILD_SPLIT_CUDA="ON"
fi
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}

View File

@ -57,7 +57,6 @@ cp torch/_utils_internal.py tools/shared
# Generate PyTorch files
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--native-functions-path aten/src/ATen/native/native_functions.yaml \
--nn-path aten/src/
# Build the docs
@ -88,7 +87,7 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate C++ docs from pytorch/pytorch@$CIRCLE_SHA1" || true
git commit -m "Automatic sync on $(date)" || true
git status
popd

View File

@ -1,8 +1,8 @@
set "DRIVER_DOWNLOAD_LINK=https://s3.amazonaws.com/ossci-windows/452.39-data-center-tesla-desktop-win10-64bit-international.exe"
curl --retry 3 -kL %DRIVER_DOWNLOAD_LINK% --output 452.39-data-center-tesla-desktop-win10-64bit-international.exe
set "DRIVER_DOWNLOAD_LINK=https://s3.amazonaws.com/ossci-windows/451.82-tesla-desktop-winserver-2019-2016-international.exe"
curl --retry 3 -kL %DRIVER_DOWNLOAD_LINK% --output 451.82-tesla-desktop-winserver-2019-2016-international.exe
if errorlevel 1 exit /b 1
start /wait 452.39-data-center-tesla-desktop-win10-64bit-international.exe -s -noreboot
start /wait 451.82-tesla-desktop-winserver-2019-2016-international.exe -s -noreboot
if errorlevel 1 exit /b 1
del 452.39-data-center-tesla-desktop-win10-64bit-international.exe || ver > NUL
del 451.82-tesla-desktop-winserver-2019-2016-international.exe || ver > NUL

View File

@ -42,28 +42,7 @@ fi
echo "install_path: $install_path version: $version"
build_docs () {
set +e
set -o pipefail
make $1 2>&1 | tee /tmp/docs_build.txt
code=$?
if [ $code -ne 0 ]; then
set +x
echo =========================
grep "WARNING:" /tmp/docs_build.txt
echo =========================
echo Docs build failed. If the failure is not clear, scan back in the log
echo for any WARNINGS or for the line "build finished with problems"
echo "(tried to echo the WARNINGS above the ==== line)"
echo =========================
fi
set -ex
return $code
}
git clone https://github.com/pytorch/pytorch.github.io -b $branch --depth 1
git clone https://github.com/pytorch/pytorch.github.io -b $branch
pushd pytorch.github.io
export LC_ALL=C
@ -78,8 +57,7 @@ pushd docs
# Build the docs
pip -q install -r requirements.txt
if [ "$is_master_doc" = true ]; then
build_docs html
[ $? -eq 0 ] || exit $?
make html
make coverage
# Now we have the coverage report, we need to make sure it is empty.
# Count the number of lines in the file and turn that number into a variable
@ -100,9 +78,8 @@ if [ "$is_master_doc" = true ]; then
exit 1
fi
else
# skip coverage, format for stable or tags
build_docs html-stable
[ $? -eq 0 ] || exit $?
# Don't fail the build on coverage problems
make html-stable
fi
# Move them into the docs repo
@ -130,7 +107,7 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@$CIRCLE_SHA1" || true
git commit -m "auto-generating sphinx docs" || true
git status
popd

View File

@ -27,7 +27,7 @@ docker version
retry sudo pip -q install awscli==1.16.35
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-460.39.run"
DRIVER_FN="NVIDIA-Linux-x86_64-450.51.06.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
@ -54,7 +54,7 @@ add_to_env_file() {
echo "${content}" >> "${BASH_ENV:-/tmp/env}"
}
add_to_env_file "IN_CI=1"
add_to_env_file "IN_CIRCLECI=1"
add_to_env_file "COMMIT_SOURCE=${CIRCLE_BRANCH:-}"
add_to_env_file "BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}"
add_to_env_file "CIRCLE_PULL_REQUEST=${CIRCLE_PULL_REQUEST}"

View File

@ -41,7 +41,6 @@ def build_message(size):
"build_num": os.environ.get("CIRCLE_BUILD_NUM"),
"sha1": os.environ.get("CIRCLE_SHA1"),
"branch": os.environ.get("CIRCLE_BRANCH"),
"workflow_id": os.environ.get("CIRCLE_WORKFLOW_ID"),
},
"int": {
"time": int(time.time()),
@ -116,7 +115,6 @@ def report_android_sizes(file_dir):
"build_num": os.environ.get("CIRCLE_BUILD_NUM"),
"sha1": os.environ.get("CIRCLE_SHA1"),
"branch": os.environ.get("CIRCLE_BRANCH"),
"workflow_id": os.environ.get("CIRCLE_WORKFLOW_ID"),
},
"int": {
"time": int(time.time()),

View File

@ -1,25 +1,21 @@
#!/bin/bash
set -eux -o pipefail
cuda_major_version=${CUDA_VERSION%.*}
if [[ "$cuda_major_version" == "10" ]]; then
if [[ "$CUDA_VERSION" == "10" ]]; then
cuda_complete_version="10.1"
cuda_installer_name="cuda_10.1.243_426.00_win10"
msbuild_project_dir="CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
elif [[ "$cuda_major_version" == "11" ]]; then
cuda_installer_name="cuda_11.1.0_456.43_win10"
elif [[ "$CUDA_VERSION" == "11" ]]; then
cuda_complete_version="11.0"
cuda_installer_name="cuda_11.0.2_451.48_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_11.1 cuobjdump_11.1 nvprune_11.1 nvprof_11.1 cupti_11.1 cublas_11.1 cublas_dev_11.1 cudart_11.1 cufft_11.1 cufft_dev_11.1 curand_11.1 curand_dev_11.1 cusolver_11.1 cusolver_dev_11.1 cusparse_11.1 cusparse_dev_11.1 npp_11.1 npp_dev_11.1 nvrtc_11.1 nvrtc_dev_11.1 nvml_dev_11.1"
cuda_install_packages="nvcc_11.0 cuobjdump_11.0 nvprune_11.0 nvprof_11.0 cupti_11.0 cublas_11.0 cublas_dev_11.0 cudart_11.0 cufft_11.0 cufft_dev_11.0 curand_11.0 curand_dev_11.0 cusolver_11.0 cusolver_dev_11.0 cusparse_11.0 cusparse_dev_11.0 npp_11.0 npp_dev_11.0 nvrtc_11.0 nvrtc_dev_11.0 nvml_dev_11.0"
else
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
if [[ "$cuda_major_version" == "11" && "${JOB_EXECUTOR}" == "windows-with-nvidia-gpu" ]]; then
cuda_install_packages="${cuda_install_packages} Display.Driver"
fi
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
@ -48,7 +44,7 @@ then
export NVTOOLSEXT_PATH="C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\"
fi
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe"
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${cuda_complete_version}/bin/nvcc.exe"
then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
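
The ${CUDA_VERSION%.*} expansion at the top of this script drops the shortest trailing '.*' match, i.e. everything after the last dot, which yields the major version. In Python terms:

    def cuda_major_version(cuda_version: str) -> str:
        # Mirrors: ${CUDA_VERSION%.*}
        return cuda_version.rsplit(".", 1)[0]

    assert cuda_major_version("10.1") == "10"
    assert cuda_major_version("11.1") == "11"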

View File

@ -1,12 +1,12 @@
#!/bin/bash
set -eux -o pipefail
cuda_major_version=${CUDA_VERSION%.*}
if [[ "$cuda_major_version" == "10" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.4.38"
elif [[ "$cuda_major_version" == "11" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.0.5.39"
if [[ "$CUDA_VERSION" == "10" ]]; then
cuda_complete_version="10.1"
cudnn_installer_name="cudnn-10.1-windows10-x64-v7.6.4.38"
elif [[ "$CUDA_VERSION" == "11" ]]; then
cuda_complete_version="11.0"
cudnn_installer_name="cudnn-11.0-windows-x64-v8.0.2.39"
else
echo "CUDNN for CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
@ -16,6 +16,6 @@ cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/${cudnn_installer_n
curl --retry 3 -O $cudnn_installer_link
7z x ${cudnn_installer_name}.zip -ocudnn
cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${cuda_complete_version}/"
rm -rf cudnn
rm -f ${cudnn_installer_name}.zip

View File

@ -36,15 +36,11 @@ pytorch_ios_params: &pytorch_ios_params
op_list:
type: string
default: ""
use_metal:
type: string
default: "0"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
IOS_ARCH: << parameters.ios_arch >>
IOS_PLATFORM: << parameters.ios_platform >>
SELECTED_OP_LIST: << parameters.op_list >>
USE_PYTORCH_METAL: << parameters.use_metal >>
pytorch_windows_params: &pytorch_windows_params
parameters:
@ -59,7 +55,7 @@ pytorch_windows_params: &pytorch_windows_params
default: ""
cuda_version:
type: string
default: "10.1"
default: "10"
python_version:
type: string
default: "3.6"

View File

@ -103,7 +103,7 @@ commands:
name: (Optional) Merge target branch
no_output_timeout: "10m"
command: |
if [[ -n "$CIRCLE_PULL_REQUEST" && "$CIRCLE_BRANCH" != "nightly" ]]; then
if [ -n "$CIRCLE_PULL_REQUEST" ]; then
PR_NUM=$(basename $CIRCLE_PULL_REQUEST)
CIRCLE_PR_BASE_BRANCH=$(curl -s https://api.github.com/repos/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME/pulls/$PR_NUM | jq -r '.base.ref')
if [[ "${BUILD_ENVIRONMENT}" == *"xla"* || "${BUILD_ENVIRONMENT}" == *"gcc5"* ]] ; then
@ -111,11 +111,11 @@ commands:
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/release/1.8:refs/remotes/origin/release/1.8
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/release/1.8:refs/remotes/origin/release/1.8 --depth=100 --quiet
git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
# PRs generated from ghstack has format CIRCLE_PR_BASE_BRANCH=gh/xxx/1234/base
if [[ "${CIRCLE_PR_BASE_BRANCH}" == "gh/"* ]]; then
CIRCLE_PR_BASE_BRANCH=release/1.8
CIRCLE_PR_BASE_BRANCH=master
fi
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/$CIRCLE_PR_BASE_BRANCH`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}

View File

@ -11,9 +11,6 @@ parameters:
run_binary_tests:
type: boolean
default: false
run_build:
type: boolean
default: true
docker_config_defaults: &docker_config_defaults
user: jenkins

View File

@ -135,7 +135,7 @@
smoke_mac_test:
<<: *binary_linux_test_upload_params
macos:
xcode: "12.0"
xcode: "11.2.1"
steps:
- checkout
- run:
@ -160,7 +160,7 @@
binary_mac_build:
<<: *binary_mac_params
macos:
xcode: "12.0"
xcode: "11.2.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
@ -174,7 +174,7 @@
- run:
name: Build
no_output_timeout: "90m"
no_output_timeout: "1h"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
@ -198,44 +198,6 @@
root: /Users/distiller/project
paths: final_pkgs
- store_artifacts:
path: /Users/distiller/project/final_pkgs
binary_macos_arm64_build:
<<: *binary_mac_params
macos:
xcode: "12.3.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "90m"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
export CROSS_COMPILE_ARM64=1
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
- store_artifacts:
path: /Users/distiller/project/final_pkgs
binary_ios_build:
<<: *pytorch_ios_params
macos:

View File

@ -49,7 +49,7 @@
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/python_doc_push_script.sh docs/'$target' '$target' site") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/python_doc_push_script.sh docs/'$target' master site") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -94,7 +94,7 @@
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" master") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && . ./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" master") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -115,7 +115,7 @@
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
macos:
xcode: "12.0"
xcode: "11.2.1"
steps:
- checkout
- run_brew_for_macos_build
@ -124,7 +124,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export IN_CIRCLECI=1
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
@ -149,7 +149,7 @@
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
macos:
xcode: "12.0"
xcode: "11.2.1"
steps:
- checkout
- attach_workspace:
@ -160,7 +160,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export IN_CIRCLECI=1
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
@ -259,22 +259,22 @@
pytorch_android_publish_snapshot:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:ab1632df-fa59-40e6-8c23-98e004f61148"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
image: ubuntu-1604:202007-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- checkout
- setup_ci_environment
- run:
name: pytorch android gradle build
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32_gradle=${docker_image_commit}-android-x86_32-gradle
@ -408,7 +408,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export IN_CIRCLECI=1
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
@ -425,7 +425,7 @@
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
retry conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
# sync submodules
cd ${PROJ_ROOT}
@ -439,7 +439,6 @@
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
echo "USE_PYTORCH_METAL": "${USE_METAL}"
#check the custom build flag
echo "SELECTED_OP_LIST: ${SELECTED_OP_LIST}"
@ -448,9 +447,6 @@
fi
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
export USE_PYTORCH_METAL=${USE_METAL}
fi
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
- run:
name: Run Build Test
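
The retry helper defined in the build step above makes one immediate attempt and up to four retries, sleeping 1, 2, 4 and 8 seconds between them. A Python sketch of the same backoff pattern (names are made up):

    import time

    def retry(fn, delays=(1, 2, 4, 8)):
        # One immediate attempt, then one retry per delay
        for attempt, delay in enumerate((0,) + tuple(delays)):
            time.sleep(delay)
            try:
                return fn()
            except Exception:
                if attempt == len(delays):
                    raise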

View File

@ -15,8 +15,12 @@ jobs:
no_output_timeout: "1h"
command: |
set -e
if [[ "${DOCKER_IMAGE}" == *rocm3.9* ]]; then
export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
# TODO: Remove this after we figure out why rocm tests are failing
if [[ "${DOCKER_IMAGE}" == *rocm3.5* ]]; then
export DOCKER_TAG="ab1632df-fa59-40e6-8c23-98e004f61148"
fi
if [[ "${DOCKER_IMAGE}" == *rocm3.7* ]]; then
export DOCKER_TAG="1045c7b891104cb4fd23399eab413b6213e48aeb"
fi
if [[ ${BUILD_ENVIRONMENT} == *"pure_torch"* ]]; then
echo 'BUILD_CAFFE2=OFF' >> "${BASH_ENV}"
@ -48,7 +52,7 @@ jobs:
if [ -z "${BUILD_ONLY}" ]; then
# Note [Special build images]
# The xla build uses the same docker image as
# pytorch_linux_bionic_py3_6_clang9_build. In the push step, we have to
# pytorch-linux-trusty-py3.6-gcc5.4-build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
@ -96,8 +100,12 @@ jobs:
command: |
set -e
export PYTHONUNBUFFERED=1
if [[ "${DOCKER_IMAGE}" == *rocm3.9* ]]; then
export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
# TODO: Remove this after we figure out why rocm tests are failing
if [[ "${DOCKER_IMAGE}" == *rocm3.5* ]]; then
export DOCKER_TAG="ab1632df-fa59-40e6-8c23-98e004f61148"
fi
if [[ "${DOCKER_IMAGE}" == *rocm3.7* ]]; then
export DOCKER_TAG="1045c7b891104cb4fd23399eab413b6213e48aeb"
fi
# See Note [Special build images]
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
@ -133,7 +141,7 @@ jobs:
hostname
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=8g --ipc=host --device /dev/kfd --device /dev/dri --group-add video -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size=1g --ipc=host -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
echo "id=${id}" >> "${BASH_ENV}"
@ -175,7 +183,7 @@ jobs:
echo ".jenkins/pytorch/multigpu-test.sh" >> docker_commands.sh
elif [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then
echo "pip install click mock tabulate networkx==2.0" >> docker_commands.sh
echo "pip -q install --user \"file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx\"" >> docker_commands.sh
echo "pip -q install --user -b /tmp/pip_install_onnx \"file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx\"" >> docker_commands.sh
echo ".jenkins/caffe2/test.sh" >> docker_commands.sh
else
echo ".jenkins/pytorch/test.sh" >> docker_commands.sh
@ -199,9 +207,8 @@ jobs:
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_JOB="$CIRCLE_JOB"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
cd workspace
python test/print_test_stats.py --upload-to-s3 test
python test/print_test_stats.py test
EOL
echo "(cat docker_commands.sh | docker exec -u jenkins -i "$id" bash) 2>&1" > command.sh
unbuffer bash command.sh | ts
@ -209,11 +216,7 @@ jobs:
echo "Retrieving test reports"
docker cp $id:/var/lib/jenkins/workspace/test/test-reports ./ || echo 'No test reports found!'
if [[ ${BUILD_ENVIRONMENT} == *"coverage"* ]]; then
echo "Retrieving C++ coverage report"
docker cp $id:/var/lib/jenkins/workspace/build/coverage.info ./test
fi
if [[ ${BUILD_ENVIRONMENT} == *"coverage"* || ${BUILD_ENVIRONMENT} == *"onnx"* ]]; then
echo "Retrieving Python coverage report"
echo "Retrieving coverage report"
docker cp $id:/var/lib/jenkins/workspace/test/.coverage ./test
docker cp $id:/var/lib/jenkins/workspace/test/coverage.xml ./test
python3 -mpip install codecov
@ -237,7 +240,7 @@ jobs:
default: ""
cuda_version:
type: string
default: "10.1"
default: "10"
python_version:
type: string
default: "3.6"
@ -299,7 +302,7 @@ jobs:
default: ""
cuda_version:
type: string
default: "10.1"
default: "10"
python_version:
type: string
default: "3.6"
@ -328,6 +331,9 @@ jobs:
if [[ "${CUDA_VERSION}" != "10" || "${JOB_EXECUTOR}" != "windows-with-nvidia-gpu" ]]; then
.circleci/scripts/windows_cuda_install.sh
fi
if [[ "${CUDA_VERSION}" != "10" && "${JOB_EXECUTOR}" == "windows-with-nvidia-gpu" ]]; then
.circleci/scripts/driver_update.bat
fi
fi
- run:
name: Install Cudnn
@ -340,7 +346,7 @@ jobs:
no_output_timeout: "30m"
command: |
set -e
export IN_CI=1
export IN_CIRCLECI=1
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}

File diff suppressed because it is too large

View File

@ -1,7 +1,6 @@
---
# NOTE there must be no spaces before the '-', so put the comma last.
InheritParentConfig: true
Checks: '
Checks: '-*,
bugprone-*,
-bugprone-forward-declaration-namespace,
-bugprone-macro-parentheses,
@ -18,11 +17,9 @@ cppcoreguidelines-*,
-cppcoreguidelines-pro-type-union-access,
-cppcoreguidelines-pro-type-vararg,
-cppcoreguidelines-special-member-functions,
-facebook-hte-RelativeInclude,
hicpp-exception-baseclass,
hicpp-avoid-goto,
modernize-*,
-modernize-concat-nested-namespaces,
-modernize-return-braced-init-list,
-modernize-use-auto,
-modernize-use-default-member-init,
@ -30,7 +27,7 @@ modernize-*,
-modernize-use-trailing-return-type,
performance-*,
-performance-noexcept-move-constructor,
'
'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
CheckOptions:

.flake8
View File

@ -12,22 +12,5 @@ ignore =
B007,B008,
# these ignores are from flake8-comprehensions; please fix!
C400,C401,C402,C403,C404,C405,C407,C411,C413,C414,C415
per-file-ignores = __init__.py: F401 torch/utils/cpp_extension.py: B950
exclude =
docs/src,
docs/cpp/src,
venv,
third_party,
caffe2,
scripts,
docs/caffe2,
torch/lib/include,
torch/lib/tmp_install,
build,
torch/include,
*.pyi,
.git,
build,
build_test_custom_build,
build_code_analyzer,
test/generated_type_hints_smoketest.py
per-file-ignores = __init__.py: F401
exclude = docs/src,venv,third_party,caffe2,scripts,docs/caffe2,torch/lib/include,torch/lib/tmp_install,build,torch/include,*.pyi,.git,build,build_test_custom_build,build_code_analyzer

View File

@ -9,5 +9,3 @@ labels_to_circle_params:
- release/.*
tags:
- v[0-9]+(\.[0-9]+)*-rc[0-9]+
set_to_false:
- run_build

View File

@ -1,86 +0,0 @@
#!/usr/bin/env python3
"""Generates a matrix to be utilized through github actions
Will output a condensed version of the matrix if on a pull request that only
includes the latest version of python we support built on three different
architectures:
* CPU
* Latest CUDA
* Latest ROCM
"""
import json
import os
import itertools
CUDA_ARCHES = [
"10.1",
"10.2",
"11.0"
]
ROCM_ARCHES = [
"3.10",
"4.0"
]
FULL_ARCHES = [
"cpu",
*CUDA_ARCHES,
*ROCM_ARCHES
]
CONTAINER_IMAGES = {
**{
# TODO: Re-do manylinux CUDA image tagging scheme to be similar to
# ROCM so we don't have to do this replacement
gpu_arch: f"pytorch/manylinux-cuda{gpu_arch.replace('.', '')}"
for gpu_arch in CUDA_ARCHES
},
**{
gpu_arch: f"pytorch/manylinux-rocm:{gpu_arch}"
for gpu_arch in ROCM_ARCHES
},
"cpu": "pytorch/manylinux-cpu"
}
FULL_PYTHON_VERSIONS = [
"3.6",
"3.7",
"3.8",
"3.9",
]
def is_pull_request():
return os.environ.get("GITHUB_HEAD_REF")
def generate_matrix():
python_versions = FULL_PYTHON_VERSIONS
arches = FULL_ARCHES
if is_pull_request():
python_versions = [python_versions[-1]]
arches = ["cpu", CUDA_ARCHES[-1], ROCM_ARCHES[-1]]
matrix = []
for item in itertools.product(python_versions, arches):
python_version, arch_version = item
# Not my favorite code here
gpu_arch_type = "cuda"
if "rocm" in CONTAINER_IMAGES[arch_version]:
gpu_arch_type = "rocm"
elif "cpu" in CONTAINER_IMAGES[arch_version]:
gpu_arch_type = "cpu"
matrix.append({
"python_version": python_version,
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": arch_version,
"container_image": CONTAINER_IMAGES[arch_version]
})
return json.dumps({"include": matrix})
def main():
print(generate_matrix())
if __name__ == "__main__":
main()
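
For reference, on a pull request the deleted script above condenses the matrix to the newest Python (3.9) across three architectures. Reconstructed from the constants in the script, the emitted JSON is roughly:

    # Condensed (pull-request) matrix, rebuilt from CUDA_ARCHES, ROCM_ARCHES
    # and CONTAINER_IMAGES above; the full matrix is the cross product instead.
    condensed = {"include": [
        {"python_version": "3.9", "gpu_arch_type": "cpu",
         "gpu_arch_version": "cpu", "container_image": "pytorch/manylinux-cpu"},
        {"python_version": "3.9", "gpu_arch_type": "cuda",
         "gpu_arch_version": "11.0", "container_image": "pytorch/manylinux-cuda110"},
        {"python_version": "3.9", "gpu_arch_type": "rocm",
         "gpu_arch_version": "4.0", "container_image": "pytorch/manylinux-rocm:4.0"},
    ]}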

View File

@ -1,118 +0,0 @@
#!/usr/bin/env python3
import argparse
import os
import subprocess
import re
from datetime import datetime
from distutils.util import strtobool
from pathlib import Path
LEADING_V_PATTERN = re.compile("^v")
TRAILING_RC_PATTERN = re.compile("-rc[0-9]*$")
LEGACY_BASE_VERSION_SUFFIX_PATTERN = re.compile("a0$")
class NoGitTagException(Exception):
pass
def get_pytorch_root():
return Path(subprocess.check_output(
['git', 'rev-parse', '--show-toplevel']
).decode('ascii').strip())
def get_tag():
root = get_pytorch_root()
# We're on a tag
am_on_tag = (
subprocess.run(
['git', 'describe', '--tags', '--exact'],
cwd=root,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL
).returncode == 0
)
tag = ""
if am_on_tag:
dirty_tag = subprocess.check_output(
['git', 'describe'],
cwd=root
).decode('ascii').strip()
# Strip leading v that we typically do when we tag branches
# ie: v1.7.1 -> 1.7.1
tag = re.sub(LEADING_V_PATTERN, "", dirty_tag)
# Strip trailing rc pattern
# ie: 1.7.1-rc1 -> 1.7.1
tag = re.sub(TRAILING_RC_PATTERN, "", tag)
return tag
def get_base_version():
root = get_pytorch_root()
dirty_version = open(root / 'version.txt', 'r').read().strip()
# Strips trailing a0 from version.txt, not too sure why it's there in the
# first place
return re.sub(LEGACY_BASE_VERSION_SUFFIX_PATTERN, "", dirty_version)
class PytorchVersion:
def __init__(self, gpu_arch_type, gpu_arch_version, no_build_suffix):
self.gpu_arch_type = gpu_arch_type
self.gpu_arch_version = gpu_arch_version
self.no_build_suffix = no_build_suffix
def get_post_build_suffix(self):
# CUDA 10.2 is the version to be uploaded to PyPI, so it doesn't have a
# version suffix
if ((self.gpu_arch_type == "cuda" and self.gpu_arch_version == "10.2")
or self.no_build_suffix):
return ""
if self.gpu_arch_type == "cuda":
return f"+cu{self.gpu_arch_version.replace('.', '')}"
return f"+{self.gpu_arch_type}{self.gpu_arch_version}"
def get_release_version(self):
if not get_tag():
raise NoGitTagException(
"Not on a git tag, are you sure you want a release version?"
)
return f"{get_tag()}{self.get_post_build_suffix()}"
def get_nightly_version(self):
date_str = datetime.today().strftime('%Y%m%d')
build_suffix = self.get_post_build_suffix()
return f"{get_base_version()}.dev{date_str}{build_suffix}"
def main():
parser = argparse.ArgumentParser(
description="Generate pytorch version for binary builds"
)
parser.add_argument(
"--no-build-suffix",
type=strtobool,
help="Whether or not to add a build suffix typically (+cpu)",
default=os.environ.get("NO_BUILD_SUFFIX", False)
)
parser.add_argument(
"--gpu-arch-type",
type=str,
help="GPU arch you are building for, typically (cpu, cuda, rocm)",
default=os.environ.get("GPU_ARCH_TYPE", "cpu")
)
parser.add_argument(
"--gpu-arch-version",
type=str,
help="GPU arch version, typically (10.2, 4.0), leave blank for CPU",
default=os.environ.get("GPU_ARCH_VERSION", "")
)
args = parser.parse_args()
version_obj = PytorchVersion(
args.gpu_arch_type,
args.gpu_arch_version,
args.no_build_suffix
)
try:
print(version_obj.get_release_version())
except NoGitTagException:
print(version_obj.get_nightly_version())
if __name__ == "__main__":
main()
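
To make the deleted script's version handling concrete: release tags are normalized by stripping the leading 'v' and any trailing '-rcN', and non-PyPI builds get a per-arch suffix appended. A condensed, runnable sketch using the same patterns:

    import re

    LEADING_V = re.compile(r"^v")
    TRAILING_RC = re.compile(r"-rc[0-9]*$")

    def normalize_tag(dirty_tag: str) -> str:
        # v1.7.1-rc1 -> 1.7.1
        return TRAILING_RC.sub("", LEADING_V.sub("", dirty_tag))

    def post_build_suffix(gpu_arch_type: str, gpu_arch_version: str) -> str:
        # CUDA 10.2 is the PyPI upload, so it carries no suffix
        if gpu_arch_type == "cuda" and gpu_arch_version == "10.2":
            return ""
        if gpu_arch_type == "cuda":
            return "+cu" + gpu_arch_version.replace(".", "")
        return f"+{gpu_arch_type}{gpu_arch_version}"

    assert normalize_tag("v1.7.1-rc1") == "1.7.1"
    assert post_build_suffix("cuda", "11.0") == "+cu110"
    assert post_build_suffix("rocm", "4.0") == "+rocm4.0"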

View File

@ -1,86 +0,0 @@
name: Build Linux Wheels
on:
# TODO: These are only runnable from workflow_dispatch, we need to eventually add
# a cron
# TODO: Add an on_release trigger to build on tags
workflow_dispatch:
jobs:
generate-build-matrix:
if: ${{ github.repository_owner == 'pytorch' }}
runs-on: ubuntu-18.04
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
container:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python .github/scripts/generate_binary_build_matrix.py
MATRIX=$(python .github/scripts/generate_binary_build_matrix.py)
echo "::set-output name=matrix::${MATRIX}"
build-wheel:
if: ${{ github.repository_owner == 'pytorch' }}
needs: generate-build-matrix
runs-on: linux.2xlarge
strategy:
matrix:
${{ fromJson(needs.generate-build-matrix.outputs.matrix) }}
container:
image: ${{ matrix.container_image }}
env:
DESIRED_PYTHON: ${{ matrix.python_version }}
# TODO: This is a legacy variable that we eventually want to get rid of in
# favor of GPU_ARCH_VERSION
DESIRED_CUDA: ${{ matrix.gpu_arch_version }}
GPU_ARCH_VERSION: ${{ matrix.GPU_ARCH_VERSION }}
GPU_ARCH_TYPE: ${{ matrix.gpu_arch_type }}
PYTORCH_BUILD_NUMBER: 1
SKIP_ALL_TESTS: 1
steps:
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: actions/checkout@v2
with:
repository: pytorch/builder
path: builder
- name: Generate version string
working-directory: pytorch/
run: |
version=$(.github/scripts/generate_pytorch_version.py)
echo "Generated version: ${version}"
echo "PYTORCH_BUILD_VERSION=${version}" >> $GITHUB_ENV
# TODO: Remove this once we remove the need for the directories to be
# in specific locations
- name: Symlink repositories to root directory (for legacy scripts purposes)
run: |
ln -s $(pwd)/pytorch /pytorch
ln -s $(pwd)/builder /builder
# TODO: Bundle the correct build script in the base container image so
# that we don't have to do this type of specification
- name: Build PyTorch binary (CUDA specific)
if: ${{ matrix.gpu_arch_type == 'cuda' }}
run: |
/builder/manywheel/build.sh
- name: Build PyTorch binary (ROCM specific)
if: ${{ matrix.gpu_arch_type == 'rocm' }}
run: |
/builder/manywheel/build_rocm.sh
- name: Build PyTorch binary (CPU specific)
if: ${{ matrix.gpu_arch_type == 'cpu' }}
run: |
/builder/manywheel/build_cpu.sh
- uses: actions/upload-artifact@v2
with:
name: pytorch-wheel-py${{ matrix.python_version }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.whl
# TODO: Add a step here for uploading binaries

View File

@ -5,7 +5,7 @@ on:
jobs:
clang-format:
runs-on: ubuntu-18.04
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1

View File

@ -6,7 +6,7 @@ on:
jobs:
welcome:
runs-on: ubuntu-18.04
runs-on: ubuntu-latest
steps:
- uses: actions/github-script@v2
with:
@ -19,7 +19,7 @@ jobs:
// - io: A reference to the @actions/io package
// Check if issue has a JIT label.
const kJitLabel = "oncall: jit";
const kJitLabel = "jit";
issue = await github.issues.get({
owner: context.issue.owner,

View File

@ -8,7 +8,7 @@ on:
jobs:
quick-checks:
runs-on: ubuntu-18.04
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
@ -17,28 +17,13 @@ jobs:
architecture: x64
- name: Checkout PyTorch
uses: actions/checkout@v1
- name: Checkout PR tip
run: |
set -eux
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# We are on a PR, so actions/checkout leaves us on a merge commit.
# Check out the actual tip of the branch.
git checkout ${{ github.event.pull_request.head.sha }}
fi
echo ::set-output name=commit_sha::$(git rev-parse HEAD)
id: get_pr_tip
- name: Ensure consistent CircleCI YAML config
run: |
pip install -r requirements.txt
cd .circleci && ./ensure-consistency.py
- name: Shellcheck Jenkins scripts
# https://github.com/koalaman/shellcheck#installing-a-pre-compiled-binary
run: |
scversion="stable"
wget -qO- "https://github.com/koalaman/shellcheck/releases/download/${scversion?}/shellcheck-${scversion?}.linux.x86_64.tar.xz" | tar -xJv
sudo cp "shellcheck-${scversion}/shellcheck" /usr/bin/
rm -r "shellcheck-${scversion}"
shellcheck --version
sudo apt-get install -y shellcheck
.jenkins/run-shellcheck.sh
- name: Ensure no tabs
run: |
@ -46,23 +31,16 @@ jobs:
- name: Ensure canonical include
run: |
(! git grep -I -l $'#include "' -- ./c10 ./aten ./torch/csrc ':(exclude)aten/src/ATen/native/quantized/cpu/qnnpack/**' || (echo "The above files have include with quotes; please convert them to #include <xxxx>"; false))
# note that this next step depends on a clean checkout;
# if you run it locally then it will likely complain
# about all the generated files in torch/test
- name: Ensure C++ source files are not executable
run: |
(! find . \( -path ./third_party -o -path ./.git -o -path ./torch/bin -o -path ./build \) -prune -o -type f -executable -regextype posix-egrep -not -regex '.+(\.(bash|sh|py|so)|git-pre-commit|git-clang-format|gradlew)$' -print | grep . || (echo 'The above files have executable permission; please remove their executable permission by using `chmod -x`'; false))
(! find . \( -path ./third_party -o -path ./.git -o -path ./torch/bin -o -path ./build \) -prune -o -type f -executable -regextype posix-egrep -not -regex '.+(\.(bash|sh|py|so)|git-pre-commit|git-clang-format)$' -print | grep . || (echo 'The above files have executable permission; please remove their executable permission by using `chmod -x`'; false))
- name: C++ docs check
run: |
sudo apt-get install -y doxygen && pip install -r requirements.txt
cd docs/cpp/source && ./check-doxygen.sh
- name: CUDA kernel launch check
run: |
set -eux
python torch/testing/check_kernel_launches.py |& tee ${GITHUB_WORKSPACE}/cuda_kernel_launch_checks.txt
flake8-py3:
runs-on: ubuntu-18.04
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
@ -84,25 +62,23 @@ jobs:
- name: Run flake8
run: |
set -eux
pip install -r requirements-flake8.txt
pip install flake8==3.8.2 flake8-mypy flake8-bugbear flake8-comprehensions flake8-executable flake8-pyi==20.5.0 mccabe pycodestyle==2.6.0 pyflakes==2.2.0
flake8 --version
flake8 | tee ${GITHUB_WORKSPACE}/flake8-output.txt
flake8 --exit-zero > ${GITHUB_WORKSPACE}/flake8-output.txt
cat ${GITHUB_WORKSPACE}/flake8-output.txt
- name: Add annotations
uses: pytorch/add-annotations-github-action@master
with:
check_name: 'flake8-py3'
linter_output_path: 'flake8-output.txt'
commit_sha: ${{ steps.get_pr_tip.outputs.commit_sha }}
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w+\d+) (?<errorDesc>.*)'
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w\d+) (?<errorDesc>.*)'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Catch any other warnings
run: |
[ ! -s flake8-output.txt ]
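A note on the lint pattern visible in the flake8 job above: the variant using `--exit-zero` lets the job survive long enough for the annotation step to consume the output file, and the final `[ ! -s flake8-output.txt ]` step only fails the build afterwards, if any findings were recorded. A standalone sketch of the same idea (file name illustrative):

    set -eux
    flake8 --exit-zero > flake8-output.txt   # record findings without failing yet
    cat flake8-output.txt
    # ...annotation step reads flake8-output.txt here...
    [ ! -s flake8-output.txt ]   # -s is true for a non-empty file, so this fails iff flake8 found anything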
clang-tidy:
if: github.event_name == 'pull_request'
runs-on: ubuntu-18.04
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
@ -132,12 +108,12 @@ jobs:
sudo apt-get update
sudo apt-get --no-install-recommends -y install cuda-toolkit-10-2
# Install dependencies
pip install pyyaml typing_extensions
pip install pyyaml
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-11 main"
sudo apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-8 main"
sudo apt-get update
sudo apt-get install -y clang-tidy-11
sudo update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-11 1000
sudo apt-get install -y clang-tidy-8
sudo update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-8 1000
- name: Run clang-tidy
run: |
set -eux
@ -162,7 +138,6 @@ jobs:
# Generate PyTorch files.
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--native-functions-path aten/src/ATen/native/native_functions.yaml \
--nn-path aten/src
fi
@ -172,8 +147,6 @@ jobs:
# FunctionsManual.cpp is excluded to keep this diff clean. It will be fixed
# in a follow up PR.
# /torch/csrc/generic/*.cpp is excluded because those files aren't actually built.
# deploy/interpreter files are excluded due to using macros and other techniques
# that are not easily converted to accepted c++
python tools/clang_tidy.py \
--verbose \
--paths torch/csrc/ \
@ -189,11 +162,6 @@ jobs:
-g"-torch/csrc/cuda/python_nccl.cpp" \
-g"-torch/csrc/autograd/FunctionsManual.cpp" \
-g"-torch/csrc/generic/*.cpp" \
-g"-torch/csrc/jit/codegen/cuda/runtime/*" \
-g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
-g"-torch/csrc/deploy/interpreter/interpreter.h" \
-g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
-g"-torch/csrc/deploy/interpreter/test_main.cpp" \
"$@" > ${GITHUB_WORKSPACE}/clang-tidy-output.txt
cat ${GITHUB_WORKSPACE}/clang-tidy-output.txt
@ -208,7 +176,7 @@ jobs:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
cmakelint:
runs-on: ubuntu-18.04
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1


@ -1,78 +0,0 @@
name: quantization-triage
on:
issues:
types: [labeled]
jobs:
welcome:
runs-on: ubuntu-18.04
steps:
- uses: actions/github-script@v2
with:
github-token: ${{secrets.GITHUB_TOKEN}}
script: |
// Arguments available:
// - github: A pre-authenticated octokit/rest.js client
// - context: An object containing the context of the workflow run
// - core: A reference to the @actions/core package
// - io: A reference to the @actions/io package
// Check if issue has a Quantization label.
const kQuantizationLabel = "oncall: quantization";
issue = await github.issues.get({
owner: context.issue.owner,
repo: context.issue.repo,
issue_number: context.issue.number,
})
const hasQuantizationLabel = issue.data.labels.filter(label => label.name == kQuantizationLabel).length > 0;
if (!hasQuantizationLabel) {
core.debug("Issue " + issue.data.title + " does not have Quantization label");
return;
}
// Get project column ID.
const kProjectName = "Quantization Triage";
const kColumnName = "Need Triage";
// Query all projects in the repository.
// TODO: Support pagination once there are > 30 projects.
const projects = await github.projects.listForRepo({
owner: context.issue.owner,
repo: context.issue.repo,
});
// Filter out unwanted projects and get the ID for the Quantization Triage project.
const filteredProjects = projects.data.filter(project => project.name == kProjectName);
if (filteredProjects.length != 1) {
core.setFailed("Unable to find a project named " + kProjectName);
return;
}
const projectId = filteredProjects[0].id;
// First, query all columns in the project.
// TODO: Support pagination once there are > 30 columns.
const columns = await github.projects.listColumns({
project_id: projectId,
});
// Filter out unwanted columns and get the ID for the Need Triage column.
const filteredColumns = columns.data.filter(column => column.name == kColumnName);
if (filteredColumns.length != 1) {
core.setFailed("Unable to find a column named " + kColumnName);
return;
}
const columnId = filteredColumns[0].id;
// Create a project card for this new issue.
await github.projects.createCard({
column_id: columnId,
content_id: issue.data.id,
content_type: "Issue",
})


@ -1,36 +0,0 @@
name: 'Close stale pull requests'
on:
schedule:
# TODO: Reduce frequency once we work through the backlog of pull requests
- cron: '0 * * * *'
workflow_dispatch:
jobs:
stale:
if: ${{ github.repository_owner == 'pytorch' }}
runs-on: ubuntu-18.04
steps:
- uses: actions/stale@v3
with:
stale-pr-message: >
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `Stale`. <br>
Feel free to remove the `Stale` label if you feel this was a mistake. <br>
`Stale` pull requests will automatically be closed 30 days after being marked `Stale` <br>
exempt-pr-labels: "no-stale,open source,high priority"
days-before-stale: 60
days-before-close: 90
stale-open-source:
if: ${{ github.repository_owner == 'pytorch' }}
runs-on: ubuntu-18.04
steps:
- uses: actions/stale@v3
with:
stale-pr-message: >
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `Stale`. <br>
Feel free to remove the `Stale` label if you feel this was a mistake. <br>
If you are unable to remove the `Stale` label please contact a maintainer in order to do so. <br>
`Stale` pull requests will automatically be closed 30 days after being marked `Stale` <br>
exempt-pr-labels: "no-stale,high priority"
only-labels: "open source"
days-before-stale: 150
days-before-close: 180


@ -1,23 +0,0 @@
name: Update S3 HTML indices for download.pytorch.org
on:
schedule:
# Update the indices every 30 minutes
- cron: "*/30 * * * *"
# Have the ability to trigger this job manually using the API as well
workflow_dispatch:
jobs:
update-html:
runs-on: ubuntu-18.04
if: ${{ github.repository_owner == 'pytorch' }}
strategy:
matrix:
prefix: ["whl", "whl/test", "whl/nightly"]
steps:
- name: Run updater image
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_S3_UPDATE_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_S3_UPDATE_SECRET_ACCESS_KEY }}
uses: docker://pytorch/manage_s3_html
with:
args: ${{ matrix.prefix }}

.gitignore

@ -10,7 +10,6 @@
.coverage
coverage.xml
.dmypy.json
.gradle
.hypothesis
.mypy_cache
@ -34,16 +33,12 @@ docs/cpp/src
docs/src/**/*
docs/cpp/build
docs/cpp/source/api
docs/cpp/source/html/
docs/cpp/source/latex/
docs/source/generated/
log
test-reports/
test/.coverage
test/.hypothesis/
test/cpp/api/mnist
test/custom_operator/model.pt
test/jit_hooks/*.pt
test/data/legacy_modules.t7
test/data/*.pt
test/backward_compatibility/nightly_schemas.txt
@ -51,10 +46,9 @@ dropout_model.pt
test/generated_type_hints_smoketest.py
test/htmlcov
test/cpp_extensions/install/
test/test-reports/
third_party/build/
tools/shared/_utils_internal.py
tools/fast_nvcc/wrap_nvcc.sh
tools/fast_nvcc/tmp/
torch.egg-info/
torch/_C/__init__.pyi
torch/_C/_nn.pyi
@ -65,11 +59,7 @@ torch/csrc/autograd/generated/*
# Listed manually because some files in this directory are not generated
torch/testing/_internal/generated/annotated_fn_args.py
torch/testing/_internal/data/*.pt
torch/csrc/api/include/torch/version.h
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/deploy/interpreter/cpython
torch/csrc/deploy/interpreter/frozen
torch/csrc/deploy/interpreter/third_party/typing_extensions.py
torch/csrc/generated
torch/csrc/generic/TensorMethods.cpp
torch/csrc/jit/generated/*
@ -84,7 +74,6 @@ torch/lib/*.exe*
torch/lib/*.dylib*
torch/lib/*.h
torch/lib/*.lib
torch/lib/*.pdb
torch/lib/*.so*
torch/lib/protobuf*.pc
torch/lib/build
@ -102,14 +91,11 @@ torch/lib64
torch/include/
torch/share/
torch/test/
torch/utils/benchmark/utils/valgrind_wrapper/callgrind.h
torch/utils/benchmark/utils/valgrind_wrapper/valgrind.h
torch/version.py
# Root level file used in CI to specify certain env configs.
# E.g., see .circleci/config.yaml
env
.circleci/scripts/COMMIT_MSG
scripts/release_notes/*.json
# IPython notebook checkpoints
.ipynb_checkpoints
@ -202,7 +188,6 @@ build_ios
/build_*
.build_debug/*
.build_release/*
.build_profile/*
distribute/*
*.testbin
*.bin

.gitmodules

@ -121,7 +121,7 @@
[submodule "third_party/XNNPACK"]
ignore = dirty
path = third_party/XNNPACK
url = https://github.com/malfet/XNNPACK.git
url = https://github.com/google/XNNPACK.git
[submodule "third_party/fmt"]
ignore = dirty
path = third_party/fmt
@ -130,6 +130,3 @@
ignore = dirty
path = third_party/tensorpipe
url = https://github.com/pytorch/tensorpipe.git
[submodule "third_party/kineto"]
path = third_party/kineto
url = https://github.com/pytorch/kineto


@ -18,6 +18,49 @@ build_to_cmake () {
SCCACHE="$(which sccache)"
if [ "$(which gcc)" != "/root/sccache/gcc" ]; then
# Setup SCCACHE
###############################################################################
# Setup sccache if SCCACHE_BUCKET is set
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
# Setup wrapper scripts
wrapped="cc c++ gcc g++ x86_64-linux-gnu-gcc"
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
wrapped="$wrapped nvcc"
fi
for compiler in $wrapped; do
(
echo "#!/bin/sh"
# TODO: if/when sccache gains native support for an
# SCCACHE_DISABLE flag analogous to ccache's CCACHE_DISABLE,
# this can be removed. Alternatively, this can be removed when
# https://github.com/pytorch/pytorch/issues/13362 is fixed.
#
# NOTE: carefully quoted - we want `which $compiler` to be
# resolved now, when this wrapper is generated, but SCCACHE_DISABLE
# and $@ to be evaluated later, when the wrapper itself runs
echo 'test $SCCACHE_DISABLE && exec '"$(which $compiler)"' "$@"'
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
fi
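For reference, the wrapper the loop above writes for one compiler looks roughly like this (assuming `gcc` resolves to /usr/bin/gcc and `sccache` to /usr/bin/sccache); `$(which $compiler)` was expanded at generation time, while `$SCCACHE_DISABLE` and `"$@"` are left for invocation time:

    #!/bin/sh
    test $SCCACHE_DISABLE && exec /usr/bin/gcc "$@"   # bypass the cache when SCCACHE_DISABLE is set
    exec /usr/bin/sccache /usr/bin/gcc "$@"           # otherwise compile through sccache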
# Setup ccache if configured to use it (and not sccache)
if [ -z "${SCCACHE}" ] && which ccache > /dev/null; then
@ -118,11 +161,6 @@ if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
export PATH="/usr/local/cuda/bin:$PATH"
fi
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
if [[ -n "$IN_CI" ]]; then
# Set ROCM_ARCH to gfx900 and gfx906 for CI builds
echo "Limiting PYTORCH_ROCM_ARCH to gfx90[06] for CI builds"
export PYTORCH_ROCM_ARCH="gfx900;gfx906"
fi
# This is needed to enable ImageInput operator in resnet50_trainer
build_args+=("USE_OPENCV=ON")
# This is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
@ -210,7 +248,7 @@ else
export MAX_JOBS=`expr $(nproc) - 1`
fi
pip install --user dataclasses typing_extensions
pip install --user dataclasses
$PYTHON setup.py install --user
@ -222,21 +260,6 @@ fi
###############################################################################
# Install ONNX into a local directory
pip install --user "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
pip install --user -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# remove sccache wrappers post-build; runtime compilation of MIOpen kernels does not yet fully support them
sudo rm -f /opt/cache/bin/cc
sudo rm -f /opt/cache/bin/c++
sudo rm -f /opt/cache/bin/gcc
sudo rm -f /opt/cache/bin/g++
pushd /opt/rocm/llvm/bin
if [[ -d original ]]; then
sudo mv original/clang .
sudo mv original/clang++ .
fi
sudo rm -rf original
popd
fi


@ -2,9 +2,9 @@ set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR="$ROOT_DIR/test"
gtest_reports_dir="${TEST_DIR}/test-reports/cpp"
pytest_reports_dir="${TEST_DIR}/test-reports/python"
TEST_DIR="$ROOT_DIR/caffe2_tests"
gtest_reports_dir="${TEST_DIR}/cpp"
pytest_reports_dir="${TEST_DIR}/python"
# Figure out which Python to use
PYTHON="$(which python)"
@ -13,8 +13,6 @@ if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
fi
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
# HIP_PLATFORM is auto-detected by hipcc; unset to avoid build errors
unset HIP_PLATFORM
if which sccache > /dev/null; then
# Save sccache logs to file
sccache --stop-server || true


@ -84,21 +84,23 @@ fi
# CircleCI docker images could install conda as jenkins user, or use the OS's python package.
PIP=$(which pip)
PIP_USER=$(stat --format '%U' $PIP)
CURRENT_USER=$(id -u -n)
if [[ "$PIP_USER" = root && "$CURRENT_USER" != root ]]; then
if [[ "$PIP_USER" = root ]]; then
MAYBE_SUDO=sudo
fi
# Uninstall pre-installed hypothesis and coverage to use older versions, as newer
# versions remove the timeout parameter from settings, which ideep/conv_transpose_test.py uses
$MAYBE_SUDO pip -q uninstall -y hypothesis
$MAYBE_SUDO pip -q uninstall -y coverage
# "pip install hypothesis==3.44.6" from official server is unreliable on
# CircleCI, so we host a copy on S3 instead
$MAYBE_SUDO pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
$MAYBE_SUDO pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
$MAYBE_SUDO pip -q install hypothesis==3.44.6 -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
# if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
# Hotfix, use hypothesis 3.44.6 on Ubuntu 14.04
# See comments on
# https://github.com/HypothesisWorks/hypothesis-python/commit/eadd62e467d6cee6216e71b391951ec25b4f5830
$MAYBE_SUDO pip -q uninstall -y hypothesis
# "pip install hypothesis==3.44.6" from official server is unreliable on
# CircleCI, so we host a copy on S3 instead
$MAYBE_SUDO pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
$MAYBE_SUDO pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
$MAYBE_SUDO pip -q install hypothesis==3.44.6 -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
# else
# pip install --user --no-cache-dir hypothesis==3.59.0
# fi
# Collect additional tests to run (outside caffe2/python)
EXTRA_TESTS=()
@ -161,12 +163,15 @@ pip install --user pytest-sugar
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# Check out torch/vision at Jun 11 2020 commit
# This hash must match one in .jenkins/pytorch/test.sh
pip install -q --user git+https://github.com/pytorch/vision.git@ae0d80b3c52dc98b3a9763bdb974c3ef7b6eb83d
pip install -q --user git+https://github.com/pytorch/vision.git@c2e8a00885e68ae1200eb6440f540e181d9125de
pip install -q --user ninja
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then
pip install -q --user onnxruntime==1.6.0
# The default pip version is too old (9.0.2) to support the tag `manylinux2010`.
# This fixes the pip error: Couldn't find a version that satisfies the requirement
pip install --upgrade pip
pip install -q --user ort-nightly==1.5.0.dev202009182
fi
"$ROOT_DIR/scripts/onnx/test.sh"
fi


@ -1,6 +0,0 @@
disable=SC2086
disable=SC1091
disable=SC2155
disable=SC1090
disable=SC2164
disable=SC1003
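The `.shellcheckrc` above (present on only one side of this compare) suppressed these checks repo-wide; elsewhere in this diff the same exceptions appear inline as `# shellcheck disable=...` annotations. As a reminder, SC2086, the most common of these, flags unquoted expansions. A sketch:

    files="report one.txt report two.txt"   # hypothetical value containing spaces
    rm $files     # SC2086: word-splits into four arguments and also glob-expands
    rm "$files"   # quoted: passed as a single argument, no splitting or globbing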


@ -10,9 +10,9 @@ it is very easy to run these tests yourself:
``registry.pytorch.org/pytorch/pytorch-$BUILD_ENVIRONMENT:$DOCKER_VERSION``,
where ``$BUILD_ENVIRONMENT`` is one of the build environments
enumerated in
[pytorch-dockerfiles](https://github.com/pytorch/pytorch/blob/master/.circleci/docker/build.sh). The Dockerfile used by Jenkins can be found under the `.circleci` [directory](https://github.com/pytorch/pytorch/blob/master/.circleci/docker)
[pytorch-dockerfiles](https://github.com/pietern/pytorch-dockerfiles/blob/master/build.sh)
2. Run ``docker run -it -u jenkins $DOCKER_IMAGE``, clone PyTorch and
2. Run ``docker -it -u jenkins $DOCKER_IMAGE``, clone PyTorch and
run one of the scripts in this directory.
The Docker images are designed so that any "reasonable" build commands
@ -38,5 +38,5 @@ mechanisms we use:
build scripts.
- We reroute well known paths like `/usr/bin/gcc` to alternate
implementations with `update-alternatives`, instead of setting
implementations with `update-alternatives, instead of setting
`CC` and `CXX` in our implementations.
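For readers unfamiliar with the mechanism described above: `update-alternatives` maintains a managed symlink at a well-known path that can be repointed without exporting `CC` and `CXX`. A minimal illustration (compiler and priority are arbitrary):

    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 70
    sudo update-alternatives --set gcc /usr/bin/gcc-7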


@ -5,7 +5,6 @@ set -eu -o pipefail
# This script builds and runs code analyzer tool to generate aten op dependency
# graph for custom mobile build.
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"


@ -6,7 +6,6 @@ set -eu -o pipefail
# build & test mobile libtorch without having to setup Android/iOS
# toolchain/simulator.
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"


@ -1,7 +1,5 @@
#!/bin/bash
set -ex
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
@ -9,8 +7,37 @@ set -ex
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
# Temp: use new sccache
if [[ -n "$IN_CIRCLECI" && "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# Download customized sccache
sudo curl --retry 3 http://repo.radeon.com/misc/.sccache_amd/sccache -o /opt/cache/bin/sccache
sudo chmod 755 /opt/cache/bin/sccache
fi
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# For distributed, four environment configs:
# (1) build with only NCCL
# (2) build with NCCL and MPI
# (3) build with only MPI
# (4) build with neither
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc5* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
if [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
sudo apt-get -qq install openmpi-bin libopenmpi-dev
else
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
fi
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-asan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@"
fi
@ -23,17 +50,6 @@ if [[ "$BUILD_ENVIRONMENT" == *-mobile-code-analysis* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile-code-analysis.sh" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7* ]]; then
# Enabling DEPLOY build (embedded torch python interpreter, experimental)
# only on one config for now, can expand later
export USE_DEPLOY=ON
# Deploy feature builds cpython. It requires these packages.
# TODO move this to dockerfile?
sudo apt-get -qq update
sudo apt-get -qq install libffi-dev libbz2-dev libreadline-dev libncurses5-dev libncursesw5-dev libgdbm-dev libsqlite3-dev uuid-dev tk-dev
fi
echo "Python version:"
python --version
@ -48,16 +64,6 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
nvcc --version
fi
if [[ "$BUILD_ENVIRONMENT" == *coverage* ]]; then
# enable build option in CMake
export USE_CPP_CODE_COVERAGE=ON
fi
if [[ "$BUILD_ENVIRONMENT" == *cuda11* ]]; then
# enable split torch_cuda build option in CMake
export BUILD_SPLIT_CUDA=ON
fi
# TODO: Don't run this...
pip_install -r requirements.txt || true
@ -83,14 +89,8 @@ if [[ "$BUILD_ENVIRONMENT" == *libtorch* ]]; then
POSSIBLE_JAVA_HOMES+=(/usr/local)
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
# Add the Windows-specific JNI
POSSIBLE_JAVA_HOMES+=("$PWD/.circleci/windows-jni/")
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
# Skip if we're not on Windows but haven't found a JAVA_HOME
if [[ "$JH" == "$PWD/.circleci/windows-jni/" && "$OSTYPE" != "msys" ]] ; then
break
fi
echo "Found jni.h under $JH"
export JAVA_HOME="$JH"
export BUILD_JNI=ON
@ -135,27 +135,40 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export MAX_JOBS=$(($(nproc) - 1))
fi
if [[ -n "$IN_CI" ]]; then
# Set ROCM_ARCH to gfx900 and gfx906 for CI builds
echo "Limiting PYTORCH_ROCM_ARCH to gfx90[06] for CI builds"
# ROCm CI uses Caffe2 docker images, which need these wrapper
# scripts to use sccache correctly.
if [[ -n "${SCCACHE_BUCKET}" && -z "$IN_CIRCLECI" ]]; then
mkdir -p ./sccache
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ clang clang++; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
if [[ -n "$IN_CIRCLECI" ]]; then
# Set ROCM_ARCH to gfx900 and gfx906 in CircleCI
echo "Limiting PYTORCH_ROCM_ARCH to gfx90[06] for CircleCI builds"
export PYTORCH_ROCM_ARCH="gfx900;gfx906"
fi
python tools/amd_build/build_amd.py
python setup.py install
# remove sccache wrappers post-build; runtime compilation of MIOpen kernels does not yet fully support them
sudo rm -f /opt/cache/bin/cc
sudo rm -f /opt/cache/bin/c++
sudo rm -f /opt/cache/bin/gcc
sudo rm -f /opt/cache/bin/g++
pushd /opt/rocm/llvm/bin
if [[ -d original ]]; then
sudo mv original/clang .
sudo mv original/clang++ .
fi
sudo rm -rf original
popd
python setup.py install --user
exit 0
fi
@ -163,7 +176,7 @@ fi
# sccache will fail for CUDA builds if all cores are used for compiling
# gcc 7 with sccache seems to have intermittent OOM issue if all cores are used
if [ -z "$MAX_JOBS" ]; then
if { [[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]; } && which sccache > /dev/null; then
if ([[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]) && which sccache > /dev/null; then
export MAX_JOBS=$(($(nproc) - 1))
fi
fi
@ -182,7 +195,7 @@ fi
# Patch required to build xla
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
git clone --recursive -b r1.8 https://github.com/pytorch/xla.git
git clone --recursive -b r1.7 https://github.com/pytorch/xla.git
./xla/scripts/apply_patches.sh
fi
@ -239,18 +252,6 @@ else
popd
assert_git_not_dirty
# Build jit hook tests
JIT_HOOK_BUILD="$PWD/../jit-hook-build"
JIT_HOOK_TEST="$PWD/test/jit_hooks"
python --version
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir "$JIT_HOOK_BUILD"
pushd "$JIT_HOOK_BUILD"
cmake "$JIT_HOOK_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" -DPYTHON_EXECUTABLE="$(which python)"
make VERBOSE=1
popd
assert_git_not_dirty
# Build custom backend tests.
CUSTOM_BACKEND_BUILD="$PWD/../custom-backend-build"
CUSTOM_BACKEND_TEST="$PWD/test/custom_backend"
@ -289,7 +290,6 @@ if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# TODO: Move this to Dockerfile.
pip_install lark-parser
pip_install cloud-tpu-client
sudo apt-get -qq update
sudo apt-get -qq install npm nodejs


@ -1,56 +0,0 @@
#!/usr/bin/env bash
# This script can also be used to test whether your diff changes any codegen output.
#
# Run it before and after your change:
# .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
# .jenkins/pytorch/codegen-test.sh <test_output_dir>
#
# Then run diff to compare the generated files:
# diff -Naur <baseline_output_dir> <test_output_dir>
set -eu -o pipefail
if [ "$#" -eq 0 ]; then
# shellcheck disable=SC2034
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
OUT="$(dirname "${BASH_SOURCE[0]}")/../../codegen_result"
else
OUT=$1
fi
set -x
rm -rf "$OUT"
# aten codegen
python -m tools.codegen.gen \
-d "$OUT"/torch/share/ATen
# torch codegen
python -m tools.setup_helpers.generate_code \
--declarations-path "$OUT"/torch/share/ATen/Declarations.yaml \
--install_dir "$OUT"
# pyi codegen
mkdir -p "$OUT"/pyi/torch/_C
mkdir -p "$OUT"/pyi/torch/nn
python -m tools.pyi.gen_pyi \
--native-functions-path aten/src/ATen/native/native_functions.yaml \
--deprecated-functions-path tools/autograd/deprecated.yaml \
--out "$OUT"/pyi
# autograd codegen (called by torch codegen but can run independently)
python -m tools.autograd.gen_autograd \
"$OUT"/torch/share/ATen/Declarations.yaml \
aten/src/ATen/native/native_functions.yaml \
"$OUT"/autograd \
tools/autograd
# annotated_fn_args codegen (called by torch codegen but can run independently)
mkdir -p "$OUT"/annotated_fn_args
python -m tools.autograd.gen_annotated_fn_args \
aten/src/ATen/native/native_functions.yaml \
"$OUT"/annotated_fn_args \
tools/autograd


@ -12,13 +12,11 @@ SCRIPT_DIR="$( cd "$(dirname "${BASH_SOURCE[0]}")" ; pwd -P )"
# Figure out which Python to use for ROCm
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]] && [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
# HIP_PLATFORM is auto-detected by hipcc; unset to avoid build errors
unset HIP_PLATFORM
PYTHON=$(which "python${BASH_REMATCH[1]}")
# non-interactive bash shells do not expand aliases by default
shopt -s expand_aliases
export PYTORCH_TEST_WITH_ROCM=1
alias python='$PYTHON'
alias python="$PYTHON"
# temporary, to help locate some kernel issues on the CI nodes
export HSAKMT_DEBUG_LEVEL=4
fi
@ -45,7 +43,7 @@ fatal() { error "$@"; exit 1; }
# - remaining args: names of traps to modify
#
trap_add() {
trap_add_cmd=$1; shift || fatal "${FUNCNAME[0]} usage error"
trap_add_cmd=$1; shift || fatal "${FUNCNAME} usage error"
for trap_add_name in "$@"; do
trap -- "$(
# helper fn to get existing trap command from output
@ -116,7 +114,6 @@ if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda10.1-cudnn7-py3* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch_macos* ]]; then
BUILD_TEST_LIBTORCH=1
else
# shellcheck disable=SC2034
BUILD_TEST_LIBTORCH=0
fi
@ -129,7 +126,6 @@ fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-xla-linux-bionic* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda9-cudnn7-py2* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda10.1-cudnn7-py3* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-*centos* ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-bionic* ]]; then
if ! which conda; then
echo "Expected ${BUILD_ENVIRONMENT} to use conda, but 'which conda' returns empty"
@ -137,12 +133,9 @@ if [[ "$BUILD_ENVIRONMENT" == *pytorch-xla-linux-bionic* ]] || \
else
conda install -q -y cmake
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-*centos* ]]; then
# cmake3 package will conflict with conda cmake
sudo yum -y remove cmake3 || true
fi
fi
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
$* || (sleep 1 && $*) || (sleep 2 && $*)
}
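The two versions of `retry` above differ only in quoting, but the difference matters: `"$@"` re-expands each original argument as a single word, while unquoted `$*` flattens the arguments into one string and re-splits it on whitespace. A quick sketch:

    retry () {
        "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
    }
    # With "$@", the filename below survives each retry attempt as one argument;
    # with unquoted $*, it would re-split into two words, "my" and "file.txt".
    retry cp "my file.txt" /tmp/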


@ -18,7 +18,7 @@ function cleanup {
function assert_git_not_dirty() {
# TODO: we should add an option to `build_amd.py` that reverts the repo to
# an unmodified state.
if [[ "$BUILD_ENVIRONMENT" != *rocm* ]] && [[ "$BUILD_ENVIRONMENT" != *xla* ]] ; then
if ([[ "$BUILD_ENVIRONMENT" != *rocm* ]] && [[ "$BUILD_ENVIRONMENT" != *xla* ]]) ; then
git_status=$(git status --porcelain)
if [[ $git_status ]]; then
echo "Build left local git repository checkout dirty"
@ -52,9 +52,9 @@ function get_exit_code() {
function file_diff_from_base() {
# The fetch may fail on Docker hosts, but it's not always necessary.
set +e
git fetch origin master --quiet
git fetch origin release/1.7 --quiet
set -e
git diff --name-only "$(git merge-base origin/release/1.8 HEAD)" > "$1"
git diff --name-only "$(git merge-base origin/release/1.7 HEAD)" > "$1"
}
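`file_diff_from_base` feeds the test-selection flow used by the test scripts later in this diff: the list of files changed relative to the base branch goes into a temp file, which `run_test.py` consumes via `--determine-from`. Condensed, the flow is:

    DETERMINE_FROM=$(mktemp)
    file_diff_from_base "$DETERMINE_FROM"   # writes one changed path per line
    python test/run_test.py --verbose --determine-from="$DETERMINE_FROM"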
function get_bazel() {
@ -66,7 +66,7 @@ function get_bazel() {
chmod +x tools/bazel
}
TORCHVISION_COMMIT=ae0d80b3c52dc98b3a9763bdb974c3ef7b6eb83d
TORCHVISION_COMMIT=c2e8a00885e68ae1200eb6440f540e181d9125de
function install_torchvision() {
# Check out torch/vision at Jun 11 2020 commit


@ -8,26 +8,48 @@ git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${WORKSPACE_DIR}/miniconda3/
# Build PyTorch
if [ -z "${IN_CI}" ]; then
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
if [[ "${BUILD_ENVIRONMENT}" == *cuda9.2* ]]; then
export CUDA_VERSION=9.2
export TORCH_CUDA_ARCH_LIST=5.2
export PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
export CUDA_HOME=/Developer/NVIDIA/CUDA-${CUDA_VERSION}
export USE_CUDA=1
if [ -z "${IN_CIRCLECI}" ]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
fi
else
if [ -z "${IN_CIRCLECI}" ]; then
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache %s \$*" "$(which clang++)" > "${WORKSPACE_DIR}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${WORKSPACE_DIR}/clang++"
chmod a+x "${WORKSPACE_DIR}/clang++"
printf "#!/bin/sh\nexec sccache %s \$*" "$(which clang)" > "${WORKSPACE_DIR}/clang"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${WORKSPACE_DIR}/clang"
chmod a+x "${WORKSPACE_DIR}/clang"
if [[ "${BUILD_ENVIRONMENT}" == *cuda* ]]; then
printf "#!/bin/sh\nexec sccache $(which nvcc) \$*" > "${WORKSPACE_DIR}/nvcc"
chmod a+x "${WORKSPACE_DIR}/nvcc"
export CUDA_NVCC_EXECUTABLE="${WORKSPACE_DIR}/nvcc"
fi
export PATH="${WORKSPACE_DIR}:$PATH"
fi
USE_DISTRIBUTED=1 python setup.py install
# If we run too many parallel jobs, we will OOM
MAX_JOBS=2 USE_DISTRIBUTED=1 python setup.py install
assert_git_not_dirty
# Upload torch binaries when the build job is finished
if [ -z "${IN_CI}" ]; then
if [ -z "${IN_CIRCLECI}" ]; then
7z a ${IMAGE_COMMIT_TAG}.7z ${WORKSPACE_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp ${IMAGE_COMMIT_TAG}.7z s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z --acl public-read
fi
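The two `printf` variants above differ in where the compiler path lands: embedded in the format string, a path containing `%` or backslash sequences would be reinterpreted by printf, whereas passed as an argument to `%s` it is printed verbatim. Side by side (clang++ path resolved at generation time in both):

    printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > clang++        # path inside the format string
    printf "#!/bin/sh\nexec sccache %s \$*" "$(which clang++)" > clang++   # path as a %s argument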


@ -7,9 +7,14 @@ conda install -y six
pip install -q hypothesis "librosa>=0.6.2" "numba<=0.49.1" psutil
# TODO move this to docker
pip install unittest-xml-reporting pytest
pip install unittest-xml-reporting
if [ -z "${IN_CI}" ]; then
# faulthandler became built-in in Python 3.3
if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
pip install -q faulthandler
fi
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${WORKSPACE_DIR}/miniconda3/lib/python3.6/site-packages/torch*
fi
@ -18,7 +23,7 @@ git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${WORKSPACE_DIR}/miniconda3/
# Test PyTorch
if [ -z "${IN_CI}" ]; then
if [ -z "${IN_CIRCLECI}" ]; then
if [[ "${BUILD_ENVIRONMENT}" == *cuda9.2* ]]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
@ -29,7 +34,7 @@ if [ -z "${IN_CI}" ]; then
fi
# Download torch binaries in the test jobs
if [ -z "${IN_CI}" ]; then
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${WORKSPACE_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z ${IMAGE_COMMIT_TAG}.7z
7z x ${IMAGE_COMMIT_TAG}.7z -o"${WORKSPACE_DIR}/miniconda3/lib/python3.6/site-packages"
@ -58,7 +63,7 @@ test_python_all() {
# Increase default limit on open file handles from 256 to 1024
ulimit -n 1024
python test/run_test.py --verbose --exclude-jit-executor --determine-from="$DETERMINE_FROM"
python test/run_test.py --verbose --exclude test_jit_cuda_fuser_profiling test_jit_cuda_fuser_legacy test_jit_legacy test_jit_fuser_legacy --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
@ -134,31 +139,11 @@ test_custom_script_ops() {
assert_git_not_dirty
}
test_jit_hooks() {
echo "Testing jit hooks in cpp"
pushd test/jit_hooks
# Build the JIT hooks test library.
rm -rf build && mkdir build
pushd build
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake ..
make VERBOSE=1
popd
# Run tests Python-side and export a script module.
python model.py --export-script-module=model
# Run tests C++-side and load the exported script module.
build/test_jit_hooks ./model
popd
assert_git_not_dirty
}
if [ -z "${BUILD_ENVIRONMENT}" ] || [[ "${BUILD_ENVIRONMENT}" == *-test ]]; then
test_python_all
test_libtorch
test_custom_script_ops
test_jit_hooks
test_custom_backend
else
if [[ "${BUILD_ENVIRONMENT}" == *-test1 ]]; then
@ -166,7 +151,6 @@ else
elif [[ "${BUILD_ENVIRONMENT}" == *-test2 ]]; then
test_libtorch
test_custom_script_ops
test_jit_hooks
test_custom_backend
fi
fi


@ -10,16 +10,26 @@ COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch (distributed only)"
if [ -n "${IN_CI}" ]; then
if [ -n "${IN_CIRCLECI}" ]; then
# TODO move this to docker
pip_install unittest-xml-reporting
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-cudnn7-py3* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
fi
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" build/bin/test_api
time python test/run_test.py --verbose -i distributed/test_jit_c10d
time python test/run_test.py --verbose -i distributed/test_distributed_fork
time python test/run_test.py --verbose -i distributed/test_c10d
time python test/run_test.py --verbose -i distributed/test_c10d_spawn
time python test/run_test.py --verbose -i distributed/rpc/test_tensorpipe_agent
assert_git_not_dirty


@ -21,7 +21,7 @@ test_cpu_speed_mini_sequence_labeler () {
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py)
SAMPLE_ARRAY+=("${runtime}")
SAMPLE_ARRAY+=(${runtime})
done
cd ../../..


@ -23,7 +23,7 @@ test_cpu_speed_mnist () {
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime
SAMPLE_ARRAY+=("${runtime}")
SAMPLE_ARRAY+=(${runtime})
done
cd ../..


@ -22,7 +22,7 @@ test_gpu_speed_cudnn_lstm () {
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python cudnn_lstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=("${runtime}")
SAMPLE_ARRAY+=(${runtime})
done
cd ../..


@ -22,7 +22,7 @@ test_gpu_speed_lstm () {
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python lstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=("${runtime}")
SAMPLE_ARRAY+=(${runtime})
done
cd ../..


@ -22,7 +22,7 @@ test_gpu_speed_mlstm () {
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python mlstm.py --skip-cpu-governor-check)
echo $runtime
SAMPLE_ARRAY+=("${runtime}")
SAMPLE_ARRAY+=(${runtime})
done
cd ../..


@ -26,7 +26,7 @@ test_gpu_speed_mnist () {
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime
SAMPLE_ARRAY+=("${runtime}")
SAMPLE_ARRAY+=(${runtime})
done
cd ../..


@ -31,7 +31,7 @@ test_gpu_speed_word_language_model () {
for (( i=1; i<=NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --cuda --epochs 1)
echo $runtime
SAMPLE_ARRAY+=("${runtime}")
SAMPLE_ARRAY+=(${runtime})
done
cd ../..


@ -27,12 +27,13 @@ fi
git remote add upstream https://github.com/pytorch/pytorch.git
git fetch upstream
IFS=$'\n'
while IFS='' read -r commit_id; do
master_commit_ids=($(git rev-list upstream/master))
for commit_id in "${master_commit_ids[@]}"; do
if aws s3 ls s3://ossci-perf-test/pytorch/cpu_runtime/${commit_id}.json; then
LATEST_TESTED_COMMIT=${commit_id}
break
fi
done < <(git rev-list upstream/master)
done
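One side of the hunk above iterates `git rev-list` with a `while read` over process substitution instead of expanding the output into an array: commits are consumed one line at a time, and because `< <(...)` keeps the loop body in the current shell (unlike piping into `while`), `break` and variable assignments behave as expected. The pattern in isolation, with `s3_has_commit` standing in for the `aws s3 ls` probe:

    while IFS='' read -r commit_id; do
        if s3_has_commit "$commit_id"; then
            LATEST_TESTED_COMMIT=$commit_id
            break                          # stop at the first hit; the assignment persists
        fi
    done < <(git rev-list upstream/master)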
aws s3 cp s3://ossci-perf-test/pytorch/cpu_runtime/${LATEST_TESTED_COMMIT}.json cpu_runtime.json
if [[ "$COMMIT_SOURCE" == master ]]; then


@ -26,12 +26,13 @@ fi
git remote add upstream https://github.com/pytorch/pytorch.git
git fetch upstream
IFS=$'\n'
while IFS='' read -r commit_id; do
master_commit_ids=($(git rev-list upstream/master))
for commit_id in "${master_commit_ids[@]}"; do
if aws s3 ls s3://ossci-perf-test/pytorch/gpu_runtime/${commit_id}.json; then
LATEST_TESTED_COMMIT=${commit_id}
break
fi
done < <(git rev-list upstream/master)
done
aws s3 cp s3://ossci-perf-test/pytorch/gpu_runtime/${LATEST_TESTED_COMMIT}.json gpu_runtime.json
if [[ "$COMMIT_SOURCE" == master ]]; then


@ -11,24 +11,34 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
export LANG=C.UTF-8
if [ -n "${IN_CIRCLECI}" ]; then
# TODO move this to docker
pip_install unittest-xml-reporting coverage
if [[ "$BUILD_ENVIRONMENT" == *-slow-* ]]; then
export PYTORCH_TEST_WITH_SLOW=1
export PYTORCH_TEST_SKIP_FAST=1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *coverage* ]]; then
export PYTORCH_COLLECT_COVERAGE=1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-cudnn7-py3* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
fi
if [[ "$BUILD_ENVIRONMENT" == *cuda11.1* ]]; then
export BUILD_SPLIT_CUDA=ON
if [[ "$BUILD_ENVIRONMENT" == *-slow-* ]]; then
export PYTORCH_TEST_WITH_SLOW=1
export PYTORCH_TEST_SKIP_FAST=1
fi
if [[ "$BUILD_ENVIRONMENT" == *coverage* ]]; then
export PYTORCH_COLLECT_COVERAGE=1
fi
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# Print GPU info
rocminfo | grep -E 'Name:.*\sgfx|Marketing'
rocminfo | egrep 'Name:.*\sgfx|Marketing'
fi
# --user breaks ppc64le builds and these packages are already in ppc64le docker
@ -38,6 +48,18 @@ if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]] && [[ "$BUILD_ENVIRONMENT" != *-bazel
# ninja is installed in $HOME/.local/bin, e.g., /var/lib/jenkins/.local/bin for CI user jenkins
# but this script should be runnable by any user, including root
export PATH="$HOME/.local/bin:$PATH"
# TODO: Please move this to Docker
# The version is fixed to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
pip_install --user "hypothesis==4.53.2"
# Pin MyPy version because new errors are likely to appear with each release
pip_install --user "mypy==0.770"
# Update scikit-learn to a python-3.8 compatible version
if [[ $(python -c "import sys; print(int(sys.version_info >= (3, 8)))") == "1" ]]; then
pip_install -U scikit-learn
fi
pip_install --user tb-nightly
fi
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
@ -99,23 +121,28 @@ elif [[ "${BUILD_ENVIRONMENT}" == *-NO_AVX2-* ]]; then
export ATEN_CPU_CAPABILITY=avx
fi
if [ -n "$CIRCLE_PULL_REQUEST" ] && [[ "$BUILD_ENVIRONMENT" != *coverage* ]]; then
if ([ -n "$CIRCLE_PULL_REQUEST" ] && [[ "$BUILD_ENVIRONMENT" != *coverage* ]]); then
DETERMINE_FROM=$(mktemp)
file_diff_from_base "$DETERMINE_FROM"
fi
test_python_legacy_jit() {
time python test/run_test.py --include test_jit_legacy test_jit_fuser_legacy --verbose --determine-from="$DETERMINE_FROM"
test_python_nn() {
time python test/run_test.py --include test_nn --verbose --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
test_python_shard1() {
time python test/run_test.py --exclude-jit-executor --shard 1 2 --verbose --determine-from="$DETERMINE_FROM"
test_python_ge_config_profiling() {
time python test/run_test.py --include test_jit_cuda_fuser_profiling test_jit_profiling test_jit_fuser_te test_tensorexpr --verbose --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
test_python_shard2() {
time python test/run_test.py --exclude-jit-executor --shard 2 2 --verbose --determine-from="$DETERMINE_FROM"
test_python_ge_config_legacy() {
time python test/run_test.py --include test_jit_cuda_fuser_legacy test_jit_legacy test_jit_fuser_legacy --verbose --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
test_python_all_except_nn_and_cpp_extensions() {
time python test/run_test.py --exclude test_jit_cuda_fuser_profiling test_jit_cuda_fuser_legacy test_nn test_jit_profiling test_jit_legacy test_jit_fuser_legacy test_jit_fuser_te test_tensorexpr --verbose --determine-from="$DETERMINE_FROM"
assert_git_not_dirty
}
@ -123,7 +150,7 @@ test_aten() {
# Test ATen
# The following test(s) of ATen have already been skipped by caffe2 in rocm environment:
# scalar_tensor_test, basic, native_test
if [[ "$BUILD_ENVIRONMENT" != *asan* ]] && [[ "$BUILD_ENVIRONMENT" != *rocm* ]]; then
if ([[ "$BUILD_ENVIRONMENT" != *asan* ]] && [[ "$BUILD_ENVIRONMENT" != *rocm* ]]); then
echo "Running ATen tests with pytorch lib"
TORCH_LIB_PATH=$(python -c "import site; print(site.getsitepackages()[0])")/torch/lib
# NB: the ATen test binaries don't have RPATH set, so it's necessary to
@ -246,21 +273,6 @@ test_custom_script_ops() {
fi
}
test_jit_hooks() {
if [[ "$BUILD_ENVIRONMENT" != *rocm* ]] && [[ "$BUILD_ENVIRONMENT" != *asan* ]] ; then
echo "Testing jit hooks in cpp"
HOOK_BUILD="$PWD/../jit-hook-build"
pushd test/jit_hooks
cp -a "$HOOK_BUILD" build
# Run tests Python-side and export the script modules with hooks
python model.py --export-script-module=model
# Run tests C++-side and load the exported script modules
build/test_jit_hooks ./model
popd
assert_git_not_dirty
fi
}
test_torch_function_benchmark() {
echo "Testing __torch_function__ benchmarks"
pushd benchmarks/overrides_benchmark
@ -276,7 +288,7 @@ test_torch_function_benchmark() {
test_xla() {
export XLA_USE_XRT=1 XRT_DEVICE_MAP="CPU:0;/job:localservice/replica:0/task:0/device:XLA_CPU:0"
# Issue #30717: randomize the port of XLA/gRPC workers is listening on to reduce flaky tests.
XLA_PORT=$(shuf -i 40701-40999 -n 1)
XLA_PORT=`shuf -i 40701-40999 -n 1`
export XRT_WORKERS="localservice:0;grpc://localhost:$XLA_PORT"
pushd xla
echo "Running Python Tests"
@ -292,7 +304,7 @@ test_xla() {
assert_git_not_dirty
}
# Do NOT run this test before any other tests, like test_python_shard1, etc.
# Do NOT run this test before any other tests, like test_python_nn, etc.
# Because this function uninstalls the torch built from branch, and install
# nightly version.
test_backward_compatibility() {
@ -300,7 +312,7 @@ test_backward_compatibility() {
pushd test/backward_compatibility
python -m venv venv
. venv/bin/activate
pip_install --pre torch -f https://download.pytorch.org/whl/test/cpu/torch_test.html
pip_install --pre torch -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip show torch
python dump_all_function_schemas.py --filename nightly_schemas.txt
deactivate
@ -358,11 +370,6 @@ test_vec256() {
fi
}
test_torch_deploy() {
SIMPLE_MODEL_PATH=torch/csrc/deploy/example/simple.pt LIBINTERPRETER_PATH=build/lib/libinterpreter.so build/bin/interpreter_test
assert_git_not_dirty
}
if ! [[ "${BUILD_ENVIRONMENT}" == *libtorch* || "${BUILD_ENVIRONMENT}" == *-bazel-* ]]; then
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
@ -374,20 +381,19 @@ if [[ "${BUILD_ENVIRONMENT}" == *backward* ]]; then
elif [[ "${BUILD_ENVIRONMENT}" == *xla* || "${JOB_BASE_NAME}" == *xla* ]]; then
install_torchvision
test_xla
elif [[ "${BUILD_ENVIRONMENT}" == *jit_legacy-test || "${JOB_BASE_NAME}" == *jit_legacy-test ]]; then
test_python_legacy_jit
elif [[ "${BUILD_ENVIRONMENT}" == *ge_config_legacy* || "${JOB_BASE_NAME}" == *ge_config_legacy* ]]; then
test_python_ge_config_legacy
elif [[ "${BUILD_ENVIRONMENT}" == *ge_config_profiling* || "${JOB_BASE_NAME}" == *ge_config_profiling* ]]; then
test_python_ge_config_profiling
elif [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then
# TODO: run some C++ tests
echo "no-op at the moment"
elif [[ "${BUILD_ENVIRONMENT}" == *-test1 || "${JOB_BASE_NAME}" == *-test1 ]]; then
if [[ "${BUILD_ENVIRONMENT}" == pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-test1 ]]; then
test_torch_deploy
fi
install_torchvision
test_python_shard1
test_python_nn
test_cpp_extensions
elif [[ "${BUILD_ENVIRONMENT}" == *-test2 || "${JOB_BASE_NAME}" == *-test2 ]]; then
install_torchvision
test_python_shard2
test_python_all_except_nn_and_cpp_extensions
test_aten
test_libtorch
test_custom_script_ops
@ -403,8 +409,9 @@ elif [[ "${BUILD_ENVIRONMENT}" == pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4
test_cpp_extensions
else
install_torchvision
test_python_shard1
test_python_shard2
test_python_nn
test_python_all_except_nn_and_cpp_extensions
test_cpp_extensions
test_aten
test_vec256
test_libtorch
@ -414,15 +421,10 @@ else
test_distributed
test_benchmarks
test_rpc
fi
if [[ "$BUILD_ENVIRONMENT" == *coverage* ]]; then
pushd test
echo "Generating XML coverage report"
time python -mcoverage xml
popd
pushd build
echo "Generating lcov coverage report for C++ sources"
time lcov --capture --directory . --output-file coverage.info
popd
if [[ "$BUILD_ENVIRONMENT" == *coverage* ]]; then
pushd test
echo "Generating XML coverage report"
time python -mcoverage xml
popd
fi
fi


@ -15,7 +15,7 @@ COMPACT_JOB_NAME=pytorch-win-ws2019-cuda10-cudnn7-py3-build
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
source "$SCRIPT_PARENT_DIR/common.sh"
export IMAGE_COMMIT_ID=$(git rev-parse HEAD)
export IMAGE_COMMIT_ID=`git rev-parse HEAD`
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}
@ -38,21 +38,6 @@ fi
export SCRIPT_HELPERS_DIR=$SCRIPT_PARENT_DIR/win-test-helpers
set +ex
grep -E -R 'PyLong_(From|As)(Unsigned|)Long\(' --exclude=python_numbers.h torch/
PYLONG_API_CHECK=$?
if [[ $PYLONG_API_CHECK == 0 ]]; then
echo "Usage of PyLong_{From,As}{Unsigned}Long API may lead to overflow errors on Windows"
echo "because \`sizeof(long) == 4\` and \`sizeof(unsigned long) == 4\`."
echo "Please include \"torch/csrc/python_numbers.h\" and use the correspoding APIs instead."
echo "PyLong_FromLong -> THPUtils_packInt32 / THPUtils_packInt64"
echo "PyLong_AsLong -> THPUtils_unpackInt (32-bit) / THPUtils_unpackLong (64-bit)"
echo "PyLong_FromUnsignedLong -> THPUtils_packUInt32 / THPUtils_packUInt64"
echo "PyLong_AsUnsignedLong -> THPUtils_unpackUInt32 / THPUtils_unpackUInt64"
exit 1
fi
set -ex
$SCRIPT_HELPERS_DIR/build_pytorch.bat
assert_git_not_dirty
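The grep guard above (present on one side of this compare) rejects raw `PyLong_{From,As}{Unsigned}Long` calls because `sizeof(long) == 4` on Windows, so they can silently truncate 64-bit values. For illustration, a hit and its suggested rewrite would look like this (file name and code hypothetical):

    $ grep -E -R 'PyLong_(From|As)(Unsigned|)Long\(' --exclude=python_numbers.h torch/
    torch/csrc/Example.cpp:  return PyLong_FromLong(value);
    # per the message above, the portable spelling is:
    #   THPUtils_packInt64(value)   // from torch/csrc/python_numbers.h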


@ -22,7 +22,7 @@ call %INSTALLER_DIR%\install_miniconda3.bat
:: Install ninja and other deps
if "%REBUILD%"=="" ( pip install -q "ninja==1.9.0" dataclasses typing_extensions )
if "%REBUILD%"=="" ( pip install -q "ninja==1.9.0" dataclasses )
git submodule sync --recursive
git submodule update --init --recursive
@ -37,19 +37,33 @@ if "%VC_VERSION%" == "" (
@echo on
popd
if not "%USE_CUDA%"=="1" goto cuda_build_end
if "%CUDA_VERSION%" == "9" goto cuda_build_9
if "%CUDA_VERSION%" == "10" goto cuda_build_10
if "%CUDA_VERSION%" == "11" goto cuda_build_11
goto cuda_build_end
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v%CUDA_VERSION%
:cuda_build_9
rem transform the version string, for example 10.1 to 10_1.
set VERSION_SUFFIX=%CUDA_VERSION:.=_%
set CUDA_PATH_V%VERSION_SUFFIX%=%CUDA_PATH%
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2
set CUDA_PATH_V9_2=%CUDA_PATH%
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
goto cuda_build_common
:cuda_build_10
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
set CUDA_PATH_V10_1=%CUDA_PATH%
goto cuda_build_common
:cuda_build_11
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0
set CUDA_PATH_V11_0=%CUDA_PATH%
goto cuda_build_common
:cuda_build_common
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
@ -81,7 +95,7 @@ if "%USE_CUDA%"=="1" (
copy %TMP_DIR_WIN%\bin\sccache.exe %TMP_DIR_WIN%\bin\nvcc.exe
:: randomtemp is used to resolve the intermittent build error related to CUDA.
:: code: https://github.com/peterjc123/randomtemp-rust
:: code: https://github.com/peterjc123/randomtemp
:: issue: https://github.com/pytorch/pytorch/issues/25393
::
:: Previously, CMake uses CUDA_NVCC_EXECUTABLE for finding nvcc and then
@ -89,7 +103,7 @@ if "%USE_CUDA%"=="1" (
:: in PATH, and then pass the arguments to it.
:: Currently, randomtemp is placed before sccache (%TMP_DIR_WIN%\bin\nvcc),
:: so it is actually wrapping sccache rather than nvcc itself.
curl -kL https://github.com/peterjc123/randomtemp-rust/releases/download/v0.3/randomtemp.exe --output %TMP_DIR_WIN%\bin\randomtemp.exe
curl -kL https://github.com/peterjc123/randomtemp/releases/download/v0.3/randomtemp.exe --output %TMP_DIR_WIN%\bin\randomtemp.exe
set RANDOMTEMP_EXECUTABLE=%TMP_DIR_WIN%\bin\nvcc.exe
set CUDA_NVCC_EXECUTABLE=%TMP_DIR_WIN%\bin\randomtemp.exe
set RANDOMTEMP_BASEDIR=%TMP_DIR_WIN%\bin
@ -110,10 +124,6 @@ if "%REBUILD%" == "" (
aws s3 cp "s3://ossci-windows/Restore PyTorch Environment.lnk" "C:\Users\circleci\Desktop\Restore PyTorch Environment.lnk"
)
)
:: tests if BUILD_ENVIRONMENT contains cuda11 as a substring
if not x%BUILD_ENVIRONMENT:cuda11=%==x%BUILD_ENVIRONMENT% (
set BUILD_SPLIT_CUDA=ON
)
python setup.py install --cmake && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
@ -122,3 +132,4 @@ python setup.py install --cmake && sccache --show-stats && (
7z a %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\caffe2 && copy /Y "%TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z" "%PYTORCH_FINAL_PACKAGE_DIR%\"
)
)


@ -1,18 +1,18 @@
rem remove the dot in CUDA_VERSION, for example 11.1 to 111
set VERSION_SUFFIX=%CUDA_VERSION:.=%
set CUDA_SUFFIX=cuda%VERSION_SUFFIX%
if "%CUDA_VERSION%" == "9" set CUDA_SUFFIX=cuda92
if "%CUDA_VERSION%" == "10" set CUDA_SUFFIX=cuda101
if "%CUDA_VERSION%" == "11" set CUDA_SUFFIX=cuda110
if "%CUDA_SUFFIX%" == "" (
echo unknown CUDA version, please set `CUDA_VERSION` higher than 9.2
echo unknown CUDA version, please set `CUDA_VERSION` to 9, 10 or 11.
exit /b 1
)
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z
curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/magma_2.5.3_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.3_%CUDA_SUFFIX%_%BUILD_TYPE%.7z
) else (
aws s3 cp s3://ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --quiet
aws s3 cp s3://ossci-windows/magma_2.5.3_%CUDA_SUFFIX%_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.3_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --quiet
)
7z x -aoa %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z -o%TMP_DIR_WIN%\magma
7z x -aoa %TMP_DIR_WIN%\magma_2.5.3_%CUDA_SUFFIX%_%BUILD_TYPE%.7z -o%TMP_DIR_WIN%\magma
)
set MAGMA_HOME=%TMP_DIR_WIN%\magma
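`%CUDA_VERSION:.=%` above is batch's pattern substitution: every `.` in the variable is replaced with nothing, turning 11.1 into 111 (the `:.=_` variant in the build scripts yields 11_1). For readers more at home in shell, the bash equivalent is:

    CUDA_VERSION=11.1
    echo "cuda${CUDA_VERSION//./}"   # cuda111, mirrors %CUDA_VERSION:.=%
    echo "v${CUDA_VERSION//./_}"     # v11_1, mirrors %CUDA_VERSION:.=_%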


@ -12,7 +12,7 @@ call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Minic
if "%REBUILD%"=="" (
call conda install -y -q python=%PYTHON_VERSION% numpy cffi pyyaml boto3
call conda install -y -q -c conda-forge cmake
call conda install -y -q -c conda-forge libuv=1.39
call conda install -y -q -c rdonnelly libuv
)
:: Get installed libuv path


@ -7,4 +7,4 @@ if "%REBUILD%"=="" (
7z x -aoa %TMP_DIR_WIN%\mkl.7z -o%TMP_DIR_WIN%\mkl
)
set CMAKE_INCLUDE_PATH=%TMP_DIR_WIN%\mkl\include
set LIB=%TMP_DIR_WIN%\mkl\lib;%LIB%
set LIB=%TMP_DIR_WIN%\mkl\lib;%LIB


@ -39,18 +39,40 @@ if %errorlevel% neq 0 ( exit /b %errorlevel% )
popd
:: The version is fixed to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
pip install "ninja==1.10.0.post1" future "hypothesis==4.53.2" "librosa>=0.6.2" psutil pillow unittest-xml-reporting pytest coverage
pip install "ninja==1.10.0.post1" future "hypothesis==4.53.2" "librosa>=0.6.2" psutil pillow unittest-xml-reporting
if %errorlevel% neq 0 ( exit /b %errorlevel% )
:: No need to install faulthandler since we only test Python >= 3.6 on Windows
:: faulthandler is builtin since Python 3.3
set DISTUTILS_USE_SDK=1
if not "%USE_CUDA%"=="1" goto cuda_build_end
if "%CUDA_VERSION%" == "9" goto cuda_build_9
if "%CUDA_VERSION%" == "10" goto cuda_build_10
if "%CUDA_VERSION%" == "11" goto cuda_build_11
goto cuda_build_end
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v%CUDA_VERSION%
:cuda_build_9
rem transform the version string, for example 10.1 to 10_1.
set VERSION_SUFFIX=%CUDA_VERSION:.=_%
set CUDA_PATH_V%VERSION_SUFFIX%=%CUDA_PATH%
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2
set CUDA_PATH_V9_2=%CUDA_PATH%
goto cuda_build_common
:cuda_build_10
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
set CUDA_PATH_V10_1=%CUDA_PATH%
goto cuda_build_common
:cuda_build_11
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0
set CUDA_PATH_V11_0=%CUDA_PATH%
goto cuda_build_common
:cuda_build_common
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%


@ -0,0 +1,3 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test && python run_test.py --exclude test_jit_cuda_fuser_profiling test_jit_cuda_fuser_legacy test_jit_profiling test_jit_legacy test_jit_fuser_legacy test_jit_fuser_te test_tensorexpr --verbose --determine-from="%1" && cd ..
if ERRORLEVEL 1 exit /b 1


@ -3,7 +3,9 @@ call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
pushd test
echo Run jit_profiling tests
python run_test.py --include test_jit_legacy test_jit_fuser_legacy --verbose --determine-from="%1"
python run_test.py --include test_jit_profiling test_jit_fuser_te test_tensorexpr --verbose --determine-from="%1"
if ERRORLEVEL 1 exit /b 1
popd


@ -12,7 +12,9 @@ if ERRORLEVEL 1 exit /b 1
if ERRORLEVEL 1 exit /b 1
echo Run nn tests
python run_test.py --exclude-jit-executor --shard 1 2 --verbose --determine-from="%1"
python run_test.py --include test_nn --verbose --determine-from="%1"
if ERRORLEVEL 1 exit /b 1
popd


@ -1,3 +0,0 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test && python run_test.py --exclude-jit-executor --shard 2 2 --verbose --determine-from="%1" && cd ..
if ERRORLEVEL 1 exit /b 1


@ -1,12 +1,12 @@
#!/bin/bash
set -ex
#!/bin/bash -ex
# shellcheck disable=SC2034
COMPACT_JOB_NAME=pytorch-win-ws2019-cuda10-cudnn7-py3-test
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
source "$SCRIPT_PARENT_DIR/common.sh"
export IMAGE_COMMIT_ID=$(git rev-parse HEAD)
export IMAGE_COMMIT_ID=`git rev-parse HEAD`
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}
@ -14,10 +14,6 @@ fi
export TMP_DIR="${PWD}/build/win_tmp"
export TMP_DIR_WIN=$(cygpath -w "${TMP_DIR}")
export PROJECT_DIR="${PWD}"
export PROJECT_DIR_WIN=$(cygpath -w "${PROJECT_DIR}")
export TEST_DIR="${PWD}/test"
export TEST_DIR_WIN=$(cygpath -w "${TEST_DIR}")
export PYTORCH_FINAL_PACKAGE_DIR="/c/users/circleci/workspace/build-results"
export PYTORCH_FINAL_PACKAGE_DIR_WIN=$(cygpath -w "${PYTORCH_FINAL_PACKAGE_DIR}")
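The `cygpath -w` exports above keep two spellings of each directory: the POSIX form for the bash half of the job and the Windows form for the `.bat` helpers. For example (paths illustrative):

    $ echo "$TMP_DIR"
    /c/users/circleci/project/build/win_tmp
    $ cygpath -w "$TMP_DIR"
    C:\users\circleci\project\build\win_tmp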
@ -40,46 +36,26 @@ if [ -n "$CIRCLE_PULL_REQUEST" ]; then
file_diff_from_base "$DETERMINE_FROM"
fi
if [[ "${CIRCLE_JOB}" == *11.1* ]]; then
export BUILD_SPLIT_CUDA=ON
fi
run_tests() {
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
$SCRIPT_HELPERS_DIR/test_python_nn.bat "$DETERMINE_FROM"
$SCRIPT_HELPERS_DIR/test_python_all_except_nn.bat "$DETERMINE_FROM"
$SCRIPT_HELPERS_DIR/test_custom_script_ops.bat
$SCRIPT_HELPERS_DIR/test_custom_backend.bat
$SCRIPT_HELPERS_DIR/test_python_nn.bat "$DETERMINE_FROM" && \
$SCRIPT_HELPERS_DIR/test_python_all_except_nn.bat "$DETERMINE_FROM" && \
$SCRIPT_HELPERS_DIR/test_custom_script_ops.bat && \
$SCRIPT_HELPERS_DIR/test_custom_backend.bat && \
$SCRIPT_HELPERS_DIR/test_libtorch.bat
else
export PYTORCH_COLLECT_COVERAGE=1
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
$SCRIPT_HELPERS_DIR/test_python_first_shard.bat "$DETERMINE_FROM"
$SCRIPT_HELPERS_DIR/test_python_nn.bat "$DETERMINE_FROM" && \
$SCRIPT_HELPERS_DIR/test_libtorch.bat
if [[ "${USE_CUDA}" == "1" ]]; then
$SCRIPT_HELPERS_DIR/test_python_jit_legacy.bat "$DETERMINE_FROM"
$SCRIPT_HELPERS_DIR/test_python_jit_profiling.bat "$DETERMINE_FROM"
fi
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
$SCRIPT_HELPERS_DIR/test_python_second_shard.bat "$DETERMINE_FROM"
$SCRIPT_HELPERS_DIR/test_custom_backend.bat
$SCRIPT_HELPERS_DIR/test_python_all_except_nn.bat "$DETERMINE_FROM" && \
$SCRIPT_HELPERS_DIR/test_custom_backend.bat && \
$SCRIPT_HELPERS_DIR/test_custom_script_ops.bat
fi
fi
}
run_tests
assert_git_not_dirty
echo "TEST PASSED"
if [[ "${BUILD_ENVIRONMENT}" == "pytorch-win-vs2019-cuda10-cudnn7-py3" ]]; then
pushd $TEST_DIR
python -mpip install coverage
echo "Generating XML coverage report"
time python -mcoverage xml
popd
pushd $PROJECT_DIR
python -mpip install codecov
python -mcodecov
popd
fi
run_tests && assert_git_not_dirty && echo "TEST PASSED"

Some files were not shown because too many files have changed in this diff.