Compare commits

...

60 Commits

Author SHA1 Message Date
dacdbc22d1 Revert "Fix handling of non-finite values in topk (#35253)" (#35582)
This reverts commit b12579da5398ff23b421332e21e18dc619a0b960.

This patch in and of itself looks fine, but it's causing some AMP tests to fail.
2020-03-27 17:44:03 -07:00
2a789cd0e0 [C++ API Parity] [Optimizers] Merged Optimizer and LossClosureOptimizer (#34957)
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. Added a BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, and de-virtualized all functions
5. Made defaults_ an optional argument in all optimizers except SGD

**TODO**: add BC-breaking notes for this PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20678162

Pulled By: yf225

fbshipit-source-id: 74e062e42d86dc118f0fbaddd794e438b2eaf35a
2020-03-27 12:30:29 -04:00
f9b010f399 enforce rref JIT pickling to be in the scope of rpc calls (#34689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34689

RRef JIT pickling is only allowed inside RPC calls. This is enforced by adding a thread-local variable isInRpcCall, which is set to true when converting RPC requests or responses to messages, before calling JIT::pickle(). Inside JIT::pickle(), pickling an RRef is allowed only when isInRpcCall is true.
ghstack-source-id: 100481001

Test Plan: unit tests

Differential Revision: D20429826

fbshipit-source-id: dbc04612ed15de5d6c7d75a4732041ccd4ef3f8c
2020-03-27 11:13:01 -04:00
55614ff306 Enforce rref python pickling to be in the scope of RPC call (#34755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34755

This diff disallows using the Python pickler to pickle RRefs. An RRef can only be pickled in the scope of an RPC call, using _InternalRPCPickler.
ghstack-source-id: 100481337
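
A minimal sketch of the behavior being enforced (the single-worker RPC setup below is an illustrative assumption, not part of this diff):

```python
# Sketch: plain pickling of an RRef outside an RPC call is rejected;
# only _InternalRPCPickler, inside an RPC call, may serialize RRefs.
import os
import pickle

import torch
import torch.distributed.rpc as rpc

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker0", rank=0, world_size=1)

rref = rpc.RRef(torch.ones(2))
try:
    pickle.dumps(rref)
except Exception as e:
    print("pickling outside RPC failed:", e)

rpc.shutdown()
```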

Test Plan: unit tests

Differential Revision: D20453806

fbshipit-source-id: ebd4115ee01457ba6958cde805afd0a87c686612
2020-03-27 11:12:36 -04:00
b12579da53 Fix handling of non-finite values in topk (#35253)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34191

`at::native::radixSelect` essentially uses integer comparison, which creates a well-defined ordering for non-finite float values. This isn't compatible with IEEE float comparison, so mixing the two leads to unwritten values in the output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35253
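
As a hedged illustration of the symptom (assuming a CUDA device, where the radix-select path applies):

```python
# Sketch: topk on a tensor mixing finite and non-finite values should
# fill every output slot; before the fix, some entries could be left unwritten.
import torch

t = torch.tensor([1.0, float("inf"), float("nan"), -2.0], device="cuda")
values, indices = t.topk(3)
print(values, indices)
```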

Differential Revision: D20645554

Pulled By: ezyang

fbshipit-source-id: 651bcb1742ed67086ec89cc318d862caae65b981
2020-03-27 10:53:18 -04:00
920e3eb761 Making sure all tensors in torch.cat sequence have the same dtype. (#35150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35150

Fixes #35014
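
A short sketch of the new check (error text paraphrased from the TORCH_CHECK added in this diff):

```python
# Sketch: torch.cat with mismatched dtypes now raises instead of silently misbehaving.
import torch

a = torch.zeros(2, dtype=torch.float32)
b = torch.zeros(2, dtype=torch.int64)
try:
    torch.cat([a, b])
except RuntimeError as e:
    print(e)  # "Expected object of scalar type float but got scalar type long for sequence element 1."
```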

Test Plan: Imported from OSS

Differential Revision: D20578589

Pulled By: z-a-f

fbshipit-source-id: edeaef133d1cf5152dcbafab2b969f1424ee2836
2020-03-26 16:49:11 -04:00
bec01e755a Renaming: MultiLabelMarginLossFuncOptions -> MultilabelMarginLossFuncOptions, MultiLabelSoftMarginLossFuncOptions -> MultilabelSoftMarginLossFuncOptions
gh-metadata: pytorch pytorch 35163 gh/yf225/104/head
2020-03-26 14:31:21 -04:00
6a880e1bc9 Add inplace tests for several torch::nn modules / functionals
gh-metadata: pytorch pytorch 35147 gh/yf225/101/head
2020-03-26 14:31:21 -04:00
fa86e32a4e Fix F::interpolate and torch::nn::Upsample implementation
gh-metadata: pytorch pytorch 35025 gh/yf225/100/head
2020-03-26 14:31:21 -04:00
5aabaf2b18 Fix fractional_max_pool3d_with_indices implementation
gh-metadata: pytorch pytorch 35024 gh/yf225/99/head
2020-03-26 14:31:21 -04:00
4a707e8f95 Fix Conv and ConvTranspose implementation
gh-metadata: pytorch pytorch 35023 gh/yf225/98/head
2020-03-26 14:31:21 -04:00
db127b21eb Fix AdaptiveAvgPool{2,3}d and AdaptiveMaxPool{2,3}d implementation
gh-metadata: pytorch pytorch 35022 gh/yf225/97/head
2020-03-26 14:31:21 -04:00
45313cd9e1 [1.5 cherrypick] [C++ API Parity] Add xor_convergence test for lbfgs (#35440)
* add xor_convergence test for lbfgs

* increased batchsize to 6

* minor

* increased batch size

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
2020-03-26 14:22:55 -04:00
df531973e1 [ONNX] update producer version (#35059)
Summary:
Updating producer version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35059

Reviewed By: hl475

Differential Revision: D20585173

Pulled By: houseroad

fbshipit-source-id: af0c4e3860beb899548466ea99be2050150f905d
2020-03-26 13:56:57 -04:00
9e3c577caa Fix torch.mm export to ONNX (#34661)
Summary:
torch.mm is exported as the Gemm operator in ONNX, and both have an optional input: out.
out is considered broadcastable in Gemm, and during graph optimization the optional input (out) would get selected. Since out is optional, when it is not defined in torch.mm this results in the following exception:
IndexError: vector::_M_range_check: __n (which is 2) >= this->size() (which is 2)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34661
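
A minimal repro sketch (the module and shapes are illustrative assumptions):

```python
# Sketch: exporting a bare torch.mm, which maps to ONNX Gemm without the optional out input.
import io

import torch

class MM(torch.nn.Module):
    def forward(self, a, b):
        return torch.mm(a, b)

# Before this fix, graph optimization could index past Gemm's inputs
# and raise the IndexError above.
torch.onnx.export(MM(), (torch.randn(2, 3), torch.randn(3, 4)), io.BytesIO())
```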

Reviewed By: hl475

Differential Revision: D20496398

Pulled By: houseroad

fbshipit-source-id: e677aef0a6aefb1f83a54033153aaabe5c23bc0f
2020-03-26 13:55:18 -04:00
5357b8e4d9 .circleci: Remove python 2 binary builds (#35475)
Python 2 is EOL soon so we're dropping support as of v1.5.0

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-26 10:50:34 -07:00
0f23d23db4 Add docs to resize_ and resize_as_ (#35392)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35392

Test Plan: Imported from OSS

Differential Revision: D20650097

Pulled By: VitalyFedyunin

fbshipit-source-id: cff4f555d355dfee42394f6070fe3e466949aeb5
2020-03-26 12:23:04 -04:00
7c24280a3f Add docs about memory format (#34818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34818

Test Plan: Imported from OSS

Differential Revision: D20601336

Pulled By: VitalyFedyunin

fbshipit-source-id: d34ad226be950bf134c6b383a4810ea6aa75599e
2020-03-26 12:23:04 -04:00
7100f0be13 ports true_divide method variant to 1.5 (#35390)
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
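A hedged sketch of the ported variants (see the native_functions.yaml diff further down):

```python
# Sketch: true_divide is now usable as a Tensor method, including in-place.
import torch

t = torch.tensor([1, 2, 3])
print(t.true_divide(2))  # method variant; integer inputs produce float results

f = torch.tensor([1.0, 2.0, 3.0])
f.true_divide_(2)        # in-place method variant added by this port
print(f)
```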
2020-03-26 11:50:00 -04:00
f7f611c2ec torch.cat: disallow inputs on different devices (#35053)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35045
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35053
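
A quick sketch of the enforced check (assumes a machine with a CUDA device):

```python
# Sketch: concatenating tensors that live on different devices now raises up front.
import torch

cpu_t = torch.ones(2)
gpu_t = torch.ones(2, device="cuda")
try:
    torch.cat([cpu_t, gpu_t])
except RuntimeError as e:
    print(e)  # "All input tensors must be on the same device. Received cpu and cuda:0"
```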

Differential Revision: D20545517

Pulled By: ngimel

fbshipit-source-id: eee3fc87c7e578ff44d69d5ce6f92a8f496fa97b
2020-03-26 10:58:33 -04:00
acb982d0b0 Add TORCH_CUDA_API to FilterDescriptor (#35131)
Summary:
`FilterDescriptor` is missing a `TORCH_CUDA_API`, so this symbol is not exported from `torch_cuda.so`, and users could have trouble building cpp_extension when using cudnn.

cc: ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35131

Differential Revision: D20604439

Pulled By: ezyang

fbshipit-source-id: c57414fc8a9df9cb1e910e2ec0a48cfdbe7d1779
2020-03-26 10:57:59 -04:00
aa8b7ad989 Fix thread_local initialization in C10 WarningHandler. (#34822)
Summary:
The Windows + MSVC-specific bug discussed here: https://github.com/pytorch/pytorch/issues/19394 and fixed here: https://github.com/pytorch/pytorch/issues/22405 still appears in C10's warning handler class. This results in a crash if a user attempts to run code which would print a warning when that code is running inside a thread created by a DLL. This PR applies a similar fix to that of https://github.com/pytorch/pytorch/issues/22405.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34822

Test Plan:
* Tested locally by running CodecverseWorkbench Unity app with patched build.
* CI

Differential Revision: D20627971

Pulled By: HapeMask

fbshipit-source-id: 64dfca531ed7eebbe9e0ecac3d3d4d025c683883
2020-03-25 20:02:45 -07:00
2d403ed8be Add python exception handling catch block to resolve deadlock (#35283) (#35402)
Summary:
Note: This PR has been merged into master after the 1.5.0 branch cut at
36e3c00 (see original PR: #35283). This PR is to cherry pick it into 1.5.

---- Original Commit Description Follows ---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35283

https://github.com/pytorch/pytorch/issues/34260

Deadlock on destructing py::error_already_set.

There are request callback implementations in Python, where Python exceptions
could be thrown. The GIL must be held when releasing the py::objects that hold
a Python exception.

Differential Revision: D7753253

fbshipit-source-id: 4bfaaaf027e4254f5e3fedaca80228c8b4282e39

Co-authored-by: Shihao Xu <shihaoxu@fb.com>
2020-03-25 17:05:18 -07:00
c25a664f77 Try pinning pyyaml and setuptools on macos to older versions (#35296) (#35400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35296

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20624843

Pulled By: ezyang

fbshipit-source-id: 9028f1dd62d0c25e916eb4927fd8dd6acbd88886
(cherry picked from commit 3f896ef7435201b2c3f51851f80dc674dfadfd40)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Edward Yang <ezyang@fb.com>
2020-03-25 16:04:06 -07:00
ab660ae394 Fix Tensor __radd__ type hint issue (#35231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35231

Fixes #35213

(Note: this ignores all push blocking failures!)

Test Plan: `mypy -c "import torch; ten = torch.tensor([1.0, 2.0, 3.0]); print(7 + ten)"` should not produce any warnings

Differential Revision: D20604924

Pulled By: pbelevich

fbshipit-source-id: 53a293a99b3f2ab6ca5516b31f3a92f67eb67a39
2020-03-25 18:37:07 -04:00
3c476a8858 PyTorch should always depend on future (#35057) (#35412)
Summary:
Because `past` is used in `caffe2.python.core`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35057

Test Plan: CI

Differential Revision: D20547042

Pulled By: malfet

fbshipit-source-id: cad2123c7b88271fea37f21e616df551075383a8
(cherry picked from commit d3f5045bf55e4a5dfb53ceccb6130e4e408cf466)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-03-25 14:54:26 -07:00
651fa88645 Load all DLLs in the lib directory for Windows (v.1.5.0) 2020-03-25 16:23:22 -04:00
565c3400b4 Update view op list. 2020-03-25 16:14:08 -04:00
3e332778b4 non blocking copy from #35144 2020-03-25 14:54:41 -04:00
f598738920 UBSAN deliberate float to int fix 2020-03-25 11:24:30 -04:00
4c6bfa0187 [1.5 cherrypick][JIT] Namespaces for TorchBind 2020-03-25 11:23:03 -04:00
6f25003682 [1.5 cherrypick][JIT] BC shim for TorchBind classes 2020-03-25 11:23:03 -04:00
752c129fa1 Update docs about DP and DDP for CUDA (#35063)
Summary:
We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063
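
A minimal single-process DDP sketch in the spirit of that recommendation (the gloo backend and world size of 1 are illustrative assumptions):

```python
# Sketch: prefer DistributedDataParallel over DataParallel, even on one machine.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 2))  # CPU module works with the gloo backend
out = model(torch.randn(3, 4))
print(out.shape)

dist.destroy_process_group()
```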

Differential Revision: D20549621

Pulled By: ngimel

fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
2020-03-25 11:18:17 -04:00
fb59a9caca .circleci: Change default CUDA for pip, cu101 -> cu102 (#35310)
So that packages are correctly marked when looking through the html
pages.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-24 15:05:25 -07:00
4d30dbdd35 Pin XLA CI to use r1.5 release branch. 2020-03-24 17:54:31 -04:00
b7f4a1a397 .circleci: Switch master to release/1.5 for git merge (#35320)
Since we're on a release branch we'll need to fix this up to do a merge
for release/1.5 instead of master.

TODO: In the future we should have a dynamic way of gathering the base
branch for PRs.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-24 14:52:24 -07:00
afda1dc943 Revert "Fix AdaptiveAvgPool{2,3}d and AdaptiveMaxPool{2,3}d implementation"
This reverts commit e2184ba08352d730d7165455c14f783b3e54082a.
2020-03-24 14:09:18 -04:00
d506ae882b Revert "Fix Conv and ConvTranspose implementation"
This reverts commit 88778854546b08bc6dd9f68e0a64311902c7d30c.
2020-03-24 14:09:18 -04:00
36e5abe531 Revert "Fix fractional_max_pool3d_with_indices implementation"
This reverts commit b89eb7c654b846fb3391cf4cc5aeb536cc41f1d7.
2020-03-24 14:09:18 -04:00
6e6f62230e Revert "Fix F::interpolate and torch::nn::Upsample implementation"
This reverts commit 75148df1f56c91f54965b530d606a6b9a4c8e269.
2020-03-24 14:09:18 -04:00
5d15577e6c Revert "Add inplace tests for several torch::nn modules / functionals"
This reverts commit 48590d6a9b939fb8097e4f2108872721ea5a516f.
2020-03-24 14:09:18 -04:00
6aa5298c5c Revert "Renaming: MultiLabelMarginLossFuncOptions -> MultilabelMarginLossFuncOptions, MultiLabelSoftMarginLossFuncOptions -> MultilabelSoftMarginLossFuncOptions"
This reverts commit 5ca901431886d60687275b9a310eac5b5aeba02f.
2020-03-24 14:09:18 -04:00
f3df13725b Revert "[1.5 cherrypick] [C++ API Parity] Add xor_convergence test for lbfgs (#35113)"
This reverts commit 246b824644c3731b00be6119f69795afd4eac9b6.
2020-03-24 14:08:56 -04:00
4eee3caa11 [release/1.5] .circleci: Fix unbound CIRCLE_TAG variable (#35242)
Was failing when trying to execute this script on a non-tag

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-23 16:21:44 -07:00
4d96463130 Updating fbgemm 2020-03-23 13:31:24 -07:00
246b824644 [1.5 cherrypick] [C++ API Parity] Add xor_convergence test for lbfgs (#35113)
* add xor_convergence test for lbfgs

* increased batchsize to 6

* minor

* increased batch size
2020-03-23 16:00:57 -04:00
5ca9014318 Renaming: MultiLabelMarginLossFuncOptions -> MultilabelMarginLossFuncOptions, MultiLabelSoftMarginLossFuncOptions -> MultilabelSoftMarginLossFuncOptions 2020-03-23 15:55:18 -04:00
48590d6a9b Add inplace tests for several torch::nn modules / functionals
gh-metadata: pytorch pytorch 35147 gh/yf225/101/head
2020-03-23 15:55:18 -04:00
75148df1f5 Fix F::interpolate and torch::nn::Upsample implementation
gh-metadata: pytorch pytorch 35025 gh/yf225/100/head
2020-03-23 15:55:18 -04:00
b89eb7c654 Fix fractional_max_pool3d_with_indices implementation
gh-metadata: pytorch pytorch 35024 gh/yf225/99/head
2020-03-23 15:55:18 -04:00
8877885454 Fix Conv and ConvTranspose implementation
gh-metadata: pytorch pytorch 35023 gh/yf225/98/head
2020-03-23 15:55:18 -04:00
e2184ba083 Fix AdaptiveAvgPool{2,3}d and AdaptiveMaxPool{2,3}d implementation
gh-metadata: pytorch pytorch 35022 gh/yf225/97/head
2020-03-23 15:55:18 -04:00
8ef47ad2f0 Updating fbgemm 2020-03-23 10:08:52 -07:00
6725b6f503 .circleci: Refactor how to grab the tagged version
Discovered that the upload scripts do not do well when there's no
pytorch repository to actually do git operations on.

CircleCI however provides a nice environment variable with the name of
the current tag so let's just use that when it's available and fall back
on the git describe functionality if that fails.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 16:34:57 -07:00
bcd3f6da1a .circleci: Remove quotes from --git-dir
git doesn't handle the escapes correctly, so let's just leave them out
altogether.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:39:31 -07:00
0b3d2f7b7d .circleci: Make sure to add .git to --git-dir
--git-dir only works when it points directly to a .git folder

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:28:23 -07:00
f522651a7e .circleci: Switch git -C -> git --git-dir
Older versions of git do not contain the '-C' flag so let's switch to a
flag that is pre-historic and will run on any version of RHEL that is
still supported in the modern era.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:22:44 -07:00
01c8ef2757 .circleci: One more -C to add to get correct git info
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:08:02 -07:00
7cfe68ce3a .circleci: Hardcode directory to /pytorch to ensure git
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 14:54:57 -07:00
6f3120c6b9 .circleci: Ensure describe happens in pytorch repo
Found an issue where the git describe wasn't properly executed since the
binary_populate_env.sh script was being executed from a different
directory.

'git -C' forces the describe to run in the script's working directory, which
should contain the correct git information

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 14:24:18 -07:00
262 changed files with 1790 additions and 1580 deletions


@@ -34,8 +34,6 @@ def get_processor_arch_name(cuda_version):
 LINUX_PACKAGE_VARIANTS = OrderedDict(
     manywheel=[
-        "2.7m",
-        "2.7mu",
         "3.5m",
         "3.6m",
         "3.7m",


@@ -351,16 +351,16 @@ jobs:
       export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
       # TODO We may want to move the rebase logic to a separate step after checkout
-      # Rebase to master only if in xenial_py3_6_gcc5_4 case
-      if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
-        echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
+      # Rebase to release/1.5 only if in xenial_py3_6_gcc5_4 case
+      if [[ "${CIRCLE_BRANCH}" != "release/1.5" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
+        echo "Merge release/1.5 branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
         set -x
         git config --global user.email "circleci.ossci@gmail.com"
         git config --global user.name "CircleCI"
         git config remote.origin.url https://github.com/pytorch/pytorch.git
-        git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
-        git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
-        export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
+        git config --add remote.origin.fetch +refs/heads/release/1.5:refs/remotes/origin/release/1.5
+        git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/release/1.5:refs/remotes/origin/release/1.5 --depth=100 --quiet
+        export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/release/1.5`
         echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
         export GIT_COMMIT=${CIRCLE_SHA1}
         echo "GIT_COMMIT: " ${GIT_COMMIT}
@@ -369,7 +369,7 @@ jobs:
         git merge --allow-unrelated-histories --no-edit --no-ff ${GIT_MERGE_TARGET}
         set +x
       else
-        echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
+        echo "Do NOT merge release/1.5 branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
       fi
       git submodule sync && git submodule update -q --init --recursive
@@ -2544,28 +2544,6 @@ workflows:
           libtorch_variant: "shared-with-deps"
           docker_image: "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest"
-      - smoke_linux_test:
-          name: smoke_linux_manywheel_2_7m_cpu_devtoolset7_nightly
-          build_environment: "manywheel 2.7m cpu devtoolset7"
-          requires:
-            - setup
-            - update_s3_htmls_for_nightlies
-            - update_s3_htmls_for_nightlies_devtoolset7
-          filters:
-            branches:
-              only: postnightly
-          docker_image: "pytorch/manylinux-cuda102"
-      - smoke_linux_test:
-          name: smoke_linux_manywheel_2_7mu_cpu_devtoolset7_nightly
-          build_environment: "manywheel 2.7mu cpu devtoolset7"
-          requires:
-            - setup
-            - update_s3_htmls_for_nightlies
-            - update_s3_htmls_for_nightlies_devtoolset7
-          filters:
-            branches:
-              only: postnightly
-          docker_image: "pytorch/manylinux-cuda102"
       - smoke_linux_test:
           name: smoke_linux_manywheel_3_5m_cpu_devtoolset7_nightly
           build_environment: "manywheel 3.5m cpu devtoolset7"
@@ -2610,32 +2588,6 @@ workflows:
             branches:
               only: postnightly
           docker_image: "pytorch/manylinux-cuda102"
-      - smoke_linux_test:
-          name: smoke_linux_manywheel_2_7m_cu92_devtoolset7_nightly
-          build_environment: "manywheel 2.7m cu92 devtoolset7"
-          requires:
-            - setup
-            - update_s3_htmls_for_nightlies
-            - update_s3_htmls_for_nightlies_devtoolset7
-          filters:
-            branches:
-              only: postnightly
-          docker_image: "pytorch/manylinux-cuda92"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
-      - smoke_linux_test:
-          name: smoke_linux_manywheel_2_7mu_cu92_devtoolset7_nightly
-          build_environment: "manywheel 2.7mu cu92 devtoolset7"
-          requires:
-            - setup
-            - update_s3_htmls_for_nightlies
-            - update_s3_htmls_for_nightlies_devtoolset7
-          filters:
-            branches:
-              only: postnightly
-          docker_image: "pytorch/manylinux-cuda92"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
       - smoke_linux_test:
           name: smoke_linux_manywheel_3_5m_cu92_devtoolset7_nightly
           build_environment: "manywheel 3.5m cu92 devtoolset7"
@@ -2688,32 +2640,6 @@ workflows:
           docker_image: "pytorch/manylinux-cuda92"
          use_cuda_docker_runtime: "1"
           resource_class: gpu.medium
-      - smoke_linux_test:
-          name: smoke_linux_manywheel_2_7m_cu101_devtoolset7_nightly
-          build_environment: "manywheel 2.7m cu101 devtoolset7"
-          requires:
-            - setup
-            - update_s3_htmls_for_nightlies
-            - update_s3_htmls_for_nightlies_devtoolset7
-          filters:
-            branches:
-              only: postnightly
-          docker_image: "pytorch/manylinux-cuda101"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
-      - smoke_linux_test:
-          name: smoke_linux_manywheel_2_7mu_cu101_devtoolset7_nightly
-          build_environment: "manywheel 2.7mu cu101 devtoolset7"
-          requires:
-            - setup
-            - update_s3_htmls_for_nightlies
-            - update_s3_htmls_for_nightlies_devtoolset7
-          filters:
-            branches:
-              only: postnightly
-          docker_image: "pytorch/manylinux-cuda101"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
       - smoke_linux_test:
           name: smoke_linux_manywheel_3_5m_cu101_devtoolset7_nightly
           build_environment: "manywheel 3.5m cu101 devtoolset7"
@@ -2766,32 +2692,6 @@ workflows:
           docker_image: "pytorch/manylinux-cuda101"
           use_cuda_docker_runtime: "1"
           resource_class: gpu.medium
-      - smoke_linux_test:
-          name: smoke_linux_manywheel_2_7m_cu102_devtoolset7_nightly
-          build_environment: "manywheel 2.7m cu102 devtoolset7"
-          requires:
-            - setup
-            - update_s3_htmls_for_nightlies
-            - update_s3_htmls_for_nightlies_devtoolset7
-          filters:
-            branches:
-              only: postnightly
-          docker_image: "pytorch/manylinux-cuda102"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
-      - smoke_linux_test:
-          name: smoke_linux_manywheel_2_7mu_cu102_devtoolset7_nightly
-          build_environment: "manywheel 2.7mu cu102 devtoolset7"
-          requires:
-            - setup
-            - update_s3_htmls_for_nightlies
-            - update_s3_htmls_for_nightlies_devtoolset7
-          filters:
-            branches:
-              only: postnightly
-          docker_image: "pytorch/manylinux-cuda102"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
       - smoke_linux_test:
           name: smoke_linux_manywheel_3_5m_cu102_devtoolset7_nightly
           build_environment: "manywheel 3.5m cu102 devtoolset7"
@@ -3636,28 +3536,6 @@ workflows:
           filters:
             branches:
               only: postnightly
-      - binary_linux_build:
-          name: binary_linux_manywheel_2_7m_cpu_devtoolset7_nightly_build
-          build_environment: "manywheel 2.7m cpu devtoolset7"
-          requires:
-            - setup
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda102"
-      - binary_linux_build:
-          name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_nightly_build
-          build_environment: "manywheel 2.7mu cpu devtoolset7"
-          requires:
-            - setup
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda102"
       - binary_linux_build:
           name: binary_linux_manywheel_3_5m_cpu_devtoolset7_nightly_build
           build_environment: "manywheel 3.5m cpu devtoolset7"
@@ -3702,28 +3580,6 @@ workflows:
             tags:
               only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
           docker_image: "pytorch/manylinux-cuda102"
-      - binary_linux_build:
-          name: binary_linux_manywheel_2_7m_cu92_devtoolset7_nightly_build
-          build_environment: "manywheel 2.7m cu92 devtoolset7"
-          requires:
-            - setup
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda92"
-      - binary_linux_build:
-          name: binary_linux_manywheel_2_7mu_cu92_devtoolset7_nightly_build
-          build_environment: "manywheel 2.7mu cu92 devtoolset7"
-          requires:
-            - setup
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda92"
       - binary_linux_build:
           name: binary_linux_manywheel_3_5m_cu92_devtoolset7_nightly_build
           build_environment: "manywheel 3.5m cu92 devtoolset7"
@@ -3768,28 +3624,6 @@ workflows:
             tags:
               only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
           docker_image: "pytorch/manylinux-cuda92"
-      - binary_linux_build:
-          name: binary_linux_manywheel_2_7m_cu101_devtoolset7_nightly_build
-          build_environment: "manywheel 2.7m cu101 devtoolset7"
-          requires:
-            - setup
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda101"
-      - binary_linux_build:
-          name: binary_linux_manywheel_2_7mu_cu101_devtoolset7_nightly_build
-          build_environment: "manywheel 2.7mu cu101 devtoolset7"
-          requires:
-            - setup
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda101"
       - binary_linux_build:
           name: binary_linux_manywheel_3_5m_cu101_devtoolset7_nightly_build
           build_environment: "manywheel 3.5m cu101 devtoolset7"
@@ -3834,28 +3668,6 @@ workflows:
             tags:
               only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
           docker_image: "pytorch/manylinux-cuda101"
-      - binary_linux_build:
-          name: binary_linux_manywheel_2_7m_cu102_devtoolset7_nightly_build
-          build_environment: "manywheel 2.7m cu102 devtoolset7"
-          requires:
-            - setup
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda102"
-      - binary_linux_build:
-          name: binary_linux_manywheel_2_7mu_cu102_devtoolset7_nightly_build
-          build_environment: "manywheel 2.7mu cu102 devtoolset7"
-          requires:
-            - setup
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda102"
       - binary_linux_build:
           name: binary_linux_manywheel_3_5m_cu102_devtoolset7_nightly_build
           build_environment: "manywheel 3.5m cu102 devtoolset7"
@@ -4706,30 +4518,6 @@ workflows:
       ##############################################################################
       # Nightly tests
      ##############################################################################
-      - binary_linux_test:
-          name: binary_linux_manywheel_2_7m_cpu_devtoolset7_nightly_test
-          build_environment: "manywheel 2.7m cpu devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7m_cpu_devtoolset7_nightly_build
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda102"
-      - binary_linux_test:
-          name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_nightly_test
-          build_environment: "manywheel 2.7mu cpu devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7mu_cpu_devtoolset7_nightly_build
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda102"
       - binary_linux_test:
           name: binary_linux_manywheel_3_5m_cpu_devtoolset7_nightly_test
           build_environment: "manywheel 3.5m cpu devtoolset7"
@@ -4778,34 +4566,6 @@ workflows:
             tags:
               only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
           docker_image: "pytorch/manylinux-cuda102"
-      - binary_linux_test:
-          name: binary_linux_manywheel_2_7m_cu92_devtoolset7_nightly_test
-          build_environment: "manywheel 2.7m cu92 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7m_cu92_devtoolset7_nightly_build
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda92"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
-      - binary_linux_test:
-          name: binary_linux_manywheel_2_7mu_cu92_devtoolset7_nightly_test
-          build_environment: "manywheel 2.7mu cu92 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7mu_cu92_devtoolset7_nightly_build
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda92"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
       - binary_linux_test:
           name: binary_linux_manywheel_3_5m_cu92_devtoolset7_nightly_test
           build_environment: "manywheel 3.5m cu92 devtoolset7"
@@ -4862,34 +4622,6 @@ workflows:
           docker_image: "pytorch/manylinux-cuda92"
           use_cuda_docker_runtime: "1"
           resource_class: gpu.medium
-      - binary_linux_test:
-          name: binary_linux_manywheel_2_7m_cu101_devtoolset7_nightly_test
-          build_environment: "manywheel 2.7m cu101 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7m_cu101_devtoolset7_nightly_build
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda101"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
-      - binary_linux_test:
-          name: binary_linux_manywheel_2_7mu_cu101_devtoolset7_nightly_test
-          build_environment: "manywheel 2.7mu cu101 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7mu_cu101_devtoolset7_nightly_build
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda101"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
       - binary_linux_test:
           name: binary_linux_manywheel_3_5m_cu101_devtoolset7_nightly_test
           build_environment: "manywheel 3.5m cu101 devtoolset7"
@@ -4946,34 +4678,6 @@ workflows:
           docker_image: "pytorch/manylinux-cuda101"
           use_cuda_docker_runtime: "1"
           resource_class: gpu.medium
-      - binary_linux_test:
-          name: binary_linux_manywheel_2_7m_cu102_devtoolset7_nightly_test
-          build_environment: "manywheel 2.7m cu102 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7m_cu102_devtoolset7_nightly_build
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda102"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
-      - binary_linux_test:
-          name: binary_linux_manywheel_2_7mu_cu102_devtoolset7_nightly_test
-          build_environment: "manywheel 2.7mu cu102 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7mu_cu102_devtoolset7_nightly_build
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          docker_image: "pytorch/manylinux-cuda102"
-          use_cuda_docker_runtime: "1"
-          resource_class: gpu.medium
       - binary_linux_test:
           name: binary_linux_manywheel_3_5m_cu102_devtoolset7_nightly_test
           build_environment: "manywheel 3.5m cu102 devtoolset7"
@@ -5772,30 +5476,6 @@ workflows:
       # - binary_linux_libtorch_2.7m_cu90_build
       # Nightly uploads
-      - binary_linux_upload:
-          name: binary_linux_manywheel_2_7m_cpu_devtoolset7_nightly_upload
-          build_environment: "manywheel 2.7m cpu devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7m_cpu_devtoolset7_nightly_test
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          context: org-member
-      - binary_linux_upload:
-          name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_nightly_upload
-          build_environment: "manywheel 2.7mu cpu devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7mu_cpu_devtoolset7_nightly_test
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          context: org-member
       - binary_linux_upload:
           name: binary_linux_manywheel_3_5m_cpu_devtoolset7_nightly_upload
           build_environment: "manywheel 3.5m cpu devtoolset7"
@@ -5844,30 +5524,6 @@ workflows:
             tags:
               only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
           context: org-member
-      - binary_linux_upload:
-          name: binary_linux_manywheel_2_7m_cu92_devtoolset7_nightly_upload
-          build_environment: "manywheel 2.7m cu92 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7m_cu92_devtoolset7_nightly_test
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          context: org-member
-      - binary_linux_upload:
-          name: binary_linux_manywheel_2_7mu_cu92_devtoolset7_nightly_upload
-          build_environment: "manywheel 2.7mu cu92 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7mu_cu92_devtoolset7_nightly_test
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          context: org-member
       - binary_linux_upload:
           name: binary_linux_manywheel_3_5m_cu92_devtoolset7_nightly_upload
           build_environment: "manywheel 3.5m cu92 devtoolset7"
@@ -5916,30 +5572,6 @@ workflows:
             tags:
               only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
           context: org-member
-      - binary_linux_upload:
-          name: binary_linux_manywheel_2_7m_cu101_devtoolset7_nightly_upload
-          build_environment: "manywheel 2.7m cu101 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7m_cu101_devtoolset7_nightly_test
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          context: org-member
-      - binary_linux_upload:
-          name: binary_linux_manywheel_2_7mu_cu101_devtoolset7_nightly_upload
-          build_environment: "manywheel 2.7mu cu101 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7mu_cu101_devtoolset7_nightly_test
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          context: org-member
       - binary_linux_upload:
           name: binary_linux_manywheel_3_5m_cu101_devtoolset7_nightly_upload
           build_environment: "manywheel 3.5m cu101 devtoolset7"
@@ -5988,30 +5620,6 @@ workflows:
             tags:
               only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
           context: org-member
-      - binary_linux_upload:
-          name: binary_linux_manywheel_2_7m_cu102_devtoolset7_nightly_upload
-          build_environment: "manywheel 2.7m cu102 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7m_cu102_devtoolset7_nightly_test
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          context: org-member
-      - binary_linux_upload:
-          name: binary_linux_manywheel_2_7mu_cu102_devtoolset7_nightly_upload
-          build_environment: "manywheel 2.7mu cu102 devtoolset7"
-          requires:
-            - setup
-            - binary_linux_manywheel_2_7mu_cu102_devtoolset7_nightly_test
-          filters:
-            branches:
-              only: nightly
-            tags:
-              only: /v[0-9]+(\.[0-9]+)*-rc[0-9]+/
-          context: org-member
       - binary_linux_upload:
           name: binary_linux_manywheel_3_5m_cu102_devtoolset7_nightly_upload
           build_environment: "manywheel 3.5m cu102 devtoolset7"


@@ -2,6 +2,19 @@
 set -eux -o pipefail
 export TZ=UTC
+
+tagged_version() {
+  # Grabs version from either the env variable CIRCLE_TAG
+  # or the pytorch git described version
+  GIT_DESCRIBE="git --git-dir ${workdir}/pytorch/.git describe"
+  if [[ -n "${CIRCLE_TAG:-}" ]]; then
+    echo "${CIRCLE_TAG}"
+  elif ${GIT_DESCRIBE} --exact --tags >/dev/null; then
+    ${GIT_DESCRIBE} --tags
+  else
+    return 1
+  fi
+}
 # We need to write an envfile to persist these variables to following
 # steps, but the location of the envfile depends on the circleci executor
 if [[ "$(uname)" == Darwin ]]; then
@@ -47,15 +60,17 @@ export DATE="$(date -u +%Y%m%d)"
 #TODO: We should be pulling semver version from the base version.txt
 BASE_BUILD_VERSION="1.5.0.dev$DATE"
 # Change BASE_BUILD_VERSION to git tag when on a git tag
-if git describe --tags --exact >/dev/null 2>/dev/null; then
+# Use 'git -C' to make doubly sure we're in the correct directory for checking
+# the git tag
+if tagged_version >/dev/null; then
   # Switch upload folder to 'test/' if we are on a tag
   PIP_UPLOAD_FOLDER='test/'
   # Grab git tag, remove prefixed v and remove everything after -
   # Used to clean up tags that are for release candidates like v1.5.0-rc1
   # Turns tag v1.5.0-rc1 -> v1.5.0
-  BASE_BUILD_VERSION="$(git describe --tags | sed -e 's/^v//' -e 's/-.*$//')"
+  BASE_BUILD_VERSION="$(tagged_version | sed -e 's/^v//' -e 's/-.*$//')"
 fi
-if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu101" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
+if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu102" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
   export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}"
 else
   export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"


@@ -20,16 +20,16 @@ jobs:
       export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
       # TODO We may want to move the rebase logic to a separate step after checkout
-      # Rebase to master only if in xenial_py3_6_gcc5_4 case
-      if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
-        echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
+      # Rebase to release/1.5 only if in xenial_py3_6_gcc5_4 case
+      if [[ "${CIRCLE_BRANCH}" != "release/1.5" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
+        echo "Merge release/1.5 branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
         set -x
         git config --global user.email "circleci.ossci@gmail.com"
         git config --global user.name "CircleCI"
         git config remote.origin.url https://github.com/pytorch/pytorch.git
-        git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
-        git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
-        export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
+        git config --add remote.origin.fetch +refs/heads/release/1.5:refs/remotes/origin/release/1.5
+        git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/release/1.5:refs/remotes/origin/release/1.5 --depth=100 --quiet
+        export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/release/1.5`
         echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
         export GIT_COMMIT=${CIRCLE_SHA1}
         echo "GIT_COMMIT: " ${GIT_COMMIT}
@@ -38,7 +38,7 @@ jobs:
         git merge --allow-unrelated-histories --no-edit --no-ff ${GIT_MERGE_TARGET}
         set +x
       else
-        echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
+        echo "Do NOT merge release/1.5 branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
       fi
       git submodule sync && git submodule update -q --init --recursive


@@ -167,7 +167,7 @@ fi
 # Patch required to build xla
 if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
-  git clone --recursive https://github.com/pytorch/xla.git
+  git clone --recursive -b r1.5 https://github.com/pytorch/xla.git
   ./xla/scripts/apply_patches.sh
 fi


@@ -18,7 +18,7 @@ if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then
 fi
 export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
 source ${WORKSPACE_DIR}/miniconda3/bin/activate
-retry conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
+retry conda install -y mkl mkl-include numpy pyyaml=5.3 setuptools=46.0.0 cmake cffi ninja
 # The torch.hub tests make requests to GitHub.
 #

View File

@@ -16,14 +16,6 @@
 #include <numeric>
 #include <memory>
-#if defined(__clang__)
-#define __ubsan_ignore_float_divide_by_zero__ __attribute__((no_sanitize("float-divide-by-zero")))
-#define __ubsan_ignore_vptr__ __attribute__((no_sanitize("vptr")))
-#else
-#define __ubsan_ignore_float_divide_by_zero__
-#define __ubsan_ignore_vptr__
-#endif
-
 #define AT_DISALLOW_COPY_AND_ASSIGN(TypeName) \
   TypeName(const TypeName&) = delete; \
   void operator=(const TypeName&) = delete


@@ -20,6 +20,10 @@ void registerCustomClass(at::ClassTypePtr class_type) {
 }
 
 at::ClassTypePtr getCustomClass(const std::string& name) {
+  // BC hack so we can upgrade a binary internally
+  if (name == "__torch__.torch.classes.SentencePiece") {
+    return getCustomClass("__torch__.torch.classes.fb.SentencePiece");
+  }
   return customClasses().count(name) ? customClasses()[name] : nullptr;
 }


@@ -15,6 +15,7 @@
 #include <c10/util/math_compat.h>
 #include <ATen/native/cpu/zmath.h>
 #include <c10/util/TypeCast.h>
+#include <c10/macros/Macros.h>
 
 #if defined(__GNUC__)
 #define __at_align32__ __attribute__((aligned(32)))


@@ -145,7 +145,7 @@ private:
 std::ostream& operator<<(std::ostream & out, const TensorDescriptor& d);
 
-class FilterDescriptor
+class TORCH_CUDA_API FilterDescriptor
   : public Descriptor<cudnnFilterStruct,
                       &cudnnCreateFilterDescriptor,
                       &cudnnDestroyFilterDescriptor>


@@ -138,6 +138,10 @@ Tensor true_divide(const Tensor& self, const Tensor& divisor) {
   return iter.output();
 }
 
+Tensor& true_divide_(Tensor& self, const Tensor& divisor) {
+  return native::true_divide_out(self, self, divisor);
+}
+
 Tensor& floor_divide_out(Tensor& result, const Tensor& self, const Tensor& other) {
   auto iter = TensorIterator::binary_op(result, self, other,
                                         /*check_mem_overlap=*/true);
@@ -731,7 +735,11 @@ Tensor& fmod_(Tensor& self, Scalar other) {
 }
 
 Tensor true_divide(const Tensor& self, Scalar divisor) {
-  return at::true_divide(self, wrapped_scalar_tensor(divisor)); // redispatch!
+  return self.true_divide(wrapped_scalar_tensor(divisor)); // redispatch!
+}
+
+Tensor& true_divide_(Tensor& self, Scalar divisor) {
+  return self.true_divide_(wrapped_scalar_tensor(divisor)); // redispatch!
 }
 }


@@ -33,7 +33,7 @@ static inline Tensor to_impl(const Tensor& self, const TensorOptions& options, b
   if (self.is_non_overlapping_and_dense()) {
     // Copy all strides
     auto r = at::empty_strided(self.sizes(), self.strides(), options.memory_format(c10::nullopt));
-    r.copy_(self);
+    r.copy_(self, non_blocking);
     return r;
   } else {
     memory_format = self.suggest_memory_format();


@@ -98,6 +98,15 @@ Tensor & _cat_out_cpu(Tensor& result, TensorList tensors, int64_t dim) {
                 "output memory locations. Found overlap in input tensor ", i);
   }
 
+  // Dtypes should be the same
+  const auto first_in_cat = tensors[0];
+  for (int64_t i = 1; i < tensors.size(); i++) {
+    TORCH_CHECK(first_in_cat.dtype() == tensors[i].dtype(),
+                "Expected object of scalar type ", first_in_cat.dtype(),
+                " but got scalar type ", tensors[i].dtype(),
+                " for sequence element ", i, ".");
+  }
+
   auto should_skip = [](const Tensor& t) { return t.numel() == 0 && t.dim() == 1; };
   for (auto const &tensor : tensors) {
     if (should_skip(tensor)) {


@@ -7,6 +7,7 @@
 #include <ATen/native/TensorIterator.h>
 #include <ATen/native/BinaryOps.h>
 #include <ATen/native/cpu/Loops.h>
+#include <c10/macros/Macros.h>
 
 namespace at { namespace native {
 namespace {


@@ -4,7 +4,7 @@
 #include <ATen/native/cuda/zmath.cuh>
 #include <ATen/native/TensorIterator.h>
 #include <ATen/native/BinaryOps.h>
-
+#include <c10/macros/Macros.h>
 // NOTE: CUDA on Windows requires that the enclosing function
 // of a __device__ lambda not have internal linkage.


@@ -307,6 +307,15 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
               "tensor ", i);
   }
 
+  // Dtypes should be the same
+  const auto first_in_cat = inputs[0];
+  for (int64_t i = 1; i < inputs.size(); i++) {
+    TORCH_CHECK(first_in_cat.dtype() == inputs[i].dtype(),
+                "Expected object of scalar type ", first_in_cat.dtype(),
+                " but got scalar type ", inputs[i].dtype(),
+                " for sequence element ", i, ".");
+  }
+
   for (int i = 0; i < inputs.size(); i++)
   {
     if (should_skip(inputs[i])) {
@@ -325,6 +334,12 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
   TORCH_CHECK(inputs.size() > 0, "invalid number of inputs ", inputs.size());
   TORCH_CHECK(dimension >= 0, "invalid dimension ", dimension);
 
+  for (const Tensor& t: inputs) {
+    TORCH_CHECK(t.device() == notSkippedTensor->device(),
+                "All input tensors must be on the same device. Received ",
+                t.device(), " and ", notSkippedTensor->device());
+  }
+
   c10::MemoryFormat memory_format = compute_output_memory_format(inputs);
 
   std::vector<int64_t> size(notSkippedTensor->sizes().vec());
@@ -355,17 +370,11 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
   // 4. The number of dimensions is <= 4
   // 5. All input tensors are contiguous (output tensor may be non-contig)
   // 6. All input tensors can use 32-bit indexing
-  // 7. All input tensors are on the same device
 
   const bool all32BitIndexable = std::all_of(inputs.begin(), inputs.end(),
     [] (const Tensor& t) {
       return at::cuda::detail::canUse32BitIndexMath(t);
     });
-  Device firstDevice = notSkippedTensor->device();
-  const bool allSameDevice = std::all_of(inputs.begin(), inputs.end(),
-    [firstDevice](const Tensor& t) {
-      return t.device() == firstDevice;
-    });
   const bool allContiguous = std::all_of(inputs.begin(), inputs.end(),
     [=](const Tensor& t) {
       return !t.defined() || t.is_contiguous(memory_format);
@@ -375,8 +384,7 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
       out.dim() <= CAT_ARRAY_MAX_INPUT_DIMS &&
       at::cuda::detail::canUse32BitIndexMath(out) &&
       allContiguous &&
-      all32BitIndexable &&
-      allSameDevice) {
+      all32BitIndexable) {
 
     AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(
       at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16,


@@ -2872,7 +2872,7 @@
 - func: true_divide.Tensor(Tensor self, Tensor other) -> Tensor
   use_c10_dispatcher: full
-  variants: function
+  variants: function, method
   dispatch:
     CPU: true_divide
     CUDA: true_divide
@@ -2880,6 +2880,15 @@
     SparseCUDA: true_divide_sparse
   supports_named_tensor: True
 
+- func: true_divide_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)
+  variants: method
+  dispatch:
+    CPU: true_divide_
+    CUDA: true_divide_
+    SparseCPU: true_divide_sparse_
+    SparseCUDA: true_divide_sparse_
+  supports_named_tensor: True
+
 - func: true_divide.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
   dispatch:
     CPU: true_divide_out
@@ -2890,7 +2899,11 @@
 - func: true_divide.Scalar(Tensor self, Scalar other) -> Tensor
   use_c10_dispatcher: full
-  variants: function
+  variants: function, method
+  supports_named_tensor: True
+
+- func: true_divide_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
+  variants: method
   supports_named_tensor: True
 
 - func: trunc(Tensor self) -> Tensor


@@ -272,6 +272,10 @@ SparseTensor& true_divide_out_sparse_scalar(
   return true_divide_out_sparse_zerodim(result, dividend, wrapped_scalar_tensor(divisor));
 }
 
+Tensor& true_divide_sparse_(Tensor& self, const Tensor& divisor) {
+  return true_divide_out_sparse_zerodim(self, self, divisor);
+}
+
 // --------------------------------------------------------------------
 // floor_divide(SparseTensor, Scalar)
 // --------------------------------------------------------------------


@@ -14,7 +14,7 @@ namespace xnnpack {
 namespace {

 torch::jit::class_<XNNPackLinearOpContext> register_xnnpack_linear_op_context_class() {
   static auto register_linear_op_context_class =
-      torch::jit::class_<XNNPackLinearOpContext>("XNNPackLinearOpContext")
+      torch::jit::class_<XNNPackLinearOpContext>("xnnpack", "XNNPackLinearOpContext")
           .def_pickle(
               [](const c10::intrusive_ptr<XNNPackLinearOpContext>& op_context)
                   -> SerializationTypeLinearPrePack { // __getstate__
@@ -38,7 +38,7 @@ torch::jit::class_<XNNPackLinearOpContext> register_xnnpack_linear_op_context_class() {
 torch::jit::class_<XNNPackConv2dOpContext> register_xnnpack_conv2d_op_context_class() {
   static auto register_conv2d_op_context_class =
-      torch::jit::class_<XNNPackConv2dOpContext>("XNNPackConv2dOpContext")
+      torch::jit::class_<XNNPackConv2dOpContext>("xnnpack", "XNNPackConv2dOpContext")
           .def_pickle(
               [](const c10::intrusive_ptr<XNNPackConv2dOpContext>& op_context)
                   -> SerializationTypeConv2dPrePack { // __getstate__
@@ -74,25 +74,25 @@ static auto registry =
 // Registering under _xnnpack namespace for now. As we add more backend requiring similar functionality
 // We can refactor the code and use a better namespace.
 torch::RegisterOperators()
-    .op("_xnnpack::linear_prepack(Tensor W, Tensor? B=None) -> __torch__.torch.classes.XNNPackLinearOpContext",
+    .op("_xnnpack::linear_prepack(Tensor W, Tensor? B=None) -> __torch__.torch.classes.xnnpack.XNNPackLinearOpContext",
        torch::RegisterOperators::options()
            .aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
            .kernel<internal::linear::LinearPrePack>(
                DispatchKey::CPUTensorId))
-    .op("_xnnpack::linear_packed(Tensor X, __torch__.torch.classes.XNNPackLinearOpContext W_prepack) -> Tensor Y",
+    .op("_xnnpack::linear_packed(Tensor X, __torch__.torch.classes.xnnpack.XNNPackLinearOpContext W_prepack) -> Tensor Y",
        torch::RegisterOperators::options()
            .aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
            .kernel<internal::linear::LinearPacked>(
                DispatchKey::CPUTensorId))
     .op("_xnnpack::conv2d_prepack(Tensor W, Tensor? B, int[2] stride, "
        "int[2] padding, int[2] dilation, int groups) "
-       "-> __torch__.torch.classes.XNNPackConv2dOpContext",
+       "-> __torch__.torch.classes.xnnpack.XNNPackConv2dOpContext",
        torch::RegisterOperators::options()
            .aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
            .kernel<internal::convolution2d::Conv2dPrePack>(
                DispatchKey::CPUTensorId))
     .op("_xnnpack::conv2d_packed(Tensor X, "
-       "__torch__.torch.classes.XNNPackConv2dOpContext W_prepack) -> Tensor Y",
+       "__torch__.torch.classes.xnnpack.XNNPackConv2dOpContext W_prepack) -> Tensor Y",
        torch::RegisterOperators::options()
            .aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
            .kernel<internal::convolution2d::Conv2dPacked>(
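Note: this change moves TorchBind classes from a flat name to a (namespace, className) pair, which is why the qualified type in the operator schemas becomes __torch__.torch.classes.xnnpack.XNNPackLinearOpContext. A sketch of the two-argument registration form, with a hypothetical MyOpContext standing in for a real op context:

    #include <torch/custom_class.h>

    struct MyOpContext : torch::CustomClassHolder {
      int64_t value_;
      explicit MyOpContext(int64_t value) : value_(value) {}
    };

    // Registered under __torch__.torch.classes.myns.MyOpContext; operator
    // schemas then spell out the namespaced qualified name, as above.
    static auto reg =
        torch::jit::class_<MyOpContext>("myns", "MyOpContext")
            .def(torch::init<int64_t>());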


@@ -69,12 +69,6 @@
 # define TH_UNUSED
 #endif
-
-#if defined(__clang__)
-#define __ubsan_ignore_float_divide_by_zero__ __attribute__((no_sanitize("float-divide-by-zero")))
-#else
-#define __ubsan_ignore_float_divide_by_zero__
-#endif

 #ifndef M_PI
 # define M_PI 3.14159265358979323846
 #endif


@@ -23,6 +23,14 @@
 #include "c10/macros/Export.h"

+#if defined(__clang__)
+#define __ubsan_ignore_float_divide_by_zero__ __attribute__((no_sanitize("float-divide-by-zero")))
+#define __ubsan_ignore_float_cast_overflow__ __attribute__((no_sanitize("float-cast-overflow")))
+#else
+#define __ubsan_ignore_float_divide_by_zero__
+#define __ubsan_ignore_float_cast_overflow__
+#endif
+
 // Disable the copy and assignment operator for a class. Note that this will
 // disable the usage of the class in std containers.
 #define C10_DISABLE_COPY_AND_ASSIGN(classname) \
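Note: these macros suppress individual UBSAN checks on a per-function basis. A sketch of how such an annotation is applied, using a hypothetical helper where float division by zero is intentional:

    // Hypothetical helper: dividing by zero here is intentional and
    // well-defined for IEEE-754 floats (yields inf or nan), so the
    // sanitizer check is suppressed for just this function.
    __ubsan_ignore_float_divide_by_zero__ inline float unchecked_ratio(float a, float b) {
      return a / b;
    }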


@@ -66,24 +66,44 @@ void Error::AppendMessage(const std::string& new_msg) {
 namespace Warning {

 namespace {
-WarningHandler* getHandler() {
+WarningHandler* getBaseHandler() {
   static WarningHandler base_warning_handler_ = WarningHandler();
   return &base_warning_handler_;
 };
-static thread_local WarningHandler* warning_handler_ = getHandler();
+
+class ThreadWarningHandler {
+  public:
+    ThreadWarningHandler() = delete;
+
+    static WarningHandler* get_handler() {
+      if (!warning_handler_) {
+        warning_handler_ = getBaseHandler();
+      }
+      return warning_handler_;
+    }
+
+    static void set_handler(WarningHandler* handler) {
+      warning_handler_ = handler;
+    }
+
+  private:
+    static thread_local WarningHandler* warning_handler_;
+};
+
+thread_local WarningHandler* ThreadWarningHandler::warning_handler_ = nullptr;
+
 }

 void warn(SourceLocation source_location, const std::string& msg) {
-  warning_handler_->process(source_location, msg);
+  ThreadWarningHandler::get_handler()->process(source_location, msg);
 }

 void set_warning_handler(WarningHandler* handler) noexcept(true) {
-  warning_handler_ = handler;
+  ThreadWarningHandler::set_handler(handler);
 }

 WarningHandler* get_warning_handler() noexcept(true) {
-  return warning_handler_;
+  return ThreadWarningHandler::get_handler();
 }

 } // namespace Warning
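Note: because the handler pointer is now thread_local, set_warning_handler only affects the calling thread, and a null handler falls back to the shared base handler. A sketch of installing a per-thread handler, assuming WarningHandler::process has the (source_location, msg) signature used in warn() above:

    #include <c10/util/Exception.h>
    #include <string>
    #include <vector>

    // Collects warnings raised on the current thread instead of printing them.
    struct CollectingHandler : c10::WarningHandler {
      std::vector<std::string> messages;
      void process(const c10::SourceLocation& loc, const std::string& msg) override {
        messages.push_back(msg);
      }
    };

    void run_quietly() {
      CollectingHandler handler;
      c10::Warning::set_warning_handler(&handler);  // this thread only
      TORCH_WARN("something noteworthy");           // routed to the handler
      c10::Warning::set_warning_handler(nullptr);   // restore the base handler
    }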


@@ -67,7 +67,7 @@ struct maybe_real<true, src_t> {

 template <typename dest_t, typename src_t>
 struct static_cast_with_inter_type {
-  C10_HOST_DEVICE static inline dest_t apply(src_t src) {
+  C10_HOST_DEVICE __ubsan_ignore_float_cast_overflow__ static inline dest_t apply(src_t src) {
     constexpr bool real = needs_real<dest_t, src_t>::value;
     return static_cast<dest_t>(
         static_cast<inter_copy_type_t<dest_t>>(maybe_real<real, src_t>::apply(src)));


@@ -395,6 +395,8 @@ of 16
 .. autofunction:: all_gather_multigpu

+.. _distributed-launch:
+
 Launch utility
 --------------


@@ -306,20 +306,30 @@ to overlap data transfers with computation.
 You can make the :class:`~torch.utils.data.DataLoader` return batches placed in
 pinned memory by passing ``pin_memory=True`` to its constructor.

-.. _cuda-nn-dataparallel-instead:
+.. _cuda-nn-ddp-instead:

-Use nn.DataParallel instead of multiprocessing
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Most use cases involving batched inputs and multiple GPUs should default to
-using :class:`~torch.nn.DataParallel` to utilize more than one GPU. Even with
-the GIL, a single Python process can saturate multiple GPUs.
-
-As of version 0.1.9, large numbers of GPUs (8+) might not be fully utilized.
-However, this is a known issue that is under active development. As always,
-test your use case.
+using :class:`~torch.nn.parallel.DistributedDataParallel` to utilize more
+than one GPU.

 There are significant caveats to using CUDA models with
 :mod:`~torch.multiprocessing`; unless care is taken to meet the data handling
 requirements exactly, it is likely that your program will have incorrect or
 undefined behavior.
+
+It is recommended to use :class:`~torch.nn.parallel.DistributedDataParallel`,
+instead of :class:`~torch.nn.DataParallel` to do multi-GPU training, even if
+there is only a single node.
+
+The difference between :class:`~torch.nn.parallel.DistributedDataParallel` and
+:class:`~torch.nn.DataParallel` is: :class:`~torch.nn.parallel.DistributedDataParallel`
+uses multiprocessing where a process is created for each GPU, while
+:class:`~torch.nn.DataParallel` uses multithreading. By using multiprocessing,
+each GPU has its dedicated process, this avoids the performance overhead caused
+by GIL of Python interpreter.
+
+If you use :class:`~torch.nn.parallel.DistributedDataParallel`, you could use
+`torch.distributed.launch` utility to launch your program, see :ref:`distributed-launch`.


@@ -45,7 +45,7 @@ the consumer process has references to the tensor, and the refcounting can not
 save you if the consumer process exits abnormally via a fatal signal. See
 :ref:`this section <multiprocessing-cuda-sharing-details>`.

-See also: :ref:`cuda-nn-dataparallel-instead`
+See also: :ref:`cuda-nn-ddp-instead`

 Best practices and tips


@@ -210,3 +210,25 @@ Example::
     (1, 5)

 For more information on ``torch.sparse_coo`` tensors, see :ref:`sparse-docs`.
+
+torch.memory_format
+-------------------
+
+.. class:: torch.memory_format
+
+A :class:`torch.memory_format` is an object representing the memory format on which a :class:`torch.Tensor` is
+or will be allocated.
+
+Possible values are:
+
+- ``torch.contiguous_format``:
+  Tensor is or will be allocated in dense non-overlapping memory. Strides represented by values in decreasing order.
+
+- ``torch.channels_last``:
+  Tensor is or will be allocated in dense non-overlapping memory. Strides represented by values in
+  ``strides[0] > strides[2] > strides[3] > strides[1] == 1`` aka NHWC order.
+
+- ``torch.preserve_format``:
+  Used in functions like `clone` to preserve the memory format of the input tensor. If input tensor is
+  allocated in dense non-overlapping memory, the output tensor strides will be copied from the input.
+  Otherwise output strides will follow ``torch.contiguous_format``.
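Note: a short sketch of how these formats look from the C++ frontend (the Python constants map to c10::MemoryFormat values); the strides in the comment are for a 2x3x4x5 input:

    #include <torch/torch.h>
    #include <iostream>

    int main() {
      auto t = torch::randn({2, 3, 4, 5});  // NCHW, contiguous_format
      auto nhwc = t.contiguous(at::MemoryFormat::ChannelsLast);
      // channels_last keeps the logical NCHW sizes but reorders strides so
      // that the channel stride is 1: [60, 1, 15, 3] for this input.
      std::cout << nhwc.strides() << std::endl;
      // preserve_format: clone keeps the channels_last layout of its input.
      auto copy = nhwc.clone(at::MemoryFormat::Preserve);
    }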


@@ -49,8 +49,10 @@ For reference, here's a full list of view ops in PyTorch:

 - Basic slicing and indexing op, e.g. ``tensor[0, 2:, 1:7:2]`` returns a view of base ``tensor``, see note below.
 - :meth:`~torch.Tensor.as_strided`
+- :meth:`~torch.Tensor.detach`
 - :meth:`~torch.Tensor.diagonal`
 - :meth:`~torch.Tensor.expand`
+- :meth:`~torch.Tensor.expand_as`
 - :meth:`~torch.Tensor.narrow`
 - :meth:`~torch.Tensor.permute`
 - :meth:`~torch.Tensor.select`
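Note: detach and expand_as are on this list because they return views that share storage with their base. A small sketch:

    #include <torch/torch.h>
    #include <cassert>

    int main() {
      auto base = torch::zeros({3, 3});
      auto view = base.detach();  // same storage, cut out of the autograd graph
      view.fill_(42);             // mutates base as well
      assert(base[0][0].item<float>() == 42);

      auto row = torch::tensor({1., 2., 3.});
      auto expanded = row.expand_as(torch::empty({3, 3}));  // stride 0 on dim 0
      assert(expanded.data_ptr() == row.data_ptr());        // storage is shared
    }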


@@ -495,6 +495,8 @@ view of a storage and defines numeric operations on it.
    .. automethod:: tril_
    .. automethod:: triu
    .. automethod:: triu_
+   .. automethod:: true_divide
+   .. automethod:: true_divide_
    .. automethod:: trunc
    .. automethod:: trunc_
    .. automethod:: type


@@ -352,10 +352,10 @@ def build_deps():
 ################################################################################

 # the list of runtime dependencies required by this built package
-install_requires = []
+install_requires = ['future']

 if sys.version_info <= (2, 7):
-    install_requires += ['future', 'typing']
+    install_requires += ['typing']

 missing_pydep = '''
 Missing build dependency: Unable to `import {importname}`.


@@ -115,6 +115,10 @@ white_list = [
     ('aten::confirmed_by_owner', datetime.date(2020, 3, 17)),
     ('aten::owner', datetime.date(2020, 3, 27)),
     ('aten::owner_name', datetime.date(2020, 3, 27)),
+    ('_xnnpack::conv2d_packed', datetime.date(2020, 4, 2)),
+    ('_xnnpack::conv2d_prepack', datetime.date(2020, 4, 2)),
+    ('_xnnpack::linear_packed', datetime.date(2020, 4, 2)),
+    ('_xnnpack::linear_prepack', datetime.date(2020, 4, 2)),
 ]

@@ -176,6 +180,9 @@ if __name__ == '__main__':
             line = f.readline()
             if not line:
                 break
+            if "torch.classes" in line:
+                # TODO Fix type __torch__.torch.classes.xxx
+                continue

             s = parse_schema(line.strip())
             slist = new_schema_dict.get(s.name, [])


@@ -293,7 +293,7 @@ TEST_F(FunctionalTest, MultiLabelSoftMarginLossWeightedNoReduction) {
   auto input = torch::tensor({{0., 2., 2., 0.}, {2., 1., 0., 1.}}, torch::dtype(torch::kFloat).requires_grad(true));
   auto target = torch::tensor({{0., 0., 1., 0.}, {1., 0., 1., 1.}}, torch::kFloat);
   auto weight = torch::tensor({0.1, 0.6, 0.4, 0.8}, torch::kFloat);
-  auto options = F::MultiLabelSoftMarginLossFuncOptions().reduction(torch::kNone).weight(weight);
+  auto options = F::MultilabelSoftMarginLossFuncOptions().reduction(torch::kNone).weight(weight);
   auto output =
       F::multilabel_soft_margin_loss(input, target, options);
   auto expected = torch::tensor({0.4876902, 0.3321295}, torch::kFloat);
@@ -1875,7 +1875,7 @@ TEST_F(FunctionalTest, Interpolate) {
   // 1D interpolation
   auto input = torch::ones({1, 1, 2});
   auto options = F::InterpolateFuncOptions()
-                     .size({4})
+                     .size(std::vector<int64_t>({4}))
                      .mode(torch::kNearest);
   auto output = F::interpolate(input, options);
   auto expected = torch::ones({1, 1, 4});
@@ -1889,7 +1889,7 @@ TEST_F(FunctionalTest, Interpolate) {
   for (const auto scale_factor : {0.5, 1.5, 2.0}) {
     auto input = torch::ones({1, 1, 2, 2});
     auto options = F::InterpolateFuncOptions()
-                       .scale_factor({scale_factor, scale_factor})
+                       .scale_factor(std::vector<double>({scale_factor, scale_factor}))
                        .mode(torch::kBilinear)
                        .align_corners(align_corners);
     auto output = F::interpolate(input, options);
@@ -1908,7 +1908,7 @@ TEST_F(FunctionalTest, Interpolate) {
     auto input = torch::ones({1, 1, 2, 2, 2});
     auto options =
         F::InterpolateFuncOptions()
-            .scale_factor({scale_factor, scale_factor, scale_factor})
+            .scale_factor(std::vector<double>({scale_factor, scale_factor, scale_factor}))
             .mode(torch::kTrilinear)
             .align_corners(align_corners);
     auto output = F::interpolate(input, options);
@@ -1924,13 +1924,13 @@ TEST_F(FunctionalTest, Interpolate) {
   {
     auto input = torch::randn({3, 2, 2});
     ASSERT_THROWS_WITH(
-        F::interpolate(input[0], F::InterpolateFuncOptions().size({4, 4})),
+        F::interpolate(input[0], F::InterpolateFuncOptions().size(std::vector<int64_t>({4, 4}))),
         "Input Error: Only 3D, 4D and 5D input Tensors supported (got 2D) "
         "for the modes: nearest | linear | bilinear | bicubic | trilinear (got kNearest)");
     ASSERT_THROWS_WITH(
         F::interpolate(
             torch::reshape(input, {1, 1, 1, 3, 2, 2}),
-            F::InterpolateFuncOptions().size({1, 1, 1, 3, 4, 4})),
+            F::InterpolateFuncOptions().size(std::vector<int64_t>({1, 1, 1, 3, 4, 4}))),
         "Input Error: Only 3D, 4D and 5D input Tensors supported (got 6D) "
         "for the modes: nearest | linear | bilinear | bicubic | trilinear (got kNearest)");
     ASSERT_THROWS_WITH(
@@ -1939,12 +1939,12 @@ TEST_F(FunctionalTest, Interpolate) {
     ASSERT_THROWS_WITH(
         F::interpolate(
             input,
-            F::InterpolateFuncOptions().size({3, 4, 4}).scale_factor({0.5})),
+            F::InterpolateFuncOptions().size(std::vector<int64_t>({3, 4, 4})).scale_factor(std::vector<double>({0.5}))),
         "only one of size or scale_factor should be defined");
     ASSERT_THROWS_WITH(
-        F::interpolate(input, F::InterpolateFuncOptions().scale_factor({3, 2})),
+        F::interpolate(input, F::InterpolateFuncOptions().scale_factor(std::vector<double>({3, 2}))),
         "scale_factor shape must match input shape. "
-        "Input is 1D, scale_factor size is 2");
+        "Input is 1D, scale_factor size is [3, 2]");
     ASSERT_THROWS_WITH(
         F::interpolate(
             input,
@@ -2328,9 +2328,15 @@ TEST_F(FunctionalTest, AlphaDropout) {
   auto input_std = input.std();
   for (const auto rate : {0.2, 0.5, 0.8}) {
-    auto output = F::alpha_dropout(input, F::AlphaDropoutFuncOptions().p(rate).training(false));
-    ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));
-    ASSERT_TRUE(torch::allclose(input_std, output.std(), 0.1));
+    for (const auto inplace : {false, true}) {
+      auto input_ = input.clone();
+      auto output = F::alpha_dropout(input_, F::AlphaDropoutFuncOptions().p(rate).training(false).inplace(inplace));
+      ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));
+      ASSERT_TRUE(torch::allclose(input_std, output.std(), 0.1));
+      if (inplace) {
+        ASSERT_TRUE(torch::allclose(input_, output));
+      }
+    }
   }
   auto output = F::detail::alpha_dropout(input, 0.5, false, false);
   ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));
@@ -2343,9 +2349,15 @@ TEST_F(FunctionalTest, FeatureAlphaDropout) {
   auto input_std = input.std();
   for (const auto rate : {0.2, 0.5, 0.8}) {
-    auto output = F::feature_alpha_dropout(input, F::FeatureAlphaDropoutFuncOptions().p(rate).training(false));
-    ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));
-    ASSERT_TRUE(torch::allclose(input_std, output.std(), 0.1));
+    for (const auto inplace : {false, true}) {
+      auto input_ = input.clone();
+      auto output = F::feature_alpha_dropout(input_, F::FeatureAlphaDropoutFuncOptions().p(rate).training(false).inplace(inplace));
+      ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));
+      ASSERT_TRUE(torch::allclose(input_std, output.std(), 0.1));
+      if (inplace) {
+        ASSERT_TRUE(torch::allclose(input_, output));
+      }
+    }
   }
   auto output = F::feature_alpha_dropout(input);
   ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));
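Note: the test changes above reflect that size and scale_factor now take explicit std::vector arguments, which disambiguates the option's overloads. A sketch of the resulting call:

    #include <torch/torch.h>

    namespace F = torch::nn::functional;

    int main() {
      auto input = torch::ones({1, 1, 2, 2});
      // One scale per spatial dimension; the explicit std::vector<double>
      // is what the updated tests spell out.
      auto output = F::interpolate(
          input,
          F::InterpolateFuncOptions()
              .scale_factor(std::vector<double>({2.0, 2.0}))
              .mode(torch::kNearest));
      // output has shape [1, 1, 4, 4]
    }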


@@ -1300,54 +1300,81 @@ TEST_F(ModulesTest, FeatureAlphaDropout) {
 }

 TEST_F(ModulesTest, Dropout) {
-  Dropout dropout(0.5);
-  torch::Tensor x = torch::ones(100, torch::requires_grad());
-  torch::Tensor y = dropout(x);
+  for (const auto inplace : {false, true}) {
+    Dropout dropout(DropoutOptions(0.5).inplace(inplace));
+    torch::Tensor x = torch::ones(100);
+    if (!inplace) {
+      x.requires_grad_(true);
+    }
+    torch::Tensor y = dropout(x);

-  y.backward(torch::ones_like(y));
-  ASSERT_EQ(y.ndimension(), 1);
-  ASSERT_EQ(y.size(0), 100);
-  ASSERT_LT(y.sum().item<float>(), 130); // Probably
-  ASSERT_GT(y.sum().item<float>(), 70); // Probably
+    ASSERT_EQ(y.ndimension(), 1);
+    ASSERT_EQ(y.size(0), 100);
+    ASSERT_LT(y.sum().item<float>(), 130); // Probably
+    ASSERT_GT(y.sum().item<float>(), 70); // Probably
+    if (inplace) {
+      ASSERT_TRUE(y.allclose(x));
+    } else {
+      y.backward(torch::ones_like(y));
+    }

-  dropout->eval();
-  y = dropout(x);
-  ASSERT_EQ(y.sum().item<float>(), 100);
+    dropout->eval();
+    y = dropout(torch::ones(100));
+    ASSERT_EQ(y.sum().item<float>(), 100);
+  }
 }

 TEST_F(ModulesTest, Dropout2d) {
-  Dropout2d dropout(0.5);
-  torch::Tensor x = torch::ones({10, 10}, torch::requires_grad());
-  torch::Tensor y = dropout(x);
+  for (const auto inplace : {false, true}) {
+    Dropout2d dropout(Dropout2dOptions(0.5).inplace(inplace));
+    torch::Tensor x = torch::ones({10, 10});
+    if (!inplace) {
+      x.requires_grad_(true);
+    }
+    torch::Tensor y = dropout(x);

-  y.backward(torch::ones_like(y));
-  ASSERT_EQ(y.ndimension(), 2);
-  ASSERT_EQ(y.size(0), 10);
-  ASSERT_EQ(y.size(1), 10);
-  ASSERT_LT(y.sum().item<float>(), 130); // Probably
-  ASSERT_GT(y.sum().item<float>(), 70); // Probably
+    ASSERT_EQ(y.ndimension(), 2);
+    ASSERT_EQ(y.size(0), 10);
+    ASSERT_EQ(y.size(1), 10);
+    ASSERT_LT(y.sum().item<float>(), 130); // Probably
+    ASSERT_GT(y.sum().item<float>(), 70); // Probably
+    if (inplace) {
+      ASSERT_TRUE(y.allclose(x));
+    } else {
+      y.backward(torch::ones_like(y));
+    }

-  dropout->eval();
-  y = dropout(x);
-  ASSERT_EQ(y.sum().item<float>(), 100);
+    dropout->eval();
+    y = dropout(torch::ones({10, 10}));
+    ASSERT_EQ(y.sum().item<float>(), 100);
+  }
 }

 TEST_F(ModulesTest, Dropout3d) {
-  Dropout3d dropout(0.5);
-  torch::Tensor x = torch::ones({4, 5, 5}, torch::requires_grad());
-  torch::Tensor y = dropout(x);
+  for (const auto inplace : {false, true}) {
+    Dropout3d dropout(Dropout3dOptions(0.5).inplace(inplace));
+    torch::Tensor x = torch::ones({4, 5, 5});
+    if (!inplace) {
+      x.requires_grad_(true);
+    }
+    torch::Tensor y = dropout(x);

-  y.backward(torch::ones_like(y));
-  ASSERT_EQ(y.ndimension(), 3);
-  ASSERT_EQ(y.size(0), 4);
-  ASSERT_EQ(y.size(1), 5);
-  ASSERT_EQ(y.size(1), 5);
-  ASSERT_LT(y.sum().item<float>(), 130); // Probably
-  ASSERT_GT(y.sum().item<float>(), 70); // Probably
+    ASSERT_EQ(y.ndimension(), 3);
+    ASSERT_EQ(y.size(0), 4);
+    ASSERT_EQ(y.size(1), 5);
+    ASSERT_EQ(y.size(1), 5);
+    ASSERT_LT(y.sum().item<float>(), 130); // Probably
+    ASSERT_GT(y.sum().item<float>(), 70); // Probably
+    if (inplace) {
+      ASSERT_TRUE(y.allclose(x));
+    } else {
+      y.backward(torch::ones_like(y));
+    }

-  dropout->eval();
-  y = dropout(x);
-  ASSERT_EQ(y.sum().item<float>(), 100);
+    dropout->eval();
+    y = dropout(torch::ones({4, 5, 5}));
+    ASSERT_EQ(y.sum().item<float>(), 100);
+  }
 }

 TEST_F(ModulesTest, Parameters) {
@@ -2147,38 +2174,58 @@ TEST_F(ModulesTest, PairwiseDistance) {
 TEST_F(ModulesTest, ELU) {
   const auto size = 3;
   for (const auto alpha : {0.0, 0.42, 1.0, 4.2, 42.42}) {
-    ELU model {ELUOptions().alpha(alpha)};
-    auto x = torch::linspace(-10.0, 10.0, size * size * size);
-    x.resize_({size, size, size}).set_requires_grad(true);
-    auto y = model(x);
-    torch::Tensor s = y.sum();
+    for (const auto inplace : {false, true}) {
+      ELU model {ELUOptions().alpha(alpha).inplace(inplace)};
+      auto x = torch::linspace(-10.0, 10.0, size * size * size);
+      x.resize_({size, size, size});
+      if (!inplace) {
+        x.requires_grad_(true);
+      }
+      auto x_orig = x.clone();
+      auto y = model(x);
+      torch::Tensor s = y.sum();

-    s.backward();
-    ASSERT_EQ(s.ndimension(), 0);
-
-    ASSERT_EQ(y.ndimension(), 3);
-    ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
-    auto y_exp = torch::max(torch::zeros_like(x), x) +
-        torch::min(torch::zeros_like(x), alpha * (torch::exp(x) - 1.0));
-    ASSERT_TRUE(torch::allclose(y, y_exp));
+      ASSERT_EQ(s.ndimension(), 0);
+      ASSERT_EQ(y.ndimension(), 3);
+      ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
+      auto y_exp = torch::max(torch::zeros_like(x_orig), x_orig) +
+          torch::min(torch::zeros_like(x_orig), alpha * (torch::exp(x_orig) - 1.0));
+      ASSERT_TRUE(torch::allclose(y, y_exp));
+      if (inplace) {
+        ASSERT_TRUE(torch::allclose(x, y_exp));
+      } else {
+        s.backward();
+      }
+    }
   }
 }

 TEST_F(ModulesTest, SELU) {
-  SELU model;
-  auto input = torch::randn({5, 5}, torch::requires_grad());
-  auto output = model->forward(input);
-  const double scale = 1.0507009873554804934193349852946;
-  const double alpha = 1.6732632423543772848170429916717;
-  auto zero = torch::zeros_like(input);
-  auto expected = scale *
-      (torch::max(zero, input) +
-       torch::min(zero, alpha * (torch::exp(input) - 1)));
-  auto s = output.sum();
-  s.backward();
+  for (const auto inplace : {false, true}) {
+    SELU model(inplace);
+    auto input = torch::randn({5, 5});
+    if (!inplace) {
+      input.requires_grad_(true);
+    }
+    auto input_orig = input.clone();
+    auto output = model->forward(input);
+    const double scale = 1.0507009873554804934193349852946;
+    const double alpha = 1.6732632423543772848170429916717;
+    auto zero = torch::zeros_like(input);
+    auto expected = scale *
+        (torch::max(zero, input_orig) +
+         torch::min(zero, alpha * (torch::exp(input_orig) - 1)));
+    auto s = output.sum();

-  ASSERT_EQ(s.ndimension(), 0);
-  ASSERT_TRUE(output.allclose(expected));
+    ASSERT_EQ(s.ndimension(), 0);
+    ASSERT_TRUE(output.allclose(expected));
+    if (inplace) {
+      ASSERT_TRUE(input.allclose(expected));
+    } else {
+      s.backward();
+    }
+  }
 }

 TEST_F(ModulesTest, Hardshrink) {
@@ -2192,7 +2239,6 @@ TEST_F(ModulesTest, Hardshrink) {
     s.backward();
     ASSERT_EQ(s.ndimension(), 0);
-
     ASSERT_EQ(y.ndimension(), 3);
     ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
     auto y_exp = (x.abs() > lambda) * x;
@@ -2204,21 +2250,30 @@ TEST_F(ModulesTest, Hardtanh) {
   const auto size = 3;
   for (const auto min_val : {-4.2, -1.0, -0.42, 0.0}) {
     for (const auto max_val : {0.42, 1.0, 4.2}) {
-      Hardtanh model {HardtanhOptions().min_val(min_val).max_val(max_val)};
-      auto x = torch::linspace(-10.0, 10.0, size * size * size);
-      x.resize_({size, size, size}).set_requires_grad(true);
-      auto y = model(x);
-      torch::Tensor s = y.sum();
+      for (const auto inplace : {false, true}) {
+        Hardtanh model {HardtanhOptions().min_val(min_val).max_val(max_val).inplace(inplace)};
+        auto x = torch::linspace(-10.0, 10.0, size * size * size);
+        x.resize_({size, size, size});
+        if (!inplace) {
+          x.requires_grad_(true);
+        }
+        auto x_orig = x.clone();
+        auto y = model(x);
+        torch::Tensor s = y.sum();

-      s.backward();
-      ASSERT_EQ(s.ndimension(), 0);
-
-      ASSERT_EQ(y.ndimension(), 3);
-      ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
-      auto y_exp = (x < min_val) * min_val +
-                   ((x >= min_val) * (x <= max_val)) * x +
-                   (x > max_val) * max_val;
-      ASSERT_TRUE(torch::allclose(y, y_exp));
+        ASSERT_EQ(s.ndimension(), 0);
+        ASSERT_EQ(y.ndimension(), 3);
+        ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
+        auto y_exp = (x_orig < min_val) * min_val +
+                     ((x_orig >= min_val) * (x_orig <= max_val)) * x_orig +
+                     (x_orig > max_val) * max_val;
+        ASSERT_TRUE(torch::allclose(y, y_exp));
+        if (inplace) {
+          ASSERT_TRUE(torch::allclose(x, y_exp));
+        } else {
+          s.backward();
+        }
+      }
     }
   }
 }
@@ -2238,20 +2293,29 @@ TEST_F(ModulesTest, HardtanhMinValGEMaxVal) {
 TEST_F(ModulesTest, LeakyReLU) {
   const auto size = 3;
-  for (const auto negative_slope : {0.0, 0.42, 1.0}) {
-    LeakyReLU model {LeakyReLUOptions().negative_slope(negative_slope)};
-    auto x = torch::linspace(-10.0, 10.0, size * size * size);
-    x.resize_({size, size, size}).set_requires_grad(true);
-    auto y = model(x);
-    torch::Tensor s = y.sum();
+  for (const auto inplace : {false, true}) {
+    for (const auto negative_slope : {0.0, 0.42, 1.0}) {
+      LeakyReLU model {LeakyReLUOptions().negative_slope(negative_slope).inplace(inplace)};
+      auto x = torch::linspace(-10.0, 10.0, size * size * size);
+      x.resize_({size, size, size});
+      if (!inplace) {
+        x.requires_grad_(true);
+      }
+      auto x_orig = x.clone();
+      auto y = model(x);
+      torch::Tensor s = y.sum();

-    s.backward();
-    ASSERT_EQ(s.ndimension(), 0);
-
-    ASSERT_EQ(y.ndimension(), 3);
-    ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
-    auto y_exp = (x < 0) * x * negative_slope + (x >= 0) * x;
-    ASSERT_TRUE(torch::allclose(y, y_exp));
+      ASSERT_EQ(s.ndimension(), 0);
+      ASSERT_EQ(y.ndimension(), 3);
+      ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
+      auto y_exp = (x_orig < 0) * x_orig * negative_slope + (x_orig >= 0) * x_orig;
+      ASSERT_TRUE(torch::allclose(y, y_exp));
+      if (inplace) {
+        ASSERT_TRUE(torch::allclose(x, y_exp));
+      } else {
+        s.backward();
+      }
+    }
   }
 }
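Note: the pattern these tests repeat: with inplace(true) the module writes its result into its input, so the tests skip requires_grad (in-place ops on leaf variables requiring grad are rejected by autograd) and additionally compare the input against the expected output. Condensed:

    #include <torch/torch.h>
    #include <cassert>

    int main() {
      torch::nn::ReLU relu(torch::nn::ReLUOptions().inplace(true));
      auto x = torch::linspace(-1.0, 1.0, 5);
      auto y = relu(x);
      // With inplace(true) the result is written into x itself.
      assert(torch::allclose(x, y));
    }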
@@ -2394,78 +2458,114 @@ TEST_F(ModulesTest, PReLU) {
 }

 TEST_F(ModulesTest, ReLU) {
-  const auto size = 3;
-  ReLU model;
-  auto x = torch::linspace(-10.0, 10.0, size * size * size);
-  x.resize_({size, size, size}).set_requires_grad(true);
-  auto y = model(x);
-  torch::Tensor s = y.sum();
+  for (const auto inplace : {false, true}) {
+    const auto size = 3;
+    ReLU model(inplace);
+    auto x = torch::linspace(-10.0, 10.0, size * size * size);
+    x.resize_({size, size, size});
+    if (!inplace) {
+      x.requires_grad_(true);
+    }
+    auto x_orig = x.clone();
+    auto y = model(x);
+    torch::Tensor s = y.sum();

-  s.backward();
-  ASSERT_EQ(s.ndimension(), 0);
-
-  ASSERT_EQ(y.ndimension(), 3);
-  ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
-  auto y_exp = (x < 0) * 0 + (x >= 0) * x;
-  ASSERT_TRUE(torch::allclose(y, y_exp));
+    ASSERT_EQ(s.ndimension(), 0);
+    ASSERT_EQ(y.ndimension(), 3);
+    ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
+    auto y_exp = (x_orig < 0) * 0 + (x_orig >= 0) * x_orig;
+    ASSERT_TRUE(torch::allclose(y, y_exp));
+    if (inplace) {
+      ASSERT_TRUE(torch::allclose(x, y_exp));
+    } else {
+      s.backward();
+    }
+  }
 }

 TEST_F(ModulesTest, ReLU6) {
-  const auto size = 3;
-  ReLU6 model;
-  auto x = torch::linspace(-10.0, 10.0, size * size * size);
-  x.resize_({size, size, size}).set_requires_grad(true);
-  auto y = model(x);
-  torch::Tensor s = y.sum();
+  for (const auto inplace : {false, true}) {
+    const auto size = 3;
+    ReLU6 model(inplace);
+    auto x = torch::linspace(-10.0, 10.0, size * size * size);
+    x.resize_({size, size, size});
+    if (!inplace) {
+      x.requires_grad_(true);
+    }
+    auto x_orig = x.clone();
+    auto y = model(x);
+    torch::Tensor s = y.sum();

-  s.backward();
-  ASSERT_EQ(s.ndimension(), 0);
-
-  ASSERT_EQ(y.ndimension(), 3);
-  ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
-  auto y_exp = (x < 0) * 0 + ((x >= 0) * (x <= 6)) * x + (x > 6) * 6;
-  ASSERT_TRUE(torch::allclose(y, y_exp));
+    ASSERT_EQ(s.ndimension(), 0);
+    ASSERT_EQ(y.ndimension(), 3);
+    ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
+    auto y_exp = (x_orig < 0) * 0 + ((x_orig >= 0) * (x_orig <= 6)) * x_orig + (x_orig > 6) * 6;
+    ASSERT_TRUE(torch::allclose(y, y_exp));
+    if (inplace) {
+      ASSERT_TRUE(torch::allclose(x, y_exp));
+    } else {
+      s.backward();
+    }
+  }
 }

 TEST_F(ModulesTest, RReLU) {
   const auto size = 3;
   for (const auto lower : {0.01, 0.1, 0.2}) {
     for (const auto upper : {0.3, 0.4, 0.5}) {
-      RReLU model {RReLUOptions().lower(lower).upper(upper)};
-      auto x = torch::linspace(-10.0, 10.0, size * size * size);
-      x.resize_({size, size, size}).set_requires_grad(true);
-      auto y = model(x);
-      torch::Tensor s = y.sum();
+      for (const auto inplace : {false, true}) {
+        RReLU model {RReLUOptions().lower(lower).upper(upper).inplace(inplace)};
+        auto x = torch::linspace(-10.0, 10.0, size * size * size);
+        x.resize_({size, size, size});
+        if (!inplace) {
+          x.requires_grad_(true);
+        }
+        auto x_orig = x.clone();
+        auto y = model(x);
+        torch::Tensor s = y.sum();

-      s.backward();
-      ASSERT_EQ(s.ndimension(), 0);
-
-      ASSERT_EQ(y.ndimension(), 3);
-      ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
-      auto z = ((x >= 0) * (x == y) +
-          (x < 0) * (y >= x * upper) * (y <= lower * x)) * 1.0;
-      ASSERT_TRUE(torch::allclose(z, torch::ones_like(z)));
+        ASSERT_EQ(s.ndimension(), 0);
+        ASSERT_EQ(y.ndimension(), 3);
+        ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
+        auto z = ((x_orig >= 0) * (x_orig == y) +
+            (x_orig < 0) * (y >= x_orig * upper) * (y <= lower * x_orig)) * 1.0;
+        ASSERT_TRUE(torch::allclose(z, torch::ones_like(z)));
+        if (inplace) {
+          ASSERT_TRUE(torch::allclose(x, y));
+        } else {
+          s.backward();
+        }
+      }
     }
   }
 }

 TEST_F(ModulesTest, CELU) {
   const auto size = 3;
-  for (const auto alpha : {0.42, 1.0, 4.2, 42.42}) {
-    CELU model {CELUOptions().alpha(alpha)};
-    auto x = torch::linspace(-10.0, 10.0, size * size * size);
-    x.resize_({size, size, size}).set_requires_grad(true);
-    auto y = model(x);
-    torch::Tensor s = y.sum();
+  for (const auto inplace : {false, true}) {
+    for (const auto alpha : {0.42, 1.0, 4.2, 42.42}) {
+      CELU model {CELUOptions().alpha(alpha).inplace(inplace)};
+      auto x = torch::linspace(-10.0, 10.0, size * size * size);
+      x.resize_({size, size, size});
+      if (!inplace) {
+        x.requires_grad_(true);
+      }
+      auto x_orig = x.clone();
+      auto y = model(x);
+      torch::Tensor s = y.sum();

-    s.backward();
-    ASSERT_EQ(s.ndimension(), 0);
-
-    ASSERT_EQ(y.ndimension(), 3);
-    ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
-    auto y_exp = torch::max(torch::zeros_like(x), x) +
-        torch::min(torch::zeros_like(x), alpha * (torch::exp(x / alpha) - 1.0));
-    ASSERT_TRUE(torch::allclose(y, y_exp));
+      ASSERT_EQ(s.ndimension(), 0);
+      ASSERT_EQ(y.ndimension(), 3);
+      ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
+      auto y_exp = torch::max(torch::zeros_like(x_orig), x_orig) +
+          torch::min(torch::zeros_like(x_orig), alpha * (torch::exp(x_orig / alpha) - 1.0));
+      ASSERT_TRUE(torch::allclose(y, y_exp));
+      if (inplace) {
+        ASSERT_TRUE(torch::allclose(x, y_exp));
+      } else {
+        s.backward();
+      }
+    }
   }
 }
@@ -2597,12 +2697,16 @@ TEST_F(ModulesTest, Threshold) {
       Threshold model {ThresholdOptions(threshold, value).inplace(inplace)};
       auto x = torch::linspace(-3.0, 3.0, 61);
       x.resize_({size, size, size});
-      auto y_exp = (x <= threshold) * value + (x > threshold) * x;
+      auto x_orig = x.clone();
+      auto y_exp = (x_orig <= threshold) * value + (x_orig > threshold) * x_orig;
       auto y = model(x);

       ASSERT_EQ(y.ndimension(), 3);
       ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
       ASSERT_TRUE(torch::allclose(y, y_exp));
+      if (inplace) {
+        ASSERT_TRUE(torch::allclose(x, y_exp));
+      }
     }
   }
 }
@@ -2611,7 +2715,7 @@ TEST_F(ModulesTest, Threshold) {
 TEST_F(ModulesTest, Upsampling1D) {
   {
     Upsample model(UpsampleOptions()
-                       .size({4})
+                       .size(std::vector<int64_t>({4}))
                        .mode(torch::kNearest));
     auto input = torch::ones({1, 1, 2}, torch::requires_grad());
     auto output = model->forward(input);
@@ -2627,7 +2731,7 @@ TEST_F(ModulesTest, Upsampling1D) {
     // test float scale factor up & down sampling
     for (const auto scale_factor : {0.5, 1.5, 2.0}) {
       Upsample model(UpsampleOptions()
-                         .scale_factor({scale_factor})
+                         .scale_factor(std::vector<double>({scale_factor}))
                          .mode(torch::kLinear)
                          .align_corners(align_corners));
       auto input = torch::ones({1, 1, 2}, torch::requires_grad());
@@ -2646,7 +2750,7 @@ TEST_F(ModulesTest, Upsampling1D) {
   {
     // linear (1D) upsampling spatial invariance
     Upsample model(UpsampleOptions()
-                       .scale_factor({3})
+                       .scale_factor(std::vector<double>({3}))
                        .mode(torch::kLinear)
                        .align_corners(false));
     auto input = torch::zeros({1, 1, 9});
@@ -2661,7 +2765,7 @@ TEST_F(ModulesTest, Upsampling1D) {
 TEST_F(ModulesTest, Upsampling2D) {
   {
     Upsample model(UpsampleOptions()
-                       .size({4, 4})
+                       .size(std::vector<int64_t>({4, 4}))
                        .mode(torch::kNearest));
     auto input = torch::ones({1, 1, 2, 2}, torch::requires_grad());
     auto output = model->forward(input);
@@ -2677,7 +2781,7 @@ TEST_F(ModulesTest, Upsampling2D) {
     // test float scale factor up & down sampling
     for (const auto scale_factor : {0.5, 1.5, 2.0}) {
       Upsample model(UpsampleOptions()
-                         .scale_factor({scale_factor, scale_factor})
+                         .scale_factor(std::vector<double>({scale_factor, scale_factor}))
                          .mode(torch::kBilinear)
                          .align_corners(align_corners));
       auto input = torch::ones({1, 1, 2, 2}, torch::requires_grad());
@@ -2698,7 +2802,7 @@ TEST_F(ModulesTest, Upsampling2D) {
     // test float scale factor up & down sampling
     for (const auto scale_factor : {0.5, 1.5, 2.0}) {
       Upsample model(UpsampleOptions()
-                         .scale_factor({scale_factor, scale_factor})
+                         .scale_factor(std::vector<double>({scale_factor, scale_factor}))
                          .mode(torch::kBicubic)
                          .align_corners(align_corners));
       auto input = torch::ones({1, 1, 2, 2}, torch::requires_grad());
@@ -2719,7 +2823,7 @@ TEST_F(ModulesTest, Upsampling2D) {
 TEST_F(ModulesTest, Upsampling3D) {
   {
     Upsample model(UpsampleOptions()
-                       .size({4, 4, 4})
+                       .size(std::vector<int64_t>({4, 4, 4}))
                        .mode(torch::kNearest));
     auto input = torch::ones({1, 1, 2, 2, 2}, torch::requires_grad());
     auto output = model->forward(input);
@@ -2736,7 +2840,7 @@ TEST_F(ModulesTest, Upsampling3D) {
     for (const auto scale_factor : {0.5, 1.5, 2.0}) {
       Upsample model(
           UpsampleOptions()
-              .scale_factor({scale_factor, scale_factor, scale_factor})
+              .scale_factor(std::vector<double>({scale_factor, scale_factor, scale_factor}))
               .mode(torch::kTrilinear)
              .align_corners(align_corners));
       auto input = torch::ones({1, 1, 2, 2, 2}, torch::requires_grad());
@@ -3876,10 +3980,10 @@ TEST_F(ModulesTest, PrettyPrintConvTranspose) {
 TEST_F(ModulesTest, PrettyPrintUpsample) {
   ASSERT_EQ(
-      c10::str(Upsample(UpsampleOptions().size({2, 4, 4}))),
+      c10::str(Upsample(UpsampleOptions().size(std::vector<int64_t>({2, 4, 4})))),
       "torch::nn::Upsample(size=[2, 4, 4], mode=kNearest)");
   ASSERT_EQ(
-      c10::str(Upsample(UpsampleOptions().scale_factor({0.5, 1.5}).mode(torch::kBilinear))),
+      c10::str(Upsample(UpsampleOptions().scale_factor(std::vector<double>({0.5, 1.5})).mode(torch::kBilinear))),
       "torch::nn::Upsample(scale_factor=[0.5, 1.5], mode=kBilinear)");
 }
@@ -3987,15 +4091,27 @@ TEST_F(ModulesTest, PrettyPrintAdaptiveMaxPool) {
       c10::str(AdaptiveMaxPool2d(5)),
       "torch::nn::AdaptiveMaxPool2d(output_size=[5, 5])");
   ASSERT_EQ(
-      c10::str(AdaptiveMaxPool2d(std::vector<int64_t>{5, 6})),
+      c10::str(AdaptiveMaxPool2d(AdaptiveMaxPool2dOptions({5, 6}))),
       "torch::nn::AdaptiveMaxPool2d(output_size=[5, 6])");
+  ASSERT_EQ(
+      c10::str(AdaptiveMaxPool2d(AdaptiveMaxPool2dOptions({5, c10::nullopt}))),
+      "torch::nn::AdaptiveMaxPool2d(output_size=[5, None])");
+  ASSERT_EQ(
+      c10::str(AdaptiveMaxPool2d(AdaptiveMaxPool2dOptions({c10::nullopt, c10::nullopt}))),
+      "torch::nn::AdaptiveMaxPool2d(output_size=[None, None])");
   ASSERT_EQ(
       c10::str(AdaptiveMaxPool3d(5)),
       "torch::nn::AdaptiveMaxPool3d(output_size=[5, 5, 5])");
   ASSERT_EQ(
-      c10::str(AdaptiveMaxPool3d(std::vector<int64_t>{5, 6, 7})),
+      c10::str(AdaptiveMaxPool3d(AdaptiveMaxPool3dOptions({5, 6, 7}))),
       "torch::nn::AdaptiveMaxPool3d(output_size=[5, 6, 7])");
+  ASSERT_EQ(
+      c10::str(AdaptiveMaxPool3d(AdaptiveMaxPool3dOptions({5, c10::nullopt, 7}))),
+      "torch::nn::AdaptiveMaxPool3d(output_size=[5, None, 7])");
+  ASSERT_EQ(
+      c10::str(AdaptiveMaxPool3d(AdaptiveMaxPool3dOptions({c10::nullopt, c10::nullopt, c10::nullopt}))),
+      "torch::nn::AdaptiveMaxPool3d(output_size=[None, None, None])");
 }
@@ -4007,15 +4123,27 @@ TEST_F(ModulesTest, PrettyPrintAdaptiveAvgPool) {
       c10::str(AdaptiveAvgPool2d(5)),
       "torch::nn::AdaptiveAvgPool2d(output_size=[5, 5])");
   ASSERT_EQ(
-      c10::str(AdaptiveAvgPool2d(std::vector<int64_t>{5, 6})),
+      c10::str(AdaptiveAvgPool2d(AdaptiveAvgPool2dOptions({5, 6}))),
       "torch::nn::AdaptiveAvgPool2d(output_size=[5, 6])");
+  ASSERT_EQ(
+      c10::str(AdaptiveAvgPool2d(AdaptiveAvgPool2dOptions({5, c10::nullopt}))),
+      "torch::nn::AdaptiveAvgPool2d(output_size=[5, None])");
+  ASSERT_EQ(
+      c10::str(AdaptiveAvgPool2d(AdaptiveAvgPool2dOptions({c10::nullopt, c10::nullopt}))),
+      "torch::nn::AdaptiveAvgPool2d(output_size=[None, None])");
   ASSERT_EQ(
       c10::str(AdaptiveAvgPool3d(5)),
       "torch::nn::AdaptiveAvgPool3d(output_size=[5, 5, 5])");
   ASSERT_EQ(
-      c10::str(AdaptiveAvgPool3d(std::vector<int64_t>{5, 6, 7})),
+      c10::str(AdaptiveAvgPool3d(AdaptiveAvgPool3dOptions({5, 6, 7}))),
       "torch::nn::AdaptiveAvgPool3d(output_size=[5, 6, 7])");
+  ASSERT_EQ(
+      c10::str(AdaptiveAvgPool3d(AdaptiveAvgPool3dOptions({5, c10::nullopt, 7}))),
+      "torch::nn::AdaptiveAvgPool3d(output_size=[5, None, 7])");
+  ASSERT_EQ(
+      c10::str(AdaptiveAvgPool3d(AdaptiveAvgPool3dOptions({c10::nullopt, c10::nullopt, c10::nullopt}))),
+      "torch::nn::AdaptiveAvgPool3d(output_size=[None, None, None])");
 }

 TEST_F(ModulesTest, PrettyPrintMaxUnpool) {
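Note: the None entries printed above come from optional output-size elements; passing c10::nullopt for a dimension is expected to keep the input's extent in that dimension, matching Python's None. A sketch under that assumption:

    #include <torch/torch.h>

    int main() {
      // Pool height to 5; c10::nullopt keeps the input width unchanged.
      torch::nn::AdaptiveMaxPool2d pool(
          torch::nn::AdaptiveMaxPool2dOptions({5, c10::nullopt}));
      auto input = torch::randn({1, 8, 9, 7});
      auto output = pool(input);  // expected sizes: [1, 8, 5, 7]
    }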


@@ -26,7 +26,7 @@ bool test_optimizer_xor(Options options) {
       Linear(8, 1),
       Functional(torch::sigmoid));

-  const int64_t kBatchSize = 4;
+  const int64_t kBatchSize = 50;
   const int64_t kMaximumNumberOfEpochs = 3000;

   OptimizerClass optimizer(model->parameters(), options);
@@ -40,13 +40,21 @@ bool test_optimizer_xor(Options options) {
       inputs[i] = torch::randint(2, {2}, torch::kInt64);
       labels[i] = inputs[i][0].item<int64_t>() ^ inputs[i][1].item<int64_t>();
     }
-    inputs.set_requires_grad(true);
-    optimizer.zero_grad();
-    auto x = model->forward(inputs);
-    torch::Tensor loss = torch::binary_cross_entropy(x, labels);
-    loss.backward();
-
-    optimizer.step();
+
+    inputs.set_requires_grad(true);
+
+    auto step = [&](OptimizerClass& optimizer, Sequential model, torch::Tensor inputs, torch::Tensor labels) {
+      auto closure = [&]() {
+        optimizer.zero_grad();
+        auto x = model->forward(inputs);
+        auto loss = torch::binary_cross_entropy(x, labels);
+        loss.backward();
+        return loss;
+      };
+      return optimizer.step(closure);
+    };
+
+    torch::Tensor loss = step(optimizer, model, inputs, labels);

     running_loss = running_loss * 0.99 + loss.item<float>() * 0.01;
     if (epoch > kMaximumNumberOfEpochs) {
@@ -166,30 +174,66 @@ TEST(OptimTest, OptimizerAccessors) {
   optimizer_.state();
 }

-TEST(OptimTest, BasicInterface) {
+#define OLD_INTERFACE_WARNING_CHECK(func) \
+  { \
+    std::stringstream buffer;\
+    torch::test::CerrRedirect cerr_redirect(buffer.rdbuf());\
+    func;\
+    ASSERT_EQ(\
+      torch::test::count_substr_occurrences(\
+        buffer.str(),\
+        "will be removed"\
+      ),\
+    1);\
+  }
+
+struct MyOptimizerOptions : public OptimizerCloneableOptions<MyOptimizerOptions> {
+  MyOptimizerOptions(double lr = 1.0) : lr_(lr) {};
+  TORCH_ARG(double, lr) = 1.0;
+};
+
+TEST(OptimTest, OldInterface) {
   struct MyOptimizer : Optimizer {
     using Optimizer::Optimizer;
     torch::Tensor step(LossClosure closure = nullptr) override { return {};}
+    explicit MyOptimizer(
+        std::vector<at::Tensor> params, MyOptimizerOptions defaults = {}) :
+          Optimizer({std::move(OptimizerParamGroup(params))}, std::make_unique<MyOptimizerOptions>(defaults)) {}
   };
   std::vector<torch::Tensor> parameters = {
       torch::ones({2, 3}), torch::zeros({2, 3}), torch::rand({2, 3})};
   {
     MyOptimizer optimizer(parameters);
-    ASSERT_EQ(optimizer.size(), parameters.size());
+    size_t size;
+    OLD_INTERFACE_WARNING_CHECK(size = optimizer.size());
+    ASSERT_EQ(size, parameters.size());
   }
   {
-    MyOptimizer optimizer;
-    ASSERT_EQ(optimizer.size(), 0);
-    optimizer.add_parameters(parameters);
-    ASSERT_EQ(optimizer.size(), parameters.size());
-    for (size_t p = 0; p < parameters.size(); ++p) {
-      ASSERT_TRUE(optimizer.parameters()[p].allclose(parameters[p]));
+    std::vector<at::Tensor> params;
+    MyOptimizer optimizer(params);
+
+    size_t size;
+    OLD_INTERFACE_WARNING_CHECK(size = optimizer.size());
+    ASSERT_EQ(size, 0);
+
+    OLD_INTERFACE_WARNING_CHECK(optimizer.add_parameters(parameters));
+
+    OLD_INTERFACE_WARNING_CHECK(size = optimizer.size());
+    ASSERT_EQ(size, parameters.size());
+
+    std::vector<torch::Tensor> params_;
+    OLD_INTERFACE_WARNING_CHECK(params_ = optimizer.parameters());
+    for (size_t p = 0; p < size; ++p) {
+      ASSERT_TRUE(params_[p].allclose(parameters[p]));
     }
   }
   {
     Linear linear(3, 4);
     MyOptimizer optimizer(linear->parameters());
-    ASSERT_EQ(optimizer.size(), linear->parameters().size());
+
+    size_t size;
+    OLD_INTERFACE_WARNING_CHECK(size = optimizer.size());
+    ASSERT_EQ(size, linear->parameters().size());
   }
 }
@@ -198,6 +242,11 @@ TEST(OptimTest, XORConvergence_SGD) {
       SGDOptions(0.1).momentum(0.9).nesterov(true).weight_decay(1e-6)));
 }

+TEST(OptimTest, XORConvergence_LBFGS) {
+  ASSERT_TRUE(test_optimizer_xor<LBFGS>(LBFGSOptions(1.0)));
+  ASSERT_TRUE(test_optimizer_xor<LBFGS>(LBFGSOptions(1.0).line_search_fn("strong_wolfe")));
+}
+
 TEST(OptimTest, XORConvergence_Adagrad) {
   ASSERT_TRUE(test_optimizer_xor<Adagrad>(
       AdagradOptions(1.0).weight_decay(1e-6).lr_decay(1e-3)));
@@ -375,7 +424,7 @@ TEST(OptimTest, AddParameter_LBFGS) {
   }

   LBFGS optimizer(std::vector<torch::Tensor>{}, 1.0);
-  optimizer.add_parameters(parameters);
+  OLD_INTERFACE_WARNING_CHECK(optimizer.add_parameters(parameters));

   optimizer.step([]() { return torch::tensor(1); });
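Note: after the optimizer merge, step(closure) is the uniform entry point; the closure re-evaluates the loss so optimizers such as LBFGS can call it repeatedly during line search. Condensed form of the helper used in the XOR test above:

    #include <torch/torch.h>

    void train_step(torch::optim::LBFGS& optimizer,
                    torch::nn::Sequential& model,
                    torch::Tensor inputs,
                    torch::Tensor labels) {
      auto closure = [&]() {
        optimizer.zero_grad();
        auto prediction = model->forward(inputs);
        auto loss = torch::binary_cross_entropy(prediction, labels);
        loss.backward();
        return loss;  // step() may evaluate this several times (line search)
      };
      torch::Tensor loss = optimizer.step(closure);
    }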


@@ -64,7 +64,7 @@ void is_optimizer_state_equal(
 }

 template <typename OptimizerClass, typename DerivedOptimizerOptions, typename DerivedOptimizerParamState>
-void test_serialize_optimizer(DerivedOptimizerOptions options) {
+void test_serialize_optimizer(DerivedOptimizerOptions options, bool only_has_global_state = false) {
   auto model1 = Linear(5, 2);
   auto model2 = Linear(5, 2);
   auto model3 = Linear(5, 2);
@@ -125,9 +125,11 @@ void test_serialize_optimizer(DerivedOptimizerOptions options) {
   auto& optim3_2_state = optim3_2.state();
   auto& optim3_state = optim3.state();

-  // optim3_2 and optim1 should have param_groups and state of size 1 and 2 respectively
+  // optim3_2 and optim1 should have param_groups and state of size 1 and state_size respectively
   ASSERT_TRUE(optim3_2_param_groups.size() == 1);
-  ASSERT_TRUE(optim3_2_state.size() == 2);
+  // state_size = 2 for all optimizers except LBFGS as LBFGS only maintains one global state
+  int state_size = only_has_global_state ? 1 : 2;
+  ASSERT_TRUE(optim3_2_state.size() == state_size);

   // optim3_2 and optim1 should have param_groups and state of same size
   ASSERT_TRUE(optim3_2_param_groups.size() == optim3_param_groups.size());
@@ -668,39 +670,16 @@ TEST(SerializeTest, Optim_RMSprop) {
 }

 TEST(SerializeTest, Optim_LBFGS) {
-  auto options = LBFGSOptions();
-
+  test_serialize_optimizer<LBFGS, LBFGSOptions, LBFGSParamState>(LBFGSOptions(), true);
+
+  // bc compatibility check
   auto model1 = Linear(5, 2);
-  auto model2 = Linear(5, 2);
-  auto model3 = Linear(5, 2);
-
-  // Models 1, 2, 3 will have the same parameters.
-  auto model_tempfile = c10::make_tempfile();
-  torch::save(model1, model_tempfile.name);
-  torch::load(model2, model_tempfile.name);
-  torch::load(model3, model_tempfile.name);
-
-  auto param1 = model1->named_parameters();
-  auto param2 = model2->named_parameters();
-  auto param3 = model3->named_parameters();
-  for (const auto& p : param1) {
-    ASSERT_TRUE(p->allclose(param2[p.key()]));
-    ASSERT_TRUE(param2[p.key()].allclose(param3[p.key()]));
-  }
-
-  // Make some optimizers
-  auto optim1 = LBFGS(
-      {torch::optim::OptimizerParamGroup(model1->parameters())}, options);
-  auto optim2 = LBFGS(
-      model2->parameters(), options);
-  auto optim2_2 = LBFGS(
-      model2->parameters(), options);
-  auto optim3 = LBFGS(
-      model3->parameters(), options);
-  auto optim3_2 = LBFGS(
-      model3->parameters(), options);
+  auto model1_params = model1->parameters();
+  // added a tensor for lazy init check - when all params do not have entry in buffers
+  model1_params.emplace_back(torch::randn({2,3}));
+  auto optim1 = torch::optim::LBFGS(model1_params, torch::optim::LBFGSOptions());

   auto x = torch::ones({10, 5});

-  auto step = [&x](torch::optim::LossClosureOptimizer& optimizer, Linear model) {
+  auto step = [&x](torch::optim::Optimizer& optimizer, Linear model) {
     optimizer.zero_grad();
     auto y = model->forward(x).sum();
     y.backward();
@@ -708,56 +687,47 @@ TEST(SerializeTest, Optim_LBFGS) {
     optimizer.step(closure);
   };

-  // Do 2 steps of model1
-  step(optim1, model1);
   step(optim1, model1);

-  // Do 2 steps of model 2 without saving the optimizer
-  step(optim2, model2);
-  step(optim2_2, model2);
-
-  // Do 1 step of model 3
-  step(optim3, model3);
+  at::Tensor d, t, H_diag, prev_flat_grad, prev_loss;
+  std::deque<at::Tensor> old_dirs, old_stps;
+
+  const auto& params_ = optim1.param_groups()[0].params();
+  auto key_ = c10::guts::to_string(params_[0].unsafeGetTensorImpl());
+  const auto& optim1_state = static_cast<const LBFGSParamState&>(*(optim1.state().at(key_).get()));
+  d = optim1_state.d();
+  t = at::tensor(optim1_state.t());
+  H_diag = optim1_state.H_diag();
+  prev_flat_grad = optim1_state.prev_flat_grad();
+  prev_loss = at::tensor(optim1_state.prev_loss());
+  old_dirs = optim1_state.old_dirs();

-  // save the optimizer
-  auto optim_tempfile = c10::make_tempfile();
-  torch::save(optim3, optim_tempfile.name);
-  torch::load(optim3_2, optim_tempfile.name);
+  // write buffers to the file
+  auto optim_tempfile_old_format = c10::make_tempfile();
+  torch::serialize::OutputArchive output_archive;
+  output_archive.write("d", d, /*is_buffer=*/true);
+  output_archive.write("t", t, /*is_buffer=*/true);
+  output_archive.write("H_diag", H_diag, /*is_buffer=*/true);
+  output_archive.write("prev_flat_grad", prev_flat_grad, /*is_buffer=*/true);
+  output_archive.write("prev_loss", prev_loss, /*is_buffer=*/true);
+  write_tensors_to_archive(output_archive, "old_dirs", old_dirs);
+  write_tensors_to_archive(output_archive, "old_stps", old_stps);
+  output_archive.save_to(optim_tempfile_old_format.name);

-  auto& optim3_2_param_groups = optim3_2.param_groups();
-  auto& optim3_param_groups = optim3.param_groups();
-  auto& optim3_2_state = optim3_2.state();
-  auto& optim3_state = optim3.state();
+  auto optim1_2 = LBFGS(model1_params, torch::optim::LBFGSOptions());
+  OLD_SERIALIZATION_LOGIC_WARNING_CHECK(torch::load, optim1_2, optim_tempfile_old_format.name);

-  // LBFGS only supports 1 param_group
-  // optim3_2 and optim1 should have param_groups of size 1
-  ASSERT_TRUE(optim3_param_groups.size() == 1);
-  ASSERT_TRUE(optim3_2_param_groups.size() == 1);
-  // LBFGS only maintains one global state
-  ASSERT_TRUE(optim3_2_state.size() == 1);
-  ASSERT_TRUE(optim3_state.size() == 1);
+  const auto& params1_2_ = optim1_2.param_groups()[0].params();
+  auto param_key = c10::guts::to_string(params1_2_[0].unsafeGetTensorImpl());
+  auto& optim1_2_state = static_cast<LBFGSParamState&>(*(optim1_2.state().at(param_key).get()));

-  // checking correctness of serialization logic for optimizer.param_groups_ and optimizer.state_
-  for (int i = 0; i < optim3_2_param_groups.size(); i++) {
-    is_optimizer_param_group_equal<LBFGSOptions>(
-        optim3_2_param_groups[i], optim3_param_groups[i]);
-    is_optimizer_state_equal<LBFGSParamState>(optim3_2_state, optim3_state);
-  }
-
-  // Do step2 for model 3
-  step(optim3_2, model3);
-
-  param1 = model1->named_parameters();
-  param2 = model2->named_parameters();
-  param3 = model3->named_parameters();
-  for (const auto& p : param1) {
-    const auto& name = p.key();
-    // Model 1 and 3 should be the same
-    ASSERT_TRUE(
-        param1[name].norm().item<float>() == param3[name].norm().item<float>());
-    ASSERT_TRUE(
-        param1[name].norm().item<float>() != param2[name].norm().item<float>());
-  }
+  // old LBFGS didn't track func_evals, n_iter, ro, al values
+  optim1_2_state.func_evals(optim1_state.func_evals());
+  optim1_2_state.n_iter(optim1_state.n_iter());
+  optim1_2_state.ro(optim1_state.ro());
+  optim1_2_state.al(optim1_state.al());
+
+  is_optimizer_state_equal<LBFGSParamState>(optim1.state(), optim1_2.state());
 }
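Note: the hand-built OutputArchive above only reproduces the legacy buffer-based file format for the BC check; the current round trip is a plain torch::save / torch::load pair, which test_serialize_optimizer exercises. Roughly:

    #include <torch/torch.h>
    #include <c10/util/tempfile.h>

    void roundtrip(torch::optim::LBFGS& saved, torch::optim::LBFGS& loaded) {
      auto tmpfile = c10::make_tempfile();
      torch::save(saved, tmpfile.name);   // serializes param_groups and state
      torch::load(loaded, tmpfile.name);  // restores into an optimizer built
                                          // over the same parameters
    }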
TEST(SerializeTest, XOR_CUDA) { TEST(SerializeTest, XOR_CUDA) {


@@ -138,7 +138,7 @@ void testClassDerive() {
 static const auto torchbindSrc = R"JIT(
 class FooBar1234(Module):
   __parameters__ = []
-  f : __torch__.torch.classes._TorchScriptTesting_StackString
+  f : __torch__.torch.classes._TorchScriptTesting._StackString
   training : bool
   def forward(self: __torch__.FooBar1234) -> str:
     return (self.f).top()


@@ -66,7 +66,7 @@ struct PickleTester : torch::CustomClassHolder {
   std::vector<int64_t> vals;
 };

-static auto test = torch::class_<Foo>("_TorchScriptTesting_Foo")
+static auto test = torch::class_<Foo>("_TorchScriptTesting", "_Foo")
                        .def(torch::init<int64_t, int64_t>())
                        // .def(torch::init<>())
                        .def("info", &Foo::info)
@@ -75,7 +75,9 @@ static auto test = torch::class_<Foo>("_TorchScriptTesting_Foo")
                        .def("combine", &Foo::combine);

 static auto testStack =
-    torch::class_<MyStackClass<std::string>>("_TorchScriptTesting_StackString")
+    torch::class_<MyStackClass<std::string>>(
+        "_TorchScriptTesting",
+        "_StackString")
         .def(torch::init<std::vector<std::string>>())
         .def("push", &MyStackClass<std::string>::push)
         .def("pop", &MyStackClass<std::string>::pop)
@@ -101,7 +103,7 @@ static auto testStack =
 // clang-format on

 static auto testPickle =
-    torch::class_<PickleTester>("_TorchScriptTesting_PickleTester")
+    torch::class_<PickleTester>("_TorchScriptTesting", "_PickleTester")
         .def(torch::init<std::vector<int64_t>>())
         .def_pickle(
             [](c10::intrusive_ptr<PickleTester> self) { // __getstate__
@@ -127,10 +129,10 @@ at::Tensor take_an_instance(const c10::intrusive_ptr<PickleTester>& instance) {

 torch::RegisterOperators& register_take_instance() {
   static auto instance_registry = torch::RegisterOperators().op(
       torch::RegisterOperators::options()
           .schema(
-              "_TorchScriptTesting::take_an_instance(__torch__.torch.classes._TorchScriptTesting_PickleTester x) -> Tensor Y")
+              "_TorchScriptTesting::take_an_instance(__torch__.torch.classes._TorchScriptTesting._PickleTester x) -> Tensor Y")
           .catchAllKernel<decltype(take_an_instance), &take_an_instance>());
   return instance_registry;
 }
@@ -146,7 +148,7 @@ void testTorchbindIValueAPI() {
   auto custom_class_obj = make_custom_class<MyStackClass<std::string>>(
       std::vector<std::string>{"foo", "bar"});
   m.define(R"(
-    def forward(self, s : __torch__.torch.classes._TorchScriptTesting_StackString):
+    def forward(self, s : __torch__.torch.classes._TorchScriptTesting._StackString):
       return s.pop(), s
   )");

View File

@ -343,7 +343,8 @@ void testLiteInterpreterBuiltinFunction() {
 namespace {
 static auto reg =
     torch::jit::class_<TorchBindLiteInterpreterTestStruct>(
-        "_TorchScriptTesting_LiteInterpreterTest")
+        "_TorchScriptTesting",
+        "_LiteInterpreterTest")
         .def("get", &TorchBindLiteInterpreterTestStruct::get)
         .def_pickle(
             // __getattr__

View File

@ -35,19 +35,19 @@ class TestCustomOperators(unittest.TestCase):
     def test_no_return_class(self):
         def f():
-            val = torch.classes._TorchScriptTesting_Foo(5, 3)
+            val = torch.classes._TorchScriptTesting._Foo(5, 3)
             return val.info()
         self.assertEqual(*test_equality(f, lambda x: x))

     def test_constructor_with_args(self):
         def f():
-            val = torch.classes._TorchScriptTesting_Foo(5, 3)
+            val = torch.classes._TorchScriptTesting._Foo(5, 3)
             return val
         self.assertEqual(*test_equality(f, lambda x: x.info()))

     def test_function_call_with_args(self):
         def f():
-            val = torch.classes._TorchScriptTesting_Foo(5, 3)
+            val = torch.classes._TorchScriptTesting._Foo(5, 3)
             val.increment(1)
             return val

@ -55,7 +55,7 @@ class TestCustomOperators(unittest.TestCase):
     def test_function_method_wrong_type(self):
         def f():
-            val = torch.classes._TorchScriptTesting_Foo(5, 3)
+            val = torch.classes._TorchScriptTesting._Foo(5, 3)
             val.increment("asdf")
             return val

@ -65,8 +65,8 @@ class TestCustomOperators(unittest.TestCase):
     @unittest.skip("We currently don't support passing custom classes to custom methods.")
     def test_input_class_type(self):
         def f():
-            val = torch.classes._TorchScriptTesting_Foo(1, 2)
-            val2 = torch.classes._TorchScriptTesting_Foo(2, 3)
+            val = torch.classes._TorchScriptTesting._Foo(1, 2)
+            val2 = torch.classes._TorchScriptTesting._Foo(2, 3)
             val.combine(val2)
             return val

@ -74,14 +74,14 @@ class TestCustomOperators(unittest.TestCase):
     def test_stack_string(self):
         def f():
-            val = torch.classes._TorchScriptTesting_StackString(["asdf", "bruh"])
+            val = torch.classes._TorchScriptTesting._StackString(["asdf", "bruh"])
             return val.pop()
         self.assertEqual(*test_equality(f, lambda x: x))

     def test_stack_push_pop(self):
         def f():
-            val = torch.classes._TorchScriptTesting_StackString(["asdf", "bruh"])
-            val2 = torch.classes._TorchScriptTesting_StackString(["111", "222"])
+            val = torch.classes._TorchScriptTesting._StackString(["asdf", "bruh"])
+            val2 = torch.classes._TorchScriptTesting._StackString(["111", "222"])
             val.push(val2.pop())
             return val.pop() + val2.pop()
         self.assertEqual(*test_equality(f, lambda x: x))
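The same rename surfaces in Python as nested attribute access (torch.classes._TorchScriptTesting._Foo rather than torch.classes._TorchScriptTesting_Foo). From C++, scripting against the new dotted qualified name looks like the sketch below, which assumes the _StackString registration from the earlier hunks is linked into the same binary:

#include <torch/script.h>

// Compile a method whose argument is the custom class, referenced by its
// new dotted qualified name. Illustrative only; requires the class to be
// registered in this binary.
void script_against_custom_class() {
  torch::jit::script::Module m("m");
  m.define(R"(
    def forward(self, s: __torch__.torch.classes._TorchScriptTesting._StackString):
      return s.pop(), s
  )");
}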

View File

@ -1,6 +1,6 @@
 ir_version: 6
 producer_name: "pytorch"
-producer_version: "1.4"
+producer_version: "1.5"
 graph {
   node {
     input: "0"

(The same one-line change, producer_version "1.4" to "1.5", repeats in each of the remaining ONNX expect-file hunks in this diff; only the first is shown above.)

Some files were not shown because too many files have changed in this diff.