Compare commits

126 Commits

Author SHA1 Message Date
4ff3872a20 [v.1.5.0] Ensure linearIndex of advanced indexing backwards is contig… (#36962)
* [v.1.5.0] Ensure linearIndex of advanced indexing backwards is contiguous.

This is a more straightforward solution to the problem than https://github.com/pytorch/pytorch/pull/36957; I don't know about the relative performance.

Fixes: #36956

ghstack-source-id: 43c48eaee7232cd3ed2b108edbbee24c11e8321a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36959

* Fix test.
2020-04-20 19:59:38 -04:00
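A hedged illustration of the code path this fix touches (the actual regression in #36956 involved the internal linearIndex buffer being non-contiguous; this sketch only shows where that index is used): the backward of advanced indexing scatters gradients through a flattened index, so repeated indices must accumulate correctly.

```python
import torch

# Backward of advanced indexing: gradients are scattered back through a flattened
# index ("linearIndex" internally); repeated indices must accumulate.
x = torch.randn(4, 5, requires_grad=True)
idx = torch.tensor([0, 2, 2, 3])   # row 2 appears twice, forcing accumulation
y = x[idx]
y.sum().backward()
print(x.grad[2])                   # expected: a row of 2.0, since row 2 was gathered twice
```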
d7bdffabed [v1.5 Patch] Disable flaky test_backward_node_failure_python_udf test in dist_autograd_test.py
This test is flaky on 1.5 release branch. Below is a failed CI run:
https://app.circleci.com/pipelines/github/pytorch/pytorch/157331/workflows/b3e0bd6b-6c55-4d14-bde8-96b8345cf9e2/jobs/5190025
2020-04-20 14:25:32 -04:00
9ba0a89489 Overwrite bazel if /usr/bin/bazel already exists. 2020-04-20 14:24:42 -04:00
c164fbccb1 Add TorchServe 2020-04-19 21:44:32 -07:00
9a51e477ac make simple executor the default for OSS 2020-04-17 20:00:53 -04:00
375566fb78 Handle log_sigmoid(out=) properly.
Fixes: https://github.com/pytorch/pytorch/issues/36499

Changes:
1) Moves some bindings from LegacyNNDefinitions to Activation so all of log_sigmoid lives together
2) Properly handle non-contiguous / incorrectly sized out parameters to log_sigmoid.  This is done by copying from a buffer if necessary.
3) Require that the internal buffer (different from 2)) is contiguous.  This should always be the case because it's always created internally.
4) Adds a test
2020-04-17 15:43:35 -04:00
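A hedged sketch of the out= contract the fix enforces for log_sigmoid, shown here with torch.exp (which definitely accepts out=): when the provided out tensor is non-contiguous, the op must write through the strides or compute into a temporary buffer and copy back.

```python
import torch

x = torch.randn(4, 4)
out = torch.empty(4, 8)[:, ::2]     # correctly sized but non-contiguous view
assert not out.is_contiguous()

torch.exp(x, out=out)               # must honor the strides or copy from an internal buffer
assert torch.allclose(out, x.exp())
```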
dfdc788076 Fix incorrect merge of #34136.
If you look at https://github.com/pytorch/pytorch/pull/34136/, you will notice a commit (80c15c087c) that didn't get merged.
This is to address that, to avoid crashing on remainder when the rhs is 0.

ghstack-source-id: e805e290bd4b7d3165fd78d4e537e56e4c459162
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36760
2020-04-17 15:42:20 -04:00
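A hedged repro sketch of the behavior the missing commit restores (exact semantics for integer dtypes may differ; the point is that a zero divisor must not crash):

```python
import torch

a = torch.tensor([5.0, -3.0])
print(torch.remainder(a, 2.0))   # tensor([1., 1.])  -- Python-style modulo
print(torch.remainder(a, 0.0))   # with the fix: nan results for float inputs, no crash
```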
9e6ef814cc [v1.5.0] Print keyword-only arg symbol for function signature suggestions.
Fixes: https://github.com/pytorch/pytorch/issues/36773

ghstack-source-id: 6b08839ffc8b228e9533a47b7fd034367fc93dec
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36780
2020-04-17 15:42:04 -04:00
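For context, a plain-Python sketch of what the printed `*` means (illustrative names, not the exact torch signature): everything after `*` is keyword-only, so a suggested signature that omits the symbol is misleading.

```python
def clamp(input, min=None, max=None, *, out=None):
    """Illustrative signature only; the '*' makes `out` keyword-only."""

# clamp(x, 0, 1, y)      -> TypeError: too many positional arguments
# clamp(x, 0, 1, out=y)  -> OK
```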
31461800f6 Migrate release CI jobs to CircleCI for Windows (v1.5 Release) (#36658)
* Migrate release CI jobs to CircleCI for Windows (v1.5 Release)

* Fix comments
2020-04-16 12:18:27 -04:00
Jie
e741839b0e Fixing SyncBN dgrad (#36382)
Summary:
Previous PR https://github.com/pytorch/pytorch/issues/22248, which provides support for variadic batch sizes across processes, doesn't account for mean_dy/mean_dy_xmu on the backward path, which produces a wrong dgrad.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36382

Differential Revision: D20984446

Pulled By: ngimel

fbshipit-source-id: 80066eee83760b275d61e2cdd4e86facca5577fd
2020-04-16 10:58:16 -04:00
8eb39c9cfd [CI] fix test_distributed for python 3.8+ (#36542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36542

Python 3.8 sets the default multiprocessing start method to spawn, but we
need fork in these tests; otherwise there are pickling issues.
Test: Ensure that these tests succeed when run with python 3.8
ghstack-source-id: 102093824

Test Plan: Ensure success with python 3.8

Differential Revision: D21007753

fbshipit-source-id: 4b39844c6ba76a53293c0dfde7c98ec5a78fe113
2020-04-16 10:54:57 -04:00
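A hedged sketch of forcing the fork start method regardless of the interpreter default (the real tests use torch.multiprocessing and the distributed test harness; names below are illustrative):

```python
import multiprocessing as mp

def worker(q):
    q.put("hello from the forked child")

if __name__ == "__main__":
    ctx = mp.get_context("fork")              # explicit fork; not available on Windows
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
```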
b5e4c0993d Add a warning for Single-Process Multi-GPU DDP 2020-04-15 19:08:24 -04:00
6bc6832bda fix syntax 2020-04-15 19:00:11 -04:00
593594839c Update docs for 1.5 to remove Python 2 references (#36338)
* Remove python 2 from jit.rst

* Remove python 2 from jit_language_reference.rst

* Remove python 2 from multiprocessing.rst

* Remove python 2 from named_tensor.rst

* Remove python 2 from multiprocessing.rst

* Remove python 2 from windows.rst

* Update multiprocessing.rst

* Remove python 2 from notes/multiprocessing.rst
2020-04-14 15:57:02 -07:00
cf65c8ef15 Fix torch.min docs (#36319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36319

On the way to resolving #35216.
This is a fix for just the master branch but once this goes in,
I'll send a cherry-pick to release/1.5

The problem is that we were not calling `format` on a string that had
templates (e.g., '{input}', '{dim}'). This change makes it so that we
call format on the entire docstring for `torch.min`.

Test Plan:
- The `torch.max` docs are OK:
https://pytorch.org/docs/master/torch.html#torch.max and don't need
changing.
- `torch.min` docs, before this change: see second screenshot in #35216.
- after this change: <Insert link here on github>

![image](https://user-images.githubusercontent.com/5652049/78921702-4e2acc00-7a63-11ea-9ea0-89636ff6fb0a.png)

Differential Revision: D20946702

Pulled By: zou3519

fbshipit-source-id: a1a28707e41136a9bb170c8a4191786cf037a0c2
2020-04-13 19:03:03 -04:00
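A hedged sketch of this class of documentation bug (names are illustrative, not the exact internals of torch's doc machinery): if `.format()` is never applied to the template, readers see literal `{input}` / `{dim}` placeholders in the rendered page.

```python
common_args = {
    "input": "input (Tensor): the input tensor.",
    "dim": "dim (int): the dimension to reduce.",
}

doc_template = """\
min(input, dim, keepdim=False) -> (Tensor, LongTensor)

Args:
    {input}
    {dim}
"""

# Without this call the braces leak into the docs; the fix formats the whole docstring.
print(doc_template.format(**common_args))
```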
ca0dc1fcdc skip test in 3.8 because of inconsistent regex 2020-04-10 11:06:47 -07:00
b58f89b2e4 Use counter instead of vector of futures in _parallel_run (#36159) (#36334)
Summary:
This should be faster than allocating one mutex, flag, and condition variable per task.

Using `std::atomic<size_t>` to count remaining tasks is not sufficient,
because modifying the remaining counter and signalling the condition variable must happen atomically;
otherwise `wait()` might get invoked after `notify_one()` was called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36159

Test Plan: CI

Differential Revision: D20905411

Pulled By: malfet

fbshipit-source-id: facaf599693649c3f43edafc49f369e90d2f60de
(cherry picked from commit 986a8fdd6a18d9110f8bde59361967139450966b)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-04-09 14:08:57 -07:00
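The same synchronization argument, sketched in Python's threading vocabulary (the real change is in the C++ parallel backend; this only illustrates why the decrement and the notify must share a lock):

```python
import threading

def parallel_run(tasks):
    remaining = len(tasks)
    cv = threading.Condition()

    def run(task):
        nonlocal remaining
        task()
        with cv:                       # decrement and notify under the same lock,
            remaining -= 1             # so the waiter cannot miss the final signal
            if remaining == 0:
                cv.notify()

    for t in tasks:
        threading.Thread(target=run, args=(t,)).start()

    with cv:
        cv.wait_for(lambda: remaining == 0)

parallel_run([lambda: None] * 8)
```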
87b6685c6b repr and _*state_dict for qRNN (#31540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31540

Fixes #31468

Test Plan: Imported from OSS

Differential Revision: D19205894

Pulled By: z-a-f

fbshipit-source-id: 80c36f74aa20a125ea8d74a54e9905576f1bc6d7
2020-04-09 12:26:56 -04:00
f746f1b746 Revert "Avoid clone for sparse tensors during accumulation of grads. (#33427)"
This reverts commit b185359fb4ba4dcb0c048fd1d049da23eff88b27.
2020-04-09 11:33:55 -04:00
1379415150 Revert "AccumulateGrad: ensure sparse tensor indices and values refcount is always 1 (#34559)"
This reverts commit 2ce9513b0c8894987f6d42bfb57ff95b22e32c95.
2020-04-09 11:33:55 -04:00
7d638d2596 [v1.5.0] fix is_float_scale_factor warning (python and c++) (#36274)
* fix is_float_scale_factor warning

* fix python impl

Co-authored-by: Robin Lobel <divide@divideconcept.net>
Co-authored-by: Will Feng <willfeng@fb.com>
2020-04-09 11:31:13 -04:00
bad005d331 .circleci: Add binary builds/tests to run on release branches (#36283)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-04-08 16:37:24 -07:00
16d8a52407 [pytorch] Add error when PyTorch used with Python 2 (#36151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36151

Python 2 has reached end-of-life and is no longer supported by PyTorch. To avoid confusing behavior when trying to use PyTorch with Python 2, detect this case early and fail with a clear message. This commit covers `import torch` only, not C++, for now.

Test Plan: waitforsandcastle

Reviewed By: dreiss

Differential Revision: D20894381

fbshipit-source-id: a1073b7a648e07cf10cda5a99a2cf4eee5a89230
2020-04-08 18:55:58 -04:00
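A hedged sketch of what such an early check looks like (the actual message and its location in torch/__init__.py may differ):

```python
import sys

# Fail fast with a clear message instead of an obscure import error under Python 2.
if sys.version_info < (3,):
    raise Exception(
        "Python 2 has reached end-of-life and is no longer supported by PyTorch."
    )
```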
a33b264588 Revert "Update docs for 1.5 to remove Python 2 references (#36116)"
This reverts commit 63dcd9eccc90136afdfb5d8130077ff1e917ba2e.
2020-04-08 18:51:13 -04:00
3a67e00889 [1.5 cherrypick] C++ Adam optimizer - corrected messages for check of default options (#36245)
* Corrected messages for check of default options

* Added 0<= betas < 1 range check, match python messages for check of betas

Co-authored-by: meganset <meganset@gmail.com>
2020-04-08 18:06:16 -04:00
6bd039551d Remove determine_from from test/run_test.py (#36256)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-04-08 14:58:23 -07:00
b6c3058d61 Exclude torch/csrc/cuda/*nccl* from clang-tidy (#36251)
Since the workflow configures PyTorch with `USE_NCCL` set to 0, we cannot tidy those files

(cherry picked from commit e172a6ef920b6838b67eb8f0020d78031df8cde5)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-04-08 13:37:16 -07:00
ed908b4fbc [release/1.5] Move all nccl from torch_python to torch_cuda (#36229)
* Remove dead code

`THCPModule_useNccl()` doesn't seem to be used anywhere

* Move all nccl calls from `torch_python` to `torch_cuda`

Because `torch_python` is supposed to be a thin wrapper around torch.

This ensures API parity between C++ and Python, and reduces the `torch_python` binary size.

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-04-08 10:39:20 -07:00
b66e0af58b s/repo.continuum.io/repo.anaconda.com/
Follow-up to https://github.com/pytorch/pytorch/pull/36201

Per https://github.com/conda/conda/issues/6886, `repo.anaconda.com` should have been used since Feb 2019.

Test Plan: CI
2020-04-08 13:05:04 -04:00
bf8a5ede96 [ONNX] fix size for opset 11 (#35984)
Summary:
Fixing size, as the aten op has been updated to support 0 inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35984

Reviewed By: hl475

Differential Revision: D20858214

Pulled By: houseroad

fbshipit-source-id: 8ad0a0174a569455e89da6798eed403c8b162a47
2020-04-08 11:50:59 -04:00
c2bc5c56c5 Use repo.anaconda.com instead of repo.continuum.io (#36201)
Summary:
Per https://github.com/conda/conda/issues/6886, `repo.anaconda.com` should have been used since Feb 2019.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36201

Test Plan: CI

Differential Revision: D20910667

Pulled By: malfet

fbshipit-source-id: 3a191e2cae293e6f96dbb323853e84c07cd7aabc
2020-04-08 08:39:52 -07:00
db3c3ed662 Move test to test_jit_py3.py 2020-04-08 11:15:33 -04:00
9de4770bbd [v1.5.0] Group libraries in TOC and add PyTorch Elastic
Move XLA out of Notes and group with other libraries. Also adds link to PyTorch Elastic.
2020-04-08 11:08:39 -04:00
911a2a6b63 [BugFix] Fix compare_exchange_weak in DispatchStub.h (#35794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35794

### Summary

As PyTorch has been in production on iOS for about a week, we've spotted a few crashes (90 out of 20.3k) related to DispatchStub.h. The major part of the crash log is pasted below (full crash information can be found at `bunnylol logview 1d285dc9172c877b679d0f8539da58f0`):

```
FBCameraFramework void at::native::DispatchStub<void (*)(at::TensorIterator&, c10::Scalar), at::native::add_stub>::operator()<at::TensorIterator&, c10::Scalar&>(c10::DeviceType, at::TensorIterator&, c10::Scalar&)(DispatchStub.h:0)
+FBCameraFramework at::native::add(at::Tensor const&, at::Tensor const&, c10::Scalar)(BinaryOps.cpp:53)
+FBCameraFramework at::CPUType::add_Tensor(at::Tensor const&, at::Tensor const&, c10::Scalar)(CPUType.cpp:55)
+FBCameraFramework at::add(at::Tensor const&, at::Tensor const&, c10::Scalar)(Functions.h:1805)
+FBCameraFramework [inlined] c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::intrusive_ptr(c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>&&)(intrusive_ptr.h:0)
+FBCameraFramework [inlined] c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::intrusive_ptr(c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>&&)(intrusive_ptr.h:221)
+FBCameraFramework [inlined] at::Tensor::Tensor(at::Tensor&&)(TensorBody.h:93)
+FBCameraFramework [inlined] at::Tensor::Tensor(at::Tensor&&)(TensorBody.h:93)
+FBCameraFramework c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >::operator()(at::Tensor, at::Tensor, c10::Scalar)(kernel_lambda.h:23)
+FBCameraFramework [inlined] c10::guts::infer_function_traits<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> > >::type::return_type c10::detail::call_functor_with_args_from_stack_<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >, false, 0ul, 1ul, 2ul>(c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*, std::__1::vector<c10::IValue, c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*::allocator<std::__1::vector> >*, c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*::integer_sequence<unsigned long, 0ul, 1ul, 2ul>)(kernel_functor.h:210)
+FBCameraFramework [inlined] c10::guts::infer_function_traits<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> > >::type::return_type c10::detail::call_functor_with_args_from_stack<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >, false>(c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*, std::__1::vector<c10::IValue, c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*::allocator<std::__1::vector> >*)(kernel_functor.h:218)
+FBCameraFramework c10::detail::make_boxed_from_unboxed_functor<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >, false, void>::call(c10::OperatorKernel*, c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*)(kernel_functor.h:250)
+FBCameraFramework [inlined] (anonymous namespace)::variable_fallback_kernel(c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*)(VariableFallbackKernel.cpp:32)
+FBCameraFramework void c10::KernelFunction::make_boxed_function<&((anonymous namespace)::variable_fallback_kernel(c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*))>(c10::OperatorKernel*, c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*)(KernelFunction_impl.h:21)
+FBCameraFramework torch::jit::mobile::InterpreterState::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&)(interpreter.cpp:0)
+FBCameraFramework torch::jit::mobile::Function::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) const(function.cpp:59)
+FBCameraFramework torch::jit::mobile::Module::run_method(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >)(module.cpp:51)
+FBCameraFramework [inlined] torch::jit::mobile::Module::forward(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >)(module.h:28)
```
The problem is that `compare_exchange_weak` is not guaranteed to succeed in one shot, as described in [C++ Concurrency in Action (2nd Edition)](https://livebook.manning.com/book/c-plus-plus-concurrency-in-action-second-edition/chapter-5/79). This might result in `cpu_dispatch_ptr` being a null pointer in concurrent situations, thus leading to the crash. As suggested in the book, because of spurious failures, `compare_exchange_weak` is typically used in a loop. There is also a [Stack Overflow discussion](https://stackoverflow.com/questions/25199838/understanding-stdatomiccompare-exchange-weak-in-c11) about this. Feel free to drop comments below if there is a better option.

### The original PR

- [Enhance DispatchStub to be thread safe from a TSAN point of view](https://github.com/pytorch/pytorch/pull/32148)

### Test Plan

- Keep observing the crash reports in QE

Test Plan: Imported from OSS

Differential Revision: D20808751

Pulled By: xta0

fbshipit-source-id: 52f5c865b70c59b332ef9f0865315e76d97f6eaa
2020-04-08 10:56:07 -04:00
60375bcfdf [1.5.0] Attempt to fix the pytorch_cpp_doc_push build by pinning breathe. 2020-04-08 10:54:56 -04:00
63dcd9eccc Update docs for 1.5 to remove Python 2 references (#36116) 2020-04-07 16:03:44 -07:00
e8236d2ed4 fix max_pool2d cuda version Dimension out of range issue(#36046) (#36095)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36095

Test Plan: Imported from OSS

Differential Revision: D20876733

Pulled By: glaringlee

fbshipit-source-id: a2b92fd2dd0254c5443af469e3fb2faa2323e5c9
2020-04-07 18:52:21 -04:00
0058b1bb7e [1.5 cherrypick][JIT] Fix fake_range() 2020-04-07 18:47:22 -04:00
419283e291 Improve C++ API autograd and indexing docs (#35777)
Summary:
This PR adds docs for the following components:
1. Tensor autograd APIs (such as `is_leaf` / `backward` / `detach` / `detach_` / `retain_grad` / `grad` / `register_hook` / `remove_hook`)
2. Autograd APIs: `torch::autograd::backward` / `grad` / `Function` / `AutogradContext`, `torch::NoGradGuard` / `torch::AutoGradMode`
3. Tensor indexing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35777

Differential Revision: D20810616

Pulled By: yf225

fbshipit-source-id: 60526ec0c5b051021901d89bc3b56861c68758e8
2020-04-07 18:37:27 -04:00
0e6f6ba218 [pytorch] Remove python2 support from tests and torch.jit (#35042) (#36162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35042

Removing python2 tests and some compat code in torch.jit. Check if dependent projects and external tests have any issues after these changes.

Test Plan: waitforsandcastle

Reviewed By: suo, seemethere

Differential Revision: D18942633

fbshipit-source-id: d76cc41ff20bee147dd8d44d70563c10d8a95a35
(cherry picked from commit 8240db11e193b0334a60a33d9fc907ebc6ba6987)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Orion Reblitz-Richardson <orionr@fb.com>
2020-04-07 13:55:50 -07:00
ec8dbaf920 Add more alternative filters in places people forgot to add them. (#36082) (#36148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36082

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20874618

Pulled By: ezyang

fbshipit-source-id: b6f12100a247564428eb7272f803a03c9cad3a97
(cherry picked from commit 449a4ca3408774ed961f1702ca31a549f5818b80)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Edward Yang <ezyang@fb.com>
2020-04-07 09:59:33 -07:00
7e168d134f Pin Sphinx to 2.4.4 (take 2), fix docs CIs (#36072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36072

Update to https://github.com/pytorch/pytorch/pull/36065/ which was
almost there

Test Plan: - Wait for CI

Differential Revision: D20871661

Pulled By: zou3519

fbshipit-source-id: 2bf5ce382e879aafd232700ff1c0d61fc17ea52d
2020-04-07 10:54:36 -04:00
6daae58871 Remove __nv_relfatbin section from nccl_static library (#35907)
Test Plan: CI

(cherry picked from commit 04e06b419990328157f0e2108a95b2848f66d75f)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-04-06 16:57:03 -07:00
fee0ff1bf6 May fix TopKTypeConfig<at::Half> without an additional Bitfield specialization 2020-04-06 19:41:17 -04:00
deaf3b65cf Compile THCTensorTopK per dtype.
ROCm builds fail inconsistently on this file by timing out.

ghstack-source-id: 4a8f22731aa82c02d464a8cba522e856afbe49b8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36074
2020-04-06 19:41:17 -04:00
dca9c2501d Revert "Revert "Fix handling of non-finite values in topk (#35253)" (#35582)"
This reverts commit dacdbc22d195f80e0b529b4e9111c8ca9a172914.
2020-04-06 19:41:17 -04:00
842cd47416 Refactor and turn on C++ API parity test in CI
gh-metadata: pytorch pytorch 35190 gh/yf225/106/head
2020-04-06 15:40:35 -04:00
a30b49085c Move NewModuleTest and NewCriterionTest from test_nn.py to common_nn.py (#35189)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35189

Test Plan: Imported from OSS

Differential Revision: D20588197

Pulled By: yf225

fbshipit-source-id: 5a28159b653895678c250cbc0c1ddd51bc7a3123
2020-04-06 15:40:35 -04:00
82626f8ad9 More generic dedupe MKL fix (#35966)
* Stop linking against MKL

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Perform test for build size

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* fixup

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* One more MSVC fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Revert "Perform test for build size"

This reverts commit 8b5ed8eac81cc880b5cedb33cb3b86f584abacb7.
2020-04-06 11:50:48 -07:00
27fddfda4f Use std::abs instead of abs in lbfgs.cpp (#35974)
Summary:
This supersedes https://github.com/pytorch/pytorch/pull/35698.

`abs` is a C-style function that takes only an integral argument.
`std::abs` is polymorphic and can be applied to both integral and floating-point types.

This PR also increases `kBatchSize` in `test_optimizer_xor` function in `test/cpp/api/optim.cpp` to fix `OptimTest.XORConvergence_LBFGS` failure under ASAN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35974

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D20853570

Pulled By: yf225

fbshipit-source-id: 6135588df2426c5b974e4e097b416955d1907bd4
2020-04-06 14:50:18 -04:00
7ecf6a1c10 [release/1.5] Bump libtorch to 3.7, remove python2 (#36080)
* .circleci: Remove Python 2.7 builds, switch libtorch to 3.7

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* .circleci: Bump libtorch builds to 3.7

The image is actually using Python 3.7.2 so we should reflect that
within our circleci configs

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
(cherry picked from commit b3f2572aaf83d1f5383369187f6263e6f926103b)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-04-06 11:10:48 -07:00
beb07a44c4 Ports integer division callsite cleanup 2020-04-02 20:17:31 -04:00
a01c3bd1fe [BC] Fix the BC test for 1.5 (#35733)
* [BC] Fix the BC test for 1.5

* Skip RRef

* Skip more

* Skip more

* Fix whitelist

* Fix whitelist
2020-04-02 19:36:18 -04:00
ffd010f8a0 Make test_leaky_relu_inplace_with_neg_slope device-generic and skipIfRocm. (#35816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35816

Fixes https://github.com/pytorch/pytorch/issues/35689.

Test Plan: Imported from OSS

Differential Revision: D20796656

Pulled By: gchanan

fbshipit-source-id: 474790fe07899d9944644f6b3d7a15db1c2b96db
2020-04-02 17:05:23 -04:00
8ad59f03a8 Skip ROCm test in test/test_cpp_extensions_aot.py (#35838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35838

It may be flaky.

Test Plan: Imported from OSS

Differential Revision: D20807409

Pulled By: gchanan

fbshipit-source-id: f085d05bcb6a04d304f3cd048c38d2e8453125d6
2020-04-02 17:04:54 -04:00
ed3640df68 Fix another case of float2::x and float2::y may not be the same on ROCm (#35785)
Summary:
This is another case of the issue fixed in https://github.com/pytorch/pytorch/pull/35783. Mirroring 35786.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35785

Differential Revision: D20800317

Pulled By: ezyang

fbshipit-source-id: de5f32839755d5ff5aefff8408df69adbab4d0a1
2020-04-02 17:01:27 -04:00
fb88942f6c Fix typo 2020-04-02 13:53:13 -04:00
5d05c51887 Refactored rpc docs (#35109)
Summary:
Reorganize as per jlin27's comments. Screenshots added in comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35109

Differential Revision: D20788774

Pulled By: rohan-varma

fbshipit-source-id: 7d64be70ef76ed6ff303d05d39c338293c234766
2020-04-02 13:53:13 -04:00
df5986fbf3 [1.5 Release] Disabled complex tensor construction (#35579)
* disabled complex tensor construction

* minor

* doc fix

* added docs back and updated complex dtype check

* removed test_complex.py

* removed complexfloat reg test

* debug
2020-04-01 11:11:05 -04:00
165403f614 [v1.5.0] float2::x and float2::y may not be the same as float on ROCm (#35593)
Summary:
This causes ambiguity and can be triggered sometimes (e.g., by https://github.com/pytorch/pytorch/issues/35217). Explicitly convert them to float.

    error: conditional expression is ambiguous; 'const
    hip_impl::Scalar_accessor<float, Native_vec_, 0>' can be converted to
    'float' and vice versa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35593

Differential Revision: D20735663

Pulled By: ezyang

fbshipit-source-id: ae6a38a08e59821bae13eb0b9f9bdf21a008d5c0
2020-03-31 19:58:40 -04:00
fbf18c34ff ports disabling imag 2020-03-31 18:55:45 -04:00
84f806c821 ports real and imag fixes 2020-03-31 13:34:39 -04:00
94139a7d95 Add warnings that amp is incomplete in 1.5 2020-03-31 10:49:45 -04:00
75e36186b2 [v1.5.0] Fix Caffe2 mobile compilation
Ports #35288
2020-03-30 17:17:59 -04:00
f4a0b406dd Warn a known autograd issue on XLA backend. 2020-03-30 17:16:39 -04:00
e884e720f0 [Windows] make torch_cuda's forced link also work for CMake
Was only working for ninja
2020-03-30 17:13:51 -04:00
dacdbc22d1 Revert "Fix handling of non-finite values in topk (#35253)" (#35582)
This reverts commit b12579da5398ff23b421332e21e18dc619a0b960.

This patch in and of itself looks fine, but it's causing some AMP tests to fail.
2020-03-27 17:44:03 -07:00
2a789cd0e0 [C++ API Parity] [Optimizers] Merged Optimizer and LossClosureOptimizer (#34957)
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualize all functions
5. Made defaults_ optional argument in all optimizers except SGD

**TODO**: add BC-breaking notes for this PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20678162

Pulled By: yf225

fbshipit-source-id: 74e062e42d86dc118f0fbaddd794e438b2eaf35a
2020-03-27 12:30:29 -04:00
f9b010f399 enforce rref JIT pickling to be in the scope of rpc calls (#34689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34689

RRef JIT pickling is only allowed inside RPC calls. This is enforced by adding a thread-local variable isInRpcCall that is set to True when converting RPC requests or responses to messages, before calling JIT::pickle(). Inside JIT::pickle(), pickling an RRef is allowed only when isInRpcCall is true.
ghstack-source-id: 100481001

Test Plan: unit tests

Differential Revision: D20429826

fbshipit-source-id: dbc04612ed15de5d6c7d75a4732041ccd4ef3f8c
2020-03-27 11:13:01 -04:00
55614ff306 Enforce rref python pickling to be in the scope of RPC call (#34755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34755

This diff disallows using the Python pickler to pickle RRef. RRef can only be pickled in the scope of an RPC call using _InternalRPCPickler.
ghstack-source-id: 100481337

Test Plan: unit tests

Differential Revision: D20453806

fbshipit-source-id: ebd4115ee01457ba6958cde805afd0a87c686612
2020-03-27 11:12:36 -04:00
b12579da53 Fix handling of non-finite values in topk (#35253)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34191

`at::native::radixSelect` basically uses integer comparison, which creates a defined ordering of non-finite float values. This isn't compatible with IEEE float comparison, so mixing the two leads to unwritten values in the output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35253

Differential Revision: D20645554

Pulled By: ezyang

fbshipit-source-id: 651bcb1742ed67086ec89cc318d862caae65b981
2020-03-27 10:53:18 -04:00
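A hedged repro sketch of the symptom described above (the original report gh-34191 concerned CUDA tensors, where radixSelect is used; on a fixed build every returned slot is written even with non-finite entries present):

```python
import torch

x = torch.tensor([1.0, float("nan"), float("inf"), -float("inf"), 0.5])
if torch.cuda.is_available():
    x = x.cuda()        # the bug was specific to the CUDA topk path

values, indices = x.topk(3)
print(values, indices)  # all three values must be written, nan/inf ordered consistently
```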
920e3eb761 Making sure all tensors in torch.cat sequence have the same dtype. (#35150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35150

Fixes #35014

Test Plan: Imported from OSS

Differential Revision: D20578589

Pulled By: z-a-f

fbshipit-source-id: edeaef133d1cf5152dcbafab2b969f1424ee2836
2020-03-26 16:49:11 -04:00
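A hedged sketch of the check being added (error wording is illustrative; later releases may type-promote instead of raising):

```python
import torch

a = torch.randn(2, 2)                  # float32
b = torch.randint(0, 10, (2, 2))       # int64

try:
    out = torch.cat([a, b])
    print(out.dtype)                   # newer releases may promote instead of erroring
except RuntimeError as e:
    print(e)                           # 1.5 with this fix: a clear same-dtype error
```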
bec01e755a Renaming: MultiLabelMarginLossFuncOptions -> MultilabelMarginLossFuncOptions, MultiLabelSoftMarginLossFuncOptions -> MultilabelSoftMarginLossFuncOptions
gh-metadata: pytorch pytorch 35163 gh/yf225/104/head
2020-03-26 14:31:21 -04:00
6a880e1bc9 Add inplace tests for several torch::nn modules / functionals
gh-metadata: pytorch pytorch 35147 gh/yf225/101/head
2020-03-26 14:31:21 -04:00
fa86e32a4e Fix F::interpolate and torch::nn::Upsample implementation
gh-metadata: pytorch pytorch 35025 gh/yf225/100/head
2020-03-26 14:31:21 -04:00
5aabaf2b18 Fix fractional_max_pool3d_with_indices implementation
gh-metadata: pytorch pytorch 35024 gh/yf225/99/head
2020-03-26 14:31:21 -04:00
4a707e8f95 Fix Conv and ConvTranspose implementation
gh-metadata: pytorch pytorch 35023 gh/yf225/98/head
2020-03-26 14:31:21 -04:00
db127b21eb Fix AdaptiveAvgPool{2,3}d and AdaptiveMaxPool{2,3}d implementation
gh-metadata: pytorch pytorch 35022 gh/yf225/97/head
2020-03-26 14:31:21 -04:00
45313cd9e1 [1.5 cherrypick] [C++ API Parity] Add xor_convergence test for lbfgs (#35440)
* add xor_convergence test for lbfgs

* increased batchsize to 6

* minor

* increased batch size

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
2020-03-26 14:22:55 -04:00
df531973e1 [ONNX] update producer version (#35059)
Summary:
Updating producer version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35059

Reviewed By: hl475

Differential Revision: D20585173

Pulled By: houseroad

fbshipit-source-id: af0c4e3860beb899548466ea99be2050150f905d
2020-03-26 13:56:57 -04:00
9e3c577caa Fix torch.mm export to ONNX (#34661)
Summary:
torch.mm is exported as the Gemm operator in ONNX, and both have an optional input: out.
out is considered broadcastable in Gemm, and during graph optimization the optional input (out) would get selected. Since out is optional, when it is not defined in torch.mm this would result in the following exception:
IndexError: vector::_M_range_check: __n (which is 2) >= this->size() (which is 2)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34661

Reviewed By: hl475

Differential Revision: D20496398

Pulled By: houseroad

fbshipit-source-id: e677aef0a6aefb1f83a54033153aaabe5c23bc0f
2020-03-26 13:55:18 -04:00
5357b8e4d9 .circleci: Remove python 2 binary builds (#35475)
Python 2 is EOL soon so we're dropping support as of v1.5.0

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-26 10:50:34 -07:00
0f23d23db4 Add docs to resize_ and resize_as_ (#35392)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35392

Test Plan: Imported from OSS

Differential Revision: D20650097

Pulled By: VitalyFedyunin

fbshipit-source-id: cff4f555d355dfee42394f6070fe3e466949aeb5
2020-03-26 12:23:04 -04:00
7c24280a3f Add docs about memory format (#34818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34818

Test Plan: Imported from OSS

Differential Revision: D20601336

Pulled By: VitalyFedyunin

fbshipit-source-id: d34ad226be950bf134c6b383a4810ea6aa75599e
2020-03-26 12:23:04 -04:00
7100f0be13 ports true_divide method variant to 1.5 (#35390)
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-03-26 11:50:00 -04:00
f7f611c2ec torch.cat: disallow inputs on different devices (#35053)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35045
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35053

Differential Revision: D20545517

Pulled By: ngimel

fbshipit-source-id: eee3fc87c7e578ff44d69d5ce6f92a8f496fa97b
2020-03-26 10:58:33 -04:00
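A hedged sketch of the rejected pattern (guarded so it only runs where a GPU is present):

```python
import torch

if torch.cuda.is_available():
    a = torch.randn(2, 2)                  # CPU tensor
    b = torch.randn(2, 2, device="cuda")   # GPU tensor
    try:
        torch.cat([a, b])
    except RuntimeError as e:
        print(e)   # expected after this change: inputs must all be on the same device
```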
acb982d0b0 Add TORCH_CUDA_API to FilterDescriptor (#35131)
Summary:
`FilterDescriptor` is missing a `TORCH_CUDA_API`, so this symbol is not exported from `torch_cuda.so`, and users could have trouble building cpp_extension when using cudnn.

cc: ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35131

Differential Revision: D20604439

Pulled By: ezyang

fbshipit-source-id: c57414fc8a9df9cb1e910e2ec0a48cfdbe7d1779
2020-03-26 10:57:59 -04:00
aa8b7ad989 Fix thread_local initializtion in C10 WarningHandler. (#34822)
Summary:
The Windows + MSVC-specific bug discussed here: https://github.com/pytorch/pytorch/issues/19394 and fixed here: https://github.com/pytorch/pytorch/issues/22405 still appears in C10's warning handler class. This results in a crash if a user attempts to run code which would print a warning when that code is running inside a thread created by a DLL. This PR applies a similar fix to that of https://github.com/pytorch/pytorch/issues/22405.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34822

Test Plan:
* Tested locally by running CodecverseWorkbench Unity app with patched build.
* CI

Differential Revision: D20627971

Pulled By: HapeMask

fbshipit-source-id: 64dfca531ed7eebbe9e0ecac3d3d4d025c683883
2020-03-25 20:02:45 -07:00
2d403ed8be Add python exception handling catch block to resolve deadlock (#35283) (#35402)
Summary:
Note: This PR has been merged into master after the 1.5.0 branch cut at
36e3c00 (see original PR: #35283). This PR is to cherry pick it into 1.5.

---- Original Commit Description Follows ---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35283

https://github.com/pytorch/pytorch/issues/34260

Deadlock on destructing py::error_already_set.

There are request callback impls in Python, where Python exceptions
could be thrown. To release Python exception py::objects, the GIL must
be held.

Differential Revision: D7753253

fbshipit-source-id: 4bfaaaf027e4254f5e3fedaca80228c8b4282e39

Co-authored-by: Shihao Xu <shihaoxu@fb.com>
2020-03-25 17:05:18 -07:00
c25a664f77 Trying pinning pyyaml and setuptools on macos to older version (#35296) (#35400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35296

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20624843

Pulled By: ezyang

fbshipit-source-id: 9028f1dd62d0c25e916eb4927fd8dd6acbd88886
(cherry picked from commit 3f896ef7435201b2c3f51851f80dc674dfadfd40)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Edward Yang <ezyang@fb.com>
2020-03-25 16:04:06 -07:00
ab660ae394 Fix Tensor __radd__ type hint issue (#35231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35231

Fixes #35213

(Note: this ignores all push blocking failures!)

Test Plan: `mypy -c "import torch; ten = torch.tensor([1.0, 2.0, 3.0]); print(7 + ten)"` should not produce any warnings

Differential Revision: D20604924

Pulled By: pbelevich

fbshipit-source-id: 53a293a99b3f2ab6ca5516b31f3a92f67eb67a39
2020-03-25 18:37:07 -04:00
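The Test Plan above in script form, as an illustration of the r-op dispatch whose annotation was corrected:

```python
import torch

ten = torch.tensor([1.0, 2.0, 3.0])
result = 7 + ten           # int.__add__ fails, so Python falls back to Tensor.__radd__
print(result)              # tensor([ 8.,  9., 10.])
```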
3c476a8858 PyTorch should always depend on future (#35057) (#35412)
Summary:
Because `past` is used in `caffe2.python.core`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35057

Test Plan: CI

Differential Revision: D20547042

Pulled By: malfet

fbshipit-source-id: cad2123c7b88271fea37f21e616df551075383a8
(cherry picked from commit d3f5045bf55e4a5dfb53ceccb6130e4e408cf466)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-03-25 14:54:26 -07:00
651fa88645 Load all DLLs in the lib directory for Windows (v.1.5.0) 2020-03-25 16:23:22 -04:00
565c3400b4 Update view op list. 2020-03-25 16:14:08 -04:00
3e332778b4 non blocking copy from #35144 2020-03-25 14:54:41 -04:00
f598738920 UBSAN deliberate float to int fix 2020-03-25 11:24:30 -04:00
4c6bfa0187 [1.5 cherrypick][JIT] Namespaces for TorchBind 2020-03-25 11:23:03 -04:00
6f25003682 [1.5 cherrypick][JIT] BC shim for TorchBind classes 2020-03-25 11:23:03 -04:00
752c129fa1 Update docs about DP and DDP for CUDA (#35063)
Summary:
We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063

Differential Revision: D20549621

Pulled By: ngimel

fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
2020-03-25 11:18:17 -04:00
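A minimal single-process-per-GPU DDP sketch of the recommended pattern (assumes the process group is already initialized, e.g. by a launcher; `local_rank` and `wrap_model` are illustrative names, not part of the docs change):

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(local_rank):
    # One process per GPU: move the module to its own GPU, then wrap it in DDP.
    model = nn.Linear(10, 10).to(local_rank)
    return DDP(model, device_ids=[local_rank])
```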
fb59a9caca .circleci: Change default CUDA for pip, cu101 -> cu102 (#35310)
So that packages are correctly marked when looking through the html
pages.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-24 15:05:25 -07:00
4d30dbdd35 Pin XLA CI to use r1.5 release branch. 2020-03-24 17:54:31 -04:00
b7f4a1a397 .circleci: Switch master to release/1.5 for git merge (#35320)
Since we're on a release branch we'll need to fix this up to do a merge
for release/1.5 instead of master.

TODO: In the future we should have a dynamic way of gathering the base
branch for PRs.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-24 14:52:24 -07:00
afda1dc943 Revert "Fix AdaptiveAvgPool{2,3}d and AdaptiveMaxPool{2,3}d implementation"
This reverts commit e2184ba08352d730d7165455c14f783b3e54082a.
2020-03-24 14:09:18 -04:00
d506ae882b Revert "Fix Conv and ConvTranspose implementation"
This reverts commit 88778854546b08bc6dd9f68e0a64311902c7d30c.
2020-03-24 14:09:18 -04:00
36e5abe531 Revert "Fix fractional_max_pool3d_with_indices implementation"
This reverts commit b89eb7c654b846fb3391cf4cc5aeb536cc41f1d7.
2020-03-24 14:09:18 -04:00
6e6f62230e Revert "Fix F::interpolate and torch::nn::Upsample implementation"
This reverts commit 75148df1f56c91f54965b530d606a6b9a4c8e269.
2020-03-24 14:09:18 -04:00
5d15577e6c Revert "Add inplace tests for several torch::nn modules / functionals"
This reverts commit 48590d6a9b939fb8097e4f2108872721ea5a516f.
2020-03-24 14:09:18 -04:00
6aa5298c5c Revert "Renaming: MultiLabelMarginLossFuncOptions -> MultilabelMarginLossFuncOptions, MultiLabelSoftMarginLossFuncOptions -> MultilabelSoftMarginLossFuncOptions"
This reverts commit 5ca901431886d60687275b9a310eac5b5aeba02f.
2020-03-24 14:09:18 -04:00
f3df13725b Revert "[1.5 cherrypick] [C++ API Parity] Add xor_convergence test for lbfgs (#35113)"
This reverts commit 246b824644c3731b00be6119f69795afd4eac9b6.
2020-03-24 14:08:56 -04:00
4eee3caa11 [release/1.5] .circleci: Fix unbound CIRCLE_TAG variable (#35242)
Was failing when trying to execute this script on a non-tag

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-23 16:21:44 -07:00
4d96463130 Updating fbgemm 2020-03-23 13:31:24 -07:00
246b824644 [1.5 cherrypick] [C++ API Parity] Add xor_convergence test for lbfgs (#35113)
* add xor_convergence test for lbfgs

* increased batchsize to 6

* minor

* increased batch size
2020-03-23 16:00:57 -04:00
5ca9014318 Renaming: MultiLabelMarginLossFuncOptions -> MultilabelMarginLossFuncOptions, MultiLabelSoftMarginLossFuncOptions -> MultilabelSoftMarginLossFuncOptions 2020-03-23 15:55:18 -04:00
48590d6a9b Add inplace tests for several torch::nn modules / functionals
gh-metadata: pytorch pytorch 35147 gh/yf225/101/head
2020-03-23 15:55:18 -04:00
75148df1f5 Fix F::interpolate and torch::nn::Upsample implementation
gh-metadata: pytorch pytorch 35025 gh/yf225/100/head
2020-03-23 15:55:18 -04:00
b89eb7c654 Fix fractional_max_pool3d_with_indices implementation
gh-metadata: pytorch pytorch 35024 gh/yf225/99/head
2020-03-23 15:55:18 -04:00
8877885454 Fix Conv and ConvTranspose implementation
gh-metadata: pytorch pytorch 35023 gh/yf225/98/head
2020-03-23 15:55:18 -04:00
e2184ba083 Fix AdaptiveAvgPool{2,3}d and AdaptiveMaxPool{2,3}d implementation
gh-metadata: pytorch pytorch 35022 gh/yf225/97/head
2020-03-23 15:55:18 -04:00
8ef47ad2f0 Updating fbgemm 2020-03-23 10:08:52 -07:00
6725b6f503 .circleci: Refactor how to grab the tagged version
Discovered that the upload scripts do not do well when there's no
pytorch repository to actually do git operations on.

CircleCI, however, provides a nice environment variable with the name of
the current tag, so let's just use that when it's available and fall back
on the git describe functionality if that fails.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 16:34:57 -07:00
bcd3f6da1a .circleci: Remove quotes from --git-dir
git doesn't handle the escapes correctly, so let's just not put them in at all.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:39:31 -07:00
0b3d2f7b7d .circleci: Make sure to add .git to --git-dir
--git-dir only works when it points directly to a .git folder

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:28:23 -07:00
f522651a7e .circleci: Switch git -C -> git --git-dir
Older versions of git do not contain the '-C' flag so let's switch to a
flag that is pre-historic and will run on any version of RHEL that is
still supported in the modern era.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:22:44 -07:00
01c8ef2757 .circleci: One more -C to add to get correct git info
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:08:02 -07:00
7cfe68ce3a .circleci: Hardcode directory to /pytorch to ensure git
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 14:54:57 -07:00
6f3120c6b9 .circleci: Ensure describe happens in pytorch repo
Found an issue where the git describe wasn't properly executed since the
binary_populate_env.sh script was being executed from a different
directory.

'git -C' forces the describe to run in the running directory for the
script which should contain the correct git information

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 14:24:18 -07:00
380 changed files with 7619 additions and 4886 deletions


@ -466,7 +466,7 @@ But if you want to try, then I'd recommend
# Always install miniconda 3, even if building for Python <3
new_conda="~/my_new_conda"
conda_sh="$new_conda/install_miniconda.sh"
curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
curl -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"
rm -f "$conda_sh"


@ -34,8 +34,6 @@ def get_processor_arch_name(cuda_version):
LINUX_PACKAGE_VARIANTS = OrderedDict(
manywheel=[
"2.7m",
"2.7mu",
"3.5m",
"3.6m",
"3.7m",
@ -43,7 +41,7 @@ LINUX_PACKAGE_VARIANTS = OrderedDict(
],
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"2.7m",
"3.7m",
],
)
@ -53,11 +51,21 @@ CONFIG_TREE_DATA = OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"2.7",
"3.7",
],
)),
windows=(dimensions.CUDA_VERSIONS, OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"3.7",
],
)),
)
CONFIG_TREE_DATA_NO_WINDOWS = CONFIG_TREE_DATA.copy()
CONFIG_TREE_DATA_NO_WINDOWS.pop("windows")
# GCC config variants:
#
# All the nightlies (except libtorch with new gcc ABI) are built with devtoolset7,
@ -74,6 +82,11 @@ LINUX_GCC_CONFIG_VARIANTS = OrderedDict(
],
)
WINDOWS_LIBTORCH_CONFIG_VARIANTS = [
"debug",
"release",
]
class TopLevelNode(ConfigNode):
def __init__(self, node_name, config_tree_data, smoke):
@ -108,6 +121,8 @@ class PackageFormatConfigNode(ConfigNode):
def get_children(self):
if self.find_prop("os_name") == "linux":
return [LinuxGccConfigNode(self, v) for v in LINUX_GCC_CONFIG_VARIANTS[self.find_prop("package_format")]]
elif self.find_prop("os_name") == "windows" and self.find_prop("package_format") == "libtorch":
return [WindowsLibtorchConfigNode(self, v) for v in WINDOWS_LIBTORCH_CONFIG_VARIANTS]
else:
return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
@ -129,6 +144,16 @@ class LinuxGccConfigNode(ConfigNode):
return [ArchConfigNode(self, v) for v in cuda_versions]
class WindowsLibtorchConfigNode(ConfigNode):
def __init__(self, parent, libtorch_config_variant):
super(WindowsLibtorchConfigNode, self).__init__(parent, "LIBTORCH_CONFIG_VARIANT=" + str(libtorch_config_variant))
self.props["libtorch_config_variant"] = libtorch_config_variant
def get_children(self):
return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
class ArchConfigNode(ConfigNode):
def __init__(self, parent, cu):
super(ArchConfigNode, self).__init__(parent, get_processor_arch_name(cu))


@ -6,7 +6,7 @@ import cimodel.lib.miniutils as miniutils
class Conf(object):
def __init__(self, os, cuda_version, pydistro, parms, smoke, libtorch_variant, gcc_config_variant):
def __init__(self, os, cuda_version, pydistro, parms, smoke, libtorch_variant, gcc_config_variant, libtorch_config_variant):
self.os = os
self.cuda_version = cuda_version
@ -15,11 +15,14 @@ class Conf(object):
self.smoke = smoke
self.libtorch_variant = libtorch_variant
self.gcc_config_variant = gcc_config_variant
self.libtorch_config_variant = libtorch_config_variant
def gen_build_env_parms(self):
elems = [self.pydistro] + self.parms + [binary_build_data.get_processor_arch_name(self.cuda_version)]
if self.gcc_config_variant is not None:
elems.append(str(self.gcc_config_variant))
if self.libtorch_config_variant is not None:
elems.append(str(self.libtorch_config_variant))
return elems
def gen_docker_image(self):
@ -67,9 +70,14 @@ class Conf(object):
job_def["requires"].append("update_s3_htmls_for_nightlies_devtoolset7")
job_def["filters"] = {"branches": {"only": "postnightly"}}
else:
filter_branches = ["nightly"]
# we only want to add the release branch filter if we aren't
# uploading
if phase not in ["upload"]:
filter_branches.append(r"/release\/.*/")
job_def["filters"] = {
"branches": {
"only": "nightly"
"only": filter_branches
},
# Will run on tags like v1.5.0-rc1, etc.
"tags": {
@ -105,11 +113,18 @@ class Conf(object):
def get_root(smoke, name):
return binary_build_data.TopLevelNode(
name,
binary_build_data.CONFIG_TREE_DATA,
smoke,
)
if smoke:
return binary_build_data.TopLevelNode(
name,
binary_build_data.CONFIG_TREE_DATA_NO_WINDOWS,
smoke,
)
else:
return binary_build_data.TopLevelNode(
name,
binary_build_data.CONFIG_TREE_DATA,
smoke,
)
def gen_build_env_list(smoke):
@ -127,6 +142,7 @@ def gen_build_env_list(smoke):
c.find_prop("smoke"),
c.find_prop("libtorch_variant"),
c.find_prop("gcc_config_variant"),
c.find_prop("libtorch_config_variant"),
)
newlist.append(conf)


@ -4,7 +4,6 @@ from cimodel.lib.conf_tree import Ver
CONFIG_TREE_DATA = [
(Ver("ubuntu", "16.04"), [
([Ver("gcc", "5")], [XImportant("onnx_py2")]),
([Ver("clang", "7")], [XImportant("onnx_main_py3.6"),
XImportant("onnx_ort1_py3.6"),
XImportant("onnx_ort2_py3.6")]),


@ -33,8 +33,7 @@ class Conf:
# TODO: Eventually we can probably just remove the cudnn7 everywhere.
def get_cudnn_insertion(self):
omit = self.language == "onnx_py2" \
or self.language == "onnx_main_py3.6" \
omit = self.language == "onnx_main_py3.6" \
or self.language == "onnx_ort1_py3.6" \
or self.language == "onnx_ort2_py3.6" \
or set(self.compiler_names).intersection({"android", "mkl", "clang"}) \
@ -71,11 +70,10 @@ class Conf:
def gen_docker_image(self):
lang_substitutions = {
"onnx_py2": "py2",
"onnx_main_py3.6": "py3.6",
"onnx_ort1_py3.6": "py3.6",
"onnx_ort2_py3.6": "py3.6",
"cmake": "py2",
"cmake": "py3",
}
lang = miniutils.override(self.language, lang_substitutions)
@ -85,7 +83,7 @@ class Conf:
def gen_workflow_params(self, phase):
parameters = OrderedDict()
lang_substitutions = {
"onnx_py2": "onnx-py2",
"onnx_py3": "onnx-py3",
"onnx_main_py3.6": "onnx-main-py3.6",
"onnx_ort1_py3.6": "onnx-ort1-py3.6",
"onnx_ort2_py3.6": "onnx-ort2-py3.6",
@ -129,7 +127,7 @@ class Conf:
job_name = "caffe2_" + self.get_platform() + "_build"
if not self.is_important:
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/", r"/release\/.*/"]}}
job_def.update(self.gen_workflow_params(phase))
return {job_name : job_def}


@ -8,7 +8,6 @@ CUDA_VERSIONS = [
]
STANDARD_PYTHON_VERSIONS = [
"2.7",
"3.5",
"3.6",
"3.7",


@ -114,7 +114,7 @@ class Conf:
if not self.is_important:
# If you update this, update
# caffe2_build_definitions.py too
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/", r"/release\/.*/"]}}
job_def.update(self.gen_workflow_params(phase))
return {job_name : job_def}

File diff suppressed because it is too large.


@ -4,7 +4,7 @@ set -ex
# Optionally install conda
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://repo.continuum.io/miniconda"
BASE_URL="https://repo.anaconda.com/miniconda"
MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1)


@ -10,6 +10,11 @@ retry () {
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ "$OSTYPE" == "msys" ]]; then
# windows executor (builds and tests)
rm -rf /c/w
ln -s "/c/Users/circleci/project" /c/w
workdir="/c/w"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
@ -19,8 +24,14 @@ else
fi
# It is very important that this stays in sync with binary_populate_env.sh
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
if [[ "$OSTYPE" == "msys" ]]; then
# We need to make the paths as short as possible on Windows
export PYTORCH_ROOT="$workdir/p"
export BUILDER_ROOT="$workdir/b"
else
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
fi
# Clone the Pytorch branch
retry git clone https://github.com/pytorch/pytorch.git "$PYTORCH_ROOT"


@ -31,9 +31,9 @@ fi
conda_sh="$workdir/install_miniconda.sh"
if [[ "$(uname)" == Darwin ]]; then
curl --retry 3 -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
curl --retry 3 -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
else
curl --retry 3 -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
curl --retry 3 -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
fi
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"


@ -2,11 +2,31 @@
set -eux -o pipefail
export TZ=UTC
tagged_version() {
# Grabs version from either the env variable CIRCLE_TAG
# or the pytorch git described version
if [[ "$OSTYPE" == "msys" ]]; then
GIT_DESCRIBE="git --git-dir ${workdir}/p/.git describe"
else
GIT_DESCRIBE="git --git-dir ${workdir}/pytorch/.git describe"
fi
if [[ -n "${CIRCLE_TAG:-}" ]]; then
echo "${CIRCLE_TAG}"
elif ${GIT_DESCRIBE} --exact --tags >/dev/null; then
${GIT_DESCRIBE} --tags
else
return 1
fi
}
# We need to write an envfile to persist these variables to following
# steps, but the location of the envfile depends on the circleci executor
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ "$OSTYPE" == "msys" ]]; then
# windows executor (builds and tests)
workdir="/c/w"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
@ -23,7 +43,15 @@ configs=($BUILD_ENVIRONMENT)
export PACKAGE_TYPE="${configs[0]}"
export DESIRED_PYTHON="${configs[1]}"
export DESIRED_CUDA="${configs[2]}"
export DESIRED_DEVTOOLSET="${configs[3]:-}"
if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
export DESIRED_DEVTOOLSET=""
export LIBTORCH_CONFIG="${configs[3]:-}"
if [[ "$LIBTORCH_CONFIG" == 'debug' ]]; then
export DEBUG=1
fi
else
export DESIRED_DEVTOOLSET="${configs[3]:-}"
fi
if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
export BUILD_PYTHONLESS=1
fi
@ -47,15 +75,17 @@ export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="1.5.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
if git describe --tags --exact >/dev/null 2>/dev/null; then
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
if tagged_version >/dev/null; then
# Switch upload folder to 'test/' if we are on a tag
PIP_UPLOAD_FOLDER='test/'
# Grab git tag, remove prefixed v and remove everything after -
# Used to clean up tags that are for release candidates like v1.5.0-rc1
# Turns tag v1.5.0-rc1 -> v1.5.0
BASE_BUILD_VERSION="$(git describe --tags | sed -e 's/^v//' -e 's/-.*$//')"
BASE_BUILD_VERSION="$(tagged_version | sed -e 's/^v//' -e 's/-.*$//')"
fi
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu101" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu102" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}"
else
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"
@ -94,6 +124,10 @@ export DESIRED_CUDA="$DESIRED_CUDA"
export LIBTORCH_VARIANT="${LIBTORCH_VARIANT:-}"
export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
export LIBTORCH_CONFIG="${LIBTORCH_CONFIG:-}"
export DEBUG="${DEBUG:-}"
fi
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.5.0.dev
@ -113,8 +147,13 @@ export DOCKER_IMAGE="$DOCKER_IMAGE"
export workdir="$workdir"
export MAC_PACKAGE_WORK_DIR="$workdir"
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
if [[ "$OSTYPE" == "msys" ]]; then
export PYTORCH_ROOT="$workdir/p"
export BUILDER_ROOT="$workdir/b"
else
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
fi
export MINICONDA_ROOT="$workdir/miniconda"
export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"


@ -0,0 +1,33 @@
#!/bin/bash
set -eux -o pipefail
source "/c/w/env"
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export VC_YEAR=2017
export USE_SCCACHE=1
export SCCACHE_BUCKET=ossci-compiler-cache-windows
export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
set -x
if [[ "$CIRCLECI" == 'true' && -d "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019"
fi
echo "Free space on filesystem before build:"
df -h
pushd "$BUILDER_ROOT"
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
./windows/internal/build_conda.bat
elif [[ "$PACKAGE_TYPE" == 'wheel' || "$PACKAGE_TYPE" == 'libtorch' ]]; then
./windows/internal/build_wheels.bat
fi
echo "Free space on filesystem after build:"
df -h


@ -0,0 +1,37 @@
#!/bin/bash
set -eu -o pipefail
set +x
declare -x "AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
declare -x "AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -eux -o pipefail
source "/env"
# This gets set in binary_populate_env.sh, but lets have a sane default just in case
PIP_UPLOAD_FOLDER=${PIP_UPLOAD_FOLDER:-nightly/}
# TODO: Combine CONDA_UPLOAD_CHANNEL and PIP_UPLOAD_FOLDER into one variable
# The only difference is the trailing slash
# Strip trailing slashes if there
CONDA_UPLOAD_CHANNEL=$(echo "${PIP_UPLOAD_FOLDER}" | sed 's:/*$::')
pushd /root/workspace/final_pkgs
# Upload the package to the final location
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
anaconda -t "${CONDA_PYTORCHBOT_TOKEN}" upload "$(ls)" -u "pytorch-${CONDA_UPLOAD_CHANNEL}" --label main --no-progress --force
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
retry conda install -c conda-forge -yq awscli
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
else
retry conda install -c conda-forge -yq awscli
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
fi


@ -72,10 +72,10 @@ time python tools/setup_helpers/generate_code.py \
# Build the docs
pushd docs/cpp
pip install breathe>=4.13.0 bs4 lxml six
pip install breathe==4.13.0 bs4 lxml six
pip install --no-cache-dir -e "git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme"
pip install exhale>=0.2.1
pip install sphinx>=2.0
pip install sphinx==2.4.4
# Uncomment once it is fixed
# pip install -r requirements.txt
time make VERBOSE=1 html -j


@ -52,3 +52,12 @@ binary_mac_params: &binary_mac_params
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
binary_windows_params: &binary_windows_params
parameters:
build_environment:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
BUILD_FOR_SYSTEM: windows


@ -275,3 +275,46 @@
script="/Users/distiller/project/.circleci/scripts/binary_ios_upload.sh"
cat "$script"
source "$script"
binary_windows_build:
<<: *binary_windows_params
executor:
name: windows-cpu-with-nvidia-cuda
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_scripts
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Build
no_output_timeout: "1h"
command: |
set -eux -o pipefail
script="/c/w/p/.circleci/scripts/binary_windows_build.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: "C:/w"
paths: final_pkgs
binary_windows_upload:
<<: *binary_windows_params
docker:
- image: continuumio/miniconda
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- attach_scripts
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Upload
no_output_timeout: "10m"
command: |
set -eux -o pipefail
script="/pytorch/.circleci/scripts/binary_windows_upload.sh"
cat "$script"
source "$script"


@ -151,7 +151,7 @@
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl --retry 3 -o ${TMPDIR}/conda.sh https://repo.continuum.io/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
curl --retry 3 -o ${TMPDIR}/conda.sh https://repo.anaconda.com/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
chmod +x ${TMPDIR}/conda.sh
/bin/bash ${TMPDIR}/conda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/conda.sh


@ -20,16 +20,16 @@ jobs:
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
# TODO We may want to move the rebase logic to a separate step after checkout
# Rebase to master only if in xenial_py3_6_gcc5_4 case
if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
# Rebase to release/1.5 only if in xenial_py3_6_gcc5_4 case
if [[ "${CIRCLE_BRANCH}" != "release/1.5" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
echo "Merge release/1.5 branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
set -x
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
git config --add remote.origin.fetch +refs/heads/release/1.5:refs/remotes/origin/release/1.5
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/release/1.5:refs/remotes/origin/release/1.5 --depth=100 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/release/1.5`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
export GIT_COMMIT=${CIRCLE_SHA1}
echo "GIT_COMMIT: " ${GIT_COMMIT}
@ -38,7 +38,7 @@ jobs:
git merge --allow-unrelated-histories --no-edit --no-ff ${GIT_MERGE_TARGET}
set +x
else
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
echo "Do NOT merge release/1.5 branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
fi
git submodule sync && git submodule update -q --init --recursive


@ -15,6 +15,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.11_py36_cuda10.1_test1
test_name: pytorch-windows-test1
@ -32,6 +33,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.11_py36_cuda10.1_test2
test_name: pytorch-windows-test2
@ -49,6 +51,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_build:
name: pytorch_windows_vs2017_14.16_py36_cuda10.1_build
cuda_version: "10"
@ -64,6 +67,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.16_py36_cuda10.1_test1
test_name: pytorch-windows-test1
@ -81,6 +85,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.16_py36_cuda10.1_test2
test_name: pytorch-windows-test2
@ -98,6 +103,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_build:
name: pytorch_windows_vs2019_py36_cuda10.1_build
cuda_version: "10"


@ -7,12 +7,6 @@
# pytorch-ci-hud to adjust the list of whitelisted builds
# at https://github.com/ezyang/pytorch-ci-hud/blob/master/src/BuildHistoryDisplay.js
- binary_linux_build:
name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_build
build_environment: "manywheel 2.7mu cpu devtoolset7"
requires:
- setup
docker_image: "pytorch/manylinux-cuda102"
- binary_linux_build:
name: binary_linux_manywheel_3_7m_cu102_devtoolset7_build
build_environment: "manywheel 3.7m cu102 devtoolset7"
@ -23,24 +17,21 @@
branches:
only:
- master
- binary_linux_build:
name: binary_linux_conda_2_7_cpu_devtoolset7_build
build_environment: "conda 2.7 cpu devtoolset7"
requires:
- setup
docker_image: "pytorch/conda-cuda"
- /ci-all\/.*/
- /release\/.*/
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_build
# TODO rename to remove python version for libtorch
- binary_linux_build:
name: binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build
build_environment: "libtorch 2.7m cpu devtoolset7"
name: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build
build_environment: "libtorch 3.7m cpu devtoolset7"
requires:
- setup
libtorch_variant: "shared-with-deps"
docker_image: "pytorch/manylinux-cuda102"
- binary_linux_build:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
name: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
build_environment: "libtorch 3.7m cpu gcc5.4_cxx11-abi"
requires:
- setup
libtorch_variant: "shared-with-deps"
@ -48,45 +39,51 @@
# TODO we should test a libtorch cuda build, but they take too long
# - binary_linux_libtorch_2_7m_cu90_devtoolset7_static-without-deps_build
- binary_mac_build:
name: binary_macos_wheel_3_6_cpu_build
build_environment: "wheel 3.6 cpu"
requires:
- setup
filters:
branches:
only:
- master
- binary_mac_build:
name: binary_macos_conda_2_7_cpu_build
build_environment: "conda 2.7 cpu"
name: binary_macos_wheel_3_7_cpu_build
build_environment: "wheel 3.7 cpu"
requires:
- setup
filters:
branches:
only:
- master
- /ci-all\/.*/
- /release\/.*/
# This job has an average run time of 3 hours o.O
# Now only running this on master to reduce overhead
# TODO rename to remove python version for libtorch
- binary_mac_build:
name: binary_macos_libtorch_2_7_cpu_build
build_environment: "libtorch 2.7 cpu"
name: binary_macos_libtorch_3_7_cpu_build
build_environment: "libtorch 3.7 cpu"
requires:
- setup
filters:
branches:
only:
- master
- binary_linux_test:
name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_test
build_environment: "manywheel 2.7mu cpu devtoolset7"
- /ci-all\/.*/
- /release\/.*/
- binary_windows_build:
name: binary_windows_libtorch_3_7_cpu_debug_build
build_environment: "libtorch 3.7 cpu debug"
requires:
- setup
- binary_windows_build:
name: binary_windows_libtorch_3_7_cpu_release_build
build_environment: "libtorch 3.7 cpu release"
requires:
- setup
- binary_windows_build:
name: binary_windows_wheel_3_7_cu102_build
build_environment: "wheel 3.7 cu102"
requires:
- setup
- binary_linux_manywheel_2_7mu_cpu_devtoolset7_build
docker_image: "pytorch/manylinux-cuda102"
filters:
branches:
only:
- master
- /ci-all\/.*/
- /release\/.*/
- binary_linux_test:
name: binary_linux_manywheel_3_7m_cu102_devtoolset7_test
build_environment: "manywheel 3.7m cu102 devtoolset7"
@ -100,29 +97,25 @@
branches:
only:
- master
- binary_linux_test:
name: binary_linux_conda_2_7_cpu_devtoolset7_test
build_environment: "conda 2.7 cpu devtoolset7"
requires:
- setup
- binary_linux_conda_2_7_cpu_devtoolset7_build
docker_image: "pytorch/conda-cuda"
- /ci-all\/.*/
- /release\/.*/
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_test:
# TODO rename to remove python version for libtorch
- binary_linux_test:
name: binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_test
build_environment: "libtorch 2.7m cpu devtoolset7"
name: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_test
build_environment: "libtorch 3.7m cpu devtoolset7"
requires:
- setup
- binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build
- binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "pytorch/manylinux-cuda102"
- binary_linux_test:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
name: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test
build_environment: "libtorch 3.7m cpu gcc5.4_cxx11-abi"
requires:
- setup
- binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
- binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest"


@ -20,21 +20,12 @@
- docker_build_job:
name: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
image_name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9-cudnn7-py3"
image_name: "pytorch-linux-xenial-cuda9-cudnn7-py3"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-py2.7.9"
image_name: "pytorch-linux-xenial-py2.7.9"
- docker_build_job:
name: "pytorch-linux-xenial-py2.7"
image_name: "pytorch-linux-xenial-py2.7"
- docker_build_job:
name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
image_name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c"


@ -4,6 +4,8 @@
branches:
only:
- master
- /ci-all\/.*/
- /release\/.*/
requires:
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
@ -13,6 +15,8 @@
branches:
only:
- master
- /ci-all\/.*/
- /release\/.*/
requires:
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build


@ -31,6 +31,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
build_environment: "pytorch-linux-xenial-py3-clang5-mobile-code-analysis"
build_only: "1"
# Use LLVM-DEV toolchain in android-ndk-r19c docker image


@ -81,44 +81,6 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
flake8-py2:
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 2.x
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@v1
- name: Checkout PR tip
run: |
set -eux
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# We are on a PR, so actions/checkout leaves us on a merge commit.
# Check out the actual tip of the branch.
git checkout ${{ github.event.pull_request.head.sha }}
fi
echo ::set-output name=commit_sha::$(git rev-parse HEAD)
id: get_pr_tip
- name: Run flake8
run: |
set -eux
pip install flake8
rm -rf .circleci tools/clang_format_new.py
flake8 --exit-zero > ${GITHUB_WORKSPACE}/flake8-output.txt
cat ${GITHUB_WORKSPACE}/flake8-output.txt
- name: Add annotations
uses: pytorch/add-annotations-github-action@master
with:
check_name: 'flake8-py2'
linter_output_path: 'flake8-output.txt'
commit_sha: ${{ steps.get_pr_tip.outputs.commit_sha }}
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w\d+) (?<errorDesc>.*)'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
clang-tidy:
if: github.event_name == 'pull_request'
runs-on: ubuntu-latest
@ -198,6 +160,8 @@ jobs:
-g"-torch/csrc/jit/export.cpp" \
-g"-torch/csrc/jit/import.cpp" \
-g"-torch/csrc/jit/netdef_converter.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
"$@" > ${GITHUB_WORKSPACE}/clang-tidy-output.txt
cat ${GITHUB_WORKSPACE}/clang-tidy-output.txt


@ -167,7 +167,7 @@ fi
# Patch required to build xla
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
git clone --recursive https://github.com/pytorch/xla.git
git clone --recursive -b r1.5 https://github.com/pytorch/xla.git
./xla/scripts/apply_patches.sh
fi
@ -259,7 +259,7 @@ if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# XLA build requires Bazel
# We use bazelisk to avoid updating Bazel version manually.
sudo npm install -g @bazel/bazelisk
sudo ln -s "$(command -v bazelisk)" /usr/bin/bazel
sudo ln -sf "$(command -v bazelisk)" /usr/bin/bazel
# Install bazels3cache for cloud cache
sudo npm install -g bazels3cache


@ -13,12 +13,12 @@ mkdir -p ${WORKSPACE_DIR}
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then
mkdir -p ${WORKSPACE_DIR}
curl --retry 3 https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
curl --retry 3 https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
retry bash ${WORKSPACE_DIR}/miniconda3.sh -b -p ${WORKSPACE_DIR}/miniconda3
fi
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
source ${WORKSPACE_DIR}/miniconda3/bin/activate
retry conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
retry conda install -y mkl mkl-include numpy pyyaml=5.3 setuptools=46.0.0 cmake cffi ninja
# The torch.hub tests make requests to GitHub.
#


@ -20,7 +20,7 @@ if [ -n "${IN_CIRCLECI}" ]; then
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-cudnn7-py3* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev


@ -21,7 +21,7 @@ if [ -n "${IN_CIRCLECI}" ]; then
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-cudnn7-py3* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
@ -244,7 +244,7 @@ test_backward_compatibility() {
pushd test/backward_compatibility
python dump_all_function_schemas.py --filename new_schemas.txt
pip_uninstall torch
pip_install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip_install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
python check_backward_compatibility.py --new-schemas new_schemas.txt
popd
set +x


@ -5,7 +5,7 @@ if "%BUILD_ENVIRONMENT%"=="" (
)
if "%REBUILD%"=="" (
IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
curl --retry 3 -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
curl --retry 3 -k https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
%TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3


@ -13,7 +13,7 @@ if "%BUILD_ENVIRONMENT%"=="" (
)
if NOT "%BUILD_ENVIRONMENT%"=="" (
IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
curl --retry 3 https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
curl --retry 3 https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
%TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3


@ -160,20 +160,18 @@ ENDIF(BLAS_FOUND)
IF(LAPACK_FOUND)
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
if(USE_CUDA)
if(USE_CUDA AND MSVC)
# Although Lapack provides CPU (and thus, one might expect that ATen_cuda
# would not need this at all), some of our libraries (magma in particular)
# backend to CPU BLAS/LAPACK implementations, and so it is very important
# we get the *right* implementation, because even if the symbols are the
# same, LAPACK implementations may have different calling conventions.
# This caused https://github.com/pytorch/pytorch/issues/7353
#
# We do NOT do this on Linux, since we just rely on torch_cpu to
# provide all of the symbols we need
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
endif()
if(USE_ROCM)
# It's not altogether clear that HIP behaves the same way, but it
# seems safer to assume that it needs it too
list(APPEND ATen_HIP_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
endif()
ENDIF(LAPACK_FOUND)
IF (UNIX AND NOT APPLE)
@ -331,8 +329,12 @@ IF(USE_CUDA AND NOT USE_ROCM)
IF(USE_MAGMA)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${MAGMA_LIBRARIES})
IF ($ENV{TH_BINARY_BUILD})
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
IF (MSVC)
# Do not do this on Linux: see Note [Extra MKL symbols for MAGMA in torch_cpu]
# in caffe2/CMakeLists.txt
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
ENDIF(MSVC)
ENDIF($ENV{TH_BINARY_BUILD})
ENDIF(USE_MAGMA)
IF ($ENV{ATEN_STATIC_CUDA})


@ -125,13 +125,15 @@ void _parallel_run(
std::tie(num_tasks, chunk_size) =
internal::calc_num_tasks_and_chunk_size(begin, end, grain_size);
std::atomic_flag err_flag = ATOMIC_FLAG_INIT;
std::exception_ptr eptr;
std::vector<std::shared_ptr<c10::ivalue::Future>> futures(num_tasks);
for (size_t task_id = 0; task_id < num_tasks; ++task_id) {
futures[task_id] = std::make_shared<c10::ivalue::Future>(c10::NoneType::get());
}
auto task = [f, &eptr, &err_flag, &futures, begin, end, chunk_size]
struct {
std::atomic_flag err_flag = ATOMIC_FLAG_INIT;
std::exception_ptr eptr;
std::mutex mutex;
volatile size_t remaining;
std::condition_variable cv;
} state;
auto task = [f, &state, begin, end, chunk_size]
(int /* unused */, size_t task_id) {
int64_t local_start = begin + task_id * chunk_size;
if (local_start < end) {
@ -140,21 +142,30 @@ void _parallel_run(
ParallelRegionGuard guard(task_id);
f(local_start, local_end, task_id);
} catch (...) {
if (!err_flag.test_and_set()) {
eptr = std::current_exception();
if (!state.err_flag.test_and_set()) {
state.eptr = std::current_exception();
}
}
}
futures[task_id]->markCompleted();
{
std::unique_lock<std::mutex> lk(state.mutex);
if (--state.remaining == 0) {
state.cv.notify_one();
}
}
};
state.remaining = num_tasks;
_run_with_pool(task, num_tasks);
// Wait for all tasks to finish.
for (size_t task_id = 0; task_id < num_tasks; ++task_id) {
futures[task_id]->wait();
{
std::unique_lock<std::mutex> lk(state.mutex);
if (state.remaining != 0) {
state.cv.wait(lk);
}
}
if (eptr) {
std::rethrow_exception(eptr);
if (state.eptr) {
std::rethrow_exception(state.eptr);
}
}
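The hunk above replaces the per-task ivalue::Future objects with one shared state (error flag, exception pointer, mutex, remaining-task counter, condition variable): the launcher waits until the counter reaches zero, then rethrows the first captured exception. A minimal standalone sketch of that completion pattern, using std::thread in place of ATen's intra-op thread pool (run_tasks is an illustrative name, not the actual ATen code):

#include <condition_variable>
#include <exception>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of the "counter + condition variable" completion scheme.
void run_tasks(size_t num_tasks, const std::function<void(size_t)>& f) {
  std::mutex mutex;
  std::condition_variable cv;
  size_t remaining = num_tasks;
  std::exception_ptr eptr;

  std::vector<std::thread> threads;
  for (size_t id = 0; id < num_tasks; ++id) {
    threads.emplace_back([&, id] {
      try {
        f(id);
      } catch (...) {
        std::lock_guard<std::mutex> lk(mutex);
        if (!eptr) eptr = std::current_exception();  // keep only the first error
      }
      std::lock_guard<std::mutex> lk(mutex);
      if (--remaining == 0) cv.notify_one();         // last task wakes the waiter
    });
  }
  {
    std::unique_lock<std::mutex> lk(mutex);
    cv.wait(lk, [&] { return remaining == 0; });     // predicate handles spurious wakeups
  }
  for (auto& t : threads) t.join();
  if (eptr) std::rethrow_exception(eptr);            // surface task failures to the caller
}

Compared with one future per task, this waits once and allocates nothing per task.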


@ -16,14 +16,6 @@
#include <numeric>
#include <memory>
#if defined(__clang__)
#define __ubsan_ignore_float_divide_by_zero__ __attribute__((no_sanitize("float-divide-by-zero")))
#define __ubsan_ignore_vptr__ __attribute__((no_sanitize("vptr")))
#else
#define __ubsan_ignore_float_divide_by_zero__
#define __ubsan_ignore_vptr__
#endif
#define AT_DISALLOW_COPY_AND_ASSIGN(TypeName) \
TypeName(const TypeName&) = delete; \
void operator=(const TypeName&) = delete


@ -20,6 +20,10 @@ void registerCustomClass(at::ClassTypePtr class_type) {
}
at::ClassTypePtr getCustomClass(const std::string& name) {
// BC hack so we can upgrade a binary internally
if (name == "__torch__.torch.classes.SentencePiece") {
return getCustomClass("__torch__.torch.classes.fb.SentencePiece");
}
return customClasses().count(name) ? customClasses()[name] : nullptr;
}


@ -15,6 +15,7 @@
#include <c10/util/math_compat.h>
#include <ATen/native/cpu/zmath.h>
#include <c10/util/TypeCast.h>
#include <c10/macros/Macros.h>
#if defined(__GNUC__)
#define __at_align32__ __attribute__((aligned(32)))


@ -145,7 +145,7 @@ private:
std::ostream& operator<<(std::ostream & out, const TensorDescriptor& d);
class FilterDescriptor
class TORCH_CUDA_API FilterDescriptor
: public Descriptor<cudnnFilterStruct,
&cudnnCreateFilterDescriptor,
&cudnnDestroyFilterDescriptor>


@ -698,17 +698,34 @@ Tensor leaky_relu_backward(
}
std::tuple<Tensor, Tensor> log_sigmoid_forward_cpu(const Tensor& input) {
auto result = at::zeros_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
auto buffer = at::zeros_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
// FIXME: do these actually need to be zeros_like or can they be empty_like?
auto result = at::zeros_like(input, at::MemoryFormat::Contiguous);
auto buffer = at::zeros_like(input, at::MemoryFormat::Contiguous);
log_sigmoid_cpu_stub(kCPU, result, buffer, input.contiguous());
return std::make_tuple(result, buffer);
}
std::tuple<Tensor&, Tensor&> log_sigmoid_forward_out_cpu(Tensor& result, Tensor& buffer, const Tensor& input) {
log_sigmoid_cpu_stub(kCPU, result, buffer, input);
result.resize_as_(input);
buffer.resize_as_(input, at::MemoryFormat::Contiguous);
TORCH_CHECK(buffer.is_contiguous(), "Contiguous buffer required for log_sigmoid with out parameter");
Tensor result_tmp = result.is_contiguous() ? result : at::empty_like(result, at::MemoryFormat::Contiguous);
log_sigmoid_cpu_stub(kCPU, result_tmp, buffer, input.contiguous());
if (!result.is_contiguous()) {
result.copy_(result_tmp);
}
return std::forward_as_tuple(result, buffer);
}
Tensor & log_sigmoid_out(Tensor & output, const Tensor & self) {
Tensor buffer = at::empty({0}, self.options());
return std::get<0>(at::log_sigmoid_forward_out(output, buffer, self));
}
Tensor log_sigmoid(const Tensor & self) {
return std::get<0>(at::log_sigmoid_forward(self));
}
Tensor log_sigmoid_backward_cpu(const Tensor& grad_output, const Tensor& input, const Tensor& buffer) {
Tensor grad_input;
auto iter = at::TensorIterator();
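The rewritten out= overload above resizes both outputs, requires the internal buffer to be contiguous, runs the kernel into a contiguous temporary whenever the caller's result tensor is not contiguous, and copies back at the end. A small hedged sketch of that general pattern against the public ATen API (unary_out_sketch and its stand-in kernel are illustrative, not PyTorch functions):

#include <ATen/ATen.h>

// Honor an arbitrary (possibly non-contiguous) `out` tensor: compute into a
// contiguous temporary and copy back only when necessary.
at::Tensor& unary_out_sketch(at::Tensor& out, const at::Tensor& self) {
  out.resize_as_(self);
  at::Tensor tmp = out.is_contiguous()
      ? out
      : at::empty_like(out, at::MemoryFormat::Contiguous);
  tmp.copy_(self.contiguous());  // stand-in for a kernel that needs contiguous output
  if (!out.is_contiguous()) {
    out.copy_(tmp);              // scatter the result back into the caller's layout
  }
  return out;
}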


@ -138,6 +138,10 @@ Tensor true_divide(const Tensor& self, const Tensor& divisor) {
return iter.output();
}
Tensor& true_divide_(Tensor& self, const Tensor& divisor) {
return native::true_divide_out(self, self, divisor);
}
Tensor& floor_divide_out(Tensor& result, const Tensor& self, const Tensor& other) {
auto iter = TensorIterator::binary_op(result, self, other,
/*check_mem_overlap=*/true);
@ -731,7 +735,11 @@ Tensor& fmod_(Tensor& self, Scalar other) {
}
Tensor true_divide(const Tensor& self, Scalar divisor) {
return at::true_divide(self, wrapped_scalar_tensor(divisor)); // redispatch!
return self.true_divide(wrapped_scalar_tensor(divisor)); // redispatch!
}
Tensor& true_divide_(Tensor& self, Scalar divisor) {
return self.true_divide_(wrapped_scalar_tensor(divisor)); // redispatch!
}
}
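true_divide always performs floating-point division, even for integral inputs, and the hunk above adds in-place and Scalar variants on top of the existing out= kernel. A short usage sketch, assuming the method variants generated from the schema changes further down in this diff:

#include <ATen/ATen.h>
#include <iostream>

int main() {
  at::Tensor a = at::arange(1, 5, at::kLong);   // [1, 2, 3, 4], integral
  at::Tensor b = at::full({4}, 2, at::kLong);
  at::Tensor c = a.true_divide(b);              // float result: 0.5, 1.0, 1.5, 2.0
  std::cout << c << std::endl;

  at::Tensor x = at::arange(1, 5, at::kFloat);
  x.true_divide_(2);                            // in-place Scalar variant added above
  std::cout << x << std::endl;
  return 0;
}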


@ -70,8 +70,8 @@ struct CAFFE2_API DispatchStub<rT (*)(Args...), T> {
// they will still compute the same value for cpu_dispatch_ptr.
if (!cpu_dispatch_ptr.load(std::memory_order_relaxed)) {
FnPtr tmp_cpu_dispatch_ptr = nullptr;
cpu_dispatch_ptr.compare_exchange_weak(
tmp_cpu_dispatch_ptr, choose_cpu_impl(), std::memory_order_relaxed);
while(!cpu_dispatch_ptr.compare_exchange_weak(
tmp_cpu_dispatch_ptr, choose_cpu_impl(), std::memory_order_relaxed));
}
return (*cpu_dispatch_ptr)(std::forward<ArgTypes>(args)...);
} else if (device_type == DeviceType::CUDA) {
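The loop matters because compare_exchange_weak is allowed to fail spuriously, i.e. return false even when the stored value equals the expected one, so a single call could leave the cached pointer unset. A hedged standalone sketch of the idiom (not the actual DispatchStub code):

#include <atomic>
#include <cstdio>

using FnPtr = void (*)();

void real_impl() { std::puts("impl"); }
FnPtr choose_impl() { return &real_impl; }

std::atomic<FnPtr> cached{nullptr};

void call_cached() {
  if (!cached.load(std::memory_order_relaxed)) {
    FnPtr expected = nullptr;
    // Retry: the weak CAS may fail spuriously. If another thread won the race,
    // `expected` picks up the installed pointer and the next iteration stores
    // the same value again, so every interleaving ends with a valid pointer.
    while (!cached.compare_exchange_weak(
        expected, choose_impl(), std::memory_order_relaxed)) {
    }
  }
  (*cached.load(std::memory_order_relaxed))();
}

int main() { call_cached(); }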


@ -31,15 +31,6 @@ Tensor nll_loss2d(const Tensor & self, const Tensor & target, const Tensor & wei
return std::get<0>(at::nll_loss2d_forward(self, target, weight, reduction, ignore_index));
}
Tensor & log_sigmoid_out(Tensor & output, const Tensor & self) {
Tensor buffer = at::empty({0}, self.options());
return std::get<0>(at::log_sigmoid_forward_out(output, buffer, self));
}
Tensor log_sigmoid(const Tensor & self) {
return std::get<0>(at::log_sigmoid_forward(self));
}
Tensor & thnn_conv2d_out(Tensor & output, const Tensor & self, const Tensor & weight, IntArrayRef kernel_size, const Tensor & bias, IntArrayRef stride, IntArrayRef padding) {
Tensor finput = at::empty({0}, self.options());
Tensor fgrad_input = at::empty({0}, self.options());


@ -533,7 +533,7 @@ Tensor frobenius_norm(const Tensor& self, IntArrayRef dim, bool keepdim) {
return at::norm(self, 2, dim, keepdim, self.scalar_type());
}
if (self.is_complex()){
return at::sqrt(at::sum((self.conj() * self).real(), dim, keepdim));
return at::sqrt(at::sum(at::real(self.conj() * self), dim, keepdim));
} else {
return at::sqrt(at::sum((self * self), dim, keepdim));
}
@ -553,7 +553,7 @@ Tensor &frobenius_norm_out(
return at::norm_out(result, self, 2, dim, keepdim, self.scalar_type());
}
if (self.is_complex()){
return at::sqrt_out(result, at::sum((self.conj() * self).real(), dim, keepdim));
return at::sqrt_out(result, at::sum(at::real(self.conj() * self), dim, keepdim));
} else {
return at::sqrt_out(result, at::sum((self * self), dim, keepdim));
}


@ -799,7 +799,7 @@ static Tensor &std_var_out(Tensor &result, const Tensor &self, IntArrayRef dim,
if (at::isComplexType(self.scalar_type())){
ScalarType dtype = c10::toValueType(get_dtype(result, self, {}, true));
Tensor real_in = self.real().to(dtype);
Tensor real_in = at::real(self).to(dtype);
Tensor real_out = at::empty({0}, self.options().dtype(dtype));
auto iter = make_reduction("std or var", real_out, real_in, dim, keepdim, dtype);
if (iter.numel() == 0) {
@ -807,7 +807,7 @@ static Tensor &std_var_out(Tensor &result, const Tensor &self, IntArrayRef dim,
} else {
std_var_stub(iter.device_type(), iter, unbiased, false);
}
Tensor imag_in = self.imag().to(dtype);
Tensor imag_in = at::imag(self).to(dtype);
Tensor imag_out = at::empty({0}, self.options().dtype(dtype));
iter = make_reduction("std or var", imag_out, imag_in, dim, keepdim, dtype);
if (iter.numel() == 0) {
@ -845,7 +845,7 @@ static std::tuple<Tensor&,Tensor&> std_var_mean_out(const char* fname, Tensor &r
".");
if (at::isComplexType(self.scalar_type())){
ScalarType dtype = c10::toValueType(get_dtype(result1, self, {}, true));
Tensor real_in = self.real().to(dtype);
Tensor real_in = at::real(self).to(dtype);
Tensor real_out_var = at::empty({0}, self.options().dtype(dtype));
Tensor real_out_mean = at::empty({0}, self.options().dtype(dtype));
auto iter = make_reduction(fname, real_out_var, real_out_mean, real_in, dim, keepdim, dtype);
@ -855,7 +855,7 @@ static std::tuple<Tensor&,Tensor&> std_var_mean_out(const char* fname, Tensor &r
} else {
std_var_stub(iter.device_type(), iter, unbiased, false);
}
Tensor imag_in = self.imag().to(dtype);
Tensor imag_in = at::imag(self).to(dtype);
Tensor imag_out_var = at::empty({0}, self.options().dtype(dtype));
Tensor imag_out_mean = at::empty({0}, self.options().dtype(dtype));
iter = make_reduction(fname, imag_out_var, imag_out_mean, imag_in, dim, keepdim, dtype);


@ -33,7 +33,7 @@ static inline Tensor to_impl(const Tensor& self, const TensorOptions& options, b
if (self.is_non_overlapping_and_dense()) {
// Copy all strides
auto r = at::empty_strided(self.sizes(), self.strides(), options.memory_format(c10::nullopt));
r.copy_(self);
r.copy_(self, non_blocking);
return r;
} else {
memory_format = self.suggest_memory_format();
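The one-line change forwards the caller's non_blocking flag to copy_() on the non-overlapping-and-dense fast path, so copies requested as asynchronous through Tensor::to() are no longer silently synchronous on that path. A hedged usage sketch (upload_async is an illustrative helper, not a PyTorch API):

#include <ATen/ATen.h>

// Async host-to-device upload; only meaningful when the source lives in
// pinned memory and a CUDA device is available.
at::Tensor upload_async(const at::Tensor& pinned_host) {
  TORCH_CHECK(pinned_host.is_pinned(), "async copies need pinned host memory");
  return pinned_host.to(at::TensorOptions().device(at::kCUDA),
                        /*non_blocking=*/true);
}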


@ -99,7 +99,7 @@ Tensor _dim_arange(const Tensor& like, int64_t dim) {
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ empty ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tensor empty_cpu(IntArrayRef size, const TensorOptions& options_, c10::optional<c10::MemoryFormat> optional_memory_format) {
TORCH_CHECK(!isComplexType(at::typeMetaToScalarType(options_.dtype())), "Complex dtype not supported.");
TORCH_CHECK(
!(options_.has_memory_format() && optional_memory_format.has_value()),
"Cannot set memory_format both in TensorOptions and explicit argument; please delete "


@ -98,6 +98,15 @@ Tensor & _cat_out_cpu(Tensor& result, TensorList tensors, int64_t dim) {
"output memory locations. Found overlap in input tensor ", i);
}
// Dtypes should be the same
const auto first_in_cat = tensors[0];
for (int64_t i = 1; i < tensors.size(); i++) {
TORCH_CHECK(first_in_cat.dtype() == tensors[i].dtype(),
"Expected object of scalar type ", first_in_cat.dtype(),
" but got scalar type ", tensors[i].dtype(),
" for sequence element ", i, ".");
}
auto should_skip = [](const Tensor& t) { return t.numel() == 0 && t.dim() == 1; };
for (auto const &tensor : tensors) {
if (should_skip(tensor)) {
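The added loop rejects mixed input dtypes up front with an explicit message naming the offending element, instead of failing deeper in the copy path. An illustration of a call that trips the check (error text abbreviated):

#include <ATen/ATen.h>
#include <exception>
#include <iostream>

int main() {
  at::Tensor f = at::ones({2}, at::kFloat);
  at::Tensor d = at::ones({2}, at::kDouble);
  try {
    at::cat({f, d}, 0);  // Float vs Double: rejected by the new dtype check
  } catch (const std::exception& e) {
    std::cout << e.what() << std::endl;
  }
  return 0;
}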


@ -73,11 +73,17 @@ Tensor& abs_(Tensor& self) { return unary_op_impl_(self, at::abs_out); }
Tensor& angle_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, angle_stub); }
Tensor angle(const Tensor& self) { return unary_op_impl(self, at::angle_out); }
Tensor& real_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, real_stub); }
Tensor real(const Tensor& self) { return unary_op_impl(self, at::real_out); }
Tensor real(const Tensor& self) {
TORCH_CHECK(!self.is_complex(), "real is not yet implemented for complex tensors.");
return self;
}
Tensor& imag_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, imag_stub); }
Tensor imag(const Tensor& self) { return unary_op_impl(self, at::imag_out); }
Tensor imag(const Tensor& self) {
TORCH_CHECK(false, "imag is not yet implemented.");
// Note: unreachable
return at::zeros_like(self);
}
Tensor& conj_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, conj_stub); }
Tensor conj(const Tensor& self) { return unary_op_impl(self, at::conj_out); }


@ -7,6 +7,7 @@
#include <ATen/native/TensorIterator.h>
#include <ATen/native/BinaryOps.h>
#include <ATen/native/cpu/Loops.h>
#include <c10/macros/Macros.h>
namespace at { namespace native {
namespace {


@ -4,7 +4,7 @@
#include <ATen/native/cuda/zmath.cuh>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/BinaryOps.h>
#include <c10/macros/Macros.h>
// NOTE: CUDA on Windows requires that the enclosing function
// of a __device__ lambda not have internal linkage.
@ -69,7 +69,6 @@ void remainder_kernel_cuda(TensorIterator& iter) {
AT_DISPATCH_INTEGRAL_TYPES(iter.dtype(), "remainder_cuda", [&]() {
using thrust_t = typename ztype_cuda<scalar_t>::thrust_t;
gpu_kernel_with_scalars(iter, []GPU_LAMBDA(thrust_t a, thrust_t b) -> thrust_t {
CUDA_KERNEL_ASSERT(b != 0);
thrust_t r = a % b;
if ((r != 0) && ((r < 0) != (b < 0))) {
r += b;


@ -358,7 +358,7 @@ void max_pool2d_with_indices_out_cuda_template(
Tensor input = input_.contiguous(memory_format);
const int64_t in_stride_n = input.stride(-4);
const int64_t in_stride_n = input_.ndimension() == 4 ? input.stride(-4) : 0;
const int64_t in_stride_c = input.stride(-3);
const int64_t in_stride_h = input.stride(-2);
const int64_t in_stride_w = input.stride(-1);
@ -506,7 +506,7 @@ void max_pool2d_with_indices_backward_out_cuda_template(
const int64_t inputHeight = input.size(-2);
const int64_t inputWidth = input.size(-1);
const int64_t in_stride_n = input.stride(-4);
const int64_t in_stride_n = input.ndimension() == 4 ? input.stride(-4) : 0;
const int64_t in_stride_c = input.stride(-3);
const int64_t in_stride_h = input.stride(-2);
const int64_t in_stride_w = input.stride(-1);


@ -192,13 +192,13 @@ void index_put_accum_kernel(Tensor & self, TensorList indices, const Tensor & va
if (num_indices > 0 && sliceSize > 0) {
const bool permuted = !src.is_contiguous();
auto src_ = permuted ? src.contiguous() : src;
linearIndex = linearIndex.view(-1);
linearIndex = linearIndex.reshape(-1);
auto sorted_indices = at::empty_like(linearIndex, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
auto orig_indices = at::empty_like(linearIndex, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
using device_ptr = thrust::device_ptr<int64_t>;
const cudaStream_t stream = at::cuda::getCurrentCUDAStream();
linearIndex.div_(sliceSize);
linearIndex.floor_divide_(sliceSize);
{
sorted_indices.copy_(linearIndex);
auto allocator = THCThrustAllocator(globalContext().lazyInitCUDA());
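Two details in this hunk: reshape(-1) produces a contiguous flat copy when the index tensor's strides do not admit a view (view(-1) would throw there), and floor_divide_ makes the integer-division intent of the slice-index computation explicit. A tiny hedged sketch of the reshape-versus-view distinction:

#include <ATen/ATen.h>

int main() {
  // A transposed index tensor is not contiguous, so view(-1) would throw;
  // reshape(-1) falls back to a contiguous copy and always succeeds.
  at::Tensor idx  = at::arange(6, at::kLong).reshape({2, 3}).t();
  at::Tensor flat = idx.reshape(-1);
  flat.floor_divide_(2);   // integer division in place, no float promotion
  return 0;
}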


@ -431,13 +431,12 @@ __global__ void batch_norm_backward_reduce_kernel(
const GenericPackedTensorAccessor<input_scalar_t, 3, DefaultPtrTraits, index_t> grad_output,
GenericPackedTensorAccessor<stat_accscalar_t, 1, DefaultPtrTraits, index_t> mean,
GenericPackedTensorAccessor<stat_accscalar_t, 1, DefaultPtrTraits, index_t> invstd,
GenericPackedTensorAccessor<stat_accscalar_t, 1, DefaultPtrTraits, index_t> mean_dy,
GenericPackedTensorAccessor<stat_accscalar_t, 1, DefaultPtrTraits, index_t> mean_dy_xmu,
GenericPackedTensorAccessor<stat_accscalar_t, 1, DefaultPtrTraits, index_t> sum_dy,
GenericPackedTensorAccessor<stat_accscalar_t, 1, DefaultPtrTraits, index_t> sum_dy_xmu,
GenericPackedTensorAccessor<stat_scalar_t, 1, DefaultPtrTraits, index_t> grad_weight,
GenericPackedTensorAccessor<stat_scalar_t, 1, DefaultPtrTraits, index_t> grad_bias) {
index_t plane = blockIdx.x;
index_t N = input.size(0) * input.size(2);
stat_accscalar_t r_mean = mean[plane];
stat_accscalar_t factor = invstd[plane];
@ -446,7 +445,6 @@ __global__ void batch_norm_backward_reduce_kernel(
Float2<input_scalar_t, stat_accscalar_t> res = reduce<Float2<input_scalar_t, stat_accscalar_t>, GradOp<input_scalar_t, stat_accscalar_t,
GenericPackedTensorAccessor<input_scalar_t, 3, DefaultPtrTraits, index_t>>>(g, grad_output, plane);
stat_accscalar_t norm = stat_accscalar_t(1) / N;
if (threadIdx.x == 0) {
if (grad_weight.size(0) > 0) {
grad_weight[plane] = static_cast<stat_scalar_t>(res.v2 * factor);
@ -454,11 +452,11 @@ __global__ void batch_norm_backward_reduce_kernel(
if (grad_bias.size(0) > 0) {
grad_bias[plane] = static_cast<stat_scalar_t>(res.v1);
}
if (mean_dy.size(0) > 0) {
mean_dy[plane] = static_cast<stat_accscalar_t>(res.v1 * norm);
if (sum_dy.size(0) > 0) {
sum_dy[plane] = static_cast<stat_accscalar_t>(res.v1);
}
if (mean_dy_xmu.size(0) > 0) {
mean_dy_xmu[plane] = static_cast<stat_accscalar_t>(res.v2 * norm);
if (sum_dy_xmu.size(0) > 0) {
sum_dy_xmu[plane] = static_cast<stat_accscalar_t>(res.v2);
}
}
}
@ -740,16 +738,16 @@ std::tuple<Tensor, Tensor, Tensor, Tensor> batch_norm_backward_reduce_cuda_templ
using stat_accscalar_t = at::acc_type<stat_scalar_t, true>;
int64_t n_input = input_.size(1);
Tensor mean_dy_;
Tensor mean_dy_xmu_;
Tensor sum_dy_;
Tensor sum_dy_xmu_;
Tensor grad_weight_;
Tensor grad_bias_;
auto input_reshaped = input_.reshape({input_.size(0), input_.size(1), -1}); // internally we merge the feature dimensions
auto grad_output_reshaped = grad_out_.reshape(input_reshaped.sizes());
if (input_g) {
mean_dy_ = at::empty_like(mean_, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
mean_dy_xmu_ = at::empty_like(mean_, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
sum_dy_ = at::empty_like(mean_, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
sum_dy_xmu_ = at::empty_like(mean_, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
}
if (weight_g) {
grad_weight_ = at::empty({n_input}, weight_.options());
@ -764,8 +762,8 @@ std::tuple<Tensor, Tensor, Tensor, Tensor> batch_norm_backward_reduce_cuda_templ
auto grad_bias = packed_accessor_or_dummy<stat_scalar_t, 1, DefaultPtrTraits, index_t>(grad_bias_);
auto mean = packed_accessor_or_dummy<stat_accscalar_t, 1, DefaultPtrTraits, index_t>(mean_);
auto invstd = packed_accessor_or_dummy<stat_accscalar_t, 1, DefaultPtrTraits, index_t>(invstd_);
auto mean_dy = packed_accessor_or_dummy<stat_accscalar_t, 1, DefaultPtrTraits, index_t>(mean_dy_);
auto mean_dy_xmu = packed_accessor_or_dummy<stat_accscalar_t, 1, DefaultPtrTraits, index_t>(mean_dy_xmu_);
auto sum_dy = packed_accessor_or_dummy<stat_accscalar_t, 1, DefaultPtrTraits, index_t>(sum_dy_);
auto sum_dy_xmu = packed_accessor_or_dummy<stat_accscalar_t, 1, DefaultPtrTraits, index_t>(sum_dy_xmu_);
auto batch_size = input_reshaped.size(0);
auto feature_size = input_reshaped.size(2);
@ -778,10 +776,10 @@ std::tuple<Tensor, Tensor, Tensor, Tensor> batch_norm_backward_reduce_cuda_templ
const dim3 grid(n_input);
batch_norm_backward_reduce_kernel<input_scalar_t, stat_scalar_t, stat_accscalar_t, index_t> <<<grid, block, 0, stream>>>
(input, grad_output, mean, invstd, mean_dy, mean_dy_xmu, grad_weight, grad_bias);
(input, grad_output, mean, invstd, sum_dy, sum_dy_xmu, grad_weight, grad_bias);
AT_CUDA_CHECK(cudaGetLastError());
return std::make_tuple(mean_dy_, mean_dy_xmu_, grad_weight_, grad_bias_);
return std::make_tuple(sum_dy_, sum_dy_xmu_, grad_weight_, grad_bias_);
}
template<typename input_scalar_t, typename stat_scalar_t, typename index_t>
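Returning per-process sums (sum_dy, sum_dy_xmu) instead of per-process means is what makes the backward pass correct when processes hold different batch sizes: sums and element counts can be all-reduced and divided once, whereas averaging per-process means weights every process equally regardless of how many samples it saw. A small numeric sketch with made-up values:

#include <cstdio>

int main() {
  // Two processes contribute to the same SyncBN channel with different batch sizes.
  double sum_dy[2] = {6.0, 2.0};   // per-process sums of dy
  long   count[2]  = {4, 1};       // per-process element counts

  // Correct: reduce sums and counts across processes, divide once.
  double global_mean = (sum_dy[0] + sum_dy[1]) / (count[0] + count[1]);        // 1.60

  // Biased: averaging per-process means over-weights the small process.
  double mean_of_means = (sum_dy[0] / count[0] + sum_dy[1] / count[1]) / 2.0;  // 1.75

  std::printf("global mean %.2f, mean of means %.2f\n", global_mean, mean_of_means);
  return 0;
}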


@ -307,6 +307,15 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
"tensor ", i);
}
// Dtypes should be the same
const auto first_in_cat = inputs[0];
for (int64_t i = 1; i < inputs.size(); i++) {
TORCH_CHECK(first_in_cat.dtype() == inputs[i].dtype(),
"Expected object of scalar type ", first_in_cat.dtype(),
" but got scalar type ", inputs[i].dtype(),
" for sequence element ", i, ".");
}
for (int i = 0; i < inputs.size(); i++)
{
if (should_skip(inputs[i])) {
@ -325,6 +334,12 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
TORCH_CHECK(inputs.size() > 0, "invalid number of inputs ", inputs.size());
TORCH_CHECK(dimension >= 0, "invalid dimension ", dimension);
for (const Tensor& t: inputs) {
TORCH_CHECK(t.device() == notSkippedTensor->device(),
"All input tensors must be on the same device. Received ",
t.device(), " and ", notSkippedTensor->device());
}
c10::MemoryFormat memory_format = compute_output_memory_format(inputs);
std::vector<int64_t> size(notSkippedTensor->sizes().vec());
@ -355,17 +370,11 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
// 4. The number of dimensions is <= 4
// 5. All input tensors are contiguous (output tensor may be non-contig)
// 6. All input tensors can use 32-bit indexing
// 7. All input tensors are on the same device
const bool all32BitIndexable = std::all_of(inputs.begin(), inputs.end(),
[] (const Tensor& t) {
return at::cuda::detail::canUse32BitIndexMath(t);
});
Device firstDevice = notSkippedTensor->device();
const bool allSameDevice = std::all_of(inputs.begin(), inputs.end(),
[firstDevice](const Tensor& t) {
return t.device() == firstDevice;
});
const bool allContiguous = std::all_of(inputs.begin(), inputs.end(),
[=](const Tensor& t) {
return !t.defined() || t.is_contiguous(memory_format);
@ -375,8 +384,7 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
out.dim() <= CAT_ARRAY_MAX_INPUT_DIMS &&
at::cuda::detail::canUse32BitIndexMath(out) &&
allContiguous &&
all32BitIndexable &&
allSameDevice) {
all32BitIndexable) {
AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(
at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16,


@ -125,7 +125,7 @@ struct TopKTypeConfig<at::Half> {
static inline __device__ RadixType convert(at::Half v) {
#if defined(__CUDA_ARCH__) || defined(__HIP_PLATFORM_HCC__)
RadixType x = __half_as_ushort(v);
RadixType mask = -((x >> 15)) | 0x8000;
RadixType mask = (x & 0x00008000) ? 0x0000ffff : 0x00008000;
return (v == v) ? (x ^ mask) : 0xffff;
#else
assert(false);
@ -135,7 +135,7 @@ struct TopKTypeConfig<at::Half> {
static inline __device__ at::Half deconvert(RadixType v) {
#if defined(__CUDA_ARCH__) || defined(__HIP_PLATFORM_HCC__)
RadixType mask = ((v >> 15) - 1) | 0x8000;
RadixType mask = (v & 0x00008000) ? 0x00008000 : 0x0000ffff;
return __ushort_as_half(v ^ mask);
#else
assert(false);
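The new convert/deconvert pair maps IEEE-754 half bit patterns to unsigned keys whose integer order matches the floating-point order: non-negative values get the sign bit set, negative values get every bit flipped, and deconvert applies the inverse mask based on the key's top bit. A hedged standalone sketch of the same bit trick on uint16_t (NaN handling omitted; at::Half is replaced by raw bit patterns):

#include <cstdint>
#include <cstdio>

// Order-preserving mapping between half-precision bit patterns and uint16_t keys.
static uint16_t to_radix(uint16_t bits) {
  uint16_t mask = (bits & 0x8000u) ? 0xffffu   // negative: flip all bits
                                   : 0x8000u;  // non-negative: set the sign bit
  return bits ^ mask;
}

static uint16_t from_radix(uint16_t key) {
  uint16_t mask = (key & 0x8000u) ? 0x8000u    // key came from a non-negative value
                                  : 0xffffu;   // key came from a negative value
  return key ^ mask;
}

int main() {
  // -2.0, -1.0, 0.0, 1.0 as fp16 bit patterns; their keys must be strictly increasing.
  const uint16_t vals[] = {0xC000u, 0xBC00u, 0x0000u, 0x3C00u};
  for (uint16_t v : vals) {
    std::printf("%04x -> %04x (round-trip %04x)\n", v, to_radix(v), from_radix(to_radix(v)));
  }
  return 0;
}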


@ -44,6 +44,7 @@ Tensor& eye_out_cuda(Tensor& result, int64_t n, int64_t m) {
}
Tensor empty_cuda(IntArrayRef size, const TensorOptions& options, c10::optional<MemoryFormat> optional_memory_format) {
TORCH_CHECK(!isComplexType(at::typeMetaToScalarType(options.dtype())), "Complex dtype not supported.");
AT_ASSERT(options.device().type() == at::DeviceType::CUDA);
TORCH_INTERNAL_ASSERT(impl::variable_excluded_from_dispatch());
TORCH_CHECK(!options.pinned_memory(), "Only dense CPU tensors can be pinned");


@ -238,18 +238,12 @@
- func: real(Tensor self) -> Tensor
use_c10_dispatcher: full
variants: function, method
supports_named_tensor: True
- func: real.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
variants: function
supports_named_tensor: True
- func: imag(Tensor self) -> Tensor
use_c10_dispatcher: full
variants: function, method
supports_named_tensor: True
- func: imag.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
variants: function
supports_named_tensor: True
- func: conj(Tensor self) -> Tensor
@ -2872,7 +2866,7 @@
- func: true_divide.Tensor(Tensor self, Tensor other) -> Tensor
use_c10_dispatcher: full
variants: function
variants: function, method
dispatch:
CPU: true_divide
CUDA: true_divide
@ -2880,6 +2874,15 @@
SparseCUDA: true_divide_sparse
supports_named_tensor: True
- func: true_divide_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)
variants: method
dispatch:
CPU: true_divide_
CUDA: true_divide_
SparseCPU: true_divide_sparse_
SparseCUDA: true_divide_sparse_
supports_named_tensor: True
- func: true_divide.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
dispatch:
CPU: true_divide_out
@ -2890,7 +2893,11 @@
- func: true_divide.Scalar(Tensor self, Scalar other) -> Tensor
use_c10_dispatcher: full
variants: function
variants: function, method
supports_named_tensor: True
- func: true_divide_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
variants: method
supports_named_tensor: True
- func: trunc(Tensor self) -> Tensor


@ -272,6 +272,10 @@ SparseTensor& true_divide_out_sparse_scalar(
return true_divide_out_sparse_zerodim(result, dividend, wrapped_scalar_tensor(divisor));
}
Tensor& true_divide_sparse_(Tensor& self, const Tensor& divisor) {
return true_divide_out_sparse_zerodim(self, self, divisor);
}
// --------------------------------------------------------------------
// floor_divide(SparseTensor, Scalar)
// --------------------------------------------------------------------


@ -138,7 +138,7 @@ SparseTensor coalesce_sparse_cuda(const SparseTensor& self) {
// broadcasting logic; instead, it will blast the elements from one
// to the other so long as the numel is the same
indicesSlice.copy_(indices1D);
indices1D.div_(self.size(d));
indices1D.floor_divide_(self.size(d));
indicesSlice.add_(indices1D, -self.size(d));
}
}


@ -14,7 +14,7 @@ namespace xnnpack {
namespace {
torch::jit::class_<XNNPackLinearOpContext> register_xnnpack_linear_op_context_class() {
static auto register_linear_op_context_class =
torch::jit::class_<XNNPackLinearOpContext>("XNNPackLinearOpContext")
torch::jit::class_<XNNPackLinearOpContext>("xnnpack", "XNNPackLinearOpContext")
.def_pickle(
[](const c10::intrusive_ptr<XNNPackLinearOpContext>& op_context)
-> SerializationTypeLinearPrePack { // __getstate__
@ -38,7 +38,7 @@ torch::jit::class_<XNNPackLinearOpContext> register_xnnpack_linear_op_context_cl
torch::jit::class_<XNNPackConv2dOpContext> register_xnnpack_conv2d_op_context_class() {
static auto register_conv2d_op_context_class =
torch::jit::class_<XNNPackConv2dOpContext>("XNNPackConv2dOpContext")
torch::jit::class_<XNNPackConv2dOpContext>("xnnpack", "XNNPackConv2dOpContext")
.def_pickle(
[](const c10::intrusive_ptr<XNNPackConv2dOpContext>& op_context)
-> SerializationTypeConv2dPrePack { // __getstate__
@ -74,25 +74,25 @@ static auto registry =
// Registering under _xnnpack namespace for now. As we add more backend requiring similar functionality
// We can refactor the code and use a better namespace.
torch::RegisterOperators()
.op("_xnnpack::linear_prepack(Tensor W, Tensor? B=None) -> __torch__.torch.classes.XNNPackLinearOpContext",
.op("_xnnpack::linear_prepack(Tensor W, Tensor? B=None) -> __torch__.torch.classes.xnnpack.XNNPackLinearOpContext",
torch::RegisterOperators::options()
.aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
.kernel<internal::linear::LinearPrePack>(
DispatchKey::CPUTensorId))
.op("_xnnpack::linear_packed(Tensor X, __torch__.torch.classes.XNNPackLinearOpContext W_prepack) -> Tensor Y",
.op("_xnnpack::linear_packed(Tensor X, __torch__.torch.classes.xnnpack.XNNPackLinearOpContext W_prepack) -> Tensor Y",
torch::RegisterOperators::options()
.aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
.kernel<internal::linear::LinearPacked>(
DispatchKey::CPUTensorId))
.op("_xnnpack::conv2d_prepack(Tensor W, Tensor? B, int[2] stride, "
"int[2] padding, int[2] dilation, int groups) "
"-> __torch__.torch.classes.XNNPackConv2dOpContext",
"-> __torch__.torch.classes.xnnpack.XNNPackConv2dOpContext",
torch::RegisterOperators::options()
.aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
.kernel<internal::convolution2d::Conv2dPrePack>(
DispatchKey::CPUTensorId))
.op("_xnnpack::conv2d_packed(Tensor X, "
"__torch__.torch.classes.XNNPackConv2dOpContext W_prepack) -> Tensor Y",
"__torch__.torch.classes.xnnpack.XNNPackConv2dOpContext W_prepack) -> Tensor Y",
torch::RegisterOperators::options()
.aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
.kernel<internal::convolution2d::Conv2dPacked>(


@ -423,6 +423,85 @@ class CAFFE2_API Tensor {
// ~~~~~ Autograd API ~~~~~
/// \fn bool is_leaf() const;
///
/// All Tensors that have `requires_grad()` which is ``false`` will be leaf Tensors by convention.
///
/// For Tensors that have `requires_grad()` which is ``true``, they will be leaf Tensors if they were
/// created by the user. This means that they are not the result of an operation and so
/// `grad_fn()` is `nullptr`.
///
/// Only leaf Tensors will have their `grad()` populated during a call to `backward()`.
/// To get `grad()` populated for non-leaf Tensors, you can use `retain_grad()`.
///
/// Example:
/// @code
/// auto a = torch::rand(10, torch::requires_grad());
/// std::cout << a.is_leaf() << std::endl; // prints `true`
///
/// auto b = torch::rand(10, torch::requires_grad()).to(torch::kCUDA);
/// std::cout << b.is_leaf() << std::endl; // prints `false`
/// // b was created by the operation that cast a cpu Tensor into a cuda Tensor
///
/// auto c = torch::rand(10, torch::requires_grad()) + 2;
/// std::cout << c.is_leaf() << std::endl; // prints `false`
/// // c was created by the addition operation
///
/// auto d = torch::rand(10).cuda();
/// std::cout << d.is_leaf() << std::endl; // prints `true`
/// // d does not require gradients and so has no operation creating it (that is tracked by the autograd engine)
///
/// auto e = torch::rand(10).cuda().requires_grad_();
/// std::cout << e.is_leaf() << std::endl; // prints `true`
/// // e requires gradients and has no operations creating it
///
/// auto f = torch::rand(10, torch::device(torch::kCUDA).requires_grad(true));
/// std::cout << f.is_leaf() << std::endl; // prints `true`
/// // f requires grad, has no operation creating it
/// @endcode
/// \fn void backward(const Tensor & gradient={}, bool keep_graph=false, bool create_graph=false) const;
///
/// Computes the gradient of current tensor with respect to graph leaves.
///
/// The graph is differentiated using the chain rule. If the tensor is
/// non-scalar (i.e. its data has more than one element) and requires
/// gradient, the function additionally requires specifying ``gradient``.
/// It should be a tensor of matching type and location, that contains
/// the gradient of the differentiated function w.r.t. this Tensor.
///
/// This function accumulates gradients in the leaves - you might need to
/// zero them before calling it.
///
/// \param gradient Gradient w.r.t. the
/// tensor. If it is a tensor, it will be automatically converted
/// to a Tensor that does not require grad unless ``create_graph`` is True.
/// None values can be specified for scalar Tensors or ones that
/// don't require grad. If a None value would be acceptable then
/// this argument is optional.
/// \param keep_graph If ``false``, the graph used to compute
/// the grads will be freed. Note that in nearly all cases setting
/// this option to True is not needed and often can be worked around
/// in a much more efficient way. Defaults to the value of
/// ``create_graph``.
/// \param create_graph If ``true``, graph of the derivative will
/// be constructed, allowing higher order derivative
/// products to be computed. Defaults to ``false``.
/// \fn Tensor detach() const;
///
/// Returns a new Tensor, detached from the current graph.
/// The result will never require gradient.
/// \fn Tensor & detach_() const;
///
/// Detaches the Tensor from the graph that created it, making it a leaf.
/// Views cannot be detached in-place.
/// \fn void retain_grad() const;
///
/// Enables .grad() for non-leaf Tensors.
Tensor& set_requires_grad(bool requires_grad) {
impl_->set_requires_grad(requires_grad);
return *this;
@ -431,9 +510,16 @@ class CAFFE2_API Tensor {
return impl_->requires_grad();
}
/// Return a mutable reference to the gradient. This is conventionally
/// used as `t.grad() = x` to set a gradient to a completely new tensor.
Tensor& grad() {
return impl_->grad();
}
/// This function returns an undefined tensor by default and returns a defined tensor
/// the first time a call to `backward()` computes gradients for this Tensor.
/// The attribute will then contain the gradients computed and future calls
/// to `backward()` will accumulate (add) gradients into it.
const Tensor& grad() const {
return impl_->grad();
}
@ -505,11 +591,38 @@ class CAFFE2_API Tensor {
template <typename T>
using hook_return_var_t = std::enable_if_t<std::is_same<typename std::result_of<T&(Tensor)>::type, Tensor>::value, unsigned>;
// Returns the index of the hook in the list which can be used to remove hook
// Register a hook with no return value
/// Registers a backward hook.
///
/// The hook will be called every time a gradient with respect to the Tensor is computed.
/// The hook should have one of the following signatures:
/// ```
/// hook(Tensor grad) -> Tensor
/// ```
/// ```
/// hook(Tensor grad) -> void
/// ```
/// The hook should not modify its argument, but it can optionally return a new gradient
/// which will be used in place of `grad`.
///
/// This function returns the index of the hook in the list, which can be used to remove the hook.
///
/// Example:
/// @code
/// auto v = torch::tensor({0., 0., 0.}, torch::requires_grad());
/// auto h = v.register_hook([](torch::Tensor grad){ return grad * 2; }); // double the gradient
/// v.backward(torch::tensor({1., 2., 3.}));
/// // This prints:
/// // ```
/// // 2
/// // 4
/// // 6
/// // [ CPUFloatType{3} ]
/// // ```
/// std::cout << v.grad() << std::endl;
/// v.remove_hook(h); // removes the hook
/// @endcode
template <typename T>
hook_return_void_t<T> register_hook(T&& hook) const;
// Register a hook with variable return value
template <typename T>
hook_return_var_t<T> register_hook(T&& hook) const;
@ -518,7 +631,7 @@ private:
public:
// Remove hook at given position
/// Remove hook at given position
void remove_hook(unsigned pos) const;
// View Variables


@ -69,12 +69,6 @@
# define TH_UNUSED
#endif
#if defined(__clang__)
#define __ubsan_ignore_float_divide_by_zero__ __attribute__((no_sanitize("float-divide-by-zero")))
#else
#define __ubsan_ignore_float_divide_by_zero__
#endif
#ifndef M_PI
# define M_PI 3.14159265358979323846
#endif


@ -9,7 +9,7 @@ set(extra_src)
# loop over all types
foreach(THC_TYPE Byte Char Short Int Long Half Float Double)
# loop over files which need to be split between types (because of long compile times)
foreach(THC_FILE TensorSort TensorMathPointwise TensorMathReduce TensorMasked)
foreach(THC_FILE TensorSort TensorMathPointwise TensorMathReduce TensorMasked TensorTopK)
if(NOT EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/generated/THC${THC_FILE}${THC_TYPE}.cu")
FILE(WRITE "${CMAKE_CURRENT_SOURCE_DIR}/generated/THC${THC_FILE}${THC_TYPE}.cu"
"#include <THC/THC${THC_FILE}.cuh>\n#include <THC/THCTensor.hpp>\n\n#include <THC/generic/THC${THC_FILE}.cu>\n#include <THC/THCGenerate${THC_TYPE}Type.h>\n")
@ -56,7 +56,6 @@ set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS}
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorIndex.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorRandom.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorScatterGather.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorTopK.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorSort.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCSortUtils.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorMode.cu


@ -1,19 +0,0 @@
#include <THC/THC.h>
#include <THC/THCReduceApplyUtils.cuh>
#include <THC/THCTensorCopy.h>
#include <THC/THCTensorMath.h>
#include <THC/THCAsmUtils.cuh>
#include <THC/THCScanUtils.cuh>
#include <THC/THCTensorTypeUtils.cuh>
#include <THC/THCTensorMathReduce.cuh>
#include <ATen/WrapDimUtils.h>
#include <algorithm> // for std::min
#if CUDA_VERSION >= 7000 || defined __HIP_PLATFORM_HCC__
#include <thrust/system/cuda/execution_policy.h>
#endif
#include <THC/THCTensorTopK.cuh>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateAllTypes.h>


@ -1,6 +1,21 @@
#ifndef THC_TENSOR_TOPK_CUH
#define THC_TENSOR_TOPK_CUH
#include <THC/THC.h>
#include <THC/THCReduceApplyUtils.cuh>
#include <THC/THCTensorCopy.h>
#include <THC/THCTensorMath.h>
#include <THC/THCAsmUtils.cuh>
#include <THC/THCScanUtils.cuh>
#include <THC/THCTensorTypeUtils.cuh>
#include <THC/THCTensorMathReduce.cuh>
#include <ATen/WrapDimUtils.h>
#include <algorithm> // for std::min
#if CUDA_VERSION >= 7000 || defined __HIP_PLATFORM_HCC__
#include <thrust/system/cuda/execution_policy.h>
#endif
#include <c10/macros/Macros.h>
#include <ATen/native/cuda/SortingRadixSelect.cuh>
@ -52,6 +67,7 @@ __global__ void gatherTopK(TensorInfo<T, IndexType> input,
inputSliceStart, outputSliceSize,
inputSliceSize, inputWithinSliceStride,
smem, &topKValue);
const auto topKConverted = at::native::TopKTypeConfig<T>::convert(topKValue);
// Every value that is strictly less/greater than `pattern`
// (depending on sort dir) in sorted int format is in the top-K.
@ -74,11 +90,12 @@ __global__ void gatherTopK(TensorInfo<T, IndexType> input,
bool inRange = (i < inputSliceSize);
T v =
inRange ? doLdg(&inputSliceStart[i * inputWithinSliceStride]) : ScalarConvert<int, T>::to(0);
const auto convertedV = at::native::TopKTypeConfig<T>::convert(v);
bool hasTopK;
if (Order) {
hasTopK = inRange && (THCNumerics<T>::gt(v, topKValue));
hasTopK = inRange && (convertedV > topKConverted);
} else {
hasTopK = inRange && (THCNumerics<T>::lt(v, topKValue));
hasTopK = inRange && (convertedV < topKConverted);
}
int index;
@ -111,7 +128,8 @@ __global__ void gatherTopK(TensorInfo<T, IndexType> input,
bool inRange = (i < inputSliceSize);
T v =
inRange ? doLdg(&inputSliceStart[i * inputWithinSliceStride]) : ScalarConvert<int, T>::to(0);
bool hasTopK = inRange && (THCNumerics<T>::eq(v, topKValue));
const auto convertedV = at::native::TopKTypeConfig<T>::convert(v);
bool hasTopK = inRange && (convertedV == topKConverted);
int index;
int carry;


@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateByteType.h>


@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateCharType.h>


@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateDoubleType.h>


@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateFloatType.h>


@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateHalfType.h>


@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateIntType.h>

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateLongType.h>

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateShortType.h>

View File

@ -23,6 +23,14 @@
#include "c10/macros/Export.h"
#if defined(__clang__)
#define __ubsan_ignore_float_divide_by_zero__ __attribute__((no_sanitize("float-divide-by-zero")))
#define __ubsan_ignore_float_cast_overflow__ __attribute__((no_sanitize("float-cast-overflow")))
#else
#define __ubsan_ignore_float_divide_by_zero__
#define __ubsan_ignore_float_cast_overflow__
#endif
// Disable the copy and assignment operator for a class. Note that this will
// disable the usage of the class in std containers.
#define C10_DISABLE_COPY_AND_ASSIGN(classname) \

View File

@ -66,24 +66,44 @@ void Error::AppendMessage(const std::string& new_msg) {
namespace Warning {
namespace {
WarningHandler* getHandler() {
WarningHandler* getBaseHandler() {
static WarningHandler base_warning_handler_ = WarningHandler();
return &base_warning_handler_;
};
static thread_local WarningHandler* warning_handler_ = getHandler();
class ThreadWarningHandler {
public:
ThreadWarningHandler() = delete;
static WarningHandler* get_handler() {
if (!warning_handler_) {
warning_handler_ = getBaseHandler();
}
return warning_handler_;
}
static void set_handler(WarningHandler* handler) {
warning_handler_ = handler;
}
private:
static thread_local WarningHandler* warning_handler_;
};
thread_local WarningHandler* ThreadWarningHandler::warning_handler_ = nullptr;
}
void warn(SourceLocation source_location, const std::string& msg) {
warning_handler_->process(source_location, msg);
ThreadWarningHandler::get_handler()->process(source_location, msg);
}
void set_warning_handler(WarningHandler* handler) noexcept(true) {
warning_handler_ = handler;
ThreadWarningHandler::set_handler(handler);
}
WarningHandler* get_warning_handler() noexcept(true) {
return warning_handler_;
return ThreadWarningHandler::get_handler();
}
} // namespace Warning
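The effect of this change is that each thread can install its own `WarningHandler` without affecting other threads. A hedged sketch of caller-side usage, relying only on the names visible in this hunk (`c10::Warning::set_warning_handler`, `WarningHandler::process(source_location, msg)`); the exact `process` signature should be verified against `c10/util/Exception.h`:

// Hedged sketch: collect warnings raised on this thread instead of printing them.
#include <string>
#include <vector>
#include <c10/util/Exception.h>

struct CollectingHandler : c10::WarningHandler {
  std::vector<std::string> messages;
  // Signature assumed from the warn() call above; check the base declaration.
  void process(const c10::SourceLocation& loc, const std::string& msg) override {
    messages.push_back(msg);  // record instead of writing to stderr
  }
};

void run_quietly() {
  CollectingHandler handler;
  c10::Warning::set_warning_handler(&handler);  // now affects only this thread
  // ... code that may emit TORCH_WARN(...) ...
  c10::Warning::set_warning_handler(nullptr);   // falls back to the base handler
}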

View File

@ -67,7 +67,7 @@ struct maybe_real<true, src_t> {
template <typename dest_t, typename src_t>
struct static_cast_with_inter_type {
C10_HOST_DEVICE static inline dest_t apply(src_t src) {
C10_HOST_DEVICE __ubsan_ignore_float_cast_overflow__ static inline dest_t apply(src_t src) {
constexpr bool real = needs_real<dest_t, src_t>::value;
return static_cast<dest_t>(
static_cast<inter_copy_type_t<dest_t>>(maybe_real<real, src_t>::apply(src)));
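The `__ubsan_ignore_float_cast_overflow__` attribute added here suppresses UBSan's float-cast-overflow check on this conversion helper. The check exists because converting a floating-point value that is out of range for the destination type is undefined behavior in C++; a standalone illustration (not PyTorch code):

// Standalone illustration (not PyTorch code): the cast below is undefined
// behavior because 1e30f is not representable as int, and UBSan reports it
// as "float-cast-overflow" unless the check is suppressed.
int truncate_to_int(float v) {
  return static_cast<int>(v);  // UB when v is outside int's range
}

int main() {
  int i = truncate_to_int(1e30f);  // out-of-range cast: result is unspecified
  (void)i;
  return 0;
}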

View File

@ -748,7 +748,7 @@ if (NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE)
target_include_directories(torch_cuda PUBLIC "${NVTOOLEXT_HOME}/include")
# -INCLUDE is used to ensure torch_cuda is linked against in a project that relies on it.
# Related issue: https://github.com/pytorch/pytorch/issues/31611
target_link_libraries(torch_cuda INTERFACE "-INCLUDE:\"?warp_size@cuda@at@@YAHXZ\"")
target_link_libraries(torch_cuda INTERFACE "-INCLUDE:?warp_size@cuda@at@@YAHXZ")
elseif(APPLE)
set(TORCH_CUDA_LIBRARIES
@ -949,6 +949,31 @@ if (USE_OPENMP AND OPENMP_FOUND)
target_link_libraries(torch_cpu PRIVATE ${OpenMP_CXX_LIBRARIES})
endif()
if ($ENV{TH_BINARY_BUILD})
if (NOT MSVC AND USE_CUDA AND NOT APPLE)
# Note [Extra MKL symbols for MAGMA in torch_cpu]
#
# When we build CUDA libraries and link against MAGMA, MAGMA makes use of
# some BLAS symbols in its CPU fallbacks when it has no GPU versions
# of kernels. Previously, we ensured the BLAS symbols were filled in by
# MKL by linking torch_cuda with BLAS, but when we are statically linking
# against MKL (when we do wheel builds), this actually ends up pulling in a
# decent chunk of MKL into torch_cuda, inflating our torch_cuda binary
# size by 8M. torch_cpu exposes most of the MKL symbols we need, but
# empirically we determined that there are four which it doesn't provide. If
# we link torch_cpu with these --undefined symbols, we can ensure they
# do get pulled in, and then we can avoid statically linking in MKL to
# torch_cuda at all!
#
# We aren't really optimizing for binary size on Windows (and this link
# line doesn't work on Windows), so don't do it there.
#
# These linker commands do not work on OS X, do not attempt this there.
# (It shouldn't matter anyway, though, because OS X has dropped CUDA support)
set_target_properties(torch_cpu PROPERTIES LINK_FLAGS "-Wl,--undefined=mkl_lapack_slaed0 -Wl,--undefined=mkl_lapack_dlaed0 -Wl,--undefined=mkl_lapack_dormql -Wl,--undefined=mkl_lapack_sormql")
endif()
endif()
target_link_libraries(torch_cpu PUBLIC c10)
target_link_libraries(torch_cpu PUBLIC ${Caffe2_PUBLIC_DEPENDENCY_LIBS})
target_link_libraries(torch_cpu PRIVATE ${Caffe2_DEPENDENCY_LIBS})

View File

@ -1,6 +1,8 @@
#include "caffe2/operators/fused_rowwise_nbitfake_conversion_ops.h"
#include <fp16.h>
#ifdef __AVX__
#include <immintrin.h>
#endif
#include "c10/util/Registry.h"
namespace caffe2 {

View File

@ -50,8 +50,13 @@ __global__ void ReluCUDAKernel<half2>(const int N, const half2* X, half2* Y) {
Y[i] = __hmul2(__hgt2(__ldg(X + i), kZero), __ldg(X + i));
#else
const float2 xx = __half22float2(X[i]);
Y[i] =
__floats2half2_rn(xx.x > 0.0f ? xx.x : 0.0f, xx.y > 0.0f ? xx.y : 0.0f);
// The casts to float are explicit here because the conditional expression can
// otherwise be ambiguous on ROCm, which sometimes triggers:
//
// error: conditional expression is ambiguous; 'const hip_impl::Scalar_accessor<float, Native_vec_, 0>' can be
// converted to 'float' and vice versa
Y[i] = __floats2half2_rn(xx.x > 0.0f ? static_cast<float>(xx.x) : 0.0f,
xx.y > 0.0f ? static_cast<float>(xx.y) : 0.0f);
#endif
}
}
@ -100,8 +105,14 @@ __global__ void ReluGradientCUDAKernel<half2>(
#else
const float2 dy = __half22float2(dY[i]);
const float2 yy = __half22float2(Y[i]);
dX[i] =
__floats2half2_rn(yy.x > 0.0f ? dy.x : 0.0f, yy.y > 0.0f ? dy.y : 0.0f);
// The casts to float are explicit here because the conditional expression can
// otherwise be ambiguous on ROCm, which sometimes triggers:
//
// error: conditional expression is ambiguous; 'const hip_impl::Scalar_accessor<float, Native_vec_, 1>' can be
// converted to 'float' and vice versa
dX[i] = __floats2half2_rn(yy.x > 0.0f ? static_cast<float>(dy.x) : 0.0f,
yy.y > 0.0f ? static_cast<float>(dy.y) : 0.0f);
#endif
}
}
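The explicit `static_cast<float>` added above sidesteps the quoted clang error: on ROCm the vector components come back through an accessor type that converts both to and from `float`, so a conditional expression mixing it with a float literal has no unique common type. A standalone sketch of the same situation (the `Proxy` type is invented for illustration):

// Standalone sketch (Proxy is invented): a type convertible both to and from
// float makes `cond ? p : 0.0f` ambiguous; casting one operand resolves it,
// mirroring the fix above.
struct Proxy {
  float v;
  Proxy(float f) : v(f) {}              // float -> Proxy
  operator float() const { return v; }  // Proxy -> float
};

float relu_like(bool positive, Proxy p) {
  // return positive ? p : 0.0f;                   // error: ambiguous conversion
  return positive ? static_cast<float>(p) : 0.0f;  // OK: both operands are float
}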

View File

@ -15,6 +15,7 @@ if (NOT __NCCL_INCLUDED)
# this second replacement is needed when there are multiple archs
string(REPLACE ";-gencode" " -gencode" NVCC_GENCODE "${NVCC_GENCODE}")
set(__NCCL_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/nccl")
ExternalProject_Add(nccl_external
SOURCE_DIR ${PROJECT_SOURCE_DIR}/third_party/nccl/nccl
BUILD_IN_SOURCE 1
@ -30,20 +31,49 @@ if (NOT __NCCL_INCLUDED)
"CUDA_HOME=${CUDA_TOOLKIT_ROOT_DIR}"
"NVCC=${CUDA_NVCC_EXECUTABLE}"
"NVCC_GENCODE=${NVCC_GENCODE}"
"BUILDDIR=${CMAKE_CURRENT_BINARY_DIR}/nccl"
"BUILDDIR=${__NCCL_BUILD_DIR}"
"VERBOSE=0"
"-j"
BUILD_BYPRODUCTS "${CMAKE_CURRENT_BINARY_DIR}/nccl/lib/libnccl_static.a"
BUILD_BYPRODUCTS "${__NCCL_BUILD_DIR}/lib/libnccl_static.a"
INSTALL_COMMAND ""
)
# Detect objcopy version
execute_process (COMMAND "${CMAKE_OBJCOPY}" "--version" OUTPUT_VARIABLE OBJCOPY_VERSION_STR)
string(REGEX REPLACE "GNU objcopy version ([0-9])\\.([0-9]+).*" "\\1" OBJCOPY_VERSION_MAJOR ${OBJCOPY_VERSION_STR})
string(REGEX REPLACE "GNU objcopy version ([0-9])\\.([0-9]+).*" "\\2" OBJCOPY_VERSION_MINOR ${OBJCOPY_VERSION_STR})
if ((${OBJCOPY_VERSION_MAJOR} GREATER 2) OR ((${OBJCOPY_VERSION_MAJOR} EQUAL 2) AND (${OBJCOPY_VERSION_MINOR} GREATER 27)))
message(WARNING "Enabling NCCL library slimming")
add_custom_command(
OUTPUT "${__NCCL_BUILD_DIR}/lib/libnccl_slim_static.a"
DEPENDS nccl_external
COMMAND "${CMAKE_COMMAND}" -E make_directory "${__NCCL_BUILD_DIR}/objects"
COMMAND cd objects
COMMAND "${CMAKE_AR}" x "${__NCCL_BUILD_DIR}/lib/libnccl_static.a"
COMMAND for obj in all_gather_* all_reduce_* broadcast_* reduce_*.o$<SEMICOLON> do "${CMAKE_OBJCOPY}" --remove-relocations .nvFatBinSegment --remove-section __nv_relfatbin $$obj$<SEMICOLON> done
COMMAND "${CMAKE_AR}" cr "${__NCCL_BUILD_DIR}/lib/libnccl_slim_static.a" "*.o"
COMMAND cd -
COMMAND "${CMAKE_COMMAND}" -E remove_directory "${__NCCL_BUILD_DIR}/objects"
WORKING_DIRECTORY "${__NCCL_BUILD_DIR}"
COMMENT "Slimming NCCL"
)
add_custom_target(nccl_slim_external DEPENDS "${__NCCL_BUILD_DIR}/lib/libnccl_slim_static.a")
set(__NCCL_LIBRARY_DEP nccl_slim_external)
set(NCCL_LIBRARIES ${__NCCL_BUILD_DIR}/lib/libnccl_slim_static.a)
else()
message(WARNING "Objcopy version is too old to support NCCL library slimming")
set(__NCCL_LIBRARY_DEP nccl_external)
set(NCCL_LIBRARIES ${__NCCL_BUILD_DIR}/lib/libnccl_static.a)
endif()
set(NCCL_FOUND TRUE)
add_library(__caffe2_nccl INTERFACE)
# The following old-style variables are set so that other libs, such as Gloo,
# can still use it.
set(NCCL_INCLUDE_DIRS ${CMAKE_CURRENT_BINARY_DIR}/nccl/include)
set(NCCL_LIBRARIES ${CMAKE_CURRENT_BINARY_DIR}/nccl/lib/libnccl_static.a)
add_dependencies(__caffe2_nccl nccl_external)
set(NCCL_INCLUDE_DIRS ${__NCCL_BUILD_DIR}/include)
add_dependencies(__caffe2_nccl ${__NCCL_LIBRARY_DEP})
target_link_libraries(__caffe2_nccl INTERFACE ${NCCL_LIBRARIES})
target_include_directories(__caffe2_nccl INTERFACE ${NCCL_INCLUDE_DIRS})
endif()

View File

@ -56,6 +56,10 @@ INPUT = ../../../aten/src/ATen/ATen.h \
../../../c10/cuda/CUDAStream.h \
../../../torch/csrc/api/include \
../../../torch/csrc/api/src \
../../../torch/csrc/autograd/autograd.h \
../../../torch/csrc/autograd/custom_function.h \
../../../torch/csrc/autograd/function.h \
../../../torch/csrc/autograd/variable.h \
../../../torch/csrc/autograd/generated/variable_factories.h \
../../../torch/csrc/jit/runtime/custom_operator.h \
../../../torch/csrc/jit/serialization/import.h \

View File

@ -281,7 +281,9 @@ change one property, this is quite practical.
In conclusion, we can now compare how ``TensorOptions`` defaults, together with
the abbreviated API for creating ``TensorOptions`` using free functions, allow
tensor creation in C++ with the same convenience as in Python. Compare this
call in Python::
call in Python:
.. code-block:: python
torch.randn(3, 4, dtype=torch.float32, device=torch.device('cuda', 1), requires_grad=True)
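The hunk ends before the matching C++ call is shown; for illustration only, the libtorch counterpart would look roughly like this (a sketch using the ``torch::dtype(...)`` builder; ``torch::Device(torch::kCUDA, 1)`` is assumed for the device argument):

.. code-block:: cpp

   // Sketch of the C++ counterpart (the hunk above cuts off before showing it).
   torch::randn({3, 4}, torch::dtype(torch::kFloat32)
                            .device(torch::Device(torch::kCUDA, 1))
                            .requires_grad(true));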

View File

@ -0,0 +1,99 @@
Tensor Indexing API
===================
Indexing a tensor in the PyTorch C++ API works very similarly to the Python API.
All index types such as ``None`` / ``...`` / integer / boolean / slice / tensor
are available in the C++ API, making translation from Python indexing code to C++
very simple. The main difference is that, instead of the Python ``[]`` operator,
the C++ API uses the following indexing methods:
- ``torch::Tensor::index`` (`link <https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor5indexE8ArrayRefIN2at8indexing11TensorIndexEE>`_)
- ``torch::Tensor::index_put_`` (`link <https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4N2at6Tensor10index_put_E8ArrayRefIN2at8indexing11TensorIndexEERK6Tensor>`_)
It's also important to note that index types such as ``None`` / ``Ellipsis`` / ``Slice``
live in the ``torch::indexing`` namespace, and it's recommended to put ``using namespace torch::indexing``
before any indexing code for convenient use of those index types.
Here are some examples of translating Python indexing code to C++:
Getter
------
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| Python | C++ (assuming ``using namespace torch::indexing``) |
+==========================================================+======================================================================================+
| ``tensor[None]`` | ``tensor.index({None})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[Ellipsis, ...]`` | ``tensor.index({Ellipsis, "..."})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[1, 2]`` | ``tensor.index({1, 2})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[True, False]`` | ``tensor.index({true, false})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[1::2]`` | ``tensor.index({Slice(1, None, 2)})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[torch.tensor([1, 2])]`` | ``tensor.index({torch::tensor({1, 2})})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[..., 0, True, 1::2, torch.tensor([1, 2])]`` | ``tensor.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
Setter
------
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| Python | C++ (assuming ``using namespace torch::indexing``) |
+==========================================================+======================================================================================+
| ``tensor[None] = 1`` | ``tensor.index_put_({None}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[Ellipsis, ...] = 1`` | ``tensor.index_put_({Ellipsis, "..."}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[1, 2] = 1`` | ``tensor.index_put_({1, 2}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[True, False] = 1`` | ``tensor.index_put_({true, false}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[1::2] = 1`` | ``tensor.index_put_({Slice(1, None, 2)}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[torch.tensor([1, 2])] = 1`` | ``tensor.index_put_({torch::tensor({1, 2})}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[..., 0, True, 1::2, torch.tensor([1, 2])] = 1`` | ``tensor.index_put_({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
Translating between Python/C++ index types
------------------------------------------
The one-to-one translation between Python and C++ index types is as follows:
+-------------------------+------------------------------------------------------------------------+
| Python | C++ (assuming ``using namespace torch::indexing``) |
+=========================+========================================================================+
| ``None`` | ``None`` |
+-------------------------+------------------------------------------------------------------------+
| ``Ellipsis`` | ``Ellipsis`` |
+-------------------------+------------------------------------------------------------------------+
| ``...`` | ``"..."`` |
+-------------------------+------------------------------------------------------------------------+
| ``123`` | ``123`` |
+-------------------------+------------------------------------------------------------------------+
| ``True`` | ``true`` |
+-------------------------+------------------------------------------------------------------------+
| ``False`` | ``false`` |
+-------------------------+------------------------------------------------------------------------+
| ``:`` or ``::`` | ``Slice()`` or ``Slice(None, None)`` or ``Slice(None, None, None)`` |
+-------------------------+------------------------------------------------------------------------+
| ``1:`` or ``1::`` | ``Slice(1, None)`` or ``Slice(1, None, None)`` |
+-------------------------+------------------------------------------------------------------------+
| ``:3`` or ``:3:`` | ``Slice(None, 3)`` or ``Slice(None, 3, None)`` |
+-------------------------+------------------------------------------------------------------------+
| ``::2`` | ``Slice(None, None, 2)`` |
+-------------------------+------------------------------------------------------------------------+
| ``1:3`` | ``Slice(1, 3)`` |
+-------------------------+------------------------------------------------------------------------+
| ``1::2`` | ``Slice(1, None, 2)`` |
+-------------------------+------------------------------------------------------------------------+
| ``:3:2`` | ``Slice(None, 3, 2)`` |
+-------------------------+------------------------------------------------------------------------+
| ``1:3:2`` | ``Slice(1, 3, 2)`` |
+-------------------------+------------------------------------------------------------------------+
| ``torch.tensor([1, 2])``| ``torch::tensor({1, 2})`` |
+-------------------------+------------------------------------------------------------------------+
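Putting the two tables together, a short example of a getter and a setter in C++ (the tensor shape and values are invented for illustration):

.. code-block:: cpp

   #include <torch/torch.h>
   using namespace torch::indexing;

   torch::Tensor t = torch::arange(12).reshape({3, 4});

   // Python: t[1, 1::2]
   torch::Tensor row = t.index({1, Slice(1, None, 2)});

   // Python: t[..., 0] = 100
   t.index_put_({Ellipsis, 0}, 100);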

View File

@ -1,4 +1,4 @@
sphinx
sphinx==2.4.4
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
sphinxcontrib.katex
matplotlib

View File

@ -13,6 +13,13 @@ use ``torch.float16`` (``half``). Some operations, like linear layers and convol
are much faster in ``float16``. Other operations, like reductions, often require the dynamic
range of ``float32``. Networks running in mixed precision try to match each operation to its appropriate datatype.
.. warning::
:class:`torch.cuda.amp.GradScaler` is not a complete implementation of automatic mixed precision.
:class:`GradScaler` is only useful if you manually run regions of your model in ``float16``.
If you aren't sure how to choose op precision manually, the master branch and nightly pip/conda
builds include a context manager that chooses op precision automatically wherever it's enabled.
See the `master documentation <https://pytorch.org/docs/master/amp.html>`_ for details.
.. contents:: :local:
.. _gradient-scaling:

View File

@ -395,6 +395,8 @@ of 16
.. autofunction:: all_gather_multigpu
.. _distributed-launch:
Launch utility
--------------

View File

@ -16,7 +16,6 @@ PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
:caption: Notes
notes/*
PyTorch on XLA Devices <http://pytorch.org/xla/>
.. toctree::
:maxdepth: 1
@ -46,7 +45,7 @@ PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
onnx
optim
quantization
rpc
rpc/index.rst
torch.random <random>
sparse
storage
@ -62,24 +61,16 @@ PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
name_inference
torch.__config__ <__config__>
.. toctree::
:glob:
:maxdepth: 2
:caption: torchvision Reference
torchvision/index
.. toctree::
:maxdepth: 1
:caption: torchaudio Reference
:caption: Libraries
torchaudio <https://pytorch.org/audio>
.. toctree::
:maxdepth: 1
:caption: torchtext Reference
torchtext <https://pytorch.org/text>
torchvision/index
TorchElastic <https://pytorch.org/elastic/>
TorchServe <https://pytorch.org/serve>
PyTorch on XLA Devices <http://pytorch.org/xla/>
.. toctree::
:glob:

View File

@ -790,21 +790,6 @@ New API:
m = torch.jit.script(MyModule())
Python 2
""""""""
If you are stuck on Python 2 and cannot use the class annotation syntax, you can use the ``__annotations__`` class member to directly apply type annotations.
.. testcode::
from typing import Dict
class MyModule(torch.jit.ScriptModule):
__annotations__ = {'my_dict': Dict[str, int]}
def __init__(self):
super(MyModule, self).__init__()
self.my_dict = {}
self.my_int = 20
Constants
^^^^^^^^^

View File

@ -185,13 +185,10 @@ MyPy-style type annotations using the types listed above.
...
In our examples, we use comment-based type hints to ensure Python 2
compatibility as well.
An empty list is assumed to be ``List[Tensor]`` and empty dicts
``Dict[str, Tensor]``. To instantiate an empty list or dict of other types,
use `Python 3 type hints`_. If you are on Python 2, you can use ``torch.jit.annotate``.
use `Python 3 type hints`_.
Example (type annotations for Python 3):
@ -217,31 +214,6 @@ Example (type annotations for Python 3):
x = torch.jit.script(EmptyDataStructures())
Example (``torch.jit.annotate`` for Python 2):
.. testcode::
import torch
import torch.nn as nn
from typing import Dict, List, Tuple
class EmptyDataStructures(torch.nn.Module):
def __init__(self):
super(EmptyDataStructures, self).__init__()
def forward(self, x):
# type: (Tensor) -> Tuple[List[Tuple[int, float]], Dict[str, int]]
# This annotates the list to be a `List[Tuple[int, float]]`
my_list = torch.jit.annotate(List[Tuple[int, float]], [])
for i in range(10):
my_list.append((i, float(x.item())))
my_dict = torch.jit.annotate(Dict[str, int], {})
return my_list, my_dict
x = torch.jit.script(EmptyDataStructures())
Optional Type Refinement
@ -856,28 +828,8 @@ Supported constant Python types are
* tuples containing supported types
* ``torch.nn.ModuleList`` which can be used in a TorchScript for loop
.. note::
If you are on Python 2, you can mark an attribute as a constant by adding
its name to the ``__constants__`` property of the class:
.. testcode::
import torch
import torch.nn as nn
class Foo(nn.Module):
__constants__ = ['a']
def __init__(self):
super(Foo, self).__init__()
self.a = 1 + 4
def forward(self, input):
return self.a + input
f = torch.jit.script(Foo())
|
.. _module attributes:
@ -924,32 +876,3 @@ Example:
f = torch.jit.script(Foo({'hi': 2}))
.. note::
If you are on Python 2, you can mark an attribute's type by adding it to
the ``__annotations__`` class property as a dictionary of attribute name to
type
.. testcode::
from typing import List, Dict
class Foo(nn.Module):
__annotations__ = {'words': List[str], 'some_dict': Dict[str, int]}
def __init__(self, a_dict):
super(Foo, self).__init__()
self.words = []
self.some_dict = a_dict
# `int`s can be inferred
self.my_int = 10
def forward(self, input):
# type: (str) -> int
self.words.append(input)
return self.some_dict[input] + self.my_int
f = torch.jit.script(Foo({'hi': 2}))
|

View File

@ -30,9 +30,7 @@ Sharing CUDA tensors
--------------------
Sharing CUDA tensors between processes is supported only in Python 3, using
a ``spawn`` or ``forkserver`` start methods. :mod:`python:multiprocessing` in
Python 2 can only create subprocesses using ``fork``, and it's not supported
by the CUDA runtime.
the ``spawn`` or ``forkserver`` start methods.
Unlike CPU tensors, the sending process is required to keep the original tensor
as long as the receiving process retains a copy of the tensor. The refcounting is

View File

@ -187,7 +187,7 @@ mentioning all of them as in required by :meth:`~Tensor.permute`.
# Move the F (dim 5) and E dimension (dim 4) to the front while keeping
# the rest in the same order
>>> tensor.permute(5, 4, 0, 1, 2, 3)
>>> named_tensor.align_to('F', 'E', ...) # Use '...' instead in Python 2
>>> named_tensor.align_to('F', 'E', ...)
Use :meth:`~Tensor.flatten` and :meth:`~Tensor.unflatten` to flatten and unflatten
dimensions, respectively. These methods are more verbose than :meth:`~Tensor.view`
@ -317,4 +317,3 @@ operators, see :ref:`name_inference_reference-doc`.
.. warning::
The named tensor API is experimental and subject to change.

View File

@ -5,6 +5,13 @@ Automatic Mixed Precision examples
.. currentmodule:: torch.cuda.amp
.. warning::
:class:`torch.cuda.amp.GradScaler` is not a complete implementation of automatic mixed precision.
:class:`GradScaler` is only useful if you manually run regions of your model in ``float16``.
If you aren't sure how to choose op precision manually, the master branch and nightly pip/conda
builds include a context manager that chooses op precision automatically wherever it's enabled.
See the `master documentation <https://pytorch.org/docs/master/amp.html>`_ for details.
.. contents:: :local:
.. _gradient-scaling-examples:

View File

@ -306,20 +306,30 @@ to overlap data transfers with computation.
You can make the :class:`~torch.utils.data.DataLoader` return batches placed in
pinned memory by passing ``pin_memory=True`` to its constructor.
.. _cuda-nn-dataparallel-instead:
.. _cuda-nn-ddp-instead:
Use nn.DataParallel instead of multiprocessing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Most use cases involving batched inputs and multiple GPUs should default to
using :class:`~torch.nn.DataParallel` to utilize more than one GPU. Even with
the GIL, a single Python process can saturate multiple GPUs.
As of version 0.1.9, large numbers of GPUs (8+) might not be fully utilized.
However, this is a known issue that is under active development. As always,
test your use case.
using :class:`~torch.nn.parallel.DistributedDataParallel` to utilize more
than one GPU.
There are significant caveats to using CUDA models with
:mod:`~torch.multiprocessing`; unless care is taken to meet the data handling
requirements exactly, it is likely that your program will have incorrect or
undefined behavior.
It is recommended to use :class:`~torch.nn.parallel.DistributedDataParallel`
instead of :class:`~torch.nn.DataParallel` for multi-GPU training, even if
there is only a single node.
The difference between :class:`~torch.nn.parallel.DistributedDataParallel` and
:class:`~torch.nn.DataParallel` is: :class:`~torch.nn.parallel.DistributedDataParallel`
uses multiprocessing where a process is created for each GPU, while
:class:`~torch.nn.DataParallel` uses multithreading. With multiprocessing,
each GPU has its own dedicated process, which avoids the performance overhead
caused by the Python interpreter's GIL.
If you use :class:`~torch.nn.parallel.DistributedDataParallel`, you can use the
`torch.distributed.launch` utility to launch your program; see :ref:`distributed-launch`.

View File

@ -27,10 +27,7 @@ others that require asynchronous operation.
CUDA in multiprocessing
-----------------------
The CUDA runtime does not support the ``fork`` start method. However,
:mod:`python:multiprocessing` in Python 2 can only create subprocesses using
``fork``. So Python 3 and either ``spawn`` or ``forkserver`` start method are
required to use CUDA in subprocesses.
The CUDA runtime does not support the ``fork`` start method. In Python 3, either the ``spawn`` or ``forkserver`` start method is required to use CUDA in subprocesses.
.. note::
The start method can be set via either creating a context with
@ -45,7 +42,7 @@ the consumer process has references to the tensor, and the refcounting can not
save you if the consumer process exits abnormally via a fatal signal. See
:ref:`this section <multiprocessing-cuda-sharing-details>`.
See also: :ref:`cuda-nn-dataparallel-instead`
See also: :ref:`cuda-nn-ddp-instead`
Best practices and tips

View File

@ -151,11 +151,6 @@ Package not found in win-32 channel.
PyTorch doesn't work on 32-bit systems. Please use 64-bit versions of
Windows and Python.
Why are there no Python 2 packages for Windows?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Because it's not stable enough. There're some issues that need to
be solved before we officially release it. You can build it by yourself.
Import error
^^^^^^^^^^^^
@ -290,4 +285,3 @@ tensors cannot succeed, there are two alternatives for this.
2. Share CPU tensors instead. Make sure your custom
:class:`~torch.utils.data.DataSet` returns CPU tensors.

docs/source/rpc/index.rst
View File

@ -0,0 +1,24 @@
.. _rpc-index:
Distributed RPC Framework
==============================
The distributed RPC framework provides mechanisms for multi-machine model training through a set of primitives to allow for remote communication, and a higher-level API to automatically differentiate models split across several machines.
- :ref:`distributed-rpc-framework`
Design Notes
------------
The distributed autograd design note covers the design of the RPC-based distributed autograd framework that is useful for applications such as model parallel training.
- :ref:`distributed-autograd-design`
The RRef design note covers the design of the :ref:`rref` (Remote REFerence) protocol, which the framework uses to refer to values on remote workers.
- :ref:`remote-reference-protocol`
Tutorials
---------
The RPC tutorial introduces users to the RPC framework and provides two example applications using :ref:`torch.distributed.rpc<distributed-rpc-framework>` APIs.
- `Getting started with Distributed RPC Framework <https://pytorch.org/tutorials/intermediate/rpc_tutorial.html>`__

View File

@ -8,6 +8,8 @@ training through a set of primitives to allow for remote communication, and a
higher-level API to automatically differentiate models split across several
machines.
.. warning ::
APIs in the RPC package are stable. There are multiple ongoing work items to improve performance and error handling, which will ship in future releases.
Basics

View File

@ -210,3 +210,25 @@ Example::
(1, 5)
For more information on ``torch.sparse_coo`` tensors, see :ref:`sparse-docs`.
torch.memory_format
-------------------
.. class:: torch.memory_format
A :class:`torch.memory_format` is an object representing the memory format on which a :class:`torch.Tensor` is
or will be allocated.
Possible values are:
- ``torch.contiguous_format``:
  Tensor is or will be allocated in dense, non-overlapping memory. Strides are represented by values in decreasing order.
- ``torch.channels_last``:
  Tensor is or will be allocated in dense, non-overlapping memory. Strides are represented by values satisfying
  ``strides[0] > strides[2] > strides[3] > strides[1] == 1``, a.k.a. NHWC order.
- ``torch.preserve_format``:
  Used in functions like `clone` to preserve the memory format of the input tensor. If the input tensor is
  allocated in dense, non-overlapping memory, the output tensor strides will be copied from the input.
  Otherwise the output strides will follow ``torch.contiguous_format``.
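For illustration, a hedged C++ sketch of the stride patterns described above, assuming the libtorch equivalents ``at::MemoryFormat::ChannelsLast`` and ``Tensor::contiguous(MemoryFormat)`` (the list above only names the Python-side values):

.. code-block:: cpp

   // Sketch only: assumes the libtorch equivalents of the Python values above.
   torch::Tensor x = torch::randn({2, 3, 4, 5});                    // contiguous_format
   torch::Tensor y = x.contiguous(at::MemoryFormat::ChannelsLast);  // channels_last
   // x.strides() == {60, 20, 5, 1}  -> values in decreasing order
   // y.strides() == {60, 1, 15, 3}  -> strides[0] > strides[2] > strides[3] > strides[1] == 1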

View File

@ -49,8 +49,10 @@ For reference, here's a full list of view ops in PyTorch:
- Basic slicing and indexing op, e.g. ``tensor[0, 2:, 1:7:2]`` returns a view of base ``tensor``, see note below.
- :meth:`~torch.Tensor.as_strided`
- :meth:`~torch.Tensor.detach`
- :meth:`~torch.Tensor.diagonal`
- :meth:`~torch.Tensor.expand`
- :meth:`~torch.Tensor.expand_as`
- :meth:`~torch.Tensor.narrow`
- :meth:`~torch.Tensor.permute`
- :meth:`~torch.Tensor.select`

Some files were not shown because too many files have changed in this diff.