Compare commits

110 Commits

Author SHA1 Message Date
b58f89b2e4 Use counter instead of vector of futures in _parallel_run (#36159) (#36334)
Summary:
This should be faster than allocating one mutex, flag and condition variable per task.

Using `std::atomic<size_t>` to count remaining tasks is not sufficient,
because modifying the remaining-task counter and signalling the condition variable must happen atomically;
otherwise `wait()` might get invoked after `notify_one()` was called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36159

Test Plan: CI

Differential Revision: D20905411

Pulled By: malfet

fbshipit-source-id: facaf599693649c3f43edafc49f369e90d2f60de
(cherry picked from commit 986a8fdd6a18d9110f8bde59361967139450966b)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-04-09 14:08:57 -07:00
87b6685c6b repr and _*state_dict for qRNN (#31540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31540

Fixes #31468

Test Plan: Imported from OSS

Differential Revision: D19205894

Pulled By: z-a-f

fbshipit-source-id: 80c36f74aa20a125ea8d74a54e9905576f1bc6d7
2020-04-09 12:26:56 -04:00
f746f1b746 Revert "Avoid clone for sparse tensors during accumulation of grads. (#33427)"
This reverts commit b185359fb4ba4dcb0c048fd1d049da23eff88b27.
2020-04-09 11:33:55 -04:00
1379415150 Revert "AccumulateGrad: ensure sparse tensor indices and values refcount is always 1 (#34559)"
This reverts commit 2ce9513b0c8894987f6d42bfb57ff95b22e32c95.
2020-04-09 11:33:55 -04:00
7d638d2596 [v1.5.0] fix is_float_scale_factor warning (python and c++) (#36274)
* fix is_float_scale_factor warning

* fix python impl

Co-authored-by: Robin Lobel <divide@divideconcept.net>
Co-authored-by: Will Feng <willfeng@fb.com>
2020-04-09 11:31:13 -04:00
bad005d331 .circleci: Add binary builds/tests to run on release branches (#36283)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-04-08 16:37:24 -07:00
16d8a52407 [pytorch] Add error when PyTorch used with Python 2 (#36151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36151

Python 2 has reached end-of-life and is no longer supported by PyTorch. To avoid confusing behavior when trying to use PyTorch with Python 2, detect this case early and fail with a clear message. For now, this commit covers `import torch` only, not C++.

Test Plan: waitforsandcastle

Reviewed By: dreiss

Differential Revision: D20894381

fbshipit-source-id: a1073b7a648e07cf10cda5a99a2cf4eee5a89230
2020-04-08 18:55:58 -04:00
a33b264588 Revert "Update docs for 1.5 to remove Python 2 references (#36116)"
This reverts commit 63dcd9eccc90136afdfb5d8130077ff1e917ba2e.
2020-04-08 18:51:13 -04:00
3a67e00889 [1.5 cherrypick] C++ Adam optimizer - corrected messages for check of default options (#36245)
* Corrected messages for check of default options

* Added 0<= betas < 1 range check, match python messages for check of betas

Co-authored-by: meganset <meganset@gmail.com>
2020-04-08 18:06:16 -04:00
6bd039551d Remove determine_from from test/run_test.py (#36256)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-04-08 14:58:23 -07:00
b6c3058d61 Exclude torch/csrc/cuda/*nccl* from clang-tidy (#36251)
Since the workflow configures pytorch with `USE_NCCL` set to 0, we cannot tidy those files

(cherry picked from commit e172a6ef920b6838b67eb8f0020d78031df8cde5)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-04-08 13:37:16 -07:00
ed908b4fbc [release/1.5] Move all nccl from torch_python to torch_cuda (#36229)
* Remove dead code

`THCPModule_useNccl()` doesn't seem to be used anywhere

* Move all nccl calls from `torch_python` to `torch_cuda`

Because `torch_python` is supposed to be a thin wrapper around torch

This ensures API parity between C++ and Python, and reduces `torch_python` binary size

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-04-08 10:39:20 -07:00
b66e0af58b s/repo.continuum.io/repo.anaconda.com/
Followup after https://github.com/pytorch/pytorch/pull/36201

Per https://github.com/conda/conda/issues/6886 `repo.anaconda.com` should have been used since Feb 2019

Test Plan: CI
2020-04-08 13:05:04 -04:00
bf8a5ede96 [ONNX] fix size for opset 11 (#35984)
Summary:
Fixing size, as the aten op has been updated to support 0 inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35984

Reviewed By: hl475

Differential Revision: D20858214

Pulled By: houseroad

fbshipit-source-id: 8ad0a0174a569455e89da6798eed403c8b162a47
2020-04-08 11:50:59 -04:00
c2bc5c56c5 Use repo.anaconda.com instead of repo.continuum.io (#36201)
Summary:
Per https://github.com/conda/conda/issues/6886 `repo.anaconda.com` should have been used since Feb 2019
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36201

Test Plan: CI

Differential Revision: D20910667

Pulled By: malfet

fbshipit-source-id: 3a191e2cae293e6f96dbb323853e84c07cd7aabc
2020-04-08 08:39:52 -07:00
db3c3ed662 Move test to test_jit_py3.py 2020-04-08 11:15:33 -04:00
9de4770bbd [v1.5.0] Group libraries in TOC and add PyTorch Elastic
Move XLA out of Notes and group with other libraries. Also adds link to PyTorch Elastic.
2020-04-08 11:08:39 -04:00
911a2a6b63 [BugFix] Fix compare_exchange_weak in DispatchStub.h (#35794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35794

### Summary

As PyTorch has been in production on iOS for about a week, we've spotted a few crashes (90 out of 20.3k) related to DispatchStub.h. The major part of the crash log is pasted below (full crash information can be found at `bunnylol logview 1d285dc9172c877b679d0f8539da58f0`):

```
FBCameraFramework void at::native::DispatchStub<void (*)(at::TensorIterator&, c10::Scalar), at::native::add_stub>::operator()<at::TensorIterator&, c10::Scalar&>(c10::DeviceType, at::TensorIterator&, c10::Scalar&)(DispatchStub.h:0)
+FBCameraFramework at::native::add(at::Tensor const&, at::Tensor const&, c10::Scalar)(BinaryOps.cpp:53)
+FBCameraFramework at::CPUType::add_Tensor(at::Tensor const&, at::Tensor const&, c10::Scalar)(CPUType.cpp:55)
+FBCameraFramework at::add(at::Tensor const&, at::Tensor const&, c10::Scalar)(Functions.h:1805)
+FBCameraFramework [inlined] c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::intrusive_ptr(c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>&&)(intrusive_ptr.h:0)
+FBCameraFramework [inlined] c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::intrusive_ptr(c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>&&)(intrusive_ptr.h:221)
+FBCameraFramework [inlined] at::Tensor::Tensor(at::Tensor&&)(TensorBody.h:93)
+FBCameraFramework [inlined] at::Tensor::Tensor(at::Tensor&&)(TensorBody.h:93)
+FBCameraFramework c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >::operator()(at::Tensor, at::Tensor, c10::Scalar)(kernel_lambda.h:23)
+FBCameraFramework [inlined] c10::guts::infer_function_traits<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> > >::type::return_type c10::detail::call_functor_with_args_from_stack_<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >, false, 0ul, 1ul, 2ul>(c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*, std::__1::vector<c10::IValue, c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*::allocator<std::__1::vector> >*, c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*::integer_sequence<unsigned long, 0ul, 1ul, 2ul>)(kernel_functor.h:210)
+FBCameraFramework [inlined] c10::guts::infer_function_traits<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> > >::type::return_type c10::detail::call_functor_with_args_from_stack<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >, false>(c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*, std::__1::vector<c10::IValue, c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*::allocator<std::__1::vector> >*)(kernel_functor.h:218)
+FBCameraFramework c10::detail::make_boxed_from_unboxed_functor<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >, false, void>::call(c10::OperatorKernel*, c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*)(kernel_functor.h:250)
+FBCameraFramework [inlined] (anonymous namespace)::variable_fallback_kernel(c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*)(VariableFallbackKernel.cpp:32)
+FBCameraFramework void c10::KernelFunction::make_boxed_function<&((anonymous namespace)::variable_fallback_kernel(c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*))>(c10::OperatorKernel*, c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*)(KernelFunction_impl.h:21)
+FBCameraFramework torch::jit::mobile::InterpreterState::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&)(interpreter.cpp:0)
+FBCameraFramework torch::jit::mobile::Function::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) const(function.cpp:59)
+FBCameraFramework torch::jit::mobile::Module::run_method(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >)(module.cpp:51)
+FBCameraFramework [inlined] torch::jit::mobile::Module::forward(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >)(module.h:28)
```
The problem is that `compare_exchange_weak` is not guaranteed to succeed in one shot, as described in [C++ Concurrency in Action (2nd Edition)](https://livebook.manning.com/book/c-plus-plus-concurrency-in-action-second-edition/chapter-5/79). This might result in `cpu_dispatch_ptr` being a null pointer in concurrent situations, thus leading to the crash. As suggested in the book, because of spurious failures, `compare_exchange_weak` is typically used in a loop. There is also a [stackoverflow discussion](https://stackoverflow.com/questions/25199838/understanding-stdatomiccompare-exchange-weak-in-c11) about this. Feel free to drop comments below if there is a better option.
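
A minimal sketch of the retry-loop pattern, assuming a lazily initialized function pointer. The names `kernel_fn`, `choose_kernel` and `get_kernel` are hypothetical and are not taken from DispatchStub.h:

```cpp
#include <atomic>

using kernel_fn = int (*)(int);

inline int add_one(int x) { return x + 1; }

// Hypothetical stand-in for whatever logic picks the kernel to install.
inline kernel_fn choose_kernel() { return &add_one; }

// Returns the installed kernel, installing one on first use.
inline kernel_fn get_kernel(std::atomic<kernel_fn>& slot) {
  kernel_fn current = slot.load(std::memory_order_acquire);
  if (current != nullptr) {
    return current;
  }
  kernel_fn chosen = choose_kernel();
  kernel_fn expected = nullptr;
  // compare_exchange_weak may fail spuriously, i.e. return false even though
  // `slot` still holds `expected`. Retrying in a loop handles that case.
  while (!slot.compare_exchange_weak(expected, chosen,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
    if (expected != nullptr) {
      // A genuine failure: another thread installed a kernel first, and
      // `expected` now holds that value.
      return expected;
    }
    // Spurious failure: `expected` is still nullptr, so try again.
  }
  return chosen;
}
```

Using `compare_exchange_strong` instead would remove the need for the loop, since it does not fail spuriously.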

### The original PR

- [Enhance DispatchStub to be thread safe from a TSAN point of view](https://github.com/pytorch/pytorch/pull/32148)

### Test Plan

- Keep observing the crash reports in QE

Test Plan: Imported from OSS

Differential Revision: D20808751

Pulled By: xta0

fbshipit-source-id: 52f5c865b70c59b332ef9f0865315e76d97f6eaa
2020-04-08 10:56:07 -04:00
60375bcfdf [1.5.0] Attempt to fix the pytorch_cpp_doc_push build by pinning breathe. 2020-04-08 10:54:56 -04:00
63dcd9eccc Update docs for 1.5 to remove Python 2 references (#36116) 2020-04-07 16:03:44 -07:00
e8236d2ed4 fix max_pool2d cuda version Dimension out of range issue(#36046) (#36095)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36095

Test Plan: Imported from OSS

Differential Revision: D20876733

Pulled By: glaringlee

fbshipit-source-id: a2b92fd2dd0254c5443af469e3fb2faa2323e5c9
2020-04-07 18:52:21 -04:00
0058b1bb7e [1.5 cherrypick][JIT] Fix fake_range() 2020-04-07 18:47:22 -04:00
419283e291 Improve C++ API autograd and indexing docs (#35777)
Summary:
This PR adds docs for the following components:
1. Tensor autograd APIs (such as `is_leaf` / `backward` / `detach` / `detach_` / `retain_grad` / `grad` / `register_hook` / `remove_hook`)
2. Autograd APIs: `torch::autograd::backward` / `grad` / `Function` / `AutogradContext`, `torch::NoGradGuard` / `torch::AutoGradMode`
3. Tensor indexing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35777

Differential Revision: D20810616

Pulled By: yf225

fbshipit-source-id: 60526ec0c5b051021901d89bc3b56861c68758e8
2020-04-07 18:37:27 -04:00
0e6f6ba218 [pytorch] Remove python2 support from tests and torch.jit (#35042) (#36162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35042

Removing python2 tests and some compat code in torch.jit. Check if dependent projects and external tests have any issues after these changes.

Test Plan: waitforsandcastle

Reviewed By: suo, seemethere

Differential Revision: D18942633

fbshipit-source-id: d76cc41ff20bee147dd8d44d70563c10d8a95a35
(cherry picked from commit 8240db11e193b0334a60a33d9fc907ebc6ba6987)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Orion Reblitz-Richardson <orionr@fb.com>
2020-04-07 13:55:50 -07:00
ec8dbaf920 Add more alternative filters in places people forgot to add them. (#36082) (#36148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36082

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20874618

Pulled By: ezyang

fbshipit-source-id: b6f12100a247564428eb7272f803a03c9cad3a97
(cherry picked from commit 449a4ca3408774ed961f1702ca31a549f5818b80)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Edward Yang <ezyang@fb.com>
2020-04-07 09:59:33 -07:00
7e168d134f Pin Sphinx to 2.4.4 (take 2), fix docs CIs (#36072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36072

Update to https://github.com/pytorch/pytorch/pull/36065/ which was
almost there

Test Plan: - Wait for CI

Differential Revision: D20871661

Pulled By: zou3519

fbshipit-source-id: 2bf5ce382e879aafd232700ff1c0d61fc17ea52d
2020-04-07 10:54:36 -04:00
6daae58871 Remove __nv_relfatbin section from nccl_static library (#35907)
Test Plan: CI

(cherry picked from commit 04e06b419990328157f0e2108a95b2848f66d75f)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-04-06 16:57:03 -07:00
fee0ff1bf6 May fix TopKTypeConfig<at::Half> without an additional Bitfield specialization 2020-04-06 19:41:17 -04:00
deaf3b65cf Compile THCTensorTopK per dtype.
ROCm builds fail inconsistently on this file by timing out.

ghstack-source-id: 4a8f22731aa82c02d464a8cba522e856afbe49b8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36074
2020-04-06 19:41:17 -04:00
dca9c2501d Revert "Revert "Fix handling of non-finite values in topk (#35253)" (#35582)"
This reverts commit dacdbc22d195f80e0b529b4e9111c8ca9a172914.
2020-04-06 19:41:17 -04:00
842cd47416 Refactor and turn on C++ API parity test in CI
gh-metadata: pytorch pytorch 35190 gh/yf225/106/head
2020-04-06 15:40:35 -04:00
a30b49085c Move NewModuleTest and NewCriterionTest from test_nn.py to common_nn.py (#35189)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35189

Test Plan: Imported from OSS

Differential Revision: D20588197

Pulled By: yf225

fbshipit-source-id: 5a28159b653895678c250cbc0c1ddd51bc7a3123
2020-04-06 15:40:35 -04:00
82626f8ad9 More generic dedupe MKL fix (#35966)
* Stop linking against MKL

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Perform test for build size

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* fixup

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* One more MSVC fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Revert "Perform test for build size"

This reverts commit 8b5ed8eac81cc880b5cedb33cb3b86f584abacb7.
2020-04-06 11:50:48 -07:00
27fddfda4f Use std::abs instead of abs in lbfgs.cpp (#35974)
Summary:
This supersedes https://github.com/pytorch/pytorch/pull/35698.

`abs` is a C-style function that takes only an integral argument.
`std::abs` is overloaded and can be applied to both integral and floating-point types.
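
A small sketch of the difference. Whether an unqualified `abs` call resolves to the integer-only C function depends on which headers are included, so `c_style_abs` below is a hypothetical stand-in with the C signature `int abs(int)`:

```cpp
#include <cmath>

// Hypothetical stand-in with the same signature as C's `int abs(int)`.
inline int c_style_abs(int value) { return value < 0 ? -value : value; }

// What happens when a double reaches an int-only abs: the argument is
// converted to int first (made explicit here), so the fraction is lost.
inline double truncated_magnitude(double x) {
  return c_style_abs(static_cast<int>(x));
}

// std::abs has floating-point overloads, so the value is preserved.
inline double magnitude(double x) { return std::abs(x); }
```

With an input of `-0.75`, the first function loses the fractional part before taking the absolute value, while the second keeps it.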

This PR also increases `kBatchSize` in `test_optimizer_xor` function in `test/cpp/api/optim.cpp` to fix `OptimTest.XORConvergence_LBFGS` failure under ASAN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35974

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D20853570

Pulled By: yf225

fbshipit-source-id: 6135588df2426c5b974e4e097b416955d1907bd4
2020-04-06 14:50:18 -04:00
7ecf6a1c10 [release/1.5] Bump libtorch to 3.7, remove python2 (#36080)
* .circleci: Remove Python 2.7 builds, switch libtorch to 3.7

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* .circleci: Bump libtorch builds to 3.7

The image is actually using Python 3.7.2 so we should reflect that
within our circleci configs

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
(cherry picked from commit b3f2572aaf83d1f5383369187f6263e6f926103b)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-04-06 11:10:48 -07:00
beb07a44c4 Ports integer division callsite cleanup 2020-04-02 20:17:31 -04:00
a01c3bd1fe [BC] Fix the BC test for 1.5 (#35733)
* [BC] Fix the BC test for 1.5

* Skip RRef

* Skip more

* Skip more

* Fix whitelist

* Fix whitelist
2020-04-02 19:36:18 -04:00
ffd010f8a0 Make test_leaky_relu_inplace_with_neg_slope device-generic and skipIfRocm. (#35816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35816

Fixes https://github.com/pytorch/pytorch/issues/35689.

Test Plan: Imported from OSS

Differential Revision: D20796656

Pulled By: gchanan

fbshipit-source-id: 474790fe07899d9944644f6b3d7a15db1c2b96db
2020-04-02 17:05:23 -04:00
8ad59f03a8 Skip ROCm test in test/test_cpp_extensions_aot.py (#35838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35838

It may be flaky.

Test Plan: Imported from OSS

Differential Revision: D20807409

Pulled By: gchanan

fbshipit-source-id: f085d05bcb6a04d304f3cd048c38d2e8453125d6
2020-04-02 17:04:54 -04:00
ed3640df68 Fix another case of float2::x and float2::y may not be the same on ROCm (#35785)
Summary:
This is another case of the issue fixed in https://github.com/pytorch/pytorch/pull/35783. Mirroring 35786.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35785

Differential Revision: D20800317

Pulled By: ezyang

fbshipit-source-id: de5f32839755d5ff5aefff8408df69adbab4d0a1
2020-04-02 17:01:27 -04:00
fb88942f6c Fix typo 2020-04-02 13:53:13 -04:00
5d05c51887 Refactored rpc docs (#35109)
Summary:
Reorganize as per jlin27's comments. Screenshots added in comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35109

Differential Revision: D20788774

Pulled By: rohan-varma

fbshipit-source-id: 7d64be70ef76ed6ff303d05d39c338293c234766
2020-04-02 13:53:13 -04:00
df5986fbf3 [1.5 Release] Disabled complex tensor construction (#35579)
* disabled complex tensor construction

* minor

* doc fix

* added docs back and updated complex dtype check

* removed test_complex.py

* removed complexfloat reg test

* debug
2020-04-01 11:11:05 -04:00
165403f614 [v1.5.0] float2::x and float2::y may not be the same as float on ROCm (#35593)
Summary:
This causes ambiguity and can be triggered sometimes (e.g., by https://github.com/pytorch/pytorch/issues/35217). Explicitly convert them to float.

    error: conditional expression is ambiguous; 'const
    hip_impl::Scalar_accessor<float, Native_vec_, 0>' can be converted to
    'float' and vice versa
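
The ambiguity can be reproduced with a small proxy type. `ScalarAccessor` below is a hypothetical stand-in, assumed (from the error text) to convert both to and from `float`. It is not the HIP type itself:

```cpp
// Hypothetical proxy type that converts implicitly in both directions,
// which is what the "and vice versa" in the compiler error describes.
struct ScalarAccessor {
  float value;
  ScalarAccessor(float v) : value(v) {}      // float -> accessor
  operator float() const { return value; }   // accessor -> float
};

inline float pick(bool use_accessor, const ScalarAccessor& x, float fallback) {
  // Ill-formed: each operand can be converted to the other's type, so the
  // conditional expression has no unique result type.
  //   return use_accessor ? x : fallback;

  // Converting explicitly gives both operands the type float.
  return use_accessor ? static_cast<float>(x) : fallback;
}
```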
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35593

Differential Revision: D20735663

Pulled By: ezyang

fbshipit-source-id: ae6a38a08e59821bae13eb0b9f9bdf21a008d5c0
2020-03-31 19:58:40 -04:00
fbf18c34ff ports disabling imag 2020-03-31 18:55:45 -04:00
84f806c821 ports real and imag fixes 2020-03-31 13:34:39 -04:00
94139a7d95 Add warnings that amp is incomplete in 1.5 2020-03-31 10:49:45 -04:00
75e36186b2 [v1.5.0] Fix Caffe2 mobile compilation
Ports #35288
2020-03-30 17:17:59 -04:00
f4a0b406dd Warn a known autograd issue on XLA backend. 2020-03-30 17:16:39 -04:00
e884e720f0 [Windows] make torch_cuda's forced link also work for CMake
Was only working for ninja
2020-03-30 17:13:51 -04:00
dacdbc22d1 Revert "Fix handling of non-finite values in topk (#35253)" (#35582)
This reverts commit b12579da5398ff23b421332e21e18dc619a0b960.

This patch in-and-of itself looks fine, but it's causing some AMP tests to fail.
2020-03-27 17:44:03 -07:00
2a789cd0e0 [C++ API Parity] [Optimizers] Merged Optimizer and LossClosureOptimizer (#34957)
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualize all functions
5. Made defaults_ optional argument in all optimizers except SGD

**TODO**: add BC-breaking notes for this PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957

Test Plan: Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D20678162

Pulled By: yf225

fbshipit-source-id: 74e062e42d86dc118f0fbaddd794e438b2eaf35a
2020-03-27 12:30:29 -04:00
f9b010f399 enforce rref JIT pickling to be in the scope of rpc calls (#34689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34689

rref JIT pickling is only allowed inside rpc calls. This is enforced by adding a thread-local variable isInRpcCall and setting it to true when converting rpc requests or responses to messages, before calling JIT::pickle(). Inside JIT::pickle(), pickling an RRef is allowed only when isInRpcCall is true.
ghstack-source-id: 100481001
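
One way to picture the thread-local flag is with a scoped guard. This is a sketch under assumptions: the names `in_rpc_call`, `RpcCallGuard` and `pickle_rref` are hypothetical, and the RAII guard is an illustrative choice rather than something the summary states.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical flag: true only while the current thread is converting an
// RPC request or response to a message.
thread_local bool in_rpc_call = false;

// Sets the flag for the lifetime of the guard and restores the previous
// value afterwards, including when an exception propagates.
class RpcCallGuard {
 public:
  RpcCallGuard() : previous_(in_rpc_call) { in_rpc_call = true; }
  ~RpcCallGuard() { in_rpc_call = previous_; }
  RpcCallGuard(const RpcCallGuard&) = delete;
  RpcCallGuard& operator=(const RpcCallGuard&) = delete;

 private:
  bool previous_;
};

// Hypothetical pickling entry point that refuses to run outside an RPC call.
inline std::string pickle_rref(int rref_id) {
  if (!in_rpc_call) {
    throw std::runtime_error("RRef can only be pickled inside an RPC call");
  }
  return "rref:" + std::to_string(rref_id);
}

}  // namespace sketch
```

Because the flag is thread-local, setting it on one thread does not permit pickling on another.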

Test Plan: unit tests

Differential Revision: D20429826

fbshipit-source-id: dbc04612ed15de5d6c7d75a4732041ccd4ef3f8c
2020-03-27 11:13:01 -04:00
55614ff306 Enforce rref python pickling to be in the scope of RPC call (#34755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34755

This diff disallows using the python pickler to pickle RRef. RRef can only be pickled in the scope of an RPC call using _InternalRPCPickler.
ghstack-source-id: 100481337

Test Plan: unit tests

Differential Revision: D20453806

fbshipit-source-id: ebd4115ee01457ba6958cde805afd0a87c686612
2020-03-27 11:12:36 -04:00
b12579da53 Fix handling of non-finite values in topk (#35253)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34191

`at::native::radixSelect` basically uses integer comparison which creates a defined ordering of non-finite float values. This isn't compatible with IEEE float comparison, so mixing the two leads to unwritten values in the output.
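
The mismatch between the two orderings can be illustrated on the CPU. The bit trick below is a common way to map floats to order-preserving unsigned keys. It is shown as an assumption about the general technique, not as the exact conversion used by `radixSelect`, and `radix_key` and `float_from_bits` are hypothetical names:

```cpp
#include <cstdint>
#include <cstring>

// Maps a float to an unsigned key whose integer order matches the numeric
// order of ordinary values: negative floats have all bits flipped, and
// non-negative floats have only the sign bit flipped.
inline std::uint32_t radix_key(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  const std::uint32_t mask = (bits & 0x80000000u) ? 0xffffffffu : 0x80000000u;
  return bits ^ mask;
}

// Builds a float from an explicit IEEE 754 bit pattern.
inline float float_from_bits(std::uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Under this key, a NaN with bit pattern `0x7fc00000` sorts above infinity (`0x7f800000`), whereas every IEEE comparison involving that NaN is false. Code that selects elements with the integer key and then filters them with `<` or `>` can therefore disagree with itself about which elements belong in the result.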
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35253

Differential Revision: D20645554

Pulled By: ezyang

fbshipit-source-id: 651bcb1742ed67086ec89cc318d862caae65b981
2020-03-27 10:53:18 -04:00
920e3eb761 Making sure all tensors in torch.cat sequence have the same dtype. (#35150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35150

Fixes #35014

Test Plan: Imported from OSS

Differential Revision: D20578589

Pulled By: z-a-f

fbshipit-source-id: edeaef133d1cf5152dcbafab2b969f1424ee2836
2020-03-26 16:49:11 -04:00
bec01e755a Renaming: MultiLabelMarginLossFuncOptions -> MultilabelMarginLossFuncOptions, MultiLabelSoftMarginLossFuncOptions -> MultilabelSoftMarginLossFuncOptions
gh-metadata: pytorch pytorch 35163 gh/yf225/104/head
2020-03-26 14:31:21 -04:00
6a880e1bc9 Add inplace tests for several torch::nn modules / functionals
gh-metadata: pytorch pytorch 35147 gh/yf225/101/head
2020-03-26 14:31:21 -04:00
fa86e32a4e Fix F::interpolate and torch::nn::Upsample implementation
gh-metadata: pytorch pytorch 35025 gh/yf225/100/head
2020-03-26 14:31:21 -04:00
5aabaf2b18 Fix fractional_max_pool3d_with_indices implementation
gh-metadata: pytorch pytorch 35024 gh/yf225/99/head
2020-03-26 14:31:21 -04:00
4a707e8f95 Fix Conv and ConvTranspose implementation
gh-metadata: pytorch pytorch 35023 gh/yf225/98/head
2020-03-26 14:31:21 -04:00
db127b21eb Fix AdaptiveAvgPool{2,3}d and AdaptiveMaxPool{2,3}d implementation
gh-metadata: pytorch pytorch 35022 gh/yf225/97/head
2020-03-26 14:31:21 -04:00
45313cd9e1 [1.5 cherrypick] [C++ API Parity] Add xor_convergence test for lbfgs (#35440)
* add xor_convergence test for lbfgs

* increased batchsize to 6

* minor

* increased batch size

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
2020-03-26 14:22:55 -04:00
df531973e1 [ONNX] update producer version (#35059)
Summary:
Updating producer version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35059

Reviewed By: hl475

Differential Revision: D20585173

Pulled By: houseroad

fbshipit-source-id: af0c4e3860beb899548466ea99be2050150f905d
2020-03-26 13:56:57 -04:00
9e3c577caa Fix torch.mm export to ONNX (#34661)
Summary:
torch.mm is exported as the Gemm operator in ONNX, and both have an optional input: out.
out is considered broadcastable in Gemm, and during graph optimization the optional input (out) would get selected. Since out is optional, when it is not defined in torch.mm this results in the following exception:
IndexError: vector::_M_range_check: __n (which is 2) >= this->size() (which is 2)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34661

Reviewed By: hl475

Differential Revision: D20496398

Pulled By: houseroad

fbshipit-source-id: e677aef0a6aefb1f83a54033153aaabe5c23bc0f
2020-03-26 13:55:18 -04:00
5357b8e4d9 .circleci: Remove python 2 binary builds (#35475)
Python 2 is EOL soon so we're dropping support as of v1.5.0

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-26 10:50:34 -07:00
0f23d23db4 Add docs to resize_ and resize_as_ (#35392)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35392

Test Plan: Imported from OSS

Differential Revision: D20650097

Pulled By: VitalyFedyunin

fbshipit-source-id: cff4f555d355dfee42394f6070fe3e466949aeb5
2020-03-26 12:23:04 -04:00
7c24280a3f Add docs about memory format (#34818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34818

Test Plan: Imported from OSS

Differential Revision: D20601336

Pulled By: VitalyFedyunin

fbshipit-source-id: d34ad226be950bf134c6b383a4810ea6aa75599e
2020-03-26 12:23:04 -04:00
7100f0be13 ports true_divide method variant to 1.5 (#35390)
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-03-26 11:50:00 -04:00
f7f611c2ec torch.cat: disallow inputs on different devices (#35053)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35045
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35053

Differential Revision: D20545517

Pulled By: ngimel

fbshipit-source-id: eee3fc87c7e578ff44d69d5ce6f92a8f496fa97b
2020-03-26 10:58:33 -04:00
acb982d0b0 Add TORCH_CUDA_API to FilterDescriptor (#35131)
Summary:
`FilterDescriptor` is missing a `TORCH_CUDA_API`, so this symbol is not exported from `torch_cuda.so`, and users could have trouble building cpp_extension when using cudnn.

cc: ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35131

Differential Revision: D20604439

Pulled By: ezyang

fbshipit-source-id: c57414fc8a9df9cb1e910e2ec0a48cfdbe7d1779
2020-03-26 10:57:59 -04:00
aa8b7ad989 Fix thread_local initialization in C10 WarningHandler. (#34822)
Summary:
The Windows + MSVC-specific bug discussed here: https://github.com/pytorch/pytorch/issues/19394 and fixed here: https://github.com/pytorch/pytorch/issues/22405 still appears in C10's warning handler class. This results in a crash if a user attempts to run code which would print a warning when that code is running inside a thread created by a DLL. This PR applies a similar fix to that of https://github.com/pytorch/pytorch/issues/22405.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34822

Test Plan:
* Tested locally by running CodecverseWorkbench Unity app with patched build.
* CI

Differential Revision: D20627971

Pulled By: HapeMask

fbshipit-source-id: 64dfca531ed7eebbe9e0ecac3d3d4d025c683883
2020-03-25 20:02:45 -07:00
2d403ed8be Add python exception handling catch block to resolve deadlock (#35283) (#35402)
Summary:
Note: This PR has been merged into master after the 1.5.0 branch cut at
36e3c00 (see original PR: #35283). This PR is to cherry pick it into 1.5.

---- Original Commit Description Follows ---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35283

https://github.com/pytorch/pytorch/issues/34260

Deadlock on destructing py::error_already_set.

There are request callback impls in Python, where Python exceptions
could be thrown. For releasing Python exception py::objects, GIL must
be held.

Differential Revision: D7753253

fbshipit-source-id: 4bfaaaf027e4254f5e3fedaca80228c8b4282e39

Co-authored-by: Shihao Xu <shihaoxu@fb.com>
2020-03-25 17:05:18 -07:00
c25a664f77 Trying pinning pyyaml and setuptools on macos to older version (#35296) (#35400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35296

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20624843

Pulled By: ezyang

fbshipit-source-id: 9028f1dd62d0c25e916eb4927fd8dd6acbd88886
(cherry picked from commit 3f896ef7435201b2c3f51851f80dc674dfadfd40)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Edward Yang <ezyang@fb.com>
2020-03-25 16:04:06 -07:00
ab660ae394 Fix Tensor __radd__ type hint issue (#35231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35231

Fixes #35213

(Note: this ignores all push blocking failures!)

Test Plan: `mypy -c "import torch; ten = torch.tensor([1.0, 2.0, 3.0]); print(7 + ten)"` should not produce any warnings

Differential Revision: D20604924

Pulled By: pbelevich

fbshipit-source-id: 53a293a99b3f2ab6ca5516b31f3a92f67eb67a39
2020-03-25 18:37:07 -04:00
3c476a8858 PyTorch should always depend on future (#35057) (#35412)
Summary:
Because `past` is used in `caffe2.python.core`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35057

Test Plan: CI

Differential Revision: D20547042

Pulled By: malfet

fbshipit-source-id: cad2123c7b88271fea37f21e616df551075383a8
(cherry picked from commit d3f5045bf55e4a5dfb53ceccb6130e4e408cf466)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-03-25 14:54:26 -07:00
651fa88645 Load all DLLs in the lib directory for Windows (v.1.5.0) 2020-03-25 16:23:22 -04:00
565c3400b4 Update view op list. 2020-03-25 16:14:08 -04:00
3e332778b4 non blocking copy from #35144 2020-03-25 14:54:41 -04:00
f598738920 UBSAN deliberate float to int fix 2020-03-25 11:24:30 -04:00
4c6bfa0187 [1.5 cherrypick][JIT] Namespaces for TorchBind 2020-03-25 11:23:03 -04:00
6f25003682 [1.5 cherrypick][JIT] BC shim for TorchBind classes 2020-03-25 11:23:03 -04:00
752c129fa1 Update docs about DP and DDP for CUDA (#35063)
Summary:
We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063

Differential Revision: D20549621

Pulled By: ngimel

fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
2020-03-25 11:18:17 -04:00
fb59a9caca .circleci: Change default CUDA for pip, cu101 -> cu102 (#35310)
So that packages are correctly marked when looking through the html
pages.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-24 15:05:25 -07:00
4d30dbdd35 Pin XLA CI to use r1.5 release branch. 2020-03-24 17:54:31 -04:00
b7f4a1a397 .circleci: Switch master to release/1.5 for git merge (#35320)
Since we're on a release branch we'll need to fix this up to do a merge
for release/1.5 instead of master.

TODO: In the future we should have a dynamic way of gathering the base
branch for PRs.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-24 14:52:24 -07:00
afda1dc943 Revert "Fix AdaptiveAvgPool{2,3}d and AdaptiveMaxPool{2,3}d implementation"
This reverts commit e2184ba08352d730d7165455c14f783b3e54082a.
2020-03-24 14:09:18 -04:00
d506ae882b Revert "Fix Conv and ConvTranspose implementation"
This reverts commit 88778854546b08bc6dd9f68e0a64311902c7d30c.
2020-03-24 14:09:18 -04:00
36e5abe531 Revert "Fix fractional_max_pool3d_with_indices implementation"
This reverts commit b89eb7c654b846fb3391cf4cc5aeb536cc41f1d7.
2020-03-24 14:09:18 -04:00
6e6f62230e Revert "Fix F::interpolate and torch::nn::Upsample implementation"
This reverts commit 75148df1f56c91f54965b530d606a6b9a4c8e269.
2020-03-24 14:09:18 -04:00
5d15577e6c Revert "Add inplace tests for several torch::nn modules / functionals"
This reverts commit 48590d6a9b939fb8097e4f2108872721ea5a516f.
2020-03-24 14:09:18 -04:00
6aa5298c5c Revert "Renaming: MultiLabelMarginLossFuncOptions -> MultilabelMarginLossFuncOptions, MultiLabelSoftMarginLossFuncOptions -> MultilabelSoftMarginLossFuncOptions"
This reverts commit 5ca901431886d60687275b9a310eac5b5aeba02f.
2020-03-24 14:09:18 -04:00
f3df13725b Revert "[1.5 cherrypick] [C++ API Parity] Add xor_convergence test for lbfgs (#35113)"
This reverts commit 246b824644c3731b00be6119f69795afd4eac9b6.
2020-03-24 14:08:56 -04:00
4eee3caa11 [release/1.5] .circleci: Fix unbound CIRCLE_TAG variable (#35242)
Was failing when trying to execute this script on a non-tag

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-23 16:21:44 -07:00
4d96463130 Updating fbgemm 2020-03-23 13:31:24 -07:00
246b824644 [1.5 cherrypick] [C++ API Parity] Add xor_convergence test for lbfgs (#35113)
* add xor_convergence test for lbfgs

* increased batchsize to 6

* minor

* increased batch size
2020-03-23 16:00:57 -04:00
5ca9014318 Renaming: MultiLabelMarginLossFuncOptions -> MultilabelMarginLossFuncOptions, MultiLabelSoftMarginLossFuncOptions -> MultilabelSoftMarginLossFuncOptions 2020-03-23 15:55:18 -04:00
48590d6a9b Add inplace tests for several torch::nn modules / functionals
gh-metadata: pytorch pytorch 35147 gh/yf225/101/head
2020-03-23 15:55:18 -04:00
75148df1f5 Fix F::interpolate and torch::nn::Upsample implementation
gh-metadata: pytorch pytorch 35025 gh/yf225/100/head
2020-03-23 15:55:18 -04:00
b89eb7c654 Fix fractional_max_pool3d_with_indices implementation
gh-metadata: pytorch pytorch 35024 gh/yf225/99/head
2020-03-23 15:55:18 -04:00
8877885454 Fix Conv and ConvTranspose implementation
gh-metadata: pytorch pytorch 35023 gh/yf225/98/head
2020-03-23 15:55:18 -04:00
e2184ba083 Fix AdaptiveAvgPool{2,3}d and AdaptiveMaxPool{2,3}d implementation
gh-metadata: pytorch pytorch 35022 gh/yf225/97/head
2020-03-23 15:55:18 -04:00
8ef47ad2f0 Updating fbgemm 2020-03-23 10:08:52 -07:00
6725b6f503 .circleci: Refactor how to grab the tagged version
Discovered that the upload scripts do not do well when there's no
pytorch repository to actually do git operations on.

CircleCI however provides a nice environment variable with the name of
the current tag so let's just use that when it's available and fall back
on the git describe functionality if that fails.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 16:34:57 -07:00
bcd3f6da1a .circleci: Remove quotes from --git-dir
git doesn't handle the escapes correctly, so let's just leave them out
altogether.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:39:31 -07:00
0b3d2f7b7d .circleci: Make sure to add .git to --git-dir
--git-dir only works when it points directly to a .git folder

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:28:23 -07:00
f522651a7e .circleci: Switch git -C -> git --git-dir
Older versions of git do not contain the '-C' flag so let's switch to a
flag that is pre-historic and will run on any version of RHEL that is
still supported in the modern era.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:22:44 -07:00
01c8ef2757 .circleci: One more -C to add to get correct git info
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 15:08:02 -07:00
7cfe68ce3a .circleci: Hardcode directory to /pytorch to ensure git
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 14:54:57 -07:00
6f3120c6b9 .circleci: Ensure describe happens in pytorch repo
Found an issue where the git describe wasn't properly executed since the
binary_populate_env.sh script was being executed from a different
directory.

'git -C' forces the describe to run in the running directory for the
script which should contain the correct git information

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-03-19 14:24:18 -07:00
363 changed files with 6151 additions and 4693 deletions

View File

@ -466,7 +466,7 @@ But if you want to try, then I'd recommend
# Always install miniconda 3, even if building for Python <3
new_conda="~/my_new_conda"
conda_sh="$new_conda/install_miniconda.sh"
curl -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
curl -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"
rm -f "$conda_sh"

View File

@ -34,8 +34,6 @@ def get_processor_arch_name(cuda_version):
LINUX_PACKAGE_VARIANTS = OrderedDict(
manywheel=[
"2.7m",
"2.7mu",
"3.5m",
"3.6m",
"3.7m",
@ -43,7 +41,7 @@ LINUX_PACKAGE_VARIANTS = OrderedDict(
],
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"2.7m",
"3.7m",
],
)
@ -53,7 +51,7 @@ CONFIG_TREE_DATA = OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"2.7",
"3.7",
],
)),
)

View File

@ -67,9 +67,14 @@ class Conf(object):
job_def["requires"].append("update_s3_htmls_for_nightlies_devtoolset7")
job_def["filters"] = {"branches": {"only": "postnightly"}}
else:
filter_branches = ["nightly"]
# we only want to add the release branch filter if we aren't
# uploading
if phase not in ["upload"]:
filter_branches.append(r"/release\/.*/")
job_def["filters"] = {
"branches": {
"only": "nightly"
"only": filter_branches
},
# Will run on tags like v1.5.0-rc1, etc.
"tags": {

View File

@ -4,7 +4,6 @@ from cimodel.lib.conf_tree import Ver
CONFIG_TREE_DATA = [
(Ver("ubuntu", "16.04"), [
([Ver("gcc", "5")], [XImportant("onnx_py2")]),
([Ver("clang", "7")], [XImportant("onnx_main_py3.6"),
XImportant("onnx_ort1_py3.6"),
XImportant("onnx_ort2_py3.6")]),

View File

@ -33,8 +33,7 @@ class Conf:
# TODO: Eventually we can probably just remove the cudnn7 everywhere.
def get_cudnn_insertion(self):
omit = self.language == "onnx_py2" \
or self.language == "onnx_main_py3.6" \
omit = self.language == "onnx_main_py3.6" \
or self.language == "onnx_ort1_py3.6" \
or self.language == "onnx_ort2_py3.6" \
or set(self.compiler_names).intersection({"android", "mkl", "clang"}) \
@ -71,11 +70,10 @@ class Conf:
def gen_docker_image(self):
lang_substitutions = {
"onnx_py2": "py2",
"onnx_main_py3.6": "py3.6",
"onnx_ort1_py3.6": "py3.6",
"onnx_ort2_py3.6": "py3.6",
"cmake": "py2",
"cmake": "py3",
}
lang = miniutils.override(self.language, lang_substitutions)
@ -85,7 +83,7 @@ class Conf:
def gen_workflow_params(self, phase):
parameters = OrderedDict()
lang_substitutions = {
"onnx_py2": "onnx-py2",
"onnx_py3": "onnx-py3",
"onnx_main_py3.6": "onnx-main-py3.6",
"onnx_ort1_py3.6": "onnx-ort1-py3.6",
"onnx_ort2_py3.6": "onnx-ort2-py3.6",
@ -129,7 +127,7 @@ class Conf:
job_name = "caffe2_" + self.get_platform() + "_build"
if not self.is_important:
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/", r"/release\/.*/"]}}
job_def.update(self.gen_workflow_params(phase))
return {job_name : job_def}

View File

@ -8,7 +8,6 @@ CUDA_VERSIONS = [
]
STANDARD_PYTHON_VERSIONS = [
"2.7",
"3.5",
"3.6",
"3.7",

View File

@ -114,7 +114,7 @@ class Conf:
if not self.is_important:
# If you update this, update
# caffe2_build_definitions.py too
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/"]}}
job_def["filters"] = {"branches": {"only": ["master", r"/ci-all\/.*/", r"/release\/.*/"]}}
job_def.update(self.gen_workflow_params(phase))
return {job_name : job_def}
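
The hunks above add `/release\/.*/` next to `master` and `/ci-all\/.*/` in the branch filters. A minimal Python sketch of how such a list could be evaluated follows. It assumes that an entry wrapped in slashes is a regular expression that must match the whole branch name and that any other entry is a literal name. `branch_matches` is a hypothetical helper for illustration, not part of `cimodel`.

```python
import re

# Filter list as it appears in the updated build definitions.
FILTERS = ["master", r"/ci-all\/.*/", r"/release\/.*/"]


def branch_matches(branch, filters=FILTERS):
    """Return True if `branch` is selected by any filter entry.

    Assumption: "/.../" entries are full-match regexes, everything else
    is compared literally.
    """
    for entry in filters:
        if len(entry) >= 2 and entry.startswith("/") and entry.endswith("/"):
            if re.fullmatch(entry[1:-1], branch):
                return True
        elif entry == branch:
            return True
    return False
```

Under these assumptions `release/1.5` and `ci-all/foo` are selected, while `nightly` and `my-release/1.5` are not.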

File diff suppressed because it is too large

View File

@ -4,7 +4,7 @@ set -ex
# Optionally install conda
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://repo.continuum.io/miniconda"
BASE_URL="https://repo.anaconda.com/miniconda"
MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1)

View File

@ -31,9 +31,9 @@ fi
conda_sh="$workdir/install_miniconda.sh"
if [[ "$(uname)" == Darwin ]]; then
curl --retry 3 -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
curl --retry 3 -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
else
curl --retry 3 -o "$conda_sh" https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
curl --retry 3 -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
fi
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"

View File

@ -2,6 +2,19 @@
set -eux -o pipefail
export TZ=UTC
tagged_version() {
# Grabs version from either the env variable CIRCLE_TAG
# or the pytorch git described version
GIT_DESCRIBE="git --git-dir ${workdir}/pytorch/.git describe"
if [[ -n "${CIRCLE_TAG:-}" ]]; then
echo "${CIRCLE_TAG}"
elif ${GIT_DESCRIBE} --exact --tags >/dev/null; then
${GIT_DESCRIBE} --tags
else
return 1
fi
}
# We need to write an envfile to persist these variables to following
# steps, but the location of the envfile depends on the circleci executor
if [[ "$(uname)" == Darwin ]]; then
@ -47,15 +60,17 @@ export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="1.5.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
if git describe --tags --exact >/dev/null 2>/dev/null; then
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
if tagged_version >/dev/null; then
# Switch upload folder to 'test/' if we are on a tag
PIP_UPLOAD_FOLDER='test/'
# Grab git tag, remove prefixed v and remove everything after -
# Used to clean up tags that are for release candidates like v1.5.0-rc1
# Turns tag v1.5.0-rc1 -> 1.5.0
BASE_BUILD_VERSION="$(git describe --tags | sed -e 's/^v//' -e 's/-.*$//')"
BASE_BUILD_VERSION="$(tagged_version | sed -e 's/^v//' -e 's/-.*$//')"
fi
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu101" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu102" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}"
else
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"
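
The version logic in this hunk can be restated as a small Python sketch: prefer `CIRCLE_TAG`, fall back to `git describe` against the checked-out repository, strip the leading `v` and any `-rc…` suffix, and append the CUDA suffix only for non-default builds. The function names and the `env`, `date` and `system` parameters are assumptions made for illustration; the shell script above remains the reference.

```python
import os
import re
import subprocess


def tagged_version(workdir, env=None):
    """Return the tag from CIRCLE_TAG or `git describe`, or None if untagged."""
    env = os.environ if env is None else env
    tag = env.get("CIRCLE_TAG", "")
    if tag:
        return tag
    describe = ["git", "--git-dir", os.path.join(workdir, "pytorch", ".git"), "describe"]
    try:
        # Succeeds only when HEAD sits exactly on a tag.
        subprocess.run(describe + ["--exact", "--tags"], check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        out = subprocess.run(describe + ["--tags"], check=True,
                             capture_output=True, text=True)
    except (subprocess.CalledProcessError, OSError):
        return None
    return out.stdout.strip()


def base_build_version(tag, date):
    """Map a tag such as v1.5.0-rc1 to 1.5.0; untagged builds get a dev version."""
    if tag is None:
        return "1.5.0.dev" + date
    return re.sub(r"-.*$", "", re.sub(r"^v", "", tag))


def pytorch_build_version(base, desired_cuda, package_type, system):
    """Default builds carry no suffix; the rest get +<cuda>, as in the hunk."""
    if system == "Darwin" or desired_cuda == "cu102" or package_type == "conda":
        return base
    return base + "+" + desired_cuda
```

For example, `base_build_version("v1.5.0-rc1", "20200319")` yields `1.5.0`, and a Linux manywheel build for `cu101` would then be versioned `1.5.0+cu101`.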

View File

@ -72,10 +72,10 @@ time python tools/setup_helpers/generate_code.py \
# Build the docs
pushd docs/cpp
pip install breathe>=4.13.0 bs4 lxml six
pip install breathe==4.13.0 bs4 lxml six
pip install --no-cache-dir -e "git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme"
pip install exhale>=0.2.1
pip install sphinx>=2.0
pip install sphinx==2.4.4
# Uncomment once it is fixed
# pip install -r requirements.txt
time make VERBOSE=1 html -j

View File

@ -151,7 +151,7 @@
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl --retry 3 -o ${TMPDIR}/conda.sh https://repo.continuum.io/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
curl --retry 3 -o ${TMPDIR}/conda.sh https://repo.anaconda.com/miniconda/Miniconda${ANACONDA_VERSION}-latest-MacOSX-x86_64.sh
chmod +x ${TMPDIR}/conda.sh
/bin/bash ${TMPDIR}/conda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/conda.sh

View File

@ -20,16 +20,16 @@ jobs:
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
# TODO We may want to move the rebase logic to a separate step after checkout
# Rebase to master only if in xenial_py3_6_gcc5_4 case
if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
# Rebase to release/1.5 only if in xenial_py3_6_gcc5_4 case
if [[ "${CIRCLE_BRANCH}" != "release/1.5" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
echo "Merge release/1.5 branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
set -x
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
git config --add remote.origin.fetch +refs/heads/release/1.5:refs/remotes/origin/release/1.5
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/release/1.5:refs/remotes/origin/release/1.5 --depth=100 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/release/1.5`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
export GIT_COMMIT=${CIRCLE_SHA1}
echo "GIT_COMMIT: " ${GIT_COMMIT}
@ -38,7 +38,7 @@ jobs:
git merge --allow-unrelated-histories --no-edit --no-ff ${GIT_MERGE_TARGET}
set +x
else
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
echo "Do NOT merge release/1.5 branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
fi
git submodule sync && git submodule update -q --init --recursive

View File

@ -15,6 +15,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.11_py36_cuda10.1_test1
test_name: pytorch-windows-test1
@ -32,6 +33,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.11_py36_cuda10.1_test2
test_name: pytorch-windows-test2
@ -49,6 +51,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_build:
name: pytorch_windows_vs2017_14.16_py36_cuda10.1_build
cuda_version: "10"
@ -64,6 +67,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.16_py36_cuda10.1_test1
test_name: pytorch-windows-test1
@ -81,6 +85,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
name: pytorch_windows_vs2017_14.16_py36_cuda10.1_test2
test_name: pytorch-windows-test2
@ -98,6 +103,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_build:
name: pytorch_windows_vs2019_py36_cuda10.1_build
cuda_version: "10"

View File

@ -7,12 +7,6 @@
# pytorch-ci-hud to adjust the list of whitelisted builds
# at https://github.com/ezyang/pytorch-ci-hud/blob/master/src/BuildHistoryDisplay.js
- binary_linux_build:
name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_build
build_environment: "manywheel 2.7mu cpu devtoolset7"
requires:
- setup
docker_image: "pytorch/manylinux-cuda102"
- binary_linux_build:
name: binary_linux_manywheel_3_7m_cu102_devtoolset7_build
build_environment: "manywheel 3.7m cu102 devtoolset7"
@ -23,24 +17,21 @@
branches:
only:
- master
- binary_linux_build:
name: binary_linux_conda_2_7_cpu_devtoolset7_build
build_environment: "conda 2.7 cpu devtoolset7"
requires:
- setup
docker_image: "pytorch/conda-cuda"
- /ci-all\/.*/
- /release\/.*/
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_build
# TODO rename to remove python version for libtorch
- binary_linux_build:
name: binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build
build_environment: "libtorch 2.7m cpu devtoolset7"
name: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build
build_environment: "libtorch 3.7m cpu devtoolset7"
requires:
- setup
libtorch_variant: "shared-with-deps"
docker_image: "pytorch/manylinux-cuda102"
- binary_linux_build:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
name: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
build_environment: "libtorch 3.7m cpu gcc5.4_cxx11-abi"
requires:
- setup
libtorch_variant: "shared-with-deps"
@ -48,45 +39,30 @@
# TODO we should test a libtorch cuda build, but they take too long
# - binary_linux_libtorch_2_7m_cu90_devtoolset7_static-without-deps_build
- binary_mac_build:
name: binary_macos_wheel_3_6_cpu_build
build_environment: "wheel 3.6 cpu"
requires:
- setup
filters:
branches:
only:
- master
- binary_mac_build:
name: binary_macos_conda_2_7_cpu_build
build_environment: "conda 2.7 cpu"
name: binary_macos_wheel_3_7_cpu_build
build_environment: "wheel 3.7 cpu"
requires:
- setup
filters:
branches:
only:
- master
- /ci-all\/.*/
- /release\/.*/
# This job has an average run time of 3 hours o.O
# Now only running this on master to reduce overhead
# TODO rename to remove python version for libtorch
- binary_mac_build:
name: binary_macos_libtorch_2_7_cpu_build
build_environment: "libtorch 2.7 cpu"
name: binary_macos_libtorch_3_7_cpu_build
build_environment: "libtorch 3.7 cpu"
requires:
- setup
filters:
branches:
only:
- master
- binary_linux_test:
name: binary_linux_manywheel_2_7mu_cpu_devtoolset7_test
build_environment: "manywheel 2.7mu cpu devtoolset7"
requires:
- setup
- binary_linux_manywheel_2_7mu_cpu_devtoolset7_build
docker_image: "pytorch/manylinux-cuda102"
filters:
branches:
only:
- master
- /ci-all\/.*/
- /release\/.*/
- binary_linux_test:
name: binary_linux_manywheel_3_7m_cu102_devtoolset7_test
build_environment: "manywheel 3.7m cu102 devtoolset7"
@ -100,29 +76,25 @@
branches:
only:
- master
- binary_linux_test:
name: binary_linux_conda_2_7_cpu_devtoolset7_test
build_environment: "conda 2.7 cpu devtoolset7"
requires:
- setup
- binary_linux_conda_2_7_cpu_devtoolset7_build
docker_image: "pytorch/conda-cuda"
- /ci-all\/.*/
- /release\/.*/
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_test:
# TODO rename to remove python version for libtorch
- binary_linux_test:
name: binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_test
build_environment: "libtorch 2.7m cpu devtoolset7"
name: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_test
build_environment: "libtorch 3.7m cpu devtoolset7"
requires:
- setup
- binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build
- binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "pytorch/manylinux-cuda102"
- binary_linux_test:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
name: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test
build_environment: "libtorch 3.7m cpu gcc5.4_cxx11-abi"
requires:
- setup
- binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
- binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest"

View File

@ -20,21 +20,12 @@
- docker_build_job:
name: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
image_name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9-cudnn7-py3"
image_name: "pytorch-linux-xenial-cuda9-cudnn7-py3"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-py2.7.9"
image_name: "pytorch-linux-xenial-py2.7.9"
- docker_build_job:
name: "pytorch-linux-xenial-py2.7"
image_name: "pytorch-linux-xenial-py2.7"
- docker_build_job:
name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
image_name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c"

View File

@ -4,6 +4,8 @@
branches:
only:
- master
- /ci-all\/.*/
- /release\/.*/
requires:
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
@ -13,6 +15,8 @@
branches:
only:
- master
- /ci-all\/.*/
- /release\/.*/
requires:
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build
- pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build

View File

@ -31,6 +31,7 @@
only:
- master
- /ci-all\/.*/
- /release\/.*/
build_environment: "pytorch-linux-xenial-py3-clang5-mobile-code-analysis"
build_only: "1"
# Use LLVM-DEV toolchain in android-ndk-r19c docker image

View File

@ -81,44 +81,6 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
flake8-py2:
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 2.x
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@v1
- name: Checkout PR tip
run: |
set -eux
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# We are on a PR, so actions/checkout leaves us on a merge commit.
# Check out the actual tip of the branch.
git checkout ${{ github.event.pull_request.head.sha }}
fi
echo ::set-output name=commit_sha::$(git rev-parse HEAD)
id: get_pr_tip
- name: Run flake8
run: |
set -eux
pip install flake8
rm -rf .circleci tools/clang_format_new.py
flake8 --exit-zero > ${GITHUB_WORKSPACE}/flake8-output.txt
cat ${GITHUB_WORKSPACE}/flake8-output.txt
- name: Add annotations
uses: pytorch/add-annotations-github-action@master
with:
check_name: 'flake8-py2'
linter_output_path: 'flake8-output.txt'
commit_sha: ${{ steps.get_pr_tip.outputs.commit_sha }}
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w\d+) (?<errorDesc>.*)'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
clang-tidy:
if: github.event_name == 'pull_request'
runs-on: ubuntu-latest
@ -198,6 +160,8 @@ jobs:
-g"-torch/csrc/jit/export.cpp" \
-g"-torch/csrc/jit/import.cpp" \
-g"-torch/csrc/jit/netdef_converter.cpp" \
-g"-torch/csrc/cuda/nccl.*" \
-g"-torch/csrc/cuda/python_nccl.cpp" \
"$@" > ${GITHUB_WORKSPACE}/clang-tidy-output.txt
cat ${GITHUB_WORKSPACE}/clang-tidy-output.txt

View File

@ -167,7 +167,7 @@ fi
# Patch required to build xla
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
git clone --recursive https://github.com/pytorch/xla.git
git clone --recursive -b r1.5 https://github.com/pytorch/xla.git
./xla/scripts/apply_patches.sh
fi

View File

@ -13,12 +13,12 @@ mkdir -p ${WORKSPACE_DIR}
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then
mkdir -p ${WORKSPACE_DIR}
curl --retry 3 https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
curl --retry 3 https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
retry bash ${WORKSPACE_DIR}/miniconda3.sh -b -p ${WORKSPACE_DIR}/miniconda3
fi
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
source ${WORKSPACE_DIR}/miniconda3/bin/activate
retry conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
retry conda install -y mkl mkl-include numpy pyyaml=5.3 setuptools=46.0.0 cmake cffi ninja
# The torch.hub tests make requests to GitHub.
#

View File

@ -20,7 +20,7 @@ if [ -n "${IN_CIRCLECI}" ]; then
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-cudnn7-py3* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev

View File

@ -21,7 +21,7 @@ if [ -n "${IN_CIRCLECI}" ]; then
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda10.1-cudnn7-py3* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
@ -244,7 +244,7 @@ test_backward_compatibility() {
pushd test/backward_compatibility
python dump_all_function_schemas.py --filename new_schemas.txt
pip_uninstall torch
pip_install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip_install torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
python check_backward_compatibility.py --new-schemas new_schemas.txt
popd
set +x

View File

@ -5,7 +5,7 @@ if "%BUILD_ENVIRONMENT%"=="" (
)
if "%REBUILD%"=="" (
IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
curl --retry 3 -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
curl --retry 3 -k https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
%TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3

View File

@ -13,7 +13,7 @@ if "%BUILD_ENVIRONMENT%"=="" (
)
if NOT "%BUILD_ENVIRONMENT%"=="" (
IF EXIST %CONDA_PARENT_DIR%\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\Miniconda3 )
curl --retry 3 https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
curl --retry 3 https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
%TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
)
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3

View File

@ -160,20 +160,18 @@ ENDIF(BLAS_FOUND)
IF(LAPACK_FOUND)
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
if(USE_CUDA)
if(USE_CUDA AND MSVC)
# Although Lapack provides CPU (and thus, one might expect that ATen_cuda
# would not need this at all), some of our libraries (magma in particular)
# backend to CPU BLAS/LAPACK implementations, and so it is very important
# we get the *right* implementation, because even if the symbols are the
# same, LAPACK implementations may have different calling conventions.
# This caused https://github.com/pytorch/pytorch/issues/7353
#
# We do NOT do this on Linux, since we just rely on torch_cpu to
# provide all of the symbols we need
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
endif()
if(USE_ROCM)
# It's not altogether clear that HIP behaves the same way, but it
# seems safer to assume that it needs it too
list(APPEND ATen_HIP_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
endif()
ENDIF(LAPACK_FOUND)
IF (UNIX AND NOT APPLE)
@ -331,8 +329,12 @@ IF(USE_CUDA AND NOT USE_ROCM)
IF(USE_MAGMA)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${MAGMA_LIBRARIES})
IF ($ENV{TH_BINARY_BUILD})
IF (MSVC)
# Do not do this on Linux: see Note [Extra MKL symbols for MAGMA in torch_cpu]
# in caffe2/CMakeLists.txt
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
ENDIF(MSVC)
ENDIF($ENV{TH_BINARY_BUILD})
ENDIF(USE_MAGMA)
IF ($ENV{ATEN_STATIC_CUDA})

View File

@ -125,13 +125,15 @@ void _parallel_run(
std::tie(num_tasks, chunk_size) =
internal::calc_num_tasks_and_chunk_size(begin, end, grain_size);
struct {
std::atomic_flag err_flag = ATOMIC_FLAG_INIT;
std::exception_ptr eptr;
std::vector<std::shared_ptr<c10::ivalue::Future>> futures(num_tasks);
for (size_t task_id = 0; task_id < num_tasks; ++task_id) {
futures[task_id] = std::make_shared<c10::ivalue::Future>(c10::NoneType::get());
}
auto task = [f, &eptr, &err_flag, &futures, begin, end, chunk_size]
std::mutex mutex;
volatile size_t remaining;
std::condition_variable cv;
} state;
auto task = [f, &state, begin, end, chunk_size]
(int /* unused */, size_t task_id) {
int64_t local_start = begin + task_id * chunk_size;
if (local_start < end) {
@ -140,21 +142,30 @@ void _parallel_run(
ParallelRegionGuard guard(task_id);
f(local_start, local_end, task_id);
} catch (...) {
if (!err_flag.test_and_set()) {
eptr = std::current_exception();
if (!state.err_flag.test_and_set()) {
state.eptr = std::current_exception();
}
}
}
futures[task_id]->markCompleted();
{
std::unique_lock<std::mutex> lk(state.mutex);
if (--state.remaining == 0) {
state.cv.notify_one();
}
}
};
state.remaining = num_tasks;
_run_with_pool(task, num_tasks);
// Wait for all tasks to finish.
for (size_t task_id = 0; task_id < num_tasks; ++task_id) {
futures[task_id]->wait();
{
std::unique_lock<std::mutex> lk(state.mutex);
if (state.remaining != 0) {
state.cv.wait(lk);
}
if (eptr) {
std::rethrow_exception(eptr);
}
if (state.eptr) {
std::rethrow_exception(state.eptr);
}
}
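
The pattern this hunk switches to, one shared counter guarded by a mutex and a condition variable instead of one future per task, can be sketched in Python with `threading.Condition`. This is an illustration of the technique only, not the ATen code: `parallel_run` takes `chunk_size` directly and starts plain threads rather than using a pool, and it waits in a `while` loop (the usual guard against spurious wakeups) where the hunk uses a single `if`.

```python
import threading


def parallel_run(f, begin, end, chunk_size):
    """Run f(start, stop, task_id) over [begin, end) in chunks and wait for all."""
    if end <= begin:
        return
    num_tasks = -(-(end - begin) // chunk_size)  # ceiling division
    cond = threading.Condition()
    state = {"remaining": num_tasks, "error": None}

    def task(task_id):
        local_start = begin + task_id * chunk_size
        local_end = min(end, local_start + chunk_size)
        try:
            f(local_start, local_end, task_id)
        except BaseException as exc:
            with cond:
                if state["error"] is None:  # keep only the first error
                    state["error"] = exc
        finally:
            # Decrement and notify under the same lock, so the waiter cannot
            # miss the notification between its check and its wait.
            with cond:
                state["remaining"] -= 1
                if state["remaining"] == 0:
                    cond.notify()

    for task_id in range(num_tasks):
        threading.Thread(target=task, args=(task_id,)).start()

    with cond:
        while state["remaining"] != 0:
            cond.wait()
    if state["error"] is not None:
        raise state["error"]
```

Doing the decrement and the notification under one lock is the point made in the commit message: with a bare atomic counter, the waiter could check the counter, then be notified, and only then start waiting.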

View File

@ -16,14 +16,6 @@
#include <numeric>
#include <memory>
#if defined(__clang__)
#define __ubsan_ignore_float_divide_by_zero__ __attribute__((no_sanitize("float-divide-by-zero")))
#define __ubsan_ignore_vptr__ __attribute__((no_sanitize("vptr")))
#else
#define __ubsan_ignore_float_divide_by_zero__
#define __ubsan_ignore_vptr__
#endif
#define AT_DISALLOW_COPY_AND_ASSIGN(TypeName) \
TypeName(const TypeName&) = delete; \
void operator=(const TypeName&) = delete

View File

@ -20,6 +20,10 @@ void registerCustomClass(at::ClassTypePtr class_type) {
}
at::ClassTypePtr getCustomClass(const std::string& name) {
// BC hack so we can upgrade a binary internally
if (name == "__torch__.torch.classes.SentencePiece") {
return getCustomClass("__torch__.torch.classes.fb.SentencePiece");
}
return customClasses().count(name) ? customClasses()[name] : nullptr;
}

View File

@ -15,6 +15,7 @@
#include <c10/util/math_compat.h>
#include <ATen/native/cpu/zmath.h>
#include <c10/util/TypeCast.h>
#include <c10/macros/Macros.h>
#if defined(__GNUC__)
#define __at_align32__ __attribute__((aligned(32)))

View File

@ -145,7 +145,7 @@ private:
std::ostream& operator<<(std::ostream & out, const TensorDescriptor& d);
class FilterDescriptor
class TORCH_CUDA_API FilterDescriptor
: public Descriptor<cudnnFilterStruct,
&cudnnCreateFilterDescriptor,
&cudnnDestroyFilterDescriptor>

View File

@ -138,6 +138,10 @@ Tensor true_divide(const Tensor& self, const Tensor& divisor) {
return iter.output();
}
Tensor& true_divide_(Tensor& self, const Tensor& divisor) {
return native::true_divide_out(self, self, divisor);
}
Tensor& floor_divide_out(Tensor& result, const Tensor& self, const Tensor& other) {
auto iter = TensorIterator::binary_op(result, self, other,
/*check_mem_overlap=*/true);
@ -731,7 +735,11 @@ Tensor& fmod_(Tensor& self, Scalar other) {
}
Tensor true_divide(const Tensor& self, Scalar divisor) {
return at::true_divide(self, wrapped_scalar_tensor(divisor)); // redispatch!
return self.true_divide(wrapped_scalar_tensor(divisor)); // redispatch!
}
Tensor& true_divide_(Tensor& self, Scalar divisor) {
return self.true_divide_(wrapped_scalar_tensor(divisor)); // redispatch!
}
}

View File

@ -70,8 +70,8 @@ struct CAFFE2_API DispatchStub<rT (*)(Args...), T> {
// they will still compute the same value for cpu_dispatch_ptr.
if (!cpu_dispatch_ptr.load(std::memory_order_relaxed)) {
FnPtr tmp_cpu_dispatch_ptr = nullptr;
cpu_dispatch_ptr.compare_exchange_weak(
tmp_cpu_dispatch_ptr, choose_cpu_impl(), std::memory_order_relaxed);
while(!cpu_dispatch_ptr.compare_exchange_weak(
tmp_cpu_dispatch_ptr, choose_cpu_impl(), std::memory_order_relaxed));
}
return (*cpu_dispatch_ptr)(std::forward<ArgTypes>(args)...);
} else if (device_type == DeviceType::CUDA) {
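
The hunk above wraps `compare_exchange_weak` in a retry loop. A minimal standalone sketch of that pattern follows, assuming a selector that returns the same pointer on every call; `add_one`, `choose_impl`, `dispatch_ptr` and `get_dispatch_ptr` are hypothetical names for illustration and are not part of the patch.

```cpp
#include <atomic>
#include <cassert>

using FnPtr = int (*)(int);

// Hypothetical kernel standing in for whatever the selector picks.
static int add_one(int x) { return x + 1; }

// Hypothetical selector; assumed to return the same pointer on every call.
static FnPtr choose_impl() { return &add_one; }

static std::atomic<FnPtr> dispatch_ptr{nullptr};

FnPtr get_dispatch_ptr() {
  if (!dispatch_ptr.load(std::memory_order_relaxed)) {
    FnPtr expected = nullptr;
    // compare_exchange_weak is allowed to fail spuriously even while
    // dispatch_ptr still holds nullptr, so retry until it reports success.
    // On a genuine failure `expected` is updated to the current value and
    // the next iteration stores the same pointer again, which is harmless.
    while (!dispatch_ptr.compare_exchange_weak(
        expected, choose_impl(), std::memory_order_relaxed)) {
    }
  }
  FnPtr fn = dispatch_ptr.load(std::memory_order_relaxed);
  assert(fn != nullptr);
  return fn;
}
```

Without the loop, a spurious failure on the first call could leave the pointer null when it is dereferenced; `compare_exchange_strong` would be another way to rule that out.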

View File

@ -533,7 +533,7 @@ Tensor frobenius_norm(const Tensor& self, IntArrayRef dim, bool keepdim) {
return at::norm(self, 2, dim, keepdim, self.scalar_type());
}
if (self.is_complex()){
return at::sqrt(at::sum((self.conj() * self).real(), dim, keepdim));
return at::sqrt(at::sum(at::real(self.conj() * self), dim, keepdim));
} else {
return at::sqrt(at::sum((self * self), dim, keepdim));
}
@ -553,7 +553,7 @@ Tensor &frobenius_norm_out(
return at::norm_out(result, self, 2, dim, keepdim, self.scalar_type());
}
if (self.is_complex()){
return at::sqrt_out(result, at::sum((self.conj() * self).real(), dim, keepdim));
return at::sqrt_out(result, at::sum(at::real(self.conj() * self), dim, keepdim));
} else {
return at::sqrt_out(result, at::sum((self * self), dim, keepdim));
}
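
Both spellings in the hunk above compute the same quantity: the real part of `conj(z) * z` is `|z|^2`, so the complex Frobenius norm is the square root of the sum of those terms. A small host-side sketch with `std::complex` instead of ATen tensors; the free function `frobenius_norm` below is illustrative only and is not the ATen implementation.

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Frobenius norm of a flat list of complex values:
// sqrt(sum_i real(conj(z_i) * z_i)) == sqrt(sum_i |z_i|^2).
double frobenius_norm(const std::vector<std::complex<double>>& values) {
  double sum = 0.0;
  for (const auto& z : values) {
    sum += std::real(std::conj(z) * z);
  }
  return std::sqrt(sum);
}
```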

View File

@ -799,7 +799,7 @@ static Tensor &std_var_out(Tensor &result, const Tensor &self, IntArrayRef dim,
if (at::isComplexType(self.scalar_type())){
ScalarType dtype = c10::toValueType(get_dtype(result, self, {}, true));
Tensor real_in = self.real().to(dtype);
Tensor real_in = at::real(self).to(dtype);
Tensor real_out = at::empty({0}, self.options().dtype(dtype));
auto iter = make_reduction("std or var", real_out, real_in, dim, keepdim, dtype);
if (iter.numel() == 0) {
@ -807,7 +807,7 @@ static Tensor &std_var_out(Tensor &result, const Tensor &self, IntArrayRef dim,
} else {
std_var_stub(iter.device_type(), iter, unbiased, false);
}
Tensor imag_in = self.imag().to(dtype);
Tensor imag_in = at::imag(self).to(dtype);
Tensor imag_out = at::empty({0}, self.options().dtype(dtype));
iter = make_reduction("std or var", imag_out, imag_in, dim, keepdim, dtype);
if (iter.numel() == 0) {
@ -845,7 +845,7 @@ static std::tuple<Tensor&,Tensor&> std_var_mean_out(const char* fname, Tensor &r
".");
if (at::isComplexType(self.scalar_type())){
ScalarType dtype = c10::toValueType(get_dtype(result1, self, {}, true));
Tensor real_in = self.real().to(dtype);
Tensor real_in = at::real(self).to(dtype);
Tensor real_out_var = at::empty({0}, self.options().dtype(dtype));
Tensor real_out_mean = at::empty({0}, self.options().dtype(dtype));
auto iter = make_reduction(fname, real_out_var, real_out_mean, real_in, dim, keepdim, dtype);
@ -855,7 +855,7 @@ static std::tuple<Tensor&,Tensor&> std_var_mean_out(const char* fname, Tensor &r
} else {
std_var_stub(iter.device_type(), iter, unbiased, false);
}
Tensor imag_in = self.imag().to(dtype);
Tensor imag_in = at::imag(self).to(dtype);
Tensor imag_out_var = at::empty({0}, self.options().dtype(dtype));
Tensor imag_out_mean = at::empty({0}, self.options().dtype(dtype));
iter = make_reduction(fname, imag_out_var, imag_out_mean, imag_in, dim, keepdim, dtype);

View File

@ -33,7 +33,7 @@ static inline Tensor to_impl(const Tensor& self, const TensorOptions& options, b
if (self.is_non_overlapping_and_dense()) {
// Copy all strides
auto r = at::empty_strided(self.sizes(), self.strides(), options.memory_format(c10::nullopt));
r.copy_(self);
r.copy_(self, non_blocking);
return r;
} else {
memory_format = self.suggest_memory_format();

View File

@ -99,7 +99,7 @@ Tensor _dim_arange(const Tensor& like, int64_t dim) {
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ empty ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tensor empty_cpu(IntArrayRef size, const TensorOptions& options_, c10::optional<c10::MemoryFormat> optional_memory_format) {
TORCH_CHECK(!isComplexType(at::typeMetaToScalarType(options_.dtype())), "Complex dtype not supported.");
TORCH_CHECK(
!(options_.has_memory_format() && optional_memory_format.has_value()),
"Cannot set memory_format both in TensorOptions and explicit argument; please delete "

View File

@ -98,6 +98,15 @@ Tensor & _cat_out_cpu(Tensor& result, TensorList tensors, int64_t dim) {
"output memory locations. Found overlap in input tensor ", i);
}
// Dtypes should be the same
const auto first_in_cat = tensors[0];
for (int64_t i = 1; i < tensors.size(); i++) {
TORCH_CHECK(first_in_cat.dtype() == tensors[i].dtype(),
"Expected object of scalar type ", first_in_cat.dtype(),
" but got scalar type ", tensors[i].dtype(),
" for sequence element ", i, ".");
}
auto should_skip = [](const Tensor& t) { return t.numel() == 0 && t.dim() == 1; };
for (auto const &tensor : tensors) {
if (should_skip(tensor)) {

View File

@ -73,11 +73,17 @@ Tensor& abs_(Tensor& self) { return unary_op_impl_(self, at::abs_out); }
Tensor& angle_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, angle_stub); }
Tensor angle(const Tensor& self) { return unary_op_impl(self, at::angle_out); }
Tensor& real_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, real_stub); }
Tensor real(const Tensor& self) { return unary_op_impl(self, at::real_out); }
Tensor real(const Tensor& self) {
TORCH_CHECK(!self.is_complex(), "real is not yet implemented for complex tensors.");
return self;
}
Tensor& imag_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, imag_stub); }
Tensor imag(const Tensor& self) { return unary_op_impl(self, at::imag_out); }
Tensor imag(const Tensor& self) {
TORCH_CHECK(false, "imag is not yet implemented.");
// Note: unreachable
return at::zeros_like(self);
}
Tensor& conj_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, conj_stub); }
Tensor conj(const Tensor& self) { return unary_op_impl(self, at::conj_out); }

View File

@ -7,6 +7,7 @@
#include <ATen/native/TensorIterator.h>
#include <ATen/native/BinaryOps.h>
#include <ATen/native/cpu/Loops.h>
#include <c10/macros/Macros.h>
namespace at { namespace native {
namespace {

View File

@ -4,7 +4,7 @@
#include <ATen/native/cuda/zmath.cuh>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/BinaryOps.h>
#include <c10/macros/Macros.h>
// NOTE: CUDA on Windows requires that the enclosing function
// of a __device__ lambda not have internal linkage.

View File

@ -358,7 +358,7 @@ void max_pool2d_with_indices_out_cuda_template(
Tensor input = input_.contiguous(memory_format);
const int64_t in_stride_n = input.stride(-4);
const int64_t in_stride_n = input_.ndimension() == 4 ? input.stride(-4) : 0;
const int64_t in_stride_c = input.stride(-3);
const int64_t in_stride_h = input.stride(-2);
const int64_t in_stride_w = input.stride(-1);
@ -506,7 +506,7 @@ void max_pool2d_with_indices_backward_out_cuda_template(
const int64_t inputHeight = input.size(-2);
const int64_t inputWidth = input.size(-1);
const int64_t in_stride_n = input.stride(-4);
const int64_t in_stride_n = input.ndimension() == 4 ? input.stride(-4) : 0;
const int64_t in_stride_c = input.stride(-3);
const int64_t in_stride_h = input.stride(-2);
const int64_t in_stride_w = input.stride(-1);

View File

@ -198,7 +198,7 @@ void index_put_accum_kernel(Tensor & self, TensorList indices, const Tensor & va
using device_ptr = thrust::device_ptr<int64_t>;
const cudaStream_t stream = at::cuda::getCurrentCUDAStream();
linearIndex.div_(sliceSize);
linearIndex.floor_divide_(sliceSize);
{
sorted_indices.copy_(linearIndex);
auto allocator = THCThrustAllocator(globalContext().lazyInitCUDA());

View File

@ -307,6 +307,15 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
"tensor ", i);
}
// Dtypes should be the same
const auto first_in_cat = inputs[0];
for (int64_t i = 1; i < inputs.size(); i++) {
TORCH_CHECK(first_in_cat.dtype() == inputs[i].dtype(),
"Expected object of scalar type ", first_in_cat.dtype(),
" but got scalar type ", inputs[i].dtype(),
" for sequence element ", i, ".");
}
for (int i = 0; i < inputs.size(); i++)
{
if (should_skip(inputs[i])) {
@ -325,6 +334,12 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
TORCH_CHECK(inputs.size() > 0, "invalid number of inputs ", inputs.size());
TORCH_CHECK(dimension >= 0, "invalid dimension ", dimension);
for (const Tensor& t: inputs) {
TORCH_CHECK(t.device() == notSkippedTensor->device(),
"All input tensors must be on the same device. Received ",
t.device(), " and ", notSkippedTensor->device());
}
c10::MemoryFormat memory_format = compute_output_memory_format(inputs);
std::vector<int64_t> size(notSkippedTensor->sizes().vec());
@ -355,17 +370,11 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
// 4. The number of dimensions is <= 4
// 5. All input tensors are contiguous (output tensor may be non-contig)
// 6. All input tensors can use 32-bit indexing
// 7. All input tensors are on the same device
const bool all32BitIndexable = std::all_of(inputs.begin(), inputs.end(),
[] (const Tensor& t) {
return at::cuda::detail::canUse32BitIndexMath(t);
});
Device firstDevice = notSkippedTensor->device();
const bool allSameDevice = std::all_of(inputs.begin(), inputs.end(),
[firstDevice](const Tensor& t) {
return t.device() == firstDevice;
});
const bool allContiguous = std::all_of(inputs.begin(), inputs.end(),
[=](const Tensor& t) {
return !t.defined() || t.is_contiguous(memory_format);
@ -375,8 +384,7 @@ Tensor& cat_out_cuda(Tensor& out, TensorList inputs, int64_t dimension) {
out.dim() <= CAT_ARRAY_MAX_INPUT_DIMS &&
at::cuda::detail::canUse32BitIndexMath(out) &&
allContiguous &&
all32BitIndexable &&
allSameDevice) {
all32BitIndexable) {
AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(
at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16,

View File

@ -125,7 +125,7 @@ struct TopKTypeConfig<at::Half> {
static inline __device__ RadixType convert(at::Half v) {
#if defined(__CUDA_ARCH__) || defined(__HIP_PLATFORM_HCC__)
RadixType x = __half_as_ushort(v);
RadixType mask = -((x >> 15)) | 0x8000;
RadixType mask = (x & 0x00008000) ? 0x0000ffff : 0x00008000;
return (v == v) ? (x ^ mask) : 0xffff;
#else
assert(false);
@ -135,7 +135,7 @@ struct TopKTypeConfig<at::Half> {
static inline __device__ at::Half deconvert(RadixType v) {
#if defined(__CUDA_ARCH__) || defined(__HIP_PLATFORM_HCC__)
RadixType mask = ((v >> 15) - 1) | 0x8000;
RadixType mask = (v & 0x00008000) ? 0x00008000 : 0x0000ffff;
return __ushort_as_half(v ^ mask);
#else
assert(false);
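
The new masks in `convert`/`deconvert` can be checked on the host by working on raw half-precision bit patterns. The sketch below mirrors only the mask logic from the hunk above; it leaves out the NaN branch and the `__half_as_ushort`/`__ushort_as_half` intrinsics, and `convert_bits`, `deconvert_bits` and `round_trips` are hypothetical names.

```cpp
#include <cstdint>

using RadixType = uint32_t;

// Map the raw bits of an IEEE half to an unsigned key whose integer order
// matches the floating-point order: negative values have all 16 bits
// flipped, non-negative values have only the sign bit flipped.
RadixType convert_bits(uint16_t x) {
  RadixType mask = (x & 0x00008000) ? 0x0000ffff : 0x00008000;
  return x ^ mask;
}

// Inverse mapping: a key with bit 15 set came from a non-negative value.
uint16_t deconvert_bits(RadixType v) {
  RadixType mask = (v & 0x00008000) ? 0x00008000 : 0x0000ffff;
  return static_cast<uint16_t>(v ^ mask);
}

bool round_trips(uint16_t x) {
  return deconvert_bits(convert_bits(x)) == x;
}
```

For example, the half bit patterns `0xC000` (-2.0), `0xBC00` (-1.0), `0x0000` (0.0), `0x3C00` (1.0) and `0x4000` (2.0) map to strictly increasing keys, and every key stays within the low 16 bits.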

View File

@ -44,6 +44,7 @@ Tensor& eye_out_cuda(Tensor& result, int64_t n, int64_t m) {
}
Tensor empty_cuda(IntArrayRef size, const TensorOptions& options, c10::optional<MemoryFormat> optional_memory_format) {
TORCH_CHECK(!isComplexType(at::typeMetaToScalarType(options.dtype())), "Complex dtype not supported.");
AT_ASSERT(options.device().type() == at::DeviceType::CUDA);
TORCH_INTERNAL_ASSERT(impl::variable_excluded_from_dispatch());
TORCH_CHECK(!options.pinned_memory(), "Only dense CPU tensors can be pinned");

View File

@ -238,18 +238,12 @@
- func: real(Tensor self) -> Tensor
use_c10_dispatcher: full
variants: function, method
supports_named_tensor: True
- func: real.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
variants: function
supports_named_tensor: True
- func: imag(Tensor self) -> Tensor
use_c10_dispatcher: full
variants: function, method
supports_named_tensor: True
- func: imag.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
variants: function
supports_named_tensor: True
- func: conj(Tensor self) -> Tensor
@ -2872,7 +2866,7 @@
- func: true_divide.Tensor(Tensor self, Tensor other) -> Tensor
use_c10_dispatcher: full
variants: function
variants: function, method
dispatch:
CPU: true_divide
CUDA: true_divide
@ -2880,6 +2874,15 @@
SparseCUDA: true_divide_sparse
supports_named_tensor: True
- func: true_divide_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)
variants: method
dispatch:
CPU: true_divide_
CUDA: true_divide_
SparseCPU: true_divide_sparse_
SparseCUDA: true_divide_sparse_
supports_named_tensor: True
- func: true_divide.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
dispatch:
CPU: true_divide_out
@ -2890,7 +2893,11 @@
- func: true_divide.Scalar(Tensor self, Scalar other) -> Tensor
use_c10_dispatcher: full
variants: function
variants: function, method
supports_named_tensor: True
- func: true_divide_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!)
variants: method
supports_named_tensor: True
- func: trunc(Tensor self) -> Tensor

View File

@ -272,6 +272,10 @@ SparseTensor& true_divide_out_sparse_scalar(
return true_divide_out_sparse_zerodim(result, dividend, wrapped_scalar_tensor(divisor));
}
Tensor& true_divide_sparse_(Tensor& self, const Tensor& divisor) {
return true_divide_out_sparse_zerodim(self, self, divisor);
}
// --------------------------------------------------------------------
// floor_divide(SparseTensor, Scalar)
// --------------------------------------------------------------------

View File

@ -138,7 +138,7 @@ SparseTensor coalesce_sparse_cuda(const SparseTensor& self) {
// broadcasting logic; instead, it will blast the elements from one
// to the other so long as the numel is the same
indicesSlice.copy_(indices1D);
indices1D.div_(self.size(d));
indices1D.floor_divide_(self.size(d));
indicesSlice.add_(indices1D, -self.size(d));
}
}

View File

@ -14,7 +14,7 @@ namespace xnnpack {
namespace {
torch::jit::class_<XNNPackLinearOpContext> register_xnnpack_linear_op_context_class() {
static auto register_linear_op_context_class =
torch::jit::class_<XNNPackLinearOpContext>("XNNPackLinearOpContext")
torch::jit::class_<XNNPackLinearOpContext>("xnnpack", "XNNPackLinearOpContext")
.def_pickle(
[](const c10::intrusive_ptr<XNNPackLinearOpContext>& op_context)
-> SerializationTypeLinearPrePack { // __getstate__
@ -38,7 +38,7 @@ torch::jit::class_<XNNPackLinearOpContext> register_xnnpack_linear_op_context_cl
torch::jit::class_<XNNPackConv2dOpContext> register_xnnpack_conv2d_op_context_class() {
static auto register_conv2d_op_context_class =
torch::jit::class_<XNNPackConv2dOpContext>("XNNPackConv2dOpContext")
torch::jit::class_<XNNPackConv2dOpContext>("xnnpack", "XNNPackConv2dOpContext")
.def_pickle(
[](const c10::intrusive_ptr<XNNPackConv2dOpContext>& op_context)
-> SerializationTypeConv2dPrePack { // __getstate__
@ -74,25 +74,25 @@ static auto registry =
// Registering under the _xnnpack namespace for now. As we add more backends requiring similar functionality,
// we can refactor the code and use a better namespace.
torch::RegisterOperators()
.op("_xnnpack::linear_prepack(Tensor W, Tensor? B=None) -> __torch__.torch.classes.XNNPackLinearOpContext",
.op("_xnnpack::linear_prepack(Tensor W, Tensor? B=None) -> __torch__.torch.classes.xnnpack.XNNPackLinearOpContext",
torch::RegisterOperators::options()
.aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
.kernel<internal::linear::LinearPrePack>(
DispatchKey::CPUTensorId))
.op("_xnnpack::linear_packed(Tensor X, __torch__.torch.classes.XNNPackLinearOpContext W_prepack) -> Tensor Y",
.op("_xnnpack::linear_packed(Tensor X, __torch__.torch.classes.xnnpack.XNNPackLinearOpContext W_prepack) -> Tensor Y",
torch::RegisterOperators::options()
.aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
.kernel<internal::linear::LinearPacked>(
DispatchKey::CPUTensorId))
.op("_xnnpack::conv2d_prepack(Tensor W, Tensor? B, int[2] stride, "
"int[2] padding, int[2] dilation, int groups) "
"-> __torch__.torch.classes.XNNPackConv2dOpContext",
"-> __torch__.torch.classes.xnnpack.XNNPackConv2dOpContext",
torch::RegisterOperators::options()
.aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
.kernel<internal::convolution2d::Conv2dPrePack>(
DispatchKey::CPUTensorId))
.op("_xnnpack::conv2d_packed(Tensor X, "
"__torch__.torch.classes.XNNPackConv2dOpContext W_prepack) -> Tensor Y",
"__torch__.torch.classes.xnnpack.XNNPackConv2dOpContext W_prepack) -> Tensor Y",
torch::RegisterOperators::options()
.aliasAnalysis(at::AliasAnalysisKind::PURE_FUNCTION)
.kernel<internal::convolution2d::Conv2dPacked>(

View File

@ -423,6 +423,85 @@ class CAFFE2_API Tensor {
// ~~~~~ Autograd API ~~~~~
/// \fn bool is_leaf() const;
///
/// All Tensors whose `requires_grad()` is ``false`` are leaf Tensors by convention.
///
/// Tensors whose `requires_grad()` is ``true`` are leaf Tensors if they were
/// created by the user. This means that they are not the result of an operation and so
/// `grad_fn()` is `nullptr`.
///
/// Only leaf Tensors will have their `grad()` populated during a call to `backward()`.
/// To get `grad()` populated for non-leaf Tensors, you can use `retain_grad()`.
///
/// Example:
/// @code
/// auto a = torch::rand(10, torch::requires_grad());
/// std::cout << a.is_leaf() << std::endl; // prints `true`
///
/// auto b = torch::rand(10, torch::requires_grad()).to(torch::kCUDA);
/// std::cout << b.is_leaf() << std::endl; // prints `false`
/// // b was created by the operation that cast a cpu Tensor into a cuda Tensor
///
/// auto c = torch::rand(10, torch::requires_grad()) + 2;
/// std::cout << c.is_leaf() << std::endl; // prints `false`
/// // c was created by the addition operation
///
/// auto d = torch::rand(10).cuda();
/// std::cout << d.is_leaf() << std::endl; // prints `true`
/// // d does not require gradients and so has no operation creating it (that is tracked by the autograd engine)
///
/// auto e = torch::rand(10).cuda().requires_grad_();
/// std::cout << e.is_leaf() << std::endl; // prints `true`
/// // e requires gradients and has no operations creating it
///
/// auto f = torch::rand(10, torch::device(torch::kCUDA).requires_grad(true));
/// std::cout << f.is_leaf() << std::endl; // prints `true`
/// // f requires grad, has no operation creating it
/// @endcode
/// \fn void backward(const Tensor & gradient={}, bool keep_graph=false, bool create_graph=false) const;
///
/// Computes the gradient of the current tensor with respect to graph leaves.
///
/// The graph is differentiated using the chain rule. If the tensor is
/// non-scalar (i.e. its data has more than one element) and requires
/// gradient, the function additionally requires specifying ``gradient``.
/// It should be a tensor of matching type and location, that contains
/// the gradient of the differentiated function w.r.t. this Tensor.
///
/// This function accumulates gradients in the leaves - you might need to
/// zero them before calling it.
///
/// \param gradient Gradient w.r.t. the
/// tensor. If it is a tensor, it will be automatically converted
/// to a Tensor that does not require grad unless ``create_graph`` is ``true``.
/// None values can be specified for scalar Tensors or ones that
/// don't require grad. If a None value would be acceptable then
/// this argument is optional.
/// \param keep_graph If ``false``, the graph used to compute
/// the grads will be freed. Note that in nearly all cases setting
/// this option to ``true`` is not needed and often can be worked around
/// in a much more efficient way. Defaults to the value of
/// ``create_graph``.
/// \param create_graph If ``true``, the graph of the derivative will
/// be constructed, allowing higher-order derivative products to be
/// computed. Defaults to ``false``.
/// \fn Tensor detach() const;
///
/// Returns a new Tensor, detached from the current graph.
/// The result will never require gradient.
/// \fn Tensor & detach_() const;
///
/// Detaches the Tensor from the graph that created it, making it a leaf.
/// Views cannot be detached in-place.
/// \fn void retain_grad() const;
///
/// Enables .grad() for non-leaf Tensors.
Tensor& set_requires_grad(bool requires_grad) {
impl_->set_requires_grad(requires_grad);
return *this;
@ -431,9 +510,16 @@ class CAFFE2_API Tensor {
return impl_->requires_grad();
}
/// Return a mutable reference to the gradient. This is conventionally
/// used as `t.grad() = x` to set a gradient to a completely new tensor.
Tensor& grad() {
return impl_->grad();
}
/// This function returns an undefined tensor by default and returns a defined tensor
/// the first time a call to `backward()` computes gradients for this Tensor.
/// The attribute will then contain the gradients computed and future calls
/// to `backward()` will accumulate (add) gradients into it.
const Tensor& grad() const {
return impl_->grad();
}
@ -505,11 +591,38 @@ class CAFFE2_API Tensor {
template <typename T>
using hook_return_var_t = std::enable_if_t<std::is_same<typename std::result_of<T&(Tensor)>::type, Tensor>::value, unsigned>;
// Returns the index of the hook in the list which can be used to remove hook
// Register a hook with no return value
/// Registers a backward hook.
///
/// The hook will be called every time a gradient with respect to the Tensor is computed.
/// The hook should have one of the following signatures:
/// ```
/// hook(Tensor grad) -> Tensor
/// ```
/// ```
/// hook(Tensor grad) -> void
/// ```
/// The hook should not modify its argument, but it can optionally return a new gradient
/// which will be used in place of `grad`.
///
/// This function returns the index of the hook in the list, which can be used to remove the hook.
///
/// Example:
/// @code
/// auto v = torch::tensor({0., 0., 0.}, torch::requires_grad());
/// auto h = v.register_hook([](torch::Tensor grad){ return grad * 2; }); // double the gradient
/// v.backward(torch::tensor({1., 2., 3.}));
/// // This prints:
/// // ```
/// // 2
/// // 4
/// // 6
/// // [ CPUFloatType{3} ]
/// // ```
/// std::cout << v.grad() << std::endl;
/// v.remove_hook(h); // removes the hook
/// @endcode
template <typename T>
hook_return_void_t<T> register_hook(T&& hook) const;
// Register a hook with variable return value
template <typename T>
hook_return_var_t<T> register_hook(T&& hook) const;
@ -518,7 +631,7 @@ private:
public:
// Remove hook at given position
/// Remove hook at given position
void remove_hook(unsigned pos) const;
// View Variables

View File

@ -69,12 +69,6 @@
# define TH_UNUSED
#endif
#if defined(__clang__)
#define __ubsan_ignore_float_divide_by_zero__ __attribute__((no_sanitize("float-divide-by-zero")))
#else
#define __ubsan_ignore_float_divide_by_zero__
#endif
#ifndef M_PI
# define M_PI 3.14159265358979323846
#endif

View File

@ -9,7 +9,7 @@ set(extra_src)
# loop over all types
foreach(THC_TYPE Byte Char Short Int Long Half Float Double)
# loop over files which need to be split between types (because of long compile times)
foreach(THC_FILE TensorSort TensorMathPointwise TensorMathReduce TensorMasked)
foreach(THC_FILE TensorSort TensorMathPointwise TensorMathReduce TensorMasked TensorTopK)
if(NOT EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/generated/THC${THC_FILE}${THC_TYPE}.cu")
FILE(WRITE "${CMAKE_CURRENT_SOURCE_DIR}/generated/THC${THC_FILE}${THC_TYPE}.cu"
"#include <THC/THC${THC_FILE}.cuh>\n#include <THC/THCTensor.hpp>\n\n#include <THC/generic/THC${THC_FILE}.cu>\n#include <THC/THCGenerate${THC_TYPE}Type.h>\n")
@ -56,7 +56,6 @@ set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS}
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorIndex.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorRandom.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorScatterGather.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorTopK.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorSort.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCSortUtils.cu
${CMAKE_CURRENT_SOURCE_DIR}/THCTensorMode.cu

View File

@ -1,19 +0,0 @@
#include <THC/THC.h>
#include <THC/THCReduceApplyUtils.cuh>
#include <THC/THCTensorCopy.h>
#include <THC/THCTensorMath.h>
#include <THC/THCAsmUtils.cuh>
#include <THC/THCScanUtils.cuh>
#include <THC/THCTensorTypeUtils.cuh>
#include <THC/THCTensorMathReduce.cuh>
#include <ATen/WrapDimUtils.h>
#include <algorithm> // for std::min
#if CUDA_VERSION >= 7000 || defined __HIP_PLATFORM_HCC__
#include <thrust/system/cuda/execution_policy.h>
#endif
#include <THC/THCTensorTopK.cuh>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateAllTypes.h>

View File

@ -1,6 +1,21 @@
#ifndef THC_TENSOR_TOPK_CUH
#define THC_TENSOR_TOPK_CUH
#include <THC/THC.h>
#include <THC/THCReduceApplyUtils.cuh>
#include <THC/THCTensorCopy.h>
#include <THC/THCTensorMath.h>
#include <THC/THCAsmUtils.cuh>
#include <THC/THCScanUtils.cuh>
#include <THC/THCTensorTypeUtils.cuh>
#include <THC/THCTensorMathReduce.cuh>
#include <ATen/WrapDimUtils.h>
#include <algorithm> // for std::min
#if CUDA_VERSION >= 7000 || defined __HIP_PLATFORM_HCC__
#include <thrust/system/cuda/execution_policy.h>
#endif
#include <c10/macros/Macros.h>
#include <ATen/native/cuda/SortingRadixSelect.cuh>
@ -52,6 +67,7 @@ __global__ void gatherTopK(TensorInfo<T, IndexType> input,
inputSliceStart, outputSliceSize,
inputSliceSize, inputWithinSliceStride,
smem, &topKValue);
const auto topKConverted = at::native::TopKTypeConfig<T>::convert(topKValue);
// Every value that is strictly less/greater than `pattern`
// (depending on sort dir) in sorted int format is in the top-K.
@ -74,11 +90,12 @@ __global__ void gatherTopK(TensorInfo<T, IndexType> input,
bool inRange = (i < inputSliceSize);
T v =
inRange ? doLdg(&inputSliceStart[i * inputWithinSliceStride]) : ScalarConvert<int, T>::to(0);
const auto convertedV = at::native::TopKTypeConfig<T>::convert(v);
bool hasTopK;
if (Order) {
hasTopK = inRange && (THCNumerics<T>::gt(v, topKValue));
hasTopK = inRange && (convertedV > topKConverted);
} else {
hasTopK = inRange && (THCNumerics<T>::lt(v, topKValue));
hasTopK = inRange && (convertedV < topKConverted);
}
int index;
@ -111,7 +128,8 @@ __global__ void gatherTopK(TensorInfo<T, IndexType> input,
bool inRange = (i < inputSliceSize);
T v =
inRange ? doLdg(&inputSliceStart[i * inputWithinSliceStride]) : ScalarConvert<int, T>::to(0);
bool hasTopK = inRange && (THCNumerics<T>::eq(v, topKValue));
const auto convertedV = at::native::TopKTypeConfig<T>::convert(v);
bool hasTopK = inRange && (convertedV == topKConverted);
int index;
int carry;

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateByteType.h>

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateCharType.h>

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateDoubleType.h>

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateFloatType.h>

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateHalfType.h>

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateIntType.h>

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateLongType.h>

View File

@ -0,0 +1,5 @@
#include <THC/THCTensorTopK.cuh>
#include <THC/THCTensor.hpp>
#include <THC/generic/THCTensorTopK.cu>
#include <THC/THCGenerateShortType.h>

View File

@ -23,6 +23,14 @@
#include "c10/macros/Export.h"
#if defined(__clang__)
#define __ubsan_ignore_float_divide_by_zero__ __attribute__((no_sanitize("float-divide-by-zero")))
#define __ubsan_ignore_float_cast_overflow__ __attribute__((no_sanitize("float-cast-overflow")))
#else
#define __ubsan_ignore_float_divide_by_zero__
#define __ubsan_ignore_float_cast_overflow__
#endif
// Disable the copy and assignment operator for a class. Note that this will
// disable the usage of the class in std containers.
#define C10_DISABLE_COPY_AND_ASSIGN(classname) \

View File

@ -66,24 +66,44 @@ void Error::AppendMessage(const std::string& new_msg) {
namespace Warning {
namespace {
WarningHandler* getHandler() {
WarningHandler* getBaseHandler() {
static WarningHandler base_warning_handler_ = WarningHandler();
return &base_warning_handler_;
};
static thread_local WarningHandler* warning_handler_ = getHandler();
class ThreadWarningHandler {
public:
ThreadWarningHandler() = delete;
static WarningHandler* get_handler() {
if (!warning_handler_) {
warning_handler_ = getBaseHandler();
}
return warning_handler_;
}
static void set_handler(WarningHandler* handler) {
warning_handler_ = handler;
}
private:
static thread_local WarningHandler* warning_handler_;
};
thread_local WarningHandler* ThreadWarningHandler::warning_handler_ = nullptr;
}
void warn(SourceLocation source_location, const std::string& msg) {
warning_handler_->process(source_location, msg);
ThreadWarningHandler::get_handler()->process(source_location, msg);
}
void set_warning_handler(WarningHandler* handler) noexcept(true) {
warning_handler_ = handler;
ThreadWarningHandler::set_handler(handler);
}
WarningHandler* get_warning_handler() noexcept(true) {
return warning_handler_;
return ThreadWarningHandler::get_handler();
}
} // namespace Warning
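
The hunk above replaces a dynamically initialized `thread_local` pointer with one that is constant-initialized to `nullptr` and filled in lazily on first use. A reduced sketch of that pattern is below; `Handler`, `base_handler`, `ThreadHandler` and `warn` are simplified stand-ins for illustration, not the c10 types.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for a warning handler; records the last message.
struct Handler {
  virtual ~Handler() = default;
  virtual void process(const std::string& msg) { last = msg; }
  std::string last;
};

// Process-wide default handler, created on first use.
Handler* base_handler() {
  static Handler handler;
  return &handler;
}

class ThreadHandler {
 public:
  ThreadHandler() = delete;
  // Each thread starts with nullptr and falls back to the base handler
  // the first time it asks for one.
  static Handler* get() {
    if (!handler_) {
      handler_ = base_handler();
    }
    return handler_;
  }
  static void set(Handler* handler) { handler_ = handler; }

 private:
  static thread_local Handler* handler_;
};

// Constant initialization: no code has to run at thread start-up.
thread_local Handler* ThreadHandler::handler_ = nullptr;

void warn(const std::string& msg) {
  Handler* handler = ThreadHandler::get();
  assert(handler != nullptr);
  handler->process(msg);
}
```

One consequence of this design, visible in the sketch, is that setting the handler to `nullptr` resets the calling thread to the base handler on the next lookup.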

View File

@ -67,7 +67,7 @@ struct maybe_real<true, src_t> {
template <typename dest_t, typename src_t>
struct static_cast_with_inter_type {
C10_HOST_DEVICE static inline dest_t apply(src_t src) {
C10_HOST_DEVICE __ubsan_ignore_float_cast_overflow__ static inline dest_t apply(src_t src) {
constexpr bool real = needs_real<dest_t, src_t>::value;
return static_cast<dest_t>(
static_cast<inter_copy_type_t<dest_t>>(maybe_real<real, src_t>::apply(src)));

View File

@ -748,7 +748,7 @@ if (NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE)
target_include_directories(torch_cuda PUBLIC "${NVTOOLEXT_HOME}/include")
# -INCLUDE is used to ensure torch_cuda is linked against in a project that relies on it.
# Related issue: https://github.com/pytorch/pytorch/issues/31611
target_link_libraries(torch_cuda INTERFACE "-INCLUDE:\"?warp_size@cuda@at@@YAHXZ\"")
target_link_libraries(torch_cuda INTERFACE "-INCLUDE:?warp_size@cuda@at@@YAHXZ")
elseif(APPLE)
set(TORCH_CUDA_LIBRARIES
@ -949,6 +949,31 @@ if (USE_OPENMP AND OPENMP_FOUND)
target_link_libraries(torch_cpu PRIVATE ${OpenMP_CXX_LIBRARIES})
endif()
if ($ENV{TH_BINARY_BUILD})
if (NOT MSVC AND USE_CUDA AND NOT APPLE)
# Note [Extra MKL symbols for MAGMA in torch_cpu]
#
# When we build CUDA libraries and link against MAGMA, MAGMA makes use of
# some BLAS symbols in its CPU fallbacks when it has no GPU versions
# of kernels. Previously, we ensured the BLAS symbols were filled in by
# MKL by linking torch_cuda with BLAS, but when we are statically linking
# against MKL (when we do wheel builds), this actually ends up pulling in a
# decent chunk of MKL into torch_cuda, inflating our torch_cuda binary
# size by 8M. torch_cpu exposes most of the MKL symbols we need, but
# empirically we determined that there are four which it doesn't provide. If
# we link torch_cpu with these --undefined symbols, we can ensure they
# do get pulled in, and then we can avoid statically linking in MKL to
# torch_cuda at all!
#
# We aren't really optimizing for binary size on Windows (and this link
# line doesn't work on Windows), so don't do it there.
#
# These linker commands do not work on OS X, do not attempt this there.
# (It shouldn't matter anyway, though, because OS X has dropped CUDA support)
set_target_properties(torch_cpu PROPERTIES LINK_FLAGS "-Wl,--undefined=mkl_lapack_slaed0 -Wl,--undefined=mkl_lapack_dlaed0 -Wl,--undefined=mkl_lapack_dormql -Wl,--undefined=mkl_lapack_sormql")
endif()
endif()
target_link_libraries(torch_cpu PUBLIC c10)
target_link_libraries(torch_cpu PUBLIC ${Caffe2_PUBLIC_DEPENDENCY_LIBS})
target_link_libraries(torch_cpu PRIVATE ${Caffe2_DEPENDENCY_LIBS})

View File

@ -1,6 +1,8 @@
#include "caffe2/operators/fused_rowwise_nbitfake_conversion_ops.h"
#include <fp16.h>
#ifdef __AVX__
#include <immintrin.h>
#endif
#include "c10/util/Registry.h"
namespace caffe2 {

View File

@ -50,8 +50,13 @@ __global__ void ReluCUDAKernel<half2>(const int N, const half2* X, half2* Y) {
Y[i] = __hmul2(__hgt2(__ldg(X + i), kZero), __ldg(X + i));
#else
const float2 xx = __half22float2(X[i]);
Y[i] =
__floats2half2_rn(xx.x > 0.0f ? xx.x : 0.0f, xx.y > 0.0f ? xx.y : 0.0f);
// The explicit casts to float are needed here because the conditional expression can otherwise be
// ambiguous on ROCm, which sometimes triggers:
//
// error: conditional expression is ambiguous; 'const hip_impl::Scalar_accessor<float, Native_vec_, 0>' can be
// converted to 'float' and vice versa
Y[i] = __floats2half2_rn(xx.x > 0.0f ? static_cast<float>(xx.x) : 0.0f,
xx.y > 0.0f ? static_cast<float>(xx.y) : 0.0f);
#endif
}
}
@ -100,8 +105,14 @@ __global__ void ReluGradientCUDAKernel<half2>(
#else
const float2 dy = __half22float2(dY[i]);
const float2 yy = __half22float2(Y[i]);
dX[i] =
__floats2half2_rn(yy.x > 0.0f ? dy.x : 0.0f, yy.y > 0.0f ? dy.y : 0.0f);
// The explicit casts to float below are required: without them the conditional expression can be ambiguous on
// ROCm, which sometimes fails to compile with:
//
// error: conditional expression is ambiguous; 'const hip_impl::Scalar_accessor<float, Native_vec_, 1>' can be
// converted to 'float' and vice versa
dX[i] = __floats2half2_rn(yy.x > 0.0f ? static_cast<float>(dy.x) : 0.0f,
yy.y > 0.0f ? static_cast<float>(dy.y) : 0.0f);
#endif
}
}

@ -15,6 +15,7 @@ if (NOT __NCCL_INCLUDED)
# this second replacement is needed when there are multiple archs
string(REPLACE ";-gencode" " -gencode" NVCC_GENCODE "${NVCC_GENCODE}")
set(__NCCL_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/nccl")
ExternalProject_Add(nccl_external
SOURCE_DIR ${PROJECT_SOURCE_DIR}/third_party/nccl/nccl
BUILD_IN_SOURCE 1
@ -30,20 +31,49 @@ if (NOT __NCCL_INCLUDED)
"CUDA_HOME=${CUDA_TOOLKIT_ROOT_DIR}"
"NVCC=${CUDA_NVCC_EXECUTABLE}"
"NVCC_GENCODE=${NVCC_GENCODE}"
"BUILDDIR=${CMAKE_CURRENT_BINARY_DIR}/nccl"
"BUILDDIR=${__NCCL_BUILD_DIR}"
"VERBOSE=0"
"-j"
BUILD_BYPRODUCTS "${CMAKE_CURRENT_BINARY_DIR}/nccl/lib/libnccl_static.a"
BUILD_BYPRODUCTS "${__NCCL_BUILD_DIR}/lib/libnccl_static.a"
INSTALL_COMMAND ""
)
# Detect objcopy version
execute_process (COMMAND "${CMAKE_OBJCOPY}" "--version" OUTPUT_VARIABLE OBJCOPY_VERSION_STR)
string(REGEX REPLACE "GNU objcopy version ([0-9])\\.([0-9]+).*" "\\1" OBJCOPY_VERSION_MAJOR ${OBJCOPY_VERSION_STR})
string(REGEX REPLACE "GNU objcopy version ([0-9])\\.([0-9]+).*" "\\2" OBJCOPY_VERSION_MINOR ${OBJCOPY_VERSION_STR})
if ((${OBJCOPY_VERSION_MAJOR} GREATER 2) OR ((${OBJCOPY_VERSION_MAJOR} EQUAL 2) AND (${OBJCOPY_VERSION_MINOR} GREATER 27)))
message(WARNING "Enabling NCCL library slimming")
add_custom_command(
OUTPUT "${__NCCL_BUILD_DIR}/lib/libnccl_slim_static.a"
DEPENDS nccl_external
COMMAND "${CMAKE_COMMAND}" -E make_directory "${__NCCL_BUILD_DIR}/objects"
COMMAND cd objects
COMMAND "${CMAKE_AR}" x "${__NCCL_BUILD_DIR}/lib/libnccl_static.a"
COMMAND for obj in all_gather_* all_reduce_* broadcast_* reduce_*.o$<SEMICOLON> do "${CMAKE_OBJCOPY}" --remove-relocations .nvFatBinSegment --remove-section __nv_relfatbin $$obj$<SEMICOLON> done
COMMAND "${CMAKE_AR}" cr "${__NCCL_BUILD_DIR}/lib/libnccl_slim_static.a" "*.o"
COMMAND cd -
COMMAND "${CMAKE_COMMAND}" -E remove_directory "${__NCCL_BUILD_DIR}/objects"
WORKING_DIRECTORY "${__NCCL_BUILD_DIR}"
COMMENT "Slimming NCCL"
)
add_custom_target(nccl_slim_external DEPENDS "${__NCCL_BUILD_DIR}/lib/libnccl_slim_static.a")
set(__NCCL_LIBRARY_DEP nccl_slim_external)
set(NCCL_LIBRARIES ${__NCCL_BUILD_DIR}/lib/libnccl_slim_static.a)
else()
message(WARNING "Objcopy version is too old to support NCCL library slimming")
set(__NCCL_LIBRARY_DEP nccl_external)
set(NCCL_LIBRARIES ${__NCCL_BUILD_DIR}/lib/libnccl_static.a)
endif()
set(NCCL_FOUND TRUE)
add_library(__caffe2_nccl INTERFACE)
# The following old-style variables are set so that other libs, such as Gloo,
# can still use it.
set(NCCL_INCLUDE_DIRS ${CMAKE_CURRENT_BINARY_DIR}/nccl/include)
set(NCCL_LIBRARIES ${CMAKE_CURRENT_BINARY_DIR}/nccl/lib/libnccl_static.a)
add_dependencies(__caffe2_nccl nccl_external)
set(NCCL_INCLUDE_DIRS ${__NCCL_BUILD_DIR}/include)
add_dependencies(__caffe2_nccl ${__NCCL_LIBRARY_DEP})
target_link_libraries(__caffe2_nccl INTERFACE ${NCCL_LIBRARIES})
target_include_directories(__caffe2_nccl INTERFACE ${NCCL_INCLUDE_DIRS})
endif()
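
The objcopy version gate in the hunk above amounts to "slim the NCCL archive only when objcopy is newer than 2.27". A minimal Python sketch of that check, for illustration only: the function name `supports_nccl_slimming` is hypothetical, and unlike the CMake code this sketch explicitly returns `False` when the version string does not match the expected pattern.

```python
import re

# Same shape as the CMake pattern: a single digit for the major version,
# one or more digits for the minor version.
_VERSION_RE = re.compile(r"GNU objcopy version ([0-9])\.([0-9]+)")


def supports_nccl_slimming(version_output: str) -> bool:
    """Return True when the reported objcopy version is newer than 2.27.

    Assumption of this sketch: output that does not match the pattern is
    treated as "too old" rather than raising an error.
    """
    match = _VERSION_RE.search(version_output)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return major > 2 or (major == 2 and minor > 27)
```

Comparing the major and minor parts as integers, rather than comparing the whole version as a string, keeps `2.9` correctly ordered below `2.27`.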

@ -56,6 +56,10 @@ INPUT = ../../../aten/src/ATen/ATen.h \
../../../c10/cuda/CUDAStream.h \
../../../torch/csrc/api/include \
../../../torch/csrc/api/src \
../../../torch/csrc/autograd/autograd.h \
../../../torch/csrc/autograd/custom_function.h \
../../../torch/csrc/autograd/function.h \
../../../torch/csrc/autograd/variable.h \
../../../torch/csrc/autograd/generated/variable_factories.h \
../../../torch/csrc/jit/runtime/custom_operator.h \
../../../torch/csrc/jit/serialization/import.h \

@ -281,7 +281,9 @@ change one property, this is quite practical.
In conclusion, we can now compare how ``TensorOptions`` defaults, together with
the abbreviated API for creating ``TensorOptions`` using free functions, allow
tensor creation in C++ with the same convenience as in Python. Compare this
call in Python::
call in Python:
.. code-block:: python
torch.randn(3, 4, dtype=torch.float32, device=torch.device('cuda', 1), requires_grad=True)

@ -0,0 +1,99 @@
Tensor Indexing API
===================
Indexing a tensor in the PyTorch C++ API works very similarly to the Python API.
All index types such as ``None`` / ``...`` / integer / boolean / slice / tensor
are available in the C++ API, making translation from Python indexing code to C++
very simple. The main difference is that, instead of using the ``[]``-operator
similar to the Python API syntax, in the C++ API the indexing methods are:
- ``torch::Tensor::index`` (`link <https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor5indexE8ArrayRefIN2at8indexing11TensorIndexEE>`_)
- ``torch::Tensor::index_put_`` (`link <https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4N2at6Tensor10index_put_E8ArrayRefIN2at8indexing11TensorIndexEERK6Tensor>`_)
It's also important to note that index types such as ``None`` / ``Ellipsis`` / ``Slice``
live in the ``torch::indexing`` namespace, and it's recommended to put ``using namespace torch::indexing``
before any indexing code for convenient use of those index types.
Here are some examples of translating Python indexing code to C++:
Getter
------
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| Python | C++ (assuming ``using namespace torch::indexing``) |
+==========================================================+======================================================================================+
| ``tensor[None]`` | ``tensor.index({None})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[Ellipsis, ...]`` | ``tensor.index({Ellipsis, "..."})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[1, 2]`` | ``tensor.index({1, 2})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[True, False]`` | ``tensor.index({true, false})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[1::2]`` | ``tensor.index({Slice(1, None, 2)})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[torch.tensor([1, 2])]`` | ``tensor.index({torch::tensor({1, 2})})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[..., 0, True, 1::2, torch.tensor([1, 2])]`` | ``tensor.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
Setter
------
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| Python | C++ (assuming ``using namespace torch::indexing``) |
+==========================================================+======================================================================================+
| ``tensor[None] = 1`` | ``tensor.index_put_({None}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[Ellipsis, ...] = 1`` | ``tensor.index_put_({Ellipsis, "..."}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[1, 2] = 1`` | ``tensor.index_put_({1, 2}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[True, False] = 1`` | ``tensor.index_put_({true, false}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[1::2] = 1`` | ``tensor.index_put_({Slice(1, None, 2)}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[torch.tensor([1, 2])] = 1`` | ``tensor.index_put_({torch::tensor({1, 2})}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
| ``tensor[..., 0, True, 1::2, torch.tensor([1, 2])] = 1`` | ``tensor.index_put_({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}, 1)`` |
+----------------------------------------------------------+--------------------------------------------------------------------------------------+
Translating between Python/C++ index types
------------------------------------------
The one-to-one translation between Python and C++ index types is as follows:
+-------------------------+------------------------------------------------------------------------+
| Python | C++ (assuming ``using namespace torch::indexing``) |
+=========================+========================================================================+
| ``None`` | ``None`` |
+-------------------------+------------------------------------------------------------------------+
| ``Ellipsis`` | ``Ellipsis`` |
+-------------------------+------------------------------------------------------------------------+
| ``...`` | ``"..."`` |
+-------------------------+------------------------------------------------------------------------+
| ``123`` | ``123`` |
+-------------------------+------------------------------------------------------------------------+
| ``True`` | ``true`` |
+-------------------------+------------------------------------------------------------------------+
| ``False`` | ``false`` |
+-------------------------+------------------------------------------------------------------------+
| ``:`` or ``::`` | ``Slice()`` or ``Slice(None, None)`` or ``Slice(None, None, None)`` |
+-------------------------+------------------------------------------------------------------------+
| ``1:`` or ``1::`` | ``Slice(1, None)`` or ``Slice(1, None, None)`` |
+-------------------------+------------------------------------------------------------------------+
| ``:3`` or ``:3:`` | ``Slice(None, 3)`` or ``Slice(None, 3, None)`` |
+-------------------------+------------------------------------------------------------------------+
| ``::2`` | ``Slice(None, None, 2)`` |
+-------------------------+------------------------------------------------------------------------+
| ``1:3`` | ``Slice(1, 3)`` |
+-------------------------+------------------------------------------------------------------------+
| ``1::2`` | ``Slice(1, None, 2)`` |
+-------------------------+------------------------------------------------------------------------+
| ``:3:2`` | ``Slice(None, 3, 2)`` |
+-------------------------+------------------------------------------------------------------------+
| ``1:3:2`` | ``Slice(1, 3, 2)`` |
+-------------------------+------------------------------------------------------------------------+
| ``torch.tensor([1, 2])``| ``torch::tensor({1, 2})`` |
+-------------------------+------------------------------------------------------------------------+
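
On the Python side, each slice expression in the table is shorthand for a built-in `slice(start, stop, step)` object, which is what the C++ `Slice(...)` column spells out explicitly. The following sketch checks that correspondence on a plain list standing in for one tensor dimension, so it needs no PyTorch install; the names `data`, `equivalents`, and `all_slices_match` are illustrative only.

```python
# Stand-in for one tensor dimension; a plain list keeps the sketch dependency-free.
data = list(range(10))

# Each pair holds the result of the slice shorthand and the explicit
# slice(start, stop, step) object that the shorthand stands for.
equivalents = [
    (data[:], slice(None, None, None)),
    (data[1:], slice(1, None)),
    (data[:3], slice(None, 3)),
    (data[::2], slice(None, None, 2)),
    (data[1:3], slice(1, 3)),
    (data[1::2], slice(1, None, 2)),
    (data[:3:2], slice(None, 3, 2)),
    (data[1:3:2], slice(1, 3, 2)),
]


def all_slices_match() -> bool:
    """Check that every shorthand result equals indexing with the explicit slice object."""
    return all(shorthand == data[explicit] for shorthand, explicit in equivalents)
```

Reading the Python column as `slice(...)` objects makes the C++ column mechanical to write: replace `slice` with `Slice` and keep the arguments.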

@ -1,4 +1,4 @@
sphinx
sphinx==2.4.4
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
sphinxcontrib.katex
matplotlib

@ -13,6 +13,13 @@ use ``torch.float16`` (``half``). Some operations, like linear layers and convol
are much faster in ``float16``. Other operations, like reductions, often require the dynamic
range of ``float32``. Networks running in mixed precision try to match each operation to its appropriate datatype.
.. warning::
:class:`torch.cuda.amp.GradScaler` is not a complete implementation of automatic mixed precision.
:class:`GradScaler` is only useful if you manually run regions of your model in ``float16``.
If you aren't sure how to choose op precision manually, the master branch and nightly pip/conda
builds include a context manager that chooses op precision automatically wherever it's enabled.
See the `master documentation <https://pytorch.org/docs/master/amp.html>`_ for details.
.. contents:: :local:
.. _gradient-scaling:

@ -395,6 +395,8 @@ of 16
.. autofunction:: all_gather_multigpu
.. _distributed-launch:
Launch utility
--------------

@ -16,7 +16,6 @@ PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
:caption: Notes
notes/*
PyTorch on XLA Devices <http://pytorch.org/xla/>
.. toctree::
:maxdepth: 1
@ -46,7 +45,7 @@ PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
onnx
optim
quantization
rpc
rpc/index.rst
torch.random <random>
sparse
storage
@ -62,24 +61,15 @@ PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
name_inference
torch.__config__ <__config__>
.. toctree::
:glob:
:maxdepth: 2
:caption: torchvision Reference
torchvision/index
.. toctree::
:maxdepth: 1
:caption: torchaudio Reference
:caption: Libraries
PyTorch on XLA Devices <http://pytorch.org/xla/>
PyTorch Elastic (torchelastic) <https://pytorch.org/elastic/>
torchaudio <https://pytorch.org/audio>
.. toctree::
:maxdepth: 1
:caption: torchtext Reference
torchtext <https://pytorch.org/text>
torchvision/index
.. toctree::
:glob:

@ -5,6 +5,13 @@ Automatic Mixed Precision examples
.. currentmodule:: torch.cuda.amp
.. warning::
:class:`torch.cuda.amp.GradScaler` is not a complete implementation of automatic mixed precision.
:class:`GradScaler` is only useful if you manually run regions of your model in ``float16``.
If you aren't sure how to choose op precision manually, the master branch and nightly pip/conda
builds include a context manager that chooses op precision automatically wherever it's enabled.
See the `master documentation <https://pytorch.org/docs/master/amp.html>`_ for details.
.. contents:: :local:
.. _gradient-scaling-examples:

@ -306,20 +306,30 @@ to overlap data transfers with computation.
You can make the :class:`~torch.utils.data.DataLoader` return batches placed in
pinned memory by passing ``pin_memory=True`` to its constructor.
.. _cuda-nn-dataparallel-instead:
.. _cuda-nn-ddp-instead:
Use nn.DataParallel instead of multiprocessing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Most use cases involving batched inputs and multiple GPUs should default to
using :class:`~torch.nn.DataParallel` to utilize more than one GPU. Even with
the GIL, a single Python process can saturate multiple GPUs.
As of version 0.1.9, large numbers of GPUs (8+) might not be fully utilized.
However, this is a known issue that is under active development. As always,
test your use case.
using :class:`~torch.nn.parallel.DistributedDataParallel` to utilize more
than one GPU.
There are significant caveats to using CUDA models with
:mod:`~torch.multiprocessing`; unless care is taken to meet the data handling
requirements exactly, it is likely that your program will have incorrect or
undefined behavior.
It is recommended to use :class:`~torch.nn.parallel.DistributedDataParallel`
instead of :class:`~torch.nn.DataParallel` for multi-GPU training, even if
there is only a single node.
The difference between :class:`~torch.nn.parallel.DistributedDataParallel` and
:class:`~torch.nn.DataParallel` is that :class:`~torch.nn.parallel.DistributedDataParallel`
uses multiprocessing, where a process is created for each GPU, while
:class:`~torch.nn.DataParallel` uses multithreading. With multiprocessing,
each GPU has its own dedicated process, which avoids the performance overhead
caused by the GIL of the Python interpreter.
If you use :class:`~torch.nn.parallel.DistributedDataParallel`, you can use the
`torch.distributed.launch` utility to launch your program; see :ref:`distributed-launch`.

@ -45,7 +45,7 @@ the consumer process has references to the tensor, and the refcounting can not
save you if the consumer process exits abnormally via a fatal signal. See
:ref:`this section <multiprocessing-cuda-sharing-details>`.
See also: :ref:`cuda-nn-dataparallel-instead`
See also: :ref:`cuda-nn-ddp-instead`
Best practices and tips

docs/source/rpc/index.rst
@ -0,0 +1,24 @@
.. _rpc-index:
Distributed RPC Framework
==============================
The distributed RPC framework provides mechanisms for multi-machine model training through a set of primitives to allow for remote communication, and a higher-level API to automatically differentiate models split across several machines.
- :ref:`distributed-rpc-framework`
Design Notes
------------
The distributed autograd design note covers the design of the RPC-based distributed autograd framework that is useful for applications such as model parallel training.
- :ref:`distributed-autograd-design`
The RRef design note covers the design of the :ref:`rref` (Remote REFerence) protocol used to refer to values on remote workers by the framework.
- :ref:`remote-reference-protocol`
Tutorials
---------
The RPC tutorial introduces users to the RPC framework and provides two example applications using :ref:`torch.distributed.rpc<distributed-rpc-framework>` APIs.
- `Getting started with Distributed RPC Framework <https://pytorch.org/tutorials/intermediate/rpc_tutorial.html>`__

@ -8,6 +8,8 @@ training through a set of primitives to allow for remote communication, and a
higher-level API to automatically differentiate models split across several
machines.
.. warning ::
APIs in the RPC package are stable. There are multiple ongoing work items to improve performance and error handling, which will ship in future releases.
Basics

@ -210,3 +210,25 @@ Example::
(1, 5)
For more information on ``torch.sparse_coo`` tensors, see :ref:`sparse-docs`.
torch.memory_format
-------------------
.. class:: torch.memory_format
A :class:`torch.memory_format` is an object representing the memory format on which a :class:`torch.Tensor` is
or will be allocated.
Possible values are:
- ``torch.contiguous_format``:
Tensor is or will be allocated in dense non-overlapping memory. Strides are represented by values in decreasing order.
- ``torch.channels_last``:
Tensor is or will be allocated in dense non-overlapping memory. Strides are represented by values in
``strides[0] > strides[2] > strides[3] > strides[1] == 1``, also known as NHWC order.
- ``torch.preserve_format``:
Used in functions like `clone` to preserve the memory format of the input tensor. If the input tensor is
allocated in dense non-overlapping memory, the output tensor strides will be copied from the input.
Otherwise, output strides will follow ``torch.contiguous_format``.
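
To make the two stride orderings concrete, here is a small pure-Python sketch that computes the strides a dense 4-D tensor of shape ``(N, C, H, W)`` would have under each format. The helper names ``contiguous_strides`` and ``channels_last_strides`` are hypothetical and not part of PyTorch; the sketch only illustrates the arithmetic behind the orderings described above.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Strides in decreasing order: the last dimension is densest (stride 1)."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def channels_last_strides(shape: Sequence[int]) -> Tuple[int, int, int, int]:
    """Strides for NHWC memory order, reported in NCHW dimension order.

    In memory the channel index varies fastest, then width, then height,
    then batch.
    """
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


shape = (2, 3, 4, 5)
dense = contiguous_strides(shape)            # (60, 20, 5, 1)
channels_last = channels_last_strides(shape)  # (60, 1, 15, 3)
```

For the example shape, the channels-last result satisfies ``strides[0] > strides[2] > strides[3] > strides[1] == 1``, matching the ordering in the description.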

@ -49,8 +49,10 @@ For reference, heres a full list of view ops in PyTorch:
- Basic slicing and indexing op, e.g. ``tensor[0, 2:, 1:7:2]`` returns a view of base ``tensor``, see note below.
- :meth:`~torch.Tensor.as_strided`
- :meth:`~torch.Tensor.detach`
- :meth:`~torch.Tensor.diagonal`
- :meth:`~torch.Tensor.expand`
- :meth:`~torch.Tensor.expand_as`
- :meth:`~torch.Tensor.narrow`
- :meth:`~torch.Tensor.permute`
- :meth:`~torch.Tensor.select`

@ -296,7 +296,6 @@ view of a storage and defines numeric operations on it.
.. automethod:: hardshrink
.. automethod:: histc
.. automethod:: ifft
.. automethod:: imag
.. automethod:: index_add_
.. automethod:: index_add
.. automethod:: index_copy_
@ -413,7 +412,6 @@ view of a storage and defines numeric operations on it.
:noindex:
.. automethod:: remainder
.. automethod:: remainder_
.. automethod:: real
.. automethod:: renorm
.. automethod:: renorm_
.. automethod:: repeat
@ -495,6 +493,8 @@ view of a storage and defines numeric operations on it.
.. automethod:: tril_
.. automethod:: triu
.. automethod:: triu_
.. automethod:: true_divide
.. automethod:: true_divide_
.. automethod:: trunc
.. automethod:: trunc_
.. automethod:: type

@ -352,10 +352,10 @@ def build_deps():
################################################################################
# the list of runtime dependencies required by this built package
install_requires = []
install_requires = ['future']
if sys.version_info <= (2, 7):
install_requires += ['future', 'typing']
install_requires += ['typing']
missing_pydep = '''
Missing build dependency: Unable to `import {importname}`.
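
A note on the version guard in this hunk: Python compares tuples element by element, and a longer tuple with an equal prefix sorts after the shorter one. The sketch below shows what that means for a full five-field version tuple compared against `(2, 7)`. The helper `needs_typing_backport` is hypothetical and shows one possible way to compare only the major and minor fields; it is not the project's code.

```python
def needs_typing_backport(version_info) -> bool:
    """Return True for Python 2.7.x and older (hypothetical helper).

    Only the (major, minor) prefix is compared, so patch releases such as
    2.7.18 are treated the same as 2.7.0.
    """
    return tuple(version_info[:2]) <= (2, 7)


# A full version tuple in the shape of sys.version_info on Python 2.7.18.
py27 = (2, 7, 18, "final", 0)

# The longer tuple sorts after its own two-element prefix.
full_tuple_guard = py27 <= (2, 7)
prefix_guard = needs_typing_backport(py27)
```

Here `full_tuple_guard` is `False` while `prefix_guard` is `True`, so a guard written against the full tuple would not fire on a 2.7.x interpreter.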

@ -21,100 +21,109 @@ white_list = [
# We export some functions and classes for test_jit.py directly from libtorch.so,
# it's not important to have BC for them
('_TorchScriptTesting.*', datetime.date(9999, 1, 1)),
('aten::pop*', datetime.date(2020, 4, 1)),
('aten::insert*', datetime.date(2020, 4, 1)),
('aten::Delete*', datetime.date(2020, 4, 1)),
('aten::clear*', datetime.date(2020, 4, 1)),
('aten::_set_item*', datetime.date(2020, 4, 1)),
('aten::copy*', datetime.date(2020, 4, 1)),
('aten::extend*', datetime.date(2020, 4, 1)),
('aten::reverse*', datetime.date(2020, 4, 1)),
('aten::append*', datetime.date(2020, 4, 1)),
('aten::list*', datetime.date(2020, 4, 1)),
('aten::__getitem__*', datetime.date(2020, 4, 1)),
('aten::len*', datetime.date(2020, 4, 1)),
('aten::mul_*', datetime.date(2020, 4, 1)),
('aten::slice*', datetime.date(2020, 4, 1)),
('aten::add*', datetime.date(2020, 4, 1)),
('aten::mul*', datetime.date(2020, 4, 1)),
('aten::select*', datetime.date(2020, 4, 1)),
('aten::add_*', datetime.date(2020, 4, 1)),
# _like default change, see https://github.com/pytorch/pytorch/issues/33580
('aten::randn_like', datetime.date(2020, 3, 15)),
('aten::full_like', datetime.date(2020, 3, 15)),
('aten::empty_like', datetime.date(2020, 3, 15)),
('aten::rand_like', datetime.date(2020, 3, 15)),
('aten::ones_like', datetime.date(2020, 3, 15)),
('aten::randint_like', datetime.date(2020, 3, 15)),
('aten::zeros_like', datetime.date(2020, 3, 15)),
('aten::floor_divide', datetime.date(2020, 4, 1)),
('aten::Bool', datetime.date(2020, 4, 1)),
('aten::Float', datetime.date(2020, 4, 1)),
('aten::to', datetime.date(2020, 4, 1)),
('aten::backward', datetime.date(2020, 4, 1)),
('aten::len', datetime.date(2020, 4, 1)),
('aten::remove', datetime.date(2020, 4, 1)),
('aten::index', datetime.date(2020, 4, 1)),
('aten::count', datetime.date(2020, 4, 1)),
('aten::__contains__', datetime.date(2020, 4, 1)),
('aten::sort', datetime.date(2020, 4, 1)),
('aten::sorted', datetime.date(2020, 4, 1)),
('aten::eq', datetime.date(2020, 4, 1)),
('aten::ne', datetime.date(2020, 4, 1)),
('aten::lt', datetime.date(2020, 4, 1)),
('aten::gt', datetime.date(2020, 4, 1)),
('aten::le', datetime.date(2020, 4, 1)),
('aten::ge', datetime.date(2020, 4, 1)),
('aten::divmod', datetime.date(2020, 4, 1)),
('aten::__upsample_bilinear', datetime.date(2020, 4, 1)),
('aten::__upsample', datetime.date(2020, 4, 1)),
('aten::__upsample_nearest', datetime.date(2020, 4, 1)),
('aten::__interpolate', datetime.date(2020, 4, 1)),
('aten::fabs', datetime.date(2020, 4, 1)),
('aten::gamma', datetime.date(2020, 4, 1)),
('prim::abs', datetime.date(2020, 4, 1)),
('aten::factorial', datetime.date(2020, 4, 1)),
('aten::radians', datetime.date(2020, 4, 1)),
('aten::degrees', datetime.date(2020, 4, 1)),
('prim::acosh', datetime.date(2020, 4, 1)),
('prim::atanh', datetime.date(2020, 4, 1)),
('aten::asinh', datetime.date(2020, 4, 1)),
('aten::floordiv', datetime.date(2020, 4, 1)),
('prim::NumToTensor', datetime.date(2020, 4, 1)),
('aten::sin', datetime.date(2020, 4, 1)),
('aten::round', datetime.date(2020, 4, 1)),
('aten::remainder', datetime.date(2020, 4, 1)),
('aten::isfinite', datetime.date(2020, 4, 1)),
('aten::sub', datetime.date(2020, 4, 1)),
('aten::sqrt', datetime.date(2020, 4, 1)),
('aten::log1p', datetime.date(2020, 4, 1)),
('aten::acos', datetime.date(2020, 4, 1)),
('aten::floor', datetime.date(2020, 4, 1)),
('aten::exp', datetime.date(2020, 4, 1)),
('aten::tan', datetime.date(2020, 4, 1)),
('aten::sinh', datetime.date(2020, 4, 1)),
('aten::ceil', datetime.date(2020, 4, 1)),
('aten::atan', datetime.date(2020, 4, 1)),
('aten::erf', datetime.date(2020, 4, 1)),
('aten::erfc', datetime.date(2020, 4, 1)),
('aten::cosh', datetime.date(2020, 4, 1)),
('aten::expm1', datetime.date(2020, 4, 1)),
('aten::isinf', datetime.date(2020, 4, 1)),
('aten::lgamma', datetime.date(2020, 4, 1)),
('aten::asin', datetime.date(2020, 4, 1)),
('aten::log', datetime.date(2020, 4, 1)),
('aten::log10', datetime.date(2020, 4, 1)),
('aten::cos', datetime.date(2020, 4, 1)),
('aten::tanh', datetime.date(2020, 4, 1)),
('prim::min', datetime.date(2020, 4, 1)),
('prim::max', datetime.date(2020, 4, 1)),
('aten::_linear_packed', datetime.date(2020, 4, 1)),
('aten::_linear_prepack', datetime.date(2020, 4, 1)),
('aten::_conv2d_packed', datetime.date(2020, 4, 1)),
('aten::_conv2d_prepack', datetime.date(2020, 4, 1)),
('aten::confirmed_by_owner', datetime.date(2020, 3, 17)),
('aten::owner', datetime.date(2020, 3, 27)),
('aten::owner_name', datetime.date(2020, 3, 27)),
('_caffe2', datetime.date(9999, 1, 1)),
('_aten', datetime.date(9999, 1, 1)),
('prim::', datetime.date(9999, 1, 1)),
('onnx::', datetime.date(9999, 1, 1)),
('aten::_set_item', datetime.date(9999, 1, 1)),
('aten::setdefault', datetime.date(9999, 1, 1)),
('aten::_test_optional_float', datetime.date(9999, 1, 1)),
('aten::__upsample', datetime.date(9999, 1, 1)),
('aten::__interpolate', datetime.date(9999, 1, 1)),
('aten::divmod', datetime.date(9999, 1, 1)),
('aten::fabs', datetime.date(9999, 1, 1)),
('aten::gamma', datetime.date(9999, 1, 1)),
('aten::abs', datetime.date(9999, 1, 1)),
('aten::isinf', datetime.date(9999, 1, 1)),
('aten::factorial', datetime.date(9999, 1, 1)),
('aten::radians', datetime.date(9999, 1, 1)),
('aten::degrees', datetime.date(9999, 1, 1)),
('aten::acosh', datetime.date(9999, 1, 1)),
('aten::atanh', datetime.date(9999, 1, 1)),
('aten::asinh', datetime.date(9999, 1, 1)),
('aten::floordiv', datetime.date(9999, 1, 1)),
('aten::sorted', datetime.date(9999, 1, 1)),
('aten::__contains__', datetime.date(9999, 1, 1)),
('aten::count', datetime.date(9999, 1, 1)),
('aten::remove', datetime.date(9999, 1, 1)),
('aten::pop', datetime.date(9999, 1, 1)),
('aten::insert', datetime.date(9999, 1, 1)),
('aten::clear', datetime.date(9999, 1, 1)),
('aten::copy', datetime.date(9999, 1, 1)),
('aten::extend', datetime.date(9999, 1, 1)),
('aten::reverse', datetime.date(9999, 1, 1)),
('aten::append', datetime.date(9999, 1, 1)),
('aten::list', datetime.date(9999, 1, 1)),
('aten::__getitem__', datetime.date(9999, 1, 1)),
('aten::len', datetime.date(9999, 1, 1)),
('aten::backward', datetime.date(9999, 1, 1)),
('aten::Float', datetime.date(9999, 1, 1)),
('aten::Int', datetime.date(9999, 1, 1)),
('aten::Bool', datetime.date(9999, 1, 1)),
('aten::_ncf_view', datetime.date(9999, 1, 1)),
('aten::_ncf_unsqueeze', datetime.date(9999, 1, 1)),
('quantized::mul_scalar_relu_out', datetime.date(9999, 1, 1)),
('quantized::mul_scalar_out', datetime.date(9999, 1, 1)),
('quantized::mul_relu_out', datetime.date(9999, 1, 1)),
('quantized::mul_out', datetime.date(9999, 1, 1)),
('aten::tan', datetime.date(9999, 1, 1)),
('aten::sub', datetime.date(9999, 1, 1)),
('aten::sqrt', datetime.date(9999, 1, 1)),
('aten::sort', datetime.date(9999, 1, 1)),
('aten::slice', datetime.date(9999, 1, 1)),
('aten::sinh', datetime.date(9999, 1, 1)),
('aten::sin', datetime.date(9999, 1, 1)),
('aten::round', datetime.date(9999, 1, 1)),
('aten::remainder', datetime.date(9999, 1, 1)),
('aten::full_like', datetime.date(9999, 1, 1)),
('aten::real', datetime.date(9999, 1, 1)),
('aten::randn_like', datetime.date(9999, 1, 1)),
('aten::pow', datetime.date(9999, 1, 1)),
('aten::floor', datetime.date(9999, 1, 1)),
('quantized::cat_relu_out', datetime.date(9999, 1, 1)),
('quantized::cat_out', datetime.date(9999, 1, 1)),
('aten::neg', datetime.date(9999, 1, 1)),
('quantized::add_out', datetime.date(9999, 1, 1)),
('aten::expm1', datetime.date(9999, 1, 1)),
('aten::ceil', datetime.date(9999, 1, 1)),
('aten::add', datetime.date(9999, 1, 1)),
('aten::acos', datetime.date(9999, 1, 1)),
('aten::cudnn_convolution', datetime.date(9999, 1, 1)),
('aten::cudnn_convolution_backward', datetime.date(9999, 1, 1)),
('aten::cudnn_convolution_transpose', datetime.date(9999, 1, 1)),
('aten::cudnn_convolution_transpose_backward', datetime.date(9999, 1, 1)),
('aten::cudnn_convolution_backward_bias', datetime.date(9999, 1, 1)),
('aten::cudnn_convolution_transpose_backward_bias', datetime.date(9999, 1, 1)),
('aten::atan', datetime.date(9999, 1, 1)),
('aten::log10', datetime.date(9999, 1, 1)),
('quantized::add_scalar_out', datetime.date(9999, 1, 1)),
('quantized::add_scalar_relu_out', datetime.date(9999, 1, 1)),
('quantized::add_relu_out', datetime.date(9999, 1, 1)),
('aten::exp', datetime.date(9999, 1, 1)),
('aten::cosh', datetime.date(9999, 1, 1)),
('aten::erf', datetime.date(9999, 1, 1)),
('aten::imag', datetime.date(9999, 1, 1)),
('aten::empty_like', datetime.date(9999, 1, 1)),
('aten::eq', datetime.date(9999, 1, 1)),
('aten::index', datetime.date(9999, 1, 1)),
('aten::isfinite', datetime.date(9999, 1, 1)),
('aten::leaky_relu_backward', datetime.date(9999, 1, 1)),
('aten::lgamma', datetime.date(9999, 1, 1)),
('aten::log1p', datetime.date(9999, 1, 1)),
('aten::asin', datetime.date(9999, 1, 1)),
('aten::cos', datetime.date(9999, 1, 1)),
('aten::log', datetime.date(9999, 1, 1)),
('aten::mul', datetime.date(9999, 1, 1)),
('aten::ne', datetime.date(9999, 1, 1)),
('aten::rand_like', datetime.date(9999, 1, 1)),
('aten::randint_like', datetime.date(9999, 1, 1)),
('aten::rrelu_with_noise_backward', datetime.date(9999, 1, 1)),
('aten::select', datetime.date(9999, 1, 1)),
('aten::tanh', datetime.date(9999, 1, 1)),
('aten::add_', datetime.date(9999, 1, 1)),
('aten::ones_like', datetime.date(9999, 1, 1)),
('aten::to', datetime.date(9999, 1, 1)),
('aten::zeros_like', datetime.date(9999, 1, 1)),
]
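
Each `white_list` entry pairs a name pattern with an expiry date, and `datetime.date(9999, 1, 1)` effectively means "never expires". The hunk does not show how the list is consulted, so the following is only a sketch under stated assumptions: the function `is_allowed`, the use of `re.search`, and the rule that an entry stops applying once its date has passed are all guesses made for illustration.

```python
import datetime
import re

# Two entries in the same (pattern, expiry date) shape as the list above.
allow_list = [
    ("_TorchScriptTesting.*", datetime.date(9999, 1, 1)),
    ("aten::randn_like", datetime.date(2020, 3, 15)),
]


def is_allowed(schema_name: str, today: datetime.date) -> bool:
    """Return True if an unexpired entry matches the schema name (assumed semantics)."""
    for pattern, expiry in allow_list:
        if expiry < today:
            continue  # Assumed: an expired entry no longer exempts anything.
        if re.search(pattern, schema_name):
            return True
    return False
```

Passing `today` as an argument rather than calling `datetime.date.today()` inside the function keeps the check deterministic and easy to test.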
@ -162,6 +171,15 @@ def check_bc(new_schema_dict):
return is_bc
blacklist = [
"torch.classes",
"Any",
"RRef",
"aten::setdefault",
"aten::_set_item",
]
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument(
@ -176,6 +194,9 @@ if __name__ == '__main__':
line = f.readline()
if not line:
break
if any(w for w in blacklist if w in line):
# TODO Fix type __torch__.torch.classes.xxx
continue
s = parse_schema(line.strip())
slist = new_schema_dict.get(s.name, [])
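
The `blacklist` filter added in this hunk skips any schema line that contains one of the listed substrings before the line reaches `parse_schema`. A standalone sketch of the same test follows; the helper name `keep_schema_line` and the sample schema strings in the usage note are made up for illustration.

```python
blacklist = [
    "torch.classes",
    "Any",
    "RRef",
    "aten::setdefault",
    "aten::_set_item",
]


def keep_schema_line(line: str) -> bool:
    """Return False when the line contains any blacklisted substring."""
    return not any(word in line for word in blacklist)
```

Because this is plain substring matching, an entry such as `"Any"` also skips lines where those letters appear inside a longer identifier. For example, `keep_schema_line("aten::relu(Tensor self) -> Tensor")` is `True`, while a line mentioning `__torch__.torch.classes.Foo` is skipped.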

@ -293,7 +293,7 @@ TEST_F(FunctionalTest, MultiLabelSoftMarginLossWeightedNoReduction) {
auto input = torch::tensor({{0., 2., 2., 0.}, {2., 1., 0., 1.}}, torch::dtype(torch::kFloat).requires_grad(true));
auto target = torch::tensor({{0., 0., 1., 0.}, {1., 0., 1., 1.}}, torch::kFloat);
auto weight = torch::tensor({0.1, 0.6, 0.4, 0.8}, torch::kFloat);
auto options = F::MultiLabelSoftMarginLossFuncOptions().reduction(torch::kNone).weight(weight);
auto options = F::MultilabelSoftMarginLossFuncOptions().reduction(torch::kNone).weight(weight);
auto output =
F::multilabel_soft_margin_loss(input, target, options);
auto expected = torch::tensor({0.4876902, 0.3321295}, torch::kFloat);
@ -1875,7 +1875,7 @@ TEST_F(FunctionalTest, Interpolate) {
// 1D interpolation
auto input = torch::ones({1, 1, 2});
auto options = F::InterpolateFuncOptions()
.size({4})
.size(std::vector<int64_t>({4}))
.mode(torch::kNearest);
auto output = F::interpolate(input, options);
auto expected = torch::ones({1, 1, 4});
@ -1889,7 +1889,7 @@ TEST_F(FunctionalTest, Interpolate) {
for (const auto scale_factor : {0.5, 1.5, 2.0}) {
auto input = torch::ones({1, 1, 2, 2});
auto options = F::InterpolateFuncOptions()
.scale_factor({scale_factor, scale_factor})
.scale_factor(std::vector<double>({scale_factor, scale_factor}))
.mode(torch::kBilinear)
.align_corners(align_corners);
auto output = F::interpolate(input, options);
@ -1908,7 +1908,7 @@ TEST_F(FunctionalTest, Interpolate) {
auto input = torch::ones({1, 1, 2, 2, 2});
auto options =
F::InterpolateFuncOptions()
.scale_factor({scale_factor, scale_factor, scale_factor})
.scale_factor(std::vector<double>({scale_factor, scale_factor, scale_factor}))
.mode(torch::kTrilinear)
.align_corners(align_corners);
auto output = F::interpolate(input, options);
@ -1924,13 +1924,13 @@ TEST_F(FunctionalTest, Interpolate) {
{
auto input = torch::randn({3, 2, 2});
ASSERT_THROWS_WITH(
F::interpolate(input[0], F::InterpolateFuncOptions().size({4, 4})),
F::interpolate(input[0], F::InterpolateFuncOptions().size(std::vector<int64_t>({4, 4}))),
"Input Error: Only 3D, 4D and 5D input Tensors supported (got 2D) "
"for the modes: nearest | linear | bilinear | bicubic | trilinear (got kNearest)");
ASSERT_THROWS_WITH(
F::interpolate(
torch::reshape(input, {1, 1, 1, 3, 2, 2}),
F::InterpolateFuncOptions().size({1, 1, 1, 3, 4, 4})),
F::InterpolateFuncOptions().size(std::vector<int64_t>({1, 1, 1, 3, 4, 4}))),
"Input Error: Only 3D, 4D and 5D input Tensors supported (got 6D) "
"for the modes: nearest | linear | bilinear | bicubic | trilinear (got kNearest)");
ASSERT_THROWS_WITH(
@ -1939,12 +1939,12 @@ TEST_F(FunctionalTest, Interpolate) {
ASSERT_THROWS_WITH(
F::interpolate(
input,
F::InterpolateFuncOptions().size({3, 4, 4}).scale_factor({0.5})),
F::InterpolateFuncOptions().size(std::vector<int64_t>({3, 4, 4})).scale_factor(std::vector<double>({0.5}))),
"only one of size or scale_factor should be defined");
ASSERT_THROWS_WITH(
F::interpolate(input, F::InterpolateFuncOptions().scale_factor({3, 2})),
F::interpolate(input, F::InterpolateFuncOptions().scale_factor(std::vector<double>({3, 2}))),
"scale_factor shape must match input shape. "
"Input is 1D, scale_factor size is 2");
"Input is 1D, scale_factor size is [3, 2]");
ASSERT_THROWS_WITH(
F::interpolate(
input,
@ -2328,9 +2328,15 @@ TEST_F(FunctionalTest, AlphaDropout) {
auto input_std = input.std();
for (const auto rate : {0.2, 0.5, 0.8}) {
auto output = F::alpha_dropout(input, F::AlphaDropoutFuncOptions().p(rate).training(false));
for (const auto inplace : {false, true}) {
auto input_ = input.clone();
auto output = F::alpha_dropout(input_, F::AlphaDropoutFuncOptions().p(rate).training(false).inplace(inplace));
ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));
ASSERT_TRUE(torch::allclose(input_std, output.std(), 0.1));
if (inplace) {
ASSERT_TRUE(torch::allclose(input_, output));
}
}
}
auto output = F::detail::alpha_dropout(input, 0.5, false, false);
ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));
@ -2343,9 +2349,15 @@ TEST_F(FunctionalTest, FeatureAlphaDropout) {
auto input_std = input.std();
for (const auto rate : {0.2, 0.5, 0.8}) {
auto output = F::feature_alpha_dropout(input, F::FeatureAlphaDropoutFuncOptions().p(rate).training(false));
for (const auto inplace : {false, true}) {
auto input_ = input.clone();
auto output = F::feature_alpha_dropout(input_, F::FeatureAlphaDropoutFuncOptions().p(rate).training(false).inplace(inplace));
ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));
ASSERT_TRUE(torch::allclose(input_std, output.std(), 0.1));
if (inplace) {
ASSERT_TRUE(torch::allclose(input_, output));
}
}
}
auto output = F::feature_alpha_dropout(input);
ASSERT_TRUE(torch::allclose(input_mean, output.mean(), 0.1));


@ -1300,54 +1300,81 @@ TEST_F(ModulesTest, FeatureAlphaDropout) {
}
TEST_F(ModulesTest, Dropout) {
Dropout dropout(0.5);
torch::Tensor x = torch::ones(100, torch::requires_grad());
for (const auto inplace : {false, true}) {
Dropout dropout(DropoutOptions(0.5).inplace(inplace));
torch::Tensor x = torch::ones(100);
if (!inplace) {
x.requires_grad_(true);
}
torch::Tensor y = dropout(x);
y.backward(torch::ones_like(y));
ASSERT_EQ(y.ndimension(), 1);
ASSERT_EQ(y.size(0), 100);
ASSERT_LT(y.sum().item<float>(), 130); // Probably
ASSERT_GT(y.sum().item<float>(), 70); // Probably
if (inplace) {
ASSERT_TRUE(y.allclose(x));
} else {
y.backward(torch::ones_like(y));
}
dropout->eval();
y = dropout(x);
y = dropout(torch::ones(100));
ASSERT_EQ(y.sum().item<float>(), 100);
}
}
TEST_F(ModulesTest, Dropout2d) {
Dropout2d dropout(0.5);
torch::Tensor x = torch::ones({10, 10}, torch::requires_grad());
for (const auto inplace : {false, true}) {
Dropout2d dropout(Dropout2dOptions(0.5).inplace(inplace));
torch::Tensor x = torch::ones({10, 10});
if (!inplace) {
x.requires_grad_(true);
}
torch::Tensor y = dropout(x);
y.backward(torch::ones_like(y));
ASSERT_EQ(y.ndimension(), 2);
ASSERT_EQ(y.size(0), 10);
ASSERT_EQ(y.size(1), 10);
ASSERT_LT(y.sum().item<float>(), 130); // Probably
ASSERT_GT(y.sum().item<float>(), 70); // Probably
if (inplace) {
ASSERT_TRUE(y.allclose(x));
} else {
y.backward(torch::ones_like(y));
}
dropout->eval();
y = dropout(x);
y = dropout(torch::ones({10, 10}));
ASSERT_EQ(y.sum().item<float>(), 100);
}
}
TEST_F(ModulesTest, Dropout3d) {
Dropout3d dropout(0.5);
torch::Tensor x = torch::ones({4, 5, 5}, torch::requires_grad());
for (const auto inplace : {false, true}) {
Dropout3d dropout(Dropout3dOptions(0.5).inplace(inplace));
torch::Tensor x = torch::ones({4, 5, 5});
if (!inplace) {
x.requires_grad_(true);
}
torch::Tensor y = dropout(x);
y.backward(torch::ones_like(y));
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.size(0), 4);
ASSERT_EQ(y.size(1), 5);
ASSERT_EQ(y.size(1), 5);
ASSERT_LT(y.sum().item<float>(), 130); // Probably
ASSERT_GT(y.sum().item<float>(), 70); // Probably
if (inplace) {
ASSERT_TRUE(y.allclose(x));
} else {
y.backward(torch::ones_like(y));
}
dropout->eval();
y = dropout(x);
y = dropout(torch::ones({4, 5, 5}));
ASSERT_EQ(y.sum().item<float>(), 100);
}
}
TEST_F(ModulesTest, Parameters) {
@ -2147,38 +2174,58 @@ TEST_F(ModulesTest, PairwiseDistance) {
TEST_F(ModulesTest, ELU) {
const auto size = 3;
for (const auto alpha : {0.0, 0.42, 1.0, 4.2, 42.42}) {
ELU model {ELUOptions().alpha(alpha)};
for (const auto inplace : {false, true}) {
ELU model {ELUOptions().alpha(alpha).inplace(inplace)};
auto x = torch::linspace(-10.0, 10.0, size * size * size);
x.resize_({size, size, size}).set_requires_grad(true);
x.resize_({size, size, size});
if (!inplace) {
x.requires_grad_(true);
}
auto x_orig = x.clone();
auto y = model(x);
torch::Tensor s = y.sum();
s.backward();
ASSERT_EQ(s.ndimension(), 0);
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
auto y_exp = torch::max(torch::zeros_like(x), x) +
torch::min(torch::zeros_like(x), alpha * (torch::exp(x) - 1.0));
auto y_exp = torch::max(torch::zeros_like(x_orig), x_orig) +
torch::min(torch::zeros_like(x_orig), alpha * (torch::exp(x_orig) - 1.0));
ASSERT_TRUE(torch::allclose(y, y_exp));
if (inplace) {
ASSERT_TRUE(torch::allclose(x, y_exp));
} else {
s.backward();
}
}
}
}
TEST_F(ModulesTest, SELU) {
SELU model;
auto input = torch::randn({5, 5}, torch::requires_grad());
for (const auto inplace : {false, true}) {
SELU model(inplace);
auto input = torch::randn({5, 5});
if (!inplace) {
input.requires_grad_(true);
}
auto input_orig = input.clone();
auto output = model->forward(input);
const double scale = 1.0507009873554804934193349852946;
const double alpha = 1.6732632423543772848170429916717;
auto zero = torch::zeros_like(input);
auto expected = scale *
(torch::max(zero, input) +
torch::min(zero, alpha * (torch::exp(input) - 1)));
(torch::max(zero, input_orig) +
torch::min(zero, alpha * (torch::exp(input_orig) - 1)));
auto s = output.sum();
s.backward();
ASSERT_EQ(s.ndimension(), 0);
ASSERT_TRUE(output.allclose(expected));
if (inplace) {
ASSERT_TRUE(input.allclose(expected));
} else {
s.backward();
}
}
}
TEST_F(ModulesTest, Hardshrink) {
@ -2192,7 +2239,6 @@ TEST_F(ModulesTest, Hardshrink) {
s.backward();
ASSERT_EQ(s.ndimension(), 0);
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
auto y_exp = (x.abs() > lambda) * x;
@ -2204,21 +2250,30 @@ TEST_F(ModulesTest, Hardtanh) {
const auto size = 3;
for (const auto min_val : {-4.2, -1.0, -0.42, 0.0}) {
for (const auto max_val : {0.42, 1.0, 4.2}) {
Hardtanh model {HardtanhOptions().min_val(min_val).max_val(max_val)};
for (const auto inplace : {false, true}) {
Hardtanh model {HardtanhOptions().min_val(min_val).max_val(max_val).inplace(inplace)};
auto x = torch::linspace(-10.0, 10.0, size * size * size);
x.resize_({size, size, size}).set_requires_grad(true);
x.resize_({size, size, size});
if (!inplace) {
x.requires_grad_(true);
}
auto x_orig = x.clone();
auto y = model(x);
torch::Tensor s = y.sum();
s.backward();
ASSERT_EQ(s.ndimension(), 0);
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
auto y_exp = (x < min_val) * min_val +
((x >= min_val) * (x <= max_val)) * x +
(x > max_val) * max_val;
auto y_exp = (x_orig < min_val) * min_val +
((x_orig >= min_val) * (x_orig <= max_val)) * x_orig +
(x_orig > max_val) * max_val;
ASSERT_TRUE(torch::allclose(y, y_exp));
if (inplace) {
ASSERT_TRUE(torch::allclose(x, y_exp));
} else {
s.backward();
}
}
}
}
}
@ -2238,20 +2293,29 @@ TEST_F(ModulesTest, HardtanhMinValGEMaxVal) {
TEST_F(ModulesTest, LeakyReLU) {
const auto size = 3;
for (const auto inplace : {false, true}) {
for (const auto negative_slope : {0.0, 0.42, 1.0}) {
LeakyReLU model {LeakyReLUOptions().negative_slope(negative_slope)};
LeakyReLU model {LeakyReLUOptions().negative_slope(negative_slope).inplace(inplace)};
auto x = torch::linspace(-10.0, 10.0, size * size * size);
x.resize_({size, size, size}).set_requires_grad(true);
x.resize_({size, size, size});
if (!inplace) {
x.requires_grad_(true);
}
auto x_orig = x.clone();
auto y = model(x);
torch::Tensor s = y.sum();
s.backward();
ASSERT_EQ(s.ndimension(), 0);
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
auto y_exp = (x < 0) * x * negative_slope + (x >= 0) * x;
auto y_exp = (x_orig < 0) * x_orig * negative_slope + (x_orig >= 0) * x_orig;
ASSERT_TRUE(torch::allclose(y, y_exp));
if (inplace) {
ASSERT_TRUE(torch::allclose(x, y_exp));
} else {
s.backward();
}
}
}
}
@ -2394,78 +2458,114 @@ TEST_F(ModulesTest, PReLU) {
}
TEST_F(ModulesTest, ReLU) {
for (const auto inplace : {false, true}) {
const auto size = 3;
ReLU model;
ReLU model(inplace);
auto x = torch::linspace(-10.0, 10.0, size * size * size);
x.resize_({size, size, size}).set_requires_grad(true);
x.resize_({size, size, size});
if (!inplace) {
x.requires_grad_(true);
}
auto x_orig = x.clone();
auto y = model(x);
torch::Tensor s = y.sum();
s.backward();
ASSERT_EQ(s.ndimension(), 0);
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
auto y_exp = (x < 0) * 0 + (x >= 0) * x;
auto y_exp = (x_orig < 0) * 0 + (x_orig >= 0) * x_orig;
ASSERT_TRUE(torch::allclose(y, y_exp));
if (inplace) {
ASSERT_TRUE(torch::allclose(x, y_exp));
} else {
s.backward();
}
}
}
TEST_F(ModulesTest, ReLU6) {
for (const auto inplace : {false, true}) {
const auto size = 3;
ReLU6 model;
ReLU6 model(inplace);
auto x = torch::linspace(-10.0, 10.0, size * size * size);
x.resize_({size, size, size}).set_requires_grad(true);
x.resize_({size, size, size});
if (!inplace) {
x.requires_grad_(true);
}
auto x_orig = x.clone();
auto y = model(x);
torch::Tensor s = y.sum();
s.backward();
ASSERT_EQ(s.ndimension(), 0);
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
auto y_exp = (x < 0) * 0 + ((x >= 0) * (x <= 6)) * x + (x > 6) * 6;
auto y_exp = (x_orig < 0) * 0 + ((x_orig >= 0) * (x_orig <= 6)) * x_orig + (x_orig > 6) * 6;
ASSERT_TRUE(torch::allclose(y, y_exp));
if (inplace) {
ASSERT_TRUE(torch::allclose(x, y_exp));
} else {
s.backward();
}
}
}
TEST_F(ModulesTest, RReLU) {
const auto size = 3;
for (const auto lower : {0.01, 0.1, 0.2}) {
for (const auto upper : {0.3, 0.4, 0.5}) {
RReLU model {RReLUOptions().lower(lower).upper(upper)};
for (const auto inplace : {false, true}) {
RReLU model {RReLUOptions().lower(lower).upper(upper).inplace(inplace)};
auto x = torch::linspace(-10.0, 10.0, size * size * size);
x.resize_({size, size, size}).set_requires_grad(true);
x.resize_({size, size, size});
if (!inplace) {
x.requires_grad_(true);
}
auto x_orig = x.clone();
auto y = model(x);
torch::Tensor s = y.sum();
s.backward();
ASSERT_EQ(s.ndimension(), 0);
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
auto z = ((x >= 0) * (x == y) +
(x < 0) * (y >= x * upper) * (y <= lower * x)) * 1.0;
auto z = ((x_orig >= 0) * (x_orig == y) +
(x_orig < 0) * (y >= x_orig * upper) * (y <= lower * x_orig)) * 1.0;
ASSERT_TRUE(torch::allclose(z, torch::ones_like(z)));
if (inplace) {
ASSERT_TRUE(torch::allclose(x, y));
} else {
s.backward();
}
}
}
}
}
TEST_F(ModulesTest, CELU) {
const auto size = 3;
for (const auto inplace : {false, true}) {
for (const auto alpha : {0.42, 1.0, 4.2, 42.42}) {
CELU model {CELUOptions().alpha(alpha)};
CELU model {CELUOptions().alpha(alpha).inplace(inplace)};
auto x = torch::linspace(-10.0, 10.0, size * size * size);
x.resize_({size, size, size}).set_requires_grad(true);
x.resize_({size, size, size});
if (!inplace) {
x.requires_grad_(true);
}
auto x_orig = x.clone();
auto y = model(x);
torch::Tensor s = y.sum();
s.backward();
ASSERT_EQ(s.ndimension(), 0);
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
auto y_exp = torch::max(torch::zeros_like(x), x) +
torch::min(torch::zeros_like(x), alpha * (torch::exp(x / alpha) - 1.0));
auto y_exp = torch::max(torch::zeros_like(x_orig), x_orig) +
torch::min(torch::zeros_like(x_orig), alpha * (torch::exp(x_orig / alpha) - 1.0));
ASSERT_TRUE(torch::allclose(y, y_exp));
if (inplace) {
ASSERT_TRUE(torch::allclose(x, y_exp));
} else {
s.backward();
}
}
}
}
@ -2597,12 +2697,16 @@ TEST_F(ModulesTest, Threshold) {
Threshold model {ThresholdOptions(threshold, value).inplace(inplace)};
auto x = torch::linspace(-3.0, 3.0, 61);
x.resize_({size, size, size});
auto y_exp = (x <= threshold) * value + (x > threshold) * x;
auto x_orig = x.clone();
auto y_exp = (x_orig <= threshold) * value + (x_orig > threshold) * x_orig;
auto y = model(x);
ASSERT_EQ(y.ndimension(), 3);
ASSERT_EQ(y.sizes(), std::vector<int64_t>({size, size, size}));
ASSERT_TRUE(torch::allclose(y, y_exp));
if (inplace) {
ASSERT_TRUE(torch::allclose(x, y_exp));
}
}
}
}
@ -2611,7 +2715,7 @@ TEST_F(ModulesTest, Threshold) {
TEST_F(ModulesTest, Upsampling1D) {
{
Upsample model(UpsampleOptions()
.size({4})
.size(std::vector<int64_t>({4}))
.mode(torch::kNearest));
auto input = torch::ones({1, 1, 2}, torch::requires_grad());
auto output = model->forward(input);
@ -2627,7 +2731,7 @@ TEST_F(ModulesTest, Upsampling1D) {
// test float scale factor up & down sampling
for (const auto scale_factor : {0.5, 1.5, 2.0}) {
Upsample model(UpsampleOptions()
.scale_factor({scale_factor})
.scale_factor(std::vector<double>({scale_factor}))
.mode(torch::kLinear)
.align_corners(align_corners));
auto input = torch::ones({1, 1, 2}, torch::requires_grad());
@ -2646,7 +2750,7 @@ TEST_F(ModulesTest, Upsampling1D) {
{
// linear (1D) upsampling spatial invariance
Upsample model(UpsampleOptions()
.scale_factor({3})
.scale_factor(std::vector<double>({3}))
.mode(torch::kLinear)
.align_corners(false));
auto input = torch::zeros({1, 1, 9});
@ -2661,7 +2765,7 @@ TEST_F(ModulesTest, Upsampling1D) {
TEST_F(ModulesTest, Upsampling2D) {
{
Upsample model(UpsampleOptions()
.size({4, 4})
.size(std::vector<int64_t>({4, 4}))
.mode(torch::kNearest));
auto input = torch::ones({1, 1, 2, 2}, torch::requires_grad());
auto output = model->forward(input);
@ -2677,7 +2781,7 @@ TEST_F(ModulesTest, Upsampling2D) {
// test float scale factor up & down sampling
for (const auto scale_factor : {0.5, 1.5, 2.0}) {
Upsample model(UpsampleOptions()
.scale_factor({scale_factor, scale_factor})
.scale_factor(std::vector<double>({scale_factor, scale_factor}))
.mode(torch::kBilinear)
.align_corners(align_corners));
auto input = torch::ones({1, 1, 2, 2}, torch::requires_grad());
@ -2698,7 +2802,7 @@ TEST_F(ModulesTest, Upsampling2D) {
// test float scale factor up & down sampling
for (const auto scale_factor : {0.5, 1.5, 2.0}) {
Upsample model(UpsampleOptions()
.scale_factor({scale_factor, scale_factor})
.scale_factor(std::vector<double>({scale_factor, scale_factor}))
.mode(torch::kBicubic)
.align_corners(align_corners));
auto input = torch::ones({1, 1, 2, 2}, torch::requires_grad());
@ -2719,7 +2823,7 @@ TEST_F(ModulesTest, Upsampling2D) {
TEST_F(ModulesTest, Upsampling3D) {
{
Upsample model(UpsampleOptions()
.size({4, 4, 4})
.size(std::vector<int64_t>({4, 4, 4}))
.mode(torch::kNearest));
auto input = torch::ones({1, 1, 2, 2, 2}, torch::requires_grad());
auto output = model->forward(input);
@ -2736,7 +2840,7 @@ TEST_F(ModulesTest, Upsampling3D) {
for (const auto scale_factor : {0.5, 1.5, 2.0}) {
Upsample model(
UpsampleOptions()
.scale_factor({scale_factor, scale_factor, scale_factor})
.scale_factor(std::vector<double>({scale_factor, scale_factor, scale_factor}))
.mode(torch::kTrilinear)
.align_corners(align_corners));
auto input = torch::ones({1, 1, 2, 2, 2}, torch::requires_grad());
@ -3876,10 +3980,10 @@ TEST_F(ModulesTest, PrettyPrintConvTranspose) {
TEST_F(ModulesTest, PrettyPrintUpsample) {
ASSERT_EQ(
c10::str(Upsample(UpsampleOptions().size({2, 4, 4}))),
c10::str(Upsample(UpsampleOptions().size(std::vector<int64_t>({2, 4, 4})))),
"torch::nn::Upsample(size=[2, 4, 4], mode=kNearest)");
ASSERT_EQ(
c10::str(Upsample(UpsampleOptions().scale_factor({0.5, 1.5}).mode(torch::kBilinear))),
c10::str(Upsample(UpsampleOptions().scale_factor(std::vector<double>({0.5, 1.5})).mode(torch::kBilinear))),
"torch::nn::Upsample(scale_factor=[0.5, 1.5], mode=kBilinear)");
}
@ -3987,15 +4091,27 @@ TEST_F(ModulesTest, PrettyPrintAdaptiveMaxPool) {
c10::str(AdaptiveMaxPool2d(5)),
"torch::nn::AdaptiveMaxPool2d(output_size=[5, 5])");
ASSERT_EQ(
c10::str(AdaptiveMaxPool2d(std::vector<int64_t>{5, 6})),
c10::str(AdaptiveMaxPool2d(AdaptiveMaxPool2dOptions({5, 6}))),
"torch::nn::AdaptiveMaxPool2d(output_size=[5, 6])");
ASSERT_EQ(
c10::str(AdaptiveMaxPool2d(AdaptiveMaxPool2dOptions({5, c10::nullopt}))),
"torch::nn::AdaptiveMaxPool2d(output_size=[5, None])");
ASSERT_EQ(
c10::str(AdaptiveMaxPool2d(AdaptiveMaxPool2dOptions({c10::nullopt, c10::nullopt}))),
"torch::nn::AdaptiveMaxPool2d(output_size=[None, None])");
ASSERT_EQ(
c10::str(AdaptiveMaxPool3d(5)),
"torch::nn::AdaptiveMaxPool3d(output_size=[5, 5, 5])");
ASSERT_EQ(
c10::str(AdaptiveMaxPool3d(std::vector<int64_t>{5, 6, 7})),
c10::str(AdaptiveMaxPool3d(AdaptiveMaxPool3dOptions({5, 6, 7}))),
"torch::nn::AdaptiveMaxPool3d(output_size=[5, 6, 7])");
ASSERT_EQ(
c10::str(AdaptiveMaxPool3d(AdaptiveMaxPool3dOptions({5, c10::nullopt, 7}))),
"torch::nn::AdaptiveMaxPool3d(output_size=[5, None, 7])");
ASSERT_EQ(
c10::str(AdaptiveMaxPool3d(AdaptiveMaxPool3dOptions({c10::nullopt, c10::nullopt, c10::nullopt}))),
"torch::nn::AdaptiveMaxPool3d(output_size=[None, None, None])");
}
TEST_F(ModulesTest, PrettyPrintAdaptiveAvgPool) {
@ -4007,15 +4123,27 @@ TEST_F(ModulesTest, PrettyPrintAdaptiveAvgPool) {
c10::str(AdaptiveAvgPool2d(5)),
"torch::nn::AdaptiveAvgPool2d(output_size=[5, 5])");
ASSERT_EQ(
c10::str(AdaptiveAvgPool2d(std::vector<int64_t>{5, 6})),
c10::str(AdaptiveAvgPool2d(AdaptiveAvgPool2dOptions({5, 6}))),
"torch::nn::AdaptiveAvgPool2d(output_size=[5, 6])");
ASSERT_EQ(
c10::str(AdaptiveAvgPool2d(AdaptiveAvgPool2dOptions({5, c10::nullopt}))),
"torch::nn::AdaptiveAvgPool2d(output_size=[5, None])");
ASSERT_EQ(
c10::str(AdaptiveAvgPool2d(AdaptiveAvgPool2dOptions({c10::nullopt, c10::nullopt}))),
"torch::nn::AdaptiveAvgPool2d(output_size=[None, None])");
ASSERT_EQ(
c10::str(AdaptiveAvgPool3d(5)),
"torch::nn::AdaptiveAvgPool3d(output_size=[5, 5, 5])");
ASSERT_EQ(
c10::str(AdaptiveAvgPool3d(std::vector<int64_t>{5, 6, 7})),
c10::str(AdaptiveAvgPool3d(AdaptiveAvgPool3dOptions({5, 6, 7}))),
"torch::nn::AdaptiveAvgPool3d(output_size=[5, 6, 7])");
ASSERT_EQ(
c10::str(AdaptiveAvgPool3d(AdaptiveAvgPool3dOptions({5, c10::nullopt, 7}))),
"torch::nn::AdaptiveAvgPool3d(output_size=[5, None, 7])");
ASSERT_EQ(
c10::str(AdaptiveAvgPool3d(AdaptiveAvgPool3dOptions({c10::nullopt, c10::nullopt, c10::nullopt}))),
"torch::nn::AdaptiveAvgPool3d(output_size=[None, None, None])");
}
TEST_F(ModulesTest, PrettyPrintMaxUnpool) {


@ -26,7 +26,7 @@ bool test_optimizer_xor(Options options) {
Linear(8, 1),
Functional(torch::sigmoid));
const int64_t kBatchSize = 4;
const int64_t kBatchSize = 200;
const int64_t kMaximumNumberOfEpochs = 3000;
OptimizerClass optimizer(model->parameters(), options);
@ -40,13 +40,21 @@ bool test_optimizer_xor(Options options) {
inputs[i] = torch::randint(2, {2}, torch::kInt64);
labels[i] = inputs[i][0].item<int64_t>() ^ inputs[i][1].item<int64_t>();
}
inputs.set_requires_grad(true);
auto step = [&](OptimizerClass& optimizer, Sequential model, torch::Tensor inputs, torch::Tensor labels) {
auto closure = [&]() {
optimizer.zero_grad();
auto x = model->forward(inputs);
torch::Tensor loss = torch::binary_cross_entropy(x, labels);
auto loss = torch::binary_cross_entropy(x, labels);
loss.backward();
return loss;
};
return optimizer.step(closure);
};
optimizer.step();
torch::Tensor loss = step(optimizer, model, inputs, labels);
running_loss = running_loss * 0.99 + loss.item<float>() * 0.01;
if (epoch > kMaximumNumberOfEpochs) {
@ -166,30 +174,66 @@ TEST(OptimTest, OptimizerAccessors) {
optimizer_.state();
}
TEST(OptimTest, BasicInterface) {
#define OLD_INTERFACE_WARNING_CHECK(func) \
{ \
std::stringstream buffer;\
torch::test::CerrRedirect cerr_redirect(buffer.rdbuf());\
func;\
ASSERT_EQ(\
torch::test::count_substr_occurrences(\
buffer.str(),\
"will be removed"\
),\
1);\
}
struct MyOptimizerOptions : public OptimizerCloneableOptions<MyOptimizerOptions> {
MyOptimizerOptions(double lr = 1.0) : lr_(lr) {};
TORCH_ARG(double, lr) = 1.0;
};
TEST(OptimTest, OldInterface) {
struct MyOptimizer : Optimizer {
using Optimizer::Optimizer;
torch::Tensor step(LossClosure closure = nullptr) override { return {};}
explicit MyOptimizer(
std::vector<at::Tensor> params, MyOptimizerOptions defaults = {}) :
Optimizer({std::move(OptimizerParamGroup(params))}, std::make_unique<MyOptimizerOptions>(defaults)) {}
};
std::vector<torch::Tensor> parameters = {
torch::ones({2, 3}), torch::zeros({2, 3}), torch::rand({2, 3})};
{
MyOptimizer optimizer(parameters);
ASSERT_EQ(optimizer.size(), parameters.size());
size_t size;
OLD_INTERFACE_WARNING_CHECK(size = optimizer.size());
ASSERT_EQ(size, parameters.size());
}
{
MyOptimizer optimizer;
ASSERT_EQ(optimizer.size(), 0);
optimizer.add_parameters(parameters);
ASSERT_EQ(optimizer.size(), parameters.size());
for (size_t p = 0; p < parameters.size(); ++p) {
ASSERT_TRUE(optimizer.parameters()[p].allclose(parameters[p]));
std::vector<at::Tensor> params;
MyOptimizer optimizer(params);
size_t size;
OLD_INTERFACE_WARNING_CHECK(size = optimizer.size());
ASSERT_EQ(size, 0);
OLD_INTERFACE_WARNING_CHECK(optimizer.add_parameters(parameters));
OLD_INTERFACE_WARNING_CHECK(size = optimizer.size());
ASSERT_EQ(size, parameters.size());
std::vector<torch::Tensor> params_;
OLD_INTERFACE_WARNING_CHECK(params_ = optimizer.parameters());
for (size_t p = 0; p < size; ++p) {
ASSERT_TRUE(params_[p].allclose(parameters[p]));
}
}
{
Linear linear(3, 4);
MyOptimizer optimizer(linear->parameters());
ASSERT_EQ(optimizer.size(), linear->parameters().size());
size_t size;
OLD_INTERFACE_WARNING_CHECK(size = optimizer.size());
ASSERT_EQ(size, linear->parameters().size());
}
}
@ -198,6 +242,11 @@ TEST(OptimTest, XORConvergence_SGD) {
SGDOptions(0.1).momentum(0.9).nesterov(true).weight_decay(1e-6)));
}
TEST(OptimTest, XORConvergence_LBFGS) {
ASSERT_TRUE(test_optimizer_xor<LBFGS>(LBFGSOptions(1.0)));
ASSERT_TRUE(test_optimizer_xor<LBFGS>(LBFGSOptions(1.0).line_search_fn("strong_wolfe")));
}
TEST(OptimTest, XORConvergence_Adagrad) {
ASSERT_TRUE(test_optimizer_xor<Adagrad>(
AdagradOptions(1.0).weight_decay(1e-6).lr_decay(1e-3)));
@ -375,7 +424,7 @@ TEST(OptimTest, AddParameter_LBFGS) {
}
LBFGS optimizer(std::vector<torch::Tensor>{}, 1.0);
optimizer.add_parameters(parameters);
OLD_INTERFACE_WARNING_CHECK(optimizer.add_parameters(parameters));
optimizer.step([]() { return torch::tensor(1); });


@ -64,7 +64,7 @@ void is_optimizer_state_equal(
}
template <typename OptimizerClass, typename DerivedOptimizerOptions, typename DerivedOptimizerParamState>
void test_serialize_optimizer(DerivedOptimizerOptions options) {
void test_serialize_optimizer(DerivedOptimizerOptions options, bool only_has_global_state = false) {
auto model1 = Linear(5, 2);
auto model2 = Linear(5, 2);
auto model3 = Linear(5, 2);
@ -125,9 +125,11 @@ void test_serialize_optimizer(DerivedOptimizerOptions options) {
auto& optim3_2_state = optim3_2.state();
auto& optim3_state = optim3.state();
// optim3_2 and optim1 should have param_groups and state of size 1 and 2 respectively
// optim3_2 and optim1 should have param_groups and state of size 1 and state_size respectively
ASSERT_TRUE(optim3_2_param_groups.size() == 1);
ASSERT_TRUE(optim3_2_state.size() == 2);
// state_size = 2 for all optimizers except LBFGS as LBFGS only maintains one global state
int state_size = only_has_global_state ? 1 : 2;
ASSERT_TRUE(optim3_2_state.size() == state_size);
// optim3_2 and optim1 should have param_groups and state of same size
ASSERT_TRUE(optim3_2_param_groups.size() == optim3_param_groups.size());
@ -668,39 +670,16 @@ TEST(SerializeTest, Optim_RMSprop) {
}
TEST(SerializeTest, Optim_LBFGS) {
auto options = LBFGSOptions();
test_serialize_optimizer<LBFGS, LBFGSOptions, LBFGSParamState>(LBFGSOptions(), true);
// bc compatibility check
auto model1 = Linear(5, 2);
auto model2 = Linear(5, 2);
auto model3 = Linear(5, 2);
// Models 1, 2, 3 will have the same parameters.
auto model_tempfile = c10::make_tempfile();
torch::save(model1, model_tempfile.name);
torch::load(model2, model_tempfile.name);
torch::load(model3, model_tempfile.name);
auto param1 = model1->named_parameters();
auto param2 = model2->named_parameters();
auto param3 = model3->named_parameters();
for (const auto& p : param1) {
ASSERT_TRUE(p->allclose(param2[p.key()]));
ASSERT_TRUE(param2[p.key()].allclose(param3[p.key()]));
}
// Make some optimizers
auto optim1 = LBFGS(
{torch::optim::OptimizerParamGroup(model1->parameters())}, options);
auto optim2 = LBFGS(
model2->parameters(), options);
auto optim2_2 = LBFGS(
model2->parameters(), options);
auto optim3 = LBFGS(
model3->parameters(), options);
auto optim3_2 = LBFGS(
model3->parameters(), options);
auto model1_params = model1->parameters();
// added a tensor for lazy init check - when all params do not have entry in buffers
model1_params.emplace_back(torch::randn({2,3}));
auto optim1 = torch::optim::LBFGS(model1_params, torch::optim::LBFGSOptions());
auto x = torch::ones({10, 5});
auto step = [&x](torch::optim::LossClosureOptimizer& optimizer, Linear model) {
auto step = [&x](torch::optim::Optimizer& optimizer, Linear model) {
optimizer.zero_grad();
auto y = model->forward(x).sum();
y.backward();
@ -708,56 +687,47 @@ TEST(SerializeTest, Optim_LBFGS) {
optimizer.step(closure);
};
// Do 2 steps of model1
step(optim1, model1);
step(optim1, model1);
// Do 2 steps of model 2 without saving the optimizer
step(optim2, model2);
step(optim2_2, model2);
at::Tensor d, t, H_diag, prev_flat_grad, prev_loss;
std::deque<at::Tensor> old_dirs, old_stps;
// Do 1 step of model 3
step(optim3, model3);
const auto& params_ = optim1.param_groups()[0].params();
auto key_ = c10::guts::to_string(params_[0].unsafeGetTensorImpl());
const auto& optim1_state = static_cast<const LBFGSParamState&>(*(optim1.state().at(key_).get()));
d = optim1_state.d();
t = at::tensor(optim1_state.t());
H_diag = optim1_state.H_diag();
prev_flat_grad = optim1_state.prev_flat_grad();
prev_loss = at::tensor(optim1_state.prev_loss());
old_dirs = optim1_state.old_dirs();
// save the optimizer
auto optim_tempfile = c10::make_tempfile();
torch::save(optim3, optim_tempfile.name);
torch::load(optim3_2, optim_tempfile.name);
// write buffers to the file
auto optim_tempfile_old_format = c10::make_tempfile();
torch::serialize::OutputArchive output_archive;
output_archive.write("d", d, /*is_buffer=*/true);
output_archive.write("t", t, /*is_buffer=*/true);
output_archive.write("H_diag", H_diag, /*is_buffer=*/true);
output_archive.write("prev_flat_grad", prev_flat_grad, /*is_buffer=*/true);
output_archive.write("prev_loss", prev_loss, /*is_buffer=*/true);
write_tensors_to_archive(output_archive, "old_dirs", old_dirs);
write_tensors_to_archive(output_archive, "old_stps", old_stps);
output_archive.save_to(optim_tempfile_old_format.name);
auto& optim3_2_param_groups = optim3_2.param_groups();
auto& optim3_param_groups = optim3.param_groups();
auto& optim3_2_state = optim3_2.state();
auto& optim3_state = optim3.state();
auto optim1_2 = LBFGS(model1_params, torch::optim::LBFGSOptions());
OLD_SERIALIZATION_LOGIC_WARNING_CHECK(torch::load, optim1_2, optim_tempfile_old_format.name);
// LBFGS only supports 1 param_group
// optim3_2 and optim1 should have param_groups of size 1
ASSERT_TRUE(optim3_param_groups.size() == 1);
ASSERT_TRUE(optim3_2_param_groups.size() == 1);
// LBFGS only maintains one global state
ASSERT_TRUE(optim3_2_state.size() == 1);
ASSERT_TRUE(optim3_state.size() == 1);
const auto& params1_2_ = optim1_2.param_groups()[0].params();
auto param_key = c10::guts::to_string(params1_2_[0].unsafeGetTensorImpl());
auto& optim1_2_state = static_cast<LBFGSParamState&>(*(optim1_2.state().at(param_key).get()));
// checking correctness of serialization logic for optimizer.param_groups_ and optimizer.state_
for (int i = 0; i < optim3_2_param_groups.size(); i++) {
is_optimizer_param_group_equal<LBFGSOptions>(
optim3_2_param_groups[i], optim3_param_groups[i]);
is_optimizer_state_equal<LBFGSParamState>(optim3_2_state, optim3_state);
}
// old LBFGS didn't track func_evals, n_iter, ro, al values
optim1_2_state.func_evals(optim1_state.func_evals());
optim1_2_state.n_iter(optim1_state.n_iter());
optim1_2_state.ro(optim1_state.ro());
optim1_2_state.al(optim1_state.al());
// Do step2 for model 3
step(optim3_2, model3);
param1 = model1->named_parameters();
param2 = model2->named_parameters();
param3 = model3->named_parameters();
for (const auto& p : param1) {
const auto& name = p.key();
// Model 1 and 3 should be the same
ASSERT_TRUE(
param1[name].norm().item<float>() == param3[name].norm().item<float>());
ASSERT_TRUE(
param1[name].norm().item<float>() != param2[name].norm().item<float>());
}
is_optimizer_state_equal<LBFGSParamState>(optim1.state(), optim1_2.state());
}
TEST(SerializeTest, XOR_CUDA) {


@ -138,7 +138,7 @@ void testClassDerive() {
static const auto torchbindSrc = R"JIT(
class FooBar1234(Module):
__parameters__ = []
f : __torch__.torch.classes._TorchScriptTesting_StackString
f : __torch__.torch.classes._TorchScriptTesting._StackString
training : bool
def forward(self: __torch__.FooBar1234) -> str:
return (self.f).top()


@ -66,7 +66,7 @@ struct PickleTester : torch::CustomClassHolder {
std::vector<int64_t> vals;
};
- static auto test = torch::class_<Foo>("_TorchScriptTesting_Foo")
+ static auto test = torch::class_<Foo>("_TorchScriptTesting", "_Foo")
.def(torch::init<int64_t, int64_t>())
// .def(torch::init<>())
.def("info", &Foo::info)
@@ -75,7 +75,9 @@ static auto test = torch::class_<Foo>("_TorchScriptTesting_Foo")
.def("combine", &Foo::combine);
static auto testStack =
- torch::class_<MyStackClass<std::string>>("_TorchScriptTesting_StackString")
+ torch::class_<MyStackClass<std::string>>(
+ "_TorchScriptTesting",
+ "_StackString")
.def(torch::init<std::vector<std::string>>())
.def("push", &MyStackClass<std::string>::push)
.def("pop", &MyStackClass<std::string>::pop)
@@ -101,7 +103,7 @@ static auto testStack =
// clang-format on
static auto testPickle =
- torch::class_<PickleTester>("_TorchScriptTesting_PickleTester")
+ torch::class_<PickleTester>("_TorchScriptTesting", "_PickleTester")
.def(torch::init<std::vector<int64_t>>())
.def_pickle(
[](c10::intrusive_ptr<PickleTester> self) { // __getstate__
@@ -129,7 +131,7 @@ torch::RegisterOperators& register_take_instance() {
static auto instance_registry = torch::RegisterOperators().op(
torch::RegisterOperators::options()
.schema(
- "_TorchScriptTesting::take_an_instance(__torch__.torch.classes._TorchScriptTesting_PickleTester x) -> Tensor Y")
+ "_TorchScriptTesting::take_an_instance(__torch__.torch.classes._TorchScriptTesting._PickleTester x) -> Tensor Y")
.catchAllKernel<decltype(take_an_instance), &take_an_instance>());
return instance_registry;
}
@@ -146,7 +148,7 @@ void testTorchbindIValueAPI() {
auto custom_class_obj = make_custom_class<MyStackClass<std::string>>(
std::vector<std::string>{"foo", "bar"});
m.define(R"(
- def forward(self, s : __torch__.torch.classes._TorchScriptTesting_StackString):
+ def forward(self, s : __torch__.torch.classes._TorchScriptTesting._StackString):
return s.pop(), s
)");

@@ -343,7 +343,8 @@ void testLiteInterpreterBuiltinFunction() {
namespace {
static auto reg =
torch::jit::class_<TorchBindLiteInterpreterTestStruct>(
- "_TorchScriptTesting_LiteInterpreterTest")
+ "_TorchScriptTesting",
+ "_LiteInterpreterTest")
.def("get", &TorchBindLiteInterpreterTestStruct::get)
.def_pickle(
// __getstate__

@@ -1,85 +0,0 @@
from collections import namedtuple
TorchNNTestParams = namedtuple(
'TorchNNTestParams',
[
'module_name',
'module_variant_name',
'test_instance',
'cpp_constructor_args',
'has_parity',
'device',
]
)
CppArg = namedtuple('CppArg', ['type', 'value'])
ParityStatus = namedtuple('ParityStatus', ['has_impl_parity', 'has_doc_parity'])
TorchNNModuleMetadata = namedtuple(
'TorchNNModuleMetadata',
[
'cpp_default_constructor_args',
'num_attrs_recursive',
'python_ignored_constructor_args',
'python_ignored_attrs',
'python_optional_attribute_to_jit_type',
'cpp_sources',
]
)
TorchNNModuleMetadata.__new__.__defaults__ = (None, None, [], [], {}, '')
'''
This function expects the parity tracker Markdown file to have the following format:
```
## package1_name
API | Implementation Parity | Doc Parity
------------- | ------------- | -------------
API_Name|No|No
...
## package2_name
API | Implementation Parity | Doc Parity
------------- | ------------- | -------------
API_Name|No|No
...
```
The returned dict has the following format:
```
Dict[package_name]
-> Dict[api_name]
-> ParityStatus
```
'''
def parse_parity_tracker_table(file_path):
def parse_parity_choice(str):
if str in ['Yes', 'No']:
return str == 'Yes'
else:
raise RuntimeError(
'{} is not a supported parity choice. The valid choices are "Yes" and "No".'.format(str))
parity_tracker_dict = {}
with open(file_path, 'r') as f:
all_text = f.read()
packages = all_text.split('##')
for package in packages[1:]:
lines = [line.strip() for line in package.split('\n') if line.strip() != '']
package_name = lines[0]
if package_name in parity_tracker_dict:
raise RuntimeError("Duplicated package name `{}` found in {}".format(package_name, file_path))
else:
parity_tracker_dict[package_name] = {}
for api_status in lines[3:]:
api_name, has_impl_parity_str, has_doc_parity_str = [x.strip() for x in api_status.split('|')]
parity_tracker_dict[package_name][api_name] = ParityStatus(
has_impl_parity=parse_parity_choice(has_impl_parity_str),
has_doc_parity=parse_parity_choice(has_doc_parity_str))
return parity_tracker_dict
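
The table format described in the docstring of the removed `parse_parity_tracker_table` can be made concrete with a small, self-contained sketch. It applies the same splitting logic to an in-memory string instead of a file. The name `parse_parity_text`, the `SAMPLE` text, and the Yes/No values in it are hypothetical and chosen only for illustration; they do not reflect the real contents of `parity-tracker.md`.

```python
from collections import namedtuple

ParityStatus = namedtuple('ParityStatus', ['has_impl_parity', 'has_doc_parity'])


def parse_parity_text(text):
    """Parse parity-tracker Markdown text into {package: {api: ParityStatus}}.

    Hypothetical helper: mirrors the removed parser, but takes a string.
    """
    def parse_choice(choice):
        if choice not in ('Yes', 'No'):
            raise ValueError('unsupported parity choice: {!r}'.format(choice))
        return choice == 'Yes'

    table = {}
    # Each package section starts with a `## package_name` heading.
    for section in text.split('##')[1:]:
        lines = [ln.strip() for ln in section.split('\n') if ln.strip()]
        package_name = lines[0]
        if package_name in table:
            raise ValueError('duplicated package name: {}'.format(package_name))
        table[package_name] = {}
        # lines[1] is the column header and lines[2] is the separator row.
        for row in lines[3:]:
            api_name, impl, doc = [cell.strip() for cell in row.split('|')]
            table[package_name][api_name] = ParityStatus(
                has_impl_parity=parse_choice(impl),
                has_doc_parity=parse_choice(doc))
    return table


# Illustrative input only; the Yes/No values are made up.
SAMPLE = """
## torch::nn
API | Implementation Parity | Doc Parity
------------- | ------------- | -------------
torch::nn::Linear|Yes|No

## torch::nn::functional
API | Implementation Parity | Doc Parity
------------- | ------------- | -------------
F::relu|Yes|Yes
"""

parsed = parse_parity_text(SAMPLE)
print(parsed['torch::nn']['torch::nn::Linear'])
```

Because `ParityStatus` is a namedtuple, indexing an entry with `[0]` yields `has_impl_parity`, which appears to be what the `parity_table[...][...][0]` lookups in the new test generators rely on.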

@@ -0,0 +1,237 @@
# The purpose of this test is to check that we have implementation parity between
# a Python `torch.nn.functional` function and its corresponding C++ `torch::nn::functional`
# function. Concretely, this test does the following:
#
# 1. Get a test params dict from common_nn.py, run forward pass on the Python functional
# created using the test params.
#
# 2. Serialize the Python functional's forward input arguments, deserialize them
# in C++ and use them as input for the C++ functional's forward pass.
#
# 3. Run the forward pass on the C++ functional, and serialize the C++ functional's
# forward output.
#
# 4. Compare Python/C++ functional's forward output. If they are the same, then we
# have implementation parity between the Python and C++ functionals.
import tempfile
from string import Template
import re
import pprint
import os
import torch
from cpp_api_parity.utils import TorchNNFunctionalTestParams, TORCH_NN_COMMON_TEST_HARNESS, \
compile_cpp_code_inline, set_python_tensors_requires_grad, move_python_tensors_to_device, \
add_test, compute_cpp_args_construction_stmts_and_forward_arg_symbols, serialize_arg_dict_as_script_module, \
compute_arg_dict, decorate_test_fn, compute_temp_file_path, generate_error_msg, is_torch_nn_functional_test, \
try_remove_folder
from cpp_api_parity.sample_functional import SAMPLE_FUNCTIONAL_CPP_SOURCE
# Expected substitutions:
#
# ${functional_variant_name} (e.g. `BCELoss_no_reduce`)
# ${cpp_args_construction_stmts}
# ${cpp_function_call}
TORCH_NN_FUNCTIONAL_TEST_FORWARD = Template("""
void ${functional_variant_name}_test_forward(
const std::string& arg_dict_file_path,
const std::string& forward_output_file_path) {
pybind11::gil_scoped_release no_gil;
namespace F = torch::nn::functional;
// Declare arguments
auto arg_dict = load_dict_from_file(arg_dict_file_path);
${cpp_args_construction_stmts};
// Some functionals (such as `F::rrelu`) create random tensors in their call path.
// To make sure the random tensors created are the same in Python/C++, we need
// to set the RNG seed manually.
torch::manual_seed(0);
// Run function with arguments
auto cpp_output = ${cpp_function_call};
// Save the output into a file to be compared in Python later
write_ivalue_to_file(torch::IValue(cpp_output), forward_output_file_path);
}
""")
def run_forward(unit_test_class, test_params):
device = test_params.device
inputs = set_python_tensors_requires_grad(move_python_tensors_to_device(
[arg_value for _, arg_value in test_params.arg_dict['input']], device))
inputs += move_python_tensors_to_device(
[arg_value for _, arg_value in test_params.arg_dict['target']], device)
inputs += move_python_tensors_to_device(
[arg_value for _, arg_value in test_params.arg_dict['extra_args']], device)
# Some functionals (such as `F.rrelu`) create random tensors in their call path.
# To make sure the random tensors created are the same in Python/C++, we need
# to set the RNG seed manually.
torch.manual_seed(0)
python_output = test_params.test_instance.constructor()(*inputs)
return python_output
def test_forward(unit_test_class, test_params):
functional_variant_name = test_params.functional_variant_name
cpp_tmp_folder = test_params.cpp_tmp_folder
# Remove the temporary folder if it exists already
try_remove_folder(cpp_tmp_folder)
os.mkdir(cpp_tmp_folder)
# Run forward on Python functional
python_output = run_forward(unit_test_class, test_params)
# Save Python arguments to be used from C++ function
arg_dict_file_path = compute_temp_file_path(cpp_tmp_folder, functional_variant_name, 'arg_dict')
serialize_arg_dict_as_script_module(test_params.arg_dict).save(arg_dict_file_path)
cpp_test_name = '{}_test_forward'.format(test_params.functional_variant_name)
cpp_test_fn = getattr(unit_test_class.functional_impl_check_cpp_module, cpp_test_name)
def run_cpp_test_fn_and_check_output():
forward_output_file_path = compute_temp_file_path(cpp_tmp_folder, functional_variant_name, 'forward_output')
cpp_test_fn(arg_dict_file_path, forward_output_file_path)
cpp_output = torch.load(forward_output_file_path)
# Check that forward outputs are equal
unit_test_class.assertEqual(
python_output, cpp_output,
message=generate_error_msg("forward output", cpp_output, python_output))
run_cpp_test_fn_and_check_output()
# Remove temporary folder that stores C++ outputs
try_remove_folder(cpp_tmp_folder)
def compute_functional_name(test_params_dict):
def camel_case_to_snake_case(camel_case_str):
return re.sub(r'(?<!^)(?=[A-Z])', '_', camel_case_str).lower()
if 'cpp_options_args' in test_params_dict:
# Expected format for `cpp_options_args`: `F::FunctionalFuncOptions(...)`
# Example output: `binary_cross_entropy`
return camel_case_to_snake_case(
test_params_dict['cpp_options_args'].split('(')[0].replace('F::', '').replace('FuncOptions', ''))
elif 'cpp_function_call' in test_params_dict:
# Expected format for `cpp_function_call`: `F::functional_name(...)`
# Example output: `binary_cross_entropy`
return test_params_dict['cpp_function_call'].split('(')[0].replace('F::', '')
else:
raise RuntimeError(
"`cpp_options_args` or `cpp_function_call` entry must be present in test params dict:\n{}".format(
pprint.pformat(test_params_dict)))
def compute_cpp_function_call(test_params_dict, arg_dict, functional_name):
if 'cpp_function_call' in test_params_dict:
return test_params_dict['cpp_function_call']
elif 'cpp_options_args' in test_params_dict:
cpp_forward_args_symbols = [arg_name for arg_name, _ in
arg_dict['input'] + arg_dict['target'] + arg_dict['extra_args']]
return 'F::{}({}, {})'.format(
functional_name, ", ".join(cpp_forward_args_symbols), test_params_dict['cpp_options_args'])
else:
raise RuntimeError(
"`cpp_options_args` or `cpp_function_call` entry must be present in test params dict:\n{}".format(
pprint.pformat(test_params_dict)))
def process_test_params_for_functional(test_params_dict, device, test_instance_class):
test_instance = test_instance_class(**test_params_dict)
functional_name = compute_functional_name(test_params_dict)
assert test_instance.get_name().startswith('test_')
# Example output: `BCELoss_no_reduce_cuda`
functional_variant_name = test_instance.get_name()[5:] + (('_' + device) if device != 'cpu' else '')
arg_dict = compute_arg_dict(test_params_dict, test_instance)
return TorchNNFunctionalTestParams(
functional_name=functional_name,
functional_variant_name=functional_variant_name,
test_instance=test_instance,
cpp_function_call=compute_cpp_function_call(test_params_dict, arg_dict, functional_name),
arg_dict=arg_dict,
has_parity=test_params_dict.get('has_parity', True),
device=device,
cpp_tmp_folder=tempfile.mkdtemp(),
)
def write_test_to_test_class(
unit_test_class, test_params_dict, test_instance_class, parity_table, devices):
assert is_torch_nn_functional_test(test_params_dict)
assert 'cpp_options_args' in test_params_dict or 'cpp_function_call' in test_params_dict, (
"To enable C++ API parity test, "
"`cpp_options_args` or `cpp_function_call` entry must be present in test params dict:\n{}. \n"
"If you are interested in adding the C++ API parity test, please see:\n"
"NOTE [How to check NN module / functional API parity between Python and C++ frontends]. \n"
"If not, please add `test_cpp_api_parity=False` to the test params dict and file an issue about this."
).format(pprint.pformat(test_params_dict))
assert not ('cpp_options_args' in test_params_dict and 'cpp_function_call' in test_params_dict), (
"Only one of `cpp_options_args` and `cpp_function_call` entries "
"should be present in test params dict:\n{}").format(pprint.pformat(test_params_dict))
functional_name = compute_functional_name(test_params_dict)
assert hasattr(torch.nn.functional, functional_name), \
"`torch.nn.functional` doesn't have function `{}`. (Discovered while processing\n{}.)".format(
functional_name, pprint.pformat(test_params_dict))
functional_full_name = 'F::' + functional_name
assert functional_full_name in parity_table['torch::nn::functional'], (
"Please add `{}` entry to `torch::nn::functional` section of `test/cpp_api_parity/parity-tracker.md`. "
"(Discovered while processing\n{}.)").format(functional_full_name, pprint.pformat(test_params_dict))
for device in devices:
test_params = process_test_params_for_functional(
test_params_dict=test_params_dict,
device=device,
test_instance_class=test_instance_class,
)
unit_test_name = 'test_torch_nn_functional_{}'.format(test_params.functional_variant_name)
unit_test_class.functional_test_params_map[unit_test_name] = test_params
def test_fn(self):
test_forward(
unit_test_class=self, test_params=unit_test_class.functional_test_params_map[self._testMethodName])
test_fn = decorate_test_fn(
test_fn=test_fn,
test_cuda=test_params_dict.get('test_cuda', True),
has_impl_parity=parity_table['torch::nn::functional'][functional_full_name][0] and
test_params_dict.get('has_parity', True),
device=device)
add_test(unit_test_class, unit_test_name, test_fn)
def generate_test_cpp_sources(test_params, template):
cpp_args_construction_stmts, _ = compute_cpp_args_construction_stmts_and_forward_arg_symbols(test_params)
test_cpp_sources = template.substitute(
functional_variant_name=test_params.functional_variant_name,
cpp_args_construction_stmts=";\n ".join(cpp_args_construction_stmts),
cpp_function_call=test_params.cpp_function_call,
)
return test_cpp_sources
# Build all C++ tests together, instead of once per test.
def build_cpp_tests(unit_test_class, print_cpp_source=False):
assert len(unit_test_class.functional_test_params_map) > 0
cpp_sources = TORCH_NN_COMMON_TEST_HARNESS + SAMPLE_FUNCTIONAL_CPP_SOURCE
functions = []
for test_name, test_params in unit_test_class.functional_test_params_map.items():
cpp_sources += generate_test_cpp_sources(test_params=test_params, template=TORCH_NN_FUNCTIONAL_TEST_FORWARD)
functions.append('{}_test_forward'.format(test_params.functional_variant_name))
if print_cpp_source:
print(cpp_sources)
cpp_module = compile_cpp_code_inline(
name='functional_impl_check',
cpp_sources=cpp_sources,
functions=functions)
unit_test_class.functional_impl_check_cpp_module = cpp_module

@@ -0,0 +1,298 @@
# The purpose of this test is to check that we have implementation parity between
# a Python `torch.nn` module and its corresponding C++ `torch::nn` module. Concretely,
# this test does the following:
#
# 1. Get a test params dict from common_nn.py, run forward and backward on the
# Python module created using the test params.
#
# 2. Serialize the Python module's parameters / buffers and its forward input
# arguments, deserialize them in C++ and load them into the C++ module.
#
# 3. Run the same forward and backward passes on the C++ module, and serialize
# the C++ module's forward output and backward gradients.
#
# 4. Compare Python/C++ module's forward output and backward gradients. If they
# are the same, then we have implementation parity between Python/C++ module.
import tempfile
from string import Template
import types
import pprint
import os
import torch
from cpp_api_parity.utils import TorchNNModuleTestParams, TORCH_NN_COMMON_TEST_HARNESS, \
compile_cpp_code_inline, set_python_tensors_requires_grad, move_python_tensors_to_device, \
add_test, compute_cpp_args_construction_stmts_and_forward_arg_symbols, serialize_arg_dict_as_script_module, \
compute_arg_dict, decorate_test_fn, compute_temp_file_path, generate_error_msg, is_torch_nn_functional_test, \
try_remove_folder
from cpp_api_parity.sample_module import SAMPLE_MODULE_CPP_SOURCE
# Expected substitutions:
#
# ${module_variant_name} (e.g. `Linear_no_bias_cpu`)
# ${module_qualified_name} (e.g. `torch::nn::Linear`)
# ${cpp_args_construction_stmts}
# ${cpp_constructor_args}
# ${device}
# ${cpp_forward_args_symbols}
TORCH_NN_MODULE_TEST_FORWARD_BACKWARD = Template("""
void ${module_variant_name}_test_forward_backward(
const std::string& arg_dict_file_path,
const std::string& module_file_path,
const std::string& forward_output_file_path,
const std::string& backward_grad_dict_file_path) {
pybind11::gil_scoped_release no_gil;
// Declare arguments
auto arg_dict = load_dict_from_file(arg_dict_file_path);
${cpp_args_construction_stmts};
// Construct module and load params/buffers from Python module
${module_qualified_name} module${cpp_constructor_args};
module->to(std::string("${device}"));
torch::load(module, module_file_path);
// Some modules (such as `RReLU`) create random tensors in their forward pass.
// To make sure the random tensors created are the same in Python/C++, we need
// to set the RNG seed manually.
torch::manual_seed(0);
// Forward pass
auto cpp_output = module(${cpp_forward_args_symbols});
// Save the output into a file to be compared in Python later
write_ivalue_to_file(torch::IValue(cpp_output), forward_output_file_path);
// Backward pass
cpp_output.sum().backward();
// Put all gradients into a c10::Dict, save it into a file to be compared in Python later
c10::Dict<std::string, torch::Tensor> grad_dict;
for (const auto& param : module->named_parameters()) {
torch::Tensor grad = param.value().grad();
if (grad.is_sparse()) {
grad_dict.insert(param.key() + "_grad_indices", grad.coalesce().indices());
grad_dict.insert(param.key() + "_grad_values", grad.coalesce().values());
} else {
grad_dict.insert(param.key() + "_grad", grad);
}
}
write_ivalue_to_file(torch::IValue(grad_dict), backward_grad_dict_file_path);
}
""")
def run_python_forward_backward(unit_test_class, test_params):
device = test_params.device
module = test_params.test_instance.constructor(*test_params.test_instance.constructor_args).to(device)
inputs = set_python_tensors_requires_grad(move_python_tensors_to_device(
[arg_value for _, arg_value in test_params.arg_dict['input']], device))
inputs += move_python_tensors_to_device(
[arg_value for _, arg_value in test_params.arg_dict['target']], device)
inputs += move_python_tensors_to_device(
[arg_value for _, arg_value in test_params.arg_dict['extra_args']], device)
# Some modules (such as `RReLU`) create random tensors in their forward pass.
# To make sure the random tensors created are the same in Python/C++, we need
# to set the RNG seed manually.
torch.manual_seed(0)
# Forward pass
python_output = module(*inputs)
# NOTE: This is a workaround to allow any module to be traced.
# We can do this because we are only interested in transferring
# the Python module's parameters and buffers to the C++ module.
module.forward = types.MethodType(lambda self, input: input, module)
script_module = torch.jit.trace(module, torch.tensor(0))
# Backward pass
python_output.sum().backward()
# Put all gradients into a dict, to be compared later
python_grad_dict = {}
for name, param in module.named_parameters():
grad = param.grad
if grad.is_sparse:
python_grad_dict[name + "_grad_indices"] = grad.coalesce().indices()
python_grad_dict[name + "_grad_values"] = grad.coalesce().values()
else:
python_grad_dict[name + "_grad"] = grad
return script_module, python_output, python_grad_dict
def test_forward_backward(unit_test_class, test_params):
module_variant_name = test_params.module_variant_name
cpp_tmp_folder = test_params.cpp_tmp_folder
# Remove the temporary folder if it exists already
try_remove_folder(cpp_tmp_folder)
os.mkdir(cpp_tmp_folder)
# Run forward and backward on Python module
script_module, python_output, python_grad_dict = run_python_forward_backward(unit_test_class, test_params)
# Save Python module and arguments to be used from C++ function
module_file_path = compute_temp_file_path(cpp_tmp_folder, module_variant_name, 'module')
arg_dict_file_path = compute_temp_file_path(cpp_tmp_folder, module_variant_name, 'arg_dict')
script_module.save(module_file_path)
serialize_arg_dict_as_script_module(test_params.arg_dict).save(arg_dict_file_path)
cpp_test_name = '{}_test_forward_backward'.format(test_params.module_variant_name)
cpp_test_fn = getattr(unit_test_class.module_impl_check_cpp_module, cpp_test_name)
def run_cpp_test_fn_and_check_output():
forward_output_file_path = compute_temp_file_path(cpp_tmp_folder, module_variant_name, 'forward_output')
backward_grad_dict_file_path = compute_temp_file_path(cpp_tmp_folder, module_variant_name, 'backward_grad_dict')
cpp_test_fn(arg_dict_file_path, module_file_path, forward_output_file_path, backward_grad_dict_file_path)
cpp_output = torch.load(forward_output_file_path)
cpp_grad_dict = torch.load(backward_grad_dict_file_path)
# Check that forward outputs are equal
unit_test_class.assertEqual(python_output, cpp_output,
message=generate_error_msg("forward output", cpp_output, python_output))
# Check that module parameter gradients are equal after backward pass
unit_test_class.assertEqual(
len(python_grad_dict), len(cpp_grad_dict),
message=generate_error_msg("# of parameters", len(cpp_grad_dict), len(python_grad_dict)))
for key in python_grad_dict:
param_name = None
for suffix in ['_grad', '_grad_indices', '_grad_values']:
if key.endswith(suffix):
param_name = key[:-len(suffix)]
break
assert param_name is not None
sparsity_str = 'sparse' if key.endswith('_grad_indices') or key.endswith('_grad_values') else 'dense'
unit_test_class.assertTrue(
key in cpp_grad_dict,
msg=generate_error_msg(
"\"Does module have a parameter named `{}` with {} gradient?\"".format(param_name, sparsity_str),
False, True))
unit_test_class.assertEqual(
python_grad_dict[key], cpp_grad_dict[key],
message=generate_error_msg(
"`{}`'s {} gradient (`{}`)".format(param_name, sparsity_str, key),
cpp_grad_dict[key], python_grad_dict[key]))
run_cpp_test_fn_and_check_output()
# Remove temporary folder that stores C++ outputs
try_remove_folder(cpp_tmp_folder)
def compute_module_name(test_params_dict):
fullname = test_params_dict.get('fullname', None)
if fullname:
module_name = fullname.split('_')[0]
else:
module_name = test_params_dict.get('module_name')
return module_name
def process_test_params_for_module(test_params_dict, device, test_instance_class):
module_name = compute_module_name(test_params_dict)
test_params_dict['constructor'] = test_params_dict.get('constructor', getattr(torch.nn, module_name))
test_instance = test_instance_class(**test_params_dict)
assert test_instance.get_name().startswith('test_')
# Example output: `BCELoss_weights_cuda`
module_variant_name = test_instance.get_name()[5:] + (('_' + device) if device != 'cpu' else '')
if 'constructor_args' in test_params_dict:
assert 'cpp_constructor_args' in test_params_dict, (
"If `constructor_args` is present in test params dict, to enable C++ API parity test, "
"`cpp_constructor_args` must be present in:\n{}\n"
"If you are interested in adding the C++ API parity test, please see:\n"
"NOTE [How to check NN module / functional API parity between Python and C++ frontends]. \n"
"If not, please add `test_cpp_api_parity=False` to the test params dict and file an issue about this."
).format(pprint.pformat(test_params_dict))
return TorchNNModuleTestParams(
module_name=module_name,
module_variant_name=module_variant_name,
test_instance=test_instance,
cpp_constructor_args=test_params_dict.get('cpp_constructor_args', ''),
arg_dict=compute_arg_dict(test_params_dict, test_instance),
has_parity=test_params_dict.get('has_parity', True),
device=device,
cpp_tmp_folder=tempfile.mkdtemp(),
)
def write_test_to_test_class(
unit_test_class, test_params_dict, test_instance_class, parity_table, devices):
assert not is_torch_nn_functional_test(test_params_dict)
module_name = compute_module_name(test_params_dict)
assert hasattr(torch.nn, module_name), (
"`torch.nn` doesn't have module `{}`. "
"If you are adding a new test, please set `fullname` using format `ModuleName_desc` "
"or set `module_name` using format `ModuleName` in the module test dict:\n{}"
).format(module_name, pprint.pformat(test_params_dict))
module_full_name = 'torch::nn::' + module_name
assert module_full_name in parity_table['torch::nn'], (
"Please add `{}` entry to `torch::nn` section of `test/cpp_api_parity/parity-tracker.md`. "
"(Discovered while processing\n{}.)").format(module_full_name, pprint.pformat(test_params_dict))
for device in devices:
test_params = process_test_params_for_module(
test_params_dict=test_params_dict,
device=device,
test_instance_class=test_instance_class,
)
unit_test_name = 'test_torch_nn_{}'.format(test_params.module_variant_name)
unit_test_class.module_test_params_map[unit_test_name] = test_params
def test_fn(self):
test_forward_backward(
unit_test_class=self, test_params=unit_test_class.module_test_params_map[self._testMethodName])
test_fn = decorate_test_fn(
test_fn=test_fn,
test_cuda=test_params_dict.get('test_cuda', True),
has_impl_parity=parity_table['torch::nn'][module_full_name][0] and
test_params_dict.get('has_parity', True),
device=device)
add_test(unit_test_class, unit_test_name, test_fn)
def generate_test_cpp_sources(test_params, template):
device = test_params.device
cpp_constructor_args = test_params.cpp_constructor_args
if cpp_constructor_args != '':
cpp_constructor_args = '({})'.format(cpp_constructor_args)
cpp_args_construction_stmts, cpp_forward_args_symbols = \
compute_cpp_args_construction_stmts_and_forward_arg_symbols(test_params)
test_cpp_sources = template.substitute(
module_variant_name=test_params.module_variant_name,
module_qualified_name='torch::nn::{}'.format(test_params.module_name),
cpp_args_construction_stmts=";\n ".join(cpp_args_construction_stmts),
cpp_constructor_args=cpp_constructor_args,
cpp_forward_args_symbols=", ".join(cpp_forward_args_symbols),
device=device,
)
return test_cpp_sources
# Build all C++ tests together, instead of once per test.
def build_cpp_tests(unit_test_class, print_cpp_source=False):
assert len(unit_test_class.module_test_params_map) > 0
cpp_sources = TORCH_NN_COMMON_TEST_HARNESS + SAMPLE_MODULE_CPP_SOURCE
functions = []
for test_name, test_params in unit_test_class.module_test_params_map.items():
cpp_sources += generate_test_cpp_sources(
test_params=test_params, template=TORCH_NN_MODULE_TEST_FORWARD_BACKWARD)
functions.append('{}_test_forward_backward'.format(test_params.module_variant_name))
if print_cpp_source:
print(cpp_sources)
cpp_module = compile_cpp_code_inline(
name='module_impl_check',
cpp_sources=cpp_sources,
functions=functions)
unit_test_class.module_impl_check_cpp_module = cpp_module
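
Both `build_cpp_tests` functions follow the same pattern: render one C++ function per test variant from a `string.Template`, concatenate the results into a single source string, and collect the function names so everything is compiled once. The sketch below reproduces only that text-generation step with the standard library. The reduced template, the `cases` list, and the C++ statements in it are simplified assumptions; nothing here is compiled, and `compile_cpp_code_inline` is not called.

```python
from string import Template

# A cut-down stand-in for TORCH_NN_FUNCTIONAL_TEST_FORWARD (assumption).
TEST_TEMPLATE = Template("""
void ${variant_name}_test_forward() {
  ${args_construction_stmts};
  auto cpp_output = ${function_call};
}
""")


def generate_source(variant_name, stmts, function_call):
    # `substitute` raises KeyError if any placeholder is left unfilled,
    # which catches template/parameter mismatches early.
    return TEST_TEMPLATE.substitute(
        variant_name=variant_name,
        args_construction_stmts=";\n  ".join(stmts),
        function_call=function_call,
    )


# Hypothetical test variants: (variant name, C++ setup statements, C++ call).
cases = [
    ('relu', ['auto i0 = torch::ones({2, 3})'], 'F::relu(i0)'),
    ('relu_cuda', ['auto i0 = torch::ones({2, 3}).to("cuda")'], 'F::relu(i0)'),
]

sources = ''
functions = []
for variant_name, stmts, function_call in cases:
    sources += generate_source(variant_name, stmts, function_call)
    functions.append('{}_test_forward'.format(variant_name))

print(sources)
print(functions)
```

Generating all variants into one translation unit trades a longer single compile for avoiding one compiler invocation per test, which matches the "build all C++ tests together" comment above.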

Some files were not shown because too many files have changed in this diff.