This is a reland of PRs https://github.com/pytorch/pytorch/pull/108626 and #109564. We fixed the iOS build failure by changing
```
((CHECK) ? (EXPR) : ([] { assert(!#CHECK); }(), (EXPR)))
```
to
```
((CHECK) ? (EXPR) : ([] { assert(false); }(), (EXPR)))
```
in TR2_OPTIONAL_ASSERTED_EXPRESSION, since the former syntax was invalid on Apple Clang. For now we apply this simple fix, in the hope that c10::optional will be replaced by std::optional soon.
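For context, here is a minimal sketch of how a macro like this is typically used inside a constexpr accessor (illustrative names only, not the exact c10::optional code; assumes C++17 so the lambda is allowed inside a constexpr function):
```
#include <cassert>

// Hypothetical stand-in for TR2_OPTIONAL_ASSERTED_EXPRESSION: the ternary
// keeps the accessor a single expression (constexpr-friendly), while the
// comma-operator lambda still fires an assert on the "empty" path at runtime.
#define ASSERTED_EXPRESSION(CHECK, EXPR) \
  ((CHECK) ? (EXPR) : ([] { assert(false); }(), (EXPR)))

template <class T>
struct toy_optional {
  bool init_ = false;
  T value_{};

  constexpr const T& value() const {
    return ASSERTED_EXPRESSION(init_, value_);
  }
};
```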
We also enabled -Wdeprecated on c10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110019
Approved by: https://github.com/clee2000
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71196
`caffe2` headers contain code that can elicit warnings when built with strict compiler flags. Rather than force downstream/consuming code to weaken their compiler flags, suppress those warnings in the header using `#pragma clang diagnostic` suppressions.
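A minimal sketch of the suppression pattern, assuming a clang toolchain (the specific warning below is only an example; the real headers may suppress different ones):
```
// illustrative_header.h -- not an actual caffe2 header, just the pattern.
#pragma once

#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wshadow" // example warning a strict consumer might enable
#endif

// ... header contents that could otherwise trigger the warning under the
// consumer's strict compiler flags ...
inline int example_helper(int value) {
  int result = value + 1;
  return result;
}

#if defined(__clang__)
#pragma clang diagnostic pop
#endif
```
The push/pop pair keeps the suppression scoped to this header, so the consumer's own code still gets the warning.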
Test Plan: CI Pass
Reviewed By: malfet
Differential Revision: D33536233
fbshipit-source-id: 74404e7a5edaf244f79f7a0addd991a84442a31f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69110
I pasted in the current LLVM code, reapplied the modifications listed in the code comments, and caught a few more in the diff/build process. The trivially copyable detection is different now; if gcc builds fail, I will try reverting to C10_IS_TRIVIALLY_COPYABLE or copying what LLVM is doing.
The motivation for this change is that, as noted in an existing comment, C10_IS_TRIVIALLY_COPYABLE did the wrong thing for std::unique_ptr, which caused problems with D32454856 / #68412.
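A minimal sketch of why the distinction matters, using std::is_trivially_copyable rather than whatever detection LLVM/c10 actually use:
```
#include <memory>
#include <type_traits>

// std::unique_ptr is not trivially copyable: relocating one with memcpy and
// then destroying both the source and the destination would double-free.
static_assert(!std::is_trivially_copyable<std::unique_ptr<int>>::value,
              "unique_ptr must not be treated as trivially copyable");

// A container that memcpy's its elements on reallocation should gate that
// fast path on a trait that gets this right, e.g.:
template <typename T>
struct can_memcpy_elements
    : std::integral_constant<bool, std::is_trivially_copyable<T>::value> {};

static_assert(can_memcpy_elements<int>::value, "trivial types can be memcpy'd");
static_assert(!can_memcpy_elements<std::unique_ptr<int>>::value,
              "unique_ptr must go through its move constructor");
```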
ghstack-source-id: 145327773
Test Plan: CI
Reviewed By: bhosmer, mruberry
Differential Revision: D32733017
fbshipit-source-id: 9452ab90328e3fdf457aad23a26f2f6835b0bd3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64682
Looks like it was forked from LLVM before cbegin and cend existed.
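A minimal sketch of the kind of accessors being added (toy container, not the actual code):
```
#include <cstddef>

// cbegin()/cend() simply delegate to the const overloads of begin()/end(),
// guaranteeing a const iterator even when called on a non-const object.
template <typename T, std::size_t N>
class TinyBuffer {
 public:
  using const_iterator = const T*;

  const_iterator begin() const { return data_; }
  const_iterator end() const { return data_ + N; }

  const_iterator cbegin() const { return begin(); }
  const_iterator cend() const { return end(); }

 private:
  T data_[N] = {};
};
```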
ghstack-source-id: 138036981
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision: D30814434
fbshipit-source-id: 9740fa8d3df1c90b77298a95ab9f1d0cf8c90320
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
All pretty minor. I avoided renaming `class DestructableMock` to `class DestructibleMock` and other such symbol renames (in this PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49815
Reviewed By: VitalyFedyunin
Differential Revision: D25734507
Pulled By: mruberry
fbshipit-source-id: bbe8874a99d047e9d9814bf92ea8c036a5c6a3fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44702
Original commit changeset: c6bd6d277aca
This diff caused the Windows build to fail due to a compiler bug in VS2019 (lambda capture of a constant int value). This backout works around the issue by explicitly capturing the const int value.
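A minimal sketch of the workaround pattern (hypothetical example, not the exact code that was changed):
```
#include <cstdio>

void workaround_example() {
  const int kChunkSize = 4;

  // A const int with a constant initializer may legally be used inside a
  // lambda without being captured (it is not odr-used), but some VS2019
  // versions miscompiled that pattern:
  //   auto broken = [] { return kChunkSize; };
  //
  // Workaround: capture the constant explicitly by value.
  auto fixed = [kChunkSize] { return kChunkSize; };
  std::printf("%d\n", fixed());
}
```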
Test Plan: Tested and previously landed.
Reviewed By: mruberry
Differential Revision: D23703215
fbshipit-source-id: f9ef23be97540bc9cf78a855295fb8c69f360459
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44252
Add tracing to the DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end it in a different thread. RecordFunction and LibgpumonObserver previously assumed that each trace event starts and finishes in the same thread, so they used a thread-local context to track enter and exit callbacks. Async events break this assumption. This change attaches the event context to the RecordFunction object so we no longer need the thread-local context.
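A minimal sketch of the idea, with illustrative names rather than the actual RecordFunction/observer API:
```
#include <memory>
#include <string>
#include <utility>

// Context travels with the event object instead of living in a thread_local,
// so the exit callback can run on a different thread than the enter callback.
struct TraceContext {
  std::string name;
  long start_ns = 0;
};

struct AsyncEvent {
  std::unique_ptr<TraceContext> ctx;
};

AsyncEvent on_enter(std::string name, long now_ns) {
  AsyncEvent ev;
  ev.ctx.reset(new TraceContext{std::move(name), now_ns});
  return ev; // may be handed off to, and finished on, another thread
}

long on_exit(const AsyncEvent& ev, long now_ns) {
  // Works regardless of which thread finishes the event.
  return now_ns - ev.ctx->start_ns;
}
```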
Test Plan:
Tested with dpp perf test and able to collect trace.
Reviewed By: ilia-cher
Differential Revision: D23323486
fbshipit-source-id: 4b6ca6c0e32028fb38a476cd1f44c17a001fc03b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42694
The old implementation allowed calling the SmallVector constructor and operator= with any type, without restrictions,
but then failed with a compiler error when the type wasn't a collection.
Instead, the constructor should only participate in overload resolution if Container satisfies a container concept, and simply not match otherwise (see the sketch below).
This fixes an issue kimishpatel was running into.
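A minimal sketch of the constraint (assuming C++17 for std::void_t; the actual change may detect "container-ness" differently):
```
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// True only if std::begin/std::end are well-formed for Container.
template <typename Container, typename = void>
struct looks_like_container : std::false_type {};

template <typename Container>
struct looks_like_container<
    Container,
    std::void_t<decltype(std::begin(std::declval<Container&>())),
                decltype(std::end(std::declval<Container&>()))>>
    : std::true_type {};

template <typename T>
class MiniVec {
 public:
  MiniVec() = default;

  // Without the enable_if, this constructor would match any argument type and
  // only fail later in its body; with it, non-containers simply don't match.
  template <typename Container,
            typename std::enable_if<looks_like_container<Container>::value,
                                    int>::type = 0>
  /* implicit */ MiniVec(const Container& c)
      : data_(std::begin(c), std::end(c)) {}

 private:
  std::vector<T> data_;
};

// MiniVec<int> ok(std::vector<int>{1, 2, 3}); // matches the constructor
// MiniVec<int> bad(42);                       // no matching constructor
```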
ghstack-source-id: 109370513
Test Plan: unit tests
Reviewed By: kimishpatel, ezyang
Differential Revision: D22983020
fbshipit-source-id: c31264f5c393762d822f3d64dd2a8e3279d8da44
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)
Changes so far:
- [x] Renamed references to the variable "I" that may be confused with the "I" defined in complex.h. I did this to avoid confusing CI failure messages as complex.h gets included by more source files.
- aten/src/ATen/native/cpu/Loops.h (Renamed I to INDEX)
- aten/src/ATen/native/cuda/Loops.cuh (Renamed I to INDEX)
- aten/src/ATen/core/ivalue_inl.h (Renamed I to INDEX)
- c10/util/Array.h (Renamed I to INDEX)
- c10/util/C++17.h (Renamed I to INDEX)
- c10/util/Metaprogramming.h (Renamed I to INDEX)
- c10/util/SmallVector.h (custom renaming)
- [x] Added complex support for Linear Algebra ops.
- SVD needed to be modified to support mixed data types.
- Example: U (std::complex<double>), S (double), V (std::complex<double>); see the sketch after this list.
- See the before and after benchmarks below (no observable change in performance).
- [x] Added complex support for Reduce ops.
- var/std computations could have been faster if it were possible to interpret a std::complex<double> Tensor as a double Tensor.
- [x] Added complex derivative support for autograd functionality.
- Derivatives are the same as those defined by the numpy-based autograd library for real(), imag(), conj(), and angle(). These functions only affect complex numbers.
- The derivative of abs() has not been modified, to avoid interfering with existing code.
- Autograd defines abs() for complex numbers and fabs() for real numbers. I will look into this further down the road.
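As referenced in the Linear Algebra item above, a minimal sketch of the mixed-data-type shape of a complex SVD result (illustrative types, not the actual ATen signatures):
```
#include <complex>
#include <vector>

// For a complex input matrix, U and V are complex while the singular values
// S are always real and non-negative.
template <typename Real>
struct SvdResult {
  std::vector<std::complex<Real>> U; // left singular vectors
  std::vector<Real> S;               // singular values
  std::vector<std::complex<Real>> V; // right singular vectors
};

// e.g. SvdResult<double> carries U, V as std::complex<double> and S as double.
```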
```
----------------------------------------
PyTorch/Caffe2 Operator Micro-benchmarks Before Changes
----------------------------------------
Tag : short
Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162339.425
Forward Execution Time (us) : 162517.479
Forward Execution Time (us) : 162847.775
----------------------------------------
PyTorch/Caffe2 Operator Micro-benchmarks After Changes
----------------------------------------
Tag : short
Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162032.117
Forward Execution Time (us) : 161943.484
Forward Execution Time (us) : 162513.786
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27653
Differential Revision: D17907886
Pulled By: ezyang
fbshipit-source-id: a88b6d0427591ec1fba09e97c880f535c5d0e513
Summary:
Hi guys,
I'd like to build Caffe2 with more supported options on Windows with Microsoft Visual Studio.
This is the first pull request.
Running scripts/build_windows_shared.bat builds Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release using the Visual Studio 14 2015 generator.
CUDA is 9.0, cuDNN is 7.0.5; glog, gflags, and lmdb are supported on my system.
Python is 3.5, and Detectron works from the Python interface as well.
It was even possible to debug Detectron code and step into caffe2_gpu.dll with PDBs built.
Unfortunately, the c10/experimental ops don't build with this Visual Studio generator, so I added a special option, INCLUDE_EXPERIMENTAL_C10_OPS (default ON), to build_windows_shared.bat to deal with it.
After this pull request, the next step is to add Visual Studio 2017 support to the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550
Reviewed By: ezyang
Differential Revision: D13042597
Pulled By: orionr
fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc