Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531
Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services;
we would like to change it to C10_LOG_FIRST_N to prevent that.
Reviewed By: dzhulgakov
Differential Revision: D14647704
fbshipit-source-id: b84e4002bd4aa94d616133cd1049c3d4ab05386e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18284
ghimport-source-id: 5a92c03fda19072ffb6afd40e0f56806716c7be6
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18296 [jit] Add namespacing for ScriptClasses
* **#18284 [jit] make test module hook use save/load**
* #18211 [jit] Turn script_type_parser into a class
* #18148 [jit] python interop for script classes
Instead of python-printing and comparing strings (which does not capture
dependency information, etc.), use save/load on in-memory buffers and
compare the main module contents inside the buffer.
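For illustration, a minimal sketch of the save/load round trip this refers to (the helper name and comparison here are hypothetical, not the actual test hook):
```python
import io
import torch

# Hypothetical sketch: serialize a script module to an in-memory buffer,
# reload it, and inspect the reloaded module's code instead of comparing
# python-printed strings.
def roundtrip_code(module):
    buffer = io.BytesIO()
    torch.jit.save(module, buffer)
    buffer.seek(0)
    return torch.jit.load(buffer).code

scripted = torch.jit.script(torch.nn.Linear(2, 2))
assert roundtrip_code(scripted) == roundtrip_code(scripted)
```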
Reviewed By: ailzhang
Differential Revision: D14581129
fbshipit-source-id: 52264ae9ce076775ab3fd1a0c32c8d6f6677a903
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18314
ghimport-source-id: 8cecb768d476ab19c9460f39c8f94a764e4cb052
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18314 Add ability to specialize class types to ArgumentSpec**
* #18226 Add Slot type to abstract the raw pointers being used for slots.
Differential Revision: D14574395
fbshipit-source-id: cc3af6e56e9ae52990f4a1ad56ecceaa2d493577
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740
Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure:
* benchmark_core.py : core utilities for running microbenchmark tests
* benchmark_caffe2.py : Caffe2-specific benchmark utilities
* benchmark_pytorch.py : PyTorch-specific benchmark utilities
* benchmark_runner.py : main entry point. Currently it can run the microbenchmark tests in stand-alone mode. The next step is to integrate this with AI-PEP.
The utilities are located at https://github.com/pytorch/pytorch/tree/master/test to have access to both the Caffe2 and PyTorch Python frontends.
Includes two operator microbenchmarks, supporting both Caffe2 and PyTorch:
* MatMul
* Add
Reference: PyTorch benchmarks: https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with two example binary operators, MatMul and Add, but eventually we should also cover unary operators like those in the PyTorch benchmark repo.
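For a rough idea of what such a microbenchmark measures, here is a standalone sketch using plain timeit; it is not the benchmark_core/benchmark_runner interface, just an illustration:
```python
import timeit
import torch

# Standalone sketch only; the real harness lives in benchmark_core.py and
# benchmark_runner.py and has its own registration/reporting machinery.
def bench(name, fn, iters=1000):
    fn()  # warm-up
    elapsed = timeit.timeit(fn, number=iters)
    print(f"{name}: {elapsed / iters * 1e6:.2f} us/iter")

a = torch.randn(64, 64)
b = torch.randn(64, 64)
bench("MatMul", lambda: torch.matmul(a, b))
bench("Add", lambda: torch.add(a, b))
```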
Reviewed By: zheng-xq
Differential Revision: D13887111
fbshipit-source-id: b7a56b95448c9ec3e674b0de0ffb96af4439bfce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18166
ghimport-source-id: a8e2ba2d966e49747a55701c4f6863c5e24d6f14
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18166 Bool Tensor for CUDA**
* #18165 Resolved comments from Bool Tensor for CPU PR
------
This PR enables bool tensor creation and some basic operations for the CUDA backend. This is a part of the Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
a) CPU [Done]
b) CUDA [This PR]
3. Tensor Conversions.
4. Tensor Indexing.
5. Tensor Operations.
6. Back compatibility related changes.
Change:
Enable bool tensor in CUDA with the following operations:
torch.zeros
torch.tensor
torch.ones
torch.rand/rand_like/randint/randint_like
torch.full
torch.full_like
torch.empty
torch.empty_like
Tested via unit tests and local scripts.
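For example, a few of these factory functions with a bool dtype on a CUDA device (illustrative only, requires a CUDA-capable machine):
```python
import torch

# Illustrative usage of some of the newly enabled factories with dtype=torch.bool.
mask = torch.zeros(4, dtype=torch.bool, device="cuda")
ones = torch.ones_like(mask)
flags = torch.tensor([True, False, True], device="cuda")
empty = torch.empty(2, 3, dtype=torch.bool, device="cuda")
filled = torch.full((2, 2), True, dtype=torch.bool, device="cuda")
```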
Differential Revision: D14605104
fbshipit-source-id: b7d7340a7d70edd03a109222d271e68becba762c
Summary:
To debug a `one of the variables needed for gradient computation has been modified by an inplace operation` error, I wanted to know *which* variable had been modified, so I extended the error message with the information that is easily available at that point.
Before:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
```
After:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [80, 1]], which is output 0 of UnsqueezeBackward0, is at version 1, not expected version 0. Hint: enable anomaly detection to find the forward pass operation which modified it.
```
The hint to enable anomaly detection is only shown when it is not enabled. It's meant to save people some googling. I'd even go further and reference `torch.autograd.set_detect_anomaly(True)`, but maybe we're not running Python?
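As an illustration (a minimal, hypothetical repro, not taken from the PR), the new message points at situations like this:
```python
import torch

# Minimal sketch of the kind of failure the new message describes: the output
# of an unsqueeze is saved for backward by pow, then modified in-place.
with torch.autograd.set_detect_anomaly(True):
    x = torch.randn(80, 1, requires_grad=True)
    h = x * 1.0            # non-leaf intermediate
    y = h.unsqueeze(0)     # output 0 of UnsqueezeBackward0
    z = y.pow(2)           # pow saves y for its backward pass
    y.add_(1)              # in-place change bumps y's version counter
    z.sum().backward()     # raises the RuntimeError quoted above
```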
Disclaimer: I haven't looked at other parts of the code to check whether using `std::stringstream` is acceptable practice; let me know if it isn't. Similarly, I haven't checked the indentation conventions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18523
Differential Revision: D14683249
Pulled By: soumith
fbshipit-source-id: f97a99d4aabea7461df766d66cd72300b48e2350
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18763
Without the `link_whole` flag, in opt builds some of the files are not linked into the `_C_impl` library, which causes some static initializers not to run (namely, registering a custom Python operation from python_interpreter.cpp). This diff fixes it.
Differential Revision: D14732471
fbshipit-source-id: 57cff6b4b6d479ad7ab7fd29f677746d91d6ff45
Summary:
Fix the bug introduced by #18681 where an undefined variable was being used to limit the maximum CPU count when building for Windows without Ninja.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18748
Differential Revision: D14733209
Pulled By: soumith
fbshipit-source-id: 52fc0dd4dde99da75a6956b63f02da2e647eed4f
Summary:
Argument dim=-1 doesn't work for torch.cross. The signature of torch.cross has been changed to take c10::optional<int64_t> dim instead of int64_t. So, per the documentation ("If dim is not given, it defaults to the first dimension found with the size 3."), the default is used when dim is omitted, and if dim is specified (even a negative one) the corresponding dimension is used.
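For example, with the fix the following behaves as expected (illustrative only):
```python
import torch

a = torch.randn(4, 3)
b = torch.randn(4, 3)

# A negative dim now selects the same dimension as its positive counterpart.
assert torch.equal(torch.cross(a, b, dim=-1), torch.cross(a, b, dim=1))

# With dim omitted, the first dimension of size 3 is used (dim=1 here).
c = torch.cross(a, b)
```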
Fixes #17229
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17582
Differential Revision: D14483063
Pulled By: ifedan
fbshipit-source-id: f9699093ec401cb185fd33ca4563c8a46cdcd746
Summary:
Multiple configurations are the default (e.g. Release;Debug) on Windows, and this check always broke that configuration because CMAKE_BUILD_TYPE was not set. The workaround was to always set CMAKE_BUILD_TYPE to Debug or Release, which was very unfortunate.
The correct method is to use generator expressions that expand depending on the current CONFIG being processed.
Side note: Anywhere else CMAKE_BUILD_TYPE is checked should probably be fixed too.
Note that the CMakeLists.txt forces it into Release mode. However, I came across this error when importing the prebuilt Config into another project, where CMAKE_BUILD_TYPE was not set.
> 3>CMake Error at pre_built/pytorch-1.0.1/share/cmake/Caffe2/public/cuda.cmake:380 (message):
> 3> Unknown cmake build type:
Proper support for configurations would mean we can build Debug and Release at the same time, and as you can see, it is less CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18548
Differential Revision: D14730790
Pulled By: ezyang
fbshipit-source-id: 70ae16832870d742c577c34a50ec7564c3da0afb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18577
This is also part of the legacy API and we need to support it if we want to replace it.
Reviewed By: dzhulgakov
Differential Revision: D14671432
fbshipit-source-id: 007abf4ab816647a509fc08e35d79b6c1aa55b03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18551
This is helpful for defining a set of operators as an interface but not adding concrete kernels just yet.
The registration logic will ensure that any other libraries that add kernels for these schemas exactly match the schema defined here.
Reviewed By: dzhulgakov
Differential Revision: D14660208
fbshipit-source-id: 7adb5a4876cff5a0ad21d92d8c450cb889f00cc3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18550
When the operator registration API is used wrongly, in most cases we should now get a nice compiler error
instead of weird template error messages.
This is done by making the enable_if conditions more broad so they also match error cases,
but then having static_asserts against these error cases inside the function.
Before this change, since the function didn't match, the error message said something like "no function found to match your call";
now it will show the error message specified in the static_asserts.
Reviewed By: dzhulgakov
Differential Revision: D14659178
fbshipit-source-id: 7ca4fb72d9051eadf0a7e2717b962bf1213a52b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18547
- Argument indices in the error messages are 1-indexed, not 0-indexed.
- Add test cases verifying that a mismatching signature actually shows the correct error messages.
Reviewed By: dzhulgakov
Differential Revision: D14656695
fbshipit-source-id: 55e45634baa3117e18b8687ea6b2a2f83715bdf6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18703
`zeroPtr` is sometimes a `std::string` tensor, so `memset` to 0 is undefined behavior.
This might be accidentally safe with `std::string` implementations that use SSO (Small String Optimization), but it will crash otherwise.
Reviewed By: zheng-xq
Differential Revision: D14714458
fbshipit-source-id: 012a18464e6514d38ff791509b88ddc3fc55b2b1
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.
Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```
Part of the bigger: `Remove Storage` plan.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455
Reviewed By: ezyang
Differential Revision: D14672084
Pulled By: VitalyFedyunin
fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
Summary:
Hi. It seems that when building C++ extensions with CUDA for Windows, the `extra_cuda_cflags` options are not properly forwarded to `nvcc`.
Extra CUDA options are necessary to build, for instance, InplaceABN (https://github.com/mapillary/inplace_abn), which requires the `--expt-extended-lambda` option.
This PR adds one line that correctly appends `extra_cuda_cflags`.
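For reference, a hedged usage sketch of how these flags are typically passed via torch.utils.cpp_extension.load (the source file names are placeholders):
```python
from torch.utils.cpp_extension import load

# Placeholder source names; the point is that extra_cuda_cflags now reaches
# nvcc on Windows as well.
ext = load(
    name="inplace_abn_ext",
    sources=["inplace_abn.cpp", "inplace_abn_kernels.cu"],
    extra_cuda_cflags=["--expt-extended-lambda"],
    verbose=True,
)
```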
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18638
Differential Revision: D14704270
Pulled By: ezyang
fbshipit-source-id: e1e330d193d9afd5707a5437a74c0499460d2b90
Summary:
At some point, we needed these functions to deal with autograd dispatching to the sparse or TH version of a backward. But we rewrote all backward definitions in terms of native functions, so this is no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18696
Differential Revision: D14710834
Pulled By: gchanan
fbshipit-source-id: b22568c58eefc79d672555bd8832398ccd965cb7
Summary:
Added stubs for:
* The `device` module
* The `cuda` module
* Parts of the `optim` module
* Began adding stubs for the `autograd` module. I'll annotate more later but `no_grad` and friends are probably the most used exports from it so it seemed like a good place to start.
This would close #16996, although comments on that issue reference other missing stubs, so maybe it's worth keeping it open as an umbrella issue.
The big remaining missing package is `nn`.
Also added a `py.typed` file so mypy will pick up on the type stubs. That closes #17639.
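As an illustration (not from the PR), user code like the following can now be type-checked by mypy against the bundled stubs:
```python
# Hypothetical example; torch.optim and torch.cuda are among the stubbed modules.
import torch

def training_step(opt: torch.optim.Optimizer, loss: torch.Tensor) -> None:
    opt.zero_grad()
    loss.backward()
    opt.step()

if torch.cuda.is_available():
    torch.cuda.synchronize()
```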
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18511
Differential Revision: D14715053
Pulled By: ezyang
fbshipit-source-id: 9e4882ac997063650e6ce47604b3eaf1232c61c9
Summary:
Peephole optimize ops that just require Dimensioned Tensor Type, which is what we specialize graphs on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18549
Differential Revision: D14690827
Pulled By: eellison
fbshipit-source-id: 9d7439eb584f0a5b877f5aa53cf80150f00e7e5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18542
This adds the deprecated API for defining kernels as lambdas. The new API for defining kernels as lambdas was introduced in D14653005.
Reviewed By: dzhulgakov
Differential Revision: D14653551
fbshipit-source-id: 99900f1436716c69e52c83b68333b642ec2c8558
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18444
This adds the deprecated function based API to c10::RegisterOperators().
This is the API currently exposed under jit::RegisterOperators() and we need to support it for backwards compatibility.
Reviewed By: dzhulgakov
Differential Revision: D14514218
fbshipit-source-id: c77676851cfd431d66f18fd8038cf153a3a7d7cc
Summary:
This commit adds the `c10d::Reducer` class that hooks into autograd
and performs gradient bucketing and reduction. These are the core
parts of `nn.parallel.DistributedDataParallel` that up to now were
only usable for CUDA models.
This should enable the following:
* Distributed data parallelism for models defined using the C++ frontend.
* Allow overlap of gradient computation and reduction for non-CUDA models.
* Enable distributed data parallelism for models with some unused parameters.
This does not include any logic for computing bucket assignment, which
can be done separately: either by observing autograd execution order
(this is what Apex does), by assigning buckets based on some maximum
byte size, or both.
Also see #17757 and #13273.
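For context, a hedged sketch of the Python-side usage this generalizes to non-CUDA models (single process, gloo backend; the rendezvous settings are placeholders):
```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholder rendezvous settings for a single-process run.
dist.init_process_group(backend="gloo",
                        init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)

model = torch.nn.Linear(10, 10)      # a plain CPU model
ddp_model = DDP(model)               # no device_ids needed for CPU modules

out = ddp_model(torch.randn(4, 10))
out.sum().backward()                 # gradients are bucketed and allreduced
```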
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18251
Reviewed By: mrshenli
Differential Revision: D14571899
Pulled By: pietern
fbshipit-source-id: 20f95eefd288dfe8cfffe0a28ca22fa7c9c3cd4c
Summary:
If none of the outputs requires grad, we don't actually check gradgrad; instead, we check that their numerical gradients are 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18190
Differential Revision: D14563388
Pulled By: ifedan
fbshipit-source-id: a4eb94c9eb60f14dbe6986cd8cef1fe78a7bc839
Summary:
The last time I tried to land it, there was a merge race with the docs coverage test lol. Re-landing with the fix.
Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570
Reviewed By: driazati
Differential Revision: D14707285
Pulled By: eellison
fbshipit-source-id: 3a0265928aa8cad78961723d8bf0fbf871fdb71d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18672
On Skylake, when n < 128 or k < 128, acc16 is slower.
Reviewed By: jianyuh
Differential Revision: D14700576
fbshipit-source-id: 80ca9f1af4626637eed9c5ca49f95ae744811189