Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20740
Provide a way to assemble quantized Tensor from int8 Tensor, scale and zero point.
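The assembly described above can be sketched roughly as follows. This is a hedged illustration using PyTorch's underscore-prefixed internal helper `torch._make_per_tensor_quantized_tensor` (the exact name and signature may differ across versions); the sample values are hypothetical.

```python
import torch

# Assemble a quantized tensor from raw uint8 data, a scale, and a zero point.
raw = torch.tensor([0, 10, 20, 30], dtype=torch.uint8)
q = torch._make_per_tensor_quantized_tensor(raw, 0.5, 10)  # scale=0.5, zero_point=10

assert q.dtype == torch.quint8
assert q.q_scale() == 0.5 and q.q_zero_point() == 10
# Dequantized values follow (raw - zero_point) * scale:
assert q.dequantize().tolist() == [-5.0, 0.0, 5.0, 10.0]
```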
Differential Revision: D15232416
fbshipit-source-id: c3a3d9d7214b1dc569214c019440c2779fbd063b
Summary:
This is the first part of the planned changes to change the comparison operations' result tensor dtype from Byte to Bool. You can see the whole list of changes (not cleaned up) [here](https://github.com/pytorch/pytorch/pull/19332). As the PR is too big for a single review, I'm breaking it into pieces.
**Changes in this PR:**
1. Enable these methods for bool tensors:
- maskedSelect
- maskedSelectBool
- bitand
- cbitand
- bitor
- cbitor
- bitxor
- cbitxor
- sign
- equal
- neg
2. Add bool clause for the TH version of sign method.
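A few of the newly enabled operations can be sketched on bool tensors like this (a minimal illustration of the behavior this PR enables; the sample values are arbitrary):

```python
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, False])

# Bitwise ops on bool tensors (bitand / bitor / bitxor):
assert (a & b).tolist() == [True, False, False]
assert (a | b).tolist() == [True, True, True]
assert (a ^ b).tolist() == [False, True, True]

# maskedSelect on a bool tensor:
mask = torch.tensor([True, False, True])
assert a.masked_select(mask).tolist() == [True, True]

# equal on bool tensors:
assert torch.equal(a, a)
```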
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20767
Differential Revision: D15436446
Pulled By: izdeby
fbshipit-source-id: 8d2494b5f4873cd79c7f1a40d2cb045cadfad51a
Summary:
I didn't update the Windows references because I wasn't sure if they apply to CUDA 9. peterjc123 what should the Windows section say?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20718
Differential Revision: D15459276
Pulled By: colesbury
fbshipit-source-id: 917e22f8ac75378d88c962c226b5a42b6799c79a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20802
Need this for sequence model
Reviewed By: dzhulgakov
Differential Revision: D15448529
fbshipit-source-id: cd5abe3b689fc0e02feff10faf8cd61c99369f4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20786
Add a method to LayerModelHelper to filter metrics_schema. A general model builder may add metric schema that are not needed in some situations. This change adds the ability to skip those that are unneeded.
Reviewed By: alex1o1o7cloud
Differential Revision: D15418140
fbshipit-source-id: 520f5dffd9938cf206cb1352e2953a4d4d2b6ab1
Summary:
When detecting the presence of NumPy using import, move numpy-related variable assignments outside the try block (i.e., to an else block) to improve readability.
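The pattern this change applies can be sketched as follows (names such as `HAS_NUMPY` and `NUMPY_DTYPE` are hypothetical stand-ins): only the import that can actually fail stays in the `try` block, while the assignments that depend on a successful import move to the `else` block.

```python
try:
    import numpy as np
except ImportError:
    # NumPy is unavailable; fall back to sentinel values.
    HAS_NUMPY = False
    NUMPY_DTYPE = None
else:
    # Runs only if the import succeeded, keeping the try block minimal.
    HAS_NUMPY = True
    NUMPY_DTYPE = np.float32

print(HAS_NUMPY)
```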
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20739
Differential Revision: D15453916
Pulled By: ezyang
fbshipit-source-id: d3c37f2b290846be3c6a1462251cbb3e95d493be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20787
Set requires_grad=False for the bias; otherwise it blocks JIT tracing.
The as_type fix: the input tensor shape and output tensor shape will be different, which triggers the assertion failure at https://fburl.com/0m8xy7tc.
Reviewed By: jamesr66a
Differential Revision: D15445092
fbshipit-source-id: 22da41a56ecb9ac092585d0cc1ff0658fb9d631b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20045
This pass adds quant-dequant nodes for bias. It requires the
quant-dequant passes for activations and weights to have already run,
since their outputs are needed to compute the qparams for the bias.
Differential Revision: D15179141
fbshipit-source-id: 3aab9fceefcadc3fa42a4e802d9b1e18addad78a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20770
Add dict type since it's part of the PyTorch built-in type system, and sparse features and text features will be converted to Dict.
Reviewed By: pritamdamania87
Differential Revision: D15436255
fbshipit-source-id: 239adbd6a8f68be29020fe656d790f6872f1f0e9
Summary:
As title. We were using AT_ASSERT, which is newly deprecated. In this case, we do in fact want an internal assertion since this is used in testing code to describe expected behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20555
Differential Revision: D15362964
Pulled By: suo
fbshipit-source-id: 984bfe71a774571611f3bbd81767d3cdb878a6fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20345
Separate from D15194600.
Optimize PyTorch layer_norm op, part 1:
optimize layer_norm_forward_cpu;
use Eigen Maps to improve reduction performance.
Reviewed By: zheng-xq
Differential Revision: D15290608
fbshipit-source-id: cf2c208dfd6fbcbc4c69db3ed60278d9bee156b5
Summary:
The previous implementation of magic methods extended from BuiltinOperators, but it should be able to work with other sugared values, such as casts.
I also considered making CastValue and BuiltinOperators extend from a MagicMethod superclass, and having them try to call into the superclass before their own call. However, not all builtin operators have corresponding magic methods, so I did it this way instead (although there are workarounds for that).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20654
Differential Revision: D15434469
Pulled By: eellison
fbshipit-source-id: 813fa00bf8b5b9ada46505075ebf984d8eee6aef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20711
For uint8_t, `std::numeric_limits<uint8_t>::digits` returns 8;
for int8_t, `std::numeric_limits<int8_t>::digits` returns 7.
FBGEMM expects `qparams.precision` to always be 8 for both int8_t and uint8_t.
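The discrepancy comes from `digits` counting only value bits, which excludes the sign bit for signed types; FBGEMM instead wants the full 8-bit storage width. A small Python analog of the C++ behavior (the helper name is hypothetical):

```python
def numeric_limits_digits(bits, signed):
    """Mimics std::numeric_limits<T>::digits: value bits, excluding the sign bit."""
    return bits - 1 if signed else bits

assert numeric_limits_digits(8, signed=True) == 7   # int8_t
assert numeric_limits_digits(8, signed=False) == 8  # uint8_t
```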
Reviewed By: jerryzh168
Differential Revision: D15410695
fbshipit-source-id: 17dc3842d7c426947454c201bcb167b87b7301ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20726
Edward says it doesn't actually provide compilers,
but it does provide dependencies, so let's mention that instead.
Reviewed By: ezyang
Differential Revision: D15423316
fbshipit-source-id: 9b384f88e5bf7a3d2c132508620c276b49e1569f
Summary:
This PR implements auto-conversion of GPU arrays that support the `__cuda_array_interface__` protocol (fixes #15601).
If an object exposes the `__cuda_array_interface__` attribute, `torch.as_tensor()` and `torch.tensor()` will use the exposed device memory.
#### Zero-copy
When using `torch.as_tensor(..., device=D)` where `D` is the same device as the one used in `__cuda_array_interface__`.
#### Implicit copy
When using `torch.as_tensor(..., device=D)` where `D` is the CPU or another non-CUDA device.
#### Explicit copy
When using `torch.tensor()`.
#### Exception
When using `torch.as_tensor(..., device=D)` where `D` is a CUDA device other than the one used in `__cuda_array_interface__`.
#### Lifetime
`torch.as_tensor(obj)` grabs a reference to `obj` so that the lifetime of `obj` exceeds that of the tensor.
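For reference, the shape of the protocol looks roughly like this. `FakeCudaArray` and its pointer value are hypothetical; a real producer (e.g. a Numba device array) would expose an actual device pointer, and `torch.as_tensor(obj, device='cuda')` would then wrap it zero-copy per the rules above.

```python
class FakeCudaArray:
    """Sketch of an object exposing the __cuda_array_interface__ protocol."""
    @property
    def __cuda_array_interface__(self):
        return {
            "shape": (1024,),                 # tuple of ints
            "typestr": "<f4",                 # little-endian float32
            "data": (0x7F0000000000, False),  # (device pointer, read-only flag)
            "version": 0,                     # protocol version
        }

iface = FakeCudaArray().__cuda_array_interface__
print(sorted(iface))
```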
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20584
Differential Revision: D15435610
Pulled By: ezyang
fbshipit-source-id: c423776ba2f2c073b902e0a0ce272d54e9005286
Summary:
Appending `arch` to the generator name is not supported for VS starting from VS 2019.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20752
Differential Revision: D15436740
Pulled By: ezyang
fbshipit-source-id: 20057aae8f708d82619927bf2cb87dd1bc2df312
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20737
If someone tries to register multiple kernels in the same .op() call, we're now throwing an error.
Differential Revision: D15425660
fbshipit-source-id: 6d2f1444da3e16a6a98863d847965c2aa211e046
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20674
A few targets in caffe2/caffe2/distribute need to be split too; otherwise they won't compile. Also some cleanups, and rename select_gpu_type to gpu_library_selector.
Differential Revision: D15406019
fbshipit-source-id: 6455ab885b248502b48d4c7565597e00fecfd547
Summary:
Let there be color!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20662
Differential Revision: D15434110
Pulled By: suo
fbshipit-source-id: a317ae72ad72e0b8249f55c9c8d31f420c78c040
Summary:
When building with CUDA and gcc 4.8.5-28, we see many warnings like:
[893/1645] Building NVCC (Device) object caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/THCUNN/caffe2_gpu_generated_ELU.cu.o
/home/bvaughan/repos/pytorch/c10/util/ArrayRef.h:277:48: warning: ‘deprecated’ attribute directive ignored [-Wattributes]
using IntList C10_DEPRECATED_USING = ArrayRef<int64_t>;
This change prevents those warnings on the older compiler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20587
Differential Revision: D15432749
Pulled By: nairbv
fbshipit-source-id: fd707afcbd6564f96617378d7cd6d62d941a052b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20468
The ScalarType node is mandatory for activations and parameters now.
This change inserts a ScalarType node for all the quant-dequant nodes. For the activations, the current default value is at::ScalarType::Undefined. Remove this and explicitly pass the at::ScalarType::QUint8 dtype.
Differential Revision: D15331600
fbshipit-source-id: 5b51e0b42e694bf409026af4783a12da6d7e234b
Summary:
Copy.cu goes from 308 to 190 lines of code. In general it uses the same
copy strategy: cudaMemcpyAsync, a pointwise kernel, or a copy
using temporary buffers. The pointwise kernel has slightly improved
performance when broadcasting due to faster index calculation.
This deletes "`s_copy_`", "`_s_copy_from`", and "`_copy_same_type_`". The only
entry-point now is "`copy_`".
A mini-benchmark is here:
https://gist.github.com/colesbury/706de1d4e8260afe046020988410b992
Before:
https://gist.github.com/colesbury/ab454b6fe3791bff420d7bcf8c041f18
After:
https://gist.github.com/colesbury/9024d242b56ab09a9ec985fa6d1620bc
Results were measured on 2.2 GHz Broadwell; no-turbo; one thread;
compiled with GCC 7.3.0. (Results are slower than typical usage due to
turbo being off.)
The only significant difference is in the CUDA [1024] -> [1024, 1024]
broadcasting copy, which is ~25% faster. I don't expect a noticeable
difference in real programs.
CPU copy overhead is a tiny bit (~200 ns) faster, but I don't expect
anyone to notice that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20685
Differential Revision: D15414819
Pulled By: colesbury
fbshipit-source-id: d3c6e04a5020470e3bef15b1fc09503cae5df440