Summary:
1) Adding MKL/AVX2 based implementation into perfkernels. This implementation is similar to caffe2/operators/batch_box_cox_op.cc
2) Migrating batch_box_cox_op of caffe2 use this implementation
Test Plan: CI
Differential Revision: D40208074
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86569
Approved by: https://github.com/hyuen
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36371
It allows to drop circular dependency and remove unknown_symbols in Buck build.
It'd be good to get rid of GetCpuId all together in favor of cpuinfo, but it's not really blocking anything
Reviewed By: malfet
Differential Revision: D20958000
fbshipit-source-id: ed17a2a90a51dc1adf9e634af56c85f0689f8f29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35556
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35542
Apply explicit vectorization to lstm_unit operator.
Enabled by -DENABLE_VECTORIZATION=1
This optimization requires vector library support and was tested with Intel SVML & clang.
However, compiler which support OpenMP4.5 with omp simd extention should also benefit.
After the code changes
In file included from caffe2/caffe2/operators/lstm_unit_op.cc:1:
caffe2/caffe2/operators/lstm_unit_op.h:60:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {
caffe2/caffe2/operators/lstm_unit_op.h:60:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
caffe2/caffe2/operators/lstm_unit_op.h:112:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {
Test Plan:
Check failures at OSS CI
- No build failures related to this change
- Failing tests are:
- py3.6-clang7-rocmdeb-ubuntu16.04-test2
>RuntimeError: fft: ATen not compiled with MKL support
- caffe2_onnx_ort2_py3_6_clang7_ubuntu16_04_test -
>gradient_check_test.py::TestMakeTwo
Exited with code exit status 1
- pytorch_macos_10_13_py3_test , Test errors like:
> ERROR [0.014s]: test_boolean_indexing_weirdness_cpu (__main__.NumpyTestsCPU)
RuntimeError: shape mismatch: indexing tensors could not be broadcast together with shapes [0], [2]
- caffe2_onnx_ort1_py3_6_clang7_ubuntu16_04_test
- No failure info
Reviewed By: jspark1105
Differential Revision: D20484640
fbshipit-source-id: 8fb82dbd6698c8de3e0bbbc0b48d15b70e36ca94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14950
Minimize the number of headers included from _avx2.cc files to avoid accidental compilation of functions defined the header files reused by other translation units that can lead to illegal instruction errors.
Reviewed By: dskhudia
Differential Revision: D13394483
fbshipit-source-id: 67149a6fb51f7f047e745bfe395cb6dd4ae7c1ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14733
We often also want to use AVX512VL instruction sets.
We already included AVX512F, AVX512DQ.
Skylake also has AVX512BW, AVX512CD we may want to later.
Reviewed By: duc0
Differential Revision: D13317282
fbshipit-source-id: 82c8e401d82d5c3a5452fb4ccb6e5cb88d242bda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14664
This diff just adds a framework to add avx512 kernels.
Please be really really careful about using avx512 kernels unless you're convinced using avx512 will bring good enough *overall* speedups because it can backfire because of cpu frequency going down.
Reviewed By: duc0
Differential Revision: D13281944
fbshipit-source-id: 04fce8619c63f814944b727a99fbd7d35538eac6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13549
caffe2/perfkernels has a nice framework to switch btw implementations optimized for different instructions at runtime.
This can be a good preparation to implement avx512 adagrad kernels.
Reviewed By: hyuen
Differential Revision: D12882872
fbshipit-source-id: a8f0419f6a9fd4e9b864c454dad0a80db267190c
Summary:
This adds an example for vectorized typed axpy implementation under
perfkernels.
Reviewed By: dzhulgakov
Differential Revision: D5479258
fbshipit-source-id: 469e6c8aaf2c12cdf0025bc867eb9d4cab84184f