15 Commits

SHA1 Message Date
e1e6417d4c Add SVE implementation of embedding_lookup_idx (#133995)
Adds an accelerated version of the embedding_lookup_idx perfkernels. This is done via a Python codegen file, similar to `caffe2/perfkernels/hp_emblookup_codegen.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133995
Approved by: https://github.com/malfet, https://github.com/huydhn
2024-10-15 18:52:44 +00:00
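
A minimal sketch of the kind of SVE kernel such a codegen script emits, assuming a simple sum-pooling signature (the function name and arguments here are hypothetical; the real kernels are produced by the PR's codegen file):

```cpp
#include <arm_sve.h>
#include <cstdint>

// Hypothetical hand-written equivalent of one generated SVE kernel:
// sum the rows of `weights` selected by `indices` into `out`.
void embedding_lookup_sum_sve(
    const float* weights,    // [num_rows, block_size] embedding table
    const int64_t* indices,  // [num_indices] rows to gather
    int64_t num_indices,
    int64_t block_size,
    float* out) {            // [block_size] accumulated result
  for (int64_t d = 0; d < block_size; ++d) {
    out[d] = 0.f;
  }
  for (int64_t i = 0; i < num_indices; ++i) {
    const float* row = weights + indices[i] * block_size;
    // Predicated SVE loop: svwhilelt produces a partial predicate for
    // the tail, so no scalar remainder loop is needed.
    for (int64_t d = 0; d < block_size; d += (int64_t)svcntw()) {
      svbool_t pg = svwhilelt_b32((uint64_t)d, (uint64_t)block_size);
      svfloat32_t acc = svld1_f32(pg, out + d);
      svfloat32_t v = svld1_f32(pg, row + d);
      svst1_f32(pg, out + d, svadd_f32_x(pg, acc, v));
    }
  }
}
```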
dac0b4e62b Revert "Add SVE implementation of embedding_lookup_idx (#133995)"
This reverts commit 770c134998d3422bc2fa3b90baa235ed0c409e62.

Reverted https://github.com/pytorch/pytorch/pull/133995 on behalf of https://github.com/clee2000 due to breaking internal tests; I'm wondering if this just needs a targets change for Buck? ([comment](https://github.com/pytorch/pytorch/pull/133995#issuecomment-2414596554))
2024-10-15 17:23:50 +00:00
770c134998 Add SVE implementation of embedding_lookup_idx (#133995)
Adds an accelerated version of the embedding_lookup_idx perfkernels. This is done via a Python codegen file, similar to `caffe2/perfkernels/hp_emblookup_codegen.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133995
Approved by: https://github.com/malfet, https://github.com/huydhn
2024-10-14 10:17:27 +00:00
2e4c89eba9 [torch] Unify batch_box_cox implementations into perfkernels folder (#86569)
Summary:
1) Adds an MKL/AVX2-based implementation to perfkernels. This implementation is similar to caffe2/operators/batch_box_cox_op.cc
2) Migrates caffe2's batch_box_cox_op to use this implementation

Test Plan: CI

Differential Revision: D40208074

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86569
Approved by: https://github.com/hyuen
2022-10-23 19:29:25 +00:00
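
For reference, the transform this kernel computes is the two-parameter Box-Cox used by the caffe2 operator; a scalar sketch follows (the MKL/AVX2 version vectorizes the pow/log over whole rows):

```cpp
#include <cmath>
#include <cstdint>

// Scalar reference for the batched Box-Cox transform:
//   y = ((x + lambda2)^lambda1 - 1) / lambda1   if lambda1 != 0
//   y = ln(x + lambda2)                         otherwise
// lambda1/lambda2 are per-column parameters.
void batch_box_cox_ref(
    int64_t rows, int64_t cols,
    const float* data,     // [rows, cols]
    const float* lambda1,  // [cols]
    const float* lambda2,  // [cols]
    float* out) {          // [rows, cols]
  for (int64_t i = 0; i < rows; ++i) {
    for (int64_t j = 0; j < cols; ++j) {
      const float x = data[i * cols + j] + lambda2[j];
      out[i * cols + j] = (lambda1[j] == 0.f)
          ? std::log(x)
          : (std::pow(x, lambda1[j]) - 1.f) / lambda1[j];
    }
  }
}
```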
06d1be2447 [NOOP][clangformat][codemod] Enable CLANGFORMAT for caffe2/caffe2/* (#67624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67624

Test Plan: Visual inspection. Sandcastle.

Reviewed By: malfet

Differential Revision: D31986628

fbshipit-source-id: c872bded7325997a2945dbf5d4d052628dcb3659
2021-11-02 22:14:04 -07:00
7576cf8d00 [caffe2] Use cpuinfo in perfkernels to simplify build dependency (#36371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36371

It allows us to drop a circular dependency and remove unknown_symbols in the Buck build.

It'd be good to get rid of GetCpuId altogether in favor of cpuinfo, but that's not really blocking anything.

Reviewed By: malfet

Differential Revision: D20958000

fbshipit-source-id: ed17a2a90a51dc1adf9e634af56c85f0689f8f29
2020-04-10 13:26:34 -07:00
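
The cpuinfo API used for this kind of runtime check is small; a minimal sketch:

```cpp
#include <cpuinfo.h>
#include <cstdio>

// Query host CPU features with cpuinfo before dispatching to a kernel.
int main() {
  if (!cpuinfo_initialize()) {
    std::fprintf(stderr, "cpuinfo initialization failed\n");
    return 1;
  }
  std::printf("AVX2:    %s\n", cpuinfo_has_x86_avx2() ? "yes" : "no");
  std::printf("AVX512F: %s\n", cpuinfo_has_x86_avx512f() ? "yes" : "no");
  return 0;
}
```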
e372f42110 [caffe2] Explicit vectorization of LSTM operator (#35556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35556

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35542

Applies explicit vectorization to the lstm_unit operator.
Enabled by -DENABLE_VECTORIZATION=1

This optimization requires vector library support and was tested with Intel SVML & clang. However, any compiler that supports OpenMP 4.5 with the omp simd extension should also benefit.

After the code changes:

In file included from caffe2/caffe2/operators/lstm_unit_op.cc:1:
caffe2/caffe2/operators/lstm_unit_op.h:60:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {

caffe2/caffe2/operators/lstm_unit_op.h:112:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {

Test Plan:
Checked failures in OSS CI:
- No build failures related to this change
- Failing tests:
  - py3.6-clang7-rocmdeb-ubuntu16.04-test2
    > RuntimeError: fft: ATen not compiled with MKL support
  - caffe2_onnx_ort2_py3_6_clang7_ubuntu16_04_test
    > gradient_check_test.py::TestMakeTwo
    > Exited with code exit status 1
  - pytorch_macos_10_13_py3_test, with test errors like:
    > ERROR [0.014s]: test_boolean_indexing_weirdness_cpu (__main__.NumpyTestsCPU)
    > RuntimeError: shape mismatch: indexing tensors could not be broadcast together with shapes [0], [2]
  - caffe2_onnx_ort1_py3_6_clang7_ubuntu16_04_test
    - No failure info

Reviewed By: jspark1105

Differential Revision: D20484640

fbshipit-source-id: 8fb82dbd6698c8de3e0bbbc0b48d15b70e36ca94
2020-04-01 17:19:56 -07:00
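
A plausible definition of the `VECTOR_LOOP` macro seen in the remarks above, gated on the build flag (a sketch, not necessarily the exact macro in the tree):

```cpp
// With clang, `#pragma clang loop vectorize(enable)` requests
// vectorization of the following loop; an OpenMP 4.5 build could use
// _Pragma("omp simd") instead.
#if defined(ENABLE_VECTORIZATION)
#define VECTOR_LOOP _Pragma("clang loop vectorize(enable)")
#else
#define VECTOR_LOOP
#endif

// Usage, as in lstm_unit_op.h:
//   VECTOR_LOOP for (int d = 0; d < D; ++d) { ... }
```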
55511004d1 Resolve errors in perfkernel for Windows (#16031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16031

1. MSVC only has _mm_prefetch(const char*, int). Fixed in both the Python codegen and the C++ files.
2. uint32_t in "cvtsh_ss_bugfix.h" requires "#include <cstdint>".
3. Some files use gflags headers. Add the dependency via c10.
4. Isolate arch flags with an interface library and private compile options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15753

Reviewed By: dskhudia

Differential Revision: D13636233

Pulled By: jspark1105

fbshipit-source-id: cdcbd4240e07b749554a2a5676c11af88f23c31d
2019-01-16 21:51:00 -08:00
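
Item 1 boils down to a cast; a sketch of a portable wrapper:

```cpp
#include <xmmintrin.h>

// MSVC's _mm_prefetch takes a const char*, so cast explicitly; this
// compiles on MSVC, gcc, and clang alike.
template <typename T>
inline void prefetch_t0(const T* p) {
  _mm_prefetch(reinterpret_cast<const char*>(p), _MM_HINT_T0);
}
```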
1e0eab5df8 minimize header file includes from _avx2.cc (#14950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14950

Minimize the number of headers included from _avx2.cc files, to avoid accidentally compiling functions that are defined in header files and reused by other translation units, which can lead to illegal-instruction errors.

Reviewed By: dskhudia

Differential Revision: D13394483

fbshipit-source-id: 67149a6fb51f7f047e745bfe395cb6dd4ae7c1ae
2018-12-13 00:18:11 -08:00
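
A sketch of the hazard this guards against (file names hypothetical):

```cpp
// common.h -- a shared header with an inline function definition.
inline float scaled_sum(const float* x, int n, float s) {
  float acc = 0.f;
  for (int i = 0; i < n; ++i) acc += s * x[i];
  return acc;
}

// If foo_avx2.cc (compiled with -mavx2) includes common.h, the compiler
// may emit AVX2 instructions for scaled_sum in that translation unit.
// The linker keeps only one copy of an inline function, and if the AVX2
// copy wins, callers from plain translation units hit SIGILL on CPUs
// without AVX2. Minimizing includes in *_avx2.cc avoids emitting such
// shared symbols with AVX2 code.
```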
0573ef664e include avx512vl to avx512 code path (#14733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14733

We often also want to use the AVX512VL instruction set.
We already include AVX512F and AVX512DQ.
Skylake also has AVX512BW and AVX512CD, which we may want to add later.

Reviewed By: duc0

Differential Revision: D13317282

fbshipit-source-id: 82c8e401d82d5c3a5452fb4ccb6e5cb88d242bda
2018-12-05 00:50:51 -08:00
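
A sketch of gating code on the extra feature macro (with compile flags along the lines of -mavx512f -mavx512dq -mavx512vl):

```cpp
#if defined(__AVX512F__) && defined(__AVX512DQ__) && defined(__AVX512VL__)
#include <immintrin.h>

// AVX512VL extends the 512-bit instruction forms to 128/256-bit
// vectors, e.g. masked loads on __m256:
static inline __m256 masked_load8(const float* p, __mmask8 m) {
  return _mm256_maskz_loadu_ps(m, p);
}
#endif
```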
b5181ba1df add avx512 option (but no avx512 kernel yet) (#14664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14664

This diff just adds a framework for adding avx512 kernels.
Please be really careful about using avx512 kernels: unless you're convinced they bring good enough *overall* speedups, they can backfire because the CPU frequency goes down.

Reviewed By: duc0

Differential Revision: D13281944

fbshipit-source-id: 04fce8619c63f814944b727a99fbd7d35538eac6
2018-12-03 12:18:19 -08:00
4b86a215ca moving simd adagrad code to perfkernels (#13549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13549

caffe2/perfkernels has a nice framework for switching between implementations optimized for different instruction sets at runtime.
This is good preparation for implementing avx512 Adagrad kernels.

Reviewed By: hyuen

Differential Revision: D12882872

fbshipit-source-id: a8f0419f6a9fd4e9b864c454dad0a80db267190c
2018-11-11 00:20:39 -08:00
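
A sketch of the perfkernels-style runtime dispatch this message refers to, with hypothetical names; feature detection is shown via cpuinfo for brevity, and both variants are scalar here only to keep the sketch self-contained (in the real tree the AVX2 variant lives in a separate _avx2.cc compiled with AVX2 flags):

```cpp
#include <cmath>
#include <cpuinfo.h>

// Scalar Adagrad update: h += g^2; w -= lr * g / (sqrt(h) + eps).
static void adagrad_update__base(int n, float* w, float* h, const float* g,
                                 float eps, float lr) {
  for (int i = 0; i < n; ++i) {
    h[i] += g[i] * g[i];
    w[i] -= lr * g[i] / (std::sqrt(h[i]) + eps);
  }
}

// Stand-in for the variant that would be compiled with -mavx2.
static void adagrad_update__avx2(int n, float* w, float* h, const float* g,
                                 float eps, float lr) {
  adagrad_update__base(n, w, h, g, eps, lr);
}

// Public entry point: pick an implementation once, on first call.
void adagrad_update(int n, float* w, float* h, const float* g,
                    float eps, float lr) {
  using Fn = void (*)(int, float*, float*, const float*, float, float);
  static const Fn fn = (cpuinfo_initialize() && cpuinfo_has_x86_avx2())
                           ? adagrad_update__avx2
                           : adagrad_update__base;
  fn(n, w, h, g, eps, lr);
}
```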
1d5780d42c Remove Apache headers from source.
* The LICENSE file contains the details, so removing the headers from individual source files.
2018-03-27 13:10:18 -07:00
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
eccddbc204 vectorized typed axpy implementation
Summary:
This adds an example of a vectorized typed axpy implementation under perfkernels.

Reviewed By: dzhulgakov

Differential Revision: D5479258

fbshipit-source-id: 469e6c8aaf2c12cdf0025bc867eb9d4cab84184f
2017-07-25 12:08:27 -07:00
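
For context, axpy computes y += a*x; "typed" here means the input and output element types may differ. A scalar sketch of the shape (hypothetical template; the perfkernels version adds runtime-dispatched vectorized variants):

```cpp
#include <cstdint>

// y[i] += a * x[i], converting each input element to the output type,
// e.g. accumulating a lower-precision table into float.
template <typename TIn, typename TOut>
void typed_axpy(int64_t n, TOut a, const TIn* x, TOut* y) {
  for (int64_t i = 0; i < n; ++i) {
    y[i] += a * static_cast<TOut>(x[i]);
  }
}
```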