Mirror of https://github.com/pytorch/pytorch.git, synced 2025-10-20 21:14:14 +08:00
Add the OpenMP optimization for BatchPermutation. (#12153)
Summary: This is a Caffe2 optimization. With it, the following two ops get a significant boost (tested with Mask R-CNN on one SKX-8180 socket). BatchPermutation op: reduced from 8.296387 ms to 1.4501984 ms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12153
Differential Revision: D10362823
Pulled By: ezyang
fbshipit-source-id: 04d1486f6c7db49270992cd8cde41092154e62ee
Committed by: Facebook Github Bot
Parent: 3709734b1c
Commit: 5416260b1e
@@ -100,6 +100,13 @@ bool BatchPermutationOp<float, CPUContext>::RunOnDevice() {
   const float *src = X.template data<float>();
   float *dst = Y->template mutable_data<float>();
 
+#ifdef _OPENMP
+#if (_OPENMP >= 201307)
+#pragma omp parallel for simd
+#else
+#pragma omp parallel for
+#endif
+#endif
   for (int i = 0; i < N; i++) {
     int idx = indices.template data<int>()[i];