pytorch/caffe2/perfkernels
Nicolas De Carli eb11d172e3 [Caffe2] Improve SVE batch box cox by 2% (#163360)
Summary:
Improve bounds checking on the exp computation, shortening the longest dependency chain by 1.
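The bounds check mentioned above guards the argument of exp. The following is a scalar Python sketch only. It is not the SVE kernel, and it does not reproduce this change. It shows the non-zero-lambda Box-Cox path written as exp(lambda * ln x), with the exp argument clamped first. It assumes the standard single-parameter form y = (x^lambda - 1) / lambda, with ln x when lambda = 0, and strictly positive inputs. The clamp bounds `EXP_ARG_MIN`/`EXP_ARG_MAX` and the helpers `clamped_exp` and `batch_box_cox` are hypothetical names chosen for illustration.

```python
import math
from typing import List

# Hypothetical clamp bounds for the exp argument (illustrative only, not the
# kernel's constants). float32 exp overflows a little above 88.7, so bounding
# the argument to this range keeps the result finite even in float32.
EXP_ARG_MIN = -87.0
EXP_ARG_MAX = 88.0


def clamped_exp(t: float) -> float:
    """exp(t) with t first bounded to [EXP_ARG_MIN, EXP_ARG_MAX]."""
    # A min/max pair bounds the argument without branching; in vector code
    # the same pattern clamps every lane of a register at once.
    return math.exp(min(max(t, EXP_ARG_MIN), EXP_ARG_MAX))


def batch_box_cox(xs: List[float], lam: float) -> List[float]:
    """Box-Cox transform of strictly positive inputs with a shared lambda."""
    if lam == 0.0:
        # Zero-lambda case: the transform reduces to ln(x).
        return [math.log(x) for x in xs]
    # Non-zero lambda: (x^lam - 1) / lam, computed with x^lam = exp(lam * ln x).
    return [(clamped_exp(lam * math.log(x)) - 1.0) / lam for x in xs]
```

For example, `batch_box_cox([1.0, 4.0], 0.5)` gives values of about `[0.0, 2.0]`. Clamping the argument before exp, rather than checking the result afterwards, keeps the overflow guard off the path that follows exp.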

Box-Cox benchmarks show about 2% higher throughput.
Precision is unchanged.

before:

NonZeroLambdaBatch                                        155.30us     6.44K

after:

NonZeroLambdaBatch                                        151.78us     6.59K
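As a quick check of the "about 2%" figure, the throughput gain can be recomputed from the two benchmark lines. This sketch reads the second column as time per iteration and the third as iterations per second. That reading is an inference from the numbers (1 / 155.30 us is about 6.44K/s), not something stated in the log.

```python
# Benchmark figures copied from the before/after lines above.
before_us, after_us = 155.30, 151.78    # time per iteration, microseconds
before_ips, after_ips = 6.44e3, 6.59e3  # iterations per second (rounded)

# Gain implied by the time column: old time / new time - 1.
gain_from_time = before_us / after_us - 1.0
# Gain implied by the rate column; coarser because 6.44K and 6.59K are rounded.
gain_from_rate = after_ips / before_ips - 1.0

print(f"throughput gain from time/iter: {gain_from_time:.2%}")
print(f"throughput gain from iters/s:   {gain_from_rate:.2%}")
```

Both estimates come out at roughly 2.3%, which is consistent with the "about 2%" in the summary.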

Test Plan:
Correctness:

buck2 test @//mode/opt //koski/functions_contrib/df4ai/tests:batch_box_cox_test

Performance:

buck2 run @//mode/opt //koski/functions_contrib/df4ai/benchmark:boxcox_benchmark

Differential Revision:
D82847111

Privacy Context Container: L1208939

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163360
Approved by: https://github.com/Skylion007
2025-09-20 06:42:26 +00:00