pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

Nicolas De Carli 31681bcacc [PyTorch] Pull ARM's box-cox (#164152 )

Summary:
ARM has provided with an SVE128 box-cox implementation.

It uses the same underlying algorithm as the previous version, but it has better log and exp implementations.
These supplied mathematical functions have switches to adjust the precision/speed trade-off.

We've noted a slight precision improvement, while also about a 5% peroformance increase

Before:

ZeroLambda1                                                61.66ns    16.22M
NonZeroLambda1                                            125.73ns     7.95M
NonZeroLambdaManyColumns                                    1.84ms    542.11
NonZeroLambdaEigenColumnar                                262.31us     3.81K
NonZeroLambdaEigenRowMajor                                275.17us     3.63K
NonZeroLambdaWithPyTorchColumnar                           97.43us    10.26K
NonZeroLambdaWithPyTorchRowMajor                           90.82us    11.01K
NonZeroLambdaWithPyTorchRowMajorFullBatch                  96.96us    10.31K
NonZeroLambdaBatch                                        151.84us     6.59K

After:

ZeroLambda1                                                57.85ns    17.29M
NonZeroLambda1                                            118.85ns     8.41M
NonZeroLambdaManyColumns                                    1.82ms    548.16
NonZeroLambdaEigenColumnar                                261.67us     3.82K
NonZeroLambdaEigenRowMajor                                274.53us     3.64K
NonZeroLambdaWithPyTorchColumnar                           89.12us    11.22K
NonZeroLambdaWithPyTorchRowMajor                           83.49us    11.98K
NonZeroLambdaWithPyTorchRowMajorFullBatch                  88.79us    11.26K
NonZeroLambdaBatch                                        144.74us     6.91K

Test Plan:
Correctness:

buck2 test @//mode/opt //koski/functions_contrib/df4ai/tests:batch_box_cox_test

Performance:

buck2 run @//mode/opt //koski/functions_contrib/df4ai/benchmark:boxcox_benchmark

Differential Revision:
D83485704

Privacy Context Container: L1196524

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164152
Approved by: https://github.com/ezyang

2025-10-01 15:00:03 +00:00

core

Record the XPU and XCCL build settings in the compiled binary (#147161 )

2025-05-20 09:21:39 +00:00

perfkernels

[PyTorch] Pull ARM's box-cox (#164152 )

2025-10-01 15:00:03 +00:00

serialize

[AOTI] raise PyTorchStreamWriter open failed error code on windows (#162799 )

2025-09-13 01:41:14 +00:00

utils

[3/N] Use internal linkage in C++ files (#151297 )