mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

yewentao256 fd6655a0f5 Feature: Implement support for cudnn_batch_norm_out kernel to replace the autogen approach. (#123020)

Fixes #115611

The autogenerated out kernel may cause a redundant copy, so this change implements a dedicated kernel to improve efficiency.

Test Case:

```c++
#include <torch/torch.h>
#include <iostream>
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

int main() {
    auto input = torch::rand({2, 3, 4, 4}, torch::device(torch::kCUDA));
    auto weight = torch::randn({3}, torch::device(torch::kCUDA));
    auto bias = torch::randn({3}, torch::device(torch::kCUDA));
    auto running_mean = torch::zeros({3}, torch::device(torch::kCUDA));
    auto running_var = torch::ones({3}, torch::device(torch::kCUDA));

    bool training = true;
    double exponential_average_factor = 0.1;
    double epsilon = 1e-5;

    auto output = torch::empty_like(input);
    auto save_mean = torch::empty({3}, torch::device(torch::kCUDA));
    auto save_var = torch::empty({3}, torch::device(torch::kCUDA));
    auto reserve = torch::empty({0}, torch::device(torch::kCUDA)); // empty place-holder

    at::native::cudnn_batch_norm_out(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon, output, save_mean, save_var, reserve);
    auto outputs = at::native::cudnn_batch_norm(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon);

    bool is_close_output = torch::allclose(output, std::get<0>(outputs));
    bool is_close_save_mean = torch::allclose(save_mean, std::get<1>(outputs));
    bool is_close_save_var = torch::allclose(save_var, std::get<2>(outputs));
    bool is_close_reserve = torch::allclose(reserve, std::get<3>(outputs));

    std::cout << "Is output close: " << is_close_output << std::endl;
    std::cout << "Is save_mean close: " << is_close_save_mean << std::endl;
    std::cout << "Is save_var close: " << is_close_save_var << std::endl;
    std::cout << "Is reserve close: " << is_close_reserve << std::endl;

    return 0;
}
```

Please CC @albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123020
Approved by: https://github.com/andrewor14, https://github.com/eqy, https://github.com/albanD

2025-08-04 22:40:33 +00:00

any.cpp

…

autograd.cpp

c10::optional -> std::optional (#142514)

2024-12-12 17:23:46 +00:00

CMakeLists.txt

Fix some CMake issues (#153686)

2025-05-19 00:31:34 +00:00

dataloader.cpp

[Lint] Update clang-format to 19.1.4 (#153889)

2025-05-20 14:12:46 +00:00

dispatch.cpp

[Environment Variable][2/N] Use thread-safe setenv wrapper (#124485)

2024-10-04 07:30:51 +00:00

enum.cpp

…

expanding-array.cpp

…

fft.cpp

…

functional.cpp

Expose bicubic mode for torch::nn::functional::grid_sample in LibTorch (#150817)

2025-04-21 08:55:27 +00:00

grad_mode.cpp

…

inference_mode.cpp

…

init_baseline.h

…

init_baseline.py

[BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758)

2024-07-31 10:54:03 +00:00

init.cpp

…

integration.cpp

C10_UNUSED to [[maybe_unused]] (#6357) (#138364)

2024-10-19 13:17:43 +00:00

ivalue.cpp

Use object identity for deepcopy memo (#126126)

2024-05-17 00:06:26 +00:00

jit.cpp

…

memory.cpp

[codemod] c10:optional -> std::optional (#126135)

2024-05-14 19:35:51 +00:00

meta_tensor.cpp

…

misc.cpp

…

module.cpp

torch::optional -> std::optional (#138987)

2024-10-28 19:09:46 +00:00

moduledict.cpp

…

modulelist.cpp

…

modules.cpp

[2/N] Fix cppcoreguidelines-init-variables suppression (#146237)

2025-06-19 23:26:42 +00:00

namespace.cpp

…

nested_int.cpp

…

nested.cpp

…

nn_utils.cpp

C10_UNUSED to [[maybe_unused]] (#6357) (#138364)

2024-10-19 13:17:43 +00:00

operations.cpp

C10_UNUSED to [[maybe_unused]] (#6357) (#138364)

2024-10-19 13:17:43 +00:00

optim_baseline.h

…

optim_baseline.py

UFMT formatting on test/autograd test/ao test/cpp test/backends (#123369)

2024-04-05 18:51:38 +00:00

optim.cpp

[14/N] Fix extra warnings brought by clang-tidy-17 (#141644)

2024-12-13 06:22:13 +00:00

ordered_dict.cpp

…

parallel_benchmark.cpp

…

parallel.cpp

torch::optional -> std::optional (#138987)

2024-10-28 19:09:46 +00:00

parameterdict.cpp

…

parameterlist.cpp

…

README.md

…

rnn.cpp

Fix broken URLs (#152237)

2025-04-27 09:56:42 +00:00

sequential.cpp

…

serialize.cpp

Remove more unused variables in tests (#127510)

2024-05-31 03:39:45 +00:00

special.cpp

…

static.cpp

std::value/std::type -> std::_v/std::_t (#138746)

2024-10-26 20:59:24 +00:00

support.cpp

…

support.h

…

tensor_cuda.cpp

Feature: Implement support for cudnn_batch_norm_out kernel to replace the autogen approach. (#123020)

2025-08-04 22:40:33 +00:00

tensor_flatten.cpp

…

tensor_indexing.cpp

…

tensor_options_cuda.cpp

…

tensor_options.cpp

…

tensor.cpp

[Lint] Update clang-format to 19.1.4 (#153889)

2025-05-20 14:12:46 +00:00

torch_include.cpp

…

transformer.cpp

[BE][3/6] fix typos in test/ (#157637)

2025-07-17 12:08:33 +00:00

README.md

C++ Frontend Tests

This folder contains the tests for PyTorch's C++ Frontend. They use the GoogleTest framework.

CUDA Tests

To make a test runnable only on platforms with CUDA, you should suffix your test with _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_CUDA) { }

To make it runnable only on platforms with at least two CUDA devices, suffix it with _MultiCUDA instead of _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_MultiCUDA) { }

There is logic in main.cpp that detects the availability and number of CUDA devices and supplies the appropriate negative filters to GoogleTest.
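The filtering described above can be sketched roughly as follows. This is not the contents of main.cpp, only an illustration of the idea: the helper names `add_negative_pattern` and `filter_for_devices` are hypothetical, and the `device_count` parameter stands in for a real query of the available CUDA devices. It relies on GoogleTest's filter syntax, in which patterns after a `-` are excluded and multiple patterns are separated by `:`.

```cpp
#include <string>

// Hypothetical helper: append a negative pattern to a GoogleTest filter
// string of the form "positive[:positive...][-negative[:negative...]]".
std::string add_negative_pattern(std::string filter, const std::string& pattern) {
  if (filter.find('-') == std::string::npos) {
    filter.push_back('-');  // no negative section yet, so start one
  } else {
    filter.push_back(':');  // extend the existing negative section
  }
  filter += pattern;
  return filter;
}

// Hypothetical policy: device_count is an assumed input standing in for
// whatever reports the number of CUDA devices on the machine.
std::string filter_for_devices(const std::string& filter, int device_count) {
  if (device_count < 1) {
    // No CUDA at all: exclude both single- and multi-device tests.
    return add_negative_pattern(filter, "*_CUDA:*_MultiCUDA");
  }
  if (device_count < 2) {
    // Exactly one device: exclude only the multi-device tests.
    return add_negative_pattern(filter, "*_MultiCUDA");
  }
  return filter;  // two or more devices: run everything
}
```

In a real test binary, the resulting string would presumably be assigned to GoogleTest's filter flag (for example via `::testing::GTEST_FLAG(filter)`) after `::testing::InitGoogleTest` has parsed the command line and before `RUN_ALL_TESTS()` is called, so that a user-supplied filter is preserved and only extended.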

Integration Tests

Integration tests use the MNIST dataset. You must download it by running the following command from the PyTorch root folder:

$ python tools/download_mnist.py -d test/cpp/api/mnist

The required paths will be referenced as test/cpp/api/mnist/... in the test code, so you must run the integration tests from the PyTorch root folder.