Files
pytorch/test/cpp/api
yewentao256 fd6655a0f5 Feature: Implement support for cudnn_batch_norm_out kernel to replace the autogen approach. (#123020)
Fixes #115611

Autogen kernel may cause redundant copy, so we develop the kernel to improve efficiency.

Test Case:

```c++
#include <torch/torch.h>
#include <iostream>
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

int main() {
    auto input = torch::rand({2, 3, 4, 4}, torch::device(torch::kCUDA));
    auto weight = torch::randn({3}, torch::device(torch::kCUDA));
    auto bias = torch::randn({3}, torch::device(torch::kCUDA));
    auto running_mean = torch::zeros({3}, torch::device(torch::kCUDA));
    auto running_var = torch::ones({3}, torch::device(torch::kCUDA));

    bool training = true;
    double exponential_average_factor = 0.1;
    double epsilon = 1e-5;

    auto output = torch::empty_like(input);
    auto save_mean = torch::empty({3}, torch::device(torch::kCUDA));
    auto save_var = torch::empty({3}, torch::device(torch::kCUDA));
    auto reserve = torch::empty({0}, torch::device(torch::kCUDA)); // empty place-holder

    at::native::cudnn_batch_norm_out(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon, output, save_mean, save_var, reserve);
    auto outputs = at::native::cudnn_batch_norm(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon);

    bool is_close_output = torch::allclose(output, std::get<0>(outputs));
    bool is_close_save_mean = torch::allclose(save_mean, std::get<1>(outputs));
    bool is_close_save_var = torch::allclose(save_var, std::get<2>(outputs));
    bool is_close_reserve = torch::allclose(reserve, std::get<3>(outputs));

    std::cout << "Is output close: " << is_close_output << std::endl;
    std::cout << "Is save_mean close: " << is_close_save_mean << std::endl;
    std::cout << "Is save_var close: " << is_close_save_var << std::endl;
    std::cout << "Is reserve close: " << is_close_reserve << std::endl;

    return 0;
}
```

Please CC @albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123020
Approved by: https://github.com/andrewor14, https://github.com/eqy, https://github.com/albanD
2025-08-04 22:40:33 +00:00
..
2025-05-19 00:31:34 +00:00
2025-04-27 09:56:42 +00:00

C++ Frontend Tests

In this folder live the tests for PyTorch's C++ Frontend. They use the GoogleTest test framework.

CUDA Tests

To make a test runnable only on platforms with CUDA, you should suffix your test with _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_CUDA) { }

To make it runnable only on platforms with at least two CUDA machines, suffix it with _MultiCUDA instead of _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_MultiCUDA) { }

There is logic in main.cpp that detects the availability and number of CUDA devices and supplies the appropriate negative filters to GoogleTest.

Integration Tests

Integration tests use the MNIST dataset. You must download it by running the following command from the PyTorch root folder:

$ python tools/download_mnist.py -d test/cpp/api/mnist

The required paths will be referenced as test/cpp/api/mnist/... in the test code, so you must run the integration tests from the PyTorch root folder.