pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-06 17:24:59 +08:00

Author	SHA1	Message	Date
yewentao256	fd6655a0f5	Feature: Implement support for `cudnn_batch_norm_out` kernel to replace the autogen approach. (#123020 ) Fixes #115611 Autogen kernel may cause redundant copy, so we develop the kernel to improve efficiency. Test Case: ```c++ #include <torch/torch.h> #include <iostream> #include <ATen/ATen.h> #include <ATen/cuda/CUDAContext.h> int main() { auto input = torch::rand({2, 3, 4, 4}, torch::device(torch::kCUDA)); auto weight = torch::randn({3}, torch::device(torch::kCUDA)); auto bias = torch::randn({3}, torch::device(torch::kCUDA)); auto running_mean = torch::zeros({3}, torch::device(torch::kCUDA)); auto running_var = torch::ones({3}, torch::device(torch::kCUDA)); bool training = true; double exponential_average_factor = 0.1; double epsilon = 1e-5; auto output = torch::empty_like(input); auto save_mean = torch::empty({3}, torch::device(torch::kCUDA)); auto save_var = torch::empty({3}, torch::device(torch::kCUDA)); auto reserve = torch::empty({0}, torch::device(torch::kCUDA)); // empty place-holder at::native::cudnn_batch_norm_out(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon, output, save_mean, save_var, reserve); auto outputs = at::native::cudnn_batch_norm(input, weight, bias, running_mean, running_var, training, exponential_average_factor, epsilon); bool is_close_output = torch::allclose(output, std::get<0>(outputs)); bool is_close_save_mean = torch::allclose(save_mean, std::get<1>(outputs)); bool is_close_save_var = torch::allclose(save_var, std::get<2>(outputs)); bool is_close_reserve = torch::allclose(reserve, std::get<3>(outputs)); std::cout << "Is output close: " << is_close_output << std::endl; std::cout << "Is save_mean close: " << is_close_save_mean << std::endl; std::cout << "Is save_var close: " << is_close_save_var << std::endl; std::cout << "Is reserve close: " << is_close_reserve << std::endl; return 0; } ``` Please CC @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/123020 Approved by: https://github.com/andrewor14, https://github.com/eqy, https://github.com/albanD	2025-08-04 22:40:33 +00:00
Michael Suo	30fb2c4aba	[lint] autoformat test/cpp and torch/csrc Let's have some fun. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828 Approved by: https://github.com/ezyang	2022-06-11 21:11:16 +00:00
Nikita Shulga	a9b0a921d5	Disable `avoid-non-const-global-variables` lint check (#62008 ) Summary: As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH` All changes but the ones to `.clang-tidy` are generated using following script: ``` for i in `find . -type f -iname ".c" -or -iname "*.h"\|xargs grep cppcoreguidelines-avoid-non-const-global-variables\|cut -f1 -d:\|sort\|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008 Reviewed By: driazati, r-barnes Differential Revision: D29838584 Pulled By: malfet fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13	2021-07-22 18:04:40 -07:00
Nikita Shulga	4cb534f92e	Make PyTorch code-base clang-tidy compliant (#56892 ) Summary: This is an automatic change generated by the following script: ``` #!/usr/bin/env python3 from subprocess import check_output, check_call import os def get_compiled_files_list(): import json with open("build/compile_commands.json") as f: data = json.load(f) files = [os.path.relpath(node['file']) for node in data] for idx, fname in enumerate(files): if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'): files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')] return files def run_clang_tidy(fname): check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"]) changes = check_output(["git", "ls-files", "-m"]) if len(changes) == 0: return check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"]) def main(): git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n") compiled_files = get_compiled_files_list() for idx, fname in enumerate(git_files): if fname not in compiled_files: continue if fname.startswith("caffe2/contrib/aten/"): continue print(f"[{idx}/{len(git_files)}] Processing {fname}") run_clang_tidy(fname) if __name__ == "__main__": main() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892 Reviewed By: H-Huang Differential Revision: D27991944 Pulled By: malfet fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179	2021-04-28 14:10:25 -07:00
Charles Hofer	5fd037ce44	Fix MagmaInitializesCorrectly_CUDA by using an invertible matrix (#32547 ) Summary: This test case had been using the tensor ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ``` which is not an invertible tensor and causes the test case to fail, even if magma gets initialized just fine. This change uses a tensor that is invertible, and the inverse doesn't include any elements that are close to zero to avoid floating point rounding errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/32547 Differential Revision: D19572316 Pulled By: ngimel fbshipit-source-id: 1baf3f8601b2ba69fdd6678d7a3d86772d01edbe	2020-01-25 20:00:54 -08:00
Will Feng	420b37f3c6	Deprecate tensor.data<T>(), and codemod tensor.data<T>() to tensor.data_ptr<T>() (#24886 ) Summary: This PR adds deprecation message for `tensor.data<T>()` (`91d94e7d41`), and changes all call sites of `tensor.data<T>()` to `tensor.data_ptr<T>()` in PyTorch core. Pull Request resolved: https://github.com/pytorch/pytorch/pull/24886 Differential Revision: D16924576 Pulled By: yf225 fbshipit-source-id: 0943d6be73245c7c549c78597b74c3b07fa24440	2019-08-21 20:11:24 -07:00
Soumith Chintala	b5d8844bbe	push magma init into lazyInitCUDA (#18527 ) Summary: Tries to fix C++ API's usage of MAGMA-based functions. Attempts to Fix https://github.com/pytorch/pytorch/issues/18074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18527 Differential Revision: D14691694 Pulled By: soumith fbshipit-source-id: dd04e74418e486d73ea4a92193ddf79352ed71ba	2019-04-03 12:47:34 -07:00
Peter Goldsborough	8311bbee7f	Fix Windows build and test in CI (#11716 ) Summary: This PR adds Windows support for the C++ frontend. A lot of declarations were missing `TORCH_API` macros, and lots of code just did not compile on MSVC. ebetica ezyang orionr Pull Request resolved: https://github.com/pytorch/pytorch/pull/11716 Reviewed By: orionr Differential Revision: D13038253 Pulled By: goldsborough fbshipit-source-id: c8e5a45efd26117aeb99e768b56fcd5a89fcb9f8	2018-11-13 16:35:54 -08:00
Tongzhou Wang	d8dab6ffa8	Add tensor.to(options) (#13146 ) Summary: ezyang on the template hack smessmer on SFINAE of the `TensorOptions(Device)` goldsborough on the C++ API test changes zdevito on the `jit` codegen changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/13146 Reviewed By: ezyang Differential Revision: D12823809 Pulled By: SsnL fbshipit-source-id: 98d65c401c98fda1c6fa358e4538f86c6495abdc	2018-10-29 16:26:06 -07:00
Peter Goldsborough	825181ea9d	Rewrite C++ API tests in gtest (#11953 ) Summary: This PR is a large codemod to rewrite all C++ API tests with GoogleTest (gtest) instead of Catch. You can largely trust me to have correctly code-modded the tests, so it's not required to review every of the 2000+ changed lines. However, additional things I changed were: 1. Moved the cmake parts for these tests into their own `CMakeLists.txt` under `test/cpp/api` and calling `add_subdirectory` from `torch/CMakeLists.txt` 2. Fixing DataParallel tests which weren't being compiled because `USE_CUDA` wasn't correctly being set at all. 3. Updated README ezyang ebetica Pull Request resolved: https://github.com/pytorch/pytorch/pull/11953 Differential Revision: D9998883 Pulled By: goldsborough fbshipit-source-id: affe3f320b0ca63e7e0019926a59076bb943db80	2018-09-21 21:28:16 -07:00
Gregory Chanan	e00fb69b25	Use CATCH prefix to avoid name conflicts with Caffe2. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11780 Differential Revision: D9889925 Pulled By: gchanan fbshipit-source-id: 5eca849c36ced00b8ae7482b7945b445a3e1687e	2018-09-18 08:12:45 -07:00
Peter Goldsborough	b770156a7a	Functional DataParallel (#9234 ) Summary: This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend. For this, I had to: 1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s, 2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++. `replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice). `parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py). Added lots of tests for these things. apaszke ezyang ebetica colesbury Pull Request resolved: https://github.com/pytorch/pytorch/pull/9234 Differential Revision: D8865182 Pulled By: goldsborough fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4	2018-07-19 16:12:04 -07:00
Peter Goldsborough	17784d2029	Make at::tensor faster (#8709 )	2018-06-20 14:46:58 -07:00
Peter Goldsborough	9335885b1b	Create at::tensor (#8475 )	2018-06-20 11:44:21 -07:00

14 Commits