pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
cyy	40fb738197	Use Wextra-semi (#140236 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140236 Approved by: https://github.com/ezyang	2024-11-13 02:15:16 +00:00
cyy	1605d4aeb8	Fix object slice (#138880 ) To avoid casting Tensor to TensorBase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138880 Approved by: https://github.com/Skylion007	2024-10-26 00:13:19 +00:00
cyy	05fa05cbae	[2/N] Change static functions in headers to inline (#127764 ) Follows #127727 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127764 Approved by: https://github.com/Skylion007	2024-06-04 00:49:04 +00:00
Yu, Guangye	eb7adc3ae0	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor GPU trace to be device-agnostic. GPU trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and shareable across all device backends. # Solution Move `_cuda_trace.py` to `_gpu_trace.py`, so that each device backend owns its own callbacks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-30 13:04:38 +00:00
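The refactor above can be illustrated with a minimal Python sketch. It assumes a design in which every backend gets its own instance of one shared trace module, and that instance holds one callback list per event. The names `CallbackRegistry` and `GpuTrace` are hypothetical, as are the event names. None of this is the actual contents of `_gpu_trace.py`.

```python
from typing import Callable, Dict, List


class CallbackRegistry:
    """Ordered callbacks for one (backend, event) pair."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._callbacks: List[Callable[..., None]] = []

    def add_callback(self, cb: Callable[..., None]) -> None:
        self._callbacks.append(cb)

    def fire_callbacks(self, *args: object) -> None:
        # Invoke every registered callback in registration order.
        for cb in self._callbacks:
            cb(*args)


class GpuTrace:
    """Hypothetical device-agnostic trace module; each backend owns one instance."""

    # Illustrative event names (assumed, loosely following the CUDA trace hooks).
    EVENTS = (
        "event_creation",
        "event_record",
        "event_wait",
        "memory_allocation",
        "stream_creation",
    )

    def __init__(self, backend: str) -> None:
        self.backend = backend
        self._registries: Dict[str, CallbackRegistry] = {
            event: CallbackRegistry(f"{backend}.{event}") for event in self.EVENTS
        }

    def register(self, event: str, cb: Callable[..., None]) -> None:
        # Unknown event names raise KeyError.
        self._registries[event].add_callback(cb)

    def fire(self, event: str, *args: object) -> None:
        self._registries[event].fire_callbacks(*args)


# Usage: callbacks registered on one backend never fire for another.
cuda_trace = GpuTrace("cuda")
xpu_trace = GpuTrace("xpu")
seen: List[tuple] = []
cuda_trace.register("memory_allocation", lambda ptr: seen.append(("cuda", ptr)))
cuda_trace.fire("memory_allocation", 0x1000)
xpu_trace.fire("memory_allocation", 0x2000)  # no xpu callbacks registered
```

Keeping one registry object per backend means a new backend only needs to create its own instance. Nothing CUDA-specific has to be copied.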
PyTorch MergeBot	968c4c4154	Revert "Refactor gpu trace to be device-agnostic (#121794 )" This reverts commit 74deacbf31d032a2659dc1633dc3e5248921d466. Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks ROCm jobs in trunk `74deacbf31`, please help take a look and reland the change ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2013674083))	2024-03-21 20:33:17 +00:00
Yu, Guangye	74deacbf31	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor GPU trace to be device-agnostic. GPU trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and shareable across all device backends. # Solution Move `_cuda_trace.py` to `_gpu_trace.py`, so that each device backend owns its own callbacks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-21 01:52:58 +00:00
PyTorch MergeBot	f9ed1c432d	Revert "Refactor gpu trace to be device-agnostic (#121794 )" This reverts commit 0ff1109e2688b8c841c9dd0eeecfba16f027b049. Reverted https://github.com/pytorch/pytorch/pull/121794 on behalf of https://github.com/jeanschmidt due to Reverting to see if rocm trunk errors are related ([comment](https://github.com/pytorch/pytorch/pull/121794#issuecomment-2007519408))	2024-03-19 15:40:26 +00:00
Yu, Guangye	0ff1109e26	Refactor gpu trace to be device-agnostic (#121794 ) # Motivation Refactor GPU trace to be device-agnostic. GPU trace is usually used in runtime components, including Device, Stream, Event, Guard, and Allocator. It should be device-agnostic and shareable across all device backends. # Solution Move `_cuda_trace.py` to `_gpu_trace.py`, so that each device backend owns its own callbacks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121794 Approved by: https://github.com/jgong5, https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui	2024-03-19 06:02:28 +00:00
cyy	39df084001	[Clang-tidy header][16/N] Enable clang-tidy on headers in torch/csrc/autograd (#117821 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117821 Approved by: https://github.com/Skylion007	2024-01-22 00:52:56 +00:00
Jane Xu	6e71ad0509	Add tensor post accumulate grad hook API (#107063 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107063 Approved by: https://github.com/albanD, https://github.com/soulitzer	2023-08-24 00:19:35 +00:00
PyTorch MergeBot	432fce4e0d	Revert "Add tensor post accumulate grad hook API (#107063 )" This reverts commit 3f655277d44909e0770e77e1b4fe1c9b0f39d7b9. Reverted https://github.com/pytorch/pytorch/pull/107063 on behalf of https://github.com/ZainRizvi due to Diff train weirdness. Need to temporarily revert this PR and will reland it soon afterwards ([comment](https://github.com/pytorch/pytorch/pull/107063#issuecomment-1690799057))	2023-08-24 00:12:34 +00:00
Jane Xu	3f655277d4	Add tensor post accumulate grad hook API (#107063 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107063 Approved by: https://github.com/albanD, https://github.com/soulitzer	2023-08-22 15:15:57 +00:00
Jason Ansel	2e02dfae9a	[Compiled Autograd] Fix handling of undefined gradients in hooks (#105813 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105813 Approved by: https://github.com/albanD	2023-07-28 15:59:35 +00:00
Jason Ansel	66d3729388	Add THPVariable_WrapList helper (#105194 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105194 Approved by: https://github.com/soulitzer, https://github.com/albanD	2023-07-15 18:13:35 +00:00
albanD	dda95236c9	Add fast path in our type checks and argparser (#98764 ) Add a fast path for common use cases in our Python arg parsing. This relies on the observation that an exact type check (a pointer comparison) is a lot faster than a subtype check (an isinstance call), so we make sure to do these before any isinstance check. This can be pretty significant: `a.view((1, 1, 1, 1))` goes from ~1.13us to 800ns. Full test: Tested perf locally with CPU frequency locked and the script pinned to a single core (`taskset 0x1 ipython foo.py`) to reduce jitter. Benchmark results after applying each change in this PR one by one, two runs per stage (each value is the mean of 7 runs of 1,000,000 loops):
```
                       Original           checkLong fastpath   Tensor check fastpath   Fast path number in intlist/symintlist
Op                     run 1    run 2     run 1    run 2       run 1    run 2          run 1    run 2
a.view(1)              827 ns   823 ns    801 ns   817 ns      806 ns   813 ns         728 ns   735 ns
a.view((1, 1))         947 ns   938 ns    900 ns   912 ns      903 ns   915 ns         749 ns   753 ns
a.view((1, 1, 1))      1.04 µs  1.03 µs   1 µs     1.02 µs     1 µs     1.02 µs        771 ns   774 ns
a.view((1, 1, 1, 1))   1.14 µs  1.13 µs   1.1 µs   1.11 µs     1.1 µs   1.11 µs        800 ns   801 ns
a.squeeze(0)           797 ns   768 ns    782 ns   781 ns      770 ns   785 ns         772 ns   773 ns
a.squeeze((0,))        937 ns   927 ns    1.11 µs  939 ns      931 ns   941 ns         883 ns   873 ns
a.squeeze((0, 1))      1.02 µs  1.01 µs   1.09 µs  1.01 µs     1.02 µs  1.02 µs        915 ns   907 ns
```
Standard deviations were under 9 ns everywhere except the first checkLong fastpath run, where `a.squeeze((0,))` showed ±424 ns and `a.squeeze((0, 1))` showed ±54.7 ns. <details> <summary>Test script</summary>

```python
# Run under IPython (e.g. `taskset 0x1 ipython foo.py`) so get_ipython() is available.
import torch
from IPython import get_ipython

a = torch.empty(1)
print("Running ", "a.view(1)")
get_ipython().run_line_magic("timeit", "a.view(1)")
print("Running ", "a.view((1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1))")
print("Running ", "a.view((1, 1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1, 1))")
print("Running ", "a.view((1, 1, 1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1, 1, 1))")

a = torch.empty(1, 1, 1)
print("Running ", "a.squeeze(0)")
get_ipython().run_line_magic("timeit", "a.squeeze(0)")
print("Running ", "a.squeeze((0,))")
get_ipython().run_line_magic("timeit", "a.squeeze((0,))")
print("Running ", "a.squeeze((0, 1))")
get_ipython().run_line_magic("timeit", "a.squeeze((0, 1))")
```

</details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98764 Approved by: https://github.com/ngimel	2023-04-11 00:08:26 +00:00
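The ordering described above (exact type test first, `isinstance` only as a fallback) can be sketched at the Python level. The function names below are hypothetical, and this is not PyTorch's C++ arg parser. In C++ the saving comes from skipping the general `PyObject_IsInstance` path; this sketch only mirrors that ordering structurally.

```python
from typing import Any, List


def is_int_like(obj: Any) -> bool:
    """Hypothetical checker: exact-type fast path first, full subtype check second."""
    # Fast path: a single identity comparison on the object's type.
    if type(obj) is int:
        return True
    # Slow path: isinstance performs a full subtype check, so subclasses
    # such as bool (or user-defined int subclasses) are still accepted.
    return isinstance(obj, int)


def parse_int_list(arg: Any) -> List[int]:
    """Accept one int or a tuple/list of ints, trying the common exact types first."""
    if is_int_like(arg):
        return [arg]
    # Same ordering for the container: exact tuple first, then the general check.
    if type(arg) is tuple or isinstance(arg, (tuple, list)):
        if all(is_int_like(x) for x in arg):
            return list(arg)
    raise TypeError(f"expected an int or a sequence of ints, got {type(arg).__name__}")
```

The fast path only pays off when the common case (here, an exact `int` or `tuple`) dominates. Rare subclasses still take the slower but complete check, so behavior is unchanged.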
cyy	f27e09de04	Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927 ) This PR does two things: 1. It moves some Windows warning suppression from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang. 2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of DLL warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some DLL warnings because some TORCH_API functions are actually built as part of libtorch_python. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927 Approved by: https://github.com/malfet	2023-02-27 19:22:20 +00:00
Kurt Mohler	4d9920fa9c	Move PyInterpreter code in `python_variable.cpp` to its own files (#92647 ) Part of #91395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92647 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-01-24 23:08:23 +00:00
cyy	85851b1e8f	remove useless clang-tidy suppression (#92287 ) remove NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) remove NOLINTNEXTLINE(performance-move-const-arg) remove NOLINTNEXTLINE(performance-no-automatic-move) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92287 Approved by: https://github.com/albanD	2023-01-21 02:33:24 +00:00
Edward Z. Yang	f884e817d4	Make Python op registration work with torchdeploy/multipy (#87162 ) See strategy at PythonOpRegistrationTrampoline.cpp for the big picture. Along the way, I made OperatorHandle support == and hashing, and slightly changed the low level python_dispatch impl API to disallow empty strings for dispatch key, which had the knock on effect of requiring us to explicitly make sure we pass in CompositeImplicitAutograd if we would have passed in "" (I didn't apply this to the rest of the file because I'm lazy.) Test strategy is we delete the logic for preventing Python op registrations in torch from being skipped in a torchdeploy context and show CI still works. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/87162 Approved by: https://github.com/anjali411, https://github.com/bdhirsh	2022-11-03 12:56:44 +00:00
Mateusz Sypniewski	916def84d4	CUDA trace Python hooks (#82824 ) ### Description This adds Python hooks into PyTorch that allow the user to register their own callbacks for events such as tensor allocation, stream allocation, event record / wait etc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82824 Approved by: https://github.com/lw, https://github.com/ezyang, https://github.com/malfet	2022-08-11 10:21:40 +00:00
Edward Z. Yang	df69660832	Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )"" (#82599 ) This reverts commit 532b8a9e00d7eea2636e67621bfcfa34d9c85bcb. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599 Approved by: https://github.com/albanD	2022-08-02 19:37:02 +00:00
PyTorch MergeBot	532b8a9e00	Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )" This reverts commit 9465c0e0b50f3c37bc150ef0016238ba33eca6f4. Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels	2022-08-01 20:25:35 +00:00
Edward Z. Yang	9465c0e0b5	Add a lint rule for torch/csrc/util/pybind.h include (#82552 ) We define specializations for pybind11-defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations always be #include'd when making use of pybind11 templates whose behavior depends on these specializations; otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added a --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fix up the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen; these were mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also https://github.com/pybind/pybind11/issues/4099 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552 Approved by: https://github.com/albanD	2022-08-01 17:16:58 +00:00
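The two new linter flags can be modeled in a few lines of Python. This is a minimal sketch. The function `grep_lint` and its keyword parameters are hypothetical and do not reproduce the real grep linter's code or CLI. They only show the intended semantics: an allowlist match exempts the whole file, and "first only" stops after one hit.

```python
import re
from typing import List, Optional, Sequence, Tuple


def grep_lint(
    lines: Sequence[str],
    pattern: str,
    allowlist_pattern: Optional[str] = None,
    match_first_only: bool = False,
) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) pairs that match `pattern`."""
    # allowlist: if any line matches it, the whole file is exempt from the lint.
    if allowlist_pattern is not None and any(
        re.search(allowlist_pattern, line) for line in lines
    ):
        return []
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if re.search(pattern, line):
            hits.append((lineno, line))
            # match-first-only: report a single violation per file.
            if match_first_only:
                break
    return hits


source = [
    "#include <pybind11/pybind11.h>",
    "#include <pybind11/stl.h>",
]
print(grep_lint(source, r"#include <pybind11/", match_first_only=True))
# → [(1, '#include <pybind11/pybind11.h>')]
print(
    grep_lint(
        source + ["#include <torch/csrc/util/pybind.h>"],
        r"#include <pybind11/",
        allowlist_pattern=r"#include <torch/csrc/util/pybind\.h>",
    )
)
# → []
```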
Michael Suo	30fb2c4aba	[lint] autoformat test/cpp and torch/csrc Let's have some fun. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828 Approved by: https://github.com/ezyang	2022-06-11 21:11:16 +00:00
anjali411	55f55a4cf6	Allow users to override kernels for existing C++ ops through Python Pull Request resolved: https://github.com/pytorch/pytorch/pull/75905 Approved by: https://github.com/ezyang	2022-05-05 03:31:39 +00:00
Peter Bell	40d1f77384	Codegen: python_torch_functions only include relevant operators (#68693 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693 Generation of Python bindings for native functions is split over 8 different files: one for each namespace, with the torch namespace split into 3 shards, and methods in their own file as well. This change ensures that editing any single (non-method) operator only causes one of these files to be rebuilt. Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32596270 Pulled By: albanD fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f (cherry picked from commit ba0fc71a3a6835e49b332a8be52bf798fa2726b3)	2022-01-21 15:37:06 +00:00
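The rebuild property holds if each generated file includes only the headers of the operators it binds. Below is a minimal sketch under stated assumptions. The file names, the crc32-based shard choice, and the `ATen/ops/<op>.h` header path are illustrative and are not taken from the actual codegen. Only the shard count of 3 comes from the commit message.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NUM_TORCH_SHARDS = 3  # the commit message says the torch namespace uses 3 shards


def output_file(namespace: str, op: str) -> str:
    """Choose the generated file for an operator (hypothetical naming scheme)."""
    if namespace == "torch":
        # crc32 is stable across runs, unlike Python's salted str hash(),
        # so an operator always lands in the same shard.
        shard = zlib.crc32(op.encode()) % NUM_TORCH_SHARDS
        return f"python_torch_functions_{shard}.cpp"
    return f"python_{namespace}_functions.cpp"


def plan_includes(ops: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each generated file to the per-operator headers it actually needs."""
    plan: Dict[str, Set[str]] = defaultdict(set)
    for namespace, op in ops:
        plan[output_file(namespace, op)].add(f"ATen/ops/{op}.h")
    return dict(plan)


def files_to_rebuild(ops: List[Tuple[str, str]], edited_op: str) -> Set[str]:
    """Only files including the edited operator's header must be recompiled."""
    header = f"ATen/ops/{edited_op}.h"
    return {f for f, headers in plan_includes(ops).items() if header in headers}


ops = [("torch", "add"), ("torch", "mul"), ("torch", "conv2d"),
       ("linalg", "linalg_svd"), ("fft", "fft_rfft")]
print(files_to_rebuild(ops, "linalg_svd"))  # → {'python_linalg_functions.cpp'}
```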
Peter Bell	cd9da3267c	Rationalize API exports in torch_python (#68095 ) Summary: This renames `WindowsTorchApiMacro.h` to `Export.h` to mirror the c10 header `c10/macros/Export.h` and also updates it to use `C10_EXPORT`/`C10_IMPORT`. This also removes the `THP_API` macro from `THP_export.h` which appears to serve the same purpose. cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/68095 Reviewed By: jbschlosser Differential Revision: D32810881 Pulled By: albanD fbshipit-source-id: d6949ccd0d80d6c3e5ec1264207611fcfe2503e3	2021-12-07 15:24:37 -08:00
francescocastelli	152f665dee	Inserted check for PyObject_IsInstance in THPVariableCheck (#67588 ) Summary: Inserted a check on the return value of PyObject_IsInstance to capture the case in which it raises an exception and returns -1. When this happens, THPVariable_Check now throws a python_error to signal the exception. Fixes https://github.com/pytorch/pytorch/issues/65084 Pull Request resolved: https://github.com/pytorch/pytorch/pull/67588 Reviewed By: mruberry Differential Revision: D32064776 Pulled By: albanD fbshipit-source-id: 895c7682e0991ca257e27f9638a7462d83707320	2021-11-01 16:53:54 -07:00
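The bug fixed above is easy to see with a Python model of the C contract. `PyObject_IsInstance` returns 1, 0, or -1, and -1 means an exception is pending. Code that tests the result for truthiness reads -1 as "is an instance".

This is a hypothetical model, not PyTorch code. The module-level variable stands in for CPython's error indicator. The metaclass simply forces `isinstance` to raise.

```python
from typing import Optional

# Stand-in for CPython's per-thread error indicator (what PyErr_Occurred reports).
_pending_error: Optional[BaseException] = None


def is_instance_c_style(obj: object, cls: type) -> int:
    """Model of PyObject_IsInstance's C contract: 1, 0, or -1 with an error set."""
    global _pending_error
    try:
        return 1 if isinstance(obj, cls) else 0
    except Exception as exc:
        _pending_error = exc
        return -1


def variable_check_buggy(obj: object, cls: type) -> bool:
    # Any nonzero result counts as "true", so an error (-1) is read as a match.
    return bool(is_instance_c_style(obj, cls))


def variable_check_fixed(obj: object, cls: type) -> bool:
    global _pending_error
    result = is_instance_c_style(obj, cls)
    if result == -1:
        # Analogous to throwing python_error: surface the pending exception.
        err, _pending_error = _pending_error, None
        if err is None:
            raise SystemError("error return without exception set")
        raise err
    return result == 1


class RaisingMeta(type):
    """Metaclass whose instance check always fails, forcing the -1 path."""

    def __instancecheck__(cls, obj: object) -> bool:
        raise RuntimeError("instance check failed")


class Weird(metaclass=RaisingMeta):
    pass
```

With the buggy check, `variable_check_buggy(object(), Weird)` returns `True` and leaves an exception pending. The fixed check raises the `RuntimeError` instead.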
Nikolay Korovaiko	1f55dd83ac	[WIP] wrap XLATensors into Python XLA wrapper class (#65841 ) Summary: Improbably fixes https://github.com/pytorch/pytorch/issues/65130 ezyang I'm super n00b in Python extensions, is this what we want to do? Pull Request resolved: https://github.com/pytorch/pytorch/pull/65841 Reviewed By: navahgar Differential Revision: D31889790 Pulled By: Krovatkin fbshipit-source-id: c7f077b89f6f02df1962ab83d9e13fcc348a227d	2021-10-25 16:11:03 -07:00
Peter Bell	d701357d92	Factor out TensorBase that doesn't depend on native operators (#63612 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612 This makes Tensor inherit from a new class, TensorBase, which provides a subset of Tensor that doesn't directly depend on native_functions.yaml. Code that only includes TensorBase.h will thus not need to be rebuilt every time someone changes an operator signature. Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to minimize friction in code mixing the two types. To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build system for certain folders, or just define it at the top of any file. I've also included an example of manually special-casing the commonly used `contiguous` operator. The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in `Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can materialize a `Tensor` for use in dispatch without actually increasing its refcount. Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D30728580 Pulled By: ezyang fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03	2021-09-08 13:28:54 -07:00
Edward Yang	5e5de75f4d	Add getPyInterpreter() API (#62659 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62659 It turns out that it is occasionally useful to be able to access the PyInterpreter object from other Python bindings (see next diff in the stack). Make it publicly available. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D30074926 Pulled By: ezyang fbshipit-source-id: 2f745ab7c7a672ed7215231fdf9eef6af9705511	2021-08-06 08:23:24 -07:00
Nikita Shulga	a9b0a921d5	Disable `avoid-non-const-global-variables` lint check (#62008 ) Summary: The GoogleTest `TEST` macro is non-compliant with it, as is `DEFINE_DISPATCH`. All changes but the ones to `.clang-tidy` are generated using the following script:
```
# Delete the now-unneeded NOLINTNEXTLINE suppressions from C/C++ sources and headers.
for i in `find . -type f -iname "*.c*" -or -iname "*.h" | xargs grep cppcoreguidelines-avoid-non-const-global-variables | cut -f1 -d: | sort | uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008 Reviewed By: driazati, r-barnes Differential Revision: D29838584 Pulled By: malfet fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13	2021-07-22 18:04:40 -07:00
Edward Yang	f05d5bec48	Preserve PyObject even when it goes dead (#56017 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56017 Fixes #55686 This patch is seemingly straightforward but some of the changes are very subtle. For the general algorithmic approach, please first read the quoted issue. Based on the algorithm, there are some fairly straightforward changes: - New boolean on TensorImpl tracking if we own the pyobj or not - PythonHooks virtual interface for requesting deallocation of pyobj when TensorImpl is being released and we own its pyobj, and implementation of the hooks in python_tensor.cpp - Modification of THPVariable to MaybeOwned its C++ tensor, directly using swolchok's nice new class And then, there is python_variable.cpp. Some of the changes follow the general algorithmic approach: - THPVariable_NewWithVar is simply adjusted to handle MaybeOwned and initializes as owned (like before) - THPVariable_Wrap adds the logic for reverting ownership back to PyObject when we take out an owning reference to the Python object - THPVariable_dealloc attempts to resurrect the Python object if the C++ tensor is live, and otherwise does the same old implementation as before - THPVariable_tryResurrect implements the resurrection logic. It is modeled after CPython code so read the cited logic and see if it is faithfully replicated - THPVariable_clear is slightly updated for MaybeOwned and also to preserve the invariant that if owns_pyobj, then pyobj_ is not null. This change is slightly dodgy: the previous implementation has a comment mentioning that the pyobj nulling is required to ensure we don't try to reuse the dead pyobj. I don't think, in this new world, this is possible, because the invariant says that the pyobj only dies if the C++ object is dead too. But I still unset the field for safety. And then... there is THPVariableMetaType. 
colesbury explained in the issue why this is necessary: when destructing an object in Python, you start off by running the tp_dealloc of the subclass before moving up to the parent class (much in the same way C++ destructors work). The deallocation process for a vanilla Python-defined class does irreparable harm to the PyObject instance (e.g., the finalizers get run), so it is no longer valid to attempt a resurrection later in the tp_dealloc chain. (BTW, the fact that objects can resurrect but in an invalid state is one of the reasons why it's so frickin' hard to write correct __del__ implementations.) So we need to make sure that we actually override the tp_dealloc of the bottommost subclass of Tensor to make sure we attempt a resurrection before we start finalizing. To do this, we need to define a metaclass for Tensor that can override tp_dealloc whenever we create a new subclass of Tensor. By the way, it was totally not documented how to create metaclasses in the C++ API, and it took a good bit of trial and error to figure it out (and the answer is now immortalized in https://stackoverflow.com/q/67077317/23845 -- the things that I got wrong in earlier versions of the PR included setting tp_basicsize incorrectly, incorrectly setting Py_TPFLAGS_HAVE_GC on the metaclass--you want to leave it unset so that it inherits, and determining that tp_init is what actually gets called when you construct a class, not tp_call as another not-to-be-named StackOverflow question suggests). Aside: Ordinarily, adding a metaclass to a class is a user-visible change, as it means that it is no longer valid to mix in another class with a different metaclass. However, because _C._TensorBase is a C extension object, it will typically conflict with most other metaclasses, so this is not BC breaking. The desired new behavior of a subclass tp_dealloc is to first test if we should resurrect, and otherwise do the same old behavior. 
In an initial implementation of this patch, I implemented this by saving the original tp_dealloc (which references subtype_dealloc, the "standard" dealloc for all Python-defined classes) and invoking it. However, this results in an infinite loop, as it attempts to call the dealloc function of the base type, but incorrectly chooses the subclass type (because it is not a subtype_dealloc, as we have overridden it; see `b38601d496/Objects/typeobject.c (L1261)`). So, with great reluctance, I must duplicate the behavior of subtype_dealloc in our implementation. Note that this is not entirely unheard of in Python binding code; for example, Cython `c25c3ccc4b/Cython/Compiler/ModuleNode.py (L1560)` also does similar things. This logic makes up the bulk of THPVariable_subclass_dealloc. To review this, you should pull up the CPython copy of subtype_dealloc `b38601d496/Objects/typeobject.c (L1230)` and verify that I have specialized the implementation for our case appropriately. Among the simplifications I made: - I assume PyType_IS_GC, because I assume that Tensor subclasses are only ever created in Python and those classes are always subject to GC. (BTW, yes! This means I have broken anyone who has extended PyTorch tensors from the C API directly. I'm going to guess no one has actually done this.) - I don't bother walking up the type bases to find the parent dealloc; I know it is always THPVariable_dealloc. Similarly, I can get rid of some parent type tests based on knowledge of how THPVariable_dealloc is defined. - The CPython version calls some private APIs which I can't call, so I use the public PyObject_GC_UnTrack APIs. - I don't allow the finalizer of a Tensor to change its type (but more on this shortly). One alternative I discussed with colesbury was, instead of copy-pasting subtype_dealloc, to transmute the type of the object that was dying to turn it into a different object whose tp_dealloc is subtype_dealloc, so the stock subtype_dealloc would then be applicable. 
We decided this would be kind of weird and didn't do it that way. TODO: - More code comments - Figure out how not to increase the size of TensorImpl with the new bool field - Add some torture tests for the THPVariable_subclass_dealloc, e.g., involving subclasses of Tensors that do strange things with finalizers - Benchmark the impact of taking the GIL to release C++ side tensors (e.g., from autograd) - Benchmark the impact of adding a new metaclass to Tensor (probably will be done by separating out the metaclass change into its own change) - Benchmark the impact of changing THPVariable to conditionally own Tensor (as opposed to unconditionally owning it, as before) - Add tests that this actually indeed preserves the Python object Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D27765125 Pulled By: ezyang fbshipit-source-id: 857f14bdcca2900727412aff4c2e2d7f0af1415a	2021-06-03 10:50:36 -07:00
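Resurrection itself can be demonstrated at the Python level. This is a loose analogy, not the `tp_dealloc`-level mechanism the commit implements. `Resurrectable` and `backing_alive` are hypothetical names: the class stands in for a Python wrapper, and the flag stands in for "the C++ tensor is still live". The sketch relies on CPython's reference counting, which runs `__del__` as soon as the last reference disappears.

Note also that under PEP 442 (CPython 3.4+) `__del__` runs at most once per object. A resurrected object that later dies again is not finalized a second time, which is one of the hazards the commit message alludes to.

```python
from typing import List


class Resurrectable:
    """Hypothetical Python wrapper whose backing C++ object may outlive it."""

    # Strong references that keep resurrected wrappers alive.
    survivors: List["Resurrectable"] = []

    def __init__(self, name: str, backing_alive: bool = True) -> None:
        self.name = name
        self.backing_alive = backing_alive  # stands in for "C++ tensor still live"
        self.finalize_calls = 0

    def __del__(self) -> None:
        self.finalize_calls += 1
        if self.backing_alive:
            # Creating a new strong reference resurrects the dying object.
            Resurrectable.survivors.append(self)


obj = Resurrectable("t1")
original_id = id(obj)
del obj  # refcount hits zero; __del__ runs and resurrects the wrapper
survivor = Resurrectable.survivors[0]
print(survivor.name, id(survivor) == original_id, survivor.finalize_calls)
# → t1 True 1

Resurrectable("t2", backing_alive=False)  # dies for good: nothing keeps it alive
print(len(Resurrectable.survivors))  # → 1
```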
Edward Yang	61418aa069	Make THPVariable_Unpack work on THPVariable too (#55798 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55798 I'm going to change how cdata is implemented internally, so I want to make all callsites call through THPVariable_Unpack even if they actually have a THPVariable in hand Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D27712131 Pulled By: ezyang fbshipit-source-id: bd2eb1e43c52c6b7a776ff3a45350a23934e643c	2021-04-15 08:57:02 -07:00
Nikita Shulga	6a39613f35	[BE] Make torch/csrc/jit/tensorexpr/ clang-tidy clean (#55628 ) Summary: Mostly auto-generated changes using ``` python3 tools/clang_tidy.py -c build -x torch/csrc/jit/tensorexpr/eval.cpp -s ``` With following common patterns manually fixed - Use ` = default` instead of `{}` - deleted methods should be public - Use pass-by-value + std::move instead of pass-by-reference+copy Pull Request resolved: https://github.com/pytorch/pytorch/pull/55628 Reviewed By: walterddr Differential Revision: D27655378 Pulled By: malfet fbshipit-source-id: 92be87a08113435d820711103ea9b0364182c71a	2021-04-08 19:44:14 -07:00
Taylor Robie	d31a760be4	move has_torch_function to C++, and make a special case object_has_torch_function (#48965 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48965 This PR pulls `__torch_function__` checking entirely into C++, and adds a special `object_has_torch_function` method for ops which only have one arg, as this lets us skip tuple construction and unpacking. We can now also do away with the Python-side fast bailout for `Tensor` (e.g. `if any(type(t) is not Tensor for t in tensors) and has_torch_function(tensors)`) because such bailouts are actually slower than checking with the Python C API. Test Plan: Existing unit tests. Benchmarks are in #48966 Reviewed By: ezyang Differential Revision: D25590732 Pulled By: robieta fbshipit-source-id: 6bd74788f06cdd673f3a2db898143d18c577eb42	2021-01-10 19:23:35 -08:00
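To illustrate why an exact-type check makes a cheap fast path, and why a single-argument variant helps, here is a minimal pure-Python sketch of the idea. `Tensor` and `LoggingTensor` are hypothetical stand-ins, not the real `torch.Tensor`, and these functions only mirror the names in the commit; the real check lives in C++ and covers more cases (see #48963 for how `Parameter` is treated).

```python
class Tensor:
    """Hypothetical stand-in for torch.Tensor (not the real class)."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Default behavior: just call the function.
        return func(*args, **(kwargs or {}))


class LoggingTensor(Tensor):
    """Hypothetical subclass; it inherits __torch_function__ from Tensor."""


def object_has_torch_function(obj) -> bool:
    cls = type(obj)
    # Fast path: the exact base type never triggers override dispatch,
    # even though it defines a default __torch_function__.
    if cls is Tensor:
        return False
    # Otherwise, any type exposing __torch_function__ counts as overriding.
    return hasattr(cls, "__torch_function__")


def has_torch_function(args) -> bool:
    # Multi-argument form: the caller must first pack the args into a tuple.
    return any(object_has_torch_function(a) for a in args)
```

For a unary op, calling `object_has_torch_function(x)` directly avoids building the `(x,)` tuple that `has_torch_function` would otherwise need.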
Taylor Robie	839c2f235f	treat Parameter the same way as Tensor (#48963 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48963 This PR makes the binding code treat `Parameter` the same way as `Tensor`, unlike all other `Tensor` subclasses. This does change the semantics of `THPVariable_CheckExact`, but it isn't used much and it seemed to make sense for the half dozen or so places that it is used. Test Plan: Existing unit tests. Benchmarks are in #48966 Reviewed By: ezyang Differential Revision: D25590733 Pulled By: robieta fbshipit-source-id: 060ecaded27b26e4b756898eabb9a94966fc9840	2021-01-10 19:18:31 -08:00
Sam Gross	b1b65f34a9	Make PythonArgs::tensor and PythonArgs::scalar faster (#22782 ) Summary: Speeds up the common case where Tensor is a torch.Tensor (not a subclass). This reduces the number of executed instructions for a torch.add(tensor1, tensor2) by ~328 (should be ~65 ns faster). Note that most of the PythonArgs accessors are too large to be inlined. We should move most of them to the cpp file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/22782 Differential Revision: D16223592 Pulled By: colesbury fbshipit-source-id: cc20f8989944389d5a5e3fab033cdd70d581ffb1	2019-07-12 11:57:29 -07:00
Will Feng	8cde4c4d22	Remove Variable::Impl and DifferentiableViewImpl (#17072 ) Summary: As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR: 1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class 2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()` 3. Remove `Variable.data()` API 4. Add `Variable.variable_data()` that matches `tensor.data` in Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history. After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as their `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't. Note that this PR is BC-breaking in the following use cases: Use Case 1: Previously, `x.data = y` worked even if `x` and `y` were of different TensorImpl types (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl types, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl types. Use Case 2: If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. 
This is better illustrated with the following example:
```python
import torch

params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))
grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad)  # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072 Differential Revision: D14075257 Pulled By: yf225 fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957	2019-05-23 21:09:04 -07:00
Edward Yang	517c7c9861	Canonicalize all includes in PyTorch. (#14849 ) Summary: Anywhere we used #include "foo.h", we now say #include <foo.h>. Paths are adjusted to be rooted out of aten/src, torch/lib, or the root level directory. I modified CMakeLists.txt by hand to remove TH and THC from the include paths. I used the following script to do the canonicalization:
```
import subprocess
import re
import os.path

files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
    if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
        continue
    if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
        continue
    with open(fn, 'r') as f:
        c = f.read()

    def fmt(p):
        return "#include <{}>".format(p)

    def repl(m):
        p = m.group(1)
        if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
            return fmt(p)
        if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
            return fmt(p)
        for root in ["aten/src", "torch/lib", ""]:
            for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                new_p = os.path.relpath(os.path.join(bad_root, p), root)
                if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                    return fmt(new_p)
        print("ERROR: ", fn, p)
        return m.group(0)

    new_c = re.sub(r'#include "([^"]+)"', repl, c)
    if new_c != c:
        print(fn)
        with open(fn, 'w') as f:
            f.write(new_c)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849 Reviewed By: dzhulgakov Differential Revision: D13363445 Pulled By: ezyang fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68	2018-12-08 19:38:30 -08:00
Peter Goldsborough	d6c53328f9	Large scale fix of python-related files in torch/csrc/ Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14515 Differential Revision: D13247966 Pulled By: goldsborough fbshipit-source-id: 7a127c508fc576a7a92626dd6b729f660162d628	2018-12-07 13:04:46 -08:00
Zachary DeVito	d985cf46f1	Add workaround to fix include warnings in Python 2 builds. (#6716 )	2018-04-24 12:30:19 -07:00
peterjc123	63af898d46	Fix extension test on Windows (#5548 ) * Change cpp_extensions.py to make it work on Windows * Fix linting * Show python paths * Debug * Debug 1 * set PYTHONPATH * Add ATen into library * expose essential libs and functions, and copy _C.lib * Specify dir in header * Update check_abi for MSVC * Activate cl environment to compile cpp extensions * change version string * Redirect stderr to stdout * Add monkey patch for windows * Remove unnecessary self * Fix various issues * Append necessary flags * add /MD flag to cuda * Install ninja * Use THP_API instead of THP_CLASS * Beautify the paths * Revert "Use THP_API instead of THP_CLASS" This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d. * Use THP_API instead of THP_CLASS(new)	2018-04-02 13:53:25 -04:00
Sam Gross	30ec06c140	Merge Variable and Tensor classes (#5225 ) This replaces the torch.Tensor constructors with factories that produce Variables. Similarly, functions on the torch module (e.g. torch.randn) now return Variables. To keep the PR to a reasonable size, I've left most of the unused tensor code. Subsequent PRs will remove the dead code, clean-up calls to torch.autograd.Variable, and rename Variable to Tensor everywhere. There are some breaking changes because Variable and Tensors had slightly different semantics. There's a list of those changes here: https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge	2018-02-23 18:03:31 -05:00
gchanan	9bb6d33d35	Enable scalars if compiled with WITH_SCALAR environment variable. (#4806 ) * Enable scalars if compiled with WITH_SCALAR environment variable. We are pretty close to enabling scalars (0-dimensional arrays); this allows turning them on for development purposes and to be able to write code that works both with and without scalars enabled. WITH_SCALARS is currently broken with distributions, but should work for test_torch, test_autograd, test_nn. * Fix unsqueeze. * Fix wrap dim, wrapping with Scalar.	2018-01-23 15:44:11 -05:00
Sam Gross	e23acb3b08	Allow Variables in the (legacy) THNN bindings. (#4723 ) The legacy NN bindings currently operate only on Tensors. We are slowly replacing all uses of Tensor with Variable in Python code so that there will only be one user-visible class. This changes the NN bindings accessed through type2backend to accept either Tensors or Variables. This does not affect the NN bindings that go through ATen.	2018-01-19 10:56:58 -05:00
gchanan	eb857ec367	Introduce a (non-public) autograd scalar method and improve printing (#4586 ) * Specialize Variable printing and always print device for GPU tensors/Variables. * Introduce a (non-public) _scalar_sum() method for autograd scalar testing.	2018-01-12 14:26:38 -05:00
Sam Gross	d605058212	Replace Variable.volatile with torch.no_grad() (#3970 ) This removes volatile from Variable. The functionality is mostly replaced by a global (thread-local) flag, which is controlled by torch.set_grad_enabled() and the context manager torch.no_grad(). In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled(). Fixes #3627	2017-12-18 15:46:13 -05:00
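The pattern this commit describes, a thread-local flag plus a context manager that restores the previous value on exit, can be sketched in pure Python as below. `is_grad_enabled`, `set_grad_enabled`, and `no_grad` here are hypothetical re-implementations for illustration only, not the real `torch` functions (which are backed by the C++ GradMode flag; the real `torch.set_grad_enabled` can also be used as a context manager, which this sketch omits).

```python
import threading
from contextlib import contextmanager

# One independent flag per thread; threads that never set it see the default.
_grad_state = threading.local()


def is_grad_enabled() -> bool:
    # Gradient recording is on by default.
    return getattr(_grad_state, "enabled", True)


def set_grad_enabled(mode: bool) -> None:
    _grad_state.enabled = bool(mode)


@contextmanager
def no_grad():
    # Save the current thread's flag, disable it, and restore it even on error.
    prev = is_grad_enabled()
    set_grad_enabled(False)
    try:
        yield
    finally:
        set_grad_enabled(prev)
```

Because the flag is thread-local, entering `no_grad()` in one thread does not disable gradient recording in any other thread.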
Sam Gross	fde355f7d4	Allow in-place operations on views (#3384 ) Allow in-place operations on views Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to the base Variable on which it is a view. In-place operations on views change the grad_fn of the base. Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these views will raise an exception. Fixes #3313	2017-11-06 18:19:56 -05:00
Sam Gross	6647475bc2	Lazily create Variable.data PyObject* (#3149 ) Previously, we created the Variable.data PyObject* eagerly in THPVariable_Wrap. For many Variables, we don't access their data directly. Instead, they are passed from one Variable computation to another. This reduces the overhead of ATen-implemented Variable methods by ~200ns.	2017-10-17 11:54:55 -04:00
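The lazy-creation idea can be sketched in pure Python: the wrapper object for `.data` is built only on first access and then cached, so Variables whose data is never touched pay nothing for it. `Variable` and `DataWrapper` below are hypothetical stand-ins for the C++/Python binding objects, assumed for illustration; the actual change was made in the C++ binding code.

```python
class DataWrapper:
    """Hypothetical stand-in for the Python object exposed as Variable.data."""

    created = 0  # counts how many wrappers have ever been built

    def __init__(self, cdata):
        DataWrapper.created += 1
        self.cdata = cdata


class Variable:
    """Hypothetical stand-in for the Python-side Variable (THPVariable)."""

    __slots__ = ("_cdata", "_data_obj")

    def __init__(self, cdata):
        self._cdata = cdata
        self._data_obj = None  # the wrapper is NOT built at construction time

    @property
    def data(self):
        # Build the wrapper on first access only, then return the cached one.
        if self._data_obj is None:
            self._data_obj = DataWrapper(self._cdata)
        return self._data_obj
```

Constructing many `Variable` objects creates no wrappers at all; repeated accesses to `.data` on the same Variable return the same cached object.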

57 Commits