pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-03 07:24:58 +08:00

Author	SHA1	Message	Date
cyy	e0a5536cc9	[2/N] Fix clang-tidy warnings in torch/csrc/autograd (#133295) Follows #133180 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133295 Approved by: https://github.com/Skylion007	2024-08-13 13:23:46 +00:00
cyy	8a3c241094	Remove unused header inclusion (#119667) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/119667 Approved by: https://github.com/Skylion007	2024-02-12 05:36:25 +00:00
cyy	efc7c366f4	Remove auto_gil.h (#108492) auto_gil.h has been deprecated for a long time. We can switch to pybind11. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108492 Approved by: https://github.com/Skylion007	2023-09-05 08:26:13 +00:00
Sherlock Huang	a7baad04f6	Preserve stack trace for backward nodes over AOTAutograd (#83558)

For the following program.
```
def my_relu(a):
    return a.relu()

def func(a, b):
    a = torch.nn.Linear(10, 10)(a)
    d = torch.square(b)
    d = my_relu(d)
    loss = d.sum()
    return loss

with torchdynamo.optimize("aot_nop"):
    x = torch.rand(10, 10, requires_grad=True)
    y = torch.rand(10, 10, requires_grad=True)
    out = func(x, y)
```

It would generate the following fx graph with stack_trace populated in both forward and backward nodes.
```
def forward(self, primals, tangents):
    primals_1, primals_2, primals_3, primals_4, tangents_1, = fx_pytree.tree_flatten_spec([primals, tangents], self._in_spec)
    t_default = torch.ops.aten.t.default(primals_3); primals_3 = None
    addmm_default = torch.ops.aten.addmm.default(primals_4, primals_1, t_default); primals_4 = primals_1 = t_default = None
    pow_tensor_scalar = torch.ops.aten.pow.Tensor_Scalar(primals_2, 2)
    relu_default = torch.ops.aten.relu.default(pow_tensor_scalar); pow_tensor_scalar = None
    detach_default = torch.ops.aten.detach.default(relu_default)
    sum_default = torch.ops.aten.sum.default(relu_default); relu_default = None
    is_same_size_default = torch.ops.aten.is_same_size.default(sum_default, tangents_1)
    expand_default = torch.ops.aten.expand.default(tangents_1, [10, 10]); tangents_1 = None
    detach_default_1 = torch.ops.aten.detach.default(detach_default); detach_default = None
    threshold_backward_default = torch.ops.aten.threshold_backward.default(expand_default, detach_default_1, 0); expand_default = detach_default_1 = None
    pow_tensor_scalar_1 = torch.ops.aten.pow.Tensor_Scalar(primals_2, 1.0); primals_2 = None
    mul_scalar = torch.ops.aten.mul.Scalar(pow_tensor_scalar_1, 2.0); pow_tensor_scalar_1 = None
    mul_tensor = torch.ops.aten.mul.Tensor(threshold_backward_default, mul_scalar); threshold_backward_default = mul_scalar = None
    return pytree.tree_unflatten([sum_default, None, mul_tensor, None, None], self._out_spec)

====== joint graph =======
primals_1 None
primals_2 None
primals_3 None
primals_4 None
tangents_1 None
t_default
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 12, in func
    def func(a, b):
  File "/fsx/users/bahuang/repos/pytorch_fsx/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
addmm_default
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 12, in func
    def func(a, b):
  File "/fsx/users/bahuang/repos/pytorch_fsx/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
pow_tensor_scalar
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 14, in func
    d = torch.square(b)
relu_default
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 15, in func
    d = my_relu(d)
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 10, in my_relu
    return a.relu()
detach_default
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 15, in func
    d = my_relu(d)
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 10, in my_relu
    return a.relu()
sum_default
is_same_size_default
expand_default
detach_default_1
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 15, in func
    d = my_relu(d)
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 10, in my_relu
    return a.relu()
threshold_backward_default
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 15, in func
    d = my_relu(d)
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 10, in my_relu
    return a.relu()
pow_tensor_scalar_1
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 14, in func
    d = torch.square(b)
mul_scalar
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 14, in func
    d = torch.square(b)
mul_tensor
  File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 14, in func
    d = torch.square(b)
output None
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83558 Approved by: https://github.com/albanD	2022-08-18 22:13:04 +00:00
Edward Z. Yang	df69660832	Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"" (#82599) This reverts commit 532b8a9e00d7eea2636e67621bfcfa34d9c85bcb. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599 Approved by: https://github.com/albanD	2022-08-02 19:37:02 +00:00
PyTorch MergeBot	532b8a9e00	Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)" This reverts commit 9465c0e0b50f3c37bc150ef0016238ba33eca6f4. Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels	2022-08-01 20:25:35 +00:00
Edward Z. Yang	9465c0e0b5	Add a lint rule for torch/csrc/util/pybind.h include (#82552)

We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations always be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation.

The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included.

The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes:

- Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists.
- Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file.

I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings.

Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases.

See also https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552 Approved by: https://github.com/albanD	2022-08-01 17:16:58 +00:00
Michael Suo	30fb2c4aba	[lint] autoformat test/cpp and torch/csrc Let's have some fun. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828 Approved by: https://github.com/ezyang	2022-06-11 21:11:16 +00:00
Jeffrey Wan	2e8e560cdf	Fix anomaly mode memory leak (#51610)

Summary: Fixes https://github.com/pytorch/pytorch/issues/51349

The memory leak happens when 1) `create_graph` is True AND 2) detect anomaly mode is on.

When a backward node's constructor is called during backward, the current evaluating node is assigned as a "parent" of the created node. The code that assigns the parent encounters the below issue:

`functionToPyObject(parent_node)` returns a new PyObject (with refcount 1) or if PyObject already exists, increments its refcount by 1. However [PyDict_SetItem](`1b55b65638/Objects/dictobject.c (L1532)`) calls into [insertdict](https://github.com/python/cpython/blob/v3.8.1/Objects/dictobject.c#L1034) which increments refcount again. This means that when dict is destroyed, the refcount of the PyObject is at least one. This keeps `parent_node` (the backward function) alive, which then keeps the saved tensor alive.

Similar calls in the codebase to `functionToPyObject` won't require Py_DECREF if it is then passed into a tuple (instead of dict), because the analogous PyTuple_SetItem call does not increment refcount.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51610 Reviewed By: albanD Differential Revision: D26240336 Pulled By: soulitzer fbshipit-source-id: 2854528f66fab9dbce448f8a7ba732ce386a7310	2021-02-04 11:53:37 -08:00
mfkasim91	576880febf	Print all traceback for nested backwards in detect_anomaly (#43626)

Summary: Fixes https://github.com/pytorch/pytorch/issues/43405.

This pull request adds a feature of printing all tracebacks if a `detect_anomaly` mode detects `nan` in nested backward operations. The way I did it is by assigning a node as a parent to all nodes it produces during its backward calculation. Then if one of the children produces `nan`, it will print the traceback from the parent and grand parents (if any).

The parent is assigned in `parent_node_` member in `Node` class which is accessible in C++ by function `node->parent()` and in Python by `node.parent_function`. A node has a parent iff:

1. it is created from a backward operation, and
2. created when anomaly mode and grad mode are both enabled.

An example of this feature:

    import torch

    def example():
        x = torch.tensor(1.0, requires_grad=True)
        y = torch.tensor(1e-8, requires_grad=True)  # small to induce nan in n-th backward
        a = x * y
        b = x * y
        z1 = a / b  # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved
        z = z1 * z1
        gy , = torch.autograd.grad( z , (y,), create_graph=True)
        gy2, = torch.autograd.grad(gy , (y,), create_graph=True)
        gy3, = torch.autograd.grad(gy2, (y,), create_graph=True)
        gy4, = torch.autograd.grad(gy3, (y,), create_graph=True)
        return gy4

    with torch.autograd.detect_anomaly():
        gy4 = example()

with output:

    example.py:16: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
      with torch.autograd.detect_anomaly():
    /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning: Error detected in DivBackward0. Traceback of forward call that caused the error:
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 12, in example
        gy3, = torch.autograd.grad(gy2, (y,), create_graph=True)
      File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
        return Variable._execution_engine.run_backward(
     (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:61.)
      return Variable._execution_engine.run_backward(
    /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning:

    Traceback of forward call that induces the previous calculation:
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 11, in example
        gy2, = torch.autograd.grad(gy , (y,), create_graph=True)
      File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
        return Variable._execution_engine.run_backward(
     (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:65.)
      return Variable._execution_engine.run_backward(
    /home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py:190: UserWarning:

    Traceback of forward call that induces the previous calculation:
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 8, in example
        z1 = a / b  # can produce nan in n-th backward as long as https://github.com/pytorch/pytorch/issues/43414 is unsolved
     (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:65.)
      return Variable._execution_engine.run_backward(
    Traceback (most recent call last):
      File "example.py", line 17, in <module>
        gy4 = example()
      File "example.py", line 13, in example
        gy4, = torch.autograd.grad(gy3, (y,), create_graph=True)
      File "/home/mfkasim/anaconda2/envs/base3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 190, in grad
        return Variable._execution_engine.run_backward(
    RuntimeError: Function 'DivBackward0' returned nan values in its 1th output.

cc & thanks to albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43626 Reviewed By: malfet Differential Revision: D23397499 Pulled By: albanD fbshipit-source-id: aa7435ec2a7f0d23a7a02ab7db751c198faf3b7d	2020-08-31 08:23:07 -07:00
Hong Xu	027d7f7ba5	Delete AT_WARN and replace all AT_WARN with TORCH_WARN (#34623) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34623 The bandaid of "AT_WARN" keeps introducing new warnings. Let's get rid of it entirely. Close #34502 Test Plan: Imported from OSS Differential Revision: D20420112 Pulled By: albanD fbshipit-source-id: 7160c113cb4deb2d2f50a375356f423fe5e86f50	2020-03-13 12:27:22 -07:00
Richard Zou	3e6e2e9b7b	Print the current Node name in anomaly mode (#33875) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33875 Fixes #33675. I added a `current_node_name` argument to AnomalyMetadata::print_stack. This is a mandatory arg because I found only one callsite and making it a default arg on a virtual function can be confusing. Test Plan: - Tested locally: https://gist.github.com/zou3519/09937387c83efc76e1700374d5c9c9d9 - I don't know how to add a test for this: the message is printed to stderr but it isn't an exception nor a warning. I considered capturing the stderr of a subprocess but that seems like asking for flakiness. Differential Revision: D20349399 Pulled By: zou3519 fbshipit-source-id: 7585ddffe2bf9e1081f4028a9c44de783978a052	2020-03-10 07:51:52 -07:00
Edward Yang	1111a6b810	Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#30274) Summary: Reland of https://github.com/pytorch/pytorch/pull/29095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/30274 Differential Revision: D18762293 Pulled By: ezyang fbshipit-source-id: d3d50c2dd12bcb678ab25fa708eb6587cc4b66f9	2019-12-02 12:19:58 -08:00
Mike Ruberry	eff4c4d7c1	Revert D18301806: Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL Test Plan: revert-hammer Differential Revision: D18301806 Original commit changeset: 03da6a26c41e fbshipit-source-id: c1324ee8d154e7e16f5dd4f1cf3625aaa566cd39	2019-11-21 14:50:07 -08:00
Alan Du	f4b9690f2d	Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#29095) Summary: Given that pybind11 implements these gil functions, I don't think it makes sense for Pytorch to have its own bespoke versions. Fixes https://github.com/pytorch/pytorch/issues/29065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095 Differential Revision: D18301806 Pulled By: ezyang fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a	2019-11-21 13:44:40 -08:00
Mikhail Zolotukhin	722eb48ff2	Cleanup includes in torch/csrc/* (#19924) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19924 ghimport-source-id: f7248b16c8e263a7d0ba7975b1fc0b00cb2cf2c0 Differential Revision: D15125018 Pulled By: ZolotukhinM fbshipit-source-id: 322c7ca53e38ef8b43b5ac5bd747b28bc10379f1	2019-05-06 14:03:18 -07:00
Edward Yang	517c7c9861	Canonicalize all includes in PyTorch. (#14849)

Summary: Anywhere we used #include "foo.h", we now say #include <foo.h>. Paths are adjusted to be rooted out of aten/src, torch/lib, or the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from the include paths.

I used the following script to do the canonicalization:
```
import subprocess
import re
import os.path

files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
    if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
        continue
    if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
        continue
    with open(fn, 'r') as f:
        c = f.read()
    def fmt(p):
        return "#include <{}>".format(p)
    def repl(m):
        p = m.group(1)
        if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
            return fmt(p)
        if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
            return fmt(p)
        for root in ["aten/src", "torch/lib", ""]:
            for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                new_p = os.path.relpath(os.path.join(bad_root, p), root)
                if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                    return fmt(new_p)
        print("ERROR: ", fn, p)
        return m.group(0)
    new_c = re.sub(r'#include "([^"]+)"', repl, c)
    if new_c != c:
        print(fn)
        with open(fn, 'w') as f:
            f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849 Reviewed By: dzhulgakov Differential Revision: D13363445 Pulled By: ezyang fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68	2018-12-08 19:38:30 -08:00
albanD	3365d74df9	Fix refcounting in anomaly metadata Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13249 Differential Revision: D12823875 Pulled By: soumith fbshipit-source-id: a0857a7cc8a4888aff99991fbae6bdd7a49d1ac4	2018-10-29 15:55:08 -07:00
Sam Gross	77484d91db	Add AT_WARN to issue warnings from ATen (#8967) Summary: Use AT_WARN from python_anomaly_mode instead of printing to stdout. Closes https://github.com/pytorch/pytorch/pull/8967 Reviewed By: ezyang Differential Revision: D8670654 Pulled By: colesbury fbshipit-source-id: 3f7aee8ea06914d7d4381feec086e95f0b194752	2018-06-27 21:24:39 -07:00
albanD	78e3259bbe	Add autograd automatic anomaly detection (#7677)

* add autograd automatic anomaly detection
* python 3 string support
* Fix non python build
* fix typo in doc
* better test and naming fix
* fix no python build and python object handling
* fix missing checks
* clean NO_PYTHON build
* Remove unwanted changes	2018-06-11 21:26:17 -04:00
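
The reference-counting point behind "Fix anomaly mode memory leak" (2e8e560cdf) can be observed from Python, at least on CPython. The sketch below is not PyTorch code: `Node` and the `"parent_"` key are hypothetical stand-ins. It only shows that storing an object in a dict adds a reference owned by the dict, and that destroying the dict removes only that one reference. In C, this is why a caller that already holds its own new reference must release it after `PyDict_SetItem`, while `PyTuple_SetItem` takes over the caller's reference instead.

```python
import sys


class Node:
    """Hypothetical stand-in for the Python wrapper of a backward node."""


def refcount_deltas():
    node = Node()
    before = sys.getrefcount(node)

    # Storing the object in a dict adds one reference owned by the dict,
    # analogous to what PyDict_SetItem does on the C side.
    metadata = {"parent_": node}
    while_stored = sys.getrefcount(node)

    # Destroying the dict drops only the dict's own reference.
    del metadata
    after = sys.getrefcount(node)

    return while_stored - before, after - before


delta_while_stored, delta_after = refcount_deltas()
print(delta_while_stored, delta_after)
```

If C code had created an extra reference before the insert and never released it, the count would stay one higher after the dict is gone, which is the kind of leak the commit message describes.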

20 Commits
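
The parent-chain idea from "Print all traceback for nested backwards in detect_anomaly" (576880febf) can be sketched as a toy model. This is only an illustration under assumptions: `ToyNode` and `collect_tracebacks` are hypothetical names, not PyTorch APIs, and the traceback strings are shortened from the sample output in that commit. It shows how following `parent` links from the failing node yields the error traceback first, then the tracebacks of the calculations that induced it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for an autograd node; not PyTorch's Node class."""
    name: str
    traceback: str
    # Set only for nodes created while another node's backward was running.
    parent: Optional["ToyNode"] = None


def collect_tracebacks(node: ToyNode) -> List[str]:
    """Return the failing node's traceback, then those of its ancestors."""
    reports = []
    current = node
    while current is not None:
        if not reports:
            header = "Traceback of forward call that caused the error:"
        else:
            header = "Traceback of forward call that induces the previous calculation:"
        reports.append(header + "\n" + current.traceback)
        current = current.parent
    return reports


# Chain mirroring the commit's example: forward op, then two nested backward passes.
forward_op = ToyNode("DivBackward0", 'File "example.py", line 8, in example')
second_pass = ToyNode("DivBackward0", 'File "example.py", line 11, in example', parent=forward_op)
third_pass = ToyNode("DivBackward0", 'File "example.py", line 12, in example', parent=second_pass)

reports = collect_tracebacks(third_pass)
for report in reports:
    print(report)
```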