pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	b2953f5643	[9/N] Apply ruff UP035 rule (#165515 ) This is follow-up of #165214 to continue applying ruff UP035 rule to the code base. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165515 Approved by: https://github.com/Lucaskabela	2025-10-17 00:09:51 +00:00
Laith Sakka	189a054cfb	Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869 ) [relanding again after fixing internal build] Summary: This might cause some new DDEs on call sites that do not use is_contiguous_or_false() or sym_is_contiguous() but want to find those call sites to handle this properly by calling is_contiguous_or_false() and not is_contiguous() explitly when appropriate. I had to fix one issue after removing the implicit size oblivious reasoning. here is context we defined in this https://github.com/pytorch/pytorch/pull/157472 sym_is_contiguous to be the function computing contiguity for dynamic shapes in c++. It returns a symbolic expression that represents contiguity and guaranteed not to throw a DDE. when people call is_contiguous we do sym_is_contiguous().guard_bool() when people call is_contiguous_or_false we do sym_is_contiguous().guard_or_false() one issue not handled well was this path ``` c10::SymBool TensorImpl::sym_is_contiguous_custom( at::MemoryFormat memory_format) const { if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) { return pyobj_slot_.load_pyobj_interpreter()->is_contiguous( this, memory_format); } return sym_is_contiguous_default(memory_format); } ``` namely if we call sym_is_contiguous_custom but we have matches_python_custom(SizesStridesPolicy::CustomStrides) return true , then we used to call is_contiguous(this, memory_format); This used to go through the load_pyobj_interpreter and end up calling the python is_contiguous call which used implicit size oblivious reasoning. once we removed that implicit size oblivious reasoning, the right thing we want is to call return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format); otherwise we would get DDE even if the caller is doing sym_is_contiguous. so I had to define it for pyinterpreter, and then I had to override it for nested tensors. Approved by: https://github.com/ezyang Test Plan: contbuild & OSS CI, see `e444cd24d4` Rollback Plan: Differential Revision: D80435179 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160869 Approved by: https://github.com/ezyang	2025-09-08 22:59:13 +00:00
PyTorch MergeBot	b82aa3df20	Revert "Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. (#159197 )" This reverts commit e444cd24d48b3a46f067974f2cc157f5ed27709f. Reverted https://github.com/pytorch/pytorch/pull/159197 on behalf of https://github.com/laithsakka due to internal build failures ([comment](https://github.com/pytorch/pytorch/pull/159197#issuecomment-3195436668))	2025-08-18 07:22:13 +00:00
Laith Sakka	e444cd24d4	Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. (#159197 ) This might cause some new DDEs on call sites that do not use is_contiguous_or_false() or sym_is_contiguous() but want to find those call sites to handle this properly by calling is_contiguous_or_false() and not is_contiguous() explitly when appropriate. I had to fix one issue after removing the implicit size oblivious reasoning. here is context we defined in this https://github.com/pytorch/pytorch/pull/157472 sym_is_contiguous to be the function computing contiguity for dynamic shapes in c++. It returns a symbolic expression that represents contiguity and guaranteed not to throw a DDE. when people call is_contiguous we do sym_is_contiguous().guard_bool() when people call is_contiguous_or_false we do sym_is_contiguous().guard_or_false() one issue not handled well was this path ``` c10::SymBool TensorImpl::sym_is_contiguous_custom( at::MemoryFormat memory_format) const { if (C10_UNLIKELY(matches_python_custom(SizesStridesPolicy::CustomStrides))) { return pyobj_slot_.load_pyobj_interpreter()->is_contiguous( this, memory_format); } return sym_is_contiguous_default(memory_format); } ``` namely if we call sym_is_contiguous_custom but we have matches_python_custom(SizesStridesPolicy::CustomStrides) return true , then we used to call is_contiguous(this, memory_format); This used to go through the load_pyobj_interpreter and end up calling the python is_contiguous call which used implicit size oblivious reasoning. once we removed that implicit size oblivious reasoning, the right thing we want is to call return pyobj_slot_.load_pyobj_interpreter()->sym_is_contiguous(this, memory_format); otherwise we would get DDE even if the caller is doing sym_is_contiguous. so I had to define it for pyinterpreter, and then I had to override it for nested tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159197 Approved by: https://github.com/ezyang	2025-08-16 09:15:58 +00:00
Xuehai Pan	a69785b3ec	[BE] fix typos in tools/ (#156082 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156082 Approved by: https://github.com/soulitzer ghstack dependencies: #156079	2025-06-17 19:25:50 +00:00
Xuehai Pan	c73a92fbf5	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546 ) Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements > Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target: > > ```python > # Input > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > > # Black > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > # Ruff > assert len(policy_types) >= priority + num_duplicates, ( > f"This tests needs at least {priority + num_duplicates} many types." > ) > ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546 Approved by: https://github.com/malfet	2025-02-27 20:46:16 +00:00
Xuehai Pan	754fb834db	[BE][CI] bump `ruff` to 0.9.0: string quote styles (#144569 ) Reference: https://docs.astral.sh/ruff/formatter/#f-string-formatting - Change the outer quotes to double quotes for nested f-strings ```diff - f'{", ".join(args)}' + f"{', '.join(args)}" ``` - Change the inner quotes to double quotes for triple f-strings ```diff string = """ - {', '.join(args)} + {", ".join(args)} """ ``` - Join implicitly concatenated strings ```diff - string = "short string " "short string " f"{var}" + string = f"short string short string {var}" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144569 Approved by: https://github.com/Skylion007 ghstack dependencies: #146509	2025-02-24 19:56:09 +00:00
cyy	82aaf64422	[3/N] Apply py39 ruff fixes (#142115 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/142115 Approved by: https://github.com/ezyang	2024-12-11 17:50:10 +00:00
Tom Ritchford	498a7808ff	Fix unused Python variables outside torch/ and test/ (#136359 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359 Approved by: https://github.com/albanD	2024-12-11 17:10:23 +00:00
cyy	70ba471957	[3/N] Fix clang-tidy warnings in python_variable_methods.cpp (#139248 ) Follows #139158 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139248 Approved by: https://github.com/ezyang	2024-10-31 03:29:19 +00:00
Xuehai Pan	267f82b860	[BE] Format `.ci/` / `.github/` / `benchmarks/` / `functorch/` / `tools/` / `torchgen/` with `ruff format` (#132577 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132577 Approved by: https://github.com/malfet	2024-10-11 18:30:26 +00:00
Xuehai Pan	8a67daf283	[BE][Easy] enable postponed annotations in `tools` (#129375 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375 Approved by: https://github.com/malfet	2024-06-29 09:23:35 +00:00
PyTorch MergeBot	a32ce5ce34	Revert "[BE][Easy] enable postponed annotations in `tools` (#129375 )" This reverts commit 59eb2897f1745f513edb6c63065ffad481c4c8d0. Reverted https://github.com/pytorch/pytorch/pull/129375 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I need to revert to cleanly revert https://github.com/pytorch/pytorch/pull/129374, please do a rebase and reland this ([comment](https://github.com/pytorch/pytorch/pull/129375#issuecomment-2197800541))	2024-06-29 00:44:25 +00:00
Xuehai Pan	59eb2897f1	[BE][Easy] enable postponed annotations in `tools` (#129375 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375 Approved by: https://github.com/malfet	2024-06-28 15:37:54 +00:00
Xuehai Pan	35ea5c6b22	[3/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torchgen (#127124 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127124 Approved by: https://github.com/Skylion007 ghstack dependencies: #127122, #127123	2024-05-25 19:20:03 +00:00
Brian Hirsh	f25c7c9699	functionalize storage resizing, minimal ppFSDP traceable forward (#122434 ) More details further down, but first a more high-level description of "how do we functionalize storage resizing" Today, dynamo converts `param.untyped_storage().resize_(x)` calls that it sees from fsdp into a custom op, `ops.inductor.resize_storage_bytes_(x)` So given this setup, there are 3 main cases that I think we want to handle: (1) graph input starts with a real storage size, gets resized down to zero in the graph (2) graph input starts with 0 storage size, gets resized up in the graph (3) graph input starts with 0 storage size, gets resized up and used in some compute, then resized back down to 0 For case (1) we need to emit a `resize_storage_bytes_` at the end of the graph, similar to how we emit `copy_()` for data mutations. For case (2), we need to emit a `resize_storage_bytes_` in the graph, and we also need to emit a `copy_()` (the input had its storage resized up, and filled in with data, which is we need to reflect as an input mutation) For case (3), the net effect is that the input had no data on entry and exit of the function, so we don't need to emit any mutable ops in the end of the graph. The main thing to call out is that: we need to write a functionalization rule for `resize_storage_byte_`, (`FunctionalTensorWrapper::storage_resize_()`) and this rule actually does very little. We would like to not emit any new ops in the graph (like say, a functional resize op). Instead, we should expect / rely on the fact that any resize up will be immediately followed by a `copy_()`/`foreach_copy_`/`out=` op, that will fill in the data of the tensor. So `FunctionalTensor` can temporarily live in a state where its data is invalid, until the `x.copy_(y)` "updates" its data with the new tensor. So effectively, all that this rule does is: (1) it stores metadata on the storage, indicating that the tensor was resized, as well as the updated storage size. We need this info in AOTAutograd, so it knows whether to emit a mutable resize_() op in the graph epilogue (2) There is also a corner case: if we are resizing down to zero, but our tensor had previously had a zero size storage, then we update `value_` to point to the original value of the tensor. The reason this seems safe is because if we have a zero storage sized tensor `x`, and we resize it up, use it in some compute, resize it back down to zero, and use it somewhere, we would want the functional version of this code to use the original `x` after the second resize. For FSDP, this is important because we end up saving parameters (graph inputs) for backward, and we want to make sure that the thing we save (and the output to the forward graph) is the original, zero-storage-sized parameter, and not the "version 2" of the parameter after the first resize_() I think a good order to look at changes in this PR would be: (1) `test_aotdispatch.py` shows the 3 main cases I focused on as well as the expected functionalized graphs (2) In `FunctionalStorageImpl.h/cpp`, I had to add a notion of "original base", and "original/curr_size". The first is so I can re-use the zero-size tensor after multiple resizes, and the second is so I can tell in AOTAutograd whether any resizes canceled each other out into a no-op (3) FunctionalTensorWrapper.h/cpp has the new resize functionalizion rule + some extra utils (4) `_functorch/_autograd`: the main changes in this folder were around adding the logic at trace-time to detect when we need to put a resize_() in the graph. I also have some assertions to check that any inputs that experience storage resizing will always be in the graph and not the opaque epilogue, and I also limited the resize_() mutation case so that you can only ever start with zero storage, or end with zero storage (you can't do e.g. `torch.ones(2).storage().resize_(3)`), and banned it on tensor subclasses (5) `fake_tensor.py`/`meta_utils.py`: we now need to be able to fakeify tensors with zero storage, so I added a quick version of it in meta_utils.py. This also.. has ramifications for fake tensor caching that I need to fix (include the storage size on the cache key, maybe?) ------------------ This PR subsumes https://github.com/pytorch/pytorch/pull/120971. This PR is enough to almost get a simple ppFSDP forward pass tracing with a functionalized resize_() properly. It also attempts to do the updated version from @jansel, where we don't have any notion of `resize_()` in the graph at all, post functionalization. It would probably be good to test it with @yf225 's FSDP changes, and see how many of the FX passes it allows us to remove. I think that in theory, it should allow us to remove all FX passes that affect the forward graph / partitioner, except the one that forces views to be recomputed in the backward (more details below). There are a few things worth calling out: (1) failed attempt at functionalizing `aten.copy_()`. I originally wanted to get a version takes these operations: ``` param.storage().resize_(all_gather_size) param.copy_(all_gather_buffer) out = aten.matmul(param, param) ``` and functionalizes them into: ``` out = aten.matmul(all_gather_buffer, all_gather_buffer) ``` This would involve getting functionalization to turn `x.copy_(y)` into a giant no-op that just returns `y`. Unfortunately, we can't actually do this in a reasonable way within functionalization (instead, there's a functional `aten.copy` in the graph - see the test case graph expecttest for details). Why? In order for that transformation to be safe, `x` and `y` need to have the same metadata. However, it's possible for `x` and `y` to be subclasses of different types. This is not something we can easily tell from within functionalization, and would be a layering violation. So for now I'm leaving it to downstream code to optimize away the `aten.copy` (this is already the case today, so I think inductor can handle this) (2) The forward doesn't actually run successfully in this PR (see the `assertRaisesRegex` in the test). Why? The final forward graph looks like this: ``` def forward(self, primals_1, primals_2): _foreach_copy = torch.ops.aten._foreach_copy.default([primals_1], [primals_2]); primals_2 = None getitem = _foreach_copy[0]; _foreach_copy = None mm = torch.ops.aten.mm.default(getitem, getitem); getitem = None t_1 = torch.ops.aten.t.default(primals_1); primals_1 = None return [mm, t_1] ``` Where `primals_1` starts out as a secretly-zero-storage-size parameter, and gets resized up and back down within the forward (these are functionalized away). Importantly, the matmul happy on the result of the `foreach_copy`, but the activation that we save for backward (`t_1`) is the result of transposing the original parameter (the zero-storage-size param). This is exactly the optimization in fsdp that allows us to have good peak memory usage. The problem is that the min-cut partitioner decides to save `t_1` for backward. Running this code in eager breaks, because the kernel for `aten.permute(x)` is not happy when `x` has secretly-zero-sized-storage. The real problem here is that in eager mode the `permute` kernel runs during the backward, after backward hooks have properly resized the saved activation. Here, we are running the transpose in the forward. One option would be to turn off the checks in our view kernels and allow them to work on zero-storage-sized tensors, which feels pretty bad. Another option is to tweak the partitioner (or use one of Will's FX passes) to force the partitioner to not save views for backward, and allow the views to be recomputed in the backward. This seems kind of silly, but is also probably harmless. (3) The backward is still broken. To be fair, this issue is pretty separable from "functionalizing storage resize calls", and can be fixed later (either by a real fix to our tracing infra, or via another hacky FX pass). More description of this problem is described at issue (8) of my PR description in https://github.com/pytorch/pytorch/pull/120971 (4) I only added support for "full graph" resizing: basically, the limited case where a param starts with zero storage size, and gets resized up and back down. I think we can add support for the graph break case, but I think we can keep that add-on separate from this PR unless we need it immediately. I also added asserts so we should fail loudly when we hit this case (5) I have a change to FakeTensor creation when inputs have zero storage size that.. is probably ok. But I also removed FakeTensor caching on view ops, which I probably need to fix before I can land this PR (6) I added a notion of "original_base" to `FunctionalStorageImpl`. More details are in the comments, but my rational for this was that we basically need it to ensure that autograd saves the original, zero-storage-sized param for backward, after resizing up and back down (7) I had to update our eager kernels for `aten.copy` and `aten._foreach_copy`, to handle the case where the `self` argument has secretly-zero-storage. Inductor can probably generate correct code for this case, but we need these ops to work properly in this situation for the `aot_eager` backend to do the right thing Pull Request resolved: https://github.com/pytorch/pytorch/pull/122434 Approved by: https://github.com/jansel	2024-05-10 18:09:10 +00:00
andrewor14	773ae817f7	Batch Norm Consolidation (#116092 ) Summary: This commit simplifies the existing decomposition hierarchy of batch norm ops by adding a single, backend agnostic op: `batch_norm_with_update`. The existing hierarchy looks like: ``` aten.batch_norm -> aten._batch_norm_impl_index -> [ aten.native_batch_norm -> aten._native_batch_norm_legit (export only) -> _batch_norm_legit_cpu/cuda (kernels, export only) -> _batch_norm_cpu/cuda (kernels) ] OR [ aten.cudnn_batch_norm ] OR [ aten.miopen_batch_norm ] ``` Aside from complexity, an important problem with the above decomposition hierarchy is cuda numerics in export flows. We observed significantly worse convergence when training a mobilenetv2-like model when using the `_batch_norm_cuda` kernel instead of the `cudnn_batch_norm` kernel. This means users who export their models on CPU first then move the models to cuda later may silently see worse accuracies even when cudnn is installed, because they are using the worse kernel. This issue is summarized in https://github.com/pytorch/pytorch/issues/111384. Instead, the new hierarchy proposed by consolidating existing batch norm ops will look like: ``` aten.batch_norm -> aten.batch_norm_with_update -> [ _batch_norm_cpu (kernel) ] OR [ _batch_norm_cuda (kernel) ] OR [ cudnn_batch_norm (kernel) ] OR [ miopen_batch_norm (kernel) ] ``` The new op `batch_norm_with_update` hides backend implementation details and automatically picks the right kernel based on what is installed. This commit also adds the following variants to this op: ``` batch_norm_with_update_functional batch_norm_with_update.out batch_norm_no_update batch_norm_no_update.out batch_norm_backward ``` Note that this commit only adds this op and its variants, but does not actually change the decomps to produce these ops in the graph. This will be done after the 2 week FC window, and the ops used in the old stack is planned to be removed after the 6 month BC window. Test Plan: `OpInfo` tests for `batch_norm_with_update`. Reviewers: albanD, bdhirsh Subscribers: albanD, bdhirsh, supriyar Tasks: https://github.com/pytorch/pytorch/issues/111384 Differential Revision: [D54805279](https://our.internmc.facebook.com/intern/diff/D54805279) Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2024-03-18 21:01:30 +00:00
PyTorch MergeBot	fd0dbcd891	Revert "Batch Norm Consolidation (#116092 )" This reverts commit 7b4f70eda519ccd7f28de17689edd43c52743bc9. Reverted https://github.com/pytorch/pytorch/pull/116092 on behalf of https://github.com/osalpekar due to Causes build failure in //caffe2:aten-hip (AMD build) target. See [D54707318](https://www.internalfb.com/diff/D54707318) for more details, may require internal build system changes to resolve. ([comment](https://github.com/pytorch/pytorch/pull/116092#issuecomment-1989542965))	2024-03-11 22:22:41 +00:00
Masaki Kozuki	82bb06334d	Update python binding for in-place foreach to return `List[Tensor]` (#121405 ) fixes #104817 taking over #118622 ```c++ // _foreach_atan_ static PyObject * THPVariable__foreach_atan_(PyObject* self_, PyObject* args, PyObject* kwargs) { HANDLE_TH_ERRORS static PythonArgParser parser({ "_foreach_atan_(TensorList self)", }, /traceable=/false); ParsedArgs<1> parsed_args; auto _r = parser.parse(nullptr, args, kwargs, parsed_args); if(_r.has_torch_function()) { return handle_torch_function(_r, nullptr, args, kwargs, THPVariableFunctionsModule, "torch"); } // aten::_foreach_atan_(Tensor(a!)[] self) -> () // auto dispatch__foreach_atan_ = [](at::TensorList self) -> at::TensorList { auto dispatch__foreach_atan_ = [](at::TensorList self) -> void { pybind11::gil_scoped_release no_gil; at::_foreach_atan_(self); }; dispatch__foreach_atan_(_r.tensorlist(0)); PyObject* self_tensorlist = _r.args[0]; Py_INCREF(self_tensorlist); return self_tensorlist; Py_RETURN_NONE; END_HANDLE_TH_ERRORS } ... // _foreach_div_ static PyObject * THPVariable__foreach_div_(PyObject* self_, PyObject* args, PyObject* kwargs) { HANDLE_TH_ERRORS static PythonArgParser parser({ "_foreach_div_(TensorList self, ScalarList scalars)", "_foreach_div_(TensorList self, Tensor other)", "_foreach_div_(TensorList self, TensorList other)", "_foreach_div_(TensorList self, Scalar scalar)", }, /traceable=/false); ParsedArgs<2> parsed_args; auto _r = parser.parse(nullptr, args, kwargs, parsed_args); if(_r.has_torch_function()) { return handle_torch_function(_r, nullptr, args, kwargs, THPVariableFunctionsModule, "torch"); } switch (_r.idx) { case 0: { // aten::_foreach_div_.ScalarList(Tensor(a!)[] self, Scalar[] scalars) -> () // auto dispatch__foreach_div_ = [](at::TensorList self, at::ArrayRef<at::Scalar> scalars) -> at::TensorList { auto dispatch__foreach_div_ = [](at::TensorList self, at::ArrayRef<at::Scalar> scalars) -> void { pybind11::gil_scoped_release no_gil; at::_foreach_div_(self, scalars); }; dispatch__foreach_div_(_r.tensorlist(0), _r.scalarlist(1)); PyObject* self_tensorlist = _r.args[0]; Py_INCREF(self_tensorlist); return self_tensorlist; } case 1: { // aten::_foreach_div_.Tensor(Tensor(a!)[] self, Tensor other) -> () // auto dispatch__foreach_div_ = [](at::TensorList self, const at::Tensor & other) -> at::TensorList { auto dispatch__foreach_div_ = [](at::TensorList self, const at::Tensor & other) -> void { pybind11::gil_scoped_release no_gil; at::_foreach_div_(self, other); }; dispatch__foreach_div_(_r.tensorlist(0), _r.tensor(1)); PyObject* self_tensorlist = _r.args[0]; Py_INCREF(self_tensorlist); return self_tensorlist; } case 2: { // aten::_foreach_div_.List(Tensor(a!)[] self, Tensor[] other) -> () // auto dispatch__foreach_div_ = [](at::TensorList self, at::TensorList other) -> at::TensorList { auto dispatch__foreach_div_ = [](at::TensorList self, at::TensorList other) -> void { pybind11::gil_scoped_release no_gil; at::_foreach_div_(self, other); }; dispatch__foreach_div_(_r.tensorlist(0), _r.tensorlist(1)); PyObject* self_tensorlist = _r.args[0]; Py_INCREF(self_tensorlist); return self_tensorlist; } case 3: { // aten::_foreach_div_.Scalar(Tensor(a!)[] self, Scalar scalar) -> () // auto dispatch__foreach_div_ = [](at::TensorList self, const at::Scalar & scalar) -> at::TensorList { auto dispatch__foreach_div_ = [](at::TensorList self, const at::Scalar & scalar) -> void { pybind11::gil_scoped_release no_gil; at::_foreach_div_(self, scalar); }; dispatch__foreach_div_(_r.tensorlist(0), _r.scalar(1)); PyObject* self_tensorlist = _r.args[0]; Py_INCREF(self_tensorlist); return self_tensorlist; } } Py_RETURN_NONE; END_HANDLE_TH_ERRORS } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/121405 Approved by: https://github.com/soulitzer	2024-03-08 21:00:01 +00:00
andrewor14	7b4f70eda5	Batch Norm Consolidation (#116092 ) Summary: This commit simplifies the existing decomposition hierarchy of batch norm ops by adding a single, backend agnostic op: `batch_norm_with_update`. The existing hierarchy looks like: ``` aten.batch_norm -> aten._batch_norm_impl_index -> [ aten.native_batch_norm -> aten._native_batch_norm_legit (export only) -> _batch_norm_legit_cpu/cuda (kernels, export only) -> _batch_norm_cpu/cuda (kernels) ] OR [ aten.cudnn_batch_norm ] OR [ aten.miopen_batch_norm ] ``` Aside from complexity, an important problem with the above decomposition hierarchy is cuda numerics in export flows. We observed significantly worse convergence when training a mobilenetv2-like model when using the `_batch_norm_cuda` kernel instead of the `cudnn_batch_norm` kernel. This means users who export their models on CPU first then move the models to cuda later may silently see worse accuracies even when cudnn is installed, because they are using the worse kernel. This issue is summarized in https://github.com/pytorch/pytorch/issues/111384. Instead, the new hierarchy proposed by consolidating existing batch norm ops will look like: ``` aten.batch_norm -> aten.batch_norm_with_update -> [ _batch_norm_cpu (kernel) ] OR [ _batch_norm_cuda (kernel) ] OR [ cudnn_batch_norm (kernel) ] OR [ miopen_batch_norm (kernel) ] ``` The new op `batch_norm_with_update` hides backend implementation details and automatically picks the right kernel based on what is installed. This commit also adds the following variants to this op: ``` batch_norm_with_update_functional batch_norm_with_update.out batch_norm_no_update batch_norm_no_update.out batch_norm_backward ``` Note that this commit only adds this op and its variants, but does not actually change the decomps to produce these ops in the graph. This will be done after the 2 week FC window, and the ops used in the old stack is planned to be removed after the 6 month BC window. Test Plan: `OpInfo` tests for `batch_norm_with_update`. Reviewers: albanD, bdhirsh Subscribers: albanD, bdhirsh, supriyar Tasks: https://github.com/pytorch/pytorch/issues/111384 Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2024-03-08 15:07:15 +00:00
PyTorch MergeBot	b529c19bdf	Revert "Batch Norm Consolidation (#116092 )" This reverts commit 5680f565d5b7d4aa412a3988d3d91ca4c5679303. Reverted https://github.com/pytorch/pytorch/pull/116092 on behalf of https://github.com/jeffdaily due to broke ROCm, PR signal was clean but trunk was not, the merge should have been blocked but wasn't ([comment](https://github.com/pytorch/pytorch/pull/116092#issuecomment-1981373237))	2024-03-06 17:10:01 +00:00
Tugsbayasgalan Manlaibaatar	5680f565d5	Batch Norm Consolidation (#116092 ) Summary: This commit simplifies the existing decomposition hierarchy of batch norm ops by adding a single, backend agnostic op: `batch_norm_with_update`. The existing hierarchy looks like: ``` aten.batch_norm -> aten._batch_norm_impl_index -> [ aten.native_batch_norm -> aten._native_batch_norm_legit (export only) -> _batch_norm_legit_cpu/cuda (kernels, export only) -> _batch_norm_cpu/cuda (kernels) ] OR [ aten.cudnn_batch_norm ] OR [ aten.miopen_batch_norm ] ``` Aside from complexity, an important problem with the above decomposition hierarchy is cuda numerics in export flows. We observed significantly worse convergence when training a mobilenetv2-like model when using the `_batch_norm_cuda` kernel instead of the `cudnn_batch_norm` kernel. This means users who export their models on CPU first then move the models to cuda later may silently see worse accuracies even when cudnn is installed, because they are using the worse kernel. This issue is summarized in https://github.com/pytorch/pytorch/issues/111384. Instead, the new hierarchy proposed by consolidating existing batch norm ops will look like: ``` aten.batch_norm -> aten.batch_norm_with_update -> [ _batch_norm_cpu (kernel) ] OR [ _batch_norm_cuda (kernel) ] OR [ cudnn_batch_norm (kernel) ] OR [ miopen_batch_norm (kernel) ] ``` The new op `batch_norm_with_update` hides backend implementation details and automatically picks the right kernel based on what is installed. This commit also adds the following variants to this op: ``` batch_norm_with_update_functional batch_norm_with_update.out batch_norm_no_update batch_norm_no_update.out batch_norm_backward ``` Note that this commit only adds this op and its variants, but does not actually change the decomps to produce these ops in the graph. This will be done after the 2 week FC window, and the ops used in the old stack is planned to be removed after the 6 month BC window. Test Plan: `OpInfo` tests for `batch_norm_with_update`. Reviewers: albanD, bdhirsh Subscribers: albanD, bdhirsh, supriyar Tasks: https://github.com/pytorch/pytorch/issues/111384 Co-authored-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116092 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2024-03-06 04:50:46 +00:00
Isuru Fernando	c3496d50f0	Fix torch.return_types init signature (#119284 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119284 Approved by: https://github.com/peterbell10, https://github.com/XuehaiPan	2024-02-23 21:52:34 +00:00
Aaron Gokaslan	b7b2178204	[BE]: Remove useless lambdas (#113602 ) Applies PLW0108 which removes useless lambda calls in Python, the rule is in preview so it is not ready to be enabled by default just yet. These are the autofixes from the rule. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602 Approved by: https://github.com/albanD	2023-11-14 20:06:48 +00:00
Joel Schlosser	43ea782af3	Multiprocessing support for NT (#110292 ) Fixes #110161 Allows NTs to be used in DataLoaders with `num_workers > 1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292 Approved by: https://github.com/cpuhrsch, https://github.com/albanD	2023-10-10 21:58:19 +00:00
Kazuaki Ishizaki	50bd252863	Fix typo `the the` (#110869 ) This PR fixes typo `the the` of comments and exception message in files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110869 Approved by: https://github.com/soulitzer	2023-10-09 19:32:45 +00:00
Joel Schlosser	3693777a86	Pickle support for NT (#110219 ) Fixes #104198 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110219 Approved by: https://github.com/cpuhrsch	2023-09-29 15:30:06 +00:00
Andrei Gheorghe	00908475e6	Use global variables to register the return_types namedtuples (#108832 ) Fixes #69221. Builds on top of #107000, fixing the buck build issue linked [here](https://github.com/pytorch/pytorch/pull/107000#issuecomment-1708857375). Pull Request resolved: https://github.com/pytorch/pytorch/pull/108832 Approved by: https://github.com/zou3519	2023-09-13 17:42:46 +00:00
PyTorch MergeBot	27d5dcf589	Revert "Use global variables to register the return_types namedtuples (#107000 )" This reverts commit ae8eb7a3f9aee106affca3b27c1f4031bd216730. Reverted https://github.com/pytorch/pytorch/pull/107000 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing internal build ([comment](https://github.com/pytorch/pytorch/pull/107000#issuecomment-1708862325))	2023-09-06 18:13:23 +00:00
Andrei Gheorghe	ae8eb7a3f9	Use global variables to register the return_types namedtuples (#107000 ) Fixes #69221 @pytorchbot label "topic: not user facing" Pull Request resolved: https://github.com/pytorch/pytorch/pull/107000 Approved by: https://github.com/zou3519	2023-09-05 20:00:29 +00:00
Justin Chu	14d87bb5ff	[BE] Enable ruff's UP rules and autoformat tools and scripts (#105428 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105428 Approved by: https://github.com/albanD, https://github.com/soulitzer, https://github.com/malfet	2023-07-19 01:24:44 +00:00
SherlockNoMad	d997969b8b	[Reland] Add sym_size/stride/numel/storage_offset to native_function.yaml (#103107 ) Differential Revision: D46459100 Pull Request resolved: https://github.com/pytorch/pytorch/pull/103107 Approved by: https://github.com/angelayi, https://github.com/soulitzer	2023-06-12 19:18:49 +00:00
Nikita Shulga	20cf42de2c	Revert "[Reland] Add sym_size/stride/numel/storage_offset to native_function.… (#100749 )" This reverts commit bb454891ed5ce97f580ae52e20f8e9ff2d0f3bf5.	2023-05-16 18:17:02 -07:00
Andrew Gallagher	799ef7e501	[caffe2/tools/autograd] Fix non-determinism in code gen (#101425 ) Fix several cases of leaking set-iteration-order to generated sources, causing non-determinism in generated code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101425 Approved by: https://github.com/albanD	2023-05-16 00:54:03 +00:00
PyTorch MergeBot	2341bd69e9	Revert "[caffe2/tools/autograd] Fix non-determinism in code gen (#101287 )" This reverts commit 52f526cfc0092978ebe6d7be8ae2e71a6d989254. Reverted https://github.com/pytorch/pytorch/pull/101287 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/101287#issuecomment-1548273201))	2023-05-15 17:39:14 +00:00
Sherlock Huang	bb454891ed	[Reland] Add sym_size/stride/numel/storage_offset to native_function.… (#100749 ) …yaml (#91… (#91919) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/91919 Approved by: https://github.com/ezyang Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/92402 Reviewed By: ezyang Differential Revision: D42565586 Pulled By: SherlockNoMad fbshipit-source-id: 1c2986e45307e076d239836a1b45441a9fa3c9d9 ghstack-source-id: 969f4928486e04c57aaf98e20e3c3ca946c51613 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/100749 Approved by: https://github.com/zhxchen17, https://github.com/albanD	2023-05-12 22:57:42 +00:00
Andrew Gallagher	52f526cfc0	[caffe2/tools/autograd] Fix non-determinism in code gen (#101287 ) Fix several cases of leaking set-iteration-order to generated sources, causing non-determinism in generated code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101287 Approved by: https://github.com/albanD	2023-05-12 20:23:50 +00:00
Richard Zou	4135295a76	Excise yaml dependency in torchgen.model (#100203 ) The problem: - The new CustomOp API depends on torchgen.model - torchgen.model imports `yaml` - `yaml` is not a PyTorch runtime dependency To unblock myself, because I'm not sure how long it'll take to convince people yaml should be a PyTorch runtime dependency (unless one of you wants to approve #100166), this PR removes the yaml dependency from torchgen.model. It does so by splitting torchgen.utils (the offender) into torchgen.utils (no yaml) and torchgen.yaml (which uses yaml). Test Plan: - CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/100203 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2023-04-28 13:45:39 +00:00
mikey dagitses	cc628293bf	simplify method_def generation (#100059 ) simplify method_def generation Summary: This removes some duplication. This was originally done to streamline a subsequent change, but that change turned out to be misguided. Nevertheless, this is a nice simplification. Test Plan: This should change the code gen by removing some redundant parentheses. Rely on CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100059 Approved by: https://github.com/ezyang	2023-04-26 18:46:57 +00:00
BJ Hargrave	555ab310dc	Add itemsize and nbytes properties to Tensor (#98322 ) Adds properties for itemsize and nbytes to Tensor matching the properties in NumPy. Fixes https://github.com/pytorch/pytorch/issues/12728 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98322 Approved by: https://github.com/ezyang	2023-04-05 12:11:55 +00:00
PyTorch MergeBot	f4f1a5b5b3	Revert "Move functional collectives to the right namespace (#97793 )" This reverts commit 184bfbc3d7b37e8f202f4938f6ea9ba557c93b1e. Reverted https://github.com/pytorch/pytorch/pull/97793 on behalf of https://github.com/atalman due to breaks internal builds	2023-03-31 16:02:07 +00:00
Rodrigo Kumpera	184bfbc3d7	Move functional collectives to the right namespace (#97793 ) This moves them from `torch._C._nn` to `torch._C._dist` Pull Request resolved: https://github.com/pytorch/pytorch/pull/97793 Approved by: https://github.com/albanD	2023-03-30 22:18:13 +00:00
Aaron Gokaslan	47dca20d80	[BE] Enable flake8-comprehension rule C417 (#97880 ) Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880 Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD	2023-03-30 14:34:24 +00:00
Joel Schlosser	77e73b9b7a	Refactor NT offsets metadata to be a Tensor (#96909 ) It's tedious work, but somebody's gotta do it. Benefits: * Enable access to offsets metadata from Python via private API (for validation, etc.) * Consistency with nested sizes / strides metadata * Needed for SymInt-ifying offsets metadata * more TBD Bonus: * Remove `_tensor` suffixes from metadata / getter names Pull Request resolved: https://github.com/pytorch/pytorch/pull/96909 Approved by: https://github.com/drisspg	2023-03-21 18:51:35 +00:00
Mikayla Gawarecki	c7c7238976	Fix bug in unsqueeze_nested stride calculation (#88688 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88688 Approved by: https://github.com/cpuhrsch	2023-02-10 17:00:04 +00:00
PyTorch MergeBot	f7bd5d0ccb	Revert "[Reland] Add sym_size/stride/numel/storage_offset to native_function.yaml (#91… (#92402 )" This reverts commit 965f4ea3bac8186b99119e73b9ff00e390a5d28b. Reverted https://github.com/pytorch/pytorch/pull/92402 on behalf of https://github.com/zhxchen17 due to Caused a regression for an export model.	2023-02-03 03:12:43 +00:00
Sherlock Huang	965f4ea3ba	[Reland] Add sym_size/stride/numel/storage_offset to native_function.yaml (#91… (#92402 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91919 Approved by: https://github.com/ezyang Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/92402 Approved by: https://github.com/ezyang	2023-02-01 04:47:49 +00:00
Ivan Yashchuk	fba13d94a1	Remove deprecated torch.symeig (#70988 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`. - [x] XLA PR: https://github.com/pytorch/xla/pull/4498 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988 Approved by: https://github.com/lezcano, https://github.com/kit1980, https://github.com/malfet	2023-01-31 11:59:11 +00:00
Jacob Szwejbka	2e9107ec1e	[Pytorch][Executorch] Handwritten view copy out ops should resize out (#91194 ) Summary: Handwritten out ops should have feature parity with the codegend ones. This means they should resize out to the appropriate size. Q. Why are these handwritten instead of codegend anyway? Q2. Wheres a good spot to put the resize and copy helpers since they are reused in the codegend out kernels Test Plan: ci. Differential Revision: D42177051 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91194 Approved by: https://github.com/ezyang	2023-01-30 23:07:14 +00:00
PyTorch MergeBot	acdd462b1a	Revert "Remove deprecated torch.symeig (#70988 )" This reverts commit d70ed68162521341060b06985620cdbef04a8fa9. Reverted https://github.com/pytorch/pytorch/pull/70988 on behalf of https://github.com/kit1980 due to Failing XLA tests, forward fix unsuccessful	2023-01-24 19:03:40 +00:00

1 2 3 4 5 ...

362 Commits