pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Xuehai Pan	4226ed1585	[BE] Format uncategorized Python files with `ruff format` (#132576 ) Remove patterns ``, `test/`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: #132574	2024-08-04 17:13:31 +00:00
Oguz Ulgen	221350e3a4	Add None return type to init -- tests (#132352 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352 Approved by: https://github.com/ezyang ghstack dependencies: #132335, #132351	2024-08-01 15:44:51 +00:00
rzou	e393c7fa05	Tighten torch.library.infer_schema input types (#130705 ) Made the following changes: - mutates_args is now keyword-only and mandatory. This is to align with torch.library.custom_op (which makes it mandatory because it's easy to miss) - op_name is now keyword-only. This helps the readability of the API - updated all usages of infer_schema This change is not BC-breaking because we introduced torch.library.infer_schema a couple of days ago. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130705 Approved by: https://github.com/yushangdi ghstack dependencies: #131777	2024-07-29 16:01:19 +00:00
Shangdi Yu	68c725a094	[custom ops] Add register_vmap for custom ops (#130589 ) Fixes #130284 Fixes #130653 - Add `torch.library.register_vmap` to custom ops - Add `register_vmap` for operators in ops in custom_op_db. - Make `torch.autograd.Function` support kwarg-only kwargs for vmap - test operators in op_db with `tests/test_vmap`. - change `test_vmap` to allow custom `out_dim` and allow "None" in `out_dim` when testing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130589 Approved by: https://github.com/zou3519	2024-07-23 17:48:38 +00:00
PyTorch MergeBot	b435d84261	Revert "[custom ops] Add register_vmap for custom ops (#130589 )" This reverts commit 074b42064195c45471912f851e94c753992a9a1f. Reverted https://github.com/pytorch/pytorch/pull/130589 on behalf of https://github.com/atalman due to Please fix lint and reland ([comment](https://github.com/pytorch/pytorch/pull/130589#issuecomment-2244092174))	2024-07-23 01:44:44 +00:00
Shangdi Yu	074b420641	[custom ops] Add register_vmap for custom ops (#130589 ) Fixes #130284 Fixes #130653 - Add `torch.library.register_vmap` to custom ops - Add `register_vmap` for operators in ops in custom_op_db. - Make `torch.autograd.Function` support kwarg-only kwargs for vmap - test operators in op_db with `tests/test_vmap`. - change `test_vmap` to allow custom `out_dim` and allow "None" in `out_dim` when testing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130589 Approved by: https://github.com/zou3519	2024-07-23 00:54:52 +00:00
Xuehai Pan	ba48cf6535	[BE][Easy][6/19] enforce style for empty lines in import segments in `test/` (#129757 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757 Approved by: https://github.com/ezyang	2024-07-17 06:42:37 +00:00
PyTorch MergeBot	68a4f2a3df	Revert "Tighten torch.library.infer_schema input types (#130705 )" This reverts commit ca2d424c6e5358f9fee8dc9ee7477de76b50f848. Reverted https://github.com/pytorch/pytorch/pull/130705 on behalf of https://github.com/atalman due to Failing internal CI ([comment](https://github.com/pytorch/pytorch/pull/130705#issuecomment-2230821876))	2024-07-16 12:57:11 +00:00
rzou	ca2d424c6e	Tighten torch.library.infer_schema input types (#130705 ) Made the following changes: - mutates_args is now keyword-only and mandatory. This is to align with torch.library.custom_op (which makes it mandatory because it's easy to miss) - op_name is now keyword-only. This helps the readability of the API - updated all usages of infer_schema This change is not BC-breaking because we introduced torch.library.infer_schema a couple of days ago. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130705 Approved by: https://github.com/yushangdi	2024-07-15 16:43:57 +00:00
rzou	9c69684af8	[custom_ops] expose torch.library.register_torch_dispatch (#130261 ) This is the API for defining the interaction between a torch_dispatch class and a custom op. Taking API bikeshedding. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261 Approved by: https://github.com/albanD ghstack dependencies: #130064	2024-07-12 14:13:01 +00:00
rzou	ba941769b5	Add API for open registration between operators and subclasses (and modes) (#130064 ) We add torch.library.Library._register_torch_dispatch_rule. Here, a user can provide us a specific rule to run for a specific (torch_dispatch_class, operator) pair. The motivation is that a user might want to extend a subclass/mode but may not have access to the source code of the subclass/mode. I'll make this public in a follow-up PR if we think the approach and API is good. Keep in mind that many subclasses will likely deliver their own open registration solution (DTensor has register_sharding_prop_rule and NJT has register_jagged_op); _register_torch_dispatch_rule is meant as a catch-all open registration mechanism for when the subclass hasn't provided anything more specific. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130064 Approved by: https://github.com/albanD	2024-07-12 14:13:01 +00:00
Shangdi Yu	a4576dad34	[reland][custom ops] infer schema (#130079 ) Fixes #129617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079 Approved by: https://github.com/zou3519	2024-07-11 03:39:07 +00:00
PyTorch MergeBot	ce499eee0c	Revert "Add API for open registration between operators and subclasses (and modes) (#130064 )" This reverts commit c23d103afae65588772cb30037ea4110f01f6f41. Reverted https://github.com/pytorch/pytorch/pull/130064 on behalf of https://github.com/izaitsevfb due to fails internal builds, see [D59553526](https://www.internalfb.com/diff/D59553526) ([comment](https://github.com/pytorch/pytorch/pull/130064#issuecomment-2221587575))	2024-07-10 21:50:32 +00:00
PyTorch MergeBot	86bca69c5f	Revert "[custom_ops] expose torch.library.register_torch_dispatch (#130261 )" This reverts commit bb9a73f767526e0d23c60360db5212b6bed0e8bc. Reverted https://github.com/pytorch/pytorch/pull/130261 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130261#issuecomment-2221569707))	2024-07-10 21:43:28 +00:00
PyTorch MergeBot	e14a0f45ed	Revert "[reland][custom ops] infer schema (#130079 )" This reverts commit bef085bdfa62cc14589c70279de17108b2c2089f. Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2221561483))	2024-07-10 21:40:16 +00:00
Shangdi Yu	bef085bdfa	[reland][custom ops] infer schema (#130079 ) Fixes #129617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079 Approved by: https://github.com/zou3519	2024-07-10 16:18:36 +00:00
rzou	bb9a73f767	[custom_ops] expose torch.library.register_torch_dispatch (#130261 ) This is the API for defining the interaction between a torch_dispatch class and a custom op. Taking API bikeshedding. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261 Approved by: https://github.com/albanD ghstack dependencies: #130064	2024-07-09 21:11:27 +00:00
rzou	c23d103afa	Add API for open registration between operators and subclasses (and modes) (#130064 ) We add torch.library.Library._register_torch_dispatch_rule. Here, a user can provide us a specific rule to run for a specific (torch_dispatch_class, operator) pair. The motivation is that a user might want to extend a subclass/mode but may not have access to the source code of the subclass/mode. I'll make this public in a follow-up PR if we think the approach and API is good. Keep in mind that many subclasses will likely deliver their own open registration solution (DTensor has register_sharding_prop_rule and NJT has register_jagged_op); _register_torch_dispatch_rule is meant as a catch-all open registration mechanism for when the subclass hasn't provided anything more specific. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130064 Approved by: https://github.com/albanD	2024-07-09 21:11:27 +00:00
Shangdi Yu	cab90b0049	[custom ops] disable kernel temporarily (#130190 ) Fixes #128621 Sometimes we want to disable the backend implementation for testing/benchmarking purposes. For example: ```python @custom_op("mylib::f", mutates_args=()) def f(x: Tensor) -> Tensor: return torch.zeros(1) print(f(torch.randn(1))) # tensor([0.]) @f.register_kernel("cpu") def _(x): return torch.ones(1) print(f(torch.randn(1))). # tensor([1.]) with f.set_kernel_enabled("cpu", enabled = False): print(f(0)) # tensor([0.]) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/130190 Approved by: https://github.com/williamwen42, https://github.com/zou3519	2024-07-09 16:13:50 +00:00
PyTorch MergeBot	d44c30e2f9	Revert "Add API for open registration between operators and subclasses (and modes) (#130064 )" This reverts commit 922d2737d5e0ad22ee1dcf91c48ab09d641de840. Reverted https://github.com/pytorch/pytorch/pull/130064 on behalf of https://github.com/huydhn due to Sorry for reverting your change but test_profiler_tree is failing in trunk after this lands `922d2737d5`, maybe a landrace ([comment](https://github.com/pytorch/pytorch/pull/130064#issuecomment-2216135497))	2024-07-09 01:48:38 +00:00
rzou	922d2737d5	Add API for open registration between operators and subclasses (and modes) (#130064 ) We add torch.library.Library._register_torch_dispatch_rule. Here, a user can provide us a specific rule to run for a specific (torch_dispatch_class, operator) pair. The motivation is that a user might want to extend a subclass/mode but may not have access to the source code of the subclass/mode. I'll make this public in a follow-up PR if we think the approach and API is good. Keep in mind that many subclasses will likely deliver their own open registration solution (DTensor has register_sharding_prop_rule and NJT has register_jagged_op); _register_torch_dispatch_rule is meant as a catch-all open registration mechanism for when the subclass hasn't provided anything more specific. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130064 Approved by: https://github.com/albanD	2024-07-08 22:13:05 +00:00
PyTorch MergeBot	44a773c121	Revert "[custom ops] infer schema (#130079 )" This reverts commit 3fe324ffb612c8712f6af7639c1e7bcec5f3b4fd. Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/huydhn due to The test_public_bindings failure looks legit `3fe324ffb6` ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2215420957))	2024-07-08 22:02:29 +00:00
Shangdi Yu	3fe324ffb6	[custom ops] infer schema (#130079 ) Fixes #129617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079 Approved by: https://github.com/zou3519	2024-07-08 20:46:23 +00:00
Shangdi Yu	2fe7c1fe04	[custom ops] Support factory function (#129978 ) Fixes #129389 If a user registers a device-specific implementation for an operator that accepts no Tensors, then we require the operator to have a "device: torch.device argument" We switch on the device argument to select the correct backend to dispatch to. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129978 Approved by: https://github.com/zou3519	2024-07-04 00:10:52 +00:00
rzou	872d972e41	[custom_op] better error message on no returns (#129896 ) I run into this a lot. I can imagine that it would look opaque to users, so made it more friendly Old error message: "ValueError: infer_schema(func): Return has unsupported type <class 'inspect._empty'>." Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/129896 Approved by: https://github.com/yushangdi	2024-07-02 23:34:23 +00:00
Shangdi Yu	aa0352ca38	[custom ops] add default value support for device types (#129792 ) Fixes #129371 I think the first case in Issue #129371 is already supported in the current code? Since it takes care of string default values. This PR adds support for device type default values. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129792 Approved by: https://github.com/zou3519	2024-07-02 23:31:29 +00:00
Shangdi Yu	9fb2dec7a6	[custom ops] Add unknown arg (#129614 ) Fixes #129372 Add a mutated_args="unknown" that pessimistically assumes that all inputs to the operator are being mutates. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129614 Approved by: https://github.com/zou3519	2024-07-02 16:10:14 +00:00
Shangdi Yu	deaab33f3f	[custom op] add error message (#129417 ) Fixes [#129370](https://github.com/pytorch/pytorch/issues/129370) Suggest correct a List type annotation when input is in Tuple type. To avoid confusion, we only suggest a type if the type is supported. Example: Tuple[int, int] -> List[int] Tuple[Tensor, Tensor, Optional[Tensor]] -> List[Optional[Tensor]] Tuple[int, ...] -> List[int] ValueError: infer_schema(func): Parameter y has unsupported type typing.Tuple[torch.Tensor, torch.Tensor, typing.Optional[torch.Tensor]]. Tuple type annotation is not supported. Please try to use a List instead. For example, typing.List[typing.Optional[torch.Tensor]]. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129417 Approved by: https://github.com/zou3519	2024-06-28 01:03:14 +00:00
rzou	856541c701	[custom_op] support default dtype values (#129189 ) This PR: - moves some of the dtype-string utilities into ScalarType.{h, cpp} - adds a new utility to get a mapping from dtype name to the C++ dtype - the perser now checks if the string is a dtype name; if it is then it pulls the c++ dtype from the mapping. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/129189 Approved by: https://github.com/albanD ghstack dependencies: #129177, #129178, #129179	2024-06-23 00:13:23 +00:00
rzou	5d8e23b49c	[custom_op] Support string default values in schema (#129179 ) Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/129179 Approved by: https://github.com/albanD ghstack dependencies: #129177, #129178	2024-06-21 13:31:40 +00:00
rzou	9972e5f447	Rename impl_abstract to register_fake, part 2/2 (#123938 ) This PR renames the implementation details of register_fake to align more with the new name. It is in its own PR because this is risky (torch.package sometimes depends on private library functions and implementation details). Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/123938 Approved by: https://github.com/williamwen42	2024-06-14 14:37:24 +00:00
rzou	61421c42c0	[custom_op] don't invoke autograd.Function when unnecessary (#127976 ) This matches our autograd logic for pytorch native operators. There's no need to invoke an autograd.Function if we're under a torch.no_grad() or if none of the inputs have requires_grad=True (invoking an autograd.Function results in (noticeable) overhead). Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/127976 Approved by: https://github.com/williamwen42	2024-06-13 23:38:23 +00:00
rzou	7cc07a3eb1	[custom_op] stop using nonlocals to store information (#128547 ) Fixes https://github.com/pytorch/pytorch/issues/128544 Fixes https://github.com/pytorch/pytorch/issues/128535 We had a problem with multithreading where the nonlocals were being clobbered. In the first place, we stored these nonlocals because we wanted to ferry information from an autograd.Function.apply to autograd.Function.forward. Our new approach is: - pass the information directly as an input to the autograd.Function.apply. This means that the autograd.Function.forward will receive the information too. - this messes up ctx.needs_input_grad, which has an element per input to forward. The user should not see the additional information we passed. We fix this by temporarily overriding ctx.needs_input_grad to the right thing. - this exposed a bug in that ctx.needs_input_grad wasn't correct for TensorList inputs. This PR fixes that too. Test Plan: - existing and new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/128547 Approved by: https://github.com/williamwen42, https://github.com/soulitzer	2024-06-13 13:36:39 +00:00
rzou	6412c6060c	[reland] Refresh OpOverloadPacket if a new OpOverload gets added (#128000 ) If a user accesses an OpOverloadPacket, then creates a new OpOverload, then uses the OpOverloadPacket, the new OpOverload never gets hit. This is because OpOverloadPacket caches OpOverloads when it is constructed. This PR fixes the problem by "refreshing" the OpOverloadPacket if a new OpOverload gets constructed and the OpOverloadPacket exists. Test Plan: - new tests This is the third land attempt. The first one was reverted for breaking internal tests, the second was reverted for being erroneously suspected of causing a perf regression. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128000 Approved by: https://github.com/albanD	2024-06-05 17:57:09 +00:00
Sam Larsen	82a370ae3a	Revert "Refresh OpOverloadPacket if a new OpOverload gets added (#126863 )" (#127366 ) This reverts commit ed734178abc99bc1d83ad2c61d3a1e4d4f5d20c8. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127366 Approved by: https://github.com/zou3519	2024-05-29 19:26:06 +00:00
rzou	28de9143a3	opcheck should be usable without optional dependencies (#127292 ) This PR excises opcheck's dependency on torch.testing._internal.common_utils, (which comes with dependencies on expecttest and hypothesis). We do this by moving what we need to torch.testing._utils and adding a test for it. Fixes #126870, #126871 Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/127292 Approved by: https://github.com/williamwen42 ghstack dependencies: #127291	2024-05-29 17:17:49 +00:00
William Wen	5359af0c7e	[dynamo] wrap GraphModule exceptions in dynamo-wrapped tests (#126341 ) Better approach to https://github.com/pytorch/pytorch/pull/126197 to catch issues like https://github.com/pytorch/pytorch/issues/125568. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126341 Approved by: https://github.com/anijain2305, https://github.com/jansel	2024-05-29 05:18:04 +00:00
Xuehai Pan	26f4f10ac8	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980	2024-05-27 14:49:57 +00:00
PyTorch MergeBot	55c0ab2887	Revert "[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 )" This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f. Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))	2024-05-27 09:22:08 +00:00
Xuehai Pan	7763c83af6	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980 ghstack dependencies: #127122, #127123, #127124, #127125	2024-05-27 04:22:18 +00:00
Xuehai Pan	a28bfb5ed5	[4/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort functorch (#127125 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127125 Approved by: https://github.com/Skylion007 ghstack dependencies: #127122, #127123, #127124	2024-05-25 22:45:38 +00:00
Richard Zou	f8857cef45	[Reland] Verify types in custom op schemas (#126861 ) Summary: co-dev reland of https://github.com/pytorch/pytorch/pull/124520, which requires the removal of some executorch tests. Before this PR, we didn't check that types in a schema were valid. This is because TorchScript treats unknown types as type variables. This PR checks types in a schema for the TORCH_LIBRARY APIs. To do this, we add an `allow_typevars` flag to parseSchema so that TorchScript can use allow_typevars=True. We also add some error messages for common mistakes (e.g. using int64_t or double in schema). Test Plan: Wait for tests Differential Revision: D57666659 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126861 Approved by: https://github.com/albanD	2024-05-23 19:53:52 +00:00
rzou	ed734178ab	Refresh OpOverloadPacket if a new OpOverload gets added (#126863 ) If a user accesses an OpOverloadPacket, then creates a new OpOverload, then uses the OpOverloadPacket, the new OpOverload never gets hit. This is because OpOverloadPacket caches OpOverloads when it is constructed. This PR fixes the problem by "refreshing" the OpOverloadPacket if a new OpOverload gets constructed and the OpOverloadPacket exists. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/126863 Approved by: https://github.com/albanD	2024-05-22 14:13:27 +00:00
Brian Hirsh	f25c7c9699	functionalize storage resizing, minimal ppFSDP traceable forward (#122434 ) More details further down, but first a more high-level description of "how do we functionalize storage resizing" Today, dynamo converts `param.untyped_storage().resize_(x)` calls that it sees from fsdp into a custom op, `ops.inductor.resize_storage_bytes_(x)` So given this setup, there are 3 main cases that I think we want to handle: (1) graph input starts with a real storage size, gets resized down to zero in the graph (2) graph input starts with 0 storage size, gets resized up in the graph (3) graph input starts with 0 storage size, gets resized up and used in some compute, then resized back down to 0 For case (1) we need to emit a `resize_storage_bytes_` at the end of the graph, similar to how we emit `copy_()` for data mutations. For case (2), we need to emit a `resize_storage_bytes_` in the graph, and we also need to emit a `copy_()` (the input had its storage resized up, and filled in with data, which is we need to reflect as an input mutation) For case (3), the net effect is that the input had no data on entry and exit of the function, so we don't need to emit any mutable ops in the end of the graph. The main thing to call out is that: we need to write a functionalization rule for `resize_storage_byte_`, (`FunctionalTensorWrapper::storage_resize_()`) and this rule actually does very little. We would like to not emit any new ops in the graph (like say, a functional resize op). Instead, we should expect / rely on the fact that any resize up will be immediately followed by a `copy_()`/`foreach_copy_`/`out=` op, that will fill in the data of the tensor. So `FunctionalTensor` can temporarily live in a state where its data is invalid, until the `x.copy_(y)` "updates" its data with the new tensor. So effectively, all that this rule does is: (1) it stores metadata on the storage, indicating that the tensor was resized, as well as the updated storage size. We need this info in AOTAutograd, so it knows whether to emit a mutable resize_() op in the graph epilogue (2) There is also a corner case: if we are resizing down to zero, but our tensor had previously had a zero size storage, then we update `value_` to point to the original value of the tensor. The reason this seems safe is because if we have a zero storage sized tensor `x`, and we resize it up, use it in some compute, resize it back down to zero, and use it somewhere, we would want the functional version of this code to use the original `x` after the second resize. For FSDP, this is important because we end up saving parameters (graph inputs) for backward, and we want to make sure that the thing we save (and the output to the forward graph) is the original, zero-storage-sized parameter, and not the "version 2" of the parameter after the first resize_() I think a good order to look at changes in this PR would be: (1) `test_aotdispatch.py` shows the 3 main cases I focused on as well as the expected functionalized graphs (2) In `FunctionalStorageImpl.h/cpp`, I had to add a notion of "original base", and "original/curr_size". The first is so I can re-use the zero-size tensor after multiple resizes, and the second is so I can tell in AOTAutograd whether any resizes canceled each other out into a no-op (3) FunctionalTensorWrapper.h/cpp has the new resize functionalizion rule + some extra utils (4) `_functorch/_autograd`: the main changes in this folder were around adding the logic at trace-time to detect when we need to put a resize_() in the graph. I also have some assertions to check that any inputs that experience storage resizing will always be in the graph and not the opaque epilogue, and I also limited the resize_() mutation case so that you can only ever start with zero storage, or end with zero storage (you can't do e.g. `torch.ones(2).storage().resize_(3)`), and banned it on tensor subclasses (5) `fake_tensor.py`/`meta_utils.py`: we now need to be able to fakeify tensors with zero storage, so I added a quick version of it in meta_utils.py. This also.. has ramifications for fake tensor caching that I need to fix (include the storage size on the cache key, maybe?) ------------------ This PR subsumes https://github.com/pytorch/pytorch/pull/120971. This PR is enough to almost get a simple ppFSDP forward pass tracing with a functionalized resize_() properly. It also attempts to do the updated version from @jansel, where we don't have any notion of `resize_()` in the graph at all, post functionalization. It would probably be good to test it with @yf225 's FSDP changes, and see how many of the FX passes it allows us to remove. I think that in theory, it should allow us to remove all FX passes that affect the forward graph / partitioner, except the one that forces views to be recomputed in the backward (more details below). There are a few things worth calling out: (1) failed attempt at functionalizing `aten.copy_()`. I originally wanted to get a version takes these operations: ``` param.storage().resize_(all_gather_size) param.copy_(all_gather_buffer) out = aten.matmul(param, param) ``` and functionalizes them into: ``` out = aten.matmul(all_gather_buffer, all_gather_buffer) ``` This would involve getting functionalization to turn `x.copy_(y)` into a giant no-op that just returns `y`. Unfortunately, we can't actually do this in a reasonable way within functionalization (instead, there's a functional `aten.copy` in the graph - see the test case graph expecttest for details). Why? In order for that transformation to be safe, `x` and `y` need to have the same metadata. However, it's possible for `x` and `y` to be subclasses of different types. This is not something we can easily tell from within functionalization, and would be a layering violation. So for now I'm leaving it to downstream code to optimize away the `aten.copy` (this is already the case today, so I think inductor can handle this) (2) The forward doesn't actually run successfully in this PR (see the `assertRaisesRegex` in the test). Why? The final forward graph looks like this: ``` def forward(self, primals_1, primals_2): _foreach_copy = torch.ops.aten._foreach_copy.default([primals_1], [primals_2]); primals_2 = None getitem = _foreach_copy[0]; _foreach_copy = None mm = torch.ops.aten.mm.default(getitem, getitem); getitem = None t_1 = torch.ops.aten.t.default(primals_1); primals_1 = None return [mm, t_1] ``` Where `primals_1` starts out as a secretly-zero-storage-size parameter, and gets resized up and back down within the forward (these are functionalized away). Importantly, the matmul happy on the result of the `foreach_copy`, but the activation that we save for backward (`t_1`) is the result of transposing the original parameter (the zero-storage-size param). This is exactly the optimization in fsdp that allows us to have good peak memory usage. The problem is that the min-cut partitioner decides to save `t_1` for backward. Running this code in eager breaks, because the kernel for `aten.permute(x)` is not happy when `x` has secretly-zero-sized-storage. The real problem here is that in eager mode the `permute` kernel runs during the backward, after backward hooks have properly resized the saved activation. Here, we are running the transpose in the forward. One option would be to turn off the checks in our view kernels and allow them to work on zero-storage-sized tensors, which feels pretty bad. Another option is to tweak the partitioner (or use one of Will's FX passes) to force the partitioner to not save views for backward, and allow the views to be recomputed in the backward. This seems kind of silly, but is also probably harmless. (3) The backward is still broken. To be fair, this issue is pretty separable from "functionalizing storage resize calls", and can be fixed later (either by a real fix to our tracing infra, or via another hacky FX pass). More description of this problem is described at issue (8) of my PR description in https://github.com/pytorch/pytorch/pull/120971 (4) I only added support for "full graph" resizing: basically, the limited case where a param starts with zero storage size, and gets resized up and back down. I think we can add support for the graph break case, but I think we can keep that add-on separate from this PR unless we need it immediately. I also added asserts so we should fail loudly when we hit this case (5) I have a change to FakeTensor creation when inputs have zero storage size that.. is probably ok. But I also removed FakeTensor caching on view ops, which I probably need to fix before I can land this PR (6) I added a notion of "original_base" to `FunctionalStorageImpl`. More details are in the comments, but my rational for this was that we basically need it to ensure that autograd saves the original, zero-storage-sized param for backward, after resizing up and back down (7) I had to update our eager kernels for `aten.copy` and `aten._foreach_copy`, to handle the case where the `self` argument has secretly-zero-storage. Inductor can probably generate correct code for this case, but we need these ops to work properly in this situation for the `aot_eager` backend to do the right thing Pull Request resolved: https://github.com/pytorch/pytorch/pull/122434 Approved by: https://github.com/jansel	2024-05-10 18:09:10 +00:00
rzou	c6b7504d47	Fix torch.library.register_fake's module reporting (#125037 ) torch.library.register_fake reports the python module the fake impl is located in. This is used to check against `m.set_python_module("foo.bar")` calls in C++. The module reporting logic was wrong in most cases. This PR fixes it. Test Plan: - exhaustive tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/125037 Approved by: https://github.com/williamwen42	2024-04-26 20:53:33 +00:00
PyTorch MergeBot	35a82d4a4a	Revert "Refresh OpOverloadPacket if a new OpOverload gets added (#124654 )" This reverts commit 872eeb0d7deebb58915289756d8c786f68630547. Reverted https://github.com/pytorch/pytorch/pull/124654 on behalf of https://github.com/jeanschmidt due to Broken lots of internal signals, check D56571345 for more details ([comment](https://github.com/pytorch/pytorch/pull/124654#issuecomment-2078940680))	2024-04-26 08:56:03 +00:00
PyTorch MergeBot	a46c27d961	Revert "Verify types in custom op schemas (#124520 )" This reverts commit 141888765bba129914448a9609ad5e182778cbdc. Reverted https://github.com/pytorch/pytorch/pull/124520 on behalf of https://github.com/jeanschmidt due to Breaking internal tests check D56588015 for more details ([comment](https://github.com/pytorch/pytorch/pull/124520#issuecomment-2078917978))	2024-04-26 08:42:11 +00:00
rzou	4e340a7f8b	[custom_op] setup_context fills in default values (#124852 ) This is to mirror autograd.Function's setup_context behavior. The PyTorch Dispatcher removes default values for "FC/BC reasons", but I convinced myself there's no FC/BC problem for the setup_context API. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/124852 Approved by: https://github.com/albanD ghstack dependencies: #124637, #124805, #124806	2024-04-25 04:22:01 +00:00
Edward Z. Yang	9692b954c6	FakeTensorProp works with unbacked bindings (#124310 ) This is a partial revert of https://github.com/pytorch/pytorch/pull/124059 Like in #124297, profiling has revealed that testing equality on every output is kind of expensive. So we only test equality when we know there is an unbacked binding. This is the same playbook as the previous PR, just on FakeTensorProp instead of PropagateUnbackedSymInts. Note that we also need to populate `unbacked_bindings` in proxy_tensor.py, since we're generating an entirely new graph in that case. We now have enough propagation that we're able to trigger a bug related to divisibility replacement. In https://github.com/pytorch/pytorch/pull/113165 we allowed to replace `u0` with `u1 * c` for some constant c, when we have determined that u0 is divisible by c. However, where does the binding for u1 come from? What we will have in practice is that there is some node that is supposed to have bound u1, but which actually is getting a `u1 * c` in its output. So, to get u1, we must divide out c. Fortunately, under the divisibility condition, this is always possible (but remember, we must test divisibility at runtime!) Because we have tightened up asserts, it is now an error to allocate unbacked SymInts and then fail to track them under unbacked_bindings. In torch/_dynamo/eval_frame.py and torch/_functorch/_aot_autograd/collect_metadata_analysis.py there are examples of benign cases where we repropagated fake tensors but then immediately threw away the results. In these cases, it's not appropriate to rebind, since we're still using the old FX graph that has all of the old symbols. So we just manually clear it. It is possible that other cases will need to be updated, so this PR is "risky" from the perspective of hitting fbcode. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124310 Approved by: https://github.com/lezcano	2024-04-25 02:08:51 +00:00
rzou	141888765b	Verify types in custom op schemas (#124520 ) Before this PR, we didn't check that types in a schema were valid. This is because TorchScript treats unknown types as type variables. This PR checks types in a schema for the TORCH_LIBRARY APIs. To do this, we add an `allow_typevars` flag to parseSchema so that TorchScript can use allow_typevars=True. We also add some error messages for common mistakes (e.g. using int64_t or double in schema). Test Plan: - new tests Differential Revision: [D56432690](https://our.internmc.facebook.com/intern/diff/D56432690) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124520 Approved by: https://github.com/albanD	2024-04-25 01:56:58 +00:00

1 2 3 4

196 Commits