pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
James Wu	b54e466fd0	Megacache integration (#163533 ) This diff adds megacache integration for DynamoCache. Because DynamoCache requires lazy serialization, i.e. it can only be serialized once all relevant backends have been compiled and we're ready for a save, we actually do the DynamoCache saving only on a call to `torch.compiler.save_cache_artifacts`. Differential Revision: [D82735763](https://our.internmc.facebook.com/intern/diff/D82735763/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163533 Approved by: https://github.com/oulgen, https://github.com/zhxchen17	2025-10-15 22:49:15 +00:00
Yuanyuan Chen	a43c4c3972	[5/N] Apply ruff UP035 rule (#164423 ) Continued code migration to enable ruff `UP035`. Most changes are about moving `Callable` from `typing` to `from collections.abc`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164423 Approved by: https://github.com/ezyang	2025-10-02 07:31:11 +00:00
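For readers unfamiliar with the rule, the `Callable` migration described in the commit above can be sketched as follows. This is a minimal illustration only; the helper name `apply_twice` is hypothetical and does not come from the PyTorch tree.

```python
# Before (the form ruff UP035 flags as a deprecated import):
#     from typing import Callable
# After: import the same name from collections.abc instead.
from collections.abc import Callable


def apply_twice(fn: Callable[[int], int], value: int) -> int:
    """Hypothetical helper: apply fn to value, then to the result."""
    return fn(fn(value))


# The annotation works the same way at runtime and for type checkers.
result = apply_twice(lambda x: x + 1, 0)
```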
James Wu	eb9073a6b7	[easy] [precompile] Convert CompileArtifacts to callable (#162169 ) The goal of this PR stack is to be able to implement `aot_compile_module`, which AOT precompiles a torch.nn.Module. Step 1 is a simple refactor to make CompileArtifacts itself the callable, which makes it easier to use directly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162169 Approved by: https://github.com/zhxchen17	2025-09-07 23:37:31 +00:00
zhxchen17	c36d18d7e8	[rfc] aot precompile with custom backend api (#161383 ) Adding a new feature to torch.compile(fullgraph=True) which "aot_compile"s a function with given example inputs. On the user side it should look like: ``` def foo(x, y): return x + y compiled_fn = torch.compile(fullgraph=True).aot_compile(((torch.randn(3, 4), torch.randn(3, 4)), {})) ``` This is different from the traditional `torch.compile` workflow, where the compiled object is a drop-in replacement for the original eager model: ``` tensor input -> torch.compile() -> tensor output (and populates the cache entry) ``` `aot_compile` instead returns a compiled function as its result; it is purely functional and doesn't populate the compile cache entry in dynamo: ``` tensor input -> aot_compile() -> compiled function ``` The aot compiled function can also be saved to and loaded from disk: ``` torch.compile(fullgraph=True).aot_compile(...).save_compiled_function('my/path') compiled_fn = torch.compiler.load_compiled_function("my/path") ``` Right now we treat the compiler backend as a black box, and it needs to implement the following interface to make compile artifacts serializable: ``` class SerializableCallable: def save_compile_artifacts(): .... def load_compile_artifacts(): .... ``` We haven't implemented this for inductor yet, but this shouldn't be an issue since we gate this feature through `torch._dynamo.config.aot_compile` (which defaults to False), and this will be left as a follow-up PR to the current PR. Differential Revision: [D80914270](https://our.internmc.facebook.com/intern/diff/D80914270/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161383 Approved by: https://github.com/tugsbayasgalan	2025-08-27 21:26:25 +00:00
Lucas Kabela	b0e325c2c8	[Dynamo][Better Engineering] Add type coverage to decorators (#158509 ) As part of better engineering week, we would like to improve our type support to improve the dev experience in dynamo. This PR adds strict typing support to an important file in dynamo, `decorators.py`. NOTE: Untyped fns are because there is a conflict with `__init__.py` in compiler, so we can't type these at this time. Running ``` mypy torch/_dynamo/decorators.py --linecount-report /tmp/coverage_log ``` | -------- | Lines Unannotated | Lines Total | % lines covered | Funcs Unannotated | Funcs Total | % funcs covered | | -------- | ------- | -------- | ------- | ------- | ------- | ------- | | Main | 209 | 908 | 23.02% | 9 | 39 | 23.08% | | This PR | 870 | 943 | 100.00% | 36 | 39 | 100.00% | | Delta | +661 | +35 | +76.98% | +27 | 0 | +76.92% | Pull Request resolved: https://github.com/pytorch/pytorch/pull/158509 Approved by: https://github.com/williamwen42	2025-07-17 23:31:26 +00:00
Lucas Kabela	a4d753295e	[Dynamo][Better Engineering] Add enhanced typing support to `_dynamo/eval_frame.py` (#158276 ) As part of better engineering week, we would like to improve our type support to improve the dev experience in dynamo. This PR adds strict typing support to the main entrypoint for dynamo, `eval_frame.py`. Running ``` mypy torch/_dynamo/eval_frame.py --linecount-report /tmp/coverage_log ``` | -------- | Lines Unannotated | Lines Total | % lines covered | Funcs Unannotated | Funcs Total | % funcs covered | | -------- | ------- | -------- | ------- | ------- | ------- | ------- | | Main | 623 | 2232 | 27.91% | 19 | 68 | 27.94% | | This PR | 2285 | 2285 | 100.00% | 68 | 68 | 100.00% | | Delta | +1662 | +63 | +72.09% | +49 | 0 | +72.06% | Pull Request resolved: https://github.com/pytorch/pytorch/pull/158276 Approved by: https://github.com/williamwen42 Co-authored-by: William Wen <williamwen@meta.com>	2025-07-16 23:31:10 +00:00
Xuehai Pan	3fd84a8592	[BE][PYFMT] migrate PYFMT for `torch/[a-c]*/` to `ruff format` (#144554 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144554 Approved by: https://github.com/soulitzer	2025-07-03 18:56:07 +00:00
Edward Z. Yang	17eb649d55	Implement guard collectives (optimized version) (#156562 ) This is a remix of https://github.com/pytorch/pytorch/pull/155558 Instead of mediating guard collective via a config option, in this one it's done via a `set_stance` like API. The motivation is that checking for the config value on entry on torch.compile is apparently quite expensive, according to functorch_maml_omniglot. So this makes it a bit cheaper. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/156562 Approved by: https://github.com/Microve	2025-06-24 04:59:49 +00:00
Animesh Jain	fab85fc5f9	[compile][hierarchical compilation] Release nested_compile_region API (#156449 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156449 Approved by: https://github.com/zou3519, https://github.com/jansel	2025-06-21 15:14:59 +00:00
Animesh Jain	54976bca10	[dynamo] Provide helper functions for guard filter hook (#155083 ) Collection of ready-made guard filters. One issue is that they are not composable - `filter1(filter2(guard))`. On the other hand, they are easy to use. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155083 Approved by: https://github.com/zhxchen17, https://github.com/jansel	2025-06-15 17:49:36 +00:00
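The commit above notes that the ready-made guard filters do not compose as `filter1(filter2(guard))`. One possible workaround, sketched below in plain Python, is to run each filter on the same entries and AND the per-guard decisions. This assumes each filter takes a list of guard entries and returns one boolean per entry; `FakeGuardEntry`, `keep_all_of`, and both example filters are hypothetical stand-ins, not part of `torch.compiler`.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeGuardEntry:
    """Stand-in for a guard entry; the fields are illustrative only."""
    name: str
    is_tensor: bool
    is_global: bool


# Assumed filter shape: one keep/drop decision per guard entry.
GuardFilter = Callable[[Sequence[FakeGuardEntry]], list[bool]]


def keep_tensor_guards(entries: Sequence[FakeGuardEntry]) -> list[bool]:
    """Hypothetical filter: keep only guards on tensors."""
    return [entry.is_tensor for entry in entries]


def drop_global_guards(entries: Sequence[FakeGuardEntry]) -> list[bool]:
    """Hypothetical filter: drop guards on global variables."""
    return [not entry.is_global for entry in entries]


def keep_all_of(*filters: GuardFilter) -> GuardFilter:
    """Combine filters: a guard is kept only if every filter keeps it."""
    def combined(entries: Sequence[FakeGuardEntry]) -> list[bool]:
        if not filters:
            # With no filters, keep every guard.
            return [True] * len(entries)
        decisions = [f(entries) for f in filters]
        # zip(*decisions) yields, per guard, the verdict of each filter.
        return [all(verdicts) for verdicts in zip(*decisions)]
    return combined


entries = [
    FakeGuardEntry("x", is_tensor=True, is_global=False),
    FakeGuardEntry("G", is_tensor=True, is_global=True),
    FakeGuardEntry("n", is_tensor=False, is_global=False),
]
combined_filter = keep_all_of(keep_tensor_guards, drop_global_guards)
kept = combined_filter(entries)
```

Combining decisions this way leaves each individual filter unchanged, at the cost of running every filter over the full list of entries.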
James Wu	3819584f12	[precompile] Implement PrecompileContext for recording precompile artifacts, integrate with CompilePackage (#154415 ) This PR implements a basic interface and test for PrecompileContext, a special CacheArtifactManager specifically designed for precompile. The job of a PrecompileContext is to record things precompile needs as torch is compiling, dump it all into bytes, and then stitch it back together into a cache of callables. ## Why use CacheArtifactManager? Precompile needs a way to record various serializable data as torch is compiling. CacheArtifactManager already does this today pretty well, handling a lot of serialization and cache information. So we're reusing a bunch of that infrastructure directly. ## How is it different from CacheArtifactManager? Unlike regular CacheArtifactManager, PrecompileContext needs to be able to take the recorded artifacts and stitch them together after deserialization, to create a single working callable. Since PrecompileContext doesn't need the cache keys, the "key" field of PrecompileArtifacts can be used for metadata relating to how to stitch the individual functions being compiled together into a full callable. For example, on a given dynamo compile, if there are multiple functions (via graph breaks or recompiles) being compiled, MegaCache would organize it like so: ![image](https://github.com/user-attachments/assets/49a0a75b-1e7f-4d96-8d81-6769fe5a53ca) Whereas we'd visualize PrecompileContext's result like so: ![image](https://github.com/user-attachments/assets/fcc0dd4e-dfbf-4b13-9c08-2e99b373180b) For now, we just handle eager mode; in the diff above, I'll hook up the other backend artifacts from PrecompileContext. 
After this PR, precompile consists of three main interfaces: ### CompilePackage - Everything needed to run one torch.compile'd function (including graph breaks) - `__init__(fn, cache_entry)` Initializes with a DynamoCacheEntry - `install(backends)` load precompile artifacts into function's dynamo state with a dictionary of backends - `cache_entry()` return a serializable cache entry to save ### DynamoStore - Responsible for tracking CompilePackages on disk (and/or in memory) - `load_package(path)`: load a package given a torch compiled function and a path to the cache artifact - `save_package(package, path)`: Save a CompilePackage to a path. Calls PrecompileContext to grab backend data - `record_package(package)`: Record a package to PrecompileContext (for global serialization/deserialization) ### PrecompileContext - Overarching context for serializing and deserializing precompile artifacts. Supports global and local setups. - `serialize()`: (Global) serializes all artifacts in PrecompileContext into bytes - `populate_caches(bytes)`: (Global) takes serialized bytes and puts them into DynamoStore (TODO) - `serialize_artifact_by_key(key)`: (Local) serialize a single artifact by its cache key <img width="1455" alt="image" src="https://github.com/user-attachments/assets/99b61330-7607-4763-bdbc-85b366e82cdd" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/154415 Approved by: https://github.com/zhxchen17 ghstack dependencies: #155118	2025-06-13 14:11:24 +00:00
bobrenjc93	984b1a80e3	[ez] add docs for *eager_then_compile stances (#154818 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154818 Approved by: https://github.com/williamwen42 ghstack dependencies: #154802, #154826, #154822, #154823, #154805	2025-06-02 19:04:35 +00:00
William Wen	25eff6e991	[dynamo] add reason field to torch.compiler.disable (#150341 ) Implements https://github.com/pytorch/pytorch/issues/146445 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150341 Approved by: https://github.com/zou3519, https://github.com/jansel	2025-04-02 04:26:48 +00:00
bobrenjc93	2dcdb4ba78	[ez] include config as part of __all__ in torch.compiler (#148978 ) Right now we are susceptible to a race condition: if torch.compiler.config is not implicitly imported via dynamo/builder.py, we throw an error when trying to set compiler configs. This fixes it by including config in `__all__`. Previous ``` >>> import torch >>> torch.compiler.config.dynamic_sources = "L['kwargs']['float_features']" Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: module 'torch.compiler' has no attribute 'config' >>> torch.compiler.config.dynamic_sources = "L['kwargs']['float_features']" Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: module 'torch.compiler' has no attribute 'config' ``` Now ``` >>> import torch >>> torch.compiler.config.dynamic_sources = "L['kwargs']['float_features']" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148978 Approved by: https://github.com/bdhirsh, https://github.com/laithsakka	2025-03-11 21:58:38 +00:00
Aaron Orenstein	db4ce78d46	PEP585: More UP006 fixes (#146392 ) This should be the final PR before we can enable RUFF UP006. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392 Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007	2025-02-20 06:18:13 +00:00
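As a rough illustration of what the UP006 fixes in the commit above amount to, the sketch below rewrites `typing.List` / `typing.Dict` annotations as the builtin generics from PEP 585. The function `count_tokens` is hypothetical and is not taken from the PyTorch sources.

```python
# Before (the form ruff UP006 flags):
#     from typing import Dict, List
#     def count_tokens(tokens: List[str]) -> Dict[str, int]: ...
# After: use the builtin container types directly as generics.


def count_tokens(tokens: list[str]) -> dict[str, int]:
    """Hypothetical helper: count how often each token occurs."""
    counts: dict[str, int] = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts


token_counts = count_tokens(["a", "b", "a"])
```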
Nikita Shulga	95ff9f0340	[Doc] Add period at the end of the sentence (#145384 ) Test plan: https://docs-preview.pytorch.org/pytorch/pytorch/145384/generated/torch.compiler.disable.html#torch-compiler-disable Fixes https://github.com/pytorch/pytorch/issues/145365 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145384 Approved by: https://github.com/huydhn, https://github.com/svekars, https://github.com/kit1980	2025-01-22 19:56:31 +00:00
Aaron Orenstein	805c4b597a	PEP585 update - torch/_higher_order_ops torch/_subclasses torch/backends torch/compiler torch/cuda torch/masked torch/mtia torch/nested (#145202 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145202 Approved by: https://github.com/bobrenjc93	2025-01-20 22:37:26 +00:00
Aaron Orenstein	a79100ab11	PEP585 update - torch/_dynamo (#145105 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145105 Approved by: https://github.com/bobrenjc93	2025-01-18 20:47:11 +00:00
Oguz Ulgen	9ee242213b	[RFC] Introduce cache hot loading APIs (a.k.a. "Mega-cache") (#143341 ) This PR essentially introduces two new APIs * torch.compiler.save_cache_artifacts * torch.compiler.load_cache_artifacts which aim to create a mega cache experience where the user can start collecting cache artifacts, and later call the save API to fetch them. In the next attempt, the user can "hot load" the cache artifacts via the load function. This bundling approach reduces the need to rely on porting individual files one by one, or relying on many network requests. Note that these APIs CANNOT log to structured logging as these functions will be called before and after compilation, as opposed to during compilation. Due to this limitation, the API returns a struct that the user can log with. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143341 Approved by: https://github.com/jansel	2025-01-07 23:13:24 +00:00
yijun-lee	d4609af1ca	Propagate callable parameter types using ParamSpec (#142306 ) (#144047 ) Fixes #142306 This PR includes typing improvements and refactoring for the following files: - __init__.py - decorators.py - _ops.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/144047 Approved by: https://github.com/XuehaiPan, https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>	2025-01-06 16:16:18 +00:00
Yidi Wu	1e201422ed	[export] add is_exporting flag (#142425 ) We added an `is_exporting` flag, exposed as torch.compiler.is_exporting. This comes in handy when we need special-case logic at the user level and at the system level (e.g. higher up the stack). In order of increasing scope: - `_is_fx_tracing` is set to True when we run under symbolic_trace or make_fx. - `is_exporting` is set to True when we're doing strict or non-strict export, which internally has a step that calls make_fx and sets _is_fx_tracing to True. - `is_compiling` is set to True when we're doing strict export, non-strict export, or torch.compile. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142425 Approved by: https://github.com/avikchaudhuri	2024-12-18 21:36:28 +00:00
Animesh Jain	fb529c2c84	[dynamo] skip_guard_eval_unsafe stance for power users (#140251 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140251 Approved by: https://github.com/jansel ghstack dependencies: #140223, #140250	2024-11-21 06:28:58 +00:00
William Wen	73a153b931	[dynamo] add compiler.set_stance raw function call test and doc example (#138276 ) Followup to https://github.com/pytorch/pytorch/pull/137504#issuecomment-2420107198 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138276 Approved by: https://github.com/anijain2305, https://github.com/jansel	2024-10-18 02:54:22 +00:00
William Wen	5b7f4767ff	Fix https://github.com/pytorch/pytorch/issues/138062 (#138137 ) Fixes https://github.com/pytorch/pytorch/issues/138062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138137 Approved by: https://github.com/mlazos	2024-10-17 07:12:15 +00:00
William Wen	4c8718d8e7	[dynamo] add torch.compiler.set_stance (#137504 ) Attempt # 2 at https://github.com/pytorch/pytorch/pull/132926 to implement https://github.com/pytorch/pytorch/issues/123771. Implement a new `torch.compiler.set_stance` function that can force `torch.compile` regions to run eagerly. See added tests for usage examples. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137504 Approved by: https://github.com/yf225, https://github.com/jansel	2024-10-16 16:18:25 +00:00
Jane Xu	eaec72d1e6	Link directly to new Custom Ops Landing Page (#137933 ) e.g., click on first link in https://docs-preview.pytorch.org/pytorch/pytorch/137933/library.html#testing-custom-ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/137933 Approved by: https://github.com/zou3519	2024-10-15 21:18:21 +00:00
Xuehai Pan	e09324e7da	[dynamo] simplify polyfill registration for `builtins.all` and `builtins.any` (#133769 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133769 Approved by: https://github.com/jansel	2024-08-29 20:56:16 +00:00
Xuehai Pan	c95ddd4bf2	[dynamo] ensure polyfill function has the same signature as the original function in `substitute_in_graph` (#133813 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133813 Approved by: https://github.com/jansel	2024-08-22 16:38:06 +00:00
Xuehai Pan	022cd7c9aa	[RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712 ) Add decorator `torch.compiler.substitute_in_graph` to register polyfill for unsupported C++ function to avoid graph break. This API provides an official way to add support for dynamo for third-party C extensions. Also, it can be used to simplify our implementation for `torch._dynamo.polyfill`. `5ee070266f/torch/_dynamo/variables/builtin.py (L97-L107)` Example: ```python >>> import operator >>> operator.indexOf([1, 2, 3, 4, 5], 3) 2 >>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3) Unsupported: ... >>> @torch.compiler.substitute_in_graph(operator.indexOf) ... def indexOf(sequence, x): ... for i, item in enumerate(sequence): ... if item is x or item == x: ... return i ... raise ValueError("sequence.index(x): x not in sequence") >>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3) 2 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133712 Approved by: https://github.com/jansel	2024-08-21 06:36:41 +00:00
PyTorch MergeBot	15b5a0b67f	Revert "[RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712 )" This reverts commit 71dd52f51a05d110c06e83f74cef165f64627842. Reverted https://github.com/pytorch/pytorch/pull/133712 on behalf of https://github.com/ZainRizvi due to breaking main windows cpu tests - this stack still causes that windows test to fail ([comment](https://github.com/pytorch/pytorch/pull/133712#issuecomment-2299776241))	2024-08-20 21:14:45 +00:00
Xuehai Pan	71dd52f51a	[RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712 ) Add decorator `torch.compiler.substitute_in_graph` to register polyfill for unsupported C++ function to avoid graph break. This API provides an official way to add support for dynamo for third-party C extensions. Also, it can be used to simplify our implementation for `torch._dynamo.polyfill`. `5ee070266f/torch/_dynamo/variables/builtin.py (L97-L107)` Example: ```python >>> import operator >>> operator.indexOf([1, 2, 3, 4, 5], 3) 2 >>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3) Unsupported: ... >>> @torch.compiler.substitute_in_graph(operator.indexOf) ... def indexOf(sequence, x): ... for i, item in enumerate(sequence): ... if item is x or item == x: ... return i ... raise ValueError("sequence.index(x): x not in sequence") >>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3) 2 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133712 Approved by: https://github.com/jansel	2024-08-20 19:48:57 +00:00
PyTorch MergeBot	2bd02e0c82	Revert "[RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712 )" This reverts commit 641724ed1daad1e6fc2525cc6858d199e576d5cd. Reverted https://github.com/pytorch/pytorch/pull/133712 on behalf of https://github.com/jeanschmidt due to breaking main windows cpu tests - reverting them all, so we can identify the culprit with more calmness ([comment](https://github.com/pytorch/pytorch/pull/133712#issuecomment-2298528797))	2024-08-20 10:34:41 +00:00
Xuehai Pan	641724ed1d	[RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712 ) Add decorator `torch.compiler.substitute_in_graph` to register polyfill for unsupported C++ function to avoid graph break. This API provides an official way to add support for dynamo for third-party C extensions. Also, it can be used to simplify our implementation for `torch._dynamo.polyfill`. `5ee070266f/torch/_dynamo/variables/builtin.py (L97-L107)` Example: ```python >>> import operator >>> operator.indexOf([1, 2, 3, 4, 5], 3) 2 >>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3) Unsupported: ... >>> @torch.compiler.substitute_in_graph(operator.indexOf) ... def indexOf(sequence, x): ... for i, item in enumerate(sequence): ... if item is x or item == x: ... return i ... raise ValueError("sequence.index(x): x not in sequence") >>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3) 2 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133712 Approved by: https://github.com/jansel	2024-08-19 22:14:33 +00:00
Aaron Orenstein	62bcdc0ac9	Flip default value for mypy disallow_untyped_defs [4/11] (#127841 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127841 Approved by: https://github.com/oulgen	2024-06-08 18:36:48 +00:00
rzou	08653fe355	Beef up the allow_in_graph docs (#127117 ) We make the following changes: - most of the time when someone uses allow_in_graph, they actually wanted to make a custom op. We add a link to the custom ops landing page and explain the differences between allow_in_graph and custom ops. - we warn people against using allow_in_graph footguns and document them. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/127117 Approved by: https://github.com/jansel, https://github.com/albanD	2024-06-02 15:00:46 +00:00
Aaron Orenstein	4e2b4c6ed6	Fix broken docs (#124940 ) These were causing doctest to be unhappy. In particular the doc from #124496 caused the "trunk / win-vs2019-cpu-py3 / test" job on #124771 to fail when pushing. Not sure why it wasn't a problem on the original PR. Testing: `./test/run_doctests.sh`: before: ``` === 4 warnings in 11.21 seconds === ``` after: ``` === in 11.11 seconds === ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/124940 Approved by: https://github.com/zou3519, https://github.com/atalman, https://github.com/huydhn	2024-04-26 19:24:52 +00:00
Oleg Khabinov	4b18ab869f	[torch.export] Support is_compiling() flag for non-strict mode (#119602 ) Summary: In non-strict mode of torch.export() we didn't set those `is_compiling()` to `True` which is needed by some models. Test Plan: Unit tests and manual testing. Differential Revision: D53624452 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119602 Approved by: https://github.com/suo	2024-02-29 05:52:51 +00:00
lezcano	b18d8d4595	Add a wrapper to transform a NumPy function into a PyTorch function (#114610 ) A less general version of this wrapper was used in the keynote on `torch.compile(numpy)`. We expose a generic version of the wrapper that works seamlessly with `torch.compile`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/114610 Approved by: https://github.com/albanD	2024-01-02 18:35:29 +00:00
Carlos Mocholí	c847fd2ac8	Fix `torch.compiler.cudagraph_mark_step_begin` example (#112807 ) Per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/112807 Approved by: https://github.com/eellison	2023-11-07 04:15:31 +00:00
eellison	7fe51e3e9b	Add cudagraph_mark_step_begin in torch.compiler, reference in error message (#111722 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111722 Approved by: https://github.com/ezyang, https://github.com/msaroufim	2023-10-25 21:53:21 +00:00
Mark Saroufim	ea384cd377	torch.compiler public namespace (#102182 ) # torch.compiler public API ## Goal The goal of this document is to describe the public facing API for torchdynamo and torchinductor. Today both dynamo and torchinductor are in `torch/_dynamo` and `torch/_inductor` namespace with the only public function `torch.compile()` which is directly placed in `torch/__init__.py` This poses a few problems for users trying to take dependencies on PyTorch 2.0 1. Unclear BC guarantees 2. No builtin discovery mechanism outside of reading the source code 3. No hard requirements for docstrings or type annotations Most importantly it mixes two personas the PyTorch 2.0 developer vs the PyTorch 2.0 customer so this is an attempt to address this. We draw a lot of inspiration from the `functorch` migration to the `func` namespace. ## Alternate names We did discuss some other alternative names 1. `torch.compile` -> problem is this would break BC on the existing `torch.compile` function 2. `torch.dynamo` -> `dynamo` is so far not something we've deliberately hidden from users but problem is now figuring out what it's `_dynamo` vs `dynamo` might be confusing 3. 
`torch.compiler` -> 1 would be better but to keep BC this is a good compromise # The general approach ## Proposal 1 In https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py We have function called `reset()`, this function is essential if users are trying to `torch.compile()` a model under different settings ```python # in _dynamo/ def reset(): do_reset_stuff() ``` Instead we propose ```python # in compiler/ def reset(): do_reset_stuff() # As in copy paste the logic from _dynamo.reset # in _dynamo/ import warnings import inspect def reset(): function_name = inspect.currentframe().f_code.co_name warnings.warn(f"{function_name} is deprecated, use compiler.{function_name} instead", DeprecationWarning) return compiler.reset() ``` ## Proposal 2 ```python # in compiler/ def reset(): “”” Docstrings here “”” _dynamo.reset() # in _dynamo/ No changes ``` Consensus so far seems to be proposal 2 since fewer warnings will be less jarring and it’ll make it quite easy to merge the public API ## Docstrings The above was an example of a function that has no inputs or outputs but there are other functions which could use an improvement in their docstrings, for example allow_in_graph actually works over lists of functions but that’s not mentioned anywhere in the example only if you read the source code. def allow_in_graph(fn): """ Customize which functions TorchDynamo will include in the generated graph. Similar to `torch.fx.wrap()`. Parameters: fn (callable or list/tuple): The function(s) to be allowed in the graph. Returns: callable or list/tuple: The input function(s) included in the graph. Examples: Customize inclusion of a single function: :: torch._dynamo.allow_in_graph(my_custom_function) Customize inclusion of multiple functions: :: torch._dynamo.allow_in_graph([my_custom_function1, my_custom_function2]) @torch._dynamo.optimize(...) def fn(a): x = torch.add(x, 1) x = my_custom_function(x) x = torch.add(x, 1) return x fn(...) 
Notes: The `allow_in_graph` function allows customization of which functions TorchDynamo includes in the generated graph. It can be used to include specific functions that are not automatically captured by TorchDynamo. If `fn` is a list or tuple, `allow_in_graph` will be called recursively on each element in the sequence. Once a function is allowed in the graph using `allow_in_graph`, it will be captured in the graph generated by TorchDynamo. This customization enables more fine-grained control over the functions included in the graph. Note that `allow_in_graph` expects the input `fn` to be a callable. """ if isinstance(fn, (list, tuple)): return [allow_in_graph(x) for x in fn] assert callable(fn), "allow_in_graph expects a callable" allowed_functions._allowed_function_ids.add(id(fn)) allowed_functions._disallowed_function_ids.remove(id(fn)) return fn So to make the API public, we’d have to write similar docstrings for all public functions we’d like to create. The benefit of this approach is that 1. No BC risks, internal and external users relying on our tooling can slowly wean off the private functions. 2. We will also have to write correct docstrings which will automatically make our documentation easier to maintain and render correctly on pytorch.org 3. We already have some BC guarantees already, we don’t kill OptimizedModule, we rejected the PR to change the config system The con of this approach is that Will be stuck with some potentially suboptimal functions/classes that you can’t kill ## Testing strategy If the approach is to mostly make a public function call an already tested private function then all we need to do is ensure that the function signatures don't change ## Which functions should be in the public API Our heuristic for deciding whether something should be public or not is are users already relying on it for lack of other options or have we recommended some non public functions for users to debug their PT 2.0 programs. 
Heuristic for not putting something in public is that it’s an experimental subsystem with the goal of turning it on by default, it’s very core dev centric, meta centric, a bunch of different configs that should be batched into a single user facing one, or something that needs to be renamed because the name is confusing #### Top level `torch.compile()` -> already is a public API it does require some minor improvements like having configs be passed in to any backend and not just inductor (EDIT: This was already done https://github.com/pytorch/pytorch/pull/99645l) and renaming `mode=reduce-overhead` to `mode=cudagraph` To make sure that PT 2.0 is supported with a given pytorch version users can create a new public function and this would replace the need for `try/except` blocks around `import torch._dynamo` that has been populating user code. ```python def pt2_enabled(): if hasattr(torch, 'compile'): return True else: return False ``` For all of the below they will be translated to `torch.compiler.function_name()` #### From _dynamo As a starting point we looked at https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py and we suggest redefining these functions in `pytorch/torch/compiler/__init__.py` It might also make sense to split them over multiple files and import them in `__init__.py` but because the number of functions is small it'd probably be fine to add them all into a single compiler/__init__.py until this list becomes larger 1. `reset()` 2. `allow_in_graph()` 10. `list_backends()` 12. `compile()`: torch.compile() would be mostly a shell function passing arguments to torch.compiler.compile() 13. `assume_constant_result()`: TODO: Double check how this is useful 15. `torch._dynamo.disable()` Some notable omissions 11. `explain()`: We need to clean up the output for this function, make it a data class and pretty printable 1. `forbid_in_graph()`: Considered adding this but should instead consolidate on `disallow_in_graph` 2. 
`optimize_assert()`: Already covered by `torch.compile(fullgraph=True)` 3. `check_if_dynamo_supported()`: this would be supplanted by pt2_enabled() 4. `compilation_metrics`, `graph_breaks_reasons` ..: would all be accessed via `torch.compiler.explain()` 5. `replay` does not seem useful to end customers 6. `graph_break()`: Mostly useful for debugging or unit tests 9. `register_backend()`: End users will just pass a string backend to torch.compile, only devs will create new backends 10. `export()`: Eventually this needs to be public but for now it’s not ready, so just highlighting that it will be in the public API eventually 11. `disallow_in_graph()`: Usage is limited 12. `mark_static()`: we can keep this private until dynamic=True is recommended in stable 13. `mark_dynamic()`: we can keep this private until dynamic=True is recommended in trunk 14. 8. `OptimizedModule`: This is the only class that we'd expose but is crucial since users are running code like `if isinstance(mod, OptimizedModule): torch.save(mod._orig_mod)` EDIT: because we fixed pickling we no longer need to expose this 15. `is_compiling()`: Still not clear how this is useful to end users There are also config variables which we need to expose https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py Some of our configs are useful dev flags, others gate experimental functionality, and others are essential debugging tools; we separate out the essential debugging and logging tools into a public facing config. TODO: I still need to think of a good way of porting the config in a BC way; here are some ideas 1. Just make all passes available and controllable via `torch.compile(options={})` but only show docstrings for the ones users should care about. The current problem with our config system is that we have 3 ways of setting them: via `options={}`, environment variables, and variables in `config.py`. It'd be worth settling on one source of truth and having that be the public API. 
The configs we should make public are

1. `log_file_name`
2. `verbose`
3. `cache_size_limit`
4. `repro_level` and `repro_after`: Although we can rename these to minifier and give human readable names to the levels

Everything else should stay private, in particular

1. `print_graph_breaks`, `print_specializations`: should be supplanted by `explain()` for public users
2. dynamic shape configs: Users should only have to worry about `torch.compile(dynamic=True/False)`
3. The distributed flags, hook or guard configs: If we tell a user to use FSDP and DDP then the flag should be enabled by default or be in a private namespace
4. The fbcode flags: Obviously no need to be user facing
5. Skip/Allow lists: Not something normal users should play around with

#### From _inductor

Very little of inductor should be exposed in a public facing API. Our core audience, as in people writing models, mostly just needs information on what certain passes mean and how to control them at a high level, and they can do this with `torch.compile(options={})`. So the goal here should be more to make the available passes clearer and ideally consolidate them into `torch.compile()` docstrings or modes.

There are some exceptions though from https://github.com/pytorch/pytorch/blob/main/torch/_inductor/__init__.py

1. `list_mode_options()`
2. `list_options()`: this needs an additional pass to hide internal or debug options

For both of these we’d rename them to `compiler.inductor_list_mode_options()` and `compiler.inductor_list_options()` since they would be in the same init file as the one for dynamo

Notable omissions

1. `_inductor.compile()`: Because users are coming in with their own fx graph, they are likely developers
2. `_inductor.aot_compile()`: Again this is about capturing and modifying fx graphs, so it doesn't need to be a public user API

However the configs are a slightly different story, because we can choose to either

1. Make all configs public
2. Make some configs public and keep most of the private ones. If a public config is set it should override the private version
3. Make all configs controllable via `torch.compile(options={})` but make `list_options()` hide more things

For now 3 seems like the most reasonable choice, with some high level configs we’ll keep like `TORCH_COMPILE_DEBUG`

Regardless, here's what should probably be public or advertised more

1. `disable_progress` and `verbose_progress`: Combine and enable by default
2. `fallback_random`: We could make the case this shouldn't be public if a top level deterministic mode enables this
3. `profile_bandwidth`: Or could make the case that this should be in `TORCH_COMPILE_DEBUG`

Notable omissions

1. Any config that would generally improve performance for most users, which we should probably enable by default but which might be disabled in the short term because of stability, for example `epilogue_fusion`, `pattern_matcher`, `reordering`
2. Autotuning flags: Should just sit behind `torch.compile(mode="max-autotune")`, like `max_autotune`, `max_autotune_gemm`
3. `coordinate_descent_tuning`: This one I'm a bit mixed about, maybe it should also just fall under `mode="max-autotune"`
4. `trace`: `TORCH_COMPILE_DEBUG` is the best flag for all of this
5. `triton.cudagraphs`: Default should be `torch.compile(mode="reduce-overhead")` - I'd go further and rename it to `mode=cudagraph`, and we can keep reduce-overhead for BC reasons
6. `triton_unique_kernel_names`: Mostly useful for devs debugging
7. `dce`: which doesn't really do anything
8. `shape_padding`: Elias is working on enabling this by default, in which case we also remove it

## Mechanics

This PR would include the public functions with their docstrings.

Another PR will take a stab at the configs.

And for work where the APIs are still being cleaned up, whether it's the minifier or escape hatches, export or dynamic shapes, aot_inductor etc., we’ll keep them private until a public commitment can be made.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102182
Approved by: https://github.com/jansel, https://github.com/albanD	2023-06-13 19:52:17 +00:00
PyTorch MergeBot	258d398eec	Revert "torch.compiler public namespace (#102182 )" This reverts commit b5840f99c3f2ae01b7831fd32b99758180fc22c3. Reverted https://github.com/pytorch/pytorch/pull/102182 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/102182#issuecomment-1576144551))	2023-06-05 06:52:37 +00:00
Mark Saroufim	b5840f99c3	torch.compiler public namespace (#102182 )

# torch.compiler public API

## Goal

The goal of this document is to describe the public facing API for torchdynamo and torchinductor.

Today both dynamo and torchinductor live in the `torch/_dynamo` and `torch/_inductor` namespaces, and the only public function is `torch.compile()`, which is placed directly in `torch/__init__.py`.

This poses a few problems for users trying to take dependencies on PyTorch 2.0

1. Unclear BC guarantees
2. No builtin discovery mechanism outside of reading the source code
3. No hard requirements for docstrings or type annotations

Most importantly, it mixes two personas, the PyTorch 2.0 developer and the PyTorch 2.0 customer, so this is an attempt to address that. We draw a lot of inspiration from the `functorch` migration to the `func` namespace.

## Alternate names

We did discuss some other alternative names

1. `torch.compile` -> problem is this would break BC on the existing `torch.compile` function
2. `torch.dynamo` -> `dynamo` is so far not something we've deliberately hidden from users, but the problem is that figuring out what is `_dynamo` vs `dynamo` might be confusing
3. `torch.compiler` -> 1 would be better but to keep BC this is a good compromise

# The general approach

## Proposal 1

In https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py we have a function called `reset()`. This function is essential if users are trying to `torch.compile()` a model under different settings.

```python
# in _dynamo/
def reset():
    do_reset_stuff()
```

Instead we propose

```python
# in compiler/
def reset():
    do_reset_stuff()  # As in copy paste the logic from _dynamo.reset

# in _dynamo/
import warnings
import inspect

def reset():
    # Look up this function's own name so the warning text stays in sync
    function_name = inspect.currentframe().f_code.co_name
    warnings.warn(f"{function_name} is deprecated, use compiler.{function_name} instead", DeprecationWarning)
    return compiler.reset()
```

## Proposal 2

```python
# in compiler/
def reset():
    """ Docstrings here """
    _dynamo.reset()

# in _dynamo/
# No changes
```

Consensus so far seems to be proposal 2, since fewer warnings will be less jarring and it’ll make it quite easy to merge the public API.

## Docstrings

The above was an example of a function that has no inputs or outputs, but other functions could use improved docstrings. For example, `allow_in_graph` actually works over lists of functions, but that’s not mentioned anywhere in the example; you only find out by reading the source code.

    def allow_in_graph(fn):
        """
        Customize which functions TorchDynamo will include in the generated graph.
        Similar to `torch.fx.wrap()`.

        Parameters:
            fn (callable or list/tuple): The function(s) to be allowed in the graph.

        Returns:
            callable or list/tuple: The input function(s) included in the graph.

        Examples:
            Customize inclusion of a single function:
            ::
                torch._dynamo.allow_in_graph(my_custom_function)

            Customize inclusion of multiple functions:
            ::
                torch._dynamo.allow_in_graph([my_custom_function1, my_custom_function2])

                @torch._dynamo.optimize(...)
                def fn(a):
                    x = torch.add(a, 1)
                    x = my_custom_function(x)
                    x = torch.add(x, 1)
                    return x

                fn(...)

        Notes:
            The `allow_in_graph` function allows customization of which functions
            TorchDynamo includes in the generated graph. It can be used to include
            specific functions that are not automatically captured by TorchDynamo.

            If `fn` is a list or tuple, `allow_in_graph` will be called recursively
            on each element in the sequence.

            Once a function is allowed in the graph using `allow_in_graph`, it will
            be captured in the graph generated by TorchDynamo. This customization
            enables more fine-grained control over the functions included in the graph.

            Note that `allow_in_graph` expects the input `fn` to be a callable.
        """
        if isinstance(fn, (list, tuple)):
            return [allow_in_graph(x) for x in fn]
        assert callable(fn), "allow_in_graph expects a callable"
        allowed_functions._allowed_function_ids.add(id(fn))
        allowed_functions._disallowed_function_ids.remove(id(fn))
        return fn

So to make the API public, we’d have to write similar docstrings for all public functions we’d like to create.

The benefit of this approach is that

1. No BC risks: internal and external users relying on our tooling can slowly wean off the private functions.
2. We will also have to write correct docstrings, which will automatically make our documentation easier to maintain and render correctly on pytorch.org
3. We already have some BC guarantees: we don’t kill `OptimizedModule`, and we rejected the PR to change the config system

The con of this approach is that we will be stuck with some potentially suboptimal functions/classes that we can’t kill.

## Testing strategy

If the approach is to mostly make a public function call an already tested private function, then all we need to do is ensure that the function signatures don't change.

## Which functions should be in the public API

Our heuristic for deciding whether something should be public is: are users already relying on it for lack of other options, or have we recommended some non public functions for users to debug their PT 2.0 programs?
Our heuristic for not making something public is that it’s an experimental subsystem with the goal of turning it on by default, it’s very core dev centric or Meta centric, it’s a bunch of different configs that should be batched into a single user facing one, or it needs to be renamed because the name is confusing.

#### Top level

`torch.compile()` -> already is a public API. It does require some minor improvements, like having configs be passed in to any backend and not just inductor (EDIT: This was already done https://github.com/pytorch/pytorch/pull/99645l) and renaming `mode=reduce-overhead` to `mode=cudagraph`.

To make sure that PT 2.0 is supported with a given pytorch version, users can create a new public function, and this would replace the need for the `try/except` blocks around `import torch._dynamo` that have been populating user code.

```python
import torch

def pt2_enabled():
    # PT 2.0 is available when the top-level torch.compile entry point exists
    return hasattr(torch, 'compile')
```

For all of the below they will be translated to `torch.compiler.function_name()`

#### From _dynamo

As a starting point we looked at https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py and we suggest redefining these functions in `pytorch/torch/compiler/__init__.py`

It might also make sense to split them over multiple files and import them in `__init__.py`, but because the number of functions is small it'd probably be fine to add them all into a single `compiler/__init__.py` until this list becomes larger.

1. `reset()`
2. `allow_in_graph()`
3. `list_backends()`
4. `compile()`: `torch.compile()` would be mostly a shell function passing arguments to `torch.compiler.compile()`
5. `assume_constant_result()`: TODO: Double check how this is useful
6. `torch._dynamo.disable()`

Some notable omissions

1. `explain()`: We need to clean up the output for this function, make it a data class and pretty printable
2. `forbid_in_graph()`: Considered adding this but should instead consolidate on `disallow_in_graph`
3. `optimize_assert()`: Already covered by `torch.compile(fullgraph=True)`
4. `check_if_dynamo_supported()`: this would be supplanted by `pt2_enabled()`
5. `compilation_metrics`, `graph_breaks_reasons` ..: would all be accessed via `torch.compiler.explain()`
6. `replay`: does not seem useful to end customers
7. `graph_break()`: Mostly useful for debugging or unit tests
8. `register_backend()`: End users will just pass a string backend to `torch.compile`, only devs will create new backends
9. `export()`: Eventually this needs to be public, but for now it’s not ready, so just highlighting that it will be in the public API eventually
10. `disallow_in_graph()`: Usage is limited
11. `mark_static()`: we can keep this private until dynamic=True is recommended in stable
12. `mark_dynamic()`: we can keep this private until dynamic=True is recommended in trunk
13. `OptimizedModule`: This is the only class that we'd expose but is crucial since users are running code like `if isinstance(mod, OptimizedModule): torch.save(mod._orig_mod)` EDIT: because we fixed pickling we no longer need to expose this
14. `is_compiling()`: Still not clear how this is useful to end users

There are also config variables which we need to expose https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py

Some of our configs are useful dev flags, others gate experimental functionality, and others are essential debugging tools; we separate out the essential debugging and logging tools into a public facing config.

TODO: I still need to think of a good way of porting the config in a BC way; here are some ideas

1. Just make all passes available and controllable via `torch.compile(options={})` but only show docstrings for the ones users should care about.

The current problem with our config system is that we have 3 ways of setting them: via `options={}`, environment variables, and variables in `config.py`. It'd be worth settling on one source of truth and having that be the public API.
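
As a rough illustration of how the proposed `pt2_enabled()` check (the suggested replacement for `check_if_dynamo_supported()` and for `try/except` blocks around `import torch._dynamo`) might be used in user code, consider the sketch below. It makes two assumptions beyond the proposal: the `importlib.util.find_spec` guard, added so the check also works when `torch` is not installed, and the `maybe_compile` helper, which is hypothetical.

```python
import importlib.util


def pt2_enabled() -> bool:
    """Return True when torch is importable and exposes torch.compile."""
    if importlib.util.find_spec("torch") is None:
        return False  # torch is not installed at all
    import torch
    return hasattr(torch, "compile")


def maybe_compile(fn):
    """Compile fn when PT 2.0 is available, otherwise return it unchanged."""
    if pt2_enabled():
        import torch
        return torch.compile(fn)
    return fn
```

Callers can then write `model_fn = maybe_compile(model_fn)` once and run the same code on older PyTorch versions, where the function is simply left in eager mode.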
The configs we should make public are

1. `log_file_name`
2. `verbose`
3. `cache_size_limit`
4. `repro_level` and `repro_after`: Although we can rename these to minifier and give human readable names to the levels

Everything else should stay private, in particular

1. `print_graph_breaks`, `print_specializations`: should be supplanted by `explain()` for public users
2. dynamic shape configs: Users should only have to worry about `torch.compile(dynamic=True/False)`
3. The distributed flags, hook or guard configs: If we tell a user to use FSDP and DDP then the flag should be enabled by default or be in a private namespace
4. The fbcode flags: Obviously no need to be user facing
5. Skip/Allow lists: Not something normal users should play around with

#### From _inductor

Very little of inductor should be exposed in a public facing API. Our core audience, as in people writing models, mostly just needs information on what certain passes mean and how to control them at a high level, and they can do this with `torch.compile(options={})`. So the goal here should be more to make the available passes clearer and ideally consolidate them into `torch.compile()` docstrings or modes.

There are some exceptions though from https://github.com/pytorch/pytorch/blob/main/torch/_inductor/__init__.py

1. `list_mode_options()`
2. `list_options()`: this needs an additional pass to hide internal or debug options

For both of these we’d rename them to `compiler.inductor_list_mode_options()` and `compiler.inductor_list_options()` since they would be in the same init file as the one for dynamo

Notable omissions

1. `_inductor.compile()`: Because users are coming in with their own fx graph, they are likely developers
2. `_inductor.aot_compile()`: Again this is about capturing and modifying fx graphs, so it doesn't need to be a public user API

However the configs are a slightly different story, because we can choose to either

1. Make all configs public
2. Make some configs public and keep most of the private ones. If a public config is set it should override the private version
3. Make all configs controllable via `torch.compile(options={})` but make `list_options()` hide more things

For now 3 seems like the most reasonable choice, with some high level configs we’ll keep like `TORCH_COMPILE_DEBUG`

Regardless, here's what should probably be public or advertised more

1. `disable_progress` and `verbose_progress`: Combine and enable by default
2. `fallback_random`: We could make the case this shouldn't be public if a top level deterministic mode enables this
3. `profile_bandwidth`: Or could make the case that this should be in `TORCH_COMPILE_DEBUG`

Notable omissions

1. Any config that would generally improve performance for most users, which we should probably enable by default but which might be disabled in the short term because of stability, for example `epilogue_fusion`, `pattern_matcher`, `reordering`
2. Autotuning flags: Should just sit behind `torch.compile(mode="max-autotune")`, like `max_autotune`, `max_autotune_gemm`
3. `coordinate_descent_tuning`: This one I'm a bit mixed about, maybe it should also just fall under `mode="max-autotune"`
4. `trace`: `TORCH_COMPILE_DEBUG` is the best flag for all of this
5. `triton.cudagraphs`: Default should be `torch.compile(mode="reduce-overhead")` - I'd go further and rename it to `mode=cudagraph`, and we can keep reduce-overhead for BC reasons
6. `triton_unique_kernel_names`: Mostly useful for devs debugging
7. `dce`: which doesn't really do anything
8. `shape_padding`: Elias is working on enabling this by default, in which case we also remove it

## Mechanics

This PR would include the public functions with their docstrings.

Another PR will take a stab at the configs.

And for work where the APIs are still being cleaned up, whether it's the minifier or escape hatches, export or dynamic shapes, aot_inductor etc., we’ll keep them private until a public commitment can be made.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102182
Approved by: https://github.com/jansel	2023-06-02 14:38:55 +00:00

43 Commits