pytorch

frozenleaves/pytorch

Fork 0

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Commit Graph

Author	SHA1	Message	Date
Tugsbayasgalan Manlaibaatar	463fbc8ca0	Support vmap + custom autograd function/improve DTensor constructor inefficiency (#162240 ) This makes gemma3 exportable on transformers=4.55.4 In HF, there is a torch funciton mode called TransformGetItemToIndex which internally calls custom autograd function. When this custom autograd function is called under vmap, It triggers CustomFunctionHigherOrderOP which error-ed because there was no pre-dispatch proxy mode implementation. Since there are number of requests lately to add various operators in pre-dispatch IR, I introduce a decorator in export that works similar to `allow_in_graph`. Basically: 1) We intercept custom_autograd_function.apply at pre-dispatch mode when this decorator is applied 2) We apply `flat_apply` HOP to hide the pytree spec for this autograd function. Note that this adds restriction that this custom autograd function needs to take in fx-able types. 3) subclass constructor decorator is implemented similarly, so we just refactor it to use similar implementation as this new decorator. eventually we should delete the subclass constructor decorator. 4) Move some code in subclass constructor decorator to exit early in non-export environment which should shave off some inefficiency (around 1% according to @swolchok 's benchmark) Fixes: https://github.com/pytorch/pytorch/issues/161563#issuecomment-3246309758 Differential Revision: [D82141316](https://our.internmc.facebook.com/intern/diff/D82141316) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162240 Approved by: https://github.com/ydwu4	2025-09-11 17:42:41 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	c5deacc27a	Fix subclass access custom op bug (#149698 ) Summary: When we call torch.inference_mode, we seem to skip Autograd key causing the custom op export uses to be not decomposed properly before subclass dispatching starts. We fix this by force desugaring this op at Python key Test Plan: test Differential Revision: D71599541 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149698 Approved by: https://github.com/bdhirsh	2025-03-21 19:42:56 +00:00
Tugsbayasgalan Manlaibaatar	ebd992724f	Implement serializable getattr support for tensor subclasses (#145772 ) builtins.getattr is not serializable, so we replace it with a custom op that has more refined schema. Differential Revision: [D68899421](https://our.internmc.facebook.com/intern/diff/D68899421) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145772 Approved by: https://github.com/bdhirsh	2025-02-11 19:05:14 +00:00

Author

SHA1

Message

Date

Tugsbayasgalan Manlaibaatar

463fbc8ca0

Support vmap + custom autograd function/improve DTensor constructor inefficiency (#162240 )

This makes gemma3 exportable on transformers=4.55.4

In HF, there is a torch funciton mode called TransformGetItemToIndex which internally calls custom autograd function. When this custom autograd function is called under vmap, It triggers CustomFunctionHigherOrderOP which error-ed because there was no pre-dispatch proxy mode implementation.

Since there are number of requests lately to add various operators in pre-dispatch IR, I introduce a decorator in export that works similar to `allow_in_graph`. Basically:
1) We intercept custom_autograd_function.apply at pre-dispatch mode when this decorator is applied
2) We apply `flat_apply` HOP to hide the pytree spec for this autograd function. Note that this adds restriction that this custom autograd function needs to take in fx-able types.
3) subclass constructor decorator is implemented similarly, so we just refactor it to use similar implementation as this new decorator. eventually we should delete the subclass constructor decorator.
4) Move some code in subclass constructor decorator to exit early in non-export environment which should shave off some inefficiency (around 1% according to @swolchok 's benchmark)

Fixes: https://github.com/pytorch/pytorch/issues/161563#issuecomment-3246309758

Differential Revision: [D82141316](https://our.internmc.facebook.com/intern/diff/D82141316)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162240
Approved by: https://github.com/ydwu4

2025-09-11 17:42:41 +00:00

Tugsbayasgalan (Tugsuu) Manlaibaatar

c5deacc27a

Fix subclass access custom op bug (#149698 )

Summary: When we call torch.inference_mode, we seem to skip Autograd key causing the custom op export uses to be not decomposed properly before subclass dispatching starts. We fix this by force desugaring this op at Python key

Test Plan: test

Differential Revision: D71599541

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149698
Approved by: https://github.com/bdhirsh

2025-03-21 19:42:56 +00:00

Tugsbayasgalan Manlaibaatar

ebd992724f

Implement serializable getattr support for tensor subclasses (#145772 )

builtins.getattr is not serializable, so we replace it with a custom op that has more refined schema.

Differential Revision: [D68899421](https://our.internmc.facebook.com/intern/diff/D68899421)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145772
Approved by: https://github.com/bdhirsh

2025-02-11 19:05:14 +00:00

3 Commits