Commit Graph

50 Commits

5ec9c0bc4a Fix linearize(grad(...)) call (#133364)
Fixes #124550

Also moves the `graph.eliminate_dead_code()` call to a few lines after
`_inline_module(...)` in `const_fold.py`.

Test Plan:

Add a new test in `test_eager_transforms.py` to ensure the reported
issue is indeed fixed.
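
A minimal repro of the composition this commit fixes (a sketch; the function and shapes are illustrative, not from the PR):

```python
import torch
from torch.func import grad, linearize

def f(x):
    return (x ** 2).sum()

x = torch.randn(3)
# linearize(grad(f), x) previously failed (#124550); it returns the
# output of grad(f) at x plus a jvp function linearized at x
grad_out, jvp_fn = linearize(grad(f), x)
tangent = torch.randn(3)
print(jvp_fn(tangent))
```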

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133364
Approved by: https://github.com/zou3519
2024-08-15 17:55:36 +00:00
cbee9c1fd2 Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)"
This reverts commit 0e7e61f7cec82a43f2de52b83eff152d703be7a3.

Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2272370386))
2024-08-07 00:05:20 +00:00
0e7e61f7ce Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)
This PR is split from PR #126898.

- #126898

------
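
For callers, the migration looks roughly like this (assuming `torch.compiler.is_compiling()` as the public replacement the deprecation points to):

```python
import torch

# Deprecated private helpers:
#   torch._utils.is_compiling()
#   torch._dynamo.external_utils.is_compiling()

# Public replacement:
if torch.compiler.is_compiling():
    pass  # running under torch.compile / dynamo tracing
```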

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-08-03 09:43:38 +00:00
e7eeee473c [BE][Easy][14/19] enforce style for empty lines in import segments in torch/_[a-c]*/ and torch/_[e-h]*/ and torch/_[j-z]*/ (#129765)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129765
Approved by: https://github.com/ezyang
2024-07-31 10:42:50 +00:00
207fb96155 [functorch] saved tensor hooks error should only apply to grad, vjp transforms. (#131191)
There's no reason to ban them for vmap or jvp: without the {grad, vjp}
transforms, those just act above PyTorch autograd, which ends up saving
regular Tensors.
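
A sketch of what this unbans (identity pack/unpack hooks for illustration):

```python
import torch
from torch.func import vmap

def f(x):
    return x.sin().sum()

x = torch.randn(4, 3)
# vmap alone sits above autograd, so saved-tensor hooks only ever see
# regular Tensors and should no longer be rejected
with torch.autograd.graph.saved_tensors_hooks(lambda t: t, lambda t: t):
    out = vmap(f)(x)
```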

Test Plan:
- some tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131191
Approved by: https://github.com/drisspg
2024-07-19 23:16:27 +00:00
9818283da1 re-enable jacrev/jacfwd/hessian after #128028 landed (#128622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128622
Approved by: https://github.com/zou3519
2024-06-18 17:08:58 +00:00
4460e481bc Disable jacrev/jacfwd/hessian if compiling with dynamo (#128255)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128255
Approved by: https://github.com/zou3519
2024-06-10 20:47:53 +00:00
90bb510ece Revert "Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)"
This reverts commit 348b181a97abc2e636a6c18e5880a78e5d1dab94.

Reverted https://github.com/pytorch/pytorch/pull/127690 on behalf of https://github.com/clee2000 due to sorry I think https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456 is still relevant, I will reach out to them to see what needs to be done in internal to get this remerged ([comment](https://github.com/pytorch/pytorch/pull/127690#issuecomment-2159248859))
2024-06-10 20:44:42 +00:00
348b181a97 Deprecate torch._utils.is_compiling() and torch._dynamo.external_utils.is_compiling() (#127690)
This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007
2024-06-08 15:25:03 +00:00
033e733021 Revert "[BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)"
This reverts commit 749a132fb0a8325cbad4734a563aa459ca611991.

Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))
2024-05-31 19:47:24 +00:00
749a132fb0 [BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)
Use `typing_extensions.deprecated` for deprecation annotations where possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` calls that are missing a category.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.

UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.
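
The annotation pattern looks like this (`old_helper`/`new_helper` are hypothetical names, not from the PR):

```python
from typing_extensions import deprecated

@deprecated(
    "`old_helper` is deprecated, please use `new_helper` instead",
    category=FutureWarning,
)
def old_helper():
    ...
```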

Resolves #126888

- #126888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
2024-05-29 12:09:27 +00:00
763dc26e59 [Dynamo] Add dynamo support to torch.func.linearize (#123118)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123118
Approved by: https://github.com/zou3519
2024-04-23 21:31:49 +00:00
73f0ecc1ac [BE] UFMT directory torch/_functorch (#123723)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123723
Approved by: https://github.com/Skylion007
2024-04-12 08:04:51 +00:00
933d3a7829 Allow dynamo to inline through "hessian" (#121410)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121410
Approved by: https://github.com/zou3519
2024-03-27 21:39:37 +00:00
4eaa000acc Teach dynamo about torch.func.jvp (#119926)
List of changes:
- Replace JVP_NESTING with torch._C._functorch.maybe_current_level()
- Remove all increment-nesting functions from wrap_fx_proxy_cls
- fwAD.make_dual receives the dual_level as a keyword argument
- Add jvp_increment_nesting, set_fwd_grad_enabled, and dual_level context managers to dynamo
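
With these pieces in place, something like the following should trace through dynamo (a sketch; the backend and shapes are arbitrary):

```python
import torch
from torch.func import jvp

def f(x):
    return x.sin()

@torch.compile(backend="eager", fullgraph=True)
def g(x, t):
    # dynamo now traces the jvp transform instead of graph-breaking
    return jvp(f, (x,), (t,))

x, t = torch.randn(3), torch.randn(3)
primal_out, tangent_out = g(x, t)
```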

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926
Approved by: https://github.com/zou3519
2024-03-22 20:25:47 +00:00
0696db8202 Revert "Teach dynamo about torch.func.jvp (#119926)"
This reverts commit 17489784b635187316c6c856c5fe6b6a28d8a15a.

Reverted https://github.com/pytorch/pytorch/pull/119926 on behalf of https://github.com/peterbell10 due to broken mac jobs on main ([comment](https://github.com/pytorch/pytorch/pull/119926#issuecomment-2010327997))
2024-03-20 18:34:43 +00:00
17489784b6 Teach dynamo about torch.func.jvp (#119926)
List of changes:
- Replace JVP_NESTING with torch._C._functorch.maybe_current_level()
- Remove all increment-nesting functions from wrap_fx_proxy_cls
- fwAD.make_dual receives the dual_level as a keyword argument
- Add jvp_increment_nesting, set_fwd_grad_enabled, and dual_level context managers to dynamo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926
Approved by: https://github.com/zou3519
2024-03-20 13:09:19 +00:00
36e5c1dcab Revert "Teach dynamo about torch.func.jvp (#119926)"
This reverts commit edd04b7c16cc6715411119bb7db234a9df59065f.

Reverted https://github.com/pytorch/pytorch/pull/119926 on behalf of https://github.com/jeanschmidt due to lots of breakages in pull jobs, checking if reverting this one will help ([comment](https://github.com/pytorch/pytorch/pull/119926#issuecomment-2007915919))
2024-03-19 18:59:46 +00:00
edd04b7c16 Teach dynamo about torch.func.jvp (#119926)
List of changes:
- Replace JVP_NESTING with torch._C._functorch.maybe_current_level()
- Remove all increment-nesting functions from wrap_fx_proxy_cls
- fwAD.make_dual receives the dual_level as a keyword argument
- Add jvp_increment_nesting, set_fwd_grad_enabled, and dual_level context managers to dynamo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119926
Approved by: https://github.com/zou3519
2024-03-19 13:06:42 +00:00
fd35aafc26 Teach dynamo about vjp (#119405)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119405
Approved by: https://github.com/zou3519
ghstack dependencies: #118407
2024-03-01 00:21:10 +00:00
491c2b4665 Let torch dynamo inline torch.func.grad (#118407)
When dynamo sees torch.func.grad, it tries to inline all frames related
to it.
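
A sketch of the kind of program this lets dynamo capture in one graph (the backend and function are illustrative):

```python
import torch
from torch.func import grad

def f(x):
    return (x ** 2).sum()

@torch.compile(backend="eager", fullgraph=True)
def g(x):
    # dynamo inlines through grad(f) instead of falling back to eager
    return grad(f)(x)

print(g(torch.randn(3)))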

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118407
Approved by: https://github.com/zou3519
2024-02-28 20:05:00 +00:00
9bce208dfb Replace follow_imports = silent with normal (#118414)
This is a lot of files changed! Don't panic! Here's how it works:

* Previously, we set `follow_imports = silent` for our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, what this does is whenever we have an import to a module which is not listed as a file to be typechecked in mypy, we typecheck it as normal but suppress all errors that occurred in that file.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.

In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.

The codemod was done with this script authored by GPT-4:

```python
import glob

exclude_patterns = [
    ...
]

for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
2024-01-27 02:44:11 +00:00
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
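
The pattern being replaced, roughly (using the private `torch.utils._pytree` module the codebase uses):

```python
import torch
import torch.utils._pytree as pytree

tree = {"a": torch.randn(2), "b": [torch.randn(3), 1.0]}

# before: flatten and discard the spec
leaves, _ = pytree.tree_flatten(tree)
# after: ask for the leaves directly
leaves = pytree.tree_leaves(tree)
```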

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
a7a0955790 [pytree][BE] reorganize imports and format code style and update type hints (#112268)
Reland PR:

- #112109

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112268
Approved by: https://github.com/Skylion007
2023-10-28 16:30:24 +00:00
6d7744ca46 Fix typo under torch/_functorch directory (#111067)
This PR fixes typos in comments and exception messages in files under the `torch/_functorch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111067
Approved by: https://github.com/Skylion007
2023-10-11 23:09:36 +00:00
238fb66085 python functionalization: support higher order ops (#108656)
We now have two types of functionalization, C++ Functionalization (through the `Functionalize` dispatch key), and python functionalization (through the `FunctionalTensorMode` torch_dispatch mode).

This means that all higher order ops need custom functionalization rules for the python variant too. I added them here, as well as a helper function, `dispatch_functionalize()`, which is equivalent to `torch.func.functionalize()` except that it uses `FunctionalTensorMode`.

In theory we could have secretly switched `torch.func.functionalize` to use `FunctionalTensorMode`. This would be BC-breaking, though, since `FunctionalTensorMode` isn't composable with the other functorch transforms (the functorch layer-mode stack doesn't know how to re-order torch_dispatch modes arbitrarily).
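
`dispatch_functionalize()` itself is internal, but its public analogue shows the semantics (a sketch using the documented `make_fx(functionalize(f))` pattern):

```python
import torch
from torch.func import functionalize
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    y = x.clone()
    y.add_(1)  # in-place mutation
    return y

# functionalization rewrites the mutation into out-of-place ops
g = make_fx(functionalize(f))(torch.randn(3))
print(g.code)
```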

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108656
Approved by: https://github.com/zou3519
ghstack dependencies: #109024, #109248
2023-09-20 04:37:31 +00:00
cce2c52b0b [pt2] support vmap (#101707)
Teach dynamo about `vmap`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101707
Approved by: https://github.com/zou3519
2023-08-09 03:39:33 +00:00
8a688277a2 [BE] Enable ruff's UP rules and autoformat dynamo / functorch and refs (#105432)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105432
Approved by: https://github.com/ezyang
2023-07-19 13:48:44 +00:00
d552c271db [pt2] grad support (#102264)
Teach dynamo about grad

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102264
Approved by: https://github.com/zou3519
2023-06-21 10:13:09 +00:00
e737a8486f Revert "[pt2] grad support (#102264)"
This reverts commit 85b83954c8820fc7473d8e7b68325fa8ed5753dc.

Reverted https://github.com/pytorch/pytorch/pull/102264 on behalf of https://github.com/huydhn due to This is failing in trunk 85b83954c8 and looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/102264#issuecomment-1600001309))
2023-06-21 03:02:55 +00:00
85b83954c8 [pt2] grad support (#102264)
Teach dynamo about grad

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102264
Approved by: https://github.com/zou3519
2023-06-21 01:37:08 +00:00
47dca20d80 [BE] Enable flake8-comprehension rule C417 (#97880)
Enables flake8-comprehensions rule C417. Ruff autogenerated these fixes to the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880
Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD
2023-03-30 14:34:24 +00:00
2b369eb3c2 [fix] jacrev and jacfwd : support non-tensor args again (#97746)
Fixes https://github.com/pytorch/pytorch/issues/97636

The code that checks whether argument tensors are complex assumed that all arguments are tensors (which is not the case), which led to the error.
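
A minimal repro of the regression (sketch):

```python
import torch
from torch.func import jacrev

def f(x, scale):
    return x * scale  # `scale` is a plain float, not a Tensor

x = torch.randn(3)
# the complex-input check previously assumed every argument was a Tensor
jac = jacrev(f, argnums=0)(x, 2.0)
```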

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97746
Approved by: https://github.com/zou3519
2023-03-28 16:37:33 +00:00
3fc4bc115f [functorch] jacrev, jacfwd error for complex input or output (#94805)
Related: https://github.com/pytorch/pytorch/issues/94397, https://github.com/pytorch/pytorch/issues/94397#issuecomment-1428452756
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94805
Approved by: https://github.com/lezcano
2023-02-14 16:13:37 +00:00
4f3858c6d8 [functorch] linearize (#94173)
Fixes https://github.com/pytorch/functorch/issues/724

TODO:
* [x] Docs

NOTE: `const_fold` pass raises UserWarning -> https://github.com/pytorch/pytorch/issues/94374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94173
Approved by: https://github.com/Chillee
2023-02-09 15:45:08 +00:00
e0e4f1a890 Revert "[functorch] linearize (#94173)"
This reverts commit b6b9e1e6e043ae4b9f41fbbee4f2a9e9a7e7d3d7.

Reverted https://github.com/pytorch/pytorch/pull/94173 on behalf of https://github.com/kshitij12345 due to Broke lint runner
2023-02-09 09:22:39 +00:00
b6b9e1e6e0 [functorch] linearize (#94173)
Fixes https://github.com/pytorch/functorch/issues/724

TODO:
* [x] Docs

NOTE: `const_fold` pass raises UserWarning -> https://github.com/pytorch/pytorch/issues/94374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94173
Approved by: https://github.com/Chillee
2023-02-09 08:57:05 +00:00
ad782ff7df Enable xdoctest runner in CI for real this time (#83816)
Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-29 05:32:42 +00:00
3fdbf824ae [functorch] jacrev: chunk_size=1 without vmap (#91326)
As discussed at https://github.com/pytorch/pytorch/pull/91157#discussion_r1053679272

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91326
Approved by: https://github.com/zou3519
2022-12-28 04:56:25 +00:00
4437d0d161 [functorch] vmap: chunk_size support (#91157)
Ref: https://github.com/pytorch/functorch/issues/680

We introduce a kwarg `chunk_size` in vmap.

Also, we leverage most of the code from `chunk_vmap` (except for chunking the input based on `chunk_size`)
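
Usage is a one-kwarg change (sketch; the sizes are arbitrary):

```python
import torch
from torch.func import vmap

def f(x):
    return x.sin().sum(-1)

x = torch.randn(1000, 64)
# apply f over the batch 100 rows at a time to bound peak memory
out = vmap(f, chunk_size=100)(x)
```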

Benchmarks from https://github.com/pytorch/functorch/pull/774 apply.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91157
Approved by: https://github.com/zou3519
2022-12-22 19:45:45 +00:00
c47bdd7522 *_scatter ops should preserve input stride/storage_offset (#91029)
It turns out that we *do* need to update *_scatter ops to return the exact same strides as their inputs. I added a test to `test/test_functionalization.py`, which now trips thanks to Ed's functionalization stride-debugging check. The bug only manifests as silent incorrectness when you call .backward() on such a function.
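
The invariant, sketched with `slice_scatter` on a non-contiguous input (assuming the post-fix behavior described above):

```python
import torch

base = torch.randn(4, 4).t()  # transposed, hence non-contiguous
src = torch.randn(4, 2)
out = torch.slice_scatter(base, src, dim=1, start=0, end=2)
# post-fix, the scatter output preserves the input's strides
assert out.stride() == base.stride()
```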

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91029
Approved by: https://github.com/ezyang
2022-12-22 19:41:53 +00:00
fb2e1878cb [torch.func] alias torch.func.vmap as torch.vmap (#91026)
This PR also redirects torch.vmap to torch.func.vmap instead of the old
vmap prototype.
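
So, assuming the alias is a straight re-export:

```python
import torch

assert torch.vmap is torch.func.vmap  # one function, two names
```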

Test Plan:
- tests
- view docs preview
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91026
Approved by: https://github.com/albanD, https://github.com/samdow
2022-12-21 20:51:49 +00:00
41846e205e [torch.func] Setup torch.func, populate it with all transforms (#91016)
This PR sets up torch.func and populates it with the following APIs:
- grad
- grad_and_value
- vjp
- jvp
- jacrev
- jacfwd
- hessian
- functionalize
- vmap

It also renames all instances of `functorch` to `torch.func` in the docs
for those APIs.

We rewrite the `__module__` fields on some of the above APIs so that the
APIs fit PyTorch's public api definition.
- For an API to be public, it must have a `__module__` that points to a
  public PyTorch submodule. However, `torch._functorch.eager_transforms`
  is not public due to the leading underscore.
- The solution is to rewrite `__module__` to point to where the API is
  exposed (torch.func). This is what both Numpy and JAX do for their
  APIs.
- h/t pmeier in
  https://github.com/pytorch/pytorch/issues/90284#issuecomment-1348595246
  for idea and code
- The helper function, `exposed_in`, is confined to
  torch._functorch/utils for now because we're not completely sure if
  this should be the long-term solution.
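
A sketch of what `exposed_in` amounts to (illustrative; the real helper lives under torch._functorch/utils):

```python
def exposed_in(module):
    """Mark an API as publicly exposed under `module` by rewriting its
    __module__ (so e.g. a function defined in
    torch._functorch.eager_transforms reports itself as torch.func.*)."""
    def wrapper(fn):
        fn.__module__ = module
        return fn
    return wrapper

@exposed_in("torch.func")
def grad(func, argnums=0, has_aux=False):
    ...
```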

Implication for functorch.* APIs:
- functorch.grad is the same object as torch.func.grad
- this means that the functorch.grad docstring is actually the
  torch.func.grad docstring and will refer to torch.func instead of
  functorch.
- This isn't really a problem since the plan on record is to deprecate
  functorch in favor of torch.func. We can fix these if we really want,
  but I'm not sure if a solution is worth maintaining.

Test Plan:
- view docs preview

Future:
- vmap should actually just be torch.vmap. This requires an extra step
  where I need to test internal callsites, so, I'm separating it into a
  different PR.
- make_fx should be in torch.func to be consistent with `import
  functorch`. This one is a bit more of a headache to deal with w.r.t.
  public api, so going to deal with it separately.
- beef up func.rst with everything else currently on the functorch
  documention website. func.rst is currently just an empty shell.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91016
Approved by: https://github.com/samdow
2022-12-20 00:00:52 +00:00
cad1ce6158 Stop using :attr: in functorch docs (#91015)
We're using :attr: wrong. :attr: refers to an attribute of a Python
object, not a parameter of a function:
- https://www.sphinx-doc.org/en/master/usage/restructuredtext/domains.html#role-py-attr

This leads to some weird things when moving to torch.func: Sphinx
decides to link torch.func for :attr:`func`.

Test Plan:
- docs preview.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91015
Approved by: https://github.com/samdow
2022-12-20 00:00:52 +00:00
f02e93b584 jacrev : Support chunked computation (#89376)
Ref: https://github.com/pytorch/functorch/issues/680

We introduce a kwarg `chunk_size` in `jacrev` to control whether the Jacobian computation should be chunked; if so, `chunk_size` dictates the maximum size of the chunks used.

We try two approaches,
* Stacked Approach: Append the intermediate computation to a list and then stack those results.
* Pre-allocation Approach: Pre-allocate a zeros tensor and copy chunked computation into it.
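
Call-site sketch (the `_preallocate_and_copy` flag used in the benchmark script below selects the second approach):

```python
import torch
from torch.func import jacrev

def f(x, y):
    return x + y, x.sum(0)

x = torch.zeros(64, 64)
y = x.sum()
# compute the Jacobian at most `chunk_size` rows at a time
jac = jacrev(f, argnums=(0, 1), chunk_size=512)(x, y)
```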

For Memory Benchmark, see https://github.com/pytorch/pytorch/pull/89376#issuecomment-1348479098

Benchmark CPU: performs better with more chunks / smaller chunk_size.

NOTE: There seems to be a lot of noise for shape `(64, 64)`.

<details>

```
[----------------------------------------------- jacrev : device cpu : chunks 2 -----------------------------------------------]
                                     |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: ---------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 2080     |               76.2            |          50.9        |                  80.1
      (128, 128) : chunk_size 8256   |             1172.8            |         783.3        |                1225.5
      (128, 144) : chunk_size 9288   |             1475.1            |         990.4        |                1548.3
      (144, 144) : chunk_size 10440  |             1871.3            |        1254.4        |                1971.2

Times are in milliseconds (ms).

[----------------------------------------------- jacrev : device cpu : chunks 3 ----------------------------------------------]
                                    |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: --------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 1386    |               39.9            |          25.8        |                  58.8
      (128, 128) : chunk_size 5504  |             1182.6            |         782.2        |                1229.7
      (128, 144) : chunk_size 6192  |             1483.6            |         995.4        |                1550.6
      (144, 144) : chunk_size 6960  |             1879.1            |        1257.7        |                1960.5

Times are in milliseconds (ms).

[----------------------------------------------- jacrev : device cpu : chunks 4 ----------------------------------------------]
                                    |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: --------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 1040    |               41.7            |          50.6        |                  29.1
      (128, 128) : chunk_size 4128  |             1171.6            |         782.3        |                1226.7
      (128, 144) : chunk_size 4644  |             1482.2            |         994.6        |                1550.9
      (144, 144) : chunk_size 5220  |             1870.2            |        1254.5        |                1961.4

Times are in milliseconds (ms).

[--------------------------------------------- jacrev : device cpu : chunks 100 ---------------------------------------------]
                                   |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: -------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 41     |               46.8            |          50.5        |                  46.4
      (128, 128) : chunk_size 165  |              622.2            |         775.2        |                 656.0
      (128, 144) : chunk_size 185  |              803.9            |         987.3        |                 866.9
      (144, 144) : chunk_size 208  |             1021.1            |        1251.2        |                1088.2

Times are in milliseconds (ms).

[--------------------------------------------- jacrev : device cpu : chunks 200 ---------------------------------------------]
                                   |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: -------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 20     |               60.9            |          50.2        |                  62.3
      (128, 128) : chunk_size 82   |              583.1            |         779.4        |                 634.3
      (128, 144) : chunk_size 92   |              834.1            |        1005.8        |                 472.3
      (144, 144) : chunk_size 104  |             1053.6            |        1277.0        |                1033.9

Times are in milliseconds (ms).

[--------------------------------------------- jacrev : device cpu : chunks 300 --------------------------------------------]
                                  |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: ------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 13    |              77.7             |          50.4        |                  79.6
      (128, 128) : chunk_size 55  |             578.9             |         782.3        |                 626.9
      (128, 144) : chunk_size 61  |             718.2             |        1024.9        |                 800.4
      (144, 144) : chunk_size 69  |             919.7             |        1313.7        |                1023.0

Times are in milliseconds (ms).
```

</details>

Benchmark CUDA: performs better with fewer chunks / bigger chunk_size.

<details>

```
[--------------------------------------------- jacrev : device cuda:1 : chunks 2 ----------------------------------------------]
                                     |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: ---------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 2080     |             1485.7            |         923.8        |                1632.3
      (128, 128) : chunk_size 8256   |            25390.2            |       14103.2        |               33557.4
      (128, 144) : chunk_size 9288   |              801.7            |       16854.1        |               42894.6
      (144, 144) : chunk_size 10440  |             1003.5            |       21386.5        |               59648.5

Times are in microseconds (us).

3 / 3 : Shape (144, 144) : Device cuda:1 : chunks: 3
[--------------------------------------------- jacrev : device cuda:1 : chunks 3 ---------------------------------------------]
                                    |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: --------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 1386    |             1474.5            |         924.5        |                1655.5
      (128, 128) : chunk_size 5504  |            25368.9            |       10156.0        |               34022.1
      (128, 144) : chunk_size 6192  |            25223.0            |       12933.7        |               56418.5
      (144, 144) : chunk_size 6960  |            24729.3            |       16367.4        |               68744.7

Times are in microseconds (us).

3 / 3 : Shape (144, 144) : Device cuda:1 : chunks: 4
[--------------------------------------------- jacrev : device cuda:1 : chunks 4 ---------------------------------------------]
                                    |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: --------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 1040    |             1489.2            |         924.4        |                 1679.6
      (128, 128) : chunk_size 4128  |            25370.4            |        8987.4        |                57201.3
      (128, 144) : chunk_size 4644  |            32239.1            |       10136.2        |                72406.5
      (144, 144) : chunk_size 5220  |            40994.3            |       12867.8        |               108653.4

Times are in microseconds (us).

3 / 3 : Shape (144, 144) : Device cuda:1 : chunks: 100
[------------------------------------------- jacrev : device cuda:1 : chunks 100 --------------------------------------------]
                                   |  with chunk_size and stacked  |  without chunk_size  |  with chunk_size and pre-allocated
1 threads: -------------------------------------------------------------------------------------------------------------------
      (64, 64) : chunk_size 41     |            21121.8            |         924.2        |               22753.5
      (128, 128) : chunk_size 165  |            23679.7            |       14284.4        |               26758.2
      (128, 144) : chunk_size 185  |            30082.3            |       18063.3        |               33553.5
      (144, 144) : chunk_size 208  |            38175.6            |       22839.5        |               42030.0

Times are in microseconds (us).
```

</details>

Benchmark Script

<details>

```python
import functorch
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle
from torch import profiler

import math

def prod(l):
    # product of the iterable's elements
    result = 1
    for el in l:
        result *= el
    return result

def fn(x, y):
    return x + y, x.sum(0)

shapes = ((64, 64), (128, 128), (128, 144), (144, 144))

for device in ('cpu', 'cuda:1'):
    if device == 'cuda:1':
        chunks = (2, 3, 4, 100,)
    else:
        chunks = (2, 3, 4, 100, 200, 300)
    for chunk in chunks:
        results = []
        for shape in shapes:
            x = torch.zeros(*shape, dtype=torch.float, device=device)
            y = x.sum()
            chunk_size = (prod(shape) + prod(shape[1:])) // chunk
            jacrev_fn_chunked = functorch.jacrev(fn, (0, 1), chunk_size=chunk_size)
            jacrev_fn_chunked_pre = functorch.jacrev(fn, (0, 1), chunk_size=chunk_size, _preallocate_and_copy=True)
            jacrev_fn = functorch.jacrev(fn, (0, 1), chunk_size=None)

            tasks = [("jacrev_fn_chunked(x, y)", "with chunk_size and stacked"),
                     ("jacrev_fn(x, y)", "without chunk_size"),
                     ("jacrev_fn_chunked_pre(x, y)", "with chunk_size and pre-allocated"),]
            timers = [Timer(stmt=stmt, label=f"jacrev : device {device} : chunks {chunk}", sub_label=f"{(shape)} : chunk_size {chunk_size}", description=desc, globals=globals()) for stmt, desc in tasks]

            for i, timer in enumerate(timers):
                results.append(
                    timer.blocked_autorange(min_run_time=2.)
                )
                print(f"\r{i + 1} / {len(timers)} : Shape {shape} : Device {device} : chunks: {chunk}", end="")
                sys.stdout.flush()

        print()
        comparison = Compare(results)
        comparison.print()
```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89376
Approved by: https://github.com/zou3519
2022-12-19 20:04:21 +00:00
24c3ad7851 Move private forward grad mode helpers to torch.autograd.forward_ad (#90240)
Motivation
- These were previously defined in functorch. They are not
functorch-specific, so I'm moving them to torch.autograd.forward_ad and
the autograd python bindings.
- I need this to avoid some of my cyclic import problems.

Should these be public APIs? Probably. Though this needs discussion, so
punting it to the future.

Test Plan:
- moved the tests of these from test/functorch/test_eager_transforms.py
to test/test_autograd.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90240
Approved by: https://github.com/soulitzer
2022-12-13 14:14:02 +00:00
4068c5467d [Reland] Move functorch/_src to torch/_functorch (#88756) (#90091)
This will be the last disruptive functorch internals change.

Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.

Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times
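
For imports, the move looks like this (sketch):

```python
# before the move:
# from functorch._src.eager_transforms import grad
# after the move:
from torch._functorch.eager_transforms import grad
```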

Test Plan:
- wait for tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90091
Approved by: https://github.com/anijain2305, https://github.com/ezyang
2022-12-03 14:17:15 +00:00
218d9c6e09 Revert "Move functorch/_src to torch/_functorch (#88756)"
This reverts commit 52bc5c1cfe098fd4b4b13902b4fea83b455b9773.

Reverted https://github.com/pytorch/pytorch/pull/88756 on behalf of https://github.com/clee2000 due to broke imports in tests 52bc5c1cfe https://github.com/pytorch/pytorch/actions/runs/3574742513/jobs/6010814968 probably a landrace
2022-11-29 17:17:11 +00:00
52bc5c1cfe Move functorch/_src to torch/_functorch (#88756)
This will be the last disruptive functorch internals change.

Why are we moving these files?
- As a part of rationalizing functorch we are moving the code in
functorch/_src to torch/_functorch
- This is so that we can offer the functorch APIs as native PyTorch APIs
(coming soon) and resolve some internal build issues.

Why are we moving all of these files at once?
- It's better to break developers all at once rather than many times

Test Plan:
- wait for tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88756
Approved by: https://github.com/ezyang
2022-11-29 13:55:42 +00:00