Commit Graph

81 Commits

Author SHA1 Message Date
4bbadeee8a Revert "Set simdlen based on ATEN_CPU_CAPABILITY (#123514)"
This reverts commit b66e3f0957b96b058c9b632ca60833d9717a9d8a.

Reverted https://github.com/pytorch/pytorch/pull/123514 on behalf of https://github.com/clee2000 due to broke test/inductor/test_torchinductor.py::CpuTests::test_new_cpp_build_logical_cpu on periodic test on the no gpu tests b66e3f0957 https://github.com/pytorch/pytorch/actions/runs/9453518547/job/26040077301 ([comment](https://github.com/pytorch/pytorch/pull/123514#issuecomment-2159433432))
2024-06-10 22:46:01 +00:00
b66e3f0957 Set simdlen based on ATEN_CPU_CAPABILITY (#123514)
It is part of https://github.com/pytorch/pytorch/issues/123224. Set simdlen based on the environment ATEN_CPU_CAPABILITY to control CPU vec ISA like eager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123514
Approved by: https://github.com/jgong5, https://github.com/peterbell10
2024-06-10 09:02:14 +00:00
dcfa7702c3 Flip default value for mypy disallow_untyped_defs [1/11] (#127838)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838
Approved by: https://github.com/oulgen
2024-06-08 18:16:33 +00:00
d44ab8ba6d [dynamo] utility to generate bytecode from template function (#127359)
This will be helpful in reducing some of the hardcoded and python-version-dependent bytecode generation in various places in dynamo - e.g. resume function generation and object reconstruction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127359
Approved by: https://github.com/jansel
ghstack dependencies: #127329
2024-05-30 06:37:32 +00:00
f17572fcf6 add 3.12 inductor CI tests (#126218)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126218
Approved by: https://github.com/huydhn, https://github.com/desertfire
2024-05-16 22:29:24 +00:00
e93b57a570 Add propagate_real_tensors mode for unbacked (#125115)
A common complaint when working with data-dependent code in PyTorch is that it's hard to tell how far you are from the finish line: every time a GuardOnDataDependentSymNode error is hit, you have to somehow fix or workaround it to see the next one.

This PR adds a new mode `torch._functorch.config.fake_tensor_propagate_real_tensors` which modifies fake tensors to also propagate real tensors. This means that when we try to guard on a data-dependent SymNode, we can actually produce a real result. We also produce a warning which you should consult to figure out what the crux points are.

I ran this on vision_maskrcnn. In the baseline (without this mode), the model has 27 graph breaks, resulting in 40 graphs. With this mode on, the model has only 11 graph breaks, resulting in 15 graphs (the remaining graph breaks are due to missing functionality for item() on float tensor and some other Dynamo missing features.) You get a list of things that would have errored like this:

```
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> False
```

Potential later follow ups:

* Improve the warning messages (in particular, should provide user frames)
* GC real tensors when they are no longer needed by tracing. Right now, this will use A LOT of memory, equal to as if your GC was broken and every intermediate tensor was kept live

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125115
Approved by: https://github.com/IvanKobzarev
2024-05-02 15:28:26 +00:00
d6c713884a [dynamo, 3.12] xfail refleaking tests due to buggy getattr_static (#125062)
For tracking https://github.com/pytorch/pytorch/issues/124302 so that we can re-enable the test once 3.12 updates with the bug fix for https://github.com/python/cpython/issues/118013.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125062
Approved by: https://github.com/anijain2305, https://github.com/jansel
2024-04-30 22:40:47 +00:00
c12c85e919 Revert "[benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729)" (#125246)
This reverts commit 62b5738a8bf325d79468b839b8412b87cb9951c1.

https://github.com/pytorch/pytorch/pull/119729/ regresses cudagraph dashboard. Moving the one-time per iteration loss from CPU to CUDA is somehow causing a lot of copies:

current (top) vs with revert (bottom)
![image](https://github.com/pytorch/pytorch/assets/9547562/62dfbf66-7edc-4a3c-ba7f-1ec057fba950)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125246
Approved by: https://github.com/eellison
2024-04-30 22:39:53 +00:00
62b5738a8b [benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729)
aten.div's output device will be its numerator's device. so it is acceptable to do cuda / cpu type divisions. post grad passes operate only on graphs and can't handle runtime graph inputs. so we change user code to move inputs to cuda for cudagraph. this affects any graph that has cpu tensors as graph inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119729
Approved by: https://github.com/eellison
2024-04-26 03:22:26 +00:00
48a016157d Revert "[benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729)"
This reverts commit c021c9b8e48b8e787b75fd69a3076beffffb8208.

Reverted https://github.com/pytorch/pytorch/pull/119729 on behalf of https://github.com/jeanschmidt due to one PR in this stack seems to have broken linux pull cuda12 tests ([comment](https://github.com/pytorch/pytorch/pull/119729#issuecomment-2076750595))
2024-04-25 09:26:25 +00:00
c021c9b8e4 [benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729)
aten.div's output device will be its numerator's device. so it is acceptable to do cuda / cpu type divisions. post grad passes operate only on graphs and can't handle runtime graph inputs. so we change user code to move inputs to cuda for cudagraph. this affects any graph that has cpu tensors as graph inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119729
Approved by: https://github.com/eellison
2024-04-25 03:38:09 +00:00
a32eac345f [dynamo] Return gm.forward for eager backend (#124109)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124109
Approved by: https://github.com/yanboliang, https://github.com/jansel
ghstack dependencies: #124445
2024-04-20 14:11:05 +00:00
812bae09be [dynamo] fix 3.11+ refleak (#124238)
Fixes https://github.com/pytorch/pytorch/issues/119607 for 3.11+.

In 3.11+, `_PyFrame_FastToLocalsWithError` could implicity run `COPY_FREE_VARS` on the original frame, leading to double incref's since the dynamo shadow frame can rerun `COPY_FREE_VARS`. So the solution is to skip the first `COPY_FREE_VARS` instruction in the shadow frame if it was already executed in the original frame.

Also move the location for clearing the original frame in 3.12 to handle error cases more thoroughly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124238
Approved by: https://github.com/jansel
2024-04-18 03:02:29 +00:00
df5829d0ba [inductor] let rand_strided support fp8 (#124120)
I'm working on https://fb.workplace.com/groups/1075192433118967/posts/1411161629522044/ (this is a meta internal link about a inefficient inner/persistent reduction kernel generated by inductor). I found the generated benchmark code for a kernel ( https://gist.github.com/shunting314/13a0105f72a1c54d9c220370c7fd3845 ) can not be run since rand_strided failed to generate tensors for fp8. Errors are like

```
RuntimeError: "normal_kernel_cpu" not implemented for 'Float8_e4m3fn'
```
for CPU
or
```
RuntimeError: "normal_kernel_cuda" not implemented for 'Float8_e4m3fn'
```
for GPU

This PR work around that problem.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124120
Approved by: https://github.com/Chillee, https://github.com/jansel
2024-04-16 04:15:56 +00:00
1d6c5972c1 [BE]: Optimize min/max/sum comprehensions C419 (#123960)
Automatic fixes that replaces certain list comprehensions with generator ones where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied.

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
2024-04-12 23:54:15 +00:00
6939279a17 [dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)
Fixes #114844

In the linked issue we have
```
compiled_module = torch.compile(module)
compiled_module.x = ...
compiled_module(...)  # Mutates self.x
```
Where since the module mutates `self.x` you would expect `compiled_module.x`
to be updated but actually `compiled_module.x = ...` sets an attribute "x"
on the `OptimizedModule` object while the forward method of the module mutates
`module.x`.

This gives the expected behavior by forwarding `compiled_module.__setattr__`
down to `module.__setattr__`. There is already a corresponding `__getattr__`
so now `compiled_module.x` becomes an alias for `module.x`.

Co-authored-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098
Approved by: https://github.com/ezyang, https://github.com/lezcano
2024-04-01 14:30:44 +00:00
a9b27bbbe9 [dynamo, 3.12] update jump instructions (#122530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122530
Approved by: https://github.com/jansel
ghstack dependencies: #122146, #122335, #122354, #122355, #122356, #122449, #122455, #122456
2024-03-27 20:39:39 +00:00
2564f6cf0e [dynamo, 3.12] Allocate Dynamo shadow frames by mimicking CPython (#122146)
Python 3.12 changed a few things with how `_PyInterpreterFrame`s are allocated and freed:
- Frames are now required to be placed on the Python frame stack. In 3.11, we could allocate frames anywhere in memory. In 3.12, we now need to use `THP_PyThreadState_BumpFramePointerSlow`/`push_chunk`/`allocate_chunk`. This method of allocating/freeing frames is also compatible with 3.11.
- The eval frame function is now responsible for clearing the frame (see https://docs.python.org/3/whatsnew/changelog.html#id128, the point about "...which now clear the frame.")

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122146
Approved by: https://github.com/jansel
2024-03-27 20:39:39 +00:00
f631586084 Revert "[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)"
This reverts commit b6982bf2b25d2d3ba5d82488a39721d6013a838f.

Reverted https://github.com/pytorch/pytorch/pull/122098 on behalf of https://github.com/atalman due to Failing internally ([comment](https://github.com/pytorch/pytorch/pull/122098#issuecomment-2021233604))
2024-03-26 18:54:17 +00:00
b6982bf2b2 [dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)
Fixes #114844

In the linked issue we have
```
compiled_module = torch.compile(module)
compiled_module.x = ...
compiled_module(...)  # Mutates self.x
```
Where since the module mutates `self.x` you would expect `compiled_module.x`
to be updated but actually `compiled_module.x = ...` sets an attribute "x"
on the `OptimizedModule` object while the forward method of the module mutates
`module.x`.

This gives the expected behavior by forwarding `compiled_module.__setattr__`
down to `module.__setattr__`. There is already a corresponding `__getattr__`
so now `compiled_module.x` becomes an alias for `module.x`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098
Approved by: https://github.com/ezyang, https://github.com/lezcano
2024-03-26 00:52:12 +00:00
e5e0685f61 Revert "[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)"
This reverts commit 88ebdbc97c103271766203df6662240e95a09b42.

Reverted https://github.com/pytorch/pytorch/pull/122098 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the distributed failure looks legit as it is also failing in trunk 88ebdbc97c ([comment](https://github.com/pytorch/pytorch/pull/122098#issuecomment-2008483316))
2024-03-20 01:12:24 +00:00
88ebdbc97c [dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)
Fixes #114844

In the linked issue we have
```
compiled_module = torch.compile(module)
compiled_module.x = ...
compiled_module(...)  # Mutates self.x
```
Where since the module mutates `self.x` you would expect `compiled_module.x`
to be updated but actually `compiled_module.x = ...` sets an attribute "x"
on the `OptimizedModule` object while the forward method of the module mutates
`module.x`.

This gives the expected behavior by forwarding `compiled_module.__setattr__`
down to `module.__setattr__`. There is already a corresponding `__getattr__`
so now `compiled_module.x` becomes an alias for `module.x`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098
Approved by: https://github.com/ezyang, https://github.com/lezcano
2024-03-19 16:51:43 +00:00
d14d62b7aa [dynamo] add more refleak tests (#120657)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120657
Approved by: https://github.com/jansel
2024-03-07 22:25:43 +00:00
99cb807e25 Skip test_wrap_bad if run under pytest (#115070)
Pytest replaces sys.stdout/stderr by `TextIOWrapper` instances which do not support `fileno()`
Hence skip that test in this case

Fixes #115069

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115070
Approved by: https://github.com/clee2000
2024-02-15 00:10:05 +00:00
790858afa9 Make start compiling stack trace omit framework frames (#119251)
Fixes https://github.com/pytorch/pytorch/issues/119238

Here's what it looks like now:

```
$ TORCH_LOGS=+torch._dynamo.convert_frame python a.py
[2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG] torchdynamo start compiling f /data/users/ezyang/b/pytorch/a.py:3, stack (elided 5 frames):
[2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG]   File "/data/users/ezyang/b/pytorch/a.py", line 7, in <module>
[2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG]     f(torch.randn(2))
[2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG]   File "/data/users/ezyang/b/pytorch/torch/_dynamo/eval_frame.py", line 453, in _fn
[2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG]     return fn(*args, **kwargs)
[2024-02-05 18:52:07,248] [0/0] torch._dynamo.convert_frame: [DEBUG]
$ cat a.py
import torch

@torch.compile
def f(x):
    return x * 2

f(torch.randn(2))
```

The eval_frame frame is intentionally present, since what happens is you run the torch.compile wrapper, and then you actually hit the user frame to be compiled.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119251
Approved by: https://github.com/yanboliang, https://github.com/mlazos
2024-02-06 17:40:07 +00:00
039fbeb016 [dynamo] fix functools.reduce() function with None as initial (#116398)
The `initial` argument in `functools.reduce` can be `None`.

```python
initial_missing = object()

def reduce(function, iterable, initial=initial_missing, /):
    it = iter(iterable)
    if initial is initial_missing:
        value = next(it)
    else:
        value = initial
    for element in it:
        value = function(value, element)
    return value
```

Reference:

- python/cpython#102759

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116398
Approved by: https://github.com/Skylion007
2023-12-25 21:23:28 +00:00
92e3f45f0e Revert "[dynamo] Refactor test cross importing (#113242)"
This reverts commit 4309d38f5d33530cbd875bded551e3fc08286c5d.

Reverted https://github.com/pytorch/pytorch/pull/113242 on behalf of https://github.com/huydhn due to Sorry for reverting your stack, but it is failing to list test internally with buck2 ([comment](https://github.com/pytorch/pytorch/pull/113242#issuecomment-1811674395))
2023-11-15 01:53:07 +00:00
4309d38f5d [dynamo] Refactor test cross importing (#113242)
Having tests import tests is a bit annoying because fbcode/oss have different paths.  This moves that stuff into a helper function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113242
Approved by: https://github.com/yanboliang
2023-11-11 03:17:35 +00:00
59592389fc Revert "[dynamo] Refactor test cross importing (#113242)"
This reverts commit 8858edad656f505728c9810093f796f96e1285cb.

Reverted https://github.com/pytorch/pytorch/pull/113242 on behalf of https://github.com/PaliC due to this diff appears to be causing inductor failures internally ([comment](https://github.com/pytorch/pytorch/pull/113242#issuecomment-1805132719))
2023-11-10 05:43:08 +00:00
8858edad65 [dynamo] Refactor test cross importing (#113242)
Having tests import tests is a bit annoying because fbcode/oss have different paths.  This moves that stuff into a helper function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113242
Approved by: https://github.com/yanboliang
2023-11-09 01:36:27 +00:00
aa649f713f [dynamo, test] remove #ops comparison to fx.symbolic_trace from dynamo standard_test (#112420)
Fix https://github.com/pytorch/pytorch/issues/112230 by removing the comparison of number of ops in dynamo vs. fx.symbolic_trace. A number of tests fail in `test_functions.py` fail because the number of ops is no longer the same, but this seems to be acceptable behavior by dynamo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112420
Approved by: https://github.com/jansel, https://github.com/int3
2023-10-31 19:55:47 +00:00
a26cb0a3f2 [dynamo] Enable typechecking for testing.py (#112129)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112129
Approved by: https://github.com/Skylion007
ghstack dependencies: #111894, #111992, #112031, #112127, #112128
2023-10-27 18:00:56 +00:00
cc9b7bb85c [reland] [inductor] fix a max-autotune rng state related bug (#111381)
reland https://github.com/pytorch/pytorch/pull/109828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111381
Approved by: https://github.com/lezcano
2023-10-17 19:16:36 +00:00
d9627c4264 Revert "[inductor] fix a max-autotune rng state related bug (#109828)"
This reverts commit 3663436db31bd3cebcb76efe05d8355553a05c57.

Reverted https://github.com/pytorch/pytorch/pull/109828 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the rocm failure looks legit. There is also another numpy import error when running dynamo test on CPU ([comment](https://github.com/pytorch/pytorch/pull/109828#issuecomment-1732423883))
2023-09-23 22:35:37 +00:00
3663436db3 [inductor] fix a max-autotune rng state related bug (#109828)
Fix https://github.com/pytorch/pytorch/issues/109736 .

HF pin move causes regression on accuracy check for HF models on the dashboard. Manually reverting the HF PR ( https://github.com/huggingface/transformers/pull/24696/files ) could recover, but this may hide some real issue. I happen to found that using a warm matmul max-autotune cache can work around the issue. Or putting it in another way:
- make all calls to check_cache cache miss repro the issue
- make all cals to check_cache cache hit works around the issue

I did some sort of 'bisect' to force halving the amount of cache miss each time while still make sure we can repro. Luckily reducing to a single cache miss still repro the issue. With more debugging, it turns out that it's the call to `torch.randn` on cuda device causing the problem.

The fix is to make sure  we restore the rng state when we generate random inputs for max-autotune benchmarking.

TBH, I can not fully explain the root cause although I know it's caused by rng state change.  AOTAutograd already has some logic to preserve rng state. And I can not repro the issue in unit tests. I have a few guess why the RNG state is not restored in the first place after we generate random inputs for max-autotune:
- maybe AOTAutograd misses some corner case to preserve the rng state
- maybe for the failed models, there are some eager fallback that's not handled by inductor. And if those fallback calles random number related APIs, we will see the issue. But again I don't find a good way to simulate this.

Repro:

```
TORCHINDUCTOR_BENCHMARK_KERNEL=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 CUDA_VISIBLE_DEVICES=3 time python benchmarks/dynamo/huggingface.py --backend inductor --amp --accuracy --only PLBartForCausalLM --training --cold-start-latency
```

We always repro the issue without the PR but pass the accuracy check with the PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109828
Approved by: https://github.com/eellison
2023-09-23 00:58:10 +00:00
54e73271c7 When patching dynamic shapes test class, don't run the original tests (#108681)
redo of https://github.com/pytorch/pytorch/pull/103523

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108681
Approved by: https://github.com/ezyang
2023-09-07 02:13:59 +00:00
a9dca53438 NumPy support in torch.compile (#106211)
RFC: https://github.com/pytorch/rfcs/pull/54
First commit is the contents of https://github.com/Quansight-Labs/numpy_pytorch_interop/

We have already been using this in core for the last few months as a external dependency. This PR pulls all these into core.

In the next commits, I do a number of things in this order
- Fix a few small issues
- Make the tests that this PR adds pass
- Bend backwards until lintrunner passes
- Remove the optional dependency on `torch_np` and simply rely on the upstreamed code
- Fix a number dynamo tests that were passing before (they were not tasting anything I think) and are not passing now.

Missing from this PR (but not blocking):
- Have a flag that deactivates tracing NumPy functions and simply breaks. There used to be one but after the merge stopped working and I removed it. @lezcano to investigate.
- https://github.com/pytorch/pytorch/pull/106431#issuecomment-1667079543. @voznesenskym to submit a fix after we merge.

All the tests in `tests/torch_np` take about 75s to run.

This was a work by @ev-br, @rgommers @honno and I. I did not create this PR via ghstack (which would have been convenient) as this is a collaboration, and ghstack doesn't allow for shared contributions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106211
Approved by: https://github.com/ezyang
2023-08-11 00:39:32 +00:00
186352a625 [inductor] Make autotune_process.py pass mypy (#105791)
`TensorMeta.from_irnodes` handles either a single `IRNode` or a tuple or list of them. I tried to express this with overloading, but because this file is in MYPYNOFOLLOW, the `IRNode` subclasses become `Any`, which causes the overloads to be overlapping.

This changes the type of the argument to `benchmark_in_sub_process` to the more specific `TritonTemplateCaller`, since that one has the `bmreq` member and existing docstrings indicate that only the triton template benchmark is handled.

The `rand_strided` call caused a mypy error because the default value for device was a string. This is fixed by adding type hints to `rand_strided` in `torch/_dynamo/testing.py`. Likewise, the return value of `PyCodeCache.load_by_key_path` can be inferred from the type hint on `PyCodeCache.cache`.

Fixes one part of #105230

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105791
Approved by: https://github.com/jansel, https://github.com/Skylion007
2023-07-31 23:58:38 +00:00
920b446da9 dynamo: support disable_saved_tensors_hooks (#104869)
Functorch transforms use this context manager which will lead to graph-breaks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104869
Approved by: https://github.com/zou3519
2023-07-26 07:27:37 +00:00
2385dad4b3 Enable automatic_dynamic_shapes by default (#103623)
Some notes:

* I now manually turn off `_generate` jobs from running with cudagraphs, as it is unrealistic to expect to cudagraph autoregressive generation up to max sequence length, this would imply compiling the entire unrolled sequence generation. Concretely, cm3leon_generate was timing out post this change, likely due to the compile time slowdown of dynamic shapes ON TOP OF accidentally unrolling all the loops
* A few torch._dynamo.reset tactically inserted to force recompiles on tests that expected it
* expectedFailureAutomaticDynamic flip into patching automatic_dynamic_shapes=False

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103623
Approved by: https://github.com/voznesenskym
2023-07-05 00:25:02 +00:00
75dab587ef [dynamo] FSDP + AC + torch.compile (#103953)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103953
Approved by: https://github.com/wanchaol
2023-06-24 01:40:56 +00:00
7ce932a92c Add signpost_event to dynamic_shapes (#103882)
Added two signpost_event calls to torch.fx.experimental.symbolic_shapes, one for produce_guards (where we can give stats like how many free symbols and how many guards produced) and the other is for evaluate_expr after freeze (so we can look for cases where we're improperly discarding guards in backwards.)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103882
Approved by: https://github.com/Skylion007
2023-06-21 13:26:21 +00:00
ed3a61afcc Add automatic_dynamic_shapes test configuration (#103598)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103598
Approved by: https://github.com/Skylion007
2023-06-15 19:55:57 +00:00
bc6ec97e02 Switch dynamic_shapes to True by default (#103597)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597
Approved by: https://github.com/voznesenskym
2023-06-15 15:16:20 +00:00
aece6705d1 Move locals/globals to output graph, make it easier to access them anywhere (#103456)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103456
Approved by: https://github.com/jansel
2023-06-14 20:04:33 +00:00
2f5fef5912 Refactor tests for dynamic shapes (#103542)
First, infra improvements: new combinator `expectedFailureDynamic` which subsumes expectedFailure calls in test_dynamic_shapes.py. It's just nicer to have these right with the test. Implementation in torch/_dynamo/testing.py and it works by putting an attr on the test, which is then converted into a real expectedFailure when we actually generate the dynamic shapes test class

Next, some housekeeping:
* test/dynamo/test_unspec.py accidentally was running mostly statically due to the `assume_static_by_default` config flip. Don't assume static by default and xfail some tests which regressed in that time.
* New test file test/dynamo/test_config.py, for testing permutations of configuration options. `test_dynamic_shapes` got moved there.

Finally, grinding through tests in a way that will make them more compatible with dynamic by default:
* If the test explicitly requires dynamic_shapes=False, remove that patch (and probably xfail it)
* If the test checks dynamic_shapes internally, remove that test and patch the test so it ALWAYS runs with dynamic_shapes (this is not coverage loss because we're going to switch the default)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103542
Approved by: https://github.com/anijain2305
2023-06-14 02:04:54 +00:00
687afeb686 [dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849)
Issue: #93684

# Problem

Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations.

# Design (as I know it)

* Use torch_np.ndarray(a wrapper of tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attributes and methods calls, on ndarray, to torch_np.ndarray equivalent.

This PR adds `NumpyTensorVariable` and supports:
1.  tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`

Next PR will handle returning `np.ndarray` and add support for ndarray methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
2023-04-27 16:18:35 +00:00
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow for simple generator expressions which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
220712f4de Fix torch.compile() on a skipped module (#98894)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98894
Approved by: https://github.com/xw285cornell
2023-04-22 16:10:55 +00:00
88c8c2b71b [dynamo 3.11] implement 3.11 exceptiontable (#96511)
Summary of changes:
- Add CPython exceptiontable parsing/assembling functions in torch/_dynamo/bytecode_transformation.py, based on https://github.com/python/cpython/blob/3.11/Objects/exception_handling_notes.txt.
- Add optional `exn_tab_entry` field to dynamo `Instruction`s in torch/_dynamo/bytecode_transformation.py in order to virtualize exception table entries (start, end, target instructions).
- Add checks guarding against duplicate instructions in dynamo, so that jump/exceptiontable targets are unambiguous. See `get_indexof` in torch/_dynamo/bytecode_analysis.py. Ensure that bytecode generation throughout dynamo does not generate duplicate instructions.
- Allow dynamo bytecode generation logic to generate nested exception table entries for developer convenience. CPython expects entries to not overlap, so we flatten nested entries during assembly in torch/_dynamo/bytecode_transformation.py:compute_exception_table.
- Simulate the block stack in torch/_dynamo/symbolic_convert.py. CPython removed the block stack in 3.11, but dynamo needs it in order to keep track of active contexts. So we simulate the block stack as before by looking at exceptiontable entries in order to determine the current blocks.
- Update context codegen in torch/_dynamo/resume_execution.py. The `SETUP_FINALLY` bytecode, which conveniently had a jump target to the finally block, was removed in 3.11, so we need to keep track of the jump target of the finally block using exceptiontables. Generating resume functions is more difficult since the original exceptiontable entries pointing to old cleanup code need to be modified to point to new cleanup code.
- Fix a push_null bug in torch/_dynamo/variables/functions.py introduced by https://github.com/pytorch/pytorch/pull/98699

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96511
Approved by: https://github.com/jansel, https://github.com/yanboliang, https://github.com/albanD
2023-04-18 07:53:24 +00:00