Commit Graph

58597 Commits

Author SHA1 Message Date
79e14f8fd6 [better_engineering][multiplatform] Replace host_info() check with select for default_compiler_flags (#98306)
Summary: Same as title

Test Plan: CI

Differential Revision: D44667769

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98306
Approved by: https://github.com/priyaramani, https://github.com/malfet
2023-04-07 15:39:38 +00:00
390c51bf87 Skip nnmodule hook guards by default (#98371)
This PR makes basic nnmodule forward hooks work by default, without any overhead. But it leaves silent correctness issues if users modify or remove their hooks later, so it also emits a warning.

- the usual case is to not use hooks, so avoid guard overhead here
- registering any hook before compile will trigger a warning about hook support
- registering a hook later (or removing one) requires user knowledge and opting in,
  currently this isn't warnable (but maybe we can observe compiled nnmodules to make it
  warnable).

Why skip hook guards by default instead of not tracing __call__/hooks by default?
- avoid having a mode flag that alters dynamo tracing behavior (harder to test both codepaths
  in CI with full coverage)
- the most basic hook use case (registering a hook before compile and never removing it)
  will work by default with this PR, whereas it would require explicit enablement and incur
  overhead under the 'not tracing __call__' proposal (see the sketch below).
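
A minimal sketch of the default-supported path (hook registered before compile; standard `torch.compile` and forward-hook APIs, names illustrative):

```python
import torch

model = torch.nn.Linear(4, 4)

def scale_output(module, inputs, output):
    # The hook's effect is traced into the compiled graph.
    return output * 2

# Registering before compile is the case that works by default with this PR.
model.register_forward_hook(scale_output)
compiled = torch.compile(model)
out = compiled(torch.randn(2, 4))

# Removing or replacing the hook afterwards is NOT guarded by default,
# so the compiled module may silently keep the old hook behavior.
```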

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98371
Approved by: https://github.com/jansel
2023-04-07 15:10:51 +00:00
46d765c15e [devX] make labels only count their own occurrences (#98551)
Small QoL improvement so that add_numbered_label works more intuitively. Now, if we push different labels, instead of getting `[reverted, mergedX2, revertX3, mergedX4, revertedX5, mergedX6]` we get `[reverted, merged, revertX2, mergedX2, revertedX3, mergedX3]`.
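
A hypothetical sketch of the per-label counting behavior (the real `add_numbered_label` lives in PyTorch's CI tooling; this helper body is an assumption for illustration):

```python
def add_numbered_label(labels: list, base: str) -> None:
    # Count only prior occurrences of this base label, not the total
    # number of labels pushed so far.
    n = sum(1 for l in labels if l == base or l.startswith(base + "X")) + 1
    labels.append(base if n == 1 else f"{base}X{n}")

labels = []
for base in ["merged", "reverted", "merged"]:
    add_numbered_label(labels, base)
print(labels)  # ['merged', 'reverted', 'mergedX2']
```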

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98551
Approved by: https://github.com/huydhn
2023-04-07 08:30:46 +00:00
d06662fb57 Add ephemeral merging label (#98543)
Addresses https://github.com/pytorch/test-infra/issues/3950

Test Plan: Ran a dry run on this PR. The label showed up while trying to merge:

![Screenshot: the "merging" label visible while the merge is attempted](https://user-images.githubusercontent.com/13758638/230514276-1ac70b58-d2d1-4e4b-892b-a957bf156063.png)

And then disappeared after failing:

![Screenshot: the "merging" label removed after the merge failed](https://user-images.githubusercontent.com/13758638/230514470-38b15ec7-cfd9-4efe-b6e8-0f9af5577c62.png)

There's also the trail of adding and removing the "merging" label at the bottom

Notes: This is occasionally buggy. For example, when the merge failed while I was editing this textbox, the label did not disappear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98543
Approved by: https://github.com/malfet
2023-04-07 08:24:54 +00:00
d643a00efc inductor(CPU): support dynamic shape for onednn fusion path (#97230)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97230
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/jansel
2023-04-07 06:53:31 +00:00
77d9742c24 [Inductor] Fix bug in lowering.slice_ when negative start is out of range (#98517)
Fixes an error from the 14k GitHub models.
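
A hedged guess at the failing pattern (the actual 14k-models repro is not shown here; this illustrates the general negative-start-out-of-range case):

```python
import torch

def f(x):
    # start == -10 is out of range for a size-4 dim and must clamp, not error
    return x[-10:]

x = torch.arange(4)
assert torch.equal(torch.compile(f)(x), f(x))  # compiled result matches eager
```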

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98517
Approved by: https://github.com/ngimel
2023-04-07 06:48:51 +00:00
45a2f6b70f Revert "Reduce includes of CUDACachingAllocator.h (#97072)"
This reverts commit 1bcb88089468a6ebc667bd76256c4dd6f58b7ee3.

Reverted https://github.com/pytorch/pytorch/pull/97072 on behalf of https://github.com/weiwangmeta due to breaking internal builds
2023-04-07 06:15:11 +00:00
5c8fea5647 Reduce overhead in CUDAGraph Trees (#98529)
Significantly reduces the overhead of constructing Tensors and Storages and of checking Storage liveness. Removes the regression for the HF models I tested, and removes 75% of the overhead of the extremely overhead-bound resnet50 training we have in torchbench (0.91x base commit, 1.02x torchinductor default, 1.16x this PR, 1.25x previous cudagraphs impl).

This PR takes care of all of the lower-hanging fruit.

- Computes storage aliasing at record time instead of at runtime. We no longer need a runtime storage cache; we can instead index directly into the existing alias if there is one, or construct a new Storage.

- Moves the heavyweight C++ calls into a batch: getting storage weakrefs and constructing tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98529
Approved by: https://github.com/jansel, https://github.com/ngimel
2023-04-07 05:46:08 +00:00
616f50da3a [quant][pt2e] QNNPackQuantizer support annotation for resnet18 (#98507)
Summary:
This PR adds annotation support for conv2d + relu, linear, maxpool2d, add, and add + relu, so that we can successfully quantize resnet18 with the prepare_pt2e_quantizer API and get the same result as FX graph mode quantization.

Test Plan:
python test/test_quantization.py TestQuantizePT2EModels.test_resnet18_with_quantizer_api

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98507
Approved by: https://github.com/vkuzo
2023-04-07 04:27:21 +00:00
5a537e291d refactor(add privateuseone folder in aten/src/ATen): add a PrivateUse… (#98127)
Add a PrivateUse1 folder to contain all the feature adaptations for PrivateUse1 under ATen, for example GetGeneratorPrivate, which is used by third-party backends to register their own Generator implementations. This makes it easier for us to manage these features centrally, and it will make adaptation more convenient for different backend vendors. For more info: https://github.com/pytorch/pytorch/issues/98073
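
A heavily hedged sketch of the out-of-tree backend flow this folder serves; `torch.utils.rename_privateuse1_backend` is the public rename hook (its availability at exactly this commit is an assumption), and the device name is illustrative:

```python
import torch

# Give the reserved PrivateUse1 dispatch key a user-facing name.
torch.utils.rename_privateuse1_backend("my_device")

# After a C++ extension registers its kernels (and, with this PR, its own
# Generator) under PrivateUse1, tensors can target the custom backend:
#   x = torch.empty(4, device="my_device")
```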

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98127
Approved by: https://github.com/bdhirsh
2023-04-07 03:43:16 +00:00
29608fd28d [pt2][inductor] hardcode autotuning names (#98351)
Summary: Switch to hardcoded autotuning names; we want consistency in case the default choice changes.

Test Plan: CI

Differential Revision: D44643318

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98351
Approved by: https://github.com/jansel
2023-04-07 03:40:33 +00:00
3d8ead7ee1 [vision hash update] update the pinned vision hash (#98367)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98367
Approved by: https://github.com/pytorchbot
2023-04-07 02:56:14 +00:00
1fb8428d70 Fix off-by-1 error in dynamo coverage stats (#98558)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98558
Approved by: https://github.com/malfet
2023-04-07 02:52:22 +00:00
2161be08c4 Disable test_torchinductor_dynamic_shapes on ASAN (#98544)
This is yet another wrong shard number calculation on ASAN causing flakiness. I figure that we don't really need to run this test on ASAN, so let's disable it. There is also an ongoing discussion about running ASAN periodically.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98544
Approved by: https://github.com/malfet
2023-04-07 02:27:52 +00:00
152d65ae1d [reland][inductor] Enable CudaWrapperCodeGen for non-AOT mode (#98534)
Summary: This is a reland of #98264.

When _inductor.config.cpp_wrapper is specified, we run a
two-pass wrapper codegen to generate wrapper code in C++ that calls
cuLaunchKernel to launch the pre-compiled CUDA kernels, and then call
load_inline to load that generated wrapper back into the Python world.
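
A sketch of enabling the path described above; the `cpp_wrapper` flag is named in this commit message, while the toy function and device choice are illustrative:

```python
import torch
import torch._inductor.config as inductor_config

inductor_config.cpp_wrapper = True  # triggers the two-pass C++ wrapper codegen

@torch.compile
def f(x):
    return x.sin() + x.cos()

# The generated C++ wrapper launches the precompiled kernels via
# cuLaunchKernel and is loaded back into Python with load_inline.
f(torch.randn(8, device="cuda"))
```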

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98534
Approved by: https://github.com/huydhn
2023-04-07 02:04:03 +00:00
d4dbdee528 Update _linux-test.yml (#98317)
Skip "setup-ssh" for now for a100 runners from GCP as it frequently encounter issues like "connect ETIMEDOUT 173.231.16.75:443" Every day about 10 occurrences

Examples for just today so far:
| Time | Workflow | Job |
| -- | -- | -- |
| 2023-04-04T15:07:50.916331Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4609056040/jobs/8146321650 |
| 2023-04-04T15:03:56.914692Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4609010125/jobs/8146217819 |
| 2023-04-04T14:39:58.004717Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4608784966/jobs/8145641764 |
| 2023-04-04T14:19:28.854825Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4608561116/jobs/8145147916 |
| 2023-04-04T06:15:39.241848Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4604422106/jobs/8135687673 |
| 2023-04-04T06:10:21.056131Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4604406947/jobs/8135611094 |
| 2023-04-04T05:34:50.908482Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4604198332/jobs/8135201048 |
| 2023-04-04T03:04:36.628201Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4603162241/jobs/8133620905 |
| 2023-04-04T01:49:27.119830Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4600897505/jobs/8132760483 |
| 2023-04-04T01:18:06.141437Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4602745871/jobs/8132387930 |
| 2023-04-04T00:38:30.610770Z | inductor | https://github.com/pytorch/pytorch/actions/runs/4602537869/jobs/8131938265 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98317
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-04-07 01:51:02 +00:00
a0a0b0c701 Don't decompose dropout so it can be pattern-matched (#97931)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97931
Approved by: https://github.com/ngimel
2023-04-07 01:15:24 +00:00
482f87a7bc [quantized] Fix return values of _get_name() in quantized ConvTranspose (#97678)
This PR fixes incorrect return values of _get_name() in quantized `ConvTranspose?d`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97678
Approved by: https://github.com/vkuzo, https://github.com/kit1980
2023-04-07 01:14:42 +00:00
88208c6fdf [inductor][cpp] fix mul for uint8 (#98473)
Fixes #98149

The type of `mul`'s output is inconsistent with the type of its input: in C++, arithmetic on `unsigned char` operands is promoted to `int`, so the result must be cast back to the operand type. This PR fixes the type of `mul`'s output accordingly.

Here is the generated code for the newly added `pow+cos` test case; `tmp4` is 1024 before the fix and 0 after it.
#### Before fixing
```
auto tmp0 = in_ptr0[static_cast<long>(0)];     // tmp0 is unsigned_char
auto tmp1 = tmp0 * tmp0;                       // tmp1 is int
auto tmp2 = tmp1 * tmp1;                       // tmp2 is int
auto tmp3 = tmp2 * tmp0;                       // tmp3 is int
auto tmp4 = static_cast<float>(tmp3);          // tmp4 is float
auto tmp5 = std::cos(tmp4);
out_ptr0[static_cast<long>(0)] = tmp5;
```

#### After fixing
```
auto tmp0 = in_ptr0[static_cast<long>(0)];     // tmp0 is unsigned_char
auto tmp1 = decltype(tmp0)(tmp0 * tmp0);       // tmp1 is unsigned_char
auto tmp2 = decltype(tmp1)(tmp1 * tmp1);       // tmp2 is unsigned_char
auto tmp3 = decltype(tmp2)(tmp2 * tmp0);       // tmp3 is unsigned_char
auto tmp4 = static_cast<float>(tmp3);          // tmp4 is float
auto tmp5 = std::cos(tmp4);
out_ptr0[static_cast<long>(0)] = tmp5;
```
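
For reference, an eager-vs-compiled check in the spirit of the added `pow+cos` test (the exact test body is an assumption); with `x = 4`, eager uint8 arithmetic wraps so `4**5 % 256 == 0`, matching the `tmp4` values quoted above:

```python
import torch

def f(x):
    # x**5 computed via repeated uint8 multiplication, then cast to float
    return torch.cos((x * x * x * x * x).to(torch.float32))

x = torch.tensor([4], dtype=torch.uint8)
eager = f(x)                    # uint8 wraps: 4**5 % 256 == 0 -> cos(0.0)
compiled = torch.compile(f)(x)  # must match eager after the fix
assert torch.allclose(eager, compiled)
```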

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98473
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/jansel
2023-04-07 01:10:36 +00:00
06eaa0970b [Resubmit] Don't crash on retrieveDesyncReport (#98470)
Per title

Differential Revision: [D44736409](https://our.internmc.facebook.com/intern/diff/D44736409/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98470
Approved by: https://github.com/XilunWu
2023-04-07 01:10:30 +00:00
4adba70cc6 [inductor][easy] use num_stages=1 for reduction (#98524)
Since num_stages only matters for matmul and not for pointwise/reduction kernels, this PR sets num_stages to 1 uniformly for all reductions.
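
For context, a sketch of where this knob lives in Triton's Python API (the `XBLOCK`/`BLOCK_*` meta-parameter names are inductor-style assumptions): `num_stages` pipelines loads across loop iterations, which pays off in matmul's K-loop but buys nothing for pointwise/reduction kernels.

```python
import triton

# Extra pipeline stages help matmul-like loops; reductions get num_stages=1.
reduction_cfg = triton.Config({"XBLOCK": 1024}, num_warps=8, num_stages=1)
matmul_cfg = triton.Config(
    {"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32}, num_warps=4, num_stages=4
)
```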

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98524
Approved by: https://github.com/ngimel
2023-04-07 01:06:07 +00:00
86cb7f40a9 Fix the missing PATH in mps workflow after #98522 (#98559)
This was missed in #98522
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98559
Approved by: https://github.com/malfet
2023-04-07 00:15:50 +00:00
22411b6f02 Revert "[dynamo 3.11] enable dynamo unittests in 3.11 (#98104)"
This reverts commit 0066f3405f290ab6ef379abea6945058f8eb7ce5.

Reverted https://github.com/pytorch/pytorch/pull/98104 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it is failing the CPU 3.11 test in trunk (0066f3405f). This is probably a land race
2023-04-07 00:05:30 +00:00
481ecffb5e Add c10d UCC tests (#88110)
Creates the c10d tests for UCC equivalent to https://github.com/pytorch/pytorch/blob/master/test/distributed/test_c10d_gloo.py and https://github.com/pytorch/pytorch/blob/master/test/distributed/test_c10d_nccl.py. Uses test_c10d_gloo.py as the reference and adds all the common ops. A more detailed comparison of the available ops is here: https://docs.google.com/document/d/1yPsa_X9EiEiqo-j2Yn7ierhccBtEjwoqC-B7-amI0MI/edit?usp=sharing

Also removes an extra line for the ProcessGroupUCC.cpp barrier blocking wait that was duplicated when merging https://github.com/pytorch/pytorch/pull/85047.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88110
Approved by: https://github.com/zasdfgbnm, https://github.com/kit1980, https://github.com/kwen2501, https://github.com/malfet
2023-04-06 23:51:27 +00:00
8a29afe98a [RFC] Add warning about object-based collectives for GPU tensors to docs. (#97702)
Using GPU tensors in these collectives has caused SEVs, user
confusion, and slowness in the past. These APIs were only designed to
communicate arbitrary Python objects; GPU tensors should either be copied
to CPU first or use the regular collectives. Add a warning indicating so.
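
A short sketch of that guidance (assuming an initialized process group; the tensors and objects are illustrative):

```python
import torch
import torch.distributed as dist

obj = {"step": 1}                        # arbitrary picklable Python object
out = [None] * dist.get_world_size()
dist.all_gather_object(out, obj)         # intended use: CPU-side objects

t = torch.randn(4, device="cuda")
dist.all_reduce(t)                       # prefer tensor collectives for GPU data
dist.all_gather_object(out, t.cpu())     # or move to CPU before the object API
```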

Differential Revision: [D44435849](https://our.internmc.facebook.com/intern/diff/D44435849/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97702
Approved by: https://github.com/kumpera
2023-04-06 23:47:35 +00:00
eb5da4df8a Speed up LossCTC.cu (#97269)
For these two kernels, `grid.x == 1` is enough. `grid.x > 1` leads to repeated computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97269
Approved by: https://github.com/ngimel, https://github.com/malfet
2023-04-06 23:44:25 +00:00
a2bb2fae1b Add Autocast support to MatMult through explicit cast (#98346)
Fixes external issue https://github.com/microsoft/onnx-converters-private/issues/157

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98346
Approved by: https://github.com/BowenBao
2023-04-06 23:19:52 +00:00
0066f3405f [dynamo 3.11] enable dynamo unittests in 3.11 (#98104)
Enable most dynamo unittests for 3.11. There are a few tests that are skipped due to failures that will be addressed in upcoming PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98104
Approved by: https://github.com/yanboliang, https://github.com/voznesenskym, https://github.com/albanD, https://github.com/jansel, https://github.com/jerryzh168, https://github.com/malfet
2023-04-06 23:15:48 +00:00
dbfc4df075 Add $CONDA_ENV/bin to PATH on MacOS (#98522)
This PR explicitly adds $CONDA_ENV/bin to the MacOS PATH, so that the correct Python is always detected and used. $CONDA_ENV is always set to the correct value in setup-miniconda https://github.com/pytorch/test-infra/blob/main/.github/actions/setup-miniconda/action.yml#L141

### 🤖 Generated by Copilot at b4de81a

This pull request fixes the conda-pip environment mismatch for the macOS build and test workflows by using consistent pip requirements files. It also adds a conditional block to the `.github/workflows/_mac-test-mps.yml` file to enable the test MPS job.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98522
Approved by: https://github.com/malfet
2023-04-06 21:34:52 +00:00
531b8e8f1e stop using caffe2/core/logging.h forwarding header in serialize lib (#98168)
No need to create a library for this useless header.

Differential Revision: [D44612668](https://our.internmc.facebook.com/intern/diff/D44612668/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98168
Approved by: https://github.com/PaliC
2023-04-06 21:27:07 +00:00
fdb9441e7e Stop recursion on trivial replacement (#97903)
Pattern replacement behaves incorrectly when the replacement pattern maps inputs to outputs (such a pattern can be used to remove redundant code). The current code in `torch.fx.subgraph_rewriter._replace_pattern` causes the list of replacement nodes to include the entire graph before that node, resulting in an exponential slowdown as recursive calls traverse the entire graph multiple times.

The proposed fix is to add a check in `_replace_pattern` that skips the call to `get_replacement_nodes` when a returned node is one of the match's placeholders:
```python
        for ret_node in copied_returning_nodes:
            if ret_node in match.placeholder_nodes:
                # An input mapped directly to an output: take it as-is
                # instead of recursing through the entire upstream graph.
                replacement_nodes.append(ret_node)
            else:
                get_replacement_nodes(ret_node)
```
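
A self-contained sketch of the trivial-replacement case that previously blew up (the pattern and function are illustrative):

```python
import torch
from torch.fx import symbolic_trace, subgraph_rewriter

def pattern(x):
    return x + 0          # redundant op to be removed

def replacement(x):
    return x              # maps an input straight to an output

def f(x):
    return (x + 0) * 2

gm = symbolic_trace(f)
subgraph_rewriter.replace_pattern(gm, pattern, replacement)
print(gm.code)            # x * 2, with the redundant add gone
```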

Fixes #97817

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97903
Approved by: https://github.com/angelayi
2023-04-06 20:49:08 +00:00
ca1fe9bae5 remove no-op C10_DISABLE_NUMA preprocessor flag (#98243)
Nothing reads this, so setting it does nothing.

Differential Revision: [D44642070](https://our.internmc.facebook.com/intern/diff/D44642070/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44642070/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98243
Approved by: https://github.com/PaliC
2023-04-06 20:38:10 +00:00
e4c8c75583 [PG NCCL] Add TDD, NCCL_DEBUG log (#97692)
Print these env var settings during setup for easier debugging.

Differential Revision: [D44430875](https://our.internmc.facebook.com/intern/diff/D44430875/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97692
Approved by: https://github.com/kumpera
2023-04-06 20:37:46 +00:00
03a428a5b2 [ONNX] Introduce 'Functionalization' for fx exporter (#98245)
![Overview diagram of the functionalization flow in the fx exporter](https://user-images.githubusercontent.com/9376104/229648898-7e85efc8-143f-42f9-93e0-298a8f86c0a1.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98245
Approved by: https://github.com/wschin, https://github.com/titaiwangms
2023-04-06 20:26:50 +00:00
edebe413d3 [inductor] fix scatter fallback and fallback in deterministic mode (#98339)
Fixes https://github.com/pytorch/pytorch/issues/93537

Adds `ir.ScatterFallback` to correctly handle the mutation of the scatter/scatter_reduce fallback, also handles the case where `src` is a scalar, and lastly falls back in deterministic mode.
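
A small sketch of the cases the fallback now covers (standard scatter APIs; the compile wrapper is illustrative):

```python
import torch

def f(x, idx, src):
    a = x.scatter(0, idx, src)   # tensor `src`
    b = x.scatter(0, idx, 2.0)   # scalar `src` -- the extra case handled here
    return a + b

x = torch.zeros(8)
idx = torch.tensor([0, 1, 1, 3])
src = torch.ones(4)

torch.use_deterministic_algorithms(True)  # forces the deterministic-mode fallback
print(torch.compile(f)(x, idx, src))
```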

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98339
Approved by: https://github.com/jansel
2023-04-06 19:43:17 +00:00
68cb06c752 Make gen_annotated_args support kwargs (#98396)
This PR addresses the issue seen in PR #97417, where a newly added op requires `kwargs`; currently tools/autograd/gen_annotated_fn_args.py does not support `kwargs`, and only `func_args` are generated for test_overrides.py.

The PR adds a new field, `is_kwarg_only`, to each argument, indicating whether it is kwarg-only or not. See the example:
```
annotated_args = {
    torch._C._VariableFunctions._cast_Byte: [{'is_kwarg_only': 'False', 'name': 'self', 'simple_type': 'Tensor'}],
    ...
```

The full comparison of the generated file `annotated_fn_args.py` can be found here:
  - **Before**: [P681991116](https://www.internalfb.com/phabricator/paste/view/P681991116)
  - **After**: [P681994218](https://www.internalfb.com/intern/paste/P681994218/)

Differential Revision: D44698310

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98396
Approved by: https://github.com/ezyang
2023-04-06 19:42:26 +00:00
fe99d39fbd migrate PyTorch to c10::bit_cast (#98418)
Use the standardized version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98418
Approved by: https://github.com/ezyang
2023-04-06 19:38:06 +00:00
213cec3c45 Revert "Add typing_extensions as MacOS ci dependency (#98522)"
This reverts commit e6e33488d3e7de4f58359b6c86b3c43fa33cbfc5.

Reverted https://github.com/pytorch/pytorch/pull/98522 on behalf of https://github.com/huydhn due to This needs rework
2023-04-06 19:37:38 +00:00
12f340dcd9 Add round as UserError (#98376)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98376
Approved by: https://github.com/anijain2305
2023-04-06 19:28:00 +00:00
e0b958f975 [SPMD] Allow IterGraph support a more general subgraph movement (#98360)
Resubmitting D44444398 due to a merge conflict.

The original assumption of IterGraph was very restrictive: users could only move a subgraph in which a single node takes input from external nodes. This PR fixes that limitation.

Differential Revision: [D44689730](https://our.internmc.facebook.com/intern/diff/D44689730/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44689730/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98360
Approved by: https://github.com/lessw2020
2023-04-06 19:13:37 +00:00
f228b3977b Revert "[inductor] Enable CudaWrapperCodeGen for non-AOT mode (#98264)"
This reverts commit 77f32eb6ccf9c276fba1724e463247930ef71ec3.

Reverted https://github.com/pytorch/pytorch/pull/98264 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but this is failing in trunk due to a NameError: fake_mode_from_tensors is not defined (67d1a77086). This is probably a land race
2023-04-06 19:00:09 +00:00
3b6e94cb8c [small] replace .format() with f-strings (#98514)
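
The mechanical change, for reference:

```python
name = "world"
old = "hello {}".format(name)  # before
new = f"hello {name}"          # after
assert old == new
```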
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98514
Approved by: https://github.com/awgu
2023-04-06 18:58:56 +00:00
0210481dcb Fix _like meta registrations (#98160)
The meta implementation for these _like functions is wrong whenever device != "meta" (it doesn't fill the memory!).
zeros_like is special due to sparse and is fixed directly by always filling it with zeros.
Every other one has a CompositeExplicit implementation, so I went with removing their meta registrations and tweaking the code to avoid infinite recursion.
I could do the same as for zeros_like (and add the proper filling for each), but that would duplicate the C++ logic and make the meta registrations non-trivial. I can do that instead of the removal if you prefer.

test_meta works fine with these fixes; relying on CI to see whether other tests break as well.
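
A small sketch of the invariant being fixed (illustrative only, not the actual registration code):

```python
import torch

x = torch.empty(4)
y = torch.zeros_like(x)                  # real device: memory must actually be zero-filled
assert (y == 0).all()

m = torch.zeros_like(x, device="meta")   # meta: shape/dtype only, nothing to fill
print(m.shape, m.dtype)
```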
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98160
Approved by: https://github.com/ezyang
2023-04-06 18:44:34 +00:00
dcb9440af9 [kineto] add SOFT_ASSERT when logging metadata (#98442)
Summary: Having a valid `kineto_activity_` before logging metadata is a crucial invariant, worthy of asserts.

Test Plan:
## Test with D44362040

Verify that we get SOFT_ASSERT logs before and after the diff

## Log
```
W0329 11:29:34.269824 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.270107 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.270385 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.270653 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.270941 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.271199 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.271476 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.271724 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.272003 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.272280 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.272553 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.272822 718148 profiler_kineto.cpp:122] Warning:  (function operator())
W0329 11:29:34.273092 718148 profiler_kineto.cpp:122] Warning:  (function operator())
```

Reviewed By: aaronenyeshi

Differential Revision: D44513152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98442
Approved by: https://github.com/aaronenyeshi
2023-04-06 18:39:13 +00:00
e394f6db5a Revert "Improve dynamo support for autograd.Function (#98158)"
This reverts commit 4716fa24115435fa87d04213382d757816b8f1f3.

Reverted https://github.com/pytorch/pytorch/pull/98158 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to break the MacOS trunk job 4716fa2411. The signal was missing from the PR because we disabled MacOS jobs yesterday due to https://github.com/pytorch/pytorch/issues/98362
2023-04-06 18:15:02 +00:00
e6e33488d3 Add typing_extensions as MacOS ci dependency (#98522)
MacOS jobs started to fail in trunk because of this missing dependency (938c5da61e), so I am adding it explicitly. Possibly a caching issue?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98522
Approved by: https://github.com/malfet
2023-04-06 17:48:25 +00:00
49b80c3ea2 [reland] remove typed StorageImpl::data() and StorageImpl::unsafe_data() (#98411)
Original commit changeset: a466b3cb6a0a

Original Phabricator Diff: D44629941

Differential Revision: [D44709004](https://our.internmc.facebook.com/intern/diff/D44709004/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98411
Approved by: https://github.com/ezyang
2023-04-06 17:42:48 +00:00
e663143871 [dynamo 3.11] fix 3.11.2 issues (#98364)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98364
Approved by: https://github.com/albanD
2023-04-06 17:37:25 +00:00
1bcb880894 Reduce includes of CUDACachingAllocator.h (#97072)
On my machine this reduces the number of files that include it from >200 to ~80, making rebuilds faster.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97072
Approved by: https://github.com/wanchaol
2023-04-06 17:22:35 +00:00
e085acc9f3 Cleanup Copy.cu logic (#97071)
Some of the peer-access logic specific to the cudaMallocAsync allocator is placed outside of the allocator itself. This PR refactors, documents, and encapsulates it while maintaining the same behavior.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97071
Approved by: https://github.com/ngimel, https://github.com/eellison
2023-04-06 17:22:35 +00:00