Commit Graph

77484 Commits

b2eb0e8c6a docker: Use miniforge, install from pip (#134274)
Switch the pytorch package installation to use our download.pytorch.org sources, which are better maintained.

Also switch the miniconda installation to miniforge to ensure backwards compatibility for users who expect the conda package manager to be installed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134274
Approved by: https://github.com/malfet, https://github.com/atalman

Co-authored-by: atalman <atalman@fb.com>
2024-08-22 23:20:22 +00:00
30d7e7a1cd [XPU] Fix patch for old llvm package error for triton xpu (#134204)
Fixes #134199

PR #133694 added a workaround that replaces the str `"https://tritonlang.blob.core.windows.net/llvm-builds/"` with `"https://oaitriton.blob.core.windows.net/public/llvm-builds/"` in `triton/python/setup.py`. However, in [newer versions of Triton](06e6799f4e) the URL has already been changed to `"https://oaitriton.blob.core....` and does not need to be replaced. Previously, this case would throw a runtime error.

This PR makes the `check_and_replace` logic not fail in that scenario: both the old link and the newer link now work.

Also note that `.ci/docker/common/install_triton.sh` does not need the fix, because its `sed` command is a no-op when the pattern is absent.
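
A minimal sketch of the tolerant replacement, assuming a simple read-patch-write helper (the body below is illustrative, not the actual implementation):

```
OLD_URL = "https://tritonlang.blob.core.windows.net/llvm-builds/"
NEW_URL = "https://oaitriton.blob.core.windows.net/public/llvm-builds/"

def check_and_replace(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        content = f.read()
    if OLD_URL in content:
        # Older Triton checkout: patch the stale mirror URL in place.
        with open(path, "w", encoding="utf-8") as f:
            f.write(content.replace(OLD_URL, NEW_URL))
    elif NEW_URL not in content:
        # Neither URL present: fail loudly instead of silently mis-patching.
        raise RuntimeError(f"no known LLVM mirror URL found in {path}")
    # Newer checkouts already contain NEW_URL, so there is nothing to do.
```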

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134204
Approved by: https://github.com/chuanqi129, https://github.com/EikanWang, https://github.com/atalman
2024-08-22 23:18:44 +00:00
629bd6f718 Update FlexAttention with masking semantic (#133373)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133373
Approved by: https://github.com/yanboliang
2024-08-22 22:50:33 +00:00
e7929809f3 [c10d][ez] Add comments to CudaEventCache class (#134172)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134172
Approved by: https://github.com/d4l3k, https://github.com/kwen2501
2024-08-22 22:44:12 +00:00
b319fa3fd9 [ONNX] Opt into ruff fmt (#134120)
Opt the ONNX directory into ruff format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134120
Approved by: https://github.com/XuehaiPan, https://github.com/Skylion007
2024-08-22 22:44:03 +00:00
25499de814 Remove ncclIdToCommMap_. (#133961)
This map structure serves no purpose, and it is incorrect in some cases, for example when the uniqueId is not broadcast to the other processes.
@exported-using-ghexport

Differential Revision: [D60966882](https://our.internmc.facebook.com/intern/diff/D60966882/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133961
Approved by: https://github.com/shuqiangzhang
ghstack dependencies: #133960
2024-08-22 22:06:25 +00:00
b0cf287b46 [export][training ir migration] Fix getitem not exist (#134259)
Summary:
Make quantization tests compatible with the new training IR.

With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so tests need to be fixed to not check for the getitem node.

We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training IR and False otherwise. This way, the code supports both the training IR and the old IR.

For now, we are just rolling out the training ir for fbcode internal tests.
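
A minimal sketch of how the helper lets tests branch between the two IRs (the helper body below is a stand-in; its real module location is not shown here):

```
import torch

def capture_pre_autograd_graph_using_training_ir() -> bool:
    return True  # stand-in: the real helper inspects the rollout state

def expected_batch_norm_op():
    if capture_pre_autograd_graph_using_training_ir():
        # Training IR: a single-output bn node with no trailing getitem.
        return torch.ops.aten.batch_norm.default
    # Old IR: a tuple-output bn node followed by a getitem user.
    return torch.ops.aten._native_batch_norm_legit.default
```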

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion

buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args
```

Reviewed By: andrewor14, tugsbayasgalan

Differential Revision: D61292102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259
Approved by: https://github.com/tugsbayasgalan
2024-08-22 22:00:14 +00:00
f0ba309d78 [CI][dashboard] Add jemalloc back for aarch64 (#134189)
Forward fix based on https://github.com/pytorch/pytorch/pull/133997#discussion_r1726004220
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134189
Approved by: https://github.com/malfet, https://github.com/huydhn
2024-08-22 21:08:39 +00:00
1b6bbaa016 Remove PMI dependencies in PyTorch (#133960)
This patch makes two changes:
1. Whenever ncclCommSplit accepts groupRanks in its config, we should
populate it.  This is independent of using PMI or not.  For example,
non-PMI NCCL can also use this information, if it chooses to.
2. Provide a user flag to decide when to do a uniqueId broadcast and
when to skip it.  This is a performance optimization, and not a
correctness requirement.  If the user forgets to set this, we will
do the uniqueId broadcast, which is wasteful (because it will be
ignored by NCCL), but not incorrect.
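
A minimal sketch of the flag's effect, with illustrative names; `store` stands for a torch.distributed-style key/value store (e.g. TCPStore), and the real flag plumbing is not shown in this message:

```
def maybe_broadcast_unique_id(store, rank, unique_id, skip_broadcast=False):
    if skip_broadcast:
        # The user asserted the broadcast is unnecessary (e.g. PMI-style init).
        return unique_id
    # Default: broadcast through the store. Wasteful if NCCL ignores the id,
    # but never incorrect.
    if rank == 0:
        store.set("nccl_unique_id", unique_id)
        return unique_id
    return store.get("nccl_unique_id")
```
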
@exported-using-ghexport

Differential Revision: [D60966774](https://our.internmc.facebook.com/intern/diff/D60966774/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133960
Approved by: https://github.com/shuqiangzhang
2024-08-22 20:34:43 +00:00
ff61f55387 [Dynamo][autograd.Function] Supports ctx.set_materialize_grads (#133978)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133978
Approved by: https://github.com/zou3519
2024-08-22 20:06:17 +00:00
5633773188 Convert various jobs to be Linux Foundation fleet compatible (#134128)
Migrates a batch of workflows over to LF
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134128
Approved by: https://github.com/zxiiro, https://github.com/jeanschmidt
2024-08-22 19:23:07 +00:00
0eb9c870fd [reland][ROCm] TunableOp for gemm_and_bias (#128919)
Reland of #128143, with `alpha` and `bias` initialization added to `launchTunableGemmAndBias`.

Thus far TunableOp was implemented for gemm, bgemm, and scaled_mm. gemm_and_bias was notably missing. This PR closes that gap.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128919
Approved by: https://github.com/malfet
2024-08-22 18:27:50 +00:00
978c5a80a0 [export][training ir migration] fix batch norm pattern match in quantization (#134157)
Summary:
In the new training ir, we produce `torch.ops.aten.batch_norm.default` instead of `torch.ops.aten._native_batch_norm_legit.default` or `torch.ops.aten._native_batch_norm_legit_no_training.default`.

So we need to change the pattern match to accommodate the new op.

- Add `torch.ops.aten.batch_norm.default` to pattern matcher list so it's identified as a batch norm node
- `torch.ops.aten.batch_norm.default` no longer has a getitem user, so when removing the bn node we need to do `bn_node.replace_all_uses_with(conv_node)` instead of `getitem_node.replace_all_uses_with(conv_node)`

The behavior of capture_pre_autograd_graph is consistent within each run.

If the run is an fbcode test, capture_pre_autograd_graph uses the training IR. This means both _get_aten_graph_module_for_pattern and replace_pattern_with_filters see the same training IR.

If the run is not an fbcode test, both see the old IR. A sketch of the rewiring change is shown below.
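
A minimal sketch of the rewiring difference, assuming torch.fx-style nodes; the names are illustrative rather than the exact pass code:

```
import operator

def remove_bn_node(graph, bn_node, conv_node, using_training_ir):
    if using_training_ir:
        # aten.batch_norm.default returns the tensor directly.
        bn_node.replace_all_uses_with(conv_node)
    else:
        # Old IR: bn returns a tuple, so rewire its getitem user instead.
        getitem_node = next(
            u for u in bn_node.users if u.target == operator.getitem
        )
        getitem_node.replace_all_uses_with(conv_node)
        graph.erase_node(getitem_node)
    graph.erase_node(bn_node)
```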

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d_binary2
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d_unary
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_linear_unary
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_dynamic_quant_linear
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_dynamic_quant_linear
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_flatten_recipe
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_linear_unary
```

Reviewed By: andrewor14, tugsbayasgalan

Differential Revision: D61291077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134157
Approved by: https://github.com/tugsbayasgalan
2024-08-22 18:25:45 +00:00
fee677eeb6 [fbode-testing][dynamo][reland][inline-inbuilt-nn-modules] Mark attri… (#134136)
Shuai wants to test this internally before https://github.com/pytorch/pytorch/pull/133713 can go in. Creating a separate PR for ghimport.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134136
Approved by: https://github.com/yanboliang
2024-08-22 17:54:58 +00:00
8f7d66f0c3 Enable dynamic rollout for Linux binary workflows (#131472)
Enables dynamic migration of jobs to the LF AWS account for binary workflows.

The new runners are only given to people specified in this issue: pytorch/test-infra#5132

This closes pytorch/ci-infra#251.

Depends-On: pytorch/pytorch#132870
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131472
Approved by: https://github.com/ZainRizvi
2024-08-22 17:12:50 +00:00
d95aedf5fd [BE] typing for decorators - fx/_compatibility (part 1) (#134202)
Part of #134054.

This corresponds to the pytorch mypy changes from D61493706. Updating takes so long and touches so many files that it's impossible to land it as a whole without conflicting with some other intermediate change, so we are landing these 'type: ignore' comments for pytorch in advance of their actually being needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
2024-08-22 17:07:33 +00:00
44fa9f991c [NJT] add aten.to.dtype support (#134164)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134164
Approved by: https://github.com/davidberard98
2024-08-22 16:59:38 +00:00
b6abac68ec [BE][dynamo] reorganize polyfill module hierarchy (#133977)
Changes:

1. Move `polyfill.py` -> `polyfills/__init__.py`. Usage changes from `polyfill.xxx` to `polyfills.xxx`.
2. Move submodule loading from `polyfills/__init__.py` to `polyfills/loader.py`.

This merges `polyfill.py` into the `polyfills/` package, where each polyfill module gets its own namespace for better code organization.

The ultimate goal is to make `polyfills/__init__.py` empty and move every polyfill function into its own namespace.
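
A minimal sketch of what `polyfills/loader.py` might do, with illustrative submodule names (the real module list is not shown here):

```
import importlib

POLYFILL_SUBMODULES = ("builtins", "itertools", "os")  # illustrative names

def load_polyfills():
    # Eagerly import each submodule so its substitutions get registered.
    for name in POLYFILL_SUBMODULES:
        importlib.import_module(f"torch._dynamo.polyfills.{name}")
```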

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133977
Approved by: https://github.com/jansel
2024-08-22 16:42:29 +00:00
c95ddd4bf2 [dynamo] ensure polyfill function has the same signature as the original function in substitute_in_graph (#133813)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133813
Approved by: https://github.com/jansel
2024-08-22 16:38:06 +00:00
240467adfe [fx] Implement deepcopy for Proxy (#133706)
Summary: When deepcopying a proxy, we first try the default deepcopy behavior.
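
A minimal sketch of the try-default-first idea on a stand-in class (the real fallback behavior of `Proxy` is not spelled out in this summary):

```
import copy

class _ProxyLike:
    def __deepcopy__(self, memo):
        try:
            # Default behavior: field-by-field deepcopy of the instance dict.
            result = self.__class__.__new__(self.__class__)
            memo[id(self)] = result
            for k, v in self.__dict__.items():
                setattr(result, k, copy.deepcopy(v, memo))
            return result
        except Exception:
            # Fall back (e.g. when some field such as the tracer is not copyable).
            return self
```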

Test Plan: buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r  proxy_deepcopy

Differential Revision: D61398418

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133706
Approved by: https://github.com/angelayi
2024-08-22 16:37:30 +00:00
b0171c3920 Revert "[ONNX] Opt into ruff fmt (#134120)"
This reverts commit 0870398fa8c3e097640f31cb8a8e2e2d3e522d33.

Reverted https://github.com/pytorch/pytorch/pull/134120 on behalf of https://github.com/albanD due to Breaks main branch lint ([comment](https://github.com/pytorch/pytorch/pull/134120#issuecomment-2305089756))
2024-08-22 15:48:14 +00:00
828ab84e19 Improve error msg on _lazy_init() error (#134159)
Reviewed By: hanzlfs

Differential Revision: D61627609

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134159
Approved by: https://github.com/hanzlfs
2024-08-22 15:10:50 +00:00
3c5485fb7f [Retry] Log chromium events to scuba (#134118)
Summary:
This diff implements a bunch of views for internal scuba viewing.

TODOS that I might punt to another diff:
- Saving cache stats via a counter is definitely questionable here, but there's not really a good way to track "fx graph cache hit for this compile phase" right now. Will think about this more.
- We should definitely log frame id, compile id, etc.
- We should definitely be logging configs, so that we can A/B test based on whether a config is turned on.
- It's not yet clear what to do with compile_uuid, but it's useful when you want to look at samples for a single run. With mast job info this field would not be needed, but it's nice to be able to drill down to a single run and get its chrome trace view or icicle view.

Test Plan:
All of the above views are run with nanogpt benchmark:

```
buck run mode/opt caffe2/benchmarks/dynamo:torchbench -- --training --backend=inductor --only nanogpt --performance
```

Differential Revision: D61603243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134118
Approved by: https://github.com/oulgen
2024-08-22 14:59:45 +00:00
1b10a5c652 Allow SymInts and SymFloats as other in div_softmax_pattern (#133989)
Fixes https://github.com/pytorch/pytorch/issues/133759

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133989
Approved by: https://github.com/ezyang
2024-08-22 14:36:01 +00:00
afc2615d33 Add proper casting to fuse_linear_bn_weights (#134105)
As per the title, this PR adds proper casting to fuse_linear_bn_weights in the same style as the conv case above. The missing casts previously caused numerical issues on my end, which is why I am fixing them.

Also cleans up the docstring.
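
A minimal sketch of the casting pattern, assuming the fusion math runs in float32 and casts back to the input dtype at the end; the argument names are illustrative, not the exact signature:

```
import torch

def fuse_linear_bn_weights(w, b, bn_rm, bn_rv, bn_eps, bn_w, bn_b):
    dtype = w.dtype
    # Do the fusion math in float32 to avoid low-precision error accumulation.
    w, b = w.float(), b.float()
    scale = bn_w.float() * torch.rsqrt(bn_rv.float() + bn_eps)
    fused_w = w * scale.unsqueeze(-1)
    fused_b = (b - bn_rm.float()) * scale + bn_b.float()
    # Cast back to the original dtype only at the end.
    return fused_w.to(dtype), fused_b.to(dtype)
```
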
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134105
Approved by: https://github.com/mikaylagawarecki
2024-08-22 14:26:12 +00:00
b459ca78eb [NJT]Add unit tests that cover the internal use cases using new NJT API (#133513)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133513
Approved by: https://github.com/davidberard98, https://github.com/soulitzer
2024-08-22 13:54:40 +00:00
1a7e8e5780 Revert "Update FlexAttention with masking semantic (#133373)"
This reverts commit 5a7b544e5c3e37bea62c6a231f6230c004a33d38.

Reverted https://github.com/pytorch/pytorch/pull/133373 on behalf of https://github.com/jeanschmidt due to Broke internal test/inductor signals, see D61611729 ([comment](https://github.com/pytorch/pytorch/pull/133373#issuecomment-2304714503))
2024-08-22 13:47:26 +00:00
88c973005d Revert "[FlexAttention] Enable different qk and v head-dims (#134043)"
This reverts commit e847b6bb9ba281b0db83fcdd79c328252403e9e8.

Reverted https://github.com/pytorch/pytorch/pull/134043 on behalf of https://github.com/jeanschmidt due to Need to revert, in order to be able to revert https://github.com/pytorch/pytorch/pull/133373, feel free to reland this after solving conflicts ([comment](https://github.com/pytorch/pytorch/pull/134043#issuecomment-2304708996))
2024-08-22 13:44:17 +00:00
83b5d449a3 Add full float16/bfloat16 support to MaxUnPool (#133774)
It already supported half, so we might as well add bfloat16 support for parity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133774
Approved by: https://github.com/eqy, https://github.com/ezyang
2024-08-22 13:34:43 +00:00
c9c84ae3ee [BE][Ez]: Update CUDNN_frontend submodule to 1.6.1 (#134007)
Update cudnn_frontend submodule to 1.6.1 to patch some minor bugfixes and compiler fixes.
# Bug fixes
* Fixed an issue where a custom dropout mask was not correctly applied.
* Added -fvisibility=hidden for the generated pip wheels to avoid symbol conflicts with other modules that use cudnn frontend.
* Fixed an issue in the sdpa operation which, when deserialized, led to numerical mismatches.
* Fixed an issue in the sdpa fp8 fprop operation (in inference mode).
# Samples
* Added a new sample to showcase how a custom dropout mask can be applied to an sdpa operation.
* Added a sample to showcase convolutions on large (c * d * h * w > 2 ** 31) tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134007
Approved by: https://github.com/eqy
2024-08-22 13:34:17 +00:00
108a75b454 [PP] Add ZeroBubble schedule (#133467)
Zero bubble can be expressed through `ScheduleFlexibleInterleaved1F1B` by setting `enable_zero_bubble=True`, but instead of having to include this flag in schedule initialization we should create a separate ZeroBubbleSchedule, and also transition `Interleaved1F1B` to derive from `ScheduleFlexibleInterleaved1F1B`. Then we don't need to expose `ScheduleFlexibleInterleaved1F1B`, since its naming is not obvious.
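
A minimal sketch of the class layout this describes, using a stand-in base class (the real one lives under torch.distributed.pipelining; constructor arguments are illustrative):

```
class ScheduleFlexibleInterleaved1F1B:
    def __init__(self, stages=None, n_microbatches=1, enable_zero_bubble=False):
        self.enable_zero_bubble = enable_zero_bubble

class ZeroBubbleSchedule(ScheduleFlexibleInterleaved1F1B):
    # Zero bubble without exposing the enable_zero_bubble flag to users.
    def __init__(self, stages=None, n_microbatches=1):
        super().__init__(stages, n_microbatches, enable_zero_bubble=True)

class Interleaved1F1B(ScheduleFlexibleInterleaved1F1B):
    def __init__(self, stages=None, n_microbatches=1):
        super().__init__(stages, n_microbatches, enable_zero_bubble=False)
```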

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133467
Approved by: https://github.com/wconstab
ghstack dependencies: #132691
2024-08-22 13:32:15 +00:00
cedfac20c7 Revert "[SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce (#133424)"
This reverts commit 66d3eb783c3b3d7087988dd29bfb619b7f4306b7.

Reverted https://github.com/pytorch/pytorch/pull/133424 on behalf of https://github.com/jeanschmidt due to Broke internal ADS builds, see D61611517 ([comment](https://github.com/pytorch/pytorch/pull/133424#issuecomment-2304676328))
2024-08-22 13:29:27 +00:00
592a172910 [FSDP2] Resolved strided sharding todo in clipping tests (#134152)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134152
Approved by: https://github.com/XilunWu, https://github.com/weifengpy, https://github.com/wz337
2024-08-22 12:45:13 +00:00
4c645c04d8 Fix type of get_raw_stream (#134187)
Just something I noticed while implementing a new DeviceInterface

I had to add `# type: ignore[assignment]` because mypy thinks
DeviceInterface.get_raw_stream is a `Callable` and therefore
incompatible with a `staticmethod`.
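
A minimal sketch of the friction on a stand-in class, not the real `DeviceInterface` definition:

```
from typing import Callable

class DeviceInterface:
    get_raw_stream: Callable[[int], int]

class MyDeviceInterface(DeviceInterface):
    # mypy treats the base attribute as a plain Callable, so assigning a
    # staticmethod needs an ignore.
    get_raw_stream = staticmethod(lambda device: 0)  # type: ignore[assignment]
```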

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134187
Approved by: https://github.com/jansel
2024-08-22 12:00:08 +00:00
5fb8754434 [inductor] write cpp code with encoding utf-8 (#134027)
Windows differs from Linux: each Windows version ships with a different language pack, and each language pack uses a different code page. Inductor on Windows writes the generated cpp code with the current code page, which can fail on characters the code page cannot decode.

For this situation, Microsoft suggests using Unicode instead of a specific code page. Ref: https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers

Changes:
1. Use `utf-8` as the encoding for generated cpp code.
2. Only the encoding for cpp code changes, not for the binary type; the binary type is used for AoT binary content.

Verified against https://github.com/pytorch/pytorch/issues/122094#issuecomment-2299592942.
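
A minimal sketch of the change, assuming the write path picks the encoding by content type; the function and parameter names are illustrative:

```
from pathlib import Path

def write_generated(path, content, is_binary=False):
    if is_binary:
        # AoT binary payloads are written as raw bytes, untouched.
        Path(path).write_bytes(content)
    else:
        # Generated C++ source: always utf-8 instead of the locale code page.
        Path(path).write_text(content, encoding="utf-8")
```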

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134027
Approved by: https://github.com/desertfire, https://github.com/jgong5, https://github.com/jansel
2024-08-22 11:54:32 +00:00
aea1148d56 [fp8 rowwise] Clarify dtypes (#134114)
Disambiguate some of the dtypes (e.g., for the scales), move the "constant" ones out of the function, and use safe casting functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134114
Approved by: https://github.com/drisspg
ghstack dependencies: #134110, #134111, #134112, #134113
2024-08-22 11:07:39 +00:00
72586ccd14 [fp8 rowwise] Don't build separate kernel for no bias (#134113)
CUTLASS automatically skips a stage in the epilogue if we provide a nullptr. Thus, instead of building a special kernel for bias=None, we can reuse one of the other ones.

This also considerably simplifies the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134113
Approved by: https://github.com/drisspg
ghstack dependencies: #134110, #134111, #134112
2024-08-22 11:07:39 +00:00
d64fa11095 [fp8 rowwise] Fix bias calculation being done in low precision (#134112)
The compute dtype for the bias addition was set to ElementBias. Thus, for a bf16 bias, we would cast the fp32 accum to bf16 and _then_ add the bias. It is however (slightly?) more accurate to first add the bias in fp32 and only cast at the end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134112
Approved by: https://github.com/drisspg
ghstack dependencies: #134110, #134111
2024-08-22 11:07:34 +00:00
15faed60ca [fp8 rowwise] Make schedule selection more readable (#134111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134111
Approved by: https://github.com/drisspg
ghstack dependencies: #134110
2024-08-22 11:07:30 +00:00
b8ea5b01c9 [fp8 rowwise] Allocate workspace as a PyTorch Tensor (#134110)
This makes us pass through the CUDA caching allocator which is safer e.g. in case of CUDA graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134110
Approved by: https://github.com/drisspg
2024-08-22 11:07:26 +00:00
4c8193b8f0 [14/N] Fix clang-tidy warnings in aten/src/ATen (#132733)
Follows #133807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132733
Approved by: https://github.com/ezyang
2024-08-22 10:09:15 +00:00
90c821814e SparseCsrCUDA: cuDSS backend for linalg.solve (#129856)
This PR switches to the cuDSS library and has the same purpose as #127692, which is to add Sparse CSR tensor support to linalg.solve.
Fixes #69538

Minimal usage example:
```
import torch

if __name__ == '__main__':
    spd = torch.rand(4, 3)
    A = spd.T @ spd
    b = torch.rand(3).to(torch.float64).cuda()
    A = A.to_sparse_csr().to(torch.float64).cuda()

    x = torch.linalg.solve(A, b)
    print((A @ x - b).norm())

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856
Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn

Co-authored-by: Zihang Fang <zhfang1108@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2024-08-22 07:57:30 +00:00
64cfcbd8a3 Tune _int_bsr_dense_addmm for int8 inputs on A100 (#134035)
As in the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134035
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #133855
2024-08-22 06:43:11 +00:00
b7baa062fc Update torch-xpu-ops pin (ATen XPU implementation) (#133850)
Bug fixes for PyTorch 2.5:
1. Use the SYCL group algorithm API instead of the old style for sub-group shift utilities.
2. Add preprocessing in the reduction kernel for cases requiring a data type cast.
3. Make group norm memory-format compatible.
4. ZeroTensor: (a) remove unnecessary aten operator registrations, which otherwise bypass ZeroTensor processing; (b) align preprocessing in aten::copy_ with the in-tree implementation.
5. Rebase checkIndexTensorTypes usage.
6. Align with the latest semantics of PyTorch foreach operators: return multiple tensors with offset=0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133850
Approved by: https://github.com/EikanWang
2024-08-22 06:27:03 +00:00
cdb9c7d228 Add support for using privateuse1 backend name in instantiate_device_type_tests() (#133082)
'privateuse1' appears many times in out-of-tree extension codebases. Everything about the device type should behave the same as for in-tree backends once the privateuse1 backend is registered.

For example, after registering a privateuse1 backend named "foo", passing "foo" as a device type should be valid.

```diff
- instantiate_device_type_tests(TestIndexing, globals(), only_for='privateuse1')
- instantiate_device_type_tests(NumpyTests, globals(), only_for='privateuse1')
+ instantiate_device_type_tests(TestIndexing, globals(), only_for='foo')
+ instantiate_device_type_tests(NumpyTests, globals(), only_for='foo')
```

> https://github.com/Ascend/pytorch/blob/master/test/test_indexing.py#L1654-L1655

The change maps the registered privateuse1 backend name back to 'privateuse1' when calling `filter_desired_device_types()`.
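
A minimal sketch of the mapping, assuming the registered backend name (e.g. "foo") is normalized back to 'privateuse1' before filtering:

```
import torch

def normalize_device_type(device_type):
    if device_type == torch._C._get_privateuse1_backend_name():
        return "privateuse1"
    return device_type
```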

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133082
Approved by: https://github.com/albanD
2024-08-22 06:17:21 +00:00
24c2dd2002 Migrate fuse_chunk_reshape_concat_pass to PT2 (#134026)
Summary:
This is part of the dper pass migration work: https://fburl.com/gdoc/wxwykxns
This pass has a ~2.4% perf impact for adfinder_reels_ctr_model.

Test Plan: Still in test

Differential Revision: D60789747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134026
Approved by: https://github.com/huxintong
2024-08-22 06:13:52 +00:00
938f37b745 Added batching rule for sdpa_math, sdpa_efficient_attention forward, cudnn, and flash attention (#133964)
Fixes https://github.com/pytorch/pytorch/issues/117016, https://github.com/pytorch/pytorch/issues/102457, https://github.com/pytorch/pytorch/issues/110525, and https://github.com/pytorch/pytorch/issues/108065.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133964
Approved by: https://github.com/Skylion007
2024-08-22 05:29:49 +00:00
e2ff094008 [inductor] calibration inductor windows uts (1/N) (#134033)
Changes:
1. Re-enable fixed UTs.
2. Mark skip reasons for the still-failing UTs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134033
Approved by: https://github.com/jansel
2024-08-22 05:21:28 +00:00
0d7ac1966a kill sharing of constraints (#134045)
Summary:
Previously, reuse of the same `Dim` was encoded by "sharing" internal constraints among constraint targets. This kind of sharing, implemented using `shared` fields between `_Constraint`s, was originally motivated by `dynamic_dim`, specifically to support `==` between `dynamic_dim`s, but we no longer need to maintain this overcomplicated structure: we can simply use names of `Dims` to directly encode sharing information.

Thus this PR vastly simplifies the structure of `_Constraint` by removing `shared` fields. As a result, both `_Constraint` and its moral subclass, `_DerivedConstraint`, are 1-1 with `Dim` and its moral subclass, `DerivedDim`.

Note that this will break `==` over `dynamic_dim`, so an immediate follow-up will be to remove `dynamic_dim` entirely from our public API. (It's been more than 6 months since the deprecation warning anyway.) I just didn't want to deal with that process in the same PR.
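
A minimal sketch of the simplified encoding, with illustrative fields: reuse of a `Dim` is expressed by name equality rather than by linked `shared` fields:

```
from dataclasses import dataclass

@dataclass
class _Constraint:
    name: str   # constraints with equal names refer to the same Dim
    min: int = 0
    max: int = 2**31 - 1

batch = _Constraint("batch", min=2, max=1024)
other = _Constraint("batch", min=2, max=1024)
assert batch.name == other.name  # sharing encoded purely by the name
```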

Test Plan: existing

Differential Revision: D61559413

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134045
Approved by: https://github.com/pianpwk
2024-08-22 04:40:47 +00:00
de06345e9b Avoid Host & Device Sync In LR Scheduler (#133663)
Fixes #133662.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133663
Approved by: https://github.com/janeyx99, https://github.com/eqy

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2024-08-22 03:52:43 +00:00