fdab48a7c1
Enable all PIE rules on ruff ( #165814 )
...
This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796 Enum contains duplicate value: {value}
PIE808 Unnecessary start argument in range
```
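A minimal illustration (not from the PR) of the patterns these two rules flag:
```
from enum import Enum

class Color(Enum):
    RED = 1
    CRIMSON = 1  # PIE796: duplicate value; CRIMSON silently becomes an alias of RED

for _ in range(0, 10):  # PIE808: the explicit 0 start is redundant
    pass

for _ in range(10):  # preferred
    pass
```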
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
24520b8386
Revert "Enable all PIE rules on ruff ( #165814 )"
...
This reverts commit c79dfdc6550e872783aa5cb5fc9e86589bf18872.
Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863 ))
2025-10-18 07:21:08 +00:00
c79dfdc655
Enable all PIE rules on ruff ( #165814 )
...
This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796 Enum contains duplicate value: {value}
PIE808 Unnecessary start argument in range
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
fbe0d20a17
[2/N] More ruff SIM fixes ( #165031 )
...
This is a follow-up to #164695 that applies ruff SIM rules to more files. Most changes simplify dict.get calls, since None is already the default value.
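A minimal example (not from the PR) of the simplification this refers to:
```
config = {"mode": "fast"}

# flagged: None is already dict.get's default
value = config.get("mode", None)

# simplified
value = config.get("mode")
```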
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-14 14:22:54 +00:00
39161e73fc
[Fix] missing lambda in torch._check ( #165043 )
...
Fixes more missing lambdas in torch._check calls in the source code. Inspired by #164225 .
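For context, a small sketch (not the PR's code): torch._check expects its message argument to be a callable that is only invoked when the check fails, so passing a bare f-string both loses the lazy evaluation and can misbehave once the condition is actually False.
```
import torch

def check_nonempty(x):
    # flagged pattern: the message is a plain string, built eagerly on every call
    torch._check(x.numel() > 0, f"expected a non-empty tensor, got shape {x.shape}")

def check_nonempty_fixed(x):
    # fixed: the lambda defers building the message until the check fails
    torch._check(x.numel() > 0, lambda: f"expected a non-empty tensor, got shape {x.shape}")
```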
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165043
Approved by: https://github.com/FFFrog , https://github.com/Skylion007
2025-10-10 17:11:55 +00:00
b8be796a57
Revert "[2/N] More ruff SIM fixes ( #165031 )"
...
This reverts commit 38095fbd1323ee4a9541fbcbb9b28bd20f2cd956.
Reverted https://github.com/pytorch/pytorch/pull/165031 on behalf of https://github.com/albanD due to One of the changed line started to fail on trunk ([comment](https://github.com/pytorch/pytorch/pull/165031#issuecomment-3390190870 ))
2025-10-10 13:42:14 +00:00
38095fbd13
[2/N] More ruff SIM fixes ( #165031 )
...
This is a follow-up to #164695 that applies ruff SIM rules to more files. Most changes simplify dict.get calls, since None is already the default value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031
Approved by: https://github.com/mlazos
2025-10-10 05:37:46 +00:00
7158aa22e8
remove more ( #164753 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164753
Approved by: https://github.com/aorenste , https://github.com/mlazos
ghstack dependencies: #164664 , #164665 , #164667 , #164668
2025-10-08 14:23:38 +00:00
2855a045b3
Use sym_eq and sym_and on symbolic shapes in common_meta_baddbmm_bmm ( #164781 )
...
Differential Revision: D84005053
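A hedged sketch of the kind of check this enables (the import path and helper names are assumptions; the actual registration may differ): comparing symbolic sizes with sym_eq/sym_and produces a single SymBool for torch._check instead of forcing Python bools and guards.
```
# Sketch only: assumes sym_eq and sym_and are importable from
# torch.fx.experimental.symbolic_shapes.
import torch
from torch.fx.experimental.symbolic_shapes import sym_and, sym_eq

def check_bmm_shapes(batch1, batch2):
    # batch1: (B, M, K), batch2: (B, K, N); compare symbolically so unbacked
    # sizes do not trigger data-dependent guard errors.
    torch._check(
        sym_and(
            sym_eq(batch1.size(0), batch2.size(0)),
            sym_eq(batch1.size(2), batch2.size(1)),
        ),
        lambda: f"mismatched bmm shapes {tuple(batch1.shape)} and {tuple(batch2.shape)}",
    )
```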
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164781
Approved by: https://github.com/Skylion007
2025-10-07 18:25:00 +00:00
5d7360bb03
Revert "Enable all SIM rules except disabled ones ( #164645 )"
...
This reverts commit 321e6026925f6b6e8a36e3a8b7c0295cd7541911.
Reverted https://github.com/pytorch/pytorch/pull/164645 on behalf of https://github.com/izaitsevfb due to causes lint failures ([comment](https://github.com/pytorch/pytorch/pull/164645#issuecomment-3369274351 ))
2025-10-05 19:32:21 +00:00
321e602692
Enable all SIM rules except disabled ones ( #164645 )
...
`SIM` rules are useful for simplifying boolean expressions and enhance code readability.
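A minimal illustration (not from the PR) of a typical SIM finding:
```
def is_adult(age):
    # SIM103 (needless-bool): flagged form
    if age >= 18:
        return True
    return False

def is_adult_simplified(age):
    # preferred: return the condition directly
    return age >= 18
```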
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645
Approved by: https://github.com/ezyang
2025-10-05 07:38:25 +00:00
1051c1de5c
Add pyrefly suppressions 2/n ( #164513 )
...
Adds suppressions so that the pyrefly typecheck is clean: https://github.com/pytorch/pytorch/issues/163283
Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check
---
step 1: uncomment lines in the `pyrefly.toml` file
before: https://gist.github.com/maggiemoss/911b4d0bc88bf8cf3ab91f67184e9d46
after:
```
INFO Checking project configured at `/Users/maggiemoss/python_projects/pytorch/pyrefly.toml`
INFO 0 errors (1,152 ignored)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164513
Approved by: https://github.com/oulgen
2025-10-03 02:46:13 +00:00
315ffdc1e4
[4/N] Apply ruff UP035 rule to python code ( #164206 )
...
Follows #164104
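An illustration (not from the PR) of what UP035 flags: imports of names that typing has deprecated in favor of collections.abc or builtin equivalents.
```
# Old (flagged by UP035): abstract container types imported from typing
# from typing import Callable, Iterable

# New: import them from collections.abc instead
from collections.abc import Callable, Iterable

def apply_all(fns: Iterable[Callable[[int], int]], x: int) -> int:
    for fn in fns:
        x = fn(x)
    return x
```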
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164206
Approved by: https://github.com/albanD
2025-10-01 19:05:53 +00:00
cc8b14d09a
[2/N] Simplify "in" operation for containers of a single item ( #164323 )
...
These issues are detected by ruff [FURB171](https://docs.astral.sh/ruff/rules/single-item-membership-test/#single-item-membership-test-furb171 ).
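A minimal example (not from the PR) of the pattern FURB171 flags:
```
def is_default_dtype(dtype: str) -> bool:
    # flagged by FURB171: membership test against a single-item container
    return dtype in ("float32",)

def is_default_dtype_simplified(dtype: str) -> bool:
    # simplified to a direct comparison
    return dtype == "float32"
```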
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164323
Approved by: https://github.com/justinchuby , https://github.com/Skylion007
2025-10-01 05:39:11 +00:00
7f29c47a4f
Fix cdist export compute mode validation ( #161724 )
...
Fixes #161089 . Added 0 as an acceptable value for compute_mode in _meta_registrations.py, and added a test case in test_export.py.
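A hedged sketch of the relaxed validation (the exact check in _meta_registrations.py may differ): compute_mode 0, meaning "use mm for euclidean distance if necessary", is now accepted alongside 1 and 2.
```
import torch

# Sketch only: the real meta registration may phrase this differently.
def validate_compute_mode(compute_mode):
    torch._check(
        compute_mode is None or compute_mode in (0, 1, 2),
        lambda: f"{compute_mode} is not a valid value for compute_mode",
    )
```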
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161724
Approved by: https://github.com/albanD , https://github.com/angelayi
2025-09-30 12:23:20 +00:00
7afcb030d8
Back out "Revert D81959389" ( #163905 )
...
Summary:
Original commit changeset: 06888d7ebff0
Original Phabricator Diff: D82932788
Restricted the test to SM90 for scaled_grouped_mm
Test Plan: TBD (will share the linux CI results)
Differential Revision: D83283991
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163905
Approved by: https://github.com/angelayi
2025-09-30 07:05:13 +00:00
eb4361a801
[Fix] Adding missing f prefixes to formatted strings [1/N] ( #164065 )
...
As stated in the title; a minimal illustration follows the stack list below.
* #164068
* #164067
* #164066
* __->__ #164065
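An illustration (not from the PR) of the bug class being fixed: a format string missing its f prefix prints the literal braces instead of the interpolated value.
```
op_name = "conv1d"

# bug: missing f prefix, prints "unsupported op: {op_name}"
print("unsupported op: {op_name}")

# fix: prints "unsupported op: conv1d"
print(f"unsupported op: {op_name}")
```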
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164065
Approved by: https://github.com/Skylion007
2025-09-29 04:53:00 +00:00
c106ee8515
[FakeTensor] Supplement the relevant logic for converting conv1d to conv2d in meta_conv ( #160408 )
...
## Fixes https://github.com/pytorch/pytorch/issues/159462 ; also fixes #163569 , #163604
## Summary
The issue is caused by the wrong stride of conv1d's result generated by meta_conv:
4d5b3f2d5a/torch/_meta_registrations.py (L2453-L2471)
This wrong stride is then used by inductor to codegen the size/stride asserts:
4d5b3f2d5a/torch/_inductor/ir.py (L6152-L6163)
## Reason
Why is the computed stride wrong in the meta_conv function? Because the corresponding backend converts conv1d to conv2d and changes the input tensor's size and memory_format (channels last), but meta_conv does not perform this transformation, so a mismatch happens.
4d5b3f2d5a/aten/src/ATen/native/Convolution.cpp (L1502-L1510)
This PR adds the corresponding logic to meta_conv.
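A hedged repro sketch (not the PR's test) of the expectation once the meta kernel mirrors the conv1d-to-conv2d conversion: the compiled output should report the same strides as eager.
```
import torch

conv = torch.nn.Conv1d(16, 32, kernel_size=3)
x = torch.randn(8, 16, 128)

eager_out = conv(x)
compiled_out = torch.compile(conv)(x)

# Previously the meta kernel could report a contiguous stride even when the
# backend ran the op as a channels-last conv2d, tripping inductor's size asserts.
assert compiled_out.stride() == eager_out.stride()
```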
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160408
Approved by: https://github.com/eellison , https://github.com/jansel , https://github.com/mlazos
2025-09-26 15:45:02 +00:00
21a41edd4f
Add fake_impl for _native_multi_head_attention ( #163700 )
...
Test Plan: See added test in test_export.py
Differential Revision: D83099187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163700
Approved by: https://github.com/angelayi
2025-09-25 19:01:27 +00:00
9c4d9f940b
[inductor] Support out_dtype arg to matmul ( #163393 )
...
Fixes #163275
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163393
Approved by: https://github.com/eellison , https://github.com/coconutruben
ghstack dependencies: #163386 , #163398 , #163387 , #163414 , #163415 , #163419 , #163434
2025-09-23 15:37:38 +00:00
aff76c046d
Revert "Add fake_impl for _native_multi_head_attention ( #163167 )"
...
This reverts commit 27164b6788cab6e6d8095012839e51c958a819d6.
Reverted https://github.com/pytorch/pytorch/pull/163167 on behalf of https://github.com/malfet due to This broke in inductor-cpu-test, see 1a42656d6c/1 ([comment](https://github.com/pytorch/pytorch/pull/163167#issuecomment-3324302026 ))
2025-09-23 14:36:45 +00:00
27164b6788
Add fake_impl for _native_multi_head_attention ( #163167 )
...
Test Plan:
See added test in test_export.py
Rollback Plan:
Reviewed By: henryoier
Differential Revision: D77747446
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163167
Approved by: https://github.com/angelayi
2025-09-23 04:02:20 +00:00
9b5ec0ff7c
Use computed buffer sizes of torch for cusparseLt metadata ( #163125 )
...
Making sure buffer allocation matches what is computed by cusparseLt compression
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163125
Approved by: https://github.com/jcaip
2025-09-19 22:12:40 +00:00
9b7a8c4d05
[cuDNN][SDPA][submodule] Roll-back cuDNN frontend upgrade, update Meta registration ( #163104 )
...
For https://github.com/pytorch/torchtitan/issues/1713
Also note that we will need to roll back the cuDNN frontend upgrade in 2.9, as it currently introduces a segmentation fault by assuming tensors have their strides and sizes populated at graph creation time: 1a7b4b78db/include/cudnn_frontend/node/sdpa_support_surface.h (L447)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163104
Approved by: https://github.com/drisspg
2025-09-17 15:48:54 +00:00
872ed60679
[mxfp8 torch._scaled_grouped_mm] fix meta registration for 3d tensor ( #162765 )
...
The meta registration checks for torch._scaled_grouped_mm have a bug for 3d "B" tensors: the scale for such a tensor should be 2d with shape (G, blocked_K * blocked_N), but the check currently enforces a 3d shape of (G, blocked_K, blocked_N).
See Blas.cpp for the correct validation logic [here](8e217a9f6d/aten/src/ATen/native/cuda/Blas.cpp (L1622)).
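A hedged sketch of the corrected expectation (names are illustrative; the real meta check may differ):
```
import torch

def check_3d_b_scale(b, b_scale, blocked_k, blocked_n):
    # For a 3d "B" operand of shape (G, K, N), the mxfp8 block scale is 2d with
    # shape (G, blocked_K * blocked_N), not 3d (G, blocked_K, blocked_N).
    G = b.shape[0]
    torch._check(
        b_scale.dim() == 2 and tuple(b_scale.shape) == (G, blocked_k * blocked_n),
        lambda: f"expected B scale of shape {(G, blocked_k * blocked_n)}, got {tuple(b_scale.shape)}",
    )
```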
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162765
Approved by: https://github.com/ngimel
2025-09-12 03:51:52 +00:00
ac72f81c12
[dynamic shapes] unbacked-safe should_swap ( #160473 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160473
Approved by: https://github.com/laithsakka
2025-09-11 18:51:25 +00:00
b6d0a9ea90
MXFP8 grouped GEMM support for torch._scaled_grouped_mm + submodule bump ( #162209 )
...
## Summary
- We just landed 2d-2d support for mxfp8 grouped gemm in FBGEMM: https://github.com/pytorch/FBGEMM/pull/4816
- This is needed for backward pass of mxfp8 MoE training with grouped gemms
- Changes:
  - Add dispatching + input validation for mxfp8 grouped gemm in `torch._scaled_grouped_mm`
  - Add meta registration input validation for mxfp8 grouped gemm, for composability with compile
  - Add unit tests exercising torch._scaled_grouped_mm with mxfp8 inputs
  - Bump FBGEMM third party submodule to include:
    - https://github.com/pytorch/FBGEMM/pull/4816
    - https://github.com/pytorch/FBGEMM/pull/4820
    - https://github.com/pytorch/FBGEMM/pull/4821
    - https://github.com/pytorch/FBGEMM/pull/4823
#### How fbgemm dependency was bumped
Documenting this since I haven't found it documented elsewhere:
- `cd ~/pytorch/third_party/fbgemm`
- `git fetch`
- `git checkout <hash>`
- `cd ~/pytorch`
- `git add third_party/fbgemm`
## Test plan
#### Test build
```
USE_FBGEMM_GENAI=1 python -m pip install --no-build-isolation -v -e .
...
Successfully installed torch-2.9.0a0+gitf5070f3
```
[full build log](https://www.internalfb.com/phabricator/paste/view/P1933787581 )
#### Unit tests
```
pytest test/test_matmul_cuda.py -k test_mxfp8_scaled_grouped_mm_
...
test/test_matmul_cuda.py ......... [100%]
============================================================== 9 passed, 1668 deselected in 5.34s ===============================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162209
Approved by: https://github.com/ngimel
2025-09-06 15:25:30 +00:00
fbf3d2027d
use sym_or instead of any to avoid dde in calc_conv_nd_return_shape ( #162084 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162084
Approved by: https://github.com/aorenste
Co-authored-by: Aaron Orenstein <aorenste@fb.com >
2025-09-04 01:20:22 +00:00
e34b6a0103
Add meta for add.Scalar ( #161332 )
...
Fixes https://github.com/pytorch/pytorch/issues/161076
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161332
Approved by: https://github.com/Skylion007
2025-08-26 02:26:51 +00:00
e631557518
Fix meta function for aten.complex ( #160894 )
...
Closes https://github.com/pytorch/pytorch/issues/160882
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160894
Approved by: https://github.com/mlazos
2025-08-20 16:30:04 +00:00
781e9a7724
Fix meta for constant_pad_nd ( #159878 )
...
Fixes https://github.com/pytorch/pytorch/issues/144187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159878
Approved by: https://github.com/Skylion007 , https://github.com/ezyang
2025-08-14 14:47:47 +00:00
74a754aae9
Add meta kernel for sdpa_math_for_mps ( #159695 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159695
Approved by: https://github.com/malfet
ghstack dependencies: #159456
2025-08-05 22:27:06 +00:00
bc4b04e058
DeviceCopy should have the same layout as input ( #159615 )
...
Summary: Fix https://github.com/pytorch/pytorch/issues/159612
- Fix the meta implementation of `nan_to_num`: it should preserve the stride of the input
- The DeviceCopy IR node should always preserve the input's layout, so we don't end up with a contiguous call during device copy
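A small illustration (not from the PR) of the stride-preservation expectation for the meta kernel:
```
import torch

x = torch.randn(8, 4).t()                  # non-contiguous (transposed) input
meta_out = torch.nan_to_num(x.to("meta"))  # meta kernel only computes metadata

# The meta output should report the same non-contiguous strides as eager.
assert meta_out.stride() == torch.nan_to_num(x).stride()
```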
Test Plan:
```
buck2 run @mode/dev-nosan fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_d2h_copy
```
Rollback Plan:
Differential Revision: D79411407
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159615
Approved by: https://github.com/eellison
2025-08-04 23:56:58 +00:00
a81ffbc5f5
improve shape checks for grouped_mm ( #159666 )
...
Check that the contraction dimension matches between tensors when it is known, and add device-side checks that the offsets are correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159666
Approved by: https://github.com/danielvegamyhre , https://github.com/eqy
2025-08-02 00:12:25 +00:00
c400c8e2e0
[ROCm] Add FP8 rowwise support to _scaled_grouped_mm + Submodule update ( #159075 )
...
Summary:
In this PR we integrate the [FBGEMM AMD FP8 rowwise scaling grouped GEMM kernel](https://github.com/pytorch/FBGEMM/tree/main/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped ) to add support for the `_scaled_grouped_mm` API on AMD. `_scaled_grouped_mm` is [currently supported on Nvidia](9faef3d17c/aten/src/ATen/native/cuda/Blas.cpp (L1614)); this PR aims to bring parity to AMD. Related: [[RFC]: PyTorch Low-Precision GEMMs Public API](https://github.com/pytorch/pytorch/issues/157950#top ) #157950 .
The kernel is developed using the Composable Kernel framework. Only MI300X is currently supported; we plan to add support for MI350X in the near future. For data types we support FP8 e4m3.
The kernel support will be gated with the `USE_FBGEMM_GENAI` flag. We hope to enable this by default for relevant AMD builds.
Note we also update submodule `third_party/fbgemm` to 0adf62831 for the required updates from fbgemm.
Test Plan:
**Hipify & build**
```
python tools/amd_build/build_amd.py
USE_FBGEMM_GENAI=1 python setup.py develop
```
**Unit tests**
```
python test/test_matmul_cuda.py -- TestFP8MatmulCUDA
Ran 488 tests in 32.969s
OK (skipped=454)
```
**Performance Sample**
| G | M | N | K | Runtime Ms | GB/S | TFLOPS |
| -- | -- | -- | -- | -- | -- | -- |
| 128 | 1 | 2048 | 5120 | 0.37| 3590 | 7.17 |
| 128 | 64 | 2048 | 5120 | 0.51| 2792 | 338.34 |
| 128 | 128 | 2048 | 5120 | 0.66| 2272 | 522.72 |
| 128 | 1 | 5120 | 1024 | 0.21| 3224 | 6.43 |
| 128 | 64 | 5120 | 1024 | 0.29| 2590 | 291.40 |
| 128 | 128 | 5120 | 1024 | 0.40| 2165 | 434.76 |
| 128 | 1 | 4096 | 4096 | 0.69| 3126 | 6.25 |
| 128 | 64 | 4096 | 4096 | 0.85| 2655 | 324.66 |
| 128 | 128 | 4096 | 4096 | 1.10| 2142 | 501.40 |
| 128 | 1 | 8192 | 8192 | 2.45| 3508 | 7.01 |
| 128 | 64 | 8192 | 8192 | 3.27| 2692 | 336.74 |
| 128 | 128 | 8192 | 8192 | 4.04| 2224 | 543.76 |
| 16 | 1 | 2048 | 5120 | 0.04| 3928 | 7.85 |
| 16 | 64 | 2048 | 5120 | 0.05| 3295 | 399.29 |
| 16 | 128 | 2048 | 5120 | 0.07| 2558 | 588.69 |
| 16 | 1 | 5120 | 1024 | 0.03| 3119 | 6.23 |
| 16 | 64 | 5120 | 1024 | 0.03| 2849 | 320.62 |
| 16 | 128 | 5120 | 1024 | 0.05| 2013 | 404.11 |
| 16 | 1 | 4096 | 4096 | 0.06| 4512 | 9.02 |
| 16 | 64 | 4096 | 4096 | 0.09| 3124 | 381.95 |
| 16 | 128 | 4096 | 4096 | 0.13| 2340 | 547.67 |
| 16 | 1 | 8192 | 8192 | 0.32| 3374 | 6.75 |
| 16 | 64 | 8192 | 8192 | 0.42| 2593 | 324.28 |
| 16 | 128 | 8192 | 8192 | 0.53| 2120 | 518.36 |
- Using ROCm 6.4.1
- Collected through `triton.testing.do_bench_cudagraph`
**Binary size with gfx942 arch**
Before: 116103856 Jul 23 14:12 build/lib/libtorch_hip.so
After: 118860960 Jul 23 14:29 build/lib/libtorch_hip.so
The difference is 2757104 bytes (~2.6 MiB).
Reviewers: @drisspg @ngimel @jwfromm @jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159075
Approved by: https://github.com/drisspg
2025-07-30 23:53:58 +00:00
aaa384b2d4
move view_meta to fake impl ( #158406 )
...
The Python dispatcher is not always enabled for fake tensors and has to be enabled explicitly.
While it arguably should be enabled by default, getting all tests to pass with that change requires more work.
I have run into several issues (e.g. XLA, Helom, etc.) where I had to add enable_python_dispatcher to work around this, so for view specifically I moved the logic to the fake tensor impl.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158406
Approved by: https://github.com/bobrenjc93
2025-07-25 08:21:27 +00:00
0b2ef76e85
DDE-Free select with unbacked index. ( #157605 )
...
When select has a data-dependent index, we can't tell whether the actual index should be index + size or index.
To avoid throwing a data-dependent error (DDE), we allocate a new unbacked symbol to represent the storage offset of
the output view and compute its value dynamically at runtime during inductor lowering.
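A hedged repro sketch (not the PR's test) of the situation this addresses: selecting with an unbacked index whose sign is unknown at trace time.
```
import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x, i):
    idx = i.item()            # unbacked SymInt: sign is unknown while tracing
    # The storage offset is idx*stride or (idx+size)*stride depending on the sign,
    # which previously raised a data-dependent error.
    return x.select(0, idx)

f(torch.randn(8, 4), torch.tensor(-2))
```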
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157605
Approved by: https://github.com/ColinPeppler
2025-07-24 20:08:05 +00:00
23550ab735
Revert "DDE-Free select with unbacked index. ( #157605 )"
...
This reverts commit 79d7c754ab8ae0e5c3a614521632d2cfbfa0fdba.
Reverted https://github.com/pytorch/pytorch/pull/157605 on behalf of https://github.com/laithsakka due to fail pr time benchmarks ([comment](https://github.com/pytorch/pytorch/pull/157605#issuecomment-3084663020 ))
2025-07-17 16:20:02 +00:00
79d7c754ab
DDE-Free select with unbacked index. ( #157605 )
...
When select has a data-dependent index, we can't tell whether the actual index should be index + size or index.
To avoid throwing a data-dependent error (DDE), we allocate a new unbacked symbol to represent the storage offset of
the output view and compute its value dynamically at runtime during inductor lowering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157605
Approved by: https://github.com/ColinPeppler
2025-07-17 05:08:11 +00:00
90618581e9
Fix grouped MM output strides when compiled but not max-autotuned ( #158143 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158143
Approved by: https://github.com/ngimel
2025-07-15 11:53:13 +00:00
c8c221c0b3
[Inductor][Float8] Add float8_e4m3fn into assertion dtype list. ( #157684 )
...
Fixes an assertion issue by adding float8_e4m3fn to the assertion dtype list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157684
Approved by: https://github.com/Xia-Weiwen , https://github.com/leslie-fang-intel , https://github.com/jansel
2025-07-15 06:02:01 +00:00
1f57e0e04d
[CPU] Support GQA for flash attention ( #157893 )
...
As many models require GQA, we support it in flash attention for the CPU path.
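An illustrative usage sketch (assuming the enable_gqa flag of scaled_dot_product_attention): the query has more heads than key/value, and each KV head is shared by several query heads.
```
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64)   # (batch, query heads, seq, head dim)
k = torch.randn(1, 2, 128, 64)   # 2 KV heads, each shared by 4 query heads
v = torch.randn(1, 2, 128, 64)

out = F.scaled_dot_product_attention(q, k, v, enable_gqa=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```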
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157893
Approved by: https://github.com/mingfeima , https://github.com/jansel
2025-07-13 09:49:02 +00:00
e1a20988f3
[Quant][CPU] Enable fp8 qconv ( #157076 )
...
**Summary**
Enable fp8 qconv on CPU. It's part of the plan to enable fp8 static quantization on CPU. This PR only adds FP8 support of the existing int8 qconv op. It does not add a new op nor does it affect frontend or quantization flow. The schema of the qconv op is not changed either.
So, the FP8 qconv shares the same op as INT8 qconv and the difference is that src/wei dtype is fp8 instead of int8. The output dtype can be fp8/float32/bfloat16. The implementation uses the oneDNN library.
Note:
OneDNN does not support quantized fp8 convolution until v3.9, but the version used in PyTorch is v3.7.2. So, the op goes to the reference kernel for now. We have also updated the oneDNN path so that it's compatible with the fp8 dtype. Once oneDNN is upgraded to v3.9 or newer, minimal changes are needed to enable the oneDNN path, and we have ensured that the behavior of the reference kernel is the same as the new oneDNN implementation.
- oneDNN version < 3.9 (now)
  - Always go to the reference kernel
- oneDNN version >= 3.9 (future)
  - Go to the reference kernel on old platforms (without AMX)
  - Use oneDNN on new platforms (with AMX)
**Test plan**
```
pytest test/quantization/core/test_quantized_op.py -k "qconv and fp8"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157076
Approved by: https://github.com/leslie-fang-intel , https://github.com/jerryzh168
2025-07-11 10:00:57 +00:00
a3ec6d64b2
Update test after CUTLASS upgrade ( #157903 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157903
Approved by: https://github.com/ngimel
2025-07-10 20:10:20 +00:00
4cc8b60d1b
[BE][1/16] fix typos in torch/ ( #156311 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156311
Approved by: https://github.com/albanD
2025-07-09 11:02:22 +00:00
ed5d6d2a20
python definitely_contiguous-> is_contiguous_or_false ( #156515 )
...
We can probably avoid having these in Python as well and just depend on the C++ impl after we land https://github.com/pytorch/pytorch/pull/155590 , but that is for a different PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156515
Approved by: https://github.com/bobrenjc93
2025-06-30 17:31:51 +00:00
75a7d9e868
Revert "python definitely_contiguous-> is_contiguous_or_false ( #156515 )"
...
This reverts commit 4c0091fda65b714fa73671a15e379f814af153e0.
Reverted https://github.com/pytorch/pytorch/pull/156515 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause some torch.export failures internally ([comment](https://github.com/pytorch/pytorch/pull/156515#issuecomment-3014104570 ))
2025-06-27 19:07:06 +00:00
cbcffce48a
address remaining straight forward gso in meta_registrations ( #156902 )
...
These are all straightforward generalizations of existing checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156902
Approved by: https://github.com/ColinPeppler
2025-06-27 06:19:54 +00:00
4c0091fda6
python definitely_contiguous-> is_contiguous_or_false ( #156515 )
...
We can probably avoid having these in Python as well and just depend on the C++ impl after we land https://github.com/pytorch/pytorch/pull/155590 , but that is for a different PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156515
Approved by: https://github.com/bobrenjc93
2025-06-26 00:47:14 +00:00
04178d347c
[Reland] [Intel GPU] Make SDPA output has the same stride as Query. ( #154340 )
...
Fixes [#153903 ](https://github.com/pytorch/pytorch/issues/153903 ).
Currently the output tensor of SDPA on XPU is always allocated with contiguous strides, while the CPU/CUDA flash_attention and cudnn_attention backends allocate the output with the same strides as the query.
This PR aligns XPU's behavior with CUDA/CPU so that XPU is compatible with CPU/CUDA modeling code.
The function `alloc_with_matching_layout` is copied from the cuDNN implementation: 8c16d0e404/aten/src/ATen/native/cudnn/MHA.cpp (L874)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154340
Approved by: https://github.com/guangyey , https://github.com/drisspg
2025-06-24 06:09:59 +00:00