81b7b16618
Reland "[Fix XPU CI][Inductor UT] Fix test cases broken by community. ( #161142 )" ( #161949 )
...
This PR relands #161142, which was reverted to make it possible to revert another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161949
Approved by: https://github.com/jansel
2025-09-02 23:43:27 +00:00
54e275e0d8
Revert "[Fix XPU CI][Inductor UT] Fix test cases broken by community. ( #161142 )"
...
This reverts commit c83cbd2f2a2de2e3258f07de77d8740743df6d2d.
Reverted https://github.com/pytorch/pytorch/pull/161142 on behalf of https://github.com/jeanschmidt due to This PR needs to be reverted to be able to revert another PR, this is due to merge conflicts, I am sorry for this. Please feel free to rebase and merge at your earliest convenience ([comment](https://github.com/pytorch/pytorch/pull/161142#issuecomment-3242937640 ))
2025-09-01 17:03:50 +00:00
c83cbd2f2a
[Fix XPU CI][Inductor UT] Fix test cases broken by community. ( #161142 )
...
Fixes #161384 , Fixes #161162 , Fixes #160946 , Fixes #160947 , Fixes #160948
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161142
Approved by: https://github.com/jansel
2025-08-30 11:09:07 +00:00
8c506e6310
[easy][test] Add repeat_interleave opinfo that exercises binary search fusion ( #161445 )
...
This adds a configuration that would have caught the need for https://github.com/pytorch/pytorch/pull/159961 when https://github.com/pytorch/pytorch/pull/158462 was landed.
Notably:
* the test has output_size kwarg specified
* the input is 1D plus a size-1 dimension (otherwise, if there are non-size-1 dimensions, then the fusion won't occur)
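The fused configuration described above can be exercised directly; a minimal sketch (shapes and values are illustrative, not taken from the opinfo itself):

```python
import torch

# A 1D input plus a size-1 dimension, with the output_size kwarg specified,
# matching the configuration the opinfo adds.
x = torch.tensor([[1], [2], [3]])
repeats = torch.tensor([2, 1, 3])
out = torch.repeat_interleave(x, repeats, dim=0, output_size=6)
# out has shape (6, 1), repeating each row per `repeats`
```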
Differential Revision: [D80981715](https://our.internmc.facebook.com/intern/diff/D80981715 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161445
Approved by: https://github.com/eellison , https://github.com/v0i0
2025-08-26 12:32:24 +00:00
6382302990
[MPS] Add grid_sampler_3d for MPS ( #160541 )
...
This PR adds support for `grid_sampler_3d` for MPS with "bilinear" interpolation.
NOTE: "nearest" interpolation is not yet supported
Fixes #159882
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160541
Approved by: https://github.com/malfet
2025-08-15 16:19:25 +00:00
db0b7f1cc9
[BE][CI] Adjust error_inputs for cat and complex ( #160378 )
...
MPS backend does not support double, so errors should be different
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160378
Approved by: https://github.com/dcci
2025-08-13 18:35:06 +00:00
df55ec7d4b
[OpInfo][BE] Better inputs for addmm ( #160234 )
...
Right now alpha and beta are both less than zero, which makes them useless for all addmm samples for integral types
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160234
Approved by: https://github.com/Skylion007
ghstack dependencies: #160228
2025-08-10 01:26:48 +00:00
f5e2de928b
[BE] fix remaining flake8 v7 warnings ( #159044 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159044
Approved by: https://github.com/Skylion007
ghstack dependencies: #159043
2025-07-25 02:56:34 +00:00
7f649ed4f8
Add basic torch.hash_tensor op ( #154149 )
...
Added `torch.hash_tensor` reduction function with a `mode` argument that defaults to reduction with xor.
- The hash is always uint64.
- Integers are cast to uint64 before performing the xor_sum reduction
- Floats are upcast to double and then bitcast to uint64 before performing the xor_sum reduction
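The described float semantics can be sketched in plain Python (this mirrors the description above, not the actual kernel):

```python
import struct
from functools import reduce

def xor_hash_floats(values):
    # Upcast each value to double, bitcast the 8 bytes to a uint64,
    # then xor-reduce, mirroring the default xor_sum mode described above.
    bits = (struct.unpack("<Q", struct.pack("<d", float(v)))[0] for v in values)
    return reduce(lambda a, b: a ^ b, bits, 0)
```

One consequence of the xor mode: any value appearing an even number of times cancels out of the hash.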
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154149
Approved by: https://github.com/albanD
2025-07-23 22:28:03 +00:00
2cdafab0bd
[BE] Raise ValueError from torch.cat meta func ( #158249 )
...
Followup after https://github.com/pytorch/pytorch/pull/155460
From [Python documentation](https://docs.python.org/3/library/exceptions.html#ValueError ):
> Raised when an operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.
Raise [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError ) when input-output types are incompatible with each other
> Raised when an operation or function is applied to an object of inappropriate type. The associated value is a string giving details about the type mismatch.
> This exception may be raised by user code to indicate that an attempted operation on an object is not supported, and is not meant to be. If an object is meant to support a given operation but has not yet provided an implementation, [NotImplementedError](https://docs.python.org/3/library/exceptions.html#NotImplementedError ) is the proper exception to raise.
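A toy validator (hypothetical, not the actual meta function) illustrating the ValueError/TypeError split described above:

```python
def check_cat_args(shapes, dtypes, out_dtype):
    # Wrong kind of thing entirely -> TypeError.
    if not all(isinstance(s, tuple) for s in shapes):
        raise TypeError("expected a sequence of shapes (tuples)")
    # Right types, but values that cannot work together -> ValueError.
    if len(set(len(s) for s in shapes)) > 1:
        raise ValueError("tensors must have the same number of dimensions")
    # Incompatible input/output types -> TypeError, per the rule above.
    if any(dt != out_dtype for dt in dtypes):
        raise TypeError("input dtypes are incompatible with the out dtype")
```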
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158249
Approved by: https://github.com/jbschlosser , https://github.com/Skylion007 , https://github.com/albanD
2025-07-20 23:49:18 +00:00
794b95d54b
Enable Half dtype for logcumsumexp_backward ( #157512 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157512
Approved by: https://github.com/malfet
2025-07-03 18:13:38 +00:00
c553c55be7
Revert "Fix full_like decomposition to preserve strides ( #144765 )"
...
This reverts commit 01b0f09931d47bd2716398a0c335b2807dc3074d.
Reverted https://github.com/pytorch/pytorch/pull/144765 on behalf of https://github.com/jeanschmidt due to Seems to be breaking internal tests see [D77652778](https://www.internalfb.com/diff/D77652778 ), @jansel may you help get this PR merged? ([comment](https://github.com/pytorch/pytorch/pull/144765#issuecomment-3027975098 ))
2025-07-02 13:56:03 +00:00
01b0f09931
Fix full_like decomposition to preserve strides ( #144765 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144765
Approved by: https://github.com/amjames , https://github.com/jansel
2025-07-01 19:13:22 +00:00
a4b59498c5
Fix fake kernel for the out=... variant of unbind_copy ( #156643 )
...
`unbind_copy(..., out=...)` returns None rather than the `out` argument
(see https://github.com/pytorch/pytorch/issues/130829#issuecomment-2283936222 ),
but the old fake kernel didn't account for that and caused an assertion
failure in `pushPyOutToStack`. This patch fixes that.
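The behavior in question can be seen in eager mode (shapes here are illustrative):

```python
import torch

x = torch.arange(6).reshape(2, 3)
outs = (torch.empty(3, dtype=x.dtype), torch.empty(3, dtype=x.dtype))
ret = torch.unbind_copy(x, dim=0, out=outs)
# Per the linked issue, the out= variant returns None rather than `outs`;
# the results land in the preallocated tensors.
```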
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156643
Approved by: https://github.com/zou3519 , https://github.com/jansel , https://github.com/bdhirsh
ghstack dependencies: #156642
2025-06-27 01:34:07 +00:00
cec2977ed2
[BE][6/16] fix typos in torch/ ( #156316 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156316
Approved by: https://github.com/albanD
ghstack dependencies: #156313 , #156314 , #156315
2025-06-23 02:57:34 +00:00
3f44fdc03d
Revert "[BE][6/16] fix typos in torch/ ( #156316 )"
...
This reverts commit b210cf1ea56bcd9f937a2805d9e70d8684d25ee4.
Reverted https://github.com/pytorch/pytorch/pull/156316 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912 ) [HUD commit link](c95f7fa874 ) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213 ))
2025-06-22 12:31:57 +00:00
b210cf1ea5
[BE][6/16] fix typos in torch/ ( #156316 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156316
Approved by: https://github.com/albanD
ghstack dependencies: #156313 , #156314 , #156315
2025-06-22 08:43:33 +00:00
c2f4cc59a7
[MPS] Fix bug in 3d coords calculation ( #156375 )
...
The bug was not caught by CI beforehand because all current 3D examples are symmetric, so an uneven shape is added to `sample_inputs_interpolate`.
It is, though, indirectly tested by the `test_upsample_nearest3d` inductor test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156375
Approved by: https://github.com/atalman
2025-06-19 19:56:15 +00:00
c28e74e457
[MPS] Add nearest_3d forward and backward ( #156090 )
...
Introduce generalizable `UpsampleParams` structure in `UpSample.h`, which could be shared between CPU and MPS
Delete `upsample_nearest3d` MPS fallback and replace it with proper shader
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156090
Approved by: https://github.com/kulinseth , https://github.com/dcci
ghstack dependencies: #156256
2025-06-18 04:48:15 +00:00
b1713c6655
[MPS][Testing][BE] Fix samples for full_like ( #156026 )
...
Now that device is known, one can avoid creating tensors of `torch.double` type
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156026
Approved by: https://github.com/dcci
ghstack dependencies: #156121
2025-06-17 04:46:26 +00:00
03488d820c
Revert "[MPS][Testing][BE] Fix samples for full_like ( #156026 )"
...
This reverts commit 2d832c9587fd99db295b62d0c9b459d509c19d06.
Reverted https://github.com/pytorch/pytorch/pull/156026 on behalf of https://github.com/atalman due to Sorry breaks MPS tests: test_ops.py::TestMathBitsCPU::test_neg_view_full_like_cpu_float64 [GH job link](https://github.com/pytorch/pytorch/actions/runs/15683608879/job/44182730620 ) [HUD commit link](2d832c9587 ) ([comment](https://github.com/pytorch/pytorch/pull/156026#issuecomment-2977903074 ))
2025-06-16 19:50:26 +00:00
2d832c9587
[MPS][Testing][BE] Fix samples for full_like ( #156026 )
...
Now that device is known, one can avoid creating tensors of `torch.double` type
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156026
Approved by: https://github.com/dcci
2025-06-16 14:27:42 +00:00
8a22551300
Fixes OpInfo gradient checks for ctc_loss ( #154590 )
...
Fixes #67462
Re-enables `OpInfo` gradient checks for the restricted scenarios where the current `ctc_loss` implementation is accurate and consistent.
The desired `ctc_loss` gradient behavior appears to be an ongoing discussion, see
https://github.com/pytorch/pytorch/issues/52241 . The `OpInfo` gradient checks can be updated if/as the underlying implementation advances.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154590
Approved by: https://github.com/soulitzer
2025-06-10 19:56:39 +00:00
abbdf9f363
[BE][Testing] Unskip ones_like/zeros_like testing on MPS ( #155476 )
...
But skip `double` dtype from OpInfo variants for this test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155476
Approved by: https://github.com/Skylion007 , https://github.com/dcci
2025-06-09 20:37:44 +00:00
7999735d23
[CUDA][MPS] Fix torch.arange bound validation for large float inputs ( #154320 )
...
Fixes #153133
Fixes an inconsistency in torch.arange on CUDA and MPS backends when using float32 and large input values. Previously, invalid ranges (e.g., start > end with a positive step) could silently return empty tensors due to precision loss in validation logic.
The fix introduces double precision validation for checking whether the step sign is consistent with the range direction.
This ensures torch.arange behaves consistently with CPU for large float32 inputs, and raises an appropriate error when the range is invalid.
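On CPU (the reference behavior the fix aligns with), an inconsistent step sign raises an error; a small illustration (values chosen for brevity, not the large-input case from the issue):

```python
import torch

# start > end with a positive step: invalid range direction.
try:
    torch.arange(10.0, 0.0, 1.0)
    raised = False
except RuntimeError:
    raised = True
# CPU raises here; after the fix, CUDA and MPS raise as well instead of
# silently returning an empty tensor for large float32 inputs.
```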
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154320
Approved by: https://github.com/malfet
2025-06-05 14:51:25 +00:00
34e3930401
fix numpy compatibility for 2d small list indices ( #154806 )
...
Will fix #119548 and linked issues once we switch from warning to the new behavior,
but for now, given how much this syntax was used in our test suite, we suspect a silent change will be disruptive.
We will change the behavior after 2.8 branch is cut.
Numpy behavior was changed at least in numpy 1.24 (more than 2 years ago)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154806
Approved by: https://github.com/cyyever , https://github.com/Skylion007 , https://github.com/albanD
2025-06-04 01:58:52 +00:00
e9266f807a
[BE] Use vendored packaging for testing ( #154946 )
...
As the rest of torch uses it, tests should rely on it as well
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154946
Approved by: https://github.com/cyyever , https://github.com/Skylion007
2025-06-03 14:22:53 +00:00
edc2d539d1
torch.tensordot: performance improvements when contracting to a scalar. (#145936 )
...
As per title.
Fixes https://github.com/pytorch/pytorch/issues/145731
Touches only compute. The CPU overhead can potentially be further reduced.
Before:
```python
In [3]: n = 512
In [4]: A = torch.rand(n, n)
In [5]: B = torch.rand(n, n)
In [6]: %timeit torch.tensordot(A, B, [[0, 1], [0, 1]])
2.04 ms ± 70 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [7]: %timeit torch.tensordot(A, B, [[0, 1], [1, 0]])
2.85 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: %timeit torch.tensordot(A, B, [[1, 0], [0, 1]])
2.9 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [9]: %timeit torch.tensordot(A, B, [[1, 0], [1, 0]])
4.07 ms ± 262 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
After
```python
In [2]: n = 512
In [3]: A = torch.rand(n, n)
In [4]: B = torch.rand(n, n)
In [5]: %timeit torch.tensordot(A, B, [[0, 1], [0, 1]])
30.7 µs ± 2.51 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [6]: %timeit torch.tensordot(A, B, [[0, 1], [1, 0]])
141 µs ± 6.52 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [7]: %timeit torch.tensordot(A, B, [[1, 0], [0, 1]])
142 µs ± 4.03 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [8]: %timeit torch.tensordot(A, B, [[1, 0], [1, 0]])
62.8 µs ± 4.31 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```
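The contraction-to-scalar cases benchmarked above are numerically just reductions of elementwise products; a quick correctness check (illustrative, not from the PR):

```python
import torch

A = torch.rand(4, 4)
B = torch.rand(4, 4)
# Contracting all dims in matching order is sum(A * B); with the dim
# pairs swapped on one side it becomes sum(A * B.T).
s0 = torch.tensordot(A, B, dims=([0, 1], [0, 1]))
s1 = torch.tensordot(A, B, dims=([0, 1], [1, 0]))
assert torch.allclose(s0, (A * B).sum())
assert torch.allclose(s1, (A * B.t()).sum())
```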
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145936
Approved by: https://github.com/albanD , https://github.com/ngimel
2025-05-13 10:57:30 +00:00
fe8ebacee4
[ROCm] Upgrade ROCm CI to ROCm6.4 ( #151368 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151368
Approved by: https://github.com/jeffdaily , https://github.com/malfet
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-05-08 16:12:16 +00:00
9919d6b872
[Testing] Add copysign from scalar regression test ( #152997 )
...
But instead of adding it just for MPS backend, add it to OpInfo
Fixes https://github.com/pytorch/pytorch/issues/152582
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152997
Approved by: https://github.com/wdvr
2025-05-07 00:19:42 +00:00
cc28b43950
Revert "[ROCm] Upgrade ROCm CI to ROCm6.4 ( #151368 )"
...
This reverts commit 844842dfbf937c43b41c528e461d3f3931bca6e9.
Reverted https://github.com/pytorch/pytorch/pull/151368 on behalf of https://github.com/malfet due to This broke inductor cpp wrapper ([comment](https://github.com/pytorch/pytorch/pull/151368#issuecomment-2848519706 ))
2025-05-03 08:31:31 +00:00
216d81da81
[CUDA][complex] skip test_reference_numerics_large_jiterator_unary_cuda_complex64 on CUDA ( #148024 )
...
Already skipped on ROCm for a similar reason: recent numpy versions changed the convention from `nan+infj` to `-inf+infj`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148024
Approved by: https://github.com/nWEIdia , https://github.com/atalman , https://github.com/malfet
2025-05-02 19:11:11 +00:00
844842dfbf
[ROCm] Upgrade ROCm CI to ROCm6.4 ( #151368 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151368
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com >
2025-05-02 17:21:18 +00:00
f0c9b3385d
Support more dtypes for input, indices in gather ( #151822 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151822
Approved by: https://github.com/ngimel
2025-05-01 16:35:23 +00:00
bb90f66e70
[CUDA][conv3d] bump tolerances for test_variant_consistency_eager conv3d complex64 ( #152203 )
...
~1/1000 1.5e-5 mismatch on A100
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152203
Approved by: https://github.com/Skylion007 , https://github.com/soulitzer
2025-04-28 17:59:37 +00:00
e2f9759bd0
Fix broken URLs ( #152237 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237
Approved by: https://github.com/huydhn , https://github.com/malfet
2025-04-27 09:56:42 +00:00
3ef6d6924a
[BE] Switch TestConsistency to MPS device ( #147893 )
...
Which will eventually allow moving decorators away from `common_mps.py`
Adjust tolerances accordingly. XFAIL a bunch of tests on MacOS-13, which is going to be deprecated anyway
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147893
Approved by: https://github.com/atalman
ghstack dependencies: #152204
2025-04-26 01:19:21 +00:00
82200e33b5
Make torch._chunk_cat support non-contiguous inputs ( #151263 )
...
Currently, `torch._chunk_cat` only supports contiguous inputs (due to `.view()` usage in `_pad_chunk()` supporting only contiguous tensor). This doesn't work for internal models where there can be non-contiguous input tensors:
- size=[8192, 16416], stride=[16448, 1] # stride[0] is larger than size[1]
- size=[1152, 384], stride=[1, 1152] # column-major tensor
In this PR, we relax the assumption on contiguous input tensor, by switching from `.view()` to `.reshape()`. Note that since `.reshape()` will try to use `.view()` under the hood whenever possible, this should not cause regression to existing use cases.
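The view-vs-reshape distinction driving the change can be seen on a column-major tensor (shapes are illustrative):

```python
import torch

t = torch.randn(3, 4).t()  # column-major, i.e. non-contiguous
try:
    t.view(-1)             # view cannot reinterpret these strides
    view_ok = True
except RuntimeError:
    view_ok = False
flat = t.reshape(-1)       # reshape falls back to a copy when a view fails
```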
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151263
Approved by: https://github.com/BoyuanFeng
2025-04-16 04:18:46 +00:00
ddfc14b3ae
[MPS] Fix where ( #151176 )
...
Fixes #150967
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151176
Approved by: https://github.com/kulinseth , https://github.com/malfet
2025-04-13 20:44:50 +00:00
1e92579126
Add torch._scaled_mm for CPU ( #150410 )
...
This PR is the duplicated one for https://github.com/pytorch/pytorch/pull/139975 .
This PR is to add torch._scaled_mm for CPU backend.
`_scaled_mm_out_cpu` and `_scaled_mm_cpu` are newly added and included in the torch._scaled_mm CPU dispatch. We also add `_scaled_mm_out_cpu_emulated` as a fallback when the current platform cannot run FP8 matmul using oneDNN. This PR also updates the various FP8-related UTs to support CPU tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150410
Approved by: https://github.com/atalman
2025-04-11 02:23:03 +00:00
d751698a36
Support negative values for fill with uint tensors ( #144458 )
...
Fixes https://github.com/pytorch/pytorch/issues/144188
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144458
Approved by: https://github.com/amjames , https://github.com/eellison
2025-04-09 21:08:06 +00:00
e6bd133866
add batching rule for torch.Tensor.scatter_add_ ( #150543 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150543
Approved by: https://github.com/zou3519
2025-04-08 18:00:10 +00:00
881d99495d
Add more check for torch.ormqr ( #150759 )
...
As the title states.
Please refer to https://github.com/pytorch/pytorch/issues/150674 for more info.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150759
Approved by: https://github.com/lezcano
2025-04-08 08:26:05 +00:00
4854926aeb
Revert "Add torch._scaled_mm for CPU ( #150410 )"
...
This reverts commit 3b02f795c5ad2339794b15b370c0e4a235d36adf.
Reverted https://github.com/pytorch/pytorch/pull/150410 on behalf of https://github.com/malfet due to It breaks ROCM tests ([comment](https://github.com/pytorch/pytorch/pull/150410#issuecomment-2777704212 ))
2025-04-04 06:52:54 +00:00
3b02f795c5
Add torch._scaled_mm for CPU ( #150410 )
...
This PR is the duplicated one for https://github.com/pytorch/pytorch/pull/139975 .
This PR is to add torch._scaled_mm for CPU backend.
`_scaled_mm_out_cpu` and `_scaled_mm_cpu` are newly added and included in the torch._scaled_mm CPU dispatch. We also add `_scaled_mm_out_cpu_emulated` as a fallback when the current platform cannot run FP8 matmul using oneDNN. This PR also updates the various FP8-related UTs to support CPU tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150410
Approved by: https://github.com/atalman
2025-04-03 19:43:45 +00:00
68414512e6
Implement aten.select.int sharding strategy ( #149842 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149842
Approved by: https://github.com/XilunWu
2025-03-27 20:49:00 +00:00
2c4bc65366
[aotd] Guess tangents stride as output strides ( #144579 )
...
AOTDispatch, while preparing the AOT backward graph, does not know the real tangents the user will pass when running backward, so AOTD guesses them. Previously we guessed that the memory format of the tangents would match the memory format of the corresponding outputs, and if the tangents specified at runtime did not match the guess made during compilation, AOTD coerced (copied) them to the guessed memory_format.
But as Horace found, there are popular use cases where the outputs of the compiled region are in a specific memory format, e.g. a 4D tensor with dims 1 and 2 transposed:
https://github.com/karpathy/nanoGPT/blob/master/model.py#L57
This PR changes the logic so that AOTD expects the same "strideness" of tangents as the outputs. As a result, it avoids the coercion in the transposed-dims case.
Limitations:
We keep guessing memory_format for:
1/ Dynamic shapes (needs more changes)
2/ Tensor subclasses (needs more changes)
Other changes:
test_torchinductor was always creating contiguous tangents via `torch.randn()`; they are changed to `torch.randn_like()` so the computation is compared with the same strideness
(e.g. for cuda float16, strideness affects numerics for fft ops).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144579
Approved by: https://github.com/bdhirsh
2025-03-20 15:41:36 +00:00
d5b1d99f78
Enable more nightly tests on s390x ( #148452 )
...
Also enable some tests which probably were accidentally disabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148452
Approved by: https://github.com/seemethere , https://github.com/malfet
2025-03-18 16:09:39 +00:00
67742128b7
[ROCm] Bump AOTriton to 0.9.2b ( #148433 )
...
Notable new features/optimizations for SDPA operators on AMD systems from AOTriton 0.9b:
* Optimized non-power-of-two head dimensions 48, 80, 96, 160, 192, 224. Inputs with these head dimensions no longer need padding to a power of two.
* `is_causal=True` cases are now supported with persistent dynamic algorithm, which requires an atomic tensor but does load balance between different CTAs
* `dropout_p > 0.0` cases now support full 64-bit offsets and use all i64x4 PRNG outputs
* The precise AOTriton shared library version can now be identified with `readelf -p .comment libaotriton_v2.so`
+ However, this does not guarantee the GPU images stored under `aotriton.images` have the same version, since they can be overwritten.
* The newly added fused backward kernel will be used for smaller workloads, due to less kernel invocation overhead.
* Support gfx1201 (RX 9070XT). Need to be enabled at runtime with `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148433
Approved by: https://github.com/jeffdaily
2025-03-07 22:10:07 +00:00
96176e32a9
Revert "[ROCm] Bump AOTriton to 0.9.1b ( #148433 )"
...
This reverts commit 8af79b7ec816f5c73536a806aa4c7ea1f7bd3867.
Reverted https://github.com/pytorch/pytorch/pull/148433 on behalf of https://github.com/jovianjaison due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/148433#issuecomment-2704638858 ))
2025-03-06 18:32:48 +00:00