Commit Graph

376 Commits

Author SHA1 Message Date
0ce945790e [NJT] Fix schema validation error in jagged functions (#165307)
Fixes #161812
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165307
Approved by: https://github.com/soulitzer
2025-10-13 17:59:18 +00:00
5e58420dff LocalTensor (#164537)
A LocalTensor is a tensor subclass which simulates a tensor that is
distributed across SPMD ranks.  A LocalTensor might be size N, but in fact
there are world_size shards/replicas of it stored internally.  When you do a
plain PyTorch operation on it, we apply the operation to each shard; when you
do a collective, we do the mathematically equivalent operation on the local
shards.  A LocalTensor is associated with a list of ranks which specify
which ranks it holds local tensors for.

NB: this is NOT a DataParallel-like abstraction where you can run operations
on multiple different GPUs. It is intended purely for *debugging* purposes;
the overhead is almost certainly too high to keep eight GPUs busy (even the C++
autograd needs multithreading to keep up!). (It might potentially be possible
to trace through this with torch.compile and then compile it with CUDA graphs,
but this is currently a non-goal.)

In order to handle MPMD, we provide a helper decorator that allows you to
run a function with no side effects for each LocalTensor shard and combine
results back into LocalTensor or LocalIntNode.
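
A toy sketch of the per-shard semantics described above (names and structure here are hypothetical, purely illustrative, and not the PR's actual implementation):

```python
import torch

# Hypothetical illustration of the LocalTensor idea: one logical tensor backed
# by per-rank shards, plain ops applied shard-by-shard, and collectives replaced
# by the mathematically equivalent operation on the local shards.
class ToyLocalTensor:
    def __init__(self, shards: dict[int, torch.Tensor]):
        self.shards = shards  # rank -> local shard/replica

    def add(self, other: "ToyLocalTensor") -> "ToyLocalTensor":
        return ToyLocalTensor({r: t + other.shards[r] for r, t in self.shards.items()})

    def all_reduce_sum(self) -> "ToyLocalTensor":
        total = sum(self.shards.values())
        return ToyLocalTensor({r: total.clone() for r in self.shards})


t = ToyLocalTensor({0: torch.ones(2), 1: torch.full((2,), 2.0)})
print(t.all_reduce_sum().shards[0])  # tensor([3., 3.]) on every simulated rank
```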

Note: This PR converts all DTensor ops and some DTensor tests to illustrate the
intended usage and ensure correctness. More tests will be converted in
subsequent PRs. During test conversion we aim to share as much of the test
logic as possible between multi-process / multi-threaded and local tensor tests.
We would like developers to be able to run both flavors of the tests.

Note: This work is based on the original proposal
by @ezyang (WIP PR https://github.com/pytorch/pytorch/pull/162753).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164537
Approved by: https://github.com/ezyang
2025-10-12 20:06:41 +00:00
6a31f42da4 Fix NestedTensor max/min operations for integer dtypes. (#162273)
Fixes: https://github.com/pytorch/pytorch/issues/162049

### Summary

The `max_dim` and `min_dim` functions incorrectly used `torch.finfo()`
for all dtypes, causing a `TypeError` for integer tensors.

### Changes

- Use torch.iinfo() for integer dtypes instead of torch.finfo().
- Add CPU test: `test_jagged_max_min_dtypes` covering `int8, int16, int32, int64, uint8, float16, bfloat16, float32 and float64`
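
A minimal sketch of the dtype-dependent selection (the helper name is hypothetical; the PR's actual change lives in the jagged max_dim/min_dim implementations):

```python
import torch

def _identity_value_for_min(dtype: torch.dtype):
    # Integer dtypes have no finfo; pick the right introspection helper per dtype.
    info = torch.finfo(dtype) if dtype.is_floating_point else torch.iinfo(dtype)
    return info.min
```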

### Testing

Before Fix:

`python -m pytest test/test_nestedtensor.py -k "test_jagged_max_min_dtypes" -v`

Output:

```
FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_bfloat16 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'
FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float16 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'
FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float32 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'
FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float64 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'
FAILED [0.0006s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int16 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'
FAILED [0.0005s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int32 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'
FAILED [0.0005s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int64 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'
FAILED [0.0004s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int8 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'
FAILED [0.0004s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_uint8 - TypeError: torch.finfo() requires a floating point input type. Use torch.iinfo to handle 'torch.finfo'
```

After Fix:

`python -m pytest test/test_nestedtensor.py -k "test_jagged_max_min_dtypes" -v`

Output:

```
Running 9 items in this shard

test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_bfloat16 PASSED [0.0086s]                                                                                                                   [ 11%]
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float16 PASSED [0.0011s]                                                                                                                    [ 22%]
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float32 PASSED [0.0011s]                                                                                                                    [ 33%]
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_float64 PASSED [0.0011s]                                                                                                                    [ 44%]
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int16 PASSED [0.0009s]                                                                                                                      [ 55%]
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int32 PASSED [0.0010s]                                                                                                                      [ 66%]
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int64 PASSED [0.0010s]                                                                                                                      [ 77%]
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_int8 PASSED [0.0010s]                                                                                                                       [ 88%]
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_jagged_max_min_dtypes_cpu_uint8 PASSED [0.0011s]                                                                                                                       [100%]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162273
Approved by: https://github.com/Skylion007, https://github.com/jbschlosser
2025-10-02 18:46:27 +00:00
fd785b1762 Add NestedTensor dispatch for _is_any_true/_is_all_true (#162096)
Fixes: https://github.com/pytorch/pytorch/issues/161818

### Summary
Add NestedTensor support for `_is_any_true` and `_is_all_true`.

### Changes
- Register dispatch for `aten._is_any_true.default` and
  `aten._is_all_true.default`
- Add CPU tests:
  - `test_is_any_true_jagged`: dispatch_matches_values_buffer,
    all_false_returns_false, one_true_returns_true
  - `test_is_all_true_jagged`: dispatch_matches_values_buffer,
    all_true_returns_true, any_false_returns_false
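
A hedged sketch of what the jagged dispatch above could look like (not the PR's exact registration code):

```python
import torch

# For a jagged-layout NestedTensor these reductions depend only on the values
# buffer, since offsets/lengths describe layout rather than data.
def _is_any_true_jagged(nt: torch.Tensor) -> torch.Tensor:
    return torch.ops.aten._is_any_true.default(nt.values())

def _is_all_true_jagged(nt: torch.Tensor) -> torch.Tensor:
    return torch.ops.aten._is_all_true.default(nt.values())
```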

### Testing

Before Fix:

`pytest -q test/test_nestedtensor.py -k "test_is_any_true_jagged or test_is_all_true_jagged" -v`

Output:
```
FAILED [0.0129s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_is_all_true_jagged_cpu - NotImplementedError: aten._is_all_true.default
FAILED [0.0007s] test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_is_any_true_jagged_cpu - NotImplementedError: aten._is_any_true.default
```

After Fix:

`pytest -q test/test_nestedtensor.py -k "test_is_any_true_jagged or test_is_all_true_jagged" -v`

Output:

```
Running 2 items in this shard

test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_is_all_true_jagged_cpu PASSED [0.0277s]                                                                                                                               [ 50%]
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_is_any_true_jagged_cpu PASSED [0.0013s]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162096
Approved by: https://github.com/jbschlosser
2025-09-22 20:22:44 +00:00
bf28990c3d Add support for NestedTensor share_memory_ (#162272)
Fixes: https://github.com/pytorch/pytorch/issues/161915

### Summary

Implements share_memory_() support for NestedTensor!

### Changes

- Added share_memory_() method to NestedTensor class.
  - Shares storage for all NestedTensor components: _values, _offsets, _lengths, and cached seqlen tensors.
  - Guard for CUDA Tensors.
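
A hedged sketch of the idea (not the PR's exact method body; component names follow the description above):

```python
import torch

def njt_share_memory_(nt: torch.Tensor) -> torch.Tensor:
    # Move each constituent buffer's storage into shared memory.
    for component in (nt._values, nt._offsets, nt._lengths):
        if component is not None:
            component.share_memory_()
    return nt
```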

### Testing

Before Fix:

`pytest -q test/test_nestedtensor.py -k "test_share_memory" -v`

Output:

```
Running 1 items in this shard

test/test_nestedtensor.py Fatal Python error: Segmentation fault
```

After Fix:

`pytest -q test/test_nestedtensor.py -k "test_share_memory" -v`

Output:

```
Running 1 items in this shard

test/test_nestedtensor.py::TestNestedTensorDeviceTypeCPU::test_share_memory_cpu PASSED [0.0753s]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162272
Approved by: https://github.com/jbschlosser
2025-09-22 19:59:58 +00:00
d08cabe314 [BC Breaking] Remove flex + njt code paths (#161734)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161734
Approved by: https://github.com/jbschlosser
2025-09-16 00:13:56 +00:00
ac9ccd0dc2 Add return-max-scores to flex-attention (#161667)
# Summary

### Update

API

```Py
from typing import NamedTuple, Optional

from torch import Tensor
from torch.nn.attention.flex_attention import flex_attention


class AuxRequest(NamedTuple):
    """Request which auxiliary outputs to compute from flex_attention.

    Each field is a boolean indicating whether that auxiliary output should be computed.
    """

    lse: bool = False
    max_scores: bool = False


class AuxOutput(NamedTuple):
    """Auxiliary outputs from flex_attention operation.

    Fields will be None if not requested, or contain the tensor if requested.
    """

    lse: Optional[Tensor] = None
    max_scores: Optional[Tensor] = None


# query, key, value, score_mod defined elsewhere
out_only = flex_attention(query, key, value, score_mod)
out_max, aux_max = flex_attention(
    query,
    key,
    value,
    score_mod,
    return_aux=AuxRequest(max_scores=True),
)
out_both, aux_both = flex_attention(
    query,
    key,
    value,
    score_mod,
    return_aux=AuxRequest(lse=True, max_scores=True),
)
```

Returns the max post mod scores from flex attention.

Not being able to break BC is kind of annoying here, since we end up with a combinatorial problem: any additional return value needs a new kwarg gating whether it gets returned, and we have to support the 2**N possible return groups.

Ideally there isn't much more we need to return, but we might want to think about how best to set this up for expansion in the future. I added a keyword-only argument for now.

Maybe we make a `ExtraReturns` type kwarg that can grow and we don't need to keep adding new top level args.

We could also return a Struct that holds all the extra tensors and start deprecation cycle for logsumexp eventually returning just 1 `ExtraReturns` like struct with the tensors.

### Req Grad
I currently don't return a max_scores that supports backpropagating grads. I think this might be feasible, but since max is essentially one-hot on the inputs and a reduction, we would either need to save another `max_location` from the forward, or recompute the max score and apply the gradient only to the first occurrence when there are multiple equal scores (need to check whether that's how the vanilla max op in torch is defined).

For now no grad, we can re-visit if needed.

## Perf
I am going to disable this for flex_decode, since at least initially the motivation is for training. It is also harder than it should be to have ops return None or optional tensors. If return_max is false, we should probably just create a tensor of size zero so that we don't slow down the hot path.
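
A hedged sketch of that zero-size-placeholder idea (names and shapes are illustrative, not the kernel's actual signature):

```python
import torch

def _maybe_max_scores(requested: bool, query: torch.Tensor) -> torch.Tensor:
    # If max_scores is not requested, hand back an empty tensor instead of None
    # so the op's return arity and types stay fixed on the hot path.
    if requested:
        return query.new_empty(query.shape[:-1])  # illustrative shape only
    return query.new_empty(0)
```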

```Shell
🔝 Top 5 TFlops Deltas (by absolute %):
shape: (5, 7)
┌────────────────┬────────────────┬───────────────────────┬───────────────┬──────────────┬───────────┬───────────┐
│ attn_type      ┆ dtype          ┆ shape(B,Hq,M,Hkv,N,D) ┆ TFlops (base) ┆ TFlops (max) ┆ delta     ┆ pct_delta │
│ ---            ┆ ---            ┆ ---                   ┆ ---           ┆ ---          ┆ ---       ┆ ---       │
│ str            ┆ str            ┆ str                   ┆ f64           ┆ f64          ┆ f64       ┆ f64       │
╞════════════════╪════════════════╪═══════════════════════╪═══════════════╪══════════════╪═══════════╪═══════════╡
│ causal         ┆ torch.bfloat16 ┆ (4, 16, 2048, 16,     ┆ 249.514658    ┆ 243.078974   ┆ 6.435684  ┆ 2.647569  │
│                ┆                ┆ 2048, 64)             ┆               ┆              ┆           ┆           │
│ alibi          ┆ torch.bfloat16 ┆ (2, 16, 1024, 16,     ┆ 57.971274     ┆ 56.633641    ┆ 1.337633  ┆ 2.361905  │
│                ┆                ┆ 1024, 64)             ┆               ┆              ┆           ┆           │
│ noop           ┆ torch.bfloat16 ┆ (4, 16, 1024, 16,     ┆ 244.052884    ┆ 248.65129    ┆ -4.598406 ┆ -1.849339 │
│                ┆                ┆ 1024, 64)             ┆               ┆              ┆           ┆           │
│ noop           ┆ torch.bfloat16 ┆ (2, 16, 1024, 16,     ┆ 280.71254     ┆ 275.686991   ┆ 5.025549  ┆ 1.822918  │
│                ┆                ┆ 1024, 128)            ┆               ┆              ┆           ┆           │
│ sliding_window ┆ torch.bfloat16 ┆ (2, 16, 16384, 16,    ┆ 152.970031    ┆ 150.489109   ┆ 2.480923  ┆ 1.648573  │
│                ┆                ┆ 16384, 64)            ┆               ┆              ┆           ┆           │
└────────────────┴────────────────┴───────────────────────┴───────────────┴──────────────┴───────────┴───────────┘

🔺 Top 5 Positive TFlops Deltas (highest +%):
shape: (5, 7)
┌────────────────┬────────────────┬────────────────────────┬───────────────┬──────────────┬──────────┬───────────┐
│ attn_type      ┆ dtype          ┆ shape(B,Hq,M,Hkv,N,D)  ┆ TFlops (base) ┆ TFlops (max) ┆ delta    ┆ pct_delta │
│ ---            ┆ ---            ┆ ---                    ┆ ---           ┆ ---          ┆ ---      ┆ ---       │
│ str            ┆ str            ┆ str                    ┆ f64           ┆ f64          ┆ f64      ┆ f64       │
╞════════════════╪════════════════╪════════════════════════╪═══════════════╪══════════════╪══════════╪═══════════╡
│ causal         ┆ torch.bfloat16 ┆ (4, 16, 2048, 16,      ┆ 249.514658    ┆ 243.078974   ┆ 6.435684 ┆ 2.647569  │
│                ┆                ┆ 2048, 64)              ┆               ┆              ┆          ┆           │
│ alibi          ┆ torch.bfloat16 ┆ (2, 16, 1024, 16,      ┆ 57.971274     ┆ 56.633641    ┆ 1.337633 ┆ 2.361905  │
│                ┆                ┆ 1024, 64)              ┆               ┆              ┆          ┆           │
│ noop           ┆ torch.bfloat16 ┆ (2, 16, 1024, 16,      ┆ 280.71254     ┆ 275.686991   ┆ 5.025549 ┆ 1.822918  │
│                ┆                ┆ 1024, 128)             ┆               ┆              ┆          ┆           │
│ sliding_window ┆ torch.bfloat16 ┆ (2, 16, 16384, 16,     ┆ 152.970031    ┆ 150.489109   ┆ 2.480923 ┆ 1.648573  │
│                ┆                ┆ 16384, 64)             ┆               ┆              ┆          ┆           │
│ causal         ┆ torch.bfloat16 ┆ (4, 16, 1024, 16,      ┆ 161.031318    ┆ 158.597808   ┆ 2.43351  ┆ 1.534391  │
│                ┆                ┆ 1024, 64)              ┆               ┆              ┆          ┆           │
└────────────────┴────────────────┴────────────────────────┴───────────────┴──────────────┴──────────┴───────────┘

🔻 Top 5 Negative TFlops Deltas (lowest -%):
shape: (5, 7)
┌────────────────┬────────────────┬───────────────────────┬───────────────┬──────────────┬───────────┬───────────┐
│ attn_type      ┆ dtype          ┆ shape(B,Hq,M,Hkv,N,D) ┆ TFlops (base) ┆ TFlops (max) ┆ delta     ┆ pct_delta │
│ ---            ┆ ---            ┆ ---                   ┆ ---           ┆ ---          ┆ ---       ┆ ---       │
│ str            ┆ str            ┆ str                   ┆ f64           ┆ f64          ┆ f64       ┆ f64       │
╞════════════════╪════════════════╪═══════════════════════╪═══════════════╪══════════════╪═══════════╪═══════════╡
│ noop           ┆ torch.bfloat16 ┆ (4, 16, 1024, 16,     ┆ 244.052884    ┆ 248.65129    ┆ -4.598406 ┆ -1.849339 │
│                ┆                ┆ 1024, 64)             ┆               ┆              ┆           ┆           │
│ alibi          ┆ torch.bfloat16 ┆ (2, 16, 1024, 4,      ┆ 175.546923    ┆ 177.81205    ┆ -2.265127 ┆ -1.273888 │
│                ┆                ┆ 1024, 128)            ┆               ┆              ┆           ┆           │
│ sliding_window ┆ torch.bfloat16 ┆ (4, 16, 16384, 4,     ┆ 156.282597    ┆ 158.209134   ┆ -1.926537 ┆ -1.217715 │
│                ┆                ┆ 16384, 64)            ┆               ┆              ┆           ┆           │
│ sliding_window ┆ torch.bfloat16 ┆ (2, 16, 2048, 16,     ┆ 232.542929    ┆ 235.140136   ┆ -2.597207 ┆ -1.104536 │
│                ┆                ┆ 2048, 128)            ┆               ┆              ┆           ┆           │
│ alibi          ┆ torch.bfloat16 ┆ (2, 16, 1024, 16,     ┆ 169.652791    ┆ 171.475986   ┆ -1.823195 ┆ -1.063236 │
│                ┆                ┆ 1024, 128)            ┆               ┆              ┆           ┆           │
└────────────────┴────────────────┴───────────────────────┴───────────────┴──────────────┴───────────┴───────────┘
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161667
Approved by: https://github.com/Chillee, https://github.com/BoyuanFeng
2025-09-08 22:44:48 +00:00
1128f4c2a8 [cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for sm90, sm100 (#149282)
cleanup tuple/tensor boilerplate in cuDNN SDPA, preparation for nested/ragged tensor backward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149282
Approved by: https://github.com/drisspg

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-08-08 22:22:48 +00:00
50580b5053 Add minimal nn.functional.log_softmax support for NestedTensor (#159662)
This only works for the jagged layout and for the non-batch and non-jagged dimensions.

I did this mostly by copy-pasting from the existing softmax implementation, but it seems fairly straightforward and I think it should work.
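
A hedged usage sketch, assuming a jagged-layout NJT with the softmax taken over the last (non-batch, non-ragged) dim:

```python
import torch
import torch.nn.functional as F

nt = torch.nested.nested_tensor(
    [torch.randn(3, 5), torch.randn(4, 5)], layout=torch.jagged
)
out = F.log_softmax(nt, dim=-1)  # last dim is neither the batch nor the ragged dim
# Should match applying the op to the flat values buffer directly.
ref = F.log_softmax(nt.values(), dim=-1)
```
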
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159662
Approved by: https://github.com/jbschlosser
2025-08-06 20:34:02 +00:00
bc4b04e058 DeviceCopy should have the same layout as input (#159615)
Summary: Fix https://github.com/pytorch/pytorch/issues/159612

- Fix the meta implementation of `nan_to_num`, it should preserve the stride of the input
- The DeviceCopy IR node should always preserve the input's layout, so we don't end up with a contiguous call during device copy

Test Plan:
```
buck2 run @mode/dev-nosan fbcode//caffe2/test/inductor:test_aot_inductor -- -r test_d2h_copy
```

Rollback Plan:

Differential Revision: D79411407

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159615
Approved by: https://github.com/eellison
2025-08-04 23:56:58 +00:00
fa0355c18d Fix full_like decomposition to preserve strides (#158898)
Summary:
See original PR at: https://github.com/pytorch/pytorch/pull/144765, which landed internally but was reverted due to test failures. Addressing reviewer comments and trying again.

Rollback Plan:

Differential Revision: D78783627

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158898
Approved by: https://github.com/eellison
2025-07-25 20:21:36 +00:00
7f649ed4f8 Add basic torch.hash_tensor op (#154149)
Added `torch.hash_tensor` reduction function with a `mode` argument that defaults to reduction with xor.

- The hash is always uint64.
- Integers will be cast to uint64 before performing the xor_sum reduction
- Floats will be upcast to double and then bitcast to uint64 before performing the xor_sum reduction
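
A hedged usage sketch based on the description above:

```python
import torch

x = torch.arange(8, dtype=torch.int32)
h = torch.hash_tensor(x)   # xor_sum reduction by default per this PR's description
print(h.dtype)             # torch.uint64 (the hash is always uint64)
```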

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154149
Approved by: https://github.com/albanD
2025-07-23 22:28:03 +00:00
bfe5674e22 Revert "[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for sm90, sm100 (#149282)"
This reverts commit 0797b2b6a80cf70a7accc3d5413186e7693d4451.

Reverted https://github.com/pytorch/pytorch/pull/149282 on behalf of https://github.com/wdvr due to reverting as discussed with @drisspg - @eqy please reach out to @drisspg for more info  ([comment](https://github.com/pytorch/pytorch/pull/149282#issuecomment-3084759671))
2025-07-17 16:55:55 +00:00
0797b2b6a8 [cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for sm90, sm100 (#149282)
cleanup tuple/tensor boilerplate in cuDNN SDPA, preparation for nested/ragged tensor backward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149282
Approved by: https://github.com/drisspg

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-07-11 16:07:54 +00:00
ed03492238 Add check nested_tensor_from_jagged param jagged_dim >= 1 (#157770)
Fixes #157404
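
A hedged sketch of the post-fix behavior (the exact exception type raised by the check is an assumption here):

```python
import torch

values = torch.randn(7, 5)
offsets = torch.tensor([0, 3, 7])
try:
    # jagged_dim must be >= 1 after this change; 0 would alias the batch dim
    torch.nested.nested_tensor_from_jagged(values, offsets, jagged_dim=0)
except Exception as e:
    print(type(e).__name__, e)
```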

## Test Result

```bash
pytest test/test_nestedtensor.py

...............................................s..........ssssss.................................................................................................s.s..sssss..s...ss............................................................. [ 44%]
...........................................................sssss....sss...s.........ss....s....sss.........s.sss...s..s......s............s.sss.ss...............s.....................s....s......................s.s.....s....s..s..ssssssssss [ 59%]
sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss..ssssss.ssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.ssssssss...............................s........................................... [ 74%]
.......sss...................................................................................................................................................................................................................................... [ 89%]
....sss..........................................................................................................................................................                                                                                [100%]

==================================================================================================== 1317 passed, 258 skipped in 2504.27s (0:41:44) ====================================================================================================
```

![image](https://github.com/user-attachments/assets/dcc8e46d-b88f-4580-b4ad-0999bad33ec9)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157770
Approved by: https://github.com/soulitzer

Co-authored-by: Jeffrey Wan <soulitzer@gmail.com>
2025-07-10 00:34:39 +00:00
fc0376e8b1 [BE][2/6] fix typos in test/ (test/test_*.py) (#157636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157636
Approved by: https://github.com/yewentao256, https://github.com/mlazos
ghstack dependencies: #156311, #156609
2025-07-09 11:02:23 +00:00
c553c55be7 Revert "Fix full_like decomposition to preserve strides (#144765)"
This reverts commit 01b0f09931d47bd2716398a0c335b2807dc3074d.

Reverted https://github.com/pytorch/pytorch/pull/144765 on behalf of https://github.com/jeanschmidt due to Seems to be breaking internal tests see [D77652778](https://www.internalfb.com/diff/D77652778), @jansel may you help get this PR merged? ([comment](https://github.com/pytorch/pytorch/pull/144765#issuecomment-3027975098))
2025-07-02 13:56:03 +00:00
01b0f09931 Fix full_like decomposition to preserve strides (#144765)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144765
Approved by: https://github.com/amjames, https://github.com/jansel
2025-07-01 19:13:22 +00:00
06408dae49 Revert "Add view_simple as meta function for view, and avoid calling reshape_view_helper. (#154757)"
This reverts commit 0029259bdfeee627181df2b9f5ff6979f65090ec.

Reverted https://github.com/pytorch/pytorch/pull/154757 on behalf of https://github.com/laithsakka due to post land issue ([comment](https://github.com/pytorch/pytorch/pull/154757#issuecomment-2971385787))
2025-06-13 19:11:43 +00:00
0029259bdf Add view_simple as meta function for view, and avoid calling reshape_view_helper. (#154757)
address https://github.com/pytorch/pytorch/issues/153303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154757
Approved by: https://github.com/bobrenjc93, https://github.com/leslie-fang-intel
2025-06-12 09:58:15 +00:00
a060f3d272 Rewrite autograd producer consumer stream sync logic (#151079)
Also see previous work https://github.com/pytorch/pytorch/pull/142097

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151079
Approved by: https://github.com/albanD
2025-05-16 15:42:22 +00:00
f363a3f51a Revert "[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for sm90, sm100 (#149282)"
This reverts commit 9386701b51aadce951bf38daf497b0257a3f2211.

Reverted https://github.com/pytorch/pytorch/pull/149282 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, see [D74729259](https://www.internalfb.com/diff/D74729259). @drisspg may you help out the author have their PR merged? ([comment](https://github.com/pytorch/pytorch/pull/149282#issuecomment-2881546951))
2025-05-14 20:53:49 +00:00
2c1912452d Revert "Rewrite autograd producer consumer stream sync logic (#151079)"
This reverts commit f78e4529a9d446deb77c6ac38184582f6ab9167a.

Reverted https://github.com/pytorch/pytorch/pull/151079 on behalf of https://github.com/jeanschmidt due to Seems to have introduced regressions in internal signals, see [D74648937](https://www.internalfb.com/diff/D74648937) ([comment](https://github.com/pytorch/pytorch/pull/151079#issuecomment-2880176879))
2025-05-14 13:07:12 +00:00
eqy
9386701b51 [cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for sm90, sm100 (#149282)
cleanup tuple/tensor boilerplate in cuDNN SDPA, preparation for nested/ragged tensor backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149282
Approved by: https://github.com/drisspg
2025-05-14 01:39:24 +00:00
f78e4529a9 Rewrite autograd producer consumer stream sync logic (#151079)
Also see previous work https://github.com/pytorch/pytorch/pull/142097

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151079
Approved by: https://github.com/albanD
2025-05-12 21:07:16 +00:00
eqy
ce317cd5a8 [CUDA][SDPA] bump fudge factor in test_sdpa in test_nestedtensor (#152235)
Small mismatches on e.g., 4090, A6000/A40

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152235
Approved by: https://github.com/drisspg, https://github.com/Skylion007, https://github.com/jbschlosser
2025-04-30 20:24:49 +00:00
4e7d264cf8 Introduce UserDefinedExceptionClassVariable (#146504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146504
Approved by: https://github.com/anijain2305
2025-03-11 18:55:45 +00:00
c297c09a37 Fix invalid nested int guarding in broadcast_shapes() (#145957)
Fixes #145874

This PR takes the approach of updating the logic determining whether multiple shapes broadcast together to handle nested ints specially.

Possible alternative approach: don't update `broadcast_shapes()` + indicate that e.g. `Ne(j0, 1)` should statically evaluate to False. I briefly tried this but it wasn't straightforward. Is it better?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145957
Approved by: https://github.com/bobrenjc93

Co-authored-by: bobrenjc93 <bobren@meta.com>
2025-03-11 00:53:13 +00:00
93e9daed54 [cuDNN][SDPA][Nested Tensor] Experimental cuDNN Nested Tensor SDPA Support (forward only) (#141178)
Disabled by default for now behind `TORCH_CUDNN_SDPA_NESTED_TENSOR_ENABLED=1`

Just wanted to get this out before starting a series of SDPA cleanup PRs---the biggest thing is we don't need the boilerplate around all of the `build_graph_and_tensors*` functions anymore as we can now use the `UID`-style referencing of tensor nodes as was done for the Conv-V8 API backend.

CC @drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141178
Approved by: https://github.com/jbschlosser
2025-03-04 23:09:09 +00:00
c73a92fbf5 [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546)
Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements

> Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target:
>
> ```python
> # Input
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
>
> # Black
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
> # Ruff
> assert len(policy_types) >= priority + num_duplicates, (
>     f"This tests needs at least {priority + num_duplicates} many types."
> )
> ```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546
Approved by: https://github.com/malfet
2025-02-27 20:46:16 +00:00
9a1f720a72 Validate inputs to _nested_view_from_buffer to prevent overflows (#147356)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147356
Approved by: https://github.com/albanD, https://github.com/jbschlosser
ghstack dependencies: #147352, #147354
2025-02-27 15:48:58 +00:00
fa8e3a28a7 Revert "[cuDNN][SDPA][Nested Tensor] Experimental cuDNN Nested Tensor SDPA Support (forward only) (#141178)"
This reverts commit 533b884870acd951e684e0bf551eb76904dec047.

Reverted https://github.com/pytorch/pytorch/pull/141178 on behalf of https://github.com/jeanschmidt due to Broke internal arvr signals, see D69971019. @jbschlosser please help the author get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/141178#issuecomment-2676317470))
2025-02-22 17:28:12 +00:00
533b884870 [cuDNN][SDPA][Nested Tensor] Experimental cuDNN Nested Tensor SDPA Support (forward only) (#141178)
Disabled by default for now behind `TORCH_CUDNN_SDPA_NESTED_TENSOR_ENABLED=1`

Just wanted to get this out before starting a series of SDPA cleanup PRs---the biggest thing is we don't need the boilerplate around all of the `build_graph_and_tensors*` functions anymore as we can now use the `UID`-style referencing of tensor nodes as was done for the Conv-V8 API backend.

CC @drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141178
Approved by: https://github.com/jbschlosser
2025-02-21 05:22:19 +00:00
43496e9b90 [NJT] fix flop counter for SDPA & test (#147032)
Fixes 3 issues:
1. The test wasn't actually testing SDPA: both were checking cuda, and the inputs to SDPA were not transposed.
2. FlopCounterMode has been renamed _FlopCounterMode (and a wrapper named FlopCounterMode has been added)
3. offsets_to_list also needs to ignore the actual offset values if offsets is a meta tensor.
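
For reference, a hedged sketch of the counter usage the fixed test exercises (shown here with dense tensors; shapes assume q/k/v already transposed into (B, H, S, E) layout):

```python
import torch
import torch.nn.functional as F
from torch.utils.flop_counter import FlopCounterMode

q = k = v = torch.randn(2, 8, 128, 64)   # (B, H, S, E)
with FlopCounterMode(display=False) as counter:
    F.scaled_dot_product_attention(q, k, v)
print(counter.get_total_flops())
```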

Differential Revision: [D69558785](https://our.internmc.facebook.com/intern/diff/D69558785)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147032
Approved by: https://github.com/jbschlosser
2025-02-13 07:14:58 +00:00
3cadce7af2 [NJT] Fix inference mode for composite implicit ops without nested-specific kernel (#146633)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146633
Approved by: https://github.com/jbschlosser
2025-02-10 16:59:48 +00:00
1ba1b7b597 Support remaining *_like factory functions for NJT (#144889)
Fixes #144761

This PR adds NJT impls for those *_like functions that were previously missing:
* `full_like()`
* `rand_like()`
* `randint_like()`

It also fixes a bug in existing *_like functions when a new device is specified. Fix is to also transfer `offsets` / `lengths` to the new device.
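
A hedged usage sketch of the newly covered functions (the device-transfer lines assume a CUDA build):

```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 4), torch.randn(3, 4)], layout=torch.jagged
)
filled = torch.full_like(nt, 1.0)
noise = torch.rand_like(nt)
ints = torch.randint_like(nt, 0, 10)
# The device fix described above: offsets should follow the new device as well.
if torch.cuda.is_available():
    nt_cuda = torch.rand_like(nt, device="cuda")
    assert nt_cuda.offsets().device.type == "cuda"
```
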
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144889
Approved by: https://github.com/soulitzer
2025-01-27 21:33:51 +00:00
46e06e1d09 Avoid data-dependent errors in NJT tests via capture_scalar_outputs=True (#144588)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

There are several xfails related to data-dependent errors in torch.compile. This PR sets `torch._dynamo.config.capture_scalar_outputs=True` to avoid these, which tends to exercise unbacked SymInt logic and will require `torch._check()`-related fixes.
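
The toggle in question, as it would be set in a test (sketch):

```python
import torch._dynamo.config as dynamo_config

# Capture .item()-style scalar outputs as unbacked SymInts instead of graph-breaking.
dynamo_config.capture_scalar_outputs = True
```
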
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144588
Approved by: https://github.com/soulitzer
ghstack dependencies: #144586, #144587
2025-01-24 22:45:01 +00:00
128f3627b1 Implement backward for NJT matmul (#144587)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

This PR implements missing backward support for NJT matmul. Notably, for dense tensors, matmul dispatches to bmm. However, due to historical reasons related to NST, NJT handles matmul directly, and thus can't rely on the CompositeImplicit impl of matmul to get the derivative formula.
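
A hedged usage sketch (NJT @ dense, jagged layout):

```python
import torch

a = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)],
    layout=torch.jagged, requires_grad=True,
)
b = torch.randn(3, 5, requires_grad=True)
out = torch.matmul(a, b)      # (B, ragged, 3) @ (3, 5) -> (B, ragged, 5)
out.sum().backward()          # full-reduction sum gives a dense scalar
print(b.grad.shape)           # torch.Size([3, 5])
```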

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144587
Approved by: https://github.com/soulitzer
ghstack dependencies: #144586
2025-01-21 18:27:50 +00:00
af204135d8 Fix NJT fill.Scalar for contiguous inputs (#144586)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

This PR implements the missing `fill.Scalar` support, which works fine for contiguous inputs, but there is still some AOTAutograd debugging required to handle non-contiguous transposed NJTs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144586
Approved by: https://github.com/soulitzer
2025-01-21 18:22:08 +00:00
b63b81410c Fix NJT frexp() to handle both outputs (#144585)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

Before this PR, `frexp()` for NJT was handled via the unary pointwise fallback. The op returns a tuple, however, and the fallback doesn't handle that. This PR defines an explicit impl for `frexp()` that wraps both returned `(mantissa, exponent)` as NJTs.
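
A hedged usage sketch:

```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)], layout=torch.jagged
)
mantissa, exponent = torch.frexp(nt)   # both outputs are wrapped as NJTs post-fix
print(mantissa.is_nested, exponent.is_nested)
```
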
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144585
Approved by: https://github.com/soulitzer
ghstack dependencies: #144582, #144583, #144584
2025-01-18 15:59:56 +00:00
3ee531f8b9 Support NJT chunk() backward on batch dim (#144584)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

Implements `chunk()` backward on the batch dim, which was left out before. This PR unbinds the components and invokes `copy_()` on these to pass along the appropriate gradients.
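
A hedged sketch of the newly supported case (chunking over the batch dim with gradients flowing back):

```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3), torch.randn(1, 3)],
    layout=torch.jagged, requires_grad=True,
)
chunks = nt.chunk(3, dim=0)               # split along the batch dim
loss = sum(c.sum() for c in chunks)       # each c.sum() is a dense scalar
loss.backward()                           # gradients now reach nt on the batch dim
```
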
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144584
Approved by: https://github.com/soulitzer
ghstack dependencies: #144582, #144583
2025-01-18 15:58:24 +00:00
a8ef423fed Fix NJT min / max backward() for non-ragged reductions (#144583)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

`value_selecting_reduction_backward()` is used in the backward for min / max, so this PR implements it for NJT. Notably, this isn't enough for reducing over the ragged dim, since that results in a dense tensor and thus NJT's torch_dispatch will not be called for this op. We need factory function support for nested ints to fix that case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144583
Approved by: https://github.com/soulitzer
ghstack dependencies: #144582
2025-01-17 20:57:11 +00:00
cac10b8190 Fix NJT OpInfo entry for nn.functional.prelu (#144582)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

The OpInfo entry for prelu was wrong before this PR; `weight` needs to be passed as well. The op isn't fully implemented yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144582
Approved by: https://github.com/soulitzer
2025-01-17 20:36:15 +00:00
c07dc64017 Update pin memory related APIs to not pass 'device' argument (#131858)
Based on https://github.com/pytorch/pytorch/pull/126376, this PR updates all PT callers (e.g., `Tensor.is_pinned()`, `Tensor.pin_memory()`) to not pass the `device` argument.
As for `storage/untyped_storage.is_pinned()/pin_memory()`, we keep the `device` argument but passing it is discouraged; if not given, the default `device` is still 'cuda' for BC.
Additionally, with device-agnostic pin_memory, the `pin_memory_device` argument of `torch.utils.data.DataLoader` is now discouraged. For BC, explicitly passing this argument is still effective; if not given, the default `device` will be the current accelerator.
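
A hedged sketch of the preferred, device-agnostic calls after this change (assumes an accelerator such as CUDA is available):

```python
import torch

x = torch.randn(4)
pinned = x.pin_memory()        # no 'device' argument; uses the current accelerator
print(pinned.is_pinned())      # likewise queried without a device argument
```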

Fixes #124908
Relates https://github.com/pytorch/pytorch/pull/126376

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131858
Approved by: https://github.com/albanD

Co-authored-by: albanD <desmaison.alban@gmail.com>
2025-01-15 17:23:35 +00:00
cyy
df458be4e5 [4/N] Apply py39 ruff and pyupgrade fixes (#143257)
```torch/fx/passes/annotate_getitem_nodes.py``` was changed to support the new type hinting annotations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143257
Approved by: https://github.com/justinchuby, https://github.com/albanD
2025-01-04 10:47:51 +00:00
228b228449 Fix batch-specific attention mod for NJT + Flex (#143866)
Fixes #143788
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143866
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
2024-12-27 20:51:41 +00:00
d8c8ba2440 Fix unused Python variables in test/[e-z]* (#136964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-12-18 23:02:30 +00:00
5dabe2d464 Fix NJT backward tests (#143072)
This PR fixes some issues with NJT backward / compile backward tests:
1. `requires_grad` was not being propagated appropriately during `SampleInput` generation, so a LOT of backward cases were untested before (sad times). This PR utilizes a helper function `_clone()` to clone() / detach() NJTs for SampleInputs while preserving `requires_grad` status. Note: the clone() / detach() stuff is for autograd; can't have two SampleInputs as part of the same autograd graph.
2. Per-sample skips weren't -fully- working; the op logic would still be invoked even with a skip. I found this out thanks to `split_with_sizes`, which segfaults during backwards because it tries to use an NST-specific formula. As annoying as it is, I tried a ton of things but ultimately had to split the `subtest_ctx` into that + a `skip_xfail_ctx` to run the subtests within.
    * Updated all uses of per-sample skips / xfails: 4 in `test_nestedtensor.py` and 1 in `test_vmap.py`
3. Added the appropriate skips / xfails to get everything passing. There are a shitton of bugs to fix!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143072
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2024-12-12 18:06:23 +00:00
e1196dfe51 Deprecate torch._utils.is_compiling() (#127690)
This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-12-08 22:55:36 +00:00
5663ad99e7 Fix per-sample xfails for NJT tests (#142243)
#140736 fixed some xfails, but these were not properly failing in CI due to #142157. This PR removes the xfails so we can land a fix to that issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142243
Approved by: https://github.com/huydhn
2024-12-06 22:39:35 +00:00