The goal of this PR is to avoid storing the explicit `mesh` Tensor inside each DeviceMesh, and instead compute it on the fly when the end user needs it, while trying to replace all of its internal usages with `_layout` and the newly-introduced `_global_rank_permutation` Tensor. The name of this attribute is up for debate. The advantage of the `_global_rank_permutation` Tensor is that it is _the same_ Tensor for the root mesh and all its children, so it doesn't need to be copied/reallocated.
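For intuition, here is a minimal sketch of the idea, not the actual DeviceMesh implementation: the `MeshSketch` class and its `sizes`/`strides`/`offset` parameters are illustrative stand-ins for the PR's `_layout`, and `as_strided` stands in for whatever materialization the real code does. The point it demonstrates is that the root mesh and every submesh can hold the very same `_global_rank_permutation` Tensor, with `mesh` derived on demand as a view of it.

```python
import torch

class MeshSketch:
    """Hypothetical sketch: `mesh` is derived on demand, never stored."""

    def __init__(self, perm: torch.Tensor, sizes, strides, offset=0):
        # Shared by the root mesh and all of its children; never copied.
        self._global_rank_permutation = perm
        # Stand-in for the PR's `_layout` (a CuTe-style sizes/strides layout).
        self._sizes, self._strides, self._offset = sizes, strides, offset

    @property
    def mesh(self) -> torch.Tensor:
        # Materialize the mesh as a strided view over the shared permutation.
        return self._global_rank_permutation.as_strided(
            self._sizes, self._strides, storage_offset=self._offset
        )

perm = torch.arange(8)                   # global ranks 0..7
root = MeshSketch(perm, (2, 4), (4, 1))  # 2x4 root mesh
row1 = MeshSketch(perm, (4,), (1,), 4)   # second row as a 1-D submesh
assert root.mesh[1].equal(row1.mesh)     # same storage, no reallocation
```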
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165554
Approved by: https://github.com/fduwjj
PyTorch tensor operations (`.view`, `.contiguous`, broadcasting) and NumPy array indexing all follow lexicographic (row-major) order. In lexicographic (lex) order on (i0, i1, …, i{k-1}), the leftmost index has the largest stride and changes slowest, while the rightmost index changes fastest; the last dim is usually the contiguous one.
However, the original pycute is entirely based on co-lex (column-major) order. After porting its code into PyTorch, along with some cosmetic changes, we now make it lex so that we can use it for use cases such as DeviceMesh internal bookkeeping, among others.
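To make the two conventions concrete, here is a quick illustration using only stock PyTorch (nothing below is from the ported pycute module): under lex order the leftmost dim carries the largest stride, so the rightmost coordinate varies fastest as you walk flat memory.

```python
import torch

t = torch.arange(6).reshape(2, 3)  # contiguous, row-major (lex)
print(t.stride())                  # (3, 1): leftmost stride is largest
# Walking flat memory 0..5 visits coordinates with the LAST index
# changing fastest: (0,0) (0,1) (0,2) (1,0) (1,1) (1,2).
for idx in range(t.numel()):
    crd = (idx // 3, idx % 3)
    assert t[crd] == idx
# Under co-lex (column-major, as in the original pycute), the FIRST
# index would change fastest instead: (0,0) (1,0) (0,1) (1,1) ...
```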
Changes included in this PR:
1. Changed all of the ported APIs, including `prefix_product` (which infers strides; renamed to `suffix_product`), `idx2crd`, `crd2idx`, `coalesce`, `composition`, `complement`, `right_inverse`, and `left_inverse`, to make sure they work in lex order (see the sketch after this list).
2. Added more unit tests for some of the APIs mentioned above, since the existing tests did not have full coverage.
3. Fixed a bug inside `composition` that could lead to an infinite recursive call.
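As a hedged, self-contained sketch of the lex-order semantics from item 1: the real helpers live in the ported pycute module and also handle nested/hierarchical shapes, whereas the bodies below are illustrative reimplementations for flat integer shapes only.

```python
def suffix_product(shape):
    # Lex (row-major) strides: stride[i] = product of shape[i+1:].
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)

def crd2idx(crd, shape):
    # Coordinate -> linear index under lex order.
    return sum(c * s for c, s in zip(crd, suffix_product(shape)))

def idx2crd(idx, shape):
    # Linear index -> coordinate under lex order.
    return tuple((idx // s) % d for s, d in zip(suffix_product(shape), shape))

assert suffix_product((2, 3, 4)) == (12, 4, 1)
assert crd2idx((1, 2, 3), (2, 3, 4)) == 23
assert idx2crd(23, (2, 3, 4)) == (1, 2, 3)
```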
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162690
Approved by: https://github.com/ezyang
ghstack dependencies: #162413, #162534, #162414