Commit Graph

7 Commits

Author SHA1 Message Date
61461f39d1 [dtensor] handle negative dim and fix TP regression (#111750)
TP style still has some regressions due to negative dim specifications;
fix it by allowing the DTensor API to handle negative dims and normalize them.

i.e. TP uses `Shard(-1)` and then tries to redistribute `Shard(1) -> Shard(-1)`. This should ideally be a no-op, but currently it runs a decompose-sharding phase that turns the transformation into `Shard(1) -> Replicate -> Shard(-1)`, which is wrong and triggers unnecessary allgathers.
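A minimal sketch of the fix's core idea, assuming a simple dim-normalization helper (names here are illustrative, not the actual DTensor internals):

```python
# Illustrative sketch, not the real DTensor code: normalize negative shard dims
# against the tensor rank so that Shard(1) -> Shard(-1) on a 2-D tensor is
# recognized as a no-op instead of being decomposed into
# Shard(1) -> Replicate -> Shard(-1), which triggers unnecessary allgathers.
def normalize_dim(dim: int, ndim: int) -> int:
    return dim if dim >= 0 else dim + ndim  # e.g. dim=-1, ndim=2 -> 1

def is_noop_redistribute(src_shard_dim: int, dst_shard_dim: int, ndim: int) -> bool:
    return normalize_dim(src_shard_dim, ndim) == normalize_dim(dst_shard_dim, ndim)

assert is_noop_redistribute(1, -1, ndim=2)  # same placement, no communication needed
```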
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111750
Approved by: https://github.com/rohan-varma
2023-10-22 04:25:45 +00:00
6dc56d3490 [DTensor] Remove compute_local_offset from _utils.py (#109096)
Separating internal changes from OSS changes. This PR removes compute_local_offset from the OSS directory only.

This replaces https://github.com/pytorch/pytorch/pull/108965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109096
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-09-12 21:55:15 +00:00
13e4cce83c [DTensor] Add util API to compute_local_shape_and_global_offset for checkpointing purpose (#107996)
The compute_local_shape_and_global_offset API does the following:
1) Calculate both local_shape and global_offset in a single API call, replacing the two separate calls (compute_local_size and compute_local_shape).
2) Generate the correct global_offset for checkpointing purposes. We are currently using compute_local_offset for downstream checkpoint components, which could lead to incorrect results: for checkpointing we need the global_offset rather than the local_offset, and the two differ when a dimension is sharded multiple times across different mesh dimensions (e.g. placements = [Shard(0), Shard(0)]); a toy numeric sketch follows below.
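To make the local_offset vs. global_offset distinction concrete, here is a toy sketch assuming evenly divisible sharding; the function name and signature are illustrative, not the actual PyTorch util:

```python
# Toy sketch (assumes even sharding; not the real DTensor util): compute the
# local shape and the *global* offset of this rank's shard when a dimension is
# sharded more than once, e.g. placements = [Shard(0), Shard(0)] on a 2x2 mesh.
def local_shape_and_global_offset(global_size, mesh_shape, mesh_coord, shard_dims):
    size = list(global_size)
    offset = [0] * len(global_size)
    for mesh_dim, shard_dim in enumerate(shard_dims):
        chunk = size[shard_dim] // mesh_shape[mesh_dim]    # even split assumed
        offset[shard_dim] += mesh_coord[mesh_dim] * chunk  # accumulate global offset
        size[shard_dim] = chunk
    return tuple(size), tuple(offset)

# Global tensor of shape (8,), 2x2 mesh, dim 0 sharded twice.
# Rank at mesh coordinate (1, 1): local shape (2,), global offset (6,).
print(local_shape_and_global_offset((8,), (2, 2), (1, 1), [0, 0]))
```

For that rank, the offset relative to its parent chunk (the local offset) is 2, while the offset into the full tensor (the global offset) is 6; checkpointing needs the latter.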

Follow-up PRs:
1) Replace related downstream components to use compute_local_shape_and_global_offset instead of compute_local_size and compute_local_offset.
2) Audit the existing code base to see whether we can remove compute_local_size and compute_local_offset, since they are currently still in use.

cc. @wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107996
Approved by: https://github.com/wanchaol
2023-08-30 02:46:50 +00:00
92923aca61 [TP] Use Stride inferred from local tensor in to_local bwd (#102630)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102630
Approved by: https://github.com/wanchaol
2023-06-01 04:30:24 +00:00
3ae612ba7f [dtensor] remove assertions about submesh checks (#101229)
This PR removes the assertions from the submesh checks and directly returns the local
tensor, so that all the other APIs can work with submeshes.
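A hypothetical sketch of the pattern (names are illustrative, not the actual DTensor code): ranks outside the submesh get their local tensor back instead of hitting an assertion.

```python
# Illustrative only. Before: an assertion required the calling rank to be part
# of the device (sub)mesh. After: non-participating ranks simply get the local
# tensor back, so higher-level DTensor APIs can be called uniformly on all ranks.
def maybe_return_local(local_tensor, mesh_ranks, my_rank):
    if my_rank not in mesh_ranks:
        # Rank is outside the submesh: previously an assertion fired here;
        # now we just hand back the local tensor untouched.
        return local_tensor
    # Participating ranks continue with the normal DTensor logic (elided here).
    return local_tensor

# Rank 3 is not part of the 2-rank submesh {0, 1}: no assertion is raised.
print(maybe_return_local(local_tensor="shard", mesh_ranks={0, 1}, my_rank=3))
```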
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101229
Approved by: https://github.com/fduwjj
2023-05-12 04:20:35 +00:00
02179827cb [Easy] Include SPMD and DTensor files in UFMT checks (#98148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98148
Approved by: https://github.com/fegin
2023-04-02 15:34:49 +00:00
789fc4c292 [dtensor] refactor shape/offset calculation (#95923)
Shape/offset calculation is commonly used, so extract it into a separate util.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95923
Approved by: https://github.com/fduwjj
2023-03-05 06:33:32 +00:00