pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
PyTorch MergeBot	99f2491af9	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit 45411d1fc9a2b6d2f891b6ab0ae16409719e09fc. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))	2025-01-04 14:17:20 +00:00
Xuehai Pan	45411d1fc9	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2025-01-03 20:03:40 +00:00
PyTorch MergeBot	cc4e70b7c3	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit 135c7db99d646b8bd9603bf969d47d3dec5987b1. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/malfet due to need to revert to as dependency of https://github.com/pytorch/pytorch/pull/129374 ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2562969825))	2024-12-26 17:26:06 +00:00
Xuehai Pan	135c7db99d	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2024-12-24 08:33:08 +00:00
Tom Ritchford	d25e6e623f	Fix unused Python variables in test/[a-d]* (#134665 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134665 Approved by: https://github.com/albanD	2024-12-13 22:13:12 +00:00
Ke Wen	a58d2f14e8	[DTensor] Add a private util for sharding tensor (#142288 ) Locally shards a full tensor based on indicated sharding arrangement, and returns a DTensor containing the local shard. warning: This is a private API purposed to skip the communication otherwise required by `distribute_tensor`. It is only applicable to a case where all ranks have the same `full_tensor`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142288 Approved by: https://github.com/wz337	2024-12-07 05:30:18 +00:00
Mikayla Gawarecki	f3f305ef3e	Fix condition for weights_only unpickler for DTensor (#140740 ) Same as #140739 but for DTensor (move safe globals for DTensor to `torch.distributed.tensor.__init__` and update error message to let user know `torch.distributed.tensor` must be imported to load DTensor) Differential Revision: [D65961690](https://our.internmc.facebook.com/intern/diff/D65961690) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140740 Approved by: https://github.com/malfet ghstack dependencies: #140739	2024-11-19 02:44:53 +00:00
wz337	4893e248a8	[DTensor][Test] Remove safe global context for weights_only torch.load() DTensor (#140173 ) We have added DTensor related classes to allowed globals so we can torch.load(DTensor) with weights_only=True. So we don't need the safe_globals context for this test anymore. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140173 Approved by: https://github.com/mikaylagawarecki ghstack dependencies: #139949	2024-11-09 02:21:44 +00:00
Wanchao Liang	cfc227ad43	[reland][dtensor] move DTensor to public namespace (#134203 ) reland of https://github.com/pytorch/pytorch/pull/133113 I have to create a new PR because the previous reverted PR could be neither rebased nor imported successfully :( ---- Moving DTensor to be in the public namespace, to formally add the documentation page that includes all the public APIs. This includes: * many path renames and path import fixes * a dedicated doc page without too much content yet (adding in the next PRs) * To preserve the BC for users still using the torch.distributed._tensor, I added a shim script to redirect old path calls to the new module BC preservation is evidenced by the fact that all DTensor tests are still working without changing the public imports. So it's safe to land the changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/134203 Approved by: https://github.com/tianyu-l	2024-09-08 17:08:40 +00:00
wz337	cfb642bb6b	[DTensor] Extend implicit replication to replicate DTensor for foreach ops so model doesn't have to be fully tp-ed when using 2D (#134551 ) Fixes [134212](https://github.com/pytorch/pytorch/issues/134212) Currently, when we use 2D FSDP with TP, `optimizer.step()` would fail if the model were not fully tensor parallelized. If we don't have the entire model tensor parallelized when doing 2D, we would have both 1D and 2D DTensor parameters. As foreach is turned on by default, `optimizer.step()` would fail as cross mesh op is not allowed. Error as follows: ``` NotImplementedError: aten._foreach_mul_.Scalar: DTensor does not support cross-mesh operation yet!Got meshes: DeviceMesh('cuda', [[0, 1], [2, 3]], mesh_dim_names=('dp', 'tp')) DeviceMesh('cuda', [1, 3], mesh_dim_names=('dp',)) ``` In this PR, we extend implicit_replication to replicate DTensor in missing dimensions for foreach ops. If users don't want to fully tensor parallelize the model when using 2D, they have the option of using the `implicit_replication()` context manager for `optimizer.step()`. In this case, we would swap out the 1D DTensorSpec and replace it with 2D DTensorSpec. However, we don't want to turn this on by default yet, as we want the users to be aware that the tp dimension is replicated if a layer is not tp-ed. With implicit replication turned on, replicating the DTensor spec in the missing dimension works for most foreach cases, except when the first DTensor in the list is one that also needs to be replicated. This is currently a limitation for which I don't have a good solution yet. Currently, with this change, we can handle most of the cases except the case that the first DTensor's ndim is not the largest. ``` [2D_DTensor, 1D_DTensor...] ---> Implicit_replication() can handle this. [1D_DTensor, 2D_DTensor...] ---> Implicit_replication() can't handle this. ``` This change doesn't affect the existing default behavior, as `implicit_replication()` is not turned on by default. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134551 Approved by: https://github.com/tianyu-l	2024-08-29 09:01:31 +00:00
PyTorch MergeBot	35f36363ec	Revert "[dtensor] move DTensor to public namespace (#133113 )" This reverts commit 2ee6b97464d17fcf4c1fc67c29868fa30d0c16e1. Reverted https://github.com/pytorch/pytorch/pull/133113 on behalf of https://github.com/wanchaol due to looks like it break some internal type imports ([comment](https://github.com/pytorch/pytorch/pull/133113#issuecomment-2295670911))	2024-08-19 05:00:19 +00:00
Wanchao Liang	2ee6b97464	[dtensor] move DTensor to public namespace (#133113 ) Moving DTensor to be in the public namespace, to formally add the documentation page that includes all the public APIs. This includes: * many path renames and path import fixes * a dedicated doc page without too much content yet (adding in the next PRs) * To preserve the BC for users still using the `torch.distributed._tensor`, I added a shim script to redirect old path calls to the new module BC preservation is evidenced by the fact that all DTensor tests are still working without changing the public imports. So it's safe to land the changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/133113 Approved by: https://github.com/XilunWu ghstack dependencies: #133305, #133306	2024-08-17 05:09:52 +00:00
Mikayla Gawarecki	d9576c9440	Fix failures when default is flipped for weights_only (#127627 ) Tests on XLA shard not fixed yet but there is an issue here https://github.com/pytorch/xla/issues/7799 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127627 Approved by: https://github.com/albanD ghstack dependencies: #132349	2024-08-16 00:22:43 +00:00
Xuehai Pan	db3290846e	[BE][Easy][10/19] enforce style for empty lines in import segments in `test/d*/` (#129761 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129761 Approved by: https://github.com/fegin	2024-07-17 16:57:39 +00:00
Wanchao Liang	a7cfe40c9b	[dtensor] Improve from_local API with run_check (#130289 ) as titled, this PR: 1. switch `run_check` to be by default False and add extra doc/comments about the correctness guarantee. Since I observed so many calls forget to use run_check=False, we should simply switch to not perform metadata check and make our documentation explicit 2. Implement metadata check by picking up the changes from https://github.com/pytorch/pytorch/pull/115229 3. Improve the from_local documentation Pull Request resolved: https://github.com/pytorch/pytorch/pull/130289 Approved by: https://github.com/awgu, https://github.com/wz337 ghstack dependencies: #130286, #130287, #130288	2024-07-15 18:52:55 +00:00
Chien-Chin Huang	0d8dedb01b	[dtensor] Add dtensor to TORCH_LOGS (#129512 ) Summary: Add the basic log for dispatcher of dtensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/129512 Approved by: https://github.com/wanchaol, https://github.com/XilunWu	2024-07-12 06:50:53 +00:00
wz337	d1f9e822dd	[DTensor][Test] Update implicit replication unit tests for tensor arg being the first in args list (#127803 ) Change the operands order so we can have test coverage for when the first arg is a tensor arg instead of DTensor arg. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127803 Approved by: https://github.com/XilunWu	2024-06-25 23:51:58 +00:00
Mikayla Gawarecki	c5f7755e86	Allow BUILD/NEWOBJ instruction for items added via torch.serialization.add_safe_globals (#129251 ) Previously, allowlisting functions/classes via `torch.serialization.add_safe_globals(obj)` for the `weights_only` Unpickler had the following effect: - For a [`GLOBAL`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1939) instruction, `GLOBAL obj.__module__ obj.__name__` would be allowed and translated back to obj to be pushed back to the stack. - For a [`REDUCE`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1926-L1982) instruction where we expect the stack to contain `func` and `args`, `func` is allowed if it was added via `add_safe_globals` However, it did not have an effect on `BUILD` and `NEWOBJ` instructions Some classes may be rebuilt via [`NEWOBJ`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L2091-L2104) instruction, which indicates that their constructor should be used to rebuild the class. Further, a [`BUILD`](https://github.com/python/cpython/blob/3.12/Lib/pickletools.py#L1984-L2007) instruction might be used if an object's `__reduce__`/`__reduce_ex__` returns a non-None value for `state`. Which indicates a `__setstate__` or `__dict__.update`. This PR makes sure that adding objects to the allowlist will also allow `NEWOBJ` and `BUILD` instructions for them. In particular, the update for `NEWOBJ` should unblock allowlisting of [`ScaledMMConfig`](`d4ade877df/float8_experimental/float8_tensor.py (L26-L30)`) in float8_experimental @drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/129251 Approved by: https://github.com/albanD ghstack dependencies: #129244	2024-06-25 04:19:44 +00:00
Wanchao Liang	3df53c2a8f	[dtensor] directly return local_tensor under no_grad (#128145 ) as titled, skip the autograd function and directly return the local_tensor if it's under no_grad context, this would avoid creating views Pull Request resolved: https://github.com/pytorch/pytorch/pull/128145 Approved by: https://github.com/awgu ghstack dependencies: #128112	2024-06-07 04:01:47 +00:00
Wanchao Liang	4f87f47ea1	[dtensor] reuse DTensorSpec as much as possible (#128112 ) as titled, given that our DTensorSpec is immutable, we can always reuse the spec if the input/output have the same tensor metadata. This helps in two ways: 1. We don't need to re-calculate the hash every time we produce a DTensorSpec, which reduces runtime operator overhead 2. It reduces the DTensor construction overhead. A local benchmark on an 800-parameter clip_grad_norm shows that for foreach_norm the CPU overhead drops from 11ms -> 7.8ms (around 30% improvement) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128112 Approved by: https://github.com/awgu	2024-06-06 16:55:50 +00:00
Wanchao Liang	b0ef363972	[dtensor] rename _Partial -> Partial for all imports (#127420 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127420 Approved by: https://github.com/awgu	2024-05-29 21:42:40 +00:00
Xuehai Pan	26f4f10ac8	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980	2024-05-27 14:49:57 +00:00
PyTorch MergeBot	55c0ab2887	Revert "[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 )" This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f. Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))	2024-05-27 09:22:08 +00:00
Xuehai Pan	7763c83af6	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980 ghstack dependencies: #127122, #127123, #127124, #127125	2024-05-27 04:22:18 +00:00
Wanchao Liang	2ae65b72ff	[dtensor] early return for _split_tensor (#125810 ) as titled, if _split_tensor does not require padding or even is evenly sharded on the dim, no need to calculate padding and could simply return This is to avoid some unnecessary CPU operations Pull Request resolved: https://github.com/pytorch/pytorch/pull/125810 Approved by: https://github.com/wz337	2024-05-14 04:59:27 +00:00
wz337	603d1e6049	[DTensor] allow numel 1 tensor operand to be implicitly replicate DTensor (#125073 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/125073 Approved by: https://github.com/wanchaol	2024-05-08 19:47:47 +00:00
Wanchao Liang	8d46ab4104	[dtensor] move pad/unpad_tensor to separate utils (#124871 ) as titled, 1. pad/unpad is a general util not specific to the Shard placement, 2. for the purpose of the next PR, move these two out of Shard placement itself, and give them an additional pad_dim argument Pull Request resolved: https://github.com/pytorch/pytorch/pull/124871 Approved by: https://github.com/awgu, https://github.com/wz337, https://github.com/XilunWu	2024-04-29 17:22:25 +00:00
PyTorch MergeBot	359ff49bf4	Revert "[dtensor] move pad/unpad_tensor to separate utils (#124871 )" This reverts commit 0b0eea222978e6b377e2c67f89902d5eb1aa7da3. Reverted https://github.com/pytorch/pytorch/pull/124871 on behalf of https://github.com/jeanschmidt due to Broke internal tests, see D56587991 for more details ([comment](https://github.com/pytorch/pytorch/pull/124871#issuecomment-2079001103))	2024-04-26 09:30:34 +00:00
Wanchao Liang	0b0eea2229	[dtensor] move pad/unpad_tensor to separate utils (#124871 ) as titled, 1. pad/unpad is a general util not specific to the Shard placement, 2. for the purpose of the next PR, move these two out of Shard placement itself, and give them an additional pad_dim argument Pull Request resolved: https://github.com/pytorch/pytorch/pull/124871 Approved by: https://github.com/awgu, https://github.com/wz337	2024-04-25 03:36:16 +00:00
Yifu Wang	2a2e1d8e4f	[functional collective] change the Python APIs to only use the native funcol ops (#123777 ) ## Summary After this PR, the functional collective Python APIs will stop honoring `TORCH_DISABLE_NATIVE_FUNCOL` and only use native funcol ops. Specifically, this PR: - Removed `use_native_funcol()`. - Removed the code path in the Python APIs when `use_native_funcol()` is `False`. - Changed the CI tests that run on both native funcol and legacy funcol through the Python API to only run with native funcol. ## Test Changes `test_functional_api.py` - Removed the tests where only one of output_split_sizes or input_split_sizes is specified. This behavior is unreliable and has been removed from the native funcol. - Removed `TestWaitiness` which tests an implementation detail of the legacy funcol. We have equivalent tests for native funcol in `test/distributed/test_c10d_functional_native.py` `b7fac76fc2/test/distributed/test_c10d_functional_native.py (L114-L116)` `test/distributed/_tensor/test_dtensor.py` `test/distributed/_tensor/test_dtensor_compile.py` `test/distributed/test_device_mesh.py` `test/distributed/_tensor/experimental/test_tp_transform.py` `test/distributed/_tensor/test_matrix_ops.py` `test/distributed/test_inductor_collectives.py` - All these tests were double running with both native funcol and legacy funcol. Changed to only run with native funcol. `test/distributed/test_c10d_functional_native.py` - Removed the `run_with_native_funcol` decorators. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123777 Approved by: https://github.com/wanchaol ghstack dependencies: #123776	2024-04-13 03:08:36 +00:00
Wanchao Liang	242e03ba86	[dtensor] add async_op option to redistribute and some refactor (#121477 ) async output option was only available in `full_tensor()` call, but I think it's generally good to make this option available in the `redistribute` call directly so that user can control it This PR adds async_op option to redistribute call, to allow user control whether to perform tensor redistribution asynchronously or not. By default we set this to False, this is to follow the semantics of the c10d collectives. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121477 Approved by: https://github.com/wz337	2024-03-09 06:17:23 +00:00
Wanchao Liang	bc02fca358	[dtensor] to_local backward grad placement passthrough (#121474 ) to_local accepts a `grad_placements` if user choose to pass, previously we enforce the grad_out to be the "same" placement as the current DTensor for safety. But I realized that we DO NOT need to enforce this constraint. Why? backward placement does not need to be the same as fwd tensor placement, this is already the case for param vs param.grad (i.e. param can be replicate and grad can be partial), so we should not restrict this to activation vs activation grad too Pull Request resolved: https://github.com/pytorch/pytorch/pull/121474 Approved by: https://github.com/awgu, https://github.com/yoyoyocmu, https://github.com/yifuwang	2024-03-08 23:11:49 +00:00
Yifu Wang	a4c5f48b11	Prepare test_dtensor.py for native funcol migration (#120043 ) This file contains representative tests that we would like to run with both funcol impls during the migration period. Marking them as `@run_with_both_funcol_impls`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120043 Approved by: https://github.com/wanchaol ghstack dependencies: #120042	2024-02-22 20:24:15 +00:00
Andrew Gu	87fb8b6218	[DTensor] Relaxed `to_local` `requires_grad` warning (#118186 ) The existing warning in `DTensor.__new__()` checks `if requires_grad != local_tensor.requires_grad:` and warns with: > To construct DTensor from `torch.Tensor`, it's recommended to use `local_tensor.detach()` and make `requires_grad` consistent. Calling `local_tensor.detach()` will have the returned `Tensor` have `requires_grad=False`, so the error message refers to the case where `local_tensor.requires_grad is True` but the user passed `requires_grad=False` to `to_local()`. However, there is the converse case, where `local_tensor.requires_grad is False` but the user passed `requires_grad=True`. In this case, the original `if requires_grad != local_tensor.requires_grad:` check succeeds, and the warning is emitted. However, the warning message does not apply in that case. This can happen via `_prepare_output_fn` -> `redistribute` -> `Redistribute.forward()`, where `output.requires_grad is False` but it passes `requires_grad=input.requires_grad` which can be `True`. We should not warn in this case since `Redistribute.forward()` is our own framework code, so I was proposing to relax the warning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118186 Approved by: https://github.com/XilunWu, https://github.com/wanchaol ghstack dependencies: #117994	2024-01-25 15:49:32 +00:00
Wanchao Liang	8a27352d6b	[dtensor] add a implicit replication flag (#115297 ) This PR adds experimental implicit replication support for DTensor to inter-op with torch.Tensor: under this context manager, DTensor can work together with torch.Tensor by assuming the torch.Tensor sharding layout is replicated. Note that this is risky for DTensor, so we don't turn it on by default, but in cases where the tensor is known to be replicated, users can use this to allow DTensor and Tensor computation to work together Pull Request resolved: https://github.com/pytorch/pytorch/pull/115297 Approved by: https://github.com/awgu	2023-12-12 03:56:48 +00:00
wz337	dacf5d6e92	[DTensor] Remove assert to allow tensor sharding dimension < Shard(x).ndim (#115114 ) Consolidated by changes made by @yoyoyocmu. https://www.internalfb.com/diff/D51821717 Remove assert to allow tensor dimension < Shard(x).ndim. With the current padding, we do support this already. Follow up: we will still need to fix the size mismatch and `full_tensor()` hang when tensor is uneven-sharded. Created issue here: https://github.com/pytorch/pytorch/issues/115310 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115114 Approved by: https://github.com/yoyoyocmu, https://github.com/wanchaol	2023-12-07 21:57:30 +00:00
Tianyu Liu	8ae3835323	further deprecate PairwiseParallel and SequenceParallel from test (#114402 ) Remaining Issue When replace SequenceParallel, tests would pass even setting `input_layouts=Replicate()`. Still looking into it... Summary This is a follow-up PR to #114314. Test Plan `python test_files.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/114402 Approved by: https://github.com/wanchaol	2023-11-30 05:06:08 +00:00
wz337	febbc48f43	[DeviceMesh] Make our mesh_dim kwarg naming consistent (#114707 ) Changing size(self, dim: Optional[int] = None) to def size(self, mesh_dim: Optional[int] = None) so it is consistent with the rest of our APIs. We also update this API usage change in both PT and internal (pyper, APS). Differential Revision: [D51602986](https://our.internmc.facebook.com/intern/diff/D51602986/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114707 Approved by: https://github.com/XilunWu, https://github.com/wanchaol, https://github.com/fegin	2023-11-29 19:43:23 +00:00
Andrew Gu	e360f4c6dd	[DTensor] Renamed `shard_spec` -> `placements` in test file (#113917 ) Public APIs like `from_local` and `distribute_tensor` name the argument as `placements`, not `shard_spec` anymore. This was a direct find and replace. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113917 Approved by: https://github.com/wanchaol ghstack dependencies: #113654, #113903	2023-11-18 00:13:30 +00:00
Wanchao Liang	9834fb7fd0	[dtensor] full_tensor to return synchronously (#113322 ) The full_tensor API should return synchronously instead of returning an AsyncCollectiveTensor; if the result is an AsyncCollectiveTensor, we wait on it directly. This makes the full_tensor API more precise Pull Request resolved: https://github.com/pytorch/pytorch/pull/113322 Approved by: https://github.com/wz337	2023-11-09 18:02:40 +00:00
Iris Zhang	9af3f98faf	[DTensor] Fix DTensor.from_local() returns DTensor with wrong size for uneven sharded tensor (#110781 ) Fixes #110762 This PR fixes the issue described in #110762 by adding kwargs for shape and stride when creating a DTensor using `DTensor.from_local()`. When `shape` and `stride` are provided, we skip the calculation of `tensor_shape` and `tensor_stride` using `compute_global_tensor_info()`, as `compute_global_tensor_info()` always assumes even sharding. Test plan: ``` python3 test/distributed/_tensor/test_dtensor.py -k test_from_local_uneven_sharding python3 test/distributed/_tensor/test_dtensor.py -k test_from_local_uneven_sharding_raise_error ``` cc. @wanchaol Pull Request resolved: https://github.com/pytorch/pytorch/pull/110781 Approved by: https://github.com/wanchaol	2023-11-04 11:21:10 +00:00
Iris Zhang	596dab4277	[DeviceMesh] Remove _validate_mesh from device_mesh.py (#112928 ) Plan B for https://github.com/pytorch/pytorch/pull/112839 Motivation for the change: 1. We need to remove `funcol` as a dependency for device_mesh.py to resolve circular dependency issues when introducing device_mesh as an arg for DDP. In the meantime, we should not go from funcol to non-funcol as @voznesenskym suggested. Therefore, we want to remove this all_gather check completely. 2. For large scale, it would not make sense to validate the mesh at global scale anyway. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112928 Approved by: https://github.com/wanchaol	2023-11-04 05:12:27 +00:00
Wanchao Liang	2f09da3a21	[dtensor] Introduce full_tensor API to DTensor (#112224 ) This PR introduces a `full_tensor` API to DTensor, there were so many callsites that exercises the `redistribute(replicate)` path and I feel it deserves a separate API, mostly just a syntactic sugar Pull Request resolved: https://github.com/pytorch/pytorch/pull/112224 Approved by: https://github.com/wz337	2023-10-31 00:44:09 +00:00
Wanchao Liang	61461f39d1	[dtensor] handle negative dim and fix TP regression (#111750 ) TP styles still have some regressions due to negative dim specifications; fix this by allowing the DTensor API to handle negative dims and normalize them. For example, TP uses `Shard(-1)` and then tries to redistribute `Shard(1) -> Shard(-1)`. This should ideally be a no-op, but currently it runs a decompose-sharding phase that turns the transformation into `Shard(1) -> Replicate -> Shard(-1)`, which is wrong and triggers unnecessary allgathers Pull Request resolved: https://github.com/pytorch/pytorch/pull/111750 Approved by: https://github.com/rohan-varma	2023-10-22 04:25:45 +00:00
Wanchao Liang	c95cf4b4c9	[dtensor] add grad placements kwarg to to_local API (#110629 ) When we convert to local tensor, dtensor can't track autograd or gradient layout of the local tensor anymore, if user do sth not expected, there needs to be a way for user to hint about the gradient layout of the local tensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/110629 Approved by: https://github.com/zdevito	2023-10-05 21:34:01 +00:00
wz337	49aa8d19dd	[DTensor] Replace usage of compute_local_offset by compute_local_shape_and_global_offset (#108547 ) This PR removes four usages of compute_local_offset() in PyTorch repo and replaces it with the new API compute_local_shape_and_global_offset(). We will be removing compute_local_offset() API in the next diff, as there are usages internally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108547 Approved by: https://github.com/wanchaol	2023-09-06 04:53:44 +00:00
Wanchao Liang	74ff028839	[dtensor] fix new_empty_strided op (#107835 ) This PR fixes the new_empty_strided op to become replicate from sharding when necessary, this is a quick fix to resolve https://github.com/pytorch/pytorch/issues/107661 We'll need to think more about the behavior of this op when it comes to sharding, one possibility is to follow the input sharding, but given the output shape of this op might not be the same as the input, it's hard to say we should follow the input sharding, further improvement needed once we figure out the op syntax Pull Request resolved: https://github.com/pytorch/pytorch/pull/107835 Approved by: https://github.com/fduwjj	2023-08-31 18:27:35 +00:00
Wanchao Liang	9c2b4a35a3	[dtensor] group all dynamo tests together (#107487 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107487 Approved by: https://github.com/fduwjj ghstack dependencies: #107472, #107473	2023-08-21 23:56:00 +00:00
fduwjj	4a6ca4cc05	[TP][DTensor Perf] Some perf improvement to reduce DTensor CPU overhead (#106524 ) By inspecting a small TP benchmark, we found a couple of things we can optimize: 1. We call deep_copy so many times when we initialize DTensor. 2. Some sharding_prop is not cached successfully. 3. We are still calling redistribute when not necessary. ![image](https://github.com/pytorch/pytorch/assets/6937752/b847d110-eea1-45df-9298-066d0ba07dd7) ![image](https://github.com/pytorch/pytorch/assets/6937752/fc08f564-caed-496b-80d7-275c1dba3806) ![image](https://github.com/pytorch/pytorch/assets/6937752/fdc06cc4-a4ba-48e8-a118-c041bbd04f5e) So we want to: 1. Remove the deep_copy, and we now make placements a tuple so we are sure it's immutable. 2. Somehow the op_schema gets changed during sharding_op propagation, so we store a hash version of it before passing it to sharding_prop. Ideally we want to figure out why `op_schema` gets changed, but it looks like it gets changed in both index and detach/view ops, so it might take more time to debug. 3. Also when we do hashing of op_schema, we want to hash the entire args_schema, not just the args_spec, which only contains the DTensorSpec from args which are DTensors. 4. It turns out that sometimes DTensor has mem_format set to None (not contiguous), and this leads to redistribute being triggered, so we only need to compare type/shape and stride in the metadata. Also we need to ensure _Partial and Shard have different hash values in the DTensorSpec. ![image](https://github.com/pytorch/pytorch/assets/6937752/321e6890-1ab6-4975-adc9-524c6ef9a76b) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106524 Approved by: https://github.com/wanchaol	2023-08-14 20:03:19 +00:00
Wanchao Liang	5c48ff20b5	AsyncCollectiveTensor: dont sync on view ops (#105240 ) AsyncCollectiveTensor is a tensor subclass that is meant to "delay synchronization" when you call into the functional collectives API's. It does this (if I understand correctly) by internally holding an "unsynchronized" version of the tensor, which is the result of the communication op, and internally calling `.wait()` to synchronize the data the next time it is used. Previously, these wait() calls would happen immediately, because `AsyncCollectiveTensor` gets wrapped by `DTensor()`, which calls `.detach()` on its inner tensor, immediately causing the sync (code: `1518d5eec4/torch/distributed/_tensor/api.py (L207)`) AsyncCollectiveTensor shouldn't need to do a synchronization if you try to detach() it though - in fact, it should be fine to avoid synchronizing if you perform any view ops on it (which just require viewing metadata, but not actual data). This PR tries to update `AsyncCollectiveTensor` to delay `wait()` calls whenever the subclass encounters a view op. Added some light testing, that just runs some DTensor compute followed by view ops, and confirms that the output is still an `AsyncCollectiveTensor` when we call `.to_local()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105240 Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/wconstab	2023-08-11 19:20:25 +00:00
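
The `path.resolve()` -> `path.absolute()` change (#129409) listed above hinges on how the two `pathlib` methods differ: `absolute()` only anchors a path, while `resolve()` also follows symlinks and collapses `..` components. A minimal standard-library sketch of that difference follows. The directory names (`real_repo`, `linked_repo`, `tools`) and the helper `compare_path_methods` are hypothetical, chosen for illustration, and are not taken from the PyTorch repository. It assumes a platform where the current user may create symlinks.

```python
import os
import tempfile
from pathlib import Path


def compare_path_methods(base: Path) -> dict:
    """Show how absolute() and resolve() treat a path that goes through a symlink.

    All directory names below are hypothetical and exist only for this demo.
    """
    real_dir = base / "real_repo"
    (real_dir / "tools").mkdir(parents=True)

    # A symlink pointing at the real checkout, e.g. a symlinked workspace.
    link_dir = base / "linked_repo"
    os.symlink(real_dir, link_dir, target_is_directory=True)

    # A path that goes through the symlink and contains a ".." component.
    via_link = link_dir / "tools" / ".." / "setup.py"

    return {
        # absolute() keeps the symlink name and the ".." as written.
        "absolute": via_link.absolute(),
        # resolve() follows the symlink and collapses "..".
        "resolve": via_link.resolve(),
        "real_dir": real_dir,
        "link_dir": link_dir,
    }


if __name__ == "__main__":
    # Resolve the temp dir first so that a symlinked temp location
    # does not blur the comparison.
    demo_base = Path(tempfile.mkdtemp()).resolve()
    for name, value in compare_path_methods(demo_base).items():
        print(f"{name:>9}: {value}")
```

With `absolute()`, a repository root derived from `Path(__file__)` stays under the path through which the file was reached; with `resolve()`, it moves to the symlink target. Which behavior suits a given build setup depends on that setup, and the log shows the change being landed and reverted more than once.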

68 Commits