This relands the troubling part of #95009 that caused a regression.
BC: This changes the signature and semantics of `DeviceMesh::all_reduce`.
`DeviceMesh::all_reduce` now uses a functional collective under the hood, which makes it more easily traceable.
You no longer need to use `CommTensor` to get a trace.
`all_reduce` is now async-only and uses `AsyncCollectiveTensor` to ensure proper stream synchronization.
Signature change: the `async_op` param is removed, and the return type changes from `Optional[Work]` to `torch.Tensor`.
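As a rough sketch of how call sites change (the mesh construction is illustrative, assumes a 4-rank job with a process group already set up, and the import path may vary across versions):

```python
import torch
from torch.distributed._tensor import DeviceMesh  # import path may vary by version

# Illustrative only: assumes a 4-rank distributed job is already initialized.
mesh = DeviceMesh("cuda", [0, 1, 2, 3])
t = torch.ones(4, device="cuda")

# Before: work = mesh.all_reduce(t, async_op=True) returned Optional[Work].
# After: the call always returns a torch.Tensor (backed by AsyncCollectiveTensor),
# and synchronization with the collective stream happens when the result is used.
reduced = mesh.all_reduce(t)
print(reduced.sum())  # first use of the result triggers the wait
```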
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95804
Approved by: https://github.com/fegin
BC: This changes the signature and semantics of `DeviceMesh::all_reduce`.
`DeviceMesh::all_reduce` now uses a functional collective under the hood, which makes it more easily traceable.
You no longer need to use `CommTensor` to get a trace.
`all_reduce` is now async-only and uses `AsyncCollectiveTensor` to ensure proper stream synchronization.
Signature change: the `async_op` param is removed, and the return type changes from `Optional[Work]` to `torch.Tensor`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95009
Approved by: https://github.com/wanchaol
Applies the remaining flake8-comprehensions fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, more performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the `set` call.
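For illustration, the two kinds of rewrites involved (representative examples, not lines taken from the diff):

```python
b = [1, 2, 2, 3]

# An unnecessary generator inside a set() call becomes a set comprehension:
squares = set(x * x for x in range(10))  # before
squares = {x * x for x in range(10)}     # after

# A useless generator that merely re-iterates collapses into the bare call:
unique = set(a for a in b)  # before
unique = set(b)             # after
```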
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
This PR changes the op registration to a better mechanism: we now
require registration against the OpOverload directly instead of the
op-key string. This has several benefits (a sketch follows the list):
1. We ensure that the registration targets the correct op, which
means it fails if the registration is wrong (this PR already fixes
several op registration errors uncovered by switching to direct
OpOverload registration).
2. If an overload name gets changed or deleted, we know immediately as
soon as the source is loaded, which is safer.
3. It also keeps the op registration mechanism consistent with
other tensor subclasses within PyTorch.
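As a hedged sketch of the difference (the dispatch tables and handler below are hypothetical stand-ins, not DTensor's actual internals):

```python
import torch

def my_add_handler(*args, **kwargs):
    """Hypothetical handler standing in for a real per-op rule."""

# Before: keyed by an op-key string; a typo or a renamed overload only
# surfaces at runtime, when the lookup silently misses.
_ops_by_str = {"aten::add.Tensor": my_add_handler}

# After: keyed by the OpOverload object itself; a wrong or deleted
# overload name raises immediately when this module is imported.
_ops_by_overload = {torch.ops.aten.add.Tensor: my_add_handler}
```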
Differential Revision: [D42876250](https://our.internmc.facebook.com/intern/diff/D42876250)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90735
Approved by: https://github.com/XilunWu, https://github.com/fduwjj
This PR refactors the threaded PG logic to enable creating multiple sub-PGs
under the world threaded PG, and to allow calling collectives together on
different sub-PGs.
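For context, a rough sketch of the usage pattern this enables, written against the public API (the threaded-backend test setup itself is omitted; a 4-rank world is assumed):

```python
import torch
import torch.distributed as dist

# Assumes a 4-rank world PG is already initialized (in tests, with the
# threaded backend). Two disjoint sub-PGs are created under the world PG.
pg_a = dist.new_group(ranks=[0, 1])
pg_b = dist.new_group(ranks=[2, 3])

# Each rank runs a collective on its own sub-PG; the two collectives
# can now proceed together.
t = torch.ones(2)
if dist.get_rank() in (0, 1):
    dist.all_reduce(t, group=pg_a)
else:
    dist.all_reduce(t, group=pg_b)
```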
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91649
Approved by: https://github.com/XilunWu
Continuation of https://github.com/pytorch/pytorch/pull/90163.
Here is the script I used to find all the non-existent arguments in the docstrings (the script can give false positives in the presence of *args/**kwargs or decorators):
_Edit:_
I've realized that the indentation of the last `break` in the script as originally posted was wrong, so it only gave output for a function if the first docstring argument was wrong; the version below carries the corrected indentation. I'll create a separate PR if the corrected script finds more issues.
```python
import ast
import os

import docstring_parser  # third-party: pip install docstring_parser

for root, dirs, files in os.walk('.'):
    for name in files:
        # Skip VCS internals and vendored code.
        if root.startswith("./.git/") or root.startswith("./third_party/"):
            continue
        if name.endswith(".py"):
            full_name = os.path.join(root, name)
            with open(full_name, "r") as source:
                tree = ast.parse(source.read())
            for node in ast.walk(tree):
                if isinstance(node, ast.FunctionDef):
                    # Collect every parameter name the function actually
                    # declares (copy the list so the AST is not mutated).
                    all_node_args = list(node.args.args)
                    if node.args.vararg is not None:
                        all_node_args.append(node.args.vararg)
                    if node.args.kwarg is not None:
                        all_node_args.append(node.args.kwarg)
                    all_node_args.extend(node.args.posonlyargs)
                    all_node_args.extend(node.args.kwonlyargs)
                    args = [a.arg for a in all_node_args]
                    # `or ""` guards functions without a docstring.
                    docstring = docstring_parser.parse(ast.get_docstring(node) or "")
                    doc_args = [a.arg_name for a in docstring.params]
                    # Keep only the identifier part of each documented name.
                    clean_doc_args = []
                    for a in doc_args:
                        clean_a = ""
                        for c in a.split()[0]:
                            if c.isalnum() or c == '_':
                                clean_a += c
                        if clean_a:
                            clean_doc_args.append(clean_a)
                    doc_args = clean_doc_args
                    for a in doc_args:
                        if a not in args:
                            print(full_name, node.lineno, args, doc_args)
                            # Corrected indentation per the edit note above:
                            # report once per function, on the first mismatch.
                            break
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90505
Approved by: https://github.com/malfet, https://github.com/ZainRizvi
Observed by @aazzolini: some ops might have `Optional[Tensor]` returns
where they return None (e.g. native_layer_norm_backward). It's a mismatch
between the C++ ATen op signature and Python's None, but we need to handle
it on the Python side (a sketch follows).
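A minimal sketch of the Python-side handling; the wrapper names below are hypothetical stand-ins, not the PR's actual code:

```python
from typing import Any, Optional
import torch

def make_dtensor(t: torch.Tensor, spec: Any) -> torch.Tensor:
    """Hypothetical stand-in for the real tensor-subclass wrapping step."""
    return t

def wrap_output(out: Optional[torch.Tensor], spec: Any) -> Optional[torch.Tensor]:
    # The C++ schema may declare a return as Tensor? (i.e. Optional[Tensor]);
    # ops like native_layer_norm_backward return None for outputs that were
    # not requested, so None must be passed through rather than wrapped.
    if out is None:
        return None
    return make_dtensor(out, spec)
```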
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90241
Approved by: https://github.com/aazzolini
This PR gets rid of torchgen FunctionSchema parsing and parses the schema
manually; it should resolve the torchgen packaging issue and also
provide some perf wins when running DTensor eagerly.
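To illustrate the idea (a much-simplified sketch of manual schema parsing, not the PR's actual code):

```python
def parse_schema_min(schema: str):
    # Split "ns::name.overload(args) -> returns" into its pieces without
    # depending on torchgen's FunctionSchema.
    sig, _, returns = schema.partition("->")
    name_part, _, args = sig.partition("(")
    ns_name, _, overload = name_part.strip().partition(".")
    ns, _, name = ns_name.partition("::")
    return ns, name, overload, args.rstrip(") "), returns.strip()

print(parse_schema_min(
    "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"
))
# -> ('aten', 'add', 'Tensor', 'Tensor self, Tensor other, *, Scalar alpha=1', 'Tensor')
```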
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90106
Approved by: https://github.com/awgu