pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

IvanKobzarev 8aebf01287 [bucketing] Rewrite all_gather, reduce_scatter passes via tracing merge_fn (#158663 )

Rewrite the bucketing of all_gather and reduce_scatter so that the "merge graph" is defined by tracing a torch function:
`all_gather_merge_fn_to_trace`
`reduce_scatter_merge_fn_to_trace`

This replaces creating nodes and running FakeTensor propagation manually, and makes it easier to experiment with the merge function.

The all_gather merge function uses `foreach_copy_`, so an Inductor lowering for `foreach_copy_` was added.
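The following pure-Python simulation is a minimal sketch of what a bucketing merge function has to do. It is not the code of `all_gather_merge_fn_to_trace` or `reduce_scatter_merge_fn_to_trace`: tensors are modeled as flat lists, ranks as list indices, and `all_gather`, `reduce_scatter`, `bucketed_all_gather` and `bucketed_reduce_scatter` are hypothetical helpers, not `torch.distributed` APIs. Each rank packs its inputs into one flat buffer, a single collective runs on that buffer, and the result is split back into per-tensor outputs. For reduce_scatter the buffer must be laid out destination-rank-major so that each rank's slice stays contiguous.

```python
from typing import List

Tensor = List[float]  # stand-in for a 1-D tensor held by one rank


def all_gather(per_rank: List[Tensor]) -> Tensor:
    """Simulated all_gather: every rank receives the rank-ordered concatenation."""
    out: Tensor = []
    for t in per_rank:
        out.extend(t)
    return out


def reduce_scatter(per_rank: List[Tensor]) -> List[Tensor]:
    """Simulated sum reduce_scatter: rank d receives the sum of chunk d over all ranks."""
    world = len(per_rank)
    chunk = len(per_rank[0]) // world
    return [
        [sum(per_rank[r][d * chunk + k] for r in range(world)) for k in range(chunk)]
        for d in range(world)
    ]


def bucketed_all_gather(inputs: List[List[Tensor]]) -> List[Tensor]:
    """inputs[r][i] is tensor i on rank r; returns the gathered result of each tensor i."""
    world = len(inputs)
    sizes = [len(t) for t in inputs[0]]  # assumes equal sizes on every rank
    total = sum(sizes)
    # Merge: each rank packs its inputs into one flat buffer
    # (in PyTorch this copy is a tensor op; the commit mentions foreach_copy_).
    buffers = [[x for t in inputs[r] for x in t] for r in range(world)]
    gathered = all_gather(buffers)  # one collective instead of len(sizes)
    # Split: rank r's buffer occupies gathered[r * total:(r + 1) * total].
    outs: List[Tensor] = [[] for _ in sizes]
    for r in range(world):
        offset = r * total
        for i, n in enumerate(sizes):
            outs[i].extend(gathered[offset:offset + n])
            offset += n
    return outs


def bucketed_reduce_scatter(inputs: List[List[Tensor]]) -> List[List[Tensor]]:
    """inputs[r][i] is tensor i on rank r; returns outs[i][d], rank d's shard of tensor i."""
    world = len(inputs)
    chunks = [len(t) // world for t in inputs[0]]
    # Merge: lay out each buffer destination-major, so the slice destined
    # for rank d holds chunk d of every tensor contiguously.
    buffers: List[Tensor] = []
    for r in range(world):
        buf: Tensor = []
        for d in range(world):
            for i, c in enumerate(chunks):
                buf.extend(inputs[r][i][d * c:(d + 1) * c])
        buffers.append(buf)
    scattered = reduce_scatter(buffers)  # one collective
    # Split: scattered[d] is chunk d of every tensor, in tensor order.
    outs: List[List[Tensor]] = [[] for _ in chunks]
    for d in range(world):
        offset = 0
        for i, c in enumerate(chunks):
            outs[i].append(scattered[d][offset:offset + c])
            offset += c
    return outs


# Usage: two ranks, two tensors per collective.
ag_inputs = [[[1.0, 2.0], [3.0, 4.0, 5.0]], [[6.0, 7.0], [8.0, 9.0, 10.0]]]
rs_inputs = [[[1.0, 2.0], [3.0, 4.0, 5.0, 6.0]], [[10.0, 20.0], [30.0, 40.0, 50.0, 60.0]]]
print(bucketed_all_gather(ag_inputs))      # [[1.0, 2.0, 6.0, 7.0], [3.0, 4.0, 5.0, 8.0, 9.0, 10.0]]
print(bucketed_reduce_scatter(rs_inputs))  # [[[11.0], [22.0]], [[33.0, 44.0], [55.0, 66.0]]]
```

Running the per-tensor collectives separately produces the same outputs; bucketing only trades several small collectives for one larger one plus the pack and split copies.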

Add a topological sort after the bucketing passes (comment in post_grad.py):
```
        # Fx collectives bucketing passes require topological sort for the cases:
        # when bucketed collectives have users before the last collective in the bucket
        # AND when inputs of bucketed collective have ancestors after the first collective in the bucket.
        #
        # In this case we cannot manually pick the place to insert the bucketed collective.
        # But we are guaranteed by the bucketing (independent collectives in the bucket),
        # that it is possible to reorder nodes to satisfy all ordering requirements.
        #
        # --- before bucketing ---
        # in0 = ...
        # wait_ag0 = ag(in0)
        # user0(wait_ag0)
        # ...
        # pre_in1 = ...
        # in1 = transform(pre_in1)
        # wait_ag1 = ag(in1)
        # user1(wait_ag1)
        #
        # --- after bucketing ---
        #
        # in0 = ...
        # user0(wait_ag0) <--- wait_ag0 is defined only after the bucketed collective.
        #
        # pre_in1 = ...
        # in1 = transform(pre_in1)
        # ag_bucket(in0+in1)
        # wait_bucket
        # wait_ag0 = wait_bucket[0]
        # wait_ag1 = wait_bucket[1]
        # user1(wait_ag1)
```
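The ordering problem described in this comment can be reproduced on a toy dependency graph. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) in place of an FX graph; the node names come from the comment, while `deps`, `is_valid`, `orderable` and `dependent` are hypothetical. Inserting the bucket at either the first or the last collective's position breaks a dependency, whereas a topological sort finds a valid order. The counter-example shows why the bucketed collectives must be independent: if `pre_in1` were computed from `user0`'s result, bucketing would create a cycle and no valid order would exist.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# deps[node] = nodes that must run before it, for the post-bucketing graph above.
deps: Dict[str, Set[str]] = {
    "in0": set(),
    "user0": {"wait_ag0"},
    "pre_in1": set(),
    "in1": {"pre_in1"},
    "ag_bucket": {"in0", "in1"},
    "wait_bucket": {"ag_bucket"},
    "wait_ag0": {"wait_bucket"},
    "wait_ag1": {"wait_bucket"},
    "user1": {"wait_ag1"},
}


def is_valid(order: List[str]) -> bool:
    """True if every node appears after all of its dependencies."""
    seen: Set[str] = set()
    for node in order:
        if not deps[node] <= seen:
            return False
        seen.add(node)
    return len(seen) == len(deps)


def orderable(graph: Dict[str, Set[str]]) -> bool:
    """True if the graph is acyclic, i.e. some valid node order exists."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# Bucket placed where the first all_gather was: in1 is not defined yet.
at_first = ["in0", "ag_bucket", "wait_bucket", "wait_ag0", "wait_ag1",
            "user0", "pre_in1", "in1", "user1"]
# Bucket placed where the last all_gather was: user0 runs before wait_ag0 exists.
at_last = ["in0", "user0", "pre_in1", "in1", "ag_bucket", "wait_bucket",
           "wait_ag0", "wait_ag1", "user1"]
# A topological sort satisfies every dependency.
order = list(TopologicalSorter(deps).static_order())

# Dependent collectives: in1's input would need ag0's result, which now
# comes from the bucket that itself needs in1, so the graph has a cycle.
dependent = {**deps, "pre_in1": {"user0"}}

print(is_valid(at_first), is_valid(at_last), is_valid(order))  # False False True
print(orderable(deps), orderable(dependent))                    # True False
```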

The correctness of the passes was verified via loss curves for Llama 3 8B with simple_fsdp and with autoparallel:

<img width="1364" height="495" alt="Screenshot 2025-07-22 at 14 27 28" src="https://github.com/user-attachments/assets/67b2cabb-3206-450b-b529-e23c24292fc6" />
<img width="1355" height="509" alt="Screenshot 2025-07-22 at 14 27 56" src="https://github.com/user-attachments/assets/4d0e6b25-2eb1-47b2-8d68-dcec185239c4" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158663
Approved by: https://github.com/wconstab

2025-07-25 22:49:51 +00:00

_awaits

…

[ROCm] add flag torch.backends.miopen.immediate (#158951 )

2025-07-25 04:01:51 +00:00

_C_flatbuffer

…

_custom_op

pyfmt lint torch/_custom_op/* (#155782 )

2025-06-12 23:04:11 +00:00

_decomp

[export] set enable_gqa in export flash->math decomp (#158604 )

2025-07-24 14:46:13 +00:00

_dispatch

Improve torch.ops typing (#154555 )

2025-06-22 15:52:27 +00:00

_dynamo

[HOP, map] Rework of map autograd to the new interface (#153343 )

2025-07-25 21:17:06 +00:00

_export

[export] assert fix in serdes (#159060 )

2025-07-25 21:46:20 +00:00

_functorch

Add aot_export_joint_with_descriptors and aot_compile_joint_with_descriptors (#158715 )

2025-07-25 18:49:00 +00:00

_higher_order_ops

[HOP, map] Rework of map autograd to the new interface (#153343 )

2025-07-25 21:17:06 +00:00

_inductor

[bucketing] Rewrite all_gather, reduce_scatter passes via tracing merge_fn (#158663 )

2025-07-25 22:49:51 +00:00

_lazy

[BE][2/16] fix typos in torch/ (torch/_*/) (#156312 )

2025-07-12 05:47:06 +00:00

_library

[torchbind] fix fakifying a staitc tensor returns dynamic accidentally (#158607 )

2025-07-25 20:55:41 +00:00

_logging

[dynamo][be] hide warnings without invalidating warnings cache (#158520 )

2025-07-18 22:02:31 +00:00

_numpy

Fix torch._numpy to match NumPy when empty ellipsis causes advanced indexing separation (#158297 )

2025-07-16 08:11:53 +00:00

_prims

[BE][2/16] fix typos in torch/ (torch/_*/) (#156312 )

2025-07-12 05:47:06 +00:00

_prims_common

[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397 )

2025-07-24 15:55:18 +00:00

_refs

[BE][2/16] fix typos in torch/ (torch/_*/) (#156312 )

2025-07-12 05:47:06 +00:00

_strobelight

[BE][2/16] fix typos in torch/ (torch/_*/) (#156312 )

2025-07-12 05:47:06 +00:00

_subclasses

move view_meta to fake impl (#158406 )

2025-07-25 08:21:27 +00:00

_vendor

…

accelerator

Revert "Add unified memory APIs for torch.accelerator (#152932 )"

2025-07-22 01:01:41 +00:00

amp

Issue warning with reference to user code rather than torch (#155112 )

2025-07-14 05:24:23 +00:00

Use new type statement to fix public API of types (#158487 )

2025-07-17 18:46:44 +00:00

autograd

Fix types in graphs.py (#158192 )

2025-07-15 19:49:38 +00:00

backends

[ROCm] add flag torch.backends.miopen.immediate (#158951 )

2025-07-25 04:01:51 +00:00

compiler

Dont't GC as often when collecting cudagraphs (#158193 )

2025-07-24 21:37:11 +00:00

contrib

…

cpu

[device_mesh] improve device selection logic (#150897 )

2025-05-14 06:29:16 +00:00

csrc

support scalar tensor for functional all_gather (#149913 )

2025-07-25 22:38:08 +00:00

cuda

Revert "[BE] remove torch deploy - conditionals (#158288 )"

2025-07-25 16:09:39 +00:00

distributed

support scalar tensor for functional all_gather (#149913 )

2025-07-25 22:38:08 +00:00

distributions

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

export

[export] Fix public bindings (#159109 )

2025-07-25 18:18:52 +00:00

fft

[BE][PYFMT] migrate PYFMT for torch/[e-n]*/ to ruff format (#144553 )

2025-06-17 08:18:47 +00:00

func

Add torch.func.debug_unwrap (#146528 )

2025-02-06 18:48:09 +00:00

futures

Simplify the base classes of _PyFutureMeta (#157757 )

2025-07-08 15:39:56 +00:00

Revert "[BE] Remove __reduce_deploy__ (#158291 )"

2025-07-25 16:09:39 +00:00

headeronly

Revert "Move some of vec into headeronly in preparation for Half.h (#158976 )"

2025-07-24 22:31:49 +00:00

jit

[4/n] Remove references to TorchScript in PyTorch docs (#158317 )

2025-07-16 20:01:34 +00:00

legacy

…

lib

[2/N] Fix cppcoreguidelines-init-variables suppression (#146237 )

2025-06-19 23:26:42 +00:00

linalg

Fix for ambiguity in linalg.norm()'s ord argument of +2 & -2 (#155148 )

2025-06-04 21:15:20 +00:00

masked

Fix MaskedTensor to device ignored mask (#151205 )

2025-07-21 21:44:49 +00:00

monitor

add WaitCounter type interface and get rid of type errors (#146175 )

2025-02-01 23:24:52 +00:00

mps

[BE][12/16] fix typos in torch/ (#156602 )

2025-07-02 22:55:29 +00:00

mtia

[BE][PYFMT] migrate PYFMT for torch/[e-n]*/ to ruff format (#144553 )

2025-06-17 08:18:47 +00:00

multiprocessing

[BE][12/16] fix typos in torch/ (#156602 )

2025-07-02 22:55:29 +00:00

nativert

[nativert] make per-node benchmark work with memory planning (#159117 )

2025-07-25 20:46:17 +00:00

nested

Add check nested_tensor_from_jagged param jagged_dim >= 1 (#157770 )

2025-07-10 00:34:39 +00:00

[BE] More torch.nn docs coverage test (except for torch.nn.parallel) (#158654 )

2025-07-25 22:03:55 +00:00

onnx

Dont't GC as often when collecting cudagraphs (#158193 )

2025-07-24 21:37:11 +00:00

optim

Document the rest of the specific optimizer module APIs (#158669 )

2025-07-19 07:27:15 +00:00

package

[BE][Ez]: Update ruff to 0.12.2 (#157937 )

2025-07-11 15:16:20 +00:00

profiler

[profiler] update CUDA runtime kernel identification logic (#157890 )

2025-07-24 19:14:08 +00:00

quantization

[BE][6/16] fix typos in torch/ (#156316 )

2025-06-23 02:57:34 +00:00

signal

[BE][6/16] fix typos in torch/ (#156316 )

2025-06-23 02:57:34 +00:00

sparse

[build] modernize build-frontend: python setup.py develop/install -> [uv ]pip install --no-build-isolation [-e ]. (#156027 )

2025-07-09 11:24:27 +00:00

special

[BE][6/16] fix typos in torch/ (#156316 )

2025-06-23 02:57:34 +00:00

testing

[Profiler] Fix lost C call events problem in Python 3.12.0-3.12.4 (#155446 )

2025-07-25 21:44:57 +00:00

utils

Revert "[BE] remove torch deploy - conditionals (#158288 )"

2025-07-25 16:09:39 +00:00

xpu

[BE][6/16] fix typos in torch/ (#156316 )

2025-06-23 02:57:34 +00:00

__config__.py

…

__future__.py

…

__init__.py

Revert "[BE] remove torch deploy - conditionals (#158288 )"

2025-07-25 16:09:39 +00:00

_appdirs.py

Fix broken URLs (#152237 )

2025-04-27 09:56:42 +00:00

_classes.py

remove allow-untyped-defs from torch/_classes.py (#157231 )

2025-07-08 00:11:52 +00:00

_compile.py

[precompile] Ensure @disable()-ed function won't trigger recompile from precompile bytecode. (#155363 )

2025-06-10 16:13:38 +00:00

_custom_ops.py

Render Example: and not Example:: in docs (#153978 )

2025-05-21 01:03:26 +00:00

_deploy.py

Revert "[BE] Remove torch deploy | remove torch deploy specific files (#158290 )"

2025-07-25 16:09:39 +00:00

_environment.py

…

_guards.py

[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397 )

2025-07-24 15:55:18 +00:00

_jit_internal.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

_linalg_utils.py

Update is_sparse doc to mention that it is sparse_coo specific (#157378 )

2025-07-09 18:22:14 +00:00

_lobpcg.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

_lowrank.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

_meta_registrations.py

move view_meta to fake impl (#158406 )

2025-07-25 08:21:27 +00:00

_namedtensor_internals.py

…

_ops.py

Revert "[BE] remove torch deploy - conditionals (#158288 )"

2025-07-25 16:09:39 +00:00

_python_dispatcher.py

Typo fixes for "overridden" in comments and function names (#155944 )

2025-06-14 03:37:38 +00:00

_size_docs.py

Render Example: and not Example:: in docs (#153978 )

2025-05-21 01:03:26 +00:00

_sources.py

…

_storage_docs.py

Fix docstring for torch.UntypedStorage.from_file (#155067 )

2025-06-05 14:30:49 +00:00

_streambase.py

…

_tensor_docs.py

Add missing optional for tensor ops (#159028 )

2025-07-25 04:36:55 +00:00

_tensor_str.py

fix tensor print behavior for MAIA (#155609 )

2025-06-14 01:04:12 +00:00

_tensor.py

[MPS] Enable dlpack integration (#158888 )

2025-07-24 18:05:41 +00:00

_thread_safe_fork.py

…

_torch_docs.py

Add basic torch.hash_tensor op (#154149 )

2025-07-23 22:28:03 +00:00

_utils_internal.py

NUMA binding integration with elastic agent and torchrun (#149334 )

2025-07-25 21:19:49 +00:00

_utils.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

_VF.py

…

_vmap_internals.py

Fix broken URLs (#152237 )

2025-04-27 09:56:42 +00:00

_weights_only_unpickler.py

Add sparse tensors constructed via legacy constructor to _sparse_tensors_to_validate (#147759 )

2025-02-25 23:51:12 +00:00

CMakeLists.txt

Migrate c10/macros/cmake_macros.h.in to torch/headeronly (#158035 )

2025-07-15 19:52:59 +00:00

custom_class_detail.h

…

custom_class.h

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

extension.h

…

functional.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

header_only_apis.txt

Revert "Move some of vec into headeronly in preparation for Half.h (#158976 )"

2025-07-24 22:31:49 +00:00

hub.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

library.h

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

library.py

[torchbind] support register_autocast for torchbind custom op (#158583 )

2025-07-25 20:55:41 +00:00

overrides.py

Add basic torch.hash_tensor op (#154149 )

2025-07-23 22:28:03 +00:00

py.typed

…

quasirandom.py

…

random.py

Update description for torch.random.fork_rng (#151881 )

2025-04-23 16:59:29 +00:00

return_types.py

…

script.h

…

serialization.py

Reduce random reads for offset metadata when calling torch.load under FakeTensorMode (#157931 )

2025-07-17 22:17:52 +00:00

storage.py

mypy 1.16.0 (#155821 )

2025-06-14 18:18:43 +00:00

torch_version.py

[BE]: Enable ruff SLOT checks (#146276 )

2025-02-04 19:18:23 +00:00

types.py

…

version.py.tpl

…