pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Files

suo fe8cc619b8 [torch][c10d] fix split_group in mixed backend case (#162424)

Today we can initialize a mixed-backend process group (e.g. "cpu:gloo,cuda:nccl") but we can only pass one set of process group options.

However, when we call `split_group`, we retrieve that set of options from the parent PG and pass it to the ProcessGroup::groupSplit C++ API, which then attempts to propagate that set of options to all backends.

This triggers an assertion failure in some user code, where ProcessGroupGloo::split expects Gloo options but receives NCCL options instead.

Arguably the APIs as currently designed are just broken; we should not ever expect a single set of backend options to apply across multiple backends. However, fixing this would require changing quite a few public APIs.

As a quick fix, since user-provided options really only exist for NCCL, warn and fall back to default options for Gloo if non-Gloo options are detected.
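The fallback described above can be modeled in plain Python. This is a minimal sketch, not PyTorch's implementation: `GlooOptions`, `NCCLOptions`, `parse_backend_string`, and `options_for_split` are hypothetical names, and the actual fix in PyTorch's c10d code may differ in detail. It parses a mixed-backend string such as "cpu:gloo,cuda:nccl" and forwards the parent's single option set to each backend unchanged, except that Gloo warns and falls back to defaults when handed non-Gloo options.

```python
import warnings
from dataclasses import dataclass


# Hypothetical stand-ins for per-backend option objects.
# These are NOT the real torch.distributed option classes.
@dataclass
class GlooOptions:
    timeout_s: float = 30.0


@dataclass
class NCCLOptions:
    is_high_priority_stream: bool = False


def parse_backend_string(spec: str) -> dict:
    """Parse a mixed-backend spec like "cpu:gloo,cuda:nccl" into {device: backend}."""
    mapping = {}
    for entry in spec.split(","):
        device, backend = entry.split(":", maxsplit=1)
        mapping[device.strip()] = backend.strip()
    return mapping


def options_for_split(spec: str, parent_options) -> dict:
    """Choose the options each backend receives when splitting a group.

    The parent's single option set is passed through unchanged, except that
    the Gloo backend warns and uses default GlooOptions when it is handed
    options of some other type (e.g. NCCL options).
    """
    per_device = {}
    for device, backend in parse_backend_string(spec).items():
        mismatched = parent_options is not None and not isinstance(
            parent_options, GlooOptions
        )
        if backend == "gloo" and mismatched:
            warnings.warn(
                f"gloo backend received {type(parent_options).__name__}; "
                "falling back to default Gloo options"
            )
            per_device[device] = GlooOptions()
        else:
            per_device[device] = parent_options
    return per_device
```

For example, `options_for_split("cpu:gloo,cuda:nccl", NCCLOptions())` keeps the NCCL options for `cuda`, emits one warning, and gives `cpu` a fresh `GlooOptions()`, instead of asserting the way the unpatched path did.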

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162424
Approved by: https://github.com/d4l3k, https://github.com/fduwjj, https://github.com/H-Huang

2025-09-11 16:29:32 +00:00

_awaits

…

[serialization] Add pte file to archive (#162520)

2025-09-11 07:59:11 +00:00

_C_flatbuffer

…

_custom_op

[BE]: ruff PLC0207 - use maxsplit kwarg (#160107)

2025-08-08 03:14:59 +00:00

_decomp

Revert "[dynamic shapes] unbacked-safe slicing (#157944)"

2025-08-22 20:48:46 +00:00

_dispatch

Improve torch.ops typing (#154555)

2025-06-22 15:52:27 +00:00

_dynamo

Fix persistent buffer bug (#162190)

2025-09-11 14:56:26 +00:00

_export

[triton][export] serialization in internal path + unit tests (#162200)

2025-09-10 09:49:10 +00:00

_functorch

Disable autocast when running joint graph passes (#162304)

2025-09-06 00:57:58 +00:00

_higher_order_ops

[associative_scan] partial gradient support (#162388)

2025-09-09 23:52:29 +00:00

_inductor

[AOTI] Fix Windows fail to zip opened file. (#162617)

2025-09-11 06:22:21 +00:00

_lazy

[BE][2/16] fix typos in torch/ (torch/_*/) (#156312)

2025-07-12 05:47:06 +00:00

_library

Avoid double hash lookup in torch._library.simple_registry (#161328)

2025-08-30 06:55:43 +00:00

_logging

Add compile_id: Optional[CompileID] to torch._logging._internal.trace_structured_artifact (#160440)

2025-08-13 06:28:23 +00:00

_numpy

Fix torch._numpy to match NumPy when empty ellipsis causes advanced indexing separation (#158297)

2025-07-16 08:11:53 +00:00

_prims

remove gso from collapse_view_helper (#162212)

2025-09-10 00:17:15 +00:00

_prims_common

Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869)

2025-09-08 22:59:13 +00:00

_refs

remove gso from collapse_view_helper (#162212)

2025-09-10 00:17:15 +00:00

_strobelight

[BE][2/16] fix typos in torch/ (torch/_*/) (#156312)

2025-07-12 05:47:06 +00:00

_subclasses

Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869)

2025-09-08 22:59:13 +00:00

_vendor

…

accelerator

Add unified memory APIs for torch.accelerator (#152932)

2025-08-08 17:41:22 +00:00

amp

Optimize AMP custom_backend_name error message (#162037)

2025-09-04 08:27:56 +00:00

[torchao][pt2e] Make prepare and convert faster by caching (#162550)

2025-09-11 07:59:22 +00:00

autograd

[ONNX] Refactor torchscript based exporter (#161323)

2025-09-02 16:10:30 +00:00

backends

Revert "[ROCm] SDPA fix mem fault when dropout is enabled (#154864)"

2025-08-26 20:03:59 +00:00

compiler

[easy] [precompile] Convert CompileArtifacts to callable (#162169)

2025-09-07 23:37:31 +00:00

contrib

…

cpu

Replace _device_t with torch.types.Device in torch/cpu/__init__.py (#161031)

2025-08-21 00:22:43 +00:00

csrc

[torch][c10d] fix split_group in mixed backend case (#162424)

2025-09-11 16:29:32 +00:00

cuda

compile_kernel: Handle python floats as c double (#162626)

2025-09-11 06:03:25 +00:00

distributed

[torch][c10d] fix split_group in mixed backend case (#162424)

2025-09-11 16:29:32 +00:00

distributions

[BE][1/16] fix typos in torch/ (#156311)

2025-07-09 11:02:22 +00:00

export

paths to exclude shape guards (#162684)

2025-09-11 15:34:06 +00:00

fft

[BE][PYFMT] migrate PYFMT for torch/[e-n]*/ to ruff format (#144553)

2025-06-17 08:18:47 +00:00

func

…

futures

Simplify the base classes of _PyFutureMeta (#157757)

2025-07-08 15:39:56 +00:00

paths to exclude shape guards (#162684)

2025-09-11 15:34:06 +00:00

headeronly

CUDA 13 -- sm_120 -- Nvidia 5090 -- ptxas warning : Value of threads … (#161380)

2025-09-02 13:27:57 +00:00

jit

[4/n] Remove references to TorchScript in PyTorch docs (#158317)

2025-07-16 20:01:34 +00:00

legacy

…

lib

[2/N] Fix cppcoreguidelines-init-variables suppression (#146237)

2025-06-19 23:26:42 +00:00

linalg

Revert "Add __init__.pyi to torch/linalg (#160750)"

2025-09-02 16:53:55 +00:00

masked

Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869)

2025-09-08 22:59:13 +00:00

monitor

…

mps

[BE][12/16] fix typos in torch/ (#156602)

2025-07-02 22:55:29 +00:00

mtia

[Re-land][Inductor] Support native Inductor as backend for MTIA (#159211)

2025-07-29 17:03:24 +00:00

multiprocessing

Allow parallel start NUMA binding (#161576)

2025-08-28 01:15:58 +00:00

nativert

[nativert] AOTI delegate with flat inputs and outputs (#162538)

2025-09-10 11:35:44 +00:00

nested

Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869)

2025-09-08 22:59:13 +00:00

use torch.accelerator and device_module instead of cuda to make DataParallel more device agnostic. (#162573)

2025-09-11 10:04:27 +00:00

numa

Allow parallel start NUMA binding (#161576)

2025-08-28 01:15:58 +00:00

onnx

[ONNX] Update export docstring (#162622)

2025-09-10 20:29:46 +00:00

optim

Unify TypeAlias definitions in optimizer.py (#161493)

2025-08-30 00:35:02 +00:00

package

[BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552)

2025-08-07 00:09:56 +00:00

profiler

removed duplicate imports (#161685)

2025-08-31 16:21:49 +00:00

quantization

[BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552)

2025-08-07 00:09:56 +00:00

signal

[BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552)

2025-08-07 00:09:56 +00:00

sparse

[BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552)

2025-08-07 00:09:56 +00:00

special

[BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552)

2025-08-07 00:09:56 +00:00

testing

[2/N]Port several test files under test/distributed to Intel GPU (#159473)

2025-09-11 06:44:26 +00:00

utils

Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869)

2025-09-08 22:59:13 +00:00

xpu

Add uuid to XPU device properties (#161392)

2025-09-02 06:41:32 +00:00

__config__.py

…

__future__.py

…

__init__.py

[BE][Easy] restore #157584 after #158288 (#158541)

2025-09-02 02:06:50 +00:00

_appdirs.py

Fix broken URLs (#152237)

2025-04-27 09:56:42 +00:00

_classes.py

remove allow-untyped-defs from torch/_classes.py (#157231)

2025-07-08 00:11:52 +00:00

_compile.py

[precompile] Ensure @disable()-ed function won't trigger recompile from precompile bytecode. (#155363)

2025-06-10 16:13:38 +00:00

_custom_ops.py

Render Example: and not Example:: in docs (#153978)

2025-05-21 01:03:26 +00:00

_environment.py

…

_guards.py

fix incorrect interaction between DDPOptimizer and donated buffers (#160745)

2025-09-04 21:57:27 +00:00

_jit_internal.py

[BE][1/16] fix typos in torch/ (#156311)

2025-07-09 11:02:22 +00:00

_linalg_utils.py

Update is_sparse doc to mention that it is sparse_coo specific (#157378)

2025-07-09 18:22:14 +00:00

_lobpcg.py

[BE][1/16] fix typos in torch/ (#156311)

2025-07-09 11:02:22 +00:00

_lowrank.py

[BE][1/16] fix typos in torch/ (#156311)

2025-07-09 11:02:22 +00:00

_meta_registrations.py

MXFP8 grouped GEMM support for torch._scaled_grouped_mm + submodule bump (#162209)

2025-09-06 15:25:30 +00:00

_namedtensor_internals.py

…

_ops.py

Enable XPU path for FlexAttention (#143553)

2025-08-29 23:10:58 +00:00

_python_dispatcher.py

Typo fixes for "overridden" in comments and function names (#155944)

2025-06-14 03:37:38 +00:00

_size_docs.py

Render Example: and not Example:: in docs (#153978)

2025-05-21 01:03:26 +00:00

_sources.py

…

_storage_docs.py

Fix docstring for torch.UntypedStorage.from_file (#155067)

2025-06-05 14:30:49 +00:00

_streambase.py

…

_tensor_docs.py

Add missing optional for tensor ops (#159028)

2025-07-25 04:36:55 +00:00

_tensor_str.py

Fix max_width computation in _tensor_str._Formatter (#126859)

2025-08-01 15:05:41 +00:00

_tensor.py

[MPS] Enable dlpack integration (#158888)

2025-07-24 18:05:41 +00:00

_thread_safe_fork.py

…

_torch_docs.py

Update docs for quantile to be clearer for nearest (#162423)

2025-09-09 18:04:12 +00:00

_utils_internal.py

Allow for using a dedicated binary for the torch subproc pool. (#162093)

2025-09-05 01:43:46 +00:00

_utils.py

[BE][1/16] fix typos in torch/ (#156311)

2025-07-09 11:02:22 +00:00

_VF.py

…

_vmap_internals.py

Fix broken URLs (#152237)

2025-04-27 09:56:42 +00:00

_weights_only_unpickler.py

Fix type checking for persistent loads in the weights-only unpickler (#161661)

2025-09-01 19:57:19 +00:00

CMakeLists.txt

Revert "Make distributed modules importable even when backend not built (#159889)" (#162568)

2025-09-10 04:29:42 +00:00

custom_class_detail.h

…

custom_class.h

[BE][1/16] fix typos in torch/ (#156311)

2025-07-09 11:02:22 +00:00

extension.h

…

functional.py

unify broadcast_shapes functions and avoid duplicates (#160251)

2025-08-16 00:54:32 +00:00

header_only_apis.txt

[Reland] Migrate ScalarType to headeronly (#159911)

2025-08-06 07:36:37 +00:00

hub.py

Allow torch.hub.load with unauthorized GITHUB_TOKEN (#159896)

2025-08-14 18:15:49 +00:00

library.h

Using std::make_unique<T>() instead of unique<T>(new T()) (#160723)

2025-08-19 10:25:47 +00:00

library.py

Leak Python filenames so that we can give good dispatcher errors. (#160418)

2025-08-31 22:31:39 +00:00

overrides.py

Add torch.Tensor._make_dtensor to accelerate DTensor.__new__ further (#161590)

2025-09-05 18:43:41 +00:00

py.typed

…

quasirandom.py

…

random.py

Update description for torch.random.fork_rng (#151881)

2025-04-23 16:59:29 +00:00

return_types.py

…

script.h

…

serialization.py

added class or module info for functions blocked by weight-only load (#159935)

2025-08-12 20:52:25 +00:00

storage.py

mypy 1.16.0 (#155821)

2025-06-14 18:18:43 +00:00

torch_version.py

…

types.py

…

version.py.tpl

…