Bruce Chang fa0db212e7 shrink_group implementation to expose ncclCommShrink API (#164518)
Closes #164529

This PR exposes the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink) API in PyTorch.

This is useful when certain GPUs or nodes must be excluded from a collective operation, for example in fault-tolerance scenarios or when dynamically adjusting resource utilization.

For more info, see [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator).
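
A minimal usage sketch follows. The entry point name `shrink_group` comes from this PR's title, but its exact module path and signature (here, taking the ranks to exclude and returning a smaller process group) are assumptions for illustration, not the confirmed API:

```python
import torch
import torch.distributed as dist

def shrink_after_failure(failed_ranks: list[int]):
    """Rebuild a smaller communicator after some ranks are deemed dead.

    Assumed signature: `shrink_group` takes the ranks to exclude and
    returns a new, smaller process group backed by ncclCommShrink.
    """
    if dist.get_rank() in failed_ranks:
        # Excluded ranks must not participate in the shrink call.
        return None

    # Surviving ranks collectively call into ncclCommShrink via the new
    # binding, avoiding a full destroy-and-reinit of the process group.
    new_group = dist.shrink_group(failed_ranks)  # assumed entry point

    # Subsequent collectives run on the shrunken group only.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t, group=new_group)
    return new_group
```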

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518
Approved by: https://github.com/kwen2501
2025-10-19 18:00:08 +00:00