[dynamo] fix store attr graph break in with block

[ghstack-poisoned]
Revert "shrink_group implementation to expose ncclCommShrink API (#164518)"
2025-10-29 19:24:55 +08:00 · 2025-10-21 15:25:25 -07:00 · 2025-10-21 20:24:14 +00:00 · 2025-10-21 13:16:30 -07:00 · 2025-10-21 19:52:47 +00:00 · 2025-10-21 19:47:33 +00:00
13 changed files with 310 additions and 132 deletions
--- a/.ci/docker/requirements-docs.txt
+++ b/.ci/docker/requirements-docs.txt
@@ -1,11 +1,15 @@
-sphinx==7.2.6
+sphinx==5.3.0
 #Description: This is used to generate PyTorch docs
-#Pinned versions: 7.2.6
+#Pinned versions: 5.3.0

-pytorch_sphinx_theme2==0.1.0
-#Description: This is needed to generate PyTorch docs
-#Pinned versions: 0.1.0
+standard-imghdr==3.13.0; python_version >= "3.13"
+#Description: This is needed by Sphinx, so it needs to be added here.
+# The reasons are as follows:
+# 1) This module has been removed from the Python standard library since Python 3.13 (https://peps.python.org/pep-0594/#imghdr);
+# 2) The current version of Sphinx (5.3.0) is not compatible with Python 3.13.
+# Once Sphinx is upgraded to a version compatible with Python 3.13 or later, we can remove this dependency.

+-e git+https://github.com/pytorch/pytorch_sphinx_theme.git@71e55749be14ceb56e7f8211a9fb649866b87ad4#egg=pytorch_sphinx_theme2
 # TODO: sphinxcontrib.katex 0.9.0 adds a local KaTeX server to speed up pre-rendering
 # but it doesn't seem to work and hangs around idly. The initial thought that it is probably
 # something related to Docker setup. We can investigate this later.
@@ -32,17 +36,17 @@ tensorboard==2.18.0 ; python_version >= "3.13"
 #Description: This is used to generate PyTorch docs
 #Pinned versions: 2.13.0

-breathe==4.36.0
+breathe==4.34.0
 #Description: This is used to generate PyTorch C++ docs
-#Pinned versions: 4.36.0
+#Pinned versions: 4.34.0

-exhale==0.3.7
+exhale==0.2.3
 #Description: This is used to generate PyTorch C++ docs
-#Pinned versions: 0.3.7
+#Pinned versions: 0.2.3

-docutils==0.20
+docutils==0.16
 #Description: This is used to generate PyTorch C++ docs
-#Pinned versions: 0.20
+#Pinned versions: 0.16

 bs4==0.0.1
 #Description: This is used to generate PyTorch C++ docs
@@ -52,13 +56,13 @@ IPython==8.12.0
 #Description: This is used to generate PyTorch functorch docs
 #Pinned versions: 8.12.0

-myst-nb==1.3.0
+myst-nb==0.17.2
 #Description: This is used to generate PyTorch functorch and torch.compile docs.
-#Pinned versions: 1.3.0
+#Pinned versions: 0.17.2

 # The following are required to build torch.distributed.elastic.rendezvous.etcd* docs
 python-etcd==0.4.5
 sphinx-copybutton==0.5.0
-sphinx-design==0.6.1
+sphinx-design==0.4.0
 sphinxcontrib-mermaid==1.0.0
-myst-parser==4.0.1
+myst-parser==0.18.1
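The `standard-imghdr` pin above relies on the PEP 508 environment marker `python_version >= "3.13"`. A common misreading treats that as a string comparison, under which `"3.9" >= "3.13"` is true. Installers built on the `packaging` library compare both sides as versions, so the pin applies only on 3.13 and later. The sketch below mimics that numeric comparison for simple `major.minor` strings; the helper names are hypothetical and the rest of PEP 440 is ignored.

```python
import sys


def parse_minor_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as "3.13" into (3, 13) so it compares numerically."""
    return tuple(int(part) for part in text.split("."))


def needs_standard_imghdr(python_version: str) -> bool:
    """Mirror the marker `python_version >= "3.13"` with numeric comparison."""
    return parse_minor_version(python_version) >= (3, 13)


print("3.9" >= "3.13")                # True: plain string comparison is misleading
print(needs_standard_imghdr("3.9"))   # False
print(needs_standard_imghdr("3.13"))  # True

current = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {current}: standard-imghdr needed -> {needs_standard_imghdr(current)}")
```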
--- a/.ci/lumen_cli/pyproject.toml
+++ b/.ci/lumen_cli/pyproject.toml
@@ -6,7 +6,7 @@ dependencies = [
    "GitPython==3.1.45",
    "docker==7.1.0",
    "pytest==7.3.2",
-    "uv==0.8.6"
+    "uv==0.9.5"
 ]

 [tool.setuptools]
--- a/.ci/pytorch/python_doc_push_script.sh
+++ b/.ci/pytorch/python_doc_push_script.sh
@@ -102,18 +102,8 @@ if [ "$is_main_doc" = true ]; then
    echo coverage output not found
    exit 1
  elif [ $undocumented -gt 0 ]; then
-    echo "======================================"
-    echo "ERROR: $undocumented undocumented objects found!"
-    echo "======================================"
-    echo ""
-    echo "Full coverage report:"
+    echo undocumented objects found:
    cat build/coverage/python.txt
-    echo ""
-    echo "======================================"
-    echo "Undocumented modules/objects (lines after TOTAL):"
-    tail -n +$((lines - undocumented + 1)) build/coverage/python.txt
-    echo "======================================"
-    echo ""
    echo "Make sure you've updated relevant .rsts in docs/source!"
    echo "You can reproduce locally by running 'cd docs && make coverage && cat build/coverage/python.txt'"
    exit 1
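The lines removed above printed only the undocumented entries with `tail -n +$((lines - undocumented + 1))`. This sketch assumes two things about variables set earlier in the script, outside this hunk: `lines` is the total line count of `build/coverage/python.txt`, and the undocumented objects are its last `undocumented` lines. Because `tail -n +K` starts output at 1-based line K, choosing K = lines - undocumented + 1 selects exactly those trailing lines. The Python sketch below checks that arithmetic on a made-up report; the report contents and helper names are illustrative only.

```python
def tail_from(lines: list[str], start: int) -> list[str]:
    """Emulate `tail -n +start`: output from 1-based line `start` to the end."""
    return lines[start - 1:]


def undocumented_entries(report: list[str], undocumented: int) -> list[str]:
    """Select the trailing `undocumented` lines the way the removed shell code did."""
    total = len(report)  # plays the role of $lines
    return tail_from(report, total - undocumented + 1)


# A made-up report: a header, a TOTAL line, then one line per undocumented object.
report = [
    "Undocumented Python objects",
    "TOTAL 3",
    "torch.foo",
    "torch.bar",
    "torch.baz",
]
print(undocumented_entries(report, 3))  # ['torch.foo', 'torch.bar', 'torch.baz']
```

When `undocumented` is 0 the start line becomes `total + 1`, so both `tail` and the sketch print nothing.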
--- a/aten/src/ATen/native/cuda/fused_adagrad_utils.cuh
+++ b/aten/src/ATen/native/cuda/fused_adagrad_utils.cuh
@@ -52,7 +52,7 @@ struct FusedAdagradMathFunctor {
  using opmath_t = at::opmath_type<scalar_t>;

  C10_DEVICE __forceinline__ void operator()(
-      int chunk_size,
+      int64_t chunk_size,
      FusedOptimizerTensorListMetadata<3>& tl,
      const float* lr_ptr,
      const double& lr,
@@ -133,4 +133,4 @@ struct FusedAdagradMathFunctor {

 } // namespace

-} // namespace at::native
+} // namespace at::native
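The `chunk_size` fix widens a parameter from `int` to `int64_t`. On the platforms PyTorch targets, `int` is 32 bits wide, so any value of 2^31 or more cannot be represented. Converting such a value to `int` is implementation-defined before C++20 and wraps modulo 2^32 from C++20 on. The sketch below uses `ctypes` to show what a 32-bit signed integer does with a value just past `INT32_MAX`. It illustrates the general hazard and does not claim that this functor actually received such a value.

```python
import ctypes


def as_int32(value: int) -> int:
    """Store `value` in a 32-bit signed C integer and read it back (ctypes truncates silently)."""
    return ctypes.c_int32(value).value


def as_int64(value: int) -> int:
    """Same round trip through a 64-bit signed C integer."""
    return ctypes.c_int64(value).value


chunk_size = 2**31  # one past INT32_MAX
print(as_int32(chunk_size))  # -2147483648: the value wrapped
print(as_int64(chunk_size))  # 2147483648: preserved
```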
--- a/benchmarks/dynamo/pr_time_benchmarks/expected_results.csv
+++ b/benchmarks/dynamo/pr_time_benchmarks/expected_results.csv
@@ -1,8 +1,8 @@
-add_loop_eager,compile_time_instruction_count,3070000000,0.1
+add_loop_eager,compile_time_instruction_count,3184000000,0.1



-add_loop_eager_dynamic,compile_time_instruction_count,4432000000,0.1
+add_loop_eager_dynamic,compile_time_instruction_count,4595000000,0.1



@@ -18,7 +18,7 @@ add_loop_inductor_gpu,compile_time_instruction_count,26800000000,0.1



-basic_modules_ListOfLinears_eager,compile_time_instruction_count,1048000000,0.1
+basic_modules_ListOfLinears_eager,compile_time_instruction_count,1096000000,0.1



@@ -26,7 +26,7 @@ basic_modules_ListOfLinears_inductor,compile_time_instruction_count,15240000000,



-basic_modules_ListOfLinears_inductor_gpu_force_shape_pad,compile_time_instruction_count,17020000000,0.1
+basic_modules_ListOfLinears_inductor_gpu_force_shape_pad,compile_time_instruction_count,17720000000,0.1



@@ -34,11 +34,11 @@ basic_modules_ListOfLinears_inductor_gpu,compile_time_instruction_count,11090000



-update_hint_regression,compile_time_instruction_count,1719000000,0.1
+update_hint_regression,compile_time_instruction_count,1645000000,0.1



-sum_floordiv_regression,compile_time_instruction_count,3686995725,0.1
+sum_floordiv_regression,compile_time_instruction_count,3813000000,0.1



@@ -50,31 +50,31 @@ symint_sum_loop,compile_time_instruction_count,4299000000,0.1



-aotdispatcher_inference_nosubclass_cpu,compile_time_instruction_count,1869000000,0.1
+aotdispatcher_inference_nosubclass_cpu,compile_time_instruction_count,1793000000,0.1



-aotdispatcher_inference_subclass_cpu,compile_time_instruction_count,5281000000,0.1
+aotdispatcher_inference_subclass_cpu,compile_time_instruction_count,5120000000,0.1



-aotdispatcher_partitioner_cpu,compile_time_instruction_count,8333000000,0.1
+aotdispatcher_partitioner_cpu,compile_time_instruction_count,7936000000,0.1



-aotdispatcher_partitioner_cpu2,compile_time_instruction_count,1909000000,0.1
+aotdispatcher_partitioner_cpu2,compile_time_instruction_count,1848000000,0.1



-aotdispatcher_training_nosubclass_cpu,compile_time_instruction_count,3442000000,0.1
+aotdispatcher_training_nosubclass_cpu,compile_time_instruction_count,3152000000,0.1



-aotdispatcher_training_subclass_cpu,compile_time_instruction_count,9239000000,0.1
+aotdispatcher_training_subclass_cpu,compile_time_instruction_count,8301000000,0.1



-mm_loop_inductor_gpu,compile_time_instruction_count,4820968837,0.1
+mm_loop_inductor_gpu,compile_time_instruction_count,4958000000,0.1



@@ -82,8 +82,8 @@ mm_loop_inductor_dynamic_gpu,compile_time_instruction_count,9051000000,0.1



-basic_NestedModule_eager,compile_time_instruction_count,9554000000,0.1
+basic_NestedModule_eager,compile_time_instruction_count,9990000000,0.1



-basic_InlineMod_eager,compile_time_instruction_count,7618000000,0.1
+basic_InlineMod_eager,compile_time_instruction_count,8126000000,0.1
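Each row of `expected_results.csv` can be read as `benchmark,metric,expected_value,margin`, with blank separator lines between rows. Assuming the last column is a relative noise margin (0.1 meaning plus or minus 10%), the sketch below parses two old/new row pairs copied from the diff above, skips the blank lines, and reports the relative change of each expectation. The column names and helper functions are hypothetical, and this is not the checker PyTorch runs in CI.

```python
import csv
import io

# Two old/new pairs copied from the diff above.
OLD = """add_loop_eager,compile_time_instruction_count,3070000000,0.1


aotdispatcher_training_subclass_cpu,compile_time_instruction_count,9239000000,0.1
"""
NEW = """add_loop_eager,compile_time_instruction_count,3184000000,0.1


aotdispatcher_training_subclass_cpu,compile_time_instruction_count,8301000000,0.1
"""


def parse_expected(text: str) -> dict[str, tuple[int, float]]:
    """Map benchmark name -> (expected instruction count, margin), skipping blank lines."""
    rows = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:  # csv.reader yields [] for a blank line
            continue
        name, _metric, expected, margin = row
        rows[name] = (int(expected), float(margin))
    return rows


def within_margin(measured: int, expected: int, margin: float) -> bool:
    """True if `measured` lies inside expected * (1 +/- margin) (assumed semantics)."""
    return abs(measured - expected) <= margin * expected


old, new = parse_expected(OLD), parse_expected(NEW)
for name in old:
    before, margin = old[name]
    after, _ = new[name]
    change = (after - before) / before
    print(f"{name}: {change:+.2%}, inside old margin: {within_margin(after, before, margin)}")
```

For these two rows the loop reports +3.71% for `add_loop_eager`, inside the old margin, and -10.15% for `aotdispatcher_training_subclass_cpu`, just outside it.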
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -207,42 +207,6 @@ templates_path = [
 ]
 # TODO: document these and remove them from here.

-# Fixes the duplicated
-autosummary_filename_map = {
-    "torch.nn.utils.prune.identity": "torch.nn.utils.prune.identity_function",
-    "torch.nn.utils.prune.Identity": "torch.nn.utils.prune.Identity_class",
-    "torch.optim.adamw.adamw": "torch.optim.adamw.adamw_function",
-    "torch.optim.adamw.AdamW": "torch.optim.adamw.AdamW_class",
-    "torch.optim.asgd.asgd": "torch.optim.asgd.asgd_function",
-    "torch.optim.asgd.ASGD": "torch.optim.asgd.ASGD_class",
-    "torch.optim.nadam.nadam": "torch.optim.nadam.nadam_function",
-    "torch.optim.nadam.NAdam": "torch.optim.nadam.NAdam_class",
-    "torch.optim.radam.radam": "torch.optim.radam.radam_function",
-    "torch.optim.radam.RAdam": "torch.optim.radam.RAdam_class",
-    "torch.optim.rmsprop.rmsprop": "torch.optim.rmsprop.rmsprop_function",
-    "torch.optim.rmsprop.RMSprop": "torch.optim.rmsprop.RMSprop_class",
-    "torch.optim.rprop.rprop": "torch.optim.rprop.rprop_function",
-    "torch.optim.rprop.Rprop": "torch.optim.rprop.Rprop_class",
-    "torch.optim.sgd.sgd": "torch.optim.sgd.sgd_function",
-    "torch.optim.sgd.SGD": "torch.optim.sgd.SGD_class",
-    "torch.optim.adadelta.adadelta": "torch.optim.adadelta.adadelta_function",
-    "torch.optim.adadelta.Adadelta": "torch.optim.adadelta.Adadelta_class",
-    "torch.optim.adagrad.adagrad": "torch.optim.adagrad.adagrad_function",
-    "torch.optim.adagrad.Adagrad": "torch.optim.adagrad.Adagrad_class",
-    "torch.optim.adam.adam": "torch.optim.adam.adam_function",
-    "torch.optim.adam.Adam": "torch.optim.adam.Adam_class",
-    "torch.optim.adamax.adamax": "torch.optim.adamax.adamax_function",
-    "torch.optim.adamax.Adamax": "torch.optim.adamax.Adamax_class",
-    "torch.mtia.stream": "torch.mtia.stream_function",
-    "torch.mtia.Stream": "torch.mtia.Stream_class",
-    "torch.cpu.stream": "torch.cpu.stream_function",
-    "torch.cpu.Stream": "torch.cpu.Stream_class",
-    "torch.cuda.stream": "torch.cuda.stream_function",
-    "torch.cuda.Stream": "torch.cuda.Stream_class",
-    "torch.xpu.stream": "torch.xpu.stream_function",
-    "torch.xpu.Stream": "torch.xpu.Stream_class",
-}
-
 coverage_ignore_functions = [
    # torch
    "typename",
@@ -3229,11 +3193,6 @@ autodoc_type_aliases = {
 # Enable overriding of function signatures in the first line of the docstring.
 autodoc_docstring_signature = True

-# Exclude inherited IntEnum methods that have RST formatting issues in their docstrings
-autodoc_default_options = {
-    "exclude-members": "from_bytes, to_bytes",
-}
-
 # -- katex javascript in header
 #
 #    def setup(app):
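The removed `autosummary_filename_map` gave distinct output filenames to pairs such as `torch.optim.adam.adam` and `torch.optim.adam.Adam`, whose default generated names differ only in letter case. One plausible reading of the truncated comment ("Fixes the duplicated") is that such pairs clash on case-insensitive filesystems; treat that as an assumption. The sketch below finds names that collide when lowercased and builds a map in the same `_function`/`_class` style. The helper names and the capitalization heuristic are hypothetical.

```python
from collections import defaultdict


def case_collisions(names: list[str]) -> list[list[str]]:
    """Group fully qualified names whose generated filenames differ only by case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def build_filename_map(names: list[str]) -> dict[str, str]:
    """Suffix colliding names: capitalized last component -> _class, otherwise _function."""
    mapping = {}
    for group in case_collisions(names):
        for name in group:
            leaf = name.rsplit(".", 1)[-1]
            suffix = "_class" if leaf[:1].isupper() else "_function"
            mapping[name] = name + suffix
    return mapping


names = ["torch.optim.adam.adam", "torch.optim.adam.Adam", "torch.optim.lr_scheduler.StepLR"]
print(build_filename_map(names))
```

For the Adam pair this reproduces the two entries that the removed map listed; `StepLR` has no case twin and is left alone.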
--- a/docs/source/quantization-support.md
+++ b/docs/source/quantization-support.md
@@ -253,6 +253,7 @@ regular full-precision tensor.
 .. autosummary::
    :toctree: generated
    :nosignatures:
+    :template: classtemplate.rst

    view
    as_strided
--- a/test/dynamo/test_repros.py
+++ b/test/dynamo/test_repros.py
@@ -7284,6 +7284,22 @@ def forward(self, s77 : torch.SymInt, s27 : torch.SymInt, L_x_ : torch.Tensor):
        flag = False
        self.assertEqual(fn(inp), opt_fn(inp))

+    def test_store_attr_graph_break_key_error(self):
+        # STORE_ATTR on dummy should result in graph break
+        def dummy():
+            pass
+
+        def fn(x):
+            x = x + 2
+            with torch.no_grad():
+                dummy.attr1 = x
+            return x + 4
+
+        inp = torch.ones(3)
+        opt_fn = torch.compile(fn, backend="eager")
+        self.assertEqual(fn(inp), opt_fn(inp))
+        self.assertGreater(len(torch._dynamo.utils.counters["graph_break"]), 0)
+
    def test_cells_unsupported_step_exception(self):
        # This error happened because:
        #  - we were generating cells into a list on the stack
--- a/test/test_fx.py
+++ b/test/test_fx.py
@@ -6,6 +6,7 @@ import builtins
 import collections
 import contextlib
 import copy
+import gc
 import functools
 import inspect
 import io
@@ -19,6 +20,7 @@ import traceback
 import types
 import typing
 import unittest
+import weakref
 import warnings
 from math import sqrt
 from torch.multiprocessing import Process
@@ -1624,6 +1626,25 @@ class TestFX(JitTestCase):

        self.assertTrue(neg not in relu.users)

+    @skipIfTorchDynamo("Dynamo does not free right away")
+    def test_prepend_does_not_leak(self):
+        g = Graph()
+        x = g.placeholder("x")
+        relu = g.call_function(torch.relu, (x,))
+        neg = g.call_function(torch.neg, (x,))
+
+        relu.prepend(neg)
+
+        ref = weakref.ref(neg)
+        g.erase_node(neg)
+        del g
+        del x
+        del relu
+        del neg
+        gc.collect()
+
+        self.assertIsNone(ref())
+
    def test_remove_uses_with_custom_filter(self):
        g: torch.fx.Graph = Graph()
        x: torch.fx.Node = g.placeholder("x")
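The new `test_prepend_does_not_leak` uses a standard leak check: take a `weakref.ref` to an object, drop every strong reference, run `gc.collect()`, and assert that the weak reference is dead. The pure-Python sketch below applies the same pattern to a hypothetical circular doubly linked list (`ListNode` is a stand-in, not FX's `Node`) and shows the check failing when a stray strong reference survives.

```python
import gc
import weakref


class ListNode:
    """Hypothetical node in a circular doubly linked list; a new node is linked to itself."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.prev = self
        self.next = self

    def prepend(self, other: "ListNode") -> None:
        """Insert `other` immediately before `self`."""
        other.remove_from_list()
        p = self.prev
        p.next, other.prev = other, p
        other.next, self.prev = self, other

    def remove_from_list(self) -> None:
        """Unlink `self` from its neighbours; `self` keeps its stale pointers."""
        p, n = self.prev, self.next
        p.next, n.prev = n, p


def is_collected(make_and_drop) -> bool:
    """Call a function that returns a weakref after its locals go away, then collect."""
    ref = make_and_drop()
    gc.collect()
    return ref() is None


def clean_case():
    a, b = ListNode("a"), ListNode("b")
    a.prepend(b)
    ref = weakref.ref(b)
    b.remove_from_list()  # roughly what erasing a node does to the list
    return ref  # a and b go out of scope here


keep_alive = []


def leaky_case():
    a, b = ListNode("a"), ListNode("b")
    a.prepend(b)
    keep_alive.append(b)  # a stray strong reference outlives the function
    return weakref.ref(b)


print(is_collected(clean_case))  # True
print(is_collected(leaky_case))  # False
```

A weak reference never keeps its target alive, so `ref()` returns `None` only once nothing else refers to the object; `gc.collect()` is needed for the nodes that remain in reference cycles.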
--- a/torch/_C/init.pyi.in
+++ b/torch/_C/init.pyi.in
@@ -2758,6 +2758,12 @@ class _NodeBase:
        return_type: Any,
    ) -> None: ...
    def _update_args_kwargs(self, args: tuple[Any, ...], kwargs: dict[str, Any]): ...
+    def _prepend(self, n: FxNode) -> None: ...
+    def _remove_from_list(self) -> None: ...
+    def __lt__(self, n: Self) -> _bool: ...
+    def __gt__(self, n: Self) -> _bool: ...
+    def __le__(self, n: Self) -> _bool: ...
+    def __ge__(self, n: Self) -> _bool: ...

 class _NodeIter(Iterator[FxNode]):
    def __init__(self, root: FxNode, reversed: _bool) -> None: ...
--- a/torch/_dynamo/symbolic_convert.py
+++ b/torch/_dynamo/symbolic_convert.py
@@ -2638,7 +2638,9 @@ class InstructionTranslatorBase(
            reason=GraphCompileReason("store_attr", [self.frame_summary()]),
            stack_pops=2,
        )
-        self.output.add_output_instructions([copy.copy(inst)])
+        inst_copy = copy.copy(inst)
+        inst_copy.exn_tab_entry = None
+        self.output.add_output_instructions([inst_copy])
        self.popn(2)
        self.output.add_output_instructions(
            self.codegen_fix_leaf_stack(
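The `symbolic_convert.py` change copies the `STORE_ATTR` instruction that is re-emitted at the graph break and clears the copy's `exn_tab_entry`. `copy.copy` is shallow, so without that line the copy would still point at the original instruction's exception-table entry, which in the new test presumably belongs to the surrounding `with torch.no_grad():` block, even though the copy is emitted into different generated bytecode. The sketch below uses simplified stand-in dataclasses (not Dynamo's real `Instruction` class) to show the shallow-copy behavior and the fix.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExceptionTableEntry:
    """Hypothetical stand-in: the handler range that covers an instruction."""
    start: str
    end: str
    target: str


@dataclass
class Instruction:
    """Hypothetical stand-in for a bytecode instruction with an exception-table entry."""
    opname: str
    argval: str
    exn_tab_entry: Optional[ExceptionTableEntry] = None


def copy_for_resume(inst: Instruction) -> Instruction:
    """Shallow-copy `inst` but drop its exception-table entry, as the fix does."""
    inst_copy = copy.copy(inst)
    inst_copy.exn_tab_entry = None
    return inst_copy


with_block = ExceptionTableEntry(start="SETUP", end="STORE_ATTR", target="WITH_CLEANUP")
store = Instruction("STORE_ATTR", "attr1", exn_tab_entry=with_block)

naive = copy.copy(store)
print(naive.exn_tab_entry is store.exn_tab_entry)  # True: the copy still points at the with-block entry
fixed = copy_for_resume(store)
print(fixed.exn_tab_entry, store.exn_tab_entry is with_block)  # None True
```

Clearing the field on the copy leaves the original instruction, and the exception table it belongs to, untouched.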
--- a/torch/csrc/fx/node.cpp
+++ b/torch/csrc/fx/node.cpp
@@ -1,11 +1,15 @@
 #include <torch/csrc/fx/node.h>

+#include <c10/util/Exception.h>
+#include <c10/util/SmallVector.h>
 #include <structmember.h>
 #include <torch/csrc/utils/object_ptr.h>
 #include <torch/csrc/utils/pythoncapi_compat.h>
+#include <algorithm>

 namespace {

+using NodeSortKey = c10::SmallVector<int64_t, 4>;
 struct NodeBase;

 // Thrown to exit out of a C++ function and return an error to Python.
@@ -163,7 +167,41 @@ struct NodeBase {
  PyObject* users;
  PyObject* _repr_fn;
  PyObject* meta;
-  PyObject* _sort_key;
+  // NOLINTNEXTLINE(cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays)
+  alignas(NodeSortKey) char sort_key_buf[sizeof(NodeSortKey)];
+
+  inline NodeSortKey& sort_key() {
+    return *reinterpret_cast<NodeSortKey*>(sort_key_buf);
+  }
+
+  inline void set_prev(NodeBase* value) {
+    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(value);
+    Py_INCREF(reinterpret_cast<PyObject*>(value));
+    NodeBase* old = _prev;
+    _prev = value;
+    Py_DECREF(reinterpret_cast<PyObject*>(old));
+  }
+
+  inline void set_next(NodeBase* value) {
+    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(value);
+    Py_INCREF(reinterpret_cast<PyObject*>(value));
+    NodeBase* old = _next;
+    _next = value;
+    Py_DECREF(reinterpret_cast<PyObject*>(old));
+  }
+
+  // Equivalent to:
+  //   p, n = self._prev, self._next
+  //   p._next, n._prev = n, p
+  inline void remove_from_list() {
+    if (this->_prev == this && this->_next == this) {
+      return;
+    }
+    NodeBase* p = this->_prev;
+    NodeBase* n = this->_next;
+    p->set_next(n);
+    n->set_prev(p);
+  }
 };

 static PyObject* NodeBase_new(
@@ -173,6 +211,8 @@ static PyObject* NodeBase_new(
  PyObject* self = type->tp_alloc(type, 0);
  if (!self)
    return nullptr;
+  new (reinterpret_cast<NodeBase*>(self)->sort_key_buf)
+      NodeSortKey(); // placement new does not allocate
  return self;
 }

@@ -201,7 +241,6 @@ static int NodeBase_init_fn(NodeBase* self, PyObject* args, PyObject* kwds) {
  self->users = PyDict_New();
  self->_repr_fn = Py_NewRef(Py_None);
  self->meta = PyDict_New();
-  self->_sort_key = PyTuple_New(0);
  return 0;
 }

@@ -221,7 +260,6 @@ static struct PyMemberDef NodeBase_members[] = {
    {"users", T_OBJECT_EX, offsetof(NodeBase, users), 0, nullptr},
    {"_repr_fn", T_OBJECT_EX, offsetof(NodeBase, _repr_fn), 0, nullptr},
    {"meta", T_OBJECT_EX, offsetof(NodeBase, meta), 0, nullptr},
-    {"_sort_key", T_OBJECT_EX, offsetof(NodeBase, _sort_key), 0, nullptr},
    {nullptr} /* Sentinel */
 };

@@ -239,7 +277,6 @@ static int NodeBase_traverse(NodeBase* self, visitproc visit, void* arg) {
  Py_VISIT(self->users);
  Py_VISIT(self->_repr_fn);
  Py_VISIT(self->meta);
-  Py_VISIT(self->_sort_key);
  return 0;
 }

@@ -257,12 +294,12 @@ static int NodeBase_clear(NodeBase* self) {
  Py_CLEAR(self->users);
  Py_CLEAR(self->_repr_fn);
  Py_CLEAR(self->meta);
-  Py_CLEAR(self->_sort_key);
  return 0;
 }

 static void NodeBase_dealloc(PyObject* self) {
  PyObject_GC_UnTrack(self);
+  reinterpret_cast<NodeBase*>(self)->sort_key().~NodeSortKey();
  (void)NodeBase_clear((NodeBase*)self);
  Py_TYPE(self)->tp_free(self);
 }
@@ -321,15 +358,195 @@ static PyObject* NodeBase__update_args_kwargs(
  }
 }

+static PyObject* NodeBase__remove_from_list(
+    PyObject* self,
+    PyObject* _ignored) {
+  reinterpret_cast<NodeBase*>(self)->remove_from_list();
+  Py_RETURN_NONE;
+}
+
+static PyObject* NodeBase__prepend(PyObject* self_, PyObject* arg) {
+  if (self_ == arg) {
+    Py_RETURN_NONE;
+  }
+  if (!is_node(arg)) {
+    PyErr_SetString(PyExc_TypeError, "_prepend() argument must be a Node");
+    return nullptr;
+  }
+  NodeBase* self = reinterpret_cast<NodeBase*>(self_);
+  NodeBase* x = reinterpret_cast<NodeBase*>(arg);
+  if (self->graph != x->graph) {
+    PyErr_SetString(
+        PyExc_AssertionError,
+        "Attempting to move a Node into a different Graph");
+    return nullptr;
+  }
+
+  x->remove_from_list();
+  NodeBase* p = self->_prev;
+  p->set_next(x);
+  x->set_prev(p);
+  x->set_next(self);
+  self->set_prev(x);
+
+  // Now compute x.sort_key()
+  const NodeSortKey& psk = x->_prev->sort_key();
+  const NodeSortKey& nsk = x->_next->sort_key();
+  if (psk.size() > nsk.size()) {
+    // prefix = psk[: len(nsk)+1]
+    size_t slice_len = nsk.size() + 1;
+    NodeSortKey prefix(psk.begin(), psk.begin() + slice_len);
+    // last element is idx => increment by 1
+    prefix.back()++;
+    x->sort_key() = std::move(prefix);
+  } else if (psk.size() < nsk.size()) {
+    // prefix = nsk[: len(psk)+1]
+    size_t slice_len = psk.size() + 1;
+    NodeSortKey prefix(nsk.begin(), nsk.begin() + slice_len);
+    // last element is idx => decrement by 1
+    prefix.back()--;
+    x->sort_key() = std::move(prefix);
+  } else {
+    // same length => add a 0
+    x->sort_key() = psk;
+    x->sort_key().emplace_back(0);
+  }
+  Py_RETURN_NONE;
+}
+
+// __lt__(self, other): Return self.sort_key < other.sort_key
+static PyObject* NodeBase___lt__(PyObject* self, PyObject* other) {
+  // METH_O => one argument: 'other'
+  if (!is_node(other)) {
+    Py_RETURN_NOTIMPLEMENTED;
+  }
+  const NodeSortKey& lhs = reinterpret_cast<NodeBase*>(self)->sort_key();
+  const NodeSortKey& rhs = reinterpret_cast<NodeBase*>(other)->sort_key();
+  bool less = std::lexicographical_compare(
+      lhs.begin(), lhs.end(), rhs.begin(), rhs.end());
+  if (less)
+    Py_RETURN_TRUE;
+  Py_RETURN_FALSE;
+}
+
+// __gt__(self, other): Return self.sort_key > other.sort_key
+static PyObject* NodeBase___gt__(PyObject* self, PyObject* other) {
+  if (!is_node(other)) {
+    Py_RETURN_NOTIMPLEMENTED;
+  }
+  const NodeSortKey& lhs = reinterpret_cast<NodeBase*>(self)->sort_key();
+  const NodeSortKey& rhs = reinterpret_cast<NodeBase*>(other)->sort_key();
+  // "a > b" is equivalent to "b < a"
+  bool greater = std::lexicographical_compare(
+      rhs.begin(), rhs.end(), lhs.begin(), lhs.end());
+  if (greater)
+    Py_RETURN_TRUE;
+  Py_RETURN_FALSE;
+}
+
+static PyObject* NodeBase___ge__(PyObject* self, PyObject* other) {
+  if (self == other) {
+    Py_RETURN_TRUE;
+  }
+  return NodeBase___gt__(self, other);
+}
+
+// __le__(self, other): Return True if self is other, else self.sort_key < other.sort_key
+static PyObject* NodeBase___le__(PyObject* self, PyObject* other) {
+  if (self == other) {
+    Py_RETURN_TRUE;
+  }
+  return NodeBase___lt__(self, other);
+}
+
+// Convert the NodeBase::sort_key SmallVector<int64_t> into a Python tuple of ints
+// Only used by pickle/__getstate__
+static PyObject* NodeBase_get_sort_key(PyObject* self, void* /*closure*/) {
+  NodeBase* node = reinterpret_cast<NodeBase*>(self);
+  const NodeSortKey& vec = node->sort_key();
+  Py_ssize_t n = static_cast<Py_ssize_t>(vec.size());
+  THPObjectPtr tuple(PyTuple_New(n));
+  if (!tuple) {
+    return nullptr; // Out of memory
+  }
+  for (Py_ssize_t i = 0; i < n; i++) {
+    PyObject* value = PyLong_FromSsize_t(vec[i]);
+    if (!value) {
+      return nullptr;
+    }
+    PyTuple_SET_ITEM(tuple.get(), i, value);
+  }
+  return tuple.release();
+}
+
+// Setter for NodeBase::sort_key: expects a Python tuple of ints, e.g.
+// node._sort_key = (1, 2, 3). Only used by pickle/__setstate__
+static int NodeBase_set_sort_key(
+    PyObject* self,
+    PyObject* value,
+    void* /*closure*/) {
+  NodeBase* node = reinterpret_cast<NodeBase*>(self);
+  if (!PyTuple_Check(value)) {
+    PyErr_SetString(PyExc_TypeError, "_sort_key must be a tuple of ints");
+    return -1;
+  }
+  Py_ssize_t size = PyTuple_GET_SIZE(value);
+  NodeSortKey new_vec;
+  new_vec.reserve(size);
+  for (Py_ssize_t i = 0; i < size; i++) {
+    int64_t val = PyLong_AsSsize_t(PyTuple_GET_ITEM(value, i));
+    if (val == -1 && PyErr_Occurred()) {
+      return -1;
+    }
+    new_vec.emplace_back(val);
+  }
+  node->sort_key() = std::move(new_vec);
+  return 0;
+}
+
 // NOLINTNEXTLINE(cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays)
 static PyMethodDef NodeBase_methods[] = {
    {"_update_args_kwargs",
     (PyCFunction)(void*)(NodeBase__update_args_kwargs),
     METH_FASTCALL,
     "Internal method: do not call directly."},
+    {"_remove_from_list",
+     (PyCFunction)(void*)(NodeBase__remove_from_list),
+     METH_NOARGS,
+     "Internal method: do not call directly."},
+    {"_prepend",
+     (PyCFunction)(void*)(NodeBase__prepend),
+     METH_O,
+     "Internal method: do not call directly."},
+    {"__lt__",
+     (PyCFunction)(void*)NodeBase___lt__,
+     METH_O,
+     "Return True if self.sort_key < other.sort_key"},
+    {"__gt__",
+     (PyCFunction)(void*)NodeBase___gt__,
+     METH_O,
+     "Return True if self.sort_key > other.sort_key"},
+    {"__ge__",
+     (PyCFunction)(void*)NodeBase___ge__,
+     METH_O,
+     "Return True if self.sort_key >= other.sort_key"},
+    {"__le__",
+     (PyCFunction)(void*)NodeBase___le__,
+     METH_O,
+     "Return True if self.sort_key <= other.sort_key"},
    {nullptr, nullptr, 0, nullptr} // Sentinel
 };

+// NOLINTNEXTLINE(cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays)
+static PyGetSetDef NodeBase_getset[] = {
+    {"_sort_key", // attribute name in Python
+     (getter)NodeBase_get_sort_key, // C getter function
+     (setter)NodeBase_set_sort_key, // C setter function
+     (char*)"The sort key as a tuple of ints", // docstring
+     nullptr},
+    {nullptr, nullptr, nullptr, nullptr, nullptr} // Sentinel
+};
+
 PyTypeObject NodeBaseType = {
    PyVarObject_HEAD_INIT(nullptr, 0)
    "torch._C._NodeBase", /* tp_name */
@@ -361,7 +578,7 @@ PyTypeObject NodeBaseType = {
    nullptr, /* tp_iternext */
    NodeBase_methods, /* tp_methods */
    NodeBase_members, /* tp_members */
-    nullptr, /* tp_getset */
+    NodeBase_getset, /* tp_getset */
    nullptr, /* tp_base */
    nullptr, /* tp_dict */
    nullptr, /* tp_descr_get */
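After the move to C++, the sort key lives in a `c10::SmallVector<int64_t, 4>` inside `NodeBase`, and `_sort_key` becomes a getset descriptor that converts to and from a tuple of ints (the comments say it is only used for pickling). The sketch below mimics that contract in pure Python with a property: a list is stored internally, reads return a fresh tuple, and writes reject anything that is not a tuple of integers in the int64 range. `SortKeyHolder` is hypothetical; it models the interface, not the memory layout.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


class SortKeyHolder:
    """Hypothetical model of NodeBase's `_sort_key` getset descriptor."""

    def __init__(self) -> None:
        self._buf: list[int] = []  # stands in for the SmallVector<int64_t, 4>

    @property
    def _sort_key(self) -> tuple[int, ...]:
        # Getter: build a new tuple on each access, like NodeBase_get_sort_key.
        return tuple(self._buf)

    @_sort_key.setter
    def _sort_key(self, value: tuple[int, ...]) -> None:
        # Setter: only tuples of ints are accepted, like NodeBase_set_sort_key.
        if not isinstance(value, tuple):
            raise TypeError("_sort_key must be a tuple of ints")
        new_buf = []
        for item in value:
            if not isinstance(item, int):
                raise TypeError(f"expected int, got {type(item).__name__}")
            if not INT64_MIN <= item <= INT64_MAX:
                raise OverflowError("sort key element does not fit in int64")
            new_buf.append(item)
        self._buf = new_buf  # replace only after every element is validated

    def __getstate__(self) -> dict:
        return {"_sort_key": self._sort_key}

    def __setstate__(self, state: dict) -> None:
        self._buf = []
        self._sort_key = state["_sort_key"]


h = SortKeyHolder()
h._sort_key = (1, 0, -1)
print(h._sort_key)  # (1, 0, -1)
try:
    h._sort_key = [1, 2]
except TypeError as exc:
    print(exc)  # _sort_key must be a tuple of ints
```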
--- a/torch/fx/node.py
+++ b/torch/fx/node.py
@@ -385,41 +385,7 @@ class Node(_NodeBase):
        Args:
            x (Node): The node to put before this node. Must be a member of the same graph.
        """
-        assert self.graph == x.graph, "Attempting to move a Node into a different Graph"
-        if self == x:
-            log.debug(
-                "Trying to prepend a node to itself. This behavior has no effect on the graph."
-            )
-            return
-        x._remove_from_list()
-        p = self._prev
-        p._next, x._prev = x, p
-        x._next, self._prev = self, x
-
-        # compute x._sort_key
-        psk = x._prev._sort_key
-        nsk = x._next._sort_key
-        if len(psk) > len(nsk):
-            idx: int
-            *prefix, idx = psk[: len(nsk) + 1]
-            x._sort_key = (*prefix, idx + 1)
-        elif len(psk) < len(nsk):
-            *prefix, idx = nsk[: len(psk) + 1]
-            x._sort_key = (*prefix, idx - 1)
-        else:  # same length, increase length by 1
-            x._sort_key = (*psk, 0)
-
-    def __gt__(self, other: "Node") -> bool:
-        return self._sort_key > other._sort_key
-
-    def __lt__(self, other: "Node") -> bool:
-        return self._sort_key < other._sort_key
-
-    def __ge__(self, other: "Node") -> bool:
-        return self > other or self == other
-
-    def __le__(self, other: "Node") -> bool:
-        return self < other or self == other
+        self._prepend(x)

    @compatibility(is_backward_compatible=True)
    def append(self, x: "Node") -> None:
@@ -430,11 +396,7 @@ class Node(_NodeBase):
        Args:
            x (Node): The node to put after this node. Must be a member of the same graph.
        """
-        self._next.prepend(x)
-
-    def _remove_from_list(self) -> None:
-        p, n = self._prev, self._next
-        p._next, n._prev = n, p
+        self._next._prepend(x)

    @property
    def args(self) -> tuple[Argument, ...]:
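Both the removed Python `prepend` and its C++ replacement place a node between two neighbours by giving it a sort key strictly between theirs. This lets `a < b` compare graph positions without renumbering the graph. The function below restates the three cases from the removed code as a standalone helper (the name `key_between` is hypothetical). It then checks, on a few repeated insertions, that each result sorts between its neighbours under Python's lexicographic tuple ordering, which `std::lexicographical_compare` reproduces in C++.

```python
def key_between(psk: tuple[int, ...], nsk: tuple[int, ...]) -> tuple[int, ...]:
    """Return a key for a node inserted between keys `psk` (prev) and `nsk` (next).

    Mirrors the three branches of the removed Node.prepend code.
    """
    if len(psk) > len(nsk):
        # Take psk up to one element past nsk's length and bump the last element.
        *prefix, idx = psk[: len(nsk) + 1]
        return (*prefix, idx + 1)
    if len(psk) < len(nsk):
        # Take nsk up to one element past psk's length and lower the last element.
        *prefix, idx = nsk[: len(psk) + 1]
        return (*prefix, idx - 1)
    # Same length: extend prev's key with a 0.
    return (*psk, 0)


keys = [(0,), (1,)]  # two existing nodes, in graph order
for _ in range(3):
    # Insert a new node right before the last one each time.
    new = key_between(keys[-2], keys[-1])
    assert keys[-2] < new < keys[-1]
    keys.insert(len(keys) - 1, new)
print(keys)  # [(0,), (0, 0), (0, 1), (0, 2), (1,)]
```

Keys grow by at most one element per insertion, and shorter keys sort before longer keys that share their prefix, matching tuple comparison in Python.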
Author	SHA1	Message	Date
William Wen	c49752c2c6	[dynamo] fix store attr graph break in with block [ghstack-poisoned]	2025-10-21 15:25:25 -07:00
PyTorch MergeBot	ad4dc52bf6	Revert "shrink_group implementation to expose ncclCommShrink API (#164518)" This reverts commit 4e643422f63a3cdd71bd141615f98de6bb54d15f. Reverted https://github.com/pytorch/pytorch/pull/164518 on behalf of https://github.com/albanD due to Breaks lint ([comment](https://github.com/pytorch/pytorch/pull/164518#issuecomment-3429426503))	2025-10-21 20:24:14 +00:00
dependabot[bot]	dac9ed9790	Bump uv from 0.8.6 to 0.9.5 in /.ci/lumen_cli (#166017) Bumps [uv](https://github.com/astral-sh/uv) from 0.8.6 to 0.9.5. - [Release notes](https://github.com/astral-sh/uv/releases) - [Changelog](https://github.com/astral-sh/uv/blob/main/CHANGELOG.md) - [Commits](https://github.com/astral-sh/uv/compare/0.8.6...0.9.5) --- updated-dependencies: - dependency-name: uv dependency-version: 0.9.5 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-10-21 13:16:30 -07:00
linhaifeng	1c7fe8f861	[BugFix] chunk_size should always be int64_t (#165971) Inspired by https://github.com/pytorch/pytorch/pull/156872 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165971 Approved by: https://github.com/albanD	2025-10-21 19:52:47 +00:00
Bruce Chang	4e643422f6	shrink_group implementation to expose ncclCommShrink API (#164518) Closes #164529 To expose the new [ncclCommShrink](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/comms.html#ncclcommshrink) API to PyTorch. This is useful when you need to exclude certain GPUs or nodes from a collective operation, for example in fault tolerance scenarios or when dynamically adjusting resource utilization. For more info: [Shrinking a communicator](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#shrinking-a-communicator) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164518 Approved by: https://github.com/kwen2501	2025-10-21 19:47:33 +00:00
Jason Ansel	3c3b278872	[reland][fx] Move Node._prepend/Node._remove_from_list to C++ (#165882) Relands #148261 that was reverted by #150542 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165882 Approved by: https://github.com/ezyang	2025-10-21 19:43:55 +00:00