When the autobucketing pass is registered as the aot_eager backend's `fw_compiler` and `bw_compiler`, this PR ensures that tensors are all-gathered on a real ("cpu"/"cuda") device instead of the "meta" device.
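For context, here is a minimal sketch of how such a pass can be plugged in as the `fw_compiler`/`bw_compiler` of an aot_eager-style backend; the registration code and variable names are illustrative, not the exact torchtitan implementation:

```python
import torch
from torch._dynamo.backends.common import aot_autograd
from torch._inductor.fx_passes.overlap_scheduling import schedule_overlap_bucketing


def aten_autobucketing_reordering_pass(gm: torch.fx.GraphModule, example_inputs):
    # Bucket and reorder collectives on the captured aten graph, then return
    # the (callable) GraphModule so AOTAutograd runs it eagerly.
    schedule_overlap_bucketing(gm)
    return gm


# Register the pass for both the forward and the backward graph.
autobucketing_backend = aot_autograd(
    fw_compiler=aten_autobucketing_reordering_pass,
    bw_compiler=aten_autobucketing_reordering_pass,
)
# model = torch.compile(model, backend=autobucketing_backend)
```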
When we call `dist.all_gather_object`, it creates a new byte storage outside of `no_dispatch` [here](a2e2e1d8c0/torch/distributed/distributed_c10d.py (L3303)), which ends up on the meta device when the pass runs under fake-tensor mode. I therefore updated the code to use `unset_fake_temporarily`, so real tensors are gathered from the other ranks.
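A minimal sketch of the resulting pattern, assuming the scheduling pass runs under `FakeTensorMode` (the helper name below is illustrative, not the exact function in `overlap_scheduling.py`):

```python
import torch.distributed as dist
from torch._subclasses.fake_tensor import unset_fake_temporarily


def _gather_runtime_estimations(runtime_estimations, pg):
    gathered = [None] * dist.get_world_size(pg)
    # Temporarily exit fake-tensor dispatch so the byte tensor that
    # all_gather_object builds internally is allocated on a real
    # "cpu"/"cuda" device rather than on "meta".
    with unset_fake_temporarily():
        dist.all_gather_object(gathered, runtime_estimations, pg)
    return gathered
```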
This change is needed to unblock the aot_eager + autobucketing pass in this [PR](https://github.com/pytorch/torchtitan/pull/1813).
Without it, I hit the following error:
```bash
traceback : Traceback (most recent call last):
File "/home/ruisizhang123/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 358, in wrapper
return f(*args, **kwargs)
File "/home/ruisizhang123/torchtitan/torchtitan/train.py", line 607, in train
self.train_step(data_iterator)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
File "/home/ruisizhang123/torchtitan/torchtitan/train.py", line 507, in train_step
loss = self.forward_backward_step(input_dict, labels)
File "/home/ruisizhang123/torchtitan/torchtitan/train.py", line 483, in forward_backward_step
pred = model_parts[0](inputs, **extra_inputs, **extra_args)
File "/home/ruisizhang123/pytorch/torch/_dynamo/eval_frame.py", line 418, in __call__
return super().__call__(*args, **kwargs)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/ruisizhang123/pytorch/torch/nn/modules/module.py", line 1784, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/home/ruisizhang123/pytorch/torch/nn/modules/module.py", line 1795, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ruisizhang123/pytorch/torch/_dynamo/eval_frame.py", line 901, in compile_wrapper
raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ruisizhang123/pytorch/torch/_dynamo/output_graph.py", line 2359, in _call_user_compiler
raise BackendCompilerFailed(
self.compiler_fn, e, inspect.currentframe()
).with_traceback(e.__traceback__) from None
File "/home/ruisizhang123/pytorch/torch/_dynamo/output_graph.py", line 2334, in _call_user_compiler
compiled_fn = compiler_fn(gm, example_inputs)
File "/home/ruisizhang123/pytorch/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__
compiled_gm = compiler_fn(gm, example_inputs)
File "/home/ruisizhang123/pytorch/torch/__init__.py", line 2441, in __call__
return self.compiler_fn(model_, inputs_, **self.kwargs)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ruisizhang123/pytorch/torch/_dynamo/backends/common.py", line 117, in __call__
cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
File "/home/ruisizhang123/pytorch/torch/_functorch/aot_autograd.py", line 1100, in aot_module_simplified
compiled_fn, _ = aot_stage2_compile(
~~~~~~~~~~~~~~~~~~^
aot_state,
^^^^^^^^^^
...<4 lines>...
inference_compiler,
^^^^^^^^^^^^^^^^^^^
)
^
File "/home/ruisizhang123/pytorch/torch/_functorch/_aot_autograd/graph_compile.py", line 257, in aot_stage2_compile
return aot_stage2_autograd(aot_state, aot_graph_capture)
File "/home/ruisizhang123/pytorch/torch/_functorch/_aot_autograd/graph_compile.py", line 1696, in aot_stage2_autograd
compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
File "/home/ruisizhang123/torchtitan/torchtitan/experiments/simple_fsdp/backend.py", line 35, in aten_autobucketing_reordering_pass
schedule_overlap_bucketing(gm)
~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^
File "/home/ruisizhang123/pytorch/torch/_inductor/fx_passes/overlap_scheduling.py", line 755, in schedule_overlap_bucketing
).run()
~~~^^
File "/home/ruisizhang123/pytorch/torch/_inductor/fx_passes/overlap_scheduling.py", line 358, in run
self._align_compute_nodes_runtime_estimations_across_all_distributed_ranks()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/home/ruisizhang123/pytorch/torch/_inductor/fx_passes/overlap_scheduling.py", line 337, in _align_compute_nodes_runtime_estimations_across_all_distributed_ranks
dist.all_gather_object(
~~~~~~~~~~~~~~~~~~~~~~^
gathered_runtime_estimations, runtime_estimations, pg
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/ruisizhang123/pytorch/torch/distributed/c10d_logger.py", line 82, in wrapper
return func(*args, **kwargs)
File "/home/ruisizhang123/pytorch/torch/distributed/distributed_c10d.py", line 3170, in all_gather_object
input_tensor, local_size = _object_to_tensor(obj, current_device, group)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ruisizhang123/pytorch/torch/distributed/distributed_c10d.py", line 3079, in _object_to_tensor
byte_tensor = torch.ByteTensor(byte_storage).to(device)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
torch._dynamo.exc.BackendCompilerFailed: backend='compiler_fn' raised:
RuntimeError: Attempted to set the storage of a tensor on device "cpu" to a storage on different device "meta". This is no longer allowed; the devices must match.
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165063
Approved by: https://github.com/eellison