Fixes the failure described below, which occurs when:
- A user `torch.compile`s a function that uses a triton kernel.
- `TORCHINDUCTOR_DUMP_LAUNCH_PARAMS=1` is set.
Problem:
If the user-defined triton kernel is not autotuned:
```python
import os

os.environ["TORCHINDUCTOR_DUMP_LAUNCH_PARAMS"] = "1"

import torch
import triton
import triton.language as tl


@triton.jit
def kernel(..., BLOCK_SIZE: tl.constexpr):
    ...


@torch.compile
def fn(...):
    kernel[...](..., 128)


fn(...)
```
Then, in `triton_heuristics._interpret_args_grid`, the `filtered_signature` function:
```python
def filtered_signature() -> list[str]:
# constexprs are not passed in as args
return [
x
for x in self.triton_meta["signature"].keys()
if x not in cfg.kwargs.keys()
]
```
Because `triton.autotune` is not used on the `triton.jit` function, `cfg` above is empty, so `BLOCK_SIZE` is not filtered out of the signature even though it is a constexpr and has already been stripped from the arguments passed to `_interpret_args_grid`. This results in a mismatch between the number of parameters in the signature and the number of arguments, which leads to the error `NameError: name '_grid_2' is not defined`.
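For intuition, the off-by-one pairing looks roughly like this; the concrete names and the zip-style pairing are an illustration, not the actual inductor code:
```python
# Hypothetical illustration of the mismatch (names are assumptions).
signature = ["in_ptr", "out_ptr", "n_elements", "BLOCK_SIZE"]  # constexpr still present
args = ("<in_ptr>", "<out_ptr>", 1024)  # constexpr value 128 already stripped

# Pairing signature names with the shorter args tuple leaves BLOCK_SIZE
# (and anything expected to come after it, such as the grid variables)
# without a bound value, which later surfaces as the NameError above.
namespace = dict(zip(signature, args))
print(namespace)  # {'in_ptr': '<in_ptr>', 'out_ptr': '<out_ptr>', 'n_elements': 1024}
```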
Fix:
Use the triton JIT kernel's `constexprs` to determine which args to remove from the signature. Not sure if this is a good fix, so suggestions are welcome.
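A minimal sketch of that idea, assuming the JIT kernel is reachable as `self.fn` and exposes `arg_names`/`constexprs` (these attribute paths are assumptions, not the merged patch):
```python
def filtered_signature() -> list[str]:
    # Sketch: drop constexpr parameters recorded on the JIT kernel itself
    # (self.fn.arg_names / self.fn.constexprs are assumed attribute paths),
    # so the filtering works even when no autotune config populates cfg.kwargs.
    constexpr_names = {self.fn.arg_names[i] for i in self.fn.constexprs}
    return [
        x
        for x in self.triton_meta["signature"].keys()
        if x not in constexpr_names
    ]
```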
Test plan:
Added a parameter to an existing triton kernel to test this edge case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161924
Approved by: https://github.com/davidberard98