pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

Blaine Burton Rister 60fe8a65af [Inductor] Generalize tiling algorithm to handle fused reductions (#144041 )

# Issue

This PR cleans up an edge case that wasn't handled by https://github.com/pytorch/pytorch/pull/137243. The existing tiling code assumes that `node.get_ranges()` is a reliable source of pointwise and reduction numels. This is true for pointwise kernels, but the situation is more complicated with reductions. Since reductions change the number of elements in a tensor, not all ops within a reduction kernel will have the same number of iterations. For example, `var_mean` fuses pointwise division with the output of reduction sum, and the division lacks the corresponding reduction ranges.

# Fix

Instead of getting numels from `node.get_ranges()`, explicitly pass the global pointwise and reduction numels to the relevant tiling functions. In `SIMDKernel.complete_partial_tiling`, we solve for the missing numel by dividing the global numel by the partial tiling's numel. This ensures all tilings have the correct global numel.
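The division step described above can be illustrated with a minimal sketch. This is not the Inductor implementation: `solve_missing_numel` and `complete_tiling` are hypothetical helpers, and plain Python integers are assumed in place of whatever symbolic sizes the real code handles.

```python
from math import prod


def solve_missing_numel(global_numel, partial_numels):
    """Return the numel of the one dimension a partial tiling leaves out.

    Hypothetical helper for illustration only. Assumes plain positive
    integers and that the partial tiling evenly divides the global numel.
    """
    known = prod(partial_numels)
    if known <= 0 or global_numel % known != 0:
        raise ValueError(
            f"partial tiling {partial_numels} does not divide numel {global_numel}"
        )
    return global_numel // known


def complete_tiling(global_numel, partial_numels):
    """Prepend the solved dimension so the tiling covers the global numel."""
    missing = solve_missing_numel(global_numel, partial_numels)
    return [missing, *partial_numels]


# A partial tiling of [8] over 64 elements leaves a missing dim of 8.
print(complete_tiling(64, [8]))  # [8, 8]
```

Because the missing dimension is derived from the global numel rather than from any single node's ranges, the product of the completed tiling always equals the global numel in this sketch.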

Also, in `SIMDKernel.is_compatible`, add the global reduction numel to node ranges that are missing it. For example, `{"x": 8, "r0_": 8}` is compatible with a node of ranges `([8], [])` when we have `reduction_numel=8`.
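A simplified sketch of this compatibility rule follows. It is an assumption-laden illustration, not the `SIMDKernel.is_compatible` code: the helper names are hypothetical, sizes are plain integers, reduction dims are assumed to be the tiling keys starting with `r`, and only total numels are compared rather than checking how ranges split across individual tile dims.

```python
from math import prod


def fill_reduction_ranges(tiling, ranges, reduction_numel=1):
    """Add the global reduction numel to node ranges that lack it.

    Hypothetical helper. `tiling` maps dim prefixes to numels and
    `ranges` is a (pointwise_ranges, reduction_ranges) pair.
    """
    pointwise, reduction = ranges
    total = prod(tiling.values())
    # A node fused after the reduction only carries pointwise ranges.
    if not reduction and total // reduction_numel == prod(pointwise):
        reduction = [reduction_numel]
    return list(pointwise), list(reduction)


def is_compatible(tiling, ranges, reduction_numel=1):
    """Check that the tiling's numels match the (filled-in) node ranges."""
    pointwise, reduction = fill_reduction_ranges(tiling, ranges, reduction_numel)
    # Assumption: reduction dim names start with "r", e.g. "r0_".
    pw_numel = prod(v for k, v in tiling.items() if not k.startswith("r"))
    red_numel = prod(v for k, v in tiling.items() if k.startswith("r"))
    return pw_numel == prod(pointwise) and red_numel == prod(reduction)


tiling = {"x": 8, "r0_": 8}
print(is_compatible(tiling, ([8], []), reduction_numel=8))  # True
print(is_compatible(tiling, ([8], [])))                     # False
```

The second call shows why the global reduction numel has to be passed in: without it, the node's empty reduction ranges cannot be reconciled with the tiling's `r0_` dimension.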

Finally, this PR generalizes some of the existing codegen to handle multiple reduction dims. We already had code to ignore reduction splits for pointwise kernels, but it only worked for 1D reductions. Now it can handle ND.

# Test plan

This PR parametrizes the existing CI test for `var_mean` to also run with tiled reductions. It also adds a new test checking that `var_mean` generates 2D tilings (with tiled reduction enabled). These new tests would fail on the current main branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144041
Approved by: https://github.com/jansel

2025-01-03 18:16:27 +00:00

_awaits

Remove unused imported names in python files (#134438 )

2024-08-27 20:44:04 +00:00

Add get_stream_from_external API for CUDA backend (#143799 )

2024-12-31 11:15:59 +00:00

_C_flatbuffer

…

_custom_op

Tighten torch.library.infer_schema input types (#130705 )

2024-07-29 16:01:19 +00:00

_decomp

[Inductor][CPU] disable bernoulli_p decomposition (#143460 )

2024-12-19 11:21:35 +00:00

_dispatch

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

_dynamo

[dynamo][BE] move zip_longest polyfill to submodule polyfills.itertools (#144067 )

2025-01-03 08:08:31 +00:00

_export

remove allow-untyped-defs from _export/pass_infra/proxy_value.py (#143944 )

2025-01-02 18:17:03 +00:00

_functorch

[1/n] Support Dynamic Memory Budget in Auto AC (#143539 )

2024-12-21 07:38:52 +00:00

_higher_order_ops

[user triton] Raise an exception when encountering nested @triton.autotune decorators or @triton.heuristics (#143519 )

2024-12-20 06:38:45 +00:00

_inductor

[Inductor] Generalize tiling algorithm to handle fused reductions (#144041 )

2025-01-03 18:16:27 +00:00

_lazy

remove allow-untyped-defs from torch/_lazy/config.py (#143603 )

2024-12-20 05:34:19 +00:00

_library

Revert "[export] don't decompose custom triton op when exporting (#142426 )"

2024-12-19 21:21:38 +00:00

_logging

Revert "Use absolute path path.resolve() -> path.absolute() (#129409 )"

2024-12-26 17:26:06 +00:00

_numpy

[BE][CI] bump ruff to 0.8.4 (#143753 )

2024-12-24 12:24:10 +00:00

_prims

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

_prims_common

Pass allow_rhs_unbacked to the stride test in metadata test too (#143040 )

2024-12-19 09:37:50 +00:00

_refs

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

_strobelight

Propagate callable parameter types using ParamSpec (#142306 ) (#143797 )

2024-12-29 23:03:14 +00:00

_subclasses

Propagate callable parameter types using ParamSpec (#142306 ) (#143797 )

2024-12-29 23:03:14 +00:00

_vendor

…

accelerator

torch/accelerator: fix device type comparison (#143541 )

2024-12-23 10:54:53 +00:00

amp

[MPS] Add support for bf16 autocast (#139390 )

2024-11-20 19:52:28 +00:00

ao

remove allow-untyped-defs from ao/quantization/experimental/fake_quantize.py (#144091 )

2025-01-03 01:26:36 +00:00

autograd

[BE][CI] bump ruff to 0.8.4 (#143753 )

2024-12-24 12:24:10 +00:00

backends

[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 )

2024-12-20 19:32:03 +00:00

compiler

[export] add is_exporting flag (#142425 )

2024-12-18 21:36:28 +00:00

contrib

[BE][Easy] enable PYFMT for torch/[a-s]*/ (#138447 )

2024-12-23 14:04:00 +00:00

cpu

[Inductor][CPP] Add oneDNN BRGEMM config for Half cpp gemm template (#136255 )

2024-11-05 05:33:29 +00:00

csrc

cpp_wrapper: Use runtime dispatched fallbacks for complex ops (#143223 )

2025-01-03 16:05:38 +00:00

cuda

Refine CUDA Stream priority (#143849 )

2024-12-31 11:15:59 +00:00

distributed

[dtensor] improve doc of the DTensor class (#144099 )

2025-01-03 05:35:44 +00:00

distributions

Remove some unused type ignores (round 1) (#142325 )

2024-12-09 18:23:46 +00:00

export

Enable ruff's unused variable checking everywhere in pytorch (#136965 )

2024-12-22 02:33:11 +00:00

fft

[BE][Easy] enable PYFMT for torch/[a-s]*/ (#138447 )

2024-12-23 14:04:00 +00:00

func

[BE][Easy] enable PYFMT for torch/[a-s]*/ (#138447 )

2024-12-23 14:04:00 +00:00

futures

[BE][Easy] enable PYFMT for torch/[a-s]*/ (#138447 )

2024-12-23 14:04:00 +00:00

subgraph rewriter supports matched pattern with no users (#143842 )

2024-12-27 12:45:39 +00:00

jit

Add warning to torch.jit.load (#143403 )

2024-12-18 00:17:41 +00:00

legacy

…

lib

Add and use thread-safe strerror (#140472 )

2024-11-19 04:24:17 +00:00

linalg

[BE][Easy] enable PYFMT for torch/[a-s]*/ (#138447 )

2024-12-23 14:04:00 +00:00

masked

remove allow-untyped-defs for torch/masked/maskedtensor/creation.py (#143321 )

2024-12-17 16:44:50 +00:00

monitor

[BE][Easy] enable PYFMT for torch/[a-s]*/ (#138447 )

2024-12-23 14:04:00 +00:00

mps

remove allow-untyped-defs from torch/mps/event.py (#144092 )

2025-01-03 01:20:17 +00:00

mtia

Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347 )"

2024-12-21 04:04:16 +00:00

multiprocessing

[BE][CI] bump ruff to 0.8.4 (#143753 )

2024-12-24 12:24:10 +00:00

nested

[BE][Easy] enable PYFMT for torch/[a-s]*/ (#138447 )

2024-12-23 14:04:00 +00:00

Fix batch-specific attention mod for NJT + Flex (#143866 )

2024-12-27 20:51:41 +00:00

onnx

remove allow-untyped-defs from onnx/_internal/_lazy_import.py (#143943 )

2024-12-29 10:29:43 +00:00

optim

Adding support for differentiable lr, weight_decay, and betas in Adam/AdamW (#143726 )

2024-12-30 01:11:57 +00:00

package

[Torch.package] Add support for UntypedStorage tensors (#143930 )

2024-12-30 02:03:52 +00:00

profiler

[pytorch/et] Allow ET to save additional resources for completing a trace like generated kernels and index tensor data (#143775 )

2024-12-26 21:15:39 +00:00

quantization

Remove unused Python variables in torch/[b-z]* (#136963 )

2024-10-19 16:45:22 +00:00

signal

[BE][Easy] enable PYFMT for torch/[a-s]*/ (#138447 )

2024-12-23 14:04:00 +00:00

sparse

[sparse] add extra options to _cslt_spare_mm (#137427 )

2024-11-27 05:32:45 +00:00

special

[BE][Easy] enable PYFMT for torch/[a-s]*/ (#138447 )

2024-12-23 14:04:00 +00:00

testing

[inductor] Add missing py312 xfail (#144006 )

2024-12-31 23:37:05 +00:00

utils

Dataloader distribute tasks to workers when in_order is False (#142324 )

2025-01-03 12:57:04 +00:00

xpu

Add get_stream_from_external API for XPU backend (#141123 )

2024-12-31 11:15:52 +00:00

__config__.py

remove allow-untyped-defs for torch/__config__.py (#143320 )

2024-12-17 00:16:09 +00:00

__future__.py

…

__init__.py

Rename cache limit to recompile limit in configs (#143709 )

2024-12-22 10:03:57 +00:00

_appdirs.py

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

_classes.py

Add None return type to init (#132335 )

2024-08-01 15:26:45 +00:00

_compile.py

[BE] Format uncategorized Python files with ruff format (#132576 )

2024-08-04 17:13:31 +00:00

_custom_ops.py

[BE][Easy][14/19] enforce style for empty lines in import segments in torch/_[a-c]*/ and torch/_[e-h]*/ and torch/_[j-z]*/ (#129765 )

2024-07-31 10:42:50 +00:00

_deploy.py

Remove unused imported names in python files (#134438 )

2024-08-27 20:44:04 +00:00

_environment.py

Improve is_fbcode functionality (#136871 )

2024-09-27 21:19:01 +00:00

_guards.py

[ca] add compiled autograd to CompileId (#141907 )

2024-12-21 00:41:24 +00:00

_jit_internal.py

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

_linalg_utils.py

[BE] Format uncategorized Python files with ruff format (#132576 )

2024-08-04 17:13:31 +00:00

_lobpcg.py

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

_lowrank.py

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

_meta_registrations.py

[ROCm] Add miopen_batch_norm to meta_registrations to fix AOTI issue (#143569 )

2024-12-24 23:43:11 +00:00

_namedtensor_internals.py

[BE][Easy][14/19] enforce style for empty lines in import segments in torch/_[a-c]*/ and torch/_[e-h]*/ and torch/_[j-z]*/ (#129765 )

2024-07-31 10:42:50 +00:00

_ops.py

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

_python_dispatcher.py

Add None return type to init (#132335 )

2024-08-01 15:26:45 +00:00

_size_docs.py

remove allow-untyped-defs from torch/_size_docs.py (#143942 )

2024-12-29 01:00:46 +00:00

_sources.py

…

_storage_docs.py

[BE] Format uncategorized Python files with ruff format (#132576 )

2024-08-04 17:13:31 +00:00

_streambase.py

Use torch.Stream&torch.Event for Dynamo capature (#134850 )

2024-10-02 14:15:33 +00:00

_tensor_docs.py

Revert "Add deterministic path for CUDA cumsum (#136224 )"

2024-09-27 12:54:47 +00:00

_tensor_str.py

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

_tensor.py

__cuda_array_interface__: Use "<V2" for bfloat16. (#143042 )

2024-12-14 06:27:52 +00:00

_thread_safe_fork.py

[inductor] parallel compile: add import of thread_safe_fork for internal (#137155 )

2024-10-03 17:37:21 +00:00

_torch_docs.py

[Easy] Add torch.range, torch.arange params optional description (#143731 )

2024-12-24 01:29:24 +00:00

_utils_internal.py

[reland] Kill capture_pre_autograd_graph API (#143426 )

2024-12-18 12:07:09 +00:00

_utils.py

Reraise worker errors as runtime errors in more cases when the original exception can't be constructed (#140911 )

2024-12-14 03:11:36 +00:00

_VF.py

Clean up RemoteCache classes (#134032 )

2024-08-31 20:18:59 +00:00

_vmap_internals.py

…

_weights_only_unpickler.py

Remove unused Python variables in torch/[_-a]* (#133492 )

2024-12-12 17:39:14 +00:00

abi-check.cpp

…

CMakeLists.txt

export AOTI_TORCH_EXPORT on Windows. (#140030 )

2025-01-03 05:41:06 +00:00

custom_class_detail.h

Enable readability-redundant-declaration (#143982 )

2024-12-31 00:20:10 +00:00

custom_class.h

Remove some pre-cpp17 stuff (#138410 )

2024-10-23 00:38:03 +00:00

extension.h

…

functional.py

Clarify opt-einsum usage, fix #127109 (#137596 )

2024-10-09 20:31:24 +00:00

hub.py

Remove unused Python variables in torch/[b-z]* (#136963 )

2024-10-19 16:45:22 +00:00

library.h

Enable more readability-redundant checks (#143963 )

2024-12-30 14:49:33 +00:00

library.py

make it clearer (in docs) one can double decorate with torch.library.impl_* APIs (#137608 )

2024-12-17 15:13:58 +00:00

overrides.py

[dim_order] raised runtime error when tensor has ambiguous dim order (#141632 )

2024-12-08 23:16:57 +00:00

py.typed

…

quasirandom.py

[BE] Format uncategorized Python files with ruff format (#132576 )

2024-08-04 17:13:31 +00:00

random.py

[Torch] Support meta device in random.fork_rng (#137715 )

2024-10-16 18:00:39 +00:00

README.txt

…

return_types.py

…

script.h

…

serialization.py

Add config.save.use_pinned_memory_for_d2h to serialization config (#143342 )

2024-12-20 21:01:18 +00:00

storage.py

Fix .to(cpu) for Storage (#138011 )

2024-10-23 01:31:48 +00:00

torch_version.py

Add mypy typing to torch_version.py (#131447 )

2024-07-23 17:31:07 +00:00

types.py

[BE] better type annotation for torch.types (#129559 )

2024-09-02 15:35:32 +00:00

version.py.tpl

…

README.txt

Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than
C headers.  These headers serve double duty as *internal implementation
detail* headers, whose contents should largely not be used by external
clients.

Ideally, we would not install these headers at all; instead, you should
use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`)
to manipulate these structs.  However, there are a few places
in torch/csrc where we violate this abstraction.  They are marked with
a pointer to this note.  Each of those sites will have to be refactored
when we refactor the guts of THTensor and related structures.