pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

Pian Pawakapan 2a9745de3c [multi-kernel] shape-similarity kernel selection (#163090 )

Introduces a variant of size-hint multi-kernel: for novel runtime shapes, instead of performing full benchmarking to determine the optimal kernel, it selects one of the kernels pre-generated from the multi-kernel hints, based on the similarity between hint and runtime input and output shapes (L1 distance in log2 space).

Some caveats/changes:
- Size-hint multi-kernel now only kicks in if the kernel has dynamic shapes
- Pre-generation still only does a 1-d search over the specified hints, e.g. `matmul([s0, s1], [s1, s2])` with size-hints `[64, 256]` generates only 2 kernels, based on tuning shapes ([64, 64], [64, 64]) and ([256, 256], [256, 256]). Extending this to a reasonable n-d search (via a user API?) is left as a possible extension
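
A minimal sketch of the selection rule described above, assuming all dimensions are positive integers. The names `log2_l1_distance` and `select_kernel`, and the dict keyed by hint shapes, are hypothetical illustrations and not the actual Inductor implementation; only the metric (L1 distance in log2 space over input and output shapes) comes from the description.

```python
import math


def log2_l1_distance(hint_shapes, runtime_shapes):
    """Sum of |log2(h) - log2(r)| over every dimension of every shape.

    Both arguments are sequences of shapes (inputs followed by outputs),
    and every dimension is assumed to be a positive integer.
    """
    total = 0.0
    for hint, runtime in zip(hint_shapes, runtime_shapes):
        for h, r in zip(hint, runtime):
            total += abs(math.log2(h) - math.log2(r))
    return total


def select_kernel(kernels_by_hint, runtime_shapes):
    """Pick the pre-generated kernel whose hint shapes are closest."""
    best_hint = min(
        kernels_by_hint,
        key=lambda hint: log2_l1_distance(hint, runtime_shapes),
    )
    return kernels_by_hint[best_hint]


# Hypothetical table for matmul([s0, s1], [s1, s2]) with size-hints [64, 256]:
# each key lists the two input shapes followed by the output shape.
kernels = {
    ((64, 64), (64, 64), (64, 64)): "kernel_tuned_for_64",
    ((256, 256), (256, 256), (256, 256)): "kernel_tuned_for_256",
}

small = ((100, 100), (100, 100), (100, 100))
large = ((200, 200), (200, 200), (200, 200))
print(select_kernel(kernels, small))
print(select_kernel(kernels, large))
```

Working in log2 space makes the comparison relative rather than absolute: a runtime size of 100 is closer to 64 than to 256 under this metric, while 200 is closer to 256.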

Benchmarking results, compared against multi-kernel with full benchmarking (hints 64, 4096) and against compiling with the ground-truth hint:
<img width="1902" height="1222" alt="550541081_1088709150049684_6528797079439730237_n" src="https://github.com/user-attachments/assets/056cca48-c16a-4451-9b4a-fa13a7a058a9" />

Full benchmarking doing worse is extremely weird, but we did see similar spikes in #156628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163090
Approved by: https://github.com/bobrenjc93

2025-09-23 21:00:47 +00:00

_awaits

…

[torch][cuda][device_limits] Library for querying device hardware limits for flops and bandwidth (#162942 )

2025-09-23 04:48:19 +00:00

_C_flatbuffer

…

_custom_op

[BE]: ruff PLC0207 - use maxsplit kwarg (#160107 )

2025-08-08 03:14:59 +00:00

_decomp

support unbacked softmax / logsoftmax (#162216 )

2025-09-18 15:43:20 +00:00

_dispatch

Improve torch.ops typing (#154555 )

2025-06-22 15:52:27 +00:00

_dynamo

Revert "[precompile] Add option to disable guard check on aot-compiled function. (#163432 )"

2025-09-23 16:31:30 +00:00

_export

Improve fake tensor leakage detection in export by not relying on gc too much (#163516 )

2025-09-22 22:04:24 +00:00

_functorch

Enable logging for absolute memory estimation (#158799 )

2025-09-22 18:36:49 +00:00

_higher_order_ops

[export] Fix wrap_with_set_grad_enabled retracing (#163295 )

2025-09-21 22:54:40 +00:00

_inductor

[multi-kernel] shape-similarity kernel selection (#163090 )

2025-09-23 21:00:47 +00:00

_lazy

[BE][2/16] fix typos in torch/ (torch/_*/) (#156312 )

2025-07-12 05:47:06 +00:00

_library

[ignore][codex-test] Add typing to simple library registry (#161367 )

2025-09-23 02:08:55 +00:00

_logging

Add compile_id: Optional[CompileID] to torch._logging._internal.trace_structured_artifact (#160440 )

2025-08-13 06:28:23 +00:00

_numpy

Fix torch._numpy to match NumPy when empty ellipsis causes advanced indexing separation (#158297 )

2025-07-16 08:11:53 +00:00

_prims

[Bugfix] Match eager stride semantics for cloned tensors with preserve_format in compile (#163017 )

2025-09-19 19:41:33 +00:00

_prims_common

are_strides_like_channels_last_or_false (#162354 )

2025-09-16 00:49:05 +00:00

_refs

Better decomp for torch.eye (#163386 )

2025-09-22 21:52:37 +00:00

_strobelight

[BE][2/16] fix typos in torch/ (torch/_*/) (#156312 )

2025-07-12 05:47:06 +00:00

_subclasses

Improve fake tensor leakage detection in export by not relying on gc too much (#163516 )

2025-09-22 22:04:24 +00:00

_vendor

…

accelerator

Add unified memory APIs for torch.accelerator (#152932 )

2025-08-08 17:41:22 +00:00

amp

[Easy][AMP] Refactor the AMP logic for getting dtype (#162796 )

2025-09-21 06:32:35 +00:00

remove allow-untyped-defs from ./torch/ao/quantization/pt2e/duplicate_dq_pass.py (#163470 )

2025-09-22 20:29:09 +00:00

autograd

[ONNX] Refactor torchscript based exporter (#161323 )

2025-09-02 16:10:30 +00:00

backends

Revert "[ROCm] SDPA fix mem fault when dropout is enabled (#154864 )"

2025-08-26 20:03:59 +00:00

compiler

Simplify PrecompileContext to no longer be a CacheArtifactManager (#162886 )

2025-09-20 01:24:37 +00:00

contrib

…

cpu

Replace _device_t with torch.types.Device in torch/cpu/__init__.py (#161031 )

2025-08-21 00:22:43 +00:00

csrc

[AOTI] Fix model_package_loader get_cpp_compile_command (#163561 )

2025-09-23 17:38:18 +00:00

cuda

CUDA 13.0 Warning update for supported architectures (#163585 )

2025-09-23 11:27:11 +00:00

distributed

[DCP] DTensor slice dequantization with proper block alignment (#163532 )

2025-09-23 16:48:16 +00:00

distributions

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

export

[export] handling NamedTuple inputs (#162959 )

2025-09-23 17:43:50 +00:00

fft

[BE][PYFMT] migrate PYFMT for torch/[e-n]*/ to ruff format (#144553 )

2025-06-17 08:18:47 +00:00

func

…

futures

Simplify the base classes of _PyFutureMeta (#157757 )

2025-07-08 15:39:56 +00:00

Improve fake tensor leakage detection in export by not relying on gc too much (#163516 )

2025-09-22 22:04:24 +00:00

headeronly

Add CUDA_KERNEL_ASSERT_PRINTF, a more flexible CUDA_KERNEL_ASSERT_MSG (#160129 )

2025-09-16 00:23:48 +00:00

jit

Deprecate Lite Interpreter (#163289 )

2025-09-18 23:56:21 +00:00

legacy

…

lib

[2/N] Fix cppcoreguidelines-init-variables suppression (#146237 )

2025-06-19 23:26:42 +00:00

linalg

Revert "Add __init__.pyi to torch/linalg (#160750 )"

2025-09-02 16:53:55 +00:00

masked

Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous. [attempt2] (#160869 )

2025-09-08 22:59:13 +00:00

monitor

…

mps

[BE][12/16] fix typos in torch/ (#156602 )

2025-07-02 22:55:29 +00:00

mtia

[BE] Add Documentation for Device APIs (#162834 )

2025-09-16 17:01:06 +00:00

multiprocessing

Allow parallel start NUMA binding (#161576 )

2025-08-28 01:15:58 +00:00

nativert

Update placement utils and weights to handle meta device (#162842 )

2025-09-17 08:12:32 +00:00

nested

Add NestedTensor dispatch for _is_any_true/_is_all_true (#162096 )

2025-09-22 20:22:44 +00:00

[Flex attention] Fix flex attention head broadcast (#163426 )

2025-09-23 13:01:51 +00:00

numa

Allow parallel start NUMA binding (#161576 )

2025-08-28 01:15:58 +00:00

onnx

remove allow-untyped-defs from ./torch/onnx/_internal/torchscript_exporter/_globals.py (#163472 )

2025-09-23 03:50:29 +00:00

optim

[optim] override SWALR.state_dict and load_state_dict (#163122 )

2025-09-17 18:17:26 +00:00

package

[BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552 )

2025-08-07 00:09:56 +00:00

profiler

removed duplicate imports (#161685 )

2025-08-31 16:21:49 +00:00

quantization

[BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552 )

2025-08-07 00:09:56 +00:00

signal

[BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552 )

2025-08-07 00:09:56 +00:00

sparse

Use computed buffer sizes of torch for cusparseLt metadata (#163125 )

2025-09-19 22:12:40 +00:00

special

[BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552 )

2025-08-07 00:09:56 +00:00

testing

Use accelerator API in common_dtensor (#163498 )

2025-09-23 16:30:20 +00:00

utils

[BC breaking] Remove deprecated imports for torch.utils.data.datapipes.iter.grouping (#163438 )

2025-09-23 05:02:06 +00:00

xpu

Add a new API torch.xpu.can_device_access_peer for Intel GPU (#162705 )

2025-09-16 18:00:22 +00:00

__config__.py

…

__future__.py

…

__init__.py

Turn on capture_dynamic_output_shape_ops when fullgraph=True (#163123 )

2025-09-18 21:24:15 +00:00

_appdirs.py

Fix broken URLs (#152237 )

2025-04-27 09:56:42 +00:00

_classes.py

remove allow-untyped-defs from torch/_classes.py (#157231 )

2025-07-08 00:11:52 +00:00

_compile.py

Replace Literal[None] with None in typing (#163489 )

2025-09-22 22:10:08 +00:00

_custom_ops.py

Render Example: and not Example:: in docs (#153978 )

2025-05-21 01:03:26 +00:00

_environment.py

…

_guards.py

fix incorrect interaction between DDPOptimizer and donated buffers (#160745 )

2025-09-04 21:57:27 +00:00

_jit_internal.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

_linalg_utils.py

Update is_sparse doc to mention that it is sparse_coo specific (#157378 )

2025-07-09 18:22:14 +00:00

_lobpcg.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

_lowrank.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

_meta_registrations.py

[inductor] Support out_dtype arg to matmul (#163393 )

2025-09-23 15:37:38 +00:00

_namedtensor_internals.py

…

_ops.py

[BE] Slight improvements to documentation in python_dispatch (#162963 )

2025-09-21 01:45:46 +00:00

_python_dispatcher.py

Typo fixes for "overridden" in comments and function names (#155944 )

2025-06-14 03:37:38 +00:00

_size_docs.py

Render Example: and not Example:: in docs (#153978 )

2025-05-21 01:03:26 +00:00

_sources.py

…

_storage_docs.py

Fix docstring for torch.UntypedStorage.from_file (#155067 )

2025-06-05 14:30:49 +00:00

_streambase.py

…

_tensor_docs.py

Add missing optional for tensor ops (#159028 )

2025-07-25 04:36:55 +00:00

_tensor_str.py

Fix max_width computation in _tensor_str._Formatter (#126859 )

2025-08-01 15:05:41 +00:00

_tensor.py

torchdim Python port (#160236 )

2025-09-21 03:01:04 +00:00

_thread_safe_fork.py

…

_torch_docs.py

Update docs for quantile to be clearer for nearest (#162423 )

2025-09-09 18:04:12 +00:00

_utils_internal.py

Add DISABLE_JUSTKNOBS to torch/_utils_internal.py and use it for dynamo _maybe_set_eval_frame (#162298 )

2025-09-15 23:00:39 +00:00

_utils.py

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

_VF.py

…

_vmap_internals.py

Fix broken URLs (#152237 )

2025-04-27 09:56:42 +00:00

_weights_only_unpickler.py

Fix type checking for persistent loads in the weights-only unpickler (#161661 )

2025-09-01 19:57:19 +00:00

CMakeLists.txt

[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 )

2025-09-22 21:12:18 +00:00

custom_class_detail.h

…

custom_class.h

[BE][1/16] fix typos in torch/ (#156311 )

2025-07-09 11:02:22 +00:00

extension.h

…

functional.py

unify broadcast_shapes functions and avoid duplicates (#160251 )

2025-08-16 00:54:32 +00:00

header_only_apis.txt

[Reland] Migrate ScalarType to headeronly (#159911 )

2025-08-06 07:36:37 +00:00

hub.py

Allow torch.hub.load with unauthorized GITHUB_TOKEN (#159896 )

2025-08-14 18:15:49 +00:00

library.h

Using std::make_unique<T>() instead of unique<T>(new T()) (#160723 )

2025-08-19 10:25:47 +00:00

library.py

Replace Literal[None] with None in typing (#163489 )

2025-09-22 22:10:08 +00:00

overrides.py

Fully native DTensor.__new__ (#162508 )

2025-09-21 18:36:05 +00:00

py.typed

…

quasirandom.py

…

random.py

Update description for torch.random.fork_rng (#151881 )

2025-04-23 16:59:29 +00:00

return_types.py

…

script.h

…

serialization.py

added class or module info for functions blocked by weight-only load (#159935 )

2025-08-12 20:52:25 +00:00

storage.py

mypy 1.16.0 (#155821 )

2025-06-14 18:18:43 +00:00

torch_version.py

…

types.py

…

version.py.tpl

…