Replaces https://github.com/pytorch/pytorch/pull/138947 for re-import. Replaces https://github.com/ROCm/pytorch/pull/1592

This PR contains the initial implementation of SDPA with the composable_kernel (CK) backend. The CK path can be forced by calling torch.backends.cuda.preferred_rocm_fa_library("ck"). Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option will result in aotriton being used as the backend.

In the case of CK, if PyTorch deems flash attention usable, it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics that select which attention scheme to use (i.e., flash attention vs. memory-efficient attention vs. math, etc.). The CK path only gets called when flash attention is both enabled (via USE_FLASH_ATTENTION) and selected at runtime by the existing heuristics.

Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention, courtesy of the hard work of @tridao, who is a co-author.

NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when building PyTorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143695
Approved by: https://github.com/malfet

Co-authored-by: Andy Lugo <Andy.LugoReyes@amd.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
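A minimal usage sketch, assuming a ROCm build of PyTorch compiled with USE_CK_FLASH_ATTENTION=1. The preferred_rocm_fa_library call is the API added by this PR; the tensor shapes and the sdpa_kernel context manager are illustrative and not part of this change:

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

# Select the composable_kernel (CK) flash-attention backend on ROCm.
# Passing "aotriton" or "default" instead keeps the incumbent backend.
torch.backends.cuda.preferred_rocm_fa_library("ck")

# Illustrative inputs: (batch, heads, seq_len, head_dim) in fp16 on the GPU.
q, k, v = (
    torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
    for _ in range(3)
)

# Restrict SDPA to the flash-attention path so the CK kernels are exercised;
# the existing heuristics still decide whether flash attention is usable.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```

Note that the backend preference only redirects the flash-attention implementation; it does not override the scheme-selection heuristics described above.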
Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TH/THC provide some hpp headers, which are proper C++ headers rather than C headers. These headers serve double duty as *internal implementation detail* headers, whose contents should largely not be used by external clients.

Ideally, we would not install these headers at all; instead, you should use public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`) to manipulate these structs. However, there are a few places in torch/csrc where we violate this abstraction. They are marked with a pointer to this note. Each of those sites will have to be refactored when we refactor the guts of THTensor and related structures.