flash_attention integration (#81434)

# Summary:
- I added a new submodule, cutlass, pointing at the 2.10 release. The flash_attention code is gated behind the USE_FLASH_ATTENTION flag, which defaults to off, so flash attention is not built anywhere by default (a rough sketch of this gating follows the list). This is intentional, since we don't have A100 machines to compile and test on.

- I only updated the CMake build; I have not attempted Bazel or Buck yet.

- I included the mha_fwd kernel from flash_attention, which has been refactored to use cutlass 2.10. There is currently no backward kernel on this branch; adding one would be a good follow-up.
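As a rough illustration of the gating described above, here is a minimal CMake sketch. The USE_FLASH_ATTENTION option name comes from this PR, but the surrounding structure, paths, and commands are illustrative assumptions, not the exact build changes in this commit:

```cmake
# Sketch only: gate the flash_attention sources behind USE_FLASH_ATTENTION.
# The option name is from this PR; everything else here is hypothetical.
option(USE_FLASH_ATTENTION "Build the cutlass-based flash_attention kernels" OFF)

if(USE_FLASH_ATTENTION)
  # Make the cutlass 2.10 headers from the new submodule visible.
  include_directories(${PROJECT_SOURCE_DIR}/third_party/cutlass/include)
  # Expose the flag to C++ so the mha_fwd path can be compiled in
  # behind #ifdef USE_FLASH_ATTENTION.
  add_compile_definitions(USE_FLASH_ATTENTION)
endif()
```

With the option off (the default), none of the flash_attention sources or cutlass headers enter the build, which matches the summary's note that flash is not built anywhere unless explicitly requested.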

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81434
Approved by: https://github.com/cpuhrsch
Author: Driss Guessous
Date: 2022-09-09 20:11:26 +00:00
Committed by: PyTorch MergeBot
Parent: 219ff26172
Commit: 0fc02dbba4

29 changed files with 4395 additions and 5 deletions

.gitmodules | 3 +++

@@ -151,3 +151,6 @@
 [submodule "third_party/VulkanMemoryAllocator"]
 	path = third_party/VulkanMemoryAllocator
 	url = https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git
+[submodule "third_party/cutlass"]
+	path = third_party/cutlass
+	url = https://github.com/NVIDIA/cutlass.git
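After checking out this commit, the new submodule can be fetched with the standard `git submodule update --init --recursive third_party/cutlass`; until then the third_party/cutlass directory is empty.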