[MPS] Support includes in metal objects (#145087)

Useful for code reuse between Metal shaders built for eager mode and for MPSInductor, but it requires implementing a `_cpp_embed_headers` tool that, as its name suggests, preprocesses the shader source and embeds the included headers into it, so that the result can be used for dynamic compilation.
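A minimal sketch of what such a header-embedding step might do, assuming it only needs to inline whole-line `#include` directives for a known set of headers and leave system includes such as `<metal_stdlib>` to the Metal compiler. The function `embed_headers` and its in-memory `headers` map are hypothetical and are not the interface of `_cpp_embed_headers`, which presumably reads headers from include directories on disk.

```python
import re
from typing import Dict, List, Optional, Set

# Matches a whole-line include directive: #include <a/b.h> or #include "a/b.h"
_INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]\s*$')


def embed_headers(source: str, headers: Dict[str, str],
                  seen: Optional[Set[str]] = None) -> str:
    """Hypothetical helper: recursively inline every #include found in `headers`.

    Includes not present in `headers` (e.g. <metal_stdlib>) are kept as-is so the
    Metal compiler can still resolve them. Each header is inlined at most once,
    which mimics `#pragma once`.
    """
    if seen is None:
        seen = set()
    out: List[str] = []
    for line in source.splitlines():
        if line.strip() == "#pragma once":
            continue  # meaningless once the header text is pasted inline
        m = _INCLUDE_RE.match(line)
        if m is None or m.group(1) not in headers:
            out.append(line)  # ordinary code or an include we cannot resolve
            continue
        name = m.group(1)
        if name in seen:
            continue  # already embedded earlier: drop the duplicate include
        seen.add(name)
        out.append(f"// begin embedded {name}")
        out.append(embed_headers(headers[name], headers, seen))  # nested includes
        out.append(f"// end embedded {name}")
    return "\n".join(out)


# Usage with a stand-in header body (illustrative only).
headers = {"c10/util/metal_special_math.h": "#pragma once\nnamespace c10 { /* i0, i1 */ }"}
shader = "#include <metal_stdlib>\n#include <c10/util/metal_special_math.h>\nkernel void k() {}"
print(embed_headers(shader, headers))
```

The output is a single self-contained source string, which is the form a runtime compiler API needs, since it has no access to the repository's include paths.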
Tested using:
 - `TestMetalLibrary.test_metal_include`
 - Moving the `i0`/`i1` implementations to `c10/util/metal_special_math.h` and calling them from the `SpecialOps.metal` shader, which now looks much more compact:
```metal
template <typename T, typename Tout = T>
void kernel
i0(constant T* input,
   device Tout* output,
   uint index [[thread_position_in_grid]]) {
  output[index] = c10::i0(static_cast<Tout>(input[index]));
}
```
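The kernel above only applies `c10::i0` elementwise; the math itself now lives in the shared header. As a host-side reference for what `i0`/`i1` compute (for example, when checking a shader's output), here is a hedged Python sketch of the modified Bessel functions of the first kind using their defining power series, I0(x) = Σ (x/2)^(2k) / (k!)² and I1(x) = Σ (x/2)^(2k+1) / (k!(k+1)!). The names `bessel_i0`/`bessel_i1` are hypothetical, and this is not the code in the header: production implementations such as Cephes use Chebyshev approximations, while this series is only meant for small to moderate |x|.

```python
def bessel_i0(x: float, terms: int = 40) -> float:
    """Hypothetical reference: I0(x) = sum_k (x/2)^(2k) / (k!)^2."""
    half = x / 2.0
    term = 1.0  # k = 0 term
    total = term
    for k in range(1, terms):
        term *= (half * half) / (k * k)  # ratio of consecutive series terms
        total += term
    return total


def bessel_i1(x: float, terms: int = 40) -> float:
    """Hypothetical reference: I1(x) = sum_k (x/2)^(2k+1) / (k! (k+1)!)."""
    half = x / 2.0
    term = half  # k = 0 term
    total = term
    for k in range(1, terms):
        term *= (half * half) / (k * (k + 1))  # ratio of consecutive series terms
        total += term
    return total


print(bessel_i0(1.0))  # approximately 1.26607
print(bessel_i1(1.0))  # approximately 0.56516
```

Updating each term from the previous one avoids computing large factorials separately, which keeps the loop cheap and free of integer-to-float overflow for the term counts used here.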
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145087
Approved by: https://github.com/dcci
ghstack dependencies: #145023
Nikita Shulga
2025-01-17 16:43:01 -08:00
committed by PyTorch MergeBot
parent 2859b11bdb
commit dc9b77cc55
8 changed files with 219 additions and 135 deletions


```diff
@@ -1248,6 +1248,7 @@ def main():
 "include/c10/cuda/impl/*.h",
 "include/c10/hip/*.h",
 "include/c10/hip/impl/*.h",
+"include/c10/metal/*.h",
 "include/c10/xpu/*.h",
 "include/c10/xpu/impl/*.h",
 "include/torch/*.h",
```