[MPS] Support includes in metal objects (#145087)

Useful for code reuse in Metal shader builds, both for eager mode and MPSInductor. It requires implementing a `_cpp_embed_headers` tool that, as the name suggests, preprocesses the shader source and embeds the included headers so that the shader can be used in dynamic compilation.

Tested by:
- `TestMetalLibrary.test_metal_include`
- Moving the `i0`/`i1` implementations to `c10/util/metal_special_math.h` and calling them from the `SpecialOps.metal` shader, which now looks much more compact:

```metal
template <typename T, typename Tout = T>
void kernel
i0(constant T* input,
   device Tout* output,
   uint index [[thread_position_in_grid]]) {
  output[index] = c10::i0(static_cast<Tout>(input[index]));
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145087
Approved by: https://github.com/dcci
ghstack dependencies: #145023
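
For readers unfamiliar with the idea, here is a minimal sketch of what a header-embedding step could look like. It is not the actual `_cpp_embed_headers` implementation: the function name `embed_headers`, its parameters, and its handling of `#pragma once` are assumptions made for illustration. It inlines each locally resolvable `#include` once and leaves unresolved ones (such as `<metal_stdlib>`) untouched.

```python
import re
from pathlib import Path

# Matches `#include <path>` and `#include "path"` lines.
_INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]\s*$')


def embed_headers(source, include_dirs, seen=None):
    """Hypothetical helper: return `source` with local includes inlined.

    Headers found under `include_dirs` are pasted in place, at most once
    each. Includes that cannot be resolved are kept as-is so the shader
    compiler can still handle them.
    """
    seen = set() if seen is None else seen
    out = []
    for line in source.splitlines():
        # Assumption: `#pragma once` is dropped, since the `seen` set
        # already guarantees single inclusion.
        if line.strip() == "#pragma once":
            continue
        match = _INCLUDE_RE.match(line)
        if match is None:
            out.append(line)
            continue
        candidates = (Path(d) / match.group(1) for d in include_dirs)
        header = next((p for p in candidates if p.is_file()), None)
        if header is None:
            out.append(line)  # system header: leave the directive alone
            continue
        key = header.resolve()
        if key in seen:
            continue  # already embedded earlier in the output
        seen.add(key)
        # Recurse so that nested includes are embedded as well.
        out.append(embed_headers(header.read_text(), include_dirs, seen))
    return "\n".join(out)
```

The resulting single string can then be handed to a runtime shader compiler, which has no include search path of its own.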
2025-10-20 21:14:14 +08:00 · 2025-01-17 16:43:01 -08:00
parent 2859b11bdb
commit dc9b77cc55
8 changed files with 219 additions and 135 deletions
--- a/setup.py
+++ b/setup.py
@@ -1248,6 +1248,7 @@ def main():
        "include/c10/cuda/impl/*.h",
        "include/c10/hip/*.h",
        "include/c10/hip/impl/*.h",
+        "include/c10/metal/*.h",
        "include/c10/xpu/*.h",
        "include/c10/xpu/impl/*.h",
        "include/torch/*.h",