Set version to 0.4.3 (#71)

Support DISABLE_KERNEL_MAPPING env var for completely disabling kernel mappings (#70)

* Disable kernel mappings with `DISABLE_KERNEL_MAPPING=1`
* Rename HF_KERNELS_CACHE to KERNELS_CACHE
  But still recognize the old variant for compatibility.
* Add documentation for environment variables
2025-10-21 13:33:48 +08:00 · 2025-04-10 11:57:15 +02:00 · 2025-04-10 11:37:54 +02:00 · 2025-04-04 20:35:29 +02:00 · 2025-04-04 19:38:15 +02:00 · 2025-03-31 14:29:30 +02:00
21 changed files with 1220 additions and 67 deletions
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -0,0 +1,10 @@
+name: Lints
+on: [push, pull_request]
+jobs:
+  lint:
+    name: Run lints
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run ruff
+        uses: astral-sh/ruff-action@v3
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -52,3 +52,8 @@ jobs:

      - name: Run tests
        run: uv run pytest tests
+
+      - name: Import check without torch
+        run: |
+          uv pip uninstall torch
+          python -c "import kernels"
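The import check added to the workflow above guards against `kernels` requiring `torch` at import time. As a minimal sketch of one way a module can keep such a dependency optional: `torch_available` and `describe` are hypothetical names used for illustration and are not part of this diff; only the `TYPE_CHECKING` guard mirrors what `src/kernels/layer.py` does.

```python
import importlib.util
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only seen by type checkers, so importing this module never requires torch.
    from torch import nn


def torch_available() -> bool:
    """Hypothetical helper: report whether torch is installed, without importing it."""
    return importlib.util.find_spec("torch") is not None


def describe(layer: "nn.Module") -> str:
    # The annotation is a string, so it is never evaluated at runtime.
    return type(layer).__name__
```

Because the annotation is quoted and the import sits behind `TYPE_CHECKING`, the module loads the same way whether or not `torch` is installed.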
--- a/201
+++ b/201
@@ -0,0 +1,201 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/README.md
+++ b/README.md
@@ -45,7 +45,9 @@ the Hub.

 ## 📚 Documentation

+- [Using layers](docs/layers.md)
 - [Locking kernel versions](docs/locking.md)
+- [Environment variables](docs/env.md)
 - [Using kernels in a Docker container](docs/docker.md)
 - [Kernel requirements](docs/kernel-requirements.md)
 - [Writing kernels](https://github.com/huggingface/kernel-builder/blob/main/docs/writing-kernels.md) using [kernel-builder](https://github.com/huggingface/kernel-builder/)
--- a/docs/env.md
+++ b/docs/env.md
@@ -0,0 +1,10 @@
+# Environment variables
+
+## `KERNELS_CACHE`
+
+The directory to use as the local kernel cache. If not set, the cache
+of the `huggingface_hub` package is used.
+
+## `DISABLE_KERNEL_MAPPING`
+
+Disables kernel mappings for [`layers`](layers.md).
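A minimal sketch of how the two environment variables documented above might be read. The `DISABLE_KERNEL_MAPPING` parsing mirrors the expression used in `src/kernels/layer.py` in this diff. The cache lookup is an assumption: the commit message only says the old `HF_KERNELS_CACHE` name is still recognized, so `resolve_cache_dir` and its precedence order are hypothetical.

```python
import os
from typing import Mapping, Optional


def kernel_mapping_disabled(env: Mapping[str, str] = os.environ) -> bool:
    # Same parsing as in src/kernels/layer.py: the value must be an integer
    # string, so "1" disables mappings while "true" raises ValueError.
    return bool(int(env.get("DISABLE_KERNEL_MAPPING", "0")))


def resolve_cache_dir(env: Mapping[str, str] = os.environ) -> Optional[str]:
    # Hypothetical helper: prefer the new name and fall back to the old one.
    # None means "use the huggingface_hub cache".
    return env.get("KERNELS_CACHE", env.get("HF_KERNELS_CACHE"))
```

Passing the environment as a parameter keeps the helpers easy to test without mutating `os.environ`.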
--- a/docs/kernel-requirements.md
+++ b/docs/kernel-requirements.md
@@ -26,13 +26,24 @@ recommended build variants are:
 - `torch26-cxx98-cu124-x86_64-linux`
 - `torch26-cxx98-cu126-x86_64-linux`

-This list will be updated as new PyTorch versions are released. Each
-variant directory should contain a single directory with the same name
+This list will be updated as new PyTorch versions are released. Kernels
+that are in pure Python (e.g. Triton kernels) only need to provide a
+single build variant:
+
+- `torch-universal`
+
+Each variant directory should contain a single directory with the same name
 as the repository (replacing `-` by `_`). For instance, kernels in the
 `kernels-community/activation` repository have directories like
 `build/<variant>/activation`. This directory
 must be a Python package with an `__init__.py` file.

+## Versioning
+
+Kernels are versioned on the Hub using Git tags. Version tags must be of
+the form `v<major>.<minor>.<patch>`. Versions are used by [locking](./locking.md)
+to resolve the version constraints.
+
 ## Native Python module

 Kernels will typically contain a native Python module with precompiled
@@ -41,16 +52,31 @@ requirements:

 - Use [ABI3/Limited API](https://docs.python.org/3/c-api/stable.html#stable-application-binary-interface)
  for compatibility with Python 3.9 and later.
- Compatible with glibc 2.27 or later. This means that no symbols
-  from later versions must be used. To archive this, the module should
-  be built against this glibc version. **Warning:** libgcc must also be
-  built against glibc 2.27 to avoid leaking symbols.
- No dynamic linkage against libstdc++/libc++. Linkage for C++ symbols
-  must be static.
- No dynamic library dependencies outside Torch or CUDA libraries
-  installed as dependencies of Torch.
+- Compatible with [`manylinux_2_28`](https://github.com/pypa/manylinux?tab=readme-ov-file#manylinux_2_28-almalinux-8-based).
+  This means that the extension **must not** use symbol versions higher than:

-(These requirements will be updated as new PyTorch versions are released.)
+  - GLIBC 2.28
+  - GLIBCXX 3.4.24
+  - CXXABI 1.3.11
+  - GCC 7.0.0
+
+  These requirements can be checked with the ABI checker (see below).
+
+- No dynamic library dependencies outside:
+
+  - Torch;
+  - CUDA/ROCm libraries installed as dependencies of Torch.
+
+The manylinux_2_28 and Python ABI 3.9 version requirements can be checked with
+[`kernel-abi-check`](https://crates.io/crates/kernel-abi-check):
+
+```bash
+
+$ cargo install kernel-abi-check
+$ kernel-abi-check result/relu/_relu_e87e0ca_dirty.abi3.so
+🐍 Checking for compatibility with manylinux_2_28 and Python ABI version 3.9
+✅ No compatibility issues found
+```

 ## Torch extension

@@ -71,6 +97,80 @@ might use two different commits that happen to have the same version
 number. Git tags are not stable, so they do not provide a good way
 of guaranteeing uniqueness of the namespace.

+## Layers
+
+A kernel can provide layers in addition to kernel functions. A layer from
+the Hub can replace the `forward` method of an existing layer for a certain
+device type. This makes it possible to provide more performant kernels for
+existing layers. See the [layers documentation](layers.md) for more information
+on how to use layers.
+
+### Writing layers
+
+To make the extension of layers safe, the layers must fulfill the following
+requirements:
+
+- The layers are subclasses of `torch.nn.Module`.
+- The layers are pure, meaning that they do not have their own state. This
+  means that:
+  - The layer must not define its own constructor.
+  - The layer must not use class variables.
+- No methods other than `forward` may be defined.
+- The `forward` method has a signature that is compatible with the
+  `forward` method that it is extending.
+
+This is an example of a pure layer:
+
+```python
+class SiluAndMul(nn.Module):
+    def forward(self, x: torch.Tensor):
+        d = x.shape[-1] // 2
+        output_shape = x.shape[:-1] + (d,)
+        out = torch.empty(output_shape, dtype=x.dtype, device=x.device)
+        ops.silu_and_mul(out, x)
+        return out
+```
+
+For some layers, the `forward` method has to use state from the adopting class.
+In these cases, we recommend using type annotations to indicate what member
+variables are expected. For instance:
+
+```python
+class LlamaRMSNorm(nn.Module):
+    weight: torch.Tensor
+    variance_epsilon: float
+
+    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
+        return rms_norm_fn(
+            hidden_states,
+            self.weight,
+            bias=None,
+            residual=None,
+            eps=self.variance_epsilon,
+            dropout_p=0.0,
+            prenorm=False,
+            residual_in_fp32=False,
+        )
+```
+
+This layer expects the adopting layer to have `weight` and `variance_epsilon`
+member variables and uses them in the `forward` method.
+
+### Exporting layers
+
+To accommodate portable loading, `layers` must be defined in the main
+`__init__.py` file. For example:
+
+```python
+from . import layers
+
+__all__ = [
+  # ...
+  "layers"
+  # ...
+]
+```
+
 ## Python requirements

 - Python code must be compatible with Python 3.9 and later.
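The Versioning section in the diff above requires tags of the form `v<major>.<minor>.<patch>`, and the build variant layout replaces `-` with `_` in the repository name. A minimal sketch of both rules, assuming hypothetical helper names (`parse_version_tag`, `package_dir_name`) that are not part of the `kernels` API:

```python
import re
from typing import Optional, Tuple

# Matches exactly v<major>.<minor>.<patch> with numeric components.
_TAG_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_version_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Hypothetical helper: return (major, minor, patch), or None if the tag is not a version tag."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def package_dir_name(repo_id: str) -> str:
    """Hypothetical helper: directory name expected under build/<variant>/ for a repository."""
    # Drop the organization prefix, then replace `-` by `_`.
    return repo_id.split("/")[-1].replace("-", "_")
```

Returning tuples means tags can be compared directly, for example to pick the highest version that satisfies a constraint.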
--- a/docs/layers.md
+++ b/docs/layers.md
@@ -0,0 +1,79 @@
+# Layers
+
+A kernel can provide layers in addition to kernel functions. A layer from
+the Hub can replace the `forward` method of an existing layer for a certain
+device type. This makes it possible to provide more performant kernels for
+existing layers.
+
+See [Kernel requirements](kernel-requirements.md) for more information on the
+requirements of Hub layers.
+
+## Making a layer extensible with kernels from the hub
+
+### Using a decorator
+
+A layer can be made extensible with the `use_kernel_forward_from_hub`
+decorator. For example:
+
+```python
+@use_kernel_forward_from_hub("SiluAndMul")
+class SiluAndMul(nn.Module):
+    def forward(self, input: torch.Tensor) -> torch.Tensor:
+        d = input.shape[-1] // 2
+        return F.silu(input[..., :d]) * input[..., d:]
+```
+
+The decorator changes the layer, so that other implementations of the `forward`
+method can be registered using the name `SiluAndMul`.
+
+### External layers
+
+An existing layer that does not (yet) have the `use_kernel_forward_from_hub`
+decorator can be made extensible by monkeypatching it using the `replace_kernel_forward_from_hub` function.
+
+```python
+from somelibrary import SiluAndMul
+
+replace_kernel_forward_from_hub(SiluAndMul, "SiluAndMul")
+register_kernel_mapping(kernel_layer_mapping)
+```
+
+The `register_kernel_mapping` call maps the name `SiluAndMul` to actual
+hub kernels. See the [Registering a hub kernel for a layer](#registering-a-hub-kernel-for-a-layer)
+section for more information.
+
+**Warning:** we strongly recommend using layers with a decorator, since
+it signifies that the maintainer intends to keep the `forward` signature
+compatible with layers from the hub.
+
+## Registering a hub kernel for a layer
+
+Once a layer is made extensible, users can register hub kernels for it
+by name using the `register_kernel_mapping` function. For example:
+
+```python
+kernel_layer_mapping = {
+    "SiluAndMul": {
+        "cuda": LayerRepository(
+            repo_id="kernels-community/activation",
+            layer_name="SiluAndMul",
+            revision="layers",
+        )
+    }
+}
+
+register_kernel_mapping(kernel_layer_mapping)
+```
+
+This will register the kernel mapping in the current context, which is
+normally global. It is recommended to scope the mapping to where it is
+used with the `use_kernel_mapping` context manager:
+
+```python
+with use_kernel_mapping(kernel_layer_mapping):
+    # Use the layer for which the mapping is applied.
+    ...
+```
+
+This ensures that the mapping is not active anymore outside the
+`with`-scope.
--- a/flake.lock
+++ b/flake.lock
@@ -0,0 +1,134 @@
+{
+  "nodes": {
+    "flake-compat": {
+      "locked": {
+        "lastModified": 1733328505,
+        "narHash": "sha256-NeCCThCEP3eCl2l/+27kNNK7QrwZB1IJCrXfrbv5oqU=",
+        "owner": "edolstra",
+        "repo": "flake-compat",
+        "rev": "ff81ac966bb2cae68946d5ed5fc4994f96d0ffec",
+        "type": "github"
+      },
+      "original": {
+        "owner": "edolstra",
+        "repo": "flake-compat",
+        "type": "github"
+      }
+    },
+    "flake-utils": {
+      "inputs": {
+        "systems": "systems"
+      },
+      "locked": {
+        "lastModified": 1731533236,
+        "narHash": "sha256-l0KFg5HjrsfsO/JpG+r7fRrqm12kzFHyUHqHCVpMMbI=",
+        "owner": "numtide",
+        "repo": "flake-utils",
+        "rev": "11707dc2f618dd54ca8739b309ec4fc024de578b",
+        "type": "github"
+      },
+      "original": {
+        "owner": "numtide",
+        "repo": "flake-utils",
+        "type": "github"
+      }
+    },
+    "flake-utils_2": {
+      "inputs": {
+        "systems": "systems_2"
+      },
+      "locked": {
+        "lastModified": 1731533236,
+        "narHash": "sha256-l0KFg5HjrsfsO/JpG+r7fRrqm12kzFHyUHqHCVpMMbI=",
+        "owner": "numtide",
+        "repo": "flake-utils",
+        "rev": "11707dc2f618dd54ca8739b309ec4fc024de578b",
+        "type": "github"
+      },
+      "original": {
+        "owner": "numtide",
+        "repo": "flake-utils",
+        "type": "github"
+      }
+    },
+    "nixpkgs": {
+      "locked": {
+        "lastModified": 1737453259,
+        "narHash": "sha256-5LaFI9SQwCZmJDasMoYMdzNouWXNk3BvjKcO19tq1Rs=",
+        "owner": "danieldk",
+        "repo": "nixpkgs",
+        "rev": "e0372dbcfd19ddd783b7c3b3868f19322f83318e",
+        "type": "github"
+      },
+      "original": {
+        "owner": "danieldk",
+        "ref": "outlines-v0.1.4-tgi",
+        "repo": "nixpkgs",
+        "type": "github"
+      }
+    },
+    "root": {
+      "inputs": {
+        "flake-utils": "flake-utils",
+        "nixpkgs": [
+          "tgi-nix",
+          "nixpkgs"
+        ],
+        "tgi-nix": "tgi-nix"
+      }
+    },
+    "systems": {
+      "locked": {
+        "lastModified": 1681028828,
+        "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
+        "owner": "nix-systems",
+        "repo": "default",
+        "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
+        "type": "github"
+      },
+      "original": {
+        "owner": "nix-systems",
+        "repo": "default",
+        "type": "github"
+      }
+    },
+    "systems_2": {
+      "locked": {
+        "lastModified": 1681028828,
+        "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
+        "owner": "nix-systems",
+        "repo": "default",
+        "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
+        "type": "github"
+      },
+      "original": {
+        "owner": "nix-systems",
+        "repo": "default",
+        "type": "github"
+      }
+    },
+    "tgi-nix": {
+      "inputs": {
+        "flake-compat": "flake-compat",
+        "flake-utils": "flake-utils_2",
+        "nixpkgs": "nixpkgs"
+      },
+      "locked": {
+        "lastModified": 1741617161,
+        "narHash": "sha256-cwKYAsIVSLtoLbG48+oi3NkSrvuZRLYs8lkJmpDsTw0=",
+        "owner": "huggingface",
+        "repo": "text-generation-inference-nix",
+        "rev": "5946021ec6cb6aae18158a9dc27f893cfbab2925",
+        "type": "github"
+      },
+      "original": {
+        "owner": "huggingface",
+        "ref": "kernels-0.2.0",
+        "repo": "text-generation-inference-nix",
+        "type": "github"
+      }
+    }
+  },
+  "root": "root",
+  "version": 7
+}
--- a/flake.nix
+++ b/flake.nix
@@ -0,0 +1,54 @@
+{
+  inputs = {
+    tgi-nix.url = "github:huggingface/text-generation-inference-nix/kernels-0.2.0";
+    nixpkgs.follows = "tgi-nix/nixpkgs";
+    flake-utils.url = "github:numtide/flake-utils";
+  };
+  outputs =
+    {
+      self,
+      nixpkgs,
+      flake-utils,
+      tgi-nix,
+    }:
+    flake-utils.lib.eachDefaultSystem (
+      system:
+      let
+        pkgs = import nixpkgs {
+          inherit system;
+          inherit (tgi-nix.lib) config;
+          overlays = [
+            tgi-nix.overlays.default
+          ];
+        };
+      in
+      {
+        formatter = pkgs.nixfmt-rfc-style;
+        devShells = with pkgs; rec {
+          default = mkShell {
+            buildInputs =
+              [
+                black
+                mypy
+                pyright
+                ruff
+              ]
+              ++ (with python3.pkgs; [
+                huggingface-hub
+                pytest
+                pytest-benchmark
+                torch
+                venvShellHook
+              ]);
+
+            venvDir = "./.venv";
+
+            postVenvCreation = ''
+              unset SOURCE_DATE_EPOCH
+              ( python -m pip install --no-build-isolation --no-dependencies -e . )
+            '';
+          };
+        };
+      }
+    );
+}
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,20 +1,20 @@
 [project]
 name = "kernels"
-version = "0.1.7"
-description = "Download cuda kernels"
+version = "0.4.3"
+description = "Download compute kernels"
 authors = [
  { name = "OlivierDehaene", email = "olivier@huggingface.co" },
  { name = "Daniel de Kok", email = "daniel@huggingface.co" },
  { name = "David Holtz", email = "david@huggingface.co" },
  { name = "Nicolas Patry", email = "nicolas@huggingface.co" },
 ]
+license = { text = "Apache-2.0" }
 readme = "README.md"
 requires-python = ">= 3.9"
 dependencies = [
-  "huggingface-hub>=0.26.3",
-  "packaging>=24.2",
-  "tomli>=2.0.1; python_version<'3.11'",
-  "torch>=2.4",
+  "huggingface_hub>=0.26.0,<1.0",
+  "packaging>=20.0",
+  "tomli>=2.0; python_version<'3.11'",
 ]

 [build-system]
@@ -27,11 +27,42 @@ dev = [
  "pytest >=8",
  # Whatever version is compatible with pytest.
  "pytest-benchmark",
+  "torch >=2.5",
 ]

+[project.optional-dependencies]
+torch = ["torch"]
+
 [project.scripts]
 kernels = "kernels.cli:main"

 [project.entry-points."egg_info.writers"]
 "kernels.lock" = "kernels.lockfile:write_egg_lockfile"

+
+[tool.ruff]
+exclude = [
+  ".eggs",
+  ".git",
+  ".git-rewrite",
+  ".hg",
+  ".mypy_cache",
+  ".nox",
+  ".pants.d",
+  ".pytype",
+  ".ruff_cache",
+  ".svn",
+  ".tox",
+  ".venv",
+  ".venv*",
+  "__pypackages__",
+  "_build",
+  "build",
+  "dist",
+  "venv",
+]
+line-length = 119
+# Ignored rules:
+# "E501" -> line length violation
+lint.ignore = ["E501"]
+lint.select = ["E", "F", "I", "W"]
--- a/src/kernels/__init__.py
+++ b/src/kernels/__init__.py
@@ -1,3 +1,27 @@
-from kernels.utils import get_kernel, install_kernel, load_kernel, get_locked_kernel
+from kernels.layer import (
+    Device,
+    LayerRepository,
+    register_kernel_mapping,
+    replace_kernel_forward_from_hub,
+    use_kernel_forward_from_hub,
+    use_kernel_mapping,
+)
+from kernels.utils import (
+    get_kernel,
+    get_locked_kernel,
+    install_kernel,
+    load_kernel,
+)

-__all__ = ["get_kernel", "get_locked_kernel", "load_kernel", "install_kernel"]
+__all__ = [
+    "get_kernel",
+    "get_locked_kernel",
+    "load_kernel",
+    "install_kernel",
+    "use_kernel_forward_from_hub",
+    "use_kernel_mapping",
+    "register_kernel_mapping",
+    "replace_kernel_forward_from_hub",
+    "LayerRepository",
+    "Device",
+]
--- a/src/kernels/cli.py
+++ b/src/kernels/cli.py
@@ -6,7 +6,7 @@ from pathlib import Path

 from kernels.compat import tomllib
 from kernels.lockfile import KernelLock, get_kernel_locks
-from kernels.utils import build_variant, install_kernel, install_kernel_all_variants
+from kernels.utils import install_kernel, install_kernel_all_variants


 def main():
--- a/src/kernels/layer.py
+++ b/src/kernels/layer.py
@@ -0,0 +1,259 @@
+import inspect
+import os
+import warnings
+from contextvars import ContextVar
+from copy import deepcopy
+from dataclasses import dataclass, field
+from typing import TYPE_CHECKING, Callable, Dict, Union
+
+from .utils import get_kernel
+
+if TYPE_CHECKING:
+    from torch import nn
+
+_DISABLE_KERNEL_MAPPING: bool = bool(int(os.environ.get("DISABLE_KERNEL_MAPPING", "0")))
+
+
+@dataclass(frozen=True)
+class Device:
+    type: str
+
+    # In the future we might add compute capabilities, etc.
+
+    def __eq__(self, other):
+        return isinstance(other, Device) and self.type == other.type
+
+    def __hash__(self):
+        return hash(self.type)
+
+
+@dataclass
+class LayerRepository:
+    """
+    Repository and name of a layer.
+    """
+
+    layer_name: str = field(
+        metadata={"help": "The name of the layer in the kernel repository."}
+    )
+    repo_id: str = field(metadata={"help": "The kernel hub repository with the layer."})
+    revision: str = field(
+        default="main", metadata={"help": "The revision of the layer."}
+    )
+
+    def __eq__(self, other):
+        return (
+            isinstance(other, LayerRepository)
+            and self.layer_name == other.layer_name
+            and self.repo_id == other.repo_id
+            and self.revision == other.revision
+        )
+
+    def __hash__(self):
+        return hash((self.layer_name, self.repo_id, self.revision))
+
+
+_KERNEL_MAPPING: ContextVar[Dict[str, Dict[Device, LayerRepository]]] = ContextVar(
+    "_KERNEL_MAPPING", default={}
+)
+
+
+def use_kernel_mapping(
+    mapping: Dict[str, Dict[Union[Device, str], LayerRepository]],
+    *,
+    inherit_mapping: bool = True,
+):
+    """
+    Context manager that sets a mapping for the duration of the context.
+
+    When `inherit_mapping` is set to `True` the current mapping will be
+    extended by `mapping` inside the context. If it is `False`, only
+    `mapping` is used inside the context.
+    """
+
+    class ContextManager:
+        def __enter__(self):
+            # Mappings always stack on previous mappings.
+            if inherit_mapping:
+                self.token = _KERNEL_MAPPING.set(deepcopy(_KERNEL_MAPPING.get()))
+            else:
+                self.token = _KERNEL_MAPPING.set({})
+            register_kernel_mapping(mapping)
+
+        def __exit__(self, exc_type, exc_value, traceback):
+            _KERNEL_MAPPING.reset(self.token)
+
+    return ContextManager()
+
+
+def register_kernel_mapping(
+    mapping: Dict[str, Dict[Union[Device, str], LayerRepository]]
+):
+    """
+    Allows one to register a mapping between a layer name and the corresponding kernel to use, depending on the device.
+    This should be used in conjunction with the `use_kernel_forward_from_hub` decorator on the class.
+    Example usage:
+
+    ```python
+    from kernels import LayerRepository, register_kernel_mapping
+
+    kernel_layer_mapping = {
+      "LlamaRMSNorm": {
+          "cuda": LayerRepository(
+              repo_id="kernels-community/activation",
+              layer_name="RmsNorm",
+              revision="layers",
+          ),
+      },
+    }
+    register_kernel_mapping(kernel_layer_mapping)
+    ```
+    """
+    # Merge with existing mappings.
+    for new_kernel, new_device_repos in mapping.items():
+        device_repo = _KERNEL_MAPPING.get().setdefault(new_kernel, {})
+        for new_device, new_repo in new_device_repos.items():
+            if isinstance(new_device, str):
+                device_repo[Device(type=new_device)] = new_repo
+            else:
+                device_repo[new_device] = new_repo
+
+
+def replace_kernel_forward_from_hub(cls, layer_name: str, *, use_fallback: bool = True):
+    """
+    Replace the forward function of a layer using a layer from the kernel hub.
+    This function monkeypatches a layer, replacing the `forward` method
+    of the layer with that of a layer from the hub. The replacement is done
+    when a layer matching `layer_name` and device type is registered through
+    `register_kernel_mapping`. The device type is inferred from the first
+    argument to `forward`.
+    """
+
+    fallback_forward = cls.forward
+
+    cached_forward: Dict[LayerRepository, Callable] = {}
+
+    def forward(self, x, *args, **kwargs):
+        if _DISABLE_KERNEL_MAPPING:
+            return fallback_forward(self, x, *args, **kwargs)
+
+        kernel = _KERNEL_MAPPING.get().get(layer_name)
+        if kernel is None:
+            # Raise before warning: the warning promises a fallback that
+            # does not happen when `use_fallback` is False.
+            if not use_fallback:
+                raise ValueError(f"No layer mapping for `{layer_name}`")
+            warnings.warn(
+                "\n"
+                f"No kernel mapping found for layer `{layer_name}`. "
+                "Check if the layer name matches one of the kernels in the mapping or add the kernel "
+                "you want to use to the mapping. Defaulting to original forward implementation."
+            )
+            return fallback_forward(self, x, *args, **kwargs)
+
+        device = getattr(x, "device", None)
+        if device is None:
+            return fallback_forward(self, x, *args, **kwargs)
+
+        repo = kernel.get(Device(type=device.type))
+        if repo is None:
+            if not use_fallback:
+                raise ValueError(
+                    f"No layer mapping for `{layer_name}` with device type `{device.type}`"
+                )
+            return fallback_forward(self, x, *args, **kwargs)
+
+        # Short-circuit if we already loaded the layer.
+        layer_forward = cached_forward.get(repo, None)
+        if layer_forward is not None:
+            return layer_forward(self, x, *args, **kwargs)
+
+        layer = _get_kernel_layer(
+            repo_id=repo.repo_id,
+            layer_name=repo.layer_name,
+            revision=repo.revision,
+        )
+
+        # We have to validate against the original signature.
+        orig_forward = cls.forward
+        try:
+            cls.forward = fallback_forward
+            _validate_layer(check_cls=cls, cls=layer)
+        finally:
+            cls.forward = orig_forward
+
+        layer_forward = layer.forward
+        cached_forward[repo] = layer_forward
+
+        return layer_forward(self, x, *args, **kwargs)
+
+    cls.forward = forward
+
+
+def use_kernel_forward_from_hub(layer_name: str, *, use_fallback: bool = True):
+    """
+    Replace the forward function of a layer using a layer from the kernel hub.
+    This decorator can be applied to a layer and replaces the forward method
+    of the layer with that of a layer from the hub. The replacement is done
+    when a layer matching `layer_name` and device type is registered through
+    `register_kernel_mapping`. The device type is inferred from the first
+    argument to `forward`.
+    """
+
+    def decorator(cls):
+        replace_kernel_forward_from_hub(cls, layer_name, use_fallback=use_fallback)
+        return cls
+
+    return decorator
+
+
+def _get_kernel_layer(*, repo_id: str, layer_name: str, revision: str) -> "nn.Module":
+    """Get a layer from a kernel."""
+
+    kernel = get_kernel(repo_id, revision=revision)
+
+    if getattr(kernel, "layers", None) is None:
+        raise ValueError(
+            f"Kernel `{repo_id}` at revision `{revision}` does not define any layers."
+        )
+
+    layer = getattr(kernel.layers, layer_name, None)
+    if layer is None:
+        raise ValueError(f"Layer `{layer_name}` not found in kernel `{repo_id}`.")
+    return layer
+
+
+def _validate_layer(*, check_cls, cls):
+    # The layer must have at least the following properties: (1) it
+    # must be stateless; (2) the forward signature should correspond to
+    # the signature it is replacing; (3) forward should not call other
+    # methods.
+
+    from torch import nn
+
+    if not issubclass(cls, nn.Module):
+        raise TypeError(f"Layer `{cls}` is not a Torch layer.")
+
+    # We verify statelessness by checking that the layer does not have its own
+    # constructor (since the constructor could add member variables)...
+    if cls.__init__ is not nn.Module.__init__:
+        raise TypeError("Layer must not override nn.Module constructor.")
+
+    # ... or predefined member variables.
+    torch_module_members = {name for name, _ in inspect.getmembers(nn.Module)}
+    cls_members = {name for name, _ in inspect.getmembers(cls)}
+    if cls_members - torch_module_members != set():
+        raise TypeError("Layer must not contain additional members.")
+
+    # Check whether the forward signatures are similar.
+    params = inspect.signature(cls.forward).parameters
+    ref_params = inspect.signature(check_cls.forward).parameters
+
+    if len(params) != len(ref_params):
+        raise TypeError(
+            "Forward signature does not match: different number of arguments."
+        )
+
+    for param, ref_param in zip(params.values(), ref_params.values()):
+        if param.kind != ref_param.kind:
+            raise TypeError(
+                f"Forward signature does not match: different kind of arguments ({param} ({param.kind}) and {ref_param} ({ref_param.kind}))"
+            )
--- a/src/kernels/lockfile.py
+++ b/src/kernels/lockfile.py
@ -1,5 +1,5 @@
-from dataclasses import dataclass
 import hashlib
+from dataclasses import dataclass
 from pathlib import Path
 from typing import Dict, List, Tuple

--- a/src/kernels/utils.py
+++ b/src/kernels/utils.py
@ -4,43 +4,59 @@ import importlib
 import importlib.metadata
 import inspect
 import json
+import logging
 import os
-from pathlib import Path
 import platform
 import sys
 from importlib.metadata import Distribution
+from pathlib import Path
 from types import ModuleType
 from typing import Dict, List, Optional, Tuple

-from huggingface_hub import hf_hub_download, snapshot_download
+from huggingface_hub import snapshot_download
 from packaging.version import parse

-from kernels.compat import tomllib
 from kernels.lockfile import KernelLock, VariantLock

-CACHE_DIR: Optional[str] = os.environ.get("HF_KERNELS_CACHE", None)
+
+def _get_cache_dir() -> Optional[str]:
+    """Returns the kernels cache directory."""
+    cache_dir = os.environ.get("HF_KERNELS_CACHE", None)
+    if cache_dir is not None:
+        logging.warning(
+            "HF_KERNELS_CACHE will be removed in the future, use KERNELS_CACHE instead"
+        )
+        return cache_dir
+
+    return os.environ.get("KERNELS_CACHE", None)
+
+
+CACHE_DIR: Optional[str] = _get_cache_dir()


 def build_variant() -> str:
    import torch

-    if torch.version.cuda is None:
-        raise AssertionError(
-            "This kernel requires CUDA to be installed. Torch was not compiled with CUDA enabled."
-        )
+    if torch.version.cuda is not None:
+        cuda_version = parse(torch.version.cuda)
+        compute_framework = f"cu{cuda_version.major}{cuda_version.minor}"
+    elif torch.version.hip is not None:
+        rocm_version = parse(torch.version.hip.split("-")[0])
+        compute_framework = f"rocm{rocm_version.major}{rocm_version.minor}"
+    else:
+        raise AssertionError("Torch was not compiled with CUDA or ROCm enabled.")

    torch_version = parse(torch.__version__)
-    cuda_version = parse(torch.version.cuda)
    cxxabi = "cxx11" if torch.compiled_with_cxx11_abi() else "cxx98"
    cpu = platform.machine()
    os = platform.system().lower()

-    return f"torch{torch_version.major}{torch_version.minor}-{cxxabi}-cu{cuda_version.major}{cuda_version.minor}-{cpu}-{os}"
+    return f"torch{torch_version.major}{torch_version.minor}-{cxxabi}-{compute_framework}-{cpu}-{os}"


-def noarch_build_variant() -> str:
+def universal_build_variant() -> str:
    # Once we support other frameworks, detection goes here.
-    return "torch-noarch"
+    return "torch-universal"


 def import_from_path(module_name: str, file_path: Path) -> ModuleType:
@ -74,11 +90,11 @@ def install_kernel(
    """
    package_name = package_name_from_repo_id(repo_id)
    variant = build_variant()
-    noarch_variant = noarch_build_variant()
+    universal_variant = universal_build_variant()
    repo_path = Path(
        snapshot_download(
            repo_id,
-            allow_patterns=[f"build/{variant}/*", f"build/{noarch_variant}/*"],
+            allow_patterns=[f"build/{variant}/*", f"build/{universal_variant}/*"],
            cache_dir=CACHE_DIR,
            revision=revision,
            local_files_only=local_files_only,
@ -86,12 +102,12 @@ def install_kernel(
    )

    variant_path = repo_path / "build" / variant
-    noarch_variant_path = repo_path / "build" / noarch_variant
+    universal_variant_path = repo_path / "build" / universal_variant

-    if not variant_path.exists() and noarch_variant_path.exists():
-        # Fall back to noarch variant.
-        variant = noarch_variant
-        variant_path = noarch_variant_path
+    if not variant_path.exists() and universal_variant_path.exists():
+        # Fall back to universal variant.
+        variant = universal_variant
+        variant_path = universal_variant_path

    if variant_locks is not None:
        variant_lock = variant_locks.get(variant)
@ -145,9 +161,18 @@ def get_kernel(repo_id: str, revision: str = "main") -> ModuleType:
    return import_from_path(package_name, package_path / package_name / "__init__.py")


-def load_kernel(repo_id: str) -> ModuleType:
-    """Get a pre-downloaded, locked kernel."""
-    locked_sha = _get_caller_locked_kernel(repo_id)
+def load_kernel(repo_id: str, *, lockfile: Optional[Path] = None) -> ModuleType:
+    """
+    Get a pre-downloaded, locked kernel.
+
+    If `lockfile` is not specified, the lockfile will be loaded from the
+    caller's package metadata.
+    """
+    if lockfile is None:
+        locked_sha = _get_caller_locked_kernel(repo_id)
+    else:
+        with open(lockfile, "r") as f:
+            locked_sha = _get_locked_kernel(repo_id, f.read())

    if locked_sha is None:
        raise ValueError(
@ -157,23 +182,24 @@ def load_kernel(repo_id: str) -> ModuleType:
    package_name = package_name_from_repo_id(repo_id)

    variant = build_variant()
-    noarch_variant = noarch_build_variant()
+    universal_variant = universal_build_variant()

    repo_path = Path(
        snapshot_download(
            repo_id,
-            allow_patterns=[f"build/{variant}/*", f"build/{noarch_variant}/*"],
+            allow_patterns=[f"build/{variant}/*", f"build/{universal_variant}/*"],
            cache_dir=CACHE_DIR,
+            revision=locked_sha,
            local_files_only=True,
        )
    )

    variant_path = repo_path / "build" / variant
-    noarch_variant_path = repo_path / "build" / noarch_variant
-    if not variant_path.exists() and noarch_variant_path.exists():
-        # Fall back to noarch variant.
-        variant = noarch_variant
-        variant_path = noarch_variant_path
+    universal_variant_path = repo_path / "build" / universal_variant
+    if not variant_path.exists() and universal_variant_path.exists():
+        # Fall back to universal variant.
+        variant = universal_variant
+        variant_path = universal_variant_path

    module_init_path = variant_path / package_name / "__init__.py"
    if not os.path.exists(module_init_path):
@ -201,11 +227,19 @@ def get_locked_kernel(repo_id: str, local_files_only: bool = False) -> ModuleTyp
 def _get_caller_locked_kernel(repo_id: str) -> Optional[str]:
    for dist in _get_caller_distributions():
        lock_json = dist.read_text("kernels.lock")
-        if lock_json is not None:
-            for kernel_lock_json in json.loads(lock_json):
-                kernel_lock = KernelLock.from_json(kernel_lock_json)
-                if kernel_lock.repo_id == repo_id:
-                    return kernel_lock.sha
+        if lock_json is None:
+            continue
+        locked_sha = _get_locked_kernel(repo_id, lock_json)
+        if locked_sha is not None:
+            return locked_sha
+    return None
+
+
+def _get_locked_kernel(repo_id: str, lock_json: str) -> Optional[str]:
+    for kernel_lock_json in json.loads(lock_json):
+        kernel_lock = KernelLock.from_json(kernel_lock_json)
+        if kernel_lock.repo_id == repo_id:
+            return kernel_lock.sha
    return None
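The lockfile lookup factored out into `_get_locked_kernel` amounts to a linear scan over a JSON list. Below is a small sketch of that idea, assuming each entry carries `repo_id` and `sha` keys as in `tests/hash_validation/kernels.lock`; `find_locked_sha` and `EXAMPLE_LOCK` are hypothetical names, and the sketch reads plain dictionaries instead of going through `KernelLock.from_json`.

```python
import json
from typing import Optional


def find_locked_sha(repo_id: str, lock_json: str) -> Optional[str]:
    """Return the locked commit SHA for `repo_id`, or None if it is not locked."""
    for entry in json.loads(lock_json):
        if entry.get("repo_id") == repo_id:
            return entry.get("sha")
    return None


# A one-entry lockfile, using the SHA shown in the kernels.lock diff.
EXAMPLE_LOCK = json.dumps(
    [
        {
            "repo_id": "kernels-community/triton-scaled-mm",
            "sha": "af10d8c1affe8efce93d228c3e6e64ff673d493f",
            "variants": {},
        }
    ]
)
```

Returning `None` for an unknown repository lets the caller decide how to report the missing lock, which matches how `load_kernel` raises a `ValueError` when no SHA is found.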


--- a/tests/hash_validation/kernels.lock
+++ b/tests/hash_validation/kernels.lock
@ -55,9 +55,9 @@
  },
  {
    "repo_id": "kernels-community/triton-scaled-mm",
-    "sha": "9baccbeb763fe5f1b8fbdb9c1e5699548c46632c",
+    "sha": "af10d8c1affe8efce93d228c3e6e64ff673d493f",
    "variants": {
-      "torch-noarch": {
+      "torch-universal": {
        "hash": "sha256-b843c5f30b52b6c1c56fca28cb0cf453be71d6ce7d308f383dce71a8050f7b52",
        "hash_type": "git_lfs_concat"
      }
--- a/tests/hash_validation/pyproject.toml
+++ b/tests/hash_validation/pyproject.toml
@ -1,3 +1,3 @@
 [tool.kernels.dependencies]
 "kernels-community/activation" = ">=0.0.2"
-"kernels-community/triton-scaled-mm" = ">=0.0.1"
+"kernels-community/triton-scaled-mm" = ">=0.0.2"
--- a/tests/test_basic.py
+++ b/tests/test_basic.py
@ -1,5 +1,6 @@
 import pytest
 import torch
+
 from kernels import get_kernel


@ -9,7 +10,7 @@ def kernel():


@pytest.fixture
-def noarch_kernel():
+def universal_kernel():
    return get_kernel("kernels-community/triton-scaled-mm")


@ -35,14 +36,14 @@ def test_gelu_fast(kernel, device):
    assert torch.allclose(y, expected)


-def test_noarch_kernel(noarch_kernel):
+def test_universal_kernel(universal_kernel):
    torch.manual_seed(0)
    A = torch.randint(-10, 10, (64, 128), dtype=torch.int8, device="cuda")
    B = torch.randint(-10, 10, (128, 96), dtype=torch.int8, device="cuda")
    scale_a = torch.tensor(0.4, dtype=torch.float16, device="cuda")
    scale_b = torch.tensor(0.6, dtype=torch.float16, device="cuda")

-    out = noarch_kernel.triton_scaled_mm(A, B, scale_a, scale_b, torch.float16)
+    out = universal_kernel.triton_scaled_mm(A, B, scale_a, scale_b, torch.float16)
    out_check = (A * scale_a) @ (B * scale_b)
    out_check = out_check.to(torch.float16)

--- a/tests/test_benchmarks.py
+++ b/tests/test_benchmarks.py
@ -1,5 +1,6 @@
 import pytest
 import torch
+
 from kernels import get_kernel


--- a/tests/test_hash_validation.py
+++ b/tests/test_hash_validation.py
@ -1,6 +1,7 @@
 from dataclasses import dataclass
 from pathlib import Path

+from kernels import load_kernel
 from kernels.cli import download_kernels


@ -11,11 +12,13 @@ class DownloadArgs:
    project_dir: Path


-def test_download_hash_validation():
-    project_dir = Path(__file__).parent / "hash_validation"
-    download_kernels(DownloadArgs(all_variants=False, project_dir=project_dir))
-
-
 def test_download_all_hash_validation():
-    project_dir = Path(__file__).parent / "hash_validation"
+    project_dir = Path(__file__).parent / "kernel_locking"
    download_kernels(DownloadArgs(all_variants=True, project_dir=project_dir))
+
+
+def test_load_locked():
+    project_dir = Path(__file__).parent / "kernel_locking"
+    # Also validates that hashing works correctly.
+    download_kernels(DownloadArgs(all_variants=False, project_dir=project_dir))
+    load_kernel("kernels-community/activation", lockfile=project_dir / "kernels.lock")
--- a/tests/test_layer.py
+++ b/tests/test_layer.py
@ -0,0 +1,205 @@
+import pytest
+import torch
+import torch.nn as nn
+from torch.nn import functional as F
+
+from kernels import (
+    Device,
+    LayerRepository,
+    register_kernel_mapping,
+    use_kernel_forward_from_hub,
+)
+from kernels.layer import _KERNEL_MAPPING, _validate_layer, use_kernel_mapping
+
+kernel_layer_mapping = {
+    "SiluAndMul": {
+        Device(type="cuda"): LayerRepository(
+            repo_id="kernels-community/activation",
+            layer_name="SiluAndMul",
+            revision="layers",
+        )
+    },
+    "SiluAndMulStringDevice": {
+        "cuda": LayerRepository(
+            repo_id="kernels-community/activation",
+            layer_name="SiluAndMul",
+            revision="layers",
+        )
+    },
+}
+
+register_kernel_mapping(kernel_layer_mapping)
+
+
+class SiluAndMul(nn.Module):
+    def __init__(self):
+        super().__init__()
+        # Used to check that we called hub kernel.
+        self.n_calls = 0
+
+    def forward(self, input: torch.Tensor) -> torch.Tensor:
+        self.n_calls += 1
+        d = input.shape[-1] // 2
+        return F.silu(input[..., :d]) * input[..., d:]
+
+
+@use_kernel_forward_from_hub("SiluAndMul")
+class SiluAndMulWithKernel(SiluAndMul):
+    pass
+
+
+@use_kernel_forward_from_hub("SiluAndMulStringDevice")
+class SiluAndMulStringDevice(SiluAndMul):
+    pass
+
+
+def test_arg_kinds():
+    @use_kernel_forward_from_hub("ArgKind")
+    class ArgKind(nn.Module):
+        def forward(
+            self,
+            arg1,
+            arg2,
+            *,
+            kwarg1,
+            kwarg2=42,
+        ):
+            return (arg1, arg2, kwarg1, kwarg2)
+
+    arg_kind = ArgKind()
+    assert arg_kind("foo", "bar", kwarg1="baz") == ("foo", "bar", "baz", 42)
+    assert arg_kind("foo", "bar", kwarg1="baz", kwarg2=5) == ("foo", "bar", "baz", 5)
+
+
+@pytest.mark.parametrize("cls", [SiluAndMulWithKernel, SiluAndMulStringDevice])
+@pytest.mark.parametrize("device", ["cuda", "cpu"])
+def test_hub_forward(cls, device):
+    torch.random.manual_seed(0)
+
+    silu_and_mul = SiluAndMul()
+    X = torch.randn((32, 64), device=device)
+    Y = silu_and_mul(X)
+
+    silu_and_mul_with_kernel = cls()
+    Y_kernel = silu_and_mul_with_kernel(X)
+
+    torch.testing.assert_close(Y_kernel, Y)
+
+    assert silu_and_mul.n_calls == 1
+    if device == "cuda":
+        assert silu_and_mul_with_kernel.n_calls == 0
+    else:
+        assert silu_and_mul_with_kernel.n_calls == 1
+
+
+def test_layer_fallback_works():
+    @use_kernel_forward_from_hub("SiluAndMulNonExisting")
+    class SiluAndMulWithKernelFallback(SiluAndMul):
+        pass
+
+    # Check that we don't raise an exception for a non-existing kernel.
+    SiluAndMulWithKernelFallback()
+
+
+def test_mapping_contexts():
+    assert set(_KERNEL_MAPPING.get().keys()) == {"SiluAndMul", "SiluAndMulStringDevice"}
+
+    extra_mapping1 = {
+        "TestKernel": {
+            Device(type="cuda"): LayerRepository(
+                repo_id="kernels-community/activation",
+                layer_name="SiluAndMul",
+                revision="layers",
+            )
+        }
+    }
+
+    with use_kernel_mapping(extra_mapping1):
+        assert set(_KERNEL_MAPPING.get().keys()) == {
+            "SiluAndMul",
+            "SiluAndMulStringDevice",
+            "TestKernel",
+        }
+
+        extra_mapping2 = {
+            "SiluAndMul": {
+                Device(type="cuda"): LayerRepository(
+                    repo_id="kernels-community/non-existing",
+                    layer_name="SiluAndMul",
+                    revision="layers",
+                )
+            }
+        }
+
+        with use_kernel_mapping(extra_mapping2):
+            assert set(_KERNEL_MAPPING.get().keys()) == {
+                "SiluAndMul",
+                "SiluAndMulStringDevice",
+                "TestKernel",
+            }
+            assert (
+                _KERNEL_MAPPING.get()["SiluAndMul"][Device(type="cuda")].repo_id
+                == "kernels-community/non-existing"
+            )
+
+        assert set(_KERNEL_MAPPING.get().keys()) == {
+            "SiluAndMul",
+            "SiluAndMulStringDevice",
+            "TestKernel",
+        }
+        assert (
+            _KERNEL_MAPPING.get()["SiluAndMul"][Device(type="cuda")].repo_id
+            == "kernels-community/activation"
+        )
+
+        with use_kernel_mapping(extra_mapping2, inherit_mapping=False):
+            assert set(_KERNEL_MAPPING.get().keys()) == {
+                "SiluAndMul",
+            }
+            assert (
+                _KERNEL_MAPPING.get()["SiluAndMul"][Device(type="cuda")].repo_id
+                == "kernels-community/non-existing"
+            )
+
+        assert set(_KERNEL_MAPPING.get().keys()) == {
+            "SiluAndMul",
+            "SiluAndMulStringDevice",
+            "TestKernel",
+        }
+        assert (
+            _KERNEL_MAPPING.get()["SiluAndMul"][Device(type="cuda")].repo_id
+            == "kernels-community/activation"
+        )
+
+    assert set(_KERNEL_MAPPING.get().keys()) == {
+        "SiluAndMul",
+        "SiluAndMulStringDevice",
+    }
+
+
+def test_validate_kernel_layer():
+    class BadLayer(nn.Module):
+        def __init__(self, *args, **kwargs):
+            super().__init__(*args, **kwargs)
+            self.foo = 42
+
+    with pytest.raises(TypeError, match="not override"):
+        _validate_layer(cls=BadLayer, check_cls=SiluAndMul)
+
+    class BadLayer2(nn.Module):
+        foo: int = 42
+
+    with pytest.raises(TypeError, match="not contain additional members"):
+        _validate_layer(cls=BadLayer2, check_cls=SiluAndMul)
+
+    class BadLayer3(nn.Module):
+        def forward(self, x: torch.Tensor, foo: int) -> torch.Tensor: ...
+
+    with pytest.raises(TypeError, match="different number of arguments"):
+        _validate_layer(cls=BadLayer3, check_cls=SiluAndMul)
+
+    class BadLayer4(nn.Module):
+        def forward(self, *, x: torch.Tensor) -> torch.Tensor: ...
+
+    with pytest.raises(TypeError, match="different kind of arguments"):
+        _validate_layer(cls=BadLayer4, check_cls=SiluAndMul)
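The signature checks exercised by `test_validate_kernel_layer` can be tried without Torch. This is a simplified sketch under that assumption: `signatures_compatible` and the four sample functions are hypothetical, and the sketch returns a boolean where `_validate_layer` raises `TypeError`. It compares only the number and kind of parameters, so renaming a parameter is accepted.

```python
import inspect


def signatures_compatible(candidate, reference) -> bool:
    """Check that both callables take the same number and kinds of parameters."""
    params = inspect.signature(candidate).parameters
    ref_params = inspect.signature(reference).parameters
    if len(params) != len(ref_params):
        return False
    # Parameter names may differ; only the kind (positional, keyword-only, ...) matters.
    return all(
        param.kind == ref_param.kind
        for param, ref_param in zip(params.values(), ref_params.values())
    )


def reference(self, x): ...


def renamed(self, inp): ...


def extra_arg(self, x, foo): ...


def keyword_only(self, *, x): ...
```

Here `renamed` is compatible with `reference`, while `extra_arg` fails the length check and `keyword_only` fails the kind check, mirroring the `BadLayer3` and `BadLayer4` cases.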
Author	SHA1	Message	Date
Daniël de Kok	6fd2112e22	Set version to 0.4.3 (#71 )	2025-04-10 11:57:15 +02:00
Daniël de Kok	70f56ff856	Support `DISABLE_KERNEL_MAPPING` env var for completely disabling kernel mappings (#70 ) * Disable kernel mappings with `DISABLE_KERNEL_MAPPING=1` * Rename HF_KERNELS_CACHE to KERNELS_CACHE But still recognize the old variant for compatibility. * Add documentation for environment variables	2025-04-10 11:37:54 +02:00
Daniël de Kok	7178b0b86c	Add Apache License version 2.0 (#66 ) Fixes #64	2025-04-04 20:35:29 +02:00
Daniël de Kok	0bbf90a564	Update ABI requirement to `manylinux_2_28` (#65 )	2025-04-04 19:38:15 +02:00
Daniël de Kok	27d6ffcb80	Add more details about the ABI requirements (#63 )	2025-03-31 14:29:30 +02:00
Daniël de Kok	f7bd21438b	Set version to 0.4.2 (#62 )	2025-03-27 16:57:28 +01:00
Mohamed Mekkouri	6174febb4b	Add warning when layer_name not present in _KERNEL_MAPPING (#61 ) * add warning * fix import order	2025-03-27 16:22:58 +01:00
Daniël de Kok	ff55bc201b	Add support for fetching ROCm kernels (#59 )	2025-03-25 15:11:03 +01:00
Daniël de Kok	3808108d62	doc: add versioning (#58 )	2025-03-24 16:48:20 +01:00
Daniël de Kok	c4a16ef462	Actually export `use_kernel_mapping` at the top-level (#57 ) * Actually export `use_kernel_mapping` at the top-level * Set version to 0.4.1	2025-03-24 12:44:00 +01:00
Daniël de Kok	9762794dd2	Set version to 0.4.0 (#56 )	2025-03-21 20:49:01 +01:00
Daniël de Kok	b7d6867c52	`use_kernel_mapping`: add `inherit_mapping` option (#55 ) `inherit_mapping` is the default and extends the existing mapping with the given mapping. If `inherit_mapping` is `False`, existing mappings are not inherited.	2025-03-21 17:28:45 +01:00
Daniël de Kok	fbcd0f2ebd	Set version to 0.3.3 (#54 )	2025-03-20 16:09:11 +01:00
Daniël de Kok	5af46eca94	Align dependency versions with transformers (#53 )	2025-03-20 15:13:45 +01:00
Daniël de Kok	747dd66876	Set version to 0.3.2 (#51 )	2025-03-20 11:46:36 +01:00
Daniël de Kok	920590a592	Also export `replace_kernel_forward_from_hub` (#52 )	2025-03-20 11:46:18 +01:00
Daniël de Kok	5208ac4be5	Make torch an extra/dev dependency (#50 ) To support use of this package when Torch is optional.	2025-03-20 10:18:19 +01:00
Daniël de Kok	22eaba2826	Set version to 0.3.1 (#49 )	2025-03-19 16:35:10 +01:00
Daniël de Kok	9521ba79a0	Fix `forward` positional argument handling (#48 )	2025-03-19 15:54:51 +01:00
Daniël de Kok	9861a5bdef	Fix `forward` positional argument handling (#48 )	2025-03-19 15:34:35 +01:00
Daniël de Kok	1c7c87c960	Set version to 0.3.0 (#47 )	2025-03-19 12:02:02 +01:00
Daniël de Kok	df45cf2795	Add `use_kernel_forward_from_hub` decorator (#46 ) * Add `use_kernel_forward_from_hub` decorator This decorator replaces a layer's `forward` with the `forward` of a layer on the hub. * Add support for registering a mapping for the duration of a context This change makes `_KERNEL_MAPPING` a context variable and adds a `use_kernel_mapping` context manager. This allows users to register a mapping for the duration of a context. * Update layer docs * ruff fix * Remove an old bit from the docs * Extend layer mapping example Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Support stringly-typed device type * Forward-reference `register_kernel_mapping` in monkeypatching section * Use stringly-typed device name in layer mapping example Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-03-19 11:03:18 +01:00
Daniël de Kok	cf0413efe5	Add Nix flake devshell (#44 )	2025-03-11 10:59:12 +01:00
Daniël de Kok	851c13f666	Set version to 0.2.1 (#43 )	2025-03-10 15:20:34 +01:00
Daniël de Kok	b6a393612f	Pass through locked sha again when loading locked kernels (#42 ) This bit got removed accidentally when adding support for universal kernels. Also add a test to ensure that we'd catch this in the future.	2025-03-10 15:10:47 +01:00
Daniël de Kok	18ecd0ce69	Set version to 0.2.0 (#41 )	2025-03-10 10:24:02 +01:00
Daniël de Kok	b4ef1d60e5	Update torch dependency to 2.5 (#40 ) Fixes #37.	2025-03-07 20:32:54 +01:00
Daniël de Kok	a40756f306	Configure ruff lints and add to CI (#39 )	2025-03-07 20:32:44 +01:00
Daniël de Kok	3671158f47	Rename `noarch` to `universal` (#38 ) Also update docs to mention this variant.	2025-03-07 15:12:44 +01:00