Compare commits

..

42 Commits

Author SHA1 Message Date
29a930487a up 2025-09-29 19:51:36 +05:30
b0c431fee4 Add the kernels check subcommand (#158)
* Add the `kernels check` subcommand

This subcommand checks a given kernel. Currently it applies the same ABI
checks as `kernel-abi-check` in `kernel-builder`.

* Print an error when `build` contains files

* Forgot to update has_issues in two places
2025-09-25 19:05:29 +02:00
9a188eadbe up (#157) 2025-09-24 11:39:07 +02:00
457c7c1b8d Only run staging tests in one configuration (#156) 2025-09-23 10:52:47 +02:00
fb8cd99a2c Add support for NPU kernelize/layers (#155)
This change add support for Huawei Ascend NPUs. This is #146 with some formatting/typing fixes.

Co-authored-by: zheliuyu <15750543867@163.com>
2025-09-23 10:46:41 +02:00
dfee307d54 Set version to 0.10.2.dev0 (#154) 2025-09-22 18:54:09 +02:00
93e5765611 [tests] turn the kernels upload tests to be staging tests (#152) 2025-09-22 18:53:53 +02:00
bf488208be faq: why only replace forward methods? (#153) 2025-09-19 17:38:03 +02:00
2a14472e4c Bump huggingface_hub upper bound <2.0 (#151) 2025-09-19 16:56:30 +02:00
055a953552 Document the to-wheel subcommand (#149)
* Document the `to-wheel` subcommand

* Capitalization
2025-09-17 17:02:41 +02:00
692d5ad458 Fix some spelling errors to check docs CI is working (#120) 2025-09-17 13:44:09 +02:00
2139df57f4 rm link (#148) 2025-09-17 12:46:49 +02:00
8f9a77bb6a Describe the get_kernel/LayerRepository (#147)
This was already in the API documentation, but describe this in the
guides as well (since we want people to use versions).
2025-09-16 16:06:40 +02:00
6c00194680 Improve errors for layer validation (#145)
* Improve errors for layer validation

Include the repo and layer name as well as the name of the class
that is being compared to (when applicable).

* Remove upload xfail

* Only enable tests that require a token with `--token`
2025-09-16 14:40:54 +02:00
d6b51eefb7 [feat] add an uploading utility (#138)
* add an uploading utility.

* format

* remove stale files.

* black format

* sorted imports.

* up

* up

* add a test

* propagate.

* remove duplicate imports.

* Apply suggestions from code review

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* up

* up

* up

* command to format all files at once would be nice.

* up

* up

* up

* Use token for upload test

* assign env better.

* docs

* polish

* up

* xfail the test for now.

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2025-09-16 08:56:54 +02:00
d383fdd4b4 Add support for XPU layer repostories (#142)
This change adds support for XPU layer repositories, e.g.:

```
kernel_mapping = {
    "LigerRMSNorm": {
        "xpu": LayerRepository(
            repo_id="kernels-community/liger_kernels",
            layer_name="LigerRMSNorm",
        )
    },
}

Co-authored-by: YangKai0616 <kai.yang@intel.com>
2025-09-11 15:51:02 +02:00
07e5e8481a Set version to 0.10.1.dev0 (#140)
* Set version to 0.10.1.dev0

* Add `__version__` attribute to top-level module

This is needed for doc generation.
2025-09-10 09:08:02 +02:00
88f55d4728 XPU: look up kernel by framework version (#139)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-09-09 13:10:11 +02:00
e801ebf332 Set version to v0.10.0.dev0 (#137) 2025-09-05 10:48:41 +02:00
0ae07f05fc Remove default for mode argument of kernelize (#136) 2025-08-29 17:44:20 +02:00
7611021100 cpu is not (yet) a supported device type (#132)
Fixes #131.
2025-08-25 16:25:58 +02:00
767e7ccf13 fix: add get local tests (#134)
* fix: add tests for get local kernel

* fix: update test and add path example comments

* fix: run black linter
2025-08-21 13:01:48 -04:00
1caa4c1393 feat: improve get local kernel importing (#129)
* feat: improve get local kernel importing

* fix: adjust for linter
2025-08-08 10:22:29 -04:00
da701bf58a Small markup fixes of the local kernel repo example (#127) 2025-08-06 08:02:28 +02:00
703664ed31 Set version to 0.9.0.dev0 (#126) 2025-08-01 16:37:30 +02:00
a8a6564fa7 Add ROCm device discovery (#122)
* Add ROCm device discovery

* Ruff

* Address review comments

* Ruff

* Reorg torch import

* Remove redundant import

* Apply suggestions from code review

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* Address review comments

* Validat device type

* Clean diff

* black

* Sync test with repo changes

* black again

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2025-08-01 16:09:45 +02:00
c89e0fa9b9 Nix: go back to hf-nix main (#125) 2025-08-01 15:56:02 +02:00
176a601178 Run black check (#124) 2025-08-01 15:42:38 +02:00
cfa0c76ddc Add LocalLayerRepository to load from a local repo (#123) 2025-08-01 14:03:11 +02:00
bcc29915f9 Log when using fallback layer (#121) 2025-07-31 17:18:00 +02:00
6fbff7a9cb Add doc build to CI (#119)
* Add doc build to CI

* Trigger doc build

* No path scoping
2025-07-29 16:01:05 +02:00
f7490bd0a9 Test examples in docstrings using mktestdocs (#118)
Also adjust examples so that they are correct.
2025-07-28 17:31:34 +02:00
8069e3bf0c Update documentation for compatibility with doc-builder (#117) 2025-07-24 16:21:54 +02:00
c540d1e1d6 Fix typo in layers documentation (#116) 2025-07-23 17:13:14 +02:00
967ac581b8 Set version to 0.8.1.dev0 (#115) 2025-07-23 14:42:24 +02:00
81088d44e8 Add support for project-wide locking of layers (#114)
This change adds `LockedLayerRepository` as an alternative to
`LayerRepository`. `LockedLayerRepository` allows for locking all kernel
layers that are used at the project level. Example usage:

```
with use_kernel_mapping(
    {
        "SomeLayer": {
            "cuda": LockedLayerRepository(
                repo_id="some-org/some-layer",
                layer_name="SomeLayer",
            )
        },
    }
):
    layer = kernelize(layer, device="cuda", mode=Mode.INFERENCE)
```

This requires that the project has a `pyproject.toml` with kernel
version specifications and `kernel.lock` with the locked kernels.
2025-07-23 09:37:05 +02:00
4a04c005e3 Add version support to LayerRepository (#113)
* Add version support to `LayerRepository`

* Remove some docs that do not apply

* Removed unused member variable
2025-07-22 17:02:39 +02:00
6d3c6daf20 triton based kernel could also run in xpu (#112)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-22 10:03:34 +02:00
071900fd69 get_kernel: allow Python-style version specifiers (#111)
Use Python-style version specifiers to resolve to tags. E.g., given
the presence of the tags `v0.1.0`, `v0.1.1`, and `v0.2.0`,

get_kernel("my/kernel", version=">=0.1.0,<0.2.0")

would resolve to `v0.1.1`.
2025-07-21 17:18:35 +02:00
2d2c6b14e0 Set version to 0.8.0.dev0 (#110) 2025-07-15 18:45:03 +02:00
03edc573b1 Log kernel layer selection (#109) 2025-07-15 18:38:17 +02:00
c841a6c90d Improve mode handling (#108)
* Set `kernelize` default mode to `Mode.TRAINING | Mode.TORCH_COMPILE`

Also update docs and tests.

* Rename `Mode.DEFAULT` to `Mode.FALLBACK`

* More fine-grained fallbacks

For instance, INFERENCE can fall back to INFERENCE | TORCH_COMPILE,
TRAINING, TRAINING | TORCH_COMPILE, and FALLBACK.

* Update documtenation for mode fallback

* Mention that you can rerun `kernelize` to change the mode
2025-07-15 16:10:43 +02:00
43 changed files with 2670 additions and 368 deletions

View File

@ -0,0 +1,17 @@
name: Build documentation
on:
push:
branches:
- main
- doc-builder*
- v*-release
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
with:
commit_sha: ${{ github.sha }}
package: kernels
secrets:
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

View File

@ -0,0 +1,15 @@
name: Build PR Documentation
on: pull_request
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
with:
commit_sha: ${{ github.event.pull_request.head.sha }}
pr_number: ${{ github.event.number }}
package: kernels

View File

@ -8,3 +8,24 @@ jobs:
- uses: actions/checkout@v4
- name: Run ruff
uses: astral-sh/ruff-action@v3
black:
name: Run black check
runs-on: ubuntu-latest
env:
UV_PYTHON_PREFERENCE: only-managed
steps:
- uses: actions/checkout@v4
- name: Install uv and set the python version
uses: astral-sh/setup-uv@v5
with:
python-version: 3.12
- name: Install black
run: uv pip install black
- name: Check formatting
run: |
uv run black --check src
uv run black --check tests

View File

@ -51,7 +51,15 @@ jobs:
run: uv run mypy src/kernels
- name: Run tests
run: uv run pytest tests
run: |
uv run pytest tests
- name: Run staging tests
env:
HF_TOKEN: ${{ secrets.HF_STAGING_TOKEN }}
run: |
HUGGINGFACE_CO_STAGING=true uv run pytest --token -m "is_staging_test" tests/
if: matrix.python_version == '3.10' && matrix.torch-version == '2.7.0'
- name: Check kernel conversion
run: |
@ -65,6 +73,11 @@ jobs:
run: |
uv run kernels generate-readme kernels-community/triton-layer-norm
- name: Check kernel check
run: |
uv pip install kernel-abi-check
kernels check kernels-community/activation
- name: Import check without torch
run: |
uv pip uninstall torch

View File

@ -0,0 +1,16 @@
name: Upload PR Documentation
on:
workflow_run:
workflows: ["Build PR Documentation"]
types:
- completed
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
with:
package_name: kernels
secrets:
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}

8
Makefile Normal file
View File

@ -0,0 +1,8 @@
.PHONY: style
export check_dirs := src examples tests
style:
black ${check_dirs}
isort ${check_dirs}
ruff check ${check_dirs} --fix

View File

@ -56,10 +56,12 @@ the Hub.
## 📚 Documentation
- [Using layers](docs/layers.md)
- [Locking kernel versions](docs/locking.md)
- [Environment variables](docs/env.md)
- [Using kernels in a Docker container](docs/docker.md)
- [Kernel requirements](docs/kernel-requirements.md)
- [Frequently Asked Questions](docs/faq.md)
- [Introduction](docs/source/index.md)
- [Installation](docs/source/installation.md)
- [Basic usage](docs/source/basic-usage.md)
- [Using layers](docs/source/layers.md)
- [Locking kernel/layer versions](docs/source/locking.md)
- [Environment variables](docs/source/env.md)
- [Kernel requirements](docs/source/kernel-requirements.md)
- [Frequently Asked Questions](docs/source/faq.md)
- [Writing kernels](https://github.com/huggingface/kernel-builder/blob/main/docs/writing-kernels.md) using [kernel-builder](https://github.com/huggingface/kernel-builder/)

View File

@ -1,8 +0,0 @@
# Using kernels in a Docker container
build and run the reference [examples/basic.py](examples/basic.py) in a Docker container with the following commands:
```bash
docker build --platform linux/amd64 -t kernels-reference -f docker/Dockerfile.reference .
docker run --gpus all -it --rm -e HF_TOKEN=$HF_TOKEN kernels-reference
```

View File

@ -1,13 +0,0 @@
# FAQ
## Why is the kernelization step needed?
In earlier versions of `kernels`, a layer's `forward` was replaced by
`use_kernel_forward_from_hub` and `replace_kernel_forward_from_hub`. The
new `forward` would dispatch to a kernel based on the device type,
whether a model was training, etc. However, this approach was
fundamentally incompatible with `torch.compile` since it relied
on data-dependent branching.
To avoid branching, we have to make dispatch decisions ahead of time,
which is what the `kernelize` function does.

30
docs/source/_toctree.yml Normal file
View File

@ -0,0 +1,30 @@
- sections:
- local: index
title: Introduction
- local: installation
title: Installation
title: Getting started
- sections:
- local: basic-usage
title: Basic Usage
- local: layers
title: Using Layers
- local: locking
title: Locking Kernel Versions
- local: env
title: Environment Variables
- local: faq
title: FAQ
title: Usage Guide
- sections:
- local: api/kernels
title: Kernels
- local: api/layers
title: Layers
- local: cli
title: Kernels CLI
title: API Reference
- sections:
- local: kernel-requirements
title: Kernel Requirements
title: Developer Guide

View File

@ -0,0 +1,21 @@
# Kernels API Reference
## Main Functions
### get_kernel
[[autodoc]] kernels.get_kernel
### has_kernel
[[autodoc]] kernels.has_kernel
## Loading locked kernels
### load_kernel
[[autodoc]] kernels.load_kernel
### get_locked_kernel
[[autodoc]] kernels.get_locked_kernel

41
docs/source/api/layers.md Normal file
View File

@ -0,0 +1,41 @@
# Layers API Reference
## Making layers kernel-aware
### use_kernel_forward_from_hub
[[autodoc]] kernels.use_kernel_forward_from_hub
### replace_kernel_forward_from_hub
[[autodoc]] kernels.replace_kernel_forward_from_hub
## Registering kernel mappings
### use_kernel_mapping
[[autodoc]] kernels.use_kernel_mapping
### register_kernel_mapping
[[autodoc]] kernels.register_kernel_mapping
## Kernelizing a model
### kernelize
[[autodoc]] kernels.kernelize
## Classes
### Device
[[autodoc]] kernels.Device
### Mode
[[autodoc]] kernels.Mode
### LayerRepository
[[autodoc]] kernels.LayerRepository

View File

@ -0,0 +1,50 @@
# Basic Usage
## Loading Kernels
Here is how you would use the [activation](https://huggingface.co/kernels-community/activation) kernels from the Hugging Face Hub:
```python
import torch
from kernels import get_kernel
# Download optimized kernels from the Hugging Face hub
activation = get_kernel("kernels-community/activation")
# Create a random tensor
x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
# Run the kernel
y = torch.empty_like(x)
activation.gelu_fast(y, x)
print(y)
```
### Using version bounds
Kernels are versioned using tags of the form `v<major>.<minor>.<patch>`.
You can specify which version to download using Python version specifiers:
```python
import torch
from kernels import get_kernel
activation = get_kernel("kernels-community/activation", version=">=0.0.4,<0.1.0")
```
This will get the latest kernel tagged `v0.0.z` where `z` is at least 4. It
is strongly recommended to specify a version bound, since a kernel author
might push incompatible changes to the `main` branch.
## Checking Kernel Availability
You can check if a specific kernel is available for your environment:
```python
from kernels import has_kernel
# Check if kernel is available for current environment
is_available = has_kernel("kernels-community/activation")
print(f"Kernel available: {is_available}")
```

58
docs/source/cli.md Normal file
View File

@ -0,0 +1,58 @@
# Kernels CLI Reference
## Main Functions
### kernels check
You can use `kernels check` to test compliance of a kernel on the Hub.
This currently checks that the kernel:
- Supports the currently-required Python ABI version.
- Works on supported operating system versions.
For example:
```bash
$ kernels check kernels-community/flash-attn3
Checking variant: torch28-cxx11-cu128-aarch64-linux
🐍 Python ABI 3.9 compatible
🐧 manylinux_2_28 compatible
[...]
```
### kernels to-wheel
We strongly recommend downloading kernels from the Hub using the `kernels`
package, since this comes with large [benefits](index.md) over using Python
wheels. That said, some projects may require deployment of kernels as
wheels. The `kernels` utility provides a simple solution to this. You can
convert any Hub kernel into a set of wheels with the `to-wheel` command:
```bash
$ kernels to-wheel drbh/img2grey 1.1.2
☸ img2grey-1.1.2+torch27cu128cxx11-cp39-abi3-manylinux_2_28_x86_64.whl
☸ img2grey-1.1.2+torch26cu124cxx11-cp39-abi3-manylinux_2_28_x86_64.whl
☸ img2grey-1.1.2+torch26cu126cxx11-cp39-abi3-manylinux_2_28_x86_64.whl
☸ img2grey-1.1.2+torch27cu126cxx11-cp39-abi3-manylinux_2_28_x86_64.whl
☸ img2grey-1.1.2+torch26cu126cxx98-cp39-abi3-manylinux_2_28_x86_64.whl
☸ img2grey-1.1.2+torch27cu128cxx11-cp39-abi3-manylinux_2_28_aarch64.whl
☸ img2grey-1.1.2+torch26cu126cxx98-cp39-abi3-manylinux_2_28_aarch64.whl
☸ img2grey-1.1.2+torch27cu126cxx11-cp39-abi3-manylinux_2_28_aarch64.whl
☸ img2grey-1.1.2+torch26cu126cxx11-cp39-abi3-manylinux_2_28_aarch64.whl
☸ img2grey-1.1.2+torch26cu118cxx98-cp39-abi3-manylinux_2_28_x86_64.whl
☸ img2grey-1.1.2+torch26cu124cxx98-cp39-abi3-manylinux_2_28_x86_64.whl
☸ img2grey-1.1.2+torch26cu118cxx11-cp39-abi3-manylinux_2_28_x86_64.whl
☸ img2grey-1.1.2+torch27cu118cxx11-cp39-abi3-manylinux_2_28_x86_64.whl
```
### kernels upload
Use `kernels upload <dir_containing_build> --repo_id="hub-username/kernel"` to upload
your kernel builds to the Hub.
**Notes**:
- This will take care of creating a repository on the Hub with the `repo_id` provided.
- If a repo with the `repo_id` already exists and if it contains a `build` with the build variant
being uploaded, it will attempt to delete the files existing under it.
- Make sure to be authenticated (run `hf auth login` if not) to be able to perform uploads to the Hub.

41
docs/source/faq.md Normal file
View File

@ -0,0 +1,41 @@
# FAQ
## Kernel layers
### Why is the kernelization step needed as a separate step?
In earlier versions of `kernels`, a layer's `forward` method was replaced
by `use_kernel_forward_from_hub` and `replace_kernel_forward_from_hub`.
The new `forward` would dispatch to a kernel based on the device type,
whether a model was training, etc. However, this approach was
fundamentally incompatible with `torch.compile` since it relied
on data-dependent branching.
To avoid branching, we have to make dispatch decisions ahead of time,
which is what the `kernelize` function does.
### Why does kernelization only replace `forward` methods?
There are some other possible approaches. The first is to completely
replace existing layers by kernel layers. However, since this would
permit free-form layer classes, it would be much harder to validate
that layers are fully compatible with the layers that they are
replacing. For instance, they could have completely different member
variables. Besides that, we would also need to hold on to the original
layers, in case we need to revert to the base layers when the model
is `kernelize`d again with different options.
A second approach would be to make an auxiliary layer that wraps the
original layer and the kernel layer and dispatches to the kernel layer.
This wouldn't have the issues of the first approach, because kernel layers
could be similarly strict as they are now, and we would still have access
to the original layers when `kernelize`-ing the model again. However,
this would change the graph structure of the model and would break use
cases where programs access the model internals (e.g.
`model.layers[0].attention.query_weight`) or rely on the graph structure
in other ways.
The approach of `forward`-replacement is the least invasive, because
it preserves the original model graph. It is also reversible, since
even though the `forward` of a layer _instance_ might be replaced,
the corresponding class still has the original `forward`.

20
docs/source/index.md Normal file
View File

@ -0,0 +1,20 @@
# Kernels
<div align="center">
<img src="https://github.com/user-attachments/assets/64a652f3-0cd3-4829-b3c1-df13f7933569" width="450" height="450" alt="kernel-builder logo">
</div>
The Kernel Hub allows Python libraries and applications to load compute
kernels directly from the [Hub](https://hf.co/). To support this kind
of dynamic loading, Hub kernels differ from traditional Python kernel
packages in that they are made to be:
- **Portable**: a kernel can be loaded from paths outside `PYTHONPATH`.
- **Unique**: multiple versions of the same kernel can be loaded in the
same Python process.
- **Compatible**: kernels must support all recent versions of Python and
the different PyTorch build configurations (various CUDA versions
and C++ ABIs). Furthermore, older C library versions must be supported.
You can [search for kernels](https://huggingface.co/models?other=kernel) on
the Hub.

View File

@ -0,0 +1,16 @@
# Installation
Install the `kernels` package with `pip` (requires `torch>=2.5` and CUDA):
```bash
pip install kernels
```
# Using kernels in a Docker container
Build and run the reference `examples/basic.py` in a Docker container with the following commands:
```bash
docker build --platform linux/amd64 -t kernels-reference -f docker/Dockerfile.reference .
docker run --gpus all -it --rm -e HF_TOKEN=$HF_TOKEN kernels-reference
```

View File

@ -34,6 +34,8 @@ Kernels are versioned on the Hub using Git tags. Version tags must be of
the form `v<major>.<minor>.<patch>`. Versions are used by [locking](./locking.md)
to resolve the version constraints.
We recommend using [semver](https://semver.org/) to version kernels.
## Native Python module
Kernels will typically contain a native Python module with precompiled
@ -44,19 +46,28 @@ have dynamic library dependencies outside:
- Torch;
- CUDA/ROCm libraries installed as dependencies of Torch.
## Compatibility with torch.compile
The Kernel Hub also encourages to write the kernels in a `torch.compile`
compliant way. This helps to ensure that the kernels are compatible with
`torch.compile` without introducing any graph breaks and triggering
recompilation which can limit the benefits of compilation.
[Here](https://github.com/huggingface/kernel-builder/blob/d1ee9bf9301ac8c5199099d90ee1c9d5c789d5ba/examples/relu-backprop-compile/tests/test_relu.py#L162) is a simple test example which checks for graph breaks and
recompilation triggers during `torch.compile`.
### Linux
- Use [ABI3/Limited API](https://docs.python.org/3/c-api/stable.html#stable-application-binary-interface)
for compatibility with Python 3.9 and later.
- Compatible with [`manylinux_2_28`](https://github.com/pypa/manylinux?tab=readme-ov-file#manylinux_2_28-almalinux-8-based).
This means that the extension **must not** use symbols versions higher than:
- GLIBC 2.28
- GLIBCXX 3.4.24
- CXXABI 1.3.11
- GCC 7.0.0
These requirement can be checked with the ABI checker (see below).
These requirements can be checked with the ABI checker (see below).
### macOS

View File

@ -5,7 +5,7 @@ the Hub can replace the `forward` method of an existing layer for a certain
device type. This makes it possible to provide more performant kernels for
existing layers.
See [Kernel requirements](kernel-requirements.md) for more information the
See [Kernel requirements](kernel-requirements.md) for more information on the
requirements of Hub layers.
## Making a layer extensible with kernels from the hub
@ -56,22 +56,33 @@ model = MyModel(...)
model = kernelize(model, mode=Mode.INFERENCE)
```
The `kernelize` function modifies the model in-place, the model
itself is returned as a convenience.
The `mode` specifies that the model will be used in inference. Similarly,
you can ask `kernelize` to prepare the model for training:
The `kernelize` function modifies the model in-place, the model itself is
returned as a convenience. The `mode` specifies that the model will be used
in inference. Similarly, you can ask `kernelize` to prepare the model for
training:
```python
model = MyModel(...)
model = kernelize(model, mode=Mode.TRAINING)
```
When the `mode` argument is not specified, the
`Mode.TRAINING | Mode.TORCH_COMPILE` mode is used as the default. This mode
aligns most closely with pure PyTorch layers (which generally support backward
passes and `torch.compile`). However, this mode can also lead to fewer
kernels being used, since not all kernels support training or `torch.compile`.
A model that is kernelized for training can also be used for inference, but
not the other way around. If you want to change the mode of the kernelized
model, you can just run `kernelize` on the model again with the new mode.
If you want to compile a model with `torch.compile`, this should be indicated
in the mode as well. You can do this by combining `Mode.INFERENCE` or
`Mode.TRAINING` with `Mode.TORCH_COMPILE` using the set union (`|`) operator:
```python
model = MyModel(...)
# Inference
model = kernelize(model, mode=Mode.INFERENCE | Mode.TORCH_COMPILE)
# Training
model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
```
### Kernel device
@ -86,22 +97,11 @@ model = MyModel(...)
model = kernelize(model, device="cuda", mode=Mode.INFERENCE)
```
### `torch.compile`
Not all Hub kernels support `torch.compile`. If you want to compile a model
after kernelizing it, you need to add this to the mode. You can use the
set union (`|`) operator to add `TORCH_COMPILE` to the mode:
```python
model = MyModel(...)
model = kernelize(model, mode=Mode.INFERENCE | Mode.TORCH_COMPILE)
```
### Fallback `forward`
If the `TRAINING` and/or `TORCH_COMPILE` modes are used, but a registered
kernel does not support backward passes or `torch.compile` respectively,
`kernenize` will fall back to the original, non-kernelized, layer. You
`kernelize` will fall back to the original, non-kernelized, layer. You
can let `kernelize` raise an exception instead by using `use_fallback=False`:
```python
@ -111,6 +111,12 @@ model = kernelize(model, mode=Mode.INFERENCE | Mode.TORCH_COMPILE, use_fallback=
This can be useful if you want to guarantee that Hub kernels are used.
### Inspecting which kernels are used
The kernels that are used are logged at the `INFO` level by `kernelize`.
See the [Python logging](https://docs.python.org/3/library/logging.html)
documentation for information on how to configure logging.
## Registering a hub kernel for a layer
`kernelize` relies on kernel mappings to find Hub kernels for layers.
@ -123,6 +129,10 @@ kernel_layer_mapping = {
"cuda": LayerRepository(
repo_id="kernels-community/activation",
layer_name="SiluAndMul",
),
"rocm": LayerRepository(
repo_id="kernels-community/activation",
layer_name="SiluAndMul",
)
}
}
@ -141,12 +151,39 @@ used with the `use_kernel_mapping` context manager:
```python
with use_kernel_mapping(kernel_layer_mapping):
# Use the layer for which the mapping is applied.
model = kernelize(model)
model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
```
This ensures that the mapping is not active anymore outside the
`with`-scope.
### Using version bounds
Kernels are versioned using tags of the form `v<major>.<minor>.<patch>`.
You can specify which version of the kernel to download using Python version
specifiers:
```python
kernel_layer_mapping = {
"SiluAndMul": {
"cuda": LayerRepository(
repo_id="kernels-community/activation",
layer_name="SiluAndMul",
version=">=0.0.4,<0.1.0",
),
"rocm": LayerRepository(
repo_id="kernels-community/activation",
layer_name="SiluAndMul",
version=">=0.0.4,<0.1.0",
)
}
}
```
This will get the layer from latest kernel tagged `v0.0.z` where `z` is at
least 4. It is strongly recommended to specify a version bound, since a
kernel author might push incompatible changes to the `main` branch.
### Registering kernels for specific modes
You might want to register two different kernels for a particular layer,
@ -170,18 +207,25 @@ kernel_layer_mapping = {
}
```
The kernels will match exactly on the mode. So, for instance in the example above, no kernel
layer is used when the `mode` passed to `kernelize` is
`Mode.INFERENCE | Mode.TORCH_COMPILE` or `Mode.TRAINING`. However, if you want to
register a kernel to be used when the mode does not match any of the
modes in the mapping, you can use the special `Mode.DEFAULT` mode to do
so. For example:
The `kernelize` function will attempt to use the following registered
kernels for a given mode:
- `INFERENCE`: `INFERENCE``INFERENCE | TORCH_COMPILE``TRAINING`
`TRAINING | TORCH_COMPILE``FALLBACK`
- `INFERENCE | TORCH_COMPILE`: `INFERENCE | TORCH_COMPILE`
`TRAINING | TORCH_COMPILE``FALLBACK`
- `TRAINING`: `TRAINING``TRAINING | TORCH_COMPILE``FALLBACK`
- `TRAINING | TORCH_COMPILE`: `TRAINING | TORCH_COMPILE``FALLBACK`
`Mode.FALLBACK` is a special mode that is used when no other mode matches. It
is also used when a kernel is registered without a mode, as described in the
previous section.
```python
kernel_layer_mapping = {
"SiluAndMul": {
"cuda": {
Mode.DEFAULT: LayerRepository(
Mode.FALLBACK: LayerRepository(
repo_id="kernels-community/activation",
layer_name="SiluAndMul",
),
@ -189,7 +233,7 @@ kernel_layer_mapping = {
repo_id="kernels-community/activation-inference-optimized",
layer_name="SiluAndMul",
),
Mode.TRAINING | Mode.TORCH_COMPILE: LayerRepository(
Mode.TRAINING: LayerRepository(
repo_id="kernels-community/activation-training-optimized",
layer_name="SiluAndMul",
),
@ -198,34 +242,9 @@ kernel_layer_mapping = {
}
```
In this case, modes other than `Mode.INFERENCE` and
`Mode.TRAINING | Mode.TORCH_COMPILE` will be kernelized using
`kernels-community/activation`.
### Mode fallback behavior
As described above, if there is no exact match for the mode given to
`kernelize`, it will try to use the kernel registered for `Mode.DEFAULT`.
If the `Mode.DEFAULT` kernel does not support the `kernelize` mode, the
original layer's `forward` method will be used instead.
As an example, suppose that two kernels were registered for a layer:
1. Kernel `A` is registered for `Mode.DEFAULT`. This kernel supports training
(backward), but not `torch.compile`.
2. Kernel `B` is registered for `Mode.INFERENCE | Mode.COMPILE` and supports
`torch.compile`.
`kernelize` modes will then behave as follows:
- `Mode.INFERENCE | Mode.COMPILE`` uses kernel `B`: exact match.
- `Mode.INFERENCE` uses kernel `A`: no exact match, so fall back to
`Mode.DEFAULT`.
- `Mode.TRAIN` uses kernel `A`: no exact match, so fall back to
`Mode.DEFAULT`, which supports training.
- `Mode.TRAIN | Mode.COMPILE`: uses the original layer's
`forward`: no exact match, falling back to `Mode.DEFAULT` is not possible
because kernel `A` does not support `torch.compile`.
In this case, both `Mode.INFERENCE | Mode.TORCH_COMPILE` and
`Mode.TRAINING | Mode.TORCH_COMPILE` will use the `Mode.FALLBACK` kernel,
since the other kernels do not support `torch.compile`.
### Registering kernels for specific CUDA capabilities
@ -267,7 +286,6 @@ Capabilities behave as follows:
an existing kernel, the new kernel will replace the old kernel.
- When there are multiple kernels that support a capability, the kernel
with the smaller capability interval will be used. E.g. given:
- `KernelA` with `min_capability=80` and `max_capability=89`;
- `KernelB` with `min_capability=75` and `max_capability=89`;
- `kernelize` runs on a system with capability 8.6.
@ -276,3 +294,30 @@ Capabilities behave as follows:
than 75..89. The motivation is that kernels with smaller ranges
tend to be more optimized for a specific set of GPUs. **This behavior
might still change in the future.**
### Registering kernels for specific ROCm capabilities
Registering kernels for the ROCm architecture follows the exact same
pattern as CUDA kernels, using `min_capability` and `max_capability` to restrict
a kernel to a range of ROCm capabilities.
### Loading from a local repository for testing
The `LocalLayerRepository` class is provided to load a repository from
a local directory. For example:
```python
with use_kernel_mapping(
{
"SiluAndMul": {
"cuda": LocalLayerRepository(
repo_path="/home/daniel/kernels/activation",
package_name="activation",
layer_name="SiluAndMul",
)
}
},
inherit_mapping=False,
):
kernelize(linear, mode=Mode.INFERENCE)
```

View File

@ -1,4 +1,4 @@
# Locking kernel versions
# Locking kernel/layer versions
Projects that use `setuptools` can lock the kernel versions that should be
used. First specify the accepted versions in `pyproject.toml` and make
@ -26,6 +26,24 @@ activation = get_locked_kernel("kernels-community/activation")
**Note:** the lock file is included in the package metadata, so it will only be visible
to `kernels` after doing an (editable or regular) installation of your project.
## Locked kernel layers
Locking is also supported for kernel layers. To use locked layers, register them
with the `LockedLayerRepository` class:
```python
kernel_layer_mapping = {
"SiluAndMul": {
"cuda": LockedLayerRepository(
repo_id="kernels-community/activation",
layer_name="SiluAndMul",
)
}
}
register_kernel_mapping(kernel_layer_mapping)
```
## Pre-downloading locked kernels
Locked kernels can be pre-downloaded by running `kernels download .` in your

View File

@ -20,11 +20,11 @@ activation.gelu_fast(y, x)
print("Kernel successfully executed")
# Check results
expected = torch.tensor([
[0.8408, 1.9551, 2.9961],
[4.0000, 5.0000, 6.0000],
[7.0000, 8.0000, 9.0000]
], device='cuda:0', dtype=torch.float16)
expected = torch.tensor(
[[0.8408, 1.9551, 2.9961], [4.0000, 5.0000, 6.0000], [7.0000, 8.0000, 9.0000]],
device="cuda:0",
dtype=torch.float16,
)
assert torch.allclose(y, expected)
print("Calculated values are exact")

18
flake.lock generated
View File

@ -58,11 +58,11 @@
"nixpkgs": "nixpkgs"
},
"locked": {
"lastModified": 1750775451,
"narHash": "sha256-HiGqtwzIgUH7Xkh+wgpvHRZGooqrW0z663E6nauczA4=",
"lastModified": 1754038838,
"narHash": "sha256-oHigCT4z0ayyLyEuxdZooSXRAZP8lfOkZHzY1lx1U50=",
"owner": "huggingface",
"repo": "hf-nix",
"rev": "5943c3169e861618a6634bc8dbdb498e413ab9b7",
"rev": "336f781fa284e193baa3d4c3ce3f95fb34e9ffad",
"type": "github"
},
"original": {
@ -73,17 +73,17 @@
},
"nixpkgs": {
"locked": {
"lastModified": 1747820358,
"narHash": "sha256-fTqsZsUX6M3yeEvgyQvXcbGmT2CaRVyVwsi8eK29Oj4=",
"owner": "danieldk",
"lastModified": 1752785354,
"narHash": "sha256-Y33ryUz7MPqKrZwlbQcsYCUz2jAJCacRf8jbs0tYUlA=",
"owner": "nixos",
"repo": "nixpkgs",
"rev": "d3c1681180717528068082103bf323147de6ab0b",
"rev": "d38025438a6ee456758dc03188ca6873a415463b",
"type": "github"
},
"original": {
"owner": "danieldk",
"ref": "cudatoolkit-12.9-kernel-builder",
"owner": "nixos",
"repo": "nixpkgs",
"rev": "d38025438a6ee456758dc03188ca6873a415463b",
"type": "github"
}
},

View File

@ -24,8 +24,13 @@
in
{
formatter = pkgs.nixfmt-tree;
packages.kernel-abi-check = pkgs.python3.pkgs.callPackage ./nix/kernel-abi-check.nix {};
devShells = with pkgs; rec {
default = mkShell {
nativeBuildInputs = [
# For hf-doc-builder.
nodejs
];
buildInputs =
[
black
@ -36,6 +41,8 @@
++ (with python3.pkgs; [
docutils
huggingface-hub
(callPackage ./nix/kernel-abi-check.nix {})
mktestdocs
pytest
pytest-benchmark
pyyaml

27
nix/kernel-abi-check.nix Normal file
View File

@ -0,0 +1,27 @@
{
buildPythonPackage,
fetchPypi,
rustPlatform,
}:
buildPythonPackage rec {
pname = "kernel-abi-check";
version = "0.6.2";
src = fetchPypi {
inherit version;
pname = "kernel_abi_check";
hash = "sha256-goWC7SK79FVNEvkp3bISBwbOqdSrmobANtrWIve9/Ys=";
};
cargoDeps = rustPlatform.fetchCargoVendor {
inherit pname version src sourceRoot;
hash = "sha256-+1jdbKsDKmG+bf0NEVYMv8t7Meuge1z2cgYfbdB9q8A=";
};
sourceRoot = "kernel_abi_check-${version}/bindings/python";
pyproject = true;
nativeBuildInputs = with rustPlatform; [ cargoSetupHook maturinBuildHook ];
}

View File

@ -1,6 +1,6 @@
[project]
name = "kernels"
version = "0.7.0.dev0"
version = "0.10.2.dev0"
description = "Download compute kernels"
authors = [
{ name = "OlivierDehaene", email = "olivier@huggingface.co" },
@ -12,7 +12,7 @@ license = { text = "Apache-2.0" }
readme = "README.md"
requires-python = ">= 3.9"
dependencies = [
"huggingface_hub>=0.26.0,<1.0",
"huggingface_hub>=0.26.0,<2.0",
"packaging>=20.0",
"pyyaml>=6",
"tomli>=2.0; python_version<'3.11'",
@ -24,16 +24,21 @@ build-backend = "setuptools.build_meta"
[dependency-groups]
dev = [
"mypy >= 1.15.0",
"pytest >=8",
"mktestdocs>=0.2.5",
"mypy>=1.15.0",
"pytest>=8",
# Whatever version is compatible with pytest.
"pytest-benchmark",
"torch >=2.5",
"torch>=2.5",
"types-pyyaml"
]
[project.optional-dependencies]
abi-check = ["kernel-abi-check>=0.6.2,<0.7.0"]
torch = ["torch"]
docs = [
"hf-doc-builder",
]
[project.scripts]
kernels = "kernels.cli:main"
@ -41,6 +46,9 @@ kernels = "kernels.cli:main"
[project.entry-points."egg_info.writers"]
"kernels.lock" = "kernels.lockfile:write_egg_lockfile"
[tool.isort]
profile = "black"
line_length = 119
[tool.ruff]
exclude = [
@ -67,4 +75,4 @@ line-length = 119
# Ignored rules:
# "E501" -> line length violation
lint.ignore = ["E501"]
lint.select = ["E", "F", "I", "W"]
lint.select = ["E", "F", "W"]

View File

@ -1,4 +1,9 @@
[pytest]
markers =
cuda_only: marks tests that should only hosts with CUDA GPUs
rocm_only: marks tests that should only run on hosts with ROCm GPUs
darwin_only: marks tests that should only run on macOS
linux_only: marks tests that should only run on Linux
xpu_only: marks tests that should only run on hosts with Intel XPUs
npu_only: marks tests that should only run on Ascend NPUs
token: enable tests that require a write token
is_staging_test: Marks tests that should only run on a staging environment

View File

@ -1,7 +1,13 @@
import importlib.metadata
__version__ = importlib.metadata.version("kernels")
from kernels.layer import (
CUDAProperties,
Device,
LayerRepository,
LocalLayerRepository,
LockedLayerRepository,
Mode,
kernelize,
register_kernel_mapping,
@ -19,9 +25,12 @@ from kernels.utils import (
)
__all__ = [
"__version__",
"CUDAProperties",
"Device",
"LayerRepository",
"LocalLayerRepository",
"LockedLayerRepository",
"Mode",
"get_kernel",
"get_local_kernel",

52
src/kernels/_versions.py Normal file
View File

@ -0,0 +1,52 @@
from typing import Dict, Optional
from huggingface_hub import HfApi
from huggingface_hub.hf_api import GitRefInfo
from packaging.specifiers import SpecifierSet
from packaging.version import InvalidVersion, Version
def _get_available_versions(repo_id: str) -> Dict[Version, GitRefInfo]:
"""Get kernel versions that are available in the repository."""
versions = {}
for tag in HfApi().list_repo_refs(repo_id).tags:
if not tag.name.startswith("v"):
continue
try:
versions[Version(tag.name[1:])] = tag
except InvalidVersion:
continue
return versions
def resolve_version_spec_as_ref(repo_id: str, version_spec: str) -> GitRefInfo:
"""
Get the locks for a kernel with the given version spec.
The version specifier can be any valid Python version specifier:
https://packaging.python.org/en/latest/specifications/version-specifiers/#version-specifiers
"""
versions = _get_available_versions(repo_id)
requirement = SpecifierSet(version_spec)
accepted_versions = sorted(requirement.filter(versions.keys()))
if len(accepted_versions) == 0:
raise ValueError(
f"No version of `{repo_id}` satisfies requirement: {version_spec}"
)
return versions[accepted_versions[-1]]
def select_revision_or_version(
repo_id: str, revision: Optional[str], version: Optional[str]
) -> str:
if revision is not None and version is not None:
raise ValueError("Either a revision or a version must be specified, not both.")
elif revision is None and version is None:
revision = "main"
elif version is not None:
revision = resolve_version_spec_as_ref(repo_id, version).target_commit
assert revision is not None
return revision

141
src/kernels/check.py Normal file
View File

@ -0,0 +1,141 @@
from pathlib import Path
import sys
from huggingface_hub import snapshot_download
from kernels.utils import CACHE_DIR
from kernel_abi_check import (
BinaryFormat,
IncompatibleMacOSVersion,
ObjectFile,
IncompatibleAbi3Symbol,
NonAbi3Symbol,
IncompatibleManylinuxSymbol,
MissingMacOSVersion,
)
def check_kernel(
*, macos: str, manylinux: str, python_abi: str, repo_id: str, revision: str
):
variants_path = (
Path(
snapshot_download(
repo_id,
allow_patterns=["build/*"],
cache_dir=CACHE_DIR,
revision=revision,
)
)
/ "build"
)
has_issues = False
for variant_path in variants_path.iterdir():
if not variant_path.is_dir():
print(
f"⛔ `build/` must only contain directories, found: {variant_path.name}",
file=sys.stderr,
)
has_issues = True
continue
print(f"Checking variant: {variant_path.name}", file=sys.stderr)
indent = 2
for dylib_path in variant_path.rglob("*.so"):
print_with_indent(
indent,
f"Dynamic library {dylib_path.relative_to(variant_path)}:",
)
o = ObjectFile(dylib_path)
has_issues |= check_abi3(o, python_abi, indent + 2)
# TODO: also check operating system
if o.format() == BinaryFormat.ELF:
has_issues |= check_manylinux(o, manylinux, indent + 2)
elif o.format() == BinaryFormat.MACH_O:
has_issues |= check_macos(o, macos, indent + 2)
if has_issues:
sys.exit(1)
def check_abi3(object_file: ObjectFile, python_abi: str, indent: int) -> bool:
has_issues = False
violations = object_file.check_python_abi(python_abi)
if violations != []:
has_issues = True
print_with_indent(
indent,
f"⛔ Found symbols that are incompatible with Python ABI {python_abi}:",
)
for violation in violations:
if isinstance(violation, IncompatibleAbi3Symbol):
print_with_indent(
indent + 3,
f"{violation.name}: {violation.version_added}",
)
elif isinstance(violation, NonAbi3Symbol):
print_with_indent(
indent + 3,
f"{violation.name}",
)
else:
print_with_indent(indent, f"🐍 Python ABI {python_abi} compatible")
return has_issues
def check_macos(object_file: ObjectFile, macos: str, indent: int) -> bool:
has_issues = False
violations = object_file.check_macos(macos)
if violations != []:
has_issues = True
print_with_indent(
indent,
f"⛔ Found incompatibility with macOS {macos}:",
)
for violation in violations:
if isinstance(violation, MissingMacOSVersion):
print_with_indent(
indent + 3,
"shared library does not contain macOS version",
)
elif isinstance(violation, IncompatibleMacOSVersion):
print_with_indent(
indent + 3,
f"shared library requires macOS {violation.version}",
)
else:
print_with_indent(indent, f"🍏 compatible with macOS {macos}")
return has_issues
def check_manylinux(object_file: ObjectFile, manylinux: str, indent: int) -> bool:
has_issues = False
violations = object_file.check_manylinux(manylinux)
if violations != []:
has_issues = True
print_with_indent(
indent,
f"⛔ Found symbols that are incompatible with {manylinux}:",
)
for violation in violations:
if isinstance(violation, IncompatibleManylinuxSymbol):
print_with_indent(
indent + 3,
f"{violation.name}_{violation.dep}: {violation.version}",
)
else:
print_with_indent(indent, f"🐧 {manylinux} compatible")
return has_issues
def print_with_indent(indent: int, message: str):
print(f"{' ' * indent}{message}", file=sys.stderr)

View File

@ -4,6 +4,8 @@ import json
import sys
from pathlib import Path
from huggingface_hub import create_repo, upload_folder
from kernels.compat import tomllib
from kernels.lockfile import KernelLock, get_kernel_locks
from kernels.utils import install_kernel, install_kernel_all_variants
@ -18,6 +20,31 @@ def main():
)
subparsers = parser.add_subparsers(required=True)
check_parser = subparsers.add_parser("check", help="Check a kernel for compliance")
check_parser.add_argument("repo_id", type=str, help="The kernel repo ID")
check_parser.add_argument(
"--revision",
type=str,
default="main",
help="The kernel revision (branch, tag, or commit SHA, defaults to 'main')",
)
check_parser.add_argument("--macos", type=str, help="macOS version", default="15.0")
check_parser.add_argument(
"--manylinux", type=str, help="Manylinux version", default="manylinux_2_28"
)
check_parser.add_argument(
"--python-abi", type=str, help="Python ABI version", default="3.9"
)
check_parser.set_defaults(
func=lambda args: check_kernel(
macos=args.macos,
manylinux=args.manylinux,
python_abi=args.python_abi,
repo_id=args.repo_id,
revision=args.revision,
)
)
download_parser = subparsers.add_parser("download", help="Download locked kernels")
download_parser.add_argument(
"project_dir",
@ -31,6 +58,24 @@ def main():
)
download_parser.set_defaults(func=download_kernels)
upload_parser = subparsers.add_parser("upload", help="Upload kernels to the Hub")
upload_parser.add_argument(
"kernel_dir",
type=Path,
help="Directory of the kernel build",
)
upload_parser.add_argument(
"--repo_id",
type=str,
help="Repository ID to use to upload to the Hugging Face Hub",
)
upload_parser.add_argument(
"--private",
action="store_true",
help="If the repository should be private.",
)
upload_parser.set_defaults(func=upload_kernels)
lock_parser = subparsers.add_parser("lock", help="Lock kernel revisions")
lock_parser.add_argument(
"project_dir",
@ -153,8 +198,56 @@ def lock_kernels(args):
json.dump(all_locks, f, cls=_JSONEncoder, indent=2)
def upload_kernels(args):
kernel_dir = Path(args.kernel_dir).resolve()
build_dir = kernel_dir / "build"
if not kernel_dir.is_dir():
raise ValueError(f"{kernel_dir} is not a directory")
if not build_dir.is_dir():
raise ValueError("Couldn't find `build` directory inside `kernel_dir`")
repo_id = create_repo(
repo_id=args.repo_id, private=args.private, exist_ok=True
).repo_id
delete_patterns: set[str] = set()
for build_variant in build_dir.iterdir():
if build_variant.is_dir():
delete_patterns.add(f"{build_variant.name}/**")
upload_folder(
repo_id=repo_id,
folder_path=build_dir,
path_in_repo="build",
delete_patterns=list(delete_patterns),
commit_message="Build uploaded using `kernels`.",
)
print(f"✅ Kernel upload successful. Find the kernel in https://hf.co/{repo_id}.")
class _JSONEncoder(json.JSONEncoder):
def default(self, o):
if dataclasses.is_dataclass(o):
return dataclasses.asdict(o)
return super().default(o)
def check_kernel(
*, macos: str, manylinux: str, python_abi: str, repo_id: str, revision: str
):
try:
import kernels.check
except ImportError:
print(
"`kernels check` requires the `kernel-abi-check` package: pip install kernel-abi-check",
file=sys.stderr,
)
sys.exit(1)
kernels.check.check_kernel(
macos=macos,
manylinux=manylinux,
python_abi=python_abi,
repo_id=repo_id,
revision=revision,
)

File diff suppressed because it is too large Load Diff

View File

@ -4,10 +4,8 @@ from pathlib import Path
from typing import Dict, List, Tuple
from huggingface_hub import HfApi
from huggingface_hub.hf_api import GitRefInfo
from packaging.specifiers import SpecifierSet
from packaging.version import InvalidVersion, Version
from kernels._versions import resolve_version_spec_as_ref
from kernels.compat import tomllib
@ -31,20 +29,6 @@ class KernelLock:
return cls(repo_id=o["repo_id"], sha=o["sha"], variants=variants)
def _get_available_versions(repo_id: str) -> Dict[Version, GitRefInfo]:
"""Get kernel versions that are available in the repository."""
versions = {}
for tag in HfApi().list_repo_refs(repo_id).tags:
if not tag.name.startswith("v"):
continue
try:
versions[Version(tag.name[1:])] = tag
except InvalidVersion:
continue
return versions
def get_kernel_locks(repo_id: str, version_spec: str) -> KernelLock:
"""
Get the locks for a kernel with the given version spec.
@ -52,16 +36,7 @@ def get_kernel_locks(repo_id: str, version_spec: str) -> KernelLock:
The version specifier can be any valid Python version specifier:
https://packaging.python.org/en/latest/specifications/version-specifiers/#version-specifiers
"""
versions = _get_available_versions(repo_id)
requirement = SpecifierSet(version_spec)
accepted_versions = sorted(requirement.filter(versions.keys()))
if len(accepted_versions) == 0:
raise ValueError(
f"No version of `{repo_id}` satisfies requirement: {version_spec}"
)
tag_for_newest = versions[accepted_versions[-1]]
tag_for_newest = resolve_version_spec_as_ref(repo_id, version_spec)
r = HfApi().repo_info(
repo_id=repo_id, revision=tag_for_newest.target_commit, files_metadata=True

View File

@ -16,6 +16,7 @@ from typing import Dict, List, Optional, Tuple
from huggingface_hub import file_exists, snapshot_download
from packaging.version import parse
from kernels._versions import select_revision_or_version
from kernels.lockfile import KernelLock, VariantLock
@ -34,6 +35,14 @@ def _get_cache_dir() -> Optional[str]:
CACHE_DIR: Optional[str] = _get_cache_dir()
def _get_privateuse_backend_name() -> Optional[str]:
import torch
if hasattr(torch._C, "_get_privateuse1_backend_name"):
return torch._C._get_privateuse1_backend_name()
return None
def build_variant() -> str:
import torch
@ -45,9 +54,17 @@ def build_variant() -> str:
compute_framework = f"rocm{rocm_version.major}{rocm_version.minor}"
elif torch.backends.mps.is_available():
compute_framework = "metal"
elif torch.version.xpu is not None:
version = torch.version.xpu
compute_framework = f"xpu{version[0:4]}{version[5:6]}"
elif _get_privateuse_backend_name() == "npu":
from torch_npu.utils.collect_env import get_cann_version # type: ignore[import-not-found]
cann_major, cann_minor = get_cann_version()[0], get_cann_version()[2]
compute_framework = f"cann{cann_major}{cann_minor}"
else:
raise AssertionError(
"Torch was not compiled with CUDA, Metal, or ROCm enabled."
"Torch was not compiled with CUDA, Metal, XPU, NPU, or ROCm enabled."
)
torch_version = parse(torch.__version__)
@ -95,7 +112,20 @@ def install_kernel(
"""
Download a kernel for the current environment to the cache.
The output path is validated againt `hash` when set.
The output path is validated against the hashes in `variant_locks` when provided.
Args:
repo_id (`str`):
The Hub repository containing the kernel.
revision (`str`):
The specific revision (branch, tag, or commit) to download.
local_files_only (`bool`, *optional*, defaults to `False`):
Whether to only use local files and not download from the Hub.
variant_locks (`Dict[str, VariantLock]`, *optional*):
Optional dictionary of variant locks for validation.
Returns:
`Tuple[str, Path]`: A tuple containing the package name and the path to the variant directory.
"""
package_name = package_name_from_repo_id(repo_id)
variant = build_variant()
@ -182,13 +212,39 @@ def install_kernel_all_variants(
return repo_path / "build"
def get_kernel(repo_id: str, revision: str = "main") -> ModuleType:
def get_kernel(
repo_id: str, revision: Optional[str] = None, version: Optional[str] = None
) -> ModuleType:
"""
Download and import a kernel from the Hugging Face Hub.
Load a kernel from the kernel hub.
The kernel is downloaded from the repository `repo_id` at
branch/commit/tag `revision`.
This function downloads a kernel to the local Hugging Face Hub cache directory (if it was not downloaded before)
and then loads the kernel.
Args:
repo_id (`str`):
The Hub repository containing the kernel.
revision (`str`, *optional*, defaults to `"main"`):
The specific revision (branch, tag, or commit) to download. Cannot be used together with `version`.
version (`str`, *optional*):
The kernel version to download. This can be a Python version specifier, such as `">=1.0.0,<2.0.0"`.
Cannot be used together with `revision`.
Returns:
`ModuleType`: The imported kernel module.
Example:
```python
import torch
from kernels import get_kernel
activation = get_kernel("kernels-community/activation")
x = torch.randn(10, 20, device="cuda")
out = torch.empty_like(x)
result = activation.silu_and_mul(out, x)
```
"""
revision = select_revision_or_version(repo_id, revision, version)
package_name, package_path = install_kernel(repo_id, revision=revision)
return import_from_path(package_name, package_path / package_name / "__init__.py")
@ -196,16 +252,56 @@ def get_kernel(repo_id: str, revision: str = "main") -> ModuleType:
def get_local_kernel(repo_path: Path, package_name: str) -> ModuleType:
"""
Import a kernel from a local kernel repository path.
Args:
repo_path (`Path`):
The local path to the kernel repository.
package_name (`str`):
The name of the package to import from the repository.
Returns:
`ModuleType`: The imported kernel module.
"""
package_name, package_path = _load_kernel_from_path(repo_path, package_name)
return import_from_path(package_name, package_path / package_name / "__init__.py")
variant = build_variant()
universal_variant = universal_build_variant()
# Presume we were given the top level path of the kernel repository.
for base_path in [repo_path, repo_path / "build"]:
# Prefer the universal variant if it exists.
for v in [universal_variant, variant]:
package_path = base_path / v / package_name / "__init__.py"
if package_path.exists():
return import_from_path(package_name, package_path)
# If we didn't find the package in the repo we may have a explicit
# package path.
package_path = repo_path / package_name / "__init__.py"
if package_path.exists():
return import_from_path(package_name, package_path)
raise FileNotFoundError(f"Could not find package '{package_name}' in {repo_path}")
def has_kernel(repo_id: str, revision: str = "main") -> bool:
def has_kernel(
repo_id: str, revision: Optional[str] = None, version: Optional[str] = None
) -> bool:
"""
Check whether a kernel build exists for the current environment
(Torch version and compute framework).
Check whether a kernel build exists for the current environment (Torch version and compute framework).
Args:
repo_id (`str`):
The Hub repository containing the kernel.
revision (`str`, *optional*, defaults to `"main"`):
The specific revision (branch, tag, or commit) to download. Cannot be used together with `version`.
version (`str`, *optional*):
The kernel version to download. This can be a Python version specifier, such as `">=1.0.0,<2.0.0"`.
Cannot be used together with `revision`.
Returns:
`bool`: `True` if a kernel is available for the current environment.
"""
revision = select_revision_or_version(repo_id, revision, version)
package_name = package_name_from_repo_id(repo_id)
variant = build_variant()
universal_variant = universal_build_variant()
@ -228,8 +324,16 @@ def load_kernel(repo_id: str, *, lockfile: Optional[Path] = None) -> ModuleType:
"""
Get a pre-downloaded, locked kernel.
If `lockfile` is not specified, the lockfile will be loaded from the
caller's package metadata.
If `lockfile` is not specified, the lockfile will be loaded from the caller's package metadata.
Args:
repo_id (`str`):
The Hub repository containing the kernel.
lockfile (`Path`, *optional*):
Path to the lockfile. If not provided, the lockfile will be loaded from the caller's package metadata.
Returns:
`ModuleType`: The imported kernel module.
"""
if lockfile is None:
locked_sha = _get_caller_locked_kernel(repo_id)
@ -274,7 +378,18 @@ def load_kernel(repo_id: str, *, lockfile: Optional[Path] = None) -> ModuleType:
def get_locked_kernel(repo_id: str, local_files_only: bool = False) -> ModuleType:
"""Get a kernel using a lock file."""
"""
Get a kernel using a lock file.
Args:
repo_id (`str`):
The Hub repository containing the kernel.
local_files_only (`bool`, *optional*, defaults to `False`):
Whether to only use local files and not download from the Hub.
Returns:
`ModuleType`: The imported kernel module.
"""
locked_sha = _get_caller_locked_kernel(repo_id)
if locked_sha is None:

View File

@ -1,10 +1,46 @@
import sys
import pytest
import torch
from kernels.utils import _get_privateuse_backend_name
has_cuda = (
hasattr(torch.version, "cuda")
and torch.version.cuda is not None
and torch.cuda.device_count() > 0
)
has_rocm = (
hasattr(torch.version, "hip")
and torch.version.hip is not None
and torch.cuda.device_count() > 0
)
has_xpu = (
hasattr(torch.version, "xpu")
and torch.version.xpu is not None
and torch.xpu.device_count() > 0
)
has_npu = _get_privateuse_backend_name() == "npu"
def pytest_addoption(parser):
parser.addoption(
"--token",
action="store_true",
help="run tests that require a token with write permissions",
)
def pytest_runtest_setup(item):
if "linux_only" in item.keywords and not sys.platform.startswith("linux"):
pytest.skip("skipping Linux-only test on non-Linux platform")
if "cuda_only" in item.keywords and not has_cuda:
pytest.skip("skipping CUDA-only test on host without CUDA")
if "rocm_only" in item.keywords and not has_rocm:
pytest.skip("skipping ROCm-only test on host without ROCm")
if "darwin_only" in item.keywords and not sys.platform.startswith("darwin"):
pytest.skip("skipping macOS-only test on non-macOS platform")
if "xpu_only" in item.keywords and not has_xpu:
pytest.skip("skipping XPU-only test on host without XPU")
if "npu_only" in item.keywords and not has_npu:
pytest.skip("skipping NPU-only test on host without NPU")
if "token" in item.keywords and not item.config.getoption("--token"):
pytest.skip("need --token option to run this test")

View File

@ -0,0 +1,12 @@
[
{
"repo_id": "kernels-test/versions",
"sha": "dc142fd6c9920c993d32be6358b78957c58681c3",
"variants": {
"torch-universal": {
"hash": "sha256-35ce0ccfe68e392cbc06feef72268f4c41a74b9920496a2c6ee8978db7f7c17c",
"hash_type": "git_lfs_concat"
}
}
}
]

View File

@ -0,0 +1,2 @@
[tool.kernels.dependencies]
"kernels-test/versions" = ">=0.1.0,<0.2.0"

View File

@ -10,10 +10,16 @@ def kernel():
@pytest.fixture
def local_kernel():
def local_kernel_path():
package_name, path = install_kernel("kernels-community/activation", "main")
# Path is the build variant path (build/torch-<...>), so the grandparent
# is the kernel repository path.
return package_name, path
@pytest.fixture
def local_kernel(local_kernel_path):
package_name, path = local_kernel_path
return get_local_kernel(path.parent.parent, package_name)
@ -34,7 +40,7 @@ def device():
return "cuda"
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_gelu_fast(kernel, device):
x = torch.arange(1, 10, dtype=torch.float16, device=device).view(3, 3)
y = torch.empty_like(x)
@ -50,7 +56,7 @@ def test_gelu_fast(kernel, device):
assert torch.allclose(y, expected)
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_local_kernel(local_kernel, device):
x = torch.arange(1, 10, dtype=torch.float16, device=device).view(3, 3)
y = torch.empty_like(x)
@ -66,6 +72,39 @@ def test_local_kernel(local_kernel, device):
assert torch.allclose(y, expected)
@pytest.mark.cuda_only
def test_local_kernel_path_types(local_kernel_path, device):
package_name, path = local_kernel_path
# Top-level repo path
# ie: /home/ubuntu/.cache/huggingface/hub/models--kernels-community--activation/snapshots/2fafa6a3a38ccb57a1a98419047cf7816ecbc071
kernel = get_local_kernel(path.parent.parent, package_name)
x = torch.arange(1, 10, dtype=torch.float16, device=device).view(3, 3)
y = torch.empty_like(x)
kernel.gelu_fast(y, x)
expected = torch.tensor(
[[0.8408, 1.9551, 2.9961], [4.0000, 5.0000, 6.0000], [7.0000, 8.0000, 9.0000]],
device=device,
dtype=torch.float16,
)
assert torch.allclose(y, expected)
# Build directory path
# ie: /home/ubuntu/.cache/huggingface/hub/models--kernels-community--activation/snapshots/2fafa6a3a38ccb57a1a98419047cf7816ecbc071/build
kernel = get_local_kernel(path.parent.parent / "build", package_name)
y = torch.empty_like(x)
kernel.gelu_fast(y, x)
assert torch.allclose(y, expected)
# Explicit package path
# ie: /home/ubuntu/.cache/huggingface/hub/models--kernels-community--activation/snapshots/2fafa6a3a38ccb57a1a98419047cf7816ecbc071/build/torch28-cxx11-cu128-x86_64-linux
kernel = get_local_kernel(path, package_name)
y = torch.empty_like(x)
kernel.gelu_fast(y, x)
assert torch.allclose(y, expected)
@pytest.mark.darwin_only
@pytest.mark.parametrize("dtype", [torch.float16, torch.float32])
def test_relu_metal(metal_kernel, dtype):
@ -74,7 +113,7 @@ def test_relu_metal(metal_kernel, dtype):
assert torch.allclose(y, torch.relu(x))
@pytest.mark.linux_only
@pytest.mark.cuda_only
@pytest.mark.parametrize(
"kernel_exists",
[
@ -91,7 +130,26 @@ def test_has_kernel(kernel_exists):
assert has_kernel(repo_id, revision=revision) == kernel
@pytest.mark.linux_only
def test_version():
kernel = get_kernel("kernels-test/versions")
assert kernel.version() == "0.2.0"
kernel = get_kernel("kernels-test/versions", version="<1.0.0")
assert kernel.version() == "0.2.0"
kernel = get_kernel("kernels-test/versions", version="<0.2.0")
assert kernel.version() == "0.1.1"
kernel = get_kernel("kernels-test/versions", version=">0.1.0,<0.2.0")
assert kernel.version() == "0.1.1"
with pytest.raises(ValueError, match=r"No version.*satisfies requirement"):
get_kernel("kernels-test/versions", version=">0.2.0")
with pytest.raises(ValueError, match=r"Either a revision or a version.*not both"):
kernel = get_kernel(
"kernels-test/versions", revision="v0.1.0", version="<1.0.0"
)
@pytest.mark.cuda_only
def test_universal_kernel(universal_kernel):
torch.manual_seed(0)
A = torch.randint(-10, 10, (64, 128), dtype=torch.int8, device="cuda")

View File

@ -16,21 +16,21 @@ def device():
return "cuda"
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_gelu_small(kernel, device, benchmark):
x = torch.randn(32, 32, dtype=torch.float16, device=device)
y = torch.empty_like(x)
benchmark(kernel.gelu_fast, y, x)
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_gelu_medium(kernel, device, benchmark):
x = torch.randn(128, 128, dtype=torch.float16, device=device)
y = torch.empty_like(x)
benchmark(kernel.gelu_fast, y, x)
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_gelu_large(kernel, device, benchmark):
x = torch.randn(512, 512, dtype=torch.float16, device=device)
y = torch.empty_like(x)

49
tests/test_doctest.py Normal file
View File

@ -0,0 +1,49 @@
import inspect
import pytest
from mktestdocs import check_docstring, get_codeblock_members
import kernels
def all_public_functions():
function_list = inspect.getmembers(kernels, inspect.isfunction)
return [func for _, func in function_list]
def all_public_classes():
class_list = inspect.getmembers(kernels, inspect.isclass)
return [cls for _, cls in class_list]
def all_public_class_members():
members = get_codeblock_members(*all_public_classes())
return members
@pytest.mark.cuda_only
@pytest.mark.parametrize(
"func",
all_public_functions(),
ids=lambda d: d.__name__,
)
def test_func_docstring(func):
check_docstring(obj=func)
@pytest.mark.cuda_only
@pytest.mark.parametrize(
"cls",
all_public_classes(),
ids=lambda d: d.__name__,
)
def test_class_docstring(cls):
check_docstring(obj=cls)
@pytest.mark.cuda_only
@pytest.mark.parametrize(
"member", all_public_class_members(), ids=lambda d: d.__qualname__
)
def test_member_docstring(member):
check_docstring(member)

View File

@ -2,9 +2,17 @@ from dataclasses import dataclass
from pathlib import Path
import pytest
import torch.nn as nn
from kernels import load_kernel
from kernels.cli import download_kernels
from kernels.layer import (
LockedLayerRepository,
Mode,
kernelize,
use_kernel_forward_from_hub,
use_kernel_mapping,
)
# Mock download arguments class.
@ -19,9 +27,35 @@ def test_download_all_hash_validation():
download_kernels(DownloadArgs(all_variants=True, project_dir=project_dir))
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_load_locked():
project_dir = Path(__file__).parent / "kernel_locking"
# Also validates that hashing works correctly.
download_kernels(DownloadArgs(all_variants=False, project_dir=project_dir))
load_kernel("kernels-community/activation", lockfile=project_dir / "kernels.lock")
@pytest.mark.cuda_only
def test_layer_locked():
project_dir = Path(__file__).parent / "layer_locking"
@use_kernel_forward_from_hub("Version")
class Version(nn.Module):
def forward(self) -> str:
return "0.0.0"
version = Version()
with use_kernel_mapping(
{
"Version": {
"cuda": LockedLayerRepository(
repo_id="kernels-test/versions",
layer_name="Version",
lockfile=project_dir / "kernels.lock",
)
},
}
):
version = kernelize(version, device="cuda", mode=Mode.INFERENCE)
assert version() == "0.1.1"

115
tests/test_kernel_upload.py Normal file
View File

@ -0,0 +1,115 @@
import logging
import os
import re
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List
import pytest
from huggingface_hub import delete_repo, model_info
from kernels.cli import upload_kernels
REPO_ID = "valid_org/kernels-upload-test"
PY_CONTENT = """\
#!/usr/bin/env python3
def main():
print("Hello from torch-universal!")
if __name__ == "__main__":
main()
"""
@dataclass
class UploadArgs:
kernel_dir: None
repo_id: None
private: False
def next_filename(path: Path) -> Path:
"""
Given a path like foo_2050.py, return foo_2051.py.
"""
m = re.match(r"^(.*?)(\d+)(\.py)$", path.name)
if not m:
raise ValueError(
f"Filename {path.name!r} does not match pattern <prefix>_<number>.py"
)
prefix, number, suffix = m.groups()
new_number = str(int(number) + 1).zfill(len(number))
return path.with_name(f"{prefix}{new_number}{suffix}")
def get_filename_to_change(repo_filenames):
for f in repo_filenames:
if "foo" in f and f.endswith(".py"):
filename_to_change = os.path.basename(f)
break
assert filename_to_change
return filename_to_change
def get_filenames_from_a_repo(repo_id: str) -> List[str]:
try:
repo_info = model_info(repo_id=repo_id, files_metadata=True)
repo_siblings = repo_info.siblings
if repo_siblings is not None:
return [f.rfilename for f in repo_siblings]
else:
raise ValueError("No repo siblings found.")
except Exception as e:
logging.error(f"Error connecting to the Hub: {e}.")
@pytest.mark.token
@pytest.mark.is_staging_test
def test_kernel_upload_works_as_expected():
with tempfile.TemporaryDirectory() as tmpdir:
path = f"{tmpdir}/build/torch-universal/upload_test"
build_dir = Path(path)
build_dir.mkdir(parents=True, exist_ok=True)
script_path = build_dir / "foo.py"
script_path.write_text(PY_CONTENT)
upload_kernels(UploadArgs(tmpdir, REPO_ID, False))
repo_filenames = get_filenames_from_a_repo(REPO_ID)
assert any(str(script_path.name) for f in repo_filenames)
delete_repo(repo_id=REPO_ID)
@pytest.mark.token
@pytest.mark.is_staging_test
def test_kernel_upload_deletes_as_expected():
with tempfile.TemporaryDirectory() as tmpdir:
path = f"{tmpdir}/build/torch-universal/upload_test"
build_dir = Path(path)
build_dir.mkdir(parents=True, exist_ok=True)
script_path = build_dir / "foo_2025.py"
script_path.write_text(PY_CONTENT)
upload_kernels(UploadArgs(tmpdir, REPO_ID, False))
repo_filenames = get_filenames_from_a_repo(REPO_ID)
filename_to_change = get_filename_to_change(repo_filenames)
with tempfile.TemporaryDirectory() as tmpdir:
path = f"{tmpdir}/build/torch-universal/upload_test"
build_dir = Path(path)
build_dir.mkdir(parents=True, exist_ok=True)
changed_filename = next_filename(Path(filename_to_change))
script_path = build_dir / changed_filename
script_path.write_text(PY_CONTENT)
upload_kernels(UploadArgs(tmpdir, REPO_ID, False))
repo_filenames = get_filenames_from_a_repo(REPO_ID)
assert any(str(changed_filename) in k for k in repo_filenames), f"{repo_filenames=}"
assert not any(
str(filename_to_change) in k for k in repo_filenames
), f"{repo_filenames=}"
delete_repo(repo_id=REPO_ID)

View File

@ -7,18 +7,23 @@ import torch.nn as nn
from torch.nn import functional as F
from kernels import (
CUDAProperties,
Device,
LayerRepository,
LocalLayerRepository,
Mode,
kernelize,
register_kernel_mapping,
use_kernel_forward_from_hub,
use_kernel_mapping,
)
from kernels.layer import (
_KERNEL_MAPPING,
CUDAProperties,
_validate_layer,
use_kernel_mapping,
)
from kernels.utils import (
_get_privateuse_backend_name,
install_kernel,
)
kernel_layer_mapping = {
@ -26,13 +31,21 @@ kernel_layer_mapping = {
Device(type="cuda"): LayerRepository(
repo_id="kernels-community/activation",
layer_name="SiluAndMul",
)
),
"npu": LayerRepository(
repo_id="kernels-ext-npu/SwiGlu",
layer_name="SwiGlu",
),
},
"SiluAndMulNoCompile": {
"cuda": LayerRepository(
repo_id="kernels-test/op-without-fake-test",
layer_name="SiluAndMul",
)
),
"rocm": LayerRepository(
repo_id="kernels-test/op-without-fake-test",
layer_name="SiluAndMul",
),
},
"SiluAndMulStringDevice": {
"cuda": LayerRepository(
@ -40,11 +53,37 @@ kernel_layer_mapping = {
layer_name="SiluAndMul",
)
},
"LigerRMSNorm": {
"xpu": LayerRepository(
repo_id="kernels-community/liger_kernels",
layer_name="LigerRMSNorm", # Triton
)
},
}
register_kernel_mapping(kernel_layer_mapping)
class RMSNorm(nn.Module):
def __init__(self, weight: torch.Tensor, eps: float = 1e-6):
super().__init__()
# Used to check that we called hub kernel.
self.n_calls = 0
self.weight = nn.Parameter(weight)
self.variance_epsilon = eps
def forward(self, x: torch.Tensor):
self.n_calls += 1
var = x.pow(2).mean(-1, keepdim=True)
x_norm = x * torch.rsqrt(var + self.variance_epsilon)
return x_norm * self.weight
@use_kernel_forward_from_hub("LigerRMSNorm")
class RMSNormWithKernel(RMSNorm):
pass
class SiluAndMul(nn.Module):
def __init__(self):
super().__init__()
@ -84,6 +123,18 @@ class TorchLinearWithCounter(nn.Linear):
return super().forward(input)
@pytest.fixture
def device():
if torch.cuda.is_available():
return "cuda"
elif hasattr(torch, "xpu") and torch.xpu.is_available():
return "xpu"
elif _get_privateuse_backend_name() == "npu":
return "npu"
pytest.skip("No CUDA, NPU or XPU")
def test_arg_kinds():
@use_kernel_forward_from_hub("ArgKind")
class ArgKind(nn.Module):
@ -102,29 +153,122 @@ def test_arg_kinds():
assert arg_kind("foo", "bar", kwarg1="baz", kwarg2=5) == ("foo", "bar", "baz", 5)
@pytest.mark.linux_only
@pytest.mark.cuda_only
@pytest.mark.parametrize("cls", [SiluAndMulWithKernel, SiluAndMulStringDevice])
@pytest.mark.parametrize("device", ["cuda", "cpu"])
def test_hub_forward(cls, device):
def test_hub_forward(cls):
torch.random.manual_seed(0)
silu_and_mul = SiluAndMul()
X = torch.randn((32, 64), device=device)
X = torch.randn((32, 64), device="cuda")
Y = silu_and_mul(X)
silu_and_mul_with_kernel = kernelize(cls(), device=device, mode=Mode.INFERENCE)
silu_and_mul_with_kernel = kernelize(cls(), device="cuda", mode=Mode.INFERENCE)
Y_kernel = silu_and_mul_with_kernel(X)
torch.testing.assert_close(Y_kernel, Y)
assert silu_and_mul.n_calls == 1
if device == "cuda":
assert silu_and_mul_with_kernel.n_calls == 0
else:
assert silu_and_mul_with_kernel.n_calls == 1
@pytest.mark.linux_only
@pytest.mark.rocm_only
def test_hub_forward_rocm():
torch.manual_seed(0)
silu_and_mul = SiluAndMul()
X = torch.randn((32, 64))
Y = silu_and_mul(X)
silu_and_mul_with_kernel = kernelize(
SiluAndMulNoCompileKernel(), device="rocm", mode=Mode.INFERENCE
)
Y_kernel = silu_and_mul_with_kernel(X)
torch.testing.assert_close(Y_kernel, Y)
assert silu_and_mul.n_calls == 1
# Should use kernel (n_calls == 0) if ROCm kernel is available, otherwise fallback (n_calls == 1)
# The exact behavior depends on whether the test kernel exists for ROCm
assert silu_and_mul_with_kernel.n_calls in [0, 1]
@pytest.mark.xpu_only
def test_hub_forward_xpu():
torch.manual_seed(0)
hidden_size = 1024
weight = torch.ones(hidden_size, device="xpu")
rms_norm = RMSNorm(weight).to("xpu")
X = torch.randn(4, 16, hidden_size, device="xpu", dtype=torch.float32)
Y = rms_norm(X)
rms_norm_with_kernel = kernelize(
RMSNormWithKernel(weight), mode=Mode.INFERENCE, device="xpu"
)
Y_kernel = rms_norm_with_kernel(X)
torch.testing.assert_close(Y_kernel, Y)
assert rms_norm.n_calls == 1
assert rms_norm_with_kernel.n_calls == 0
@pytest.mark.npu_only
def test_hub_forward_npu():
torch.manual_seed(0)
silu_and_mul = SiluAndMul()
X = torch.randn((32, 64), device="npu")
Y = silu_and_mul(X)
silu_and_mul_with_kernel = kernelize(
SiluAndMulWithKernel(), device="npu", mode=Mode.INFERENCE
)
Y_kernel = silu_and_mul_with_kernel(X)
torch.testing.assert_close(Y_kernel, Y)
assert silu_and_mul.n_calls == 1
assert silu_and_mul_with_kernel.n_calls == 0
@pytest.mark.skipif(
hasattr(torch, "xpu") and getattr(torch.xpu, "is_available", lambda: False)(),
reason="Skip on xpu devices",
)
@pytest.mark.skipif(
_get_privateuse_backend_name() == "npu",
reason="Skip on npu devices",
)
def test_rocm_kernel_mapping():
"""Test that ROCm shorthand device mapping works correctly."""
kernel_layer_mapping = {
"SiluAndMul": {
"rocm": LayerRepository(
repo_id="kernels-community/activation",
layer_name="SiluAndMul",
)
}
}
# Test that the mapping is processed correctly
with use_kernel_mapping(kernel_layer_mapping, inherit_mapping=False):
mapping = _KERNEL_MAPPING.get()
# Verify the mapping exists
assert "SiluAndMul" in mapping
assert "rocm" in mapping["SiluAndMul"]
# Verify the repository is correctly stored
rocm_repos = mapping["SiluAndMul"]["rocm"]
assert rocm_repos is not None
assert (
rocm_repos.repos[Mode.FALLBACK]._repo_id == "kernels-community/activation"
)
assert rocm_repos.repos[Mode.FALLBACK].layer_name == "SiluAndMul"
@pytest.mark.cuda_only
def test_capability():
linear = TorchLinearWithCounter(32, 32).to("cuda")
with use_kernel_mapping(
@ -183,7 +327,33 @@ def test_layer_fallback_works():
kernelize(silu_and_mul, device="cuda", mode=Mode.INFERENCE)
@pytest.mark.linux_only
def test_local_layer_repo(device):
# Fetch a kernel to the local cache.
package_name, path = install_kernel("kernels-test/backward-marker-test", "main")
linear = TorchLinearWithCounter(32, 32).to(device)
with use_kernel_mapping(
{
"Linear": {
device: LocalLayerRepository(
# install_kernel will give the fully-resolved path.
repo_path=path.parent.parent,
package_name=package_name,
layer_name="LinearBackward",
)
}
},
inherit_mapping=False,
):
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device=device)
linear(X)
assert linear.n_calls == 0
@pytest.mark.cuda_only
@pytest.mark.parametrize("cls", [SiluAndMulWithKernel, SiluAndMulNoCompileKernel])
@pytest.mark.parametrize("device", ["cuda"])
def test_torch_compile_layer_without_fallback(cls, device):
@ -214,7 +384,7 @@ def test_torch_compile_layer_without_fallback(cls, device):
torch.testing.assert_close(Y_compiled, Y)
@pytest.mark.linux_only
@pytest.mark.cuda_only
@pytest.mark.parametrize("cls", [SiluAndMulWithKernel, SiluAndMulNoCompileKernel])
@pytest.mark.parametrize("device", ["cuda"])
def test_torch_compile_layer_with_fallback(cls, device):
@ -237,12 +407,16 @@ def test_torch_compile_layer_with_fallback(cls, device):
torch.testing.assert_close(Y_compiled, Y)
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_mapping_contexts():
# Make sure we start from scratch.
register_kernel_mapping(kernel_layer_mapping, inherit_mapping=False)
assert set(_KERNEL_MAPPING.get().keys()) == {
"SiluAndMul",
"SiluAndMulStringDevice",
"SiluAndMulNoCompile",
"LigerRMSNorm",
}
extra_mapping1 = {
@ -260,6 +434,7 @@ def test_mapping_contexts():
"SiluAndMul",
"SiluAndMulStringDevice",
"SiluAndMulNoCompile",
"LigerRMSNorm",
"TestKernel",
}
@ -278,10 +453,13 @@ def test_mapping_contexts():
"SiluAndMul",
"SiluAndMulStringDevice",
"SiluAndMulNoCompile",
"LigerRMSNorm",
"TestKernel",
}
assert (
_KERNEL_MAPPING.get()["SiluAndMul"]["cuda"].repos[Mode.DEFAULT].repo_id
_KERNEL_MAPPING.get()["SiluAndMul"]["cuda"]
.repos[Mode.FALLBACK]
._repo_id
== "kernels-community/non-existing"
)
@ -289,10 +467,11 @@ def test_mapping_contexts():
"SiluAndMul",
"SiluAndMulStringDevice",
"SiluAndMulNoCompile",
"LigerRMSNorm",
"TestKernel",
}
assert (
_KERNEL_MAPPING.get()["SiluAndMul"]["cuda"].repos[Mode.DEFAULT].repo_id
_KERNEL_MAPPING.get()["SiluAndMul"]["cuda"].repos[Mode.FALLBACK]._repo_id
== "kernels-community/activation"
)
@ -301,7 +480,9 @@ def test_mapping_contexts():
"SiluAndMul",
}
assert (
_KERNEL_MAPPING.get()["SiluAndMul"]["cuda"].repos[Mode.DEFAULT].repo_id
_KERNEL_MAPPING.get()["SiluAndMul"]["cuda"]
.repos[Mode.FALLBACK]
._repo_id
== "kernels-community/non-existing"
)
@ -309,10 +490,11 @@ def test_mapping_contexts():
"SiluAndMul",
"SiluAndMulStringDevice",
"SiluAndMulNoCompile",
"LigerRMSNorm",
"TestKernel",
}
assert (
_KERNEL_MAPPING.get()["SiluAndMul"]["cuda"].repos[Mode.DEFAULT].repo_id
_KERNEL_MAPPING.get()["SiluAndMul"]["cuda"].repos[Mode.FALLBACK]._repo_id
== "kernels-community/activation"
)
@ -320,6 +502,7 @@ def test_mapping_contexts():
"SiluAndMul",
"SiluAndMulStringDevice",
"SiluAndMulNoCompile",
"LigerRMSNorm",
}
@ -329,29 +512,46 @@ def test_validate_kernel_layer():
super().__init__(*args, **kwargs)
self.foo = 42
with pytest.raises(TypeError, match="not override"):
_validate_layer(cls=BadLayer, check_cls=SiluAndMul)
def stub_repo(layer):
return LayerRepository(
repo_id="kernels-test/nonexisting", layer_name=layer.__name__
)
with pytest.raises(
TypeError,
match="`kernels-test/nonexisting`.*layer `BadLayer` must not override",
):
_validate_layer(cls=BadLayer, check_cls=SiluAndMul, repo=stub_repo(BadLayer))
class BadLayer2(nn.Module):
foo: int = 42
with pytest.raises(TypeError, match="not contain additional members"):
_validate_layer(cls=BadLayer2, check_cls=SiluAndMul)
with pytest.raises(
TypeError,
match="`kernels-test/nonexisting`.*layer `BadLayer2` must not contain.*SiluAndMul",
):
_validate_layer(cls=BadLayer2, check_cls=SiluAndMul, repo=stub_repo(BadLayer2))
class BadLayer3(nn.Module):
def forward(self, x: torch.Tensor, foo: int) -> torch.Tensor: ...
with pytest.raises(TypeError, match="different number of arguments"):
_validate_layer(cls=BadLayer3, check_cls=SiluAndMul)
with pytest.raises(
TypeError,
match="Forward.*`kernels-test/nonexisting`.*layer `BadLayer3` does not match `SiluAndMul`: different number of arguments",
):
_validate_layer(cls=BadLayer3, check_cls=SiluAndMul, repo=stub_repo(BadLayer3))
class BadLayer4(nn.Module):
def forward(self, *, x: torch.Tensor) -> torch.Tensor: ...
with pytest.raises(TypeError, match="different kind of arguments"):
_validate_layer(cls=BadLayer4, check_cls=SiluAndMul)
with pytest.raises(
TypeError,
match="Forward.*`kernels-test/nonexisting`.*layer `BadLayer4` does not match `SiluAndMul`: different kind of arguments",
):
_validate_layer(cls=BadLayer4, check_cls=SiluAndMul, repo=stub_repo(BadLayer4))
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_invalid_mode_for_mapping_rejected():
linear = TorchLinearWithCounter(32, 32).to("cuda")
@ -371,7 +571,7 @@ def test_invalid_mode_for_mapping_rejected():
kernelize(linear, mode=Mode.TRAINING)
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_kernel_modes():
linear = TorchLinearWithCounter(32, 32).to("cuda")
@ -400,11 +600,6 @@ def test_kernel_modes():
linear(X)
assert linear.n_calls == 0
# Same as previous, since TRAINING | TORCH_COMPILE is the default.
kernelize(linear)
linear(X)
assert linear.n_calls == 0
# Case 2: register a kernel just for training. If no base kernel
# layer is registered, we fall back to the original layer.
with use_kernel_mapping(
@ -422,30 +617,24 @@ def test_kernel_modes():
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device="cuda")
linear(X)
assert linear.n_calls == 1
assert linear.n_calls == 0
kernelize(linear, mode=Mode.TRAINING)
linear(X)
# Training has a kernel, so fallback.
assert linear.n_calls == 1
assert linear.n_calls == 0
kernelize(linear, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
linear(X)
# No kernel for training + torch.compile, so fallback.
assert linear.n_calls == 2
# Same as previous, since TRAINING | TORCH_COMPILE is the default.
kernelize(linear)
linear(X)
# No kernel for training + torch.compile, so fallback.
assert linear.n_calls == 3
# TRAINING | TORCH_COMPILE cannot fall back to TRAINING kernel, so uses original.
assert linear.n_calls == 1
# Case 3: register a kernel just for training and one for fallback.
with use_kernel_mapping(
{
"Linear": {
"cuda": {
Mode.DEFAULT: LayerRepository(
Mode.FALLBACK: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
@ -460,24 +649,18 @@ def test_kernel_modes():
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device="cuda")
linear(X)
# Uses the base kernel.
assert linear.n_calls == 3
# Falls back to TRAINING.
assert linear.n_calls == 1
kernelize(linear, mode=Mode.TRAINING)
linear(X)
# Uses the training kernel.
assert linear.n_calls == 3
# Falls back to the TRAINING kernel.
assert linear.n_calls == 1
kernelize(linear, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
linear(X)
# Uses the base kernel.
assert linear.n_calls == 3
# Same as previous, since TRAINING | TORCH_COMPILE is the default.
kernelize(linear)
linear(X)
# Uses the base kernel.
assert linear.n_calls == 3
# TRAINING | TORCH_COMPILE falls back to FALLBACK kernel.
assert linear.n_calls == 1
# Case 4: register a kernel with two preferences.
with use_kernel_mapping(
@ -496,26 +679,21 @@ def test_kernel_modes():
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device="cuda")
linear(X)
# No inference kernel, so fallback.
assert linear.n_calls == 4
# Falls back to the TRAINING | TORCH_COMPILE kernel.
assert linear.n_calls == 1
kernelize(linear, mode=Mode.TRAINING)
linear(X)
# No training kernel, so fallback.
assert linear.n_calls == 5
# TRAINING can fall back to TRAINING | TORCH_COMPILE kernel.
assert linear.n_calls == 1
kernelize(linear, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
linear(X)
# We do have a training + torch.compile kernel.
assert linear.n_calls == 5
# Same as previous, since TRAINING | TORCH_COMPILE is the default.
kernelize(linear)
linear(X)
assert linear.n_calls == 5
# Uses TRAINING | TORCH_COMPILE kernel.
assert linear.n_calls == 1
@pytest.mark.linux_only
@pytest.mark.cuda_only
def test_fallback_used_when_training():
linear = TorchLinearWithCounter(32, 32).to("cuda")
@ -569,12 +747,385 @@ def test_invalid_mode_rejected():
_ = Mode.INFERENCE | Mode.TRAINING
with pytest.raises(ValueError, match="cannot be combined with other modes"):
_ = Mode.DEFAULT | Mode.TORCH_COMPILE
_ = Mode.FALLBACK | Mode.TORCH_COMPILE
with pytest.raises(
ValueError, match="can only be used to register kernel mappings"
):
kernelize(torch.nn.Linear(32, 32), mode=Mode.DEFAULT)
kernelize(torch.nn.Linear(32, 32), mode=Mode.FALLBACK)
with pytest.raises(ValueError, match="mode must contain"):
kernelize(torch.nn.Linear(32, 32), mode=Mode.TORCH_COMPILE)
@pytest.mark.cuda_only
def test_kernel_modes_inference():
"""Test inference-specific fallback scenarios."""
linear = TorchLinearWithCounter(32, 32).to("cuda")
# Case 1: register a kernel just for inference
with use_kernel_mapping(
{
"Linear": {
"cuda": {
Mode.INFERENCE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
)
}
}
}
):
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device="cuda")
linear(X)
assert linear.n_calls == 0
kernelize(linear, mode=Mode.INFERENCE | Mode.TORCH_COMPILE)
linear(X)
# INFERENCE | TORCH_COMPILE cannot fall back to INFERENCE kernel, so uses original
assert linear.n_calls == 1
kernelize(linear, mode=Mode.TRAINING)
linear(X)
# No training kernel, so fallback to original
assert linear.n_calls == 2
# Case 2: register a kernel just for inference + torch.compile
with use_kernel_mapping(
{
"Linear": {
"cuda": {
Mode.INFERENCE
| Mode.TORCH_COMPILE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
)
}
}
}
):
kernelize(linear, mode=Mode.INFERENCE | Mode.TORCH_COMPILE)
X = torch.randn(10, 32, device="cuda")
linear(X)
assert linear.n_calls == 2
kernelize(linear, mode=Mode.INFERENCE)
linear(X)
# INFERENCE falls back to INFERENCE | TORCH_COMPILE kernel
assert linear.n_calls == 2
kernelize(linear, mode=Mode.TRAINING)
linear(X)
# No training kernel, so fallback to original
assert linear.n_calls == 3
# Case 3: register both inference kernels
with use_kernel_mapping(
{
"Linear": {
"cuda": {
Mode.INFERENCE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
Mode.INFERENCE
| Mode.TORCH_COMPILE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
}
}
}
):
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device="cuda")
linear(X)
# Uses exact INFERENCE kernel
assert linear.n_calls == 3
kernelize(linear, mode=Mode.INFERENCE | Mode.TORCH_COMPILE)
linear(X)
# Uses exact INFERENCE | TORCH_COMPILE kernel
assert linear.n_calls == 3
kernelize(linear, mode=Mode.TRAINING)
linear(X)
# No training kernel, so fallback to original
assert linear.n_calls == 4
@pytest.mark.cuda_only
def test_kernel_modes_mixed():
"""Test mixed training and inference kernel scenarios."""
linear = TorchLinearWithCounter(32, 32).to("cuda")
# Case 1: register both base inference and training kernels
with use_kernel_mapping(
{
"Linear": {
"cuda": {
Mode.INFERENCE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
Mode.TRAINING: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
}
}
}
):
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device="cuda")
linear(X)
assert linear.n_calls == 0
kernelize(linear, mode=Mode.TRAINING)
linear(X)
assert linear.n_calls == 0
kernelize(linear, mode=Mode.INFERENCE | Mode.TORCH_COMPILE)
linear(X)
# INFERENCE | TORCH_COMPILE cannot fall back to INFERENCE kernel, so uses original
assert linear.n_calls == 1
kernelize(linear, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
linear(X)
# TRAINING | TORCH_COMPILE cannot fall back to TRAINING kernel, so uses original
assert linear.n_calls == 2
# Case 2: register all four kernel modes
with use_kernel_mapping(
{
"Linear": {
"cuda": {
Mode.INFERENCE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
Mode.TRAINING: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
Mode.INFERENCE
| Mode.TORCH_COMPILE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
Mode.TRAINING
| Mode.TORCH_COMPILE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
}
}
}
):
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device="cuda")
linear(X)
# Uses exact INFERENCE kernel
assert linear.n_calls == 2
kernelize(linear, mode=Mode.TRAINING)
linear(X)
# Uses exact TRAINING kernel
assert linear.n_calls == 2
kernelize(linear, mode=Mode.INFERENCE | Mode.TORCH_COMPILE)
linear(X)
# Uses exact INFERENCE | TORCH_COMPILE kernel
assert linear.n_calls == 2
kernelize(linear, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
linear(X)
# Uses exact TRAINING | TORCH_COMPILE kernel
assert linear.n_calls == 2
@pytest.mark.cuda_only
def test_kernel_modes_cross_fallback():
"""Test cross-mode fallback scenarios from inference to training modes."""
linear = TorchLinearWithCounter(32, 32).to("cuda")
# Case 1: Only training kernel registered - inference should fall back to training
with use_kernel_mapping(
{
"Linear": {
"cuda": {
Mode.TRAINING: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
)
}
}
}
):
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device="cuda")
linear(X)
# INFERENCE falls back to TRAINING kernel
assert linear.n_calls == 0
kernelize(linear, mode=Mode.TRAINING)
linear(X)
# TRAINING uses the kernel directly
assert linear.n_calls == 0
# Case 2: Only training + torch.compile kernel registered
with use_kernel_mapping(
{
"Linear": {
"cuda": {
Mode.TRAINING
| Mode.TORCH_COMPILE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
)
}
}
}
):
kernelize(linear, mode=Mode.INFERENCE)
X = torch.randn(10, 32, device="cuda")
linear(X)
# INFERENCE falls back to TRAINING | TORCH_COMPILE kernel
assert linear.n_calls == 0
kernelize(linear, mode=Mode.INFERENCE | Mode.TORCH_COMPILE)
linear(X)
# INFERENCE | TORCH_COMPILE falls back to TRAINING | TORCH_COMPILE kernel
assert linear.n_calls == 0
kernelize(linear, mode=Mode.TRAINING)
linear(X)
# TRAINING falls back to TRAINING | TORCH_COMPILE kernel
assert linear.n_calls == 0
kernelize(linear, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
linear(X)
# TRAINING | TORCH_COMPILE uses the kernel directly
assert linear.n_calls == 0
# Case 3: Test that training modes don't fall back to inference modes
with use_kernel_mapping(
{
"Linear": {
"cuda": {
Mode.INFERENCE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
Mode.INFERENCE
| Mode.TORCH_COMPILE: LayerRepository(
repo_id="kernels-test/backward-marker-test",
layer_name="LinearBackward",
),
}
}
}
):
kernelize(linear, mode=Mode.TRAINING)
X = torch.randn(10, 32, device="cuda")
linear(X)
# TRAINING should NOT fall back to inference kernels, use original
assert linear.n_calls == 1
kernelize(linear, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
linear(X)
# TRAINING | TORCH_COMPILE should NOT fall back to inference kernels, use original
assert linear.n_calls == 2
def test_layer_versions(device):
@use_kernel_forward_from_hub("Version")
class Version(nn.Module):
def forward(self) -> str:
return "0.0.0"
version = Version()
with use_kernel_mapping(
{
"Version": {
Device(type=device): LayerRepository(
repo_id="kernels-test/versions",
layer_name="Version",
)
}
}
):
version = kernelize(version, device=device, mode=Mode.INFERENCE)
assert version() == "0.2.0"
with use_kernel_mapping(
{
"Version": {
Device(type=device): LayerRepository(
repo_id="kernels-test/versions",
layer_name="Version",
version="<1.0.0",
)
}
}
):
version = kernelize(version, device=device, mode=Mode.INFERENCE)
assert version() == "0.2.0"
with use_kernel_mapping(
{
"Version": {
Device(type=device): LayerRepository(
repo_id="kernels-test/versions",
layer_name="Version",
version="<0.2.0",
)
}
}
):
version = kernelize(version, device=device, mode=Mode.INFERENCE)
assert version() == "0.1.1"
with use_kernel_mapping(
{
"Version": {
Device(type=device): LayerRepository(
repo_id="kernels-test/versions",
layer_name="Version",
version=">0.1.0,<0.2.0",
)
}
}
):
version = kernelize(version, device=device, mode=Mode.INFERENCE)
assert version() == "0.1.1"
with use_kernel_mapping(
{
"Version": {
Device(type=device): LayerRepository(
repo_id="kernels-test/versions",
layer_name="Version",
version=">0.2.0",
)
}
}
):
with pytest.raises(ValueError, match=r"No version.*satisfies requirement"):
kernelize(version, device=device, mode=Mode.INFERENCE)
with pytest.raises(ValueError, match=r"Either a revision or a version.*not both"):
use_kernel_mapping(
{
"Version": {
Device(type=device): LayerRepository(
repo_id="kernels-test/versions",
layer_name="Version",
revision="v0.1.0",
version="<1.0.0",
)
}
}
)