42 Commits

Author SHA1 Message Date
ce77658efc fix: kernels upload to a repo branch (#168)
* fix: kernels upload to a repo branch

* up
2025-10-16 16:01:00 +02:00
a7101b2cfd feat: allow kernels to be uploaded to a revision (#161)
* feat: allow kernels to be uploaded to a revision

* revision -> branch
2025-10-13 10:31:11 +02:00
6241afa06e Bump torch version in runner (#162)
* bump torch version

* run kernels lock tests/kernel_locking
2025-10-09 11:04:52 +02:00
fb8cd99a2c Add support for NPU kernelize/layers (#155)
This change adds support for Huawei Ascend NPUs. This is #146 with some formatting/typing fixes.

Co-authored-by: zheliuyu <15750543867@163.com>
2025-09-23 10:46:41 +02:00
93e5765611 [tests] turn the kernels upload tests to be staging tests (#152) 2025-09-22 18:53:53 +02:00
6c00194680 Improve errors for layer validation (#145)
* Improve errors for layer validation

Include the repo and layer name as well as the name of the class
that is being compared to (when applicable).

* Remove upload xfail

* Only enable tests that require a token with `--token`
2025-09-16 14:40:54 +02:00
d6b51eefb7 [feat] add an uploading utility (#138)
* add an uploading utility.

* format

* remove stale files.

* black format

* sorted imports.

* up

* up

* add a test

* propagate.

* remove duplicate imports.

* Apply suggestions from code review

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* up

* up

* up

* command to format all files at once would be nice.

* up

* up

* up

* Use token for upload test

* assign env better.

* docs

* polish

* up

* xfail the test for now.

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2025-09-16 08:56:54 +02:00
d383fdd4b4 Add support for XPU layer repositories (#142)
This change adds support for XPU layer repositories, e.g.:

```
kernel_mapping = {
    "LigerRMSNorm": {
        "xpu": LayerRepository(
            repo_id="kernels-community/liger_kernels",
            layer_name="LigerRMSNorm",
        )
    },
}
```

Co-authored-by: YangKai0616 <kai.yang@intel.com>
2025-09-11 15:51:02 +02:00
0ae07f05fc Remove default for mode argument of kernelize (#136) 2025-08-29 17:44:20 +02:00
7611021100 cpu is not (yet) a supported device type (#132)
Fixes #131.
2025-08-25 16:25:58 +02:00
767e7ccf13 fix: add get local tests (#134)
* fix: add tests for get local kernel

* fix: update test and add path example comments

* fix: run black linter
2025-08-21 13:01:48 -04:00
a8a6564fa7 Add ROCm device discovery (#122)
* Add ROCm device discovery

* Ruff

* Address review comments

* Ruff

* Reorg torch import

* Remove redundant import

* Apply suggestions from code review

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* Address review comments

* Validate device type

* Clean diff

* black

* Sync test with repo changes

* black again

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2025-08-01 16:09:45 +02:00
cfa0c76ddc Add LocalLayerRepository to load from a local repo (#123) 2025-08-01 14:03:11 +02:00
f7490bd0a9 Test examples in docstrings using mktestdocs (#118)
Also adjust examples so that they are correct.
2025-07-28 17:31:34 +02:00
81088d44e8 Add support for project-wide locking of layers (#114)
This change adds `LockedLayerRepository` as an alternative to
`LayerRepository`. `LockedLayerRepository` allows for locking all kernel
layers that are used at the project level. Example usage:

```
with use_kernel_mapping(
    {
        "SomeLayer": {
            "cuda": LockedLayerRepository(
                repo_id="some-org/some-layer",
                layer_name="SomeLayer",
            )
        },
    }
):
    layer = kernelize(layer, device="cuda", mode=Mode.INFERENCE)
```

This requires that the project has a `pyproject.toml` with kernel
version specifications and `kernel.lock` with the locked kernels.
2025-07-23 09:37:05 +02:00
4a04c005e3 Add version support to LayerRepository (#113)
* Add version support to `LayerRepository`

* Remove some docs that do not apply

* Removed unused member variable
2025-07-22 17:02:39 +02:00
071900fd69 get_kernel: allow Python-style version specifiers (#111)
Use Python-style version specifiers to resolve to tags. E.g., given
the presence of the tags `v0.1.0`, `v0.1.1`, and `v0.2.0`,

```python
get_kernel("my/kernel", version=">=0.1.0,<0.2.0")
```

would resolve to `v0.1.1`.
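
The resolution described above can be sketched with a small stdlib-only matcher (a simplified illustration; `parse_version`, `matches`, and `resolve_tag` are hypothetical helpers, not the library's internals):

```python
import re

def parse_version(tag):
    """Turn 'v0.1.1' into a comparable tuple (0, 1, 1)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))

def matches(version, spec):
    """Check a version tuple against clauses like '>=0.1.0,<0.2.0'."""
    ops = {">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
           "==": lambda a, b: a == b, ">": lambda a, b: a > b,
           "<": lambda a, b: a < b}
    for clause in spec.split(","):
        op, ref = re.match(r"(>=|<=|==|>|<)(.+)", clause.strip()).groups()
        if not ops[op](version, parse_version(ref)):
            return False
    return True

def resolve_tag(tags, spec):
    """Return the highest tag satisfying the specifier, or None."""
    candidates = [t for t in tags if matches(parse_version(t), spec)]
    return max(candidates, key=parse_version, default=None)

print(resolve_tag(["v0.1.0", "v0.1.1", "v0.2.0"], ">=0.1.0,<0.2.0"))  # v0.1.1
```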
2025-07-21 17:18:35 +02:00
c841a6c90d Improve mode handling (#108)
* Set `kernelize` default mode to `Mode.TRAINING | Mode.TORCH_COMPILE`

Also update docs and tests.

* Rename `Mode.DEFAULT` to `Mode.FALLBACK`

* More fine-grained fallbacks

For instance, INFERENCE can fall back to INFERENCE | TORCH_COMPILE,
TRAINING, TRAINING | TORCH_COMPILE, and FALLBACK.

* Update documentation for mode fallback

* Mention that you can rerun `kernelize` to change the mode
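
The fine-grained fallback can be illustrated with a `Flag` enum (a sketch of the search order named in the message, not the library's implementation; `select_repo` and `FALLBACK_ORDER` are hypothetical names):

```python
from enum import Flag, auto

class Mode(Flag):
    FALLBACK = 0
    INFERENCE = auto()
    TRAINING = auto()
    TORCH_COMPILE = auto()

# Fallback chain for INFERENCE, per the message: INFERENCE | TORCH_COMPILE,
# TRAINING, TRAINING | TORCH_COMPILE, and finally FALLBACK.
FALLBACK_ORDER = {
    Mode.INFERENCE: [
        Mode.INFERENCE,
        Mode.INFERENCE | Mode.TORCH_COMPILE,
        Mode.TRAINING,
        Mode.TRAINING | Mode.TORCH_COMPILE,
        Mode.FALLBACK,
    ],
}

def select_repo(registered, mode):
    """Return the first registered repository compatible with `mode`."""
    for candidate in FALLBACK_ORDER[mode]:
        if candidate in registered:
            return registered[candidate]
    return None

repos = {Mode.TRAINING | Mode.TORCH_COMPILE: "activation-training-optimized"}
print(select_repo(repos, Mode.INFERENCE))  # activation-training-optimized
```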
2025-07-15 16:10:43 +02:00
c7a343f195 Support registering layers with a range of CUDA capabilities (#106)
* Add interval tree implementation

* Support registering layers with a range of CUDA capabilities

This change adds support for registering layers for ranges
of CUDA capabilities. This makes it possible to use newer, faster
kernels for new GPUs, while falling back to another implementation
on older GPUs.

* Add docs for registering kernels with CUDA capabilities

* Fix typing errors
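
A sketch of the capability-range lookup (hypothetical repo names; the actual change uses an interval tree rather than this linear scan):

```python
def select_kernel(ranges, capability):
    """Pick the repo whose [lo, hi] CUDA-capability interval contains `capability`."""
    for (lo, hi), repo in ranges.items():
        if lo <= capability <= hi:
            return repo
    return None

ranges = {
    (7.0, 8.9): "example-org/kernel-older-gpus",   # hypothetical repos
    (9.0, 12.0): "example-org/kernel-newer-gpus",
}
print(select_kernel(ranges, 9.0))  # example-org/kernel-newer-gpus
```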
2025-07-14 16:59:21 +02:00
8d838f947d Fix macOS tests by marking some CUDA-only tests (#105) 2025-07-10 12:24:25 +02:00
fc935d9874 Support registering inference/training-specific layers (#103)
* Support registering inference/training-specific layers

This change makes it possible to register kernels specialized for
inference, training, and/or `torch.compile`. To do so, the mapping
notation is extended to support registering specialized kernels
for a specific 'mode'. For instance, the following mapping,

```python
kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": {
          Mode.DEFAULT: LayerRepository(
              repo_id="kernels-community/activation",
              layer_name="SiluAndMul",
          ),
          Mode.TRAINING | Mode.TORCH_COMPILE: LayerRepository(
              repo_id="kernels-community/activation-training-optimized",
              layer_name="SiluAndMul",
          ),
      }
    }
}
```

uses `kernels-community/activation` by default, but will switch to
using `kernels-community/activation-training-optimized` if a model
is kernelized for training and `torch.compile`.

To make it easier to add more modes in the future and to unify the
`register_kernel_mapping` and `kernelize` signatures, the `training`
and `needs_torch_compile` arguments of `kernelize` are replaced by
a single `mode` argument:

```python
model = MyModel(...)
model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
```

* Documentation fixes

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Add note on when the fallback is used

* Tighten up some Mode checks

* Fix ruff check

* Attempt to fix mypy errors

* More typing fixes

* Ignore Python < 3.11 type check SNAFU

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-07-04 19:57:14 +02:00
3622e1f8dd Add get_local_kernel function (#102)
This function loads a kernel from a local repository (e.g. the output
of kernel-builder), which can be handy for testing.
2025-07-01 13:58:47 +02:00
3a635eaeea Automatic fallback for kernels that don't support training (#90)
For kernels that do not support backward, fall back to the original
implementation if `model.train(True)` is called. This removes the
need for the `needs_backward` argument of `kernelize`.
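
The fallback decision can be sketched as follows (`has_backward` and `pick_forward` are hypothetical names used purely for illustration):

```python
class KernelLayer:
    has_backward = False  # this kernel does not implement a backward pass

def pick_forward(kernel_cls, training, original_forward, kernel_forward):
    # When training with a kernel that lacks backward, fall back to the
    # original implementation instead of failing during autograd.
    if training and not kernel_cls.has_backward:
        return original_forward
    return kernel_forward
```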
2025-06-03 19:13:57 +02:00
32ec496c5a Make the forward pass torch.compile compatible (#87)
* first commit

* style

* update

* fix

* different approach

* Polish kernelize

- Process comment from the PR.
- Replacement should be on instances, not the class.
- Remove torch compile checks (not relevant during kernelize). We
  might add it back in a different way in another commit: add an
  option to `kernelize`.

* Fixup tests

* Fix `torch.compile` support

* Remove some unused code

* Sync the docs

* CI: update Torch versions

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2025-06-03 15:06:02 +02:00
848c6db87b Add support for Metal builds (#89)
* Add support for Metal builds

* Add Metal test, gate tests by OS where necessary
2025-05-30 15:54:28 +02:00
2036892762 Allow layers to opt in to torch.compile (#79)
* Allow layers to opt in to `torch.compile`

This change allows a layer to set the `can_torch_compile` class
variable to indicate that the layer is compatible with `torch.compile`.
When enabled, the layer does not fall back to the original
implementation when `torch.compile` is used.

* Comment fixes

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
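
The opt-in can be sketched as a class variable plus a fallback check (`falls_back` is an illustrative helper, not the library's API):

```python
class HubLayer:
    # Default: a layer is assumed incompatible with torch.compile and
    # falls back to the original implementation when compiling.
    can_torch_compile = False

class CompileFriendlyLayer(HubLayer):
    can_torch_compile = True  # opt in: keep the kernel under torch.compile

def falls_back(layer_cls, compiling):
    """Fall back to the original implementation unless the layer opted in."""
    return compiling and not layer_cls.can_torch_compile
```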
2025-05-06 09:36:33 +02:00
437f910336 Add has_kernel function (#69)
* Add `has_kernel` function

This function checks whether a kernel build exists for the current
environment (Torch version and compute framework).

* Test kernel repo that only contains Torch 2.4
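
A sketch of the existence check, assuming builds live under `build/<variant>/` in the kernel repo (the variant names below are illustrative):

```python
def has_kernel(repo_files, expected_variant):
    """Return True if any file lives under build/<expected_variant>/."""
    prefix = f"build/{expected_variant}/"
    return any(path.startswith(prefix) for path in repo_files)

files = ["build/torch24-cxx11-cu121-x86_64-linux/activation/__init__.py"]
print(has_kernel(files, "torch24-cxx11-cu121-x86_64-linux"))  # True
print(has_kernel(files, "torch26-cxx11-cu124-x86_64-linux"))  # False
```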
2025-04-11 10:12:37 +02:00
1d14abcef0 Do not use kernels without backward when training (#68)
* Do not use kernels without backward when training

* Update repo for backwards marker test
2025-04-11 10:05:57 +02:00
b7d6867c52 use_kernel_mapping: add inherit_mapping option (#55)
`inherit_mapping` defaults to `True` and extends the existing mapping
with the given mapping. If `inherit_mapping` is `False`, existing
mappings are not inherited.
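
The two behaviors amount to extending versus replacing a dict (`merged_mapping` is an illustrative helper, not the library's API):

```python
def merged_mapping(existing, new, inherit_mapping=True):
    """Extend the existing mapping when inheriting, otherwise replace it."""
    return {**existing, **new} if inherit_mapping else dict(new)

existing = {"SiluAndMul": "repo-a"}
print(merged_mapping(existing, {"RMSNorm": "repo-b"}))
# both layers mapped
print(merged_mapping(existing, {"RMSNorm": "repo-b"}, inherit_mapping=False))
# only RMSNorm mapped
```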
2025-03-21 17:28:45 +01:00
9861a5bdef Fix forward positional argument handling (#48) 2025-03-19 15:34:35 +01:00
df45cf2795 Add use_kernel_forward_from_hub decorator (#46)
* Add `use_kernel_forward_from_hub` decorator

This decorator replaces a layer's `forward` with the `forward` of
a layer on the hub.

* Add support for registering a mapping for the duration of a context

This change makes `_KERNEL_MAPPING` a context variable and adds a
`use_kernel_mapping` context manager. This allows users to register
a mapping for the duration of a context.

* Update layer docs

* ruff fix

* Remove an old bit from the docs

* Extend layer mapping example

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Support stringly-typed device type

* Forward-reference `register_kernel_mapping` in monkeypatching section

* Use stringly-typed device name in layer mapping example

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-03-19 11:03:18 +01:00
b6a393612f Pass through locked sha again when loading locked kernels (#42)
This bit got removed accidentally when adding support for universal
kernels. Also add a test to ensure that we'd catch this in the future.
2025-03-10 15:10:47 +01:00
a40756f306 Configure ruff lints and add to CI (#39) 2025-03-07 20:32:44 +01:00
3671158f47 Rename noarch to universal (#38)
Also update docs to mention this variant.
2025-03-07 15:12:44 +01:00
497dffb89e Support kernels that are not pre-compiled (#35)
* Support kernels that are not pre-compiled

This change adds support for kernels that are not precompiled (such as
Triton-based kernels). For Torch, these kernels are assumed to be in
`build/torch-noarch`. Kernel download functions will filter on both
the expected (CUDA) build variant and the `noarch` variant. If a binary
variant exists, it is used. Otherwise the `noarch` variant is used
when present.

We don't append a Torch version, since in most cases the output for
every `ver` in `build/torch<ver>-noarch` would be the same. If some
kernel needs features that are only available in a specific Torch
version, the capabilities can be checked by the kernel itself at
runtime.

* CI: system Python does not have headers installed
2025-03-05 14:05:46 +01:00
4116d6019e hf-kernels -> kernels (#32)
* hf-kernels -> kernels

* Set version to 0.1.7

* hf-kernels.lock -> kernels.lock
2025-02-25 16:13:37 +01:00
bd166b348a Revert "hf-kernels -> kernels"
This reverts commit 386c2a104ef4c251912e63bfcdbfaa588dc09605.
2025-02-25 15:06:35 +01:00
386c2a104e hf-kernels -> kernels 2025-02-25 15:05:38 +01:00
c7516b9e50 Use per-build variant hashes in the lockfile (#29)
This makes the lock file a fair bit shorter than per-file hashes. The
hash is computed from filenames + SHA-1 hash for git objects/SHA-256
hash for LFS files.
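
The idea can be sketched by folding the per-file data into one digest (an illustration only; the message says the real hash mixes filenames with SHA-1 hashes for git objects and SHA-256 hashes for LFS files):

```python
import hashlib

def variant_hash(entries):
    """Combine sorted (filename, per-file hash) pairs into one SHA-256 digest."""
    h = hashlib.sha256()
    for filename, file_hash in sorted(entries):
        h.update(filename.encode())
        h.update(file_hash.encode())
    return h.hexdigest()
```

Sorting first makes the variant hash independent of the order in which files are listed.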
2025-02-25 14:58:03 +01:00
b6ae897c4d Fix all occurrences. 2025-01-20 12:55:22 +01:00
c5ad392b77 fix: adjust example and docker for name 2025-01-15 23:09:28 +00:00
14b9350f3c CI build and test multiple cuda and torch versions (#6)
* feat: add workflow and multistage torch docker builder

* feat: add configurable docker builder workflow

* fix: improve file structure

* fix: improve with pytest

* feat: run tests and benches after build

* fix: fix empty exclude in workflow

* fix: specify dockerfile location

* fix: include subset of combinations of ubuntu 18.04 and cuda 11.8

* fix: improve version syntax

* fix: add support for cuda 11.8 in dockerfile

* fix: pin python version in image from workflow

* fix: syntax tweak python version in dockerfile

* fix: adjust build args in dockerfile

* fix: avoid loading the image and ensure building works
2025-01-15 13:27:39 +01:00