kernels

mirror of https://github.com/huggingface/kernels.git synced 2025-10-21 05:30:30 +08:00

Author	SHA1	Message	Date
drbh	e7173d9a28	fix: lint correction	2025-08-07 18:16:34 +00:00
drbh	e487945cf6	fix: help mypy type checking	2025-08-07 18:14:07 +00:00
drbh	338752c367	fix: improve metaclass repr	2025-08-07 17:59:32 +00:00
drbh	53723ea986	fix: improve name	2025-08-07 17:53:51 +00:00
drbh	95d3a758a9	feat: add custom type for kernel modules	2025-08-07 17:48:29 +00:00
Daniël de Kok	da701bf58a	Small markup fixes of the local kernel repo example (#127 )	2025-08-06 08:02:28 +02:00
Daniël de Kok	703664ed31	Set version to 0.9.0.dev0 (#126 )	2025-08-01 16:37:30 +02:00
Ákos Hadnagy	a8a6564fa7	Add ROCm device discovery (#122 ) * Add ROCm device discovery * Ruff * Address review comments * Ruff * Reorg torch import * Remove redundant import * Apply suggestions from code review Co-authored-by: Daniël de Kok <me@danieldk.eu> * Address review comments * Validate device type * Clean diff * black * Sync test with repo changes * black again --------- Co-authored-by: Daniël de Kok <me@danieldk.eu>	2025-08-01 16:09:45 +02:00
Daniël de Kok	c89e0fa9b9	Nix: go back to hf-nix main (#125 )	2025-08-01 15:56:02 +02:00
Daniël de Kok	176a601178	Run black check (#124 )	2025-08-01 15:42:38 +02:00
Daniël de Kok	cfa0c76ddc	Add `LocalLayerRepository` to load from a local repo (#123 )	2025-08-01 14:03:11 +02:00
Daniël de Kok	bcc29915f9	Log when using fallback layer (#121 )	2025-07-31 17:18:00 +02:00
Daniël de Kok	6fbff7a9cb	Add doc build to CI (#119 ) * Add doc build to CI * Trigger doc build * No path scoping	2025-07-29 16:01:05 +02:00
Daniël de Kok	f7490bd0a9	Test examples in docstrings using mktestdocs (#118 ) Also adjust examples so that they are correct.	2025-07-28 17:31:34 +02:00
Daniël de Kok	8069e3bf0c	Update documentation for compatibility with doc-builder (#117 )	2025-07-24 16:21:54 +02:00
Madeesh Kannan	c540d1e1d6	Fix typo in layers documentation (#116 )	2025-07-23 17:13:14 +02:00
Daniël de Kok	967ac581b8	Set version to 0.8.1.dev0 (#115 )	2025-07-23 14:42:24 +02:00
Daniël de Kok	81088d44e8	Add support for project-wide locking of layers (#114 ) This change adds `LockedLayerRepository` as an alternative to `LayerRepository`. `LockedLayerRepository` allows for locking all kernel layers that are used at the project level. Example usage: ``` with use_kernel_mapping( { "SomeLayer": { "cuda": LockedLayerRepository( repo_id="some-org/some-layer", layer_name="SomeLayer", ) }, } ): layer = kernelize(layer, device="cuda", mode=Mode.INFERENCE) ``` This requires that the project has a `pyproject.toml` with kernel version specifications and `kernel.lock` with the locked kernels.	2025-07-23 09:37:05 +02:00
Daniël de Kok	4a04c005e3	Add version support to `LayerRepository` (#113 ) * Add version support to `LayerRepository` * Remove some docs that do not apply * Removed unused member variable	2025-07-22 17:02:39 +02:00
Wang, Yi	6d3c6daf20	triton based kernel could also run in xpu (#112 ) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2025-07-22 10:03:34 +02:00
Daniël de Kok	071900fd69	`get_kernel`: allow Python-style version specifiers (#111 ) Use Python-style version specifiers to resolve to tags. E.g., given the presence of the tags `v0.1.0`, `v0.1.1`, and `v0.2.0`, get_kernel("my/kernel", version=">=0.1.0,<0.2.0") would resolve to `v0.1.1`.	2025-07-21 17:18:35 +02:00
Daniël de Kok	2d2c6b14e0	Set version to 0.8.0.dev0 (#110 )	2025-07-15 18:45:03 +02:00
Daniël de Kok	03edc573b1	Log kernel layer selection (#109 )	2025-07-15 18:38:17 +02:00
Daniël de Kok	c841a6c90d	Improve mode handling (#108 ) * Set `kernelize` default mode to `Mode.TRAINING | Mode.TORCH_COMPILE` Also update docs and tests. * Rename `Mode.DEFAULT` to `Mode.FALLBACK` * More fine-grained fallbacks For instance, INFERENCE can fall back to INFERENCE | TORCH_COMPILE, TRAINING, TRAINING | TORCH_COMPILE, and FALLBACK. * Update documentation for mode fallback * Mention that you can rerun `kernelize` to change the mode	2025-07-15 16:10:43 +02:00
Daniël de Kok	c7a343f195	Support registering layers with a range of CUDA capabilities (#106 ) * Add interval tree implementation * Support registering layers with a range of CUDA capabilities This change adds support for registering layers for ranges of CUDA capabilities. This makes it possible to use newer, faster kernels for new GPUs, while falling back to another implementation on older GPUs. * Add docs for registering kernels with CUDA capabilities * Fix typing errors	2025-07-14 16:59:21 +02:00
Daniël de Kok	8d838f947d	Fix macOS tests by marking some CUDA-only tests (#105 )	2025-07-10 12:24:25 +02:00
Daniël de Kok	b87e6fadbe	Set version to 0.7.0.dev0 (#104 )	2025-07-07 14:56:43 +02:00
Daniël de Kok	fc935d9874	Support registering inference/training-specific layers (#103 ) * Support registering inference/training-specific layers This change makes it possible to register kernels specialized for inference, training, and/or `torch.compile`. To do so, the mapping notation is extended to support registering specialized kernels for a specific 'mode'. For instance, the following mapping, ```python kernel_layer_mapping = { "SiluAndMul": { "cuda": { Mode.DEFAULT: LayerRepository( repo_id="kernels-community/activation", layer_name="SiluAndMul", ), Mode.TRAINING | Mode.TORCH_COMPILE: LayerRepository( repo_id="kernels-community/activation-training-optimized", layer_name="SiluAndMul", ), } } } ``` uses `kernels-community/activation` by default, but will switch to using `kernels-community/activation-training-optimized` if a model is kernelized for training and `torch.compile`. To make it easier to add more modes in the future and to unify the `register_kernel_mapping` and `kernelize` signatures, the `training` and `needs_torch_compile` arguments of `kernelize` are replaced by a single `mode` argument: ```python model = MyModel(...) model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE) ``` * Documentation fixes Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Add note on when the fallback is used * Tighten up some Mode checks * Fix ruff check * Attempt to fix mypy errors * More typing fixes * Ignore Python < 3.11 type check SNAFU --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-07-04 19:57:14 +02:00
Daniël de Kok	3622e1f8dd	Add `get_local_kernel` function (#102 ) This function loads a kernel from a local repository (e.g. the output of kernel-builder), which can be handy for testing.	2025-07-01 13:58:47 +02:00
Daniël de Kok	a7f3b2e8ed	Set version to 0.6.2.dev0 (#100 )	2025-06-25 09:48:09 +02:00
Daniël de Kok	a6ab5d83ba	Make the flake work on Darwin (#98 )	2025-06-24 20:35:21 +02:00
Daniël de Kok	4f9f1abfb9	darwin: fix variant CPU for aarch64 (#97 )	2025-06-24 20:35:07 +02:00
Daniël de Kok	f94b7780a6	CI: main triton-layer-norm has docs, branch is gone (#99 )	2025-06-24 16:40:36 +02:00
Daniël de Kok	bd28883775	Set version to 0.6.1.dev1 (#96 )	2025-06-20 11:43:26 +02:00
Daniël de Kok	498429e322	Add README generation for layers (#94 )	2025-06-20 10:16:50 +02:00
Daniël de Kok	09c991af4b	Add macOS requirements (#95 )	2025-06-16 17:20:47 +02:00
Daniël de Kok	bcf8df5875	Bump version to 0.6.0.dev0 (#93 )	2025-06-04 13:59:32 +02:00
Daniël de Kok	239afff6f5	Update Nix flake dependencies (#92 ) * Update Nix flake dependencies To ensure that we can test with Torch 2.7 kernels in the development environment. * Update nix fmt to use nixfmt-tree	2025-06-04 12:13:19 +02:00
Daniël de Kok	c5ec6b900a	Hotfix: add FAQ (#91 )	2025-06-04 09:52:39 +02:00
Daniël de Kok	3a635eaeea	Automatic fallback for kernels that don't support training (#90 ) For kernels that do not support backward, fall back to the original implementation if `model.train(True)` is called. This removes the need for the `needs_backward` argument of `kernelize`.	2025-06-03 19:13:57 +02:00
Mohamed Mekkouri	32ec496c5a	Make the forward pass `torch.compile` compatible (#87 ) * first commit * style * update * fix * different approach * Polish kernelize - Process comment from the PR. - Replacement should be on instances, not the class. - Remove torch compile checks (not relevant during kernelize). We might add it back in a different way in another commit: add an option to `kernelize`. * Fixup tests * Fix `torch.compile` support * Remove some unused code * Sync the docs * CI: update Torch versions --------- Co-authored-by: Daniël de Kok <me@danieldk.eu>	2025-06-03 15:06:02 +02:00
Daniël de Kok	848c6db87b	Add support for Metal builds (#89 ) * Add support for Metal builds * Add Metal test, gate tests by OS where necessary	2025-05-30 15:54:28 +02:00
Daniël de Kok	fabb8c52d1	Add generate-readme subcommand for generating a README (#88 ) * Add `generate-readme` subcommand for generating a README This README includes all the top-level functions with docs (if docstrings are available). * CI: attempt README generation * Add PyYAML dependencies * Typing fixes	2025-05-21 15:43:53 +02:00
Daniël de Kok	d66260dd83	kernels: add the `to-wheel` subcommand (#84 ) * kernels: add the `to-wheel` subcommand This subcommand accepts a kernel repo and version as arguments: kernels to-wheel kernels-community/activation 0.0.3 Wheels will then be generated for every build variant. * CI: check kernel -> wheel conversion * No typing for wheel.wheelfile	2025-05-08 17:30:06 +02:00
Daniël de Kok	daac8078fc	CI: fix some stubs (#83 )	2025-05-07 14:43:57 +02:00
Daniël de Kok	fcb9a80ce6	Set version to 0.5.0 (#82 ) v0.5.0	2025-05-06 11:45:26 +02:00
Daniël de Kok	c25bb32e6e	Add publishing workflow (#81 )	2025-05-06 09:29:08 +00:00
Daniël de Kok	2036892762	Allow layers to opt in to `torch.compile` (#79 ) * Allow layers to opt in to `torch.compile` This change allows a layer to set the `can_torch_compile` class variable to indicate that the layer is compatible with `torch.compile`. When enabled, the layer does not fall back to the original implementation when `torch.compile` is used. * Comment fixes Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-05-06 09:36:33 +02:00
Daniël de Kok	0f0de049cf	docs: link to autogenerated build variant list (#77 )	2025-04-16 17:25:51 +02:00
Daniël de Kok	59597df03e	Specify required aarch64 build variants (#76 )	2025-04-15 16:09:44 +02:00


152 Commits
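
Commit 071900fd69 describes resolving a Python-style version specifier to a tag: given `v0.1.0`, `v0.1.1`, and `v0.2.0`, the specifier `>=0.1.0,<0.2.0` resolves to `v0.1.1`. The following is a minimal standard-library sketch of that selection rule, not the library's actual implementation. The helper names `parse_version`, `matches`, and `resolve_tag` are hypothetical, and only plain dotted numeric versions with simple comparison operators are handled.

```python
import operator

# Comparison operators supported by this simplified sketch.
# Two-character symbols come first so ">=" is not mistaken for ">".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn 'v0.1.1' or '0.1.1' into a comparable tuple such as (0, 1, 1)."""
    if text.startswith("v"):
        text = text[1:]
    return tuple(int(part) for part in text.split("."))


def matches(version, specifier):
    """Return True if the version tuple satisfies every comma-separated clause."""
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS.items():
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):].strip())
                if not compare(version, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def resolve_tag(tags, specifier):
    """Pick the highest tag whose version satisfies the specifier."""
    candidates = [tag for tag in tags if matches(parse_version(tag), specifier)]
    if not candidates:
        raise ValueError(f"no tag satisfies {specifier!r}")
    return max(candidates, key=parse_version)


print(resolve_tag(["v0.1.0", "v0.1.1", "v0.2.0"], ">=0.1.0,<0.2.0"))
```

With the tags from the commit message this selects `v0.1.1`, the highest version inside the requested range. The sketch compares versions as integer tuples, so it does not cover pre-releases, `~=`, wildcards, or versions with differing numbers of components the way full PEP 440 handling would.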