* Improve errors for layer validation
Include the repo and layer name as well as the name of the class
that is being compared to (when applicable).
* Remove upload xfail
* Only enable tests that require a token with `--token`
* add an uploading utility.
* format
* remove stale files.
* black format
* sorted imports.
* up
* up
* add a test
* propagate.
* remove duplicate imports.
* Apply suggestions from code review
Co-authored-by: Daniël de Kok <me@danieldk.eu>
* up
* up
* up
* command to format all files at once would be nice.
* up
* up
* up
* Use token for upload test
* assign env better.
* docs
* polish
* up
* xfail the test for now.
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
This change adds `LockedLayerRepository` as an alternative to
`LayerRepository`. With `LockedLayerRepository`, all kernel layers
that are used can be locked at the project level. Example usage:
```python
with use_kernel_mapping(
    {
        "SomeLayer": {
            "cuda": LockedLayerRepository(
                repo_id="some-org/some-layer",
                layer_name="SomeLayer",
            )
        },
    }
):
    layer = kernelize(layer, device="cuda", mode=Mode.INFERENCE)
```
This requires that the project has a `pyproject.toml` with kernel
version specifications and `kernel.lock` with the locked kernels.
Use Python-style version specifiers to resolve to tags. E.g., given
the presence of the tags `v0.1.0`, `v0.1.1`, and `v0.2.0`,
`get_kernel("my/kernel", version=">=0.1.0,<0.2.0")` would resolve
to `v0.1.1`.
* Set `kernelize` default mode to `Mode.TRAINING | Mode.TORCH_COMPILE`
Also update docs and tests.
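With the new default, the following two calls are equivalent:
```python
model = kernelize(model)
model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
```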
* Rename `Mode.DEFAULT` to `Mode.FALLBACK`
* More fine-grained fallbacks
For instance, INFERENCE can fall back to INFERENCE | TORCH_COMPILE,
TRAINING, TRAINING | TORCH_COMPILE, and FALLBACK.
* Update documentation for mode fallback
* Mention that you can rerun `kernelize` to change the mode
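For example:
```python
model = kernelize(model, mode=Mode.INFERENCE)
# Re-running kernelize replaces the kernels to match the new mode.
model = kernelize(model, mode=Mode.TRAINING)
```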
* Add interval tree implementation
* Support registering layers with a range of CUDA capabilities
This change adds support for registering layers for ranges
of CUDA capabilities. This makes it possible to use newer, faster
kernels for new GPUs, while falling back to another implementation
on older GPUs.
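A sketch of what such a mapping could look like. The repo names are
hypothetical, and this assumes a `Device` mapping key that carries
`CUDAProperties` with `min_capability`/`max_capability` bounds:
```python
import sys

kernel_layer_mapping = {
    "SiluAndMul": {
        # Use one kernel on capability 7.5-8.9 GPUs...
        Device(
            type="cuda",
            properties=CUDAProperties(min_capability=75, max_capability=89),
        ): LayerRepository(
            repo_id="kernels-community/activation",
            layer_name="SiluAndMul",
        ),
        # ...and a faster one on capability 9.0 and newer.
        Device(
            type="cuda",
            properties=CUDAProperties(min_capability=90, max_capability=sys.maxsize),
        ): LayerRepository(
            repo_id="kernels-community/activation-hopper",
            layer_name="SiluAndMul",
        ),
    }
}
```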
* Add docs for registering kernels with CUDA capabilities
* Fix typing errors
* Support registering inference/training-specific layers
This change makes it possible to register kernels specialized for
inference, training, and/or `torch.compile`. To do so, the mapping
notation is extended to support registering specialized kernels
for a specific 'mode'. For instance, the following mapping,
```python
kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": {
            Mode.DEFAULT: LayerRepository(
                repo_id="kernels-community/activation",
                layer_name="SiluAndMul",
            ),
            Mode.TRAINING | Mode.TORCH_COMPILE: LayerRepository(
                repo_id="kernels-community/activation-training-optimized",
                layer_name="SiluAndMul",
            ),
        }
    }
}
```
uses `kernels-community/activation` by default, but will switch to
using `kernels-community/activation-training-optimized` if a model
is kernelized for training and `torch.compile`.
To make it easier to add more modes in the future and to unify the
`register_kernel_mapping` and `kernelize` signatures, the `training`
and `needs_torch_compile` arguments of `kernelize` are replaced by
a single `mode` argument:
```python
model = MyModel(...)
model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
```
* Documentation fixes
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Add note on when the fallback is used
* Tighten up some Mode checks
* Fix ruff check
* Attempt to fix mypy errors
* More typing fixes
* Ignore Python < 3.11 type check SNAFU
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
For kernels that do not support backward, fall back to the original
implementation if `model.train(True)` is called. This removes the
need for the `needs_backward` argument of `kernelize`.
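A sketch of the resulting behavior:
```python
model = kernelize(model)
# Layers whose kernels do not support backward now transparently fall
# back to the original forward once the model enters training mode.
model.train(True)
```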
* first commit
* style
* update
* fix
* different approach
* Polish kernelize
- Address a comment from the PR.
- Replacement should be on instances, not the class.
- Remove torch compile checks (not relevant during kernelize). We
  might add them back in a different way in another commit, e.g. as
  an option to `kernelize`.
* Fixup tests
* Fix `torch.compile` support
* Remove some unused code
* Sync the docs
* CI: update Torch versions
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
* Allow layers to opt in to `torch.compile`
This change allows a layer to set the `can_torch_compile` class
variable to indicate that the layer is compatible with `torch.compile`.
When enabled, the layer does not fall back to the original
implementation when `torch.compile` is used.
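A minimal sketch of a layer opting in (the `forward` body is illustrative):
```python
import torch.nn as nn
import torch.nn.functional as F

class SiluAndMul(nn.Module):
    # Opt in: this layer is compatible with torch.compile, so it is not
    # swapped back to the original implementation when compilation is used.
    can_torch_compile: bool = True

    def forward(self, x):
        d = x.shape[-1] // 2
        return F.silu(x[..., :d]) * x[..., d:]
```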
* Comment fixes
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Add `has_kernel` function
This function checks whether a kernel build exists for the current
environment (Torch version and compute framework).
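For example, a sketch that probes a (hypothetical) repo before loading it:
```python
from kernels import get_kernel, has_kernel

# Only fetch the kernel if a build exists for the current Torch
# version and compute framework; otherwise keep the eager fallback.
if has_kernel("kernels-community/activation"):
    activation = get_kernel("kernels-community/activation")
```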
* Test kernel repo that only contains Torch 2.4
`inherit_mapping=True` is the default and extends the existing mapping
with the given mapping. If `inherit_mapping` is `False`, existing
mappings are not inherited.
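For example, a sketch that replaces rather than extends the active
mapping for the duration of a context (reusing the mapping notation
from above):
```python
with use_kernel_mapping(kernel_layer_mapping, inherit_mapping=False):
    # Only the layers in kernel_layer_mapping are mapped here;
    # previously registered mappings are ignored.
    model = kernelize(model)
```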
* Add `use_kernel_forward_from_hub` decorator
This decorator replaces a layer's `forward` with the `forward` of
a layer on the hub.
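A minimal sketch (the layer name and `forward` body are illustrative):
```python
import torch.nn as nn
import torch.nn.functional as F
from kernels import use_kernel_forward_from_hub

# Replace this layer's forward with the forward of the hub layer
# registered under the name "SiluAndMul".
@use_kernel_forward_from_hub("SiluAndMul")
class SiluAndMul(nn.Module):
    def forward(self, x):
        d = x.shape[-1] // 2
        return F.silu(x[..., :d]) * x[..., d:]
```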
* Add support for registering a mapping for the duration of a context
This change makes `_KERNEL_MAPPING` a context variable and adds a
`use_kernel_mapping` context manager. This allows users to register
a mapping for the duration of a context.
* Update layer docs
* ruff fix
* Remove an old bit from the docs
* Extend layer mapping example
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Support stringly-typed device type
* Forward-reference `register_kernel_mapping` in monkeypatching section
* Use stringly-typed device name in layer mapping example
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Support kernels that are not pre-compiled
This change adds support for kernels that are not pre-compiled (such as
Triton-based kernels). For Torch, these kernels are assumed to be in
`build/torch-noarch`. Kernel download functions will filter on both
the expected (CUDA) build variant and the `noarch` variant. If a binary
variant exists, it is used. Otherwise the `noarch` variant is used
when present.
We don't append a Torch version, since in most cases the output for
every `ver` in `build/torch<ver>-noarch` would be the same. If some
kernel needs features that are only available in a specific Torch
version, the capabilities can be checked by the kernel itself at
runtime.
* CI: system Python does not have headers installed
This makes the lock file a fair bit shorter than with per-file hashes.
The hash is computed from the filenames plus the SHA-1 hashes for git
objects and the SHA-256 hashes for LFS files.
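A hypothetical sketch of the scheme (names and byte layout are
illustrative, not the actual implementation):
```python
import hashlib

# files: iterable of (filename, object_hash) pairs, where object_hash
# is the SHA-1 for plain git objects and the SHA-256 for LFS files.
def repo_content_hash(files):
    digest = hashlib.sha256()
    for filename, object_hash in sorted(files):
        digest.update(filename.encode())
        digest.update(object_hash.encode())
    return digest.hexdigest()
```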
* feat: add workflow and multistage torch docker builder
* feat: add configurable docker builder workflow
* fix: improve file structure
* fix: improve with pytest
* feat: run tests and benches after build
* fix: fix empty exclude in workflow
* fix: specify dockerfile location
* fix: include subset of combinations of ubuntu 18.04 and cuda 11.8
* fix: improve version syntax
* fix: add support for cuda 11.8 in dockerfile
* fix: pin python version in image from workflow
* fix: syntax tweak python version in dockerfile
* fix: adjust build args in dockerfile
* fix: avoid loading the image and ensure building works