* Improve errors for layer validation
Include the repo and layer name as well as the name of the class
that is being compared to (when applicable).
* Remove upload xfail
* Only enable tests that require a token with `--token`
* add an uploading utility.
* format
* remove stale files.
* black format
* sorted imports.
* up
* up
* add a test
* propagate.
* remove duplicate imports.
* Apply suggestions from code review
Co-authored-by: Daniël de Kok <me@danieldk.eu>
* up
* up
* up
* command to format all files at once would be nice.
* up
* up
* up
* Use token for upload test
* assign env better.
* docs
* polish
* up
* xfail the test for now.
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
This change adds `LockedLayerRepository` as an alternative to
`LayerRepository`. `LockedLayerRepository` allows for locking all kernel
layers that are used at the project level. Example usage:
```python
with use_kernel_mapping(
    {
        "SomeLayer": {
            "cuda": LockedLayerRepository(
                repo_id="some-org/some-layer",
                layer_name="SomeLayer",
            )
        },
    }
):
    layer = kernelize(layer, device="cuda", mode=Mode.INFERENCE)
```
This requires that the project has a `pyproject.toml` with kernel
version specifications and `kernel.lock` with the locked kernels.
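As a rough illustration, the version specification might look like the following (the table name and schema here are assumptions for illustration, not taken from this changelog):

```toml
# pyproject.toml — hypothetical kernel version specification
[tool.kernels.dependencies]
"some-org/some-layer" = ">=0.1.0,<0.2.0"
```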
Use Python-style version specifiers to resolve to tags. E.g., given
the presence of the tags `v0.1.0`, `v0.1.1`, and `v0.2.0`,
`get_kernel("my/kernel", version=">=0.1.0,<0.2.0")`
would resolve to `v0.1.1`.
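The resolution rule (newest tag that satisfies the specifier) can be sketched with the `packaging` library; the helper below is illustrative, not the library's internal code:

```python
# Sketch: resolve a Python-style version specifier to the newest
# matching repository tag.
from packaging.specifiers import SpecifierSet
from packaging.version import Version


def resolve_tag(tags, requirement):
    """Return the highest tag whose version satisfies the specifier."""
    spec = SpecifierSet(requirement)
    # Tags use a `v` prefix, e.g. `v0.1.1`; strip it for parsing.
    candidates = [Version(t.removeprefix("v")) for t in tags]
    matching = sorted(v for v in candidates if v in spec)
    if not matching:
        raise ValueError(f"no tag matches {requirement}")
    return f"v{matching[-1]}"


print(resolve_tag(["v0.1.0", "v0.1.1", "v0.2.0"], ">=0.1.0,<0.2.0"))
# → v0.1.1
```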
* Set `kernelize` default mode to `Mode.TRAINING | Mode.TORCH_COMPILE`
Also update docs and tests.
* Rename `Mode.DEFAULT` to `Mode.FALLBACK`
* More fine-grained fallbacks
For instance, INFERENCE can fall back to INFERENCE | TORCH_COMPILE,
TRAINING, TRAINING | TORCH_COMPILE, and FALLBACK.
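The fallback order can be sketched as an ordered candidate list per requested mode. The `Mode` values follow the changelog, but the lookup helper and the exact chain are assumptions for illustration:

```python
# Illustrative sketch of mode fallback resolution, not the library's code.
from enum import Flag, auto


class Mode(Flag):
    FALLBACK = 0
    INFERENCE = auto()
    TRAINING = auto()
    TORCH_COMPILE = auto()


# INFERENCE falls back to INFERENCE | TORCH_COMPILE, TRAINING,
# TRAINING | TORCH_COMPILE, and finally FALLBACK.
_FALLBACK_CHAIN = {
    Mode.INFERENCE: [
        Mode.INFERENCE,
        Mode.INFERENCE | Mode.TORCH_COMPILE,
        Mode.TRAINING,
        Mode.TRAINING | Mode.TORCH_COMPILE,
        Mode.FALLBACK,
    ],
}


def select_repository(registered, mode):
    """Pick the first registered kernel along the fallback chain."""
    for candidate in _FALLBACK_CHAIN.get(mode, [mode, Mode.FALLBACK]):
        if candidate in registered:
            return registered[candidate]
    return None
```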
* Update documentation for mode fallback
* Mention that you can rerun `kernelize` to change the mode
* Add interval tree implementation
* Support registering layers with a range of CUDA capabilities
This change adds support for registering layers for ranges
of CUDA capabilities. This makes it possible to use newer, faster
kernels for new GPUs, while falling back to another implementation
on older GPUs.
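The range lookup can be sketched as a simple interval search over compute capabilities (the change uses an interval tree; a linear scan over hypothetical repo names shows the same idea):

```python
# Hypothetical sketch: pick the kernel registered for the interval that
# contains the device's CUDA compute capability.
def kernel_for_capability(intervals, capability):
    """intervals: list of ((lo, hi), repo) with inclusive bounds."""
    for (lo, hi), repo in intervals:
        if lo <= capability <= hi:
            return repo
    return None


intervals = [
    ((9.0, 10.0), "hopper-optimized-kernel"),  # newer GPUs
    ((7.0, 8.9), "generic-cuda-kernel"),       # older GPUs
]
```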
* Add docs for registering kernels with CUDA capabilities
* Fix typing errors
* Support registering inference/training-specific layers
This change makes it possible to register kernels specialized for
inference, training, and/or `torch.compile`. To do so, the mapping
notation is extended to support registering specialized kernels
for a specific 'mode'. For instance, the following mapping,
```python
kernel_layer_mapping = {
    "SiluAndMul": {
        "cuda": {
            Mode.DEFAULT: LayerRepository(
                repo_id="kernels-community/activation",
                layer_name="SiluAndMul",
            ),
            Mode.TRAINING | Mode.TORCH_COMPILE: LayerRepository(
                repo_id="kernels-community/activation-training-optimized",
                layer_name="SiluAndMul",
            ),
        }
    }
}
```
uses `kernels-community/activation` by default, but will switch to
using `kernels-community/activation-training-optimized` if a model
is kernelized for training and `torch.compile`.
To make it easier to add more modes in the future and to unify the
`register_kernel_mapping` and `kernelize` signatures, the `training`
and `needs_torch_compile` arguments of `kernelize` are replaced by
a single `mode` argument:
```python
model = MyModel(...)
model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
```
* Documentation fixes
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Add note on when the fallback is used
* Tighten up some Mode checks
* Fix ruff check
* Attempt to fix mypy errors
* More typing fixes
* Ignore Python < 3.11 type check SNAFU
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
For kernels that do not support backward, fall back to the original
implementation if `model.train(True)` is called. This removes the
need for the `needs_backward` argument of `kernelize`.
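The dispatch idea can be sketched as a forward wrapper that checks the training flag at call time (all names here are illustrative, not the library's API):

```python
# Hypothetical sketch of training-time fallback for kernels that do not
# implement backward.
class KernelizedForward:
    def __init__(self, original_forward, kernel_forward, has_backward):
        self.original_forward = original_forward
        self.kernel_forward = kernel_forward
        self.has_backward = has_backward
        self.training = False  # toggled by model.train()/model.eval()

    def __call__(self, *args, **kwargs):
        # In training mode, only use the kernel if it supports backward;
        # otherwise fall back to the original implementation.
        if self.training and not self.has_backward:
            return self.original_forward(*args, **kwargs)
        return self.kernel_forward(*args, **kwargs)
```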
* first commit
* style
* update
* fix
* different approach
* Polish kernelize
- Process comment from the PR.
- Replacement should be on instances, not the class.
- Remove torch compile checks (not relevant during kernelize). We
might add it back in a different way in another commit: add an
option to `kernelize`.
* Fixup tests
* Fix `torch.compile` support
* Remove some unused code
* Sync the docs
* CI: update Torch versions
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>