# Mamba

This model was released on 2023-12-01 and added to Hugging Face Transformers on 2024-03-05. It was contributed by [ArthurZ](https://huggingface.co/ArthurZ).
[Mamba](https://huggingface.co/papers/2312.00752) is a state-space model (SSM) architecture designed to address the computational inefficiency of Transformers on long sequences. It uses selective state spaces, in which the state-space parameters are functions of the input, so the model can selectively propagate or forget information along the sequence and perform content-based reasoning. Despite this input dependence, Mamba retains linear scaling in sequence length and fast inference, with up to 5× higher generation throughput than Transformers. The model stacks hardware-efficient mixer layers, which play the same role as attention layers. Mamba outperforms Transformers of the same size and matches Transformers twice its size across modalities including language, audio, and genomics.
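To make the "selective state space" idea concrete, the sketch below is a minimal, unoptimized reference of the underlying recurrence. It assumes simplified tensor names and shapes, and omits the convolution, gating, and fused CUDA kernels used by the real model.

```python
import torch

def selective_scan(x, A, B, C, delta):
    """Unoptimized reference recurrence (conceptual sketch only).
    x: (batch, length, d_model)        input sequence
    A: (d_model, d_state)              state transition (continuous-time)
    B, C: (batch, length, d_state)     input-dependent projections ("selection")
    delta: (batch, length, d_model)    input-dependent step size
    """
    batch, length, d_model = x.shape
    h = x.new_zeros(batch, d_model, A.shape[-1])   # recurrent state
    outputs = []
    for t in range(length):
        # Discretize A and B with the per-token step size.
        dA = torch.exp(delta[:, t, :, None] * A)            # (batch, d_model, d_state)
        dB = delta[:, t, :, None] * B[:, t, None, :]        # (batch, d_model, d_state)
        # Selective update: how much old state is kept vs. new input written.
        h = dA * h + dB * x[:, t, :, None]
        # Read out the state with the input-dependent C.
        outputs.append((h * C[:, t, None, :]).sum(-1))      # (batch, d_model)
    return torch.stack(outputs, dim=1)                      # (batch, length, d_model)

# Illustrative shapes only.
x = torch.randn(2, 16, 64)
A = -torch.rand(64, 16)
B = torch.randn(2, 16, 16)
C = torch.randn(2, 16, 16)
delta = torch.rand(2, 16, 64)
y = selective_scan(x, A, B, C, delta)  # (2, 16, 64)
```

Because `delta`, `B`, and `C` depend on the input, each token can amplify or suppress its contribution to the state, which is what "selective" refers to; the released implementation fuses this loop into custom CUDA kernels for speed (see the usage tips below).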
The example below generates text with the [`pipeline`] API:

```python
import torch
from transformers import pipeline

pipeline = pipeline(task="text-generation", model="state-spaces/mamba-130m-hf", dtype="auto")
pipeline("Plants create energy through a process known as photosynthesis.")
```
The same generation can be run with [`AutoModelForCausalLM`]:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-130m-hf", dtype="auto")

inputs = tokenizer("Plants create energy through a process known as photosynthesis.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```
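For larger checkpoints, quantization can reduce memory use. The snippet below is a sketch, assuming the `bitsandbytes` package is installed and targeting the larger `state-spaces/mamba-2.8b-hf` checkpoint; the bfloat16 compute dtype is likewise an illustrative choice rather than a requirement.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Sketch: 4-bit quantization of the linear projections via bitsandbytes
# (requires `pip install bitsandbytes`); checkpoint and dtype are assumptions.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-2.8b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "state-spaces/mamba-2.8b-hf",
    quantization_config=quantization_config,
    device_map="auto",
)

inputs = tokenizer("Plants create energy through a process known as photosynthesis.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```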
## Usage tips
- The current implementation uses the original CUDA kernels. The FlashAttention-equivalent implementation for Mamba is hosted in the [`mamba-ssm`](https://github.com/state-spaces/mamba) and [`causal_conv1d`](https://github.com/Dao-AILab/causal-conv1d) repositories. Install them if your hardware supports them; otherwise the model falls back to a slower pure-PyTorch path.
- Mamba stacks mixer layers, which play the same role as attention layers. The main logic of Mamba lives in the [`MambaMixer`] class (see the snippet after this list).
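A quick way to see these mixer layers is to print the loaded model's modules. The attribute path below (`backbone.layers[*].mixer`) reflects the current Transformers implementation but may differ across versions.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

# Each block wraps a MambaMixer (causal conv1d + selective SSM + gating).
print(model.backbone.layers[0].mixer)

# Or print the whole model to see the stacked blocks:
# print(model)
```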
## MambaConfig

[[autodoc]] MambaConfig

## MambaModel

[[autodoc]] MambaModel
    - forward
## MambaForCausalLM

[[autodoc]] MambaForCausalLM
    - forward
## MambaCache

[[autodoc]] MambaCache