Mirror of https://github.com/vllm-project/vllm.git (synced 2025-10-20 14:53:52 +08:00)
[Doc][TPU] Add models and features supporting matrix. (#20230)
Signed-off-by: Qiliang Cui <cuiq@google.com>
@@ -39,6 +39,7 @@ nav:
       - models/generative_models.md
       - models/pooling_models.md
       - models/extensions
+      - Hardware Supported Models: models/hardware_supported_models
   - Features:
     - features/compatibility_matrix.md
     - features/*
@@ -59,23 +59,23 @@ th:not(:first-child) {
 
 ## Feature x Hardware
 
-| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD |
-|-----------------------------------------------------------|--------------------|----------|----------|-------|----------|--------------------|-------|
-| [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Prompt Adapter">prmpt adptr</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ |
-| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
-| <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ |
-| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
-| <abbr title="Multimodal Inputs">mm</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
-| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ |
-| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU |
+|-----------------------------------------------------------|---------------------|-----------|-----------|--------|------------|--------------------|--------|-----|
+| [CP][chunked-prefill] | [❌](gh-issue:2729) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [APC][automatic-prefix-caching] | [❌](gh-issue:3687) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| [LoRA][lora-adapter] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| <abbr title="Prompt Adapter">prmpt adptr</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8475) | ✅ | ❌ |
+| [SD][spec-decode] | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
+| <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ |
+| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
+| <abbr title="Multimodal Inputs">mm</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
+| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | [❌](gh-issue:8477) | ✅ | ❌ |
+| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
 
 !!! note
     Please refer to [Feature support through NxD Inference backend][feature-support-through-nxd-inference-backend] for features supported on AWS Neuron hardware.
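The matrix rows map onto engine-level switches in vLLM. As a minimal sketch of how a row translates into practice, the snippet below enables chunked prefill and automatic prefix caching, which the updated table marks ✅ for Turing-class GPUs and newer; the model name and prompt are placeholders, and `enable_chunked_prefill` / `enable_prefix_caching` are the engine arguments behind the CP and APC rows:

```python
# Minimal sketch: turning on features the matrix marks ✅ for
# Turing-and-newer GPUs. Model and prompt are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    enable_chunked_prefill=True,   # CP row: ❌ on Volta (gh-issue 2729), ✅ elsewhere
    enable_prefix_caching=True,    # APC row: ❌ on Volta (gh-issue 3687), ✅ elsewhere
)

# The logP row is ✅ across the GPU columns, so requesting logprobs here is safe.
params = SamplingParams(max_tokens=64, logprobs=1)
outputs = llm.generate(["What does chunked prefill do?"], params)
print(outputs[0].outputs[0].text)
```

On hardware where a row shows ❌ (for example CUDA graph on CPU, or spec decode on TPU), the corresponding flag should simply be left off.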
docs/models/hardware_supported_models/tpu.md (new file, 36 lines)
@@ -0,0 +1,36 @@
+---
+title: TPU
+---
+[](){ #tpu-supported-models }
+
+# TPU Supported Models
+
+## Text-only Language Models
+
+| Model | Architecture | Supported |
+|-----------------------------------------------------|--------------------------------|-----------|
+| mistralai/Mixtral-8x7B-Instruct-v0.1 | MixtralForCausalLM | 🟨 |
+| mistralai/Mistral-Small-24B-Instruct-2501 | MistralForCausalLM | ✅ |
+| mistralai/Codestral-22B-v0.1 | MistralForCausalLM | ✅ |
+| mistralai/Mixtral-8x22B-Instruct-v0.1 | MixtralForCausalLM | ❌ |
+| meta-llama/Llama-3.3-70B-Instruct | LlamaForCausalLM | ✅ |
+| meta-llama/Llama-3.1-8B-Instruct | LlamaForCausalLM | ✅ |
+| meta-llama/Llama-3.1-70B-Instruct | LlamaForCausalLM | ✅ |
+| meta-llama/Llama-4-* | Llama4ForConditionalGeneration | ❌ |
+| microsoft/Phi-3-mini-128k-instruct | Phi3ForCausalLM | 🟨 |
+| microsoft/phi-4 | Phi3ForCausalLM | ❌ |
+| google/gemma-3-27b-it | Gemma3ForConditionalGeneration | 🟨 |
+| google/gemma-3-4b-it | Gemma3ForConditionalGeneration | ❌ |
+| deepseek-ai/DeepSeek-R1 | DeepseekV3ForCausalLM | ❌ |
+| deepseek-ai/DeepSeek-V3 | DeepseekV3ForCausalLM | ❌ |
+| RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 | LlamaForCausalLM | ✅ |
+| RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 | LlamaForCausalLM | ✅ |
+| Qwen/Qwen3-8B | Qwen3ForCausalLM | ✅ |
+| Qwen/Qwen3-32B | Qwen3ForCausalLM | ✅ |
+| Qwen/Qwen2.5-7B-Instruct | Qwen2ForCausalLM | ✅ |
+| Qwen/Qwen2.5-32B | Qwen2ForCausalLM | ✅ |
+| Qwen/Qwen2.5-14B-Instruct | Qwen2ForCausalLM | ✅ |
+| Qwen/Qwen2.5-1.5B-Instruct | Qwen2ForCausalLM | 🟨 |
+
+✅ Runs and is optimized.
+🟨 Runs and produces correct output, but is not yet optimized.
+❌ Does not pass accuracy tests or does not run.
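As a quick sanity check of one of the ✅ entries, a sketch along the following lines can be run on a TPU VM with the TPU build of vLLM installed. The parallelism and context-length values are illustrative placeholders and should be sized to the actual TPU slice:

```python
# Minimal sketch: loading a model the table marks ✅ on a TPU host.
# tensor_parallel_size and max_model_len are placeholders; size them
# to the TPU slice actually available.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-8B",    # marked ✅ in the table above
    tensor_parallel_size=1,   # match the number of TPU chips in use
    max_model_len=4096,
)

outputs = llm.generate(["Hello from a TPU!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

Models marked 🟨 should load the same way but may fall short of GPU-class throughput until they are optimized; ❌ models are expected to fail to run or to fail accuracy tests.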