mirror of https://github.com/vllm-project/vllm.git synced 2025-10-20 23:03:52 +08:00


Harry Mellor e09d1753ec Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

2025-10-08 10:40:42 -07:00

auto_tune

[V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts (#26336)

2025-10-07 16:46:44 +08:00

cutlass_benchmarks

[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696)

2025-10-02 19:35:13 +00:00

disagg_benchmarks

[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)

2025-10-02 10:04:57 -07:00

fused_kernels

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

kernels

[Benchmarks] Fix imports in FP8 tuning script (#26407)

2025-10-08 16:31:59 +00:00

multi_turn

Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 (#26416)

2025-10-08 10:40:42 -07:00

overheads

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

structured_schemas

benchmarks: simplify test jsonschema (#14567)

2025-03-11 13:39:30 +00:00

backend_request_func.py

[Misc] Add request_id into benchmark_serve.py (#23065)

2025-08-19 08:32:18 +00:00

benchmark_block_pool.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

benchmark_latency.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_long_document_qa_throughput.py

[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)

2025-06-14 16:54:52 +08:00

benchmark_ngram_proposer.py

Fix per file ruff ignores related to line length (#26262)

2025-10-06 05:12:40 +00:00

benchmark_prefix_caching.py

[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)

2025-06-14 16:54:52 +08:00

benchmark_prioritization.py

[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)

2025-06-14 16:54:52 +08:00

benchmark_serving_structured_output.py

Fix per file ruff ignores related to line length (#26262)

2025-10-06 05:12:40 +00:00

benchmark_serving.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_throughput.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_utils.py

[Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP (#22437)

2025-08-13 14:44:06 -07:00

README.md

[Docs] move benchmarks README to contributing guides (#24820)

2025-09-16 05:52:57 -07:00

run_structured_output_benchmark.sh

[Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722)

2025-05-13 01:47:29 -07:00

sonnet.txt

feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)

2024-03-27 13:39:26 -07:00

README.md

Benchmarks

This directory used to contain vLLM's benchmark scripts and utilities for performance testing and evaluation.

Contents

- Serving benchmarks: Scripts for testing online inference performance (latency, throughput)
- Throughput benchmarks: Scripts for testing offline batch inference performance
- Specialized benchmarks: Tools for testing specific features like structured output, prefix caching, long document QA, request prioritization, and multi-modal inference
- Dataset utilities: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)

Usage

For detailed usage instructions, examples, and dataset information, see the Benchmark CLI documentation.
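The listing above records that the old serving, throughput, and latency scripts were fully deprecated (#24411) in favor of the benchmark CLI. As a rough sketch of that migration, the helper below maps each deprecated script name to the `vllm bench` subcommand that is assumed to replace it. The mapping table and the `bench_command` helper are illustrative names, not part of vLLM; check the CLI documentation for the authoritative subcommands and their flags.

```python
import shlex

# Assumed mapping from the deprecated scripts in this directory to
# `vllm bench` subcommands. Verify against the Benchmark CLI docs.
LEGACY_TO_SUBCOMMAND = {
    "benchmark_latency.py": "latency",
    "benchmark_serving.py": "serve",
    "benchmark_throughput.py": "throughput",
}


def bench_command(legacy_script, extra_args=()):
    """Return the argv list for the CLI replacement of a deprecated script.

    `bench_command` is a hypothetical helper for illustration only.
    Any extra arguments are appended unchanged, without validation.
    """
    try:
        subcommand = LEGACY_TO_SUBCOMMAND[legacy_script]
    except KeyError:
        raise ValueError(f"no known CLI replacement for {legacy_script!r}") from None
    return ["vllm", "bench", subcommand, *extra_args]


if __name__ == "__main__":
    # Print the assumed replacement command for each deprecated script.
    for script in LEGACY_TO_SUBCOMMAND:
        print(f"{script} -> {shlex.join(bench_command(script))}")
```

The function returns an argument list rather than a string so it can be passed directly to `subprocess.run` without shell quoting concerns.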

For full CLI reference see:
