
# LLaMA-Factory

## About / Introduction

LLaMA-Factory is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained models locally without writing any code.

After fine-tuning a model, LLaMA-Factory users need to evaluate it and run inference with it.

## The Business Challenge

LLaMA-Factory originally used the Transformers backend to perform inference on Ascend NPU, but inference speed was slow.

## Solving Challenges and Benefits with vLLM Ascend

Through the joint efforts of LLaMA-Factory and vLLM Ascend (LLaMA-Factory#7739), the performance of LLaMA-Factory in the model inference stage has been significantly improved. According to the test results, LLaMA-Factory's inference speed is about 2x that of the Transformers backend.
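As a rough sketch of what switching backends looks like in practice, LLaMA-Factory exposes an `infer_backend` option in its inference configs. The model path and chat template below are placeholders, not values from this document:

```shell
# Hypothetical LLaMA-Factory inference config selecting the vLLM backend.
# model_name_or_path and template are illustrative placeholders.
cat > vllm_infer.yaml <<'EOF'
model_name_or_path: path/to/your-finetuned-model
template: llama3
infer_backend: vllm
EOF

# Launch an interactive chat session using the config above.
llamafactory-cli chat vllm_infer.yaml
```

With `infer_backend: vllm` set, LLaMA-Factory delegates generation to vLLM, which vllm-ascend enables on Ascend NPU; leaving it unset falls back to the slower Transformers path described above.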

## Learn more

See more about LLaMA-Factory and how it uses vLLM Ascend for inference on Ascend NPU in the following documentation: LLaMA-Factory Ascend NPU Inference.