[misc] Do not allow LoRA to be used with chunked prefill. (#5538)

Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Author: SangBin Cho
Date: 2024-06-15 23:59:36 +09:00
Committed by: GitHub
Parent: 81fbb3655f
Commit: e691918e3b


@@ -1092,6 +1092,8 @@ class LoRAConfig:
                 "Due to limitations of the custom LoRA CUDA kernel, "
                 "max_num_batched_tokens must be <= 65528 when "
                 "LoRA is enabled.")
+        if scheduler_config.chunked_prefill_enabled:
+            raise ValueError("LoRA is not supported with chunked prefill yet.")
 
 
 @dataclass
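
For reference, below is a minimal, self-contained sketch of the validation pattern this hunk extends. Only the two error messages and the attribute names max_num_batched_tokens and chunked_prefill_enabled come from the diff itself; the method name verify_with_scheduler_config, the SchedulerConfig fields, and the placeholder LoRAConfig field are assumptions for illustration, not the full vLLM definitions.

from dataclasses import dataclass


@dataclass
class SchedulerConfig:
    # Only the two fields referenced by the checks in the hunk; the real
    # config has many more (assumption for this sketch).
    max_num_batched_tokens: int = 2048
    chunked_prefill_enabled: bool = False


@dataclass
class LoRAConfig:
    max_lora_rank: int = 16  # illustrative placeholder field

    def verify_with_scheduler_config(self, scheduler_config: SchedulerConfig):
        # Existing kernel limit check (context lines of the hunk).
        if scheduler_config.max_num_batched_tokens > 65528:
            raise ValueError(
                "Due to limitations of the custom LoRA CUDA kernel, "
                "max_num_batched_tokens must be <= 65528 when "
                "LoRA is enabled.")
        # New guard added by this commit: reject chunked prefill outright.
        if scheduler_config.chunked_prefill_enabled:
            raise ValueError(
                "LoRA is not supported with chunked prefill yet.")


# Requesting both features now fails fast at config-verification time.
try:
    LoRAConfig().verify_with_scheduler_config(
        SchedulerConfig(chunked_prefill_enabled=True))
except ValueError as err:
    print(err)  # -> LoRA is not supported with chunked prefill yet.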