Description
🚀 The feature, motivation and pitch
Currently, vLLM's benchmark script supports multiple backends and is fairly feature-rich. It relies on `backend_request_func` and `get_tokenizer`. `backend_request_func` is a standalone file with no vLLM dependency, but to use `get_tokenizer` we need to clone the repository or install the vLLM Python package.
vllm/benchmarks/benchmark_serving.py
Lines 37 to 42 in 845a3f2
from backend_request_func import (ASYNC_REQUEST_FUNCS, RequestFuncInput,
                                  RequestFuncOutput)
from tqdm.asyncio import tqdm
from transformers import PreTrainedTokenizerBase
from vllm.transformers_utils.tokenizer import get_tokenizer
vllm/vllm/transformers_utils/tokenizer.py
Line 57 in 845a3f2
def get_tokenizer(
When we use the vLLM script to benchmark other backends, we typically do not want to depend on vLLM components, which means we should not have to clone the repository or install the Python package.
May I submit a PR to extract the function `get_tokenizer` into `backend_request_func`? Do you think this is okay, or do you have any other suggestions? Thanks.
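A minimal sketch of what the extracted function could look like, assuming the benchmark only needs a plain Hugging Face tokenizer. The signature and body here are illustrative assumptions, not vLLM's actual implementation, which may handle cases this sketch leaves out:

```python
# Hypothetical addition to backend_request_func.py (illustrative only).
from typing import Any


def get_tokenizer(pretrained_model_name_or_path: str,
                  trust_remote_code: bool = False,
                  **kwargs: Any) -> Any:
    """Load a tokenizer without importing anything from vLLM.

    Assumption: a Hugging Face tokenizer is sufficient for benchmarking.
    """
    # Imported lazily so the module itself only requires `transformers`
    # at call time, and never requires vLLM.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path,
        trust_remote_code=trust_remote_code,
        **kwargs,
    )
```

With something like this in place, `benchmark_serving.py` could import `get_tokenizer` from `backend_request_func` instead of from `vllm.transformers_utils.tokenizer`, so the script would run with only `transformers` installed.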
Alternatives
No response
Additional context
No response