Description
🚀 The feature, motivation and pitch
Currently, vLLM's benchmark script supports multiple backends and offers fairly rich functionality.
It relies on both backend_request_func and get_tokenizer. backend_request_func is self-contained in a separate file, but to use get_tokenizer we have to clone the repository or install the vLLM Python package.
vllm/benchmarks/benchmark_serving.py, lines 37 to 42 in 845a3f2:

```python
from backend_request_func import (ASYNC_REQUEST_FUNCS, RequestFuncInput,
                                  RequestFuncOutput)
from tqdm.asyncio import tqdm
from transformers import PreTrainedTokenizerBase

from vllm.transformers_utils.tokenizer import get_tokenizer
```

vllm/vllm/transformers_utils/tokenizer.py, line 57 in 845a3f2:

```python
def get_tokenizer(
```
When we use the vLLM benchmark script to benchmark other backends, we would prefer not to depend on vLLM components at all, i.e., not to clone the repository or install the Python package just for this one helper.
May I submit a PR that moves get_tokenizer into backend_request_func? Does that sound reasonable, or do you have other suggestions? Thanks.
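To illustrate the idea, a standalone version of the helper could be sketched roughly as below. This is only a minimal sketch built on the public `transformers` API, not the actual vLLM implementation (the real `get_tokenizer` in vllm/transformers_utils/tokenizer.py accepts additional options and handles more edge cases); the parameter names here are illustrative assumptions.

```python
# Sketch of a vLLM-free get_tokenizer for backend_request_func.
# Assumption: only the Hugging Face `transformers` package is required,
# so benchmark users no longer need to install or clone vLLM.
from typing import Union

from transformers import (AutoTokenizer, PreTrainedTokenizer,
                          PreTrainedTokenizerFast)


def get_tokenizer(
    tokenizer_name: str,
    trust_remote_code: bool = False,
    **kwargs,
) -> Union[PreTrainedTokenizer, PreTrainedTokenizerFast]:
    """Load a tokenizer by name or local path using transformers only."""
    return AutoTokenizer.from_pretrained(
        tokenizer_name,
        trust_remote_code=trust_remote_code,
        **kwargs,
    )
```

With something like this inlined, benchmark_serving.py could import the helper from backend_request_func instead of from vllm.transformers_utils.tokenizer.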
Alternatives
No response
Additional context
No response