Labels: bug (Something isn't working)
Description
I'm trying to benchmark the performance of vLLM OPT. I find that when I pass a relatively large batch of prompts to vLLM, it raises a decode error once the generated sequence length reaches a threshold (which makes the problem look like an OOM).
A minimal reproduction for this issue:
```python
from vllm import LLM, SamplingParams

def make_input(bs):
    return ["Hello!" for _ in range(bs)]

bs = 128
generate_length = 200

# Create a sampling params object.
sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=generate_length)

# Create an LLM.
llm = LLM(
    model="facebook/opt-125m",
    use_dummy_weights=True,
)

input = make_input(bs)
out = llm.generate(input, sampling_params)
```
When bs=128, the error happens at approximately the 108th generated token. The error looks like:
```
Traceback (most recent call last):
  File "vllm-none-problem-repro.py", line 21, in <module>
    out = llm.generate(input, sampling_params)
  File "/llm-bench/vllm-src/vllm/entrypoints/llm.py", line 127, in generate
    return self._run_engine(use_tqdm)
  File "/llm-bench/vllm-src/vllm/entrypoints/llm.py", line 147, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/llm-bench/vllm-src/vllm/engine/llm_engine.py", line 246, in step
    self._decode_sequences(seq_groups)
  File "/llm-bench/vllm-src/vllm/engine/llm_engine.py", line 263, in _decode_sequences
    new_token, new_output_text = detokenize_incrementally(
  File "/llm-bench/vllm-src/vllm/transformers_utils/tokenizer.py", line 73, in detokenize_incrementally
    output_text = tokenizer.convert_tokens_to_string(output_tokens)
  File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 533, in convert_tokens_to_string
    return self.backend_tokenizer.decoder.decode(tokens)
TypeError: argument 'tokens': 'NoneType' object cannot be converted to 'PyString'
```
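For reference, the backend decoder only accepts strings, so a single `None` in `output_tokens` produces exactly this `TypeError`. Below is a minimal sketch of the failure mode plus a hypothetical defensive filter (this is not vLLM's actual fix; it just shows that `convert_ids_to_tokens` returns `None` for ids outside the tokenizer's vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Hypothetical wrapper (not vLLM's fix): drop None entries before the
# backend decoder sees them, since decoder.decode() only accepts strings.
def safe_convert_tokens_to_string(tokens):
    return tokenizer.convert_tokens_to_string(
        [t for t in tokens if t is not None])

# An id the tokenizer does not know maps to None; passing it through
# convert_tokens_to_string unfiltered reproduces the TypeError above.
tokens = tokenizer.convert_ids_to_tokens([2, 10**6])  # ['</s>', None]
print(safe_convert_tokens_to_string(tokens))  # -> '</s>'
```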
With a smaller bs, the threshold also increases (>108); for example, it is around 210 when bs=64. It seems there is a limit on bs * length. A rough probe of this hypothesis is sketched below.
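This sketch reuses the same setup as the repro above; the bs=32 point is an extrapolated guess, and each trial builds a fresh LLM so a failed run cannot leave unfinished sequences in the engine:

```python
from vllm import LLM, SamplingParams

def fails(bs, max_tokens):
    # Fresh engine per trial; a failed step() would otherwise leave
    # unfinished sequences behind. Running each trial in its own process
    # may be needed to reliably release GPU memory between trials.
    llm = LLM(model="facebook/opt-125m", use_dummy_weights=True)
    params = SamplingParams(temperature=0.8, top_p=0.95,
                            max_tokens=max_tokens)
    try:
        llm.generate(["Hello!"] * bs, params)
        return False
    except TypeError:  # the NoneType -> PyString decode error above
        return True

# If bs * length is the limit, these points should all sit near the
# failure threshold (the bs=32 entry is extrapolated).
for bs, length in [(128, 108), (64, 210), (32, 420)]:
    print(f"bs={bs}, max_tokens={length}: "
          f"{'fails' if fails(bs, length) else 'ok'}")
```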