Conversation

@tongyx361

Fixing #340

    revision=model_config.revision)
model_config.hf_config.sampler_vocab_size = min(
    len(self.tokenizer), model_config.hf_config.vocab_size)
self.cache_config = cache_config
Collaborator

These lines don't need to move right?

Author

This could work, because self.model_config = model_config is a reference assignment.
Maybe it would be better to move lines 89-96 before self.model_config = model_config?

@zhuohan123
Member

@tongyx361 What exact bug is this PR fixing?

@tongyx361
Author

@tongyx361 What exact bug is this PR fixing?

cf. #340 (comment)

Traceback (most recent call last):
  File "vllm-none-problem-repro.py", line 21, in <module>
    out = llm.generate(input, sampling_params)
  File "/llm-bench/vllm-src/vllm/entrypoints/llm.py", line 127, in generate
    return self._run_engine(use_tqdm)
  File "/llm-bench/vllm-src/vllm/entrypoints/llm.py", line 147, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/llm-bench/vllm-src/vllm/engine/llm_engine.py", line 246, in step
    self._decode_sequences(seq_groups)
  File "/llm-bench/vllm-src/vllm/engine/llm_engine.py", line 263, in _decode_sequences
    new_token, new_output_text = detokenize_incrementally(
  File "/llm-bench/vllm-src/vllm/transformers_utils/tokenizer.py", line 73, in detokenize_incrementally
    output_text = tokenizer.convert_tokens_to_string(output_tokens)
  File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 533, in convert_tokens_to_string
    return self.backend_tokenizer.decoder.decode(tokens)
TypeError: argument 'tokens': 'NoneType' object cannot be converted to 'PyString'

The reason is that for some models there is a mismatch between config.vocab_size and len(tokenizer). The model outputs a distribution over vocab_size token ids, but only ids below len(tokenizer) correspond to real tokens; the remaining ids are just padding. When one of these padding ids is sampled and then decoded, the tokenizer returns None instead of a string, and the exception above is thrown.
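A minimal sketch of the mismatch and of the capping idea this PR applies (the checkpoint name and the local variable names are illustrative assumptions, not the exact vLLM internals):

```python
# Sketch only: cap the ids the sampler may pick at len(tokenizer), assuming a
# HF config/tokenizer pair where config.vocab_size > len(tokenizer).
import torch
from transformers import AutoConfig, AutoTokenizer

model_name = "some/model-with-padded-vocab"  # hypothetical checkpoint
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Ids in [len(tokenizer), config.vocab_size) are padding and decode to None,
# which is what crashes convert_tokens_to_string() in the traceback above.
sampler_vocab_size = min(len(tokenizer), config.vocab_size)

logits = torch.randn(config.vocab_size)      # model output over the padded vocab
logits[sampler_vocab_size:] = float("-inf")  # make padding ids unsampleable
token_id = int(torch.argmax(logits))
assert tokenizer.convert_ids_to_tokens(token_id) is not None
```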

@creatorrr

@zhuohan123 @simon-mo Does this PR need more work? We are facing this issue in production.

@creatorrr

I see that the added LoRA padding complicates this PR quite a bit.

@GennVa

GennVa commented Mar 6, 2024

@zhuohan123 Are there any updates on a solution for this issue? Thanks

@creatorrr

I ended up adding extra tokens to the tokenizer to make its length a multiple of 32, and it works.
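A rough sketch of that workaround, assuming a standard HF tokenizer (the checkpoint name, the dummy token names, and the output directory are made up for illustration):

```python
# Pad the tokenizer with dummy special tokens until its length is a multiple of 32.
# This helps when the model's padded vocab_size equals that rounded-up length, so
# every id the sampler can emit maps to a real token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("some/model")  # hypothetical checkpoint

pad_to = 32
remainder = len(tokenizer) % pad_to
if remainder:
    dummy = [f"<extra_pad_{i}>" for i in range(pad_to - remainder)]
    tokenizer.add_tokens(dummy, special_tokens=True)

assert len(tokenizer) % pad_to == 0
tokenizer.save_pretrained("padded-tokenizer")  # then point vLLM at this directory
```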

@zhuohan123
Member

Sorry for the delay. This PR is a little bit stalled. I will try to fix this in another PR with some other changes in the Sampler.

@GennVa

GennVa commented Mar 6, 2024

@zhuohan123 Okay, thanks! Which PR number should we track for this?

@GennVa

GennVa commented Mar 6, 2024

@creatorrr Okay, thanks. In my case I have a Qwen tokenizer of length 151648 (151643 + 5 added tokens), which is already a multiple of 32, a vocab.json of size 151643, and a vocab_size of 151936 in config.json. I tried adding the 5 added tokens to vocab.json and to the tokenizer vocab, but I still get the same error.
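For concreteness, the gap in this Qwen case, using only the numbers quoted in the comment above:

```python
# config.json declares more ids than the tokenizer can decode; the difference
# is the pool of padding ids that decode to None when sampled.
config_vocab_size = 151936      # vocab_size in config.json
tokenizer_len = 151643 + 5      # vocab.json entries + added tokens = 151648
padding_ids = config_vocab_size - tokenizer_len
print(padding_ids)              # 288 ids with no string form
```

Since 151648 is already a multiple of 32, the multiple-of-32 workaround changes nothing here; the 288-id gap up to config.json's vocab_size remains, which is consistent with the error persisting.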

@GennVa GennVa mentioned this pull request Mar 14, 2024
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request Sep 15, 2025
### What this PR does / why we need it?
vLLM-Ascend's rope implementation includes several header files that are
not supposed to be included by outside users. The current implementation may
break when the CANN toolkits update, so this PR removes those incompatible
file includes to guarantee the safety of upgrading the CANN toolkits.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by the rope unit test.

Signed-off-by: ganyi <[email protected]>