Conversation

@tongyx361

Fixing #340

    revision=model_config.revision)
model_config.hf_config.sampler_vocab_size = min(
    len(self.tokenizer), model_config.hf_config.vocab_size)
self.cache_config = cache_config
Collaborator

These lines don't need to move right?

Author

This could work, because self.model_config = model_config is a reference assignment.
Maybe it would be better to move lines 89-96 before self.model_config = model_config?

@zhuohan123
Member

@tongyx361 What exact bug is this PR fixing?

@tongyx361
Author

@tongyx361 What exact bug is this PR fixing?

cf. #340 (comment)

Traceback (most recent call last):
  File "vllm-none-problem-repro.py", line 21, in <module>
    out = llm.generate(input, sampling_params)
  File "/llm-bench/vllm-src/vllm/entrypoints/llm.py", line 127, in generate
    return self._run_engine(use_tqdm)
  File "/llm-bench/vllm-src/vllm/entrypoints/llm.py", line 147, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/llm-bench/vllm-src/vllm/engine/llm_engine.py", line 246, in step
    self._decode_sequences(seq_groups)
  File "/llm-bench/vllm-src/vllm/engine/llm_engine.py", line 263, in _decode_sequences
    new_token, new_output_text = detokenize_incrementally(
  File "/llm-bench/vllm-src/vllm/transformers_utils/tokenizer.py", line 73, in detokenize_incrementally
    output_text = tokenizer.convert_tokens_to_string(output_tokens)
  File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 533, in convert_tokens_to_string
    return self.backend_tokenizer.decoder.decode(tokens)
TypeError: argument 'tokens': 'NoneType' object cannot be converted to 'PyString'

The reason is that for some models there is a mismatch between config.vocab_size and len(tokenizer). The model outputs a distribution over vocab_size token ids, but only ids below len(tokenizer) correspond to real tokens; the remaining ids are just padding. When one of these padding ids is sampled and then decoded, the tokenizer returns None instead of a string, and the exception above is thrown.
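A minimal sketch of the mismatch and of the capping idea this PR applies (the checkpoint name and the local variable names are illustrative assumptions, not the exact vLLM internals):

```python
# Sketch only: cap the ids the sampler may pick at len(tokenizer), assuming a
# HF config/tokenizer pair where config.vocab_size > len(tokenizer).
import torch
from transformers import AutoConfig, AutoTokenizer

model_name = "some/model-with-padded-vocab"  # hypothetical checkpoint
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Ids in [len(tokenizer), config.vocab_size) are padding and decode to None,
# which is what crashes convert_tokens_to_string() in the traceback above.
sampler_vocab_size = min(len(tokenizer), config.vocab_size)

logits = torch.randn(config.vocab_size)      # model output over the padded vocab
logits[sampler_vocab_size:] = float("-inf")  # make padding ids unsampleable
token_id = int(torch.argmax(logits))
assert tokenizer.convert_ids_to_tokens(token_id) is not None
```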

@creatorrr

@zhuohan123 @simon-mo Does this PR need more work? We are facing this issue in production.

@creatorrr

I see that the added LoRA padding complicates this PR quite a bit.

@GennVa

GennVa commented Mar 6, 2024

@zhuohan123 Are there any updates on a solution for this issue? Thanks

@creatorrr

I ended up adding extra tokens to the tokenizer to make its length a multiple of 32, and it works.
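A rough sketch of that workaround, assuming a standard HF tokenizer (the checkpoint name, the dummy token names, and the output directory are made up for illustration):

```python
# Pad the tokenizer with dummy special tokens until its length is a multiple of 32.
# This helps when the model's padded vocab_size equals that rounded-up length, so
# every id the sampler can emit maps to a real token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("some/model")  # hypothetical checkpoint

pad_to = 32
remainder = len(tokenizer) % pad_to
if remainder:
    dummy = [f"<extra_pad_{i}>" for i in range(pad_to - remainder)]
    tokenizer.add_tokens(dummy, special_tokens=True)

assert len(tokenizer) % pad_to == 0
tokenizer.save_pretrained("padded-tokenizer")  # then point vLLM at this directory
```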

@zhuohan123
Member

Sorry for the delay. This PR is a little bit stalled. I will try to fix this in another PR with some other changes in the Sampler.

@GennVa

GennVa commented Mar 6, 2024

@zhuohan123 Okay, thanks! Which PR number should we track for this?

@GennVa

GennVa commented Mar 6, 2024

@creatorrr Okay, thanks. In my case I have a Qwen tokenizer of length 151648 (151643 + 5 added tokens), which is already a multiple of 32, a vocab.json of size 151643, and a vocab_size of 151936 in config.json. I tried adding the 5 added tokens to vocab.json and to the tokenizer vocab, but I still get the same error.
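For concreteness, the gap in this Qwen case, using only the numbers quoted in the comment above:

```python
# config.json declares more ids than the tokenizer can decode; the difference
# is the pool of padding ids that decode to None when sampled.
config_vocab_size = 151936      # vocab_size in config.json
tokenizer_len = 151643 + 5      # vocab.json entries + added tokens = 151648
padding_ids = config_vocab_size - tokenizer_len
print(padding_ids)              # 288 ids with no string form
```

Since 151648 is already a multiple of 32, the multiple-of-32 workaround changes nothing here; the 288-id gap up to config.json's vocab_size remains, which is consistent with the error persisting.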

@GennVa GennVa mentioned this pull request Mar 14, 2024
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request Sep 15, 2025
### What this PR does / why we need it?
vLLM-Ascend's rope implementation includes several header files that are
not supposed to be included by outside users. The current implementation may
break when the CANN toolkits update, so this PR removes those incompatible
file includes to guarantee the safety of upgrading the CANN toolkits.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by the rope unit test.

Signed-off-by: ganyi <[email protected]>