
[Bug]: Incoherent error message when using MLPSpeculator and num_speculative_tokens is set too high #5893

@tdoublep

Description

Your current environment

n/a

🐛 Describe the bug

When using MLPSpeculator as the speculative model, each model has an upper limit on the value to which num_speculative_tokens can be set. This corresponds to the value of n_predict in the config of the speculative model. Currently, if the user tries to set num_speculative_tokens to a value higher than what is supported, we get a confusing message. For example, if one uses ibm-fms/llama-13b-accelerator and sets num_speculative_tokens=4, we will get the following message:

ValueError: Expected both speculative_model and num_speculative_tokens to be provided, but found speculative_model='ibm-fms/llama-13b-accelerator' and num_speculative_tokens=4.

This model supports a maximum of num_speculative_tokens=3 (e.g., see config here). It would be better if we explicitly told the user to reduce the value of num_speculative_tokens.

Labels

bug (Something isn't working)
