Your current environment
n/a
🐛 Describe the bug
When using `MLPSpeculator` as the speculative model, each model has an upper limit on how high `num_speculative_tokens` can be set. This corresponds to the value of `n_predict` in the config of the speculative model. Currently, if the user tries to set `num_speculative_tokens` to a value higher than what is supported, we get a confusing message. For example, if one uses `ibm-fms/llama-13b-accelerator` and sets `num_speculative_tokens=4`, we will get the following message:

```
ValueError: Expected both speculative_model and num_speculative_tokens to be provided, but found speculative_model='ibm-fms/llama-13b-accelerator' and num_speculative_tokens=4.
```
This model supports a maximum of `num_speculative_tokens=3` (e.g., see config here). It would be better if we explicitly told the user to reduce the value of `num_speculative_tokens`.
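A minimal sketch of the kind of validation this issue is asking for. Note this is a hypothetical illustration, not vLLM code: the function name `check_num_speculative_tokens` is invented, and only the parameter names `num_speculative_tokens` and `n_predict` come from the report above.

```python
# Hypothetical sketch of the requested check; not actual vLLM code.
def check_num_speculative_tokens(num_speculative_tokens: int, n_predict: int) -> None:
    """Raise a clear, actionable error when the requested number of
    speculative tokens exceeds the speculator's n_predict limit."""
    if num_speculative_tokens > n_predict:
        raise ValueError(
            f"num_speculative_tokens={num_speculative_tokens} exceeds the "
            f"maximum supported by this speculative model "
            f"(n_predict={n_predict}). Please reduce num_speculative_tokens "
            f"to at most {n_predict}."
        )


# The ibm-fms/llama-13b-accelerator config has n_predict=3, so requesting
# 4 speculative tokens would fail with the clearer message:
try:
    check_num_speculative_tokens(num_speculative_tokens=4, n_predict=3)
except ValueError as e:
    print(e)
```

With a message like this, the user immediately knows both the limit and which argument to change.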