Description
Your current environment
vllm version: 0.6.3.post1
Model Input Dumps
No response
🐛 Describe the bug
The official Gemma model card (https://huggingface.co/google/gemma-2b) lists a context length of 8K. However, when I load the model into vLLM with max_model_len set to 8192, I hit the error below.
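This is roughly what my eval.py does around the failing line (simplified sketch; everything outside the LLM call is an assumption):

```python
from vllm import LLM

# Simplified reproduction of the call at eval.py:170.
# Model name is from the Gemma page above; dtype and max_model_len
# match the values described in this report.
max_len = 8192
llm = LLM(
    model="google/gemma-2b",
    dtype="bfloat16",
    max_model_len=max_len,
)
```

Running this raises: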
```
Traceback (most recent call last):
  File "/home/ubuntu/moa/eval.py", line 170, in <module>
    llm = LLM(model= args.llm_name, dtype='bfloat16', max_model_len= max_len,
  File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 177, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 570, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 903, in create_engine_config
    model_config = self.create_model_config()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 839, in create_model_config
    return ModelConfig(
  File "/opt/conda/lib/python3.10/site-packages/vllm/config.py", line 192, in __init__
    self.max_model_len = _get_and_verify_max_len(
  File "/opt/conda/lib/python3.10/site-packages/vllm/config.py", line 1790, in _get_and_verify_max_len
    raise ValueError(
ValueError: User-specified max_model_len (8192) is greater than the derived max_model_len (sliding_window=4096 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
```
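For completeness, the override the error message suggests would look something like the sketch below. I have not confirmed whether exceeding the 4096-token sliding window is actually safe for gemma-2b, which is essentially my question.

```python
import os

# Override suggested by the error message: allow a user-specified
# max_model_len larger than the derived limit. Set it before the
# engine is created so vLLM picks it up.
os.environ["VLLM_ALLOW_LONG_MAX_MODEL_LEN"] = "1"

from vllm import LLM

llm = LLM(model="google/gemma-2b", dtype="bfloat16", max_model_len=8192)
```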