From 247d3e3e44947d28d5f6936bfcd52e4c6f1fb61b Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Fri, 25 Apr 2025 14:17:17 +0000
Subject: [PATCH 1/2] [Misc] Inline Molmo requirements

Signed-off-by: DarkLight1337
---
 docs/source/models/supported_models.md | 26 +++++++++++++++++++++++++-
 requirements/molmo.txt                 | 20 --------------------
 2 files changed, 25 insertions(+), 21 deletions(-)
 delete mode 100644 requirements/molmo.txt

diff --git a/docs/source/models/supported_models.md b/docs/source/models/supported_models.md
index 6b101662fc14..986e6c222b05 100644
--- a/docs/source/models/supported_models.md
+++ b/docs/source/models/supported_models.md
@@ -1112,7 +1112,31 @@ To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have to pass `--hf_overrides '{"
 :::
 
 :::{warning}
-For improved output quality of `AllenAI/Molmo-7B-D-0924` (especially in object localization tasks), we recommend using the pinned dependency versions listed in (including `vllm==0.7.0`). These versions match the environment that achieved consistent results on both A10 and L40 GPUs.
+For improved output quality of `AllenAI/Molmo-7B-D-0924` (especially in object localization tasks), we recommend using the following dependency versions:
+
+```text
+# Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40)
+torch==2.5.1
+torchvision==0.20.1
+transformers==4.48.1
+tokenizers==0.21.0
+tiktoken==0.7.0
+vllm==0.7.0
+
+# Optional but recommended for improved performance and stability
+triton==3.1.0
+xformers==0.0.28.post3
+uvloop==0.21.0
+protobuf==5.29.3
+openai==1.60.2
+opencv-python-headless==4.11.0.86
+pillow==10.4.0
+
+# Installed FlashAttention (for float16 only)
+flash-attn>=2.5.6 # Not used in float32, but should be documented
+```
+
+These versions match the environment that achieved consistent results on both A10 and L40 GPUs.
 :::
 
 :::{note}
diff --git a/requirements/molmo.txt b/requirements/molmo.txt
deleted file mode 100644
index 8450e29b6e7d..000000000000
--- a/requirements/molmo.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-# Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40)
-torch==2.5.1
-torchvision==0.20.1
-transformers==4.48.1
-tokenizers==0.21.0
-tiktoken==0.7.0
-vllm==0.7.0
-
-# Optional but recommended for improved performance and stability
-triton==3.1.0
-xformers==0.0.28.post3
-uvloop==0.21.0
-protobuf==5.29.3
-openai==1.60.2
-opencv-python-headless==4.11.0.86
-pillow==10.4.0
-
-# Installed FlashAttention (for float16 only)
-flash-attn>=2.5.6 # Not used in float32, but should be documented
-

From c0df8a00ef7a2518c68c8f750d2c11eb61cf53fa Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Fri, 25 Apr 2025 14:24:16 +0000
Subject: [PATCH 2/2] Add a note about security implications

Signed-off-by: DarkLight1337
---
 docs/source/models/supported_models.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/source/models/supported_models.md b/docs/source/models/supported_models.md
index 986e6c222b05..20a706a0b8ec 100644
--- a/docs/source/models/supported_models.md
+++ b/docs/source/models/supported_models.md
@@ -1112,7 +1112,9 @@ To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have to pass `--hf_overrides '{"
 :::
 
 :::{warning}
-For improved output quality of `AllenAI/Molmo-7B-D-0924` (especially in object localization tasks), we recommend using the following dependency versions:
+The output quality of `AllenAI/Molmo-7B-D-0924` (especially in object localization tasks) has deteriorated in recent updates.
+
+For the best results, we recommend using the following dependency versions (tested on A10 and L40):
 
 ```text
 # Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40)
@@ -1136,7 +1138,7 @@ pillow==10.4.0
 flash-attn>=2.5.6 # Not used in float32, but should be documented
 ```
 
-These versions match the environment that achieved consistent results on both A10 and L40 GPUs.
+**Note:** Make sure you understand the security implications of using outdated packages.
 :::
 
 :::{note}
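A note on the pin style these patches document: every core dependency is fixed with `==`, while `flash-attn>=2.5.6` deliberately floats, so only the latter can drift between installs. A small sketch of that distinction (the `split_pins` helper is hypothetical; the requirement lines are copied from the block the first patch inlines):

```python
# Sketch: separate exact pins ("==") from range specifiers (">="), which pip
# is free to upgrade. Lines copied from the inlined Molmo dependency block.
PINNED_BLOCK = """\
# Core vLLM-compatible dependencies with Molmo accuracy setup (tested on L40)
torch==2.5.1
torchvision==0.20.1
transformers==4.48.1
tokenizers==0.21.0
tiktoken==0.7.0
vllm==0.7.0
flash-attn>=2.5.6
"""

def split_pins(block: str):
    """Return (exact, ranged) lists of requirement lines."""
    exact, ranged = [], []
    for line in block.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        (exact if "==" in line else ranged).append(line)
    return exact, ranged

exact, ranged = split_pins(PINNED_BLOCK)
print(len(exact))  # -> 6 exact pins
print(ranged)      # -> ['flash-attn>=2.5.6'], the one floating dependency
```

Reproducing the documented environment therefore mostly comes down to `pip install`-ing the exact pins; only the FlashAttention build can legitimately vary.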
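The second patch's security note cuts both ways: the pins guard output quality, but an environment that silently drifts from them (or silently matches outdated packages) is worth detecting either way. A possible runtime check, using only the standard library — the `version_drift` helper and its package selection are illustrative, not part of the patches:

```python
# Sketch: report which of the documented versions the current environment
# actually has, so drift from the pinned setup is visible before relying on it.
from importlib import metadata

# A few of the pins documented above (illustrative subset).
DOCUMENTED = {"torch": "2.5.1", "transformers": "4.48.1", "vllm": "0.7.0"}

def version_drift(pins: dict) -> dict:
    """Map package name -> installed version (None if absent) where it differs."""
    drift = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != wanted:
            drift[name] = installed
    return drift

# Packages absent from the environment show up mapped to None:
print(version_drift({"surely-not-a-real-package": "1.0"}))
# -> {'surely-not-a-real-package': None}
```

An empty result means the environment matches the pins exactly; anything else tells you which side of the quality-versus-currency trade-off you are actually on.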