Make Kimi-VL available in ComfyUI.
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities.
- Make sure you have ComfyUI installed.
- Clone this repository into your ComfyUI's `custom_nodes` directory:

```
cd ComfyUI/custom_nodes
git clone https://github.com/Yuan-ManX/ComfyUI-Kimi-VL.git
```

- Install dependencies:

```
cd ComfyUI-Kimi-VL
pip install -r requirements.txt
```
🤗 For general multimodal perception and understanding, OCR, long video and long document understanding, video perception, and agent use cases, we recommend Kimi-VL-A3B-Instruct for efficient inference; for advanced text and multimodal reasoning (e.g., math), please consider using Kimi-VL-A3B-Thinking.
| Model | #Total Params | #Activated Params | Context Length | Download Link |
|---|---|---|---|---|
| Kimi-VL-A3B-Instruct | 16B | 3B | 128K | 🤗 Hugging Face |
| Kimi-VL-A3B-Thinking | 16B | 3B | 128K | 🤗 Hugging Face |
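As a quick sanity check outside ComfyUI, a minimal loading sketch with 🤗 Transformers might look like the following. The model ID, dtype, and device placement here are assumptions; adjust them to your setup.

```python
# Minimal sketch: load Kimi-VL-A3B-Instruct with 🤗 Transformers.
# "moonshotai/Kimi-VL-A3B-Instruct" is an assumed Hugging Face repo ID;
# trust_remote_code=True is typically required for custom VLM architectures.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "moonshotai/Kimi-VL-A3B-Instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; only ~3B params are active per token
    device_map="auto",           # spread across available GPUs
    trust_remote_code=True,
)
```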
Note: Recommended parameter settings:
- For Thinking models, it is recommended to use `Temperature = 0.6`.
- For Instruct models, it is recommended to use `Temperature = 0.2`.
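In plain Transformers terms, these settings correspond to sampled generation with the temperature passed to `generate()`. A hedged sketch, building on the `model` and `processor` loaded above; the chat-message structure and image path follow common VLM processor conventions and may differ slightly for Kimi-VL:

```python
# Sketch: generation with the recommended temperature for an Instruct model.
from PIL import Image

image = Image.open("demo.png")  # hypothetical local image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
prompt = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,    # sampling must be enabled for temperature to take effect
    temperature=0.2,   # recommended for Instruct models; use 0.6 for Thinking
)
print(processor.decode(output[0], skip_special_tokens=True))
```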