Status: Experimental
Tag: vision
Local Vision is a simple experimental n8n workflow designed to test and confirm the use of local vision-capable LLMs (such as `qwen/qwen2.5-vl-7b`) for analyzing images, specifically for tasks like understanding data fields on a receipt.
This proof-of-concept confirms the ability to download an image from the web and process it using a locally-hosted multimodal model, producing structured data that could later be stored or further processed.
Analyze the content of an image (e.g., a store receipt) using a local vision LLM: the image is supplied to the model as base64, and the model interprets it to extract the relevant information.
| Step | Node Name | Description |
|---|---|---|
| 1 | Manual Trigger | Used for testing the workflow manually from the n8n UI. |
| 2 | HTTP Request | Downloads an image file from the given URL (`https://connect.garmin.com/images/login/desktop/signin-hero-3.jpg`). |
| 3 | Qwen2.5 Local Vision | Uses a local multimodal LLM (`qwen/qwen2.5-vl-7b`) to analyze the downloaded image. |
| 4 | Sticky Note (comment) | Describes the experiment, stating that the project tests local vision-based LLM processing. |
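For orientation, here is a minimal sketch of the three active nodes in n8n's node-definition style. Only the node types, `typeVersion` values, URL, and model ID come from this README; every other parameter is an assumption, not the workflow's exact JSON.

```js
// Minimal sketch of the node chain (illustrative, not the workflow's JSON).
const nodes = [
  { name: "Manual Trigger", type: "n8n-nodes-base.manualTrigger" },
  {
    name: "HTTP Request",
    type: "n8n-nodes-base.httpRequest",
    typeVersion: 4.2,
    parameters: {
      url: "https://connect.garmin.com/images/login/desktop/signin-hero-3.jpg",
      // The response must be kept as binary (file) output so the vision
      // node can base64-encode it for the model.
    },
  },
  {
    name: "Qwen2.5 Local Vision",
    type: "@n8n/n8n-nodes-langchain.openAi",
    typeVersion: 1.8,
    parameters: { modelId: "qwen/qwen2.5-vl-7b" }, // served via the LM Studio credential
  },
];
```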
- n8n for workflow automation
- Qwen 2.5 VL 7B model (via `@n8n/n8n-nodes-langchain.openAi`) for local vision AI processing
- HTTP Request node to download the image
- Manual Trigger to start the flow
- LM Studio (Credential ID: `nJxj1tJeQ2RgUtHC`), required for the OpenAI-compatible local endpoint that runs the vision LLM
- The image must be processed as base64 for the model to analyze it; a request sketch illustrating this follows this list.
- Currently, the workflow stops after analysis; no further data extraction or storage is implemented. You can add a `Set`, `Function`, or `Spreadsheet` node to post-process or export the results.
- The workflow uses `typeVersion: 4.2` for the HTTP Request node and `typeVersion: 1.8` for the Langchain/OpenAI node, ensuring compatibility with recent n8n versions.
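To make the base64 note concrete, here is a rough sketch of the OpenAI-compatible chat-completions request that an LM Studio endpoint receives for a vision call. The prompt text, file name, and port are illustrative assumptions; only the model ID is taken from the workflow.

```js
import { readFileSync } from "node:fs";

// A local test image, base64-encoded the way the workflow must encode the
// downloaded file before handing it to the model.
const imageBase64 = readFileSync("receipt.jpg").toString("base64");

const payload = {
  model: "qwen/qwen2.5-vl-7b",
  messages: [
    {
      role: "user",
      content: [
        // Placeholder prompt; the workflow's actual prompt is not shown here.
        { type: "text", text: "List the merchant, date, and total on this receipt as JSON." },
        // The image travels inline as a base64 data URL.
        { type: "image_url", image_url: { url: `data:image/jpeg;base64,${imageBase64}` } },
      ],
    },
  ],
};

// LM Studio's server defaults to http://localhost:1234/v1.
const res = await fetch("http://localhost:1234/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(payload),
});
console.log((await res.json()).choices[0].message.content);
```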
- Add a `Set` or `Function` node to parse the output into structured fields (a sketch of such a node follows this list).
- Save the result to Google Sheets, PostgreSQL, or Airtable.
- Enhance the model prompt or context for more accurate receipt parsing.
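As a starting point for the parsing step, here is a minimal sketch of a `Code` (Function) node body. The output path (`item.json.message.content`) and the expected fields are assumptions to adapt, since the OpenAI node's exact output shape isn't documented here.

```js
// n8n Code node ("Run Once for All Items" mode): turn the model's reply
// into structured fields.
return items.map((item) => {
  const raw = item.json.message?.content ?? "";
  // Strip a markdown code fence if the model wrapped its JSON in one.
  const jsonText = raw.replace(/^`{3}(?:json)?\s*/, "").replace(/\s*`{3}$/, "");
  let fields;
  try {
    fields = JSON.parse(jsonText); // e.g. { merchant, date, total }
  } catch {
    fields = { parseError: true, raw }; // keep the raw text if parsing fails
  }
  return { json: fields };
});
```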
- Make sure your LM Studio or other OpenAI-compatible local inference server is running.
- Replace the image URL with a picture of a receipt if needed.
- Open n8n → Import this workflow → Click "Test Workflow".
- Inspect the output from the Qwen2.5 Vision node.
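Before running the workflow, you can verify that the local server is reachable. This check assumes LM Studio's default port (1234); adjust it if your server listens elsewhere.

```js
// Quick sanity check: list the models the local endpoint is serving.
const res = await fetch("http://localhost:1234/v1/models");
const { data } = await res.json();
console.log(data.map((m) => m.id)); // should include qwen/qwen2.5-vl-7b
```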
Default image:
https://connect.garmin.com/images/login/desktop/signin-hero-3.jpg
Replace it with your own image if needed.
Tags: `vision`, `local-llm`, `image-processing`, `qwen`, `multimodal`
- n8n v1.8+ with latest HTTP & Langchain/OpenAI nodes
- Local LLM setup via LM Studio or compatible API
- Base64 image input format