🧠 Local Vision (WIP)

Status: Experimental
Tag: vision

📘 Overview

Local Vision is a simple experimental n8n workflow designed to test and confirm the use of local vision-capable LLMs (such as qwen/qwen2.5-vl-7b) for analyzing images, specifically for tasks like understanding data fields on a receipt.

This proof-of-concept confirms the ability to download an image from the web and process it using a locally-hosted multimodal model, producing structured data that could later be stored or further processed.


📌 Use Case

Analyze the content of an image (e.g., a store receipt) using a local vision LLM: the image is passed to the model as base64-encoded data, and the model interprets it to extract the relevant information.
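
For reference, below is a minimal sketch of the same request outside n8n, in TypeScript (Node 18+). The LM Studio base URL and port, the prompt text, and the use of an OpenAI-style image_url content part are assumptions; the model id is the one this workflow uses.

```typescript
// Minimal sketch: download an image, base64-encode it, and send it to a
// local OpenAI-compatible vision endpoint. The base URL below assumes
// LM Studio's default port; adjust it for your setup.
const IMAGE_URL =
  "https://connect.garmin.com/images/login/desktop/signin-hero-3.jpg";
const BASE_URL = "http://localhost:1234/v1";

async function analyzeImage(): Promise<string> {
  // 1. Download the image (what the HTTP Request node does).
  const imageResponse = await fetch(IMAGE_URL);
  const base64 = Buffer.from(await imageResponse.arrayBuffer()).toString("base64");

  // 2. Pass it to the local vision model as a base64 data URI.
  const completion = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen/qwen2.5-vl-7b",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "Describe the data fields visible in this image." },
            { type: "image_url", image_url: { url: `data:image/jpeg;base64,${base64}` } },
          ],
        },
      ],
    }),
  });

  const json = await completion.json();
  return json.choices[0].message.content;
}

analyzeImage().then(console.log).catch(console.error);
```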


πŸ” Workflow Structure

| Step | Node Name | Description |
| --- | --- | --- |
| 1 | Manual Trigger | Used for testing the workflow manually from the n8n UI. |
| 2 | HTTP Request | Downloads an image file from the given URL (https://connect.garmin.com/images/login/desktop/signin-hero-3.jpg). |
| 3 | Qwen2.5 Local Vision | Uses a local multimodal LLM (qwen/qwen2.5-vl-7b) to analyze the downloaded image. |
| 4 | Sticky Note | Comment node describing the experiment: testing local vision-based LLM processing. |

🧠 Technology Stack

  • n8n for workflow automation
  • Qwen 2.5 VL 7B model (via @n8n/n8n-nodes-langchain.openAi) for local vision AI processing
  • HTTP Node to download the image
  • Manual Trigger to start the flow

βš™οΈ Credentials Used

  • LM Studio (Credential ID: nJxj1tJeQ2RgUtHC)
    Required for the OpenAI-compatible local endpoint to run the Vision LLM.
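
In practice this credential just needs to point at an OpenAI-compatible base URL; the API key can be a placeholder, since local servers such as LM Studio typically ignore it. A minimal sketch with the official openai npm client (not part of the workflow itself; LM Studio's default port is assumed):

```typescript
import OpenAI from "openai";

// OpenAI-compatible client aimed at the local LM Studio server.
// Port 1234 is LM Studio's default; adjust if yours differs.
const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "lm-studio", // placeholder; a local server typically does not check it
});
```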

πŸ“ Notes

  • The image must be processed as base64 for the model to analyze it (see the sketch after this list).
  • Currently, the workflow stops after analysis; no further data extraction or storage is implemented. You can add a Set, Function, or Spreadsheet node to post-process or export the results.
  • The workflow uses typeVersion: 4.2 for HTTP and typeVersion: 1.8 for the Langchain/OpenAI node, ensuring compatibility with recent n8n versions.
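
As a rough illustration of the base64 note above, an n8n Code node placed after the HTTP Request node could expose the downloaded image as a data URI. This sketch assumes the default binary property name (data) and in-memory binary storage; with filesystem binary storage the data field holds an internal ID rather than the base64 payload.

```typescript
// n8n Code node sketch (run once for all items), placed after HTTP Request.
// Assumes the binary property is named "data" and holds a base64 string.
const binary = items[0].binary?.data;
if (!binary) {
  throw new Error("No binary data found on the incoming item");
}

// Build a data URI that a vision model can consume.
const dataUri = `data:${binary.mimeType};base64,${binary.data}`;

return [{ json: { dataUri, mimeType: binary.mimeType } }];
```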

🚀 Next Steps

  • Add a Set or Function node to parse the output into structured fields (see the sketch after this list).
  • Save the result to Google Sheets, PostgreSQL, or Airtable.
  • Enhance the model prompt or context for more accurate receipt parsing.
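
One possible shape for that parsing step: a Code node that expects the model to have been prompted to answer with a JSON object. The json.text field name below is a placeholder; adapt it to the actual output field of the vision node.

```typescript
// n8n Code node sketch: turn the model's text reply into structured fields.
const reply = String(items[0].json.text ?? "");

// Strip a Markdown code fence if the model wrapped its answer in one.
const cleaned = reply.replace(/```(?:json)?/g, "").trim();

let fields = { raw: reply }; // fallback if the reply is not valid JSON
try {
  fields = JSON.parse(cleaned);
} catch {
  // keep the raw-text fallback
}

return [{ json: fields }];
```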

✅ How to Run

  1. Make sure your LM Studio or other OpenAI-compatible local inference server is running (a quick check is sketched after this list).
  2. Replace the image URL with a picture of a receipt if needed.
  3. Open n8n → Import this workflow → Click "Test Workflow".
  4. Inspect the output from the Qwen2.5 Vision node.
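
A quick way to verify step 1, assuming LM Studio's default port; /v1/models is the standard OpenAI-compatible endpoint for listing the models the server is serving:

```typescript
// Check that the local OpenAI-compatible server is up and the vision model is loaded.
const res = await fetch("http://localhost:1234/v1/models");
const { data } = await res.json();
console.log(data.map((m: { id: string }) => m.id));
// Expect an id like "qwen/qwen2.5-vl-7b" in the list before testing the workflow.
```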

📷 Example Image

Default image:

https://connect.garmin.com/images/login/desktop/signin-hero-3.jpg

Replace it with your own image if needed.


πŸ“ Tags

vision, local-llm, image-processing, qwen, multimodal


🧩 Requirements

  • n8n v1.8+ with latest HTTP & Langchain/OpenAI nodes
  • Local LLM setup via LM Studio or compatible API
  • Base64 image input format
