Smart Visual Assistant is an AI-powered desktop application that helps visually impaired users interact with their environment. It combines real-time object recognition (using YOLOv4-tiny) and voice-based AI queries (using Google Gemini AI and Vosk/Google Speech Recognition) for a fully hands-free experience.
📦 To run or test this project, simply extract the ZIP file (https://drive.google.com/file/d/1LQpHb-XYGv9MUabUrxw9w5zVah6klQdG/view?usp=drive_link) and launch the executable.
📘 For detailed instructions, please refer to the included User Guide.
To create a practical, real-world solution that empowers visually impaired individuals to interact with both their surroundings and AI in a seamless way.
So, I combined:
✅ Computer Vision (Object recognition with YOLOv4-tiny)
✅ AI-Powered Query System (Google Gemini AI)
✅ Voice-Controlled Interaction (Speech recognition & response control)
Now, users can not only identify objects around them but also ask general AI queries just like anyone else.
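At a high level, the application runs a single loop that grabs webcam frames, optionally announces detected objects, and routes voice commands either to the detector or to Gemini. The sketch below is purely illustrative; the helper functions and wiring are assumptions, not the actual API of `main.py`:

```python
# Illustrative control loop only -- the real main.py is organized around the
# project's own modules (object_recognition, speech_processing, gemini_ai).
import cv2

def run_assistant(detect_objects, listen_for_command, speak, ask_gemini):
    """All four callables are hypothetical helpers passed in for illustration."""
    camera = cv2.VideoCapture(0)                 # default webcam
    detection_enabled = False
    while True:
        ok, frame = camera.read()
        if not ok:
            break
        if detection_enabled:
            labels = detect_objects(frame)       # e.g. ["person", "chair"]
            if labels:
                speak("I can see: " + ", ".join(labels))
        command = listen_for_command()           # latest recognized phrase, or None
        if command == "start recognition":
            detection_enabled = True
        elif command == "stop recognition":
            detection_enabled = False
        elif command and command.startswith("gemini wake up"):
            speak(ask_gemini(command))
    camera.release()
```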
- Real-time object detection via webcam
- Voice-activated control and feedback
- AI-powered question answering (Google Gemini AI)
- Two operation modes: On-Demand and Continuous
- Interruptible speech responses
- Works offline (Vosk) and online (Google Speech Recognition)
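One possible way to illustrate the online/offline split uses the SpeechRecognition package: try Google's online recognizer first and hand the audio to an offline Vosk-based helper when there is no connectivity. This is a hedged sketch, not necessarily how `speech_processing.py` decides; `offline_transcribe` is a hypothetical helper (a Vosk setup snippet appears later in this README):

```python
# Sketch of combining online and offline recognition (assumed behaviour).
import speech_recognition as sr

def transcribe(recognizer, audio):
    try:
        return recognizer.recognize_google(audio)   # online path (needs internet)
    except sr.RequestError:
        return offline_transcribe(audio)            # hypothetical Vosk-based fallback
    except sr.UnknownValueError:
        return None                                 # speech was unintelligible

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)
    print(transcribe(recognizer, recognizer.listen(source)))
```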
Hardware:
- Windows 10+ PC or laptop
- Webcam
- Microphone and speakers/headphones
Software:
- Python 3.7+
- Required Python packages (see below)
| Command | Action Performed |
|---|---|
| "Gemini wake up, what is AI?" | Calls Gemini AI for an answer. |
| "Switch to next" | Switches to continuous mode. |
| "Switch to back" | Switches to on-demand mode. |
| "Start recognition" | Enables object detection. |
| "Stop recognition" | Disables object detection. |
| "Stop query" | Stops Gemini AI responses. |
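One way such phrases could be routed to actions is a simple lookup table keyed on the normalized transcript. The dispatcher below is a hypothetical sketch; the `app` methods are placeholders, not the project's real interface:

```python
# Hypothetical command router; the real app's phrase matching may differ.
COMMANDS = {
    "switch to next":    lambda app: app.set_mode("continuous"),
    "switch to back":    lambda app: app.set_mode("on-demand"),
    "start recognition": lambda app: app.enable_detection(True),
    "stop recognition":  lambda app: app.enable_detection(False),
    "stop query":        lambda app: app.stop_speaking(),
}

def handle_command(app, text):
    text = text.lower().strip()
    if text.startswith("gemini wake up"):
        # Everything after the wake phrase is treated as the question for Gemini
        question = text.replace("gemini wake up", "", 1).strip(" ,")
        app.answer_with_gemini(question)
        return True
    action = COMMANDS.get(text)
    if action:
        action(app)
        return True
    return False   # unrecognized phrase
```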
- Download and install Python 3.7 or newer from python.org.
- Open a terminal/command prompt in the project directory and run:
pip install -r requirements.txt
- yolov4-tiny.weights: download from the official YOLO (Darknet) release page
- yolov4-tiny.cfg: download from the official Darknet GitHub repository (right-click > Save As)
- coco.names: download from the official Darknet GitHub repository (right-click > Save As)
- Place all three files in the project root directory (same folder as main.py).
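A quick way to verify the three files are in place is to load them with OpenCV's DNN module. This is a minimal sketch assuming the `cv2.dnn_DetectionModel` API and a 416x416 input size; the project's `object_recognition.py` may configure things differently, and `test.jpg` is just a placeholder image:

```python
# Sanity check: load YOLOv4-tiny with OpenCV's DNN module (paths as described above).
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4-tiny.cfg", "yolov4-tiny.weights")
with open("coco.names") as f:
    class_names = [line.strip() for line in f if line.strip()]

# Wrap the network in OpenCV's high-level detection helper
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255, swapRB=True)

frame = cv2.imread("test.jpg")   # placeholder sample image
class_ids, confidences, boxes = model.detect(frame, confThreshold=0.4, nmsThreshold=0.4)
for class_id, confidence in zip(class_ids, confidences):
    print(class_names[int(class_id)], float(confidence))
```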
- Visit the Vosk models page
- Download a small English model, e.g., vosk-model-small-en-us-0.15.zip
- Extract the zip file. You should get a folder like vosk-model-small-en-us-0.15
- Place this folder inside the `model/` directory in your project (so you have `model/vosk-model-small-en-us-0.15`)
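To confirm the folder is wired up correctly, you can load it with the Vosk Python package. A minimal sketch, assuming 16 kHz mono audio; the actual audio capture lives in the project's `speech_processing.py`:

```python
# Minimal check that the Vosk model folder is in the expected location.
import json
from vosk import Model, KaldiRecognizer

model = Model("model/vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)   # expects 16 kHz, 16-bit mono PCM

# Feed raw audio chunks (e.g. from PyAudio) into the recognizer:
# if recognizer.AcceptWaveform(chunk):
#     print(json.loads(recognizer.Result())["text"])
```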
- If you want to use Gemini AI features, set up Google Cloud credentials:
  - Follow the Google Cloud authentication guide
  - Enable the Gemini/Vertex AI API in your Google Cloud project
  - Download your credentials JSON and set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to its path
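The snippet below sketches how a Gemini call through the Vertex AI Python SDK might look once the credentials variable is set. The project ID, region, model name, and credentials path are placeholders, and the project's `gemini_ai.py` may use a different client entirely:

```python
# Hedged sketch of a Gemini query via the Vertex AI SDK (pip install google-cloud-aiplatform).
import os
import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder path -- point this at your downloaded service-account JSON
os.environ.setdefault("GOOGLE_APPLICATION_CREDENTIALS", r"C:\path\to\credentials.json")

vertexai.init(project="your-gcp-project-id", location="us-central1")  # placeholders
model = GenerativeModel("gemini-1.5-flash")                           # example model name
print(model.generate_content("What is AI?").text)
```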
- Open a terminal in the project directory and run:
python main.py
- If you have a pre-built executable (e.g., smart-visual.exe in the `dist/` folder):
  - Double-click the .exe file to launch the application
project-root/
│── main.py
│── config.py
│── object_recognition.py
│── speech_processing.py
│── gemini_ai.py
│── yolov4-tiny.cfg
│── yolov4-tiny.weights
│── coco.names
│── requirements.txt
│── README.md
│── model/
│ └── vosk-model-small-en-us-0.15/ # (or similar)
- Missing Packages: Run `pip install -r requirements.txt` again
- Microphone/Camera Not Detected: Check device connections and Windows settings
- Model Not Found: Ensure all model files are in the correct locations
- Google Gemini AI Not Working: Check your Google Cloud credentials and API setup
- Permission Errors: Try running the terminal or executable as administrator
For questions or support, please refer to the User Guide or contact the project maintainer.