VisionAssist 👁️

Python YOLO License PRs Welcome

VisionAssist Demo: Real-time object detection and distance estimation in action

VisionAssist is an AI-powered assistant designed to help visually impaired people understand their surroundings. It combines real-time object detection, distance estimation, speech recognition, and text-to-speech to announce nearby objects and roughly how far away they are, making everyday environments easier to navigate.

✨ Features

  • 🎯 Real-time Object Detection

    • Powered by YOLOv8, one of the fastest and most accurate object detection models
    • Detects the 80 COCO object classes in real time
    • Smooth performance on standard hardware
  • 📏 Precise Distance Estimation

    • Distance estimates from a focal-length (similar-triangles) calculation
    • Real-time updates as objects move
    • Distance reported in both inches and feet
  • 🔊 Natural Audio Descriptions

    • Offline text-to-speech descriptions via pyttsx3
    • Contextual information about object locations
    • Adjustable speech rate and volume
  • 🎤 Intuitive Voice Control

    • Simple voice commands ("start", "stop", "quit") for system control
    • Ambient-noise calibration before each listen
    • Speech recognition via Google's service through the SpeechRecognition library

🔧 Prerequisites

  • Python 3.8 or higher
  • Webcam or USB camera
  • Microphone
  • Internet connection (for speech recognition)
  • 4GB RAM minimum (8GB recommended)
  • NVIDIA GPU (optional, for better performance)

⚡ Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/VisionAssist.git
    cd VisionAssist
  2. Set up a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install required packages (the likely package set is sketched after this list):

    pip install -r requirements.txt
  4. Download the YOLO model:

    wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt
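
The repository's own requirements.txt is not reproduced here; judging from the imports used in this README, it likely contains packages along these lines (an assumption based on this README, not a copy of the actual file):

    # Assumed package list, inferred from the imports in this README
    ultralytics        # YOLOv8 model loading and inference
    opencv-python      # cv2: camera capture and drawing
    pyttsx3            # Offline text-to-speech
    SpeechRecognition  # speech_recognition: voice commands
    PyAudio            # Microphone backend used by SpeechRecognition

Note: if yolov8n.pt is not already on disk, the ultralytics package typically downloads it automatically the first time YOLO('yolov8n.pt') is called, so step 4 mainly avoids a download on first run.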

🚀 Usage

  1. Start the application:

    python main.py
  2. Voice Commands:

    • Say "start" to begin object detection
    • Say "stop" to pause detection
    • Say "quit" to exit the application

🔍 How It Works

Main Components

  • Object Detection and Distance Estimation:

    import cv2
    from ultralytics import YOLO

    Known_width = 5.7    # Inches - real-world width of the reference object
    Known_distance = 30  # Inches - calibration distance (assumed example value; not given in this README)

    # Load the YOLO model
    model = YOLO('yolov8n.pt')  # Load an official model
    names = model.names  # Get class names
  • Text-to-Speech and Speech Recognition:

    import pyttsx3
    import speech_recognition as sr
    
    # Initialize the text-to-speech engine
    engine = pyttsx3.init()
    
    # Initialize the speech recognizer
    recognizer = sr.Recognizer()
  • Focal Length Calculation (a worked example with sample numbers follows this list):

    def FocalLength(measured_distance, real_width, width_in_rf_image):
        # Similar triangles: focal length (pixels) = (reference width in pixels * known distance) / real width
        return (width_in_rf_image * measured_distance) / real_width
  • Distance Finder:

    def Distance_finder(Focal_Length, real_object_width, object_width_in_frame):
        # Rearranged relation: distance = (real width * focal length) / width in pixels
        return (real_object_width * Focal_Length) / object_width_in_frame
  • Generate Description:

    def generate_description(object_distance, class_id):
        return f"A {class_id} is at {object_distance} inches"
  • Generate Speech:

    def generate_speech(description):
        engine.say(description)
        engine.runAndWait()
        print("Say 'start' to continue or 'stop' to end.")
        command = listen_for_command()
        return command
  • Listen for Command:

    def listen_for_command():
        with sr.Microphone() as source:
            recognizer.adjust_for_ambient_noise(source)
            audio = recognizer.listen(source)
        try:
            command = recognizer.recognize_google(audio).lower()
            print("Received command:", command)
            return command
        except sr.UnknownValueError:
            print("Sorry, could not understand audio.")
            return ""
        except sr.RequestError:
            print("Could not request results; check your internet connection.")
            return ""
  • Control Speech:

    def control_speech():
        while True:
            command = generate_speech("Description goes here")
            if command == "start":
                cap = cv2.VideoCapture(0)  # Camera object
                describe_objects(cap)
            elif command == "stop":
                print("Stopping speech generation...")
                engine.stop()
                break
            else:
                print("Sorry, could not understand the command.")
  • Describe Objects:

    def describe_objects(cap):
        Focal_length_found = None  # Focal length is computed once, from the first frame, below
        while True:
            ret, frame = cap.read()
            if not ret:
                break
    
            results = model(frame)  # Predict on an image
            result = results[0]
    
            if Focal_length_found is None:
                Focal_length_found = FocalLength(Known_distance, Known_width, frame.shape[1])
    
            for box in result.boxes:
                # Box coordinates are xyxy: top-left (x1, y1) and bottom-right (x2, y2)
                cords = box.xyxy[0].tolist()
                x1, y1, x2, y2 = [round(c) for c in cords]
                class_id = result.names[int(box.cls[0].item())]

                object_width_in_frame = x2 - x1  # Bounding-box width in pixels
                object_distance = Distance_finder(Focal_length_found, Known_width, object_width_in_frame)
                object_distance = round(object_distance, 2)

                description = generate_description(object_distance, class_id)
                command = generate_speech(description)
                if command == "stop":
                    cap.release()
                    cv2.destroyAllWindows()
                    return

                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(frame, f"Object: {class_id}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    
            cv2.imshow('Object Detection', frame)
    
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    
        cap.release()
        cv2.destroyAllWindows()
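
As a quick sanity check of the distance math, here is a worked example with made-up numbers (a 5.7-inch-wide reference object photographed 30 inches away, appearing 200 pixels wide; none of these values come from the repository):

    # Hypothetical calibration: reference object 5.7 in wide, 30 in away, 200 px wide in the image
    focal_length = FocalLength(measured_distance=30, real_width=5.7, width_in_rf_image=200)
    # focal_length = (200 * 30) / 5.7 ≈ 1052.6 pixels

    # The same object later appears 100 px wide, so it should be twice as far away
    distance = Distance_finder(focal_length, real_object_width=5.7, object_width_in_frame=100)
    # distance = (5.7 * 1052.6) / 100 ≈ 60 inches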

Main Function

  • Main Function:
    def main():
        control_speech()
    
    if __name__ == "__main__":
        main()

❗ Troubleshooting

Common issues and solutions:

  1. Camera not detected:

    # Try changing the camera index
    cv2.VideoCapture(1)  # Instead of 0
  2. Speech recognition errors:

    • Ensure stable internet connection
    • Check microphone permissions
    • Try reducing background noise
  3. Performance issues:

    • Close other GPU-intensive applications
    • Reduce the camera frame resolution
    • Use a smaller YOLO variant or a lower inference size (see the sketch after this list)
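
A minimal tuning sketch, assuming the standard OpenCV capture properties and the ultralytics predict API (the values are illustrative, not project defaults):

    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # Request a smaller frame from the camera
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    ret, frame = cap.read()
    if ret:
        results = model(frame, imgsz=320)    # Run YOLO inference at a reduced image size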

🤝 Contributing

We love your input! Check out our Contributing Guidelines to get started.

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🌟 Support

If you find this project useful, please consider giving it a star ⭐️

📧 Contact

For any questions or support, please open an issue or contact us at your-email@example.com


Made with ❤️ for the visually impaired community
