SPARC - AI-Powered Assistive Reality System for Sign Language Communication and Navigation

Published Jun 12, 2026
 170 hours to build
 Intermediate

An AI-powered wearable assistive system that enables real-time two-way sign language communication, environmental awareness, and emergency safety assistance. By combining computer vision, edge AI, and smart sensors, it helps Deaf and speech-impaired individuals communicate, navigate, and interact more independently in everyday life.


Components Used

Raspberry Pi 4 Model B - 8 GB RAM
Single Board Computers
1
OLED Display
Displays captions and system feedback
1
MPU6050 IMU Sensor
Motion sensing and fall detection
1
Microphone Module
Speech input for speech-to-text
1
Bluetooth Audio Module
Wireless audio output
1
Bluetooth Earbuds/Speaker
Audio feedback to user
1
Lithium Polymer Battery
Portable power source
1
Spectacle Frame / Wearable Fitment
Mounting structure for assistive hardware
1
Description

1. Project Overview

 

SPARC is a wearable assistive communication system designed to help Deaf and speech-impaired users interact more independently in real-world settings. The project combines computer vision, edge AI, wearable electronics, speech output, and a compact OLED interface to turn sign language gestures into meaningful communication.

The current project focuses on the spectacle-mounted fitment prototype. The broader project vision includes a modular assistive reality ecosystem, but the practical build presented here is centered on a wearable device that can be demonstrated, tested, and iterated quickly.

2. Why This Project Matters

 

 

In many daily situations, the user has no interpreter nearby. Existing tools often solve only one part of the problem: either speech-to-text, or sign recognition, or a phone-based assistive app that is not truly wearable. That creates friction in schools, hospitals, counters, workplaces, and public spaces.

SPARC is built around a more complete idea: a communication companion that sits on the user, reads gestures, speaks responses, and gives compact visual feedback without depending on a cloud connection.

3. Problem Statement

  • One-way translation tools do not support natural conversation.
  • Bulky glove-based systems are uncomfortable and difficult to scale.
  • Cloud-dependent apps create latency, privacy, and connectivity issues.
  • Most solutions do not support a wearable, everyday form factor.
  • Safety and situational awareness are often missing entirely.

The result is that users still need extra help for routine communication. A solution only becomes useful when it is fast, portable, private, and comfortable enough to wear for long periods.

4. Existing Solutions and Their Limitations

The current market has sign-language apps, captioning tools, camera-based demos, and glove systems. Each of them solves a narrow piece of the puzzle, but the overall experience is still fragmented.

  • Glove systems: often accurate only in controlled environments, with poor comfort and maintenance issues.
  • Mobile apps: easy to install, but not always hands-free or suitable for continuous interaction.
  • Cloud platforms: powerful but slower, dependent on internet quality, and less private.
  • Screen-only prototypes: useful for research, but weak as a daily wearable product.

This is why the project was framed as a wearable assistive reality system instead of a plain translation app. The hardware form factor is part of the innovation, not just a container for software.

5. Why We Chose a Wearable Approach

A wearable system makes the interaction feel immediate. The user does not need to hold a phone, open an app, or connect extra accessories every time. The device is always ready, and that matters in real-world communication.

  • Hands-free use during conversation.
  • Faster access to feedback through the OLED interface.
  • More natural user experience in public settings.
  • Better privacy because processing can stay on-device.
  • A modular fitment can be adapted to different frames and future generations.

6. Concept Development Journey

The concept evolved from a basic sign-language recognition idea into a full assistive workflow. Rather than stopping at gesture classification, the team extended it into a communication companion with audio output, visual overlays, modular hardware, and a live web demo.

The strongest design decision was to keep the prototype simple enough to run reliably, while still showing a future-ready architecture.

7. Design Thinking Process

The project follows a simple but effective human-centered design path.

1. Empathize: understand how sign-language users experience communication barriers.

2. Define: focus on a wearable, low-latency, offline assistive solution.

3. Ideate: combine gesture recognition, speech output, OLED feedback, and modular hardware.

4. Prototype: build a spectacle-mounted fitment and test it in live conditions.

5. Evaluate: measure accuracy, latency, comfort, and reliability.

8. Project Objectives

  • Recognize sign gestures in real time.
  • Convert gestures into readable text and speech.
  • Provide visual feedback on a wearable OLED display.
  • Operate with low latency and offline-first behavior.
  • Support modular development and future expansion.
  • Demonstrate a real wearable prototype, not just a software concept.

9. Proposed Solution

SPARC is an AI-powered wearable assistive reality system that uses a camera, a small on-device compute layer, machine learning models, and a wearable display to assist communication. The system detects gestures, interprets the result, and outputs the information through both speech and text.

In the current fitment-focused build, the device is presented as a spectacle-mounted unit with a compact display and camera alignment that can be worn during live demonstrations.

                                User wearing the spectacle-mounted fitment prototype
 

10. Key Innovations

  • Wearable first: the form factor is designed for daily use.
  • Offline edge processing: the system is not built around constant cloud dependence.
  • Multi-module AI stack: gesture recognition, emotion detection, and optional object awareness.
  • OLED-driven feedback: compact visual output without needing a phone screen.
  • Code modularity: the repository is split into services, models, camera, gestures, vision, and test modules.

11. System Overview


           High-level architecture of the wearable assistive communication platform

At a high level, the workflow is simple: the camera captures the user's gesture, the software extracts landmarks and classifies the sign, the result is converted into sentence-level output, and the OLED/audio services present it to the user and the listener.

The design is intentionally modular. That means the camera, output display, models, and services can be upgraded independently.

12. Hardware Architecture

Component                  Role in the system
Raspberry Pi 4             Main local compute platform for inference and orchestration.
USB camera                 Captures hand and body gestures from the wearable viewpoint.
Waveshare OLED display     Presents compact captions and system feedback.
Microphone module          Optional speech input path for voice-based interaction.
Bluetooth audio / speaker  Converts text to speech for external listeners.
Li-ion / Li-Po battery     Portable power for the fitment prototype.
Wearable spectacle frame   Mechanical mounting platform for the assistive unit.

13. Working Principle

1. The user wears the spectacle-mounted unit.

2. The camera streams live frames to the recognition pipeline.

3. Hand landmarks and optional face or pose cues are extracted.

4. The model classifies the sign or gesture.

5. The result is built into a readable word or sentence.

6. The display shows the text, and the audio service speaks it out.

7. The user can continue the conversation without switching devices.

14. End-to-End Workflow


                      End-to-end gesture-to-speech and gesture-to-text workflow

 

15. Wearable Design

The fitment was chosen because it feels closer to an everyday assistive product. The camera and display sit on the wearable frame, which helps keep the prototype compact. The shape, alignment, and cable routing matter just as much as the AI model.

 

                                                                    SPARC - CAD Diagram

16. Hardware Components

Quantity  Component                  Purpose
1         Raspberry Pi 4             Main local inference and control unit
1         USB Camera                 Gesture capture
1         Waveshare OLED display     Wearable text output
1         Microphone module          Speech input
1         Bluetooth audio / speaker  Voice output
1         Battery pack               Portable power
1         Spectacle frame / fitment  Wearable enclosure
1         Optional IMU module        Future motion/safety extension

17. Hardware Assembly Process

The prototype was assembled in a practical engineering sequence instead of a cosmetic sequence. First came the sensor placement, then display alignment, then compute integration, and only after that did the team focus on cable management and usability.

1. Mount the display on the spectacle frame and confirm visibility from the user angle.

2. Fix the camera so that the hands remain in the field of view during natural movement.

3. Place the compute unit and route the cables to reduce strain on the frame.

4. Connect the output path for audio and verify that speech generation is clear.

5. Run live tests and adjust the alignment until the system feels natural to wear.

18. Prototype Development Journey

SPARC was developed through multiple iterations, beginning with basic hand-landmark detection and progressing to a wearable assistive communication system. Initial efforts focused on achieving reliable sign-language recognition using computer vision and machine learning.

The next phase involved integrating a wearable OLED display and optimizing the recognition pipeline for real-time performance. Multiple hardware configurations were tested to improve comfort, visibility, and system stability.

Finally, the complete system was validated through live demonstrations, successfully translating gestures into meaningful text and wearable feedback in real time.

                             Live gesture recognition shown on the wearable prototype during testing

  Development table showing the processing laptop, display, and hardware wiring during integration

   Hand landmark recognition view used during sign-language classification

 

19. Software Architecture

  Software architecture and module interactions inside the SPARC repository

20. Source Code Structure

The repository is not flat. It is deliberately split into runtime code, model assets, driver libraries, and test/debug scripts. That structure makes it easier to explain, debug, and extend.
Repo Link: https://github.com/Avishkar-byte/SPARC-Smart-Perception-Assistive-Reality-Companion.git

SPARC-Smart-Perception-Assistive-Reality-Companion/

├── run.py
├── requirements.txt
├── README.md
├── RFC_MODEL_2_0_9_modes.pkl
├── RFC_MODEL_3_A_Z_modes.pkl
├── yolov8n.pt
├── weights/
├── SPARC/
│   ├── main.py
│   ├── core/
│   ├── services/
│   ├── gestures/
│   ├── emotions/
│   ├── vision/
│   ├── cameras/
│   ├── config/
│   ├── isl_recognition/
│   └── utils/
├── Face_Recognition/
│   ├── src/
│   ├── model.h5
│   └── imgs/
├── lib/
│   └── waveshare_OLED/
└── X/
    ├── test_*.py
    └── fix_*.md

21. Repository Walkthrough: run.py and Root Assets

run.py is the clean launch point. It simply appends the repository path to Python's import path and calls SPARC.main.main(). 

  • run.py: launch script.
  • requirements.txt: Python dependency list.
  • RFC_MODEL_2_0_9_modes.pkl: trained classifier for number mode.
  • RFC_MODEL_3_A_Z_modes.pkl: trained classifier for character mode.
  • yolov8n.pt / weights/yolov8n.pt: object detection weights used by the optional vision module.
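
The launch behaviour described above could look roughly like the sketch below. It is not copied from the repository: the helper names `add_repo_to_path` and `launch` are hypothetical, and only the call to `SPARC.main.main()` comes from the description.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Append the repository root to sys.path once and return it as a string."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.append(root)
    return root


def launch(repo_root="."):
    """Make the SPARC package importable, then hand control to its entry point."""
    add_repo_to_path(repo_root)
    # Imported late so the path fix above applies before the package is resolved.
    from SPARC.main import main
    main()
```

On the device, a launcher file would end with a call such as `launch(Path(__file__).parent)`.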

22. Repository Walkthrough: SPARC Package

  • SPARC/main.py: interactive runtime orchestrator and user flow control.
  • SPARC/services/display.py: OLED wrapper with rotation, scaling, fallback logging, and text rendering.
  • SPARC/services/audio.py: blocking TTS playback with graceful cleanup.
  • SPARC/services/logger.py: consistent timestamped logging.
  • SPARC/core/model_handler.py: model loading and prediction wrapper for ISL and ASL assets.
  • SPARC/gestures/gesture_recognizer.py: hand detection, sentence building, emotion integration, and output routing.
  • SPARC/emotions/emotion_detector.py: face-based emotion classification using a pre-trained CNN and Haar cascade.
  • SPARC/vision/object_detection.py: optional YOLOv8 scene/object description and distance-aware summarization.
  • SPARC/cameras/realsense_manager.py: camera abstraction that prefers RealSense and falls back to USB camera.
  • SPARC/config/settings.py and config_io.py: central constants and persisted display/audio calibration.
  • SPARC/isl_recognition/: alternate gesture pipeline, geometry helpers, and visualizer utilities.
  • SPARC/utils/geometry.py: landmark distance and angle calculations.

23. Repository Walkthrough: Face_Recognition Module

Face_Recognition is a separate reusable subproject for emotion detection. It contains the dataset preparation script, the model file, and the inference/training script.

  • src/dataset_prepare.py: converts FER-2013 CSV into train/test image folders.
  • src/emotions.py: CNN architecture, training loop, and display mode for facial emotion recognition.
  • src/haarcascade_frontalface_default.xml: face detector used before feeding crops into the CNN.
  • model.h5: saved emotion classifier used during runtime.
  • imgs/accuracy.png: training performance visual used in the module README.

24. Repository Walkthrough: X Folder

The X folder behaves like a development workshop. It contains fixes, test scripts, and investigation tools that were used while stabilizing the camera and object-detection pipeline.

  • test_camera_feed.py / test_camera_simple.py / test_cameras.py: camera diagnostics.
  • test_both_modes.py: validation across multiple runtime modes.
  • test_object_detection_live.py: live YOLO-based scene tests.
  • test_realsense.py: RealSense validation script.
  • REALSENSE_SETUP.md / REALSENSE_RGB_FIX.md / CAMERA_FIX_SUMMARY.md: troubleshooting notes and integration fixes.
  • main_simple.py and backup.py: reduced or fallback runtime versions.

 

25. Main Execution Flow

The actual startup path is very short but very important.

python run.py

    -> imports SPARC.main

    -> initializes logging

    -> initializes OLED display service

    -> initializes audio service

    -> selects language mode (ISL or ASL)

    -> launches GestureRecognizer

    -> enters live recognition loop

    -> speaks and displays recognized output

    -> cleans up on exit

Inside SPARC/main.py, the fitment-focused demo currently uses the USB webcam path. RealSense and object detection support are present in the codebase but are not part of this default path.
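
The startup sequence above, including the final cleanup step, can be sketched as a try/finally wrapper. This is an illustration under stated assumptions rather than the code in SPARC/main.py: `StubService` and `run_session` are hypothetical names, and the recognition loop is passed in as a plain callable.

```python
import logging


class StubService:
    """Stand-in for the display or audio service (hypothetical interface)."""

    def __init__(self, name):
        self.name = name
        self.active = True

    def close(self):
        self.active = False


def run_session(recognize, mode="ISL"):
    """Initialise services, run the recognition callable, and always clean up."""
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("sparc-sketch")
    if mode not in ("ISL", "ASL"):
        raise ValueError(f"unsupported language mode: {mode}")
    display, audio = StubService("display"), StubService("audio")
    try:
        log.info("starting recognition in %s mode", mode)
        return recognize(mode, display, audio)
    finally:
        # Runs on normal exit, Ctrl+C, and errors alike.
        for service in (audio, display):
            service.close()
```

Putting the cleanup in `finally` means the display and audio resources are released even if the recognition loop raises.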

26. How the Code Works Internally

The code is organized around a few clear responsibilities.

1. Model loading: model_handler loads the correct model based on language mode.

2. Frame capture: camera or webcam frames are read continuously.

3. Feature extraction: the hand detector and geometry utilities convert raw frames into model-friendly features.

4. Inference: the classifier predicts the current gesture or sign.

5. Sentence building: results are accumulated into a readable sentence.

6. Output: display.py renders text on the OLED and audio.py speaks the output.
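
Step 1 can be illustrated with a small lookup from mode to model file. The file names come from the repository listing; the mode keys `"numbers"` and `"letters"` and both function names are assumptions made for this sketch, and the real `model_handler` may be organised differently.

```python
import pickle
from pathlib import Path

# File names are from the repository listing; the mode keys are assumptions.
MODEL_FILES = {
    "numbers": "RFC_MODEL_2_0_9_modes.pkl",
    "letters": "RFC_MODEL_3_A_Z_modes.pkl",
}


def model_path_for(mode, repo_root="."):
    """Return the path of the classifier file for a recognition mode."""
    try:
        return Path(repo_root) / MODEL_FILES[mode]
    except KeyError:
        expected = sorted(MODEL_FILES)
        raise ValueError(f"unknown mode {mode!r}; expected one of {expected}") from None


def load_model(mode, repo_root="."):
    """Unpickle the classifier for the mode. Only load files you trust."""
    with open(model_path_for(mode, repo_root), "rb") as fh:
        return pickle.load(fh)
```

If the .pkl files were written with Joblib, which appears in the dependency list, `joblib.load` would be the matching loader in place of `pickle.load`.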

 

27. Dataset Collection and Training

The repository contains the runtime models and a separate emotion-training pipeline. For the gesture side, the models are packaged as trained assets. For facial emotion detection, the repo includes the FER-2013 preparation script and CNN training script.

In the emotion module, `dataset_prepare.py` converts the FER-2013 CSV into train and test folders, while `emotions.py` trains a CNN on 48x48 grayscale face crops. This gives the project a complete ML story: data preparation, model training, inference, and deployment.
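
As a rough picture of the preparation step, the sketch below parses FER-2013 rows into 48x48 pixel grids using only the standard library. It assumes the usual `emotion,pixels,Usage` CSV columns and treats every non-training row as test data; the function names are hypothetical and the repository's `dataset_prepare.py` may split and save the images differently.

```python
import csv

SIZE = 48  # FER-2013 images are 48x48 grayscale


def parse_fer_row(row):
    """Turn one CSV row into (label, split, image), image being 48 rows of 48 ints."""
    values = [int(v) for v in row["pixels"].split()]
    if len(values) != SIZE * SIZE:
        raise ValueError(f"expected {SIZE * SIZE} pixels, got {len(values)}")
    image = [values[r * SIZE:(r + 1) * SIZE] for r in range(SIZE)]
    # Assumption: anything not marked "Training" goes to the test folder.
    split = "train" if row["Usage"] == "Training" else "test"
    return int(row["emotion"]), split, image


def read_fer(stream):
    """Yield parsed rows from an open fer2013.csv-style text stream."""
    for row in csv.DictReader(stream):
        yield parse_fer_row(row)
```

Each parsed grid can then be written out as an image file into a folder named after its split and label.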

28. Hand Landmark Detection

Hand landmarks are the most important input representation for the gesture system. Instead of feeding the model raw pixels only, the pipeline derives structured landmark coordinates. That improves consistency across lighting and background changes.

 Landmark-based hand recognition interface used to validate gestures in real time

The geometry helper functions compute distances and angles and normalize them by shoulder width. That is a strong engineering choice because it gives the classifier more stable, scale-independent features than raw frames alone.
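
A minimal sketch of such helpers is shown below, assuming landmarks arrive as 2D `(x, y)` tuples. The function names are hypothetical and are not taken from SPARC/utils/geometry.py.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y) landmarks."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def angle_at(vertex, p, q):
    """Angle in degrees at `vertex` between the rays towards p and q."""
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate case: a ray has zero length
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def normalized_distance(a, b, left_shoulder, right_shoulder):
    """Distance between a and b expressed in shoulder widths."""
    width = distance(left_shoulder, right_shoulder)
    return distance(a, b) / width if width > 0 else 0.0
```

Dividing by shoulder width makes a feature roughly independent of how far the signer is from the camera, which is what keeps it comparable across frames.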

29. Gesture Recognition Pipeline

Gesture recognition is handled by the GestureRecognizer class. The module uses cvzone hand detection, the trained model handler, and emotion integration to build a live interactive system.

  • Detect hands and landmarks from the live frame.
  • Normalize and convert the landmark positions into a compact feature set.
  • Pass the features to the correct model based on language mode.
  • Filter low-confidence results using a threshold.
  • Accumulate stable predictions into a sentence.
  • Send the sentence to display and audio services.
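
The thresholding and accumulation steps above can be sketched as a small state machine. The class name, the default threshold of 0.7, and the five-frame stability window are all assumptions for illustration; the values used by `GestureRecognizer` are not stated in this write-up.

```python
class SentenceBuilder:
    """Commit a label only after it is seen confidently for N consecutive frames."""

    def __init__(self, threshold=0.7, stable_frames=5):
        self.threshold = threshold          # assumed confidence cut-off
        self.stable_frames = stable_frames  # assumed stability window
        self.tokens = []
        self._candidate = None
        self._streak = 0

    def update(self, label, confidence):
        """Feed one per-frame prediction and return the sentence so far."""
        if confidence < self.threshold:
            # Low-confidence frames break the streak and are otherwise ignored.
            self._candidate, self._streak = None, 0
            return self.sentence
        if label == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = label, 1
        # Commit exactly once, when the streak first reaches the window length.
        if self._streak == self.stable_frames:
            self.tokens.append(label)
        return self.sentence

    @property
    def sentence(self):
        return "".join(self.tokens)
```

With this design, holding a sign longer than the window does not repeat the letter; the streak has to be broken before the same label can be committed again.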

30. Emotion Recognition Module

The emotion recognition module is a valuable additional layer. It locates the face region with a Haar cascade and classifies the cropped expression with a CNN saved as model.h5.

The module focuses on a practical subset of emotions during runtime: Angry, Happy, Neutral, and Sad. This is a smart decision because it keeps the interaction layer simple and readable.
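
One way to restrict a seven-class FER-2013 classifier to these four emotions is to ignore the other classes when picking the winner, as sketched below. The write-up does not say how the subset is applied, so this masking approach, the class order in `FER_LABELS`, and the function name are all assumptions.

```python
# Assumed class order of the 7-way classifier (alphabetical folder names).
FER_LABELS = ["Angry", "Disgusted", "Fearful", "Happy", "Neutral", "Sad", "Surprised"]
RUNTIME_LABELS = {"Angry", "Happy", "Neutral", "Sad"}


def pick_runtime_emotion(probabilities):
    """Return the most likely label among the four runtime emotions."""
    if len(probabilities) != len(FER_LABELS):
        raise ValueError("expected one probability per FER label")
    allowed = [
        (p, label)
        for p, label in zip(probabilities, FER_LABELS)
        if label in RUNTIME_LABELS
    ]
    return max(allowed)[1]
```

For example, if "Fearful" has the highest raw score, the function still returns whichever of the four runtime labels scores best.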

31. OLED Display Module

The display service wraps the Waveshare OLED driver, manages rotation and scaling, and falls back to logging if the hardware is unavailable. That keeps the rest of the code usable on a development machine with no panel attached.

     Wearable OLED output showing the system initialization message

The OLED module supports centered bold text, configurable line height, offsets, and display persistence. 
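
The graceful-fallback idea can be sketched without the hardware driver. Everything here is hypothetical: `ConsoleDisplay`, `make_display`, and `caption_lines` are illustrative names, and the 16-character, 4-line panel capacity is an assumed value rather than the real display geometry.

```python
import logging
import textwrap

log = logging.getLogger("sparc-display-sketch")


class ConsoleDisplay:
    """Fallback that logs captions instead of drawing them."""

    def __init__(self):
        self.history = []

    def show(self, lines):
        self.history.append(list(lines))
        log.info("DISPLAY: %s", " / ".join(lines))


def make_display(driver_factory):
    """Return the hardware display if it initialises, else the console fallback."""
    try:
        return driver_factory()
    except Exception as exc:  # e.g. missing driver, bus not enabled, no panel
        log.warning("OLED unavailable (%s); using console fallback", exc)
        return ConsoleDisplay()


def caption_lines(text, chars_per_line=16, max_lines=4):
    """Wrap a caption and keep only the most recent lines that fit the panel."""
    return textwrap.wrap(text, width=chars_per_line)[-max_lines:]
```

Because both display types expose the same `show` method, the recognition code does not need to know which one it received.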

32. Audio Module

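Audio output is generated with gTTS and played locally through mpg123. Because gTTS synthesizes speech through an online service, this step needs a network connection even though gesture recognition runs on-device. Playback blocks until speech is finished so the spoken result is not cut off mid-sentence.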

  • Generate MP3 using gTTS.
  • Play the audio using mpg123.
  • Clean up temporary files after playback.
  • Use short safety delays to avoid truncation.

This is a practical choice for a demo because it keeps the speech output predictable and easy to understand.
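
The generate, play, and clean-up steps above can be sketched as follows. This is a minimal illustration, not the contents of SPARC/services/audio.py: the function names are hypothetical, the synthesis and playback steps are passed in as callables so they can be swapped for testing, and the defaults assume the gTTS package and the `mpg123` command are installed.

```python
import os
import subprocess
import tempfile


def synthesize_with_gtts(text, path, lang="en"):
    # Imported here so the module loads even where gTTS is not installed.
    from gtts import gTTS
    gTTS(text=text, lang=lang).save(path)


def play_with_mpg123(path):
    # subprocess.run waits for mpg123 to exit, so playback is blocking.
    subprocess.run(["mpg123", "-q", path], check=True)


def speak(text, synthesize=synthesize_with_gtts, play=play_with_mpg123):
    """Synthesize text to a temporary MP3, play it, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        synthesize(text, path)
        play(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return path
```

The `finally` block removes the temporary file whether playback succeeds or fails, which avoids leftover MP3 files accumulating on the device.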

33. Vision and Optional Object Detection

The repository also includes an optional object detection and depth-aware vision layer. The module uses YOLOv8 and can work with RealSense or USB camera input through the camera manager abstraction.

In the current fitment build this layer is optional. Its presence shows that the project is architected to expand into scene awareness and navigation assistance.

 

34. GitHub Repository Walkthrough


  SPARC - GitHub repository screenshot showing the top-level file structure and README


GitHub Repository Link: https://github.com/Avishkar-byte/SPARC-Smart-Perception-Assistive-Reality-Companion 
 

35. How We Built This Project

 

1. Idea Validation
The project originated from observing the communication difficulties faced by Deaf and speech-impaired individuals in educational institutions and public spaces. The goal was to create a practical solution that could assist users without requiring an interpreter.

2. Requirement Analysis
Key requirements were identified, including real-time gesture recognition, offline operation, wearable form factor, low latency, and user-friendly feedback mechanisms.

3. Hardware Selection
Several hardware configurations were evaluated before selecting a compact setup consisting of a camera module, OLED display, processing unit, audio output system, and spectacle-mounted frame.

4. Dataset Preparation
Gesture samples were collected and organized to represent commonly used signs and communication phrases. Data was cleaned, labeled, and prepared for model training and validation.

5. Model Training & Packaging
Multiple machine learning models were trained and evaluated to achieve reliable recognition performance. The final models were exported and packaged as optimized .pkl and .h5 files for deployment.

6. Software Development
A modular software architecture was developed to separate gesture recognition, display management, audio feedback, and utility services, making the system easier to maintain and expand.

7. Hardware Integration
The electronic components were assembled onto a wearable spectacle-fitment prototype. Special attention was given to display positioning, wiring management, and user comfort.

8. User Interface Development
The OLED interface and audio feedback system were refined to ensure that recognized information could be delivered clearly, quickly, and consistently to the user.

9. Testing & Validation
Extensive testing was performed under different lighting conditions, hand positions, and usage scenarios to evaluate recognition accuracy and system stability.

10. Optimization
Several iterations were carried out to reduce inference time, improve prediction consistency, enhance display readability, and optimize overall system responsiveness.

11. Deployment & Demonstration
The final system was documented, deployed through a web interface, and demonstrated through live testing sessions, showcasing real-time assistive communication capabilities in a wearable form factor.

 

36. How To Rebuild This Project From Scratch

One of the primary goals behind SPARC was to create a modular and reproducible assistive technology platform. The project has been structured so that students, researchers, and developers can easily recreate, test, and further enhance the system without redesigning the complete architecture.

Hardware Requirements

The following hardware components are required to recreate the wearable prototype:

  • Raspberry Pi 4 (or equivalent edge-computing platform)
  • USB Camera for gesture acquisition
  • Waveshare OLED Display
  • Microphone Module
  • Bluetooth Speaker / Earphones
  • Spectacle-Mounted Wearable Frame
  • Portable Battery Pack
  • Standard USB and GPIO Interconnects

The hardware can initially be assembled on a development table for testing and later integrated into the wearable spectacle-fitment prototype.

Software Requirements

SPARC is built using a Python-based AI and Computer Vision stack.

Core Software Dependencies:

  • Python 3.9+
  • OpenCV
  • NumPy
  • Pandas
  • MediaPipe
  • TensorFlow / Keras
  • Scikit-Learn
  • Joblib
  • gTTS
  • Pygame
  • SpeechRecognition
  • PyAudio
  • Pillow
  • Ultralytics YOLO

These libraries collectively handle image processing, hand landmark extraction, machine learning inference, display rendering, speech synthesis, and user interaction.

Repository Setup

Clone the repository and create a dedicated Python environment:

git clone https://github.com/Avishkar-byte/SPARC-Smart-Perception-Assistive-Reality-Companion.git

cd SPARC-Smart-Perception-Assistive-Reality-Companion

python -m venv venv

source venv/bin/activate

pip install -r requirements.txt

Once installation is complete, verify that all model files, assets, and dependencies are properly available within the project structure.
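
A quick way to perform this check is a short script that reports which expected files are absent. The paths below are taken from the repository tree shown earlier; the function name `missing_assets` is hypothetical and this script is not part of the repository.

```python
from pathlib import Path

# Paths taken from the repository tree shown earlier in this write-up.
REQUIRED_ASSETS = [
    "run.py",
    "requirements.txt",
    "RFC_MODEL_2_0_9_modes.pkl",
    "RFC_MODEL_3_A_Z_modes.pkl",
    "yolov8n.pt",
    "Face_Recognition/model.h5",
]


def missing_assets(repo_root=".", required=REQUIRED_ASSETS):
    """Return the required files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]
```

Running `print(missing_assets())` from the repository root should print an empty list when everything is in place.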

Project Structure Overview

The repository is organized into multiple modules responsible for different functionalities such as gesture recognition, emotion analysis, display services, audio output, and computer vision processing.

Major repository components include:

  • SPARC Core Application
  • Gesture Recognition Module
  • Emotion Recognition Module
  • Vision Processing Module
  • Audio Services
  • OLED Display Services
  • Trained Machine Learning Models
  • Utility Scripts
  • Configuration Files
  • Deployment Assets

This modular architecture simplifies maintenance, debugging, and future feature expansion.

System Initialization Workflow

When the application starts, the following sequence is executed automatically:

  1. Configuration files are loaded.
  2. Camera devices are initialized.
  3. Trained machine-learning models are loaded into memory.
  4. OLED display services are activated.
  5. Audio services are initialized.
  6. Required resources are verified.
  7. The real-time recognition engine is launched.

After successful initialization, the system enters live assistive communication mode.

Running the Project

Launch the system using:

python run.py

After startup, the application begins capturing live video frames, extracting hand landmarks, performing gesture classification, and generating real-time text and audio outputs.

Expected Execution Pipeline

The complete workflow follows the sequence below:

Gesture Capture → Landmark Extraction → Feature Generation → Gesture Classification → Sentence Formation → OLED Display → Audio Output

This pipeline enables real-time communication support while maintaining low latency and offline operation.

Testing and Validation

Before deploying the wearable system, the following checks should be performed:

✔ Camera feed verification

✔ Hand landmark detection validation

✔ Gesture prediction verification

✔ OLED display visibility testing

✔ Audio playback testing

✔ Latency measurement

✔ End-to-end communication validation

Testing should be performed under different lighting conditions, backgrounds, and user positions to ensure robust performance.
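
For the latency check, a small timing harness is usually enough. The sketch below is an assumption about how such a measurement could be done, not the method used to produce the figures reported in this write-up; `measure_latency` and its `warmup` parameter are hypothetical names.

```python
import statistics
import time


def measure_latency(process_frame, frames, warmup=2):
    """Time process_frame per frame; return mean and p95 latency in ms, plus FPS."""
    frames = list(frames)
    for frame in frames[:warmup]:
        process_frame(frame)  # warm-up calls are executed but not timed
    samples = []
    for frame in frames[warmup:]:
        start = time.perf_counter()
        process_frame(frame)
        samples.append((time.perf_counter() - start) * 1000.0)
    if not samples:
        raise ValueError("need more frames than warmup iterations")
    mean_ms = statistics.fmean(samples)
    ordered = sorted(samples)
    p95_ms = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    fps = 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return {"frames": len(samples), "mean_ms": mean_ms, "p95_ms": p95_ms, "fps": fps}
```

Passing the full capture-to-prediction step as `process_frame` gives an end-to-end figure, while passing only the classifier call isolates inference time.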

Troubleshooting Guide

Camera Not Detected

  • Verify USB connectivity.
  • Check operating system camera permissions.
  • Confirm the configured camera index.

No Audio Output

  • Verify speaker or earphone connection.
  • Check system audio settings.
  • Confirm Text-to-Speech dependencies are installed correctly.

OLED Display Not Responding

  • Verify wiring connections.
  • Confirm Waveshare display drivers are installed.
  • Check OLED initialization settings.

Model Loading Errors

  • Ensure all trained model files are present.
  • Verify configured model paths.
  • Confirm compatible dependency versions.

Performance Issues

  • Reduce camera resolution.
  • Close unnecessary background applications.
  • Use hardware acceleration where available.

37. How To Run And Test

1. First, run the repository from the root with python run.py.

2. Confirm that the OLED shows the startup message.

3. Choose ISL or ASL mode.

4. Open gesture mode and verify that the live feed is responding.

5. Check whether recognized symbols appear consistently on screen.

6. Speak a sample sentence and verify the audio response.

7. Repeat the test under different lighting conditions.

 

                           Recommended testing flow for the demo build

38. Deployment and Live Demo

The repository has a companion web experience deployed at the provided Vercel link.


            Live companion deployment used to present the project online

Link: https://sparc-web-app.vercel.app/
 

 

39. Demonstration Video

40. Performance Metrics and Evaluation

 

Metric                     Measured result                                    Why it matters
ISL recognition accuracy   90%+                                               Shows the system is practically useful.
Sign-to-token latency      ~380 ms                                            Keeps conversation responsive.
Speech-to-caption latency  ~800 ms                                            Maintains live interaction flow.
Inference speed            ~18 FPS on Raspberry Pi 4 / ~42 FPS on laptop CPU  Proves usable edge performance.
Dataset size               12,000+ images                                     Supports model robustness.
mAP@0.5                    ~0.92                                              Indicates strong detection quality.
On-device operation        All local                                          Improves privacy and reliability.

41. Lessons Learned

  • A good wearable must be comfortable before it is impressive.
  • Modular software makes hardware prototyping easier.
  • The smallest screen in the system can decide whether the demo feels polished.
  • A stable live path is better than a complicated but unreliable feature set.
  • Documentation is part of the product, not an afterthought.

42. Applications and Impact

The social value of the project is one of its strongest assets. It is not a gadget for a shelf; it is a tool that can reduce friction in real human interactions.

43. Conclusion

SPARC is more than a gesture recognition system; it is a practical assistive technology platform designed to improve accessibility, independence, and communication for Deaf and speech-impaired individuals. By combining computer vision, machine learning, speech synthesis, wearable electronics, and real-time feedback mechanisms, the project demonstrates how multiple technologies can work together to solve a meaningful real-world problem.

Throughout the development journey, the focus remained on creating a solution that is portable, user-friendly, and capable of operating in real-world environments.

Institute / Organization

Vellore Institute of Technology, Chennai