Sanchar Mitra - a Communication Assistant That Enables Real-time Bidirectional Interaction Between Speech- and Hearing-impaired Individuals and the Rest of Society

Published Jun 15, 2026
 20 hours to build
 Expert

Sanchar Mitra is an Edge AI-powered sign language translation system that enables real-time communication between hearing-impaired individuals and the general public. By combining computer vision, machine learning, offline speech recognition, and text-to-speech technologies, it delivers low-latency, accessible communication directly on embedded hardware without relying on cloud connectivity.


Components Used

Microphone
Microphones microphone, 4 mmx1.5mm, electret condenser, noise cancelling, solder pads, 1 Vdc
1
Speaker
Speakers & Transducers 28 mm, Round Frame, 0.25 W, 32 Ohm, Neodymium Magnet, PET Cone, Speaker
1
STM32N6570-DK
1
Camera module / webcam
1
Description

How We Built Sanchar Mitra 

What does a better world look like?

To us, a better world is one where no one is excluded from communication.

According to the Census of India, more than 5 million individuals live with hearing and speech disabilities. Despite rapid advances in Artificial Intelligence, many assistive technologies still depend on cloud services, smartphones, and internet connectivity. This creates barriers in accessibility, privacy, and affordability.

At the same time, modern AI systems are becoming increasingly resource-hungry, relying on powerful GPUs and large cloud infrastructure.

This inspired us to develop Sanchar Mitra, an AI-powered communication assistant built on Edge AI principles. The project demonstrates how intelligent systems can run efficiently on embedded hardware while creating meaningful social impact.

Our goal is simple:

AI for All. AI for Social Good.

 

Hardware

  1. STM32N6570-DK Discovery Kit
  2. USB Type-C Cable
  3. Laptop / PC for host-side processing
  4. Onboard Camera (STM32N6570-DK)
  5. Microphone
  6. Speaker / Audio Output Device
  7. Integrated LCD Display
  8. UART Communication Interface

Software & Development Tools

  1. STM32CubeIDE
  2. STM32CubeProgrammer
  3. STM32 Model Zoo
  4. X-CUBE-AI
  5. Python
  6. Scikit-learn
  7. Vosk (Offline Speech-to-Text)
  8. Ollama
  9. Llama 3.2
  10. PySerial

Machine Learning Resources

  1. ASL Alphabet Hand Landmark Dataset (Kaggle)
  2. Custom Sign Language Gesture Dataset
  3. Random Forest Classifier

Optional Future Enhancements

  1. Raspberry Pi 5
  2. Battery Pack
  3. Dedicated Camera Module
  4. Custom PCB
  5. 3D Printed Enclosure

Key Technologies Used

  1. Edge AI
  2. Embedded Machine Learning
  3. Computer Vision
  4. Sign Language Recognition
  5. Speech-to-Text (STT)
  6. Text-to-Speech (TTS)
  7. Neural Processing Unit (NPU) Acceleration
  8. Offline AI Inference
  9. Full-Duplex UART Communication

GitHub repository for the dataset and code: https://github.com/maitreya0106/sanchar-mitra
 


Step 1: Understanding the Problem

 


Speech- and hearing-impaired individuals often face communication barriers in:

  1. Schools
  2. Hospitals
  3. Government offices
  4. Workplaces
  5. Public spaces

Most people do not understand sign language, making everyday interactions difficult.

Traditional assistive technologies often require:

  1. Internet connectivity
  2. Smartphones
  3. Cloud processing

We wanted to build a system that works locally, protects privacy, and provides real-time communication assistance.



Step 2: Defining the Solution
 

Sanchar Mitra is an AI-powered communication assistant capable of:

  1. Sign Language → Speech
  2. Speech → Text
  3. AI-powered Conversations

The system enables real-time bidirectional communication between hearing- and speech-impaired individuals and the rest of society.

 


Step 3: System Architecture

 

The complete system consists of:

Embedded Side

  1. STM32N6570-DK
  2. Onboard Camera
  3. LCD Display
  4. NPU Accelerated AI Model

Host Side

  1. Python Application
  2. Random Forest Classifier
  3. Speech-to-Text Engine
  4. Text-to-Speech Engine
  5. Local LLM

The embedded device performs hand landmark detection and sends landmark coordinates to the host system for gesture classification.
 


Step 4: Hardware Used
 

Components

  1. STM32N6570-DK Discovery Kit
  2. USB Type-C Cable
  3. Laptop
  4. Integrated LCD Display
  5. Camera Interface
  6. Speaker
  7. Microphone

Why STM32N6?

The STM32N6 features an integrated Neural Processing Unit (NPU) that allows efficient AI inference directly on embedded hardware.

This makes it ideal for Edge AI applications where low power consumption and privacy are important.



Step 5: Setting Up the STM32 Environment
 

The development environment was built using:

  1. STM32CubeIDE
  2. STM32CubeProgrammer
  3. STM32 Model Zoo
  4. X-CUBE-AI

Tasks Performed

  1. Flashing firmware
  2. Configuring camera pipeline
  3. Loading AI model
  4. Enabling UART communication


 

Step 6: Deploying the Hand Landmark Detection Model


We used the STM32 Model Zoo Hand Landmark model.

The model performs:

  1. Palm Detection
  2. Hand Tracking
  3. Landmark Extraction

The model outputs:

  1. 21 hand landmarks
  2. 63 coordinate values

All inference runs directly on the STM32N6 NPU.

Memory usage remains around 4.2 MB.
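
As a rough illustration of the output layout, the 63 values can be regrouped into 21 (x, y, z) triples. This is a minimal sketch that assumes the values are interleaved as x0, y0, z0, x1, y1, z1, and so on; the ordering actually used by the firmware is not stated here, and `unpack_landmarks` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

NUM_LANDMARKS = 21
VALUES_PER_LANDMARK = 3  # x, y, z

Landmark = Tuple[float, float, float]


def unpack_landmarks(flat: Sequence[float]) -> List[Landmark]:
    """Group a flat sequence of 63 values into 21 (x, y, z) triples.

    Assumption: interleaved ordering x0, y0, z0, x1, y1, z1, ...
    """
    expected = NUM_LANDMARKS * VALUES_PER_LANDMARK
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    return [
        (float(flat[i]), float(flat[i + 1]), float(flat[i + 2]))
        for i in range(0, expected, VALUES_PER_LANDMARK)
    ]


if __name__ == "__main__":
    dummy_frame = [float(i) for i in range(63)]  # placeholder, not real model output
    points = unpack_landmarks(dummy_frame)
    print(len(points), points[0])
```

Rejecting frames of the wrong length early keeps a truncated transfer from silently shifting every coordinate by one position.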

 

Step 7: UART Communication Pipeline
 

The extracted landmark values are streamed over UART.

Communication Parameters:

  1. Baud Rate: 115200
  2. Full Duplex Communication

The STM32 board continuously sends landmark coordinates to the host system.
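
On the host side, receiving this stream with PySerial could look roughly like the sketch below. The wire format is an assumption, not something confirmed by the project: one frame per line, ASCII, comma-separated, newline-terminated. The function names and the port name in the usage note are placeholders as well; the project's own uart.py may work differently.

```python
from typing import Iterator, List, Optional

BAUD_RATE = 115200     # from the communication parameters above
EXPECTED_VALUES = 63   # 21 landmarks x (x, y, z)


def parse_landmark_line(raw: bytes) -> Optional[List[float]]:
    """Parse one UART line into 63 floats, or return None if it is malformed.

    Assumed format: ASCII, comma-separated values, one frame per line.
    """
    try:
        values = [float(v) for v in raw.decode("ascii").strip().split(",")]
    except (UnicodeDecodeError, ValueError):
        return None  # partial line, timeout, or noise on the link
    return values if len(values) == EXPECTED_VALUES else None


def read_frames(port: str) -> Iterator[List[float]]:
    """Yield valid landmark frames from the serial port until interrupted."""
    import serial  # PySerial; imported here so the parser is usable without it

    with serial.Serial(port, BAUD_RATE, timeout=1) as link:
        while True:
            frame = parse_landmark_line(link.readline())
            if frame is not None:
                yield frame
```

A loop such as `for frame in read_frames("/dev/ttyACM0"):` would then hand each frame to the classifier. Dropping malformed lines instead of raising keeps the loop alive when the board resets or the host starts reading mid-frame.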

Python files attached below (stt.py, uart.py)

 

Step 8: Building the Gesture Recognition Engine


The gesture recognition pipeline was developed using Python and Scikit-Learn.

Training Dataset Sources:

  1. ASL Kaggle Dataset
  2. Custom Collected Dataset

Training Process:

  1. Remove noisy Z-coordinate
  2. Use X and Y coordinates only
  3. Normalize landmarks
  4. Train Random Forest Model
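
The training steps above can be sketched as follows. The write-up does not say how landmarks are normalized, so this example assumes one common choice: translate the hand so landmark 0 sits at the origin, then scale by the largest distance from it. The interleaved x, y, z input ordering and the names `to_features` and `train_model` are also assumptions; train_model.py in the repository is the authoritative version.

```python
import math
from typing import List, Sequence

from sklearn.ensemble import RandomForestClassifier


def to_features(flat63: Sequence[float]) -> List[float]:
    """Turn one 63-value frame into 42 normalized (x, y) features.

    Assumptions: input is interleaved x, y, z per landmark; normalization
    is translation to landmark 0 plus scaling by the largest distance.
    """
    xs = [flat63[i] for i in range(0, 63, 3)]      # keep X
    ys = [flat63[i + 1] for i in range(0, 63, 3)]  # keep Y, drop noisy Z
    x0, y0 = xs[0], ys[0]
    xs = [x - x0 for x in xs]
    ys = [y - y0 for y in ys]
    # Guard against a degenerate frame where every point coincides.
    scale = max(math.hypot(x, y) for x, y in zip(xs, ys)) or 1.0
    features: List[float] = []
    for x, y in zip(xs, ys):
        features.extend((x / scale, y / scale))
    return features


def train_model(frames: Sequence[Sequence[float]],
                labels: Sequence[str]) -> RandomForestClassifier:
    """Fit a Random Forest on normalized landmark features."""
    X = [to_features(frame) for frame in frames]
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X, labels)
    return model
```

Normalizing this way makes the features independent of where the hand appears in the frame and how far it is from the camera, so the classifier only has to learn hand shape.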

Final Model:

  1. 200 Trees
  2. Confidence Filtering
  3. Two-Frame Confirmation Logic
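
Confidence filtering and two-frame confirmation could be combined along the lines of this sketch. The 0.6 threshold and the `GestureConfirmer` class are hypothetical, chosen only for illustration; the confidence fed in could be, for example, the highest class probability returned by the classifier's `predict_proba`.

```python
from typing import Optional


class GestureConfirmer:
    """Emit a label only when it is confident and repeated on consecutive frames."""

    def __init__(self, min_confidence: float = 0.6, frames_required: int = 2):
        self.min_confidence = min_confidence    # hypothetical threshold
        self.frames_required = frames_required  # two-frame confirmation
        self._candidate: Optional[str] = None
        self._streak = 0

    def update(self, label: str, confidence: float) -> Optional[str]:
        """Feed one per-frame prediction; return the label once it is confirmed."""
        if confidence < self.min_confidence:
            # A low-confidence frame breaks any streak in progress.
            self._candidate, self._streak = None, 0
            return None
        if label == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = label, 1
        # Fire exactly once, on the frame that completes the streak.
        return label if self._streak == self.frames_required else None
```

Because the label fires only on the frame that completes the streak, holding a sign steady does not produce a run of repeated letters.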

 

Python files attached below (collect_all.py, collect_data.py, download_dataset.py, train_model.py)

 

 

Step 9: Sign Language to Speech

Once a gesture is recognized:

  1. Character is displayed on LCD
  2. Text is converted to speech

The speech synthesis system enables hearing individuals to understand the message instantly.
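
One way to wire these two outputs together is sketched below. The write-up does not name the TTS engine or describe how characters reach the LCD, so both are passed in as plain callables; `LetterSpeaker`, `show`, and `speak` are hypothetical names used only for this illustration.

```python
from typing import Callable, List


class LetterSpeaker:
    """Buffer recognized letters, echo each one, and speak the buffered text."""

    def __init__(self, show: Callable[[str], None], speak: Callable[[str], None]):
        self._show = show    # e.g. a function that forwards the character to the LCD
        self._speak = speak  # e.g. a wrapper around an offline TTS engine
        self._letters: List[str] = []

    def add_letter(self, letter: str) -> None:
        """Record a confirmed letter and display it immediately."""
        self._letters.append(letter)
        self._show(letter)

    def speak_buffer(self) -> str:
        """Speak everything buffered so far, then clear the buffer."""
        text = "".join(self._letters)
        self._letters.clear()
        if text:
            self._speak(text)
        return text


if __name__ == "__main__":
    speaker = LetterSpeaker(show=print, speak=lambda t: print("speak:", t))
    for ch in "HI":
        speaker.add_letter(ch)
    speaker.speak_buffer()
```

Keeping display and speech behind callables means the same logic can be exercised on a laptop with `print` before the real LCD and audio paths are connected.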

 

Step 10: Reverse Communication Mode

For reverse communication:

  1. Microphone captures speech
  2. Vosk-based offline STT processes audio
  3. Text is displayed for deaf users

This creates a complete two-way communication system.
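
A minimal offline transcription sketch follows, assuming the Vosk engine named in the software list is the one used for this step. To stay self-contained it reads a 16-bit mono PCM WAV file instead of a live microphone; the file path, model directory, and function names are placeholders, and the project's stt.py may be structured differently.

```python
import json
import wave


def extract_text(result_json: str) -> str:
    """Pull the recognized text out of a Vosk JSON result string."""
    return json.loads(result_json).get("text", "").strip()


def transcribe_wav(wav_path: str, model_dir: str) -> str:
    """Transcribe a 16-bit mono PCM WAV file fully offline with Vosk."""
    from vosk import KaldiRecognizer, Model  # optional dependency, imported lazily

    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        pieces = []
        while True:
            chunk = wav.readframes(4000)
            if not chunk:
                break
            # AcceptWaveform returns True when an utterance has been finalized.
            if recognizer.AcceptWaveform(chunk):
                pieces.append(extract_text(recognizer.Result()))
        pieces.append(extract_text(recognizer.FinalResult()))
    return " ".join(p for p in pieces if p)
```

For live use, the same recognizer loop can be fed chunks from a microphone stream instead of `readframes`, and each finalized utterance shown to the deaf user as soon as it arrives.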

 

Step 11: AI Interaction Mode

We extended the system by integrating:

  1. Ollama
  2. Llama 3.2

Recognized sentences can be sent to a local language model to provide contextual responses.

The AI assistant runs locally without cloud dependency.
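
Sending a recognized sentence to the local model can be done through Ollama's HTTP API on its default port, as in this sketch. It assumes the model was pulled under the tag `llama3.2` and that the Ollama server is already running on the same machine; the function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "llama3.2"                             # assumed model tag


def build_request(sentence: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL_NAME, "prompt": sentence, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_llm(sentence: str, timeout: float = 60.0) -> str:
    """Send the sentence to the local model and return its reply text."""
    with urllib.request.urlopen(build_request(sentence), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Since the request only ever targets localhost, the recognized text never leaves the machine, which matches the offline and privacy goals of the project.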

 

 

Step 12: Testing and Results
 

Key Achievements:

  1. Real-time gesture recognition
  2. Bidirectional communication
  3. Offline operation
  4. Low-latency performance
  5. Privacy-preserving architecture

The project demonstrates how meaningful AI applications can run efficiently on embedded hardware.


 


Step 13: Impact on Society


Potential deployment areas:

  1. Schools
  2. Hospitals
  3. Government Offices
  4. Public Service Kiosks
  5. Smart Cities
  6. Workplace Accessibility

Sanchar Mitra demonstrates how technology can empower people who are often excluded from mainstream digital systems.

 



Step 14: Recognition and Learning Journey

 

Sanchar Mitra was awarded:

  1. Winner – ST Innovation Fair 2026

One of the most rewarding experiences during this project was presenting our work to industry leaders from STMicroelectronics, including Alessandro Cremonesi (CIO) and Giuseppe Desoli (Chief Architect of STM32N6).

Working with the STM32N6 platform showcased how powerful Edge AI can be when combined with a meaningful social purpose.
 

 

Step 15: Future Scope
 

Future improvements include:

  1. Fully standalone operation
  2. On-device STT
  3. On-device TTS
  4. More sign language vocabulary
  5. Battery-powered deployment
  6. Custom PCB
  7. Compact enclosure

 

Step 16: Conclusion

 

Sanchar Mitra is more than a technical project.

It is an attempt to make AI:

  1. Inclusive
  2. Accessible
  3. Efficient

Because the true power of AI lies not in how advanced it is, but in who it empowers.

A better world is one where everyone can communicate, participate, and be heard.



Thank you for reading.
 

Downloads

All code files mentioned above (download)

Institute / Organization

Jaypee Institute of Information Technology