AURA CHATBOT

Published Dec 02, 2025
 1 hour to build
 Intermediate

This project is the development of a fully integrated, standalone Voice Assistant built upon the ESP32-C6 microcontroller. It leverages two major cloud AI services—Gemini 2.0 Flash for intelligence and Murf AI for realistic speech generation—to create a seamless, conversational experience for the user.


Description

This voice assistant is designed to provide a seamless, hands-free conversational experience by turning spoken questions into spoken answers.

Here is what it can do:

  1. Listen and Record: It can capture your voice when you press a button, using its microphone to record your question as an audio file.
  2. Transcribe and Understand: It sends your audio file to the Gemini AI to automatically turn your speech into text (transcription) and understand what you are asking (intent).
  3. Answer Questions: It uses the intelligence of Gemini to generate a concise and helpful text response to your query.
  4. Speak Back: It takes the AI's text answer and sends it to the Murf AI service, which converts the text into a realistic, natural-sounding audio response.
  5. Deliver Audio: It plays the final, synthesized speech through its speaker, delivering the answer to you in a natural voice.
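The five steps above form a chain in which each stage only runs if the previous one succeeded. The following is a minimal sketch of that control flow in portable C++. The `Pipeline` struct, the `Stage` enum, and `runPipeline` are hypothetical names used for illustration; in the actual sketch the stages correspond to the recording, Gemini, Murf, and playback routines.

```cpp
#include <functional>
#include <string>

// Hypothetical stage callbacks standing in for the real hardware and
// network routines.
struct Pipeline {
  std::function<bool()> record;                        // step 1: capture audio
  std::function<std::string()> askModel;               // steps 2-3: answer text, empty on failure
  std::function<bool(const std::string&)> synthesize;  // step 4: text to speech
  std::function<bool()> play;                          // step 5: speaker output
};

enum class Stage { Record, Ask, Synthesize, Play, Done };

// Runs the stages in order and returns the first stage that failed,
// or Stage::Done when the whole chain completed.
Stage runPipeline(const Pipeline& p) {
  if (!p.record()) return Stage::Record;
  const std::string answer = p.askModel();
  if (answer.empty()) return Stage::Ask;
  if (!p.synthesize(answer)) return Stage::Synthesize;
  if (!p.play()) return Stage::Play;
  return Stage::Done;
}
```

Returning the failed stage rather than a plain bool makes it easier to print a useful message on the serial console when a question goes unanswered.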

WHY I CHOSE GEMINI AND MURF AI

GEMINI AI

I chose Gemini as the LLM for the following reasons:

1. Cost-Effectiveness and Accessibility

Free-of-Cost Tier: The availability of a free tier is crucial for individual users, students, and hobbyists, allowing full experimentation and development without initial financial burden.

Competitive Pricing: Even for professional use, the paid tiers for Gemini Pro and Flash are usually competitively priced when compared with other leading LLMs, providing an excellent balance of performance and cost.

2. Native Multimodal Processing (Faster Audio Handling)

Gemini accepts audio files as direct input, so no separate speech-to-text service is needed in front of the LLM. Removing that extra step reduces latency.

Because Gemini works on the audio itself rather than on a text-only transcript, it can also take intonation, emotion, and speaker changes (diarization) into account, which can lead to more contextual and accurate responses.

3. Superior Capability and Versatility

True Multimodality: While earlier LLMs were text-only and had vision/audio capabilities "bolted on," Gemini was designed from the ground up to be multimodal. It can process and reason across text, code, images, and audio, all in one prompt.

Advanced Reasoning: The Pro and Ultra versions of Gemini offer advanced reasoning and planning, which helps with complex, multi-step tasks in agentic workflows.

MURF AI 

Ultra-Realistic, Human-Like Voices

  • Natural Sounding: Murf utilizes advanced neural network algorithms to generate voices that are expressive, smooth, and closely resemble human speech, avoiding the robotic or monotone sound of older TTS systems.
  • Multilingual Output: Some advanced models can seamlessly switch between multiple languages within a single audio generation, ensuring accurate pronunciation and a natural flow when code-mixing is required.

 

DIGIKEY MY LIST: https://www.digikey.in/en/mylists/list/SGYEKYUNPG

COMPONENTS FOR AURA CHATBOT

  1. 1 x ESP32-C6 DEVKIT
  2. 1 x MAX98357A AMPLIFIER
  3. 1 x 8 GB SD CARD with its module
  4. 1 x PUSH BUTTON
  5. 1 x 1 W 8 OHM SPEAKER
  6. 1 x INMP441 MEMS MIC
  7. 2 x MALE-TO-MALE JUMPER WIRE
  8. 1 x BREADBOARD

CODE OF AURA CHATBOT

WIRING OF AURA CHATBOT:
    INMP441 (Microphone Input):
    VDD  -> 3.3V
    GND  -> GND
    SD   -> GPIO 2
    WS   -> GPIO 3
    SCK  -> GPIO 4
    L/R  -> GND
    
    MAX98357A (Speaker Output):
    VIN  -> 3.3V or 5V
    GND  -> GND
    DIN  -> GPIO 22
    BCLK -> GPIO 21
    LRC  -> GPIO 20
    
    SD Card (SPI):
    VCC  -> 3.3V
    GND  -> GND
    MISO -> GPIO 5
    MOSI -> GPIO 6
    SCK  -> GPIO 7
    CS   -> GPIO 10
    
    Button:
    One side -> GPIO 23
    Other side -> GND
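
With eleven signal wires on one small board, it is easy to assign the same GPIO to two peripherals by accident. The following is a minimal sketch of a compile-time check for that, assuming the pin numbers from the wiring list above. The names `kPins` and `allUnique` are hypothetical and not part of the original sketch. It needs C++14 or newer.

```cpp
#include <cstddef>

// GPIO assignments copied from the wiring list.
constexpr int kPins[] = {
    2, 3, 4,      // INMP441: SD, WS, SCK
    22, 21, 20,   // MAX98357A: DIN, BCLK, LRC
    5, 6, 7, 10,  // SD card: MISO, MOSI, SCK, CS
    23            // push button
};

// Returns true when no GPIO number appears twice in the table.
template <std::size_t N>
constexpr bool allUnique(const int (&pins)[N]) {
  for (std::size_t i = 0; i < N; ++i) {
    for (std::size_t j = i + 1; j < N; ++j) {
      if (pins[i] == pins[j]) return false;
    }
  }
  return true;
}

// Compilation stops here if two peripherals share a pin.
static_assert(allUnique(kPins), "two peripherals share a GPIO");
```

The check only catches duplicates inside the table. It does not know which GPIOs a particular ESP32-C6 board reserves for flash, USB, or boot strapping.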

/*
  ESP32-C6 Voice Assistant
  - Record audio with INMP441 (press & hold button)
  - Send to Gemini 2.0 Flash for AI response
  - Convert response to speech with Murf AI
  - Play audio through MAX98357A speaker

  Wiring:
  INMP441 (Microphone Input):
    VDD  -> 3.3V
    GND  -> GND
    SD   -> GPIO 2
    WS   -> GPIO 3
    SCK  -> GPIO 4
    L/R  -> GND

  MAX98357A (Speaker Output):
    VIN  -> 3.3V or 5V
    GND  -> GND
    DIN  -> GPIO 22
    BCLK -> GPIO 21
    LRC  -> GPIO 20

  SD Card (SPI):
    VCC  -> 3.3V
    GND  -> GND
    MISO -> GPIO 5
    MOSI -> GPIO 6
    SCK  -> GPIO 7
    CS   -> GPIO 10

  Button:
    One side -> GPIO 23 (matches BUTTON_PIN below)
    Other side -> GND
*/

 

#include <Arduino.h>
#include <WiFi.h>
#include <HTTPClient.h>
#include <WiFiClientSecure.h>
#include <driver/i2s.h>
#include <SD.h>
#include <SPI.h>
#include <ArduinoJson.h>

// WiFi credentials
const char* ssid = "*******************";   // CHANGE IT TO YOUR SSID/WIFI NAME
const char* password = "*************";    // CHANGE IT TO YOUR WIFI PASSWORD

// API Keys
const char* geminiApiKey = "*************";  // CHANGE IT TO YOUR GEMINI API KEY
const char* murfApiKey = "***********";  // CHANGE IT TO YOUR MURF AI API KEY FOR TTS

// Pin definitions - INMP441 (Input)
#define I2S_MIC_WS 3
#define I2S_MIC_SD 2
#define I2S_MIC_SCK 4

// Pin definitions - MAX98357A (Output)
#define I2S_SPK_DIN 22
#define I2S_SPK_BCLK 21
#define I2S_SPK_LRC 20

// SD Card pins
#define SD_CS 10
#define SD_MISO 5
#define SD_MOSI 6
#define SD_SCK 7

// Button
#define BUTTON_PIN 23

// Audio configuration
#define SAMPLE_RATE 16000       // Recording sample rate (Hz)
#define SAMPLE_RATE_PLAY 24000  // Playback sample rate requested from Murf (Hz)

// Files
const char* recordingFile = "/recording.wav";
const char* responseFile = "/response.wav";

// State
bool sdCardReady = false;
bool isRecording = false;

// WAV header (16-bit mono PCM); size fields are filled in by writeWavHeader()
struct WavHeader {
  char riff[4] = {'R', 'I', 'F', 'F'};
  uint32_t fileSize;
  char wave[4] = {'W', 'A', 'V', 'E'};
  char fmt[4] = {'f', 'm', 't', ' '};
  uint32_t fmtSize = 16;
  uint16_t audioFormat = 1;
  uint16_t numChannels = 1;
  uint32_t sampleRate = SAMPLE_RATE;
  uint32_t byteRate;
  uint16_t blockAlign;
  uint16_t bitsPerSample = 16;
  char data[4] = {'d', 'a', 't', 'a'};
  uint32_t dataSize;
};

 

// ==================== I2S SETUP ====================

// The microphone and the speaker share I2S_NUM_0, so the driver is
// uninstalled and reinstalled each time the direction changes.
void setupI2S_Mic() {
  i2s_driver_uninstall(I2S_NUM_0);

  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 4,
    .dma_buf_len = 1024,
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
  };

  i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_MIC_SCK,
    .ws_io_num = I2S_MIC_WS,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_MIC_SD
  };

  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pin_config);
  i2s_zero_dma_buffer(I2S_NUM_0);
  delay(100);

  Serial.println("I2S Microphone ready");
}

void setupI2S_Speaker() {
  i2s_driver_uninstall(I2S_NUM_0);

  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate = SAMPLE_RATE_PLAY,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 1024,
    .use_apll = false,
    .tx_desc_auto_clear = true,
    .fixed_mclk = 0
  };

  i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_SPK_BCLK,
    .ws_io_num = I2S_SPK_LRC,
    .data_out_num = I2S_SPK_DIN,
    .data_in_num = I2S_PIN_NO_CHANGE
  };

  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pin_config);
  i2s_zero_dma_buffer(I2S_NUM_0);

  Serial.println("I2S Speaker ready");
}

 

// ==================== SD CARD ====================

void setupSD() {
  Serial.println("Initializing SD card...");

  SPI.begin(SD_SCK, SD_MISO, SD_MOSI, SD_CS);

  // Try up to three times at 4 MHz before giving up
  for (int i = 0; i < 3; i++) {
    if (SD.begin(SD_CS, SPI, 4000000)) {
      sdCardReady = true;
      break;
    }
    delay(500);
  }

  if (!sdCardReady) {
    Serial.println("ERROR: SD Card failed!");
    return;
  }

  Serial.printf("SD Card Size: %lluMB\n", SD.cardSize() / (1024 * 1024));
}

 

// ==================== WIFI ====================

void setupWiFi() {
  WiFi.begin(ssid, password);
  Serial.print("Connecting to WiFi");

  // Poll every 500 ms, for at most 30 attempts (about 15 seconds)
  int attempts = 0;
  while (WiFi.status() != WL_CONNECTED && attempts < 30) {
    delay(500);
    Serial.print(".");
    attempts++;
  }

  if (WiFi.status() == WL_CONNECTED) {
    Serial.println("\nWiFi connected!");
    Serial.println(WiFi.localIP());
  } else {
    Serial.println("\nWiFi failed!");
  }
}

 

// ==================== RECORDING ====================

void writeWavHeader(File &file, uint32_t dataSize) {
  WavHeader header;
  header.byteRate = SAMPLE_RATE * 1 * 2;  // sample rate * channels * bytes per sample
  header.blockAlign = 1 * 2;              // channels * bytes per sample
  header.dataSize = dataSize;
  header.fileSize = dataSize + 36;        // RIFF size field = total file size - 8

  file.seek(0);
  file.write((uint8_t*)&header, sizeof(header));
}

void recordAudio() {
  Serial.println("\n🎤 RECORDING - Release button to stop...");

  if (!sdCardReady) {
    Serial.println("SD Card not ready!");
    return;
  }

  setupI2S_Mic();

  if (SD.exists(recordingFile)) {
    SD.remove(recordingFile);
    delay(50);
  }

  File audioFile = SD.open(recordingFile, FILE_WRITE, true);
  if (!audioFile) {
    Serial.println("Failed to create file!");
    return;
  }

  // Write placeholder header; the size fields are rewritten after recording
  WavHeader header;
  audioFile.write((uint8_t*)&header, sizeof(header));

  int16_t buffer[512];
  size_t bytesRead;
  uint32_t totalDataSize = 0;
  unsigned long lastPrint = 0;
  const float bytesPerSecond = SAMPLE_RATE * 2.0f;  // 16-bit mono

  isRecording = true;

  // Record while button is pressed
  while (digitalRead(BUTTON_PIN) == LOW) {
    esp_err_t result = i2s_read(I2S_NUM_0, buffer, sizeof(buffer), &bytesRead, pdMS_TO_TICKS(100));

    if (result == ESP_OK && bytesRead > 0) {
      // Amplify by 4x, clamped to the 16-bit range
      for (size_t i = 0; i < bytesRead / 2; i++) {
        buffer[i] = constrain(buffer[i] * 4, -32768, 32767);
      }

      audioFile.write((uint8_t*)buffer, bytesRead);
      totalDataSize += bytesRead;

      // Print progress every second
      if (millis() - lastPrint > 1000) {
        Serial.printf("Recording: %.1f sec\n", totalDataSize / bytesPerSecond);
        lastPrint = millis();
      }
    }
  }

  isRecording = false;

  // Update header with the final sizes
  writeWavHeader(audioFile, totalDataSize);
  audioFile.close();

  Serial.printf("✅ Recording complete! %.1f seconds\n", totalDataSize / bytesPerSecond);
}

 

// ==================== GEMINI API ====================

String uploadToGemini() {
  Serial.println("📤 Uploading to Gemini...");

  File audioFile = SD.open(recordingFile, FILE_READ);
  if (!audioFile) {
    Serial.println("Failed to open file");
    return "";
  }

  size_t fileSize = audioFile.size();

  WiFiClientSecure client;
  client.setInsecure();  // Skips TLS certificate verification

  if (!client.connect("generativelanguage.googleapis.com", 443)) {
    Serial.println("Connection failed");
    audioFile.close();
    return "";
  }

  String uploadUrl = "/upload/v1beta/files?key=" + String(geminiApiKey);

  client.println("POST " + uploadUrl + " HTTP/1.1");
  client.println("Host: generativelanguage.googleapis.com");
  client.println("Content-Type: audio/wav");
  client.println("X-Goog-Upload-Protocol: raw");
  client.println("Content-Length: " + String(fileSize));
  client.println("Connection: close");
  client.println();

  // Stream the WAV file from the SD card in 512-byte blocks
  uint8_t buffer[512];
  while (audioFile.available()) {
    size_t bytesRead = audioFile.read(buffer, sizeof(buffer));
    client.write(buffer, bytesRead);
  }
  audioFile.close();

  // Read response (30 s limit, written to survive millis() rollover)
  String response = "";
  unsigned long start = millis();

  while (client.connected() && millis() - start < 30000UL) {
    if (client.available()) {
      response += (char)client.read();
    }
  }
  client.stop();

  // Parse file URI: skip the HTTP headers, then start at the first '{' so
  // that a chunked-transfer size line in front of the JSON body is ignored
  int headerEnd = response.indexOf("\r\n\r\n");
  if (headerEnd == -1) return "";

  int jsonStart = response.indexOf('{', headerEnd);
  if (jsonStart == -1) return "";

  String json = response.substring(jsonStart);

  DynamicJsonDocument doc(2048);
  if (deserializeJson(doc, json)) return "";

  // The | "" fallback keeps the result empty when the field is missing
  String fileUri = doc["file"]["uri"] | "";
  Serial.println("File URI: " + fileUri);

  return fileUri;
}

 

String askGemini(String fileUri) {
  Serial.println("🤖 Asking Gemini...");

  HTTPClient http;
  String url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=" + String(geminiApiKey);

  http.begin(url);
  http.addHeader("Content-Type", "application/json");
  http.setTimeout(60000);

  // Build request: one content entry with two parts (audio file + instruction)
  DynamicJsonDocument doc(1024);
  JsonArray contents = doc.createNestedArray("contents");
  JsonObject content = contents.createNestedObject();
  JsonArray parts = content.createNestedArray("parts");

  JsonObject filePart = parts.createNestedObject();
  JsonObject fileData = filePart.createNestedObject("file_data");
  fileData["mime_type"] = "audio/wav";
  fileData["file_uri"] = fileUri;

  JsonObject textPart = parts.createNestedObject();
  textPart["text"] = "concise answer (under 100 words)";

  String payload;
  serializeJson(doc, payload);

  int httpCode = http.POST(payload);
  String answer = "";

  if (httpCode == HTTP_CODE_OK) {
    String response = http.getString();

    DynamicJsonDocument responseDoc(8192);
    if (!deserializeJson(responseDoc, response)) {
      // The | "" fallback keeps the answer empty when the field is missing
      answer = responseDoc["candidates"][0]["content"]["parts"][0]["text"] | "";
      Serial.println("\n💬 Gemini says:");
      Serial.println(answer);
    }
  } else {
    Serial.printf("HTTP Error: %d\n", httpCode);
    Serial.println(http.getString());
  }

  http.end();
  return answer;
}

 

// ==================== MURF TTS ====================

// Defined further down; declared here so the sketch also builds where
// function prototypes are not generated automatically
bool downloadAudioFile(String url);

bool getMurfAudio(String text) {
  Serial.println("🔊 Getting speech from Murf...");

  HTTPClient http;
  http.begin("https://api.murf.ai/v1/speech/generate");
  http.addHeader("Content-Type", "application/json");
  http.addHeader("api-key", murfApiKey);
  http.setTimeout(60000);

  // Build request
  DynamicJsonDocument doc(4096);
  doc["text"] = text;
  doc["voiceId"] = "en-US-natalie";  // You can change this voice
  doc["format"] = "WAV";
  doc["sampleRate"] = SAMPLE_RATE_PLAY;
  doc["channelType"] = "MONO";

  String payload;
  serializeJson(doc, payload);

  int httpCode = http.POST(payload);

  if (httpCode == HTTP_CODE_OK) {
    String response = http.getString();

    DynamicJsonDocument responseDoc(2048);
    if (deserializeJson(responseDoc, response)) {
      Serial.println("Failed to parse Murf response");
      http.end();
      return false;
    }

    // The | "" fallback keeps the URL empty when the field is missing
    String audioUrl = responseDoc["audioFile"] | "";
    Serial.println("Audio URL: " + audioUrl);

    http.end();

    if (audioUrl.isEmpty()) {
      Serial.println("Murf response contained no audio URL");
      return false;
    }

    // Download the audio file
    return downloadAudioFile(audioUrl);
  } else {
    Serial.printf("Murf Error: %d\n", httpCode);
    Serial.println(http.getString());
    http.end();
    return false;
  }
}

bool downloadAudioFile(String url) {
  Serial.println("📥 Downloading audio...");

  HTTPClient http;
  http.begin(url);
  http.setTimeout(60000);

  int httpCode = http.GET();

  if (httpCode == HTTP_CODE_OK) {
    // Delete old file
    if (SD.exists(responseFile)) {
      SD.remove(responseFile);
    }

    File file = SD.open(responseFile, FILE_WRITE);
    if (!file) {
      Serial.println("Failed to create response file");
      http.end();
      return false;
    }

    WiFiClient* stream = http.getStreamPtr();
    uint8_t buffer[512];
    size_t totalBytes = 0;
    const int expected = http.getSize();  // -1 when the server sends no Content-Length

    // Copy the body to the SD card until the expected size is reached
    // or, when the size is unknown, until the server closes the connection
    while (http.connected() && (expected < 0 || totalBytes < (size_t)expected)) {
      size_t available = stream->available();
      if (available) {
        size_t bytesRead = stream->readBytes(buffer, min(available, sizeof(buffer)));
        file.write(buffer, bytesRead);
        totalBytes += bytesRead;
      }
      delay(1);
    }

    file.close();
    Serial.printf("Downloaded: %u bytes\n", (unsigned)totalBytes);
    http.end();
    return true;
  } else {
    Serial.printf("Download Error: %d\n", httpCode);
    http.end();
    return false;
  }
}

 

// ==================== PLAY AUDIO ====================

void playAudio() {
  Serial.println("🔈 Playing audio...");

  setupI2S_Speaker();

  File audioFile = SD.open(responseFile, FILE_READ);
  if (!audioFile) {
    Serial.println("Failed to open audio file");
    return;
  }

  // Skip WAV header (assumes the canonical 44-byte layout)
  audioFile.seek(44);

  int16_t buffer[512];
  size_t bytesRead;
  size_t bytesWritten;

  while (audioFile.available()) {
    bytesRead = audioFile.read((uint8_t*)buffer, sizeof(buffer));

    // Boost volume by 2x, clamped to the 16-bit range
    for (size_t i = 0; i < bytesRead / 2; i++) {
      buffer[i] = constrain(buffer[i] * 2, -32768, 32767);
    }

    i2s_write(I2S_NUM_0, buffer, bytesRead, &bytesWritten, portMAX_DELAY);
  }

  // Wait for audio to finish
  delay(500);
  i2s_zero_dma_buffer(I2S_NUM_0);

  audioFile.close();
  Serial.println("✅ Playback complete!");
}

 

// ==================== MAIN ====================

void setup() {
  Serial.begin(115200);
  delay(1000);

  Serial.println("\n================================");
  Serial.println("ESP32-C6 Voice Assistant");
  Serial.println("Gemini AI + Murf TTS");
  Serial.println("================================\n");

  pinMode(BUTTON_PIN, INPUT_PULLUP);  // Pin reads LOW while the button is pressed

  setupSD();
  setupWiFi();
  setupI2S_Mic();

  Serial.println("\n✅ System ready!");
  Serial.println("🎤 Press and HOLD button to record");
  Serial.println("   Release to send to AI\n");
}

void loop() {
  // Check if button is pressed
  if (digitalRead(BUTTON_PIN) == LOW) {
    delay(50);  // Debounce

    if (digitalRead(BUTTON_PIN) == LOW) {
      // Step 1: Record while button held
      recordAudio();

      // Step 2: Upload to Gemini
      String fileUri = uploadToGemini();

      if (!fileUri.isEmpty()) {
        delay(2000);  // Wait for file processing

        // Step 3: Get AI response
        String answer = askGemini(fileUri);

        if (answer.length() > 0) {
          // Step 4: Convert to speech
          if (getMurfAudio(answer)) {
            // Step 5: Play audio
            playAudio();
          }
        }
      }

      // Ready for next question
      setupI2S_Mic();
      Serial.println("\n🎤 Press and HOLD button to record\n");
    }
  }

  delay(10);
}
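
The playback routine in the sketch skips a fixed 44 bytes before sending samples to the speaker. That matches a WAV file whose `fmt ` chunk is immediately followed by the `data` chunk, but a file may carry extra chunks (for example `LIST`) in between, in which case the extra bytes would be played as noise. The following is a minimal sketch of locating the `data` chunk instead, in portable C++. The function name `findWavDataOffset` is hypothetical, and whether Murf's WAV output actually contains extra chunks has not been verified here.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Reads a 32-bit little-endian value from a byte buffer.
static uint32_t readLE32(const uint8_t* p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns the byte offset of the first audio sample in a RIFF/WAVE image,
// or 0 if the buffer is not a WAV file or no "data" chunk header is found.
size_t findWavDataOffset(const uint8_t* buf, size_t len) {
  if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0) {
    return 0;
  }
  size_t pos = 12;  // the first chunk follows the 12-byte RIFF header
  while (pos + 8 <= len) {
    const uint32_t chunkSize = readLE32(buf + pos + 4);
    if (memcmp(buf + pos, "data", 4) == 0) {
      return pos + 8;  // samples start right after the chunk header
    }
    // Stop if the chunk claims to extend past the end of the buffer.
    if (chunkSize > len - pos - 8) {
      return 0;
    }
    // Chunks are word-aligned: an odd-sized chunk is followed by one pad byte.
    pos += 8 + (size_t)chunkSize + (chunkSize & 1u);
  }
  return 0;
}
```

In the sketch, one possible use is to read the first few hundred bytes of the response file into a buffer, call this function, and seek to the returned offset, falling back to 44 when it returns 0.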

Install the ESP32 Board Package (Board Manager)

  1. Open Arduino IDE → File → Preferences.
  2. In Additional Boards Manager URLs, add:
     https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json
  3. Click OK.
  4. Go to Tools → Board → Boards Manager….
  5. Search “ESP32” and install esp32 by Espressif Systems (latest).
  6. Install the ArduinoJson library from the Library Manager; the sketch includes ArduinoJson.h, which is not part of the ESP32 board package.
  7. Open the provided AURA_Chatbot.ino sketch in Arduino IDE.

 

 

 

  1. CHANGE THE SSID AND PASSWORD IN THE SKETCH TO MATCH YOUR WIFI SETTINGS
  2. REPLACE THE GEMINI API KEY AND THE MURF AI API KEY PLACEHOLDERS WITH YOUR OWN KEYS
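
The published sketch ships with rows of asterisks where the WiFi credentials and API keys go, so a forgotten placeholder only shows up later as a failed connection or an HTTP error. The following is a minimal sketch of a guard that detects an unchanged placeholder. The helper `isPlaceholder` is a hypothetical addition, not part of the original sketch.

```cpp
#include <string.h>

// Returns true when a credential is missing, empty, or still consists only
// of '*' characters, i.e. the placeholder was never replaced.
bool isPlaceholder(const char* value) {
  if (value == nullptr || value[0] == '\0') {
    return true;
  }
  // strspn counts the leading run of '*'; if it spans the whole string,
  // nothing but asterisks is present.
  return strspn(value, "*") == strlen(value);
}
```

One way to use it would be to call it in `setup()` on `ssid`, `password`, `geminiApiKey`, and `murfApiKey`, and print a warning on the serial console before attempting to connect.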
GETTING GEMINI API KEY FOR LLM (LARGE LANGUAGE MODEL)

  1. OPEN GOOGLE AI STUDIO BY THIS LINK   https://ai.google.dev/gemini-api/docs/available-regions
  2. CLICK ON SIGN IN, WHICH IS IN THE UPPER RIGHT CORNER
  3. SIGN IN WITH YOUR EMAIL ID
  4. AFTER SIGNING IN, CLICK ON GET API KEY
  5. THIS TAKES YOU TO THE API KEY SECTION
  6. CLICK ON CREATE API KEY
  7. NAME YOUR KEY, e.g. AURA CHATBOT
  8. SELECT DEFAULT GEMINI PROJECT
  9. CLICK ON CREATE KEY
  10. ONCE THE API KEY IS CREATED, COPY IT AND PUT IT INTO THE AURA CHATBOT SKETCH

GETTING MURF AI API KEY FOR TTS

  1. OPEN MURF AI BY THIS LINK https://murf.ai/
  2. CLICK ON SIGN UP, WHICH IS IN THE UPPER RIGHT CORNER
  3. SIGN UP WITH YOUR EMAIL ID
  4. AFTER SIGNING IN, OPEN THIS LINK    https://murf.ai/
  5. CLICK ON GET API KEY
  6. THIS TAKES YOU TO THE MURF API SECTION
  7. CLICK ON API KEYS
  8. CLICK ON THE PLUS ICON, NAME YOUR API KEY, AND GENERATE THE API KEY
  9. COPY IT AND PASTE IT INTO THE AURA CHATBOT SKETCH
  10. AFTER PASTING IT INTO THE SKETCH, CLICK ON UPLOAD
  11. ONCE THE CODE IS UPLOADED, YOUR AURA CHATBOT IS READY TO ASSIST YOU

THE WORKING AURA CHATBOT  VIDEO 

 

VIDEO :  https://youtu.be/SSEEPJG_pKs?si=QnXWfaouDybGzS7c

 
