AURA CHATBOT

Published Dec 02, 2025

GPL-3.0-only

1 hour to build

Intermediate

Arduino IDE

Google AI Studio

Murf AI

This project is the development of a fully integrated, standalone Voice Assistant built upon the ESP32-C6 microcontroller. It leverages two major cloud AI services—Gemini 2.0 Flash for intelligence and Murf AI for realistic speech generation—to create a seamless, conversational experience for the user.

Description

This voice assistant is designed to provide a seamless, hands-free conversational experience by turning spoken questions into spoken answers.

Here is what it can do:

Listen and Record: It can capture your voice when you press a button, using its microphone to record your question as an audio file.
Transcribe and Understand: It sends your audio file to the Gemini AI to automatically turn your speech into text (transcription) and understand what you are asking (intent).
Answer Questions: It uses the intelligence of Gemini to generate a concise and helpful text response to your query.
Speak Back: It takes the AI's text answer and sends it to the Murf AI service, which converts the text into a realistic, natural-sounding audio response.
Deliver Audio: It plays the final, synthesized speech through its speaker, delivering the answer to you in a natural voice.
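The five steps above form a chain in which each stage runs only if the previous one succeeded. The following is a minimal host-side C++ sketch of that control flow, not the project's firmware: `Stages`, `Outcome` and `runTurn` are hypothetical names, and the hooks stand in for the recording, Gemini, Murf and playback routines.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stage hooks; on the device these would wrap the recording,
// Gemini, Murf and playback routines.
struct Stages {
    std::function<bool()> record;                        // false = nothing recorded
    std::function<std::string()> transcribeAndAnswer;    // empty string = no answer
    std::function<bool(const std::string&)> synthesize;  // false = TTS failed
    std::function<void()> play;
};

enum class Outcome { Spoken, RecordFailed, NoAnswer, SpeechFailed };

// Runs one question/answer turn and stops at the first stage that fails.
inline Outcome runTurn(const Stages& s) {
    assert(s.record && s.transcribeAndAnswer && s.synthesize && s.play);
    if (!s.record()) return Outcome::RecordFailed;
    const std::string answer = s.transcribeAndAnswer();
    if (answer.empty()) return Outcome::NoAnswer;
    if (!s.synthesize(answer)) return Outcome::SpeechFailed;
    s.play();
    return Outcome::Spoken;
}
```

Returning a distinct outcome per stage makes it easy to print which step failed on the serial monitor.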

WHY I CHOSE GEMINI AND MURF AI

GEMINI AI

I chose Gemini as my LLM for the following reasons:

1. Cost-Effectiveness and Accessibility

Free-of-Cost Tier: The availability of a free tier is crucial for individual users, students, and hobbyists, allowing full experimentation and development without initial financial burden.

Competitive Pricing: Even for professional use, the paid tiers for Gemini Pro and Flash are usually competitively priced when compared with other leading LLMs, providing an excellent balance of performance and cost.

2. Native Multimodal Processing (Faster Audio)

Gemini accepts audio files natively as input, which removes the need for a separate speech-to-text stage. In other words, the model works on the audio directly instead of waiting for a transcript first, which can reduce latency.

Because Gemini works from the raw audio, it can pick up more than the words alone, such as intonation, emotion, and speaker identity (diarization). This can give more contextual and accurate responses than a text-only transcript would allow.

3. Superior Capability and Versatility

True Multimodality: While earlier LLMs were text-only and had vision/audio capabilities "bolted on," Gemini was designed from the ground up to be multimodal. It can process and reason across text, code, images, and audio, all in one prompt.

Advanced Reasoning: The Pro and Ultra versions of Gemini offer advanced reasoning and planning, which lets them handle more complex, multi-step tasks in agentic workflows than many competitors.

MURF AI

Ultra-Realistic, Human-Like Voices

Natural Sounding: Murf utilizes advanced neural network algorithms to generate voices that are expressive, smooth, and closely resemble human speech, avoiding the robotic or monotone sound of older TTS systems.
Multilingual Output: Some advanced models can seamlessly switch between multiple languages within a single audio generation, ensuring accurate pronunciation and a natural flow when code-mixing is required.

DIGIKEY MY LIST = https://www.digikey.in/en/mylists/list/SGYEKYUNPG

COMPONENTS FOR AURA CHATBOT

1 x ESP32-C6 DEVKIT
1 x MAX98357A AMPLIFIER
1 x 8 GB SD CARD with its module
1 x PUSH BUTTON
1 x 1 W 8 OHM SPEAKER
1 x INMP441 MEMS MIC
2 x MALE TO MALE JUMPER WIRE
1 x BREADBOARD

CODE OF AURA CHATBOT

WIRING OF AURA CHATBOT

INMP441 (Microphone Input):
VDD  -> 3.3V
GND  -> GND
SD   -> GPIO 2
WS   -> GPIO 3
SCK  -> GPIO 4
L/R  -> GND

MAX98357A (Speaker Output):
VIN  -> 3.3V or 5V
GND  -> GND
DIN  -> GPIO 22
BCLK -> GPIO 21
LRC  -> GPIO 20

SD Card (SPI):
VCC  -> 3.3V
GND  -> GND
MISO -> GPIO 5
MOSI -> GPIO 6
SCK  -> GPIO 7
CS   -> GPIO 10

Button:
One side -> GPIO 23
Other side -> GND
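
With eleven signals on one small board, it is easy to assign the same GPIO twice while rewiring. As a minimal sketch, the wiring table above can be written as data and checked at compile time. The GPIO numbers are copied from the table; `PinUse`, `kPins`, `allPinsUnique` and `requireUniquePins` are illustrative names, not part of the project sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One row of the wiring table: a signal name and the GPIO it uses.
struct PinUse {
    const char* name;
    int gpio;
};

// GPIO numbers copied from the wiring table.
constexpr std::array<PinUse, 11> kPins = {{
    {"MIC_SD", 2},   {"MIC_WS", 3},    {"MIC_SCK", 4},
    {"SPK_DIN", 22}, {"SPK_BCLK", 21}, {"SPK_LRC", 20},
    {"SD_MISO", 5},  {"SD_MOSI", 6},   {"SD_SCK", 7},  {"SD_CS", 10},
    {"BUTTON", 23},
}};

// Returns true when no GPIO number appears twice in the table.
template <std::size_t N>
constexpr bool allPinsUnique(const std::array<PinUse, N>& pins) {
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = i + 1; j < N; ++j) {
            if (pins[i].gpio == pins[j].gpio) return false;
        }
    }
    return true;
}

// Compile-time check: the build fails if two signals share a GPIO.
static_assert(allPinsUnique(kPins), "two signals share a GPIO");

// Runtime form of the same check, for tables assembled while the program runs.
template <std::size_t N>
void requireUniquePins(const std::array<PinUse, N>& pins) {
    assert(allPinsUnique(pins) && "two signals share a GPIO");
}
```

The check only catches duplicates. Whether a given GPIO is safe to use on a particular ESP32-C6 board still has to be confirmed against that board's pinout.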

ESP32-C6 Voice Assistant

- Record audio with INMP441 (press & hold button)

- Send to Gemini 2.0 Flash for AI response

- Convert response to speech with Murf AI

- Play audio through MAX98357A speaker
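
As a worked example of the recording format, here is a minimal host-side C++ sketch of the WAV size arithmetic. It assumes the 16 kHz, mono, 16-bit PCM settings that the sketch configures through SAMPLE_RATE and WavHeader; `WavSizes`, `wavSizes` and `durationSeconds` are illustrative names, not part of the project code.

```cpp
#include <cassert>
#include <cstdint>

// PCM parameters assumed from the sketch: 16 kHz, mono, 16-bit samples.
constexpr uint32_t kSampleRate = 16000;
constexpr uint16_t kChannels = 1;
constexpr uint16_t kBitsPerSample = 16;

struct WavSizes {
    uint16_t blockAlign;  // bytes per sample frame
    uint32_t byteRate;    // bytes per second of audio
    uint32_t riffSize;    // value stored in the RIFF chunk size field
};

// Derives the size fields of a canonical 44-byte PCM WAV header.
constexpr WavSizes wavSizes(uint32_t dataBytes) {
    const uint16_t blockAlign = kChannels * (kBitsPerSample / 8);
    const uint32_t byteRate = kSampleRate * blockAlign;
    // The RIFF size excludes the 8-byte "RIFF" + size prefix: 44 - 8 = 36.
    return WavSizes{blockAlign, byteRate, dataBytes + 36};
}

// Duration in seconds of dataBytes bytes of PCM audio.
inline double durationSeconds(uint32_t dataBytes) {
    const WavSizes sizes = wavSizes(dataBytes);
    assert(sizes.byteRate != 0);
    return static_cast<double>(dataBytes) / sizes.byteRate;
}
```

With these settings one second of audio is 32,000 bytes, which is where the 32000.0 divisor in the sketch's progress messages comes from. A two-second recording holds 64,000 data bytes, so the RIFF size field is 64,036 and the file on the SD card is 64,044 bytes.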

#include <Arduino.h>
#include <WiFi.h>
#include <HTTPClient.h>
#include <WiFiClientSecure.h>
#include <driver/i2s.h>
#include <SD.h>
#include <SPI.h>
#include <ArduinoJson.h>

// WiFi credentials
const char* ssid = "*******************";  // Replace with your WiFi name (SSID)
const char* password = "*************";    // Replace with your WiFi password

// API keys
const char* geminiApiKey = "*************";  // Replace with your Gemini API key
const char* murfApiKey = "***********";      // Replace with your Murf AI API key (TTS)

// Pin definitions - INMP441 (input)
#define I2S_MIC_WS 3
#define I2S_MIC_SD 2
#define I2S_MIC_SCK 4

// Pin definitions - MAX98357A (output)
#define I2S_SPK_DIN 22
#define I2S_SPK_BCLK 21
#define I2S_SPK_LRC 20

// SD card pins
#define SD_CS 10
#define SD_MISO 5
#define SD_MOSI 6
#define SD_SCK 7

// Button
#define BUTTON_PIN 23

// Audio configuration
#define SAMPLE_RATE 16000       // Recording sample rate (Hz)
#define SAMPLE_RATE_PLAY 24000  // Playback sample rate (Hz)

// Files
const char* recordingFile = "/recording.wav";
const char* responseFile = "/response.wav";

// State
bool sdCardReady = false;
bool isRecording = false;

// WAV header (canonical 44-byte PCM layout)
struct WavHeader {
  char riff[4] = {'R', 'I', 'F', 'F'};
  uint32_t fileSize;
  char wave[4] = {'W', 'A', 'V', 'E'};
  char fmt[4] = {'f', 'm', 't', ' '};
  uint32_t fmtSize = 16;
  uint16_t audioFormat = 1;
  uint16_t numChannels = 1;
  uint32_t sampleRate = SAMPLE_RATE;
  uint32_t byteRate;
  uint16_t blockAlign;
  uint16_t bitsPerSample = 16;
  char data[4] = {'d', 'a', 't', 'a'};
  uint32_t dataSize;
};

// ==================== I2S SETUP ====================
void setupI2S_Mic() {
  // The mic and the speaker share I2S_NUM_0, so release it before reconfiguring
  i2s_driver_uninstall(I2S_NUM_0);

  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 4,
    .dma_buf_len = 1024,
    .use_apll = false,
    .tx_desc_auto_clear = false,
    .fixed_mclk = 0
  };

  i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_MIC_SCK,
    .ws_io_num = I2S_MIC_WS,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_MIC_SD
  };

  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pin_config);
  i2s_zero_dma_buffer(I2S_NUM_0);
  delay(100);

  Serial.println("I2S Microphone ready");
}

void setupI2S_Speaker() {
  // The mic and the speaker share I2S_NUM_0, so release it before reconfiguring
  i2s_driver_uninstall(I2S_NUM_0);

  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate = SAMPLE_RATE_PLAY,
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 1024,
    .use_apll = false,
    .tx_desc_auto_clear = true,
    .fixed_mclk = 0
  };

  i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_SPK_BCLK,
    .ws_io_num = I2S_SPK_LRC,
    .data_out_num = I2S_SPK_DIN,
    .data_in_num = I2S_PIN_NO_CHANGE
  };

  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pin_config);
  i2s_zero_dma_buffer(I2S_NUM_0);

  Serial.println("I2S Speaker ready");
}

// ==================== SD CARD ====================
void setupSD() {
  Serial.println("Initializing SD card...");
  SPI.begin(SD_SCK, SD_MISO, SD_MOSI, SD_CS);

  // Try up to three times; some cards need a moment after power-up
  for (int i = 0; i < 3; i++) {
    if (SD.begin(SD_CS, SPI, 4000000)) {
      sdCardReady = true;
      break;
    }
    delay(500);
  }

  if (!sdCardReady) {
    Serial.println("ERROR: SD Card failed!");
    return;
  }

  Serial.printf("SD Card Size: %lluMB\n", SD.cardSize() / (1024 * 1024));
}

// ==================== WIFI ====================
void setupWiFi() {
  WiFi.begin(ssid, password);
  Serial.print("Connecting to WiFi");

  // Wait up to 15 seconds (30 x 500 ms) for the connection
  int attempts = 0;
  while (WiFi.status() != WL_CONNECTED && attempts < 30) {
    delay(500);
    Serial.print(".");
    attempts++;
  }

  if (WiFi.status() == WL_CONNECTED) {
    Serial.println("\nWiFi connected!");
    Serial.println(WiFi.localIP());
  } else {
    Serial.println("\nWiFi failed!");
  }
}

// ==================== RECORDING ====================
void writeWavHeader(File &file, uint32_t dataSize) {
  WavHeader header;
  header.byteRate = SAMPLE_RATE * 1 * 2;  // sample rate x channels x bytes per sample
  header.blockAlign = 1 * 2;              // channels x bytes per sample
  header.dataSize = dataSize;
  header.fileSize = dataSize + 36;

  file.seek(0);
  file.write((uint8_t*)&header, sizeof(header));
}

void recordAudio() {
  Serial.println("\n🎤 RECORDING - Release button to stop...");

  if (!sdCardReady) {
    Serial.println("SD Card not ready!");
    return;
  }

  setupI2S_Mic();

  if (SD.exists(recordingFile)) {
    SD.remove(recordingFile);
    delay(50);
  }

  File audioFile = SD.open(recordingFile, FILE_WRITE, true);
  if (!audioFile) {
    Serial.println("Failed to create file!");
    return;
  }

  // Write placeholder header; the size fields are filled in after recording
  WavHeader header;
  audioFile.write((uint8_t*)&header, sizeof(header));

  int16_t buffer[512];
  size_t bytesRead;
  uint32_t totalDataSize = 0;
  unsigned long lastPrint = 0;
  isRecording = true;

  // Record while button is pressed
  while (digitalRead(BUTTON_PIN) == LOW) {
    esp_err_t result = i2s_read(I2S_NUM_0, buffer, sizeof(buffer), &bytesRead, pdMS_TO_TICKS(100));
    if (result == ESP_OK && bytesRead > 0) {
      // Amplify (4x gain, clipped to the 16-bit range)
      for (size_t i = 0; i < bytesRead / 2; i++) {
        buffer[i] = constrain(buffer[i] * 4, -32768, 32767);
      }
      audioFile.write((uint8_t*)buffer, bytesRead);
      totalDataSize += bytesRead;

      // Print progress every second (32000 bytes = 1 second at 16 kHz, 16-bit mono)
      if (millis() - lastPrint > 1000) {
        Serial.printf("Recording: %.1f sec\n", totalDataSize / 32000.0);
        lastPrint = millis();
      }
    }
  }

  isRecording = false;

  // Update header with the final data size
  writeWavHeader(audioFile, totalDataSize);
  audioFile.close();

  Serial.printf("✅ Recording complete! %.1f seconds\n", totalDataSize / 32000.0);
}

// ==================== GEMINI API ====================
String uploadToGemini() {
  Serial.println("📤 Uploading to Gemini...");

  File audioFile = SD.open(recordingFile, FILE_READ);
  if (!audioFile) {
    Serial.println("Failed to open file");
    return "";
  }
  size_t fileSize = audioFile.size();

  WiFiClientSecure client;
  client.setInsecure();  // Skips TLS certificate verification

  if (!client.connect("generativelanguage.googleapis.com", 443)) {
    Serial.println("Connection failed");
    audioFile.close();
    return "";
  }

  String uploadUrl = "/upload/v1beta/files?key=" + String(geminiApiKey);
  client.println("POST " + uploadUrl + " HTTP/1.1");
  client.println("Host: generativelanguage.googleapis.com");
  client.println("Content-Type: audio/wav");
  client.println("X-Goog-Upload-Protocol: raw");
  client.println("Content-Length: " + String(fileSize));
  client.println("Connection: close");
  client.println();

  // Stream the file from the SD card in 512-byte chunks
  uint8_t buffer[512];
  while (audioFile.available()) {
    size_t bytesRead = audioFile.read(buffer, sizeof(buffer));
    client.write(buffer, bytesRead);
  }
  audioFile.close();

  // Read response; keep draining buffered bytes after the server closes
  String response = "";
  unsigned long start = millis();
  while ((client.connected() || client.available()) && millis() - start < 30000) {
    if (client.available()) {
      response += (char)client.read();
    }
  }
  client.stop();

  // Parse file URI from the JSON body that follows the HTTP headers
  int jsonStart = response.indexOf("\r\n\r\n");
  if (jsonStart == -1) return "";
  String json = response.substring(jsonStart + 4);

  DynamicJsonDocument doc(2048);
  if (deserializeJson(doc, json)) return "";

  // Fall back to an empty string when the field is missing
  String fileUri = doc["file"]["uri"] | "";
  Serial.println("File URI: " + fileUri);
  return fileUri;
}

String askGemini(String fileUri) {
  Serial.println("🤖 Asking Gemini...");

  HTTPClient http;
  String url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=" + String(geminiApiKey);
  http.begin(url);
  http.addHeader("Content-Type", "application/json");
  http.setTimeout(60000);

  // Build request: one audio part (the uploaded file) plus one text instruction
  DynamicJsonDocument doc(1024);
  JsonArray contents = doc.createNestedArray("contents");
  JsonObject content = contents.createNestedObject();
  JsonArray parts = content.createNestedArray("parts");

  JsonObject filePart = parts.createNestedObject();
  JsonObject fileData = filePart.createNestedObject("file_data");
  fileData["mime_type"] = "audio/wav";
  fileData["file_uri"] = fileUri;

  JsonObject textPart = parts.createNestedObject();
  textPart["text"] = "concise answer (under 100 words)";

  String payload;
  serializeJson(doc, payload);

  int httpCode = http.POST(payload);
  String answer = "";

  if (httpCode == HTTP_CODE_OK) {
    String response = http.getString();
    DynamicJsonDocument responseDoc(8192);
    if (!deserializeJson(responseDoc, response)) {
      // Fall back to an empty string when the field is missing
      answer = responseDoc["candidates"][0]["content"]["parts"][0]["text"] | "";
      Serial.println("\n💬 Gemini says:");
      Serial.println(answer);
    }
  } else {
    Serial.printf("HTTP Error: %d\n", httpCode);
    Serial.println(http.getString());
  }

  http.end();
  return answer;
}

// ==================== MURF TTS ====================
bool downloadAudioFile(String url);  // Defined below

bool getMurfAudio(String text) {
  Serial.println("🔊 Getting speech from Murf...");

  HTTPClient http;
  http.begin("https://api.murf.ai/v1/speech/generate");
  http.addHeader("Content-Type", "application/json");
  http.addHeader("api-key", murfApiKey);
  http.setTimeout(60000);

  // Build request
  DynamicJsonDocument doc(4096);
  doc["text"] = text;
  doc["voiceId"] = "en-US-natalie";  // You can change this voice
  doc["format"] = "WAV";
  doc["sampleRate"] = SAMPLE_RATE_PLAY;
  doc["channelType"] = "MONO";

  String payload;
  serializeJson(doc, payload);

  int httpCode = http.POST(payload);

  if (httpCode == HTTP_CODE_OK) {
    String response = http.getString();
    DynamicJsonDocument responseDoc(2048);
    if (deserializeJson(responseDoc, response)) {
      Serial.println("Failed to parse Murf response");
      http.end();
      return false;
    }

    // Fall back to an empty string when the field is missing
    String audioUrl = responseDoc["audioFile"] | "";
    Serial.println("Audio URL: " + audioUrl);
    http.end();
    if (audioUrl.isEmpty()) return false;

    // Download the audio file
    return downloadAudioFile(audioUrl);
  } else {
    Serial.printf("Murf Error: %d\n", httpCode);
    Serial.println(http.getString());
    http.end();
    return false;
  }
}

bool downloadAudioFile(String url) {
  Serial.println("📥 Downloading audio...");

  HTTPClient http;
  http.begin(url);
  http.setTimeout(60000);

  int httpCode = http.GET();

  if (httpCode == HTTP_CODE_OK) {
    // Delete old file
    if (SD.exists(responseFile)) {
      SD.remove(responseFile);
    }

    File file = SD.open(responseFile, FILE_WRITE);
    if (!file) {
      Serial.println("Failed to create response file");
      http.end();
      return false;
    }

    WiFiClient* stream = http.getStreamPtr();
    uint8_t buffer[512];
    size_t totalBytes = 0;
    int expected = http.getSize();  // -1 when the server sends no Content-Length

    // Copy until the expected size is reached or the connection closes
    while (http.connected() &&
           (stream->available() || expected < 0 || totalBytes < (size_t)expected)) {
      size_t available = stream->available();
      if (available) {
        size_t bytesRead = stream->readBytes(buffer, min(available, sizeof(buffer)));
        file.write(buffer, bytesRead);
        totalBytes += bytesRead;
      }
      delay(1);
    }

    file.close();
    Serial.printf("Downloaded: %u bytes\n", (unsigned)totalBytes);
    http.end();
    return true;
  } else {
    Serial.printf("Download Error: %d\n", httpCode);
    http.end();
    return false;
  }
}

// ==================== PLAY AUDIO ====================
void playAudio() {
  Serial.println("🔈 Playing audio...");

  setupI2S_Speaker();

  File audioFile = SD.open(responseFile, FILE_READ);
  if (!audioFile) {
    Serial.println("Failed to open audio file");
    return;
  }

  // Skip WAV header (44 bytes)
  audioFile.seek(44);

  int16_t buffer[512];
  size_t bytesRead;
  size_t bytesWritten;

  while (audioFile.available()) {
    bytesRead = audioFile.read((uint8_t*)buffer, sizeof(buffer));

    // Boost volume (2x gain, clipped to the 16-bit range)
    for (size_t i = 0; i < bytesRead / 2; i++) {
      buffer[i] = constrain(buffer[i] * 2, -32768, 32767);
    }

    i2s_write(I2S_NUM_0, buffer, bytesRead, &bytesWritten, portMAX_DELAY);
  }

  // Wait for audio to finish
  delay(500);
  i2s_zero_dma_buffer(I2S_NUM_0);

  audioFile.close();
  Serial.println("✅ Playback complete!");
}

// ==================== MAIN ====================
void setup() {
  Serial.begin(115200);
  delay(1000);

  Serial.println("\n================================");
  Serial.println("ESP32-C6 Voice Assistant");
  Serial.println("Gemini AI + Murf TTS");
  Serial.println("================================\n");

  pinMode(BUTTON_PIN, INPUT_PULLUP);  // Button pulls the pin to GND when pressed

  setupSD();
  setupWiFi();
  setupI2S_Mic();

  Serial.println("\n✅ System ready!");
  Serial.println("🎤 Press and HOLD button to record");
  Serial.println(" Release to send to AI\n");
}

void loop() {
  // Check if button is pressed
  if (digitalRead(BUTTON_PIN) == LOW) {
    delay(50);  // Debounce
    if (digitalRead(BUTTON_PIN) == LOW) {
      // Step 1: Record while button held
      recordAudio();

      // Step 2: Upload to Gemini
      String fileUri = uploadToGemini();

      if (!fileUri.isEmpty()) {
        delay(2000);  // Wait for file processing

        // Step 3: Get AI response
        String answer = askGemini(fileUri);

        if (answer.length() > 0) {
          // Step 4: Convert to speech
          if (getMurfAudio(answer)) {
            // Step 5: Play audio
            playAudio();
          }
        }
      }

      // Ready for next question
      setupI2S_Mic();
      Serial.println("\n🎤 Press and HOLD button to record\n");
    }
  }

  delay(10);
}
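
One caveat about playback: the sketch skips a fixed 44 bytes before playing, which matches the header it writes itself. WAV files in general may carry extra chunks (for example LIST) before the audio, and I have not verified what layout Murf returns. As a minimal host-side sketch under that assumption, the start of the samples can be found by walking the RIFF chunks instead; `readLe32` and `findWavDataOffset` are hypothetical helpers, not part of the project code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a little-endian 32-bit value starting at p.
inline uint32_t readLe32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Returns the byte offset of the first PCM sample, or 0 if the buffer is not
// a RIFF/WAVE file or contains no "data" chunk.
inline std::size_t findWavDataOffset(const uint8_t* buf, std::size_t len) {
    assert(buf != nullptr);
    if (len < 12 || std::memcmp(buf, "RIFF", 4) != 0 ||
        std::memcmp(buf + 8, "WAVE", 4) != 0) {
        return 0;
    }
    std::size_t pos = 12;  // the first chunk follows the 12-byte RIFF header
    while (pos + 8 <= len) {
        const uint32_t chunkSize = readLe32(buf + pos + 4);
        if (std::memcmp(buf + pos, "data", 4) == 0) {
            return pos + 8;  // samples start right after the chunk header
        }
        // Skip this chunk; chunk bodies are padded to an even number of bytes.
        pos += 8 + static_cast<std::size_t>(chunkSize) + (chunkSize & 1u);
    }
    return 0;
}
```

On the device, the first few hundred bytes of the response file could be read into a buffer and passed to a helper like this, with the result used in place of the fixed seek offset. For a plain 44-byte header it returns 44, so the behaviour for the sketch's own recordings would not change.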

Install the ESP32 Board Package (Board Manager)

Open Arduino IDE → File → Preferences.
In Additional Boards Manager URLs, add:

https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json

Click OK.

Go to Tools → Board → Boards Manager….
Search “ESP32” and install esp32 by Espressif Systems (latest).
Open the provided AURA_Chatbot.ino sketch in the Arduino IDE.

Change the SSID and password to match your Wi-Fi settings.
Replace the Gemini and Murf AI API key placeholders with your own keys.

GETTING GEMINI API KEY

This key gives the sketch access to Gemini, the LLM (large language model).

Open Google AI Studio using this link: https://ai.google.dev/gemini-api/docs/available-regions
Click Sign in in the upper-right corner and sign in with your email ID.
After signing in, click Get API key. This takes you to the API key section.
Click Create API key.
Name your key, e.g. AURA CHATBOT.
Select the default Gemini project.
Click Create key.
Once the key has been created, copy it and paste it into the AURA Chatbot sketch.

GETTING MURF AI API KEY

This key gives the sketch access to Murf AI for TTS (text-to-speech).

Open Murf AI using this link: https://murf.ai/
Click Sign up in the upper-right corner and sign in with your email ID.
After signing in, open this link: https://murf.ai/
Click Get API key. This takes you to the Murf API section.
Click API Keys.
Click the plus icon, name your API key, and generate it.
Copy the key and paste it into the AURA Chatbot sketch.
With both keys in the sketch, click Upload in the Arduino IDE.
Once the upload finishes, your AURA Chatbot is ready to assist you.