A DIY AI Voice Assistant for Home Assistant

by relfwolf

Build your own voice assistant: intelligent, with a custom personality and a real voice, and able to control your smart home.

A voice assistant you actually own. It runs through Home Assistant, uses an AI model of your choice for the brain (ChatGPT, Gemini, or a fully local model via Ollama), takes a custom wake word, and it can speak and act on its own, with no wake word at all. Mine closes the blinds at sunset and announces it, usually with an insult, because I gave it the personality of someone who finds me exhausting.

This guide covers building your own from an ESP32, a microphone, and a small amplifier and speaker. Everything you need is here: firmware, wiring, the Home Assistant side, the personality, and a proper walkthrough of the automation trick that makes it feel alive.

See my video on the build

Supplies

Hardware build (~$19)

Most parts come with headers and wires pre-soldered, but some suppliers leave the headers for you to solder. Once the headers are on the ESP32, the whole build is just female-to-female jumper wires between pins. No breadboard needed.

Waveshare ESP32-S3 Zero ($11) — Amazon

8Ω 2W speaker, 28mm round ($1) — Amazon

MAX98357A I2S audio amplifier ($2) — Amazon

I2S MEMS microphone ($5) — Amazon

Jumper wires, female-to-female — Amazon

Before You Build: You Might Not Need To

The first part of this guide covers building a small device so you can speak to the assistant and hear its responses. Building it yourself is the fun, cheap, customisable route. If you'd rather skip the wiring and soldering entirely, two ready-made devices work in much the same way; in that case, skip to Step 4 (Install the Integrations).

Home Assistant Voice PE (~$69) — home-assistant.io/voice-pe

M5Stack ATOM Echo (~$35) — Amazon

Hardware

Wire the speaker to the amplifier, then the amplifier and the microphone to the ESP32. That's the whole job.

Microphone → ESP32-S3

  1. Power: VDD / 3V → 3.3V
  2. Ground: GND → GND
  3. Channel select: SEL → GND
  4. Word select: WS / LRC → GP2
  5. Clock: SCK / BCLK → GP1
  6. Data out: SD / DOUT → GP3

MAX98357A amplifier → ESP32-S3

  1. Vin → 5V
  2. GND → GND
  3. LRC → GP5
  4. BCLK → GP4
  5. DIN → GP6
  6. GAIN → leave unconnected
  7. SD → leave unconnected

Flash the Firmware

Use ESPHome to put the firmware on the board. Plug the ESP32 into your computer and go to web.esphome.io, or use the ESPHome Device Builder add-on inside Home Assistant.

web.esphome.io

Inside Home Assistant, make sure the ESPHome integration is installed.

ESPHome integration docs

  1. In Home Assistant, install the ESPHome Device Builder add-on and open it.
  2. + New Device → name it voice-assistant → choose ESP32-S3 → Skip.
  3. Click Edit and paste in voice-assistant.yaml, replacing everything.
esphome:
  name: basic-voice-assistant
  friendly_name: Basic Voice Assistant

esp32:
  board: waveshare_esp32_s3_zero
  framework:
    type: esp-idf

psram:
  mode: quad
  speed: 80MHz

logger:
  level: INFO

api:

ota:
  - platform: esphome
    password: !secret ota_password

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

i2s_audio:
  - id: i2s_mic
    i2s_lrclk_pin: GPIO2
    i2s_bclk_pin: GPIO1
  - id: i2s_spk
    i2s_lrclk_pin: GPIO5
    i2s_bclk_pin: GPIO4

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: left
    bits_per_sample: 32bit
    i2s_audio_id: i2s_mic
    i2s_din_pin: GPIO3

speaker:
  - platform: i2s_audio
    id: i2s_speaker
    dac_type: external
    i2s_dout_pin: GPIO6
    i2s_audio_id: i2s_spk
    sample_rate: 16000
    bits_per_sample: 16bit
    channel: mono
    buffer_duration: 500ms

media_player:
  - platform: speaker
    id: media_out
    name: "Media Player"
    announcement_pipeline:
      speaker: i2s_speaker
      format: MP3
      num_channels: 2

light:
  - platform: esp32_rmt_led_strip
    id: status_led
    name: "Status LED"
    pin: GPIO21
    num_leds: 1
    chipset: WS2812
    rgb_order: GRB
    effects:
      - pulse:
          name: "Pulse"
          transition_length: 0.5s
          update_interval: 0.5s

binary_sensor:
  - platform: status
    name: "API Connection"
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
            - script.execute: led_idle
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:
            - script.execute: led_off

script:
  - id: led_idle
    then:
      - light.turn_on:
          id: status_led
          brightness: 30%
          red: 0%
          green: 0%
          blue: 100%
          effect: "Pulse"

  - id: led_listening
    then:
      - light.turn_on:
          id: status_led
          brightness: 100%
          red: 0%
          green: 100%
          blue: 100%
          effect: "None"

  - id: led_thinking
    then:
      - light.turn_on:
          id: status_led
          brightness: 100%
          red: 100%
          green: 80%
          blue: 0%
          effect: "None"

  - id: led_speaking
    then:
      - light.turn_on:
          id: status_led
          brightness: 100%
          red: 0%
          green: 100%
          blue: 0%
          effect: "None"

  - id: led_error
    then:
      - light.turn_on:
          id: status_led
          brightness: 100%
          red: 100%
          green: 0%
          blue: 0%
          effect: "None"

  - id: led_off
    then:
      - light.turn_off:
          id: status_led

  - id: restart_va
    then:
      - delay: 500ms
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
            - script.execute: led_idle

voice_assistant:
  microphone: mic_i2s
  media_player: media_out
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  use_wake_word: true

  on_wake_word_detected:
    - script.execute: led_listening
  on_listening:
    - script.execute: led_listening
  on_stt_vad_start:
    - script.execute: led_listening
  on_stt_end:
    - script.execute: led_thinking
  on_tts_start:
    - script.execute: led_speaking
  on_end:
    - wait_until:
        condition:
          speaker.is_playing: i2s_speaker
        timeout: 10s
    - wait_until:
        condition:
          not:
            - speaker.is_playing: i2s_speaker
        timeout: 120s
    - delay: 300ms
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - script.execute: led_idle
        else:
          - script.execute: led_off
  on_error:
    - script.execute: led_error
    - delay: 2s
    - script.execute: restart_va
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
          - script.execute: led_idle
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
          - script.execute: led_off

switch:
  - platform: template
    name: "Use wake word"
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
      - script.execute: led_idle
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: led_off
  4. Click Secrets (top right) and fill in your WiFi name and passwords.
# Your Wi-Fi SSID and password
wifi_ssid: "[WIFI NETWORK NAME]"
wifi_password: "[PASSWORD]"

# OTA update password (used by the basic-voice-assistant config)
ota_password: "[PASSWORD]"  # Choose any password you like
  5. Install → Plug into this computer for the first flash. Hold the BOOT button while connecting USB if it isn't detected. Future updates go over the air.

Install the Integrations (ChatGPT)

From here on, everything is the same whether you built the device or bought a pre-made one. You'll add three things: an AI brain, a voice, and the speech plumbing.

The AI brain (ChatGPT)

This gives genuinely intelligent conversation and responses. Many agents work — Gemini, Claude, etc. I use ChatGPT as it's easiest to set up.

OpenAI Conversation integration

Go to openai.com/api, create an account, and open the API platform. In the API keys tab, click Create new secret key.

In Home Assistant under Settings → Devices & Services, open the OpenAI integration, add the service, and paste the API key when prompted.

Install the Integrations (ElevenLabs)

The voice (ElevenLabs text-to-speech)

Text-to-speech turns the AI's text reply into a realistic voice. There are many options; I use ElevenLabs for its thousands of high-quality voices.

Install the ElevenLabs integration.

Go to ElevenLabs and create an account.

Open the API keys tab and click Create key.

Enter this key in Home Assistant when you open the ElevenLabs integration and click Add service.


Install the Integrations (Wyoming Protocol)

Speech-to-text and Open Wake Word (Wyoming Protocol)

The Wyoming Protocol integration links Home Assistant to a speech-to-text service (Whisper, selected later in this guide), which converts what you say into text to send to the AI. It also links to openWakeWord if you want a custom wake word.

Wyoming Protocol integration

Build Your Assistant

In Home Assistant go to Settings → Voice assistants.

Click Add assistant, give it a name, and select your language.

Conversation agent

Under Conversation agent, select the OpenAI agent you set up earlier.

Personality

Click the cog beside the agent to open the Instructions box. This directs how the AI responds, with anything from “be grumpy” to “respond in the style of Jules Winnfield”. Tell it to keep replies short; long ones feel sluggish through a speaker.

Speech-to-text and voice

For speech-to-text, select Whisper (faster-whisper).

For text-to-speech, choose ElevenLabs, then pick a voice.

There's a long list of voices — male, female, American, British, and more — so the assistant sounds exactly how you want.

Custom Wake Word

The wake word is what you say to make the assistant start listening. You get the usual options (Alexa, Hey Jarvis, Okay Nabu), but for your own custom word, use the OpenWakeWord training tool:

OpenWakeWord training (Google Colab)

Enter your chosen word in the target word box (use a word with two or more syllables), then hit Run all at the top. The tool generates thousands of synthetic voice samples saying the word and trains a model to recognise it. This takes 20 minutes to an hour and produces a file you upload to Home Assistant and select as your wake word.

Expose Your Smart Devices

The Expose tab lets you choose which smart devices the assistant can see — so it can control them or read information from them.

Not Just Wake Words: Automations

Here's what makes this genuinely interesting. A wake word isn't the only way to start the assistant: you can trigger it from automations too. That means it can speak without being prompted, reacting to your smart sensors, so it feels far more alive.

The key idea: the wake word is just one way to start the assistant. An automation is another. Fire a spoken line from an automation and it talks without anyone saying a thing.

Go to Settings → Automations & scenes.

Click Create automation. Set the trigger to whatever you like: a sensor changing, a door opening, or a time of day (for example, 07:30 for a unique spoken response every morning, like an alarm). Then add an Assist satellite action, such as Start conversation, and target your device so the assistant opens the conversation with the AI.
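
As a rough sketch, the 07:30 example might look something like this in the automation editor's YAML view. The entity ID and both text strings are assumptions for illustration, not taken from my setup: look up your own satellite's entity ID under Settings → Devices & Services. Older Home Assistant releases write `service:` where this uses `action:`.

```yaml
alias: Morning wake-up line
triggers:
  - trigger: time
    at: "07:30:00"
actions:
  - action: assist_satellite.start_conversation
    target:
      # Hypothetical entity ID; replace with your own satellite
      entity_id: assist_satellite.basic_voice_assistant_assist_satellite
    data:
      # Spoken first, then the device listens for your reply
      start_message: "It's half past seven. Are you getting up, or shall I ask again?"
      # Extra context handed to the conversation agent for the exchange that follows
      extra_system_prompt: "The user has just been woken by a 07:30 alarm."
mode: single
```

Because the device listens after speaking, you can answer without saying the wake word.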

Ideas to steal

  1. Morning briefing — trigger on kitchen motion, condition to 6–9am, speak the weather.
  2. Washing's done — trigger on the machine going idle; have it nag you.
  3. Someone at the door — trigger on the doorbell; announce it, with commentary.
  4. Sunset — close the blinds and announce it.
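
The sunset idea could be sketched along these lines. To get a fresh line each evening rather than a fixed message, this version first asks the conversation agent for a sentence and then has the satellite announce it. All three entity IDs (`cover.living_room_blinds`, `conversation.openai_conversation`, and the satellite) are hypothetical placeholders, and the prompt text is only an example.

```yaml
alias: Sunset blinds with commentary
triggers:
  - trigger: sun
    event: sunset
actions:
  - action: cover.close_cover
    target:
      entity_id: cover.living_room_blinds  # hypothetical blinds entity
  # Ask the AI agent for a one-off line; its personality comes from the Instructions box
  - action: conversation.process
    data:
      agent_id: conversation.openai_conversation  # hypothetical agent entity
      text: "You have just closed the blinds because the sun is setting. Announce it in one short sentence."
    response_variable: reply
  # Speak the generated line without waiting for a wake word
  - action: assist_satellite.announce
    target:
      entity_id: assist_satellite.basic_voice_assistant_assist_satellite  # hypothetical
    data:
      message: "{{ reply.response.speech.plain.speech }}"
mode: single
```

The same three-step shape (do the thing, ask the agent for a line, announce it) should carry over to the other ideas by swapping the trigger and the prompt.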

Say your wake word, ask it something, and enjoy!