A DIY AI Voice Assistant for Home Assistant

Build your own: intelligent, with a custom personality and a real voice, and able to control your smart home.

A voice assistant you actually own. It runs through Home Assistant, uses an AI model of your choice as its brain (ChatGPT, Gemini, or a fully local model via Ollama), responds to a custom wake word, and can also speak and act on its own with no wake word at all. Mine closes the blinds at sunset and announces it, usually with an insult, because I gave it the personality of someone who finds me exhausting.

This guide covers building your own from an ESP32, a microphone, and a small amplifier and speaker. Everything you need is here: firmware, wiring, the Home Assistant side, the personality, and a proper walkthrough of the automation trick that makes it feel alive.

See my video on the build

Supplies

Hardware build (~$19)

Most parts come with headers and wires pre-soldered; some suppliers leave the headers for you. Once headers are on the ESP32, the whole build is just female-to-female jumper wires between the pins. No breadboard needed.

Waveshare ESP32-S3 Zero ($11) — Amazon

8Ω 2W speaker, 28mm round ($1) — Amazon

MAX98357 I2S audio amplifier ($2) — Amazon

I2S MEMS microphone ($5) — Amazon

Jumper wires, female-to-female — Amazon

Before You Build: You Might Not Need To

The first part of this guide covers building a small, cheap device so you can speak to the assistant and hear its responses. The self-build is the fun, customisable route. But if you'd rather skip the soldering entirely, two ready-made devices work in a very similar way: just jump ahead to Install the Integrations (ChatGPT).

Home Assistant Voice PE (~$69) — home-assistant.io/voice-pe

M5Stack ATOM Echo (~$35) — Amazon

Hardware

Wire the speaker's two leads to the amplifier's + and - output terminals, then wire the amplifier and the microphone to the ESP32 as listed below. That's the whole job.

Microphone → ESP32-S3

Power: VDD / 3V → 3.3V
Ground: GND → GND
Channel select: SEL → GND
Word select: WS / LRC → GP1
Clock: SCK / BCLK → GP2
Data out: SD / DOUT → GP3

MAX98357A amplifier → ESP32-S3

Vin → 5V
GND → GND
LRC → GP4
BCLK → GP5
DIN → GP6
GAIN → leave unconnected
SD → leave unconnected
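It helps to see which wire lands on which ESPHome option. The fragment below is a sketch, assuming you wired exactly as listed above: it uses ESPHome `substitutions` to name each pin once and then feeds those names into the I2S bus settings. The substitution names (`mic_ws_pin` and so on) are my own, chosen for illustration; the option names (`i2s_lrclk_pin`, `i2s_bclk_pin`) are the ones the full config uses.

```yaml
# Sketch: name each pin once, following the wiring list above.
# The substitution names are illustrative, not required by ESPHome.
substitutions:
  mic_ws_pin: GPIO1    # mic WS / LRC   -> i2s_lrclk_pin
  mic_sck_pin: GPIO2   # mic SCK / BCLK -> i2s_bclk_pin
  mic_sd_pin: GPIO3    # mic SD / DOUT  -> microphone i2s_din_pin
  amp_lrc_pin: GPIO4   # amp LRC        -> i2s_lrclk_pin
  amp_bclk_pin: GPIO5  # amp BCLK       -> i2s_bclk_pin
  amp_din_pin: GPIO6   # amp DIN        -> speaker i2s_dout_pin

i2s_audio:
  - id: i2s_mic
    i2s_lrclk_pin: ${mic_ws_pin}
    i2s_bclk_pin: ${mic_sck_pin}
  - id: i2s_spk
    i2s_lrclk_pin: ${amp_lrc_pin}
    i2s_bclk_pin: ${amp_bclk_pin}
```

In the full config, the microphone's `i2s_din_pin` would then become `${mic_sd_pin}` and the speaker's `i2s_dout_pin` would become `${amp_din_pin}`. If you ever move a wire, you only change one line.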

Flash the Firmware

Use ESPHome to put the firmware on the board. Plug the ESP32 into your computer and go to web.esphome.io, or use the ESPHome Device Builder add-on inside Home Assistant.

web.esphome.io

Inside Home Assistant, make sure the ESPHome integration is installed.

ESPHome integration docs

In Home Assistant, install the ESPHome Device Builder add-on and open it.
+ New Device → name it voice-assistant → choose ESP32-S3 → Skip.
Click Edit and replace everything with the voice-assistant.yaml config below.

esphome:
  name: voice-assistant  # the device name chosen in the step above
  friendly_name: Basic Voice Assistant

esp32:
  board: waveshare_esp32_s3_zero
  framework:
    type: esp-idf

psram:
  mode: quad
  speed: 80MHz

logger:
  level: INFO

api:

ota:
  - platform: esphome
    password: !secret ota_password

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Two I2S buses: one for the microphone, one for the amplifier.
# Pin numbers match the wiring list.
i2s_audio:
  - id: i2s_mic
    i2s_lrclk_pin: GPIO1  # mic WS / LRC
    i2s_bclk_pin: GPIO2   # mic SCK / BCLK
  - id: i2s_spk
    i2s_lrclk_pin: GPIO4  # amp LRC
    i2s_bclk_pin: GPIO5   # amp BCLK

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: left  # SEL is tied to GND
    bits_per_sample: 32bit
    i2s_audio_id: i2s_mic
    i2s_din_pin: GPIO3  # mic SD / DOUT

speaker:
  - platform: i2s_audio
    id: i2s_speaker
    dac_type: external
    i2s_dout_pin: GPIO6  # amp DIN
    i2s_audio_id: i2s_spk
    sample_rate: 16000
    bits_per_sample: 16bit
    channel: mono
    buffer_duration: 500ms

media_player:
  - platform: speaker
    id: media_out
    announcement_pipeline:
      speaker: i2s_speaker
      format: MP3
      num_channels: 2

# Onboard RGB LED on the ESP32-S3 Zero, used as a status light
light:
  - platform: esp32_rmt_led_strip
    id: status_led
    pin: GPIO21
    num_leds: 1
    chipset: WS2812
    rgb_order: GRB
    effects:
      - pulse:
          transition_length: 0.5s
          update_interval: 0.5s

# Start listening once Home Assistant is connected; stop when it drops
binary_sensor:
  - platform: status
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
            - script.execute: led_idle
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:
            - script.execute: led_off

# LED colour for each assistant state
script:
  - id: led_idle
    then:
      - light.turn_on:
          id: status_led
          brightness: 30%
          red: 0%
          green: 0%
          blue: 100%
          effect: "Pulse"
  - id: led_listening
    then:
      - light.turn_on:
          id: status_led
          brightness: 100%
          red: 0%
          green: 100%
          blue: 100%
          effect: "None"
  - id: led_thinking
    then:
      - light.turn_on:
          id: status_led
          brightness: 100%
          red: 100%
          green: 80%
          blue: 0%
          effect: "None"
  - id: led_speaking
    then:
      - light.turn_on:
          id: status_led
          brightness: 100%
          red: 0%
          green: 100%
          blue: 0%
          effect: "None"
  - id: led_error
    then:
      - light.turn_on:
          id: status_led
          brightness: 100%
          red: 100%
          green: 0%
          blue: 0%
          effect: "None"
  - id: led_off
    then:
      - light.turn_off:
          id: status_led
  - id: restart_va
    then:
      - delay: 500ms
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
            - script.execute: led_idle

voice_assistant:
  microphone: mic_i2s
  media_player: media_out
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  use_wake_word: true
  on_wake_word_detected:
    - script.execute: led_listening
  on_listening:
    - script.execute: led_listening
  on_stt_vad_start:
    - script.execute: led_listening
  on_stt_end:
    - script.execute: led_thinking
  on_tts_start:
    - script.execute: led_speaking
  on_end:
    # Wait for the reply to start playing, then for it to finish
    - wait_until:
        condition:
          speaker.is_playing: i2s_speaker
        timeout: 10s
    - wait_until:
        condition:
          not:
            - speaker.is_playing: i2s_speaker
        timeout: 120s
    - delay: 300ms
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - script.execute: led_idle
        else:
          - script.execute: led_off
  on_error:
    - script.execute: led_error
    - delay: 2s
    - script.execute: restart_va
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
          - script.execute: led_idle
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
          - script.execute: led_off

# Switch shown in Home Assistant to turn wake-word listening on or off
switch:
  - platform: template
    name: "Use wake word"
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
            - script.execute: led_idle
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
      - script.execute: led_off

Click Secrets (top right) and fill in your WiFi name and passwords.

# Your Wi-Fi SSID and password
wifi_ssid: "[WIFI NETWORK NAME]"
wifi_password: "[PASSWORD]"

# OTA update password (used by the basic-voice-assistant config)
ota_password: "[PASSWORD]"  # Choose any password you like

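Click Install → Plug into this computer for the first flash. If the board isn't detected, hold the BOOT button while connecting USB. Future updates go over the air. Once the device is online, Home Assistant should discover it under Settings → Devices & Services; click Configure to add it.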

Install the Integrations (ChatGPT)

From here on, everything is the same whether you built the device or bought a pre-made one. You'll add three things: an AI brain, a voice, and the speech plumbing.

The AI brain (ChatGPT)

This is what makes the conversation genuinely intelligent. Many conversation agents work (Gemini, Claude, a local model via Ollama, and others); I use ChatGPT because it's the easiest to set up.

OpenAI Conversation integration

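Go to openai.com/api, create an account, and open the API platform. In the API keys tab, click Create new secret key. API usage is billed separately from a ChatGPT subscription, so you'll need to add a little credit under Billing.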

In Home Assistant under Settings → Devices & Services, open the OpenAI integration, add the service, and paste the API key when prompted.
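Before giving it a voice, you can check the key works by typing to the agent directly. A minimal sketch: in Developer Tools → Actions, switch to YAML mode, paste the following, and click Perform action. `conversation.chatgpt` is a hypothetical entity ID; use whichever conversation entity the OpenAI integration created for you.

```yaml
# Send a typed prompt straight to the OpenAI conversation agent.
# conversation.chatgpt is a placeholder entity ID.
action: conversation.process
data:
  agent_id: conversation.chatgpt
  text: "In one sentence, who are you?"
```

The agent's reply shows up in the response panel. An error there instead usually points back at the key or the API billing.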

Install the Integrations (ElevenLabs)

The voice (ElevenLabs text-to-speech)

Text-to-speech turns the AI's text reply into a realistic voice. There are many options; I use ElevenLabs for its thousands of high-quality voices.

Install the ElevenLabs integration.

Go to ElevenLabs and create an account.

Open the API keys tab and click Create key.

Enter this key in Home Assistant when you open the ElevenLabs integration and click Add service.
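To hear a voice before building the full assistant, here's a quick sketch using the `tts.speak` action from Developer Tools → Actions (YAML mode). Both entity IDs are placeholders: `tts.elevenlabs` stands in for the text-to-speech entity the integration creates, and `media_player.kitchen_speaker` for any media player you already have.

```yaml
# Speak a test line with ElevenLabs on an existing media player.
# Both entity IDs below are placeholders.
action: tts.speak
target:
  entity_id: tts.elevenlabs
data:
  media_player_entity_id: media_player.kitchen_speaker
  message: "Testing. If you can hear this, the voice works."
```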

Install the Integrations (Wyoming Protocol)

Speech-to-text and Open Wake Word (Wyoming Protocol)

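The Wyoming Protocol integration connects Home Assistant to local speech services. Install the Whisper add-on for speech-to-text (converting what you say into text to send to the AI), plus the openWakeWord add-on if you want a custom wake word. Home Assistant should discover both through Wyoming; click Configure on each to add them.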

Wyoming Protocol integration

Build Your Assistant

In Home Assistant go to Settings → Voice assistants.

Click Add assistant, give it a name, and select your language.

Conversation agent

Under Conversation agent, select the OpenAI agent you set up earlier.

Personality

Click the cog beside the agent to open the Instructions box. This directs how the AI responds — anything from “be grumpy” to “respond in the style of Jules Winnfield”. Keep replies short; long ones feel sluggish through a speaker.

Speech-to-text and voice

For speech-to-text, select Whisper (faster-whisper).

For text-to-speech, choose ElevenLabs, then pick a voice.

There's a long list of voices — male, female, American, British, and more — so the assistant sounds exactly how you want.

Custom Wake Word

The wake word is what you say to make the assistant start listening. You get the usual options (Alexa, Hey Jarvis, Okay Nabu), but for your own custom word, use the OpenWakeWord training tool:

OpenWakeWord training (Google Colab)

Enter your chosen word in the target word box (use a word with two or more syllables). At the top, hit Run all. The tool generates thousands of synthetic voices saying the word and trains a small model to recognise it. This takes 20 minutes to an hour, then produces a model file (.tflite). Copy it into the /share/openwakeword folder in Home Assistant (the Samba or File editor add-on works for this), restart the openWakeWord add-on, and select the new word as your wake word.

Expose Your Smart Devices

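The Expose tab (under Settings → Voice assistants) lets you choose which smart devices the assistant can see, so it can control them or read information from them. For ChatGPT to actually act on them, also set Control Home Assistant to Assist in the agent's options (the same cog as the Instructions box).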

Not Just Wake Words: Automations

Here's what makes this genuinely interesting: the wake word is just one way to start the assistant. An automation is another. Fire a spoken line from an automation and it talks without anyone saying a thing, reacting to your smart sensors, which makes it feel far more alive.

Go to Settings → Automations & scenes.

Click Create automation. Set the trigger to whatever you like: a sensor changing, a door opening, or a time of day (for example, 07:30 for a unique spoken response every morning, like an alarm). Then add an Assist satellite action targeting your voice assistant: Start conversation speaks a line and then listens for your reply to the AI, while Announce simply speaks.
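As a rough sketch of the 07:30 example, here's the automation in YAML (the automation editor's Edit in YAML view accepts this shape). It assumes a Home Assistant release recent enough to have the Assist satellite Start conversation action, and `assist_satellite.voice_assistant_assist_satellite` is a placeholder for your device's satellite entity.

```yaml
# Sketch: speak at 07:30, then listen for a reply handled by the AI.
# The assist_satellite entity ID is a placeholder.
alias: "Morning check-in"
triggers:
  - trigger: time
    at: "07:30:00"
actions:
  - action: assist_satellite.start_conversation
    target:
      entity_id: assist_satellite.voice_assistant_assist_satellite
    data:
      # Spoken as written; the device then listens for your answer.
      start_message: "Morning. Weather report, or shall I just judge you silently?"
      # Background for the AI so it knows what it just asked.
      extra_system_prompt: >-
        It is 7:30 in the morning and you have just woken the user by asking
        whether they want the weather. Reply in character and keep it short.
mode: single
```

`start_message` is spoken word for word; the AI only takes over once you answer.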

Ideas to steal

Morning briefing — trigger on kitchen motion, condition to 6–9am, speak the weather.
Washing's done — trigger on the machine going idle; have it nag you.
Someone at the door — trigger on the doorbell; announce it, with commentary.
Sunset — close the blinds and announce it.
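The sunset idea, sketched so the AI writes a fresh line every time: close the blinds, send a prompt to the agent with `conversation.process`, keep the reply via `response_variable`, then speak it with the Announce action. All entity IDs (`cover.living_room_blinds`, `conversation.chatgpt`, and the `assist_satellite` entity) are placeholders for your own.

```yaml
# Sketch: close the blinds at sunset, then announce an AI-written line.
# All entity IDs are placeholders.
alias: "Sunset blinds with commentary"
triggers:
  - trigger: sun
    event: sunset
actions:
  - action: cover.close_cover
    target:
      entity_id: cover.living_room_blinds
  # Ask the AI for the announcement; the reply is stored in `reply`.
  - action: conversation.process
    data:
      agent_id: conversation.chatgpt
      text: >-
        The sun has set and you have just closed the living room blinds.
        Tell the user in one short, exasperated sentence.
    response_variable: reply
  # Speak the AI's text on the voice assistant.
  - action: assist_satellite.announce
    target:
      entity_id: assist_satellite.voice_assistant_assist_satellite
    data:
      message: "{{ reply.response.speech.plain.speech }}"
mode: single
```

The other ideas mostly need a different trigger. For the doorbell, use a `state` trigger on its entity going to `on`. For the washing machine, assuming it's on a power-monitoring plug, use a `numeric_state` trigger with `below` set to a few watts and a `for` of several minutes.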

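Finally, check that the device is using this assistant: on its page under Settings → Devices & Services, set the Assistant option to the one you built (or mark that assistant as preferred). Then say your wake word, ask it something, and enjoy!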