AI-Powered Piano Trainer: Learn Songs With Real-Time Feedback

Hi!
My name is Ada López, and I am a first-year student in Creative Technologies & AI at Howest University in Kortrijk. This project represents my first complete solo integration of AI and hardware. My AI-Powered Piano Trainer is an interactive system designed to help beginners learn to play the piano by offering real-time feedback and tracking their performance across a predefined song.
The core idea is simple: as the user plays "Twinkle Twinkle Little Star" on a piano keyboard, a USB camera connected to my Raspberry Pi and mounted above the keys captures each press. These video frames are streamed in real time to a laptop, where a YOLOv8 object detection model trained entirely by me detects which key is being pressed. The predicted key is then sent back to my Raspberry Pi, which compares it with the expected sequence and displays the result on an I2C-connected LCD. At the end of the song, the trainer gives a final accuracy score and even plays a melody with a buzzer to signal that the session has ended.
Let me take you through the entire journey of building this, from training the AI model and designing the enclosure to building the feedback loop and solving real-time streaming issues. It was a huge learning experience filled with debugging, breakthroughs, and lots of late-night testing.
Supplies
Here's a full list of everything I used to build my AI-Powered Piano Trainer. Some of these items were reused from school kits or things I already owned, so your total cost might be lower depending on what you have on hand.
Electronics & Hardware
- Raspberry Pi 8GB - €88.95
- USB Webcam - €23.39
- Freenove Projects Kit - €52.59
- Laptop - owned
- Piano Keyboard - owned
Maker Tools & Enclosure Materials
- 4x multiplex sheets (4 mm) - €18
- Laser Cutting (done at school) - €5
- LED strip - €7
Free Software & Libraries
- MakerCase + Inkscape - To design and customize the wooden enclosure.
- Roboflow (free account) - Used to create, label, and train a custom key-detection dataset.
- YOLOv8 (via ultralytics library) - Lightweight object detection model used for inference.
- Python Libraries:
- opencv-python - For video capture and visualization.
- socket - For real-time communication between the Pi and the laptop.
- smbus - To control the I2C LCD.
- csv, os, datetime - For logging and file handling.
- playsound - To play the song preview.
Creating a Custom Dataset

For my detection model to work in my specific setup, I couldn't rely on existing datasets. Most public datasets either show full keyboards or are trained for generic object detection. I needed high-quality, focused images that captured my keyboard.
I started by taking pictures of key presses and made sure to cover all eight labels: c_pressed, d_pressed, e_pressed, f_pressed, g_pressed, a_pressed, b_pressed, and one for no press. Then I uploaded them into Roboflow and started manually annotating them.
In total, I labeled around 200 images by hand using Roboflow's annotation tool. I made sure to include a range of hand sizes, lighting conditions, and slight variations in angle. This helped the model generalize better to real-world use.
Training the Model

Once the dataset was annotated and uploaded, I moved on to training the model. I used Roboflow's web interface and trained a YOLOv8 object detection model. I chose YOLOv8 for its balance between speed and accuracy, which is essential for real-time detection.
I started with basic augmentation, like flipping, brightness adjustment, and slight cropping. These helped improve generalization, especially since my dataset was relatively small. I trained the model for 100 epochs using Roboflow's internal tools. During training, I monitored metrics like mAP@50, precision and recall.
The final version (v4) of my model achieved a mAP@50 of 99.5%, with 90.4% precision and 100% recall. I exported this model in Python format via Roboflow's inference_sdk and prepared it for local inference on my laptop. Since my Raspberry Pi couldn't handle YOLOv8 inference, the Pi only streams video and handles feedback, while my laptop runs all detection.
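If you'd rather run the exported weights locally with the ultralytics package instead of Roboflow's hosted SDK, single-image inference looks roughly like the sketch below; the file paths are placeholders, not my exact project layout:

```python
import cv2
from ultralytics import YOLO

# Load the exported YOLOv8 weights; this path is a placeholder
model = YOLO("AI/models/piano_keys_v4.pt")

# Run inference on a single saved frame of the keyboard (placeholder file name)
frame = cv2.imread("test_frame.jpg")
results = model(frame, verbose=False)[0]

# Print every detected label with its confidence score
for box in results.boxes:
    label = model.names[int(box.cls[0])]
    confidence = float(box.conf[0])
    print(f"{label}: {confidence:.2f}")
```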
Building the Real-Time Feedback Loop

The feedback loop is the heart of my project. My Raspberry Pi captures video using a USB webcam and sends frames over a socket connection to the laptop. On the laptop, the frames are received and passed into the trained YOLOv8 model for inference. The result, a label like c_pressed, is then sent back to the Raspberry Pi through a second socket.
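To give an idea of the streaming side, here's a simplified sketch of what the Pi's frame sender could look like; the IP address, port, and 4-byte length framing are assumptions rather than my exact protocol:

```python
import socket
import cv2

LAPTOP_IP = "192.168.0.42"   # laptop address on the local network (placeholder)
FRAME_PORT = 5000            # placeholder port for the video stream

cap = cv2.VideoCapture(0)    # USB webcam mounted above the keys
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((LAPTOP_IP, FRAME_PORT))

try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # JPEG-encode each frame to keep the payload small
        ok, buffer = cv2.imencode(".jpg", frame)
        data = buffer.tobytes()
        # Prefix each frame with a 4-byte length so the laptop knows where it ends
        sock.sendall(len(data).to_bytes(4, "big") + data)
finally:
    cap.release()
    sock.close()
```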
On the Pi, another script receives the predicted label and compares it against the current expected note in the song. It uses a simple list of 14 notes representing the melody of "Twinkle Twinkle Little Star". Each correct match is counted. At the end of the song, the Pi calculates an accuracy percentage and displays the final result on the LCD.
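In simplified form, the scoring logic on the Pi boils down to something like this; wait_for_prediction() is a hypothetical stand-in for the real socket-reading code:

```python
# The 14 notes of "Twinkle Twinkle Little Star" (first verse), using the model's labels
SONG = ["c_pressed", "c_pressed", "g_pressed", "g_pressed",
        "a_pressed", "a_pressed", "g_pressed",
        "f_pressed", "f_pressed", "e_pressed", "e_pressed",
        "d_pressed", "d_pressed", "c_pressed"]

correct = 0
for expected in SONG:
    detected = wait_for_prediction()   # hypothetical helper: blocks until a label arrives
    print(f"Expected {expected}, detected {detected}")
    if detected == expected:
        correct += 1

accuracy = round(100 * correct / len(SONG))
print(f"Final accuracy: {accuracy}%")
```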
Displaying Results on an LCD

The LCD feedback was essential to making the experience feel interactive and complete. I used a 16x2 LCD with an I2C interface, connected to the Pi's GPIO header. I wrote a custom Python driver class in LCD.py, which handles the bitwise communication with the screen using the smbus library.
The LCD shows each detected key in real time on the first line and displays messages like "Song Finished!" and the final accuracy score on the second line. The class includes low-level functions for sending characters, strings, and special commands like clearing the screen or setting the cursor. Writing this from scratch was both fun and educational; I learned a lot about I2C addressing and bit manipulation.
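My LCD.py does more than this, but the core idea of driving an HD44780 display through a PCF8574 I2C backpack looks roughly like the sketch below (0x27 is a common backpack address and an assumption for your hardware):

```python
import time
import smbus

I2C_ADDR = 0x27      # common PCF8574 backpack address (assumption for your hardware)
LCD_WIDTH = 16
LCD_CHR, LCD_CMD = 0x01, 0x00   # register-select bit: data vs. command
ENABLE, BACKLIGHT = 0x04, 0x08  # control bits on the backpack
LINE_ADDR = [0x80, 0xC0]        # DDRAM addresses of line 1 and line 2

bus = smbus.SMBus(1)

def pulse(nibble):
    # Latch four data bits by toggling the enable line
    bus.write_byte(I2C_ADDR, nibble | ENABLE)
    time.sleep(0.0005)
    bus.write_byte(I2C_ADDR, nibble & ~ENABLE)
    time.sleep(0.0005)

def send(value, mode):
    # Split the byte into high and low nibbles, keeping the backlight on
    pulse(mode | (value & 0xF0) | BACKLIGHT)
    pulse(mode | ((value << 4) & 0xF0) | BACKLIGHT)

def init():
    for cmd in (0x33, 0x32, 0x06, 0x0C, 0x28, 0x01):  # standard HD44780 init sequence
        send(cmd, LCD_CMD)
    time.sleep(0.005)

def show(text, line=0):
    # Move the cursor to the start of the chosen line, then write 16 characters
    send(LINE_ADDR[line], LCD_CMD)
    for ch in text.ljust(LCD_WIDTH)[:LCD_WIDTH]:
        send(ord(ch), LCD_CHR)

init()
show("Key: c_pressed", line=0)
show("Accuracy: 92%", line=1)
```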
Play a Song Preview

To help users recall the melody before starting, I added a feature to optionally play a preview of "Twinkle Twinkle Little Star". When the program starts, it prompts the user with: "🎧Do you want to hear the song first? (y/n)" If the user says yes, a short .mp3 file is played using the playsound Python package. This adds a nice musical touch to the experience and helps reinforce learning.
The audio file is stored in an assets/ directory and is played on the laptop before streaming begins. This small feature helped increase the immersion and friendliness of the tool, especially for beginner players.
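The preview logic itself only takes a few lines; the file name below is a placeholder for whatever .mp3 you put in assets/:

```python
from playsound import playsound

answer = input("🎧 Do you want to hear the song first? (y/n) ").strip().lower()
if answer == "y":
    # File name is a placeholder; the preview lives in the assets/ directory
    playsound("assets/twinkle_twinkle.mp3")
```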
Buzzer Melody to Signal Completion
Since my enclosure is closed and doesn't allow an external LED to be visible, I decided to use a buzzer to signal the end of the session. I connected the buzzer to GPIO18 and played a simple melody using PWM.
This not only gave an audible cue that the song had ended, but also made the whole experience feel more dynamic. The buzzer triggers once all 14 notes have been processed and the accuracy has been displayed.
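Here's a stripped-down version of that end-of-song jingle using RPi.GPIO's software PWM; the exact notes and timings are placeholders, and it assumes a passive buzzer like the one in the Freenove kit:

```python
import time
import RPi.GPIO as GPIO

BUZZER_PIN = 18  # buzzer wired to GPIO18

# (frequency in Hz, duration in seconds); the melody itself is a placeholder
MELODY = [(523, 0.2), (659, 0.2), (784, 0.2), (1047, 0.4)]  # C5, E5, G5, C6

GPIO.setmode(GPIO.BCM)
GPIO.setup(BUZZER_PIN, GPIO.OUT)
pwm = GPIO.PWM(BUZZER_PIN, MELODY[0][0])
pwm.start(50)  # 50% duty cycle gives a clear square-wave tone

for frequency, duration in MELODY:
    pwm.ChangeFrequency(frequency)
    time.sleep(duration)

pwm.stop()
GPIO.cleanup()
```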
Live Prediction Visuals and Debugging Tools

I used cv2.imshow() to open a window on the laptop side that shows the webcam input with overlaid predictions: the currently detected key, bounding boxes, and an FPS counter. This helped me troubleshoot slow frame rates and make sure the model was behaving as expected in real time.
It also allowed me to test how the model performed in lower lighting, different hand positions, and various background conditions.
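For reference, a standalone version of the debug viewer looks roughly like this; it reads from a local webcam for simplicity, whereas in the real project the frames arrive over the socket from the Pi:

```python
import time
import cv2
from ultralytics import YOLO

model = YOLO("AI/models/piano_keys_v4.pt")  # placeholder path to the exported weights
cap = cv2.VideoCapture(0)                   # local webcam for this standalone test
prev = time.time()

while True:
    ok, frame = cap.read()
    if not ok:
        break

    results = model(frame, verbose=False)[0]
    annotated = results.plot()  # draws bounding boxes and class labels on the frame

    # Simple FPS counter based on the time between frames
    now = time.time()
    fps = 1.0 / max(now - prev, 1e-6)
    prev = now
    cv2.putText(annotated, f"FPS: {fps:.1f}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow("Piano Trainer - live predictions", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```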
Logging and Accuracy Tracking

Every session logs predictions to a CSV file. Each row contains a timestamp, the expected key, the detected key, and whether it matched. This was helpful for identifying when the model made mistakes and for tracking improvements across model versions.
The Pi stores logs in a logs/ folder inside the RPi/ directory. I also wrote helper functions to flush the socket buffer before each session to avoid ghost predictions from previous runs.
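The logger itself is a small helper along these lines (the CSV file name is a placeholder):

```python
import csv
import os
from datetime import datetime

LOG_DIR = "RPi/logs"  # logs live in a logs/ folder inside RPi/

def log_prediction(expected, detected, filename="session.csv"):
    """Append one row: timestamp, expected key, detected key, match flag."""
    os.makedirs(LOG_DIR, exist_ok=True)
    path = os.path.join(LOG_DIR, filename)  # file name is a placeholder
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "expected", "detected", "match"])
        writer.writerow([datetime.now().isoformat(), expected, detected,
                         expected == detected])

log_prediction("c_pressed", "c_pressed")
```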
Replay Function and Reset
Once the full 14-note sequence has been completed, the Pi calculates the final accuracy and displays it on the LCD. At that point, the buzzer plays its melody, and the user is asked whether they want to play again.
If the user inputs 'y', the session resets: the LCD is cleared, the socket buffer is flushed, and everything starts from the beginning. This makes it easy to practice multiple rounds and test improvements.
Project Structure and GitHub

The codebase is split into clearly named folders:
- AI/: Contains the laptop-side inference script and song preview code.
- RPi/: Includes the LCD class, socket client, and logging logic, plus a launcher script (run_all_pi.py) that starts both Pi-side scripts in parallel for convenience (a rough sketch of such a launcher appears at the end of this step).
- assets/: Stores the audio preview file.
- drafts/: Unused experimental code.
The project includes a README.md for general instructions and a feedforward.md with reflections on the process.
You can check out my GitHub here.
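As a rough idea of what a launcher like run_all_pi.py can do, here's a minimal sketch using subprocess; the script names are placeholders, not my actual file names:

```python
import subprocess

# Placeholder script names; the real launcher is RPi/run_all_pi.py
SCRIPTS = ["RPi/camera_stream.py", "RPi/feedback_loop.py"]

# Start both Pi-side scripts as parallel processes and wait for them to finish
processes = [subprocess.Popen(["python3", script]) for script in SCRIPTS]
for process in processes:
    process.wait()
```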
Designing the Wooden Enclosure
Before I could build anything, I had to make sure the camera would be positioned at the right height and angle. I measured that the optimal height for capturing the keyboard was between 40 and 45 cm from the base. With that in mind, I created a 3D box using MakerCase with finger joints. Then, I imported the design into Inkscape to customize it further.
I added precise cutouts for the USB webcam (4.5 x 2 cm) and the LCD (8 x 3.6 cm), and made some creative additions too: I extended the bottom front panel so it could hold a music sheet, and also extended the top to make space for gluing an LED strip. This extra space made the setup more functional and visually interesting.
Building the Wooden Enclosure
Once the design was ready, I exported it and laser-cut the parts at school using 4 mm multiplex sheets. After all the pieces were cut, I assembled them with wood glue and added hand-drawn details to make it more personal. The finger joints made alignment easy, and the final structure was solid and stable.
There's a hole on the side for cables to exit, and I used the extended panels just as I planned: one for placing a music sheet while playing, and the other to attach an LED strip to light up the music sheet.
Final Thoughts


This project helped me grow as both a developer and a maker. I learned how to collect and annotate a dataset, train a model, debug real-time systems, use bitwise operations for LCD control, and build a physical prototype.
It's not flawless: fast or soft presses sometimes go undetected, but it works reliably when the song is played at a steady rhythm. Most importantly, it's a fun and motivating way to practice piano and explore the intersection of music, AI, and hardware. I'm incredibly proud of this first solo project and excited to keep learning from here.