No Controller? No Problem! AI Vision Gaming
by Rafael Aldaz




Have you ever wanted to play a game without a controller, keyboard, or mouse, using just your hands?
That’s exactly what this project explores!
"No Controller? No Problem!" is an AI-powered vision gaming system that lets you play simple games using computer vision and hand gestures. For now, I’ve implemented a Rock-Paper-Scissors game, where the AI detects your hand shape in real time and plays against you. The goal is to create a fun, accessible, and immersive gaming experience, and possibly expand it to other games like Fruit Ninja or Flappy Bird.
This project combines AI, computer vision, and gaming — turning a basic webcam and a Raspberry Pi into an interactive game console.
It’s a fantastic way to learn about OpenCV and gesture recognition while building something that is fun and usable.
This Instructable is aimed at makers with some coding and electronics experience and anyone interested in learning about AI-powered interfaces. By the end of it, you'll have a working prototype of a gesture-controlled game, and lots of ideas to take it even further!
Supplies
Hardware
- Raspberry Pi 5
- 4 Push Buttons
- 16x2 LCD Display
- Buzzer
- Logitech Webcam (or any compatible USB webcam)
- Breadboard, jumper wires, resistors (for connecting buttons & buzzer)
- 3 Servo Motors MG996R
Software
- Visual Studio Code (VS Code)
- Python 3 (via venv environment)
- OpenCV library for Python
- Linux OS (running on the Raspberry Pi)
- HTML / CSS / JavaScript (optional, if you plan to add a web interface)
Recommended Setup
- A well-lit space to ensure good webcam image quality and gesture detection.
- A stable background behind your hand (helps improve computer vision accuracy).
See the "Bill of Materials" attached under Downloads below.
Downloads
Gathering a Dataset
Before training any AI model to recognize hand gestures, you first need a good dataset.
In this step, I’ll show you how I recorded videos of the gestures and transformed them into images for training.
1. Recording Videos
To create my dataset, I recorded 16 short videos using my phone and laptop camera at 1920x1080 resolution.
Each video lasted about 10–15 seconds and included one of the following gestures:
- ✊ Rock
- 🖐️ Paper
- ✌️ Scissors
Tips for recording:
- I used a white background and ensured the lighting was consistent and bright.
- I varied the position, angle, and scale of my hand slightly to add natural variation to the data.
2. Extracting Images from Videos
Once the videos were recorded, I wrote a Python script to extract individual frames from them.
- I configured the script to extract 5 frames per second from each video.
- The images were initially saved at the original video resolution (1920x1080).
- I ended up generating 688 images in total (across all gestures).
Here is a simplified version of the extraction process:
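A minimal sketch of that script, assuming the recordings sit in a videos/ folder and the frames are written to dataset/ (both placeholder paths; adjust the rate and file pattern to your own setup):

```python
import cv2
from pathlib import Path

VIDEO_DIR = Path("videos")      # placeholder: folder with the recorded clips
OUTPUT_DIR = Path("dataset")    # placeholder: folder for the extracted frames
FRAMES_PER_SECOND = 5           # how many frames to keep per second of video

OUTPUT_DIR.mkdir(exist_ok=True)

for video_path in VIDEO_DIR.glob("*.mp4"):
    cap = cv2.VideoCapture(str(video_path))
    video_fps = cap.get(cv2.CAP_PROP_FPS) or 30
    step = max(int(video_fps // FRAMES_PER_SECOND), 1)  # keep every Nth frame

    frame_index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_index % step == 0:
            out_name = OUTPUT_DIR / f"{video_path.stem}_{saved:04d}.jpg"
            cv2.imwrite(str(out_name), frame)  # saved at the original resolution
            saved += 1
        frame_index += 1
    cap.release()
    print(f"{video_path.name}: saved {saved} frames")
```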
3. Preparing for Labeling
At this point, I had a folder full of images of Rock, Paper, and Scissors gestures.
Next step? Uploading them to Roboflow for labeling and augmentation; we’ll cover that in the next section.
Preparing and Augmenting the Dataset With Roboflow



After extracting the images, the next step was to label them and prepare them for training.
For this, I used Roboflow, a fantastic web tool that makes it easy to label, augment, and export datasets for computer vision projects.
1. Uploading Images
I uploaded the 688 images extracted from the videos into a new Object Detection project in Roboflow.
I used object detection because I wanted to give the model the ability to detect the gesture in an image, even if it appears at different positions or scales (vs. simple classification).
2. Defining Classes
I defined 3 classes in Roboflow:
- rock
- paper
- scissors
At this stage, I did not explicitly include a "background" class, but in practice the model will learn to ignore areas without a hand.
3. Labeling the Images
I manually drew bounding boxes around the hand in each image and labeled it as one of the three classes.
This process took some time, but accurate labeling is key for a good model!
4. Preprocessing and Augmentation
Once the labeling was complete, I used Roboflow to apply preprocessing and data augmentation:
Preprocessing
- Auto-orient: Applied
- Resize: 640×640 pixels (Roboflow stretched the original images to this size)
Augmentations:
- Horizontal flip
- Rotation: Between -15° and +15°
- Saturation: -25% to +25%
- Brightness: -15% to +15%
- Noise: up to ~0.89% of pixels
Each augmentation generated multiple variations of the training images, helping the model become more robust.
5. Dataset Split
Roboflow automatically split the dataset into:
- Training set: 1434 images
- Validation set: 137 images
- Test set: 68 images
Total dataset size after augmentations: 1639 images.
6. Exporting the Dataset
Finally, I exported the dataset in YOLOv5/YOLOv7/YOLOv8 format (YOLOv11 at first) for object detection:
- Image size: 640×640
- Format compatible with popular YOLO object detection frameworks.
Summary
At the end of this step, I had a large, augmented dataset ready to use for training an object detection model to recognize Rock, Paper, and Scissors gestures in real time.
In the next step, I’ll show how I trained the model, both on Roboflow and later locally with YOLOv8 for better results.
Training the Model


With the dataset prepared and labeled, the next step was to train an object detection model that could recognize gestures in real time.
1. First Training Run (Roboflow)
To quickly test the dataset and validate the labels, I first used Roboflow’s built-in Custom Train option:
- I selected the YOLOv11 training model on Roboflow.
- The model was trained directly in the cloud on my augmented dataset (9160 images).
This gave me an initial working model and allowed me to inspect its performance (you can see the confusion matrix screenshot).
👉 The initial results were promising — the model could correctly classify most gestures, but I wanted to improve robustness and accuracy.
2. Local Training with YOLOv8
For better performance and more control, I then switched to training locally using YOLOv8.
In this part, I used both Roboflow to prepare the dataset and YOLOv8 locally to train the model with more flexibility and performance.
2.1. Merging the Dataset
To improve model robustness, I merged my original dataset (688 images) with an existing dataset:
- Final merged dataset: 3812 images (before augmentation).
- I performed additional augmentation in Roboflow:
- Blur: up to 2.5 px
- Noise: up to 0.73% of pixels
- Brightness: between -15% and +15%
- Final dataset after augmentation: 9160 images.
2.2. Downloading the Dataset
To use the dataset locally with YOLOv8, I downloaded it using the Roboflow Python API:
- The dataset was exported in YOLOv8 format.
- Image size: 640×640 pixels.
- Labels were stored in the YOLO format (.txt files for each image).
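For reference, a download with the Roboflow Python package looks roughly like this; the API key, workspace, project name, and version number are placeholders for your own values:

```python
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")  # placeholder API key
project = rf.workspace("your-workspace").project("rock-paper-scissors")  # placeholder names
dataset = project.version(1).download("yolov8")  # placeholder version; exports in YOLOv8 format

print("Dataset downloaded to:", dataset.location)
```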
2.3 Training the Model Locally with YOLOv8
For local training, I used YOLOv8n (the Nano version — fast and lightweight) via the Ultralytics library.
I wrote a Python script to configure and run the training:
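The core of that script, using the Ultralytics API, looks roughly like this; the data.yaml path and batch size below are assumptions to adapt to your own export and GPU:

```python
from ultralytics import YOLO

# Start from the pretrained YOLOv8 Nano checkpoint
model = YOLO("yolov8n.pt")

# Train on the Roboflow export; data.yaml points to the train/valid/test splits
model.train(
    data="rock-paper-scissors/data.yaml",  # placeholder path to the Roboflow export
    epochs=50,
    imgsz=640,
    batch=16,    # adjust to your GPU memory
    device=0,    # 0 = first GPU, or "cpu"
)
```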
2.4 Training Results
- I trained for 50 epochs on my local machine with a GPU.
- Final model achieved very good accuracy across all three classes (per-class accuracy from the confusion matrix):
- Paper: ~94%
- Rock: ~95%
- Scissors: ~95%
- Example Confusion Matrix (see image)
- Example Validation Batch Predictions (see image)
- The model performs well even with different backgrounds and lighting variations.
Summary
At the end of this step, I had a YOLOv8n object detection model trained locally, ready to be used in real-time with a webcam.
Next, the goal is to integrate it with OpenCV and build the actual gesture-controlled Rock-Paper-Scissors game!
Running the Model

1. Preparing the Model
- I used the best model from my training run:
- File: weights/best.pt
- Model type: YOLOv8n (Nano)
- The model was designed to detect Rock, Paper, and Scissors gestures.
2. Setting Up the Testing Script
I wrote a Python script to:
- Capture video from the webcam.
- Perform inference on each frame.
- Display the results in real time.
Here is a simplified overview of the workflow:
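A minimal sketch of that loop, assuming the trained weights sit at weights/best.pt and the webcam is device 0:

```python
import cv2
from ultralytics import YOLO

model = YOLO("weights/best.pt")   # trained Rock-Paper-Scissors model
cap = cv2.VideoCapture(0)         # first USB webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Run inference on the current frame
    results = model(frame, verbose=False)[0]

    # Draw the detections (boxes + labels) on the frame
    annotated = results.plot()

    cv2.imshow("Rock-Paper-Scissors", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```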
3. Displaying Results
To enhance the user experience:
- I added a top-center banner showing the last detected gesture (for 2 seconds).
- If no gesture was detected, the app showed the message "Show your hand gesture!".
Example display logic:
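One possible way to write that banner logic (a sketch, not the exact code): frame and results come from the loop above, and the class-name mapping can be taken from model.names.

```python
import time
import cv2

last_gesture = None
last_seen = 0.0
BANNER_SECONDS = 2  # how long the banner stays visible after a detection

def update_banner(frame, results, class_names):
    """Draw a top-center banner with the last detected gesture."""
    global last_gesture, last_seen

    if len(results.boxes) > 0:
        # Take the highest-confidence detection as "the" gesture
        best = max(results.boxes, key=lambda b: float(b.conf))
        last_gesture = class_names[int(best.cls)]
        last_seen = time.time()

    if last_gesture and time.time() - last_seen < BANNER_SECONDS:
        text = f"Detected: {last_gesture}"
    else:
        text = "Show your hand gesture!"

    # Center the text near the top of the frame
    (w, _), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 1, 2)
    x = (frame.shape[1] - w) // 2
    cv2.putText(frame, text, (x, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
```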
Summary
At the end of this step, I had a fully working prototype:
✅ YOLOv8-trained model
✅ OpenCV-based live camera integration
✅ Real-time gesture recognition and display
This was the foundation for building a gesture-controlled Rock-Paper-Scissors game, without using a controller!
Building the AI Vision Gaming Web App


Once the model was working in real time, I decided to give the project a visual interface, so I built a web-based game launcher using HTML, CSS, JavaScript, and Flask (Python).
This web app gives users an easy and fun way to access AI-powered games using just a webcam.
Currently, the only implemented game is Rock-Paper-Scissors, but I plan to add Flappy Bird and Snake soon!
1. Folder Structure
Here’s a quick overview of the project structure:
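The layout below is an illustrative sketch of how a Flask project like this is commonly organized; the actual file names may differ:

```
ai-vision-gaming/
├── app.py              # Flask server + YOLOv8 inference
├── weights/
│   └── best.pt         # trained gesture model
├── templates/
│   ├── index.html      # game launcher
│   └── rps.html        # Rock-Paper-Scissors page
└── static/
    ├── css/style.css
    ├── js/rps.js       # webcam capture + game logic
    └── img/            # icons and backgrounds
```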
2. Home Screen: Game Launcher
The main menu shows a clean UI with three game options:
- 🪨 Rock-Paper-Scissors: Active and working!
- 🐤 Flappy Bird: Coming soon
- 🐍 Snake Game: Coming soon
The layout is responsive and visually engaging, using custom icons and soft background gradients.
3. Rock-Paper-Scissors Game Page
Clicking on the RPS game brings you to an interactive game screen:
Features:
- 🎥 Live webcam feed using JavaScript + WebRTC
- ✊✋✌️ Real-time gesture detection via backend Python/YOLOv8
- 🤖 AI makes a random move
- 🧠 The round is evaluated: win, lose, or draw
- 🧮 Score is tracked live (Player vs AI)
4. Backend Integration (Flask + YOLOv8)
The Python app.py file:
- Starts a Flask server
- Serves HTML templates and static files
- Handles webcam input or requests for model prediction
- Loads best.pt model using Ultralytics and processes frames in real time
The backend uses YOLOv8 to:
- Detect the hand gesture
- Return the class label and confidence to the front-end
- Let the JavaScript UI update based on the prediction
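A stripped-down sketch of such a backend, assuming the browser POSTs captured frames to a /predict route (the route name and form field below are placeholders, not necessarily the ones used in app.py):

```python
import cv2
import numpy as np
from flask import Flask, render_template, request, jsonify
from ultralytics import YOLO

app = Flask(__name__)
model = YOLO("weights/best.pt")  # trained Rock-Paper-Scissors model

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    # Decode the JPEG frame sent by the browser
    data = np.frombuffer(request.files["frame"].read(), dtype=np.uint8)
    frame = cv2.imdecode(data, cv2.IMREAD_COLOR)

    results = model(frame, verbose=False)[0]
    if len(results.boxes) == 0:
        return jsonify({"gesture": None})

    best = max(results.boxes, key=lambda b: float(b.conf))
    return jsonify({
        "gesture": results.names[int(best.cls)],  # "rock", "paper" or "scissors"
        "confidence": float(best.conf),
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```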
5. Technologies Used
- 🐍 Python + Flask (backend)
- 🧠 YOLOv8 (gesture recognition model)
- 🎨 HTML/CSS (interface styling)
- ⚙️ JavaScript (camera, game logic, dynamic updates)
- 🎥 WebRTC / getUserMedia (access webcam in-browser)
Summary
At this point, I had a fully working AI vision web app for playing Rock-Paper-Scissors using just gestures and a webcam.
It provides a clean user interface and real-time AI feedback without using any controllers, just your hands!
Designing and Building the Enclosure



To make the project feel like a real gaming console, not just a tangle of wires and components, I designed and built a custom laser-cut wooden enclosure. The result is a clean, functional, and stylish box that houses the Raspberry Pi, buttons, LCD, and buzzer.
1. Box Design Tools
To speed up the design process, I used two tools:
- 🧩 Boxes.py — to generate the basic interlocking box shape
- 🖌️ Adobe Illustrator — to customize dimensions, add cutouts, labels, and engravings
This allowed me to add:
- Custom holes for the buttons, LCD, and camera
- Slanted top for better visibility
- Logo and project title engraving
2. Laser Cutting
I used a laser cutter to create the pieces from wood. Here are a few important laser-cutting settings I followed:
- Cut lines: Red color (#FF0000)
- Stroke width: Exactly 0.025 mm (thin vector cut line)
- Engraving: Filled text and icons in black or grayscale
- Material: 8 mm plywood
3. Assembly & Aesthetics
Once the parts were cut, I assembled them by hand — no glue needed thanks to the tight interlocking edges. I engraved the front with:
"No Controller? No Problem! AI Vision Gaming"
Also engraved:
- Arrow indicators, a check mark, and a home icon for the buttons
- Personal signature: Rafael Aldaz, June 2025
Features:
- 🎮 4 buttons (2 red, 1 yellow, 1 green)
- 🖥️ Center-aligned 16x2 LCD with a 3D-printed bezel
- 🔊 Internal buzzer
- 🧠 Space inside for the Raspberry Pi + wiring
4. LCD Mount
To give the LCD a clean and protected look, I 3D printed a custom bezel for it.
It aligns perfectly with the laser-cut hole and makes the build feel like a finished product.
Summary
Designing the case was a key part of turning this from a tech experiment into a real physical AI-powered game console. It gave the project:
- A portable and durable structure
- Clearly marked buttons and inputs
- An engraved identity and personal touch
Wiring the Electronics and Connecting to the Raspberry Pi
To give the game console physical interactivity, I wired 4 buttons, a buzzer, and a 16x2 LCD to the Raspberry Pi 5.
Instead of letting Python directly control the game logic, I created a WebSocket server that connects the hardware to the HTML/JS interface — so the browser reacts live when a button is pressed!
1. Components Used
- 🟢 4 Push Buttons:
- left: Scroll game menu left
- right: Scroll game menu right
- enter: Confirm selection / launch game
- home: Go back to main menu
- 🔊 Buzzer:
- Short sound for menu switches
- Long beep for selection or going back
- 🖥️ 16x2 LCD:
- Shows current game selection or status messages
2. GPIO Setup
Using the RPi.GPIO library in BCM mode, each button is set up with a pull-up resistor and listens for a FALLING edge (button press):
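A sketch of that setup; the BCM pin numbers below are placeholders for whatever pins you actually wired:

```python
import RPi.GPIO as GPIO

# Placeholder BCM pin numbers -- use the pins you actually wired
BUTTON_PINS = {"left": 5, "right": 6, "enter": 13, "home": 19}
BUZZER_PIN = 26

GPIO.setmode(GPIO.BCM)
GPIO.setup(BUZZER_PIN, GPIO.OUT)

def on_button(channel):
    # Map the GPIO channel back to the button name and handle it
    name = next(n for n, pin in BUTTON_PINS.items() if pin == channel)
    print(f"Button pressed: {name}")

for name, pin in BUTTON_PINS.items():
    GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)          # internal pull-up
    GPIO.add_event_detect(pin, GPIO.FALLING,                    # triggers on press
                          callback=on_button, bouncetime=200)   # simple debounce
```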
3. LCD Integration
I used a custom LCD Python class to initialize and control the display.
On each button press, the LCD updates the text to reflect the selected game or status:
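Since the LCD class is custom, the snippet below only illustrates the idea with a hypothetical lcd object; the method names are placeholders:

```python
GAMES = ["Rock-Paper-Scissors", "Flappy Bird", "Snake"]
current = 0

def show_selection(lcd):
    # lcd is the custom display driver; these method names are illustrative only
    lcd.clear()
    lcd.write_line(0, "Select a game:")
    lcd.write_line(1, GAMES[current][:16])  # 16x2 display: trim to 16 characters
```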
4. Real-Time Web Communication (WebSockets)
The real magic comes from the WebSocket server I built using websockets and asyncio.
This allows bidirectional communication between the Raspberry Pi and the web frontend.
Here’s how it works:
- When a button is pressed on the Pi, the Python script:
- Updates the LCD
- Plays a buzzer sound
- Sends a JSON message to all connected browsers
- Example message sent:
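The exact payload depends on your own protocol; an illustrative message (with made-up field names) could be built like this:

```python
import json

# Illustrative message only -- the real keys and values depend on your protocol
message = json.dumps({
    "action": "select",
    "game": "rock-paper-scissors",
    "index": 0,
})
```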
On the frontend, JavaScript listens for these messages to update the game interface accordingly.
5. Sample WebSocket Flow
notify_button() — called when a GPIO button is triggered:
- Updates game index
- Shows the new selection on the LCD
- Plays a sound
- Sends a message to the web client
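A condensed sketch of that flow, with illustrative state and message fields (the LCD and buzzer calls are summarized as comments):

```python
import asyncio
import json

connected_clients = set()   # open WebSocket connections (filled by the server below)
games = ["Rock-Paper-Scissors", "Flappy Bird", "Snake"]
current_index = 0

async def notify_button(name):
    """Handle a button press: update state, LCD, buzzer, and every browser."""
    global current_index
    if name == "right":
        current_index = (current_index + 1) % len(games)
    elif name == "left":
        current_index = (current_index - 1) % len(games)

    # ...update the LCD text and play the buzzer sound here...

    message = json.dumps({"action": name, "index": current_index})
    # Broadcast the update to every connected browser
    await asyncio.gather(*(ws.send(message) for ws in connected_clients))
```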
The JavaScript in the browser opens a WebSocket connection to the Pi (port 8765) and reacts to each incoming message by updating the menu and game view.
6. Running the Server
The main() coroutine runs the WebSocket server on port 8765:
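In sketch form, using the websockets library (recent versions accept a single-argument handler), the server simply registers each browser and keeps the connection open:

```python
import asyncio
import websockets

async def handler(websocket):
    # Register the browser so notify_button() can broadcast to it
    connected_clients.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        connected_clients.remove(websocket)

async def main():
    # Serve WebSocket connections on port 8765 until the process is stopped
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever
```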
Use asyncio.run(main()) to start it, and GPIO.cleanup() on exit to reset the pins safely.
Summary
In this step, I successfully bridged physical hardware with a web-based AI game interface using:
✅ GPIO input for real buttons
✅ Real-time LCD updates
✅ Buzzer feedback
✅ WebSocket server to sync everything to the browser
The result is a real-world console experience powered by AI and computer vision, no controller needed!
Building the Robotic Hand to Show the AI’s Move



To make the AI more expressive and fun, I added a robotic hand that performs the AI’s move in real time.
Instead of keeping everything on screen, I wanted the AI to physically throw rock, paper, or scissors, just like a real opponent.
1. Repurposed Toy Hand
I didn’t design the hand from scratch; instead, I used a kids' robotic hand toy and modified it to be servo-controlled.
- I mounted it to a wooden baseplate
- I engraved the base with:
RoboHand — Rafael Aldaz
2. Servo Setup
I used three MG996R servo motors to control finger groups:
- Servo 1: Controls the thumb
- Servo 2: Controls the index and middle fingers
- Servo 3: Controls the ring and pinky fingers
Each servo pulls a set of strings attached to its finger group.
3. Hand Gestures Mapping
When the AI selects its move (via the YOLOv8 model), a message is sent to the servos to form:
- 🪨 Rock: All fingers closed
- ✋ Paper: All fingers open
- ✌️ Scissors: Index + middle fingers open, others closed
4. Electronics & Integration
- The servos are controlled by a PCA9685 driver for reliable PWM output
- The Raspberry Pi sends commands to the servo controller whenever the AI makes a move
- The movement is perfectly synced with the web game interface via WebSockets
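As a concrete illustration, driving the PCA9685 with the Adafruit ServoKit library could look like this; the channel numbers and open/closed angles are guesses that depend entirely on how the strings are rigged:

```python
from adafruit_servokit import ServoKit

kit = ServoKit(channels=16)   # PCA9685 16-channel PWM driver

# Assumed channel mapping: 0 = thumb, 1 = index+middle, 2 = ring+pinky
# Open/closed angles are placeholders -- tune them to your string tension
GESTURES = {
    "rock":     {0: 0,   1: 0,   2: 0},     # all fingers closed
    "paper":    {0: 180, 1: 180, 2: 180},   # all fingers open
    "scissors": {0: 0,   1: 180, 2: 0},     # only index + middle open
}

def show_move(move):
    """Drive the three servos to form the AI's chosen gesture."""
    for channel, angle in GESTURES[move].items():
        kit.servo[channel].angle = angle
```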
Summary
This final step brought the AI’s decisions into the physical world, turning it from a virtual opponent into a robotic player sitting across from you.
Whether it wins or loses, the hand makes it feel personal and real.
✅ Repurposed toy hand
✅ Controlled by 3 servos
✅ Reacts instantly to AI predictions
✅ Totally integrated with the AI game console
Final Demo, How to Run It, and What’s Next
After weeks of work, combining AI, vision, hardware, and web technologies, the project finally came together as a fully interactive, controller-free gaming console.
In this step, I’ll show you how everything runs together and share ideas for future improvements.
✅ Final System Demo
The completed setup includes:
- 🖥️ A clean web interface that acts as the game launcher and visual feedback hub
- 🎮 Physical buttons to navigate menus, select games, and interact
- 🔊 A buzzer and LCD for sound and status feedback
- 👁️ A camera-powered AI using YOLOv8 to detect hand gestures
- 🤖 A robotic hand that physically performs the AI’s move in Rock-Paper-Scissors
All components talk to each other via WebSockets, allowing seamless communication between the Raspberry Pi and the browser.
🟢 Example Flow:
- You press “Enter” on the console to start the Rock-Paper-Scissors game
- The webcam opens inside the browser
- You show a gesture with your hand
- The AI model detects it and makes its own move
- The robotic hand mimics the AI’s move, and the score updates!
▶️ How to Run the System
To run the full system, follow these steps:
1. 🧠 Start the AI Model Server (YOLOv8)
2. 🌐 Launch the Web App (Flask)
3. 🔌 On the Raspberry Pi: Run the GPIO Controller
Make sure:
- The LCD, buzzer, and buttons are wired correctly
- The servos and PCA9685 are powered and connected
- All scripts are in sync with the right ports and IPs
What’s Next?
This project was designed to be expandable, and here are some ideas I plan to work on:
- 🐤 Flappy Bird: Use vertical hand movements to control the bird’s flaps
- 🐍 Snake Game: Head movements or gestures to control the snake
- 🕹️ Multiplayer Mode: Play against another human using dual camera inputs
- 🧠 Model Improvements: Better gesture recognition in low light or cluttered backgrounds
- 🧩 Modular Console Design: Swappable front plates for different input methods (joystick, MPU6050, facial expressions)
🧠 What I Learned
This project brought together:
- Real-time AI inference
- GPIO electronics and LCD control
- 3D printing and laser cutting
- Async WebSocket communication
- UX design for browser-based games
But most of all, it showed me how creative tech can become when AI, hardware, and software all work together.
🏁 Conclusion
No Controller? No Problem! isn’t just a game; it’s a platform for rethinking how we interact with computers.
You don’t need a controller when you’ve got vision, AI, and a robotic hand as your gaming partner.
Thanks for reading, and feel free to remix, expand, or build your own version of this project!
Have Fun and Make It Yours!
You’ve reached the end of this Instructable, but the real fun is just beginning.
Whether you're building this for yourself, a school project, or just to explore the possibilities of AI and robotics, remember:
🎮 You don’t need a controller to play. You are the controller.
This project is meant to be hacked, customized, remixed, and shared. You can:
- Add your own games
- Use facial expressions or head gestures instead of hands
- Customize the sounds, visuals, and animations
- Even create your own robotic opponent!
So fire up the laser cutter, grab a servo or two, write some code, and have fun building something that's totally yours.
And if you do make your own version, I’d love to see it!
🚀 No Controller? No Problem! is more than a slogan — it's an invitation to invent.
Happy building! 🛠️🧠🎉