No Controller? No Problem! AI Vision Gaming
by Rafael Aldaz




Have you ever wanted to play a game without a controller, keyboard, or mouse, using just your hands?
That’s exactly what this project explores!
"No Controller? No Problem!" is an AI-powered vision gaming system that lets you play simple games using computer vision and hand gestures. For now, I’ve implemented a Rock-Paper-Scissors game, where the AI detects your hand shape in real time and plays against you. The goal is to create a fun, accessible, and immersive gaming experience, and possibly expand it to other games like Fruit Ninja or Flappy Bird.
This project combines AI, computer vision, and gaming — turning a basic webcam and a Raspberry Pi into an interactive game console.
It’s a fantastic way to learn about OpenCV and gesture recognition while building something that is fun and usable.
This Instructable is aimed at makers with some coding and electronics experience and anyone interested in learning about AI-powered interfaces. By the end of it, you'll have a working prototype of a gesture-controlled game, and lots of ideas to take it even further!
Supplies
Hardware
- Raspberry Pi 5
- 4 Push Buttons
- 16x2 LCD Display
- Buzzer
- Logitech Webcam (or any compatible USB webcam)
- Breadboard, jumper wires, resistors (for connecting buttons & buzzer)
- 3 Servo Motors MG996R
Software
- Visual Studio Code (VS Code)
- Python 3 (via venv environment)
- OpenCV library for Python
- Linux OS (running on the Raspberry Pi)
- HTML / CSS / JavaScript (optional, if you plan to add a web interface)
Recommended Setup
- A well-lit space to ensure good webcam image quality and gesture detection.
- A stable background behind your hand (helps improve computer vision accuracy).
See the "Bill of Materials" attached under Downloads below.
Downloads
Gathering a Dataset
Before training any AI model to recognize hand gestures, you first need a good dataset.
In this step, I’ll show you how I recorded videos of the gestures and transformed them into images for training.
1. Recording Videos
To create my dataset, I recorded 16 short videos using my phone and laptop camera at 1920x1080 resolution.
Each video lasted about 10–15 seconds and included one of the following gestures:
- ✊ Rock
- 🖐️ Paper
- ✌️ Scissors
Tips for recording:
- I used a white background and ensured the lighting was consistent and bright.
- I varied the position, angle, and scale of my hand slightly to add natural variation to the data.
2. Extracting Images from Videos
Once the videos were recorded, I wrote a Python script to extract individual frames from them.
- I configured the script to extract 5 frames per second from each video.
- The images were initially saved at the original video resolution (1920x1080).
- I ended up generating 688 images in total (across all gestures).
Here is a simplified version of the extraction process:
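A minimal sketch of that script, assuming the recordings sit in a videos/ folder and the frames are written to dataset/ (both placeholder paths; adjust the rate and file pattern to your own setup):

```python
import cv2
from pathlib import Path

VIDEO_DIR = Path("videos")      # placeholder: folder with the recorded clips
OUTPUT_DIR = Path("dataset")    # placeholder: folder for the extracted frames
FRAMES_PER_SECOND = 5           # how many frames to keep per second of video

OUTPUT_DIR.mkdir(exist_ok=True)

for video_path in VIDEO_DIR.glob("*.mp4"):
    cap = cv2.VideoCapture(str(video_path))
    video_fps = cap.get(cv2.CAP_PROP_FPS) or 30
    step = max(int(video_fps // FRAMES_PER_SECOND), 1)  # keep every Nth frame

    frame_index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_index % step == 0:
            out_name = OUTPUT_DIR / f"{video_path.stem}_{saved:04d}.jpg"
            cv2.imwrite(str(out_name), frame)  # saved at the original resolution
            saved += 1
        frame_index += 1
    cap.release()
    print(f"{video_path.name}: saved {saved} frames")
```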
3. Preparing for Labeling
At this point, I had a folder full of images of Rock, Paper, and Scissors gestures.
Next step? Uploading them to Roboflow for labeling and augmentation; we’ll cover that in the next section.
Preparing and Augmenting the Dataset With Roboflow



After extracting the images, the next step was to label them and prepare them for training.
For this, I used Roboflow, a fantastic web tool that makes it easy to label, augment, and export datasets for computer vision projects.
1. Uploading Images
I uploaded the 688 images extracted from the videos into a new Object Detection project in Roboflow.
I used object detection because I wanted to give the model the ability to detect the gesture in an image, even if it appears at different positions or scales (vs. simple classification).
2. Defining Classes
I defined 3 classes in Roboflow:
- rock
- paper
- scissors
At this stage, I did not explicitly include a "background" class, but in practice the model will learn to ignore areas without a hand.
3. Labeling the Images
I manually drew bounding boxes around the hand in each image and labeled it as one of the three classes.
This process took some time, but accurate labeling is key for a good model!
4. Preprocessing and Augmentation
Once the labeling was complete, I used Roboflow to apply preprocessing and data augmentation:
Preprocessing
- Auto-orient: Applied
- Resize: 640×640 pixels (Roboflow stretched the original images to this size)
Augmentations:
- Horizontal flip
- Rotation: Between -15° and +15°
- Saturation: -25% to +25%
- Brightness: -15% to +15%
- Noise: up to ~0.89% of pixels
Each augmentation generated multiple variations of the training images, helping the model become more robust.
5. Dataset Split
Roboflow automatically split the dataset into:
- Training set: 1434 images
- Validation set: 137 images
- Test set: 68 images
Total dataset size after augmentations: 1639 images.
6. Exporting the Dataset
Finally, I exported the dataset in YOLOv5/YOLOv7/YOLOv8 format (YOLOv11 at first) for object detection:
- Image size: 640×640
- Format compatible with popular YOLO object detection frameworks.
Summary
At the end of this step, I had a large, augmented dataset ready to use for training an object detection model to recognize Rock, Paper, and Scissors gestures in real time.
In the next step, I’ll show how I trained the model, both on Roboflow and later locally with YOLOv8 for better results.
Training the Model


With the dataset prepared and labeled, the next step was to train an object detection model that could recognize gestures in real time.
1. First Training Run (Roboflow)
To quickly test the dataset and validate the labels, I first used Roboflow’s built-in Custom Train option:
- I selected the YOLOv11 training model on Roboflow.
- The model was trained directly in the cloud on my augmented dataset (9160 images).
This gave me an initial working model and allowed me to inspect its performance (you can see the confusion matrix screenshot).
👉 The initial results were promising — the model could correctly classify most gestures, but I wanted to improve robustness and accuracy.
2. Local Training with YOLOv8
For better performance and more control, I then switched to training locally using YOLOv8.
In this part, I used both Roboflow to prepare the dataset and YOLOv8 locally to train the model with more flexibility and performance.
2.1. Merging the Dataset
To improve model robustness, I merged my original dataset (688 images) with an existing dataset:
- Final merged dataset: 3812 images (before augmentation).
- I performed additional augmentation in Roboflow:
- Blur: up to 2.5 px
- Noise: up to 0.73% of pixels
- Brightness: between -15% and +15%
- Final dataset after augmentation: 9160 images.
2.2. Downloading the Dataset
To use the dataset locally with YOLOv8, I downloaded it using the Roboflow Python API:
- The dataset was exported in YOLOv8 format.
- Image size: 640×640 pixels.
- Labels were stored in the YOLO format (.txt files for each image).
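For reference, a download with the Roboflow Python package looks roughly like this; the API key, workspace, project name, and version number are placeholders for your own values:

```python
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")  # placeholder API key
project = rf.workspace("your-workspace").project("rock-paper-scissors")  # placeholder names
dataset = project.version(1).download("yolov8")  # placeholder version; exports in YOLOv8 format

print("Dataset downloaded to:", dataset.location)
```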
2.3 Training the Model Locally with YOLOv8
For local training, I used YOLOv8n (the Nano version — fast and lightweight) via the Ultralytics library.
I wrote a Python script to configure and run the training:
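The core of that script, using the Ultralytics API, looks roughly like this; the data.yaml path and batch size below are assumptions to adapt to your own export and GPU:

```python
from ultralytics import YOLO

# Start from the pretrained YOLOv8 Nano checkpoint
model = YOLO("yolov8n.pt")

# Train on the Roboflow export; data.yaml points to the train/valid/test splits
model.train(
    data="rock-paper-scissors/data.yaml",  # placeholder path to the Roboflow export
    epochs=50,
    imgsz=640,
    batch=16,    # adjust to your GPU memory
    device=0,    # 0 = first GPU, or "cpu"
)
```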
2.4 Training Results
- I trained for 50 epochs on my local machine with a GPU.
- Final model achieved very good accuracy across all three classes (per-class accuracy from the confusion matrix):
- Paper: ~94%
- Rock: ~95%
- Scissors: ~95%
- Example Confusion Matrix (see image)
- Example Validation Batch Predictions (see image)
- The model performs well even with different backgrounds and lighting variations.
Summary
At the end of this step, I had a YOLOv8n object detection model trained locally, ready to be used in real-time with a webcam.
Next, the goal is to integrate it with OpenCV and build the actual gesture-controlled Rock-Paper-Scissors game!
Running the Model

1. Preparing the Model
- I used the best model from my training run:
- File: weights/best.pt
- Model type: YOLOv8n (Nano)
- The model was designed to detect Rock, Paper, and Scissors gestures.
2. Setting Up the Testing Script
I wrote a Python script to:
- Capture video from the webcam.
- Perform inference on each frame.
- Display the results in real time.
Here is a simplified overview of the workflow:
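A minimal sketch of that loop, assuming the trained weights sit at weights/best.pt and the webcam is device 0:

```python
import cv2
from ultralytics import YOLO

model = YOLO("weights/best.pt")   # trained Rock-Paper-Scissors model
cap = cv2.VideoCapture(0)         # first USB webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Run inference on the current frame
    results = model(frame, verbose=False)[0]

    # Draw the detections (boxes + labels) on the frame
    annotated = results.plot()

    cv2.imshow("Rock-Paper-Scissors", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```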
3. Displaying Results
To enhance the user experience:
- I added a top-center banner showing the last detected gesture (for 2 seconds).
- If no gesture was detected, the app showed the message "Show your hand gesture!".
Example display logic:
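One possible way to write that banner logic (a sketch, not the exact code): frame and results come from the loop above, and the class-name mapping can be taken from model.names.

```python
import time
import cv2

last_gesture = None
last_seen = 0.0
BANNER_SECONDS = 2  # how long the banner stays visible after a detection

def update_banner(frame, results, class_names):
    """Draw a top-center banner with the last detected gesture."""
    global last_gesture, last_seen

    if len(results.boxes) > 0:
        # Take the highest-confidence detection as "the" gesture
        best = max(results.boxes, key=lambda b: float(b.conf))
        last_gesture = class_names[int(best.cls)]
        last_seen = time.time()

    if last_gesture and time.time() - last_seen < BANNER_SECONDS:
        text = f"Detected: {last_gesture}"
    else:
        text = "Show your hand gesture!"

    # Center the text near the top of the frame
    (w, _), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 1, 2)
    x = (frame.shape[1] - w) // 2
    cv2.putText(frame, text, (x, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
```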
Summary
At the end of this step, I had a fully working prototype:
✅ YOLOv8-trained model
✅ OpenCV-based live camera integration
✅ Real-time gesture recognition and display
This was the foundation for building a gesture-controlled Rock-Paper-Scissors game, without using a controller!
Building the AI Vision Gaming Web App


Once the model was working in real time, I decided to give the project a visual interface, so I built a web-based game launcher using HTML, CSS, JavaScript, and Flask (Python).
This web app gives users an easy and fun way to access AI-powered games using just a webcam.
Currently, the only implemented game is Rock-Paper-Scissors, but I plan to add Flappy Bird and Snake soon!
1. Folder Structure
Here’s a quick overview of the project structure:
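The layout below is an illustrative sketch of how a Flask project like this is commonly organized; the actual file names may differ:

```
ai-vision-gaming/
├── app.py              # Flask server + YOLOv8 inference
├── weights/
│   └── best.pt         # trained gesture model
├── templates/
│   ├── index.html      # game launcher
│   └── rps.html        # Rock-Paper-Scissors page
└── static/
    ├── css/style.css
    ├── js/rps.js       # webcam capture + game logic
    └── img/            # icons and backgrounds
```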
2. Home Screen: Game Launcher
The main menu shows a clean UI with three game options:
- 🪨 Rock-Paper-Scissors: Active and working!
- 🐤 Flappy Bird: Coming soon
- 🐍 Snake Game: Coming soon
The layout is responsive and visually engaging, using custom icons and soft background gradients.
3. Rock-Paper-Scissors Game Page
Clicking on the RPS game brings you to an interactive game screen:
Features:
- 🎥 Live webcam feed using JavaScript + WebRTC
- ✊✋✌️ Real-time gesture detection via backend Python/YOLOv8
- 🤖 AI makes a random move
- 🧠 The round is evaluated: win, lose, or draw
- 🧮 Score is tracked live (Player vs AI)
4. Backend Integration (Flask + YOLOv8)
The Python app.py file:
- Starts a Flask server
- Serves HTML templates and static files
- Handles webcam input or requests for model prediction
- Loads best.pt model using Ultralytics and processes frames in real time
The backend uses YOLOv8 to:
- Detect the hand gesture
- Return the class label and confidence to the front-end
- Let the JavaScript UI update based on the prediction
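A stripped-down sketch of such a backend, assuming the browser POSTs captured frames to a /predict route (the route name and form field below are placeholders, not necessarily the ones used in app.py):

```python
import cv2
import numpy as np
from flask import Flask, render_template, request, jsonify
from ultralytics import YOLO

app = Flask(__name__)
model = YOLO("weights/best.pt")  # trained Rock-Paper-Scissors model

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    # Decode the JPEG frame sent by the browser
    data = np.frombuffer(request.files["frame"].read(), dtype=np.uint8)
    frame = cv2.imdecode(data, cv2.IMREAD_COLOR)

    results = model(frame, verbose=False)[0]
    if len(results.boxes) == 0:
        return jsonify({"gesture": None})

    best = max(results.boxes, key=lambda b: float(b.conf))
    return jsonify({
        "gesture": results.names[int(best.cls)],  # "rock", "paper" or "scissors"
        "confidence": float(best.conf),
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```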
5. Technologies Used
- 🐍 Python + Flask (backend)
- 🧠 YOLOv8 (gesture recognition model)
- 🎨 HTML/CSS (interface styling)
- ⚙️ JavaScript (camera, game logic, dynamic updates)
- 🎥 WebRTC / getUserMedia (access webcam in-browser)
Summary
At this point, I had a fully working AI vision web app for playing Rock-Paper-Scissors using just gestures and a webcam.
It provides a clean user interface and real-time AI feedback without using any controllers, just your hands!
Designing and Building the Enclosure



To make the project feel like a real gaming console, not just a tangle of wires and components, I designed and built a custom laser-cut wooden enclosure. The result is a clean, functional, and stylish box that houses the Raspberry Pi, buttons, LCD, and buzzer.
1. Box Design Tools
To speed up the design process, I used two tools:
- 🧩 Boxes.py — to generate the basic interlocking box shape
- 🖌️ Adobe Illustrator — to customize dimensions, add cutouts, labels, and engravings
This allowed me to add:
- Custom holes for the buttons, LCD, and camera
- Slanted top for better visibility
- Logo and project title engraving
2. Laser Cutting
I used a laser cutter to create the pieces from wood. Here are a few important laser-cutting settings I followed:
- Cut lines: Red color (#FF0000)
- Stroke width: Exactly 0.025 mm (thin vector cut line)
- Engraving: Filled text and icons in black or grayscale
- Material: 8 mm plywood
3. Assembly & Aesthetics
Once the parts were cut, I assembled them by hand — no glue needed thanks to the tight interlocking edges. I engraved the front with:
"No Controller? No Problem! AI Vision Gaming"
Also engraved:
- Arrow indicators, a check mark, and a home icon for the buttons
- Personal signature: Rafael Aldaz, June 2025
Features:
- 🎮 4 buttons (2 red, 1 yellow, 1 green)
- 🖥️ Center-aligned 16x2 LCD with a 3D-printed bezel
- 🔊 Internal buzzer
- 🧠 Space inside for the Raspberry Pi + wiring
4. LCD Mount
To give the LCD a clean and protected look, I 3D printed a custom bezel for it.
It aligns perfectly with the laser-cut hole and makes the build feel like a finished product.
Summary
Designing the case was a key part of turning this from a tech experiment into a real physical AI-powered game console. It gave the project:
- A portable and durable structure
- Clearly marked buttons and inputs
- An engraved identity and personal touch
Wiring the Electronics and Connecting to the Raspberry Pi
To give the game console physical interactivity, I wired 4 buttons, a buzzer, and a 16x2 LCD to the Raspberry Pi 5.
Instead of letting Python directly control the game logic, I created a WebSocket server that connects the hardware to the HTML/JS interface — so the browser reacts live when a button is pressed!
1. Components Used
- 🟢 4 Push Buttons:
- left: Scroll game menu left
- right: Scroll game menu right
- enter: Confirm selection / launch game
- home: Go back to main menu
- 🔊 Buzzer:
- Short sound for menu switches
- Long beep for selection or going back
- 🖥️ 16x2 LCD:
- Shows current game selection or status messages
2. GPIO Setup
Using the RPi.GPIO library in BCM mode, each button is set up with a pull-up resistor and listens for a FALLING edge (button press):
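A sketch of that setup; the BCM pin numbers below are placeholders for whatever pins you actually wired:

```python
import RPi.GPIO as GPIO

# Placeholder BCM pin numbers -- use the pins you actually wired
BUTTON_PINS = {"left": 5, "right": 6, "enter": 13, "home": 19}
BUZZER_PIN = 26

GPIO.setmode(GPIO.BCM)
GPIO.setup(BUZZER_PIN, GPIO.OUT)

def on_button(channel):
    # Map the GPIO channel back to the button name and handle it
    name = next(n for n, pin in BUTTON_PINS.items() if pin == channel)
    print(f"Button pressed: {name}")

for name, pin in BUTTON_PINS.items():
    GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)          # internal pull-up
    GPIO.add_event_detect(pin, GPIO.FALLING,                    # triggers on press
                          callback=on_button, bouncetime=200)   # simple debounce
```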
3. LCD Integration
I used a custom LCD Python class to initialize and control the display.
On each button press, the LCD updates the text to reflect the selected game or status:
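Since the LCD class is custom, the snippet below only illustrates the idea with a hypothetical lcd object; the method names are placeholders:

```python
GAMES = ["Rock-Paper-Scissors", "Flappy Bird", "Snake"]
current = 0

def show_selection(lcd):
    # lcd is the custom display driver; these method names are illustrative only
    lcd.clear()
    lcd.write_line(0, "Select a game:")
    lcd.write_line(1, GAMES[current][:16])  # 16x2 display: trim to 16 characters
```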
4. Real-Time Web Communication (WebSockets)
The real magic comes from the WebSocket server I built using websockets and asyncio.
This allows bidirectional communication between the Raspberry Pi and the web frontend.
Here’s how it works:
- When a button is pressed on the Pi, the Python script:
- Updates the LCD
- Plays a buzzer sound
- Sends a JSON message to all connected browsers
- Example message sent:
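The exact payload depends on your own protocol; an illustrative message (with made-up field names) could be built like this:

```python
import json

# Illustrative message only -- the real keys and values depend on your protocol
message = json.dumps({
    "action": "select",
    "game": "rock-paper-scissors",
    "index": 0,
})
```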
On the frontend, JavaScript listens for these messages to update the game interface accordingly.
5. Sample WebSocket Flow
notify_button() — called when a GPIO button is triggered:
- Updates game index
- Shows the new selection on the LCD
- Plays a sound
- Sends a message to the web client
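A condensed sketch of that flow, with illustrative state and message fields (the LCD and buzzer calls are summarized as comments):

```python
import asyncio
import json

connected_clients = set()   # open WebSocket connections (filled by the server below)
games = ["Rock-Paper-Scissors", "Flappy Bird", "Snake"]
current_index = 0

async def notify_button(name):
    """Handle a button press: update state, LCD, buzzer, and every browser."""
    global current_index
    if name == "right":
        current_index = (current_index + 1) % len(games)
    elif name == "left":
        current_index = (current_index - 1) % len(games)

    # ...update the LCD text and play the buzzer sound here...

    message = json.dumps({"action": name, "index": current_index})
    # Broadcast the update to every connected browser
    await asyncio.gather(*(ws.send(message) for ws in connected_clients))
```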
The JavaScript in the browser opens a WebSocket connection to the Pi (port 8765) and reacts to each incoming message by updating the menu and game view.
6. Running the Server
The main() coroutine runs the WebSocket server on port 8765:
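In sketch form, using the websockets library (recent versions accept a single-argument handler), the server simply registers each browser and keeps the connection open:

```python
import asyncio
import websockets

async def handler(websocket):
    # Register the browser so notify_button() can broadcast to it
    connected_clients.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        connected_clients.remove(websocket)

async def main():
    # Serve WebSocket connections on port 8765 until the process is stopped
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever
```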
Use asyncio.run(main()) to start it, and GPIO.cleanup() on exit to reset the pins safely.
Summary
In this step, I successfully bridged physical hardware with a web-based AI game interface using:
✅ GPIO input for real buttons
✅ Real-time LCD updates
✅ Buzzer feedback
✅ WebSocket server to sync everything to the browser
The result is a real-world console experience powered by AI and computer vision, no controller needed!
Building the Robotic Hand to Show the AI’s Move



To make the AI more expressive and fun, I added a robotic hand that performs the AI’s move in real time.
Instead of keeping everything on screen, I wanted the AI to physically throw rock, paper, or scissors, just like a real opponent.
1. Repurposed Toy Hand
I didn’t design the hand from scratch; instead, I used a kids' robotic hand toy and modified it to be servo-controlled.
- I mounted it to a wooden baseplate
- I engraved the base with:
RoboHand — Rafael Aldaz
2. Servo Setup
I used three MG996R servo motors to control finger groups:
- Servo 1: Controls the thumb
- Servo 2: Controls the index and middle fingers
- Servo 3: Controls the ring and pinky fingers
Each servo pulls a set of strings attached to its finger group.
3. Hand Gestures Mapping
When the AI selects its move (via the YOLOv8 model), a message is sent to the servos to form:
- 🪨 Rock: All fingers closed
- ✋ Paper: All fingers open
- ✌️ Scissors: Index + middle fingers open, others closed
4. Electronics & Integration
- The servos are controlled by a PCA9685 driver for reliable PWM output
- The Raspberry Pi sends commands to the servo controller whenever the AI makes a move
- The movement is perfectly synced with the web game interface via WebSockets
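As a concrete illustration, driving the PCA9685 with the Adafruit ServoKit library could look like this; the channel numbers and open/closed angles are guesses that depend entirely on how the strings are rigged:

```python
from adafruit_servokit import ServoKit

kit = ServoKit(channels=16)   # PCA9685 16-channel PWM driver

# Assumed channel mapping: 0 = thumb, 1 = index+middle, 2 = ring+pinky
# Open/closed angles are placeholders -- tune them to your string tension
GESTURES = {
    "rock":     {0: 0,   1: 0,   2: 0},     # all fingers closed
    "paper":    {0: 180, 1: 180, 2: 180},   # all fingers open
    "scissors": {0: 0,   1: 180, 2: 0},     # only index + middle open
}

def show_move(move):
    """Drive the three servos to form the AI's chosen gesture."""
    for channel, angle in GESTURES[move].items():
        kit.servo[channel].angle = angle
```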
Summary
This final step brought the AI’s decisions into the physical world, turning it from a virtual opponent into a robotic player sitting across from you.
Whether it wins or loses, the hand makes it feel personal and real.
✅ Repurposed toy hand
✅ Controlled by 3 servos
✅ Reacts instantly to AI predictions
✅ Totally integrated with the AI game console
Final Demo, How to Run It, and What’s Next
After weeks of work, combining AI, vision, hardware, and web technologies, the project finally came together as a fully interactive, controller-free gaming console.
In this step, I’ll show you how everything runs together and share ideas for future improvements.
✅ Final System Demo
The completed setup includes:
- 🖥️ A clean web interface that acts as the game launcher and visual feedback hub
- 🎮 Physical buttons to navigate menus, select games, and interact
- 🔊 A buzzer and LCD for sound and status feedback
- 👁️ A camera-powered AI using YOLOv8 to detect hand gestures
- 🤖 A robotic hand that physically performs the AI’s move in Rock-Paper-Scissors
All components talk to each other via WebSockets, allowing seamless communication between the Raspberry Pi and the browser.
🟢 Example Flow:
- You press “Enter” on the console to start the Rock-Paper-Scissors game
- The webcam opens inside the browser
- You show a gesture with your hand
- The AI model detects it and makes its own move
- The robotic hand mimics the AI’s move, and the score updates!
▶️ How to Run the System
To run the full system, follow these steps:
1. 🧠 Start the AI Model Server (YOLOv8)
2. 🌐 Launch the Web App (Flask)
3. 🔌 On the Raspberry Pi: Run the GPIO Controller
Make sure:
- The LCD, buzzer, and buttons are wired correctly
- The servos and PCA9685 are powered and connected
- All scripts are in sync with the right ports and IPs
What’s Next?
This project was designed to be expandable, and here are some ideas I plan to work on:
- 🐤 Flappy Bird: Use vertical hand movements to control the bird’s flaps
- 🐍 Snake Game: Head movements or gestures to control the snake
- 🕹️ Multiplayer Mode: Play against another human using dual camera inputs
- 🧠 Model Improvements: Better gesture recognition in low light or cluttered backgrounds
- 🧩 Modular Console Design: Swappable front plates for different input methods (joystick, MPU6050, facial expressions)
🧠 What I Learned
This project brought together:
- Real-time AI inference
- GPIO electronics and LCD control
- 3D printing and laser cutting
- Async WebSocket communication
- UX design for browser-based games
But most of all, it showed me how creative tech can become when AI, hardware, and software all work together.
🏁 Conclusion
No Controller? No Problem! isn’t just a game; it’s a platform for rethinking how we interact with computers.
You don’t need a controller when you’ve got vision, AI, and a robotic hand as your gaming partner.
Thanks for reading, and feel free to remix, expand, or build your own version of this project!
Have Fun and Make It Yours!
You’ve reached the end of this Instructable, but the real fun is just beginning.
Whether you're building this for yourself, a school project, or just to explore the possibilities of AI and robotics, remember:
🎮 You don’t need a controller to play. You are the controller.
This project is meant to be hacked, customized, remixed, and shared. You can:
- Add your own games
- Use facial expressions or head gestures instead of hands
- Customize the sounds, visuals, and animations
- Even create your own robotic opponent!
So fire up the laser cutter, grab a servo or two, write some code, and have fun building something that's totally yours.
And if you do make your own version, I’d love to see it!
🚀 No Controller? No Problem! is more than a slogan — it's an invitation to invent.
Happy building! 🛠️🧠🎉