Training YOLO Object Recognition From a Single Reference

by Emilian0 in Circuits > Sensors

This guide walks through setting up a Conda-based Python environment for Ultralytics YOLO on Windows. You will install Miniconda, create an isolated environment, install Ultralytics, and verify the installation with a simple command.

Supplies

  1. Any computer-recognized camera (integrated webcam, USB camera, Pi Camera, etc.)
  2. Internet connection
  3. This guide is written for Windows; however, the Python scripts can be run on any OS, even on edge devices. Minor editing may be required depending on your hardware.

Install Miniconda and Ultralytics

Miniconda is a minimal installer for Conda that includes Python and the Conda package manager without the full Anaconda bundle.

1. Open a web browser and go to the official Miniconda download page: https://docs.conda.io/en/latest/miniconda.html

2. Under the Windows section, download the latest 64-bit installer for Miniconda (Miniconda3), or the installer for your operating system.

3. Once the file finishes downloading, double-click it to launch the installer.

4. Go through the installation process. Default settings are perfectly fine.

5. Click "Install" and wait for the installer to complete, then click "Finish."


Next, create a dedicated environment to keep Ultralytics and its dependencies isolated from other Python projects.

1. Open the Anaconda Prompt (Miniconda3)

2. Run the following command to create a new environment named ultralytics_env with Python 3.11:

conda create -n ultralytics_env python=3.11

3. When Conda shows a list of packages to be installed and asks for confirmation, type y and press Enter.

Once this completes, the environment is created but not yet active.

1. In the same terminal, run:

conda activate ultralytics_env

2. After activation, your prompt should change to show (ultralytics_env) at the beginning, indicating the environment is active.

Any python, pip, or conda commands you run now will apply to this environment.

Install Ultralytics

Ultralytics YOLO can be installed either with pip or directly via Conda; using Conda often simplifies dependency management.

With pip (inside the activated environment):

pip install ultralytics

This downloads Ultralytics from PyPI and installs it along with required Python dependencies.

Or, using Conda (recommended when you want Conda-managed dependencies):

conda install -c conda-forge ultralytics

This installs Ultralytics from the conda-forge channel and automatically resolves compatible versions of its dependencies.

Choose one method (pip or conda).

Ultralytics provides a simple command-line check to confirm that the package is installed and the environment is configured correctly.

1. Ensure the ultralytics_env environment is still active.

2. Run:

yolo checks

3. This command performs basic environment and dependency checks for Ultralytics YOLO and reports any issues it finds.

If the command completes without errors, your Ultralytics installation is ready to use.
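
If you prefer, the same check can be run from Python. Here is a minimal sketch using the ultralytics.checks() helper inside the activated environment:

import ultralytics

# Prints Ultralytics version, OS, Python, and dependency information,
# and flags any problems it finds
ultralytics.checks()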

Download YOLOE Model

YOLOE (Real-Time Seeing Anything) is an advanced zero-shot, open-vocabulary object detection and instance segmentation model. You'll need to download one of the prompt-free YOLOE models.

- YOLOE-11S-PF (Small): Fastest inference, good for real-time on edge devices

- YOLOE-11M-PF (Medium): Balanced speed and accuracy

- YOLOE-11L-PF (Large): Highest accuracy

1. Visit the YOLOE Prompt Free models page:

https://docs.ultralytics.com/models/yoloe/#prompt-free-models

2. In the "Prompt Free models" table, locate your preferred model size (11s, 11m, or 11l).

3. Click on the pretrained weights link for your chosen model:

- yoloe-11s-seg-pf.pt (Small)

- yoloe-11m-seg-pf.pt (Medium)

- yoloe-11l-seg-pf.pt (Large)

Once downloaded, you can load and run the model in Python:

import cv2
from ultralytics import YOLO

cap = cv2.VideoCapture(0)

# Use the model file you downloaded
model = YOLO("yoloe-11l-seg-pf.pt")

while True:
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame")
        break

    # Run detection on the current frame and draw the results
    results = model.predict(frame)
    annotated_frame = results[0].plot(boxes=True, masks=False)

    cv2.imshow("Camera", annotated_frame)

    # Press "q" to quit
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

The prompt-free models can detect objects across 1200+ categories out of the box.
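
If you're curious which categories are included, the class names are stored on the loaded model. A small sketch, assuming the Large prompt-free weights are in the working directory:

from ultralytics import YOLO

model = YOLO("yoloe-11l-seg-pf.pt")

# model.names maps class indices to label strings
print(len(model.names), "built-in classes")
print(list(model.names.values())[:20])  # preview the first 20 labels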

Capture Images for Custom Training


If you want to recognize uncommon or custom objects, the next step is to train a YOLO model to recognize your chosen object from an image.

1. Capture a clear photo of your target object using the same camera you plan to use in your project, ideally with reproducible lighting conditions.

- Lighting can affect the model's confidence scores.

- If training multiple objects, you may take photos of each object individually, but I find that capturing them all in the same photo yields the best results.

- You'll want to spread them out slightly with even, uniform lighting.


2. Extract the pixel coordinates of your object(s) in the image so that YOLO can learn where they appear.

- One quick method is to open the image in an editor such as Photoshop or Photopea, draw a rectangular selection tightly around the object, and write down the x, y coordinates of its top-left and bottom-right corners.

- Alternatively, you can write a short Python script using libraries like Pillow and Matplotlib to display the image and log the bounding-box coordinates.
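
For example, here is a minimal sketch of that second approach. It assumes your reference photo is named image.jpg and uses Matplotlib's ginput to record two corner clicks per object:

from PIL import Image
import matplotlib.pyplot as plt

# Assumes your reference photo is named "image.jpg"; change as needed
img = Image.open("image.jpg")
plt.imshow(img)
plt.title("Click the top-left, then the bottom-right corner of the object")

# ginput waits for two mouse clicks and returns their (x, y) pixel coordinates
(x1, y1), (x2, y2) = plt.ginput(2, timeout=0)
print(f"Bounding box: [{int(x1)}, {int(y1)}, {int(x2)}, {int(y2)}]")
plt.close()

Run it once per object and note each printed box.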

Once you have recorded the bounding-box coordinates for each object in the image, you can create the fine-tuned YOLO model.

Train Your Custom Model and Export to ONNX

Now that you have your bounding-box coordinates, you can train the model on your custom object(s) and export it to ONNX format. The code below loads the reference photo and the YOLO model, then builds an array of bounding boxes and object IDs.

You'll have to modify the code to match your reference photo's filename, and manually add the bounding boxes for each of your objects.

Using the chosen YOLOE segmentation model as the foundation, the new model learns to recognize your objects from the bounding boxes you provided, applying a confidence threshold of 0.1.

Lastly, the code exports the model to ONNX (Open Neural Network Exchange), a universal format for machine learning models. This step is optional but highly recommended: ONNX models can run on different hardware without modification, are optimized for speed, and don't require the full Ultralytics or PyTorch libraries. This means the ONNX model is lighter, simpler, and faster at inference than the .pt model.

Here's the full Python script that performs visual prompting training and ONNX export:

from ultralytics import YOLOE
from ultralytics.models.yolo.yoloe import YOLOEVPSegPredictor
import numpy as np

# Load the base YOLOE segmentation weights used for visual prompting
model = YOLOE("yoloe-11l-seg.pt")

# Run a visual-prompt prediction: the bounding boxes and class IDs tell the model
# which regions of the reference image correspond to your custom objects
model.predict(
    "image.jpg",
    refer_image="image.jpg",
    visual_prompts={
        'bboxes': np.array([[54, 214, 214, 370], [325, 246, 440, 362], [559, 247, 660, 351]]),
        'cls': np.arange(3)
    },
    predictor=YOLOEVPSegPredictor,
    conf=0.1
)

# Export the prompted model to ONNX for lightweight deployment
model.export(format="onnx", imgsz=640)

To run your model, modify the Step 2 camera script to load your ONNX model instead of the .pt file:

model = YOLO("yoloe-11l-seg.onnx")
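
For a quick single-frame test, something like this sketch should work (it assumes the exported file from the step above sits next to the script; adjust the filename to match yours):

import cv2
from ultralytics import YOLO

# Load the exported ONNX model instead of the .pt weights
model = YOLO("yoloe-11l-seg.onnx")

# Grab one frame from the default camera and run detection on it
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()

if ret:
    results = model.predict(frame)
    annotated = results[0].plot(boxes=True, masks=False)
    cv2.imwrite("onnx_test.jpg", annotated)  # save the annotated frame for inspection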