YOLO v7 training

May 21, 2024

6D object pose estimation with MegaPose requires three inputs:

Object CAD model (with texture)
RGB image
Region of Interest (bounding box) where the object is located in the RGB image

To obtain this bounding box, we used YOLO v7, a fast real-time object detector, to detect our wood block.
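
The detector output still has to be turned into the ROI that MegaPose consumes. Below is a minimal sketch. It assumes that each YOLO v7 detection is a row `(x_min, y_min, x_max, y_max, conf, cls)` in pixels of the original image, which is the row layout YOLOv7's `detect.py` iterates over after NMS. It also assumes the input format of MegaPose's example data, where each object is described by a `label` and a `bbox_modal` list `[x_min, y_min, x_max, y_max]` in pixels. This is our reading of those examples and is worth checking against the version in use. The helper names and the `wood_block` label are hypothetical. The label must match the name under which the block's CAD model is registered.

```python
import json


def best_box_for_class(detections, class_id):
    """Return the highest-confidence detection of `class_id`, or None.

    Each detection is assumed to be (x_min, y_min, x_max, y_max, conf, cls)
    in pixel coordinates of the original RGB image.
    """
    candidates = [d for d in detections if int(d[5]) == class_id]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d[4])


def to_megapose_entry(det, label, img_w, img_h):
    """Build one object entry with an integer pixel bbox clipped to the image."""
    x1, y1, x2, y2 = det[:4]
    # Clip so the region of interest never extends outside the RGB frame.
    x1, y1 = max(0.0, x1), max(0.0, y1)
    x2, y2 = min(img_w - 1.0, x2), min(img_h - 1.0, y2)
    bbox = [int(round(v)) for v in (x1, y1, x2, y2)]
    return {"label": label, "bbox_modal": bbox}


if __name__ == "__main__":
    # Hypothetical detections for one 640x480 frame: (x1, y1, x2, y2, conf, cls)
    dets = [(100.4, 50.2, 300.7, 220.9, 0.91, 0), (10, 10, 20, 20, 0.30, 0)]
    best = best_box_for_class(dets, class_id=0)
    entry = to_megapose_entry(best, "wood_block", 640, 480)
    print(json.dumps([entry]))
    # prints [{"label": "wood_block", "bbox_modal": [100, 50, 301, 221]}]
```

Keeping only the highest-confidence box reflects the single-block scene described here. With several blocks in view, every detection above a confidence threshold would become its own entry.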

The dataset used to train the YOLO v7 model is composed of images gathered from three different sources:

Synthetic images, where the block has a generic wood texture (available online)
Synthetic images, where the block has its own real texture (collected from real images)
Real captured images, where the labels (ground-truth bounding boxes) were created manually
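
For the manually labelled real images, each bounding box has to be stored in the label format YOLO v7 trains on. That format uses one text line per object, `class x_center y_center width height`, with the four coordinates normalised by the image width and height. Below is a minimal sketch of the conversion from a pixel box `(x_min, y_min, x_max, y_max)`. The function name is hypothetical, and class id 0 for the wood block is an assumption for a single-class dataset.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # Reject degenerate boxes or boxes that fall outside the image.
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} does not fit a {img_w}x{img_h} image")
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A box covering the central quarter of a 640x480 image.
    print(to_yolo_label(0, (160, 120, 480, 360), 640, 480))
    # prints: 0 0.500000 0.500000 0.500000 0.500000
```

As far as we know, the YOLOv5/YOLOv7 dataset loaders look for each image's label file (same name, `.txt` extension) in a `labels` directory that mirrors the `images` directory.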

The dataset distribution is represented in the following table:

Split	Total	Synthetic images (synthetic texture)	Synthetic images (real texture)	Real images
Train	55105	51840	3072	193
Validation	13774	12960	768	46
Total	68879	64800	3840	239
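
The table can be checked quickly: the rows and columns add up, and every source is split roughly 80/20 between training and validation. The short script below reproduces that arithmetic from the counts in the table. It also shows how small the share of real images in the training set is, which bears on the mixed-dataset question raised further down. Only the counts come from the table; the function names are illustrative.

```python
# Counts from the table: (synthetic texture, real texture, real images)
SPLITS = {
    "train": (51840, 3072, 193),
    "validation": (12960, 768, 46),
}
REPORTED_TOTALS = {"train": 55105, "validation": 13774}
SOURCES = ("synthetic texture", "real texture", "real images")


def column_totals(splits):
    """Per-source totals across both splits."""
    return tuple(sum(col) for col in zip(*splits.values()))


def validation_fractions(splits):
    """Fraction of each source that was held out for validation."""
    return tuple(va / (tr + va) for tr, va in zip(splits["train"], splits["validation"]))


def train_composition(splits):
    """Share of each source inside the training split."""
    n_train = sum(splits["train"])
    return {src: n / n_train for src, n in zip(SOURCES, splits["train"])}


if __name__ == "__main__":
    for name, counts in SPLITS.items():
        assert sum(counts) == REPORTED_TOTALS[name], name
    print(column_totals(SPLITS))  # (64800, 3840, 239)
    print([round(f, 3) for f in validation_fractions(SPLITS)])
    print({k: f"{v:.2%}" for k, v in train_composition(SPLITS).items()})
```

The training split is roughly 94% synthetic-texture images, 5.6% real-texture synthetic images and only about 0.35% real images.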

Here is an example of each of the three image types:

Synthetic image; object with a common online (synthetic) wood texture:

Synthetic image; object with its own real texture:

Real image, with real object:

The training used the publicly available, pre-trained yolov7-tiny model as initial weights. This training step is therefore a fine-tuning of YOLOv7, with the following parameters:

epochs: 600
batch size: 32
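
A fine-tuning run like this one is usually launched through the `train.py` script of the official YOLOv7 repository. The sketch below only assembles such a command and the dataset description file it expects. The following are assumptions, not settings reported in this post: the dataset paths, the `wood_block` class name, the 640-pixel image size, the run name, and the choice of `data/hyp.scratch.tiny.yaml` as hyper-parameter file. Only the starting weights, the number of epochs and the batch size come from the parameters listed above.

```python
import shlex

# Hypothetical dataset description for a single-class detector.
DATA_YAML = """\
train: datasets/wood_block/images/train
val: datasets/wood_block/images/val
nc: 1
names: ['wood_block']
"""


def build_train_command(data_yaml="data/wood_block.yaml", epochs=600,
                        batch_size=32, img_size=640):
    """Assemble a YOLOv7 train.py call that fine-tunes yolov7-tiny.pt."""
    return [
        "python", "train.py",
        "--weights", "yolov7-tiny.pt",               # pre-trained starting point
        "--cfg", "cfg/training/yolov7-tiny.yaml",    # matching architecture
        "--hyp", "data/hyp.scratch.tiny.yaml",       # assumed hyper-parameter file
        "--data", data_yaml,
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--img-size", str(img_size), str(img_size),  # train and test sizes
        "--name", "wood_block_tiny",                 # hypothetical run name
    ]


if __name__ == "__main__":
    print(DATA_YAML)
    print(shlex.join(build_train_command()))
```

In this sketch, `DATA_YAML` would be saved as `data/wood_block.yaml` inside the repository and the printed command run from the repository root.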

After more than 80 hours of training, the evolution of the metrics throughout training is shown below:

The evaluation of these graphs raises several questions about why the trained model has such difficulty converging:

Does the mixed dataset affect the training? Would a homogeneous dataset stabilize the training curves?
Too many epochs? For this number of images and this batch size, 400 epochs seem to be enough.
Is the batch size too small? Should I test it with 128 images per batch instead of 32?
More or fewer images in the dataset?
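
Two of these questions can be given some numbers.

Batch size: with 55105 training images, 32 images per batch means 1723 batches per epoch. From the more than 80 hours reported above, that is at least 8 minutes per epoch. As we read the official YOLOv7 `train.py`, it also accumulates gradients towards a nominal batch size of 64 (`accumulate = max(round(nbs / total_batch_size), 1)` with `nbs = 64`, after warm-up). A batch of 32 therefore already gives 64 images per optimizer step, and 128 would double that rather than quadruple it.

Epochs: one can look for the epoch after which a validation metric stops improving.

The sketch below covers both. The fitness weights (0.1 for mAP@0.5, 0.9 for mAP@0.5:0.95) are our reading of how YOLOv7's `utils/metrics.py` selects `best.pt`. The `patience` and `min_delta` values and the example curve are illustrative assumptions, not values from this training run.

```python
import math

TRAIN_IMAGES = 55105  # training split size from the table
NOMINAL_BATCH = 64    # nbs, as we read YOLOv7's train.py


def batches_per_epoch(n_images, batch_size):
    # The last, incomplete batch still counts as one iteration.
    return math.ceil(n_images / batch_size)


def accumulation_steps(batch_size, nbs=NOMINAL_BATCH):
    # Batches whose gradients are summed before each optimizer step (after warm-up).
    return max(round(nbs / batch_size), 1)


def fitness(map50, map50_95):
    """Weighted mAP used to rank epochs (0.1 * mAP@0.5 + 0.9 * mAP@0.5:0.95)."""
    return 0.1 * map50 + 0.9 * map50_95


def plateau_epoch(values, patience=50, min_delta=1e-3):
    """Return the last improving epoch (0-based) once `patience` epochs pass
    without an increase larger than `min_delta`; return None otherwise."""
    best = float("-inf")
    best_epoch = None
    for epoch, value in enumerate(values):
        if value > best + min_delta:
            best, best_epoch = value, epoch
        elif best_epoch is not None and epoch - best_epoch >= patience:
            return best_epoch
    return None


if __name__ == "__main__":
    for bs in (32, 128):
        acc = accumulation_steps(bs)
        print(bs, batches_per_epoch(TRAIN_IMAGES, bs), acc, bs * acc)
    # prints "32 1723 2 64" and then "128 431 1 128"

    # Hypothetical fitness curve: rises until epoch 300, then stays flat.
    curve = [min(epoch, 300) / 600 for epoch in range(600)]
    print(plateau_epoch(curve))  # 300
```

Applied to the per-epoch fitness values from the training logs, such a check would show whether the 400-epoch intuition holds for this run.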

Human-Robot Collaboration
