YOLO v7 training
6D object pose estimation with MegaPose requires three inputs:
- Object CAD model (with texture)
- RGB image
- Region of interest (bounding box) locating the object in the RGB image
To obtain this bounding box, we used YOLOv7 (a fast, widely used real-time object detector) to detect our wood block.
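The detector can return several candidate boxes per image, while the pose estimator needs a single region of interest. Below is a minimal Python sketch of that hand-off. It assumes the detections are already an (N, 6) NumPy array of [x1, y1, x2, y2, conf, cls] in pixels (the layout YOLO-style non-maximum suppression usually produces) and that the wood block is class 0. The function name `best_roi` and the class id are hypothetical and are not part of YOLOv7 or MegaPose:

```python
from typing import List, Optional

import numpy as np


def best_roi(detections: np.ndarray, img_w: int, img_h: int,
             class_id: int = 0) -> Optional[List[int]]:
    """Return the highest-confidence box of `class_id` as [x1, y1, x2, y2] pixels.

    Assumes `detections` has shape (N, 6) with columns [x1, y1, x2, y2, conf, cls].
    Returns None when no detection of that class exists.
    """
    if detections.size == 0:
        return None
    # Keep only detections of the requested class (hypothetical id 0 = wood block)
    candidates = detections[detections[:, 5] == class_id]
    if len(candidates) == 0:
        return None
    # Pick the most confident detection of that class
    x1, y1, x2, y2 = candidates[np.argmax(candidates[:, 4]), :4]
    # Clip to the image so the ROI is always a valid crop
    x1, x2 = np.clip([x1, x2], 0, img_w - 1)
    y1, y2 = np.clip([y1, y2], 0, img_h - 1)
    return [int(round(x1)), int(round(y1)), int(round(x2)), int(round(y2))]
```

For instance, given two class-0 boxes with confidences 0.62 and 0.91, it returns the pixel corners of the 0.91 box. Clipping matters because a box that extends past the image border would otherwise produce an invalid crop downstream.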
The dataset used to train the YOLOv7 model is composed of images from three different sources:
- Synthetic images in which the block has a generic wood texture (available online)
- Synthetic images in which the block has its own real texture (taken from real images)
- Real captured images, whose labels (ground-truth bounding boxes) were annotated manually
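The post does not say how the synthetic labels were produced. If the renderer can also output a binary mask of the block (an assumption), the ground-truth box and the YOLO label line follow directly from that mask. YOLO label files hold one line per object, `class cx cy w h`, with the box centre and size normalised by the image width and height. The helper name and class id 0 below are hypothetical:

```python
from typing import Optional

import numpy as np


def mask_to_yolo_label(mask: np.ndarray, class_id: int = 0) -> Optional[str]:
    """Convert a binary object mask of shape (H, W) into one YOLO label line.

    Assumes nonzero pixels belong to the object. Returns None if the object
    is not visible, in which case the image gets no label line.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    img_h, img_w = mask.shape
    # Tight pixel box; +1 so the box covers the last object pixel completely
    x_min, x_max = xs.min(), xs.max() + 1
    y_min, y_max = ys.min(), ys.max() + 1
    # Normalised centre and size, as expected by YOLO label files
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```

For a 100×200 mask whose object covers rows 20 to 39 and columns 50 to 149, it returns `0 0.500000 0.300000 0.500000 0.200000`.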
The dataset distribution is represented in the following table:
| Split | Total | Synthetic images (synthetic texture) | Synthetic images (real texture) | Real images |
| Train | 55105 | 51840 | 3072 | 193 |
| Validation | 13774 | 12960 | 768 | 46 |
| Total | 68879 | 64800 | 3840 | 239 |
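The table can be sanity-checked, and the mix of sources made explicit, with a few lines of Python. The counts are copied from the table. The helper `summarize` is only illustrative:

```python
# Per-split counts from the table: (total, synthetic texture, real texture, real images)
TABLE = {
    "train": (55105, 51840, 3072, 193),
    "validation": (13774, 12960, 768, 46),
}
SOURCES = ("synthetic (synthetic texture)", "synthetic (real texture)", "real")


def summarize(table):
    """Check row sums, then report each source's share of the data and its train fraction."""
    for split, (total, *counts) in table.items():
        # Each row's Total column should equal the sum of its three sources
        if sum(counts) != total:
            raise ValueError(f"{split}: {sum(counts)} != {total}")
    # Column sums per source over both splits
    per_source = [sum(col) for col in zip(*(row[1:] for row in table.values()))]
    grand_total = sum(per_source)
    train_counts = table["train"][1:]
    report = {
        name: {
            "share_of_dataset": n_total / grand_total,
            "train_fraction": n_train / n_total,
        }
        for name, n_train, n_total in zip(SOURCES, train_counts, per_source)
    }
    return grand_total, report


if __name__ == "__main__":
    grand_total, report = summarize(TABLE)
    print(f"total images: {grand_total}")
    for name, stats in report.items():
        print(f"{name:32s} share={stats['share_of_dataset']:.2%} "
              f"train={stats['train_fraction']:.1%}")
```

On these counts, every row sums to its Total column, each source is split roughly 80/20 between training and validation, about 94% of all images are synthetic with the generic texture, and real images make up only about 0.35% of the dataset.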
Below is an example of each of the three image types:
- Synthetic image, object with a generic (synthetic) wood texture found online:
- Synthetic image, object with its own real texture:
- Real image of the real object:
Training was initialized from the publicly available pre-trained yolov7-tiny weights, so this step is a fine-tuning of YOLOv7 with the following parameters:
- epochs: 600
- batch size: 32
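To reason about the epoch and batch-size questions raised below, it helps to translate these parameters into iteration counts. A rough sketch, assuming the last partial batch of each epoch is kept and, as YOLOv5-derived training scripts such as yolov7's `train.py` do, that gradients are accumulated up to a nominal batch of 64 images before each optimizer step (worth verifying in your copy of `train.py`). The helper name is hypothetical:

```python
import math

# Assumption: gradients are accumulated until about NOMINAL_BATCH images
# have been processed before each optimizer step.
NOMINAL_BATCH = 64


def training_schedule(n_train: int, batch_size: int, epochs: int) -> dict:
    """Approximate iteration counts for a fine-tuning run (hypothetical helper)."""
    batches_per_epoch = math.ceil(n_train / batch_size)  # last partial batch kept
    accumulate = max(round(NOMINAL_BATCH / batch_size), 1)  # batches per optimizer step
    total_batches = batches_per_epoch * epochs
    return {
        "batches_per_epoch": batches_per_epoch,
        "total_batches": total_batches,
        "accumulate": accumulate,
        "effective_batch": batch_size * accumulate,
        "optimizer_steps": total_batches // accumulate,  # approximate
    }


if __name__ == "__main__":
    for bs, ep in [(32, 600), (32, 400), (128, 600)]:
        print(bs, ep, training_schedule(55105, bs, ep))
```

With 55105 training images, batch size 32 and 600 epochs, this gives 1723 batches per epoch and 1,033,800 batches in total. Under the accumulation assumption, each optimizer step already sees 64 images, so moving to 128 images per batch doubles the effective batch rather than quadrupling it and roughly halves the number of optimizer steps per epoch.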
After more than 80 hours of training, the evolution of the training metrics is shown below:
These graphs raise several questions about why the model has difficulty converging:
- Does the mixed dataset affect training? Would a homogeneous dataset stabilize the training curves?
- Too many epochs? For this number of images and this batch size, 400 epochs seem to be enough.
- Is the batch size too small? Should I try 128 images per batch instead of 32?
- Would more (or fewer) images in the dataset help?
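Whether 400 epochs would have been enough can be checked against the logged curves with a simple patience rule, the same idea behind early stopping: find the epoch after which a validation metric (for example mAP@0.5) stopped improving by more than a small margin for a given number of epochs. A sketch, assuming the per-epoch values have already been read from the training logs. The function name, `patience` and `min_delta` are illustrative choices:

```python
from typing import Optional, Sequence, Tuple


def plateau_epoch(metric: Sequence[float], patience: int = 50,
                  min_delta: float = 1e-3) -> Tuple[int, Optional[int]]:
    """Return (best_epoch, stop_epoch) for a 'higher is better' metric such as mAP.

    best_epoch is the last epoch that improved on the best value by more than
    `min_delta`. stop_epoch is the first epoch at which `patience` epochs have
    passed without such an improvement, or None if that never happens.
    """
    best_value = float("-inf")
    best_epoch = -1
    for epoch, value in enumerate(metric):
        if value > best_value + min_delta:
            # Meaningful improvement: reset the patience counter
            best_value, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, None
```

For example, on the toy curve `[0.1, 0.2, 0.3, 0.4] + [0.4005] * 10` (not data from this run), `plateau_epoch(toy, patience=5)` returns `(3, 8)`: the metric last improved meaningfully at epoch 3, and after five flat epochs the rule would stop. A `None` stop epoch means the metric was still improving at the end of training. Noisy curves may need a larger `patience` or some smoothing before applying the rule.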