YOLO v7 training #3

Since my last YOLO v7 training seemed too short in terms of epochs, I took the final network weights as initial weights for a continued training run. This run was very similar to the previous one, with the number of epochs being the only change:

  • Training: 56 000 images
  • Validation: 14 000 images
  • Total: 70 000 images
  • Train - Valid Ratio: 80% - 20%
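As a side note, the 80/20 split itself is easy to reproduce. Below is a minimal Python sketch, assuming all 70 000 labelled images sit in a single folder; the paths and file names are placeholders, not the actual dataset layout:

```python
import random
from pathlib import Path

# Hypothetical layout: every labelled image lives in dataset/images.
# The folder path, file extension and seed are assumptions for illustration.
images = sorted(Path("dataset/images").glob("*.jpg"))
random.seed(0)
random.shuffle(images)

split = int(0.8 * len(images))            # 80% train, 20% validation
train_set, valid_set = images[:split], images[split:]

# Write one image path per line, the list format YOLO-style data configs accept
Path("train.txt").write_text("\n".join(str(p) for p in train_set))
Path("valid.txt").write_text("\n".join(str(p) for p in valid_set))
```

The resulting train.txt / valid.txt lists can then be referenced from the data .yaml passed to train.py.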

I trained for 200 more epochs, added to the 100 already trained:

  • epochs: 300
  • batch size: 64 
  • initial learning rate: 0.01 ( lr0 )
  • final OneCycleLR learning rate factor ( lrf ): 0.1, i.e. final learning rate = lr0 * lrf = 0.001
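For reference, here is a minimal sketch of the cosine one-cycle factor used by YOLOv5/YOLOv7-style training (the one_cycle helper), plugged with the lr0 and lrf values listed above. It only illustrates that the learning rate decays from lr0 down to lr0 * lrf over the scheduled epochs:

```python
import math

# Cosine one-cycle factor: ramps from y1 (= 1.0) down to y2 (= lrf) over `steps` epochs,
# mirroring the one_cycle lambda that YOLOv7's train.py hands to LambdaLR.
def one_cycle(y1=1.0, y2=0.1, steps=300):
    return lambda x: ((1 - math.cos(x * math.pi / steps)) / 2) * (y2 - y1) + y1

lr0, lrf, epochs = 0.01, 0.1, 300          # values from the list above
lf = one_cycle(1.0, lrf, epochs)

print(lr0 * lf(0))        # ~0.01  -> initial learning rate
print(lr0 * lf(epochs))   # ~0.001 -> final learning rate = lr0 * lrf
```

One thing worth noting: if the second run was launched as a fresh training from the previous last.pt (rather than with --resume), this schedule and its warmup restart from the top, which may account for part of the disturbance around epoch 100 discussed below.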

After more than 46 hours of training, the graphs of the two loss metrics (Box and Objectness) looked like this:

[Figure: Box loss and Objectness loss curves over the 300 epochs]
Evaluating these graphs, it is clear that the transition between the two training procedures (at epoch 100) caused a strong disturbance in the loss curves, driving them to their highest (worst) values.

The same behaviour can be seen in the Precision curves:

[Figure: Precision curve over the 300 epochs]
I expected the final, stable Precision and mAP values to be around 0.95 but, as we can see, they tended to converge towards 0.8...

What exactly does this mean? 

Should I try to train YOLOv7 with a well-known, publicly available dataset? This would clarify whether the "mistake" is on the dataset side or on the training-procedure side.

Should I go back to my last successful YOLO training and run tests from there?

