YOLO v7 training #3

Since my last YOLO v7 training run seemed too short in terms of epochs, I took its final network weights as the initial weights for a continued training procedure. This procedure was very similar to the previous one, with the number of epochs being the only change:

  • Training: 56 000 images
  • Validation: 14 000 images
  • Total: 70 000 images
  • Train - Valid Ratio: 80% - 20%
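
For reference, a split like that can be generated with a few lines of Python; the directory layout and file names below are assumptions for illustration, not the actual ones used here.

```python
# Hypothetical 80/20 split of an image folder into the train/valid lists
# that a YOLOv7 data yaml can point to; all paths are assumptions.
import random
from pathlib import Path

images = sorted(Path("dataset/images").glob("*.jpg"))
random.seed(0)                # fixed seed -> reproducible split
random.shuffle(images)

cut = int(0.8 * len(images))  # 80% train, 20% valid
Path("train.txt").write_text("\n".join(map(str, images[:cut])))
Path("valid.txt").write_text("\n".join(map(str, images[cut:])))
```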

I trained for 200 more epochs, added to the 100 already trained:

  • epochs: 300
  • batch size: 64 
  • initial learning rate (lr0): 0.01
  • final OneCycleLR learning rate factor (lrf): 0.1, giving a final learning rate of lr0*lrf = 0.001
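
For completeness, a continued run with these settings could be launched roughly as sketched below, pointing the official YOLOv7 train.py at the previous run's final weights; the paths and the dataset yaml are assumptions, not my exact setup.

```python
# Hypothetical launch of the continued run. YOLOv7's train.py accepts the
# previous weights via --weights; lr0/lrf come from the hyperparameter yaml.
import subprocess

subprocess.run([
    "python", "train.py",
    "--weights", "runs/train/exp/weights/last.pt",  # final weights of the 100-epoch run (path assumed)
    "--data", "data/custom.yaml",                   # hypothetical dataset definition
    "--hyp", "data/hyp.scratch.p5.yaml",            # default P5 hyperparameters (lr0=0.01, lrf=0.1)
    "--epochs", "300",
    "--batch-size", "64",
], check=True)
```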

After more than 46 hours of training, the graphs of the two loss metrics (Box and Objectness) looked like this:

[Figure: Box and Objectness loss curves over the 300 epochs]

Evaluating the graphs above, it is clear that splitting the training into two procedures (at epoch 100) caused a strong disturbance in the loss curves, driving them to their highest (worst) values.
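
One plausible contributor, assuming the second run rebuilt its OneCycleLR schedule from scratch rather than resuming it: the learning rate would jump from the first run's final value (~0.001) back up to lr0 (0.01) at epoch 100, which by itself can kick the optimizer out of the minimum it had settled into. A minimal sketch of the cosine one-cycle schedule YOLOv7 uses illustrates the roughly 10x jump:

```python
import math

def lr_at(epoch: int, total_epochs: int, lr0: float = 0.01, lrf: float = 0.1) -> float:
    """Cosine one-cycle learning rate as scheduled by YOLOv7 (decays lr0 -> lr0*lrf)."""
    factor = ((1 - math.cos(epoch * math.pi / total_epochs)) / 2) * (lrf - 1) + 1
    return lr0 * factor

print(f"{lr_at(99, 100):.5f}")  # ~0.00100: end of the first 100-epoch run
print(f"{lr_at(0, 300):.5f}")   # 0.01000: start of the restarted 300-epoch schedule
```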

The same effect can be seen in the Precision curves:

[Figure: Precision curve]
I expected the final, stable Precision and mAP values to be around 0.95 but, as we can see, they tend to converge towards 0.8...

What exactly does this mean? 

Should I try to train YOLOv7 with a known, publicly available dataset? This would clarify whether the "mistake" lies on the dataset side or on the training-procedure side.
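
If I go that route, a sanity-check run on COCO could look like the sketch below, using the dataset config that ships with the YOLOv7 repo; the helper-script path and pretrained-weights file name are assumptions.

```python
# Hypothetical baseline run on COCO: if the published precision/mAP values are
# reproduced, the fault is likely in my dataset; if not, in my procedure.
import subprocess

subprocess.run(["bash", "scripts/get_coco.sh"], check=True)  # repo download helper (path assumed)
subprocess.run([
    "python", "train.py",
    "--weights", "yolov7_training.pt",  # official pretrained weights (file name assumed)
    "--data", "data/coco.yaml",         # COCO definition shipped with the repo
    "--epochs", "300",
    "--batch-size", "64",
], check=True)
```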

Should I go back to my last successful YOLO training and test that?

