How does end-to-end semi-supervised learning improve object detection?

Insight from top 10 papers

End-to-End Semi-Supervised Object Detection

Limitations of Multi-Stage Semi-Supervised Object Detection

  • Previous semi-supervised object detection methods use a multi-stage training scheme (Xu et al., 2021):
  1. Train an initial detector using labeled data
  2. Generate pseudo-labels for unlabeled data
  3. Retrain the detector using labeled and pseudo-labeled data
  • The final performance is limited by the quality of the pseudo-labels, which come from an initial and likely inaccurate detector trained on only a small amount of labeled data (Xu et al., 2021)

Benefits of End-to-End Semi-Supervised Learning

  • End-to-end semi-supervised learning can gradually improve the quality of pseudo-labels during training, and the more accurate pseudo-labels in turn benefit the object detection training (Xu et al., 2021)

  • End-to-end training allows the object detection model and pseudo-label generation to reinforce each other, leading to better performance compared to multi-stage approaches (Kallempudi et al., 2022)
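One common way to couple the detector and the pseudo-label generator is a teacher-student pair, where the student is trained by gradient descent and the teacher that produces pseudo-labels is updated as an exponential moving average (EMA) of the student. Soft Teacher (Xu et al., 2021) uses this arrangement. The sketch below is a minimal illustration, not the authors' implementation: the function name `ema_update`, the plain-dictionary weights, and the momentum values are assumptions made for the example.

```python
def ema_update(teacher, student, momentum=0.999):
    """Return new teacher weights as an EMA of the student weights.

    teacher, student: dicts mapping parameter names to numbers
    (hypothetical stand-ins for real model tensors).
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Toy run with a single scalar "weight" and an exaggerated momentum of 0.5.
teacher = {"w": 0.0}
student = {"w": 1.0}
for _ in range(3):
    teacher = ema_update(teacher, student, momentum=0.5)
print(teacher["w"])  # 0.875
```

Because the teacher trails the student smoothly, its pseudo-labels improve as training progresses without following every noisy student update.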

Key Techniques in End-to-End Semi-Supervised Object Detection

Soft Teacher Mechanism

  • The classification loss for each unlabeled bounding box is weighted by the classification score produced by the teacher network (Xu et al., 2021)
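The weighting described above can be sketched as a cross-entropy loss in which each box's contribution is scaled by the teacher's score for that box. This is a simplified illustration under stated assumptions: the function name `soft_weighted_cls_loss`, the list-based inputs, and the choice to normalize the weights so they sum to one are all assumptions for the example, not details taken from the source.

```python
import math


def soft_weighted_cls_loss(student_probs, teacher_scores):
    """Teacher-weighted cross-entropy over unlabeled boxes.

    student_probs:  student's predicted probability of each box's
                    pseudo-label class.
    teacher_scores: teacher's classification score for the same boxes,
                    used as the (unnormalized) loss weight.
    """
    total = sum(teacher_scores)
    if total == 0:
        return 0.0  # no box is trusted, so nothing contributes
    weights = [score / total for score in teacher_scores]
    # Clamp probabilities to avoid log(0).
    return sum(
        -w * math.log(max(p, 1e-12))
        for w, p in zip(weights, student_probs)
    )


# The second box has a low teacher score, so its poor student
# prediction barely affects the loss.
loss = soft_weighted_cls_loss([0.9, 0.1], [0.95, 0.05])
print(round(loss, 4))
```

Compared with a hard score threshold, soft weighting keeps uncertain boxes in the loss but limits how much damage a wrong pseudo-label can do.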

Box Jittering

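  • Box jittering is used to select reliable pseudo boxes for the learning of box regression: each candidate box is jittered several times and re-regressed by the teacher, and candidates whose regressed boxes vary little are treated as reliable (Xu et al., 2021)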
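A minimal sketch of the box-jittering reliability check follows. The `refine` callable is a hypothetical stand-in for the teacher's box regression head, and the jitter scale, number of jittered copies, size normalization, and variance threshold are illustrative assumptions rather than values taken from the source.

```python
import random
import statistics


def jitter_box(box, scale=0.06, rng=random):
    """Shift each coordinate by a random offset proportional to box size."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (
        x1 + rng.uniform(-scale, scale) * w,
        y1 + rng.uniform(-scale, scale) * h,
        x2 + rng.uniform(-scale, scale) * w,
        y2 + rng.uniform(-scale, scale) * h,
    )


def regression_variance(box, refine, n_jitter=10, rng=random):
    """Mean size-normalized std of refined coordinates over jittered copies."""
    x1, y1, x2, y2 = box
    size = max(0.5 * ((x2 - x1) + (y2 - y1)), 1e-12)  # guard empty boxes
    refined = [refine(jitter_box(box, rng=rng)) for _ in range(n_jitter)]
    stds = [statistics.pstdev([r[k] for r in refined]) for k in range(4)]
    return sum(stds) / (4.0 * size)


def select_reliable(boxes, refine, max_variance=0.02, **kwargs):
    """Keep boxes whose refined location is stable under jittering."""
    return [
        b for b in boxes
        if regression_variance(b, refine, **kwargs) < max_variance
    ]


# A regressor that always snaps to the same box is perfectly stable.
stable_refine = lambda b: (0.0, 0.0, 10.0, 10.0)
print(select_reliable([(0.0, 0.0, 10.0, 10.0)], stable_refine))
```

The design choice here is to measure localization confidence separately from the classification score, since a box can be confidently classified yet poorly localized.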

Adaptive Thresholding

  • Adaptive thresholding mechanisms help the network select the most reliable bounding boxes, addressing issues such as a high false-negative rate and low precision (Kar et al., 2023)
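The source does not say how the threshold adapts, so the sketch below shows only one generic possibility and should not be read as the rule used by Kar et al. (2023). It assumes a per-class threshold that is scaled down for classes the teacher is currently less confident about, so that harder classes still receive pseudo-labels. The function names and the `base` and `floor` parameters are hypothetical.

```python
from collections import defaultdict


def adaptive_thresholds(detections, base=0.9, floor=0.5):
    """Per-class score thresholds scaled by relative class confidence.

    detections: list of (class_label, score) pairs from the teacher.
    A class whose mean score is lower than the most confident class
    gets a proportionally lower threshold, but never below `floor`.
    """
    scores = defaultdict(list)
    for label, score in detections:
        scores[label].append(score)
    if not scores:
        return {}
    means = {c: sum(s) / len(s) for c, s in scores.items()}
    top = max(means.values())
    return {c: max(floor, base * m / top) for c, m in means.items()}


def filter_pseudo_labels(detections, thresholds):
    """Keep detections whose score meets their class threshold."""
    return [(c, s) for c, s in detections if s >= thresholds[c]]


detections = [("car", 0.95), ("car", 0.85), ("bike", 0.7), ("bike", 0.5)]
thresholds = adaptive_thresholds(detections)
print(filter_pseudo_labels(detections, thresholds))
```

With a single fixed threshold of 0.9, every "bike" detection in this toy input would be discarded, which is the kind of false-negative problem an adaptive threshold is meant to reduce.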

Jitter-Bagging

  • Jitter-Bagging provides accurate localization information that helps refine the bounding boxes (Kar et al., 2023)

Strict Supervision of Teacher Network

  • Feeding both strongly and weakly augmented data to the teacher network generates robust pseudo-labels, helping it detect small and complex objects (Kar et al., 2023)

Empirical Results

  • End-to-end semi-supervised object detection approaches outperform previous multi-stage methods by a large margin under various labeling ratios (1%, 5%, 10%) on the COCO benchmark (Xu et al., 2021)

  • The proposed end-to-end approach can improve a 40.9 mAP baseline detector trained using the full COCO training set by +3.6 mAP, reaching 44.5 mAP, by leveraging the 123K unlabeled images of COCO (Xu et al., 2021)

  • On the state-of-the-art Swin Transformer based object detector (58.9 mAP on test-dev), the end-to-end semi-supervised approach still improves detection accuracy by +1.5 mAP, reaching 60.4 mAP (Xu et al., 2021)

Source Papers (10)
End-to-End Semi-Supervised Object Detection with Soft Teacher
CISO: Co-iteration semi-supervised learning for visual object detection
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations
3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection
Towards End-to-end Semi-supervised Learning for One-stage Object Detection
Scale-Equivalent Distillation for Semi-Supervised Object Detection
Toward Semi-Supervised Graphical Object Detection in Document Images
Lesion Localization in OCT by Semi-Supervised Object Detection
Revisiting Class Imbalance for End-to-end Semi-Supervised Object Detection