Types of Object Detection Algorithms

Deep learning has driven most of the recent progress in object detection. At present, deep learning-based object detectors can be divided into two categories:

- One-stage detectors
- Two-stage detectors

How do two-stage object detectors work?

As the name suggests, these detectors split the object detection (OD) task across two networks:

ROI (region of interest) extraction/region proposal network: This network identifies the regions of an image that are likely to contain objects and passes them on to the next stage. This reduces the computational requirements because the next stage processes only the candidate regions instead of the complete image. A region proposal network (RPN) works as follows:
- Generate candidate regions by sliding windows of different sizes over the image.
- For each region, predict the probability that it contains an object, irrespective of the class.
- If that probability is high enough, pass the coordinates of the region to the next network.
Localization and classification: An OD task requires not just classification but also localization of objects. This stage handles both:
- Classification: For each ROI produced by the previous stage, the network predicts the probability that the object belongs to a particular class, also referred to as the class score.
- Localization: The OD model must not only identify objects correctly but also predict a bounding box that fits each object closely. Bounding box refinement happens in this stage.
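
The first stage described above can be sketched in a few lines of Python. This is a minimal illustration, not how a real RPN is implemented: the function names, the window sizes, the stride, and the threshold are all assumptions, and `toy_objectness` is a hypothetical stand-in for a learned objectness predictor (here it simply measures overlap with one known object box).

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def sliding_windows(img_w: int, img_h: int, sizes: List[int], stride: int) -> List[Box]:
    """Enumerate square candidate regions of several sizes over the image."""
    boxes = []
    for size in sizes:
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                boxes.append((x, y, x + size, y + size))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def toy_objectness(box: Box) -> float:
    """Hypothetical scorer: overlap with a single object assumed at (32, 32, 96, 96)."""
    return iou(box, (32, 32, 96, 96))


def propose_regions(boxes: List[Box],
                    objectness: Callable[[Box], float],
                    threshold: float = 0.5) -> List[Box]:
    """Stage 1: keep only regions whose objectness score passes the threshold."""
    return [b for b in boxes if objectness(b) >= threshold]


candidates = sliding_windows(128, 128, sizes=[32, 64], stride=32)
proposals = propose_regions(candidates, toy_objectness)
print(len(candidates), "candidates ->", len(proposals), "proposal(s)")
```

Only the surviving proposals would be handed to the second stage for classification and bounding box refinement, which is where the computational saving comes from.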

How are the one-stage and two-stage detectors different?

The key difference between these two kinds of detectors lies in how they structure the task. A one-stage detector performs classification and localization together in a single pass through one network. This typically makes it faster and simpler than a two-stage detector.
One-stage detectors like YOLO skip the separate region proposal stage and the per-region box refinement step.
They treat object detection as a single, unified regression problem, predicting the coordinates of bounding boxes and the associated class probabilities directly from the raw image pixels. Two-stage detectors, by contrast, first propose candidate regions and then classify each region and refine its bounding box.
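
To make the "single regression" idea concrete, the sketch below shows what a YOLOv1-style network outputs for one image: a single tensor of shape S x S x (B * 5 + C), where S is the grid size, B the number of boxes per cell, and C the number of classes. The default values (7, 2, 20) are the ones commonly cited for YOLOv1. The function names and the ordering of values inside a cell vector are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def output_shape(grid_size: int = 7, boxes_per_cell: int = 2,
                 num_classes: int = 20) -> Tuple[int, int, int]:
    """Shape of the prediction tensor: each box contributes x, y, w, h, confidence."""
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


def split_cell_vector(cell: Sequence[float], boxes_per_cell: int = 2,
                      num_classes: int = 20) -> Tuple[List[tuple], List[float]]:
    """Split one cell's vector into its boxes and its class probabilities.

    Assumed layout (illustrative): all boxes first, then the class probabilities.
    """
    assert len(cell) == boxes_per_cell * 5 + num_classes
    boxes = [tuple(cell[i * 5:(i + 1) * 5]) for i in range(boxes_per_cell)]
    class_probs = list(cell[boxes_per_cell * 5:])
    return boxes, class_probs


print(output_shape())  # (7, 7, 30)
boxes, class_probs = split_cell_vector([0.0] * 30)
print(len(boxes), "boxes and", len(class_probs), "class probabilities per cell")
```

Every value in this tensor is produced in one forward pass, so no second network ever revisits individual regions.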

How do single-stage detectors work if there is no ROI detection?

Without ROI detection, the complete image is passed through the network, which performs classification and localization in a single pass. Most one-stage detectors divide the image into a grid of cells and make their predictions per cell. YOLOv1, for example, predicts a fixed number of bounding boxes for each cell. The issue with this approach is that it produces a large number of bounding boxes, many of which overlap on the same object. To handle this problem, a technique called non-maximum suppression (NMS) is used: it keeps the highest-scoring box and discards the other boxes that overlap it heavily.
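
A minimal sketch of greedy NMS in plain Python follows. The function names and the 0.5 IoU threshold are illustrative assumptions. Production detectors usually run NMS per class and rely on an optimized library routine rather than a loop like this one.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes to keep, highest score first."""
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In this example, the second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box covers a different area and survives.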
