Since the work presented in this project aims at performing 3D object detection that is deployable on embedded platforms, we took inference from research work that relates to this idea. The PIXOR network uses a Convolutional Neural Network-based architecture on Birds Eye View images of LiDAR point clouds, stacked together in distinct channels and feeds that to a CNN to classify and regress over the type and bounding box of the detected object respectively.
Since the data in a LiDAR point cloud is sparse, most 3D CNN approaches have longer inference times. In PIXOR, the processing is done on images using 2D CNN, which makes it relatively faster and as accurate as any other 3D CNN based object detection model.
Network Flow chart for PIXOR
Work in the field of weakly supervised object localization is well explored in the WSDNN paper. The original work explores the idea of performing simultaneous image region selection and classification tasks for deep detection. The network takes in images, class labels, and region proposals as input and outputs class probability along with bounding box coordinates. This idea of performing detection and recognition by using shared weights is further extended to spatial domain in this project.
Weakly Supervised Deep Detection Network Framework
Traditional clustering-based methods can also be performed on a 3D point cloud to detect object clusters. Such methods include use of KD-Tree, Density-Based Scan (DB-SCAN), or other clustering algorithms to perform nearest neighbour search and generate bounding boxes.
These methods are are prone to noise in the data and can produce unstable results. According to VS3D, following are the results for detection for some methods that use online instance classification refinement and proposal cluster learning.
This projects tried to build upon the the research done in object detection on a point cloud using clustering-based approaches. Instead of performing final detection using such approaches, we use these detections as proposals to our network.