LiDAR-based object detection and recognition has become an important field of research in the domain of self-driving vehicles. Most of the current state-of-the-art detection algorithms are built on fully supervised deep neural networks. Over the years, well-annotated datasets such as KITTI and NuScenes have facilitated learning in these supervised architectures. Domain adaptation of deep learning models for self-driving perception tasks is one of the issues the industry is trying to tackle: models trained on well-annotated datasets do not always generalise well on a system with different hardware specifications, despite data augmentation.
Another issue that needs to be addressed is the labor-intensive and costly process of 3D annotation. Annotating LiDAR point cloud data is time-consuming: according to published statistics [1], annotating a 3D point cloud takes 114 s on average, which is almost 40 times slower than the 2-3 s needed to annotate a 2D image, and demands greater meticulousness.
In this project we explore the possibility of eliminating annotated 3D data from the detection task. We leverage the feature extraction and localization methods described in [2] to perform weakly supervised 3D object detection. We generate 3D bounding-box proposals using clustering-based methods on point clouds. These proposals are then projected onto 2D images while training a CNN model to extract features for localization; a sketch of this pipeline is given below. We trained our model on the KITTI dataset and report evaluation metrics for the Car class. One benefit of choosing an image-based CNN approach over a 3D CNN is the reduction in inference time and the ability to run on portable embedded platforms.
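To make the proposal-and-projection pipeline concrete, the sketch below clusters a LiDAR point cloud with DBSCAN, fits an axis-aligned 3D box to each cluster, and projects the box corners into the image plane using KITTI's calibration matrices (Tr_velo_to_cam, R0_rect, P2). The choice of DBSCAN, the axis-aligned box fit, and the parameter values are illustrative assumptions standing in for whichever clustering-based proposal method is used; only the KITTI calibration convention is taken from the dataset itself.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def generate_3d_proposals(points, eps=0.7, min_samples=10):
    """Cluster LiDAR points (ground assumed already removed) and fit an
    axis-aligned 3D box to each cluster. `points` is an (N, 3) array in
    the LiDAR frame; returns boxes as (x1, y1, z1, x2, y2, z2)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    boxes = []
    for label in set(labels):
        if label == -1:                      # -1 marks DBSCAN noise points
            continue
        cluster = points[labels == label]
        boxes.append(np.concatenate([cluster.min(axis=0), cluster.max(axis=0)]))
    return boxes

def project_box_to_image(box, Tr_velo_to_cam, R0_rect, P2):
    """Project the 8 corners of an axis-aligned LiDAR-frame box into the
    image plane via KITTI calibration (Tr_velo_to_cam: 3x4, R0_rect: 3x3,
    P2: 3x4) and return the enclosing 2D rectangle (u1, v1, u2, v2)."""
    x1, y1, z1, x2, y2, z2 = box
    corners = np.array([[x, y, z, 1.0]
                        for x in (x1, x2)
                        for y in (y1, y2)
                        for z in (z1, z2)]).T          # (4, 8) homogeneous
    rect = R0_rect @ (Tr_velo_to_cam @ corners)        # LiDAR -> rectified cam
    pix = P2 @ np.vstack([rect, np.ones((1, 8))])      # rectified cam -> pixels
    u, v = pix[0] / pix[2], pix[1] / pix[2]            # perspective divide
    return u.min(), v.min(), u.max(), v.max()
```

Taking the enclosing rectangle of the eight projected corners is what allows a purely image-based CNN to consume proposals that originate in 3D, which is the property the inference-time argument above relies on.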