Pedestrian Detection Using Non Maximum Suppression Algorithm
Read this overview of a complete pipeline for detecting pedestrians on the road.
Pedestrian detection is still an unsolved problem in computer science. Object detection algorithms such as YOLO, SSD, R-CNN, Fast R-CNN and Faster R-CNN have been studied extensively and with great success, but pedestrian detection in crowded scenes remains an open challenge.
In recent years, demand for pedestrian detection has grown in real-world settings where the density of people is high, e.g., airports, train stations and shopping malls. Despite the progress achieved, detecting pedestrians in those scenes remains difficult, as evidenced by the significant performance drops of state-of-the-art methods. In this post, I will go through an efficient and scalable algorithm known as Non Maximum Suppression for pedestrian/person detection in crowded scenes.
Applications
- Self driving cars. Identifying pedestrians in a road scene
- Security. Restricting access for certain people to certain places
- Retail. Analysing visitors' behaviour within a supermarket
- Fashion. Identifying specific brands and the persons who wear them
Data
I downloaded the images for testing from here. Then I resized the images to 300×200 pixels and used them as test images for this project.
Non Maximum Suppression
Histogram of Oriented Gradients (HOG) combined with Support Vector Machines (SVM) has been pretty successful for detecting objects in images, but the problem with this approach is that it produces multiple overlapping bounding boxes around each object. On its own it is therefore not suitable for our case, which is detecting pedestrians on crowded roads. This is where Non Maximum Suppression (NMS) comes to the rescue, refining the bounding boxes returned by the detector. Additional penalties can be used to produce more compact bounding boxes and thus make detection less sensitive to the NMS threshold. With greedy NMS, the ideal approach for crowds is to set a high threshold so that highly overlapped objects are preserved, and to predict very compact detection boxes for all instances so that false positives are reduced.
Environment and tools
- scikit-learn
- scikit-image
- numpy
- opencv
Where is the code?
Without further ado, let’s get started with the code. The complete project on GitHub can be found here.
I started by calculating the overlapping area between two bounding boxes detected by our algorithm.
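A minimal sketch of this step, assuming each box is stored as an (x1, y1, x2, y2) tuple with the top left corner first. The function name `overlap_area` and the box layout are illustrative assumptions, not taken from the project code.

```python
def overlap_area(box_a, box_b):
    """Return the intersection area of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes give an area of 0.
    width = max(0, x_right - x_left)
    height = max(0, y_bottom - y_top)
    return width * height


print(overlap_area((0, 0, 10, 10), (5, 5, 15, 15)))    # 25
print(overlap_area((0, 0, 10, 10), (20, 20, 30, 30)))  # 0
```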
Then I defined a function that takes the bounding box rectangles and a threshold factor as input. I sorted all the bounding boxes in descending order of their bottom right corner co-ordinate values. After that I appended every box whose overlap with the boxes already kept did not exceed the threshold of 0.5.
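The greedy procedure described here could look roughly like the sketch below. It assumes (x1, y1, x2, y2) boxes, measures overlap as intersection area divided by union area, and sorts on the bottom right y co-ordinate; the names `overlap_ratio` and `nms` and these choices are assumptions for illustration, not the project's exact code.

```python
def overlap_ratio(a, b):
    """Intersection area divided by union area for (x1, y1, x2, y2) boxes."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, threshold=0.5):
    """Greedy NMS: keep a box only if it overlaps every kept box by <= threshold."""
    # Assumption: sort by the bottom right y co-ordinate, largest first.
    ordered = sorted(boxes, key=lambda box: box[3], reverse=True)
    kept = []
    for box in ordered:
        if all(overlap_ratio(box, other) <= threshold for other in kept):
            kept.append(box)
    return kept


boxes = [(10, 10, 60, 110), (12, 14, 62, 112), (200, 20, 250, 120)]
print(nms(boxes))  # [(200, 20, 250, 120), (12, 14, 62, 112)]
```

The first two boxes overlap by about 0.87, so only one of them survives. When the detector also returns a confidence score per box, sorting on that score instead of a co-ordinate is the more common choice.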
Time to load all the required libraries.
Then I created a function to append the four corner co-ordinates of all the bounding boxes to an empty list.
After that I created some parser arguments for the image location, the downscaling factor, visualization and the threshold.
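For illustration, such arguments could be declared with the standard library's `argparse` as sketched below. The flag names and default values are hypothetical; the project may use different ones.

```python
import argparse


def build_parser():
    """Build a command line parser; all flag names and defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Pedestrian detection with NMS")
    parser.add_argument("-i", "--image", required=True,
                        help="path to the input image")
    parser.add_argument("-d", "--downscale", type=float, default=1.25,
                        help="factor by which the image is downscaled")
    parser.add_argument("-v", "--visualize", action="store_true",
                        help="show intermediate detections")
    parser.add_argument("-t", "--threshold", type=float, default=0.5,
                        help="overlap threshold passed to NMS")
    return parser


# Parse an explicit list here so the sketch runs without a real command line.
args = build_parser().parse_args(["--image", "test.jpg", "--threshold", "0.3"])
print(args.image, args.downscale, args.visualize, args.threshold)
# test.jpg 1.25 False 0.3
```

In a real script you would call `parse_args()` with no arguments so that the values come from the command line.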
Then comes the core part of the algorithm. I used a pickle file containing a model that was trained on thousands of images, both with and without pedestrians. I also converted the image to greyscale and slid a window over it with a fixed stride.
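The striding step can be pictured as a sliding window that visits every position the classifier should score. The sketch below only generates the window positions; the 64×128 window, the 10 pixel step and the 300 wide by 200 high image are assumptions chosen for the example, and `sliding_window` is an illustrative name.

```python
def sliding_window(image_width, image_height, window_size, step_size):
    """Yield the (x, y) top left corner of every window that fits in the image."""
    win_w, win_h = window_size
    step_x, step_y = step_size
    for y in range(0, image_height - win_h + 1, step_y):
        for x in range(0, image_width - win_w + 1, step_x):
            yield x, y


# Assumed sizes: a 300x200 image scanned with a 64x128 window and a stride of 10.
corners = list(sliding_window(300, 200, (64, 128), (10, 10)))
print(len(corners), corners[0], corners[-1])
```

Each position would then be cropped from the greyscale image, turned into HOG features and passed to the trained classifier.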
After that I created a script to find and locate all the bounding boxes which were within a threshold as defined by the nms function as shown above. Please note the below script might look intimidating at first glance but it is just plain mathematics.
Finally, I wrote a bunch of lines to display both the images before and after NMS is applied and saved the output image which I got.
Evaluating Pedestrian Detection Model
Every image in an object detection problem could have different objects of different classes. Hence the standard precision metric used in image classification problems cannot be directly applied here. This is where mAP (mean Average Precision) comes into the picture.
Ground Truth
For any algorithm, the metrics are always evaluated in comparison to the ground truth data. We only know the Ground Truth information for the Training, Validation and Test datasets.
For object detection problems, the ground truth includes the image, the classes of the objects in it and the true bounding boxes of each of the objects in that image.
Calculating the mAP
Let’s say the original image and ground truth annotations are as we have seen above. The training and validation data have all images annotated in the same way. The model would return lots of predictions, but most of them would have a very low confidence score, so we only consider predictions above a certain confidence score. We run the original image through our model and this is what the object detection algorithm returns after confidence thresholding.
We first need to know how to judge the correctness of each of these detections. The metric that tells us the correctness of a given bounding box is the Intersection over Union (IoU).
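Confidence thresholding itself is a simple filter. The sketch below assumes each prediction is a dictionary with a `box` and a `score`; that layout, the function name and the 0.5 cut-off are assumptions for illustration.

```python
def filter_by_confidence(predictions, min_score=0.5):
    """Keep only the predictions whose confidence score is at least min_score."""
    return [p for p in predictions if p["score"] >= min_score]


# Made-up predictions, purely for illustration.
predictions = [
    {"box": (10, 10, 60, 110), "score": 0.92},
    {"box": (15, 12, 64, 115), "score": 0.31},
    {"box": (200, 20, 250, 120), "score": 0.77},
]
kept = filter_by_confidence(predictions, min_score=0.5)
print([p["score"] for p in kept])  # [0.92, 0.77]
```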
Calculating the IoU
Intersection over Union is the ratio between the area of intersection and the area of union of a predicted box and a ground truth box.
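Written as a formula, with B_p for the predicted box and B_gt for the ground truth box (symbols chosen here for illustration):

```latex
\mathrm{IoU}(B_p, B_{gt})
  = \frac{\operatorname{area}(B_p \cap B_{gt})}{\operatorname{area}(B_p \cup B_{gt})}
  = \frac{\operatorname{area}(B_p \cap B_{gt})}
         {\operatorname{area}(B_p) + \operatorname{area}(B_{gt}) - \operatorname{area}(B_p \cap B_{gt})}
```

As a worked example with made-up boxes, take B_p = (0, 0, 10, 10) and B_gt = (5, 5, 15, 15). Each box has area 100 and the intersection is a 5×5 square of area 25, so IoU = 25 / (100 + 100 - 25) = 25 / 175 ≈ 0.14. Under the commonly used IoU threshold of 0.5, this detection would be counted as a false positive.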
Results
Conclusions
Although algorithms like Mask R-CNN have pushed the boundaries and are considered state-of-the-art instance segmentation algorithms, problems like pedestrian detection still pose a lot of open challenges. Non Maximum Suppression still fails if the image contains a lot of people clustered in one location. This project is far from over. In fact, it has opened more questions than it has answered. The future of self driving cars relies a lot on efficient pedestrian detection algorithms.
References/Further Readings
Pedestrian Tracking in Real Time Using YOLOv3
Adaptive NMS: Refining Pedestrian Detection in a Crowd
Non-Maximum Suppression for Object Detection in Python - PyImageSearch
Before You Go
The corresponding source code can be found here.
abhinavsagar/Pedestrian-detection
Contacts
If you want to keep updated with my latest articles and projects, follow me on Medium.
Happy reading, happy learning and happy coding.
Bio: Abhinav Sagar is a senior year undergrad at VIT Vellore. He is interested in data science, machine learning and their applications to real-world problems.
Original. Reposted with permission.