People Tracking using Deep Learning
Read this overview of people tracking and how deep learning-powered computer vision has allowed for phenomenal performance.
By Priyanka Kochhar, Deep Learning Consultant
Object Tracking is an important domain in computer vision. It involves the process of tracking an object which could be a person, ball or a car across a series of frames. For people tracking we would start with all possible detections in a frame and give them an ID. In subsequent frames we try to carry forward a person’s ID. If the person has moved away from the frame then that ID is dropped. If a new person appears then they start off with a fresh ID.
This is a difficult task since people could look similar causing the model to switch IDs, people may get occluded as in when a pedestrian or player gets hidden behind someone else or objects may disappear and reappear in later frames.
Basics of Tracking
Lets first start by reviewing the basics of tracking. Lets assume we have bounding box information for all objects in the frame. In real world applications we need to do bounding box detections in advance so tracker needs to be combined with a detector. But for now lets assume we are only working on tracking. Given bbox information for an ID in frame 1, how do we assign IDs in subsequent frames?
- Centroid based ID assignment — In its simplest form, we can assign IDs by looking at the bounding box centroids. We do this by calculating centroids for each bounding box in frame 1. In frame 2, we look at the new centroids and based on the distance from previous centroids we can assign IDs by looking at relative distance. The basic assumption is that frame to frame centroids would only move a little bit. This simple approach works quite well as long as centroids are spaced apart from each other. As you can imagine this approach fails when people are close to each other since it may switch IDs then
- Kalman Filter — Kalman Filter is an improvement over simple centroid based tracking. This blog does a great job of explaning a kalman filter. Kalman Filter allows us to model tracking based on the position and velocity of an object and predict where it is likely to be. It models future position and velocity using gaussians. When it receives a new reading it can use probability to assign the measurement to its prediction and update itself. It is light in memory and fast to run. And since it uses both position and velocity of motion, it has better results than the centroid based tracking.
Deep Sort Algorithm
I love the deep sort algorithm. It is so intuitive. In all of the above math, one fundamental thing that is missing that we humans use all the time in tracking is a visual understanding of that bounding box. We track based on not just distance, velocity but also what that person looks like. Deep sort allows us to add this feature by computing deep features for every bounding box and using the similarity between deep features to also factor into the tracking logic.
How do we compute deep features?
This paper uses a model trained on millions of human images and extracts a 128 dim vector for each bounding box which should capture the key features of the box. Using deep features allows this model to track much better in cases where people are occluding or are very close as in the image below.
Overall our experiments show that deep sort algorithm does work very well but it does have some limitations:
- If the bounding boxes are too big than too much of background is “captured” in the features reducing the effectiveness of the algorithm
- If people are dressed similarly as happens in sports that can results in similar features and ID switching
Tracking is an important problem in Computer Vision with tons of applications. Deep Sort algorithm is quite powerful and fast to run and is a good starting point for many cases. Their repo is very well written and easy to try. I encourage you to pull their code and give it a shot.
PS: I have my own deep learning consultancy and love to work on interesting problems. I have helped many startups deploy innovative AI based solutions. Check us out at — http://deeplearninganalytics.org/.
You can also see my other writings at: http://deeplearninganalytics.org/blog
If you have a project that we can collaborate on, then please contact me through my website or at [email protected]
Bio: Priyanka Kochhar has been a data scientist for 10+ years. She now has her own deep learning consultancy and loves to work on interesting problems. She has helped several startups deploy innovative AI based solutions. If you have a project that she can collaborate on then please contact her at [email protected].
Original. Reposted with permission.
- Building a Toy Detector with Tensorflow Object Detection API
- Is Google Tensorflow Object Detection API the Easiest Way to Implement Image Recognition?
- Using Tensorflow Object Detection to do Pixel Wise Classification