Object Detection: An Overview in the Age of Deep Learning

Like many other computer vision problems, there still isn’t an obvious or even “best” way to approach the problem of object recognition, meaning there’s still much room for improvement.

By Javier Rey, Research Engineer at Tryolabs.

There’s no shortage of interesting problems in computer vision, from simple image classification to 3D-pose estimation. One of the problems we’re most interested in and have worked on a bunch is object detection. Like many other computer vision problems, there still isn’t an obvious or even “best” way to approach the problem, meaning there’s still much room for improvement. Before getting into object detection, let’s do a quick rundown of the most common problems in the field.

Object detection vs. other computer vision problems



Classification

Image classification is probably the most well-known problem in computer vision. It consists of classifying an image into one of many different categories. One of the most popular datasets used in academia is ImageNet, composed of millions of classified images, (partially) utilized in the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competition. In recent years, classification models have surpassed reported human performance on this benchmark, and the problem has come to be considered practically solved. While there are plenty of challenges to image classification, there are also plenty of write-ups on how it’s usually solved and which challenges remain.

Classification example



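Localization

Similar to classification, localization deals with a single object, but instead of assigning a category it finds where that object is located inside the image, usually as a bounding box.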

Localization example

Localization is useful for many real-life problems, for example smart cropping (knowing where to crop images based on where the object is located) or extracting the object for further processing with other techniques. It can be combined with classification to not only locate the object but also categorize it into one of many possible categories.
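To make the smart cropping idea concrete, here is a minimal sketch of how a crop region could be derived from a localized object. The `Box` type, the `smart_crop` function and the 10% margin are our own illustrative assumptions, not part of any particular library.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical layout)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def smart_crop(box: Box, image_width: int, image_height: int,
               margin: float = 0.1) -> Box:
    """Grow the box by a relative margin, then clamp it to the image bounds."""
    pad_x = round((box.x_max - box.x_min) * margin)
    pad_y = round((box.y_max - box.y_min) * margin)
    return Box(
        max(0, box.x_min - pad_x),
        max(0, box.y_min - pad_y),
        min(image_width, box.x_max + pad_x),
        min(image_height, box.y_max + pad_y),
    )


# A 200x100 object close to the right edge of a 260x200 image.
crop = smart_crop(Box(50, 40, 250, 140), image_width=260, image_height=200)
print(crop)
```

The margin keeps some context around the object, and the clamping step makes sure the crop never extends past the image borders.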


Instance segmentation

Going one step further than object detection, we want to not only find the objects inside an image but also produce a pixel-by-pixel mask for each detected object. We refer to this problem as instance or object segmentation.
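One way to see why a mask is a richer output than a bounding box is that the box can always be recovered from the mask, but not the other way around. The sketch below assumes a mask stored as rows of 0/1 values; the `mask_to_box` helper is hypothetical and only meant as an illustration.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values, one entry per pixel (assumed format)


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max), inclusive, or None for an empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return min(xs), min(ys), max(xs), max(ys)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = mask_to_box(mask)
object_pixels = sum(map(sum, mask))  # number of pixels that belong to the object
print(box, object_pixels)
```

The box here covers six pixels while the object itself only occupies five of them, which is exactly the information a bounding box throws away.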


Object detection

Iterating over the problem of localization plus classification, we end up with the need to detect and classify multiple objects at the same time. Object detection is the problem of finding and classifying a variable number of objects in an image. The important difference is the “variable” part. In contrast with problems like classification, the output of object detection is variable in length, since the number of objects detected may change from image to image. In this post we’ll go into the details of practical applications, what the main issues of object detection as a machine learning problem are, and how the way to tackle it has been shifting in recent years with deep learning.
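The “variable” part is easiest to see by comparing the shape of the outputs. The following is a small sketch with made-up values; the `Detection` class and the `(x_min, y_min, x_max, y_max)` box layout are assumptions we chose for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; this layout is an assumption.
BoundingBox = Tuple[float, float, float, float]


@dataclass
class Detection:
    label: str
    score: float
    box: BoundingBox


# Classification: exactly one label per image.
classification_output: str = "cat"

# Localization: exactly one box per image.
localization_output: BoundingBox = (34.0, 20.0, 210.0, 180.0)

# Detection: a list whose length changes from image to image.
detections_per_image: List[List[Detection]] = [
    [Detection("cat", 0.97, (34.0, 20.0, 210.0, 180.0)),
     Detection("dog", 0.88, (220.0, 45.0, 400.0, 190.0))],
    [],  # an image without any object is a valid result too
    [Detection("person", 0.91, (10.0, 5.0, 120.0, 300.0))],
]

lengths = [len(detections) for detections in detections_per_image]
print(lengths)  # [2, 0, 1]
```

A model with a fixed-size output can produce the first two results directly, but the third needs some mechanism to deal with a list of unknown length.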

Object detection example


Practical uses

At Tryolabs we specialize in applying state-of-the-art machine learning to solve business problems, so even though we love all the crazy machine learning research problems, at the end of the day we end up worrying a lot more about the applications.

Even though object detection is still a somewhat new tool in the industry, there are already many useful and exciting applications using it.


Face detection

Since the mid-2000s, some point-and-shoot cameras have come with face detection for more efficient autofocus. While it’s a narrower type of object detection, the methods used apply to other types of objects, as we’ll describe later.



Counting

One simple but often ignored use of object detection is counting. The ability to count people, cars, flowers, and even microorganisms is a real-world need shared by many different types of systems that work with images. With the ongoing surge of video surveillance devices, there’s a bigger opportunity than ever to turn that raw information into structured data using computer vision.
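Once a detector is available, counting comes down to a few lines. The sketch below assumes detections arrive as `(label, score)` pairs and that low-confidence results should be ignored; the `count_objects` helper and the 0.5 threshold are illustrative choices, not a recommendation.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_objects(detections: Iterable[Tuple[str, float]],
                  min_score: float = 0.5) -> Counter:
    """Count detections per label, skipping those below the score threshold."""
    return Counter(label for label, score in detections if score >= min_score)


# Made-up detections for a single video frame.
frame_detections = [
    ("person", 0.92),
    ("car", 0.81),
    ("person", 0.45),  # too uncertain, not counted
    ("person", 0.77),
]
counts = count_objects(frame_detections)
print(counts["person"], counts["car"])  # 2 1
```

Running this per frame and storing the counts with a timestamp is one simple way to turn raw footage into structured data.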


Visual Search Engine

Finally, one use case we’re fond of is the visual search engine of Pinterest. They use object detection as part of the pipeline for indexing different parts of the image. This way, when searching for a specific purse, you can find similar purses in different contexts. This is much more powerful than just finding similar images, like Google Images’ reverse image search does.

Pinterest example used in paper. Jing, Yushi, et al. "Visual Search at Pinterest."


Aerial image analysis

In the age of cheap drones and (close to) affordable satellite launches, there has never been so much data of our world from above. There are already companies using satellite imagery from providers like Planet and Descartes Labs, applying object detection to count cars, trees and ships. This has resulted in high-quality data, which was impossible (or extremely expensive) to get before, now reaching a broader audience.

Some companies are using drone footage for automatic inspections of hard-to-reach places (e.g. BetterView) or using object detection for general-purpose analysis (e.g. TensorFlight). On top of this, some companies add automatic detection and location of problems without the need for human intervention.

Car, tree and pedestrian detection using TensorFlight. Source: TensorFlight.