Is Google Tensorflow Object Detection API the Easiest Way to Implement Image Recognition?

There are many different ways to do image recognition. Google recently released a new Tensorflow Object Detection API to give computer vision everywhere a boost.


By Priyanka Kochhar, Deep Learning Consultant

There are many different ways to do image recognition. Google recently released a new Tensorflow Object Detection API to give computer vision everywhere a boost. Any offering from Google is not to be taken lightly, so I decided to try my hand at this new API and use it on videos from YouTube :) See the result below:

Object Detection from Tensorflow API

You can find the full code on my GitHub repo.

I added a second phase for this project where I used the Tensorflow Object Detection API on a custom dataset to build my own toy aeroplane detector. You can check out my article at:

https://medium.com/@priya.dwivedi/building-a-toy-detector-with-tensorflow-object-detection-api-63c0fdf2ac95

So what was the experience like? First, let's understand the API.

Understanding the API

The API has been trained on the COCO dataset (Common Objects in Context). This dataset contains roughly 300k images covering 90 of the most commonly found object classes. Examples of these objects include:

Some of the object categories in the COCO dataset

The API provides five different models that offer a trade-off between speed of execution and accuracy in placing bounding boxes. See table below:

Here mAP (mean average precision) summarizes both precision and recall on detecting bounding boxes: for each object class, average precision is the area under the precision-recall curve, and mAP is the mean of that value across all classes. It’s a good combined measure for how sensitive the network is to objects of interest and how well it avoids false alarms. The higher the mAP score, the more accurate the network is, but that comes at the cost of execution speed.
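To make the metric concrete, here is a minimal sketch of how average precision and mAP can be computed from a ranked list of detections. It assumes each detection has already been marked as a true or false positive (for example by an IoU threshold against the ground truth), and it uses the simple non-interpolated form of average precision. The official COCO evaluation is more involved (it interpolates the curve and averages over several IoU thresholds), so treat this as an illustration rather than the exact scoring used in the table. The function names are my own.

```python
def average_precision(scored_hits, num_ground_truth):
    """Non-interpolated area under the precision-recall curve for one class.

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
    num_ground_truth: number of ground-truth boxes for this class.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            precision = true_positives / rank
            # Each true positive raises recall by 1 / num_ground_truth,
            # so add precision weighted by that recall step.
            ap += precision / num_ground_truth
    return ap


def mean_average_precision(per_class):
    """per_class maps class name -> (scored_hits, num_ground_truth)."""
    aps = [average_precision(hits, n_gt) for hits, n_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up detections for two classes, purely for illustration.
example = {
    "person": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "kite": ([(0.6, True)], 1),
}
example_map = mean_average_precision(example)
```

In the made-up example, the false positive ranked second for "person" pulls that class's average precision below 1, which in turn lowers the mean across the two classes.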

You can get more information about these models at this link

Using the API

I decided to try the most lightweight model (ssd_mobilenet). The main steps were:

Download the frozen model (.pb — protobuf) and load it into memory
Use the built-in helper code to load labels, categories, visualization tools, etc.
Open a new session and run the model on an image
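The steps above can be sketched roughly as follows. This is not the exact code from my repo: it is a minimal sketch that assumes TensorFlow is installed, that the frozen graph exposes the tensor names used in the API's demo notebook (image_tensor, detection_boxes, detection_scores, detection_classes), and that pb_path points at a frozen graph file you have already downloaded. The helper names are my own, and TensorFlow is imported inside the functions so the pure-Python filtering helper can be used on its own.

```python
def load_frozen_graph(pb_path):
    """Read a frozen .pb file into a new tf.Graph (pb_path is a placeholder)."""
    import tensorflow as tf  # imported lazily; needed only when this is called

    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.compat.v1.GraphDef()
        with tf.io.gfile.GFile(pb_path, "rb") as f:
            graph_def.ParseFromString(f.read())
        # name="" keeps the original tensor names from the frozen graph.
        tf.import_graph_def(graph_def, name="")
    return graph


def detect(graph, image):
    """Run one HxWx3 uint8 NumPy image through the graph.

    Tensor names are assumed to match the API's demo notebook.
    """
    import tensorflow as tf

    with tf.compat.v1.Session(graph=graph) as sess:
        boxes, scores, classes = sess.run(
            [
                graph.get_tensor_by_name("detection_boxes:0"),
                graph.get_tensor_by_name("detection_scores:0"),
                graph.get_tensor_by_name("detection_classes:0"),
            ],
            # The model expects a batch, so add a leading batch dimension.
            feed_dict={graph.get_tensor_by_name("image_tensor:0"): image[None, ...]},
        )
    # Drop the batch dimension again.
    return boxes[0], scores[0], classes[0]


def keep_confident(boxes, scores, classes, min_score=0.5):
    """Keep only detections whose score is at least min_score."""
    return [
        (box, float(score), int(cls))
        for box, score, cls in zip(boxes, scores, classes)
        if score >= min_score
    ]
```

Opening a session for every image keeps the sketch short; when processing many images it is usually better to create the session once and reuse it.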

Overall a fairly simple set of steps. The API documentation also provides a handy Jupyter notebook that walks through the main steps.

The model had pretty good performance on the sample image (see below):

Running on Videos

Next I decided to try this API on some videos. To do this, I used the Python moviepy library. The main steps are:

Use the VideoFileClip function to extract images from the video
The fl_image function is an awesome function that applies a function of your choice to every frame of the clip and replaces each frame with the modified image. I used this to run object detection on every image extracted from the video
Finally all the modified clip images were combined into a new video
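A rough sketch of this video pipeline is shown below. It assumes moviepy 1.x (where VideoFileClip lives in moviepy.editor and clips have fl_image), frames that arrive as HxWx3 NumPy arrays, and a hypothetical detect_fn that returns (box, score) pairs with boxes in the normalized [ymin, xmin, ymax, xmax] format the API outputs. The API ships its own visualization helpers; draw_box here is a simplified stand-in of my own so the example stays self-contained.

```python
def to_pixels(box, height, width):
    """Convert a normalized (ymin, xmin, ymax, xmax) box to pixel coordinates."""
    ymin, xmin, ymax, xmax = box
    return (int(ymin * height), int(xmin * width),
            int(ymax * height), int(xmax * width))


def draw_box(frame, box, color=(0, 255, 0), thickness=2):
    """Return a copy of an HxWx3 NumPy frame with one rectangle drawn on it."""
    out = frame.copy()
    top, left, bottom, right = to_pixels(box, frame.shape[0], frame.shape[1])
    out[top:top + thickness, left:right] = color                # top edge
    out[max(bottom - thickness, 0):bottom, left:right] = color  # bottom edge
    out[top:bottom, left:left + thickness] = color              # left edge
    out[top:bottom, max(right - thickness, 0):right] = color    # right edge
    return out


def make_frame_processor(detect_fn, min_score=0.5):
    """Build the per-frame function that fl_image will call.

    detect_fn is a hypothetical callable: frame -> iterable of (box, score).
    """
    def process_frame(frame):
        for box, score in detect_fn(frame):
            if score >= min_score:
                frame = draw_box(frame, box)
        return frame
    return process_frame


def annotate_video(input_path, output_path, process_frame):
    """Apply process_frame to every frame and write the result to a new video."""
    from moviepy.editor import VideoFileClip  # moviepy 1.x import path

    clip = VideoFileClip(input_path)
    annotated = clip.fl_image(process_frame)  # runs process_frame on each frame
    annotated.write_videofile(output_path, audio=False)
```

With a real detector wrapped in detect_fn, a call such as annotate_video("input.mp4", "output.mp4", make_frame_processor(detect_fn)) would produce the annotated clip; both file names are placeholders.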

This code takes a while to run: about 1 minute for a 3–4 second clip. But since we are using a frozen model loaded into memory, all of this can be done on a computer without a GPU.

I was very impressed! With just a little bit of code, you can detect and draw bounding boxes on a good number of commonly found objects with decent accuracy.

There were cases where I felt that the performance could be better. See example below. The birds are not detected at all in this video.

Next Steps

A couple of additional ideas for further exploration of this API:

Try the more accurate but higher-overhead models and see how much of a difference they make
Find ways of speeding up the API, so it can be used for real-time object detection on a mobile device
Google also provides the ability to use these models for transfer learning, i.e., load the frozen models and add another output layer with different image categories

Give me a ❤️ if you liked this post:) Hope you pull the code and try it yourself.

Other writings: https://medium.com/@priya.dwivedi/

PS: I have my own deep learning consultancy and love to work on interesting problems. I have helped several startups deploy innovative AI based solutions. If you have a project that we can collaborate on, then please contact me at priya.toronto3@gmail.com.

Bio: Priyanka Kochhar has been a data scientist for 10+ years. She now has her own deep learning consultancy and loves to work on interesting problems. She has helped several startups deploy innovative AI based solutions. If you have a project that she can collaborate on then please contact her at priya.toronto3@gmail.com.

Original. Reposted with permission.
