Comparing MobileNet Models in TensorFlow

MobileNets are a family of mobile-first computer vision models for TensorFlow, designed to effectively maximize accuracy while being mindful of the restricted resources for an on-device or embedded application.

comments

By Harshit Dwivedi, Android Instructor

Header image

In recent years, neural networks and deep learning have sparked tremendous progress in the field of natural language processing (NLP) and computer vision.

While many of the face, object, landmark, logo, and text recognition and detection technologies are provided for Internet-connected devices, we believe that the ever-increasing computational power of mobile devices can enable the delivery of these technologies into the hands of users anytime, anywhere, regardless of Internet connection.

However, computer vision for on-device and embedded applications faces many challenges — models must run quickly with high accuracy in a resource-constrained environment, making use of limited computation, power, and space.

TensorFlow offers various pre-trained models, such as drag-and-drop models, in order to identify approximately 1,000 default objects.

When compared with other similar models, such as the Inception model datasets, MobileNet works better with latency, size, and accuracy. In terms of output performance, there is a significant amount of lag with a full-fledged model.

However, the trade-off is acceptable when the model is deployable on a mobile device for real-time offline detection.

Let’s look at an example of how to use MobileNet. We will write a simple classifier to find Pikachu in an image. The following are sample pictures showing an image of Pikachu and an image without Pikachu:

Pikachu

Not Pikachu, assuming there’s no Pikachu to collect in Pokémon Go…

Building the dataset

To build our own classifier, we need to have datasets that contain images with and without Pikachu.

Let’s start with 1,000 images on each database. You can pull such images here:

CC Search
Creative Commons licenses provide a flexible range of protections and freedoms for authors, artists, and educators.
search.creativecommons.org

Next up, let’s create two folders named pikachu and no-pikachu and drop those images accordingly.

Another handy dataset containing images for all the generation one Pokémon can be found here:

Pokemon Generation One
Gotta train 'em all!
www.kaggle.com

Now we have an image folder, which is structured as follows:

/dataset/
/pikachu/[image1,..]
/no-pikachu/[image1,..]

Retraining Images

We can now start labeling our images. With TensorFlow, this job becomes easier. Assuming that TensorFlow is installed on the training machine already, download the following retraining script:

curl https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py

Next up, we’ll retrain the image with this Python script :

python retrain.py \
--image_dir ~/MLmobileapps/Chapter5/dataset/ \
--learning_rate=0.0001 \
--testing_percentage=20 \
--validation_percentage=20 \
--train_batch_size=32 \
--validation_batch_size=-1 \
--eval_step_interval=100 \
--how_many_training_steps=1000 \
--flip_left_right=True \
--random_scale=30 \
--random_brightness=30 \
--architecture mobilenet_1.0_224 \
--output_graph=output_graph.pb \
--output_labels=output_labels.txt

Note : If you set validation_batch_size to -1, it will validate the whole dataset. learning_rate = 0.0001 works well. You can adjust and try this for yourself.

In the architecture flag, we choose which version of MobileNet to use, from versions 1.0, 0.75, 0.50, and 0.25. The suffix number 224 represents the image resolution. You can specify 224, 192, 160, or 128 as well.

Model conversion from GraphDef to TFLite

TOCO Converter is used to convert from a TensorFlow GraphDef file or SavedModel into either a TFLite FlatBuffer or graph visualization.

(TOCO stands for TensorFlow Lite Optimizing Converter.)

We need to pass the data through command-line arguments. There are a few command-line arguments that can be passed in while converting the model:

--output_file OUTPUT_FILE
Filepath of the output tflite model.

--graph_def_file GRAPH_DEF_FILE
Filepath of input TensorFlow GraphDef.

--saved_model_dir
Filepath of directory containing the SavedModel.

--keras_model_file
Filepath of HDF5 file containing tf.Keras model.

--output_format {TFLITE,GRAPHVIZ_DOT}
Output file format.

--inference_type {FLOAT,QUANTIZED_UINT8}
Target data type in the output

--inference_input_type {FLOAT,QUANTIZED_UINT8}
Target data type of real-number input arrays.

--input_arrays INPUT_ARRAYS
Names of the input arrays, comma-separated.

--input_shapes INPUT_SHAPES
Shapes corresponding to --input_arrays, colon-separated.

--output_arrays OUTPUT_ARRAYS
Names of the output arrays, comma-separated.

We can now use the TOCO tool to convert the TensorFlow model into a TensorFlow Lite model:

toco \
--graph_def_file=/tmp/output_graph.pb
--output_file=/tmp/retrained_model.tflite
--input_arrays=Mul
--output_arrays=final_result
--input_format=TENSORFLOW_GRAPHDEF
--output_format=TFLITE
--input_shape=1,${224},${224},3
--inference_type=FLOAT
--input_data_type=FLOAT

Similarly, we can use the MobileNet model in similar applications; for example, in the next section, we’ll be looking at a gender model and an emotion model.

Gender Model

This model uses the IMDB WIKI dataset, which contains 500k+ celebrity faces. It uses the MobileNet_V1_224_0.5 version of MobileNet.

It is very rare to find public datasets with thousands of images. This dataset is built on top of a large collection of celebrity faces. There are two common places: one is IMDb and the other one is Wikipedia. More than 100K celebrities’ details were retrieved from their profiles from both sources through scripts.

Then it was organized by removing noise (irrelevant content). All the images without a timestamp were removed, assuming that images with a single photo are likely to show the person with correct birth date details. At the end, there were 460,723 faces from 20,284 celebrities from IMDb, and 62,328 from Wikipedia, for a total of 523,051.

Emotion model

This is built on the AffectNet model with more than 1 million images. It uses the MobileNet_V2_224_1.4 version of MobileNet.

The link to the data model project can be found here:

AffectNet - Mohammad H. Mahoor, PhD
Currently the test set is not released. We are planning to organize a challenge on AffectNet in near future and the…
mohammadmahoor.com

The AffectNet model is built by collecting and annotating facial images of more than 1 million faces from the Internet. The images were sourced from three search engines, using around 1,250 related keywords in six different languages.

Among the collected images, half of the images were manually annotated for the presence of seven discrete facial expressions (categorical model) and the intensity of valence and arousal (dimensional model).

Comparison of MobileNet Versions

In both of the above models, different versions of MobileNet models are used. MobileNet V2 is mostly an updated version of V1 that makes it even more efficient and powerful in terms of performance.

Note: Lower is better

MACs are multiply-accumulate operations, which measure how many calculations are needed to perform inference on a single 224×224 RGB image.

From the number of MACs alone, V2 should be almost twice as fast as V1. However, it’s not just about the number of calculations. On mobile devices, memory access is much slower than computation. V2 only has 80% of the parameter count that V1 has hence making it better than V1.

By seeing the results we can assume that V2 is almost twice as fast as V1 model. On a mobile device when memory access is limited, the computational capability of V2 works very well.

In terms of accuracy:

Here MobileNet V2 is slightly, if not significantly, better than V1.

Conclusion

MobileNets are a family of mobile-first computer vision models for TensorFlow, designed to effectively maximize accuracy while being mindful of the restricted resources for an on-device or embedded application.

MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings, and segmentation, similar to how other popular large scale models, such as Inception, are used.

If you want to go ahead and fuel your curiosity, a bunch of pre trained models can be found here :

tensorflow/models
Models and examples built with TensorFlow. Contribute to tensorflow/models development by creating an account on…
github.com

Also, here’s a blog post outlining how you can build a real like Pokémon classifier using MobileNets and TensorFlow Lite:

Building “Pokédex” in Android using TensorFlow Lite and Firebase’s ML Kit
heartbeat.fritz.ai

Thanks for reading! If you enjoyed this story, please click the ???? button and share to help others find it! Feel free to leave a comment ???? below. Have feedback? Let’s connect on Twitter.

If you found this article interesting, you can explore Machine Learning Projects for Mobile Applications. Written by Karthikeyan MG, an ML expert, Machine Learning Projects for Mobile Applications presents the implementation of 7 practical, real-world projects that will teach you how to leverage TensorFlow Lite and Core ML to perform efficient machine learning on a cross-platform mobile OS.

Want to start building amazing Android Apps? Check out my course on Coding Bocks: https://online.codingblocks.com/courses/android-app-training-online

Ready to dive into some code? Check out Fritz on GitHub. You’ll find open source, mobile-friendly implementations of the popular machine and deep learning models along with training scripts, project templates, and tools for building your own ML-powered iOS and Android apps.

Join us on Slack for help with technical problems, to share what you’re working on, or just chat with us about mobile development and machine learning. And follow us on Twitter and LinkedIn for the all the latest content, news, and more from the mobile machine learning world.

Thanks to Austin Kodra.

Bio: Harshit Dwivedi has an *approximate* knowledge of many things. He's an Android instructor at Coding Blocks, a contributing author for Heartbeat, by Fritz, public speaker, & Astrophysics enthusiast.

Original. Reposted with permission.

Related: