A TensorFlow Modeling Pipeline Using TensorFlow Datasets and TensorBoard

This article investigates TensorFlow components for building a toolset to make modeling evaluation more efficient. Specifically, TensorFlow Datasets (TFDS) and TensorBoard (TB) can be quite helpful in this task.

By Stephen Godfrey, AICamp Student

While completing a highly informative AICamp online class taught by Tyler Elliot Bettilyon (TEB) called Deep Learning for Developers, I got interested in creating a more structured way for machine-learning model builders — like me as the student — to understand and evaluate various models and observe their performance when applied to new datasets. Since this particular class focused on TensorFlow (TF), I started to investigate TF components for building a toolset to make this type of modeling evaluation more efficient. In doing so, I learned about two components, TensorFlow Datasets (TFDS) and TensorBoard (TB), that can be quite helpful and this blog post discusses their application in this task. See the References section for links to AICamp, TEB and other useful resources.



While the term ‘pipeline’ may have several meanings when used in a data science context, I use it here to mean a modeling pipeline or set of programmatic components that can automatically complete end-to-end modeling from loading data, applying a pre-determined model and logging performance results. The goal is to set up a number of modeling tests and to automatically run the pipeline for each test. Once the models are trained, each test result can be easily compared to the others. In summary, the objective is to establish an efficient, organized and methodical mechanism for model testing.


Figure 1: The logical flow of the modeling pipeline


This approach is depicted in Figure 1. The pipeline consists of three steps:

  1. Data: Loading and processing a dataset,
  2. Analysis: Building predefined models and applying to this dataset,
  3. Results: Capturing key metrics for each dataset-model test for methodical comparison later.
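The three steps above can be sketched as a simple orchestration loop. The callables `load_dataset`, `build_model` and `log_results` below are hypothetical placeholders standing in for the custom wrappers discussed later, not names from the notebook itself.

```python
# A minimal sketch of the three-step pipeline: Data -> Analysis -> Results.
def run_pipeline(tests, load_dataset, build_model, log_results):
    """Run each named (dataset, model) test end to end and collect results."""
    results = {}
    for test_name, (dataset_name, model_name) in tests.items():
        data = load_dataset(dataset_name)             # 1. Data
        model = build_model(model_name, data)         # 2. Analysis
        results[test_name] = log_results(test_name, model, data)  # 3. Results
    return results
```

Once the test configurations are declared in a dictionary, adding a new dataset-model comparison is just one more entry.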

Any analyst who has studied or even dabbled in deep-learning neural networks has probably experienced the seemingly boundless array of modeling choices. Any number of layer types, each with a multitude of configuration options, can be interconnected, and once stacked, the model can be trained using multiple optimization routines and numerous hyperparameters. And then there is the question of data, since it may be desirable to apply promising models to new datasets to observe their performance on unseen data or to gain a foundation for further model iterations.

For this application, I worked exclusively with image-classification data and models. TFDS includes audio, image, object-detection, structured, summarization, text, translate and video datasets, and deep-learning models can be constructed specifically for these problem types. While the out-of-the-box code presented here will require some modification and testing to be applied to other dataset types, its foundational framework will still be helpful.



The code in this post is summarized in Table 1 and is built on TensorFlow 2.0 (product release September 2019) and two of its components, TensorFlow Datasets and TensorBoard. Keras, the high-level API for TensorFlow, is now deeply integrated with TF 2.x, and many of the tools used here rely on Keras components.
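The build-then-compile pattern below shows how that Keras integration looks in TF 2.x; the layer sizes are arbitrary, chosen only to illustrate the pattern used throughout the pipeline, not the models tested later.

```python
import tensorflow as tf

# A minimal tf.keras model: define layers, then compile with an optimizer,
# loss and metrics before training with model.fit(...).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),          # MNIST-sized grayscale input
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```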


Table 1: The key TensorFlow components used in the modeling pipeline



As it turns out, building the intended modeling pipeline required a fair bit of coding. While simple or straightforward applications of these TF modules can be deployed with little effort, using them in a robust pipeline in which both the data and models are expected to change programmatically requires some custom wrapping and implementation components to orchestrate their use. I’ve made my code available in a shared Google Colab notebook (TF_Modeling_Pipeline) accessible to all. However, I should note that I use Colab Pro (a subscription service) and run this notebook on a GPU device.

While this notebook contains a lot of code, I tried to cleanly organize and thoroughly document the work to make it a bit more navigable. In addition, I added extensive reference links throughout the documentation to recognize sources and to provide quick access to more information.


Figure 2: Key custom wrappers calling the TensorFlow, TensorFlow Datasets and TensorBoard modules


As noted, the goal here is to build an automated pipeline. With that objective in mind, I employed several custom wrappers to handle the data loading and processing, model building and training, and results logging processes. These are depicted in Figure 2 and discussed in Table 2.
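The division of labor among those wrappers can be sketched as three small classes. The class names below are illustrative stand-ins for the notebook's wrappers, not its exact class names, and the bodies are placeholders for the TF calls noted in the comments.

```python
class DataWrapper:
    """Loads and preprocesses one TFDS dataset for a given test."""
    def __init__(self, dataset_name):
        self.dataset_name = dataset_name
    def load(self):
        # Would call tfds.load(...) and apply resizing/normalization here.
        return f"<{self.dataset_name} dataset>"

class ModelWrapper:
    """Builds, compiles and trains one predefined model."""
    def __init__(self, model_name):
        self.model_name = model_name
    def train(self, data):
        # Would build a tf.keras model and call model.fit(..., callbacks=[...]).
        return {"model": self.model_name, "trained_on": data}

class ResultsLogger:
    """Writes per-test metrics and images to a TensorBoard log directory."""
    def log(self, test_name, outcome):
        # Would use tf.summary writers and TB callbacks here.
        return {test_name: outcome}
```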


Table 2: Key custom wrappers calling the TensorFlow, TensorFlow Datasets and TensorBoard modules



To demonstrate the pipeline, I set up the following tests; the dataset names can be found in the TFDS Catalog.


Table 3: Test configurations examining two models applied to two datasets


Reviewing the thought process in establishing these demonstration tests exemplifies the benefits of this pipeline approach. After applying Tyler Bettilyon’s class and TF example model (TBConvTest) to the MNIST dataset and finding good results, I wanted to see its performance with a color-image dataset. I ended up choosing the Malaria data. As you can see from the TFDS Catalog links or the dataset-information class, TFDS datasets come with reference material.

That material proved quite valuable, since the early tests of the TBConvTest model on the Malaria dataset produced poor results. This led me to the work of Sivaramakrishnan Rajaraman et al. and their paper and GitHub repository. There I found helpful resources on their approach to classifying these Malaria image data, and the VGG16Test model is equivalent to one of theirs.
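A VGG16-based binary classifier in the spirit of the VGG16Test model can be sketched with `tf.keras.applications`. Rajaraman et al. fine-tune pretrained weights; here `weights=None` keeps the sketch download-free, and the input size and classification head are assumptions for illustration, not their published configuration.

```python
import tensorflow as tf

# VGG16 convolutional base without its original classifier head.
base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(64, 64, 3))

# Attach a small binary head: parasitized vs. uninfected.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```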



The training data image examples in Figures 3 and 4 are saved in logs and available through TB’s IMAGES feature (see step 5 in the modeling pipeline code).
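Writing those image examples into the logs is a matter of calling `tf.summary.image` under a log directory that TensorBoard watches. The random tensor below is a stand-in for real dataset examples, and the temporary log path is arbitrary.

```python
import os
import tempfile

import tensorflow as tf

# Create a summary writer pointed at a log directory TB can read.
logdir = os.path.join(tempfile.mkdtemp(), "images")
writer = tf.summary.create_file_writer(logdir)

# Stand-in for a batch of training images (batch, height, width, channels).
images = tf.random.uniform((4, 28, 28, 1))

with writer.as_default():
    tf.summary.image("training_examples", images, step=0, max_outputs=4)
writer.flush()
```

Pointing `tensorboard --logdir` at the parent directory then surfaces these under the IMAGES tab.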


Figure 3: MNIST image examples



Figure 4: Malaria image examples



The model graphs in Figures 5 and 6 are created by TB callbacks and are available in TB’s GRAPHs feature (see step 7 in the modeling pipeline code).
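Those graphs come from the standard Keras TensorBoard callback; a sketch of attaching it is below. The log directory name is arbitrary, though giving each test its own subdirectory is what makes runs comparable side by side in TB.

```python
import tensorflow as tf

tb_callback = tf.keras.callbacks.TensorBoard(
    log_dir="logs/test_1",   # one subdirectory per test
    histogram_freq=1,        # also log weight histograms each epoch
    write_graph=True,        # emit the model graph for the GRAPHS tab
)

# Then pass it to training:
# model.fit(train_ds, epochs=10, callbacks=[tb_callback])
```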


Figure 5: The TBConvTest model



Figure 6: The VGG16Test model



All results for these tests are stored in TensorBoard and can be recreated by running the TF_Modeling_Pipeline Colab notebook. As an overview, the screenshot in Figure 7 shows the TB dashboard. As you can see, there are several features, some of which we have already discussed, accessible via the header menu. In this image, we are looking at the key accuracy metrics for each modeling test under SCALARS.


Figure 7: The TB dashboard and SCALARS showing prediction accuracy for each test


From the annotations (which I added), we can quickly compare models and make a few observations. First, the stark performance difference between test_1 and test_2 clearly shows that the TBConvTest model does not extend to the Malaria dataset. Since the Malaria test is a binary classification, the 50% accuracy is no better than guessing and the model offers no explanatory power. Second, we can see the incremental improvement from using the VGG16Test model on the Malaria dataset (test_3). Third, we quickly appreciate the improvement of all models with training by viewing the accuracy rate at each epoch (x axis). Considering that Rajaraman et al. trained for at least 100 epochs, it is interesting to examine the slow accuracy growth rate of test_3 and wonder how or when the benefits of additional training would manifest.

It is also helpful to view our customized metric, the confusion matrix. Given our TB configuration (see step 7 in the modeling pipeline code), it is available in the IMAGES tab. This view shows the matrix after 10 epochs, but the top slider allows the user to walk through the confusion-matrix changes after each epoch. From this normalized matrix, we note that accuracy on the uninfected class is 80% versus only 57% for the parasitized class, which may provide insight into additional model changes and the appropriate use of this model.
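The normalization behind that matrix is worth making explicit: dividing each row (true class) by its total makes the diagonal read directly as per-class accuracy. A minimal sketch follows; the notebook's own implementation renders the matrix as a figure and writes it to the logs with `tf.summary.image`, which is not shown here.

```python
import numpy as np

def normalized_confusion_matrix(y_true, y_pred, num_classes):
    """Row-normalized confusion matrix: cm[i, j] is the fraction of
    true-class-i examples predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm / cm.sum(axis=1, keepdims=True)

# Illustrative binary example (0 = parasitized, 1 = uninfected):
cm = normalized_confusion_matrix([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0], 2)
```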


Figure 8: The confusion matrix for test_3 after 10 epochs



Building a semi-automated, organized and methodical pipeline to test machine-learning models and to validate their performance on multiple datasets can be a useful tool for the modeler. TensorFlow Datasets and TensorBoard are two components in the TensorFlow 2.x development suite that provide substantial pipeline functionality and can serve as a foundation for such TensorFlow pipelines. However, in order to achieve automation benefits, they should be wrapped in custom programming components that provide both the efficiency and visibility needed to effectively build deep-learning models.


  • If you want to learn more about AICamp or the Deep Learning class instructor, Tyler Elliot Bettilyon (TEB), visit them here: AICamp and Teb’s Lab.



Shared with permission, original post.

Bio: Stephen Godfrey is a student at AICamp. He is a Technical Product Manager & Data Scientist.
