The Good, Bad & Ugly of TensorFlow

A survey of six months of using TensorFlow through its rapid evolution (+ tips/hacks and code to fix the ugly stuff). Get some great advice from the trenches.

By Daniel Kuster, indico.

We’ve been using TensorFlow in daily research and engineering since it was released almost six months ago. We’ve learned a lot of things along the way. Time for an update!


Because there are many subjective articles on TensorFlow and not enough helpful documentation, I’ve sprinkled in examples, tutorials, docs, and code snippets wherever possible.

The Good

Community engagement is the most important thing.

When it comes to machine learning, it is easy to focus on the tech (features, capabilities, benchmarks, etc.). But good programmers know it is much harder to write code that humans will use than code that a machine can compile and execute. So my favorite thing about TensorFlow is the simple fact that everyone in the machine learning community is aware of it, most are open to trying it, and, hopefully, enough of us will use it to make useful things. More minds solving problems, more shoulders to stand upon!

A large number of developers and students are now interested in deep learning because they heard about TensorFlow. Google DeepMind recently announced they’ll be migrating from Torch to TensorFlow, so we might see an uptick in TensorFlow reinforcement learning models being released in the near future, too. The future is bright when the community embraces openness, clean APIs, useful modules, and the attitude of being helpful on the internet.

Technical blocking factors have been mostly eliminated.

When we wrote the first post evaluating TensorFlow in November of last year, there were a number of real and potential blocking factors. I’m happy to report that most of these have now been solved.

  • Multi-GPU support. It works; the documentation is simple and clear. You’ll still need to figure out how to divide and conquer your problem, but isn’t that part of the fun?
  • Training across distributed resources (e.g., the cloud). As of v0.8, distributed training is supported.
  • Queues for putting operations like data loading and preprocessing on the graph.
  • Visualize the graph itself using TensorBoard. When building and debugging new models, it is easy to get lost in the weeds. For me, holding mental context for a new framework and model I’m building to solve a hard problem is already pretty taxing, so it can be really helpful to inspect a totally different representation of a model; the TensorBoard graph visualization is great for this.
  • Logging events interactively with TensorBoard. In UNIX/Linux, I like to use tail -f <log_file> to monitor the output of tasks at the command line and do quick sanity checks. Logging events in TensorFlow allows me to do the same thing, by emitting events and summaries from the graph and then monitoring output over time via TensorBoard (e.g., learning rate, loss values, train/test accuracy).
  • Model checkpointing. Train a model for a while. Stop to evaluate it. Reload from checkpoint, keep training.
  • Performance and GPU memory usage are similar to Theano and everything else that uses cuDNN. Most of the performance complaints about earlier releases appear to have been due to using cuDNN v2, so TensorFlow v0.8 (using cuDNN v4) is much improved in this regard.
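The checkpointing workflow in the list above (train for a while, stop to evaluate, reload, keep training) can be sketched without any framework. This is a minimal stdlib-only Python sketch, not TensorFlow's API: it assumes a hypothetical one-weight linear model `y = w * x` trained by gradient descent, and saves the weight plus a step counter to a JSON file. The helper names and file name are illustrative; in TensorFlow 0.x the equivalent role is played by `tf.train.Saver` and its `save`/`restore` methods.

```python
import json
import os
import tempfile

# Hypothetical toy problem: fit y = w * x to data generated with w = 3.
DATA = [(x, 3.0 * x) for x in range(1, 6)]


def loss(w):
    """Mean squared error of the one-weight model on DATA."""
    return sum((w * x - y) ** 2 for x, y in DATA) / len(DATA)


def train(w, steps, lr=0.01):
    """Run `steps` steps of plain gradient descent on the MSE loss."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in DATA) / len(DATA)
        w -= lr * grad
    return w


def save_checkpoint(path, w, step):
    # Write to a temporary file and rename it, so an interrupted save
    # never leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"w": w, "step": step}, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    with open(path) as f:
        state = json.load(f)
    return state["w"], state["step"]


ckpt = os.path.join(tempfile.mkdtemp(), "model_ckpt.json")

# 1. Train for a while, then checkpoint.
w = train(0.0, steps=20)
save_checkpoint(ckpt, w, step=20)

# 2. Stop to evaluate.
loss_at_20 = loss(w)
print("step 20, loss", loss_at_20)

# 3. Reload from the checkpoint and keep training.
w, step = load_checkpoint(ckpt)
w = train(w, steps=20)
step += 20
save_checkpoint(ckpt, w, step)
print("step", step, "loss", loss(w))
```

The point is the shape of the loop rather than the file format: storing the step counter alongside the weights is what lets a resumed run pick up exactly where it left off.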

Several high-quality metaframeworks.

  • Keras wraps both TensorFlow and Theano backends. A good option if you want modularity without diving into the details of TensorFlow (or Theano).
  • TensorFlow Slim is a great reference for image models. Even if you prefer to write your own low-level TensorFlow code, the Slim repo can be a good reference for TensorFlow API usage, model design, etc.
  • Skflow wraps TensorFlow methods in a scikit-learn-style API. In my hands, it seems a bit awkward compared to just importing and inlining the Python code for various sklearn metrics.
  • PrettyTensor provides objects that behave like tensors and have a chainable syntax so you can quickly compose certain kinds of models.

Release schedule.

Maintaining a popular open source project is a challenge, especially something with the technical complexity of TensorFlow. Hat tip to the maintainers! We appreciate their strategy of integrating new features and tests first so early adopters can try things before they are documented. Check out the version semantics note if you are interested in the details of what is released and when.

Tests are great!

Tests are valuable both for validating functionality and as templates for how things are supposed to work. When you find something in TensorFlow that isn’t working as you expect, or you are learning the quirks of a method or its arguments, search GitHub for a test and see how the test does it!

The Bad

RNNs are still a bit lacking, compared to Theano.

The Theano team has put a lot of effort over the years into optimizing their implementation of recurrent neural networks. Happily, the gap is quickly closing, and in a few months, TensorFlow may very well be the platform of choice for RNNs. Specifically:

  • We haven’t seen an elegant way to handle variable-length sequence inputs. Bucketing works, at the cost of extra complexity that most models just don’t need. Padding all sequences to a fixed length works fine in many cases (especially when using batches and GPUs), but some might see it as an unsatisfying hack. Dynamic unrolling for RNNs might be a solution, but the implementation of dynamic_rnn in the tensorflow.python.ops.rnn module is new and undocumented. We’re still experimenting.
  • Performance and memory usage. Although it is hard to make an exact apples-to-apples comparison here, after implementing many of the same models in both frameworks, our impression is that, for RNNs, Theano is perhaps a bit faster and eats up less memory than TensorFlow on a given GPU, possibly due to element-wise ops. TensorFlow wins for multi-GPU training and “compilation” time.
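To make the padding-versus-bucketing trade-off above concrete, here is a framework-agnostic Python sketch (not TensorFlow's own bucketing utilities). It assumes sequences are already lists of integer token ids, that id `0` is reserved for padding, and that the bucket boundaries are hypothetical values you would tune to your own length distribution. Each sequence goes into the smallest bucket that fits it and is padded only to that bucket's length; the true lengths are kept alongside, so a model can ignore the padding (for example, via the `sequence_length` argument of `dynamic_rnn`).

```python
from collections import defaultdict

PAD_ID = 0  # assumption: id 0 is reserved for padding


def pad(seq, length, pad_id=PAD_ID):
    """Right-pad `seq` with pad_id up to `length`; truncate it if longer."""
    return seq[:length] + [pad_id] * max(0, length - len(seq))


def bucketize(sequences, boundaries):
    """Put each sequence in the smallest bucket whose boundary fits it.

    Returns {boundary: (padded_batch, true_lengths)}. Sequences longer than
    the largest boundary are truncated into the last bucket.
    """
    boundaries = sorted(boundaries)
    buckets = defaultdict(lambda: ([], []))
    for seq in sequences:
        size = next((b for b in boundaries if len(seq) <= b), boundaries[-1])
        padded, lengths = buckets[size]
        padded.append(pad(seq, size))
        lengths.append(min(len(seq), size))
    return dict(buckets)


seqs = [[5, 3], [7, 1, 4, 2], [9], [2, 2, 2, 2, 2, 2, 2]]
for size, (batch, lengths) in sorted(bucketize(seqs, boundaries=[2, 4, 8]).items()):
    print(size, batch, lengths)
```

This prints `2 [[5, 3], [9, 0]] [2, 1]`, then `4 [[7, 1, 4, 2]] [4]`, then `8 [[2, 2, 2, 2, 2, 2, 2, 0]] [7]`. Padding everything to a single length of 8 would spend 32 slots on 14 real tokens; the buckets use 16 slots, which is the saving bucketing buys in exchange for managing several batch shapes.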

Lack of authoritative examples for data ingestion.

The TensorFlow docs and examples focus on using several well-known academic datasets to demonstrate various features or functionality. This totally makes sense, and is a good thing to prioritize for general consumption. But real-world problems are rarely drop-in replacements for these kinds of datasets. Working with tensor inputs and shapes can be a real stumbling block when learning a new deep learning framework, so an example or two showing how to work with messy input data (weird shapes, padding, distributions, tokenization, etc.) could save a lot of pain for future developers/engineers.
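As a small illustration of the messy-input plumbing described above, here is a framework-agnostic, stdlib-only Python sketch (not a TensorFlow API) that turns raw strings into a fixed-shape batch of integer ids plus a mask. The regex tokenizer, the reserved ids (`0` for padding, `1` for unknown words), and `max_len` are all illustrative assumptions; a real pipeline would choose them to suit its data.

```python
import re
from collections import Counter

PAD, UNK = 0, 1  # assumption: reserved ids for padding and unknown tokens


def tokenize(text):
    """Lowercase, then split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(texts, min_count=1):
    """Map each token seen at least min_count times to an id starting at 2."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    words = sorted(
        (w for w, c in counts.items() if c >= min_count),
        key=lambda w: (-counts[w], w),
    )
    return {w: i for i, w in enumerate(words, start=2)}


def encode_batch(texts, vocab, max_len):
    """Return (ids, mask), each max_len wide; mask is 1 for real tokens, 0 for padding."""
    ids, mask = [], []
    for t in texts:
        toks = [vocab.get(tok, UNK) for tok in tokenize(t)][:max_len]
        pad_n = max_len - len(toks)
        ids.append(toks + [PAD] * pad_n)
        mask.append([1] * len(toks) + [0] * pad_n)
    return ids, mask


train_texts = ["The cat sat.", "the DOG sat!!"]
vocab = build_vocab(train_texts)
ids, mask = encode_batch(["The cat ran.", "Dog"], vocab, max_len=4)
print(ids)   # [[4, 6, 1, 5], [7, 0, 0, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 0, 0, 0]]
```

Note how "ran" maps to the unknown id and "Dog" is padded out to the batch width; the mask carries the information the padding erased.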