The Good, Bad & Ugly of TensorFlow
A survey of six months of rapid evolution (+ tips/hacks and code to fix the ugly stuff) using TensorFlow. Get some great advice from the trenches.
Documentation can be inconsistent.
There are a number of good tutorials available for TensorFlow, and the code itself is very well commented (thank you, authors). But machine learning/deep learning is a deep and wide domain, and there is a lag between new functionality and docs/tutorials explaining how to build stuff.
A few of our favorite tutorials are:
- Nathan’s GitHub repo of simple tutorials. It’s the quick way to see machine learning primitives at work. Start here if you’re familiar with numpy or Theano.
- Udacity course by Google’s Vincent Vanhoucke. Start here if you’re new to deep learning.
- The official MNIST tutorial. Go here after the Udacity course if you’re new to deep learning. MNIST is the “Drosophila of machine learning” and a good benchmark and sanity check.
- TensorFlow API documentation. Our go-to reference for stuff in TensorFlow. Control-F to find stuff!
Unfortunately, especially for RNNs, there are still conceptual gaps in the documentation and tutorials, such as the gap between the simple or trivial examples and the full-on state-of-the-art examples. This can be a real barrier for developers who are trying to learn the concepts at the same time as they are learning the framework. For example, the Udacity tutorials and the RNN tutorial using Penn TreeBank data to build a language model are very illustrative, thanks to their simplicity. They are good illustrations for learning a concept, but too basic for real-world modeling tasks.
The only other authoritative TensorFlow RNN tutorial that we’re aware of is a full-on seq2seq model using multi-cell RNNs (GRU or LSTM) with attention, bucketing, and sampled softmax. Whoa! Just as you shouldn’t learn to ski by starting on the training hill and then going straight to the top of the mountain to ride a double black diamond with trees and moguls (dangerous, and terrifying!?)…you probably shouldn’t go from the simplest implementations to the most complicated. Better to add complexity progressively, according to the problem you’re trying to solve.
High-quality tutorials that progressively ratchet up the complexity from simple RNN language models to something like plain seq2seq RNN encoder-decoder architecture that learns to reverse words, to a fancier neural translation seq2seq LSTM with attention, to something with multi-cell RNNs, bucketing and all the tricks would be extremely helpful to the nascent community of TensorFlow users. I suspect this lack of progressive examples might explain why the community has already reproduced many popular models in TensorFlow, but we haven’t seen many novel architectures or clever remixes yet.
Where documentation is lacking, look to the tests! Often the tests are more illuminating than the documentation anyway. Thanks to Google releasing the project as open source, you can search the GitHub repo for a relevant test to see how the authors do it.
We totally understand that the TensorFlow team is focusing on functionality and features first, and following thereafter with documentation... we’d probably do the same! Good docs are an investment, and the best docs I’ve seen are the result of someone who isn’t the author writing that documentation, because then you’re guaranteed that at least one fresh mind has understood the thing. It would be really cool if the TensorFlow community wrote documentation with as much urgency as they ask for new features!
We’re still waiting on the trace monitoring tool, EEG.
Heterogeneous resource utilization adds complexity.
This is a classic engineering tradeoff between control and simplicity: if you want fine-grained control over how your operations execute (e.g., which GPU node), then you need to maintain these constraints. In some cases, fine-grained control is necessary to maximize performance. For example, you can use multiple threads to fetch and pre-process a batch of data before feeding the GPU, so the GPU doesn’t wait on these operations. For more detail on using asynchronous runners on CPUs to feed GPUs, or to benchmark your own queues, see Luke’s excellent post, TensorFlow Data Input (Part 2): Extensions.
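To make the prefetching idea concrete, here is a minimal plain-Python sketch of background threads preparing batches while a consumer (standing in for the GPU step) pulls finished ones from a bounded queue. It does not use TensorFlow's own queue machinery; the names prefetch and batch_fn, and the thread and capacity defaults, are hypothetical choices for illustration.

```python
import queue
import threading


def prefetch(batch_fn, num_batches, num_threads=2, capacity=4):
    """Yield (index, batch) pairs; batches are built by background threads."""
    todo = queue.Queue()
    for i in range(num_batches):
        todo.put(i)
    # Bounded queue: workers pause when the consumer falls behind.
    ready = queue.Queue(maxsize=capacity)

    def worker():
        while True:
            try:
                i = todo.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            try:
                item = (i, batch_fn(i))
            except Exception as exc:  # hand errors to the consumer
                item = exc
            ready.put(item)

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()

    # Exactly one item arrives per index; order of arrival is not guaranteed.
    for _ in range(num_batches):
        item = ready.get()
        if isinstance(item, Exception):
            raise item
        yield item


# Stand-in for real loading/pre-processing work.
batches = sorted(prefetch(lambda i: [i, i * i], num_batches=5))
```

The bounded queue is the design choice that matters here: it keeps a few batches ready ahead of the consumer without letting the workers run arbitrarily far ahead and fill memory.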
TensorFlow can hog a GPU.
Similarly, on startup, TensorFlow tries to allocate all available GPU memory for itself. This is a double-edged sword, depending on your context. If you are actively developing a model and have GPUs available to you in a local machine, you might want to allocate portions of the GPU to different things. However, if you are deploying a model to a cloud environment, you want to know that your model can execute on the hardware available to it, without unpredictable interactions with other code that may access the same hardware.
You can put an upper limit on the GPU memory available to a given process, but if you have multiple GPUs on a machine, we’re not aware of a way to control allocation per GPU. Set the per_process_gpu_memory_fraction option in tf.GPUOptions, and pass it to your session as a config through tf.ConfigProto.
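A minimal sketch of such a limit, assuming the TensorFlow 1.x-style API (tf.GPUOptions, tf.ConfigProto, tf.Session). The helper names memory_fraction and make_limited_session, and the 4 GB / 8 GB figures, are hypothetical and only there for illustration.

```python
def memory_fraction(limit_mb, total_mb):
    """Convert a per-process limit in MB into a fraction of one GPU's memory."""
    if not 0 < limit_mb <= total_mb:
        raise ValueError("limit_mb must be positive and no larger than total_mb")
    return limit_mb / float(total_mb)


def make_limited_session(fraction):
    """Open a session that may claim at most `fraction` of each visible GPU."""
    # Imported lazily so memory_fraction() works without TensorFlow installed.
    import tensorflow as tf

    # TensorFlow 1.x-style names; under TensorFlow 2.x the same classes live
    # in tf.compat.v1 (GPUOptions, ConfigProto, Session).
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
    return tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))


# Example: cap this process at 4 GB of a hypothetical 8 GB card.
fraction = memory_fraction(4096, 8192)
# sess = make_limited_session(fraction)  # uncomment on a machine with TensorFlow
```

The fraction applies to the process as a whole, on every GPU it can see, which is consistent with the lack of per-GPU control mentioned above.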
By default, Theano and TensorFlow can conflict.
We have a lot of code that depends on Theano, from loading data to various utility functions. We also read a lot of research code that was implemented in Theano. However, if you import Theano and TensorFlow in the same scope, they will compete to allocate GPU memory and bad things happen. To execute totally different environments on different GPUs (e.g., two GPUs running two separate models), you can restrict CUDA to see only certain devices, at the shell/environment level. Then when you launch your Python code, it will only see (and allocate) the GPUs that CUDA can see. If you use bash, exporting the CUDA_VISIBLE_DEVICES environment variable with the device numbers you want, before launching Python, will do the trick.
Note: the CUDA device numbers you set this way might not be the same as the device IDs reported by other GPU tools.
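The environment-level restriction can also be sketched from Python, for example in a small launcher that starts each model in its own child process. This is only an illustration: env_for_gpu and run_on_gpu are hypothetical helper names, and the child command below is a stand-in for a real training script.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_id, base_env=None):
    """Return a copy of the environment in which CUDA sees only `gpu_id`."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def run_on_gpu(gpu_id, argv):
    """Run a command as a child process restricted to one GPU; return stdout."""
    output = subprocess.check_output(argv, env=env_for_gpu(gpu_id))
    return output.decode().strip()


# Stand-in for a real training script: the child reports what it may see.
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
visible = run_on_gpu(1, child)
```

Because each child gets its own copy of the environment, the launcher's own environment is left untouched.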
Alternatively, if you want Theano to execute only on CPU, which is probably what you want for those data and utility functions anyway, you can do it inline in Python with a one-liner that sets the THEANO_FLAGS environment variable to device=cpu. Put it at the top of your imports, before Theano is imported.
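A minimal sketch of that setting. Note that, as written, it overwrites any THEANO_FLAGS value already present in the environment.

```python
import os

# Must run before the first `import theano`: Theano reads THEANO_FLAGS
# when it is imported.
os.environ["THEANO_FLAGS"] = "device=cpu"

# import theano  # safe to import from here on
```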
Of course, you can inline the environment flags for CUDA too, but for my model development workflow, it is easier to remember “one GPU per shell”.
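For the inline CUDA variant, a minimal sketch, assuming the variable is set before the framework initializes CUDA (the safest place is before the import). The device number 0 is an arbitrary choice for illustration.

```python
import os

# Expose only one GPU to this process; set before importing TensorFlow or
# Theano so that they never see the other devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf  # import only after the variable is set
```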
It takes a fair amount of effort to implement end-to-end workflows in any framework, and TensorFlow is no exception. Some things in TensorFlow (queues, certain graph operations, resource allocation/context management, graph visualization) are relatively new to the deep learning scene and, like many, we’re still learning the best ways to exploit these features. Other things have been available in other frameworks for some time. Even though the overall concept is similar, implementation details can differ. We appreciate all the effort Google developers have put into implementing good abstractions (e.g., streaming data from queues).
The best part of open tools is when someone from the community implements a really clever hack or novel way of solving a problem. Even though most folks are still climbing the learning curve with TensorFlow, I think the odds of that happening have gone up! Looking forward to the next epoch!
Bio: Dr. Daniel Kuster is a researcher for indico, where he helps build the next generation of APIs for image and text analysis, powered by deep learning. He writes these articles on the chance that exposing some scar tissue will help a reader avoid some frustration, or maybe inspire a better solution. If you have an interesting problem, you should reach out!
Original. Reposted with permission.