TensorFlow is Terrific – A Sober Take on Deep Learning Acceleration
TensorFlow does not change the world. But it appears to be the best, most convenient deep learning library out there.
On November 9th, Google open-sourced TensorFlow, a Python library for performing fast gradient-based machine learning on GPUs. As with most recent developments in AI, the web erupted with outlandish storylines.
Several articles speculated on TensorFlow's capacity to revolutionize AI. Many described the move as bold, despite the fact that Torch, which is maintained by Ronan Collobert of Facebook AI Research, already offers categorically similar open-source deep learning tools, and that Yoshua Bengio's lab has long maintained Theano, the revolutionary software package that pioneered the category in the first place and made deep learning easy for the masses. In an article at Wired, Cade Metz described TensorFlow as Google's "Artificial Intelligence Engine". Even this headline stands out as hyperbolic for an article describing an open-source library for performing linear algebra and taking derivatives. A number of other news outlets marveled that Google made the code open source.
On the more technical side, the reception ranged from hyperbolic praise to cold water. Offering a quantitative take, Soumith Chintala published a suite of benchmarks across all the rival packages, showing that the first version of TensorFlow lagged in speed behind Torch and Caffe, especially on convolutional neural networks. As Jeff Dean and Oriol Vinyals revealed at NIPS 2015, the most disappointing numbers owed primarily to a difference in the underlying version of NVIDIA's cuDNN library. Controlling for cuDNN version, the results are competitive.
Matt Mayo, a KDnuggets contributor and Master's student, wrote a popular article expressing disappointment with TensorFlow. While hedging that he is not a deep learning expert, he concluded that TensorFlow was too similar to existing offerings, and he lamented that much of the capacity for distributed computing was withheld. At NIPS in Montreal, Dean and Vinyals also indicated that support for large distributed systems should be part of TensorFlow in the first few months of 2016. Mayo correctly points out that TensorFlow is not fundamentally different from Theano or Torch. However, I disagree with the overall assessment that this is disappointing. Just as a Tesla is yet another four-wheeled conveyance with four doors, a steering wheel and a roof, TensorFlow is yet another deep learning library. But it appears to be the best and most convenient one, and so more worthy of anointment.
Neither TensorFlow nor Torch nor Theano represents a revolution in artificial intelligence. They are all simply libraries used to build and train gradient-based machine learning models. Still, if training such models is your livelihood, subtle differences in capabilities and reliability can have a dramatic impact on quality of life. In this article, I'll explain what these libraries do, the subtle but important ways in which TensorFlow appears to be terrific, and the obvious reasons why it's been made available as open-source software. Further, I'll explain why we should all be alarmed by the hysteria over libraries in both the media and engineering communities.
Why Fast Math?
Deep learning generally means building large-scale neural networks with many layers. Simply put, these networks are functions which generate outputs Y given inputs X. In addition to the input X, the functions make use of a bunch of parameters (also called weights). These can include scalar values, vectors, and, most expensively, matrices and higher-order tensors. A tensor is just a generalization of vectors and matrices into higher dimensions. The particular functions in vogue today involve tens of computationally expensive linear algebra operations, including matrix products and convolutions. Before we can train the network, we define a loss function. Common loss functions include squared error for regression problems and cross-entropy loss for classification. To train a network, we successively present many batches of new inputs to it. After each batch is presented, we update the model by taking the derivative of the loss with respect to all of our parameters and nudging each parameter in the direction that reduces the loss.
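To make the training loop concrete, here is a minimal sketch in plain Python. It fits a one-input linear model by gradient descent on squared error, with the derivatives worked out by hand. The toy data, the learning rate, and all function names are my own assumptions for illustration, and each "batch" is a single example to keep the arithmetic visible.

```python
# A toy training loop: fit y = w*x + b by gradient descent on squared error.
# Everything here (data, learning rate, names) is hypothetical.

def predict(w, b, x):
    """The forward pass: a function from input x to output y_hat."""
    return w * x + b

def squared_error(y_hat, y):
    """Loss for a single example."""
    return 0.5 * (y_hat - y) ** 2

def gradients(w, b, x, y):
    """Derivatives of the loss, worked out by hand:
    dL/dw = (y_hat - y) * x and dL/db = (y_hat - y)."""
    err = predict(w, b, x) - y
    return err * x, err

def train(data, lr=0.05, epochs=500):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:          # one example per "batch"
            dw, db = gradients(w, b, x, y)
            w -= lr * dw           # step against the gradient
            b -= lr * db
    return w, b

# Noise-free samples from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
```

With two parameters the hand-derived gradients fit on one line. With millions of parameters spread across many layers, they do not.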
So right away there are a few obvious problems. First, multiplying tens or hundreds of tensors together millions of times to process even a moderately sized dataset is terribly expensive. Second, taking the derivative of giant ugly functions by hand is a pain and could consume days or weeks that would be better spent imagining new experiments. This is why we need libraries like Theano, Caffe, Torch, and TensorFlow. In the Theano-inspired paradigm, one defines the function symbolically by composing it from basic elements through which the library knows how to take derivatives. The library can then take this symbolic function and compile it for any available backend. This could be a CPU, a GPU, or a heterogeneous computing environment. There are two takeaways here. First, with any of these libraries you write only the prediction code, or forward pass, and the framework figures out how to take derivatives for you, that is, how to calculate the backward pass. Second, you write your code once in a nice high-level language without ever learning the ugly details of GPU coding, and the framework compiles it for whatever CPU or NVIDIA hardware you have access to.
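The idea of composing a function from basic elements that each know their own derivative can be sketched in a few lines of plain Python. To be clear, this is a toy of my own and not how Theano or TensorFlow are implemented: the Var class is hypothetical, it handles only scalars with addition and multiplication, and it computes values eagerly rather than compiling a symbolic graph for a backend.

```python
class Var:
    """A scalar node in a computation graph (hypothetical toy class).
    Each node records its parents and the local derivative with
    respect to each parent."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        """Backward pass: apply the chain rule from this node to the leaves."""
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)  # a node is appended after all its parents

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # outputs first, leaves last
            for parent, local in node.parents:
                parent.grad += local * node.grad

# We write only the forward pass...
x, w, b = Var(3.0), Var(2.0), Var(1.0)
pred = w * x + b
loss = pred * pred

# ...and the derivatives come for free.
loss.backward()
```

After the call to backward(), w.grad, b.grad, and x.grad hold the derivatives of the loss with respect to each input, and nobody had to differentiate (w*x + b)^2 by hand.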
As Matt accurately pointed out in his article, TensorFlow is not the first to the party. Theano went public in 2010. Soon after, Berkeley released Caffe, a high-performance C++ deep learning framework specializing in convolutional neural networks. Around this time Torch, which one uses in Lua, acquired similar auto-differentiation and GPU compilation capabilities. Further, TensorFlow appears to follow closely the interface pioneered by Theano. TensorFlow's Variables behave like Theano's shared variables. TensorFlow's Placeholders behave like Theano's symbolic variables. So what's the big deal?
First, for my purposes TensorFlow picks the right language. I like to write in Python. Python is fast to write and easy to read. And since all of the performance-critical code in any of these libraries is written in C++ and CUDA, I'd rather that the high-level interface be exposed in a language that integrates easily with the rest of my workflow. With Python, I can use the same language to wrangle data, access the most powerful open-source scientific computing tools, and build a web server to demo my work. If I were doing almost any other scientific computing task besides deep learning, my first thought rolling out of bed would be to look for an appropriate Python library. While I like everything I've seen from Torch and have deep respect for the Facebook AI Research team, I'm not excited about learning Lua and don't feel like the overhead of learning yet another programming language is the greatest use of my research time. Further, I already have a strong investment in working with NumPy. Using a Python framework makes that easy. This is a category to which only TensorFlow and Theano belong (Caffe has Python bindings for running models, but you can't define arbitrary new models or kinds of layers without writing a fair amount of C++ code).
Second, TensorFlow operates easily with multiple GPUs. While the previous KDnuggets article suggested that other frameworks also operate in these environments, that isn't true in practice. With them, working with multiple devices to achieve batchwise or model parallelism is comparatively complex. Further, when you fire up Theano, you typically compile to a single device. Switching configuration inside an interpreter session is either not done or at least not straightforward. In comparison, TensorFlow makes it easy to spin up sessions for running code on various devices, without having to exit and restart your program.
Third, TensorFlow compile times appear to be great. If you've spent time sitting for 1-5 minutes between code adjustments waiting for your Theano code to compile, you know how important this is.
Fourth, the TensorFlow community appears to move very fast, and it is strangely active. I began watching the project on GitHub and seriously regret the decision. In any given hour there are probably 10 people around the world improving TensorFlow. In comparison, despite my deep love of Theano, I'm occasionally terrified by the uncertainty of whether my bugs stem from my own code or from Theano. For example, there appears to be a well-known bug whenever you try to create random numbers inside a call to scan(). The error messages are not particularly useful and I have no way of knowing if or when this bug will be resolved.
Fifth, TensorFlow appears to have gone over the top with TensorBoard, creating a powerful set of visualizations for both network topology and performance.
In short, Theano invented the genre. It's the Ford Motors of compiling code for deep learning. But TensorFlow appears well on its way to emerging as the Tesla Motors of the genre. Theano got a lot of things right, and fortunately TensorFlow appears to mostly embrace the Theano way. I view this as an asset. TensorFlow offers a better interface and faster compile times. Caffe is a terrific library for training convolutional neural networks but is not really in the same category of tools for prototyping and training arbitrary neural networks. Torch appears to have a comparable offering and I imagine the libraries will compete in the years to come, but for the time being I'm much happier to stick with Python and NumPy.
The Obvious Case for Open Sourcing
Despite the intense speculation in the media over why Google open-sourced TensorFlow, the reasons should be obvious given a sober understanding of what the library does. These are libraries, not algorithms. Google isn't sharing the secret details of its search algorithm, just as Facebook isn't sharing the trained model which curates news feeds as part of Torch. On balance, researchers will do more or less the same work regardless of whether they do it in Theano, Caffe, TensorFlow, or Torch. At present, Google, Facebook, Microsoft, and others are in a war for machine learning talent. If the next generation of PhDs comes of age with their skills wedded to Facebook's Torch platform, they'll be more valuable and productive working at Facebook than at Google. By open-sourcing TensorFlow, Google levels the playing field.
Machine Learning Uncomfortably Like Web Development
As a parting thought, the overblown positive and negative coverage of something so innocent as a library should serve as a warning that mainstream machine learning too closely resembles web development, a field pathologically obsessed with libraries. The web development community has given rise to libraries which wrap libraries which wrap libraries. This is annoying because it's probably wasteful and unnecessary. But it's also annoying because it creates an environment in which the most useful skill is staying on top of the rapidly evolving set of libraries. Sadly, as machine learning crosses the boundary from niche academic discipline to commercial enterprise, some amount of this may be inevitable. I hope that TensorFlow will be the last deep learning library I'll have to learn for some time.
Zachary Chase Lipton is a PhD student in the Computer Science Engineering department at the University of California, San Diego. Funded by the Division of Biomedical Informatics, he is interested in both theoretical foundations and applications of machine learning. In addition to his work at UCSD, he has interned at Microsoft Research Labs and as a Machine Learning Scientist at Amazon, is a Contributing Editor at KDnuggets, and has signed on as an author at Manning Publications.