Microsoft Deep Learning Brings Innovative Features – CNTK Shows Promise

Microsoft releases CNTK, a deep learning toolkit that shows promise. While a few innovative features set it apart from its competitors, a major drawback may hurt its adoption.

Xuedong Huang thought the tools that he and his team had were slowing down their research. So Xuedong Huang did something about it. The result? The Computational Network Toolkit, or CNTK, Microsoft's newly released deep learning library.

According to Microsoft, CNTK is "a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph." It is, essentially, an alternative to other established deep learning frameworks, libraries, and toolkits, including TensorFlow, Theano, and Torch, among others.

CNTK architecture.

CNTK Overview

CNTK allows for the structured implementation of popular deep neural network architectures, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs). As such, it should come as no surprise that it implements stochastic gradient descent (SGD) and backpropagation, as well as, importantly, automatic differentiation. Perhaps most notably, at least in this writer's opinion, CNTK supports parallelization across both multiple machines and multiple GPUs (regardless of where the GPUs are located). As someone who was notably critical of this functionality's absence from TensorFlow, I can say that this is the feature that has most captured my attention.
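Since CNTK describes networks as directed computational graphs, the interplay of automatic differentiation, backpropagation, and SGD can be illustrated with a toy example. The sketch below is not CNTK code (all names here are hypothetical); it is a minimal Python illustration of reverse-mode automatic differentiation over a tiny graph, followed by SGD updates:

```python
# Illustrative only -- NOT the CNTK API. A toy directed computational
# graph with reverse-mode automatic differentiation (backpropagation)
# and stochastic-gradient-descent updates.

class Node:
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value        # forward value
        self.parents = parents    # upstream nodes in the directed graph
        self.grad_fns = grad_fns  # local derivative w.r.t. each parent
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, (a, b), (lambda g: g, lambda g: g))

def mul(a, b):
    return Node(a.value * b.value, (a, b),
                (lambda g: g * b.value, lambda g: g * a.value))

def backward(out):
    # Topologically order the graph, then sweep gradients backward.
    topo, seen = [], set()
    def build(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p in n.parents:
                build(p)
            topo.append(n)
    build(out)
    out.grad = 1.0
    for node in reversed(topo):
        for parent, fn in zip(node.parents, node.grad_fns):
            parent.grad += fn(node.grad)

# Fit y = w*x to the single point (x=2, y=6); loss = (w*x - y)^2.
w = Node(0.0)
for _ in range(50):
    w.grad = 0.0
    err = add(mul(w, Node(2.0)), Node(-6.0))
    loss = mul(err, err)
    backward(loss)
    w.value -= 0.1 * w.grad   # one SGD step

print(round(w.value, 3))  # prints 3.0
```

Real toolkits handle tensors, many operation types, and efficient scheduling, but this graph-plus-backward-sweep structure is the core idea behind describing a network "as a series of computational steps via a directed graph."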

CNTK was developed with the following criteria in mind (taken from the CNTK NIPS 2015 tutorial):

  • Efficiency: Can train production systems as fast as possible
  • Performance: Can achieve state-of-the-art performance on benchmark tasks and production systems
  • Flexibility: Can support various tasks such as speech, image, and text, and can try out new ideas quickly

While Huang and a team of volunteers set out at Microsoft to develop tools to help in their pursuit of computational speech recognition, the performance-focused result is a general toolkit that can be useful for a wide range of applications. CNTK has now been officially released on Microsoft's GitHub account, and is available for all to put to use in their own endeavors. Far from claiming a polished product, the repository's README includes the following disclaimer:

CNTK is in active use at Microsoft and constantly evolving. There will be bugs.

CNTK had previously been available under a more restrictive license since April 2015, mostly for use by academics. It was officially released for the masses on January 25, 2016. After two short days in the wild, CNTK had 25 logged issues; it also had upwards of 4,000 stars and 500+ forks. Clearly, open-sourcing should help the development team identify and iron out bugs much more efficiently, the intellectual payment for allowing others to take advantage of their research.

Internal CNTK tests from December point to a speed increase of up to 4 times over similar libraries, such as TensorFlow. In reality, that number doesn't tell the whole story, as can be seen in the graphic below (source: Microsoft); however, it does show promising performance. Says Huang: "The CNTK toolkit is just insanely more efficient than anything we have ever seen."

It should be noted that no independent benchmarks are available at the time of writing, though I suspect they will begin to appear in the next few days.

CNTK performance comparison.

CNTK currently supports Windows and Linux. It also touts its modular design, maintaining a separation of computation networks, the execution engine, learning algorithms, and model descriptions. CNTK supports the description of neural networks via C++, the Network Definition Language (NDL), and other descriptive languages, with Python and C# bindings planned; however, it seems to me that the manner in which model description is being most ardently promoted is via NDL. NDL is a (seemingly concise) scripting language designed specifically for describing such neural networks, an example of which is shown below.

An example NDL script.
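For a textual flavor of the style, here is a sketch of a one-hidden-layer classifier described in the manner of NDL. Treat it as illustrative rather than authoritative; the exact function names and conventions should be checked against the official tutorial and samples.

```
# Illustrative NDL-style description of a one-hidden-layer classifier
SDim = 784    # input (feature) dimension
HDim = 256    # hidden layer dimension
LDim = 10     # label (output) dimension

features = Input(SDim)
labels   = Input(LDim)

W0 = Parameter(HDim, SDim)
B0 = Parameter(HDim)
H1 = Sigmoid(Plus(Times(W0, features), B0))

W1 = Parameter(LDim, HDim)
B1 = Parameter(LDim)
Out = Plus(Times(W1, H1), B1)

CE = CrossEntropyWithSoftmax(labels, Out)
```

The appeal is that the file reads as a direct description of the computational graph: each line names a node, and training criteria such as the cross-entropy node are declared alongside the model itself.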

CNTK's repo includes both source to build from and binaries to grab and run straight away. A simple getting-started guide is found here, and here is a small collection of samples which can be downloaded, run, and studied as tutorials.

While its GitHub repository has some useful information, I found that, by far, the quickest way to get a useful overview of CNTK was its NIPS 2015 tutorial. If you want the full treatment, check the CNTK reference book, titled "An Introduction to Computational Networks and the Computational Network Toolkit".


As mentioned above, CNTK supports scaling to GPU clusters out of the box. This means distributed neural network training on clusters of essentially any size, making the toolkit useful for hobbyists starting out, startups with specific goals, and researchers with large-scale ambitions (and matching hardware capacity). For this author, who was critical of TensorFlow for not supporting distributed training upon its release, CNTK provides functionality that open source competitors do not currently possess, making it especially tempting to investigate. It also answers the lingering questions I had when investigating TensorFlow: Why? What? And where? As in, why is the software being released? What is its particular contribution to the landscape? And where exactly does it fit in?

These questions are much less of a problem with CNTK, at least in my eyes, for one lone reason: CNTK supports distributed neural network training on GPU clusters. End of story. This fills a very specific void. While KDnuggets has recently shared stories on distributed neural nets using TensorFlow alongside Spark, as well as SparkNet (combining Spark and Caffe), CNTK brings the native support that is genuinely desired.
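To make concrete what distributed training buys you, here is a purely illustrative Python sketch (again, not CNTK code; every name here is hypothetical) of synchronous data-parallel SGD, the basic scheme for training a single model across many GPUs or machines: each worker computes a gradient on its own data shard, the gradients are averaged, and every replica applies the identical update.

```python
# Illustrative only -- NOT CNTK code. Synchronous data-parallel SGD:
# each worker owns a data shard; gradients are averaged (an all-reduce
# in a real cluster) and all replicas apply the same update.
import random

def gradient(w, shard):
    # dL/dw for a least-squares loss on this worker's shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

random.seed(0)
data = [(x, 3 * x + random.gauss(0, 0.1)) for x in range(1, 41)]
shards = [data[i::4] for i in range(4)]   # 4 simulated workers

w, lr = 0.0, 0.0005
for _ in range(200):
    grads = [gradient(w, s) for s in shards]  # computed in parallel in reality
    avg = sum(grads) / len(grads)             # the all-reduce step
    w -= lr * avg                             # identical update on every replica

print(round(w, 2))  # converges to roughly the true slope, 3.0
```

The hard parts in practice are communication cost and synchronization across workers; CNTK's specific contribution is making this pattern work natively on GPU clusters.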

Benchmarks undoubtedly need to be independently verified, but if the numbers above hold true, CNTK also seems to be at an advantage with regard to performance.

Another potential positive for CNTK is the NDL language for network descriptions. I have only played with the tutorials included with CNTK, and extended use of this declarative method for building neural networks would be necessary to evaluate its effectiveness. Still, there is a relatively good chance that using configuration files, as demonstrated in the image above and in the tutorial and reference book linked above, may prove convenient, especially for quick prototyping. For those more interested in modeling than programming, this seems like a solution. In machine learning, programming is a means to an end, and treating it as such can be refreshing.

However, the lack of other language bindings, Python in particular, is almost certainly a hindrance to CNTK at this early stage. When looking to maximize adoption (not a stated goal, but almost certainly a reasonable assumption), giving current practitioners the tools to integrate with existing pipelines is critical. Python is the undisputed leading language of open source machine learning. Argue if you like, but it would be futile. That is not to say there are no other options, but the glue in this case really is Python. Not catering to this massive ecosystem all but relegates a new tool to novelty status, a toy to play around with until the real work needs to be done. Until such bindings arrive, CNTK is relying on those who do not partake in the established Python ecosystem.

In summary, Microsoft's newly released deep learning suite, the Computational Network Toolkit, has a number of positive aspects working in its favor as a viable alternative to its already-established competitors. While distributed training, GPU cluster support, and possibly the NDL descriptive language should all persuade potential users to give it a serious look, its current lack of Python support will undoubtedly stymie its acceptance as an industrial-strength tool. I will admit, however, to being somewhat excited to train a few distributed networks.

If you have given CNTK a test drive, or are thinking of doing so, let us know in the comments.

Bio: Matthew Mayo is a computer science graduate student currently working on a thesis on parallelizing machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.