Top arXiv Papers, January: ConvNets Advances, Wide Instead of Deep, Adversarial Networks Win, Learning to Reinforcement Learn

Check out the top arXiv Papers from January, covering convolutional neural network advances, why wide may trump deep, generative adversarial networks, learning to reinforcement learn, and more. arXiv has become the leading clearinghouse for open-access, bleeding-edge machine learning research, especially research on neural networks. Keeping up with the shared research in real time is impossible, given the sheer volume of new papers. Hopefully this post can help convey some of the top arXiv papers from January.

The choice of "top" is somewhat subjective. I used Andrej Karpathy's Arxiv Sanity Preserver to select from among the top papers of the past month (as queried on the evening of January 31), where "top" means being included in the most users' libraries, and then hand-picked the ones that seemed most interesting, at least in my view. This isn't completely data-driven, but given that some papers would have had a full month to be "upvoted," while others could have come on the last day of the month, I feel justified (enough) in the process and selections. If you don't like the method, feel free to check out the top returns yourself.

Along with the title and authors for each paper, you will also find some modest commentary, links, perhaps an image, and an excerpt from the abstract.

Recent Advances in Convolutional Neural Networks

Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Gang Wang

Convolutional neural networks (CNNs) are, without doubt, among the most researched and implemented contemporary neural network architectures. They have proven their mettle over the past several years, and have all but revolutionized entire areas of machine learning. If you need a crash course on CNNs, this paper may be a good up-to-date starting place.

After the rapid growth in the amount of the annotated data and the recent improvements in the strengths of graphics processor units (GPUs), the research on convolutional neural networks has been emerged swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. Besides, we also introduce some applications of convolutional neural networks in computer vision.

Wide Residual Networks

Sergey Zagoruyko, Nikos Komodakis

This paper discusses wide architectures as a counter to the lengthy training times of the very deep networks generally required for continued accuracy improvements. It proposes using ResNet blocks in an architecture of decreased depth and increased width, called Wide Residual Networks, in order to cut training times while preserving these accuracy improvements. The results are promising.

It should be noted that this is an update to a paper originally posted May 2016, which is why you've (probably) heard of this already. But if you haven't, read up. Here's an old Reddit discussion on the paper.

We call the resulting network structures wide residual networks (WRNs) and show that these are far superior over their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. Our code and models are available at this https URL
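
To get a feel for the width-versus-depth trade-off, here is a rough back-of-the-envelope sketch that counts convolution weights only. It assumes the common CIFAR-style layout (three groups of basic two-convolution blocks with base widths 16, 32, 64, a widening factor k, and depth of the form 6n + 4). The function names are my own, and batch norm, biases, and the final classifier are ignored, so treat the numbers as approximate.

```python
def conv_weights(kernel, c_in, c_out):
    """Weights in a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def basic_block_weights(c_in, c_out):
    """Two 3x3 convolutions, plus a 1x1 projection on the shortcut
    when the number of channels changes."""
    total = conv_weights(3, c_in, c_out) + conv_weights(3, c_out, c_out)
    if c_in != c_out:
        total += conv_weights(1, c_in, c_out)
    return total


def wrn_conv_weights(depth, k, base_widths=(16, 32, 64)):
    """Approximate convolution weight count for an assumed CIFAR-style
    wide residual network with widening factor k."""
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must have the form 6 * n + 4")
    blocks_per_group = (depth - 4) // 6
    total = conv_weights(3, 3, 16)  # stem convolution on RGB input
    c_in = 16
    for width in base_widths:
        c_out = width * k
        for _ in range(blocks_per_group):
            total += basic_block_weights(c_in, c_out)
            c_in = c_out
    return total


if __name__ == "__main__":
    for depth, k in [(16, 1), (40, 1), (16, 8)]:
        print(f"depth={depth:2d} k={k}: {wrn_conv_weights(depth, k):,} conv weights")
```

Under these assumptions, widening a block by a factor of k multiplies its weights by k squared, while adding depth only grows the count linearly. A 16-layer network with k = 8 lands at roughly 10.9 million convolution weights, far more than a thin 40-layer one, yet it has far fewer sequential layers to push data through.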

Adversarial Feature Learning

Jeff Donahue, Philipp Krähenbühl, Trevor Darrell

Generative Adversarial Networks (GANs) are a "hot topic" in machine learning. GANs do a great job mapping simple latent data distributions to more complex distributions, which can be beneficial in a variety of uses.

However, in their existing form, GANs have no means of learning the inverse mapping -- projecting data back into the latent space. We propose Bidirectional Generative Adversarial Networks (BiGANs) as a means of learning this inverse mapping, and demonstrate that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning.

This is the most recent version of the paper.
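
To make the "inverse mapping" idea concrete, here is a toy sketch of what the discriminator looks at in this setup: joint (data, latent) pairs rather than data alone. The one-dimensional affine generator and its hand-written inverse are purely hypothetical stand-ins for trained networks, and nothing here is trained.

```python
import math
import random


def generator(z):
    """G: latent -> data. Hypothetical toy affine map."""
    return 2.0 * z + 1.0


def encoder(x):
    """E: data -> latent. Here written by hand as the inverse of generator."""
    return (x - 1.0) / 2.0


def discriminator_batch(real_xs, latents):
    """Build the joint tuples a bidirectional GAN discriminator sees.

    Label 1: (x, E(x)) for real data x.
    Label 0: (G(z), z) for sampled latent z.
    """
    encoded = [(x, encoder(x), 1) for x in real_xs]
    generated = [(generator(z), z, 0) for z in latents]
    return encoded + generated


rng = random.Random(0)
latents = [rng.gauss(0.0, 1.0) for _ in range(4)]
# Toy "real" data, drawn so that it matches the generator's distribution.
real_xs = [generator(rng.gauss(0.0, 1.0)) for _ in range(4)]
batch = discriminator_batch(real_xs, latents)

# When E inverts G, both kinds of pairs obey the same relation z = E(x),
# so the pair structure gives the discriminator nothing to separate them by.
consistent = all(math.isclose(encoder(x), z, abs_tol=1e-9) for x, z, _ in batch)
print(consistent)
```

The design point is that the discriminator can only be fooled if the encoder and generator agree with each other, which is what pushes the encoder toward inverting the generator.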

NIPS 2016 Tutorial: Generative Adversarial Networks

Ian Goodfellow

As per the title, this is Ian Goodfellow's NIPS 2016 tutorial on Generative Adversarial Networks (GANs). You may know Ian from pioneering GANs. And so you would assume this is a solid overview of the technology. And it is. In fact, this is canonical reading for the generative network enthusiast.

This report summarizes the tutorial presented by the author at NIPS 2016 on generative adversarial networks (GANs). The tutorial describes: (1) Why generative modeling is a topic worth studying, (2) how generative models work, and how GANs compare to other generative models, (3) the details of how GANs work, (4) research frontiers in GANs, and (5) state-of-the-art image models that combine GANs with other methods. Finally, the tutorial contains three exercises for readers to complete, and the solutions to these exercises.

Learning to reinforcement learn

Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick

Reinforcement learning (RL) systems have had smashing successes of late. This paper discusses deep meta-reinforcement learning, which is intended to combat the massive amounts of training data which RL systems generally require.

Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.

Here is a modest Hacker News discussion, which makes a few good points, on the original version of this paper from late 2016.
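
The two-level structure is easier to see as a loop. The sketch below shows only the data flow, under loud assumptions: the tasks are hypothetical two-armed bandits, and the "recurrent network" is a hand-written stand-in whose hidden state is a table of per-arm statistics. In the actual approach that inner behavior would be learned by the outer RL algorithm, not coded by hand.

```python
import random


class StandInRecurrentAgent:
    """Hand-written placeholder for a trained recurrent policy.

    The hidden state is a per-arm [pulls, total reward] table that is
    carried across trials within a task and cleared between tasks.
    """

    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.reset()

    def reset(self):
        # Analogous to resetting the recurrent state at the start of a task.
        self.hidden = [[0, 0.0] for _ in range(self.n_arms)]

    def step(self, prev_action, prev_reward, rng):
        # The previous action and reward arrive as inputs and update the state.
        if prev_action is not None:
            self.hidden[prev_action][0] += 1
            self.hidden[prev_action][1] += prev_reward
        if rng.random() < self.epsilon:
            return rng.randrange(self.n_arms)
        # Arms not yet pulled get an optimistic estimate of 1.0.
        means = [r / n if n else 1.0 for n, r in self.hidden]
        return max(range(self.n_arms), key=means.__getitem__)


def run_episode(agent, arm_probs, n_trials, rng):
    """Inner loop: adapt to one task using only the hidden state."""
    agent.reset()
    prev_action, prev_reward, total = None, 0.0, 0.0
    for _ in range(n_trials):
        action = agent.step(prev_action, prev_reward, rng)
        reward = 1.0 if rng.random() < arm_probs[action] else 0.0
        total += reward
        prev_action, prev_reward = action, reward
    return total


def outer_loop(n_tasks=200, n_trials=100, seed=0):
    """Outer loop: sample a new task per episode from a task distribution."""
    rng = random.Random(seed)
    agent = StandInRecurrentAgent(n_arms=2)
    returns = []
    for _ in range(n_tasks):
        p = rng.random()
        returns.append(run_episode(agent, (p, 1.0 - p), n_trials, rng))
        # A real system would update the network weights here with the
        # outer ("slow") RL algorithm; this sketch performs no such update.
    return sum(returns) / len(returns)


average_return = outer_loop()
print(average_return)
```

Note that no weights change inside `run_episode`: all within-task adaptation happens through the state carried from trial to trial, which is the role the recurrent dynamics play in the paper's description.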

Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review

Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao

This paper is an overview of neural networks, deep versus shallow architectures, and where the curse of dimensionality fits in. Not exactly groundbreaking, but good review and tutorial material nonetheless.

The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks are a special case of these conditions, though weight sharing is not the main reason for their exponential advantage.
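
For a sense of what "exponentially better" means in numbers, here is a small sketch based on the order-of-magnitude bounds as I recall them from the paper; treat the exact forms as assumptions. To reach accuracy eps on a function of n variables with smoothness m, a shallow network needs on the order of eps^(-n/m) units, while a deep network matching a binary-tree compositional structure needs on the order of (n - 1) * eps^(-2/m). Constants are dropped, so only the growth pattern is meaningful.

```python
def shallow_units(n, m, eps):
    """Order-of-magnitude unit count for a shallow network (constants dropped)."""
    return eps ** (-n / m)


def deep_units(n, m, eps):
    """Order-of-magnitude unit count for a deep network on a binary-tree
    compositional function (constants dropped)."""
    return (n - 1) * eps ** (-2 / m)


if __name__ == "__main__":
    m, eps = 2, 0.1
    for n in (2, 4, 8, 16):
        print(f"n={n:2d}  shallow ~ {shallow_units(n, m, eps):.3g}"
              f"  deep ~ {deep_units(n, m, eps):.3g}")
```

With m = 2 and eps = 0.1, the shallow estimate at n = 8 is about 10,000 units against about 70 for the deep case, and the gap widens exponentially as n grows.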

Benchmarking State-of-the-Art Deep Learning Software Tools

Shaohuai Shi, Qiang Wang, Pengfei Xu, Xiaowen Chu

This technical paper benchmarks a number of deep learning frameworks employing a variety of network architectures, just as the title suggests. This is the latest version of the paper.

In this paper, we aim to make a comparative study of the state-of-the-art GPU-accelerated deep learning software tools, including Caffe, CNTK, MXNet, TensorFlow, and Torch. We first benchmark the running performance of these tools with three popular types of neural networks on two CPU platforms and three GPU platforms. We then benchmark some distributed versions on multiple GPUs. Our contribution is two-fold. First, for end users of deep learning tools, our benchmarking results can serve as a guide to selecting appropriate hardware platforms and software tools. Second, for software developers of deep learning tools, our in-depth analysis points out possible future directions to further optimize the running performance.