- Vision Transformers: Natural Language Processing (NLP) Increases Efficiency and Model Generality - Feb 2, 2021.
Why do we hear so little about transformer models applied to computer vision tasks? What about attention in computer vision networks?
- Attention mechanism in Deep Learning, Explained - Jan 11, 2021.
Attention is a powerful mechanism developed to enhance the performance of the Encoder-Decoder architecture on neural network-based machine translation tasks. Learn more about how this process works and how to implement the approach into your work.
- Deep Learning’s Most Important Ideas - Sep 14, 2020.
In the field of deep learning, there continues to be a deluge of research and new papers published daily. Many well-adopted ideas that have stood the test of time provide the foundation for much of this new work. To better understand modern deep learning, these techniques cover the basic necessary knowledge, especially as a starting point if you are new to the field.
- AI Papers to Read in 2020 - Sep 10, 2020.
Reading suggestions to keep you up-to-date with the latest and classic breakthroughs in AI and Data Science.
- A Deep Dive Into the Transformer Architecture – The Development of Transformer Models - Aug 24, 2020.
Even though transformers for NLP were introduced only a few years ago, they have delivered major impacts to a variety of fields from reinforcement learning to chemistry. Now is the time to better understand the inner workings of transformer architectures to give you the intuition you need to effectively work with these powerful tools.
- Is depth useful for self-attention? - Jul 27, 2020.
Learn about recent research that is the first to explain a surprising phenomenon where in BERT/Transformer-like architectures, deepening the network does not seem to be better than widening (or, increasing the representation dimension). This empirical observation is in contrast to a fundamental premise in deep learning.
- KDnuggets™ News 19:n45, Nov 27: Interpretable vs black box models; Advice for New and Junior Data Scientists - Nov 27, 2019.
This week: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead; Advice for New and Junior Data Scientists; Python Tuples and Tuple Methods; Can Neural Networks Develop Attention? Google Thinks they Can; Three Methods of Data Pre-Processing for Text Classification
- Can Neural Networks Develop Attention? Google Thinks they Can - Nov 25, 2019.
Google recently published some work about modeling attention mechanisms in deep neural networks.
- Beyond Neurons: Five Cognitive Functions of the Human Brain that we are Trying to Recreate with Artificial Intelligence - Sep 3, 2019.
The quest for recreating cognitive capabilities of the brain in deep neural networks remains one of the elusive goals of AI. Let’s explore some human cognitive skills that are serving as inspiration to a new generation of AI techniques.
- Deep Learning Next Step: Transformers and Attention Mechanism - Aug 29, 2019.
With the pervasive importance of NLP in so many of today's applications of deep learning, find out how advanced translation techniques can be further enhanced by transformers and attention mechanisms.
- Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention - Mar 6, 2019.
In this post, the author shows how BERT can mimic a Bag-of-Words model. The visualization tool from Part 1 is extended to probe deeper into the mind of BERT, to expose the neurons that give BERT its shape-shifting superpowers.
- GANs Need Some Attention, Too - Mar 5, 2019.
Self-Attention Generative Adversarial Networks (SAGAN; Zhang et al., 2018) are convolutional neural networks that use the self-attention paradigm to capture long-range spatial relationships in existing images to better synthesize new images.
- Deconstructing BERT: Distilling 6 Patterns from 100 Million Parameters - Feb 27, 2019.
Google’s BERT algorithm has emerged as a sort of “one model to rule them all.” BERT builds on two key ideas that have been responsible for many of the recent advances in NLP: (1) the transformer architecture and (2) unsupervised pre-training.