Gold Blog, May 2017
Deep Learning – Past, Present, and Future

There is a lot of buzz around deep learning technology. First developed in the 1940s, deep learning was meant to simulate the neural networks found in brains, but in the last decade three key developments have unleashed its potential.

2. The Rise of the Graphics Processing Unit (GPU)

Making an artificial neural network (ANN) run fast is difficult. Hundreds or thousands of neurons must interact with each other in parallel. Depending on the task, it could take weeks for traditional CPUs to generate a prediction from an ANN. With GPUs, a task that took weeks may take only days or hours.
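To see why this workload suits parallel hardware, consider a rough sketch in Python with NumPy. The layer sizes, the ReLU activation, and the function names are assumptions made for illustration. The sketch runs on a CPU and only shows that the outputs of all neurons in one layer can be written as a single matrix operation, which is the kind of computation GPUs are built to spread across many cores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer: 1,000 neurons, each reading the same 4 inputs.
n_inputs, n_neurons = 4, 1000
x = rng.normal(size=n_inputs)                 # one input example
W = rng.normal(size=(n_neurons, n_inputs))    # one row of weights per neuron
b = np.zeros(n_neurons)                       # one bias per neuron


def layer_one_neuron_at_a_time(x, W, b):
    """Serial view: compute each neuron's output in a Python loop."""
    out = np.empty(len(b))
    for i in range(len(b)):
        out[i] = max(0.0, float(W[i] @ x + b[i]))  # ReLU activation (assumed)
    return out


def layer_all_at_once(x, W, b):
    """Parallel view: one matrix-vector product covers every neuron."""
    return np.maximum(0.0, W @ x + b)


serial = layer_one_neuron_at_a_time(x, W, b)
parallel = layer_all_at_once(x, W, b)

# Both views give the same numbers; only the second maps onto parallel hardware.
print(np.allclose(serial, parallel))
```

No neuron in the layer depends on another neuron in the same layer, so the rows of the matrix product can be computed independently and at the same time.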

GPUs were first built by NVIDIA to handle the massively parallel operations that video games require to render images hundreds of times a second for smooth video display. In 2009, Andrew Ng and several others found they could use GPUs for large-scale deep learning.

To illustrate the power of GPUs, Ng replicated the Google X project with a network with 11 billion connections running on 16 computers powered by just 64 GPUs; the previous project used 1,000 computers with 16,000 CPUs. The new project did not run much faster or perform better, but Ng made his point: sixty-four GPUs could handle the same amount of work as 16,000 CPUs in roughly the same amount of time.
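The figures above reduce to a simple ratio. The short calculation below uses only the numbers quoted in this article and assumes, as stated, that the two projects took roughly the same time, so the result is a rough equivalence rather than a benchmark.

```python
# Figures quoted in the article for the two projects.
cpu_project = {"computers": 1000, "processors": 16000}
gpu_project = {"computers": 16, "processors": 64}

# Rough equivalence, assuming both runs took about the same time.
cpus_per_gpu = cpu_project["processors"] / gpu_project["processors"]
machines_ratio = cpu_project["computers"] / gpu_project["computers"]

print(cpus_per_gpu)    # 250.0
print(machines_ratio)  # 62.5
```

In other words, each GPU stood in for roughly 250 CPUs, and the cluster shrank by a factor of about 62.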

3. The Invention of Advanced Algorithms

Although a range of discoveries has increased ANNs' capabilities, many consider the discoveries made by Geoffrey Hinton and his colleagues in 2006 to be the turning point for ANNs.

Hinton introduced an algorithm that could fine-tune the learning procedure used to train ANNs with multiple hidden layers. The key was using a ‘greedy’ layer-wise algorithm that trained each layer of the ANN separately, one at a time, before the network as a whole was fine-tuned.

The other key discovery optimized the initial setting of the weights. This allowed high-dimensional data, or data with many features, to be converted into low-dimensional data, increasing predictive power.
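The layer-by-layer training and weight initialization described above can be sketched in a few lines of Python with NumPy. This is a simplified illustration, not Hinton's method: his 2006 work used restricted Boltzmann machines, while the sketch substitutes small tanh autoencoders trained with plain gradient descent. The data, layer sizes, learning rate, and function names are all assumptions chosen for the example.

```python
import numpy as np


def train_autoencoder(X, n_hidden, rng, lr=0.2, steps=1000):
    """Train one tanh autoencoder on X with plain gradient descent.

    Returns the encoder weights, the encoder bias, and the loss per step.
    """
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, n_hidden))   # encoder weights
    b = np.zeros(n_hidden)
    V = rng.normal(scale=0.1, size=(n_hidden, d))   # decoder weights
    c = np.zeros(d)
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W + b)            # lower-dimensional code
        X_hat = H @ V + c                 # reconstruction of the input
        diff = X_hat - X
        losses.append(0.5 * np.mean(np.sum(diff ** 2, axis=1)))
        err = diff / n                    # gradient of the loss w.r.t. X_hat
        dH = (err @ V.T) * (1.0 - H ** 2)  # backpropagate through tanh
        V -= lr * (H.T @ err)
        c -= lr * err.sum(axis=0)
        W -= lr * (X.T @ dH)
        b -= lr * dH.sum(axis=0)
    return W, b, losses


def greedy_pretrain(X, layer_sizes, rng):
    """Train one layer at a time; each layer learns from the previous codes."""
    weights, histories = [], []
    inputs = X
    for n_hidden in layer_sizes:
        W, b, losses = train_autoencoder(inputs, n_hidden, rng)
        weights.append((W, b))            # would initialize a deep network
        histories.append(losses)
        inputs = np.tanh(inputs @ W + b)  # codes become the next layer's input
    return weights, inputs, histories


# Hypothetical data: 20 observed features driven by 3 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(200, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)
X /= np.sqrt(X.shape[1])                  # keep input norms near 1

weights, codes, histories = greedy_pretrain(X, [8, 3], rng)
print(X.shape, "->", codes.shape)
for i, losses in enumerate(histories, start=1):
    print(f"layer {i}: loss {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Each layer is fitted on its own, which is the greedy part. The learned weights then serve as the starting point for training the full network, and the final codes are a 3-feature summary of the original 20-feature data.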

Hinton is credited with putting the ‘deep’ in deep learning because he operationalized multiple hidden layers. Hinton and his team reportedly coined the term “deep learning” to rebrand ANNs. At the time, many professionals and funders had no interest in supporting ANNs because they were thought to be unprofitable.

What’s The Impact?

Deep learning technology is solving highly complex problems that have eluded computer scientists for decades, thanks to heightened processing power, massive amounts of available data, and more advanced neural network algorithms.

For example, deep learning is being used to improve natural language processing tools so that they consistently comprehend the meaning of a sentence, not just the individual words. If someone wants to translate ‘take a hike’ or ‘get lost,’ the tool will not take the expression literally; it will translate it into a corresponding expression in the other language.

Object recognition software will become more prevalent and accurate. For example, facial recognition software is already operating at a high level. Scientists are now training deep learning algorithms to differentiate between similar objects, such as teacups and bowls, houses and cabins, shoes and boots. This precision allows computers to differentiate between pedestrians on a street, detect anomalies in common objects, piece together panoramic photos, index images, and much more.


Using ANNs comes with a couple of drawbacks, namely the black box problem and overfitting.

The black box problem is the inability to know how an ANN reached a prediction. Users can see the data in the input and output layers, which offers an inkling of which input variables the network deems important. However, the hidden layers mask the underlying reasoning behind a prediction. Hence, business leaders are less inclined to trust an untested ANN because they cannot see how it reaches its conclusion, unlike other algorithms whose processes are clearly visible.

Overfitting is also a common problem with ANNs. Overfitting happens when an algorithm fits a set of training data so well that it fails to perform accurately on data it has not seen before. This problem is not unique to deep learning and can be seen in other types of machine learning algorithms.
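A small experiment can make overfitting concrete. The sketch below uses polynomial curve fitting rather than an ANN, because the effect is the same and the code stays short. The straight-line ground truth, the noise level, and the polynomial degrees are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def noisy_line(x, rng):
    """Hypothetical ground truth: a straight line plus random noise."""
    return 2.0 * x + 1.0 + rng.normal(scale=0.2, size=x.shape)


def mse(model, x, y):
    """Mean squared error of a fitted model on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


# Ten points to fit the models on, and 200 fresh points to check them against.
x_fit = np.linspace(0.0, 1.0, 10)
y_fit = noisy_line(x_fit, rng)
x_new = np.linspace(0.0, 1.0, 200)
y_new = noisy_line(x_new, rng)

simple = Polynomial.fit(x_fit, y_fit, deg=1)     # matches the true shape
flexible = Polynomial.fit(x_fit, y_fit, deg=9)   # can pass through every point

for name, model in [("degree 1", simple), ("degree 9", flexible)]:
    print(
        f"{name}: error on fitted data {mse(model, x_fit, y_fit):.4f}, "
        f"error on new data {mse(model, x_new, y_new):.4f}"
    )
```

The degree-9 curve has enough freedom to pass through all ten fitted points, noise included, so its error on those points is essentially zero. On fresh points drawn from the same line, that memorized noise typically shows up as a noticeably larger error.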


There are many algorithms that data scientists can use to detect patterns and relationships in underlying data. Deep learning algorithms are some of the most powerful because they can adapt to a wide variety of data, require little statistical training, learn with simple algorithms, and scale to large data sets.

But in practical use, deep learning is overkill if your project uses small data volumes and solves simple problems. If you process large amounts of data and need to produce complex predictions, deep learning technology may be beneficial. And if there doesn’t seem to be a deep learning tool that fits your needs, just wait.

For more reading, check out How Deep is your Data? by Julian Ereth and Enterprise Grade Data Science by Stephen Smith.

Bio: Henry Eckerson covers business intelligence and analytics at Eckerson Group and has a keen interest in artificial intelligence, deep learning, predictive analytics, and cloud data warehousing.