Using Deep Learning to Solve Real World Problems

Do you assume that deep learning is only being used for toy problems and in self-learning scenarios? This post includes several firsthand accounts of organizations using deep neural networks to solve real world problems.

Are you using deep neural networks in the real world, solving real world problems?

A number of weeks ago I asked my LinkedIn connections this very question, in the wake of Kaggle's "The State of Data Science and Machine Learning" 2017 report. The Kaggle report revealed that "neural networks" are being employed by 37% of respondents. The report's algorithm breakdown considers CNNs, RNNs, and GANs separately. It's a given that self-selecting surveys are difficult to get perfect, but to be honest, I was surprised by the high percentage that neural networks garnered. Up to then, it was generally assumed that neural networks were being used on real world problems (as opposed to toy problems for educational purposes) by only a very small percentage of machine learning/data science practitioners.

Kaggle report

So I solicited input from those who were interested, and selected a number of eager volunteers to share how they are using neural networks to solve actual problems. What follows are select responses, unedited and in full, from a number of these participants. I encourage anyone interested in doing so to share in the comments below their experiences using neural networks to solve actual problems.

First up is William Falcon. William is a deep learning researcher, and co-founder at NextGenVest, where he is building deep learning models to scale personalized financial advice over SMS for Gen Z.

At NextGenVest we've developed a retrieval based Neural Network powered chatbot. Our business motivation is to provide human-powered personalized student loan advice to Gen Z over text message at scale.

Given the volume of information and messages (3 million text messages), our particular problem is difficult because we can't train a model that is very tailored to one specific type of conversation or student loan guidance. As a startup, we also have the added difficulty of not having millions of datapoints to help train robust models.

To solve our problem we started with the Google Smart Reply paper from June 2017. Our model offers suggested responses to our human advisors (what we call “Money Mentors”), effectively giving them a summary of the current chat, the user's intent and possible actions without the mentors having to read the whole conversation. In the Smart Reply model, dual encoder networks embed emails and responses in a shared space from which appropriate responses can be queried. Between their 2015 and 2017 papers, the authors moved away from Recurrent Neural Networks to fully connected networks, primarily for speed and scalability. We started with a model implemented from their paper and modified a few parts to work on chats, namely the way embeddings are fed into the network and how we model relationships for longer sequences. We were able to generate around 3 million data points from our fewer than 45,000 chats by extracting multiple (x, y) subpairs from chats. The resulting model had a 62% recall at k=1 from 50 total options. That means that given 1 chance to guess right, our model picked the correct one 62% of the time from 50 possible responses.
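As a rough illustration of two ideas in the paragraph above, here is a minimal Python sketch of how one chat can yield several (context, response) training pairs, and how recall at k is computed once a model has scored candidate responses. This is not NextGenVest's code: the function names, the `"mentor"`/`"user"` speaker labels, and the toy scores are all assumptions made for the example, and the pairing rule (one pair per advisor turn) is only one plausible way to extract subpairs.

```python
from typing import List, Sequence, Tuple

Turn = Tuple[str, str]  # (speaker, text); the speaker labels are hypothetical


def extract_pairs(chat: Sequence[Turn], advisor: str = "mentor") -> List[Tuple[List[str], str]]:
    """Build one (context, response) pair per advisor turn that has prior context."""
    pairs = []
    for i, (speaker, text) in enumerate(chat):
        if speaker == advisor and i > 0:
            # x = every message before this turn, y = the advisor's reply
            context = [prev_text for _, prev_text in chat[:i]]
            pairs.append((context, text))
    return pairs


def recall_at_k(score_rows: Sequence[Sequence[float]],
                true_indices: Sequence[int], k: int = 1) -> float:
    """Fraction of queries whose true response is among the k highest-scored candidates."""
    hits = 0
    for scores, target in zip(score_rows, true_indices):
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        hits += target in ranked[:k]
    return hits / len(score_rows)


# Toy chat: two advisor turns, so two training pairs come out of one conversation.
chat = [
    ("user", "Hi"),
    ("mentor", "Hello! How can I help?"),
    ("user", "I need help with loans"),
    ("mentor", "Have you looked at scholarships?"),
]
pairs = extract_pairs(chat)

# Toy scores: one row per query, one column per candidate response.
scores = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.5]]
print(len(pairs))
print(recall_at_k(scores, [0, 1], k=1))
```

In the setting described above, each row would hold 50 candidate scores and k would be 1, so a recall of 0.62 means the top-ranked candidate was the true response for 62% of the queries.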

The model worked better than expected. It learned to greet users and offer up scholarship recommendations without specifically being trained to do so, which ultimately would reduce student loan burdens. It's allowed our Money Mentors to increase their reply rates to text messages by up to 50% and has helped us keep our Money Mentor headcount at 16 (down from 86). It was very tricky to get this particular neural network to work, but proper hyperparameter fine-tuning and appropriate architectural changes to the network should help get to a good solution.

Next is Charles Martin. Charles is a data scientist & machine learning AI consultant who runs Calculation Consulting, based in San Francisco. This is his account of how he and his clients are using neural networks to solve real problems.

In the past couple of years, I have seen a large uptick in 'real' deep learning in my consulting practice. Of course, LSTMs and 1D CNNs are useful for simple NLP applications, although a few years ago, I would not have recommended such methods to clients using frameworks like Caffe or Torch because they would have been so hard to maintain in production. Today, with TensorFlow, Keras (and PyTorch), I can both develop advanced solutions and reliably hand them off.

In turn, clients are now asking for sophisticated solutions such as generating text, transforming images, and custom object detection. We are exploring Reinforcement Learning, Generative Adversarial Nets, and Bayesian Causal Inference. The tools are now widely available and free. Machine learning is no longer a software engineering problem.

Clients see the business opportunity, but are struggling to staff, scope and manage such projects. These projects require a radically different approach than traditional IT consulting. They are essentially R&D projects, and it is difficult to both project the timelines and set milestones. And forget about trying to manage them internally using the same old processes.

The smart companies are transforming themselves into learning organizations. I have one client in particular that understands that they need to transform themselves from an IT services company to an AI company, and I am working with them and their staff to do just this.

Andres Milioto, of the University of Bonn, shared the following in a separate, dedicated article which he penned as a result of a back and forth between him and me after my call for input went out on LinkedIn. You can read his entire account here.

We focus on detection on a per-plant basis to estimate the amount of crops as well as various weed species as a part of an autonomous robotic perception system. We are working on a vision-based classification system for identifying crops and weeds in both RGB-only imagery and RGB combined with near infra-red (NIR) imagery.
The advances in image classification, object detection, and semantic segmentation using deep Convolutional Neural Networks, which spawned the availability of open source tools such as Caffe and TensorFlow (to name a couple) to easily manipulate neural network graphs, and to quickly prototype, train, and deploy using off-the-shelf GPUs, made a very strong case in favor of CNNs for our classifier. Another thing that made a deep learning approach possible was that during the last 2 years of the project we have been able to gather a dataset containing a large amount of data (in the order of 10^5 labeled images). A big part of this dataset has been made publicly available to allow other institutions to benefit from it (see publications by Nived Chebrolu et al.), and also to compare algorithmic results.
We are exploring generative models, and several unsupervised learning approaches in order to achieve this, all of which would use the autonomous capabilities of the robots to gather new data in the new field and allow it to intelligently auto-retrain itself for the new task at hand. We are also always working on getting the models to run faster and faster, and using fewer resources and less power, for example by using newly available hardware accelerators for neural networks.

Finally, we have Dipanjan Sarkar. Dipanjan works as a data scientist in Intel’s IT Infrastructure Analytics team, trying to leverage advanced analytics, machine learning and deep learning to solve problems proactively with data-driven insights. He is also a technical author who has published several books around machine learning, social media analytics and natural language processing. Having a passion for education and data science, he is also a data science mentor at Springboard helping others gain necessary skills around data science and machine learning.

Working as a data scientist with Intel's IT Infrastructure Analytics team, we need to make sure our key IT infrastructure is healthy, robust and fault-resistant. Traditionally, data warehouses, databases and business intelligence (BI) reports and dashboards served the purpose of descriptive analytics. This basically tells us what has already happened or what the current health of various organizations could be, based on trackable metrics and KPIs. However the shift towards diagnostic, predictive and prescriptive analytics made us look towards newer paradigms and methodologies including machine learning and deep learning. The key idea is to leverage these methodologies in order to diagnose and predict events of interest which might happen in the future and take necessary actions based on data-driven insights.

For us, events of interest could be tracking device anomalies, data center shutdowns, server faults and general incidents. Data-driven insights involve making use of the necessary data in our organization ranging from device events to customer tickets to get valuable insights which drive decisions and actions. Some of the challenges with traditional machine learning approaches involve spending tedious time in hand-crafted feature engineering, extensive model tuning and overfitting models. These challenges, along with the recent advancements in deep learning software as well as hardware, drove us to pursue deep learning for some of our more challenging problems around predictive analytics.

We leveraged architectures like Convolutional Neural Networks (CNNs) along with the regular fully connected deep networks for predicting network device failures in our environment. The focus is to leverage diagnostic network events from devices like load balancers and predict future failures or critical alerts based on these past historical events. Leveraging CNNs for automatically extracting features enabled us not only to cut down tedious time and effort spent for feature engineering but also to gain superior prediction accuracy when detecting and predicting device failures. We have also leveraged concepts around word embeddings, sequence models, Long Short Term Memory Networks (LSTMs) along with regular deep networks for problems such as root cause classification and incident resolution by using our vast trove of historical data pertaining to issues, incidents and problems. Our ultimate vision is to work towards building a self-healing eco-system around infrastructure and we believe these are tiny but crucial steps towards that. We are also currently looking at newer methodologies like transfer learning and one-shot learning to apply in scenarios where we have lower volumes of data but can still achieve superior accuracy using these methods as compared to traditional statistical or machine learning models.
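To make the "predict future failures from past events" framing concrete, here is a small Python sketch of how a device's event stream might be cut into supervised examples: a window of past events as input, and a 0/1 label saying whether a failure appears shortly afterwards. It is only an assumption about how such data could be prepared, not a description of Intel's pipeline; the event codes, the `make_windows` helper, and the `history`/`horizon` parameters are all hypothetical. Windows like these are the kind of input a CNN or other sequence model could then learn features from.

```python
from typing import List, Sequence, Tuple


def make_windows(events: Sequence[str], failure_code: str,
                 history: int, horizon: int) -> List[Tuple[List[str], int]]:
    """Pair each run of `history` past events with a 0/1 label:
    1 if `failure_code` occurs within the next `horizon` events, else 0."""
    examples = []
    # `end` marks the boundary between the past window and the future window.
    for end in range(history, len(events) - horizon + 1):
        past = list(events[end - history:end])
        future = events[end:end + horizon]
        examples.append((past, int(failure_code in future)))
    return examples


# Hypothetical event codes from a single device, oldest first.
log = ["ok", "warn", "ok", "warn", "FAIL", "ok"]
examples = make_windows(log, failure_code="FAIL", history=2, horizon=2)
for past, label in examples:
    print(past, label)
```

A longer `history` gives the model more context per example, while a longer `horizon` turns the task into an earlier (and usually harder) warning; both would need tuning against real incident data.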