Semi-supervised Feature Transfer: The Practical Benefit of Deep Learning Today?
This post evaluates four strategies for solving a problem with machine learning, and finds that customized models built from semi-supervised "deep" features via transfer learning outperform models built from scratch and rival state-of-the-art methods.
Strategy A: Build a model from scratch using open source tools
When it comes to open source tools for machine learning, Python + scikit-learn (sklearn) are the weapons of choice. They provide an excellent interface for training machine learning models, and strong implementations of methods that are known to work well. As a bonus, developers/engineering teams won't grumble at you for handing them a model in some weird new language. ;)
Our strategy here is to build a good solid sentiment analysis model from scratch, using features and a classifier that are known to work well for sentiment analysis:
ML features: Term Frequency-Inverse Document Frequency (TF-IDF) features (i.e., count the words in each document and down-weight words that appear frequently across the whole corpus, so common words contribute less).
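As a rough sketch, the whole baseline fits in a few lines of sklearn. Logistic regression stands in here for the classifier (any linear model is a reasonable choice), and the toy data is purely illustrative:

```python
# Minimal TF-IDF + classifier baseline; logistic regression is an assumed
# (but conventional) pairing with TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for a real labeled dataset.
texts = ["great movie, loved it", "utterly boring and far too long"]
labels = ["positive", "negative"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigram + bigram TF-IDF features
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["what a wonderful film"]))
```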
Economics & practical factors:
- How much data is required to train a good model?
- As many labeled training examples as possible. A minimum viable model starts somewhere around a thousand labeled examples per class, even for simple modeling problems like sentiment analysis.
- Obtaining labeled examples can be slow and expensive when an academic dataset isn't available for easy download.
- How much effort to train?
- Sklearn makes it easy for trained data scientists to experiment with machine learning models, but many organizations don’t have in-house talent.
- How much effort to deploy?
- Making predictions from a trained sklearn model is easy, but deploying a scalable service that can be integrated into your product is a major engineering effort (see the sketch after this list).
- Complexity adds significant technical debt to a product or organization.
- Does your organization manage compute and storage infrastructure? Do you want to?
- Other sticking points?
- How will you update your model over time as your data evolve? What if your data scientist is no longer available -- can someone else manage the model, or deploy an updated version?
- How long can you afford to experiment before deploying a model?
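To make the deployment point concrete, here is roughly where you start: a pickled model behind a tiny Flask endpoint. The model path and request schema below are made up for illustration, and everything this sketch omits (validation, batching, monitoring, versioning, scaling) is where the real engineering effort lives.

```python
# Hedged sketch: serving a pickled sklearn pipeline over HTTP with Flask.
# "sentiment_model.joblib" and the request schema are illustrative only.
from flask import Flask, jsonify, request
from joblib import load

app = Flask(__name__)
model = load("sentiment_model.joblib")  # hypothetical artifact from training

@app.route("/predict", methods=["POST"])
def predict():
    texts = request.get_json()["texts"]     # assumed payload: {"texts": [...]}
    labels = model.predict(texts).tolist()  # pipeline computes TF-IDF internally
    return jsonify({"labels": labels})

if __name__ == "__main__":
    app.run(port=8000)
```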
Strategy B: Integrate a pre-built API
Some problems, such as sentiment analysis, are common enough that pre-trained APIs may be available for the specific problem. You should confirm that your use case and data domain are aligned with how the model was trained, but you generally won't need to train anything. Two sentiment APIs are available through indico: the "Sentiment" API is optimized for throughput/speed, and "Sentiment HQ" is optimized for accuracy. The model behind Sentiment HQ is a recurrent neural network, which responds to contextual cues like sarcasm and negation. Both models were trained on a large corpus of online reviews, but not on the IMDB Large Movie Review Dataset.
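Calling either endpoint is close to a one-liner with the indicoio Python client. A hedged sketch (the API key is a placeholder, and the method names reflect my understanding of the client library):

```python
# Sketch of calling both sentiment endpoints via the indicoio Python client.
import indicoio
indicoio.config.api_key = "YOUR_API_KEY"  # placeholder

text = "I expected to hate it, but it completely won me over."
print(indicoio.sentiment(text))     # fast endpoint: positivity score in [0, 1]
print(indicoio.sentiment_hq(text))  # accuracy-optimized RNN endpoint
```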
ML features: recurrent neural network weights, trained on online reviews
Economics & practical factors:
- How much data is required to train a good model?
- None. Awesome!
- How much effort to train?
- None!
- How much effort to deploy?
- Pretty minimal. The API has a well-documented REST endpoint and client libraries in most common programming languages (Python, Ruby, Node.js, Java, R, etc.). Generally less than 5 lines of code.
- Other sticking points?
- It only works for sentiment analysis.
Strategy C: Custom model using general text features
The general effectiveness of n-grams, TF-IDF features, and "bag of words" models is evidence that words convey much of the meaning in text. This is great, because it means word-level features should transfer across many contexts, and it is why indico's general text features API provides word-level features. The Custom Collection API takes input text, computes word-level features, and trains a custom classifier from your labels. Because we transfer in pre-trained general features and train on only a small number of examples, we expect the custom model to perform similarly to the "built from scratch" sklearn model, but with less data required.
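In code, the workflow looks roughly like this with the indicoio client's Custom Collection interface. Treat the details as a sketch: the collection name and toy data are made up, and the method names reflect my understanding of the client rather than a guarantee:

```python
# Sketch of training a custom classifier on indico's general text features.
import indicoio
from indicoio.custom import Collection

indicoio.config.api_key = "YOUR_API_KEY"  # placeholder

collection = Collection("my_sentiment_model")  # hypothetical collection name
collection.add_data([
    ["great movie, loved it", "positive"],  # [text, label] pairs
    ["utterly boring and far too long", "negative"],
])
collection.train()
collection.wait()  # block until training finishes
print(collection.predict("what a wonderful film"))
```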
ML features: word features trained on general text
Economics & practical factors:
- How much data is required to train a good model?
- Hundreds of examples give credible (but not great) performance on the task of sentiment analysis. For any given text problem, you can probably sit down with a beverage of choice and label several hundred examples in a few hours.
- How much effort to train?
- Moderate. You’ll need to have some labeled training examples, but the complexity of model training and hyperparameter optimization are handled by the API.
- How much effort to deploy?
- Pretty minimal. The API has a well-documented REST endpoint and client libraries in most common programming languages (Python, Ruby, Node.js, Java, R, etc.).
- Other sticking points?
- Requires a network connection.