An Inside Update on Natural Language Processing
This article is an interview with computational linguist Jason Baldridge. It's a good read for data scientists, researchers, software developers, and professionals working in media, consumer insights, and market intelligence. It's for anyone who's interested in, or needs to know about, natural language processing (NLP).
To finish this thread: You'd say that newer technologies -- machine learning and all that -- have changed computational linguistics?
Certainly machine learning has changed the face of computational linguistics over the past three decades. It started with Bell Labs using Hidden Markov Models to recognize speech in the early 1980s; the successes there led to text-oriented NLP work incorporating machine learning starting in the late 1980s. (Philip Resnik calls this DARPA's shotgun wedding for NLP.)
Machine learning really dominated all of NLP by the end of the 1990s. Most of that work used discriminative models like support vector machines and logistic regression for text classification, and generative models for language modeling and tagging. Machine learning specialists figured out that language provided fun problems, and students entering the field in the early 2000s were trained in machine learning. So by the mid-2000s we had a larger portion of the community that could not only use machine learning methods, but also dig into their internals and tweak them. Bayesian models also became all the rage in the mid-2000s. Their attraction stemmed from their support for priors, the promise of learning from less data, the natural intuitiveness of the generative stories for language problems, and the availability of inference algorithms for learning their parameters. We now see deep learning in a similar position, and it's really leading to big improvements in important tasks like speech recognition, machine translation, parsing, and more.
Analysis functions transform and extract information from text. (From Jason's 2014 Sentiment Analysis Symposium tutorial.)
The ideal scenario, from my perspective, is that new architectures will be inspired and informed by work in computational linguistics (and linguistics proper), but also that they will be learned without requiring costly and imperfect labeled data. As an example, one almost surely needs to handle syntax in some way to know that the book is the thing purchased in the sentence "I saw the book that John said Bill knows Fred bought and gave his daughter." It would be the coolest thing ever if that could be done without needing to train a model on a treebank, but instead by taking the ideas of phrase structure, dependency grammar, categorial grammar, etc., and learning the representations from pairs of sentences for machine translation. But you still need the theory/representations and a way to represent them in your network/algorithm. As another example, you might not want to be stuck with a particular word sense inventory, e.g. the one created by the annotators of WordNet, but you might do well to represent the idea of word senses in your model. And so on.
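To make the syntax point concrete, here is a minimal sketch, assuming spaCy (which comes up later in this interview) and its en_core_web_sm model are installed, that inspects the dependency parse of that sentence. Recovering that "book" is the thing bought means tracing the relative clause from "book" through "that" down to "bought," and a small statistical parser may well get parts of this long-distance structure wrong.

```python
# Hedged sketch: inspect the dependency parse of the example sentence
# with spaCy (assumes the en_core_web_sm model is installed).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I saw the book that John said Bill knows Fred bought "
          "and gave his daughter.")

# Print each token with its dependency label and its head. Linking "book"
# to "bought" requires following the relative clause, which is exactly the
# kind of structure a purely surface-level model misses.
for token in doc:
    print(f"{token.text:12} {token.dep_:10} <- {token.head.text}")
```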
You co-invented OpenNLP, now part of Apache, when you were a grad student at the University of Edinburgh, back in 2000. Back then, you were a Java proponent, but a few years back you embraced Scala.
I'm frankly not really satisfied with the JVM ecosystem as regards NLP tasks in general. OpenNLP has a solid offering, but it's lagging behind the state of the art. Yes, I started it as a grad student and am very happy to see its continued development, but I can no longer bear programming in Java and haven't contributed myself in a long time.
I have contributed to ScalaNLP more recently, but I just haven't found time to work on it in the last two years and it's been entirely carried by David Hall. He does great work, but ScalaNLP as a whole doesn't have the maturity and documentation necessary to be a go-to NLP framework on the JVM.
Stanford's CoreNLP is a mature and strong system, but it doesn't have an Apache, MIT, or BSD license, and that means that I and many others won't touch it for commercial work. I'd personally love to have a JVM-based implementation along the lines of Matthew Honnibal's spaCy system, which is written in Python. If I had nothing else to do, I'd be happily working on an ASL-licensed NLP system written in Scala that integrates well with Spark and deep learning libraries, etc. Maybe some day I'll be able to do that!
What other tool/environment preferences do you have nowadays, for coding NLP functions, for NLP-centered product development, and for data science?
People Pattern's Vespanaut: Into the Future
I'm still very much Scala-centric. This has worked well for our development at People Pattern. Our data infrastructure is JVM-based, with heavy reliance on technologies like Spark and Elasticsearch. It means that as we develop NLP components, we can serialize them and deploy them in both large batch analysis jobs and RESTful APIs straightforwardly. It also means that components I build are not too far from production. Overall, this has been very important for moving quickly as a small team.

However, we aren't entirely dependent on Java+Scala. For example, we use the C-based Vowpal Wabbit for much of our supervised model training. We extract and index features using Scala, output training files, estimate parameters with VW, and then read the resulting model back into Scala and serialize the whole thing. Having said that, we are now exploring toolkits like MLlib and H2O (for standard model types) since they integrate natively with Spark jobs. We also have our own proprietary models, which are implemented in Scala. It's worth mentioning that R plays a big role in our data science prototyping.
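As an illustration of that hand-off, here is a minimal sketch in Python rather than Scala (the toy features and file names are invented): it writes examples in Vowpal Wabbit's text format and shells out to the vw binary, which is assumed to be installed.

```python
# Hedged sketch of the train-with-VW hand-off described above: write
# features in VW's text format, then train by shelling out to vw.
import subprocess

# Toy labeled examples; real features would come from the feature
# extraction and indexing step. Logistic loss expects labels in {-1, 1}.
examples = [(1, {"great": 1.0, "fast": 1.0}),
            (-1, {"broken": 1.0, "slow": 1.0})]

with open("train.vw", "w") as f:
    for label, feats in examples:
        feat_str = " ".join(f"{name}:{value}" for name, value in feats.items())
        f.write(f"{label} | {feat_str}\n")

# Train a logistic model; model.vw is the artifact that would be read
# back and serialized alongside the rest of the pipeline.
subprocess.run(["vw", "train.vw", "--loss_function", "logistic",
                "-f", "model.vw"], check=True)
```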
We are also working on deep learning models for image and text processing. I considered using DL4J and H2O for JVM compatibility, but I wasn't satisfied with their current offerings for our needs and decided to opt for TensorFlow instead. TensorFlow has allowed us to quickly ramp up our efforts, and it has tremendous momentum behind it. It means we have a gap with respect to how we deploy our RESTful APIs, but our initial need is to use them for batch analysis. And, I'm still keeping an eye on DL4J and H2O.
You teamed with computational linguist Philip Resnik -- you quoted him earlier -- to create the ConveyAPI text analytics technology for social agency Converseon, which brought it to market via spin-out Revealed Context. Converseon has a wealth of social customer insights data and domain knowledge that you leveraged, via machine learning, to craft high-accuracy classifiers. What key lessons did you learn from the experience -- general or specific to social insights -- that you'd pass on to others?
There are two main lessons: the value of high-quality labeled data and the use of diverse evaluation strategies.
Converseon amassed hundreds of thousands of annotations across tens of industry verticals. For a given text passage (e.g., a comment or blog post), mentions of entities, products, and terms relevant to the vertical were extracted and assigned sentiment labels, including "positive," "negative," "neutral," and "mixed." This gave us what we needed to build a sentiment classifier that used not only standard context-word and lexicon features, but also features that capture the distance and relationship between the target term and the sentiment-bearing phrases. For example, with a sentence like "The new Mustang is amazing, but I'm disappointed with the Camaro," the system detects that both "Mustang" and "Camaro" are targets of interest, and it uses information like the nearness of "amazing" to "Mustang" and the fact that a discourse connective like "but" separates "disappointed" from "Mustang" (and vice versa with respect to "Camaro"). It was important to have that much high-quality labeled data to get a strong signal for these features. As a result, the model is able to say that the Mustang and the Camaro were referred to positively and negatively, respectively, in the same sentence.
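As an illustration only (this is not Convey's actual feature set), the sketch below computes the kind of target-aware features described above: the distance from the target to each lexicon sentiment word, and whether a discourse connective intervenes. The tiny lexicon and feature names are invented for the example.

```python
# Hedged sketch of target-aware sentiment features: distance to sentiment
# words, and whether a connective like "but" cuts them off from the target.
SENTIMENT_LEXICON = {"amazing": "positive", "disappointed": "negative"}
CONNECTIVES = {"but", "however", "although"}

def target_features(tokens, target_idx):
    feats = {}
    for i, tok in enumerate(tokens):
        polarity = SENTIMENT_LEXICON.get(tok.lower())
        if polarity is None:
            continue
        lo, hi = sorted((i, target_idx))
        between = {t.lower() for t in tokens[lo + 1:hi]}
        # Last occurrence of each polarity wins in this toy version.
        feats[f"{polarity}_dist"] = abs(i - target_idx)
        feats[f"{polarity}_cut_by_connective"] = bool(between & CONNECTIVES)
    return feats

tokens = "The new Mustang is amazing , but I'm disappointed with the Camaro".split()
print(target_features(tokens, tokens.index("Mustang")))
print(target_features(tokens, tokens.index("Camaro")))
```

Run on the example sentence, "amazing" sits close to "Mustang" with no connective in between, while "disappointed" is cut off from "Mustang" by "but" (and the reverse holds for "Camaro").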
How has this work proved out?
Evaluation was not done simply on overall accuracy, which is a terrible sentiment-classifier performance measure if that's all you look at. Instead, we focused on class-specific precision and recall measures to ensure we were capturing each label well, and more importantly, to ensure that we made few positive-negative errors while being more permissive of positive-neutral and negative-neutral errors.
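A minimal sketch of that kind of evaluation, assuming scikit-learn is available and using toy labels rather than Convey's data:

```python
# Hedged sketch: per-class precision/recall, plus a direct count of the
# costly positive<->negative confusions, on toy labels.
from sklearn.metrics import classification_report

gold = ["positive", "negative", "neutral", "positive", "neutral", "negative"]
pred = ["positive", "neutral", "neutral", "negative", "neutral", "negative"]

print(classification_report(gold, pred, zero_division=0))

# The error mix matters more than overall accuracy: a positive<->negative
# flip is far worse than confusing either class with neutral.
flips = sum(1 for g, p in zip(gold, pred) if {g, p} == {"positive", "negative"})
print("positive/negative flips:", flips)
```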
Humans tend to agree strongly on positive versus negative, but they disagree with each other a lot when it comes to neutral items. To get a sense of this, consider that human agreement when "neutral" is one of the labels tends to be in the 80-85% range. Because of this, we also considered the performance of our classifiers and those of other vendors with respect to average disagreement of human annotators. Across several data sets, Convey achieved 90-100% of human agreement, while other vendors were in the 80-90% range, which was a massive and pleasing discrepancy (for us).
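One hedged way to compute that kind of agreement-relative score, with made-up annotations:

```python
# Hedged sketch: score a system against the human-agreement ceiling.
from itertools import combinations

# Three annotators' labels on the same five items (invented data).
annotators = [
    ["positive", "neutral", "negative", "neutral", "positive"],
    ["positive", "negative", "negative", "neutral", "positive"],
    ["positive", "neutral", "negative", "positive", "positive"],
]
system = ["positive", "negative", "negative", "positive", "positive"]

def agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

pairs = list(combinations(annotators, 2))
human = sum(agreement(a, b) for a, b in pairs) / len(pairs)
vs_humans = sum(agreement(system, a) for a in annotators) / len(annotators)
print(f"human agreement: {human:.2f}; system at {vs_humans / human:.0%} of it")
```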
Another important aspect of our evaluation was that we compared per-target sentiment ratios and the overall ranking of products with respect to each other. The ground truth gave us counts such as 100 positive, 20 negative, and 300 neutral mentions of a product like "Ford Mustang." If Convey found 80 positive, 15 negative, and 250 neutral mentions, we could measure how well we captured the ground-truth ratio. We furthermore measured rank correlations (e.g., Spearman's) of all the products in a data set given ground-truth sentiment ratios and our predicted ratios. Having these evaluations as part of the experimental setup used to decide development priorities and directions was important for producing a more capable classifier that generalized well to new data.
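And a sketch of the ratio and ranking comparison, using invented counts and SciPy's spearmanr for the rank correlation:

```python
# Hedged sketch: compare gold vs. predicted net-sentiment ratios and the
# product ranking they induce. All counts are invented.
from scipy.stats import spearmanr

# (positive, negative, neutral) mention counts per product.
gold = {"Mustang": (100, 20, 300), "Camaro": (60, 40, 250), "F-150": (30, 30, 200)}
pred = {"Mustang": (80, 15, 250), "Camaro": (55, 45, 240), "F-150": (25, 35, 210)}

def ratio(counts):
    pos, neg, neu = counts
    return (pos - neg) / (pos + neg + neu)  # net sentiment per mention

products = sorted(gold)
for p in products:
    print(f"{p}: gold ratio {ratio(gold[p]):+.3f}, predicted {ratio(pred[p]):+.3f}")

rho, _ = spearmanr([ratio(gold[p]) for p in products],
                   [ratio(pred[p]) for p in products])
print("rank correlation:", rho)
```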
I recruited you to speak at the 2016 Sentiment Analysis Symposium in New York, and pre-conference you'll be teaching a tutorial, Computing Sentiment, Emotion, and Personality. But rather than ask about your presentations: which two symposium speakers are you most looking forward to hearing?
Katharine Jarmul and Alyona Medelyan.
I'm looking forward to hearing them also. For the record: Katharine's talk is "How Machine Learning Changed Sentiment Analysis, or I Hate You Computer," and Alyona is presenting "7 NLP Must-Haves for Customer Feedback Analysis." I'll point readers to an article of Alyona's, "Three tips for getting started with Natural Language Understanding," and to my own "Interview with Pythonista Katharine Jarmul" on data wrangling and NLP.
And Jason, thanks for the interview.
(Disclosure: Revealed Context is a Sentiment Analysis Symposium sponsor.)