Exclusive Interview: Richard Socher, co-founder of etcML, Easy Text Classification

An exclusive interview with Richard Socher, co-founder of etcML, a new and free tool that helps users create text classifiers using machine learning.

by Ajay Ohri, March 31, 2014.

We recently covered etcML, an exciting startup in the field of machine learning and sentiment analysis. Here is an exclusive interview with Richard Socher, co-founder of etcML.com.

Ajay Ohri: There are many tools available for classification, like BigML.com and Google Prediction API, and many tools for sentiment analysis (including Viralheat). How does etcML.com intend to differentiate itself?

Richard Socher (RS): Most of all: ease of use. For analyzing what's trending on Twitter each day you just need to visit the front page. For analyzing your own tweets or a certain query you just type it into the front page. To train and test your own classifier you just copy and paste a text file into the browser. This is by far the easiest way to train a classifier and predict with existing classifiers. The fact that it's entirely free and doesn't even require signing up is an additional bonus.

AO: We noticed that the etcML.com interface lets you manually correct the classified tags/labels. Is it possible to incorporate these manual corrections as feedback to improve the classifier?

RS: Not yet but we will include that capability in the future.

AO: What were your basic interface design paradigms while designing the application?

RS: Get real user feedback. We iterated through several interfaces to make the final site as easy to use as possible.

AO: In what ways is your NaSent algorithm distinct from existing sentiment analysis methodologies?

RS: NaSent is quite different from etcML. NaSent is more accurate but works only for English sentences. It is based on some very novel recursive deep learning techniques that we developed at Stanford last year: nlp.stanford.edu/sentiment/.
etcML can classify longer documents of any standard character language (we're working on UTF-8 support right now to support Chinese and Japanese as well). It is very robust and fast.

AO: Can you share some usage stats or numbers to help us understand how strong the overall interest in etcML.com is?

RS: As of today, we have:
  • Classifiers (343)
  • Datasets (1849)
  • Twitter Searches (8998)
  • Daily Twitter Trends (1284)

This includes public and private datasets and classifiers. In the last 2 months we've had:
  • Visits: 26,247
  • Unique Visitors: 20,276
  • Pageviews: 32,704

AO: What are some areas where you can help enterprises with text mining?

RS: The answer depends on how I disambiguate "you" in this question :) etcML can help researchers, journalists, social scientists and companies quickly solve language classification problems. It will get most people 75% of the way to their solution. More broadly, my research and deep learning can improve and automate the handling of a lot of text problems in industry.

AO: Any plans to use model diagnostics in the application?

RS: Yes. One of my long term goals is to develop novel deep learning and machine learning algorithms and analyze their performance on a large variety of different text classification problems. Hopefully, we'll be able to improve all the trained classifiers on the site in the future.

AO: Would it be possible at a later stage to create a marketplace for classifiers (for example, selling or renting your private classifier for a fee) on your website?

RS: I like that idea and we've had similar thoughts. We'll see if this is something that users want and if so make it happen.

Richard Socher is a PhD student at Stanford working on natural language processing, machine learning and deep learning. etcML is a new and free tool that allows even novice users to use the power of machine learning and text classification.