
Exclusive Interview: Richard Socher, founder of etcML, Easy Text Classification

          



An exclusive interview with Richard Socher, co-founder of etcML, a new and free tool that helps users create text classifiers using machine learning.

by Ajay Ohri, March 31, 2014.

We recently covered etcML, an exciting startup in the field of machine learning and sentiment analysis. Here is an exclusive interview with Richard Socher, co-founder of etcML.com.

Ajay Ohri: There are many tools available for classification, like BigML.com and Google Prediction API, and many tools for sentiment analysis (including Viralheat). How does etcML.com intend to differentiate itself?

Richard Socher (RS): Most of all: ease of use. For analyzing what's trending on Twitter each day you just need to visit the front page. For analyzing your own tweets or a certain query you just type it into the front page. To train and test your own classifier you just copy and paste a text file into the browser. This is by far the easiest way to train a classifier and predict with existing classifiers. The fact that it's entirely free and doesn't even require signing up is an additional bonus.

AO: We noticed that the etcML.com interface lets you manually correct the classified tags/labels. Is it possible to incorporate this manual intervention as feedback for a better classifier?

RS: Not yet but we will include that capability in the future.

AO: What were your basic interface design paradigms while designing the application?

RS: Get real user feedback. We iterated through several interfaces to make the final site as easy to use as possible.

AO: In what ways is your NaSent algorithm distinct from existing sentiment analysis methodologies?

RS: NaSent is quite different from etcML. NaSent is more accurate but works only for English sentences. It is based on some very novel recursive deep learning techniques that we developed at Stanford last year: nlp.stanford.edu/sentiment/.
etcML can classify longer documents of any standard character language (we're working on UTF-8 support right now to support Chinese and Japanese as well). It is very robust and fast.

AO: Can you share some usage stats or numbers to help us gauge how strong the overall interest in etcML.com is?

RS: As of today, we have:
  • Classifiers (343)
  • Datasets (1849)
  • Twitter Searches (8998)
  • Daily Twitter Trends (1284)

This includes public and private datasets and classifiers. In the last 2 months we've had:
  • Visits: 26,247
  • Unique Visitors: 20,276
  • Pageviews: 32,704




AO: What are some areas where you can focus on helping enterprises with text mining?

RS: The answer depends on how I disambiguate "you" in this question :) etcML can help researchers, journalists, social scientists and companies quickly solve language classification problems. It will get most people 75% of the way to their solution. More broadly, my research and deep learning can improve and automate a lot of text problems in industry.

AO: Any plans to use model diagnostics in the application?

RS: Yes. One of my long term goals is to develop novel deep learning and machine learning algorithms and analyze their performance on a large variety of different text classification problems. Hopefully, we'll be able to improve all the trained classifiers on the site in the future.

AO: Would it be possible at a later stage to create a marketplace for classifiers (for example, selling or renting your private classifier for a fee) on your website?

RS: I like that idea and we've had similar thoughts. We'll see if this is something that users want and if so make it happen.

Richard Socher is a PhD student at Stanford working on natural language processing, machine learning and deep learning. etcML is a new and free tool that allows even novice users to use the power of machine learning and text classification.






