KDnuggets Home » News » 2014 » Mar » Software » etcML Promises to Make Text Classification Easy ( 14:n06 )

etcML Promises to Make Text Classification Easy

etcML is a new, free tool that lets even novice users apply the power of machine learning to text classification.

By Ajay Ohri, Mar 5, 2014.

etcML.com is a new website that brings the power of machine learning to text classification, even for users who are not proficient in either. It does so through an easy-to-use interface that lets a user upload data, create a classifier, and apply the classifier to make predictions. You can even use it to classify tweets for sentiment using a built-in classifier and integration with Twitter search.

The website is backed by a young team led by Richard Socher at Stanford, with the famed professor Andrew Ng, co-founder of Coursera, as faculty adviser. The website is free and currently has the status of a research project at Stanford. The same team was behind the creation of NASENT, an improved sentiment analysis algorithm.

Other websites also promise easy machine learning classification, including the Google Prediction API and bigml.com; however, etcML's ease of use makes it a promising candidate to watch. It could potentially widen the audience of direct end users of machine learning classification (such as marketing and product strategy teams) while creating a challenge for existing social media analysis tools like Radian6.

The basic process of text classification using etcML.com is as follows:

 

Step 1 - choose among

  • upload a dataset of text data, OR

  • choose an existing text dataset, OR

  • classify tweets (publicly available text data)

Step 2 - choose among

  • existing classifier models, OR

  • your own classifier models

Step 3 - predict labels for the dataset using the chosen classifier.

 

You can then download the dataset with both the input text and the output classification labels.
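For readers who want to see what the three steps amount to outside a web interface, here is a minimal sketch in plain Python. It is not etcML's method or API: it swaps in a simple multinomial Naive Bayes classifier with add-one smoothing, and the dataset, function names, and labels are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Step 2: build a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return {"labels": label_counts, "words": word_counts, "vocab": vocabulary}


def predict(model, text):
    """Step 3: return the most probable label for a single text."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    tokens = tokenize(text)
    best_label, best_score = None, float("-inf")
    for label, doc_count in model["labels"].items():
        # Start from the log prior of the label.
        score = math.log(doc_count / total_docs)
        total_words = sum(model["words"][label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            count = model["words"][label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Step 1: a tiny, made-up labeled dataset standing in for an uploaded file.
TRAINING_DATA = [
    ("I love this phone, it is great", "positive"),
    ("what a wonderful, happy day", "positive"),
    ("this is terrible and I hate it", "negative"),
    ("awful service, very bad experience", "negative"),
]

model = train(TRAINING_DATA)

# The "download": each input text paired with its predicted label.
new_texts = ["I love this wonderful day", "bad and terrible service"]
labeled_rows = [(text, predict(model, text)) for text in new_texts]
for text, label in labeled_rows:
    print(f"{label}\t{text}")
```

A real classifier would need far more training data than four sentences; the point is only to show how the upload, train, and predict stages fit together.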

An additional point: while the interface makes it very easy to explain classification (and is thus of some use to the academic world), it also allows you to create public and private datasets as well as public and private classifiers.

It could thus potentially give rise to a marketplace for both datasets and predictive models, both of which have been tried separately without resounding success.

 

For sentiment analysis of tweets, it shows the top positive, top negative, and top neutral tweets, a graph of when the tweets were posted, and the ability to change the classification manually.

 

We noticed some caveats. While the tool allows manual intervention to change the labels, there is no way to feed these corrections back automatically into the classifier that was originally used. This matters especially for the traditional text mining challenges of classifying sarcasm, slang, or even double negatives. Another drawback is the lack of model diagnostics, especially a confusion matrix or lift curve. Also needed, perhaps, is a paid version for tweets from a longer period (since Twitter sells the API data through resellers like Datasift). We noticed some API information and documentation, but better bindings, especially for the Python and R communities, can only enhance usage.
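Until such diagnostics are built in, a confusion matrix can be computed by hand from a downloaded file. The sketch below assumes you have two parallel lists of labels: the classifier's predictions and your manually corrected labels, treated here as ground truth. The label values and variable names are made up for illustration and do not come from etcML.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


# Hypothetical labels: manual corrections versus classifier output.
actual_labels = ["positive", "negative", "neutral",
                 "positive", "negative", "positive"]
predicted_labels = ["positive", "negative", "positive",
                    "positive", "neutral", "negative"]

matrix = confusion_matrix(actual_labels, predicted_labels)
labels = sorted(set(actual_labels) | set(predicted_labels))

# Rows are actual labels, columns are predicted labels.
print("actual \\ predicted", labels)
for actual in labels:
    print(actual, [matrix[(actual, predicted)] for predicted in labels])

# Accuracy is the share of items on the diagonal.
accuracy = sum(matrix[(label, label)] for label in labels) / len(actual_labels)
print("accuracy:", accuracy)
```

Off-diagonal cells show which classes the classifier confuses, for example neutral tweets being labeled positive.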

Overall, better interfaces are something that should come to the data science world, and etcML.com is a great effort to make this possible.






