KDnuggets Home » News » 2014 » Mar » Software » Paxata automates Data Preparation for Big Data Analytics ( 14:n06 )

Paxata automates Data Preparation for Big Data Analytics

          



Paxata wants to shorten and automate the data cleaning process by augmenting data from a huge number of sources and by using machine learning to detect statistical similarities in the imported data.

By Ajay Ohri, Mar 7, 2014.


Paxata.com intends to take the munging out of the data science process by shortening and semi-automating data cleaning. It does so by augmenting data from a huge number of sources as well as from its data enrichment library, and by using machine learning to detect statistical similarities in the imported data.

In this case, machine learning leverages text mining and association analysis along with graph analysis.

The solution runs on the Rackspace cloud. By applying its own algorithms to the data, Paxata creates a data model in the form of a graph, with associations among the data objects. These associations are then used to resolve data quality issues.
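Paxata's actual algorithms are not public, so the following is only a rough sketch of the general idea of a graph of associations. It assumes string similarity (from Python's standard `difflib`) as the association measure and a hypothetical threshold of 0.8. Values that are similar enough are linked, and each connected group is treated as one likely entity.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_graph(values, threshold=0.8):
    """Link every pair of values whose similarity meets the threshold.

    The threshold is an illustrative assumption, not a Paxata setting.
    """
    graph = {v: set() for v in values}
    for a, b in combinations(values, 2):
        if similarity(a, b) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Return connected components: groups of values that likely match."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups


# Made-up company names with inconsistent spelling and punctuation.
names = ["Acme Corp", "ACME Corp.", "acme corp", "Globex Inc", "Globex Inc."]
groups = clusters(build_graph(names))
for group in groups:
    print(sorted(group))
```

In this toy data the three Acme spellings end up in one group and the two Globex spellings in another. A data quality step could then replace every value in a group with a single canonical form.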

For example, if you import one dataset where the customer ID is named Customer_ID and another where it is named Account_Number, Paxata can prompt the user that the two are one and the same. This is very useful given the enormous share of project time that data scientists spend in the data preparation phase. The time saved can be used for better visualizations or higher-level analytics.
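One simple way to spot this kind of match is to compare the values in each column rather than the column names. The sketch below is not Paxata's method. It assumes Jaccard overlap of distinct values as the similarity measure, and the sample data, the `suggest_matches` helper, and the 0.5 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Share of distinct values that two columns have in common."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_matches(left, right, threshold=0.5):
    """Return (left_column, right_column, score) for likely equivalent columns.

    `left` and `right` map column names to lists of values.
    The threshold is an illustrative assumption.
    """
    suggestions = []
    for lname, lvals in left.items():
        for rname, rvals in right.items():
            score = jaccard(lvals, rvals)
            if score >= threshold:
                suggestions.append((lname, rname, score))
    # Strongest candidates first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Two made-up datasets that name the same identifier differently.
orders = {
    "Customer_ID": [101, 102, 103, 104],
    "Amount": [25.0, 40.0, 15.5, 60.0],
}
billing = {
    "Account_Number": [102, 103, 104, 105],
    "Balance": [0.0, 12.5, 99.0, 7.25],
}

matches = suggest_matches(orders, billing)
for left_col, right_col, score in matches:
    print(f"{left_col} may be the same as {right_col} (overlap {score:.2f})")
```

Here Customer_ID and Account_Number share three of five distinct values, so they are the only pair suggested. A real tool would likely combine several signals, such as data types and value distributions, and leave the final decision to the user.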

By preparing the data this way, Paxata delivers a single dataset ready for analysis in software such as Tableau, QlikView, Excel, and any ODBC-compliant tool. In addition, you can export the prepared data to Hadoop clusters. The founding team is led by CEO Prakash Nanduri, a seasoned MDM entrepreneur. With $10 million in funding from Accel Partners, this startup is making waves that will impact enterprise software for data science.

Pricing is $3,500 a year for Pax Personal and $10,000 for Pax Share. Cloud storage is 1 GB with Pax Personal and 5 GB with Pax Share, which seems a bit low. Pax Share also adds API access through the command line. The third tier is custom pricing for enterprises, based on assessed needs.

You can watch a demo here: http://www.youtube.com/watch?v=bf9zvCyRwdw


With successful use cases including better business analytics, fraud analysis, demand forecasting, and resource optimization, this solution can help many businesses struggling with a deluge of spreadsheets and data marts. I do hope that the Paxata team puts up a more hands-on demo (for example, one that lets you upload your own dummy data) to show the solution at work, as I think this would make automated data preparation more credible and easier to adopt. A trial period or demo license would help spread the word even further.

Automating data preparation has been the dream of many data scientists, and this space will only heat up given the huge amounts of data now being processed in the Big Data analytics era.

 

 

 








