Curated, Cleansed Datasets Can Make a World of Difference

dataX from CrowdAnalytix is a new open data sharing platform, with curated, cleansed datasets prepared as a result of 90+ projects executed for leading global enterprises.

By MOHAN SINGH (CrowdAnalytix).


Our intent.

To begin with, we want to solve the pain of data prep – searching, extracting, aggregating, cleansing and preparing datasets for analysis – not by promising another tool to magically do this but by creating a growing repository of curated, cleansed and readily analyzable features powered by you and other data scientists like you.

New ways of thinking.

dataX, our new platform, is born out of our belief in the power of crowdsourcing and its ability to help accelerate data science and add value to the quality of big data analysis.

To seed the platform, we have included the open datasets that we’ve prepared as a result of 90+ projects executed for leading global enterprises.


Datasets are organized by business use case, topic, or domain. Tags include retail analytics, HR analytics, financial indicators, healthcare, telecom, etc. Sample datasets include:

  • Smartphone Subscribers in Developing Countries
  • Churn Prediction in Telecom (Dataset used in
  • H1N1 Drug Discovery Patents dataset
  • Various financial and company indicators, such as copper spot prices and high-technology exports (USA)
  • Extreme weather indicators, and many more!

This is just the beginning.

We hope to start a movement – where access to curated, cleansed datasets potentially inspires new ways of collaborating on solutions – and we invite anyone interested to download the data for themselves and begin a dialogue on how we can work together.

dataX is still in beta. Your input will shape dataX going forward.

Early Access here.

Bio: MOHAN SINGH is a Data Scientist at CrowdANALYTIX.