- How to tackle common data cleaning issues in R - May 24, 2018.
R is a great choice for manipulating, cleaning, and summarizing data, producing probability statistics, and so on. In addition, it's not going away anytime soon; it is platform independent, so what you create will run almost anywhere; and it has awesome help resources.
Tags: Book, Data Cleaning, ebook, Packt Publishing, R
- 7 Useful Suggestions from Andrew Ng “Machine Learning Yearning” - May 8, 2018.
Machine Learning Yearning is a book by AI and Deep Learning guru Andrew Ng, focusing on how to make machine learning algorithms work and how to structure machine learning projects. Here we present 7 very useful suggestions from the book.
Tags: Andrew Ng, Book, Data Cleaning, Data Preparation, Free ebook, Machine Learning, Metrics
- The Dirty Little Secret Every Data Scientist Knows (but won’t admit) - Apr 26, 2018.
Most people don’t realize it, but the actual “fancy” machine learning algorithm is like the last mile of the marathon. There is so much that must be done before you get there!
Tags: Data Cleaning, Data Preparation, Data Science, Machine Learning
- A Primer on Web Scraping in R - Jan 12, 2018.
If you are a data scientist who wants to capture data from web pages, you wouldn’t want to open all these pages manually and scrape them one by one. To push back the boundaries that limit data scientists’ access to such data, there are packages available in R.
Tags: Data Cleaning, Data Curation, R, Web Scraping
- Cartoon: Future Machine Learning Class - Sep 2, 2017.
New KDnuggets Cartoon looks at an unusual but possible future Machine Learning Class.
Tags: Cartoon, Data Cleaning, Machine Learning
- Next Generation Data Manipulation with R and dplyr - Aug 31, 2017.
The idea behind the dplyr package is to do one thing at a time. dplyr has a separate function for each task, which makes its implementation crisp and easy to understand.
Tags: Data Cleaning, Data Exploration, R, R Packages
- The Ultimate Guide to Basic Data Cleaning - Aug 24, 2017.
Data cleaning can seem intimidating, but it’s not hard if you know the basic steps. That’s why we’re excited to announce our newest ebook, “The Ultimate Guide to Basic Data Cleaning”!
Tags: Data Cleaning, Data Preparation, ebook, Free ebook
- Tidying Data in Python - Jan 4, 2017.
This post summarizes some tidying examples Hadley Wickham used in his 2014 paper on Tidy Data in R, but demonstrates how to do so using the Python pandas library.
Tags: Data Cleaning, Data Preparation, Pandas, Python
- How to Choose a Data Format - Nov 3, 2016.
In any data analytics project, after the business understanding phase, data understanding and selection of the right data format and ETL tools is a very important task. This article explains a practical set of guidelines covering the data format selection and ETL phases of the project lifecycle.
Tags: Data Cleaning, Data Engineering, Data Preparation, ETL, Hadoop, HDFS
- How Can Lean Six Sigma Help Machine Learning? - Nov 1, 2016.
The data cleansing phase alone is not sufficient to ensure the accuracy of machine learning when noise or bias exists in the input data. Lean Six Sigma variance reduction can improve the accuracy of machine learning results.
Tags: Data Cleaning, Machine Learning, Predictive Analytics, Statistics
- Choosing Tools for Data ETLs - Aug 9, 2016.
Which tool should I use for my data pipelines? Get some advice from a data scientist who recently went through this pipeline tool selection process.
Tags: AirBnB, Data Cleaning, Data Preparation, ETL
- Getting Started with Data Science – R - Aug 3, 2016.
A great introductory post from DataRobot on getting started with data science in R, including cleaning data and performing predictive modeling.
Tags: Beginners, Data Cleaning, Data Science, Predictive Modeling, R
- Getting Started with Data Science – Python - Aug 1, 2016.
A great introductory post from DataRobot on getting started with data science in the Python ecosystem, including cleaning data and performing predictive modeling.
Tags: Beginners, Data Cleaning, Data Science, Predictive Modeling, Python
- Infinite Data Overlap Detection Arrives to Speed Business Insights - Jun 8, 2016.
Infinite Data Overlap Detection (IDOD) is a new, Spark-based technology that empowers non-technical business users to automatically discover data patterns and blend any data type for any set of values from multiple sources – both inside and outside the enterprise.
Tags: Apache Spark, ClearStory Data, Data Cleaning, Data Preparation
- Doing Data Science: A Kaggle Walkthrough Part 3 – Cleaning Data - Jun 3, 2016.
This is part three in a fantastic 6-part series covering the process of data science, and the application of the process to a Kaggle competition. In this episode, data cleaning and preparation are covered.
Tags: Data Cleaning, Data Preparation, Kaggle, Python
- 5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.
We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.
Tags: Data Cleaning, Deep Learning, Machine Learning, Open Source, Overlook, Pandas, Python, scikit-learn, Theano
- KDnuggets™ News 16:n16, May 4: How to Remove Duplicates from Large Data; Datasets over Algorithms; When Automation goes too far - May 4, 2016.
How to Remove Duplicates in Large Datasets; The Development of Classification as a Learning Machine; Datasets Over Algorithms; Cartoon: When Automation Goes Too Far, and more.
Tags: Algorithms, Angoss, Classification, Data Cleaning, Unbalanced
- How to Remove Duplicates in Large Datasets - Apr 27, 2016.
Dealing with huge datasets can be tricky, especially the data cleaning process. One such process is de-duplication; find out how you can solve it using statistical techniques.
Tags: CleverTap, Data Cleaning, Data Preparation
- Top KDnuggets tweets, Mar 22-29: If Hollywood Made Movies About MachineLearning; Data Scientist on Every @AirBNB Leadership Team - Mar 30, 2016.
If Hollywood Made Movies About Machine Learning; Why Airbnb Has a Data Scientist on Every Leadership Team; Very useful guide for Data Cleaning in Python; Data scientist Hilary Mason wants to show you the (near) future.
Tags: AirBnB, Data Cleaning, Hilary Mason, Movies, Python, Top tweets
- Doing Data Science: A Kaggle Walkthrough – Cleaning Data - Mar 23, 2016.
Gain insight into the process of cleaning data for a specific Kaggle competition, including a step-by-step overview.
Tags: Data Cleaning, Data Preparation, Kaggle, Pandas, Python
- Customer Study – Dealing with dirty, smelly, horrible data? - Nov 12, 2015.
If you have hands-on experience with data cleaning and data engineering, the Microsoft Data Platform group would love to hear about your challenges. This is for early influence on product development (not sales).
Tags: Data Cleaning, Microsoft
- Rich Data Summit Takeaways - Oct 19, 2015.
Data scientists get excited about algorithms. But nearly all time spent working with data involves acquiring, pipelining, annotating and cleaning it. At the Rich Data Summit in SF, data's dirty work took center stage.
Tags: CrowdFlower, Data Cleaning, Lukas Biewald, Nate Silver, Zachary Lipton
- Top KDnuggets tweets, Aug 04-10: Survival analysis in R – step by step guide - Aug 11, 2015.
Survival analysis in R - step by step guide; Neural Nets, AI and Deep Learning journey to acceptance; Data is Ugly - Tales of Data Cleaning; Apache Flink and the case for #stream processing #BigData #Analytics.
Tags: Data Cleaning, Flink, Neural Networks, R, Survival Analysis
- KDnuggets™ News 15:n25, Aug 5: Largest Dataset Analyzed? Big Data & the Dog Question; Impact of IoT - Aug 5, 2015.
New Poll: Largest Dataset Analyzed/Data Mined?; Cartoon: Big Data and the dog question; Impact of IoT on Big Data Landscape; Data is Ugly - Tales of Data Cleaning.
Tags: Best Practices, Cartoon, Data Cleaning, Dataset, IoT, Poll
- Data is Ugly – Tales of Data Cleaning - Aug 1, 2015.
Whether you want to do business analytics or build deep learning models, getting correct data and cleansing it appropriately remains the major task. Find out experts' opinions on how to make your data cleansing and collection efforts efficient.
Tags: Big Data, Data Cleaning, Data Preparation, Data-Driven Business
- ParseHub gives Data Scientists a better, faster way to collect data - Jul 10, 2015.
ParseHub enables data professionals to easily collect, structure, combine and manipulate data, and speed up the Data Science process.
Tags: Data Cleaning, Data Preparation, Glassdoor, ParseHub
- The Inconvenient Truth About Data Science - May 5, 2015.
Data is never clean, you will spend most of your time cleaning and preparing data, 95% of tasks do not require deep learning, and more inconvenient wisdom.
Tags: Advice, Data Cleaning, Data Science
- Automatic Statistician and the Profoundly Desired Automation for Data Science - Feb 17, 2015.
The Automatic Statistician project by Univ. of Cambridge and MIT is pushing ahead the frontiers of automation for the selection and evaluation of machine learning models. In general, what does automation mean to Data Science?
Tags: Automation, Cambridge, Data Cleaning, Data Science, Machine Learning, MIT, Modeling, Statistician