Data Cleaning (26)

Cleaner Data Analysis with Pandas Using Pipes - Jan 15, 2021.

Check out this practical guide on Pandas pipes.

Data Analysis, Data Cleaning, Pandas, Pipeline, Python
Data Cleaning and Wrangling in SQL - Jan 14, 2021.

SQL is a foundational skill for data analysts but its application is sometimes limited within the data pipeline. However, SQL can be successfully used for many pre-processing tasks, such as data cleaning and wrangling, as demonstrated here by example.

Data Cleaning, Data Preparation, SQL
Data Cleaning: The secret ingredient to the success of any Data Science Project - Jul 1, 2020.

With an uncleaned dataset, no matter what type of algorithm you try, you will never get accurate results. That is why data scientists spend a considerable amount of time on data cleaning.

Data Cleaning, Data Preparation, Data Science, Outliers, Python
The Essential Toolbox for Data Cleaning - Dec 5, 2019.

Increase your confidence to perform data cleaning with a broader perspective of what datasets typically look like, and follow this toolbox of code snippets to make your data cleaning process faster and more efficient.

Data Cleaning, Data Preparation
Data Cleaning and Preprocessing for Beginners - Nov 7, 2019.

Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.

Beginners, Data Cleaning, Data Preprocessing, Pandas, Python, Sciforce
5 Fundamental AI Principles - Oct 3, 2019.

While AI may appear magical at times, these five principles will help guide you to avoid pitfalls when leveraging this tech.

AI, Data Cleaning, Deployment, Training Data
Data Mapping Using Machine Learning - Sep 27, 2019.

Data mapping is a way to organize various bits of data into a manageable and easy-to-understand system.

Data Cleaning, Data Preparation, Machine Learning
6 bits of advice for Data Scientists - Sep 25, 2019.

As a data scientist, you can get lost in your daily dives into the data. Consider following this advice in your work to be more diligent and more impactful for your organization.

Advice, Data Cleaning, Data Scientist, Metrics, Overfitting, Statistics
Dealing with categorical features in machine learning - Jul 16, 2019.

Many machine learning algorithms require that their input is numerical and therefore categorical features must be transformed into numerical features before we can use any of these algorithms.

Data Cleaning, Data Preprocessing, Feature Engineering, Machine Learning, Python
Top R Packages for Data Cleaning - Mar 15, 2019.

Data cleaning is one of the most important and time-consuming tasks for data scientists. Here are the top R packages for data cleaning.

Data Cleaning, Data Preparation, Data Science, Machine Learning, R
Simple Yet Practical Data Cleaning Codes - Feb 26, 2019.

Real world data is messy and needs to be cleaned before it can be used for analysis. Industry experts say the data preprocessing step can easily take 70% to 80% of a data scientist's time on a project.

Data Cleaning, Data Preprocessing, Python
How to tackle common data cleaning issues in R - May 24, 2018.

R is a great choice for manipulating, cleaning, summarizing, producing probability statistics, and so on. In addition, it's not going away anytime soon; it is platform independent, so what you create will run almost anywhere; and it has awesome help resources.

Book, Data Cleaning, ebook, Packt Publishing, R
7 Useful Suggestions from Andrew Ng “Machine Learning Yearning” - May 8, 2018.

Machine Learning Yearning is a book by AI and Deep Learning guru Andrew Ng, focusing on how to make machine learning algorithms work and how to structure machine learning projects. Here we present 7 very useful suggestions from the book.

Andrew Ng, Book, Data Cleaning, Data Preparation, Free ebook, Machine Learning, Metrics
The Dirty Little Secret Every Data Scientist Knows (but won’t admit) - Apr 26, 2018.

Most people don’t realize, but the actual “fancy” machine learning algorithm is like the last mile of the marathon. There is so much that must be done before you get there!

Data Cleaning, Data Preparation, Data Science, Machine Learning
A Primer on Web Scraping in R - Jan 12, 2018.

If you are a data scientist who wants to capture data from web pages, you wouldn’t want to open all these pages manually and scrape them one by one. To remove the barriers that keep data scientists from accessing such data, there are packages available in R.

Data Cleaning, Data Curation, R, Web Scraping
Cartoon: Future Machine Learning Class - Sep 2, 2017.

New KDnuggets Cartoon looks at an unusual but possible future Machine Learning Class.

Cartoon, Data Cleaning, Machine Learning
Next Generation Data Manipulation with R and dplyr - Aug 31, 2017.

The idea behind the dplyr package is to do one thing at a time. dplyr has a separate function for every task, which makes its implementation crisp and easy to understand.

Data Cleaning, Data Exploration, R, R Packages
The Ultimate Guide to Basic Data Cleaning - Aug 24, 2017.

Data cleaning can seem intimidating, but it’s not hard if you know the basic steps. That’s why we’re excited to announce our newest ebook, “The Ultimate Guide to Basic Data Cleaning”!

Data Cleaning, Data Preparation, ebook, Free ebook
Tidying Data in Python - Jan 4, 2017.

This post summarizes some tidying examples Hadley Wickham used in his 2014 paper on Tidy Data in R, but will demonstrate how to do so using the Python pandas library.

Data Cleaning, Data Preparation, Pandas, Python
How Can Lean Six Sigma Help Machine Learning? - Nov 1, 2016.

The data cleansing phase alone is not sufficient to ensure the accuracy of machine learning when noise or bias exists in the input data. Lean Six Sigma variance reduction can improve the accuracy of machine learning results.

Data Cleaning, Machine Learning, Predictive Analytics, Statistics
5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.

We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.

Data Cleaning, Deep Learning, Machine Learning, Open Source, Overlook, Pandas, Python, scikit-learn, Theano
How to Remove Duplicates in Large Datasets - Apr 27, 2016.

Dealing with huge datasets can be tricky, especially the data cleaning process. One such process is de-duplication; find out how you can solve it using statistical techniques.

CleverTap, Data Cleaning, Data Preparation
Doing Data Science: A Kaggle Walkthrough – Cleaning Data - Mar 23, 2016.

Gain insight into the process of cleaning data for a specific Kaggle competition, including a step by step overview.

Data Cleaning, Data Preparation, Kaggle, Pandas, Python
Data is Ugly – Tales of Data Cleaning - Aug 1, 2015.

Whether you want to do business analytics or build deep learning models, getting correct data and cleansing it appropriately remains the major task. Find out experts' opinions on how you can make your data cleansing and collection efforts efficient.

Big Data, Data Cleaning, Data Preparation, Data-Driven Business
The Inconvenient Truth About Data Science - May 5, 2015.

Data is never clean, you will spend most of your time cleaning and preparing data, 95% of tasks do not require deep learning, and more inconvenient wisdom.

Advice, Data Cleaning, Data Science
Automatic Statistician and the Profoundly Desired Automation for Data Science - Feb 17, 2015.

The Automatic Statistician project by Univ. of Cambridge and MIT is pushing ahead the frontiers of automation for the selection and evaluation of machine learning models. In general, what does automation mean to Data Science?

Automation, Cambridge, Data Cleaning, Data Science, Machine Learning, MIT, Modeling, Statistician
