- Cleaner Data Analysis with Pandas Using Pipes - Jan 15, 2021.
Check out this practical guide on Pandas pipes.
- Data Cleaning and Wrangling in SQL - Jan 14, 2021.
SQL is a foundational skill for data analysts but its application is sometimes limited within the data pipeline. However, SQL can be successfully used for many pre-processing tasks, such as data cleaning and wrangling, as demonstrated here by example.
- Top KDnuggets tweets, Sep 23-29: An Introduction to #AI – updated for 2020; Master using Pandas for time series analysis - Sep 30, 2020.
An Introduction to #AI - updated for 2020; Free From MIT: Intro to Computer Science and Programming in Python; The Most Complete Guide to #PyTorch for Data Scientists; (Good) Data Cleaning is just reusable Data Transformations
- Data Cleaning: The secret ingredient to the success of any Data Science Project - Jul 1, 2020.
With an uncleaned dataset, no matter what type of algorithm you try, you will never get accurate results. That is why data scientists spend a considerable amount of time on data cleaning.
- KDnuggets™ News 19:n47, Dec 11: 10 Free Top Notch Machine Learning Courses; AI, Analytics, ML, DS Main Developments and Key Trends - Dec 11, 2019.
We asked top experts: What were the main developments in AI, Data Science, Deep Learning, and Machine Learning Research in 2019, and what key trends do you expect in 2020? Read their answers, and also check 10 Free Top Notch Machine Learning Courses; 4 Hottest Trends in Data Science; The Essential Toolbox for Data Cleaning, and more
- The Essential Toolbox for Data Cleaning - Dec 5, 2019.
Increase your confidence to perform data cleaning with a broader perspective of what datasets typically look like, and follow this toolbox of code snippets to make your data cleaning process faster and more efficient.
- Data Cleaning and Preprocessing for Beginners - Nov 7, 2019.
Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.
- 5 Fundamental AI Principles - Oct 3, 2019.
While AI may appear magical at times, these five principles will help guide you to avoid pitfalls when leveraging this tech.
- Data Mapping Using Machine Learning - Sep 27, 2019.
Data mapping is a way to organize various bits of data into a manageable and easy-to-understand system.
- 6 bits of advice for Data Scientists - Sep 25, 2019.
As a data scientist, you can get lost in your daily dives into the data. Consider following this advice in your work to be more diligent and more impactful for your organization.
- Dealing with categorical features in machine learning - Jul 16, 2019.
Many machine learning algorithms require that their input is numerical and therefore categorical features must be transformed into numerical features before we can use any of these algorithms.
- How to select rows and columns in Pandas using [ ], .loc, iloc, .at and .iat - Jun 19, 2019.
Subset selection is one of the most frequently performed tasks while manipulating data. Pandas provides different ways to efficiently select subsets of data from your DataFrame.
- Top R Packages for Data Cleaning - Mar 15, 2019.
Data cleaning is one of the most important and time-consuming tasks for data scientists. Here are the top R packages for data cleaning.
- Simple Yet Practical Data Cleaning Codes - Feb 26, 2019.
Real world data is messy and needs to be cleaned before it can be used for analysis. Industry experts say the data preprocessing step can easily take 70% to 80% of a data scientist's time on a project.
- How to tackle common data cleaning issues in R - May 24, 2018.
R is a great choice for manipulating, cleaning, and summarizing data, producing probability statistics, and so on. In addition, it's not going away anytime soon; it is platform independent, so what you create will run almost anywhere; and it has awesome help resources.
- 7 Useful Suggestions from Andrew Ng “Machine Learning Yearning” - May 8, 2018.
Machine Learning Yearning is a book by AI and Deep Learning guru Andrew Ng, focusing on how to make machine learning algorithms work and how to structure machine learning projects. Here we present 7 very useful suggestions from the book.
- The Dirty Little Secret Every Data Scientist Knows (but won’t admit) - Apr 26, 2018.
Most people don’t realize, but the actual “fancy” machine learning algorithm is like the last mile of the marathon. There is so much that must be done before you get there!
- A Primer on Web Scraping in R - Jan 12, 2018.
If you are a data scientist who wants to capture data from such web pages then you wouldn’t want to be the one to open all these pages manually and scrape the web pages one by one. To push away the boundaries limiting data scientists from accessing such data from web pages, there are packages available in R.
- Cartoon: Future Machine Learning Class - Sep 2, 2017.
New KDnuggets Cartoon looks at an unusual but possible future Machine Learning Class.
- Next Generation Data Manipulation with R and dplyr - Aug 31, 2017.
The idea behind the dplyr package is to do one thing at a time. dplyr has separate functions for every task which make its implementation crisp and easy to understand.
- The Ultimate Guide to Basic Data Cleaning - Aug 24, 2017.
Data cleaning can seem intimidating, but it’s not hard if you know the basic steps. That’s why we’re excited to announce our newest ebook, “The Ultimate Guide to Basic Data Cleaning”!
- Tidying Data in Python - Jan 4, 2017.
This post summarizes some tidying examples Hadley Wickham used in his 2014 paper on Tidy Data in R, but will demonstrate how to do so using the Python pandas library.
- How to Choose a Data Format - Nov 3, 2016.
In any data analytics project, after the business understanding phase, understanding the data and selecting the right data format and ETL tools are very important tasks. This article explains a useful and practical set of guidelines covering the data format selection and ETL phases of the project lifecycle.
- How Can Lean Six Sigma Help Machine Learning? - Nov 1, 2016.
The data cleansing phase alone is not sufficient to ensure the accuracy of machine learning when noise or bias exists in the input data. Lean Six Sigma variance reduction can improve the accuracy of machine learning results.
- Choosing Tools for Data ETLs - Aug 9, 2016.
Which tool should I use for my data pipelines? Get some advice from a data scientist who recently went through this pipeline tool selection process.
- Getting Started with Data Science – R - Aug 3, 2016.
A great introductory post from DataRobot on getting started with data science in R, including cleaning data and performing predictive modeling.
- Getting Started with Data Science – Python - Aug 1, 2016.
A great introductory post from DataRobot on getting started with data science in the Python ecosystem, including cleaning data and performing predictive modeling.
- Infinite Data Overlap Detection Arrives to Speed Business Insights - Jun 8, 2016.
Infinite Data Overlap Detection (IDOD) is a new, Spark-based technology that empowers non-technical business users to automatically discover data patterns and blend any data type for any set of values from multiple sources – both inside and outside the enterprise.
- Doing Data Science: A Kaggle Walkthrough Part 3 – Cleaning Data - Jun 3, 2016.
This is part three in a fantastic 6-part series covering the process of data science and its application to a Kaggle competition. In this episode, data cleaning and preparation are covered.
- 5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.
We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.
- KDnuggets™ News 16:n16, May 4: How to Remove Duplicates from Large Data; Datasets over Algorithms; When Automation goes too far - May 4, 2016.
How to Remove Duplicates in Large Datasets; The Development of Classification as a Learning Machine; Datasets Over Algorithms; Cartoon: When Automation Goes Too Far, and more.
- How to Remove Duplicates in Large Datasets - Apr 27, 2016.
Dealing with huge datasets can be tricky, especially during the data cleaning process. One such task is de-duplication; find out how you can solve it using statistical techniques.
- Top KDnuggets tweets, Mar 22-29: If Hollywood Made Movies About MachineLearning; Data Scientist on Every @AirBNB Leadership Team - Mar 30, 2016.
If Hollywood Made Movies About Machine Learning; Why Airbnb Has a Data Scientist on Every Leadership Team; Very useful guide for Data Cleaning in Python; Data scientist Hilary Mason wants to show you the (near) future.
- Doing Data Science: A Kaggle Walkthrough – Cleaning Data - Mar 23, 2016.
Gain insight into the process of cleaning data for a specific Kaggle competition, including a step by step overview.
- Customer Study – Dealing with dirty, smelly, horrible data? - Nov 12, 2015.
If you have hands-on experience with data cleaning and data engineering, the Microsoft Data Platform group would love to hear about your challenges. This is for early influence on product development (not sales).
- Rich Data Summit Takeaways - Oct 19, 2015.
Data scientists get excited about algorithms. But nearly all time spent working with data involves acquiring, pipelining, annotating and cleaning it. At the Rich Data Summit in SF, data's dirty work took center stage.
- Top KDnuggets tweets, Aug 04-10: Survival analysis in R – step by step guide - Aug 11, 2015.
Survival analysis in R - step by step guide; Neural Nets, AI and Deep Learning journey to acceptance; Data is Ugly - Tales of Data Cleaning; Apache Flink and the case for #stream processing #BigData #Analytics.
- KDnuggets™ News 15:n25, Aug 5: Largest Dataset Analyzed? Big Data & the Dog Question; Impact of IoT - Aug 5, 2015.
New Poll: Largest Dataset Analyzed/Data Mined?; Cartoon: Big Data and the dog question; Impact of IoT on Big Data Landscape; Data is Ugly - Tales of Data Cleaning.
- Data is Ugly – Tales of Data Cleaning - Aug 1, 2015.
Whether you want to do business analytics or build deep learning models, getting correct data and cleansing it appropriately remains the major task. Find out experts' opinions on how you can make your data cleansing and collection efforts more efficient.
- ParseHub gives Data Scientists a better, faster way to collect data - Jul 10, 2015.
ParseHub enables data professionals to easily collect, structure, combine and manipulate data, and speed up the Data Science process.
- The Inconvenient Truth About Data Science - May 5, 2015.
Data is never clean, you will spend most of your time cleaning and preparing data, 95% of tasks do not require deep learning, and more inconvenient wisdom.
- Automatic Statistician and the Profoundly Desired Automation for Data Science - Feb 17, 2015.
The Automatic Statistician project by Univ. of Cambridge and MIT is pushing ahead the frontiers of automation for the selection and evaluation of machine learning models. In general, what does automation mean to Data Science?