Data Preprocessing (53)

Feature Selection: Where Science Meets Art - Dec 14, 2021.

From heuristic to algorithmic feature selection techniques for data science projects.

Data Preprocessing, Feature Selection, Machine Learning, Statistics
Using Datawig, an AWS Deep Learning Library for Missing Value Imputation - Dec 7, 2021.

A large number of missing values in a dataset can degrade prediction quality. Several methods can be used to fill in the missing values, and Datawig is one of the most efficient ones.

AWS, Data Preparation, Data Preprocessing, Deep Learning, Missing Values
Design Patterns for Machine Learning Pipelines - Nov 2, 2021.

ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.

Data Preprocessing, ETL, Machine Learning, Pipeline
Four Basic Steps in Data Preparation - Oct 26, 2021.

What we would like to do here is introduce four very basic and very general steps in data preparation for machine learning algorithms. We will describe how and why to apply such transformations within a specific example.

Data Preparation, Data Preprocessing, Data Science, Missing Values, Normalization, Sampling
Text Preprocessing Methods for Deep Learning - Sep 10, 2021.

While the preprocessing pipeline we are focusing on in this post is mainly centered around Deep Learning, most of it will also be applicable to conventional machine learning models.

Data Preprocessing, Data Processing, Deep Learning, NLP, Text Analytics
How to Query Your Pandas Dataframe - Aug 9, 2021.

A Data Scientist’s perspective on SQL-like Python functions.

Data Preprocessing, Data Processing, Pandas, Python, SQL
Date Processing and Feature Engineering in Python - Jul 15, 2021.

Have a look at some code to streamline the parsing and processing of dates in Python, including the engineering of some useful and common features.

Beginners, Data Preprocessing, Data Processing, Feature Engineering, Python, Time Series
5 Python Data Processing Tips & Code Snippets - Jul 9, 2021.

This is a small collection of Python code snippets that a beginner might find useful for data processing.

Data Preprocessing, Data Processing, Pandas, Programming, Python
Vaex: Pandas but 1000x faster - May 17, 2021.

If you are working with big data, especially on your local machine, then learning the basics of Vaex, a Python library that enables the fast processing of large datasets, will provide you with a productive alternative to Pandas.

Big Data, Data Preprocessing, Pandas, Scalability, Vaex
Data Science 101: Normalization, Standardization, and Regularization - Apr 20, 2021.

Normalization, standardization, and regularization all sound similar. However, each plays a unique role in your data preparation and model building process, so you must know when and how to use these important procedures.

Data Preprocessing, Feature Engineering, Normalization, Regression, Regularization, Statistics
Getting Started with 5 Essential Natural Language Processing Libraries - Feb 3, 2021.

This article is an overview of how to get started with 5 popular Python NLP libraries, from those for linguistic data visualization, to data preprocessing, to multi-task functionality, to state of the art language modeling, and beyond.

Data Preparation, Data Preprocessing, Data Visualization, Hugging Face, NLP, Python, spaCy, Text Analytics, Transformer
How to Clean Text Data at the Command Line - Dec 16, 2020.

A basic tutorial about cleaning data using command-line tools: tr, grep, sort, uniq, awk, sed, and csvlook.

Data Preprocessing, Data Processing, NLP, Text Analytics
Merging Pandas DataFrames in Python - Dec 8, 2020.

A quick how-to guide for merging Pandas DataFrames in Python.

Data Preparation, Data Preprocessing, Data Processing, Pandas, Python
Roadmap to Computer Vision - Oct 26, 2020.

Read this introduction to the main steps that compose a computer vision system, starting from how images are pre-processed, features are extracted, and predictions are made.

Computer Vision, Convolutional Neural Networks, Data Preprocessing, Neural Networks, Roadmap
Roadmap to Natural Language Processing (NLP) - Oct 19, 2020.

Check out this introduction to some of the most common techniques and models used in Natural Language Processing (NLP).

Data Preprocessing, LDA, NLP, Python, Roadmap, Sentiment Analysis, Transformer, Word Embeddings
Missing Value Imputation – A Review - Sep 29, 2020.

Detecting and handling missing values in the correct way is important, as they can impact the results of the analysis, and there are algorithms that can’t handle them. So what is the correct way?

Data Preprocessing, Knime, Machine Learning, Missing Values
Data Science Tools Illustrated Study Guides - Aug 25, 2020.

These data science tools illustrated guides are broken up into four distinct categories: data retrieval, data manipulation, data visualization, and engineering tips. Both online and PDF versions of these guides are available.

Cheat Sheet, Data Preprocessing, Data Processing, Data Science, Data Science Tools, Data Visualization, Python, R, SQL
KDnuggets™ News 20:n29, Jul 29: Easy Guide To Data Preprocessing In Python; Building a better Spark UI; Computational Algebra for Coders: The Free Course - Jul 29, 2020.

An easy guide to data pre-processing in Python; Monitoring Apache Spark with a better Spark UI; Computational Linear Algebra for Coders: the free course; Labelling data with Snorkel; Bayesian Statistics.

Apache Spark, Bayesian, Data Preprocessing, Linear Algebra, Python
How to Prepare Your Data - Jun 30, 2020.

This is an overview of structuring, cleaning, and enriching raw data.

Data Preparation, Data Preprocessing, Dimensionality Reduction, Missing Values, Outliers
How to Deal with Missing Values in Your Dataset - Jun 22, 2020.

In this article, we are going to talk about how to identify and treat the missing values in the data step by step.

Data Preparation, Data Preprocessing, Missing Values, Python
Simplified Mixed Feature Type Preprocessing in Scikit-Learn with Pipelines - Jun 16, 2020.

There is a quick and easy way to perform preprocessing on mixed feature type data in Scikit-Learn, which can be integrated into your machine learning pipelines.

Data Preprocessing, Pipeline, Python, scikit-learn
5 Essential Papers on AI Training Data - Jun 4, 2020.

Data pre-processing is not only the largest time sink for most Data Scientists, but it is also the most crucial aspect of the work. Learn more about training data and data processing tasks from 5 leading academic papers.

AI, Data Preparation, Data Preprocessing, Research, Training Data
Tokenization and Text Data Preparation with TensorFlow & Keras - Mar 6, 2020.

This article will look at tokenizing and further preparing text data for feeding into a neural network using TensorFlow and Keras preprocessing tools.

Data Preprocessing, Keras, NLP, Python, TensorFlow, Text Analytics, Tokenization
Audio Data Analysis Using Deep Learning with Python (Part 2) - Feb 25, 2020.

This is a followup to the first article in this series. Once you are comfortable with the concepts explained in that article, you can come back and continue with this.

Audio, Data Preprocessing, Deep Learning, Python
Easy Image Dataset Augmentation with TensorFlow - Feb 13, 2020.

What can we do when we don't have a substantial amount of varied training data? This is a quick intro to using data augmentation in TensorFlow to perform in-memory image transformations during model training to help overcome this data impediment.

Data Preprocessing, Image Processing, Image Recognition, Python, TensorFlow
Build Pipelines with Pandas Using pdpipe - Dec 13, 2019.

We show how to build intuitive and useful pipelines with Pandas DataFrame using a wonderful little library called pdpipe.

Data Preparation, Data Preprocessing, Pandas, Pipeline, Python
5 Great New Features in Latest Scikit-learn Release - Dec 10, 2019.

From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.

Data Preparation, Data Preprocessing, Ensemble Methods, Feature Selection, Gradient Boosting, K-nearest neighbors, Machine Learning, Missing Values, Python, scikit-learn, Visualization
Text Encoding: A Review - Nov 22, 2019.

We will focus here exactly on that part of the analysis that transforms words into numbers and texts into number vectors: text encoding.

Data Preprocessing, NLP, Representation, Rosaria Silipo, Text Analytics, Word Embeddings
How to Speed up Pandas by 4x with one line of code - Nov 12, 2019.

While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about the new library, Modin, developed to distribute Pandas' computation to speed up your data prep.

Data Preparation, Data Preprocessing, Modin, Pandas, Python
Data Cleaning and Preprocessing for Beginners - Nov 7, 2019.

Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.

Beginners, Data Cleaning, Data Preprocessing, Pandas, Python, Sciforce
How to Create a Vocabulary for NLP Tasks in Python - Nov 7, 2019.

This post will walk through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.

Data Preparation, Data Preprocessing, NLP, Python
Know Your Data: Part 2 - Oct 8, 2019.

To build an effective learning model, it is essential to understand the quality issues that exist in data and how to detect and deal with them. In general, data quality issues fall into four major categories.

Beginners, Data Preparation, Data Preprocessing, Datasets
4 Tips for Advanced Feature Engineering and Preprocessing - Aug 29, 2019.

Techniques for creating new features, detecting outliers, handling imbalanced data, and imputing missing values.

Data Preprocessing, Feature Engineering, Python, Tips
Dealing with categorical features in machine learning - Jul 16, 2019.

Many machine learning algorithms require that their input is numerical and therefore categorical features must be transformed into numerical features before we can use any of these algorithms.

Data Cleaning, Data Preprocessing, Feature Engineering, Machine Learning, Python
7 Steps to Mastering Data Preparation for Machine Learning with Python — 2019 Edition - Jun 24, 2019.

Interested in mastering data preparation with Python? Follow these 7 steps which cover the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.

7 Steps, Data Preparation, Data Preprocessing, Data Science, Data Wrangling, Machine Learning, Pandas, Python
All you need to know about text preprocessing for NLP and Machine Learning - Apr 9, 2019.

We present a comprehensive introduction to text preprocessing, covering different techniques including stemming, lemmatization, noise removal, and normalization, with examples and explanations of when you should use each of them.

Data Preprocessing, Machine Learning, NLP, Python, Text Analysis, Text Mining
Simple Yet Practical Data Cleaning Codes - Feb 26, 2019.

Real world data is messy and needs to be cleaned before it can be used for analysis. Industry experts say the data preprocessing step can easily take 70% to 80% of a data scientist's time on a project.

Data Cleaning, Data Preprocessing, Python
Notes on Feature Preprocessing: The What, the Why, and the How - Oct 26, 2018.

This article covers a few important points related to the preprocessing of numeric data, focusing on the scaling of feature values, and the broad question of dealing with outliers.

Data Preparation, Data Preprocessing, numpy, Python, scikit-learn, SciPy
Get a 2–6x Speed-up on Your Data Pre-processing with Python - Oct 23, 2018.

Get a 2–6x speed-up on your pre-processing with these 3 lines of code!

Data Preprocessing, Efficiency, Programming, Python
Preprocessing for Deep Learning: From covariance matrix to image whitening - Oct 10, 2018.

The goal of this post/notebook is to go from the basics of data preprocessing to modern techniques used in deep learning. My point is that we can use code (Python/Numpy etc.) to better understand abstract mathematical notions!

Data Preprocessing, Deep Learning, Image Processing, Mathematics
Financial Data Analysis – Data Processing 1: Loan Eligibility Prediction - Sep 4, 2018.

In this first part I show how to clean and remove unnecessary features. Data processing is very time-consuming, but better data would produce a better model.

Data Preprocessing, Data Processing, Finance, Python
Text Wrangling & Pre-processing: A Practitioner’s Guide to NLP - Aug 3, 2018.

I will highlight some of the most important steps that are used heavily in Natural Language Processing (NLP) pipelines and that I frequently use in my NLP projects.

Data Preprocessing, Data Wrangling, NLP, Text Analytics, Workflow
Data Retrieval with Web Scraping: A Practitioner’s Guide to NLP - Jul 26, 2018.

Proven and tested hands-on strategies to tackle NLP tasks.

Data Preprocessing, NLP, Text Analytics, Workflow
Text Mining on the Command Line - Jul 13, 2018.

In this tutorial, I use raw bash commands and regex to process a raw, messy JSON file and a raw HTML page. The tutorial helps us understand the text processing mechanism under the hood.

Data Preparation, Data Preprocessing, NLP, Text Mining
Getting Started with spaCy for Natural Language Processing - May 2, 2018.

spaCy is a Python natural language processing library specifically designed with the goal of being a useful library for implementing production-ready systems. It is particularly fast and intuitive, making it a top contender for NLP tasks.

Data Preparation, Data Preprocessing, NLP, Python, Text Analytics, Text Mining
Text Data Preprocessing: A Walkthrough in Python - Mar 26, 2018.

This post will serve as a practical walkthrough of a text data preprocessing task using some common Python tools.

Data Preparation, Data Preprocessing, NLP, Python, Text Analytics, Text Mining
Managing Machine Learning Workflows with Scikit-learn Pipelines Part 3: Multiple Models, Pipelines, and Grid Searches - Jan 24, 2018.

In this post, we will use grid search to optimize models built from a number of different types of estimators, and then compare the models and properly evaluate the best hyperparameters that each has to offer.

Data Preprocessing, Hyperparameter, Optimization, Pipeline, Python, scikit-learn, Workflow
Managing Machine Learning Workflows with Scikit-learn Pipelines Part 2: Integrating Grid Search - Jan 19, 2018.

Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which attempts to optimize model hyperparameter combinations.

Data Preprocessing, Hyperparameter, Optimization, Pipeline, Python, scikit-learn, Workflow
Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction - Dec 7, 2017.

Scikit-learn's Pipeline class is designed as a manageable way to apply a series of data transformations followed by the application of an estimator.

Data Preprocessing, Pipeline, Python, scikit-learn, Workflow
A General Approach to Preprocessing Text Data - Dec 1, 2017.

Recently we had a look at a framework for textual data science tasks in their totality. Now we focus on putting together a generalized approach to attacking text data preprocessing, regardless of the specific textual data science task you have in mind.

Data Preparation, Data Preprocessing, NLP, Text Analytics, Text Mining, Tokenization
7 Steps to Mastering Data Preparation with Python - Jun 2, 2017.

Follow these 7 steps for mastering data preparation, covering the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.

7 Steps, Data Preparation, Data Preprocessing, Data Science, Data Wrangling, Machine Learning, Pandas, Python
Introduction to Natural Language Processing, Part 1: Lexical Units - Feb 16, 2017.

This series explores core concepts of natural language processing, starting with an introduction to the field and explaining how to identify lexical units as a part of data preprocessing.

Data Preprocessing, Datascience.com, Feature Extraction, Natural Language Processing, NLP, Tokenization
5 More Machine Learning Projects You Can No Longer Overlook - Jun 28, 2016.

There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects.

Computer Vision, Data Preparation, Data Preprocessing, Javascript, Machine Learning, Natural Language Processing, NLP, Overlook, Python
