Matthew Mayo (@mattmayo13) holds a master's degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.
This article covers a few important points related to the preprocessing of numeric data, focusing on the scaling of feature values, and the broad question of dealing with outliers.
What if you want to implement an automated machine learning pipeline of your very own, or automate particular aspects of a machine learning pipeline? Rest assured that there is no need to reinvent any wheels.
Let's have a look at the main approaches to NLP tasks that we have at our disposal. We will then have a look at the concrete NLP tasks we can tackle with said approaches.
Cognitive biases are tendencies to think in certain ways that can lead to systematic deviations from a standard of rationality or good judgment. They have all sorts of practical impacts on our lives, whether we want to admit it or not.
Thinking about ways to find a better set of initial centroid positions is a valid approach to optimizing the k-means clustering process. This post outlines just such an approach.
Check out this collection of machine learning concept cheat sheets based on Stanord CS 229 material, including supervised and unsupervised learning, neural networks, tips & tricks, probability & stats, and algebra & calculus.
This article presents 5 things to know about A/B testing, from appropriate sample sizes, to statistical confidence, to A/B testing usefulness, and more.
Check out this new data science cheat sheet, a relatively broad undertaking at a novice depth of understanding, which concisely packs a wide array of diverse data science goodness into a 9 page treatment.
This is an overview of some basic functionality of the MXNet ndarray package for creating tensor-like objects, and using the autograd package for performing automatic differentiation.