-
Email Spam Filtering: An Implementation with Python and Scikit-learn
This post is an overview of a spam filtering implementation using Python and Scikit-learn. It compares the results of two classifiers: multinomial Naive Bayes and support vector machines.
-
7 Types of Data Scientist Job Profiles
There is no single profile for the Data Scientist, but I tried to put together a few generic job profiles that roughly fit the job descriptions used by different companies. The variety is enormous, so I had to narrow it down to a set of profiles. Check out the list.
-
Open Source Toolkits for Speech Recognition
This article reviews the main options for free speech recognition toolkits that use traditional Hidden Markov Models and n-gram language models.
-
Working With Numpy Matrices: A Handy First Reference
This introductory tutorial does a great job of outlining the most common NumPy array creation and manipulation functionality. A good post to keep nearby while taking your first steps in NumPy, or to use as a quick reminder.
-
Visualizing Time-Series Change
When creating time-series line charts, it’s important to consider which of the following messages you would like to communicate: Actual value of units? Change in absolute units? Percent change? Change from a specific point in time?
-
Beginner’s Guide to Customer Segmentation
At the core of customer segmentation is being able to identify different types of customers and then figure out ways to find more of those individuals so you can... you guessed it, get more customers!
-
Building Regression Models in R using Support Vector Regression
The article studies the advantage of Support Vector Regression (SVR) over Simple Linear Regression (SLR) models for predicting real values. SVR applies the same basic idea that Support Vector Machines (SVM) use for classification.
-
K-Means & Other Clustering Algorithms: A Quick Intro with Python
In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.
-
A Simple XGBoost Tutorial Using the Iris Dataset
This is an overview of the XGBoost machine learning algorithm, which is fast and produces good results. The example performs multiclass prediction on the Iris dataset from Scikit-learn.
-
Software Engineering vs Machine Learning Concepts
Not all core concepts from software engineering translate into the machine learning universe. Here are some differences I've noticed.