- Is Learning Rate Useful in Artificial Neural Networks? - Jan 15, 2018.
This article will help you understand why we need the learning rate and whether it is useful for training an artificial neural network. Using very simple Python code for a single-layer perceptron, the learning rate value is varied to illustrate its effect.
Hyperparameter, Neural Networks, Python
- Simple Ways Of Working With Medium To Big Data Locally - Dec 27, 2017.
An overview of the installation and implementation of simple techniques for working with large datasets on your machine.
Big Data, iPhone, Python, R, SAS
- Getting Started with TensorFlow: A Machine Learning Tutorial - Dec 19, 2017.
A complete and rigorous introduction to TensorFlow. Code along with this tutorial to get started with hands-on examples.
Machine Learning, Python, TensorFlow
- Accelerating Algorithms: Considerations in Design, Algorithm Choice and Implementation - Dec 18, 2017.
If you are trying to make your algorithms run faster, you may want to consider reviewing some important points on design and implementation.
ActiveState, Algorithms, Implementation, Python
- Building an Audio Classifier using Deep Neural Networks - Dec 15, 2017.
How to use a deep convolutional neural network architecture to classify audio, and how to effectively use transfer learning and data augmentation to improve model accuracy on small datasets.
Acoustics, Audio, Deep Learning, Python, Speech, Speech Recognition, Transfer Learning
- How to Generate FiveThirtyEight Graphs in Python - Dec 14, 2017.
In this post, we'll help you. Using Python's matplotlib and pandas, we'll see that it's rather easy to replicate the core parts of any FiveThirtyEight (FTE) visualization.
Data Visualization, Dataquest, FiveThirtyEight, Python
- TensorFlow for Short-Term Stocks Prediction - Dec 12, 2017.
In this post you will see an application of Convolutional Neural Networks to stock market prediction, using a combination of stock prices with sentiment analysis.
Convolutional Neural Networks, Finance, Python, Stocks, TensorFlow
- Today I Built a Neural Network During My Lunch Break with Keras - Dec 8, 2017.
So yesterday someone told me you can build a (deep) neural network in 15 minutes in Keras. Of course, I didn’t believe that at all. So the next day I set out to play with Keras on my own data.
Keras, Neural Networks, Python
- Managing Machine Learning Workflows with Scikit-learn Pipelines Part 1: A Gentle Introduction - Dec 7, 2017.
Scikit-learn's Pipeline class is designed as a manageable way to apply a series of data transformations followed by the application of an estimator.
Data Preprocessing, Pipeline, Python, scikit-learn, Workflow
- Web Scraping for Data Science with Python - Dec 6, 2017.
We take a quick look at how web scraping can be useful in the context of data science projects, e.g., to construct a social graph based on S&P 500 companies, using Python and Gephi.
Bart Baesens, Data Science, Python, S&P 500, Web Mining, Web Scraping
- Exploring Recurrent Neural Networks - Dec 1, 2017.
We explore recurrent neural networks, starting with the basics, using a motivating weather modeling problem, and implement and train an RNN in TensorFlow.
Neural Networks, Packt Publishing, Python, Recurrent Neural Networks, TensorFlow
- Why You Should Forget ‘for-loop’ for Data Science Code and Embrace Vectorization - Nov 29, 2017.
Data science needs fast computation and transformation of data. NumPy objects in Python provide that advantage over regular programming constructs like the for-loop. How can we demonstrate it in a few easy lines of code?
numpy, Python, Scientific Computing
- How To Unit Test Machine Learning Code - Nov 28, 2017.
One of the main principles I learned during my time at Google Brain was that unit tests can make or break your algorithm and can save you weeks of debugging and training time.
Machine Learning, Neural Networks, Python, Software Engineering, TensorFlow
- Taming the Python Visualization Jungle, Nov 29 Webinar - Nov 22, 2017.
Python has a ton of plotting libraries—but which ones should you use? And how should you go about choosing them? This webinar shows you key starting points and demonstrates how to solve a range of common problems.
Anaconda, Data Visualization, Python
- Top 10 Videos on Deep Learning in Python - Nov 17, 2017.
Playlists, individual tutorials (not part of a playlist) and online courses on Deep Learning (DL) in Python using the Keras, Theano, TensorFlow and PyTorch libraries. Assumes no prior knowledge. These videos cover all skill levels and time constraints!
Deep Learning, Keras, Python, PyTorch, TensorFlow, Theano, Top 10, Tutorials, Videolectures, Youtube
- The Python Graph Gallery - Nov 16, 2017.
Welcome to the Python Graph Gallery, a website that displays hundreds of Python charts with their reproducible code snippets.
Data Visualization, Matplotlib, Python, Seaborn
- PySpark SQL Cheat Sheet: Big Data in Python - Nov 16, 2017.
PySpark is a Spark Python API that exposes the Spark programming model to Python. With it, you can speed up analytic applications. With Spark, you can get started with big data processing, as it has built-in modules for streaming, SQL, machine learning and graph processing.
Apache Spark, Big Data, DataCamp, Python, SQL
- TensorFlow: What Parameters to Optimize? - Nov 9, 2017.
Learning the TensorFlow Core API, the lowest-level API in TensorFlow, is a very good first step for learning TensorFlow because it lets you understand the kernel of the library. Here is a very simple example of the TensorFlow Core API in which we create and train a linear regression model.
Neural Networks, Optimization, Python, TensorFlow
- Tips for Getting Started with Text Mining in R and Python - Nov 8, 2017.
This article opens up the world of text mining in a simple and intuitive way and provides great tips to get started with text mining.
Python, R, Text Mining
- 7 Steps to Mastering Deep Learning with Keras - Oct 30, 2017.
Are you interested in learning how to use Keras? Do you already have an understanding of how neural networks work? Check out this lean, fat-free 7 step plan for going from Keras newbie to master of its basics as quickly as is possible.
7 Steps, Convolutional Neural Networks, Deep Learning, Keras, Logistic Regression, LSTM, Machine Learning, Neural Networks, Python, Recurrent Neural Networks
- Ranking Popular Deep Learning Libraries for Data Science - Oct 23, 2017.
We rank 23 open-source deep learning libraries that are useful for Data Science. The ranking is based on equally weighting three components: Github activity, Stack Overflow activity, and Google search results.
Caffe, Deep Learning, Keras, Python, PyTorch, TensorFlow, Theano
- Data Science Bootcamp in Zurich, Switzerland, January 15 – April 6, 2018 - Oct 12, 2017.
Come to the land of chocolate and Data Science, where the local tech scene is booming and jobs are aplenty. Learn the most important concepts from top instructors by doing and through projects. Use code KDNUGGETS to save.
Bootcamp, Data Science, Data Visualization, Machine Learning, NLP, Python, R, Switzerland, Zurich
- How I started with learning AI in the last 2 months - Oct 9, 2017.
The relevance of a full stack developer will not be enough in the changing scenario of things. In the next two years, full stack will not be full stack without AI skills.
AI, Chatbot, Gradient Descent, Neural Networks, Python
- Top 10 Videos on Machine Learning in Finance - Sep 29, 2017.
Talks, tutorials and playlists – you could not get a more gentle introduction to Machine Learning (ML) in Finance. Got a quick 4 minutes or ready to study for hours on end? These videos cover all skill levels and time constraints!
Credit Risk, Finance, Investment Portfolio, Machine Learning, Python, R, Stocks, Tutorials, Videolectures, Youtube
- Tensorflow Tutorial, Part 2 – Getting Started - Sep 28, 2017.
This tutorial will lay a solid foundation for your understanding of TensorFlow, the leading Deep Learning platform. The second part shows how to get started, install, and build a small test case.
Deep Learning, GPU, Python, TensorFlow
- Python Data Preparation Case Files: Group-based Imputation - Sep 25, 2017.
The second part in this series addresses group-based imputation for dealing with missing data values. Check out why finding group means can be a more formidable action than overall means, and see how to accomplish it in Python.
Data Preparation, Pandas, Python
- 30 Essential Data Science, Machine Learning & Deep Learning Cheat Sheets - Sep 22, 2017.
This collection of data science cheat sheets is not a cheat sheet dump, but a curated list of reference materials spanning a number of disciplines and tools.
Cheat Sheet, Data Science, Deep Learning, Machine Learning, Neural Networks, Probability, Python, R, SQL, Statistics
- Keras Tutorial: Recognizing Tic-Tac-Toe Winners with Neural Networks - Sep 18, 2017.
In this tutorial, we will build a neural network with Keras to determine whether or not tic-tac-toe games have been won by player X for given endgame board configurations. Introductory neural network concerns are covered.
Games, Keras, Neural Networks, Python
- Python Data Preparation Case Files: Removing Instances & Basic Imputation - Sep 14, 2017.
This is the first of 3 posts to cover imputing missing values in Python using Pandas. The slowest-moving of the series (out of necessity), this first installment lays out the task and data at the risk of boring you. The next 2 posts cover group- and regression-based imputation.
Data Preparation, Pandas, Python
- Python vs R – Who Is Really Ahead in Data Science, Machine Learning? - Sep 12, 2017.
We examine Google Trends, job trends, and more and note that while Python has only a small advantage among current Data Science and Machine Learning related jobs, this advantage is likely to increase in the future.
Data Science, Google Trends, Jobs, Kaggle, Machine Learning, Python, Python vs R, R
- Visualizing Cross-validation Code - Sep 5, 2017.
Cross-validation helps to improve your prediction using the K-Fold strategy. What is K-Fold, you ask? Check out this post for a visualized explanation.
Cross-validation, Machine Learning, Python, scikit-learn
- Search Millions of Documents for Thousands of Keywords in a Flash - Sep 1, 2017.
We present a Python library called FlashText that can search or replace keywords / synonyms in documents in O(n) – linear time.
Algorithms, Data Science, GitHub, NLP, Python, Search, Search Engine, Text Mining
- Python overtakes R, becomes the leader in Data Science, Machine Learning platforms - Aug 28, 2017.
While Python did not "swallow" R, in 2017 the Python ecosystem overtook R as the leading platform for Analytics, Data Science, and Machine Learning and is pulling users from other platforms.
Data Science Platform, Poll, Python, Python vs R, R
- 42 Steps to Mastering Data Science - Aug 25, 2017.
This post is a collection of 6 separate posts of 7 steps apiece, each for mastering and better understanding a particular data science topic, with topics ranging from data preparation, to machine learning, to SQL databases, to NoSQL and beyond.
Data Preparation, Data Science, Deep Learning, Machine Learning, NoSQL, Python, SQL
- A Guide to Instagramming with Python for Data Analysis - Aug 17, 2017.
I am writing this article to show you the basics of using Instagram in a programmatic way. You can benefit from this if you want to use it in a data analysis, computer vision, or any other cool project you can think of.
Data Analysis, Image Recognition, Instagram, Python
- Comparing Distance Measurements with Python and SciPy - Aug 15, 2017.
This post introduces five perfectly valid ways of measuring distances between data points. We will also perform a simple demonstration and comparison with Python and the SciPy library.
Clustering, K-means, Python, SciPy
- Machine Learning Exercises in Python: An Introductory Tutorial Series - Jul 26, 2017.
This post presents a summary of a series of tutorials covering the exercises from Andrew Ng's machine learning class on Coursera. Instead of implementing the exercises in Octave, the author has opted to do so in Python, and provide commentary along the way.
Andrew Ng, Machine Learning, Python
- 6 Reasons Why Python Is Suddenly Super Popular - Jul 25, 2017.
Python is a general-purpose language — sometimes referred to as utilitarian — which is designed to be simple to read and write. The point that it’s not a complex language is important.
Programming Languages, Python
- Road Lane Line Detection using Computer Vision models - Jul 19, 2017.
A tutorial on how to implement a computer vision data pipeline for road lane detection used by self-driving cars.
AI, Computer Vision, Data Science, Machine Learning, Python, Self-Driving Car
- Exploratory Data Analysis in Python - Jul 7, 2017.
We view EDA very much like a tree: there is a basic series of steps you perform every time you perform EDA (the main trunk of the tree) but at each step, observations will lead you down other avenues (branches) of exploration by raising questions you want to answer or hypotheses you want to test.
Data Analysis, Data Exploration, Data Preparation, Jupyter, Python, SVDS
- Getting Started with Python for Data Analysis - Jul 5, 2017.
A guide for beginners to Python for getting started with data analysis.
Beginners, Data Analysis, Jupyter, numpy, Python
- Text Clustering : Quick insights from Unstructured Data, part 2 - Jul 4, 2017.
We will build this in a modular way and also focus on exposing the functionalities as an API so that it can serve as a plug and play model without any disruptions to the existing systems.
API, Clustering, Python, Text Analytics, Unstructured data
- How Feature Engineering Can Help You Do Well in a Kaggle Competition – Part 3 - Jul 4, 2017.
In this last post of the series, I describe how I used more powerful machine learning algorithms for the click prediction problem, as well as the ensembling techniques that took me up to the 19th position on the leaderboard (top 2%).
Feature Engineering, Jupyter, Kaggle, Machine Learning, Python
- Top 15 Python Libraries for Data Science in 2017 - Jun 13, 2017.
Since all of the libraries are open source, we have added commits, contributor counts and other metrics from Github, which can serve as proxy metrics for library popularity.
Data Mining, Data Science, Deep Learning, Machine Learning, Natural Language Processing, Python, Visualization
- How Feature Engineering Can Help You Do Well in a Kaggle Competition – Part I - Jun 8, 2017.
As I scrolled through the leaderboard page, I found my name in the 19th position, which was the top 2% from nearly 1,000 competitors. Not bad for the first Kaggle competition I had decided to put a real effort in!
Apache Spark, Feature Engineering, Jupyter, Kaggle, Machine Learning, Python
- Machine Learning Workflows in Python from Scratch Part 2: k-means Clustering - Jun 7, 2017.
The second post in this series of tutorials for implementing machine learning workflows in Python from scratch covers implementing the k-means clustering algorithm.
Clustering, K-means, Machine Learning, Python, Workflow
- 6 Interesting Things You Can Do with Python on Facebook Data - Jun 6, 2017.
Facebook has a huge amount of data that is available for you to explore, and you can do many things with it. I will be sharing my experience with you on how you can use the Facebook Graph API for analysis with Python.
Facebook, Pandas, Python
- 7 Steps to Mastering Data Preparation with Python - Jun 2, 2017.
Follow these 7 steps for mastering data preparation, covering the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.
7 Steps, Data Preparation, Data Preprocessing, Data Science, Data Wrangling, Machine Learning, Pandas, Python
- Data Science for Newbies: An Introductory Tutorial Series for Software Engineers - May 31, 2017.
This post summarizes and links to the individual tutorials which make up this introductory look at data science for newbies, mainly focusing on the tools, with a practical bent, written by a software engineer from the perspective of a software engineering approach.
Apache Spark, Data Science, Jupyter, Machine Learning, Pandas, Python, Reddit, Scala, SQL
- Data preprocessing for deep learning with nuts-ml - May 30, 2017.
Nuts-ml is a new data pre-processing library in Python for GPU-based deep learning in vision. It provides common pre-processing functions as independent, reusable units. These so-called ‘nuts’ can be freely arranged to build data flows that are efficient, easy to read and modify.
Data Preparation, Deep Learning, IBM, Image Recognition, Python
- Machine Learning Workflows in Python from Scratch Part 1: Data Preparation - May 29, 2017.
This post is the first in a series of tutorials for implementing machine learning workflows in Python from scratch, covering the coding of algorithms and related tools from the ground up. The end result will be a handcrafted ML toolkit. This post starts things off with data preparation.
Data Preparation, Machine Learning, Python, Workflow
- DataScience.com Releases Python Package for Interpreting the Decision-Making Processes of Predictive Models - May 24, 2017.
DataScience.com's new Python library, Skater, uses a combination of model interpretation algorithms to identify how models leverage data to make predictions.
Datascience.com, GitHub, Interpretability, Python
- New Leader, Trends, and Surprises in Analytics, Data Science, Machine Learning Software Poll - May 22, 2017.
Python caught up with R and (barely) overtook it; Deep Learning usage surges to 32%; RapidMiner remains top general Data Science platform; Five languages of Data Science.
Anaconda, Data Mining Software, Poll, Python, R, RapidMiner, Spark, TensorFlow
- The Path To Learning Artificial Intelligence - May 19, 2017.
Learn how to easily build real-world AI for booming tech, business, pioneering careers and game-level fun.
AI, Artificial Intelligence, Deep Learning, Learning Path, Machine Learning, Online Education, Python
- Deep Learning in Minutes with this Pre-configured Python VM Image - May 5, 2017.
Check out this Python deep learning virtual machine image, built on top of Ubuntu, which includes a number of machine learning tools and libraries, along with several projects to get up and running with right away.
Deep Learning, Machine Learning, Python
- How Not To Program the TensorFlow Graph - May 1, 2017.
Using TensorFlow from Python is like using Python to program another computer. Being thoughtful about the graphs you construct can help you avoid confusion and costly performance problems.
Deep Learning, Programming, Python, TensorFlow
- The Guerrilla Guide to Machine Learning with Python - May 1, 2017.
Here is a bare bones take on learning machine learning with Python, a complete course for the quick study hacker with no time (or patience) to spare.
Deep Learning, Machine Learning, Pandas, Python, scikit-learn, Sebastian Raschka
- Dask and Pandas and XGBoost: Playing nicely between distributed systems - Apr 27, 2017.
This blog post gives a quick example using Dask.dataframe to do distributed Pandas data wrangling, then using a new dask-xgboost package to set up an XGBoost cluster inside the Dask cluster and perform the handoff.
Dask, Distributed Systems, Pandas, Python, XGBoost
- 5 Machine Learning Projects You Can No Longer Overlook, April - Apr 13, 2017.
It's about that time again... 5 more machine learning or machine learning-related projects you may not yet have heard of, but may want to consider checking out. Find tools for data exploration, topic modeling, high-level APIs, and feature selection herein.
Data Exploration, Deep Learning, Java, Machine Learning, Neural Networks, Overlook, Python, Scala, scikit-learn, Topic Modeling
- Introduction to Anomaly Detection - Apr 3, 2017.
This overview will cover several methods of detecting anomalies, as well as how to build a detector in Python using a simple moving average (SMA) or low-pass filter.
Anomaly Detection, Datascience.com, Python, Time Series
- A Beginner’s Guide to Tweet Analytics with Pandas - Mar 29, 2017.
Unlike a lot of other tutorials which often pull from the real-time Twitter API, we will be using the downloadable Twitter Analytics data, and most of what we do will be done in Pandas.
Pandas, Python, Twitter
- Email Spam Filtering: An Implementation with Python and Scikit-learn - Mar 17, 2017.
This post is an overview of a spam filtering implementation using Python and Scikit-learn. The results of 2 classifiers are contrasted and compared: multinomial Naive Bayes and support vector machines.
Machine Learning, Python, scikit-learn
- Open Source Toolkits for Speech Recognition - Mar 14, 2017.
This article reviews the main options for free speech recognition toolkits that use traditional Hidden Markov Models and n-gram language models.
C++, Java, Open Source, Python, Speech Recognition, SVDS
- Working With Numpy Matrices: A Handy First Reference - Mar 10, 2017.
This introductory tutorial does a great job of outlining the most common Numpy array creation and manipulation functionality. A good post to keep handy while taking your first steps in Numpy, or to use as a handy reminder.
numpy, Python
- K-Means & Other Clustering Algorithms: A Quick Intro with Python - Mar 8, 2017.
In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.
Clustering, K-means, Python, scikit-learn
- A Simple XGBoost Tutorial Using the Iris Dataset - Mar 7, 2017.
This is an overview of the XGBoost machine learning algorithm, which is fast and shows good results. This example uses multiclass prediction with the Iris dataset from Scikit-learn.
Python, scikit-learn, XGBoost
- Bokeh Cheat Sheet: Data Visualization in Python - Mar 3, 2017.
Bokeh is the Python data visualization library that enables high-performance visual presentation of large datasets in modern web browsers. The package is flexible and offers lots of possibilities to visualize your data in a compelling way, but can be overwhelming.
Bokeh, Cheat Sheet, Data Visualization, DataCamp, Python
- Gartner Data Science Platforms – A Deeper Look - Mar 3, 2017.
Thomas Dinsmore's critical examination of the Gartner 2017 MQ for Data Science Platforms, including vendors that are out, in, or have big changes; Hadoop and Spark integration; open source software; and what Data Scientists actually use.
Apache Spark, Data Science Platform, Gartner, IBM, Python, R, SAS, Thomas Dinsmore
- Building a Bot to Answer FAQs: Predicting Text Similarity - Mar 2, 2017.
In this post, learn to build a bot to answer frequently asked questions, reducing lag time for more customers and taking the load off of engineers, ensuring they can concentrate on building products!
Chatbot, Python, Similarity
- 7 More Steps to Mastering Machine Learning With Python - Mar 1, 2017.
This post is a follow-up to last year's introductory Python machine learning post, which includes a series of tutorials for extending your knowledge beyond the original.
7 Steps, Classification, Clustering, Deep Learning, Ensemble Methods, Gradient Boosting, Machine Learning, Python, scikit-learn, Sebastian Raschka
- What I Learned Implementing a Classifier from Scratch in Python - Feb 28, 2017.
In this post, the author implements a machine learning algorithm from scratch, without the use of a library such as scikit-learn, and instead writes all of the code in order to have a working binary classifier algorithm.
Classification, Machine Learning, Perceptron, Python, Sebastian Raschka
- An Overview of Python Deep Learning Frameworks - Feb 27, 2017.
Read this concise overview of leading Python deep learning frameworks, including Theano, Lasagne, Blocks, TensorFlow, Keras, MXNet, and PyTorch.
Deep Learning, Keras, Neural Networks, Python, TensorFlow, Theano, Torch
- Moving from R to Python: The Libraries You Need to Know - Feb 24, 2017.
Are you considering making a move from R to Python? Here are the libraries you need to know, how they stack up to their R contemporaries, and why you should learn them.
Jupyter, Pandas, Programming, Python, R, scikit-learn, Yhat
- What is a Support Vector Machine, and Why Would I Use it? - Feb 23, 2017.
Support Vector Machine has become an extremely popular algorithm. In this post I try to give a simple explanation for how it works and give a few examples using the Python Scikits libraries.
Python, scikit-learn, Support Vector Machines, SVM, Yhat
- Introduction to Correlation - Feb 22, 2017.
Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.
Beginners, Correlation, Datascience.com, Pandas, Python, Statistics
- Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory - Feb 16, 2017.
Apache Parquet and Apache Arrow both focus on improving performance and efficiency of data analytics. These two projects optimize performance for on-disk and in-memory processing.
Apache, Apache Arrow, Apache Spark, Data Science, Dremio, In-Memory Computing, Machine Learning, Python
- Web Scraping for Dataset Curation, Part 2: Tidying Craft Beer Data - Feb 14, 2017.
This is the second part in a 2 part series on curating data from the web. The first part focused on web scraping, while this post details the process of tidying scraped data after the fact.
Beer, Data Curation, Dataset, Python
- Web Scraping for Dataset Curation, Part 1: Collecting Craft Beer Data - Feb 13, 2017.
This post is the first in a 2 part series on scraping and cleaning data from the web using Python. This first part is concerned with the scraping aspect, while the second part will focus on the cleaning. A concrete example is presented.
Beer, Data Curation, Dataset, Python, Web Scraping
- Making Python Speak SQL with pandasql - Feb 8, 2017.
Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.
Pandas, Python, SQL, Yhat
- Pandas Cheat Sheet: Data Science and Data Wrangling in Python - Jan 27, 2017.
The Pandas library can seem very elaborate and it might be hard to find a single point of entry to the material: with other learning materials focusing on different aspects of this library, you can definitely use a reference sheet to help you get the hang of it.
Cheat Sheet, Data Preparation, DataCamp, Pandas, Python
- Great Collection of Minimal and Clean Implementations of Machine Learning Algorithms - Jan 25, 2017.
Interested in learning machine learning algorithms by implementing them from scratch? Need a good set of examples to work from? Check out this post with links to minimal and clean implementations of various algorithms.
Algorithms, Machine Learning, Programming, Python
- Learn how to Develop and Deploy a Gradient Boosting Machine Model - Jan 20, 2017.
GBM is one of the hottest machine learning methods. Learn how to create GBM using SciKit-Learn and Python and understand the steps required to transform features, train, and deploy a GBM.
Gradient Boosting, Open Data Group, Python, scikit-learn
- The Most Popular Language For Machine Learning and Data Science Is … - Jan 11, 2017.
When it comes to choosing a programming language for Data Analytics projects or job prospects, people have different opinions depending on their career backgrounds and the domains they have worked in. Here is the analysis of data from indeed.com with respect to choice of programming language for machine learning and data science.
Data Science, Machine Learning, Programming Languages, Python, R, Scala
- Tidying Data in Python - Jan 4, 2017.
This post summarizes some tidying examples Hadley Wickham used in his 2014 paper on Tidy Data in R, but will demonstrate how to do so using the Python pandas library.
Data Cleaning, Data Preparation, Pandas, Python
- 5 Machine Learning Projects You Can No Longer Overlook, January - Jan 2, 2017.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects, the most recent in an ongoing series.
Boosting, C++, Data Preparation, Decision Trees, Machine Learning, Neural Networks, Optimization, Overlook, Pandas, Python, scikit-learn
- 50+ Data Science, Machine Learning Cheat Sheets, updated - Dec 14, 2016.
Gear up to speed and have concepts and commands handy in Data Science, Data Mining, and Machine learning algorithms with these cheat sheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark, Matlab, and Java.
Cheat Sheet, Data Science, Django, Hadoop, Java, Machine Learning, MATLAB, Python, R
- Introduction to K-means Clustering: A Tutorial - Dec 9, 2016.
A beginner introduction to the widely-used K-means clustering algorithm, using a delivery fleet data example in Python.
Clustering, Datascience.com, K-means, Python
- Free ebooks: Machine Learning with Python and Practical Data Analysis - Dec 5, 2016.
Two free ebooks: "Building Machine Learning Systems with Python" and "Practical Data Analysis" will give your skills a boost and make a great start in the New Year.
Data Analysis, Free ebook, Machine Learning, Packt Publishing, Python
- Random Forests® in Python - Dec 2, 2016.
Random forest is a highly versatile machine learning method with numerous applications ranging from marketing to healthcare and insurance. This is a post about random forests using Python.
Algorithms, Classification, Ensemble Methods, Python, random forests algorithm, Yhat
- Introduction to Machine Learning for Developers - Nov 28, 2016.
Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning.
Beginners, Classification, Clustering, Machine Learning, Pandas, Python, R, scikit-learn, Software Developer
- Top 20 Python Machine Learning Open Source Projects, updated - Nov 21, 2016.
Open source is at the heart of innovation and the rapid evolution of technologies these days. This article presents the Top 20 Python Machine Learning Open Source Projects of 2016, along with very interesting insights and trends found during the analysis.
GitHub, Machine Learning, Open Source, Python, scikit-learn
- How to Rank 10% in Your First Kaggle Competition - Nov 9, 2016.
This post presents a pathway to achieving success in Kaggle competitions as a beginner. The path generalizes beyond competitions, however. Read on for insight into succeeding while approaching any data science project.
Beginners, Competition, Data Science, Kaggle, Machine Learning, Python
- Eight Things an R user Will Find Frustrating When Trying to Learn Python - Nov 2, 2016.
Are you an R user considering learning Python? Here's some insight into what you may be up against, and what, specifically, you may find frustrating. But don't worry, it's not all terrible.
Python, R
- Using Machine Learning to Detect Malicious URLs - Oct 28, 2016.
This is a write-up of an experiment employing a machine learning model to identify malicious URLs. The author provides a link to the code for you to try yourself.
Cybersecurity, Python, Security
- Automated Machine Learning: An Interview with Randy Olson, TPOT Lead Developer - Oct 28, 2016.
Read an insightful interview with Randy Olson, Senior Data Scientist at University of Pennsylvania Institute for Biomedical Informatics, and lead developer of TPOT, an open source Python tool that intelligently automates the entire machine learning process.
Automated Data Science, Automated Machine Learning, Machine Learning, Python, scikit-learn
- Jupyter Notebook Best Practices for Data Science - Oct 20, 2016.
Check out this overview of Jupyter notebook best practices as pertains to data science. Novice or expert, you may find something of use here.
Data Science, Jupyter, Python, SVDS
- A Beginner’s Guide to Neural Networks with Python and SciKit Learn 0.18! - Oct 20, 2016.
This post outlines setting up a neural network in Python using Scikit-learn, the latest version of which now has built-in support for neural network models.
Pages: 1 2
Beginners, Machine Learning, Neural Networks, Python, scikit-learn
- Introducing Dask for Parallel Programming: An Interview with Project Lead Developer - Sep 7, 2016.
Introducing Dask, a flexible parallel computing library for analytics. Learn more about this project built with interactive data science in mind in an interview with its lead developer.
Analytics, Continuum Analytics, Dask, Data Science, Distributed Computing, Parallelism, Python, Scientific Computing
- A Gentle Introduction to Bloom Filter - Aug 24, 2016.
The Bloom Filter is a probabilistic data structure which can make a tradeoff between space and false positive rate. Read more, and see an implementation from scratch, in this post.
Algorithms, Efficiency, Python
- Visualizing 1 Billion Points of Data: Doing It Right – Aug 18 Webinar - Aug 11, 2016.
Join Continuum Analytics on August 18 for a webinar on Big Data visualization with the datashader library. Save your spot today!
Continuum Analytics, Data Visualization, Jupyter, Python
- 7 Steps to Understanding Computer Vision - Aug 9, 2016.
A starting point for Computer Vision, and how to go deeper. Dive into this post for an overview of the right resources and a little bit of advice.
7 Steps, Computer Vision, Deep Learning, Neural Networks, Python
- Data Science of Visiting Famous Movie Locations in San Francisco - Jul 30, 2016.
Using the Google Places API and IMDb API, we selected movie locations in The Golden City which every movie fan should visit while they are in town, and optimized sightseeing by solving the travelling salesman problem.
CA, Data Science, Google, IMDb, Python, San Francisco
- Would You Survive the Titanic? A Guide to Machine Learning in Python Part 2 - Jul 26, 2016.
This is part 2 of a 3 part introductory series on machine learning in Python, using the Titanic dataset.
Pages: 1 2
Machine Learning, Python, Titanic
- Would You Survive the Titanic? A Guide to Machine Learning in Python Part 1 - Jul 25, 2016.
Check out the first of a 3 part introductory series on machine learning in Python, fueled by the Titanic dataset. This is a great place to start for a machine learning newcomer.
Machine Learning, Python, scikit-learn, Titanic
- SAS vs R vs Python: Which Tool Do Analytics Pros Prefer? - Jul 22, 2016.
There are lots of flame wars involving different data science and analytics tools... but this isn't one of them. Check out the quantitative results and analysis of a Burtch Works survey on the subject.
Burtch Works, Python, R, SAS, Survey
- Building a Data Science Portfolio: Machine Learning Project Part 1 - Jul 20, 2016.
Dataquest's founder has put together a fantastic resource on building a data science portfolio. This first of three parts lays the groundwork, with subsequent posts over the following 2 days. Very comprehensive!
Pages: 1 2
Advice, Career, Data Science, Data Scientist, Dataquest, Machine Learning, Portfolio, Project, Python
- Statistical Data Analysis in Python - Jul 18, 2016.
This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects, taking the form of a set of IPython notebooks.
IPython, Jupyter, Pandas, Python, Statistical Analysis
- America’s Next Topic Model - Jul 15, 2016.
Topic modeling is a great way to get a bird's eye view of a large document collection using machine learning. Here are 3 ways to use the open source Python tool Gensim to choose the best topic model.
LDA, NLP, Python, Text Mining, Topic Modeling, Unsupervised Learning
- 5 Deep Learning Projects You Can No Longer Overlook - Jul 12, 2016.
There are a number of "mainstream" deep learning projects out there, but many more niche projects flying under the radar. Have a look at 5 such projects worth checking out.
C++, Deep Learning, Javascript, Machine Learning, Neural Networks, Overlook, Python
- Interview: Florian Douetteau, Dataiku Founder, on Empowering Data Scientists - Jul 7, 2016.
Here is an interview with Florian Douetteau, founder of Dataiku, on how their tools empower data scientists, and how data science itself is evolving.
Ajay Ohri, API, Data Science Tools, Dataiku, Florian Douetteau, Python, R
- Deep Residual Networks for Image Classification with Python + NumPy - Jul 7, 2016.
This post outlines the results of an innovative Deep Residual Network implementation for Image Classification using Python and NumPy.
Deep Learning, Neural Networks, numpy, Python
- Mining Twitter Data with Python Part 7: Geolocation and Interactive Maps - Jul 6, 2016.
The final part of this 7 part series explores using geolocation and interactive maps with Twitter data.
Data Visualization, Geo-Localization, Javascript, Python, Social Media, Social Media Analytics, Text Mining, Twitter
- Mining Twitter Data with Python Part 6: Sentiment Analysis Basics - Jul 5, 2016.
Part 6 of this series builds on the previous installments by exploring the basics of sentiment analysis on Twitter data.
Python, Sentiment Analysis, Social Media, Social Media Analytics, Text Mining, Twitter
- Mining Twitter Data with Python Part 5: Data Visualisation Basics - Jun 29, 2016.
Part 5 of this series takes on data visualization, as we look to make sense of our data and highlight interesting insights.
D3.js, Data Visualization, Python, Social Media, Social Media Analytics, Text Mining, Twitter
- 5 More Machine Learning Projects You Can No Longer Overlook - Jun 28, 2016.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects.
Computer Vision, Data Preparation, Data Preprocessing, Javascript, Machine Learning, Natural Language Processing, NLP, Overlook, Python
- Mining Twitter Data with Python Part 4: Rugby and Term Co-occurrences - Jun 27, 2016.
Part 4 of this series employs some of the lessons learned thus far to analyze tweets related to rugby matches and term co-occurrences.
Python, Social Media, Social Media Analytics, Text Mining, Twitter
- Mining Twitter Data with Python Part 1: Collecting Data - Jun 15, 2016.
Part 1 of a 7 part series focusing on mining Twitter data for a variety of use cases. This first post lays the groundwork, and focuses on data collection.
Python, Social Media, Social Media Analytics, Twitter
- R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results - Jun 6, 2016.
R remains the leading tool, with 49% share, but Python grows faster and almost catches up to R. RapidMiner remains the most popular general Data Science platform. Big Data tools are used by almost 40%, and Deep Learning usage doubles.
Pages: 1 2
Data Mining Software, Data Science Platform, Poll, Python, Python vs R, R, RapidMiner, SQL
- 5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.
We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.
Data Cleaning, Deep Learning, Machine Learning, Open Source, Overlook, Pandas, Python, scikit-learn, Theano
- Top 10 IPython Notebook Tutorials for Data Science and Machine Learning - Apr 22, 2016.
A list of 10 useful Github repositories made up of IPython (Jupyter) notebooks, focused on teaching data science and machine learning. Python is the clear target here, but general principles are transferable.
Data Science, Deep Learning, GitHub, IPython, Machine Learning, Python, Sebastian Raschka, TensorFlow
- Comprehensive Guide to Learning Python for Data Analysis and Data Science - Apr 20, 2016.
Want to make a career change to data science using Python? Learning anything on your own can be a challenge, and a little guidance can be a great help. That is exactly what this article provides.
Pages: 1 2
Data Analysis, Data Science Education, DataCamp, Python
- Doing Data Science: A Kaggle Walkthrough – Cleaning Data - Mar 23, 2016.
Gain insight into the process of cleaning data for a specific Kaggle competition, including a step by step overview.
Pages: 1 2
Data Cleaning, Data Preparation, Kaggle, Pandas, Python
- New KDnuggets Tutorials Page: Learn R, Python, Data Visualization, Data Science, and more - Mar 16, 2016.
Introducing the new KDnuggets Tutorials page, with useful resources for learning about Business Analytics, Big Data, Data Science, Data Mining, R, Python, Data Visualization, Spark, Deep Learning and more.
Data Science Education, Online Education, Python, R
- scikit-feature: Open-Source Feature Selection Repository in Python - Mar 3, 2016.
scikit-feature is an open-source feature selection repository in Python, with around 40 popular algorithms from feature selection research. It is developed by the Data Mining and Machine Learning Lab at Arizona State University.
Data Mining, Data Science, Feature Extraction, Feature Selection, Machine Learning, Python
- Scikit Flow: Easy Deep Learning with TensorFlow and Scikit-learn - Feb 12, 2016.
Scikit Flow is a new easy-to-use interface for TensorFlow from Google based on the Scikit-learn fit/predict model. Does it succeed in making deep learning more accessible?
Deep Learning, Google, Matthew Mayo, Python, scikit-learn, TensorFlow
- Data Science Skills for 2016 - Feb 12, 2016.
As demand for the hottest job gets hotter in the new year, the skill set it requires keeps growing. Here we discuss the skills that will be in high demand for data scientists, including data visualization, Apache Spark, R, Python, and many more.
Apache Spark, CrowdFlower, Data Science, Python, Skills, SQL
- Python Data Science with Pandas vs Spark DataFrame: Key Differences - Jan 29, 2016.
A post describing the key differences between Pandas and Spark's DataFrame format, including specifics on important regular processing features, with code samples.
Apache Spark, Pandas, Python
- Useful Data Science: Feature Hashing - Jan 28, 2016.
Feature engineering plays a major role in solving data science problems. Here we learn about feature hashing, or the hashing trick, a method for turning arbitrary features into a sparse binary vector.
Feature Engineering, Hashing, Python, Will McGinnis
- Implementing Your Own k-Nearest Neighbor Algorithm Using Python - Jan 27, 2016.
A detailed explanation of one of the most used machine learning algorithms, k-Nearest Neighbors, and its implementation from scratch in Python. Enhance your algorithmic understanding with this hands-on coding exercise.
Pages: 1 2 3
K-nearest neighbors, Python, Python Tutorial
- Top New Features in Orange 3 Data Mining Platform - Dec 10, 2015.
The main technical advantage of Orange 3 is its integration with NumPy and SciPy libraries. Other improvements include reading online data, working through queries for SQL and pre-processing.
Pages: 1 2
Data Mining, Data Visualization, numpy, Orange, Python, scikit-learn