When measuring marketing campaign performance or analysing customers in any business, these top 5 Key Performance Indicators (KPIs) should be used to strategically drive the business.
In this post, the author implements a machine learning algorithm from scratch, writing all of the code rather than relying on a library such as scikit-learn, to arrive at a working binary classifier.
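The post's own code isn't reproduced here, but the from-scratch spirit can be illustrated with a minimal perceptron, one of the simplest binary classifiers (all names and toy data below are illustrative, not the author's):

```python
import numpy as np

class Perceptron:
    """Minimal from-scratch binary classifier (labels 0/1)."""

    def __init__(self, lr=0.1, epochs=50):
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        # One weight per feature, plus a bias term.
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                pred = 1 if xi @ self.w + self.b > 0 else 0
                update = self.lr * (yi - pred)  # zero when prediction is correct
                self.w += update * xi
                self.b += update
        return self

    def predict(self, X):
        return (X @ self.w + self.b > 0).astype(int)

# Linearly separable toy data: class 1 roughly where x0 + x1 > 1.5.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [0.2, 0.1]])
y = np.array([0, 0, 0, 1, 1, 0])
clf = Perceptron().fit(X, y)
print(clf.predict(X))  # → [0 0 0 1 1 0]
```

On linearly separable data like this, the perceptron update rule is guaranteed to converge to a separating hyperplane.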
"Data scientist" continues to be recognized as a top career, but does this mean unending spoils for the data scientist? With large scale mass automation on the horizon for numerous professions, what can we do to safeguard our positions?
Are you considering making a move from R to Python? Here are the libraries you need to know, how they stack up to their R contemporaries, and why you should learn them.
This post sketches out some common principles which would help you better understand deep learning frameworks, and provides a guide on how to implement your own deep learning framework as well.
We compare Gartner 2017 Magic Quadrant for Data Science Platforms vs its 2016 version and identify notable changes for leaders and challengers, including IBM, SAS, RapidMiner, KNIME, MathWorks, Microsoft, and Quest.
Cybersecurity is always a hot topic in the IT industry, and machine learning is making security systems stronger. Here, a particular use case of machine learning in cybersecurity is explained in detail.
The Support Vector Machine has become an extremely popular algorithm. In this post I try to give a simple explanation of how it works and give a few examples using the Python scikit-learn library.
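As a flavour of the library the post uses, a minimal linear-SVM sketch with scikit-learn (toy data and parameters are illustrative, not taken from the post):

```python
import numpy as np
from sklearn import svm

# Toy 2-D data: two well-separated blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# A linear-kernel SVM finds the maximum-margin separating hyperplane.
clf = svm.SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.predict([[0, 0], [3, 3]]))  # → [0 1]
# Only the points on the margin (the support vectors) define the boundary.
print(clf.support_vectors_.shape[0])
```

The `C` parameter trades margin width against training errors; non-linear boundaries come from swapping in `kernel="rbf"` or similar.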
Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.
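A minimal pandas sketch of the idea, with toy columns chosen to contrast linear (Pearson) and rank-based (Spearman) correlation:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # perfectly linear in x
    "z": [1, 4, 9, 16, 25],  # monotone but nonlinear in x
})

# Pearson measures linear association; Spearman correlates the ranks,
# so it equals 1.0 for any strictly monotone relationship.
print(df["x"].corr(df["y"]))                     # 1.0
print(df["x"].corr(df["z"], method="pearson"))   # high, but below 1.0
print(df["x"].corr(df["z"], method="spearman"))  # 1.0
```

`DataFrame.corr()` computes the full pairwise correlation matrix in one call and accepts the same `method` argument.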
The second part of 17 new must-know Data Science Interview questions and answers covers overfitting, ensemble methods, feature selection, ground truth in unsupervised learning, the curse of dimensionality, and parallel algorithms.
Big Data truly came of age in 2013, when the OED included the term “Big Data” for the first time. But when was the term first used, and why? Here are the results of our investigation.
This post presents an example of regression model stacking, and proceeds by using XGBoost, Neural Networks, and Support Vector Regression to predict house prices.
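The post's exact pipeline isn't reproduced here; as a hedged sketch of stacking using scikit-learn stand-ins (GradientBoostingRegressor in place of XGBoost, the neural network omitted for brevity, and all data and parameters illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic regression data standing in for house prices.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The base learners' out-of-fold predictions become the features for the
# final (meta) estimator -- the essence of model stacking.
stack = StackingRegressor(
    estimators=[
        ("gbm", GradientBoostingRegressor(random_state=0)),  # XGBoost stand-in
        ("svr", SVR(C=10.0)),                                # support vector regression
    ],
    final_estimator=Ridge(),
)
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))  # R^2 on held-out data
```

`StackingRegressor` handles the cross-validated generation of base-model predictions internally, which avoids leaking training labels into the meta-learner.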
This is an attempt to explain Hill’s criteria using xkcd comics, both because it seemed fun, and also to motivate causal inference instructors to add some variety to which xkcd comic they include in lectures.
Creativity and innovation are integral to Data Science, and going forward in the world of AI, they are what will give humans an edge over machines.
Deep Learning systems exhibit behavior that appears biological despite not being based on biological material. Humanity may have luckily stumbled upon Artificial Intuition in the form of Deep Learning.
With a new Snowflake data warehouse and Looker data platform on top, data analysts at athenahealth are delivering data to more people, and improving patient experience in the US healthcare system. Register and learn how.
This series explores core concepts of natural language processing, starting with an introduction to the field and explaining how to identify lexical units as a part of data preprocessing.
Apache Parquet and Apache Arrow both focus on improving the performance and efficiency of data analytics. The two projects optimize performance for on-disk and in-memory processing, respectively.
17 new must-know Data Science Interview questions and answers include lessons from failure to predict 2016 US Presidential election and Super Bowl LI comeback, understanding bias and variance, why fewer predictors might be better, and how to make a model more robust to outliers.
This is the second part in a 2 part series on curating data from the web. The first part focused on web scraping, while this post details the process of tidying scraped data after the fact.
In our experience working with many quantitative professionals over the years, the two main areas that contribute to long-term career growth are networking and continuous learning. Here is specific advice on how to do both, with tips for continuous learning.
This post is the first in a 2 part series on scraping and cleaning data from the web using Python. This first part is concerned with the scraping aspect, while the second part will focus on the cleaning. A concrete example is presented.
This post outlines using Google BigQuery for an analysis of NYC Taxi Trips in the cloud, presenting the analysis and visualization in Tableau Public for readers to interact with.
In this post, I’ll look at the practical ingredients of managing agile data science. By using agile data science methods, we help data teams do fast and directed work, and manage the inherent uncertainty of data science and application development.
In this post, we’ll walk through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.
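A minimal illustrative sketch with scikit-learn (synthetic data; the silhouette score used here is one common way to measure efficacy and choose the number of segments, though the post may use other measures):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated blobs stand in for customer segments.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=42)

# K-Means inertia always falls as k grows, so it can't pick k by itself;
# the silhouette score peaks when clusters are compact and well separated.
scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(k, round(scores[k], 3))
```

On this data the silhouette score is highest at k=3, matching the number of generated blobs.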
We examine what experts say about Big Data – is it like teenage sex? Is it more than just a large and complex collection of data? And how many Vs are there?
Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.
Despite the popularity of Regression, it is also misunderstood. Why? The answer might surprise you: There is no such thing as Regression. Rather, there are a large number of statistical methods that are called Regression, all of which are based on a shared statistical foundation.
Sexiest job... massive shortage... blah blah blah. Are you looking to get a real handle on the career paths available in "Data Science" and "Big Data?" Read this article for insight on where to look to sharpen the required entry-level skills.
Upgraded version of the qualitative analysis freeware QDA Miner Lite now includes a document overview, tree-grid display, image rotation and resizing, importing from PowerPoint and more.
What if, instead of hand-designing an optimisation algorithm (function), we learn it instead? That way, by training on the class of problems we’re interested in solving, we can learn an optimal optimiser for that class!
Analytics is not a one-time job. It needs to be automated, deployed, and improved for future business analytics requirements. Here, an IBM expert discusses the development and deployment of analytics assets and their capabilities.
This blog serves to expand on the approach that the data science team uses to identify (and quantify) which variables and metrics are better predictors of performance.
Many analytic models are not deployed effectively into production while others are not maintained or updated. Applying decision modeling and decision management technology within CRISP-DM addresses this.
With nearly every smart young computer scientist planning to work on deep learning, are there really still artificial intelligence researchers working on other techniques? Is deep learning the AI silver bullet?
A carefully-curated list of 5 free collections of university course material to help you better understand the various aspects of artificial intelligence and the skills necessary for moving forward in the field.