- Learning from Imbalanced Classes - Aug 31, 2016.
Imbalanced classes can cause trouble for classification. Not all hope is lost, however. Check out this article for methods of dealing with such a situation.
Pages: 1 2
Balancing Classes, Bayesian, Learning from Data, Sampling, Tom Fawcett
- How Convolutional Neural Networks Work - Aug 31, 2016.
Get an overview of what is going on inside convolutional neural networks, and what it is that makes them so effective.
Pages: 1 2
Brandon Rohrer, Convolutional Neural Networks, Image Recognition, Neural Networks
- What is the Role of the Activation Function in a Neural Network? - Aug 30, 2016.
Confused as to exactly what the activation function in a neural network does? Read this overview, and check out the handy cheat sheet at the end.
Linear Regression, Logistic Regression, Neural Networks
- Data Mining Tip: How to Use High-cardinality Attributes in a Predictive Model - Aug 29, 2016.
High-cardinality nominal attributes can be difficult to include in predictive models. There are a few ways to do so, however, which are put forward here.
Feature Engineering, Feature Selection, Predictive Models
- Cartoon: Data Scientist – the sexiest job of the 21st century until … - Aug 27, 2016.
This Data Scientist thought that he had the sexiest job of the 21st century until the arrival of the competition ...
Automated, Automated Data Science, Cartoon, Tom Davenport
- MDL Clustering: Unsupervised Attribute Ranking, Discretization, and Clustering - Aug 26, 2016.
MDL Clustering is a free software suite for unsupervised attribute ranking, discretization, and clustering based on the Minimum Description Length principle and built on the Weka Data Mining platform.
Clustering, Feature Selection, Java, Unsupervised Learning, Weka
- The top 5 Big Data courses to help you break into the industry - Aug 25, 2016.
Here is an updated and in-depth review of the top 5 providers of Big Data and Data Science courses: Simplilearn, Cloudera, Big Data University, Hortonworks, and Coursera.
Big Data, Cloudera, Coursera, Data Science Education, Hortonworks, Online Education, Simplilearn
- A Tutorial on the Expectation Maximization (EM) Algorithm - Aug 25, 2016.
This is a short tutorial on the Expectation Maximization algorithm and how it can be used to estimate parameters for multivariate data.
Clustering, Data Science, Data Science Education, Predictive Analytics, Statistics
- Introduction to Local Interpretable Model-Agnostic Explanations (LIME) - Aug 25, 2016.
Learn about LIME, a technique to explain the predictions of any machine learning classifier.
Algorithms, Classifier, Explanation, Interpretability, LIME, Machine Learning, Prediction
- A Gentle Introduction to Bloom Filter - Aug 24, 2016.
The Bloom Filter is a probabilistic data structure which can make a tradeoff between space and false positive rate. Read more, and see an implementation from scratch, in this post.
Algorithms, Efficiency, Python
- A simple approach to anomaly detection in periodic big data streams - Aug 24, 2016.
We describe a simple and scalable algorithm that can detect rare and potentially irregular behavior in a time series with periodic patterns. It performs similarly to Twitter's more complex approach.
Anomaly Detection, Apache Spark, BMW, Time Series, Twitter
- Data Science of Reviews: ReviewMeta tool Automatically Detects Unnatural Reviews on Amazon - Aug 23, 2016.
ReviewMeta is a tool that analyzes millions of reviews and helps customers decide which ones to trust. As the dataset grows, so do the insights on unbiased reviews.
Amazon, Analytics, Customer Analytics, Data Mining, Trends
- How to Become a (Type A) Data Scientist - Aug 23, 2016.
This post outlines the difference between a Type A and a Type B data scientist, and prescribes a learning path for becoming a Type A.
Advice, Data Science, Data Scientist, Internet of Things, IoT
- A Neat Trick to Increase Robustness of Regression Models - Aug 22, 2016.
Read this take on the validity of choosing a different approach to regression modeling. Why isn't L1 norm used more often?
CleverTap, Linear Regression, Outliers, Overfitting, Regression
- How to Become a Data Scientist – Part 1 - Aug 22, 2016.
Check out this excellent (and exhaustive) article on becoming a data scientist, written by someone who spends their day recruiting data scientists. Do yourself a favor and read the whole way through. You won't regret it!
Pages: 1 2 3 4
Career, Data Science, Data Science Skills, Data Scientist, Skills
- Misinformation Key Terms, Explained - Aug 20, 2016.
Misinformation has emerged as a key issue for social media platforms. This post introduces the concept of misinformation and 8 key terms, which provide insight into mining misinformation in social media.
Explained, Key Terms, Social Media, Social Media Analytics
- The Gentlest Introduction to Tensorflow – Part 2 - Aug 19, 2016.
Check out the second and final part of this introductory tutorial to TensorFlow.
Pages: 1 2
Beginners, Deep Learning, Gradient Descent, Machine Learning, TensorFlow
- Top Machine Learning Projects for Julia - Aug 19, 2016.
Julia is gaining traction as a legitimate alternative programming language for analytics tasks. Learn more about these 5 machine learning related projects.
Deep Learning, Julia, Machine Learning, Open Source, scikit-learn
- The 10 Algorithms Machine Learning Engineers Need to Know - Aug 18, 2016.
Read this introductory list of contemporary machine learning algorithms of importance that every engineer should understand.
Pages: 1 2
Algorithms, Machine Learning, Supervised Learning, Unsupervised Learning
- Approaching (Almost) Any Machine Learning Problem - Aug 18, 2016.
If you're looking for an overview of how to approach (almost) any machine learning problem, this is a good place to start. Read on as a Kaggle competition veteran shares his pipelines and approach to problem-solving.
Pages: 1 2
Advice, Feature Selection, Kaggle, Machine Learning, Modeling
- Does Data Scientist Mean What You Think It Means? - Aug 16, 2016.
Do we have an accurate idea of what "data scientist" actually means? Read this thought-provoking opinion on the topic.
Career, Data Scientist
- Central Limit Theorem for Data Science – Part 2 - Aug 16, 2016.
This post continues an explanation of Central Limit Theorem started in a previous post, with additional details... and beer.
Beer, Centrality, Distribution, Statistics
- Cartoon: Make Data Great Again - Aug 13, 2016.
This KDnuggets cartoon considers a speech that a certain presidential candidate could give on the topic of Big Data.
Cartoon, Donald Trump, Politics
- Central Limit Theorem for Data Science - Aug 12, 2016.
This post is an introductory explanation of the Central Limit Theorem, and why it is (or should be) of importance to data scientists.
Centrality, Distribution, Statistics
- Understanding the Empirical Law of Large Numbers and the Gambler’s Fallacy - Aug 12, 2016.
The law of large numbers is an important concept for practising data scientists. In this post, the empirical law of large numbers is demonstrated via a simple simulation approach using the Bernoulli process.
Algorithms, R, Statistics
- 5 EBooks to Read Before Getting into A Data Science or Big Data Career - Aug 11, 2016.
A short, carefully-curated list of 5 free ebooks to help you better understand what Data Science is all about and how you can best prepare for a career in data science, big data, and data analysis.
Big Data, Free ebook, Hadoop, Programming Languages, Simplilearn, Tableau
- A Beginner’s Guide to Neural Networks with R! - Aug 11, 2016.
In this article we will learn how Neural Networks work and how to implement them with the R programming language! We will see how we can easily create Neural Networks with R and even visualize them. A basic understanding of R is necessary to understand this article.
Pages: 1 2
Beginners, Neural Networks, R, Udemy
- Visualizing 1 Billion Points of Data: Doing It Right – Aug 18 Webinar - Aug 11, 2016.
Join Continuum Analytics on August 18 for a webinar on Big Data visualization with the datashader library. Save your spot today!
Continuum Analytics, Data Visualization, Jupyter, Python
- Big Data Key Terms, Explained - Aug 11, 2016.
Just getting started with Big Data, or looking to iron out the wrinkles in your current understanding? Check out these 20 Big Data-related terms and their concise definitions.
Pages: 1 2
3Vs of Big Data, Apache Spark, Big Data, Business Intelligence, Cloud Computing, Data Warehouse, Explained, Hadoop, Key Terms, Predictive Analytics
- 7 Steps to Understanding Computer Vision - Aug 9, 2016.
A starting point for Computer Vision and how to go deeper. Dive into this post for an overview of the right resources and a little bit of advice.
7 Steps, Computer Vision, Deep Learning, Neural Networks, Python
- Short course: Statistical Learning and Data Mining IV, Washington, DC, Oct 19-20 - Aug 8, 2016.
This new two-day course gives a detailed and modern overview of statistical models used by data scientists for prediction and inference, including sparse models and deep learning.
Data Mining, DC, R, Robert Tibshirani, Statistical Learning, Trevor Hastie, Washington
- Cartoon: Facebook data science experiments and Cats - Aug 8, 2016.
In honor of International Cat Day, we revisit the KDnuggets cartoon that looks at the Facebook data science experiment on emotion manipulation and the importance of happy kittens.
Cartoon, Cats, Data Science, Facebook
- Understanding the Bias-Variance Tradeoff: An Overview - Aug 8, 2016.
A model's ability to minimize bias and its ability to minimize variance are often thought of as 2 opposing ends of a spectrum. Being able to understand these two types of errors is critical to diagnosing model results.
Bias, Cross-validation, Model Performance, Variance
- Brain Monitoring with Kafka, OpenTSDB, and Grafana - Aug 5, 2016.
Interested in using open source software to monitor brain activity, and control your devices? Sure you are! Read this fantastic post for some insight and direction.
Pages: 1 2 3
Brain, Internet of Things, IoT, Kafka, Monitoring
- Contest Winner: Winning the AutoML Challenge with Auto-sklearn - Aug 5, 2016.
This post is the first place prize recipient in the recent KDnuggets blog contest. Auto-sklearn is an open-source Python tool that automatically determines effective machine learning pipelines for classification and regression datasets. It is built around the successful scikit-learn library and won the recent AutoML challenge.
Automated, Automated Data Science, Automated Machine Learning, Competition, Hyperparameter, scikit-learn, Weka
- Nigeria: Telling Internally Displaced Persons Stories Using Visual Data and Infographics - Aug 5, 2016.
Read a data-driven discussion on the plight of internally displaced persons (IDPs) in Nigeria, and see the real power of data science and data visualization.
Nigeria, Open Data, Refugees
- Reinforcement Learning and the Internet of Things - Aug 5, 2016.
Gain an understanding of how reinforcement learning can be employed in the Internet of Things world.
Brandon Rohrer, Internet of Things, IoT, Reinforcement Learning, Richard Sutton
- Contest 2nd Place: Automated Data Science and Machine Learning in Digital Advertising - Aug 4, 2016.
This post is an overview of an automated machine learning system in the digital advertising realm. It is an entrant and second-place recipient in the recent KDnuggets blog contest.
Advertising, Automated, Automated Data Science, Automated Machine Learning, Claudia Perlich, Machine Learning
- Contest 2nd Place: Automating Data Science - Aug 3, 2016.
This post discusses some considerations, options, and opportunities for automating aspects of data science and machine learning. It is the second place recipient (tied) in the recent KDnuggets blog contest.
Algorithms, Automated, Automated Data Science, Feature Selection, Machine Learning
- What Statistics Topics are Needed for Excelling at Data Science? - Aug 2, 2016.
Here is a list of skills and statistical concepts suggested for excelling at data science, roughly in order of increasing complexity.
Bayesian, Distribution, Machine Learning, Markov Chains, Probability, Regression, Statistics
- Doing Statistics with SQL - Aug 2, 2016.
This post covers how to perform some basic in-database statistical analysis using SQL.
SQL, Statistics
- And the Winner is… Stepwise Regression - Aug 1, 2016.
This post evaluates several methods for automating the feature selection process in large-scale linear regression models and shows that for marketing applications the winner is Stepwise regression.
Automated Data Science, Feature Selection, Linear Regression, Machine Learning, Predictive Analytics
- The Core of Data Science - Aug 1, 2016.
This post provides a simplifying framework, an ontology for Machine Learning, and some important developments in dynamical machine learning. From first-hand Data Science product experience, the author suggests how best to execute Data Science projects.
Bayesian, Data Science, Data Science Team, Ontology
- Dataiku DSS 3.1 – Now with 5 ML Backends & Scala! - Aug 1, 2016.
Introducing Dataiku DSS 3.1, with new visual machine learning engines that allow users to create incredibly powerful predictive applications within a code-free interface.
Data Science, Dataiku, Machine Learning, Scala
- Yann LeCun Quora Session Overview - Aug 1, 2016.
Here is a quick overview, with excerpts, of the Yann LeCun Quora Session which took place on Thursday July 28, 2016.
Deep Learning, Generative Adversarial Network, Quora, Yann LeCun