- What is the Role of the Activation Function in a Neural Network? - Aug 30, 2016.
Confused as to exactly what the activation function in a neural network does? Read this overview, and check out the handy cheat sheet at the end.
Linear Regression, Logistic Regression, Neural Networks
- Data Mining Tip: How to Use High-cardinality Attributes in a Predictive Model - Aug 29, 2016.
High-cardinality nominal attributes can be difficult to include in predictive models. There are, however, a few ways to do so, which are put forward here.
Feature Engineering, Feature Selection, Predictive Models
- Cartoon: Data Scientist – the sexiest job of the 21st century until … - Aug 27, 2016.
This Data Scientist thought that he had the sexiest job of the 21st century until the arrival of the competition ...
Automated, Automated Data Science, Cartoon, Tom Davenport
- MDL Clustering: Unsupervised Attribute Ranking, Discretization, and Clustering - Aug 26, 2016.
MDL Clustering is a free software suite for unsupervised attribute ranking, discretization, and clustering based on the Minimum Description Length principle and built on the Weka Data Mining platform.
Clustering, Feature Selection, Java, Unsupervised Learning, Weka
- The top 5 Big Data courses to help you break into the industry - Aug 25, 2016.
Here is an updated and in-depth review of the top 5 providers of Big Data and Data Science courses: Simplilearn, Cloudera, Big Data University, Hortonworks, and Coursera.
Big Data, Cloudera, Coursera, Data Science Education, Hortonworks, Online Education, Simplilearn
- A Tutorial on the Expectation Maximization (EM) Algorithm - Aug 25, 2016.
This is a short tutorial on the Expectation Maximization algorithm and how it can be used to estimate parameters for multivariate data.
Clustering, Data Science, Data Science Education, Predictive Analytics, Statistics
- Introduction to Local Interpretable Model-Agnostic Explanations (LIME) - Aug 25, 2016.
Learn about LIME, a technique to explain the predictions of any machine learning classifier.
Algorithms, Classifier, Explanation, Interpretability, LIME, Machine Learning, Prediction
- A Gentle Introduction to Bloom Filter - Aug 24, 2016.
A Bloom filter is a probabilistic data structure that trades off space against false positive rate. Read more, and see an implementation from scratch, in this post.
Algorithms, Efficiency, Python
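To make the space versus false-positive tradeoff concrete, here is a minimal Python sketch of a Bloom filter. It is not the post's own implementation: the slot count `m`, the hash count `k`, and the salted-SHA-256 hashing scheme are assumptions chosen for illustration, and `expected_false_positive_rate` uses the standard approximation (1 - e^(-kn/m))^k.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: an m-slot bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                # number of bit slots
        self.k = k                # number of hash functions
        self.bits = bytearray(m)  # one byte per slot, kept simple for readability

    def _positions(self, item: str):
        # Derive k slot indices by salting SHA-256 with the hash index (an illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


def expected_false_positive_rate(m: int, k: int, n: int) -> float:
    # Standard approximation after inserting n items: (1 - e^(-k*n/m))^k
    return (1 - math.exp(-k * n / m)) ** k


bf = BloomFilter(m=1000, k=5)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)

print(bf.might_contain("apple"))   # always True: inserted items are never missed
print(bf.might_contain("durian"))  # usually False; True here would be a false positive
print(expected_false_positive_rate(m=1000, k=5, n=3))
```

Growing `m` lowers the false positive rate at the cost of memory; for a given `m` and number of items `n`, the rate is minimized near k = (m/n) ln 2. A production filter would pack bits rather than spend a byte per slot.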
- A simple approach to anomaly detection in periodic big data streams - Aug 24, 2016.
We describe a simple and scalable algorithm that can detect rare and potentially irregular behavior in a time series with periodic patterns. It performs similarly to Twitter's more complex approach.
Anomaly Detection, Apache Spark, BMW, Time Series, Twitter
- Data Science of Reviews: ReviewMeta tool Automatically Detects Unnatural Reviews on Amazon - Aug 23, 2016.
ReviewMeta is a tool that analyzes millions of reviews and helps customers decide which ones to trust. As the dataset grows, so do the insights on unbiased reviews.
Amazon, Analytics, Customer Analytics, Data Mining, Trends
- How to Become a (Type A) Data Scientist - Aug 23, 2016.
This post outlines the difference between a Type A and Type B data scientist, and prescribes a learning path on becoming a Type A.
Advice, Data Science, Data Scientist, Internet of Things, IoT
- A Neat Trick to Increase Robustness of Regression Models - Aug 22, 2016.
Read this take on the validity of choosing a different approach to regression modeling. Why isn't the L1 norm used more often?
CleverTap, Linear Regression, Outliers, Overfitting, Regression
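The post's own method isn't reproduced here. As a minimal illustration of why an L1 (absolute-error) loss resists outliers better than the usual L2 (squared-error) loss, the Python sketch below fits only a constant to hypothetical toy data with one outlier. For a constant model, the L2 minimizer is the mean and the L1 minimizer is the median.

```python
import statistics

# Fit the simplest regression model, a constant prediction c, to toy data containing one outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]


def l2_loss(c: float) -> float:
    # Sum of squared errors: large residuals dominate because they are squared.
    return sum((v - c) ** 2 for v in y)


def l1_loss(c: float) -> float:
    # Sum of absolute errors: every residual contributes linearly.
    return sum(abs(v - c) for v in y)


# Brute-force search over candidate constants 0.0, 0.1, ..., 100.0.
grid = [i / 10 for i in range(1001)]
best_l2 = min(grid, key=l2_loss)
best_l1 = min(grid, key=l1_loss)

print(best_l2, statistics.mean(y))    # 22.0 22.0 -> the L2 fit is the mean, pulled toward 100
print(best_l1, statistics.median(y))  # 3.0 3.0 -> the L1 fit is the median, barely moved
```

With slopes as well as an intercept, L1 fitting is usually done by linear programming or iteratively reweighted least squares rather than a grid search; the grid here only keeps the example dependency-free.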
- How to Become a Data Scientist – Part 1 - Aug 22, 2016.
Check out this excellent (and exhaustive) article on becoming a data scientist, written by someone who spends their day recruiting data scientists. Do yourself a favor and read the whole way through. You won't regret it!
Career, Data Science, Data Science Skills, Data Scientist, Skills
- Misinformation Key Terms, Explained - Aug 20, 2016.
Misinformation has emerged as a key issue for social media platforms. This post introduces the concept of misinformation and 8 key terms that provide insights into mining misinformation in social media.
Explained, Key Terms, Social Media, Social Media Analytics
- The Gentlest Introduction to Tensorflow – Part 2 - Aug 19, 2016.
Check out the second and final part of this introductory tutorial to TensorFlow.
Beginners, Deep Learning, Gradient Descent, Machine Learning, TensorFlow
- Top Machine Learning Projects for Julia - Aug 19, 2016.
Julia is gaining traction as a legitimate alternative programming language for analytics tasks. Learn more about these 5 machine learning related projects.
Deep Learning, Julia, Machine Learning, Open Source, scikit-learn
- The 10 Algorithms Machine Learning Engineers Need to Know - Aug 18, 2016.
Read this introductory list of important contemporary machine learning algorithms that every engineer should understand.
Algorithms, Machine Learning, Supervised Learning, Unsupervised Learning
- Approaching (Almost) Any Machine Learning Problem - Aug 18, 2016.
If you're looking for an overview of how to approach (almost) any machine learning problem, this is a good place to start. Read on as a Kaggle competition veteran shares his pipelines and approach to problem-solving.
Advice, Feature Selection, Kaggle, Machine Learning, Modeling
- Does Data Scientist Mean What You Think It Means? - Aug 16, 2016.
Do we have an accurate idea of what "data scientist" actually means? Read this thought-provoking opinion on the topic.
Career, Data Scientist
- Central Limit Theorem for Data Science – Part 2 - Aug 16, 2016.
This post continues an explanation of Central Limit Theorem started in a previous post, with additional details... and beer.
Beer, Centrality, Distribution, Statistics
- Cartoon: Make Data Great Again - Aug 13, 2016.
This KDnuggets cartoon imagines a speech that a certain presidential candidate might give on the topic of Big Data.
Cartoon, Donald Trump, Politics
- Central Limit Theorem for Data Science - Aug 12, 2016.
This post is an introductory explanation of the Central Limit Theorem, and why it is (or should be) of importance to data scientists.
Centrality, Distribution, Statistics
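A quick way to see the Central Limit Theorem at work is to simulate it. The Python sketch below is an illustration under our own assumptions (Uniform(0, 1) draws, samples of size 30, 5,000 repetitions, a fixed seed), not code from the post: it checks that the sample means center on 0.5, have standard deviation close to sqrt(1/12 / n), and fall within one predicted standard deviation about 68% of the time, as a normal distribution would.

```python
import random
import statistics

random.seed(0)


def sample_means(n: int, trials: int) -> list[float]:
    # Each trial: draw n values from Uniform(0, 1) and record their mean.
    return [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]


n, trials = 30, 5000
means = sample_means(n, trials)

# CLT prediction: the sample mean is approximately Normal(mu, sigma^2 / n),
# with mu = 0.5 and sigma^2 = 1/12 for Uniform(0, 1).
predicted_sd = (1 / 12 / n) ** 0.5
print(f"mean of sample means: {statistics.fmean(means):.3f} (theory 0.500)")
print(f"sd of sample means:   {statistics.stdev(means):.4f} (theory {predicted_sd:.4f})")

# Rough normality check: about 68% of the means should lie within one predicted SD of 0.5.
within_1sd = sum(abs(m - 0.5) < predicted_sd for m in means) / trials
print(f"fraction within 1 SD: {within_1sd:.3f} (normal approx 0.683)")
```

The uniform distribution is used only because it is clearly not normal; swapping in another distribution with finite variance should give the same qualitative picture.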
- Understanding the Empirical Law of Large Numbers and the Gambler’s Fallacy - Aug 12, 2016.
The law of large numbers is an important concept for practising data scientists. In this post, the empirical law of large numbers is demonstrated via a simple simulation approach using the Bernoulli process.
Algorithms, R, Statistics
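The post works in R; the sketch below is a Python translation of the same idea rather than the author's code. It flips a simulated fair coin (a Bernoulli process with p = 0.5), tracks the running proportion of heads, and then checks the gambler's fallacy directly. The seed, trial counts, and streak length of five are arbitrary choices.

```python
import random

random.seed(42)


def running_proportion(p: float, n: int) -> list[float]:
    """Simulate n Bernoulli(p) trials and return the running proportion of successes."""
    successes = 0
    proportions = []
    for i in range(1, n + 1):
        successes += random.random() < p  # a True comparison adds 1
        proportions.append(successes / i)
    return proportions


props = running_proportion(p=0.5, n=100_000)
for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {k:>7,} flips: proportion of heads = {props[k - 1]:.4f}")

# Gambler's fallacy check: flips are independent, so the flip right after
# five heads in a row is still heads about half the time.
flips = [random.random() < 0.5 for _ in range(200_000)]
after_streak = [flips[i] for i in range(5, len(flips)) if all(flips[i - 5:i])]
p_after_streak = sum(after_streak) / len(after_streak)
print(f"P(heads | previous 5 heads) ~ {p_after_streak:.3f}")
```

The running proportion converges because early deviations are diluted by ever more trials, not because later flips compensate for them; the conditional check makes that second point explicit.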
- 5 EBooks to Read Before Getting into A Data Science or Big Data Career - Aug 11, 2016.
A short, carefully-curated list of 5 free ebooks to help you better understand what Data Science is all about and how you can best prepare for a career in data science, big data, and data analysis.
Big Data, Free ebook, Hadoop, Programming Languages, Simplilearn, Tableau
- A Beginner’s Guide to Neural Networks with R! - Aug 11, 2016.
In this article we will learn how Neural Networks work and how to implement them with the R programming language! We will see how we can easily create Neural Networks with R and even visualize them. A basic understanding of R is necessary to follow this article.
Beginners, Neural Networks, R, Udemy
- Visualizing 1 Billion Points of Data: Doing It Right – Aug 18 Webinar - Aug 11, 2016.
Join Continuum Analytics on August 18 for a webinar on Big Data visualization with the datashader library. Save your spot today!
Continuum Analytics, Data Visualization, Jupyter, Python
- Big Data Key Terms, Explained - Aug 11, 2016.
Just getting started with Big Data, or looking to iron out the wrinkles in your current understanding? Check out these 20 Big Data-related terms and their concise definitions.
3Vs of Big Data, Apache Spark, Big Data, Business Intelligence, Cloud Computing, Data Warehouse, Explained, Hadoop, Key Terms, Predictive Analytics
- 7 Steps to Understanding Computer Vision - Aug 9, 2016.
A starting point for Computer Vision, and how to go deeper. Dive into this post for an overview of the right resources and a little bit of advice.
7 Steps, Computer Vision, Deep Learning, Neural Networks, Python
- Short course: Statistical Learning and Data Mining IV, Washington, DC, Oct 19-20 - Aug 8, 2016.
This new two-day course gives a detailed and modern overview of statistical models used by data scientists for prediction and inference, including sparse models and deep learning.
Data Mining, DC, R, Robert Tibshirani, Statistical Learning, Trevor Hastie, Washington
- Cartoon: Facebook data science experiments and Cats - Aug 8, 2016.
In honor of International Cat Day, we revisit a KDnuggets cartoon that looks at the Facebook data science experiment on emotion manipulation and the importance of happy kittens.
Cartoon, Cats, Data Science, Facebook
- Understanding the Bias-Variance Tradeoff: An Overview - Aug 8, 2016.
A model's ability to minimize bias and its ability to minimize variance are often thought of as two opposing ends of a spectrum. Understanding these two types of error is critical to diagnosing model results.
Bias, Cross-validation, Model Performance, Variance
- Brain Monitoring with Kafka, OpenTSDB, and Grafana - Aug 5, 2016.
Interested in using open source software to monitor brain activity, and control your devices? Sure you are! Read this fantastic post for some insight and direction.
Brain, Internet of Things, IoT, Kafka, Monitoring
- Contest Winner: Winning the AutoML Challenge with Auto-sklearn - Aug 5, 2016.
This post won first place in the recent KDnuggets blog contest. Auto-sklearn is an open-source Python tool that automatically determines effective machine learning pipelines for classification and regression datasets. It is built around the successful scikit-learn library and won the recent AutoML challenge.
Automated, Automated Data Science, Automated Machine Learning, Competition, Hyperparameter, scikit-learn, Weka
- Nigeria: Telling Internally Displaced Persons Stories Using Visual Data and Infographics - Aug 5, 2016.
Read a data-driven discussion on the plight of internally displaced persons (IDPs) in Nigeria, and see the real power of data science and data visualization.
Nigeria, Open Data, Refugees
- Reinforcement Learning and the Internet of Things - Aug 5, 2016.
Gain an understanding of how reinforcement learning can be employed in the Internet of Things world.
Brandon Rohrer, Internet of Things, IoT, Reinforcement Learning, Richard Sutton
- Contest 2nd Place: Automated Data Science and Machine Learning in Digital Advertising - Aug 4, 2016.
This post is an overview of an automated machine learning system in the digital advertising realm. It is an entrant and second-place recipient in the recent KDnuggets blog contest.
Advertising, Automated, Automated Data Science, Automated Machine Learning, Claudia Perlich, Machine Learning
- Contest 2nd Place: Automating Data Science - Aug 3, 2016.
This post discusses some considerations, options, and opportunities for automating aspects of data science and machine learning. It is the second place recipient (tied) in the recent KDnuggets blog contest.
Algorithms, Automated, Automated Data Science, Feature Selection, Machine Learning
- What Statistics Topics are Needed for Excelling at Data Science? - Aug 2, 2016.
Here is a list of skills and statistical concepts suggested for excelling at data science, roughly in order of increasing complexity.
Bayesian, Distribution, Machine Learning, Markov Chains, Probability, Regression, Statistics
- Doing Statistics with SQL - Aug 2, 2016.
This post covers how to perform some basic in-database statistical analysis using SQL.
SQL, Statistics
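As a hedged illustration of in-database statistics (not the queries from the post, and not necessarily its database), the sketch below uses SQLite through Python's built-in `sqlite3` module with a hypothetical `measurements` table. Core SQLite has no standard-deviation aggregate, so the sample variance is derived from sums in SQL and the square root is taken in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL)")  # hypothetical example table
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(x,) for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]],
)

# Sample variance from aggregates: (sum(x^2) - n * mean^2) / (n - 1)
row = conn.execute(
    """
    SELECT COUNT(value),
           AVG(value),
           MIN(value),
           MAX(value),
           (SUM(value * value) - COUNT(value) * AVG(value) * AVG(value))
               / (COUNT(value) - 1)
    FROM measurements
    """
).fetchone()

n, mean, min_value, max_value, sample_var = row
sample_sd = sample_var ** 0.5  # square root computed outside SQL for portability
print(n, mean, min_value, max_value, sample_var, sample_sd)
```

This one-pass sum-of-squares formula can lose precision when values are large relative to their spread; a two-pass query that subtracts the mean first is numerically safer.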
- And the Winner is… Stepwise Regression - Aug 1, 2016.
This post evaluates several methods for automating the feature selection process in large-scale linear regression models and shows that for marketing applications the winner is Stepwise regression.
Automated Data Science, Feature Selection, Linear Regression, Machine Learning, Predictive Analytics
- The Core of Data Science - Aug 1, 2016.
This post provides a simplifying framework, an ontology for Machine Learning, and some important developments in dynamical machine learning. Drawing on first-hand Data Science product experience, the author suggests how best to execute Data Science projects.
Bayesian, Data Science, Data Science Team, Ontology
- Dataiku DSS 3.1 – Now with 5 ML Backends & Scala! - Aug 1, 2016.
Introducing Dataiku DSS 3.1, with new visual machine learning engines that allow users to create incredibly powerful predictive applications within a code-free interface.
Data Science, Dataiku, Machine Learning, Scala
- Yann LeCun Quora Session Overview - Aug 1, 2016.
Here is a quick overview, with excerpts, of the Yann LeCun Quora Session which took place on Thursday July 28, 2016.
Deep Learning, Generative Adversarial Network, Quora, Yann LeCun
- Data Science of Visiting Famous Movie Locations in San Francisco - Jul 30, 2016.
Using the Google Places API and IMDb API, we selected movie locations in The Golden City which every movie fan should visit while they are in town, and optimized sightseeing by solving the travelling salesman problem.
CA, Data Science, Google, IMDb, Python, San Francisco
- Theoretical Data Discovery: Using Physics to Understand Data Science - Jul 29, 2016.
Data science may be a relatively recent buzzword, but the collection of tools and techniques to which it refers comes from a broad range of disciplines. Physics has a wealth of concepts to learn from, as evidenced in this piece.
Data Science, Physics, Quantum Computing
- Build vs Buy – Analytics Dashboards - Jul 29, 2016.
Read this post on choosing between available analytics dashboard options, and designing your own. Get an informed opinion.
Analytics, Dashboard
- Data Science Statistics 101 - Jul 28, 2016.
Statistics can often be the most intimidating aspect of data science for aspiring data scientists to learn. Gain some personal perspective from someone who has traveled the path.
Beginners, Data Science, Statistics
- 7 Steps to Understanding NoSQL Databases - Jul 27, 2016.
Are you a newcomer to NoSQL, interested in gaining a real understanding of the technologies and architectures it includes? This post is for you.
7 Steps, Cassandra, Database, Graph Databases, HBase, MongoDB, Neo4j, NoSQL
- Internet of Things Key Terms, Explained - Jul 27, 2016.
This post defines 12 Key Terms for the Internet of Things in a straightforward manner.
API, Explained, Industrial Internet, Internet of Things, IoT, Key Terms
- Would You Survive the Titanic? A Guide to Machine Learning in Python Part 2 - Jul 26, 2016.
This is part 2 of a 3 part introductory series on machine learning in Python, using the Titanic dataset.
Machine Learning, Python, Titanic
- Data Science for Beginners 1: The 5 questions data science answers - Jul 26, 2016.
A series of videos and write-ups covering the basics of data science for beginners. This first video is about the kinds of questions that data science can answer.
Beginners, Data Science, Microsoft, Question answering
- Would You Survive the Titanic? A Guide to Machine Learning in Python Part 1 - Jul 25, 2016.
Check out the first of a 3 part introductory series on machine learning in Python, fueled by the Titanic dataset. This is a great place to start for a machine learning newcomer.
Machine Learning, Python, scikit-learn, Titanic
- 35 Open Source tools for Internet of Things - Jul 25, 2016.
If you have heard about the Internet of Things many times by now, it's time to join the conversation. Explore the many open source tools & projects related to the Internet of Things.
Internet of Things, IoT, Open Source, Tools
- SAS vs R vs Python: Which Tool Do Analytics Pros Prefer? - Jul 22, 2016.
There are lots of flame wars involving different data science and analytics tools... but this isn't one of them. Check out the quantitative results and analysis of a Burtch Works survey on the subject.
Burtch Works, Python, R, SAS, Survey
- Building a Data Science Portfolio: Machine Learning Project Part 1 - Jul 20, 2016.
Dataquest's founder has put together a fantastic resource on building a data science portfolio. This first of three parts lays the groundwork, with subsequent posts over the following 2 days. Very comprehensive!
Advice, Career, Data Science, Data Scientist, Dataquest, Machine Learning, Portfolio, Project, Python
- Multi-Task Learning in Tensorflow: Part 1 - Jul 20, 2016.
A discussion and step-by-step tutorial on how to use Tensorflow graphs for multi-task learning.
Machine Learning, Neural Networks, TensorFlow
- In Deep Learning, Architecture Engineering is the New Feature Engineering - Jul 19, 2016.
A discussion of architecture engineering in deep neural networks, and its relationship with feature engineering.
Architecture, Deep Learning, Feature Engineering, Neural Networks
- What the Next Generation of IoT Sensors Have in Store - Jul 19, 2016.
This post is an overview of some of the next-generation IoT sensors, and what they could mean for our future.
Internet of Things, IoT, Sensors
- MNIST Generative Adversarial Model in Keras - Jul 19, 2016.
This post discusses and demonstrates the implementation of a generative adversarial network in Keras, using the MNIST dataset.
GANs, Generative Models, Keras, MNIST
- Statistical Data Analysis in Python - Jul 18, 2016.
This tutorial, presented as a set of IPython notebooks, introduces the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects.
IPython, Jupyter, Pandas, Python, Statistical Analysis
- Why Big Data is in Trouble: They Forgot About Applied Statistics - Jul 18, 2016.
This "classic" (but very topical and certainly relevant) post discusses issues that Big Data can face when it forgets, or ignores, applied statistics. It is as great a discussion today as it was 2 years ago.
Applied Statistics, Big Data, Google, Statistics
- Predictive Analytics Introductory Key Terms, Explained - Jul 18, 2016.
Here is a collection of introductory predictive analytics terms and concepts, presented for the newcomer in a straightforward, no-frills definition style.
Book, Eric Siegel, Explained, Key Terms, Predictive Analytics
- America’s Next Topic Model - Jul 15, 2016.
Topic modeling is a great way to get a bird's-eye view of a large document collection using machine learning. Here are 3 ways to use the open source Python tool Gensim to choose the best topic model.
LDA, NLP, Python, Text Mining, Topic Modeling, Unsupervised Learning
- 10 Algorithm Categories for AI, Big Data, and Data Science - Jul 14, 2016.
With a focus on leveraging algorithms and balancing human and AI capital, here are the top 10 algorithm categories used to implement A.I., Big Data, and Data Science.
AI, Algorithms, Big Data, Data Science
- How to Start Learning Deep Learning - Jul 14, 2016.
Want to get started learning deep learning? Sure you do! Check out this great overview, advice, and list of resources.
Andrej Karpathy, Coursera, Deep Learning, edX, Geoff Hinton, Neural Networks
- What Data Scientists Can Learn From Qualitative Research - Jul 14, 2016.
Find out what data scientists can learn from qualitative researchers when it comes to analysing text, and how this relates to writing quality code.
Programming, Qualitative Analytics, Qualitative Research, Text Analytics
- Bayesian Machine Learning, Explained - Jul 13, 2016.
Want to know about Bayesian machine learning? Sure you do! Get a great introductory explanation here, as well as suggestions where to go for further study.
Bayesian, Explained, LDA, Machine Learning
- TalkingData Data Science Competition: understand mobile users - Jul 12, 2016.
A unique opportunity to solve complex real-world big data challenges for the Chinese mobile market: predict users' demographic characteristics based on their app usage, geolocation, and mobile device properties.
China, Competition, Kaggle, Mobile, TalkingData, Turi
- 5 Deep Learning Projects You Can No Longer Overlook - Jul 12, 2016.
There are a number of "mainstream" deep learning projects out there, but many more niche projects flying under the radar. Have a look at 5 such projects worth checking out.
C++, Deep Learning, Javascript, Machine Learning, Neural Networks, Overlook, Python
- The Hard Problems AI Can’t (Yet) Touch - Jul 11, 2016.
It's tempting to consider the progress of AI as though it were a single monolithic entity, advancing towards human intelligence on all fronts. But today's machine learning only addresses problems with simple, easily quantified objectives.
AI, Machine Learning, Optimization, Reinforcement Learning, Supervised Learning
- Top Machine Learning MOOCs and Online Lectures: A Comprehensive Survey - Jul 11, 2016.
This post reviews Machine Learning MOOCs and online lectures for both the novice and expert audience.
Andrew Ng, Coursera, Deep Learning, edX, Machine Learning, MOOC, Nando de Freitas, Tom Mitchell, Udacity
- New Book: Effective CRM using Predictive Analytics – get 20% discount - Jul 11, 2016.
A comprehensive step-by-step guide to designing, setting up, executing and deploying data mining techniques in marketing. Use code VBM93 for 20% discount.
Book, CRM, Predictive Analytics, Wiley
- Big Data, Bible Codes, and Bonferroni - Jul 8, 2016.
This discussion focuses on 2 particular statistical issues to be on the lookout for in your own work and in the work of others mining and learning from Big Data, with real-world examples emphasizing the importance of statistical processes in practice.
Bible, Big Data, Bonferroni, Probability, Statistics, Terrorism
- Streamlining Analytic Deployment: Inside the FICO Decision Management Suite 2.0 - Jul 8, 2016.
This post explains what’s new in the 2.0 version of the FICO Decision Management Suite, and how it can be used by data scientists and others to create stronger customer relationships and provide strategic competitive advantage.
Decision Management, Decision Support, Deployment, FICO
- Support Vector Machines: A Simple Explanation - Jul 7, 2016.
A no-nonsense, 30,000-foot overview of Support Vector Machines, concisely explained with some great diagrams.
Aylien, Explanation, Machine Learning, Support Vector Machines
- Interview: Florian Douetteau, Dataiku Founder, on Empowering Data Scientists - Jul 7, 2016.
Here is an interview with Florian Douetteau, founder of Dataiku, on how their tools empower data scientists, and how data science itself is evolving.
Ajay Ohri, API, Data Science Tools, Dataiku, Florian Douetteau, Python, R
- Deep Residual Networks for Image Classification with Python + NumPy - Jul 7, 2016.
This post outlines the results of an innovative Deep Residual Network implementation for Image Classification using Python and NumPy.
Deep Learning, Neural Networks, numpy, Python
- Storytelling: The Power to Influence in Data Science - Jul 6, 2016.
Data scientists need to share results, which is different from talking shop with other data scientists. Read about influencing people and telling stories as a data scientist.
Communication, Data Science, Storytelling
- Success Criteria for Process Mining - Jul 6, 2016.
This article describes common pitfalls and offers advice to help make your first process mining project as successful as it can be.
Process Mining, Success
- Mining Twitter Data with Python Part 7: Geolocation and Interactive Maps - Jul 6, 2016.
The final part of this 7 part series explores using geolocation and interactive maps with Twitter data.
Data Visualization, Geo-Localization, Javascript, Python, Social Media, Social Media Analytics, Text Mining, Twitter
- 3 Key Ethics Principles for Big Data and Data Science - Jul 6, 2016.
If ethics in general are important, should ethics training be a crucial element of the data science field?
Big Data, Data Science, Ethics, Hui Xiong
- Mining Twitter Data with Python Part 6: Sentiment Analysis Basics - Jul 5, 2016.
Part 6 of this series builds on the previous installments by exploring the basics of sentiment analysis on Twitter data.
Python, Sentiment Analysis, Social Media, Social Media Analytics, Text Mining, Twitter
- Data Mining History: The Invention of Support Vector Machines - Jul 4, 2016.
The story starts in Paris in 1989, when I benchmarked neural networks against kernel methods, but the real invention of SVMs happened when Bernhard decided to implement Vladimir Vapnik's algorithm.
History, Isabelle Guyon, Support Vector Machines, SVM, Vladimir Vapnik
- What is Softmax Regression and How is it Related to Logistic Regression? - Jul 1, 2016.
An informative exploration of softmax regression and its relationship with logistic regression, and situations in which each would be applicable.
Logistic Regression, Machine Learning, Regression
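As a minimal sketch (not code from the post), the Python snippet below checks the relationship numerically: with two classes, the softmax probability of class 1 reduces to the logistic sigmoid of the score difference, because e^(z1) / (e^(z0) + e^(z1)) = 1 / (1 + e^(-(z1 - z0))). The scores used are arbitrary.

```python
import math


def softmax(z: list[float]) -> list[float]:
    # Subtracting the max before exponentiating avoids overflow and leaves the result unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


# Two classes: the class-1 softmax probability equals sigmoid(z1 - z0),
# so binary logistic regression is softmax regression with K = 2.
z0, z1 = 0.3, 1.7
p_softmax = softmax([z0, z1])[1]
p_logistic = sigmoid(z1 - z0)
print(p_softmax, p_logistic)

# More than two classes: softmax still returns a probability distribution.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

Only score differences matter to softmax, which is why a two-class model needs just one weight vector: fixing z0 = 0 recovers ordinary logistic regression.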
- Text Mining 101: Topic Modeling - Jul 1, 2016.
We introduce the concept of topic modelling and explain two methods: Latent Dirichlet Allocation and TextRank. The techniques are ingenious in how they work – try them yourself.
LDA, Text Mining, TextRank, Topic Modeling
- Recursive (not Recurrent!) Neural Networks in TensorFlow - Jun 30, 2016.
Learn how to implement recursive neural networks in TensorFlow, which can be used to learn tree-like structures, or directed acyclic graphs.
Neural Networks, TensorFlow
- Mining Twitter Data with Python Part 5: Data Visualisation Basics - Jun 29, 2016.
Part 5 of this series takes on data visualization, as we look to make sense of our data and highlight interesting insights.
D3.js, Data Visualization, Python, Social Media, Social Media Analytics, Text Mining, Twitter
- The Big Data Ecosystem is Too Damn Big - Jun 28, 2016.
The Big Data ecosystem is just too damn big! It's complex, redundant, and confusing. There are too many layers in the technology stack, too many standards, and too many engines. Vendors? Too many. What is the user to do?
Analytics, Big Data, Business Analytics
- 5 More Machine Learning Projects You Can No Longer Overlook - Jun 28, 2016.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects.
Computer Vision, Data Preparation, Data Preprocessing, Javascript, Machine Learning, Natural Language Processing, NLP, Overlook, Python
- Mining Twitter Data with Python Part 4: Rugby and Term Co-occurrences - Jun 27, 2016.
Part 4 of this series employs some of the lessons learned thus far to analyze tweets related to rugby matches and term co-occurrences.
Python, Social Media, Social Media Analytics, Text Mining, Twitter
- Improving Nudity Detection and NSFW Image Recognition - Jun 25, 2016.
This post discusses improvements made on a tricky machine learning classification problem: nude and/or NSFW, or not?
Algorithmia, Algorithms, Classification
- Regularization in Logistic Regression: Better Fit and Better Generalization? - Jun 24, 2016.
A discussion on regularization in logistic regression, and how its usage plays into better model fit and generalization.
Cost Function, Logistic Regression, Machine Learning, Regression, Regularization
- Top Machine Learning Libraries for Javascript - Jun 24, 2016.
Javascript may not be the conventional choice for machine learning, but there is no reason it cannot be used for such tasks. Here are the top libraries to facilitate machine learning in Javascript.
Andrej Karpathy, Convolutional Neural Networks, Deep Learning, Javascript, Machine Learning, Neural Networks
- Ten Simple Rules for Effective Statistical Practice: An Overview - Jun 23, 2016.
An overview of 10 simple rules to follow to ensure effective statistical data analysis.
Advice, Data Quality, Noise, Replication, Reproducibility, Statistical Analysis
- Machine Learning Trends and the Future of Artificial Intelligence - Jun 22, 2016.
The confluence of data flywheels, the algorithm economy, and cloud-hosted intelligence means every company can now be a data company, every company can now access algorithmic intelligence, and every app can now be an intelligent app.
Algorithmia, Algorithms, Artificial Intelligence, Cloud, Machine Intelligence, Machine Learning
- History of Data Mining - Jun 22, 2016.
Data mining is a subfield of computer science which blends many techniques from statistics, data science, database theory and machine learning. Here are the major milestones and “firsts” in the history of data mining plus how it’s evolved and blended with data science and big data.
About Gregory Piatetsky, Alan Turing, Bayes Theorem, Data Mining, DJ Patil, History, Vladimir Vapnik
- New Andrew Ng Machine Learning Book Under Construction, Free Draft Chapters - Jun 20, 2016.
Check out the details on Andrew Ng's new book on building machine learning systems, and find out how to get your free copy of draft chapters as they are written.
Andrew Ng, Book, Free ebook, Machine Learning
- What is Your Data Worth? On LinkedIn, Microsoft, and the Value of User Data - Jun 20, 2016.
The recent announcement of Microsoft’s acquisition of LinkedIn has raised many questions about how Microsoft will monetize this data. We examine LinkedIn value per user and compare to Google, Facebook, Yahoo, and Twitter.
Business Value, Facebook, Google, LinkedIn, Microsoft, Yahoo
- Political Data Science: Analyzing Trump, Clinton, and Sanders Tweets and Sentiment - Jun 18, 2016.
This post shares some results of political text analytics performed on Twitter data. How negative are the US Presidential candidate tweets? How does the media mention the candidates in tweets? Read on to find out!
Bernie Sanders, Donald Trump, Hillary Clinton, ParseHub, Politics, Sentiment Analysis, Twitter
- A Visual Explanation of the Back Propagation Algorithm for Neural Networks - Jun 17, 2016.
A concise explanation of backpropagation for neural networks is presented in elementary terms, along with explanatory visualization.
Algorithms, Backpropagation, Explanation, Machine Learning, Neural Networks
- How open API economy accelerates the growth of big data and analytics - Jun 17, 2016.
An open API is available on the internet for free. We review the growth of the API economy and how organizations have been realizing the potential of open APIs to transform their business.
API, Big Data Analytics, Open Data
- Thinking About Analytics Readiness - Jun 16, 2016.
This article touches upon an important but under-discussed topic of analytics readiness, including whether and when organizations should engage in analytics.
Analytics, Analytics Strategy, Culture, Strategy
- Nutrition & Principal Component Analysis: A Tutorial - Jun 16, 2016.
A great overview of Principal Component Analysis (PCA), with an example application in the field of nutrition.
Algobeans, Feature Selection, Food, Nutrition, PCA
- 7 Steps to Mastering SQL for Data Science - Jun 16, 2016.
Follow these 7 steps to go from SQL data science newbie to seasoned practitioner quickly. No nonsense, just the necessities.
7 Steps, Data Science, Database, Relational Databases, SQL
- Mining Twitter Data with Python Part 1: Collecting Data - Jun 15, 2016.
Part 1 of a 7 part series focusing on mining Twitter data for a variety of use cases. This first post lays the groundwork, and focuses on data collection.
Python, Social Media, Social Media Analytics, Twitter
- 10 Data Acquisition Strategies for Startups - Jun 14, 2016.
An interesting discussion of the myriad methods in which startups may choose to acquire data, often the most overlooked and important aspect of a startup's success (or failure).
Acquisitions, Crowdsourcing, Datasets, Startups
- Machine Learning Classic: Parsimonious Binary Classification Trees - Jun 14, 2016.
Get your hands on a classic technical report outlining a three-step method for constructing binary decision trees for multiple classification problems.
Decision Trees, Leo Breiman, Machine Learning, Statistics
- How to Select Support Vector Machine Kernels - Jun 13, 2016.
Support Vector Machine kernel selection can be tricky, and is dataset dependent. Here is some advice on how to proceed in the kernel selection process.
Machine Learning, Support Vector Machines
- Apache Spark Key Terms, Explained - Jun 13, 2016.
An overview of 13 core Apache Spark concepts, presented with focus and clarity in mind. A great beginner's overview of essential Spark terminology.
Apache Spark, Databricks, Dataset, Explained, Key Terms, RDD, Tungsten
- AIG & Zurich on Machine Learning in Insurance - Jun 10, 2016.
Where and how can machine learning be practically applied by insurers? And is it worth it? Read the white paper from insurance experts at AIG and Zurich.
AIG, Insurance, Machine Learning, White Paper
- Top NoSQL Database Engines - Jun 10, 2016.
An overview of the top 5 NoSQL database engines in use today, including examples of key-value, column-oriented, graph, and document paradigms.
Cassandra, Database, HBase, MongoDB, Neo4j, NoSQL
- Cloud Computing Key Terms, Explained - Jun 9, 2016.
A concise overview of 20 core cloud computing ecosystem concepts. The focus here is on the terminology, not The Big Picture.
AWS, Cloud, Cloud Computing, Explained, Key Terms, PaaS, SaaS
- 5 Best Practices for Big Data Security - Jun 9, 2016.
A lack of data security can not only result in financial losses but also damage an organization's reputation. Take a look at some of the most important data security best practices that can reduce the risks associated with analyzing massive amounts of data.
Best Practices, Big Data, Security
- Where are the Opportunities for Machine Learning Startups? - Jun 8, 2016.
Machine learning has permeated data-driven businesses, which means almost all businesses. Here are a few areas where it’s possible that big corporations haven’t already eaten everybody’s lunch.
Machine Learning, Startup
- Data Science of Variable Selection: A Review - Jun 7, 2016.
There are as many approaches to selecting features as there are statisticians, since every statistician and their sibling has a POV or a paper on the subject. This is an overview of some of these approaches.
Algorithms, Big Data, Feature Selection, Statistics
- Big Data Business Model Maturity Index and the Internet of Things (IoT) - Jun 7, 2016.
This post explores how organizations could use the Big Data Business Model Maturity Index (BDBMMI) to exploit the Internet of Things (IoT).
Big Data, Internet of Things, IoT, Maturity Model
- R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results - Jun 6, 2016.
R remains the leading tool, with a 49% share, but Python is growing faster and has almost caught up with R. RapidMiner remains the most popular general Data Science platform. Big Data tools are used by almost 40%, and Deep Learning usage has doubled.
Data Mining Software, Data Science Platform, Poll, Python, Python vs R, R, RapidMiner, SQL
- Ethics in Machine Learning – Summary - Jun 6, 2016.
Still worried about the AI apocalypse? Here we discuss the constraints and ethics for machine learning algorithms that could help prevent it.
AI, Ethics, Machine Learning, MLconf, Seattle, WA
- What is the Difference Between Deep Learning and “Regular” Machine Learning? - Jun 3, 2016.
Another concise explanation of a machine learning concept by Sebastian Raschka. This time, Sebastian explains the difference between Deep Learning and "regular" machine learning.
Convolutional Neural Networks, Deep Learning
- 5 Reasons Machine Learning Applications Need a Better Lambda Architecture - Jun 2, 2016.
The Lambda Architecture enables continuous processing of real-time data. It is a painful process that gets the job done, but at a great cost. Here is a simplified solution called Lambda-R (ƛ-R), for Relational Lambda.
Applications, Lambda Architecture, Machine Learning, Monte Zweben, Splice Machine
- Udacity Nanodegree Programs: Machine Learning, Data Analyst, and more - Jun 1, 2016.
Develop new skills. Be in demand. Accelerate your career with the credential that fast-tracks you to career success.
Machine Learning, Online Education, Udacity
- Top 10 Open Dataset Resources on Github - May 31, 2016.
The top open dataset repositories on Github include a variety of data, freely available for use by researchers, practitioners, and students alike.
Datasets, GitHub, Machine Learning, Open Data
- Predicting Popularity of Online Content - May 30, 2016.
A look at predicting what makes online content popular, with a particular focus on images, especially selfies.
Prediction, Selfie
- Free eBook: Healthcare Social Media Analytics and Marketing - May 27, 2016.
Get your free copy of a new ebook outlining social media marketing and analytics strategies (including code) for healthcare professionals.
Free ebook
- A Concise Overview of Standard Model-fitting Methods - May 27, 2016.
A very concise overview of 4 standard model-fitting methods, focusing on their differences: closed-form equations, gradient descent, stochastic gradient descent, and mini-batch learning.
Cost Function, Gradient Descent, Machine Learning, Sebastian Raschka
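The four model-fitting approaches named above can be contrasted on the simplest possible case. This is a minimal sketch, not the author's code. It assumes one-feature linear regression with a squared-error cost and noiseless toy data generated from y = 3x + 1. The function names, learning rate, and epoch count are illustrative choices. A single hypothetical `minibatch_gd` function covers three methods: full-batch gradient descent (`batch_size = len(xs)`), stochastic gradient descent (`batch_size = 1`), and mini-batch learning (anything in between).

```python
import random

def closed_form(xs, ys):
    """Ordinary least squares for y = w*x + b via the normal-equation solution."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w = sxy / sxx
    return w, my - w * mx

def minibatch_gd(xs, ys, batch_size, lr=0.1, epochs=2000, seed=0):
    """Gradient descent on 0.5 * mean((w*x + b - y)^2) over shuffled mini-batches.

    batch_size == len(xs) -> batch gradient descent
    batch_size == 1       -> stochastic gradient descent
    otherwise             -> mini-batch learning
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            m = len(batch)
            errors = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = sum(e * xs[i] for e, i in zip(errors, batch)) / m
            grad_b = sum(errors) / m
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [3.0 * x + 1.0 for x in xs]  # hypothetical noiseless data
    print("closed form:", closed_form(xs, ys))
    for name, bs in [("batch GD", len(xs)), ("mini-batch", 2), ("SGD", 1)]:
        print(name, minibatch_gd(xs, ys, batch_size=bs))
```

Because the toy data is noiseless, all variants end up at the same solution. With noisy data and a fixed learning rate, SGD and small mini-batches typically keep fluctuating around the minimum instead of settling on it exactly. This is one reason decaying learning rates are often used.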
- 5 Ways in Which Big Data Can Help Leverage Customer Data - May 25, 2016.
Every business enterprise realizes the importance of big data but rarely puts the customer data it possesses to good use. Here are a few ways enterprises can leverage customer data.
Analytics, Big Data, Data Management, Data Mining
- Let Me Hear Your Voice and I’ll Tell You How You Feel - May 24, 2016.
This post provides an overview of a voice tone analyzer implemented as part of a cohesive emotion detection system, directly from the researcher and architect.
Artificial Intelligence, Deep Learning, Emotion
- 10 Must Have Data Science Skills, Updated - May 23, 2016.
An updated look at the state of the data science landscape, and the skills - both technical and non-technical - that are absolutely required to make it as a data scientist.
Advice, Books, Data Science Skills, Data Scientist, MOOC
- How to Explain Machine Learning to a Software Engineer - May 20, 2016.
How do you explain what machine learning is to the uninitiated software engineer? Read on for one perspective on doing so.
Automating, Machine Learning, Software Engineer
- 5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.
We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.
Data Cleaning, Deep Learning, Machine Learning, Open Source, Overlook, Pandas, Python, scikit-learn, Theano