  • Recurrent Neural Networks Tutorial, Introduction

    Recurrent Neural Networks (RNNs) are popular models that have shown great promise in NLP and many other Machine Learning tasks. Here is a much-needed guide to key RNN models and a few brilliant research papers.

  • How Can Big Data Help in Home Health Care?

    Proper home care services can reduce both the likelihood and the cost of hospitalization and help manage illness. Understand what big data promises for the healthcare sector and the practical hurdles that stand in the way of current solutions.

  • Top /r/MachineLearning Posts, September: Implement a neural network from scratch in C++

    A neural network in C++ for beginners, Chinese character handwriting recognition that beats humans, a handy machine learning algorithm cheat sheet, neural nets versus functional programming, and a neural nets paper repository.

  • Crushed it! Landing a data science job

    Data scientist interviews depend on the company and the team: they may resemble a software developer's interview or a statistician's interview. Here we have collected some hot tips to pass along if you're thinking about a move soon.

  • What Types of Questions Can Data Science Answer?

    Data science has enabled us to solve complex and diverse problems by using machine learning and statistical algorithms. Here we enumerate the common applications of supervised, unsupervised, and reinforcement learning techniques.

  • Data Lake vs Data Warehouse: Key Differences

    We hear a lot about data lakes these days, and many argue that a data lake is the same as a data warehouse. In reality, they are optimized for different purposes, and the goal is to use each one for what it was designed to do.

  • BabelNet 3.5, the Largest Multilingual Dictionary and Semantic Network

    BabelNet 3.5 covers 272 languages, and offers an improved user interface, new integrated resources of Wikiquote, VerbNet, Microsoft Terminology, GeoNames, WoNeF and ImageNet, and a very large knowledge base with over 380 million semantic relations.

  • Topological Analysis and Machine Learning: Friends or Enemies?

    What is the interaction between Topological Data Analysis (TDA) and Machine Learning? A case study shows how TDA decomposition of the data space provides useful features for improving Machine Learning results.

  • The Master Algorithm – new book by top Machine Learning researcher Pedro Domingos

    Wonderfully erudite, humorous, and easy to read, The Master Algorithm by top Machine Learning researcher Pedro Domingos takes you on a journey to visit the 5 tribes of Machine Learning experts and helps you understand what the Master Algorithm might be.

  • 15 Mathematics MOOCs for Data Science

    The essential mathematics necessary for Data Science can be acquired with these 15 MOOCs, with a strong emphasis on applied algebra & statistics.

  • SentimentBuilder: Visual Analysis of Unstructured Texts

    Sankey diagrams are mainly used to visualize flows such as energy flows, material flows, and trade-offs. SentimentBuilder found a way to use them with unstructured text in its online NLP tool.

  • Top 10 Quora Machine Learning Writers and Their Best Advice

    Top Quora machine learning writers give their advice on pursuing a career in the field, academic research, and selecting and using appropriate technologies.

  • Top 10 Quora Data Science Writers and Their Best Advice

    Top Quora data science writers give their advice on pursuing a career in the field, approaching interviews, and selecting appropriate technologies.

  • The 123 Most Influential People in Data Science

    We used the LittleBird algorithm to build a true Data Science influencer network by measuring how often influencers retweet other influencers. Top influencers include @hmason, @kdnuggets, @kaggle, @peteskomoroch, @mrogati, and @KirkDBorne.

  • Big Data Monetization Lessons from Zillow

    In the current tsunami of "Big Data", every business wants to get value out of its data. Here we share lessons learned by the new real estate websites that have brought together Big Data sets, home buyers, and home sellers.

  • A Great Way to Learn Data Science by Simply Doing It

    There are tons of great online resources out there that can help you master data science. Here is a comprehensive list of data science course providers along with links to their data science courses.

  • Data Science Data Architecture

    Data scientists are a rare breed who juggle data science, business, and IT. However, they understand less IT than an IT person and less business than a business person, which demands a specific workflow and data architecture.

  • Salaries by Roles in Data Science and Business Intelligence

    Data Scientist is the hottest role. What's next? We present national average salaries, job title progression in career, job trends and skills for popular job titles in Data Science & Business Intelligence. Check out the salaries of related roles.

  • Spark SQL for Real-Time Analytics

    Apache Spark is the hottest topic in Big Data. This tutorial discusses why Spark SQL is becoming the preferred method for Real-Time Analytics and for the next frontier, IoT (Internet of Things).

  • 60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning, Python, R, and more

    Here is a great collection of eBooks written on the topics of Data Science, Business Analytics, Data Mining, Big Data, Machine Learning, Algorithms, Data Science Tools, and Programming Languages for Data Science.

  • How to Balance the Five Analytic Dimensions

    When developing a solution, one has to consider data complexity, speed, analytic complexity, accuracy & precision, and data size. It is not possible to be the best in all categories, but it is necessary to understand the trade-offs.

  • The one language a Data Scientist must master

    Getting started with data science and wondering which language to pick up and which technology to explore? That is secondary: every business is structured differently, and understanding it and building on top of it is the crux of data science.

  • Big Data Influence on Data-Driven Advertising

    More and more companies are relying on big data for their data-driven initiatives. Using a survey conducted by BlueKai, we try to capture its impact on advertising strategies.

  • How to become a Data Scientist for Free

    Here are the most required skills for a data scientist position based on ReSkill’s analyses of thousands of job posts and free resources to learn each skill.

  • Gartner 2015 Hype Cycle: Big Data is Out, Machine Learning is in

    Which are the most hyped technologies today? Check out Gartner's latest 2015 Hype Cycle Report. Autonomous cars & IoT stay at the peak while big data is losing its prominence. Smart Dust is a new cool technology for the next decade!

  • Data Hierarchy of Needs

    The Data Hierarchy of Needs helps explain the steps in Big Data processing. Before moving on to advanced data modeling (the top of the pyramid), organizations need to fill the huge holes they frequently have at the base of the pyramid, such as the lack of a reliable, complete data flow.

  • Paradoxes of Data Science

    There are many paradoxes, ironies, and disconnects in today's world of data science: pain points, things ignored, shoved under the rug, denied, or paid lip service.

  • Poll Results: Where is Big Data? For most, Largest Dataset Analyzed is in laptop-size GB range

    A majority of data scientists (56%) work in Gigabyte dataset range. We note a small increase in Petabyte (web-scale) data miners, and a decline in Megabyte data miners. US, Australia/NZ, and Asia lead in percentage of Terabyte and Petabyte analysts.

  • Big Idea To Avoid Overfitting: Reusable Holdout to Preserve Validity in Adaptive Data Analysis

    Big Data makes it all too easy to find spurious "patterns" in data. A new approach helps avoid overfitting by using 2 key ideas: validation should not reveal any information about the holdout data, and a small amount of noise should be added to any validation result.

  • Recycling Deep Learning Models with Transfer Learning

    Deep learning exploits gigantic datasets to produce powerful models. But what can we do when our datasets are comparatively small? Transfer learning by fine-tuning deep nets offers a way to leverage existing datasets to perform well on new tasks.

  • 11 things to know about Sentiment Analysis

    Seth Grimes, a text analytics guru, shares 11 key observations on what works, what is past, what is coming, and what to keep in mind while doing sentiment analysis.

  • 3D Data Sculptures: a New Way to Visualize Data

    3D printing can go beyond printing products like iPod cases, or butterfly earrings, and can offer a sustainable way to understand strategic DATA by printing decision support landscapes.

  • R Programming: Who, Where and What

    The "sexiest job" has the sexiest demand, and R is one of its practitioners' leading weapons. Here we try to capture how these unicorns are distributed, and where you can move if you want great opportunities.

  • Three Essential Components of a Successful Data Science Team

    A Data Science team, carefully constructed with the right set of dedicated professionals, can prove to be an asset to any organization.

  • Understanding Basic Concepts and Dispersion

    In analytics, it is common practice to understand the basic statistical properties of the variables, namely range, mean, and deviation. Centrality measures are the most important of these; explore how to use them.

  • Five Steps to Implement an Enterprise Data Lake

    This guide helps you initiate a new IT culture mapped to your business goals, and shows how to create an efficient data reservoir, what makes data more useful, and which cutting-edge tools/devices/applications you need.

  • How Long Should You Stay at Your Analytics Job?

    Considering the huge demand for data scientists, many are thinking of switching jobs for a better profile and salary. But there are some things to consider, such as the interval between two switches, acquiring new skills, and your loyalty.

  • Patterns for Streaming Realtime Analytics

    Design patterns are well known for solving recurring problems in software engineering; along similar lines, we can define Streaming Realtime Analytics patterns and avoid reinventing the wheel. Here are the major patterns we found.

  • Cartoon: Big Data and the dog question

    It used to be that nobody on the internet knew that I was a dog ... New KDnuggets cartoon examines the dog question in the era of Big Data.

  • New Standard Methodology for Analytical Models

    Traditional methods for analytical modelling, such as CRISP-DM, have several shortcomings. Here we describe these friction points in CRISP-DM and introduce a new approach, the Standard Methodology for Analytics Models, which overcomes them.

  • Data is Ugly – Tales of Data Cleaning

    Whether you want to do business analytics or build deep learning models, getting correct data and cleansing it appropriately remains the major task. Find out experts' opinions on how to make data cleansing and collection efforts efficient.

  • An Easy Way to Create Algorithm Visualizations

    Google's DeepDream project, which allows you to visualize deep learning neural networks, has gone viral. It highlights the need for a generalized algorithm visualization tool; in this post we introduce one such effort.

  • Interview: Thanigai Vellore, on Delivering Contextually Relevant Search Experience

    We discuss the role of Analytics at, the polyglot data architecture at, the use cases for Hadoop, vendor selection, supporting semantic search and experience with Avro.

  • Book: Healthcare Data Analytics

    Written by prominent researchers and experts working in the healthcare domain, this book provides a clear understanding of the analytical techniques currently available to solve healthcare problems.

  • Top June stories: Top 20 Python Machine Learning Projects; Which Big Data, Data Mining Tools go together?

    Top 20 Python Machine Learning Open Source Projects; Which Big Data, Data Mining, and Data Science Tools go together?; Popular Deep Learning Tools - a review; Why Does Deep Learning Work?

  • Deep Learning Adversarial Examples – Clarifying Misconceptions

    Google scientist clarifies misconceptions and myths around Deep Learning Adversarial Examples, including: they do not occur in practice, Deep Learning is more vulnerable to them, they can be easily solved, and human brains make similar mistakes.

  • 50+ Data Science and Machine Learning Cheat Sheets

    Get up to speed and keep Data Science & Data Mining concepts and commands handy with these cheat sheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark, and Machine Learning algorithms.

  • How to properly present a Data Mining project?

    Building models and getting insights is only half the job for a data scientist; presenting them to the audience is an art in itself. See how to approach the presentation after wrapping up a data science project.

  • Can Deep Learning Help you Find the Perfect Girl? – Part 2

    Using Deep Learning to find the perfect match, PhD student Harm de Vries describes the process of data collection and analysis. Finally, the results from the matching algorithm are compared to human assessment for identifying an individual's dating preferences.

  • Can deep learning help find the perfect date?

    When a Machine Learning PhD student at the University of Montreal starts using Tinder, he soon realises that something is missing in the dating app: the ability to predict which girls he is attracted to. Harm de Vries applies Deep Learning to assist in the pursuit of the perfect match.

  • Emacs for Data Science

    Data science nowadays demands a polyglot developer, and choosing the right code editor is a worthwhile investment. Here we present the important features of Emacs and its advantages over other editors.

  • Dataiku Data Science Studio – intuitive solution for data professionals

    Data Science Studio (DSS) from Dataiku is an intuitive software solution that lets data professionals harness the power of big data. The latest version, DSS 2.0, brings predictive analytics to a whole new level in terms of collaboration and usability.

  • Deep Learning and the Triumph of Empiricism

    Theoretical guarantees are clearly desirable. And yet many of today's best-performing supervised learning algorithms offer none. What explains the gap between theoretical soundness and empirical success?

  • Top stories for Jun 28 – Jul 4: Top 20 R packages by popularity; Nine Laws of Data Mining

    Top 20 R packages by popularity; Top 20 R Machine Learning and Data Science packages; Nine Laws of Data Mining; The missing D in Data Science.

  • Data Science and Big Data: Two very Different Beasts

    Creating an artifact from ore requires tools, craftsmanship, and science. The same is true of big data and data science; here we present the distinguishing factors between the ore and the artifact.

  • Using Ensembles in Kaggle Data Science Competitions – Part 3

    Earlier, we showed how to create stacked ensembles with stacked generalization and out-of-fold predictions. Now we'll learn how to implement various stacking techniques.

  • Excellent Tutorial on Sequence Learning using Recurrent Neural Networks

    An excellent tutorial explaining Recurrent Neural Networks (RNNs), which hold great promise for learning general sequences and have applications in text analysis, handwriting recognition, and even machine translation.

  • Open Source Enabled Interactive Analytics: An Overview

    Explains how to create an interactive, data-driven dashboard using open source technologies: MongoDB, D3.js, DC.js, and Node.js.

  • Using Ensembles in Kaggle Data Science Competitions – Part 2

    Aspiring to be a Top Kaggler? Learn more methods, like Stacking & Blending. In the previous post we discussed ensembling models by weighting, averaging, and ranking. There is much more to explore in Part 2!

  • Top 20 R packages by popularity

    Wondering which R packages are the most popular? Here's an analysis of the most downloaded R packages from Jan to May 2015 to identify the top trending packages in the R world!

  • Top 20 R Machine Learning and Data Science packages

    We list the top 20 popular Machine Learning R packages by analysing the most downloaded R packages from Jan-May 2015.

  • Top 10 Machine Learning Videos on YouTube

    The top machine learning videos on YouTube include lecture series from Stanford and Caltech, Google Tech Talks on deep learning, using machine learning to play Mario and Hearthstone, and detecting NHL goals from live streams.

  • Popular Deep Learning Tools – a review

    Deep Learning is the hottest trend now in AI and Machine Learning. We review the popular software for Deep Learning, including Caffe, Cuda-convnet, Deeplearning4j, Pylearn2, Theano, and Torch.

  • In Machine Learning, What is Better: More Data or Better Algorithms?

    The gross over-generalization that "more data gives better results" is misleading. Here we explain in which scenarios more data or more features help and in which they do not, and how the choice of algorithm affects the end result.

  • Interview: Joseph Babcock, Netflix on Genie, Lipstick, and Other In-house Developed Tools

    We discuss the role of analytics in content acquisition, data architecture at Netflix, organizational structure, and open-source tools from Netflix.

  • Interview: Joseph Babcock, Netflix on Discovery and Personalization from Big Data

    We discuss the steps involved in the Discovery process at Netflix, the impact of the multitude of devices, system-generated logs, and surprising insights.

  • Cognitive Computing: Solving the Big Data Problem?

    With a shortage of data scientists, what are the alternatives for making sense of Big Data? We examine Cognitive Computing, its strengths, and how it can fit into the current Big Data landscape.

  • Which Big Data, Data Mining, and Data Science Tools go together?

    We analyze the associations between the top Big Data, Data Mining, and Data Science tools based on the results of the 2015 KDnuggets Software Poll. Download the anonymized data and analyze it yourself.

  • Love, Sex and Predictive Analytics

    Here we try to understand how dating sites work, the algorithms they use, and the role of predictive analytics in matchmaking. We have also gleaned some interesting analytical insights from them.

  • Top 30 Social Network Analysis and Visualization Tools

    We review major tools and packages for Social Network Analysis and visualization, which have wide applications including biology, finance, sociology, network theory, and many other domains.

  • Top 20 Python Machine Learning Open Source Projects

    We examine the top Python Machine Learning open source projects on GitHub, in terms of both contributors and commits, and identify the most popular and most active ones.

  • Applied Statistics Is A Way Of Thinking, Not Just A Toolbox

    The choice of tools in applied statistics is driven by the objective, the structure of the data, and the nature of the uncertainty in the numbers, whereas in academic statistics it's driven by publishing or teaching. Here we present some common statistical tools and their overlapping genealogy.

  • Insights from Data Science Handbook

    Here you can find the perspectives of lead data scientists on topics ranging from the definition of data science and metrics selection to work ethics, the art of storytelling, and why data science is important in today's world.

  • Miner3D Data Visualization System Version 8

    The new software features a redesigned user interface, making it a perfect complement to Excel. The new graphics visualization engine is faster and smoother.

  • KDnuggets™ News 15:n17, May 27: R wins Annual Poll; Top 10 Algorithms; Interview with Spark Creator

    R leads RapidMiner, Python catches up - Annual Software Poll; Top 10 Data Mining Algorithms; Exclusive Interview: Matei Zaharia, creator of Apache Spark; 5 Not-to-be-Missed Ideas about Big Data.

  • Dark Knowledge Distilled from Neural Network

    Geoff Hinton never stopped generating new ideas. This post is a review of his research on “dark knowledge”. What’s that supposed to mean?

  • R vs Python for Data Science: The Winner is …

    In the battle of "best" data science tools, Python and R both have their pros and cons. Selecting one over the other will depend on the use cases, the cost of learning, and other common tools required.

  • R leads RapidMiner, Python catches up, Big Data tools grow, Spark ignites

    R is the most popular overall tool among data miners, although Python usage is growing faster. RapidMiner continues to be most popular suite for data mining/data science. Hadoop/Big Data tools usage grew to 29%, propelled by 3x growth in Spark. Other tools with strong growth include H2O (0xdata), Actian, MLlib, and Alteryx.

  • Exclusive Interview: Matei Zaharia, creator of Apache Spark, on Spark, Hadoop, Flink, and Big Data in 2020

    Apache Spark is one of the hottest Big Data technologies in 2015. KDnuggets talks to Matei Zaharia, creator of Apache Spark, about key things to know about it, why it is not a replacement for Hadoop, how it is better than Flink, and his vision for Big Data in 2020.

  • Top 10 Data Mining Algorithms, Explained

    The top 10 data mining algorithms, selected by top researchers, are explained here, including what they do, the intuition behind each algorithm, available implementations, why to use them, and interesting applications.

  • I’ve Been Replaced by an Analytics Robot

    A veteran statistician reflects on the journey from the statistician of the past to the data scientist of today, how the work he used to do became automated, and what future data scientists can expect.

  • Most Viewed Data Mining Videos on YouTube

    The top Data Mining YouTube videos, from the likes of Google and Revolution Analytics, cover topics ranging from statistics in data mining to using R for data mining to data mining in sports.

  • How to Lead a Data Science Contest without Reading the Data

    We examine a "wacky" boosting method that lets you climb the public leaderboard without even looking at the data. But there is a catch, so read on before trying to win Kaggle competitions with this approach.

  • Data Science for Workforce Optimization: Reducing Employee Attrition

    Predictive analytics is extending its reach; see how it is affecting the workforce analytics domain. In this presentation, Pasha Roberts explains what it offers students, managers, and practitioners.

  • Surprising Random Correlations

    An interesting demo showing how easy it is to find surprising correlations in real data. Is German unemployment rate related to Apple Stock? Is 10-year Treasury rate related to price of Red Winter Wheat? You will be surprised.

  • Seven Techniques for Data Dimensionality Reduction

    Performing data mining with high-dimensional data sets: a comparative study of different dimensionality reduction techniques, such as Missing Values Ratio, Low Variance Filter, PCA, Random Forests / Ensemble Trees, etc.

  • Plotly: Online Dashboards That Update Your Data and Graphs

    New online visualization option from allows you to have data visualizations and graphs that update dynamically.

  • Machine Learning Wars: Amazon vs Google vs BigML vs PredicSis

    Comparing 4 Machine Learning APIs (Amazon Machine Learning, BigML, Google Prediction API, and PredicSis) on real data from Kaggle, we find the most accurate, the fastest, the best trade-off, and a surprise last place.

  • Cartoon: Data Scientist Mother

    We revisit a KDnuggets cartoon that looks at the Mother of All Data. Enjoy, and don't forget the mothers in your life: Big Data predicted that 67.53% of you would remember!

  • Most Viewed Big Data Videos on YouTube

    The top Big Data YouTube videos, from the likes of Hortonworks and Kirk D. Borne, cover diverse topics including Hadoop, Big Data Trends, Deep Learning, and Big Data Leadership.

  • The Inconvenient Truth About Data Science

    Data is never clean, you will spend most of your time cleaning and preparing data, 95% of tasks do not require deep learning, and more inconvenient wisdom.

  • Data Scientists Automated and Unemployed by 2025?

    Will Data Scientists be unemployed by 2025? The majority of voters in the latest KDnuggets Poll expect expert-level Data Science to be automated in 10 years or less.

  • Top LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science – Discussions up, Engagement down

    While discussions are growing, the comments and engagements are falling, especially since 2012. We cluster groups into 4 quadrants by activity level and identify most active and engaged groups. Open groups are twice as active as closed.

  • WebDataCommons – the Data and Framework for Web-scale Mining

    The WebDataCommons project extracts the largest publicly available hyperlink graph, large product-, address-, recipe-, and review corpora, as well as millions of HTML tables from the Common Crawl web corpus and provides the extracted data for public download.

  • How To Become a Data Scientist And Get Hired

    A data scientist should be able to choose the right technology, understand the business context, and solve a wide range of problems. To hire the right data scientist, check the list of tips in the post.

  • The Myth of Model Interpretability

    Deep networks are widely regarded as black boxes. But are they truly uninterpretable in any way that logistic regression is not?

  • New Hybrid Rare-Event Sampling Technique for Fraud Detection

    The proposed hybrid sampling methodology may prove useful when building and validating machine learning models for applications where the target event is rare, such as fraud detection.

  • Data Mining: New Comprehensive Textbook by Charu Aggarwal

    This comprehensive data mining textbook explores the different aspects of data mining, from basics to advanced, and their applications, and may be used for both introductory and advanced data mining courses.

  • Top 10 R Packages to be a Kaggle Champion

    Kaggle top ranker Xavier Conort shares insights on the “10 R Packages to Win Kaggle Competitions”.

  • Algorithmia Tested: Human vs Automated Tag Generation

    Algorithmia, the marketplace for algorithms, can be a platform for hosting APIs that perform a plethora of text analytics and information retrieval tasks. This case study performs automatic post tagging to demonstrate the platform's effectiveness and ease of use.

  • Algorithmia: Building a web site explorer in 5 easy steps

    We show how to use Algorithmia for quickly building a functional web site explorer in 5 steps: GetLinks, PageRank, Url2text, Summarizer and AutoTag.

  • Cartoon: A solution for Data Scientists' allergies caused by Big Data

    With more and more allergies and a big trend towards gluten-free everything, a new KDnuggets cartoon envisions a possible solution for Data Scientists' allergies.

  • Data Science 101: Preventing Overfitting in Neural Networks

    Overfitting is a major problem for Predictive Analytics and especially for Neural Networks. Here is an overview of key methods to avoid overfitting, including regularization (L2 and L1), Max norm constraints and Dropout.

  • Interview: Ksenija Draskovic, Verizon on Dissecting the Anatomy of Predictive Analytics Projects

    We discuss Predictive Analytics use cases at Verizon Wireless, advantages of a unified data view, model selection and common causes of failure.

  • Awesome Public Datasets on GitHub

    A long, categorized list of large datasets (available for public use) to try your analytics skills on. Which one would you pick?

  • Hadoop as a Service: 18 Cloud Options

    Hadoop as a service in the cloud makes big data applications and projects easier to approach and these 18 platforms each provide their own unique solutions.

  • Computing Platforms for Analytics, Data Mining, Data Science

    The poll results suggest a split between a majority of data miners and data scientists who work with growing but still "PC-size", small GB-sized data, and a smaller group of Big Data analysts who work with cloud-sized data. Cloud computing, Unix, and especially Mac gained in popularity.

  • Interview: Bill Moreau, USOC on Evidence-based Medicine to Reduce Sports Injuries

    We discuss the success of Analytics in predicting sports injuries, recent progress in concussion management and the trends in data-driven evidence-based sports medicine.

  • More Free Data Mining, Data Science Books and Resources

    More free resources and online books by leading authors about data mining, data science, machine learning, predictive analytics and statistics.

  • Talking Machine – 3 Deep Learning Gurus Talk about History and Future of Machine Learning, part 1

    A recent interview from the Talking Machines podcast with three deep learning experts. They talk about the neural network winter and its renewal.

  • Do We Need More Training Data or More Complex Models?

    Do we need more training data? Which models will suffer from performance saturation as data grows large? Do we need larger models or more complicated models, and what is the difference?

  • Interview: Brad Klingenberg, StitchFix on Building Analytics-powered Personal Stylist

    We discuss StitchFix, how it leverages Analytics, understanding customer preferences, and pros-and-cons of involving human judgement in the recommendation process.

  • Top KDnuggets tweets, Mar 16-18: 87 Studies shown that accurate numbers aren’t more useful than the ones you make up (Dilbert)

    Also Sirius - a free, open-source version of Siri; #PI art: the first 13,689 digits of pi; Great tutorial + #Python code: 1-Layer Neural Networks.

  • Small Data requires Specialized Deep Learning and Yann LeCun response

    For industries that have relatively small data sets (less than a petabyte), a Specialized Deep Learning approach based on unsupervised learning and domain knowledge is needed.

  • Interview: Vince Darley, on the Serious Analytics behind Casual Gaming

    We discuss key characteristics of social gaming data, ML use cases at King, infrastructure challenges, major problems with A-B testing and recommendations to resolve them.

  • Coursera: Process Mining: Data science in Action, April 2015

    Due to the big success of the first run, this 6-week online course is being repeated on Coursera, starting April 1. This free course provides data science knowledge that can be applied directly to analyze and improve processes in a variety of domains.

  • Deep Learning for Text Understanding from Scratch

    Forget about the meaning of words, forget about grammar, forget about syntax, forget even the very concept of a word. Now let the machine learn everything by itself.

  • Deep Learning, The Curse of Dimensionality, and Autoencoders

    Autoencoders are an extremely exciting new approach to unsupervised learning and for many machine learning tasks they have already surpassed the decades of progress made by researchers handpicking features.

  • SQL-like Query Language for Real-time Streaming Analytics

    We need an SQL-like query language for Realtime Streaming Analytics that is expressive, short, and fast, defines core operations that cover 90% of problems, and is easy to follow and learn.

  • Machine Learning Table of Elements Decoded

    Machine learning packages for Python, Java, Big Data, Lua/JS/Clojure, Scala, C/C++, CV/NLP, and R/Julia are represented using a cute but ill-fitting metaphor of a periodic table. We extract the useful links.

  • 7 common mistakes when doing Machine Learning

    In statistical modeling, there are various algorithms to build a classifier, and each algorithm makes a different set of assumptions about the data. For Big Data, it pays off to analyze the data upfront and then design the modeling pipeline accordingly.

  • 10 Predictive Analytics Influencers You Need to Know

    A list of Predictive Analytics Influencers based on Twitter activity around “#PredictiveAnalytics” and “Predictive Analytics”: Gregory Piatetsky, Vineet Vashishta, Aki Kakko and more.

  • The Elements of Data Analytic Style – checklist

    Jeff Leek's book "The Elements of Data Analytic Style" had a rocket launch, thanks to the author's course on Coursera. The book includes a useful checklist that can guide beginning data analysts or serve for evaluating data analyses.

  • IBM Big Data & Analytics Heroes

    IBM's Big Data & Analytics Heroes include leaders in the field who propel the industry by promoting thought leadership and progress in Big Data Analytics.

  • Interview: David Kasik, Boeing on Data Analysis vs Data Analytics

    We discuss the impact of the increasing amount of data on visualization, the difference between Data Analysis and Data Analytics, motivation, trends, desired skills, and more.

  • Google BigQuery Public Datasets

    Google BigQuery is not only a fantastic tool to analyze data, but it also has a repository of public data, including GDELT world events database, NYC Taxi rides, GitHub archive, Reddit top posts, and more.

  • Fun and Top! US States in 2 Words using twitteR

    Combining twitteR package with text mining techniques and visualization tools can produce interesting outputs. Find out which US state is fun and top, and which is good and crazy, according to Twitter.

  • History of Data Science Infographic in 5 strands

    History of Data Science infographic presents key events in Data Science across 5 strands: Computer Science, Data Technology, Visualization, Mathematics/OR, and Statistics.

  • Automatic Statistician and the Profoundly Desired Automation for Data Science

    The Automatic Statistician project by Univ. of Cambridge and MIT is pushing ahead the frontiers of automation for the selection and evaluation of machine learning models. In general, what does automation mean to Data Science?

