  • Top Spark Ecosystem Projects

    Apache Spark has developed a rich ecosystem, including both official and third-party tools. We take a look at 5 third-party projects which complement Spark in 5 different ways.

  • New Salford Predictive Modeler 8

    Salford Predictive Modeler software suite: Faster. More Comprehensive Machine Learning. More Automation. Better results. Take a giant step forward in your data science productivity with SPM 8. Download and try it today!

  • The Mirage of a Citizen Data Scientist

    The term "citizen data scientist" has been irritating me recently. I explain why I think it is both a bad term and a bad idea, and what we need instead.

  • Dynamic Data Visualization with PHP and MySQL: Election Spending

    Learn how to fetch data from a MySQL database using PHP and create dynamic charts with that data, using an interesting example of New Hampshire primary election spending.

  • Distributed TensorFlow Has Arrived

    Google has open sourced its distributed version of TensorFlow. Get the info on it here, and catch up on some other TensorFlow news at the same time.

  • Data Science and Disability

    Data Science and Artificial Intelligence have come to the forefront of technology in the last few years. Learn how practitioners are taking a more philanthropic outlook on life, supporting people suffering with both physical and mental disabilities.

  • Building Zoomable Line Charts in jQuery

    Learn how to build zoomable line charts using FusionCharts’ core JS library and its jQuery charts plugin, and get started making some beautiful data visualizations for the web.

  • Tree Kernels: Quantifying Similarity Among Tree-Structured Data

    An in-depth, informative overview of tree kernels, both theoretical and practical. Includes a use case and some code after the discussion.

  • A comparison between PCA and hierarchical clustering

    Graphical representations of high-dimensional data sets are the backbone of exploratory data analysis. We examine 2 of the most commonly used methods: heatmaps combined with hierarchical clustering and principal component analysis (PCA).

  • How Small is the World, Really?

    Social network analysis is back in the news again, with a recent Facebook project which determined that there is an average of 3.5 intermediaries between any 2 Facebook users. But this is different from "6 degrees of separation." Read on to find out why, and how.

  • Top 10 Data Visualization Projects on Github

    Github provides a number of open source data visualization options for data scientists and application developers integrating quality visuals. This is a list and description of the top project offerings available, based on the number of stars.

  • How Data Science is Fighting Disease

    Many organisations are starting to use Data Science as a method of tracking, diagnosing and curing some of the world’s most widespread diseases. We look at 3 common diseases, and how Data Science is used to save lives.

  • 21 Must-Know Data Science Interview Questions and Answers, part 2

    Second part of the answers to 20 Questions to Detect Fake Data Scientists, including controlling overfitting, experimental design, tall and wide data, understanding the validity of statistics in the media, and more.

  • Getting Started with Data Visualization

    Data visualization is on the rise nowadays. This step-by-step tutorial covers the process of creating your first data visualization using FusionCharts.

  • Opening Up Deep Learning For Everyone

    Opening deep learning up to everyone is a noble goal. But is it achievable? Should non-programmers and even non-technical people be able to implement deep neural models?

  • Data Lake Plumbers: Operationalizing the Data Lake

    Gain insight into data lakes, their benefits, when they are appropriate, and how to operationalize them. How do they compare to the data warehouse?

  • Big Data Is Driving Your Car

    Never mind driverless cars! Big Data is already hard at work in every aspect of the automotive industry, including safety, design, marketing and more. We look at where Big Data is having an impact on the cars that we are driving.

  • How IBM Watson is Taking on The World

    On the one hand, we have made tremendous progress in the field of data analysis; on the other, our technology is getting smarter. IBM has taken a solid stride in the direction of Artificial Intelligence by unveiling its supercomputer, IBM Watson. Learn what it can do, who its adopters are, and what it holds for the future.

  • Amazon Machine Learning: Nice and Easy or Overly Simple?

    Amazon Machine Learning is a predictive analytics service with binary/multiclass classification and linear regression features. The service is fast, offers a simple workflow but lacks model selection features and has slow execution times.

  • Gartner 2016 Magic Quadrant for Advanced Analytics Platforms: gainers and losers (2016 Silver Blog)

    We compare Gartner 2016 Magic Quadrant Advanced Analytics Platforms vs its 2015 version and identify notable changes for leaders and challengers: SAS, IBM, RapidMiner, KNIME, Dell, Angoss, and Microsoft.

  • The ICLR Experiment: Deep Learning Pioneers Take on Scientific Publishing

    Deep learning pioneers Yann LeCun and Yoshua Bengio have undertaken a grand experiment in academic publishing. Embracing a radical level of transparency and unprecedented public participation, they've created an opportunity not only to find and vet the best papers, but also to gather data about the publication process itself.

  • Data Scientist Valentine’s Day Collection

    We review Data Scientist Valentine's Day options with several topical cartoons, including Scarledoopython, Neural net predictions, and dating algorithm adjustments.

  • Elementary, My Dear Watson! An Introduction to Text Analytics via Sherlock Holmes

    Want to learn about the field of text mining? Go on an adventure with Sherlock & Watson. Here you will find the different sub-domains of text mining, along with a practical example.

  • Scikit Flow: Easy Deep Learning with TensorFlow and Scikit-learn (2016 Silver Blog)

    Scikit Flow is a new easy-to-use interface for TensorFlow from Google based on the Scikit-learn fit/predict model. Does it succeed in making deep learning more accessible?

  • Data Science Skills for 2016

    As demand for the hottest job gets even hotter in the new year, the skill set required for it is getting larger. Here we discuss the skills which will be in high demand for data scientists, including data visualization, Apache Spark, R, Python, and many more.

  • Does Machine Learning allow opposites to attract?

    Most online dating sites use 'Netflix-style' recommendations which match people based on their shared interests and likes. What about those matches that work so well because people are so different? Here is my example.

  • 21 Must-Know Data Science Interview Questions and Answers (2016 Gold Blog)

    KDnuggets Editors bring you the answers to 20 Questions to Detect Fake Data Scientists, including what is regularization, Data Scientists we admire, model validation, and more.

  • Auto-Scaling scikit-learn with Spark

    Databricks gives us an overview of the spark-sklearn library, which automatically and seamlessly distributes model tuning on a Spark cluster, without impacting workflow.

  • 9 Must-Have Datasets for Investigating Recommender Systems

    Gain some insight into a variety of useful datasets for recommender systems, including data descriptions, appropriate uses, and some practical comparison.

  • 4 Reasons Why We Need More Women In Big Data

    Gender imbalance in the workforce has been highlighted with alarm in recent years. Here we give reasons to hire women data scientists, including the inherent advantage and the lack of a stereotype for the role.

  • Top 10 TED Talks for the Data Scientists (2016 Silver Blog)

    TED Talks have been a great platform for sharing ideas and inspirations. Here we have selected ten interesting talks for the data scientist from the statistics, social media, and economics domains.

  • Avoid These Common Data Visualization Mistakes

    Data Visualization is a handy tool which can lead to interesting discoveries about the data that otherwise wouldn’t have been possible. But there are common mistakes which can produce misleading results. Learn what they are and how you can avoid them.

  • Cartoon: Deeper Deep Learning

    New KDnuggets Cartoon looks at a creative new way of achieving even better results and breaking through Machine Learning barriers with an even "deeper" Deep Learning approach.

  • AI Supercomputers: Microsoft Oxford, IBM Watson, Google DeepMind, Baidu Minwa

    In the world of AI, this is the equivalent of the US and USSR competing to put their guy on the moon first. Here is a profile of some of the giants locked into the AI space race.

  • Python Data Science with Pandas vs Spark DataFrame: Key Differences

    A post describing the key differences between Pandas and Spark's DataFrame format, including specifics on important regular processing features, with code samples.

  • Is Deep Learning Overhyped?

    With all of the success that deep learning is experiencing, the detractors and cheerleaders can be seen coming out of the woodwork. What is the real validity of deep learning, and is it simply hype?

  • Deep Learning with Spark and TensorFlow

    The integration of TensorFlow with Spark leverages the distributed framework for hyperparameter tuning and model deployment at scale. Both time savings and improved error rates are demonstrated.

  • Businesses Will Need One Million Data Scientists by 2018

    Deepening shortage of Data Science talent and cybersecurity challenges are trends shaping business in 2016.

  • How to Check Hypotheses with Bootstrap and Apache Spark

    Learn how to leverage bootstrap sampling to test hypotheses, and how to implement it in Apache Spark and Scala with a complete code example.

  • Useful Data Science: Feature Hashing

    Feature engineering plays a major role in solving data science problems. Here we learn about Feature Hashing, or the hashing trick, which is a method for turning arbitrary features into a sparse binary vector.

  • Implementing Your Own k-Nearest Neighbor Algorithm Using Python

    A detailed explanation of one of the most used machine learning algorithms, k-Nearest Neighbors, and its implementation from scratch in Python. Enhance your algorithmic understanding with this hands-on coding exercise.

  • How to Tackle a Lottery with Mathematics

    With mathematical rigor and narrative flair, Adam Kucharski reveals the tangled history of betting and science. The house can seem unbeatable. In this book, Kucharski shows us just why it isn't. Even better, he shows us how the search for the perfect bet has been crucial for the scientific pursuit of a better world.

  • Google Launches Deep Learning with TensorFlow MOOC

    Google and Udacity have partnered for a new self-paced course on deep learning and TensorFlow, starting immediately.

  • Top 2015 KDnuggets Stories on Analytics, Big Data, Data Science, Data Mining, Machine Learning, updated

    R vs Python for Data Science: The Winner is ...; 60+ Free Books on Big Data, Data Science, Data Mining, Machine Learning; Top 20 Python Machine Learning Open Source Projects; 50+ Data Science and Machine Learning Cheat Sheets.

  • Anthony Goldbloom gives you the Secret to winning Kaggle competitions

    Kaggle CEO shares insights on best approaches to win Kaggle competitions, along with a brief explanation of how Kaggle competitions work.

  • Yahoo Releases the Largest-ever Machine Learning Dataset for Researchers

    Are you interested in massive amounts of data for research? Yahoo has just released the largest-ever machine learning dataset to the research community.

  • Research Leaders on Data Mining, Data Science and Big Data key advances, top trends

    Research Leaders in Data Science and Big Data reflect on the most important research advances in 2015 and the key trends expected to dominate throughout 2016.

  • Data Science Humor: Google Analytics, if Applied in Real Life

    From the lighter side: how Google Analytics would look if applied in real life situations.

  • Top 100 Big Data Experts to Follow

    Maptive gives us another list of top Big Data Influencers to check out, including data-driven reasons as to why individuals are included.

  • Top 10 Deep Learning Projects on Github

    The top 10 deep learning projects on Github include a number of libraries, frameworks, and education resources. Have a look at the tools others are using, and the resources they are learning from.

  • Podcasts on AI, Analytics, Big Data, Data Science, Machine Learning

    Becoming a Data Scientist Podcast, started Dec 2015. Behind Data Science Podcast, by Big Cloud. Started April 2020. Brave New World Podcast, a look into ...

  • Free Online Course: Statistical Learning

    With a free MOOC from Stanford, dive into statistical learning with the respected professors who literally wrote the book on it.

  • Attention and Memory in Deep Learning and NLP

    An overview of attention mechanisms and memory in deep neural networks and why they work, including some specific applications in natural language processing and beyond.

  • 7 Steps to Understanding Deep Learning

    There are many deep learning resources freely available online, but it can be confusing knowing where to begin. Go from vague understanding of deep neural networks to knowledgeable practitioner in 7 steps!

  • Understanding Rare Events and Anomalies: Why streaks patterns change

    We often look back at the past year and the overall history of rare events, and then try to extrapolate the future odds of the same rare event from that. We illustrate here that rare past events are of no use in understanding the rarity of the same events in the future!

  • Data Science Resume Tips and Guidelines

    A well-built resume is key to getting through the first door in the process of getting hired as a Data Scientist. Learn how to present yourself as a true data scientist and which pitfalls to avoid.

  • AMA Data Scientist, Jan 13: Jake Porway of DataKind

    Jake Porway is a machine learning and technology enthusiast, and founder of the nonprofit DataKind, which helps organizations use the power of data science in the service of humanity. He will do a Reddit AMA on Jan 13, 2016.

  • Free Book Download: Statistical Learning with Sparsity: The Lasso and Generalizations

    We witness an explosion of Big Data in finance, biology, medicine, marketing, and other fields. This book describes the important statistical ideas for learning from large and sparse data in a common conceptual framework.

  • 20 Questions to Detect Fake Data Scientists (2016 Gold Blog)

    Hiring Data Scientists is no easy job, particularly when there are plenty of fake posers. Here is a handy list of questions to help separate the wheat from the chaff.

  • What questions can data science answer?

    There are only five questions machine learning can answer: Is this A or B? Is this weird? How much/how many? How is it organized? What should I do next? We examine these questions in detail and what they imply for data science.

  • DeepLearningKit – Open Source Deep Learning Framework for Apple iOS, OS X

    We introduce DeepLearningKit, a new deep learning framework for Apple operating systems, developed in Metal and Swift.

  • Software development skills for data scientists

    Data science is not only about building models and sharing insights. Data scientists often have to collaborate in deploying models and sharing them with software developers. Learn which things to keep in mind while doing so.

  • The Art of Data Science: The Skills You Need and How to Get Them

    Learn how to turn the deluge of data into gold through algorithms, feature engineering, reasoning out business value, and ultimately building a data-driven organization.

  • Tour of Real-World Machine Learning Problems

    The tour lists 20 interesting real-world machine learning problems for data science enthusiasts to learn by solving.

  • Lessons from 2 Million Machine Learning Models on Kaggle

    Lessons from Kaggle competitions, including why XGBoost is the top method for structured problems, why Neural Networks and deep learning dominate unstructured problems (visuals, text, sound), and 2 types of problems for which Kaggle is suitable.

  • More Data Science Humor and Cartoons

    More humor and cartoons from Andrii aka San Sanych, #HappyDataScientist.

  • 5 Criteria To Determine If Your Data Is Ready For Serious Data Science

    If your data is large, relevant, accurate, and connected, and you also have a sharp question, you are ready to do some serious data science. If you’re weak on 1-2 points, don’t worry. But if most criteria are not true, you need to do more preparation.

  • Everything You Need to Know about Natural Language Processing

    Natural language processing (NLP) helps computers understand human speech and language. We define the key NLP concepts and explain how it fits in the bigger picture of Artificial Intelligence.

  • Top stories for Dec 13-19: Top 10 Machine Learning Projects on Github; Importance of Data Science for IoT business

    Top 10 Machine Learning Projects on Github; Using Python and R together: main approaches; Importance of Data Science for IoT business; Top 10 Deep Learning Tips, Tricks.

  • 50 Deep Learning Software Tools and Platforms, Updated

    We present the popular software & toolkit resources for Deep Learning, including Caffe, Cuda-convnet, Deeplearning4j, Pylearn2, Theano, and Torch. Explore the new list!

  • Top 10 Machine Learning Projects on Github (2016 Silver Blog)

    The top 10 machine learning projects on Github include a number of libraries, frameworks, and education resources. Have a look at the tools others are using, and the resources they are learning from.

  • Big Data and Data Science for Security and Fraud Detection

    We review big data analytics tools and technologies that combine text mining, machine learning and network analysis for security threat prediction, detection and prevention at an early stage.

  • Top New Features in Orange 3 Data Mining Platform

    The main technical advantage of Orange 3 is its integration with NumPy and SciPy libraries. Other improvements include reading online data, working through queries for SQL and pre-processing.

  • Using Python and R together: 3 main approaches

    Well, if Data Science and Data Scientists cannot decide on what data to choose to help them decide which language to use, here is an article on using BOTH.

  • OpenText Data Digest Dec 4: Data Is Beautiful

    This week we look at the 2015 winners of the “Information Is Beautiful” Awards, including Red vs Blue politics, a World of languages, and Working for a living.

  • Anomaly Detection in Predictive Maintenance with Time Series Analysis

    How can we predict something we have never seen, an event that is not in the historical data? This requires a shift in the analytics perspective! Understand how to standardize the time and perform time series analysis on sensory data.

  • Create or machine-learn fuzzy logic rules for use with an on-line inference engine

    New DocAndys SaaS service supports user-created embeddable Fuzzy Logic Expert Systems. Use rule language Darl to hand-create or machine-learn rule sets from data and use them via REST interfaces.

  • Learning from Hurricanes: Big Data Analytics, Risk, & Data Visualization

    This year, Florida has experienced its 10th consecutive year without a hurricane, which is the longest period without a hurricane strike in modern times. This is worthy of some examination, as it offers us many lessons in Big Data Analytics, Risk, and Data Visualization.

  • Beyond One-Hot: an exploration of categorical variables

    Categorical variables are often coded into numbers for machine learning algorithms by assigning an integer to each category (ordinal coding). Here, we explore different ways of converting a categorical variable and their effects on the dimensionality of data.

  • 50 Useful Machine Learning & Prediction APIs

    We present a list of 50 APIs selected from areas like machine learning, prediction, text analytics & classification, face recognition, language translation etc. Start consuming APIs!

  • Deep Learning Transcends the Bag of Words

    Generative RNNs are now widely popular, with many modeling text at the character level and typically using an unsupervised approach. Here we show how to generate contextually relevant sentences and explain recent work that does it successfully.

  • Make Beautiful Interactive Data Visualizations Easily, Dec 15 Webinar

    Learn how to use the Bokeh interactive visualization framework for open data science to create rich, interactive visualizations in the browser, without writing a line of JavaScript, HTML, or CSS.

  • Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet

    Training deep neural nets can take precious time and resources. By leveraging an existing distributed batch processing framework, SparkNet can train neural nets quickly and efficiently.

  • Sentiment Analysis 101

    Sentiment analysis can be incredibly useful, and can help companies better answer pertinent questions and gain valuable business insights. Sentiment analysis technologies will continue to improve as they become more widely adopted. But what can sentiment analysis do for you?

  • How do Neural Networks Learn?

    Neural networks are generating a lot of excitement, while simultaneously posing challenges to people trying to understand how they work. Visualize how neural nets work from the experience of implementing a real world project.

  • Amazon Top 20 Books in Neural Networks

    These are the most popular neural networks books on Amazon. Perhaps there is something of interest to you here.

  • 5 Tribes of Machine Learning – Questions and Answers

    Leading researcher Pedro Domingos answers questions on 5 tribes of Machine Learning, Master Algorithm, No Free Lunch Theorem, Unsupervised Learning, Ensemble methods, 360-degree recommender, and more.

  • Detecting In-App Purchase Fraud with Machine Learning

    Hacking applications allow users to make in-app purchases for free. With help from a few big games in the GROW data network we were able to build a model that classifies each purchase as real or fraud, with a very high level of accuracy.

  • Career path explained: Big Data Hadoop DEVELOPER to ARCHITECT

    The path to becoming a Big Data and Hadoop Architect is fraught with major challenges and responsibilities, but here is a handy infographic to help you chart out your path.

  • The hardest parts of data science

    The hardest part of data science is not building an accurate model or obtaining good, clean data, but defining feasible problems and coming up with reasonable ways of measuring solutions.

  • Top KDnuggets tweets, Nov 16-22: Dilbert discovers the perfect chart; TensorFlow Disappoints – Google Deep Learning falls shallow

    A standard #graph for any occasion! #Dilbert discovers the perfect chart; TensorFlow Disappoints - Google #DeepLearning falls shallow; All the #BigData tools and how to use them; KDnuggets #DataScience #Cartoon Caption Contest.

  • What is the importance of Dark Data in Big Data world?

    Dark data is a subset of big data, but it constitutes the biggest portion of the total volume of big data collected by organizations in a year. We discuss what opportunities this holds for an organization.

  • Deep Learning for Visual Question Answering

    Here we discuss the Visual Question Answering problem and present neural-network-based approaches to it.

  • 7 Steps to Mastering Machine Learning With Python

    There are many Python machine learning resources freely available online. Where to begin? How to proceed? Go from zero to Python machine learning hero in 7 steps!

  • The different data science roles in the industry

    Data science roles and responsibilities are diverse, and the skills required for them vary considerably. Here we describe the different data science roles, along with the skill set, technical knowledge, and mindset required to carry them out.

  • Getting started with Python and Apache Flink

    Apache Flink is built on top of a distributed streaming dataflow architecture, which helps to crunch high-velocity, high-volume data sets. With version 1.0 it provides a Python API. Learn how to write a simple Flink application in Python.

  • A Statistical View of Deep Learning

    A statistical overview of deep learning, with a focus on testing widely held beliefs, highlighting statistical connections, and the unseen implications of deep learning. The post links to 6 articles covering a number of related topics.

  • Understanding Convolutional Neural Networks for NLP

    Dive into the world of Convolutional Neural Networks (CNN), learn how they work, how to apply them for NLP, and how to tune CNN hyperparameters for best performance.

  • Fast Big Data: Apache Flink vs Apache Spark for Streaming Data

    Real-time stream processing has been gaining momentum in the recent past, and the major tools enabling it are Apache Spark and Apache Flink. Learn, with the help of a case study, about data processing, data flow, and data management using these tools.

  • Data Science of IoT: Sensor fusion and Kalman filters, Part 2

    The second part of this tutorial examines use of Kalman filters to determine context for IoT systems, which helps to combine uncertain measurements in a multi-sensor system to accurately and dynamically understand the physical world.

  • What No One Tells You About Real-Time Machine Learning

    Real-time machine learning has access to a continuous flow of transactional data, but what it really needs in order to be effective is a continuous flow of labeled transactional data, and accurate labeling introduces latency.

  • Topological Data Analysis – Open Source Implementations

    Topological Data Analysis (TDA) is making waves in the analytics community lately, but are there open source options available?

  • 5 Best Machine Learning APIs for Data Science

    Machine Learning APIs make it easy for developers to develop predictive applications. Here we review 5 important Machine Learning APIs: IBM Watson, Microsoft Azure Machine Learning, Google Prediction API, Amazon Machine Learning API, and BigML.

  • Cartoon: It all started with the iPhone answering my email

    New KDnuggets cartoon reacts to recent news that Gmail will use Machine Learning to offer answers to your emails. Here is where it can lead ...

  • Data-Planet Statistical Datasets

    Data-Planet Statistical Datasets provides easy access to an extensive repository of standardized and structured statistical data, with more than 25 billion data points from more than 70 source organizations.

  • Why Deep Learning Works – Key Insights and Saddle Points

    A quality discussion on the theoretical motivations for deep learning, including distributed representation, deep architecture, and the easily escapable saddle point.

  • Overview of Python Visualization Tools

    An overview and comparison of the leading data visualization packages and tools for Python, including Pandas, Seaborn, ggplot, Bokeh, pygal, and Plotly.

  • How Data Science increased the profitability of the e-commerce industry?

    Data Science helps businesses gain a richer understanding of their customers by capturing and integrating information on customers' web behaviour, their life events, what led to the purchase of a product or service, how customers interact with different channels, and more.

  • 6 crazy things Deep Learning and Topological Data Analysis can do with your data

    Want to analyze a high dimensional dataset and you are running out of options? Find out how Deep Learning combined with Topological Data Analysis can do exactly that and more.

  • 5 Warning Signs that Turn Off Data Science Hiring Managers

    Here are some warning signs that will prevent managers from hiring you for a Data Science position. If your resume has one or more of them, make an effort to remove the risk factors.

  • How Big Data is used in Recommendation Systems to change our lives

    Recommendation systems have impacted or even redefined our lives in many ways. They work in well-defined, logical phases: data collection, ratings, and filtering.

  • Integrating Python and R, Part 2: Executing R from Python and Vice Versa

    The second in a series of blog posts that outline the basic strategy for integrating Python and R. Here we concentrate on how the two scripts can be linked together by getting R to call Python and vice versa.

  • Integrating Python and R into a Data Analysis Pipeline, Part 1

    The first in a series of blog posts that outline the basic strategy for integrating Python and R, run through the different steps involved in this process, and give a real example of how and why you would want to do this.

  • We need a statistically rigorous and scientifically meaningful definition of replication

    Replication and confirmation are indispensable concepts that help define scientific facts. It seems that before continuing the debate over replication, we need a statistically meaningful definition of replication.

  • Data Science of IoT: Sensor fusion and Kalman filters, Part 1

    The Kalman filter has numerous applications, including IoT and Sensor fusion, which helps to determine the State of an IoT based computing system based on sensor input.

  • Amazon Top 20 Books in Data Mining

    These are the most popular data mining books on Amazon. As you look to increase your knowledge, is there something listed here that is missing from your collection?

  • Random vs Pseudo-random – How to Tell the Difference

    Statistical know-how is an integral part of Data Science. Explore randomness vs. pseudo-randomness in this explanatory post with examples.

  • Cartoon: KDnuggets Addiction

    New Cartoon looks at a serious case of KDnuggets addiction and what can be done about it.

  • The Data Science Machine, or ‘How To Engineer Feature Engineering’

    MIT researchers have developed what they refer to as the Data Science Machine, which combines feature engineering and an end-to-end data science pipeline into a system that beats nearly 70% of humans in competitions. Is this game-changing?

  • MetaMind Mastermind Richard Socher: Uncut Interview

    In a wide-ranging interview, Richard Socher opens up about MetaMind, deep learning, the nature of corporate research, and the future of machine learning.

  • Infographic – Data Scientist or Business Analyst? Knowing the Difference is Key

    Infographic depicting unique differences between data scientists and business analysts. Find out what type of professional is needed to meet your organization’s needs.

  • Which Movie Sequels Are Really Better? A Data Science Answer

    The internet is filled with polls and lists of sequels that are better or worse than other movies in their series. Yet such rankings are often based on personal judgement and rarely on data and statistics. Here is our solution to analyze and visualize the movie series.

  • The Best Advice From Quora on ‘How to Learn Machine Learning’

    Top machine learning writers on Quora give their advice on learning machine learning, including specific resources, quotes, and personal insights, along with some extra nuggets of information.

  • Aspect Based Sentiment Analysis Competition

    SemEval is back and so is the Aspect Based Sentiment Analysis (ABSA) competition, which has gone multilingual for ABSA16. Get all of the details below.

  • Does Deep Learning Come from the Devil?

    Deep learning has revolutionized computer vision and natural language processing. Yet the mathematics explaining its success remains elusive. At the Yandex conference on machine learning prospects and applications, Vladimir Vapnik offered a critical perspective.

  • Online course: Credit Risk Modeling

    The course covers basic and advanced modeling, including stress testing Probability of Default (PD), Loss Given Default (LGD) and Exposure At Default (EAD) models.

  • Recurrent Neural Networks Tutorial, Introduction

    Recurrent Neural Networks (RNNs) are popular models that have shown great promise in NLP and many other Machine Learning tasks. Here is a much-needed guide to key RNN models and a few brilliant research papers.

  • How big data can help in home health care?

    Proper home care services can reduce both the chances and the cost of hospitalization and help manage illness. Understand what big data promises for the healthcare sector and what practical hurdles stand in the way of current solutions.

