For data scientists, journalists, and business analysts, PLOTCON is THE opportunity to meet the creators of the tools you use every day, ask questions, hear where the future is heading, and be part of the conversation. Use code KDNUGGETS to save.
Here are 3 key traits that differentiate a data scientist from a great data scientist, starting with this: a great data scientist is obsessed with solving problems, not new tools.
Unlike a lot of other tutorials which often pull from the real-time Twitter API, we will be using the downloadable Twitter Analytics data, and most of what we do will be done in Pandas.
Structural Equation Modeling (SEM) is an extremely broad and flexible framework for data analysis, perhaps better thought of as a family of related methods rather than as a single technique. What is its relevance to Marketing Research?
This post approaches getting started with deep learning from a framework perspective. Gain a quick overview and comparison of available tools for implementing neural networks to help choose what's right for you.
The focus is increasingly shifting from storing and processing Big Data in an efficient way, to applying traditional and new machine learning techniques to drive higher value from the data at hand.
The author went from securities analyst to Head of Data Science at Amazon. He describes what he learned in his journey and gives 4 useful rules based on his experience.
This article is intended to help define the data scientist role, including typical skills, qualifications, education, experience, and responsibilities. This definition is somewhat loose, given that the ideal experience and skill set is relatively rare to find in one individual.
We've put together a brief summary of the top algorithms used in predictive analysis, which you can see just below. Read on to learn more about Linear Regression, Logistic Regression, Decision Trees, Random Forests, Gradient Boosting, and more.
Different business units in the organisation have different behaviours (e.g. turnover rate) and they can’t be compared with each other. So, how can we tell whether the changes in their behaviour are reasons for concern?
This post is an overview of a spam filtering implementation using Python and Scikit-learn. The results of 2 classifiers are contrasted and compared: multinomial Naive Bayes and support vector machines.
Beware of online and market research studies which can lead to false or spurious claims. We examine several notable examples including Google Street View and Argentina inflation.
March Madness is upon us. But before you get your brackets set, check out this overview of using machine learning to do the heavy lifting for you. A great discussion, and a timely topic.
We detail 50 companies leading the Artificial Intelligence revolution in Ad Sales, CRM, Autotech, Business Intelligence and Analytics, Commerce, Conversational AI/Bots, Core AI, Cyber-Security, Fintech, Healthcare, IoT, Vision, and other areas.
There is no one profile for the Data Scientist, but I tried to make a few generic job profiles that can somewhat fit job descriptions of different companies. I think there is way too much variety, but I had to narrow down on a set of profiles. Check out the list.
The third and final part of 17 new must-know Data Science interview questions and answers covers A/B testing, data visualization, Twitter influence evaluation, and Big Data quality.
Learn how to experiment with embodied robotic cognition with IBM Project Intu, a platform that extends Deep Learning and other cognitive services to new devices with minimum coding.
Are you a data science professional who wants to advance your career as a Data Science Unicorn? Here we provide the important business concepts and guidelines a data science techie needs to become a Unicorn.
What if a simple, deterministic approach which did not rely on randomization could be used for centroid initialization? Naive sharding is such a method, and its time-saving and efficient results, though preliminary, are promising.
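The blurb above does not spell out the steps, so the following is only a minimal sketch of naive sharding as it is commonly described: sum each row's attributes into a composite value, sort the rows by that value, split them into k equal shards, and take each shard's column-wise mean as an initial centroid. The function name `naive_sharding_centroids` and the sample data are hypothetical, and the sketch assumes numeric rows and `k` no larger than the number of rows.

```python
def naive_sharding_centroids(rows, k):
    """Deterministic centroid initialization (sketch, assumes k <= len(rows))."""
    # Sort rows by their composite value (assumed here to be the attribute sum).
    ordered = sorted(rows, key=sum)
    n = len(ordered)
    centroids = []
    for i in range(k):
        # Slice the sorted rows into k contiguous, roughly equal shards.
        shard = ordered[i * n // k:(i + 1) * n // k]
        # The centroid is the column-wise mean of the shard.
        centroids.append([sum(col) / len(shard) for col in zip(*shard)])
    return centroids


# Hypothetical toy data: two obvious groups.
sample_rows = [[9, 9], [1, 1], [10, 10], [2, 2]]
initial_centroids = naive_sharding_centroids(sample_rows, 2)
```

Because no random draw is involved, the same data always yields the same starting centroids, which can then be handed to any k-means implementation that accepts explicit initial centers.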
This introductory tutorial does a great job of outlining the most common Numpy array creation and manipulation functionality. A good post to keep handy while taking your first steps in Numpy, or to use as a quick reminder.
When creating time-series line charts, it’s important to consider which of the following messages you would like to communicate: Actual value of units? Change in absolute units? Percent change? Change from a specific point in time?
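The four messages listed above correspond to four different transformations of the same series. A minimal sketch in plain Python follows; the function name `time_series_views`, the sample values, and the choice to express "change from a specific point in time" as a percentage of the base value are all assumptions made for illustration.

```python
def time_series_views(values, base_index=0):
    """Return four views of one series, one per message (sketch)."""
    pairs = list(zip(values, values[1:]))
    base = values[base_index]
    return {
        # Actual value of units: the series as recorded.
        "actual": list(values),
        # Change in absolute units from the previous period.
        "absolute_change": [None] + [curr - prev for prev, curr in pairs],
        # Percent change from the previous period.
        "percent_change": [None] + [(curr - prev) / prev * 100 for prev, curr in pairs],
        # Percent change relative to one fixed reference point (assumed form).
        "change_from_base": [(v - base) / base * 100 for v in values],
    }


# Hypothetical monthly unit counts.
units = [100, 110, 99, 120]
views = time_series_views(units)
```

Plotting each list against the same time axis gives a chart that answers one of the four questions, so the transformation is best chosen before the chart type.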
At the core of customer segmentation is being able to identify different types of customers and then figure out ways to find more of those individuals so you can... you guessed it, get more customers!
We examine principles of good data visualization, including some great and terrible examples, guidelines for human perception, focus on key variables, changes and trends, avoiding chart junk, and more.
Unlike other data science problems, there is no one method for predicting which customers are likely to churn in the next month. Here we review the most popular approaches.
The article studies the advantage of Support Vector Regression (SVR) over Simple Linear Regression (SLR) models for predicting real values, using the same basic idea as Support Vector Machines (SVM) use for classification.
Neuroscience is a very complex and advanced study of the brain, and people often misuse the term. Here we try to explain neuroscience terminology and the use of data science in such studies.
In this intro cluster analysis tutorial, we'll check out a few algorithms in Python so you can get a basic understanding of the fundamentals of clustering on a real dataset.
Job hunting is a challenging and sometimes frustrating task that we all experience in our careers. Here we provide a very specific and practical guide to getting your dream job in the Data Science world.
This is an overview of the XGBoost machine learning algorithm, which is fast and shows good results. This example uses multiclass prediction with the Iris dataset from Scikit-learn.
If Big Data is to realize its potential, people need to understand what it is capable of, what information is out there and where every piece of data comes from. Without such transparency and understanding, it will be difficult to persuade people to rely on the findings.
Bokeh is the Python data visualization library that enables high-performance visual presentation of large datasets in modern web browsers. The package is flexible and offers lots of possibilities to visualize your data in a compelling way, but can be overwhelming.
Thomas Dinsmore's critical examination of the Gartner 2017 MQ for Data Science Platforms, including vendors who are out, in, or have big changes, Hadoop and Spark integration, open source software, and what Data Scientists actually use.
The most advanced kind of Deep Learning system will involve multiple neural networks that either cooperate or compete to solve problems. The core problem of a multi-agent approach is how to control its behavior.
For this guide, I spent 10+ hours trying to identify every online intro to data science course offered as of January 2017, extracting key bits of information from their syllabi and reviews, and compiling their ratings.
In this post, learn to build a bot to answer frequently asked questions, reducing lag time for more customers and taking the load off of engineers, ensuring they can concentrate on building products!
Getting new customers is much more expensive than retaining existing ones, so reducing churn is a top priority for many firms. Understanding why customers churn and estimating the risks are powerful components of a data-driven retention strategy.
This post is a follow-up to last year's introductory Python machine learning post, which includes a series of tutorials for extending your knowledge beyond the original.