2019 May
- A Step-by-Step Guide to Transitioning your Career to Data Science – Part 1 - May 31, 2019.
If you are looking to transition your career to data science, don't immediately start learning Python or R. Instead, leverage the domain expertise you have accumulated over the years. Here's a foolproof guide on how to do that.
- Io-Tahoe Integrates with OneTrust and Joins Data Discovery Partner Program - May 31, 2019.
Io-Tahoe integrates with OneTrust to help customers populate the results of data discovery scans into the OneTrust Data Inventory & Mapping solution and trigger additional privacy workflows to maintain up-to-date records of processing.
- What Does a Lady Tasting Tea Have to Do with Science? - May 31, 2019.
Design of Experiments (DOE) is a statistical concept used to find cause-and-effect relationships. Surprisingly, an experiment arising from a casual conversation about tea-drinking is one of the first examples of an experiment designed using statistical ideas.
- Why physical storage of your database tables might matter - May 31, 2019.
Follow this investigation into why physical storage of your database tables might matter, from problem identification to possible issue resolutions.
- Understanding Backpropagation as Applied to LSTM - May 30, 2019.
Backpropagation is one of those topics that seem to confuse many once you move past feed-forward neural networks and progress to convolutional and recurrent neural networks. This article gives you an overall process for understanding backpropagation by explaining its underlying principles.
- Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis - May 30, 2019.
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; Almost 50% have used Deep Learning tools; SQL is steady; Consolidation continues.
- How the Lottery Ticket Hypothesis is Challenging Everything we Knew About Training Neural Networks - May 30, 2019.
The training of machine learning models is often compared to winning the lottery by buying every possible ticket. But if we know what winning the lottery looks like, couldn’t we be smarter about selecting the tickets?
- Who is your Golden Goose?: Cohort Analysis - May 30, 2019.
Step-by-step tutorial on how to perform customer segmentation using RFM analysis and K-Means clustering in Python.
- Animations with Matplotlib - May 30, 2019.
Animations make even more sense when depicting time series data like stock prices over the years, climate change over the past decade, or seasonalities and trends, since we can then see how a particular parameter behaves over time.
- Top KDnuggets Tweets, May 22 – 28: Mona Lisa smiles, speaks, and frowns: #Machinelearning brings old paintings and photos to life - May 29, 2019.
Also: The 3 Biggest Mistakes on Learning Data Science; A gallery of interesting Jupyter Notebooks; How do you teach physics to machine learning models?
- How to use continual learning in your ML models, June 19 Webinar - May 29, 2019.
This webinar for professional data scientists will go over how to monitor models when in production, and how to set up automatically adaptive machine learning.
- Why organizations fail in scaling AI and Machine Learning - May 29, 2019.
We explain why AI needs to understand business processes and how the business processes need to be able to change to bring insight from AI into the process.
- Becoming a Level 3.0 Data Scientist - May 29, 2019.
Want to be a Junior, Senior, or Principal Data Scientist? Find out what you need to do to navigate the Data Science Career Game.
- Choosing Between Model Candidates - May 29, 2019.
Models are useful because they allow us to generalize from one situation to another. When we use a model, we’re working under the assumption that there is some underlying pattern we want to measure, but it has some error on top of it.
- Big Data and AI Toronto 2019 - May 28, 2019.
Don't miss Canada's #1 data, AI and analytics conference + expo. From solving your data-driven business challenges to helping you navigate the latest machine learning tools, Big Data and AI Toronto is designed to give you a 360-degree view on the industry.
- AI in the Family: how to teach machine learning to your kids - May 28, 2019.
AI is all the rage with today’s programmers, but what about the next generation? Machine learning can be introduced to young ones just now learning about code, and you can help spark their interest.
- Top Stories, May 20-26: 7 Steps to Mastering SQL for Data Science; The Data Fabric for Machine Learning - May 27, 2019.
Building a Computer Vision Model: Approaches and datasets; Your Guide to Natural Language Processing (NLP); Analyzing Tweets with NLP in Minutes with Spark, Optimus and Twint; The 3 Biggest Mistakes on Learning Data Science
- ICLR 2019 highlights: Ian Goodfellow and GANs, Adversarial Examples, Reinforcement Learning, Fairness, Safety, Social Good, and all that jazz - May 27, 2019.
We provide an overview of the main themes and topics discussed at this year's International Conference on Learning Representations (ICLR).
- Boost Your Image Classification Model - May 27, 2019.
Check out this collection of tricks to improve the accuracy of your classifier.
- Careful! Looking at your model results too much can cause information leakage - May 24, 2019.
We are all aware of the issue of overfitting: the model you build fits the training data so closely that it fails to generalise to the population the data comes from, so feeding in new data produces very odd, sometimes catastrophic, results.
- Analyzing Tweets with NLP in Minutes with Spark, Optimus and Twint - May 24, 2019.
Social media has been gold for studying the way people communicate and behave. In this article I’ll show you the easiest way of analyzing tweets without the Twitter API, in a way that scales to Big Data.
- Your Guide to Natural Language Processing (NLP) - May 23, 2019.
This extensive post covers NLP use cases, basic examples, Tokenization, Stop Words Removal, Stemming, Lemmatization, Topic Modeling, the future of NLP, and more.
- End-to-End Machine Learning: Making videos from images - May 23, 2019.
Video is a natural way for us to understand three dimensional and time varying information. Read this short post on how to achieve the creation of videos from still images.
- When Too Likely Human Means Not Human: Detecting Automatically Generated Text - May 23, 2019.
Passably-human automated text generation is a reality. How do we best go about detecting it? As it turns out, being too predictably human may actually be a reasonably good indicator of not being human at all.
- Top KDnuggets Tweets, May 15 – 21: 7 Steps to Mastering SQL for Data Science — 2019 Edition - May 22, 2019.
Also: The Data Fabric for Machine Learning; 10 Free Must-Read Books for ML and Data Science; Another 10 Free Must-See Courses for Machine Learning and Data Science; WTF is a Tensor?!?
- Fixing a Major Weakness in Machine Learning of Images with Hinton’s Capsule Networks - May 22, 2019.
We explore Geoffrey Hinton's capsule networks to deal with rotational variance in images.
- Extracting Knowledge from Knowledge Graphs Using Facebook’s Pytorch-BigGraph - May 22, 2019.
We are using state-of-the-art Deep Learning tools to build a model that predicts a word using the surrounding words as labels.
- 6 Industries Warming up to Predictive Analytics and Forecasting - May 22, 2019.
Here are six sectors that are realizing how beneficial predictive analytics could be, embracing the possibilities of valuable insights extracted from such technology.
- How do you teach physics to machine learning models? - May 21, 2019.
How to integrate physics-based models (math-based methods that explain the world around us) into machine learning models to reduce their computational complexity.
- Probability Mass and Density Functions - May 21, 2019.
This content is part of a series on Chapter 3 (probability) of the Deep Learning Book by Goodfellow, I., Bengio, Y., and Courville, A. (2016). It aims to provide intuitions, drawings, and Python code for the mathematical theory, and reflects my own understanding of these concepts.
- The Data Fabric for Machine Learning – Part 1 - May 21, 2019.
How the new advances in semantics and the data fabric can help us be better at Machine Learning
- Customer Support Chatbots: Easier & More Effective Than You Think - May 20, 2019.
Learn how to create your own free chatbot environment with just a few commands, as well as learning more about the benefits of customer service chatbots.
- Building a Computer Vision Model: Approaches and datasets - May 20, 2019.
How can we build a computer vision model using CNNs? What are existing datasets? And what are approaches to train the model? This article provides an answer to these essential questions when trying to understand the most important concepts of computer vision.
- Top Stories, May 13-19: 7 Steps to Mastering SQL for Data Science — 2019 Edition; Mathematical programming — Key Habit to Build Up for Advancing Data Science - May 20, 2019.
Also: Machine Learning in Agriculture: Applications and Techniques; 60+ useful graph visualization libraries; How (not) to use Machine Learning for time series forecasting: Avoiding the pitfalls; The Third Wave Data Scientist; The 3 Biggest Mistakes on Learning Data Science
- Think Like an Amateur, Do As an Expert: Lessons from a Career in Computer Vision - May 17, 2019.
Dr. Takeo Kanade shared his life lessons from an illustrious 50-year career in Computer Vision at last year's Embedded Vision Summit. You have a chance to attend the 2019 Embedded Vision Summit, from May 20-23, in the Santa Clara Convention Center, Santa Clara CA.
- 60+ useful graph visualization libraries - May 17, 2019.
We outline 60+ graph visualization libraries that allow users to build applications to display and interact with network representations of data.
- PyCharm for Data Scientists - May 17, 2019.
This article is a discussion of some of PyCharm's features, and a comparison with Spyder, another popular IDE for Python. Read on to find the benefits and drawbacks of PyCharm, and an outline of when to prefer it to Spyder and vice versa.
- 7 Steps to Mastering SQL for Data Science — 2019 Edition - May 17, 2019.
Follow these updated 7 steps to go from SQL data science newbie to practitioner in a hurry. We consider only the necessary concepts and skills, and provide quality resources for each.
- A complete guide to K-means clustering algorithm - May 16, 2019.
Clustering - including K-means clustering - is an unsupervised learning technique used for grouping unlabeled data. We provide several examples to help further explain how it works.
- Why Data Professionals Should Negotiate Every Job Offer - May 16, 2019.
Here are six reasons why you shouldn't feel tempted to jump at the chance and take that job offer as it is without first negotiating.
- Large-Scale Evolution of Image Classifiers - May 16, 2019.
Deep neural networks excel in many difficult tasks, given large amounts of training data and enough processing power. The neural network architecture is an important factor in achieving a highly accurate model... Techniques to automatically discover these neural network architectures are, therefore, very much desirable.
- Top KDnuggets tweets, May 08-14: The end of average Data Scientist is near: @GoogleAI fully automated, End-to-End #AutoML #MachineLearning - May 15, 2019.
Also: My favorite free courses to learn data structures and #algorithms in depth; “Please, explain.” Interpretability of machine learning models; Decoding ‘A Game of Thrones’ #GOT with data science; Another 10 Free Must-See Courses for Machine Learning and Data Science; Best Data Visualization Techniques for small and large data
- Building Recommender systems with Azure Machine Learning service - May 15, 2019.
Microsoft has provided a GitHub repository with Python best practice examples to facilitate the building and evaluation of recommendation systems using Azure Machine Learning services.
- Mathematical programming — Key Habit to Build Up for Advancing Data Science - May 15, 2019.
We show how, by simulating the random throw of a dart, you can compute the value of pi approximately. This is a small step towards building the habit of mathematical programming, which should be a key skill in the repertoire of a budding data scientist.
- Customer Churn Prediction Using Machine Learning: Main Approaches and Models, by Altexsoft - May 14, 2019.
We reach out to experts from HubSpot and ScienceSoft to discuss how SaaS companies handle the problem of customer churn prediction using Machine Learning.
- Machine Learning in Agriculture: Applications and Techniques - May 14, 2019.
Machine Learning has emerged together with big data technologies and high-performance computing to create new opportunities to unravel, quantify, and understand data intensive processes in agricultural operational environments.
- What’s Going to Happen this Year in the Data World - May 14, 2019.
"If we wish to foresee the future of mathematics, our proper course is to study the history and present condition of the science." Henri Poincaré.
- Top Stories, May 6-12: The Third Wave Data Scientist; The 3 Biggest Mistakes on Learning Data Science - May 13, 2019.
Also: Data Scientist Best Job of the Year in USA; How (not) to use Machine Learning for time series forecasting: Avoiding the pitfalls; 2019 KDnuggets Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months?; The most desired skill in data science; Please, explain. Interpretability of machine learning
- From Prediction to Decision at Predictive Analytics World – London & Berlin - May 13, 2019.
Top expert practitioners will gather in London (16-17 Oct) and Berlin (18-19 Nov), at the premier vendor-neutral machine learning conference, to describe the design, deployment and business impact of their machine learning projects.
- What my first Silver Medal taught me about Text Classification and Kaggle in general? - May 13, 2019.
A first-hand account of ideas tried by a competitor at the recent Kaggle competition 'Quora Insincere Questions Classification', with a brief summary of some of the other winning solutions.
- Modeling 101 - May 13, 2019.
In the past couple of decades, innovation in statistics and machine learning has been increasing at a rapid pace and we are now able to do things unimaginable when I began my career.
- Data Science Poem - May 11, 2019.
A poem about Data Science.
- Data Science in the Senses - May 10, 2019.
The evening event at the Rev conference this year will be showcasing some amazing projects that leverage data and machine learning for sensory experiences.
- How (not) to use Machine Learning for time series forecasting: Avoiding the pitfalls - May 10, 2019.
We outline some of the common pitfalls of machine learning for time series forecasting, with a look at time delayed predictions, autocorrelations, stationarity, accuracy metrics, and more.
- Ethical AI: EU’s New Guidelines and the Future of AI Trustworthiness - May 10, 2019.
The EU has issued a set of guidelines, "Ethics Guidelines for Trustworthy AI" to help tech companies steer towards ethical and inclusive AI as we come to terms with the potential of this technology.
- 5 Things to Review Before Accepting That Data Scientist Job Offer - May 10, 2019.
Before you get too excited and sign the papers for that new data scientist job, and solidify your role as a new hire, make sure you look over these 5 things first.
- Top April Stories: The most desired skill in data science; Top 10 Coding Mistakes Made by Data Scientists - May 10, 2019.
Also: Another 10 Free Must-See Courses for Machine Learning and Data Science; How to Recognize a Good Data Scientist Job From a Bad One.
- Books on Graph-Powered Machine Learning, Graph Databases, Deep Learning for Search – 50% off - May 9, 2019.
These 3 books will help you make the most from graph-powered databases. For a limited time, get 50% off any of them with the code kdngraph.
- “Please, explain.” Interpretability of machine learning models - May 9, 2019.
Unveiling secrets of black box models is no longer a novelty but a new business requirement and we explain why using several different use cases.
- A Complete Exploratory Data Analysis and Visualization for Text Data: Combine Visualization and NLP to Generate Insights - May 9, 2019.
Visually representing the content of a text document is one of the most important tasks in the field of text mining as a Data Scientist or NLP specialist. However, there are some gaps between visualizing unstructured (text) data and structured data.
- Top KDnuggets tweets, May 01-07: The 3 Biggest Mistakes in Learning Data Science; ReinforcementLearning vs. Differentiable Programming; XGBoost Reign - May 8, 2019.
Also: XGBoost Algorithm: Long May She Reign; CycleGANs to Create Computer-Generated #Art - #GANs #DeepLearning; Another 10 Free Must-See Courses for Machine Learning and Data Science.
- Data Scientist – Best Job of the Year in USA - May 8, 2019.
CareerCast ranks Data Scientist as the top job in the USA, with a very good work environment, low stress, high growth, and a median salary of $114,520.
- [White Paper] Unlocking the Power of Data Science & Machine Learning with Python - May 8, 2019.
This guide from ActiveState provides an executive overview of how you can implement Python for your team’s data science and machine learning initiatives.
- How to fix an Unbalanced Dataset - May 8, 2019.
We explain several alternative ways to handle imbalanced datasets, including different resampling and ensembling methods with code examples.
- Linear Programming and Discrete Optimization with Python using PuLP - May 8, 2019.
Knowledge of such optimization techniques is extremely useful for data scientists and machine learning (ML) practitioners as discrete and continuous optimization lie at the heart of modern ML and AI systems as well as data-driven business analytics processes.
- 2019 KDnuggets Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months? - May 7, 2019.
Vote in KDnuggets 20th Annual Poll: What software you used for Analytics, Data Mining, Data Science, Machine Learning projects in the past 12 months? We will publish the anonymized data, results, and trends here.
- Building an Effective Data Science Project Portfolio: A Live Webinar by Metis | May 15 | 1pm ET - May 7, 2019.
This workshop is designed for business leaders, data science managers, and decision makers who want to ensure the effectiveness of the AI and data science capabilities they are building.
- Best US/Canada Masters in Analytics, Business Analytics, Data Science - May 7, 2019.
In the final part of this series, we provide an updated list of our comprehensive, unbiased survey of graduate programs in Data Science and Analytics from across the US and Canada.
- Data Science vs. Decision Science - May 7, 2019.
Data science and decision science are related but still separate fields, so at some points, it might be hard to compare them directly. We attempted to show our vision of the commonalities, differences, and specific features of data science and decision science.
- Top Stories, Apr 29 – May 5: The most desired skill in data science; Top Data Science and Machine Learning Methods Used in 2018, 2019 - May 6, 2019.
Also: Normalization vs Standardization — Quantitative analysis; Build Your First Chatbot Using Python & NLTK; Which Deep Learning Framework is Growing Fastest?; Pandas DataFrame Indexing; XGBoost Algorithm: Long May She Reign
- Business analytics programs online - May 6, 2019.
Penn State's 9-credit Graduate Certificate in Business Analytics can help you learn to perform analytics tasks in critical business areas. Apply now.
- Unleash Big Data by SaaS-based End-to-End AutoML - May 6, 2019.
This SaaS-based end-to-end AutoML tool R2 Learn enables data scientists, developers and data analysts to increase productivity, reduce errors and build quality models. Try for Free today!
- The Third Wave Data Scientist - May 6, 2019.
An extensive look at what skills are needed to make up the portfolio of the third wave of data scientists.
- The 3 Biggest Mistakes on Learning Data Science - May 6, 2019.
Data science, or whatever you want to call it, is not just knowing some programming languages, math, and statistics and having “domain knowledge”; here I show you why.
- Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course. - May 3, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
- Strata SF day 2 Highlights: AI and Politics, Chatbots Insights, Forecasting Uncertainty, Scalable Video Analysis, and more - May 3, 2019.
AI influencing Politics, insights from Chatbots, Enterprise Data Cloud, handling Video Big Data, and more takeaways from Strata Data Conference 2019, San Francisco.
- How to Automate Tasks on GitHub With Machine Learning for Fun and Profit - May 3, 2019.
Check out this tutorial on how to build a GitHub App that predicts and applies issue labels using TensorFlow and public datasets.
- XGBoost Algorithm: Long May She Reign - May 2, 2019.
In recent years, the XGBoost algorithm has gained enormous popularity in both academia and the business world. We outline some of the reasons behind this incredible success.
- Modeling Price with Regularized Linear Model & XGBoost - May 2, 2019.
We are going to implement regularization techniques for linear regression of house pricing data. Our goal in price modeling is to model the pattern and ignore the noise.
- Top KDnuggets tweets, Apr 24–30: Another 10 Free Must-Read Books for Machine Learning and Data Science; Top #DataScience & #MachineLearning Methods Used in 2018/19 - May 1, 2019.
Also: Data Visualization in Python: Matplotlib vs Seaborn; Data Science Project Flow for Startups; Pandas DataFrame Indexing; Best Data Visualization Techniques for small and large data; The most desired skill in #DataScience
- How to correctly select a sample from a huge dataset in machine learning - May 1, 2019.
We explain how choosing a small, representative dataset from a large population can improve model training reliability.
- Which Deep Learning Framework is Growing Fastest? - May 1, 2019.
In September 2018, I compared all the major deep learning frameworks in terms of demand, usage, and popularity. TensorFlow was the champion of deep learning frameworks and PyTorch was the youngest framework. How has the landscape changed?
- Build Your First Chatbot Using Python & NLTK - May 1, 2019.
Today we will learn to create a simple chat assistant or chatbot using Python’s NLTK library.