2019 Aug
- Get KDnuggets Pass to Strata Data or TensorFlow World
- Aug 30, 2019.
As a media partner for O'Reilly, KDnuggets is pleased to offer our readers a chance to win a 2-day Bronze Conference pass to either Strata Data NYC or TensorFlow World in Santa Clara. Enter by Sep 8, 2019.
- Emoji Analytics
- Aug 30, 2019.
Emoji is becoming a global language understandable by anyone who expresses... emotion. With the pervasiveness of these little Unicode blocks, we can perform analytics on their use throughout social media to gain insight into sentiments around the world.
- R Users’ Salaries from the 2019 Stackoverflow Survey
- Aug 30, 2019.
Let’s take a look at what R users are saying about their salaries. Note that the following results could be biased because of unrepresentative and, in some cases, small samples.
- Object-oriented programming for data scientists: Build your ML estimator
- Aug 30, 2019.
Implement some of the core OOP principles in a machine learning context by building your own Scikit-learn-like estimator, and making it better.
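A minimal sketch of the pattern, using only the standard library: a hypothetical one-feature regression class that follows scikit-learn's conventions (hyperparameters stored in `__init__`, learned attributes with a trailing underscore set in `fit`, and `fit` returning `self`). It does not inherit from scikit-learn's `BaseEstimator` and is not the article's own code.

```python
class SimpleLinearRegression:
    """One-feature least-squares regression with a scikit-learn-style API (hypothetical)."""

    def __init__(self, fit_intercept=True):
        # Hyperparameters are stored unchanged; no learning happens here.
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        xs = [row[0] for row in X]  # X is a list of one-element rows
        n = len(xs)
        if self.fit_intercept:
            x_mean, y_mean = sum(xs) / n, sum(y) / n
            sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(xs, y))
            sxx = sum((a - x_mean) ** 2 for a in xs)
            self.coef_ = sxy / sxx
            self.intercept_ = y_mean - self.coef_ * x_mean
        else:
            # Regression through the origin.
            self.coef_ = sum(a * b for a, b in zip(xs, y)) / sum(a * a for a in xs)
            self.intercept_ = 0.0
        return self  # enables chaining: model.fit(X, y).predict(X)

    def predict(self, X):
        if not hasattr(self, "coef_"):
            raise RuntimeError("call fit() before predict()")
        return [self.intercept_ + self.coef_ * row[0] for row in X]

    def score(self, X, y):
        """Coefficient of determination R^2, as scikit-learn regressors report."""
        preds = self.predict(X)
        y_mean = sum(y) / len(y)
        ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
        ss_tot = sum((t - y_mean) ** 2 for t in y)
        return 1 - ss_res / ss_tot


X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
model = SimpleLinearRegression().fit(X, y)
print(model.coef_, model.intercept_, model.score(X, y))
```

Keeping the same `fit`/`predict`/`score` method names as scikit-learn is what lets a hand-written class slot into familiar workflows; "making it better" could then mean adding input validation or more hyperparameters.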
- Deep Learning Next Step: Transformers and Attention Mechanism
- Aug 29, 2019.
With the pervasive importance of NLP in so many of today's applications of deep learning, find out how advanced translation techniques can be further enhanced by transformers and attention mechanisms.
- 4 Tips for Advanced Feature Engineering and Preprocessing
- Aug 29, 2019.
Techniques for creating new features, detecting outliers, handling imbalanced data, and imputing missing values.
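Two of the techniques listed above, missing-value imputation and outlier detection, can be sketched with the standard library alone. This is a minimal illustration assuming median imputation and the common 1.5 × IQR rule (Tukey's fences); the article may use other methods, and the function names are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


raw = [12, 14, None, 13, 15, 110, None, 14]
filled = impute_median(raw)
print(filled)
print(iqr_outliers(filled))  # [110]
```

The median is used rather than the mean because it is barely moved by the extreme value 110, which the IQR rule then flags.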
- Types of Bias in Machine Learning
- Aug 29, 2019.
The sample data used for training has to represent the real scenario as closely as possible. Many factors can bias a sample from the outset, and those factors differ from domain to domain (e.g., business, security, medical, education).
- The Death of Centralized AI and the Rise of Open AI
- Aug 29, 2019.
Centralized AI is giving way to more democratic AI systems, which are becoming more and more accessible to data scientists, both through code and through open ecosystems.
- Top KDnuggets tweets, Aug 21-27: Algorithms Notes for Professionals – Free Book
- Aug 28, 2019.
Algorithms Notes for Professionals - Free Book; 10 simple Linux tips which save 50% of my time in the command line; Why so many #DataScientists are leaving their jobs; Order Matters: Alibaba Transformer-based Recommender System
- New Poll: Data Science Skills
- Aug 28, 2019.
New KDnuggets poll asks: 1) which Data Science/Machine Learning-related skills do you currently have, and 2) which skills do you want to add or improve? If you are human, please vote, and we will analyze and publish the results.
- A 2019 Guide to Human Pose Estimation
- Aug 28, 2019.
Human pose estimation refers to the process of inferring poses in an image. Essentially, it entails predicting the positions of a person’s joints in an image or video. This problem is also sometimes referred to as the localization of human joints.
- TensorFlow 2.0: Dynamic, Readable, and Highly Extended
- Aug 27, 2019.
With substantial changes coming with TensorFlow 2.0, and the release candidate version now available, learn more in this guide about the major updates and how to get started on the machine learning platform.
- Introducing AI Explainability 360: A New Toolkit to Help You Understand what Machine Learning Models are Doing
- Aug 27, 2019.
Recently, AI researchers from IBM open sourced AI Explainability 360, a new toolkit of state-of-the-art algorithms that support the interpretability and explainability of machine learning models.
- The secret sauce for growing from a data analyst to a data scientist
- Aug 27, 2019.
Despite the increasing demand and appetite for experienced data scientists, the job is ambiguously described most of the time. Also, many hiring managers still define the line between data science and data analytics or engineering only loosely.
- Why Data Visualization Is The Most Important Skill in a Data Analyst Arsenal
- Aug 26, 2019.
Visually displayed data is much more accessible, which is critical for promptly identifying an organization's weaknesses, accurately forecasting trading volumes and sale prices, and making the right business choices.
- How to count Big Data: Probabilistic data structures and algorithms
- Aug 26, 2019.
Learn how probabilistic data structures and algorithms can be used for cardinality estimation in Big Data streams.
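As an illustration of the idea (not the article's own code), here is a minimal HyperLogLog cardinality estimator in pure Python. It assumes the first 8 bytes of SHA-1 as a 64-bit hash, 2^10 registers, and the usual linear-counting correction for small cardinalities; the class is a hypothetical sketch, not a production implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with m = 2**p registers (illustrative only)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item):
        # Assumption: the first 8 bytes of SHA-1 serve as a uniform 64-bit hash.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                       # top p bits pick a register
        rest = x & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros > 0:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog(p=10)
for i in range(50_000):
    hll.add(f"user-{i % 10_000}")  # 10,000 distinct ids, each seen 5 times
print(round(hll.count()))
```

Memory stays at 1,024 small registers however long the stream is, in exchange for an approximate answer: the expected relative error is roughly 1.04/√m, about 3% here. Duplicates never change a register, which is why repeated ids do not inflate the count.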
- Artificial Intelligence vs. Machine Learning vs. Deep Learning: What is the Difference?
- Aug 26, 2019.
Over the past few years, artificial intelligence has remained one of the hottest topics, and to work with it effectively, you need to understand its constituent parts.
- Top Stories, Aug 19-25: Top Handy SQL Features for Data Scientists; Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch
- Aug 26, 2019.
Also: Deep Learning for NLP: Creating a Chatbot with Keras!; Understanding Decision Trees for Classification in Python; How to Become More Marketable as a Data Scientist; Is Kaggle Learn a Faster Data Science Education?
- How to Sell Your Boss on the Need for Data Analytics
- Aug 26, 2019.
Here are some ways you can make the case to your boss that analytics investments are smart for your company to pursue.
- Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch
- Aug 23, 2019.
Entirely implemented with NumPy, this extensive tutorial provides a detailed review of neural networks followed by guided code for creating one from scratch with computational graphs.
- Top Handy SQL Features for Data Scientists
- Aug 23, 2019.
Whenever we hear "data," the first thing that comes to mind is SQL! SQL comes with easy-to-learn features to organize and retrieve data, as well as to perform actions on it in order to gain useful insights.
- Order Matters: Alibaba’s Transformer-based Recommender System
- Aug 23, 2019.
Alibaba, the largest e-commerce platform in China, is a powerhouse not only when it comes to e-commerce, but also when it comes to recommender systems research. Their latest paper, Behaviour Sequence Transformer for E-commerce Recommendation in Alibaba, is yet another publication that pushes the state of the art in recommender systems.
- eBook: How to Enhance Privacy in Data Science
- Aug 22, 2019.
Check out this eBook, How to Enhance Privacy in Data Science, to equip yourself with the tools to enhance privacy in data science, including transforming data in a manner that protects privacy, an overview of the challenges and opportunities of privacy-aware analytics, and more.
- Proptech and the proper use of technology for house sales prediction
- Aug 22, 2019.
Using the ATTOM dataset, we extracted data on sales transactions in the USA, loans, and estimated property values. We developed an optimal prediction model from correlations with the time and status of ownership, as well as seasonal fluctuations in sales.
- How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions
- Aug 22, 2019.
As machine learning evolves, tools and platforms that automate the lifecycle management of training and testing datasets are becoming increasingly important. Fast-growing technology companies like Uber or LinkedIn have been forced to build their own in-house data lifecycle management solutions to power different groups of machine learning models.
- Top KDnuggets tweets, Aug 14-20: Researcher reproduced 130 research papers on “predicting the stock market”, coded them from scratch.
- Aug 21, 2019.
Also: For data pros only - An SQL Query walks into a bar and sees two tables; Deep Learning for NLP: Creating a Chatbot with Keras!; 12 NLP Researchers, Practitioners & Innovators You Should Be Following; Wanting to be even more marketable as a data scientist? Check out these trends in the skills employers are looking for today
- Comparing Decision Tree Algorithms: Random Forest® vs. XGBoost
- Aug 21, 2019.
Check out this tutorial walking you through a comparison of XGBoost and Random Forest. You'll learn how to create a decision tree, how to do tree bagging, and how to do tree boosting.
- Gender Diversity in AI Research
- Aug 21, 2019.
Through an analysis of 1.5M papers from arXiv, this study reviews the evolution of gender diversity across disciplines, countries, and institutions as well as the semantic differences between AI papers with and without female co-authors.
- Understanding Decision Trees for Classification in Python
- Aug 21, 2019.
This tutorial covers decision trees for classification, also known as classification trees, including their anatomy, how they make predictions, how to build them with scikit-learn, and hyperparameter tuning.
- Automate Stacking In Python: How to Boost Your Performance While Saving Time
- Aug 21, 2019.
Utilizing stacking (stacked generalizations) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it.
- Artificial Intelligence Is Not Intelligence – Interview With Andy Cotgreave (Keynote Speaker at Crunch Conf)
- Aug 20, 2019.
Crunch is coming to Budapest, Hungary on 16-18 Oct. Use code KDNuggets to save on Data Science, Data Engineering, or BI tracks. But first, read this interview with keynote speaker Andy Cotgreave.
- Detecting stationarity in time series data
- Aug 20, 2019.
Explore how to determine if your time series data is generated by a stationary process and how to handle the necessary assumptions and potential interpretations of your result.
- Is Kaggle Learn a “Faster Data Science Education?”
- Aug 20, 2019.
Kaggle Learn is "Faster Data Science Education," featuring micro-courses covering an array of data skills for immediate application. The courses may be made with newcomers in mind, but the platform and its content are proving useful as a review for more seasoned practitioners as well.
- An Overview of Python’s Datatable package
- Aug 20, 2019.
Modern machine learning applications need to process a humongous amount of data and generate multiple features. Python’s datatable module was created to address this issue. It is a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum possible speed.
- Math for Programmers
- Aug 19, 2019.
Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.
- Crafting an Elevator Pitch for your Data Science Startup
- Aug 19, 2019.
If you are launching a data science startup, these tips will give you a head start as you seek capital for seed funding or your next level of growth.
- Deep Learning for NLP: Creating a Chatbot with Keras!
- Aug 19, 2019.
Learn how to use Keras to build a Recurrent Neural Network and create a Chatbot! Who doesn’t like a friendly-robotic personal assistant?
- Top Stories, Aug 12-18: How to Become More Marketable as a Data Scientist; 12 NLP Researchers, Practitioners & Innovators You Should Be Following
- Aug 19, 2019.
Also: How to Become More Marketable as a Data Scientist; 6 Key Concepts in Andrew Ng’s “Machine Learning Yearning”; Understanding Cancer using Machine Learning; Command Line Basics Every Data Scientist Should Know
- Manual Coding or Automated Data Integration – What’s the Best Way to Integrate Your Enterprise Data?
- Aug 19, 2019.
What’s the best way to execute your data integration tasks: writing manual code or using an ETL tool? Find out which approach best fits your organization’s needs and the factors that influence the choice.
- How to Become More Marketable as a Data Scientist
- Aug 16, 2019.
As a data scientist, you are in high demand. So, how can you increase your marketability even more? Check out these current trends in skills most desired by employers in 2019.
- Understanding Cancer using Machine Learning
- Aug 16, 2019.
The use of Machine Learning (ML) in medicine is becoming more and more important. One example application is cancer detection and analysis.
- Pytorch Lightning vs PyTorch Ignite vs Fast.ai
- Aug 16, 2019.
Here, I will attempt an objective comparison between all three frameworks. This comparison comes from laying out similarities and differences objectively found in tutorials and documentation of all three frameworks.
- Data Driven Government – Speakers Highlights
- Aug 15, 2019.
The lineup of experienced, thought-leading speakers at Data Driven Government, Sep 25 in Washington, DC, will explain how to use data and analytics to more effectively accomplish your mission, increase efficiency, and improve evidence-based policymaking.
- How Concerned Should You be About Predictor Collinearity? It Depends…
- Aug 15, 2019.
Predictor collinearity (also known as multicollinearity) can be problematic for your regression models. Check out these rules of thumb about when, and when not, to be concerned.
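One widely used rule of thumb relies on the variance inflation factor (VIF). In the special case of exactly two predictors, VIF = 1 / (1 − r²), where r is their Pearson correlation. The sketch below computes that case in pure Python; the threshold of 5 is a commonly quoted heuristic (some practitioners use 10), not necessarily the one the article recommends.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]  # nearly a copy of x1
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]  # related to x1, but less tightly

for name, other in [("x2", x2), ("x3", x3)]:
    vif = vif_two_predictors(x1, other)
    flag = "worth investigating" if vif > 5 else "probably fine"
    print(f"x1 vs {name}: VIF = {vif:.1f} ({flag})")
```

With more than two predictors, each VIF instead comes from the R² of regressing one predictor on all the others.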
- Command Line Basics Every Data Scientist Should Know
- Aug 15, 2019.
Check out this introductory guide to completing simple tasks with the command line.
- Introducing the Plato Research Dialogue System: Building Conversational Applications at Uber’s Scale
- Aug 15, 2019.
While building simple, domain-specific chatbots has become much easier, building large-scale, multi-agent conversational applications remains a massive challenge. Recently, the Uber engineering team open sourced the Plato Research Dialogue System, the framework powering conversational agents across Uber’s different applications.
- Top KDnuggets tweets, Aug 07-13: Deep Learning Cheat Sheets; 12 NLP Researchers, Practitioners To Follow
- Aug 14, 2019.
Deep Learning Cheat Sheets; 12 NLP Researchers, Practitioners & Innovators You Should Be Following; Knowing Your Neighbours: Machine Learning on Graphs.
- Top July Stories: The Death of Big Data and the Emergence of the Multi-Cloud Era
- Aug 14, 2019.
Also: Top 13 Skills To Become a Rockstar Data Scientist; Top 10 Data Science Leaders You Should Follow; What's wrong with the approach to Data Science?
- Domain-Specific Language Processing Mines Value From Unstructured Data
- Aug 14, 2019.
Processing unstructured text data in real-time is challenging when applying NLP or NLU. Find out how Domain-Specific Language Processing can also help mine valuable information from data by following your guidance and using the language of your business.
- Statistical Modelling vs Machine Learning
- Aug 14, 2019.
At times it may seem that machine learning can be done these days without a sound statistical background, but people who think so are missing important nuances. Code written to make it easier does not negate the need for an in-depth understanding of the problem.
- What is Poisson Distribution?
- Aug 14, 2019.
A solid overview of the Poisson distribution, starting from why it is needed, how it stacks up against the binomial distribution, deriving its formula mathematically, and more.
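The Poisson probability mass function is P(X = k) = λ^k e^(−λ) / k!. A quick numerical check of how it compares with the binomial distribution: with n trials and success probability p = λ/n, binomial probabilities approach the Poisson ones as n grows. This is a standard-library sketch; λ = 3 and k = 2 are arbitrary illustrative choices.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


lam, k = 3.0, 2
print(f"Poisson:           {poisson_pmf(k, lam):.5f}")
for n in (10, 100, 10_000):
    # Keep the expected count n * p fixed at lam while n grows.
    print(f"Binomial n={n:>6}: {binomial_pmf(k, n, lam / n):.5f}")
```

The gap shrinks as n increases, which is why the Poisson distribution is a convenient model for counts of rare events over many opportunities.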
- The slow, startling triumph of Reverend Bayes – John Elder’s 2019 Keynote at PAW in London
- Aug 13, 2019.
The core Bayesian idea, when learning from data, is to inject information — however slight — from outside the data. In real-world applications, meta-information is clearly needed. John Elder's Predictive Analytics World keynote covers this and more. PAW London takes place 16-17 Oct.
- The Easy Way to Do Advanced Data Visualisation for Data Scientists
- Aug 13, 2019.
Creating effective data visualisations is a core skill for data scientists. This tutorial will guide you through how to easily develop interactive visualisations using the Python library plotly.
- How Creating an AI Study Group Boosted My Skills and Got Me a Job
- Aug 13, 2019.
The amount of time I had to put in to organize the AI Society left me sometimes sleep-deprived but it was definitely worth it. It was also one of the main factors why I got the job in Machine Learning after all. I hope that this article will inspire you to create your own AI study group!
- Learn how to use PySpark in under 5 minutes (Installation + Tutorial)
- Aug 13, 2019.
Apache Spark is one of the hottest and largest open source projects in data processing, with rich high-level APIs for programming languages such as Scala, Python, Java and R. It realizes the potential of bringing together both Big Data and machine learning.
- Cambridge Analytica whistleblower Chris Wylie to headline Big Data LDN 2019 keynote programme
- Aug 12, 2019.
Chris Wylie, the whistleblower who exposed Cambridge Analytica, will headline Big Data LDN 2019 programme, along with over 100 speakers at this free to attend event, Nov 13-14, London.
- 6 Key Concepts in Andrew Ng’s “Machine Learning Yearning”
- Aug 12, 2019.
If you are diving into AI and machine learning, Andrew Ng's book is a great place to start. Learn about six important concepts it covers, from one of the field's best practitioners and teachers, to better understand how to use these tools.
- A 2019 Guide to Semantic Segmentation
- Aug 12, 2019.
Semantic segmentation refers to the process of linking each pixel in an image to a class label. These labels could include a person, car, flower, piece of furniture, etc., just to mention a few. We’ll now look at a number of research papers covering state-of-the-art approaches to building semantic segmentation models.
- Top Stories, Aug 5-11: Knowing Your Neighbours: Machine Learning on Graphs; What is Benford’s Law and why is it important for data science?
- Aug 12, 2019.
Also: Deep Learning for NLP: ANNs, RNNs and LSTMs explained!; Machine Learning is Happening Now: A Survey of Organizational Adoption, Implementation, and Investment; 25 Tricks for Pandas; Getting Started with Data Science; Data Science: Scientific Discipline or Business Process?
- 12 NLP Researchers, Practitioners & Innovators You Should Be Following
- Aug 12, 2019.
Check out this list of NLP researchers, practitioners and innovators you should be following, including academics, practitioners, developers, entrepreneurs, and more.
- Keras Callbacks Explained In Three Minutes
- Aug 9, 2019.
A gentle introduction to callbacks in Keras. Learn about EarlyStopping, ModelCheckpoint, and other callback functions with code examples.
- Introduction to Image Segmentation with K-Means clustering
- Aug 9, 2019.
Image segmentation is the classification of an image into different groups. Much research has been done on image segmentation using clustering. In this article, we will explore using the K-Means clustering algorithm to read an image and cluster different regions of the image.
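The core idea can be sketched in pure Python: treat each pixel's RGB value as a point, run Lloyd's k-means algorithm on those points, and replace every pixel with its cluster's centroid colour. This is a minimal illustration, not the article's code; it assumes the image has already been loaded and flattened into a list of (r, g, b) tuples, whereas a real pipeline would typically use a library implementation such as scikit-learn's or OpenCV's.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm on tuples of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


def segment(pixels, k):
    """Replace every pixel with the (rounded) centroid colour of its cluster."""
    centroids, labels = kmeans(pixels, k)
    return [tuple(round(v) for v in centroids[lab]) for lab in labels]


# Toy "image": three reddish and three bluish pixels, flattened row by row.
pixels = [(250, 10, 10), (245, 15, 5), (240, 20, 12),
          (10, 10, 240), (5, 20, 250), (12, 8, 235)]
print(segment(pixels, k=2))
```

Choosing k sets how many colour regions the segmented image will contain; reshaping the output list back to the image's height and width gives the segmented picture.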
- 9 Tips For Training Lightning-Fast Neural Networks In Pytorch
- Aug 9, 2019.
Who is this guide for? Anyone working on non-trivial deep learning models in PyTorch, such as industrial researchers, Ph.D. students, and academics. The models in question might take days, or even weeks or months, to train.
- Knowing Your Neighbours: Machine Learning on Graphs
- Aug 8, 2019.
Graph Machine Learning uses the network structure of the underlying data to improve predictive outcomes. Learn how to use this modern machine learning method to solve challenges with connected data.
- Inside Pluribus: Facebook’s New AI That Just Mastered the World’s Most Difficult Poker Game
- Aug 8, 2019.
The reasons why Pluribus represents a major breakthrough in AI systems might be confusing to many readers. After all, in recent years AI researchers have made tremendous progress across different complex games. However, six-player, no-limit Texas Hold’em has remained one of the most elusive challenges for AI systems.
- Data Science: Scientific Discipline or Business Process?
- Aug 8, 2019.
Simply put, data science is an attempt to understand given data using the scientific method. That's why data science is a scientific discipline. You are free (and encouraged!) to apply data science to business use cases, just as you are encouraged to apply it to many other domains.
- Top KDnuggets tweets, Jul 31 – Aug 06: NLP vs. NLU: from Understanding a Language to Its Processing
- Aug 7, 2019.
Also: Ten more random useful things in R you may not know about; 5 Probability Distributions Every Data Scientist Should Know; Machine Learning is Happening Now: A Survey of Organizational Adoption, Implementation, and Investment; Programmers rejoice! Deep TabNine offer code autocompletion with #deeplearning
- Exploratory Data Analysis Using Python
- Aug 7, 2019.
In this tutorial, you’ll use Python and Pandas to explore a dataset and create visual distributions, identify and eliminate outliers, and uncover correlations between two datasets.
- What is Benford’s Law and why is it important for data science?
- Aug 7, 2019.
Benford’s law is a little-known gem for data analytics. Learn how it can be used for anomaly or fraud detection in scientific or technical publications.
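Benford's law states that in many naturally occurring datasets the leading digit d appears with probability P(d) = log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% start with 9. A minimal screening sketch compares observed first-digit frequencies with that expectation; the function names are hypothetical, and a real anomaly or fraud check would add a formal goodness-of-fit test.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading non-zero digit of a non-zero number."""
    s = f"{abs(x):e}"  # scientific notation, e.g. '3.141590e+02'
    return int(s[0])


def first_digit_freqs(values):
    """Observed first-digit frequencies (zeros are skipped)."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


# Powers of 2 are a classic example of data that closely follows Benford's law.
observed = first_digit_freqs([2 ** n for n in range(1, 1001)])
expected = benford_expected()
for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}  expected {expected[d]:.3f}")
```

Large, systematic departures from the expected column are what would prompt a closer look at a dataset.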
- Deep Learning for NLP: ANNs, RNNs and LSTMs explained!
- Aug 7, 2019.
Learn about Artificial Neural Networks, Deep Learning, Recurrent Neural Networks and LSTMs like never before and use NLP to build a Chatbot!
- Coding Random Forests® in 100 lines of code*
- Aug 7, 2019.
There are dozens of machine learning algorithms out there. It is impossible to learn all their mechanics; however, many algorithms sprout from the most established algorithms, e.g. ordinary least squares, gradient boosting, support vector machines, tree-based algorithms and neural networks.
- Feature selection by random search in Python
- Aug 6, 2019.
Feature selection is one of the most important tasks in machine learning. Learn how to use a simple random search in Python to get good results in less time.
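A minimal sketch of the idea: draw random feature subsets, score each with an evaluation function (in practice, for example, a cross-validated model score), and keep the best. The `toy_score` function and the feature names below are hypothetical stand-ins so that the example runs on its own; the article's actual procedure may differ.

```python
import random


def random_search_features(features, score_fn, n_iter=100, seed=0):
    """Return (best_subset, best_score) found by random subset sampling."""
    rng = random.Random(seed)
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Include each feature independently with probability 0.5.
        subset = frozenset(f for f in features if rng.random() < 0.5)
        if not subset:
            continue  # skip the empty subset
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical scorer: rewards the "useful" features and penalises the rest.
USEFUL = {"age", "income"}


def toy_score(subset):
    return len(subset & USEFUL) - 0.5 * len(subset - USEFUL)


features = ["age", "income", "zip_code", "user_id", "signup_day"]
best, score = random_search_features(features, toy_score, n_iter=500)
print(sorted(best), score)
```

With five features there are only 31 non-empty subsets, so exhaustive search would also be cheap here; random search pays off when 2^n − 1 subsets are too many to enumerate and a fixed evaluation budget is acceptable.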
- 25 Tricks for Pandas
- Aug 6, 2019.
Check out this video (and Jupyter notebook) which outlines a number of Pandas tricks for working with and manipulating data, covering topics such as string manipulations, splitting and filtering DataFrames, combining and aggregating data, and more.
- Lagrange multipliers with visualizations and code
- Aug 6, 2019.
In this story, we’re going to take an aerial tour of optimization with Lagrange multipliers. When do we need them? Whenever we have an optimization problem with constraints.
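As a small worked example (chosen for illustration, not taken from the article), maximise f(x, y) = xy subject to g(x, y) = x + y − 10 = 0. At a constrained optimum the gradients are parallel, ∇f = λ∇g, which is what setting the partial derivatives of the Lagrangian to zero expresses:

```latex
\begin{aligned}
\mathcal{L}(x, y, \lambda) &= xy - \lambda\,(x + y - 10) \\
\frac{\partial \mathcal{L}}{\partial x} &= y - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 10) = 0 \\
&\Rightarrow\; x = y = \lambda = 5, \qquad f(5, 5) = 25.
\end{aligned}
```

The multiplier also has a meaning: replacing 10 by a constant c gives the optimum f* = c²/4, so df*/dc = c/2 = 5 at c = 10, exactly λ. It measures how much the optimal value would improve if the constraint were relaxed slightly.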
- How to better manage your data science team’s workflow
- Aug 5, 2019.
This workshop, Aug 14 @ 12 PM ET, will give you the proper tools and tactics to manage the entire lifecycle of your machine learning projects, from research to exploration to development and production.
- Top Stories, Jul 29 – Aug 4: Top 10 Best Podcasts on AI, Analytics, Data Science, Machine Learning; What 70% of Data Science Learners Do Wrong
- Aug 5, 2019.
Also: GPU Accelerated Data Analytics & Machine Learning; Understanding Tensor Processing Units; Top 13 Skills To Become a Rockstar Data Scientist; Five Command Line Tools for Data Science; Ten more random useful things in R you may not know about
- [video] Introduction to Generative Adversarial Networks (for beginners and advanced Data Scientists)
- Aug 5, 2019.
Generative Adversarial Networks are driving important new technologies in deep learning methods. With so much to learn, these two videos will help you jump into your exploration with GANs and the mathematics behind the modelling.
- Machine Learning is Happening Now: A Survey of Organizational Adoption, Implementation, and Investment
- Aug 5, 2019.
This is an excerpt from a survey which sought to evaluate the relevance of machine learning in operations today, assess the current state of machine learning adoption and to identify tools used for machine learning. A link to the full report is inside.
- Getting Started With Data Science
- Aug 5, 2019.
Over the past several months, I’ve received hundreds of messages from people asking me how they could get started with Data Science. Therefore, I thought it would be useful to write down a framework for those wanting to get started with Data Science.
- Statistical Thinking for Industrial Problem Solving (STIPS) – a free online course.
- Aug 2, 2019.
This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.
- Pytorch Cheat Sheet for Beginners and Udacity Deep Learning Nanodegree
- Aug 2, 2019.
This cheatsheet should be easier to digest than the official documentation and should serve as a transitional tool that gets students and beginners reading the official documentation sooner.
- GPU Accelerated Data Analytics & Machine Learning
- Aug 2, 2019.
The future is here! Speed up your Machine Learning workflow using Python RAPIDS libraries support.
- What 70% of Data Science Learners Do Wrong
- Aug 2, 2019.
Lessons learned from repeatedly smashing my head with a 2-meter long metal pole for a college engineering course.
- Easily Deploy Deep Learning Models in Production
- Aug 1, 2019.
Getting trained neural networks to be deployed in applications and services can pose challenges for infrastructure managers. Challenges like multiple frameworks, underutilized infrastructure and lack of standard implementations can even cause AI projects to fail. This blog explores how to navigate these challenges.
- Opening Black Boxes: How to leverage Explainable Machine Learning
- Aug 1, 2019.
A machine learning model that predicts some outcome provides value. One that explains why it made the prediction creates even more value for your stakeholders. Learn how Interpretable and Explainable ML technologies can help while developing your model.
- How a simple mix of object-oriented programming can sharpen your deep learning prototype
- Aug 1, 2019.
By mixing in simple concepts of object-oriented programming, like functionalization and class inheritance, you can add immense value to deep learning prototype code.
- A 2019 Guide to Object Detection
- Aug 1, 2019.
Object detection has been applied widely in video surveillance, self-driving cars, and object/people tracking. In this piece, we’ll look at the basics of object detection and review some of the most commonly-used algorithms and a few brand new approaches, as well.