2021 Feb

All (59) | Opinions (8) | Products, Services (4) | Tutorials, Overviews (47)

Graph Databases, Explained

Between the four main NoSQL database types, graph databases are widely appreciated for their application in handling large sets of unstructured data coming from various sources. Let’s talk about how graph databases work and what are their practical uses.

By Alex Williams on Feb 26, 2021 in Beginners, Databases, Graph Databases, NoSQL
Data Science Learning Roadmap for 2021

Venturing into the world of Data Science is an exciting, interesting, and rewarding path to consider. There is a great deal to master, and this self-learning recommendation plan will guide you toward establishing a solid understanding of all that is foundational to data science as well as a solid portfolio to showcase your developed expertise.

By Harshit Tyagi on Feb 26, 2021 in Data Engineering, Data Preparation, Data Science, Data Science Education, Python, Roadmap, SQL
Machine Learning Systems Design: A Free Stanford Course

This freely-available course from Stanford should give you a toolkit for designing machine learning systems.

By Matthew Mayo on Feb 26, 2021 in Courses, Deployment, Design, Machine Learning, Maintenance, Stanford
5 Supporting Skills That Can Help You Get a Data Science Job

If you want to stand out among your fellow applicants, here are some supporting skills you should develop.

By Devin Partida on Feb 25, 2021 in Career Advice, Data Science, Data Science Skills, Skills
6 Web Scraping Tools That Make Collecting Data A Breeze

The first step of any data science project is data collection. While it can be the most tedious and time-consuming step during your workflow, there will be no project without that data. If you are scraping information from the web, then several great tools exist that can save you a lot of time, money, and effort.

By Sara Metwalli on Feb 25, 2021 in Data Curation, Data Preparation, Data Workflow, Web Scraping
The Difficulty of Graph Anonymisation

Lessons from network science and the difficulty of graph anonymization. A data scientist's take on the difficultly of striking a balance between privacy and utility in anonymizing connected data.

By Timothy Lin on Feb 25, 2021 in Anonymized, Data Science, Graph Analytics, Graphs, Privacy, Singapore
How Reading Papers Helps You Be a More Effective Data Scientist

By reading papers, we were able to learn what others (e.g., LinkedIn) have found to work (and not work). We can then adapt their approach and not have to reinvent the rocket. This helps us deliver a working solution with lesser time and effort.

By Eugene Yan on Feb 24, 2021 in Career Advice, Data Science, Data Scientist, Research
Pandas Profiling: One-Line Magical Code for EDA

EDA can be automated using a Python library called Pandas Profiling. Let’s explore Pandas profiling to do EDA in a very short time and with just a single line code.

By Juhi Sharma on Feb 24, 2021 in Data Analysis, Data Exploration, Data Science, Pandas, Python
Using NLP to improve your Resume

This article discusses performing keyword matching and text analysis on job descriptions.

By David Moore on Feb 23, 2021 in Career Advice, NLP, Resume, Text Analysis
10 Statistical Concepts You Should Know For Data Science Interviews

Data Science is founded on time-honored concepts from statistics and probability theory. Having a strong understanding of the ten ideas and techniques highlighted here is key to your career in the field, and also a favorite topic for concept checks during interviews.

By Terence Shin on Feb 23, 2021 in Bayes Theorem, Interview Questions, Linear Regression, Logistic Regression, P-value, Sampling, Statistics
Data Observability, Part II: How to Build Your Own Data Quality Monitors Using SQL

Using schema and lineage to understand the root cause of your data anomalies.

By Moses & Kearns on Feb 23, 2021 in Data Engineering, Data Quality, Data Science, Data Science Platform, SQL
An overview of synthetic data types and generation methods

Synthetic data can be used to test new products and services, validate models, or test performances because it mimics the statistical property of production data. Today you'll find different types of structured and unstructured synthetic data.

By Devaux & Wehmeyer on Feb 22, 2021 in Autoencoder, GANs, Generative Adversarial Network, Synthetic Data
Powerful Exploratory Data Analysis in just two lines of code

EDA is a fundamental early process for any Data Science investigation. Typical approaches for visualization and exploration are powerful, but can be cumbersome for getting to the heart of your data. Now, you can get to know your data much faster with only a few lines of code... and it might even be fun!

By Francois Bertrand on Feb 22, 2021 in Data Analysis, Data Exploration, Data Visualization, Python
Inside the Architecture Powering Data Quality Management at Uber

Data Quality Monitor implements novel statistical methods for anomaly detection and quality management in large data infrastructures.

By Jesus Rodriguez on Feb 22, 2021 in Architecture, Data Quality, Uber
Cartoon: Data Scientist vs Data Engineer

New KDnuggets Cartoon examines the problems of Data Scientists vs Data Engineers.

By Gregory Piatetsky on Feb 20, 2021 in Cartoon, Data Engineer, Data Scientist, Humor
People Skills for Analytical Thinkers

Research shows that people skills are becoming more important with the rise of AI. A great way to boost these skills is by reading the new book: People Skills for Analytical Thinkers.

By Gilbert Eijkelenboom on Feb 19, 2021 in Analytics, Book, Communication, Data Science, Data Science Skills
Evaluating Deep Learning Models: The Confusion Matrix, Accuracy, Precision, and Recall

This tutorial discusses the confusion matrix, and how the precision, recall and accuracy are calculated, and how they relate to evaluating deep learning models.

By Ahmed Gad on Feb 19, 2021 in Accuracy, Confusion Matrix, Deep Learning, Metrics, Precision, Recall
Feature Store as a Foundation for Machine Learning

With so many organizations now taking the leap into building production-level machine learning models, many lessons learned are coming to light about the supporting infrastructure. For a variety of important types of use cases, maintaining a centralized feature store is essential for higher ROI and faster delivery to market. In this review, the current feature store landscape is described, and you can learn how to architect one into your MLOps pipeline.

By German Osin on Feb 19, 2021 in Data Engineering, Data Infrastructure, Data Lake, Feature Engineering, Feature Store, Machine Learning, Metadata, MLOps, Pipeline
Multidimensional multi-sensor time-series data analysis framework

This blog post provides an overview of the package “msda” useful for time-series sensor data analysis. A quick introduction about time-series data is also provided.

By Ajay Arunachalam on Feb 19, 2021 in Data Analysis, Python, Sensors, Time Series
Approaching (Almost) Any Machine Learning Problem

This freely-available book is a fantastic walkthrough of practical approaches to machine learning problems.

By Matthew Mayo on Feb 18, 2021 in Deep Learning, Free ebook, Machine Learning, Python
6 Data Science Certificates To Level Up Your Career

Anyone looking to obtain a data science certificate to prove their ability in the field will find a range of options exist. We review several valuable certificates to consider that will definitely pump up your resume and portfolio to get you closer to your dream job.

By Sara Metwalli on Feb 18, 2021 in Career Advice, Certificate, Cloudera, Data Science Certificate, Google, IBM, Microsoft Azure, SAS, TensorFlow
Forecasting Stories 5: The story of the launch

New products forecasting can be very difficult - there is no history to start with, and hence no base line. The number of assumptions can be huge. The best way to forecast then, is to try parallel approaches, build different views and triangulate on a common range.

By Rajneet Kaur on Feb 18, 2021 in Analytics, Business, Forecasting
Distributed and Scalable Machine Learning [Webinar]

Mike McCarty and Gil Forsyth work at the Capital One Center for Machine Learning, where they are building internal PyData libraries that scale with Dask and RAPIDS. For this webinar, Feb 23 @ 2 pm PST, 5pm EST, they’ll join Hugo Bowne-Anderson and Matthew Rocklin to discuss their journey to scale data science and machine learning in Python.

By Coiled.io on Feb 17, 2021 in Capital One, Dask, Distributed, Machine Learning, Python, scikit-learn, XGBoost
GPT-2 vs GPT-3: The OpenAI Showdown

Thanks to the diversity of the dataset used in the training process, we can obtain adequate text generation for text from a variety of domains. GPT-2 is 10x the parameters and 10x the data of its predecessor GPT.

By Kevin Vu on Feb 17, 2021 in GPT-2, GPT-3, Natural Language Generation, NLP, OpenAI, Transformer
10 resources for data science self-study

Many resources exist for the self-study of data science. In our modern age of information technology, an enormous amount of free learning resources are available to anyone, and with effort and dedication, you can master the fundamentals of data science.

By Benjamin Obi Tayo on Feb 17, 2021 in Data Science, Data Science Certificate, Data Science Education, Kaggle, MOOC, Python, Youtube
Deep Learning-based Real-time Video Processing

In this article, we explore how to build a pipeline and process real-time video with Deep Learning to apply this approach to business use cases overviewed in our research.

By Serhii Maksymenko on Feb 17, 2021 in Computer Vision, Deep Learning, Neural Networks, Video recognition
Data Observability: Building Data Quality Monitors Using SQL

To trigger an alert when data breaks, data teams can leverage a tried and true tactic from our friends in software engineering: monitoring and observability. In this article, we walk through how you can create your own data quality monitors for freshness and distribution from scratch using SQL.

By Kearns & Moses on Feb 16, 2021 in Data Engineering, Data Quality, Data Science, Data Science Platform, SQL
Hugging Face Transformers Package – What Is It and How To Use It

The rapid development of Transformers have brought a new wave of powerful tools to natural language processing. These models are large and very expensive to train, so pre-trained versions are shared and leveraged by researchers and practitioners. Hugging Face offers a wide variety of pre-trained transformers as open-source libraries, and you can incorporate these with only one line of code.

By Nagesh Chauhan on Feb 16, 2021 in Deep Learning, Hugging Face, Natural Language Generation, NLP, PyTorch, TensorFlow, Transformer, Zero-shot Learning
Easy, Open-Source AutoML in Python with EvalML

We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.

By Dylan Sherry on Feb 16, 2021 in Automated Machine Learning, AutoML, Machine Learning, Open Source, Python
IBM Uses Continual Learning to Avoid The Amnesia Problem in Neural Networks

Using continual learning might avoid the famous catastrophic forgetting problem in neural networks.

By Jesus Rodriguez on Feb 15, 2021 in IBM, Learning, Neural Networks, Training
Essential Math for Data Science: Scalars and Vectors

Linear algebra is the branch of mathematics that studies vector spaces. You’ll see how vectors constitute vector spaces and how linear algebra applies linear transformations to these spaces. You’ll also learn the powerful relationship between sets of linear equations and vector equations.

By Hadrien Jean on Feb 12, 2021 in Data Science, Linear Algebra, Mathematics
6 NLP Techniques Every Data Scientist Should Know

Natural language processing has already begun to transform to way humans interact with computers, and its advances are moving rapidly. The field is built on core methods that must first be understood, with which you can then launch your data science projects to a new level of sophistication and value.

By Sara Metwalli on Feb 12, 2021 in NLP, Sentiment Analysis, Text Summarization, Topic Modeling
Column-Oriented Databases, Explained

NoSQL Databases have four distinct types. Key-value stores, document-stores, graph databases, and column-oriented databases. In this article, we’ll explore column-oriented databases, also known simply as “NoSQL columns”.

By Alex Williams on Feb 12, 2021 in Beginners, Databases, NoSQL, Programming
Online MS in Data Science from Northwestern

Advance your data science career with Northwestern. Build the essential technical, analytical, and leadership skills needed for careers in today's data-driven world in Northwestern's Master of Science in Data Science program. Apply now.

By Northwestern on Feb 11, 2021 in Data Science, MS in Data Science, Northwestern, Online Education
How to Speed up Scikit-Learn Model Training

Scikit-Learn is an easy to use a Python library for machine learning. However, sometimes scikit-learn models can take a long time to train. The question becomes, how do you create the best scikit-learn model in the least amount of time?

By Michael Galarnyk on Feb 11, 2021 in Distributed Systems, Hyperparameter, Machine Learning, Optimization, Parallelism, Python, scikit-learn, Training
Machine Learning – it’s all about assumptions

Just as with most things in life, assumptions can directly lead to success or failure. Similarly in machine learning, appreciating the assumed logic behind machine learning techniques will guide you toward applying the best tool for the data.

By Vishal Mendekar on Feb 11, 2021 in Algorithms, Decision Trees, K-nearest neighbors, Linear Regression, Logistic Regression, Machine Learning, Naive Bayes, SVM, XGBoost
A Critical Comparison of Machine Learning Platforms in an Evolving Market

There’s a clear inclination towards the MLaaS model across industries, given the fact that companies today have an option to select from a wide range of solutions that can cater to diverse business needs. Here is a look at 3 of the top ML platforms for data excellence.

By Vivek Jain on Feb 11, 2021 in Google Cloud, IBM Watson, Machine Learning, Microsoft Azure, Platform
Explore Molecular Engineering at UChicago

Today’s engineers need to be equipped with the tools to take on leadership positions across industries. The new master’s program at the University of Chicago’s Pritzker School of Molecular Engineering will provide you with a streamlined and flexible degree to give you broad exposure across science and engineering disciplines, while preparing you for the immediate next step in your professional journey.

By U. of Chicago on Feb 10, 2021 in Biomedical, Chicago, Engineering, Science, U. of Chicago
My machine learning model does not learn. What should I do?

This article presents 7 hints on how to get out of the quicksand.

By Silipo & Arenas on Feb 10, 2021 in Algorithms, Business Context, Data Quality, Hyperparameter, Machine Learning, Modeling, Tips
7 Most Recommended Skills to Learn to be a Data Scientist

The Data Scientist professional has emerged as a true interdisciplinary role that spans a variety of skills, theoretical and practical. For the core, day-to-day activities, many critical requirements that enable the delivery of real business value reach well outside the realm of machine learning, and should be mastered by those aspiring to the field.

By Terence Shin on Feb 10, 2021 in Career Advice, Data Science Skills, Data Scientist, Data Visualization, Docker, Pandas, Python, SQL
Data Science vs Business Intelligence, Explained

Knowing the differences between the business intelligence and data science is more than just a matter of semantics.

By Stan Pugsley on Feb 10, 2021 in BI, Business Intelligence, Data Science, Explained
How to Deploy a Flask API in Kubernetes and Connect it with Other Micro-services

A hands-on tutorial on how to implement your micro-service architecture using the powerful container orchestration tool Kubernetes.

By Rik Kraan on Feb 9, 2021 in API, Containers, Flask, Kubernetes, MySQL, Python, SQL
Who is fit to lead data science?

Data science success depends on leaders, not the latest hands-on programming skills. So, we need to start looking for the right leadership skills and stop stuffing job postings with requirements for experience in the most current development tools.

By Polly Mitchell-Guthrie on Feb 9, 2021 in Business, Career Advice, Data Leadership, Data Science, Data Scientist, TensorFlow
Adversarial Attacks on Explainable AI

Are explainability methods black-box themselves?

By Hubert Baniecki on Feb 9, 2021 in Adversarial, AI, Explainability, Explainable AI
Microsoft Explores Three Key Mysteries of Ensemble Learning

A new paper studies three key puzzling characteristics of deep learning ensembles and some potential explanations.

By Jesus Rodriguez on Feb 8, 2021 in Ensemble Methods, Machine Learning, Microsoft
How to Get Data Science Interviews: Finding Jobs, Reaching Gatekeepers, and Getting Referrals

In this post, the author shares what to do to get job interviews efficiently. Find answers to these questions: Where should I look for data science jobs? How do I reach out to the gatekeeper? How do I get referrals? What makes a good data science resume?

By Emma Ding on Feb 8, 2021 in Career Advice, Careers, Data Science, Jobs
The Best Data Science Project to Have in Your Portfolio

If you are trying to find your first path into a Data Science career, then demonstrating the quality of your skills can be the greatest hurdle. While many standard projects exist for anyone to complete, creating an original data-driven project that attempts to solve some challenge is worth so much more. A good Data Scientist is one that can solve data-related questions, and a great Data Scientist poses original data-related questions and then solves.

By Soner Yıldırım on Feb 8, 2021 in Career Advice, Data Science, Portfolio
Essential Math for Data Science: Introduction to Matrices and the Matrix Product

As vectors, matrices are data structures allowing you to organize numbers. They are square or rectangular arrays containing values organized in two dimensions: as rows and columns. You can think of them as a spreadsheet. Learn more here.

By Hadrien Jean on Feb 5, 2021 in Data Science, Linear Algebra, Mathematics, numpy, Python
Deep learning doesn’t need to be a black box

The cultural perception of AI is often suspect because of the described challenges in knowing why a deep neural network makes its predictions. So, researchers try to crack open this "black box" after a network is trained to correlate results with inputs. But, what if the goal of explainability could be designed into the network's architecture -- before the model is trained and without reducing its predictive power? Maybe the box could stay open from the beginning.

By Ben Dickson on Feb 5, 2021 in Convolutional Neural Networks, Deep Learning, Explainability, Explainable AI, Image Recognition
Backcasting: Building an Accurate Forecasting Model for Your Business

This article will shed some light on processes happening under the roof of ML-based solutions on the example of the business case where the future success directly depends on the ability to predict unknown values from the past.

By Lena Boichuk on Feb 5, 2021 in Business, Forecasting, Modeling
Build Your First Data Science Application

Check out these seven Python libraries to make your first data science MVP application.

By Naser Tamimi on Feb 4, 2021 in API, Data Science, Jupyter, Keras, numpy, Pandas, Plotly, Python, PyTorch, scikit-learn
How to create stunning visualizations using python from scratch

Data science and data analytics can be beautiful things. Not only because of the insights and enhancements to decision-making they can provide, but because of the rich visualizations about the data that can be created. Following this step-by-step guide using the Matplotlib and Seaborn libraries will help you improve the presentation and effective communication of your work.

By Sharan Kumar R on Feb 4, 2021 in Data Visualization, Matplotlib, Python, Seaborn
2011: DanNet triggers deep CNN revolution

In 2021, we are celebrating the 10-year anniversary of DanNet, which, in 2011, was the first pure deep convolutional neural network (CNN) to win computer vision contests. Read about its history here.

By Jürgen Schmidhuber on Feb 4, 2021 in AI, Convolutional Neural Networks, History, Jurgen Schmidhuber, Neural Networks
Getting Started with 5 Essential Natural Language Processing Libraries

This article is an overview of how to get started with 5 popular Python NLP libraries, from those for linguistic data visualization, to data preprocessing, to multi-task functionality, to state of the art language modeling, and beyond.

By Matthew Mayo on Feb 3, 2021 in Data Preparation, Data Preprocessing, Data Visualization, Hugging Face, NLP, Python, spaCy, Text Analytics, Transformer
Saving and loading models in TensorFlow — why it is important and how to do it

So much time and effort can go into training your machine learning models. But, shut down the notebook or system, and all those trained weights and more vanish with the memory flush. Saving your models to maximize reusability is key for efficient productivity.

By Ahmad Anis on Feb 3, 2021 in Deep Learning, Machine Learning, TensorFlow
Adversarial generation of extreme samples

In order to mitigate risks when modelling extreme events, it is vital to be able to generate a wide range of extreme, and realistic, scenarios. Researchers from the National University of Singapore and IIT Bombay have developed an approach to do just that.

By Lucy Smith on Feb 2, 2021 in AI, GANs, Generative Adversarial Network, Generative Models, Sampling
Vision Transformers: Natural Language Processing (NLP) Increases Efficiency and Model Generality

Why do we hear so little about transformer models applied to computer vision tasks? What about attention in computer vision networks?

By Kevin Vu on Feb 2, 2021 in Attention, Efficiency, Modeling, NLP, Transformer
One question to make your data project 10x more valuable

If you are the "data person" for your organization, then providing meaningful results to stakeholder data requests can sometimes feel like shots in the dark. However, you can make sure your data analysis is actionable by asking one magic question before getting started.

By Brittany Davis on Feb 1, 2021 in Advice, Business, Data Analysis, Data Mining, Data Science, Deployment, Problem Definition
Beyond the Nash Equilibrium: DeepMind Clever Strategy to Solve Asymmetric Games

The method expands the concept of a Nash equilibrium by decomposing an asymmetric game into multiple symmetric games.

By Jesus Rodriguez on Feb 1, 2021 in DeepMind, Game Theory, Google

2021 Feb

Latest Posts

Top Posts