2020 Dec Tutorials, Overviews


Meet whale! The stupidly simple data discovery tool

Finding data and understanding its meaning is the traditional "daily grind" of a Data Scientist. whale is a new lightweight data discovery, documentation, and quality engine for your data warehouse, under development by Dataframe. With it, your data science team can search data more efficiently and automate its data metrics.

By Robert Yi on Dec 31, 2020 in Data Curation, Data Discovery, Data Preparation, Data Warehouse
15 Free Data Science, Machine Learning & Statistics eBooks for 2021

We present a curated list of 15 free eBooks compiled in a single location to close out the year.

By Matthew Mayo on Dec 31, 2020 in Automated Machine Learning, Data Science, Deep Learning, Free ebook, Machine Learning, NLP, Python, R, Statistics
Data Science as a Product – Why Is It So Hard?

Developing machine learning models as products that deliver business value remains a new field with uncharted paths toward success. Applying well-established software development approaches, such as agile, is not straightforward, but may still offer a solid foundation to guide success.

By Tad Slaff on Dec 30, 2020 in Agile, Data Science, Deployment, Product
Generating Beautiful Neural Network Visualizations

If you are looking to easily generate visualizations of neural network architectures, PlotNeuralNet is a project you should check out.

By Matthew Mayo on Dec 30, 2020 in Neural Networks, Python, Visualization
Key Data Science Algorithms Explained: From k-means to k-medoids clustering

As a core method in the Data Scientist's toolbox, k-means clustering is valuable but can be limited based on the structure of the data. Can expanded methods like PAM (partitioning around medoids), CLARA, and CLARANS provide better solutions, and what is the future of these algorithms?

By Arushi Prakash on Dec 29, 2020 in Algorithms, Clustering, Explained, K-means
Essential Math for Data Science: The Poisson Distribution

The Poisson distribution, named after the French mathematician Siméon Denis Poisson, is a discrete distribution function describing the probability that an event will occur a certain number of times in a fixed time (or space) interval.

By Hadrien Jean on Dec 29, 2020 in Data Science, Distribution, Mathematics, Poisson Distribution
2020: A Year Full of Amazing AI Papers — A Review

So much happened in the world during 2020 that it may have been easy to miss the great progress in the world of AI. To catch you up quickly, check out this curated list of the latest breakthroughs in AI by release date, along with a video explanation, link to an in-depth article, and code.

By Louis (What's AI) Bouchard on Dec 28, 2020 in AI, Research, Trends
Monte Carlo integration in Python

A famous casino-inspired trick for data science, statistics, and all of science. How do you do it in Python?

By Tirthajyoti Sarkar on Dec 24, 2020 in Monte Carlo, Python, Simulation, Statistics
XGBoost: What it is, and when to use it

XGBoost is a tree-based ensemble machine learning algorithm and a scalable system for tree boosting. Read on for an overview of the parameters that make it work, and when you would use the algorithm.

By Harish Krishna on Dec 23, 2020 in Algorithms, Ensemble Methods, XGBoost
Resampling Imbalanced Data and Its Limits

Can resampling tackle the problem of too few fraudulent transactions in credit card fraud detection?

By Maarit Widmann on Dec 22, 2020 in Balancing Classes, Bootstrap sampling, Fraud Detection, Knime, Sampling, Unbalanced
Feature Store vs Data Warehouse

A feature store is a data warehouse of features for machine learning. Unlike a data warehouse, it is dual-database: one database serves features at low latency to online applications and another stores large volumes of features. Learn how Data Scientists leverage this capability in production-deployed models.

By Jim Dowling on Dec 22, 2020 in Data Warehouse, Databases, Feature Store, Pipeline
Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance

A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.

By Alejandro Saucedo on Dec 21, 2020 in AI, Deployment, Explainable AI, Machine Learning, Modeling, Outliers, Production, Python
Fast and Intuitive Statistical Modeling with Pomegranate

Pomegranate is a delicious fruit. It can also be a super useful Python library for statistical analysis. We will show how in this article.

By Tirthajyoti Sarkar on Dec 21, 2020 in Distribution, Markov Chains, Probability, Python, Statistical Modeling
Optimization Algorithms in Neural Networks

This article presents an overview of some of the most commonly used optimizers for training a neural network.

By Nagesh Singh Chauhan on Dec 18, 2020 in Gradient Descent, Neural Networks, Optimization
Undersampling Will Change the Base Rates of Your Model’s Predictions

In classification problems, the proportion of cases in each class largely determines the base rate of the predictions produced by the model. Therefore, if you use sampling techniques that change this proportion, there is a good chance you will want to rescale or calibrate your predictions before using them in the wild.

By Bryan Shalloway on Dec 17, 2020 in Classification, Modeling, Predictions, R, Sampling
Crack SQL Interviews

SQL is an essential programming language for data analysis and processing. So, SQL questions are always part of the interview process for data science-related jobs, including data analysts, data scientists, and data engineers. Become familiar with these common patterns seen in SQL interview questions and follow our tips on how to neatly handle each with SQL queries.

By Xinran Waibel on Dec 17, 2020 in Interview Questions, SQL
8 Places for Data Professionals to Find Datasets

Here is a curated list of sites and resources invaluable for data professionals to acquire practice datasets.

By Devin Partida on Dec 17, 2020 in Data Science, Datasets, Google, Government, Kaggle, Reddit, UCI
How to use Machine Learning for Anomaly Detection and Conditional Monitoring

This article explains the goals of anomaly detection and outlines the approaches used to solve specific use cases for anomaly detection and condition monitoring.

By Michael Garbade on Dec 16, 2020 in Anomaly Detection, Machine Learning, Python, scikit-learn, Unsupervised Learning
How to Clean Text Data at the Command Line

A basic tutorial about cleaning data using command-line tools: tr, grep, sort, uniq, awk, sed, and csvlook.

By Ezz El Din Abdullah on Dec 16, 2020 in Data Preprocessing, Data Processing, NLP, Text Analytics
Data Science and Machine Learning: The Free eBook

Check out the newest addition to our free eBook collection, Data Science and Machine Learning: Mathematical and Statistical Methods, and start building your statistical learning foundation today.

By Matthew Mayo on Dec 15, 2020 in Data Science, Free ebook, Machine Learning, Python
How to Create Custom Real-time Plots in Deep Learning

How to generate real-time visualizations of custom metrics while training a deep learning model using Keras callbacks.

By Tirthajyoti Sarkar on Dec 14, 2020 in Data Visualization, Deep Learning, Keras, Metrics, Neural Networks, Python
Facebook Open Sources ReBeL, a New Reinforcement Learning Agent

The new model tries to recreate the reinforcement learning and search methods used by AlphaZero in imperfect information scenarios.

By Jesus Rodriguez on Dec 14, 2020 in Agents, AI, Facebook, Open Source, Reinforcement Learning
Matrix Decomposition Decoded

This article covers matrix decomposition and the underlying concepts of eigenvalues (lambdas) and eigenvectors, and discusses the purpose of using matrices and vectors in linear algebra.

By Tanveer Sayyed on Dec 11, 2020 in Linear Algebra, Mathematics, numpy, PCA, Python
Data Science Volunteering: Ways to Help

No matter the field in which you hold some expertise, sharing your skills to benefit the lives of others or supporting non-profit organizations that try to make the world a better place is a noble and time-worthy personal pursuit. Many opportunities exist in data science to contribute to meaningful projects and crucial needs from your local community to a global scale.

By Susan Sivek on Dec 11, 2020 in Alteryx, Data Science, Social Good
A Rising Library Beating Pandas in Performance

This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.

By Ezz El Din Abdullah on Dec 11, 2020 in Data Processing, Pandas, Performance, Python
10 Python Skills They Don’t Teach in Bootcamp

Ascend to new heights in Data Science and Machine Learning with this thrilling list of coding tips.

on Dec 11, 2020 in Bootcamp, Programming, Python
Implementing the AdaBoost Algorithm From Scratch

The AdaBoost technique uses decision trees with a depth of one as its base model, so it is a forest of stumps rather than full trees. It works by putting more weight on difficult-to-classify instances and less on those already handled well. The algorithm was developed to solve both classification and regression problems. Learn to build it from scratch here.

on Dec 10, 2020 in Adaboost, Algorithms, Ensemble Methods, Machine Learning, Python
Data Compression via Dimensionality Reduction: 3 Main Methods

Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.

on Dec 10, 2020 in Compression, Dimensionality Reduction, LDA, PCA, Python
AI registers: finally, a tool to increase transparency in AI/ML

Transparency, explainability, and trust are pressing topics in AI/ML today. While much has been written about why they are important and what you need to do, no tools have existed until now.

on Dec 9, 2020 in AI, Bias, Ethics, Explainability, Helsinki, Machine Learning, Trust
R or Python? Why Not Both?

Do you use both R and Python, either in different projects or in the same? Check out prython, an IDE designed to handle your needs.

on Dec 9, 2020 in Data Analysis, Data Science, IDE, Programming, Python, R
20 Core Data Science Concepts for Beginners

With so much to learn and so many advancements to follow in the field of data science, a core set of foundational concepts remains essential. Twenty of these ideas are highlighted here; they are key to review when preparing for a job interview or just to refresh your appreciation of the basics.

By Benjamin Obi Tayo on Dec 8, 2020 in Beginners, Bias, Cross-validation, Data Science, Data Visualization, Data Wrangling, Outliers, PCA, Variance
5 Free Books to Learn Statistics for Data Science

Learn all the statistics you need for data science for free.

on Dec 8, 2020 in Data Science, Free ebook, Statistics
Merging Pandas DataFrames in Python

A quick how-to guide for merging Pandas DataFrames in Python.

on Dec 8, 2020 in Data Preparation, Data Preprocessing, Data Processing, Pandas, Python
Essential Math for Data Science: Probability Density and Probability Mass Functions

In this sample, we’ll cover probability mass and probability density functions. You’ll see how to understand and represent these distribution functions and their link with histograms.

on Dec 7, 2020 in Data Science, Mathematics, Probability, Statistics
The Ultimate Guide to Data Engineer Interviews

If you are preparing for data engineering interviews, then follow these technical recommendations regarding your resume, programming skills, SQL acumen, and system design problem-solving, as well as the non-technical aspects of your upcoming interview session.

on Dec 7, 2020 in Career Advice, Data Engineer, Data Engineering, Interview Questions, Programming, SQL
Change the Background of Any Video with 5 Lines of Code

Learn to blur, color, grayscale and create a virtual background for a video with PixelLib.

on Dec 7, 2020 in Computer Vision, Image Processing, Machine Learning, Python, Segmentation, Video
Pruning Machine Learning Models in TensorFlow

Read this overview to learn how to make your models smaller via pruning.

on Dec 4, 2020 in Machine Learning, Modeling, Python, TensorFlow
Introduction to Data Engineering

A Q&A covering the most frequently asked questions about Data Engineering: What does a data engineer do? What is a data pipeline? What is a data warehouse? How is a data engineer different from a data scientist? What skills and programming languages do you need to learn to become a data engineer?

By Xinran Waibel on Dec 3, 2020 in Analytics, Data Engineer, Data Engineering, Data Science, Skills
10 Python Skills for Beginners

Python is the fastest growing, most-beloved programming language. Get started with these Data Science tips.

on Dec 3, 2020 in Data Science, Programming, Python, Tips
Building AI Models for High-Frequency Streaming Data

This post is the first in a two-part series on AI for streaming data. Here, we’ll walk through strategies for aligning times and resampling the data.

on Dec 2, 2020 in MathWorks, MATLAB, Streaming Analytics, Time Series
Simple & Intuitive Ensemble Learning in R

Read about metaEnsembleR, a fully automated R package for heterogeneous ensemble meta-learning (classification and regression).

on Dec 2, 2020 in Classification, Ensemble Methods, R, Regression
Roadmaps to becoming a Full-Stack AI Developer, Data Scientist, Machine Learning Engineer, and more

As the fields related to AI and Data Science expand, they are becoming more complex, with more options and specializations to consider. If you are beginning your journey toward becoming an expert in Artificial Intelligence, this roadmap will help you decide what to learn next while steering clear of the latest hype.

on Dec 2, 2020 in Advice, AI, Career, Data Scientist, Developer, Learning Path, Machine Learning Engineer, Roadmap
NoSQL for Beginners

NoSQL can offer an advantage to those who are entering Data Science and Analytics, and it suits applications with high-performance needs that aren’t met by traditional SQL databases.

on Dec 2, 2020 in Beginners, Data Science, Database, NoSQL
Remembering Pluribus: The Techniques that Facebook Used to Master World’s Most Difficult Poker Game

Pluribus used incredibly simple AI methods to set new records in six-player no-limit Texas Hold’em poker. How did it do it?

on Dec 1, 2020 in AI, Facebook, Poker
14 Data Science projects to improve your skills

There's a lot of data out there and so many data science techniques to master or review. Check out these great project ideas from easy to advanced difficulty levels to develop new skills and strengthen your portfolio.

on Dec 1, 2020 in Data Exploration, Data Science, Data Science Skills, Data Visualization, Prediction, Project
Object-Oriented Programming Explained Simply for Data Scientists

Read this simple but effective guide to start using Classes in Python 3.

on Dec 1, 2020 in Data Science, Data Scientist, Explained, Programming, Python
