2020 Dec Tutorials, Overviews
- Meet whale! The stupidly simple data discovery tool, by Robert Yi - Dec 31, 2020.
Finding data and understanding its meaning represents the traditional "daily grind" of a Data Scientist. With whale, the new lightweight data discovery, documentation, and quality engine for your data warehouse that is under development by Dataframe, your data science team will more efficiently search data and automate its data metrics.
- 15 Free Data Science, Machine Learning & Statistics eBooks for 2021, by Matthew Mayo - Dec 31, 2020.
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
- Data Science as a Product – Why Is It So Hard?, by Tad Slaff - Dec 30, 2020.
Developing machine learning models as products that deliver business value remains a new field with uncharted paths toward success. Applying well-established software development approaches, such as agile, is not straightforward, but may still offer a solid foundation to guide success.
- Generating Beautiful Neural Network Visualizations, by Matthew Mayo - Dec 30, 2020.
If you are looking to easily generate visualizations of neural network architectures, PlotNeuralNet is a project you should check out.
- Key Data Science Algorithms Explained: From k-means to k-medoids clustering, by Arushi Prakash - Dec 29, 2020.
As a core method in the Data Scientist's toolbox, k-means clustering is valuable but can be limited based on the structure of the data. Can expanded methods like PAM (partitioning around medoids), CLARA, and CLARANS provide better solutions, and what is the future of these algorithms?
- Essential Math for Data Science: The Poisson Distribution, by Hadrien Jean - Dec 29, 2020.
The Poisson distribution, named after the French mathematician Siméon Denis Poisson, is a discrete distribution function describing the probability that an event will occur a certain number of times in a fixed time (or space) interval.
- 2020: A Year Full of Amazing AI Papers — A Review, by Louis (What's AI) Bouchard - Dec 28, 2020.
So much happened in the world during 2020 that it may have been easy to miss the great progress in the world of AI. To catch you up quickly, check out this curated list of the latest breakthroughs in AI by release date, along with a video explanation, link to an in-depth article, and code.
- Monte Carlo integration in Python, by Tirthajyoti Sarkar - Dec 24, 2020.
A famous Casino-inspired trick for data science, statistics, and all of science. How to do it in Python?
- SQL vs NoSQL: 7 Key Takeaways, by Alex Williams - Dec 23, 2020.
People assume that NoSQL is a counterpart to SQL. Instead, it’s a different type of database designed for use-cases where SQL is not ideal. The differences between the two are many, although some are so crucial that they define both databases at their cores.
- XGBoost: What it is, and when to use it, by Harish Krishna - Dec 23, 2020.
XGBoost is a tree-based ensemble machine learning algorithm and a scalable system for tree boosting. Read on for an overview of the parameters that make it work, and when you would use the algorithm.
- Resampling Imbalanced Data and Its Limits, by Maarit Widmann - Dec 22, 2020.
Can resampling tackle the problem of too few fraudulent transactions in credit card fraud detection?
- Feature Store vs Data Warehouse, by Jim Dowling - Dec 22, 2020.
A feature store is a data warehouse of features for machine learning. Unlike a data warehouse, it is dual-database: one database serves features at low latency to online applications and another stores large volumes of features. Learn how Data Scientists leverage this capability in production-deployed models.
- Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance, by Alejandro Saucedo - Dec 21, 2020.
A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.
- Fast and Intuitive Statistical Modeling with Pomegranate, by Tirthajyoti Sarkar - Dec 21, 2020.
Pomegranate is a delicious fruit. It can also be a super useful Python library for statistical analysis. We will show how in this article.
- Optimization Algorithms in Neural Networks, by Nagesh Singh Chauhan - Dec 18, 2020.
This article presents an overview of some of the most used optimizers while training a neural network.
- Undersampling Will Change the Base Rates of Your Model’s Predictions, by Bryan Shalloway - Dec 17, 2020.
In classification problems, the proportion of cases in each class largely determines the base rate of the predictions produced by the model. Therefore if you use sampling techniques that change this proportion, there is a good chance you will want to rescale / calibrate your predictions before using them in the wild.
- Crack SQL Interviews, by Xinran Waibel - Dec 17, 2020.
SQL is an essential programming language for data analysis and processing. So, SQL questions are always part of the interview process for data science-related jobs, including data analysts, data scientists, and data engineers. Become familiar with these common patterns seen in SQL interview questions and follow our tips on how to neatly handle each with SQL queries.
- 8 Places for Data Professionals to Find Datasets, by Devin Partida - Dec 17, 2020.
Here is a curated list of sites and resources invaluable for data professionals to acquire practice datasets.
- How to use Machine Learning for Anomaly Detection and Conditional Monitoring, by Michael Garbade - Dec 16, 2020.
This article explains the goals of anomaly detection and outlines the approaches used to solve specific use cases for anomaly detection and condition monitoring.
- How to Clean Text Data at the Command Line, by Ezz El Din Abdullah - Dec 16, 2020.
A basic tutorial about cleaning data using command-line tools: tr, grep, sort, uniq, awk, sed, and csvlook.
- Data Science and Machine Learning: The Free eBook, by Matthew Mayo - Dec 15, 2020.
Check out the newest addition to our free eBook collection, Data Science and Machine Learning: Mathematical and Statistical Methods, and start building your statistical learning foundation today.
- How to Create Custom Real-time Plots in Deep Learning, by Tirthajyoti Sarkar - Dec 14, 2020.
How to generate real-time visualizations of custom metrics while training a deep learning model using Keras callbacks.
- Facebook Open Sources ReBeL, a New Reinforcement Learning Agent, by Jesus Rodriguez - Dec 14, 2020.
The new model tries to recreate the reinforcement learning and search methods used by AlphaZero in imperfect information scenarios.
- Matrix Decomposition Decoded, by Tanveer Sayyed - Dec 11, 2020.
This article covers matrix decomposition and the underlying concepts of eigenvalues (lambdas) and eigenvectors, and discusses the purpose behind using matrices and vectors in linear algebra.
- Data Science Volunteering: Ways to Help, by Susan Sivek - Dec 11, 2020.
No matter the field in which you hold some expertise, sharing your skills to benefit the lives of others or supporting non-profit organizations that try to make the world a better place is a noble and time-worthy personal pursuit. Many opportunities exist in data science to contribute to meaningful projects and crucial needs from your local community to a global scale.
- A Rising Library Beating Pandas in Performance, by Ezz El Din Abdullah - Dec 11, 2020.
This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.
- 10 Python Skills They Don’t Teach in Bootcamp - Dec 11, 2020.
Ascend to new heights in Data Science and Machine Learning with this thrilling list of coding tips.
- Implementing the AdaBoost Algorithm From Scratch - Dec 10, 2020.
The AdaBoost technique uses a decision tree model with a depth equal to one; it is a forest of stumps rather than trees. AdaBoost works by putting more weight on difficult-to-classify instances and less on those already handled well. The algorithm was developed to solve both classification and regression problems. Learn to build it from scratch here.
- Data Compression via Dimensionality Reduction: 3 Main Methods - Dec 10, 2020.
Lift the curse of dimensionality by mastering the application of three important techniques that will help you reduce the dimensionality of your data, even if it is not linearly separable.
- AI registers: finally, a tool to increase transparency in AI/ML - Dec 9, 2020.
Transparency, explainability, and trust are pressing topics in AI/ML today. While much has been written about why they are important and what you need to do, no tools have existed until now.
- R or Python? Why Not Both? - Dec 9, 2020.
Do you use both R and Python, either in different projects or in the same? Check out prython, an IDE designed to handle your needs.
- 20 Core Data Science Concepts for Beginners, by Benjamin Obi Tayo - Dec 8, 2020.
With so much to learn and so many advancements to follow in the field of data science, there is a core set of foundational concepts that remains essential. Twenty of these ideas are highlighted here that are key to review when preparing for a job interview or just to refresh your appreciation of the basics.
- 5 Free Books to Learn Statistics for Data Science - Dec 8, 2020.
Learn all the statistics you need for data science for free.
- Merging Pandas DataFrames in Python - Dec 8, 2020.
A quick how-to guide for merging Pandas DataFrames in Python.
- Essential Math for Data Science: Probability Density and Probability Mass Functions - Dec 7, 2020.
In this sample, we’ll cover probability mass and probability density functions. You’ll see how to understand and represent these distribution functions and their link with histograms.
- The Ultimate Guide to Data Engineer Interviews - Dec 7, 2020.
If you are preparing for data engineering interviews, then follow these technical recommendations regarding your resume, programming skills, SQL acumen, and system design problem-solving, as well as the non-technical aspects of your upcoming interview session.
- Change the Background of Any Video with 5 Lines of Code - Dec 7, 2020.
Learn to blur, color, grayscale and create a virtual background for a video with PixelLib.
- Pruning Machine Learning Models in TensorFlow - Dec 4, 2020.
Read this overview to learn how to make your models smaller via pruning.
- Introduction to Data Engineering, by Xinran Waibel - Dec 3, 2020.
A Q&A covering the most frequently asked questions about Data Engineering: What does a data engineer do? What is a data pipeline? What is a data warehouse? How is a data engineer different from a data scientist? What skills and programming languages do you need to learn to become a data engineer?
- 10 Python Skills for Beginners - Dec 3, 2020.
Python is the fastest growing, most-beloved programming language. Get started with these Data Science tips.
- Building AI Models for High-Frequency Streaming Data - Dec 2, 2020.
This post is the first in a two-part series on AI for streaming data. Here, we’ll walk through strategies for aligning times and resampling the data.
- Simple & Intuitive Ensemble Learning in R - Dec 2, 2020.
Read about metaEnsembleR, an R package for heterogeneous ensemble meta-learning (classification and regression) that is fully-automated.
- Roadmaps to becoming a Full-Stack AI Developer, Data Scientist, Machine Learning Engineer, and more - Dec 2, 2020.
As the fields related to AI and Data Science expand, they are becoming complex with more options and specializations to consider. If you are beginning your journey toward becoming an expert in Artificial Intelligence, this roadmap will guide you to find your path along what to learn next while steering clear of the latest hype.
- NoSQL for Beginners - Dec 2, 2020.
NoSQL can offer an advantage to those who are entering Data Science and Analytics, and it suits applications with high-performance needs that aren’t met by traditional SQL databases.
- Remembering Pluribus: The Techniques that Facebook Used to Master World’s Most Difficult Poker Game - Dec 1, 2020.
Pluribus used incredibly simple AI methods to set new records in six-player no-limit Texas Hold’em poker. How did it do it?
- 14 Data Science projects to improve your skills - Dec 1, 2020.
There's a lot of data out there and so many data science techniques to master or review. Check out these great project ideas from easy to advanced difficulty levels to develop new skills and strengthen your portfolio.
- Object-Oriented Programming Explained Simply for Data Scientists - Dec 1, 2020.
Read this simple but effective guide to start using Classes in Python 3.