Tutorials, Overviews
Latest:
-
Essential Math for Data Science: Information Theory - Jan 15, 2021.
In the context of machine learning, some of the concepts of information theory are used to characterize or compare probability distributions. Read up on the underlying math to gain a solid understanding of relevant aspects of information theory. -
K-Means 8x faster, 27x lower error than Scikit-learn in 25 lines - Jan 15, 2021.
K-means clustering is a powerful algorithm for similarity searches, and Facebook AI Research's faiss library is turning out to be a speed champion. With only a handful of lines of code shared in this demonstration, faiss outperforms the implementation in scikit-learn in speed and accuracy. -
Cleaner Data Analysis with Pandas Using Pipes - Jan 15, 2021.
Check out this practical guide on Pandas pipes. -
Data Cleaning and Wrangling in SQL - Jan 14, 2021.
SQL is a foundational skill for data analysts but its application is sometimes limited within the data pipeline. However, SQL can be successfully used for many pre-processing tasks, such as data cleaning and wrangling, as demonstrated here by example. -
Unsupervised Learning for Predictive Maintenance using Auto-Encoders - Jan 14, 2021.
This article outlines a machine learning approach to detect and diagnose anomalies in the context of machine maintenance, along with a number of introductory concepts, including: Introduction to machine maintenance; What is predictive maintenance?; Approaches for machine diagnosis; Machine diagnosis using machine learning -
Creating Good Meaningful Plots: Some Principles - Jan 12, 2021.
Here are some thought starters to help you create meaningful plots. -
Working With Sparse Features In Machine Learning Models - Jan 12, 2021.
Sparse features can cause problems like overfitting and suboptimal results in learning models, and understanding why this happens is crucial when developing models. Multiple methods, including dimensionality reduction, are available to overcome issues due to sparse features. -
Cloud Data Warehouse is The Future of Data Storage - Jan 12, 2021.
Today, cloud data storage accounts for 45% of all enterprise data, and by Q2 2021 that number could grow to 53%. There is no better time to embrace the cloud than now. -
Attention mechanism in Deep Learning, Explained - Jan 11, 2021.
Attention is a powerful mechanism developed to enhance the performance of the Encoder-Decoder architecture on neural network-based machine translation tasks. Learn more about how this process works and how to implement the approach into your work. -
OpenAI Releases Two Transformer Models that Magically Link Language and Computer Vision - Jan 11, 2021.
OpenAI has released two new transformer architectures that combine image and language tasks in a fun and almost magical way. Read more about them here. -
JupyterLab 3 is Here: Key reasons to upgrade now - Jan 8, 2021.
Read about these 3 reasons for checking out JupyterLab 3 today. -
Best Python IDEs and Code Editors You Should Know - Jan 8, 2021.
Developing machine learning algorithms requires implementing countless libraries and integrating many supporting tools and software packages. All this magic must be written by you in yet another tool -- the IDE -- that is fundamental to all your code work and can drive your productivity. These top Python IDEs and code editors are among the best tools available for you to consider, and are reviewed with their noteworthy features. -
Top 10 Computer Vision Papers 2020 - Jan 8, 2021.
The top 10 computer vision papers in 2020 with video demos, articles, code, and paper reference. -
Advice to aspiring Data Scientists – your most common questions answered - Jan 7, 2021.
Embarking on a new career path can be daunting with many unknowns about how to get started and how to be successful. If you are aspiring to become a Data Scientist, then the answers to these common questions can help set you off on the right foot. -
10 Underappreciated Python Packages for Machine Learning Practitioners - Jan 7, 2021.
Here are 10 underappreciated Python packages covering neural architecture design, calibration, UI creation and dissemination. -
CatalyzeX: A must-have browser extension for machine learning engineers and researchers - Jan 6, 2021.
CatalyzeX is a free browser extension that finds code implementations for ML/AI papers anywhere on the internet (Google, Arxiv, Twitter, Scholar, and other sites). -
Learn Data Science for free in 2021 - Jan 6, 2021.
If you are considering starting a career path in machine learning and data science, then there is a great deal to learn theoretically, along with gaining practical skills in applying a broad range of techniques. This comprehensive learning plan will guide you to start on this path, and it is all available for free. -
MLOps: Model Monitoring 101 - Jan 6, 2021.
Model monitoring using a model metric stack is essential to put a feedback loop from a deployed ML model back to the model building stage so that ML models can constantly improve themselves under different scenarios. -
Model Experiments, Tracking and Registration using MLflow on Databricks - Jan 5, 2021.
This post covers how StreamSets can help expedite operations at some of the most crucial stages of Machine Learning Lifecycle and MLOps, and demonstrates integration with Databricks and MLflow. -
DeepMind’s MuZero is One of the Most Important Deep Learning Systems Ever Created - Jan 4, 2021.
MuZero takes a unique approach to solve the problem of planning in deep learning models. -
All Machine Learning Algorithms You Should Know in 2021 - Jan 4, 2021.
Many machine learning algorithms exist, ranging from simple to complex in their approach, and together they provide a powerful library of tools for analyzing and predicting patterns from data. If you are learning for the first time or reviewing techniques, then these intuitive explanations of the most popular machine learning models will help you kick off the new year with confidence.
December:
-
Meet whale! The stupidly simple data discovery tool, by Robert Yi
Finding data and understanding its meaning represents the traditional "daily grind" of a Data Scientist. With whale, the new lightweight data discovery, documentation, and quality engine for your data warehouse that is under development by Dataframe, your data science team will more efficiently search data and automate its data metrics. -
15 Free Data Science, Machine Learning & Statistics eBooks for 2021, by Matthew Mayo
We present a curated list of 15 free eBooks compiled in a single location to close out the year.
- Data Science as a Product – Why Is It So Hard?
-
Generating Beautiful Neural Network Visualizations, by Matthew Mayo
If you are looking to easily generate visualizations of neural network architectures, PlotNeuralNet is a project you should check out. -
Key Data Science Algorithms Explained: From k-means to k-medoids clustering, by Arushi Prakash
As a core method in the Data Scientist's toolbox, k-means clustering is valuable but can be limited based on the structure of the data. Can expanded methods like PAM (partitioning around medoids), CLARA, and CLARANS provide better solutions, and what is the future of these algorithms?
- Essential Math for Data Science: The Poisson Distribution
- 2020: A Year Full of Amazing AI Papers — A Review
-
Monte Carlo integration in Python, by Tirthajyoti Sarkar
A famous Casino-inspired trick for data science, statistics, and all of science. How to do it in Python? -
SQL vs NoSQL: 7 Key Takeaways, by Alex Williams
People assume that NoSQL is a counterpart to SQL. Instead, it’s a different type of database designed for use-cases where SQL is not ideal. The differences between the two are many, although some are so crucial that they define both databases at their cores.
- XGBoost: What it is, and when to use it
- Resampling Imbalanced Data and Its Limits
- Feature Store vs Data Warehouse
- Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance
- Fast and Intuitive Statistical Modeling with Pomegranate
-
Optimization Algorithms in Neural Networks, by Nagesh Singh Chauhan
This article presents an overview of some of the most used optimizers while training a neural network.
- Undersampling Will Change the Base Rates of Your Model’s Predictions
-
Crack SQL Interviews, by Xinran Waibel
SQL is an essential programming language for data analysis and processing. So, SQL questions are always part of the interview process for data science-related jobs, including data analysts, data scientists, and data engineers. Become familiar with these common patterns seen in SQL interview questions and follow our tips on how to neatly handle each with SQL queries.
- 8 Places for Data Professionals to Find Datasets
- How to use Machine Learning for Anomaly Detection and Conditional Monitoring
- How to Clean Text Data at the Command Line
- Data Science and Machine Learning: The Free eBook
- How to Create Custom Real-time Plots in Deep Learning
- Facebook Open Sources ReBeL, a New Reinforcement Learning Agent
- Matrix Decomposition Decoded
- Data Science Volunteering: Ways to Help
-
A Rising Library Beating Pandas in Performance, by Ezz El Din Abdullah
This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.
- 10 Python Skills They Don’t Teach in Bootcamp
- Implementing the AdaBoost Algorithm From Scratch
- Data Compression via Dimensionality Reduction: 3 Main Methods
- AI registers: finally, a tool to increase transparency in AI/ML
-
R or Python? Why Not Both?
Do you use both R and Python, either in different projects or in the same? Check out prython, an IDE designed to handle your needs. -
20 Core Data Science Concepts for Beginners, by Benjamin Obi Tayo
With so much to learn and so many advancements to follow in the field of data science, a core set of foundational concepts remains essential. Twenty of these ideas are highlighted here; they are key to review when preparing for a job interview or just to refresh your appreciation of the basics.
- 5 Free Books to Learn Statistics for Data Science
- Merging Pandas DataFrames in Python
- Essential Math for Data Science: Probability Density and Probability Mass Functions
- The Ultimate Guide to Data Engineer Interviews
- Change the Background of Any Video with 5 Lines of Code
- Pruning Machine Learning Models in TensorFlow
-
Introduction to Data Engineering, by Xinran Waibel
The Q&A for the most frequently asked questions about Data Engineering: What does a data engineer do? What is a data pipeline? What is a data warehouse? How is a data engineer different from a data scientist? What skills and programming languages do you need to learn to become a data engineer?
- 10 Python Skills for Beginners
- Building AI Models for High-Frequency Streaming Data
- Simple & Intuitive Ensemble Learning in R
- Roadmaps to becoming a Full-Stack AI Developer, Data Scientist, Machine Learning Engineer, and more
-
NoSQL for Beginners
NoSQL can offer an advantage to those who are entering Data Science and Analytics, as well as to applications with high-performance needs that aren’t met by traditional SQL databases.
- Remembering Pluribus: The Techniques that Facebook Used to Master World’s Most Difficult Poker Game
- 14 Data Science projects to improve your skills
-
Object-Oriented Programming Explained Simply for Data Scientists
Read this simple but effective guide to start using Classes in Python 3.