How Reading Papers Helps You Be a More Effective Data Scientist - Feb 24, 2021.
By reading papers, we were able to learn what others (e.g., LinkedIn) have found to work (and not work). We can then adapt their approach and not have to reinvent the rocket. This helps us deliver a working solution with less time and effort.
Pandas Profiling: One-Line Magical Code for EDA - Feb 24, 2021.
EDA can be automated using a Python library called Pandas Profiling. Let’s explore Pandas Profiling to do EDA in a very short time and with just a single line of code.
Using NLP to improve your Resume - Feb 23, 2021.
This article discusses performing keyword matching and text analysis on job descriptions.
10 Statistical Concepts You Should Know For Data Science Interviews - Feb 23, 2021.
Data Science is founded on time-honored concepts from statistics and probability theory. Having a strong understanding of the ten ideas and techniques highlighted here is key to your career in the field, and also a favorite topic for concept checks during interviews.
Data Observability, Part II: How to Build Your Own Data Quality Monitors Using SQL - Feb 23, 2021.
Using schema and lineage to understand the root cause of your data anomalies.
An overview of synthetic data types and generation methods - Feb 22, 2021.
Synthetic data can be used to test new products and services, validate models, or test performance because it mimics the statistical properties of production data. Today you'll find different types of structured and unstructured synthetic data.
Powerful Exploratory Data Analysis in just two lines of code - Feb 22, 2021.
EDA is a fundamental early process for any Data Science investigation. Typical approaches for visualization and exploration are powerful, but can be cumbersome for getting to the heart of your data. Now, you can get to know your data much faster with only a few lines of code... and it might even be fun!
Inside the Architecture Powering Data Quality Management at Uber - Feb 22, 2021.
Data Quality Monitor implements novel statistical methods for anomaly detection and quality management in large data infrastructures.
Evaluating Deep Learning Models: The Confusion Matrix, Accuracy, Precision, and Recall - Feb 19, 2021.
This tutorial discusses the confusion matrix, how precision, recall, and accuracy are calculated, and how they relate to evaluating deep learning models.
Feature Store as a Foundation for Machine Learning - Feb 19, 2021.
With so many organizations now taking the leap into building production-level machine learning models, many lessons learned are coming to light about the supporting infrastructure. For a variety of important types of use cases, maintaining a centralized feature store is essential for higher ROI and faster delivery to market. In this review, the current feature store landscape is described, and you can learn how to architect one into your MLOps pipeline.
Multidimensional multi-sensor time-series data analysis framework - Feb 19, 2021.
This blog post provides an overview of the “msda” package, which is useful for time-series sensor data analysis. A quick introduction to time-series data is also provided.
Approaching (Almost) Any Machine Learning Problem - Feb 18, 2021.
This freely-available book is a fantastic walkthrough of practical approaches to machine learning problems.
6 Data Science Certificates To Level Up Your Career - Feb 18, 2021.
Anyone looking to obtain a data science certificate to prove their ability in the field will find that a range of options exists. We review several valuable certificates to consider that will definitely pump up your resume and portfolio to get you closer to your dream job.
Forecasting Stories 5: The story of the launch - Feb 18, 2021.
New product forecasting can be very difficult: there is no history to start with, and hence no baseline. The number of assumptions can be huge. The best way to forecast, then, is to try parallel approaches, build different views, and triangulate on a common range.
GPT-2 vs GPT-3: The OpenAI Showdown - Feb 17, 2021.
Thanks to the diversity of the dataset used in the training process, we can obtain adequate text generation for text from a variety of domains. GPT-2 has 10x the parameters and 10x the data of its predecessor, GPT.
10 resources for data science self-study - Feb 17, 2021.
Many resources exist for the self-study of data science. In our modern age of information technology, an enormous number of free learning resources are available to anyone, and with effort and dedication, you can master the fundamentals of data science.
Deep Learning-based Real-time Video Processing - Feb 17, 2021.
In this article, we explore how to build a pipeline and process real-time video with Deep Learning to apply this approach to business use cases overviewed in our research.
Data Observability: Building Data Quality Monitors Using SQL - Feb 16, 2021.
To trigger an alert when data breaks, data teams can leverage a tried and true tactic from our friends in software engineering: monitoring and observability. In this article, we walk through how you can create your own data quality monitors for freshness and distribution from scratch using SQL.
Hugging Face Transformers Package – What Is It and How To Use It - Feb 16, 2021.
The rapid development of Transformers has brought a new wave of powerful tools to natural language processing. These models are large and very expensive to train, so pre-trained versions are shared and leveraged by researchers and practitioners. Hugging Face offers a wide variety of pre-trained transformers as open-source libraries, and you can incorporate these with only one line of code.
Easy, Open-Source AutoML in Python with EvalML - Feb 16, 2021.
We’re excited to announce that a new open-source project has joined the Alteryx open-source ecosystem. EvalML is a library for automated machine learning (AutoML) and model understanding, written in Python.
IBM Uses Continual Learning to Avoid The Amnesia Problem in Neural Networks - Feb 15, 2021.
Using continual learning might avoid the famous catastrophic forgetting problem in neural networks.
Telling a Great Data Story: A Visualization Decision Tree - Feb 15, 2021.
Pick your visualizations strategically. They need to tell a story.
Essential Math for Data Science: Scalars and Vectors - Feb 12, 2021.
Linear algebra is the branch of mathematics that studies vector spaces. You’ll see how vectors constitute vector spaces and how linear algebra applies linear transformations to these spaces. You’ll also learn the powerful relationship between sets of linear equations and vector equations.
6 NLP Techniques Every Data Scientist Should Know - Feb 12, 2021.
Natural language processing has already begun to transform the way humans interact with computers, and its advances are moving rapidly. The field is built on core methods that must first be understood, with which you can then launch your data science projects to a new level of sophistication and value.
Understanding NoSQL Database Types: Column-Oriented Databases - Feb 12, 2021.
NoSQL databases come in four distinct types: key-value stores, document stores, graph databases, and column-oriented databases. In this article, we’ll explore column-oriented databases, also known simply as “NoSQL columns”.
How to Speed up Scikit-Learn Model Training - Feb 11, 2021.
Scikit-Learn is an easy-to-use Python library for machine learning. However, sometimes scikit-learn models can take a long time to train. The question becomes, how do you create the best scikit-learn model in the least amount of time?
A Critical Comparison of Machine Learning Platforms in an Evolving Market - Feb 11, 2021.
There’s a clear inclination towards the MLaaS model across industries, given the fact that companies today have an option to select from a wide range of solutions that can cater to diverse business needs. Here is a look at 3 of the top ML platforms for data excellence.
My machine learning model does not learn. What should I do? - Feb 10, 2021.
This article presents 7 hints on how to get out of the quicksand.
7 Most Recommended Skills to Learn to be a Data Scientist - Feb 10, 2021.
The Data Scientist professional has emerged as a true interdisciplinary role that spans a variety of skills, theoretical and practical. For the core, day-to-day activities, many critical requirements that enable the delivery of real business value reach well outside the realm of machine learning, and should be mastered by those aspiring to the field.
Data Science vs Business Intelligence, Explained - Feb 10, 2021.
Knowing the differences between business intelligence and data science is more than just a matter of semantics.
How to Deploy a Flask API in Kubernetes and Connect it with Other Micro-services - Feb 9, 2021.
A hands-on tutorial on how to implement your micro-service architecture using the powerful container orchestration tool Kubernetes.
Adversarial Attacks on Explainable AI - Feb 9, 2021.
Are explainability methods black-box themselves?
Microsoft Explores Three Key Mysteries of Ensemble Learning - Feb 8, 2021.
A new paper studies three key puzzling characteristics of deep learning ensembles and some potential explanations.
Essential Math for Data Science: Introduction to Matrices and the Matrix Product - Feb 5, 2021.
Like vectors, matrices are data structures allowing you to organize numbers. They are square or rectangular arrays containing values organized in two dimensions: as rows and columns. You can think of them as a spreadsheet. Learn more here.
Deep learning doesn’t need to be a black box - Feb 5, 2021.
The cultural perception of AI is often suspect because of the described challenges in knowing why a deep neural network makes its predictions. So, researchers try to crack open this "black box" after a network is trained to correlate results with inputs. But, what if the goal of explainability could be designed into the network's architecture -- before the model is trained and without reducing its predictive power? Maybe the box could stay open from the beginning.
Backcasting: Building an Accurate Forecasting Model for Your Business - Feb 5, 2021.
This article sheds some light on the processes happening under the hood of ML-based solutions, using the example of a business case where future success directly depends on the ability to predict unknown values from the past.
Build Your First Data Science Application - Feb 4, 2021.
Check out these seven Python libraries to make your first data science MVP application.
How to create stunning visualizations using python from scratch - Feb 4, 2021.
Data science and data analytics can be beautiful things. Not only because of the insights and enhancements to decision-making they can provide, but because of the rich visualizations about the data that can be created. Following this step-by-step guide using the Matplotlib and Seaborn libraries will help you improve the presentation and effective communication of your work.
Getting Started with 5 Essential Natural Language Processing Libraries - Feb 3, 2021.
This article is an overview of how to get started with 5 popular Python NLP libraries, from those for linguistic data visualization, to data preprocessing, to multi-task functionality, to state of the art language modeling, and beyond.
Saving and loading models in TensorFlow — why it is important and how to do it - Feb 3, 2021.
So much time and effort can go into training your machine learning models. But, shut down the notebook or system, and all those trained weights and more vanish with the memory flush. Saving your models to maximize reusability is key for efficient productivity.
Adversarial generation of extreme samples - Feb 2, 2021.
In order to mitigate risks when modelling extreme events, it is vital to be able to generate a wide range of extreme, and realistic, scenarios. Researchers from the National University of Singapore and IIT Bombay have developed an approach to do just that.
Vision Transformers: Natural Language Processing (NLP) Increases Efficiency and Model Generality - Feb 2, 2021.
Why do we hear so little about transformer models applied to computer vision tasks? What about attention in computer vision networks?
3 Ways Understanding Bayes Theorem Will Improve Your Data Science - Feb 1, 2021.
Mastery of the mathematics and applications of this intuitive statistical concept will advance your credibility as a decision maker.
Beyond the Nash Equilibrium: DeepMind’s Clever Strategy to Solve Asymmetric Games - Feb 1, 2021.
The method expands the concept of a Nash equilibrium by decomposing an asymmetric game into multiple symmetric games.
- Baidu Research: 10 Technology Trends in 2021
- What is Graph Theory, and Why Should You Care?
- Top 5 Reasons Why Machine Learning Projects Fail
- Working With The Lambda Layer in Keras
- Popular Machine Learning Interview Questions, part 2
- Support Vector Machine for Hand Written Alphabet Recognition in R
- Six Times Bigger than GPT-3: Inside Google’s TRILLION Parameter Switch Transformer Model
The Ultimate Scikit-Learn Machine Learning Cheatsheet, by Andre Ye
With the power and popularity of scikit-learn for machine learning in Python, this library is a foundation of any practitioner's toolset. Preview its core methods with this review of predictive modelling, clustering, dimensionality reduction, feature importance, and data transformation.
Building a Deep Learning Based Reverse Image Search, by Vegard Flovik
Following the journey from unstructured data to content based image retrieval.
Cloud Computing, Data Science and ML Trends in 2020–2022: The battle of giants, by George Vyshnya
Kaggle’s survey of ‘State of Data Science and Machine Learning 2020’ covers a lot of diverse topics. In this post, we are going to look at the popularity of cloud computing platforms and products among the data science and ML professionals who participated in the survey.
- How to Use MLOps for an Effective AI Strategy
- Mastering TensorFlow Variables in 5 Easy Steps
Popular Machine Learning Interview Questions, by Mo Daoud
Get ready for your next job interview requiring domain knowledge in machine learning with answers to these eleven common questions.
- Loglet Analysis: Revisiting COVID-19 Projections
- Graph Representation Learning: The Free eBook
Build a Data Science Portfolio that Stands Out Using These Platforms, by Benjamin Obi Tayo
Making your big break into the data science profession means standing out to potential employers in a crowd of tough competition. An important way to showcase your skills and experience is through the presentation of a portfolio. Following these recommendations for developing your portfolio will help you network effectively and stay on top of an ever-changing field.
- Microsoft Uses Transformer Networks to Answer Questions About Images With Minimum Training
- Comprehensive Guide to the Normal Distribution
Essential Math for Data Science: Information Theory, by Hadrien Jean
In the context of machine learning, some of the concepts of information theory are used to characterize or compare probability distributions. Read up on the underlying math to gain a solid understanding of relevant aspects of information theory.
K-Means 8x faster, 27x lower error than Scikit-learn in 25 lines, by Jakub Adamczyk
K-means clustering is a powerful algorithm for similarity searches, and Facebook AI Research's faiss library is turning out to be a speed champion. With only a handful of lines of code shared in this demonstration, faiss outperforms the implementation in scikit-learn in speed and accuracy.
Cleaner Data Analysis with Pandas Using Pipes, by Soner Yıldırım
Check out this practical guide on Pandas pipes.
- Data Cleaning and Wrangling in SQL
- Unsupervised Learning for Predictive Maintenance using Auto-Encoders
- Creating Good Meaningful Plots: Some Principles
- Working With Sparse Features In Machine Learning Models
- Cloud Data Warehouse is The Future of Data Storage
- Attention mechanism in Deep Learning, Explained
- OpenAI Releases Two Transformer Models that Magically Link Language and Computer Vision
- JupyterLab 3 is Here: Key reasons to upgrade now
Best Python IDEs and Code Editors You Should Know, by Claire D. Costa
Developing machine learning algorithms requires implementing countless libraries and integrating many supporting tools and software packages. All this magic must be written by you in yet another tool -- the IDE -- that is fundamental to all your code work and can drive your productivity. These top Python IDEs and code editors are among the best tools available for you to consider, and are reviewed with their noteworthy features.
Top 10 Computer Vision Papers 2020, by Louis (What’s AI) Bouchard
The top 10 computer vision papers in 2020 with video demos, articles, code, and paper references.
- Advice to aspiring Data Scientists – your most common questions answered
10 Underappreciated Python Packages for Machine Learning Practitioners, by Vinay Uday Prabhu
Here are 10 underappreciated Python packages covering neural architecture design, calibration, UI creation and dissemination.
- CatalyzeX: A must-have browser extension for machine learning engineers and researchers
Learn Data Science for free in 2021, by Ahmad Anis
If you are considering starting a career path in machine learning and data science, then there is a great deal to learn theoretically, along with gaining practical skills in applying a broad range of techniques. This comprehensive learning plan will guide you to start on this path, and it is all available for free.
- MLOps: Model Monitoring 101
- Model Experiments, Tracking and Registration using MLflow on Databricks
DeepMind’s MuZero is One of the Most Important Deep Learning Systems Ever Created, by Jesus Rodriguez
MuZero takes a unique approach to solve the problem of planning in deep learning models.
All Machine Learning Algorithms You Should Know in 2021, by Terence Shin
Many machine learning algorithms exist that range from simple to complex in their approach, and together provide a powerful library of tools for analyzing and predicting patterns from data. If you are learning for the first time or reviewing techniques, then these intuitive explanations of the most popular machine learning models will help you kick off the new year with confidence.