2020 Aug Tutorials, Overviews
- Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Semantics and Pragmatics
- Aug 31, 2020.
Algorithms for text analytics must model how language works in order to capture meaning, and so must the people who deploy these algorithms. Bender & Lascarides 2019 is an accessible overview of what the field of linguistics can teach NLP about how meaning is encoded in human languages.
- Accelerated Computer Vision: A Free Course From Amazon
- Aug 31, 2020.
Amazon's Machine Learning University is making its online courses available to the public, and this time we look at its Accelerated Computer Vision offering.
- Microsoft’s DoWhy is a Cool Framework for Causal Inference
- Aug 28, 2020.
Inspired by Judea Pearl’s do-calculus for causal inference, the open source framework provides a programmatic interface for popular causal inference methods.
- Explainable and Reproducible Machine Learning Model Development with DALEX and Neptune
- Aug 27, 2020.
With ML models serving real people, misclassified cases (a natural consequence of using ML) affect people’s lives and sometimes treat them very unfairly. This makes the ability to explain your models’ predictions a requirement rather than just a nice-to-have.
- 4 ways to improve your TensorFlow model – key regularization techniques you need to know
- Aug 27, 2020.
Regularization techniques are crucial for preventing your models from overfitting and enabling them to perform better on your validation and test sets. This guide provides a thorough overview, with code, of four key approaches you can use for regularization in TensorFlow.
- Working with Spark, Python or SQL on Azure Databricks
- Aug 27, 2020.
Here we look at some ways to work interchangeably with Python, PySpark and SQL on Azure Databricks, Microsoft’s Apache Spark-based big data analytics service designed for data science and data engineering.
- Data Versioning: Does it mean what you think it means?
- Aug 26, 2020.
Does data versioning mean what you think it means? Read this overview with use cases to see what data versioning really is, and the tools that can help you manage it.
- Breaking Privacy in Federated Learning
- Aug 26, 2020.
Despite the benefits of federated learning, there are still ways of breaching a user’s privacy, even without sharing private data. In this article, we’ll review some research papers that discuss this vulnerability in federated learning.
- Getting Started with Feature Selection
- Aug 25, 2020.
For machine learning, more data is always better. What about more features of data? Not necessarily. This beginners' guide, with code examples for selecting the most useful features from your data, will jump-start you toward developing the most effective and efficient learning models.
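To give a flavor of one feature selection approach, here is a minimal sketch (not the guide's own code) of a correlation-based filter method in plain Python. The helpers `pearson` and `select_k_best` and the toy data are hypothetical, and `select_k_best` is not scikit-learn's `SelectKBest`: each feature is scored by its absolute correlation with the target and the top k are kept, so zero-variance and uncorrelated features drop out.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_k_best(features, target, k):
    """Hypothetical filter helper: keep the k features with the largest |correlation| to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: feature name -> column of values.
features = {
    "x1": [1, 2, 3, 4, 5],
    "noise": [2, 1, 2, 1, 2],
    "const": [7, 7, 7, 7, 7],  # zero variance, so it carries no information
    "neg": [10, 8, 6, 4, 2],   # perfectly (negatively) correlated with the target
}
target = [2, 4, 6, 8, 10]
print(select_k_best(features, target, k=2))
```

Correlation filters only capture linear, one-feature-at-a-time relationships; wrapper and embedded methods can pick up interactions that this kind of filter misses.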
- Data Science Tools Illustrated Study Guides
- Aug 25, 2020.
These illustrated study guides for data science tools are broken up into four distinct categories: data retrieval, data manipulation, data visualization, and engineering tips. Both online and PDF versions of these guides are available.
- DeepMind’s Three Pillars for Building Robust Machine Learning Systems
- Aug 24, 2020.
Specification Testing, Robust Training and Formal Verification are three elements that the AI powerhouse believes hold the essence of robust machine learning models.
- A Deep Dive Into the Transformer Architecture – The Development of Transformer Models
- Aug 24, 2020.
Even though transformers for NLP were introduced only a few years ago, they have already had a major impact on a variety of fields, from reinforcement learning to chemistry. Now is the time to better understand the inner workings of transformer architectures and build the intuition you need to work effectively with these powerful tools.
- The NLP Model Forge: Generate Model Code On Demand
- Aug 24, 2020.
You've seen their Big Bad NLP Database and The Super Duper NLP Repo. Now Quantum Stat is back with its most ambitious NLP product yet: The NLP Model Forge.
- Performance Testing on Big Data Applications
- Aug 21, 2020.
You can use performance testing in any application you’re working on, but it’s especially useful for big data applications. Let’s see why.
- Must-read NLP and Deep Learning articles for Data Scientists
- Aug 21, 2020.
NLP and deep learning continue to advance, nearly on a daily basis. Check out these recent must-read guides, feature articles, and other resources to keep you on top of the latest advancements and ahead of the curve.
- Data Science Meets Devops: MLOps with Jupyter, Git, and Kubernetes
- Aug 21, 2020.
An end-to-end example of deploying a machine learning product using Jupyter, Papermill, Tekton, GitOps and Kubeflow.
- Build Your Own AutoML Using PyCaret 2.0
- Aug 20, 2020.
In this post we present a step-by-step tutorial on how PyCaret can be used to build an automated machine learning solution within Power BI, allowing data scientists and analysts to add a layer of machine learning to their dashboards without any additional license or software costs.
- Introduction to Federated Learning
- Aug 20, 2020.
Federated learning enables on-device training, model personalization, and more. Read more about it in this article.
- Autotuning for Multi-Objective Optimization on LinkedIn’s Feed Ranking
- Aug 19, 2020.
In this post, the authors share their experience building an automated system to tune one of the main parameters of the machine learning model that recommends content on LinkedIn’s Feed, which is just one piece of its community-focused architecture.
- Accelerated Natural Language Processing: A Free Course From Amazon
- Aug 19, 2020.
Amazon's Machine Learning University is making its online courses available to the public, starting with this Accelerated Natural Language Processing offering.
- Visualizing the Mobility Trends in European Countries Affected by COVID-19
- Aug 18, 2020.
This post visualizes the movement of people in the 10 most-affected European countries, based on how much they stayed at home, went to work, and visited places, using Google's anonymized location tracking dataset.
- Top Google AI, Machine Learning Tools for Everyone
- Aug 18, 2020.
Google is much more than a search company. Learn about all the tools it is developing through Google AI to help turn your ideas into reality.
- 3D Human Pose Estimation Experiments and Analysis
- Aug 17, 2020.
In this article, we explore how 3D human pose estimation works based on our research and experiments, which were part of the analysis of applying human pose estimation in AI fitness coach applications.
- How Do Neural Networks Learn?
- Aug 17, 2020.
Neural networks are hugely popular today in AI and machine learning development, yet how they learn to make predictions can still look like a black box. To understand what is going on deep in these networks, we must consider how neural networks perform optimization.
- Are Computer Vision Models Vulnerable to Weight Poisoning Attacks?
- Aug 17, 2020.
A recent paper has explored the possibility of influencing the predictions of a freshly trained Natural Language Processing (NLP) model by tweaking the weights re-used in its training. This result would be especially interesting if it also transfers to Computer Vision (CV), where the use of pre-trained weights is widespread.
- Content-Based Recommendation System using Word Embeddings
- Aug 14, 2020.
This article explores how average Word2Vec and TF-IDF Word2Vec can be used to build a recommendation engine.
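A minimal sketch of the two document representations, assuming tiny hand-made 3-dimensional vectors in place of a trained Word2Vec model; the vectors, catalog, and helper names are all hypothetical, and this is not the article's code. A document is represented either by the plain average of its word vectors or by a TF-IDF weighted average, and the recommendation is the catalog item with the highest cosine similarity to the query.

```python
from collections import Counter
from math import log, sqrt

# Hypothetical 3-d word vectors; a real system would load these from a trained Word2Vec model.
EMBEDDINGS = {
    "space": [0.9, 0.1, 0.0],
    "rocket": [0.8, 0.2, 0.1],
    "launch": [0.7, 0.3, 0.0],
    "bread": [0.0, 0.2, 0.9],
    "baking": [0.1, 0.0, 0.8],
    "recipe": [0.0, 0.1, 0.9],
    "the": [0.3, 0.3, 0.3],
}
DIM = 3

def idf_weights(docs):
    """IDF per word over the catalog; the +1 keeps words found in every document at a small nonzero weight."""
    df = Counter(word for doc in docs for word in set(doc))
    return {word: log(len(docs) / count) + 1.0 for word, count in df.items()}

def doc_vector(tokens, idf=None):
    """Plain average of word vectors, or a TF-IDF weighted average when idf is given."""
    vec, total = [0.0] * DIM, 0.0
    for word, tf in Counter(t for t in tokens if t in EMBEDDINGS).items():
        # In this sketch, words absent from the IDF table get weight 0.
        weight = tf * (idf.get(word, 0.0) if idf is not None else 1.0)
        for i, value in enumerate(EMBEDDINGS[word]):
            vec[i] += weight * value
        total += weight
    return [v / total for v in vec] if total else vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(query, catalog, idf=None):
    """Return the catalog title whose document vector is most similar to the query vector."""
    q = doc_vector(query, idf)
    return max(catalog, key=lambda title: cosine(q, doc_vector(catalog[title], idf)))

# Hypothetical catalog: title -> tokenized description.
catalog = {
    "Rocket Launch Today": ["the", "rocket", "launch"],
    "Bread Baking 101": ["the", "bread", "baking", "recipe"],
}
idf = idf_weights(list(catalog.values()))
print(recommend(["space", "rocket"], catalog))       # average Word2Vec
print(recommend(["space", "rocket"], catalog, idf))  # TF-IDF weighted Word2Vec
```

In the TF-IDF variant, a word like "the" that appears in every description is down-weighted relative to more distinctive words, which is the main motivation for weighting the average.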
- The List of Top 10 Lists in Data Science
- Aug 14, 2020.
The list of Top 10 lists that data scientists, from enthusiasts to those who want to jump-start a career, must know to smoothly navigate a path through this field.
- Hypothesis Test for Real Problems
- Aug 14, 2020.
Hypothesis tests are essential for evaluating answers to questions about samples of data.
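As a concrete illustration of one common test (not necessarily the one the article uses), here is a minimal sketch of a two-sided one-sample z-test. It assumes the population standard deviation is known, and the measurements are hypothetical. The p-value is the probability, under the null hypothesis, of a test statistic at least as extreme as the one observed.

```python
from math import erfc, sqrt
from statistics import mean

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test of H0: population mean == mu0,
    assuming the population standard deviation sigma is known."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))  # test statistic
    p_value = erfc(abs(z) / sqrt(2))                        # P(|Z| >= |z|) under N(0, 1)
    return z, p_value

# Hypothetical data: do these 9 measurements suggest a mean different from 100, given sigma = 15?
z, p = one_sample_z_test([112, 108, 95, 120, 103, 110, 99, 117, 106], mu0=100, sigma=15)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.3f}, reject H0: {p < alpha}")
```

When sigma is unknown and estimated from a small sample, a t-test is the usual choice instead of this z-test.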
- Bring your Pandas Dataframes to life with D-Tale
- Aug 13, 2020.
Bring your Pandas dataframes to life with D-Tale, an open-source solution with which you can visualize and analyze Pandas data structures and learn how to code them. In this tutorial you'll learn how to open the grid, build columns, create charts and view code exports.
- 5 Different Ways to Load Data in Python
- Aug 13, 2020.
Data is the bread and butter of a Data Scientist, so knowing many approaches to loading data for analysis is crucial. Here, five Python techniques to bring in your data are reviewed with code examples for you to follow.
- GitHub is the Best AutoML You Will Ever Need
- Aug 12, 2020.
This article uses PyCaret 2.0, an open source, low-code machine learning library in Python, to develop a simple AutoML solution and deploy it as a Docker container using GitHub Actions.
- Introduction to Statistics for Data Science
- Aug 12, 2020.
Statistics is foundational for Data Science and a crucial skill to master for any practitioner. This advanced introduction uses examples to review the fundamental concepts of inferential statistics, illustrating the differences between point estimators and confidence interval estimates.
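To make the distinction concrete, here is a minimal sketch (hypothetical data, not from the article) that returns both a point estimate of a population mean, the sample mean, and an interval estimate around it. The interval uses a normal approximation (z critical value); for a sample this small, a t critical value would give a somewhat wider and more appropriate interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_with_ci(sample, confidence=0.95):
    """Return the sample mean (a point estimate) and a normal-approximation
    confidence interval for the population mean."""
    n = len(sample)
    point = mean(sample)                             # point estimator of the population mean
    se = stdev(sample) / sqrt(n)                     # standard error, using the n - 1 sample std dev
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95% confidence
    return point, (point - z * se, point + z * se)

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
point, (low, high) = mean_with_ci(sample)
print(point, round(low, 2), round(high, 2))
```

The point estimate is a single best guess; the interval expresses how much that guess could plausibly vary, and it widens as the requested confidence level rises.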
- How Natural Language Processing Is Changing Data Analytics
- Aug 12, 2020.
As it becomes more prevalent, NLP will enable humans to interact with computers in ways not possible before. This new type of collaboration will allow improvements in a wide variety of human endeavors, including business, philanthropy, health, and communication.
- Unit Test Your Data Pipeline, You Will Thank Yourself Later
- Aug 11, 2020.
Even if you cannot test model output, you should at least test that the inputs are correct. The time you invest in writing a few good, simple unit tests will be repaid many times over later, especially when working on large projects or big data.
- Data Science Internship Interview Questions
- Aug 11, 2020.
Data science is an attractive field: not only is it lucrative, but it also offers opportunities to work on interesting projects, and you’re always learning new things. If you're starting from the ground up, review this guide to prepare for the interview essentials.
- 10 Use Cases for Privacy-Preserving Synthetic Data
- Aug 11, 2020.
This article presents 10 use-cases for synthetic data, showing how enterprises today can use this artificially generated information to train machine learning models or share data externally without violating individuals' privacy.
- Exploring GPT-3: A New Breakthrough in Language Generation
- Aug 10, 2020.
At 175B parameters, GPT-3 is the largest natural language processing (NLP) transformer released to date, about 10 times larger than the previous record holder, Microsoft Research’s Turing-NLG, at 17B parameters. This has resulted in an explosion of demos: some good, some bad, all interesting.
- Data Scientist Job Market 2020
- Aug 10, 2020.
Based on an analysis of over a thousand Data Scientist job descriptions in the USA, check out the trends for 2020 and current expectations for new positions in the field, including credentials, experience, and programming languages.
- Facebook Uses Bayesian Optimization to Conduct Better Experiments in Machine Learning Models
- Aug 10, 2020.
Research from Facebook proposes a Bayesian optimization method for running A/B tests on machine learning models.
- Batch Normalization in Deep Neural Networks
- Aug 7, 2020.
Batch normalization is a technique for training very deep neural networks that normalizes the inputs to a layer for each mini-batch.
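The per-mini-batch normalization can be sketched in a few lines of plain Python. This is a minimal, framework-free illustration of the training-mode forward pass only, with a hypothetical helper name. At inference time, frameworks typically replace the batch statistics with running averages collected during training, which this sketch omits.

```python
from math import sqrt

def batch_norm_forward(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a mini-batch of shape N x D.

    Each feature j is normalized with the mean and variance computed over the
    N examples in this mini-batch, then scaled by gamma[j] and shifted by beta[j].
    """
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in batch]
        mu = sum(column) / n                          # mini-batch mean
        var = sum((v - mu) ** 2 for v in column) / n  # mini-batch (biased) variance
        for i, v in enumerate(column):
            x_hat = (v - mu) / sqrt(var + eps)        # normalize to ~zero mean, unit variance
            out[i][j] = gamma[j] * x_hat + beta[j]    # learnable scale and shift
    return out

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(batch_norm_forward(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

Note that both features end up on the same scale even though the second is ten times larger; the learnable `gamma` and `beta` let the network undo the normalization where that helps.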
- Containerization of PySpark Using Kubernetes
- Aug 6, 2020.
This article demonstrates how to run Spark on Kubernetes. It also includes a brief comparison of the various cluster managers available for Spark.
- Essential Data Science Tips: How to Use One-Vs-Rest and One-Vs-One for Multi-Class Classification
- Aug 6, 2020.
Classification, as a predictive modeling task, involves assigning a class label to each example. Algorithms designed for binary classification cannot be applied directly to multi-class classification problems; in such situations, heuristic methods such as one-vs-rest and one-vs-one come in handy.
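The two heuristics can be sketched in plain Python, assuming a deliberately simple, hypothetical binary learner (`fit_binary`, which scores a 1-D point by whether it is closer to the positive or negative class centroid) so the decomposition logic stays visible. One-vs-rest trains one "class vs. everything else" model per class and picks the most confident; one-vs-one trains a model per pair of classes and takes a majority vote.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

def fit_binary(xs_pos, xs_neg):
    """Hypothetical toy binary learner: returns a scorer that is positive when x
    is closer to the positive-class centroid than to the negative-class centroid."""
    mu_pos, mu_neg = mean(xs_pos), mean(xs_neg)
    return lambda x: abs(x - mu_neg) - abs(x - mu_pos)

def fit_one_vs_rest(X, y):
    # One binary problem per class: that class vs. all other classes.
    models = {}
    for c in sorted(set(y)):
        pos = [x for x, label in zip(X, y) if label == c]
        neg = [x for x, label in zip(X, y) if label != c]
        models[c] = fit_binary(pos, neg)
    return models

def predict_one_vs_rest(models, x):
    # Pick the class whose "c vs. rest" scorer is most confident.
    return max(models, key=lambda c: models[c](x))

def fit_one_vs_one(X, y):
    # One binary problem per pair of classes, trained only on those two classes.
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pos = [x for x, label in zip(X, y) if label == a]
        neg = [x for x, label in zip(X, y) if label == b]
        models[(a, b)] = fit_binary(pos, neg)
    return models

def predict_one_vs_one(models, x):
    # Each pairwise model casts one vote; the class with the most votes wins.
    votes = Counter(a if scorer(x) > 0 else b for (a, b), scorer in models.items())
    return votes.most_common(1)[0][0]

# Hypothetical 1-D data with three classes.
X = [0.0, 1.0, 4.0, 5.0, 10.0, 12.0]
y = ["a", "a", "b", "b", "c", "c"]
ovr, ovo = fit_one_vs_rest(X, y), fit_one_vs_one(X, y)
print([predict_one_vs_rest(ovr, x) for x in (0.5, 4.5, 11.0)])
print([predict_one_vs_one(ovo, x) for x in (0.5, 4.5, 11.0)])
```

The trade-off: with k classes, one-vs-rest trains k models on the full dataset, while one-vs-one trains k(k-1)/2 models, each on only two classes' worth of data.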
- Metrics to Use to Evaluate Deep Learning Object Detectors
- Aug 6, 2020.
It's important to understand which metrics should be used to evaluate trained object detectors and which matter most. Is mAP alone enough to evaluate object detector models? Can the same metric be used to evaluate object detectors on the validation set and the test set?
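Metrics such as mAP build on a matching step that decides which predicted boxes count as correct. A minimal sketch of that step, not the article's code, with hypothetical helper names and data: compute intersection over union (IoU), then greedily match predictions (highest score first) to unmatched ground-truth boxes at an IoU threshold, and report precision and recall at that threshold. Full mAP additionally sweeps the score cutoff, averages precision over recall levels and classes, and its exact definition varies by benchmark.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match predictions (box, score), highest score first, to unmatched
    ground-truth boxes; a match with IoU >= iou_threshold counts as a true positive."""
    matched, true_positives = set(), 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_idx, best_iou = None, iou_threshold
        for idx, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if idx not in matched and overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical boxes: the second prediction duplicates an already-matched object,
# and the third predicts an object that does not exist.
truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.3)]
print(precision_recall(preds, truth))
```

Because the IoU threshold and matching rule change which detections count as correct, numbers are only comparable when validation and test sets are scored with the same protocol.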
- Netflix’s Polynote is a New Open Source Framework to Build Better Data Science Notebooks
- Aug 5, 2020.
The new notebook environment provides substantial improvements to streamline experimentation in machine learning workflows.
- Implementing MLOps on an Edge Device
- Aug 4, 2020.
This article introduces developers to MLOps and strategies for implementing MLOps on edge devices.
- Setting Up Your Data Science & Machine Learning Capability in Python
- Aug 4, 2020.
With Python and its rich, dynamic ecosystem continuing to lead as a programming language for data science and machine learning, establishing and maintaining a cost-effective development environment is crucial to your business impact. So, do you rent or buy? This overview considers the hidden and obvious factors involved in selecting and implementing your Python platform.
- 5 Apache Spark Best Practices For Data Science
- Aug 4, 2020.
Check out these best practices for Spark that the author wishes they had known before starting their project.
- Announcing PyCaret 2.0
- Aug 3, 2020.
PyCaret 2.0 has been released! Find out about all of the updates and see examples of how to use them right here.
- The Machine Learning Field Guide
- Aug 3, 2020.
This straightforward guide offers a structured overview of all machine learning prerequisites needed to start working on your project, including the complete data pipeline from importing and cleaning data to modelling and production.
- Know What Employers are Expecting for a Data Scientist Role in 2020
- Aug 3, 2020.
The analysis is based on 1000+ recent data scientist job postings, scraped from job portals.