2020 Aug

All (59) | Events (2) | News, Education (2) | Opinions (12) | Tutorials, Overviews (43)

A Curious Theory About the Consciousness Debate in AI

Dr. Michio Kaku has formulated a very interesting theory of consciousness that applies to AI systems.

on Aug 31, 2020 in Agents, AI, DeepMind, OpenAI
Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Semantics and Pragmatics

Algorithms for text analytics must model how language works to incorporate meaning in language—and so do the people deploying these algorithms. Bender & Lascarides 2019 is an accessible overview of what the field of linguistics can teach NLP about how meaning is encoded in human languages.

on Aug 31, 2020 in ebook, NLP, Text Analytics, Text Mining
Accelerated Computer Vision: A Free Course From Amazon

Amazon's Machine Learning University is making its online courses available to the public, and this time we look at its Accelerated Computer Vision offering.

on Aug 31, 2020 in Amazon, Computer Vision, Courses, Free
Data is everywhere and it powers everything we do!

In this article I focus on how companies can start their data-centric strategies and succeed in their data transformation journeys. I share my thoughts on why companies have to treat data as central to their growth, to staying competitive, to being smarter and more innovative, and to being prepared for any unforeseen market surprises.

on Aug 28, 2020 in Analytics, Business, Data Science
Microsoft’s DoWhy is a Cool Framework for Causal Inference

Inspired by Judea Pearl’s do-calculus for causal inference, the open source framework provides a programmatic interface for popular causal inference methods.

on Aug 28, 2020 in Causality, Inference, Machine Learning, Microsoft
Explainable and Reproducible Machine Learning Model Development with DALEX and Neptune

With ML models serving real people, misclassified cases (which are a natural consequence of using ML) are affecting people’s lives and sometimes treating them very unfairly. This makes the ability to explain your models’ predictions a requirement rather than just a nice-to-have.

on Aug 27, 2020 in Dalex, Explainability, Explainable AI, Interpretability, Python, SHAP
4 ways to improve your TensorFlow model – key regularization techniques you need to know

Regularization techniques are crucial for preventing your models from overfitting and enable them to perform better on your validation and test sets. This guide provides a thorough overview with code of four key approaches you can use for regularization in TensorFlow.

on Aug 27, 2020 in Machine Learning, Overfitting, Regularization, TensorFlow
Working with Spark, Python or SQL on Azure Databricks

Here we look at some ways to interchangeably work with Python, PySpark and SQL using Azure Databricks, an Apache Spark-based big data analytics service designed for data science and data engineering offered by Microsoft.

on Aug 27, 2020 in Apache Spark, Databricks, Microsoft Azure, Python, SQL
How to Optimize Your CV for a Data Scientist Career

As the number of data science positions continues to grow dramatically, so does the number of data scientists in the marketplace. Follow these expert tips and examples to help make your resume and job applications stand out in an increasingly competitive field.

on Aug 26, 2020 in Career, Career Advice, Data Scientist, Hiring, Resume
Breaking Privacy in Federated Learning

Despite the benefits of federated learning, there are still ways of breaching a user’s privacy, even without sharing private data. In this article, we’ll review some research papers that discuss how federated learning includes this vulnerability.

on Aug 26, 2020 in Anonymized, Federated Learning, Learning, Privacy
Unifying Data Pipelines and Machine Learning with Apache Spark™ and Amazon SageMaker

Roll up your sleeves and charge up because you’re invited to an interactive, virtual Machine Learning workshop run by Amazon Web Services, Databricks, and Immuta on September 10.

on Aug 25, 2020 in Apache Spark, AWS, Immuta, Sagemaker, Webinar
How Data Science Is Keeping People Safe During COVID-19

Data, and more importantly, the way people use it, is shaping and refining approaches to COVID-19 safety. Here's a closer look at how this is happening.

on Aug 25, 2020 in Coronavirus, COVID-19, Data Science, Safety
Getting Started with Feature Selection

For machine learning, more data is always better. What about more features of data? Not necessarily. This beginners' guide with code examples for selecting the most useful features from your data will jump start you toward developing the most effective and efficient learning models.

on Aug 25, 2020 in Beginners, Data Preparation, Feature Selection
Data Science Tools Illustrated Study Guides

These data science tools illustrated guides are broken up into four distinct categories: data retrieval, data manipulation, data visualization, and engineering tips. Both online and PDF versions of these guides are available.

on Aug 25, 2020 in Cheat Sheet, Data Preprocessing, Data Processing, Data Science, Data Science Tools, Data Visualization, Python, R, SQL
DeepMind’s Three Pillars for Building Robust Machine Learning Systems

Specification Testing, Robust Training and Formal Verification are three elements that the AI powerhouse believes hold the essence of robust machine learning models.

on Aug 24, 2020 in AI, DeepMind, Machine Learning
A Deep Dive Into the Transformer Architecture – The Development of Transformer Models

Even though transformers for NLP were introduced only a few years ago, they have had a major impact on a variety of fields, from reinforcement learning to chemistry. Now is the time to better understand the inner workings of transformer architectures and build the intuition you need to work effectively with these powerful tools.

on Aug 24, 2020 in Attention, Deep Learning, Hugging Face, NLP, Transformer
The NLP Model Forge: Generate Model Code On Demand

You've seen their Big Bad NLP Database and The Super Duper NLP Repo. Now Quantum Stat is back with its most ambitious NLP product yet: The NLP Model Forge.

on Aug 24, 2020 in Google Colab, Modeling, NLP, Text Analytics
Performance Testing on Big Data Applications

You can use performance testing in any application you’re working on but it’s especially useful for big data applications. Let’s see why.

on Aug 21, 2020 in Applications, Big Data, Performance
Must-read NLP and Deep Learning articles for Data Scientists

NLP and deep learning continue to advance, nearly on a daily basis. Check out these recent must-read guides, feature articles, and other resources to keep you on top of the latest advancements and ahead of the curve.

on Aug 21, 2020 in Deep Learning, Google, GPT-3, NLP, OpenAI, Privacy, Research, Self-Driving, TensorFlow, Trends
Data Science Meets Devops: MLOps with Jupyter, Git, and Kubernetes

An end-to-end example of deploying a machine learning product using Jupyter, Papermill, Tekton, GitOps and Kubeflow.

on Aug 21, 2020 in Data Science, DevOps, Jupyter, Kubeflow, Kubernetes, MLOps
Rapid Python Model Deployment with FICO Xpress Insight

The biggest hurdle in using data to create business value is the ability to operationalize analytics throughout the organization. Xpress Insight is geared to reduce the burden on IT and address their critical requirements while empowering business users to take ownership of decisions and change management.

on Aug 20, 2020 in AI, Deployment, FICO, Machine Learning, Optimization, Python
Build Your Own AutoML Using PyCaret 2.0

In this post we present a step-by-step tutorial on how PyCaret can be used to build an Automated Machine Learning Solution within Power BI, thus allowing data scientists and analysts to add a layer of machine learning to their Dashboards without any additional license or software costs.

on Aug 20, 2020 in Automated Machine Learning, AutoML, Power BI, PyCaret, Python
These Data Science Skills will be your Superpower

Learning data science means learning the hard skills of statistics, programming, and machine learning. To complete your training, a broader set of soft skills will round out your capabilities as an effective and successful professional Data Scientist.

on Aug 20, 2020 in Communication, Data Preparation, Data Science Skills, Data Visualization, Mathematics, Statistics
Introduction to Federated Learning

Federated learning means enabling on-device training, model personalization, and more. Read more about it in this article.

on Aug 20, 2020 in Data Labeling, Federated Learning, Mobile, Privacy, Training
Autotuning for Multi-Objective Optimization on LinkedIn’s Feed Ranking

In this post, the authors share their experience coming up with an automated system to tune one of the main parameters in their machine learning model that recommends content on LinkedIn’s Feed, which is just one piece of the community-focused architecture.

on Aug 19, 2020 in Automated Machine Learning, AutoML, LinkedIn, Machine Learning, Optimization
Accelerated Natural Language Processing: A Free Course From Amazon

Amazon's Machine Learning University is making its online courses available to the public, starting with this Accelerated Natural Language Processing offering.

on Aug 19, 2020 in Amazon, Courses, Free, Machine Learning, NLP
KDD-2020 (virtual), the leading conference on Data Science and Knowledge Discovery, Aug 23-27 – register now

Using an interactive VR platform, KDD-2020 brings you the latest research in AI, Data Science, Deep Learning, and Machine Learning with tutorials to improve your skills, keynotes from top experts, workshops on state-of-the-art topics and over 200 research presentations.

on Aug 18, 2020 in ACM SIGKDD, COVID-19, Data Science, Deep Learning, KDD, KDD-2020, Machine Learning, Meetings, Research
Top Google AI, Machine Learning Tools for Everyone

Google is much more than a search company. Learn about all the tools they are developing to help turn your ideas into reality through Google AI.

on Aug 18, 2020 in AI, AutoML, Bias, Data Science Platforms, Datasets, Google, Google Cloud, Google Colab, Machine Learning, TensorFlow
How “Anonymous” is Anonymized Data?

As the collection of personal data became widespread over the previous century, the question of data anonymization began to arise. The regulations coming into effect around the world have sealed the importance of the matter.

on Aug 18, 2020 in Anonymity, Anonymized, Compliance, GDPR, Identification, Privacy
3D Human Pose Estimation Experiments and Analysis

In this article, we explore how 3D human pose estimation works based on our research and experiments, which were part of the analysis of applying human pose estimation in AI fitness coach applications.

on Aug 17, 2020 in Analysis, Computer Vision, Humans, Sports, Video recognition
How Do Neural Networks Learn?

Although neural networks are so popular today in AI and machine learning development, they can still look like a black box in terms of how they learn to make predictions. To understand what is going on deep in these networks, we must consider how neural networks perform optimization.

on Aug 17, 2020 in Beginners, Neural Networks
Content-Based Recommendation System using Word Embeddings

This article explores how average Word2Vec and TF-IDF Word2Vec can be used to build a recommendation engine.

on Aug 14, 2020 in NLP, Recommendation Engine, Recommender Systems, TF-IDF, Word Embeddings, word2vec
The List of Top 10 Lists in Data Science

The list of Top 10 lists that Data Scientists -- from enthusiasts to those who want to jump start a career -- must know to smoothly navigate a path through this field.

on Aug 14, 2020 in Algorithms, Data Science, Data Science Skills, Datasets, Influencers, LinkedIn, Python, Top 10
Hypothesis Test for Real Problems

Hypothesis tests are significant for evaluating answers to questions concerning samples of data.

on Aug 14, 2020 in Hypothesis Testing, P-value, Statistics
Bring your Pandas Dataframes to life with D-Tale

D-Tale is an open-source solution with which you can visualize, analyze, and learn how to code Pandas data structures. In this tutorial you'll learn how to open the grid, build columns, create charts and view code exports.

on Aug 13, 2020 in Data Exploration, Data Science, Data Visualization, Pandas, Python
Going Beyond Superficial: Data Science MOOCs with Substance

Data science MOOCs are superficial. At least, a lot of them are. What are your options when looking for something more substantive?

on Aug 13, 2020 in Courses, Data Science, Machine Learning, MOOC
GitHub is the Best AutoML You Will Ever Need

This article uses PyCaret 2.0, an open source, low-code machine learning library in Python to develop a simple AutoML solution and deploy it as a Docker container using GitHub actions.

on Aug 12, 2020 in Automated Machine Learning, AutoML, GitHub, PyCaret, Python
Introduction to Statistics for Data Science

Statistics is foundational for Data Science and a crucial skill to master for any practitioner. This advanced introduction reviews, with examples, the fundamental concepts of inferential statistics by illustrating the differences between Point Estimators and Confidence Interval Estimates.

on Aug 12, 2020 in Beginners, Data Science, Statistics
How Natural Language Processing Is Changing Data Analytics

As it becomes more prevalent, NLP will enable humans to interact with computers in ways not possible before. This new type of collaboration will allow improvements in a wide variety of human endeavors, including business, philanthropy, health, and communication.

on Aug 12, 2020 in Data Analytics, Data Science, NLP
Unit Test Your Data Pipeline, You Will Thank Yourself Later

While you cannot test model output, you should at least test that the inputs are correct. Compared to the time you invest in writing unit tests, a few good, simple tests will save you much more time later, especially when working on large projects or big data.

on Aug 11, 2020 in Data Science, Pipeline, Programming
Data Science Internship Interview Questions

Data science is an attractive field because not only is it lucrative, but you can have opportunities to work on interesting projects, and you’re always learning new things. If you're trying to get started from the ground up, then review this guide to prepare for the interview essentials.

on Aug 11, 2020 in Data Science, Internship, Interview Questions
10 Use Cases for Privacy-Preserving Synthetic Data

This article presents 10 use-cases for synthetic data, showing how enterprises today can use this artificially generated information to train machine learning models or share data externally without violating individuals' privacy.

on Aug 11, 2020 in Compliance, Machine Learning, Privacy, Synthetic Data
HOSTKEY GPU Grant Program

The HOSTKEY GPU Grant Program is open to specialists and professionals in the Data Science sector performing research or other projects centered on innovative uses of GPU processing and which will glean practical results in the field of Data Science, with the objective of supporting basic scientific research and prospective startups.

on Aug 10, 2020 in Data Science, GPU, Research
Exploring GPT-3: A New Breakthrough in Language Generation

GPT-3 is the largest natural language processing (NLP) transformer released to date, eclipsing the previous record, Microsoft Research’s Turing-NLG at 17B parameters, by about 10 times. This has resulted in an explosion of demos: some good, some bad, all interesting.

on Aug 10, 2020 in GPT-3, Natural Language Generation, NLP, OpenAI, Turing Test
Data Scientist Job Market 2020

With an analysis of over a thousand Data Scientist job descriptions in the USA, check out the trends for 2020 and current expectations on new positions in the field, including credentials, experience, and programming languages.

on Aug 10, 2020 in Career, Data Scientist, Jobs, USA
Facebook Uses Bayesian Optimization to Conduct Better Experiments in Machine Learning Models

Research from Facebook proposes a Bayesian optimization method to run A/B tests in machine learning models.

on Aug 10, 2020 in Bayesian, Facebook, Machine Learning, Modeling, Optimization
How A Single Source of Truth Can Benefit Your Organization

A single source of truth provides stakeholders with a clear picture of the enterprise assets and the potential complications that can disrupt the data strategy. Find out how you can implement this single source of truth in your enterprise ecosystem.

on Aug 7, 2020 in Business Intelligence, Data Management, Data Quality, Decision Making
The Uncommon Data Science Job Guide

With the job landscape in Data Science becoming hyper-competitive, there are clear strategies you can consider to find your way to snagging a position in the field.

on Aug 7, 2020 in Career, Career Advice, Data Science, Jobs
Batch Normalization in Deep Neural Networks

Batch normalization is a technique for training very deep neural networks that normalizes the contributions to a layer for every mini batch.

on Aug 7, 2020 in Deep Learning, Neural Networks, Normalization, Regularization
Containerization of PySpark Using Kubernetes

This article demonstrates the approach of how to use Spark on Kubernetes. It also includes a brief comparison between various cluster managers available for Spark.

on Aug 6, 2020 in Apache Spark, Containers, Kubernetes
Essential Data Science Tips: How to Use One-Vs-Rest and One-Vs-One for Multi-Class Classification

Classification, as a predictive modeling task, involves assigning a class label to each example. Algorithms designed for binary classification cannot be applied directly to multi-class classification problems. For such situations, heuristic methods come in handy.

on Aug 6, 2020 in Classification, Machine Learning
Metrics to Use to Evaluate Deep Learning Object Detectors

It's important to understand which metric should be used to evaluate trained object detectors and which one is more important. Is mAP alone enough to evaluate object detector models? Can the same metric be used to evaluate object detectors on the validation set and the test set?

on Aug 6, 2020 in Computer Vision, Deep Learning, Metrics, Object Detection
New Poll: Which Data Science Skills You Have and Which Ones You Want? Vote Now

Take part in the latest KDnuggets poll, and share your insights with the community. Which Data Science skills do you currently possess, and which are you looking to add or improve upon? Vote now!

on Aug 5, 2020 in Career, Data Science, Data Science Skills, Poll, Skills
Word Embedding Fairness Evaluation

With word embeddings being such a crucial component of NLP, the reported social biases resulting from the training corpora could limit their application. The framework introduced here intends to measure the fairness in word embeddings to better understand these potential biases.

on Aug 5, 2020 in Bias, Ethics, Machine Learning, Word Embeddings
Netflix’s Polynote is a New Open Source Framework to Build Better Data Science Notebooks

The new notebook environment provides substantial improvements to streamline experimentation in machine learning workflows.

on Aug 5, 2020 in IDE, Jupyter, Netflix, Open Source, Scala
Implementing MLOps on an Edge Device

This article introduces developers to MLOps and strategies for implementing MLOps on edge devices.

on Aug 4, 2020 in Edge Analytics, Machine Learning, MLOps, Speech Recognition, Workflow
5 Apache Spark Best Practices For Data Science

Check out these best practices for Spark that the author wishes they knew before starting their project.

on Aug 4, 2020 in Apache Spark, Best Practices, Data Science
Know What Employers are Expecting for a Data Scientist Role in 2020

This analysis is based on 1000+ recent Data Scientist job postings, extracted from job portals using web scraping.

on Aug 3, 2020 in Career Advice, Data Science, Data Scientist
I have a joke about …

I have a machine learning joke, but it is not performing as well on a new audience. We bring you a selection of the nerdy self-referential computer jokes that were popular on the web recently.

on Aug 1, 2020 in Cartoon, Deep Learning, Humor
