2020 Aug
- eBook: Vocabularies, Text Mining and FAIR Data: The Strategic Role Information Managers Play - Aug 31, 2020.
How can information managers find strategic roles to play in their organization's AI and data analysis projects? Download this book to learn more.
- A Curious Theory About the Consciousness Debate in AI - Aug 31, 2020.
Dr. Michio Kaku has formulated a very interesting theory of consciousness that applies to AI systems.
- Top Stories, Aug 24-30: If I had to start learning Data Science again, how would I do it?; 4 ways to improve your TensorFlow model – key regularization techniques you need to know - Aug 31, 2020.
Also: The NLP Model Forge: Generate Model Code On Demand; DeepMind’s Three Pillars for Building Robust Machine Learning Systems; Beyond the Turing Test; Must-read NLP and Deep Learning articles for Data Scientists; How to Optimize Your CV for a Data Scientist Career
- Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Semantics and Pragmatics - Aug 31, 2020.
Algorithms for text analytics must model how language works to incorporate meaning in language—and so do the people deploying these algorithms. Bender & Lascarides 2019 is an accessible overview of what the field of linguistics can teach NLP about how meaning is encoded in human languages.
- Accelerated Computer Vision: A Free Course From Amazon - Aug 31, 2020.
Amazon's Machine Learning University is making its online courses available to the public, and this time we look at its Accelerated Computer Vision offering.
- Data is everywhere and it powers everything we do! - Aug 28, 2020.
In this article I would like to focus on how companies can start their data-centric strategies and how to achieve success in their data transformation journeys. I have tried to share my thoughts on why companies have to treat data as central to their growth, to being competitive, smarter, and innovative, and to being prepared for any unforeseen market surprises.
- Beyond the Turing Test - Aug 28, 2020.
With more advancements in AI, it might be time to replace the age-old Turing Test with something better to determine if a machine is thinking. Specifically, a more modern approach might include standard questions designed to probe various facets of intelligence, and comparing the computer to a spectrum of human respondents of different ages, sexes, backgrounds, and abilities.
- Microsoft’s DoWhy is a Cool Framework for Causal Inference - Aug 28, 2020.
Inspired by Judea Pearl’s do-calculus for causal inference, the open source framework provides a programmatic interface for popular causal inference methods.
- Explainable and Reproducible Machine Learning Model Development with DALEX and Neptune - Aug 27, 2020.
With ML models serving real people, misclassified cases (which are a natural consequence of using ML) are affecting people’s lives and sometimes treating them very unfairly. It makes the ability to explain your models’ predictions a requirement rather than just a nice-to-have.
- 4 ways to improve your TensorFlow model – key regularization techniques you need to know - Aug 27, 2020.
Regularization techniques are crucial for preventing your models from overfitting and enable them to perform better on your validation and test sets. This guide provides a thorough overview with code of four key approaches you can use for regularization in TensorFlow.
- Working with Spark, Python or SQL on Azure Databricks - Aug 27, 2020.
Here we look at some ways to interchangeably work with Python, PySpark and SQL using Azure Databricks, an Apache Spark-based big data analytics service designed for data science and data engineering offered by Microsoft.
- Top KDnuggets tweets, Aug 19-25: #MachineLearning - Handling Missing Data - Aug 26, 2020.
Machine Learning - Handling Missing Data; The Last SQL Guide for Data Analysis You'll Ever Need; How (not) to use #MachineLearning for time series forecasting: The sequel
- Data Versioning: Does it mean what you think it means? - Aug 26, 2020.
Does data versioning mean what you think it means? Read this overview with use cases to see what data versioning really is, and the tools that can help you manage it.
- How to Optimize Your CV for a Data Scientist Career - Aug 26, 2020.
As the number of data science positions continues to grow dramatically, so does the number of data scientists in the marketplace. Follow these expert tips and examples to help make your resume and job applications stand out in an increasingly competitive field.
- Breaking Privacy in Federated Learning - Aug 26, 2020.
Despite the benefits of federated learning, there are still ways of breaching a user’s privacy, even without sharing private data. In this article, we’ll review some research papers that discuss how federated learning includes this vulnerability.
- Unifying Data Pipelines and Machine Learning with Apache Spark™ and Amazon SageMaker - Aug 25, 2020.
Roll up your sleeves and charge up because you’re invited to an interactive, virtual Machine Learning workshop run by Amazon Web Services, Databricks, and Immuta on September 10.
- How Data Science Is Keeping People Safe During COVID-19 - Aug 25, 2020.
Data, and more importantly, the way people use it, is shaping and refining approaches to COVID-19 safety. Here's a closer look at how this is happening.
- Getting Started with Feature Selection - Aug 25, 2020.
For machine learning, more data is always better. What about more features of data? Not necessarily. This beginners' guide with code examples for selecting the most useful features from your data will jump start you toward developing the most effective and efficient learning models.
- Data Science Tools Illustrated Study Guides - Aug 25, 2020.
These data science tools illustrated guides are broken up into four distinct categories: data retrieval, data manipulation, data visualization, and engineering tips. Both online and PDF versions of these guides are available.
- Predictive Analytics World Berlin 2020 – Keynote Sessions Announced - Aug 24, 2020.
The agenda of Predictive Analytics World Berlin, 16-17 Nov, is taking shape. Curious? Take a peek at our keynote sessions 2020. Don't forget to use the code KDNUGGETS for a 15% discount on your Predictive Analytics World ticket.
- DeepMind’s Three Pillars for Building Robust Machine Learning Systems - Aug 24, 2020.
Specification Testing, Robust Training and Formal Verification are three elements that the AI powerhouse believes hold the essence of robust machine learning models.
- Top Stories, Aug 17-23: If I had to start learning Data Science again, how would I do it? - Aug 24, 2020.
Also: Must-read NLP and Deep Learning articles for Data Scientists; Top Google AI, Machine Learning Tools for Everyone; Introduction to Federated Learning; These Data Science Skills will be your Superpower
- A Deep Dive Into the Transformer Architecture – The Development of Transformer Models - Aug 24, 2020.
Even though transformers for NLP were introduced only a few years ago, they have delivered major impacts to a variety of fields from reinforcement learning to chemistry. Now is the time to better understand the inner workings of transformer architectures to give you the intuition you need to effectively work with these powerful tools.
- The NLP Model Forge: Generate Model Code On Demand - Aug 24, 2020.
You've seen their Big Bad NLP Database and The Super Duper NLP Repo. Now Quantum Stat is back with its most ambitious NLP product yet: The NLP Model Forge.
- Performance Testing on Big Data Applications - Aug 21, 2020.
You can use performance testing in any application you’re working on but it’s especially useful for big data applications. Let’s see why.
- Must-read NLP and Deep Learning articles for Data Scientists - Aug 21, 2020.
NLP and deep learning continue to advance, nearly on a daily basis. Check out these recent must-read guides, feature articles, and other resources to keep you on top of the latest advancements and ahead of the curve.
- Data Science Meets Devops: MLOps with Jupyter, Git, and Kubernetes - Aug 21, 2020.
An end-to-end example of deploying a machine learning product using Jupyter, Papermill, Tekton, GitOps and Kubeflow.
- Rapid Python Model Deployment with FICO Xpress Insight - Aug 20, 2020.
The biggest hurdle in using data to create business value is the ability to operationalize analytics throughout the organization. Xpress Insight is geared to reduce the burden on IT and address their critical requirements while empowering business users to take ownership of decisions and change management.
- Build Your Own AutoML Using PyCaret 2.0 - Aug 20, 2020.
In this post we present a step-by-step tutorial on how PyCaret can be used to build an Automated Machine Learning Solution within Power BI, thus allowing data scientists and analysts to add a layer of machine learning to their Dashboards without any additional license or software costs.
- These Data Science Skills will be your Superpower - Aug 20, 2020.
Learning data science means learning the hard skills of statistics, programming, and machine learning. To complete your training, a broader set of soft skills will round out your capabilities as an effective and successful professional Data Scientist.
- Introduction to Federated Learning - Aug 20, 2020.
Federated learning means enabling on-device training, model personalization, and more. Read more about it in this article.
- Top KDnuggets tweets, Aug 12-18: @Amazon Wants to Make You a #MachineLearning Practitioner— For Free - Aug 19, 2020.
Also: 24 Best (and #Free) #Books To Understand #MachineLearning; Task Cheatsheet for Almost Every #MachineLearning Project; The Dunning-Kruger Effect Explains Why Society Is So Screwed-Up; 4 #Free Math Courses to do and Level up your #DataScience Skills
- How Semiconductor Innovation Could Help Prevent The Next Pandemic - Aug 19, 2020.
Samsung Semiconductor technology has played a particularly essential role in the fight against Covid-19. Samsung technology powers many of the most innovative programs and AI platforms that are helping scientists conduct research and achieve breakthroughs at a speed that would have been impossible just a few years ago.
- Autotuning for Multi-Objective Optimization on LinkedIn’s Feed Ranking - Aug 19, 2020.
In this post, the authors share their experience coming up with an automated system to tune one of the main parameters in their machine learning model that recommends content on LinkedIn’s Feed, which is just one piece of the community-focused architecture.
- If I had to start learning Data Science again, how would I do it? - Aug 19, 2020.
While different ways to learn Data Science for the first time exist, the approach that works for you should be based on how you learn best. One powerful method is to evolve your learning from simple practice into complex foundations, as outlined in this learning path recommended by a physicist who turned into a Data Scientist.
- Accelerated Natural Language Processing: A Free Course From Amazon - Aug 19, 2020.
Amazon's Machine Learning University is making its online courses available to the public, starting with this Accelerated Natural Language Processing offering.
- Visualizing the Mobility Trends in European Countries Affected by COVID-19 - Aug 18, 2020.
This post highlights the movement of people from the 10 most-affected European countries based on the way they stay at home, work, and visit places, using Google's anonymized location tracking dataset.
- KDD-2020 (virtual), the leading conference on Data Science and Knowledge Discovery, Aug 23-27 – register now - Aug 18, 2020.
Using an interactive VR platform, KDD-2020 brings you the latest research in AI, Data Science, Deep Learning, and Machine Learning with tutorials to improve your skills, keynotes from top experts, workshops on state-of-the-art topics and over 200 research presentations.
- Top Google AI, Machine Learning Tools for Everyone - Aug 18, 2020.
Google is much more than a search company. Learn about all the tools they are developing to help turn your ideas into reality through Google AI.
- How “Anonymous” is Anonymized Data? - Aug 18, 2020.
As the collection of personal data democratized over the previous century, the question of data anonymization started to rise. The regulations coming into effect around the world sealed the importance of the matter.
- Top Stories, Aug 10-16: Know What Employers are Expecting for a Data Scientist Role in 2020; The List of Top 10 Lists in Data Science - Aug 17, 2020.
Also: 5 Different Ways to Load Data in Python; Facebook Uses Bayesian Optimization to Conduct Better Experiments in Machine Learning Models; Exploring GPT-3: A New Breakthrough in Language Generation; Unit Test Your Data Pipeline, You Will Thank Yourself Later; Going Beyond Superficial: Data Science MOOCs with Substance
- Reducing Re-Identification Risk in Health Data - Aug 17, 2020.
Want to learn more about our recommendations for strengthening privacy while preserving utility? Read Immuta's new whitepaper, "Reducing Re-Identification Risk in Health Data: A Guide to Three Privacy Enhancing Technologies", to get the inside scoop on the best privacy enhancing technologies.
- 3D Human Pose Estimation Experiments and Analysis - Aug 17, 2020.
In this article, we explore how 3D human pose estimation works based on our research and experiments, which were part of the analysis of applying human pose estimation in AI fitness coach applications.
- How Do Neural Networks Learn? - Aug 17, 2020.
Although neural networks are so popular today in AI and machine learning development, they can still look like a black box in terms of how they learn to make predictions. To understand what is going on deep in these networks, we must consider how neural networks perform optimization.
- Are Computer Vision Models Vulnerable to Weight Poisoning Attacks? - Aug 17, 2020.
A recent paper has explored the possibility of influencing the predictions of a freshly trained Natural Language Processing (NLP) model by tweaking the weights re-used in its training. This result is especially interesting if it proves to transfer also to the context of Computer Vision (CV) since there, the usage of pre-trained weights is widespread.
- Content-Based Recommendation System using Word Embeddings - Aug 14, 2020.
This article explores how average Word2Vec and TF-IDF Word2Vec can be used to build a recommendation engine.
- The List of Top 10 Lists in Data Science - Aug 14, 2020.
The list of Top 10 lists that Data Scientists -- from enthusiasts to those who want to jump start a career -- must know to smoothly navigate a path through this field.
- Hypothesis Test for Real Problems - Aug 14, 2020.
Hypothesis tests are significant for evaluating answers to questions concerning samples of data.
- Bring your Pandas Dataframes to life with D-Tale - Aug 13, 2020.
Bring your Pandas dataframes to life with D-Tale. D-Tale is an open-source solution with which you can visualize, analyze and learn how to code Pandas data structures. In this tutorial you'll learn how to open the grid, build columns, create charts and view code exports.
- Going Beyond Superficial: Data Science MOOCs with Substance - Aug 13, 2020.
Data science MOOCs are superficial. At least, a lot of them are. What are your options when looking for something more substantive?
- Top KDnuggets tweets, Aug 5-11: Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild - Aug 12, 2020.
Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild; How to Evaluate the Performance of Your Machine Learning Model; Deep Learning Most Important Ideas - an excellent review
- Exclusive Reuters Events C-level webinar: Technology enabled customer-centric innovation with Travelers, Wells Fargo and Henkel - Aug 12, 2020.
Join this exclusive Reuters Events webinar: Technology enabled customer-centric innovation with Travelers, Wells Fargo and Henkel, on Aug 20 @ 11:00 ET, and learn how to marry both concepts to delight your consumers with data driven customer-centric innovation!
- GitHub is the Best AutoML You Will Ever Need - Aug 12, 2020.
This article uses PyCaret 2.0, an open source, low-code machine learning library in Python to develop a simple AutoML solution and deploy it as a Docker container using GitHub actions.
- Introduction to Statistics for Data Science - Aug 12, 2020.
Statistics is foundational for Data Science and a crucial skill to master for any practitioner. This advanced introduction reviews with examples the fundamental concepts of inferential statistics by illustrating the differences between Point Estimators and Confidence Interval Estimates.
- How Natural Language Processing Is Changing Data Analytics - Aug 12, 2020.
As it becomes more prevalent, NLP will enable humans to interact with computers in ways not possible before. This new type of collaboration will allow improvements in a wide variety of human endeavors, including business, philanthropy, health, and communication.
- Top July Stories: Data Science MOOCs are too Superficial - Aug 11, 2020.
Also: A Layman's Guide to Data Science. Part 3: Data Science Workflow; The Bitter Lesson of Machine Learning; Free MIT Courses on Calculus: The Key to Understanding Deep Learning.
- Will Reinforcement Learning Pave the Way for Accessible True Artificial Intelligence? - Aug 11, 2020.
Python Machine Learning, Third Edition covers the essential concepts of reinforcement learning, starting from its foundations, and how RL can support decision making in complex environments. Read more on the topic from the book's author Sebastian Raschka.
- Unit Test Your Data Pipeline, You Will Thank Yourself Later - Aug 11, 2020.
While you cannot test model output, at least you should test that inputs are correct. Compared to the time you invest in writing unit tests, good pieces of simple tests will save you much more time later, especially when working on large projects or big data.
- Data Science Internship Interview Questions - Aug 11, 2020.
Data science is an attractive field because not only is it lucrative, but you can have opportunities to work on interesting projects, and you’re always learning new things. If you're trying to get started from the ground up, then review this guide to prepare for the interview essentials.
- 10 Use Cases for Privacy-Preserving Synthetic Data - Aug 11, 2020.
This article presents 10 use-cases for synthetic data, showing how enterprises today can use this artificially generated information to train machine learning models or share data externally without violating individuals' privacy.
- HOSTKEY GPU Grant Program - Aug 10, 2020.
The HOSTKEY GPU Grant Program is open to specialists and professionals in the Data Science sector performing research or other projects centered on innovative uses of GPU processing and which will glean practical results in the field of Data Science, with the objective of supporting basic scientific research and prospective startups.
- Exploring GPT-3: A New Breakthrough in Language Generation - Aug 10, 2020.
GPT-3 is the largest natural language processing (NLP) transformer released to date, eclipsing the previous record, Microsoft Research’s Turing-NLG at 17B parameters, by about 10 times. This has resulted in an explosion of demos: some good, some bad, all interesting.
- Data Scientist Job Market 2020 - Aug 10, 2020.
With an analysis of over a thousand Data Scientist job descriptions in the USA, check out the trends for 2020 and current expectations on new positions in the field, including credentials, experience, and programming languages.
- Top Stories, Aug 3-9: Know What Employers are Expecting for a Data Scientist Role in 2020 - Aug 10, 2020.
Netflix's Polynote is a New Open Source Framework to Build Better Data Science Notebooks; Metrics to Use to Evaluate Deep Learning Object Detectors; Setting Up Your Data Science & Machine Learning Capability in Python; Which Data Science Skills are core and which are hot/emerging ones?
- Facebook Uses Bayesian Optimization to Conduct Better Experiments in Machine Learning Models - Aug 10, 2020.
Research from Facebook proposes a Bayesian optimization method to run A/B tests in machine learning models.
- How A Single Source of Truth Can Benefit Your Organization - Aug 7, 2020.
A single source of truth provides stakeholders with a clear picture of the enterprise assets and the potential complications that can disrupt the data strategy. Find out how you can implement this single source of truth in your enterprise ecosystem.
- The Uncommon Data Science Job Guide - Aug 7, 2020.
With the job landscape in Data Science becoming hyper-competitive, there are clear strategies you can consider to find your way to snagging a position in the field.
- Batch Normalization in Deep Neural Networks - Aug 7, 2020.
Batch normalization is a technique for training very deep neural networks that normalizes the contributions to a layer for every mini batch.
- Containerization of PySpark Using Kubernetes - Aug 6, 2020.
This article demonstrates the approach of how to use Spark on Kubernetes. It also includes a brief comparison between various cluster managers available for Spark.
- Essential Data Science Tips: How to Use One-Vs-Rest and One-Vs-One for Multi-Class Classification - Aug 6, 2020.
Classification, as a predictive model, involves aligning each class label to examples. Algorithms designed for binary classification cannot be applied to multi-class classification problems. For such situations, heuristic methods come in handy.
- Metrics to Use to Evaluate Deep Learning Object Detectors - Aug 6, 2020.
It's important to understand which metric should be used to evaluate trained object detectors and which one is more important. Is mAP alone enough to evaluate object detector models? Can the same metric be used to evaluate object detectors on the validation set and the test set?
- Top KDnuggets tweets, Jul 29 – Aug 04: Awesome Machine Learning and AI Courses - Aug 5, 2020.
Also: Why You Should Get Google’s New Machine Learning Certificate; A Tour of End-to-End Machine Learning Platforms; Want to become a #DataAnalyst, #DataScientist or #DataEngineer? Learn #SQL, and other insights from ~9K job postings; Left for Dead, R #rstats Surges Again
- New Poll: Which Data Science Skills You Have and Which Ones You Want? Vote Now - Aug 5, 2020.
Take part in the latest KDnuggets poll, and share your insights with the community. Which Data Science skills do you currently possess, and which are you looking to add or improve upon? Vote now!
- Word Embedding Fairness Evaluation - Aug 5, 2020.
With word embeddings being such a crucial component of NLP, the reported social biases resulting from the training corpora could limit their application. The framework introduced here intends to measure the fairness in word embeddings to better understand these potential biases.
- Netflix’s Polynote is a New Open Source Framework to Build Better Data Science Notebooks - Aug 5, 2020.
The new notebook environment provides substantial improvements to streamline experimentation in machine learning workflows.
- Implementing MLOps on an Edge Device - Aug 4, 2020.
This article introduces developers to MLOps and strategies for implementing MLOps on edge devices.
- Setting Up Your Data Science & Machine Learning Capability in Python - Aug 4, 2020.
With the rich and dynamic ecosystem of Python continuing to be a leading programming language for data science and machine learning, establishing and maintaining a cost-effective development environment is crucial to your business impact. So, do you rent or buy? This overview considers the hidden and obvious factors involved in selecting and implementing your Python platform.
- 5 Apache Spark Best Practices For Data Science - Aug 4, 2020.
Check out these best practices for Spark that the author wishes they knew before starting their project.
- Announcing PyCaret 2.0 - Aug 3, 2020.
PyCaret 2.0 has been released! Find out about all of the updates and see examples of how to use them right here.
- The Machine Learning Field Guide - Aug 3, 2020.
This straightforward guide offers a structured overview of all machine learning prerequisites needed to start working on your project, including the complete data pipeline from importing and cleaning data to modelling and production.
- Top Stories, Jul 27 – Aug 2: Computational Linear Algebra for Coders; Awesome Machine Learning and AI Courses - Aug 3, 2020.
Also: Essential Resources to Learn Bayesian Statistics; Deep Learning for Signal Processing: What You Need to Know; A Tour of End-to-End Machine Learning Platforms; I have a joke about …; First Steps of a Data Science Project
- Know What Employers are Expecting for a Data Scientist Role in 2020 - Aug 3, 2020.
The analysis is done from 1000+ recent Data scientist jobs, extracted from job portals using web scraping.
- I have a joke about … - Aug 1, 2020.
I have a machine learning joke, but it is not performing as well on a new audience. We bring you a selection of the nerdy self-referential computer jokes that were popular on the web recently.