2020 Nov
- Top Stories, Nov 23-29: TabPy: Combining Python and Tableau; The Rise of the Machine Learning Engineer - Nov 30, 2020.
Also: Learn Deep Learning with this Free Course from Yann LeCun; Know-How to Learn Machine Learning Algorithms Effectively; 15 Exciting AI Project Ideas for Beginners; How to Get Into Data Science Without a Degree
- Data science certification – why it is important and where to get it? - Nov 30, 2020.
Data science jobs are among the most sought-after, in-demand jobs in the IT industry right now. This article discusses why certification is needed to get into this field and land these jobs, and where to get it.
- Deploying Trained Models to Production with TensorFlow Serving - Nov 30, 2020.
TensorFlow provides a way to move a trained model to a production environment for deployment with minimal effort. In this article, we’ll use a pre-trained model, save it, and serve it using TensorFlow Serving.
- Data Science History and Overview - Nov 30, 2020.
In this era of ever-growing big data, huge amounts of information from many different fields are gathered and stored. Analyzing this information and extracting value from it has become one of the most attractive tasks for companies and society in general, and it is the core of a new professional role: the Data Scientist.
- A Friendly Introduction to Graph Neural Networks - Nov 30, 2020.
Although the topic can be confusing, graph neural networks can be distilled into just a handful of simple concepts. Read on to find out more.
- Is Your Machine Learning Model Likely to Fail? - Nov 27, 2020.
Read about these 5 missteps to avoid in your planning process.
- The 4 Stages of Being Data-driven for Real-life Businesses - Nov 27, 2020.
Building a new company or transforming an existing one into a data-driven enterprise is a gradual process that moves through multiple stages and takes time. The challenge is progressing to the next stage and, once there, maintaining a company culture that can stay at that level.
- Learn Deep Learning with this Free Course from Yann LeCun - Nov 27, 2020.
Check out this freely available NYU course on deep learning from Yann LeCun and Alfredo Canziani, including videos, slides, and other helpful resources.
- How to Know if a Neural Network is Right for Your Machine Learning Initiative - Nov 26, 2020.
It is important to remember that there must be a business reason for even considering neural nets, and that reason should not be that the C-Suite is suffering from a bad case of FOMO.
- Cartoon: Thanksgiving and Turkey Data Science - Nov 26, 2020.
A classic KDnuggets Thanksgiving cartoon examines the predicament of one group of fowl Data Scientists.
- Better data apps with Streamlit’s new layout options - Nov 26, 2020.
Introducing new layout primitives - including columns, containers and expanders!
- Essential Math for Data Science: Integrals And Area Under The Curve - Nov 25, 2020.
In this article, you’ll learn about integrals and the area under the curve through a practical data science example: the area under the ROC curve, used to compare the performance of two machine learning models.
- How to Incorporate Tabular Data with HuggingFace Transformers - Nov 25, 2020.
In real-world scenarios, we often encounter data that includes both text and tabular features. Handling both data structures effectively with the latest advances in transformers can improve the performance of your models.
- Simple Python Package for Comparing, Plotting & Evaluating Regression Models - Nov 25, 2020.
This package aims to help users plot evaluation-metric graphs for several widely used regression metrics with a single line of code, so models can be compared at a glance. It also significantly lowers the barrier for practitioners, including amateurs, to evaluate different machine learning algorithms on their everyday predictive regression problems.
- TabPy: Combining Python and Tableau - Nov 24, 2020.
This article demonstrates how to get started using Python in Tableau.
- Fraud through the eyes of a machine - Nov 24, 2020.
Data structured as a network of relationships can be modeled as a graph, which can then help extract insights into the data through machine learning and rule-based approaches. While these graph representations provide a natural interface to transactional data for humans to appreciate, caution and context must be applied when leveraging machine-based interpretations of these connections.
- How Data Professionals Can Add More Variation to Their Resumes - Nov 24, 2020.
This article presents seven ways data professionals can add variation to their resumes.
- Top Stories, Nov 16-22: How to Get Into Data Science Without a Degree - Nov 23, 2020.
Also: Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision; Facebook Open Sourced New Frameworks to Advance Deep Learning Research; 5 Most Useful Machine Learning Tools every lazy full-stack data scientist should use; Is Data Science for Me? 14 Self-examination Questions to Consider
- 15 Exciting AI Project Ideas for Beginners - Nov 23, 2020.
There are many branches of AI to learn, but a project-based approach can keep things interesting. Here is a list of 15 such projects you can start implementing today.
- Know-How to Learn Machine Learning Algorithms Effectively - Nov 23, 2020.
The takeaway from the story is that machine learning goes well beyond simple fit and predict methods. The author shares their approach to learning these algorithms in depth, beyond the surface.
- The Rise of the Machine Learning Engineer - Nov 23, 2020.
The evolution of Big Data into machine learning applications ushered in an exciting era of new roles and skillsets needed to implement these technologies. With the Machine Learning Engineer such a crucial role today, where this field goes next should be fascinating.
- Computer Vision at Scale With Dask And PyTorch - Nov 23, 2020.
A tutorial on running image classification inference at scale with the ResNet50 deep learning model on GPU clusters in Saturn Cloud. The result: 40x faster computer vision, cutting a 3+ hour PyTorch model run to just 5 minutes.
- How Machine Learning Works for Social Good - Nov 21, 2020.
We often discuss data science and machine learning techniques in terms of how they help your organization or business goals. But these algorithms aren't limited to increasing the bottom line. Developing new applications that leverage the predictive power of AI to benefit society and communities in need is an equally valuable endeavor for Data Scientists, and it will further expand the positive impact of machine learning on the world.
- Top 6 Data Science Programs for Beginners - Nov 20, 2020.
Udacity has the best industry-leading programs in data science. Here are the top six data science courses for beginners to help you get started.
- Adversarial Examples in Deep Learning – A Primer - Nov 20, 2020.
Bigger compute has led to increasingly impressive state-of-the-art (SOTA) results for deep learning computer vision models. However, most of these SOTA models are brought to their knees when making predictions on adversarial images. Read on to find out more.
- How Data Scientists Can Avoid ‘Lost in Translation’ Syndrome When Communicating With Management - Nov 20, 2020.
When it comes to data science projects, the disconnect between business executives and data teams can lead to major tension. Keeping these challenges from arising in the first place through effective communication will help reduce friction with stakeholders.
- Cellular Automata in Stream Learning - Nov 20, 2020.
In this post, we introduce cellular automata (CA) as pattern recognition methods for stream learning, and then briefly mention two recent CA-based solutions for stream learning. Both are highly interpretable, because their cellular structure directly represents the mapping between the feature space and the labels to be predicted.
- The top courses for aspiring data scientists - Nov 19, 2020.
Here are four courses that can give you the necessary skills to lead businesses in the 21st century. All of them include Python programming as a course component. Most of them require an undergraduate knowledge of statistics, calculus, linear algebra, and probability, so we recommend checking your course of interest for the specifics.
- AI and Automation meets BI - Nov 19, 2020.
Organizations use a variety of BI tools to analyze structured data. These tools are used for ad-hoc analysis, and for dashboards and reports that are essential for decision making. In this post, we describe a new set of BI tools that continue this trend.
- Compute Goes Brrr: Revisiting Sutton’s Bitter Lesson for AI - Nov 19, 2020.
"It's just about having more compute." Wait, is that really all there is to AI? As Richard Sutton's 'bitter lesson' sinks in for more AI researchers, a debate has stirred that considers a potentially more subtle relationship between advancements in AI based on ever-more-clever algorithms and massively scaled computational power.
- Kubernetes vs. Amazon ECS for Data Scientists - Nov 19, 2020.
In this article, we’ll look at two container management solutions — Kubernetes and Amazon Elastic Container Service (ECS) — from a perspective that makes sense for aspiring and current data scientists.
- Top KDnuggets tweets, Nov 11-17: Data Engineering – the Cousin of Data Science, is Troublesome - Nov 18, 2020.
Also: 6 Things About #DataScience that Employers Don't Want You to Know; NLP - Zero to Hero with #Python #NLProc; 5 Tricky SQL Queries Solved - Explaining the approach to solving a few complex #SQL queries.
- Primer on TensorFlow and how PerceptiLabs Makes it Easier - Nov 18, 2020.
With PerceptiLabs, beginners can get started building a model more quickly, and those with more experience can still dive into the code. Given that PerceptiLabs runs TensorFlow behind the scenes, we thought we'd walk through the framework so you can understand its basics, and how it is utilized by PerceptiLabs.
- Hypothesis Vetting: The Most Important Skill Every Successful Data Scientist Needs - Nov 18, 2020.
A well-thought-out hypothesis sets the direction and plan for a Data Science project. Accordingly, the hypothesis is the most important item for evaluating whether a Data Science project will be successful.
- 5 Most Useful Machine Learning Tools every lazy full-stack data scientist should use - Nov 18, 2020.
If you consider yourself a Data Scientist who can take any project from data curation to solution deployment, then you know there are many tools available today to help you get the job done. The trouble is that there are too many choices. Here is a review of five sets of tools that should turn you into the most efficient full-stack data scientist possible.
- How to Future-Proof Your Data Science Project - Nov 18, 2020.
This article outlines 5 critical elements of ML model selection & deployment.
- AI Is More Than a Model: Four Steps to Complete Workflow Success - Nov 17, 2020.
The key element for success in practical AI implementation is uncovering any issues early on and knowing what aspects of the workflow to focus time and resources on for the best results—and it’s not always the most obvious steps.
- Facebook Open Sourced New Frameworks to Advance Deep Learning Research - Nov 17, 2020.
Polygames, PyTorch3D and HiPlot are the new additions to Facebook’s open source deep learning stack.
- Is Data Science for Me? 14 Self-examination Questions to Consider - Nov 17, 2020.
You are intrigued by this exciting new field of Data Science, and you think you want in on the action. The demand remains very high and the salaries are strong. Before taking the leap onto this path, these questions will help you evaluate if you are ready for the challenges and opportunities.
- Algorithms for Advanced Hyper-Parameter Optimization/Tuning - Nov 17, 2020.
In informed search, each iteration learns from the last, whereas in grid and random search all the modelling is done at once and the best model is then picked. For small datasets, grid search or random search is fast and sufficient. AutoML approaches provide a neat way to select the hyperparameters that improve the model’s performance.
- The Power of Spreadsheets for Achieving a Data Driven Culture [Nov 19 Webinar] - Nov 16, 2020.
Join Metis Senior Data Scientist Kevin Birnbaum this Thurs, Nov 19 @ 12 PM ET, as he explains how spreadsheets can help all employees get comfortable with data and empower them to perform their own analysis without hand holding by your advanced analytics team.
- 5 Things You Are Doing Wrong in PyCaret - Nov 16, 2020.
PyCaret is an alternative low-code library that can replace hundreds of lines of code with just a few words. This makes experiments much faster and more efficient. Find out 5 ways to improve your usage of the library.
- How to Get Into Data Science Without a Degree - Nov 16, 2020.
Breaking into any new field or slogging through a career change is always a challenge, and requires focus and even a little grit. Becoming a Data Scientist is no different, but the role is within reach even without a formal post-secondary degree, largely thanks to the vast amount of quality learning resources available today.
- Top Stories, Nov 9-15: How to Acquire the Most Wanted Data Science Skills; From Y=X to Building a Complete Artificial Neural Network - Nov 16, 2020.
Also: Learn to build an end to end data science project; How to Acquire the Most Wanted Data Science Skills; Moving from Data Science to Machine Learning Engineering; DIY Election Fraud Analysis Using Benford's Law
- Top Python Libraries for Deep Learning, Natural Language Processing & Computer Vision - Nov 16, 2020.
This article compiles the 30 top Python libraries for deep learning, natural language processing & computer vision, as best determined by KDnuggets staff.
- When Machine Learning Knows Too Much About You - Nov 14, 2020.
If machine learning models predict personal information about you, even unintentionally, what sort of ethical dilemma does that model pose? Where does the line need to be drawn? There have already been many such cases, some of which have become overblown folklore, while others are potentially serious overreaches by governments.
- From Y=X to Building a Complete Artificial Neural Network - Nov 13, 2020.
In this tutorial, we will start with the simplest artificial neural network (ANN) and move to something much more complex. We begin by building a machine learning model with no parameters: Y=X.
- tensorflow + dalex = :) , or how to explain a TensorFlow model - Nov 13, 2020.
Having a machine learning model that generates interesting predictions is one thing. Understanding why it makes these predictions is another. For a TensorFlow predictive model, it can be straightforward and convenient to develop explainable AI by leveraging the dalex Python package.
- How to Acquire the Most Wanted Data Science Skills - Nov 13, 2020.
We recently surveyed KDnuggets readers to determine the "most wanted" data science skills. Since these seem to be the skills most in demand among practitioners, here is a collection of resources for getting started learning them.
- Toward a More Effective Disease Outbreak Alert System: A Symptoms Approach to Biosurveillance [Nov 19 webinar] - Nov 12, 2020.
Learn how the use of more granular symptoms-level data combined with innovative statistical techniques has the potential to identify disease outbreaks faster while limiting false positives.
- Do’s and Don’ts of Analyzing Time Series - Nov 12, 2020.
When handling time series data in your Data Science analysis work, it is easy to make a variety of common mistakes that are basic but very important to avoid when processing this type of data. Here, we review these issues and recommend the best practices.
- Free From MIT: Intro to Computational Thinking with Julia - Nov 12, 2020.
Introduction to Computational Thinking with Julia, with Applications to Modeling the COVID-19 Pandemic is another freely-available offering from MIT's Open Courseware.
- Top KDnuggets tweets, Nov 04-10: #DataVisualization of people votes. Land doesn’t vote. People do. - Nov 11, 2020.
Also: Accelerated Natural Language Processing: A #Free Amazon #MachineLearning University Course; Essential data science skills that no one talks about; U.S. election maps are wildly misleading, so this designer fixed them; Top Certificates and Certifications in #Analytics, #DataScience, #MachineLearning and AI
- How to use AI & analytics now to prepare for resiliency in 2021 - Nov 11, 2020.
Emerge with Resiliency 2020 is a no-cost virtual event presented by the IBM Planning Analytics and Cognos Community taking place on Nov 18. This one-day event includes 8 expert sessions, during which you’ll learn how IBM solutions can help enhance business continuity, reduce risk from emerging threats, and help you prepare for and manage disruption.
- Most Popular Distance Metrics Used in KNN and When to Use Them, by Sarang Anil Gokte - Nov 11, 2020.
To calculate distances, KNN uses a distance metric chosen from a list of available metrics. Read this article for an overview of these metrics and when each should be considered for use.
- Learn to build an end to end data science project - Nov 11, 2020.
Appreciating the process you must work through for any Data Science project is valuable before you land your first job in this field. With a well-honed strategy, such as the one outlined in this example project, you will remain productive and consistently deliver valuable machine learning models.
- Deep Learning Design Patterns - Nov 11, 2020.
The new book "Deep Learning Design Patterns" presents deep learning models in a unique-but-familiar way: as extendable design patterns you can easily plug and play into your software projects. Use code kdmath50 to save 50%.
- Mastering TensorFlow Tensors in 5 Easy Steps - Nov 11, 2020.
Discover how the building blocks of TensorFlow work at the lower level and learn how to make the most of Tensor objects.
- Top October Stories: Data Science Minimum: 10 Essential Skills You Need to Know to Start Doing Data Science; fastcore: An Underrated Python Library - Nov 10, 2020.
Also: Goodhart's Law for Data Science and what happens when a measure becomes a target?; How to become a Data Scientist: a step-by-step guide; 10 Best Machine Learning Courses in 2020.
- Multi-domain summarization by PlexPage - Nov 10, 2020.
The PlexPage by Algoritmi Vision is an Abstractive Multi-domain Search Summarization application built using the unique and innovative structure of the Natural Language Generation (NLG) technique. Learn more here, and try it out for yourself.
- Predicting Heart Disease Using Machine Learning? Don’t! - Nov 10, 2020.
I believe the “Predicting Heart Disease using Machine Learning” is a classic example of how not to apply machine learning to a problem, especially where a lot of domain experience is required.
- Every Complex DataFrame Manipulation, Explained & Visualized Intuitively - Nov 10, 2020.
Data Scientists often hail the power of Pandas for data preparation, but many may not be able to leverage all of that power. Manipulating data frames can quickly become a complex task, so this article presents eight of these techniques in Pandas, each with an explanation, visualization, code, and tricks to remember how to do it.
- Moving from Data Science to Machine Learning Engineering - Nov 10, 2020.
The world of machine learning — and software — is changing. Read this article to find out how, and what you can do to stay ahead of it.
- 5 Reasons Why Containers Will Rule Data Science - Nov 9, 2020.
Historically, containers were a way to abstract a software stack away from the operating system. For data scientists, containers have historically offered few benefits.
- My Data Science Online Learning Journey on Coursera - Nov 9, 2020.
Check out the author's informative list of courses and specializations on Coursera taken to get started on their data science and machine learning journey.
- Doing the impossible? Machine learning with less than one example - Nov 9, 2020.
Machine learning algorithms are notorious for needing data, a lot of data: the more data the better. But much research has gone into developing new methods that need fewer examples to train a model, such as "few-shot" or "one-shot" learning, which require only a handful of examples or as few as one. Now, this lower boundary on training examples is being pushed to the next extreme.
- Change the Background of Any Image with 5 Lines of Code - Nov 9, 2020.
Blur, color, or grayscale the background of any image, or replace it with a picture, using PixelLib.
- Top Stories, Nov 2-8: Top Python Libraries for Data Science, Data Visualization & Machine Learning; The Best Data Science Certification You’ve Never Heard Of - Nov 9, 2020.
Also: Top 5 Free Machine Learning and Deep Learning eBooks Everyone should read; Pandas on Steroids: End to End Data Science in Python with Dask; Essential data science skills that no one talks about; DIY Election Fraud Analysis Using Benford's Law
- Pandas on Steroids: End to End Data Science in Python with Dask - Nov 6, 2020.
End to end parallelized data science from reading big data to data manipulation to visualisation to machine learning.
- Six Ethical Quandaries of Predictive Policing - Nov 6, 2020.
When predictive machine learning models are applied to real-life scenarios, especially those that directly impact humans, such as cancer detection and other medical-related applications, the risks involved with incorrect predictions carry very high stakes. These risks are also prominent in how machine learning is applied in law enforcement, and serious ethical questions must be considered.
- Essential data science skills that no one talks about - Nov 6, 2020.
Old-fashioned engineering skills are what you need to boost your data science career.
- Top KDnuggets tweets, Oct 28 – Nov 03: 11 Branches of #MachineLearning and 63 Important Machine Learning Algorithms; Top #Python Libraries for #DataScience, #DataVisualization, #MachineLearning - Nov 5, 2020.
Also: Building Neural Networks with PyTorch in Google Colab; The Roadmap of Mathematics for Deep Learning; Top #Python Libraries for Data Science, Data Visualization, Machine Learning; 11 Branches of #MachineLearning and 63 Important Machine Learning Algorithms.
- 2 Coding-free Ways to Extract Content From Websites to Boost Web Traffic - Nov 5, 2020.
There are 2 main coding-free solutions for extracting content from websites to build your content base: use web scraping tools and use content aggregation tools. We review top choices.
- How to Build a Football Dataset with Web Scraping - Nov 5, 2020.
This article covers using Selenium to scrape JavaScript-rendered content.
- Top 5 Free Machine Learning and Deep Learning eBooks Everyone should read - Nov 5, 2020.
There is always so much new to learn in machine learning, and keeping well grounded in the fundamentals will help you stay up-to-date with the latest advancements while acing your career in Data Science.
- How to deploy PyTorch Lightning models to production - Nov 5, 2020.
A complete guide to serving PyTorch Lightning models at scale.
- Interpretability, Explainability, and Machine Learning – What Data Scientists Need to Know - Nov 4, 2020.
The terms “interpretability,” “explainability” and “black box” are tossed about a lot in the context of machine learning, but what do they really mean, and why do they matter?
- The Best Data Science Certification You’ve Never Heard Of - Nov 4, 2020.
The CDMP is the best data strategy certification you’ve never heard of. (And honestly, when you consider the fact that you’re probably working a job that didn’t exist ten years ago, it’s not surprising that this certification isn’t widespread just yet.)
- Building Deep Learning Projects with fastai — From Model Training to Deployment - Nov 4, 2020.
A getting-started guide to developing computer vision applications with fastai.
- Top Stories, Oct 26 – Nov 1: How to become a Data Scientist: a step-by-step guide; PerceptiLabs — A GUI and Visual API for TensorFlow - Nov 3, 2020.
Also: Ain't No Such a Thing as a Citizen Data Scientist; Building Neural Networks with PyTorch in Google Colab.
- 10 Principles of Practical Statistical Reasoning - Nov 3, 2020.
Practical Statistical Reasoning is a term that covers the nature and objective of applied statistics/data science, principles common to all applications, and practical steps/questions for better conclusions. The following principles have helped me become more efficient with my analyses and clearer in my conclusions.
- When good data analyses fail to deliver the results you expect - Nov 3, 2020.
To all those Data Scientists out there who thrive on discovering actionable insights from your data (all of you, right?), take heed from this cautionary tale of a data analysis, a dashboard, and a huge waste of resources.
- Topic Modeling with BERT - Nov 3, 2020.
Leveraging BERT and TF-IDF to create easily interpretable topics.
- Data scientist or machine learning engineer? Which is a better career option? - Nov 2, 2020.
In order to build automated data processing systems, we require professionals like Machine Learning Engineers and Data Scientists. But which of these is a better career option right now? Read on to find out.
- Microsoft and Google Open Sourced These Frameworks Based on Their Work Scaling Deep Learning Training - Nov 2, 2020.
Google and Microsoft have recently released new frameworks for distributed deep learning training.
- The Missing Teams For Data Scientists - Nov 2, 2020.
Still today, too large a percentage of data science projects fail, and many of these failures can be attributed to how hard the absence of supporting data teams hits the data science team. Advocating for the missing data engineering and operations components of your team will make your professional life easier and more productive.
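The Nov 25 entry on integrals ties the area under the curve to comparing models with ROC curves. As a rough illustration, and not the article's own code, the area under a ROC curve can be approximated with the trapezoidal rule. This sums the trapezoids between consecutive (false positive rate, true positive rate) points. The sketch assumes the ROC points are already computed and sorted by false positive rate. The function name `trapezoid_auc` and both sets of points are hypothetical.

```python
# Sketch: approximate the area under a ROC curve with the trapezoidal rule.
# Assumes (FPR, TPR) points are precomputed and sorted by FPR.
# The function name and all ROC points below are hypothetical.


def trapezoid_auc(fpr, tpr):
    """Approximate the integral of TPR with respect to FPR."""
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]          # step along the x-axis (FPR)
        height = (tpr[i] + tpr[i - 1]) / 2   # average of the two y-values (TPR)
        area += width * height               # area of one trapezoid
    return area


# Hypothetical ROC points for two models being compared
model_a = ([0.0, 0.1, 0.3, 1.0], [0.0, 0.6, 0.9, 1.0])
model_b = ([0.0, 0.2, 0.5, 1.0], [0.0, 0.4, 0.7, 1.0])

auc_a = trapezoid_auc(*model_a)
auc_b = trapezoid_auc(*model_b)
print(f"AUC A: {auc_a:.3f}, AUC B: {auc_b:.3f}")
```

With these made-up points, model A's curve encloses more area than model B's, so A would be preferred by this criterion. A diagonal curve from (0, 0) to (1, 1) gives 0.5, the area expected from random guessing.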
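The Nov 17 entry on hyperparameter optimization contrasts two styles of search. In grid and random search, every candidate is chosen up front. In informed search, each iteration learns from the last. The toy sketch below illustrates that contrast on a hypothetical `validation_loss` function that stands in for training and scoring a real model. Its `informed_search` is a simple shrinking local search, chosen for brevity. It is not Bayesian optimization or any specific AutoML method from the article, and all names and ranges are assumptions.

```python
import itertools
import random


# Hypothetical validation-loss surface over two hyperparameters;
# a stand-in for actually training and scoring a model.
def validation_loss(learning_rate, depth):
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def grid_search(lr_values, depth_values):
    # Every combination is fixed up front; evaluate all, then pick the best.
    candidates = itertools.product(lr_values, depth_values)
    return min(candidates, key=lambda c: validation_loss(*c))


def random_search(n_trials, seed=0):
    # Independent random configurations, also chosen up front.
    rng = random.Random(seed)
    candidates = [(rng.uniform(0.001, 0.5), rng.randint(2, 12))
                  for _ in range(n_trials)]
    return min(candidates, key=lambda c: validation_loss(*c))


def informed_search(n_rounds, start=(0.25, 4), seed=0):
    # Each round samples near the current best, so later trials use
    # what earlier trials found (a simple local search, for illustration).
    rng = random.Random(seed)
    best, best_loss = start, validation_loss(*start)
    step = 0.1
    for _ in range(n_rounds):
        lr = min(max(best[0] + rng.uniform(-step, step), 0.001), 0.5)
        depth = min(max(best[1] + rng.choice([-1, 0, 1]), 2), 12)
        loss = validation_loss(lr, depth)
        if loss < best_loss:
            best, best_loss = (lr, depth), loss
        step *= 0.95  # narrow the neighborhood as the search converges
    return best


best_grid = grid_search([0.01, 0.1, 0.3], [2, 4, 6, 8])
best_random = random_search(n_trials=20)
best_informed = informed_search(n_rounds=50)
for name, cfg in [("grid", best_grid), ("random", best_random),
                  ("informed", best_informed)]:
    print(name, cfg, round(validation_loss(*cfg), 4))
```

The structural difference is the point here. `grid_search` and `random_search` build their whole candidate list before scoring anything. `informed_search` decides each new trial from the best result so far.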
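The Nov 11 entry on KNN distance metrics notes that KNN calculates distances using a metric chosen from a list of options. The minimal pure-Python sketch below assumes three common metrics: Euclidean, Manhattan, and Minkowski. It uses a hypothetical `knn_predict` helper that takes the metric as a pluggable argument. This is not the article's code, and it ignores feature scaling and tie-breaking.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Straight-line distance: square root of summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance)
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    # Generalizes both: p=1 gives Manhattan, p=2 gives Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(train_X, train_y, query, k=3, metric=euclidean):
    # Rank training points by distance to the query under the chosen metric,
    # then take a majority vote over the labels of the k nearest.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: metric(pair[0], query))
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5), metric=manhattan))
```

Because the metric is just a function argument, swapping Euclidean for Manhattan or Minkowski changes which neighbors count as "nearest" without touching the voting logic.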