- Top April Stories: The Most In-Demand Skills for Data Scientists in 2021, by Gregory Piatetsky - May 11, 2021.
The Most In-Demand Skills for Data Scientists in 2021; Data Science Books You Should Start Reading in 2021; How to organize your data science project; Shapash: Making Machine Learning Models Understandable.
- Make Connections With SAS Live Web Learning, by SAS - May 11, 2021.
Through a year of uncertainty, the demand for analytics skills and the desire to continue skills development remained consistent. Take this opportunity to join SAS expert instructors and learn the latest skills in a Live Web class.
- Confidence Intervals for XGBoost, by Guillaume Saupin - May 11, 2021.
Read this article about building a regularized Quantile Regression objective.
- Must-have Chrome Extensions For Machine Learning Engineers And Data Scientists, by Himanshu Ragtah - May 11, 2021.
Browser extensions are a productivity secret weapon for hackers and developers. Many machine learning practitioners use Chrome, and this list features must-have Chrome extensions for machine learning engineers and data scientists that you should check out today.
- What Makes AI Trustworthy?, by Ronel Sylvester - May 11, 2021.
This post explains why AI needs to be trustworthy and what makes it so. AI predictions and suggestions should not be taken at face value; they should be examined at a deeper level. To put our trust in an AI system, we need to understand how it makes its predictions. Trust should not be built on prediction accuracy alone.
- Top Stories, May 3-9: Charticulator: Microsoft Research open-sourced a game-changing Data Visualization platform; Data Preparation in SQL, with Cheat Sheet!, by KDnuggets - May 10, 2021.
Also: Rebuilding My 7 Python Projects; Applying Python’s Explode Function to Pandas DataFrames; Data Scientist vs Machine Learning Engineer – what are their skills?; XGBoost Explained: DIY XGBoost Library in Less Than 200 Lines of Python
- Similarity Metrics in NLP, by James Briggs - May 10, 2021.
This post covers the use of Euclidean distance, dot product, and cosine similarity as NLP similarity metrics.
- Essential Linear Algebra for Data Science and Machine Learning, by Benjamin Obi Tayo - May 10, 2021.
Linear algebra is foundational in data science and machine learning. Beginners starting out on their learning journey in data science, as well as established practitioners, must develop a strong familiarity with the essential concepts in linear algebra.
- Ensemble Methods Explained in Plain English: Bagging, by Claudia Ng - May 10, 2021.
Understand the intuition behind bagging with examples in Python.
- Applying Python’s Explode Function to Pandas DataFrames, by Michael Mosesov - May 7, 2021.
Read about an applied Python method for accessing columns by date or year using the Pandas library, lambda expressions, and the functions list(), map(), and explode().
- We Don’t Need Data Engineers, We Need Better Tools for Data Scientists, by Devin Petersohn - May 7, 2021.
Today's data science job landscape spans a variety of roles, from specialized engineering positions to the more generalized data scientist. But could some of these roles be duplicative or misdirected? The Data Engineer, for example, might exist as we know it because of a lack of adequate tooling for Data Scientists.
- Data Preparation in SQL, with Cheat Sheet!, by Stan Pugsley - May 7, 2021.
If your raw data is in a SQL-based data lake, why spend the time and money to export the data into a new platform for data prep?
- A Comprehensive Guide to Ensemble Learning – Exactly What You Need to Know, by Derrick Mwiti - May 6, 2021.
This article covers ensemble learning methods, and exactly what you need to know in order to understand and implement them.
- Feature stores – how to avoid feeling that every day is Groundhog Day, by Monte Zweben - May 6, 2021.
Feature stores stop the duplication of each task in the ML lifecycle. You can reuse features and pipelines for different models, monitor models consistently, and sidestep data leakage with this MLOps technology that everyone is talking about.
- What is Neural Search?, by Pradeep Sharma - May 6, 2021.
Learn what neural search is and how to get started with it, with no prior experience in machine learning.
- Rebuilding My 7 Python Projects, by Kaustubh Gupta - May 5, 2021.
This is how I rebuilt my Python projects: Data Science, Web Development & Android Apps.
- What makes a winning entry in a Machine Learning competition?, by Harald Carlens - May 5, 2021.
So you want to show your grit in a Kaggle-style competition? Many, many others have the same idea, including domain experts and non-experts, and academic and corporate teams. What does it take for your bright ideas and skills to come out on top of thousands of competitors?
- The Machine Learning Research Championed by the Biggest AI Labs in the World, by Jesus Rodriguez - May 5, 2021.
How Google, Microsoft, Facebook, DeepMind, OpenAI, Amazon and IBM think about the future of AI.
- How to get started managing data quality with SQL and scale, by Soda.io - May 4, 2021.
Silent data quality issues are the biggest problem facing today's data teams, which are flying blind with no systems or processes in place to monitor and detect bad data before it has a downstream impact.
- Deploy a Dockerized FastAPI App to Google Cloud Platform, by Krueger & Franklin - May 4, 2021.
A short guide to deploying a Dockerized Python app to Google Cloud Platform using Cloud Run and a SQL instance.
- Disentangling AI, Machine Learning, and Deep Learning, by Kevin Vu - May 4, 2021.
The field of Artificial Intelligence is extremely broad, with a winding history shaped by the evolution of various sub-fields that experienced many ups and downs over the years. Appreciating AI within its historical context will enhance your communication with the public, colleagues, and potential hiring managers, as well as guide your thinking as you progress in the application and study of state-of-the-art techniques.
- A simple static visualization can often be the best approach, by Kai Wong - May 4, 2021.
How I overengineered a worse solution by making an interactive visualization.
- Top Stories, Apr 26 – May 2: Data Scientist vs Machine Learning Engineer – what are their skills?, by KDnuggets - May 3, 2021.
Also: Data Science Books You Should Start Reading in 2021; Data science is not about data – applying Dijkstra principle to data science; How to ace A/B Testing Data Science Interviews; Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1
- Cloud Based Web Scraping for Big Data Applications, by Octoparse - May 3, 2021.
As the need to store and access big data increases, web scraping and web crawling technologies are becoming more and more useful. Today, companies use web scraping technology for myriad reasons. Read on to find the uses of cloud-based web scraping for big data apps.
- How To Generate Meaningful Sentences Using a T5 Transformer, by Vatsal Saglani - May 3, 2021.
Read this article to see how to develop a text generation API using the T5 transformer.
- Charticulator: Microsoft Research open-sourced a game-changing Data Visualization platform, by Josh Taylor - May 3, 2021.
Creating grand charts and graphs from your data analysis is supported by many powerful tools. However, how to make these visualizations meaningful can remain a mystery. To address this challenge, Microsoft Research has quietly open-sourced a game-changing visualization platform.
- XGBoost Explained: DIY XGBoost Library in Less Than 200 Lines of Python, by Guillaume Saupin - May 3, 2021.
Understand how XGBoost works with 200 lines of simple code that implement gradient boosting for decision trees.
- Hilarious Data Science Humor, by Yi Li - May 2, 2021.
Data scientists and developers share a goofy sense of humor. Here are some puns that we, as data scientists and programmers, can definitely relate to.