- Multivariate Time Series Analysis with an LSTM based RNN, by Kathrin Melcher - Oct 29, 2021.
Check out this codeless solution using the Keras integration.
- ETL and ELT: A Guide and Market Analysis, by Louise de Leyritz - Oct 29, 2021.
ETL and related techniques remain a powerful and foundational tool in the data industry. We explain what ETL is and how ETL and ELT processes have evolved over the years, with a close eye toward how third-generation ETL tools are about to disrupt standard data processing practices.
- Simple Text Scraping, Parsing, and Processing with this Python Library, by Matthew Mayo - Oct 29, 2021.
Scraping, parsing, and processing text data from the web can be difficult. But it can also be easy, using Newspaper3k.
- What Google Recommends You do Before Taking Their Machine Learning or Data Science Course, by Harshit Tyagi - Oct 28, 2021.
The first step in learning data science and machine learning is to build the foundations.
- Want to Join a Bank? Everything Data Scientists Need to Know About Working in Fintech, by Shameek Kundu - Oct 28, 2021.
There is ample opportunity for data scientists in the financial services sector. The career experience can be very different, however, from similar roles at pure technology organizations. So, it's best to first consider if this industry is right for your interests, preferences for how you work, and long-term goals.
- Analyze Python Code in Jupyter Notebooks, by Julien Delange - Oct 28, 2021.
We present a new tool that integrates modern code analysis techniques with Jupyter notebooks and helps developers find bugs as they write code.
- A Guide to 14 Different Data Science Jobs, by Nate Rosidi - Oct 27, 2021.
The field of data science is growing into one that features a variety of job titles. This guide reviews different positions available for you to consider if you have a data science background.
- Machine Learning Model Development and Model Operations: Principles and Practices, by Suresh Yaram - Oct 27, 2021.
Managing ML models and delivering high-performing models is as important as the initial model build and the choice of the right dataset. The concepts of model retraining, model versioning, model deployment, and model monitoring form the basis of machine learning operations (MLOps), which helps data science teams deliver high-performing models.
- Getting Started with PyTorch Lightning, by Kevin Vu - Oct 26, 2021.
As a library designed for production research, PyTorch Lightning also streamlines hardware support and distributed training. Toward the end, we’ll show how easy it is to move training to a GPU.
- Four Basic Steps in Data Preparation, by Rosaria Silipo - Oct 26, 2021.
What we would like to do here is introduce four very basic and very general steps in data preparation for machine learning algorithms. We will describe how and why to apply such transformations within a specific example.
- Guide To Finding The Right Predictive Maintenance Machine Learning Techniques, by Maruti Techlabs - Oct 25, 2021.
What happens to a life so dependent on machines, when that particular machine breaks down? This is precisely why there’s a dire need for predictive maintenance with machine learning.
- Learn To Reproduce Papers: Beginner’s Guide, by Olga Chernytska - Oct 25, 2021.
Step-by-step instructions on how to understand Deep Learning papers and implement the described approaches.
- Deploying Serverless spaCy Transformer Model with AWS Lambda, by Walid Amamou - Oct 22, 2021.
A step-by-step guide on how to deploy a serverless NER transformer model.
- Introduction to AutoEncoder and Variational AutoEncoder (VAE), by Nagesh Chauhan - Oct 22, 2021.
Autoencoders and their variants are interesting and powerful artificial neural networks used in unsupervised learning scenarios. Learn how the different autoencoder approaches work and how to implement them with Keras on the MNIST digits data set.
- Find the Best-Matching Distribution for Your Data Effortlessly, by Tirthajyoti Sarkar - Oct 22, 2021.
How to find the best-matching statistical distributions for your data points in an automated and easy way, and then how to extend the utility further.
- Training BPE, WordPiece, and Unigram Tokenizers from Scratch using Hugging Face, by Harshit Tyagi - Oct 21, 2021.
Comparing the tokens generated by SOTA tokenization algorithms using Hugging Face's tokenizers package.
- Data Preparation in R using dplyr, with Cheat Sheet!, by Stan Pugsley - Oct 20, 2021.
Leverage the powerful data wrangling tools in R’s dplyr to clean and prepare your data.
- Data Science Portfolio Project Ideas That Can Get You Hired (Or Not), by Nate Rosidi - Oct 20, 2021.
Choosing what to include in your data science portfolio during the job search is the most important part of the process. Each project should be well-structured so that a hiring manager can assess your skills quickly. To help you get started, we highlight a few data science project ideas that you should consider for your portfolio.
- 11 Most Practical Data Science Skills for 2022, by Terence Shin - Oct 19, 2021.
While the field of data science continues to evolve with exciting new progress in analytical approaches and machine learning, there remains a core set of skills that are foundational for all general practitioners and specialists, especially those who want to be employable with full-stack capabilities.
- How to Create an Interactive Dashboard in Three Steps with KNIME Analytics Platform, by Emilio Silvestri - Oct 19, 2021.
In this blog post I will show you how to build a simple, but useful and good-looking dashboard to present your data - in three simple steps!
- Real Time Image Segmentation Using 5 Lines of Code, by Ayoola Olafenwa - Oct 18, 2021.
PixelLib is a library created to allow easy integration of object segmentation in images and videos using a few lines of Python code. PixelLib now supports a PyTorch backend to perform faster, more accurate segmentation and extraction of objects in images and videos using the PointRend segmentation architecture.
- Serving ML Models in Production: Common Patterns, by Mo, Oakes & Galarnyk - Oct 18, 2021.
Over the past couple of years, we've seen four common patterns of machine learning in production: pipeline, ensemble, business logic, and online learning. In the ML serving space, implementing these patterns typically involves a tradeoff between ease of development and production readiness. Ray Serve was built to support these patterns by being both easy to develop and production ready.
- New Computing Paradigm for AI: Processing-in-Memory (PIM) Architecture, by Nam Sung Kim - Oct 15, 2021.
As larger deep neural networks are trained on the latest and fastest chip technologies, an important performance bottleneck remains, and it is not compute power. However fast you compute a DNN, the data still has to move. Data pipelines on the chip are expensive, and new solutions must be developed to advance capabilities.
- How to calculate confidence intervals for performance metrics in Machine Learning using an automatic bootstrap method, by David B Rosen (PhD) - Oct 15, 2021.
Are your model performance measurements very precise due to a “large” test set, or very uncertain due to a “small” or imbalanced test set?
- Deploying Your First Machine Learning API, by Abid Ali Awan - Oct 14, 2021.
An effortless way to develop and deploy your machine learning API using FastAPI and Deta.
- The 20 Python Packages You Need For Machine Learning and Data Science, by Sandro Luck - Oct 14, 2021.
Do you do Python? Do you do data science and machine learning? Then, you need to do these crucial Python libraries that enable nearly all you will want to do.
- What is Clustering and How Does it Work?, by Satoru Hayasaka - Oct 14, 2021.
Let us examine how clusters with different properties are produced by different clustering algorithms. In particular, we give an overview of three clustering methods: k-Means clustering, hierarchical clustering, and DBSCAN.
- How to Ace Data Science Interview by Working on Portfolio Projects, by Abid Ali Awan - Oct 13, 2021.
Recruiters of Data Science professionals around the world focus on portfolio projects rather than resumes and LinkedIn profiles. So, learning early how to contribute and share your work on GitHub, Deepnote, and Kaggle can help you perform your best during data science interviews.
- Building Multimodal Models: Using the widedeep Pytorch package, by Rajiv Shah - Oct 13, 2021.
This article gets you started on the open-source widedeep PyTorch framework developed by Javier Rodriguez Zaurin.
- Create Synthetic Time-series with Anomaly Signatures in Python, by Tirthajyoti Sarkar - Oct 12, 2021.
A simple and intuitive way to create synthetic (artificial) time-series data with customized anomalies — particularly suited to industrial applications.
- Step by Step Building a Vacancy Tracker Using Tableau, by Dotun Opasina - Oct 12, 2021.
Step-by-step explanations of vacancies valued in tens of millions of dollars in the small town of Fitchburg, Massachusetts.
- AutoML: An Introduction Using Auto-Sklearn and Auto-PyTorch, by Kevin Vu - Oct 11, 2021.
AutoML is a broad category of techniques and tools for applying automated search to your automated search and learning to your learning. In addition to Auto-Sklearn, the Freiburg-Hannover AutoML group has also developed an Auto-PyTorch library. We’ll use both of these as our entry point into AutoML in the following simple tutorial.
- Scaling human oversight of AI systems for difficult tasks – OpenAI approach, by OpenAI - Oct 11, 2021.
The foundational idea of Artificial Intelligence is that it should demonstrate human-level intelligence. So, unless a model can perform as a human might do, its intended purpose is missed. Here, recent OpenAI research into full-length book summarization focuses on generating results that make sense to humans with state-of-the-art results that leverage scalable AI-enhanced human-in-the-loop feedback.
- 8 Must-Have Git Commands for Data Scientists, by Soner Yildirim - Oct 8, 2021.
Git is a must-have skill for data scientists. Maintaining your development work within a version control system is absolutely necessary to have a collaborative and productive working environment with your colleagues. This guide will quickly start you off in the right direction for contributing to an existing project at your organization.
- Dealing with Data Leakage, by Susan Currie Sivek, Ph.D. - Oct 8, 2021.
Target leakage and data leakage represent challenging problems in machine learning. Be prepared to recognize and avoid these potentially messy problems.
- The Evolution of Tokenization – Byte Pair Encoding in NLP, by Harshit Tyagi - Oct 7, 2021.
Though we have SOTA algorithms for tokenization, it's always good practice to understand the evolution trail and learn how we got here. Read this introduction to Byte Pair Encoding.
- How to do “Limitless” Math in Python, by Tirthajyoti Sarkar - Oct 7, 2021.
How to perform arbitrary-precision computation and much more math (and fast too) than what is possible with the built-in math library in Python.
- Four Different Pipes for R with magrittr, by Gregory Janesch - Oct 6, 2021.
The magrittr package supplies the pipe operator (%>%), but it turns out that the package actually contains four pipe operators in total. Let's go into them a bit.
- 38 Free Courses on Coursera for Data Science, by Aqsa Zafar - Oct 6, 2021.
There are so many online resources for learning data science, and a great many of them can be used at no cost. This collection of free courses hosted by Coursera will help you enhance your data science and machine learning skills, no matter your current level of expertise.
- My AI Plays Piano for Me, by Kathrin Melcher - Oct 6, 2021.
Training an RNN with a Combined Loss Function.
- Data science SQL interview questions from top tech firms, by Nate Rosidi - Oct 5, 2021.
As a data scientist, there is one thing you really need to understand and know how to handle: data. With SQL being a foundational technical approach for working with data, it should not be surprising that the top tech companies will ask about your SQL skills during an interview. Here, we cover the key concepts tested so you can best prepare for your next data science interview.
- The Architecture Behind DeepMind’s Model for Near Real Time Weather Forecasts, by Jesus Rodriguez - Oct 5, 2021.
Deep Generative Model of Rain (DGMR) is the newest creation from DeepMind, which can predict precipitation over short-term intervals.
- Parallelizing Python Code, by Borycki & Galarnyk - Oct 4, 2021.
This article reviews some common options for parallelizing Python code, including process-based parallelism, specialized libraries, IPython Parallel, and Ray.
- Introduction to PyTorch Lightning, by Kevin Vu - Oct 4, 2021.
PyTorch Lightning is a high-level programming layer built on top of PyTorch. It makes building and training models faster, easier, and more reliable.
- Teaching AI to Classify Time-series Patterns with Synthetic Data, by Tirthajyoti Sarkar - Oct 1, 2021.
How to build and train an AI model to identify various common anomaly patterns in time-series data.
- How to Auto-Detect the Date/Datetime Columns and Set Their Datatype When Reading a CSV File in Pandas, by David B Rosen (PhD) - Oct 1, 2021.
When read_csv() reads e.g. “2021-03-04” and “2021-03-04 21:37:01.123” as mere “object” datatypes, often you can simply auto-convert them all at once to true datetime datatypes.
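
The auto-conversion described in the last entry can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the linked article: the helper name `convert_datetime_columns`, the sample CSV, and its column names are all hypothetical. The idea is to try `pd.to_datetime` on every text-like column and keep the result only when the whole column parses.

```python
import io
import warnings

import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def convert_datetime_columns(df):
    """Return a copy of df in which every fully parseable text column is a datetime column.

    Hypothetical helper for illustration; not taken from the linked article.
    """
    out = df.copy()
    for col in out.columns:
        # Only text-like columns are candidates; numeric columns are left alone.
        if not (is_object_dtype(out[col]) or is_string_dtype(out[col])):
            continue
        try:
            with warnings.catch_warnings():
                # Silence format-inference warnings raised for non-date text.
                warnings.simplefilter("ignore")
                out[col] = pd.to_datetime(out[col])
        except (ValueError, TypeError):
            # At least one value is not a date: keep the column unchanged.
            pass
    return out


# Made-up sample data: one text column, one numeric column, two date-like columns.
SAMPLE_CSV = """user,score,day,logged_at
alice,0.91,2021-03-04,2021-03-04 21:37:01.123
bob,0.78,2021-03-05,2021-03-05 08:15:42.500
"""

raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
df = convert_datetime_columns(raw)
print(raw.dtypes)
print(df.dtypes)
```

When the date columns are known in advance, passing their names through the `parse_dates` parameter of `read_csv` is the more direct route. The loop above is only useful when they are not known. Note that it may also convert text columns that merely happen to look like dates.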