- CSV Files for Storage? No Thanks. There’s a Better Option, by Dario Radečić - Aug 31, 2021.
Saving data to CSVs is costing you both money and disk space. It’s time to end it.
- Multilabel Document Categorization, step by step example, by Saurabh Sharma - Aug 31, 2021.
This detailed guide explores a two-stage unsupervised and supervised learning approach with LDA and BERT to develop a domain-specific document categorizer on unlabeled documents.
- A Python Data Processing Script Template, by Matthew Mayo - Aug 31, 2021.
Here's a skeleton general purpose template for getting a Python command line script fleshed out as quickly as possible.
- Introducing Packed BERT for 2x Training Speed-up in Natural Language Processing, by Krell & Kosec - Aug 30, 2021.
Check out this new BERT packing algorithm for more efficient training.
- Data Science Project Infrastructure: How To Create It, by Nate Rosidi - Aug 30, 2021.
The intention for most data science projects is to build something that people use. Creating something purposeful requires a solid infrastructure and processes that keep problem-solving front-and-center for your audience.
- 3 Data Acquisition, Annotation, and Augmentation Tools, by Matthew Mayo - Aug 27, 2021.
Check out these 3 projects found around GitHub that can help with your data acquisition, annotation, and augmentation tasks.
- 11 Best Data Science Education Platforms, by Zulie Rane - Aug 26, 2021.
We cover the 11 best data science education platforms for 11 different use cases, ranging from specific languages to hands-on learners, to the best free option.
- 15 Python Snippets to Optimize your Data Science Pipeline, by Lucas Soares - Aug 25, 2021.
Quick Python solutions to help your data science cycle.
- How to Engineer Date Features in Python, by Matthew Mayo - Aug 25, 2021.
This article discusses and demonstrates how to quickly engineer some common date features using Python.
- Learning Data Science and Machine Learning: First Steps After The Roadmap, by Harshit Tyagi - Aug 24, 2021.
Just getting into learning data science may seem as daunting as (if not more than) trying to land your first job in the field. With so many options and resources online and in traditional academia to consider, these pre-requisites and pre-work are recommended before diving deep into data science and AI/ML.
- Automate Microsoft Excel and Word Using Python, by Mohammad Khorasani - Aug 24, 2021.
Integrate Excel with Word to generate automated reports seamlessly.
- How to Select an Initial Model for your Data Science Problem, by Zachary Warnes - Aug 20, 2021.
Save yourself some time and headaches and start simple.
- Enhancing Machine Learning Personalization through Variety, by Raghavan Kirthivasan - Aug 19, 2021.
Personalization drives growth and is a touchstone of good customer experience. Personalization driven through machine learning can enable companies to improve this experience while improving ROI for marketing campaigns. However, these techniques face challenges in determining when personalization makes sense, and how and when specific options are recommended.
- When Correlation is Better than Causation, by Brittany Davis - Aug 18, 2021.
Identifying causality in an analysis isn't always practical. We show a heuristic approach for using correlations to inform decisions.
- Open Source Datasets for Computer Vision, by Kevin Vu - Aug 18, 2021.
Access to high-quality, noise-free, large-scale datasets is crucial for training complex deep neural network models for computer vision applications. Many open-source datasets are developed for use in image classification, pose estimation, image captioning, autonomous driving, and object segmentation. These datasets must be paired with the appropriate hardware and benchmarking strategies to optimize performance.
- Data Scientist’s Guide to Efficient Coding in Python, by Dr. Varshita Sher - Aug 18, 2021.
Read this fantastic collection of tips and tricks the author uses for writing clean code on a day-to-day basis.
- Linear Algebra for Natural Language Processing, by Taaniya Arora - Aug 17, 2021.
Learn about representing word semantics in vector space.
- Model Drift in Machine Learning – How To Handle It In Big Data, by Sai Geetha - Aug 17, 2021.
Rendezvous Architecture helps you run and choose outputs from a Champion model and many Challenger models running in parallel without much overhead. The original approach works well for smaller data sets, so how can this idea adapt to big data pipelines?
- Prefect: How to Write and Schedule Your First ETL Pipeline with Python, by Dario Radečić - Aug 16, 2021.
Workflow management systems made easy — both locally and in the cloud.
- Agile Data Labeling: What it is and why you need it, by Jennifer Prendki - Aug 16, 2021.
The notion of Agile in software development has made waves across industries with its revolution for productivity. Can the same benefits be applied to the often arduous task of annotating data sets for machine learning?
- Writing Your First Distributed Python Application with Ray, by Michael Galarnyk - Aug 16, 2021.
Using Ray, you can take Python code that runs sequentially and transform it into a distributed application with minimal code changes. Read on to find out why you should use Ray, and how to get started.
- How to Train a BERT Model From Scratch, by James Briggs - Aug 13, 2021.
Meet BERT’s Italian cousin, FiliBERTo.
- Querying the Most Granular Demographics Dataset, by Matti Grotheer - Aug 13, 2021.
Having access to broad and detailed population data can potentially offer enormous value to any organization looking to interact with specific demographics. However, access alone is not sufficient without being able to leverage advanced techniques to explore and visualize the data.
- Introduction to Statistical Learning Second Edition, by Matthew Mayo - Aug 13, 2021.
The second edition of the classic "An Introduction to Statistical Learning, with Applications in R" was published very recently, and is now freely available as a PDF on the book's website.
- MLOps And Machine Learning Roadmap, by Ben Rogojan - Aug 12, 2021.
A 16–20 week roadmap to review machine learning and learn MLOps.
- How to Detect and Overcome Model Drift in MLOps, by Bhaskar Ammu - Aug 12, 2021.
This article takes a look at model drift, and how to detect and overcome it in production MLOps.
- DeepMind’s New Super Model: Perceiver IO is a Transformer that can Handle Any Dataset, by Jesus Rodriguez - Aug 11, 2021.
The new transformer-based architecture can process audio, video and images using a single model.
- Practising SQL without your own database, by Hui Xiang Chua - Aug 10, 2021.
SQL is a very important skill for data analysts and data scientists. However, when you are just starting out learning in the field, how can you practice querying with SQL if you don’t have any data stored in a database?
- Visualizing Bias-Variance, by Theodore Tsitsimis - Aug 10, 2021.
In this article, we'll explore some different perspectives of what the bias-variance trade-off really means with the help of visualizations.
- 5 Tips for Writing Clean R Code, by Marcin Dubel - Aug 9, 2021.
This article summarizes the most common mistakes to avoid and outlines best practices to follow in programming in general. Follow these tips to speed up the code review iteration process and be a rockstar developer in your reviewer’s eyes!
- How to Query Your Pandas Dataframe, by Matthew Przybyla - Aug 9, 2021.
A Data Scientist’s perspective on SQL-like Python functions.
- Using Twitter to Understand Pizza Delivery Apprehension During COVID, by Arimitra Maiti - Aug 6, 2021.
Analyzing customer sentiment and capturing specific differences in emotion around ordering Domino’s pizza in India during lockdown.
- Bootstrap a Modern Data Stack in 5 minutes with Terraform, by Tuan Nguyen - Aug 6, 2021.
What is a Modern Data Stack and how do you deploy one? This guide will motivate you to start on this journey with setup instructions for Airbyte, BigQuery, dbt, Metabase, and everything else you need using Terraform.
- Essential Math for Data Science: Introduction to Systems of Linear Equations, by Hadrien Jean - Aug 6, 2021.
In this post, you’ll see how you can use systems of equations and linear algebra to solve a linear regression problem.
- Be Wary of Automated Feature Selection — Chi Square Test of Independence Example, by Venkat Raman - Aug 5, 2021.
When data scientists use the chi-square test for feature selection, they merely go by the ritualistic “If your p-value is low, the null hypothesis must go”. The automated function they use behaves no differently.
- Most Common Data Science Interview Questions and Answers, by Nate Rosidi - Aug 5, 2021.
Based on an analysis of 900+ data science interview questions from companies over the past few years, this guide reviews the most common data science interview question categories, each explained with an example.
- How To Become A Freelance Data Scientist – 4 Practical Tips, by Pau Labarta Bajo - Aug 4, 2021.
If you are a nerd-ish data scientist who wants to start working as an independent (remote) freelance data scientist, then these four practical tips can help you transition from the traditional 9-to-5 job to a dynamic experience as a remote contractor, just as the author did three years ago.
- How DeepMind Trains Agents to Play Any Game Without Intervention, by Jesus Rodriguez - Aug 4, 2021.
A new paper proposes a new architecture and training environment for generally capable agents.
- Mastering Clustering with a Segmentation Problem, by Indraneel Dutta Baruah - Aug 3, 2021.
The one-stop shop for implementing the most widely used models in Python for unsupervised clustering.
- 30 Most Asked Machine Learning Questions Answered, by Abhay Parashar - Aug 3, 2021.
There is always a lot to learn in machine learning. Whether you are new to the field or a seasoned practitioner and ready for a refresher, understanding these key concepts will keep your skills honed in the right direction.
- GPU-Powered Data Science (NOT Deep Learning) with RAPIDS, by Tirthajyoti Sarkar - Aug 2, 2021.
How to utilize the power of your GPU for regular data science and machine learning even if you do not do a lot of deep learning work.
- 3 Reasons Why You Should Use Linear Regression Models Instead of Neural Networks, by Terence Shin - Aug 2, 2021.
While there may always seem to be something new, cool, and shiny in the field of AI/ML, classic statistical methods that leverage machine learning techniques remain powerful and practical for solving many real-world business problems.
- Development & Testing of ETL Pipelines for AWS Locally, by Subhash Sreenivasachar - Aug 2, 2021.
Typically, development and testing of ETL pipelines is done on real environments or clusters, which are time-consuming to set up and require maintenance. This article focuses on the development and testing of ETL pipelines locally with the help of Docker and LocalStack. The solution gives flexibility to test in a local environment without setting up any services on the cloud.