- 19 Data Science Project Ideas for Beginners, by Zulie Rane - Feb 7, 2022.
This article features 19 data science projects for beginners, categorized into 7 full project tutorials, 5 places to come up with your own data science projects using data, and 7 skills-based data science projects.
Data Science
- Sentiment Analysis API vs Custom Text Classification: Which one to choose?, by Jérémy Lambert - Nov 30, 2021.
In this article, we compare the sentiment extraction performance of Sentiment Analysis engines and Custom Text Classification engines. The idea is to show the pros and cons of these two types of engines on a concrete dataset.
API, Sentiment Analysis, Text Classification
- Clustering in Crowdsourcing: Methodology and Applications, by Daniil Likhobaba - Nov 30, 2021.
As a result of the efforts outlined in this article, we confirmed that clustering through crowdsourcing is indeed possible and works impressively well.
Clustering, Crowdsourcing, Data Science, Toloka
- Building Massively Scalable Machine Learning Pipelines with Microsoft Synapse ML, by Jesus Rodriguez - Nov 30, 2021.
The new platform provides a single API to abstract dozens of ML frameworks and databases.
Machine Learning, Microsoft, Pipeline, Scalability
- Sentiment Analysis with KNIME, by Thiel & Rudnitckaia - Nov 29, 2021.
Check out this tutorial on how to approach sentiment classification with supervised machine learning algorithms.
Knime, NLP, Sentiment Analysis, Text Analytics
- How to Build a Knowledge Graph with Neo4J and Transformers, by Walid Amamou - Nov 26, 2021.
Learn to use custom Named Entity Recognition and Relation Extraction models.
Knowledge Graph, Neo4j, Transformer
- PyCaret 2.3.5 Is Here! Learn What’s New, by Moez Ali - Nov 26, 2021.
Read about the new functionalities added in PyCaret’s recent release.
Open Source, PyCaret, Python
- A Spreadsheet that Generates Python: The Mito JupyterLab Extension, by Roman Orac - Nov 25, 2021.
You can call Mito into your Jupyter Environment and each edit you make will generate the equivalent Python in the code cell below.
Jupyter, Programming, Python, Spreadsheet
- Top 4 Data Integration Tools for Modern Enterprises, by Ammar Ali - Nov 24, 2021.
Maintaining a centralized data repository can simplify your business intelligence initiatives. Here are four data integration tools that can make data more valuable for modern enterprises.
Data Analytics, Data Integration, Data Preparation
- Common Misconceptions About Differential Privacy, by Lipika Ramaswamy - Nov 24, 2021.
This article will clarify some common misconceptions about differential privacy and what it guarantees.
Data Science, Differential Privacy, Machine Learning, Privacy
- Most Common SQL Mistakes on Data Science Interviews, by Nate Rosidi - Nov 23, 2021.
Sure, we all make mistakes -- which can be a bit more painful when we are trying to get hired -- so check out these typical errors applicants make while answering SQL questions during data science interviews.
Interview Questions, Mistakes, SQL
- 5 Advanced Tips on Python Sequences, by Michael Berk - Nov 23, 2021.
Notes from Fluent Python by Luciano Ramalho.
Programming, Python
- On-Device Deep Learning: PyTorch Mobile and TensorFlow Lite, by Dhruv Matani - Nov 22, 2021.
PyTorch and TensorFlow are the two leading AI/ML Frameworks. In this article, we take a look at their on-device counterparts PyTorch Mobile and TensorFlow Lite and examine them more deeply from the perspective of someone who wishes to develop and deploy models for use on mobile platforms.
Deep Learning, Mobile, PyTorch, TensorFlow
- Dask DataFrame is not Pandas, by Hugo Shi - Nov 22, 2021.
This is the second article in an ongoing series on using Dask in practice. Each article in the series will be simple enough for beginners, but provide useful tips for real work. The next article in the series is about parallelizing for loops and other embarrassingly parallel operations with dask.delayed.
Dask, Pandas, Python, Saturn Cloud
- 3 Differences Between Coding in Data Science and Machine Learning, by Nahla Davies - Nov 19, 2021.
The terms ‘data science’ and ‘machine learning’ are often used interchangeably. But while they are related, there are some glaring differences, so let’s take a look at the differences between the two disciplines, specifically as they relate to programming.
Data Science, Machine Learning, Programming
- Difference between distributed learning versus federated learning algorithms, by Aishwarya Srinivasan - Nov 19, 2021.
Want to know the difference between distributed and federated learning? Read this article to find out.
Algorithms, Distributed Systems, Federated Learning
- Build a Serverless News Data Pipeline using ML on AWS Cloud, by Maria Zentsov - Nov 18, 2021.
This is a guide on how to build a serverless data pipeline on AWS with a Machine Learning model deployed as a SageMaker endpoint.
AWS, NLP, Pipeline, Python, Sagemaker, Text Summarization
- Easy Synthetic Data in Python with Faker, by Matthew Mayo - Nov 17, 2021.
Faker is a Python library that generates fake data to supplement or take the place of real world data. See how it can be used for data science.
Data Science, Python, Synthetic Data
- Inside recommendations: how a recommender system recommends, by Sciforce - Nov 17, 2021.
We describe types of recommender systems, more specifically, algorithms and methods for content-based systems, collaborative filtering, and hybrid systems.
Recommendation Engine, Recommender Systems
- Book Metadata and Cover Retrieval Using OCR and Google Books API, by Cadili & Rudnitckaia - Nov 17, 2021.
With KNIME, extracting critical pieces of information from images becomes as easy as ABC.
API, Google, Knime, Low-Code
- Virtual Presentation Tips for Data Scientists, by Michael Berk - Nov 16, 2021.
Learn how to effectively communicate your work.
Career Advice, Data Science, Data Scientist, Presentation, Visualization
- 10 AI Project Ideas in Computer Vision, by Manika Nagpal - Nov 16, 2021.
The field of computer vision has seen the development of very powerful applications leveraging machine learning. These projects will introduce you to these techniques and guide you to more advanced practice to gain a deeper appreciation for the sophistication now available.
AI, Computer Vision, Project
- Two Simple Things You Need to Steal from Agile for Data and Analytics Work, by Jon Loyens - Nov 16, 2021.
Peer Review and Definition of Done: small changes, BIG impact.
Agile, Analytics, Data Science, Data.world
- What Are NVIDIA NGC Containers & How to Get Started Using Them, by Kevin Vu - Nov 15, 2021.
NVIDIA, a pioneer in GPU technologies and the deep learning revolution, has come up with an excellent catalog of specialized containers that it calls NGC Collections. In this article, we explore their basic usage and some variations.
Containers, Data Engineering, Deep Learning, NVIDIA
- How I Redesigned over 100 ETL into ELT Data Pipelines, by Nicholas Leong - Nov 15, 2021.
Learn how to level up your Data Pipelines!
ELT, ETL, Pipeline, SQL
- Deep Learning on your phone: PyTorch C++ API for use on Mobile Platforms, by Dhruv Matani - Nov 12, 2021.
The PyTorch Deep Learning framework has a C++ API for use on mobile platforms. This article shows an end-to-end demo of how to write a simple C++ application with Deep Learning capabilities using the PyTorch C++ API such that the same code can be built for use on mobile platforms (both Android and iOS).
C++, Deep Learning, Mobile, Python, PyTorch
- 25 Github Repositories Every Python Developer Should Know, by Abhay Parashar - Nov 12, 2021.
Check out these repositories to help you improve your data science skills.
GitHub, Programming, Python
- Dream Come True: Building websites by thinking about them, by Ajay, Agarwal & Nema - Nov 11, 2021.
From the mind to the computer, make websites using your imagination!
Brain, Deep Learning, Hackathon, Machine Learning, NLP
- The Ultimate Guide To Different Word Embedding Techniques In NLP, by Neeraj Agarwal - Nov 10, 2021.
A machine can only understand numbers. As a result, converting text to numbers, called embedding text, is an actively researched topic. In this article, we review different word embedding techniques for converting text into vectors.
BERT, NLP, TF-IDF, Word Embeddings
- OpenAI’s Approach to Solve Math Word Problems, by Jesus Rodriguez - Nov 9, 2021.
OpenAI's latest research aims to solve math word problems. Let's dive a bit deeper into the ideas behind this new research.
GPT-3, Mathematics, NLP, OpenAI
- What Comes After HDF5? Seeking a Data Storage Format for Deep Learning, by Davit Buniatyan - Nov 9, 2021.
HDF5 is one of the most popular and reliable formats for non-tabular, numerical data, but it is not optimized for deep learning work. This article suggests what an ML-native data format should look like to truly serve the needs of modern data scientists.
Data Management, Deep Learning, Python
- POS Tagging, Explained, by Filiberto Emanuele - Nov 8, 2021.
Learn about the strengths of part-of-speech tagging, and about how a strong POS tagger can contribute to natural language understanding.
NLP, NLU, Speech
- 7 Top Open Source Datasets to Train Natural Language Processing (NLP) & Text Models, by Kevin Vu - Nov 8, 2021.
With a lot of excitement and research around NLP, there are growing opportunities to apply these technologies to real-world scenarios. It's not trivial to become familiar with NLP, and these open-source datasets can help you increase your skills.
Dataset, NLP, Open Source
- AI Infinite Training & Maintaining Loop, by Roey Mechrez - Nov 4, 2021.
Productizing AI is an infrastructure orchestration problem. In planning your solution design, you should use continuous monitoring, retraining, and feedback to ensure stability and sustainability.
AI, Deployment, Machine Learning, Production, Training
- NLP for Business in the Time of BERTera: Seven Misplaced Passions, by Anand Ramanathan - Nov 4, 2021.
This article is a brief summary of our observations on some common client misperceptions with respect to recent developments in NLP, especially the use of large-scale models and datasets.
BERT, Business, NLP
- Visual Scoring Techniques for Classification Models, by Maarit Widmann - Nov 3, 2021.
Read this article on assessing a model's performance in a broader context.
Classification, Knime, Low-Code, Machine Learning, Metrics, Visualization
- Data Scientist Career Path from Novice to First Job, by Nate Rosidi - Nov 3, 2021.
If you are beginning your data science journey, then you must be prepared to plan it out as a step-by-step process that will guide you from being a total newbie to getting your first job as a data scientist. These tips and educational resources should be useful for you and add confidence as you take that first big step.
Beginners, Career Advice, Data Scientist
- Neural Networks from a Bayesian Perspective, by Zeldes & Naor - Nov 3, 2021.
This article looks at neural networks from a Bayesian perspective.
Bayesian, Neural Networks
- ORDAINED: The Python Project Template, by Bryan Patrick Wood - Nov 2, 2021.
Recently I decided to take the time to better understand the Python packaging ecosystem and create a project boilerplate template as an improvement over copying a directory tree and doing find and replace.
Development, Programming, Project, Python
- Design Patterns for Machine Learning Pipelines, by Davit Buniatyan - Nov 2, 2021.
ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.
Data Preprocessing, ETL, Machine Learning, Pipeline
- Salary Breakdown of the Top Data Science Jobs, by Matthew Przybyla - Nov 2, 2021.
Machine Learning vs NLP vs Data Engineer vs Data Scientist, and what it means to be in each role.
Career Advice, Data Engineer, Data Scientist, Machine Learning Engineer, NLP, Salary
- Advanced PyTorch Lightning with TorchMetrics and Lightning Flash, by Kevin Vu - Nov 1, 2021.
In this tutorial we will be diving deeper into two additional tools you should be using: TorchMetrics and Lightning Flash. TorchMetrics unsurprisingly provides a modular approach to define and track useful metrics across batches and devices, while Lightning Flash offers a suite of functionality facilitating more efficient transfer learning and data handling, and a recipe book of state-of-the-art approaches to typical deep learning problems.
Metrics, Python, PyTorch, PyTorch Lightning, Transfer Learning
- Top 5 Time Series Methods, by Pranay Dave - Nov 1, 2021.
Data that varies in time can offer powerful applications and use cases for data scientists to analyze. This overview considers the top techniques you can learn to understand and gain insight from time-series data.
Forecasting, Seasonality, Time Series