2021 Nov Tutorials, Overviews

19 Data Science Project Ideas for Beginners

This article features 19 data science projects for beginners, categorized into 7 full project tutorials, 5 places to come up with your own data science projects using data, and 7 skills-based data science projects.

By Zulie Rane on Feb 7, 2022 in Data Science

Sentiment Analysis API vs Custom Text Classification: Which one to choose?

In this article, we compare the sentiment extraction performance of Sentiment Analysis engines and Custom Text Classification engines. The idea is to show the pros and cons of these two types of engines on a concrete dataset.

By Jérémy Lambert on Nov 30, 2021 in API, Sentiment Analysis, Text Classification

Clustering in Crowdsourcing: Methodology and Applications

As a result of the efforts outlined in this article, we confirmed that clustering through crowdsourcing is indeed possible and works impressively well.

By Daniil Likhobaba on Nov 30, 2021 in Clustering, Crowdsourcing, Data Science, Toloka

Building Massively Scalable Machine Learning Pipelines with Microsoft Synapse ML

The new platform provides a single API to abstract dozens of ML frameworks and databases.

By Jesus Rodriguez on Nov 30, 2021 in Machine Learning, Microsoft, Pipeline, Scalability

Sentiment Analysis with KNIME

Check out this tutorial on how to approach sentiment classification with supervised machine learning algorithms.

By Thiel & Rudnitckaia on Nov 29, 2021 in Knime, NLP, Sentiment Analysis, Text Analytics

How to Build a Knowledge Graph with Neo4J and Transformers

Learn to use custom Named Entity Recognition and Relation Extraction models.

By Walid Amamou on Nov 26, 2021 in Knowledge Graph, Neo4j, Transformer

A Spreadsheet that Generates Python: The Mito JupyterLab Extension

You can call Mito into your Jupyter environment, and each edit you make will generate the equivalent Python in the code cell below.

By Roman Orac on Nov 25, 2021 in Jupyter, Programming, Python, Spreadsheet

Most Common SQL Mistakes on Data Science Interviews

Sure, we all make mistakes -- which can be a bit more painful when we are trying to get hired -- so check out these typical errors applicants make while answering SQL questions during data science interviews.

By Nate Rosidi on Nov 23, 2021 in Interview Questions, Mistakes, SQL

5 Advanced Tips on Python Sequences

Notes from Fluent Python by Luciano Ramalho.

By Michael Berk on Nov 23, 2021 in Programming, Python

On-Device Deep Learning: PyTorch Mobile and TensorFlow Lite

PyTorch and TensorFlow are the two leading AI/ML Frameworks. In this article, we take a look at their on-device counterparts PyTorch Mobile and TensorFlow Lite and examine them more deeply from the perspective of someone who wishes to develop and deploy models for use on mobile platforms.

By Dhruv Matani on Nov 22, 2021 in Deep Learning, Mobile, PyTorch, TensorFlow

Dask DataFrame is not Pandas

This article is the second article of an ongoing series on using Dask in practice. Each article in this series will be simple enough for beginners, but provide useful tips for real work. The next article in the series is about parallelizing for loops, and other embarrassingly parallel operations with dask.delayed.

By Hugo Shi on Nov 22, 2021 in Dask, Pandas, Python, Saturn Cloud

3 Differences Between Coding in Data Science and Machine Learning

The terms ‘data science’ and ‘machine learning’ are often used interchangeably. But while they are related, there are some glaring differences, so let’s take a look at the differences between the two disciplines, specifically as they relate to programming.

By Nahla Davies on Nov 19, 2021 in Data Science, Machine Learning, Programming

Difference between distributed learning versus federated learning algorithms

Want to know the difference between distributed and federated learning? Read this article to find out.

By Aishwarya Srinivasan on Nov 19, 2021 in Algorithms, Distributed Systems, Federated Learning

Build a Serverless News Data Pipeline using ML on AWS Cloud

This guide shows how to build a serverless data pipeline on AWS with a Machine Learning model deployed as a SageMaker endpoint.

By Maria Zentsov on Nov 18, 2021 in AWS, NLP, Pipeline, Python, Sagemaker, Text Summarization

Easy Synthetic Data in Python with Faker

Faker is a Python library that generates fake data to supplement or take the place of real world data. See how it can be used for data science.

By Matthew Mayo on Nov 17, 2021 in Data Science, Python, Synthetic Data

Inside recommendations: how a recommender system recommends

We describe types of recommender systems, more specifically, algorithms and methods for content-based systems, collaborative filtering, and hybrid systems.

By Sciforce on Nov 17, 2021 in Recommendation Engine, Recommender Systems

Virtual Presentation Tips for Data Scientists

Learn how to effectively communicate your work.

By Michael Berk on Nov 16, 2021 in Career Advice, Data Science, Data Scientist, Presentation, Visualization

10 AI Project Ideas in Computer Vision

The field of computer vision has seen the development of very powerful applications leveraging machine learning. These projects will introduce you to these techniques and guide you to more advanced practice to gain a deeper appreciation for the sophistication now available.

By Manika Nagpal on Nov 16, 2021 in AI, Computer Vision, Project

Two Simple Things You Need to Steal from Agile for Data and Analytics Work

Peer Review and Definition of Done: small changes, BIG impact.

By Jon Loyens on Nov 16, 2021 in Agile, Analytics, Data Science, Data.world

How I Redesigned over 100 ETL into ELT Data Pipelines

Learn how to level up your Data Pipelines!

By Nicholas Leong on Nov 15, 2021 in ELT, ETL, Pipeline, SQL

Deep Learning on your phone: PyTorch C++ API for use on Mobile Platforms

The PyTorch Deep Learning framework has a C++ API for use on mobile platforms. This article shows an end-to-end demo of how to write a simple C++ application with Deep Learning capabilities using the PyTorch C++ API such that the same code can be built for use on mobile platforms (both Android and iOS).

By Dhruv Matani on Nov 12, 2021 in C++, Deep Learning, Mobile, Python, PyTorch

25 Github Repositories Every Python Developer Should Know

Check out these repositories to help you improve your data science skills.

By Abhay Parashar on Nov 12, 2021 in GitHub, Programming, Python

Dream Come True: Building websites by thinking about them

From the mind to the computer, make websites using your imagination!

By Ajay, Agarwal & Nema on Nov 11, 2021 in Brain, Deep Learning, Hackathon, Machine Learning, NLP

OpenAI’s Approach to Solve Math Word Problems

OpenAI's latest research aims to solve math word problems. Let's dive a bit deeper into the ideas behind this new research.

By Jesus Rodriguez on Nov 9, 2021 in GPT-3, Mathematics, NLP, OpenAI

What Comes After HDF5? Seeking a Data Storage Format for Deep Learning

HDF5 is one of the most popular and reliable formats for non-tabular, numerical data, but it is not optimized for deep learning work. This article suggests what an ML-native data format should look like to truly serve the needs of modern data scientists.

By Davit Buniatyan on Nov 9, 2021 in Data Management, Deep Learning, Python

7 Top Open Source Datasets to Train Natural Language Processing (NLP) & Text Models

With a lot of excitement and research around NLP, there are growing opportunities to apply these technologies to real-world scenarios. It's not trivial to become familiar with NLP, and these open-source data sets can help you increase your skills.

By Kevin Vu on Nov 8, 2021 in Dataset, NLP, Open Source

AI Infinite Training & Maintaining Loop

Productizing AI is an infrastructure orchestration problem. In planning your solution design, you should use continuous monitoring, retraining, and feedback to ensure stability and sustainability.

By Roey Mechrez on Nov 4, 2021 in AI, Deployment, Machine Learning, Production, Training

Visual Scoring Techniques for Classification Models

Read this article on assessing a model's performance in a broader context.

By Maarit Widmann on Nov 3, 2021 in Classification, Knime, Low-Code, Machine Learning, Metrics, Visualization

Data Scientist Career Path from Novice to First Job

If you are beginning your data science journey, then you must be prepared to plan it out as a step-by-step process that will guide you from being a total newbie to getting your first job as a data scientist. These tips and educational resources should be useful for you and add confidence as you take that first big step.

By Nate Rosidi on Nov 3, 2021 in Beginners, Career Advice, Data Scientist

Neural Networks from a Bayesian Perspective

This article looks at neural networks from a Bayesian perspective.

By Zeldes & Naor on Nov 3, 2021 in Bayesian, Neural Networks

Design Patterns for Machine Learning Pipelines

ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.

By Davit Buniatyan on Nov 2, 2021 in Data Preprocessing, ETL, Machine Learning, Pipeline

Salary Breakdown of the Top Data Science Jobs

Machine Learning vs NLP vs Data Engineer vs Data Scientist, and what it means to be in each role.

By Matthew Przybyla on Nov 2, 2021 in Career Advice, Data Engineer, Data Scientist, Machine Learning Engineer, NLP, Salary

Advanced PyTorch Lightning with TorchMetrics and Lightning Flash

In this tutorial we will be diving deeper into two additional tools you should be using: TorchMetrics and Lightning Flash. TorchMetrics unsurprisingly provides a modular approach to define and track useful metrics across batches and devices, while Lightning Flash offers a suite of functionality facilitating more efficient transfer learning and data handling, and a recipe book of state-of-the-art approaches to typical deep learning problems.

By Kevin Vu on Nov 1, 2021 in Metrics, Python, PyTorch, PyTorch Lightning, Transfer Learning

Top 5 Time Series Methods

Data that varies in time can offer powerful applications and use cases for data scientists to analyze. This overview considers the top techniques you can learn to understand and gain insight from time-series data.

By Pranay Dave on Nov 1, 2021 in Forecasting, Seasonality, Time Series
