Search results for dataframe
-
How to Anonymise Places in Python
Ready-to-run code that identifies and anonymises places, based on the GeoNames database.https://www.kdnuggets.com/2022/12/anonymise-places-python.html
-
Five Ways to do Conditional Filtering in Pandas
Learn five ways to perform conditional filtering with Pandas to help slice and dice your data.https://www.kdnuggets.com/2022/12/five-ways-conditional-filtering-pandas.html
-
7 Essential Cheat Sheets for Data Engineering
Learn about the data life cycle, PySpark, dbt, Kafka, BigQuery, Airflow, and Docker.https://www.kdnuggets.com/2022/12/7-essential-cheat-sheets-data-engineering.html
-
What Google Recommends You do Before Taking Their Machine Learning or Data Science Course
First steps to learning data science & machine learning are the foundations.https://www.kdnuggets.com/2021/10/google-recommends-before-machine-learning-data-science-course.html
-
SHAP: Explain Any Machine Learning Model in Python
A Comprehensive Guide to SHAP and Shapley Valueshttps://www.kdnuggets.com/2022/11/shap-explain-machine-learning-model-python.html
-
Announcing a Blog Writing Contest, Winner Gets an NVIDIA GPU!
KDnuggets and NVIDIA are announcing a blog-writing contest with a GPU focus, with the winner receiving an RTX 3080 Ti GPU!https://www.kdnuggets.com/2022/11/blog-writing-contest-nvidia-gpu.html
-
Introduction to Pandas for Data Science
The Pandas library is core to any Data Science work in Python. This introduction walks you through the basics of data manipulation and covers many of Pandas' important features.https://www.kdnuggets.com/2020/06/introduction-pandas-data-science.html
-
Geocoding in Python: A Complete Guide
A step-by-step tutorial on geocoding with Pythonhttps://www.kdnuggets.com/2022/11/geocoding-python-complete-guide.html
-
How to Setup Julia on Jupyter Notebook
Learn three simple steps to install Julia for Jupyter Notebook and write your first data visualization code.https://www.kdnuggets.com/2022/11/setup-julia-jupyter-notebook.html
-
Getting Started with PyCaret
An open-source low-code machine learning library for training models and deploying them in production.https://www.kdnuggets.com/2022/11/getting-started-pycaret.html
-
Fake It Till You Make It: Generating Realistic Synthetic Customer Datasets
Finding the data you need is hard. So why not fake it?https://www.kdnuggets.com/2022/01/fake-realistic-synthetic-customer-datasets-projects.html
-
4 Ways to Rename Pandas Columns
A simple pandas tutorial for beginners with code examples.https://www.kdnuggets.com/2022/11/4-ways-rename-pandas-columns.html
-
KDnuggets News, November 2: The Current State of Data Science Careers • 15 Free Machine Learning and Deep Learning Books
The Current State of Data Science Careers • 15 Free Machine Learning and Deep Learning Books • How to Make Python Code Run Incredibly Fast • Machine Learning on the Edge • Don't Become a Commoditized Data Scientisthttps://www.kdnuggets.com/2022/n43.html
-
Should I Learn Julia?
Do you think learning Julia is better for your data science career? Let’s find out.https://www.kdnuggets.com/2022/11/learn-julia.html
-
Top 10 MLOps Tools to Optimize & Manage Machine Learning Lifecycle
As more businesses experiment with data, they realize that developing a machine learning (ML) model is only one of many steps in the ML lifecycle.https://www.kdnuggets.com/2022/10/top-10-mlops-tools-optimize-manage-machine-learning-lifecycle.html
-
Easy Guide To Data Preprocessing In Python
Preprocessing data for machine learning models is a core general skill for any Data Scientist or Machine Learning Engineer. Follow this guide using Pandas and Scikit-learn to improve your techniques and make sure your data leads to the best possible outcome.https://www.kdnuggets.com/2020/07/easy-guide-data-preprocessing-python.html
-
KDnuggets Top Posts for September 2022: Free Python for Data Science Course
Free Python for Data Science Course • 7 Machine Learning Portfolio Projects to Boost the Resume • Free Algorithms in Python Course • How to Select Rows and Columns in Pandas • 5 Data Science Skills That Pay & 5 That Don't • Everything You’ve Ever Wanted to Know About Machine Learning • Free SQL and Database Course • 7 Data Analytics Interview Questions & Answershttps://www.kdnuggets.com/2022/09/top-posts-september-2022.html
-
Converting Text Documents to Token Counts with CountVectorizer
The post explains the significance of CountVectorizer and demonstrates its implementation with Python code.https://www.kdnuggets.com/2022/10/converting-text-documents-token-counts-countvectorizer.html
-
Implementing Adaboost in Scikit-learn
It is called Adaptive Boosting because the weights are re-assigned to each instance, with higher weights assigned to instances that are incorrectly classified, so it ‘adapts’.https://www.kdnuggets.com/2022/10/implementing-adaboost-scikitlearn.html
-
Statistical Functions in Python
In this tutorial, we cover some useful statistical functions that can be applied to pandas DataFrame and Series objects.https://www.kdnuggets.com/2022/10/statistical-functions-python.html
-
3 Simple Ways to Speed Up Your Python Code
The post explains three popular frameworks, PySpark, Dask, and Ray, and discusses various factors to select the most appropriate one for your project.https://www.kdnuggets.com/2022/10/3-simple-ways-speed-python-code.html
-
A Beginner’s Guide to Web Scraping Using Python
This article serves as a beginner’s guide to web scraping using Python and looks at the different frameworks and methods you can use, outlined in simple terms.https://www.kdnuggets.com/2022/10/beginner-guide-web-scraping-python.html
-
3 Ways to Process CSV Files in Python
This article is about 3 ways you can process a CSV file using Python.https://www.kdnuggets.com/2022/10/3-ways-process-csv-files-python.html
-
Hyperparameter Tuning Using Grid Search and Random Search in Python
A comprehensive guide on optimizing model hyperparameters with Scikit-Learn.https://www.kdnuggets.com/2022/10/hyperparameter-tuning-grid-search-random-search-python.html
-
Handling Missing Values in Time-series with SQL
This article is about a specific use-case that comes up often when dealing with time-series data.https://www.kdnuggets.com/2022/09/handling-missing-values-timeseries-sql.html
-
Getting Started with Pandas Cheatsheet
The latest KDnuggets cheatsheet aims to get you up to speed with introductory Pandas operations, and provide a handy reference as you work with the library. Check it out if you're interested in a quick start.https://www.kdnuggets.com/2022/09/getting-started-pandas-cheatsheet.html
-
KDnuggets News, September 21: 7 Machine Learning Portfolio Projects to Boost the Resume • Free SQL and Database Course
7 Machine Learning Portfolio Projects to Boost the Resume • Free SQL and Database Course • Top 5 Bookmarks Every Data Analyst Should Have • 7 Steps to Mastering Python for Data Science • 5 Concepts You Should Know About Gradient Descent and Cost Functionhttps://www.kdnuggets.com/2022/n37.html
-
KDnuggets News, September 14: Free Python for Data Science Course • Everything You’ve Ever Wanted to Know About Machine Learning
Free Python for Data Science Course • Everything You’ve Ever Wanted to Know About Machine Learning • Progress Bars in Python with tqdm for Fun and Profit • 7 Tips for Python Beginners • 7 Data Analytics Interview Questions & Answershttps://www.kdnuggets.com/2022/n36.html
-
Convert Text Documents to a TF-IDF Matrix with tfidfvectorizer
Convert text documents to vectors using TF-IDF vectorizer for topic extraction, clustering, and classification.https://www.kdnuggets.com/2022/09/convert-text-documents-tfidf-matrix-tfidfvectorizer.html
-
How to build a model to find the most impactful paths in user journeys
In this how-to, we’ll build a model to uncover which paths in user journeys have the biggest impact on product goals (e.g. conversion). You can use it to improve products or optimize marketing campaigns, or as a base for deeper user behavior analyses.https://www.kdnuggets.com/2022/09/objectiv-build-model-impactful-paths-user-journeys.html
-
How to Select Rows and Columns in Pandas Using [ ], .loc, iloc, .at and .iat
Subset selection is one of the most frequently performed tasks while manipulating data. Pandas provides different ways to efficiently select subsets of data from your DataFrame.https://www.kdnuggets.com/2019/06/select-rows-columns-pandas.html
-
Progress Bars in Python with tqdm for Fun and Profit
Add progress bars to Python functions, Jupyter Notebooks, and pandas DataFrames.https://www.kdnuggets.com/2022/09/progress-bars-python-tqdm-fun-profit.html
-
KDnuggets News, August 31: The Complete Data Science Study Roadmap • 7 Techniques to Handle Imbalanced Data
The Complete Data Science Study Roadmap • 7 Techniques to Handle Imbalanced Data • 3 Ways to Append Rows to Pandas DataFrames • The Bias-Variance Trade-off • How to Package and Distribute Machine Learning Models with MLFlowhttps://www.kdnuggets.com/2022/n35.html
-
Machine Learning Metadata Store
In this article, we will learn about metadata stores, the need for them, their components, and metadata store management.https://www.kdnuggets.com/2022/08/machine-learning-metadata-store.html
-
Customize Your Data Frame Column Names in Python
This tutorial will explore four scenarios in which you can apply different transformations to all DataFrame columns.https://www.kdnuggets.com/2022/08/customize-data-frame-column-names-python.html
-
Simplify Data Processing with Pandas Pipeline
Write a single line of code to clean and process the data for analytics and machine learning tasks.https://www.kdnuggets.com/2022/08/simplify-data-processing-pandas-pipeline.html
-
Implementing DBSCAN in Python
Density-based clustering algorithm explained with scikit-learn code example.https://www.kdnuggets.com/2022/08/implementing-dbscan-python.html
-
Machine Learning Over Encrypted Data
This blog outlines a solution to the Kaggle Titanic challenge that employs Privacy-Preserving Machine Learning (PPML) using the Concrete-ML open-source toolkit.https://www.kdnuggets.com/2022/08/machine-learning-encrypted-data.html
-
How to Perform Motion Detection Using Python
In this article, we look specifically at motion detection using a laptop or desktop webcam, and write a script that runs on our own computer so we can see it work in real time.https://www.kdnuggets.com/2022/08/perform-motion-detection-python.html
-
Data Transformation: Standardization vs Normalization
Increasing accuracy in your models is often obtained through the first steps of data transformations. This guide explains the difference between the key feature scaling methods of standardization and normalization, and demonstrates when and how to apply each approach.https://www.kdnuggets.com/2020/04/data-transformation-standardization-normalization.html
-
How to Deal with Categorical Data for Machine Learning
Check out this guide to implementing different types of encoding for categorical data, including a cheat sheet on when to use what type.https://www.kdnuggets.com/2021/05/deal-with-categorical-data-machine-learning.html
-
What are the Assumptions of XGBoost?
In this article, you will learn: how boosting relates to XGBoost; the features of XGBoost; how it reduces the loss function value and overfitting.https://www.kdnuggets.com/2022/08/assumptions-xgboost.html
-
Online Training and Workshops with Nvidia
Learn about the Nvidia Self-Paced Online Training from their Deep Learning Institute.https://www.kdnuggets.com/2022/07/online-training-workshops-nvidia.html
-
KDnuggets News, July 27: The AIoT Revolution: How AI and IoT Are Transforming Our World • Introduction to Hill Climbing Algorithm
Calculus for Data Science • Real-time Translations with AI • Using Numpy's argmax() • Using the apply() Method with Pandas DataFrames • An Introduction to Hill Climbing Algorithm in AIhttps://www.kdnuggets.com/2022/n30.html
-
Using Scikit-learn’s Imputer
Learn about Scikit-learn’s SimpleImputer, IterativeImputer, KNNImputer, and machine learning pipelines.https://www.kdnuggets.com/2022/07/scikitlearn-imputer.html
-
Primary Supervised Learning Algorithms Used in Machine Learning
In this tutorial, we are going to list some of the most common algorithms that are used in supervised learning along with a practical tutorial on such algorithms.https://www.kdnuggets.com/2022/06/primary-supervised-learning-algorithms-used-machine-learning.html
-
Generate Synthetic Time-series Data with Open-source Tools
An introduction to the generative adversarial network model DoppelGANger, and how you can use a new open-source PyTorch implementation of it to create high-quality synthetic time-series data.https://www.kdnuggets.com/2022/06/generate-synthetic-timeseries-data-opensource-tools.html
-
Top Data Science Podcasts for 2022
Here are some data science related podcasts to help you either grow your interest in the field, increase your current knowledge, or help you develop yourself.https://www.kdnuggets.com/2022/06/top-data-science-podcasts-2022.html
-
Predicting Cryptocurrency Prices Using Regression Models
In this article, we explore how to get started with the prediction of cryptocurrency prices using multiple linear regression. The factors investigated include predictions on various time intervals as well as the use of various features in the models such as opening price, high price, low price and volume.https://www.kdnuggets.com/2022/05/predicting-cryptocurrency-prices-regression-models.html
-
Image Classification with Convolutional Neural Networks (CNNs)
In this article, we’ll look at what Convolutional Neural Networks are and how they work.https://www.kdnuggets.com/2022/05/image-classification-convolutional-neural-networks-cnns.html
-
5 Different Ways to Load Data in Python
Data is the bread and butter of a Data Scientist, so knowing many approaches to loading data for analysis is crucial. Here, five Python techniques to bring in your data are reviewed with code examples for you to follow.https://www.kdnuggets.com/2020/08/5-different-ways-load-data-python.html
-
How to Ace Data Science Assessment Test by Using Automatic EDA Tools
By using a few lines of code, you can understand key aspects of a given dataset. These tools have helped me answer business-related questions during the data assessment test by Alooba.https://www.kdnuggets.com/2022/04/ace-data-science-assessment-test-automatic-eda-tools.html
-
Data Visualization in Python with Seaborn
Learn to create beautiful charts in Python using the Seaborn library.https://www.kdnuggets.com/2022/04/data-visualization-python-seaborn.html
-
Data Ingestion with Pandas: A Beginner Tutorial
Learn tricks on importing various data formats using Pandas with a few lines of code. We will be learning to import SQL databases, Excel sheets, HTML tables, CSV, and JSON files with examples.https://www.kdnuggets.com/2022/04/data-ingestion-pandas-beginner-tutorial.html
-
KDnuggets News, April 6: 8 Free MIT Courses to Learn Data Science Online; The Complete Collection Of Data Repositories – Part 1
8 Free MIT Courses to Learn Data Science Online; The Complete Collection Of Data Repositories - Part 1; DBSCAN Clustering Algorithm in Machine Learning; Introductory Pandas Tutorial; People Management for AI: Building High-Velocity AI Teamshttps://www.kdnuggets.com/2022/n14.html
-
Introductory Pandas Tutorial
A gentle introduction to data analysis with Pandas.https://www.kdnuggets.com/2022/03/introductory-pandas-tutorial.html
-
KDnuggets News March 30: The Most Popular Intro to Programming Course From Harvard is Free!; Top 13 Skills That Every Data Scientist Should Have
The Most Popular Intro to Programming Course From Harvard is Free!; Top 13 Skills That Every Data Scientist Should Have; Junior vs Senior Data Scientist Salary: What’s the Difference?; MLOps Is a Mess But That's to be Expected; Data Science at the Command Line: The Free eBookhttps://www.kdnuggets.com/2022/n13.html
-
DIY Automated Machine Learning with Streamlit
In this article, we will create an automated machine learning web app you can actually use.https://www.kdnuggets.com/2021/11/diy-automated-machine-learning-app.html
-
How to Engineer Date Features in Python
This article discusses and demonstrates how to quickly engineer some common date features using Python.https://www.kdnuggets.com/2021/08/engineer-date-features-python.html
-
Building a Tractable, Feature Engineering Pipeline for Multivariate Time Series
A time series feature engineering pipeline requires different transformations such as imputation and window aggregation, which follows a sequence of stages. This article demonstrates the building of a pipeline to derive multivariate time series features such that the features can then be easily tracked and validated.https://www.kdnuggets.com/2022/03/building-tractable-feature-engineering-pipeline-multivariate-time-series.html
-
Build a Machine Learning Web App in 5 Minutes
In this article, you will learn to export your models and use them outside a Jupyter Notebook environment. You will build a simple web application that is able to feed user input into a machine learning model, and display an output prediction to the user.https://www.kdnuggets.com/2022/03/build-machine-learning-web-app-5-minutes.html
-
Top Data Science Tools for 2022
Check out this curated collection for new and popular tools to add to your data stack this year.https://www.kdnuggets.com/2022/03/top-data-science-tools-2022.html
-
How to Filter Data with Python
Let’s dive a little deeper into some simple operations that might make your everyday work a little easier.https://www.kdnuggets.com/2022/02/filter-data-python.html
-
The Challenges of Creating Features for Machine Learning
What are the challenges of creating features for machine learning, and how can we mitigate them?https://www.kdnuggets.com/2022/02/challenges-creating-features-machine-learning.html
-
The Complete Collection of Data Science Cheat Sheets – Part 2
A collection of cheat sheets that will help you prepare for a technical interview on Data Structures & Algorithms, Machine learning, Deep Learning, Natural Language Processing, Data Engineering, Web Frameworks.https://www.kdnuggets.com/2022/02/complete-collection-data-science-cheat-sheets-part-2.html
-
Managing Your Reusable Python Code as a Data Scientist
Here are a few approaches that I have settled on for managing my own reusable Python code as a data scientist, presented from most to least general code use, and aimed at beginners.https://www.kdnuggets.com/2021/06/managing-reusable-python-code-data-scientist.html
-
The Complete Collection of Data Science Cheat Sheets – Part 1
A collection of cheat sheets that will help you prepare for a technical interview, assessment tests, class presentation, and help you revise core data science concepts.https://www.kdnuggets.com/2022/02/complete-collection-data-science-cheat-sheets-part-1.html
-
19 Data Science Project Ideas for Beginners
This article features 19 data science projects for beginners, categorized into 7 full project tutorials, 5 places to come up with your own data science projects using data, and 7 skills-based data science projects.https://www.kdnuggets.com/2021/11/19-data-science-project-ideas-beginners.html
-
Build a Web Scraper with Python in 5 Minutes
In this article, I will show you how to create a web scraper from scratch in Python.https://www.kdnuggets.com/2022/02/build-web-scraper-python-5-minutes.html
-
Getting Started Cleaning Data
In order to achieve quality data, there is a process that needs to happen. That process is data cleaning. Learn more about the various stages of this process.https://www.kdnuggets.com/2022/01/getting-started-cleaning-data.html
-
The Best Python Courses: An Analysis Summary
What does the data reveal if we ask: "What are the 10 Best Python Courses?". Collecting almost all of the courses from top platforms shows there are plenty to choose from, with over 3000 offerings. This article summarizes my analysis and presents the top three courses.https://www.kdnuggets.com/2022/01/best-python-courses-analysis-summary.html
-
3 Reasons Why Data Scientists Should Use LightGBM
There are many great boosting Python libraries for data scientists to reap the benefits of. In this article, the author discusses LightGBM benefits and how they are specific to your data science job.https://www.kdnuggets.com/2022/01/data-scientists-reasons-lightgbm.html
-
KDnuggets™ News 22:n03, Jan 19: A Deep Look Into 13 Data Scientist Roles and Their Responsibilities; Top Five SQL Window Functions You Should Know For Data Science Interviews
A Deep Look Into 13 Data Scientist Roles and Their Responsibilities; Top Five SQL Window Functions You Should Know For Data Science Interviews; 5 Things to Keep in Mind Before Selecting Your Next Data Science Job; Models Are Rarely Deployed: An Industry-wide Failure in Machine Learning Leadership; Running Redis on Google Colabhttps://www.kdnuggets.com/2022/n03.html
-
The Easiest Way to Make Beautiful Interactive Visualizations With Pandas
Check out these one-liner interactive visualizations with Pandas in Python.https://www.kdnuggets.com/2021/12/easiest-way-make-beautiful-interactive-visualizations-pandas.html
-
Explainable Forecasting and Nowcasting with State-of-the-art Deep Neural Networks and Dynamic Factor Model
Review this detailed tutorial with code and revisit a decades-old problem using a democratized and interpretable AI framework: how precisely can we anticipate the future and understand its causal factors?https://www.kdnuggets.com/2021/12/sota-explainable-forecasting-and-nowcasting.html
-
Alternative Feature Selection Methods in Machine Learning
Feature selection methodologies go beyond filter, wrapper and embedded methods. In this article, I describe 3 alternative algorithms to select predictive features based on a feature importance score.https://www.kdnuggets.com/2021/12/alternative-feature-selection-methods-machine-learning.html
-
Cutting Down Implementation Time by Integrating Jupyter and KNIME
Are you a KNIME fan or a Jupyter fan? Well, here you don’t have to choose.https://www.kdnuggets.com/2021/12/cutting-implementation-time-integrating-jupyter-knime.html
-
Introduction to Clustering in Python with PyCaret
A step-by-step, beginner-friendly tutorial for unsupervised clustering tasks in Python using PyCaret.https://www.kdnuggets.com/2021/12/introduction-clustering-python-pycaret.html
-
Main 2021 Developments and Key 2022 Trends in AI, Data Science, Machine Learning Technology
Our panel of leading experts reviews 2021 main developments and examines the key trends in AI, Data Science, Machine Learning, and Deep Learning Technology.https://www.kdnuggets.com/2021/12/trends-ai-data-science-ml-technology.html
-
Analyzing Scientific Articles with fine-tuned SciBERT NER Model and Neo4j
In this article, we will be analyzing a dataset of scientific abstracts using the Neo4j Graph database and a fine-tuned SciBERT model.https://www.kdnuggets.com/2021/12/analyzing-scientific-articles-finetuned-scibert-ner-model-neo4j.html
-
Introduction to Binary Classification with PyCaret
PyCaret is an alternate low-code library that can replace hundreds of lines of code with only a few. See how to use it for binary classification.https://www.kdnuggets.com/2021/12/introduction-binary-classification-pycaret.html
-
Avoid These Mistakes with Time Series Forecasting
A few checks to make before training a Machine Learning model on data that could be random.https://www.kdnuggets.com/2021/12/avoid-mistakes-time-series-forecasting.html
-
Movie Recommendations with Spark Collaborative Filtering
Not sure what movie to watch? Ask your recommender system.https://www.kdnuggets.com/2021/12/movie-recommendations-spark-collaborative-filtering.html
-
KDnuggets™ News 21:n45, Dec 1: Most Common SQL Mistakes on Data Science Interviews; Why Machine Learning Engineers are Replacing Data Scientists
Most Common SQL Mistakes on Data Science Interviews; Why Machine Learning Engineers are Replacing Data Scientists; Vote in new KDnuggets Poll: What Percentage of Your Machine Learning Models Have Been Deployed? KDnuggets: Personal History and Nuggets of Experience.https://www.kdnuggets.com/2021/n45.html
-
A Spreadsheet that Generates Python: The Mito JupyterLab Extension
You can call Mito into your Jupyter Environment and each edit you make will generate the equivalent Python in the code cell below.https://www.kdnuggets.com/2021/11/spreadsheet-generates-python-mito-jupyterlab-extension.html
-
Easy Synthetic Data in Python with Faker
Faker is a Python library that generates fake data to supplement or take the place of real world data. See how it can be used for data science.https://www.kdnuggets.com/2021/11/easy-synthetic-data-python-faker.html
-
What Comes After HDF5? Seeking a Data Storage Format for Deep Learning
This article discusses HDF5, one of the most popular and reliable formats for non-tabular, numerical data, and why it is not optimized for deep learning work. It suggests what an ML-native data format should look like to truly serve the needs of modern data scientists.https://www.kdnuggets.com/2021/11/after-hdf5-data-storage-format-deep-learning.html
-
Training BPE, WordPiece, and Unigram Tokenizers from Scratch using Hugging Face
Comparing the tokens generated by SOTA tokenization algorithms using Hugging Face's tokenizers package.https://www.kdnuggets.com/2021/10/bpe-wordpiece-unigram-tokenizers-using-hugging-face.html
-
KDnuggets™ News 21:n40, Oct 20: The 20 Python Packages You Need For Machine Learning and Data Science; Ace Data Science Interviews with Portfolio Projects
The 20 Python Packages You Need For Machine Learning and Data Science; How to Ace Data Science Interview by Working on Portfolio Projects; Deploying Your First Machine Learning API; Real Time Image Segmentation Using 5 Lines of Code; What is Clustering and How Does it Work?https://www.kdnuggets.com/2021/n40.html
-
How to calculate confidence intervals for performance metrics in Machine Learning using an automatic bootstrap method
Are your model performance measurements very precise due to a “large” test set, or very uncertain due to a “small” or imbalanced test set?https://www.kdnuggets.com/2021/10/calculate-confidence-intervals-performance-metrics-machine-learning.html
-
How to Auto-Detect the Date/Datetime Columns and Set Their Datatype When Reading a CSV File in Pandas
When read_csv( ) reads e.g. “2021-03-04” and “2021-03-04 21:37:01.123” as mere “object” datatypes, often you can simply auto-convert them all at once to true datetime datatypes.https://www.kdnuggets.com/2021/10/auto-detect-date-datetime-columns-and-set-their-datatype-when-reading-a-csv-file-in-pandas.html
-
How To Build A Database Using Python
Implement your database without handling the SQL using the Flask-SQLAlchemy library.https://www.kdnuggets.com/2021/09/build-database-using-python.html
-
Building a Structured Financial Newsfeed Using Python, SpaCy and Streamlit
Getting started with NLP by building a Named Entity Recognition (NER) application.https://www.kdnuggets.com/2021/09/-structured-financial-newsfeed-using-python-spacy-and-streamlit.html
-
How To Deal With Imbalanced Classification, Without Re-balancing the Data
Before considering oversampling your skewed data, try adjusting your classification decision threshold, in Python.https://www.kdnuggets.com/2021/09/imbalanced-classification-without-re-balancing-data.html
-
Data Engineering Technologies 2021
Emerging technologies supporting the field of data engineering are growing at a rapid clip. This curated list includes the most important offerings available in 2021.https://www.kdnuggets.com/2021/09/data-engineering-technologies-2021.html
-
If You Can Write Functions, You Can Use Dask
This is the second article in an ongoing series on using Dask in practice. Each article in this series will be simple enough for beginners, but provide useful tips for real work. The first article in the series is about using LocalCluster.https://www.kdnuggets.com/2021/09/write-functions-use-dask.html
-
Working with Python APIs For Data Science Project
In this article, we will work with YouTube Python API to collect video statistics from our channel using the requests python library to make an API call and save it as a Pandas DataFrame.https://www.kdnuggets.com/2021/09/python-apis-data-science-project.html
-
How to Create an AutoML Pipeline Optimization Sandbox
In this article, we will implement an automated machine learning pipeline optimization sandbox web app using Streamlit and TPOT.https://www.kdnuggets.com/2021/09/automl-pipeline-optimization-sandbox.html
-
How to Create Stunning Web Apps for your Data Science Projects
Data scientists do not have to learn HTML, CSS, and JavaScript to build web pages.https://www.kdnuggets.com/2021/09/create-stunning-web-apps-data-science-projects.html
-
Build a synthetic data pipeline using Gretel and Apache Airflow
In this blog post, we build an ETL pipeline that generates synthetic data from a PostgreSQL database using Gretel’s Synthetic Data APIs and Apache Airflow.https://www.kdnuggets.com/2021/09/build-synthetic-data-pipeline-gretel-apache-airflow.html
-
Do You Read Excel Files with Python? There is a 1000x Faster Way
In this article, I’ll show you five ways to load data in Python, achieving a speedup of three orders of magnitude.https://www.kdnuggets.com/2021/09/excel-files-python-1000x-faster-way.html
-
CSV Files for Storage? No Thanks. There’s a Better Option
Saving data to CSVs is costing you both money and disk space. It’s time to end it.https://www.kdnuggets.com/2021/08/csv-files-storage-better-option.html
-
Multilabel Document Categorization, step by step example
This detailed guide explores an unsupervised and supervised learning two-stage approach with LDA and BERT to develop a domain-specific document categorizer on unlabeled documents.https://www.kdnuggets.com/2021/08/multilabel-document-categorization.html
-
15 Python Snippets to Optimize your Data Science Pipeline
Quick Python solutions to help your data science cycle.https://www.kdnuggets.com/2021/08/15-python-snippets-optimize-data-science-pipeline.html
-
Learning Data Science and Machine Learning: First Steps After The Roadmap
Just getting into learning data science may seem as daunting as (if not more than) trying to land your first job in the field. With so many options and resources online and in traditional academia to consider, these pre-requisites and pre-work are recommended before diving deep into data science and AI/ML.https://www.kdnuggets.com/2021/08/learn-data-science-machine-learning.html
-
5 Things That Make My Job as a Data Scientist Easier
After working as a Data Scientist for a year, I am here to share some things I learnt along the way that I feel are helpful and have increased my efficiency. Hopefully some of these tips can help you in your journey :)https://www.kdnuggets.com/2021/08/5-things-job-data-scientist-easier.html
-
Data Scientist’s Guide to Efficient Coding in Python
Read this fantastic collection of tips and tricks the author uses for writing clean code on a day-to-day basis.https://www.kdnuggets.com/2021/08/data-scientist-guide-efficient-coding-python.html
-
Prefect: How to Write and Schedule Your First ETL Pipeline with Python
Workflow management systems made easy — both locally and in the cloud.https://www.kdnuggets.com/2021/08/prefect-write-schedule-etl-pipeline-python.html
-
GPU-Powered Data Science (NOT Deep Learning) with RAPIDS
How to utilize the power of your GPU for regular data science and machine learning even if you do not do a lot of deep learning work.https://www.kdnuggets.com/2021/08/gpu-powered-data-science-deep-learning-rapids.html
-
Not Only for Deep Learning: How GPUs Accelerate Data Science & Data Analytics
Modern AI/ML systems’ success has been critically dependent on their ability to process massive amounts of raw data in a parallel fashion using task-optimized hardware. Can we leverage the power of GPU and distributed computing for regular data processing jobs too?https://www.kdnuggets.com/2021/07/deep-learning-gpu-accelerate-data-science-data-analytics.html
-
Top Python Data Science Interview Questions
Six must-know technical concepts and two types of questions to test them.https://www.kdnuggets.com/2021/07/top-python-data-science-interview-questions.html
-
Date Processing and Feature Engineering in Python
Have a look at some code to streamline the parsing and processing of dates in Python, including the engineering of some useful and common features.https://www.kdnuggets.com/2021/07/date-pre-processing-feature-engineering-python.html
-
Streamlit Tips, Tricks, and Hacks for Data Scientists
Today, I am going to talk about a few tips that I learned in more than a year of using Streamlit, which you can also use to unleash your powerful DS/AI/ML (whatever they may be) applications.https://www.kdnuggets.com/2021/07/streamlit-tips-tricks-hacks-data-scientists.html
-
5 Python Data Processing Tips & Code Snippets
This is a small collection of Python code snippets that a beginner might find useful for data processing.https://www.kdnuggets.com/2021/07/python-tips-snippets-data-processing.html
-
A Lightning Fast Look at Single Line Exploratory Data Analysis
Here's a very quick look at how you can perform EDA with a single line of code using D-Tale.https://www.kdnuggets.com/2021/07/single-line-exploratory-data-analysis.html
-
Pandas not enough? Here are a few good alternatives to processing larger and faster data in Python
While the Pandas library remains a crucial workhorse in data processing and management for data science, some limitations exist that can impact efficiencies, especially with very large data sets. Here, a few interesting alternatives to Pandas are introduced to improve your large data handling performance.https://www.kdnuggets.com/2021/07/pandas-alternatives-processing-larger-faster-data-python.html
-
ROC Curve Explained
Learn to visualise a ROC curve in Python.https://www.kdnuggets.com/2021/07/roc-curve-explained.html
-
Data Scientists and ML Engineers Are Luxury Employees
Maybe it seems that everyone wants to become a data scientist and every organization wants to hire one as quickly as possible. However, a mismatch often exists between what companies tend to need and what ML practitioners want to do. So, it's time for the field to take another step toward maturity through an enhanced appreciation of the broad range of technical foundations for an organization to become data-driven.https://www.kdnuggets.com/2021/07/data-scientists-machine-learning-engineers-luxury-employees.html
-
From Scratch: Permutation Feature Importance for ML Interpretability
Use permutation feature importance to discover which features in your dataset are useful for prediction — implemented from scratch in Python.https://www.kdnuggets.com/2021/06/from-scratch-permutation-feature-importance-ml-interpretability.html
-
Create and Deploy Dashboards using Voila and Saturn Cloud
Working with and training large datasets, maintaining them all in one place, and deploying them to production is a challenging job. In this article, we covered what Saturn Cloud is and how it can speed up your end-to-end pipeline, how to create dashboards using Voila and Python and publish them to production in just a few easy steps.https://www.kdnuggets.com/2021/06/create-deploy-dashboards-voila-saturn-cloud.html
-
Pandas vs SQL: When Data Scientists Should Use Each Tool
Exploring data sets and understanding their structure, content, and relationships is a routine and core process for any Data Scientist. Multiple tools exist for performing such analysis, and we take a deep dive into the benefits and different approaches of two important tools, SQL and Pandas.https://www.kdnuggets.com/2021/06/pandas-vs-sql.html
-
How to troubleshoot memory problems in Python
Memory problems are hard to diagnose and fix in Python. This post goes through a step-by-step process for how to pinpoint and fix memory leaks using popular open source Python packages.https://www.kdnuggets.com/2021/06/troubleshoot-memory-problems-python.html
-
Get Interactive Plots Directly With Pandas
Telling a story with data is a core function for any Data Scientist, and creating data visualizations that are simultaneously illuminating and appealing can be challenging. This tutorial reviews how to create Plotly and Bokeh plots directly through Pandas plotting syntax, which will help you convert static visualizations into interactive counterparts -- and take your analysis to the next level.https://www.kdnuggets.com/2021/06/interactive-plots-directly-pandas.html
-
Supercharge Your Machine Learning Experiments with PyCaret and Gradio
A step-by-step tutorial to develop and interact with machine learning pipelines rapidly.https://www.kdnuggets.com/2021/05/supercharge-machine-learning-experiments-pycaret-gradio.html
-
Topic Modeling with Streamlit
What does it take to create and deploy a topic modeling web application quickly? Read this post to see how the author uses Python NLP packages for topic modeling, Streamlit for the web application framework, and Streamlit Sharing for deployment.https://www.kdnuggets.com/2021/05/topic-modeling-streamlit.html
-
Data Validation in Machine Learning is Imperative, Not Optional
Before we reach model training in the pipeline, there are various components like data ingestion, data versioning, data validation, and data pre-processing that need to be executed. In this article, we will discuss data validation, why it is important, its challenges, and more.https://www.kdnuggets.com/2021/05/data-validation-machine-learning-imperative.html
-
Animated Bar Chart Races in Python
A quick, step-by-step beginner's project to create an animated bar graph for an amazing Covid dataset.https://www.kdnuggets.com/2021/05/animated-race-bar-charts-python.html
-
Vaex: Pandas but 1000x faster
If you are working with big data, especially on your local machine, then learning the basics of Vaex, a Python library that enables the fast processing of large datasets, will provide you with a productive alternative to Pandas.https://www.kdnuggets.com/2021/05/vaex-pandas-1000x-faster.html
-
Super Charge Python with Pandas on GPUs Using Saturn Cloud
Saturn Cloud is a tool that gives you 10 hours of GPU computing and 3 hours of Dask Cluster computing a month for free. In this tutorial, you will learn how to use these free resources to process data using Pandas on a GPU. The experiments show that Pandas is over 1,000,000% slower on a CPU compared to running Pandas on a Dask cluster of GPUs.https://www.kdnuggets.com/2021/05/super-charge-python-pandas-gpus-saturn-cloud.html
-
Multiple Time Series Forecasting with PyCaret
A step-by-step tutorial to forecast multiple time series with PyCaret.https://www.kdnuggets.com/2021/04/multiple-time-series-forecasting-pycaret.html