- Do You Read Excel Files with Python? There is a 1000x Faster Way - Sep 1, 2021.
In this article, I’ll show you five ways to load data in Python, achieving a speedup of three orders of magnitude.
- CSV Files for Storage? No Thanks. There’s a Better Option - Aug 31, 2021.
Saving data to CSVs is costing you both money and disk space. It’s time to end it.
- 5 Things That Make My Job as a Data Scientist Easier - Aug 23, 2021.
After working as a Data Scientist for a year, I am here to share some things I learnt along the way that I feel are helpful and have increased my efficiency. Hopefully some of these tips can help you in your journey :)
- KDnuggets™ News 21:n30, Aug 11: Most Common Data Science Interview Questions and Answers; How Visualization is Transforming Exploratory Data Analysis - Aug 11, 2021.
Most Common Data Science Interview Questions and Answers; How Visualization is Transforming Exploratory Data Analysis; How To Become A Freelance Data Scientist – 4 Practical Tips; How to Query Your Pandas Dataframe; Essential Math for Data Science: Introduction to Systems of Linear Equations
- How to Query Your Pandas Dataframe - Aug 9, 2021.
A Data Scientist’s perspective on SQL-like Python functions.
- KDnuggets™ News 21:n26, Jul 14: Pandas not enough? Here are a few good alternatives to processing larger and faster data in Python; 5 Python Data Processing Tips - Jul 14, 2021.
Pandas not enough? Here are a few good alternatives to processing larger and faster data in Python; 5 Python Data Processing Tips and Code Snippets; Relax! Data Scientists will not go extinct in 10 years, but the role will change; How to Get Practical Data Science Experience to be Career-Ready.
- 5 Python Data Processing Tips & Code Snippets - Jul 9, 2021.
This is a small collection of Python code snippets that a beginner might find useful for data processing.
- Pandas not enough? Here are a few good alternatives to processing larger and faster data in Python - Jul 8, 2021.
While the Pandas library remains a crucial workhorse in data processing and management for data science, some limitations exist that can impact efficiencies, especially with very large data sets. Here, a few interesting alternatives to Pandas are introduced to improve your large data handling performance.
- How to Get Practical Data Science Experience to be Career-Ready - Jul 7, 2021.
Becoming a professional in the field of data science takes more than just book-smarts. You need to have experience with real-world data sets, frequently-used tools, and an intuition for solutions that you can only gain from hands-on experience. These resources will jump start developing your practical skills.
- KDnuggets™ News 21:n23, Jun 23: Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months - Jun 23, 2021.
Pandas vs SQL: When Data Scientists Should Use Each Tool; How to Land a Data Analytics Job in 6 Months; A Graph-based Text Similarity Method with Named Entity Information in NLP; The Best Way to Learn Practical NLP?; An introduction to Explainable AI (XAI) and Explainable Boosting Machines (EBM)
- Pandas vs SQL: When Data Scientists Should Use Each Tool - Jun 21, 2021.
Exploring data sets and understanding their structure, content, and relationships is a routine and core process for any Data Scientist. Multiple tools exist for performing such analysis, and we take a deep dive into the benefits and different approaches of two important tools, SQL and Pandas.
- Get Interactive Plots Directly With Pandas - Jun 14, 2021.
Telling a story with data is a core function for any Data Scientist, and creating data visualizations that are simultaneously illuminating and appealing can be challenging. This tutorial reviews how to create Plotly and Bokeh plots directly through Pandas plotting syntax, which will help you convert static visualizations into interactive counterparts -- and take your analysis to the next level.
- Make Pandas 3 Times Faster with PyPolars - May 31, 2021.
Learn how to speed up your Pandas workflow using the PyPolars library.
- How to Deal with Categorical Data for Machine Learning - May 24, 2021.
Check out this guide to implementing different types of encoding for categorical data, including a cheat sheet on when to use what type.
- Animated Bar Chart Races in Python - May 18, 2021.
A quick, step-by-step beginner's project to create an animated bar graph for an amazing Covid dataset.
- Vaex: Pandas but 1000x faster - May 17, 2021.
If you are working with big data, especially on your local machine, then learning the basics of Vaex, a Python library that enables the fast processing of large datasets, will provide you with a productive alternative to Pandas.
- Super Charge Python with Pandas on GPUs Using Saturn Cloud - May 12, 2021.
Saturn Cloud is a tool that gives you 10 hours of GPU computing and 3 hours of Dask Cluster computing a month for free. In this tutorial, you will learn how to use these free resources to process data using Pandas on a GPU. The experiments show that Pandas is over 1,000,000% slower on a CPU as compared to running Pandas on a Dask cluster of GPUs.
- KDnuggets™ News 21:n18, May 12: Data Preparation in SQL, with Cheat Sheet!; Rebuilding 7 Python Projects - May 12, 2021.
Data Preparation in SQL, with Cheat Sheet!; Rebuilding My 7 Python Projects; Applying Python’s Explode Function to Pandas DataFrames; Essential Linear Algebra for Data Science and Machine Learning; Similarity Metrics in NLP
- Applying Python’s Explode Function to Pandas DataFrames - May 7, 2021.
Read this applied Python method to solve the issue of accessing columns by date/year using the Pandas library and functions lambda(), list(), map() & explode().
- Top 10 Python Libraries Data Scientists should know in 2021 - Mar 24, 2021.
So many Python libraries exist that offer powerful and efficient foundations for supporting your data science work and machine learning model development. While the list may seem overwhelming, there are certain libraries you should focus your time on, as they are some of the most commonly used today.
- How to Speed Up Pandas with Modin - Mar 10, 2021.
The Modin library can scale your pandas workflows by changing one line of code, and it integrates with the Python ecosystem and Ray clusters. This tutorial goes over how to get started with Modin and how it can speed up your pandas workflows.
- KDnuggets™ News 21:n10, Mar 10: More Resources for Women in AI, Data Science, and Machine Learning; Speeding up Scikit-Learn Model Training - Mar 10, 2021.
More Resources for Women in AI, Data Science, and Machine Learning; Speeding up Scikit-Learn Model Training; Dask and Pandas: No Such Thing as Too Much Data; 9 Skills You Need to Become a Data Engineer; 8 Women in AI Who Are Striving to Humanize the World
- 11 Essential Code Blocks for Complete EDA (Exploratory Data Analysis) - Mar 5, 2021.
This article is a practical guide to exploring any data science project and gaining valuable insights.
- Dask and Pandas: No Such Thing as Too Much Data - Mar 4, 2021.
Do you love pandas, but don't love it when you reach the limits of your memory or compute resources? Dask provides you with the option to use the pandas API with distributed data and computing. Learn how it works, how to use it, and why it’s worth the switch when you need it most.
- Are You Still Using Pandas to Process Big Data in 2021? Here are two better options - Mar 1, 2021.
When it’s time to handle a lot of data -- so much that you are in the realm of Big Data -- what tools can you use to wrangle the data, especially in a notebook environment? Pandas doesn’t handle really Big Data very well, but two other libraries do. So, which one is better and faster?
- Pandas Profiling: One-Line Magical Code for EDA - Feb 24, 2021.
EDA can be automated using a Python library called Pandas Profiling. Let’s explore Pandas Profiling to do EDA in a very short time and with just a single line of code.
- 7 Most Recommended Skills to Learn to be a Data Scientist - Feb 10, 2021.
The Data Scientist professional has emerged as a true interdisciplinary role that spans a variety of skills, theoretical and practical. For the core, day-to-day activities, many critical requirements that enable the delivery of real business value reach well outside the realm of machine learning, and should be mastered by those aspiring to the field.
- Build Your First Data Science Application - Feb 4, 2021.
Check out these seven Python libraries to make your first data science MVP application.
- Cleaner Data Analysis with Pandas Using Pipes - Jan 15, 2021.
Check out this practical guide on Pandas pipes.
- KDnuggets™ News 20:n47, Dec 16: A Rising Library Beating Pandas in Performance; R or Python? Why Not Both? - Dec 16, 2020.
Also: 10 Python Skills They Don't Teach in Bootcamp; Data Science Volunteering: Ways to Help; A Journey from Software to Machine Learning Engineer; Data Science and Machine Learning: The Free eBook
- A Rising Library Beating Pandas in Performance - Dec 11, 2020.
This article compares the performance of the well-known pandas library with pypolars, a rising DataFrame library written in Rust. See how they compare.
- Merging Pandas DataFrames in Python - Dec 8, 2020.
A quick how-to guide for merging Pandas DataFrames in Python.
- KDnuggets™ News 20:n43, Nov 11: The Best Data Science Certification You’ve Never Heard Of; Essential data science skills that no one talks about - Nov 11, 2020.
The Best Data Science Certification You've Never Heard Of; Essential data science skills that no one talks about; Pandas on Steroids: End to End Data Science in Python with Dask; How to Build a Football Dataset with Web Scraping; 2 Coding-free Ways to Extract Content From Websites to Boost Web Traffic
- Every Complex DataFrame Manipulation, Explained & Visualized Intuitively - Nov 10, 2020.
Most Data Scientists might hail the power of Pandas for data preparation, but many may not be capable of leveraging all that power. Manipulating data frames can quickly become a complex task, so eight of these techniques within Pandas are presented with an explanation, visualization, code, and tricks to remember how to do it.
- Pandas on Steroids: End to End Data Science in Python with Dask - Nov 6, 2020.
End to end parallelized data science from reading big data to data manipulation to visualisation to machine learning.
- 10 Underrated Python Skills - Oct 21, 2020.
Tips for feature analysis, hyperparameter tuning, data visualization and more.
- Top KDnuggets tweets, Oct 7-13: Every DataFrame Manipulation, Explained and Visualized Intuitively - Oct 14, 2020.
Also Free Introductory Machine Learning Course From Amazon; A Complete Guide to Learn #DataScience in 100 Days; Top 3 Books for Every #DataEngineer.
- Introduction to Time Series Analysis in Python - Sep 24, 2020.
Data that is updated in real-time requires additional handling and special care to prepare it for machine learning models. The important Python library, Pandas, can be used for most of this work, and this tutorial guides you through this process for analyzing time-series data.
- Statistical and Visual Exploratory Data Analysis with One Line of Code - Sep 21, 2020.
If EDA is not executed correctly, it can cause us to start modeling with “unclean” data. See how to use Pandas Profiling to perform EDA with a single line of code.
- Bring your Pandas Dataframes to life with D-Tale - Aug 13, 2020.
Bring your Pandas dataframes to life with D-Tale. D-Tale is an open-source solution with which you can visualize, analyze and learn how to code Pandas data structures. In this tutorial you'll learn how to open the grid, build columns, create charts and view code exports.
- The Machine Learning Field Guide - Aug 3, 2020.
This straightforward guide offers a structured overview of all machine learning prerequisites needed to start working on your project, including the complete data pipeline from importing and cleaning data to modelling and production.
- Fuzzy Joins in Python with d6tjoin - Jul 31, 2020.
Combining different data sources is a time suck! d6tjoin is a python library that lets you join pandas dataframes quickly and efficiently.
- 3 Advanced Python Features You Should Know - Jul 16, 2020.
As a Data Scientist, you are already spending most of your time getting your data ready for prime time. Follow these real-world scenarios to learn how to leverage the advanced techniques in Python of list comprehension, Lambda expressions, and the Map function to get the job done faster.
- Pull and Analyze Financial Data Using a Simple Python Package - Jul 9, 2020.
We demonstrate a simple Python script/package to help you pull financial data (all the important metrics and ratios that you can think of) and plot them.
- KDnuggets™ News 20:n26, Jul 8: Speed up Your Numpy and Pandas; A Layman’s Guide to Data Science; Getting Started with TensorFlow 2 - Jul 8, 2020.
Speed up your Numpy and Pandas with NumExpr Package; A Layman's Guide to Data Science. Part 3: Data Science Workflow; Getting Started with TensorFlow 2; Feature Engineering in SQL and Python: A Hybrid Approach; Deploy Machine Learning Pipeline on AWS Fargate
- Exploratory Data Analysis on Steroids - Jul 6, 2020.
This is a central aspect of Data Science, which sometimes gets overlooked. The first step of anything you do should be to know your data: understand it, get familiar with it. This concept gets even more important as you increase your data volume: imagine trying to parse through thousands or millions of records and make sense out of them.
- Speed up your Numpy and Pandas with NumExpr Package - Jul 1, 2020.
We show how to significantly speed up your mathematical calculations in Numpy and Pandas using a small library.
- Machine Learning in Dask - Jun 22, 2020.
In this piece, we’ll see how we can use Dask to work with large datasets on our local machines.
- Introduction to Pandas for Data Science - Jun 1, 2020.
The Pandas library is core to any Data Science work in Python. This introduction will walk you through the basics of data manipulation, and covers many of Pandas' important features.
- Faster machine learning on larger graphs with NumPy and Pandas - May 27, 2020.
One of the most exciting features of StellarGraph 1.0 is a new graph data structure — built using NumPy and Pandas — that results in significantly lower memory usage and faster construction times.
- Pandas in action! - May 20, 2020.
Pandas is instantly familiar to anyone who’s used spreadsheet software, whether that’s Google Sheets or good old Excel. It’s got columns, it’s got grids, it’s got rows; but pandas is far more powerful. Save 40% with code nlkdpandas40 on this book, and other Manning books and videos.
- KDnuggets™ News 20:n16, Apr 22: Scaling Pandas with Dask for Big Data; Dive Into Deep Learning: The Free eBook - Apr 22, 2020.
4 Steps to ensure your AI/Machine Learning system survives COVID-19; State of the Machine Learning and AI Industry; A Key Missing Part of the Machine Learning Stack; 5 Papers on CNNs Every Data Scientist Should Read
- Pandas in action - Apr 15, 2020.
Pandas is instantly familiar to anyone who’s used spreadsheet software, whether that’s Google Sheets or good old Excel. It’s got columns, it’s got grids, it’s got rows; but pandas is far more powerful. Save 40% with code nlkdpandas40 on this book, and other Manning books and videos.
- KDnuggets™ News 20:n14, Apr 8: Free Mathematics for Machine Learning eBook; Epidemiology Courses for Data Scientists - Apr 8, 2020.
Stop Hurting Your Pandas!; Python for data analysis... is it really that simple?!?; Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs; Build an app to generate photorealistic faces using TensorFlow and Streamlit; 5 Ways Data Scientists Can Help Respond to COVID-19 and 5 Actions to Avoid
- Stop Hurting Your Pandas! - Apr 3, 2020.
This post will address the issues that can arise when Pandas slicing is used improperly. If you see the warning that reads "A value is trying to be set on a copy of a slice from a DataFrame", this post is for you.
- Python for data analysis… is it really that simple?!? - Apr 2, 2020.
The article addresses a simple data analytics problem, comparing a Python and Pandas solution to an R solution (using plyr, dplyr, and data.table), as well as kdb+ and BigQuery solutions. Performance improvement tricks for these solutions are then covered, as are parallel/cluster computing approaches and their limitations.
- Python Pandas For Data Discovery in 7 Simple Steps - Mar 10, 2020.
Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.
- Top KDnuggets tweets, Jan 15-21: My Pandas Cheat Sheet; 5 Key Reasons Why Data Scientists Are Quitting their Jobs - Jan 22, 2020.
5 Key Reasons Why Data Scientists Are Quitting their Jobs; My Pandas Cheat Sheet; Google Colab: Jupyter Lab on steroids (perfect for Deep Learning); Top 5 Must-have Data Science Skills.
- KDnuggets™ News 19:n48, Dec 18: Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; Poll on AutoML - Dec 18, 2019.
Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; New Poll: Does AutoML work? Ultralearn Data Science; Python Dictionary How-To; Top stories of 2019 and more.
- Build Pipelines with Pandas Using pdpipe - Dec 13, 2019.
We show how to build intuitive and useful pipelines with Pandas DataFrame using a wonderful little library called pdpipe.
- KDnuggets™ News 19:n43, Nov 13: Dynamic Reports in Python and R; Creating NLP Vocabularies; What is Data Science? - Nov 13, 2019.
On KDnuggets this week: Orchestrating Dynamic Reports in Python and R with Rmd Files; How to Create a Vocabulary for NLP Tasks in Python; What is Data Science?; The Complete Data Science LinkedIn Profile Guide; Set Operations Applied to Pandas DataFrames; and much, much more.
- How to Speed up Pandas by 4x with one line of code - Nov 12, 2019.
While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about the new library, Modin, developed to distribute Pandas' computation to speed up your data prep.
- Understanding Boxplots - Nov 8, 2019.
A boxplot can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.
- Data Cleaning and Preprocessing for Beginners - Nov 7, 2019.
Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.
- Set Operations Applied to Pandas DataFrames - Nov 7, 2019.
In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.
- KDnuggets™ News 19:n41, Oct 30: Feature Selection: Beyond feature importance?; Time Series Analysis Using KNIME and Spark - Oct 30, 2019.
This week in KDnuggets: Feature Selection: Beyond feature importance?; Time Series Analysis: A Simple Example with KNIME and Spark; 5 Advanced Features of Pandas and How to Use Them; How to Measure Foot Traffic Using Data Analytics; Introduction to Natural Language Processing (NLP); and much, much more!
- 5 Advanced Features of Pandas and How to Use Them - Oct 25, 2019.
The pandas library offers core functionality when preparing your data using Python. But, many don't go beyond the basics, so learn about these lesser-known advanced methods that will make handling your data easier and cleaner.
- Exploratory Data Analysis Using Python - Aug 7, 2019.
In this tutorial, you’ll use Python and Pandas to explore a dataset and create visual distributions, identify and eliminate outliers, and uncover correlations between two datasets.
- KDnuggets™ News 19:n29, Aug 7: What 70% of Data Science Learners Do Wrong; Pytorch Cheat Sheet for Beginners - Aug 7, 2019.
This week on KDnuggets: What 70% of Data Science Learners Do Wrong; Pytorch Cheat Sheet for Beginners and Udacity Deep Learning Nanodegree; How a simple mix of object-oriented programming can sharpen your deep learning prototype; Can we trust AutoML to go on full autopilot?; Ten more random useful things in R you may not know about; 25 Tricks for Pandas; and much more!
- 25 Tricks for Pandas - Aug 6, 2019.
Check out this video (and Jupyter notebook) which outlines a number of Pandas tricks for working with and manipulating data, covering topics such as string manipulations, splitting and filtering DataFrames, combining and aggregating data, and more.
- 10 Simple Hacks to Speed up Your Data Analysis in Python - Jul 11, 2019.
This article lists some curated tips for working with Python and Jupyter Notebooks, covering topics such as easily profiling data, formatting code and output, debugging, and more. Hopefully you can find something useful within.
- Top KDnuggets Tweets, Jun 19 – 25: Learn how to efficiently handle large amounts of data using #Pandas; The biggest mistake while learning #Python for #datascience - Jun 26, 2019.
Also: Data Science Jobs Report 2019; Harvard CS109 #DataScience Course, Resources #Free and Online; Google launches TensorFlow; Mastering SQL for Data Science
- KDnuggets™ News 19:n24, Jun 26: Understand Cloud Services; Pandas Tips & Tricks; Master Data Preparation w/ Python - Jun 26, 2019.
Happy summer! This week on KDnuggets: Understanding Cloud Data Services; How to select rows and columns in Pandas using [ ], .loc, iloc, .at and .iat; 7 Steps to Mastering Data Preparation for Machine Learning with Python; Examining the Transformer Architecture: The OpenAI GPT-2 Controversy; Data Literacy: Using the Socratic Method; and much more!
- 7 Steps to Mastering Data Preparation for Machine Learning with Python — 2019 Edition - Jun 24, 2019.
Interested in mastering data preparation with Python? Follow these 7 steps which cover the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.
- How to select rows and columns in Pandas using [ ], .loc, iloc, .at and .iat - Jun 19, 2019.
Subset selection is one of the most frequently performed tasks while manipulating data. Pandas provides different ways to efficiently select subsets of data from your DataFrame.
- KDnuggets™ News 19:n23, Jun 19: Useful Stats for Data Scientists; Python, TensorFlow & R Winners in Latest Job Report - Jun 19, 2019.
This week on KDnuggets: 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; How to Learn Python for Data Science the Right Way; The Machine Learning Puzzle, Explained; Scalable Python Code with Pandas UDFs; and much more!
- How to Learn Python for Data Science the Right Way - Jun 14, 2019.
The biggest mistake you can make while learning Python for data science is to learn Python programming from courses meant for programmers. Avoid this mistake, and learn Python the right way by following this approach.
- Become a Pro at Pandas, Python’s Data Manipulation Library - Jun 13, 2019.
Pandas is one of the most popular Python libraries for cleaning, transforming, manipulating and analyzing data. Learn how to efficiently handle large amounts of data using Pandas.
- Scalable Python Code with Pandas UDFs: A Data Science Application - Jun 13, 2019.
There is still a gap between the corpus of libraries that developers want to apply in a scalable runtime and the set of libraries that support distributed execution. This post discusses how to bridge this gap using the functionality provided by Pandas UDFs in Spark 2.3+.
- 7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.
This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
- KDnuggets™ News 19:n17, May 1: The most desired skill in data science; Seeking KDnuggets Editors, work remotely - May 1, 2019.
This week, find out about the most desired skill in data science, learn which projects to include in your portfolio, identify a single strategy for pulling data from a Pandas DataFrame (once and for all), read the results of our Top Data Science and Machine Learning Methods poll, and much more.
- Pandas DataFrame Indexing - Apr 29, 2019.
The goal of this post is to identify a single strategy for pulling data from a DataFrame using the Pandas Python library that is straightforward to interpret and produces reliable results.
- Python Data Science for Beginners - Feb 20, 2019.
Python’s syntax is very clean and short in length. Python is open-source and a portable language which supports a large standard library. But why Python for data science? Read on to find out more.
- Top KDnuggets tweets, Jan 30 – Feb 05: state-of-the-art in #AI, #MachineLearning - Feb 6, 2019.
Also Brilliant tour-de-force! Reinforcement Learning to solve Rubiks Cube; Dask, Pandas, and GPUs: first steps; Neural network AI is simple. So Stop pretending you are a genius.
- Top Python Libraries in 2018 in Data Science, Deep Learning, Machine Learning - Dec 19, 2018.
Here are the top 15 Python libraries across Data Science, Data Visualization, Deep Learning, and Machine Learning.
- Top 10 Python Data Science Libraries - Nov 16, 2018.
The third part of our series investigating the top Python Libraries across Machine Learning, AI, Deep Learning and Data Science.
- Healthcare Analytics Made Simple - Nov 12, 2018.
Finally, a book on Python healthcare machine learning techniques is here! Healthcare Analytics Made Simple does just what the title says: it makes healthcare data science simple and approachable for everyone.
- Beginner Data Visualization & Exploration Using Pandas - Oct 22, 2018.
This tutorial offers a beginner's guide to getting around with Pandas for data wrangling and visualization.
- Optimus v2: Agile Data Science Workflows Made Easy - Aug 30, 2018.
Looking for a library to skyrocket your productivity as a Data Scientist? Check this out!
- Programming Best Practices For Data Science - Aug 7, 2018.
In this post, I'll go over the two mindsets most people switch between when doing programming work specifically for data science: the prototype mindset and the production mindset.
- Top 20 Python Libraries for Data Science in 2018 - Jun 27, 2018.
Our selection actually contains more than 20 libraries, as some of them are alternatives to each other and solve the same problem. Therefore we have grouped them as it's difficult to distinguish one particular leader at the moment.
- Swiftapply – Automatically efficient pandas apply operations - Apr 24, 2018.
Using Swiftapply, easily apply any function to a pandas dataframe in the fastest available manner.
- Quick Feature Engineering with Dates Using fast.ai - Mar 16, 2018.
The fast.ai library is a collection of supplementary wrappers for a host of popular machine learning libraries, designed to remove the necessity of writing your own functions to take care of some repetitive tasks in a machine learning workflow.
- Using Excel with Pandas - Jan 23, 2018.
In this tutorial, we are going to show you how to work with Excel files in pandas, covering computer setup, reading in data from Excel files into pandas, data exploration in pandas, and more.
- Top KDnuggets tweets, Jan 3-9: A collection of Jupyter notebooks NumPy, Pandas, matplotlib, basic #Python #MachineLearning - Jan 10, 2018.
Artificial General Intelligence (AGI) in less than 50 years; Top KDnuggets tweets: 10 Free Must-Read Books for #MachineLearning and #DataScience; The Art of Learning #DataScience; Supercharging Visualization with Apache Arrow; Docker for #DataScience
- Python Data Preparation Case Files: Group-based Imputation - Sep 25, 2017.
The second part in this series addresses group-based imputation for dealing with missing data values. Check out why finding group means can be a more formidable action than overall means, and see how to accomplish it in Python.
- Top KDnuggets tweets, Sep 13-19: Top Books on NLP; What Else Can AI Guess From Your Face? - Sep 20, 2017.
Also: The Ten Fallacies of Data Science; #Python #Pandas tips and tricks; Geoff Hinton says we need to start all over.
- Python Data Preparation Case Files: Removing Instances & Basic Imputation - Sep 14, 2017.
This is the first of 3 posts to cover imputing missing values in Python using Pandas. The slowest-moving of the series (out of necessity), this first installment lays out the task and data at the risk of boring you. The next 2 posts cover group- and regression-based imputation.
- Top KDnuggets tweets, Aug 30 – Sep 5: Python overtakes R, becomes the leader in #DataScience; Humble Book Bundle: #DataScience - Sep 6, 2017.
Also: Pandas tips and tricks #Python #DataScience; How I replicated an $86 million project in 57 lines of code; Future #MachineLearning Class.
- 6 Interesting Things You Can Do with Python on Facebook Data - Jun 6, 2017.
Facebook has a huge amount of data that is available for you to explore, and you can do many things with it. I will be sharing my experience with you on how you can use the Facebook Graph API for analysis with Python.
- 7 Steps to Mastering Data Preparation with Python - Jun 2, 2017.
Follow these 7 steps for mastering data preparation, covering the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.
- Data Science for Newbies: An Introductory Tutorial Series for Software Engineers - May 31, 2017.
This post summarizes and links to the individual tutorials which make up this introductory look at data science for newbies, mainly focusing on the tools, with a practical bent, written by a software engineer from the perspective of a software engineering approach.
- 5 Machine Learning Projects You Can No Longer Overlook, May - May 10, 2017.
In this month's installment of Machine Learning Projects You Can No Longer Overlook, we find some data preparation and exploration tools, a (the?) reinforcement learning "framework," a new automated machine learning library, and yet another distributed deep learning library.
- Top KDnuggets tweets, Apr 26 – May 02: Face Recognition with Python, in under 25 lines of code - May 3, 2017.
Face Recognition with Python, in under 25 lines of code; Try #DeepLearning in #Python w. a fully pre-configured VM; Homo Bayesians #MachineLearning #humor #cartoon; The Most Popular Language For #MachineLearning, #DataScience Is ...
- The Guerrilla Guide to Machine Learning with Python - May 1, 2017.
Here is a bare bones take on learning machine learning with Python, a complete course for the quick study hacker with no time (or patience) to spare.
- Dask and Pandas and XGBoost: Playing nicely between distributed systems - Apr 27, 2017.
This blogpost gives a quick example using Dask.dataframe to do distributed Pandas data wrangling, then using a new dask-xgboost package to set up an XGBoost cluster inside the Dask cluster and perform the handoff.
- Data Science Dividends – A Gentle Introduction to Financial Data Analysis - Apr 24, 2017.
This post outlines some very basic methods for performing financial data analysis using Python, Pandas, and Matplotlib, focusing mainly on stock price data. A good place for beginners to start.
- KDnuggets™ News 17:n13, Apr 5: What makes a great data scientist? Best R Packages for Machine Learning - Apr 5, 2017.
Also Best R Packages for Machine Learning; Deep Stubborn Networks - A Breakthrough Advance Towards Adversarial Machine Learning; A Short Guide to Navigating the Jupyter Ecosystem.
- A Beginner’s Guide to Tweet Analytics with Pandas - Mar 29, 2017.
Unlike many other tutorials, which often pull from the real-time Twitter API, this one uses the downloadable Twitter Analytics data, and most of the work is done in Pandas.
- Moving from R to Python: The Libraries You Need to Know - Feb 24, 2017.
Are you considering making a move from R to Python? Here are the libraries you need to know, how they stack up to their R contemporaries, and why you should learn them.
- Introduction to Correlation - Feb 22, 2017.
Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.
- Making Python Speak SQL with pandasql - Feb 8, 2017.
Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.
- KDnuggets™ News 17:n04, Feb 1: Data Science and Python Wrangling: Pandas Cheat Sheet; Great Collection of Machine Learning Algorithms - Feb 1, 2017.
Also: Great Collection of Minimal and Clean Implementations of Machine Learning Algorithms; Bad Data + Good Models = Bad Results; Data Scientist - best job in America, again.
- Pandas Cheat Sheet: Data Science and Data Wrangling in Python - Jan 27, 2017.
The Pandas library can seem very elaborate, and it can be hard to find a single point of entry to the material. With other learning materials each focusing on different aspects of the library, a reference sheet can help you get the hang of it.
- Tidying Data in Python - Jan 4, 2017.
This post summarizes some of the tidying examples Hadley Wickham used in his 2014 paper on Tidy Data in R, and demonstrates how to do the same using the Python pandas library.
- 5 Machine Learning Projects You Can No Longer Overlook, January - Jan 2, 2017.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects, the most recent in an ongoing series.
- Introduction to Machine Learning for Developers - Nov 28, 2016.
Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning.
- Statistical Data Analysis in Python - Jul 18, 2016.
This tutorial, which takes the form of a set of IPython notebooks, introduces the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects.
- Top KDnuggets tweets, Jul 6 – Jul 12: Statistical Data Analysis #Python #Jupyter Notebooks; Modern Pandas Notebooks - Jul 13, 2016.
Statistical Data Analysis in #Python (#Jupyter Notebooks); Modern Pandas: idiomatic Pandas notebook collection; New (free) book by @rdpeng: #rstats Programming for #DataScience
- 5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.
We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.
- Doing Data Science: A Kaggle Walkthrough – Cleaning Data - Mar 23, 2016.
Gain insight into the process of cleaning data for a specific Kaggle competition, including a step-by-step overview.
- Python Data Science with Pandas vs Spark DataFrame: Key Differences - Jan 29, 2016.
A post describing the key differences between Pandas and Spark's DataFrame format, including specifics on important regular processing features, with code samples.
- Overview of Python Visualization Tools - Nov 3, 2015.
An overview and comparison of the leading data visualization packages and tools for Python, including Pandas, Seaborn, ggplot, Bokeh, pygal, and Plotly.
- Top KDnuggets tweets, Mar 30 – Apr 01: Very useful! Data Visualization with ggplot2 CheatSheet - Apr 2, 2015.
Very useful! Data Visualization with ggplot2 Cheat Sheet; Great Data Science resource: Intro to Statistics using Python, Pandas; 14 Best Python Pandas Features; Data Science shows why taxis can never compete.