- Python Pandas For Data Discovery in 7 Simple Steps - Mar 10, 2020.
Just getting started with Python's Pandas library for data analysis? Or, ready for a quick refresher? These 7 steps will help you become familiar with its core features so you can begin exploring your data in no time.
- Top KDnuggets tweets, Jan 15-21: My Pandas Cheat Sheet; 5 Key Reasons Why Data Scientists Are Quitting their Jobs - Jan 22, 2020.
5 Key Reasons Why Data Scientists Are Quitting their Jobs; My Pandas Cheat Sheet; Google Colab: Jupyter Lab on steroids (perfect for Deep Learning); Top 5 Must-have Data Science Skills.
- KDnuggets™ News 19:n48, Dec 18: Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; Poll on AutoML - Dec 18, 2019.
Build Pipelines with Pandas Using pdpipe; AI, Analytics, ML, DS, Technology Main Developments, Key Trends; New Poll: Does AutoML work? Ultralearn Data Science; Python Dictionary How-To; Top stories of 2019 and more.
- Build Pipelines with Pandas Using pdpipe - Dec 13, 2019.
We show how to build intuitive and useful pipelines with Pandas DataFrame using a wonderful little library called pdpipe.
- KDnuggets™ News 19:n43, Nov 13: Dynamic Reports in Python and R; Creating NLP Vocabularies; What is Data Science? - Nov 13, 2019.
On KDnuggets this week: Orchestrating Dynamic Reports in Python and R with Rmd Files; How to Create a Vocabulary for NLP Tasks in Python; What is Data Science?; The Complete Data Science LinkedIn Profile Guide; Set Operations Applied to Pandas DataFrames; and much, much more.
- How to Speed up Pandas by 4x with one line of code - Nov 12, 2019.
While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about the new library, Modin, developed to distribute Pandas' computation to speedup your data prep.
- Understanding Boxplots - Nov 8, 2019.
A boxplot. It can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.
- Data Cleaning and Preprocessing for Beginners - Nov 7, 2019.
Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.
- Set Operations Applied to Pandas DataFrames - Nov 7, 2019.
In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.
- KDnuggets™ News 19:n41, Oct 30: Feature Selection: Beyond feature importance?; Time Series Analysis Using KNIME and Spark - Oct 30, 2019.
This week in KDnuggets: Feature Selection: Beyond feature importance?; Time Series Analysis: A Simple Example with KNIME and Spark; 5 Advanced Features of Pandas and How to Use Them; How to Measure Foot Traffic Using Data Analytics; Introduction to Natural Language Processing (NLP); and much, much more!
- 5 Advanced Features of Pandas and How to Use Them - Oct 25, 2019.
The pandas library offers core functionality when preparing your data using Python. But, many don't go beyond the basics, so learn about these lesser-known advanced methods that will make handling your data easier and cleaner.
- Exploratory Data Analysis Using Python - Aug 7, 2019.
In this tutorial, you’ll use Python and Pandas to explore a dataset and create visual distributions, identify and eliminate outliers, and uncover correlations between two datasets.
- KDnuggets™ News 19:n29, Aug 7: What 70% of Data Science Learners Do Wrong; Pytorch Cheat Sheet for Beginners - Aug 7, 2019.
This week on KDnuggets: What 70% of Data Science Learners Do Wrong; Pytorch Cheat Sheet for Beginners and Udacity Deep Learning Nanodegree; How a simple mix of object-oriented programming can sharpen your deep learning prototype; Can we trust AutoML to go on full autopilot?; Ten more random useful things in R you may not know about; 25 Tricks for Pandas; and much more!
- 25 Tricks for Pandas - Aug 6, 2019.
Check out this video (and Jupyter notebook) which outlines a number of Pandas tricks for working with and manipulating data, covering topics such as string manipulations, splitting and filtering DataFrames, combining and aggregating data, and more.
- 10 Simple Hacks to Speed up Your Data Analysis in Python - Jul 11, 2019.
This article lists some curated tips for working with Python and Jupyter Notebooks, covering topics such as easily profiling data, formatting code and output, debugging, and more. Hopefully you can find something useful within.
- Top KDnuggets Tweets, Jun 19 – 25: Learn how to efficiently handle large amounts of data using #Pandas; The biggest mistake while learning #Python for #datascience - Jun 26, 2019.
Also: Data Science Jobs Report 2019; Harvard CS109 #DataScience Course, Resources #Free and Online; Google launches TensorFlow; Mastering SQL for Data Science
- KDnuggets™ News 19:n24, Jun 26: Understand Cloud Services; Pandas Tips & Tricks; Master Data Preparation w/ Python - Jun 26, 2019.
Happy summer! This week on KDnuggets: Understanding Cloud Data Services; How to select rows and columns in Pandas using [ ], .loc, iloc, .at and .iat; 7 Steps to Mastering Data Preparation for Machine Learning with Python; Examining the Transformer Architecture: The OpenAI GPT-2 Controversy; Data Literacy: Using the Socratic Method; and much more!
- 7 Steps to Mastering Data Preparation for Machine Learning with Python — 2019 Edition - Jun 24, 2019.
Interested in mastering data preparation with Python? Follow these 7 steps which cover the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.
- How to select rows and columns in Pandas using [ ], .loc, iloc, .at and .iat - Jun 19, 2019.
Subset selection is one of the most frequently performed tasks while manipulating data. Pandas provides different ways to efficiently select subsets of data from your DataFrame.
- KDnuggets™ News 19:n23, Jun 19: Useful Stats for Data Scientists; Python, TensorFlow & R Winners in Latest Job Report - Jun 19, 2019.
This week on KDnuggets: 5 Useful Statistics Data Scientists Need to Know; Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS; How to Learn Python for Data Science the Right Way; The Machine Learning Puzzle, Explained; Scalable Python Code with Pandas UDFs; and much more!
- How to Learn Python for Data Science the Right Way - Jun 14, 2019.
The biggest mistake you can make while learning Python for data science is to learn Python programming from courses meant for programmers. Avoid this mistake, and learn Python the right way by following this approach.
- Become a Pro at Pandas, Python’s Data Manipulation Library - Jun 13, 2019.
Pandas is one of the most popular Python libraries for cleaning, transforming, manipulating and analyzing data. Learn how to efficiently handle large amounts of data using Pandas.
- Scalable Python Code with Pandas UDFs: A Data Science Application - Jun 13, 2019.
There is still a gap between the corpus of libraries that developers want to apply in a scalable runtime and the set of libraries that support distributed execution. This post discusses how to bridge this gap using the the functionality provided by Pandas UDFs in Spark 2.3+
- 7 Steps to Mastering Intermediate Machine Learning with Python — 2019 Edition - Jun 3, 2019.
This is the second part of this new learning path series for mastering machine learning with Python. Check out these 7 steps to help master intermediate machine learning with Python!
- KDnuggets™ News 19:n17, May 1: The most desired skill in data science; Seeking KDnuggets Editors, work remotely - May 1, 2019.
This week, find out about the most desired skill in data science, learn which projects to include in your portfolio, identify a single strategy for pulling data from a Pandas DataFrame (once and for all), read the results of our Top Data Science and Machine Learning Methods poll, and much more.
- Pandas DataFrame Indexing - Apr 29, 2019.
The goal of this post is identify a single strategy for pulling data from a DataFrame using the Pandas Python library that is straightforward to interpret and produces reliable results.
- Python Data Science for Beginners - Feb 20, 2019.
Python’s syntax is very clean and short in length. Python is open-source and a portable language which supports a large standard library. Buy why Python for data science? Read on to find out more.
- Top KDnuggets tweets, Jan 30 – Feb 05: state-of-the-art in #AI, #MachineLearning - Feb 6, 2019.
Also Brilliant tour-de-force! Reinforcement Learning to solve Rubiks Cube; Dask, Pandas, and GPUs: first steps; Neural network AI is simple. So Stop pretending you are a genius.
- Top Python Libraries in 2018 in Data Science, Deep Learning, Machine Learning - Dec 19, 2018.
Here are the top 15 Python libraries across Data Science, Data Visualization. Deep Learning, and Machine Learning.
- Top 10 Python Data Science Libraries - Nov 16, 2018.
The third part of our series investigating the top Python Libraries across Machine Learning, AI, Deep Learning and Data Science.
- Healthcare Analytics Made Simple - Nov 12, 2018.
Finally, a book on Python healthcare machine learning techniques is here! Healthcare Analytics Made Simple does just what the title says: it makes healthcare data science simple and approachable for everyone.
- Beginner Data Visualization & Exploration Using Pandas - Oct 22, 2018.
This tutorial will offer a beginner guide into how to get around with Pandas for data wrangling and visualization.
Pages: 1 2
- Optimus v2: Agile Data Science Workflows Made Easy - Aug 30, 2018.
Looking for a library to skyrocket your productivity as Data Scientist? Check this out!
- Programming Best Practices For Data Science - Aug 7, 2018.
In this post, I'll go over the two mindsets most people switch between when doing programming work specifically for data science: the prototype mindset and the production mindset.
- Top 20 Python Libraries for Data Science in 2018 - Jun 27, 2018.
Our selection actually contains more than 20 libraries, as some of them are alternatives to each other and solve the same problem. Therefore we have grouped them as it's difficult to distinguish one particular leader at the moment.
Pages: 1 2
- Swiftapply – Automatically efficient pandas apply operations - Apr 24, 2018.
Using Swiftapply, easily apply any function to a pandas dataframe in the fastest available manner.
- Quick Feature Engineering with Dates Using fast.ai - Mar 16, 2018.
The fast.ai library is a collection of supplementary wrappers for a host of popular machine learning libraries, designed to remove the necessity of writing your own functions to take care of some repetitive tasks in a machine learning workflow.
- Using Excel with Pandas - Jan 23, 2018.
In this tutorial, we are going to show you how to work with Excel files in pandas, covering computer setup, reading in data from Excel files into pandas, data exploration in pandas, and more.
Pages: 1 2
- Top KDnuggets tweets, Jan 3-9: A collection of Jupyter notebooks NumPy, Pandas, matplotlib, basic #Python #MachineLearning - Jan 10, 2018.
Artificial General Intelligence (AGI) in less than 50 years; Top KDnuggets tweets: 10 Free Must-Read Books for #MachineLearning and #DataScience; The Art of Learning #DataScience; Supercharging Visualization with Apache Arrow; Docker for #DataScience
- Python Data Preparation Case Files: Group-based Imputation - Sep 25, 2017.
The second part in this series addresses group-based imputation for dealing with missing data values. Check out why finding group means can be a more formidable action than overall means, and see how to accomplish it in Python.
- Top KDnuggets tweets, Sep 13-19: Top Books on NLP; What Else Can AI Guess From Your Face? - Sep 20, 2017.
Also: The Ten Fallacies of Data Science; #Python #Pandas tips and tricks; Geoff Hinton says we need to start all over.
- Python Data Preparation Case Files: Removing Instances & Basic Imputation - Sep 14, 2017.
This is the first of 3 posts to cover imputing missing values in Python using Pandas. The slowest-moving of the series (out of necessity), this first installment lays out the task and data at the risk of boring you. The next 2 posts cover group- and regression-based imputation.
- Top KDnuggets tweets, Aug 30 – Sep 5: Python overtakes R, becomes the leader in #DataScience; Humble Book Bundle: #DataScience - Sep 6, 2017.
Also: Pandas tips and tricks #Python #DataScience; How I replicated an $86 million project in 57 lines of code; Future #MachineLearning Class.
- 6 Interesting Things You Can Do with Python on Facebook Data - Jun 6, 2017.
Facebook has a huge amount of data that is available for you to explore, you can do many things with this data. I will be sharing my experience with you on how you can use the Facebook Graph API for analysis with Python.
- 7 Steps to Mastering Data Preparation with Python - Jun 2, 2017.
Follow these 7 steps for mastering data preparation, covering the concepts, the individual tasks, as well as different approaches to tackling the entire process from within the Python ecosystem.
Pages: 1 2
- Data Science for Newbies: An Introductory Tutorial Series for Software Engineers - May 31, 2017.
This post summarizes and links to the individual tutorials which make up this introductory look at data science for newbies, mainly focusing on the tools, with a practical bent, written by a software engineer from the perspective of a software engineering approach.
- 5 Machine Learning Projects You Can No Longer Overlook, May - May 10, 2017.
In this month's installment of Machine Learning Projects You Can No Longer Overlook, we find some data preparation and exploration tools, a (the?) reinforcement learning "framework," a new automated machine learning library, and yet another distributed deep learning library.
- Top KDnuggets tweets, Apr 26 – May 02: Face Recognition with Python, in under 25 lines of code - May 3, 2017.
Face Recognition with Python, in under 25 lines of code; Try #DeepLearning in #Python w. a fully pre-configured VM; Homo Bayesians #MachineLearning #humor #cartoon; The Most Popular Language For #MachineLearning, #DataScience Is ...
- The Guerrilla Guide to Machine Learning with Python - May 1, 2017.
Here is a bare bones take on learning machine learning with Python, a complete course for the quick study hacker with no time (or patience) to spare.
- Dask and Pandas and XGBoost: Playing nicely between distributed systems - Apr 27, 2017.
This blogpost gives a quick example using Dask.dataframe to do distributed Pandas data wrangling, then using a new dask-xgboost package to setup an XGBoost cluster inside the Dask cluster and perform the handoff.
- Data Science Dividends – A Gentle Introduction to Financial Data Analysis - Apr 24, 2017.
This post outlines some very basic methods for performing financial data analysis using Python, Pandas, and Matplotlib, focusing mainly on stock price data. A good place for beginners to start.
Pages: 1 2
- KDnuggets™ News 17:n13, Apr 5: What makes a great data scientist? Best R Packages for Machine Learning - Apr 5, 2017.
Also Best R Packages for Machine Learning; Deep Stubborn Networks - A Breakthrough Advance Towards Adversarial Machine Learning; A Short Guide to Navigating the Jupyter Ecosystem.
- A Beginner’s Guide to Tweet Analytics with Pandas - Mar 29, 2017.
Unlike a lot of other tutorials which often pull from the real-time Twitter API, we will be using the downloadable Twitter Analytics data, and most of what we do will be done in Pandas.
- Moving from R to Python: The Libraries You Need to Know - Feb 24, 2017.
Are you considering making a move from R to Python? Here are the libraries you need to know, how they stack up to their R contemporaries, and why you should learn them.
- Introduction to Correlation - Feb 22, 2017.
Correlation is one of the most widely used (and widely misunderstood) statistical concepts. We provide the definitions and intuition behind several types of correlation and illustrate how to calculate correlation using the Python pandas library.
- Making Python Speak SQL with pandasql - Feb 8, 2017.
Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.
- KDnuggets™ News 17:n04, Feb 1: Data Science and Python Wrangling: Pandas Cheat Sheet; Great Collection of Machine Learning Algorithms - Feb 1, 2017.
Also Great Collection of Minimal and Clean Implementations of Machine Learning Algorithms; Bad Data + Good Models = Bad Results; Data Scientist - best job in America, again.
- Pandas Cheat Sheet: Data Science and Data Wrangling in Python - Jan 27, 2017.
The Pandas library can seem very elaborate and it might be hard to find a single point of entry to the material: with other learning materials focusing on different aspects of this library, you can definitely use a reference sheet to help you get the hang of it.
- Tidying Data in Python - Jan 4, 2017.
This post summarizes some tidying examples Hadley Wickham used in his 2014 paper on Tidy Data in R, but will demonstrate how to do so using the Python pandas library.
- 5 Machine Learning Projects You Can No Longer Overlook, January - Jan 2, 2017.
There are a lot of popular machine learning projects out there, but many more that are not. Which of these are actively developed and worth checking out? Here is an offering of 5 such projects, the most recent in an ongoing series.
- Introduction to Machine Learning for Developers - Nov 28, 2016.
Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning.
Pages: 1 2
- Statistical Data Analysis in Python - Jul 18, 2016.
This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects, taking the form of a set of IPython notebooks.
- Top KDnuggets tweets, Jul 6 – Jul 12: Statistical Data Analysis #Python #Jupyter Notebooks; Modern Pandas Notebooks - Jul 13, 2016.
Statistical Data Analysis in #Python (#Jupyter Notebooks); Modern Pandas: idiomatic Pandas notebook collection; New (free) book by @rdpeng: #rstats Programming for #DataScience
- 5 Machine Learning Projects You Can No Longer Overlook - May 19, 2016.
We all know the big machine learning projects out there: Scikit-learn, TensorFlow, Theano, etc. But what about the smaller niche projects that are actively developed, providing useful services to users? Here are 5 such projects.
- Doing Data Science: A Kaggle Walkthrough – Cleaning Data - Mar 23, 2016.
Gain insight into the process of cleaning data for a specific Kaggle competition, including a step by step overview.
Pages: 1 2
- Python Data Science with Pandas vs Spark DataFrame: Key Differences - Jan 29, 2016.
A post describing the key differences between Pandas and Spark's DataFrame format, including specifics on important regular processing features, with code samples.
- Overview of Python Visualization Tools - Nov 3, 2015.
An overview and comparison of the leading data visualization packages and tools for Python, including Pandas, Seaborn, ggplot, Bokeh, pygal, and Plotly.
Pages: 1 2
- Top KDnuggets tweets, Mar 30 – Apr 01: Very useful! Data Visualization with ggplot2 CheatSheet - Apr 2, 2015.
Very useful! Data Visualization with ggplot2 Cheat Sheet; Great Data Science resource: Intro to Statistics using Python, Pandas; 14 Best Python Pandas Features; Data Science shows why taxis can never compete.