How to Use groupby for Advanced Data Grouping and Aggregation in Pandas

Learn how to perform advanced grouping and aggregation in Pandas.




 

Let’s learn how to perform grouping and aggregation in Pandas.

 

Preparation

 
We need the Pandas package installed, which we can do with the following command:

pip install pandas

 

With the package installed, let’s jump into the article.

 

Data Grouping and Aggregation with Pandas

 
Raw data can sometimes be too big and complex to consume. That is why we often perform grouping and aggregation to get concise information. A single number or a small set of summary values is often easier to interpret than the whole data set.

Let’s try to perform data grouping. First, we will create a sample dataset.

import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300]})

 

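We can use the groupby method to group the data. On its own, the call returns a GroupBy object rather than a table of results; nothing is computed until we apply an aggregation to it.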

df.groupby('Fruit')
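Because the grouped object does not display any data by itself, it can help to inspect it before aggregating. The following is a minimal sketch using the same sample data; the variable names (grouped, n_groups, group_sizes, banana_rows) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300],
})

grouped = df.groupby('Fruit')

# Number of distinct groups found in the 'Fruit' column
n_groups = grouped.ngroups

# Row count per group, returned as a Series indexed by the group key
group_sizes = grouped.size()

# Pull out the rows that belong to one group as a regular DataFrame
banana_rows = grouped.get_group('Banana')

# Iterating over the grouped object yields (key, sub-DataFrame) pairs
for fruit, rows in grouped:
    print(fruit, rows['Price'].tolist())
```

Checks like these are a quick way to confirm that the grouping key splits the data the way we expect.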

 

It’s also possible to group the data with multiple columns.

df.groupby(['Fruit', 'Size'])
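With more than one grouping column, each group is identified by a tuple of values, one per column. As a small sketch on the same sample data (variable names are illustrative), we can count the rows per combination and fetch a single group by its tuple key:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300],
})

grouped = df.groupby(['Fruit', 'Size'])

# One row count per (Fruit, Size) combination present in the data
sizes = grouped.size()

# A group from a multi-column grouping is addressed by a tuple key
banana_small = grouped.get_group(('Banana', 'Small'))

print(sizes)
print(banana_small)
```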

 

That’s all for data grouping. Next, let’s apply an aggregation function to the grouped data. For example, we can group by multiple columns and sum the values within each group.

df.groupby(['Fruit', 'Size']).sum()

 

Output:

             Price
Fruit  Size        
Banana Large    200
       Small    400
Orange Large     50
       Small    150
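In the output above, the grouping columns become the index of the result. If we would rather keep them as ordinary columns, for example before exporting or merging, there are two common options. The sketch below assumes the same sample data and selects the Price column explicitly; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300],
})

# Default behaviour: the group keys form a MultiIndex on the result
indexed = df.groupby(['Fruit', 'Size'])['Price'].sum()

# Option 1: move the keys back into columns after aggregating
flat = indexed.reset_index()

# Option 2: ask groupby not to use the keys as the index at all
flat_direct = df.groupby(['Fruit', 'Size'], as_index=False)['Price'].sum()

print(flat)
```

Both options should give a flat table with Fruit, Size, and Price columns.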

 

We can also perform multiple aggregations on our grouped data by passing a list of function names to agg.

df.groupby(['Fruit', 'Size']).agg(['sum', 'mean', 'count'])

 

Output:

            Price             
               sum   mean count
Fruit  Size                    
Banana Large   200  200.0     1
       Small   400  200.0     2
Orange Large    50   50.0     1
       Small   150  150.0     1
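The two-level column header in this output can be awkward to work with. One alternative is named aggregation, where each output column is given as a keyword argument with a (column, function) pair. This is a sketch on the same sample data; the output names total_price, mean_price, and n_rows are our own choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300],
})

# Each keyword becomes a flat column name in the result;
# the tuple says which input column to use and how to aggregate it
summary = df.groupby(['Fruit', 'Size']).agg(
    total_price=('Price', 'sum'),
    mean_price=('Price', 'mean'),
    n_rows=('Price', 'count'),
)

print(summary)
```

The numbers are the same as with the list form; only the column labels differ.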

 

If required, we can apply different aggregation methods to different columns. We map each column name to a list of functions in a dictionary, like this.

aggs = {
    'Price': ['sum', 'mean'],
    'Size': ['count']
}

 

df.groupby('Fruit').agg(aggs)

 

Output:

      Price         Size
         sum   mean count
Fruit                    
Banana   600  200.0     3
Orange   200  100.0     2
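The dictionary form also produces two-level column labels such as (Price, sum). If single-level names are easier for later steps, one possible approach is to join the two levels with an underscore. This sketch repeats the sample data and the aggs mapping so it runs on its own; the joined names are simply a convention chosen here.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300],
})

aggs = {
    'Price': ['sum', 'mean'],
    'Size': ['count'],
}

result = df.groupby('Fruit').agg(aggs)

# Each column label is a (column, function) tuple; join the parts
# into a single string such as 'Price_sum'
result.columns = ['_'.join(col) for col in result.columns]

print(result)
```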

 

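We can also write our own aggregation function and apply it to the grouped data. The function receives the values of each group as a Series and should return a single value.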

def maxminrange(series):
    return series.max() - series.min()

 

df.groupby('Fruit')['Price'].agg(maxminrange)

 

Output:

Fruit
Banana    200
Orange    100
Name: Price, dtype: int64
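A custom function can also be mixed with built-in aggregations in the same agg call, which makes it easy to check the custom result against its inputs. The sketch below reuses the sample data and the maxminrange function from above; the column for the custom function is labeled with the function's name.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300],
})

def maxminrange(series):
    # Difference between the largest and smallest value in the group
    return series.max() - series.min()

# Built-in names and a custom callable can share one list
price_stats = df.groupby('Fruit')['Price'].agg(['min', 'max', maxminrange])

print(price_stats)
```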

 

That’s how you perform advanced grouping and aggregation. Mastering these techniques will help you immensely during data analysis.

 


Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.

