How to Use groupby for Advanced Data Grouping and Aggregation in Pandas
Learn how to perform advanced grouping and aggregation in Pandas.

Image by Author | Midjourney
Let’s learn how to perform grouping and aggregation in Pandas.
Preparation
We need the Pandas package installed. If you don't have it yet, you can install it with the following command:
pip install pandas
With the package installed, let’s jump into the article.
Data Grouping and Aggregation with Pandas
The information in a dataset can sometimes be too large and complex to take in directly. That is why we often group and aggregate data to get concise information: a single number, or a small set of values per group, is often much easier to interpret than the whole dataset.
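To make the idea concrete, the sketch below groups and aggregates by hand in plain Python, using the same fruit and price values as the sample dataset that follows. It only illustrates the general split-apply-combine pattern, not how Pandas implements groupby internally; the variable names are hypothetical.

```python
from collections import defaultdict

# Same fruit/price pairs as the Pandas sample dataset below
fruits = ['Banana', 'Orange', 'Banana', 'Orange', 'Banana']
prices = [100, 150, 200, 50, 300]

# Split: collect each price under its fruit
groups = defaultdict(list)
for fruit, price in zip(fruits, prices):
    groups[fruit].append(price)

# Apply + combine: reduce each group to a single total
totals = {fruit: sum(values) for fruit, values in groups.items()}
print(totals)  # {'Banana': 600, 'Orange': 200}
```

Pandas groupby does the same three steps for you, with many more aggregation options.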
Let’s start with data grouping. First, we create a sample dataset.
import pandas as pd
df = pd.DataFrame({
'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
'Price': [100, 150, 200, 50, 300]})
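We can use the groupby method to group the data. On its own, this returns a DataFrameGroupBy object rather than a new table; nothing is aggregated until we apply an aggregation function.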
df.groupby('Fruit')
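To see what a DataFrameGroupBy object holds, you can iterate over it or pull out a single group. A minimal sketch, assuming the same sample DataFrame as above (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300]})

grouped = df.groupby('Fruit')

# Each iteration yields the group key and the sub-DataFrame for that key
for name, group in grouped:
    print(name, group['Price'].tolist())
# Banana [100, 200, 300]
# Orange [150, 50]

# Retrieve a single group directly by its key
bananas = grouped.get_group('Banana')
```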
It’s also possible to group the data with multiple columns.
df.groupby(['Fruit', 'Size'])
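A quick way to inspect a multi-column grouping is size(), which counts the rows in each key combination. A small sketch, again rebuilding the sample DataFrame so it runs standalone:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300]})

# size() returns a Series indexed by (Fruit, Size) with the row count per group
counts = df.groupby(['Fruit', 'Size']).size()
print(counts)
```

Only combinations that actually occur in the data appear, so this grouping has four groups.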
That’s all for data grouping. Now, let’s apply an aggregation function to the grouped data. For example, we can group by multiple columns and sum the values within each group.
df.groupby(['Fruit', 'Size']).sum()
Output:
              Price
Fruit  Size
Banana Large    200
       Small    400
Orange Large     50
       Small    150
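If you only need one column, you can select it before aggregating, which returns a Series. Passing as_index=False to groupby instead keeps the group keys as ordinary columns rather than a MultiIndex. A sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300]})

# Selecting 'Price' first gives a Series indexed by (Fruit, Size)
price_sums = df.groupby(['Fruit', 'Size'])['Price'].sum()

# as_index=False returns a DataFrame with Fruit and Size as regular columns
flat = df.groupby(['Fruit', 'Size'], as_index=False)['Price'].sum()
print(flat)
```

The flat form is often easier to merge with other tables or export.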
We can also perform multiple aggregations on the grouped data at once by passing a list of function names to agg.
df.groupby(['Fruit', 'Size']).agg(['sum', 'mean', 'count'])
Output:
             Price
               sum   mean count
Fruit  Size
Banana Large   200  200.0     1
       Small   400  200.0     2
Orange Large    50   50.0     1
       Small   150  150.0     1
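The list form above produces two-level column labels. Pandas also supports named aggregation (since version 0.25), where each keyword argument becomes a flat output column defined by a (column, function) pair. The output names total_price, mean_price, and n_rows below are arbitrary choices for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300]})

# Each keyword names an output column: (input column, aggregation function)
summary = df.groupby(['Fruit', 'Size']).agg(
    total_price=('Price', 'sum'),
    mean_price=('Price', 'mean'),
    n_rows=('Price', 'count'),
)
print(summary)
```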
If required, we can apply different aggregation methods to different columns by passing agg a dictionary that maps each column name to a list of functions:
aggs = {
    'Price': ['sum', 'mean'],
    'Size': ['count']
}
df.groupby('Fruit').agg(aggs)
Output:
       Price        Size
         sum   mean count
Fruit
Banana   600  200.0     3
Orange   200  100.0     2
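The dictionary form also returns MultiIndex columns such as ('Price', 'sum'). If you prefer single-level names, one common approach is to join each pair into a string. The underscore naming scheme here is just one possible convention:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300]})

aggs = {'Price': ['sum', 'mean'], 'Size': ['count']}
result = df.groupby('Fruit').agg(aggs)

# result.columns is a MultiIndex of (column, function) tuples; join each one
result.columns = ['_'.join(col) for col in result.columns]
print(result.columns.tolist())  # ['Price_sum', 'Price_mean', 'Size_count']
```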
We can also write our own aggregation function and apply it to the grouped data. The function below returns the range (maximum minus minimum) of each group’s values.
def maxminrange(series):
    # Range of the group's values: largest minus smallest
    return series.max() - series.min()

df.groupby('Fruit')['Price'].agg(maxminrange)
Output:
Fruit
Banana 200
Orange 100
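A custom function can also be mixed with built-in aggregations in the same list, in which case Pandas uses the function’s name as the column label. For a one-off calculation, a lambda works too. A sketch reusing the maxminrange function from above:

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Banana', 'Orange', 'Banana', 'Orange', 'Banana'],
    'Size': ['Small', 'Small', 'Large', 'Large', 'Small'],
    'Price': [100, 150, 200, 50, 300]})

def maxminrange(series):
    # Range of the group's values: largest minus smallest
    return series.max() - series.min()

# Built-in names and a custom function side by side; columns: min, max, maxminrange
stats = df.groupby('Fruit')['Price'].agg(['min', 'max', maxminrange])

# The same range computed with an inline lambda
ranges = df.groupby('Fruit')['Price'].agg(lambda s: s.max() - s.min())
print(stats)
```

A named function gives a readable column label, whereas a lambda inside a list would be labeled "<lambda>".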
That’s how you perform advanced grouping and aggregation in Pandas. These techniques let you condense a large dataset into compact per-group summaries, which is a routine step in data analysis.
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.