How to Perform Data Aggregation Over Time Series Data with Pandas
Let's learn how to perform time series aggregation with Pandas.

Preparation
We need the Pandas and NumPy packages, which you can install with the following command:
pip install pandas numpy
With the packages installed, let’s jump into the article.
Time Series Data Aggregation
Time series data is distinctive because observations are collected sequentially and recorded at specific points in time. Such datasets are often used to track how a quantity progresses, such as stock prices or monthly sales. The key property is that the data is ordered chronologically.
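Chronological order matters in practice because order-dependent operations such as diff, shift, and rolling windows assume the rows are sorted by time. Here is a minimal sketch of checking and fixing the order, using a small hypothetical DataFrame whose rows arrived out of sequence:

```python
import pandas as pd

# Hypothetical records appended out of chronological order
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-03", "2021-01-01", "2021-01-02"]),
    "Sales": [150, 120, 130],
})

# Check whether the dates are already sorted
print(df["Date"].is_monotonic_increasing)  # False

# Sort by date and rebuild a clean 0..n-1 row index
df = df.sort_values("Date").reset_index(drop=True)
print(df["Date"].is_monotonic_increasing)  # True
```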
Aggregation combines many data points into a smaller set of summary values, such as a sum, mean, or maximum. It is typically used to make large datasets easier to understand by condensing them.
For time series, aggregation usually means summarizing values over time periods such as weeks, months, or years. Let’s try it with an example dataset.
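To make the idea concrete before adding dates, here is a small sketch with seven hypothetical daily sales figures. Aggregation collapses the seven values into three summary numbers:

```python
import pandas as pd

# Seven hypothetical daily sales figures
daily_sales = pd.Series([120, 135, 150, 110, 160, 140, 155])

# Each function reduces the whole Series to a single value
summary = daily_sales.agg(["sum", "mean", "max"])
print(summary)  # sum 970, mean ~138.57, max 160
```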
import pandas as pd
import numpy as np
# Fix the random seed so the generated values are reproducible
np.random.seed(42)
# One row per day from 2021-01-01 to 2023-12-31
date_rng = pd.date_range(start='2021-01-01', end='2023-12-31', freq='D')
df = pd.DataFrame({
    'Date': date_rng,
    'Sales': np.random.randint(100, 300, size=len(date_rng)),
    'Profit': np.random.randint(1000, 5000, size=len(date_rng)),
    'Rating': np.random.uniform(1, 10, size=len(date_rng))
})
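Before aggregating, a quick sanity check on the generated data can catch mistakes early. The sketch below rebuilds the same dataset and confirms its size and date range. Three non-leap years of daily rows gives 3 × 365 = 1095 rows, and because `np.random.randint` excludes its upper bound, Sales falls between 100 and 299:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
date_rng = pd.date_range(start="2021-01-01", end="2023-12-31", freq="D")
df = pd.DataFrame({
    "Date": date_rng,
    "Sales": np.random.randint(100, 300, size=len(date_rng)),
    "Profit": np.random.randint(1000, 5000, size=len(date_rng)),
    "Rating": np.random.uniform(1, 10, size=len(date_rng)),
})

# One row per day across 2021-2023: 1095 rows, 4 columns
print(df.shape)
print(df["Date"].min(), df["Date"].max())
print(df.dtypes)
```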
With this example dataset, let’s perform time series aggregation. In Pandas, this is typically done with either the resample or the groupby method.
Let’s start with resample. It bins the rows into regular time periods (days, weeks, years, and so on) and aggregates the values in each period. It operates on a DatetimeIndex, so we first set the Date column as the index:
df.set_index('Date', inplace=True)
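Setting the index is not the only option: `DataFrame.resample` also accepts an `on` parameter that names a datetime column. A minimal sketch comparing both approaches on monthly sales totals; the `"MS"` alias labels each month by its first day:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
date_rng = pd.date_range(start="2021-01-01", end="2023-12-31", freq="D")
df = pd.DataFrame({
    "Date": date_rng,
    "Sales": np.random.randint(100, 300, size=len(date_rng)),
})

# Option 1: move Date into the index, then resample
by_index = df.set_index("Date").resample("MS")["Sales"].sum()

# Option 2: keep Date as a column and point resample at it with `on`
by_column = df.resample("MS", on="Date")["Sales"].sum()

print(by_index.equals(by_column))  # True
```

The `on` form is handy when you want to keep Date as a regular column for other operations.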
Then we can aggregate with resample. For example, to compute the yearly mean of each column:
df.resample('Y').mean()
Output:
Sales Profit Rating
Date
2021-12-31 203.410959 3105.854795 5.507386
2022-12-31 203.153425 2962.819178 5.366746
2023-12-31 194.657534 2989.123288 5.503049
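A resampler is not limited to one statistic. Passing a list of function names to `agg` computes several per period, and the result has two-level `(column, statistic)` columns. The sketch below uses quarterly bins (`"QS"`, labelled by each quarter’s first day) on a trimmed-down version of the dataset; the values differ from the output above because the Profit column is omitted, which changes the random draws:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
date_rng = pd.date_range(start="2021-01-01", end="2023-12-31", freq="D")
df = pd.DataFrame(
    {
        "Sales": np.random.randint(100, 300, size=len(date_rng)),
        "Rating": np.random.uniform(1, 10, size=len(date_rng)),
    },
    index=pd.DatetimeIndex(date_rng, name="Date"),
)

# Three statistics per quarter; columns become a (column, statistic) MultiIndex
quarterly = df.resample("QS").agg(["mean", "min", "max"])
print(quarterly)

# Select one statistic for one column with a tuple key
print(quarterly[("Sales", "max")])
```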
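You can change the resampling frequency with offset aliases such as:
- D (daily)
- W (weekly)
- M (monthly)
- Q (quarterly)
- Y or A (yearly)
In pandas 2.2 and later, the month-, quarter-, and year-end aliases are spelled ME, QE, and YE; the older M, Q, Y, and A spellings are deprecated there and emit a FutureWarning.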
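To see how the alias changes the granularity, here is a sketch that resamples the same daily sales to weekly and monthly totals. `"W"` bins end on Sundays, and `"MS"` (month start) is used for months. The data runs from Friday 2021-01-01 to Sunday 2023-12-31, which yields 157 weekly bins and 36 monthly bins:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
date_rng = pd.date_range(start="2021-01-01", end="2023-12-31", freq="D")
sales = pd.Series(np.random.randint(100, 300, size=len(date_rng)), index=date_rng)

weekly = sales.resample("W").sum()    # weeks ending on Sunday
monthly = sales.resample("MS").sum()  # calendar months, labelled by first day

print(len(weekly), len(monthly))  # 157 36
print(monthly.head(3))
```

Whatever the frequency, the bins partition the data, so each set of totals adds up to the overall sum.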
Alternatively, we can use groupby and group by a component of the datetime index, such as the year:
df.groupby(df.index.year).mean()
Output:
Sales Profit Rating
Date
2021 203.410959 3105.854795 5.507386
2022 203.153425 2962.819178 5.366746
2023 194.657534 2989.123288 5.503049
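The groupby approach becomes more flexible when you group by several datetime components at once, or by a component that recurs across years. `pd.Grouper(freq=...)` brings resample-style calendar binning into groupby. A minimal sketch of all three, using only the Sales column:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
date_rng = pd.date_range(start="2021-01-01", end="2023-12-31", freq="D")
df = pd.DataFrame(
    {"Sales": np.random.randint(100, 300, size=len(date_rng))},
    index=pd.DatetimeIndex(date_rng, name="Date"),
)

# One row per (year, month) pair
by_year_month = (
    df.groupby([df.index.year, df.index.month])["Sales"]
    .sum()
    .rename_axis(["Year", "Month"])
)
print(by_year_month.head())

# Month number only: pools each month across all three years (12 rows)
by_month = df.groupby(df.index.month)["Sales"].mean()

# pd.Grouper gives groupby the same calendar bins as resample("MS")
by_grouper = df.groupby(pd.Grouper(freq="MS"))["Sales"].sum()
print((by_grouper.values == by_year_month.values).all())  # True
```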
We can also apply a different aggregation function to each column by passing a dictionary that maps column names to functions:
df.resample('Y').agg({
    'Sales': 'sum',
    'Profit': 'mean',
    'Rating': 'max'
})
Output:
Sales Profit Rating
Date
2021-12-31 74245 3105.854795 9.959324
2022-12-31 74151 2962.819178 9.931739
2023-12-31 71050 2989.123288 9.973703
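When the output columns need clearer names, groupby supports named aggregation, where each keyword argument takes the form `output_name=(input_column, function)`. This sketch rebuilds the original dataset and produces yearly summaries similar to the dictionary example, with descriptive column names plus a day count. The output names (`total_sales`, `avg_profit`, and so on) are illustrative choices:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
date_rng = pd.date_range(start="2021-01-01", end="2023-12-31", freq="D")
df = pd.DataFrame(
    {
        "Sales": np.random.randint(100, 300, size=len(date_rng)),
        "Profit": np.random.randint(1000, 5000, size=len(date_rng)),
        "Rating": np.random.uniform(1, 10, size=len(date_rng)),
    },
    index=pd.DatetimeIndex(date_rng, name="Date"),
)

# Named aggregation: output_column=(input_column, function)
yearly = df.groupby(df.index.year).agg(
    total_sales=("Sales", "sum"),
    avg_profit=("Profit", "mean"),
    best_rating=("Rating", "max"),
    days=("Sales", "count"),
)
print(yearly)
```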
That’s all for time series aggregation. With resample for regular calendar periods, groupby for datetime components, and agg for per-column functions, you can summarize time series data at the granularity your analysis needs.
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.