How to Perform Data Aggregation Over Time Series Data with Pandas

Let's learn how to perform time series aggregation with Pandas.

By Cornellius Yudha Wijaya, KDnuggets Technical Content Specialist on September 16, 2024 in Python

Preparation

We need the pandas and NumPy packages, which can be installed with the following command:

pip install pandas numpy

With the packages installed, let’s jump into the article.

Time Series Data Aggregation

Time series data are observations collected sequentially, each recorded at a specific point in time. This kind of dataset is often used to track how something changes, such as a stock price or monthly sales. What matters is that the data is ordered chronologically.

Aggregation is a way of summarizing or combining many data points into a smaller set of values, such as a sum or a mean. It’s usually used to understand larger datasets by providing concise information.

Because each observation in a time series carries a timestamp, we can aggregate it over time periods such as weeks, months, or years. Let’s try it with an example dataset.

import pandas as pd
import numpy as np

np.random.seed(42)
date_rng = pd.date_range(start='2021-01-01', end='2023-12-31', freq='D')

df = pd.DataFrame({
    'Date': date_rng,
    'Sales': np.random.randint(100, 300, size=len(date_rng)),
    'Profit': np.random.randint(1000, 5000, size=len(date_rng)),
    'Rating': np.random.uniform(1, 10, size=len(date_rng))
})

With this example dataset, let’s try to perform time series aggregation. In Pandas, this is done with either the resample or the groupby method.

Let’s start with resample. This method groups the rows into regular time periods based on their timestamps and aggregates each period. By default, resample works on the index, so we set the date as the index first.

df.set_index('Date', inplace=True)

Then, we can aggregate the data with resample. For example, the following computes the mean of each column for every year.

df.resample('Y').mean()

Output:

                Sales       Profit    Rating
Date                                         
2021-12-31  203.410959  3105.854795  5.507386
2022-12-31  203.153425  2962.819178  5.366746
2023-12-31  194.657534  2989.123288  5.503049
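Setting the index is not the only option. As a minimal sketch, resample also accepts an on argument that names the timestamp column, so the frame can keep Date as a regular column. The toy frame and the variable names below are assumptions made for illustration and are not part of the example dataset above.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: 60 days of sales with the date kept as a column.
dates = pd.date_range(start="2021-01-01", periods=60, freq="D")
toy = pd.DataFrame({"Date": dates, "Sales": np.arange(60)})

# on="Date" tells resample which column holds the timestamps,
# so the frame does not need a DatetimeIndex.
weekly_from_column = toy.resample("W", on="Date")["Sales"].sum()

# The same aggregation after moving the column into the index.
weekly_from_index = toy.set_index("Date")["Sales"].resample("W").sum()

# Both approaches should produce the same weekly totals.
print(list(weekly_from_column.values) == list(weekly_from_index.values))
```

Either form works; setting the index once is usually more convenient when several resample calls follow.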

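You can change the resample frequency with other aliases, such as:

D (daily)
W (weekly)
M (monthly)
Q (quarterly)
Y or A (yearly)

Starting with pandas 2.2, the period-end aliases M, Q, Y, and A are deprecated in favor of ME, QE, and YE, so use the newer spellings if you see a deprecation warning.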
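To see how the frequency changes the result, here is a small sketch on a hypothetical series that holds the value 1 for every day of 2021, so each aggregated value is simply the number of days in its bin. It uses W together with MS and QS (month start and quarter start), which label each bin by its first day instead of its last; these were chosen here only because their spelling is the same across pandas versions.

```python
import pandas as pd

# Hypothetical data: the value 1 for every day of 2021.
days = pd.date_range(start="2021-01-01", end="2021-12-31", freq="D")
ones = pd.Series(1, index=days, name="Days")

# "W" bins the data by week.
weekly = ones.resample("W").sum()

# "MS" and "QS" bin by month and quarter, labeled by the first day of each bin.
monthly = ones.resample("MS").sum()
quarterly = ones.resample("QS").sum()

print(monthly.head(3))
print(quarterly)
```

Whatever the frequency, the bins partition the same rows, so the weekly, monthly, and quarterly totals all add up to the same overall sum.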

Alternatively, we can aggregate with groupby by grouping on a component of the datetime index, such as the year.

df.groupby(df.index.year).mean()

Output:

          Sales       Profit    Rating
Date                                   
2021  203.410959  3105.854795  5.507386
2022  203.153425  2962.819178  5.366746
2023  194.657534  2989.123288  5.503049
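One thing groupby can do that resample cannot is pool rows from different periods that share a calendar attribute, for example every January regardless of year. The sketch below illustrates this on a hypothetical two-year frame of ones; the frame and variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the value 1.0 for every day of 2021 and 2022.
idx = pd.date_range(start="2021-01-01", end="2022-12-31", freq="D", name="Date")
toy = pd.DataFrame({"Sales": np.ones(len(idx))}, index=idx)

# Group by month number only, pooling January 2021 with January 2022, and so on.
by_month_of_year = toy.groupby(toy.index.month)["Sales"].sum()

# Group by weekday name, pooling every Monday, every Tuesday, and so on.
by_weekday = toy.groupby(toy.index.day_name())["Sales"].sum()

print(by_month_of_year)
print(by_weekday)
```

This kind of grouping is handy for looking at seasonal patterns, while resample is the more natural choice for consecutive calendar periods.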

We can also apply a different aggregation function to each column by passing a dictionary to agg.

df.resample('Y').agg({
    'Sales': 'sum',
    'Profit': 'mean',
    'Rating': 'max'
})

Output:

           Sales       Profit    Rating
Date                                    
2021-12-31  74245  3105.854795  9.959324
2022-12-31  74151  2962.819178  9.931739
2023-12-31  71050  2989.123288  9.973703
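The dictionary values can also be lists, which applies several functions to the same column. A minimal sketch, assuming a small hypothetical three-month frame (not the dataset above) and the month-start alias MS:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three months of daily sales and ratings.
rng = np.random.default_rng(0)
idx = pd.date_range(start="2021-01-01", end="2021-03-31", freq="D", name="Date")
toy = pd.DataFrame(
    {
        "Sales": rng.integers(100, 300, size=len(idx)),
        "Rating": rng.uniform(1, 10, size=len(idx)),
    },
    index=idx,
)

# A list of functions per column yields two-level column labels,
# e.g. ("Sales", "sum") and ("Sales", "mean").
summary = toy.resample("MS").agg(
    {"Sales": ["sum", "mean"], "Rating": ["min", "max"]}
)

# Flatten the two-level labels into single names such as "Sales_sum".
summary.columns = ["_".join(col) for col in summary.columns]

print(summary)
```

Flattening the column labels is optional, but it tends to make the result easier to export or merge with other tables.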

That’s all for time series aggregation. Mastering it equips you with an important data analysis skill.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
