How to Use the pivot_table Function for Advanced Data Summarization in Pandas

Let's learn to use Pandas pivot_table in Python to perform advanced data summarization.


Let me guide you on how to use the Pandas `pivot_table` function for your data summarization.

Preparation

``pip install pandas seaborn``

Then, we load the packages and the example dataset, Titanic, which is available through seaborn.

``````import pandas as pd
import seaborn as sns

# Load the Titanic example dataset through seaborn
titanic = sns.load_dataset('titanic')``````

With the packages installed and the dataset loaded, let's move on to the next section.

Pivot Table with Pandas

Pivot tables in Pandas allow for flexible data reorganization and analysis. Let's examine some practical applications, starting with a simple one: the average age by passenger class and sex.

``````pivot = pd.pivot_table(titanic, values='age', index='class', columns='sex', aggfunc='mean')
print(pivot)``````

``````Output>>>
sex        female       male
class
First   34.611765  41.281386
Second  28.722973  30.740707
Third   21.750000  26.507589``````

The resulting pivot table displays average ages, with passenger classes on the vertical axis and gender categories across the top.
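To make the mechanics concrete, here is a minimal sketch on a small hand-made DataFrame (the `toy` data is hypothetical and only mimics the Titanic columns used above). It shows that, for this simple case, `pivot_table` produces the same summary as grouping by both columns, taking the mean, and unstacking one level.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns used above.
toy = pd.DataFrame({
    "class": ["First", "First", "First", "Third", "Third", "Third"],
    "sex": ["female", "male", "male", "female", "female", "male"],
    "age": [30.0, 40.0, 50.0, 20.0, 24.0, 28.0],
})

# One row per class, one column per sex, mean age in each cell.
by_pivot = pd.pivot_table(toy, values="age", index="class", columns="sex", aggfunc="mean")

# The same summary written as groupby + unstack.
by_group = toy.groupby(["class", "sex"])["age"].mean().unstack("sex")

print(by_pivot)
print(by_group)
```

Both printed tables should hold the same numbers; `pivot_table` is often the more readable choice once you add several aggregations or margins.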

We can go even further by passing a list to `aggfunc`, which calculates both the mean and the sum of fares.

``````pivot = pd.pivot_table(titanic, values='fare', index='class', columns='sex', aggfunc=['mean', 'sum'])
print(pivot)``````

``````Output>>>
              mean                   sum
sex         female       male     female       male
class
First   106.125798  67.226127  9975.8250  8201.5875
Second   21.970121  19.741782  1669.7292  2132.1125
Third    16.118810  12.661633  2321.1086  4393.5865``````
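Passing a list of functions gives the result two levels of column labels, which can be awkward to work with at first. The sketch below, again on hypothetical toy data, shows a few common ways to select from or flatten those columns; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns used above.
toy = pd.DataFrame({
    "class": ["First", "First", "First", "Third", "Third", "Third"],
    "sex": ["female", "female", "male", "female", "male", "male"],
    "fare": [100.0, 120.0, 60.0, 10.0, 12.0, 16.0],
})

pivot = pd.pivot_table(toy, values="fare", index="class", columns="sex", aggfunc=["mean", "sum"])

# Select every column under the top-level label "mean".
mean_only = pivot["mean"]

# Select a single column with a (function, sex) tuple.
female_sum = pivot[("sum", "female")]

# Flatten the two column levels into single names such as "mean_female".
flat = pivot.copy()
flat.columns = ["_".join(col) for col in flat.columns]

print(mean_only)
print(female_sum)
print(flat.columns.tolist())
```

Flattening is mostly a convenience for exporting or merging; keeping the two levels is fine when you only need to slice by aggregation.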

We can also pass our own aggregation function. For example, the function below takes the difference between the maximum and minimum values of each group and divides it by two.

``````def data_div_two(x):
    # x holds the values of one class/sex group
    return (x.max() - x.min())/2

pivot = pd.pivot_table(titanic, values='age', index='class', columns='sex', aggfunc=data_div_two)
print(pivot)``````

``````Output>>>
sex     female    male
class
First   30.500  39.540
Second  27.500  34.665
Third   31.125  36.790``````
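As a small illustration of what a custom `aggfunc` receives, the sketch below applies a half-range function to hypothetical toy data that includes one missing age. The name `half_range` and the data are assumptions made for this example. Each call gets the values of one class/sex group, and `max()` and `min()` skip missing values by default.

```python
import pandas as pd

def half_range(x):
    # x is the Series of values for one (class, sex) group.
    # Series.max() and Series.min() ignore NaN by default.
    return (x.max() - x.min()) / 2

# Hypothetical toy data; one age is missing on purpose.
toy = pd.DataFrame({
    "class": ["First", "First", "First", "First", "Third", "Third", "Third", "Third"],
    "sex": ["female", "female", "female", "male", "female", "female", "male", "male"],
    "age": [20.0, 40.0, None, 50.0, 10.0, 16.0, 30.0, 35.0],
})

pivot = pd.pivot_table(toy, values="age", index="class", columns="sex", aggfunc=half_range)
print(pivot)
```

A group with a single non-missing value, such as First-class males here, has a half range of zero because its maximum and minimum coincide.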

Lastly, you can set `margins=True` to add an `All` row and column, which lets you compare each sub-group's average with the overall average.

``````pivot = pd.pivot_table(titanic, values='age', index='class', columns='sex', aggfunc='mean', margins=True)
print(pivot)``````

``````Output>>>
sex        female       male        All
class
First   34.611765  41.281386  38.233441
Second  28.722973  30.740707  29.877630
Third   21.750000  26.507589  25.140620
All     27.915709  30.726645  29.699118``````
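One detail worth checking for yourself is how the margin values are computed. The sketch below uses hypothetical toy data with unequal group sizes to show that the margin for a row is the mean over all of that row's underlying records, not the mean of the cell averages. It also uses the `margins_name` parameter to rename the `All` label; the label `Overall` is just an example.

```python
import pandas as pd

# Hypothetical toy data with unequal group sizes.
toy = pd.DataFrame({
    "class": ["First", "First", "First", "Third", "Third", "Third"],
    "sex": ["female", "female", "male", "female", "male", "male"],
    "age": [20.0, 40.0, 60.0, 10.0, 30.0, 50.0],
})

pivot = pd.pivot_table(
    toy, values="age", index="class", columns="sex",
    aggfunc="mean", margins=True, margins_name="Overall",
)
print(pivot)

# Margin for First class: mean of all three First-class ages.
first_margin = pivot.loc["First", "Overall"]

# Naive alternative: average of the two cell means, which differs
# because the female group has two rows and the male group has one.
first_mean_of_means = pivot.loc["First", ["female", "male"]].mean()

print(first_margin, first_mean_of_means)
```

The two numbers differ whenever the sub-groups have different sizes, which is why the margins are usually the safer figure to report.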

Mastering the `pivot_table` function will help you summarize your data quickly and draw insights from it.