Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps

A heatmap is a graphical representation of data in which values are represented as colors. That is, it uses color to communicate a value to the reader. It is a great tool for directing the audience to the areas that matter most when you have a large volume of data.



By Julia Kho, Data Scientist

In this article, I will guide you in creating your own annotated heatmap of a correlation matrix in 5 simple steps.

  1. Import Data
  2. Create Correlation Matrix
  3. Set Up Mask To Hide Upper Triangle
  4. Create Heatmap in Seaborn
  5. Export Heatmap

You can find the code from this article in my Jupyter Notebook located here.

1) Import Data

import pandas as pd
df = pd.read_csv("Highway1.csv", index_col = 0)


This highway accidents data set contains the automobile accident rate, in accidents per million vehicle miles, along with several design variables. More information about the data set can be found here.

2) Create Correlation Matrix

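# numeric_only requires pandas 1.5+; from pandas 2.0, omitting it raises an error on text columns such as htype
corr_matrix = df.corr(numeric_only = True)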


We create the correlation matrix with .corr(). Notice that the htype column is not present in this matrix because it is not numeric. We will need to dummify htype, that is, convert it to 0/1 indicator columns, to include it in the correlation matrix.

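df_dummy = pd.get_dummies(df.htype, dtype = int)
df = pd.concat([df, df_dummy], axis = 1)
# Recompute the matrix so that the new dummy columns are included
corr_matrix = df.corr(numeric_only = True)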
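
As a small sanity check of the dummy-encoding step, the sketch below uses a made-up frame (the values are invented for illustration and are not taken from Highway1.csv) and recomputes the correlation matrix afterwards so that the dummy columns are included. The dtype=int and numeric_only=True arguments assume pandas 1.5 or later.

```python
import pandas as pd

# Toy stand-in for the highway data; values are invented for illustration.
toy = pd.DataFrame({
    "rate": [4.6, 3.0, 2.9, 5.1, 6.3, 2.2],
    "len": [4.9, 19.0, 9.7, 6.1, 8.0, 12.3],
    "htype": ["FAI", "PA", "PA", "MA", "MC", "FAI"],
})

# One 0/1 indicator column per highway type.
dummies = pd.get_dummies(toy["htype"], dtype=int)
toy = pd.concat([toy, dummies], axis=1)

# Recompute the matrix so the dummy columns are part of it.
toy_corr = toy.corr(numeric_only=True)
print(toy_corr.columns.tolist())
```

The original text column htype stays out of the matrix, while each of its categories now has its own row and column.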


In addition, note that the correlation matrix is symmetric: the upper triangle mirrors the lower triangle. Thus, there is no need for our heatmap to show the entire matrix. We’ll hide the upper triangle in the next step.
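
If you want to convince yourself of the symmetry, a quick check along these lines should do. The data is random and the column names are hypothetical; only the shape of the check matters.

```python
import numpy as np
import pandas as pd

# Random data with hypothetical column names, only to illustrate symmetry.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])
demo_corr = demo.corr()

# corr(x, y) equals corr(y, x), so the matrix equals its transpose.
is_symmetric = np.allclose(demo_corr.to_numpy(), demo_corr.to_numpy().T)
# Every variable correlates perfectly with itself.
diagonal_is_one = np.allclose(np.diag(demo_corr.to_numpy()), 1.0)
print(is_symmetric, diagonal_is_one)
```

Because the diagonal is always 1, it carries no information either, which is why the mask in the next step hides it along with the upper triangle.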

3) Set Up Mask To Hide Upper Triangle

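import numpy as np
# Use the built-in bool; the np.bool alias was deprecated in NumPy 1.20 and removed in 1.24
mask = np.zeros_like(corr_matrix, dtype=bool)
mask[np.triu_indices_from(mask)] = True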


Let’s break the above code down. np.zeros_like() returns an array of zeros with the same shape and type as the given array. By passing in the correlation matrix, we get an array of zeros with the same shape as the matrix.

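The dtype parameter overrides the data type, so our array holds booleans (False everywhere) instead of numeric zeros. Pass the built-in bool here: the np.bool alias was deprecated in NumPy 1.20 and removed in 1.24.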

np.triu_indices_from(mask) returns the indices for the upper triangle of the array, including the main diagonal.
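
To see what those indices look like, here is a minimal sketch on a 3x3 array (the array name small is just for illustration). It also shows the optional k argument, in case you would rather keep the diagonal visible.

```python
import numpy as np

small = np.zeros((3, 3), dtype=bool)

# Row and column positions of the upper triangle, diagonal included.
rows, cols = np.triu_indices_from(small)
print(rows)  # [0 0 0 1 1 2]
print(cols)  # [0 1 2 1 2 2]

# k=1 starts one step above the main diagonal, so the diagonal is left out.
rows_k1, cols_k1 = np.triu_indices_from(small, k=1)
print(rows_k1)  # [0 0 1]
print(cols_k1)  # [1 2 2]
```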

Now, we set the upper triangle to True.

mask[np.triu_indices_from(mask)] = True

Now, we have a mask that we can use to generate our heatmap.
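
The sketch below rebuilds the mask on a tiny, invented 3x3 correlation matrix and compares it with a one-line alternative based on np.triu and np.ones_like. It then uses pandas' DataFrame.mask to preview which cells will stay visible; that preview is only an illustration and is not part of the original workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 correlation matrix; values are invented for illustration.
corr = pd.DataFrame(
    [[1.0, 0.2, -0.5], [0.2, 1.0, 0.3], [-0.5, 0.3, 1.0]],
    index=["a", "b", "c"], columns=["a", "b", "c"],
)

# Two-step construction, as in the article.
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

# One-step alternative: keep the upper triangle of an all-True array.
mask_alt = np.triu(np.ones_like(corr, dtype=bool))

# Preview the cells that stay visible; masked cells become NaN.
visible = corr.mask(mask)
print(visible)
```

Both constructions give the same boolean array, so which one you use is a matter of taste.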

4) Create Heatmap in Seaborn

import matplotlib.pyplot as plt
import seaborn as sns

# Set the style before creating the figure; it only applies to axes created afterwards
sns.set_style({'xtick.bottom': True, 'ytick.left': True})

f, ax = plt.subplots(figsize=(11, 15))

heatmap = sns.heatmap(corr_matrix,
                      mask = mask,
                      square = True,
                      linewidths = .5,
                      cmap = 'coolwarm',
                      cbar_kws = {'shrink': .4,
                                  'ticks' : [-1, -.5, 0, 0.5, 1]},
                      vmin = -1,
                      vmax = 1,
                      annot = True,
                      annot_kws = {'size': 12})

# Add the column names as labels
ax.set_yticklabels(corr_matrix.columns, rotation = 0)
ax.set_xticklabels(corr_matrix.columns)


To create our heatmap, we pass in our correlation matrix from step 2 and the mask we created in step 3, along with custom parameters to make our heatmap look nicer. Here’s a description of the parameters if you are interested in understanding what each line does.

#Makes each cell square-shaped.
square = True,
#Set width of the lines that will divide each cell to .5
linewidths = .5,
#Map data values to the coolwarm color space
cmap = 'coolwarm',
#Shrink the color bar and label tick marks at [-1, -.5, 0, 0.5, 1]
cbar_kws = {'shrink': .4, 'ticks' : [-1, -.5, 0, 0.5, 1]},
#Set min value for color bar
vmin = -1,
#Set max value for color bar
vmax = 1,
#Turn on annotations for the correlation values
annot = True,
#Set annotations to size 12
annot_kws = {'size': 12})
#Add column names to the x labels
ax.set_xticklabels(corr_matrix.columns)
#Add column names to the y labels and rotate text to 0 degrees
ax.set_yticklabels(corr_matrix.columns, rotation = 0)
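#Show tick marks on the bottom and left of the heatmap (call this before plt.subplots, because style settings only apply to axes created afterwards)
sns.set_style({'xtick.bottom': True, 'ytick.left': True})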
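
For readers who want something to run without the Highway1.csv file, here is a minimal end-to-end sketch on synthetic data. The column names w, x, y and z are hypothetical, the figure size is arbitrary, and the Agg backend line is only there so the script runs without a display. Treat it as an illustration of the parameters above, not as the article's exact notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with hypothetical column names.
rng = np.random.default_rng(42)
demo = pd.DataFrame(rng.normal(size=(100, 4)), columns=["w", "x", "y", "z"])
demo_corr = demo.corr()

# Hide the upper triangle, diagonal included.
mask = np.triu(np.ones_like(demo_corr, dtype=bool))

# Style settings only affect axes created afterwards, so set them first.
sns.set_style({"xtick.bottom": True, "ytick.left": True})

fig, ax = plt.subplots(figsize=(6, 6))
sns.heatmap(demo_corr, mask=mask, square=True, linewidths=0.5, cmap="coolwarm",
            cbar_kws={"shrink": 0.4, "ticks": [-1, -0.5, 0, 0.5, 1]},
            vmin=-1, vmax=1, annot=True, annot_kws={"size": 12}, ax=ax)

# Keep the y labels horizontal.
ax.tick_params(axis="y", labelrotation=0)
```

Passing ax=ax makes it explicit which axes the heatmap is drawn on, which helps when a figure contains more than one plot.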


5) Export Heatmap
Now that you have the heatmap, let’s export it.
heatmap.get_figure().savefig('heatmap.png', bbox_inches='tight')

If you have a very large heatmap that doesn’t export correctly, use bbox_inches = 'tight' to prevent your image from being cut off.
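
As a rough sketch of the export step on its own, the snippet below saves a small stand-in figure with long tick labels, the kind that tends to get clipped without bbox_inches='tight'. The output path and the dpi value are assumptions chosen for the example; any writable path works.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.imshow([[1.0, 0.2], [0.2, 1.0]], cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks([0, 1])
ax.set_xticklabels(["a_long_column_name", "another_long_name"], rotation=90)

# Hypothetical output location in the system's temporary directory.
out_path = os.path.join(tempfile.gettempdir(), "heatmap_demo.png")

# bbox_inches="tight" grows the saved area to fit the rotated labels.
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)
print(os.path.getsize(out_path) > 0)
```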

Thanks for reading! Feel free to share heatmaps that you’ve made with your data in the comments below.

 
Bio: Julia Kho is a Data Scientist passionate about creative problem solving and telling stories with data. She has previous experience in environmental consulting and working with spatial data.

Original. Reposted with permission.
