

# Annotated Heatmaps of a Correlation Matrix in 5 Simple Steps

A heatmap is a graphical representation of data in which values are represented as colors; that is, it uses color to communicate a value to the reader. It is a great tool for directing the audience toward the areas that matter most when you have a large volume of data.

By Julia Kho, Data Scientist

In this article, I will guide you in creating your own annotated heatmap of a correlation matrix in 5 simple steps.

1. Import Data
2. Create Correlation Matrix
3. Set Up Mask To Hide Upper Triangle
4. Create Heatmap in Seaborn
5. Export Heatmap

You can find the code from this article in my Jupyter Notebook located here.

1) Import Data

```python
import pandas as pd

df = pd.read_csv("Highway1.csv", index_col=0)
```

This highway accidents data set contains the automobile accident rate, in accidents per million vehicle miles, along with several design variables. More information about the data set can be found here.
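If you do not have the file at hand, the effect of `index_col = 0` can be seen on a tiny in-memory CSV. This is only a sketch: the two rows and the column names below are made up for illustration and are not taken from the real data set.

```python
import io

import pandas as pd

# Hypothetical two-row CSV; the first (unnamed) column holds the row labels.
csv_text = """,rate,len,htype
1,4.58,4.99,FAI
2,2.86,16.11,FAI
"""

# index_col=0 turns that first column into the DataFrame index
# instead of keeping it as an ordinary data column.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(sample.shape)
print(list(sample.index))
```

Without `index_col = 0`, the row labels would show up as an extra numeric column and later end up in the correlation matrix.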

2) Create Correlation Matrix

```python
# numeric_only=True (pandas 1.5+) skips non-numeric columns such as htype
corr_matrix = df.corr(numeric_only=True)
```

We create the correlation matrix with `.corr()`. Notice that the htype column is not present in this matrix because it is not numeric. To include htype in the correlations, we need to dummify it and then recompute the matrix.

```python
df_dummy = pd.get_dummies(df.htype)
df = pd.concat([df, df_dummy], axis=1)
corr_matrix = df.corr(numeric_only=True)  # now includes the htype dummies
```

In addition, note that the correlation matrix is symmetric: the upper triangle mirrors the lower triangle. Thus, there is no need for our heatmap to show the entire matrix. We’ll hide the upper triangle in the next step.
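The dummify-then-correlate idea and the symmetry claim can be checked on a small stand-in table. This is a minimal sketch under stated assumptions: the DataFrame `toy` and all of its values are hypothetical, and the text column is dropped explicitly so the example does not depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the highway data: two numeric columns plus a
# categorical column named htype (all values are made up for illustration).
toy = pd.DataFrame({
    "rate": [4.6, 4.4, 4.7, 3.9, 2.0, 2.9],
    "len": [5.0, 9.2, 14.3, 16.0, 11.8, 13.3],
    "htype": ["FAI", "FAI", "PA", "PA", "MA", "MA"],
})

# One 0/1 indicator column per category; dtype=int keeps them numeric.
dummies = pd.get_dummies(toy["htype"], dtype=int)
toy = pd.concat([toy, dummies], axis=1)

# Drop the original text column, then correlate what is left.
toy_corr = toy.drop(columns="htype").corr()
print(toy_corr.round(2))

# The matrix equals its own transpose, so one triangle carries all the information.
is_symmetric = np.allclose(toy_corr.to_numpy(), toy_corr.to_numpy().T)
print(is_symmetric)
```

Each htype level now has its own row and column in the matrix, which is what lets a categorical variable appear in the heatmap.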

3) Set Up Mask To Hide Upper Triangle

```python
import numpy as np

mask = np.zeros_like(corr_matrix, dtype=bool)
```

Let’s break the above code down. `np.zeros_like()` returns an array of zeros with the same shape and type as the given array. By passing in the correlation matrix, we get an array of zeros with the same shape as the matrix. The `dtype=bool` parameter overrides the data type, so our array is an array of booleans, all `False`. Use the built-in `bool` here, since the old `np.bool` alias was removed in NumPy 1.24. `np.triu_indices_from(mask)` returns the indices for the upper triangle of the array, including the main diagonal. Now, we set the upper triangle to True.

`mask[np.triu_indices_from(mask)] = True`

Now, we have a mask that we can use to generate our heatmap.
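To see what the mask looks like, here is a small sketch on a 4x4 array standing in for the correlation matrix (the values are illustrative only). It also shows an equivalent one-liner with `np.triu`, and a variant with `k=1` that leaves the diagonal visible; neither variant is part of the original notebook.

```python
import numpy as np

# A small 4x4 stand-in for a correlation matrix (values are illustrative only).
toy_corr = np.array([
    [1.0, 0.2, -0.3, 0.5],
    [0.2, 1.0, 0.4, -0.1],
    [-0.3, 0.4, 1.0, 0.6],
    [0.5, -0.1, 0.6, 1.0],
])

# Two-step version, as in the text: all-False array, then flag the upper triangle.
mask = np.zeros_like(toy_corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

# Equivalent one-liner: np.triu keeps the upper triangle of an all-True array.
mask_one_liner = np.triu(np.ones_like(toy_corr, dtype=bool))

# Variant that leaves the diagonal visible: k=1 starts one step above it.
mask_keep_diag = np.triu(np.ones_like(toy_corr, dtype=bool), k=1)

print(mask.astype(int))
# [[1 1 1 1]
#  [0 1 1 1]
#  [0 0 1 1]
#  [0 0 0 1]]
```

Cells marked 1 (True) are the ones the heatmap will hide, so only the strictly lower triangle is drawn with this mask.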

4) Create Heatmap in Seaborn

```python
import matplotlib.pyplot as plt
import seaborn as sns

# show tick marks on the bottom and left; set before the figure is created
sns.set_style({'xtick.bottom': True}, {'ytick.left': True})

f, ax = plt.subplots(figsize=(11, 15))

heatmap = sns.heatmap(corr_matrix,
                      mask = mask,
                      square = True,
                      linewidths = .5,
                      cmap = 'coolwarm',
                      cbar_kws = {'shrink': .4,
                                  'ticks' : [-1, -.5, 0, 0.5, 1]},
                      vmin = -1,
                      vmax = 1,
                      annot = True,
                      annot_kws = {'size': 12})

# add the column names as labels
ax.set_yticklabels(corr_matrix.columns, rotation = 0)
ax.set_xticklabels(corr_matrix.columns)
```

To create our heatmap, we pass in our correlation matrix from step 2 and the mask we created in step 3, along with custom parameters to make our heatmap look nicer. Here’s a description of the parameters if you are interested in understanding what each line does.

```python
# Hide the cells where the mask is True (the upper triangle)
mask = mask,
# Make each cell square-shaped
square = True,
# Set the width of the lines that divide the cells to .5
linewidths = .5,
# Map data values to the coolwarm color space
cmap = 'coolwarm',
# Shrink the color bar and label tick marks at [-1, -.5, 0, 0.5, 1]
cbar_kws = {'shrink': .4, 'ticks' : [-1, -.5, 0, 0.5, 1]},
# Set min value for color bar
vmin = -1,
# Set max value for color bar
vmax = 1,
# Turn on annotations for the correlation values
annot = True,
# Set annotations to size 12
annot_kws = {'size': 12})
# Add column names to the x labels
ax.set_xticklabels(corr_matrix.columns)
# Add column names to the y labels and rotate text to 0 degrees
ax.set_yticklabels(corr_matrix.columns, rotation = 0)
# Show tick marks on bottom and left of heatmap (affects figures created after the call)
sns.set_style({'xtick.bottom': True}, {'ytick.left': True})
```
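The pieces above can be tried end to end without the highway file. The following is a rough sketch on random data: the DataFrame `demo`, its column names, and the smaller figure size are all assumptions made for illustration. It also passes `ax=ax` explicitly and uses seaborn's `fmt` parameter to round the annotations, neither of which appears in the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 50 rows of random numbers in four made-up columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 4)), columns=["a", "b", "c", "d"])
demo["d"] = demo["a"] * 0.8 + demo["d"] * 0.2  # make one pair clearly correlated

demo_corr = demo.corr()
demo_mask = np.triu(np.ones_like(demo_corr, dtype=bool))  # True = hidden cell

fig, ax = plt.subplots(figsize=(6, 6))
sns.heatmap(demo_corr, mask=demo_mask, ax=ax, square=True, linewidths=.5,
            cmap="coolwarm", vmin=-1, vmax=1, annot=True, fmt=".2f",
            cbar_kws={"shrink": .4, "ticks": [-1, -.5, 0, .5, 1]})

# Masked cells get no annotation, so counting the text labels on the axes
# shows how many cells are actually drawn.
n_labels = len(ax.texts)
print(n_labels)
```

Fixing `vmin` and `vmax` at -1 and 1 keeps zero correlation at the neutral midpoint of the coolwarm palette, so color intensity is comparable across different data sets.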

5) Export Heatmap

Now that you have the heatmap, let’s export it.

`heatmap.get_figure().savefig('heatmap.png', bbox_inches='tight')`

If you find that you have a very large heatmap that doesn’t export correctly, use `bbox_inches = 'tight'` to prevent your image from being cut off.
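As a small sketch of the export step on its own, the snippet below saves a stand-in figure with long, rotated tick labels, the kind of labels that tend to get clipped without `bbox_inches='tight'`. The figure contents, the temporary output folder, and the `dpi=150` setting are assumptions for illustration, not part of the original notebook.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Stand-in figure: a 2x2 grid with deliberately long column labels.
fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow([[1.0, 0.3], [0.3, 1.0]], cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks([0, 1])
ax.set_xticklabels(["a rather long column name", "another long column name"],
                   rotation=90)

# Write into a temporary folder so nothing in the working directory is overwritten.
out_path = os.path.join(tempfile.mkdtemp(), "heatmap.png")

# bbox_inches="tight" grows the saved area to include the labels;
# dpi controls the resolution of the PNG.
fig.savefig(out_path, bbox_inches="tight", dpi=150)
plt.close(fig)

print(os.path.exists(out_path))
```

The same `savefig` call also writes vector formats such as PDF or SVG if you change the file extension.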

Bio: Julia Kho is a Data Scientist passionate about creative problem solving and telling stories with data. She has previous experience in environmental consulting and working with spatial data.

Original. Reposted with permission.
