KDnuggets Home » News » 2015 » Nov » Tutorials, Overviews, How-Tos » Overview of Python Visualization Tools ( 15:n36 )

Overview of Python Visualization Tools


An overview and comparison of the leading data visualization packages and tools for Python, including Pandas, Seaborn, ggplot, Bokeh, pygal, and Plotly.



By Chris Moffitt.

Introduction

In the python world, there are multiple options for visualizing your data. Because of this variety, it can be really challenging to figure out which one to use when. This article contains a sample of some of the more popular ones and illustrates how to use them to create a simple bar chart. I will create examples of plotting data with:

 

In the examples, I will use pandas to manipulate the data and use it to drive the visualization. In most cases these tools can be used without pandas but I think the combination of pandas + visualization tools is so common, it is the best place to start.

What About Matplotlib?

Matplotlib is the grandfather of python visualization packages. It is extremely powerful but with that power comes complexity. You can typically do anything you need using matplotlib but it is not always so easy to figure out. I am not going to walk through a pure Matplotlib example because many of the tools (especially Pandas and Seaborn) are thin wrappers over matplotlib. If you would like to read more about it, I went through several examples in my simple graphing article.

My biggest gripe with Matplotlib is that it just takes too much work to get reasonable looking graphs. In playing around with some of these examples, I found it easier to get nice looking visualization without a lot of code. For one small example of the verbose nature of matplotlib, look at the faceting example on this ggplot post.

Methodology

One quick note on my methodology for this article. I am sure that as soon as people start reading this, they will point out better ways to use these tools. My goal was not to create the exact same graph in each example. I wanted to visualize the data in roughly the same way in each example with roughly the same amount of time researching the solution.

As I went through this process, the biggest challenge I had was formatting the x and y axes and making the data look reasonable given some of the large labels. It also took some time to figure out how each tool wanted the data formatted. Once I figured those parts out, the rest was relatively simple.

Another point to consider is that a bar plot is probably one of the simpler types of graphs to make. These tools allow you to do many more types of plots with data. My examples focus more on the ease of formatting than innovative visualization examples. Also, because of the labels, some of the plots take up a lot of space so I’ve taken the liberty of cutting them off – just to keep the article length manageable. Finally, I have resized images so any blurriness is an issue of scaling and not a reflection on the actual output quality.

Finally, I’m approaching this from the mindset of trying to use another tool in lieu of Excel. I think my examples are more illustrative of displaying in a report, presentation, email or on a static web page. If you are evaluating tools for real time visualization of data or sharing via some other mechanism; then some of these tools offer a lot more capability that I don’t go into.

Data Set

The previous article describes the data we will be working with. I took the scraping example one layer deeper and determined the detail spending items in each category. This data set includes 125 line items but I have chosen to focus only on showing the top 10 to keep it a little simpler. You can find the full data set here.

Pandas

I am using a pandas DataFrame as the starting point for all the various plots. Fortunately, pandas does supply a built in plotting capability for us which is a layer over matplotlib. I will use that as the baseline. First, import our modules and read in the data into a budget DataFrame. We also want to sort the data and limit it to the top 10 items.

budget = pd.read_csv("mn-budget-detail-2014.csv")
budget = budget.sort('amount',ascending=False)[:10]

We will use the same budget lines for all of our examples. Here is what the top 5 items look like:

category detail amount
46 ADMINISTRATION Capitol Renovation and Restoration Continued 126300000
1 UNIVERSITY OF MINNESOTA Minneapolis; Tate Laboratory Renovation 56700000
78 HUMAN SERVICES Minnesota Security Hospital – St. Peter 56317000
0 UNIVERSITY OF MINNESOTA Higher Education Asset Preservation and Replac… 42500000
5 MINNESOTA STATE COLLEGES AND UNIVERSITIES Higher Education Asset Preservation and Replac… 42500000

Now, setup our display to use nicer defaults and create a bar plot:
pd.options.display.mpl_style = 'default'
budget_plot = budget.plot(kind="bar",x=budget["detail"],
title="MN Capital Budget - 2014",
legend=False)

This does all of the heavy lifting of creating the plot using the “detail” column as well as displaying the title and removing the legend. Here is the additional code needed to save the image as a png.
fig = budget_plot.get_figure()
fig.savefig("2014-mn-capital-budget.png")

Here is what it looks like (truncated to keep the article length manageable):
Pandas image

The basics look pretty nice. Ideally, I’d like to do some more formatting of the y-axis but that requires jumping into some matplotlib gymnastics. This is a perfectly serviceable visualization but it’s not possible to do a whole lot more customization purely through pandas.

Seaborn

Seaborn is a visualization library based on matplotlib. It seeks to make default data visualizations much more visually appealing. It also has the goal of making more complicated plots simpler to create. It does integrate well with pandas. My example does not allow seaborn to significantly differentiate itself. One thing I like about seaborn is the various built in styles which allows you to quickly change the color palettes to look a little nicer. Otherwise, seaborn does not do a lot for us with this simple chart. Standard imports and read in the data:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

budget = pd.read_csv("mn-budget-detail-2014.csv")
budget = budget.sort('amount',ascending=False)[:>10]

One thing I found out is that I explicitly had to set the order of the items on the x_axis using x_order This section of code sets the order, and styles the plot and bar chart colors:
sns.set_style("darkgrid")
bar_plot = sns.barplot(x=budget["detail"],y=budget["amount"],
palette="muted",
x_order=budget["detail"].tolist())
plt.xticks(rotation=>90)
plt.show()

Pandas image

As you can see, I had to use matplotlib to rotate the x axis titles so I could actually read them. Visually, the display looks nice. Ideally, I’d like to format the ticks on the y-axis but I couldn’t figure out how to do that without using plt.yticks from matplotlib.

ggplot

ggplot is similar to Seaborn in that it builds on top of matplotlib and aims to improve the visual appeal of matplotlib visualizations in a simple way. It diverges from seaborn in that it is a port of ggplot2 for R. Given this goal, some of the API is non-pythonic but it is a very powerful. I have not used ggplot in R so there was a bit of a learning curve. However, I can start to see the appeal of ggplot. The library is being actively developed and I hope it continues to grow and mature because I think it could be a really powerful option. I did have a few times in my learning where I struggled to figure out how to do something. After looking at the code and doing a little googling, I was able to figure most of it out. Go ahead and import and read our data:
import pandas as pd
from ggplot import *

budget = pd.read_csv("mn-budget-detail-2014.csv")
budget = budget.sort('amount',ascending=False)[:>10]

Now we construct our plot by chaining together a several ggplot commands:
p = ggplot(budget, aes(x="detail",y="amount")) + \
geom_bar(stat="bar", labels=budget["detail"].tolist()) +\
ggtitle("MN Capital Budget - 2014") + \
xlab("Spending Detail") + \
ylab("Amount") + scale_y_continuous(labels='millions') + \
theme(axis_text_x=element_text(angle=>90))
print p

This seems a little strange – especially using print p to display the graph. However, I found it relatively straightforward to figure out. It did take some digging to figure out how to rotate the text 90 degrees as well as figure out how to order the labels on the x-axis. The coolest feature I found was scale_y_continous which makes the labels come through a lot nicer. If you want to save the image, it’s easy with ggsave :
ggsave(p, "mn-budget-capital-ggplot.png")

Here is the final image. I know it’s a lot of grey scale. I could color it but did not take the time to do so.
Pandas image


Sign Up