Prove Your Point with Data and a Fast Python Library

Harness the power of Python and the command line to prove your point using data and a fast data-processing library.

By Matthew Ritter.

Have you ever been part of an e-mail discussion where somebody stated a "fact" that you were certain was not true? Isn't it frustrating to have to stop what you're doing and set things right? Try this lightweight Python library, and you'll have conclusive results to paste into your reply faster than you can type "if you run the attached code you will clearly see..."

Enter csvkit

This Python library allows you to harness the power of Python from the command line. As the name implies, it takes in a CSV file and gives you a Swiss Army knife of options for one-line analysis. I'll walk you through the steps with some real data from a series of chess games.

Getting the Data

First you have to download the data. We'll do a simple curl from the website that hosts it:

   curl -O

What does curl do? It downloads a file to the current directory, exactly as if you'd done it through a web browser. It can be extremely useful when there are a lot of data files with similar names that you want to cycle through (like "website_visitors_201501.csv", "website_visitors_201502.csv", etc.).
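If you'd rather cycle through those similarly named files from Python, here is a minimal sketch of the idea. Nothing here comes from csvkit: `monthly_filenames` and `download_all` are hypothetical helpers I made up for illustration, and the base URL is a placeholder you would have to supply yourself.

```python
from urllib.request import urlretrieve


def monthly_filenames(prefix, year, first_month, last_month):
    """Build names like website_visitors_201501.csv for a range of months."""
    return [
        f"{prefix}_{year}{month:02d}.csv"
        for month in range(first_month, last_month + 1)
    ]


def download_all(base_url, filenames):
    """Save each file into the current directory, much like `curl -O` does."""
    for name in filenames:
        # base_url is a placeholder argument, not a real data source.
        urlretrieve(f"{base_url}/{name}", name)


# Only the names are built here; nothing is downloaded until you call
# download_all() with a real base URL.
names = monthly_filenames("website_visitors", 2015, 1, 3)
print(names)
```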


Assuming the site's owners have been paying their hosting bill, you should have a copy of the data as a CSV file in your working directory in just a few seconds. Now that you have that, you can start to use the power of csvkit. Let's take a look at the first few rows:

    csvsort chess.csv | csvlook | head 

Let's take a look at the last few rows by reversing the sort with -r (note that, in contrast to the common 'tail' command, the header is not lost):

    csvsort -r chess.csv | csvlook | head
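To see why the header survives, it may help to spell the same idea out in plain Python. This is only a rough sketch of the behavior, not how csvkit is implemented: `preview` is a hypothetical helper, and the sample rows are made up (only the column names White, Black and Winner are taken from the real data).

```python
import csv
import io

# Made-up rows standing in for chess.csv.
SAMPLE = """White,Black,Winner
Anna,Boris,Anna
Boris,Carla,Carla
Carla,Anna,Carla
Anna,Carla,Anna
"""


def preview(text, n=2, reverse=False):
    """Return the header plus the first n data rows after sorting."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # set the header aside so sorting never moves it
    rows = sorted(reader, reverse=reverse)
    return [header] + rows[:n]


first_rows = preview(SAMPLE)
last_rows = preview(SAMPLE, reverse=True)
print(first_rows)
print(last_rows)
```

Because the header is pulled out before sorting, reversing the order only touches the data rows.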

Ready to see who won the most? Doing a pivot table is a one-line operation:

    csvstat chess.csv -c Winner --freq
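For comparison, here is roughly the same frequency count written out in Python with the standard library. Treat it as a sketch under assumptions: `winner_counts` is a hypothetical helper, and the rows below are invented sample data rather than the real chess results.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for chess.csv.
SAMPLE = """White,Black,Winner
Anna,Boris,Anna
Boris,Carla,Carla
Carla,Anna,Carla
Anna,Carla,Anna
Boris,Anna,Anna
"""


def winner_counts(text, column="Winner"):
    """Count how often each value appears in one column."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row[column] for row in reader)


counts = winner_counts(SAMPLE)
for name, wins in counts.most_common():
    print(name, wins)
```

It works, but it is a dozen lines against one, which is rather the point of the one-liner.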

That's it! You've gone from data to insight in just four steps, and honestly the third was just for fun.

Going Further

Want to see who they won against? There's a nifty tool that lets you execute SQL against the CSV, with no prep work required:

    csvsql --query "select Winner,
          case when White != Winner
             then White else Black end loser,
          count(*) countstar from chess
          group by Winner, loser
          order by count(*) desc" chess.csv
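If you want a rough picture of what "SQL against a CSV" involves, the sketch below loads some rows into an in-memory SQLite table with Python's built-in sqlite3 module and runs a winner-versus-loser query. It is not csvkit's actual implementation: `head_to_head` is a hypothetical helper, the sample rows are made up, and I've added extra sort keys so the output order is stable.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for chess.csv.
SAMPLE = """White,Black,Winner
Anna,Boris,Anna
Boris,Carla,Carla
Carla,Anna,Carla
Anna,Carla,Anna
Boris,Anna,Anna
"""

QUERY = """
    select Winner,
           case when White != Winner then White else Black end as loser,
           count(*) as countstar
    from chess
    group by Winner, loser
    order by countstar desc, Winner, loser
"""


def head_to_head(text):
    """Load CSV text into an in-memory table and count wins per pairing."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn = sqlite3.connect(":memory:")
    conn.execute("create table chess (White text, Black text, Winner text)")
    conn.executemany("insert into chess values (:White, :Black, :Winner)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


pairings = head_to_head(SAMPLE)
for winner, loser, wins in pairings:
    print(winner, "beat", loser, wins, "time(s)")
```

Grouping by both the winner and the loser is what gives one row per pairing.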

csvkit Bar Graph

Maybe all of this text is going to overwhelm your reader, and you want to output a graph. No problem! Just break out another little library, and get a bar plot in the command line:

   csvsql --query "select Winner, count(*) countstar from chess
         group by Winner
         order by count(*)" chess.csv \
      | csvformat -D : | tail -n +2 | asciigraph
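The pipeline above reformats each row as label:value, drops the header, and hands the rest to the graphing tool. If that tool isn't installed, a text bar chart is easy enough to fake. The sketch below is a stand-in I wrote for illustration, not the asciigraph tool itself: `bar_chart` is a hypothetical helper and the win counts are made up.

```python
def bar_chart(pairs, width=20, mark="#"):
    """Render (label, value) pairs as text bars scaled to the largest value."""
    if not pairs:
        return []
    largest = max(value for _, value in pairs)
    label_width = max(len(label) for label, _ in pairs)
    lines = []
    for label, value in pairs:
        # Scale each bar so the largest value fills the full width.
        length = round(width * value / largest) if largest else 0
        lines.append(f"{label.ljust(label_width)} | {mark * length} {value}")
    return lines


# Made-up win counts, smallest first to match the ascending sort in the query.
chart = bar_chart([("Carla", 2), ("Anna", 3)])
for line in chart:
    print(line)
```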

That's a lightning-quick introduction to a lightning-quick library. There is much more information, including installation instructions, on the project's excellent documentation website.

Matthew Ritter writes practical, action-oriented data advice at Preinvented Wheel.