Prove Your Point with Data and a Fast Python Library
Harness the power of Python and the command line to prove your point using data and a fast data-processing library.
By Matthew Ritter.
This is a lightning-quick introduction to a lightning-quick library, csvkit. There is much more information, including installation instructions, on its excellent documentation website.
Matthew Ritter writes practical, action-oriented data advice at Preinvented Wheel.
Have you ever been part of an e-mail discussion where somebody stated a "fact" that you were certain was not true? Isn't it frustrating to have to stop what you're doing and set things right? Try this lightweight Python library, and you'll have conclusive results to paste into your reply faster than you can type "if you run the attached code you will clearly see..."
Enter csvkit
This Python library, really a suite of command-line tools, lets you harness the power of Python from the command line. As the name implies, it takes in a CSV file and gives you a Swiss Army knife of options for one-line analysis. I'll walk you through the steps with some real data from a series of chess games.

Getting the Data
First, download the data. A simple curl from their website will do:
curl -O https://blog.chartio.com/assets/images/blog/2013/csv/chess.csv
What does curl do? It downloads a file to the current directory, exactly as if you'd downloaded it through a web browser; the -O flag saves it under its remote name, here chess.csv. It can be extremely useful when there are a lot of data files with similar names that you want to cycle through (like "website_visitors_201501.csv", "website_visitors_201502.csv", etc.).
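To make the "cycle through similar names" idea concrete, here is a minimal shell sketch. The base URL https://example.com/data and the function name visitor_urls are hypothetical placeholders; the file names simply follow the monthly pattern mentioned above. The function only prints URLs, so you can check the list before downloading anything.

```shell
# Hypothetical location of the monthly files; replace with the real one.
base_url="https://example.com/data"

# Print one URL per month of 2015, zero-padding the month (01..12)
# so the names match website_visitors_201501.csv, _201502.csv, etc.
visitor_urls() {
  for month in $(seq 1 12); do
    printf '%s/website_visitors_2015%02d.csv\n' "$base_url" "$month"
  done
}

# Download them one at a time; -O keeps each file's remote name.
# visitor_urls | xargs -n 1 curl -O
```

Uncomment the last line to actually fetch the files. curl can also expand a numeric range by itself through URL globbing, as in curl -O "https://example.com/data/website_visitors_2015[01-12].csv", which should issue the same twelve requests.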
Analysis
Assuming that they have been paying their hosting bill, you should have a copy of their data as a CSV file in your working directory in just a few seconds. Now you can start to use the power of csvkit. Let's take a look at the first few rows; csvsort orders the rows (by all columns unless you pick some with -c), csvlook draws them as a readable table, and head trims the output:
csvsort chess.csv | csvlook | head
Now let's look at the other end of the sorted table by reversing the order with -r (note that, in contrast to piping through the common tail command, the header is not lost):
csvsort -r chess.csv | csvlook | head
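csvsort reorders the rows, so its output is not the end of the file as written. If you want the literal last rows in their original order with the header kept, a plain-shell sketch like this one works on simple CSV files. The file sample_games.csv, its made-up games, and the function tail_with_header are hypothetical; only the White, Black and Winner column names come from the query used later in this article.

```shell
# Hypothetical miniature stand-in for chess.csv (made-up games).
cat > sample_games.csv <<'EOF'
White,Black,Winner
Alice,Bob,Alice
Bob,Carol,Carol
Carol,Alice,Alice
Alice,Carol,Carol
Bob,Alice,Alice
EOF

# Print the header plus the last N data rows, in file order.
# tail alone would drop the header; head -n 1 puts it back.
tail_with_header() {
  head -n 1 "$1"
  tail -n +2 "$1" | tail -n "$2"
}

tail_with_header sample_games.csv 2
```

On the real data, tail_with_header chess.csv 5 | csvlook would render the result as the same kind of aligned table.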
Ready to see who won the most? A frequency count, the simplest kind of pivot table, is a one-line operation:
csvstat chess.csv -c Winner --freq
That's it! You've gone from data to insight in just four steps, and honestly the third was just for fun.
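If you want to cross-check what csvstat --freq reports without csvkit, the same frequency count can be sketched with awk and sort. This is a swapped-in, plain-shell technique, not how csvstat works internally. The sample file and the freq function are hypothetical, and the naive comma splitting assumes no quoted field contains a comma.

```shell
# Hypothetical miniature stand-in for chess.csv (made-up games).
cat > sample_games.csv <<'EOF'
White,Black,Winner
Alice,Bob,Alice
Bob,Carol,Carol
Carol,Alice,Alice
Alice,Carol,Carol
Bob,Alice,Alice
EOF

# Count how often each value appears in the named column,
# printing "count,value" lines, most frequent first.
# Assumes simple CSV: no quoted fields containing commas.
freq() {
  awk -F, -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx { counts[$idx]++ }
    END { for (k in counts) print counts[k] "," k }
  ' "$1" | LC_ALL=C sort -t, -k1,1nr -k2,2
}

freq sample_games.csv Winner
```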
Going Further
Want to see who they won against? csvkit's csvsql tool lets you execute SQL against the CSV with no prep work required; the table is named after the file, so chess.csv becomes chess:
csvsql --query "select Winner,
case when White != Winner
then White else Black end loser,
count(*) countstar from chess
group by Winner, loser
order by count(*) desc" chess.csv

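The logic of that query can also be spelled out in plain shell, which helps when checking the CASE expression by hand: the loser is White unless White is the winner, in which case it is Black. The sketch below counts each winner/loser pair with awk. The sample data and the win_pairs function are hypothetical, the comma splitting assumes no quoted commas, and, like the query, it does not filter out drawn games if the data has any.

```shell
# Hypothetical miniature stand-in for chess.csv (made-up games).
cat > sample_games.csv <<'EOF'
White,Black,Winner
Alice,Bob,Alice
Bob,Carol,Carol
Carol,Alice,Alice
Alice,Carol,Carol
Bob,Alice,Alice
EOF

# Count wins per (winner, loser) pair as "count,winner,loser", most first.
# Loser rule: White, unless White is the winner; otherwise Black.
# Assumes simple CSV with White, Black and Winner header names.
win_pairs() {
  awk -F, '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i   # map header name -> field number
      next
    }
    {
      white = $(col["White"]); black = $(col["Black"]); winner = $(col["Winner"])
      loser = (white != winner) ? white : black
      pairs[winner "," loser]++
    }
    END { for (p in pairs) print pairs[p] "," p }
  ' "$1" | LC_ALL=C sort -t, -k1,1nr -k2,2 -k3,3
}

win_pairs sample_games.csv
```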
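Maybe all of this text is going to overwhelm your reader, and you want to output a graph. No problem! Just break out another little library, asciigraph, and get a bar plot in the command line. Here csvformat -D : switches the delimiter to a colon and tail -n +2 drops the header row, so asciigraph receives one label:value line per winner: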
csvsql --query "select Winner, count(*) countstar from chess
group by Winner
order by count(*)" chess.csv |
csvformat -D : | tail -n +2 | asciigraph

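If asciigraph is not installed, a crude bar chart can be drawn with awk alone. This is a hypothetical stand-in (bar_chart and its width argument are illustrative names), not asciigraph's implementation: it reads the same label:value lines that csvformat -D : | tail -n +2 produces and scales the longest bar to the requested width.

```shell
# Draw a horizontal ASCII bar chart from "label:value" lines on stdin,
# scaling the longest bar to WIDTH characters (default 40).
bar_chart() {
  awk -F: -v width="${1:-40}" '
    BEGIN { max = 0 }
    {
      label[NR] = $1
      value[NR] = $2 + 0
      if (value[NR] > max) max = value[NR]
    }
    END {
      for (i = 1; i <= NR; i++) {
        n = (max > 0) ? int(value[i] * width / max) : 0
        bar = ""
        for (j = 0; j < n; j++) bar = bar "#"
        printf "%-12s %s %d\n", label[i], bar, value[i]
      }
    }'
}

# Hypothetical counts in the label:value shape described above.
printf 'Carol:2\nAlice:3\n' | bar_chart 10
```

On the real data, replacing asciigraph at the end of the pipeline with bar_chart 40 should give a similar picture. Labels that themselves contain a colon would be split incorrectly.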