Minority Report Visualized – Chicago Police Analyzed

Who watches the watchmen? This article examines the recently released Chicago Police misconduct allegation dataset from the Invisible Institute, published as the Citizens Police Data Project (CPDP).

Data Investigation

As the following graphic shows, misconduct allegations are heavily concentrated in parts of Chicago's South Side and West Side (on the West Side, mainly in Humboldt Park, Garfield Park, North Lawndale, and Austin), with smaller clusters in and around the Loop and further north in Uptown and Lake View.

Map of misconduct allegations.

Of course, like any city, Chicago is not a homogeneous entity. Its neighborhoods vary widely by race, income, crime, and innumerable other factors. There are many established and evolving theories on how demographics (race, gender, income, and so on), the prevalence of crime, and allegations of police misconduct (sustained or otherwise) relate to one another, but there is no denying that these factors combine to create the complex realities that exist within the different pockets of our cities. Using the urban data clearinghouse website City Data, we can compare additional data on particular Chicago neighborhoods in an attempt to paint a more informative picture.

Having broadly identified two major geographic areas of police misconduct allegations in Chicago, namely the South Side and the West Side, we will choose four neighborhoods with a relatively high number of allegations, two each from the South and West Sides, and dig further into their data.

The neighborhoods we will focus on are Austin and North Lawndale on the West Side, and Englewood and South Shore on the South Side.

The CPDP website offers a number of visualizations and tables for data exploration. Once you drill down to anything more granular than the entire dataset, you are given the option to download a table of the data for that selection. As alluded to above, I selected neighborhood as the level of granularity for my analysis.

If we download the table for a particular neighborhood, we have access to a wealth of information on each misconduct allegation in that neighborhood. With several such tables in hand, we can compare and contrast allegation data across neighborhoods. For instance, we could compare the number of allegations in each misconduct category between two neighborhoods. You can see just such a comparative analysis below.

Misconduct allegation categories for Austin and Englewood.

As a side note, this simple exercise was undertaken mainly using Python and the Pandas and Matplotlib libraries. It is not meant to represent any substantial analysis, but instead to demonstrate some of what is available in the data and what types of analysis could be performed. If you are interested, the analysis, including some additional data from City Data on the selected neighborhoods, is contained in this IPython Notebook.
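The category comparison described above could be sketched in Pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the neighborhood labels are placeholders, the rows are invented, and the `Category` column name is an assumption rather than the real CPDP export schema.

```python
import pandas as pd

# Invented rows standing in for two downloaded neighborhood tables.
# "Category" is an assumed column name, not the confirmed CPDP schema.
table_a = pd.DataFrame(
    {"Category": ["Use of Force", "Illegal Search", "Use of Force", "Verbal Abuse"]}
)
table_b = pd.DataFrame(
    {"Category": ["Illegal Search", "Illegal Search", "Use of Force"]}
)


def category_counts(tables):
    """Count allegations per category for each neighborhood, side by side."""
    counts = {name: df["Category"].value_counts() for name, df in tables.items()}
    # A category absent from one neighborhood becomes 0 rather than NaN.
    return pd.DataFrame(counts).fillna(0).astype(int).sort_index()


comparison = category_counts({"Neighborhood A": table_a, "Neighborhood B": table_b})
print(comparison)
```

Calling `comparison.plot(kind="bar")` on the result would draw a grouped bar chart through Matplotlib, similar in spirit to the figure above.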

Using the same tables downloaded from the CPDP, we can visualize the sustained allegation categories for the South Shore and North Lawndale neighborhoods, as shown below.

Sustained allegation categories for South Shore and North Lawndale.
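Restricting a neighborhood table to sustained allegations is a simple filter. The sketch below assumes a hypothetical `Finding` column whose value is `Sustained` for sustained allegations; the real export may name or encode this differently, and the rows are invented.

```python
import pandas as pd

# Invented rows; "Category" and "Finding" are assumed column names.
neighborhood = pd.DataFrame(
    {
        "Category": ["Use of Force", "Verbal Abuse", "Illegal Search", "Use of Force"],
        "Finding": ["Not Sustained", "Sustained", "Unfounded", "Sustained"],
    }
)


def sustained_summary(df):
    """Return per-category counts of sustained allegations and the sustained share."""
    # Normalize case and whitespace so " sustained " still matches.
    is_sustained = df["Finding"].str.strip().str.lower() == "sustained"
    sustained = df[is_sustained]
    share = len(sustained) / len(df) if len(df) else 0.0
    return sustained["Category"].value_counts(), share


sustained_counts, sustained_share = sustained_summary(neighborhood)
print(sustained_counts)
print(f"Share sustained: {sustained_share:.0%}")
```

An exact match is used instead of a substring test so that "Not Sustained" is not counted by mistake.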


The CPDP's data can be explored directly in its web app, and the Invisible Institute hopes that its initiative will become a national model for transparency and accountability.

Of course, while this analysis has focused on the neighborhood level, more granular investigations could be performed as well, as could explorations focused on race, officers, police beats, length of allegation investigations, and much more. The web app limits which explorations can be performed, but the CPDP is good enough to allow a wealth of tabular data to be downloaded for unlimited independent investigation. The full dataset cannot be downloaded in a single click, but you could download the tables for all 77 Chicago neighborhoods (community areas) and build your own city-wide dataset, though this would obviously take some time and finesse.
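Stitching per-neighborhood downloads into one city-wide table might look like the sketch below. It assumes the tables are CSV files sharing the same columns, which is not confirmed here; in-memory strings with placeholder names and invented rows stand in for the downloaded files.

```python
import io

import pandas as pd

# In-memory stand-ins for downloaded files, keyed by neighborhood name.
# In practice the values would be file paths; the columns are assumptions.
downloads = {
    "Neighborhood A": io.StringIO(
        "Category,Finding\nUse of Force,Sustained\nIllegal Search,Unfounded\n"
    ),
    "Neighborhood B": io.StringIO("Category,Finding\nVerbal Abuse,Not Sustained\n"),
}


def build_citywide(sources):
    """Read each neighborhood table, tag its rows, and stack them into one frame."""
    frames = []
    for name, source in sources.items():
        df = pd.read_csv(source)
        df["Neighborhood"] = name  # remember which table each row came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


citywide = build_citywide(downloads)
print(citywide)
```

Tagging each row with its source neighborhood keeps the combined table usable for the per-neighborhood comparisons shown earlier.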

Although time restrictions didn't allow me to do so, it would also be interesting to pair the CPDP's full data with that of City Data in order to build a fuller picture of Chicago and its interaction with its police service. Deeper analysis could be performed, and perhaps even some predictive analytics, such as attempting to determine the specific factors leading to sustained allegations.
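One way to pair the two sources would be a join on neighborhood name, sketched below. Everything in it is hypothetical: the column names, the placeholder neighborhoods, and the numbers are invented for illustration and do not come from the CPDP or City Data.

```python
import pandas as pd

# Invented per-neighborhood allegation totals (placeholder names and values).
allegations = pd.DataFrame(
    {"Neighborhood": ["A", "B", "C"], "allegations": [120, 45, 300]}
)
# Invented demographic table standing in for City Data figures.
city_data = pd.DataFrame(
    {"Neighborhood": ["A", "B"], "median_income": [30000, 52000]}
)

# A left join keeps every neighborhood that has allegation data;
# validate guards against duplicate neighborhood rows on either side.
merged = allegations.merge(
    city_data,
    on="Neighborhood",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Neighborhoods with no demographic match are worth checking by hand,
# since the two sources may spell or split names differently.
unmatched = merged.loc[merged["_merge"] == "left_only", "Neighborhood"].tolist()
print(merged)
print("No City Data match:", unmatched)
```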

Clustering neighborhoods based on this combined data, or classifying them by whether their number of misconduct allegations is above or below average, might be interesting. Unfortunately, I have been unable to get my hands on the full dataset thus far, although building my own seems feasible given a few hours of time to kill.
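The above/below-average labeling mentioned here is straightforward once per-neighborhood totals exist. A minimal sketch, using invented counts for placeholder neighborhoods:

```python
import pandas as pd

# Invented allegation totals for placeholder neighborhoods.
totals = pd.Series({"A": 120, "B": 45, "C": 300, "D": 75}, name="allegations")


def label_by_mean(counts):
    """Label each neighborhood relative to the mean allegation count."""
    mean = counts.mean()
    # Counts exactly equal to the mean are treated as not above average.
    return counts.gt(mean).map({True: "above average", False: "at or below average"})


labels = label_by_mean(totals)
print(labels)
```

These labels could then serve as the target for a classifier trained on neighborhood features, though that step is not shown here.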

Ultimately, I wish that the entire CPDP dataset were readily available for download, as I would like to (easily) perform some more extensive data analysis. Even so, the CPDP provides an extensive amount of data related to Chicago Police misconduct allegations, and it could be a great source for such exploration if you were so inclined.

Bio: Matthew Mayo is a computer science graduate student currently working on his thesis parallelizing machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.

Thanks to Jon Lehto for the link to Chicago Police Data.