Deep Learning to Fight Crime

We look at how Deep Learning, Spark, and the H2O Machine Learning platform can be used to analyze and predict crime in San Francisco and Chicago.

Using H2O Flow, we compared the arrest rate of every category of recorded crime in Chicago against the percentage of all crimes that category represents. Some of the categories with the highest arrest rates are also among the least frequent, and vice versa.
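
The two quantities plotted per category are simple aggregates: arrests divided by incidents within a category, and incidents in that category divided by all incidents. The stdlib-only Python below is a minimal sketch of that arithmetic, not the H2O Flow code behind Fig. 4; the `RECORDS` list and the `arrest_stats` helper are hypothetical toy stand-ins for the real Chicago data.

```python
from collections import Counter

# Hypothetical toy records: (crime category, whether an arrest was made).
# Illustrative values only, NOT taken from the Chicago dataset.
RECORDS = [
    ("NARCOTICS", True), ("NARCOTICS", True), ("NARCOTICS", False),
    ("THEFT", False), ("THEFT", False), ("THEFT", False), ("THEFT", True),
    ("GAMBLING", True),
]


def arrest_stats(records):
    """Return {category: (arrest_rate, share_of_all_crimes)}."""
    totals = Counter(cat for cat, _ in records)                 # incidents per category
    arrests = Counter(cat for cat, hit in records if hit)       # arrests per category
    n = len(records)                                            # all incidents
    return {cat: (arrests[cat] / count, count / n) for cat, count in totals.items()}


if __name__ == "__main__":
    for cat, (rate, share) in sorted(arrest_stats(RECORDS).items()):
        print(f"{cat:10s} arrest rate {rate:.0%}  share of crimes {share:.1%}")
```

In this toy data, GAMBLING has a 100% arrest rate but is only 12.5% of incidents, while THEFT is half of all incidents with a 25% arrest rate, which is the same inverse pattern described above.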


Figure 4: Chicago arrest rates and total % of all crimes by category

H2O Flow allows users to construct their own custom graphs from imported data.

Below is the code used to generate the graph in Fig. 4.


Figure 5: Code used to generate the graph in Fig. 4

Once the data was transformed to an H2O RDD, we trained a Deep Neural Network to predict whether an arrest would be more or less likely to be made for a given crime. Scoring the trained model against the validation dataset gave an AUC (area under the ROC curve) of 0.92 for Chicago and 0.95 for San Francisco.
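
An AUC is a single number rather than a curve: it is the area under the ROC curve, which equals the probability that the model scores a randomly chosen crime that led to an arrest higher than a randomly chosen crime that did not (ties counting one half). The stdlib-only Python below is a minimal sketch of that pairwise definition, to show what the 0.92 and 0.95 figures measure; it is not H2O's implementation, and the `roc_auc` name and toy inputs are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative), ties = 1/2.

    labels: 1 if an arrest was made, 0 otherwise (hypothetical encoding).
    scores: model-predicted arrest probabilities for the same rows.
    O(P * N) pairwise comparison; fine for a sketch, slow for large data.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy validation set: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Read this way, an AUC of 0.92 means that in roughly 92% of such pairs the model ranked the arrest case above the non-arrest case; 0.5 would be no better than chance.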


Figure 6: Chicago validation data AUC


Figure 7: Chicago Crime Geomapped predictions


Because each reported crime comes with latitude and longitude coordinates, we were able to plot the predictions on a map of Chicago, specifically the Downtown district. The color coding corresponds to the model's predicted probability of an arrest, with red meaning very likely (probability above 0.8) and blue meaning unlikely (probability below 0.2).
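
The color coding amounts to bucketing each predicted probability before plotting it at the crime's coordinates. The Python below is a minimal sketch of that mapping: the red (> 0.8) and blue (< 0.2) thresholds come from the text, while the `arrest_color` name, the "gray" bucket for everything in between, and the sample coordinates are hypothetical choices, not the article's plotting code.

```python
def arrest_color(p):
    """Map a predicted arrest probability to a marker color.

    Thresholds follow the article: red if p > 0.8, blue if p < 0.2.
    The 'gray' middle bucket is a hypothetical choice for illustration.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p > 0.8:
        return "red"
    if p < 0.2:
        return "blue"
    return "gray"


# Hypothetical (latitude, longitude, predicted probability) rows near downtown Chicago.
predictions = [(41.8837, -87.6289, 0.91), (41.8781, -87.6298, 0.12), (41.8855, -87.6217, 0.55)]
markers = [(lat, lon, arrest_color(p)) for lat, lon, p in predictions]
print(markers)
```

Using strict inequalities keeps exactly 0.8 and 0.2 out of the red and blue buckets, matching the "X > 0.8" and "X < 0.2" wording.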

Open data cities have a tremendous opportunity to use machine learning, as we've demonstrated, to improve operations. In this case, police departments nationwide could be smarter about how they dispatch officers to particular 911 calls: instead of redeploying officers from patrols across town, they may be able to send smaller response teams of local beat cops based on the predicted likelihood of an arrest. Alternatively, if specific calls require additional backup, dispatchers could be empowered to send in more officers.

Smart analytics + resource management = safer streets.

Bio: Alex Tellez and Michal Malohlava are software engineers and Machine Learning experts at