Deep Learning to Fight Crime

We look at how Deep Learning, Spark, and the H2O Machine Learning platform can be used to analyze and predict crime in San Francisco and Chicago.

By Alex Tellez & Michal Malohlava

We’ve seen some incredible applications of Deep Learning in image recognition and machine translation. Recently, though, we wanted to know whether we could build an application focused on public safety; in particular, whether Deep Learning can be used to fight crime in the forward-thinking cities of San Francisco and Chicago.

The cool thing about these two cities (and many others) is that they are both open data cities, which means anybody can access city data, ranging from transportation and building maintenance records to utility usage and 911 dispatches. So, if you are a data scientist or thinking about becoming one, there are publicly available city-specific datasets you can play with.

For this example, we looked at the historical crime data sets from both Chicago and San Francisco. To give us a more rounded data set, we then joined this data with other sources, including weather and US Census data, using Spark’s SQL context.

H2O + Spark Workflow

Figure 1: Spark + H2O Workflow

We use Spark for the data import, ad-hoc data munging (parsing the date column, for example), and joining of tables, and then publish the resulting Spark RDD as an H2O Frame.
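To make the munging and joining steps concrete, here is a minimal sketch in plain Python. It is not the Spark or H2O code we ran: it only illustrates the logic of parsing a date column into numeric parts and joining crime records to weather readings by day. The sample rows, the column names (`primary_type`, `mean_temp_c`), the helper names (`refine_date`, `join_on_day`), and the timestamp layout are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical sample rows; the real city datasets carry many more columns.
crimes = [
    {"date": "02/08/2015 11:43:40 PM", "primary_type": "THEFT"},
    {"date": "02/09/2015 12:05:00 AM", "primary_type": "BATTERY"},
    {"date": "07/04/2015 06:15:00 PM", "primary_type": "ASSAULT"},
]
weather = [
    {"year": 2015, "month": 2, "day": 8, "mean_temp_c": -4.0},
    {"year": 2015, "month": 7, "day": 4, "mean_temp_c": 24.5},
]

# Assumed timestamp layout (month/day/year, 12-hour clock); adjust to the data.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def refine_date(row):
    """Parse the raw date string and add numeric date-part columns."""
    ts = datetime.strptime(row["date"], DATE_FORMAT)
    return {**row, "year": ts.year, "month": ts.month, "day": ts.day, "hour": ts.hour}


def join_on_day(crime_rows, weather_rows):
    """Inner-join crimes to weather on (year, month, day)."""
    weather_by_day = {(w["year"], w["month"], w["day"]): w for w in weather_rows}
    joined = []
    for c in crime_rows:
        w = weather_by_day.get((c["year"], c["month"], c["day"]))
        if w is not None:  # crimes with no weather reading are dropped
            joined.append({**c, "mean_temp_c": w["mean_temp_c"]})
    return joined


joined = join_on_day([refine_date(c) for c in crimes], weather)
for row in joined:
    print(row["primary_type"], row["hour"], row["mean_temp_c"])
```

In a Spark setting the same idea would typically be expressed as a SQL join over registered tables rather than a dictionary lookup; the sketch uses an inner join, so any crime without a matching weather reading is left out of the result.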

Figures 2 and 3 show some cool visualizations we made from the joined table using Flow, which ships as part of our latest H2O product and is available for download.

Crime in San Francisco

Figure 2: San Francisco crime visualizations

Crime in Chicago

Figure 3: Chicago crime visualizations


Interestingly enough, in both cities crime seems to occur most frequently during the winter, a surprising fact given how cold the weather gets in Chicago! We also discovered that the times when you don’t want to be on the streets (that is, when the majority of crimes occurred) were midnight, noon and 6pm.