Deep Learning to Fight Crime
We look at how Deep Learning, Spark, and the H2O machine learning platform can be used to analyze and predict crime in San Francisco and Chicago.
We’ve seen some incredible applications of Deep Learning in image recognition and machine translation. Recently, we wanted to know whether we could build an application for public safety: in particular, whether Deep Learning can be used to fight crime in the forward-thinking cities of San Francisco and Chicago.
The cool thing about these two cities (and many others) is that they are both open data cities, which means anybody can access city data, ranging from transportation and building maintenance records to utility usage and 911 dispatches. So, if you are a data scientist or thinking about becoming one, there are publicly available city-specific datasets you can play with.
For this example, we looked at the historical crime datasets from both Chicago and San Francisco. To give us a more rounded dataset, we then joined this data with other sources, including weather and US Census data, using Spark’s SQL context.
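To make the shape of that join concrete, here is a minimal sketch. It is not our Spark code: it uses Python's built-in sqlite3 module as a stand-in for Spark's SQL context, and every table name, column name, and row in it is hypothetical and made up purely for illustration. The idea it shows is simply that crime records pick up weather columns by matching on date and census columns by matching on area.

```python
import sqlite3

# In-memory SQLite database standing in for Spark's SQL context (an assumption
# for illustration; the real pipeline registers Spark tables instead).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crimes  (crime_id INTEGER, crime_date TEXT,
                          community_area INTEGER, primary_type TEXT);
    CREATE TABLE weather (obs_date TEXT, mean_temp_f REAL);
    CREATE TABLE census  (community_area INTEGER, per_capita_income INTEGER);
""")

# Made-up rows, present only to exercise the join. They are not real data.
conn.executemany("INSERT INTO crimes VALUES (?, ?, ?, ?)", [
    (1, "2015-01-10", 35, "THEFT"),
    (2, "2015-01-10", 8, "BATTERY"),
    (3, "2015-07-04", 35, "NARCOTICS"),
])
conn.executemany("INSERT INTO weather VALUES (?, ?)", [
    ("2015-01-10", 18.0),
    ("2015-07-04", 75.0),
])
conn.executemany("INSERT INTO census VALUES (?, ?)", [
    (35, 20000),
    (8, 50000),
])

# Each crime row gains weather columns (joined on date) and
# census columns (joined on community area).
JOIN_SQL = """
    SELECT c.crime_id, c.primary_type, w.mean_temp_f, s.per_capita_income
    FROM crimes c
    JOIN weather w ON c.crime_date = w.obs_date
    JOIN census  s ON c.community_area = s.community_area
    ORDER BY c.crime_id
"""
joined = conn.execute(JOIN_SQL).fetchall()
for row in joined:
    print(row)
```

A query of roughly this form could be passed to a Spark SQL context once the three sources are registered as tables, though the exact keys depend on how each dataset identifies dates and areas.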
Figure 1: Spark + H2O Workflow
We use Spark for the data import, ad-hoc data munging (parsing the date column, for example), and table joins, and then publish the resulting Spark RDD as an H2O Frame.
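As an example of the date munging mentioned above, the sketch below splits one raw timestamp string into separate columns a model can use. It is plain Python rather than Spark code, and the timestamp layout, the function names `season_of` and `refine_date`, and the choice of output columns are all assumptions for illustration. Publishing the result as an H2O Frame is not shown.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "02/08/2015 11:43:58 PM".
# Adjust this to match the actual date column in your dataset.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def season_of(month: int) -> str:
    """Map a month number to a Northern Hemisphere meteorological season."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Autumn"


def refine_date(raw: str) -> dict:
    """Split one raw timestamp string into model-friendly columns."""
    ts = datetime.strptime(raw, DATE_FORMAT)
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,                  # 0-23
        "weekday": ts.isoweekday(),       # 1 = Monday ... 7 = Sunday
        "weekend": ts.isoweekday() >= 6,  # Saturday or Sunday
        "season": season_of(ts.month),
    }


features = refine_date("02/08/2015 11:43:58 PM")
print(features)
```

Columns such as hour, weekday, and season are what make it possible to ask when crimes cluster, which a single opaque date string does not allow.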
Figures 2 and 3 show some cool visualizations we made from the joined table using Flow, the framework included in our latest H2O product, which you can download here.
Figure 2: San Francisco crime visualizations
Figure 3: Chicago crime visualizations
Interestingly enough, in both cities crime seems to occur most frequently during the winter, a surprising fact given how cold the weather gets in Chicago! We also discovered the times when you don’t want to be on the streets, that is, when the majority of crimes occurred: midnight, noon, and 6pm.