OpenStreetMap Data to ML Training Labels for Object Detection

I am really interested in creating a tight, clean pipeline for disaster relief applications, where we can use something like crowd-sourced building polygons from OSM to train a supervised object detector to discover buildings in an unmapped location.

By Shay Strong, Director of Data Science & Machine Learning at EagleView

I wanted to create a seamless tutorial for taking OpenStreetMap (OSM) vector data and converting it for use with machine learning (ML) models. In particular, I am really interested in creating a tight, clean pipeline for disaster relief applications, where we can use something like crowd-sourced building polygons from OSM to train a supervised object detector to discover buildings in an unmapped location.

The recipe for a basic deep learning object detector has two components: (1) training data (raster image + vector label pairs) and (2) a model framework. The deep learning model itself will be a Single Shot Detector (SSD). We will use OSM polygons as the basis of our label data and Digital Globe imagery for the raster data. We won’t go into the details of an SSD here, as there are plenty of sources available. We will run the object detector in AWS SageMaker. This article focuses only on generating the training data. I’ll follow up with a separate article on model training.

I should note that this tutorial is available in this GitHub repo, so you can bypass this article if you want to dive in, although it is a work in progress as part of an upcoming UW Geohackweek. I anticipate using this tutorial in conjunction with HOT-OSM related tasks, where there may be crowd-sourced vector data as part of a specific project. To establish a demo, we will use a recent HOT OSM task area that was impacted by Cyclone Kenneth in 2019: Nzwani, Comores.


Cyclone Kenneth track in late April 2019.

The OSM vector data is freely available. The imagery we require is often not. Welcome to the world of GIS. BUT, Digital Globe has Open Data imagery for many of these natural disaster areas. So we can grab that data for this application (a little later on).

I decided to use the OSMnx Python library for interfacing with OSM (which can be a bit daunting otherwise). Based on the HOT-OSM task, Nzwani, Comores was labeled as an ‘Urgent’ location (see table above). So I’m going to start there because it seems like a lot of training data could be available.


(left) The OSM ‘DiGraph’ of nodes & edges. (right) The Nzwani landmass with the roads and buildings overlaid.

If you inspect the length of the buildings object (a GeoPandas GeoDataFrame is returned), you will see a large number of features. We will use these as the training data for our object detector.
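As a rough sketch of what those OSMnx calls can look like: this assumes OSMnx 1.3 or newer (older releases expose the same query under different function names) and assumes the geocoder resolves the place string used in this article. The function name fetch_roads_and_buildings is my own and purely illustrative.

```python
# Assumption: OSMnx >= 1.3, where features_from_place() is available.
PLACE = "Nzwani, Comores"            # place name used in this article
BUILDING_TAGS = {"building": True}   # match any OSM feature carrying a building tag


def fetch_roads_and_buildings(place=PLACE):
    """Download the street graph and the building footprints for a place name."""
    # Imported lazily so this module loads even where OSMnx is not installed.
    import osmnx as ox

    # Graph of nodes and edges (streets and paths).
    graph = ox.graph_from_place(place, network_type="all")
    # GeoDataFrame with one row per OSM feature that has a building tag.
    buildings = ox.features_from_place(place, tags=BUILDING_TAGS)
    return graph, buildings
```

Calling graph, buildings = fetch_roads_and_buildings() needs network access, and len(buildings) then gives the number of building features returned.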

Let’s grab some DG imagery. To create an object detector, we will mimic the Pascal VOC training data format, which requires pairs of images (jpeg) and vector labels (xml). The xml files are formatted in a particular way. You can read about it on this (difficult to navigate) website.
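To give a feel for the label format, here is a minimal sketch that writes a VOC-style annotation with the Python standard library. The element names follow the usual Pascal VOC layout; the function name voc_annotation, the class name "building", the folder default and the 256x256x3 image size are my assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, boxes, width=256, height=256, depth=3,
                   folder="VOC1900", label="building"):
    """Build a Pascal VOC-style annotation string for one image.

    boxes is a list of (xmin, ymin, xmax, ymax) tuples in pixel coordinates.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename

    # Image dimensions.
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)

    # One <object> element per bounding box.
    for xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(int(value))

    return ET.tostring(root, encoding="unicode")
```

Each image gets one such xml file with the same base name as the image.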

I would prefer to pull the data directly from the Digital Globe Open Data website, but unfortunately, at present, there seems to be no imagery that covers this area post-event. I really want to keep this location as a point of focus, given the number of buildings we found and the relevance of this task. So, I will just separately download the necessary imagery and make a small subset available for this demo. You can find a sample GeoTIFF in the GitHub repo.

Let me digress a little about the DG Open Data website. This website is rather difficult to search, and even though a huge amount of timely and relevant imagery is made available free for areas impacted by natural disasters, it is nearly impossible to search efficiently in a geospatial way as an individual user. Bottom line, it is a little hard to determine which image contains a significant number of buildings for the region I am interested in. My typical process would be to click through some thumbnails on the website and find a region with significant urban-looking growth. There have to be buildings in urban areas! I sort of gave up after 20 minutes of looking at patterns in thumbnails (well, 45 min, as I can get obsessive at pattern matching). Assuming DG Open Data has this image at some point in the future, the subsequent steps will still be consistent for any GeoTIFF you download.

If you get imagery directly from the DG Open Data website, it actually comes as a Cloud Optimized GeoTIFF (COG), which turns out to be super convenient for creating a virtual raster (vrt). The benefit is that we get a lightweight file that loads in a local QGIS window without downloading the entire tif. We can cut, subset, etc. and wind up with just the image we want, not superfluous regions.
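One way to build such a vrt is GDAL’s gdalbuildvrt command pointed at the remote file through the /vsicurl/ prefix. The sketch below only assembles and runs that command; the helper names are my own, and any URL you pass in is yours to supply (I am not assuming a particular DG Open Data path).

```python
import subprocess


def vsicurl_path(url):
    """Prefix an http(s) URL so GDAL reads it remotely via range requests."""
    return "/vsicurl/" + url


def build_vrt_command(cog_url, vrt_path):
    """Return the gdalbuildvrt command that wraps a remote COG in a local vrt."""
    return ["gdalbuildvrt", vrt_path, vsicurl_path(cog_url)]


def build_vrt(cog_url, vrt_path):
    """Run the command (assumes the GDAL command line tools are on PATH)."""
    subprocess.run(build_vrt_command(cog_url, vrt_path), check=True)
```

The resulting vrt is a small xml file that can be dragged into QGIS like any other raster.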

Once you have the GeoTIFF, we will use GDAL to translate it to the MBTiles format and then unpack it to its z/x/y slippy map (TMS) directory structure. MBTiles are super advantageous for us here, since they hold 256x256 image chips (png) that lend themselves well to deep learning model training formats.
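A sketch of that conversion, driven from Python: gdal_translate writes the MBTiles file, gdaladdo adds the lower zoom levels, and mb-util (from the mbutil package) unpacks the tiles to a directory. Having mbutil installed, the overview factors and the file names are my assumptions; the GDAL MBTiles driver also expects 8-bit imagery, so other data types may need rescaling first.

```python
import subprocess


def tiling_commands(geotiff, mbtiles, tile_dir):
    """Return the three commands that turn a GeoTIFF into a z/x/y tile tree."""
    return [
        # 1. GeoTIFF -> MBTiles at the zoom level closest to native resolution.
        ["gdal_translate", "-of", "MBTILES", geotiff, mbtiles],
        # 2. Add lower zoom levels as overviews.
        ["gdaladdo", "-r", "average", mbtiles, "2", "4", "8", "16"],
        # 3. Unpack to tile_dir/z/x/y.png; xyz scheme has y = 0 at the north edge.
        #    mb-util expects tile_dir not to exist yet.
        ["mb-util", "--image_format=png", "--scheme=xyz", mbtiles, tile_dir],
    ]


def run_all(commands):
    """Run each command in order and stop on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, run_all(tiling_commands("scene.tif", "scene.mbtiles", "tiles")) would leave the chips under tiles/.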

You may notice that the images generated are in a nested file structure. We need to ‘flatten’ them, so that all the images are in a single directory. You can’t just copy them to a single directory since the .png filenames are not unique. We will also convert them from png to jpeg. We will stick them in a VOC-like folder with a nonsensical name, VOC1900, to distinguish it from a legit VOC dataset.
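Here is one possible way to do the flattening, as a sketch: each tile’s z/x/y path becomes a unique z_x_y file name. The naming scheme and function names are my own choices, the output directory should sit outside the tile tree, and the jpeg step assumes Pillow is installed.

```python
import shutil
from pathlib import Path


def flat_name(tile_path, tile_root):
    """Turn tile_root/17/80123/65012.png into the name 17_80123_65012.png."""
    parts = Path(tile_path).relative_to(tile_root).parts
    return "_".join(parts)


def flatten_tiles(tile_root, out_dir):
    """Copy every nested .png tile into out_dir under a unique flat name."""
    tile_root, out_dir = Path(tile_root), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for png in sorted(tile_root.rglob("*.png")):
        target = out_dir / flat_name(png, tile_root)
        shutil.copyfile(png, target)
        copied.append(target)
    return copied


def png_to_jpeg(png_path):
    """Re-encode one png as a jpeg next to it (assumes Pillow is installed)."""
    from PIL import Image  # imported lazily; Pillow is an assumption here

    jpeg_path = Path(png_path).with_suffix(".jpeg")
    # JPEG has no alpha channel, so convert to RGB before saving.
    Image.open(png_path).convert("RGB").save(jpeg_path, "JPEG")
    return jpeg_path
```

Keeping z, x and y in the file name also makes it easy to recover the tile ID later when matching labels to images.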

Yay. Images are done!
Next, we will take our buildings and expand each one to its enclosing axis-aligned rectangle, since the object detector expects rectangular bounding boxes.
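The rectangle for a polygon is just the minimum and maximum of its vertex coordinates. A dependency-free sketch, with a made-up footprint for illustration (in GeoPandas, the bounds and envelope attributes of the geometry column give the same result):

```python
def bounding_box(coords):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical footprint given as (lon, lat) vertices.
footprint = [(44.40, -12.20), (44.41, -12.21), (44.42, -12.19)]
west, south, east, north = bounding_box(footprint)
```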

Now that we have the buildings represented as axis-aligned bounding boxes, we will want to use my new favorite utility called Supermercado. We will go to the ‘supermarket’ to identify all the TMS tiles that the building boxes overlap. We will map the TMS tile ID to the building box itself and then convert the building box, which is in a geospatial lat/long format, to a slippymap tile reference frame. Now we will have a building box on a TMS tile grid consistent with the imagery we unpacked on the same grid.
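To make the coordinate change concrete, here is a sketch of the standard slippy map tile math for a single lon/lat bounding box. It is a simplified stand-in for what Supermercado does for arbitrary geometries, not Supermercado’s own code, and the function names are mine. It uses the xyz convention (y = 0 at the north edge); for true TMS numbering the row is flipped, y_tms = 2**zoom - 1 - y.

```python
import math

TILE_SIZE = 256  # pixels per tile edge


def lonlat_to_tile_fraction(lon, lat, zoom):
    """Return fractional (x, y) tile coordinates in Web Mercator at a zoom."""
    n = 2 ** zoom
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    return x, y


def tiles_for_bbox(west, south, east, north, zoom):
    """List the (x, y, zoom) tiles that a lon/lat box overlaps or touches."""
    x0, y0 = lonlat_to_tile_fraction(west, north, zoom)  # top-left corner
    x1, y1 = lonlat_to_tile_fraction(east, south, zoom)  # bottom-right corner
    return [(x, y, zoom)
            for x in range(math.floor(x0), math.floor(x1) + 1)
            for y in range(math.floor(y0), math.floor(y1) + 1)]


def bbox_to_tile_pixels(west, south, east, north, tile):
    """Convert a lon/lat box to (xmin, ymin, xmax, ymax) pixels inside a tile."""
    tx, ty, zoom = tile
    x0, y0 = lonlat_to_tile_fraction(west, north, zoom)
    x1, y1 = lonlat_to_tile_fraction(east, south, zoom)

    def to_pixel(value, origin):
        # Offset from the tile corner, scaled to pixels and clipped to the tile.
        return max(0, min(TILE_SIZE, int(round((value - origin) * TILE_SIZE))))

    return (to_pixel(x0, tx), to_pixel(y0, ty),
            to_pixel(x1, tx), to_pixel(y1, ty))
```

A building that straddles a tile edge shows up in several tiles, and the clipping keeps each tile’s box inside its own 256x256 chip.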

Finally, we want to clean up the xml labels and images so that no unmatched files remain. Every image should have a matching annotation file, and vice versa.
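A sketch of that cleanup, assuming images and labels share a base name and live in JPEGImages/ and Annotations/ under one root folder. The function names are my own, and the second function deletes files, so try it on a copy first.

```python
from pathlib import Path


def unmatched_stems(image_names, label_names):
    """Return (images without a label, labels without an image) as stem sets."""
    images = {Path(name).stem for name in image_names}
    labels = {Path(name).stem for name in label_names}
    return images - labels, labels - images


def prune_unpaired(voc_root, image_ext=".jpeg"):
    """Delete images and annotations that have no partner; return the counts."""
    voc_root = Path(voc_root)
    image_dir = voc_root / "JPEGImages"
    label_dir = voc_root / "Annotations"
    images = [p.name for p in image_dir.glob("*" + image_ext)]
    labels = [p.name for p in label_dir.glob("*.xml")]
    extra_images, extra_labels = unmatched_stems(images, labels)
    for stem in extra_images:
        (image_dir / (stem + image_ext)).unlink()
    for stem in extra_labels:
        (label_dir / (stem + ".xml")).unlink()
    return len(extra_images), len(extra_labels)
```

After prune_unpaired("VOC1900") the two folders should contain the same set of base names.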

You should now have a clean directory of VOC-style images and labels, ready for training.

        |___ Annotations/*xml
        |___ JPEGImages/*jpeg

Next will be the actual training. Stay tuned.

Bio: Shay Strong is the Director of Data Science & Machine Learning at EagleView and has interests in Geospatial Machine Learning & Remote Sensing.

Original. Reposted with permission.