Gold BlogPython, Selenium & Google for Geocoding Automation: Free and Paid

This tutorial will take you through two options that have automated the geocoding process for the user using Python, Selenium and Google Geocoding API.



figure-name

Geocoding is the beginning of Spatial Analysis. Whether geocoding a place or residence, an accurate geocoder is the first step to make sure your analysis is on the right track. This tutorial will take you through two options that have automated the geocoding process for the user using Python, Selenium and Google Geocoding API. Automation is something every Data Scientist needs to learn. But automating workflows is unique to every person and not everyone will have the same workflow. Automation is something to strive for when we believe that repeating the same process is too time consuming and your time is better spent doing something else while your automated process will take care of everything.

 

Free Geocoding with Python and Selenium

 

The first tutorial was written to automate my geocoding process when a paid service was not an option. From my personal observation, my script has given me a minimum 70% accuracy given the following standards are met:

  • The address must not contain the name of a place.
  • Minimal spelling errors
  • The country is also part of the address

This script assumes you are:

  • Using xlxs file extension (Microsoft Excel Workbook)
  • The whole address is in one column.
  • Using Google Chrome

Note: When using selenium, make sure that you download and keep the Google Chrome Diriver File in the same directory as your script and that the version is compatible with your own Google Chrome Browser.

The final output will be an excel file with the geocoded address and coordinates in separate columns.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import ElementClickInterceptedException
import pandas as pd
import time
from openpyxl import load_workbook
from random import randint

wb_name = "xxx.xlsx" #file name


wb = load_workbook(wb_name, data_only = True)
ws = wb['sheet_name']
address_list =[]
link_col = xx  #column number

coord_prospects = pd.DataFrame() 

for row in ws.iter_rows(min_row = x , max_row = x, min_col = link_col, max_col=link_col):
    if str(row[0].value) != "None":
        address_list.append(row[0].value)

driver = webdriver.Chrome(options=options)
#driver.minimize_window() #this is optional if the opening google chrome window gets annoying
driver.get('https://www.mapdevelopers.com/geocode_tool.php')
add = driver.find_element_by_class_name('form-control')

for t,a in enumerate(address_list):
    print ("Geocoding...",t+1,"/",len(address_list),str(round(t/len(address_list)*100,2)),"%"," : ", a)
    add.clear()
    add.send_keys(a)
    try:
        search1 = driver.find_element_by_xpath('//*[@id="search-form"]/div[1]/span[2]').click()
        time.sleep(3)
        search2 = driver.find_element_by_xpath('//*[@id="search-form"]/div[1]/span[2]').click()
        time.sleep(3)
    except ElementClickInterceptedException:
        time.sleep(2)
        search = driver.find_element_by_xpath('//*[@id="search-form"]/div[1]/span[2]').click()
    lat=driver.find_element_by_id('display_lat')
    lng=driver.find_element_by_id('display_lng')
    street=driver.find_element_by_id('display_address')
    city=driver.find_element_by_id('display_city')
    postcode=driver.find_element_by_id('display_zip')
    state=driver.find_element_by_id('display_state')
    county=driver.find_element_by_id('display_county')
    country=driver.find_element_by_id('display_country')
    latlng = pd.DataFrame({'Latitude':pd.Series(lat.text),
                            'Longitude':pd.Series(lng.text),
                            'Street':pd.Series(street.text),
                            'City':pd.Series(city.text),
                            'Postcode':pd.Series(postcode.text),
                            'State':pd.Series(state.text),
                            'County':pd.Series(county.text),
                            'Country':pd.Series(country.text)})
    coord_prospects = coord_prospects.append(latlng, ignore_index=True)
    print(coord_prospects.tail(1))
    print("   ")

coord_prospects.to_excel('xxxx.xlsx') #name of output excel file

driver.close()


Once you have updated wb_name, link_col and the name of the output excel file, you can begin running the script.

The above script once run will open a google chrome browser and go to https://www.mapdevelopers.com/geocode_tool.php where you will see the your geocoding taking place. If not for automation, the usual scenario will be keeping a excel and google screen open and a repetitive copy and paste motion. A person with thousands of rows will just be wasting time.

There are many ways to automate and improve on geocoding and this is a method I personally developed for when I do not feel like paying for an API and I am not time constrained. This brings me to some of the drawbacks of using a free geocoding method versus an API:

  • It is very slow in comparison but more efficient and fast than manually geocoding
  • Accuracy is rarely and almost never beaten

 

Google Geocoding with Python

 

Google used to allow a certain amount (about 2,500 API calls) for its geocoding per day for free. This is an annoying limitation but one that allowed for highly accurate geocoding even if your address had some spelling mistakes and even better if you had a place name. However recently google now charges for every API call.

If not mistaken, currently the damage is $5 for every 1,000 API calls and $4 for every API call above 100,000. To put it into perspective, for the first 100,000, you will be spending $0.005 for every API call. Anything above 500,000 may need a call to the their sales team. See here for their Geocoding API billing. Every new sign up makes you eligible for $200 credit.

In order to start using Google Geocoder API, you have to obtain an API key: Google Developers API Key

Once you obtain your API Key for their geocoding service, you can start using their service. The address used can include place names as well and this should even improve the results most of the time. If it can be found on google maps when you manually check, it can be used with google's geocoder to extract the coordinates.

import pandas as pd
import googlemaps

api = "xxx" #API Key


df = pd.read_excel("xxx.xlsx") #file name

geocoder = googlemaps.Client(key=api)
df['Latitude'] = None
df['Longitude'] = None
df['Google Address'] = None

for i in range(len(df)):
    print("Geocoding..."+" "+str(i)+"/"+str(len(df)) + " " + str(round(i/len(df)*100,2))+"%")
    result = geocoder.geocode(df.loc[i,"xxx"]) # xxx is the name of the column with the full address
    try:
        lat = result[0]["geometry"]["location"]["lat"]
        lng = result[0]["geometry"]["location"]["lng"]
        address = result[0]["formatted_address"]
        df.loc[i,"Latitude"] = lat
        df.loc[i,"Longitude"] = lng
        df.loc[i,"Google Address"] = address
    except:
        lat=None
        lng=None
        location=None
print(df)
df.to_excel("xxx.xlsx") #name of output file


The only parts you need to add are api, df, name of the column with the full address and the name of the output file at the end.

The above script will return an excel file with three new columns: 'Latitude', 'Longitude', 'Google Address'. It will be much faster than any custom script built. Google is undoubtedly the most preferred geocoding API as it is reliable for a client side project. In a business setting, if a client wishes to have the coordinates of their customers residence mapped, visualized and analyzed, explaining the use of google to geocode will be simple and understandable to them.

This method uses much less lines of code and is easier to maintain. With no restrictions and a relatively cheap price per API call, it can be feasible to a certain extent.

Related: