Getting longitude-latitude coordinates for a (long) list of cities using Python and a free API

[This post was originally published on my blog]

Today I’ve decided to expand the number of cities included on my murder rate map to everywhere with 100,000+ people.
In order to do that using the FBI data (which only includes the names of the cities), I need to find the longitude-latitude for each city in my data set, and add them as new columns. This was not a big deal for the previous case, when I had 35 cities, but now my data set includes over 400, so I obviously won’t be looking them up by hand.

Here is one way of doing it using Python:

First, you need to create a free account on OpenCage Geocoder, which is an API that can be used to look up the coordinates of places, and also to find out which place a set of coordinates corresponds to. You can use any API you want, really. I just picked this one for simplicity and convenience. You will then get YOUR_API_KEY, which you need to use every time you make a request for a location. You also need to install (pip install opencage) and import the corresponding Python package, opencage (here is a tutorial in case you want more info).

from opencage.geocoder import OpenCageGeocode
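If you would rather not paste the key directly into your script, one option is to read it from an environment variable. This is just a minimal sketch: the variable name OPENCAGE_API_KEY and the helper load_api_key are my own choices for illustration, not something the opencage package requires.

```python
import os


def load_api_key(var_name="OPENCAGE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this example;
    use whatever name you set in your own shell.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your OpenCage API key"
        )
    return key
```

You could then write key = load_api_key() instead of hard-coding the key in the examples that follow.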

Let’s start with a simple example, by looking for the coordinates of one single place. As an example, I’m gonna use Bijuesca, the village in Spain where I grew up, because it is awesome.

key = 'YOUR_API_KEY'  # replace with your own key, from: https://opencagedata.com

geocoder = OpenCageGeocode(key)

query = 'Bijuesca, Spain'

results = geocoder.geocode(query)

print(results)

The ‘results’ variable has a lot more information than we need right now:

but it is a list of dictionaries, so you can pull out the coordinates of the first (best) match by indexing into it:

lat = results[0]['geometry']['lat']
lng = results[0]['geometry']['lng']
print(lat, lng)

41.5405092 -1.9203562

Which are Bijuesca’s coordinates!
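One thing to watch out for: if the API finds no match for a query, the list of results comes back empty and results[0] raises an IndexError. Here is a small sketch of a safer lookup. The helper name first_coordinates is made up for this example, and the sample data only mimics the part of the response used above.

```python
def first_coordinates(results):
    """Return (lat, lng) from the first result, or None if nothing was found."""
    if not results:  # covers both an empty list and None
        return None
    geometry = results[0]["geometry"]
    return geometry["lat"], geometry["lng"]


# Hand-written stand-in for the relevant part of an API response
sample_results = [{"geometry": {"lat": 41.5405092, "lng": -1.9203562}}]

print(first_coordinates(sample_results))
print(first_coordinates([]))
```

With the real API you would call first_coordinates(geocoder.geocode(query)) and check for None before using the values.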


Ok, so now we are ready to get the coordinates for all the cities in my data set, which looks like this:

As the simplest, not-most-efficient approach, I am going to iterate over each row to get the city and state, then use the API to get the corresponding coordinates. I’ll save longitudes and latitudes in two separate lists. Then I can add these two lists as new columns once I’m done:

list_lat = []   # create empty lists
list_long = []

for index, row in df_crime_more_cities.iterrows():  # iterate over rows in dataframe
    City = row['City']
    State = row['State']
    query = str(City) + ',' + str(State)
    results = geocoder.geocode(query)
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']
    list_lat.append(lat)
    list_long.append(long)

# create new columns from lists
df_crime_more_cities['lat'] = list_lat
df_crime_more_cities['lon'] = list_long
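With 400+ requests, a single city with no match would stop the loop halfway through. Below is a slightly more defensive sketch of the same idea. The function geocode_places, its pause argument and the cache are my own additions for illustration; the one-second default pause is an assumption, so check the rate limits of your own plan. It works with any object that has a geocode(query) method returning a list of results, like the geocoder created above.

```python
import time


def geocode_places(geocoder, places, pause=1.0):
    """Geocode (city, state) pairs and return (latitudes, longitudes).

    Places with no match get None in both lists, so the lists stay
    aligned with the input rows. Repeated queries are looked up only once.
    """
    cache = {}
    lats, lons = [], []
    for city, state in places:
        query = f"{city}, {state}"
        if query not in cache:
            results = geocoder.geocode(query)
            if results:
                geometry = results[0]["geometry"]
                cache[query] = (geometry["lat"], geometry["lng"])
            else:
                cache[query] = (None, None)
            time.sleep(pause)  # wait between API calls (assumed limit)
        lat, lon = cache[query]
        lats.append(lat)
        lons.append(lon)
    return lats, lons
```

With the dataframe from this post, the call would look like list_lat, list_long = geocode_places(geocoder, zip(df_crime_more_cities['City'], df_crime_more_cities['State'])), and the two lists can then be added as columns exactly as before. Rows that come back as None are the ones worth checking by hand.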

Here we have our dataframe with the newly added columns: