OpenStreetMap Data with ArcGIS Pro and QGIS

A couple years ago I wrote a post that demonstrated how to use the QuickOSM plugin for QGIS to easily extract features from the OpenStreetMap (OSM). The OSM is a great source for free and open GIS data, especially for types of features that are not captured in government sources, and for parts of the world that don’t possess a free or robust GIS data infrastructure. I’ve been using ArcGIS Pro more extensively in my new job and was wondering how I could do the same thing: query features from the OSM based on keys and values (denoting feature type) and geographic area and extract them as a vector layer. I’m looking for straightforward solutions that I could use for answering questions from students (so no command line tricks or database stuff). In this post I’ll cover three approaches for achieving this in ArcGIS Pro, with references to QGIS.

File Approach

The most straightforward method would be to export data directly from the main OSM page by zooming into an area and hitting the Export button. This is a pretty blunt approach, as you have to be zoomed in pretty close and you grab every possible feature in the view. The “native” file format of OSM is the osm / pbf format; .osm is an XML file while .pbf is a compressed binary version of the osm. QGIS is able to handle these files directly; you just add them as a vector layer. ArcGIS Pro cannot. You have to download and install a special Data Interoperability extension, which is an esoteric thing that’s not part of the standard package and requires a special license from your site license coordinator.

A better and more targeted approach is to download pre-created extracts that are provided by a number of organizations listed in the OSM wiki. I started with Geofabrik in Germany, as it was a source I recognized. They package OSM data by geographic area and feature type. On their main page they list files that contain all features for each of the continents. These are enormous files, and as such they are only provided in the osm pbf format as shapefiles can’t effectively handle data that size. Even if you downloaded the osm pbf files and added them to QGIS, the software will struggle to render something that big.

But all is not lost; Geofabrik and many other providers package data in a shapefile format for smaller areas, provided that the size and number of features is not too great. For instance, on Geofabrik’s download page if you click on North America you’re presented with country extracts for that continent (see images below). You can get shapefiles for Greenland and Mexico, but not Canada or the US as the files are still too big. Click on US, and you’re presented with files for each of the states. No luck for California (too big), but the rest of states are small enough that you can get shapefiles for all of them.

Geofabrik OSM data: download continents — Default Geofrabrik OSM download page for continents. Click on a continent name…

Geofabrik OSM data downloads: countries in North America — …to access files for countries. Click on a country name…

Geofabrik OSM data downloads: states of the US — …to access files for states / provinces / admin divisions

I downloaded and unzipped the file for Rhode Island. It contains a number of individual shapefiles classified by type of feature: buildings, land use, natural, places, places of worship (pofw), points of interest (pois), railways, roads, traffic, transport, water, and waterways. Many of the files appear twice: files with an “a” suffix represent polygons (areas) while files without that suffix are points or lines. Some OSM features are stored as polygons when such detail is available, while others are represented as points.

For example, if I add the two places of workship files to a map, for some features you have the outline of the actual building, while for most you simply have a point. After adding the layers to the map, you’ll probably want to use Select by Attribute to select the features you want based on OSM tags with keys and values, and Select by Location in conjunction with a separate boundary file to pull data out for a smaller area. The Geofabrik OSM attribute table is limited to basic attributes: an OSM ID, feature code and class, and name. It’s also likely that you’ll want to unify the point and polygon features of the same type into one layer, as they’re usually mutually exclusive. Use the Centroid (Polygon) tool in the toolbox to turn the polygons into points, and the Merge tool to meld the two point layers together. In QGIS the comparable tools under the Vector menu are Centroids and Merge Vector Layers. WGS 84 is the default CRS for the layers.

ArcGIS Pro with OSM Places of Worship from Geofabrik — OSM Places of Worship. Some features are stored as points while others are polygons

Geofabrik is just one option. There are several others and they take different approaches for structuring their extracts. For example, BBBike.org organizes their layers by city for over 200 cities around the world, and they provide a number of additional formats beyond OSM PBF and shapefiles, such as Garmin GPS, GeoJSON, and CSV. They divide the data into fewer files, and if they don’t compile data for the area you’re interested in you can use a web-based tool to create a custom extract.

Plugin Approach

It would be nice to use a plugin, as that would allow you to specify a custom geographic area and retrieve just the specific features you want. QuickOSM works quite nicely for QGIS. Fortunately there is a good ArcGIS Pro solution called OSMquery. It works for both Pro and Desktop, tested for Pro 2.2 and Desktop 10.6. I’m using Pro 2.7 and the basic tool worked fine. It’s well documented, with good instructions for installation and use.

The plugin is written in Python and you add it as a tool to your ArcToolbox. Download the repo from the OSMquery GitHub as a ZIP file (click the green code button and choose Download ZIP). Save it in or near your ArcGIS project folders, and unzip it. In Pro, go into a project and open a Catalog Pane in the View ribbon. Right click on Toolbox to add a new one, and browse to the folder you unzipped to add the tool. There are two scripts in the box, a basic and an advanced version. The basic tool functioned without trouble for me. The advanced tool threw an error, probably some Python dependency issue (I didn’t investigate as the basic tool met my needs).

In the basic tool you choose the key and value for the features you want to extract; the dropdown menu is automatically populated with these options. For the geographic extent you can enter a place name, or you can use the extent of the current map window or of a layer in the project, or you can manually type in bounding box coordinates. Another nice option is you can transform the CRS of the extracted features from WGS 84 to another system, so it matches the CRS of layers in your existing project. Run the tool, and the features are extracted. If the features exist as both points and polygons, you get two separate files for each. If you choose, you can merge them together as described in the previous section; this is a bit tougher as the plugin approach yields a much wider selection of fields in the attribute table, and not all of the point and polygon attributes align. With the Merge tool in Pro you can select which attributes you want to hold on to, and common ones will be merged. QGIS is a bit messier in this regard, but in my earlier post I outlined a work-around using a spatial database.

OSMquery tool in ArcGIS Pro — The basic OSMquery tool in an ArcGIS Pro toolbox

Web Feature Service

This initially seemed to be the most promising route, but it turned out to be a dud. Like QGIS, Pro allows you to add OSM as a tiled base map. But ESRI also offers OSM as a web feature service: by hitting Add Data on the Map ribbon and searching the Living Atlas for “OpenStreetMap” you can select from a number of OSM web feature services, organized by continent and feature type. Once you add them to a map, you can select and click on individual features to see their name and feature type. The big problem is that you are not allowed to extract features from these layers, which leaves you with an enormous and heterogeneous mix of features for an entire continent. You can interact with the features, selecting by attribute and location in reference to other spatial layers, but that’s about it.

In Summary

I would recommend taking the step of downloading the OSMquery plugin for ArcGIS Pro if you want to take a highly targeted approach to OSM feature extraction (for QGIS users, enable the QuickOSM plugin). This approach is also best if you can’t download a pre-existing extract for your area because it’s too large or has too many features, and if you want to access the fullest possible range of attribute values. Otherwise, you can simply download one of the pre-created extracts, and use your software to winnow it down to what you need (or if you do need everything, the file approach makes more sense). Since the file-based option includes fewer attributes, converting polygon features to points and merging them with the other point features is a bit simpler.

Call for Proposals: Celebrating the Census in the Journal of Maps

I’m serving as a co-editor for a special issue for the Journal of Maps entitled “Celebrating the Census“. The Journal of Maps is an open access, peer reviewed journal published by the Taylor & Francis Group. The journal is distinct in that all articles feature maps and spatial diagrams as the focal point for studying geographic phenomena from both a physical / environmental and social science perspective.

Here’s the official synopsis for this census-themed special issue:

We invite contributions to a special issue of the Journal of Maps focused upon the evolving character and cartographic opportunities offered by traditional census statistics and the impact of transitioning from these sources of population data at a range of spatial scales into a new era of big data assembly. In so doing, the special issue marks two important events taking place in the UK during 2021 in the history of British Censuses and seeks contributions that reflect the past transition of population data cartography through the digital era of the last 50 years and anticipates its transformation into the big data era of the foreseeable future.

While the issue marks the 100th anniversary of the UK census, submissions concerning census mapping from around the world are welcome and encouraged in these topic areas, including but not limited to:

Spatial and statistical consistency over time
People on the move
Mapping people through space and time
Mapping morbidity and mortality
Politics and population data
International comparison of demographic mapping
Before and after population mapping using censuses and administrative sources
Population data and mapping human-environmental interaction
Transition and evolution in population mapping

Visit the special issue announcement for full details. Deadlines:

April 30, 2021: a short draft (500-word limit) outlining themes and scope of the paper, preferably with a sample map
June 14, 2021: abstracts will be selected by the editorial team by this date
Sept 5, 2021: completed paper (4000-word limit) is due

The issue will be published sometime in 2022.

A New Year and a New Start

I have some news! After 13 1/2 years, January 31, 2021 will be my last day as the Geospatial Data Librarian at Baruch College, City University of New York (CUNY). On February 1, 2021, I will be the new GIS and Data Librarian at Brown University in Providence, Rhode Island!

It’s an exciting opportunity that I’m looking forward to. I will be building geospatial information and data services in the library from the ground up, in concert with many new colleagues. I will be working closely with the Population Studies Training Center (PSTC) and the Spatial Structures in Social Sciences (S4) as well as the Center for Digital Scholarship within the library. Several aspects of the position will be similar, as I will continue to provide research and consultation services, create research guides and tutorials, teach workshops, collect and create datasets, and eventually build and manage a data repository and small lab where we’ll provide services and peer mentor students.

The resources I’ve created at Baruch CUNY will remain accessible, and eventually a new person will take the reins. I have moved the latest materials for the GIS Practicum, my introductory QGIS tutorial and workshop, to this website and I hope to continue updating and maintaining this resource. There are a lot of people throughout CUNY that I’m going to miss, at: the Newman Library, the CUNY Institute for Demographic Research, the Weissman Center for International Business, the Marxe School, Baruch’s Journalism Department, the Geography Department at Lehman College, the Digital Humanities program and the CUNY Mapping Service at the CUNY Graduate Center, and many others.

I will continue writing posts and sharing tips and resources here based on my new adventures at Brown, but may need a little break as I transition… stay tuned!

Best – Frank

Adding Basemaps to QGIS With Web Mapping Services

For this final post of 2020, I was looking back through recent projects for something interesting yet brief; I’ve been writing some encyclopedia-length posts lately and wanted to keep this one on the lighter side. In that vein, I’ve decided to share a short list of free web mapping services that I use as basemaps in QGIS (they’ll work in ArcGIS too). This has been on my mind as I’ve recently stumbled upon the OpenTopoMap, which is an alternate stylized version of the OpenStreetMap that looks pretty sharp.

Comparison of OpenStreetMap (left) and OpenTopoMap (right)

See this earlier post for details, but in short, to connect to these services in QGIS:

Select the appropriate web map service type in the browser panel (usually WMS / WMTS or XYZ Tiles), right click, and add new connection.
Give it a meaningful name, paste the appropriate URL into the URL box, click OK.
In the browser panel drill down to see the service, and for WMS / WMTS layers you can drill down further to see specific layers you can add.
Select the layer and drag it into the window, or select, right click, and add the layer to the project.
If the resolution looks off, right click on a blank area of the toolbar and check the Tile Scale Panel. Use this to adjust the zoom for the web map. If the scale bar is greyed out you’ll need to set the map window to the same CRS as the map service: select the layer in the panel, right click, and choose set CRS – set project CRS from layer.
Some web layers may render slowly if you’re zoomed out to the full extent, or even not at all if they contain many features or are super detailed. Conversely, some layers may not render if you’re zoomed too far in, as tiles may not be available at that resolution. Experiment!

If you’re an ArcGIS user see these concise instructions for adding various tile layers. This isn’t something that I’ve ever done, as ArcGIS already has a number of accessible basemaps that you can add.

In the list below, links for the service name take you to either the website version of the service, or to a list of additional layers that you can connect to. The URLs that follow are the actual connections to the service that you’ll use within your GIS package. If you use OSM, OTP, or Stamen in your maps, make sure to cite them (they use Creative Commons Licenses – follow links to their websites for details). The government sources are public domain, but you should still cite them anyway. Happy mapping, and happy holidays!

OpenStreetMap XYZ Tile (global)

http://tile.openstreetmap.org/{z}/{x}/{y}.png

OpenTopoMap XYZ Tile (global)

https://tile.opentopomap.org/{z}/{x}/{y}.png

Stamen XYZ Tile (global) see their website for examples; the image topping this post is from watercolor

http://tile.stamen.com/terrain/{z}/{x}/{y}.png

http://tile.stamen.com/toner/{z}/{x}/{y}.png

http://tile.stamen.com/watercolor/{z}/{x}/{y}.jpg

U SGS National Map WMTS (global, but fine detail is US only)

Imagery:
https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/WMTS/1.0.0/WMTSCapabilities.xml

Imagery & Topo:
https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryTopo/MapServer/WMTS/1.0.0/WMTSCapabilities.xml

Shaded Relief: 
https://basemap.nationalmap.gov/arcgis/rest/services/USGSShadedReliefOnly/MapServer/WMTS/1.0.0/WMTSCapabilities.xml

Topographic:
https://basemap.nationalmap.gov/arcgis/rest/services/USGSTopo/MapServer/WMTS/1.0.0/WMTSCapabilities.xml

US Census Bureau TIGERweb WMS (US only) see their website for older vintages

Current TIGER features:
https://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WMSServer 

Current physical features:
https://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_PhysicalFeatures/MapServer/WMSServer

Dataset Roundup: A Summary of Specialized Open Data Sources

I list the top free GIS data sources that I consistently use on my Resources page; these are general, foundational sources that can be used for many applications. In this post I’m going to summarize an eclectic mix of more specialized resources that I’ve used or that have been recommended to me over this past year. I’ve categorized these into GIS datasets, sub-national population data for countries (tabular data that can be joined to GIS vector layers), and historic socio-economic data for countries.

Geospatial Data

North American Land Change Monitoring System

Published by the Commission for Environmental Cooperation, these land use and land cover rasters (see photo at the top of this post) are derived from MODIS imagery at 250 meter resolution for earlier years and either Landsat-7 or RapidEye imagery at 30 meter resolution for later years for Canada, the United States, and Mexico in 2005, 2010, and 2015. There are layers for both land cover and land cover change over a 5-year period. Land cover is classified into 19 categories based on UN FAO standards. It’s easy to download as the layer is unified (no individual tiles to mess with and stitch together) and for the 2015 series you can choose a national file or one for the entire continent.

PRISM Climate Data

Published by the Northwest Alliance for Computational Science & Engineering at Oregon State University, the PRISM Climate Group publishes climate data for the United States. You can generate daily, monthly, or 30-year normal rasters for temperature (min, max, mean), precipitation, dew point, and a few other measures for the continental US. There are also some prepackaged files that were created for special projects that cover Alaska, Hawaii, and some of the US territories. The site is very easy to use (certainly compared to other sites that provide climate data) and beyond its research applications the data is good for teaching purposes, as files are straightforward to create, download, and interpret.

Marineregions.org Marine Boundaries

I usually help people find vector boundaries for terrestrial features, and the oceans are an afterthought that appear as the absence of land. But what if you specifically needed features that represent oceans and seas? Marineregions.org, maintained by the Flanders Marine Institute, provides many sets of water-based boundaries that include maritime regions (legal sea zones around countries) as well as polygons that represent the boundaries of the oceans and largest seas (IHO Sea Areas, defined by the International Hydrographic Association). See the screenshot of this layer in QGIS below.

GNSS Time Series

Produced by NASA JPL, this dataset can be used for measuring vertical land movement (VLM) and subsistence, primarily due to movement of the earth’s tectonic plates. The dataset contains over 2,000 GPS observation points or stations; the majority are in the US but there are a scattering of points throughout the world. The data file for geodetic positions and velocities contains two records for every station: the POS (position) record provides data for the latitude (N), longitude (E), and elevation (V) in mm. The VEL (velocity) indicates the rate of movement over the time period by direction (N / E) and elevation. The last three columns for both sets of records are margins of error for each value. The data file is in a fixed-width text format. To use it in GIS you need to parse the data into a tabular format and drop the header information. When plotting the coordinates, the CRS for the geodetic file is IGS14 (EPSG code 9019). If your CRS library doesn’t include this system, it is roughly equivalent to ITRF2014 (EPSG code 7789).

Subnational Population Data

IPUMS Terra

Are you looking for population or socio-economic data for the first-level administrative divisions (states, provinces, departments, districts, etc) for many different countries? IPUMS Terra is part of the IPUMS series at the Minnesota Population Center, Univ of Minnesota. The data has been gathered from census and statistical agencies of individual countries, or in some cases from estimates generated by the project. Choose the "Create Your Custom Dataset" option, then on the next screen choose "Start Extract Area Level Output". On the Extract Builder (see pic below) choose variables on the left, like Demographic and Total Population. Then under Datasets on the right you can choose countries and filter by year. Once you move on to the next screen, you can choose to harmonize the output or choose specific years, and choose your administrative level: national, ADM-1, or smallest available. You must register to use the IPUMS data series, but registration is free for educational and non-commercial use (as long as you cite IPUMS as the source).

Subnational Human Development Index

An alternative for first-level admin data is the Subnational Human Development Index published by the GlobalDataLab at the Institute for Management Research at Radboud University. There are far fewer variables and less customization compared to IPUMS Terra, but as such the site is smaller and easier to use. There are several different indices for measuring human development, but you can also access the following indicators: life expectancy, GNI per capita, expected and mean years of schooling, and population size in millions.

Historic Global Population and Economic Data

Maddison Project

Yes, that’s Maddison with two "ds". This project from the Groningen Growth and Development Centre at the University of Groningen generates comparative economic growth, income, and population data for countries over a long historical time span; back to the year AD 1 in a few cases, but for the most part from AD 1500 forward. They provide detailed documentation that explains how the dataset was created, and it’s easy to download in either an Excel or STATA format.

The World Countries Urban Population

This dataset consists of two spreadsheet files – one for the total urban population and another for the urban ratio of the population for countries going back to the year 1500. The dataset was created by Jonathan Fink-Jensen at Utrecht University and is held in the International Institute of Social History’s data repository. The repository contains a variety of other historic socio-economic datasets for many different countries.

Creating Lists of Country Admin Divisions with Geonames and Python

I’m working on a project where I needed to generate a list of all the administrative subdivisions (i.e. states / provinces, counties / departments, etc) and their ID codes for several different countries. In this post I’ll demonstrate how I accomplished this using the Geonames API and Python. Geonames (https://www.geonames.org/) is a gazetteer, which is a directory of geographic features that includes coordinates, variant place names, and identifiers that classify features by type and location. Geonames includes many different types of administrative, populated, physical, and built-environment features. Last year I wrote a post about gazetteers where I compared Geonames with the NGA Geonet Names Server, and illustrated how to download CSV files for all places within a given country and load them into a database.

In this post I’ll focus on using an API to retrieve gazetteer data. Geonames provides over 40 different REST API services for accessing their data, all of them well documented with examples. You can search for places by name, return all the places that are within a distance of a set of coordinates, retrieve all places that are subdivisions of another place, geocode addresses, obtain lists of centroids and bounding boxes, and much more. Their data is crowd sourced, but is largely drawn from a body of official gazetteers and directories published by various countries.

This makes it an ideal source for generating lists of administrative divisions and subdivisions with codes for multiple countries. This information is difficult to find, because there isn’t an international body that collates and freely provides it. ISO 3166-1 includes the standard country codes that most of the world uses. ISO 3166-2 includes codes for 1st-level administrative divisions, but ISO doesn’t freely publish them. You can find them from various sources; Wikipedia lists them and there are several gist and github repos with screen scraped copies. The US GNA is a more official source that includes both ISO 3166 1 and 2. But as far as I know there isn’t a solid source for codes below the 1st level divisions. Many countries create their own systems and freely publish their codes (i.e. ANSI FIPS codes in the US, INSEE COG codes in France), but that would require you to tie them altogether. GADM is the go to source for vector-based GIS files of country subdivisions (map at the top of this post for example). For some countries they include ISO division codes, but for others they don’t (they do employ the HASC codes from Statoids, but it’s not clear if these codes are still being actively maintained) .

Geonames to the rescue – you can browse the countries on the website to see the country and 1st level admin codes (see image below), but the API will give us a quick way to obtain all division levels. First, you have to register to get an API username – it’s free – and you’ll tack that username on to the end of your requests. That gives you 20k credits per day, which in most instances equates with 1 request per credit. I recommend downloading one of their prepackaged country files first, to give you a sense for how the records are structured and what attributes are available. A readme file that describes all of the available fields accompanies each download.

1st Level Admin Divisions from Geonames — 1st Level Admin Divisions for Dominica from the Geonames website

My goal was to get all administrative divisions – names and codes and how the divisions nest within each other – for all of the countries in the French-speaking Caribbean (countries that are currently, or formerly, overseas territories of France). I also needed to get place names as they’re written in French. I’ll walk through my Python script that I’ve pasted below.

import requests,csv
from time import strftime

ccodes=['BL','DM','GD','GF','GP','HT','KN','LC','MF','MQ','VC']
fclass='A'
lang='fr' 
uname='REQUEST FROM GEONAMES'

#Columns to keep
fields=['countryId','countryName','countryCode','geonameId','name','asciiName',
        'alternateNames','fcode','fcodeName','adminName1','adminCode1',
        'adminName2','adminCode2','adminName3','adminCode3','adminName4','adminCode4',
        'adminName5','adminCode5','lng','lat']
fcode=fields.index('fcode')

#Divisions to keep
divisions=['ADM1','ADM2','ADM3','ADM4','ADM5','PCLD','PCLF','PCLI','PCLIX','PCLS']

base_url='http://api.geonames.org/searchJSON?'

def altnames(names,lang):
    "Given a dict of names, extract preferred names for a given language"
    aname=''
    for entry in names:
        if 'isPreferredName' in entry.keys() and entry['lang']==lang:
            aname=entry.get('name')
        else:
            pass
    return aname

places=[]
tossed=[]

for country in ccodes:
    data_url = f'{base_url}?name=*&country={country}&featureClass={fclass}&lang={lang}&style=full&username={uname}'
    response=requests.get(data_url)
    data=response.json() #total retrieved and results in list of dicts
    gnames=response.json()['geonames'] #create list of dicts only
    gnames.sort(key=lambda e: (e.get('countryCode',''),e.get('fcode',''),
                               e.get('adminCode1',''),e.get('adminCode2',''),
                               e.get('adminCode3',''),e.get('adminCode4',''),
                               e.get('adminCode5','')))
    for record in gnames:
        r=[]
        for f in fields:
            item=record.get(f,'')
            if f=='alternateNames' and f !='': 
                aname=altnames(item,'en')
                r.append(aname)             
            else:
                r.append(item)             
        if r[fcode] in divisions: #keep certain admin divs, toss others
            places.append(r)
        else:
            tossed.append(r)
        
filetoday=strftime('%Y_%m_%d')
outfile='geonames_fwi_adm_'+filetoday+'.csv'
    
writefile=open(outfile,'w', newline='', encoding='utf8')
writer=csv.writer(writefile, delimiter=",", quotechar='"',quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(fields) #header row
writer.writerows(places)
writefile.close()
        
print(len(places),'records written to file',outfile)

First, I identify all of the variables I need: the two-letter ISO codes of the countries, a list of the Geonames attributes that I want to keep, the two-letter language code, and the specific feature type I’m interested in. There are different features codes classified with a single letter, and a number of subtypes below that. Feature class A is for records that represent administrative divisions, and within that class I needed records that represented the country as a whole (PCL codes) and its subdivisions (ADM codes). There are several different place name variables that include official names, short forms, and an ASCII form that only includes characters found in the Latin alphabet used in English. The language code that you pass into the url will alter these results, but you still have the option to obtain preferred place names from an alternate languages field. The admin codes I’m retrieving are the actual admin codes; you can also opt to retrieve the unique Geonames integer IDs for each admin level, if you wanted to use these for bridging places together (not necessary in my case).

There are a few different approaches for achieving this goal. I decided to use the Geonames full text search, where you search for features by name (separate APIs for working with hierarchies for parent and child entities are another option). I used an asterisk as a wildcard to retrieve all names, and the other parameters to filter for specific countries and feature classes. At the end of the base url I added JSON for the search; if you leave this off the records are returned as XML.

base_url='http://api.geonames.org/searchJSON?'

My primary for loop loops through each country, and passes the parameters into the data url to retrieve the data for that country: I pass in country code, feature class A, and French as the language for the place names. It took me a while to figure out that I also needed to add style=full to retrieve all of the possible info that’s available for a given record; the default is to capture a subset of basic information, which lacked the admin codes I needed.

data_url = f'{base_url}?name=*&country={country}&featureClass={fclass}&lang={lang}&style=full&username={uname}'

I use the Python Requests module to interact with the API. Geonames returns two objects in JSON: an integer count of the total records retrieved, and another JSON object that essentially represents a list of python dictionaries, where each dictionary contains all the attributes for a record as a series of key and value pairs where the key is the attribute name (see examples below). I create a new gnames variable to isolate just this list, and then I sort the list based on how I want the final output to appear; by country and by admin codes, so that like-levels of admin codes are grouped together. The trick of using lamba to sort nested lists or dictionaries is well documented, but one variation I needed was to use the dictionary get method. Some features may not have five levels of admin codes; if they don’t then there is no key for that attribute, and using the simple dict[key] approach returns an error in those cases. Using dict.get(key,”) allows you to pass in a default value if no key is present. I provide a blank string as a placeholder, as ultimately I want each record to have the same number of columns in the output and need the attributes to line up correctly.

Gnames List — Records returned from Geonames as a list, where each list item is a dictionary of key / value pairs for a given place

Example of an individual list item, a dictionary of key / value pairs for the Parish of Charlotte, a 1st order admin division of Saint-Vincent-et-les-Grenadines. Variable names are keys.

Once I have records for the first country, I loop through them and choose just the attributes that I want from my field list. The attribute name is the key, I get the associated value, but if that key isn’t present I insert an empty string. In most cases the value associated with a key is a string or integer, but in a few instance it’s another container, as in the case of alternate names which is another list of dictionaries. If there are alternate names I want to pull out a preferred name in English if one exists. I handle this with a function so the loop looks less cluttered. Lastly, if this record represents an admin division or is a country-level record then I want to keep it, otherwise I append it to a throw-away list that I’ll inspect later.

Once the records returned for that country have been processed, we move on to the next country and keep appending records to the main list of places (image below). When we’re done, we write the results out to a CSV file. I write the list of fields out first as my header row, and then the records follow.

Final list called places that contains records for all admin divisions for specific countries and feature classes, where items are sublists that represent each place

Overall I think this approach worked well, but there are some small caveats. A number of the countries I’m studying are not independent, but are dependencies of France. For dependent countries, their 1st and sometimes even 2nd level subdivision codes appear identical to their top-level country code, as they represent a subdivision of an independent country (many overseas territories are departments of France). If I need to harmonize these codes between countries I may have to adjust the dependencies. The alternate English places names always appear for the country-level record, but usually not below that. I think I’d need to do some additional tweaking or even run a second set of requests in English if I wanted all the English spellings; for example in French many compound place names like Saint-Paul are separated by a hyphen, but in English they’re separated by a space. Not a big deal in my case as I was primarily interested in the alternate spellings for countries, i.e. Guyane versus French Guiana. See the final output below for Guyane; these subdivision codes are from INSEE COG, which are the official codes used by the French government for identifying all geographic areas for both metropolitan France and overseas departments and collectivities.

1st half of CSV file imported into spreadsheet, records showing admin divisions of Guyane / French Guiana

2nd half of CSV file imported into spreadsheet, records showing admin codes and hierarchy of divisions for Guyane / French Guiana

Two final things to point out. First, my script lacks any exception handling, since my request is rather small and the API is reliable. If I was pulling down a lot of data I would replace my main for loop with a try and except block to handle errors, and would capture retrieved data as the process unfolds in case some problem forces a bail out. Second, when importing the CSV into a spreadsheet or database, it’s important to import the admin codes as text, as many of them have leading zeros that need to be preserved.

This example represents just the tip of the iceberg in terms of things you can do with Geonames APIs. Check it out!

At These Coordinates

Dispatches from the Geospatial Data World