We recently launched our fledgling geodata portal on GitHub for the open datasets we’ll create in our new lab. In the spring we carved out a space on the 11th floor of the Sciences Library at Brown which we’ve christened GeoData@SciLi, a GIS and data consultation and work space. We’ll be doing renovations on both the webspace and workspace over the summer.
Our inaugural dataset was created by Ethan McIntosh, a senior (now graduate) who began working with me this spring. The dataset is the United Nations International Civil Service Commission’s (UN ICSC) Retail Price Indices with Details (RPID). The index measures the cost of living based on several categories of goods and services in duty stations around the world. It’s used to adjust the salaries of the UN’s international staff relative to UN headquarters in New York City (index value of 100 = cost of living in New York). The data is updated six times a year, published in an Excel spreadsheet that contains a macro that allows you to look up the value of each duty station via a dropdown menu. The UN ICSC makes the data public by request; you register and are granted access to download the data in PDF and Excel format in files that are packaged in one month / year at a time.
We were working with a PhD student in economics who wanted to construct a time-series of this data. Ethan wrote a Python script to aggregate all of the files from 2004 to present into a single CSV; the actual values for each country / duty station were stored in hidden cells that the macro pulled from, and he was able to pull them from these cells. He parsed the data into logical divisions, and added the standard 3-letter ISO 3166 country code to each duty station so that each record now has a unique place identifier. His script generates three outputs: a basic CSV of the data in separate month / year files, a “long” (aka flat) time series file where each record represents a specific duty station and retail index category or weight for a given month and year, and a “wide” time series file where the category / weight has been pivoted to a column, so each record represents all values for a duty station for a given month / year. He’s written the program to process and incorporate additional files as they’re published.
While the primary intention was to study this data as a time series in a statistical analysis, it can also be used for geospatial analysis and mapping. Using the wide file, I created the map in the header of this post, which depicts the total retail index for February 2022 for each country, where the value represents the duty station within the country (usually the capital city). I grabbed some boundaries from Natural Earth and joined the data to it using the ISO code. I classified the data using natural breaks, but manually adjusted the top level category to include all countries with a value greater than or equal to the base value of 100.
There were only five duty stations that were more expensive than New York, with values between 102 and 124: Tokyo, Ashkhabad (Turkmenistan), Singapore, Beirut, and Hong Kong. Beijing and Geneva were equivalent in price at 100. The least expensive stations with values between 52 and 69 were: Caracas (Venezuela), Tripoli, Damascus, Ankara (Turkey), Bucharest (Romania), Mbabane (Eswatini – formerly Swaziland), and Sofia (Bulgaria). There appears to be regional clustering of like values, although I didn’t run any tests. The station in the US that’s measured relative to NYC is Washington DC (index value of 89).
The final datasets and code used to generate them are available on GitHub, and we’ll update it at least once, if not a couple times, a year. We are not providing the original month / year macro spreadsheets; if you want those you should register with the UN ICSC and access them there. If you’re using our data files, you should still register with them, as they would like to be aware of how their data is being used.
We will post additional projects, datasets, and code in individual repos as we create them, linked to from our main page. I’m working on creating a basic metadata profile for our lab, so we’ll provide structured metadata for each of our datasets in the near future.