qgis

OSM Web Feature Service

OpenStreetMap Data with ArcGIS Pro and QGIS

A couple years ago I wrote a post that demonstrated how to use the QuickOSM plugin for QGIS to easily extract features from the OpenStreetMap (OSM). The OSM is a great source for free and open GIS data, especially for types of features that are not captured in government sources, and for parts of the world that don’t possess a free or robust GIS data infrastructure. I’ve been using ArcGIS Pro more extensively in my new job and was wondering how I could do the same thing: query features from the OSM based on keys and values (denoting feature type) and geographic area and extract them as a vector layer. I’m looking for straightforward solutions that I could use for answering questions from students (so no command line tricks or database stuff). In this post I’ll cover three approaches for achieving this in ArcGIS Pro, with references to QGIS.

File Approach

The most straightforward method would be to export data directly from the main OSM page by zooming into an area and hitting the Export button. This is a pretty blunt approach, as you have to be zoomed in pretty close and you grab every possible feature in the view. The “native” file format of OSM is the osm / pbf format; .osm is an XML file while .pbf is a compressed binary version of the osm. QGIS is able to handle these files directly; you just add them as a vector layer. ArcGIS Pro cannot. You have to download and install a special Data Interoperability extension, which is an esoteric thing that’s not part of the standard package and requires a special license from your site license coordinator.

A better and more targeted approach is to download pre-created extracts that are provided by a number of organizations listed in the OSM wiki. I started with Geofabrik in Germany, as it was a source I recognized. They package OSM data by geographic area and feature type. On their main page they list files that contain all features for each of the continents. These are enormous files, and as such they are only provided in the osm pbf format as shapefiles can’t effectively handle data that size. Even if you downloaded the osm pbf files and added them to QGIS, the software will struggle to render something that big.

But all is not lost; Geofabrik and many other providers package data in a shapefile format for smaller areas, provided that the size and number of features is not too great. For instance, on Geofabrik’s download page if you click on North America you’re presented with country extracts for that continent (see images below). You can get shapefiles for Greenland and Mexico, but not Canada or the US as the files are still too big. Click on US, and you’re presented with files for each of the states. No luck for California (too big), but the rest of states are small enough that you can get shapefiles for all of them.

Geofabrik OSM data: download continents
Default Geofrabrik OSM download page for continents. Click on a continent name…
Geofabrik OSM data downloads: countries in North America
…to access files for countries. Click on a country name…
Geofabrik OSM data downloads: states of the US
…to access files for states / provinces / admin divisions

I downloaded and unzipped the file for Rhode Island. It contains a number of individual shapefiles classified by type of feature: buildings, land use, natural, places, places of worship (pofw), points of interest (pois), railways, roads, traffic, transport, water, and waterways. Many of the files appear twice: files with an “a” suffix represent polygons (areas) while files without that suffix are points or lines. Some OSM features are stored as polygons when such detail is available, while others are represented as points.

For example, if I add the two places of workship files to a map, for some features you have the outline of the actual building, while for most you simply have a point. After adding the layers to the map, you’ll probably want to use Select by Attribute to select the features you want based on OSM tags with keys and values, and Select by Location in conjunction with a separate boundary file to pull data out for a smaller area. The Geofabrik OSM attribute table is limited to basic attributes: an OSM ID, feature code and class, and name. It’s also likely that you’ll want to unify the point and polygon features of the same type into one layer, as they’re usually mutually exclusive. Use the Centroid (Polygon) tool in the toolbox to turn the polygons into points, and the Merge tool to meld the two point layers together. In QGIS the comparable tools under the Vector menu are Centroids and Merge Vector Layers. WGS 84 is the default CRS for the layers.

ArcGIS Pro with OSM Places of Worship from Geofabrik
OSM Places of Worship. Some features are stored as points while others are polygons

Geofabrik is just one option. There are several others and they take different approaches for structuring their extracts. For example, BBBike.org organizes their layers by city for over 200 cities around the world, and they provide a number of additional formats beyond OSM PBF and shapefiles, such as Garmin GPS, GeoJSON, and CSV. They divide the data into fewer files, and if they don’t compile data for the area you’re interested in you can use a web-based tool to create a custom extract.

Plugin Approach

It would be nice to use a plugin, as that would allow you to specify a custom geographic area and retrieve just the specific features you want. QuickOSM works quite nicely for QGIS. Fortunately there is a good ArcGIS Pro solution called OSMquery. It works for both Pro and Desktop, tested for Pro 2.2 and Desktop 10.6. I’m using Pro 2.7 and the basic tool worked fine. It’s well documented, with good instructions for installation and use.

The plugin is written in Python and you add it as a tool to your ArcToolbox. Download the repo from the OSMquery GitHub as a ZIP file (click the green code button and choose Download ZIP). Save it in or near your ArcGIS project folders, and unzip it. In Pro, go into a project and open a Catalog Pane in the View ribbon. Right click on Toolbox to add a new one, and browse to the folder you unzipped to add the tool. There are two scripts in the box, a basic and an advanced version. The basic tool functioned without trouble for me. The advanced tool threw an error, probably some Python dependency issue (I didn’t investigate as the basic tool met my needs).

In the basic tool you choose the key and value for the features you want to extract; the dropdown menu is automatically populated with these options. For the geographic extent you can enter a place name, or you can use the extent of the current map window or of a layer in the project, or you can manually type in bounding box coordinates. Another nice option is you can transform the CRS of the extracted features from WGS 84 to another system, so it matches the CRS of layers in your existing project. Run the tool, and the features are extracted. If the features exist as both points and polygons, you get two separate files for each. If you choose, you can merge them together as described in the previous section; this is a bit tougher as the plugin approach yields a much wider selection of fields in the attribute table, and not all of the point and polygon attributes align. With the Merge tool in Pro you can select which attributes you want to hold on to, and common ones will be merged. QGIS is a bit messier in this regard, but in my earlier post I outlined a work-around using a spatial database.

OSMquery tool in ArcGIS Pro
The basic OSMquery tool in an ArcGIS Pro toolbox

Web Feature Service

This initially seemed to be the most promising route, but it turned out to be a dud. Like QGIS, Pro allows you to add OSM as a tiled base map. But ESRI also offers OSM as a web feature service: by hitting Add Data on the Map ribbon and searching the Living Atlas for “OpenStreetMap” you can select from a number of OSM web feature services, organized by continent and feature type. Once you add them to a map, you can select and click on individual features to see their name and feature type. The big problem is that you are not allowed to extract features from these layers, which leaves you with an enormous and heterogeneous mix of features for an entire continent. You can interact with the features, selecting by attribute and location in reference to other spatial layers, but that’s about it.

OSM web feature service in ArcGIS Pro
OSM web feature service in ArcGIS Pro

In Summary

I would recommend taking the step of downloading the OSMquery plugin for ArcGIS Pro if you want to take a highly targeted approach to OSM feature extraction (for QGIS users, enable the QuickOSM plugin). This approach is also best if you can’t download a pre-existing extract for your area because it’s too large or has too many features, and if you want to access the fullest possible range of attribute values. Otherwise, you can simply download one of the pre-created extracts, and use your software to winnow it down to what you need (or if you do need everything, the file approach makes more sense). Since the file-based option includes fewer attributes, converting polygon features to points and merging them with the other point features is a bit simpler.

Brown University on OpenTopoMap

A New Year and a New Start

I have some news! After 13 1/2 years, January 31, 2021 will be my last day as the Geospatial Data Librarian at Baruch College, City University of New York (CUNY). On February 1, 2021, I will be the new GIS and Data Librarian at Brown University in Providence, Rhode Island!

It’s an exciting opportunity that I’m looking forward to. I will be building geospatial information and data services in the library from the ground up, in concert with many new colleagues. I will be working closely with the Population Studies Training Center (PSTC) and the Spatial Structures in Social Sciences (S4) as well as the Center for Digital Scholarship within the library. Several aspects of the position will be similar, as I will continue to provide research and consultation services, create research guides and tutorials, teach workshops, collect and create datasets, and eventually build and manage a data repository and small lab where we’ll provide services and peer mentor students.

The resources I’ve created at Baruch CUNY will remain accessible, and eventually a new person will take the reins. I have moved the latest materials for the GIS Practicum, my introductory QGIS tutorial and workshop, to this website and I hope to continue updating and maintaining this resource. There are a lot of people throughout CUNY that I’m going to miss, at: the Newman Library, the CUNY Institute for Demographic Research, the Weissman Center for International Business, the Marxe School, Baruch’s Journalism Department, the Geography Department at Lehman College, the Digital Humanities program and the CUNY Mapping Service at the CUNY Graduate Center, and many others.

I will continue writing posts and sharing tips and resources here based on my new adventures at Brown, but may need a little break as I transition… stay tuned!

Best – Frank

Stamen Watercolor Map Tiles

Adding Basemaps to QGIS With Web Mapping Services

For this final post of 2020, I was looking back through recent projects for something interesting yet brief; I’ve been writing some encyclopedia-length posts lately and wanted to keep this one on the lighter side. In that vein, I’ve decided to share a short list of free web mapping services that I use as basemaps in QGIS (they’ll work in ArcGIS too). This has been on my mind as I’ve recently stumbled upon the OpenTopoMap, which is an alternate stylized version of the OpenStreetMap that looks pretty sharp.

See this earlier post for details, but in short, to connect to these services in QGIS:

QGIS Browser Panel
  1. Select the appropriate web map service type in the browser panel (usually WMS / WMTS or XYZ Tiles), right click, and add new connection.
  2. Give it a meaningful name, paste the appropriate URL into the URL box, click OK.
  3. In the browser panel drill down to see the service, and for WMS / WMTS layers you can drill down further to see specific layers you can add.
  4. Select the layer and drag it into the window, or select, right click, and add the layer to the project.
  5. If the resolution looks off, right click on a blank area of the toolbar and check the Tile Scale Panel. Use this to adjust the zoom for the web map. If the scale bar is greyed out you’ll need to set the map window to the same CRS as the map service: select the layer in the panel, right click, and choose set CRS – set project CRS from layer.
  6. Some web layers may render slowly if you’re zoomed out to the full extent, or even not at all if they contain many features or are super detailed. Conversely, some layers may not render if you’re zoomed too far in, as tiles may not be available at that resolution. Experiment!

If you’re an ArcGIS user see these concise instructions for adding various tile layers. This isn’t something that I’ve ever done, as ArcGIS already has a number of accessible basemaps that you can add.

In the list below, links for the service name take you to either the website version of the service, or to a list of additional layers that you can connect to. The URLs that follow are the actual connections to the service that you’ll use within your GIS package. If you use OSM, OTP, or Stamen in your maps, make sure to cite them (they use Creative Commons Licenses – follow links to their websites for details). The government sources are public domain, but you should still cite them anyway. Happy mapping, and happy holidays!

OpenStreetMap XYZ Tile (global)

http://tile.openstreetmap.org/{z}/{x}/{y}.png

OpenTopoMap XYZ Tile (global)

https://tile.opentopomap.org/{z}/{x}/{y}.png

Stamen XYZ Tile (global) see their website for examples; the image topping this post is from watercolor

http://tile.stamen.com/terrain/{z}/{x}/{y}.png
http://tile.stamen.com/toner/{z}/{x}/{y}.png
http://tile.stamen.com/watercolor/{z}/{x}/{y}.jpg

USGS National Map WMTS (global, but fine detail is US only)

Imagery:
https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/WMTS/1.0.0/WMTSCapabilities.xml

Imagery & Topo:
https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryTopo/MapServer/WMTS/1.0.0/WMTSCapabilities.xml

Shaded Relief: 
https://basemap.nationalmap.gov/arcgis/rest/services/USGSShadedReliefOnly/MapServer/WMTS/1.0.0/WMTSCapabilities.xml

Topographic:
https://basemap.nationalmap.gov/arcgis/rest/services/USGSTopo/MapServer/WMTS/1.0.0/WMTSCapabilities.xml

US Census Bureau TIGERweb WMS (US only) see their website for older vintages

Current TIGER features:
https://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WMSServer 

Current physical features:
https://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_PhysicalFeatures/MapServer/WMSServer

QGIS 3.10 Screenshot

QGIS 3.10 Tutorial Workbook

I just posted an updated version of my QGIS tutorial / workbook manual, Introduction to GIS Using Open Source Software. This version was written for QGIS 3.10 A Coruña, which recently superseded QGIS 3.4 Madeira as the current Long Term Release (LTR). The LTR is intended to be more stable than the current releases and is supported for at least a year.

The workbook was designed to accompany a day-long introductory workshop that I teach and is divided into five chapters. Chapter 1 is a broad and concise overview of GIS, chapters 2 to 4 are hands-on exercises that cover: the basics of using the interface and the difference between vectors and rasters (chapter 2), a site selection analysis that demonstrates geoprocessing, spatial selection, table joins, coordinate plotting, expressions, and spatial analysis (chapter 3), and a thematic mapping example that illustrates coordinate reference systems (CRS), data classification, and mapping (chapter 4). Chapter 5 summarizes data sources and resources for learning more about GIS. In chapters 2 to 4 the steps for doing the exercises are kept concise with many screenshots, while detailed commentary explaining how everything works follows.

The manual and tutorial data are freely available for personal and classroom use under a Creative Commons license. I’m providing the material for both 3.10 A Coruna and 3.4 Madeira for now, but will take down the latter at the end of the spring semester (late May 2020).

The changes between 3.4 and 3.10 are not dramatic as far as the basic tools and principles that I cover in the book go, but I thought an update was worthwhile as there are just enough changes that could trip up new users (see the 3.10 visual change log for a full list of software updates).  In addition to incorporating changes to the interface, I also took the opportunity to tighten up and condense the material. In particular, I consolidated the coordinate reference system (CRS) exercises in chapter 4 from two sections to one, because in practice I found that it was overkill for a one-day session.

Here are a few noteworthy changes to the tutorial and software that impact novice users:

  1. The default setting for the toolbar buttons is rather small, so during the setup phase in chapter 2 I inserted an optional step to make them bigger. Go to: Settings > Options > General tab, and under the Application section change the icon size from 16 to 24.
  2. In 3.10, when new files are generated from geoprocessing operations and added to your project, the layers appear in the layers panel with the name you give them. In 3.4 they were assigned generic aliases like “Clipped” and “Buffer” based on the process you ran.
  3. In 3.10 the “Quantiles” classification scheme has been replaced with “Equal Counts”. Same scheme, different terminology.
  4. There’s now a dedicated north arrow button in the map layout screen. In 3.4 and earlier versions you added an arrow by selecting the add image button.
  5. In 3.10, every time you add a layer with a CRS that doesn’t match the existing CRS of the window, you’re presented with a datum transformation screen to modify the file you’re adding. This is a helpful warning if you already have existing layers in your project that match the window and your new file doesn’t, but it’s annoying when you’re trying to add files to a blank window in a new project. You can turn this feature off under: Options > Settings > CRS tab, under Default Datum Transformations uncheck the box for Ask for datum transformation.

It’s hard to believe that this is the 10th edition I’ve published in the past ten years. QGIS has certainly come a long way during that time. For a trip down memory lane, look at the 1st edition I wrote for QGIS 1.5 Tethys in 2011! Back then I wrote the whole thing in HTML… thankfully I “discovered” LaTeX a year later, and have used it for writing tutorials ever since.

If you wanted to learn GIS in general and QGIS in particular, spend a day with the manual and work through the exercises and you’ll have a good foundation. All the basics are there, as well as best practices and the “gotchas” that tend to trip people up.

Percentage of Children in Households Without the Internet

Kids with No Internet at Home: Data Processing for US Census Mapping

In this post I’ll demonstrate some essential data processing steps prior to joining census American Community Survey (ACS) tables downloaded from data.census.gov to TIGER shapefiles, in order to create thematic maps. I thought this would be helpful for students in my university who are now doing GIS-related courses from home, due to COVID-19. I’ll illustrate the following with Excel and QGIS: choosing an appropriate boundary file for making your map, manipulating geographic id codes (GEOIDs) to insure you can match data file to shapefile, prepping your spreadsheet to insure that the join will work, and calculating new summaries and percent totals with ACS formulas. Much of this info is drawn from the chapters in my book that cover census geography (chapter 3), ACS data (chapter 6), and GIS (chapter 10). I’m assuming that you already have some basic spreadsheet, GIS, and US census knowledge.

For readers who are not interested in the technical details, you still may be interested in the map we’ll create in this example: how many children under 18 lack access to a computer with internet access at home? With COVID-19 there’s a sudden expectation that all school children will take classes remotely from home. There are 73.3 million children living in households in the US, and approximately 9.3 million (12.7%) either have no computer at home, or have a computer but no internet access. The remaining children have a computer with either broadband or dial-up at home. Click on the map below to explore the county distribution of the under 18 population who lack internet access at home, or follow this link: https://arcg.is/0TrGTy.

arcgis_webmap

Click on the Map to View Full Screen and Interact

Preliminaries

First, we need to get some ACS data. Read this earlier post to learn how to use data.census.gov (or for a shortcut download the files we’re using here). I downloaded ACS table B28005 Age by Presence of a Computer and Types of Internet Subscription in Household at the county-level. This is one of the detailed tables from the latest 5-year ACS from 2014-2018. Since many counties in the US have less than 65,000 people, we need to use the 5-year series (as opposed to the 1-year) to get data for all of them. The universe for this table is the population living in households; it does not include people living in group quarters (dormitories, barracks, penitentiaries, etc.).

Second, we need a boundary file of counties. You could go to the TIGER Line Shapefiles, which provides precise boundaries of every geographic area. Since we’re using this data to make a thematic map, I suggest using the Cartographic Boundary Files (CBF) instead, which are generalized versions of TIGER. Coastal water has been removed and boundaries have been smoothed to make the file smaller and less detailed. We don’t need all the detail if we’re making a national-scale map of the US that’s going on a small screen or an 8 1/2 by 11 piece of paper. I’m using the medium (5m) generalized county file for 2018. Download the files, put them together in a new folder on your computer, and unzip them.

TIGER Line shapefile

TIGER Line shapefile

CBF shapefile

CBF shapefile

GEOIDs

Downloads from data.census.gov include three csv files per table that contain: the actual data (data_with_overlays), metadata (list of variable ids and names), and a description of the table (table_title). There are some caveats when opening csv files with Excel, but they don’t apply to this example (see addendum to this post for details). Open your csv file in Excel, and save it as an Excel workbook (don’t keep it in a csv format).

The first column contains the GEOID, which is a code that uniquely identifies each piece of geography in the US. In my file, 0500000US28151 is the first record. The part before ‘US’ indicates the summary level of the data, i.e. what the geography is and where it falls in the census hierarchy. The 050 indicates this is a county. The part after the ‘US’ is the specific identifier for the geography, known as an ANSI / FIPS code: 28 is the state code for Mississippi, and 151 is the county code for Washington County, MS. You will need to use this code when joining this data to your shapefile, assuming that the shapefile has the same code. Will it?

That depends. There are two conventions for storing these codes; the full code 0500000US28151 can be used, or just the ANSI / FIPS portion, 28151. If your shapefile uses just the latter (find out by adding the shapefile in GIS and opening its attribute table), you won’t have anything to base the join on. The regular 2018 TIGER file uses just the ANSI / FIPS, but the 2018 CBF has both the full GEOID and the ANSI FIPS. So in this case we’re fine, but for the sake of argument if you needed to create the shorter code it’s easy to do using Excel’s RIGHT formula:

Excel formula: RIGHT

The formulas RIGHT, LEFT, and MID are used to return sub-strings of text

The formulas reads X characters from the right side of the value in the cell you reference and returns the result. You just have to count the number of characters up to the “S’ in the “US”. Copy and paste the formula all the way down the column. Then, select the entire column, right click and chose copy, select it again, right click and choose Paste Special and Values (in Excel, the little clipboard image with numbers on top of it). This overwrites all the formulas in the column with the actual result of the formula. You need to do this, as GIS can’t interpret your formulas. Put some labels in the two header spaces, like GEO_ID2 and id2.

Excel: Paste Special

Copy a column, and use Paste Special – Values on top of that column to overwrite formulas with values

Subsets and Headers

It’s common that you’ll download census tables that have more variables than you need for your intended purpose. In this example we’re interested in children (people under 18) living in households. We’re not going to use the other estimates for the population 18 to 64 and 65 and over. Delete all the columns you don’t need (if you ever needed them, you’ve got them saved in your csv as a backup).

Notice there are two header rows: one has a variable ID and the other has a label. In ACS tables the variables always come in pairs, where the first is the estimate and the second is the margin of error (MOE). For example, in Washington County, Mississippi there are 46,545 people living in households +/- 169. Columns are arranged and named to reflect how values nest: Estimate!!Total is the total number of people in households, Estimate!!Total!!Under 18 years is the number people under 18 living in households, which is a subset of the total estimate.

The rub here is that we’re not allowed to have two header rows when we join this table to our shapefile – we can only have one. We can’t keep the labels because they’re too long – once joined, the labels will be truncated to 10 characters and will be indistinguishable from each other. We’ll have to delete that row, leaving us with the cryptic variable IDs. We can choose to keep those IDs – remember we have a separate metadata csv file where we can look up the labels – or we can rename them. The latter is feasible if we don’t have too many. If you do rename them, you have to keep them short, no more than 10 characters or they’ll be truncated. You can’t use spaces (underscores are ok), any punctuation, and can’t begin variables names with a number. In this example I’m going to keep the variable IDs.

Two odd gotchas: first, find the District of Columbia in your worksheet and look at the MOE for total persons in households (variable 001M). There is a footnote for this value, five asterisks *****. Replace it with a zero. Keep an eye out for footnotes, as they wreak havoc. If you ever notice that a numeric column gets saved as text in GIS, it’s probably because there’s a footnote somewhere. Second, change the label for the county name from NAME to GEO_NAME (our shapefile already has a column called NAME, and it will cause problems if we have duplicates). If you save your workbook now, it’s ready to go if you want to map the data in it. But in this example we have some more work to do.

Create New ACS Values

We want to map the percentage of children that do not have access to either a computer or the internet at home. In this table these estimates are distinct for children with a computer and no internet (variable 006), and without a computer (variable 007). We’ll need to aggregate these two. For most thematic maps it doesn’t make sense to map whole counts or estimates; naturally places that have more people are going to have more computers. We need to normalize the data by calculating a percent total. We could do this work in the GIS package, but I think it’s easier to use the spreadsheet.

To calculate a new estimate for children with no internet access at home, we simply add the two values together (006_E and 007_E). To calculate a new margin of error, we take the square root of the sum of the squares for the MOEs that we’re combining (006_M and 007_M). We also use the ROUND formula so our result is a whole number. Pretty straightforward:

Excel Sum of Squares

When summing ACS estimates, take the square root of the sum of the squares for each MOE to calculate a MOE for the new estimate.

To calculate a percent total, divide our new estimate by the number of people under 18 in households (002_E). The formula for calculating a MOE for a percent total is tougher: square the percent total and the MOE for the under 18 population (002_M), multiply them, subtract that result from the MOE for the under 18 population with no internet, take the square root of that result and divide it by the under 18 population (002_E):

MOE for percentage

The formula for calculating the MOE for a proportion includes: the percentage, MOE for the subset population (numerator), and the estimate and MOE for the total population (denominator)

In Washington County, MS there are 3,626 +/- 724 children that have no internet access at home. This represents 29.4% +/- 5.9% of all children in the county who live in a household. It’s always a good idea to check your math: visit the ACS Calculator at Cornell’s Program for Applied Demographics and punch in some values to insure that your spreadsheet formulas are correct.

You should scan the results for errors. In this example, there is just one division by zero error for Kalawao County in Hawaii. In this case, replace the formula with 0 for both percentage values. In some cases it’s also possible that the MOE proportion formula will fail for certain values. Not a problem in our example, but if it does the solution is to modify the formula for the failed cases to calculate a ratio instead. Replace the percentage in the formula with the ratio (the total population divided by the subset population) AND change the minus sign under the square root to a plus sign.

Some of these MOE’s look quite high relative to the estimate. If you’d like to quantify this, you can calculate a coefficient of variation for the estimate (not the percentage). This formula is straightforward: divide the MOE by 1.645, divide that result by the estimate, and multiply by 100:

Calculate coefficient of variation

A CV can be used to gauge the reliability of an estimate

Generally speaking, a CV value between 0-15 indicates that as estimate is highly reliable, 12-34 is of medium reliability, and 35 and above is low reliability.

That’s it!. Make sure to copy the columns that have the formulas we created, and do a paste-special values over top of them to replace the formulas with the actual values. Some of the CV values have errors because of division by zero. Select the CV column and do a find and replace, to find #DIV/0! and replace it with nothing. Then save and close the workbook.

For more guidance on working with ACS formulas, take a look at this Census Bureau guidebook, or review Chapter 6 in my book.

Add Data to QGIS and Join

In QGIS, we select the Data Source Manager buttonQGIS Data Source Manager, and in the vector menu add the CBF shapefile. All census shapefiles are in the basic NAD83 system by default, which is not great for making a thematic map.  Go to the Vector Menu – Data Management Tools – Reproject Layer. Hit the little globe beside Target CRS. In the search box type ‘US National’, select the US National Atlas Equal Area option in the results, and hit OK. Lastly, we press the little ellipses button beside the Reprojected box, Save to File, and save the file in a good spot. Hit Run to create the file.

In the layers menu, we remove the original counties file, then select the new one (listed as Reprojected), right click, Set CRS, Set Project CRS From Layer. That resets our window to match the map projection of this layer. Now we have a projected counties layer that looks better for a thematic map. If we right click the layer and open its attribute table, we can see that there are two columns we could use for joining: AFFGEOID is the full census code, and GEOID is the shorter ANSI / FIPS.

Hit the Data Source Manager button again, stay under the vector menu, and browse to add the Excel spreadsheet. If our workbook had multiple sheets we’d be prompted to choose which one. Close the menu and we’ll see the table in the layers panel. Open it up to insure it looks ok.

To do a join, select the counties layer, right click, and choose properties. Go to the Joins tab. Hit the green plus symbol at the bottom. Choose the spreadsheet as the join layer, GEO_ID as the join field in the spreadsheet, and AFFGEOID as the target field in the counties file. Go down and check Custom Field Name, and delete what’s in the box. Hit OK, and OK again in the Join properties. Open the attribute table for the shapefile, scroll over and we should see the fields from the spreadsheet at the end (if you don’t, check and verify that you chose the correct IDs in the join menu).

QGIS Join Menu

QGIS Map

We’re ready to map. Right click the counties and go to the properties. Go to the Symbology tab and flip the dropdown from Single symbol to Graduated. This lets us choose a Column (percentage of children in households with no internet access) and create a thematic map. I’ve chosen Natural Breaks as the Mode and changed the colors to blues. You can artfully manipulate the legend to show the percentages as whole numbers by typing *100 in the Column box beside the column name, and adding a % at the end of the Legend format string. I also prefer to alter the default settings for boundary thickness: click the Change button beside Symbol, select Simple fill, and reduce the width of the boundaries from .26 to .06, and hit OK.

QGIS Symbology Menu

There we have a map! If you right click on the counties in the layers panel and check the Show Feature Count box, you’ll see how many counties fall in each category. Of course, to make a nice finished map with title, legend, and inset maps for AK, HI, and PR, you’d go into the Print Layout Manager. To incorporate information about uncertainty, you can add the county layer to your map a second time, and style it differently – maybe apply crosshatching for all counties that have a CV over 34. Don’t forget to save your project.

QGIS Map

Percentage of Children in Households without Internet Access by County 2014-2018

How About that Web Map?

I used my free ArcGIS Online account to create the web map at the top of the page. I followed all the steps I outlined here, and at the end exported the shapefile that had my data table joined to it out as a new shapefile; in doing so the data became fused to the new shapefile. I uploaded the shapefile to ArcGIS online, chose a base map, and re-applied the styling and classification for the county layer. The free account includes a legend editor and expression builder that allowed me to show my percentages as fractions of 100 and to modify the text of the entries. The free account does not allow you to do joins, so you have to do this prep work in desktop GIS. ArcGIS Online is pretty easy to learn if you’re already familiar with GIS. For a brief run through check out the tutorial Ryan and I wrote as part of my lab’s tutorial series.

Addendum – Excel and CSVs

While csv files can be opened in Excel with one click, csv files are NOT Excel files. Excel interprets the csv data (plain text values separated by commas, with records separated by line breaks) and parses it into rows and columns for us. Excel also makes assumptions about whether values represents text or numbers. In the case of ID codes like GEOIDs or ZIP Codes, Excel guesses wrong and stores these codes as numbers. If the IDs have leading zeros, the zeros are dropped and the codes become incorrect. If they’re incorrect, when you join them to a shapefile the join will fail. Since data.census.gov uses the longer GEOID this doesn’t happen, as the letters ‘US’ are embedded in the code, which forces Excel to recognize it as text. But if you ever deal with files that use the shorter ANSI / FIPS you’ll run into trouble.

Instead of clicking on csvs to open them in Excel: launch Excel to a blank workbook, go to the data ribbon and choose import text files, select your csv file from your folder system, indicate that it’s a delimited text file, and select your ID column and specify that it’s text. This will import the csv and save it correctly in Excel.

Map of Windham High Peak hike

From Survey Markers to GPS Coordinates

Here’s a fun post to close out the year. During GIS-based research consultations, I often help people understand the importance of coordinate reference systems (or spatial reference systems if you prefer, aka “map projections”). These systems essentially make GIS “work”; they are standards that allow you to overlay different spatial layers. You transform layers from one system to another in order to get them to align, perform specific operations that require a specific system, or preserve one aspect of the earth’s properties for a certain analysis you’re conducting or a map you’re making.

Wrestling with these systems is a conceptual issue that plays out when dealing with digital data, but I recently stumbled across a physical manifestation purely by accident. During the last week of October my wife and I rented a tiny home up in the Catskill Mountains in NY State, and decided to go for a day hike. The Catskills are home to 35 mountains known collectively as the Catskill High Peaks, which all exceed 3,500 feet in elevation. After consulting a thorough blog on upstate walks and hikes (Walking Man 24 7), we decided to try Windham High Peak, which was the closest mountain to where we were staying. We were rewarded with this nice view upon reaching the summit:

View from Summit of Windham High Peak

While poking around on the peak, we discovered a geodetic survey marker from 1942 affixed to the face of a rock. These markers were used to identify important topographical features, and to serve as control points in manual surveying to measure elevation; this particular marker (first pic below) is a triangulation marker that was used for that purpose. It looks like a flat, round disk, but it’s actually more like the head of a large nail that’s been driven into the rock. A short distance away was a second marker (second pic below) with a little arrow pointing toward the triangulation marker. This is a reference marker, which points to the other marker to help people locate it, as dirt or shrubbery can obscure the markers over time. Traditional survey methods that utilized this marker system were used for creating the first detailed sets of topographic maps and for establishing what the elevations and contours were for most of the United States. There’s a short summary of the history of the marker’s here, and a more detailed one here. NOAA provides several resources for exploring the history of the national geodetic system.

Triangulation Survey Marker

Triangulation Survey Marker

Reference Survey Marker

Reference Survey Marker

When we returned home I searched around to learn more about them, and discovered that NOAA has an app that allows you to explore all the markers throughout the US, and retrieve information about them. Each data sheet provides the longitude and latitude coordinates for the marker in the most recent reference system (NAD 83), plus previous systems that were originally used (NAD 27), a detailed physical description of the location (like the one below), and a list of related markers. It turns out there were two reference markers on the peak that point to the topographic one (we only found the first one). The sheet also references a distant point off of the peak that was used for surveying the height (the azimuth mark). There’s even a recovery form for submitting updated information and photographs for any markers you discover.

NA2038’DESCRIBED BY COAST AND GEODETIC SURVEY 1942 (GWL)
NA2038’STATION IS ON THE HIGHEST POINT AND AT THE E END OF A MOUNTAIN KNOWN
NA2038’AS WINDHAM HIGH PEAK. ABOUT 4 MILES, AIR LINE, ENE OF HENSONVILLE
NA2038’AND ON PROPERTY OWNED BY NEW YORK STATE. MARK, STAMPED WINDHAM
NA2038’1942, IS SET FLUSH IN THE TOP OF A LARGE BOULDER PROTRUDING
NA2038’ABOUT 1 FOOT, 19 FEET SE OF A LONE 10-INCH PINE TREE. U.S.
NA2038’GEOLOGICAL SURVEY STATION WINDHAM HIGH PEAK, A DRILL HOLE IN A
NA2038’BOULDER, LOCATED ON THIS SAME MOUNTAIN WAS NOT RECOVERED.

For the past thirty plus years or so we’ve used satellites to measure elevation and topography.  I used my new GPS unit on this hike; I still chose a simple, bare-bones model (a Garmin eTrex 10), but it was still an upgrade as it uses a USB connection instead of a clunky serial port. The default CRS is WGS 84, but you can change it to NAD 83 or another geographic system that’s appropriate for your area. By turning on the tracking feature you can record your entire route as a line file. Along the way you can save specific points as way points, which records the time and elevation.

Moving the data from the GPS unit to my laptop was a simple matter of plugging it into the USB port and using my operating system’s file navigator to drag and drop the files. One file contained the tracks and the other the way points, stored in a Garmin format called a gpx file (a text-based XML format). While QGIS has a number of tools for working with GPS data, I didn’t need to use any of them. QGIS 3.4 allows you to add gpx files as vector files. Once they’re plotted you can save them as shapefiles or geopackages, and in the course of doing so reproject them to a projected coordinate system that uses meters or feet. I used the field calculator to add a new elevation column to the way points to calculate elevation in feet (as the GPS recorded units in meters), and to modify the track file to delete a line; apparently I turned the unit on back at our house and the first line connected that point to the first point of our hike. By entering an editing mode and using the digitizing tool, I was able to split the features, delete the segments that weren’t part of the hike, and merge the remaining segments back together.

Original plot with line mistake

Original way points and track plotted in QGIS, with erroneous line

Using methods I described in an earlier post, I added a USGS topo map as a WMTS layer for background and modified the symbology of the points to display elevation labels, and voila! We can see all eight miles of our hike as we ascended from a base of 1,791 to a height of 3,542 feet (covering 1,751 feet from min to max). We got some solid exercise, were rewarded with some great views, and experienced a mix of old and new cartography. Happy New Year – I hope you have some fun adventures in the year to come!

Map of Windham High Peak hike

Stylized way points with elevation labels and track displayed on top of USGS topo map in QGIS