climate

Raster Temperature Jan 1, 2020 Southern NE

Summarizing Raster Data for Areas and Assigning Values to Points

It’s been a busy few months, but I have a few days to catch my breath now that it’s spring break and most people (except me) have gone away! One question that’s come up quite a bit this semester is how to associate raster data with coinciding vector data. I’ll summarize some approaches in this post using ArcGIS Pro and QGIS, to summarize raster values for polygons (zonal statistics) and to assign raster values to points (aka raster sampling).

Zonal Statistics: Summarize Rasters by Area

Imagine that you have quantitative values such as temperature or a vegetation index in a raster grid, and you want to use this data to calculate an average for counties or metro areas. The goal is to have a new attribute column in the vector layer that contains the summarized raster value, perhaps because you want to make thematic maps of that value, or you want to use it in conjunction with other variables to run spatial statistics, or you just want a plain and simple summary for given places.

The term zonal statistics is used to define any operation that calculates statistics on cell values of a raster within an area or zone defined by another dataset, either a raster or a vector. The ArcGIS Pro toolbox has a Zonal Statistics tool where the output is a new raster file with cells that are summarized by the input zones. That’s not desirable for the use case I’m presenting here; the better choice is the Zonal Statistics as Table tool. The output is a table containing the unique identifiers of the raster and vector, the summary stats you’ve generated (average, sum, min, max, etc), and a count of the number of cells used to generate the summary. You can join this resulting table back to the vector file using their common unique identifier in a table join.

In the example below, I’m using counties from the census TIGER files for southern New England as my Input Feature Zone, the AFFGEOID (Census ANSI / FIPS code) to identify the Zone Field, and a temperature grid for January 1, 2020 from PRISM as the Input Value Raster. I’m calculating the mean temperature for the counties on that day.

ArcGIS Zonal Statistics as Table Tool
ArcGIS Pro Zonal Statistics as Table; Temperature Grid and Southern New England Counties

The output table consists of one record for each zone / county, with the count of the cells used to create the average, and the mean temperature (in degrees Celsius). This table can be joined back to the original vector feature (select the county feature in the Contents, right click, Joins and Relates – Join) to thematically map the average temp.

ArcGIS Zonal Statistics Result
ArcGIS Pro Zonal Statistics; Table Output and Join to Show Average Temperature per County

In QGIS, this tool is simply called Zonal Statistics; search for it in the Processing toolbox. The vector with the zones is the Input layer, and the Raster layer is the grid with the values. By default the summary stats are the count, sum, and mean, but you can check the Statistics to calculate box to select others. Unlike ArcGIS, QGIS allows you to write output as a table or a new shapefile / geopackage, which carries along the feature geometry from the Input zones and adds the summaries, allowing you to skip the step of having to do a table join (if you opted to create a table, you could join it to the zones using the Joins tab under the Properties menu for the vector features).

QGIS Zonal Stats
QGIS Zonal Statistics

Extract Raster Values for Point Features

Zonal stats allows you to summarize raster data within a polygon. But what if you had point features, and wanted to assign each point the value of the raster cell that it falls within? In ArcGIS Pro, search the toolbox for the Extract Values to Points tool. You select your input points and raster, and a new point feature that will include the raster values. The default is to take the value for the cell that the point falls within, but there is an Interpolate option that will calculate the value from adjacent cells. The output point feature contains a new column called RASTERVALU. I created some phony point data and used it to generate the output below.

ArcGIS Extract Values to Points
ArcGIS Pro Extract Values to Points (assign raster cell values to points)

In QGIS the name of this tool is Sample raster values, which you can find in the Processing toolbox. Input the points, choose a raster layer, and write the output to a new vector point file. Unlike ArcGIS, there isn’t an option for interpolation from surrounding cells; you simply get the value for the cell that the point falls within. If you needed to interpolate, you can go to the Plugins menu, enable the SAGA plugin, and in the Processing toolbox try the SAGA tool Raster Values to Points instead.

QGIS Sample Raster Values
QGIS Sample Raster Values (assign raster cell values to points)

A variation on this theme would be to create and assign an average value around each point at a given distance, such as the average temperature within five miles. One way to achieve this would be to use the buffer tools in either ArcGIS or QGIS to create distinct buffers around each point at the specified distance. The buffer will automatically carry over all the attributes from the point features, including unique identifiers. Then you can run the zonal statistics tools against the buffer polygons and raster to compute the average, and if need be do a table join between the output table and the original point layer using their common identifier.

Wrap-up

In using any of these tools, it’s important to consider the resolution of the raster (i.e. the size of the grid cell):

1. Relative to the size of the zonal areas or number of points, and

2. In relation to the phenomena that you’re studying.

When larger grid cells or zonal areas are used for measurement, any phenomena becomes more generalized, and any variations within these large areas become masked. The temperature grid cells in this example had a resolution of 2.5 miles, which was suitable for creating county summaries. Summarizing data for census tracts at this resolution would be less ideal, as the tracts are much smaller than the cells, with the cell value characterizing a much larger area. This might be okay in the case of temperature, which tends not to vary considerably over a distance of a few miles. In contrast, averaging temperature data for states is not worthwhile, as states vary considerably in size and most are large enough that they contain multiple ecosystems and elevation levels.

The solutions I’ve described here are the desktop GIS solutions. You could also use either spatial SQL in a geodatabase or a spatial extension in a scripting language like Python or R to perform similar operations. In both cases a basic overlay and intersection statement is used, in conjunction with some grouping function for calculating summaries. I’ve been doing a lot more spatial Python work with geopandas these past few months – perhaps a topic for a subsequent post…

Hurricanes 2021

GIS Data for US Coastal Storms and Floods

Over the course of this academic year I’ve helped many students find GIS data related to coastal storms and flooding in the US. There’s a ton of data available, particularly from NOAA, but there are so many projects and initiatives that it can be tough to find what you’re looking for. So I’ll share a few key resources here.

NOAA’s DigitalCoast is a good place to start; it’s a catalog of federal, state, and US territory projects and websites that provide both spatial and non-spatial datasets related to coastal storms and flooding. You can filter by place and data type; there are even a few global sources. Most of the projects I mention below are cataloged there.

Given the size of many of these datasets, the ArcGIS File Geodatabase is often used for packaging and distribution. Once you’ve downloaded and unzipped one, it looks like a folder with lots of subfolders and files. If you’re an ArcGIS user, use the Catalog pane to browse your file system and add a connection to the database / folder to access its contents. If you’re a QGIS user, use the Data Manager and on the Vector tab change the source type from File to Directory. In the Source Type dropdown you can choose OpenFileGDB, and browse and select the database, which appears as a folder. Once you hit the Add button, you’ll be prompted to choose the features in the DB that you wish to add to the project.

Adding a File Geodatabase in QGIS
Adding a File Geodatabase in QGIS

FEMA Flood Hazards and Disasters

The FEMA flood maps are usually the first thing that comes to mind when folks set out to find data on flooding, but good luck finding their GIS data. I’ve searched through their main program site for the National Flood Hazard Layer and followed every link, but can’t for the life of me find the connection to the page that has actual GIS data; there are map viewer tools, scanned paper maps, web mapping services, and everything else under the sun.

If you want FEMA flood data in a GIS format: GO HERE! This is the record in data.gov for the National Flood Hazard Layer. The links at the bottom include this one: Download Seamless Nationwide NFHL GIS data. The data is packaged in an ArcGIS File Geodatabase, with one polygon feature class for flood zones. They’re categorized into 100 and 500 year zones, open water bodies, areas outside of flood zones, and areas outside flood zones protected by levees. The pic below illustrates 100 and 500 year zones overlaid on the OpenTopoMap.

FEMA Flood Maps. Light blue areas are 500 year zones, dark blue are 100 year
FEMA Flood Hazard Layer, 100 year zones in dark blue, 500 year in light blue

FEMA also has a GIS data feed for current and historical emergencies and disasters, that are available in a variety of formats both spatial and non-spatial. These are county-level layers that indicate where disaster areas were declared and what kind of funding or assistance is / was available.

NOAA Sea Level Rise

The FEMA maps assess both past events and current conditions to model the likelihood of flooding in a 100 or 500 year period for a major storm event. A different way of looking at flooding is to consider sea level rise due to climate change, where the impact of sea level rise is measured in different increments. Instead of the impact of a one-shot event, this illustrates potential long term change. NOAA’s Sea Level Rise (SLR) viewer allows you to easily visualize the impact of sea level rise in 1 foot increments, between 1 and 10 feet. You can download the data by US state or territory for coastal areas. There are separate downloads for sea level rise, rise depth, the confidence intervals for the models, as well as DEMs and flood frequency. The sea level rise data is package in an ArcGIS file geodatabase, with two sets of files (a low estimate and high estimate) in one foot increments. An example of 6 feet in sea level rise is shown below.

NOAA Sea Level Rise 6ft Layer
NOAA Sea Level Rise. Areas in pink illustrate sea level 6 feet higher than present

NOAA National Hurricane Center

Beyond showing the general impact of flooding or sea level rise, you can also look at the track of individual hurricanes and tropical storms. The National Hurricane Center’s GIS data page provides historical forecasts – the projected path and cone of storms, windspeeds, storm surges, etc. You choose your year, then can choose a storm, and then a particular day. You can use this data to see how the forecasts evolved as the storm moved. When we’re in hurricane season, you can also see what the circumstances are day by day for tracking new storms.

If you want to see what actually happened (as opposed to a forecast), you can dig through the data page and browse the different options. There’s the Tropical Cyclone Report (TCR) which provides “information on each tropical cyclone, including synoptic history, meteorological statistics, casualties and damages, and the post-analysis best track (six-hourly positions and intensities). Tropical cyclones include depressions, storms and hurricanes.” The default page shows you the Atlantic, but you can swap to Eastern or Central Pacific using the link at the top. Storms are listed alphabetically (and thus by date) and your format options are shapefile or KML. There’s a map at the bottom that depicts and labels all the storms for that season. You actually get four shapefiles in a download; a point file that contains a number of measurements, a line file for the storm track, a polygon file for the radius of the storm, and another polygon with the wind swath. The layers for 2021’s Tropical Storm Henri are illustrated below.

NOAA Tropical Cyclone Report Layers
Layers from NOAA”s NHC Tropical Cyclone Report, Tropical Storm Henri 2021

GIS data for the storms begins in 2010 with KMZ files (which you’ll need to convert in ArcGIS or QGIS to make them useful beyond display purposes), and shapefiles appear in 2015. Further back in time are just PDF reports and map scans.

If you really want to go back and time and get all the tracks at once, there’s the HURDAT2 database; one for the Atlantic (1851 to present) and another for the Pacific (1949 to present). It’s a csv file that contains coordinates for the track of every storm, which you can process to create a geospatial file using a points to line tool. Or – you can grab a version where that’s already been created! The International Best Track Archive for Climate Stewardship (IBTrACS) keeps a running CSV and shapefile of all global storms. Scroll down and choose shapefile (CSV is another option). The download page is just a list of files – you can choose points or lines, storms by ocean (East Pacific, North Atlantic, North Indian, South Atlantic, South Indian, South Pacific, West Pacific), or grab everything in lists that are: active, everything (ALL), last 3 years, or since 1980. Below is an example of all storms in the North Atlantic – there are quite a lot (see below)! You get storm speed and direction, wind speed and direction, coordinates, and identifiers associated with the storm as points and lines. A subset of this data for the 2021 season is displayed in the feature image at the top of this post.

IBTrACS Historical Hurricane Tracks
Historical hurricane / storm tracks from 1851 to 2021 in the North Atlantic from IBTrACS

How About the Weather?

There are many places you can go for this and the best source depends on the use case. More often than not, I end up using the Local Climatological Database. Choose a geographic type, then a specific area, and you’ll see all the weather stations in this area. Add them to the cart, and then view the cart once you have all the stations you want. On the next screen choose an output format (CSV or TXT fixed width) and a date range. You submit an order and wait a bit for it to be compiled, and are notified by email when it’s ready for download. Mixed in this CSV are records that are monthly, daily, and hourly, so after downloading you’ll want to extract just the period you’re interested in. Data includes temperature, precipitation, dew point, wind speed and direction, humidity, barometric pressure, and cloud cover.

NOAA Local Climatological Data Map Tool
Map Tool search interface for NOAA Local Climatological Data

Some processing is required to make these files GIS ready. Each record represents an observation at a station at a given point in time, so if you plot these “as is” the likely idea is you’re making an illustrated time series of some sort, as you’ll have tons of observations plotted on a few spots (where the stations are). If this isn’t desirable, then you’ll filter records to create extracts for just a given point in time, maybe separate features for each time period. For monthly summaries you can pivot time to columns, to create a column for each month and indicator. This would be impractical for daily or hourly summaries, unless you’re focusing on a single month for the former or day / week for the latter (otherwise you’ll have a bazillion columns).

Annoyingly, the CSV option doesn’t include any of the station information in the download (like the standard WBAN ID, name, longitude, latitude, and elevation) except for one unique identifier. I know that this information was all included in the past, and am not sure why it was dropped. The TXT version includes the station info, but fixed-width files are a pain to work with. If you are working with a small number of stations, you can pull the station info individually by previewing the station on the download screen (click on the station title or little eye symbol). The five digit WBAN number is included as the last 5 digits of the identifier in the CSV, so you can identify and relate each one. If you don’t want to mess with copying and pasting, you can generate a second extract for all the stations for just a single day and download that in the TXT format, and then parse just the station columns and associate them with your main table.

There are multiple ways that you can create extracts for this data beyond the example I just provided, available from the main data tools page. For a more refined search you can select the summary period (yearly, monthly, daily, hourly) and targeted variables in advance. There are also FTP options for bulk downloads.

One thing that surprises folks who are new to working with this data, is that there aren’t many weather stations. For the LCD, my home state of Delaware only has three, one in each county. The entire City of New York only has three as well, at each of the airports and one in Central Park. If you’re not interested in points and want areas, then you would need to gather a significant number of stations and do interpolation. Or – use data that’s already modeled. I mentioned PRISM at Oregon State in a previous post, as a nice source for national US rasters of temperature and precipitation that you can generate for dailies, monthlies, and normals.