housing

Census Bureau Website Shutdown

Alternative Sources for US Census Data

Just when we thought the US government couldn’t possibly become more dysfunctional, it shut down completely on Sept 30, 2025. Government websites are not being updated, and many have gone offline. I’ve had trouble accessing data.census.gov; access has been intermittent, and sometimes it has worked with some web browsers but not with others.

In this post I’ll summarize some solid, alternative portals for accessing US census data. I’ve sorted the free resources from the simplest and most basic to the most exhaustive and complex, and I mention a couple of commercial sources at the end. These are websites; the Census Bureau’s API is still working (for now), so if you are using scripts that access its API or use R packages like tidycensus you should still be in business.

Free and Public

Census Reporter
https://censusreporter.org/
Focus: the latest American Community Survey (ACS) data

A non-profit project originally created by journalists, the Census Reporter provides just the most recent ACS data, making it easy to access the latest statistics. Search for a place to get a broad profile with interactive summaries and charts, or search for a topic to download specific tables that include records for all geographies of a particular type, within a specific place. There are also basic mapping capabilities.

Census Reporter Showing ACS Data for Wilmington, DE

Missouri Census Data Center Profiles and Trends
https://mcdc.missouri.edu/
Focus: data from the ACS and decennial profile tables for the entire US

The Census Bureau publishes four profile tables for the ACS and one for the decennial census that are designed to capture a wide selection of variables that are of broad interest to most researchers. The MCDC makes these readily available through a simple interface where you select the time period, summary level, and up to four places to compare in one table, which you can download as a spreadsheet. There are also several handy charts, and separate applications for studying short term trends. Access the apps from the menu on the right-hand side of the page.

Missouri Census Data Center ACS Profiles Showing Data for Providence and Newport, RI

State and Local Government Data Pages
Focus: extracts and applications for that particular area

Hundreds of state, regional, county, and municipal governments create extracts of census data and republish them on their websites, to provide local residents with accessible summaries for their jurisdictions. In most cases these are in spreadsheets or reports, but some places have rich applications, and may recompile census data for geographies of local interest such as neighborhoods. Search for pages for planning agencies, economic development orgs, and open data portals. New York City is a noteworthy example; not only do they provide detailed spreadsheets, they also have the excellent map-based Population FactFinder application. Fairfax County, VA provides spreadsheets, reports, an interactive map, and spreadsheet tools and macros that facilitate working with ACS data.

NYC Population Factfinder Showing ACS Data for Inwood in Northern Manhattan

IPUMS NHGIS
https://www.nhgis.org/
Focus: all contemporary and historic tables and GIS boundary files for the ACS and decennial census

If you need access to everything, this is the place to go. The National Historic Geographic Information System uses an interface similar to the old American Factfinder (or the Advanced Search for data.census.gov). Choose your dataset, survey, year, topic, and geographies, and access all the tables as they were originally published. There is also a limited selection of historical comparison tables (which I’ve written about previously). Given the volume of data, the emphasis is on selecting and downloading the tables; you can see variable definitions, but you can’t preview the statistics. This is also your best option to download GIS boundary files, past and present. You must register to use NHGIS, but accounts are free and the data is available for non-commercial purposes. For users who prefer scripting, there is an API.

IPUMS NHGIS Filtered to Show County Data on Age from the 1990 Census

MCDC Uexplore / Dexter
https://mcdc.missouri.edu/applications/uexplore.html
Focus: create targeted extracts of ACS data and the decennial census back to 1980

Unlike other applications where you download data that’s prepackaged in tables, Uexplore allows you to create targeted, customized extracts where you can pick and choose variables from multiple tables. While the interface looks daunting at first, it’s not bad once you get the hang of it, and it offers tremendous flexibility and ample documentation to get you started. This is a good option for folks who want customized extracts, but are not coders or API users.

Portion of the Filter Interface for MCDC Uexplore / Dexter

Commercial Products

There are some commercial products that are quite good; they add value by bundling data together and utilizing interactive maps for exploration, visualization, and access. The upsides are they are feature rich and easy to use, while the downsides are they hide the fuzziness of ACS estimates by omitting margins of error (making it impossible to gauge reliability), and they require a subscription. Many academic libraries, as well as a few large public ones, do subscribe, so check the list of library databases at your institution to see if they subscribe (the links below take you to the product website, where you can view samples of the applications).


PolicyMap
https://www.policymap.com/
Focus: mapping contemporary census and US government data

PolicyMap bundles 21st century census data, datasets from several government agencies, and a few proprietary series, and lets you easily create thematic maps. You can generate broad reports for areas or custom regions you define, and can download comparison tables by choosing a variable and selecting all geographies within a broader area. It also incorporates some basic analytical GIS functions, and enables you to upload your own coordinate point data.

PolicyMap Displaying ACS Income Data for Providence, RI

Social Explorer
https://www.socialexplorer.com/
Focus: mapping contemporary and historic US census data

Social Explorer allows you to effortlessly create thematic maps of census data from 1790 to the present. You can create a single map, side by side maps for showing comparisons over time, and swipe maps to move back and forth from one period to the other to identify change. You can also compile data for customized regions and generate a variety of reports. There is a separate interface for downloading comparison tables. Beyond the US demographic module are a handful of modules for other datasets (election data for example), as well as census data for other countries, such as Canada and the UK.

Social Explorer Map Displaying ACS Migration Data for Providence, RI

Providence Census Geography Map

Crosswalking Census Data to Neighborhood Geographies

Last semester we completed a project to create a crosswalk between census geographies and local geographies in Providence, RI. Crosswalks are used for relating two disparate sets of geography, so that you can compile data that’s published for one set of geography in another. Many cities have locally-defined jurisdictions like wards or community districts, as well as informally defined areas like neighborhoods. When you’re working with US Census data, you use small statistical areas that the Bureau defines and publishes data for; blocks, block groups, census tracts, and perhaps ZCTAs and PUMAs. A crosswalk allows you to apportion data that’s published for census areas, to create estimates for local areas (there are also crosswalks that are used for relating census geography that changes over time, such as the IPUMS crosswalks).

How the Crosswalk Works

For example, in the Providence Census Geography Crosswalk we have two crosswalks that allow you to take census tract data, and convert it to either neighborhoods or wards. I’ll refer to the neighborhoods in this post. In the crosswalk table, there is one record for each portion of a tract that overlaps a neighborhood. For each record, there are attribute columns that indicate the count and the percentage of a tract’s population, housing units, land area, and total area that fall within a given neighborhood. If a tract appears just once in the table, that means it is located entirely within one neighborhood. In the image below, we see that tract 1.01 appears in the table once, and its population percentage is 1. That means that it falls entirely within the Washington Park neighborhood, and 100% of its population is in that neighborhood. In contrast, tract 1.02 appears in the table twice, which means it’s split between two neighborhoods. Its pct_pop column indicates that 31.5% of its population is in South Elmwood, while 68.5% is in Washington Park. The population count represents the number of people from that tract that are in that neighborhood.

Looking at the map below, we can see that census tract 1.01 falls entirely within Washington Park, and tract 1.02 is split between Washington Park and South Elmwood. To generate estimates for Washington Park, we would sum data for tract 1.01 and the portion of tract 1.02 that falls within it. Estimates for South Elmwood would be based solely on the portion of tract 1.02 that falls within it. With the crosswalk, “portion” can be defined as the percentage of the tract’s population, housing units, land area, or total area that falls within a neighborhood.

The primary purpose of the crosswalk is to generate census data estimates for neighborhoods. You apportion tract data to neighborhoods using an allocation factor (population, housing units, or area) and aggregate the result. For example, if we have a census tract table from the 2020 census with the population that’s 65 years and older, we can use the crosswalk to generate neighborhood-level estimates of the 65+ population. To do that, we would:

  1. Join the data table to the crosswalk using the tract’s unique ID; the crosswalk has both the long and short form of the GEOIDs used by the Census Bureau. So for each crosswalk record, we would associate the 65+ population for the entire tract with it.
  2. Multiply the 65+ population by one of the allocation columns – the percent population in this example. This would give us an estimate of the 65+ population that live in that tract / neighborhood piece.
  3. Group or aggregate this product by the neighborhood name, to obtain a neighborhood-level table of the 65+ population.
  4. Round decimals to whole numbers.

To do the calculations in a spreadsheet, you would import the appropriate crosswalk sheet into the workbook that contains the census data that you want to apportion, so that they appear as separate sheets in the same workbook. In the crosswalk worksheet, use the VLOOKUP formula and reference the GEOID to “join” the census tract data to the crosswalk. The formula requires: cell containing the ID value you wish to look up, the range of cells in a worksheet that you will search through, the number of the column that contains the value you wish to retrieve (column A is 1, Z is 26, etc.), and the parameter “FALSE” to get an exact match. It is assumed that the look up value in the target table (the matching ID) appears in the first column (A).

The tract data is now repeated for each tract / neighborhood segment. Next, use formulas to multiply the allocation percentage (pct_pop in this example) by the census data value (over 65 pop for the entire tract) to create an allocated estimate for each tract / neighborhood piece.

Then you can generate a pivot table (on the Insert ribbon in Excel) where you group and sum that allocated result by neighborhood (neighborhoods as rows, census data as summed values in columns). Final step is to round the estimates.

This process is okay for small projects where you have a few estimates you want to quickly tabulate, but it doesn’t scale well. I’d use a relational database instead; import the crosswalk and census data table into SQLite, where you can easily do a left join, calculated field, and then a group by statement. Or, use the joining / calculating / aggregating equivalents in Python or R.

I used the percentage of population as the allocation factor in this example. If the census data you’re apportioning pertains to housing units, you could use the housing units percentage instead. In any case, there is an implicit assumption that the data you are apportioning has the same distribution as the allocation factor. In reality this may not be true; the distribution of children, seniors, homeowners, people in poverty etc. may vary from the total population’s distribution. It’s important to bear in mind that you’re creating an estimate. If you are apportioning American Community Survey data this process gets more complicated, as the ACS statistics are fuzzy estimates. You’d also need to apportion the margin of error (MOE) and create a new MOE for the neighborhood-level estimates.

The Providence crosswalk has some additional sheets that allow you to go from tracts, ZCTAs, or blocks to neighborhoods or wards. The tract crosswalk is by far the most useful. The ZCTA crosswalk was an exercise in futility; I created it to demonstrate the complete lack of correlation between ZCTAs and the other geographies, and recommend against using it (we also produced a series of maps to visually demonstrate the relationship between all the geographies). There is a limited amount of data published at the block level, but I included it in the crosswalk for another reason…

Creating the Crosswalk

I used census blocks to create the crosswalk. They are the smallest unit of census geography, and nest within all other census geographies. I used GIS to assign each block to a neighborhood or ward based on the geography the block fell within, and then aggregated the blocks into distinct tract / ward and tract / neighborhood combinations. Then I calculated the allocation factors, the percentage of the tract’s total attributes that fell in a particular neighborhood or ward. This operation was straightforward for the wards; the city constructed them using 2020 census blocks, so the blocks nested or fit perfectly within the wards.

The neighborhoods were more complicated, as these were older boundaries that didn’t correspond to the 2020 blocks, and there were many instances where blocks were split between neighborhoods. My approach was to create a new set of neighborhood boundaries based on the 2020 blocks, and then use those new boundaries to create the crosswalk. I began with a spatial join, assigning each block a neighborhood ID based on where the center of the block fell. Then, I manually inspected the borders between each neighborhood, to determine whether I should manually re-assign a block. In almost all instances, blocks I reassigned were unpopulated and consisted of slivers that contained large highways, or blocks of greenspace or water. I struck a balance between remaining as faithful to the original boundaries as possible, while avoiding the separation of unpopulated blocks from a tract IF the rest of the blocks in that tract fell entirely within one neighborhood. In two cases where I had to assign a populated block, I used satellite imagery to determine that the population of the block lived entirely on one side of a neighborhood boundary, and made the assignment accordingly.

In the example below, 2020 tract boundaries are shown in red, 2020 block boundaries are light grey, original neighborhood boundaries are shown with dotted black lines, and reconstituted neighborhoods using 2020 blocks are shown in different colors. The boundaries of Federal Hill and the West End are shifted west, to incorporate thin unpopulated blocks that contain expressways. These empty blocks are part of tracts (10 and 13) that fall entirely within these neighborhoods; so splitting them off to adjacent Olneyville and Silver Lake didn’t make sense (as there would be no population or homes to apportion). Reassigning them doesn’t change the fact that the true boundary between these neighborhoods is still the expressway. We also see an example between Olneyville and Silver Lake where the old neighborhood boundary was just poorly aligned, and in this case blocks are assigned based on where the center of the block fell.

Creating the crosswalk from the ground up with blocks was the best approach for accounting how population is distributed within larger areas. It was primarily an aggregation-based approach, where I could sum blocks that fell within geographies. This approach allowed me to generate allocation factors for population and housing units, since this data was published with the blocks and could be carried along.

Conversely, in GIS 101 you would learn how to calculate the percentage of an area that falls within another area. You could use that approach to create a tract-level crosswalk based on area, i.e. if a tract’s area is split 50/50 between two neighborhoods, we’ll apportion its population 50/50. While this top down approach is simpler to implement, it’s far less ideal because you often can’t assume that population and area are equally distributed. Reconsider the example we began with: 31.5% of tract 1.02’s population is in South Elmwood, while 68.5% is in Washington Park. In contrast, 75.3% of tract 1.02’s land area is in South Elmwood, versus only 24.7% in Washington Park! If we apportioned our census data by area instead of population, we’d get a dramatically different, and less accurate, result. Roger Williams Park is primarily located in the portion of tract 1.02 that falls within Elmwood; it covers a lot of land but includes zero people.

Why can’t we just simply aggregate block-level census data to neighborhoods and skip the whole apportionment step? The answer is that there isn’t much data published at the block level. There’s a small set of tables that capture basic demographic variables as part of the decennial census, and that’s it. There was a sharp reduction in the number of block-level tables in the 2020 census due to new privacy regulations, and the ACS isn’t published at the block-level at all. While you can use the block-level table in the crosswalk to join and aggregate block data, in most cases you’ll need to work with tract-data and apportion it.

I used spatial SQL to create the crosswalks in Spatialite and QGIS , and if you’re interested in seeing all the gory details you can look at the code and spatial database in source folder of the project’s GitHub repo. I always prefer SQL for spatial join and aggregation operations, as I can write a single block of code instead of running 4 or 5 different of desktop GIS tools in a sequence. I’ll be updating the project this semester to include additional geographies (block groups – the level between blocks and tracts), and perhaps an introductory tutorial for using it (there are some basic docs at present).

Map Vacant Housing Units

Census Time Series Tables from NHGIS

I’m often asked about what the best approaches are for comparing US census data over time, to account for changes in census geography and to limit the amount of data processing you have to do in stitching data from different census years together. Census geography changes significantly each decade, and by and large the Census Bureau does not compile and publish historical comparison tables.

My primary suggestion is to use the National Historical Geographic Information System or NHGIS (I’ll mention some additional suggestions at the end of this post). Maintained by IPUMS at the University of Minnesota, NHGIS is the repository for all historic US census summary data from 1790 to present. While most of the data in the archive is published nominally (the format and structure in which the data was originally published), they do publish a set of Time Series Tables that compile multiple years of census data in one table. These tables come in two formats:

  • Nominal tables: the data is published “as is”, based on the boundaries that existed at each point in time. If a geography was added or dropped over the course of the years, it falls in or out of the table in the given year that the change occurred. With a few exceptions, the earliest nominal tables begin with the 1970 census and are published for eight geographies: nation, regions, divisions, states, counties, census tracts, county subdivisions, and places.
  • Standardized tables: the data has been normalized, where a geography for a single time period serves as the basis for all data in the table. The NHGIS is currently using 2010 as the basis, so that data prior and subsequent to 2010 has been modified to fit within the 2010 boundaries. This is achieved by aggregating block or block group data from each period to fit within the 2010 boundaries, and apportioning the data in cases where a block or group is split by a boundary. The earliest standardized tables begin with the 1990 census, and cover the basic 100% count data. Data is published for ten geographies: states, counties, census tracts, block groups, county subdivisions, places, congressional districts (as defined for the 110th-112th Congresses, 2007-2013), core based statistical areas (using 2009 metro area definitions), urban areas, and ZIP Code Tabulation Areas (ZCTAs).

Included in the documentation is a full list of time series tables, and whether they are available in nominal or standardized format. The availability of specific time periods and geographies varies. As of late 2024, the availability of standardized tables that include the 2020 census is currently limited to what was published in the early Public Redistricting Files. This will likely change in the near future to include additional 2020 data, and it’s possible that the standardized geography will eventually switch from 2010 to 2020 geography.

To access the Time Series Tables, you can browse the NHGIS without an account but you’ll need to create one in order to download anything. Once you launch NHGIS click on the Topics filter. In the list of topics, any topic under the Population or Housing category that has a “TS” flag next to it has at least one time series table. In the example below, I’ve used the filters to select census tracts for Geographic Level, 2010 and 2020 for Years, and HousingOccupancy and Vacancy status as my Topic.

NHGIS Time Series flags in topics filter

In the results at the bottom, the original Source Tables from each census are shown in the first tab. The Time Series Tables can be viewed by selecting the adjacent tab. The first two tables in this example are Housing Units by Occupancy Status. Clicking on the name of the tables reveals the variables that are included, and the source for the statistics. The first table is a nominal one that stretches from 1970 to the most recent ACS. The second table is a standardized one that covers 1990 to 2020. I’ve checked both boxes to add these to my cart.

NHGIS Select Time Series Tables

The third tab in the results are GIS Files. If we want to map standardized data, we would choose just the boundaries for the standardized year, as all of the data in the table has been modified to fit these boundaries. If we were mapping nominal data, we would need to download boundary files for each time period and map them separately (unless they were stable geographies like states that haven’t changed since 1970).

We hit the Continue button in the Cart panel when we’re ready to download. By default the extract will only include years and geographies we have filtered for. To add additional years or geos we can add them on this next screen. I’ve modified my list to get all available decennial years for each table. Note that if you’re going to select 5-year ACS data for nominal tables, choose only a few non-overlapping periods. In most cases you can’t filter geographies (i.e. select tracts within a state), you have to take them all. On the final screen you choose your structure; CSV is usually best, as is Time varies by column. Once you submit your request you’ll be prompted to log in if you haven’t already done so. Wait a bit for the extract to compile, then you can download the table and codebook.

NHGIS Select Years for Time Series

A portion of the nominal table is depicted below. This table includes identifiers and labels for each of the census years. The variables follow, ordered by variable and then by year. In this example, occupied housing units from 1970 to 2020 appear in the first block, and vacant units in the second. All the 1970 census tract values for Autauga County, Alabama are blank (as many rural counties in 1970 were un-tracted). We can see that values for census tract 205 run only from 1980 to 2010, with no value for 2020. The tract was split into three parts in 2020, and we see values for tracts 205.01, .02, and .03 appear in 2020. So in the nominal tables, geographies appear and disappear as they are created or destroyed. However, if geographic boundaries change but the name and designation for the geography do not, that geography persists throughout the time series in spite of the change.

Nominal tract table

A portion of the standardized table is below. This table only includes identifiers and labels for the 2010 census, as all data was modified to fit the tract geography of that year. The values for each census year except 2010 are published in triplicate: an estimate, and a lower and upper bound for the estimate. If the values in these three columns differ, it indicates that a block (or block group) was split and reapportioned to fit within the tract boundary for 2010 (you may also see decimals, indicating a split occurred). You’d use the estimate in your work, while the bounds provide some indication of the estimate’s accuracy. Note in this table, tract 205 in Autauga County persists from 1990 to 2020, as it existed in 2010. Data from the three 2020 tracts was aggregated to fit the 2010 boundary.

Standardized tract table

The crosswalk tables that IPUMS used to create the standardized data are available, if you wanted or needed to generate your own normalized data. The best approach is to proceed from the bottom up, aggregating blocks to reformulate the data to the geography you wish to use. Some decennial census data, and all data from the ACS, is not available at the block level, which necessitates using block groups instead.

There are some alternatives for obtaining or creating time series census data, which could fit the bill depending on your use case (esp if you are looking at larger geographies). There’s also reference material that can help you make sense of changes.

  1. The Longitudinal Tract Database at Brown University provides tract-level crosswalks from 1970 to 2020. They also provide some pre-compiled data tables generated from the crosswalk.
  2. For short term comparisons, the ACS includes Comparison Profile Tables for states, counties, places, and metro areas that compare two non-overlapping time periods. For example, here is the 5-year ACS Comparative Demographic Estimates profile for Providence RI in 2022 (compares 2018-2022 with 2013-2017).
  3. Use an interactive mapping tool like the Social Explorer to make side by side comparison maps from two time periods. They also incorporate some of the NHGIS standardized data into their database. (SE is a subscription-based product; if you’re at a university see if your library subscribes).
  4. The Population and Housing Unit Estimates program publishes annual estimates for states, counties, and metro areas in decade by decade spreadsheets. The MCDC has created some easy to use tools for summarizing and charting this data to show annual population change and changes in demographic characteristics.
  5. I had previously written about pulling population and economic time series tables for states, counties, and metro areas from the Bureau of Economic Analysis data portal.
  6. Counties change more often than you think. The Census Bureau has a running list of changes to counties from 1970 to present. Metro areas change frequently too, but since they are built from counties you can aggregate older county data to fit modern metro boundaries. The census provides delineation files that assign counties to metros.

2020 Census Demographic Profile

2020 Census Data Wrap-up

Right before the semester began, I updated the Rhode Island maps on my census research guide so that they link to the recently released Demographic Profile tables from the 2020 Census. I feel like the release of the 2020 census has flown lower on the radar compared to 2010 – it hasn’t made it into the news or social media feeds to the same degree. It has been released much later than usual for a variety of reasons, including the COVID pandemic and political upheaval and shenanigans. At this point in Sept 2023, most of what we can expect has been released, and is available via data.census.gov and the census APIs.

Here are the different series, and what they include.

  • Apportionment data. Released in Apr 2021. Just the total population counts for each state, used to reapportion seats in Congress.
  • Redistricting data. Released in Aug 2021. Also known as PL 91-171 (for the law that requires it), this data is intended for redrawing congressional and legislative districts. It includes just six tables, available for several geographies down to the block level. This was our first detailed glimpse of the count. The dataset contains population counts by race, Hispanic and Latino ethnicity, the 18 and over population, group quarters, and housing unit occupancy. Here are the six US-level tables.
  • Demographic and Housing Characteristics File. Released in May 2023. In the past, this series was called Summary File 1. It is the “primary” decennial census dataset that most people will use, and contains the full range of summary data tables for the 2020 census for practically all census geographies. There are fewer tables overall relative to the 2010 census, and fewer that provide a geographically granular level of detail (ostensibly due to privacy and cost concerns). The Data Table Guide is an Excel spreadsheet that lists every table and the variables they include.
  • Demographic Profile. Released in May 2023. This is a single table, DP1, that provides a broad cross-section of the variables included in the 2020 census. If you want a summary overview, this is the table you’ll consult. It’s an easily accessible option for folks who don’t want or need to compile data from several tables in the DHC. Here is the state-level table for all 50 states plus.
  • Detailed Demographic and Housing Characteristics File A. Released in Sept 2023. In the past, this series was called Summary File 2. It is a subset of the data collected in the DHC that includes more detailed cross-tabulations for race and ethnicity categories, down to the census tract level. It is primarily used by researchers who are specifically studying race, and the multiracial population.
  • Detailed Demographic and Housing Characteristics File B. Not released yet. This will be a subset of the data collected in the DHC that includes more detailed cross-tabulations on household relationships and tenure, down to the census tract level. Primarily of interest to researchers studying these characteristics.

There are a few aspects of the 2020 census data that vary from the past – I’ll link to some NPR stories that provide a good overview. Respondents were able to identify their race or ethnicity at a more granular level. In addition to checking the standard OMB race category boxes, respondents could write in additional details, which the Census Bureau standardized against a list of races, ethnicities, and national origins. This is particularly noteworthy for the Black and White populations, for whom this had not been an option in the recent past. It’s now easier to identify subgroups within these groups, such as Africans and Afro-Caribbeans within the Black population, and Middle Eastern and North Africans (MENA) within the White population. Another major change is that same-sex marriages and partnerships are now explicitly tabulated. In the past, same-sex marriages were all counted as unmarried partners, and instead of having clearly identifiable variables for same-sex partners, researchers had to impute this population from other variables.

Another major change was the implementation of the differential privacy mechanism, which is a complex statistical process to inject noise into the summary data to prevent someone from reverse engineering it to reveal information about individual people (in violation of laws to protect census respondent’s privacy). The social science community has been critical of the application of this procedure, and IPUMS has published research to study possible impacts. One big takeaway is that published block-level population data is less reliable than in the past (housing unit data on the other hand is not impacted, as it is not subjected to the mechanism).

When would you use decennial census data versus other census data? A few considerations – when you:

  • Want or need to work with actual counts rather than estimates
  • Only need basic demographic and housing characteristics
  • Need data that provides detailed cross-tabulations of race, which is not available elsewhere
  • Need a detailed breakdown of the group quarters population, which is not available elsewhere
  • Are explicitly working with voting and redistricting
  • Are making historical comparisons relative to previous 10-year censuses

In contrast, if you’re looking for detailed socio-economic characteristics of the population, you would need to look elsewhere as the decennial census does not collect this information. The annual American Community Survey or monthly Current Population Survey would be likely alternatives. If you need basic, annual population estimates or are studying the components of population change, the Population and Housing Unit Estimates Program is your best bet.

2020 Resident Population Map

2020 Census Updates

In late summer and early fall I was hammering out the draft for an ALA Tech Report on using census data for research (slated for release early 2022). The earliest 2020 census figures have been released and there are several issues surrounding this, so I’ll provide a summary of what’s happening here. Throughout this post I link to Census Bureau data sources, news bulletins, and summaries of trends, as well as analysis on population trends from Bill Frey at Brookings and reporting from Hansi Lo Wang and his colleagues at NPR.

Count Result and Reapportionment Numbers

The re-apportionment results were released back in April 2020, which provided the population totals for the US and each of the states that are used to reallocate seats in Congress. This data is typically released at the end of December of the census year, but the COVID-19 pandemic and political interference in census operations disrupted the count and pushed all the deadlines back.

Despite these disruptions, the good news is that the self-response rate, which is the percentage of households who submit the form on their own without any prompting from the Census Bureau, was 67%, which is on par with the 2010 census. This was the first decennial census where the form could be submitted online, and of the self-responders 80% chose to submit via the internet as opposed to paper or telephone. Ultimately, the Bureau said it reached over 99% of all addresses in its master address file through self-response and non-response follow-ups.

The bad news is that the rate of non-response to individual questions was much higher in 2020 than in 2010. Non-responses ranged from a low of 0.52% for the total population count to a high of 5.95% for age or date of birth. This means that a higher percentage of data will have to be imputed, but this time around the Bureau will rely more on administrative records to fill the gaps. They have transparently posted all of the data about non-response for researchers to scrutinize.

The apportionment results showed that the population of the US grew from approximately 309 million in 2010 to 331 million in 2020, a growth rate of 7.35%. This is the lowest rate of population growth since the 1940 census that followed the Great Depression. Three states lost population (West Virginia, Mississippi, and Illinois), which is the highest number since the 1980 census. The US territory of Puerto Rico lost almost twelve percent of its population. Population growth continues to be stronger in the West and South relative to the Northeast and Midwest, and the fastest growing states are in the Mountain West.

https://www.census.gov/library/visualizations/2021/dec/2020-percent-change-map.html

Public Redistricting Data

The first detailed population statistics were released as part of the redistricting data file, PL 94-171. Data in this series is published down to the block level, the smallest geography available, so that states can redraw congressional and other voting districts based on population change. Normally released at the end of March, this data was released in August 2021. This is a small package that contains the following six tables:

  • P1. Race (includes total population count)
  • P2. Hispanic or Latino, and Not Hispanic or Latino by Race
  • P3. Race for the Population 18 Years and Over
  • P4. Hispanic or Latino, and Not Hispanic or Latino by Race for the Population 18 Years and
    Over
  • P5. Group Quarters Population by Major Group Quarters Type
  • H1. Occupancy Status (includes total housing units)

The raw data files for each state can be downloaded from the 2020 PL 94-171 page and loaded into stats packages or databases. That page also provides infographics (including the maps embedded in this post) and data summaries. Data tables can be readily accessed via data.census.gov, or via IPUMS NHGIS.

The redistricting files illustrate the increasing diversity of the United States. The number of people identifying as two or more races has grown from 2.9% of the total population in 2010 to 10.2% in 2020. Hispanics and Latinos continue to be the fastest growing population group, followed by Asians. The White population actually shrank for the first time in the nation’s history, but as NPR reporter Hansi-Lo Wang and his colleagues illustrate this interpretation depends on how one measures race; as race alone (people of a single race) or persons of any race (who selected white and another race), and whether or not Hispanic-whites are included with non-Hispanic whites (as Hispanic / Latino is not a race, but is counted separately as an ethnicity, and most Hispanics identify their race as White or Other). The Census Bureau has also provided summaries using the different definitions. Other findings: the nation is becoming progressively older, and urban areas outpaced rural ones in population growth. Half of the counties in the US lost population between 2010 and 2020, mostly in rural areas.

https://www.census.gov/library/visualizations/2021/dec/percent-change-county-population.html

2020 Demographic and Housing Characteristics and the ACS

There still isn’t a published timeline for the release of the full results in the Demographic and Housing Characteristics File (DHC – known as Summary File 1 in previous censuses, I’m not sure if the DHC moniker is replacing the SF1 title or not). There are hints that this file is going to be much smaller in terms of the number of tables, and more limited in geographic detail compared to the 2010 census. Over the past few years there’s been a lot of discussion about the new differential privacy mechanisms, which will be used to inject noise into the data. The Census Bureau deemed this necessary for protecting people’s privacy, as increased computing power and access to third party datasets have made it possible to reverse engineer the summary census data to generate information on individuals.

What has not been as widely discussed is that many tables will simply not be published, or will only be summarized down to the county-level, also for the purpose of protecting privacy. The Census Bureau has invited the public to provide feedback on the new products and has published a spreadsheet crosswalking products from 2010 and 2020. IPUMS also released a preliminary list of tables that could be cut or reduced in specificity (derived from the crosswalk), which I’m republishing at the bottom of this post. This is still preliminary, but if all these changes are made it would drastically reduce the scope and specificity of the decennial census.

And then… there is the 2020 American Community Survey. Due to COVID-19 the response rates to the ACS were one-third lower than normal. As such, the sample is not large or reliable enough to publish the 1-year estimate data, which is typically released in September. Instead, the Census will publish a smaller series of experimental tables for a more limited range of geographies at the end of November 2021. There is still no news regarding what will happen with the 5-year estimate series that is typically released in December.

Needless to say, there’s no shortage of uncertainty regarding census data in 2020.

Tables in 2010 Summary File 1 that Would Have Less Geographic Detail in 2020 (Proposed)

Table NameProposed 2020 Lowest Level of Geography2010 Lowest Level of Geography
Hispanic or Latino Origin of Householder by Race of HouseholderCountyBlock
Household Size by Household Type by Presence of Own ChildrenCountyBlock
Household Type by Age of HouseholderCountyBlock
Households by Presence of People 60 Years and Over by Household TypeCountyBlock
Households by Presence of People 60 Years and Over, Household Size, and Household TypeCountyBlock
Households by Presence of People 75 Years and Over, Household Size, and Household TypeCountyBlock
Household Type by Household SizeCountyBlock
Household Type by Household Size by Race of HouseholderCountyBlock
Relationship by Age for the Population Under 18 YearsCountyBlock
Household Type by Relationship for the Population 65 Years and OverCountyBlock
Household Type by Relationship for the Population 65 Years and Over by RaceCountyBlock
Family Type by Presence and Age of Own ChildrenCountyBlock
Family Type by Presence and Age of Own Children by Race of HouseholderCountyBlock
Age of Grandchildren Under 18 Years Living with A Grandparent HouseholderCountyBlock
Household Type by Relationship by RaceCountyBlock
Average Household Size by AgeTo be determinedBlock
Household Type for the Population in HouseholdsTo be determinedBlock
Household Type by Relationship for the Population Under 18 YearsTo be determinedBlock
Population in Families by AgeTo be determinedBlock
Average Family Size by AgeTo be determinedBlock
Family Type and Age for Own Children Under 18 YearsTo be determinedBlock
Total Population in Occupied Housing Units by TenureTo be determinedBlock
Average Household Size of Occupied Housing Units by TenureTo be determinedBlock
Sex by Age for the Population in HouseholdsCountyTract
Sex by Age for the Population in Households by RaceCountyTract
Presence of Multigenerational HouseholdsCountyTract
Presence of Multigenerational Households by Race of HouseholderCountyTract
Coupled Households by TypeCountyTract
Nonfamily Households by Sex of Householder by Living Alone by Age of HouseholderCountyTract
Group Quarters Population by Sex by Age by Group Quarters TypeStateTract

Tables in 2010 Summary File 1 That Would Be Eliminated in 2020 (Proposed)

Population in Households by Age by Race of Householder
Average Household Size by Age by Race of Householder
Households by Age of Householder by Household Type by Presence of Related Children
Households by Presence of Nonrelatives
Household Type by Relationship for the Population Under 18 Years by Race
Household Type for the Population Under 18 Years in Households (Excluding Householders, Spouses, and Unmarried Partners)
Families*
Families by Race of Householder*
Population in Families by Age by Race of Householder
Average Family Size by Age by Race of Householder
Family Type by Presence and Age of Related Children
Family Type by Presence and Age of Related Children by Race of Householder
Group Quarters Population by Major Group Quarters Type*
Population Substituted
Allocation of Population Items
Allocation of Race
Allocation of Hispanic or Latino Origin
Allocation of Sex
Allocation of Age
Allocation of Relationship
Allocation of Population Items for the Population in Group Quarters
American Indian and Alaska Native Alone with One Tribe Reported for Selected Tribes
American Indian and Alaska Native Alone with One or More Tribes Reported for Selected Tribes
American Indian and Alaska Native Alone or in Combination with One or More Other Races and with One or More Tribes Reported for Selected Tribes
American Indian and Alaska Native Alone or in Combination with One or More Other Races
Asian Alone with One Asian Category for Selected Groups
Asian Alone with One or More Asian Categories for Selected Groups
Asian Alone or in Combination with One or More Other Races, and with One or More Asian Categories for Selected Groups
Native Hawaiian and Other Pacific Islander Alone with One Native Hawaiian and Other Pacific Islander Category for Selected Groups
Native Hawaiian and Other Pacific Islander Alone with One or More Native Hawaiian and Other Pacific Islander Categories for Selected Groups
Native Hawaiian and Other Pacific Islander Alone or in Combination with One or More Races, and with One or More Native Hawaiian and Other Pacific Islander Categories for Selected Groups
Hispanic or Latino by Specific Origin
Sex by Single Year of Age by Race
Household Type by Number of Children Under 18 (Excluding Householders, Spouses, and Unmarried Partners)
Presence of Unmarried Partner of Householder by Household Type for the Population Under 18 Years in Households (Excluding Householders, Spouses, and Unmarried Partners)
Nonrelatives by Household Type
Nonrelatives by Household Type by Race
Group Quarters Population by Major Group Quarters Type by Race
Group Quarters Population by Sex by Major Group Quarters Type for the Population 18 Years and Over by Race
Total Races Tallied for Householders
Hispanic or Latino Origin of Householders by Total Races Tallied
Total Population in Occupied Housing Units by Tenure by Race of Householder
Average Household Size of Occupied Housing Units by Tenure
Average Household Size of Occupied Housing Units by Tenure by Race of Householder
Occupied Housing Units Substituted
Allocation of Vacancy Status
Allocation of Tenure
Tenure by Presence and Age of Related Children
* Counts for these tables are available in other proposed DHC tables. For example, the count of families is available in the Household Type table, which will be available at the block level in the 2020 DHC. 
Philadelphia Redlining Map

Redlining Maps for GIS

I received several questions during the spring semester about redlining maps; where to find them, and how many were made. Known officially as Residential Security Maps, they were created by the Home Owners Loan Corporation in the 1930s to grade the level of security or risk for making home loans in residential portions of urban areas throughout the US. This New Deal program was intended to help people refinance mortgages and prevent foreclosures, while increasing buying opportunities to expand home ownership.

Areas were evaluated by lenders, developers, and appraisers and graded from A to D to indicate their desirability or risk level. Grade A was best (green), B still desirable (blue), C definitely declining (yellow), and D hazardous (red). The yellow and red areas were primarily populated by minorities, immigrants, and low income groups, and current research suggests that this program had a long reaching negative impact by enforcing and cementing segregation, disinvestment, and poverty in these areas.

The definitive digital source for these maps is the Mapping Inequality : Redlining in New Deal America project created at the University of Richmond’s Digital Scholarship Lab. They provide a solid history and summary of these maps and a good bibliography. The main portal is an interactive map of the US that allows you to zoom in and preview maps in different cities. You can click on individually zoned areas and get the original assessor or evaluator’s notes (when available). If you switch to the Downloads page you get a list of maps sorted alphabetically by state and city that you can download as: a jpeg of the original scanned map, a georeferenced image that can be added to GIS software as a raster, and a GIS vector polygon file (shapefile or geojson). In many cases there is also a scanned copy of the evaluators description and notes. You also have the option for downloading a unified vector file for the entire US as a shapefile or geojson. All of the data is provided under a Creative Commons Attribution Sharealike License.

Providence Redlining Map
Redlining Map of Providence, RI with graded areas, from the Mapping Inequality Project

There are a few other sources to choose from, but none of them are as complete. I originally thought of the National Archives which I thought would be the likely holder of the original paper maps, but only a fraction have been digitized. The PolicyMap database has most (but not all) of the maps available as a feature you can overlay in their platform. If you’re doing a basic web search this Slate article is among the first resources you’ll encounter, but most of the links are broken (which says something about the ephemeral nature of these kinds of digital projects).

How many maps were made? Amy Hillier’s work was among the earlier studies that examined these maps, and her case study of Philadelphia includes a detailed summary of the history of the HOLC program with references to primary source material. According to her research, 239 of these maps were made and she provides a list of each of the cities in the appendix. I was trying to discover how many maps were available in Rhode Island and found this list wasn’t complete; it only included Providence, while the Mapping Inequality project has maps for Providence, Pawtucket & Central Falls, and Woonsocket. I counted 202 maps based on unique names on Mapping Inequality, but some several individual maps include multiple cities.

She mentions that a population of 40,000 people was used as a cut-off for deciding which places to map, but noted that there were exceptions; Washington DC was omitted entirely, while there are several maps for urban counties in New Jersey as opposed to cities. In some case cities that were below the 40k threshold that were located beside larger ones were included. I checked the 1930 census against the three cities in Rhode Island that had maps, and indeed they were the only RI cities at that time that had more than 40k people (Central Falls had less than 40k but was included with Pawtucket as they’re adjacent). So this seemed to provide reasonable assurance that these were the only ones in existence for RI.

Finding the population data for the cities was another surprise. I had assumed this data was available in the NHGIS, but it wasn’t. The NHGIS includes data for places (Census Places) back to the 1970 census, which was the beginning of the period where a formal, bounded census place geography existed. Prior to this time, the Census Bureau published population count data for cities using other means, and the NHGIS is still working to include this information. It does exist (as you can find it in Wikipedia articles for most major cities) but is buried in old PDF reports on the Census Bureau’s website.

If you’re interested in learning more about the redlining maps beyond the documentation provided by Mapping Inequality, these articles provide detailed overviews of the HOLC and the residential security maps program, as well as their implications to the present day. You’ll need to access them through a library database:

Hillier, A.E. (2005). “Residential Security Maps and Neighborhood Appraisals: The Home Owners’ Loan Corporation and the Case of Philadelphia.” Social Science History, 29(2): 207-233.

Greer, J. (2012). “The Home Owners’ Loan Corporation and the Development of the Residential Security Maps“. Journal of Urban History, 39(2): 275-296.