data.census.gov is down? Here are some work-arounds

NOTE – the website has returned to normal, but I’m leaving this post up in case the problem ever reoccurs.

So data.census.gov is not working today. I went there repeatedly and got an empty white screen each time, regardless of which web browser I tried. My wife spotted a post on an urban planning listserv where someone wrote that they contacted the Census Bureau. The Bureau was aware of the problem, but due of staff shortages related to COVID-19 it could take a week to fix!

Fortunately there are work-arounds. The post provided links to some suggestions at the State Data Center Clearing House. The first suggestion is to use the Chrome Browser, clear your cache, and try going directly to the advanced search link at https://data.census.gov/cedsci/advanced. This didn’t work for me, so I looked at the second suggestion, which is to create deep links to specific tables and search results. This worked, but if you’re not intimately familiar with census geography and table identifiers this could be tough going.

The good news is that if you can just get to one table, that gets you in the backdoor, bypassing the initial search screens that aren’t loading. From there you can use the filter in the table search results to find what you want. For example, go here:


This brings you to the 5-year 2018 Data Profile table for Demographic and Housing for the United States. From there you can click the Filter button in the upper left-hand corner above the table name.

Filter Results Link

In the filter screen, scroll down to the bottom and hit the Clear All button to remove the filter for the US. Then go through the Advanced Filter options on the left and choose your Survey, Year, Geography, and Topic (see this earlier post for hints on this strategy). When you’re done, hit the green down arrows that Hides the filter menu (upper right-hand corner) and returns you to the results screen, where the new table results represent the filters you just applied. Browse around and download away! (Customize Table button, then Download)Filter Menu

There are alternatives to data.census.gov as well: the MCDC for getting decennial and ACS profiles (menu on right side of the screen) the Census Reporter for just the latest ACS data, and the NHGIS for accessing all census data tables past to present.

Census Book

Exploring the US Census Book Published!

My book, Exploring the US Census: Your Guide to America’s Data, has been published! You can purchase it directly from SAGE Publishing or from Barnes and Nobles, Amazon, or your bookstore of choice (it’s currently listed for pre-order on Amazon but its availability there is imminent). It’s $45 for the paperback, $36 for the ebook. Data for the exercises and supplemental material is available on the publisher’s website, and I’ve created a landing page for the book on this site.

Exploring the US Census is the definitive researcher’s guide to working with census data. I place the census within the context of: US society, the open data movement, and the big data universe, provide a crash course on using the new data.census.gov, and introduce the fundamental concepts of census geography and subject categories (aka universes). One chapter is devoted to each of the primary datasets: decennial census (with details about the 2020 census that’s just over the horizon), American Community Survey, Population Estimates Program, and business data from the Business Patterns, Economic Census, and BLS. Subsequent chapters demonstrate how to: integrate census data into writing and research, map census data in GIS, create derivative measures, and work with historic data and microdata with a focus on the Current Population Survey.

I wrote the book as a hybrid between a techie guidebook and an academic text. I provide hands-on exercises so that you learn by doing (techie) while supplying sufficient context so you can understand and evaluate why you’re doing it (academic). I demonstrate how to find and download data from several different sources, and how to work with the data using free and open source software: spreadsheets (LibreOffice Calc), SQL databases (DB Browser for SQLite), and GIS (QGIS). I point out the major caveats and pitfalls of working with the census, along with many helpful tools and resources.

The US census data ecosystem provides us with excellent statistics for describing, studying, and understanding our communities and our nation. It is a free and public domain resource that’s a vital piece of the country’s social, political, and economic infrastructure and a foundational element of American democracy. This book is your indispensable road map for navigating the census. Have a good trip!

See the series – census book tag for posts about the content of the book, additional material that expands on that content (but didn’t make it between the covers), and the writing process.

A-Train Classic

Neighborhood Research and the Census for Undergrads

Each semester I visit several undergraduate classes in public affairs and journalism, to introduce students to census data. They’re researching or reporting on particular issues and trends in neighborhoods in New York City, and they are looking for statistics to either support their work or generate ideas for a story. I usually showcase the NYC Population Factfinder as a starting point, mention the Census Reporter for areas outside the city , and provide background info on the decennial census, American Community Survey, and census geography and subjects. This year I included two new examples toward the beginning of the lecture to spark their interest.

I recently helped reporter Susannah Jacob navigate census data for an article she wrote on hyper-gentrification in the West Village for the New York Review of Books. A perfect example, as it’s what the students are expected to do for their assignment! Like any good journalist (and human geographer), Susannah pounded the pavement of the neighborhood, interviewing residents and small businesses and observing and documenting the urban landscape and how it was changing. But she also wanted to see what the data could tell her, and whether it would corroborate or refute what she was seeing and hearing.

NYRB Article on the West Village

Source: Jacob & Roye, New York Review of Books, Oct 2019. https://www.nybooks.com/daily/2019/10/09/what-happened-to-the-west-village/

We used the NYC Population Factfinder to assemble census tracts to approximate the neighborhood, and I did a little legwork to pull data from the County / ZIP Code Business Patterns so we could see how the business landscape was changing. The most surprising stat we discovered was that the number of 1-unit detached homes had doubled. This wouldn’t be odd in many rapidly growing places in the US, but it’s unusual for an old, built-out urban neighborhood. A 1-unit detached home is a free-standing single family structure that doesn’t share walls with other buildings. Most homes in Manhattan are either attached (row houses / town houses) or units in multi-unit buildings (apartments / condos / co-ops). How could this be? Uber-wealthy people are buying up adjoining row homes, knocking down the walls, and turning them into urban mansions. Seems extraordinary, but apparently is part of a trend.

We certainly ran up against the limitations of ACS data. The estimates for tracts have large margins of error, and when comparing two short time frames it’s difficult to detect actual change, as differences in estimates are clouded by sampling noise. Even after aggregating several tracts, many of the estimates for change weren’t reliable enough to report. When they were (as in the housing example) you could only say that there has been a relative increase without becoming wedded to a precise number. In this case, from 214 (+/- 127) detached units in 2006-2010 to 627 (+/-227) in 2013-2017, an increase of 386 (+/- 260). Not great estimates, but you can say it’s an increase as the low end for change is still positive at 126 units. Considering the time frame and character of the neighborhood, that’s still noteworthy (bearing in mind we’re working with a 90% confidence interval). In cases where the differences overlap and could represent either an increase or decrease there are few claims you can make, and it’s best to walk away (or look at larger area). I always discuss the margin of error with students and caution them about treating these numbers as counts.

While census data is invaluable for describing and studying individual places, it’s inherent geographic nature also allows us to study places in relation to each other, and to illustrate geographic patterns. For my second example, I zoom out and show them this map of racial-ethnic distribution in the United States:

Map of US Racial and Ethnic Diversity

Source: William H. Frey analysis of US Census population estimates, 2018. https://www.brookings.edu/research/americas-racial-diversity-in-six-maps/

This is one of a series of six maps by demographer William Frey at the Brookings Institute that highlights the geographic diversity of the United States. In this map, each county is shaded for a particular race / ethnicity if the population of that group in that county is greater than that group’s share of the national population. For example, Hispanics / Latinos represent 18.3% of the total US population, so counties where they represent more than this percentage are shaded.

For the purpose of the class, it helps make the census ‘pop’ and gets the students to think about the statistics as geospatial datasets that they can see and relate to, and that can form the basis for interesting research.

Some footnotes – if you like Frey’s maps, I highly recommend his book Diversity Explosion: How New Racial Demographics are Remaking America. It explores the evolving demographic and geographic landscape of the US with clear, accessible writing and more of these great maps (in color).

I used the pic at the top of this post as the background for my intro slide. It’s a screenshot of a city from A-Train, a 1992 city-building train simulator that was ported from Japan to the world by Artdink and Maxis, following the success of something called SimCity. It wasn’t nearly as successful, but I always liked the graphics which have now attained a retro-gaming vibe.

ZBP Data in a Notebook

Examples of using the Census Bureau’s API with Python

At the end of my book I briefly illustrate how the Census Bureau’s API works using Python. I’ll expand on that in this post; we’ll pull data from the Population Estimates Program, transform it, and create a chart using Python with Pandas in a Notebook. I’ll conclude with an additional example using the ZIP Code Business Patterns.

The Census Bureau has dedicated API pages for each dataset (decennial, acs, pop estimates, and more), and you need to familiarize yourself with the geographies and variables that are available for each. The API is a basic REST API, where you insert parameters into a base url and retrieve data based on the link you submit. Python has several modules you can use for interacting with APIs – the requests module is a popular choice.

The following pop estimates example is on github (but if github flops see the nbviewer example instead).

The top of the script contains basic stuff – import the modules you need, read in your key, and define the variables that you want to pull. You don’t have to use an API key, but if you don’t you’re limited to pulling in 500 records a day. Requesting a key is simple and free. A best practice is to store your key (a big integer) in a file that you read in, so you’re not exposing it in the script. Most of the census APIs require that you pass in a year and a dataset (dsource). Larger datasets may be divided into subsets (dname); for example the population estimates is divided into estimates, components of change, and characteristics (age, sex, race, etc.). Save the columns and geographies that you want to get in a comma-separated string. You have to consult the documentation and variable lists that are available for each dataset to build these, and the geography requires ANSI / FIPS codes.

%matplotlib inline
import requests,pandas as pd

with open('census_key.txt') as key:


Next, you can create the url. I’ve been doing this in two parts. The first part:

base_url = f'https://api.census.gov/data/{year}/{dsource}/{dname}'

Includes the base https://api.census.gov/data/ followed by parameters that you fill in. The year, data source, and dataset name are the standard pieces. The output looks like this:


Then you take that base_url and add additional parameters that are going to vary within the script, in this case the columns and the geography, which all appear in the ‘get’ portion of the url. The ‘for’ and ‘in’ options allow you to select the type of geography within another geography, in this case counties within states, and you pass in the appropriate ANSI FIPS codes from the string you’ve created. The key appears at the end of the url, but if you opt not to use it you can omit that part. Once the link is fully constructed you use the requests module to fetch the data using that url. You can print the result out as text (assuming it’s not too long).

data_url = f'{base_url}?get={cols}&for=county:{county}&in=state:{state}&key={api_key}'

The result looks like a nested list, but is actually a string that’s structured in a non-standard JSON format:

["Bucks County, Pennsylvania","-178","-605","862","42","017"],
["Chester County, Pennsylvania","1829","-887","1374","42","029"],
["Delaware County, Pennsylvania","1374","-2513","1579","42","045"],
["Montgomery County, Pennsylvania","1230","-1987","2315","42","091"],
["Philadelphia County, Pennsylvania","8617","-11796","8904","42","101"]]

To do anything with it, convert it to JSON with response.json(). Then you can convert it into a list, dictionary, or in this example a Pandas dataframe. Here, I build the dataframe with everything from row one forward [1:]; row zero contains the column headers[0]. I rename some of the columns, build a unique ID by concatenating the state and county FIPS codes and set that as the new index, and drop the individual county and state FIPS columns. By default every object that’s returned is a string, so I convert the numeric columns to integers:

df=pd.DataFrame(data[1:], columns=data[0]).\
    rename(columns={"NATURALINC": "Natural Increase", "DOMESTICMIG": "Net Domestic Mig", "INTERNATIONALMIG":"Net Foreign Mig"})
df=df.astype(dtype={'Natural Increase':'int64','Net Domestic Mig':'int64','Net Foreign Mig':'int64'},inplace=True)

Then I can see the result:

pep dataframe

Once the data is in good shape, you can begin to analyze and visualize it. Here’s the components of population change for Philadelphia and the surrounding suburban counties in Pennsylvania from 2017 to 2018 – natural increase is the difference between births and deaths, and there’s net migration within the US (domestic) and between the US and other countries (foreign):

labels=df['GEONAME'].str.split(' ',expand=True)[0]
ax=df.plot.bar(rot=0, title='Components of Population Change 2017-18')

Components of Population Change Plot

Each request is going to vary based on your specific needs and the construction of the particular dataset. Here’s another example where I pull data on business establishments, employees, and wages (in $1,000s of dollars) from the ZIP Code Business Patterns (ZBP). This dataset is smaller, so it doesn’t have a dataset name, just a data source. To get all the ZIP Codes in Delaware I use the asterisk * wildcard. Because ZIP Codes do not nest within states I can’t use the ‘in’ option, it’s simply not available. A state code is stored in a special field called ST, and I can use it as a general limiter with equals in the query:


base_url = f'https://api.census.gov/data/{year}/{dsource}'

data_url = f'{base_url}?get={cols}&for=zipcode:*&ST={state}&key={api_key}'
zbp_data=pd.DataFrame(data[1:], columns=data[0]).set_index('zipcode')
for field in cols.split(','):

ZBP Data for Delaware

One of the issues with the ZBP is that many variables are not disclosed due to privacy regulations; instead of returning nulls a zero is returned, but in this dataset they are not true zeros. Once you retrieve the data and set the types you can replace zeros with NaNs, which are numpy / Panda nulls – although there’s a quirk in that dataframe columns declared as integers cannot contain null values. Instead you can use a float, or a workaround that’s been implemented for new Pandas versions (for my specific use case this data will be inserted into a database, so I’ll use SQL to accomplish the zero to null conversion). ZBP data is also injected with noise to protect privacy, and you can retrieve special columns that contain noise flags.

The API is convenient for automating the data acquisition process, and allows you to cherry pick the variables you want. To avoid accessing the API over and over again as you build your scripts (which is prohibitive when requesting lots of data) you can pickle the data right after you retrieve it – a pickle is a python data object that efficiently stores data locally, and pandas has special functions for creating and accessing them. Once you pull your data and pickle it, you can comment out (or in a notebook, don’t rerun) the requests block, and subsequently pull the data from the pickle as you tweak your code (see caveat in the postscript – perhaps best to use json instead of pickle).

#Write to a pickle
zbp_data.to_pickle('insert path here.pickle')
#Read from a pickle to dataframe
zbp_new=pd.read_pickle('insert path here.pickle')

Take a look at the Census Data API User Guide to learn more. The guide focuses just on the REST API, and is not specific to a scripting language. Of course, you also need to familiarize yourself with the datasets and how they’re created and organized, and with census geography (which is why I wrote this book).


Since I’ve finished this post I’ve created a notebook that pulls ZBP data from the API (alt nbviewer here) and have some additional thoughts I’d like to share:

  1. I decided to dump the data I retrieved from the API to a json file and then pull data from it instead of using a pickle. Pickles come with serious security issues. If you don’t intend to share your code with anyone pickles are fine, otherwise consider an alternative.
  2. My method for parsing the retrieved data into a dataframe worked fine because the census API uses non-standard JSON; essentially the string that’s returned resembles a nested Python list. If this was true JSON, we may need to employ a different method to account for the fact that the number of elements per record may vary.
  3. Wildcards are not always available to build urls for certain data; for example to download the number of establishments classified by industry I wasn’t able to grab everything for one state using the method I illustrated in this post. Instead I had to loop through a list of ZIP and NAICS codes to retrieve what I wanted one at a time.
  4. In the case of retrieving establishments classified by industry there were many cases when there was no data for a particular ZIP Code (i.e. no farms and mines in midtown Manhattan). Since I needed records that showed zero establishments, I had to insert them myself if the API returned no result. Even if you didn’t need records with zeros, it’s important to consider the potential impact of getting nothing back from the API on your subsequent code.
  5. Given my experience thus far these APIs were pretty reliable, in that I haven’t had issues with time outs and partially returned data. If this was not the case and you had lots of data to retrieve, you would need to build in some try – except statements to handle exceptions, save data as you go along, and pick up where you left off if something breaks. Read about this geocoding script I wrote a few years back for examples.
FRED Chart - Pesronal Savings Rate

Finding Economic Data with FRED

I attended ALA’s annual conference in DC last month, where I met FRED. Not a person, but a database. I can’t believe I hadn’t met FRED before – it is an amazingly valuable resource for national, time-series economic data.

FRED was created by the Economic Research unit of the Federal Reserve Bank of St. Louis. It was designed to aggregate economic data from many government sources into a centralized database, with straightforward interface for creating charts and tables. At present, it contains 567,000 US and international time series datasets from 87 sources.

Categories of data include banking and finance (interest and exchange rates, lending, monetary data), labor markets (basic demographics, employment and unemployment, job openings, taxes, real estate), national accounts (national income, debt, trade), production and business (business cycles, production, retail trade, sector-level information about industries),  prices (commodities, consumer price indexes) and a lot more. Sources include the Federal Reserve, the Bureau of Labor Statistics, the Census Bureau, the Bureau of Economic Analysis, the Treasury Department, and a mix of other government and corporate sources from the US and around the world.

On their home page at https://fred.stlouisfed.org/ you can search for indicators or choose one of several options for browsing. The default dashboard shows you some of the most popular series and newest releases at a glance. Click on Civilian Unemployment Rate, and you retrieve a chart with monthly stats that stretch from the late 1940s to the present. Most of FRED’s plots highlight periods of recession since these have a clear impact on economic trends. You can modify the chart’s date range, change the frequency (monthly, quarterly, annually – varies by indicator), download the chart or the underlying data in a number of formats, and share a link to it. There are also a number of advanced customization features, such as adding other series to the chart. Directly below the chart are notes that provide a clear definition of the indicator and its source (in this case, the Bureau of Labor Statistics) and links to related tables and resources.

FRED - Chart of Civilian Unemployment Rate

The unemployment rate is certainly something that you’d expect to see, but once you browse around a bit you’ll be surprised by the mix of statistics and the level of detail. I happened to stumble across a monthly Condo Price Index for the New York City Metro Area.

Relative to other sources or portals, FRED is great for viewing and retrieving national (US and other countries) economic and fiscal data and charts gathered from many sources. It’s well suited for time-series data; there are lots of indexes and you can opt for seasonally adjusted or unadjusted values. Many of the series include data for large regions of the US, states, metro areas, and counties. The simplest way to find sub-national data is to do a search, and once you do you can apply filters for concepts, frequencies, geographies, and sources. FRED is not the place to go if you need data for small geographies below the county level. If you opt to create a FRED account (purely optional) you’ll be able to save and track indicators that you’re interested in and build your own dashboards.

If you’re interested in maps, visit FRED’s brother GeoFRED at https://geofred.stlouisfed.org/.  The homepage has a series of sample thematic maps for US counties and states and globally for countries. Choose any map, and once it opens you can change the geography and indicator to something else. You can modify the frequency, units, and time periods for many of the indicators, and you have basic options for customizing the map (colors, labels, legend, etc.) The maps are interactive, so you can zoom in and out and click on a place to see its data value. Most of the county-level data comes from the Census Bureau, but as you move up to states or metro areas the number of indicators and sources increase. For example, the map below shows individual income taxes collected per capita by state in 2018.

GeoFRED - State Income Tax

There’s a basic search function for finding specific indicators. Just like the charts, maps can be downloaded as static images, shared and embedded in websites, and you can download the data behind the map (it’s simpler to download the same indicator for multiple geographies using GeoFRED compared to FRED).

Take a few minutes and check it out. For insights and analyses of data published via FRED, visit FRED’s blog at https://fredblog.stlouisfed.org/.


Navigating the New data.census.gov

June 2019 is the final month that the Census Bureau will post new data in the American Factfinder (AFF). From this point forward, all new datasets will be published via the new data dissemination platform data.census.gov. The second chapter of my book (now available for pre-order!) is devoted to navigating this new interface. In this post I’ll provide a preview / brief tutorial of the advanced search functions.

The new interface is search-driven, so you can type the names of topics and geographies or table ID numbers to find and explore data tables. There are spiffy data profiles for several geographies, and you have the ability to make basic thematic maps. The search interface makes it much easier to casually browse and discover data, so go ahead and explore.

I’d still recommend having a search strategy to find precisely what you need. Keyword searching alone isn’t going to cut it, because you’re searching across tens of thousands of tables in dozens of datasets. The good news is that the same strategy I’ve used for the AFF can be applied to data.census.gov: use the advanced search to filter by survey, year, geography, and topic to narrow down the list of possible tables to a manageable number, and then search or browse through those results to find what you need.

Let’s say we want to download the most recent data on home values for all the counties in Pennsylvania (or a state of your choosing). On data.census.gov click on the advanced search link under the search box. On the advanced search page scroll to the bottom to the filters. We’ll address them one by one:

Surveys. These represent all the different census datasets. Select ACS 5-Year Estimates Detailed Tables. Detailed socio-economic characteristics of the population are primarily published in the ACS. The 1-Year estimates are published for all geographies that have at least 65k people. Since most states have rural counties that have less than this threshold, we’ll have to use the 5-year estimates to get all the counties. The detailed tables are narrow, focusing on estimates for a single variable. The other options include profiles (lots of different data for one place) and subject tables (narrower in scope than profiles, but broader than the detailed tables).

filter by survey

Years. At the moment 2017 is the latest year for the ACS, so let’s select that. This quickly eliminates a lot of tables that we’re not interested in.

Geography. Choose 050 – County, then scroll down and choose Pennsylvania in the County (State) list, then All counties in Pennsylvania in the final list.

filter by geography

Topics. For this example choose Housing, then Financial Characteristics, then Housing Value and Purchase Price. Of all the filter options, this one is the most opened-ended and may require some experimentation based on what you’re looking for.

filter by topic

Codes. We don’t need to filter by codes in this example, but if we were searching for labor or business-related data we’d use this filter to limit results to specific sectors or industries by NAICS codes.

Underneath the filter menu, click the View All Results button. This brings us to the first results page, which provides a list of tables, maps, and pages related to our search. Click the button to View All Tables under the tables section.

This brings us to the table results page; the list of tables is displayed on the left, and the currently selected table is displayed on the right; in this case Value of owner-occupied housing units is shown, with counts of units by value brackets. At this stage, we can scroll through the list and browse to find tables with data that we’re interested in. We can also access the filters at the top of the list, if we want to modify our search parameters.

table results

A little further down the results list is a table for Median Value. Selecting that table will preview it on the right. Hit the Customize Table button. This opens the table in its own dedicated view. Hit the blue drop down arrow to the right of the table name, and you can modify the geography, year, or time-period on the left. On the right is a Download option. Hit download and you’ll be prompted to download a CSV file. In the download you’ll get three text files that contain metadata, the data, and descriptive information about the download. Click Download and you can save it.

customize table

Back on the customize table page, you can navigate back to the table results by clicking on “Tables” in the breadcrumb links that appear in the top left-hand corner. Then you can browse and choose additional tables.

That’s it! Not bad, right? Well, there are always caveats. At the moment, the biggest one is that you can’t easily download most geographies that are contained within other geographies. With one click we can filter to select all counties within a state, or all states within the nation. But if we wanted all census tracts in a county or all county subdivisions in a state, there aren’t any “All geographies in…” options for these geographies. We’d have to select each and every tract within a county, one at a time…

While data.census.gov is now relatively stable, it’s still under development and additional features like this should (hopefully) be implemented as time passes between now and the 2020 census. This is one reason why the American Factfinder will survive for another year, as we’ll still need to lean on it to accomplish certain tasks. Of course, there are other options within the Census Bureau (the API, the FTP site) and without (NHGIS, MCDC, Census Reporter) for accessing data.

The new platform currently provides access to several datasets from the present back to the year 2010: the decennial census, the ACS, population estimates, and several of the business datasets. The first new datasets that will be published in data.census.gov (and NOT in the AFF) include the 2017 Economic Census this summer and the 1-year 2018 ACS in September.

View the Release Notes and FAQs for more details about the platform: general documentation, recent developments, bugs, and planned enhancements. The Census Bureau also has an archived webinar with slides that discuss the transition.