Day: February 17, 2020


An Overview of Census Datasets and Census API Examples

This month’s post is a bit shorter: I just have two announcements about some resources I’ve created.

First, I’ve written a short technical paper that was just published in the Weissman Center for International Business’ Occasional Papers Series. Exploring US Census Datasets: A Summary of Surveys and Sources provides an overview of several datasets (the decennial census, the American Community Survey, the Population Estimates Program, and County Business Patterns) and of the sources for accessing them. The paper illustrates themes that run through all my census-related talks: the census isn’t just the count that happens every ten years but an ecosystem of datasets updated on an ongoing basis, and the many sources for accessing the data suit different purposes and are designed for users with varying levels of technical skill. In some respects the paper is a super-abridged version of my book, meant to serve as an introduction and brief reference.

Second, I’ve created a series of introductory notebooks on GitHub that illustrate how to use the Census Bureau’s API with Python and Jupyter Notebooks. I designed them for a demonstration I gave at NYU’s Love Data Week on February 10 (the slides for the talk are also in the repo). I structured the talk around three examples. Example A demonstrates the basics of how the API works along with some best practices, such as defining your variables at the top of the script and building the request link progressively from them. It also shows the advantage of using these technologies together: you can pull data into your script, then process and visualize it in one go. I also demonstrate how to retrieve lists of census variables and their metadata, which isn’t widely documented. Example B is a variation of A that adds an API key and saves the data to a file immediately after retrieval. Example C introduces more complexity, reading variables in from files and looping through lists of geographies to make multiple API calls.
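For readers who want a feel for the workflow in Examples A through C before opening the notebooks, here is a minimal sketch using only Python's standard library. It is an illustration under stated assumptions, not the code from the repo: the year, the dataset path (2018 ACS 5-year), the variable B01001_001E, and every helper name (build_url, parse_response, parse_variables, fetch, write_csv, fetch_counties) are hypothetical choices. It assumes the API returns a JSON list of lists whose first row is the header, and that a dataset's variable metadata lives at variables.json under the dataset URL.

```python
import csv
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Define variables at the top. These values (2018 ACS 5-year, total
# population B01001_001E) are illustrative assumptions.
year = "2018"
dataset = "acs/acs5"
base_url = f"https://api.census.gov/data/{year}/{dataset}"
variables = ["NAME", "B01001_001E"]


def build_url(geo_for, geo_in=None, key=None):
    """Hypothetical helper: progressively build a query link from the pieces above."""
    params = {"get": ",".join(variables), "for": geo_for}
    if geo_in:
        params["in"] = geo_in  # parent geography, e.g. "state:36"
    if key:
        params["key"] = key  # optional API key, as in Example B
    # Leave , : and * unencoded so the link stays readable.
    return base_url + "?" + urlencode(params, safe=",:*")


def parse_response(text):
    """Hypothetical helper: turn the JSON list of lists into one dict per row."""
    rows = json.loads(text)
    header, records = rows[0], rows[1:]
    return [dict(zip(header, r)) for r in records]


def parse_variables(text):
    """Hypothetical helper: map variable names to labels from variables.json."""
    meta = json.loads(text)["variables"]
    return {name: info.get("label", "") for name, info in meta.items()}


def fetch(url):
    """Hypothetical helper: make a live request (defined here, not called)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def write_csv(records, path):
    """Hypothetical helper: save records to a file right after retrieval (Example B)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)


def fetch_counties(state_fips_codes, key=None):
    """Hypothetical helper: loop through geographies, one API call each (Example C)."""
    results = []
    for fips in state_fips_codes:
        url = build_url("county:*", f"state:{fips}", key)
        results.extend(parse_response(fetch(url)))
    return results
```

With a network connection, something like `write_csv(fetch_counties(["36", "34"]), "counties.csv")` would request county populations for New York and New Jersey and save them, and `parse_variables(fetch(base_url + "/variables.json"))` would list the dataset's variables with their labels. Keeping build_url separate from fetch makes it easy to print and inspect each link before sending it.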

Since I’ve written a few posts on the census API recently, I went back and added an api tag to group them together, so you can access them via a single link.

[Image: Define census API variables, build links, and retrieve data]