research

USPS mailbox

The Trouble with ZIP Codes: Solutions for Data Analysis and Mapping

Since the COVID-19 pandemic began, I’ve received several questions about finding census data and boundary files for ZIP Codes (aka US postal codes), as many states are publishing ZIP Code-level data for cases and deaths. ZIP Codes are commonly used for summarizing address data, as it’s easy to do and most Americans are familiar with them. However, there are a number of challenges associated with using ZIP Codes as a unit of analysis that most people are unaware of (until they start using them). In this post I’ll summarize these challenges and provide some solutions.

The short story is: you can get boundary files and census data from the decennial census and 5-year American Community Survey (ACS) for ZIP Code Tabulation Areas (ZCTAs, pronounced zicktas) which are approximations of ZIP Codes that have delivery areas. Use any census data provider to get ZCTA data: data.census.gov, Census Reporter, Missouri Census Data Center, NHGIS, or proprietary library databases like PolicyMap or the Social Explorer. The longer story: if you’re trying to associate ZIP Code-level data with census ZCTA boundary files or demographic data, there are caveats. I’ll cover the following issues in detail:

  1. ZIP Codes are actually not areas with defined boundaries, and there are no official USPS ZIP Code maps. Areas must be derived using address files. The Census Bureau has done this in creating ZIP Code Tabulation Areas (ZCTAs).
  2. The Census Bureau publishes population data by ZCTA and boundary files for them. But ZCTAs are not strictly analogous with ZIP Codes; there isn’t a ZCTA for every ZIP Code, and if you try to associate ZIP data with them some of your records won’t match. You need to crosswalk your ZIP Code data to the ZCTA-level to prevent this.
  3. ZCTAs do not nest or fit within any other census geographies, and the postal city name associated with a ZIP Code does not correlate with actual legal or municipal areas. This can make selecting and downloading ZIP Code data for a given area difficult.
  4. ZIP Codes were designed for delivering mail, not for studying populations. They vary tremendously in size, shape, and population.
  5. Analyzing data at either the ZIP Code or ZCTA level over time is difficult to impossible.
  6. ZIP Code and ZCTA numbers must be saved as text in data files, and not as numbers. Otherwise codes that have leading zeros get truncated, and the code becomes incorrect.

ZIP Codes versus ZCTAs and Boundaries

Contrary to popular belief, ZIP Codes are not areas and the US Postal Service does not delineate boundaries for them. They are simply numbers assigned to ranges of addresses along street segments, and the codes are associated with a specific post office. When we see ZIP Code boundaries (on Google Maps for example), these have been derived by creating areas where most addresses share the same ZIP Code.

The US Census Bureau creates areal approximations for ZIP Codes called ZIP Code Tabulation Areas or ZCTAs. The Bureau assigns census blocks to a ZIP number based on the ZIP that’s used by a majority of the addresses within each block, and aggregates blocks that share the same ZIP to form a ZCTA. After this initial assignment, they make some modifications to aggregate or eliminate orphaned blocks that share the same ZIP number but are not contiguous. ZCTAs are delineated once every ten years in conjunction with the decennial census, and data from the decennial census and the 5-year American Community Survey (ACS) are published at the ZCTA-level. You can download ZCTA boundaries from the TIGER / Line Shapefiles page, and there is also a generalized cartographic boundary file for them.

Crosswalking ZIP Code Data to ZCTAs

There isn’t a ZCTA for every ZIP Code. Some ZIP Codes represent large clusters of Post Office boxes or are assigned to large organizations that process lots of mail. As census blocks are aggregated into ZCTAs based on the predominate ZIP Code for addresses within the block, these non-areal ZIPs fall out of the equation and we’re left with ZCTAs that approximate ZIP Codes for delivery areas.

As a result, if you’re trying to match either your own summarized address data or sources that use ZIP Codes as the summary level (such as the Census Bureau’s Business Patterns and Economic Census datasets), some ZIP Codes will not have a matching ZCTA and will fall out of your dataset.

To prevent this from happening, you can aggregate your ZIP Code data to ZCTAs prior to joining it to boundary files or other datasets. The UDS Mapper project publishes a ZIP Code to ZCTA Crosswalk file that lists every ZIP Code and the ZCTA it is associated with. For the ZIP Codes that don’t have a corresponding area (the PO Box clusters and large organizations), these essentially represent points that fall within ZCTA polygons. Join your ZIP-level data to the ZIP Code ID in the crosswalk file, and then group or summarize the data using the ZCTA number in the crosswalk. Then you can match this ZCTA-summarized data to boundaries or census demographic data at the ZCTA-level.

ZIP Code to ZCTA Crosswalk

UDS ZIP Code to ZCTA Crosswalk. ZIP Code 99501 is an areal ZIP Code with a corresponding ZCTA number, 99501. ZIP Code 99520 is a post office or large volume customer that falls inside ZCTA 99501, and thus is assigned to that ZCTA.

Identifying ZIPs and ZCTAs within Other Areas

ZCTAs are built from census blocks and nest within the United States; they do not fit within any other geographies like cities and towns, counties, or even states. The boundaries of a ZCTA will often cross these other boundaries, so for example a ZCTA may fall within two or three different counties. This makes it challenging to select and download census data for all ZCTAs in a given area.

You can get lists of ZIP Codes for places, for example by using the MCDC’s ZIP Code Lookup. The problem is, the postal city that appears in addresses and is affiliated with a ZIP Code does not correspond with cities as actual legal entities, so you can’t count on the name to select all ZIPs within a specific place. For example, my hometown of Claymont, Delaware has its own ZIP Code, even though Claymont is not an incorporated city with formal, legal boundaries. Most of the ZIP Codes around Claymont are affiliated with Wilmington as a place, even though they largely cover suburbs outside the City of Wilmington; the four ZIP Codes that do cover the city cross the city boundary and include outside areas. In short, if you select all the ZIP Codes that have Wilmington, DE as their place name, they actually cover an area that’s much larger than the City of Wilmington. The Census Bureau does not associate ZCTAs with place names.

ZCTAs and Places in northern Delaware

Lack of correspondence between postal city names and actual city boundaries. Most ZCTAs with the prefix 198 are assigned to Wilmington as a place name, even though many are partially or fully outside the city.

So how can you determine which ZIP Codes fall within a certain area? Or how they do (or don’t) intersect with other areas? You can overlay and eyeball the areas in TIGERweb to get a quick idea. For something more detailed, here are three options:

  1. The Missouri Census Data Center’s Geocorr application lets you calculate overlap between a source geography and a target geography using either total population or land area for any census geographies. So in a given state, if you select ZCTAs as a source, and counties as the target, you’ll get a list that displays every ZCTA that falls wholly or partially within each county. An allocation factor indicates the percentage of the ZCTA (population or land) that’s inside and outside a county, and you can make decisions as to whether to include a given ZCTA in your study area or not. If a ZCTA falls wholly inside one county, there will be only one record with an allocation factor of 1. If it intersects more than one county, there will be a record with an allocation factor for each county.
  2. The US Department of Housing and Urban Development (HUD) publishes a series of ZIP Code crosswalk files that associates ZIP Codes with census tracts, counties, CBSAs (metropolitan areas), and congressional districts. They create these files by geocoding all addresses and calculating the ratio of residential, business, and other addresses that fall within each of these areas and that share the same ZIP Code. The files are updated quarterly. You can use them to select, assign, or apportion ZIP Codes to a given area. There’s a journal article that describes this resource in detail.
  3. Some websites allow you to select all ZCTAs that fall within a given geography when downloading data, essentially by selecting all ZCTAs that are fully or partially within the area. The Census Reporter allows you to do this: search for a profile for an area, click on a table of interest, and then subdivide the areas by smaller areas. You can even look at a map to see what’s been selected. data.census.gov currently does not provide this option; you have to select ZCTAs one by one (or if you’re using the census API, you’ll need to create a list of ZCTAs to retrieve).
MCDC Geocorr

Sample output from MCDC Geocorr. ZCTAs 08251 and 08260 fall completely within Cape May County, NJ. ZCTA 08270’s population is split between Cape May (92.4%) and Atlantic (7.6%) counties. The ZCTA names are actually postal place names; these ZCTAs cover areas that are larger than these places.

Do You Really Need to Use ZIP Codes?

ZIP Codes were an excellent mid-20th century solution for efficiently processing and delivering mail that continues to be useful for that purpose. They are less ideal for studying populations or other forms of human activity. They vary tremendously in size, shape, and population which makes them inconsistent as a unit of analysis. They have no legal or administrative meaning or function, other than delivering mail. While all American’s are familiar with them, they do not have any relevant social meaning. They don’t represent neighborhoods, and when you ask someone where they’re from, they won’t say “19703”.

So what are your other options?

  1. If you don’t have to use ZIP Code or ZCTA data for your project, don’t. For the United States as a whole, consider using counties, PUMAs, or metropolitan areas. Within states: counties, PUMAs, and county subdivisions. For smaller areas: municipalities, census tracts, or aggregates of census tracts.
  2. If you have the raw, address-based data, consider geocoding it. Once you geocode an address, you can use GIS to assign it to any type of geography that you have a boundary file for (spatial join), and then you can aggregate it to that geography. Some geocoders even provide geographies like counties or tracts in the match result. If your data is sensitive, strip all the attributes out except for the address and a serial integer to use as an ID, and after geocoding you can associate the results back to your original data using that ID. The Census Geocoder is free, requires no log in, allows you to do batches of 1,000 addresses at a time, and forces you to use these safety precautions. For bigger jobs, there’s an API.
  3. Sometimes you’ll have no choice and must use ZIP Code / ZCTA data, if what you’re interested in studying is only provided in that summary form, or if there are privacy concerns around geocoding the raw address data. You may want to modify the ZCTA geography for your area to aggregate smaller ZCTAs into larger ones surrounding them, for both visual display and statistical analysis. For example, in New York City there are several ZCTAs that cover only one city and census block, as they’re occupied by one large office building that processes a lot of mail (and thus have their own ZIP number). Also, unlike most census geographies, ZCTAs have large holes in them. Any area that does not have streets and thus no addresses isn’t included in a ZCTA. In urban areas, this means large parks and cemeteries. In rural areas, vast tracts of unpopulated forest, desert, or mountain terrain. And large bodies of water in every place.
Midtown ZCTAs

One-block ZCTAs in Midtown Manhattan, NYC that have either low or zero population.

Analyzing ZIP Code Data Over Time…

In short – forget it. The Census Bureau introduced ZCTAs in the year 2000, and in 2010 they modified their process for creating them. For a variety of reasons, they’re not strictly compatible. ACS data for ZCTAs wasn’t published until 2013. Even the economic datasets don’t go that far back; the ZIP Code Business Patterns didn’t appear until the early 1990s. Use areas that have more longevity and are relatively stable: counties, census tracts.

Why Do my ZIP Codes Look Wrong in Excel?

Regardless of whether you’re using a spreadsheet, database, or scripting language, always make sure to define ZIP / ZCTA columns as strings or text, and not as numeric types. ZIP Codes and ZCTAs begin with zeros in several states. Columns that contain ZIP / ZCTA codes must be saved as text to preserve the 5-digit code. If they’re saved as numbers, the leading zeros are dropped and the numbers are rendered incorrectly. This often happens if you’re working with data in a CSV file and you click on it to open it in Excel. In parsing the CSV, Excel assumes the ZIP / ZCTA field is a number and saves it as a number, which drops the zero and truncates the code. To prevent this from happening: open Excel to a blank project, go to the Data ribbon, click the button to import text data, choose delimited text on the import screen, choose the delimiter (comma or tab, etc), and when prompted you can select the ZIP / ZCTA column and designate it as text so that it imports properly.

Importing text files in Excel

To import CSV files in Excel, go to the Data ribbon and under Get External Data select From Text.

Conclusion

That’s all you ever (or maybe never) wanted to know about ZIP Codes and ZCTAs! For more information see the Census Bureau’s page about ZCTAs, a thorough write up by the Missouri Census Data Center, and these informative and fun blog posts from PolicyMap (complete with photos of Mr. ZIP). I wrote an article a few years back that demonstrates how to use some of these resources (the UDS mapper file, Geocorr) to process ZIP data with SQL and python. And of course, check out my book, Exploring the U.S. Census: Your Guide to America’s Data, to explore these concepts and resources in greater detail with hands-on exercises.

census_paper_wcib_ops

An Overview of Census Datasets and Census API Examples

This month’s post is a bit shorter, as I have just two announcements I wanted to share about some resources I’ve created.

First, I’ve written a short technical paper that’s just been published as part of the Weissman Center of International Business’ Occasional Papers Series. Exploring US Census Datsets: A Summary of Surveys and Sources provides an overview of several different datasets (decennial census, American Community Survey, Population Estimates Program, and County Business Patterns) and sources for accessing data. The paper illustrates basic themes that are part of all my census-related talks: the census isn’t just the thing that happens every ten years but is an ecosystem of datasets updated on an on-going basis, and there are many sources for accessing data which are suitable for different purposes and designed for users with varying levels of technical skill. In some respects this paper is a super-abridged version of my book, designed to serve as an introduction and brief reference.

Second, I’ve created a series of introductory notebooks on GitHub that illustrate how to use the Census Bureau’s API with Python and Jupyter Notebooks. I designed these for a demonstration I gave at NYU’s Love Data Week back on Feb 10 (the slides for the talk are also available in the repo). I structured the talk around three examples. Example A demonstrates the basics of how the API works along with some best practices, such as defining your variables at the top and progressively building links to retrieve data. It also illustrates the utility of using these technologies in concert, as you can pull data into your script and process and visualize it in one go. I also demonstrate how to retrieve lists of census variables and their corresponding metadata, which isn’t something that’s widely documented. Example B is a variation of A, extended by adding an API key and storing data in a file immediately after retrieval. Example C introduces more complexity, reading variables in from files and looping through lists of geographies to make multiple API calls.

Since I’ve written a few posts on the census API recently, I went back and added an api tag to group them together, so you can access them via a single link.

census api example

Define census API variables, build links, and retrieve data

Census Book

Exploring the US Census Book Published!

My book, Exploring the US Census: Your Guide to America’s Data, has been published! You can purchase it directly from SAGE Publishing or from Barnes and Nobles, Amazon, or your bookstore of choice (it’s currently listed for pre-order on Amazon but its availability there is imminent). It’s $45 for the paperback, $36 for the ebook. Data for the exercises and supplemental material is available on the publisher’s website, and I’ve created a landing page for the book on this site.

Exploring the US Census is the definitive researcher’s guide to working with census data. I place the census within the context of: US society, the open data movement, and the big data universe, provide a crash course on using the new data.census.gov, and introduce the fundamental concepts of census geography and subject categories (aka universes). One chapter is devoted to each of the primary datasets: decennial census (with details about the 2020 census that’s just over the horizon), American Community Survey, Population Estimates Program, and business data from the Business Patterns, Economic Census, and BLS. Subsequent chapters demonstrate how to: integrate census data into writing and research, map census data in GIS, create derivative measures, and work with historic data and microdata with a focus on the Current Population Survey.

I wrote the book as a hybrid between a techie guidebook and an academic text. I provide hands-on exercises so that you learn by doing (techie) while supplying sufficient context so you can understand and evaluate why you’re doing it (academic). I demonstrate how to find and download data from several different sources, and how to work with the data using free and open source software: spreadsheets (LibreOffice Calc), SQL databases (DB Browser for SQLite), and GIS (QGIS). I point out the major caveats and pitfalls of working with the census, along with many helpful tools and resources.

The US census data ecosystem provides us with excellent statistics for describing, studying, and understanding our communities and our nation. It is a free and public domain resource that’s a vital piece of the country’s social, political, and economic infrastructure and a foundational element of American democracy. This book is your indispensable road map for navigating the census. Have a good trip!

See the series – census book tag for posts about the content of the book, additional material that expands on that content (but didn’t make it between the covers), and the writing process.

A-Train Classic

Neighborhood Research and the Census for Undergrads

Each semester I visit several undergraduate classes in public affairs and journalism, to introduce students to census data. They’re researching or reporting on particular issues and trends in neighborhoods in New York City, and they are looking for statistics to either support their work or generate ideas for a story. I usually showcase the NYC Population Factfinder as a starting point, mention the Census Reporter for areas outside the city , and provide background info on the decennial census, American Community Survey, and census geography and subjects. This year I included two new examples toward the beginning of the lecture to spark their interest.

I recently helped reporter Susannah Jacob navigate census data for an article she wrote on hyper-gentrification in the West Village for the New York Review of Books. A perfect example, as it’s what the students are expected to do for their assignment! Like any good journalist (and human geographer), Susannah pounded the pavement of the neighborhood, interviewing residents and small businesses and observing and documenting the urban landscape and how it was changing. But she also wanted to see what the data could tell her, and whether it would corroborate or refute what she was seeing and hearing.

NYRB Article on the West Village

Source: Jacob & Roye, New York Review of Books, Oct 2019. https://www.nybooks.com/daily/2019/10/09/what-happened-to-the-west-village/

We used the NYC Population Factfinder to assemble census tracts to approximate the neighborhood, and I did a little legwork to pull data from the County / ZIP Code Business Patterns so we could see how the business landscape was changing. The most surprising stat we discovered was that the number of 1-unit detached homes had doubled. This wouldn’t be odd in many rapidly growing places in the US, but it’s unusual for an old, built-out urban neighborhood. A 1-unit detached home is a free-standing single family structure that doesn’t share walls with other buildings. Most homes in Manhattan are either attached (row houses / town houses) or units in multi-unit buildings (apartments / condos / co-ops). How could this be? Uber-wealthy people are buying up adjoining row homes, knocking down the walls, and turning them into urban mansions. Seems extraordinary, but apparently is part of a trend.

We certainly ran up against the limitations of ACS data. The estimates for tracts have large margins of error, and when comparing two short time frames it’s difficult to detect actual change, as differences in estimates are clouded by sampling noise. Even after aggregating several tracts, many of the estimates for change weren’t reliable enough to report. When they were (as in the housing example) you could only say that there has been a relative increase without becoming wedded to a precise number. In this case, from 214 (+/- 127) detached units in 2006-2010 to 627 (+/-227) in 2013-2017, an increase of 386 (+/- 260). Not great estimates, but you can say it’s an increase as the low end for change is still positive at 126 units. Considering the time frame and character of the neighborhood, that’s still noteworthy (bearing in mind we’re working with a 90% confidence interval). In cases where the differences overlap and could represent either an increase or decrease there are few claims you can make, and it’s best to walk away (or look at larger area). I always discuss the margin of error with students and caution them about treating these numbers as counts.

While census data is invaluable for describing and studying individual places, it’s inherent geographic nature also allows us to study places in relation to each other, and to illustrate geographic patterns. For my second example, I zoom out and show them this map of racial-ethnic distribution in the United States:

Map of US Racial and Ethnic Diversity

Source: William H. Frey analysis of US Census population estimates, 2018. https://www.brookings.edu/research/americas-racial-diversity-in-six-maps/

This is one of a series of six maps by demographer William Frey at the Brookings Institute that highlights the geographic diversity of the United States. In this map, each county is shaded for a particular race / ethnicity if the population of that group in that county is greater than that group’s share of the national population. For example, Hispanics / Latinos represent 18.3% of the total US population, so counties where they represent more than this percentage are shaded.

For the purpose of the class, it helps make the census ‘pop’ and gets the students to think about the statistics as geospatial datasets that they can see and relate to, and that can form the basis for interesting research.

Some footnotes – if you like Frey’s maps, I highly recommend his book Diversity Explosion: How New Racial Demographics are Remaking America. It explores the evolving demographic and geographic landscape of the US with clear, accessible writing and more of these great maps (in color).

I used the pic at the top of this post as the background for my intro slide. It’s a screenshot of a city from A-Train, a 1992 city-building train simulator that was ported from Japan to the world by Artdink and Maxis, following the success of something called SimCity. It wasn’t nearly as successful, but I always liked the graphics which have now attained a retro-gaming vibe.

FRED Chart - Pesronal Savings Rate

Finding Economic Data with FRED

I attended ALA’s annual conference in DC last month, where I met FRED. Not a person, but a database. I can’t believe I hadn’t met FRED before – it is an amazingly valuable resource for national, time-series economic data.

FRED was created by the Economic Research unit of the Federal Reserve Bank of St. Louis. It was designed to aggregate economic data from many government sources into a centralized database, with straightforward interface for creating charts and tables. At present, it contains 567,000 US and international time series datasets from 87 sources.

Categories of data include banking and finance (interest and exchange rates, lending, monetary data), labor markets (basic demographics, employment and unemployment, job openings, taxes, real estate), national accounts (national income, debt, trade), production and business (business cycles, production, retail trade, sector-level information about industries),  prices (commodities, consumer price indexes) and a lot more. Sources include the Federal Reserve, the Bureau of Labor Statistics, the Census Bureau, the Bureau of Economic Analysis, the Treasury Department, and a mix of other government and corporate sources from the US and around the world.

On their home page at https://fred.stlouisfed.org/ you can search for indicators or choose one of several options for browsing. The default dashboard shows you some of the most popular series and newest releases at a glance. Click on Civilian Unemployment Rate, and you retrieve a chart with monthly stats that stretch from the late 1940s to the present. Most of FRED’s plots highlight periods of recession since these have a clear impact on economic trends. You can modify the chart’s date range, change the frequency (monthly, quarterly, annually – varies by indicator), download the chart or the underlying data in a number of formats, and share a link to it. There are also a number of advanced customization features, such as adding other series to the chart. Directly below the chart are notes that provide a clear definition of the indicator and its source (in this case, the Bureau of Labor Statistics) and links to related tables and resources.

FRED - Chart of Civilian Unemployment Rate

The unemployment rate is certainly something that you’d expect to see, but once you browse around a bit you’ll be surprised by the mix of statistics and the level of detail. I happened to stumble across a monthly Condo Price Index for the New York City Metro Area.

Relative to other sources or portals, FRED is great for viewing and retrieving national (US and other countries) economic and fiscal data and charts gathered from many sources. It’s well suited for time-series data; there are lots of indexes and you can opt for seasonally adjusted or unadjusted values. Many of the series include data for large regions of the US, states, metro areas, and counties. The simplest way to find sub-national data is to do a search, and once you do you can apply filters for concepts, frequencies, geographies, and sources. FRED is not the place to go if you need data for small geographies below the county level. If you opt to create a FRED account (purely optional) you’ll be able to save and track indicators that you’re interested in and build your own dashboards.

If you’re interested in maps, visit FRED’s brother GeoFRED at https://geofred.stlouisfed.org/.  The homepage has a series of sample thematic maps for US counties and states and globally for countries. Choose any map, and once it opens you can change the geography and indicator to something else. You can modify the frequency, units, and time periods for many of the indicators, and you have basic options for customizing the map (colors, labels, legend, etc.) The maps are interactive, so you can zoom in and out and click on a place to see its data value. Most of the county-level data comes from the Census Bureau, but as you move up to states or metro areas the number of indicators and sources increase. For example, the map below shows individual income taxes collected per capita by state in 2018.

GeoFRED - State Income Tax

There’s a basic search function for finding specific indicators. Just like the charts, maps can be downloaded as static images, shared and embedded in websites, and you can download the data behind the map (it’s simpler to download the same indicator for multiple geographies using GeoFRED compared to FRED).

Take a few minutes and check it out. For insights and analyses of data published via FRED, visit FRED’s blog at https://fredblog.stlouisfed.org/.

Census Workshop Recap

I’ve been swamped these past few months, revising my census book, teaching a spatial database course, and keeping the GIS Lab running. Thus, this will be a shorter post!

Last week I taught a workshop on understanding, finding, and accessing US Census Data at the Metropolitan Library Council of New York. If you couldn’t make it, here are the presentation slides and the group exercise questions.

Most of the participants were librarians who were interested in learning how to help patrons find and understand census data, but there were also some data analysts in the crowd. We began with an overview of how the census is structured by dataset, geography, and subject categories. I always cover the differences between the decennial census and the ACS, with a focus on how to interpret ACS estimates and gauge their reliability.

For workshops I think it’s best to start with searching for profiles (lots of different data for one place). This gives new users a good overview of the breadth and depth of the types of variables that are available in the census. Since this was a New York City-centric crowd we looked at the City’s excellent NYC Population Factfinder first. The participants formed small groups and searched through the application to answer a series of fact-finding questions that I typically receive. Beyond familiarizing themselves with the applications and data, the exercises also helped to spark additional questions about how the census is structured and organized.

Then we switched over to the Missouri Census Data Center’s profile and trends applications (listed on the right hand side of their homepage) to look up data for other parts of the country, and in doing so we were able to discuss the different census geographies that are available for different places. Everyone appreciated the simple and easy to use interface and the accessible tables and graphics. The MCDC doesn’t have a map-based search, so I did a brief demo of TIGERweb for viewing census geography across the country.

Once everyone had this basic exposure, we hopped into the American Factfinder to search for comparison tables (a few pieces of data for many places). We discussed how census data is structured in tables and what the difference between the profile, summary, and detailed tables are. We used the advanced search and I introduced my tried and true method of filtering by dataset, geography, and topic to find what we need. I mentioned the Census Reporter as good place to go for ACS documentation, and as an alternate source of data. Part of my theme was that there are many tools that are suitable for different needs and skill levels, and you can pick your favorite or determine what’s suitable for a particular purpose.

We took a follow-the-leader approach for the AFF, where I stepped through the website and the process for downloading two tables and importing them into a spreadsheet, high-lighting gotchas along the way. We did some basic formulas for aggregating ACS estimates to create new margins of error, and a VLOOKUP for tying data from two tables together.

We wrapped up the morning with a foreshadowing of what’s to come with the new data.census.gov (which will replace the AFF) and the 2020 census. While there’s still much uncertainty around the citizenship question and fears of an under count, the structure of the dataset won’t be too different from 2010 and the timeline for release should be similar.