I’m still cranking away on my book, which will be published by SAGE Publications and is tentatively titled Exploring the US Census: Your Guide to America’s Data. I’m putting the finishing touches on the chapter devoted to business datasets.
Most of the chapter is dedicated to the Census Bureau’s (CB) Business Patterns and the Economic Census. In a final section I provide an overview of labor force data produced by the Bureau of Labor Statistics (BLS). At first glance these datasets appears to cover a lot of the same ground, but they do vary in terms of methodology, geographic detail, number of variables, and currency / frequency of release. I’ll provide a summary of the options in this post.
Most of these datasets provide data for business establishments, which are individual physical locations where business is conducted or where services or industrial operations are performed, and are summarized by industries, which are groups of businesses that produce similar products or provide similar services. The US federal government uses the North American Industrial Classification System (NAICS), a hierarchical series of codes used to classify businesses and the labor force into divisions and subdivisions at varying levels of detail.
Since most of these datasets are generated from counts, surveys, or administrative records for business establishments they summarize business activity and the labor force based on where people work, i.e. where the businesses are. The Current Population Survey (CPS) and American Community Survey (ACS) are exceptions, as they summarize the labor force based on residency, i.e. where people live. The Census Bureau datasets tend to be more geographically detailed and present data at one point in time, while the BLS datasets tend to be more timely and are focused on providing data in time series. The BLS gives you the option to look at employment data that is seasonally adjusted; this data has been statistically “smoothed” to remove fluctuations in employment due to normal cyclical patterns in the economy related to summer and winter holidays, the start and end of school years, and general weather patterns.
Many of the datasets are subject to data suppression or non-disclosure to protect the confidentiality of businesses; if a given geography or industrial category has few establishments, or if a small number of establishments constitutes an overwhelmingly majority of employees or wages, data is either generalized or withheld. Most of these datasets exclude agricultural workers, government employees, and individuals who are self-employed. Data for these industries and workers is available through the USDA’s Census of Agriculture and the CB’s Census of Governments and Nonemployer Statistics.
The CB datasets are published on the Census Bureau’s website via the American Factfinder, the new data.census.gov, the FTP site and API, and via individual pages dedicated to specific programs. The BLS datasets are accessible through a variety of applications via the BLS Data Tools. For each of the datasets discussed below I link to their program page, so you can see fuller descriptions of how the data is collected and what’s included.
The Census Bureau’s Business Data
- Business Patterns (BP)
- Typically referred to as the County and ZIP Code Business Patterns, this Census Bureau dataset is also published for states, metropolitan areas, and Congressional Districts. Published on an annual basis from administrative records, the number of employees, establishments, and wages (annual and first quarter) is published by NAICS, along with a summary of business establishments by employee size categories.
- Economic Census
- Released every five years in years ending in 2 and 7, this dataset is less timely than the BP but includes more variables: in addition to employment, establishments, and wages data is published on production and sales for various industries, and is summarized both geographically and in subject series that cover the entire industry. The Economic Census employs a mix of enumerations (100% counts) and sample surveying. It’s available for the same geographies as the BP with two exceptions: data isn’t published for Congressional Districts but is available for cities and towns.
Bureau of Labor Statistics Data
- Current Employment Statistics (CES)
- This is a monthly sample survey of approximately 150k businesses and government agencies that represent over 650k physical locations. It measures the number of workers, hours worked, and average hourly wages. Data is published for broad industrial categories for states and metropolitan areas.
- Quarterly Census of Employment and Wages (QCEW)
- An actual count of business establishments that’s conducted four times a year, it captures the same data that’s in the CES but also includes the number of establishments, total wages, and average annual pay (wages and salaries). Data is tabulated for states, metropolitan areas, and counties at detailed NAICS levels.
- Occupational Employment Statistics (OES)
- A bi-annual survey of 200k business establishments that measures the number of employees by occupation as opposed to industry (the specific job people do rather than the overall focus of the business). Data on the number of workers and wages is published for over 800 occupations for states and metro areas using the Standard Occupational Classification (SOC) system.
Labor Force Data by Residency
- Current Population Survey (CPS)
- Conducted jointly by the CB and BLS, this monthly survey of 60k households captures a broad range of demographic and socio-economic information about the population, but was specifically designed for measuring employment, unemployment, and labor force participation. Since it’s a survey of households it measures the labor force based on where people live and is able to capture people who are not working (which is something a survey of business establishments can’t achieve). Monthly data is only published for the nation, but sample microdata is available for researchers who want to create their own tabulations.
- Local Area Unemployment Statistics (LAUS)
- This dataset is generated using a series of statistical models to provide the employment and unemployment data published in the CPS for states, metro areas, counties, cities and towns. Over 7,000 different areas are included.
- American Community Survey (ACS)
- A rolling sample survey of 3.5 million addresses, this dataset is published annually as 1-year and 5-year period estimates. This is the Census Bureau’s primary program for collecting detailed socio-economic characteristics of the population on an on-going basis and includes labor force status and occupation. Data is published for all large geographies and small ones including census tracts, ZCTAs, and PUMAs. Each estimate is published with a margin of error at a 90% confidence interval. Labor force data from the ACS is best used when you’re OK with generally characterizing an area rather than getting a precise and timely measurement, or when you’re working with an array of ACS variables and want labor force data generated from the same source using the same methodology.
In the book I’ll spend a good deal of time navigating the NAICS codes, explaining the impact of data suppression and how to cope with it, and covering the basics of using this data from an economic geography approach. I’ve written some exercises where we calculate location quotients for advanced industries and aggregate ZIP-Code based Business Patterns data to the ZCTA-level. This is still a draft, so we’ll have to wait and see what stays and goes.
In the meantime, if you’re looking for summaries of additional data sources in any and every field I highly recommend Julia Bauder’s excellent Reference Guide to Data Sources. Even though it was published back in 2014 I find that the descriptions and links are still spot on – it primarily covers public and free US federal and international government sources.