LISA map of Broad Band Subscription by Household

Mapping US Census Data on Internet Access

ACS Data on Computers and the Internet

The Census Bureau recently released the latest five-year period estimates from the American Community Survey (ACS), with averages covering the years from 2013 to 2017.

Back in 2013 the Bureau added new questions to the ACS on computer and internet use: does a household have a computer or not, and if yes what type (desktop or laptop, smartphone, tablet, or other), and does a household have an internet subscription or not, and if so what kind (dial-up, broadband, and type of broadband). 1-year averages for geographies with 65,000 people or more have been published since 2013, but now that five years have passed there is enough data to publish reliable 5-year averages for all geographies down to the census tract level. So with this 2013-2017 release we have complete coverage for computer and internet variables for all counties, ZCTAs, places (cities and towns), and census tracts for the first time.

Summaries of this data are published in table S2801, Types of Computers and Internet Subscriptions. Detailed tables are numbered B28001 through B28010 and are cross-tabulated with each other (presence of computer and type of internet subscription) and by age, educational attainment, labor force status, and race. You can access them all via the American Factfinder or the Census API, or from third-party sites like the Census Reporter. The basic non-cross-tabbed variables have also been incorporated into the Census Bureau’s Social Data Profile table DP02, and in the MCDC Social profile.
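If you go the API route, a request is just a URL. The sketch below only builds that URL and does not call the network. The endpoint path and the two variable IDs (assumed here to be total households and households with any type of broadband, from table B28002) are assumptions that are not in this post, so check them against the API's variable list before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint for the 2013-2017 ACS 5-year detailed tables.
BASE_URL = "https://api.census.gov/data/2017/acs/acs5"

def build_acs_query(variables, geography="county:*"):
    """Return a Census API URL requesting the given variables for a geography."""
    params = {"get": ",".join(["NAME"] + list(variables)), "for": geography}
    # Keep commas, colons, and asterisks readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=",:*")

# Assumed variable IDs from table B28002; verify them before use.
url = build_acs_query(["B28002_001E", "B28002_004E"])
print(url)
```

Dividing the second variable by the first would give a broadband subscription rate for each county.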

The Census Bureau issued a press release that discusses trends for median income, poverty rates, and computer and internet use (addressed separately) and created maps of broadband subscription rates by county (I’ve inserted one below). According to their analysis, counties that were mostly urban had higher average rates of access to broadband internet (75% of all households) relative to mostly rural counties (65%) and completely rural counties (63%). Approximately 88% of all counties that had subscription rates below 60% were mostly or completely rural.

Figure 1. Percentage of Households With Subscription to Any Broadband Service: 2013-2017 [Source: U.S. Census Bureau]

Not surprisingly, counties with lower median incomes were also associated with lower rates of subscription. Urban counties with median incomes above $50,000 had an average subscription rate of 80% compared to 71% for completely rural counties. Mostly urban counties with median incomes below $50,000 had average subscription rates of 70% while completely rural counties had an average rate of 62%. In short, wealthier rural counties have rates similar to less wealthy urban counties, while less wealthy rural areas have the lowest rates of all. There also appear to be some regional clusters of high and low broadband subscriptions. Counties within major metro areas stand out as clusters with higher rates of subscription, while large swaths of the South have low rates of subscription.

Using GeoDa to Identify Broadband Clusters

I was helping a student recently with making LISA maps in GeoDa, so I quickly ran the data (percentage of households with subscription to any broadband service) through to see if there were statistically significant clusters. It’s been a couple years since I’ve used GeoDa and this version (1.12) is significantly more robust than the one I remember. It focuses on spatial statistics but has several additional applications to support basic data mapping and stats. The interface is more polished and the software can import and export a number of different vector and tabular file formats.

The Univariate Local Moran’s I analysis, also known as LISA for local indicators of spatial association, identifies statistically significant geographic clusters of a particular variable. Once you have a polygon shapefile or geopackage with the attribute you want to study, you add it to GeoDa and then create a weights file (Tools menu) using the unique identifier for the shapes. The weights file indicates how individual polygons neighbor each other: queen contiguity classifies features as neighbors as long as they share a single node, while rook contiguity classifies them as neighbors only if they share an edge (at least two points that can form a line).
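To make the queen versus rook distinction concrete, here is a minimal sketch in plain Python. It is not how GeoDa builds its weights file; it assumes that adjoining polygons repeat exactly the same vertex coordinates along their shared boundary, and the 2x2 grid of squares is hypothetical.

```python
from itertools import combinations

def contiguity(polygons, rule="queen"):
    """Map each polygon id to the set of ids it neighbors.

    polygons: dict of id -> list of (x, y) vertices in order around the ring.
    """
    def edges(ring):
        # Each edge is an unordered pair of consecutive vertices, closing the ring.
        return {frozenset((ring[i], ring[(i + 1) % len(ring)]))
                for i in range(len(ring))}

    neighbors = {pid: set() for pid in polygons}
    for a, b in combinations(polygons, 2):
        if rule == "queen":
            # Queen: sharing a single vertex is enough.
            touching = bool(set(polygons[a]) & set(polygons[b]))
        else:
            # Rook: the two polygons must share a whole edge.
            touching = bool(edges(polygons[a]) & edges(polygons[b]))
        if touching:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

# Hypothetical 2x2 grid of unit squares.
squares = {
    "SW": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "SE": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "NW": [(0, 1), (1, 1), (1, 2), (0, 2)],
    "NE": [(1, 1), (2, 1), (2, 2), (1, 2)],
}
queen = contiguity(squares, "queen")
rook = contiguity(squares, "rook")
print(sorted(queen["SW"]), sorted(rook["SW"]))
```

The diagonal squares touch only at the center point, so they are neighbors under the queen rule but not under the rook rule.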

Once you’ve created and saved a weights file you can run the analysis (Shapes menu). You select the variable that you want to map, and can choose to create a cluster map, scatter plot, and significance map. The analysis generates 999 random permutations of your data and compares them to the actual distribution to evaluate whether clusters are likely the result of random chance, or if they are distinct and significant. Once the map is generated you can right click on it to change the number of permutations, or you can filter by significance level. By default a 95% confidence level is used.
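The permutation idea can be sketched in a few lines of plain Python. This is a simplified illustration and not GeoDa's actual implementation: it assumes row-standardized weights, holds one area's value fixed while shuffling all the others, and reports a one-sided pseudo p-value. The ten-area chain and its values are hypothetical.

```python
import random
from statistics import mean, pstdev

def local_moran(values, neighbors, i):
    """Local Moran's I for area i using row-standardized weights."""
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]
    lag = mean([z[j] for j in neighbors[i]])  # average z-score of the neighbors
    return z[i] * lag

def pseudo_p_value(values, neighbors, i, permutations=999, seed=0):
    """Conditional permutation test: keep area i fixed, shuffle the rest."""
    rng = random.Random(seed)
    observed = local_moran(values, neighbors, i)
    others = [v for k, v in enumerate(values) if k != i]
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(others)
        shuffled = others[:i] + [values[i]] + others[i:]
        simulated = local_moran(shuffled, neighbors, i)
        # Count simulations at least as extreme as the observed statistic.
        if (observed >= 0 and simulated >= observed) or \
           (observed < 0 and simulated <= observed):
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)

# Hypothetical chain of ten areas: five high values followed by five low ones.
rates = [80, 82, 79, 81, 83, 55, 52, 57, 54, 56]
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(rates)]
         for i in range(len(rates))}
observed_i, p_value = pseudo_p_value(rates, chain, 2)
print(observed_i, p_value)
```

With 999 permutations the smallest possible pseudo p-value is 0.001, which is one reason to raise the number of permutations when you want to filter at stricter significance levels.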

The result for the broadband access data is below. The High-High polygons in red are statistically significant clusters of counties that have high percentages of broadband use: the Northeast corridor, much of California, the coastal Pacific Northwest, the Central Rocky Mountains, and certain large metro areas like Atlanta, Chicago, Minneapolis, big cities in Texas, and a few others. There is a roughly equal number of Low-Low counties that are statistically significant clusters of low broadband service. This includes much of the Deep South, south Texas, and New Mexico. There are also a small number of outliers. Low-High counties represent statistically significant low values surrounded by higher values. Examples include highly urban counties like Philadelphia, Baltimore City, and Wayne County (Detroit) as well as some rural counties located along the fringe of metro areas. High-Low counties represent significant higher values surrounded by lower values. Examples include urban counties in New Mexico like Santa Fe, Sandoval (in the Albuquerque metro area), and Otero (Alamogordo), and a number in the Deep South. A few counties cannot be evaluated as they are islands (mostly in Hawaii) and thus have no neighbors.


LISA Map of % of Households that have Access to Broadband Internet by County (2013-2017 ACS). 999 permutations, 95% confidence level, queen contiguity
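The four cluster labels on the map come from two signs: whether a county is above or below the mean, and whether the average of its neighbors is above or below the mean. A minimal sketch of that labeling follows; the function name and inputs are hypothetical, and it assumes the standardized value, the neighbors' average, and the pseudo p-value have already been computed.

```python
def lisa_category(z_value, z_lag, p_value, alpha=0.05):
    """Label an area by the sign of its own z-score and its neighbors' average z-score."""
    if p_value > alpha:
        return "Not significant"
    own = "High" if z_value > 0 else "Low"
    surrounding = "High" if z_lag > 0 else "Low"
    return f"{own}-{surrounding}"

print(lisa_category(1.2, 0.8, 0.01))   # county above the mean, neighbors above the mean
print(lisa_category(-1.0, 0.9, 0.02))  # county below the mean, neighbors above the mean
```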

All ACS data is published at a 90% confidence level and margins of error are published for each estimate. Margins of error are typically higher for less populated areas, and for any population group that is small within a given area. I calculated the coefficient of variation for this variable at the county level to measure how precise the estimates are, and used GeoDa to create a quick histogram. The overwhelming majority had CV values below 15, which is regarded as being highly reliable. Only 16 counties had values that ranged from 16 to 24, which puts them in the medium reliability category. If we were dealing with a smaller population (for example, dial-up subscribers) or smaller geographies like ZCTAs or tracts, we would need to be more cautious in analyzing the results, and might have to aggregate smaller populations or areas into larger ones to increase reliability.
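For reference, the coefficient of variation can be derived from the published margin of error: divide the margin by 1.645 (the z-score for a 90% confidence level) to get the standard error, then express the standard error as a percentage of the estimate. A minimal sketch, using a made-up estimate and margin of error rather than real county values:

```python
def coefficient_of_variation(estimate, moe, z=1.645):
    """CV as a percentage, from an ACS estimate and its 90% margin of error."""
    standard_error = moe / z
    return standard_error / estimate * 100

# Hypothetical county: 75.0% of households with broadband, +/- 2.0
cv = coefficient_of_variation(75.0, 2.0)
highly_reliable = cv < 15  # threshold used in this post
print(round(cv, 2), highly_reliable)
```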

Wrap Up

The issue of the digital divide has gained more coverage in the news lately with the exploration of the geography of the “new economy”, and how technology-intensive industries are concentrating in certain major metros while bypassing smaller metros and rural areas. Lack of access to broadband internet and reliable wifi in rural areas and within older inner cities is one of the impediments to future economic growth in these areas.

You can download a shapefile with the data and results of the analysis described in this post.


Using the ACS to Calculate Daytime Population

I’m in the home stretch for getting the last chapter of the first draft of my census book completed. The next to last chapter of the book provides an overview of a number of derivatives that you can create from census data, and one of them is the daytime population.

There are countless examples of using census data for site selection analysis and for comparing and ranking places for locating new businesses, providing new public services, and generally measuring potential activity or population in a given area. People tend to forget that census data measures people where they live. If you were trying to measure service or business potential for residents, the census is a good source.

Counts of residents are less meaningful if you want to gauge how crowded or busy a place is during the day. The population of an area changes during the day as people leave their homes to go to work or school, or go shopping or participate in social activities. Given the sharp divisions in the US between residential, commercial, and industrial uses created by zoning, residential areas empty out during the weekdays as people travel into the other two zones, and then fill up again at night when people return. Some places function as job centers, others serve as bedroom communities, and others are a mixture of the two.

The Census Bureau provides recommendations for calculating daytime population using a few tables from the American Community Survey (ACS). These tables capture where workers live and work, which is the largest component of the daytime population.

Using these tables from the ACS:

Total resident population:
    B01003: Total Population
Total workers living in area and workers who lived and worked in same area:
    B08007: Sex of Workers by Place of Work–State and County Level (‘Total:’ line and ‘Worked in county of residence’ line)
    B08008: Sex of Workers by Place of Work–Place Level (‘Total:’ line and ‘Worked in place of residence’ line)
    B08009: Sex of Workers by Place of Work–Minor Civil Division Level (‘Total:’ line and ‘Worked in MCD of residence’ line)
Total workers working in area:
    B08604: Total Workers for Workplace Geography

They propose two different approaches that lead to the same outcome. The simplest approach: add the total resident population to the total number of workers who work in the area, and then subtract the total resident workforce (workers who live in the area but may work inside or outside the area):

Daytime Population = Total Residents + Total Workers in Area - Total Resident Workers

For example, according to the 2017 ACS Washington DC had an estimated 693,972 residents (from table B01003), 844,345 (+/- 11,107) people who worked in the city (table B08604), and 375,380 (+/- 6,102) workers who lived in the city. We add the total residents and total workers, and subtract the total workers who live in the city. The subtraction allows us to avoid double counting the residents who work in the city (as they are already included in the total resident population) while omitting the residents who work outside the city (who are included in the total resident workers). The result:

693,972 + 844,345 - 375,380 = 1,162,937

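And to get the new margin of error, take the square root of the sum of the squared margins of error for each estimate (the resident population figure has no margin of error here, hence the 0):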

SQRT(0^2 + 11,107^2 + 6,102^2) = 12,673

So the daytime population of DC is approximately 468,965 people (68%) higher than its resident population. The District has a high number of jobs in the government, non-profit, and education sectors, but has a limited amount of expensive real estate where people can live. In contrast, I did the calculation for Philadelphia and its daytime population is only 7% higher than its resident population. Philadelphia has a much higher proportion of resident workers relative to total workers. Geographically the city is larger than DC and has more affordable real estate, and faces stiffer suburban competition for private sector jobs.
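The DC arithmetic above can be reproduced with a short script. This is a minimal sketch: the function names are my own, and the inputs are the 2017 ACS figures quoted in this post.

```python
from math import sqrt

def daytime_population(residents, workers_in_area, resident_workers):
    """Residents plus people who work in the area, minus workers who live in the area."""
    return residents + workers_in_area - resident_workers

def combined_moe(*moes):
    """Margin of error for a sum or difference: root of the summed squares."""
    return sqrt(sum(m ** 2 for m in moes))

residents, workers_in_area, resident_workers = 693_972, 844_345, 375_380
daytime = daytime_population(residents, workers_in_area, resident_workers)
moe = combined_moe(0, 11_107, 6_102)
increase = daytime - residents
pct_increase = increase / residents * 100

print(daytime, round(moe), increase, round(pct_increase))
# prints: 1162937 12673 468965 68
```

Swapping in the same three estimates for another county or place gives its daytime population and margin of error in the same way.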

The variables in the tables mentioned above are also cross-tabulated in other tables by age, sex, race, Hispanic origin, citizenship status, language, poverty, and tenure, so it’s possible to estimate some characteristics of the daytime population. Margins of error will limit the usefulness of estimates for small population groups, and overall the 5-year period estimates are a better choice for all but the largest areas. Data for workers living in an area who lived and worked in the same area is reported for states, counties, places (incorporated cities and towns), and minor civil divisions (MCDs) for the states that have them.

Data for the total resident workforce is available for other, smaller geographies, but place of work is only reported relative to those larger areas, i.e. we know how many people in a census tract live and work in their county or place of residence, but not how many live and work in their tract of residence. In contrast, data on the number of workers from B08604 is not available for smaller geographies, which limits the application of this method to larger areas.

Download or explore these ACS tables from your favorite source: the American Factfinder, the Census Reporter, or the Missouri Census Data Center.

Business and Labor Force Data: The Census and the BLS

I’m still cranking away on my book, which will be published by SAGE Publications and is tentatively titled Exploring the US Census: Your Guide to America’s Data. I’m putting the finishing touches on the chapter devoted to business datasets.

Most of the chapter is dedicated to the Census Bureau’s (CB) Business Patterns and the Economic Census. In a final section I provide an overview of labor force data produced by the Bureau of Labor Statistics (BLS). At first glance these datasets appear to cover a lot of the same ground, but they do vary in terms of methodology, geographic detail, number of variables, and currency / frequency of release. I’ll provide a summary of the options in this post.

The Basics

Most of these datasets provide data for business establishments, which are individual physical locations where business is conducted or where services or industrial operations are performed, and are summarized by industries, which are groups of businesses that produce similar products or provide similar services. The US federal government uses the North American Industry Classification System (NAICS), a hierarchical series of codes used to classify businesses and the labor force into divisions and subdivisions at varying levels of detail.
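The hierarchy is built into the codes themselves: each additional digit adds a level of detail, so a six-digit code can be rolled up to broader categories by truncating it. A minimal sketch of that idea follows; the function name is hypothetical, and the example uses 541511 (Custom Computer Programming Services).

```python
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "NAICS industry",
    6: "national industry",
}

def naics_ancestry(code):
    """List each level of the hierarchy that a NAICS code belongs to."""
    return [(NAICS_LEVELS[n], code[:n]) for n in range(2, len(code) + 1)]

for level, prefix in naics_ancestry("541511"):
    print(f"{level}: {prefix}")
```

One caveat for roll-ups: a few sectors span a range of two-digit codes (Manufacturing is 31-33, for example), so grouping on the first two digits alone will split those sectors.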

Since most of these datasets are generated from counts, surveys, or administrative records for business establishments they summarize business activity and the labor force based on where people work, i.e. where the businesses are. The Current Population Survey (CPS) and American Community Survey (ACS) are exceptions, as they summarize the labor force based on residency, i.e. where people live. The Census Bureau datasets tend to be more geographically detailed and present data at one point in time, while the BLS datasets tend to be more timely and are focused on providing data in time series. The BLS gives you the option to look at employment data that is seasonally adjusted; this data has been statistically “smoothed” to remove fluctuations in employment due to normal cyclical patterns in the economy related to summer and winter holidays, the start and end of school years, and general weather patterns.

Many of the datasets are subject to data suppression or non-disclosure to protect the confidentiality of businesses; if a given geography or industrial category has few establishments, or if a small number of establishments constitutes an overwhelming majority of employees or wages, data is either generalized or withheld. Most of these datasets exclude agricultural workers, government employees, and individuals who are self-employed. Data for these industries and workers is available through the USDA’s Census of Agriculture and the CB’s Census of Governments and Nonemployer Statistics.

The CB datasets are published on the Census Bureau’s website via the American Factfinder, the new data.census.gov, the FTP site and API, and via individual pages dedicated to specific programs. The BLS datasets are accessible through a variety of applications via the BLS Data Tools. For each of the datasets discussed below I link to their program page, so you can see fuller descriptions of how the data is collected and what’s included.

The Census Bureau’s Business Data

Business Patterns (BP)
Typically referred to as the County and ZIP Code Business Patterns, this Census Bureau dataset is also published for states, metropolitan areas, and Congressional Districts. It is generated annually from administrative records and reports the number of employees and establishments and wages (annual and first quarter) by NAICS, along with a summary of business establishments by employee size categories.
Economic Census
Released every five years in years ending in 2 and 7, this dataset is less timely than the BP but includes more variables: in addition to employment, establishments, and wages, data is published on production and sales for various industries, and is summarized both geographically and in subject series that cover the entire industry. The Economic Census employs a mix of enumerations (100% counts) and sample surveying. It’s available for the same geographies as the BP with two exceptions: data isn’t published for Congressional Districts but is available for cities and towns.

Bureau of Labor Statistics Data

Current Employment Statistics (CES)
This is a monthly sample survey of approximately 150k businesses and government agencies that represent over 650k physical locations. It measures the number of workers, hours worked, and average hourly wages. Data is published for broad industrial categories for states and metropolitan areas.
Quarterly Census of Employment and Wages (QCEW)
An actual count of business establishments that’s conducted four times a year, it captures the same data that’s in the CES but also includes the number of establishments, total wages, and average annual pay (wages and salaries). Data is tabulated for states, metropolitan areas, and counties at detailed NAICS levels.
Occupational Employment Statistics (OES)
A semiannual survey of 200k business establishments that measures the number of employees by occupation as opposed to industry (the specific job people do rather than the overall focus of the business). Data on the number of workers and wages is published for over 800 occupations for states and metro areas using the Standard Occupational Classification (SOC) system.

Labor Force Data by Residency

Current Population Survey (CPS)
Conducted jointly by the CB and BLS, this monthly survey of 60k households captures a broad range of demographic and socio-economic information about the population, but was specifically designed for measuring employment, unemployment, and labor force participation. Since it’s a survey of households it measures the labor force based on where people live and is able to capture people who are not working (which is something a survey of business establishments can’t achieve). Monthly data is only published for the nation, but sample microdata is available for researchers who want to create their own tabulations.
Local Area Unemployment Statistics (LAUS)
This dataset is generated using a series of statistical models to provide the employment and unemployment data published in the CPS for states, metro areas, counties, cities and towns. Over 7,000 different areas are included.
American Community Survey (ACS)
A rolling sample survey of 3.5 million addresses, this dataset is published annually as 1-year and 5-year period estimates. This is the Census Bureau’s primary program for collecting detailed socio-economic characteristics of the population on an on-going basis and includes labor force status and occupation. Data is published for all large geographies and small ones including census tracts, ZCTAs, and PUMAs. Each estimate is published with a margin of error at a 90% confidence level. Labor force data from the ACS is best used when you’re OK with generally characterizing an area rather than getting a precise and timely measurement, or when you’re working with an array of ACS variables and want labor force data generated from the same source using the same methodology.

Wrap Up

In the book I’ll spend a good deal of time navigating the NAICS codes, explaining the impact of data suppression and how to cope with it, and covering the basics of using this data from an economic geography approach. I’ve written some exercises where we calculate location quotients for advanced industries and aggregate ZIP-Code based Business Patterns data to the ZCTA-level. This is still a draft, so we’ll have to wait and see what stays and goes.

In the meantime, if you’re looking for summaries of additional data sources in any and every field I highly recommend Julia Bauder’s excellent Reference Guide to Data Sources. Even though it was published back in 2014 I find that the descriptions and links are still spot on – it primarily covers public and free US federal and international government sources.

BLS Data Portal

Bureau of Labor Statistics Data Tools