After downloading US census data, you often need to reformat it before using it. It’s quite common that you download files where the population is broken down by gender and age, and you need to aggregate the data to get a total or divide a particular characteristic to get a percent total. This is pretty straightforward if you’re working with decennial census data, but data from the American Community Survey (ACS) is a little trickier to deal with since you’re working with estimates that have a margin of error. When creating new data, you also have to calculate what the margin of error is for your derived numbers. I’ll walk through some examples of how you would do this in a spreadsheet (the formulas below will work in either Excel or Calc).
Creating an Aggregate
We’ll use the following data in our example:
We have the total population of people three years and older who are enrolled in school, and a breakdown of this population enrolled in grades 1 through 4 and grades 5 through 8 in a few counties in New York, with margins of error for each data point. Our data is from the 3 year averaged 2005-2007 American Community Survey.
Let’s say we want to create a total for students who are enrolled in grades 1 through 8 for each county. We create a new column and sum the estimates for each county with the formula e3+g3, or sum(e3:g3).
To calculate a margin or error (MOE) for our grade 1 to 8 data, first we have to use the find and replace command to get rid of the “+/-” signs in the MOE column, so our spreadsheet will treat our values as numbers and not text (this is an issue if you downloaded the data as an Excel file – if you download a txt file the +/- is not included). Depending on the dataset you’re working with, you may also need to replace dashes, which represent data that was null or not estimated.
Once the data is cleaned up, we can insert a new column with this formula:
This calculates our new margin of error by squaring the moes for each of our data points, summing the results together, and taking the square root of that sum. In other words,
Once that’s done, you may want to round the new MOE to a whole number.
Creating a Percent Total
Let’s calculate the percentage of the population 3 years and older enrolled in school that are in grades 1 through 8. Based on what we have thus far (I hid the columns E,F,G, and H for grades 1-4 and 5-8 in this screenshot, as we don’t need them):
We insert a new column where we divide our subgroup by the total, as you would expect – I3/C3. In the next column we insert the following formula to create a MOE for our new percent total:
This one’s a little weightier than our last formula. We’re taking the square of our percent total (K3) and the square of the MOE of the total population (D3), multiplying them together, then subtracting that number from the square of the MOE of our subgroup (J3). Then we take the square root of the whole thing, then divide it by our total population (C3). If you’re saying – HUH? Maybe this is clearer:
Finally, we have something like this:
Based on our data, we can say things like “There were approximately 30,556 students enrolled in 1st through 8th grade per year in Dutchess County, NY between 2005 and 2007, plus or minus 1,184 students. An estimated 37% of the population enrolled in school in the county was in the 1st through 8th grade, plus or minus 1%.” The ACS estimates have a 90% confidence interval.
In this example we worked with aggregating and calculating percentages based on characteristics. We could also use these same formulas to aggregate data by geography, if we wanted to add the characteristics for all the counties together.
For the full documentation on working with ACS data, take a look at the appendix in the Census’ ACS Compass Guide, What General Data Users Need to Know. It provides you with the formulas in their proper statistical notation (for those of you more mathematically inclined than I) and includes formulas for calculating other kinds of numbers, such as ratios and percent change. It does provide you with worked-through examples, but not with spreadsheet formulas. I used their examples when I created formulas the first time around, so I could compare my formula results to their examples to insure that I was getting it right. I’d strongly recommend doing that before you start plugging away with your own data – one misplaced parentheses and you could end up with a different (and incorrect) result.