
Washington DC street

Using the ACS to Calculate Daytime Population

I’m in the home stretch for getting the last chapter of the first draft of my census book completed. The next to last chapter of the book provides an overview of a number of derivatives that you can create from census data, and one of them is the daytime population.

There are countless examples of using census data for site selection analysis and for comparing and ranking places when locating new businesses, providing new public services, and generally measuring potential activity or population in a given area. People tend to forget that census data measures people where they live. If you're trying to measure service or business potential for residents, the census is a good source.

Counts of residents are less meaningful if you want to gauge how crowded or busy a place is during the day. The population of an area changes throughout the day as people leave their homes to go to work or school, go shopping, or participate in social activities. Given the sharp divisions in the US between residential, commercial, and industrial uses created by zoning, residential areas empty out during weekdays as people travel into the other two zones, and then fill up again at night when people return. Some places function as job centers, others serve as bedroom communities, and still others are a mixture of the two.

The Census Bureau provides recommendations for calculating daytime population using a few tables from the American Community Survey (ACS). These tables capture where workers live and work, which is the largest component of the daytime population.

Using these tables from the ACS:

Total resident population
  B01003: Total Population

Total workers living in area, and workers who lived and worked in the same area
  B08007: Sex of Workers by Place of Work–State and County Level (‘Total:’ line and ‘Worked in county of residence’ line)
  B08008: Sex of Workers by Place of Work–Place Level (‘Total:’ line and ‘Worked in place of residence’ line)
  B08009: Sex of Workers by Place of Work–Minor Civil Division Level (‘Total:’ line and ‘Worked in MCD of residence’ line)

Total workers working in area
  B08604: Total Workers for Workplace Geography

The Bureau proposes two different approaches that lead to the same outcome. The simpler approach: add the total resident population to the total number of workers who work in the area, and then subtract the total resident workforce (workers who live in the area, whether they work inside or outside it):

Daytime Population = Total Residents + Total Workers in Area - Total Resident Workers

For example, according to the 2017 ACS, Washington DC had an estimated 693,972 residents (from table B01003), 844,345 (+/- 11,107) people who worked in the city (table B08604), and 375,380 (+/- 6,102) workers who lived in the city. We add the total residents to the total workers in the city, then subtract the workers who live in the city. The subtraction avoids double counting resident workers who work in the city (they already appear in the total resident population) and removes resident workers who commute to jobs outside the city (they are counted among the residents but are not present during the day). The result:

693,972 + 844,345 - 375,380 = 1,162,937

And to get the new margin of error:

SQRT(0^2 + 11,107^2 + 6,102^2) = 12,673

So the daytime population of DC is approximately 468,965 people (68%) higher than its resident population. The district has a high number of jobs in the government, non-profit, and education sectors, but a limited amount of expensive real estate where people can live. In contrast, I did the calculation for Philadelphia and its daytime population is only 7% higher than its resident population. Philadelphia has a much higher proportion of resident workers relative to total workers. Geographically the city is larger than DC, has more affordable real estate, and faces stiffer suburban competition for private sector jobs.
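If you want to script the calculation, here is a minimal Python sketch of the same arithmetic using the DC figures above; it applies the standard ACS rule for combining margins of error (the square root of the sum of the squared MOEs), and the variable names are just illustrative.

import math

# Figures from the 2017 ACS for Washington DC, as cited above
residents = 693972          # B01003: total resident population
workers_in_area = 844345    # B08604: total workers for workplace geography
resident_workers = 375380   # B08007: total workers living in the area

moe_residents = 0           # the total population estimate is treated as having no sampling error
moe_workers_in_area = 11107
moe_resident_workers = 6102

# Daytime Population = Total Residents + Total Workers in Area - Total Resident Workers
daytime_pop = residents + workers_in_area - resident_workers

# MOE for a sum or difference: square root of the sum of the squared MOEs
moe_daytime = math.sqrt(moe_residents**2 + moe_workers_in_area**2 + moe_resident_workers**2)

print(f"Daytime population: {daytime_pop:,} (+/- {round(moe_daytime):,})")

Running this returns 1,162,937 with a margin of error of about 12,673, matching the figures above.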

The variables in the tables mentioned above are also cross-tabulated in other tables by age, sex, race, Hispanic origin, citizenship status, language, poverty, and tenure, so it's possible to estimate some characteristics of the daytime population. Margins of error will limit the usefulness of estimates for small population groups, and overall the 5-year period estimates are a better choice for all but the largest areas. Data for workers who lived and worked in the same area is reported for states, counties, places (incorporated cities and towns), and minor civil divisions (MCDs) for the states that have them.

Data for the total resident workforce is available for other, smaller geographies, but the live-and-work component is only reported relative to those larger areas, i.e. we know how many people in a census tract live and work in their county or place of residence, but not how many live and work in their tract of residence. In contrast, the count of workers by place of work in B08604 is not available for smaller geographies at all, which limits the application of this method to larger areas.

Download or explore these ACS tables from your favorite source: the American FactFinder, the Census Reporter, or the Missouri Census Data Center.

Net Out-Migration from the NY Metro Area to Other Metro Areas 2011-2015

Recent Migration Trends for New York City and Metro

The Baruch GIS lab crew just published a paper: New Yorkers on the Move: Recent Migration Trends for the City and Metro Area. The paper (no. 15 Feb 2018) is part of the Weissman Center for International Business Occasional Paper Series, which focuses on New York City’s role in the international and domestic economy.

Findings

We analyzed recent population trends (2010 to 2016) in New York City and the greater metropolitan area using the US Census Bureau's Population Estimates to study components of population change (births, deaths, domestic and international migration) and the IRS Statistics of Income division's county-to-county migration data to study domestic migration flows.

Here are the main findings:

  1. The population of New York City and the New York Metropolitan Area increased significantly between 2010 and 2016, but annual growth has slowed due to greater domestic out-migration.
  2. Compared to other large US cities and metro areas, New York’s population growth depends heavily on foreign immigration and natural increase (the difference between births and deaths) to offset losses from domestic out-migration.
  3. Between 2011 and 2015 the city had few relationships where it was a net receiver of migrants (receiving more migrants than it sent) from other large counties. The New York metro area had no net-receiver relationships with any major metropolitan area.
  4. The city was a net sender (sending more migrants than it received) to all of its surrounding suburban counties and to a number of large urban counties across the US. The metro area was a net sender to metropolitan areas throughout the country.

For the domestic migration portion of the analysis we were interested in seeing the net flows between places. For example, the NYC metro area sends migrants to and receives migrants from the Miami metro. What is the net balance between the two – who receives more versus who sends more?

The answer is: the NYC metro is a net sender to most of the major metropolitan areas in the country, and has no significant net receiver relationships with any other major metropolitan area. For example, for the period from 2011 to 2015 the NYC metro's largest net sender relationship was with the Miami metro. About 88,000 people left the NYC metro for metro Miami while 58,000 people moved in the opposite direction, resulting in a net gain of 30,000 people for Miami (in other words, a net loss of 30,000 people for NYC). The chart below shows the top twenty metros with which the NYC metro had a migration deficit (sending more migrants to these areas than it received). A map of net out-migration from the NYC metro to other metros appears at the top of this post. In contrast, NYC's largest net receiver relationship (where the NYC metro received more migrants than it sent) was with Ithaca, New York, which lost a mere 300 people to the NYC metro.

All of our summary data is available here.

domestic migration to NYMA 2011-2015: top 20 deficit metro areas

Process

For the IRS data we used the county-to-county migration SQLite database that Janine meticulously constructed over the course of the last year, which is freely available on the Baruch Geoportal. Anastasia employed her Python and Pandas wizardry to create Jupyter notebooks that we used for doing our analysis and generating our charts, all of which are available on GitHub. I used an alternate approach with Python and the SQLite and prettytable modules to generate estimates independently of Anastasia, so we could compare the two and verify our numbers (we were aggregating migration flows across years and geographies from several tables, and calculating net flows between places).
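To give a flavor of that aggregation step, here is a minimal sketch (not our actual code) of computing net flows between one county and all of its partners with Python's built-in sqlite3 module. The table and column names (flows, origin_fips, dest_fips, exemptions) are assumptions for illustration only; the real database has its own schema, and our analysis also aggregated counties up to metro areas across multiple years of filings.

import sqlite3

# Assumed layout: a table `flows` with origin and destination county FIPS codes
# and the number of exemptions (a rough proxy for migrants) for each flow.
con = sqlite3.connect('county_migration.sqlite')

query = """
WITH outflow AS (
    SELECT dest_fips AS partner, SUM(exemptions) AS people_out
    FROM flows WHERE origin_fips = :county GROUP BY dest_fips),
inflow AS (
    SELECT origin_fips AS partner, SUM(exemptions) AS people_in
    FROM flows WHERE dest_fips = :county GROUP BY origin_fips)
SELECT o.partner, o.people_out, i.people_in,
       i.people_in - o.people_out AS net
FROM outflow o JOIN inflow i ON o.partner = i.partner
ORDER BY net;
"""

# e.g. New York County (Manhattan); ordering by net puts the largest
# migration deficits (net-sender relationships) at the top
for partner, people_out, people_in, net in con.execute(query, {'county': '36061'}):
    print(partner, people_out, people_in, net)

con.close()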

One of our goals for this project was to use modern tools and avoid the clunky use of email. With Jupyter notebooks, Git and GitHub for storing and syncing our work, and ShareLaTeX for writing the paper, we avoided constantly emailing each other revised versions of scripts and drafts. Ultimately I had to use latex2rtf to convert the paper to a word processing format that the publisher could use. This post helped me figure out which bibliography packages to choose (for latex2rtf to interpret citations and references, you need to use the older natbib & bibtex combo and not biblatex & biber).

If you are doing similar research, Zillow has an excellent post that discusses the merits of the different datasets. There are also good case studies on Washington DC and Philadelphia that employ the same datasets.

Note Taking for Academic Research

I’ve been reviewing a lot of literature over the past year in preparation for writing my book, so note taking is at the forefront of my mind. Grad students occasionally ask me for suggestions on how to effectively take notes, so I’ll share some pointers here. I’ll begin with my quest to find the right note taking software, followed by my actual process for taking notes.

Finding the Right Tool

Ten years ago, I suddenly found myself back in a position where I needed to write academic papers, something I hadn’t done since I wrote my master’s thesis about eight years before that. At that time, I was still using the techniques I had learned in high school (a much longer time ago…). Back then, you were either an index card person or a binder person. The card people would write one note on each card, while binder people kept a ledger of notes and would add additional pages as needed. You’d classify your notes as summaries, paraphrases, or quotations.

I assumed that my high school methods must be outdated by now, so I cast around to see what note taking software was available. I knew I wanted to go open source, as I didn't want my notes tethered to a specific tool and stored in a proprietary format. There were a lot of options, and I quickly became bogged down and frustrated with trying them all. I felt that much of the software forced me to conform to it, and I was spending too much time fiddling around and figuring things out.

I abandoned the search and recorded my notes in a simple text (aka notepad) document. I had always been a binder person, so the single document approach appealed to me. I could copy and paste, use spell check, and search for keyword terms that I assigned (the Linux editors like gedit, leafpad, and xed are lightweight but more robust than MS Notepad). This worked fine for a stand-alone paper, and I still use this approach for small projects. But as my research became ongoing I needed to rely on these notes for many future projects. The single notepad document grew unwieldy, and browsing and searching became difficult.

A few years later, I made a second attempt at searching for note taking software, and this time I broadened the search to include more general-purpose options. My solution: use a wiki! With a wiki, every single source can have its own page, the sources can be grouped together under thematic categories, you can assign tags, and you can search across all the pages. I could also add links between pages and out to the web, and could link the notes to the source documents. The wiki was so open-ended that I didn't feel constrained in writing my notes to fit a particular interface, nor did I have to waste a lot of time sifting through buttons and tools.

I opted for a desktop wiki called Zim, which has been actively maintained since 2008. All of the pages in Zim are saved as individual text files in a basic wiki markup, which ensures that they can be accessed outside the program. Pages are stored in a notebook, which is essentially just a folder; if you create hierarchies of pages, these categories become folders and sub-folders. Zim has a ton of extra plugins for spell checking, concept mapping, formulas, calendars, and more. You can also export your entire notebook, or portions of it, as HTML or LaTeX files.

Most importantly, the wiki solved one of my most vexing problems. I found that a lot of the note taking software was geared towards just taking notes, and couldn't handle keeping track of citations. Citation software is its own genre, and I found that those packages were poor for taking notes. With Zim, I create a page dedicated to each source, and at the top of each page I embed some BibTeX code for storing the citation data. BibTeX is a format that's used for creating LaTeX bibliographies, but it has become a common standard and can be used by word processors too. I have a template page (see below) with several BibTeX document types that I just copy and paste when I have a new source to add. Since the pages are saved as plain text, I wrote a short Python script (it appears at the end of this post) that loops through my note pages, scrapes out the BibTeX records, and creates a BibTeX file that I can use in LaTeX. Within the BibTeX record I store a link to the source: either a PDF I have locally, a web page (if it's a site), or a WorldCat catalog record (if it's a book). So all my notes, citations, and the source material are kept together in one place!

BibTeX templates
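To give a concrete sense of what gets pasted in (this is just a generic illustration with made-up fields, not the exact template pictured above), an article entry looks something like this, with the link to the source added as an extra field or note:

@article{Jones2017,
  author  = {Jones, Jane},
  title   = {An Example Article Title},
  journal = {Journal of Example Studies},
  year    = {2017},
  volume  = {12},
  number  = {3},
  pages   = {45--67}
}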

Zim is desktop software that you have to download and install locally. Since the notebook consists of text files in folders, it's easy to back it up to Box or Dropbox or whatever you use. Zim doesn't save histories or have version control, but there's a plugin that lets you sync your files with Git and other systems.

Relying on Tried and True Methods

While the right tool is important, it's really the method that counts. I learned that I had to jettison the idea that the note taking process has to be 100% efficient. While you certainly don't want to flail around and waste time, note taking is not supposed to be quick and easy. The only way you can truly learn new material is to spend time with it: reading, re-reading, taking notes, and reading the notes. The process of note taking is just as important as the actual notes themselves, if not more so, as the process is what helps you to synthesize and learn the material. While I left the binder and note cards behind, my actual note taking process is similar to what I did in high school.

I always download articles and bookmark websites or catalog records as I'm doing my searches. Once I complete a series of searches and have gathered material from the web and library databases, I sift through the files and rename them using the first author's last name and the year of publication (e.g. Jones2017). I also use this file name as the BibTeX key that uniquely identifies the article. I create a documents folder with sub-folders for articles, books, and reports, and I keep these folders in the same location as the Zim notebook. There's no reason to create lots of topical or thematic folders, as you can use the wiki to categorize and tag the notes, and the wiki becomes the vehicle for searching or browsing through the documents.

As I sort through the sources I identify what’s essential and what’s ancillary. High priority sources will be read thoroughly and covered in detail, while the low-priority stuff will be skimmed and summarized. High priority sources are critical to your research and include touchstone articles in your field, excellent case studies, relevant background material, and any past research that remotely resembles what you’re working on. Low priority sources may have one important fact or concept that you need to remember; these materials are more tangential to your work and ultimately you might cite them in passing, or even not at all.

I always print out the high-priority articles. I'll read each one first, and then go back and do a second read, marking passages with a highlighter. Next, I'll create and type notes directly into the wiki. I might read and mark up a couple of articles before I start note taking, but I don't wait too long, as I want the articles fresh in my memory. For essential books, I'll read a chapter or two at a time and mark passages with little sticky flags. Then I'll go back and take notes in a paper notebook, and will keep doing that until I finish the book. Then I transcribe all the notes for the book onto my laptop. This takes longer, but once again it's not all about efficiency. I get to spend more time with the material and it helps me absorb it. This approach also separates the computer from the reading, which cuts down on distractions and provides more flexibility in terms of where I can work. Reading in a comfortable chair or outside is preferable to reading while sitting at a table with a laptop.

Note flags in book

I never print out or mark up low-priority articles; I skim through the digital copies and write a summary directly in the wiki. For books, I read the book in one go and may use sticky flags here and there, and when I’m done I type the notes directly into the wiki.

For the notes themselves, each note page has the title and author prominently at the top followed by a summary of the source, and then the BibTeX citation (see below). Low-priority materials usually get nothing but a summary and a citation. High-priority materials get detailed notes. Each note is written as a bullet point, and can represent one important fact or insight, or can be a summary of a paragraph or several pages, or even a summary of a chapter. It depends on how important the material is relative to my work.

Zim Wiki

Taking notes is not like writing a book report. I'm not writing an even or objective summary of the material in its entirety. Instead, I'm picking out the pieces that are of interest to me and to the work I'll be doing, and I skip the rest. Sometimes I'll editorialize (this is great, or this stinks), but I write in such a way that my thoughts are distinct from what the author is saying. This is where efficiency comes into the picture: identify which sources are high versus low priority, summarize each source, and record just the specific details that are relevant to you. You're writing these notes with specific research goals in mind, so don't waste time writing a generic book report.

I always summarize or paraphrase the material as I take notes, putting concepts in my own words. Doing this forces you to wrestle with the concepts and internalize them, which improves your understanding of the material and your memory for it. It also helps guard against plagiarism; once you start writing the paper, you’ll know your notes are already in your own words and you can use them freely. If I do quote something directly, I always surround it with quotation marks. Lastly, at the end of my note I provide the page numbers to indicate what’s been summarized, so I can go back if need be.

Note taking is an idiosyncratic process. What works for you may not work for someone else and vice versa. The key is to figure out what works best for you; create a system, try it out, and once you’re happy go with it. You can always tweak things as you move along. The notes will help you when it comes time to pull your ideas together into a cohesive paper, but it’s the reading and note taking process that helps you to become proficient with the subject matter.

As I was re-learning how to take notes, I found the handouts from the University of Melbourne’s Academic Skills Unit to be particularly valuable. This is their latest version of Taking Notes From Texts, and this is the older version that I stumbled on years ago.

(Python code for scraping BibTeX records out of wiki notes to create a bibliography is posted below).

#Parse notes stored in zim wiki to extract all bibtex records and write them
#to a new bibtex file named with today's date.

#Script must be stored directly above the notes folder where the wiki data
#is stored. It will ignore the empty bibtex template files and will only
#read wiki files stored as .txt.

#Within the wiki, each bibtex record in the notes is enclosed in a pair of
#bibtex tags. For each note the script reads and ignores lines until it finds
#the open tag, then writes every line until it reads the close tag. A line
#return is appended so records are separated in the output file.

#A list and count of extracted records is provided as a diagnostic.

import os, datetime

#Tags that mark the start and end of a bibtex record in a note page;
#adjust these to match the markers used in your own wiki
OPEN_TAG='<bibtex>'
CLOSE_TAG='</bibtex>'

now=datetime.date.today()
path='.'
outfile='sources_'+str(now)+'.bib'

writefile=open(outfile,'w')

counter=0
titles=[]

for (subdir,dirs,files) in os.walk(path):
    #Skip the template pages and the folder of source documents
    if 'Templates' in dirs:
        dirs.remove('Templates')
    if 'documents' in dirs:
        dirs.remove('documents')
    for file in files:
        if file[-4:]=='.txt':
            readfile=open(os.path.join(subdir,file),'r')
            #Skip lines until the opening tag is found
            for line in readfile:
                if line.startswith(OPEN_TAG):
                    break
            #Copy lines until the closing tag is found
            for line in readfile:
                if line.startswith(CLOSE_TAG):
                    titles.append(file)
                    writefile.write('\n')
                    counter=counter+1
                    break
                else:
                    writefile.write(line)
            readfile.close()
writefile.close()

titles=[t.replace('_',' ') for t in titles]
titles=[t[:-4] for t in titles] #drop the .txt extension
titles.sort()
print('Bibliographic records have been extracted for the following sources:','\n')
for title in titles:
    print('*',title)
print('\n')
print(counter,'bibliographic records have been parsed and written to',outfile)

Average Distance to Public Libraries in the US

A few months ago I had a new article published in LISR, but given the absurd restrictions of academic journal publishing I'm not allowed to publicly post the article, and have to wait 12 months before sharing my post-print copy. It is available via your local library if they have a subscription to the Science Direct database (you can also email me to request a copy).

Citation and Abstract

Regional variations in average distance to public libraries in the United States
F. Donnelly
Library & Information Science Research
Volume 37, Issue 4, October 2015, Pages 280–289
http://dx.doi.org/10.1016/j.lisr.2015.11.008

Abstract

“There are substantive regional variations in public library accessibility in the United States, which is a concern considering the civic and educational roles that libraries play in communities. Average population-weighted distances and the total population living within one mile segments of the nearest public library were calculated at a regional level for metropolitan and non-metropolitan areas, and at a state level. The findings demonstrate significant regional variations in accessibility that have been persistent over time and cannot be explained by simple population distribution measures alone. Distances to the nearest public library are higher in the South compared to other regions, with statistically significant clusters of states with lower accessibility than average. The national average population-weighted distance to the nearest public library is 2.1 miles. While this supports the use of a two-mile buffer employed in many LIS studies to measure library service areas, the degree of variation that exists between regions and states suggests that local measures should be applied to local areas.”

Purpose

I’m not going to repeat all the findings, but will provide some context.

As a follow-up to my earlier work, I was interested in trying an alternate approach for measuring public library spatial equity. I previously used the standard container approach – drawing a buffer at some fixed distance around a library and counting whether people are in or out – and as an approximation for individuals I used population centroids for census tracts. In my second approach, I used straight-line distance measurements from census block groups (smaller than tracts) to the nearest public library so I could compare average distances for regions and states; I also summed the population that lived within one-mile rings of the nearest library and calculated the percentage of people in each ring. I weighted the distances by population to account for the fact that census areas vary in population size (tracts and block groups are designed to fall within an ideal population range – for block groups it's between 600 and 3,000 people).
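As a rough illustration of the weighting (a simplified sketch with made-up coordinates, not the code used for the study, which worked from the full set of block group centroids and library locations in Spatialite), a population-weighted mean distance to the nearest library can be computed like this:

import math

# Illustrative data only: (centroid x, centroid y, population) for a few block groups
# and (x, y) for a few libraries, all in a projected coordinate system measured in miles
block_groups = [(10.0, 4.0, 1200), (12.5, 6.0, 800), (15.0, 2.0, 2400)]
libraries = [(11.0, 5.0), (16.0, 3.0)]

def nearest_distance(x, y, points):
    """Straight-line (Euclidean) distance to the closest point in the list."""
    return min(math.hypot(x - px, y - py) for px, py in points)

total_pop = sum(pop for _, _, pop in block_groups)
weighted = sum(pop * nearest_distance(x, y, libraries) for x, y, pop in block_groups)

print('Population-weighted mean distance:', round(weighted / total_pop, 2))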

Despite the difference in approach, the outcome was similar. Using the earlier approach (census tract centroids that fell within a library buffer that varied from 1 to 3 miles based on urban or rural setting), two-thirds of Americans fell within a “library service area”, which means that they lived within a reasonable distance to a library based on standard practices in LIS research. Using the latest approach (using block group centroids and measuring the distance to the nearest library) two-thirds of Americans lived within two miles of a public library – the average population weighted distance was 2.1 miles. Both studies illustrate that there is a great deal of variation by geographic region – people in the South consistently lived further away from public libraries compared to the national average, while people in the Northeast lived closer. Spatial Autocorrelation (LISA) revealed a cluster of states in the South with high distances and a cluster in the Northeast with low distances.

The idea in doing this research was not to model actual travel behavior to measure accessibility. People in rural areas may be accustomed to traveling greater distances, public transportation can be a factor, people may not visit the library that's closest to their home for several reasons, measuring distance along a street network is more precise than Euclidean distance, etc. The point is that libraries are a public good that provide tangible benefits to communities. People who live in close proximity to a public library are more likely to reap the benefits it provides relative to those living further away, and communities that have libraries will benefit more than communities that don't. The distance measurements serve as a basic metric for evaluating spatial equity. So, if someone lives more than six miles away from a library, that does not mean they don't have access; it does mean they are less likely to utilize it or realize its benefits compared to someone who lives a mile or two away.

Data

I used the 2010 Census at the block group level, and obtained the locations of public libraries from the 2010 IMLS. I improved the latter by geocoding libraries that did not have address-level coordinates, so that I had street matches for 95% of the 16,720 libraries in the dataset. The tables that I'm providing here were not published in the original article, but were tacked on as supplementary material in appendices. I wanted to share them so others could incorporate them into local studies. In most LIS research the prevailing approach for measuring library service areas is to use a buffer of 1 to 2 miles for all locations. Given the variation between states, you could consider using the state averages provided here for library planning in your own state.

To provide some context, the image below shows public libraries (red stars) in relation to census block group centroids (white circles) for northern Delaware (primarily suburban) and surrounding areas (a mix of suburban and rural). The line drawn between the Swedesboro and Woodstown libraries in New Jersey is five miles long. I used QGIS and Spatialite for most of the work, along with Python for processing the data and GeoDa for the spatial autocorrelation.

Map Example - Northern Delaware

The three tables I’m posting here are for states: the first counts the 2010 Census population within one- to six-mile rings of the nearest public library, the second gives the percentage of the total state population that falls within each ring, and the third is a summary table with the mean and population-weighted distance to the nearest library by state. One set of tables is formatted text (for printing or just looking up numbers), while the other set consists of CSV files that you can use in a spreadsheet. I’ve included a metadata record with some methodological info, but you can read the full details in the article.

In the article itself I tabulated and examined data at a broader, regional level (Northeast, Midwest, South, and West), and also broke it down into metropolitan and non-metropolitan areas for each region. Naturally, people who live in non-metropolitan areas lived further away, but the same regional patterns held: more people in the South, in both metro and non-metro areas, lived further away compared to their counterparts in other parts of the country. This weekend I stumbled across this article in the Washington Post about troubles in the Deep South, and was struck by how its maps mirrored the low library accessibility maps in my past two articles.