open source

Census Book

Exploring the US Census Book Published!

My book, Exploring the US Census: Your Guide to America’s Data, has been published! You can purchase it directly from SAGE Publishing or from Barnes and Nobles, Amazon, or your bookstore of choice (it’s currently listed for pre-order on Amazon but its availability there is imminent). It’s $45 for the paperback, $36 for the ebook. Data for the exercises and supplemental material is available on the publisher’s website, and I’ve created a landing page for the book on this site.

Exploring the US Census is the definitive researcher’s guide to working with census data. I place the census within the context of: US society, the open data movement, and the big data universe, provide a crash course on using the new data.census.gov, and introduce the fundamental concepts of census geography and subject categories (aka universes). One chapter is devoted to each of the primary datasets: decennial census (with details about the 2020 census that’s just over the horizon), American Community Survey, Population Estimates Program, and business data from the Business Patterns, Economic Census, and BLS. Subsequent chapters demonstrate how to: integrate census data into writing and research, map census data in GIS, create derivative measures, and work with historic data and microdata with a focus on the Current Population Survey.

I wrote the book as a hybrid between a techie guidebook and an academic text. I provide hands-on exercises so that you learn by doing (techie) while supplying sufficient context so you can understand and evaluate why you’re doing it (academic). I demonstrate how to find and download data from several different sources, and how to work with the data using free and open source software: spreadsheets (LibreOffice Calc), SQL databases (DB Browser for SQLite), and GIS (QGIS). I point out the major caveats and pitfalls of working with the census, along with many helpful tools and resources.

The US census data ecosystem provides us with excellent statistics for describing, studying, and understanding our communities and our nation. It is a free and public domain resource that’s a vital piece of the country’s social, political, and economic infrastructure and a foundational element of American democracy. This book is your indispensable road map for navigating the census. Have a good trip!

See the series – census book tag for posts about the content of the book, additional material that expands on that content (but didn’t make it between the covers), and the writing process.

Updated QGIS Tutorial for 3.4

I recently released an updated version of the manual and data I use for my day-long GIS Practicum, Introduction to GIS Using Open Source Software (Using QGIS). The manual has five chapters: a summary overview of GIS, basics of using the QGIS interface, GIS analysis that includes several geoprocessing and analysis functions, thematic mapping and map layout, and a summary of where to find data and resources for learning more. Chapters 2, 3, and 4 are broken down into sections with clear steps, followed by commentary that explains what we did and why. We cover much of the material in a single day, although you can space the lessons out into two days if desired.

I updated this version to move us from QGIS 2.18 Las Palmas to 3.4 Madeira, which are the former and current long term service releases. While the move from 2.x to 3.x involved a major rewrite of the code base (see the change log for details), most of the basics remain the same. While veteran users can easily navigate through the differences, it can be a stumbling block for new users if they are trying to learn a new version using an old tutorial with screens and tools that are slightly different. So it was time for an update!

My goal for this edition was to keep my examples in place but revise the steps based on changes in the interface. Most of the screenshots are new, and the substantive changes include: using the Data Manager for adding layers rather than the toolbar with tons of buttons, better support for xlsx and ods files which allowed me to de-emphasize xls and dbf files for attribute table joins, the addition of geopackages to the vector data mix, the loss of the Open Layers plugin and my revision to the web mapping section using OSM XYZ tiles, the disappearance of the setting that allowed you to disable on the fly projection, and the discontinuation of the stand-alone Data Browser. There were also changes to some tools (fixed distance and variable buffer tools are now united under one tool) and names of menus (Style menu has once again become the Symbology menu).

It’s hard to believe that this is my ninth edition of this tutorial. I try to update it once a year to keep in sync with the latest long term release, but fell a bit behind this year. QGIS 2.18 also survived for a bit longer than other releases, as the earlier 3.x versions went through lots of testing before ending up at 3.4. When it comes time for my tenth edition I may change the thematic mapping example in chapter 4 to something that’s global instead of US national, and in doing streamline the content. We’ll see if I have some time this summer.

Since I’m in update mode, I also fixed several links on the Resources page to cure creeping link rot.

SQLite Logo

Looking for a Good SQLite GUI?

Goodbye SQLite Manager…

Late last year, I discovered that my favorite SQLite GUI was defunct. The SQLite Manager was a plugin for Firefox that allowed you to create and interact with SQLite databases with a simple yet highly functional interface. It had good support for importing and exporting csv files, color coding of cells based on data types, and a convenient feature for cycling back and forth between your SQL statements. Since it was a Firefox plugin it was guaranteed to work on any operating system, and since Firefox is installed on machines across my campus I knew I could rely on it for creating data extracts for students and faculty – I’d package data up in SQLite and send it to them along with a link to the plugin.

Firefox goes through about a million versions a year these days, and after a major upgrade last fall (to Firefox Quantum) most of the existing plugins, including the SQLite Manager, were no longer compatible. An upgrade it highly unlikely, as a few things changed under the hood of Firefox that makes the plugin unusable. While it still works on the Firefox Extended Support Release, in the long run the writing is on the wall.

Hello DB Browser for SQLite!

After searching through many alternatives I discovered the DB Browser for SQLite. It runs on Windows, Mac, and Linux and there’s a version for mobile. It was easy to install and has a clean interface. It provides a number of convenient tools and menus that you can use in place of writing SQL DDL, and in some cases it expands the functionality of SQLite by enabling a number of ALTER TABLE commands that are not part of SQLite SQL (like renaming and dropping columns). The Browse Data window makes it easy to quickly thumb through, sort, and filter records and to edit individual values by hand. The Execute SQL window has auto-complete and color-coded syntax, and you can see the database schema in one tab as you write your SQL in another (making it easy to reference table and column names). You can import and export data as CSV (or any delimited text file) or SQL files, and you can save the results of SELECT queries as CSV.

One interesting addition is that there’s actually a Save (Write Changes) and Undo button. So when you create, modify, or drop records, columns, or tables you see the result, but the act isn’t final until you commit the changes. A nice safety feature, especially for db novices.

DB Browser for SQLite - Browse

Browse Data

DB Browser for SQLite - SQL

Execute SQL and View DB Schema

I encountered a few quirks, but nothing insurmountable. I was using the nightly build version without realizing it, and when importing a CSV file the database takes a best guess as to what the data types for the columns should be. Even though the import screen gives you the option to specify that values are quoted, my quoted numeric fields were still saved as numbers and not text. As a result, ID codes like FIPS or ZIP Codes lose their leading zeros and are saved as integers.

The project is managed on github, so I went ahead and posted an issue. The developers were super responsive, and a discussion ensued over whether this behavior was desirable or not. We found two work-arounds. First, if you build an empty table with the desired structure, and then go to import the CSV, if you provide the name of that empty table as the new table name the db will import your data into that table. Alternatively, if I went and downloaded the latest stable release (3.10.1) the default behavior is that all columns are imported as text, which is a safer bet. You can use the GUI to change the types after import. The issue was marked as a bug, and will be addressed in a future release – one possible solution is to provide an option to turn the autodetect feature on (to determine what the types should be) or off (to import everything as text).

The browser also has a feature to attach a database to the current database, but when you do the attachment it appears like nothing happened – you can’t seeĀ  or browse the objects in database number two. But it IS attached (you can see every statement that’s been executed in a helpful log window) and you can copy a table from one db to the other like this:

CREATE TABLE sometable AS
SELECT *
FROM database2.sometable;

You run this within the current database, and database2 is the attached database (when you attach a db you provide an alias for referencing it).

These are minor quibbles. The DB Browser for SQLite is cross-platform, stable, has a clean interface with nice features, and is actively developed by a responsive and friendly team. I’ll be using it for all my SQLite tasks and projects, and will recommend it to others.

Spatialite?

An alternative I considered was to simply use the Spatialite GUI for both regular and spatial databases. It also has a simple, solid, and functional interface and supports spatial SQL, giving you the best of both worlds. So why not? While it works great for my own purposes it’s not something I can recommend to new users who are not GIS folks, either in my work or in the census data book I’m writing. Just figuring out where to download it from the website is overly complex, and while there are binaries for MS Windows there are none for Mac users. You’d have to install it from the source files, which is over the top for novices. Linux users may get lucky and find it in their software repos (it’s included for Debian and Ubuntu). The database browser in QGIS has matured in recent years, so that’s another option for GIS users who want to work with Spatialite or PostGIS.

Now if we only had a good GUI for PostgreSQL… I tried pgAdmin 4 about a year ago, and it was so bad that I’m still clinging to pgAdmin III as long as it still lives. But this is a different story, and one I’ll return to and investigate fully when it comes time to teach my spatial database course next year.

foss4g boston 2017 logo

FOSS4G 2017 Round Up

A month ago at this time I was in Boston for FOSS4G 2017 (Free and Open Source for Geospatial), which is the international conference for free and open source GIS enthusiasts, developers, educators, and practitioners. I updated my introductory GIS / QGIS workshop manual to 2.18 Las Palmas (which is slated to be the next long term service release once 3.0 comes out) and Anastasia, Janine, and I took the workshop on the road. We had a good turnout and an excellent class, and then were able to enjoy the three days of sessions. Here are some of the high-lights from sessions I attended.

  • The NYC Department of City Planning has hired their own, internal open source developer and is assembling a team called NYC Planing Labs. Their first project was to revamp the city facilities database and build the NYC Facilities Explorer, a web mapping interface that sits on top of the database and makes it easy for folks to browse and visualize.
  • There was an interesting talk from an independent research unit that’s affiliated with the University of Chicago. The speaker outlined their process for switching their team from ESRI to open source. The talk gave me appreciation for the amount of work that’s involved for transitioning a team of service providers from one set of tools to another. This group did things the right way, doing necessary background research and identifying short, medium, and long term plans for making the switch. Their biggest revelation was that they ended up shifting funds from purchasing licenses to staff, which has allowed them to expand their activities.
  • PostGIS 2.4 will add a number of functions that were only available for the geometry type, like ST_Centroid, to the geography type. PostgreSQL 10 is going to allow users with big databases to take greater advantage of parallel computing.
  • Some archaeologists are looking to adapt the schema used by Open Street Map to create an OpenHistoryMap. There aren’t many global standards for cataloging archaeological data; it’s primarily site and project specific. Unique challenges include the importance of scale (need to see that pot shard in a room, in a house, in the overall site, in the greater region…) and varying degrees of reliability. Once an artifact is found there isn’t absolute certainty regarding it’s age, provenance, or use. This project (Open History Map) is different from Open Historical Map; the latter relies on data that’s already in OSM and describes the past as a function of the present.
  • I tend to use GRASS GIS in limited circumstances, but am always pleasantly surprised when I follow up to see what’s new. The big selling points are stability, backwards compatibility, and the ability to do a lot via the command line. For version 7.2 there are several improvements: an improved GUI, a data catalog, a more sophisticated Python editor, easier vector legend tools, an advanced search feature for finding modules, and temporal algebra. I’m not a heavy raster user and never work in 3D, but these have always been and continue to be major strengths of GRASS. They also have a growing repository of 3rd party plugins and modules.
  • One of the plenary speakers gave us a demo of R markdown for creating websites, documents, and even for writing books. It gives you the ability to easily import data and write R code to produce a chart, graph, or map right in the same document as your narrative text. So instead of doing a basic analysis, creating a chart, and writing up your project in three different places you can do it all in one place and compile it to HTML or PDF. With the source, readers also have the benefit of seeing what you did and they can test the results.
  • R has really come a long way for geospatial analysis and visualization. I can’t remember it even being mentioned at the last FOSS4G I attended in 2011, but in 2017 it was a major component of the conference. In trying to figure how it fits in to the landscape, I assumed that it was a matter of background and preference. People who have a stats background and want to do geospatial work are going to gravitate towards it, while people with more of a programming or data processing background may be more disposed to using Python or Javascript. The plenary speaker framed R as an exploratory language that’s great for iterative work – let me quickly graph this data to see what it looks like, then I’ll write another piece to view it a different way. Other scripting languages tend to tackle more tasks in one large batch in a linear fashion.
  • The contingent of academic GIS and geospatial librarians and developers has been growing at this conference over the years. There was an opportunity for the Open Geoportal and Geoblacklight communities to get together and exchange notes. Both groups have a commitment to open metadata standards and resource sharing.
  • OSGeo folks have been active in promoting free and open source GIS education, and have created a directory of GIS labs around the world.
  • Just when you thought you were confused about which tool to use, here’s another one – Vega, a declarative JSON grammar for creating graphs and charts.
  • QGIS 3.0 is on the horizon; it will encompass a shift from Qt 4 to 5 and Python 2 to 3. There are a number of great new features: a task manager, a data source manager to replace the dozen individual buttons in 2.x, better support for metadata viewing and editing, more 3D tools, multiple map canvases, the ability to store different user profiles (to save your plugins and layouts on shared machines), better digitizing tools, and a whole lot more. A lot of plugins will disappear as they make their way into the processing toolbox, the stand-alone QGIS Browser will be dropped as its functions are integrated into QGIS Desktop, and map projects created in 3.x will not be backward compatible to 2.x. The time line says that 3.0 will be launched in late Nov 2017, and at that point 2.18 will become the LTS release. It will take another year, til Nov 2018, when 3.2 becomes the LTS. If you’re like me and favor stability over new features, you can stick with 2.x for the another year.
  • A good talk on community health mapping introduced a stack that you can use for data gathering (Fulcrum), analysis (QGIS) and publishing on the web (Carto). I’m well versed in the last two, but didn’t know about Fulcrum. Essentially it’s an app that you can use on phones and tablets to gather data out in the field, including GPS coordinates. On his blog he’s created a series of lab exercises that cover the entire stack, so the communities can learn the process and take ownership of it and the tools.

Planning for FOSS4G 2018 is well underway. The conference uses a three year rotation where it goes from Europe to North America to another continent. Next year it’s Africa’s turn as the conference heads to Dar es Salaam in Tanzania.