In addition to providing the NYC Geodatabase as a resource, I also wanted to use it to generate reports and build applications. None of the open source SQLite GUIs that I’m familiar with have built in report generating capabilities, so I thought I could use Python to connect to the database and generate them. I have some grand ambitions here, but decided to start out small.
Python has a built-in module, sqlite3, that you can use to work with SQLite databases. This is pretty well documented – do a search and you’ll find a ton of brief tutorials. Take a look at this great post for a comprehensive intro.
For generating reports I gave prettytable a shot: it lets you create nice looking ASCII text tables that you can copy and paste from the prompt or export out to a file. The tutorial for the module was pretty clear and covers the basics quite nicely. In the examples he directly embeds the data in the script and generates the table from it, which makes the tutorial readily understandable. For my purposes I wanted to pull data out of a SQLite database and into a formatted table, so that’s what I’ll demonstrate here.
Initially I had some trouble getting the module to load, primarily (I think) because I’m using Python 3.x and the setup file for the module was written for Python 2.x; the utility you use for importing 3rd party modules has changed between versions. I’m certainly no Python expert, so instead of figuring it out I just downloaded the module, dumped it into the site-packages folder (as suggested in the prettytable installation instructions under “The Harder Way” – but it wasn’t hard at all) and unzipped it. In my script I couldn’t get the simple “import prettytable” to work without throwing an error, but when I added the name of the specific function “import PrettyTable from prettytable” it worked. Your mileage may vary.
So here was my first go at it. I created a test database and loaded a table of population estimates from the US Census Bureau into it (you can download it if you want to experiment):
from prettytable import PrettyTable import sqlite3 conn = sqlite3.connect('pop_test.sqlite') curs = conn.cursor() curs.execute('SELECT State, Name, ESTIMATESBASE2010 AS Est2010 FROM pop_est WHERE region="1" ORDER BY Name') col_names = [cn for cn in curs.description] rows = curs.fetchall() x = PrettyTable(col_names) x.align[col_names] = "l" x.align[col_names] = "r" x.padding_width = 1 for row in rows: x.add_row(row) print (x) tabstring = x.get_string() output=open("export.txt","w") output.write("Population Data"+"n") output.write(tabstring) output.close() conn.close()
The first piece is the standard SQLite piece – connect, activate a cursor, and execute a SQL statement. Here I’m grabbing three columns from the table for records that represent Northeastern states (Region 1). I read in the names of the columns from the first row into the col_names list, and I grab everything else and dump them into rows, a list that contains a tuple for each record:
>>> col_names ['State', 'Name', 'Est2010'] >>> rows [('09', 'Connecticut', 3574097), ('23', 'Maine', 1328361), ('25', 'Massachusetts', 6547629), ('33', 'New Hampshire', 1316469), ('34', 'New Jersey', 8791898), ('36', 'New York', 19378104), ('42', 'Pennsylvania', 12702379), ('44', 'Rhode Island', 1052567), ('50', 'Vermont', 625741)] >>>
The second piece will make sense after you have a quick look at the prettytable tutorial. Here I grab the list of columns names and specify how cells for the columns should be aligned (default is center) and padded (default is one space). Then I add each row from the nested list of tuples to the table, row by row. There are two outputs: print directly to the screen, and dump the whole table into a string. That string can then be dumped into a text file, along with a title. Here’s the screen output:
+-------+---------------+----------+ | State | Name | Est2010 | +-------+---------------+----------+ | 09 | Connecticut | 3574097 | | 23 | Maine | 1328361 | | 25 | Massachusetts | 6547629 | | 33 | New Hampshire | 1316469 | | 34 | New Jersey | 8791898 | | 36 | New York | 19378104 | | 42 | Pennsylvania | 12702379 | | 44 | Rhode Island | 1052567 | | 50 | Vermont | 625741 | +-------+---------------+----------+
The one hangup I had was the formatting for the numbers: I really want some commas in there since the values are so large. I couldn’t figure out how to do this using the approach above – I’m writing all the rows in one swoop, and couldn’t step in and and format the last value for each row.
Unless – instead of constructing the table by rows, I construct it by columns. Here’s my second go at it:
from prettytable import PrettyTable import sqlite3 conn = sqlite3.connect('pop_test.sqlite') curs = conn.cursor() curs.execute('SELECT State, Name, ESTIMATESBASE2010 AS Est2010 FROM pop_est WHERE region="1" ORDER BY Name') col_names = [cn for cn in curs.description] rows = curs.fetchall() y=PrettyTable() y.padding_width = 1 y.add_column(col_names,[row for row in rows]) y.add_column(col_names,[row for row in rows]) y.add_column(col_names,[format(row,',d') for row in rows]) y.align[col_names]="l" y.align[col_names]="r" print(y) tabstring = y.get_string() output=open("export.txt","w") output.write("Population Data"+"n") output.write(tabstring) output.close() conn.close()
To add by column, you don’t provide any arguments to the PrettyTable function. You just add the columns one by one: here I call the appropriate values using the index, first for the column name and then for all of the values from the rows that are in the same position. For the last value (the population estimate) I use format to display the value like a decimal number (this works in Python 3.1+ – for earlier versions there’s a similar command – see this post for details). I tried this in my first example but I couldn’t get the format to stick, or got an error. Since I’m specifically calling these row values and then writing them I was able to get it to work in this second example. In this version the alignment specifications have to come last. Here’s the result:
+-------+---------------+------------+ | State | Name | Est2010 | +-------+---------------+------------+ | 09 | Connecticut | 3,574,097 | | 23 | Maine | 1,328,361 | | 25 | Massachusetts | 6,547,629 | | 33 | New Hampshire | 1,316,469 | | 34 | New Jersey | 8,791,898 | | 36 | New York | 19,378,104 | | 42 | Pennsylvania | 12,702,379 | | 44 | Rhode Island | 1,052,567 | | 50 | Vermont | 625,741 | +-------+---------------+------------+
prettytable gives you a few other options, like the ability to sort records by a certain column or to return only the first “n” records from a table. In this example, since we’re pulling the data from a database we could (and did) specify sorting and other constraints in the SQL statement instead. prettytable also gives you the option of exporting the table as HTML, which can certainly come in handy.