Inserting Data into Templates with Python and Jinja

In this post, I’m picking up where I left off and will cover the different methods I experimented with to get data out of a SQLite database and into a Jinja LaTeX template using Python. I’m using the NYC Geodatabase as my test case.

Standard Elements – Used Each Time

First – the Python script. For each iteration, the top half of the script remains the same. I import the necessary modules, and I set up my Jinja2 environment. This tells Jinja how to handle LaTeX syntax. I borrowed this code directly from the invaluable slides posted here. The only part that gets modified each time is the .get_template() bit, which is the actual LaTeX template with Jinja mark-up that is used for creating the reports.

import sqlite3

import jinja2
import os
from jinja2 import Template

latex_jinja_env = jinja2.Environment(
    block_start_string = 'BLOCK{',
    block_end_string = '}',
    variable_start_string = 'VAR{',
    variable_end_string = '}',
    comment_start_string = '#{',
    comment_end_string = '}',
    line_statement_prefix = '%-',
    line_comment_prefix = '%#',
    trim_blocks = True,
    autoescape = False,
    loader = jinja2.FileSystemLoader(os.path.abspath('.'))
)
# Modify to specify the template
template = latex_jinja_env.get_template('test1.tex')

The method for connecting to a SQLite database is also the same each time. There are a zillion tutorials and posts for working with Python and SQLite so I won’t belabor that here. Take a look at this excellent one or this awesome one.

conn = sqlite3.connect('nyc_gdb_jan2015a/nyc_gdb_jan2015.sqlite')
curs = conn.cursor()
curs.execute('SELECT * FROM b_pumas_2013acs1 ORDER BY GEOID2 LIMIT 3')

col_names = [cn[0] for cn in curs.description]
rows = curs.fetchall()
conn.close()

First Iteration – Pass Individual Variables to the Template

Here’s the bit that I modify each time. Using the example from the tutorial slides, I loop through the rows returned from my database, and I specify individual variables each time by slicing the elements in the row and assigning them a name which is passed out to the template with template.render(). Then I make a call to LaTeX to generate the PDF file (straightforward since I’m using Linux), one for each row (which represent geographic areas). Each file is named using the unique ID number of the geography, which we grabbed from our row list.

for row in rows:
    filename='zpuma_' + row[0] + '.tex'
    folder='test1'
    outpath=os.path.join(folder,filename)
    outfile=open(outpath,'w')
    outfile.write(template.render(geoid=row[0], geolabel=row[1], hshld=row[2], hshldmoe=row[3]))
    outfile.close()
    os.system("pdflatex -output-directory=" + folder + " " + outpath)

That’s the Python piece. The LaTeX template with the Jinja mark-up looks like this:

documentclass{article}
usepackage[margin=0.5in]{geometry}
usepackage[group-separator={,}]{siunitx}

title{VAR{geolabel | replace("&","&")} VAR{geoid}}
date{}

begin{document}

maketitle
pagestyle{empty}
thispagestyle{empty}  

begin{tabular}{|c|c|c|}
hline
& Estimate & Margin of Error\
Households: & num{VAR{hshld}} & +/- num{VAR{hshldmoe}}\
hline
end{tabular}

end{document}

You can see here where I’m passing in the variables with VAR – I’m using the same variable names that I created in the script to hold the row elements. I have to do a little bit of formatting to get this to work. First, one of my variables is text description that consistently contains an ampersand, so I have to use replace (a construct from Jinja) to replace & with & so LaTeX can properly escape it. Second, I want to format my numeric variables with a thousands separator. Here I use a LaTeX construct with the siunitx package, and every place a number appears I mark it with num. For this to work I always need to know that this variable will be a number; if it’s text or null LaTeX will throw an error and the doc won’t compile (an alternative to using this LaTeX solution would be to use Python’s formatting constructs). My simple output is below.

test1

Second Iteration – Pass Variables to Template in a List

Since I’m going to be passing lots of variables out to my template, it would be tedious if I had to declare them all individually, one by one. It would be better if I could pass out an entire list, and then do the slicing to get what I want in the template. Here’s the Python for doing that:

for row in rows:
    filename='zzpuma_' + row[0] + '.tex'
    folder='test2'
    outpath=os.path.join(folder,filename)
    outfile=open(outpath,'w')
    outfile.write(template.render(thelist=row))
    outfile.close()
    os.system("pdflatex -output-directory=" + folder + " " + outpath)

And here’s the LaTeX template – in this example I modified the variables a bit.

documentclass{article}
usepackage[margin=0.5in]{geometry}
usepackage[group-separator={,}]{siunitx}

title{VAR{thelist[2] | replace("&","&")} VAR{thelist[1]}}
date{}

begin{document}

maketitle
pagestyle{empty}
thispagestyle{empty}  

begin{tabular}{|c|c|c|c|c|}
hline
& Estimate & Margin of Error & Percent Total & Percent Margin of Error\
Car, truck, or van alone: & num{VAR{thelist[171]}} & +/- num{VAR{thelist[172]}} & num{VAR{thelist[173]}} & +/- num{VAR{thelist[174]}}\
Car, truck, or van carpooled: & num{VAR{thelist[175]}} & +/- num{VAR{thelist[176]}} & num{VAR{thelist[177]}} & +/- num{VAR{thelist[178]}}\
hline
end{tabular}

end{document}

While this is a bit better, the template is harder to read – you can’t really figure out what’s in there as you just have a bunch of list slices. You also have to keep careful track of which indices apply to what element, so you know what you’re generating. I thought I could improve this by creating nested lists where the column headings from the database get carried along, and I could reference them somehow. Then I had a better idea.

test2

Third Iteration – Pass Variables to Template in a Dictionary

I decided to use a dictionary instead of a list. Here’s the Python – since I grabbed the columns back in the database section of my code, I can loop through the elements in each row and create a dictionary by zipping the column names and row elements together, so the column name becomes the key and the row element is my data value. Then I pass the whole dictionary out to the template.

for row in rows:
    thedict=dict(zip(col_names,row))
    filename='zzpuma_' + row[0] + '.tex'
    folder='test3'
    outpath=os.path.join(folder,filename)
    outfile=open(outpath,'w')
    outfile.write(template.render(d=thedict))
    outfile.close()
    os.system("pdflatex -output-directory=" + folder + " " + outpath)

Now in the template, using Jinja I embed dict.get() for each variable and specify the key (column name) and the output will be the value. This is now highly readable, as I can see the names of the columns for the variables and there’s less potential for a mix-up.

documentclass{article}
usepackage[margin=0.5in]{geometry}
usepackage[group-separator={,}]{siunitx}

title{VAR{d.get('GEOLABEL') | replace("&","&")} VAR{d.get('GEOID2')}}
date{}

begin{document}

maketitle
pagestyle{empty}
thispagestyle{empty}  

begin{tabular}{|c|c|c|c|c|}
hline
& Estimate & Margin of Error & Percent Total & Margin of Error\

Car, truck, or van alone: & num{VAR{d.get('COM02_E')}} & +/- num{VAR{d.get('COM02_M')}} & num{VAR{d.get('COM02_PC')}} & +/- num{VAR{d.get('COM02_PM')}}\

Car, truck, or van carpooled: & num{VAR{d.get('COM03_E')}} & +/- num{VAR{d.get('COM03_M')}} & num{VAR{d.get('COM03_PC')}} & +/- num{VAR{d.get('COM03_PM')}}\
hline
end{tabular}

end{document}

In this case, the output looks the same as it did in our last iteration. Those are some basic methods for getting data into a template, and in my case I think the dictionary is the ideal data structure for this. In going further, my goal is to keep all the formatting and presentation issues in LaTeX, and all the data processing and selection pieces in Python.