We recently created a PostgreSQL / PostGIS database on a server on our local campus network and spent the last few months loading data into it. We have a couple of different SQLite / Spatialite projects that we produce and I needed to move a large number of attribute tables from them into the Postgres database. My initial idea was to simply create a SQL dump file out of SQLite and then restore it in Postgres. I encountered a number of problems in doing this; there are slight differences in how each database creates and handles dump files. The solutions I found involved lousy things like opening the dump file in an editor (not feasible if the file is huge) and finding and replacing parentheses and commas, or running the file through a script to remove them.
That gave me a better idea – Python has SQLite and PostgreSQL modules (sqlite3 and psycopg2 respectively). I could connect to the SQLite database, load the tables into Python’s data structures, and then connect to my PostgreSQL database and write them out. The original CREATE TABLE statements are stored in a master table; if I grab those statements and then the data that goes with them, I can simply recreate everything. The code is below.
First I define a number of variables that I hard code for each batch of tables. The SQLite variables are the name and path to the database (sqdb) and a string that I’ll use in a LIKE clause to grab certain tables (sqlike). For example, if my series of tables starts with yr followed by a year (yr2017) my string would be ‘yr%’. The remaining variables are for the Postgres database: pgdb (database name), pguser, pgpswd (my username and password), pghost, pgport (address of the database server on port 5432), and pgschema which is the name of the schema I want to write to.
I connect to the SQLite database and read the names of all the tables that match my LIKE statement from the master table. This returns a series of tuples where the name of the table is in the first position; I grab these and save them in a list. Then I loop through that list of tables, get the CREATE TABLE statement that’s stored in the master table, and save it in a string variable called create. Then I fetch all the rows for that table and save them in a tuple called rows. Lastly, I count the number of columns and build a placeholder string with one substitution marker per column for my INSERT statement, trimming the last character to remove the trailing comma.
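To make that placeholder step concrete, here’s what those two lines produce for a hypothetical three-column table (the column count is just for illustration):

    # Hypothetical example: a table with three columns
    colcount = 3
    pholder = '%s,' * colcount    # '%s,%s,%s,'
    newholder = pholder[:-1]      # '%s,%s,%s' (trailing comma removed)
    # The INSERT statement then reads: INSERT INTO <table> VALUES (%s,%s,%s);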
That gives me everything I need for the first table. Then I connect to Postgres, set my schema path, execute the create table statement, and load the values in. Voila! If successful, we return to the top of the table loop and grab the next table. When we’re all done, we close the connection to the database.
#Frank Donnelly, Geospatial Data Librarian
#May 22, 2017
#Copies tables and data from a SQLite database and recreates them
#in a PostgreSQL database

import psycopg2, sqlite3, sys

#Change these values as needed
sqdb=''
sqlike=''
pgdb=''
pguser=''
pgpswd=''
pghost=''
pgport='5432'
pgschema=''

#Connect to SQLite and grab the names of all tables that match the LIKE pattern
consq=sqlite3.connect(sqdb)
cursq=consq.cursor()

tabnames=[]
cursq.execute("SELECT name FROM sqlite_master WHERE type='table' AND name LIKE '%s'" % sqlike)
tabgrab = cursq.fetchall()
for item in tabgrab:
    tabnames.append(item[0])

#For each table, get its CREATE TABLE statement and rows, then recreate it in Postgres
for table in tabnames:
    cursq.execute("SELECT sql FROM sqlite_master WHERE type='table' AND name = ?;", (table,))
    create = cursq.fetchone()[0]
    cursq.execute("SELECT * FROM %s;" % table)
    rows=cursq.fetchall()
    colcount=len(rows[0])
    #Build one %s placeholder per column, dropping the trailing comma
    pholder='%s,'*colcount
    newholder=pholder[:-1]

    conpg = None  #so the finally block is safe if the connection fails
    try:
        conpg = psycopg2.connect(database=pgdb, user=pguser, password=pgpswd,
                                 host=pghost, port=pgport)
        curpg = conpg.cursor()
        curpg.execute("SET search_path TO %s;" % pgschema)
        curpg.execute("DROP TABLE IF EXISTS %s;" % table)
        curpg.execute(create)
        curpg.executemany("INSERT INTO %s VALUES (%s);" % (table, newholder), rows)
        conpg.commit()
        print('Created', table)

    except psycopg2.DatabaseError as e:
        print('Error %s' % e)
        sys.exit(1)
    finally:
        if conpg:
            conpg.close()

consq.close()
There are a few caveats to this. First, data in SQLite is loosely typed, so you’re allowed to get away with storing strings in numeric columns and vice versa. PostgreSQL will balk at this, so if your SQLite data is pretty loose this approach (and any other approach really) will fall flat. You’d have to tighten up your SQLite data first.
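One way to check for this ahead of time is SQLite’s typeof() function, which reports the storage class actually used for each value. A minimal sketch, assuming a hypothetical table yr2017 with a column pop that ought to hold integers:

    import sqlite3

    #Count rows where the stored value isn't an integer (or NULL) in a column
    #that the Postgres table will declare as integer
    con = sqlite3.connect('mydata.sqlite')   #hypothetical database name
    cur = con.cursor()
    cur.execute("""SELECT COUNT(*) FROM yr2017
        WHERE typeof(pop) NOT IN ('integer','null');""")
    print('Rows with non-integer values in pop:', cur.fetchone()[0])
    con.close()

A count above zero means that column needs cleaning before the migration will succeed.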
Second, this approach doesn’t handle spatial data / geometry columns. In instances where I tried copying spatial data over, it failed; there are differences in how spatial data is tied to the underlying tables in each database, so moving it requires a special process. I tried using some spatial database modules in Python but couldn’t get them working. Ultimately, since my spatial layers were point features and I had the original X and Y coordinates stored in numeric columns, I simply copied the data over and left the geometry behind, and once I was in PostGIS I recreated the geometry from the coordinates. Another alternative would be to use the OGR tools to copy spatial (and attribute) data over – I’ve tried this in a few other instances in the past with success, but was going in the opposite direction (from PostGIS to Spatialite) at the time.
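That geometry recreation step can be done entirely in PostGIS once the attributes are loaded. A minimal sketch via psycopg2, assuming a hypothetical table yr2017 with coordinate columns x and y in WGS 84 (SRID 4326) and the PostGIS extension already enabled in the database:

    import psycopg2

    conpg = psycopg2.connect(database='mydb', user='me', password='secret',
        host='localhost', port='5432')   #hypothetical connection values
    curpg = conpg.cursor()
    #Add a point geometry column and build it from the stored coordinates
    curpg.execute("ALTER TABLE yr2017 ADD COLUMN geom geometry(Point, 4326);")
    curpg.execute("UPDATE yr2017 SET geom = ST_SetSRID(ST_MakePoint(x, y), 4326);")
    conpg.commit()
    conpg.close()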
While I haven’t tried it, you could modify the code to go in the other direction (copy data from PostgreSQL to SQLite). You would need to query PostgreSQL’s system catalogs (the information_schema views) for the table names and column definitions; unlike SQLite, Postgres doesn’t store complete CREATE TABLE statements in a single master table, so you’d have to assemble them yourself. Going in this direction, the abundance of data types in PostgreSQL may be a problem – in SQLite your only options are integer, real, text, and blob. SQLite may be able to take certain types and convert them to what it needs (e.g. take a varchar and save it as text) but I’m not sure it can handle every case. You could always run each create table statement string through a find and replace operation to modify the data types.
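If I were to attempt it, the first step might look something like this rough sketch: pull the table names and column definitions from the information_schema views and assemble CREATE TABLE statements with a crude type mapping. The schema name, connection values, and the mapping itself are all assumptions to adapt:

    import psycopg2

    conpg = psycopg2.connect(database='mydb', user='me', password='secret',
        host='localhost', port='5432')   #hypothetical connection values
    curpg = conpg.cursor()

    #Crude mapping of common Postgres types to SQLite storage classes;
    #anything unrecognized falls back to text
    typemap = {'integer':'INTEGER', 'bigint':'INTEGER', 'smallint':'INTEGER',
        'numeric':'REAL', 'double precision':'REAL', 'real':'REAL',
        'character varying':'TEXT', 'text':'TEXT'}

    curpg.execute("""SELECT table_name FROM information_schema.tables
        WHERE table_schema = %s AND table_type = 'BASE TABLE';""", ('myschema',))
    for (table,) in curpg.fetchall():
        curpg.execute("""SELECT column_name, data_type FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s
            ORDER BY ordinal_position;""", ('myschema', table))
        cols = ', '.join('%s %s' % (name, typemap.get(dtype, 'TEXT'))
            for name, dtype in curpg.fetchall())
        create = 'CREATE TABLE %s (%s);' % (table, cols)
        print(create)   #feed this, plus a SELECT * of the rows, into sqlite3 as before

    conpg.close()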