Address Search: OS OpenNames with PostGIS, SQLAlchemy and Python – PART 1

In this two-part post we’ll look at implementing an address search using the Ordnance Survey Open Names dataset. We’ll use the power of Postgres with the PostGIS extension, leveraging Postgres’ built-in Full Text Search, and use Python and the SQLAlchemy ORM to create a simple CLI.


Part 1 – Data Load and DB Config

Address Data

The UK is badly served for free address data. The best we have is the Ordnance Survey OpenNames dataset. It will work as a Postcode lookup or a street finder (at a push), but the dataset would require a lot of additional processing to be a useful address search. OS really want you to purchase AddressBase.

That said, OpenNames will suffice for this example and it should be easy to extend the example to a fuller dataset if you’re lucky enough to have one.

Loading Data to PostGIS

You can download OpenNames as either CSV or GML. I’d recommend GML, as it’s simpler to load into PostGIS using ogr2ogr.

Once you unzip the archive you’ll see that the files are named by British National Grid reference (e.g. NS68), so you can load as much or as little as you want.

We’ll load NS68, which contains addresses in my home town of Stirling, as follows (swap out the connection values for your own db):

ogr2ogr -f PostgreSQL PG:"host=localhost dbname=Real-World port=5432 user=iain password=password" NS68.gml -progress -nln open_names --config PG_USE_COPY YES 

You should now have a new table called open_names containing the addressing info.

Note: if you want to load more GML files, just add the -append flag:

ogr2ogr -f PostgreSQL PG:"host=localhost dbname=Real-World port=5432 user=iain password=password" NS88.gml -append -progress -nln open_names --config PG_USE_COPY YES 
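If you want to load several tiles in one go, a small Python script can generate the same ogr2ogr calls: the first file creates the table and every later file gets -append (without it, ogr2ogr refuses to write to a layer that already exists). This is a minimal sketch that only uses the flags shown above; the connection string and tile names are placeholders, and build_ogr2ogr_commands and load_tiles are hypothetical helpers, not part of GDAL.

```python
import shlex
import subprocess
from typing import List, Sequence

# Placeholder connection string: swap in the values for your own database.
PG_CONN = "PG:host=localhost dbname=Real-World port=5432 user=iain password=password"


def build_ogr2ogr_commands(gml_files: Sequence[str], pg_conn: str = PG_CONN,
                           table: str = "open_names") -> List[List[str]]:
    """Return one ogr2ogr argument list per GML tile.

    The first tile creates the table; every later tile gets -append so it is
    added to the same table instead of failing because the layer exists.
    """
    commands = []
    for i, gml in enumerate(gml_files):
        cmd = ["ogr2ogr", "-f", "PostgreSQL", pg_conn, gml]
        if i > 0:
            cmd.append("-append")
        cmd += ["-progress", "-nln", table, "--config", "PG_USE_COPY", "YES"]
        commands.append(cmd)
    return commands


def load_tiles(gml_files: Sequence[str]) -> None:
    """Run each ogr2ogr command in turn, stopping on the first failure."""
    for cmd in build_ogr2ogr_commands(gml_files):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the shell-quoted commands instead of executing them.
    for cmd in build_ogr2ogr_commands(["NS68.gml", "NS88.gml"]):
        print(shlex.join(cmd))
```

Passing an argument list (rather than one shell string) means the connection string, which contains spaces, reaches ogr2ogr as a single argument without any extra quoting.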

Setting up Full Text Search

We now have our open_names table, but no text search column, so let’s add a column called textsearchable. It must be of type TSVECTOR:

ALTER TABLE open_names ADD COLUMN textsearchable TSVECTOR;

We can populate the column using the built-in function TO_TSVECTOR, which tokenises the text and normalises the words into lexemes according to the supplied text search configuration, in our case english. Configurations for several other languages are also supported.

UPDATE open_names SET textsearchable = TO_TSVECTOR('english', text || ' ' || localid);

If you look at the data in your new column you’ll see that it now contains text tokens representing the address data.

Increase accuracy by concatenating multiple columns

Note that we’re concatenating two columns in this update statement: text and localid. We do this because the postcode in the localid column is stored without a space, so our search will still return a result if the user enters a postcode without a space.

If we had better address data, we could concatenate more columns in the same way, so that if a user searched for “1 Main St, Stirling, FK3 4GG” we would be able to return an accurate match.
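One caveat when concatenating more columns: in Postgres the || operator returns NULL if any operand is NULL, so a single missing value leaves that row’s tsvector NULL and the row becomes unsearchable. Below is a minimal sketch of a hypothetical Python helper that builds the UPDATE using concat_ws instead, which joins its arguments with a separator and skips NULLs. Any column names beyond text and localid are up to you.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Double-quote a Postgres identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_tsvector_update(table: str, columns: Sequence[str],
                          config: str = "english") -> str:
    """Build an UPDATE that fills textsearchable from several columns.

    concat_ws(' ', ...) joins the values with spaces and skips NULLs, unlike
    ||, which makes the whole expression NULL if any part is NULL.
    """
    if not columns:
        raise ValueError("need at least one column")
    if "'" in config:
        raise ValueError("invalid text search config name")
    cols = ", ".join(quote_ident(c) for c in columns)
    return (
        f"UPDATE {quote_ident(table)} SET textsearchable = "
        f"to_tsvector('{config}', concat_ws(' ', {cols}));"
    )


if __name__ == "__main__":
    print(build_tsvector_update("open_names", ["text", "localid"]))
```

For the two columns used above this prints `UPDATE "open_names" SET textsearchable = to_tsvector('english', concat_ws(' ', "text", "localid"));`, which is equivalent to the original statement when neither column is NULL.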

Add an Index for faster searching

Now that the data is set up, we can add a GIN index to our new column to keep searches fast:

CREATE INDEX textsearch_idx ON open_names USING GIN (textsearchable);

Let’s do some searches

Now let’s query our new column to see if we can find some matches using the TO_TSQUERY function:

SELECT COUNT(1) FROM open_names WHERE textsearchable @@ TO_TSQUERY('english', 'avenue');

Here we find 41 streets in the Stirling area containing the word avenue. You’ll note that I don’t need to worry about lowercase, uppercase or where the word might appear in the string. Full text search takes care of that for me 🙂

The @@ operator returns true when the tsvector on its left matches the tsquery on its right.

Using AND and OR for better matches

A very powerful feature of Postgres’ Full Text Search is the ability to find rows containing all or some of the words in the query, using the AND (&) operator or the OR (|) operator, as these examples show:

select * from open_names where textsearchable @@ to_tsquery('english', 'forth & view');

Here only one result is returned, Forth View, which contains both forth and view. If we change this to an OR search:

select * from open_names where textsearchable @@ to_tsquery('english', 'forth | view');

We get 7 results, including Forth View, Bruce View and Forth Place.
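to_tsquery expects its input to already be in tsquery syntax, so passing raw user input such as `forth view` (two words with no operator) raises a syntax error. Here is a minimal sketch of a hypothetical Python helper that strips punctuation, including tsquery operator characters, and joins the remaining words with & or |. For the plain AND case, Postgres’ own plainto_tsquery does a similar job on the server side.

```python
import re

# Runs of Unicode letters/digits; underscores and tsquery operator characters
# such as & | ! ( ) : * are dropped.
_WORD = re.compile(r"[^\W_]+")


def build_tsquery(user_input: str, match_all: bool = True) -> str:
    """Turn free text into a to_tsquery string.

    match_all=True joins words with & (every word must match);
    match_all=False joins them with | (any word may match).
    """
    words = _WORD.findall(user_input.lower())
    if not words:
        raise ValueError("search text contains no words")
    return (" & " if match_all else " | ").join(words)


if __name__ == "__main__":
    print(build_tsquery("Forth View"))                   # forth & view
    print(build_tsquery("Forth View", match_all=False))  # forth | view
```

Pass the result to the database as a bound query parameter (for example `to_tsquery('english', %s)` with psycopg2) rather than formatting it into the SQL string yourself.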

Again, it’s easy to see how powerful searches could be built over more complex text documents.

A final note on Triggers

While our address data is fairly static, if you had a table where users were regularly editing address data, or any other columns you wanted to run a full text search on, you should consider adding a trigger to keep the TSVECTOR column up to date, as outlined here.

So for our example the trigger would look like this:

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON open_names FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, 'pg_catalog.english', localid, text);

Up Next

Hopefully Part 1 has demonstrated how easy it is to set up powerful text searching in Postgres. In Part 2 we’ll look at how we can use Python and SQLAlchemy to integrate this functionality into your apps and APIs.

Restoring a Postgres database to AWS RDS using Docker

In this post I look at using Docker to restore a Postgres dump file to a Postgres database running in the cloud on AWS RDS.

Keep it clean

One of the big selling points of Docker, for me, is that I can have lots of apps and utils running in nice containers on my dev laptop, without having to install them locally. This keeps my laptop nice and responsive, and I don’t clutter/break it with lots of weird dependencies and running processes that I’m then too scared to delete.

Postgres is a good example – I don’t want to install it locally, but I do need access to the command line tools like psql and pg_restore, to be able to work with my databases effectively.

One way of accessing these tools would be to ssh onto the AWS cloud instances, but there are a bunch of reasons, most pertinently security (not to mention the faff), why you’d want to avoid that every time you want to run some SQL. So let’s look at how we can use Docker to ease the pain instead.

Start Me Up

With Docker installed, you can build this simple Dockerfile to create a local Postgres container. The User and Password env vars aren’t strictly required; however, if you want to actually connect to the containerised DB, they’re pretty handy.

You can build, run and connect to the container as follows (this assumes you’re on a Mac):

Note the volume mapping, where I map the data-load directory I created in the first step to a new directory called data-loader inside my container. This means that when I copy the Postgres dump file into my local data-load directory, it will be available to the Postgres tools in the container.

Finally, connect to the container, swapping the imageId for your locally running container ID.
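The original build and run commands aren’t reproduced here, so as a rough stand-in, here is a minimal sketch that starts a container from the official postgres image (instead of the custom Dockerfile) with the local data-load directory mounted at /data-loader, then opens a shell in it. The container name, credentials and image tag are assumptions; giving the container a fixed `--name` means you can refer to it by name instead of looking up its ID.

```python
import os
import shlex
from typing import List

CONTAINER = "pg-tools"  # hypothetical container name


def docker_run_cmd(data_dir: str, user: str = "postgres",
                   password: str = "changeme", image: str = "postgres") -> List[str]:
    """Start a detached Postgres container with the data-load dir mounted."""
    return [
        "docker", "run", "-d", "--name", CONTAINER,
        # Credentials read by the official image on first start (placeholders).
        "-e", f"POSTGRES_USER={user}",
        "-e", f"POSTGRES_PASSWORD={password}",
        # Map the local data-load dir to /data-loader inside the container.
        # -v needs an absolute host path; a bare name would be treated as a
        # named volume instead of a bind mount.
        "-v", f"{os.path.abspath(data_dir)}:/data-loader",
        image,
    ]


def docker_shell_cmd(container: str = CONTAINER) -> List[str]:
    """Open an interactive bash shell inside the running container."""
    return ["docker", "exec", "-it", container, "bash"]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(docker_run_cmd("data-load")))
    print(shlex.join(docker_shell_cmd()))
```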

Restoring your database with pg_restore

I’ll assume you already have a Postgres database set up within the AWS cloud. Now that we’re connected to our container, we can use pg_restore to restore our dump file into AWS (note this command will prompt you for the admin password).
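The exact command isn’t shown here, so below is a minimal sketch that builds a typical pg_restore call against an RDS endpoint. The endpoint, user, database and dump file names are placeholders. -W forces the password prompt mentioned above. Adding --no-owner is my assumption: it skips the commands that set object ownership, which can fail on RDS when the dump’s original owner role doesn’t exist there. Note that pg_restore only reads custom, directory or tar format dumps; a plain SQL dump is restored with psql instead.

```python
import shlex
from typing import List


def pg_restore_cmd(host: str, user: str, dbname: str, dump_file: str,
                   port: int = 5432, no_owner: bool = True) -> List[str]:
    """Build a pg_restore command that prompts for the password (-W)."""
    cmd = [
        "pg_restore", "-v",          # verbose progress output
        "-h", host, "-p", str(port),
        "-U", user, "-d", dbname,
        "-W",                        # always prompt for the password
    ]
    if no_owner:
        cmd.append("--no-owner")     # don't restore the original object owners
    cmd.append(dump_file)
    return cmd


if __name__ == "__main__":
    # Placeholder values: replace with your RDS endpoint, admin user and database.
    print(shlex.join(pg_restore_cmd(
        "your-rds-endpoint", "admin", "mydb", "/data-loader/backup.dump")))
```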

A note on schemas

If you’re doing a partial restore, you may want to restore your dump file to a separate schema. Unfortunately there appears to be no way to do this from the command line. Instead, you have to rename the public schema, create a new public schema and restore into that, then reverse the process.

This StackOverflow answer outlines the process.
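The rename-and-swap process described above boils down to two small batches of SQL, one run with psql before pg_restore and one after. Here is a minimal sketch that generates them; the target schema name and the temporary name public_original are hypothetical. Run it while nothing else is using the database, because public briefly points at a different schema.

```python
import re
from typing import List, Tuple

# Only allow simple lower-case identifiers so they can be used unquoted.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def schema_swap_sql(target_schema: str,
                    temp_name: str = "public_original") -> Tuple[List[str], List[str]]:
    """Return (before_restore, after_restore) SQL statement lists."""
    for name in (target_schema, temp_name):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported schema name: {name!r}")
    before = [
        # Move the existing public schema out of the way...
        f"ALTER SCHEMA public RENAME TO {temp_name};",
        # ...and give pg_restore an empty public schema to restore into.
        "CREATE SCHEMA public;",
    ]
    after = [
        # Rename the freshly restored schema to its permanent name...
        f"ALTER SCHEMA public RENAME TO {target_schema};",
        # ...then put the original public schema back.
        f"ALTER SCHEMA {temp_name} RENAME TO public;",
    ]
    return before, after


if __name__ == "__main__":
    before, after = schema_swap_sql("restored")  # hypothetical target schema
    print("\n".join(before))
    print("-- run pg_restore here --")
    print("\n".join(after))
```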

Restore Complete

You should now have a complete restore of your dumpfile in the cloud.  Please add comments if anything is unclear.