Address Search OS OpenNames with PostGIS, SQLAlchemy and Python – PART 1

In this two part post we’ll look at implementing an address search using the Ordnance Survey Open Names dataset. We’ll use the power of Postgres with the PostGIS extension leveraging it’s built in Full Text Search, and use Python and the SQLAlchemy ORM to create a simple CLI.

address-search

Part 1 – Data Load and DB Config

Address Data

The UK is badly served for free address data. The best we have is the Ordnance Survey OpenNames dataset. It will work as a Postcode lookup or a street finder (at a push), but the dataset would require a lot of additional processing to be a useful address search. OS really want you to purchase AddressBase.

That said, OpenNames will suffice for this example and it should be easy to extend the example to a fuller dataset if you’re lucky enough to have one.

Loading Data to PostGIS

You can download OpenNames as either CSV, or GML. I’d recommend GML as it’s simpler to load it into PostGIS using OGR2OGR.

Once you unzip the archive you’ll see that the files are referenced according to the British National Grid, so you can load as much or as little as you want.

We’ll load NS68 which contains addresses in my home town of Stirling, as follows (swap out the values for your db):

ogr2ogr -f PostgreSQL PG:"host=localhost dbname=Real-World port=5432 user=iain password=password" NS68.gml -progress -nln open_names --config PG_USE_COPY YES 

You should now have a new table called open_names containing the addressing info.

Note if you want to load more gml files just use the -append flag:

ogr2ogr -f PostgreSQL PG:"host=localhost dbname=Real-World port=5432 user=iain password=password" NS88.gml -append -progress -nln open_names --config PG_USE_COPY YES 

Setting up Full Text Search

We now have our open_names table, but no text search column. So we can add a textsearchable column which must be of type TSVECTOR as follows:

ALTER TABLE open_names ADD COLUMN textsearchable TSVECTOR;

We can populate the column by using the built in function TO_TSVECTOR, this tokenises the words based on the supplied config, in our case english. However, multiple configs are supported.

UPDATE open_names SET textsearchable = TO_TSVECTOR('english', text || ' ' || localid);

If you look at the data in your new column you’ll see that it now contains text tokens representing the address data.

Increase accuracy by concatenating multiple columns

Note that we’re concatenating 2 columns together in this update statement – text and localid. In our case the reason for doing this is that the postcode in the localid column is stored without a space, meaning our search will return a result if the user enters a postcode without a space.

However, it should be clear if we had better address data, we could concat multiple columns. Meaning if a user searched for “1 Main St, Stirling, FK3 4GG” we would be able to return an accurate match.

Add an Index for faster searching

Now that we have data set up we can add an index to our new column which will ensure searches are fast:

CREATE INDEX textsearch_idx ON open_names USING GIN (textsearchable);

Let’s do some searches

Now lets query our new column to see if we can find some matches using the TO_TSQUERY function

SELECT COUNT(1) FROM open_names WHERE textsearchable @@ TO_TSQUERY('english', 'avenue')

Here we find we have 41 streets in Stirling area containing the word avenue. You’ll note that I don’t need to worry about lowercase, uppercase or where the word might appear in the string. Full text search takes care of that for me 🙂

The @@ operator basically means that the query matches the tsvector column.

Using AND and OR for better matches

A very powerful feature of Postgres’ Full Text Search is the ability to find matches contain all or some of the words in the query using the AND & operator or the OR | operator, as these examples show:

select * from open_names where textsearchable @@ to_tsquery('english', 'forth & view');

Here we only return one result Forth View which contains both Forth and View, if we change this to an OR search:

select * from open_names where textsearchable @@ to_tsquery('english', 'forth | view')

We get 7 results including Forth View, Bruce View, Forth Place.

Again it should be easy to see how powerful text searches could be built for complex text documents.

A final note on Triggers

While our address data is fairly static, if you had a table where users were regularly editing address data, or any other columns you wanted to run a full text search on, you should consider adding a trigger to keep the TSVECTOR column up to date, as outlined here.

So for our example the trigger would look like:

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON open_names FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, 'pg_catalog.english', localid, text);

Up Next

Hopefully Part 1 has demonstrated how it is very easy to set up powerful text searching in Postgres. In Part 2 we’ll look at how we can use Python and SQLAlchemy to allow you to integrate this functionality into your apps and APIs.

Viewsheds and Visibility Analysis in QGIS

Recently I had the occasion to do a bit of viewshed analysis in QGIS. I have done these before using tools in ArcGIS, but this was the first time I had the pleasure of doing this kind of analysis in QGIS. I was impressed by the simplicity and flexibility of the toolset.

The views from our office at Stirling are fairly legendary, with Stirling Castle, much of the Ochil Hills and the Wallace Monument all visible from our windows. But how to quantify this – what can we see from our office? This is what a viewshed does – using a height model and a position on the map it will tell you what you can (theoretically) see.

As a starting point, we can use the Ordnance Survey’s 50m Digital Elevation Model: Terrain 50.

Scotland DEM

So now we have the heights of all the hills and glens of central Scotland represented in a raster.

The function we are going to use is the Grass r.viewshed function which, rather conveniently, does most of the heavy mathematical lifting for you.

The R.Viewshed function is a GRASS function which integrates nicely into QGIS allowing you to use the algorithm without delving into the GRASS interfaces.

r_viewshed

So here we are using the Elevation Model called Terrain 50, the coordinates identifying the viewing position (thinkWhere’s office in Stirling).

We are on the first floor and without measuring the height of the building and our average employee standing up, we have estimated the height of the viewing point at 5m.

The viewshed analysis allows you to set up a height to offset the target elevations as well. This would be very useful for an analysis of windfarm visibility or  for a radio mast.

A good way to think about it is like this:

viewshed theory1

In this example, the viewer is looking to see what part of the landscape they can see. The height of the person is the viewing position and there is no height offset for heights on the landscape.

In this example, the viewing position is on the top of a hill and looking out at objects of a defined height.

viewshed theory2

So we can run this algorithm in QGIS to compute the theoretical view from our office.

The output looks a bit like this. Now it needs a little bit of processing to get it to something a bit more meaningful.

viewshed output

First of all, we’ll make this a vector for better symbology options. We can use the GDAL “Polygonize” function for this. The way that QGIS integrates these different applications into one place makes life so much easier for the GIS analyst.

polygonize

Bringing the QGIS symbologies into play, we can show the viewshed like this:

viewshed output_backdrop

However, we can have more fun than that.

Projecting a hillshade on to the Digital Elevation Model and bringing in some Ordnance Survey Vector data allows us to produce something far more meaningful.

viewshed_carto

Using the hillshade analysis (that is the way to  create the really funky 3d effect on the landscape) and bringing in some of the Vector Map District from the Ordnance Survey you can bring this map alive. Using only a very small amount of the data and concentrating on the physical features a viewshed analysis can be easily understood.

Using the blend mode stops the viewshed from obscuring the underlying data in the hillshade raster, making the information easy to understand. A few labels from the VMD names dataset helps give a little bit of context.

Viewsheds are very heavily used in renewables, telecommunication, and planning. QGIS, along with Ordnance Survey Open Data allows this to be done by anyone with access to a computer.

 

Analysing Placenames in QGIS

We meet many different people trying to do different things here at thinkWhere and one of our attendees at our QGIS Skills of Analysis & Statistics was particularly interested in looking at place names. Place names are a really interesting way of telling what kind of history a place has had. The Island of Jura is a great example of this deriving from an old Norse word meaning Animal Island. Common Scottish names such as Tarbert and Ben Mor are Gaelic derived place names translating as a narrow strip of land or a big hill respectively.

While we often wonder about the originality of our forbears in how they choose to name things, these are the footprints of history in maps we use every day. They teach us about where we come from and how the peoples who come before us lived with their landscape. It’s very romantic and also a terrific thing for some GIS analysis.

QGIS has some nice ways of helping with this kind of thing, alongside OpenStreetMap which is a rich source of place name data. The Ordnance Survey Open Names dataset would also work well for this kind of analysis.

Duns & Dums

A common word found in Scottish place names is “Dun” meaning a hill fort or roundhouse in the Celtic tradition. This is sometimes rendered as “Dun” (as in Dundee or Dunfermline) or as “Dum” (As in Dumfries or Dumbarton). Less common in Scotland, although still prevalent in Ireland is the prefix “Don” (As in Donegal), however this gets confused in Scotland due to the River Don. Don in Scots Gaelic is more associated geographically with the Celtic river goddess.

In QGIS we can use the expression:

“name” ILIKE ‘%dun%’ OR “name” ILIKE ‘%dum%’

To select or filter for any words that contain those strings of letters.

map_of_dun

The ILIKE function (a case-insensitive LIKE) is a great way of doing these kind of analyses and you can use the “%” wildcard to make selections in almost any context.

With a spot of good symbology and the use of the Print Composer to show multiple maps on the same page. You can show the distribution of a few different place name components. Having consistent maps next to each other on a page like this helps to show distribution.

placenames

A really effective way of showing this kind of data can be to use the heatmap. The last few versions of QGIS have put this into the standard symbology tool which allows you to generate them on the fly without the need to create a whole new raster. This makes it much easier. This is what the distribution of “dun”/”dum” components looks like.

dun_heatmap

The big problem with this kind of analysis is reliance on the data. The OpenStreetMap relies on users adding in data themselves and that makes an analysis of this nature a bit shaky with crowd sourced data.

The OS Open Names dataset would be a great way of doing that, but it is a huge dataset that would need a database set up for it. We do cover database creation in our advanced QGIS course, but it would be a good follow up.

Check out our QGIS training offerings at: thinkWhere.com