Address Search OS OpenNames with PostGIS, SQLAlchemy and Python – PART 2

Part 1 of this post outlined how to configure a PostGIS database to allow us to run Full Text searches against the OS OpenNames dataset.

In Part 2 we look at writing a simple Python 3 CLI app that shows how easy it is to integrate this powerful functionality into your apps and APIs. Other than Python, the only dependency we need is the SQLAlchemy ORM, which lets our app communicate with Postgres.


Installing SQLAlchemy

SQLAlchemy can be installed using pip. It depends on psycopg2, which you may struggle to install on a Mac without Postgres present, which is frustrating (solutions can be found on Stack Overflow).
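For example (on some setups you may find the psycopg2-binary package easier to install):

pip install sqlalchemy psycopg2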

A simple address search CLI
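Here's a minimal sketch of such a script (the connection string matches the database used in Part 1, and the mapped columns assume the open_names table created by the OGR load – swap both for your own):

import sys

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.dialects.postgresql import TSVECTOR
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

# SQLAlchemy boilerplate: engine, session and declarative base
engine = create_engine('postgresql://iain:password@localhost:5432/Real-World')
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()


class OpenName(Base):
    __tablename__ = 'open_names'
    # Only three columns mapped here - you'll probably want more
    ogc_fid = Column(Integer, primary_key=True)
    text = Column(String)
    textsearchable = Column(TSVECTOR)


if __name__ == '__main__':
    # Join every word the user supplied with the OR (|) operator
    search = ' | '.join(sys.argv[1].split())
    # postgresql_regconfig is vital: it tells Postgres to search in English
    matches = session.query(OpenName).filter(
        OpenName.textsearchable.match(search, postgresql_regconfig='english'))
    for match in matches:
        print(match.text)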


Hopefully this script is fairly easy to follow, but there are a couple of things to draw your attention to:

  • Note we have to import TSVECTOR from SQLAlchemy’s Postgres dialect so it understands that column type.
  • The engine and session setup is simply SQLAlchemy boilerplate that sets up our connection and session for the app. You’ll need to swap out the connection details for your own.
  • In the model I’ve chosen to map only 3 columns; you’ll probably want to map more.
  • Building the search string is very important: here we append the OR (|) operator to every word the user has supplied, meaning we return addresses that contain any of the search terms. You could extend this to allow the user to specify an exact-match option and change this to an & (AND) search.
  • Finally, note we ask SQLAlchemy to match our search, and importantly we must supply the postgresql_regconfig param to say we’re searching in English. This is vital or you won’t get the matches you expect.

Running our app

We can run our app from the command line simply by entering the following command:

python address_search.py 'forth street'

And we see our app print out all matching addresses that contain either Forth or Street 🙂

Ends

Hopefully you can see how easy it would be to take the above code and integrate it into your apps and APIs. I hope you’ve found these tutorials useful. Happy text searching!

Address Search OS OpenNames with PostGIS, SQLAlchemy and Python – PART 1

In this two-part post we’ll look at implementing an address search using the Ordnance Survey Open Names dataset. We’ll use the power of Postgres with the PostGIS extension, leveraging its built-in Full Text Search, and use Python and the SQLAlchemy ORM to create a simple CLI.


Part 1 – Data Load and DB Config

Address Data

The UK is badly served for free address data. The best we have is the Ordnance Survey OpenNames dataset. It will work as a Postcode lookup or a street finder (at a push), but the dataset would require a lot of additional processing to be a useful address search. OS really want you to purchase AddressBase.

That said, OpenNames will suffice for this example and it should be easy to extend the example to a fuller dataset if you’re lucky enough to have one.

Loading Data to PostGIS

You can download OpenNames as either CSV or GML. I’d recommend GML as it’s simpler to load into PostGIS using OGR2OGR.

Once you unzip the archive you’ll see that the files are referenced according to the British National Grid, so you can load as much or as little as you want.

We’ll load NS68, which contains addresses in my home town of Stirling, as follows (swap out the values for your db):

ogr2ogr -f PostgreSQL PG:"host=localhost dbname=Real-World port=5432 user=iain password=password" NS68.gml -progress -nln open_names --config PG_USE_COPY YES 

You should now have a new table called open_names containing the addressing info.

Note if you want to load more GML files just use the -append flag:

ogr2ogr -f PostgreSQL PG:"host=localhost dbname=Real-World port=5432 user=iain password=password" NS88.gml -append -progress -nln open_names --config PG_USE_COPY YES 

Setting up Full Text Search

We now have our open_names table, but no text search column. So we add a textsearchable column, which must be of type TSVECTOR, as follows:

ALTER TABLE open_names ADD COLUMN textsearchable TSVECTOR;

We can populate the column using the built-in function TO_TSVECTOR, which tokenises the words based on the supplied config – in our case english, though multiple configs are supported.

UPDATE open_names SET textsearchable = TO_TSVECTOR('english', text || ' ' || localid);

If you look at the data in your new column you’ll see that it now contains text tokens representing the address data.

Increase accuracy by concatenating multiple columns

Note that we’re concatenating 2 columns together in this update statement – text and localid. In our case the reason for doing this is that the postcode in the localid column is stored without a space, meaning our search will return a result if the user enters a postcode without a space.

However, it should be clear that if we had better address data we could concatenate multiple columns, meaning that if a user searched for “1 Main St, Stirling, FK3 4GG” we would be able to return an accurate match.
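For example, with a richer table holding separate address columns (the table and column names here are hypothetical), the update might look like this, using COALESCE so a NULL column doesn’t blank the whole string:

UPDATE addresses SET textsearchable = TO_TSVECTOR('english',
    COALESCE(building_number, '') || ' ' ||
    COALESCE(street, '') || ' ' ||
    COALESCE(town, '') || ' ' ||
    COALESCE(postcode, ''));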

Add an Index for faster searching

Now that we have data set up we can add an index to our new column which will ensure searches are fast:

CREATE INDEX textsearch_idx ON open_names USING GIN (textsearchable);

Let’s do some searches

Now let’s query our new column to see if we can find some matches using the TO_TSQUERY function:

SELECT COUNT(1) FROM open_names WHERE textsearchable @@ TO_TSQUERY('english', 'avenue')

Here we find we have 41 streets in the Stirling area containing the word avenue. You’ll note that I don’t need to worry about lowercase, uppercase or where the word might appear in the string. Full text search takes care of that for me 🙂

The @@ operator simply tests whether the tsquery matches the tsvector column.

Using AND and OR for better matches

A very powerful feature of Postgres’ Full Text Search is the ability to find matches containing all or some of the words in the query, using the AND (&) operator or the OR (|) operator, as these examples show:

select * from open_names where textsearchable @@ to_tsquery('english', 'forth & view');

Here we return only one result, Forth View, which contains both Forth and View. If we change this to an OR search:

select * from open_names where textsearchable @@ to_tsquery('english', 'forth | view')

We get 7 results, including Forth View, Bruce View and Forth Place.

Again it should be easy to see how powerful text searches could be built for complex text documents.

A final note on Triggers

While our address data is fairly static, if you had a table where users were regularly editing address data, or any other columns you wanted to run a full text search on, you should consider adding a trigger to keep the TSVECTOR column up to date, as outlined here.

So for our example the trigger would look like:

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON open_names FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, 'pg_catalog.english', localid, text);

Up Next

Hopefully Part 1 has demonstrated how it is very easy to set up powerful text searching in Postgres. In Part 2 we’ll look at how we can use Python and SQLAlchemy to allow you to integrate this functionality into your apps and APIs.

Restoring a Postgres database to AWS RDS using Docker

In this post I look at using Docker to restore a Postgres dump file to a Postgres database running in the cloud on AWS RDS.

Keep it clean

One of the big selling points of Docker, for me, is that I can have lots of apps and utils running in nice containers on my dev laptop, without having to install them locally. This ensures my laptop stays nice and responsive and that I don’t clutter/break it with lots of weird dependencies and running processes that I’m then too scared to delete.

Postgres is a good example – I don’t want to install it locally, but I do need access to the command line tools like psql and pg_restore, to be able to work with my databases effectively.

One way of accessing these tools would be to SSH onto the AWS cloud instances, but there’s a bunch of reasons – most pertinently security (not to mention the faff) – why you’d want to avoid that every time you want to run some SQL. So let’s look at how we use Docker to ease the pain instead.

Start Me Up

With Docker installed you can build this simple Dockerfile to create a local Postgres container. The User and Password env vars aren’t strictly required; however, if you want to actually connect to the containerised DB, they’re pretty handy.
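A minimal sketch of such a Dockerfile (the Postgres version and credentials are illustrative):

FROM postgres:9.6
ENV POSTGRES_USER iain
ENV POSTGRES_PASSWORD password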

You can build, run and connect to the container as follows (this assumes you are on a Mac):
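Something along these lines (the image name local-postgres and the directory names are illustrative):

mkdir data-load
docker build -t local-postgres .
docker run -d -p 5432:5432 -v $(pwd)/data-load:/data-loader local-postgres
docker ps
docker exec -it <containerId> bash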

Note the -v flag on docker run, where I map the data-load dir I created in the first step to a new directory called /data-loader inside my container. This means that when I copy the Postgres dump file into my local data-load directory, it will be available to the Postgres tools available in the container.

The docker exec command allows me to connect to the container – swap in the ID of your locally running container (docker ps will show it).

Restoring your database with pg_restore

I’ll assume you already have a Postgres database set up within the AWS cloud. Now that we have connected to our container, we can use pg_restore to restore our dump file into AWS (note this command will prompt you for the admin password):
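A sketch of the restore command, run from inside the container (the RDS endpoint, user, database and dump file names are placeholders):

pg_restore --host=mydb.xxxxxxxx.eu-west-1.rds.amazonaws.com --port=5432 --username=admin --dbname=mydb --no-owner /data-loader/mydb.dump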

A note on schemas

If you’re doing a partial restore, you may want to restore your dump file to a separate schema. Unfortunately there appears to be no way to do this from the command line. What you have to do is rename the public schema, create a new public schema and restore into that, then reverse the process.
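In outline, the SQL looks something like this (schema names are illustrative):

ALTER SCHEMA public RENAME TO public_original;
CREATE SCHEMA public;
-- run pg_restore into the new, empty public schema here
ALTER SCHEMA public RENAME TO restored_dump;
ALTER SCHEMA public_original RENAME TO public;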

This StackOverflow answer outlines the process.

Restore Complete

You should now have a complete restore of your dump file in the cloud. Please add comments if anything is unclear.

My work placement week @ thinkWhere

My name is Yacouba Traore. I am currently studying the second year of a BSc (Hons) in Information Technology at Teesside University School of Computing.

I have had a great week’s placement at thinkWhere. During my placement I was presented with a variety of opportunities, and had a great chance to meet everyone from the various parts of the business, including the CEO, Portfolio Manager, Business Managers, Accounts Managers, Developers, Service Desk Consultants and Office Administrators.

I have learned how agile development works, including Scrum, as well as the differences between Agile and Waterfall. I have taken part in the different ceremonies, including daily stand-up, backlog refinement, demo and sprint planning.

Attending the Scrum daily stand-up

I have also learned about GIS (Geographical Information Systems) and theMapCloud. I have also created my own map in QGIS of where I am from in Côte d’Ivoire (Ivory Coast).

My first QGIS map of Côte d’Ivoire

The thinkWhere development team also worked with me to develop my coding skills including help with JavaScript, HTML and CSS using PyCharm. I used these skills to script a page showing a graph of theMapCloud usage metrics.

The role has allowed me to learn key skills and competencies in IT and business systems. On top of this, I delivered a 30-minute presentation about myself and a project I am working on at university.

One of the best things about the company is the people. They were very friendly, approachable and well organised. I have also made a great network of colleagues and made new friends.

I have really enjoyed my work placement. I have learnt a lot and have gained skills that I will take forward with me. I have also been given many opportunities and many new experiences. I’ve gained a deeper understanding of how large IT projects work, which is going to help me in the future.

Developing my JavaScript, HTML & CSS skills using PyCharm.

Being part of this placement has also helped me to develop my interview skills and my job prospects. It is also very valuable experience for my CV.

I hope to be back at thinkWhere again one day!

HOT Tasking Manager 3.0 Development Underway at thinkWhere

We get to work on some great projects here at thinkWhere, but we’re particularly proud of the project we’ve just started with the Humanitarian OpenStreetMap Team (HOT) to develop the Tasking Manager version 3.0 (TM3). It’s great to work in an industry where the work we do with maps can have such a tangible impact on the humanitarian effort. thinkWhere supports this wherever we can, both by backing my own personal volunteering with MapAction (which you can read more about here) and through fundraising efforts such as running the Stirling Marathon, which is open for sponsorship here, so please donate if you can! Being given the opportunity to also get involved with the HOT community and deliver TM3 is therefore something we’re extremely proud of.

The current HOT Tasking Manager (TM2) coordinates volunteer mapper contribution to OpenStreetMap, meaning the global network of HOT volunteers can map affected areas efficiently, providing disaster responders on the ground, such as MapAction, MSF and the Red Cross, with access to detailed mapping during the response.

Following a significant increase in the capture of OSM data through initiatives such as Missing Maps, and the subsequent load on existing servers and software, the development of TM3 aims to better meet the needs of mappers, validators and project managers. This will be achieved by taking advantage of the very latest advances and innovations in web development frameworks and methodologies.

Blake Girardot, TM3 Project Manager, said…

“We are very excited to be working with thinkWhere to develop the next generation of HOT’s Tasking Manager application. The team at thinkWhere brings a wealth of geospatial development experience, talent and insight to the project. Used by thousands of people around the world, our Tasking Manager software is the key technology component that enables our humanitarian mapping work and having thinkWhere as partners ensures the development project will be a success and deliver a great result for our community”.

In early February 2017, we travelled to HOT’s office in Washington DC for the project initiation meeting with various project stakeholders to discuss initial requirements and wireframe designs.

This engagement with some of the largest users of the Tasking Manager to discuss functionality and solicit feedback on how features might be implemented was a great way to start the project. A long but very productive day, the discussion involved representatives from Missing Maps, YouthMappers, TeachOSM, GeoCenter, the US State Department’s Humanitarian Information Unit, Mapbox, HOT Project Managers, the American Red Cross and Ethan Nelson, the lead community development volunteer.

thinkWhere’s Chief Executive Alan Moore said…

“We’re really delighted and privileged to be working with the team at HOT. Redevelopment of the Tasking Manager will be key to the future growth and sustainability of the humanitarian mapping effort across the world. The development work flows naturally from the innovative work we’ve been doing recently on theMapCloud, our new spatial data platform, and we’re keen to bring those advantages to the benefit of HOT and the global mapping community”.

Since arriving back in Scotland from the States the team have been hard at work developing TM3. You can follow development progress of TM3 on the Tasking Manager GitHub repository and can also view the TM3 Staging Site.

Everyone is invited to participate in the process. Comments, questions and issues are needed and encouraged on the GitHub repository’s Issues tab, so please get involved!

Counting Points and the Process Modeller in QGIS

Here at thinkWhere we’ve recently been working with the automation tools in QGIS. The processing toolbox is a much underrated feature of QGIS; lacking the jazzy graphics of the ArcGIS equivalent, it is, however, just as useful and functional for multi-stage GIS analysis.

We recently had occasion to create a tool for a client that counted the number of features within a polygon and provided statistics for subsets of those features. Let’s try this again for the sake of demonstration, and talk about trees.


Here we have a series of suburbs arranged as polygons and trees as points. As you can see there are 5 types of tree in the dataset. A Spatialite or PostGIS database would make quick work of this, but you can also use the QGIS processing toolbox to iterate through this and count each aspect of the data.

Effectively, from the point dataset we need to extract a subset for each attribute value:

Extract by Attribute from Trees “Type = Beech” etc.


This effectively creates a subset of the input points dataset, which can then be counted within the polygons.

We can then use the Count Points in Polygon algorithm to count the number of points that sit inside each of the neighbourhoods.

Count Points in Polygon From Beech Subset of Input Points by Input Polygon, Create field “Beech”


Using standard QGIS this would create a new dataset each time you run an algorithm, but the great strength of the processing modeller is that these intermediate steps can run in the background as temporary layers, rather than confusing your users with multiple layers.

Of course, what needs to happen now is that the next iteration (Birch) needs to be counted using the polygon dataset created by the Beech count. If you think about it, the geography of that dataset is the same as the Input Polygon (the neighbourhoods), and it also carries forward all the attributes for the layer. So for each iteration you are not using the original input polygons, but adding the new attribute information on with each count.

In the processing modeller this looks like this:


When you run the model, you get a polygon shapefile with a new set of attributes showing the number of each of the tree types in the area. You can then generate graphs or use the data for other purposes.
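The same chain can also be scripted. Here’s a sketch for the QGIS Python console (the algorithm IDs are those used in QGIS 3.x, and the file names and tree-type values are illustrative):

from qgis import processing

# Start from the neighbourhood polygons, then feed the output of each
# count back in as the input polygons for the next tree type
polygons = 'neighbourhoods.shp'
for tree_type in ['Beech', 'Birch', 'Oak', 'Rowan', 'Willow']:
    # Extract the subset of points for this tree type
    subset = processing.run('native:extractbyattribute', {
        'INPUT': 'trees.shp',
        'FIELD': 'Type',
        'OPERATOR': 0,  # 0 means '='
        'VALUE': tree_type,
        'OUTPUT': 'memory:'})['OUTPUT']
    # Count the subset inside each polygon, storing the count in a
    # new field named after the tree type
    polygons = processing.run('native:countpointsinpolygon', {
        'POLYGONS': polygons,
        'POINTS': subset,
        'FIELD': tree_type,
        'OUTPUT': 'memory:'})['OUTPUT']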


This is only one of the many ways that the automation tools in QGIS can help you. It is also a really useful system for analysis. QGIS is hugely flexible and has a load of options for this. To learn more and see details of our QGIS training offerings please see our training page.

Automated Testing with BrowserStack & Selenium

Ensuring high quality software requires a lot of testing. Whether it’s code peer-review, unit testing, integration testing, system testing or exploratory user testing – it all has to be done! Therefore we look to automate our testing where possible. This blog post explores the automation of browser compatibility testing in BrowserStack using Selenium WebDriver.

BrowserStack is a cloud-based platform for testing applications in various Operating Systems and browsers. It is used for both Continuous Integration (CI) and cross-browser testing. Selenium WebDriver is a tool for driving browsers to replicate and automate user journeys and assert presence and functionality of page elements.

We aim to integrate automated testing into our Continuous Integration pipeline; however, a number of our more established products are not yet part of this pipeline. One such product is Location Centre, which is used by many local authorities where the end-user often has limited control over the choice of OS and browser. This often restricts development to suit the majority of browsers used and their degree of modernity. Despite not being part of our CI pipeline, testing this software to ensure high quality is just as important. Therefore I’ll now provide an example of how simple and straightforward it is to automate tests for Location Centre using BrowserStack.

BrowserStack offers a 100-minute trial so I recommend you also try it out! The site provides a range of multi-system environments for manual and automated testing, as shown in the image below.

BrowserStack runs the tests using Selenium WebDriver and these code examples serve as a great starting point. The tests can be scripted manually or by using the Selenium IDE plugin for Firefox – the extension allows you to record the user steps with minimal coding, thus minimising hand-written code and saving you time. It provides hundreds of commands for element assertion, mouse position and many more, and the tool decides by itself how to locate each element. The plugin is user-friendly and well-supported, and the process of recording is quick and intuitive.

I have recorded the simple process of logging in to Location Centre, selecting the ‘Mapping’ tab and then zooming and panning the map.


Once created, test cases can be exported in your language of choice – in our case it’s Python.

(File > Export Test As > Python 2 / unittest / WebDriver)

We can add the exported test case to our IDE. All we need to do now is modify the setup function to add the BrowserStack username and access key, and specify the browser and OS requirements.

We find it’s always worth starting the tests with the high-risk browsers – traditionally Internet Explorer, which is used by the majority of Location Centre customers. Gov.UK recommends testing from IE8 and up, so we will run the test in IE8.

IE8 runs on Windows 7 and Windows XP, making it difficult to test locally. You could get IE8 from a Microsoft virtual machine, but BrowserStack makes it much easier. Using the BrowserStack tool we can get the capabilities and add them to the setUp function:


System capabilities and the authorisation information can be input into the setup:

import unittest

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, NoAlertPresentException


class WebDriver(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.accept_next_alert = True
        url = 'http://USERNAME:PASSCODE@hub.browserstack.com:80/wd/hub'
        self.driver = webdriver.Remote(command_executor=url, desired_capabilities={'browser': 'IE', 'browser_version': '8.0', 'os': 'Windows', 'os_version': '7', 'resolution': '1024x768'})

    def test_leics_l_c(self):
        driver = self.driver
        driver.get("https://lctrial.locationcentre.co.uk/")
        driver.find_element_by_id("ContentPlaceHolder1_loginControl_Password").clear()
        driver.find_element_by_id("ContentPlaceHolder1_loginControl_Password").send_keys("password")
        driver.find_element_by_id("ContentPlaceHolder1_loginControl_UserName").clear()
        driver.find_element_by_id("ContentPlaceHolder1_loginControl_UserName").send_keys("username")
        driver.find_element_by_id("ContentPlaceHolder1_loginControl_LoginButton").click()
        driver.find_element_by_id("Header1_lnkMap").click()
        driver.find_element_by_id("map-size-toggle").click()
        driver.find_element_by_id("map-size-toggle").click()
        driver.find_element_by_id("OpenLayers.Control.PanZoomBar_17_zoomin_innerImage").click()
        driver.find_element_by_id("OpenLayers.Control.PanZoomBar_17_zoomin_innerImage").click()
        driver.find_element_by_xpath("//div[@id='toolsPanel']/div").click()

    def is_element_present(self, how, what):
        try:
            self.driver.find_element(by=how, value=what)
        except NoSuchElementException as e:
            return False
        return True

    def is_alert_present(self):
        try:
            self.driver.switch_to_alert()
        except NoAlertPresentException as e:
            return False
        return True

    def close_alert_and_get_its_text(self):
        try:
            alert = self.driver.switch_to_alert()
            alert_text = alert.text
            if self.accept_next_alert:
                alert.accept()
            else:
                alert.dismiss()
            return alert_text
        finally:
            self.accept_next_alert = True

    def tearDown(self):
        self.driver.quit()
        self.assertEqual([], self.verificationErrors)


if __name__ == "__main__":
    unittest.main()

You can now run the automated test with BrowserStack as shown in the GIF below.


The test can be played back and each step can be reviewed in turn. BrowserStack will provide you with a screenshot for any failed step (I have broken the one below on purpose!).


This service offers many useful features, including localhost testing, the ability to upload files and many more. There is little or no coding required – although you may want to change an element’s location or add some custom assertions, cursor movements or text inputs. Furthermore, there are fewer of the compatibility issues between the Selenium methods and the webdrivers that one experiences when testing locally.

You can utilise the vast Selenium library for replicating user journeys in the required system at the desired speed. BrowserStack also integrates Cucumber and Behave, which are used in Behaviour Driven Development (BDD) and testing. However, at the time of writing this post, there were issues running those on Windows platforms. Overall, it is a great framework for compatibility testing.

Alongside the automation, exploratory tests still need to be in place to avoid kaleidoscopic bugs like this one found in our product mapTrunk… good job we caught this before Go-Live!
