Restoring a Postgres database to AWS RDS using Docker

In this post I look at using Docker to restore a Postgres dump file to a Postgres database running in the cloud on AWS RDS.

Keep it clean

One of the big selling points of Docker, for me, is that I can have lots of apps and utils running in nice containers on my dev laptop, without having to install them locally.  This keeps my laptop responsive and means I don’t clutter or break it with lots of weird dependencies and running processes that I’m then too scared to delete.

Postgres is a good example – I don’t want to install it locally, but I do need access to the command line tools like psql and pg_restore, to be able to work with my databases effectively.

One way of accessing these tools would be to ssh onto the AWS cloud instances, but there are a bunch of reasons, most pertinently security (not to mention the faff), why you’d want to avoid doing that every time you need to run some SQL.  So let’s look at how we can use Docker to ease the pain instead.

Start Me Up

With Docker installed you can build a simple Dockerfile to create a local Postgres container.  The POSTGRES_USER and POSTGRES_PASSWORD env vars aren’t strictly required; however, if you want to actually connect to the containerised DB, they’re pretty handy.
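A minimal sketch of such a Dockerfile, based on the official postgres image (the user and password values are placeholders):

# Sketch only: builds a local Postgres image with known credentials
FROM postgres

ENV POSTGRES_USER dev_user
ENV POSTGRES_PASSWORD dev_password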

You can build, run and connect to the container as follows (this assumes you are on a Mac).
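A hedged sketch of those commands (the image and directory names are placeholders; the container ID comes from docker ps):

# create a local directory to hold dump files
mkdir data-load

# build the image from the Dockerfile above
docker build -t local-postgres .

# run the container, mapping data-load into the container as /data-loader
docker run -d -p 5432:5432 --name local-postgres -v $(pwd)/data-load:/data-loader local-postgres

# find the running container's ID, then open a shell inside it
docker ps
docker exec -it <containerID> bash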

Note the -v option on the docker run command, which maps the data-load dir created in the first step to a directory called data-loader inside the container.  This means that when I copy the Postgres dump file into my local data-load directory, it is available to the Postgres tools inside the container.

The docker exec command lets me connect to the container – swap in your locally running containerID (listed by docker ps).

Restoring your database with pg_restore

I’ll assume you already have a Postgres database set up within the AWS cloud.  Now that we have connected to our container, we can use pg_restore to restore our dump file into AWS (note that this command will prompt you for the admin password).
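A sketch of the pg_restore call, assuming a custom-format dump file – the host, user, database and file names are placeholders to swap for your own RDS details:

pg_restore --verbose --no-owner -W \
    -h mydbinstance.abcdefgh1234.eu-west-1.rds.amazonaws.com \
    -U admin_user \
    -d target_db \
    /data-loader/my_database.dump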

A note on schemas

If you’re doing a partial restore, you may want to restore your dump file to a separate schema.  Unfortunately there appears to be no way to do this from the command line.  What you have to do instead is rename the public schema, create a new public schema and restore into that, then reverse the process.

This StackOverflow answer outlines the process.
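In SQL, that dance looks roughly like this (a sketch – restore_schema is a placeholder name for wherever you want the restored data to end up):

-- move the existing public schema out of the way
ALTER SCHEMA public RENAME TO public_original;
CREATE SCHEMA public;

-- ...run pg_restore as above; it restores into the new, empty public schema...

-- then swap things back
ALTER SCHEMA public RENAME TO restore_schema;
ALTER SCHEMA public_original RENAME TO public;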

Restore Complete

You should now have a complete restore of your dumpfile in the cloud.  Please add comments if anything is unclear.

HOT Tasking Manager 3.0 Development Underway at thinkWhere

We get to work on some great projects here at thinkWhere, but we’re particularly proud of the project we’ve just started with the Humanitarian OpenStreetMap Team (HOT) to develop version 3.0 of the Tasking Manager (TM3). It’s great to work in an industry where the work we do with maps can have such a tangible impact on the humanitarian effort. thinkWhere supports this wherever we can: we support MapAction, both through my own personal volunteering (which you can read more about here) and through fundraising efforts such as running the Stirling Marathon, which is open for sponsorship here, so please donate if you can! Being given the opportunity to also get involved with the HOT community and deliver TM3 is therefore something we’re extremely proud of.

The current HOT Tasking Manager (TM2) coordinates volunteer mapper contributions to OpenStreetMap, meaning the global network of HOT volunteers can map affected areas efficiently, giving disaster responders on the ground, such as MapAction, MSF and the Red Cross, access to detailed mapping during the response.

Following a significant increase in the capture of OSM data through initiatives such as Missing Maps, and subsequent loads on existing servers and software, the development of TM3 aims to better meet the needs of mappers, validators and project managers.  This will be achieved by taking advantage of the very latest advances and innovations in web development frameworks and methodologies.

Blake Girardot, TM3 Project Manager said…

“We are very excited to be working with thinkWhere to develop the next generation of HOT’s Tasking Manager application. The team at thinkWhere brings a wealth of geospatial development experience, talent and insight to the project. Used by thousands of people around the world, our Tasking Manager software is the key technology component that enables our humanitarian mapping work and having thinkWhere as partners ensures the development project will be a success and deliver a great result for our community”.

In early February 2017, we travelled to HOT’s office in Washington DC for the project initiation meeting with various project stakeholders to discuss initial requirements and wireframe designs.

Engaging with some of the largest users of the Tasking Manager to discuss functionality and solicit feedback on how features might be implemented was a great way to start the project.  It was a long but very productive day, with discussion involving representatives from Missing Maps, YouthMappers, TeachOSM, GeoCenter, the US State Department’s Humanitarian Information Unit, Mapbox, HOT Project Managers, the American Red Cross and Ethan Nelson, the lead community development volunteer.

thinkWhere’s Chief Executive Alan Moore said…

“We’re really delighted and privileged to be working with the team at HOT. Redevelopment of the Tasking Manager will be key to the future growth and sustainability of the humanitarian mapping effort across the world. The development work flows naturally from the innovative work we’ve been doing recently on theMapCloud, our new spatial data platform, and we’re keen to bring those advantages to the benefit of HOT and the global mapping community”.

Since arriving back in Scotland from the States, the team have been hard at work developing TM3. You can follow development progress of TM3 on the Tasking Manager GitHub repository and can also view the TM3 Staging Site.

Everyone is invited to participate in the process. Comments, questions and issues are needed and encouraged on the GitHub repository’s Issues tab, so please get involved!

AngularJS and OpenLayers: creating a scale bar module

Intro

We’ve recently released a new product called mapTrunk. The app is built using the open source libraries AngularJS and OpenLayers 3 (among many others!). As part of our development efforts we looked into creating reusable modules. This blog post offers a high level introduction to AngularJS and OpenLayers 3 and shows how they can work together to create a reusable map scale bar module example.


AngularJS and OpenLayers 3

AngularJS is an open source JavaScript framework for creating web apps. It provides tools for making apps modular. AngularJS handles data binding, which means the view (HTML) automatically updates when the model (JavaScript) updates. Other benefits of AngularJS include form validation in the browser, the ease of wiring an app up to a backend, and the testability of the code. AngularJS also lets you extend the syntax of HTML and inject components into your HTML. This feature comes in handy when creating the scale bar module.

OpenLayers 3 is an open source mapping library. It provides tools for adding dynamic maps to an app. Commonly used mapping controls provided by OpenLayers include a zoom in/out control, a mouse position control and a scale bar control.

The following example shows how to create a basic map with OpenLayers and AngularJS. The result is a map and a button to recenter the map. It also shows the user how many times they have centred the map.

HTML

First we need to include the AngularJS and OpenLayers 3 libraries, add a div for the map and add a button. We also need to reference the Angular app called “app” (via the ng-app attribute), which is created in JavaScript.

<html ng-app="app">
<head>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/ol3/3.18.2/ol.css"/>
    <a href="https://code.angularjs.org/1.4.12/angular.js">https://code.angularjs.org/1.4.12/angular.js</a>
    <a href="https://cdnjs.cloudflare.com/ajax/libs/ol3/3.18.2/ol.js">https://cdnjs.cloudflare.com/ajax/libs/ol3/3.18.2/ol.js</a>
    <a href="http://main.controller.js">http://main.controller.js</a>
</head>
<body ng-controller="mainController as main">
    <div id="map"></div>
    <button ng-click="main.centerMap()">Button</button>
    <div>You have centered the map on coordinate [0,0] {{main.counter}} times.</div>
</body> </html>

JavaScript

Here we create our own Angular controller, mainController, which initialises the map and contains the function that is called when the button is clicked, updating the counter.

var app = angular.module('app', []);

(function () {

    'use strict';

    /**
     * The main Angular controller which initialises the mapping
     */
    angular
        .module('app')
        .controller('mainController', [mainController]);

    function mainController() {
        var vm = this;
        vm.counter = 0;
        vm.map = new ol.Map({
            layers: [
                new ol.layer.Tile({
                    source: new ol.source.OSM()
                })
            ],
            target: 'map',
            view: new ol.View({
                center: [0, 0],
                zoom: 2
            })
        });
        vm.centerMap = function () {
            vm.map.getView().setCenter([0, 0]);
            vm.counter++;
        };
    }
})();

Creating the scale bar module

The OpenLayers library already has a scale bar module called ‘scale line’ built-in. An example can be found here. One of the requirements for mapTrunk was to create a scale line that can display distances in two units at the same time, metric and imperial.

To create a reusable module we can create a custom Angular directive. Angular directives basically let us create our own HTML syntax and inject components by using that syntax. This makes the HTML code easier to read and hides the complexity of the component. In this blog we’re not going to go into the details of Angular directives, so please see AngularJS’s documentation on directives for a full explanation.

First we need to create the Angular directive and decide what the HTML syntax is going to be. In the code snippet below we called the directive scaleLineControl, which translates into the HTML tag <scale-line-control>. The directive needs access to an OpenLayers map object to be able to add a scale line control to the map; the map object can be passed into the directive by adding it as an attribute, map="main.map". The OpenLayers scale line control also needs an HTML target ID – in the snippet below this is the ‘scale-line-container’ element.

The scale line control is added to the OpenLayers map object using the addControl function, with its units specified as metric. To create a scale line module which also shows imperial measurements, a second scale line control is added to the map with imperial units. OpenLayers takes care of listening for changes on the map and updates the controls accordingly. Now we should see two scale lines on the map, but they are positioned on top of each other, so we need some CSS to fix this.

(function () {

'use strict';

/**
 * @fileoverview This file provides a scaleLineDirective.
 * It adds a scaleLine showing both metric and imperial units.
 * CSS is needed to display the elements nicely.
 *
 * Example usage:
 *
 * <scale-line-control map="main.map"></scale-line-control>
 */

    angular
        .module('app')
        .directive('scaleLineControl', scaleLineDirective);

    function scaleLineDirective() {
        return {
            restrict: 'E',
            link: function(scope, element, attributes) {
                var attr = 'map';
                var prop = attributes[attr];
                var map = scope.$eval(prop);
 
                var scaleLineControl = new ol.control.ScaleLine({
                    target: 'scale-line-container',
                    className: 'scale-line-top',
                    minWidth: 100,
                    units: 'metric'
                });
                map.addControl(scaleLineControl);

                var scaleLineControl2 = new ol.control.ScaleLine({
                    target: 'scale-line-container',
                    className: 'scale-line-bottom',
                    minWidth: 100,
                    units: 'imperial'
                });
                map.addControl(scaleLineControl2);
            }
        };
    }
})();

Making it look good!

We can specify the CSS class names when creating the OpenLayers scale line controls. By doing so we can customise the default look of the scale line controls. Here we have added the class ‘scale-line-top’ to the metric control and ‘scale-line-bottom’ to the imperial control.

#map {
    width: auto;
    height: 100%;
    position: relative;
    overflow: hidden;
}

#scale-line-container {
    border-radius: 2px;
    background: white none repeat scroll 0 0;
    bottom: 8px;
    left: 8px;
    font-size: 10px;
    position: absolute;
    z-index: 1000;
    padding: 5px;
    text-align: center;
}

.scale-line-top-inner {
    border-style: none solid solid solid;
    border-width: medium 2px 2px 2px;
}

.scale-line-bottom-inner {
    border-style: solid solid none;
    border-width: 2px 2px medium;
    margin-top: -2px;
}

Result

The result is an Angular directive which can be injected into HTML and easily be used in other applications.
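In the page markup, using the directive might look like this (a sketch – it assumes the scale-line-container element that both controls target, and the main.map object exposed by the controller above):

<div id="map"></div>
<div id="scale-line-container"></div>
<scale-line-control map="main.map"></scale-line-control>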


For a full working example, see this Plnkr.

Infinite QGIS plugin repositories in the cloud

[…we found it useful to make each version of the code available for installation and testing via their own QGIS repository…]

Here at thinkWhere we’ve recently released roadNet, a tool for managing the spatial database of road layouts, roadworks and roadside assets that local authorities use to create local street gazetteers and to plan maintenance and closures.  roadNet runs as a plugin on top of the excellent open source GIS package, QGIS.

During the build of roadNet, we found it useful to make each version of the code available for installation and testing via its own QGIS repository.  This post explains how it works.

roadNet manages the spatial and non-spatial data required to produce a BS7666-compliant local street gazetteer.  It features automatic line-splitting and updates to multiple database tables in response to user edits, as well as data validation and export in standard gazetteer data transfer formats, e.g. SDTF.

git, GitHub, Shippable, Docker and Amazon S3

The roadNet continuous integration (CI) system makes use of a number of cloud-based services.  We use git and GitHub for version control, allowing developers to track changes to the code and use separate branches to develop new features.  GitHub is linked to Shippable, which watches for new commits to the code base.  This is similar to other CI systems such as Travis.  When new code is committed, Shippable spins into action.


Shippable is used to check the code and to create the cloud repositories.  The instructions to do this are stored in the shippable.yml file.  It does this inside a Docker container, which has QGIS and all its dependencies already installed and configured.

This stage was a bit tricky to configure because QGIS, as a desktop GIS application, assumes that it is running on a desktop computer with an attached display to which it can send windows and message dialogs, when in fact it is running on a little bit of memory and a little bit of hard disk on a big computer in a warehouse somewhere, i.e. the Amazon cloud.  The Dockerfile contains the instructions to set up a fake display (or X virtual frame buffer) in the container.
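The display setup in the Dockerfile looks roughly like this (a sketch – the test command and package handling are illustrative, not the project’s actual values):

# install the X virtual frame buffer so QGIS has a 'display' to talk to
RUN apt-get update && apt-get install -y xvfb

# at test time, wrap the test runner so it runs against the virtual display, e.g.:
#   xvfb-run -a python -m unittest discover tests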

Once the code has been tested, a Python script pushes it out to the repository.

QGIS plugin repositories in the cloud

A QGIS plugin repository is just a web-facing folder that contains a plugins.xml file describing the available plugins, plus a series of zip files containing the code.  Amazon Web Services includes the S3 service, which provides ‘buckets’ for storing files.  These can be configured to be accessible from the web, making them ideal for hosting repositories.
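For reference, a minimal plugins.xml looks roughly like this (a sketch – the plugin name is real, but the version numbers and download URL are placeholders):

<?xml version="1.0"?>
<plugins>
  <pyqgis_plugin name="roadNet" version="1.0.0">
    <description>Road network and gazetteer management tools for QGIS</description>
    <version>1.0.0</version>
    <qgis_minimum_version>2.14</qgis_minimum_version>
    <download_url>https://example-bucket.s3.amazonaws.com/latest_master/roadnet.zip</download_url>
    <file_name>roadnet.zip</file_name>
  </pyqgis_plugin>
</plugins>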

The deploy.py script contains the instructions to zip up the files, prepare plugins.xml and copy the files to S3.  The core of the key function is below:

def update_repos(build_name):
    """
    Create new repo for build, and update latest master/dev repos.
    :param build_name:
    """
    deploy_to_s3(build_name)
    deploy_to_s3('latest_push')

    branch = os.getenv('BRANCH')
    if branch is None:
        return
    elif branch == 'develop':
        deploy_to_s3('latest_develop')
    elif branch == 'master':
        deploy_to_s3('latest_master')

It is so easy to create repositories that we just make lots of them.  Every set of changes gets a build_name.  The first ‘deploy_to_s3’ line creates a repository specifically for that build.  These are all stored indefinitely.  This means that just by connecting to the specific repository, we can run the code as it was at any stage during the development.

The other ‘deploy_to_s3’ lines provide convenience repositories.  These get a copy of the code that was just pushed, meaning developers can connect to the latest_push and see their most recent changes.  thinkWhere’s testers can connect to latest_develop and try out new features as soon as they are merged into the develop branch.  Clients point their QGIS installs at the latest_master branch to ensure that they keep up with the latest stable releases.
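For illustration, the deploy_to_s3 helper might look something like this (a hedged sketch using boto3 – the real script’s file handling and arguments will differ; the bucket name is taken from the repository URL later in this post and the zip file name is an assumption):

import boto3

def deploy_to_s3(repo_name):
    """Upload plugins.xml and the zipped plugin into the named repo folder."""
    s3 = boto3.client('s3')
    bucket = 'roadnet-builds'  # assumption: the bucket behind the repository URL below
    for file_name in ('plugins.xml', 'roadnet.zip'):  # roadnet.zip is a placeholder name
        key = '{}/{}'.format(repo_name, file_name)
        s3.upload_file(file_name, bucket, key)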

Anyone can update to the latest version in their branch with a single click in the QGIS plugin installer.

Conclusion

We have found the automatic deployment of QGIS plugins to be immensely useful, facilitating both rapid development / testing feedback loops, and easy delivery of bug fixes and upgrades to the master branch. Check out roadNet today from the official QGIS plugin repository:

https://plugins.qgis.org/plugins/Roadnet/

If you think this may be useful to you, please also have a look at the code – it’s all released under GPL v2.

Further reading

You can install the latest stable build of roadNet into your local QGIS by adding our latest_master repository to your installation (Plugins > Manage and Install Plugins > Settings):

http://roadnet-builds.s3-website-eu-west-1.amazonaws.com/latest_master/plugins.xml?qgis=2.14

Multi-Container Deployment with Docker Compose


Why Docker?

At thinkWhere we always aim to keep pace with the latest tech-industry trends, which is easier said than done in such a fast-paced sector! However, one unavoidable technology trend we’re now employing across our application deployment model is Docker. Docker is a software containerisation platform guaranteeing that software will always run the same, regardless of its environment. Docker offers many benefits over traditional application deployment, including:

Simplicity – Once an application is Dockerized, you have full control (start, stop, restart, etc) with only half a dozen commands. As these are generic Docker commands, it is easy for anyone unfamiliar with the specifics of an application to get started.

It’s already Dockerized – Docker Hub is the central marketplace for Docker images to be shared with other Docker users. Often you find that official Docker images for an application already exist, or you can find another user’s efforts to build upon. Docker images we have used include MapFish Print and Nginx. We have also containerised our own flavours of MapProxy and GeoServer.

Blueprint of application configuration – A Dockerfile provides the blueprint, or instructions, to build an application. This can be stored in source control and refined over time to improve the build. It also removes any ambiguity about build/configuration differences between various deployments.


Rapid to deploy – Having this blueprint for each application means that all we need is a server or virtual machine with Docker engine installed. This has drastically reduced the time spent deploying and configuring our applications.

Plays nicely with continuous integration – Amazon Web Services offers services dedicated to deploying and managing containerised applications. We recently constructed a Shippable Pipeline which builds and pushes new images to the Docker Hub repository as changes are merged into a code base. These new images are pulled down by the Amazon Elastic Container Service and deployed seamlessly. The ‘throw away’ nature of Docker lends itself to scalability, so services such as this can be scaled up or down with just a few clicks of the mouse.

Why Compose?

These days it’s rare to deploy applications which exist in a completely standalone context, without the need to communicate with at least one other application or service. If all these applications happen to be Dockerized, then Docker Compose is a great tool for creating a multi-container application. In a nutshell, Compose lets us build, start and stop a cluster of related Docker containers.

The Compose File

In addition to the existing Dockerfile for each image, a single docker-compose.yaml file must be created for a Compose project. It’s here we define how the various containers will behave and communicate with each other.

This example compose file is from a Flask-restful application I wrote to serve GB Postcode GeoJSON from MongoDB. You can see it working and try your own postcode here.

This Compose configuration comprises three containers – a Python web application which talks to a Mongo database, all sitting behind an Nginx web server.

web:
  restart: always
  build: ./web
  expose:
    - "8000"
  volumes:
    - /usr/src/app/static
  env_file: .env
  links:
    - db
  command: /usr/local/bin/gunicorn -w 2 -b :8000 app:app

nginx:
  restart: always
  build: ./nginx/
  ports:
    - "80:80"
  volumes:
    - /www/static
  volumes_from:
    - web
  links:
    - web:web

db:
  restart: always
  build: ./mongo/
  ports:
    - "27017:27017"
  volumes:
    - /home/matt/postcode_search/mongo:/data/db

This Compose file is largely self-explanatory: as you can see, the three container configurations (web, db, nginx) are defined separately.

Some of the entries in the example file above will be familiar to Docker users, such as mapping a volume or forwarding ports to the host. The Compose file is best used for setting some of these parameters which may have previously been configured when starting a single container with the ‘docker run’ command. This is because a single command is used to start all the containers within a Compose cluster, rather than starting them individually.

In order to allow communication between containers a ‘links’ entry is used. You can see that the ‘web’ container will be linked to the ‘db’ container. The ‘links’ entry is used in the same way as the ‘–link’ option for ‘docker run’. When the Compose cluster is started the links entries are used to determine the startup order of the containers.

Another unfamiliar entry may be ‘volumes_from’. As you may have guessed this simply mounts all the volumes from another container. In this case the Nginx container needs visibility of the static files from the Python application.

So to bring up the application we simply use the ‘docker-compose up’ command. With this single command we can build (if required) and start our three containers. Easy!
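Assuming the Compose file above sits in the project root, the whole workflow is just a couple of commands (a sketch; the -d flag simply runs the cluster in the background):

# build (or rebuild) the images defined in the Compose file
docker-compose build

# start all three containers, linked together, in the background
docker-compose up -d

# tear the whole cluster down again
docker-compose down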
