
Visualising GoCardless' UK Growth

The above animation is a time lapse of customers using GoCardless for the first time. It covers the last 3 years and only maps the UK for now, with each red dot representing a new customer joining GoCardless, then becoming a blue dot for the remainder of the clip. It works even better if you view it full-screen.

This started as a fun side project. We've been seeing a lot of growth recently, and as we move into Europe and launch our new API, I wanted to take a moment to look back on how far we've come. The result turned out to be pretty interesting, and the post below explains how I generated it.

Generating the Data Set

Using street addresses would have been a little messy, but luckily UK addresses include a postal code. There are around 1.8 million postcodes covering roughly 29 million potential addresses, which provides a good level of granularity across the whole of the UK. Sadly, this meant that the rest of Europe wasn't plottable this time around - a challenge for another day.

Data from GoCardless
+---------------------+-------------+
|     created_at      | postal_code |
+---------------------+-------------+
| 2011-09-21 22:15:44 | EC1V 1LQ    |
| 2011-09-27 12:42:17 | TA9 3FJ     |
+---------------------+-------------+

Unfortunately, postal codes also aren't distributed throughout the country in a neat grid system. There was no easy way to translate a postcode to a location on screen. What I really needed was something uniform and regular - latitudes and longitudes.

A quick search revealed several online services which provide the latitude and longitude for a given postcode. However, with 1.8m postcodes to potentially sift through, and rate limits on a lot of these services, this wasn't quite going to cut it.

As is often the case, I'm not the first person to come across this problem: after some more searching I discovered freemaptools.com, who have compiled a database of all 1.8m+ UK postcodes together with their latitudes and longitudes!

Data from the postcode data dump
+---------+----------+--------------------+--------------------+
|   id    | postcode |      latitude      |     longitude      |
+---------+----------+--------------------+--------------------+
| 1237144 | EC1V1LQ  | 51.531058677023300 | -.100484683590648  |
|  210341 | TA93FJ   | 51.229186607412900 | -2.976539258481700 |
+---------+----------+--------------------+--------------------+

After importing all of this into a SQL database, a few queries later I had the data I needed.
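
The heart of it is a join on the normalised postcode - the GoCardless data stores a space ("EC1V 1LQ") while the dump doesn't ("EC1V1LQ"), so one side needs the space stripped. Something along these lines, with illustrative table and column names:

-- Join customers to the postcode dump on the space-stripped postcode
SELECT c.created_at AS timestamp,
       p.latitude,
       p.longitude
FROM   customers c
JOIN   postcodes p ON REPLACE(c.postal_code, ' ', '') = p.postcode;

This produces rows like: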

+---------------------+--------------------+--------------------+
|      timestamp      |      latitude      |     longitude      |
+---------------------+--------------------+--------------------+
| 2011-09-21 22:15:44 | 51.531058677023300 | -.100484683590648  |
| 2011-09-27 12:42:17 | 51.229186607412900 | -2.976539258481700 |
+---------------------+--------------------+--------------------+

Plotting Locations on a Map


R is a language I'd been looking for an excuse to experiment with for a while. It's free and open source, and after checking out some of the available packages like maps and mapdata, it quickly became apparent that plotting latitudes and longitudes shouldn't be too much hassle.

Sure enough, after a little playing around, it was possible to map all the customers represented by little blue dots.

Very exciting, but there was still room for improvement - the points seemed a bit large and very messy. In areas of high density (for example, London) the map was solid blue. It was awesome to see how many people had used us, but it didn't make for the best of visuals.

After toying with some point sizes and opacity values, things were looking much more interesting and the densities naturally emerged.

[Before and after: the original plot vs. the adjusted point sizes and opacity]

# draw.r
library(maps)
library(mapdata)

setwd("/Users/petehamilton/gocardless-maps")

# Set colours
gocardless_blue = rgb(0.31,0.57,0.85)
gocardless_blue_translucent = rgb(0.31,0.57,0.85,0.1)

# Read data
customer_data <- read.csv("data/r-customers-export.csv", header = TRUE)

# Set output file
png("output.png", height=3000, width=2000, pointsize = 80)

# Set the font after opening the device so it applies to the PNG output
par(family="Gotham-Book")

# Draw the UK outline, then plot each customer on top of it
map('worldHires', c('UK', 'Ireland', 'Isle of Man','Isle of Wight'), xlim=c(-11,3), ylim=c(49,60.9), fill=FALSE, col=gocardless_blue, mar=rep(1,4))
points(customer_data$lon,customer_data$lat, col=gocardless_blue_translucent, pch=20, cex=0.2)
title("GoCardless Customers", col.main=gocardless_blue)
dev.off(dev.cur())

# Execute with: R --slave -f draw.r

Where, Meet When

At this point I had plotted all the customers and made good use of the where portion of the data, but hadn't done anything with the when side of things.

Animating this kind of data is conceptually straightforward. For each day, split the customers who were added that day into chunks corresponding to the number of frames you want to animate per day. Render each chunk, output an image, and at the end stitch them all together - you've got yourself an animation!

To make the new customers stand out against those already plotted, I highlighted them in red. I also left out the UK outline until the very last frames, resulting in the emergent GoCardless UK outline you see above.
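
In R, that loop looks roughly like this. It's only a sketch - the frames-per-day figure and the timestamp column name are illustrative, it reuses the colour constants from draw.r above, and the real script linked below handles a few more details:

# animate.r - illustrative sketch of the per-day chunking
frames_per_day <- 4
customer_data$date <- as.Date(customer_data$timestamp)

frame <- 0
days <- sort(unique(customer_data$date))
for (d in seq_along(days)) {
  new_customers <- customer_data[customer_data$date == days[d], ]
  old_customers <- customer_data[customer_data$date <  days[d], ]

  # Split today's new customers into one chunk per frame
  chunk_size  <- max(1, ceiling(nrow(new_customers) / frames_per_day))
  chunk_index <- ceiling(seq_len(nrow(new_customers)) / chunk_size)

  for (i in seq_len(frames_per_day)) {
    frame <- frame + 1
    png(sprintf("output/frame_%06d.png", frame), height = 3000, width = 2000, pointsize = 80)

    # Blank canvas - the UK outline only gets drawn in the final frames
    plot(c(-11, 3), c(49, 60.9), type = "n", axes = FALSE, xlab = "", ylab = "")

    # Everyone plotted so far in blue, this frame's newcomers in red
    points(c(old_customers$lon, new_customers$lon[chunk_index < i]),
           c(old_customers$lat, new_customers$lat[chunk_index < i]),
           col = gocardless_blue_translucent, pch = 20, cex = 0.2)
    points(new_customers$lon[chunk_index == i], new_customers$lat[chunk_index == i],
           col = "red", pch = 20, cex = 0.2)

    dev.off()
  }
}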

Speeding Up R

The first rendering of these frames took hours, and the concept of "rapid iteration" went squarely out the window. That doesn't work when you're fiddling with point sizes and opacities - there had to be a better way.

After some digging, it transpired that R was only using one of my CPU cores. R does have support for parallel operations, but in order to parallelise loops I'd need the doParallel and foreach packages.

After altering the code to use these packages I was generating one frame on each core, which in my case resulted in a 4x speed-up.
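
The change itself is small. A minimal sketch of the pattern - render_frame and total_frames are stand-ins here, not functions from the script above:

library(foreach)
library(doParallel)

# Use one worker per CPU core
registerDoParallel(cores = detectCores())

# Each iteration writes its own PNG, so frames can be rendered
# completely independently on separate cores
foreach(f = seq_len(total_frames), .packages = c("maps", "mapdata")) %dopar% {
  render_frame(f)
}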

You can see the final code here. It's my first foray into R, so there are doubtless improvements to be made - I'd love to hear from you at pete@gocardless.com.

Stitching it all together

The final step is to stitch all the frames together. Since they're named using a %06d format, on OS X we can use ffmpeg to join them:

ffmpeg -f image2 -r 60 -i output/frame_%06d.png customer-growth.mpg

If you're on Linux (using AWS, for example), you can do the same with avconv in a similar way.

We're gonna need a bigger boat...

At this point you may have noticed the intense heat and noise of your laptop gradually melting in a fiery blaze of screaming CPUs. Thousands of frames and video rendering don't generally agree with your average laptop. To get around this, I suggest renting time on an AWS spot instance or a DigitalOcean droplet. You can get some seriously beefy machines with plenty of CPUs and RAM, then just scp the results back down once it's done.

Next Steps

I'm thinking of doing some more visualisations in future, and there are doubtless other areas of the business I could explore - if you have any ideas, let me know. And if this seems like the kind of thing you'd love to do, we have plenty of other interesting challenges: we're hiring and would love to hear from you.


5 ways to do good with GoCardless

At GoCardless, as well as helping businesses take payments, we're proud to support many non-profit organisations. Making the world a better place has always been a core part of what motivates us as a team, so we're always delighted to find people using GoCardless for particularly admirable causes. If one of your New Year's resolutions is to give more to charities and non-profits, have a look at some of these awesome GoCardless users.

1. Help Kids' Company

Help Kids' Company makes the Christmas period less harsh for kids from less privileged backgrounds. This year they've extended their annual campaign to help meet children's basic needs this winter, with warm clothing, hot meals and food vouchers. I'm sure everyone can agree that this is an awesome cause to be supporting! You can get involved on their Crowdfunder page.

Marianna from Help Kids' Company said this: "Kids Company provides practical, emotional and educational support to vulnerable inner-city children and young people in London, Bristol and Liverpool. We offer a safe, caring, family environment where support is tailored to the needs of each individual. We have extended our fundraising target to help us provide our children with basic necessities such as food and heating throughout winter, the harshest time of year. Thank you so much everyone!"

2. Meningitis Now

Meningitis Now's vision is "a future where no one in the UK loses their life to meningitis and everyone affected gets the support they need to rebuild their lives". They provide valuable support and counselling services to those who have lost relatives to meningitis, or whose lives have been adversely affected by it (meningitis can leave people with permanent short-term memory loss, seizures and learning disabilities). If you want to help out, you can do so at the Meningitis Now website.

3. The Seed Africa

The Seed Africa is a great cause: to take one young, gifted woman and give her the education that will empower her to make a difference in her community. That girl is Lelo, from Swaziland, whose education in Switzerland has been progressing admirably; she plans to go to boarding school in a couple of years. The Seed Africa is also fundraising through Crowdfunder, and if you're interested you can donate on The Seed Africa's page.

4. The House of St Barnabas

The House of St Barnabas focusses on getting young homeless people into long-term, sustainable employment. Like GoCardless it's based in London, and it helps many homeless people in the city achieve their goals. They describe themselves as 'social pioneers driven by helping others to forge ahead', and they provide a place for young people to meet, connect, exchange and realise ideas. All in all, a really worthy cause! You can find out more at The House of St Barnabas.

5. The Nightshelter

The Nightshelter provides safety, warmth and food to the homeless in Cardiff. Its residents can take a hot shower, use the laundry facilities and enjoy a cooked meal. Unfortunately, due to cuts in its local authority funding, the Nightshelter is in danger of closing, which would mean an additional 10-12 people sleeping rough on the streets of Cardiff every night. It's trying to crowdfund enough money to keep going for at least one more year, and is closing in on its target. You can donate to the Nightshelter on Crowdfunder.

Have you got a great cause and want to take recurring donations to fund it? Get started with GoCardless


Ibandit: simple IBAN manipulation

We just open-sourced Ibandit, a simple library for working with IBANs.

Usage

Constructing an IBAN from national details:

iban = Ibandit::IBANBuilder.build(
  country_code: 'GB',
  bank_code: 'BARC', # optional if a BIC finder is configured
  branch_code: '200000',
  account_number: '55779911'
)

iban.iban
# => "GB60BARC20000055779911"

Deconstructing an IBAN into national details:

iban = Ibandit::IBAN.new("GB82 WEST 1234 5698 7654 32")

iban.country_code
# => "GB"
iban.check_digits
# => "82"
iban.bank_code
# => "WEST"
iban.branch_code
# => "123456"
iban.account_number
# => "98765432"
iban.iban_national_id
# => "WEST98765432"

Validating an IBAN's format and check digits (national modulus checks are NOT applied):

iban = Ibandit::IBAN.new("GB81 WEST 1234 5698 7654 32")

iban.valid?
# => false
iban.errors
# => { check_digits: "Check digits failed modulus check. Expected '82', received '81'"}

Why convert to/from IBAN

IBANs are used for all SEPA payments, such as the collection of SEPA Direct Debit payments, but most people only know their national bank details. Further, most countries have validations which apply to their national details, rather than to IBANs.

Ibandit lets you work with national details when communicating with your customers, but with IBANs when communicating with the banks. Its conversions are based on data provided by SWIFT and our experience at GoCardless, and it's heavily commented with descriptions of each country's national bank details. Internally, we use it as our standard interface for "bank details".

Given an IBAN, Ibandit can also validate the format and IBAN check digits, and can deconstruct the IBAN so that country-specific checks can be applied. It is not a modulus-checking gem and won't run national modulus checks for you, although it does include implementations of some of these checks.

Other libraries

Another gem, iban-tools, also exists and is an excellent choice if you only require basic IBAN validation. However, iban-tools does not provide a comprehensive, consistent interface for the construction and deconstruction of IBANs into national details.


Bacs Processing Calendar 2015

As we head towards the end of 2014 we have inevitably started thinking about making sure we're ready to hit the ground running in 2015. To help you do the same, we thought we'd share the 2015 Bacs processing calendar.


The Bacs Processing Calendar shows the dates that Bacs will accept Direct Debit submissions throughout 2015 as well as the days it won’t (weekends, bank holidays etc.).

Bacs explain, “The Bacs processing calendars are a valuable tool to help you ensure you don’t miss those important processing dates over the holiday period. The Bacs processing calendars supply you with the all important Julian dates you need to process your payment files”.

Here at GoCardless, we automatically take the processing calendar into account so you don’t need to. We do still recommend making sure you’re aware of any non-processing dates which might affect your payment schedule so that you are either prepared for payment timings to change or can decide to move those payments to fit into your schedule (which we're more than happy to help with).

If you have any questions on the 2015 Bacs processing calendar or payment timings more generally our support team are always happy to help so get in touch at support@gocardless.com.


Syncing Postgres to Elasticsearch: lessons learned

At GoCardless we use Elasticsearch to power the search functionality of our dashboards. When we were building our Pro API, we decided to rethink how we got data into Elasticsearch.

At a high level, the problem is that you have your data in one place (for us, that's Postgres), and you want to keep a copy of it in Elasticsearch. This means every write you make (INSERT, UPDATE and DELETE statements) needs to be replicated to Elasticsearch. At first this sounds easy: just add some code which pushes a document to Elasticsearch after updating Postgres, and you're done.

But what happens if Elasticsearch is slow to acknowledge the update? What if Elasticsearch processes those updates out of order? How do you know Elasticsearch processed every update correctly?

We thought those issues through, and decided our indexes had to be:

  • Updated asynchronously - The user's request should be delayed as little as possible.
  • Eventually consistent - While it can lag behind slightly, serving stale results indefinitely isn't an option.
  • Easy to rebuild - Updates can be lost before reaching Elasticsearch, and Elasticsearch itself is known to lose data under network partitions.

Updating asynchronously

This is the easy part. Rather than generating and indexing the Elasticsearch document inside the request cycle, we enqueue a job to resync it asynchronously. Those jobs are processed by a pool of workers, either individually or in batches - as you start processing higher volumes, batching makes more and more sense.

Leaving the JSON generation and Elasticsearch API call out of the request cycle helps keep our API response times low and predictable.
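
As a rough illustration, the enqueueing side might look something like this. We're not prescribing a particular queueing library - this sketch uses an ActiveRecord callback and a Sidekiq-style worker, and ElasticsearchSyncJob is a made-up name:

class Customer < ActiveRecord::Base
  # Only enqueue the resync once the write has safely committed to Postgres
  after_commit :enqueue_elasticsearch_sync

  private

  def enqueue_elasticsearch_sync
    ElasticsearchSyncJob.perform_async(id)
  end
end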

Ensuring consistency

The easiest way to get data into Elasticsearch is via the update API, setting any fields which were changed. Unfortunately, this offers no safety when it comes to concurrent updates, so you can end up with old or corrupt data in your index.

To handle this, Elasticsearch offers a versioning system with optimistic locking. Every write to a document increments its version by 1. When posting an update, you read the current version of the document and supply it with your update; if someone else has written to the document in the meantime, the versions won't match and the update will fail. Unfortunately, it's still possible for an older update to win under this scheme. Consider a situation where users Alice and Bob make requests which update the same data at the same time:

+-------------------------------+-----------------------------+
|             Alice             |             Bob             |
+-------------------------------+-----------------------------+
| Postgres update commits       | -                           |
| Elasticsearch request delayed | -                           |
| -                             | Postgres update commits     |
| -                             | Reads v2 from Elasticsearch |
| -                             | Writes v3 to Elasticsearch  |
| Reads v3 from Elasticsearch   | -                           |
| Writes v4 to Elasticsearch    | Changes lost                |
+-------------------------------+-----------------------------+

This may seem unlikely, but it isn't. If you're making a lot of updates, especially if you're doing them asynchronously, you will end up with bad data in your search cluster. Fortunately, Elasticsearch provides another way of doing versioning. Rather than letting it generate version numbers, you can set version_type to external in your requests, and provide your own version numbers. Elasticsearch will always keep the highest version of a document you send it.

Since we're using Postgres, we already have a great version number available to us: transaction IDs. They're 64-bit integers, and they always increase on new transactions. Getting hold of the current one is as simple as:

SELECT txid_current();

The asynchronous job simply selects the current transaction ID, loads the relevant data from Postgres, and sends it to Elasticsearch with that ID set as the version. Since this all happens after the data is committed in Postgres, the document we send to Elasticsearch is at least as up to date as when we enqueued the asynchronous job. It can be newer (if another transaction has committed in the meantime), but that's fine. We don't need every version of every record to make it to Elasticsearch - all we care about is ending up with the newest one once all our asynchronous jobs have run.
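
Putting that together, the job itself looks roughly like this - again a sketch, with Customer, as_indexed_json and the index name as illustrative stand-ins:

require "elasticsearch"
require "sidekiq"

class ElasticsearchSyncJob
  include Sidekiq::Worker

  def perform(customer_id)
    # Read the version before loading the row, so the document we send is
    # at least as new as this version number (it may be newer, which is fine)
    version  = ActiveRecord::Base.connection
                 .select_value("SELECT txid_current()").to_i
    customer = Customer.find(customer_id)

    Elasticsearch::Client.new.index(
      index:        "customers",
      type:         "customer",
      id:           customer.id,
      body:         customer.as_indexed_json,
      version:      version,
      version_type: "external" # Elasticsearch keeps whichever version is highest
    )
  end
end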

Rebuilding from scratch

The last thing to take care of is to handle any inconsistencies from lost updates. We do so by periodically resyncing all recently written Postgres records, and the same code allows us to easily rebuild our indexes from scratch without downtime.

With the asynchronous approach above, and without a transactional, Postgres-backed queue, it's possible to lose updates. If an app server dies after committing the transaction in Postgres, but before enqueueing the sync job, that update won't make it to Elasticsearch. Even with a transactional, Postgres-backed queue there is a chance of losing updates for other reasons (such as the issues under network partition mentioned earlier).

To handle the above, we decided to periodically resync all recently updated records. To do this we use Elasticsearch's Bulk API, and reindex anything which was updated after the last resync (with a small overlap to make sure no records get missed by this catch-up process).
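
Sketched with the same illustrative names as before (how you track "recently updated" - here an updated_at column and a fixed overlap window - is an assumption):

class ElasticsearchCatchUp
  OVERLAP = 10 * 60 # seconds of overlap so nothing slips between runs

  def self.run(last_resync_at)
    client  = Elasticsearch::Client.new
    version = ActiveRecord::Base.connection
                .select_value("SELECT txid_current()").to_i

    Customer.where("updated_at > ?", last_resync_at - OVERLAP)
            .find_in_batches do |batch|
      actions = batch.map do |customer|
        { index: { _index: "customers", _type: "customer", _id: customer.id,
                   _version: version, _version_type: "external",
                   data: customer.as_indexed_json } }
      end

      # Documents that already have a newer version in the index come back as
      # version conflicts in the response - that's fine, the newer data wins.
      client.bulk(body: actions)
    end
  end
end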

The great thing about this approach is you can use the same code to rebuild the entire index. You'll need to do this routinely, when you change your mappings, and it's always nice to know you can recover from disaster.

On the point of rebuilding indexes from scratch, you'll want to do that without downtime. It's worth taking a look at how to do this with aliases right from the start. You'll avoid a bunch of pain later on.
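
With the Ruby client, repointing an alias is a single atomic call once the replacement index has been built (the index and alias names here are just examples):

require "elasticsearch"

client = Elasticsearch::Client.new

# The application always reads from the "customers" alias. Swapping which
# physical index it points at happens atomically, so searches never see a
# half-built index.
client.indices.update_aliases(body: {
  actions: [
    { remove: { index: "customers_v1", alias: "customers" } },
    { add:    { index: "customers_v2", alias: "customers" } }
  ]
})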

Closing thoughts

There's a lot more to building a great search experience than you can fit in one blog post. Different applications have different constraints, and it's worth thinking yours through before you start writing production code. That said, hopefully you'll find some of the techniques in this post useful.
