New API Version - 2015-07-06

Version 2015-07-06 is released today, with the following changes:

  • Removes the /helpers endpoint from the API
  • Renames subscription start_at and end_at to start_date and end_date
  • Enforces date format when passing a payment charge_date

For the majority of integrations, the upgrade will be straightforward, and we will continue to support v2015-04-29 until 6 January 2016.

Upgrading

Upgrading to the new API version should be extremely simple:

  1. Update your version header:

    GoCardless-Version: 2015-07-06
    
  2. If you use the subscriptions endpoint, update your integration to use start_date and end_date keys instead of start_at and end_at (see the example after this list).

  3. If you generate PDF mandates, update your integration to use the new mandate_pdfs endpoint.

  4. If you use the old /helpers/modulus_check endpoint, update your code to use the new bank_details_lookups endpoint.
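
For instance, a subscription creation request against the new version sends start_date where it previously sent start_at. The values below are illustrative:

POST https://api.gocardless.com/subscriptions HTTP/1.1
GoCardless-Version: 2015-07-06
{
  "subscriptions": {
    "amount": 2500,
    "currency": "GBP",
    "interval_unit": "monthly",
    "start_date": "2015-08-01",
    "links": {
      "mandate": "MD00001EKBQ412"
    }
  }
}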

Why are GoCardless making these changes?

The above changes achieve two improvements to the GoCardless API:

  1. Dates now all use _date keys, whilst timestamps all use _at keys.

  2. All endpoints act as first-class resources. Previously the /helpers endpoints were inconsistent with the rest of the API, making them harder to use.

Off the back of these changes we will release version 1.0 of our Java and Ruby client libraries this week. Python and PHP will follow shortly afterwards.

Need help upgrading, or have any questions? Get in touch.

Coach: An alternative to Rails controllers

Today we're open sourcing Coach, a library that removes the complexity from Rails controllers. Bundle your shared behaviour into highly robust, heavily tested middlewares and rely on Coach to join them together, providing static analysis over the entire chain. Coach ensures you only require a glance to see what's being run on each controller endpoint.

At GoCardless we've replaced all our controller code with Coach middlewares.

Why controller code is tricky

Controller code often suffers from hidden behaviour and tangled data dependencies, making it hard to read and difficult to test.

To keep your endpoints performant, you never want to run a cacheable operation more than once, which leads to memoizing your database queries. The question then is where to store that memoized data. If the rest of your controller code needs access to it, storing it on the controller instance is the easiest way to make that happen.

In an attempt to reuse code, controller methods are then split out into controller concerns (mixins), which are included as needed. This leads to controllers that look skinny but have a large amount of included behaviour, all defined far from their call site.

Some of these implicitly defined methods are called in before_actions, making it even more unclear what code is being run when you hit your controllers. Inherited before_actions can lead to a controller that runs several methods before every action without it being clear which these are.
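
To make that concrete, here's a hypothetical controller built in this style - all the names are invented for illustration:

# A concern bundling authentication behaviour, far from its call site
module CurrentUserConcern
  extend ActiveSupport::Concern

  included do
    before_action :authenticate!
  end

  def authenticate!
    head :unauthorized unless current_user
  end

  def current_user
    # Memoized on the controller instance so other methods can reuse it
    @current_user ||= User.find_by(token: request.headers["Authorization"])
  end
end

class PaymentsController < ApplicationController
  # Looks skinny, but pulls in hidden behaviour and a before_action
  include CurrentUserConcern

  def index
    render json: current_user.payments
  end
end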

One of the first things we do when a request hits GoCardless is parse authentication details and pull a reference to that access token out of the database. As the request progresses, we make use of that token to scope our database queries, tag our logs and verify permissions. This sharing and reuse of data is what makes writing controller code complex.

So how does Coach help?

Coach rethinks this approach by building your controller code around the data needed for your request. All your controller code is built from Coach::Middlewares, which take a request context and decide on a response. Each middleware can opt to respond itself, or call the next middleware in the chain.

Each middleware can specify the data it requires from those that have run before it, and can declare what data it will pass to those that come after. This makes the flow of data explicit, and Coach will verify that the requirements have been met before you ever mount an endpoint.

Coach by example

The best way to see the benefits of Coach is with a demonstration...

Mounting an endpoint

class HelloWorld < Coach::Middleware
  def call
    # Middlewares return a Rack response
    [ 200, {}, ['hello world'] ]
  end
end

So we've created ourselves a piece of middleware, HelloWorld. As you'd expect, HelloWorld simply outputs the string 'hello world'.

In an example Rails app, called Example, we can mount this route like so...

Example::Application.routes.draw do
  match "/hello_world",
        to: Coach::Handler.new(HelloWorld),
        via: :get
end

Once you've booted Rails locally, the following should return 'hello world':

$ curl -XGET http://localhost:3000/hello_world

Building chains

Suppose we didn't want just anybody to see our HelloWorld endpoint. In fact, we'd like to lock it down behind some authentication.

Our request will now have two stages: one where we check authentication details, and another where we respond with our secret greeting to the world. Let's split the code into two middlewares, one for each subtask, allowing us to reuse the authentication flow elsewhere.

class Authentication < Coach::Middleware
  def call
    unless User.exists?(login: params[:login])
      return [ 401, {}, ['Access denied'] ]
    end

    next_middleware.call
  end
end

class HelloWorld < Coach::Middleware
  uses Authentication

  def call
    [ 200, {}, ['hello world'] ]
  end
end

Here we extract the authentication logic into its own middleware. HelloWorld now uses Authentication, and will only run once Authentication has called next_middleware.call.

Notice we also use params just like you would in a normal Rails controller. Every middleware class will have access to a request object, which is an instance of ActionDispatch::Request.
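
As a sketch of what that enables, a middleware might read connection details straight off the request. The throttled? logic below is hypothetical, standing in for a real rate-limiting store:

class RateLimitByIp < Coach::Middleware
  def call
    # `request` is an ActionDispatch::Request, so the usual helpers apply
    return [ 429, {}, ['Too many requests'] ] if throttled?(request.remote_ip)

    next_middleware.call
  end

  def throttled?(ip)
    # Hypothetical check - swap in a real counter (e.g. backed by Redis)
    false
  end
end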

Passing data through middleware

So far we've demonstrated how Coach can help you break your controller code into modular pieces. The big innovation with Coach, however, is the ability to explicitly pass your data through the middleware chain.

An example usage here is to create a HelloUser endpoint. We want to protect the route by authentication, as we did before, but this time greet the user that is logged in. Making a small modification to the Authentication middleware we showed above...

class Authentication < Coach::Middleware
  provides :user  # declare that Authentication provides :user

  def call
    return [ 401, {}, ['Access denied'] ] unless user.present?

    provide(user: user)
    next_middleware.call
  end

  def user
    @user ||= User.find_by(login: params[:login])
  end
end

class HelloUser < Coach::Middleware
  uses Authentication
  requires :user  # state that HelloUser requires this data

  def call
    # Can now access `user`, as it's been provided by Authentication
    [ 200, {}, [ "hello #{user.name}" ] ]
  end
end

# Inside config/routes.rb
Example::Application.routes.draw do
  match "/hello_user",
        to: Coach::Handler.new(HelloUser),
        via: :get
end

Coach analyses your middleware chains whenever a new Handler is created. If any middleware requires :x when its chain does not provide :x, we'll error out before the app even starts.
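
For example, mounting a handler whose chain never provides a required value fails immediately. A sketch (the exact error Coach raises isn't shown here):

class NeedsUser < Coach::Middleware
  requires :user  # nothing in this chain provides :user

  def call
    [ 200, {}, [ "hello #{user.name}" ] ]
  end
end

# Raises at load time, before a single request is served,
# because no middleware in the chain provides :user.
Coach::Handler.new(NeedsUser)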

Summary

Our problems with controller code were implicit behaviours, hidden data dependencies and, as a consequence of both, difficult testing. Coach tackles each of these, providing the framework to restructure our controllers into code that is easily understood and easily maintained.

Coach also hooks into ActiveSupport::Notifications, making it easy to monitor the performance of our API. We've written a little adapter that sends detailed performance metrics up to Skylight, where we can keep an eye out for sluggish endpoints.
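
As an illustration, a subscriber that logs slow endpoints might look like this. The event name and payload keys here are assumptions - check the Coach README for the instrumentation it actually publishes:

ActiveSupport::Notifications.subscribe("coach.handler.finish") do |name, start, finish, id, payload|
  duration_ms = (finish - start) * 1000
  Rails.logger.info("#{payload[:middleware]} took #{duration_ms.round(1)}ms")
end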

Coach is on GitHub. As always, we love suggestions, feedback, and pull requests!


Prius: environmentally-friendly app config

We just open-sourced Prius, a simple library for handling environment variables in Ruby.

Safer environment variables

Environment variables are a convenient way of managing application config, but it's easy to misconfigure or forget them. This can cause big problems:

# If ENCRYPTION_KEY is missing, a nil encryption key will be used
encrypted_data = Crypto.encrypt(really_secret_data, ENV["ENCRYPTION_KEY"])

# If FOO_API_KEY is missing, this code will bomb out at run time
FooApi::Client.new(ENV.fetch("FOO_API_KEY")).make_request

Prius helps you guarantee that your environment variables are:

  • Present - an exception is raised if an environment variable is missing, so you can hear about it as soon as your app boots.
  • Valid - an environment variable can be coerced to a desired type (integer, boolean or string), and an exception will be raised if the value doesn't match the type.

Usage

# Load a required environment variable (GITHUB_TOKEN) into Prius.
Prius.load(:github_token)

# Use the environment variable.
Prius.get(:github_token)

# Load an optional environment variable:
Prius.load(:might_be_here_or_not, required: false)

# Load and alias an environment variable:
Prius.load(:short_name, env_var: "LONG_NAME_WE_HAVE_NO_CONTROL_OVER")

# Load and coerce an environment variable (or raise):
Prius.load(:my_flag, type: :bool)

How we use Prius

All the environment variables we use are loaded as our app starts, so we catch config issues at boot time. If an app can't boot, it won't be deployed, meaning we can't release misconfigured apps to production.

We check a file of dummy config values into source control, which makes running the app in development and test environments easier. Dotenv is used to automatically load this in non-production environments.
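
As an illustration, the boot-time loading might look something like this - the file path and variable names are hypothetical:

# config/initializers/prius.rb (hypothetical path and names)
# In development and test, pull in the checked-in dummy config first.
Dotenv.load unless Rails.env.production?

# Load everything up front: a missing or malformed value aborts boot
# rather than surfacing at run time.
Prius.load(:github_token)
Prius.load(:encryption_key)
Prius.load(:enable_new_flow, type: :bool)
Prius.load(:optional_banner, required: false)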


Safely retrying API requests

Today we're announcing support for idempotency keys on our Pro API, which make it safe to retry non-idempotent API requests.

Why are they necessary?

Here's an example that illustrates the purpose of idempotency keys.

You submit a POST request to our /payments endpoint to create a payment. If all goes well, you'll receive a 201 Created response. If the request is invalid, you'll receive a 4xx response and know that the payment wasn't created. But what if something goes wrong at our end and we issue a 500 response? Or what if a network issue means you get no response at all? In these cases you have no way of knowing whether the payment was created, which leaves you with two options:

  • Hope the request succeeded, and take no further action.
  • Assume the request failed, and retry it. However, if the request did succeed you'll end up with a duplicate payment.

Not an ideal situation.

Idempotency Keys

To solve this, we've rolled out support for idempotency keys across all of our creation endpoints. An idempotency key is a unique token that you submit as a request header; it guarantees that only one resource will be created, no matter how many times the request is sent to us.

For example, the following request can be made repeatedly, with only one payment ever being created:

POST https://api.gocardless.com/payments HTTP/1.1
Idempotency-Key: PROCESS-ME-ONCE
{
  "payments": {
    "amount": 100,
    "currency": "GBP",
    "charge_date": "2015-06-20",
    "reference": "DOLLAR01",
    "links": {
      "mandate": "MD00001EKBQ412"
    }
  }
}

If the request fails, then it's perfectly safe to retry as long as you use the same idempotency key. If the original request was successful, then you'll receive the following response:

HTTP/1.1 409 Conflict
{
  "error": {
    "code": 409,
    "type": "invalid_state",
    "message": "A resource has already been created with this idempotency key",
    "documentation_url": "https://developer.gocardless.com/pro#idempotent_creation_conflict",
    "request_id": "5f917bf9-df56-460f-a165-15d9e77414cb",
    "errors": [
      {
        "reason": "idempotent_creation_conflict",
        "message": "A resource has already been created with this idempotency key",
        "links": {
          "conflicting_resource_id": "PM00001KKVGTS0"
        }
      }
    ]
  }
}
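
In practice this means you can generate one key per logical payment and reuse it on every attempt. Here's a minimal sketch using Ruby's Net::HTTP - the token and error handling are simplified for illustration:

require "json"
require "net/http"
require "securerandom"

# One key per logical payment, reused across every retry.
idempotency_key = SecureRandom.uuid

uri = URI("https://api.gocardless.com/payments")
request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer YOUR_ACCESS_TOKEN"  # placeholder
request["Content-Type"] = "application/json"
request["Idempotency-Key"] = idempotency_key
request.body = {
  payments: {
    amount: 100,
    currency: "GBP",
    links: { mandate: "MD00001EKBQ412" }
  }
}.to_json

3.times do
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end

  # 201: created on this attempt; 409: an earlier attempt already created it.
  break if %w[201 409].include?(response.code)

  sleep 1  # back off, then retry with the same idempotency key
end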

It's worth noting that we haven't added support for idempotency keys to our update endpoints, as they're idempotent by nature. For example, trying to cancel the same payment multiple times will have no adverse effect.

We're constantly improving our API to provide a better experience to our integrators. If you have any feedback or suggestions then get in touch, we'd love to hear from you!


Zero-downtime Postgres migrations - the hard parts

A few months ago, we took around 15 seconds of unexpected API downtime during a planned database migration. We're always careful about deploying schema changes, so we were surprised to see one go so badly wrong. As a payments company, the uptime of our API matters more than most - if we're not accepting requests, our merchants are losing money. It's not in our nature to leave issues like this unexplored, so naturally we set about figuring out what went wrong. This is what we found out.

Background

We're no strangers to zero-downtime schema changes. Having the database stop responding to queries for more than a second or two isn't an option, so there's a bunch of stuff you learn early on. It's well covered in other articles [1], and it mostly boils down to:

  • Don't rename columns/tables which are in use by the app - always copy the data and drop the old one once the app is no longer using it
  • Don't rewrite a table while you have an exclusive lock on it (e.g. no ALTER TABLE foos ADD COLUMN bar varchar DEFAULT 'baz' NOT NULL)
  • Don't perform expensive, synchronous actions while holding an exclusive lock (e.g. adding an index without the CONCURRENTLY flag) - see the migration sketch below
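
To make the last point concrete, here's what a concurrent index build looks like as a Rails migration (a minimal sketch; the table and column names are invented):

class AddIndexToPaymentsReference < ActiveRecord::Migration
  # CONCURRENTLY can't run inside a transaction, so opt out of the one
  # Rails wraps migrations in by default.
  disable_ddl_transaction!

  def change
    add_index :payments, :reference, algorithm: :concurrently
  end
end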

This advice will take you a long way. It may even be all you need to scale this part of your app. For us, it wasn't, and we learned that the hard way.

The migration

Jump back to late January. At the time, we were building invoicing for our Pro product. We'd been through a couple of iterations, and settled on model/table names. We'd already deployed an earlier revision, so we had to rename the tables. That wasn't a problem though - the tables were empty, and there was no code depending on them in production.

The foreign key constraints on those tables had out-of-date names after the rename, so we decided to drop and recreate them [2]. Again, we weren't worried. The tables were empty, so there would be no long-held lock taken to validate the constraints.

So what happened?

We deployed the changes, and all of our assumptions got blown out of the water. Just after the schema migration began, we started getting alerts about API requests timing out. These lasted for around 15 seconds, at which point the migration went through and our API came back up. After a few minutes collecting our thoughts, we started digging into what had gone wrong.

First, we re-ran the migrations against a backup of the database from earlier that day. They went through in a few hundred milliseconds. From there we turned back to the internet for an answer.

Information was scarce. We found lots of blog posts giving the advice from above, but no clues about what had happened to us. Eventually, we stumbled on an old thread on the Postgres mailing list which sounded exactly like the situation we'd run into. We kept looking, and found a blog post which went into more depth [3].

In order to add a foreign key constraint, Postgres takes AccessExclusive locks on both the table with the constraint [4] and the one it references, while it adds the triggers which enforce the constraint. When a lock can't be acquired because of a lock held by another transaction, it goes into a queue, and any locks that conflict with the queued lock queue up behind it. As AccessExclusive locks conflict with every other type of lock, having one sat in the queue blocks all other operations [5] on that table.

Here's a worked example using 3 concurrent transactions, started in order:

-- Transaction 1
SELECT DISTINCT(email)     -- Takes an AccessShare lock on "parent"
FROM parent;               -- for duration of slow query.

-- Transaction 2
ALTER TABLE child          -- Needs an AccessExclusive lock on
ADD CONSTRAINT parent_fk   -- "child" /and/ "parent". AccessExclusive
  FOREIGN KEY (parent_id)  -- conflicts with AccessShare, so sits in
  REFERENCES parent        -- a queue.
  NOT VALID;

-- Transaction 3
SELECT *                   -- Normal query also takes an AccessShare,
FROM parent                -- which conflicts with AccessExclusive
WHERE id = 123;            -- so goes to back of queue, and hangs.

While the tables we were adding the constraints to were unused by the app code at that point, the tables they referenced were some of the most heavily used. An unfortunately timed, long-running read query on the parent table collided with the migration which added the foreign key constraint.

The ALTER TABLE statement itself was fast to execute, but the effect of it waiting for an AccessExclusive lock on the referenced table caused the downtime - read/write queries issued by calls to our API piled up behind it, and clients timed out.

Avoiding downtime

Applications vary too much for there to be a "one size fits all" solution to this problem, but there are a few good places to start:

  • Eliminate long-running queries/transactions from your application [6]. Run analytics queries against an asynchronously updated replica.
    • It's worth setting log_min_duration_statement and log_lock_waits to find these issues in your app before they turn into downtime.
  • Set lock_timeout in your migration scripts to a value your app can tolerate - it's better to abort a deploy than take your application down (see the sketch after this list).
  • Split your schema changes up.
    • Problems become easier to diagnose.
    • Transactions around DDL are shorter, so locks aren't held so long.
  • Keep Postgres up to date. The locking code is improved with every release.
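
As an illustration of the lock_timeout advice, a migration can bound how long it waits for a lock before giving up (a sketch; the names and the one-second value are arbitrary):

class AddParentFkToChildren < ActiveRecord::Migration
  def up
    # Give up after a second rather than queueing behind a long-running
    # query and blocking everything else: the deploy fails, the app stays up.
    execute("SET LOCAL lock_timeout = '1s'")
    execute(<<-SQL)
      ALTER TABLE children
        ADD CONSTRAINT children_parent_fk
        FOREIGN KEY (parent_id)
        REFERENCES parents
        NOT VALID;
    SQL
  end
end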

Whether this is worth doing comes down to the type of project you're working on. Some sites get by just fine putting up a maintenance page for the 30 seconds it takes to deploy. If that's not an option for you, then hopefully the advice in this post will help you avoid unexpected downtime one day.


  1. Braintree have a really good post on this. 

  2. At the time, partly as an artefact of using Rails migrations which don't include a method to do it, we didn't realise that Postgres had support for renaming constraints with ALTER TABLE. Using this avoids the AccessExclusive lock on the table being referenced, but still takes one on the referencing table. Either way, we want to be able to add new foreign keys, not just rename the ones we have. 

  3. It's also worth noting that the Postgres documentation and source code are extremely high quality. Once we had an idea of what was happening, we went straight to the locking code for ALTER TABLE statements.

  4. This still applies if you add the constraint with the NOT VALID flag. Postgres will briefly hold an AccessExclusive lock against both tables while it adds constraint triggers to them. 9.4 does make the VALIDATE CONSTRAINT step take a weaker ShareUpdateExclusive lock though, which makes it possible to validate existing data in large tables without downtime. 

  5. SELECT statements take an AccessShare lock. 

  6. If developers have access to a console where they can run queries against the production database, they need to be extremely cautious. BEGIN; SELECT * FROM some_table WHERE id = 123; /* developer goes to make a cup of tea */ will cause downtime if someone deploys a schema change for some_table.
