The GoCardless API lets developers interact with GoCardless programmatically, allowing you to integrate us into your website, mobile app or desktop software. This means you can build your own customised integration to automate payment collection and reconciliation.
Integrating with the GoCardless API is simple, and with our easy-to-use API libraries it can be done in minutes.
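As a rough illustration of what an integration looks like at the HTTP level, here is a sketch that lists customers using only Python's standard library rather than one of our client libraries. The sandbox base URL, the GoCardless-Version header value and the "customers" key in the response body are assumptions based on the public API reference, so check the current documentation before relying on them.

```python
import json
import urllib.request

# Assumptions, not verified here: the sandbox base URL and the API version
# header value below. Check the current GoCardless API reference.
SANDBOX_BASE_URL = "https://api-sandbox.gocardless.com"
API_VERSION = "2015-07-06"


def build_list_customers_request(access_token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET /customers request."""
    return urllib.request.Request(
        f"{SANDBOX_BASE_URL}/customers",
        method="GET",
        headers={
            "Authorization": f"Bearer {access_token}",
            "GoCardless-Version": API_VERSION,
            "Accept": "application/json",
        },
    )


def list_customers(access_token: str) -> list:
    """Send the request and return the customers from the JSON body."""
    request = build_list_customers_request(access_token)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed response shape: {"customers": [...], "meta": {...}}
    return body["customers"]
```

Calling list_customers with your own sandbox access token makes a real network request; building the request on its own does not.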
On Friday 6th April at 16:04 BST, we experienced a service outage. For a period of 27 minutes, the GoCardless API and Dashboard were unavailable, and users were unable to set up a Direct Debit via our payment pages or connect their account to a partner integration through our OAuth flow.
The outage was caused by a misconfiguration of our database, which stopped us making changes to the data we have stored.
For those of you wanting the technical details, the error was due to us reaching a limit on the ID given to each entry in our “update log”. Every time we make changes to certain tables in our database (i.e. create or update a row), we record those changes in an “update log” to provide an audit trail. Each entry in the log has an automatically-generated sequential ID (1, 2, 3, and so on). Our database configuration meant that the maximum possible value for this ID was 2,147,483,647.
We hit this limit, so we were unable to write to the “update log”, which blocked writes to the database. For more details, see our previous blog post.
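The limit comes from the column type. PostgreSQL's serial type is an auto-incrementing integer column, stored as a 4-byte signed value, so the largest ID it can hold is 2^31 - 1 = 2,147,483,647. This small sketch checks that boundary with Python's struct module; it models only the storage width, not PostgreSQL itself.

```python
import struct

# PostgreSQL's serial is an auto-incrementing integer column: 4 bytes, signed.
SERIAL_MAX = 2**31 - 1


def fits_in_int32(value: int) -> bool:
    """Return True if value can be packed as a 4-byte signed integer."""
    try:
        struct.pack("<i", value)
    except struct.error:
        return False
    return True


print(SERIAL_MAX)                     # 2147483647
print(fits_in_int32(SERIAL_MAX))      # True
print(fits_in_int32(SERIAL_MAX + 1))  # False
```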
As a payments company, we know how critical our service is to our customers, and we take incidents like this extremely seriously.
As such, once we’ve responded to an incident and restored service for our users, we run “post-mortems” to make sure we understand:
Following the post-mortem for this incident, we’ve already taken a number of steps to improve our systems and processes for the future:
We’d like to apologise again for any inconvenience caused to you and your customers. We will continue to invest in our technology and processes to ensure we guard against any similar incidents in the future.
On Friday 6 April at 16:04 BST, we experienced a service outage. For a period of 27 minutes, the GoCardless API and Dashboard were unavailable, and users were unable to set up a Direct Debit via our payment pages or connect their account to a partner integration through our OAuth flow.
Submissions to the banks to collect payments and pay out collected funds were unaffected.
We’d like to apologise for any inconvenience caused to you and your customers. As a payments company, we know how important reliability is to our customers, and we take incidents like this extremely seriously. We’re completing a detailed review of the incident and taking the required steps to improve our technology and processes to ensure this doesn’t happen again.
All of our most critical data is stored in a PostgreSQL database. When we make changes to certain tables in that database (i.e. create or update a row), we use a trigger to keep a separate record of exactly what changed. We use this “update log” to enable data analysis tasks like fraud detection.
Each entry in the log has an automatically-generated sequential ID (1, 2, 3, and so on). This ID is stored using the serial type in the database, which means it can be a value between 1 and 2147483647.
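To make the mechanism concrete, here is a minimal sketch of a trigger-maintained update log. It uses SQLite through Python's built-in sqlite3 module as a stand-in for PostgreSQL (where such a trigger would typically call a PL/pgSQL function), and the payments and update_log tables are hypothetical. The point is only that every insert or update on the watched table appends a log row with the next sequential ID.

```python
import sqlite3

# Hypothetical schema. AUTOINCREMENT gives update_log a sequential ID (1, 2, 3, ...).
# A real audit log would record the full before/after row, not just the status.
SCHEMA = """
CREATE TABLE payments (
    id     INTEGER PRIMARY KEY,
    amount INTEGER NOT NULL,
    status TEXT NOT NULL
);

CREATE TABLE update_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT NOT NULL,
    row_id     INTEGER NOT NULL,
    new_status TEXT NOT NULL
);

-- One trigger per kind of write keeps a record of what changed.
CREATE TRIGGER payments_insert_log AFTER INSERT ON payments
BEGIN
    INSERT INTO update_log (table_name, row_id, new_status)
    VALUES ('payments', NEW.id, NEW.status);
END;

CREATE TRIGGER payments_update_log AFTER UPDATE ON payments
BEGIN
    INSERT INTO update_log (table_name, row_id, new_status)
    VALUES ('payments', NEW.id, NEW.status);
END;
"""


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO payments (amount, status) VALUES (1000, 'pending')")
    conn.execute("UPDATE payments SET status = 'confirmed' WHERE id = 1")
    conn.commit()
    rows = conn.execute(
        "SELECT id, table_name, row_id, new_status FROM update_log ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


print(demo())  # [(1, 'payments', 1, 'pending'), (2, 'payments', 1, 'confirmed')]
```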
At 16:04 on Friday 6 April, we hit this upper limit, meaning we could no longer write to the “update log” table. In PostgreSQL, when a trigger fails, the original database write that triggered it fails too. This caused requests to our application to fail, returning a 500 Internal Server Error.
This issue also affected API requests (including those from the Dashboard) which only appear to read data (e.g. listing your customers or fetching a specific payment), since authenticated requests update access tokens to record when you last accessed the API.
Having identified the root cause of the problem, we disabled the trigger which sends writes to the “update log”, thereby restoring service.
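The failure mode described here can be reproduced in miniature, including why apparently read-only requests failed and why removing the trigger restored service. This is a sketch under stated assumptions: SQLite stands in for PostgreSQL, a CHECK constraint with a hypothetical cap of 3 stands in for the 2,147,483,647 limit, and the table names and request handlers are hypothetical.

```python
import sqlite3

# Hypothetical tiny cap standing in for the real 2,147,483,647 limit.
LOG_ID_MAX = 3

SCHEMA = f"""
CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL);
CREATE TABLE access_tokens (id INTEGER PRIMARY KEY, last_used_at TEXT);

-- The CHECK plays the role of the ID column's maximum value.
CREATE TABLE update_log (
    id         INTEGER NOT NULL UNIQUE CHECK (id <= {LOG_ID_MAX}),
    table_name TEXT NOT NULL
);

-- MAX(id) + 1 stands in for a sequence handing out the next ID.
CREATE TRIGGER payments_log AFTER INSERT ON payments
BEGIN
    INSERT INTO update_log (id, table_name)
    VALUES ((SELECT COALESCE(MAX(id), 0) + 1 FROM update_log), 'payments');
END;

CREATE TRIGGER access_tokens_log AFTER UPDATE ON access_tokens
BEGIN
    INSERT INTO update_log (id, table_name)
    VALUES ((SELECT COALESCE(MAX(id), 0) + 1 FROM update_log), 'access_tokens');
END;
"""


def handle(conn: sqlite3.Connection, handler) -> int:
    """Run a hypothetical request handler and map any database error to a 500."""
    try:
        handler(conn)
        return 200
    except sqlite3.DatabaseError:
        return 500


def create_payment(conn: sqlite3.Connection) -> None:
    conn.execute("INSERT INTO payments (amount) VALUES (1000)")


def list_payments(conn: sqlite3.Connection) -> list:
    # Looks read-only, but authentication first records when the token was used.
    conn.execute("UPDATE access_tokens SET last_used_at = datetime('now') WHERE id = 1")
    return conn.execute("SELECT id, amount FROM payments").fetchall()


def run_demo() -> dict:
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit each statement
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO access_tokens (id) VALUES (1)")  # no trigger on token inserts

    results = {
        # Each successful payment uses up one update_log ID.
        "first_writes": [handle(conn, create_payment) for _ in range(LOG_ID_MAX)],
        # The next ID breaks the CHECK, so the trigger fails and so does the write.
        "write_at_limit": handle(conn, create_payment),
        "read_at_limit": handle(conn, list_payments),
    }

    # Stop writing to the update log. PostgreSQL can do this with
    # ALTER TABLE ... DISABLE TRIGGER; SQLite has no equivalent, so drop the triggers.
    conn.execute("DROP TRIGGER payments_log")
    conn.execute("DROP TRIGGER access_tokens_log")
    results["write_after_fix"] = handle(conn, create_payment)
    results["read_after_fix"] = handle(conn, list_payments)
    results["payments_stored"] = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
    conn.close()
    return results


print(run_demo())
```

The payment that failed at the limit is rolled back along with its log entry, so only four payments end up stored: three before the limit and one after the triggers are gone.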
We’ve resolved this problem for the future by storing the IDs for our “update log” using the bigserial type, which allows values up to 9223372036854775807. This is effectively unlimited, and can be expected to provide enough IDs to last millions of years.
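A back-of-the-envelope calculation shows why the switch matters. The rate of 1,000 new IDs per second used below is purely hypothetical, not a GoCardless figure; the function works for any rate you plug in.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

SERIAL_MAX = 2**31 - 1     # PostgreSQL serial (4-byte integer)
BIGSERIAL_MAX = 2**63 - 1  # PostgreSQL bigserial (8-byte integer)


def years_until_exhausted(max_id: int, ids_per_second: float) -> float:
    """Years until a sequence starting at 1 reaches max_id at a constant rate."""
    return max_id / ids_per_second / SECONDS_PER_YEAR


RATE = 1_000  # hypothetical IDs per second, not a GoCardless figure

serial_days = years_until_exhausted(SERIAL_MAX, RATE) * 365.25
bigserial_years = years_until_exhausted(BIGSERIAL_MAX, RATE)
print(f"serial: about {serial_days:.1f} days")          # serial: about 24.9 days
print(f"bigserial: about {bigserial_years:.3g} years")  # bigserial: about 2.92e+08 years
```

At that hypothetical rate, a serial column runs out in under a month, while a bigserial column lasts roughly 292 million years.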
In the next few days, we’ll be running a full post-mortem to better understand:
We’ll publish the results of this post-mortem in a follow-up post within the next 4 weeks.
In this post, I’ll talk about how we changed the way we work over the last 9 months to build truly global software and introduce a localisation process which allows us to move quickly and deliver real value for customers.
We wanted to provide a great experience for our users, whatever language they speak — but it was imperative to do so in a way that didn’t slow us down as we continue to build out our product. When we get processes like this wrong, we not only make our team’s work harder than it needs to be, but we place a drag on what we care about most: delivering value for our users.
There’s a whole other post I could write about the intricacies of that process and how we’ve invested in sourcing skilled translators and ensuring we have perfect translations with quality assurance (QA) processes. In this post, though, we’ll focus on the developer workflow.
This post represents the collective work of our Core Infrastructure team's investigation into our API and Dashboard outage on 10 October 2017.
As a payments company, we take reliability very seriously. We hope that the transparency in technical write-ups like this reflects that.
We have included a high-level summary of the incident, and a more detailed technical breakdown of what happened, our investigation, and changes we've made since.
On the afternoon of 10 October 2017, we experienced an outage of our API and Dashboard, lasting 1 hour and 50 minutes. All requests made during that time failed and returned an error.
The cause of the incident was a hardware failure on our primary database node, combined with unusual circumstances that prevented our database cluster automation from promoting one of the replica database nodes to act as the new primary.
This failure to promote a new primary database node extended an outage that would normally last 1 or 2 minutes to one that lasted almost 2 hours.
As we continue with our mission to create a global payments network, we’re making the experience for our international customers even better, by now supporting French, Spanish and German language selection from within GoCardless.
Just over two years ago, Lawrence wrote about Coach, our open-source Ruby library which makes it easier to build robust, maintainable and well-tested APIs by replacing Rails controllers built with ActionController with chains of "middleware".
Since then, we’ve continued using Coach and we’ve no doubt that it has allowed us to move fast, write code that stands the test of time and maintain developer happiness.
In this post, we'll build a simple API using ActionController, discover the pain points, and then see how Coach can help.
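To give a flavour of the idea, here is a minimal Python analogue of a middleware chain. It is not Coach's Ruby API: the class names, the token store and the (status, body) response shape are hypothetical. Each middleware does one job and then either returns a response early (for example, rejecting an unauthenticated request) or adds values to a shared context and hands over to the next middleware.

```python
from typing import Optional, Tuple

Response = Tuple[int, dict]  # hypothetical (HTTP status, body) pair


class Middleware:
    """One link in a chain: do one job, then return early or call the next link."""

    def __init__(self, next_middleware: Optional["Middleware"] = None):
        self.next_middleware = next_middleware

    def call(self, request: dict, context: dict) -> Response:
        raise NotImplementedError


class Authenticate(Middleware):
    TOKENS = {"secret-token": "user-1"}  # hypothetical token store

    def call(self, request: dict, context: dict) -> Response:
        user = self.TOKENS.get(request.get("token"))
        if user is None:
            return 401, {"error": "unauthorized"}  # short-circuit: later links never run
        context["user"] = user  # make the user available to later links
        return self.next_middleware.call(request, context)


class ListPayments(Middleware):
    def call(self, request: dict, context: dict) -> Response:
        # Relies on Authenticate having put "user" into the context.
        return 200, {"payments": [], "user": context["user"]}


def build_chain(*middleware_classes) -> Middleware:
    """Instantiate the classes so that each one holds a reference to the next."""
    chain: Optional[Middleware] = None
    for cls in reversed(middleware_classes):
        chain = cls(chain)
    return chain


handler = build_chain(Authenticate, ListPayments)
print(handler.call({"token": "secret-token"}, {}))  # (200, {'payments': [], 'user': 'user-1'})
print(handler.call({"token": "wrong"}, {}))         # (401, {'error': 'unauthorized'})
```

Because authentication lives in its own small object rather than in a controller, it can be reused in front of any endpoint and tested on its own.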
As my summer placement draws to a close, I thought I’d reflect on and share my last five months at GoCardless:
I can remember my first morning at GoCardless quite clearly; I was greeted by chocolates, a card (I know, the irony!) signed by the engineering team, a MacBook Pro waiting to be unboxed, and a GoCardless jacket hanging on the back of my chair. I instantly felt very welcome.
On Friday 25th July, we held 2017’s annual GoCardless internal hackathon.
In a hackathon, a small team comes together for a short period of intense work to solve a problem, complete a challenge or build something new.
We brought together everyone across our cross-functional Product Development team, from product managers to designers to systems reliability engineers (SREs).
This carries on a proud GoCardless tradition that started with the pool ball tracker we blogged about in 2012. Last year, one of our interns, Henri, named 2016’s hackathon as one of the highlights of his internship.
We think our internal hackathons are super valuable because they give us a chance to try out new ideas, learn new skills and technologies and work with people we wouldn’t usually get to work with.
In this post, we’ll look at three of the projects that came out of the day:
Juliet, one of our Product Managers, worked with Ben and Joe from the Design team to build a churn calculator.
At GoCardless, we know that one of our greatest selling points is our fantastically low failure rate for payments. While credit and debit card payments experience failure rates of 10-30% a month due to expired or cancelled cards, bank accounts don’t expire!
The team wanted to find a way to help users understand the tangible difference cutting churn can make to their bottom line, so they put together a brand new churn calculator.
On the calculator, a potential user of GoCardless can input their total number of customers, the average value of each payment they collect and the number of payments they expect to collect per customer per month.
They’ll get back a beautiful graphical view, showing how much they can expect to lose to failed payments over the next 12 months for cards, standard Direct Debit and GoCardless.
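The calculation behind a tool like this can be sketched as a simple expected-value model. This is an assumption for illustration, not the team's actual formula: it assumes a constant customer base, the same monthly failure rate every month, and that failed payments are never recovered. The usage below plugs in the 10% and 30% monthly card failure rates quoted above for a hypothetical business; rates for standard Direct Debit and GoCardless would be passed in the same way.

```python
from typing import List


def cumulative_failed_value(
    customers: int,
    avg_payment_value: float,
    payments_per_customer_per_month: float,
    monthly_failure_rate: float,
    months: int = 12,
) -> List[float]:
    """Cumulative value of failed payments at the end of each month.

    Simplifying assumptions: the customer base stays constant, the same share
    of payments fails every month, and failed payments are never recovered.
    """
    monthly_volume = customers * avg_payment_value * payments_per_customer_per_month
    monthly_loss = monthly_volume * monthly_failure_rate
    return [monthly_loss * month for month in range(1, months + 1)]


# Hypothetical business: 500 customers each paying 20.00 once a month.
for label, rate in [("cards, 10% failing", 0.10), ("cards, 30% failing", 0.30)]:
    series = cumulative_failed_value(500, 20.0, 1, rate)
    print(f"{label}: {series[-1]:,.2f} lost over 12 months")
```

For that hypothetical business, the sketch gives 12,000 lost over a year at a 10% failure rate and 36,000 at 30%; the monthly series is what a chart like the one described would plot.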
The team plans to keep developing the project, aiming to release it on our website soon.
Juliet said "It was amazing to have the chance to experiment and build something completely new in less than a day. We can't wait to move forward with what we've started and bring it to the GoCardless website soon".
João, one of our interns, worked with Pete, one of our Technical Leads, to resurrect a classic GoCardless hackathon project from a bygone era: the 'make it rain' dashboard. This tool shows each payment being collected through our infrastructure over the course of a day as a coin falling from the sky, each one labelled with its payment amount.
They kicked off the project with another key goal: to work on something fun and learn something new.
"By the end of the hackathon, we were pretty happy with the result and what we learnt in such a short period of time. It was a really fun day!" said João.
Chris, one of our System Reliability Engineers, worked with Marco, one of our interns, to build a new tool called Slackify, aiming to bring one of the best features of MSN Messenger from the 2000s to Slack: showing what music you’re listening to in your status.
Experimenting with the Elixir programming language, they got to work together for the first time, and finished the day with a working prototype (albeit having spent much of the day fighting with Spotify and Slack’s OAuth APIs!).
Marco said "The hackathon was a blast! It gave me the opportunity to work with Chris, who I don't usually work with, and to get to know him better -- and also to learn Elixir!".
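The core of a tool like Slackify is small: read the currently playing track and turn it into a Slack status. The team built theirs in Elixir; this Python sketch covers only the data transformation and leaves out the OAuth steps entirely. The Spotify response shape (is_playing, item.name, item.artists) and the 100-character limit on status_text are assumptions based on those services' public docs, and the sample track data is hypothetical. The resulting profile object is what would be sent to Slack's users.profile.set method.

```python
from typing import Optional, Tuple

SLACK_STATUS_MAX_CHARS = 100  # assumed limit on status_text length


def track_from_spotify(payload: Optional[dict]) -> Optional[Tuple[str, str]]:
    """Pull (track, artists) out of an assumed currently-playing response."""
    if not payload or not payload.get("is_playing"):
        return None  # nothing playing (Spotify may also return an empty body)
    item = payload["item"]
    artists = ", ".join(artist["name"] for artist in item["artists"])
    return item["name"], artists


def build_status_profile(track: str, artists: str) -> dict:
    """Build the profile object that would be sent to Slack's users.profile.set."""
    text = f"Listening to {track} by {artists}"
    if len(text) > SLACK_STATUS_MAX_CHARS:
        text = text[: SLACK_STATUS_MAX_CHARS - 3] + "..."
    return {"status_text": text, "status_emoji": ":headphones:"}


# Hypothetical sample data in the assumed Spotify shape.
now_playing = {"is_playing": True, "item": {"name": "Song", "artists": [{"name": "A"}, {"name": "B"}]}}
print(build_status_profile(*track_from_spotify(now_playing)))
```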