All fun and games until you start with GameDays
As a payments company, we need our APIs to be as close to 100% available as possible. That means being ready for whatever comes our way, from losing a server without bringing the API down to knowing how to react if a company laptop is compromised.
To accomplish this, we run GameDay exercises. What follows is our version of a GameDay. We hope that sharing how we run them gives you a starting point for running your first.
In search of performance - how we shaved 200ms off every POST request
While doing some work on our Pro dashboard, we noticed that search requests were taking around 300ms. We've got some people in the team who have used Elasticsearch for much larger datasets, and they were surprised by how slow the requests were, so we decided to take a look.
Today, we'll show how that investigation led to a 200ms improvement on all internal POST requests.
What we did
We started by taking a typical search request from the app and measuring how long it took. We tried this both with Ruby's Net::HTTP and from the command line using curl. The latter was visibly faster. Timing the requests showed that the request from Ruby took around 250ms, whereas the one from curl took only 50ms.
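For readers who want to reproduce this kind of measurement, here is a minimal sketch of how the Ruby side could be timed. It is not the script we used: the helper names `median_ms` and `post_json`, the endpoint URL, and the query body in the comment are all assumptions made for illustration.

```ruby
require "benchmark"
require "net/http"
require "uri"

# Run the given block `runs` times and return the median wall-clock
# time in milliseconds. The median is less sensitive to a single slow
# outlier than the mean. (Hypothetical helper, not from the original post.)
def median_ms(runs: 5)
  samples = Array.new(runs) { Benchmark.realtime { yield } * 1000.0 }
  samples.sort[samples.size / 2]
end

# Send one JSON POST over a fresh connection and return the response.
# (Hypothetical helper; the real app's request code may differ.)
def post_json(uri, body)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    request = Net::HTTP::Post.new(uri.request_uri, "Content-Type" => "application/json")
    request.body = body
    http.request(request)
  end
end

# Example usage, assuming a search endpoint on localhost (placeholder URL and query):
#   uri = URI("http://localhost:9200/_search")
#   puts median_ms { post_json(uri, '{"query":{"match_all":{}}}') }
```

For the command-line side, curl can report its own total request time via its `-w '%{time_total}'` option, which gives a number to compare against. Taking several samples on each side helps separate a consistent gap like the one above from ordinary network jitter.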
We were confident that whatever was going on was isolated to Ruby[1], but we wanted to dig deeper, so we moved over to our staging environment. At that point, the problem disappeared entirely.
For a while, we were stumped. We run the same versions of Ruby and Elasticsearch in staging and production. It didn't make any sense! We took a step back, and looked over our stack, piece by piece. There was something in the middle which we hadn't thought about - HAProxy.
We quickly discovered that, due to an ongoing Ubuntu upgrade[2], we were using different versions of HAProxy in staging (1.4.24) and production (1.4.18). Something in those 6 patch revisions was responsible, so we turned our eyes to the commit logs. There were a few candidates, but one patch stood out in particular.
We did a custom build of HAProxy 1.4.18, with just that patch added, and saw request times drop by around 200ms. Job done.