While doing some work on our Pro dashboard, we noticed that search requests were taking around 300ms. We've got some people in the team who have used Elasticsearch for much larger datasets, and they were surprised by how slow the requests were, so we decided to take a look.
Today, we'll show how that investigation led to a 200ms improvement on all internal POST requests.
What we did
We started by taking a typical search request from the app and measuring how long it took. We tried this with both Ruby's
Net::HTTP and from the command line using
curl. The latter was visibly faster. Timing the requests showed that the request from Ruby took around 250ms, whereas the one from
curl took only 50ms.
We were confident that whatever was going on was isolated to Ruby1, but we wanted to dig deeper, so we moved over to our staging environment. At that point, the problem disappeared entirely.
For a while, we were stumped. We run the same versions of Ruby and Elasticsearch in staging and production. It didn't make any sense! We took a step back, and looked over our stack, piece by piece. There was something in the middle which we hadn't thought about - HAProxy.
We quickly discovered that, due to an ongoing Ubuntu upgrade2, we were using different versions of HAProxy in staging (1.4.24) and production (1.4.18). Something in those 6 patch revisions was responsible, so we turned our eyes to the commit logs. There were a few candidates, but one patch stood out in particular.
We did a custom build of HAProxy 1.4.18, with just that patch added, and saw request times drop by around 200ms. Job done.
Sound like something you'd enjoy?