As developers, we work on features that our users interact with every day. When you're working on the infrastructure that underpins those features, success is silent to the outside world, and failure looks like this:
Recently, GoCardless moved to a container-based infrastructure. We were lucky, and did so silently. We think that our experiences, and the choices we made along the way, are worth sharing with the wider community. Today, we're going to talk about:
- deploying software reliably
- why you might want a container-based infrastructure
- what it takes to reliably run containers in production
We'll wrap up with a little chat about the container ecosystem as it is today, and where it might go over the next year or two.
An aside - deployment artifacts
Before we start, it's worth clearing up which parts of container-based infrastructure we're going to focus on. It's a huge topic!
Some people hear "container" and jump straight to the building blocks - the namespace and control group primitives in the Linux kernel1. Others think of container images and Dockerfiles - a way to describe the needs of their application and build an image to run it from.
It's the latter we're going to focus on today: not the Dockerfile itself, but on what it takes to go from source code in a repository to something you can run in production.
That "something" is called a build artifact. What it looks like can vary. It may be:
- a jar for an application running on the JVM
- a statically-linked native binary
- a native operating system package, such as a deb or an rpm
To deploy the application the artifact is copied to a bunch of servers, the old version of the app is stopped, and the new one is started. If it's not okay for the service to go down during deployment, you use a load balancer to drain traffic from the old version before stopping it.
Some deployment flows don't involve such concrete, pre-built artifacts. A popular example is the default Capistrano flow, which is, in a nutshell:
- clone the application's source code repository on every server
- install dependencies (Ruby gems)
- run database schema migrations
- build static assets
- start the new version of the application
We're not here to throw shade at Capistrano - a lot of software is deployed successfully using this flow every day. We were using it for over 4 years.
It's worth noting what's missing from that approach. Application code doesn't run in isolation. It needs a variety of functionality from the operating system and shared libraries. Often, a virtual machine is needed to run the code (e.g. the JVM, CRuby). All these need to be installed at the right version for the application, but they are typically controlled far away from the application's codebase.
It's easy to see, then, why people rushed at Docker when it showed up. You can define the application's requirements, right down to the OS-level dependencies, in a file that sits next to the application's codebase. From there, you can build a single artifact, and ship that to each environment (e.g. staging, production) in turn.
For us, and - I think - for most people, this was what made Docker exciting. Unless you're running at huge scale, where squeezing the most out of your compute infrastructure really matters, you're probably not as excited by the container primitives themselves.
What mattered to us?
You may be thinking that a lengthy aside on deployment artifacts could only be there to make this section easy, and you'd be right. In short, we wanted to:
- have a uniform way to deploy our applications - to reduce the effort of running the ones we had, and make it easier to spin new ones up as the business grows
- produce artifacts that can reproducibly be shipped to multiple environments3
- do as much work up-front as possible - detecting failure during artifact build is better than detecting it during deployment
And what didn't matter to us?
In a word: scheduling.
The excitement around containers and image-based deployment has coincided with excitement around systems that allocate machine resources to applications - Mesos, Kubernetes, and friends. While those tools certainly play well with application containers, you can use one without the other.
Those systems are great when you have a lot of computers, a lot of applications, or both. They remove the manual work of allocating machines to applications, and help you squeeze the most out of your compute infrastructure.
Neither of those is a big problem for us right now, so we settled on something smaller.
What we built
Even with that cut-down approach, there was a gap between what we wanted to do, and what you get out-of-the-box with Docker. We wanted a way to define the services that should be running on each machine, and how they should be configured. Beyond that, we had to be able to upgrade a running service without dropping any requests.
We were going to need some glue to make this happen.
Step one: service definitions
We wanted to have a central definition of the services we were running. That meant:
- a list of services
- the machines a service should run on
- the image it should boot
- the environment variable config it should be booted with
- and so on
We decided that Chef was the natural place for this to live in our infrastructure4. Changes are infrequent enough that updating data bags and environment config isn't too much of a burden, and we didn't want to introduce even more new infrastructure to hold this state5.
With that info, Chef writes a config file onto each machine, telling it which applications to boot, and how.
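To make the idea concrete, here is a minimal sketch of what such a per-machine config file could contain, and how a tool might look a service up in it. The JSON layout, the registry hostname, and every field name (image, instances, env, health_check_path) are assumptions made for illustration; this post doesn't describe the real format.

```python
import json

# Hypothetical per-machine config, as Chef might template it.
# The layout and field names are illustrative assumptions, not the real format.
EXAMPLE_CONFIG = """
{
  "services": {
    "gocardless_app_production": {
      "image": "registry.example.com/gocardless/app",
      "instances": 2,
      "env": {"RAILS_ENV": "production"},
      "health_check_path": "/health_check"
    }
  }
}
"""


def load_service(config_text, service_id):
    """Return the definition for service_id.

    Raises KeyError if this machine isn't configured to run the service.
    """
    services = json.loads(config_text)["services"]
    if service_id not in services:
        raise KeyError(f"service {service_id!r} is not defined on this machine")
    return services[service_id]


def image_reference(service, revision):
    """Build an image reference, assuming images are tagged by git revision."""
    return f"{service['image']}:{revision}"
```

Keeping the lookup separate from the image naming means the "which services run here" question stays in the config, while the "which version" question is answered at deploy time.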
Step two: using those service definitions
So we have config on each machine for what it should run. Now we need something to take that config and tell the Docker daemon what to do. Enter Conductor.
Conductor is a single-node orchestration tool we wrote to start long-lived and one-off tasks for a service, including interactive tasks such as consoles.
For the most part, its job is simple. When deploying a new version of a service, it takes a service identifier and git revision as arguments:
conductor service upgrade --id gocardless_app_production --revision 279d9035886d4c0427549863c4c2101e4a63e041
It looks up that identifier in the config we templated earlier with Chef, and uses the information there to make API calls to the Docker daemon. Using that information, it spins up new containers with those parameters and the git SHA provided. If all goes well, it spins down any old container processes and exits. If anything goes wrong, it bails out and tells the user what happened.
For services handling inbound traffic (e.g. API requests), there's a little more work to do - we can't drop requests on the floor every time we deploy. To make deploys seamless, Conductor brings up the new containers, and waits for them to respond successfully on a health check endpoint. Once they do, it writes out config for a local nginx instance with the ports that the new containers are bound to, and issues a reload of nginx. Before exiting, it tells the old containers to terminate gracefully.
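The sequence above can be sketched roughly as follows. This is not Conductor's code: the function names are made up, containers are identified only by the local port they are bound to, and the Docker and nginx interactions are passed in as plain callables so that the ordering of the steps is the only thing on show.

```python
import time


def render_upstream(name, ports):
    """Render an nginx upstream block pointing at locally bound ports."""
    servers = "\n".join(f"    server 127.0.0.1:{port};" for port in ports)
    return f"upstream {name} {{\n{servers}\n}}\n"


def wait_until_healthy(ports, is_healthy, timeout=60.0, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until every port passes is_healthy, or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if all(is_healthy(port) for port in ports):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def upgrade(name, old_ports, new_ports, is_healthy, write_config,
            reload_nginx, stop_container, **wait_options):
    """Switch traffic from old containers to new ones without dropping requests.

    Hypothetical hooks supplied by the caller:
      is_healthy(port)     -> bool, e.g. an HTTP GET on the health check endpoint
      write_config(text)   -> writes the nginx config to disk
      reload_nginx()       -> asks nginx to reload its config
      stop_container(port) -> asks the container on that port to terminate gracefully
    """
    if not wait_until_healthy(new_ports, is_healthy, **wait_options):
        # Bail out before touching nginx: the old containers keep serving.
        return False
    write_config(render_upstream(name, new_ports))
    reload_nginx()
    # Only once nginx points at the new containers do we retire the old ones.
    for port in old_ports:
        stop_container(port)
    return True
```

The important property is the ordering: nginx is only reconfigured after the new containers pass their health check, and the old containers are only stopped after the reload.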
In addition to long-running and one-off tasks, Conductor supports recurring tasks. If the application supplies a generate-cron script, Conductor can install those cron jobs on the host machine. The application's generate-cron script doesn't need to know anything about containers. The script outputs standard crontab format, as if there were no container involved, and Conductor wraps it with the extra command needed to run in a container:
# Example job to clean out expired API tokens
*/30 * * * * /usr/local/bin/conductor run --id gocardless_cron_production --revision 279d9035886d4c0427549863c4c2101e4a63e041 bin/rails runner 'Jobs::CleanUpApiTokens.run'
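As a rough illustration of that wrapping step, the sketch below rewrites a single crontab line so that its command runs through conductor run. It is an assumption about how such a rewrite could work, not Conductor's actual implementation, and the function name is made up. It only handles lines with the usual five schedule fields; shortcuts such as @daily and environment variable lines are rejected.

```python
import shlex

CONDUCTOR = "/usr/local/bin/conductor"


def wrap_cron_line(line, service_id, revision):
    """Prefix the command in a crontab line with a `conductor run` invocation.

    Blank lines and comments are returned unchanged.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    # Five schedule fields, then the rest of the line is the command.
    fields = stripped.split(None, 5)
    if len(fields) < 6:
        raise ValueError(f"unsupported crontab line: {line!r}")
    schedule = " ".join(fields[:5])
    command = fields[5]
    return (
        f"{schedule} {CONDUCTOR} run "
        f"--id {shlex.quote(service_id)} "
        f"--revision {shlex.quote(revision)} "
        f"{command}"
    )
```

Fed the line that the application's script would emit for the job above (the schedule followed by bin/rails runner 'Jobs::CleanUpApiTokens.run'), this produces the wrapped line shown in the example.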
Step three: triggering Conductor on deploys
There's one small piece of the puzzle we've not mentioned yet - we needed something to run Conductor on the right machines during deployment.
We considered a couple of options, but decided to stick with Capistrano, just in a reduced capacity. Doing this made it easier to run these deployments alongside deployments to our traditional app servers.
Unlike the regular Capistrano flow, which does most of the work in a deployment, our Capistrano tasks do very little. They invoke Conductor on the right machines, and leave it to do its job.
One step beyond: process supervision
At that point, we thought we were done. We weren't.
An important part of running a service in production is keeping it running. At a machine level this means monitoring the processes that make up the service and restarting them if they fail.
Early in the project we decided to use Docker's restart policies. The always and on-failure options both looked like good fits for what we wanted. As we got nearer to finishing the project, we ran into a couple of issues that prompted us to change our approach.
The main one was handling processes that failed just after they started6. Docker will continue to restart these containers, and neither of those restart policies makes it easy to stop this. To stop the restart policy, you have to get the container ID and issue a docker stop. By the time you do that, the process you're trying to stop has exited, been swept up by Docker, and a new one will soon be started in its place.
The on-failure policy does have a max-retries parameter to avoid this situation, but we don't want to give up on a service forever. Transient conditions such as being isolated from the network shouldn't permanently stop services from running.
We're also keen on the idea of periodically checking that processes are still able to do work. Even if a process is running, it may not be able to serve requests. You don't see this in every process supervisor7, but having a process respond to an HTTP request tells you a lot more about it than simply checking it's still running.
To solve these issues, we taught Conductor one more trick: conductor supervise. The approach we took was:
- check that the number of containers running for a service matches the number that should be running
- check that each of those containers responds to an HTTP request on its health check endpoint
- start new containers if either of those checks fail
- do that no more frequently than every 5 seconds to avoid excessive churn
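The checks in that list can be sketched as a small loop. This is an illustration under assumptions rather than the real conductor supervise: the function names and the shape of the service dictionary are invented, the Docker and HTTP interactions are injected as callables, and stopping an unhealthy container before replacing it is our guess at a sensible behaviour.

```python
import time


def supervise_once(service, list_containers, is_healthy, start_container,
                   stop_container):
    """Run one supervision pass and return how many containers were started.

    Hypothetical hooks supplied by the caller:
      list_containers(service_id) -> iterable of running containers
      is_healthy(container)       -> bool, e.g. an HTTP health check
      start_container(service)    -> starts one new container
      stop_container(container)   -> stops one container
    """
    healthy = []
    for container in list_containers(service["id"]):
        if is_healthy(container):
            healthy.append(container)
        else:
            # Assumption: an unhealthy container is stopped and replaced.
            stop_container(container)
    missing = max(service["instances"] - len(healthy), 0)
    for _ in range(missing):
        start_container(service)
    return missing


def supervise(service, interval=5.0, sleep=time.sleep, passes=None, **hooks):
    """Repeat supervise_once, pausing `interval` seconds between passes.

    `passes` limits the number of iterations (None means run forever).
    """
    done = 0
    while passes is None or done < passes:
        supervise_once(service, **hooks)
        done += 1
        sleep(interval)
```

Because the loop never gives up, a service that fails during a transient outage is picked up again once conditions improve, and the fixed pause between passes keeps a crash-looping container from causing excessive churn.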
So far, this approach has worked well for us. It picks up containers that have fallen over, and we can tell conductor supervise to stop trying to pick up a service if we need to.
That said, it's code we'd love not to maintain. If we see a chance to use something else, and it's worth the time to make the switch, conductor supervise won't live on.
The road to production
So that's the setup, but moving our apps into that stack didn't happen overnight.
Our earliest attempts were at the end of last year (September/October 2015). We started with non-critical batch processes at first - giving ourselves space to learn from failure. Gradually, we were able to ramp up to running more critical asynchronous workers. By December we were serving a portion of live traffic for some of our services from the new stack.
We spent January and February porting the rest of our services over8, and adjusting our setup as we learned more9.
By early March we had everything working on the new stack, and on the 11th we shut down the last of our traditional app servers. 🎉
Many ways to get to Rome
So here we are, 3 months after completing the move to the new infrastructure. Overall, we've been happy with the results. What we built hits the mark on the goals we mentioned earlier. Since the move, we've seen:
- more frequent upgrades of Ruby - now that the busy-work is minimal, people have been more inclined to make the jump to newer versions
- more small internal services deployed - previously we'd held back on these because of the per-app operational burden
- faster, more reliable deployments - now that we do most of the work up-front, in an artifact build, deployment is a simpler step
So should you rush out and implement something like this? It depends.
The world of deployment and orchestration is moving rapidly right now, and with that comes a lot of excitement and blog posts. It's very easy to get swept along and feel that you need to do something because a company you respect does it. Maybe you would benefit from a distributed scheduler such as Mesos. Perhaps the container-based systems are too immature and fast-moving, and you'd prefer to use full-on virtual machine (VM) images as your deployment primitive. It's going to vary from team to team.
Even if you decide that you want broadly similar things to us, there are multiple ways to get there. Before we finish, let's look at a couple of them.
A VM option
There are plenty of hosting providers that support taking a snapshot of a machine, storing it as an image, and launching new instances from it. Packer is a tool that provides a way to build those images from a template and works with a variety of providers (AWS, DigitalOcean, etc. - it can even build Docker images now).
Once you have that, you need something to bring up those VMs in the right quantities, and update load balancers to point to the right places. Terraform is a tool that handles this, and has been gaining a lot of popularity recently.
With this approach you sidestep the pitfalls of the rapidly-changing container landscape, but still get the benefits of image-based deployments.
A different container option
Docker has certainly been centre stage when it comes to container runtimes, but there are others out there. One which provides an interesting contrast is rkt.
Docker, with its daemon model, assumes responsibility for parenting, supervising, and restarting container processes if they fail. In contrast, rkt doesn't have a daemon. The rkt command line tool is designed to be invoked and supervised by something else10.
Lately, a lot of Linux distributions have been switching to systemd for their default init process11. systemd brings a richer process supervision and notification model than many previous init systems. With it comes a new question of boundaries and overlap - is there a reason to supervise containerised processes in a different way to the rest of the processes on a machine? Is Docker's daemon-based approach still worthwhile, or does it end up getting in the way? I think we'll see these questions play out over the next year or two.
There's less contrast when it comes to images. There's the acbuild tool if you want to build rkt-compatible images directly, and the rkt developers have also cleverly supported Docker images. rkt has conversion built in via the docker2aci tool, which means you can continue to use Docker's build tools and Dockerfile.
We mentioned earlier that deployment and orchestration of services are fast-moving areas right now. It's definitely an exciting time - one that should see some really solid options stabilise over the next few years.
As for what to do now? That's tough. There's no one answer. If you're comfortable being an early adopter, ready for the extra churn that comes with that, then you can go ahead and try out some of the newer tooling. If that's not for you, the virtual machine path is better established, and there's no shame in using proven technology.
To sum up:
- start by thinking about the problems you have and avoid spending time on ones you don't have
- don't feel you have to change all of your tooling at once
- remember the tradeoff between the promise of emerging tools and the increased churn they bring
If you'd like to ask us questions, we'll be around on @GoCardlessEng on Twitter.
Thanks for reading, and good luck!