Orchestration tools are often used when scaling out an application stack. In a Docker environment, tools like Kubernetes, Mesos and Docker Swarm have typically been used for this purpose. Docker has brought significant updates to its orchestration offering with its latest release. In this blog post, we’ll give a contextual overview of the orchestration features offered in the Docker 1.12 release and discuss our experience of trying to set up and run replicated MySQL instances under Docker Swarm mode. We’ll see that although Swarm mode works fine for stateless servers, it isn’t (yet) a good fit for running multi-instance database services where the instances have differing configs and where the role of each instance may change over time. Our hope is that this will change as the tooling evolves, because the orchestration approach has clear advantages in such setups.
Docker Swarm Mode
Up until now, Docker Swarm has been a standalone product, separate from the Docker Engine typically used for running containers. With the latest Docker 1.12 release, a “swarm mode” has been introduced as a native part of the Docker Engine. The stated objectives of Docker Swarm mode are to reduce the complexity of multi-host and multi-container application deployment and to support scalability, resilience and redundancy in such deployments. Setting up a cluster, joining nodes to it and launching your application should only take a few terminal commands, and subsequently scaling your setup or performing e.g. rolling updates should be similarly easy.
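To give a feel for what “a few terminal commands” means, the sketch below assembles the Docker 1.12 CLI invocations for initializing a swarm, joining a worker, creating a replicated service, scaling it and rolling out a new image. It only builds the argument lists and prints them; it does not run anything. The IP address, join token, service name and image tags are placeholders made up for illustration, not values from our setup.

```python
def swarm_init(advertise_addr):
    """Command run on the first node to create the swarm and make it a manager."""
    return ["docker", "swarm", "init", "--advertise-addr", advertise_addr]


def swarm_join(token, manager_addr, port=2377):
    """Command run on every additional node to join the existing swarm."""
    return ["docker", "swarm", "join", "--token", token,
            "{}:{}".format(manager_addr, port)]


def service_create(name, image, replicas=1, publish=None):
    """Command that creates a replicated service from a single image."""
    cmd = ["docker", "service", "create",
           "--name", name, "--replicas", str(replicas)]
    if publish:
        # e.g. "8080:80" exposes container port 80 on port 8080 of the swarm
        cmd += ["--publish", publish]
    cmd.append(image)  # the image is the positional argument after the options
    return cmd


def service_scale(name, replicas):
    """Command that changes the number of replicas of a running service."""
    return ["docker", "service", "scale", "{}={}".format(name, replicas)]


def service_rolling_update(name, image):
    """Command that rolls the service over to a new image."""
    return ["docker", "service", "update", "--image", image, name]


# Placeholder values, for illustration only.
commands = [
    swarm_init("192.0.2.10"),
    swarm_join("SWMTKN-example-token", "192.0.2.10"),
    service_create("web", "nginx:1.10", replicas=3, publish="8080:80"),
    service_scale("web", 5),
    service_rolling_update("web", "nginx:1.11"),
]
for cmd in commands:
    print(" ".join(cmd))
```

To actually execute the commands, each list could be passed to `subprocess.run` on the appropriate node.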
Swarm mode uses a manager/worker model. A manager node distributes tasks across worker nodes, and each worker node executes the tasks assigned to it. When you deploy your application in Swarm mode, you create what is called a service, which defines the tasks to be run on the worker nodes. The manager assigns tasks to the workers in the form of a Docker container and the commands to run inside it. If a worker node goes down, the manager will reschedule that node’s tasks on the remaining nodes. A running task is never migrated, though: once it has been started on a worker node, it either keeps running there or fails in place. A consequence of this is that a rescheduled task runs in a brand-new container that starts with a clean slate. Swarm mode also sports an internal load balancer: when you connect to your exposed port, Docker will automatically distribute your requests across the cluster.
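The rescheduling behaviour can be illustrated with a small toy model. This is not Docker code: the `Manager` and `Task` classes are hypothetical, and the "least loaded node" placement rule is a simplification that we assume only for the sake of the example. What it shows is that a task lost with its node comes back as a new container with none of the old container’s state.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Task:
    service: str
    slot: int
    container_id: int
    node: str
    # Data written inside the container; it is not carried over on replacement.
    state: dict = field(default_factory=dict)


class Manager:
    """Toy swarm manager: places tasks and replaces them when a node dies."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.tasks = []
        self._ids = itertools.count(1)

    def _least_loaded(self):
        if not self.nodes:
            raise RuntimeError("no worker nodes left to schedule on")
        load = {node: 0 for node in self.nodes}
        for task in self.tasks:
            load[task.node] += 1
        # sorted() makes tie-breaking deterministic for the example
        return min(sorted(load), key=load.get)

    def _start(self, service, slot):
        task = Task(service, slot, next(self._ids), self._least_loaded())
        self.tasks.append(task)
        return task

    def create_service(self, service, replicas):
        return [self._start(service, slot) for slot in range(1, replicas + 1)]

    def node_down(self, node):
        """Drop the node, discard its tasks and start fresh replacements."""
        self.nodes.discard(node)
        lost = [t for t in self.tasks if t.node == node]
        self.tasks = [t for t in self.tasks if t.node != node]
        return [self._start(t.service, t.slot) for t in lost]


manager = Manager(["node-a", "node-b", "node-c"])
tasks = manager.create_service("mysql", 3)
victim = tasks[1]
victim.state["rows"] = 42            # data that lives only in the container

replacements = manager.node_down(victim.node)
print(replacements[0].node, replacements[0].state)
```

The replacement task keeps its slot in the service but gets a new container ID, lands on a different node, and has an empty `state`.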
Swarm Mode and MySQL Replication
We wanted to explore the possibilities of using Docker Swarm mode for setting up and running replicated MySQL instances. We started out by creating a Docker overlay network for our servers to connect to. We then created two different Swarm mode services, one service for the MySQL master and another for the MySQL slaves. With these in place, we could set up replication by following the steps in the MySQL documentation. While pretty straightforward, this setup does have some issues. First and foremost, Swarm mode services create homogeneous instances, while each MySQL replication node needs a slightly customized setup. An example is that each slave instance needs a unique
To get around this, we could specify the same options for all our instances at launch and then configure each instance individually at runtime. Another alternative would be to use the startup script to perform instance-specific actions, but setting up and managing servers from bash can easily become complicated. This setup, relying on two different services, also makes managing failover impossible: a slave cannot be promoted to master, because containers can’t be transferred from one service to the other. This is of course a major issue in a replication setting, and we spent some time investigating how to work around it.
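As a rough illustration of the startup-script route, here is a minimal sketch of how a container could generate its own per-instance config fragment. It uses the server ID as the example setting, since MySQL requires it to be distinct on every server in a replication topology. Deriving the ID from the container’s IPv4 address is our assumption for this sketch, not something we tested in the setup described here, and the function names are hypothetical. `server-id`, `log-bin` and `read-only` are standard MySQL options.

```python
import ipaddress
import socket


def server_id_from_ip(ip):
    """Map an IPv4 address to an integer usable as a MySQL server ID.

    An IPv4 address is a 32-bit value, so distinct addresses on the same
    network give distinct IDs within MySQL's allowed range.
    """
    value = int(ipaddress.IPv4Address(ip))
    if value == 0:
        raise ValueError("0.0.0.0 does not yield a usable server ID")
    return value


def render_my_cnf(ip, role):
    """Return a my.cnf fragment for one instance; role is 'master' or 'slave'."""
    if role not in ("master", "slave"):
        raise ValueError("unknown role: {!r}".format(role))
    lines = ["[mysqld]", "server-id={}".format(server_id_from_ip(ip))]
    if role == "master":
        lines.append("log-bin=mysql-bin")   # master must write a binary log
    else:
        lines.append("read-only=1")         # refuse ordinary client writes
    return "\n".join(lines) + "\n"


def detect_own_ip():
    """Best-effort lookup of this container's address (assumption: the
    hostname resolves to the address on the network we care about)."""
    return socket.gethostbyname(socket.gethostname())


# Example with a made-up overlay network address:
print(render_my_cnf("10.0.0.3", "slave"), end="")
```

A startup script would call `detect_own_ip()` instead of using a literal address and write the result into the config directory before starting `mysqld`. A container attached to several networks may need a more careful address lookup.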
What we ended up with was creating a single service for all of our MySQL instances. This makes it possible to fail over to a new replication master instance. The downside is that this exacerbates the problems we have with managing the required config variations between the different MySQL instances in the setup: we would now also have to find a way to manage the much larger differences between the master and the slave nodes.
In a typical replication setting, you would perform read and write operations on your master and read-only operations on the slaves. With the current setup, we can’t differentiate between a master and a slave when connecting from outside the overlay network, as the connections are load balanced across the cluster. The result is that an application has to run inside the network in order to tell the MySQL instances apart, so that it can perform write operations on the master and read operations on the slaves.
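The read/write separation that such an application has to do for itself can be sketched as a tiny router. This is a hypothetical helper and not part of our setup. The endpoint addresses are made up, and the rule for recognizing reads (statements starting with SELECT or SHOW, excluding SELECT ... FOR UPDATE) is a deliberately crude assumption. Anything not recognized as a read goes to the master.

```python
import itertools


class ReplicationRouter:
    """Pick a MySQL endpoint per statement: writes to master, reads to slaves."""

    READ_KEYWORDS = ("SELECT", "SHOW")

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def is_read(self, sql):
        words = sql.strip().upper().split()
        if not words or words[0] not in self.READ_KEYWORDS:
            return False
        # SELECT ... FOR UPDATE takes locks, so treat it like a write.
        return "FOR UPDATE" not in " ".join(words)

    def endpoint_for(self, sql):
        if self._slaves is not None and self.is_read(sql):
            return next(self._slaves)
        return self.master


# Made-up addresses on the overlay network:
router = ReplicationRouter("10.0.0.2:3306", ["10.0.0.3:3306", "10.0.0.4:3306"])
for statement in ("INSERT INTO t VALUES (1)", "SELECT * FROM t", "select 1"):
    print(statement, "->", router.endpoint_for(statement))
```

The router only decides where a statement should go; knowing which address currently belongs to the master is still something the application has to find out on its own.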
Bottom line: Even though a setup based on a single service definition fixes the failover issue, it does not bring us much closer to a really viable overall solution for using Docker’s Swarm mode to manage a MySQL replication setup.
Swarm Mode and Data Persistence
The other key area where our scenario seems to go beyond the current capabilities of Swarm mode is persistent data storage. In a typical Docker scenario, data persistence can be accomplished using volumes, host directories or a separate data volume container. When running in Swarm mode, these storage options either aren’t available or have undesirable traits. As mentioned, the tasks in a Swarm setup are distributed between nodes in the form of containers and commands to run inside these containers. When a node fails, the manager will discard the containers and create new containers in place of the old ones. Since a container is discarded when it goes down, we’d lose the corresponding data volume too. So Docker data volumes will not work in our scenario.
As we have covered above, Docker badly wants all instances of a service within a cluster to be identical, and there’s no easy way to specify individual options for separate instances. As a consequence we can’t use the host directory storage feature, since that would in practice mean that we would be specifying the same data directory for all our instances, and that won’t work for obvious reasons. Finally, storage containers can’t be used in Swarm mode as of the writing of this post, and custom volume drivers for Swarm mode haven’t reached a mature stage yet.
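To make the host directory problem concrete, the sketch below checks a task placement for instances that would end up sharing one data directory. Because the service definition is a single template, every task gets the same host path. The function name, task names, node names and path are hypothetical.

```python
from collections import defaultdict


def bind_mount_conflicts(placement, host_dir):
    """Return the (node, directory) pairs that more than one task would use.

    placement maps a task name to the node it runs on. Every task uses the
    same host_dir, since the service spec cannot vary it per instance.
    """
    users = defaultdict(list)
    for task, node in sorted(placement.items()):
        users[(node, host_dir)].append(task)
    return {key: tasks for key, tasks in users.items() if len(tasks) > 1}


# Three MySQL tasks on a two-node cluster: two of them must share a node.
placement = {"mysql.1": "node-a", "mysql.2": "node-b", "mysql.3": "node-a"}
conflicts = bind_mount_conflicts(placement, "/srv/mysql-data")
for (node, path), tasks in conflicts.items():
    print("{} on {} is shared by {}".format(path, node, ", ".join(tasks)))
```

Even a placement without such a collision is fragile: a task that is rescheduled to another node finds whatever happens to be in that node’s directory, not its own data.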
With no viable persistent storage options, we would basically lose all our data if all our nodes went down at the same time. Provisioning new slave instances would also be impractical and potentially very time-consuming, for the same reasons.
The current Swarm mode implementation for Docker is easy to use and works very well with easily generalizable service instances. For a replicated MySQL setup, however, where instances need a certain degree of custom configuration and where data persistence is a must, some more functionality is needed in Docker. We believe Swarm mode will represent a very attractive option for managing multi-node MySQL setups if future versions of the tooling have viable persistent storage support and native functionality that can deal with service instance customization.