Docker Swarm: Cost-effective container scheduling

[Editor's Note] This article explores several container scheduling strategies and uses memory constraints as an example of how Docker Swarm can schedule containers sensibly through resource constraints. Container resource constraints come in two kinds: hard and soft. A hard constraint is an actual cap on memory. A soft constraint lets the container use memory freely while the server has enough of it, and takes effect only once memory becomes scarce. Combining hard and soft constraints reduces wasted resources while keeping the service stable.

We run hundreds of containers every day on hundreds of servers, and one of our biggest challenges is scheduling those containers efficiently. Container scheduling means deciding how containers are allocated across a set of servers so that services run smoothly. Because the containers we schedule are components of our customers' applications, we have to schedule them before we know anything about their performance characteristics.

Poor scheduling leads to one of the following outcomes:

  1. Allocating too many resources, which means higher costs.
  2. Allocating too few resources, which means poor stability for users.

Good scheduling matters to us because it lets us provide the best user experience in a cost-effective manner.

Random scheduling strategy

Initially, we used the same scheduling method as in our earlier products. This method, which predates Docker Swarm, did not constrain containers in any way; it simply picked a server at random.

However, running a full-stack environment is completely different from running a code snippet, and we soon found that this approach was not ideal. Our servers were frequently overloaded, running out of CPU and memory.

Hard constraints

We needed a new scheduler: one that no longer picked servers at random, that could constrain resource allocation, and that, ideally, was easy to deploy.

Fortunately, Docker Swarm has all of these features, and it had recently become stable enough for production use. We use the spread scheduling strategy to reduce the number of containers affected when a server fails. We also set up image-based affinity, so that similar containers can run on the same server.
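To make the combination of spread and image affinity concrete, here is a minimal Python sketch. It is not Swarm's implementation: the `Node` class and the `pick_node` and `schedule` functions are hypothetical names, and the ranking is simplified to container count only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Image name of each container currently running on this node.
    images: list = field(default_factory=list)


def pick_node(nodes, image):
    """Soft image affinity first, then spread (fewest containers wins)."""
    with_image = [n for n in nodes if image in n.images]
    # Soft affinity: if no node runs this image yet, consider every node.
    candidates = with_image or nodes
    return min(candidates, key=lambda n: len(n.images))


def schedule(nodes, image):
    """Place one container of `image` and return the chosen node's name."""
    node = pick_node(nodes, image)
    node.images.append(image)
    return node.name
```

A new image lands on the emptiest server, while a known image goes to a server that already runs it.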

We used Datadog's Docker integration to see in detail how containers were using resources. Datadog holds all the data we need to characterize the memory and CPU usage of each container, as well as the disk usage of each server.

With this data, we found that memory was our limiting resource (not CPU or disk), so we decided to schedule our containers on memory constraints. Based on the memory usage observed in Datadog, we set the memory constraint at the 99th percentile, which was 1GB. We can also manually override the constraint for individual containers. The constraint proved very effective! We no longer saw servers run low on memory or slow down under overload.
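As an illustration of picking a limit from a percentile, the sketch below computes a nearest-rank 99th percentile in Python. The sample values and the name `peak_mb` are made up for the example; in practice the numbers would come from monitoring data.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical peak memory usage per container, in MB.
peak_mb = [120, 200, 150, 310, 95, 180, 260, 1024, 140, 170]
hard_limit_mb = percentile(peak_mb, 99)
```

With only ten samples the 99th percentile is simply the largest value; a real data set would have far more samples, so that a few outliers fall above the limit.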

Soft constraints

After enjoying this newfound stability for a while, we noticed that the strategy over-provisioned our servers. Most containers actually used far less memory than the 1GB hard constraint, which meant we were paying for much more than we actually used.

We wanted to be more cost-effective, but we could not give up stability. Lowering the hard constraint was not a good option, because memory-hungry applications would crash when they hit it.

We needed constraints based on an estimate that could be exceeded when necessary. Fortunately, Docker provides the --memory-reservation option to set a soft memory constraint. With a soft constraint set, the container is free to use as much memory as it needs, but when there is memory contention on the server, Docker tries to shrink the container's memory back to the soft constraint value. Scheduling on soft constraints reduces waste, while a hard constraint still guards against runaway containers.

At the time, Swarm could not schedule on soft constraints, so we took the opportunity to use Go and built a custom fork of Swarm that schedules on soft memory constraints instead of hard ones. We then used the data collected in Datadog to choose a soft constraint threshold based on probability, and set the hard constraint to the maximum memory a container had used. This approach significantly reduced waste without affecting stability.
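The effect of scheduling on reservations rather than hard limits can be sketched as follows. This is a toy model in Python, not the Go fork described above: `Server`, `place`, and `count_placed` are hypothetical names, the 8192 MB servers are invented, and the 256 MB soft value is an assumed figure chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    capacity_mb: int
    reservations_mb: list = field(default_factory=list)

    def free_mb(self):
        """Memory not yet promised to any container."""
        return self.capacity_mb - sum(self.reservations_mb)


def place(servers, reserve_mb):
    """Reserve memory on the server with the most unreserved memory; None if nothing fits."""
    best = max(servers, key=lambda s: s.free_mb())
    if best.free_mb() < reserve_mb:
        return None
    best.reservations_mb.append(reserve_mb)
    return best


def count_placed(capacity_mb, n_servers, reserve_mb, n_containers):
    """Try to place n_containers identical containers and count the successes."""
    servers = [Server(capacity_mb) for _ in range(n_servers)]
    return sum(place(servers, reserve_mb) is not None for _ in range(n_containers))


HARD_MB = 1024  # hard constraint from the article
SOFT_MB = 256   # assumed soft constraint, for illustration only

by_hard = count_placed(8192, 2, HARD_MB, 100)
by_soft = count_placed(8192, 2, SOFT_MB, 100)
```

Under these assumptions the same two servers accept 16 containers when the scheduler reserves the hard limit and 64 when it reserves the soft value. The hard limit would still be enforced on each container at run time.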

Dynamic constraints and beyond

For us, the coolest feature in Docker 1.12.0 is the ability to schedule on soft constraints. Although the release is still pending, we have tried it ahead of time. You can schedule on a soft constraint with the following command:

  docker service create --reserve-memory <SOFT_LIMIT> <IMAGE>

Given the success of soft constraints, our next step is to choose soft and hard constraints dynamically for each container. Because all of our data already flows into Datadog, a query can give us the ideal soft and hard constraint thresholds, keeping containers stable without wasting resources. Follow this blog and we will let you know how it turns out!
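One possible shape for such a per-container calculation is sketched below in Python. It follows the rule described earlier (soft constraint from a percentile, hard constraint from the observed maximum). The function name `suggest_limits`, the 90th-percentile default, and the sample data are all assumptions made for the example, not the authors' actual query.

```python
import math


def suggest_limits(samples_mb, soft_pct=90):
    """Return (soft, hard) in MB: nearest-rank percentile and observed maximum."""
    if not samples_mb:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mb)
    rank = max(1, math.ceil(soft_pct * len(ordered) / 100))
    return ordered[rank - 1], ordered[-1]


# Hypothetical memory samples (MB), keyed by container name.
usage = {
    "web": [110, 130, 125, 140, 400, 120, 135, 128, 132, 138],
    "db": [600, 640, 620, 900, 610, 630, 615, 625, 635, 605],
}

limits = {name: suggest_limits(samples) for name, samples in usage.items()}
```

Each container gets its own pair of values, so a small web container no longer reserves as much memory as a database.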

Source: Cost-efficient container scheduling with Docker Swarm (translation: Chen Yan'e, proofreading: Huang Shuai)


Translator introduction: Chen Yan'e is a senior engineer at the Information Development Center of Anshan Iron and Steel Group Mining Company, focusing on virtualization technology.
