A chat about cluster management models

Docker is hot right now, almost unreasonably so. Why? Because it is a technology that can flip the table. For the workers and machine-tool makers of the deployment automation industry, the so-called core technologies that every shop guards (the CMDB, distributed script execution, and so on) will be reduced to second-rate technology, even if they do not become yesterday's news outright. Treating Docker as nothing more than a lightweight VMware misses its essence. To understand what Docker means, we should not start from what Docker can do. Let us first recall how cluster management models have evolved, and the drawbacks of the older models.

The era of manual management

IP addresses live in an Excel spreadsheet. Management means logging in to a jump host and connecting to the servers over SSH. Commands are run by hand to deploy new servers, upgrade existing ones, and apply all kinds of configuration changes.
The drawbacks are self-evident, and there are just a few:

  • Lack of consistency: because the work is manual, there are always small differences between servers
  • Inefficiency: the number of servers one person can manage is very limited

The era of the automation great leap forward

Business growth soon pushes the number of machines beyond what manual operations can handle. However weak the team, once the business has grown to that point, a crop of automation tools appears that use scripts to automate execution and support the business quickly. This era is a golden age, one in which operations teams really earn credit. Without automated operations technology the business hits a bottleneck, so the introduction of automation shows up directly as a benefit to the business.

This era is characterized by two key systems:

  • The IP addresses that used to live in a local Excel spreadsheet are now managed in a database, called the CMDB
  • A distributed script execution platform based on SSH or agent

Inefficiency is no longer the main problem. The main drawbacks become:

  • Lots of scripts, messy and repetitive, whose quality is hard to guarantee, and which ultimately plant the seeds of failures
  • There is no definition or management of the expected state of the production environment. The current state is entirely the product of accumulated history, which leads to server state drift and snowflake servers (no two machines are the same), and in turn leaves hidden risks for business stability

These drawbacks do no immediate damage to the business; they are internal injuries. Even when the hidden risks do surface, they tend to be blamed on a lack of discipline, a lack of operations awareness, and so on. Very few people trace them back to a problem with the operations philosophy itself. The result is that most companies stay at this stage. After all, operations is a supporting function where good enough is enough. However high-tech operations becomes, and however high the availability, it may have little direct bearing on whether a startup succeeds.

The era of the developers' revolution

Along with DevOps came infrastructure as code. Put simply, a bunch of developers charged into the operations field and saw that what operations people were really doing was managing the state of the production environment. So they brought their experience of writing code with them: model the state of production (that is the "code" part) and commit the expected state to version control. Server configuration is managed just like code.

Many small startups led by back-end developers skipped the previous era entirely; their operations automation was built on Puppet and Chef from day one. In all fairness, they use Puppet mostly because they carry no historical baggage, not because their operations problems are complex. There are teams that manage fewer than ten machines and still burn a lot of time working out how to use Puppet/Chef. Conversely, many large companies, with heavy historical baggage and huge traditional operations teams, cannot get the developers' revolution off the ground at all.

This approach mainly solves the script management problem, and because the state of production is defined directly, consistency between servers is much better. But underneath the glossy model it is still a pile of scripts doing the driving. The drawbacks of the previous era have only been wrapped and improved, not eradicated.

Applying the expected state to production still relies on running scripts. The difference from before is that you are now mostly running someone else's cookbooks, and their quality varies widely.

Although the expected state of production is defined, the upgrade operations required can be completely different depending on the starting point (for example a => c versus b => c). Writing a script that covers every case is very hard.
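The a => c versus b => c point can be made concrete with a little counting. This is only a sketch under a simplifying assumption: every ordered pair of (starting state, target state) may need its own upgrade logic, while replacing the whole environment (the approach discussed later in this article) only depends on the target. The function names are made up for illustration.

```python
from itertools import permutations


def script_upgrade_paths(versions):
    """Every ordered (start, target) pair may need its own upgrade script."""
    return list(permutations(versions, 2))


def replacement_targets(versions):
    """When the whole environment is replaced, only the target matters."""
    return list(versions)


versions = ["a", "b", "c"]
print(script_upgrade_paths(versions))  # includes both ('a', 'c') and ('b', 'c')
print(replacement_targets(versions))
```

With n versions the script-driven model has up to n * (n - 1) paths to get right, which is why an exhaustive script is so hard to write.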

What problems remain?

Consistency and stability are the biggest problems. Once a server is powered on, its system is not reinstalled for years. Countless people run scripts on it, execute commands, and troubleshoot problems. The actual state of the server cannot be controlled precisely. Infrastructure as code is an improvement, but it has not eradicated the problem. Every script run on a server is really a gamble, because no two servers are exactly the same. A script that passes testing on one machine will not necessarily be harmless on another. This cannot be solved by insisting that code must never contain rm * or rm path/*.

Version management practically does not exist. People with a development background may use git/svn as the baseline for deployment and commit the base versions to a repository. More common is the rsync model. Rsync means that to set up a new server you have to find the existing server that "looks most like it", copy its files to the new server, change the configuration, and start it up. The Ctrip outage, I personally guess, was probably related to chaotic version management.

Fault replacement is very difficult. Never mind replacement; just taking a faulty machine out of service is a headache. Take ZooKeeper: every client hard-codes three IP addresses. When one of those IPs goes down, ZooKeeper itself keeps working thanks to its high-availability protocol, but in the long run the dead IP still has to be removed from every consumer, which means changing all of them. And if the service's high availability was not done well, and operations has to step in and replace the faulty machine after the fact, it turns into a round of scripts wrangling all kinds of configuration files.

How Docker flips the table

Two things hold once we enter the Docker era:

  • The CMDB is no longer important. The CMDB, along with IPs and server resources, becomes a concern of the blue-collar workers at the lower layer. Back-end developers and business operations at the upper layer no longer need an IP-centric CMDB to manage configuration.
  • The distributed script execution platform retreats from the core of operations. The reason is simple: servers no longer need to be changed. Routine tasks such as provisioning a new server or releasing a new version no longer depend on running scripts on an existing server to modify its state. Instead, a new container is created.

Docker is in essence a true version management tool. Before Docker, version management was a patchwork. What is a version? A server consists of three parts: version, configuration, and data. The version is the operating system and its configuration, the various third-party packages, the executables produced by development, and some configuration files. Together these form a version, which is in effect a complete executable environment. Beyond that there is usually a database that holds two things: configuration that administrators can change from a web page, and business data.

In the Puppet era a version is a declarative file. Executing that declaration means installing an operating system from an ISO, then using apt-get/yum to install a pile of system packages from a package mirror, then using pip/bundle to install a pile of Python/Ruby language-level packages, and finally pulling the developers' code from your git/svn or from some tar.gz of unknown origin. Do you think what gets assembled this way is the same version every time? Not necessarily. If the firewall blocked GitHub, who knows how many people would be unable to release. A Docker image, which packages everything including the operating system, is the best expression of what a version is.

With Docker there is no longer any need to modify containers in production. If a container needs to be upgraded, kill it and start a new container from the new image prepared in advance. Distributed script execution becomes distributed container replacement. This kind of standardized operation is already handled very well by Mesos and Marathon.
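The replace-instead-of-modify idea can be sketched as a toy model. This is not how Mesos or Marathon implement it; the Container class, the start helper, and the image tags are all hypothetical, and real orchestration also has to deal with health checks and capacity.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)  # frozen: a container is never modified in place
class Container:
    id: int
    image: str


_next_id = count(1)


def start(image):
    """Pretend to start a fresh container from an image."""
    return Container(id=next(_next_id), image=image)


def rolling_replace(running, new_image):
    """Upgrade by replacement: one new container per old one."""
    result = list(running)
    for i, _old in enumerate(running):
        # The old container is simply dropped; nothing is patched.
        result[i] = start(new_image)
    return result


old = [start("app:1.0") for _ in range(3)]
new = rolling_replace(old, "app:2.0")
print([c.image for c in new])
```

The upgrade never runs a script inside an existing container, so there is no accumulated state to drift.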

With Docker, management can no longer be IP-based. It is not that a container cannot be assigned its own IP; it is that the static IP model cannot keep up with the times. IP-based management implies logging in to that IP over SSH to manage it, an idea that is backward to the bone. Process, process group, module, set: these are the granularities of management. Which container on which IP a process runs in no longer matters. A picture illustrates the point:
After clicking the scale-out button in the picture above, do you have to fill in IPs? No! You just tell Marathon that you want 32 process instances, and it finds the resources to run those 32 instances. What the business ultimately needs is 32 processes, not 32 IPs. IPs are merely the resources needed to run processes. In practice the processes may run on 32 ports of a single IP, or be spread at random across five IPs with several ports on each. These placements can of course be expressed with "constraints". You are not asked to obtain 32 IPs and then run a script to deploy the processes onto them.
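As a rough illustration of "tell Marathon I want 32 instances", the request might look like the sketch below. The field names follow the Marathon app definition format as I understand it, and should be checked against the documentation for your Marathon version. The app id, image name, and resource numbers are made up, and the helper function is hypothetical.

```python
import json


def marathon_app(app_id, image, instances, constraints=()):
    """Build an app definition that asks for N instances, not N IPs."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": 0.25,  # assumed resource request per instance
        "mem": 128,    # assumed memory per instance, in MB
        "container": {"type": "DOCKER", "docker": {"image": image}},
        # Placement rules are expressed as constraints, not as host lists.
        "constraints": [list(c) for c in constraints],
    }


app = marathon_app(
    "/demo/web",        # hypothetical app id
    "example/web:1.0",  # hypothetical image
    instances=32,
    constraints=[("hostname", "GROUP_BY")],  # spread across hosts
)
payload = json.dumps(app, indent=2)
print(payload)
```

Note what the definition does not contain: no address of any server. Scaling out is a change to one number.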

The Missing Piece

One last piece of the puzzle is still missing. Docker is absolutely qualified as a versioning tool. Marathon is also dependable at hosting all the processes running in Docker. But the picture is not complete:

  • A Docker image released to production as a version cannot simply run, because any application has at least several services that call each other. An image with hard-coded IP addresses will not work once it moves to another environment. A version may hard-code any configuration except IP addresses and ports.
  • Containers can be created and destroyed with ease, but what about the other containers that reference the service in this container?
  • Releases and fault replacement raise the same problem

The solution is shown in these two pictures:

The scheme is very simple: app1 = local => haproxy = network => haproxy = local => app2. HAProxy, running locally in the container, "hosts all the ports". That is, HAProxy makes the connections between processes, rather than each process being responsible for connecting to other processes over the network.

Think about how things were before: the configuration file hard-coded the IP address of some database. Hard-coding is wrong and deserves a spanking. So we change the hard-coded IP address into a local address. This time we no longer hard-code any IP; we hard-code only a special port number. Each process has a set of special local port numbers through which it reaches the upstream and downstream services it needs. Which IP, which port, and which container actually sit behind a port number is something the user does not need to care about, and no code has to change (for example, to integrate with ZooKeeper/etcd or the like). Behind the port there may even be several remote IPs forming a client-side high-availability group. The proxy can even retry on another back end when one fails.

With this tool, scaling out, releasing changes, and replacing failed machines all become easy. Containers can be added and removed at will. When the network topology changes, refresh the HAProxy configuration wherever it is needed. All kinds of gray releases and zero-downtime replacement schemes become possible.
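"Refreshing the HAProxy configuration" can be sketched as rendering a config file from a table of services. This is a minimal sketch, not the SmartStack implementation: the service table, the local port 10010, and the 10.0.0.x addresses are invented for illustration, and a real setup would also have to write the file and reload HAProxy.

```python
def render_haproxy(services):
    """Render a TCP-mode HAProxy config.

    services: {name: {"local_port": int, "backends": [(ip, port), ...]}}
    Each service gets one fixed local port; only the backends change.
    """
    lines = [
        "defaults",
        "    mode tcp",
        "    timeout connect 5s",
        "    timeout client 1m",
        "    timeout server 1m",
        "    retries 3",
        "    option redispatch",  # allow a retry to go to another server
    ]
    for name in sorted(services):
        svc = services[name]
        lines.append("")
        lines.append(f"listen {name}")
        lines.append(f"    bind 127.0.0.1:{svc['local_port']}")
        for i, (ip, port) in enumerate(svc["backends"]):
            lines.append(f"    server {name}-{i} {ip}:{port} check")
    return "\n".join(lines) + "\n"


before = render_haproxy(
    {"db": {"local_port": 10010, "backends": [("10.0.0.1", 3306)]}}
)
after = render_haproxy(
    {"db": {"local_port": 10010, "backends": [("10.0.0.2", 3306)]}}
)
print(after)
```

The application keeps connecting to 127.0.0.1:10010 in both cases; only the rendered server lines differ, which is what makes replacing a back end invisible to the application.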

Name services and the network

There are many similar schemes. The lowest-level ones are SDN/IP drift and network bonding. What characterizes them is that they keep the IP address as the most traditional name service, in a vain attempt to prolong its life.

One level up is DNS. Higher still is ZooKeeper.

What the various schemes compete on is how services register themselves and how they are discovered. You can read up on their advantages and disadvantages yourself:

  • SmartStack: Service Discovery in the Cloud

By the way, Airbnb put this scheme into production back in 2013.

The most interesting exercise is to compare this HAProxy scheme with the SDN-based IP drift scheme. HAProxy treats the network as the wiring between application-layer processes, and introducing HAProxy makes that wiring more flexible. The SDN approach says: the connections between your business processes are static links between IPs, those links are not flexible enough, so let the router sort it out for you. When an IP goes down, the IP can drift to another machine and carry on. In effect it re-wires two processes in one particular scenario, breaking the restriction of static IP-to-IP connections and prolonging the life of IP-based deployment schemes.

The underlying technologies are the same. So-called IP drift ultimately relies on powerful modern CPUs and on software routing. In the end it all comes down to forwarding in user space, DPDK and the like. So concerns such as HAProxy being slow or forwarding being inefficient will not be problems in the long run. Doing the wiring in software is the trend. Even routers are starting to work this way, and even hardware vendors have started selling software.

The Final Battle

Cluster management becomes pure process management. IPs no longer matter, state no longer matters, and the CMDB will be increasingly marginalized.

Releasing a change no longer means modifying servers; it means creating and destroying containers and updating the network connections between processes. Distributed job execution systems will be used less and less, and using a jump host will be allowed even less.

Remember the term "immutable servers". History will eventually vindicate it.
