Understanding Docker ulimit in Depth

[Editor's Note] Anyone who runs Docker at scale and claims never to have hit a pitfall is unlikely to be believed. Yesterday we ran into a classic ulimit problem: the ulimit value inside a business container was too small, so the service failed to start. ulimit problems are an old topic, but they show up in surprising ways in different scenarios and environments, and often need careful analysis to find the cause. This one is closely tied to the OS version, the Docker version, and the way the limit is configured, so it is worth reviewing.

The Problem

1. Background

After three quarters of Dockerization work, the microblogging platform's services have been running stably for more than six months. Because we use fairly conservative versions, the pitfalls we did hit were all manageable. We are now working with RD on a large project whose prerequisite is that all platform services run on Docker; that part is about 90% complete. The basic information is as follows:
1) OS version: CentOS 6.5
2) JDK: 1.7.0_25 Tomcat: 7.0.42
3) Docker: 1.3.2
4) Docker Registry: 1.0
In parallel, we are working on upgrading the OS to CentOS 7 and Docker to 1.6.2.

2. Symptoms

After a server reboot, starting the business container fails, and the failure can be reproduced.

PS: the problem does not appear when the deployment is done through the operations system.

3. Reproduction

The main conditions for reproducing the problem are:
1) Versions: OS (CentOS 6.5), Docker 1.3.2, with the Docker daemon started at boot.
2) Host configuration: ulimit set to 200000, configured in /etc/profile.
3) Steps: manually reboot the machine, log in, and start the business container. It starts and fails shortly afterwards.

4. Analysis

1) The server reboot itself completes normally.
2) The business container fails shortly after it starts. Analysis shows that the ulimit inside the container is wrong: it only gets the default value of 1024. Why it is 1024 is explained later.
3) After restarting the Docker daemon and then starting the container, everything is normal. The container's ulimit is now the 200000 configured on the host.

The symptoms are clear and reproducible, so the fix is simple and there are several ways to do it. They are described later.

5. Summary

In short: after a server reboot the Docker daemon starts with the system, and a container started at that point fails because it does not get the ulimit value configured on the host. Restarting the Docker daemon and then starting the container works (PS: this is not a real fix, just a blind cat stumbling on a dead mouse).

ulimit Fundamentals

Anyone who has worked as an SA, or any RD who has managed servers, has probably met ulimit problems in various forms. The theory is simple; see the blog of Chu Pa at Taobao, who wrote four articles on it in ample detail. In 2013 I asked him a question and he analyzed it for me from the source code; that spirit of sharing is admirable. I will not expand on it here.

Linux Fundamentals: System Boot and Environment Variable Loading Order

1. Linux system boot

Let's go straight to the boot process on CentOS (the RedHat family). Current Linux systems mostly use one of two init systems: init (SysVinit) and Systemd; for a comparison of the two, see this article. Systemd is used mainly in CentOS 7 and later, while earlier releases use SysVinit. Our problem occurred on CentOS 6.5, so SysVinit applies. The boot process is shown below:
[Figure: SysVinit boot process]
The specific steps:
1) Load the BIOS hardware information and execute the BIOS built-in program.
2) Read the boot information in the Boot Loader in MBR (Master Boot Record).
3) Load the kernel into memory and boot it.
4) The kernel runs /sbin/init, which loads /etc/inittab and runs rc.sysinit for initialization.
5) Load the kernel modules listed in /etc/modules.conf.
6) Run the scripts under /etc/rc.d/ for the current runlevel (servers default to 3), that is:

  [guansheng@xx-xx-xx-yf-core rc3.d]# pwd
  /etc/rc.d/rc3.d/

This step starts every service that chkconfig --list shows as on for runlevel 3.
7) Run the /bin/login program.

At this point the tty login prompt appears.

2. Linux environment variable loading order

The loading order of environment variables is similar across distributions; here I only cover the RedHat family. The general order is:

  -> /etc/profile              # global environment variables, applied when each user first logs in
  --------> ~/.bash_profile    # user-level environment variables, applied when the user first logs in
  --------> ~/.bash_login
  --------> ~/.profile
  -> ~/.bashrc                 # user-level environment variables, applied at each login and for each new shell
  -> /etc/bashrc
  -> ~/.bash_logout            # user-level, executed when the user logs out

Rule: a later file inherits the variables and shell settings of the earlier ones, and a setting with the same name is overridden by the later file.
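
The point that matters for this article is that these profile files are read only by a login shell. The following is a minimal sketch of that behavior, not taken from the original environment: DEMO_NOFILE and the temporary home directory are hypothetical names used only for illustration.

```shell
# Sketch: only a login shell reads the profile files.
# DEMO_NOFILE and the temporary HOME are hypothetical, for illustration only.
demo_home=$(mktemp -d)
echo 'export DEMO_NOFILE=200000' > "$demo_home/.bash_profile"

# Login shell (-l): reads /etc/profile, then ~/.bash_profile.
login_value=$(HOME="$demo_home" bash -lc 'echo "${DEMO_NOFILE:-unset}"' 2>/dev/null | tail -n 1)

# Non-login shell: reads neither file, much like a service started at boot.
plain_value=$(HOME="$demo_home" bash -c 'echo "${DEMO_NOFILE:-unset}"' 2>/dev/null | tail -n 1)

echo "login shell:     $login_value"
echo "non-login shell: $plain_value"
rm -rf "$demo_home"
```

A setting placed in /etc/profile therefore reaches processes started from a user's login session, but not processes started by the init system before anyone logs in.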

3. Background on Docker ulimit

Docker only added ulimit-related options in version 1.6; judging from GitHub, someone submitted a PR and official support followed. Before version 1.6, a Docker container inherited its ulimit settings from the Docker daemon. See the ulimit section of the Docker blog for reference.
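
The inheritance described here is ordinary process behavior: a child process starts with the resource limits of its parent. A minimal sketch, with no Docker involved, where the subshell stands in for the daemon and the child sh for a container; the value 256 is arbitrary:

```shell
# Sketch: a child process inherits the open-file limit of its parent.
# The value 256 is arbitrary and chosen only for the demonstration.
child_limit=$(
  ulimit -n 256      # lower the limit in this subshell only (stands in for the daemon)
  sh -c 'ulimit -n'  # the child (stands in for a container) reports what it inherited
)
echo "child sees: $child_limit"
```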

Reviewing the Problem

With the background above, reviewing the problem is simple. Briefly:

1. We rebooted the server manually. As the boot process above shows, the Docker daemon is started during system startup, when no user has logged in. The ulimit setting in /etc/profile is therefore never read, and the Docker daemon starts with the default value of 1024.

2. Because the Docker version is 1.3.2, a container created afterwards inherits its value from the Docker daemon, so the ulimit seen inside the container is only 1024. The business depends on a large number of connections to mc, mcq, Redis, MySQL, HTTP and so on, so 1024 is not enough and startup fails.

3. When a user logs in and restarts the Docker daemon, the new process naturally picks up the user's environment, so its ulimit is 200000. A container started after that has no problem.

PS: the review is simple, but without understanding the principles above many people remain confused; at least that is what I have seen.
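
One way to confirm this kind of diagnosis is to read the limits the kernel reports for a running process in /proc/<pid>/limits. The sketch below assumes that file's usual column layout; open_file_limits is a hypothetical helper, and the sample data is made up so the snippet runs without Docker.

```shell
# Sketch: print the soft and hard "Max open files" values from a file
# in the layout of /proc/<pid>/limits. open_file_limits is a hypothetical helper.
open_file_limits() {
  awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Made-up sample in the /proc/<pid>/limits layout, so the sketch runs anywhere.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Limit                     Soft Limit           Hard Limit           Units
Max processes             7810                 7810                 processes
Max open files            1024                 4096                 files
EOF

open_file_limits "$sample"   # prints: 1024 4096
rm -f "$sample"
```

On a real host the argument would be the limits file of the daemon process, for example /proc/<pid of the Docker daemon>/limits, checked once after a reboot and once after a manual restart of the daemon.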

A New Problem

Exploring ulimit under CentOS 7

1. Problem description

After sorting out the problem on CentOS 6.5 with Docker 1.3.2, I wanted to try CentOS 7, so I deployed the same Docker 1.3.2 on CentOS 7 and tested it. A new problem appeared: with no ulimit configured on the host (that is, the default of 1024), a started container showed a ulimit of 1048576. After changing the host's ulimit, restarting the Docker daemon, and starting a container again, the container still showed 1048576. Strange.

2. Analysis

After reading the daemon startup part of the Docker source with a colleague, everything became clear. The Docker daemon's default ulimit settings differ greatly between system versions.

1) On CentOS 7, Systemd initialization automatically invokes the startup unit docker.service, which declares the following defaults:

  [Service]
  ExecStart=/usr/bin/docker -d -H fd://
  MountFlags=slave
  LimitNOFILE=1048576
  LimitNPROC=1048576
  LimitCORE=infinity

2) On CentOS 6, the Docker daemon startup script sets no default value. Reference: sysvinit-redhat.
3) On Debian-family systems, a default value of 1048576 is also set. Reference.
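
This also explains why changing the host's ulimit had no effect on CentOS 7: the value comes from the unit file. One common way to change a unit's limit without editing the shipped file is a systemd drop-in. The sketch below only generates such a file in a scratch directory; the file name limits.conf and the value 204800 are assumptions for illustration, not settings from the original environment.

```shell
# Sketch: generate a systemd drop-in that overrides LimitNOFILE for docker.service.
# The file name limits.conf and the value 204800 are assumptions for illustration.
write_docker_limit_dropin() {
  dropin_dir=$1
  nofile=$2
  mkdir -p "$dropin_dir"
  cat > "$dropin_dir/limits.conf" <<EOF
[Service]
LimitNOFILE=$nofile
EOF
}

# Written to a scratch directory here. On a real host the directory would be
# /etc/systemd/system/docker.service.d, followed by
# "systemctl daemon-reload" and "systemctl restart docker".
scratch=$(mktemp -d)
write_docker_limit_dropin "$scratch/docker.service.d" 204800
cat "$scratch/docker.service.d/limits.conf"
```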

ulimit settings in Docker 1.6:

In many cases this ulimit is too high for a single container. In Docker 1.6 you can set it in two ways.
1) A global default ulimit, with one --default-ulimit flag per limit:

  docker -d --default-ulimit nproc=1024:2048
  docker -d --default-ulimit nofile=20480:40960 --default-ulimit nproc=1024:2048

2) A separate ulimit when starting a container, with one --ulimit flag per limit:

  docker run -d --ulimit nofile=20480:40960 --ulimit nproc=1024:2048 <image name>

This introduction can deepen your understanding.

When setting Docker ulimits flexibly, a few more points deserve attention:

1) Docker containers drop the sys_resource Linux capability by default, so inside a container ulimit -n can only be lowered, not raised. Raising it fails with: ulimit: open files: cannot modify limit: Operation not permitted.
2) On CentOS 7, docker run can use the --privileged option to lift this restriction, but Docker drops the capability by default for security reasons, so avoid this option where possible.
3) On CentOS 6, --privileged cannot be used with Docker versions >= 1.0.1, otherwise it reports: stat /dev/.udev/db/cpuid:cpu0: no such file or directory.
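
The "lower but not raise" rule in point 1) can be tried in any shell, with no container needed. A minimal sketch with arbitrary values 256 and 512; whether the raise is refused depends on the privileges of the process running it, so the sketch reports both outcomes:

```shell
# Sketch: a process may lower its open-file limit, but raising it again
# needs extra privilege (the sys_resource capability). Values are arbitrary.
result=$(
  ulimit -n 256                         # lower soft and hard limit together: always allowed
  if (ulimit -n 512) 2>/dev/null; then  # try to raise it, in a nested subshell
    echo "raise allowed (this process has the privilege)"
  else
    echo "raise refused"
  fi
  echo "limit now: $(ulimit -n)"
)
echo "$result"
```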

Solution

With the discussion above, the questions raised should now be clear and explained. So on CentOS 6, besides manually restarting the Docker daemon, are there other fixes? Yes, several. Here is one briefly; the others follow the same idea.

If you use the SysV service, add one line at the beginning of /etc/init.d/functions: ulimit -u 204800 -HSn 204800.
This works because the Docker service startup script sources that file at its very start.

  [guansheng@xx-xx-xx-yf-core ~]# ll /etc/rc.d/rc3.d/ | grep docker
  lrwxrwxrwx 1 guansheng root 16 Jul 3 19:25 S95docker -> ../init.d/docker

Reader suggestions:

1) @ARGV pointed out that /etc/init.d/functions is sourced by every service the system starts, and recommended setting the limit directly in the ../init.d/docker startup script. The suggestion works; thanks for the correction. However, it amounts to modifying the Docker daemon's own startup script.
2) After discussing it with @dead wood-Linux, we found that the best approach is to modify the /etc/sysconfig/docker configuration file directly. Kudos for that.
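
The second suggestion works if the init script sources /etc/sysconfig/docker before it launches the daemon, which is an assumption to verify against the installed script. Under that assumption, the mechanism can be sketched with stand-in files; the value 256 is used only because lowering a limit needs no privilege, whereas a real host would use a line such as ulimit -n 204800 and run as root.

```shell
# Sketch: a config file sourced by an init script can set the limit for
# everything the script starts afterwards. Paths and values are stand-ins.
conf=$(mktemp)
echo 'ulimit -n 256' > "$conf"   # stands in for a line in /etc/sysconfig/docker

daemon_limit=$(
  . "$conf"            # what the init script is assumed to do with the config file
  sh -c 'ulimit -n'    # stands in for the daemon launched by the script
)
echo "daemon would start with: $daemon_limit"
rm -f "$conf"
```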

Summary

The problem appeared during the day and took a little over an hour to understand and resolve, which felt good. Late at night, with a clear head, I wanted to write it up as a long microblog post to share with everyone. The problem is not hard, but the spirit of sharing is always worth encouraging.

PS: this article was written quickly. If the reasoning is unclear or something is wrong, please point it out. Many thanks.
