DockOne Technology Sharing (37): Playing with Docker Images and Image Builds

[Editor's Note] This talk shares, from a personal point of view, some practical experience with Docker images and image builds. The main topics include using Docker Hub for online compilation, image download, dind in practice, and some thoughts on images.

This talk is mainly from the perspective of personal practice: the ways I have played with Docker images and what I learned from it. Most of the content is still experimental and has not been used in large-scale production, so please keep that in mind. Where it is incomplete or biased, corrections are welcome.

Images are arguably one of Docker's core values. An image consists of multiple layers, and any one layer can be looked at from two angles. Taken as a separate unit, a layer mainly contains two parts: files and configuration. Taken together with all of its parent layers, the whole represents a complete image.

The Docker images discussed in this article are mainly images built from a Dockerfile.

There are now a number of public container service providers, such as Docker Hub, that offer very convenient image build services. We no longer need to run docker build locally and can simply use their services to build images. Here I use Docker Hub as the example to introduce some unconventional uses. In practice you can also use some of the container service providers in China, such as DaoCloud.

Online compilation with Docker Hub

As we all know, a Docker image can describe the runtime of an application. For example, if we build a Tomcat image, the image contains the environment and dependencies needed to run Tomcat. Looked at more closely, though, a Docker image is not just a runtime: it provides an environment, a software stack. From this point of view, an image can be used not only to run an application but also to provide, for example, a compilation environment.

Compiling with Docker is not a new trick, because the Docker source code itself is compiled this way. The Docker project has a Dockerfile for this purpose, and you can use it to compile the code.

Here is an example. Below is a finished Dockerfile. test.c is a C source file that prints hello world.

  FROM centos:centos6
  RUN yum install -y gcc
  ADD test.c /
  RUN gcc /test.c

Build this image. Since the last step is the compile command gcc /test.c, the compilation itself runs on Docker Hub.

By writing a Dockerfile this way, the entire compilation process can be hosted on Docker Hub. When we commit new code and need to recompile, we only need to rebuild the image.

Image download

With Registry v1, the Docker client downloads the layers of an image serially. Analyzing the docker pull process shows that the client goes through the following steps:

  • /v1/repositories/{repository}/tags/{tag}: get the ID of the tag
  • /v1/images/{tag_id}/ancestry: get the ID of every layer of the tag
  • /v1/images/{layer_id}/json: get the JSON configuration file of each layer
  • /v1/images/{layer_id}/layer: get the data of each image layer, in order
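As a rough illustration, the requests listed above could be composed as in the following Python sketch. The registry address `registry.example.com` and the function name `v1_pull_urls` are placeholders invented for this example and are not part of any Docker tooling; only the paths come from the list above.

```python
def v1_pull_urls(base, repository, tag, tag_id=None, layer_ids=()):
    """Return the Registry v1 URLs that a pull walks through, in order.

    `base` is a hypothetical registry address. `tag_id` and `layer_ids`
    only become known after the first two requests have been answered,
    so they are optional here.
    """
    base = base.rstrip("/")
    layer_ids = list(layer_ids)
    # Step 1: resolve the tag to an image ID.
    urls = [f"{base}/v1/repositories/{repository}/tags/{tag}"]
    # Step 2: list the IDs of all layers behind that image ID.
    if tag_id is not None:
        urls.append(f"{base}/v1/images/{tag_id}/ancestry")
    # Step 3: fetch the JSON configuration of every layer.
    urls.extend(f"{base}/v1/images/{lid}/json" for lid in layer_ids)
    # Step 4: fetch the data of every layer.
    urls.extend(f"{base}/v1/images/{lid}/layer" for lid in layer_ids)
    return urls


if __name__ == "__main__":
    for url in v1_pull_urls("https://registry.example.com",
                            "library/centos", "centos6",
                            tag_id="abc", layer_ids=["abc", "def"]):
        print(url)
```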

Docker Hub does not store image data on its own servers; it uses Amazon's S3 service. So when you call the /v1/images/{layer_id}/layer endpoint to pull a layer's data, it returns a 302 that redirects the request to Amazon S3 for the download.

To make downloading easier, I wrote a small program that uses HTTP to simulate the Docker client's whole process. The advantage of writing your own is that you can first get the tag's ID, the ID of every layer, the configuration of every layer, and the Amazon S3 address where each layer's data is stored, and then download the layers in parallel. If a single layer fails to download, only that layer needs to be downloaded again. Once all layers have been downloaded locally, pack them into a tar package, which the Docker client can then load.
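The author's program is not shown, so the following is only a minimal Python sketch of the idea of downloading layers in parallel and retrying a failed layer on its own. The names `download_layers` and `fetch_layer` are hypothetical; `fetch_layer` stands for whatever function actually performs the HTTP request for one layer.

```python
from concurrent.futures import ThreadPoolExecutor


def download_layers(layer_ids, fetch_layer, max_workers=4, retries=3):
    """Download every layer in parallel and return {layer_id: data}.

    `fetch_layer(layer_id)` is a caller-supplied function (hypothetical
    in this sketch) that returns the layer's bytes or raises on failure.
    """
    layer_ids = list(layer_ids)

    def fetch_with_retry(layer_id):
        last_error = None
        for _ in range(retries):
            try:
                return fetch_layer(layer_id)
            except Exception as error:
                # Only this layer is retried; the others are unaffected.
                last_error = error
        raise RuntimeError(f"layer {layer_id} failed") from last_error

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fetch_with_retry, layer_ids)
        return dict(zip(layer_ids, results))
```

Passing the fetch function in as a parameter keeps the retry and parallelism logic independent of how a registry is actually contacted.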

For the online compilation described above, we really only care about the files produced by compilation. In the example just given, we only need the last layer of the image. With a tool of our own, we can download just that last layer. Unpack the downloaded tar package and you directly get the compilation result, that is, the files the compiler generated. Docker Hub thus becomes a powerful online compiler for us.
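Pulling the compiled file out of a downloaded layer might look like the Python sketch below. It assumes the layer arrives as a tar archive (possibly compressed) and that the compiler output is named `a.out`, which is gcc's default when no output name is given; the function name `read_from_layer` is invented for this example.

```python
import io
import posixpath
import tarfile


def _normalize(path):
    # Layer archives may store a path as "a.out", "./a.out" or "/a.out".
    return posixpath.normpath(path).lstrip("/")


def read_from_layer(layer_bytes, wanted_path):
    """Return the bytes of one regular file stored in a layer tarball."""
    wanted = _normalize(wanted_path)
    # Mode "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and _normalize(member.name) == wanted:
                return tar.extractfile(member).read()
    raise FileNotFoundError(wanted_path)
```

With the bytes of the last layer in `layer_bytes`, a call such as `read_from_layer(layer_bytes, "/a.out")` would return the compiled binary.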

Note: The image download process described here is for Registry v1. Docker Hub will soon end its v1 service. At present, several container service providers in China still support v1, so the method works equally well there. I have not yet studied the v2 protocol and code; I will share more after I do.

Image layer merging

Image layer merging has always been a controversial topic. A long Dockerfile produces an image with a long chain of layers. When an image has too many layers (more than ten, or even dozens), concerns about performance and stability are not unreasonable, but the Docker community does not seem to consider this an important problem. So essentially every PR for merging image layers has ended up rejected. That does not stop us from discussing how it could be implemented.

I added two instructions to the Dockerfile syntax: TAG and COMPRESS.

TAG works like the -t parameter of docker build. But build -t only tags the image of the last layer produced by the Dockerfile. The new TAG instruction can also tag an intermediate layer of the build. For example:

  FROM centos:centos6
  RUN yum install -y sshd
  TAG sshd:latest
  ADD test /
  CMD /bin/bash

Here TAG is equivalent to building an image from the following Dockerfile and tagging it sshd:latest.

  FROM centos:centos6
  RUN yum install -y sshd

COMPRESS merges multiple layers of an image. Take the following Dockerfile:

  FROM centos:centos6
  RUN yum install -y sshd
  ADD test /
  CMD /bin/bash
  COMPRESS centos:centos6

Suppose the layers generated by RUN yum install -y sshd, ADD test / and CMD /bin/bash are a, b and c. The goal of COMPRESS is to merge the files and configuration added by a, b and c into a new layer d, and to set the parent of layer d to the image centos:centos6. The configuration of layer d can simply reuse the configuration of layer c. The hard part of merging is computing the files of layer d.

There are two ways to do this. One is to merge layers a, b and c one after another according to merge rules. The rules are: for a file present in both a child layer and its parent layer, the child layer's version wins, and files that do not overlap are all treated as newly added. This method is less efficient and can be time-consuming when many layers need to be merged.
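The rule-based merge can be sketched in Python as follows. This is a simplified model rather than the author's implementation: each layer is represented as a dictionary from path to content, and `DELETED` is a stand-in invented here for the whiteout entries a real layer would use to record a removed file.

```python
DELETED = None  # stand-in for a whiteout entry; an assumption of this sketch


def merge_layers(layers):
    """Merge layers, ordered parent first, into a single layer.

    Each layer maps a path to its content, or to DELETED when the layer
    removes that path.
    """
    merged = {}
    for layer in layers:
        # For a path shared with an earlier (parent) layer, the later
        # (child) entry wins; paths seen for the first time are added.
        merged.update(layer)
    return merged


if __name__ == "__main__":
    a = {"/usr/sbin/sshd": b"sshd", "/etc/ssh/sshd_config": b"default"}
    b = {"/test": b"data", "/etc/ssh/sshd_config": b"custom"}
    c = {}  # CMD changes only configuration, not files
    print(sorted(merge_layers([a, b, c])))
```

Every layer has to be visited once, which is why the cost grows with the number of layers being merged.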

The other way is simpler and does not need to care how many layers lie in between. Directly compare the centos:centos6 image with image c (image c meaning layer c together with all of its parent layers): compare all the files of the two, and the diff between them is the new layer d.
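A minimal Python sketch of the diff-based approach is shown below, under the same simplified model: a complete image is a dictionary from path to content, and `None` stands in for a whiteout marking a removed file. The function name `diff_layer` is hypothetical, and a real implementation would also need to compare file metadata such as permissions and ownership.

```python
def diff_layer(base_files, final_files):
    """Compute the single layer that turns `base_files` into `final_files`.

    Both arguments map every path of a complete image (a layer plus all
    of its parents) to that file's content.
    """
    layer = {}
    for path, content in final_files.items():
        if path not in base_files or base_files[path] != content:
            layer[path] = content  # added or modified since the base
    for path in base_files:
        if path not in final_files:
            layer[path] = None  # removed; None stands in for a whiteout
    return layer


if __name__ == "__main__":
    base = {"/bin/sh": b"sh", "/etc/motd": b"hi", "/tmp/x": b"x"}
    final = {"/bin/sh": b"sh", "/etc/motd": b"hello", "/test": b"t"}
    print(diff_layer(base, final))
```

Only the two end states are compared, so the work does not depend on how many intermediate layers the Dockerfile produced.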

In the end I used the latter approach to implement COMPRESS. Merging reduces the number of layers, but the drawback is that the Dockerfile information that produced the image is lost as well (for an image built from a Dockerfile, that information can normally be traced back with docker history).


dind in practice

Dind (Docker in Docker), as the name suggests, means starting a Docker daemon inside a container and then using that daemon to start further containers. Dind is a fairly advanced trick and, seen from another angle, a somewhat risky one. Dind cleverly exploits Docker's ability to nest, but the performance and stability of the underlying graph driver under nesting are a real concern. So I do not recommend dind as the runtime environment for containers (RancherOS actually uses it this way), but using it as an image build environment is worth trying. After all, a failed build is far less serious than a crash at run time.

The reason for using dind is that dedicating several physical machines to image builds is somewhat wasteful, because builds are not happening all the time. With dind, you only need to request a few containers and run the builds on them. When there is nothing to build, the container resources can be released promptly, which is more flexible.

Making a dind image requires a CentOS image (other distributions are not implemented yet, though Fedora and Ubuntu would also work) and a wrapdocker file. The main job of wrapdocker is to prepare the environment the Docker daemon needs when the container starts.

After the container starts, Docker still needs some environment in place before the daemon can start. For example, on CentOS, wrapdocker has to get cgroups and so on ready. Use the CentOS image to create a container, install Docker and the other components Docker needs, then add wrapdocker and set it as the ENTRYPOINT or CMD. Then commit the container as an image, and you have a dind image. A container started from the dind image must run in privileged mode (--privileged) to work.

Those familiar with the Docker source code will find dind no stranger. The Docker project contains a dind file, which is effectively a wrapdocker file. When Docker runs its integration tests, it uses this file to prepare the environment so that a daemon can be started inside a container to run the tests.

If you are interested in dind, you can refer to jpetazzo's Dockerfile and wrapdocker to build your own dind image.

Using Docker inside dind is the same as using ordinary Docker, so I will not go into it here.

Thoughts on images

A Docker image consists of several layers, and each layer is made up of files and configuration. If you view the parent-child relationship between layers as a relationship in time, a Docker image is very much like Git. So in theory, a number of Git's functions, such as merge, reset and rebase, could be implemented in the Docker build process. For example, the COMPRESS function above is similar to Git's merge. In theory, Docker images could have Git-like features. From this point of view, Docker images are far more flexible than images such as those used by KVM.

Here I have to complain a little. Docker's maintainers have not shown a very active attitude towards the Dockerfile or Docker's build process. Of course, this may be because they are focusing more on runC, libnetwork and orchestration, so there is no spare manpower to improve Docker's build tooling, and they hope the community will add other tools of its own to enrich the build process.

So docker build often falls short. For example, the much-requested image compression feature went nowhere after several rounds of discussion. Another example is a --net parameter for the build process, which would make it possible to control the network used by containers during the build. That discussion began in January this year and has yet to be settled. Feel free to go and watch it.

One point in particular: on CentOS 6, dind cannot use the bridge (CentOS 7 supports it), so when using dind on CentOS 6, docker build needs to be told to use the host network with --net=host.

Since many features cannot wait for Docker to improve them, we have had to develop them ourselves. In fact, once you are familiar with the Docker source code, development around docker build is not very difficult, and you can implement things yourself. Read Sun Hongliang's "Docker Source Code Analysis" and you will get up to speed quickly.

Q&A

Q: Jingdong's private cloud is based on OpenStack + Docker. What are the network and storage solutions?

A: Yes. The private cloud network uses VLANs. Tenant isolation is not used, mainly for the sake of efficiency. Storage uses Jingdong's own storage.

Q: What is the benefit of image compression?

A: Image compression, or merging, is mainly about reducing the number of layers and the concerns that come with them. As things stand, the benefit is not obvious. Too many layers raise more concerns, but there is no conclusive evidence that they affect stability.

Q: Is online compilation widely used? We generally care more about the final result. A lot of code is compiled locally first and, once that succeeds, published into an image.

A: This approach is probably not widely used. It mainly came from my own tinkering: I did not want to pull every layer of the image and only cared about the compiled result, so I played with it this way.

Q: How does Jingdong store Docker images? Does Jingdong's Docker setup use a distributed file system?

A: Image storage uses the official registry, v1 version. The registry's backend is Jingdong's JFS storage.

Q: You mentioned earlier that "merging reduces the number of layers, but the drawback is that the Dockerfile information that produced the image is lost as well (for an image built from a Dockerfile, it can be traced back with docker history)." If COMPRESS is used, how can I trace back? Or does this capability have to be given up?

A: Yes, there is no way to trace back, so it has to be given up. But looking at it the other way, if the Dockerfile uses instructions like ADD and COPY, tracing back does not mean much anyway. So I think keeping the Dockerfile itself is more meaningful.

Q: Why not put the commands to be executed into a script, add it and run it directly? That would also reduce the number of layers.

A: That also works. It is just that a Dockerfile is more explicit. Similarly, once the image is built, you can simply export it to get all the files and pair them with the configuration file, which leaves only one layer.

Q: I have never compressed images in my testing, so I do not know what risk not compressing brings, but you just said there may be some risk. Have you run into any?

A: Because we merge image layers, our layer counts are not high. As for what risk not merging brings, it is more a matter of performance and stability concerns. The concern may be unnecessary, but we would rather be cautious.

Q: How can merging conveniently reduce image size? Some of my images are over 1 GB.

A: Reducing image size mainly comes down to removing unnecessary files. Merging only eliminates redundant files; if the files in each layer are all different, merging does not reduce the image size.

Q: Could you say a bit more about how the VLAN network is used? Does each container have a real physical IP on the same network as the host?

A: Yes. Each container has a real IP, on a different network segment from the host; it is a separate container network. For this you can refer to the VLAN implementation in Neutron.

Q: Also, about image compression: merging onto the parent image to produce a new image seems a bit problematic to me. After all, when we work with containers we add things on top of a base image. If you compress commonly used images into a one-off image, won't other services that use the base image have to download the base image again?

A: Image merging is mainly used to obtain a base image. We then add things on top of that base image. The base image is relatively easy to change.

Q: In your practice of large-scale container deployment, every node downloads images from the Registry node. Does that put pressure on the network?

A: We did some optimization. First, the images used by most services are pushed to every Docker node in advance. Even if a node does not have an image, the Registry is backed by Jingdong's JFS, and with our optimizations the image data can be fetched directly from JFS at download time. So the network pressure is not great.

Q: After images are compressed or merged, there are fewer layers, but doesn't each layer get bigger? Doesn't that trade bandwidth for efficiency?

A: This is much the same as the earlier question. Merging is mainly used for base images.

Q: How do you see the relationship between OpenStack and Docker? Will the two coexist at Jingdong for the long term? How do the two architectures currently compare in development speed and R&D strength?

A: OpenStack and Docker are not in conflict. In the private cloud, the nova-docker combination is more about accommodating users' habit of working with VMs. Magnum is also developing fast. So I believe both have their value and both need to keep developing.

Q: Do you have any good advice or experience on optimizing a Dockerfile?

A: I do not really have any new advice. You can refer to the DockOne articles "Dockerfile optimization experience" and "What best practices do you follow when writing a Dockerfile? Hoping for everyone's advice."

Q: For example, building a RabbitMQ image requires installing a lot of dependencies and then compiling, and the resulting image is 1.3 GB. In a case like this, can the image size be reduced at build time?

A: There is no good way to reduce it. It probably takes some manual work or tooling to identify unneeded files and so reduce the size of the image.

Q: How do you upgrade Docker itself, how do you build an image repository, and how do you update to new versions of images?

A: We have pinned Docker to one version. Unless a serious, widespread problem appears, we almost never update it. It is currently running stably, so there is no need to. As for features in newer Docker versions, such as networking, we will not adopt them widely for now. We run our own image storage; to update to a new version of an image, just push it in.

Q: A question that has bothered me for a long time: when images depend on each other, after the base image changes, do your other images get updated along with it?

A: In the internal private cloud we generally use a well-prepared base image. There is a problem here: once the base image needs a patch, the impact is fairly large. First, many child images of the base image are affected. Second, you have to consider the nodes already running containers based on the base image or its child images. For the former, my approach is to add the patched files directly into the base image's layer and repackage it. For the latter, there is no good solution yet.

Q: When running containers: 1. Is it better to map an application's logs or configuration files to the local host? I am thinking of making it easier to view logs or change configuration. 2. For a database image, is it better to map the data files to the local host when running the container?

A: For logs we do use local mapping. Some business teams write logs wildly without restraint, so we map a local LVM volume into the container to cap the capacity. For configuration, there is now an internal deployment system that deploys configuration for them. The database follows the same reasoning and is also mapped locally. Some also use cloud disks.

Q: Tagging every layer of a Docker image seems strange to me. When pulling an image or creating a container, how does it find the layer you named?

A: It is not about tagging every layer; you tag a particular layer according to your needs. As for the tag content, that is up to you to manage.

Q: I have a question about the COMPRESS implementation: does it only consider the diff between the final image and the previous layer, or does it diff layer by layer?

A: It only diffs the final image against the parent image you want to merge onto. So a single diff captures all the file changes made by the intermediate layers.

Q: How does the wrapdocker file work?

A: It mainly prepares the environment Docker needs in order to start. For example, on CentOS, wrapdocker needs to get cgroups and so on ready. You can refer to the code in wrapdocker.

Q: Are containers running on physical machines and the virtual machines on the OpenStack platform managed by the same system? How is it integrated with the container cluster system?

A: It is the same system; everything goes through nova. KVM virtual machines and containers differ mainly in image type. At scheduling time, nova schedules the request to a KVM node or a Docker node for creation according to the image type.

Q: Is the number of Docker containers on a physical machine capped, or does it depend on the applications being run?

A: There is no special limit. It mainly follows what the business teams request. Business teams tend to ask for large memory and many CPUs, so fewer containers end up on such a physical machine. That is roughly it.

Q: I would like to know how you manage image tags. What are the tags based on? And for old images in the repository, do you discard them or keep them all, the way Git keeps code?

A: Tags are decided by each user. Different users are in different repositories, and they tag their own images. But we hope they will be more disciplined, for example by using the git version number as the tag.
If an old image loses its tag (a new image takes the tag over), the old image is deleted. Not immediately, though; it is cleaned up periodically, mainly to reduce the amount of storage. After all, there is no need to store that many versions.

The above content is based on a sharing session in the WeChat group on December 8, 2015. Speaker: Xu Xinkun, R&D engineer on the JDOS team at the Nanjing R&D center of the Jingdong Mall cloud platform. He has worked on Docker research and development since early 2014 and is responsible for the development and maintenance work involved in rolling out Docker at Jingdong. DockOne organizes targeted technology sharing every week; interested readers are welcome to add WeChat: liyingjiesx, and to tell us which topics you would like to hear about or share.
