New Horizon | Perhaps This Is the Right Approach to Large-Scale Container Image Distribution

With the growing popularity of Docker, a number of container-related problems have surfaced as well. This article discusses the distribution inefficiency caused by the surge in container image volume, and presents a new solution. Finding problems and solving them is how IT technology keeps moving forward.

Container technology has greatly improved the efficiency of application configuration and deployment, but the gains vary across different areas of IT operations. For example, large-scale distribution of container images often strains data center storage and network capacity, with less-than-satisfying results.

This is indeed somewhat counterintuitive: container technology is praised for efficient service operation and considerable resource savings compared with virtual machines, so why the inefficiency? The answer lies in how Docker images are stored and downloaded. Traditionally, every image stored in a registry must contain all the files associated with its layers, which means that whenever a Docker image on any host is updated, the full image must be downloaded from the registry.

For small containers running small Web applications, this mechanism does not seem like a big problem. But imagine the number of container systems growing to hundreds, thousands, or millions: the registry keeps growing along with them, and the amount of data that must be downloaded each time gets larger and larger. Meanwhile, image download traffic can climb to several gigabytes, which is clearly unacceptable.

A business like Twitter undoubtedly runs a huge number of containers and has naturally encountered the problems described above. The situation will become more common as more companies adopt container technology, leading to ever-larger container registries.

We believe one viable solution to container bloat is the CernVM File System (CernVM-FS). It was developed by CERN (the European Organization for Nuclear Research), with hard-won contributions from the worldwide high-energy physics community, especially Fermilab.

The role of CernVM-FS is to help scientists distribute the roughly 2 TB of software developed each year, some of which is pushed to hundreds of thousands of computers every week. CernVM-FS uses a unique combination of indexing, deduplication, caching, and geographic distribution mechanisms designed to minimize both the number of components involved in each download and the amount of data transferred.

Mesosphere and CERN are currently exploring how CernVM-FS can be combined with Apache Mesos, to understand its actual effect on container download and processing efficiency in large container environments.

An In-Depth Look at How CernVM-FS Works

The basic idea behind CernVM-FS was born in 2008, when CERN researchers began looking at container technology built on hardware virtualization and sought new application deployment mechanisms. Instead of creating images or packages, they thought of using a globally distributed file system, so that scientists could install their software once on a single Web server and then access it from anywhere in the world.

As a distributed file system, its primary design principle is that within any given period, only a small fraction of the available files is actually accessed. For example, to run a Web server, we only need a portion of the operating system's libraries (such as glibc and OpenSSL). A distributed file system hosting the operating system only needs to serve the necessary files, and in fact CernVM-FS only downloads and caches this necessary data locally. The researchers chose HTTP as the download protocol so they could take advantage of existing Web caching infrastructure (such as Akamai, CloudFront, Squid, and NGINX proxy servers).
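As a rough sketch of what this looks like on a client, the standard CernVM-FS client is configured through `/etc/cvmfs/default.local`; the repository name and proxy address below are illustrative, not real endpoints:

```shell
# /etc/cvmfs/default.local -- illustrative CernVM-FS client configuration
# Repository to mount (example name, not a real repository)
CVMFS_REPOSITORIES=containers.example.org
# Route all HTTP downloads through a local Squid cache
CVMFS_HTTP_PROXY="http://squid.example.org:3128"
# Local cache location and size limit (in MB)
CVMFS_CACHE_BASE=/var/lib/cvmfs
CVMFS_QUOTA_LIMIT=4000
```

Because the transport is plain HTTP, any off-the-shelf caching proxy between the client and the server transparently absorbs repeated downloads.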

The second principle is that metadata (that is, information about a file's existence, rather than its contents) is given priority. Software hosted on a file system is typically plagued by lookups such as "does libssl.so live in /lib32, /lib64, or /usr/lib32?", and generic distributed file systems tend to perform poorly on such requests. CernVM-FS stores all metadata in SQLite files, which are downloaded and cached like regular files. In this way, millions of metadata requests can be resolved locally, putting CernVM-FS on par with a local POSIX file system in terms of speed.
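As a toy illustration of the idea (this is not CernVM-FS's actual catalog schema, which is considerably more elaborate), a locally cached SQLite catalog lets path lookups like the one above be answered without any network round-trip:

```python
import sqlite3

# Build a toy catalog in memory; a real CernVM-FS catalog is a SQLite
# file that is downloaded and cached like any other file.
catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE entries (path TEXT PRIMARY KEY, hash TEXT)")
catalog.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        ("/usr/lib32/libssl.so", "a3f5"),  # hashes are placeholders
        ("/lib64/libc.so", "9b2e"),
    ],
)

def lookup(name):
    """Answer 'which directory does this library live in?' entirely locally."""
    rows = catalog.execute(
        "SELECT path FROM entries WHERE path LIKE ?", ("%/" + name,)
    ).fetchall()
    return [r[0] for r in rows]

print(lookup("libssl.so"))  # -> ['/usr/lib32/libssl.so']
```

Every such query hits only the local cache, which is why metadata-heavy workloads (dynamic linking, `PATH` searches) stay fast.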

The third principle is that software can only be modified at the publishing point; all other clients get read-only access at a high level of availability. Most pre-existing file systems cannot handle the trade-offs described by the CAP theorem in their design, since such systems assume the client always wants to read the latest available version of the data. Here, by contrast, the software must remain consistent while an application or container is running, which means the file system must deliver a single snapshot for the duration of the run.

In view of this, CERN decided to use content-addressable storage with a Merkle tree structure. As in git, each file is internally named by the cryptographic hash of its (unique) content. Multiple copies of the same file in different directories (for example, the "ls" binary in different Ubuntu images) are merged into a single file. The SQLite file catalogs refer to files by their content hashes. As a result, a single root hash (a 160-bit integer) ultimately identifies the complete file system snapshot. To close the chain of trust, the system cryptographically signs the root hash, and each client verifies every bit of the received data to ensure it comes from the intended source and has not been tampered with.
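The scheme can be sketched in a few lines (a deliberate simplification; CernVM-FS's real object format and catalog layout differ): files are stored under their content hash, duplicates collapse automatically, and a root hash over the catalog pins the entire snapshot:

```python
import hashlib

store = {}  # content-addressable object store: hash -> bytes

def put(data: bytes) -> str:
    """Store a blob under its content hash; storing a duplicate is free."""
    h = hashlib.sha1(data).hexdigest()  # SHA-1 is also a 160-bit hash
    store[h] = data
    return h

# The same 'ls' binary appearing in two images is stored only once.
catalog = {
    "/ubuntu/bin/ls": put(b"ls-binary"),
    "/centos/bin/ls": put(b"ls-binary"),
    "/ubuntu/bin/bash": put(b"bash-binary"),
}
assert len(store) == 2  # three catalog entries, two stored objects

# A hash over the (sorted) catalog identifies the whole snapshot;
# cryptographically signing this one value closes the chain of trust.
root_hash = hashlib.sha1(
    "".join(f"{p}:{h}" for p, h in sorted(catalog.items())).encode()
).hexdigest()
```

Any change to any file changes its content hash, which changes the catalog, which changes the root hash, so verifying the signed root transitively verifies everything beneath it.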

Today, CernVM-FS is responsible for delivering millions of files and directories of Large Hadron Collider software to about 100,000 computers distributed around the world.

Containerizers in Mesos

Mesos provides task containerization through a variety of implementations known as "containerizers". A containerizer is responsible for isolating running tasks and limiting the resources (such as CPU, memory, disk, and network) each task can use. The Mesos containerizer can be easily extended through so-called "isolators", which can be viewed as plug-ins that perform a particular job (for example, mounting an external volume into a container).

Last but not least, the containerizer also provides a runtime environment for the task. It allows the user to pre-package all of a task's dependencies, together with the task itself, into a file system image. This image can then be distributed arbitrarily and used to launch the task.

Mesos now has multiple containerizers. The default is the Mesos containerizer, which uses Linux namespaces and cgroups to achieve task isolation and resource usage control, together with a set of isolators that deliver additional functionality. There is also a Docker containerizer, which uses Docker tooling to pull images and start Docker containers.

The Mesos containerizer has now been extended to support a variety of image formats, including Docker and AppC, creating a "unified containerizer". This makes supporting a new image format very easy: with the unified containerizer, we only need to build a so-called "provisioner" for the new format and reuse all the existing isolation code. For example, Mesos users can continue to use Docker images, but all pulling, isolation, and other work is handled by Mesos rather than by Docker tooling.
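As an illustrative sketch (the master address is made up, and the flag values are abbreviated; consult the Mesos agent documentation for the full set), running Docker images through the unified Mesos containerizer looks roughly like this:

```shell
# Start a Mesos agent that runs Docker images without the Docker daemon:
# the Mesos containerizer provisions the image itself.
mesos-agent \
  --master=zk://zk.example.org:2181/mesos \
  --containerizers=mesos \
  --image_providers=docker \
  --isolation=filesystem/linux,docker/runtime
```

With this setup the Docker image is just an input format; isolation and lifecycle management stay inside Mesos.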

Integrating CernVM-FS with Mesos for container image distribution

Its content-addressable storage, strong security, and proven scalability make CernVM-FS a compelling container image distribution tool. To test its actual effect, we created a new CernVM-FS repository and populated it with a set of Ubuntu packages. We then built a CVMFS-based container image provisioner for Mesos. Instead of downloading the full image directly, it uses the CernVM-FS client to mount the image root directory remotely. The provisioner takes as input a CernVM-FS repository name (which is internally mapped to the URL of a CernVM-FS server) and the in-repository path to be used as the container image root.

In this way, we can publish a variety of container images from the same CernVM-FS repository. In essence, a CernVM-FS repository is equivalent to a secure, scalable container image registry.
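A hypothetical sketch of the provisioner's input mapping (the names, URL, and helper are made up for illustration; the actual Mesos/CVMFS provisioner code differs): a repository name resolves to a server URL, and the in-repository path selects which image root to use:

```python
# Hypothetical mapping from CernVM-FS repository names to server URLs;
# in a real deployment this would live in configuration.
REPO_URLS = {
    "containers.example.org":
        "http://cvmfs.example.org/cvmfs/containers.example.org",
}

def image_root(repo: str, path: str) -> str:
    """Resolve (repository name, in-repo path) to the mounted directory
    the containerizer will use as the container's root file system."""
    if repo not in REPO_URLS:
        raise ValueError(f"unknown CernVM-FS repository: {repo}")
    # The CernVM-FS client mounts the repository (backed by
    # REPO_URLS[repo]) under /cvmfs/<repo>.
    return f"/cvmfs/{repo}/{path.strip('/')}"

print(image_root("containers.example.org", "/images/ubuntu"))
# -> /cvmfs/containers.example.org/images/ubuntu
```

One repository can thus hold many image roots side by side, each addressed by its path.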

From the containerizer's perspective, nothing changes. It still handles a local directory containing the image tree and uses it to start the container with everything it needs. The biggest advantage of this approach is CernVM-FS's fine-grained deduplication (per file or block, rather than per layer as in Docker), which means we can now start a container without downloading the full image. Once the container starts, CernVM-FS downloads only the files needed for the task at hand.

In our tests, we started an Ubuntu container and ran a command in its shell. In the traditional scenario, we would need to download the full Ubuntu image, roughly 300 MB. With the CernVM-FS provisioner, we only needed to download the files required for the task: less than 6 MB.

Since CernVM-FS uses content-addressable storage, we never download the same file twice. So if we start another container (say, a CentOS image) and run a different command, we only download the files needed for the new command, reusing all of the common dependencies already downloaded with the Ubuntu image (such as Bash or libc.so). In this model, the concept of a container layer ceases to exist, and deduplication happens at a much finer granularity.
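The effect on download volume can be sketched as follows (toy manifests and hash names, not the measurements above): with a shared content-addressed cache, starting a second image only fetches the hashes not already present:

```python
# Toy file manifests: path -> content hash (shared files share hashes)
ubuntu = {"bin/bash": "h1", "lib/libc.so": "h2", "bin/ls": "h3"}
centos = {"bin/bash": "h1", "lib/libc.so": "h2", "bin/rpm": "h4"}

cache = set()  # content hashes already downloaded on this host

def fetch(image: dict) -> set:
    """Return the set of hashes actually downloaded for this start."""
    needed = set(image.values()) - cache
    cache.update(needed)
    return needed

print(len(fetch(ubuntu)))  # first start fetches all files it touches
print(len(fetch(centos)))  # second start reuses bash and libc
```

Because reuse is decided per content hash rather than per layer, it works across entirely unrelated images.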

We also plan to extend the provisioner to mount any CernVM-FS directory, using the corresponding verification mechanism. This will let developers iterate faster on container images and make it easy for operators to switch back and forth between container image versions.

Riding the container wave

The technical teams at CERN and Mesosphere look forward to integrating CernVM-FS with Apache Mesos as the IT industry adopts application container technology ever more widely. If organizations want to fundamentally improve how they build applications, deploy code, and run data centers, they need to start, shut down, update, and manage containers at real-world scale. Tight integration between CernVM-FS and Mesos is a viable way to help customers solve the container storage-capacity and network-bandwidth bottlenecks.

Docker is still a young technology, and there are many pitfalls we need to work through together. If you have questions or confusion about it, you are welcome to leave a comment and discuss.

Original link: https://mesosphere.com/blog/20 … ners /
