What are the problems and opportunities for container storage?

[Editor's Note] As container technology has become popular, some problems have emerged in practice. This article describes the problems with container storage in detail and presents several solutions.

Author: Nick Glass Matos, senior director of cloud services projects at FICO. The article looks at how Docker volumes and data containers address the lack of persistent storage in containers.

The idea of running applications in containers is not new. It is, however, a trend: countless people are studying this hot topic, and containers can look like the answer to every problem, as though the container will handle everything.

The origins of containers can be traced back to the mainframe era, so the technology itself is not new. It has now matured and is winning user attention and acceptance at a remarkable rate.

Containers allow multiple applications to run in parallel on a single operating system, whether that operating system is installed directly on a physical server or in a virtual instance. This is achieved by running multiple isolated copies of "user space" (the environment in which applications and system code run on top of the kernel), all sharing one kernel.

The current enthusiasm for containers stems from the inefficiency and operational cost of running virtual instances, because each instance must be given dedicated memory and storage resources. Instances are often either oversized or undersized, and they can be slow to start when they need to be scaled out rapidly.

Virtual instances can each be isolated and upgraded independently, but in environments running similar or identical operating system versions, every instance runs the same processes, consumes its own memory, and keeps an almost identical boot volume.

Traditional virtualization is therefore very inefficient for web-scale computing architectures. It wastes physical resources such as memory, processors, storage, rack space, power, and cooling, as well as shared resources such as management systems and IP addresses.

Containers provide a degree of isolation: each container is independent of its neighbors, so it appears to have the entire operating system to itself. At the same time, this isolation still allows containers to interact with the outside world.

After exponential growth in 2014, containers and their ecosystem spread rapidly through enterprise environments in 2015, but they are still far from universal. Very few backup software vendors support container backup, which raises the question of whether containers can be backed up with any backup software at all.

Compared with virtual instances, containers are usually short-lived, and so is the data stored in them. A container uses a union (overlay) file system to implement copy-on-write: changes relative to the original image are written to a writable layer on top of the image's root file system. When the container is deleted, those changes are normally lost, so by default a container has no persistent storage.
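To make the copy-on-write behavior concrete, here is a minimal Python sketch that models a union file system in memory. It is a toy: the class name `LayeredFS` and the whiteout marker are illustrative assumptions, not the actual implementation of Docker or OverlayFS. It shows that writes land only in the container's writable layer, the image layers stay untouched, and every change disappears when the container is removed.

```python
from typing import Dict, List, Optional


class LayeredFS:
    """Toy union file system: read-only image layers plus one writable
    container layer on top (an illustrative model, not Docker's code)."""

    _WHITEOUT = object()  # marker recording a deletion in the top layer

    def __init__(self, image_layers: List[Dict[str, str]]):
        # Image layers are ordered bottom to top and are never modified.
        self.image_layers = image_layers
        self.container_layer: Dict[str, object] = {}

    def read(self, path: str) -> Optional[str]:
        # Search from the container layer down to the base image layer.
        for layer in [self.container_layer] + list(reversed(self.image_layers)):
            if path in layer:
                value = layer[path]
                return None if value is self._WHITEOUT else value
        return None

    def write(self, path: str, data: str) -> None:
        # Copy-on-write: the new content goes to the container layer only.
        self.container_layer[path] = data

    def delete(self, path: str) -> None:
        # Hide the file without touching the read-only image layers.
        self.container_layer[path] = self._WHITEOUT

    def remove_container(self) -> None:
        # Deleting the container discards its writable layer and all changes.
        self.container_layer = {}
```

For example, after `fs = LayeredFS([{"/etc/app.conf": "v1"}])` and `fs.write("/etc/app.conf", "v2")`, reads return `"v2"`; after `fs.remove_container()` they return the image's `"v1"` again, which is exactly why data written inside a container is not persistent.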

However, container platforms such as Docker provide two features for accessing persistent storage: Docker volumes and data containers.

A Docker volume allows data to be stored outside the container's boot volume (its root file system), and it can be implemented in several ways. A container can create one or more volumes by passing a mount point to the "-v" switch.

Docker creates an entry for each volume under its configuration folder (/var/lib/docker). Volume configuration data is stored in /var/lib/docker/volumes, with one subdirectory per volume named by a universally unique identifier (UUID). The data itself is stored in a folder with the same UUID name under /var/lib/docker/vfs/dir.

The data in any volume can be browsed and edited from the host operating system, subject to standard permissions. Using volumes has both advantages and disadvantages. The advantage is that, because the data is stored in a standard file system, it can be backed up, copied, imported, and exported with ordinary operating system tools.
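Because a volume is just a directory on the host, standard tools are enough to back it up. The following Python sketch uses only the standard library's `tarfile` to archive a volume directory and restore it elsewhere. The function names and the paths you pass in are assumptions for illustration; in practice the container using the volume should be stopped or quiesced first so the archive is consistent.

```python
import os
import tarfile


def backup_volume(volume_dir: str, archive_path: str) -> str:
    """Archive a volume's data directory into a gzip-compressed tarball.
    archive_path must not be inside volume_dir."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores member paths relative to the volume root.
        tar.add(volume_dir, arcname=".")
    return archive_path


def restore_volume(archive_path: str, target_dir: str) -> None:
    """Unpack a tarball created by backup_volume into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse members that escape target_dir.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
```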

The disadvantage is that volumes are named by UUID, which makes it hard to relate a volume to its container. Docker mitigates this with the "docker cp" command, which copies files and folders between a host directory and a container directory by referring to the container by name, much like rsync.
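Because `docker cp` addresses a container by name, a script never needs to know the volume's UUID. This minimal sketch only builds the argument lists; the container name `web` and the file paths in the test are hypothetical. On a host with Docker installed, the lists could be passed to `subprocess.run(args, check=True)`. Early Docker releases supported only copying out of a container, so the first helper assumes a newer release.

```python
from typing import List


def docker_cp_to_container(src: str, container: str, dest: str) -> List[str]:
    """Build the argument list that copies a host path into a container."""
    # The container is identified by name; its volume UUIDs never appear.
    return ["docker", "cp", src, f"{container}:{dest}"]


def docker_cp_from_container(container: str, src: str, dest: str) -> List[str]:
    """Build the argument list that copies a container path to the host."""
    return ["docker", "cp", f"{container}:{src}", dest]
```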

A Docker volume can also be mapped to a directory on the host. This again uses the "-v" switch, in the format "-v /host:/container". This method gives the container access to persistent data on the host.
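The bind-mount form is easy to generate programmatically. This sketch assembles a `docker run` command line with `-v /host:/container`, optionally appending `:ro` for a read-only mount. The image name `alpine` and the paths in the test are placeholders. The absolute-path check reflects the fact that, in current Docker releases, a spec whose first part is not an absolute path is treated as a named volume rather than a host directory.

```python
from typing import List


def bind_mount_args(host_path: str, container_path: str,
                    read_only: bool = False) -> List[str]:
    """Return the -v option pair that maps a host directory into a container."""
    if not (host_path.startswith("/") and container_path.startswith("/")):
        # "name:/path" would be read as a named volume, not a host path.
        raise ValueError("bind mounts need absolute host and container paths")
    spec = f"{host_path}:{container_path}"
    if read_only:
        spec += ":ro"  # the container sees the directory as read-only
    return ["-v", spec]


def run_with_bind_mount(image: str, host_path: str, container_path: str,
                        read_only: bool = False) -> List[str]:
    """Build a complete 'docker run' argument list with one bind mount."""
    return ["docker", "run", "--rm",
            *bind_mount_args(host_path, container_path, read_only), image]
```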

By pointing this option at a host directory that is itself mounted from external storage, a container can reach external shared storage such as an NFS share or a LUN. This method can also be used to back up the data the container accesses.

Another option for managing data in Docker is the data container. This is a dormant container that holds one or more volumes. Those volumes can be exposed to one or more other containers by starting them with the "--volumes-from" switch. The data container acts like an internal Docker NFS server, providing access to the data from a central point.
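The data-container pattern comes down to two commands: `docker create` makes a container that owns a volume but never runs, and `docker run --volumes-from` attaches that volume to an application container. The Python below only builds the argument lists; the names `appdata`, `busybox`, and `alpine` are illustrative choices, not requirements.

```python
from typing import List


def create_data_container(name: str, volume_path: str,
                          image: str = "busybox") -> List[str]:
    """Build the command for a dormant container that owns one volume."""
    # 'docker create' sets up the container without starting it.
    return ["docker", "create", "-v", volume_path, "--name", name, image]


def run_with_volumes_from(data_container: str, image: str,
                          *command: str) -> List[str]:
    """Build the command for an application container using those volumes."""
    # --volumes-from mounts the data container's volumes at the same paths.
    return ["docker", "run", "--rm", "--volumes-from", data_container,
            image, *command]
```

Because `--rm` removes only the application container, the application side can be recreated at will while the data stays with `appdata`.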

The advantage of this approach is that it abstracts away the location of the original data, making the data container a logical central point. Because data persistence is maintained in a dedicated container, the "application" containers that access its volumes can be created and destroyed without affecting the data.

There are some issues you need to be aware of when using volumes and data containers.

Orphaned storage

It is currently possible to delete a container without deleting its associated volumes. In fact, this is the default behavior unless it is overridden, so it is easy to end up with orphaned volumes that no container references.

Cleaning up orphaned storage is a daunting task, because the container configuration files must be examined to match each container with its associated volumes.
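Matching volumes to containers is essentially a set difference. The sketch below assumes you have already collected, for example from the container configuration files or `docker inspect` output, a list of all volume IDs and a mapping from each container to the volumes it mounts; the sample data in the test is made up. Later Docker releases added shortcuts for this: `docker rm -v` removes a container's anonymous volumes along with it, and `docker volume ls -f dangling=true` lists unreferenced volumes.

```python
from typing import Dict, Iterable, Set


def find_orphaned_volumes(all_volumes: Iterable[str],
                          container_mounts: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the volume IDs that no remaining container references."""
    referenced: Set[str] = set()
    for volumes in container_mounts.values():
        referenced.update(volumes)
    # Anything on disk that no container mounts is orphaned.
    return set(all_volumes) - referenced
```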

Security

Container volumes and data have no security features beyond standard file permissions and the option to configure "read-only" or "read-write" access. This means a user's access to files from within a container has to match the permissions set on the host.
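Without user-namespace remapping, a container process is checked against the host file's numeric owner and group IDs, so the same UID can correspond to different user names inside and outside the container. Here is a minimal sketch of the classic Unix write check, useful for spotting mismatches before mounting a host directory. It deliberately considers only the primary group and ignores supplementary groups, ACLs, capabilities, and SELinux labels, so treat it as an approximation.

```python
import os
import stat


def can_write(path: str, uid: int, gid: int) -> bool:
    """Approximate whether uid/gid may write to path under classic Unix
    permission bits (ACLs, capabilities and SELinux are ignored)."""
    if uid == 0:
        return True  # root bypasses permission bits on ordinary files
    st = os.stat(path)
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IWUSR)  # owner write bit
    if st.st_gid == gid:
        return bool(st.st_mode & stat.S_IWGRP)  # group write bit
    return bool(st.st_mode & stat.S_IWOTH)      # "other" write bit
```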

Data integrity

When volumes and data containers are used to share data, the integrity of that data must still be protected. Mechanisms such as file locking have to be managed by the containers themselves, which is extra overhead that must be built into the application.
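One way an application might coordinate writes to a file on a shared volume is to take an exclusive advisory `flock` around each update. This is a sketch assuming a Linux host and a local volume; the function name and file path are illustrative. Advisory locks only protect against processes that also take the lock, and their behavior on network file systems such as NFS depends on the client and server.

```python
import fcntl


def append_record(path: str, line: str) -> None:
    """Append one line to a shared file while holding an exclusive lock."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other holders release it
        try:
            f.write(line + "\n")
            f.flush()  # push the data out before giving up the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```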

Containers provide no data protection facilities such as snapshots or replication, so data management must be handled by the host or within the container.

Support for external storage is also lacking. Beyond what the host operating system provides, Docker offers no specific support for external storage.

By default, container volumes are stored under the /var/lib/docker directory, which can become a performance bottleneck. However, the default storage location can be changed when the Docker daemon starts.
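On current Docker releases the storage location is set with the daemon's `data-root` option, either as `dockerd --data-root` or as a key in `/etc/docker/daemon.json`; the releases this article describes used the `-g`/`--graph` flag instead. This sketch merges the setting into existing `daemon.json` text without disturbing other keys. The target path in the test is a placeholder, the daemon must be restarted for the change to take effect, and existing data is not moved automatically.

```python
import json


def set_data_root(config_text: str, data_root: str) -> str:
    """Return daemon.json text with 'data-root' set, keeping other keys."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(config_text) if config_text.strip() else {}
    config["data-root"] = data_root
    return json.dumps(config, indent=2, sort_keys=True)
```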

This last point highlights the current problem with container storage: there is no way to manage data sharing between containers running on separate physical hosts.

Data volumes can be placed on external storage, but the current design provides no way to move a volume from one host to another. ClusterHQ's Flocker is trying to solve this volume migration problem, and work is also under way to change the volume management functions of platforms such as Docker to address it.

Original article: How storage works in containers (translated by Wu Jin Sheng)
