Docker Basic Tutorial: Basic Concepts

This article introduces the basic concepts behind Docker.

Introduction

This article is the first in a series of tutorials. I will introduce, from a user's perspective, the techniques underlying containers: namespace, cgroups, veth, bridge, copy-on-write, image, and container. The core of container technology consists of two parts: namespaces, which are responsible for resource isolation, and cgroups, which are responsible for resource limits. On top of these two technologies, Docker introduced several important concepts that made container technology popular.

If you are new to Docker, please refer to my two other articles:

  • Docker
  • Docker

Installation

The official installation script is very convenient; just run the following command to complete the installation:

curl -s https://get.docker.com | bash

Namespace

Namespace is a kernel-level isolation technology in Linux. It gives a process its own independent process IDs, network stack, and file system (similar to chroot), and processes in different namespaces cannot see each other. Because namespaces are provided through low-level system calls, there is no general user-space tool for managing them. For how to create specific types of namespaces, refer to the two CoolShell articles Docker basic technology: Linux Namespace (part 1) and Docker basic technology: Linux Namespace (part 2). After a process is started in a namespace, we sometimes need to enter its running environment to debug it or perform other operations.

The current version of Docker can switch into a container's namespaces directly through the exec sub-command:

docker run -d --name nginx nginx
docker exec -it nginx ls
docker exec -it nginx ip a
docker exec -it nginx ps -ef

In earlier versions, or with other container engines, you can use nsenter instead:

yum install util-linux
PID=$(docker inspect --format '{{.State.Pid}}' container_name)
nsenter --target $PID --mount --uts --ipc --net --pid
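
As a quick sanity check, you can also list the namespaces a process belongs to under /proc (this reuses the $PID obtained above; the ns directory is standard procfs):

# Each entry is a symlink identifying a namespace by type and inode number;
# two processes in the same namespace show the same inode.
ls -l /proc/$PID/ns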

Cgroups

Introduction

Resource isolation alone is not enough: if we cannot limit the resources used by the processes inside a namespace, the isolation is not very meaningful. So Docker uses cgroups, another kernel technology, to impose resource limits on processes and their children. It is worth noting that cgroups are configured through the local file system: modifying files under the /sys directory changes the corresponding kernel parameters. But the Docker engine already shields us from these low-level details, and we can easily configure cgroup-related parameters through docker commands.

First, look at the cgroup subsystems that are currently mounted:

mount -t cgroup

ls /sys/fs/cgroup/*/docker/

You can see that docker puts the container-related cgroup configuration into a docker directory under each subsystem.
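
For a concrete look (a minimal sketch assuming the cgroup v1 layout shown above and a running container named nginx), you can find a container's directory under each subsystem by its full ID:

CID=$(docker inspect --format '{{.Id}}' nginx)   # full container ID
ls /sys/fs/cgroup/cpu/docker/$CID/               # cpu.shares, cpu.cfs_quota_us, ...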

CPU subsystem

Cgroups provides three ways to limit CPU resources: cpuset, cpuquota, and cpushares

Cpuset

Cpuset limits which CPU cores a process may run on. It is managed through cpuset/cpuset.cpus, and the corresponding docker command is:

docker run --cpuset-cpus 0 -d --name nginx nginx

Cpuquota

Cpuquota limits CPU resources by time slices and is finer-grained than cpuset: you only need to set a value relative to 100000 to get a percentage-style limit. It is managed through two files, cpu/cpu.cfs_period_us (the length of one period, default 100000) and cpu/cpu.cfs_quota_us (how much of each period the processes may use). For example, a quota of 50000 with the default period of 100000 limits the container to 50% of one CPU. The corresponding docker command is:

docker run --cpu-quota 50000 -d --name nginx nginx

Cpushares

Cpushares allocates CPU resources by weight. For example, if only one process has a weight of 1024, it can use 100% of the CPU; if two processes each have a weight of 1024, each can use 50% of the CPU. It is managed through cpu/cpu.shares, and the corresponding docker command is:

docker run --cpu-shares 1024 -d --name nginx nginx
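
To verify the three settings above (again assuming the cgroup v1 paths from earlier and a running container named nginx; file names differ under cgroup v2):

CID=$(docker inspect --format '{{.Id}}' nginx)
cat /sys/fs/cgroup/cpuset/docker/$CID/cpuset.cpus     # pinned cores, e.g. 0
cat /sys/fs/cgroup/cpu/docker/$CID/cpu.cfs_quota_us   # e.g. 50000 = 50% of one CPU
cat /sys/fs/cgroup/cpu/docker/$CID/cpu.shares         # relative weight, e.g. 1024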

More about cgroups for the implementation of the CPU subsystem can refer to the following links:

  • Cgroup – CPU Resource Isolation (1)
  • Cgroup – CPU Resource Isolation (2)
  • Cgroup – CPU Resource Isolation (3)

Memory subsystem

The memory limits in cgroups cover both physical memory and swap: when a process's memory usage reaches the limit, the process is killed. The main memory-related files are:

memory.limit_in_bytes memory.soft_limit_in_bytes memory.memsw.limit_in_bytes

Again, docker handles the low-level management for us:

docker run -m 100m -d --name nginx nginx

Note that by default docker sets the swap limit to twice the memory limit, so you may find that a process actually uses more memory than the size set with -m.
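
You can confirm this behaviour from the cgroup files (a hedged check assuming cgroup v1 and the -m 100m container above; exact values depend on your kernel):

CID=$(docker inspect --format '{{.Id}}' nginx)
cat /sys/fs/cgroup/memory/docker/$CID/memory.limit_in_bytes       # ~100 MiB
cat /sys/fs/cgroup/memory/docker/$CID/memory.memsw.limit_in_bytes # ~200 MiB (memory + swap)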

Blkio subsystem

The blkio subsystem limits the read and write rates of block devices. I have not spent much time on it yet; for a detailed introduction see Cgroup – Linux IO resource isolation.
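
For completeness, modern docker exposes blkio limits through flags such as --device-read-bps and --device-write-bps (a sketch; /dev/sda and the container name are just illustrative):

docker run -d --name nginx-io \
  --device-read-bps /dev/sda:1mb \
  --device-write-bps /dev/sda:1mb \
  nginx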

Veth

Veth is a special kind of Linux network interface that always comes in pairs: packets sent into one end come out of the other. Below I use a few commands to show how the network inside a docker container is connected to the host network.

  # Create a new net namespace 
➜ ~ ip netns add test
➜ ~ ip netns list
test

Create a pair of veth network interfaces

➜ ~ ip link add veth0-0 type veth peer name veth0-1
➜ ~ ip link list
27: veth0-1@veth0-0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether da:bd:24:5f:e6:a8 brd ff:ff:ff:ff:ff:ff
28: veth0-0@veth0-1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether 36:0f:fc:64:1d:f0 brd ff:ff:ff:ff:ff:ff

Move veth0-0 into the namespace created in the first step

➜ ~ ip link set veth0-0 netns test
➜ ~ ip link
27: veth0-1@if28: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/ether da:bd:24:5f:e6:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 4

Configure an IP address on the network interface inside the namespace

➜ ~ ip netns exec test ip addr add local 10.0.78.3/24 dev veth0-0
➜ ~ ip netns exec test ip link set veth0-0 up
➜ ~

Configure an IP address on the network interface on the host

➜ ~ ip addr add local 10.0.78.4/24 dev veth0-1
➜ ~ ip link set veth0-1 up
➜ ~ ip a
27: veth0-1@if28: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether da:bd:24:5f:e6:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet 10.0.78.4/24 scope global veth0-1
       valid_lft forever preferred_lft forever
    inet6 fe80::d8bd:24ff:fe5f:e6a8/64 scope link
       valid_lft forever preferred_lft forever

Test host and namespace network connectivity

➜ ~ ping 10.0.78.3
PING 10.0.78.3 (10.0.78.3) 56(84) bytes of data.
64 bytes from 10.0.78.3: icmp_seq=1 ttl=64 time=0.115 ms
64 bytes from 10.0.78.3: icmp_seq=2 ttl=64 time=0.054 ms
^C
--- 10.0.78.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.054/0.084/0.115/0.031 ms

➜ ~ ip netns delete test

Bridge

Bridge is another Linux network interface device. You can think of it as a switch: all network interfaces added to a bridge sit in one large layer-2 network. To let all containers on the same host communicate, docker adds the host side of every container's veth pair to a bridge called docker0.

docker run -d --name nginx1 nginx
docker run -d --name nginx2 nginx
yum install bridge-utils
brctl show docker0
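
To mirror by hand what docker does with docker0, here is a minimal sketch that creates a bridge and attaches one end of a veth pair to it (all interface names are illustrative):

ip link add veth1-0 type veth peer name veth1-1   # a fresh veth pair
ip link add br-test type bridge                   # create a bridge, like docker0
ip link set br-test up
ip link set veth1-1 master br-test                # attach one end to the bridge
bridge link show                                  # list interfaces attached to bridges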

Copy-on-write

Copy-on-write is a mechanism of union file systems. A union file system mounts directories from different file systems together into a single file system. I like to explain union file systems this way: imagine you have several sheets of transparent glass, you write some text on each sheet with a pen, and then you stack the sheets in a certain order; looking from top to bottom you can see all of the text. When you want to modify some text, all of your changes happen on the topmost sheet; the text on the lower sheets is merely covered up and is not actually changed. In a Linux union file system, when we need to modify the contents of a file, the operating system copies the file into the top directory and modifies it there; the lower directories are not changed at all.

I use the overlayfs file system as an example to show how copy-on-write works:

  # Create a directory structure
mkdir test
cd test/
mkdir lower upper work merged
ls
lower merged upper work

Create a file in the lower and upper layers, respectively

echo lower > lower/lower.txt
touch upper/upper.txt

Mount the overlayfs file system onto the merged directory

mount -t overlay overlay -olowerdir=./lower,upperdir=./upper,workdir=./work ./merged

View the contents of the mounted directory

ls merged/
lower.txt upper.txt

A newly created file actually ends up in the upper directory

touch merged/merged.txt
ls upper/
merged.txt upper.txt

When a file is modified, it is first copied from the lower directory to the upper directory, and the copy is then modified there

echo change lower > merged/lower.txt
ls upper/
lower.txt merged.txt upper.txt
cat upper/lower.txt
change lower
cat lower/lower.txt
lower
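
Docker relies on exactly this mechanism for its image layers. You can check which copy-on-write storage driver your daemon uses (a quick check; the output varies by installation):

docker info --format '{{.Driver}}'                      # e.g. overlay2
docker inspect --format '{{.GraphDriver.Name}}' nginx   # driver backing a specific container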

Image

None of the technologies mentioned above were invented by docker. What docker did remarkably well was to organize the existing technologies and define a set of specifications, making it much more convenient to build, transfer, and use containers. An image, as the name suggests, is the template of a container, just as we start a virtual machine from a VM image. We can compare images with virtual machine images:
1. Both are templates: we start virtual machines from VM images and run containers from images. Once built, an image cannot be changed.
2. The biggest difference is that an image does not contain a kernel; it contains only a program and all of that program's dependencies, so an image can be much smaller than a VM image.
3. Images have their own specification, the OCI Image Specification, which makes building and distributing images much more convenient and made the birth of Docker Hub possible.
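
To see that an image really is a read-only, layered template, you can list its layers (a quick illustration; the exact layers and sizes depend on the image version):

docker pull nginx
docker history nginx   # each line is one read-only layer of the image
docker images nginx    # note the size compared to a typical VM image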

Personally, I think the most important thing about images is sharing: the traditional process of downloading a source package, configuring it, running make (and possibly fixing problems along the way), and finally running make install has been replaced by docker run.

Container

In my view, a container is a package of all the technologies above: a container has its own namespaces, its processes hang at some level of the cgroups file system hierarchy, it has its own network interface, and it is created from an image with a writable layer placed on top. Thanks to the copy-on-write mechanism, all changes made in the container happen in that top writable layer (see the sketch after the comparison below). Again, let me briefly compare containers and virtual machines:

  1. A virtual machine runs a complete operating system, while a container only runs a program through the host's kernel
  2. Changing a virtual machine's resources usually requires restarting it, while a container's resources can be changed online without affecting the running program
  3. Scheduling a virtual machine happens on the scale of minutes, while scheduling a container happens on the scale of seconds
  4. Virtual machines provide stronger isolation than containers, so they are more secure
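
A minimal sketch of the writable layer in action (container and file names are illustrative):

docker run -d --name nginx-cow nginx
docker exec nginx-cow touch /tmp/hello   # the change lands in the container's writable layer
docker diff nginx-cow                    # lists changes such as: A /tmp/hello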
