Docker's Network Solutions

[Editor's Note] This share discusses Docker's network solutions: first an overview of the existing container networking options, then a focus on the features and technical points of Calico, with Contiv introduced as an extension and a contrast, and finally a comparison of test results.

With the rapid rise of containers, more and more of DataMan Cloud's customers have been raising their requirements for container networking, such as:

  • One IP per container;
  • Multi-host container interconnection;
  • Network isolation;
  • ACLs;
  • Integration with SDN, and so on.

This share mainly covers Docker's network solutions: first an introduction to the existing container network solutions, then a focus on Calico's features and technical points, followed by Contiv as an extension and a contrast, and finally the results of a comparative test.

Existing major Docker network solutions

First, a brief introduction to the existing container network solutions. Many comparisons are already available online; here we classify them by implementation approach. The following two categories draw on an earlier DockOne share by Peng Zhefu:

Tunneling scheme

These work through tunnels, i.e. overlay networking:

  • Weave: UDP broadcast; a new bridge is created on each host and traffic is forwarded via pcap.
  • Open vSwitch (OVS): based on the VxLAN and GRE protocols, but with relatively heavy performance loss.
  • Flannel: UDP broadcast and VxLAN.

Tunneling is also widely used at the IaaS network layer. The consensus is that as the number of nodes grows, so does the complexity, and troubleshooting network problems becomes too cumbersome; for large-scale clusters this is a point to consider.

Routing scheme

Another approach is routing. Typical representatives are:

  • Calico: a BGP-based routing solution that supports very fine-grained ACL control and has relatively high affinity with hybrid clouds.
  • Macvlan: from the logic and kernel point of view, the best solution for isolation and performance. It relies on Layer 2 isolation, so it needs Layer 2 support from the underlying network; most cloud providers do not offer this, making hybrid cloud deployments difficult.

Routing solutions generally achieve isolation and cross-host container connectivity at Layer 3 or Layer 2, and problems are also easy to troubleshoot.

In my view, when discussing container network solutions after Docker 1.9, what matters is not only the implementation approach but also which network-model "camp" a solution stands in: for example, whether you use Docker's native CNM, or the CNI pushed mainly by CoreOS and Google.

Docker Libnetwork Container Network Model (CNM) camp

  • Docker Swarm overlay
  • Macvlan & IP network drivers
  • Calico
  • Contiv (from Cisco)

The advantage of Docker Libnetwork is that it is native and tightly bound to the Docker container lifecycle; the shortcoming can also be understood as being native: you are "held hostage" by Docker.

Container Network Interface (CNI) camp

  • Kubernetes
  • Weave
  • Macvlan
  • Flannel
  • Calico
  • Contiv
  • Mesos CNI

CNI's advantage is compatibility with other container runtimes (e.g. rkt) and upper-level systems (Kubernetes and Mesos), plus an active community pushed mainly by Kubernetes and CoreOS; its shortcoming is that it is not Docker-native.

As can be seen above, some third-party network solutions have "a foot in both camps." Personally I think this is reasonable for now, but in the long run it carries risk: they will either be eliminated or acquired.

Calico

Next we focus on Calico, because it plays an important role in both the CNM and CNI camps: it has decent performance, provides good isolation, and also offers good ACL control.

Calico is a pure Layer 3 data center network solution that integrates seamlessly with IaaS cloud architectures such as OpenStack, providing controllable communication between VMs, containers, and bare metal.

Calico scales the principles of the Internet's highly scalable IP networks down to the data center. On every compute node it uses the Linux kernel to implement an efficient vRouter that handles data forwarding, and each vRouter propagates the routes of its local workloads across the entire Calico network via BGP. Small-scale deployments can be directly interconnected in a full mesh; large-scale deployments can use designated BGP route reflectors.

This ensures that all traffic between workloads is forwarded by plain IP routing.
1.png
Calico nodes can network directly over the existing data center fabric (whether L2 or L3), with no additional NAT, tunnels, or overlay networks.
2.png
As shown above, this keeps the whole solution simple and controllable: there is no packet encapsulation or decapsulation, which saves CPU resources and improves overall network performance.

In addition, Calico provides rich and flexible network policy based on iptables, using ACLs on each node to provide multi-tenant isolation, security groups, and other reachability restrictions for workloads.

Calico architecture

3.png
With this diagram, let's walk through Calico's core components (a quick way to check them on a running node follows the list):

  • Felix, the Calico agent, runs on every node that hosts workloads and is mainly responsible for configuring routes, ACLs, and other state to ensure endpoint connectivity;
  • etcd, a distributed key-value store, is mainly responsible for the consistency of network metadata and guarantees the accuracy of Calico's network state;
  • BGP Client (BIRD) is responsible for distributing the routing information that Felix writes into the kernel across the Calico network, ensuring connectivity between workloads;
  • BGP Route Reflector (BIRD) is used in large-scale deployments to avoid a full mesh of all nodes; routes are distributed through one or more BGP route reflectors.
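
As a quick sanity check, these components can be inspected on any node. Below is a minimal sketch, assuming the calico/node container deployment used in this share; the etcd address is only an example.

    # Check that the calico/node container (which bundles Felix and BIRD) is running on this host
    docker ps --filter name=calico-node
    # Point calicoctl at the etcd cluster that stores Calico's state (address is an assumption)
    export ETCD_AUTHORITY=192.168.99.102:2379
    # Show Felix status and the BGP sessions BIRD has established with the other nodes
    calicoctl status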

Calico Docker Network Core Concepts

From here on we "stand" in the CNM camp and experience and discuss Calico's container networking through the Calico Docker libnetwork plugin.

Let's take a look at the CNM model:
4.jpg
From the above figure we can see that CNM is based on three main concepts:

  • Sandbox: contains the container's network stack configuration, including interfaces, routing table, and DNS configuration; a typical implementation is a Linux network namespace. A Sandbox can be attached to multiple Networks;
  • Endpoint: the medium through which a Sandbox joins a Network; typical implementations are a veth pair or a TAP device. An Endpoint can belong to only one Network and only one Sandbox;
  • Network: a group of Endpoints that can communicate with each other; typical implementations are a Linux bridge or a VLAN. A Network owns a number of Endpoint resources;

In addition, CNM relies on two other key objects to complete Docker's network management functions:

  • NetworkController: exposes the APIs for allocating and managing networks. Docker Libnetwork supports multiple active network drivers, and the NetworkController allows a specific driver to be bound to a given network;
  • Driver: the network driver does not interact with users directly; it is plugged in to provide the actual network implementation. The Driver (including IPAM) is responsible for managing a Network, including resource allocation and reclamation.

With these key concepts and objects, combined with the Docker lifecycle, the container network can be fully managed through the APIs. The specific steps and implementation details are not covered here; if you are interested, see GitHub: https://github.com/docker/libn … gn.md.
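
To make the Driver plug-in mechanism a bit more concrete, here is a minimal sketch of the libnetwork remote-driver handshake. The /Plugin.Activate and /NetworkDriver.* endpoints come from the libnetwork remote API; the plugin socket path is an assumption about where the Calico plugin registers itself.

    # Handshake: Docker asks the plugin what it implements
    # (a network driver replies with {"Implements": ["NetworkDriver", ...]})
    curl -s -XPOST --unix-socket /run/docker/plugins/calico.sock \
        http://plugin/Plugin.Activate
    # Capability query sent before any network is created on this driver
    curl -s -XPOST --unix-socket /run/docker/plugins/calico.sock \
        http://plugin/NetworkDriver.GetCapabilities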

Next, two Calico-specific concepts:

  • Pool: defines the range of IP addresses available to a Docker Network, for example 10.0.0.0/8 or 192.168.0.0/16;
  • Profile: defines a collection of Docker Network policies, consisting of tags and rules. Each Profile has a tag with the same name as the Profile by default; a Profile can have multiple tags, saved as a list.

Profile Example:

    Inbound rules:
      1 allow from tag WEB
      2 allow tcp to ports 80,443
    Outbound rules:
      1 allow
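
For reference, a pool and a profile with rules like the example above could be created with the pre-1.0 calicoctl CLI used at the time of this share; the names are illustrative and the exact rule syntax may differ between calicoctl versions.

    # Register an IP pool that containers can draw addresses from
    calicoctl pool add 192.168.0.0/16
    # Create a profile and attach rules like the example above
    calicoctl profile add WEB
    calicoctl profile WEB rule add inbound allow from tag WEB
    calicoctl profile WEB rule add inbound allow tcp to ports 80,443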

Demo

Based on the architecture and core concepts above, here is a simple example to get an intuitive feel for Calico's network management.

Building a Calico cluster for testing purposes is very simple and is not shown here; you can refer directly to GitHub: https://github.com/projectcali … ME.md.

A Calico cluster is already running; the node IPs are 192.168.99.102 and 192.168.99.103.

calicoctl status screenshot:
4.png
Two IP pools have already been created: 10.0.0.0/26 and 192.168.0.0/16.

calicoctl pool show screenshot:
5.png
The cluster has also created several Docker networks using the Calico network driver and IPAM driver; this demo only needs the one named dataman.
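
For reference, such a network can be created through the libnetwork plugin; the sketch below assumes the Calico libnetwork plugin of that era and uses this demo's network name.

    # Create a Docker network backed by the Calico network driver and Calico IPAM
    docker network create --driver calico --ipam-driver calico-ipam dataman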

docker network ls screenshot:
6.png
calicoctl profile show screenshot:
7.png
Here we use the dataman network and start a container on each of the two slave machines:

Marathon JSON file:

  {"Id": "/ nginx-calico", 
"Cmd": null,
"Cpus": 0.1,
"Mem": 64,
"Disk": 0,
"Instances": 2,
"Container": {
"Type": "DOCKER",
"Volumes": [],
"Docker": {
"Image": "nginx",
"Network": "HOST",
"Privileged": false,
"Parameters": [
{
"Key": "net",
"Value": "dataman"
}
],
"ForcePullImage": false
}
},
"PortDefinitions": [
{
"Port": 10000,
"Protocol": "tcp",
"Labels": {}
}
]
}
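
To deploy the application, this JSON is POSTed to Marathon's /v2/apps endpoint; the Marathon address and file name below are placeholders.

    # Submit the app definition to Marathon (host address and file name are illustrative)
    curl -X POST http://<marathon-host>:8080/v2/apps \
        -H "Content-Type: application/json" \
        -d @nginx-calico.json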

Container IP screenshots on the two slaves:
8.png
9.png
As can be seen from the figures, the container IPs on the two slaves are: slave1 10.0.0.48 and slave2 192.168.115.193.

Slave container connectivity test screenshots:
10.png
11.png

IP routing implementation

12.png
Based on the Calico data-plane diagram above, combined with our example, let's look at how Calico achieves cross-host connectivity.

Routing table screenshots on the two slaves:
13.png
14.png
From the two slaves' routing tables we can see that if the container on slave 1 (10.0.0.48) wants to send data to the container on slave 2 (192.168.115.193), it matches the last routing rule and the packet is forwarded to slave 2 (192.168.99.103). The whole data path is:

    container -> kernel -> (cali2f0e) slave 1 -> one or more hops -> (192.168.99.103) slave 2 -> kernel -> (cali7d73) container

In this way, cross-host communication is established, with no NAT and no tunneling or packet encapsulation anywhere in the data path.
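
For readers without the screenshots, the relevant routing entries look roughly like the sketch below; the interface names, the /26 block, and the outgoing device are illustrative, based on the addresses used in this demo.

    # On slave 1: the local container's /32 is reached via its cali* veth interface
    10.0.0.48 dev cali2f0e  scope link
    # On slave 1: slave 2's address block is routed to slave 2's host IP, learned from BIRD
    192.168.115.192/26 via 192.168.99.103 dev eth1  proto bird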

Security policy ACL

15.png
Calico's ACL profiles rely mainly on iptables and ipset, and rules can be defined at the level of each individual container.

For the concrete implementation, you can use the iptables command to view the corresponding chains and filter rules; we will not go into it here.
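
If you want to poke around yourself, the sketch below uses standard iptables/ipset inspection commands; the chain and set prefixes are an assumption and vary by Calico version (typically cali- or felix-).

    # Dump the iptables rules and keep the chains Calico has programmed
    iptables-save | grep -iE 'cali|felix'
    # List the ipset sets Calico uses to match profile tags
    ipset list | grep -iE 'cali|felix'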

Contiv

http://contiv.github.io

Contiv is Cisco's open-source project for container infrastructure. Its main function is to provide policy-based network and storage management, oriented toward microservices.

Contiv integrates with the mainstream container orchestration systems, including Docker Swarm, Kubernetes, Mesos, and Nomad.
16.png
As shown in the figure above, Contiv's most "seductive" point is its network management capability: it supports L2 (VLAN), L3 (BGP), and overlay (VxLAN), and can also connect to Cisco's own SDN product, ACI. You could say it can ignore the underlying network infrastructure and present a consistent virtual network to the containers above.

Contiv Netplugin features

  • Multi-tenant networks mixed on the same host;
  • Integration with existing SDN solutions;
  • Compatibility with non-container environments, without depending on the details of the physical network;
  • Real-time container network policy / ACL / QoS rules (a brief CLI sketch follows below).
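
As a rough illustration of how such policies are expressed, here is a hypothetical sketch using Contiv's netctl CLI; the network name, subnet, policy name, and exact flags are assumptions and may differ between Contiv releases.

    # Create a tenant network (name and subnet are illustrative)
    netctl net create --subnet=10.1.1.0/24 contiv-demo
    # Create a policy and add a rule allowing inbound TCP port 80
    netctl policy create web-policy
    netctl policy rule-add web-policy 1 -direction=in -protocol=tcp -port=80 -action=allow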

Network solution performance comparison

Finally, here are simple performance test results obtained with qperf. We compared vm-to-vm, host, calico-bgp, calico-ipip, and Swarm overlay.

Contiv was actually tested as well, but because its test environment differed from the others, the results are not directly comparable and are not included in the charts. The intuitive impression is that the OVS-based approach is really not ideal. When time permits we will continue testing in a unified environment, covering Contiv's L3/L2 modes and also bringing in MacVLAN, Flannel, and others.

Test environment: VirtualBox VMs; OS: CentOS 7.2, kernel 3.10; 2 vCPUs, 2 GB RAM.

The bandwidth comparison results are as follows:
17.png
The latency comparison results are as follows:
18.png
qperf commands:

    # Server side
    $ qperf
    # Client side
    # 10-second run with 64 KB messages: tcp_bw measures bandwidth, tcp_lat measures latency
    $ qperf -m 64k -t 10 192.168.2.10 tcp_bw tcp_lat
    # Sweep the message size from 1 byte upward, doubling each step
    $ qperf -oo msg_size:1:64k:*2 192.168.2.10 tcp_bw tcp_lat

Summary

As containers land in production, the network solution will become a battleground. For DataMan Cloud's PaaS container cluster management platform, our considerations when choosing a network solution are:

  • Performance: Calico and MacVLAN both perform well; Contiv L3 (BGP) should, in theory, not be bad either;
  • Universality: Calico's underlying requirement is only that the IP layer be reachable; MacVLAN is not suitable for public clouds; Contiv may have better compatibility with the underlying network;
  • Programmability: the whole network management process must be programmable via APIs so it can be integrated into the product without manual operations; Calico and Contiv both have corresponding modules;
  • Future development: this is the "camp" question I mentioned, Docker's CNM versus the CNI of CoreOS and Kubernetes; it is still too early to tell, and Calico and Contiv support both.

In summary, I personally recommend paying attention to and trying out Calico or Contiv as a container network solution; if you run into problems or gain insights, please feel free to share them.

Q & A

Q: Judging from your Marathon JSON file, have you extended Mesos's underlying networking? Is the container network passed in through "parameters"?

A: Yes, we use Marathon's support for passing Docker parameters, which is equivalent to docker run --net=dataman xxx.

Q: Felix configures routes according to the configured pool address ranges, and the BGP client advertises them? How are the BGP neighbor relationships established? Do they depend on the information in etcd?

A: Yes, Felix configures the local kernel routes and the BGP client is responsible for distributing the routing information. Small-scale deployments establish a full BGP mesh, with all nodes connected pairwise (n^2 sessions); for large-scale deployments it is recommended to deploy one or more BGP route reflectors to handle route distribution.

Q: How is the routing information in the DataMan network updated? We know BGP can converge on its own, so how is the control plane designed?

A: When a container using the DataMan network is added or deleted, Felix is responsible for updating the local routes and the BGP client for distributing them. Calico's use of BGP is still different from BGP on the WAN: it uses only private AS numbers, and the control plane can be controlled programmatically.

Q: How does Calico solve the problem of address conflicts with multiple tenants?

A: When containers of multiple tenants are mixed on the same host, the networks they use should simply not draw from the same pool, so there is no conflict.

Q: If there are many containers in the cluster, the number of routing table entries grows accordingly; won't this affect network performance?

A: It should not have much impact on network performance, since with that many containers the traffic pressure on the network is already large anyway. It does increase system load, because routing rules must be configured and network information synchronized. We care a lot about this part, but do not have specific test results yet.

Q: Is the routing information stored in etcd? How are routing-table read performance and routing-table updates handled?

A: The effective routes are of course local; endpoint and other metadata is in etcd. These two issues are for Calico itself to solve; we are also very interested in this part, but there are no specific test results yet.

Q: In Calico, when there are multiple routing hops, don't the intermediate routers need to learn all those container routes?

A: No. The intermediate hops rely on the existing network between the nodes; as long as the two nodes can reach each other, none of the intermediate hops needs to know the container routes internal to Calico.

Q: There is another problem, namely network management. From the equipment vendors' side, several of them offer VXLAN controllers. Can these controllers work with OVS in containers? For us, buying foreign products is still relatively difficult at the moment.

A: This is definitely a problem. As the platform side, I think there is very little we can do; if this is to change, the SDN equipment vendors will have to push it, just as they all provide drivers to adapt to OpenStack. But that also requires Docker or the container scheduling systems to reach the same level of industry standardization as OpenStack.

Q: Can Calico integrate with Flannel, in the scenario where both Calico and Flannel are present with the kubelet on CoreOS?

A: At DockerCon 2016 Calico quite publicly embraced Flannel; the two have been integrated into a project called Canal. If you are interested, go take a look.

Q: May I ask, are the VXLAN tags applied by Calico?

A: Calico has no VXLAN. Calico's tags correspond to ipsets, which iptables uses to match filter rules.

The above content is based on the WeChat group share on the evening of June 30, 2016. The presenter was Chu Xiangyang, an R&D engineer at DataMan Cloud. An early participant in open source and OpenStack, he previously worked in Red Hat's PnT (formerly HSS) department, responsible for developing and maintaining Red Hat's internal tool chain. He is now responsible for development at DataMan Cloud, has studied Docker and Mesos, and is familiar with and passionate about cloud computing, distributed systems, SDN, and related technologies. DockOne organizes targeted technical shares every week; if you are interested, or have a topic you would like to hear about or share, contact liyoujiesz.
