The Past and Present of Cloud Computing (Part 1)

About the author: Liu Chao is chief architect of cloud computing solutions at NetEase. He has 10 years of R&D and architecture experience in cloud computing and is a contributor to Open DC/OS. He has long focused on enterprise applications and productization of open source software such as Kubernetes, OpenStack, Hadoop, Docker, Lucene, and Mesos, and is the author of "Lucene Application Development Revealed".

The article follows.

Overview of cloud computing

Cloud computing mainly addresses four areas: computing, networking, storage, and applications.

Computing means CPU and memory. Take the simplest calculation, "1 + 1": the value "1" is placed in memory, the CPU performs the addition, and the result "2" is stored back in memory. Networking is what lets you get online when you plug in a cable. Storage is a place to put the movie you download. This discussion covers these four parts. Computing, networking, and storage belong to the IaaS layer; applications belong to the PaaS layer.

Cloud computing development

The whole development of cloud computing can be summed up in one phrase: "what has long been united must divide, and what has long been divided must unite."


The first stage: united, that is, physical equipment

Introduction to physical devices

In the early days of the Internet, everyone used physical equipment:

  1. Servers were physical machines from vendors such as Dell, HP, IBM, and Lenovo. As hardware advanced, physical servers grew ever more powerful; 64 cores and 128 GB of memory are now a common configuration;
  2. Networking used hardware switches and routers from vendors such as Cisco and Huawei, moving from 1GE to 10GE and now to 40GE and 100GE, with ever more impressive bandwidth;
  3. Storage used ordinary disks or faster SSDs. Capacity grew from megabytes to gigabytes; even a laptop can now be configured with terabytes, let alone a disk array.

Disadvantages of physical equipment

Deploying applications directly on physical machines looks cool and gives you a big-spender feeling, but it has major shortcomings:

  1. Manual operations and maintenance. If you install software on a server and break the system, what can you do? Only reinstall it. To configure a switch, you have to connect through its serial port; to add a disk, you have to buy one and plug it into the server. All of this is manual and very likely requires a trip to the data center. If your office is at the North Fifth Ring Road and the data center is at the South Sixth Ring Road, that is quite an experience.
  2. Waste of resources. You may only want to deploy a small website, yet it occupies a machine with 128 GB of memory. Deploying several applications together instead raises the problem of isolation.
  3. Poor isolation. When many applications run on the same physical machine, they compete for memory and CPU; if one fills the disk, the others cannot use it; if one crashes the kernel, the others go down with it; and if you deploy two identical applications, their ports conflict. Errors happen at every turn.

The second stage: divided, that is, virtualization

Introduction to virtualization

These shortcomings of physical equipment led to the first "long united, must divide" process, called virtualization. Virtualization turns the real into the virtual:
1. Physical machines become virtual machines. The CPU, memory, kernel, and hard disk are all virtual;
2. Physical switches become virtual switches. The NIC, the switch, and the bandwidth are all virtual;
3. Physical storage becomes virtual storage. Multiple hard disks are virtualized into one large block.

Problems virtualization solves

Virtualization solves the three problems of the physical-equipment stage:
1. Manual operations and maintenance. Virtual machines can be created and deleted remotely; if a virtual machine breaks, deleting and recreating it takes about a minute. Virtual networks can also be configured remotely: creating a NIC or allocating bandwidth is just an API call;
2. Waste of resources. After virtualization, resources can be allocated in very small units, for example 1 CPU, 1 GB of memory, 1 Mbps of bandwidth, and a 1 GB hard disk;
3. Poor isolation. Each virtual machine has its own CPU, memory, hard disk, and NIC, so applications in different virtual machines do not interfere with each other.

The ecosystem of the virtualization era

In the virtualization stage, the leader was VMware, which virtualized basic computing, networking, and storage.
Just as closed source has its open source counterpart, Windows has Linux, and Apple has Android, VMware has Xen and KVM.

In open source compute virtualization, Citrix did a good job with Xen, and later Red Hat put a lot of effort into KVM. For network virtualization there is Open vSwitch, which lets you create bridges and NICs, set VLANs, and limit bandwidth through commands. For storage virtualization on local disks there is LVM, which combines several hard disks into one large disk and then carves out small pieces for users.
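The LVM idea of pooling several disks and carving smaller volumes out of the pool can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not LVM itself: the `VolumeGroup` class and its methods are hypothetical names, capacities are whole GiB, and real LVM allocates space in fixed-size extents and is driven by commands such as `vgcreate` and `lvcreate`.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeGroup:
    """Toy model of LVM-style pooling: several disks form one pool,
    and smaller logical volumes are carved out of it for users."""
    disks_gib: list[int]  # hypothetical: member disk sizes in GiB
    volumes: dict[str, int] = field(default_factory=dict)

    @property
    def total_gib(self) -> int:
        # The pool's capacity is the sum of all member disks.
        return sum(self.disks_gib)

    @property
    def free_gib(self) -> int:
        # Capacity not yet handed out to any logical volume.
        return self.total_gib - sum(self.volumes.values())

    def create_volume(self, name: str, size_gib: int) -> None:
        # A volume may exceed any single disk as long as the pool has room.
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        if size_gib > self.free_gib:
            raise ValueError(f"need {size_gib} GiB, only {self.free_gib} GiB free")
        self.volumes[name] = size_gib


vg = VolumeGroup(disks_gib=[500, 500, 500])
vg.create_volume("vm1-disk", 800)  # larger than any one 500 GiB disk
vg.create_volume("vm2-disk", 100)
print(vg.total_gib, vg.free_gib)  # 1500 600
```

The point of the model is that users ask the pool for capacity and never need to know which physical disk backs their volume.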

The shortcomings of virtualization

But virtualization has its flaws too. To create a virtual machine with virtualization software, you must manually specify which physical machine it runs on, which storage device holds its hard disk, its network VLAN ID, its bandwidth, and other specific settings. As a result, operations engineers who rely on virtualization alone often keep an Excel spreadsheet recording how many physical machines there are and which virtual machines are deployed on each. Because of this limitation, virtualized clusters are generally not very large.

The third stage: united, that is, cloud computing

Problems cloud computing solves

To solve the problems left over from the virtualization stage, the "long divided, must unite" process followed. We can loosely call this process pooling.
Virtualization splits resources very finely, but managing such fine-grained resources in an Excel spreadsheet costs too much. Pooling gathers resources into one large pool; when a user needs resources, the platform selects them automatically instead of the user specifying them. The key component of this stage is the scheduler.
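To make the contrast with the Excel spreadsheet concrete, here is a minimal sketch of a resource pool in which the platform, not the user, chooses the host and records the placement. The `ResourcePool` and `Host` names and the first-fit policy are hypothetical choices for illustration; real schedulers use richer policies.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_cpus: int
    free_mem_gib: int


@dataclass
class ResourcePool:
    """All physical hosts form one pool; the platform picks the host."""
    hosts: list[Host]
    # What operators used to track in a spreadsheet: VM name -> host name.
    placements: dict[str, str] = field(default_factory=dict)

    def create_vm(self, vm: str, cpus: int, mem_gib: int) -> str:
        # Hypothetical first-fit policy: use the first host with enough room.
        for host in self.hosts:
            if host.free_cpus >= cpus and host.free_mem_gib >= mem_gib:
                host.free_cpus -= cpus
                host.free_mem_gib -= mem_gib
                self.placements[vm] = host.name  # recorded automatically
                return host.name
        raise RuntimeError(f"no host in the pool can fit {vm}")


pool = ResourcePool([Host("host-a", 4, 8), Host("host-b", 64, 128)])
pool.create_vm("web-1", 2, 4)  # lands on host-a
pool.create_vm("web-2", 2, 4)  # fills host-a exactly
pool.create_vm("db-1", 8, 32)  # host-a is full, so host-b
print(pool.placements)
# {'web-1': 'host-a', 'web-2': 'host-a', 'db-1': 'host-b'}
```

The user only states how much CPU and memory a virtual machine needs; the bookkeeping that used to live in a spreadsheet is kept by the pool itself.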

The split between private and public clouds

In this area, VMware has its own vCloud, and there is also CloudStack, a private cloud platform based on Xen and KVM (later acquired and open sourced by Citrix).

While these private cloud platforms were selling at very high prices into customers' data centers, other companies made a different choice: AWS and Google began exploring the public cloud.

AWS initially used Xen for virtualization and eventually built a public cloud platform on it. Perhaps AWS at first simply did not want to hand all the profit from its own business to private cloud vendors, so its cloud platform was first built to support its own business. Because AWS seriously used its own cloud platform throughout this process, the resulting public cloud was friendly not so much for configuring resources as for deploying applications, and it ultimately shone.

Similarities and differences between private and public cloud vendors

If you look closely, you will find that although private and public clouds use similar technologies, in product design they are completely different creatures.

Private and public cloud vendors share similar technology, but they show completely different genes in how they run their products.

Private cloud vendors sell resources, so they often sell computing, networking, and storage hardware along with the private cloud platform. In product design, they tend to stress technical parameters of computing, networking, and storage that customers will almost never use, because these parameters give them an edge when compared against competitors. Private cloud vendors have almost no large-scale applications of their own, so their platforms are built for others to use rather than used heavily by the vendors themselves. As a result, their products tend to be organized around resources rather than around friendly application deployment.

Public cloud vendors often have large-scale applications of their own to deploy, so their product design can turn the modules that common application deployments need into components, which users can assemble like building blocks into their own application architectures. Public cloud vendors need not compete on technical parameters, and need not care whether the platform is open source or compatible with every virtualization platform, server, network device, and storage device. Never mind what runs underneath; what matters is that customers can deploy easily.

The public cloud ecosystem and the runner-up's counterattack

AWS, the leading public cloud, was doing very well, while Rackspace, in second place, was far less comfortable.

Indeed, the Internet industry tends to be dominated by a single winner, so how can the runner-up fight back? Open source is a good way: it lets the whole industry contribute to the cloud platform together. So Rackspace and NASA co-founded the open source cloud platform OpenStack.

OpenStack has now developed to resemble AWS, so its modules show how cloud computing pools resources.

OpenStack components

1. Nova, the compute pooling module: OpenStack's compute virtualization mainly uses KVM, but which physical machine a virtual machine runs on is decided by nova-scheduler;
2. Neutron, the network pooling module: OpenStack's network virtualization mainly uses Open vSwitch, but the virtual networks, virtual NICs, VLANs, and bandwidth of each Open vSwitch do not have to be configured by logging in to each node in the cluster; Neutron configures them through SDN;
3. Cinder, the storage pooling module: when OpenStack storage virtualization uses local disks, it is based on LVM, and which LVM a disk is allocated from is also decided by a scheduler. Later came Ceph, which pools the hard disks of many machines into one, and the scheduling then happens inside the Ceph layer.
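nova-scheduler's approach is commonly described as filter-then-weigh: first drop hosts that cannot satisfy the request, then rank the remaining ones. The sketch below imitates that two-step idea in plain Python. All class and function names are hypothetical and are not Nova's API; real Nova also normalizes weights and applies configurable multipliers instead of simply summing raw values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HostState:
    name: str
    free_mem_mb: int
    free_disk_gb: int
    enabled: bool = True


@dataclass
class Request:
    mem_mb: int
    disk_gb: int


# A filter answers yes/no for one host; a weigher scores hosts that passed.
Filter = Callable[[HostState, Request], bool]
Weigher = Callable[[HostState], float]


def enabled_filter(h: HostState, r: Request) -> bool:
    return h.enabled


def ram_filter(h: HostState, r: Request) -> bool:
    return h.free_mem_mb >= r.mem_mb


def disk_filter(h: HostState, r: Request) -> bool:
    return h.free_disk_gb >= r.disk_gb


def ram_weigher(h: HostState) -> float:
    # Prefer hosts with more free memory, which spreads load across the cluster.
    return float(h.free_mem_mb)


def select_host(hosts: list[HostState], req: Request,
                filters: list[Filter], weighers: list[Weigher]) -> HostState:
    passed = [h for h in hosts if all(f(h, req) for f in filters)]
    if not passed:
        raise RuntimeError("no valid host found")
    return max(passed, key=lambda h: sum(w(h) for w in weighers))


hosts = [
    HostState("node1", free_mem_mb=2048, free_disk_gb=100),   # too little RAM
    HostState("node2", free_mem_mb=16384, free_disk_gb=10),   # too little disk
    HostState("node3", free_mem_mb=8192, free_disk_gb=200),
    HostState("node4", free_mem_mb=32768, free_disk_gb=500, enabled=False),
]
best = select_host(hosts, Request(mem_mb=4096, disk_gb=50),
                   [enabled_filter, ram_filter, disk_filter], [ram_weigher])
print(best.name)  # node3
```

Keeping filters and weighers as separate pluggable functions means a new placement rule can be added without touching the selection loop.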

OpenStack turns the private cloud market into a red ocean

With OpenStack, every private cloud vendor went wild. VMware had been earning a great deal in the private cloud market while others could only watch helplessly, with no comparable platform to compete against it. Now, with a ready-made framework plus their own hardware, almost all the major IT vendors joined the community, turned OpenStack into their own products, and entered the private cloud market bundled with their hardware.

Public or private? NetEase's choice

NetEase, of course, did not miss this wave and launched its own OpenStack cluster. NetEase Hive developed its IaaS service in-house on top of OpenStack. In compute virtualization, it trimmed the KVM image and optimized the virtual machine boot process so that virtual machines start in seconds. In network virtualization, it used SDN and Open vSwitch to achieve high-performance communication between virtual machines. In storage virtualization, it optimized Ceph to deliver high-performance cloud disks.

But NetEase did not charge into the private cloud market; instead it used OpenStack to support its own applications, which is Internet thinking. Elasticity at the resource level alone is not enough, though; you also need components that make application deployment friendly, such as databases, load balancing, and caching. These are essential for deploying applications, and NetEase has refined them through large-scale application practice. These components are called PaaS.

The fourth stage: divided, that is, containers

So far I have been telling the story of the IaaS layer, Infrastructure as a Service, which is basically about computing, networking, and storage. Now let's turn to the application level, that is, the PaaS layer.

1. Definition and function of PaaS

The definition of IaaS is clear; the definition of PaaS is less so. Some people treat databases, load balancing, and caching as PaaS services; others treat big data platforms such as Hadoop and Spark as PaaS services; still others treat application installation and management tools such as Puppet, Chef, and Ansible as PaaS services.

In fact, PaaS is mainly about managing the application layer, and it has two parts. One part is your own applications, which scripts can deploy for you automatically. The other part is general-purpose applications that you find complex to deploy, such as databases, caches, and big data platforms; these need not be deployed at all, because you can get them on the cloud platform with a click.

Either it is deployed automatically or it does not need to be deployed at all; in general, the less you have to worry about the application layer, the better, and that is the role of PaaS. Of course, the best case is not deploying anything and getting it with one click, so public cloud platforms turn general-purpose services into PaaS offerings. Other applications are developed by you and known only to you, so you use tools to deploy them automatically.

2. Advantages of PaaS

The biggest advantage of PaaS is elasticity at the application layer. For example, during Double Eleven (the November 11 shopping festival), 10 nodes may need to become 100. With physical equipment, buying 90 machines would certainly be too late. Resource elasticity from IaaS alone is not enough either: you can create 90 virtual machines, but they are empty, and operations staff would still have to deploy to them one by one. With PaaS, as soon as a virtual machine starts, it runs the automatic deployment script and installs the application, so all 90 machines get the application automatically. That is true elastic scaling.
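The scale-out flow above can be sketched as two steps: create the machines (IaaS), then install the application on each one automatically (PaaS). This is a simplified model under assumptions: `Node`, `deploy_app`, and `scale_out` are hypothetical names, and `deploy_app` stands in for whatever deployment script (Puppet, Chef, Ansible, and so on) actually runs at boot.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    apps: list[str] = field(default_factory=list)


def deploy_app(node: Node, app: str) -> None:
    """Stand-in for the deployment script a new VM runs at boot."""
    node.apps.append(app)


def scale_out(cluster: list[Node], target: int, app: str) -> list[Node]:
    # IaaS step: create the missing virtual machines (they start out empty).
    new_nodes = [Node(f"node-{i}") for i in range(len(cluster), target)]
    # PaaS step: every new machine installs the application with no manual work.
    for node in new_nodes:
        deploy_app(node, app)
    return cluster + new_nodes


cluster = [Node(f"node-{i}", ["shop"]) for i in range(10)]
cluster = scale_out(cluster, 100, "shop")  # 10 nodes become 100
print(len(cluster), all("shop" in n.apps for n in cluster))  # 100 True
```

Without the second step, the 90 new nodes would exist but serve nothing, which is exactly the gap the text describes.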

3. PaaS deployment issues

Of course, this approach has its own problem. However well Puppet, Chef, or Ansible abstract their installation scripts, they are ultimately still scripts, and application environments vary widely. Differences in file paths, file permissions, dependency packages, and application environments; in the versions of software such as Tomcat, PHP, Apache, JDK, and Python; in whether certain system software is installed; and in which ports are in use can all make a script fail. So although a script seems reusable once written, whenever the environment changes even slightly, it needs another round of modification, testing, and integration. For example, a script written for your own data center may not work as-is on AWS, and one tuned on AWS may run into problems when migrated to Google Cloud.
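Why scripts break when the environment shifts can be shown with a toy install script whose hidden assumptions are written out as explicit checks. The `Machine` fields, the Tomcat path, and the port below are hypothetical; a real shell script would usually not check them at all and would simply fail partway through.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    os: str
    jdk: str
    paths: set[str] = field(default_factory=set)
    ports_in_use: set[int] = field(default_factory=set)


def install_site(m: Machine) -> list[str]:
    """Toy install script; each check spells out an assumption that a
    hand-written script would normally make silently (all hypothetical)."""
    errors = []
    if m.os != "Ubuntu":
        errors.append(f"script assumes apt-get on Ubuntu, found {m.os}")
    if m.jdk != "1.7":
        errors.append(f"script expects JDK 1.7, found {m.jdk}")
    if "/opt/tomcat7/webapps" not in m.paths:
        errors.append("Tomcat webapps directory not found at /opt/tomcat7/webapps")
    if 8080 in m.ports_in_use:
        errors.append("port 8080 is already taken")
    return errors


tuned_box = Machine("Ubuntu", "1.7", {"/opt/tomcat7/webapps"})
new_cloud_box = Machine("CentOS", "1.8", set(), {8080})
print(install_site(tuned_box))           # []
print(len(install_site(new_cloud_box)))  # 4
for problem in install_site(new_cloud_box):
    print("-", problem)
```

The same script succeeds on the machine it was tuned for and fails four different ways on a fresh machine elsewhere.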

The birth of the container

1. Definition of a container

So containers came into being. A container is also a shipping container, and the idea is precisely to make the container the unit of software delivery. A shipping container has two characteristics: it packages its contents, and it is standardized. Imagine the era before containers: shipping goods from A to B through three ports means changing ships three times. Each time, the goods have to be unloaded, laid out on the dock, and then stacked neatly onto the next ship, so without containers the crew had to stay ashore for several days before setting off. With containers, all the goods are packed together and every container is the same size, so at each transfer the whole box can be moved across within hours, and the crew no longer wait ashore for long.

2. Applying the container idea to development

Imagine that A is the programmer, B is the user, the goods are the code and its runtime environment, and the three ports in between are development, testing, and production.
Suppose the code needs the following environment to run:
1. Ubuntu operating system
2. Create the user hadoop
3. Download JDK 1.7 and extract it into a directory
4. Add this directory to the JAVA_HOME and PATH environment variables
5. Put these environment variable settings in the .bashrc file in the hadoop user's home directory
6. Download and extract Tomcat 7
7. Put the WAR file into Tomcat's webapps directory
8. Modify Tomcat's startup parameters to set the Java heap size to 1024M

Look: even a simple Java website involves this many scattered details. Without packaging, you have to check every environment (development, testing, production) to make sure they are consistent, or even rebuild the environment from scratch each time, which is as troublesome as unloading and reloading the goods at every port. Any small difference in between, for example JDK 1.8 in development but JDK 1.7 in production, or the root user in development but the hadoop user in production, can make the program fail.
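The environment drift described above can be detected mechanically by describing each stage's environment as data and diffing it against production. This is a minimal sketch: the settings mirror the eight-step list, the `dev` values reproduce the text's examples (JDK 1.8 and the root user), and the `STAGES` table and `drift` helper are hypothetical names.

```python
# Each stage's environment, reduced to the settings from the eight-step list.
STAGES = {
    "dev":  {"os": "Ubuntu", "user": "root",   "jdk": "1.8", "tomcat": "7", "java_heap": "1024m"},
    "test": {"os": "Ubuntu", "user": "hadoop", "jdk": "1.7", "tomcat": "7", "java_heap": "1024m"},
    "prod": {"os": "Ubuntu", "user": "hadoop", "jdk": "1.7", "tomcat": "7", "java_heap": "1024m"},
}


def drift(reference: dict, other: dict) -> dict:
    """Return {setting: (reference_value, other_value)} for every setting that differs."""
    keys = reference.keys() | other.keys()
    return {k: (reference.get(k), other.get(k))
            for k in sorted(keys) if reference.get(k) != other.get(k)}


for stage in ("dev", "test"):
    print(stage, drift(STAGES["prod"], STAGES[stage]))
# dev {'jdk': ('1.7', '1.8'), 'user': ('hadoop', 'root')}
# test {}
```

A diff like this only reports inconsistencies; someone still has to fix each environment by hand, which is the repetitive work that packaging the environment together with the application aims to remove.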

The fifth stage: to be continued...
To learn how containers package applications, stay tuned for the next installment.
