Use the Kubernetes Pet Set to deploy thousands of Cassandra instances

[Editor's Note] This article is part of the Kubernetes 1.3 series (Day 5). It focuses on Pet Set and on using Pet Set to deploy thousands of Cassandra instances, verifying the availability of the Cassandra cluster with race data simulated for ancient Greek monsters. The translator has added a brief introduction to Pet Set at the end of the article; readers unfamiliar with Pet Set may want to read that section first (mainly because the translator did not understand it at first; corrections from readers who know it well are welcome).

The Greek Pet Monster Races

With the release of Kubernetes 1.3, we wanted to put the new Pet Set feature through a serious test. By testing thousands of Cassandra instances, we confirmed that Kubernetes 1.3 is ready for production. Next, we show how we built our largest Cassandra cluster yet with Kubernetes.

Deploying a basic stateful application in containers is relatively easy: by mounting a disk into a pod with a persistent volume, you can ensure that the data outlives the pod's lifecycle. Deploying a distributed stateful application, however, is much harder, and that is the difficulty that Pet Set in Kubernetes 1.3 addresses. To test Pet Set at scale, we held the Greek Pet Monster Races, letting Centaurs and other monsters of ancient Greek mythology compete in hundreds of thousands of races across multiple regions.

Note: The so-called races are just time-series random numbers produced by a simulation and stored in Cassandra; see gpmr for details.

As is well known, the name Kubernetes comes from the ancient Greek κυβερνήτης, meaning helmsman, steersman, pilot, or ship master. To track the results of the races we needed a data warehouse, and we chose Cassandra. In ancient Greek mythology, Cassandra (Κασσάνδρα) was the daughter of Priam and Hecuba, the king and queen of Troy. Since both Kubernetes and Cassandra are rooted in ancient Greek culture, we decided to hold a racing game between ancient Greek monsters.

That is enough mythology about Cassandra; in this article Cassandra is the application deployed as Pets, so let us introduce Pet Set next.

Pet Set is one of the exciting new features in Kubernetes 1.3. In Kubernetes, container deployments can be organized and managed with different mechanisms such as Replication Controllers and Daemon Sets. Pet Set is a new mechanism that divides a deployment into multiple Pets and guarantees that each Pet has a unique identity, including a DNS name, consistent storage, and an ordered pod index. Previously, deploying with Deployments or Replication Controllers gave an application only a loosely coupled, weak identity. A weak identity is a good fit for micro-service applications that do not care about pod names, rely on service discovery, and are stateless. Many software applications, however, need a strong identity, in particular the various kinds of distributed stateful systems. Cassandra is a good example: it needs a consistent network identity and stable storage.

Pet Set provides the following features:

  • A stable hostname available in DNS. Pet hostnames within a Pet Set are the Pet Set name plus an ordinal starting from 0, for example cassandra-0.
  • An ordinal index, such as 0, 1, 2, 3.
  • Stable storage linked to the Pet's ordinal and hostname.
  • Peer discovery via DNS: the names of a Pet's peers are known before the Pet is created (see the sketch after this list).
  • Ordered startup and teardown: from the Pet ordinals, you know which Pet will be created next and which Pet will be removed when the Pet Set is scaled down. This is useful for administrative tasks such as draining data from a Pet before shrinking the cluster.
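
Because the naming is deterministic, the addresses of a Pet's peers can be written down before any Pet exists. The following is a minimal, hypothetical Python sketch (not part of the original demo) that derives the stable DNS names of the members of a Pet Set; the Pet Set name, governing service, namespace, and replica count mirror the Cassandra example shown later in this article.

def pet_dns_names(pet_set_name, governing_service, namespace, replicas,
                  cluster_domain="cluster.local"):
    """Derive the stable DNS names of every Pet in a Pet Set.

    Pets are named <pet_set_name>-0 .. <pet_set_name>-(replicas - 1) and are
    resolvable as <pet>.<governing service>.<namespace>.svc.<cluster domain>.
    """
    return [
        f"{pet_set_name}-{i}.{governing_service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]

# Matches the Cassandra Pet Set below: name "cassandra", governing service
# "cassandra", default namespace, 5 replicas.
for name in pet_dns_names("cassandra", "cassandra", "default", 5):
    print(name)  # cassandra-0.cassandra.default.svc.cluster.local, ..., cassandra-4...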

If your application has these requirements, consider deploying it with Pet Set. A picture may help: suppose you have a Pet Set made up of pet dogs, one white, one brown, and one black. If the brown dog runs away and you replace it with another brown dog, nobody notices; if you replace it with a white dog, someone will. Pet Set lets you run your application in Pets that keep a unique identity in exactly this way.

Examples of applications that can use Pet Set:

  • Clustered software such as Cassandra, Zookeeper, etcd, or Elasticsearch, which needs stable membership relationships between instances.
  • Database software such as MySQL or PostgreSQL, which needs a single instance attached to a persistent volume at any given time.

Use Pet Set only if your application needs some of the properties described above; stateless pods are easier to manage.

Let us return to the races!

As just described, Cassandra is a perfect example of an application to deploy with Pet Set. A Pet Set is very similar to a Replication Controller, but with a few extra features. Below is an example YAML file:

# Headless service to provide DNS lookup
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
    - port: 9042
  selector:
    app: cassandra-data
---
# new API name
apiVersion: "apps/v1alpha1"
kind: PetSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  # replicas are the same as used by Replication Controllers,
  # except pets are deployed in order 0, 1, 2, 3, etc.
  replicas: 5
  template:
    metadata:
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
      labels:
        app: cassandra-data
    spec:
      # as with other Kubernetes components, one
      # or more containers are deployed
      containers:
      - name: cassandra
        image: "cassandra-debian:v1.1"
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: "4"
            memory: 11Gi
          requests:
            cpu: "4"
            memory: 11Gi
        securityContext:
          privileged: true
        env:
        - name: MAX_HEAP_SIZE
          value: 8192M
        - name: HEAP_NEWSIZE
          value: 2048M
        # this relies on the guaranteed network identity of Pet Sets: we
        # know the names of the Pets/Pods before they are created
        - name: CASSANDRA_SEEDS
          value: "cassandra-0.cassandra.default.svc.cluster.local,cassandra-1.cassandra.default.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "OneKDemo"
        - name: CASSANDRA_DC
          value: "DC1-Data"
        - name: CASSANDRA_RACK
          value: "OneKDemo-Rack1-Data"
        - name: CASSANDRA_AUTO_BOOTSTRAP
          value: "false"
        # this variable is used by the readiness probe, which looks
        # for the IP address in the output of `nodetool status`
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        # these volume mounts are persistent; they are like inline claims,
        # but not exactly, because the names need to match exactly one of
        # the pet volumes
        volumeMounts:
        - name: cassandra-data
          mountPath: /cassandra_data
  # these are converted to volume claims by the controller
  # and mounted at the paths mentioned above; storage can be automatically
  # created for the Pets depending on the cloud environment
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 380Gi

You will notice that these containers are fairly large; it is not unusual for a production Cassandra node to use 8 CPUs and 16GB of memory. There are two key new features to pay attention to here: dynamic volume provisioning and, of course, Pet Set. The YAML file above creates five Cassandra Pets, starting from 0: cassandra-data-0, cassandra-data-1, and so on.

To generate the data for the races we used another Kubernetes feature, Jobs. Simple Python code generates a random speed for each monster every second of a race and then stores the speed, position, winners, other data points, and metrics in Cassandra. For data visualization we used JHipster to build an AngularJS UI and D3 for charting.
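
The real generator lives in the gpmr repository; purely as an illustration of the idea, here is a minimal Python sketch using numpy and the DataStax Cassandra driver. The keyspace, table, and column names (gpmr, race_points, and so on) are assumptions made up for this example, not the actual gpmr schema.

import time
import uuid

import numpy as np
from cassandra.cluster import Cluster  # DataStax Python driver (pip install cassandra-driver)

# Any Pet works as a contact point because its DNS name is stable.
cluster = Cluster(["cassandra-0.cassandra.default.svc.cluster.local"])
session = cluster.connect("gpmr")  # hypothetical keyspace

# Hypothetical table: one row per monster per second of a race.
insert = session.prepare(
    "INSERT INTO race_points (race_id, pet, ts, speed, position) VALUES (?, ?, ?, ?, ?)"
)

def run_race(pet="Giants", length=100, scale=3.0):
    """Simulate one race: draw a random speed every second and store the sample."""
    race_id, position = uuid.uuid4(), 0.0
    for second in range(length):
        speed = abs(np.random.normal(loc=10.0, scale=scale))  # random speed for this second
        position += speed
        session.execute(insert, (race_id, pet, second, speed, position))
        time.sleep(1)
    return position

if __name__ == "__main__":
    print("final position:", run_race())

The parameters mirror the arguments passed to the Job below (--length, --pet, --scale), but the values here are illustrative.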

The following is an example of a Job configuration:

apiVersion: batch/v1
kind: Job
metadata:
  name: pet-race-giants
  labels:
    name: pet-races
spec:
  parallelism: 2
  completions: 4
  template:
    metadata:
      name: pet-race-giants
      labels:
        name: pet-races
    spec:
      containers:
      - name: pet-race-giants
        image: py3numpy-job:v1.0
        command: ["pet-race-job", "--length=100", "--pet=Giants", "--scale=3"]
        resources:
          limits:
            cpu: "2"
          requests:
            cpu: "2"
      restartPolicy: Never

Since these are monsters, the cluster could not be small. We deployed 1,009 minion nodes on Google Compute Engine (GCE) across four zones, using a Kubernetes 1.3 beta; we ran the demo on beta code because version 1.3 had not yet been released when the demo was built. The minion nodes were GCE virtual machines of type n1-standard-8, with 8 CPUs and 30GB of memory each.

Ultimately, all of the Pets were deployed. The one thousand instances were split into two Cassandra data centers. Cassandra's distributed architecture is particularly well suited to multi-data-center deployments. Typically, multiple Cassandra data centers are deployed inside the same physical or virtual data center in order to separate workloads: data is replicated across the data centers, but each data center can carry a different workload, so application performance can differ between them. The two data centers were named 'DC1-Analytics' and 'DC1-Data', with 500 Pets deployed in each. The data generated by the batch Python Jobs was stored in DC1-Data, while the JHipster UI connected to DC1-Analytics.
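
To get data replicated across both data centers while each serves its own workload, the keyspace must be defined with a NetworkTopologyStrategy that names both data centers. The snippet below is a hedged sketch of what that could look like; the keyspace name and the replication factors are illustrative choices, not values taken from the demo.

from cassandra.cluster import Cluster

# Any Pet can serve as the contact point thanks to its stable DNS name.
cluster = Cluster(["cassandra-0.cassandra.default.svc.cluster.local"])
session = cluster.connect()

# Replicate the (hypothetical) gpmr keyspace into both Cassandra data centers:
# the batch Jobs write into DC1-Data, the JHipster UI reads from DC1-Analytics,
# and Cassandra keeps the copies in sync. The factors of 3 are just examples.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS gpmr
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1-Data': 3,
        'DC1-Analytics': 3
    }
""")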

The final cluster size is as follows:

  • 8,072 cores; the master used 24 and the minion nodes used the rest.
  • 1,009 IP addresses.
  • 1,009 routes configured by Kubernetes on Google Cloud Platform.
  • 100,510GB of persistent disk used by the minion nodes and the master.
  • 380,020GB of SSD disk; the master used 20GB and each Cassandra Pet used 340GB.
  • 1,000 Cassandra instances.

Yes, we really did deploy 1,000 Pets, though not everything went perfectly. Technically, with this Cassandra setup we could lose 333 nodes without losing service or data.

Limitations of Pet Set in the 1.3 release

  • Pet Set is an alpha resource and is not available in Kubernetes versions prior to 1.3.
  • The storage for a Pet must either be provisioned by a dynamic storage provisioner based on the requested storage class, or pre-provisioned by an administrator.
  • Deleting a Pet Set does not delete any Pets or their storage; you must remove the Pets and their storage manually.
  • All Pet Sets currently require a "governing service", that is, a service responsible for the network identity of the Pets. The user is responsible for this service.
  • Updating an existing Pet Set is currently a manual process: you can either rebuild the Pet Set with a new image version, or replace Pets one at a time with the new image and re-add them to the cluster.

Sources and references

  • The code for the demo application can be found on GitHub (the Pet Set example has been merged into the Kubernetes Cassandra example).
  • For more information on Jobs, see the Jobs documentation.
  • Pet Set documentation.
  • Images used: Cassandra and Cyclops.

What is Pet Set?

The following is based on the Kubernetes official documentation.

First, what is a Pet? A Pet is a stateful application: essentially a pod with a stable name and a unique identity, consisting of:

  • a stable hostname resolvable in DNS
  • an ordinal index (a Pet's name has the form PetSetName-Ordinal)
  • stable storage linked to the ordinal index and hostname

A Pet Set is a set of Pets with a specified number of members. Its goal is to decouple clustered stateful applications, such as databases like MySQL and PostgreSQL or clustered systems like Zookeeper, etcd, and Elasticsearch, from the underlying infrastructure. Traditionally such applications are deployed on fixed nodes with persistent storage and static IP addresses, and the nodes have to be wired together in a particular topology during deployment. Pet Set instead assigns an identity to each application instance, so instances no longer have to be pinned to particular physical infrastructure and connect to each other by identity.

Original link: Thousand Instances of Cassandra using Kubernetes Pet Set (translated by Xiao Yuanhao)
