How does Uber use Mesos? Answer: and Cassandra with the use

After the programmer finished the cake, let us start a new round of learning it!
Today's decimal for everyone to bring a Uber overseas engineer speech video interpretation, let us listen to pure English (fog) while watching PPT side to understand Uber's container technology world it 🙂

If you are a Uber company, you need to store drivers and passengers every 30 seconds to send the location information, a large number of real-time data need to be used in real time, how do you do?

Uber's solution is comprehensive. They set up a system to run Cassandra on Mesos . In a lecture Uber's software engineer explained the system very well ( … , the decimal representation is very pure in Indian English).

Today's developers always have too many tough decisions to do. Should we all be put into the cloud? Which one cloud? Expensive? Will manufacturers lock? Should we have two ways to try together and then be a hybrid architecture? Because we are concerned about the platform gross margin of less than 50%, whether we should all self-

Uber decided to build their own systems, or that they were going to combine the two very talented open source components. Let Cassandra work with Mesos, the way Uber chooses.

Uber made this decision is not difficult. They are well funded, have top talent and resources to develop, maintain and upgrade this complex system.

Since Uber's goal was determined to make 99.99% available for each person and every place of transportation, it would make sense to expand the scale while trying to control overhead.

Money is usually a good deal. It is often necessary to buy money with money. Taking into account the goal of Uber reliability, only 10,000 requests are allowed to fail, and they need to operate multiple data centers. Cassandra proved to be able to handle a lot of load and work in the data center, so the data center to help Uber made this decision.

If you want the transport system to reach every person and every place, then you need to use your resources efficiently. This is the idea behind using a data center operating system such as Mesos. By counting the same machine on the reuse service, you can lose 30% of the machine, saving a lot of money. Mesos was chosen because it was the only tool in the production environment that could be used to manage tens of thousands of clusters, which was what Uber needed, and Uber's system was really large.

What are some more interesting discoveries?

  • You can run a stateful service in a container. Uber found almost no difference, compared to running Cassandra on bare metal and running Cassandra in a container managed by Mesos, probably only 5-10% difference.
  • Performance is excellent: read delay mean: 13ms; write delay: 25ms; P99s looks good.
  • They can support the largest cluster every second with more than a million times to write and a hundred or so read.
  • Agile is more important than performance. Under this architecture Uber is agile: it is easy to create and run workloads on the cluster.


  • Static partitioning machines across different services
  • 50 machines are used for APIs, 50 for storage, and they do not overlap.

just now

  • Everything runs on Mesos, including stateful services such as Cassandra and Kafka.
  • Mesos is a data center operating system that allows your data center to become a single resource pool.
  • At the time Mesos was the only tool that could manage tens of thousands of machines, and perhaps there were other options.
  • Uber built their own Sharded database on MySQL, named Schenmaless. Cassandra and Schenmaless will be Uber's two data storage options. The existing Riak device will be moved to Cassandra.
  • A separate machine can run different types of services.
  • Static reuse services on the same machine can result in a 30% reduction in machine usage. This is an experimental discovery from the Google Borg system.
  • For example, a service that uses a lot of CPUs and a service that uses a lot of storage or memory can match well, and the two services can run efficiently on the same server, and machine utilization is improved.
  • Uber now has 20 Cassandra machines, plans to increase to 100 in the future.
  • Agility is more important than performance. You need to have the ability to manage these clusters and perform different operations on top of them in a smooth way.
  • Why is Cassandra running in a container rather than on the whole machine?
  • You want to store thousands of gigabytes of data, but you want it to copy or even across multiple data centers on multiple machines.
  • You also want to achieve resource isolation in different clusters, performance isolation.
  • It's hard to do this in a shared cluster. For example, if you create a 1000-node Cassandra cluster, it can not be large-scale, or there will be performance interference between different clusters.

Production Environment

  • There are about 20 cluster replicas between the two data centers (East Coast and West Coast).
  • There were initially four clusters, including China. But after the drop and drop, these clusters were closed.
  • There are about 300 machines in two data centers.
  • The two largest clusters: more than one million times per second and one hundred thousand times to write.
  • One of the clusters is used to store location information from the driver and passenger app every 30 seconds.
  • Average read delay: 13ms; average write delay: 25ms
  • Most use the consistency level of LOCAL_QUORUM (ie, strong consistency).


  • Mesos abstracts CPU, memory, and storage from the machine.
  • What you see is no longer a separate machine, the programming object is an entire resource pool.
  • Linear expansion. Can run thousands of machines.
  • High availability. Zookeeper is used to select leader in a configurable number of copies.
  • Dockers containers or Mesos containers can be used on containers.
  • Pluggable resource isolation. For example, Linux can use Cgroups memory and CPU isolator. There is a Posix isolator. There are different isolation mechanisms for different systems.
  • Secondary scheduling. Resources from the Mesos agent are provided to different frameworks. The Framework dispatches their tasks on top of it.

Apache Cassandra

  • Cassandra is very suitable for Uber's use cases.
  • Horizontal expansion. Read and write scale increases linearly with nodes
  • Highly available. Fault tolerance has an adjustable level of consistency.
  • Low latency. Maintain millisecond delay in the same data center.
  • easy to use. It is an isomorphic cluster. No master There are no special nodes in the cluster.
  • A variety of data models. It has column, compositekey, counter, secondary index and other models.
  • And other open source software has a very good integration. Cassandra and Hadoop, Spark, Hive are connected.



  • Uber and Mesosphere to build a mesosphere / dcos-cassandra-service – an automated service can be easily deployed and managed.
  • At the top is WebInterface or the ControlPlane API. You just need to specify how many nodes, how many CPUs are needed, specify the Cassandra configuration, and then submit to the Control Plane API.
  • In Uber use the deployment system, began to run in the state of the Aurora running the above, you can start from the dcos-cassandra-service framework.
  • In the example, the dcos-cassandra-serviceframework has two clusters and the Mesos master dialogue. Uber uses five Mesos master in their system. Zookeeper used for leader election.
  • Zookeeper is also used to store frame metadata: which task is running, Cassandra configuration, cluster health, and so on.
  • The Mesos agent runs on each machine in the cluster. The Agent provides resources for the Mesos master, and the master distributes them discretely. Distribution can be accepted by the framework can also be rejected. Multiple Cassandra nodes can also run on the same machine.
  • Use the Mesos Container instead of the Docker.
  • In the configuration override 5 ports (storage_port, ssl_storage_port, native_transport_port, rpcs_port, jmx_port), so multiple containers can run on the same machine.
  • Used persistent volume, so the data is stored outside the sandbox directory. If Cassandra hangs, the data is still available in the persistent volume, and after the reboot, the same task can be provided.
  • Dynamic reservations are used to ensure that the task is restarted after the resource is available.
  • Cassandra service operation
  • Cassandra has a seed node concept, when the new node joins the cluster from the gossip process. Create a custom seed provider to start the Cassandra node so that the Cassandra node can automatically roll out the Mesos cluster.
  • The number of nodes in the Cassandra cluster can be increased using a REST request. It will start additional nodes, give it seed nodes, and since the opening of the additional Cassandra daemons.
  • All Cassandra configuration parameters can be changed.
  • Using the API, a hanging node can be replaced.
  • Synchronization data between replicas is needed to be fixed. The approximate range of repairs is based on a single node. It does not affect performance.
  • There is no need to clean up the removal of data. If the node is added, the data is moved to the new node, and the cleanup is used to delete the moved data.
  • Multiple data center replication is configured through the framework.
  • Multi-data center support
  • Set up Mesos standalone installation in each data center.
  • Set up a single instance of the Framework in each data center.
  • Framework talk to each other and exchange seed on a regular basis.
  • These are what Cassandra needs. By opening the seed from other data centers, the nodes can gossip the topology and indicate what these nodes are.
  • The ping delay between data centers is 77.8ms.
  • P50 asynchronous replication delay: 44.69ms; P95: 46.38ms; P99: 47.44 ms.
  • Scheduled execution
  • Scheduling execution is abstracted as a plan, a phase, and a block. A scheduling program has a different stage, a stage and a number of blocks.
  • In the first phase, when a reconciliation occurs in progress, it will go to Mesos and indicate which is running.
  • There is a deployment phase to check if the number of nodes in the configuration already exists in the cluster and deploy them if necessary.
  • A block is equivalent to a Cassandra node specification.
  • There are other stages: backup, recovery, cleanup, and fix, depending on which REST endpoint touches.
  • The cluster can be started at the speed of a new node per minute.
  • Each node starts up to 30 seconds.
  • Cassandra can not start multiple nodes at the same time.
  • Usually to each Mesos node 2TB of hard disk space and 128GB of memory. Each container is assigned 100GB, 32GB to each Cassandra process (the data is not entirely accurate).
  • The G1garbage collector is used to replace the CMS without any tuning in the case where it has better delay and performance.

Article Source: High Scalability Copyright owned by the original author
Http:// … .html

    Heads up! This alert needs your attention, but it's not super important.