terça-feira, 21 de março de 2017

Apache Mesos, Overview and Architecture

Apache Mesos is a cluster manager, or distributed kernel system and use the same principle than linux kernel.

It abstract CPU, memory, storage and other physical and virtual resources, like fault tolerance and elastic distribution.

The Mesos kernel run in all machines providing applications ( Hadoop, Spark, Kafka, Elasticsearch) with APIs to manage resource and scheduling to datacenter or cloud.

It has fault tolerance of master and agents using zookeeper.

Native suport container with Docker and others images AppC(Organisation for the App Container specification, including the schema and associated tooling)

Support isolation of CPU, memory, disk, ports, GPU.
HTTP APIs to develop new distributed applications, to operate the cluster and for monitoring.


Mesos consists of a master daemon that manages agent daemons running on each cluster node and mesos frameworks that perform tasks on those agents.

The master allows the sharing of resources (CPU, RAM, ...) in structures and decides how many resources to offer each structure according to a given organizational policy, such as fair sharing or strict priority.

To support a diverse set of policies, the master employs a modular architecture that facilitates the addition of new allocation modules through a plug-in mechanism. 

A framework running on top of the Mesos consists of two components:
  •  a scheduler that registers with the master to be offered as a resource
  •  an executor process that is launched on agent nodes to perform the structure tasks.

The master determines how much feature is made available.
The scheduler determines which resource will be available

The figure below shows an example of how a structure is scheduled to perform a task. 

  1. Agent 1 reports to the master that it has 4 CPUs and 4GB of free memory. The master then invokes the allocation policy module and talks to the framework 1 that can offer its resources because they are free.
  2. The scheduler framework warns the master that it has two tasks to run in the agent and needs 2CPU and 1GB of memory and in the other task it needs 1 CPU and 2 GB of Memory.

Scheduling algorithm (Multilevel queue scheduling)

This algorithm can be used in situations where processes are divided into different groups.
Example: the division between foreground processes and background processes.
These two types of processes have different response times and requirements so you can have a different scheduling.
It is very useful for shared memory problems.


Mesos provides mechanisms to reserve resources in specific Slaves.
Two types of Reservation:
  • Static Reservation
  • Dinamic Reservation (Default) 


  • Isolate a task from other running tasks.
  • Container tasks for running in resource-limited time environment.
  • Control individual task resources (eg CPU, memory) programmatically.
  • Run the software on a pre-packaged file system image, allowing it to run in different environments.

Types of containerizers

Mesos manages to work with different types of container besides Docker, but by default Less uses its own container
Container Type supported:
  • Composing
  • Docker
  • Mesos Composing containerizer
Is the possibility of working with Docker and Mesos Container at the same time.
You can launch an image of Docker as a Task, or as an Executor.

Mesos  container 

This container allows tasks to be performed by an isolated container array provided by the Mesos

Allows mesos to control Tasks at runtime without relying on other containers.
You can have control of OS operations like cgroups / namespace
Promises to have the latest container technologies
Enables control of Disk Usage Limit
Insulation can be customized by task
High-Availability Mode

If the Master becomes unavailable, existing tasks will continue to run, but new features can not be. Allocated and new tasks can not be launched.
To reduce the possibility of this occurring, Mesos uses multiples, one active and several backups in case of failure.
Whoever coordinates the election of the new master is the Zookeper.

Mesos also use Apache Zookeeper, part of Hadoop, to synchronize distributed processes to ensure all clients receive consistent data and assure fault tolerance.

Nodes Discovery -> Is done by Zookeeper

When a network partition occurs and disconects a component (master, agent, or schedule) from the ZooKeeper, the master detects and induces a timeout.

Observability Metrics

The information reported by the mesos includes details about availability of resources, use of resources, registered frameworks,
Active agents and tasks state.
It is possible to create automated alerts and put different metrics in a dashboard.

Mesos provides two types of metrics:

Counter -> Accompanying the growth and the reduction of events

Gauges -> Represents some values of instantant magnitude

When you start a task, you can create a volume that exists outside the BOX of the task and persist even after the task is executed or completed.
Persistent Volumes

Mesos provides a mechanism to create a persistent volume of disk resources.
When the task finishes, its capabilities - including the persistent volume - can be offered back to the structure so that the structure can start the same task again, start a recovery task, or start a new task that consumes the previous task output as Your entry.
Persistent volumes allow services such as HDFS and Cassandra to store their data within the Mesos. 

The Mesos Replicated Log

Mesos provides a library that allows you to create fault-tolerant replicated logs;
This library is known as the replicated log.
The Mesos master uses this library to store cluster state in a replicated and durable way;
The library is also available for use by frameworks to store the replicated structure state or to implement the common pattern of "replicated state machine".
Replicated Log is often used to allow applications to manage the replicated state in a strong consistency. 

Mesos  Frameworks:

  • Vamp is a deployment and workflow tool for container orchestration systems, including Mesos/Marathon. It brings canary releasing, A/B testing, auto scaling and self healing through a web UI, CLI and REST API.
  • Aurora is a service scheduler that runs on top of Mesos, enabling you to run long-running services that take advantage of Mesos' scalability, fault-tolerance, and resource isolation.
  • Marathon is a private PaaS built on Mesos. It automatically handles hardware or software failures and ensures that an app is “always on”.
  • Spark is a fast and general-purpose cluster computing system which makes parallel jobs easy to write.
  • Chronos is a distributed job scheduler that supports complex job topologies. It can be used as a more fault-tolerant replacement for Cron.

Mesos offers many of the features that you would expect from a cluster manager, such as:

  • Scalability to over 10,000 nodes
  • Resource isolation for tasks through Linux Containers
  • Efficient CPU and memory-aware resource scheduling
  • Highly-available master through Apache ZooKeeper
  • Web UI for monitoring cluster state