Regions, AZs, and Cells.
Estimated reading time: 13 minutes.
- Objective
-
Describe how to structure a large OpenStack cluster to address the performance and reliability requirements of applications, and how Operators configure workloads for the topology of a cluster.
Logical and Physical Organization of OpenStack Clusters
In the previous section, we introduced how OpenStack enables users to organize their teams and applications using domains and projects. They provide logical groups that address the organizational structure of your teams and applications. At the end of the section, we introduced quotas, which link that logical structure with the physical characteristics of your cluster, such as compute capacity.
OpenStack operators need some understanding of the physical topology of their OpenStack clusters to ensure their applications run in accordance with their performance and reliability requirements. Sometimes applications need special classes of hardware, such as GPUs; sometimes it is about high-availability, ensuring applications stay operational and their end-users do not notice when a compute server fails.
For simplicity, the figures in this chapter do not list all user-facing OpenStack services.
To address those requirements, OpenStack offers many ways of grouping compute capacity. Most of them focus on the compute nodes, and some focus on control plane services:
- Regions
-
A region is a mostly stand-alone OpenStack cluster which manages its own compute resources. Multiple clusters, in different regions, share a single pair of Keystone and Horizon services, giving Operators a unified view of those clusters as a single private cloud.
- Availability Zones (AZs)
-
An availability zone typically represents a failure domain inside an OpenStack cluster. It can be defined by a server rack, an uninterruptible power supply, a network switch, a data center building, or any other physical constraint that creates the possibility of all compute resources in the AZ becoming unavailable at the same time.
- Host Aggregates
-
A host aggregate is a group of compute nodes which share common characteristics that make them more desirable to certain workloads when compared to other groups of nodes of the same cluster. For example, because of the availability of fast local storage, GPUs, or network accelerators.
- Compute Cells
-
A compute cell is a partition internal to an OpenStack control plane. It deals with the scalability and availability of the OpenStack services themselves more than with compute nodes and application workloads. OpenStack services in the same cell share common back-end services, such as databases and message queues, which can be scaled and managed per cell, independently of AZs.
OpenStack Operators usually have direct visibility into regions and AZs only, not into host aggregates and cells. For certain resource types, they can choose the region and AZ.
OpenStack Operators are affected by host aggregates and cells only indirectly, through the definition of server flavors, which influence the scheduling of server instances on cluster nodes.
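For illustration, here is a minimal sketch using the openstacksdk Python library; the cloud entry "mycloud" and the region name "region-a" are placeholders for your own configuration. It shows the two things an Operator can address directly: the region, chosen when connecting, and the AZs, which can be listed and then targeted when creating resources.

```python
import openstack

# The region is chosen when connecting; the same credentials work in every
# region because Keystone is shared across regions.
conn = openstack.connect(cloud="mycloud", region_name="region-a")

# AZs are directly visible to Operators; host aggregates and cells are not.
for az in conn.compute.availability_zones():
    print(az.name, az.state)
```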
Regions, AZs, host aggregates, and cells can be used in different ways to structure large private clouds, and the usage patterns vary considerably depending on the size and number of data centers, among other factors. OpenStack is not prescriptive about how and when to use these features; this course shows examples of typical and recommended approaches.
Regions and AZs
Regions and Availability Zones (AZs) may be familiar to users of public cloud services, and the concepts map to OpenStack clusters in a similar way. You do not need to run a global private cloud to benefit from them: regions and AZs can be useful even for smaller OpenStack deployments.
OpenStack Regions
A region is essentially a set of OpenStack API entry points. You can run as many or as few OpenStack services in each region as you wish, but the common scenario is running all services in every region, except for a few shared services.
Your region does not need to represent a country or a large geographic area, as public cloud providers' regions do. It could be a small geographic area, such as a city, or even a data center building inside a large campus. What matters is your networking and storage infrastructure, and your need to keep a region operating, as an OpenStack cluster, independently of other regions in case of disasters such as earthquakes and floods.
It is assumed that OpenStack services in the same region work together; for example, Nova can create server instances using volumes from Cinder and networks from Neutron in the same region. But services from different regions would not use each other because of network latency and bandwidth constraints. Other IT infrastructure, such as storage, may not work across regions, or may work only under significant performance and reliability constraints.
It is also assumed that OpenStack services in a region are standalone and work without knowledge of services in other regions, except for Keystone. API resource names, such as names of server instances and availability zones, can therefore repeat across regions without creating a conflict. But, because all regions are connected to a shared Keystone, they all share the same users, domains, and projects.
One of your regions runs a shared Keystone service that enables authenticating to, and discovering, OpenStack services in all regions, so the set of regions effectively looks like a single private cloud. If you use Horizon, it can likewise be shared; there is no need to run it in multiple regions. Of course, there needs to be a special disaster recovery process for that region and its shared services.
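The following sketch, again with openstacksdk and a placeholder cloud entry "mycloud", illustrates how the shared Keystone catalog ties the regions together; listing the full catalog may require an administrative role, depending on your Keystone policies.

```python
import openstack

conn = openstack.connect(cloud="mycloud")

# The shared Keystone knows every region of the private cloud...
for region in conn.identity.regions():
    print("region:", region.id)

# ...and its catalog records, per region, the endpoints of each service.
# This is how clients discover the Nova, Neutron, Cinder, etc. of a region.
services = {s.id: s.type for s in conn.identity.services()}
for endpoint in conn.identity.endpoints():
    print(endpoint.region_id, services.get(endpoint.service_id), endpoint.url)
```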
OpenStack AZs
An availability zone is a partition of an OpenStack cluster that includes a subset of the compute nodes of a region; that is, it is a partition of the cluster's data plane. All AZs in a region share the same OpenStack control plane, and thus the same OpenStack services.
Availability zones typically represent a failure domain of a cluster/region: something like a power supply, a horizontal switch fabric, or even the cooling of a data center building, whose failure effectively means the loss or unavailability of all compute nodes in the AZ.
Internally, OpenStack implements an AZ as a host aggregate that is exposed to users. A compute node can be a member of only one AZ, but it can be a member of multiple other, non-exposed host aggregates.
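You can see this relationship with an administrative account: a host aggregate that carries an availability zone name is the aggregate users perceive as that AZ. A sketch with openstacksdk follows; the cloud entry "mycloud-admin" is a placeholder for administrative credentials.

```python
import openstack

conn = openstack.connect(cloud="mycloud-admin")

# Listing aggregates is an admin-only operation. Aggregates that carry an
# availability_zone are the ones users see as AZs; the others stay internal
# to the scheduler.
for aggregate in conn.compute.aggregates():
    if aggregate.availability_zone:
        print("AZ", aggregate.availability_zone, "hosts:", aggregate.hosts)
    else:
        print("internal aggregate", aggregate.name, "hosts:", aggregate.hosts)
```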
Addressability of Regions and AZs
Many OpenStack API resources include attributes that set the region and AZ they belong to, and Operators want to control those attributes. Regions and AZs are therefore the two aspects of a cluster, and of its compute nodes, that are directly visible to OpenStack Operators.
OpenStack Operators usually take AZs into account when planning production workloads: you do not want to run all server instances of the same application (or of the same component of a service-oriented application) in the same AZ. You want to spread those server instances across different AZs, so the loss of a single AZ does not impact the availability of the application for its end-users.
OpenStack can automate the scaling and placement of server instances spread across multiple AZs, but it is up to the application (or its OpenStack Operator) to design such multi-AZ applications, using OpenStack features such as Octavia load balancers, external IT infrastructure such as storage backends that support volume replication, and application services such as sharded databases.
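As a sketch of that Operator-side responsibility, the openstacksdk example below spreads three instances of the same application tier across whatever AZs the region offers; the image, flavor, and network names are placeholders for resources in your project.

```python
import itertools
import openstack

conn = openstack.connect(cloud="mycloud", region_name="region-a")

image = conn.image.find_image("rhel-9")          # placeholder image name
flavor = conn.compute.find_flavor("m1.small")    # placeholder flavor name
network = conn.network.find_network("app-net")   # placeholder network name

azs = [az.name for az in conn.compute.availability_zones()]

# Round-robin the replicas over the AZs so the loss of one AZ never takes
# down every instance of this application component.
for index, az in zip(range(3), itertools.cycle(azs)):
    server = conn.compute.create_server(
        name=f"web-{index}",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
        availability_zone=az,
    )
    conn.compute.wait_for_server(server)
    print(server.name, "placed in", az)
```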
Application workloads typically run inside a single region. Multi-region applications are designed as multiple copies (or instances) of the same application in each region; they rely on routing and replication services, not provided by OpenStack, to copy data between regions and to direct users to the application instance in the closest region.
Host Aggregates
A host aggregate is a way to group compute nodes based on arbitrary attributes, so OpenStack Administrators can create sets of compute nodes based on hardware characteristics and other criteria that impact the cost and performance of running different classes of workloads on those physical servers.
Physical machines are not born equal: some are designed for highly multi-threaded applications, while others are designed for I/O-intensive applications. Physical machines come not only with different kinds of CPUs, main memory, and caches, but also with different hardware accelerators and possibly multiple I/O buses. Running an application on whatever compute node happens to be available may be inefficient and expensive.
Most organizations have complex workloads with components that require different classes of physical servers for optimum performance or lower cost. For some, the simplicity of managing an undifferentiated pool of compute resources is good enough; others need to manage a better fit of applications to hardware, and host aggregates enable OpenStack Administrators and Operators to do that.
Compute nodes can belong to multiple host aggregates. It is up to the OpenStack Administrator to set the attributes which control the membership of nodes in different host aggregates. If a host aggregate is exposed as an availability zone, then its compute nodes cannot belong to another host aggregate that is also exposed as an availability zone.
Because a host aggregate is tied to a class of physical machines, the same host aggregate can conceptually span multiple regions and AZs. For spanning AZs, remember that an AZ is a host aggregate and compute nodes can belong to multiple aggregates. For spanning regions, you could define the same attributes and assign the same names to aggregates in different regions for a consistent private cloud.
Nothing forces you to spread them, but it would be unusual to put all machines of a special kind in a single AZ and risk losing all of them when, say, a power supply fails. It is more likely that those machines are spread across different server racks or buildings, thus in multiple AZs, and that you have similar machines in multiple data centers, thus in multiple regions.
Addressability of Host Aggregates
OpenStack Administrators manage host aggregates and their relationship to API resources. OpenStack Operators use host aggregates indirectly, to ensure applications get the class of compute nodes they need or that suits them best. Unlike regions and AZs, to which an OpenStack Operator has direct visibility, in the sense of "create this server instance in region-A and AZ-1", you cannot declare "create this server instance in host-aggregate-A". You must rely on an indirect link, outlined in the steps below and illustrated by the sketch that follows them:
-
An Administrator configures a host aggregate, for example "gpu", and sets its attributes and its membership so that it groups the hosts which do include GPU hardware;
-
An Administrator configures one or more server flavors, for example "ml.small" and "ml.large", and sets attributes (extra specs) that link these server flavors to the "gpu" host aggregate;
-
An Operator creates a server instance and specifies either the "ml.small" or "ml.large" server flavor for the instance.
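The sketch below walks through those three steps with openstacksdk. Everything named here is illustrative: the "gpu" aggregate and its metadata key, the "ml.small" flavor, the host name, and the cloud entries. It also assumes your installed openstacksdk exposes the aggregate and flavor extra-spec helpers used, and that the Nova scheduler has the AggregateInstanceExtraSpecsFilter enabled.

```python
import openstack

admin = openstack.connect(cloud="mycloud-admin")   # administrative credentials
operator = openstack.connect(cloud="mycloud")      # regular project user

# Step 1 (Administrator): create the "gpu" aggregate, tag it, add GPU hosts.
aggregate = admin.compute.create_aggregate(name="gpu")
admin.compute.set_aggregate_metadata(aggregate, {"gpu": "true"})
admin.compute.add_host_to_aggregate(aggregate, "compute-gpu-01")  # example host

# Step 2 (Administrator): create a flavor whose extra specs point at the
# aggregate metadata, so the scheduler only considers hosts from "gpu".
flavor = admin.compute.create_flavor(name="ml.small", vcpus=4, ram=16384, disk=40)
admin.compute.create_flavor_extra_specs(
    flavor, {"aggregate_instance_extra_specs:gpu": "true"})

# Step 3 (Operator): request the flavor; the instance lands on a node from
# the "gpu" aggregate without the Operator ever naming the aggregate.
image = operator.image.find_image("rhel-9")         # placeholder image
network = operator.network.find_network("app-net")  # placeholder network
server = operator.compute.create_server(
    name="trainer-1",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
```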
Compute Cells
Compute cells are even less visible to OpenStack Operators than host aggregates. They relate to the reliability and scalability of the OpenStack control plane itself rather than to workloads, but each compute node must belong to one and only one cell, which makes the concept somewhat similar to AZs.
The following figures show three typical ways of configuring cells in an OpenStack cluster: first, configuring a 1:1 equivalence of AZs and cells:
Second, creating multiple cells inside the same AZ. There could be more AZs beyond the one displayed in the figure.
Third, creating AZs inside the same cell. There could be more cells beyond the one displayed in the figure.
The point is, cells and AZs are independent concepts. They relate to different aspects of partitioning and scaling OpenStack clusters. For your peace of mind, it is best to avoid designs where the mapping between cells and AZs is neither 1:1 nor hierarchical. In any case, each compute node must belong to a single AZ and to a single cell.
The Administration learning journey will provide more information about cells and the internal services which run in an OpenStack control plane. Just as a curiosity, every OpenStack cluster has at least two cells:
- Cell0
-
It stores global information for the region, and it is needed because some API resources may not relate to a compute node at all, making it impossible to determine in which cell database to store them. For example, a server instance may never be scheduled to any compute node because none of them had sufficient capacity for it; you still need the API resource for that server instance, to retrieve its failed status and the cause.
- Cell1
-
Includes all compute nodes in the initial cluster, until an Administrator decides to configure more cells and add compute nodes to them. Its database includes all API resources which relate to the compute nodes in the cell, for example all server instances running on those nodes.
OpenStack compute nodes connect directly to the cell services (the database and message queue), as do many other components of an OpenStack control plane. In this course we do not explore the internal structure of individual OpenStack services; for now it is sufficient to know that the internal components of each service interact with each other using the cell database and cell message queues.
As you can see, OpenStack enables managing large pools of compute resources, but this requires planning and effort from Administrators.