5. Deep Dive and Conclusion
This section explores the underlying mechanics of workload partitioning, examining how OpenShift orchestrates isolation across the Kubelet, the container runtime (CRI-O), and the Linux kernel.
Behind the Scenes: The Orchestration of Isolation
The enforcement of CPU isolation is achieved through a multi-layered configuration managed by the Performance Addon Operator (integrated into the Node Tuning Operator).
Kubelet Configuration
The PerformanceProfile generates a specialized KubeletConfig that defines how the Kubernetes agent on each node handles CPU resources.
- Inspect the generated KubeletConfig:

  ```bash
  # View the Kubelet configuration generated by the PerformanceProfile
  oc get kubeletconfig performance-openshift-node-performance-profile -o yaml
  ```
The most critical parameter is `reservedSystemCPUs`. In our 32-core environment, this is set to `0-15`.
- Key snippet from the KubeletConfig:

  ```yaml
  spec:
    kubeletConfig:
      ...
      cpuManagerPolicy: static
      reservedSystemCPUs: 0-15
      topologyManagerPolicy: restricted
      ...
  ```

  - `cpuManagerPolicy: static`: Enables the allocation of exclusive CPUs for Guaranteed Pods.
  - `reservedSystemCPUs: 0-15`: Explicitly tells the Kubelet to ignore these cores when calculating allocatable resources for user Pods.
  - `topologyManagerPolicy: restricted`: Ensures that CPU and device allocations are aligned with NUMA nodes to minimize cross-NUMA latency.
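To make the effect of `cpuManagerPolicy: static` concrete, the sketch below shows the kind of Pod that would receive exclusive cores: because its integer CPU requests equal its limits, it lands in the Guaranteed QoS class, and the CPU Manager pins its container to dedicated CPUs from the isolated set. The Pod name and image are placeholders, not part of the lab.

```yaml
# Sketch only: a Guaranteed QoS Pod eligible for exclusive CPU pinning
# (name and image are hypothetical placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: pinned-workload
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: "2"        # integer request...
        memory: 512Mi
      limits:
        cpu: "2"        # ...equal to the limit => Guaranteed QoS, exclusive cores
        memory: 512Mi
```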
CRI-O Configuration: Workload Pinning
While the Kubelet decides which CPUs a Pod may use, CRI-O handles the actual container execution. For workload partitioning, a configuration file is created to pin infrastructure-level workloads (like the control plane pods) to the reserved cores.
- Check the CRI-O workload pinning configuration on the node:

  ```bash
  # Access the node
  NODE_NAME=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
  oc debug node/$NODE_NAME

  # Inspect the CRI-O pinning configuration
  cat /host/etc/crio/crio.conf.d/99-workload-pinning.conf
  ```
This file ensures that any Pod carrying the `target.workload.openshift.io/management` annotation is restricted to the reserved cpuset, with its CPU usage accounted against the `management.workload.openshift.io/cores` extended resource.
- Content of 99-workload-pinning.conf:

  ```toml
  [crio.runtime.workloads.management]
  activation_annotation = "target.workload.openshift.io/management"
  annotation_prefix = "resources.workload.openshift.io"
  resources = { "cpushares" = 0, "cpuset" = "0-15" }
  ```
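For context, the sketch below shows how an infrastructure Pod typically opts into this partition: its namespace must permit management workloads, and the Pod carries the activation annotation from the CRI-O configuration above. The namespace, Pod name, and image are hypothetical, and the annotation value follows the convention used by OpenShift's management partition; treat it as illustrative rather than authoritative.

```yaml
# Sketch: opting a Pod into the management partition
# (namespace/pod names and image are hypothetical)
apiVersion: v1
kind: Namespace
metadata:
  name: example-infra
  annotations:
    workload.openshift.io/allowed: management   # namespace must allow management workloads
---
apiVersion: v1
kind: Pod
metadata:
  name: example-infra-pod
  namespace: example-infra
  annotations:
    # Matches activation_annotation in 99-workload-pinning.conf
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi-minimal
    command: ["sleep", "infinity"]
```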
Kernel-Level Isolation: Boot Parameters
The most fundamental layer of isolation happens at the kernel level. The Performance Addon Operator injects specific kernel arguments via MachineConfig.
- Verify kernel boot parameters:

  ```bash
  # Display the kernel command line arguments
  cat /proc/cmdline
  ```

- Analysis of Key Boot Parameters:

  ```text
  ... rcu_nocbs=16-31 ... systemd.cpu_affinity=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 ... isolcpus=managed_irq,16-31 ...
  ```

  - `isolcpus=managed_irq,16-31`: This is the "hard" isolation. It tells the Linux scheduler not to run any general tasks on cores 16-31 unless explicitly requested (e.g., via `taskset` or the CPU Manager).
  - `rcu_nocbs=16-31`: Offloads RCU (Read-Copy-Update) callbacks from the isolated cores to the reserved cores. This eliminates background kernel "jitter," ensuring the isolated cores are dedicated entirely to the workload.
  - `systemd.cpu_affinity=0-15`: Forces the `systemd` init process and all its children (the entire OS service tree) to run only on the reserved cores.
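While still in the debug shell, two quick checks can confirm that the kernel actually applied this isolation. This is a sketch; exact output varies by environment, and depending on the debug image you may need to run `chroot /host` first.

```bash
# CPUs isolated via isolcpus= on the kernel command line
cat /sys/devices/system/cpu/isolated   # expect: 16-31

# CPU affinity of the init process (PID 1); should match the reserved set
taskset -pc 1                          # expect an affinity list of 0-15
```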
- Go back to the bastion node:

  ```bash
  exit
  ```
Node Status: Capacity vs. Allocatable
The result of this configuration is clearly visible in the node’s capacity metrics.
- Inspect the node’s resource status:

  ```bash
  NODE_NAME=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
  oc describe node $NODE_NAME
  ```
After Workload Partitioning
Notice the discrepancy between Capacity and Allocatable. Although the node has 32 CPUs, only 16 are "Allocatable" to standard Kubernetes workloads.
```text
...
Capacity:
  cpu:                                     32
  management.workload.openshift.io/cores: 32k
  memory:                                  65814140Ki
  pods:                                    250
...
Allocatable:
  cpu:                                     16
  management.workload.openshift.io/cores: 32k
  memory:                                  64687740Ki
  pods:                                    250
...
```
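To pull out just these two figures without scanning the full output, a jsonpath query along the following lines works (a sketch using the standard `oc get -o jsonpath` support):

```bash
# Print the node's CPU capacity and allocatable CPU side by side
oc get node $NODE_NAME -o jsonpath='capacity: {.status.capacity.cpu}, allocatable: {.status.allocatable.cpu}{"\n"}'
```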
Before Workload Partitioning (Comparison)
For comparison, a standard OpenShift node (without partitioning) would show an Allocatable CPU count nearly equal to its Capacity (e.g., 31.5 cores for a 32-core system), as only a minimal amount is reserved for system overhead.
```text
...
Capacity:
  cpu:  32
...
Allocatable:
  cpu:  31500m
...
```
Conclusion
Workload Partitioning transforms OpenShift into a highly deterministic platform suitable for the most demanding edge and telecom environments. By leveraging kernel-level isolation and runtime-level pinning, it provides:
- Guaranteed System Stability: The control plane always has its own dedicated hardware resources.
- Ultra-Low Jitter: Isolated cores are shielded from kernel housekeeping tasks.
- Strict Resource Boundaries: User workloads cannot impact system performance, even under extreme stress.
This concludes the lab on Workload Partitioning. You have successfully configured a cluster to behave as a high-performance, partitioned system and verified its behavior with Pods and Virtual Machines.