5. Deep Dive and Conclusion

This section explores the underlying mechanics of workload partitioning, examining how OpenShift orchestrates isolation across the Kubelet, Container Runtime (CRI-O), and the Linux Kernel.

Behind the Scenes: The Orchestration of Isolation

The enforcement of CPU isolation is achieved through a multi-layered configuration managed by the Performance Addon Operator (integrated into the Node Tuning Operator).

Kubelet Configuration

The PerformanceProfile generates a specialized KubeletConfig that defines how the Kubernetes agent on each node handles CPU resources.

  1. Inspect the generated KubeletConfig:

    # View the Kubelet configuration generated by the PerformanceProfile
    oc get kubeletconfig performance-openshift-node-performance-profile -o yaml

The most critical parameter is reservedSystemCPUs. In our 32-core environment, this is set to 0-15.

  2. Key Snippet from KubeletConfig:

    ...
    spec:
      kubeletConfig:
        ...
        cpuManagerPolicy: static
        reservedSystemCPUs: 0-15
        topologyManagerPolicy: restricted
    ...
    • cpuManagerPolicy: static: Enables exclusive CPU allocation for Guaranteed QoS Pods that request whole (integer) CPUs.

    • reservedSystemCPUs: 0-15: Excludes these cores from the pool the Kubelet advertises as allocatable to user Pods, keeping them reserved for system and platform processes.

    • topologyManagerPolicy: restricted: Aligns CPU and device allocations with NUMA nodes to minimize cross-NUMA latency; Pods whose resources cannot be NUMA-aligned are rejected at admission.
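
The effect of the static CPU manager can be exercised with a Guaranteed QoS Pod. The manifest below is an illustrative sketch (the Pod name and image are placeholders): a container whose integer CPU request equals its limit receives exclusive cores from the isolated pool.

```yaml
# Sketch of a Guaranteed QoS Pod (name and image are placeholders).
# With cpuManagerPolicy: static, the integer CPU request below is
# satisfied with exclusive cores taken from the isolated set (16-31).
apiVersion: v1
kind: Pod
metadata:
  name: pinned-workload
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest
    resources:
      requests:
        cpu: "2"        # whole CPUs -> eligible for exclusive pinning
        memory: "1Gi"
      limits:
        cpu: "2"        # must equal requests for Guaranteed QoS
        memory: "1Gi"
```

Once the Pod is running, the assigned cpuset can be inspected from inside the container (cat /sys/fs/cgroup/cpuset.cpus) and should show two cores from the 16-31 range.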

CRI-O Configuration: Workload Pinning

While the Kubelet decides which CPUs a Pod may use, CRI-O handles the actual container execution. For workload partitioning, a configuration file is created to pin infrastructure-level workloads (such as the control plane pods) to the reserved cores.

  1. Check the CRI-O workload pinning configuration on the node:

    # Access the node
    NODE_NAME=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
    oc debug node/$NODE_NAME
    # Inspect the CRI-O pinning configuration
    cat /host/etc/crio/crio.conf.d/99-workload-pinning.conf

This file ensures that any Pod carrying the target.workload.openshift.io/management annotation is confined to the reserved cpuset, with its CPU requests accounted against the management.workload.openshift.io/cores extended resource.

  2. Content of 99-workload-pinning.conf:

    [crio.runtime.workloads.management]
    activation_annotation = "target.workload.openshift.io/management"
    annotation_prefix = "resources.workload.openshift.io"
    resources = { "cpushares" = 0, "cpuset" = "0-15" }
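
A Pod opts into this management partitioning via an annotation rather than its resource requests. The sketch below (names are placeholders) shows the activation annotation from the CRI-O configuration above; in practice OpenShift adds it automatically to platform Pods in namespaces annotated for workload management.

```yaml
# Sketch of a management Pod (name and image are placeholders).
# The annotation matches activation_annotation in the CRI-O config,
# so CRI-O confines the Pod to the reserved cpuset (0-15).
apiVersion: v1
kind: Pod
metadata:
  name: infra-agent
  annotations:
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
spec:
  containers:
  - name: agent
    image: registry.example.com/agent:latest
```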

Kernel-Level Isolation: Boot Parameters

The most fundamental layer of isolation happens at the kernel level. The Performance Addon Operator injects specific kernel arguments via MachineConfig.

  1. Verify kernel boot parameters:

    # Display the kernel command line arguments
    cat /proc/cmdline
  2. Analysis of Key Boot Parameters:

    ... rcu_nocbs=16-31 ... systemd.cpu_affinity=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 ... isolcpus=managed_irq,16-31 ...
    • isolcpus=managed_irq,16-31: This is the "hard" isolation. The Linux scheduler will not place general tasks on cores 16-31 unless they are explicitly affined there (e.g., via taskset or the CPU Manager), and the managed_irq flag additionally steers device-managed interrupts away from the isolated cores.

    • rcu_nocbs=16-31: Offloads RCU (Read-Copy-Update) callbacks from the isolated cores so they run on the reserved cores instead. This removes a major source of background kernel "jitter," keeping the isolated cores dedicated to the workload.

    • systemd.cpu_affinity=0-15: Forces the systemd init process and all its children (the entire OS service tree) to run only on the reserved cores; the generated argument enumerates the cores individually, as shown above.

  3. Exit the debug session and return to the bastion host:

    exit
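
The same check can be scripted. The snippet below is a minimal sketch that extracts the isolated-core range from a kernel command line; CMDLINE here is a sample string standing in for the contents of /proc/cmdline.

```shell
#!/bin/sh
# Sample kernel command line; on a node use: CMDLINE=$(cat /proc/cmdline)
CMDLINE='rcu_nocbs=16-31 systemd.cpu_affinity=0,1,2,3 isolcpus=managed_irq,16-31'

# Extract the isolcpus argument, then strip the managed_irq flag
ISOLCPUS=$(echo "$CMDLINE" | grep -o 'isolcpus=[^ ]*')   # isolcpus=managed_irq,16-31
ISOLATED=${ISOLCPUS#isolcpus=managed_irq,}               # 16-31

echo "Isolated cores: $ISOLATED"
```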

Node Status: Capacity vs. Allocatable

The result of this configuration is clearly visible in the node’s capacity metrics.

  1. Inspect the node’s resource status:

    NODE_NAME=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
    oc describe node $NODE_NAME

After Workload Partitioning

Notice the discrepancy between Capacity and Allocatable. Although the node has 32 CPUs, only 16 are "Allocatable" to standard Kubernetes workloads.

...
Capacity:
  cpu:                                     32
  management.workload.openshift.io/cores:  32k
  memory:                                  65814140Ki
  pods:                                    250
  ...
Allocatable:
  cpu:                                     16
  management.workload.openshift.io/cores:  32k
  memory:                                  64687740Ki
  pods:                                    250
  ...
...
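
The arithmetic behind these numbers is direct: Allocatable CPU is Capacity minus the size of the reserved range. A minimal sketch, with the values from this lab hard-coded:

```shell
#!/bin/sh
# Values from this environment: 32 physical cores, reservedSystemCPUs: 0-15
CAPACITY=32
RESERVED="0-15"

START=${RESERVED%-*}                    # 0
END=${RESERVED#*-}                      # 15
RESERVED_COUNT=$((END - START + 1))     # 16 reserved cores

ALLOCATABLE=$((CAPACITY - RESERVED_COUNT))
echo "Allocatable CPUs: $ALLOCATABLE"
```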

Before Workload Partitioning (Comparison)

For comparison, a standard OpenShift node (without partitioning) would show an Allocatable CPU count nearly equal to its Capacity (e.g., 31.5 cores for a 32-core system), as only a minimal amount is reserved for system overhead.

...
Capacity:
  cpu:                                     32
  ...
Allocatable:
  cpu:                                     31500m
  ...
...

Conclusion

Workload Partitioning transforms OpenShift into a highly deterministic platform suitable for the most demanding edge and telecom environments. By leveraging kernel-level isolation and runtime-level pinning, it provides:

  1. Guaranteed System Stability: The control plane always has its own dedicated hardware resources.

  2. Ultra-Low Jitter: Isolated cores are shielded from kernel housekeeping tasks.

  3. Strict Resource Boundaries: User workloads cannot impact system performance, even under extreme stress.

This concludes the lab on Workload Partitioning. You have successfully configured a cluster to behave as a high-performance, partitioned system and verified its behavior with Pods and Virtual Machines.