2. Enabling, Configuring, and Verifying Workload Partitioning

Workload partitioning is not just a configuration option; it is a fundamental cluster mode that must be enabled when the cluster is installed. It cannot be turned on later in the cluster’s lifecycle.

Enabling Workload Partitioning During Installation

To enable this feature, you must add the cpuPartitioningMode: AllNodes parameter to your install-config.yaml file before deploying the cluster. This flag signals to the installer and the cluster Operators that the cluster will run in partitioned mode.

Snippet of install-config.yaml
...
cpuPartitioningMode: AllNodes
...

After the cluster is installed with this setting, the infrastructure is prepared for CPU pinning. You then use a PerformanceProfile to define the specific CPU sets for system and user workloads.

In this workshop environment, the cluster was provisioned with cpuPartitioningMode enabled. You will focus on the post-installation configuration and verification.
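Before making changes, you can confirm that the cluster is actually running in partitioned mode. A minimal sketch, assuming the Infrastructure resource reports the mode in a status.cpuPartitioning field (verify the field name against your OpenShift version); a sample value stands in for the live query:

```shell
# On the live cluster (field name assumed; check your OCP version):
#   MODE=$(oc get infrastructure cluster -o jsonpath='{.status.cpuPartitioning}')
# A sample value stands in for the query result here:
MODE="AllNodes"

if [ "$MODE" = "AllNodes" ]; then
  echo "Workload partitioning is enabled"
else
  echo "Workload partitioning is NOT enabled" >&2
fi
```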

Applying the Performance Profile

The PerformanceProfile is the primary mechanism for managing fine-grained hardware tuning in OpenShift. It is managed by the Node Tuning Operator (NTO).

In this demonstration, we are using a node with 32 CPU cores (0-31). We will partition the CPUs as follows:

  • Reserved CPUs (0-15): Dedicated to the operating system (RHCOS) and critical OpenShift system components (Kubelet, CRI-O, etcd, and so on).

  • Isolated CPUs (16-31): Dedicated to user workloads. System processes are pinned to the reserved set, leaving these cores free for application pods.

This 50/50 split is for demonstration purposes to clearly show the isolation in top. In a production environment, you typically reserve a smaller set (e.g., 4-8 cores) for the system and allocate the rest for workloads.
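For comparison, a production-oriented profile on the same 32-core node might reserve only four cores. The cpu stanza below is illustrative only; size the reserved set to your actual system-component footprint:

```yaml
# Illustrative production split: 4 reserved cores, 28 isolated
spec:
  cpu:
    reserved: "0-3"
    isolated: "4-31"
```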

  1. Log in to the control plane node:

    # Get the name of the first control plane node
    NODE_NAME=$(oc get nodes -l node-role.kubernetes.io/master -o jsonpath='{.items[0].metadata.name}')
    
    # Access the node via a debug pod
    oc debug node/$NODE_NAME
  2. Inspect the node’s CPU and NUMA topology:

    # Display CPU architecture information
    lscpu | grep -i numa
    Example Output (for a 32-core system):
    NUMA node(s):          1
    NUMA node0 CPU(s):     0-31
  3. Exit the debug shell and return to the bastion host:

    exit
  4. Create the PerformanceProfile manifest:

    tee $HOME/performance-profile.yaml << 'EOF'
    ---
    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: openshift-node-performance-profile
    spec:
      cpu:
        # Set cores 0-15 for OpenShift system components and the OS
        reserved: "0-15"
        # Set cores 16-31 for user workloads
        isolated: "16-31"
      machineConfigPoolSelector:
        pools.operator.machineconfiguration.openshift.io/master: ""
      nodeSelector:
        node-role.kubernetes.io/master: ''
      numa:
        # "restricted" policy ensures containers are pinned to the same NUMA node for memory and CPU
        topologyPolicy: "restricted"
      realTimeKernel:
        enabled: false
      workloadHints:
        realTime: false
        highPowerConsumption: false
        perPodPowerManagement: false
    EOF
  5. Apply the manifest to the cluster:

    oc apply -f $HOME/performance-profile.yaml

Applying a PerformanceProfile triggers a MachineConfig update. The affected nodes will undergo a rolling reboot to apply kernel arguments (like isolcpus). This process may take several minutes.
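You can monitor the rollout from the bastion host. The sketch below shows the live commands as comments (the oc wait form is a common pattern; the timeout is illustrative) and parses a sample oc get mcp row to show which columns to watch:

```shell
# On the live cluster, watch the master pool directly:
#   oc get mcp master -w
# or block until the rollout finishes:
#   oc wait mcp/master --for=condition=Updated --timeout=30m
# A sample `oc get mcp master` row (NAME, CONFIG, UPDATED, UPDATING,
# DEGRADED) stands in for live output here:
MCP_LINE="master   rendered-master-abc123   True   False   False"

UPDATED=$(echo "$MCP_LINE" | awk '{print $3}')
UPDATING=$(echo "$MCP_LINE" | awk '{print $4}')

if [ "$UPDATED" = "True" ] && [ "$UPDATING" = "False" ]; then
  echo "master pool: update complete"
fi
```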

Verifying System Component CPU Affinity

Once the nodes have rebooted and the cluster is stable, we can verify that the system components are correctly constrained to the Reserved CPU set (0-15). We will use etcd as our primary example.

  1. Re-enter the control plane node:

    NODE_NAME=$(oc get nodes -l node-role.kubernetes.io/master -o jsonpath='{.items[0].metadata.name}')
    
    oc debug node/$NODE_NAME
  2. Check the CPU affinity of etcd processes:

    # Identify PIDs for etcd processes
    ETCD_PIDS=$(ps -ef | grep "etcd " | grep -v grep | awk '{print $2}')
    
    # Check the CPU affinity (cpuset) for each identified PID
    for pid in $ETCD_PIDS; do
        echo "----------------------------------------"
        echo "Checking PID: ${pid}"
    
        # Get the command line of the process
        COMMAND=$(ps -o args= -p "$pid")
        echo "Command: ${COMMAND}"
    
        # Use taskset to show the effective CPU affinity
        echo -n "CPU affinity (Cpuset): "
        taskset -c -p "$pid"
    done
  3. Analyze the output:

    The output should confirm that the etcd processes are strictly running on the reserved cores (0-15).

    Expected Output
    ----------------------------------------
    Checking PID: 4332
    Command: etcd --logger=zap ...
    CPU affinity (Cpuset): pid 4332's current affinity list: 0-15
    ----------------------------------------
    Checking PID: 4369
    Command: etcd grpc-proxy start ...
    CPU affinity (Cpuset): pid 4369's current affinity list: 0-15
  4. Return to the bastion host:

    exit
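The per-PID inspection in step 2 can be turned into a pass/fail check. A minimal sketch that scans the taskset output for any process whose affinity is not the reserved set; the sample text below stands in for the loop’s live output:

```shell
# Sample output from the taskset loop (replace with live output).
AFFINITY_OUTPUT="pid 4332's current affinity list: 0-15
pid 4369's current affinity list: 0-15"

# Any affinity line that does not end in the reserved range "0-15"
# indicates a process scheduled outside the reserved CPU set.
BAD=$(echo "$AFFINITY_OUTPUT" | grep "affinity list:" | grep -v "affinity list: 0-15$" || true)

if [ -z "$BAD" ]; then
  echo "OK: all checked PIDs are confined to reserved CPUs 0-15"
else
  echo "WARNING: processes outside the reserved CPUs:" >&2
  echo "$BAD" >&2
fi
```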