2. Enabling, Configuring, and Verifying Workload Partitioning

Workload partitioning is not just a configuration option; it is a fundamental cluster mode that must be enabled when the cluster is installed. It cannot be turned on later in the cluster’s lifecycle.

Enabling Workload Partitioning During Installation

To enable this feature, you must add the cpuPartitioningMode: AllNodes parameter to your install-config.yaml file before deploying the cluster. This flag signals to the installer and the cluster Operators that the cluster will run in partitioned mode.

Snippet of install-config.yaml
...
cpuPartitioningMode: AllNodes
...

After the cluster is installed with this setting, the infrastructure is prepared for CPU pinning. You then use a PerformanceProfile to define the specific CPU sets for system and user workloads.

In this workshop environment, the cluster was provisioned with cpuPartitioningMode enabled. You will focus on the post-installation configuration and verification.
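Before making changes, you can confirm that the cluster is actually running in partitioned mode. A minimal sketch, assuming the Infrastructure resource reports the mode in a status.cpuPartitioning field (verify the field name against your OpenShift version); a sample value stands in for the live query:

```shell
# On the live cluster (field name assumed; check your OCP version):
#   MODE=$(oc get infrastructure cluster -o jsonpath='{.status.cpuPartitioning}')
# A sample value stands in for the query result here:
MODE="AllNodes"

if [ "$MODE" = "AllNodes" ]; then
  echo "Workload partitioning is enabled"
else
  echo "Workload partitioning is NOT enabled" >&2
fi
```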

Applying the Performance Profile

The PerformanceProfile is the primary mechanism for managing fine-grained hardware tuning in OpenShift. It is managed by the Node Tuning Operator (NTO).

In this demonstration, we are using a node with 32 CPU cores (0-31). We will partition the CPUs as follows:

  • Reserved CPUs (0-15): Dedicated to the operating system (RHCOS) and critical OpenShift system components (Kubelet, CRI-O, etcd, and so on).

  • Isolated CPUs (16-31): Dedicated to user workloads. System processes are pinned to the reserved set, leaving these cores free for application pods.

This 50/50 split is for demonstration purposes to clearly show the isolation in top. In a production environment, you typically reserve a smaller set (e.g., 4-8 cores) for the system and allocate the rest for workloads.
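For comparison, a production-oriented profile on the same 32-core node might reserve only four cores. The cpu stanza below is illustrative only; size the reserved set to your actual system-component footprint:

```yaml
# Illustrative production split: 4 reserved cores, 28 isolated
spec:
  cpu:
    reserved: "0-3"
    isolated: "4-31"
```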

  1. Log in to the control plane node:

    # Get the name of the first control plane node
    NODE_NAME=$(oc get nodes -l node-role.kubernetes.io/master -o jsonpath='{.items[0].metadata.name}')
    
    # Access the node via a debug pod
    oc debug node/$NODE_NAME
  2. Inspect the node’s CPU and NUMA topology:

    # Display CPU architecture information
    lscpu | grep -i numa
    Example Output (for a 32-core system):
    NUMA node(s):          1
    NUMA node0 CPU(s):     0-31
  3. Exit the debug shell and return to the bastion host:

    exit
  4. Create the PerformanceProfile manifest:

    tee $HOME/performance-profile.yaml << 'EOF'
    ---
    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: openshift-node-performance-profile
    spec:
      cpu:
        # Set cores 0-15 for OpenShift system components and the OS
        reserved: "0-15"
        # Set cores 16-31 for user workloads
        isolated: "16-31"
      machineConfigPoolSelector:
        pools.operator.machineconfiguration.openshift.io/master: ""
      nodeSelector:
        node-role.kubernetes.io/master: ''
      numa:
        # "restricted" policy ensures containers are pinned to the same NUMA node for memory and CPU
        topologyPolicy: "restricted"
      realTimeKernel:
        enabled: false
      workloadHints:
        realTime: false
        highPowerConsumption: false
        perPodPowerManagement: false
    EOF
  5. Apply the manifest to the cluster:

    oc apply -f $HOME/performance-profile.yaml

Applying a PerformanceProfile triggers a MachineConfig update. The affected nodes will undergo a rolling reboot to apply kernel arguments (like isolcpus). This process may take several minutes.
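You can monitor the rollout from the bastion host. The sketch below shows the live commands as comments (the oc wait form is a common pattern; the timeout is illustrative) and parses a sample oc get mcp row to show which columns to watch:

```shell
# On the live cluster, watch the master pool directly:
#   oc get mcp master -w
# or block until the rollout finishes:
#   oc wait mcp/master --for=condition=Updated --timeout=30m
# A sample `oc get mcp master` row (NAME, CONFIG, UPDATED, UPDATING,
# DEGRADED) stands in for live output here:
MCP_LINE="master   rendered-master-abc123   True   False   False"

UPDATED=$(echo "$MCP_LINE" | awk '{print $3}')
UPDATING=$(echo "$MCP_LINE" | awk '{print $4}')

if [ "$UPDATED" = "True" ] && [ "$UPDATING" = "False" ]; then
  echo "master pool: update complete"
fi
```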

Verifying System Component CPU Affinity

Once the nodes have rebooted and the cluster is stable, we can verify that the system components are correctly constrained to the Reserved CPU set (0-15). We will use etcd as our primary example.

  1. Re-enter the control plane node:

    NODE_NAME=$(oc get nodes -l node-role.kubernetes.io/master -o jsonpath='{.items[0].metadata.name}')
    
    oc debug node/$NODE_NAME
  2. Check the CPU affinity of etcd processes:

    # Identify PIDs for etcd processes
    ETCD_PIDS=$(ps -ef | grep "etcd " | grep -v grep | awk '{print $2}')
    
    # Check the CPU affinity (cpuset) for each identified PID
    for pid in $ETCD_PIDS; do
        echo "----------------------------------------"
        echo "Checking PID: ${pid}"
    
        # Get the command line of the process
        COMMAND=$(ps -o args= -p "$pid")
        echo "Command: ${COMMAND}"
    
        # Use taskset to show the effective CPU affinity
        echo -n "CPU affinity (Cpuset): "
        taskset -c -p "$pid"
    done
  3. Analyze the output:

    The output should confirm that the etcd processes are strictly running on the reserved cores (0-15).

    Expected Output
    ----------------------------------------
    Checking PID: 4332
    Command: etcd --logger=zap ...
    CPU affinity (Cpuset): pid 4332's current affinity list: 0-15
    ----------------------------------------
    Checking PID: 4369
    Command: etcd grpc-proxy start ...
    CPU affinity (Cpuset): pid 4369's current affinity list: 0-15
  4. Return to the bastion host:

    exit
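The per-PID inspection in step 2 can be turned into a pass/fail check. A minimal sketch that scans the taskset output for any process whose affinity is not the reserved set; the sample text below stands in for the loop’s live output:

```shell
# Sample output from the taskset loop (replace with live output).
AFFINITY_OUTPUT="pid 4332's current affinity list: 0-15
pid 4369's current affinity list: 0-15"

# Any affinity line that does not end in the reserved range "0-15"
# indicates a process scheduled outside the reserved CPU set.
BAD=$(echo "$AFFINITY_OUTPUT" | grep "affinity list:" | grep -v "affinity list: 0-15$" || true)

if [ -z "$BAD" ]; then
  echo "OK: all checked PIDs are confined to reserved CPUs 0-15"
else
  echo "WARNING: processes outside the reserved CPUs:" >&2
  echo "$BAD" >&2
fi
```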