2. Enabling, Configuring, and Verifying Workload Partitioning
Workload partitioning is not just a configuration; it is a fundamental cluster mode that must be established at the very beginning of the cluster’s lifecycle.
Enabling Workload Partitioning During Installation
To enable this feature, add the cpuPartitioningMode: AllNodes parameter to your install-config.yaml file before deploying the cluster. This flag tells the installer and the cluster operators that the cluster will operate in partitioned mode; it cannot be enabled after installation.
```yaml
# install-config.yaml (excerpt)
...
cpuPartitioningMode: AllNodes
...
```
After the cluster is installed with this setting, the infrastructure is prepared for CPU pinning. You then use a PerformanceProfile to define the specific CPU sets for system and user workloads.
Note: In this workshop environment, the cluster was provisioned with cpuPartitioningMode: AllNodes already set, so no reinstall is required.
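For context, a minimal sketch of where the field sits in install-config.yaml; the cluster name and replica counts below are illustrative placeholders, not values from this environment:

```yaml
apiVersion: v1
metadata:
  name: example-cluster            # illustrative placeholder
# cpuPartitioningMode is a top-level field and must be set before installation
cpuPartitioningMode: AllNodes
controlPlane:
  name: master
  replicas: 3                      # illustrative placeholder
compute:
  - name: worker
    replicas: 3                    # illustrative placeholder
```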
Applying the Performance Profile
The PerformanceProfile is the primary mechanism for managing fine-grained hardware tuning in OpenShift. It is managed by the Node Tuning Operator (NTO).
In this demonstration, we are using a node with 32 CPU cores (0-31). We will partition the CPUs as follows:
- Reserved CPUs (0-15): Dedicated to the operating system (RHCOS) and critical OpenShift system components (e.g., kubelet, CRI-O, and etcd).
- Isolated CPUs (16-31): Dedicated to user workloads. System processes are kept off these cores, so workloads can run on them without interference.
Note: This 50/50 split is chosen for demonstration purposes, to make the isolation easy to observe. Production clusters typically reserve far fewer cores for the system.
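To sanity-check a split before writing it into the profile, a cpuset-style range string can be expanded and counted with standard shell tools. This is a small sketch of ours, not part of the lab; the ranges shown are this environment's values:

```shell
# Count the CPUs in a cpuset-style range string such as "0-15" or "0-3,8-11"
count_cpus() {
  local total=0 part lo hi
  IFS=',' read -ra parts <<< "$1"
  for part in "${parts[@]}"; do
    lo=${part%-*}   # lower bound (or the single CPU itself)
    hi=${part#*-}   # upper bound (or the single CPU itself)
    total=$(( total + hi - lo + 1 ))
  done
  echo "$total"
}

echo "reserved: $(count_cpus 0-15) CPUs"   # 16
echo "isolated: $(count_cpus 16-31) CPUs"  # 16
```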
- Log in to the control plane node:

```shell
# Get the first node name
NODE_NAME=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')

# Access the node via a debug pod
oc debug node/$NODE_NAME
```
- Inspect the node’s CPU and NUMA topology:

```shell
# Display CPU architecture information
lscpu | grep -i numa
```

Example output (for a 32-core system):

```
NUMA node(s):        1
NUMA node0 CPU(s):   0-31
```
- Exit the debug shell and return to the bastion host:

```shell
exit
```
- Create the PerformanceProfile manifest:

```shell
tee $HOME/performance-profile.yaml << 'EOF'
---
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  cpu:
    # Cores 0-15 for OpenShift system components and the OS
    reserved: "0-15"
    # Cores 16-31 for user workloads
    isolated: "16-31"
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ''
  numa:
    # The "restricted" policy ensures containers are pinned to the same NUMA node for CPU and memory
    topologyPolicy: "restricted"
  realTimeKernel:
    enabled: false
  workloadHints:
    realTime: false
    highPowerConsumption: false
    perPodPowerManagement: false
EOF
```
- Apply the manifest to the cluster:

```shell
oc apply -f $HOME/performance-profile.yaml
```
Note: Applying a PerformanceProfile generates a new MachineConfig, and the Machine Config Operator reboots the affected nodes one at a time to apply it. Wait for the master MachineConfigPool to report Updated before continuing.
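Once the nodes return, the isolated cores are consumed by pods in the Guaranteed QoS class, i.e. pods whose containers request whole CPUs with requests equal to limits; the kubelet's static CPU manager then pins them to dedicated cores. A minimal sketch, with pod and image names that are illustrative rather than part of this lab:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-workload                       # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest  # illustrative image
      resources:
        # Integer CPU request == limit -> Guaranteed QoS, so this container
        # receives exclusive CPUs drawn from the isolated set
        requests:
          cpu: "2"
          memory: "1Gi"
        limits:
          cpu: "2"
          memory: "1Gi"
```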
Verifying System Component CPU Affinity
Once the nodes have rebooted and the cluster is stable, we can verify that the system components are correctly constrained to the Reserved CPU set (0-15). We will use etcd as our primary example.
- Re-enter the control plane node:

```shell
NODE_NAME=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
oc debug node/$NODE_NAME
```
- Check the CPU affinity of the etcd processes:

```shell
# Identify PIDs for etcd processes
ETCD_PIDS=$(ps -ef | grep "etcd " | grep -v grep | awk '{print $2}')

# Check the CPU affinity (cpuset) for each identified PID
for pid in $ETCD_PIDS; do
  echo "----------------------------------------"
  echo "Checking PID: ${pid}"

  # Get the command line of the process
  COMMAND=$(ps -o args= -p "$pid")
  echo "Command: ${COMMAND}"

  # Use taskset to show the effective CPU affinity
  echo -n "CPU affinity (Cpuset): "
  taskset -c -p "$pid"
done
```
- Analyze the output:

The output should confirm that the etcd processes are running strictly on the reserved cores (0-15).

Expected output:

```
----------------------------------------
Checking PID: 4332
Command: etcd --logger=zap ...
CPU affinity (Cpuset): pid 4332's current affinity list: 0-15
----------------------------------------
Checking PID: 4369
Command: etcd grpc-proxy start ...
CPU affinity (Cpuset): pid 4369's current affinity list: 0-15
```
- Return to the bastion host:

```shell
exit
```
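The manual inspection above can also be scripted. A small sketch of a helper (the function name is ours, not from the lab) that flags any process whose taskset affinity list differs from the reserved set:

```shell
RESERVED="0-15"

# Compare an affinity list reported by `taskset -c -p` against the reserved set
check_affinity() {
  local pid=$1 affinity=$2
  if [ "$affinity" = "$RESERVED" ]; then
    echo "PID $pid OK (pinned to $RESERVED)"
  else
    echo "PID $pid UNEXPECTED affinity: $affinity"
  fi
}

check_affinity 4332 "0-15"   # PID 4332 OK (pinned to 0-15)
check_affinity 9999 "0-31"   # PID 9999 UNEXPECTED affinity: 0-31
```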