Lab: Intelligent Workload Management with Kueue
Objective
In an unmanaged "Wild West" cluster, if your hardware has 4 GPUs and users submit jobs requiring 6 GPUs, the cluster destabilizes. Pods get stuck in crash loops, resources are fragmented, and nobody gets their work done.
In this lab, you will implement Kueue to act as your platform’s traffic controller. You will create a scenario of intentional resource scarcity to prove two critical outcomes:
- Preventing Crashes: Jobs that exceed capacity wait patiently in a queue instead of failing.
- The VIP Lane: High-priority jobs jump ahead of low-priority jobs in the queue.
Prerequisites
- Red Hat OpenShift AI 3.2+ installed.
- Kueue is installed and its managementState is set to Managed in the DataScienceCluster resource.
- You are logged in to the oc CLI with cluster-admin privileges.
Step 1: Prepare the Factory Floor (Namespaces)
First, we will simulate a multi-tenant environment by creating two distinct project namespaces for competing teams. We must also label these namespaces so Kueue’s admission controller knows to monitor them.
# Create namespaces for two competing teams
oc new-project ai-team-a
oc new-project ai-team-b
# Enable Kueue management on these namespaces
oc label namespace ai-team-a kueue.openshift.io/managed=true
oc label namespace ai-team-b kueue.openshift.io/managed=true
Step 2: Define the Global Quota (Cluster-Scope)
We need to define the physical hardware profile (the ResourceFlavor) and the global maximum limit (the ClusterQueue). For this lab, we will simulate an expensive hardware pool by artificially capping our entire cluster queue at 4 CPUs.
infrastructure.yaml:

apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: shared-cluster-queue
spec:
  namespaceSelector: {} # Monitors all namespaces with the managed label
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 4 # The absolute ceiling for this queue
      - name: "memory"
        nominalQuota: 8Gi
oc apply -f infrastructure.yaml
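This lab caps CPUs for simplicity, but the same pattern covers the GPU scenario from the objective. A hedged sketch of a GPU-backed flavor, assuming the nvidia.com/gpu extended resource and a node label published by your GPU operator (the label name here is an illustration, not a requirement):

```yaml
# Hypothetical GPU flavor -- the node label below is an assumption;
# match it to whatever your GPU operator actually applies to GPU nodes.
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: gpu-flavor
spec:
  nodeLabels:
    nvidia.com/gpu.present: "true" # assumed label for illustration
# The matching ClusterQueue resource group would then add the GPU resource:
#   coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
#   and a nominalQuota entry for "nvidia.com/gpu" (e.g. 4).
```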
Step 3: Define Team Access (Namespace-Scope)
Users do not submit jobs to the cluster queue. They submit jobs to a LocalQueue inside their own project. This acts as their entry point into the global pool.
team-queues.yaml:

apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: ai-team-a
  name: team-a-queue
spec:
  clusterQueue: shared-cluster-queue
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: ai-team-b
  name: team-b-queue
spec:
  clusterQueue: shared-cluster-queue
oc apply -f team-queues.yaml
Step 4: Configure the "VIP" Priorities
Not all AI jobs are equal. We will create two priority tiers. When the cluster is full, Kueue uses these tiers to determine who gets admitted next.
priorities.yaml:

apiVersion: kueue.x-k8s.io/v1beta2
kind: WorkloadPriorityClass
metadata:
  name: low-priority
value: 100
description: "Standard background exploration"
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: WorkloadPriorityClass
metadata:
  name: high-priority
value: 1000
description: "Critical production training runs"
oc apply -f priorities.yaml
Step 5: The "Resource Contention" Experiment
To see the Allocation Engine in action, we will submit three jobs. Combined, they require 12 CPUs, but our quota only allows 4 CPUs.
- Job 1 (Team B, Low Priority, 4 CPUs): Starts immediately. Consumes 100% of the quota for 60 seconds.
- Job 2 (Team B, Low Priority, 4 CPUs): Submitted second. Will be queued (Suspended) because the quota is full.
- Job 3 (Team A, High Priority, 4 CPUs): Submitted last. Will be queued, but will jump ahead of Job 2 because of its higher priority class.
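Before submitting, it is worth sanity-checking the contention math locally (this sketch runs anywhere and does not touch the cluster):

```shell
# Three jobs of 4 CPUs each against a 4-CPU nominalQuota:
QUOTA=4
TOTAL=$((4 + 4 + 4)) # sum of the three job requests
echo "Requested: ${TOTAL} CPUs, quota: ${QUOTA} CPUs"
# Only one 4-CPU job fits at a time, so two must wait in the queue:
echo "Jobs forced to wait: $(( (TOTAL - QUOTA) / 4 ))"
```

Twelve requested CPUs against a four-CPU ceiling is exactly the oversubscription the experiment needs.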
# 1. Submit Job 1 (Consumes the quota)
oc create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: lab-job-1-low
  namespace: ai-team-b
  labels:
    kueue.x-k8s.io/queue-name: team-b-queue
    kueue.x-k8s.io/priority-class: low-priority
spec:
  suspend: true # Required for Kueue to manage the job's startup
  template:
    spec:
      containers:
      - name: dummy
        image: bash:5
        command: ["sleep", "60"]
        resources:
          requests:
            cpu: 4
      restartPolicy: Never
EOF
# 2. Submit Job 2 (Enters the queue behind Job 1)
oc create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: lab-job-2-low
  namespace: ai-team-b
  labels:
    kueue.x-k8s.io/queue-name: team-b-queue
    kueue.x-k8s.io/priority-class: low-priority
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: dummy
        image: bash:5
        command: ["sleep", "60"]
        resources:
          requests:
            cpu: 4
      restartPolicy: Never
EOF
# 3. Submit Job 3 (The VIP Line Jumper)
oc create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: lab-job-3-high
  namespace: ai-team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue
    kueue.x-k8s.io/priority-class: high-priority
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: dummy
        image: bash:5
        command: ["sleep", "60"]
        resources:
          requests:
            cpu: 4
      restartPolicy: Never
EOF
Step 6: Verification and Observation
Observe how Kueue handles the contention without crashing the cluster.
Run the following command immediately after submitting the jobs:
oc get workloads -A
- Observation: lab-job-1-low will show as Admitted. The other two will show as Pending. Notice that there are no failed pod errors or Out of Memory (OOM) crashes on your nodes.
Run the workload check again after 60 seconds (when Job 1 finishes).
oc get workloads -A
- Observation: lab-job-3-high will transition to Admitted, while lab-job-2-low remains Pending. Kueue intelligently evaluated the queue and promoted the high-priority job from Team A to run next.
- Log in to the RHOAI Dashboard.
- In the left-hand menu, navigate to Distributed Workloads → Resource Management.
- You will see a visual representation of your shared-cluster-queue, showing the 4 CPU limit, current utilization, and the exact order of the pending queue.
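The ordering you just observed can be modeled as a simple sort: highest WorkloadPriorityClass value first, with earlier submission breaking ties. This is a local sketch of the policy, not Kueue's actual scheduler code:

```shell
# Pending workloads as "name priority arrival". Sort by priority (numeric,
# descending), then arrival (numeric, ascending), and print the admit order.
printf '%s\n' \
  "lab-job-2-low 100 2" \
  "lab-job-3-high 1000 3" \
| sort -k2,2nr -k3,3n \
| awk '{print $1}'
# Prints lab-job-3-high first: priority 1000 beats 100 despite arriving later.
```

This is why Job 3 jumps the line even though Job 2 was submitted before it.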
By establishing this queueing mechanism, you ensure your hardware is never idle, your cluster never crashes from over-subscription, and your most critical business workloads always run first.