Lab: Intelligent Workload Management with Kueue
Objective
In an unmanaged "Wild West" cluster, if your hardware has 4 GPUs and users submit jobs requiring 6 GPUs, the cluster destabilizes. Pods get stuck in crash loops, resources are fragmented, and nobody gets their work done.
In this lab, you will implement Kueue to act as your platform’s traffic controller. You will create a scenario of intentional resource scarcity to prove two critical outcomes:
- Preventing Crashes: Jobs that exceed capacity wait patiently in a queue instead of failing.
- The VIP Lane: High-priority jobs jump ahead of low-priority jobs in the queue.
Prerequisites
- Red Hat OpenShift AI 3.2+ installed.
- Kueue is installed and its managementState is set to Managed in the DataScienceCluster resource.
- You are logged in to the oc CLI with cluster-admin privileges.
Step 1: Prepare the Factory Floor (Namespaces)
First, we will simulate a multi-tenant environment by creating two distinct project namespaces for competing teams. We must also label these namespaces so Kueue’s admission controller knows to monitor them.
# Create namespaces for two competing teams
oc new-project ai-team-a
oc new-project ai-team-b
# Enable Kueue management on these namespaces
oc label namespace ai-team-a kueue.openshift.io/managed=true
oc label namespace ai-team-b kueue.openshift.io/managed=true
Step 2: Define the Global Quota (Cluster-Scope)
We need to define the physical hardware profile (the ResourceFlavor) and the global maximum limit (the ClusterQueue). For this lab, we will simulate an expensive hardware pool by artificially capping our entire cluster queue at 4 CPUs.
infrastructure.yaml:

apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: shared-cluster-queue
spec:
  namespaceSelector: {} # Monitors all namespaces with the managed label
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 4 # The absolute ceiling for this queue
      - name: "memory"
        nominalQuota: 8Gi
oc apply -f infrastructure.yaml
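This lab caps CPUs for simplicity, but the same pattern covers the GPU scenario from the objective. A hedged sketch of a GPU-backed flavor, assuming the nvidia.com/gpu extended resource and a node label published by your GPU operator (the label name here is an illustration, not a requirement):

```yaml
# Hypothetical GPU flavor -- the node label below is an assumption;
# match it to whatever your GPU operator actually applies to GPU nodes.
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: gpu-flavor
spec:
  nodeLabels:
    nvidia.com/gpu.present: "true" # assumed label for illustration
# The matching ClusterQueue resource group would then add the GPU resource:
#   coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
#   and a nominalQuota entry for "nvidia.com/gpu" (e.g. 4).
```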
Step 3: Define Team Access (Namespace-Scope)
Users do not submit jobs to the cluster queue. They submit jobs to a LocalQueue inside their own project. This acts as their entry point into the global pool.
team-queues.yaml:

apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: ai-team-a
  name: team-a-queue
spec:
  clusterQueue: shared-cluster-queue
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  namespace: ai-team-b
  name: team-b-queue
spec:
  clusterQueue: shared-cluster-queue
oc apply -f team-queues.yaml
Step 4: Configure the "VIP" Priorities
Not all AI jobs are equal. We will create two priority tiers. When the cluster is full, Kueue uses these tiers to determine who gets admitted next.
priorities.yaml:

apiVersion: kueue.x-k8s.io/v1beta2
kind: WorkloadPriorityClass
metadata:
  name: low-priority
value: 100
description: "Standard background exploration"
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: WorkloadPriorityClass
metadata:
  name: high-priority
value: 1000
description: "Critical production training runs"
oc apply -f priorities.yaml
Step 5: The "Resource Contention" Experiment
To see the Allocation Engine in action, we will submit three jobs. Combined, they require 12 CPUs, but our quota only allows 4 CPUs.
- Job 1 (Team B, Low Priority, 4 CPUs): Starts immediately. Consumes 100% of the quota for 60 seconds.
- Job 2 (Team B, Low Priority, 4 CPUs): Submitted second. Will be queued (Suspended) because the quota is full.
- Job 3 (Team A, High Priority, 4 CPUs): Submitted last. Will be queued, but will jump ahead of Job 2 because of its higher priority class.
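Before submitting, it is worth sanity-checking the contention math locally (this sketch runs anywhere and does not touch the cluster):

```shell
# Three jobs of 4 CPUs each against a 4-CPU nominalQuota:
QUOTA=4
TOTAL=$((4 + 4 + 4)) # sum of the three job requests
echo "Requested: ${TOTAL} CPUs, quota: ${QUOTA} CPUs"
# Only one 4-CPU job fits at a time, so two must wait in the queue:
echo "Jobs forced to wait: $(( (TOTAL - QUOTA) / 4 ))"
```

Twelve requested CPUs against a four-CPU ceiling is exactly the oversubscription the experiment needs.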
# 1. Submit Job 1 (Consumes the quota)
oc create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: lab-job-1-low
  namespace: ai-team-b
  labels:
    kueue.x-k8s.io/queue-name: team-b-queue
    kueue.x-k8s.io/priority-class: low-priority
spec:
  suspend: true # Required for Kueue to manage the job's startup
  template:
    spec:
      containers:
      - name: dummy
        image: bash:5
        command: ["sleep", "60"]
        resources:
          requests:
            cpu: 4
      restartPolicy: Never
EOF
# 2. Submit Job 2 (Enters the queue behind Job 1)
oc create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: lab-job-2-low
  namespace: ai-team-b
  labels:
    kueue.x-k8s.io/queue-name: team-b-queue
    kueue.x-k8s.io/priority-class: low-priority
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: dummy
        image: bash:5
        command: ["sleep", "60"]
        resources:
          requests:
            cpu: 4
      restartPolicy: Never
EOF
# 3. Submit Job 3 (The VIP Line Jumper)
oc create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: lab-job-3-high
  namespace: ai-team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue
    kueue.x-k8s.io/priority-class: high-priority
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: dummy
        image: bash:5
        command: ["sleep", "60"]
        resources:
          requests:
            cpu: 4
      restartPolicy: Never
EOF
Step 6: Verification and Observation
Observe how Kueue handles the contention without crashing the cluster.
Run the following command immediately after submitting the jobs:
oc get workloads -A
- Observation: lab-job-1-low will show as Admitted. The other two will show as Pending. Notice that there are no failed pod errors or Out of Memory (OOM) crashes on your nodes.
Run the workload check again after 60 seconds (when Job 1 finishes).
oc get workloads -A
- Observation: lab-job-3-high will transition to Admitted, while lab-job-2-low remains Pending. Kueue intelligently evaluated the queue and promoted the high-priority job from Team A to run next.
- Log in to the RHOAI Dashboard.
- In the left-hand menu, navigate to Distributed Workloads → Resource Management.
- You will see a visual representation of your shared-cluster-queue, showing the 4 CPU limit, current utilization, and the exact order of the pending queue.
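The ordering you just observed can be modeled as a simple sort: highest WorkloadPriorityClass value first, with earlier submission breaking ties. This is a local sketch of the policy, not Kueue's actual scheduler code:

```shell
# Pending workloads as "name priority arrival". Sort by priority (numeric,
# descending), then arrival (numeric, ascending), and print the admit order.
printf '%s\n' \
  "lab-job-2-low 100 2" \
  "lab-job-3-high 1000 3" \
| sort -k2,2nr -k3,3n \
| awk '{print $1}'
# Prints lab-job-3-high first: priority 1000 beats 100 despite arriving later.
```

This is why Job 3 jumps the line even though Job 2 was submitted before it.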
By establishing this queueing mechanism, you ensure your hardware is never idle, your cluster never crashes from over-subscription, and your most critical business workloads always run first.