Lab: Enabling the Governance Layer (Control & Demand)
Capacity without control is just chaos. We must now install the traffic controller.
In the previous lab, you created 4x virtual GPUs per node. In this lab, you assume the role of the Platform Admin. Your objective is to expose these resources to users via a governed "Vending Machine."
We will configure Kueue to manage the quota and deploy a Hardware Profile that allows users to request these slices without knowing the underlying complexity.
Prerequisites
- Supply: You have completed Section 2 (nodes are advertising `nvidia.com/gpu: 4`).
- Kueue: The Kueue Operator is installed and set to `Managed` or `Unmanaged`.
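Before proceeding, you can confirm the supply side from the CLI. This is a quick sketch; it assumes GPU Feature Discovery has labeled your GPU nodes with `nvidia.com/gpu.present=true` (adjust the selector if your cluster labels differ):

```shell
# List allocatable GPU slices per node; each time-sliced node
# from Section 2 should report 4.
oc get nodes -l nvidia.com/gpu.present=true \
  -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```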
Step 1: Initialize the Traffic Controller (Day 0 Config)
By default, the RHOAI Dashboard hides the complex queuing options. We must enable them.
- Enable Dashboard Queue UI:

  ```shell
  oc patch odhdashboardconfig odh-dashboard-config -n redhat-ods-applications \
    --type=merge -p '{"spec":{"dashboardConfig":{"disableKueue":false}}}'
  ```

- Invite the Namespace: Kueue only manages projects that are explicitly "invited." Create and label your target user namespace (e.g., `gpu-users`):

  ```shell
  oc new-project gpu-users
  oc label namespace gpu-users kueue.openshift.io/managed=true
  ```
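A quick sanity check that both Day 0 settings took effect (this simply inspects the objects you just modified):

```shell
# The dashboard config should show the Kueue flag set to false (enabled),
# and the namespace should carry the opt-in label.
oc get odhdashboardconfig odh-dashboard-config -n redhat-ods-applications -o yaml | grep -i kueue
oc get namespace gpu-users --show-labels | grep kueue.openshift.io/managed
```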
Step 2: Define the Physical Flavor (ResourceFlavor)
Kueue needs to know what "Standard GPU" actually means in terms of physical labels.
- Create the ResourceFlavor: This object tells Kueue: "When a workload asks for `default-flavor`, I mean nodes carrying the label `nvidia.com/gpu.product: NVIDIA-L40S`."

  ```yaml
  apiVersion: kueue.x-k8s.io/v1beta1
  kind: ResourceFlavor
  metadata:
    name: default-flavor
  spec:
    nodeLabels:
      nvidia.com/gpu.product: NVIDIA-L40S # <1> Optional: Pin to specific hardware
    tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
  ```
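The `nodeLabels` pin only works if the value matches what your nodes actually advertise. You can look up the exact product label value before applying the flavor (the jsonpath below assumes GPU Feature Discovery labeling):

```shell
# Print each node's GPU product label; the ResourceFlavor's
# nvidia.com/gpu.product value must match this string exactly.
oc get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.nvidia\.com/gpu\.product}{"\n"}{end}'
```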
Step 3: Set the Quota (ClusterQueue)
Now we define the limits. We will create a global queue that allows the cluster to consume the virtual slices we created.
- Create the ClusterQueue:

  ```yaml
  apiVersion: kueue.x-k8s.io/v1beta1
  kind: ClusterQueue
  metadata:
    name: cluster-queue-gpu
  spec:
    namespaceSelector: {} # <1> Open to all namespaces
    resourceGroups:
    - coveredResources: ["nvidia.com/gpu"]
      flavors:
      - name: default-flavor
        resources:
        - name: "nvidia.com/gpu"
          nominalQuota: 10 # <2> Guaranteed slots
          borrowingLimit: 10 # <3> Extra burstable slots
  ```
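After applying the manifest, you can inspect the queue's quota and live usage:

```shell
# The summary view shows pending/admitted workloads; describe exposes
# the per-flavor reservation and usage against the nominal quota.
oc get clusterqueue cluster-queue-gpu
oc describe clusterqueue cluster-queue-gpu
```

Note that `borrowingLimit` only takes effect when the ClusterQueue belongs to a cohort of queues that can lend capacity to each other; with a single standalone queue, `nominalQuota` is the effective ceiling.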
Step 4: Build the Bridge (LocalQueue)
Users cannot see the global ClusterQueue. We must create a "Local Queue" inside their project that connects to the main supply.
- Create the LocalQueue:

  ```yaml
  apiVersion: kueue.x-k8s.io/v1beta1
  kind: LocalQueue
  metadata:
    namespace: gpu-users
    name: local-queue-gpu
  spec:
    clusterQueue: cluster-queue-gpu
  ```
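A quick check that the bridge is wired up; the default printer columns show which ClusterQueue the LocalQueue is bound to:

```shell
# Confirm local-queue-gpu points at cluster-queue-gpu and is ready to admit work
oc get localqueue local-queue-gpu -n gpu-users
```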
Step 5: Publish the Menu (Hardware Profile)
Finally, we create the UI button.
Critical Architecture Note: Because we are using Kueue, this Hardware Profile must not contain node selectors or tolerations of its own. Placement is resolved by Kueue through the ResourceFlavor at admission time; duplicating scheduling hints here would fight the queue.
- Create the Hardware Profile:

  ```yaml
  apiVersion: dashboard.opendatahub.io/v1alpha1
  kind: HardwareProfile
  metadata:
    name: profile-l40s-slice
    namespace: redhat-ods-applications
  spec:
    displayName: "NVIDIA L40S (Time-Sliced)"
    description: "Shared GPU segment. Good for inference and light training."
    identifiers:
    - identifier: nvidia.com/gpu
      count: 1
    # No selectors here. Kueue handles placement.
  ```
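Before testing in the UI, you can exercise the queue directly with a throwaway batch Job. This is an illustrative sketch: the Job name, container name, and CUDA image are assumptions, and the key line is the `kueue.x-k8s.io/queue-name` label that routes the Job into your LocalQueue:

```shell
# Submit a suspended Job; Kueue admits it against local-queue-gpu,
# unsuspends it, and the scheduler places it on a GPU slice.
oc create -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  generateName: gpu-smoke-
  namespace: gpu-users
  labels:
    kueue.x-k8s.io/queue-name: local-queue-gpu
spec:
  suspend: true
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: smoke
        image: nvcr.io/nvidia/cuda:12.4.1-base-ubi9  # illustrative image
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
EOF
```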
Step 6: The "Payoff" (Verification)
- Log in to the RHOAI Dashboard.
- Go to Data Science Projects → gpu-users.
- Create a Workbench.
- Select the "NVIDIA L40S (Time-Sliced)" accelerator.
- Success: The Pod launches.
- Under the hood: Kueue intercepted the request, checked the quota in `cluster-queue-gpu`, assigned `default-flavor`, and the scheduler placed it on one of your 4 virtual slices.
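The same admission flow is visible from the CLI: Kueue wraps each admitted workload in a `Workload` object you can list alongside the pods:

```shell
# Workload objects record the queue, flavor assignment, and admission status;
# the pod listing shows the workbench actually running.
oc get workloads.kueue.x-k8s.io -n gpu-users
oc get pods -n gpu-users
```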
Mission Accomplished. You have successfully deployed an end-to-end GPU-as-a-Service architecture.