Lab: Enabling the Governance Layer (Control & Demand)

Capacity without control is just chaos. We must now install the traffic controller.

In the previous lab, you created 4x virtual GPUs per node. In this lab, you assume the role of the Platform Admin. Your objective is to expose these resources to users via a governed "Vending Machine."

We will configure Kueue to manage the quota and deploy a Hardware Profile that allows users to request these slices without knowing the underlying complexity.

Prerequisites

  • Supply: You have completed Section 2 (Nodes are advertising nvidia.com/gpu: 4).

  • Kueue: Kueue is installed, with its management state set to Managed (the platform manages Kueue for you) or Unmanaged (you maintain your own Kueue installation).
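Before proceeding, you can confirm the supply side is in place. A quick sketch, assuming your nodes advertise the standard NVIDIA device-plugin resource name:

```shell
# List each node's advertised GPU capacity; time-sliced nodes should report 4
oc get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```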

Step 1: Initialize the Traffic Controller (Day 0 Config)

By default, the RHOAI Dashboard hides the complex queuing options. We must enable them.

  1. Enable Dashboard Queue UI:

    oc patch odhdashboardconfig odh-dashboard-config -n redhat-ods-applications \
      --type=merge -p '{"spec":{"dashboardConfig":{"disableKueue":false}}}'
  2. Invite the Namespace: Kueue only manages projects that are explicitly "invited." Label your target user namespace (e.g., gpu-users).

    oc new-project gpu-users
    oc label namespace gpu-users kueue.openshift.io/managed=true
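You can verify the namespace is opted in before moving on:

```shell
# Confirm the namespace carries the Kueue opt-in label
oc get namespace gpu-users -o jsonpath='{.metadata.labels.kueue\.openshift\.io/managed}'
# Expected output: true
```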

Step 2: Define the Physical Flavor (ResourceFlavor)

Kueue needs to know what "Standard GPU" actually means in terms of physical labels.

  1. Create the ResourceFlavor: This object tells Kueue: "When a quota references default-flavor, it means nodes carrying these labels" (here, nvidia.com/gpu.product).

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ResourceFlavor
    metadata:
      name: default-flavor
    spec:
      nodeLabels:
        nvidia.com/gpu.product: NVIDIA-L40S # <1> Optional: Pin to specific hardware
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
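Save the manifest (e.g. as resource-flavor.yaml, a filename chosen here for illustration) and apply it:

```shell
# Apply and confirm the flavor exists (ResourceFlavors are cluster-scoped)
oc apply -f resource-flavor.yaml
oc get resourceflavor default-flavor
```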

Step 3: Set the Quota (ClusterQueue)

Now we define the limits. We will create a global queue that allows the cluster to consume the virtual slices we created.

  1. Create the ClusterQueue:

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ClusterQueue
    metadata:
      name: cluster-queue-gpu
    spec:
      namespaceSelector: {} # <1> Open to all namespaces
      resourceGroups:
      - coveredResources: ["nvidia.com/gpu"]
        flavors:
        - name: default-flavor
          resources:
          - name: "nvidia.com/gpu"
            nominalQuota: 10 # <2> Guaranteed slots
            # borrowingLimit applies only when this queue joins a cohort (spec.cohort);
            # Kueue rejects it on a standalone queue, so it is omitted here
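Apply the queue (e.g. from cluster-queue.yaml, a filename chosen for illustration) and confirm the quota was registered:

```shell
# Apply, then read back the guaranteed GPU quota from the queue spec
oc apply -f cluster-queue.yaml
oc get clusterqueue cluster-queue-gpu \
  -o jsonpath='{.spec.resourceGroups[0].flavors[0].resources[0].nominalQuota}'
```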

Step 4: Build the Bridge (LocalQueue)

Users cannot see the global ClusterQueue. We must create a "Local Queue" inside their project that connects to the main supply.

  1. Create the LocalQueue:

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: LocalQueue
    metadata:
      namespace: gpu-users
      name: local-queue-gpu
    spec:
      clusterQueue: cluster-queue-gpu
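After applying (e.g. from local-queue.yaml, a filename chosen for illustration), the LocalQueue's default printer output shows which ClusterQueue it points to and how many workloads are pending/admitted:

```shell
# Verify the bridge: the LocalQueue should list cluster-queue-gpu as its target
oc apply -f local-queue.yaml
oc get localqueue local-queue-gpu -n gpu-users
```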

Step 5: Publish the Menu (Hardware Profile)

Finally, we create the UI button.

Critical Architecture Note

Because we are using Kueue, this Hardware Profile must not contain node selectors or tolerations. It only surfaces the resource request in the UI; placement is delegated entirely to Kueue.

  1. Create the Hardware Profile:

    apiVersion: dashboard.opendatahub.io/v1alpha1
    kind: HardwareProfile
    metadata:
      name: profile-l40s-slice
      namespace: redhat-ods-applications
    spec:
      displayName: "NVIDIA L40S (Time-Sliced)"
      description: "Shared GPU segment. Good for inference and light training."
      identifiers:
        - displayName: GPU
          identifier: nvidia.com/gpu
          minCount: 1
          defaultCount: 1
      # No node selectors or tolerations here. Kueue handles placement.
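Apply the profile (e.g. from hardware-profile.yaml, a filename chosen for illustration) and confirm the dashboard can see it:

```shell
# Hardware Profiles live in the dashboard's namespace
oc apply -f hardware-profile.yaml
oc get hardwareprofiles -n redhat-ods-applications
```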

Step 6: The "Payoff" (Verification)

  1. Log in to the RHOAI Dashboard.

  2. Go to Data Science Projects → gpu-users.

  3. Create a Workbench.

  4. Select the "NVIDIA L40S (Time-Sliced)" accelerator.

  5. Success: The Pod launches.

    • Under the hood: Kueue intercepted the request, checked the quota in cluster-queue-gpu, assigned default-flavor, and the Scheduler placed it on one of your 4 virtual slices.
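You can watch this interception from the command line. Kueue wraps each queued pod in a Workload object; a sketch, assuming the workbench pod was created in gpu-users:

```shell
# Each queued pod gets a Workload object; the output shows its queue and admission status
oc get workloads -n gpu-users
```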


Mission Accomplished. You have successfully deployed an end-to-end GPU-as-a-Service architecture.