Lab: Enabling the Governance Layer (Control & Demand)

Capacity without control is just chaos. We must now install the traffic controller.

In the previous lab, you created 4x virtual GPUs per node. In this lab, you assume the role of the Platform Admin. Your objective is to expose these resources to users via a governed "Vending Machine."

We will configure Kueue to manage the quota and deploy a Hardware Profile that allows users to request these slices without knowing the underlying complexity.

Prerequisites

  • Supply: You have completed Section 2 (Nodes are advertising nvidia.com/gpu: 4).

  • Kueue: Kueue is installed, with its management state set to Managed (the platform manages Kueue for you) or Unmanaged (you maintain your own Kueue installation).
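Before proceeding, you can confirm the supply side is in place. A quick sketch, assuming your nodes advertise the standard NVIDIA device-plugin resource name:

```shell
# List each node's advertised GPU capacity; time-sliced nodes should report 4
oc get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.capacity.nvidia\.com/gpu'
```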

Step 1: Initialize the Traffic Controller (Day 0 Config)

By default, the RHOAI Dashboard hides the complex queuing options. We must enable them.

  1. Enable Dashboard Queue UI:

    oc patch odhdashboardconfig odh-dashboard-config -n redhat-ods-applications \
      --type=merge -p '{"spec":{"dashboardConfig":{"disableKueue":false}}}'
  2. Invite the Namespace: Kueue only manages projects that are explicitly "invited." Label your target user namespace (e.g., gpu-users).

    oc new-project gpu-users
    oc label namespace gpu-users kueue.openshift.io/managed=true
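You can verify the namespace is opted in before moving on:

```shell
# Confirm the namespace carries the Kueue opt-in label
oc get namespace gpu-users -o jsonpath='{.metadata.labels.kueue\.openshift\.io/managed}'
# Expected output: true
```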

Step 2: Define the Physical Flavor (ResourceFlavor)

Kueue needs to know what "Standard GPU" actually means in terms of physical labels.

  1. Create the ResourceFlavor: This object tells Kueue: "When a quota references default-flavor, it means nodes carrying these labels" (here, nvidia.com/gpu.product).

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ResourceFlavor
    metadata:
      name: default-flavor
    spec:
      nodeLabels:
        nvidia.com/gpu.product: NVIDIA-L40S # <1> Optional: Pin to specific hardware
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
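Save the manifest (e.g. as resource-flavor.yaml, a filename chosen here for illustration) and apply it:

```shell
# Apply and confirm the flavor exists (ResourceFlavors are cluster-scoped)
oc apply -f resource-flavor.yaml
oc get resourceflavor default-flavor
```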

Step 3: Set the Quota (ClusterQueue)

Now we define the limits. We will create a global queue that allows the cluster to consume the virtual slices we created.

  1. Create the ClusterQueue:

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ClusterQueue
    metadata:
      name: cluster-queue-gpu
    spec:
      namespaceSelector: {} # <1> Open to all namespaces
      resourceGroups:
      - coveredResources: ["nvidia.com/gpu"]
        flavors:
        - name: default-flavor
          resources:
          - name: "nvidia.com/gpu"
            nominalQuota: 10 # <2> Guaranteed slots
            # borrowingLimit applies only when this queue joins a cohort (spec.cohort);
            # Kueue rejects it on a standalone queue, so it is omitted here
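Apply the queue (e.g. from cluster-queue.yaml, a filename chosen for illustration) and confirm the quota was registered:

```shell
# Apply, then read back the guaranteed GPU quota from the queue spec
oc apply -f cluster-queue.yaml
oc get clusterqueue cluster-queue-gpu \
  -o jsonpath='{.spec.resourceGroups[0].flavors[0].resources[0].nominalQuota}'
```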

Step 4: Build the Bridge (LocalQueue)

Users cannot see the global ClusterQueue. We must create a "Local Queue" inside their project that connects to the main supply.

  1. Create the LocalQueue:

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: LocalQueue
    metadata:
      namespace: gpu-users
      name: local-queue-gpu
    spec:
      clusterQueue: cluster-queue-gpu
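After applying (e.g. from local-queue.yaml, a filename chosen for illustration), the LocalQueue's default printer output shows which ClusterQueue it points to and how many workloads are pending/admitted:

```shell
# Verify the bridge: the LocalQueue should list cluster-queue-gpu as its target
oc apply -f local-queue.yaml
oc get localqueue local-queue-gpu -n gpu-users
```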

Step 5: Publish the Menu (Hardware Profile)

Finally, we create the UI button.

Critical Architecture Note

Because we are using Kueue, this Hardware Profile must not contain node selectors or tolerations. It only surfaces the resource request in the UI; placement is delegated entirely to Kueue.

  1. Create the Hardware Profile:

    apiVersion: dashboard.opendatahub.io/v1alpha1
    kind: HardwareProfile
    metadata:
      name: profile-l40s-slice
      namespace: redhat-ods-applications
    spec:
      displayName: "NVIDIA L40S (Time-Sliced)"
      description: "Shared GPU segment. Good for inference and light training."
      identifiers:
        - displayName: GPU
          identifier: nvidia.com/gpu
          minCount: 1
          defaultCount: 1
      # No node selectors or tolerations here. Kueue handles placement.
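Apply the profile (e.g. from hardware-profile.yaml, a filename chosen for illustration) and confirm the dashboard can see it:

```shell
# Hardware Profiles live in the dashboard's namespace
oc apply -f hardware-profile.yaml
oc get hardwareprofiles -n redhat-ods-applications
```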

Step 6: The "Payoff" (Verification)

  1. Log in to the RHOAI Dashboard.

  2. Go to Data Science Projects → gpu-users.

  3. Create a Workbench.

  4. Select the "NVIDIA L40S (Time-Sliced)" accelerator.

  5. Success: The Pod launches.

    • Under the hood: Kueue intercepted the request, checked the quota in cluster-queue-gpu, assigned default-flavor, and the Scheduler placed it on one of your 4 virtual slices.
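You can watch this interception from the command line. Kueue wraps each queued pod in a Workload object; a sketch, assuming the workbench pod was created in gpu-users:

```shell
# Each queued pod gets a Workload object; the output shows its queue and admission status
oc get workloads -n gpu-users
```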


Mission Accomplished. You have successfully deployed an end-to-end GPU-as-a-Service architecture.