Lab: Configuring the Virtualization Layer (Supply)

Raw hardware is inefficient: a single small model rarely saturates a datacenter GPU. We must virtualize the asset to maximize throughput.

In this lab, you assume the role of the Infrastructure Engineer. Your objective is to take a physical NVIDIA L40S (48GB VRAM) and reconfigure it to support multiple concurrent inference workloads.

We will implement a Time-Slicing Strategy. By instructing the device plugin to advertise extra replicas, which the GPU's scheduler then interleaves, we will transform 1 physical GPU into 4 logical accelerators, each capable of running a 7B parameter model. Note that time-slicing shares compute, not memory: VRAM is not partitioned, so all four workloads must fit together within the 48GB.

Prerequisites

  • Access: OpenShift CLI (oc) with cluster-admin.

  • Target: A node with an NVIDIA GPU (L40S, A100, or T4).

  • Operator: NVIDIA GPU Operator installed.
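Before starting, it is worth confirming all three prerequisites from the shell. A quick sanity check, assuming the Operator was installed into its default namespace (the same nvidia-gpu-operator namespace used in Step 1):

```shell
# Confirm login identity and cluster-admin rights
oc whoami
oc auth can-i '*' '*' --all-namespaces   # expect "yes"

# Confirm the GPU Operator pods are up
oc get pods -n nvidia-gpu-operator
```

If the last command returns no pods, check which namespace the Operator was installed into before proceeding.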

Step 1: Define the Slicing Policy (The Blueprint)

The GPU Operator does not guess your density requirements. You must define them in a ConfigMap. We will define a policy named high-density-slice that splits the GPU into 4 replicas.

  1. Create the Configuration:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: time-slicing-config
      namespace: nvidia-gpu-operator
    data:
      high-density-slice: |-  # <1> The Policy Name
        version: v1
        sharing:
          timeSlicing:
            resources:
            - name: nvidia.com/gpu
              replicas: 4         # <2> The Multiplier (1 GPU = 4 Virtual)
  2. Apply the Blueprint:

    oc apply -f time-slicing-config.yaml
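Before activating the policy, you can confirm the blueprint was stored correctly and that the policy name is spelled exactly as you expect:

```shell
# Inspect the stored ConfigMap; the data key should be "high-density-slice"
oc get configmap time-slicing-config -n nvidia-gpu-operator -o yaml
```

A typo in the policy name here will silently prevent Step 2 from taking effect, so this check is cheap insurance.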

Step 2: Activate the Engine (ClusterPolicy)

The blueprint exists, but the factory isn’t using it yet. We must update the global ClusterPolicy to enable the devicePlugin configuration.

  1. Edit the ClusterPolicy:

    oc edit clusterpolicy gpu-cluster-policy
  2. Inject the Configuration: Locate the devicePlugin section and update it to reference your new map.

    spec:
      devicePlugin:
        config:
          name: time-slicing-config  # <1> Must match ConfigMap name
          default: high-density-slice # <2> The default policy from Step 1
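If you prefer a non-interactive change (for example, in a pipeline), the same update can be applied as a merge patch instead of an interactive edit. A sketch, assuming the ClusterPolicy is named gpu-cluster-policy as above:

```shell
# Merge-patch the devicePlugin section of the (cluster-scoped) ClusterPolicy
oc patch clusterpolicy gpu-cluster-policy --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "high-density-slice"}}}}'
```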

Step 3: Target the Machinery (Labeling)

In a real factory, not every machine runs the same process. We use Node Labels to tell the Operator which specific nodes should be sliced.

  1. Identify your L40S Node:

    oc get nodes -l nvidia.com/gpu.product=NVIDIA-L40S
  2. Apply the Instruction Label: This label triggers the Operator to roll out the new Device Plugin configuration on this specific host.

    oc label node <node-name> nvidia.com/device-plugin.config=high-density-slice
The Reconfiguration Event

The GPU Operator detects this label change and restarts the nvidia-device-plugin pod on that node. This usually takes 15-30 seconds. Existing GPU workloads on the node may be disrupted, so always perform this during a maintenance window.
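You can watch the reconfiguration event as it happens. A sketch, assuming the device plugin pods carry the app=nvidia-device-plugin-daemonset label that the GPU Operator typically applies (verify the label for your Operator version):

```shell
# Watch the device plugin pod cycle on the labeled node (Ctrl-C to exit)
oc get pods -n nvidia-gpu-operator \
  -l app=nvidia-device-plugin-daemonset \
  --field-selector spec.nodeName=<node-name> -w
```

Once the new pod reports Running, the node is ready for the verification in Step 4.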

Step 4: Quality Assurance (Verification)

We must verify that the "Inventory System" (the node's advertised capacity) sees the new slots. The Device Plugin reports the replica count to the kubelet, which publishes it as node capacity.

  1. Inspect the Node Capacity:

    oc describe node <node-name> | grep "nvidia.com/gpu"
  2. Validate the Output: You should see that the capacity has quadrupled.

    Capacity:
      nvidia.com/gpu: 4  (1)
    Allocatable:
      nvidia.com/gpu: 4
    (1) Success: The node now advertises 4 available slots. Kubernetes will allow 4 separate pods to request nvidia.com/gpu: 1 on this single node.
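To prove the quadrupled capacity is actually schedulable, you can deploy four pods that each claim one virtual GPU. A minimal sketch; the image tag is an assumption, so substitute any CUDA-capable image available to your cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slice-test
spec:
  replicas: 4                      # One pod per virtual GPU
  selector:
    matchLabels:
      app: slice-test
  template:
    metadata:
      labels:
        app: slice-test
    spec:
      containers:
      - name: cuda
        image: nvcr.io/nvidia/cuda:12.4.1-base-ubi8  # assumed tag
        command: ["sleep", "infinity"]
        resources:
          limits:
            nvidia.com/gpu: 1      # Each pod claims one slice
```

All four pods should land on the single L40S node; `oc get pods -o wide` confirms the co-location. Remember that these pods share the physical 48GB of VRAM, so real workloads must be sized accordingly.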

Technical Summary

You have successfully decoupled the logical capacity from the physical hardware.

  • Input: 1x NVIDIA L40S.

  • Process: Applied Time-Slicing Policy.

  • Output: 4x Virtual GPUs ready for assignment.


The Supply is ready. Proceed to the next section to implement the Governance Layer (Kueue).