Lab: Configuring the Virtualization Layer (Supply)
Raw hardware is inefficient. We must virtualize the asset to maximize throughput.
In this lab, you assume the role of the Infrastructure Engineer. Your objective is to take a physical NVIDIA L40S (48GB VRAM) and reconfigure it to support multiple concurrent inference workloads.
We will implement a Time-Slicing Strategy. By instructing the driver to interleave processes, we will transform 1 physical GPU into 4 logical accelerators, each capable of running a 7B parameter model.
Prerequisites
- Access: OpenShift CLI (`oc`) with `cluster-admin` privileges.
- Target: A node with an NVIDIA GPU (L40S, A100, or T4).
- Operator: NVIDIA GPU Operator installed.
Step 1: Define the Slicing Policy (The Blueprint)
The GPU Operator does not guess your density requirements. You must define them in a ConfigMap. We will define a policy named high-density-slice that splits the GPU into 4 replicas.
1. Create the Configuration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: nvidia-gpu-operator
data:
  high-density-slice: |- # <1> The Policy Name
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4 # <2> The Multiplier (1 GPU = 4 Virtual)
```

2. Apply the Blueprint:

```shell
oc apply -f time-slicing-config.yaml
```
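Before applying, it can be worth a quick local sanity check that the manifest carries the multiplier you intend. A self-contained sketch (it writes the Step 1 ConfigMap to disk, then greps out the replica count; the filename matches the `oc apply` command above):

```shell
# Write the Step 1 ConfigMap to a local file...
cat > time-slicing-config.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: nvidia-gpu-operator
data:
  high-density-slice: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
EOF

# ...then confirm the multiplier before handing it to the cluster.
awk '/replicas:/ {print $2}' time-slicing-config.yaml   # → 4
```

If the output is anything other than `4`, fix the manifest before running `oc apply`.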
Step 2: Activate the Engine (ClusterPolicy)
The blueprint exists, but the factory isn’t using it yet. We must update the global ClusterPolicy to enable the devicePlugin configuration.
1. Edit the ClusterPolicy:

```shell
oc edit clusterpolicy gpu-cluster-policy
```

2. Inject the Configuration: Locate the `devicePlugin` section and update it to reference your new map.

```yaml
spec:
  devicePlugin:
    config:
      name: time-slicing-config   # <1> Must match ConfigMap name
      default: high-density-slice # <2> The default policy from Step 1
```
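If you prefer a non-interactive, scriptable alternative to `oc edit`, the same change can be made with `oc patch`. This is a sketch; verify the field paths against your ClusterPolicy version before using it. Validating the JSON payload locally first catches quoting mistakes:

```shell
# The merge patch mirroring the devicePlugin.config stanza above.
PATCH='{"spec":{"devicePlugin":{"config":{"name":"time-slicing-config","default":"high-density-slice"}}}}'

# Validate the payload locally before sending it anywhere.
echo "$PATCH" | python3 -m json.tool > /dev/null && echo "patch OK"

# Then apply it (requires cluster-admin):
# oc patch clusterpolicy gpu-cluster-policy --type merge -p "$PATCH"
```

The interactive `oc edit` and the `oc patch` approach produce the same resulting spec; the patch form is easier to keep in version control.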
Step 3: Target the Machinery (Labeling)
In a real factory, not every machine runs the same process. We use Node Labels to tell the Operator which specific nodes should be sliced.
1. Identify your L40S Node:

```shell
oc get nodes -l nvidia.com/gpu.product=NVIDIA-L40S
```

2. Apply the Instruction Label: This label triggers the Operator to reconfigure the container runtime and Device Plugin on this specific host.

```shell
oc label node <node-name> nvidia.com/device-plugin.config=high-density-slice
```
The Reconfiguration Event: the GPU Operator will detect this label change and restart the device plugin pods on the node to apply the new sharing policy.
Step 4: Quality Assurance (Verification)
We must verify that the "Inventory System" (Node Feature Discovery) sees the new capacity.
1. Inspect the Node Capacity:

```shell
oc describe node <node-name> | grep "nvidia.com/gpu"
```

2. Validate the Output: You should see that the capacity has quadrupled.

```
Capacity:
  nvidia.com/gpu:  4
Allocatable:
  nvidia.com/gpu:  4
```

Success: The node now advertises 4 available slots. Kubernetes will allow 4 separate pods to request `nvidia.com/gpu: 1` on this single node.
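To exercise the new capacity end to end, you can schedule a small test pod that requests one slice. This is a minimal sketch: the pod name `slice-test` and the CUDA base image tag are illustrative assumptions, so substitute an image available in your environment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slice-test   # hypothetical name for this lab
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      # Illustrative image; any CUDA-capable image with nvidia-smi works.
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubi9
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # consumes one of the 4 advertised slices
```

Because the node advertises 4 slots, you can create 4 copies of this pod (with distinct names) and all of them should schedule onto the single L40S. Note that time-slicing shares the full 48GB of VRAM across all slices without memory isolation, so the combined footprint of the workloads must still fit.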