Creating OpenShift AI Resources - 1

Model Serving Runtimes

A model-serving runtime provides integration with a specified model server and the model frameworks that it supports. By default, Red Hat OpenShift AI includes the following model serving runtimes:

  • Multi-model

    • OpenVINO Model Server

  • Single-model

    • OpenVINO Model Server

    • Caikit Standalone ServingRuntime for KServe

    • Caikit TGIS ServingRuntime for KServe

    • TGIS Standalone ServingRuntime for KServe

    • vLLM ServingRuntime for KServe
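
If you want to inspect how these preinstalled runtimes are defined, they are typically stored as OpenShift templates in the RHOAI applications namespace. The commands below are a minimal sketch, assuming the default redhat-ods-applications namespace of a standard RHOAI installation and a suitably privileged oc session:

    # List the template objects that back the preinstalled serving runtimes
    oc get templates -n redhat-ods-applications

    # Inspect one of them to see the ServingRuntime definition it wraps
    # (replace <template-name> with a name from the previous command)
    oc get template <template-name> -n redhat-ods-applications -o yaml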

However, if these runtimes do not meet your needs (if they don’t support a particular model framework, for example), you might want to add your own custom runtimes.

As an administrator, you can use the OpenShift AI interface to add and enable custom model-serving runtimes. You can then choose from your enabled runtimes when you create a new model server.

This exercise guides you through the steps required to deploy a custom serving runtime so that you can serve a model with the Ollama model serving framework.

While RHOAI supports adding your own runtime, you are responsible for configuring, adjusting, and maintaining any custom runtimes.
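
For example, before adding the runtime to RHOAI you might sanity-check the container image on a local machine. The snippet below is a minimal sketch, assuming the image's entrypoint starts the Ollama server and that podman is available; the local port mapping and test call are illustrative only:

    # Run the Ollama runtime image locally (assumes its entrypoint starts the Ollama server)
    podman run --rm -p 11434:11434 -e OLLAMA_HOST=0.0.0.0 \
      quay.io/rh-aiservices-bu/ollama-ubi9:0.1.45

    # In a second terminal, confirm the Ollama API answers on its default port
    curl http://localhost:11434/api/tags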

Add The Ollama Custom Runtime

Figure 1. Animated - Add Ollama serving runtime
  1. Log in to RHOAI as a user who is a member of the RHOAI admin group. For this lab, we will use the admin account.

  2. In the RHOAI console, navigate to the Settings menu, then select Serving Runtimes.

  3. Click the Add Serving Runtime button.

  4. For the model serving platform this runtime supports, select Single-Model Serving Platform.

  5. For the API protocol this runtime supports, select REST.

  6. Click Start from scratch and, in the editor that opens, paste the following YAML:

    apiVersion: serving.kserve.io/v1alpha1
    kind: ServingRuntime
    metadata:
      annotations:
        openshift.io/display-name: Ollama
      labels:
        opendatahub.io/dashboard: "true"
      name: ollama
    spec:
      builtInAdapter:
        # Allow up to 90 seconds for a model to load before timing out
        modelLoadingTimeoutMillis: 90000
      containers:
        - image: quay.io/rh-aiservices-bu/ollama-ubi9:0.1.45
          env:
            # Directory inside the container where Ollama stores model files
            - name: OLLAMA_MODELS
              value: /.ollama/models
            # Listen on all interfaces so KServe can route traffic to the container
            - name: OLLAMA_HOST
              value: 0.0.0.0
            # A negative keep-alive keeps loaded models in memory indefinitely
            - name: OLLAMA_KEEP_ALIVE
              value: '-1m'
          name: kserve-container
          ports:
            # Ollama's default API port
            - containerPort: 11434
              name: http1
              protocol: TCP
      # Single-model serving: one model per model server
      multiModel: false
      supportedModelFormats:
        - autoSelect: true
          name: any
  7. Click the Create button at the bottom of the editor. The new Ollama runtime now appears in the list of serving runtimes. You can reorder the list as needed; the order set here is the order in which users see the runtimes when they create a model server.
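
If you want to confirm the result from the command line, the dashboard typically stores custom runtimes as OpenShift templates next to the preinstalled ones. A minimal sketch, assuming the default redhat-ods-applications namespace (the template name itself is auto-generated, so search by display name):

    # Find the template whose embedded ServingRuntime carries the Ollama display name
    oc get templates -n redhat-ods-applications -o yaml | grep -i 'display-name: ollama'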


The next step is to create a Data Connection in our Data Science Project. Before we can create the Data Connection, we will set up MinIO as the S3-compatible storage for this lab.

Continue to the next section to deploy and configure MinIO.