Red Hat OpenShift AI

Automation with data science pipelines

Data science pipelines can be a game-changer for AI model development. By breaking complex tasks into smaller, manageable steps, we can optimize each part of the process and ensure that our models are trained and validated consistently. Pipelines also help us maintain reproducible results by versioning inputs and outputs, allowing us to track changes and identify potential issues.

This course is tailored for infrastructure solution architects and engineers who are tasked with deploying and managing data science pipelines on the OpenShift AI platform. By the end of this course, learners will have a solid understanding of how to deploy resources and support data scientists who use Red Hat OpenShift AI (RHOAI) to design, build, and maintain efficient and effective data science pipelines.

Let’s explore how pipelines can help us optimize training tasks, manage caching of intermediate steps, and create more maintainable and reusable workflows.
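
Data science pipelines in OpenShift AI are built on Kubeflow Pipelines, so these ideas can be sketched with the kfp SDK. The following is a minimal illustration, not code from this course: the component names, base images, and caching choices are assumptions made for the example.

    # A pipeline split into small steps, with per-step caching control.
    from kfp import dsl, compiler

    @dsl.component(base_image="python:3.11")  # hypothetical base image
    def prepare_data(dataset_url: str) -> str:
        # Download and clean the raw data; return a reference to the result.
        return dataset_url  # placeholder for real preparation logic

    @dsl.component(base_image="python:3.11")
    def train_model(dataset: str, epochs: int) -> str:
        # Train a model on the prepared data; return a model reference.
        return f"model-from-{dataset}"  # placeholder for real training logic

    @dsl.pipeline(name="example-training-pipeline")
    def training_pipeline(dataset_url: str = "s3://example/data.csv", epochs: int = 10):
        prep = prepare_data(dataset_url=dataset_url)
        prep.set_caching_options(True)    # reuse results when inputs are unchanged
        train = train_model(dataset=prep.output, epochs=epochs)
        train.set_caching_options(False)  # always re-run the training step

    if __name__ == "__main__":
        # Compile to a YAML package that a pipeline server can import.
        compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")

Each decorated function runs as its own step in its own container, which is what makes the individual parts independently optimizable and cacheable.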

Prerequisites

  • Basic knowledge of OpenShift administration

  • Understanding of user and role administration

  • Working knowledge of OpenShift AI components

  • Basic experience with Python code snippets and Jupyter notebooks

Objectives

The overall objectives of this course include:

  • Understand how pipelines can make model development more efficient.

  • Define the terms and components in RHOAI used to organize and view pipelines.

  • Configure a pipeline server in a RHOAI data science project.

  • Build and submit a pipeline using the Elyra extension for JupyterLab.

  • Import a pipeline definition into a data science pipeline server.

  • Create a pipeline run and schedule recurring pipeline runs (see the sketch after this list).

  • Use the data science pipelines dashboard to review the status of pipeline experiments, executions, and artifacts.
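
As a preview of the run-related objectives, the following sketch uses the kfp SDK client to create a one-off run and schedule a recurring run. It is illustrative only: the route URL, token, file names, and experiment and job names are placeholders, and the exact way you connect to a RHOAI pipeline server depends on your cluster.

    # Create a one-off run, then schedule a recurring run of the same pipeline.
    import kfp

    client = kfp.Client(
        host="https://ds-pipeline-example.apps.cluster.example.com",  # placeholder route
        existing_token="sha256~example-token",                        # placeholder token
    )

    # One-off run from a compiled pipeline package.
    run = client.create_run_from_pipeline_package(
        "training_pipeline.yaml",
        arguments={"dataset_url": "s3://example/data.csv", "epochs": 10},
        run_name="manual-training-run",
    )
    print(f"Started run: {run.run_id}")

    # Recurring run: execute the pipeline every day at midnight.
    experiment = client.create_experiment(name="nightly-training")
    client.create_recurring_run(
        experiment_id=experiment.experiment_id,
        job_name="nightly-training-job",
        pipeline_package_path="training_pipeline.yaml",
        cron_expression="0 0 0 * * *",  # KFP cron expressions include a seconds field
    )

The dashboard then groups these runs under their experiment, which is where the status and artifact views described in the objectives come in.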