RHEL AI on AWS

Objectives

  • Explain how to deploy RHEL AI on AWS.

  • Optional: Hands-on activity: Deploy RHEL AI on AWS.

  • Analyze the RHEL AI server requirements for training versus inference.

RHEL AI

Red Hat Enterprise Linux AI (RHEL AI) is a platform that allows you to develop enterprise applications on open source Large Language Models (LLMs). RHEL AI is built from the Red Hat InstructLab open source project. For more detailed information, see the "InstructLab and RHEL AI" documentation section.

Red Hat Enterprise Linux AI allows you to do the following:

  • Serve an LLM and interact with the open source Granite family of Large Language Models (LLMs) locally or via an exposed endpoint (see the example below).

  • Fine-tune a large (or small) language model with your own data, even with minimal machine learning background.

  • Interact via chat with the model that has been fine-tuned with your data.

  • Store data within a taxonomy format to organize knowledge and skills.

  • Generate synthetic data to use for AI model alignment, fine-tuning the model with new knowledge or skills.

Red Hat Enterprise Linux AI empowers you to contribute directly to LLMs. This allows you to easily and efficiently build AI-based applications, including chatbots.
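
As an illustration of the serving and chat capabilities listed above, the sketch below sends a prompt to a locally served model over its OpenAI-compatible API. This is a minimal, hedged example rather than part of the official workflow: the endpoint URL, port, model name, and lack of authentication are assumptions that depend on how ilab model serve was configured on your system.

```python
# Minimal sketch: query a locally served RHEL AI model through its
# OpenAI-compatible endpoint. The URL, port, and model name below are
# assumptions; adjust them to match your "ilab model serve" configuration.
import requests

ENDPOINT = "http://127.0.0.1:8000/v1/chat/completions"  # assumed local endpoint
MODEL = "granite-7b-lab"                                 # assumed model name

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Summarize what RHEL AI provides."}
    ],
    "temperature": 0.2,
    "max_tokens": 256,
}

response = requests.post(ENDPOINT, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```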

Introduction to RHEL AI Course

If you are new to RHEL AI, this ~15-minute intermediate course provides a detailed overview of the features, terms, and high-level concepts of RHEL AI. We will not review these basics in this course.

Introduction to RHEL AI was recently launched for technical sellers who want a high-level technical look at RHEL AI and InstructLab.


RHEL AI on AWS

In this course, we will delve into the essential knowledge and information required to make well-informed decisions when deploying Red Hat Enterprise Linux (RHEL) AI on Amazon Web Services (AWS). At present, there is no managed service offering of RHEL AI available on AWS; however, clients can opt for self-managed deployment. This curriculum will provide guidance on suitable AWS instance selections to optimize resource utilization when implementing RHEL AI.

By the end of this course, you will have a comprehensive understanding of best practices and recommendations for deploying RHEL AI on AWS, ensuring efficient use of resources and optimal performance.

Deploying RHEL AI on AWS

For those eager to gain practical experience in deploying Red Hat Enterprise Linux (RHEL) AI on Amazon Web Services (AWS), this lab provides a step-by-step guide to creating and launching a RHEL AI Amazon Machine Image (AMI).

The overall objectives of the Deploying RHEL AI on AWS lab include:

  • Create and configure a custom RHEL AI AMI.

  • Upload the RHEL AI image to an AWS S3 bucket for storage and retrieval.

  • Provision a fully functional, resource-limited RHEL AI server on AWS.

In this lab, we offer a cost-effective approach by using a basic GPU configuration to install Red Hat Enterprise Linux (RHEL) AI. While this setup is sufficient for getting started with RHEL AI, it may limit performance compared to more powerful configurations. This exercise allows you to explore the basics of deploying and managing RHEL AI on AWS while keeping costs minimal.

By the end of this lab, you will have gained valuable experience in setting up a foundational RHEL AI environment on Amazon Web Services (AWS) with a cost-conscious GPU configuration. This knowledge can serve as a solid foundation for further enhancing your skills and optimizing performance when deploying RHEL AI at scale.

Visit the RHEL AI documentation for full instructions on installing RHEL AI on AWS: Installing RHEL AI on AWS.
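
To make the lab objectives above concrete, the following sketch outlines the general AWS flow of uploading a RHEL AI disk image to S3, importing it as an EBS snapshot, and registering an AMI. It is illustrative only: bucket names, file paths, region, volume size, and device names are placeholders, and it assumes the vmimport service role required by EC2 snapshot import already exists. Follow the documentation linked above for the authoritative procedure.

```python
# Illustrative sketch (not the official procedure): upload a RHEL AI raw
# disk image to S3, import it as an EBS snapshot, and register an AMI.
# Bucket, key, path, and region are placeholders; the "vmimport" IAM role
# required by EC2 snapshot import is assumed to already exist.
import time
import boto3

REGION = "us-east-1"                       # placeholder region
BUCKET = "my-rhelai-images"                # placeholder bucket name
KEY = "rhel-ai-disk.raw"                   # placeholder object key
LOCAL_IMAGE = "/tmp/rhel-ai-disk.raw"      # placeholder local path

s3 = boto3.client("s3", region_name=REGION)
ec2 = boto3.client("ec2", region_name=REGION)

# 1. Upload the raw disk image to S3.
s3.upload_file(LOCAL_IMAGE, BUCKET, KEY)

# 2. Import the image from S3 as an EBS snapshot.
task = ec2.import_snapshot(
    Description="RHEL AI image import",
    DiskContainer={"Format": "raw",
                   "UserBucket": {"S3Bucket": BUCKET, "S3Key": KEY}},
)
task_id = task["ImportTaskId"]

# 3. Poll until the snapshot import completes.
while True:
    detail = ec2.describe_import_snapshot_tasks(ImportTaskIds=[task_id])
    status = detail["ImportSnapshotTasks"][0]["SnapshotTaskDetail"]
    if status.get("Status") == "completed":
        snapshot_id = status["SnapshotId"]
        break
    time.sleep(30)

# 4. Register an AMI backed by the imported snapshot.
ami = ec2.register_image(
    Name="rhel-ai-custom-ami",
    Architecture="x86_64",
    VirtualizationType="hvm",
    EnaSupport=True,
    RootDeviceName="/dev/sda1",
    BlockDeviceMappings=[
        {"DeviceName": "/dev/sda1",
         "Ebs": {"SnapshotId": snapshot_id, "VolumeSize": 1000,
                 "VolumeType": "gp3"}},
    ],
)
print("Registered AMI:", ami["ImageId"])
```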

RHEL AI Deployment on AWS

There are two approaches to utilizing RHEL AI on AWS:

  1. Using RHEL AI as a solution for AI model inferencing (serving), that is, making the model available via an endpoint for consumption by business applications.

  2. Using RHEL AI to customize, fine-tune, or align the model with specific business knowledge in order to provide more accurate responses to the connecting application.

A single RHEL AI server can perform both functions, but it is not recommended to augment (train) a model while also providing inferencing for an application on the same server, as running both services may exhaust accelerator (GPU) memory resources.

Running multiple instances of RHEL AI, one for training and another for inferencing, can be the most cost-effective option, as the inferencing requirements are significantly lower than the model training specifications.

As you will notice in the tables below, the requirements for the full end-to-end workflow are significantly higher than those for model inference.
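
As a rough illustration of the two-instance pattern described above, the sketch below launches one large training instance and one smaller inference instance from the same RHEL AI AMI. The AMI ID, key pair, security group, and the specific instance types and volume sizes are placeholders drawn from the sizing tables that follow; adjust them to your own workload and region.

```python
# Illustrative sketch: launch separate RHEL AI training and inference
# instances from the same AMI. All identifiers below are placeholders;
# instance types and disk sizes follow the sizing tables in this section.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")    # placeholder region

def launch(instance_type: str, volume_gb: int, name: str) -> str:
    """Launch a single RHEL AI instance and return its instance ID."""
    result = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",               # placeholder RHEL AI AMI
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        KeyName="my-keypair",                          # placeholder key pair
        SecurityGroupIds=["sg-0123456789abcdef0"],     # placeholder security group
        BlockDeviceMappings=[
            {"DeviceName": "/dev/sda1",
             "Ebs": {"VolumeSize": volume_gb, "VolumeType": "gp3"}},
        ],
        TagSpecifications=[
            {"ResourceType": "instance",
             "Tags": [{"Key": "Name", "Value": name}]},
        ],
    )
    return result["Instances"][0]["InstanceId"]

# Large multi-GPU instance for the end-to-end training workflow.
training_id = launch("p4d.24xlarge", 3000, "rhelai-training")

# Smaller GPU instance dedicated to inference serving.
inference_id = launch("g6e.2xlarge", 1000, "rhelai-inference")

print("Training instance:", training_id)
print("Inference instance:", inference_id)
```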

RHEL AI Instance Options

The following table shows the recommended hardware requirements for running the full InstructLab end-to-end workflow to customize the Granite student model. This includes synthetic data generation (SDG), training, and evaluation of a custom Granite model.

Amazon Web Services Instance Types

Hardware vendor | Supported Accelerators (GPUs) | GPU Memory | AWS Instances | Recommended disk storage
Nvidia          | 8xA100                        | 320 GB     | p4d.24xlarge  | 3 TB
Nvidia          | 8xA100                        | 640 GB     | p4de.24xlarge | 3 TB
Nvidia          | 8xH100                        | 640 GB     | p5.48xlarge   | 3 TB
Nvidia          | 8xL40S                        | 384 GB     | g6e.48xlarge  | 3 TB

The following table shows the minimum hardware requirements for inference serving a model on Red Hat Enterprise Linux AI.

Amazon Web Services Instance Types

Hardware vendor | Supported Accelerators (GPUs) | GPU Memory | AWS Instances | Recommended disk storage
Nvidia          | A100                          | 40 GB      | P4d series    | 1 TB
Nvidia          | H100                          | 80 GB      | P5 series     | 1 TB
Nvidia          | L40S                          | 48 GB      | G6e series    | 1 TB
Nvidia          | L4                            | 24 GB      | G6 series     | 1 TB


It is essential to note that the actual memory requirements may vary depending on factors such as model architecture, batch size, and other specific use cases. This guideline serves as a useful starting point for selecting instances with sufficient GPU memory to run AI models effectively on Amazon Web Services (AWS).
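
To illustrate why GPU memory needs vary, the short calculation below gives a rough rule-of-thumb estimate of serving memory for a hypothetical 8-billion-parameter model: about 2 bytes per parameter for FP16/BF16 weights, plus headroom for the KV cache and runtime overhead. The model size, precision, and overhead factor are illustrative assumptions, not official sizing guidance.

```python
# Rough, back-of-envelope GPU memory estimate for serving an LLM.
# All numbers are illustrative assumptions, not official sizing guidance.

def serving_memory_gb(params_billion: float,
                      bytes_per_param: float = 2.0,   # FP16/BF16 weights
                      overhead_factor: float = 1.3):  # KV cache + runtime headroom
    """Estimate GPU memory (GB) needed to serve a model of the given size."""
    weights_gb = params_billion * 1e9 * bytes_per_param / 1e9
    return weights_gb * overhead_factor

# Example: a hypothetical 8B-parameter model in FP16 needs roughly 16 GB for
# weights and ~21 GB with headroom, which fits the 24 GB L4 or the larger
# A100/H100 accelerators listed above.
print(f"{serving_memory_gb(8):.1f} GB")   # ~20.8 GB
```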


Summary

The topics covered in this course focus on understanding that we can deploy multiple configurations of RHEL AI servers on AWS, either for training AI models or for inference to power applications.

Training instances can be stopped or terminated once the model is aligned to specific business cases, so resource costs are incurred only when model updates or new model development tasks are needed.