Generative AI and Data Science on AWS

Description

This workshop shows AWS users how to build, train, and deploy generative AI models with Amazon SageMaker and related services. The labs cover data science topics such as data processing at scale, model fine-tuning, real-time model deployment, and MLOps practices, all through a generative AI lens.

Distributed data processing

In this workshop, the data processing labs use the Amazon Customer Reviews Dataset, a corpus of ~150 million customer reviews. Its size makes it useful for showcasing SageMaker's distributed processing capabilities, and the same approach extends to other large datasets.
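The idea behind distributed processing can be sketched in plain Python: split the input files across workers, let each worker compute a partial result, then merge the partials. This is a conceptual illustration only, not SageMaker's implementation; the function names, the file keys, and the `star_rating` field are assumptions made for the example. In SageMaker Processing, a comparable split of S3 objects across instances is requested with `s3_data_distribution_type="ShardedByS3Key"` on a `ProcessingInput`.

```python
from collections import Counter


def shard_keys(keys, num_workers):
    """Assign each input file to exactly one worker (round-robin over sorted keys)."""
    shards = [[] for _ in range(num_workers)]
    for i, key in enumerate(sorted(keys)):
        shards[i % num_workers].append(key)
    return shards


def count_ratings(rows):
    """Work done independently on each worker: count reviews per star rating."""
    return Counter(row["star_rating"] for row in rows)


def merge_counts(partials):
    """Combine the per-worker partial counts into one global result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical file keys and review rows, standing in for the real dataset.
files = {
    "reviews/part-0.tsv": [{"star_rating": 5}, {"star_rating": 4}],
    "reviews/part-1.tsv": [{"star_rating": 5}],
    "reviews/part-2.tsv": [{"star_rating": 1}, {"star_rating": 5}],
}

shards = shard_keys(files.keys(), num_workers=2)
partials = [
    count_ratings(row for key in shard for row in files[key])
    for shard in shards
]
totals = merge_counts(partials)
```

Because each file is assigned to exactly one worker, the merged totals match what a single machine would compute, while the per-file work can run in parallel.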

Fine-tuning FLAN-T5 for summarizing conversation dialog

After the data processing sections, we will build our FLAN-T5-based NLP model using the DialogSum dataset from Hugging Face, which contains ~15k example dialogues with associated summaries.
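Before fine-tuning, each dialogue/summary pair is typically turned into an instruction-style input and a target text. The following is a minimal sketch of that step, assuming records with `dialogue` and `summary` fields; the prompt wording, the constant names, and the `build_example` helper are illustrative assumptions, not necessarily what the workshop notebooks use.

```python
# Assumed instruction template; the exact wording is a choice, not a requirement.
PROMPT_PREFIX = "Summarize the following conversation.\n\n"
PROMPT_SUFFIX = "\n\nSummary: "


def build_example(record):
    """Turn one dialogue/summary record into an (input, target) text pair."""
    return {
        "input_text": PROMPT_PREFIX + record["dialogue"].strip() + PROMPT_SUFFIX,
        "target_text": record["summary"].strip(),
    }


# Hypothetical record shaped like a dialogue-summarization example.
record = {
    "dialogue": "#Person1#: Is the report ready?\n#Person2#: I will send it by noon.",
    "summary": "#Person2# will send the report by noon.",
}

example = build_example(record)
```

The resulting `input_text` and `target_text` strings would then be tokenized and passed to the sequence-to-sequence model as inputs and labels.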

O'Reilly Books: Generative AI on AWS and Data Science on AWS

This workshop is based on the O'Reilly books "Generative AI on AWS" and "Data Science on AWS" by Chris Fregly, Antje Barth, and Shelbee Eigenbrode of AWS.

Table of Contents

Setup

PART 1: Distributed data processing

PART 2: Fine-tuning FLAN-T5 for summarizing conversation dialog (SageMaker Studio Notebook)

PART 3: Fine-tuning FLAN-T5 for summarizing conversation dialog (SageMaker Cluster)

PART 4: Automating fine-tuning workflows with SageMaker Pipelines

PART 5: Advanced fine-tuning with PEFT and RLHF

O'Reilly Books: Generative AI on AWS and Data Science on AWS

Generative AI on AWS

Related Links

Data Science on AWS

Related Links

Security

License
