Create summaries of a large corpus of documents using Generative AI.
This solution demonstrates document summarization end to end: starting from raw documents, it detects text with Cloud Vision Optical Character Recognition (OCR), summarizes the documents on demand with the Vertex AI LLM APIs, and stores the summaries in BigQuery.
To deploy this blueprint you must have an active billing account and billing permissions.
- The developer follows a tutorial on a Jupyter Notebook, where they upload a PDF — either through Vertex AI Workbench or Colaboratory.
- The uploaded PDF file is sent to a function running on Cloud Functions. This function handles PDF file processing.
- The Cloud Functions function uses Cloud Vision to extract all text from the PDF file.
- The Cloud Functions function stores the extracted text inside a Cloud Storage bucket.
- The Cloud Functions function uses Vertex AI’s LLM API to summarize the extracted text.
- The Cloud Functions function stores the text summaries of PDFs in BigQuery tables.
- As an alternative to uploading PDF files through a Jupyter Notebook, the developer can upload a PDF file directly to a Cloud Storage bucket, for instance through the Console UI or gcloud.
- The direct upload to Cloud Storage causes Eventarc to trigger the Document Processing phase, which is handled by Cloud Functions.
Configuration: 1 min. Deployment: 10 mins.
Name | Description | Type | Default | Required |
---|---|---|---|---|
bucket_name | The name of the bucket to create | string | "genai-webhook" | no |
gcf_timeout_seconds | GCF execution timeout | number | 900 | no |
project_id | The Google Cloud project ID to deploy to | string | n/a | yes |
region | Google Cloud region | string | "us-central1" | no |
time_to_enable_apis | Wait time to enable APIs in new projects | string | "180s" | no |
webhook_name | Name of the webhook | string | "webhook" | no |
webhook_path | Path to the webhook directory | string | "webhook" | no |
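As a minimal sketch, the inputs above might be set like this when calling the module. The module label `genai_doc_summary`, the `source` path, and the project ID are placeholders chosen for illustration, not values taken from this repository. The optional arguments restate the documented defaults and can be omitted.

```hcl
module "genai_doc_summary" {
  # Hypothetical path: replace with the location this module is actually sourced from.
  source = "./path/to/this/module"

  # Required input: it has no default.
  project_id = "my-project-id" # placeholder project ID

  # Optional inputs, shown with the defaults listed in the table above.
  region              = "us-central1"
  bucket_name         = "genai-webhook"
  gcf_timeout_seconds = 900
  time_to_enable_apis = "180s"
  webhook_name        = "webhook"
  webhook_path        = "webhook"
}
```

Only `project_id` is strictly needed; override the other inputs when the defaults do not suit the target project.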
Name | Description |
---|---|
genai_doc_summary_colab_url | The URL to launch the notebook tutorial for the Generative AI Document Summarization solution |
neos_walkthrough_url | The URL to launch the in-console tutorial for the Generative AI Document Summarization solution |
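One way to surface these outputs from a root configuration is to re-export them, as sketched below. This assumes the module has been instantiated under the hypothetical label `genai_doc_summary`; the output names `notebook_tutorial_url` and `console_tutorial_url` are likewise illustrative.

```hcl
# Assumes a module block labeled "genai_doc_summary" (hypothetical label)
# exists elsewhere in the same configuration.
output "notebook_tutorial_url" {
  description = "URL that launches the notebook tutorial"
  value       = module.genai_doc_summary.genai_doc_summary_colab_url
}

output "console_tutorial_url" {
  description = "URL that launches the in-console tutorial"
  value       = module.genai_doc_summary.neos_walkthrough_url
}
```

After `terraform apply`, running `terraform output` prints both URLs.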
These sections describe requirements for using this module.
The following dependencies must be available:
- Terraform v0.13
- Terraform Provider for GCP plugin v3.0
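These dependencies can be pinned in the calling configuration. The sketch below assumes the versions listed above are minimums rather than exact pins, which this document does not state; adjust the constraints if the module's own version file says otherwise.

```hcl
terraform {
  # Assumption: v0.13 is treated as the minimum supported Terraform version.
  required_version = ">= 0.13"

  required_providers {
    google = {
      source = "hashicorp/google"
      # Assumption: v3.0 is treated as the minimum supported provider version.
      version = ">= 3.0"
    }
  }
}
```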
A service account with the following roles must be used to provision the resources of this module:
- Storage Admin: roles/storage.admin
A project with the following APIs enabled must be used to host the resources of this module:
- Google Cloud Storage JSON API: storage-api.googleapis.com
Refer to the contribution guidelines for information on contributing to this module.
Please see our security disclosure process.