This repository shows how to deploy a Burr `Application` with BentoML.
Burr and BentoML help you build the application and serving layers of your system.
Burr lets you build applications that are easy to understand and debug, with a clear path to production. It supports synchronous, asynchronous, and streaming actions, and persistence & durability, hooks, and telemetry are built in.
BentoML is a specialized tool to package, deploy, and manage AI services. It helps you get the most performance out of your system by letting you specify resource requirements (CPU, GPU, RAM, concurrency, workers, etc.), autoscaling, and adaptive request batching. It also automatically generates synchronous and asynchronous clients for your service.
web_page_qna/
is an introductory example that deploys a Burr `Application`
with BentoML, using LLMs to answer questions about a web page.
Join the BentoML developer community on Slack for more support and discussions!
Join the Burr Discord server for help, questions, and feature requests.