
Update README and the poem example. #117

Merged
merged 10 commits, Nov 15, 2024
53 changes: 44 additions & 9 deletions README.md
@@ -1,8 +1,8 @@
<p align="center">
<a href="https://bespokelabs.ai/" target="_blank">
<picture>
-<source media="(prefers-color-scheme: light)" width="80" srcset="./docs/Bespoke-Labs-Logomark-Red.png">
-<img alt="Bespoke Labs Logo" width="80" src="./docs/Bespoke-Labs-Logomark-Red-on-Black.png">
+<source media="(prefers-color-scheme: light)" width="80" srcset="https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red.png">
+<img alt="Bespoke Labs Logo" width="80" src="https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red-on-Black.png">
</picture>
</a>
</p>
@@ -37,22 +37,57 @@ pip install bespokelabs-curator

```python
from bespokelabs import curator
-import os
+from datasets import Dataset
+from pydantic import BaseModel, Field
+from typing import List

-os.environ['OPENAI_API_KEY'] = 'sk-...' # Set your OpenAI API key here
+# Create a dataset object for the topics you want to create the poems.
+topics = Dataset.from_dict({"topic": [
+    "Urban loneliness in a bustling city",
+    "Beauty of Bespoke Labs's Curator library"
+]})
+
+# Define a class to encapsulate a list of poems.
+class Poem(BaseModel):
+    poem: str = Field(description="A poem.")
+
+class Poems(BaseModel):
+    poems_list: List[Poem] = Field(description="A list of poems.")
+
+
+# We define a Prompter that generates poems which gets applied to the topics dataset.
poet = curator.Prompter(
-    prompt_func=lambda: "Write a poem about the beauty of computer science",
+    # `prompt_func` takes a row of the dataset as input.
+    # `row` is a dictionary with a single key 'topic' in this case.
+    prompt_func=lambda row: f"Write two poems about {row['topic']}.",
    model_name="gpt-4o-mini",
+    response_format=Poems,
+    # `row` is the input row, and `poems` is the `Poems` class which
+    # is parsed from the structured output from the LLM.
+    parse_func=lambda row, poems: [
+        {"topic": row["topic"], "poem": p.poem} for p in poems.poems_list
+    ],
)

-poem = poet()
-print(poem["response"][0])
+poem = poet(topics)
+print(poem.to_pandas())
+# Example output:
+#                                       topic                                               poem
+# 0       Urban loneliness in a bustling city  In the city's heart, where the sirens wail,\nA...
+# 1       Urban loneliness in a bustling city  City streets hum with a bittersweet song,\nHor...
+# 2  Beauty of Bespoke Labs's Curator library  In whispers of design and crafted grace,\nBesp...
+# 3  Beauty of Bespoke Labs's Curator library  In the hushed breath of parchment and ink,\nBe...
```
+Note that `topics` can be created with `curator.Prompter` as well,
+and we can scale this up to create tens of thousands of diverse poems.
+You can see a more detailed example in the [examples/poem.py](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples/poem.py) file,
+and other examples in the [examples](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples) directory.

-You can see more examples in the [examples](examples) directory.
+To run the examples, make sure to set your OpenAI API key in
+the environment variable `OPENAI_API_KEY` by running `export OPENAI_API_KEY=sk-...` in your terminal.

-To run the examples, make sure to set your OpenAI API key in the environment variable `OPENAI_API_KEY` by running `export OPENAI_API_KEY=sk-...` in your terminal.
+See the [docs](https://docs.bespokelabs.ai/) for more details as well as
+for troubleshooting information.
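The new quickstart leans on `parse_func` fanning one structured response out into multiple dataset rows. A minimal pure-Python sketch of that data flow, with no curator or API calls involved (`run_pipeline` and `fake_llm` are illustrative names, not part of curator's API):

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the LLM round trip: each input row produces one
# response, and parse_func expands that response back into multiple rows.
def run_pipeline(
    rows: List[Dict],
    prompt_func: Callable[[Dict], str],
    fake_llm: Callable[[str], List[str]],
    parse_func: Callable[[Dict, List[str]], List[Dict]],
) -> List[Dict]:
    output = []
    for row in rows:
        prompt = prompt_func(row)                 # row -> prompt string
        response = fake_llm(prompt)               # prompt -> canned "LLM" response
        output.extend(parse_func(row, response))  # response -> new rows
    return output

rows = [{"topic": "Urban loneliness in a bustling city"}]
result = run_pipeline(
    rows,
    prompt_func=lambda row: f"Write two poems about {row['topic']}.",
    fake_llm=lambda prompt: ["poem one", "poem two"],  # canned two-poem response
    parse_func=lambda row, poems: [{"topic": row["topic"], "poem": p} for p in poems],
)
print(len(result))  # 2 output rows from 1 input row
```

The real `Prompter` does the same mapping, but runs the rows in parallel against the model named in `model_name`.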

## Bespoke Curator Viewer

44 changes: 34 additions & 10 deletions bespoke-dataset-viewer/components/dataset-viewer/RunsTable.tsx
@@ -21,22 +21,46 @@ const COLUMNS: Column[] = [
]

const EXAMPLE_CODE = `from bespokelabs import curator
-import os
+from datasets import Dataset
+from pydantic import BaseModel, Field
+from typing import List

-# Set your OpenAI API key here
-os.environ['OPENAI_API_KEY'] = 'sk-...'
+# Create a dataset object for the topics you want to create the poems.
+topics = Dataset.from_dict({"topic": [
+    "Urban loneliness in a bustling city",
+    "Beauty of Bespoke Labs's Curator library"
+]})

-# Create a prompter instance
+# Define a class to encapsulate a list of poems.
+class Poem(BaseModel):
+    poem: str = Field(description="A poem.")
+
+class Poems(BaseModel):
+    poems_list: List[Poem] = Field(description="A list of poems.")
+
+
+# We define a Prompter that generates poems which gets applied to the topics dataset.
poet = curator.Prompter(
-    prompt_func=lambda: {
-        "user_prompt": "Write a poem about the beauty of computer science"
-    },
+    # prompt_func takes a row of the dataset as input.
+    # row is a dictionary with a single key 'topic' in this case.
+    prompt_func=lambda row: f"Write two poems about {row['topic']}.",
    model_name="gpt-4o-mini",
+    response_format=Poems,
+    # row is the input row, and poems is the Poems class which
+    # is parsed from the structured output from the LLM.
+    parse_func=lambda row, poems: [
+        {"topic": row["topic"], "poem": p.poem} for p in poems.poems_list
+    ],
)

-# Generate and print the poem
-poem = poet()
-print(poem.to_list()[0])`
+poem = poet(topics)
+print(poem.to_pandas())
+# Example output:
+#                                       topic                                               poem
+# 0       Urban loneliness in a bustling city  In the city's heart, where the sirens wail,\\nA...
+# 1       Urban loneliness in a bustling city  City streets hum with a bittersweet song,\\nHor...
+# 2  Beauty of Bespoke Labs's Curator library  In whispers of design and crafted grace,\\nBesp...
+# 3  Beauty of Bespoke Labs's Curator library  In the hushed breath of parchment and ink,\\nBe...`

export function RunsTable() {
const [runs, setRuns] = useState<Run[]>([])
6 changes: 3 additions & 3 deletions bespoke-dataset-viewer/components/ui/python-highlighter.tsx
@@ -1,9 +1,9 @@
+import { Button } from "@/components/ui/button";
+import { Check, Copy } from "lucide-react";
import Prism from 'prismjs';
import 'prismjs/components/prism-python';
import 'prismjs/themes/prism-tomorrow.css';
import React from 'react';
-import { Button } from "@/components/ui/button"
-import { Check, Copy } from "lucide-react"

interface PythonHighlighterProps {
code: string;
@@ -38,7 +38,7 @@ export const PythonHighlighter: React.FC<PythonHighlighterProps> = ({ code }) =>
)}
</Button>
</div>
-<pre className="text-sm bg-gray-900 p-4 m-0">
+<pre className="bg-gray-900 p-4 m-0" style={{ fontSize: '12px' }}>
<code className="language-python">
{code}
</code>
52 changes: 2 additions & 50 deletions examples/poem.py
@@ -1,54 +1,6 @@
"""Example of using the curator library to generate diverse poems.

We generate 10 diverse topics and then generate 2 poems for each topic.

You can do this in a loop, but that is inefficient and breaks when requests fail.
When you need to do this thousands of times (or more), you need a better abstraction.

curator.Prompter takes care of this heavy lifting.

# Key Components of Prompter

## prompt_func

Calls an LLM on each row of the input dataset in parallel.

1. Takes a dataset row as input
2. Returns the prompt for the LLM

## parse_func

Converts LLM output into structured data by adding it back to the dataset.

1. Takes two arguments:
- Input row
- LLM response (in response_format)
2. Returns new rows (in list of dictionaries)


# Data Flow Example
Input Dataset:
Row A
Row B
Processing by Prompter:
Row A → prompt_func(A) → Response R1 → parse_func(A, R1) → [C, D]
Row B → prompt_func(B) → Response R2 → parse_func(B, R2) → [E, F]

Output Dataset:
Row C
Row D
Row E
Row F

In this example:

- The two input rows (A and B) are processed in parallel to prompt the LLM
- Each generates a response (R1 and R2)
- The parse function converts each response into (multiple) new rows (C, D, E, F)
- The final dataset contains all generated rows

You can chain prompters together to iteratively build up a dataset.
"""
We generate 10 diverse topics and then generate 2 poems for each topic."""

from bespokelabs import curator
from datasets import Dataset
@@ -103,4 +55,4 @@ class Poems(BaseModel):
# 0 Dreams vs. reality In the realm where dreams take flight,\nWhere ...
# 1 Dreams vs. reality Reality stands with open eyes,\nA weighty thro...
# 2 Urban loneliness in a bustling city In the city's heart where shadows blend,\nAmon...
-# 3 Urban loneliness in a bustling city Among the crowds, I walk alone,\nA sea of face...
+# 3 Urban loneliness in a bustling city Among the crowds, I walk alone,\nA sea of face...
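examples/poem.py builds its dataset in two chained stages: one Prompter generates topics, and a second writes two poems per topic. A rough sketch of that chaining, with canned responses standing in for real LLM calls (all names here are illustrative, not curator's API):

```python
# Chaining sketch: stage 1 fabricates topic rows, stage 2 fans each topic
# out into two poem rows. No curator or network calls are involved.
def topic_stage(_row):
    # stage-1 "response": a list of topic rows derived from one seed row
    return [{"topic": t} for t in ["dreams", "cities"]]

def poem_stage(row):
    # stage-2 "response": two poem rows per topic row
    return [{"topic": row["topic"], "poem": f"{row['topic']} poem {i}"} for i in (1, 2)]

topics = topic_stage({})                                # 1 seed row -> 2 topic rows
poems = [p for row in topics for p in poem_stage(row)]  # 2 topic rows -> 4 poem rows
print(len(poems))  # 4
```

With real Prompters, each stage's output dataset feeds directly into the next stage's call, which is how the example scales from a handful of topics to thousands of poems.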