Commit

Merge pull request #117 from bespokelabsai/mahesh/update_doc
Update README and the poem example.
CharlieJCJ authored Nov 15, 2024
2 parents 4a7ae7f + 1ae2173 commit 3a88375
Showing 4 changed files with 83 additions and 72 deletions.
53 changes: 44 additions & 9 deletions README.md
@@ -1,8 +1,8 @@
<p align="center">
<a href="https://bespokelabs.ai/" target="_blank">
<picture>
<source media="(prefers-color-scheme: light)" width="80" srcset="./docs/Bespoke-Labs-Logomark-Red.png">
<img alt="Bespoke Labs Logo" width="80" src="./docs/Bespoke-Labs-Logomark-Red-on-Black.png">
<source media="(prefers-color-scheme: light)" width="80" srcset="https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red.png">
<img alt="Bespoke Labs Logo" width="80" src="https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red-on-Black.png">
</picture>
</a>
</p>
@@ -37,22 +37,57 @@ pip install bespokelabs-curator

```python
from bespokelabs import curator
import os
from datasets import Dataset
from pydantic import BaseModel, Field
from typing import List

os.environ['OPENAI_API_KEY'] = 'sk-...' # Set your OpenAI API key here
# Create a dataset object for the topics you want to create poems about.
topics = Dataset.from_dict({"topic": [
"Urban loneliness in a bustling city",
"Beauty of Bespoke Labs's Curator library"
]})

# Define a class to encapsulate a list of poems.
class Poem(BaseModel):
poem: str = Field(description="A poem.")

class Poems(BaseModel):
poems_list: List[Poem] = Field(description="A list of poems.")


# We define a Prompter that generates poems, which gets applied to the topics dataset.
poet = curator.Prompter(
prompt_func=lambda: "Write a poem about the beauty of computer science",
# `prompt_func` takes a row of the dataset as input.
# `row` is a dictionary with a single key 'topic' in this case.
prompt_func=lambda row: f"Write two poems about {row['topic']}.",
model_name="gpt-4o-mini",
response_format=Poems,
# `row` is the input row, and `poems` is the `Poems` class which
# is parsed from the structured output from the LLM.
parse_func=lambda row, poems: [
{"topic": row["topic"], "poem": p.poem} for p in poems.poems_list
],
)

poem = poet()
print(poem["response"][0])
poem = poet(topics)
print(poem.to_pandas())
# Example output:
# topic poem
# 0 Urban loneliness in a bustling city In the city's heart, where the sirens wail,\nA...
# 1 Urban loneliness in a bustling city City streets hum with a bittersweet song,\nHor...
# 2 Beauty of Bespoke Labs's Curator library In whispers of design and crafted grace,\nBesp...
# 3 Beauty of Bespoke Labs's Curator library In the hushed breath of parchment and ink,\nBe...
```
Note that `topics` can be created with `curator.Prompter` as well,
and we can scale this up to create tens of thousands of diverse poems.
You can see a more detailed example in the [examples/poem.py](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples/poem.py) file,
and other examples in the [examples](https://github.com/bespokelabsai/curator/blob/mahesh/update_doc/examples) directory.

You can see more examples in the [examples](examples) directory.
To run the examples, make sure to set your OpenAI API key in
the environment variable `OPENAI_API_KEY` by running `export OPENAI_API_KEY=sk-...` in your terminal.

To run the examples, make sure to set your OpenAI API key in the environment variable `OPENAI_API_KEY` by running `export OPENAI_API_KEY=sk-...` in your terminal.
See the [docs](https://docs.bespokelabs.ai/) for more details as well as
for troubleshooting information.
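
The updated README notes that `topics` can itself be created with `curator.Prompter`. A minimal sketch of what that might look like, reusing the API shown in the example above; the `Topic`/`Topics` models, the prompt wording, and the no-argument `prompt_func` pattern here are illustrative assumptions rather than part of this commit:

```python
from typing import List

from bespokelabs import curator
from pydantic import BaseModel, Field


# Hypothetical structured output for topic generation (illustrative only).
class Topic(BaseModel):
    topic: str = Field(description="A topic to write poems about.")


class Topics(BaseModel):
    topics_list: List[Topic] = Field(description="A list of poem topics.")


# With no input dataset, `prompt_func` takes no arguments and returns a single prompt.
topic_generator = curator.Prompter(
    prompt_func=lambda: "Generate 10 diverse topics that are suitable for writing poems about.",
    model_name="gpt-4o-mini",
    response_format=Topics,
    parse_func=lambda _, topics: [{"topic": t.topic} for t in topics.topics_list],
)

# The resulting dataset can be fed straight into the `poet` Prompter defined above.
topics = topic_generator()
poem = poet(topics)
```

Chaining Prompters like this is the pattern the poem example follows to scale from a handful of topics to a much larger poem dataset.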

## Bespoke Curator Viewer

44 changes: 34 additions & 10 deletions bespoke-dataset-viewer/components/dataset-viewer/RunsTable.tsx
@@ -21,22 +21,46 @@ const COLUMNS: Column[] = [
]

const EXAMPLE_CODE = `from bespokelabs import curator
import os
from datasets import Dataset
from pydantic import BaseModel, Field
from typing import List
# Set your OpenAI API key here
os.environ['OPENAI_API_KEY'] = 'sk-...'
# Create a dataset object for the topics you want to create poems about.
topics = Dataset.from_dict({"topic": [
"Urban loneliness in a bustling city",
"Beauty of Bespoke Labs's Curator library"
]})
# Create a prompter instance
# Define a class to encapsulate a list of poems.
class Poem(BaseModel):
poem: str = Field(description="A poem.")
class Poems(BaseModel):
poems_list: List[Poem] = Field(description="A list of poems.")
# We define a Prompter that generates poems, which gets applied to the topics dataset.
poet = curator.Prompter(
prompt_func=lambda: {
"user_prompt": "Write a poem about the beauty of computer science"
},
# prompt_func takes a row of the dataset as input.
# row is a dictionary with a single key 'topic' in this case.
prompt_func=lambda row: f"Write two poems about {row['topic']}.",
model_name="gpt-4o-mini",
response_format=Poems,
# row is the input row, and poems is the Poems class which
# is parsed from the structured output from the LLM.
parse_func=lambda row, poems: [
{"topic": row["topic"], "poem": p.poem} for p in poems.poems_list
],
)
# Generate and print the poem
poem = poet()
print(poem.to_list()[0])`
poem = poet(topics)
print(poem.to_pandas())
# Example output:
# topic poem
# 0 Urban loneliness in a bustling city In the city's heart, where the sirens wail,\\nA...
# 1 Urban loneliness in a bustling city City streets hum with a bittersweet song,\\nHor...
# 2 Beauty of Bespoke Labs's Curator library In whispers of design and crafted grace,\\nBesp...
# 3 Beauty of Bespoke Labs's Curator library In the hushed breath of parchment and ink,\\nBe...`

export function RunsTable() {
const [runs, setRuns] = useState<Run[]>([])
6 changes: 3 additions & 3 deletions bespoke-dataset-viewer/components/ui/python-highlighter.tsx
@@ -1,9 +1,9 @@
import { Button } from "@/components/ui/button";
import { Check, Copy } from "lucide-react";
import Prism from 'prismjs';
import 'prismjs/components/prism-python';
import 'prismjs/themes/prism-tomorrow.css';
import React from 'react';
import { Button } from "@/components/ui/button"
import { Check, Copy } from "lucide-react"

interface PythonHighlighterProps {
code: string;
@@ -38,7 +38,7 @@ export const PythonHighlighter: React.FC<PythonHighlighterProps> = ({ code }) =>
)}
</Button>
</div>
<pre className="text-sm bg-gray-900 p-4 m-0">
<pre className="bg-gray-900 p-4 m-0" style={{ fontSize: '12px' }}>
<code className="language-python">
{code}
</code>
52 changes: 2 additions & 50 deletions examples/poem.py
@@ -1,54 +1,6 @@
"""Example of using the curator library to generate diverse poems.
We generate 10 diverse topics and then generate 2 poems for each topic.
You can do this in a loop, but that is inefficient and breaks when requests fail.
When you need to do this thousands of times (or more), you need a better abstraction.
curator.Prompter takes care of this heavy lifting.
# Key Components of Prompter
## prompt_func
Calls an LLM on each row of the input dataset in parallel.
1. Takes a dataset row as input
2. Returns the prompt for the LLM
## parse_func
Converts LLM output into structured data by adding it back to the dataset.
1. Takes two arguments:
- Input row
- LLM response (in response_format)
2. Returns new rows (in list of dictionaries)
# Data Flow Example
Input Dataset:
Row A
Row B
Processing by Prompter:
Row A → prompt_func(A) → Response R1 → parse_func(A, R1) → [C, D]
Row B → prompt_func(B) → Response R2 → parse_func(B, R2) → [E, F]
Output Dataset:
Row C
Row D
Row E
Row F
In this example:
- The two input rows (A and B) are processed in parallel to prompt the LLM
- Each generates a response (R1 and R2)
- The parse function converts each response into (multiple) new rows (C, D, E, F)
- The final dataset contains all generated rows
You can chain prompters together to iteratively build up a dataset.
"""
We generate 10 diverse topics and then generate 2 poems for each topic."""

from bespokelabs import curator
from datasets import Dataset
@@ -103,4 +55,4 @@ class Poems(BaseModel):
# 0 Dreams vs. reality In the realm where dreams take flight,\nWhere ...
# 1 Dreams vs. reality Reality stands with open eyes,\nA weighty thro...
# 2 Urban loneliness in a bustling city In the city's heart where shadows blend,\nAmon...
# 3 Urban loneliness in a bustling city Among the crowds, I walk alone,\nA sea of face...
# 3 Urban loneliness in a bustling city Among the crowds, I walk alone,\nA sea of face...
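
The docstring removed from examples/poem.py above described the Prompter data flow: each input row is prompted in parallel, and `parse_func` can fan a single response out into several output rows (Row A → [C, D]). As a minimal plain-Python sketch of that fan-out, with a hypothetical `fake_llm` standing in for the parallel model call that curator handles for you:

```python
from typing import Dict, List


# Hypothetical stand-in for the LLM call; curator's Prompter performs this step
# in parallel (with retries) across the whole input dataset.
def fake_llm(prompt: str) -> List[str]:
    return [f"{prompt} (poem 1)", f"{prompt} (poem 2)"]


def prompt_func(row: Dict) -> str:
    # Takes a dataset row and returns the prompt for the LLM.
    return f"Write two poems about {row['topic']}."


def parse_func(row: Dict, poems: List[str]) -> List[Dict]:
    # Takes the input row plus the parsed response and returns new rows,
    # so one input row can expand into several output rows.
    return [{"topic": row["topic"], "poem": p} for p in poems]


input_rows = [
    {"topic": "Dreams vs. reality"},
    {"topic": "Urban loneliness in a bustling city"},
]
output_rows = [
    new_row
    for row in input_rows
    for new_row in parse_func(row, fake_llm(prompt_func(row)))
]
print(len(output_rows))  # 4 rows: two poems per input topic
```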
