Unable to reproduce results in "Say What You Mean" using MBP M3 Max #1316

Open
dylanjcastillo opened this issue Dec 3, 2024 · 4 comments

dylanjcastillo commented Dec 3, 2024

Describe the issue as clearly as possible:

Hi there,

I really enjoyed reading Say What You Mean, so I decided to replicate some of the results.

I tried running this notebook to generate sample responses using unstructured and structured generation, and found that structured generation mostly produces single characters (e.g., "!", "1", "#"):

[screenshot: structured generation outputs consisting of single characters]

I followed the original code exactly, only skipping the generation of all the unstructured responses.

The issue seems related to the generated schema_regex. I haven't been able to get structured generation to output anything other than a single character.
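
In case it's useful for debugging, here's a minimal sketch of how I've been inspecting the regex (not part of the original notebook; it assumes the generated regex is compatible with Python's re module):

```python
# Diagnostic sketch (not from the notebook): print the regex Outlines derives
# from the schema and check it against hand-written strings.
# Assumption: the generated regex is compatible with Python's `re` module.
import re
from pydantic import BaseModel, Field, constr
from outlines.fsm.json_schema import build_regex_from_schema

class Response(BaseModel):
    reasoning: constr(max_length=1000)
    answer: int = Field(pattern=r'[1-9][0-9]{0,9}')

schema_regex = build_regex_from_schema(Response.schema_json())
print(schema_regex)

# A well-formed response should match; a lone character like "!" should not:
print(bool(re.fullmatch(schema_regex, '{"reasoning": "3 + 2 = 5.", "answer": 5}')))  # expected: True
print(bool(re.fullmatch(schema_regex, '!')))  # expected: False
```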

I'm using an MBP M3 Max. I installed outlines==0.1.7 with the transformers extras (i.e., outlines[transformers]). I also tried downgrading to 0.1.5, but that didn't work either.

Any idea what could be going wrong?

Steps/code to reproduce the bug:

```python
import json
import outlines
import torch
from transformers import AutoTokenizer
from textwrap import dedent
from datasets import load_dataset
import re
from outlines.samplers import greedy
from pydantic import BaseModel, Field, constr
from outlines.fsm.json_schema import build_regex_from_schema

MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
# Load the GSM8K test split for evaluation
dataset = load_dataset("gsm8k", "main")
all_evals = list(dataset['test'])

# Load Llama 3 8B Instruct on Apple's MPS backend in bfloat16
model = outlines.models.transformers(
    MODEL_NAME,
    device='mps',
    model_kwargs={
        'torch_dtype': torch.bfloat16,
        'trust_remote_code': True
    })
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

example_question = [
    "There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?",
    "If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?",
    "Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?"
]

example_response = [
    """{"reasoning": "There are 15 trees originally. Then there were 21 trees after some more were planted. So there must have been 21 - 15 = 6.", "answer": 6}""",
    """{"reasoning": "There are originally 3 cars. 2 more cars arrive. 3 + 2 = 5.", "answer": 5}""",
    """{"reasoning": "Originally, Leah had 32 chocolates. Her sister had 42. So in total they had 32 + 42 = 74. After eating 35, they had 74 - 35 = 39.","answer": 39"""
]

# Build a few-shot chat prompt and render it with the model's chat template
def create_prompt(question, tokenizer):
    messages = [
        {
            "role": "system",
            "content": dedent("""
            You are an expert in solving grade school math tasks. You will be presented with a grade-school math word problem and be asked to solve it.
            Before answering, you should reason about the problem (using the "reasoning" field in the JSON response described below).

            You will always respond with JSON in the format described below:
            
            {"reasoning": <reasoning about the answer>, "answer": <final answer>}

            The "reasoning" field will contain your reasoning about the sequence of events, and the "answer" will contain the single letter representing the correct choice you are presented with.
            """)
        },]
    for i in range(len(example_question)):
        messages.append(
        {
            "role": "user",
            "content": """Question: {question}""".format(question=example_question[i])
        }
        )
        messages.append(
        {
            "role": "assistant",
            "content": example_response[i]        
        })
    messages.append(
        {
            "role": "user",
            "content": """Question: {question}", """.format(question=question)
        })
    messages.append(
        {
            "role": "assistant",
            "content": ""
        }
    )
    return tokenizer.apply_chat_template(messages, tokenize=False)

# Response schema from which the constraining regex is built
class Response(BaseModel):
    reasoning: constr(max_length=1000)
    answer: int = Field(pattern=r'[1-9][0-9]{0,9}')

schema_regex = build_regex_from_schema(Response.schema_json())
# Use greedy sampling so the output is deterministic and easy to reproduce
structured_generator = outlines.generate.regex(model, regex_str=schema_regex, sampler=greedy())
structured_generator(create_prompt(all_evals[5]['question'], tokenizer))
```

Expected result:

A JSON structure with two keys: reasoning and answer.
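
For example, matching the few-shot examples above:

```
{"reasoning": "There are originally 3 cars. 2 more cars arrive. 3 + 2 = 5.", "answer": 5}
```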

Error message:

No response

Outlines/Python version information:

<details>
<summary>Version information</summary>

```
0.1.7

Python 3.12.6 (main, Sep 6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]

accelerate==1.1.1
aiohappyeyeballs==2.4.4
aiohttp==3.11.9
aiosignal==1.3.1
airportsdata==20241001
annotated-types==0.7.0
anyio==4.6.2.post1
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.4
attrs==24.2.0
babel==2.16.0
beautifulsoup4==4.12.3
bleach==6.2.0
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
cloudpickle==3.1.0
comm==0.2.2
datasets==3.1.0
debugpy==1.8.9
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
docstring_parser==0.16
executing==2.1.0
fastapi==0.115.5
fastjsonschema==2.21.1
filelock==3.16.1
fqdn==1.5.1
frozenlist==1.5.0
fsspec==2024.9.0
google-ai-generativelanguage==0.6.10
google-api-core==2.23.0
google-api-python-client==2.154.0
google-auth==2.36.0
google-auth-httplib2==0.2.0
google-generativeai==0.8.3
googleapis-common-protos==1.66.0
grpcio==1.68.1
grpcio-status==1.68.1
grpclib==0.4.7
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==1.0.7
httplib2==0.22.0
httpx==0.28.0
huggingface-hub==0.26.3
hyperframe==6.0.1
idna==3.10
instructor==1.7.0
interegular==0.3.3
ipykernel==6.29.5
ipython==8.30.0
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.4
jiter==0.6.1
joblib==1.4.2
json5==0.10.0
jsonpointer==3.0.0
jsonref==1.1.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.6
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
langsmith==0.1.147
lark==1.2.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mistune==3.0.2
modal==0.67.20
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
nbclient==0.10.1
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.4.2
notebook==7.2.2
notebook_shim==0.2.4
numpy==1.26.4
openai==1.56.0
orjson==3.10.12
outlines==0.1.7
outlines_core==0.1.17
overrides==7.7.0
packaging==24.2
pandas==2.2.3
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
pillow==10.4.0
platformdirs==4.3.6
prometheus_client==0.21.0
prompt_toolkit==3.0.48
propcache==0.2.1
proto-plus==1.25.0
protobuf==5.29.0
psutil==6.1.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==18.1.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycountry==24.6.1
pycparser==2.22
pydantic==2.10.2
pydantic_core==2.27.1
Pygments==2.18.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==2.0.7
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
requests-toolbelt==1.0.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.9.4
rpds-py==0.22.0
rsa==4.9
safetensors==0.4.5
scikit-learn==1.5.2
scipy==1.14.1
Send2Trash==1.8.3
setuptools==75.6.0
shellingham==1.5.4
sigtools==4.0.1
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
stack-data==0.6.3
starlette==0.41.3
sympy==1.13.1
synchronicity==0.9.5
tenacity==9.0.0
terminado==0.18.1
threadpoolctl==3.5.0
tinycss2==1.4.0
tokenizers==0.20.3
toml==0.10.2
torch==2.5.1
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
transformers==4.46.3
typer==0.14.0
types-certifi==2021.10.8.3
types-python-dateutil==2.9.0.20241003
types-toml==0.10.8.20240310
typing_extensions==4.12.2
tzdata==2024.2
uri-template==1.3.0
uritemplate==4.1.1
urllib3==2.2.3
watchfiles==1.0.0
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
xxhash==3.5.0
yarl==1.18.3

```

</details>

Context for the issue:

No response

EDIT: Fixed typos, added original Response model.

rlouf commented Dec 3, 2024

@willkurt

@dylanjcastillo
Author

Quick update: The problem seems related to running on Apple/M3/my Mac(?). I used a machine with Debian (debian-slim on Modal) and it worked.
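
In case it helps anyone else hitting this, a possible workaround sketch (my assumption based on the above, not verified as the root cause) is to avoid bfloat16 on the MPS backend:

```python
# Workaround sketch (assumption: the bad logits come from bfloat16 on MPS).
# Either run on CPU, or stay on MPS with float32:
model = outlines.models.transformers(
    MODEL_NAME,
    device='cpu',  # or device='mps' with torch_dtype=torch.float32
    model_kwargs={
        'torch_dtype': torch.float32,
        'trust_remote_code': True
    })
```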


rlouf commented Dec 4, 2024

Related to #1306?

@dylanjcastillo
Author

Oh yes, it's the same issue!
