Unable to reproduce results in "Say What You Mean" using MBP M3 Max #1316

Open
dylanjcastillo opened this issue Dec 3, 2024 · 4 comments

dylanjcastillo commented Dec 3, 2024

Describe the issue as clearly as possible:

Hi there,

I really enjoyed reading Say What You Mean, so I decided to replicate some of the results.

I tried running this notebook to generate sample responses using unstructured and structured generation, and found that structured generation mostly produces single characters (e.g., "!", "1", "#"):

[screenshot: structured generation outputs consisting of single characters]

I followed the original code exactly, only skipping the generation of all the unstructured responses.

The issue seems related to the generated schema_regex. I haven't been able to get structured generation to output anything other than a single character.
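
In case it's useful for debugging, here's a minimal sketch of how I've been inspecting the regex (not part of the original notebook; it assumes the generated regex is compatible with Python's re module):

```python
# Diagnostic sketch (not from the notebook): print the regex Outlines derives
# from the schema and check it against hand-written strings.
# Assumption: the generated regex is compatible with Python's `re` module.
import re
from pydantic import BaseModel, Field, constr
from outlines.fsm.json_schema import build_regex_from_schema

class Response(BaseModel):
    reasoning: constr(max_length=1000)
    answer: int = Field(pattern=r'[1-9][0-9]{0,9}')

schema_regex = build_regex_from_schema(Response.schema_json())
print(schema_regex)

# A well-formed response should match; a lone character like "!" should not:
print(bool(re.fullmatch(schema_regex, '{"reasoning": "3 + 2 = 5.", "answer": 5}')))  # expected: True
print(bool(re.fullmatch(schema_regex, '!')))  # expected: False
```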

I'm using an MBP M3 Max. I installed outlines==0.1.7 with the transformers extras (i.e., outlines[transformers]). I also tried downgrading to 0.1.5, but that didn't work either.

Any idea what could be going wrong?

Steps/code to reproduce the bug:

```python
import json
import outlines
import torch
from transformers import AutoTokenizer
from textwrap import dedent
from datasets import load_dataset
import re
from outlines.samplers import greedy
from pydantic import BaseModel, Field, constr
from outlines.fsm.json_schema import build_regex_from_schema

MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
# Load the GSM8K test split for evaluation
dataset = load_dataset("gsm8k", "main")
all_evals = list(dataset['test'])

# Load Llama 3 8B Instruct on Apple's MPS backend in bfloat16
model = outlines.models.transformers(
    MODEL_NAME,
    device='mps',
    model_kwargs={
        'torch_dtype': torch.bfloat16,
        'trust_remote_code': True
    })
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

example_question = [
    "There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?",
    "If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?",
    "Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?"
]

example_response = [
    """{"reasoning": "There are 15 trees originally. Then there were 21 trees after some more were planted. So there must have been 21 - 15 = 6.", "answer": 6}""",
    """{"reasoning": "There are originally 3 cars. 2 more cars arrive. 3 + 2 = 5.", "answer": 5}""",
    """{"reasoning": "Originally, Leah had 32 chocolates. Her sister had 42. So in total they had 32 + 42 = 74. After eating 35, they had 74 - 35 = 39.","answer": 39"""
]

# Build a few-shot chat prompt and render it with the model's chat template
def create_prompt(question, tokenizer):
    messages = [
        {
            "role": "system",
            "content": dedent("""
            You are an expert in solving grade school math tasks. You will be presented with a grade-school math word problem and be asked to solve it.
            Before answering, you should reason about the problem (using the "reasoning" field in the JSON response described below).

            You will always respond with JSON in the format described below:
            
            {"reasoning": <reasoning about the answer>, "answer": <final answer>}

            The "reasoning" field will contain your reasoning about the sequence of events, and the "answer" will contain the single letter representing the correct choice you are presented with.
            """)
        },]
    for i in range(len(example_question)):
        messages.append(
        {
            "role": "user",
            "content": """Question: {question}""".format(question=example_question[i])
        }
        )
        messages.append(
        {
            "role": "assistant",
            "content": example_response[i]        
        })
    messages.append(
        {
            "role": "user",
            "content": """Question: {question}", """.format(question=question)
        })
    messages.append(
        {
            "role": "assistant",
            "content": ""
        }
    )
    return tokenizer.apply_chat_template(messages, tokenize=False)

# Response schema from which the constraining regex is built
class Response(BaseModel):
    reasoning: constr(max_length=1000)
    answer: int = Field(pattern=r'[1-9][0-9]{0,9}')

schema_regex = build_regex_from_schema(Response.schema_json())
# Use greedy sampling so the output is deterministic and easy to reproduce
structured_generator = outlines.generate.regex(model, regex_str=schema_regex, sampler=greedy())
structured_generator(create_prompt(all_evals[5]['question'], tokenizer))
```

Expected result:

A JSON structure with two keys: reasoning and answer.
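
For example, matching the few-shot examples above:

```
{"reasoning": "There are originally 3 cars. 2 more cars arrive. 3 + 2 = 5.", "answer": 5}
```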

Error message:

No response

Outlines/Python version information:

<details>
<summary>Version information</summary>

```
0.1.7

Python 3.12.6 (main, Sep 6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]

accelerate==1.1.1
aiohappyeyeballs==2.4.4
aiohttp==3.11.9
aiosignal==1.3.1
airportsdata==20241001
annotated-types==0.7.0
anyio==4.6.2.post1
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.4
attrs==24.2.0
babel==2.16.0
beautifulsoup4==4.12.3
bleach==6.2.0
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
cloudpickle==3.1.0
comm==0.2.2
datasets==3.1.0
debugpy==1.8.9
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
docstring_parser==0.16
executing==2.1.0
fastapi==0.115.5
fastjsonschema==2.21.1
filelock==3.16.1
fqdn==1.5.1
frozenlist==1.5.0
fsspec==2024.9.0
google-ai-generativelanguage==0.6.10
google-api-core==2.23.0
google-api-python-client==2.154.0
google-auth==2.36.0
google-auth-httplib2==0.2.0
google-generativeai==0.8.3
googleapis-common-protos==1.66.0
grpcio==1.68.1
grpcio-status==1.68.1
grpclib==0.4.7
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==1.0.7
httplib2==0.22.0
httpx==0.28.0
huggingface-hub==0.26.3
hyperframe==6.0.1
idna==3.10
instructor==1.7.0
interegular==0.3.3
ipykernel==6.29.5
ipython==8.30.0
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.4
jiter==0.6.1
joblib==1.4.2
json5==0.10.0
jsonpointer==3.0.0
jsonref==1.1.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.6
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
langsmith==0.1.147
lark==1.2.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mistune==3.0.2
modal==0.67.20
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
nbclient==0.10.1
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.4.2
notebook==7.2.2
notebook_shim==0.2.4
numpy==1.26.4
openai==1.56.0
orjson==3.10.12
outlines==0.1.7
outlines_core==0.1.17
overrides==7.7.0
packaging==24.2
pandas==2.2.3
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
pillow==10.4.0
platformdirs==4.3.6
prometheus_client==0.21.0
prompt_toolkit==3.0.48
propcache==0.2.1
proto-plus==1.25.0
protobuf==5.29.0
psutil==6.1.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==18.1.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycountry==24.6.1
pycparser==2.22
pydantic==2.10.2
pydantic_core==2.27.1
Pygments==2.18.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==2.0.7
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
requests-toolbelt==1.0.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.9.4
rpds-py==0.22.0
rsa==4.9
safetensors==0.4.5
scikit-learn==1.5.2
scipy==1.14.1
Send2Trash==1.8.3
setuptools==75.6.0
shellingham==1.5.4
sigtools==4.0.1
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
stack-data==0.6.3
starlette==0.41.3
sympy==1.13.1
synchronicity==0.9.5
tenacity==9.0.0
terminado==0.18.1
threadpoolctl==3.5.0
tinycss2==1.4.0
tokenizers==0.20.3
toml==0.10.2
torch==2.5.1
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
transformers==4.46.3
typer==0.14.0
types-certifi==2021.10.8.3
types-python-dateutil==2.9.0.20241003
types-toml==0.10.8.20240310
typing_extensions==4.12.2
tzdata==2024.2
uri-template==1.3.0
uritemplate==4.1.1
urllib3==2.2.3
watchfiles==1.0.0
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
xxhash==3.5.0
yarl==1.18.3

```

</details>

Context for the issue:

No response

EDIT: Fixed typos, added original Response model.

rlouf commented Dec 3, 2024

@willkurt

@dylanjcastillo
Author

Quick update: The problem seems related to running on Apple/M3/my Mac(?). I used a machine with Debian (debian-slim on Modal) and it worked.
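
In case it helps anyone else hitting this, a possible workaround sketch (my assumption based on the above, not verified as the root cause) is to avoid bfloat16 on the MPS backend:

```python
# Workaround sketch (assumption: the bad logits come from bfloat16 on MPS).
# Either run on CPU, or stay on MPS with float32:
model = outlines.models.transformers(
    MODEL_NAME,
    device='cpu',  # or device='mps' with torch_dtype=torch.float32
    model_kwargs={
        'torch_dtype': torch.float32,
        'trust_remote_code': True
    })
```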


rlouf commented Dec 4, 2024

Related to #1306?

@dylanjcastillo
Author

Oh yes, it's the same issue!
