I really enjoyed reading Say What You Mean, so I decided to replicate some of the results.
I tried running this notebook to generate a sample response using unstructured and structured generation, and found that the structured generation mostly produces single characters (e.g., "!", "1", "#").
I followed the original code exactly, only skipping the generation of all the unstructured responses.
The issue seems related to the generated `schema_regex`. I haven't been able to get the structured generation to output anything other than a single character.
I'm using an MBP M3 Max. I installed `outlines==0.1.7` with the transformers extras (i.e., `outlines[transformers]`). I also tried downgrading to `0.1.5`, but that didn't work either.
Any ideas of what could be going wrong?
Steps/code to reproduce the bug:
```python
import json
import re
from textwrap import dedent

import outlines
import torch
from datasets import load_dataset
from outlines.fsm.json_schema import build_regex_from_schema
from outlines.samplers import greedy
from pydantic import BaseModel, Field, constr
from transformers import AutoTokenizer

MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"

dataset = load_dataset("gsm8k", "main")
all_evals = list(dataset["test"])

model = outlines.models.transformers(
    MODEL_NAME,
    device="mps",
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "trust_remote_code": True,
    },
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

example_question = [
    "There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done, there will be 21 trees. How many trees did the grove workers plant today?",
    "If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?",
    "Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?",
]
example_response = [
    """{"reasoning": "There are 15 trees originally. Then there were 21 trees after some more were planted. So there must have been 21 - 15 = 6.", "answer": 6}""",
    """{"reasoning": "There are originally 3 cars. 2 more cars arrive. 3 + 2 = 5.", "answer": 5}""",
    """{"reasoning": "Originally, Leah had 32 chocolates. Her sister had 42. So in total they had 32 + 42 = 74. After eating 35, they had 74 - 35 = 39.", "answer": 39}""",
]


def create_prompt(question, tokenizer):
    messages = [
        {
            "role": "system",
            "content": dedent("""
                You are an expert in solving grade school math tasks. You will be
                presented with a grade-school math word problem and be asked to
                solve it. Before answering you should reason about the problem
                (using the "reasoning" field in the JSON response described below).
                You will always respond with JSON in the format described below:

                {"reasoning": <reasoning about the answer>, "answer": <final answer>}

                The "reasoning" field will contain your reasoning about the
                sequence of events, and the "answer" will contain the single
                letter representing the correct choice you are presented with.
            """),
        },
    ]
    # Few-shot examples: alternate user questions with assistant JSON answers
    for i in range(len(example_question)):
        messages.append(
            {
                "role": "user",
                "content": "Question: {question}".format(question=example_question[i]),
            }
        )
        messages.append(
            {
                "role": "assistant",
                "content": example_response[i],
            }
        )
    # The actual question, followed by an empty assistant turn to complete
    messages.append(
        {
            "role": "user",
            "content": "Question: {question}".format(question=question),
        }
    )
    messages.append(
        {
            "role": "assistant",
            "content": "",
        }
    )
    return tokenizer.apply_chat_template(messages, tokenize=False)


class Response(BaseModel):
    reasoning: constr(max_length=1000)
    answer: int = Field(pattern=r"[1-9][0-9]{0,9}")


schema_regex = build_regex_from_schema(Response.schema_json())
structured_generator = outlines.generate.regex(model, regex_str=schema_regex, sampler=greedy())
structured_generator(create_prompt(all_evals[5]["question"], tokenizer))
```
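As a quick sanity check (a sketch checking only the `answer` pattern from the `Response` model in isolation, not the full regex that `build_regex_from_schema` produces), the pattern itself behaves as intended:

```python
import re

# The same pattern used in the Response model's `answer` field
answer_pattern = re.compile(r"[1-9][0-9]{0,9}")

assert answer_pattern.fullmatch("39") is not None       # plain positive integer matches
assert answer_pattern.fullmatch("039") is None          # no leading zeros
assert answer_pattern.fullmatch("12345678901") is None  # at most 10 digits
```

So the field pattern looks fine on its own, which points the suspicion at the regex-guided decoding step rather than the schema.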
Expected result:
A JSON object with two keys: `reasoning` and `answer`.
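For concreteness, the expected shape can be checked with the standard library alone (the sample below reuses one of the few-shot example responses from the prompt):

```python
import json

# One of the few-shot example responses from the prompt above
sample = '{"reasoning": "There are originally 3 cars. 2 more cars arrive. 3 + 2 = 5.", "answer": 5}'

parsed = json.loads(sample)
assert set(parsed) == {"reasoning", "answer"}
assert isinstance(parsed["answer"], int)
```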
Quick update: the problem seems specific to running on Apple Silicon (my M3 Mac?). I ran the same code on a Debian machine (debian-slim on Modal) and it worked.
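One way to isolate the MPS backend locally (a hypothetical sketch, not something verified on this machine) would be to load the same model on CPU and compare outputs:

```python
import outlines
import torch

MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"

# Sketch: same model call with device="cpu"; if this produces valid JSON
# while the MPS run does not, the MPS backend is the likely culprit.
model_cpu = outlines.models.transformers(
    MODEL_NAME,
    device="cpu",
    model_kwargs={"torch_dtype": torch.float32},  # bfloat16 on CPU can be slow or unsupported
)
```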
*dylanjcastillo changed the title from "Unable to reproduce results in "Say What You Mean"" to "Unable to reproduce results in "Say What You Mean" using (MBP M3 Max)" on Dec 3, 2024*
*dylanjcastillo changed the title from "Unable to reproduce results in "Say What You Mean" using (MBP M3 Max)" to "Unable to reproduce results in "Say What You Mean" using MBP M3 Max" on Dec 3, 2024*
Error message:
No response
Outlines/Python version information:
0.1.7
Python 3.12.6 (main, Sep 6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
accelerate==1.1.1
aiohappyeyeballs==2.4.4
aiohttp==3.11.9
aiosignal==1.3.1
airportsdata==20241001
annotated-types==0.7.0
anyio==4.6.2.post1
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==3.0.0
async-lru==2.0.4
attrs==24.2.0
babel==2.16.0
beautifulsoup4==4.12.3
bleach==6.2.0
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
cloudpickle==3.1.0
comm==0.2.2
datasets==3.1.0
debugpy==1.8.9
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.8
diskcache==5.6.3
distro==1.9.0
docstring_parser==0.16
executing==2.1.0
fastapi==0.115.5
fastjsonschema==2.21.1
filelock==3.16.1
fqdn==1.5.1
frozenlist==1.5.0
fsspec==2024.9.0
google-ai-generativelanguage==0.6.10
google-api-core==2.23.0
google-api-python-client==2.154.0
google-auth==2.36.0
google-auth-httplib2==0.2.0
google-generativeai==0.8.3
googleapis-common-protos==1.66.0
grpcio==1.68.1
grpcio-status==1.68.1
grpclib==0.4.7
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==1.0.7
httplib2==0.22.0
httpx==0.28.0
huggingface-hub==0.26.3
hyperframe==6.0.1
idna==3.10
instructor==1.7.0
interegular==0.3.3
ipykernel==6.29.5
ipython==8.30.0
isoduration==20.11.0
jedi==0.19.2
Jinja2==3.1.4
jiter==0.6.1
joblib==1.4.2
json5==0.10.0
jsonpointer==3.0.0
jsonref==1.1.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.6
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
langsmith==0.1.147
lark==1.2.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mistune==3.0.2
modal==0.67.20
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
nbclient==0.10.1
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.4.2
notebook==7.2.2
notebook_shim==0.2.4
numpy==1.26.4
openai==1.56.0
orjson==3.10.12
outlines==0.1.7
outlines_core==0.1.17
overrides==7.7.0
packaging==24.2
pandas==2.2.3
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
pillow==10.4.0
platformdirs==4.3.6
prometheus_client==0.21.0
prompt_toolkit==3.0.48
propcache==0.2.1
proto-plus==1.25.0
protobuf==5.29.0
psutil==6.1.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==18.1.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycountry==24.6.1
pycparser==2.22
pydantic==2.10.2
pydantic_core==2.27.1
Pygments==2.18.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==2.0.7
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
requests-toolbelt==1.0.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.9.4
rpds-py==0.22.0
rsa==4.9
safetensors==0.4.5
scikit-learn==1.5.2
scipy==1.14.1
Send2Trash==1.8.3
setuptools==75.6.0
shellingham==1.5.4
sigtools==4.0.1
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
stack-data==0.6.3
starlette==0.41.3
sympy==1.13.1
synchronicity==0.9.5
tenacity==9.0.0
terminado==0.18.1
threadpoolctl==3.5.0
tinycss2==1.4.0
tokenizers==0.20.3
toml==0.10.2
torch==2.5.1
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
transformers==4.46.3
typer==0.14.0
types-certifi==2021.10.8.3
types-python-dateutil==2.9.0.20241003
types-toml==0.10.8.20240310
typing_extensions==4.12.2
tzdata==2024.2
uri-template==1.3.0
uritemplate==4.1.1
urllib3==2.2.3
watchfiles==1.0.0
wcwidth==0.2.13
webcolors==24.11.1
webencodings==0.5.1
websocket-client==1.8.0
xxhash==3.5.0
yarl==1.18.3