Fix JSON parsing from model #85

Merged
merged 9 commits into from
Nov 13, 2024

@vutrung96 vutrung96 commented Nov 13, 2024

Since we are not using strict mode, OpenAI can return invalid JSON: either a string that is not JSON at all, or valid JSON that does not conform to the JSON schema. In both cases we catch the invalid JSON and skip the response.
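A minimal sketch of that catch-and-skip behavior, not the code in this PR. The `REQUIRED_FIELDS` mapping and the `parse_response` helper are hypothetical stand-ins for the real schema and validation; only the standard-library `json` module is used.

```python
import json
from typing import Optional

# Hypothetical stand-in for the JSON schema: required keys and expected types.
REQUIRED_FIELDS = {"title": str, "score": int}


def parse_response(raw: str) -> Optional[dict]:
    """Return the parsed object, or None if the response should be skipped."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Case 1: the model returned a string that is not JSON at all.
        print("Skipping response: not valid JSON")
        return None

    # Case 2: valid JSON, but it does not have the expected shape.
    if not isinstance(data, dict):
        print("Skipping response: valid JSON but does not match the schema")
        return None
    for key, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), expected_type):
            print("Skipping response: valid JSON but does not match the schema")
            return None
    return data


responses = [
    '{"title": "ok", "score": 3}',   # valid and conforming
    "not json",                      # not JSON
    '{"title": "missing score"}',    # JSON, but missing a required field
]
# Keep only the responses that parsed and validated; skip the rest.
parsed = [r for r in map(parse_response, responses) if r is not None]
```

Skipped responses are dropped rather than raising, so one bad completion does not fail the whole batch.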

@vutrung96 vutrung96 changed the base branch from main to dev November 13, 2024 17:00
@RyanMarten
RyanMarten commented Nov 13, 2024

https://rwilinski.ai/posts/benchmarking-llms-for-structured-json-generation/
Overall, requests with strict mode are slower, by about 50% (3s - 5s).

Cold start problem:

As Ted Sanders mentioned in this HN comment, using strict mode bears a significant cold start penalty which goes away in the subsequent runs.

The first request with each JSON schema will be slow, as we need to preprocess the JSON schema into a context-free grammar. If you don’t want that latency hit (e.g., you’re prototyping, or have a use case that uses variable one-off schemas), then you might prefer “strict”: false

How much slower is it? Here are my results:

| Model | Schema | avgFirstRequestTime (ms) | avgSecondRequestTime (ms) | coldStartPenalty |
| --- | --- | --- | --- | --- |
| gpt-4o-2024-08-06 | Wide JSON Schema | 20234.0549 | 5927.3556 | 241.37% |
| gpt-4o-mini | Wide JSON Schema | 21801.5501 | 5800.8192 | 275.84% |
| gpt-4o-2024-08-06 | Complex JSON Schema | 24089.9075 | 7100.4283 | 239.27% |
| gpt-4o-mini | Complex JSON Schema | 26665.4039 | 10270.7880 | 159.62% |
| gpt-4o-2024-08-06 | Super Complex JSON Schema | 60481.4465 | 11698.9430 | 416.98% |
| gpt-4o-mini | Super Complex JSON Schema | 66011.3763 | 13994.1616 | 371.71% |
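The quoted article does not spell out how coldStartPenalty is computed. The values in the table are consistent with the relative slowdown of the first request over the second, so the sketch below assumes that formula; `cold_start_penalty` is a hypothetical helper name.

```python
def cold_start_penalty(first_ms: float, second_ms: float) -> float:
    """Relative slowdown of the first request over the second, in percent.

    Assumed formula, inferred from the table: (first - second) / second * 100.
    """
    return (first_ms - second_ms) / second_ms * 100


# First row of the table: gpt-4o-2024-08-06, Wide JSON Schema.
penalty = cold_start_penalty(20234.0549, 5927.3556)
print(f"{penalty:.2f}%")  # prints "241.37%"
```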

For simple to moderately complex schemas, it is reasonable to go non-strict, based on the success rates for the complex schema (below).

| Method | Avg Time (ms) | Time Diff (ms) | Success Rate | Cost | Cost Diff |
| --- | --- | --- | --- | --- | --- |
| gpt-4o-2024-08-06-non-strict-tool | 4079.0854 | 0 | 100.0000% | 0.1680 | +0.1534 |
| gpt-4o-mini-non-strict-json | 5847.6183 | +1768.5329 | 100.0000% | 0.0175 | +0.0029 |
| gpt-4o-2024-08-06-strict-json | 5866.2200 | +1787.1346 | 100.0000% | 0.1528 | +0.1382 |
| gpt-4o-2024-08-06-non-strict-json | 6314.3933 | +2235.3079 | 100.0000% | 0.3026 | +0.2880 |
| gpt-4o-mini-strict-json | 7858.5114 | +3779.4260 | 100.0000% | 0.0146 | 0 |
| gpt-4o-mini-non-strict-tool | N/A | N/A | 0.0000% | N/A | N/A |

We should expose a `strict` flag.
For now, default to non-strict so we can get the current generation jobs across the line. So far, all of our own use has involved simple JSON structures.
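One way such a flag could look, as a sketch only. `build_response_format` is a hypothetical helper, not an existing function in this repo; the dictionary follows the `response_format` shape of the OpenAI Chat Completions API for structured outputs, and the example schema is made up.

```python
from typing import Any, Dict


def build_response_format(
    name: str, schema: Dict[str, Any], strict: bool = False
) -> Dict[str, Any]:
    """Build a `response_format` payload; non-strict by default."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": schema,
            # strict=True trades a slower first request per schema for
            # output that is constrained to the schema.
            "strict": strict,
        },
    }


# Illustrative schema only.
recipe_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "required": ["title"],
    "additionalProperties": False,
}

default_format = build_response_format("recipe", recipe_schema)
strict_format = build_response_format("recipe", recipe_schema, strict=True)
```

With the default left at `False`, callers still need to handle responses that are not valid JSON or do not match the schema.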

In line with the author's suggestion:

Based on my findings, I recommend the following approaches:

For Simple JSON Structures:

- Prefer non-strict modes, especially tool-based methods for speed and cost-effectiveness
- Go with smaller mini model if you can (but don’t forget about potential failures, wrap in try/catch accordingly)

@RyanMarten RyanMarten self-requested a review November 13, 2024 17:22

@RyanMarten RyanMarten left a comment

Small change to the error message.

Just say that the model successfully responded with a string that is valid JSON but doesn't match the schema.

@RyanMarten RyanMarten left a comment

LGTM!

@RyanMarten RyanMarten merged commit e7bb89e into dev Nov 13, 2024
@RyanMarten RyanMarten deleted the ryanm/invalid-json-from-model branch November 13, 2024 17:48