feat: Add support for Structured Outputs in ChatOpenAI #526

Merged
davidmigloz merged 1 commit into main from structured_outputs
Aug 17, 2024

Conversation

davidmigloz
Owner

@davidmigloz davidmigloz commented Aug 17, 2024

Structured Outputs is a feature that ensures the model will always generate responses that adhere to your supplied JSON Schema, so you don't need to worry about the model omitting a required key, or hallucinating an invalid enum value.

final prompt = PromptValue.chat([
  ChatMessage.system(
    'Extract the data of any companies mentioned in the '
    'following statement. Return a JSON list.',
  ),
  ChatMessage.humanText(
    'Google was founded in the USA, while Deepmind was founded in the UK',
  ),
]);
final chatModel = ChatOpenAI(
  apiKey: openaiApiKey,
  defaultOptions: ChatOpenAIOptions(
    model: 'gpt-4o',
    temperature: 0,
    responseFormat: ChatOpenAIResponseFormat.jsonSchema(
      ChatOpenAIJsonSchema(
        name: 'Companies',
        description: 'A list of companies',
        strict: true,
        schema: {
          'type': 'object',
          'properties': {
            'companies': {
              'type': 'array',
              'items': {
                'type': 'object',
                'properties': {
                  'name': {'type': 'string'},
                  'origin': {'type': 'string'},
                },
                'additionalProperties': false,
                'required': ['name', 'origin'],
              },
            },
          },
          'additionalProperties': false,
          'required': ['companies'],
        },
      ),
    ),
  ),
);

final res = await chatModel.invoke(prompt);
print(res.output.content);
// {
//   "companies": [
//     {
//       "name": "Google",
//       "origin": "USA"
//     },
//     {
//       "name": "Deepmind",
//       "origin": "UK"
//     }
//   ]
// }
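
Since the response follows the supplied schema, it can be decoded straight into typed objects. The following is a minimal sketch, assuming the JSON text is already available as a string. The `Company` class and the `parseCompanies` helper are hypothetical names used for illustration and are not part of langchain_dart.

```dart
import 'dart:convert';

/// Hypothetical value class mirroring one item of the `companies` array.
class Company {
  const Company({required this.name, required this.origin});

  /// Builds a [Company] from one decoded JSON object.
  factory Company.fromJson(Map<String, dynamic> json) => Company(
        name: json['name'] as String,
        origin: json['origin'] as String,
      );

  final String name;
  final String origin;
}

/// Decodes the JSON text returned by the model into typed objects.
List<Company> parseCompanies(String jsonText) {
  final decoded = jsonDecode(jsonText) as Map<String, dynamic>;
  final items = decoded['companies'] as List<dynamic>;
  return items
      .map((item) => Company.fromJson(item as Map<String, dynamic>))
      .toList();
}

void main() {
  // Stand-in for the model output shown in the example above.
  const output = '{"companies": [{"name": "Google", "origin": "USA"}, '
      '{"name": "Deepmind", "origin": "UK"}]}';
  for (final company in parseCompanies(output)) {
    print('${company.name} (${company.origin})');
  }
}
```

The casts would throw if a key were missing or had the wrong type, which is the situation the schema is meant to rule out.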

When you use strict: true, the model outputs will match the supplied schema exactly. Note that strict mode only supports a subset of JSON Schema for performance reasons.

Under the hood, OpenAI uses a technique known as constrained sampling or constrained decoding. For each JSON Schema, they compute a grammar that represents that schema and pre-process its components so they are easily accessible during model sampling. This is why the first request with a new schema incurs a latency penalty. Typical schemas take under 10 seconds to process on the first request, but more complex schemas may take up to a minute.
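
Two of the strict-mode restrictions are visible in the schema above: every object sets `additionalProperties` to `false`, and every property is listed in `required`. The sketch below checks a schema for just those two rules before it is sent. The `findStrictModeViolations` helper is a hypothetical name, not part of langchain_dart or the OpenAI API, and it does not cover the other restrictions of the supported subset.

```dart
/// Returns descriptions of the object schemas that break two strict-mode
/// rules: `additionalProperties` must be `false`, and every property must
/// be listed in `required`. Other strict-mode rules are not checked.
List<String> findStrictModeViolations(
  Map<String, dynamic> schema, [
  String path = r'$',
]) {
  final violations = <String>[];
  final properties = schema['properties'];

  if (schema['type'] == 'object') {
    if (schema['additionalProperties'] != false) {
      violations.add('$path: additionalProperties must be false');
    }
    final required = (schema['required'] as List?) ?? const [];
    if (properties is Map) {
      for (final key in properties.keys) {
        if (!required.contains(key)) {
          violations.add('$path.$key: missing from required');
        }
      }
    }
  }

  // Recurse into nested property schemas.
  if (properties is Map) {
    for (final entry in properties.entries) {
      final value = entry.value;
      if (value is Map<String, dynamic>) {
        violations
            .addAll(findStrictModeViolations(value, '$path.${entry.key}'));
      }
    }
  }

  // Recurse into the item schema of arrays.
  final items = schema['items'];
  if (items is Map<String, dynamic>) {
    violations.addAll(findStrictModeViolations(items, '$path[]'));
  }
  return violations;
}

void main() {
  // Deliberately incomplete schema: no `additionalProperties`, and
  // `origin` is not listed in `required`.
  final schema = <String, dynamic>{
    'type': 'object',
    'properties': {
      'name': {'type': 'string'},
      'origin': {'type': 'string'},
    },
    'required': ['name'],
  };
  findStrictModeViolations(schema).forEach(print);
}
```

Running a check like this locally reports both problems in the sample schema without waiting for the API to reject the request.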

@davidmigloz davidmigloz self-assigned this Aug 17, 2024
@davidmigloz davidmigloz added the c:chat-models (Chat models) and p:langchain_openai (langchain_openai package) labels Aug 17, 2024
@davidmigloz davidmigloz added this to the v0.8.0 milestone Aug 17, 2024
@davidmigloz davidmigloz merged commit c5387b5 into main Aug 17, 2024
1 check passed
@davidmigloz davidmigloz deleted the structured_outputs branch August 17, 2024 16:00
KennethKnudsen97 pushed a commit to KennethKnudsen97/langchain_dart that referenced this pull request Oct 1, 2024
Labels
c:chat-models Chat models. p:langchain_openai langchain_openai package.
Projects
Status: Done
1 participant