
add with_structured_output support for Pydantic models, dicts and Enum #79

Merged (1 commit, Jul 30, 2024)

Conversation

@mattf (Collaborator) commented Jul 24, 2024

Bind a structured output schema to the model.

The schema can be:
1. a dictionary representing a JSON schema
2. a Pydantic object
3. an Enum

  1. If a dictionary is provided, the model will return a dictionary. Example:
json_schema = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {
            "type": "string",
            "description": "The setup of the joke",
        },
        "punchline": {
            "type": "string",
            "description": "The punchline to the joke",
        },
    },
    "required": ["setup", "punchline"],
}

structured_llm = llm.with_structured_output(json_schema)
structured_llm.invoke("Tell me a joke about NVIDIA")
# Output: {'setup': 'Why did NVIDIA go broke? The hardware ate all the software.',
#          'punchline': 'It took a big bite out of their main board.'}
  2. If a Pydantic schema is provided, the model will return a Pydantic object.
    Example:
from langchain_core.pydantic_v1 import BaseModel, Field
class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")

structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about NVIDIA")
# Output: Joke(setup='Why did NVIDIA go broke? The hardware ate all the software.',
#              punchline='It took a big bite out of their main board.')
  3. If an Enum is provided, all values must be strings, and the model will return
    an Enum object. Example:
import enum
class Choices(enum.Enum):
    A = "A"
    B = "B"
    C = "C"

structured_llm = llm.with_structured_output(Choices)
structured_llm.invoke("What is the first letter in this list? [X, Y, Z, C]")
# Output: <Choices.C: 'C'>

Note about streaming: Unlike other streaming responses, the streamed chunks
will be increasingly complete. They will not be deltas. The last chunk will
contain the complete response.

For instance with a dictionary schema, the chunks will be:

structured_llm = llm.with_structured_output(json_schema)
for chunk in structured_llm.stream("Tell me a joke about NVIDIA"):
    print(chunk)

# Output:
# {}
# {'setup': ''}
# {'setup': 'Why'}
# {'setup': 'Why did'}
# {'setup': 'Why did N'}
# {'setup': 'Why did NVID'}
# ...
# {'setup': 'Why did NVIDIA go broke? The hardware ate all the software.', 'punchline': 'It took a big bite out of their main board'}
# {'setup': 'Why did NVIDIA go broke? The hardware ate all the software.', 'punchline': 'It took a big bite out of their main board.'}

For instance with a Pydantic schema, the chunks will be:

structured_llm = llm.with_structured_output(Joke)
for chunk in structured_llm.stream("Tell me a joke about NVIDIA"):
    print(chunk)

# Output:
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline=''
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline='It'
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline='It took'
# ...
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline='It took a big bite out of their main board'
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline='It took a big bite out of their main board.'

For Pydantic schema and Enum, the output will be None if the response is
insufficient to construct the object or otherwise invalid. For instance,

llm = ChatNVIDIA(max_tokens=1)
structured_llm = llm.with_structured_output(Joke)
print(structured_llm.invoke("Tell me a joke about NVIDIA"))

# Output: None
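Because a Pydantic or Enum schema can yield None, callers may want to guard for it. Below is a hypothetical retry helper, not part of this PR; `invoke_with_retry` and its parameters are illustrative names:

```python
def invoke_with_retry(structured_llm, prompt, attempts=3):
    """Re-invoke when structured output comes back as None.

    Hypothetical convenience wrapper: `structured_llm` is any object
    returned by with_structured_output(); it only needs an .invoke()
    method. Returns the first non-None result, or None if every
    attempt fails to produce a valid object.
    """
    for _ in range(attempts):
        result = structured_llm.invoke(prompt)
        if result is not None:
            return result
    return None
```

A caller would then use `invoke_with_retry(structured_llm, "Tell me a joke about NVIDIA")` instead of calling `.invoke()` directly.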

@mattf requested review from dglogo and raspawar July 24, 2024 14:39
@mattf self-assigned this Jul 24, 2024
*,
method: Literal["function_calling", "json_mode"] = "function_calling",
@dglogo (Collaborator) commented Jul 28, 2024

Do I understand this correctly that we decided to remove the ability to specify method?
https://python.langchain.com/v0.2/docs/how_to/structured_output/#advanced-specifying-the-method-for-structuring-outputs

@mattf (Collaborator, Author):

The 'method' parameter is unnecessary and is ignored; the appropriate method is chosen automatically based on the type of schema provided.

We could enhance this by letting users force a particular implementation, but not all implementations work with all input types.
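The automatic selection described above can be sketched as a simple type dispatch. This is a hypothetical illustration of the idea, not the PR's actual code; the strategy names returned here are made up:

```python
import enum


def choose_method(schema):
    """Hypothetical dispatcher: pick a structuring strategy from the
    schema's type, so callers never need to pass a `method` argument.

    Returned strategy names ("json_schema", "guided_choice",
    "pydantic") are illustrative only.
    """
    if isinstance(schema, dict):
        return "json_schema"    # raw JSON schema -> dict output
    if isinstance(schema, type) and issubclass(schema, enum.Enum):
        return "guided_choice"  # Enum -> constrained string choice
    if isinstance(schema, type):
        return "pydantic"       # model class -> parsed object
    raise ValueError(f"unsupported schema type: {type(schema)!r}")
```

The point of the design is that each schema type admits exactly one sensible implementation, so asking the user to choose adds a failure mode without adding capability.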

@dglogo (Collaborator):

I could be wrong, but specifying JSON mode should force it to output JSON regardless. Also, this would be consistent with broader LangChain usage.

@mattf (Collaborator, Author):

The method param effectively requires the user to pick the implementation used to achieve their desired outcome of JSON or Pydantic responses from invoke/stream. We can do better than requiring the user to pick.

Also, per https://python.langchain.com/v0.2/docs/how_to/structured_output/#advanced-specifying-the-method-for-structuring-outputs (screenshot of the linked docs omitted):

"JSON mode" asserts it will not pass the schema to the model, but we will pass the schema to the model.

@dglogo (Collaborator):

ok sgtm!

@mattf requested a review from dglogo July 30, 2024 18:36
@dglogo (Collaborator) left a review comment:

Thanks @mattf!

@mattf merged commit bd7835d into main Jul 30, 2024
12 checks passed
@mattf deleted the add-chat-structured-output-support branch July 30, 2024 20:16