
add with_structured_output support for Pydantic models, dicts and Enum #79

Merged (1 commit, Jul 30, 2024)

Conversation

@mattf (Collaborator) commented Jul 24, 2024

Bind a structured output schema to the model.

The schema can be:
1. a dictionary representing a JSON schema
2. a Pydantic object
3. an Enum

  1. If a dictionary is provided, the model will return a dictionary. Example:
json_schema = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {
            "type": "string",
            "description": "The setup of the joke",
        },
        "punchline": {
            "type": "string",
            "description": "The punchline to the joke",
        },
    },
    "required": ["setup", "punchline"],
}

structured_llm = llm.with_structured_output(json_schema)
structured_llm.invoke("Tell me a joke about NVIDIA")
# Output: {'setup': 'Why did NVIDIA go broke? The hardware ate all the software.',
#          'punchline': 'It took a big bite out of their main board.'}
  2. If a Pydantic schema is provided, the model will return a Pydantic object.
    Example:
from langchain_core.pydantic_v1 import BaseModel, Field
class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")

structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about NVIDIA")
# Output: Joke(setup='Why did NVIDIA go broke? The hardware ate all the software.',
#              punchline='It took a big bite out of their main board.')
  3. If an Enum is provided, all values must be strings, and the model will return
    an Enum object. Example:
import enum
class Choices(enum.Enum):
    A = "A"
    B = "B"
    C = "C"

structured_llm = llm.with_structured_output(Choices)
structured_llm.invoke("What is the first letter in this list? [X, Y, Z, C]")
# Output: <Choices.C: 'C'>

Note about streaming: Unlike other streaming responses, the streamed chunks
will be increasingly complete. They will not be deltas. The last chunk will
contain the complete response.

For instance with a dictionary schema, the chunks will be:

structured_llm = llm.with_structured_output(json_schema)
for chunk in structured_llm.stream("Tell me a joke about NVIDIA"):
    print(chunk)

# Output:
# {}
# {'setup': ''}
# {'setup': 'Why'}
# {'setup': 'Why did'}
# {'setup': 'Why did N'}
# {'setup': 'Why did NVID'}
# ...
# {'setup': 'Why did NVIDIA go broke? The hardware ate all the software.', 'punchline': 'It took a big bite out of their main board'}
# {'setup': 'Why did NVIDIA go broke? The hardware ate all the software.', 'punchline': 'It took a big bite out of their main board.'}

For instance with a Pydantic schema, the chunks will be:

structured_llm = llm.with_structured_output(Joke)
for chunk in structured_llm.stream("Tell me a joke about NVIDIA"):
    print(chunk)

# Output:
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline=''
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline='It'
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline='It took'
# ...
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline='It took a big bite out of their main board'
# setup='Why did NVIDIA go broke? The hardware ate all the software.' punchline='It took a big bite out of their main board.'

For Pydantic schema and Enum, the output will be None if the response is
insufficient to construct the object or otherwise invalid. For instance,

llm = ChatNVIDIA(max_tokens=1)
structured_llm = llm.with_structured_output(Joke)
print(structured_llm.invoke("Tell me a joke about NVIDIA"))

# Output: None
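Because a Pydantic or Enum schema can yield None, callers may want to guard for it. Below is a hypothetical retry helper, not part of this PR; `invoke_with_retry` and its parameters are illustrative names:

```python
def invoke_with_retry(structured_llm, prompt, attempts=3):
    """Re-invoke when structured output comes back as None.

    Hypothetical convenience wrapper: `structured_llm` is any object
    returned by with_structured_output(); it only needs an .invoke()
    method. Returns the first non-None result, or None if every
    attempt fails to produce a valid object.
    """
    for _ in range(attempts):
        result = structured_llm.invoke(prompt)
        if result is not None:
            return result
    return None
```

A caller would then use `invoke_with_retry(structured_llm, "Tell me a joke about NVIDIA")` instead of calling `.invoke()` directly.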

@mattf requested review from dglogo and raspawar July 24, 2024 14:39
@mattf self-assigned this Jul 24, 2024
*,
method: Literal["function_calling", "json_mode"] = "function_calling",
@dglogo (Collaborator) commented Jul 28, 2024

Do I understand this correctly that we decided to remove the ability to specify method?
https://python.langchain.com/v0.2/docs/how_to/structured_output/#advanced-specifying-the-method-for-structuring-outputs

@mattf (Collaborator, Author):

The 'method' parameter is unnecessary and is ignored; the appropriate method is chosen automatically based on the type of schema provided.

We could enhance this by letting users force a particular implementation, but not all implementations work with all input types.
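The automatic selection described above can be sketched as a simple type dispatch. This is a hypothetical illustration of the idea, not the PR's actual code; the strategy names returned here are made up:

```python
import enum


def choose_method(schema):
    """Hypothetical dispatcher: pick a structuring strategy from the
    schema's type, so callers never need to pass a `method` argument.

    Returned strategy names ("json_schema", "guided_choice",
    "pydantic") are illustrative only.
    """
    if isinstance(schema, dict):
        return "json_schema"    # raw JSON schema -> dict output
    if isinstance(schema, type) and issubclass(schema, enum.Enum):
        return "guided_choice"  # Enum -> constrained string choice
    if isinstance(schema, type):
        return "pydantic"       # model class -> parsed object
    raise ValueError(f"unsupported schema type: {type(schema)!r}")
```

The point of the design is that each schema type admits exactly one sensible implementation, so asking the user to choose adds a failure mode without adding capability.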

@dglogo (Collaborator):

I could be wrong, but specifying JSON mode should force it to output JSON regardless. Also, this would be consistent with broader LangChain usage.

@mattf (Collaborator, Author):

The method param effectively requires the user to pick the implementation used to achieve their desired outcome of JSON or Pydantic responses from invoke/stream. We can do better than requiring the user to pick.

Also, per https://python.langchain.com/v0.2/docs/how_to/structured_output/#advanced-specifying-the-method-for-structuring-outputs (screenshot of the linked docs omitted):

"JSON mode" asserts it will not pass the schema to the model, but we will pass the schema to the model.

@dglogo (Collaborator):

ok sgtm!

@mattf requested a review from dglogo July 30, 2024 18:36
@dglogo (Collaborator) left a review comment:

Thanks @mattf!

@mattf merged commit bd7835d into main Jul 30, 2024
12 checks passed
@mattf deleted the add-chat-structured-output-support branch July 30, 2024 20:16