Examples builder #93

Open
samuelcolvin opened this issue Nov 25, 2024 · 4 comments · May be fixed by #534
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@samuelcolvin
Member

We plan to add an examples builder which would take a sequence of things (e.g. pydantic models, dataclasses, dicts etc.) and serialize them.

Usage would be something like

from pydantic_ai import format_examples

@agent.system_prompt
def foobar():
    # Double quotes outside so the nested 'xml' literal parses on Python versions before 3.12.
    return f"the examples are:\n{format_examples(examples, dialect='xml')}"

The suggestion is that LLMs find it particularly easy to read XML, so we'll offer XML (among other formats) as a way to format the examples.

By default, should it use

"""
<example>
  <input>
    show me values greater than 5
  </input>
  <sql>
    SELECT * FROM table WHERE value > 5
  </sql>
</example>
...
"""

or

"""
<example input="show me values greater than 5" sql="SELECT * FROM table WHERE value > 5" />
...
"""

?
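To make the two candidate layouts concrete, here is a minimal sketch using only the standard library. The name `format_examples_sketch` and its `style` parameter are hypothetical and are not the proposed `format_examples` API; the sketch only handles dicts and dataclass instances. Note that a standard XML serializer escapes the `>` in the SQL as `&gt;` in both layouts, which may matter when deciding how readable the result is for a model.

```python
from dataclasses import asdict, is_dataclass
from typing import Any, Iterable, Literal
from xml.etree import ElementTree as ET


def format_examples_sketch(
    examples: Iterable[Any],
    style: Literal["elements", "attributes"] = "elements",
) -> str:
    """Hypothetical helper: render each example as one <example> element."""
    rendered = []
    for example in examples:
        # Accept dataclass instances or plain dicts; other types are out of scope here.
        if is_dataclass(example) and not isinstance(example, type):
            fields = asdict(example)
        else:
            fields = dict(example)
        node = ET.Element("example")
        for name, value in fields.items():
            if style == "attributes":
                node.set(name, str(value))  # second layout: one attribute per field
            else:
                ET.SubElement(node, name).text = str(value)  # first layout: child elements
        ET.indent(node)  # Python 3.9+; no effect when the element has no children
        rendered.append(ET.tostring(node, encoding="unicode"))
    return "\n".join(rendered)


examples = [
    {"input": "show me values greater than 5", "sql": "SELECT * FROM table WHERE value > 5"}
]
print(format_examples_sketch(examples))
print(format_examples_sketch(examples, style="attributes"))
```

Unlike the hand-written example above, this sketch keeps each value on the same line as its tags, because adding whitespace inside the element would change the text content.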

@jxnl

jxnl commented Nov 25, 2024

I'm 99% sure it's the first one.

@jxnl

jxnl commented Nov 25, 2024

[Screenshot, 2024-11-25 at 2:39:08 PM]

When you ask Claude to generate new meeting notes, this is basically what it does: very, very simple XML.

@sydney-runkle sydney-runkle added the enhancement New feature or request label Dec 5, 2024
@samuelcolvin samuelcolvin added the good first issue Good for newcomers label Dec 11, 2024
@yashrathi-git

Hi @samuelcolvin
I would like to work on this as it is a good first issue.

@josca42

josca42 commented Dec 13, 2024

On a related note, I often want to use a pydantic object as input to an LLM. This involves converting the object to a string representation that is easy for the LLM to read. Having an llm_str method would be really handy: you could get a pydantic object as output from an LLM call using Instructor or PydanticAI, pass that object to another LLM function, and call llm_str on it there.

Something along those lines has been implemented here: https://github.com/AnswerDotAI/toolslm

Judging by the Anthropic documentation and prompt generator, LLMs seem to prefer a very simple form of XML. Currently I use the following method:

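def llm_str(self) -> str:
    # Intended as a method on a pydantic BaseModel subclass.
    root_tag = self.__class__.__name__
    lines = [f"<{root_tag}>"]

    # Read the field names from the class; class-level access works across pydantic v2 releases.
    for field_name in type(self).model_fields:
        value = getattr(self, field_name)
        if isinstance(value, BaseModel):
            # Nested models are rendered recursively with the same method.
            lines.append(value.llm_str())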
        elif isinstance(value, list):
            lines.append(f"<{field_name}>")
            for item in value:
                if isinstance(item, BaseModel):
                    lines.append(item.llm_str())
                else:
                    lines.append(f"- {item}")
            lines.append(f"</{field_name}>")
        else:
            lines.append(f"{field_name}: {value}")
    lines.append(f"</{root_tag}>")
    return "\n".join(lines)
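The issue also lists dataclasses as an input type, so here is a sketch of the same rendering idea as a free function over dataclass instances, using only the standard library. The `Attendee` and `Meeting` classes are made up for illustration, and this is not part of pydantic or PydanticAI.

```python
from dataclasses import dataclass, field, fields, is_dataclass


def llm_str(obj) -> str:
    """Render a dataclass instance in the simple tag-and-lines style used above."""
    root_tag = type(obj).__name__
    lines = [f"<{root_tag}>"]
    for f in fields(obj):
        value = getattr(obj, f.name)
        if is_dataclass(value):
            # Nested dataclasses are rendered recursively.
            lines.append(llm_str(value))
        elif isinstance(value, list):
            lines.append(f"<{f.name}>")
            for item in value:
                lines.append(llm_str(item) if is_dataclass(item) else f"- {item}")
            lines.append(f"</{f.name}>")
        else:
            lines.append(f"{f.name}: {value}")
    lines.append(f"</{root_tag}>")
    return "\n".join(lines)


# Hypothetical example types, for illustration only.
@dataclass
class Attendee:
    name: str


@dataclass
class Meeting:
    title: str
    topics: list[str] = field(default_factory=list)
    attendees: list[Attendee] = field(default_factory=list)


print(llm_str(Meeting("Planning", ["budget"], [Attendee("Ada")])))
```

As in the method above, values are written verbatim, so a value containing `<` or `&` would not be escaped. That keeps the text simple for a model to read, but the result is not guaranteed to be well-formed XML.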

But you could also have a general pydantic to xml converter. Something like this:

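from typing import Optional

from lxml import etree
from pydantic import BaseModel


def pydantic_obj_to_xml_repr(obj: BaseModel, tag_name: Optional[str] = None) -> str:
    # Default the root tag to the lowercased class name.
    if tag_name is None:
        tag_name = obj.__class__.__name__.lower()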

    def to_element(o: BaseModel, name: str) -> etree._Element:
        e = etree.Element(name)
        # Read the field names from the class rather than the instance.
        for field_name in type(o).model_fields:
            value = getattr(o, field_name)
            if isinstance(value, BaseModel):
                e.append(to_element(value, field_name))
            elif isinstance(value, list):
                sub = etree.SubElement(e, field_name)
                for item in value:
                    if isinstance(item, BaseModel):
                        sub.append(to_element(item, "item"))
                    else:
                        item_el = etree.SubElement(sub, "item")
                        item_el.text = str(item)
            elif isinstance(value, dict):
                sub = etree.SubElement(e, field_name)
                for k, v in value.items():
                    if isinstance(v, BaseModel):
                        sub.append(to_element(v, k))
                    else:
                        kv_el = etree.SubElement(sub, k)
                        kv_el.text = str(v)
            else:
                sub = etree.SubElement(e, field_name)
                if value is not None:
                    sub.text = str(value)
        return e

    root = to_element(obj, tag_name)
    return etree.tostring(root, pretty_print=True, encoding="unicode")
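One caveat with the dict branch above: it uses each dict key as an element name, and keys that contain spaces, start with a digit, or are not strings are not valid XML names, so they would likely be rejected or produce malformed output. A possible alternative, sketched here with the standard library rather than lxml, is to emit a fixed tag and store the key in an attribute. The `entry` tag, the `key` attribute, and the helper name `dict_to_element` are assumptions made for this sketch.

```python
from xml.etree import ElementTree as ET


def dict_to_element(name: str, mapping: dict) -> ET.Element:
    """Render a dict as <name><entry key="...">value</entry>...</name>."""
    parent = ET.Element(name)
    for key, value in mapping.items():
        # The key goes into an attribute, so any string form is allowed and gets escaped.
        entry = ET.SubElement(parent, "entry", {"key": str(key)})
        entry.text = str(value)
    return parent


element = dict_to_element("scores", {"first try": 1, 2: "a&b"})
print(ET.tostring(element, encoding="unicode"))
```

The trade-off is slightly noisier output in exchange for well-formed XML regardless of what the keys look like.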

@mszenfeld mszenfeld linked a pull request Dec 23, 2024 that will close this issue