Make it possible to send one or more attachments with no accompanying prompt #611

Closed
simonw opened this issue Nov 6, 2024 · 5 comments
Labels: attachments, enhancement (New feature or request)

Comments

simonw commented Nov 6, 2024

llm -m gpt-4o-audio-preview \
  -a https://static.simonwillison.net/static/2024/pelican-joke-request.mp3 '.'

Note that I need to provide a prompt of "." because LLM doesn't currently allow attachments with no prompt.

Originally posted by @simonw in #608 (comment)

simonw added the enhancement (New feature or request) label on Nov 6, 2024
simonw changed the title from "LLM cannot send attachments with no prompt" to "Make it possible to send one or more attachments with no accompanying prompt" on Nov 6, 2024

simonw commented Nov 6, 2024

I tried applying this change:

diff --git a/llm/cli.py b/llm/cli.py
index 941831c..d454719 100644
--- a/llm/cli.py
+++ b/llm/cli.py
@@ -250,7 +250,13 @@ def prompt(
                 bits.append(prompt)
             prompt = " ".join(bits)
 
-        if prompt is None and not save and sys.stdin.isatty():
+        if (
+            prompt is None
+            and not save
+            and sys.stdin.isatty()
+            and not attachments
+            and not attachment_types
+        ):
             # Hang waiting for input to stdin (unless --save)
             prompt = sys.stdin.read()
         return prompt

And ran this:

llm -m gpt-4o-audio-preview \
  -a https://static.simonwillison.net/static/2024/pelican-joke-request.mp3

But I got this error:

Error: Error code: 400 - {'error': {'message': "Invalid chat format. Expected 'text' field in text type content part to be a string.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

So it looks like this change needs to be supported by individual model plugins. In this case the OpenAI plugin is having a problem here:

if not prompt.attachments:
    messages.append({"role": "user", "content": prompt.prompt})

Because prompt.prompt can now be None in some cases.
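
Presumably, with only the cli.py change applied, the request body ends up containing a text content part whose text field is null, which is exactly what the 400 complains about. An illustrative sketch of the malformed messages list (not captured from the wire):

messages = [
    {
        "role": "user",
        "content": [
            # prompt.prompt is None here, so this part is invalid:
            {"type": "text", "text": None},
            {"type": "input_audio", "input_audio": {"data": "...", "format": "mp3"}},
        ],
    }
]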


simonw commented Nov 6, 2024

This fix works for OpenAI:

diff --git a/llm/cli.py b/llm/cli.py
index 941831c..d454719 100644
--- a/llm/cli.py
+++ b/llm/cli.py
@@ -250,7 +250,13 @@ def prompt(
                 bits.append(prompt)
             prompt = " ".join(bits)
 
-        if prompt is None and not save and sys.stdin.isatty():
+        if (
+            prompt is None
+            and not save
+            and sys.stdin.isatty()
+            and not attachments
+            and not attachment_types
+        ):
             # Hang waiting for input to stdin (unless --save)
             prompt = sys.stdin.read()
         return prompt
diff --git a/llm/default_plugins/openai_models.py b/llm/default_plugins/openai_models.py
index 81d0cc0..6944df6 100644
--- a/llm/default_plugins/openai_models.py
+++ b/llm/default_plugins/openai_models.py
@@ -346,9 +346,11 @@ class Chat(Model):
                     )
                     current_system = prev_response.prompt.system
                 if prev_response.attachments:
-                    attachment_message = [
-                        {"type": "text", "text": prev_response.prompt.prompt}
-                    ]
+                    attachment_message = []
+                    if prev_response.prompt.prompt:
+                        attachment_message.append(
+                            {"type": "text", "text": prev_response.prompt.prompt}
+                        )
                     for attachment in prev_response.attachments:
                         attachment_message.append(_attachment(attachment))
                     messages.append({"role": "user", "content": attachment_message})
@@ -362,7 +364,9 @@ class Chat(Model):
         if not prompt.attachments:
             messages.append({"role": "user", "content": prompt.prompt})
         else:
-            attachment_message = [{"type": "text", "text": prompt.prompt}]
+            attachment_message = []
+            if prompt.prompt:
+                attachment_message.append({"type": "text", "text": prompt.prompt})
             for attachment in prompt.attachments:
                 attachment_message.append(_attachment(attachment))
             messages.append({"role": "user", "content": attachment_message})

That ensures that llm -c "another about a walrus" works too.

With that diff in place, the example invocation produced the following when I ran llm logs -c --json:

[
  {
    "id": "01jbzyk4r4tjdcpjtmar7wymcf",
    "model": "gpt-4o-audio-preview",
    "prompt": null,
    "system": null,
    "prompt_json": {
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "input_audio",
              "input_audio": {
                "data": "...",
                "format": "mp3"
              }
            }
          ]
        }
      ]
    },
    "options_json": {},
    "response": "Why did the pelican get kicked out of the restaurant?\nBecause he had a very big bill!",
    "response_json": {
      "content": "Why did the pelican get kicked out of the restaurant?\nBecause he had a very big bill!",
      "finish_reason": "stop",
      "usage": {
        "completion_tokens": 20,
        "prompt_tokens": 55,
        "total_tokens": 75,
        "prompt_tokens_details": {
          "cached_tokens": 0,
          "audio_tokens": 44
        },
        "completion_tokens_details": {
          "reasoning_tokens": 0,
          "audio_tokens": 0,
          "accepted_prediction_tokens": 0,
          "rejected_prediction_tokens": 0
        }
      },
      "id": "chatcmpl-AQSjeGXYIwHvDsPxk4E2gs3pxUDKG",
      "object": "chat.completion.chunk",
      "model": "gpt-4o-audio-preview-2024-10-01",
      "created": 1730870350
    },
    "conversation_id": "01jbzyk4r25b0bk49n51temd5w",
    "duration_ms": 2887,
    "datetime_utc": "2024-11-06T05:19:07.703285",
    "conversation_name": "",
    "conversation_model": "gpt-4o-audio-preview",
    "attachments": [
      {
        "id": "baa308f3b1597f093eca7da609293c4b420061af1a88b0008e2bf338126b5e67",
        "type": "audio/mpeg",
        "path": null,
        "url": "https://static.simonwillison.net/static/2024/pelican-joke-request.mp3",
        "content_length": null
      }
    ]
  },
  {
    "id": "01jbzympq60bjaqb6dyad67jme",
    "model": "gpt-4o-audio-preview",
    "prompt": "another about a walrus",
    "system": null,
    "prompt_json": {
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "input_audio",
              "input_audio": {
                "data": "...",
                "format": "mp3"
              }
            }
          ]
        },
        {
          "role": "assistant",
          "content": "Why did the pelican get kicked out of the restaurant?\nBecause he had a very big bill!"
        },
        {
          "role": "user",
          "content": "another about a walrus"
        }
      ]
    },
    "options_json": {},
    "response": "Why do walruses never make a good basketball team?\nBecause they're always afraid of getting called for \"foul\"!",
    "response_json": {
      "content": "Why do walruses never make a good basketball team?\nBecause they're always afraid of getting called for \"foul\"!",
      "finish_reason": "stop",
      "usage": {
        "completion_tokens": 24,
        "prompt_tokens": 88,
        "total_tokens": 112,
        "prompt_tokens_details": {
          "cached_tokens": 0,
          "audio_tokens": 44
        },
        "completion_tokens_details": {
          "reasoning_tokens": 0,
          "audio_tokens": 0,
          "accepted_prediction_tokens": 0,
          "rejected_prediction_tokens": 0
        }
      },
      "id": "chatcmpl-AQSkT3NBfQD9c2h2jjOEjwt0y5Uyp",
      "object": "chat.completion.chunk",
      "model": "gpt-4o-audio-preview-2024-10-01",
      "created": 1730870401
    },
    "conversation_id": "01jbzyk4r25b0bk49n51temd5w",
    "duration_ms": 2715,
    "datetime_utc": "2024-11-06T05:19:59.049365",
    "conversation_name": "",
    "conversation_model": "gpt-4o-audio-preview",
    "attachments": []
  }
]


simonw commented Nov 6, 2024

I'm OK with this solution for the moment - I think it's reasonable to expect plugins to handle the case where there are attachments but no prompt.

I'll add that to the advanced plugin documentation.
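
For plugin authors, the pattern in the diff above amounts to: only emit a text content part when prompt.prompt is set. A minimal sketch of the guarded message builder (build_user_message is a hypothetical helper name; _attachment() is the same function the plugin diff above uses):

def build_user_message(prompt):
    # No attachments: a plain string is the whole content.
    if not prompt.attachments:
        return {"role": "user", "content": prompt.prompt}
    content = []
    # Skip the text part entirely when there is no accompanying prompt,
    # so the API never sees {"type": "text", "text": None}.
    if prompt.prompt:
        content.append({"type": "text", "text": prompt.prompt})
    for attachment in prompt.attachments:
        content.append(_attachment(attachment))
    return {"role": "user", "content": content}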

simonw closed this as completed in 0cc4072 on Nov 6, 2024

simonw commented Nov 6, 2024

I checked and this works for images too:

[Attached image; text reads: "Joke about pelicans and walruses living together"]

llm -m gpt-4o-mini -a joke-about-pelicans-and-walruses.png

Why did the pelicans and walruses decide to live together?

Because they wanted to create a fin-tastic community—where everyone can flap around and haul out together!


simonw commented Nov 6, 2024

llm-gemini breaks on this:

llm -m gemini-1.5-flash-latest -a joke-about-pelicans-and-walruses.png

Error: * GenerateContentRequest.contents[0].parts[0].data: required oneof field 'data' must have one initialized field
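
The fix there is presumably the same guard on the Gemini side: skip the text part when there is no prompt, so no part is sent with its data oneof uninitialized. A hypothetical sketch, not llm-gemini's actual source (the inline_data shape follows the Gemini REST API; resolve_type() and base64_content() are llm's Attachment helpers):

parts = []
# Only add a text part when there actually is a prompt.
if prompt.prompt:
    parts.append({"text": prompt.prompt})
for attachment in prompt.attachments:
    parts.append(
        {
            "inline_data": {
                "mime_type": attachment.resolve_type(),
                "data": attachment.base64_content(),
            }
        }
    )
body = {"contents": [{"role": "user", "parts": parts}]}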

simonw added a commit to simonw/llm-gemini that referenced this issue Nov 6, 2024
simonw added a commit that referenced this issue Nov 14, 2024
simonw added a commit that referenced this issue Nov 17, 2024
simonw added a commit that referenced this issue Nov 18, 2024