DALL-E Tool not responding with image #2246

Closed
2 tasks done
perzeuss opened this issue Jan 26, 2024 · 9 comments
Labels
🐞 bug Something isn't working

Comments

@perzeuss
Contributor

perzeuss commented Jan 26, 2024

Self Checks

Dify version

0.5.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  • Create an Agent Assistant
  • Add the DALL-E 3 tool (set up auth if you have not already)
    • Same issue with DALLE-2, so you can also choose this to reproduce the bug
  • Define the system prompt "You are an Agent capable of generating images with the tool DALL-E. Generated images will be sent to the user as an answer and not via mail."
    • you can keep the prompt empty, I did this just for debugging
  • Choose the model "gpt-4-turbo-preview" and select "Precise" preset
    • Same issue with other models e.g. gpt-3.5-turbo
  • Send the message "generate an image with a desk"

✔️ Expected Behavior

Agent responds with an image generated by DALL-E 3

❌ Actual Behavior

The agent responds with an empty image.

Tool data:

Request to dalle3
{"dalle3": {"prompt": "desk"}}
Response from dalle3
image has been created and sent to user already, you should tell user to check it now.

Hint: I can see at https://platform.openai.com/usage that the DALL-E API has been called and images have been generated.

(screenshot)

@perzeuss perzeuss added the 🐞 bug Something isn't working label Jan 26, 2024

dosubot bot commented Jan 26, 2024

🤖

Hey @perzeuss, great to cross paths with you again! 🚀 Thanks for providing such a detailed report about the DALL-E issue you're experiencing. I'm diving into it now and will be back with you in a jiffy with some answers. 😺 Hang tight!

@perzeuss
Contributor Author

I tested it on cloud and the tool does respond with the same text, but the image is visible here:
(screenshot)

@perzeuss
Contributor Author

I did more debugging.

API Logs:

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/images/generations "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

It looks like the image has been generated. I also see that the frontend tries to fetch the image from /files/tools/76e861ee-45b4-42d0-b412-dee37dea5f51.png?timestamp=1706289633&nonce=...&sign=... but the server responds with 404.
I do not see any files in the configured S3 bucket. Should the tool place DALL-E images there?
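
A 404 on a path like this can come from two places: the reverse proxy has no route for it, or the API itself cannot find the file. Below is a minimal sketch for telling the two apart, assuming the API container's port is reachable from wherever the script runs. The helper names (`fetch_status`, `diagnose`) are hypothetical and not part of Dify.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # 4xx/5xx responses still carry a status code.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return None


def diagnose(proxy_status, api_status):
    """Compare the status seen through the proxy with the one seen on the API."""
    if api_status is None:
        return "api unreachable"
    if proxy_status == 200:
        return "ok"
    if proxy_status == 404 and api_status == 200:
        # The API serves the file, so the proxy is not forwarding the path.
        return "proxy route missing"
    if proxy_status == 404 and api_status == 404:
        return "file missing on api"
    return "inconclusive"
```

For example, call `diagnose(fetch_status(proxy_base + path), fetch_status(api_base + path))`, where `path` is the full signed /files/tools/... path copied from the browser's network tab. Here `proxy_base` and `api_base` are placeholders for your own setup.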


dosubot bot commented Jan 26, 2024

🤖

Based on the information you've provided, it seems like the issue might be related to how the DALL-E tool is handling the response from the OpenAI API. In the provided code, the response from the OpenAI API is expected to be in 'b64_json' format. However, the 'b64_json' response format is not a standard response format for the OpenAI API.

The OpenAI API typically returns the generated image in a binary format, which is then base64 encoded by the client. In the provided code, it seems like the response is expected to be a base64 encoded JSON object, which might not be the case.

Here's the problematic part of the code:

response = client.images.generate(
    prompt=prompt,
    model='dall-e-3',
    size=size,
    n=n,
    style=style,
    quality=quality,
    response_format='b64_json'  # This might not be a valid response format
)

result = []
for image in response.data:
    result.append(self.create_blob_message(
        blob=b64decode(image.b64_json),
        meta={'mime_type': 'image/png'},
        save_as=self.VARIABLE_KEY.IMAGE.value,
    ))

The 'b64_json' in the 'b64decode(image.b64_json)' line might not be the correct attribute to access the base64 encoded image data. You might need to check the actual structure of the response object to find the correct attribute.

Please note that this is just a hypothesis based on the provided code and the typical behavior of the OpenAI API. The actual issue might be different based on the specific implementation of the DALL-E tool and the OpenAI API client.

As for the expected format of the response from the DALL-E tool in the Dify codebase, it is a list of ToolInvokeMessage objects, one per generated image. The image data is stored in the blob field of each ToolInvokeMessage object as base64-decoded binary data. The meta field contains a dictionary whose mime_type key has the value 'image/png', which indicates that the image data is in PNG format.

Here is the relevant code that constructs the response:

result = []

for image in response.data:
    result.append(self.create_blob_message(
        blob=b64decode(image.b64_json),
        meta={'mime_type': 'image/png'},
        save_as=self.VARIABLE_KEY.IMAGE.value,
    ))

return result

In this code, response.data is the response from the OpenAI DALL-E tool. For each image in the response, a ToolInvokeMessage object is created with the base64 decoded image data and the mime type 'image/png'. These ToolInvokeMessage objects are then added to the result list which is returned as the final response.
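
The decoding step described above can be checked in isolation. The sketch below uses only the standard library and a made-up payload (a PNG signature followed by filler bytes), not a real API response. The function names are hypothetical.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_image_payload(b64_payload):
    """Decode a base64 string into raw bytes, rejecting malformed input."""
    return base64.b64decode(b64_payload, validate=True)


def looks_like_png(blob):
    """Return True if the bytes begin with the PNG signature."""
    return blob.startswith(PNG_SIGNATURE)


# Made-up stand-in for the base64 string a tool would receive.
fake_blob = PNG_SIGNATURE + b"not a real image body"
payload = base64.b64encode(fake_blob).decode("ascii")

decoded = decode_image_payload(payload)
print(looks_like_png(decoded))
```

If the decoded bytes start with the PNG signature, the blob itself is probably intact, and the problem more likely lies in how the file is stored or served.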

You can find more details in the Dify codebase, specifically in the dalle2.py and dalle3.py files.

I hope this helps in identifying the issue. If you need further assistance, please provide more details about the specific implementation of the DALL-E tool and the OpenAI API client in your setup.


@perzeuss
Contributor Author

perzeuss commented Jan 26, 2024

Fixed for me by adding a missing entry to the nginx config (etc/nginx/conf.d/default.conf):

    location /files {
      proxy_pass http://api:5001;
      include proxy.conf;
    }

I used https://github.com/BorisPolonsky/dify-helm to set up dify and the nginx setup is out of date there. I opened a PR to resolve this: BorisPolonsky/dify-helm#29
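
To check whether a given nginx config already has this entry, a quick text scan may be enough. This is a minimal sketch, assuming the config is available as a string. The function name and the sample configs are made up for illustration, and the regex only recognises plain prefix `location` blocks written on one line, not every form nginx accepts.

```python
import re

# Matches an uncommented "location /files {" (optionally with "=" or "^~").
FILES_LOCATION = re.compile(
    r"^\s*location\s+(?:=\s*|\^~\s*)?/files/?\s*\{",
    re.MULTILINE,
)


def has_files_location(conf_text):
    """Return True if the config text declares a location block for /files."""
    return FILES_LOCATION.search(conf_text) is not None


# Made-up sample configs, not taken from any real deployment.
with_files = """
server {
    listen 80;
    location /files {
        proxy_pass http://api:5001;
        include proxy.conf;
    }
}
"""

without_files = """
server {
    listen 80;
    # location /files {
    location /api {
        proxy_pass http://api:5001;
    }
}
"""

print(has_files_location(with_files), has_files_location(without_files))
```

In practice, the string would be read from the config file in use, for example the default.conf mentioned above.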

@Kvieta1990

Thank you so much for the quick fix and I can confirm that in my case it is working like a charm!

FYI, I start the server with Docker Compose. I added the missing section suggested by @perzeuss to the docker/nginx/conf.d/default.conf file (following the directory structure of the main repo) and reran docker compose.

@crazywoola
Member

Define the system prompt "You are an Agent capable of generating images with the tool DALL-E. Generated images will be sent to the user as an answer and not via mail."

How about removing this from the system prompt? It seems unnecessary to me.

@perzeuss
Contributor Author

Define the system prompt "You are an Agent capable of generating images with the tool DALL-E. Generated images will be sent to the user as an answer and not via mail."

How about removing this from the system prompt? It seems unnecessary to me.

I only did that for testing, because I initially thought the agent was doing the wrong thing because of the tool response "image has been created and sent to user already, you should tell user to check it now."

However, the issue can be closed. We just had to update the nginx config, and there was no migration hint about that.
