Doc Extractor does not support the parsing of markdown format files #9757

Closed
5 tasks done
yaoice opened this issue Oct 24, 2024 · 4 comments
Labels
🐞 bug Something isn't working

Comments

@yaoice
Contributor

yaoice commented Oct 24, 2024

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.10.1

Cloud or Self Hosted

Self Hosted (Docker), Self Hosted (Source)

Steps to reproduce

  1. Upload a markdown file named test1.md.
  2. Talk to the bot.

✔️ Expected Behavior

Properly parse markdown files.

❌ Actual Behavior

The run fails with the error: Run failed: Unsupported MIME type: application/octet-stream

@dosubot dosubot bot added the 🐞 bug Something isn't working label Oct 24, 2024

dosubot bot commented Oct 24, 2024

The error suggests that the uploaded markdown file's MIME type is being detected as application/octet-stream instead of text/markdown. The Doc Extractor does have specific handling for markdown files when the MIME type is correctly identified as text/markdown [1].

To resolve this, check that the file upload process assigns the correct MIME type to markdown files. That may mean reviewing the upload configuration or the environment the file is uploaded from. When the type cannot be determined, it may default to application/octet-stream, which the Doc Extractor does not support.
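As a rough illustration of the idea above, here is a minimal Python sketch of an extension-based fallback for files that arrive with a generic MIME type. This is not Dify's actual code: `resolve_mime_type`, `GENERIC_MIME_TYPES`, and `EXTENSION_FALLBACKS` are hypothetical names made up for this example, and only the standard-library `mimetypes` and `pathlib` modules are used.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# MIME types that carry no real information about the file contents.
GENERIC_MIME_TYPES = {"", "application/octet-stream"}

# Hypothetical overrides for extensions that some environments fail to map.
EXTENSION_FALLBACKS = {
    ".md": "text/markdown",
    ".markdown": "text/markdown",
}


def resolve_mime_type(filename: str, reported: Optional[str]) -> str:
    """Return a usable MIME type for an uploaded file.

    Trust the reported type unless it is missing or generic; otherwise
    fall back to the file extension, then to the standard library guess.
    """
    if reported and reported not in GENERIC_MIME_TYPES:
        return reported

    suffix = Path(filename).suffix.lower()
    if suffix in EXTENSION_FALLBACKS:
        return EXTENSION_FALLBACKS[suffix]

    # mimetypes.guess_type returns (type, encoding); type may be None.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    print(resolve_mime_type("test1.md", "application/octet-stream"))
```

The explicit extension map is checked before `mimetypes.guess_type` because the standard-library lookup can vary with the Python version and the host's MIME database, which may be one reason the same upload behaves differently across deployments.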


@crazywoola
Member

crazywoola commented Oct 24, 2024


It seems to be working for me on the Cloud version.

@yaoice
Contributor Author

yaoice commented Oct 24, 2024

> It seems to be working for me on the Cloud version.

@crazywoola I tested with both the docker-compose deployment and the source-code deployment. Can you try those?

@crazywoola
Member

Can you try PR #9543? If you have further questions, you can DM me on WeChat: crazyphage.
