When crawling the net, parse pdf documents as well #514
Comments
This would probably be a plugin that uses a Python PDF parsing library analogous to pdf2text. (I'm not sure how to label the issue, or whether I lack the permissions to do so.)
Agreed with @Boostrix on this one. PDF parsing is an extraneous task, and isn't as straightforward as it ought to be. It would be better to assign that to developers who are skilled in PDF parsing.
PR #3031 already supports plain-text-based PDF processing. It would also provide the option to support arguments, such as searching a PDF file by author, date, pages, etc., which would return a list of matching pages. A higher-level command would probably be an adaptation of browse_website, or a way to search specifically for PDF files using different search engines/APIs (think research servers, as per #826), as per: #503 (comment). This is probably covered by #2730. Plugin candidate once the dust settles with #3652.
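To make the "return a list of pages/matches" idea concrete, here is a rough sketch of a page search over text that has already been extracted. It is not taken from PR #3031; the function name `find_pages` and the input shape (one string per page) are assumptions made purely for illustration.

```python
def find_pages(pages: list[str], query: str) -> list[tuple[int, str]]:
    """Return (1-based page number, matching line) pairs for a query.

    Hypothetical helper: `pages` is assumed to hold one already-extracted
    text string per PDF page. Matching is a case-insensitive substring test.
    """
    needle = query.casefold()
    matches = []
    for number, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if needle in line.casefold():
                matches.append((number, line.strip()))
    return matches
```

Filtering by author or date would need the PDF's metadata rather than its page text, so that part is left out of the sketch.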
This issue was closed automatically because it has been stale for 10 days with no activity.
Duplicates
Summary 💡
When crawling the web to do market research, many of the links turn out to be PDF documents. It would be great if Auto GPT had a built-in ability to parse those PDFs and feed the text to GPT-4 for analysis.
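A minimal sketch of what this could look like, assuming the crawler already has the raw response body and its Content-Type header. The helper names (`looks_like_pdf`, `extract_pdf_text`, `page_text`) are hypothetical and are not part of Auto GPT. Text extraction here relies on the third-party `pypdf` package, which is one possible choice, not something the project has settled on.

```python
import io

PDF_MAGIC = b"%PDF-"  # PDF files start with this signature


def looks_like_pdf(body: bytes, content_type: str = "") -> bool:
    """Guess whether a downloaded response is a PDF.

    Checks the Content-Type header and, because servers sometimes
    mislabel files, the leading bytes of the body as well.
    """
    return "application/pdf" in content_type.lower() or body.startswith(PDF_MAGIC)


def extract_pdf_text(body: bytes) -> str:
    """Extract plain text from PDF bytes, one page after another."""
    # Assumption: pypdf is installed. It is imported lazily so the rest
    # of the sketch works without it.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(body))
    # extract_text() may yield nothing for scanned or image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def page_text(body: bytes, content_type: str, html_to_text) -> str:
    """Route a response to the PDF extractor or to the existing HTML path.

    `html_to_text` stands in for whatever the crawler already uses to
    turn an HTML page into text.
    """
    if looks_like_pdf(body, content_type):
        return extract_pdf_text(body)
    return html_to_text(body)
```

The returned text could then be chunked and summarised the same way as scraped web pages. Scanned PDFs without a text layer would come back empty and would need OCR, which is outside this sketch.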
Examples 🌈
Motivation 🔦
This way Auto GPT can do the market research task far better than it currently can.