-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PDF parsing error handling #14
Comments
Nevermind, I jsut realized it caches everything. Still nice to have the error handling though |
I should probably make the cache handling more clear in the docs so folks are reassured. Great point re: error handling. Logging an error message and continuing is the way to go here. Also, if there's a PDF that's not parsing correctly that should be (and you're comfortable sharing), let me know! |
it was a fault on my end, the pdf was empty for some reason |
I also realized the search is quite slow for 1000s of PDFs. Is this because I'm using a relatively big model or just because they're in PDF format? Would it be faster if it was raw text or if I use a smaller model? |
Just here to +1, would be great to skip PDFs that have errors. It also currently breaks if it encounters any password protected PDFs (see below). Thank you for this very useful tool!
|
Hi, it would be useful if some error handling was added in case a PDF fails to parse. I earlier got this error after parsing 1000s of PDFs and had to restart from scratch (not a big deal of course I used a small model for embedding but annoying if a large openai model would have been used).
The text was updated successfully, but these errors were encountered: