Table parsing doesn't work #191
Comments
Hi @netapy, thank you for reaching out! Currently, table support parameters only work with
If you wish to enable table support for PDFs only, you can set
But more generally, if you want to enable table support for different file types, I would suggest you use the
If table support works as expected, you can see
Thank you for that complete answer. I use the following PDF document as a test: https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf
And here is the output:
@netapy Thanks for pointing it out! I did reproduce on that PDF and got the same result. I also tried to run with
Sorry that didn't work. This is a long-standing problem with our detectron2 model, and we are building a quantized version of
In the meantime, I would still encourage you to run yolox locally (start the api with
Could you update the readme to say which env var needs to be set to have it use the yolox model?
Updating readme per the comment in this GH issue: #191

### Summary

* add documentation to `hi_res_model_name` parameter in readme
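For readers looking for a concrete starting point, here is a minimal sketch of how the `hi_res_model_name` parameter might be passed as a form field when calling a locally running API. Only `hi_res_model_name` and the value `yolox` come from this thread; the endpoint URL, the `files` field name, and the helper `build_curl_command` are assumptions for illustration, so check the readme for the exact request shape.

```python
import shlex


def build_curl_command(pdf_path, api_url, model_name="yolox"):
    """Build a curl argument list that posts a file with hi_res_model_name set.

    Assumptions (not confirmed in this thread): the API accepts multipart
    form uploads, and the upload field is called "files".
    """
    return [
        "curl", "-X", "POST", api_url,
        "-F", f"files=@{pdf_path}",                # assumed upload field name
        "-F", f"hi_res_model_name={model_name}",   # parameter named in the readme update
    ]


# Hypothetical local endpoint and file name, used only for illustration.
cmd = build_curl_command("table.pdf", "http://localhost:8000/general/v0/general")
print(shlex.join(cmd))
```

Building the command as a list keeps quoting explicit, and `shlex.join` produces a string that can be pasted into a shell.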
Hi! Thanks for the update. It does parse tables now; however, it's really not that great at doing it. Here is my code using YOLOX:
And here is the HTML table I get from the sample we talked about (https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf):
Why is an ML model needed? Are there no other ways of parsing written tables inside a PDF? Thanks again! :)
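For anyone comparing results on the same sample, it can help to turn the returned HTML table into plain rows so that missing or merged cells are easy to spot. The following is a minimal sketch using only Python's standard library; the class `TableRows`, the helper `html_table_to_rows`, and the sample HTML are hypothetical and are not the output from this thread.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample, not the table returned in this thread.
sample = (
    "<table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>Apples</td><td>3</td></tr></table>"
)
rows = html_table_to_rows(sample)
print(rows)
```

Rows of unequal length in the result are a quick hint that the model split or merged cells.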
Hi there! Right now I can't think of any other way to help improve the parsing, but I will raise this issue with our Engineering team and see how we can help :)
Hi @netapy, may I ask if you are running the table parsing on an M1/M2 chip?
Hi Yuming, are there easy steps to automate this using the docker image, or shall I dig into the container bash?
Thanks for following up! I actually tried it myself and got the
Here are the steps I used to reproduce:
Hi, the table parsing doesn't seem to work at all in my case.
I tried with multiple files (.pdf, .jpeg, .docx...)
It returns most cells as `UncategorizedText` and a few as `Title`.
I call the API using the following parameters:
and
Thanks!
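A quick way to check the symptom described above is to count the element types in the JSON response. This is a minimal sketch that assumes the response is a list of objects with a `type` key and that a successfully detected table would appear with the type `Table`; both are assumptions, and the sample data and helper names are hypothetical.

```python
from collections import Counter


def summarize_element_types(elements):
    """Count how many elements of each type the API returned."""
    return Counter(el.get("type", "<missing>") for el in elements)


def has_table(elements, table_type="Table"):
    """Return True if any element carries the (assumed) table type."""
    return any(el.get("type") == table_type for el in elements)


# Hypothetical response mirroring the symptom: cells come back as loose text.
elements = [
    {"type": "UncategorizedText", "text": "Cell A"},
    {"type": "UncategorizedText", "text": "Cell B"},
    {"type": "Title", "text": "Header"},
]
counts = summarize_element_types(elements)
print(counts, has_table(elements))
```

If the count shows only text-like types and no table type, the table parameters were probably not applied to that file type.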