Entity Localization Bug: Symbol. Bounding box includes parts of figure in paper 1901.10159v1 #185

Open
andrewhead opened this issue Dec 30, 2020 · 1 comment
Labels
bad-entity-detection (An issue or task related to an entity that was detected in the wrong place); bug (Something isn't working); entity-localization (An issue or task related to entity localization); symbols (An issue or task related to symbols)

Comments

@andrewhead
Contributor

Description: The bounding boxes of some symbols stretch to include pixels from figures. Here is a case where this happens for several symbols on the same page.

image

URL (optional): If you run a local version of the user interface connected to the dev_symbol_failure schema, you can observe this issue at http://localhost:3001/?file=https://arxiv.org/pdf/1901.10159v1.pdf&preset=demo

How to fix (optional): There are at least two potential fixes to this issue, plus an open question that should be investigated first:

  1. Outlier detection: exclude bounding boxes that are as tall as the ones in this example.
  2. Decolorize figures: replace figure assets with equivalent-size images that have no hue (i.e., are just black and white). That way, when our entity localization code runs, it will not detect colors within the figure.
  3. (Unknown) It is strange that this bug occurs at all: no colors should be detected in the figure, because the figure should stay the same before and after colorization. So the first debugging task should be to determine why the figure appears in the image diffs between the uncolorized and colorized versions of the paper, and to see whether there is a way to prevent the figure from appearing in the diff.
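
As a starting point for the debugging task in item 3, here is a minimal sketch of checking whether any differing pixels fall inside a figure's region. It is not the pipeline's actual diffing code: the image representation (nested lists of RGB tuples), the `(left, top, width, height)` region layout, and the function names `changed_pixels` and `changed_pixels_in_region` are all assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B)
Image = List[List[Pixel]]           # rows of pixels; hypothetical representation
Region = Tuple[int, int, int, int]  # (left, top, width, height); hypothetical layout


def changed_pixels(before: Image, after: Image) -> List[Tuple[int, int]]:
    """Return (x, y) for every pixel that differs between two renderings."""
    same_shape = len(before) == len(after) and all(
        len(row_b) == len(row_a) for row_b, row_a in zip(before, after)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return [
        (x, y)
        for y, (row_b, row_a) in enumerate(zip(before, after))
        for x, (pixel_b, pixel_a) in enumerate(zip(row_b, row_a))
        if pixel_b != pixel_a
    ]


def changed_pixels_in_region(
    before: Image, after: Image, region: Region
) -> List[Tuple[int, int]]:
    """Return the differing pixels that fall inside the given region."""
    left, top, width, height = region
    return [
        (x, y)
        for x, y in changed_pixels(before, after)
        if left <= x < left + width and top <= y < top + height
    ]


# Toy example: a 4x3 white page where one text pixel and one figure pixel change.
WHITE = (255, 255, 255)
RED = (255, 0, 0)
page_before = [[WHITE] * 4 for _ in range(3)]
page_after = [row[:] for row in page_before]
page_after[0][0] = RED  # a colorized symbol pixel, expected to change
page_after[2][3] = RED  # a pixel inside the figure, not expected to change
figure_region = (2, 1, 2, 2)
print(changed_pixels_in_region(page_before, page_after, figure_region))  # [(3, 2)]
```

If a check like this returns a non-empty list for a real page, the figure itself is being rendered differently between the two versions, which would point at the rendering step rather than at the localization step.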
@andrewhead added the bug, entity-localization, symbols, and bad-entity-detection labels Dec 30, 2020
@andrewhead added this to the LaTeX Updates for Alpha milestone Dec 30, 2020
@andrewhead
Contributor Author

Note that sentence bounding boxes also expand to include parts of the figure.
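
Since more than one entity type is affected, an outlier filter along the lines of fix 1 would probably need to work on boxes of any kind. The following is only a sketch under stated assumptions: boxes are `(left, top, width, height)` tuples, the function name `split_tall_outliers` is hypothetical, and the `max_ratio` threshold of 3.0 is an arbitrary placeholder that would need tuning per entity type.

```python
from statistics import median
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height); assumed layout


def split_tall_outliers(
    boxes: List[Box], max_ratio: float = 3.0
) -> Tuple[List[Box], List[Box]]:
    """Split boxes into (kept, rejected) by height relative to the median height.

    A box is rejected when its height exceeds max_ratio times the median
    height of all boxes passed in. max_ratio is a hypothetical default.
    """
    if not boxes:
        return [], []
    limit = median(box[3] for box in boxes) * max_ratio
    kept = [box for box in boxes if box[3] <= limit]
    rejected = [box for box in boxes if box[3] > limit]
    return kept, rejected


# Toy example: three symbol-sized boxes and one that stretches into a figure.
boxes = [(10, 10, 8, 12), (30, 10, 9, 11), (50, 10, 8, 13), (70, 10, 8, 240)]
kept, rejected = split_tall_outliers(boxes)
print(rejected)  # [(70, 10, 8, 240)]
```

One caveat with a median-based limit: if most boxes in the batch are stretched, as can happen when several symbols on one page are affected, the median itself is inflated and the filter lets them through. Computing the median over the whole paper, or adding an absolute height cap, may be more robust.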
