Entity Localization Bug: Sentence. Runaway sentences in many papers #191

Open
andrewhead opened this issue Jan 5, 2021 · 0 comments
Labels
bad-entity-detection An issue or task related to an entity that was detected in the wrong place bug Something isn't working entity-localization An issue or task related to entity localization sentences An issue or task related to sentences


andrewhead commented Jan 5, 2021

Description: In some papers, the end of a sentence is never detected, so the sentence's bounding boxes start somewhere in the middle of the paper and then continue through the rest of the paper until the end.

I have inspected 24 papers from PR #188, and observed this problem in the following papers:

  • 1702.01287v1: Starting at "Finally, we use..." in Sec 3
  • 1701.02810v2: Starting at "Finally we briefly..." in Sec 4.4
  • 1905.05475v2: Starting at "The idea is to reuse..." in Sec 3.3
  • 1706.08482v1: Starting at "d in every rame..." in Sec 3. Note that this paper has multiple runaway sentences.
  • 1704.05838v1: Starting at "When training with the..." in Sec 3.5
  • 1705.06566v2: Starting at "The GAN architecture was..." in Sec 1.2
  • 1802.07740v2: Starting at "Finally, as a ToM needs to..." in Sec 2.1
  • 1901.10159v1: Starting at "Therefore a single update..." in Appendix E
  • 1905.10887v2: Starting at A.2

How to fix: One option is to fix the sentence segmenter. I would guess that the command for undoing the color at the end of a sentence is placed in a part of the TeX where it is invalid, and hence doesn't successfully reverse the color. A simpler option is outlier detection: don't upload sentences that contain more than some threshold number of bounding boxes (e.g., more than 100).
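
A rough sketch of what the outlier-detection option could look like, assuming bounding boxes have already been grouped by sentence ID before upload. The names here (`BoundingBox`, `filter_runaway_sentences`, `MAX_BOXES_PER_SENTENCE`) are hypothetical and not taken from the existing pipeline, and the threshold of 100 is just the example value mentioned above; it would need tuning against real papers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical threshold; 100 is only the example value from the issue text.
MAX_BOXES_PER_SENTENCE = 100


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical stand-in for however the pipeline represents a box."""
    page: int
    left: float
    top: float
    width: float
    height: float


def filter_runaway_sentences(
    boxes_by_sentence: Dict[str, List[BoundingBox]],
    max_boxes: int = MAX_BOXES_PER_SENTENCE,
) -> Tuple[Dict[str, List[BoundingBox]], List[str]]:
    """Split sentences into those safe to upload and those to skip.

    A sentence with more than `max_boxes` bounding boxes is treated as a
    likely runaway sentence and is excluded from the upload set.
    """
    kept: Dict[str, List[BoundingBox]] = {}
    skipped: List[str] = []
    for sentence_id, boxes in boxes_by_sentence.items():
        if len(boxes) > max_boxes:
            skipped.append(sentence_id)  # report these so they can be inspected
        else:
            kept[sentence_id] = boxes
    return kept, skipped
```

Returning the skipped IDs alongside the kept sentences means the affected papers can still be logged and inspected later, rather than being dropped silently. This only hides the symptom; the segmenter would still need fixing to recover the real sentence boundaries.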

@andrewhead andrewhead added bug Something isn't working entity-localization An issue or task related to entity localization bad-entity-detection An issue or task related to an entity that was detected in the wrong place sentences An issue or task related to sentences labels Jan 5, 2021
@andrewhead andrewhead added this to the LaTeX Updates for Alpha milestone Jan 5, 2021