Sequence assembly #8

Open
wehlutyk opened this issue Sep 25, 2018 · 0 comments
Labels
question Further information is requested

Comments

@wehlutyk
Owner

Sequence assembly (thank you bioinformatics, again!) could be adapted to piece objects together using probabilities that they follow each other.

Ideas on what this could use:

  • n-grams
  • positioning heuristics (or learned):
    • if two blocks are positioned such that they visibly follow each other on the page, it's highly probable that they also follow each other in content
    • a block NW of another block can't be the subsequent piece of content (you can only move in the NE, SE, SW quadrants)
  • formatting: content blocks that follow each other should have similar formatting (except in cases such as a title coming next)
  • linking figures and footnotes to the main text could be done using the numbers that appear in the main text, assuming we can classify figures and footnotes as such before (or at the same time as) inferring the body text sequence.
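
A rough sketch of how the positioning ideas above might look in code. Everything here is hypothetical: the `Block` type, the coordinate convention (x grows rightwards, y grows downwards), the distance-based `follow_score`, and the greedy `assemble` loop are illustrative assumptions, not an existing implementation. A real version would presumably combine this score with n-gram and formatting evidence, and a simple distance score like this one will likely misorder multi-column layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical content block, located by its top-left corner."""
    name: str
    x: float  # left edge, increasing rightwards
    y: float  # top edge, increasing downwards (page coordinates)


def can_follow(a: Block, b: Block) -> bool:
    """Quadrant rule: b cannot follow a if b lies strictly NW of a."""
    return not (b.x < a.x and b.y < a.y)


def follow_score(a: Block, b: Block) -> float:
    """Assumed stand-in for 'probability that b follows a'.

    Zero when the quadrant rule forbids the move; otherwise closer
    blocks score higher (Manhattan distance, chosen arbitrarily).
    """
    if a == b or not can_follow(a, b):
        return 0.0
    distance = abs(b.x - a.x) + abs(b.y - a.y)
    return 1.0 / (1.0 + distance)


def assemble(blocks: list[Block], start: Block) -> list[Block]:
    """Greedy assembly: repeatedly append the best-scoring remaining block."""
    order = [start]
    remaining = [b for b in blocks if b != start]
    while remaining:
        current = order[-1]
        best = max(remaining, key=lambda b: follow_score(current, b))
        order.append(best)
        remaining.remove(best)
    return order


if __name__ == "__main__":
    a, b, c = Block("a", 0, 0), Block("b", 0, 10), Block("c", 0, 20)
    print([blk.name for blk in assemble([c, a, b], start=a)])
```

The greedy loop is the simplest possible choice. Sequence assemblers in bioinformatics typically build an overlap graph and search for a good path through it, which would be the natural next step once pairwise scores exist.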
@wehlutyk wehlutyk added the question Further information is requested label Sep 25, 2018