
html5lib parser failing due to less than symbol in XML files #110

Open
krisbukovi opened this issue Jan 10, 2020 · 0 comments


@krisbukovi
Contributor

243 errors, such as "Invalid HTML tag name", "Invalid tag name", "Empty tag name", and "Invalid namespace URI", occur because the html5lib parser mistakes "<" symbols for the beginning of an XML tag.
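One possible mitigation, sketched here under assumptions the issue does not state: before the text reaches html5lib, escape every "<" that does not open or close a tag from a known whitelist. The function name `escape_unknown_tags` and the `known_tags` whitelist are hypothetical; a real list would have to come from the schema of the XML files being parsed. Per the HTML5 tokenizer rules, html5lib already treats "<" followed by a space or a digit as text, so the problem cases are mostly "<" followed by a letter, as in `T<Tc`.

```python
import re
from typing import Iterable


def escape_unknown_tags(text: str, known_tags: Iterable[str]) -> str:
    """Escape every '<' that does not open or close a whitelisted tag.

    known_tags is a hypothetical whitelist of element names expected in the
    XML files. Comments/declarations ('<!') and processing instructions ('<?')
    are left untouched.
    """
    names = "|".join(re.escape(t) for t in known_tags)
    if names:
        # Keep '<' only if followed by an optional '/', a known name, and a name terminator
        keep = rf"(?:/?(?:{names})(?=[\s>/])|[!?])"
    else:
        # Empty whitelist: only comments, declarations and PIs survive
        keep = r"[!?]"
    return re.sub(rf"<(?!{keep})", "&lt;", text)


print(escape_unknown_tags("<p>T<Tc and a<b_{1}</p>", {"p"}))
# prints: <p>T&lt;Tc and a&lt;b_{1}</p>
```

A whitelist is used because a formula fragment such as `a<b_{1}` is indistinguishable from a tag opening at the character level. The trade-off is that a formula that happens to look like a whitelisted tag (e.g. `x<p>0` when `p` is whitelisted) would still be treated as markup.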

Part of the problem comes from LaTeX formulas that the regular expression we use to remove them fails to catch. See 2015GeoJI.200.1466Z for an example.
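The issue does not show the regular expression currently in use, so the pattern below is a hypothetical stand-in. It strips the common math delimiters (`$...$`, `$$...$$`, `\(...\)`, `\[...\]`) and adds a check that flags text in which a delimiter survives, which is one symptom of a formula the pattern did not catch (for example an unbalanced `$`). The names `LATEX_MATH`, `strip_latex` and `has_leftover_math` are illustrative only.

```python
import re

# Hypothetical pattern: the regex actually used by the project is not shown in the issue.
LATEX_MATH = re.compile(
    r"\$\$.+?\$\$"     # display math: $$ ... $$
    r"|\$.+?\$"        # inline math: $ ... $
    r"|\\\(.+?\\\)"    # inline math: \( ... \)
    r"|\\\[.+?\\\]",   # display math: \[ ... \]
    re.DOTALL,
)

# Any delimiter still present after stripping points to a formula the pattern missed.
LEFTOVER_DELIMITER = re.compile(r"\$|\\\(|\\\[")


def strip_latex(text: str) -> str:
    """Replace each matched formula with a single space."""
    return LATEX_MATH.sub(" ", text)


def has_leftover_math(text: str) -> bool:
    """Return True if a math delimiter survives, e.g. an unbalanced '$'."""
    return LEFTOVER_DELIMITER.search(text) is not None


cleaned = strip_latex(r"flux $F<10$ at \(T<T_c\) and $$a<b$$")
print("<" in cleaned, has_leftover_math(cleaned))  # prints: False False
print(has_leftover_math(strip_latex("costs $5 only")))  # prints: True
```

Replacing each formula with a space rather than deleting it keeps the words on either side from merging. Records flagged by `has_leftover_math` could be logged for manual review instead of being passed to the parser.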

Some of these errors are harder to resolve because they originate from images. See this Google Doc for more details.
