Terminate XML parsing as soon as a complete feed is received #53

Merged: 1 commit, Jun 7, 2018

Conversation

GarthSnyder
Collaborator

One issue that I'm seeing in a lot of real-world feeds is that a valid XML document is followed by stray HTML or some other kind of detritus. In the podcast world, it's particularly common to see XML followed by a pair of square brackets: "[]". There must be some production software out there that's emitting this; it seems a bit too frequent to be pure happenstance.

XMLParser flags all these extras as XML errors, causing XMLFeedParser to fail as well. But there's no need for XMLParser to even look beyond the end of the XML document; parsing can just stop after the closing </feed> or </rss> tag.

This patch calls abortParsing() on the XMLParser as soon as parsing returns to the root level, that is, once the root element has closed. The abortParsing() call generates an error as a side effect, but XMLFeedParser just ignores it.
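
For readers who want to see the shape of the idea, here is a minimal sketch. It is not the code from this patch: `RootBoundedDelegate` and `parseIgnoringTrailingJunk` are hypothetical names made up for illustration, and the only real API used is Foundation's `XMLParser`, its `abortParsing()` method, and the `XMLParserDelegate` start/end element callbacks. The delegate counts element depth, aborts once the depth returns to zero, and the caller treats the parse as successful if the root element was completed, regardless of what `parse()` reports.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical delegate (illustration only, not FeedKit's class) that
/// stops the parser as soon as the root element has been closed.
final class RootBoundedDelegate: NSObject, XMLParserDelegate {
    private var depth = 0                       // current element nesting depth
    private(set) var rootElement: String?       // name of the outermost element
    private(set) var didCompleteRoot = false    // true once the root has closed

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if depth == 0 { rootElement = elementName }
        depth += 1
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        depth -= 1
        if depth == 0 {
            // Back at the root level: the document is complete, so stop
            // before the parser can look at anything that follows it.
            didCompleteRoot = true
            parser.abortParsing()
        }
    }
}

/// Hypothetical helper: parses `data` and ignores the error produced by the
/// deliberate abort, as long as the root element was seen in full.
func parseIgnoringTrailingJunk(_ data: Data) -> (succeeded: Bool, root: String?) {
    let parser = XMLParser(data: data)
    let delegate = RootBoundedDelegate()
    parser.delegate = delegate
    let parsed = parser.parse()  // returns false after abortParsing()
    return (parsed || delegate.didCompleteRoot, delegate.rootElement)
}
```

Under these assumptions, a feed such as `<rss>...</rss>[]` is accepted because the abort happens at `</rss>`, while a truncated document that never closes its root element still fails, since `didCompleteRoot` stays false.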

@nmdias nmdias merged commit 8c59771 into nmdias:master Jun 7, 2018
2 participants