Terminate XML parsing as soon as a complete feed is received #53
One issue that I'm seeing in a lot of real-world feeds is that a valid XML document will be followed by some random HTML or some other kind of detritus. In the podcast world, it's particularly common to see XML followed by a pair of square brackets: "[]". There must be some production software out there that's emitting this - it seems a bit too frequent to be pure happenstance.
XMLParser flags all these extras as XML errors, causing XMLFeedParser to fail as well. But there's no need for XMLParser to even look beyond the end of the XML document; parsing can just stop after the closing </feed> or </rss> tag.

This patch calls abortParsing() on the XMLParser as soon as it returns to the root level. The abortParsing() call generates an error as a side effect, but XMLFeedParser just ignores it.
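
For illustration, here is a minimal sketch of the idea, not the actual diff. The names `RootTerminatingDelegate` and `parseIgnoringTrailingJunk` are hypothetical and do not come from this project; only `XMLParser`, `XMLParserDelegate`, and `abortParsing()` are real Foundation API. The delegate tracks element depth and aborts once the root element closes, and the caller counts the parse as successful if the root element was fully seen, whatever error the abort leaves behind.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on non-Apple platforms
#endif

/// Hypothetical delegate: stops parsing once the root element is closed.
final class RootTerminatingDelegate: NSObject, XMLParserDelegate {
    private var depth = 0
    private(set) var rootElement: String?
    private(set) var sawCompleteDocument = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String]) {
        if depth == 0 { rootElement = elementName }  // e.g. "rss" or "feed"
        depth += 1
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        depth -= 1
        if depth == 0 {
            // Back at the root level: the document is complete, so stop
            // before the parser can look at anything that follows it.
            sawCompleteDocument = true
            parser.abortParsing()
        }
    }
}

/// Hypothetical helper: treats the parse as successful if the root element
/// was closed, ignoring the error that abortParsing() produces.
func parseIgnoringTrailingJunk(_ data: Data) -> (ok: Bool, root: String?) {
    let delegate = RootTerminatingDelegate()
    let parser = XMLParser(data: data)
    parser.delegate = delegate
    let parsed = parser.parse()  // false after an abort
    return (parsed || delegate.sawCompleteDocument, delegate.rootElement)
}
```

With this sketch, a feed such as `<rss>...</rss>[]` should be accepted, while a truncated document that never closes its root element is still reported as a failure.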