Some conclusions from playing with the Atom feed below:
xml.sax.SAXParseException "undefined entity" is survivable.
"mismatched tag" is not: we get all the good entries, then the broken entry in a bad state (e.g. all content in <title>); the entries after it are usually missing, but not always.
It may be worth finding what other kinds of errors can be encountered... (all of them).
Also, when the loose parser is used, the feed should be considered stale; that is, we should always prefer entries from the non-broken feed.
I'm thinking of something like this:
| existing | parsed | desired behavior | current behavior |
| --- | --- | --- | --- |
| none | any | use new (any) | yes |
| any | strict | use new (strict) | yes (hash takes care of it) |
| strict | loose | keep old (strict) | no (different hash => update) |
| loose | loose | use new (loose) | yes (hash takes care of it) |
This would favor feeds that are temporarily broken, and eventually get fixed. For feeds that become permanently broken, it results in old strict entries not receiving updates.
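To make the rule in the table concrete, here is a minimal sketch of the decision as a function. The name `should_use_new` and the `"strict"` / `"loose"` string labels are hypothetical, chosen for illustration; they are not part of reader's API.

```python
from typing import Optional


def should_use_new(existing: Optional[str], parsed: str) -> bool:
    """Decide whether a newly parsed entry should replace the stored one.

    existing: how the stored entry was parsed ("strict" or "loose"),
              or None if there is no stored entry.
    parsed:   how the new entry was parsed ("strict" or "loose").
    """
    if existing is None:
        # Nothing stored yet: take whatever we got.
        return True
    if parsed == "strict":
        # A strictly parsed entry always wins.
        return True
    # The new entry is loose: only replace an entry that was also loose,
    # so a temporarily broken feed never overwrites good data.
    return existing == "loose"
```

Only the strict-then-loose case returns `False`, which is the one row where the desired behavior differs from the current one.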
reader treats all bozo feeds as errors, even if the loose parser managed to parse them:
We still need a heuristic to tell that apart from complete garbage (version, and the presence of entries?):
>>> feedparser.parse("garbage")
{'bozo': 1, 'entries': [], 'feed': {}, 'headers': {}, 'encoding': 'utf-8', 'version': '', 'bozo_exception': SAXParseException('syntax error'), 'namespaces': {}}
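One possible shape for that heuristic, as a rough sketch: treat a bozo result as garbage only when it has neither a detected version nor any entries. The function name `classify_parse` and its return labels are hypothetical, and equating `bozo` with "the loose parser was used" is a simplifying assumption, since feedparser can also set `bozo` for milder problems. The function only relies on `.get()`, so it accepts a feedparser result or a plain dict.

```python
def classify_parse(result) -> str:
    """Classify a feedparser-style result as 'strict', 'loose', or 'garbage'.

    Assumption: a set 'bozo' flag means the feed was not well-formed;
    this sketch does not inspect 'bozo_exception' to refine that.
    """
    if not result.get("bozo"):
        return "strict"
    # Bozo, but something feed-like was recovered.
    if result.get("version") or result.get("entries"):
        return "loose"
    # Bozo, no recognized version, no entries: nothing usable.
    return "garbage"
```

With the output shown above (`'bozo': 1`, `'entries': []`, `'version': ''`), this returns `'garbage'`. Whether a versioned feed with zero entries should count as "loose" rather than garbage is a judgment call this sketch makes in favor of "loose".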