Angle Sharp parsing xml attributes #42
Comments
Thanks for the bug report and the investigation regarding the cause. I am going to try solving the issue this weekend.
@prestonkell I cannot reproduce this issue. Could you provide an example HTML page that generates this error?
Yeah, no problem. I get the error with this site: https://www.mumsnet.com/talk/parenting/4301360-second-child-looks-more-like-mum
gabriele-tomassetti added two commits that referenced this issue on Jun 14, 2022
Hi there,
I'm seeing a number of pages that throw "Invalid character detected. AngleSharp.Dom.DomException" because AngleSharp validates attribute names with its IsXmlName and IsXmlNameStart methods (https://github.com/AngleSharp/AngleSharp/blob/cdc88b1c0e71476f35fe9405d38b66e33fa1969c/src/AngleSharp/Text/XmlExtensions.cs#L47). The validation is triggered from the SimplifyNestedElements method in Readability when it sets attributes (https://github.com/Strumenta/SmartReader/blob/master/src/SmartReader/Readability.cs#L190).
Could this validation be made optional, or could the attribute names be cleaned somehow, in order to avoid these errors? I've seen this with attributes that start with '@' or with other characters that XML considers invalid but that do appear in HTML.
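To make the failure mode concrete, here is a minimal sketch of the kind of check involved, written in Python purely for illustration. It is not AngleSharp's or SmartReader's code. It assumes the validation follows the Name production of the XML 1.0 specification, and AngleSharp's IsXmlName may differ in detail. The helper names is_xml_name and copy_valid_attributes are hypothetical. The second helper shows one possible workaround: skip attributes whose names would be rejected instead of letting the exception propagate.

```python
import re

# Character ranges from the XML 1.0 "Name" production (assumed here to be
# close to what the library checks; not taken from its source).
_NAME_START = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds '-', '.', digits, and a few combining/extender ranges.
_NAME_CHAR = _NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_XML_NAME = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")


def is_xml_name(name):
    """Return True if the whole string is a valid XML 1.0 Name."""
    return _XML_NAME.fullmatch(name) is not None


def copy_valid_attributes(attrs):
    """Split attributes into those safe to copy and the names that were skipped.

    Hypothetical helper: filters by name instead of raising on the first
    invalid one.
    """
    kept, skipped = {}, []
    for name, value in attrs.items():
        if is_xml_name(name):
            kept[name] = value
        else:
            skipped.append(name)
    return kept, skipped


if __name__ == "__main__":
    source = {"class": "post", "data-id": "7", "@click": "open()"}
    kept, skipped = copy_valid_attributes(source)
    print(kept)     # attributes with valid XML names
    print(skipped)  # names such as '@click' that the check rejects
```

Under this rule, '@' is not an allowed first character, so a name like @click is rejected even though HTML parsers accept it. Filtering before the copy trades a dropped attribute for a parse that completes; whether dropping is acceptable depends on how the extracted content is used.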