Don't check preconnect links #1187
Updated from dc397f6 to 3d10611.
Huh, I don't know why this breaks a test. Somehow the parser state gets tripped up by my change. Maybe I need to reset some state. @untitaker, any ideas?
I'm thinking I need to improve the attribute handling in general to make the tests pass. I would have to cache the attributes and only extract links once I'm fully done reading all attributes for a single tag. (That's the one reason why I haven't removed html5ever support yet. I like how simple the parsing is, although it's slower, of course.)
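A minimal sketch of the "cache the attributes, extract at the end of the tag" idea, assuming a streaming tokenizer that reports each attribute separately. `TagBuffer`, `start_tag`, `push_attribute` and `finish_tag` are hypothetical names, not html5gum's `Emitter` API; to keep it short, only `href`/`src` are considered and only `rel=preconnect` is excluded.

```rust
use std::collections::HashMap;

/// Hypothetical per-tag buffer: attributes are collected as the tokenizer
/// reports them, and links are only extracted once the tag is complete.
#[derive(Default)]
struct TagBuffer {
    attributes: HashMap<String, String>,
}

impl TagBuffer {
    /// Called when a new start tag begins; drops any leftover state.
    fn start_tag(&mut self) {
        self.attributes.clear();
    }

    /// Called once per attribute, in document order. Names are lowercased,
    /// and (as in HTML) the first occurrence of a duplicate attribute wins.
    fn push_attribute(&mut self, name: &str, value: &str) {
        self.attributes
            .entry(name.to_ascii_lowercase())
            .or_insert_with(|| value.to_string());
    }

    /// Called when the tag is complete. Only now is every attribute known,
    /// so `rel` can be checked no matter where it appeared in the tag.
    fn finish_tag(&mut self) -> Vec<String> {
        let excluded = self.attributes.get("rel").map_or(false, |rel| {
            rel.split_ascii_whitespace()
                .any(|token| token.eq_ignore_ascii_case("preconnect"))
        });
        let links = if excluded {
            Vec::new()
        } else {
            ["href", "src"]
                .iter()
                .filter_map(|attr| self.attributes.get(*attr).cloned())
                .collect()
        };
        self.attributes.clear();
        links
    }
}
```

With this design, `<link href="https://fonts.example.com" rel="preconnect">` is skipped even though `href` arrives before `rel`; a per-attribute extractor would already have emitted the link by the time it saw the `rel` value.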
Instead of writing your own emitter, you can always just read through the …
Yeah, that totally makes sense. Currently there's no simpler way in html5gum. I would like to have more kinds of emitters that take some of that work off your hands, but when I tried to build them in the past, they didn't feel general-purpose enough.
On Thu, Aug 24, 2023, at 16:44, Matthias Endler wrote:
@untitaker, the way I would model it (without switching to the `DefaultEmitter`) would be to create a `HashMap` with the attribute name as key and a list/set of links as value, e.g. `{"href": ["https://example.com"]}`. When I reach the end of a tag, I check the attributes and emit the list of links if there are no excluded attributes (like `rel=prefetch`).
Does that make sense? Is there an easier way in html5gum (without `DefaultEmitter`)? It feels like the perf could suffer from the additional bookkeeping.
This commit doesn't exclude all types of preconnect links; for example, dns-prefetch is not excluded even though it should be.
Preconnect links are used to establish a server connection without loading a specific resource yet. These links don't always point to a URL that should return a 200, and they are not user-facing, i.e. they don't show up in the final rendered version of a page. Therefore, I think we should exclude them entirely, even in `--include-verbatim` mode, as they might not point to a valid resource.
Fixes #897
…lity

- Replace Vec<u8> with String for better readability and manipulation
- Introduce Element struct to encapsulate element-related data
- Use HashMap<String, String> for current_attributes for efficient lookups
- Add verbatim_stack to properly handle nested verbatim elements
- Remove unsafe code where possible, using String::from_utf8_lossy
- Improve attribute handling with HashMap entry API and prioritize srcset
- Simplify logic and consolidate verbatim element handling
- Enhance encapsulation in LinkExtractor struct
- Improve overall performance with more efficient data structures
- Increase flexibility for future feature additions or modifications

This refactor maintains core functionality while making the code more idiomatic Rust, easier to read and maintain, and more robust in handling edge cases. The new structure is better suited for future extensions or modifications.
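The `verbatim_stack` item can be illustrated with a small sketch. One plausible reason for a stack (an assumption, not stated in the commit) is that a single "inside verbatim" boolean breaks on nesting such as `<pre><code>…</code> more text</pre>`: the closing `</code>` clears the flag while the parser is still inside `<pre>`. `VerbatimTracker` and the element list below are hypothetical and not lychee's actual code.

```rust
/// Hypothetical set of "verbatim" elements whose contents are only scanned
/// for links in verbatim mode. The exact list is an illustrative assumption.
const VERBATIM_ELEMENTS: &[&str] = &["pre", "code", "textarea", "samp", "kbd"];

/// Tracks open verbatim elements with a stack instead of a single boolean.
#[derive(Default)]
struct VerbatimTracker {
    stack: Vec<String>,
}

impl VerbatimTracker {
    fn is_verbatim_element(name: &str) -> bool {
        VERBATIM_ELEMENTS.contains(&name)
    }

    /// Push a verbatim element when it opens; other elements are ignored.
    fn start_tag(&mut self, name: &str) {
        if Self::is_verbatim_element(name) {
            self.stack.push(name.to_string());
        }
    }

    /// Pop only when the closing tag matches the innermost open verbatim
    /// element, so a stray closing tag cannot end verbatim mode early.
    fn end_tag(&mut self, name: &str) {
        if self.stack.last().map(String::as_str) == Some(name) {
            self.stack.pop();
        }
    }

    /// True while inside at least one verbatim element.
    fn in_verbatim(&self) -> bool {
        !self.stack.is_empty()
    }
}
```

After `<pre><code></code>`, `in_verbatim()` is still true because `pre` remains on the stack; it only becomes false once `</pre>` is seen.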
After more than a year, I finally got around to cleaning up this PR and fixing the remaining attribute ordering issues. Along the way, I refactored (and hopefully improved) the html5gum parser. The new … Other changes include refactoring the HTML link extractor for improved performance and maintainability and extending the documentation. I also removed unsafe code where possible, using …

Thanks, @untitaker, for the review.
// Check and exclude rel=preconnect. Unlike prefetch and preload,
// preconnect only opens an early connection (DNS lookup, TCP handshake, TLS) and might not point to a resource
if let Some(rel) = attrs.iter().find(|attr| &attr.name.local == "rel") {
if rel.value.contains("preconnect") {
Would changing this conditional to the following resolve issue #1499?
if rel.value.contains("preconnect") || rel.value.contains("prefetch")
Yes, I think so. Should we be more explicit and check for dns-prefetch here?
Yes, sorry; don't know why I didn't use that.
if rel.value.contains("preconnect") || rel.value.contains("dns-prefetch")
Wanna send a pull request?
Done. Thanks :)
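The conditional settled on in this review thread can be sketched as a standalone check. This is a minimal sketch, assuming the check receives the raw `rel` value as a `&str`; `is_excluded_rel` and `is_excluded_rel_substring` are hypothetical helpers, not lychee functions. In HTML, `rel` is a space-separated list of ASCII case-insensitive tokens, so the sketch compares whole tokens rather than substrings.

```rust
/// Returns true if the `rel` value contains a connection hint whose target
/// should not be checked. Splits on ASCII whitespace and compares whole
/// tokens case-insensitively.
fn is_excluded_rel(rel: &str) -> bool {
    rel.split_ascii_whitespace().any(|token| {
        token.eq_ignore_ascii_case("preconnect") || token.eq_ignore_ascii_case("dns-prefetch")
    })
}

/// The first proposal from the thread, kept for comparison:
/// `contains("prefetch")` also matches a plain `rel="prefetch"`.
fn is_excluded_rel_substring(rel: &str) -> bool {
    rel.contains("preconnect") || rel.contains("prefetch")
}
```

With the substring version, a `rel="prefetch"` link (which points at a real resource) would also be skipped; matching `dns-prefetch` explicitly, as the thread concluded, avoids that, and token matching additionally handles mixed-case values such as `DNS-Prefetch`.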