Provide compatibility with mkdocs-material blog plugin #72

SeanTAllen · 2024-02-20T13:41:51Z

Prior to this commit, certain assumptions were made about the files seen in on_files that are not true when the mkdocs-material blog plugin is used.

The url seen at the time on_files is called is not guaranteed to be the final url that will appear in html. With the blog plugin, at the time on_files is called, the value will be something like:

blog/posts/foo.md

but at the time we are trying to get a mapping, the url will be something like:

blog/2024/01/foo.html

Due to this change, htmlproofer would fail to validate the url despite it being valid.

To address this, instead of looking up pages from a Dict where the key is set at the time of on_files, we now store a list of Files and compare the search_path for a url against the url attribute of each File to find the mapping.
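The failure mode and the fix can be illustrated with a minimal sketch. It is not the plugin's actual code: `FakeFile` is a hypothetical stand-in for the MkDocs `File` class (only `src_path` and `url` are modeled), and `find_by_url` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeFile:
    """Hypothetical stand-in for a MkDocs File; only the attributes used here."""
    src_path: str
    url: str


def find_by_url(files: List[FakeFile], search_path: str) -> Optional[FakeFile]:
    """Linear scan that compares against each file's *current* url."""
    for file in files:
        if file.url == search_path:
            return file
    return None


post = FakeFile(src_path="blog/posts/foo.md", url="blog/posts/foo.md")

# Old approach: key a dict by the url as seen when on_files runs.
stale_index: Dict[str, FakeFile] = {post.url: post}
# New approach: keep the File objects themselves.
files = [post]

# Simulate the blog plugin rewriting the url after on_files has run.
post.url = "blog/2024/01/foo.html"

stale_hit = stale_index.get("blog/2024/01/foo.html")     # key is out of date
fresh_hit = find_by_url(files, "blog/2024/01/foo.html")  # reads the current url
```

The dict lookup misses because its key was captured before the url changed, while the scan reads `url` at lookup time and finds the post.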

SeanTAllen · 2024-02-20T13:46:04Z

@manuzhang should the linting errors be fixed as part of this PR?

manuzhang · 2024-02-20T14:39:11Z

Yes, please fix it

manuzhang · 2024-02-20T15:41:26Z

htmlproofer/plugin.py

-        try:
-            return files[search_path]
-        except KeyError:
+        for file in files:


Can this impact performance significantly?

It changes the lookup from O(1) to O(n) in the number of files, so for huge sites it could have a large impact.

Unfortunately the faster lookup that existed previously is also a bug.

The performance could be addressed by creating a Dictionary at the time that lookup is needed after all the File objects have been fully configured. This would have better performance characteristics but is also more obscure and probably a bit harder to maintain.
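One way the deferred dictionary could look is sketched below. This is an assumption-laden illustration, not the change that was eventually made: `FakeFile` and `LazyUrlIndex` are hypothetical names, and the sketch assumes every url is final before the first lookup happens.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeFile:
    """Hypothetical stand-in for a MkDocs File; only the attributes used here."""
    src_path: str
    url: str


class LazyUrlIndex:
    """Builds a url -> file dict on the first lookup, not when files are collected."""

    def __init__(self, files: List[FakeFile]) -> None:
        self._files = files
        self._by_url: Optional[Dict[str, FakeFile]] = None

    def lookup(self, search_path: str) -> Optional[FakeFile]:
        if self._by_url is None:
            # One O(n) pass, done only after the urls are assumed to be final.
            self._by_url = {f.url: f for f in self._files}
        return self._by_url.get(search_path)


post = FakeFile(src_path="blog/posts/foo.md", url="blog/posts/foo.md")
index = LazyUrlIndex([post])

# The url changes after the files were collected but before any lookup.
post.url = "blog/2024/01/foo.html"

found = index.lookup("blog/2024/01/foo.html")
missing = index.lookup("blog/posts/foo.md")
```

The trade-off is the one described above: lookups return to constant time after a single pass, at the cost of relying on the index being built only once the urls can no longer change.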

On almost all sites, the impact of this check should not be particularly noticeable to a "reasonable human".

Note, I'm not sure where the proper place to build that Dictionary to speed the lookups in find_source_file would be.

My Python is not the greatest. If you want to make that change, I'd prefer to hand it off to you.

Me neither ;). @johnthagen could you please take a look?

With the change I had to make to get this to work for the Windows tests, the cost has increased somewhat: os.path.normpath needs to be called on file.url for the comparison to work, so that is an additional bit of cost.
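A small sketch of why normalization matters for the comparison. It assumes the mismatch comes from path separators, which the thread only guesses at, and it calls `ntpath.normpath` and `posixpath.normpath` directly (the Windows and POSIX implementations behind `os.path.normpath`) so the behavior is the same on any platform.

```python
import ntpath
import posixpath

# A url as stored on the file: always forward slashes.
url = "blog/2024/01/foo.html"

# Assumption: on Windows the search path has already been normalized,
# so it uses backslashes.
windows_search_path = ntpath.normpath(url)

# Comparing the raw url with the normalized search path fails on Windows...
raw_match = url == windows_search_path

# ...but normalizing the url with the same function makes both sides agree.
normalized_match = ntpath.normpath(url) == windows_search_path

# On POSIX, normpath leaves this url unchanged, so the extra call is harmless.
posix_unchanged = posixpath.normpath(url) == url
```

The extra `normpath` call per file is the added cost mentioned above.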

I ran again locally with the source for ponylang.io (https://github.com/ponylang/ponylang-website) and if there is a difference, it isn't noticeable to me.

If I understand the lifecycle correctly, then self.files could be turned into an optimized dictionary in on_post_page any time before the `for a in soup.find_all('a', href=True):` loop.

If that is correct, I can build a dictionary-based structure that mimics the old data structure but has the correct information.

Can you confirm that my understanding of the lifecycle is correct?

I'll make that change in the not so distant future. Before the end of the weekend at the latest.

SeanTAllen · 2024-02-20T16:08:44Z

@manuzhang this particular implementation apparently has an issue on Windows. I'm guessing that is a path separator issue. I'll look at it later.

SeanTAllen · 2024-02-20T16:36:09Z

I think I know what the Windows issue is. Testing that out now.

SeanTAllen · 2024-02-24T23:22:00Z

@manuzhang all set. With the optimization, all the old tests continue to work unchanged. It ends up being a better change.

As an extra level of verification, I also tested this locally with a site that uses the blog plugin.

SeanTAllen mentioned this pull request Feb 20, 2024

Incorrect broken links reported #70

Closed

manuzhang reviewed Feb 20, 2024

SeanTAllen force-pushed the material-blog branch 5 times, most recently from 7fe9b58 to 99707e7 Compare February 20, 2024 16:05

SeanTAllen force-pushed the material-blog branch 3 times, most recently from 9f7a612 to b99b84c Compare February 20, 2024 16:35

SeanTAllen force-pushed the material-blog branch 2 times, most recently from eee2979 to c3eb1df Compare February 20, 2024 16:41

SeanTAllen added 2 commits February 24, 2024 23:05

Optimization

8d9d9db

SeanTAllen force-pushed the material-blog branch from cabe052 to 8d9d9db Compare February 24, 2024 23:08

SeanTAllen added 2 commits February 24, 2024 23:09

Fix after rebase

ba3c37c

Change tests back

6fc6d8f

manuzhang approved these changes Feb 25, 2024

manuzhang merged commit 65ae40f into manuzhang:main Feb 25, 2024
9 checks passed