Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide compatibility with mkdocs-material blog plugin #72

Merged
merged 4 commits into from
Feb 25, 2024

Conversation

SeanTAllen
Copy link
Contributor

Prior to this commit, certain assumptions were made about the files seen in on_files that are not true when the mkdocs-material blog plugin is used.

The url seen at the time on_files is called is not guaranteed to be the final url that will appear in html. With the blog plugin, at the time on_files is called, the value will be something like:

blog/posts/foo.md

but at the time we are trying to get a mapping, the url will be something like:

blog/2024/01/foo.html

Due to this change, htmlproofer would fail to validate the url despite it being valid.

To address this, instead of looking up pages from a Dict where the key is set at the time of on_files, we know store a list of Files and check for the search_path for a url against the url attribute of each File to try to find the mapping.

@SeanTAllen
Copy link
Contributor Author

@manuzhang should the linting errors be fixed as part of this PR?

@manuzhang
Copy link
Owner

Yes, please fix it

try:
return files[search_path]
except KeyError:
for file in files:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this impact performance significantly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It changes the O for lookup to be linear so for huge sites, it could have a large impact.

Unfortunately the faster lookup that existed previously is also a bug.

The performance could be addressed by creating a Dictionary at the time that lookup is needed after all the File objects have been fully configured. This would have better performance characteristics but is also more obscure and probably a bit harder to maintain.

On almost all sites, the impact of this check should not be particularly noticeable to a "reasonable human".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, I'm not sure where the proper place to build that Dictionary to speed the lookups in find_source_file would be.

My python is not the greatest. Ideally, if you want to do that change, I'd prefer to hand it off to you.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me neither ;). @johnthagen could you please take a look?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change I had to make to get this to work for the Windows tests, the cost has increased some as os.path.normpath needs to be called on file.url for the comparison to work, so that is an additional bit of cost.

I ran again locally with the source for ponylang.io (https://github.com/ponylang/ponylang-website) and if there is a difference, it isn't noticeable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the lifecycle correctly then, self.files could be turned into an optimized dictionary in on_post_page anytime before the for a in soup.find_all('a', href=True): loop.

If that is correct, I do a dictionary based structure that mimics the old data structure but has the correct information.

Can you confirm that my understanding of the lifecycle is correct?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make that change in the not so distant future. Before the end of the weekend at the latest.

@SeanTAllen SeanTAllen force-pushed the material-blog branch 5 times, most recently from 7fe9b58 to 99707e7 Compare February 20, 2024 16:05
@SeanTAllen
Copy link
Contributor Author

@manuzhang this particular implementation apparently has an issue with windows. I'm guessing that is a path separator issue. I'll look at it later.

@SeanTAllen SeanTAllen force-pushed the material-blog branch 3 times, most recently from 9f7a612 to b99b84c Compare February 20, 2024 16:35
@SeanTAllen
Copy link
Contributor Author

I think I know what the windows issue is. Testing that out now.

@SeanTAllen SeanTAllen force-pushed the material-blog branch 2 times, most recently from eee2979 to c3eb1df Compare February 20, 2024 16:41
Prior to this commit, certain assumptions were made about the files
seen in `on_files` that are not true when the mkdocs-material blog
plugin is used.

The url seen at the time `on_files` is called is not guaranteed to
be the final url that will appear in html. With the blog plugin,
at the time `on_files` is called, the value will be something like:

`blog/posts/foo.md`

but at the time we are trying to get a mapping, the url will be
something like:

`blog/2024/01/foo.html`

Due to this change, htmlproofer would fail to validate the url despite
it being valid.

To address this, instead of looking up pages from a Dict where the
key is set at the time of `on_files`, we know store a list of Files
and check for the `search_path` for a url against the `url` attribute
of each File to try to find the mapping.
@SeanTAllen
Copy link
Contributor Author

@manuzhang all set. with the optimization, all the old tests continued to work unchanged. It ends up being a better change.

I tested this locally as well with a site that uses the blog as an extra level of verification.

@manuzhang manuzhang merged commit 65ae40f into manuzhang:main Feb 25, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants