-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide compatibility with mkdocs-material blog plugin #72
Conversation
@manuzhang should the linting errors be fixed as part of this PR? |
Yes, please fix it |
htmlproofer/plugin.py
Outdated
try: | ||
return files[search_path] | ||
except KeyError: | ||
for file in files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this impact performance significantly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It changes the O for lookup to be linear so for huge sites, it could have a large impact.
Unfortunately the faster lookup that existed previously is also a bug.
The performance could be addressed by creating a Dictionary at the time that lookup is needed after all the File objects have been fully configured. This would have better performance characteristics but is also more obscure and probably a bit harder to maintain.
On almost all sites, the impact of this check should not be particularly noticeable to a "reasonable human".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, I'm not sure where the proper place to build that Dictionary to speed the lookups in find_source_file
would be.
My python is not the greatest. Ideally, if you want to do that change, I'd prefer to hand it off to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me neither ;). @johnthagen could you please take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the change I had to make to get this to work for the Windows tests, the cost has increased some as os.path.normpath
needs to be called on file.url
for the comparison to work, so that is an additional bit of cost.
I ran again locally with the source for ponylang.io (https://github.com/ponylang/ponylang-website) and if there is a difference, it isn't noticeable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the lifecycle correctly then, self.files could be turned into an optimized dictionary in on_post_page
anytime before the for a in soup.find_all('a', href=True):
loop.
If that is correct, I do a dictionary based structure that mimics the old data structure but has the correct information.
Can you confirm that my understanding of the lifecycle is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make that change in the not so distant future. Before the end of the weekend at the latest.
7fe9b58
to
99707e7
Compare
@manuzhang this particular implementation apparently has an issue with windows. I'm guessing that is a path separator issue. I'll look at it later. |
9f7a612
to
b99b84c
Compare
I think I know what the windows issue is. Testing that out now. |
eee2979
to
c3eb1df
Compare
Prior to this commit, certain assumptions were made about the files seen in `on_files` that are not true when the mkdocs-material blog plugin is used. The url seen at the time `on_files` is called is not guaranteed to be the final url that will appear in html. With the blog plugin, at the time `on_files` is called, the value will be something like: `blog/posts/foo.md` but at the time we are trying to get a mapping, the url will be something like: `blog/2024/01/foo.html` Due to this change, htmlproofer would fail to validate the url despite it being valid. To address this, instead of looking up pages from a Dict where the key is set at the time of `on_files`, we know store a list of Files and check for the `search_path` for a url against the `url` attribute of each File to try to find the mapping.
cabe052
to
8d9d9db
Compare
@manuzhang all set. with the optimization, all the old tests continued to work unchanged. It ends up being a better change. I tested this locally as well with a site that uses the blog as an extra level of verification. |
Prior to this commit, certain assumptions were made about the files seen in
on_files
that are not true when the mkdocs-material blog plugin is used.The url seen at the time
on_files
is called is not guaranteed to be the final url that will appear in html. With the blog plugin, at the timeon_files
is called, the value will be something like:blog/posts/foo.md
but at the time we are trying to get a mapping, the url will be something like:
blog/2024/01/foo.html
Due to this change, htmlproofer would fail to validate the url despite it being valid.
To address this, instead of looking up pages from a Dict where the key is set at the time of
on_files
, we know store a list of Files and check for thesearch_path
for a url against theurl
attribute of each File to try to find the mapping.