Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to filter log messages by regular expression #2893

Open
2 of 3 tasks
logological opened this issue Jun 16, 2021 · 13 comments · May be fixed by #3108
Open
2 of 3 tasks

Ability to filter log messages by regular expression #2893

logological opened this issue Jun 16, 2021 · 13 comments · May be fixed by #3108

Comments

@logological
Copy link
Contributor

logological commented Jun 16, 2021

  • I have searched the issues (including closed ones) and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.
  • I am willing to lend a hand to help implement this feature.

Feature Request

Pelican has a LOG_FILTER setting that allows warnings and other logging messages to be suppressed based on the exact text of the message, or on a fixed template. Instead of (or in addition to) filtering by fixed templates, it would be nice if messages could be filtered by regular expression. Among other things, this would provide a better fix for Issue #2398 by allowing warning messages to be filtered on a per-image or per-page basis.

@stale
Copy link

stale bot commented Aug 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your participation and understanding.

@stale stale bot added the stale Marked for closure due to inactivity label Aug 21, 2021
@justinmayer justinmayer removed the stale Marked for closure due to inactivity label Sep 4, 2021
@stale
Copy link

stale bot commented Apr 29, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your participation and understanding.

@stale stale bot added the stale Marked for closure due to inactivity label Apr 29, 2022
@justinmayer
Copy link
Member

While I can see the potential value in filtering logs via regular expressions, I'm not quite sure how many folks would actually use such a feature. Anyone out there want this feature enough to be willing to work on implementing it?

@stale stale bot removed the stale Marked for closure due to inactivity label Jul 21, 2022
@sonologic
Copy link

sonologic commented Mar 10, 2023

I actually just ran into a desire to have this as well. I've added https://pypi.org/project/pelican-advthumbnailer/ to my site and it is spouting a lot of warnings that can be safely ignored, but since they include full filesystem paths, the exact message depends on where the site content lives.
I'm willing to put in the work to make this happen. I'd propose to add a setting LOG_FILTER_REGEXP of type List[Tuple[int, str]] that contains 'log level', 'regular expression' tuples. Would that be a valid approach?
As for the implementation, I'd probably want to pre-compile the regular expressions when initialising the Filter object and then iterate over the regular expressions when the filter is invoked.

@justinmayer
Copy link
Member

Hi Koen. Thanks for offering to assist with this feature. Rather than adding a new setting, perhaps the existing setting could be extended to include the filter type, defaulting to string unless otherwise specified? Maybe something like:

LOG_FILTER = [
    ("string", logging.WARN, "TAG_SAVE_AS is set to False"),
    ("regex", logging.WARN, 'r"^TAG_SAVE_AS'),
]

What do you think?

@sonologic
Copy link

Hi @justinmayer ,
My only objection would be about backwards compatibility. If we do this, people who already have LOG_FILTER in their pelicanconf.py need to adapt theirs unless we do a pass over the filter and convert any 2-tuple to a 3-tuple and maybe throw a FutureWarning. But maybe that is what you meant with 'defaulting to string unless otherwise specified'. I think it makes a lot of sense to do it like that!
Cheers,
Koen

@justinmayer
Copy link
Member

That is indeed what I intended. 😊

sonologic added a commit to sonologic/pelican that referenced this issue Mar 13, 2023
@sonologic sonologic linked a pull request Mar 13, 2023 that will close this issue
4 tasks
sonologic added a commit to sonologic/pelican that referenced this issue Mar 13, 2023
@sonologic
Copy link

@justinmayer I've created an MR, but it's waiting for maintainer approval before the ci can run. If you have a moment, perhaps you can approve?

https://github.com/getpelican/pelican/actions/runs/4403482780

@justinmayer
Copy link
Member

@sonologic: I posted a comment to that PR. Let's move this discussion there.

@copperchin
Copy link
Contributor

Sorry if this is the wrong place to put this, but rather than change the length of the tuple in LOG_FILTERs values,
what about using re.compile when regex is desired?

So converting the example above:

LOG_FILTER = [
    (logging.WARN, "TAG_SAVE_AS is set to False"),
    (logging.WARN, re.compile("^TAG_SAVE_AS"),
]

This would prevent old settings from breaking in new versions of Pelican.

@avaris
Copy link
Member

avaris commented Oct 30, 2023

@copperchin Do you mean we expect users to provide compiled regular expressions? And handle it internally with an isinstance check? It would work, technically, but I worry about the "user-friendliness" a bit. Seems it would be easier to forget re.compileing them and not get expected results.

Also, changing 2-tuple to 3-tuple can be handled in a non-breaking way. It is relatively easy to issue a DeprecationWarning if 2-tuple is seen and migrate to 3-tuple while reading the settings.

sonologic added a commit to sonologic/pelican that referenced this issue Oct 30, 2023
@copperchin
Copy link
Contributor

@copperchin Do you mean we expect users to provide compiled regular expressions? And handle it internally with an isinstance check? It would work, technically, but I worry about the "user-friendliness" a bit. Seems it would be easier to forget re.compileing them and not get expected results.

It seems reasonable to assume that if :

  1. the user is actively editing a python "config" script
  2. the user is aware of regex (in that they know it solves the problem they have)

... then they know they need to mark the regex as such.

I don't think using re.compile(regex_str) is any bigger a hurdle than the already-required practice of specifying a logging level (instead of a string). In any case, this new feature ought to be documented, wherein presenting the use of re.compile should be fairly straightforward.

It also has the added benefit of being immediately identifiable as regex in any syntax highlighter 👍


Also, changing 2-tuple to 3-tuple can be handled in a non-breaking way. It is relatively easy to issue a DeprecationWarning if 2-tuple is seen and migrate to 3-tuple while reading the settings.

This is only true in one direction.

Keeping the 2-tuple format means configs will only break older versions of pelican if attempting to use the new (unsupported) feature.

Furthermore, keeping the 2-tuple means updating to a newer version of pelican won't require updating existing configs.

...

On the other hand, changing the config format to 3-tuples requires rewriting the configs, which will immediately break on older versions pelican, regardless of whether they are trying to use the new feature or not. This would be particularly annoying if needing to roll back to previous version of pelican.

sonologic added a commit to sonologic/pelican that referenced this issue Oct 30, 2023
@avaris
Copy link
Member

avaris commented Oct 30, 2023

I didn't mean it as a hurdle. Knowing what should be done and actually remembering to do it are different things :).

On the other hand, changing the config format to 3-tuples requires rewriting the configs

No, with the way I described, it will still work but you'll get a pesky warning about it. And in my experience, people don't stick with a version long enough to port everything over before doing a roll back. They usually hit some deal breaker pretty early (immediately?). So, compatibility in the other direction is less of a concern mostly.

Anyway, I don't have a strong preference in either option :). In technical terms, we can make both work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants