Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix log record IDs #29355

Closed
wants to merge 2 commits into from
Closed

Fix log record IDs #29355

wants to merge 2 commits into from

Conversation

adamslc
Copy link
Contributor

@adamslc adamslc commented Sep 24, 2018

This PR ensures that a given log statement will always produce the same ID. It is actually a bit tricky to implement because the ID should "uniquely identify the source location in the originating module, but should be stable across versions of the originating module, provided the log generating statement itself doesn't change" (according to the source).

This definitely needs some tests so that this doesn't break again.

Closes #28786.

@fredrikekre fredrikekre added bugfix This change fixes an existing bug logging The logging framework labels Sep 25, 2018
@adamslc adamslc changed the title [WIP] Fix log record IDs Fix log record IDs Sep 25, 2018
@adamslc
Copy link
Contributor Author

adamslc commented Sep 25, 2018

I think that this is ready to go now, but I do have a concern about the implementation as it stands now.

Suppose that some package has a function like this:

function f()
    for i in 1:1_000_000
        @info "Working"
    end
    @info "Done"
end

An application might want to filter out the "Working" logs, but pass along the "Done" log to the user. Prior to this PR, there was no good way to do that, but now the application can filter based on the log record ID, and everything works. Yay!

But now suppose that the package is modified to add a new log:

function f()
    @info "Starting"
    for i in 1:1_000_000
        @info "Working"
    end
    @info "Done"
end

Now the application will actually filter out the "Starting" log, and the user will be flooded with "Working" logs. One solution would be to include the message in the hash to compute the ID, but this would mean that the message (which may be expensive) would need to be computed before the shouldlog check.

@adamslc
Copy link
Contributor Author

adamslc commented Sep 26, 2018

Bump. It would be great if this could make it in to v1.0.1.

@ViralBShah
Copy link
Member

1.0.1 is frozen. Can go into 1.0.2

@oxinabox
Copy link
Contributor

oxinabox commented Sep 27, 2018

id in values(_log_record_ids)
is a $O(n)$ operation
when $n$ is the number of _log_record_ids
Probably fairly small.

Except if you do something like:

for ii in 1:10_000
     @info "round $ii  starting"
     #...
end

And I can see that code being written by a lot of people who don't know about logs now taking kwarguments, or even those who do, but think that interpolation is prettier.

@adamslc
Copy link
Contributor Author

adamslc commented Sep 27, 2018

I'm not really sure what you are arguing for? With this PR, every log record generated by your code with have the same ID, and the id in values(_log_record_ids) check will only happen on the first iteration.

Though the example you provide is another example of why including the message in the original id hash is a bad idea.

@adamslc
Copy link
Contributor Author

adamslc commented Oct 1, 2018

Can this please get a backport label so it doesn't get lost?

@StefanKarpinski StefanKarpinski added this to the 1.0.x milestone Oct 1, 2018
@oxinabox
Copy link
Contributor

oxinabox commented Oct 2, 2018

I'm not really sure what you are arguing for?

I misread something in the code.

I wish I understood why this feature worked in 0.7-beta without having this code.

@@ -311,12 +315,12 @@ function logmsg_code(_module, file, line, level, message, exs...)
_module = $_module
logger = current_logger_for_env(std_level, group, _module)
if !(logger === nothing)
file = $file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these moved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those values are needed to compute id.

@fredrikekre fredrikekre requested a review from c42f October 19, 2018 15:22
@c42f
Copy link
Member

c42f commented Oct 20, 2018

Oh, you're right there's a very serious problem with id's in 1.0. Not only are they not stable, but they're computed at every log invocation (very expensive, and will explode the log id storage!) @oxinabox this was broken by the fix in f0e1d21 which went in very shortly before 1.0 (and fixed a quite legitimate bug, but edge case). Apparently my tests didn't cover this and I've been absent from the logging discussion for a few months, sorry (new job, etc etc...).

Hmm what to do. So I'm sure this PR is part of the solution, but I'm not sure it's quite right yet.

I'm thinking that automatically computed ids should always be statically computed, regardless of whether the user overrides the _module, _file, and _line. The heuristic we use may not suit in those cases anyway and the user can always override _id if they really must.

So what about if we go back to using the _sourceinfo for passing to log_record_id at compile time, with the autogenerated _id always arising from that? This has predictable cost of only happening once at compile time, as the original implementation intended.

@vtjnash
Copy link
Member

vtjnash commented Oct 20, 2018

Glad to have you back then! Yes, I think we knew we were rushing in a few fixes to this code that weren't 100% solid, but seemed not to be breaking anything (as you say, "part of the solution, but ... not quite right yet"). Perhaps we need more of that "memoized run-time execution" code sprinkled in here too (#28289)?

@c42f
Copy link
Member

c42f commented Oct 20, 2018

Thanks! I hope I'm back for a while... we'll see.

The memoization trick is a really clever and relatively nice way around the bootstrap problems with basename. We could use it for any metadata which is known (by definition or API contract) to be fixed for all invocations of the log statement, but which can't be computed at compile time. I don't think we've defined which of the special metadata keys are allowed to change between invocations of a given logging macro. One reasonable option might be "none" (in which case any of the special fields could be memoized). That might be too restrictive though.

Also regarding memoization, are there any problems with multithreading which could arise from that? Is it reasonable to assume pointer sized stores are atomic on all platforms we support?

@c42f
Copy link
Member

c42f commented Oct 20, 2018

Anyway, back to the issue at hand. For the particular case of id, we currently only promise in the docs that the id is an "arbitrary" but unique identifier for the log statement. So even a random uuid would be good enough to satisfy that, and much better than the current situation.

Thinking back, what I was trying to achieve with making id depend on overridable _module metadata was to allow wrapping logging macros inside other macros, having the somewhat-readable id respect the module where the outer macro was called. But this turns out not to work (unless you memoize it) because the inner logging macro only has syntax to work with, not the real Module.

Putting that all together, I think we should go back to simply hashing the (full) syntax of the log statement, combined with the module from the _sourceinfo and light deduplication. All of which is known and can be evaluated at compile time.

@c42f
Copy link
Member

c42f commented Oct 31, 2018

I've made a branch with my preferred fix (cjf/fix-logging-ids) I just haven't had the time to add additional tests.

[edit] Actually my branch runs into bootstrap issues which I'll need to chase down. We might need the memoization trick again to solve that problem.

@adamslc
Copy link
Contributor Author

adamslc commented Oct 31, 2018

Feel free to take the tests from this PR.

@c42f
Copy link
Member

c42f commented Oct 31, 2018

Thanks, I cherry picked your commit containing the tests already :-)

c42f added a commit that referenced this pull request Nov 1, 2018
Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

Fixes #28786, #28400; replaces #29355.
@adamslc
Copy link
Contributor Author

adamslc commented Nov 1, 2018

Closing in favor of #29878.

@adamslc adamslc closed this Nov 1, 2018
@adamslc adamslc deleted the maxlog_fix branch November 1, 2018 16:22
fredrikekre pushed a commit that referenced this pull request Nov 1, 2018
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`
KristofferC pushed a commit that referenced this pull request Nov 1, 2018
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`

(cherry picked from commit 51683c4)
KristofferC pushed a commit that referenced this pull request Nov 2, 2018
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`

(cherry picked from commit 51683c4)
KristofferC pushed a commit that referenced this pull request Feb 11, 2019
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`

(cherry picked from commit 51683c4)
KristofferC pushed a commit that referenced this pull request Feb 20, 2020
* fix bug and add tests

* Reinstate statically computed log record ids

Log ids are meant to identify the location of the message in the source
code and must be computed at compile time.

fix #28786, fix #28400; closes #29355.

* Clarify documentation regarding log record `id`

(cherry picked from commit 51683c4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug logging The logging framework
Projects
None yet
Development

Successfully merging this pull request may close these issues.

in warn macro the maxlog keyword is not effective
8 participants