-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
log: create crdb-v2 log entry parser and integrate it into debug merge-logs #65633
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right before anything else we need to talk a little bit about how to organize this work.
For any kind of project that looks like "we need to get data from point A to point B in different forms"
There is necessarily a conversion step, meaning we have at least two algorithms of interest: parsing the input into an intermediate representation, then printing out the representation in the output format.
Whenever there are two separate algorithms, we can split the work into two sub-projects.
The one you're looking at here is thus specifically the parser.
Now you have a simpler project "we need to parse data from place X (e.g. a file) into place Y (a data structure)"
If that is the project you're working on, the commit message should reflect that, and explain what are X and Y.
Next thing to talk about is how to go about a programming task to implement an algorithm. Most of the time, it's extremely useful for productivity to set up a test program (or better yet - a unit test) ahead of starting to implement the algorithm. This way, at every step of the implementation you can exercise your implementation so far by running your test and see what the test says.
What does a test look like for a parser? That's a programming exercise in its own right. A parsing test is a collection of candidate inputs, with logic that applies the parser (yet to be programmed) to the input, then compares the actual output (from the parser) to some expected output (defined in the test).
For this kind of test structure nowadays we use the datadriven
package. An example such test can be found in pkg/sql/pgwire/hba/hba_test.go
.
Reviewable status: complete! 0 of 0 LGTMs obtained
In this case you'd thus start by populating some test code in |
6ae171f
to
af56499
Compare
Wrote up some test cases for parsing single line entries, which the current form of the parser has passed. Now looking at combining multi-line entries into single entries in memory. I am trying to figure out if I am going in the right direction by working on a v2 version of the peek function (which is called by the pop function) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez)
pkg/cli/debug_merge_logs.go, line 531 at r1 (raw file):
} func (s *fileLogStream) peekV2() (logpb.Entry, bool) { for !s.read && s.err == nil {
-
This logic belongs to the entry parser in the log package, not the
merge-logs
command. -
you can capture all the continuation lines using a single regular expression. No need for a "moreLines" loop.
Let's make some time together and I will teach you about the magic of regular expressions.
pkg/util/log/format_crdb_v2.go, line 465 at r1 (raw file):
} b := d.scanner.Bytes() m := d.re.FindSubmatch(b)
-
You're using the v1 regular expression. It's unsuitable for v2. You need to define a new regular expression.
-
Separately from a new regexp, another question is whether you want to reuse the existing
EntryDecoder
type or define a new one. It can go either way.
- Different types:
- Pros: can use the same method name
Decode
on both types with a different implementation - Cons: code duplication
- Pros: can use the same method name
- Same type, different fields
- Pros: can reuse (part of) the constructor
- Cons: either different
Decode
method name, or need to plug a method via a function attribute
Do you have an opinion on the matter?
pkg/util/log/format_crdb_v2_test.go, line 193 at r1 (raw file):
fmt.Println("Here") fmt.Println(e.File == "") err := NewEntryDecoder(strings.NewReader(td.Input), 1).DecodeV2(&e)
Don't use literal constants in constructors. The constant for redactability has a name, use the name.
Also, we want the test to see that the redactability information is preserved. Your choice of using WithFlattenedSensitiveData
here removes it. You need WithMarkedSensitiveData
.
pkg/util/log/format_crdb_v2_test.go, line 197 at r1 (raw file):
return fmt.Sprintf("error: %v\n", err) } var sb strings.Builder
Using strings.Builder here and the all the tablewriter code below is very cute but it's overkill :)
You can simplify all this to:
return fmt.Sprintf("%# v", pretty.Formatter(e))
aef44d3
to
35bd33b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love how you have simplified the code. It looks much better this way.
Let's move on to polishing this a bit further.
Reviewed 3 of 4 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)
pkg/util/log/format_crdb_v2.go, line 472 at r2 (raw file):
/* Context tags */ `(?:\[((?:[^]]|\][^ ])+)\] )` + /* Counter */ `((?:\d+)?) ` + /* Continuation marker */ `([ =!+|])` +
please ensure the comments align with the previous lines properly.
pkg/util/log/format_crdb_v2.go, line 532 at r2 (raw file):
} // Process the file.
Can you handle the (gostd)
case in the same way as the crdb-v1 code thanks
(also add a test for it)
pkg/util/log/format_crdb_v2.go, line 559 at r2 (raw file):
// Process the log message. // While the entry has additional lines, collect the full message. entry.Message = string(trimFinalNewLines(m[12]))
Can I suggest you replace the direct assignment of entry.Message
by a strings.Builder
,
with entry.Message = buf.String()
at the end. This will save allocations.
pkg/util/log/format_crdb_v2.go, line 582 at r2 (raw file):
entry.Message += msg case "!": entry.Message += "\n(stack trace: " + msg
please don't enclose the stack trace between parentheses - they are commonly many lines long, and the reader will become confused by the time the closing parenthesis is encountered.
(Also not using parentheses will allow you to simplify the case below)
pkg/util/log/format_crdb_v2.go, line 585 at r2 (raw file):
hasStackTrace = true default: hasMoreLines = false
I don't think you need hasMoreLines
- a simple break
would suffice here.
pkg/util/log/format_crdb_v2.go, line 595 at r2 (raw file):
} func (d *EntryDecoderV2) split(data []byte, atEOF bool) (advance int, token []byte, err error) {
Where is this code used?
Yes! The second test includes handling the
Yes agreed, that's more efficient.
I decided to use
This was used when the decoder had a scanner field, but it no longer does so this function is now not needed - thanks! |
65342: multi-region syntax diagrams r=ericharmeling a=ericharmeling - Updated ALTER/CREATE DATABASE diagrams for multi-region - Updated CREATE TABLE diagram for multi-region - Added opt_locality diagram Release note: None 66096: util/log: extend `logpb.Entry` with more fine grained fields r=cameronnunez a=knz Needed for #65633 and #61378. cc @cockroachdb/server 66142: cli: skip under testrace as this test might be timing out r=jlinder a=adityamaru Release note: None Co-authored-by: Eric Harmeling <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Aditya Maru <[email protected]>
f0013ab
to
b05a534
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the last commit r4 "Merge branch 'cockroachdb:master' into v2-parser"
And instead do a rebase operation.
(We generally never merge the master branch into PR branches)
After you do a rebase, please refresh your parser test to include the new fields in logpb.Entry
Also please populate the new fields (StructuredStart/End, StackTraceStart) since you have enough information for them. You'll need them in the debug command code.
Reviewed 1 of 4 files at r2, 5 of 5 files at r3, 2 of 2 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez)
pkg/util/log/format_crdb_v2.go, line 472 at r2 (raw file):
Previously, knz (kena) wrote…
please ensure the comments align with the previous lines properly.
there's still a misalignment. Probably due to using tabs instead of spaces?
pkg/util/log/format_crdb_v2.go, line 555 at r3 (raw file):
var entryMsg strings.Builder fmt.Fprintf(&entryMsg, string(trimFinalNewLines(m[12])))
Fprint
, not Fprintf
pkg/util/log/format_crdb_v2.go, line 572 at r3 (raw file):
switch cont { case "+": fmt.Fprintf(&entryMsg, "\n"+msg)
Fprintf(..., "%s\n", msg)
pkg/util/log/format_crdb_v2.go, line 574 at r3 (raw file):
fmt.Fprintf(&entryMsg, "\n"+msg) case "|": fmt.Fprintf(&entryMsg, msg)
Fprint
pkg/util/log/format_crdb_v2.go, line 576 at r3 (raw file):
fmt.Fprintf(&entryMsg, msg) case "!": fmt.Fprintf(&entryMsg, "\nstack trace:\n"+msg)
Fprintf
with suitable %s
11b67aa
to
c18376a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @cameronnunez, and @knz)
pkg/server/status.go, line 1006 at r36 (raw file):
Previously, knz (kena) wrote…
There's still a stray
: %s
at the end now. That's invalid.
Fixed.
pkg/util/log/file_api.go, line 375 at r29 (raw file):
Previously, knz (kena) wrote…
yes I think so.
Got it. Removed.
pkg/util/log/file_api.go, line 293 at r45 (raw file):
Previously, knz (kena) wrote…
"... is like FetchEntriesFromFiles but the caller can specified the format of the log file."
Done.
pkg/util/log/format_crdb_v1.go, line 440 at r45 (raw file):
Previously, knz (kena) wrote…
This simplification is not an improvement.
bytes.TrimRight()
incurs two heap allocations and is going to iterate 3 times over the final newlines.
The original code is objectively better.
Understood, reverted back to the original code.
pkg/util/log/format_crdb_v2.go, line 470 at r45 (raw file):
Previously, knz (kena) wrote…
I believe this one can be simplified to just
... `\[(?P<tags>(?:[^]]|\][^ ])+)\] ` +(the outer non-grouping parentheses can be omitted)
Updated.
pkg/util/log/format_crdb_v2.go, line 503 at r45 (raw file):
Previously, knz (kena) wrote…
if the panic object is not an error, then do not change it into an error. Instead, propagate it:
default: panic(r)
Done.
pkg/util/log/format_crdb_v2.go, line 597 at r45 (raw file):
Previously, knz (kena) wrote…
this should be an
panic(errors.AssertionFailedf("..."))
Fixed.
pkg/util/log/log_decoder.go, line 42 at r45 (raw file):
Previously, knz (kena) wrote…
This is not the right location to define this mapping.
Generally, complex objects that do not depend on a function's parameters nor the time of invocation should be defined in the global scope.
In this particular case, informats.go
.
Fixed.
pkg/util/log/log_decoder.go, line 53 at r45 (raw file):
Previously, knz (kena) wrote…
I do not understand this entry nor the next one. Are you trying to invent new format names? I don't see why we should support more names for parsing than are available in the logging output configuration.
I included them in case the user simplifies the format name (for example, a user passes "v2" instead of "crdb-v2") but this is unnecessary and seems undesirable, so this is now removed.
pkg/util/log/log_decoder.go, line 57 at r45 (raw file):
Previously, knz (kena) wrote…
This comment is not reflecting the conditional immediately below it.
Updated.
pkg/util/log/log_decoder.go, line 79 at r45 (raw file):
Previously, knz (kena) wrote…
include the value specified by the user inside the error message
Done.
pkg/util/log/log_decoder.go, line 97 at r45 (raw file):
Previously, knz (kena) wrote…
for next time: when you link to an issue from the code, ensure the issue has the label
X-anchored-telemetry
.
Will do 👍
65462fd
to
545b1b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but CI is telling you something remains to be fixed
Reviewed 5 of 8 files at r46, 5 of 5 files at r47.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @cameronnunez)
pkg/sql/pgwire/auth_test.go, line 319 at r47 (raw file):
if entry.StructuredEnd == 0 { // TODO(knz): Enhance this when the log file // contains proper markers for structured entries.
You can remove this comment now
pkg/util/log/formats.go, line 22 at r47 (raw file):
} var formats = map[string]string{
I believe a better name for this variable is formatParsers
(it's a mapping so the name should describe both the keys and the values in the map)
ba12cc8
to
dc96ad9
Compare
bors r=knz,ajwerner |
bors r- |
Canceled. |
42d83e5
to
7c7f1d8
Compare
as discussed offline: you can re-add the The remaining CI failure(s) if any are flakes in tests unrelated to your PR. We need to file issues when those occurs, to fix by other teams, but they should not block you from merging this. |
The crdb-v2 log file format does not currently have an entry parser. This patch creates the new parser alongside the v1 parser. Release note (cli change): Previously, the crdb-v2 log file format lacked a parser. This has now changed. Co-authored-by: Andrew Werner <[email protected]>
bors r=knz,ajwerner |
Build failed (retrying...): |
bors r=knz,ajwerner |
Already running a review |
Build succeeded: |
Fixes #65504.
The crdb-v2 log file format is not recognized by some tools like debug merge-logs.
This is makes it difficult/impossible to merge log files properly.
This patch creates a new parser and integrates it into debug merge-logs.
Release note (cli change): the tool debug merge-logs is now able to recognize
crdb-v2 formatted log files.