log: create crdb-v2 log entry parser and integrate it into debug merge-logs #65633

cameronnunez · 2021-05-24T19:56:42Z

The crdb-v2 log file format is not recognized by some tools like debug merge-logs.
This is makes it difficult/impossible to merge log files properly.
This patch creates a new parser and integrates it into debug merge-logs.

Release note (cli change): the tool debug merge-logs is now able to recognize
crdb-v2 formatted log files.

cockroach-teamcity · 2021-05-24T19:56:49Z

This change is

knz

All right before anything else we need to talk a little bit about how to organize this work.

For any kind of project that looks like "we need to get data from point A to point B in different forms"
There is necessarily a conversion step, meaning we have at least two algorithms of interest: parsing the input into an intermediate representation, then printing out the representation in the output format.

Whenever there are two separate algorithms, we can split the work into two sub-projects.
The one you're looking at here is thus specifically the parser.

Now you have a simpler project "we need to parse data from place X (e.g. a file) into place Y (a data structure)"

If that is the project you're working on, the commit message should reflect that, and explain what are X and Y.

Next thing to talk about is how to go about a programming task to implement an algorithm. Most of the time, it's extremely useful for productivity to set up a test program (or better yet - a unit test) ahead of starting to implement the algorithm. This way, at every step of the implementation you can exercise your implementation so far by running your test and see what the test says.

What does a test look like for a parser? That's a programming exercise in its own right. A parsing test is a collection of candidate inputs, with logic that applies the parser (yet to be programmed) to the input, then compares the actual output (from the parser) to some expected output (defined in the test).

For this kind of test structure nowadays we use the datadriven package. An example such test can be found in pkg/sql/pgwire/hba/hba_test.go.

Reviewable status: complete! 0 of 0 LGTMs obtained

knz · 2021-05-25T11:22:56Z

In this case you'd thus start by populating some test code in format_crdb_v2_test.go and see how that goes.

cameronnunez · 2021-05-27T00:05:09Z

@knz

Wrote up some test cases for parsing single line entries, which the current form of the parser has passed.

Now looking at combining multi-line entries into single entries in memory. I am trying to figure out if I am going in the right direction by working on a v2 version of the peek function (which is called by the pop function) in debug_merge_logs.go. Am I headed in the right direction here or should I be looking elsewhere?

knz

Reviewed 3 of 4 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez)

pkg/cli/debug_merge_logs.go, line 531 at r1 (raw file):

}
func (s *fileLogStream) peekV2() (logpb.Entry, bool) {
	for !s.read && s.err == nil {

This logic belongs to the entry parser in the log package, not the merge-logs command.
you can capture all the continuation lines using a single regular expression. No need for a "moreLines" loop.

Let's make some time together and I will teach you about the magic of regular expressions.

pkg/util/log/format_crdb_v2.go, line 465 at r1 (raw file):

		}
		b := d.scanner.Bytes()
		m := d.re.FindSubmatch(b)

You're using the v1 regular expression. It's unsuitable for v2. You need to define a new regular expression.
Separately from a new regexp, another question is whether you want to reuse the existing EntryDecoder type or define a new one. It can go either way.

Different types:
- Pros: can use the same method name Decode on both types with a different implementation
- Cons: code duplication
Same type, different fields
- Pros: can reuse (part of) the constructor
- Cons: either different Decode method name, or need to plug a method via a function attribute

Do you have an opinion on the matter?

pkg/util/log/format_crdb_v2_test.go, line 193 at r1 (raw file):

				fmt.Println("Here")
				fmt.Println(e.File == "")
				err := NewEntryDecoder(strings.NewReader(td.Input), 1).DecodeV2(&e)

Don't use literal constants in constructors. The constant for redactability has a name, use the name.

Also, we want the test to see that the redactability information is preserved. Your choice of using WithFlattenedSensitiveData here removes it. You need WithMarkedSensitiveData.

pkg/util/log/format_crdb_v2_test.go, line 197 at r1 (raw file):

					return fmt.Sprintf("error: %v\n", err)
				}
				var sb strings.Builder

Using strings.Builder here and the all the tablewriter code below is very cute but it's overkill :)

You can simplify all this to:

return fmt.Sprintf("%# v", pretty.Formatter(e))

knz

I love how you have simplified the code. It looks much better this way.
Let's move on to polishing this a bit further.

Reviewed 3 of 4 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)

pkg/util/log/format_crdb_v2.go, line 472 at r2 (raw file):

		/* Context tags     				*/ `(?:\[((?:[^]]|\][^ ])+)\] )` +
		/* Counter          				*/ `((?:\d+)?) ` +
	  /* Continuation marker			*/ `([ =!+|])` +

please ensure the comments align with the previous lines properly.

pkg/util/log/format_crdb_v2.go, line 532 at r2 (raw file):

	}

	// Process the file.

Can you handle the (gostd) case in the same way as the crdb-v1 code thanks
(also add a test for it)

pkg/util/log/format_crdb_v2.go, line 559 at r2 (raw file):

	// Process the log message.
	// While the entry has additional lines, collect the full message.
	entry.Message = string(trimFinalNewLines(m[12]))

Can I suggest you replace the direct assignment of entry.Message by a strings.Builder,
with entry.Message = buf.String() at the end. This will save allocations.

pkg/util/log/format_crdb_v2.go, line 582 at r2 (raw file):

			entry.Message += msg
		case "!":
			entry.Message += "\n(stack trace: " + msg

please don't enclose the stack trace between parentheses - they are commonly many lines long, and the reader will become confused by the time the closing parenthesis is encountered.
(Also not using parentheses will allow you to simplify the case below)

pkg/util/log/format_crdb_v2.go, line 585 at r2 (raw file):

			hasStackTrace = true
		default:
			hasMoreLines = false

I don't think you need hasMoreLines - a simple break would suffice here.

pkg/util/log/format_crdb_v2.go, line 595 at r2 (raw file):

}

func (d *EntryDecoderV2) split(data []byte, atEOF bool) (advance int, token []byte, err error) {

Where is this code used?

cameronnunez · 2021-06-04T17:07:29Z

Can you handle the (gostd) case in the same way as the crdb-v1 code thanks
(also add a test for it)

Yes! The second test includes handling the (gostd) case.

// Process the log message.
// While the entry has additional lines, collect the full message.
entry.Message = string(trimFinalNewLines(m[12]))
Can I suggest you replace the direct assignment of entry.Message by a strings.Builder,
with entry.Message = buf.String() at the end. This will save allocations.

Yes agreed, that's more efficient.

			hasStackTrace = true
		default:
			hasMoreLines = false
I don't think you need hasMoreLines - a simple break would suffice here.

I decided to use hasMoreLines because it seems that break breaks out of the switch instead of the for loop.

func (d *EntryDecoderV2) split(data []byte, atEOF bool) (advance int, token []byte, err error) {
Where is this code used?

This was used when the decoder had a scanner field, but it no longer does so this function is now not needed - thanks!

65342: multi-region syntax diagrams r=ericharmeling a=ericharmeling - Updated ALTER/CREATE DATABASE diagrams for multi-region - Updated CREATE TABLE diagram for multi-region - Added opt_locality diagram Release note: None 66096: util/log: extend `logpb.Entry` with more fine grained fields r=cameronnunez a=knz Needed for #65633 and #61378. cc @cockroachdb/server 66142: cli: skip under testrace as this test might be timing out r=jlinder a=adityamaru Release note: None Co-authored-by: Eric Harmeling <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Aditya Maru <[email protected]>

knz

Can you remove the last commit r4 "Merge branch 'cockroachdb:master' into v2-parser"

And instead do a rebase operation.
(We generally never merge the master branch into PR branches)

After you do a rebase, please refresh your parser test to include the new fields in logpb.Entry
Also please populate the new fields (StructuredStart/End, StackTraceStart) since you have enough information for them. You'll need them in the debug command code.

Reviewed 1 of 4 files at r2, 5 of 5 files at r3, 2 of 2 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez)

pkg/util/log/format_crdb_v2.go, line 472 at r2 (raw file):

Previously, knz (kena) wrote…

please ensure the comments align with the previous lines properly.

there's still a misalignment. Probably due to using tabs instead of spaces?

pkg/util/log/format_crdb_v2.go, line 555 at r3 (raw file):

	var entryMsg strings.Builder

	fmt.Fprintf(&entryMsg, string(trimFinalNewLines(m[12])))

Fprint, not Fprintf

pkg/util/log/format_crdb_v2.go, line 572 at r3 (raw file):

		switch cont {
		case "+":
			fmt.Fprintf(&entryMsg, "\n"+msg)

Fprintf(..., "%s\n", msg)

pkg/util/log/format_crdb_v2.go, line 574 at r3 (raw file):

			fmt.Fprintf(&entryMsg, "\n"+msg)
		case "|":
			fmt.Fprintf(&entryMsg, msg)

Fprint

pkg/util/log/format_crdb_v2.go, line 576 at r3 (raw file):

			fmt.Fprintf(&entryMsg, msg)
		case "!":
			fmt.Fprintf(&entryMsg, "\nstack trace:\n"+msg)

Fprintf with suitable %s

cameronnunez

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @cameronnunez, and @knz)

pkg/server/status.go, line 1006 at r36 (raw file):

Previously, knz (kena) wrote…

There's still a stray : %s at the end now. That's invalid.

Fixed.

pkg/util/log/file_api.go, line 375 at r29 (raw file):

Previously, knz (kena) wrote…

yes I think so.

Got it. Removed.

pkg/util/log/file_api.go, line 293 at r45 (raw file):

Previously, knz (kena) wrote…

"... is like FetchEntriesFromFiles but the caller can specified the format of the log file."

Done.

pkg/util/log/format_crdb_v1.go, line 440 at r45 (raw file):

Previously, knz (kena) wrote…

This simplification is not an improvement. bytes.TrimRight() incurs two heap allocations and is going to iterate 3 times over the final newlines.
The original code is objectively better.

Understood, reverted back to the original code.

pkg/util/log/format_crdb_v2.go, line 470 at r45 (raw file):

Previously, knz (kena) wrote…

I believe this one can be simplified to just
   ...  `\[(?P<tags>(?:[^]]|\][^ ])+)\] ` +
(the outer non-grouping parentheses can be omitted)

Updated.

pkg/util/log/format_crdb_v2.go, line 503 at r45 (raw file):

Previously, knz (kena) wrote…

if the panic object is not an error, then do not change it into an error. Instead, propagate it:
default:
  panic(r)

Done.

pkg/util/log/format_crdb_v2.go, line 597 at r45 (raw file):

Previously, knz (kena) wrote…

this should be an panic(errors.AssertionFailedf("..."))

Fixed.

pkg/util/log/log_decoder.go, line 42 at r45 (raw file):

Previously, knz (kena) wrote…

This is not the right location to define this mapping.

Generally, complex objects that do not depend on a function's parameters nor the time of invocation should be defined in the global scope.
In this particular case, in formats.go.

Fixed.

pkg/util/log/log_decoder.go, line 53 at r45 (raw file):

Previously, knz (kena) wrote…

I do not understand this entry nor the next one. Are you trying to invent new format names? I don't see why we should support more names for parsing than are available in the logging output configuration.

I included them in case the user simplifies the format name (for example, a user passes "v2" instead of "crdb-v2") but this is unnecessary and seems undesirable, so this is now removed.

pkg/util/log/log_decoder.go, line 57 at r45 (raw file):

Previously, knz (kena) wrote…

This comment is not reflecting the conditional immediately below it.

Updated.

pkg/util/log/log_decoder.go, line 79 at r45 (raw file):

Previously, knz (kena) wrote…

include the value specified by the user inside the error message

Done.

pkg/util/log/log_decoder.go, line 97 at r45 (raw file):

Previously, knz (kena) wrote…

for next time: when you link to an issue from the code, ensure the issue has the label X-anchored-telemetry.

Will do 👍

knz

but CI is telling you something remains to be fixed

Reviewed 5 of 8 files at r46, 5 of 5 files at r47.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @cameronnunez)

pkg/sql/pgwire/auth_test.go, line 319 at r47 (raw file):

								if entry.StructuredEnd == 0 {
									// TODO(knz): Enhance this when the log file
									// contains proper markers for structured entries.

You can remove this comment now

pkg/util/log/formats.go, line 22 at r47 (raw file):

}

var formats = map[string]string{

I believe a better name for this variable is formatParsers (it's a mapping so the name should describe both the keys and the values in the map)

cameronnunez · 2021-07-13T21:16:57Z

thanks you guys! @ajwerner @knz

cameronnunez · 2021-07-13T21:17:27Z

bors r=knz,ajwerner

cameronnunez · 2021-07-13T21:26:32Z

bors r-

craig · 2021-07-13T21:26:34Z

Canceled.

knz · 2021-07-14T16:21:09Z

as discussed offline: you can re-add the Co-authored-by: line in the commit message and merge this.

The remaining CI failure(s) if any are flakes in tests unrelated to your PR. We need to file issues when those occurs, to fix by other teams, but they should not block you from merging this.

The crdb-v2 log file format does not currently have an entry parser. This patch creates the new parser alongside the v1 parser. Release note (cli change): Previously, the crdb-v2 log file format lacked a parser. This has now changed. Co-authored-by: Andrew Werner <[email protected]>

cameronnunez · 2021-07-14T20:17:02Z

bors r=knz,ajwerner

craig · 2021-07-14T22:02:42Z

Build failed (retrying...):

GitHub CI (Cockroach)

cameronnunez · 2021-07-14T22:45:17Z

bors r=knz,ajwerner

craig · 2021-07-14T22:45:19Z

Already running a review

craig · 2021-07-14T23:59:48Z

Build succeeded:

GitHub CI (Cockroach)

cameronnunez added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-logging In and around the logging infrastructure. labels May 24, 2021

cameronnunez requested a review from knz May 24, 2021 19:56

cameronnunez marked this pull request as draft May 24, 2021 19:57

knz reviewed May 25, 2021

View reviewed changes

cameronnunez force-pushed the v2-parser branch 4 times, most recently from 6ae171f to af56499 Compare May 26, 2021 18:17

cameronnunez requested a review from knz May 27, 2021 00:12

knz reviewed May 27, 2021

View reviewed changes

cameronnunez force-pushed the v2-parser branch 6 times, most recently from aef44d3 to 35bd33b Compare June 4, 2021 00:53

cameronnunez requested a review from knz June 4, 2021 00:54

cameronnunez force-pushed the v2-parser branch from 35bd33b to 59e2e5c Compare June 4, 2021 00:56

knz reviewed Jun 4, 2021

View reviewed changes

knz mentioned this pull request Jun 4, 2021

util/log: extend logpb.Entry with more fine grained fields #66096

Merged

cameronnunez force-pushed the v2-parser branch 2 times, most recently from f0013ab to b05a534 Compare June 7, 2021 17:56

knz reviewed Jun 8, 2021

View reviewed changes

cameronnunez force-pushed the v2-parser branch 2 times, most recently from 11b67aa to c18376a Compare July 12, 2021 14:30

cameronnunez commented Jul 12, 2021

View reviewed changes

cameronnunez force-pushed the v2-parser branch 2 times, most recently from 65462fd to 545b1b8 Compare July 12, 2021 22:52

cameronnunez requested a review from knz July 13, 2021 01:16

knz approved these changes Jul 13, 2021

View reviewed changes

cameronnunez force-pushed the v2-parser branch 3 times, most recently from ba12cc8 to dc96ad9 Compare July 13, 2021 19:59

cameronnunez force-pushed the v2-parser branch 2 times, most recently from 42d83e5 to 7c7f1d8 Compare July 14, 2021 15:20

cameronnunez force-pushed the v2-parser branch from 7c7f1d8 to 72e4e86 Compare July 14, 2021 18:40

craig bot merged commit 00f9b26 into cockroachdb:master Jul 14, 2021

aliher1911 mentioned this pull request Jul 30, 2021

Can't use debug merge-logs on logs from 20.2 unless explicitly provide flag #68278

Closed

jseldess mentioned this pull request Sep 8, 2021

log: create crdb-v2 log entry parser and integrate it into debug merge-logs cockroachdb/docs#11521

Closed

exalate-issue-sync bot mentioned this pull request Nov 11, 2022

release-21.1: util/log,server/debug: fixes to logging and the logspy api endpoint cockroachdb/docs#11075

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

log: create crdb-v2 log entry parser and integrate it into debug merge-logs #65633

log: create crdb-v2 log entry parser and integrate it into debug merge-logs #65633

cameronnunez commented May 24, 2021 •

edited

Loading

cockroach-teamcity commented May 24, 2021

knz left a comment

knz commented May 25, 2021

cameronnunez commented May 27, 2021 •

edited

Loading

knz left a comment

knz left a comment

cameronnunez commented Jun 4, 2021 •

edited

Loading

knz left a comment

cameronnunez left a comment

knz left a comment

cameronnunez commented Jul 13, 2021

cameronnunez commented Jul 13, 2021

cameronnunez commented Jul 13, 2021

craig bot commented Jul 13, 2021

knz commented Jul 14, 2021

cameronnunez commented Jul 14, 2021

craig bot commented Jul 14, 2021

cameronnunez commented Jul 14, 2021

craig bot commented Jul 14, 2021

craig bot commented Jul 14, 2021

log: create crdb-v2 log entry parser and integrate it into debug merge-logs #65633

log: create crdb-v2 log entry parser and integrate it into debug merge-logs #65633

Conversation

cameronnunez commented May 24, 2021 • edited Loading

cockroach-teamcity commented May 24, 2021

knz left a comment

Choose a reason for hiding this comment

knz commented May 25, 2021

cameronnunez commented May 27, 2021 • edited Loading

knz left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

cameronnunez commented Jun 4, 2021 • edited Loading

knz left a comment

Choose a reason for hiding this comment

cameronnunez left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

cameronnunez commented Jul 13, 2021

cameronnunez commented Jul 13, 2021

cameronnunez commented Jul 13, 2021

craig bot commented Jul 13, 2021

knz commented Jul 14, 2021

cameronnunez commented Jul 14, 2021

craig bot commented Jul 14, 2021

cameronnunez commented Jul 14, 2021

craig bot commented Jul 14, 2021

craig bot commented Jul 14, 2021

cameronnunez commented May 24, 2021 •

edited

Loading

cameronnunez commented May 27, 2021 •

edited

Loading

cameronnunez commented Jun 4, 2021 •

edited

Loading