Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log: create crdb-v2 log entry parser and integrate it into debug merge-logs #65633

Merged
merged 1 commit into from
Jul 14, 2021

Conversation

cameronnunez
Copy link
Contributor

@cameronnunez cameronnunez commented May 24, 2021

Fixes #65504.

The crdb-v2 log file format is not recognized by some tools like debug merge-logs.
This is makes it difficult/impossible to merge log files properly.
This patch creates a new parser and integrates it into debug merge-logs.

Release note (cli change): the tool debug merge-logs is now able to recognize
crdb-v2 formatted log files.

@cameronnunez cameronnunez added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-logging In and around the logging infrastructure. labels May 24, 2021
@cameronnunez cameronnunez requested a review from knz May 24, 2021 19:56
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cameronnunez cameronnunez marked this pull request as draft May 24, 2021 19:57
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right before anything else we need to talk a little bit about how to organize this work.

For any kind of project that looks like "we need to get data from point A to point B in different forms"
There is necessarily a conversion step, meaning we have at least two algorithms of interest: parsing the input into an intermediate representation, then printing out the representation in the output format.

Whenever there are two separate algorithms, we can split the work into two sub-projects.
The one you're looking at here is thus specifically the parser.

Now you have a simpler project "we need to parse data from place X (e.g. a file) into place Y (a data structure)"

If that is the project you're working on, the commit message should reflect that, and explain what are X and Y.

Next thing to talk about is how to go about a programming task to implement an algorithm. Most of the time, it's extremely useful for productivity to set up a test program (or better yet - a unit test) ahead of starting to implement the algorithm. This way, at every step of the implementation you can exercise your implementation so far by running your test and see what the test says.

What does a test look like for a parser? That's a programming exercise in its own right. A parsing test is a collection of candidate inputs, with logic that applies the parser (yet to be programmed) to the input, then compares the actual output (from the parser) to some expected output (defined in the test).

For this kind of test structure nowadays we use the datadriven package. An example such test can be found in pkg/sql/pgwire/hba/hba_test.go.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@knz
Copy link
Contributor

knz commented May 25, 2021

In this case you'd thus start by populating some test code in format_crdb_v2_test.go and see how that goes.

@cameronnunez cameronnunez force-pushed the v2-parser branch 4 times, most recently from 6ae171f to af56499 Compare May 26, 2021 18:17
@cameronnunez
Copy link
Contributor Author

cameronnunez commented May 27, 2021

@knz

Wrote up some test cases for parsing single line entries, which the current form of the parser has passed.

Now looking at combining multi-line entries into single entries in memory. I am trying to figure out if I am going in the right direction by working on a v2 version of the peek function (which is called by the pop function) in debug_merge_logs.go. Am I headed in the right direction here or should I be looking elsewhere?

@cameronnunez cameronnunez requested a review from knz May 27, 2021 00:12
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez)


pkg/cli/debug_merge_logs.go, line 531 at r1 (raw file):

}
func (s *fileLogStream) peekV2() (logpb.Entry, bool) {
	for !s.read && s.err == nil {
  1. This logic belongs to the entry parser in the log package, not the merge-logs command.

  2. you can capture all the continuation lines using a single regular expression. No need for a "moreLines" loop.

Let's make some time together and I will teach you about the magic of regular expressions.


pkg/util/log/format_crdb_v2.go, line 465 at r1 (raw file):

		}
		b := d.scanner.Bytes()
		m := d.re.FindSubmatch(b)
  1. You're using the v1 regular expression. It's unsuitable for v2. You need to define a new regular expression.

  2. Separately from a new regexp, another question is whether you want to reuse the existing EntryDecoder type or define a new one. It can go either way.

  • Different types:
    • Pros: can use the same method name Decode on both types with a different implementation
    • Cons: code duplication
  • Same type, different fields
    • Pros: can reuse (part of) the constructor
    • Cons: either different Decode method name, or need to plug a method via a function attribute

Do you have an opinion on the matter?


pkg/util/log/format_crdb_v2_test.go, line 193 at r1 (raw file):

				fmt.Println("Here")
				fmt.Println(e.File == "")
				err := NewEntryDecoder(strings.NewReader(td.Input), 1).DecodeV2(&e)

Don't use literal constants in constructors. The constant for redactability has a name, use the name.

Also, we want the test to see that the redactability information is preserved. Your choice of using WithFlattenedSensitiveData here removes it. You need WithMarkedSensitiveData.


pkg/util/log/format_crdb_v2_test.go, line 197 at r1 (raw file):

					return fmt.Sprintf("error: %v\n", err)
				}
				var sb strings.Builder

Using strings.Builder here and the all the tablewriter code below is very cute but it's overkill :)

You can simplify all this to:

return fmt.Sprintf("%# v", pretty.Formatter(e))

@cameronnunez cameronnunez force-pushed the v2-parser branch 6 times, most recently from aef44d3 to 35bd33b Compare June 4, 2021 00:53
@cameronnunez cameronnunez requested a review from knz June 4, 2021 00:54
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how you have simplified the code. It looks much better this way.
Let's move on to polishing this a bit further.

Reviewed 3 of 4 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)


pkg/util/log/format_crdb_v2.go, line 472 at r2 (raw file):

		/* Context tags     				*/ `(?:\[((?:[^]]|\][^ ])+)\] )` +
		/* Counter          				*/ `((?:\d+)?) ` +
	  /* Continuation marker			*/ `([ =!+|])` +

please ensure the comments align with the previous lines properly.


pkg/util/log/format_crdb_v2.go, line 532 at r2 (raw file):

	}

	// Process the file.

Can you handle the (gostd) case in the same way as the crdb-v1 code thanks
(also add a test for it)


pkg/util/log/format_crdb_v2.go, line 559 at r2 (raw file):

	// Process the log message.
	// While the entry has additional lines, collect the full message.
	entry.Message = string(trimFinalNewLines(m[12]))

Can I suggest you replace the direct assignment of entry.Message by a strings.Builder,
with entry.Message = buf.String() at the end. This will save allocations.


pkg/util/log/format_crdb_v2.go, line 582 at r2 (raw file):

			entry.Message += msg
		case "!":
			entry.Message += "\n(stack trace: " + msg

please don't enclose the stack trace between parentheses - they are commonly many lines long, and the reader will become confused by the time the closing parenthesis is encountered.
(Also not using parentheses will allow you to simplify the case below)


pkg/util/log/format_crdb_v2.go, line 585 at r2 (raw file):

			hasStackTrace = true
		default:
			hasMoreLines = false

I don't think you need hasMoreLines - a simple break would suffice here.


pkg/util/log/format_crdb_v2.go, line 595 at r2 (raw file):

}

func (d *EntryDecoderV2) split(data []byte, atEOF bool) (advance int, token []byte, err error) {

Where is this code used?

@cameronnunez
Copy link
Contributor Author

cameronnunez commented Jun 4, 2021

Can you handle the (gostd) case in the same way as the crdb-v1 code thanks
(also add a test for it)

Yes! The second test includes handling the (gostd) case.

// Process the log message.
// While the entry has additional lines, collect the full message.
entry.Message = string(trimFinalNewLines(m[12]))

Can I suggest you replace the direct assignment of entry.Message by a strings.Builder,
with entry.Message = buf.String() at the end. This will save allocations.

Yes agreed, that's more efficient.

			hasStackTrace = true
		default:
			hasMoreLines = false

I don't think you need hasMoreLines - a simple break would suffice here.

I decided to use hasMoreLines because it seems that break breaks out of the switch instead of the for loop.

func (d *EntryDecoderV2) split(data []byte, atEOF bool) (advance int, token []byte, err error) {

Where is this code used?

This was used when the decoder had a scanner field, but it no longer does so this function is now not needed - thanks!

craig bot pushed a commit that referenced this pull request Jun 7, 2021
65342:  multi-region syntax diagrams r=ericharmeling a=ericharmeling

- Updated ALTER/CREATE DATABASE diagrams for multi-region
- Updated CREATE TABLE diagram for multi-region
- Added opt_locality diagram

Release note: None

66096: util/log: extend `logpb.Entry` with more fine grained fields r=cameronnunez a=knz

Needed for #65633 and #61378.

cc @cockroachdb/server 

66142: cli: skip under testrace as this test might be timing out r=jlinder a=adityamaru

Release note: None

Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
@cameronnunez cameronnunez force-pushed the v2-parser branch 2 times, most recently from f0013ab to b05a534 Compare June 7, 2021 17:56
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the last commit r4 "Merge branch 'cockroachdb:master' into v2-parser"

And instead do a rebase operation.
(We generally never merge the master branch into PR branches)

After you do a rebase, please refresh your parser test to include the new fields in logpb.Entry
Also please populate the new fields (StructuredStart/End, StackTraceStart) since you have enough information for them. You'll need them in the debug command code.

Reviewed 1 of 4 files at r2, 5 of 5 files at r3, 2 of 2 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez)


pkg/util/log/format_crdb_v2.go, line 472 at r2 (raw file):

Previously, knz (kena) wrote…

please ensure the comments align with the previous lines properly.

there's still a misalignment. Probably due to using tabs instead of spaces?


pkg/util/log/format_crdb_v2.go, line 555 at r3 (raw file):

	var entryMsg strings.Builder

	fmt.Fprintf(&entryMsg, string(trimFinalNewLines(m[12])))

Fprint, not Fprintf


pkg/util/log/format_crdb_v2.go, line 572 at r3 (raw file):

		switch cont {
		case "+":
			fmt.Fprintf(&entryMsg, "\n"+msg)

Fprintf(..., "%s\n", msg)


pkg/util/log/format_crdb_v2.go, line 574 at r3 (raw file):

			fmt.Fprintf(&entryMsg, "\n"+msg)
		case "|":
			fmt.Fprintf(&entryMsg, msg)

Fprint


pkg/util/log/format_crdb_v2.go, line 576 at r3 (raw file):

			fmt.Fprintf(&entryMsg, msg)
		case "!":
			fmt.Fprintf(&entryMsg, "\nstack trace:\n"+msg)

Fprintf with suitable %s

@cameronnunez cameronnunez force-pushed the v2-parser branch 2 times, most recently from 11b67aa to c18376a Compare July 12, 2021 14:30
Copy link
Contributor Author

@cameronnunez cameronnunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @cameronnunez, and @knz)


pkg/server/status.go, line 1006 at r36 (raw file):

Previously, knz (kena) wrote…

There's still a stray : %s at the end now. That's invalid.

Fixed.


pkg/util/log/file_api.go, line 375 at r29 (raw file):

Previously, knz (kena) wrote…

yes I think so.

Got it. Removed.


pkg/util/log/file_api.go, line 293 at r45 (raw file):

Previously, knz (kena) wrote…

"... is like FetchEntriesFromFiles but the caller can specified the format of the log file."

Done.


pkg/util/log/format_crdb_v1.go, line 440 at r45 (raw file):

Previously, knz (kena) wrote…

This simplification is not an improvement. bytes.TrimRight() incurs two heap allocations and is going to iterate 3 times over the final newlines.
The original code is objectively better.

Understood, reverted back to the original code.


pkg/util/log/format_crdb_v2.go, line 470 at r45 (raw file):

Previously, knz (kena) wrote…

I believe this one can be simplified to just

   ...  `\[(?P<tags>(?:[^]]|\][^ ])+)\] ` +

(the outer non-grouping parentheses can be omitted)

Updated.


pkg/util/log/format_crdb_v2.go, line 503 at r45 (raw file):

Previously, knz (kena) wrote…

if the panic object is not an error, then do not change it into an error. Instead, propagate it:

default:
  panic(r)

Done.


pkg/util/log/format_crdb_v2.go, line 597 at r45 (raw file):

Previously, knz (kena) wrote…

this should be an panic(errors.AssertionFailedf("..."))

Fixed.


pkg/util/log/log_decoder.go, line 42 at r45 (raw file):

Previously, knz (kena) wrote…

This is not the right location to define this mapping.

Generally, complex objects that do not depend on a function's parameters nor the time of invocation should be defined in the global scope.
In this particular case, in formats.go.

Fixed.


pkg/util/log/log_decoder.go, line 53 at r45 (raw file):

Previously, knz (kena) wrote…

I do not understand this entry nor the next one. Are you trying to invent new format names? I don't see why we should support more names for parsing than are available in the logging output configuration.

I included them in case the user simplifies the format name (for example, a user passes "v2" instead of "crdb-v2") but this is unnecessary and seems undesirable, so this is now removed.


pkg/util/log/log_decoder.go, line 57 at r45 (raw file):

Previously, knz (kena) wrote…

This comment is not reflecting the conditional immediately below it.

Updated.


pkg/util/log/log_decoder.go, line 79 at r45 (raw file):

Previously, knz (kena) wrote…

include the value specified by the user inside the error message

Done.


pkg/util/log/log_decoder.go, line 97 at r45 (raw file):

Previously, knz (kena) wrote…

for next time: when you link to an issue from the code, ensure the issue has the label X-anchored-telemetry.

Will do 👍

@cameronnunez cameronnunez force-pushed the v2-parser branch 2 times, most recently from 65462fd to 545b1b8 Compare July 12, 2021 22:52
@cameronnunez cameronnunez requested a review from knz July 13, 2021 01:16
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but CI is telling you something remains to be fixed

Reviewed 5 of 8 files at r46, 5 of 5 files at r47.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @cameronnunez)


pkg/sql/pgwire/auth_test.go, line 319 at r47 (raw file):

								if entry.StructuredEnd == 0 {
									// TODO(knz): Enhance this when the log file
									// contains proper markers for structured entries.

You can remove this comment now


pkg/util/log/formats.go, line 22 at r47 (raw file):

}

var formats = map[string]string{

I believe a better name for this variable is formatParsers (it's a mapping so the name should describe both the keys and the values in the map)

@cameronnunez cameronnunez force-pushed the v2-parser branch 3 times, most recently from ba12cc8 to dc96ad9 Compare July 13, 2021 19:59
@cameronnunez
Copy link
Contributor Author

thanks you guys! @ajwerner @knz

@cameronnunez
Copy link
Contributor Author

bors r=knz,ajwerner

@cameronnunez
Copy link
Contributor Author

bors r-

@craig
Copy link
Contributor

craig bot commented Jul 13, 2021

Canceled.

@cameronnunez cameronnunez force-pushed the v2-parser branch 2 times, most recently from 42d83e5 to 7c7f1d8 Compare July 14, 2021 15:20
@knz
Copy link
Contributor

knz commented Jul 14, 2021

as discussed offline: you can re-add the Co-authored-by: line in the commit message and merge this.

The remaining CI failure(s) if any are flakes in tests unrelated to your PR. We need to file issues when those occurs, to fix by other teams, but they should not block you from merging this.

The crdb-v2 log file format does not currently have an entry parser.
This patch creates the new parser alongside the v1 parser.

Release note (cli change): Previously, the crdb-v2 log file format
lacked a parser. This has now changed.

Co-authored-by: Andrew Werner <[email protected]>
@cameronnunez
Copy link
Contributor Author

bors r=knz,ajwerner

@craig
Copy link
Contributor

craig bot commented Jul 14, 2021

Build failed (retrying...):

@cameronnunez
Copy link
Contributor Author

bors r=knz,ajwerner

@craig
Copy link
Contributor

craig bot commented Jul 14, 2021

Already running a review

@craig
Copy link
Contributor

craig bot commented Jul 14, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-logging In and around the logging infrastructure. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

log: implement a crdb-v2 log entry parser and use it in debug merge-logs
4 participants