Redact utility statements by default #588

seanlinsley · 2024-08-21T16:02:20Z

Utility statements can include database secrets, so this PR classifies them under filter_log_secret = credential (enabled by default as of #556), which redacts them from the logs.

Depends on pganalyze/pg_query_go#116

msakrejda · 2024-08-27T19:12:08Z

logs/analyze.go

+		if m.Kind == state.StatementTextLogSecret {
+			query := line.Content[m.ByteStart:m.ByteEnd]
+			normalized, err := pg_query.NormalizeUtility(query)
+			if err == nil && len(query) != len(normalized) {


Do we want to ignore errors here?

Separately, are we using the length here as a proxy for whether normalization did anything? I guess that's probably safe—if someone has ALTER USER abc WITH PASSWORD '12', they have bigger problems—but a string comparison seems more readable here (which I imagine would optimize based on length, too, no?). Or at least a comment.

I assumed an error would only happen if it's a syntax/parsing error with the query, which we already filter from the logs as of #556.

I chose to compare string length because this is in a relatively hot path in the collector, and I'd guess that checking the string length is faster. But I didn't benchmark to see if there's a performance difference so 🤷 we could go with query != normalized until there's data to suggest optimization is needed here.

Yeah, according to https://github.com/northbright/Notes/blob/master/Golang/string/golang-string-compare-internals.md, Go does do a length check as part of string comparison, so I think it'd be better to leave this as a simple comparison unless this suggests a bottleneck.
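To make the difference between the two checks concrete, here is a minimal sketch. The helper names `lengthChanged` and `textChanged` are hypothetical, and the query/normalized pair is a hand-written stand-in chosen so that both strings have the same length; it is not claimed to be real `pg_query.NormalizeUtility` output.

```go
package main

import "fmt"

// lengthChanged mirrors the original heuristic: treat the statement as
// normalized only if its length differs from the input.
func lengthChanged(query, normalized string) bool {
	return len(query) != len(normalized)
}

// textChanged is the plain comparison the review settled on. A difference in
// length is already enough for two strings to compare unequal, so this never
// misses a case that lengthChanged catches.
func textChanged(query, normalized string) bool {
	return query != normalized
}

func main() {
	// Hypothetical pair (an assumption for illustration): the two-character
	// literal '' is replaced by the two-character placeholder $1.
	query := "ALTER USER abc WITH PASSWORD ''"
	normalized := "ALTER USER abc WITH PASSWORD $1"

	fmt.Println(lengthChanged(query, normalized)) // false: same length
	fmt.Println(textChanged(query, normalized))   // true: contents differ
}
```

The equality check is a superset of the length check, which is why it is the safer default unless profiling shows it to be a bottleneck.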

If we don't expect errors here, maybe we should panic on them? That's probably overkill, but given that users may rely on this to avoid sending sensitive data to us, I don't think we should fail "open".

Okay, I've changed the condition to check string equality.

But marking any normalization errors with the credential log secret is causing a number of test failures in analyze_test.go. For example, this test fails with the error: syntax error at or near "Query".

Diff for anyone who wants to try this locally:

-			if err == nil && query != normalized {
+			if err != nil {
+				fmt.Fprintf(os.Stderr, "======= %s\n%s", err, query)
+			}
+			if err != nil || query != normalized {

Possibly related, but I was going to ask why you're not using a logger. And then I realized there's no access to a logger here, and no way to return an error. I don't love this API design (the existing code, not your changes).

In this case, though, if we don't expect there to be an error anyway, I think we should just panic. The worst case there is that we lose monitoring availability, which I think is less bad than potentially leaking credentials if this code starts dealing with new cases we had not considered.

That would cause our test suite to panic, though? Unless our tests are wrong, it seems we have no choice but to ignore errors at this point.

Oh, I see. It's not that we'll never see an error because the syntax error happens elsewhere. An error still happens here, but that error is basically something we handle elsewhere, so we can ignore it here. I got the test suite to pass with

diff --git a/logs/analyze.go b/logs/analyze.go
index 238a413e..0aef7503 100644
--- a/logs/analyze.go
+++ b/logs/analyze.go
@@ -2280,7 +2280,13 @@ func markUtilitySecret(line *state.LogLine) {
 		if m.Kind == state.StatementTextLogSecret {
 			query := line.Content[m.ByteStart:m.ByteEnd]
 			normalized, err := pg_query.NormalizeUtility(query)
-			if err == nil && query != normalized {
+			if err != nil {
+				if strings.HasPrefix(err.Error(), "syntax error") || strings.HasPrefix(err.Error(), "unterminated quoted identifier") {
+					continue
+				}
+				panic(fmt.Errorf("Could not normalize utility statement: %s", err))
+			}
+			if query != normalized {
 				line.SecretMarkers = append(line.SecretMarkers, state.LogSecretMarker{
 					ByteStart: m.ByteStart,
 					ByteEnd:   m.ByteEnd,

but the fact that I had to check for two separate error strings here is not a good sign for the robustness of this approach. We would probably cause panics for legitimate workloads, which is not acceptable.

Given that, I'm out of ideas. Since this patch is tightening redaction, I think the approach here is a worthwhile step in spite of this. We should revisit this, though.

seanlinsley · 2024-08-28T18:10:40Z

Note that pganalyze/pg_query_go#116 and pganalyze/libpg_query#255 also need review.

Redact utility statements by default

1fdd243

seanlinsley requested a review from a team August 21, 2024 16:02

msakrejda reviewed Aug 27, 2024


msakrejda mentioned this pull request Aug 27, 2024

Release 0.58.0 #590

Merged

check string equality

6ba6411

msakrejda approved these changes Aug 28, 2024


Update pg_query_go to main branch

796cf1c

seanlinsley merged commit 6ea5a56 into main Aug 29, 2024
3 checks passed

seanlinsley deleted the normalize_utility branch August 29, 2024 18:49

msakrejda added a commit that referenced this pull request Aug 29, 2024

Add note about #588

0170130
