-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redact utility statements by default #588
Conversation
logs/analyze.go
Outdated
if m.Kind == state.StatementTextLogSecret { | ||
query := line.Content[m.ByteStart:m.ByteEnd] | ||
normalized, err := pg_query.NormalizeUtility(query) | ||
if err == nil && len(query) != len(normalized) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to ignore errors here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately, are we using the length here as a proxy for whether normalization did anything? I guess that's probably safe—if someone has ALTER USER abc WITH PASSWORD '12'
, they have bigger problems—but a string comparison seems more readable here (which I imagine would optimize based on length, too, no?). Or at least a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed an error would only happen if it's a syntax/parsing error with the query, which we already filter from the logs as of #556.
I chose to compare string length because this is in a relatively hot path in the collector, and I'd guess that checking the string length is faster. But I didn't benchmark to see if there's a performance difference so 🤷 we could go with query != normalized
until there's data to suggest optimization is needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, according to https://github.com/northbright/Notes/blob/master/Golang/string/golang-string-compare-internals.md, Go does do a length check as part of string comparison, so I think it'd be better to leave this as a simple comparison unless this suggests a bottleneck.
If we don't expect errors here, maybe we should panic on them? That's probably overkill, but given that users may rely on this to avoid sending sensitive data to us, I don't think we should fail "open".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've changed the condition to check string equality.
But marking any normalization errors with the credential log secret is causing a number of test failures in analyze_test.go
. For example this test fails with the error syntax error at or near "Query"
.
Diff for anyone who wants to try this locally:
- if err == nil && query != normalized {
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "======= %s\n%s", err, query)
+ }
+ if err != nil || query != normalized {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly related, but I was going to ask why you're not using a logger. And then I realized there's no access to a logger here, and no way to return an error. I don't love this API design (the existing code, not your changes).
In this case, though, I think if we don't expect there to be an error anyway, I think we should just panic. The worst case there is we miss monitoring availability, which I think is less bad than potentially leaking credentials if this code starts dealing with new cases we had not considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would cause our test suite to panic, though? Unless our tests are wrong, it seems we have no choice but to ignore errors at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. It's not that we'll never see an error because the syntax error happens elsewhere. An error still happens here, but that error is basically something we handle elsewhere, so we can ignore it here. I got the test suite to pass with
diff --git a/logs/analyze.go b/logs/analyze.go
index 238a413e..0aef7503 100644
--- a/logs/analyze.go
+++ b/logs/analyze.go
@@ -2280,7 +2280,13 @@ func markUtilitySecret(line *state.LogLine) {
if m.Kind == state.StatementTextLogSecret {
query := line.Content[m.ByteStart:m.ByteEnd]
normalized, err := pg_query.NormalizeUtility(query)
- if err == nil && query != normalized {
+ if err != nil {
+ if strings.HasPrefix(err.Error(), "syntax error") || strings.HasPrefix(err.Error(), "unterminated quoted identifier") {
+ continue
+ }
+ panic(fmt.Errorf("Could not normalize utility statement: %s", err))
+ }
+ if query != normalized {
line.SecretMarkers = append(line.SecretMarkers, state.LogSecretMarker{
ByteStart: m.ByteStart,
ByteEnd: m.ByteEnd,
but the fact that I had to check for two separate error strings here is not a good sign for the robustness of this approach. We would probably cause panics for legitimate workloads, which is not acceptable.
Given that, I'm out of ideas. Since this patch is tightening redaction, I think the approach here is a worthwhile step in spite of this. We should revisit this, though.
Note that pganalyze/pg_query_go#116 and pganalyze/libpg_query#255 also need review. |
Utility statements can include database secrets, so this PR classifies utility statements under
filter_log_secret = credential
(which is enabled by default as of #556), so they're redacted from the logs.Depends on pganalyze/pg_query_go#116