Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redact utility statements by default #588

Merged
merged 3 commits into from
Aug 29, 2024
Merged

Redact utility statements by default #588

merged 3 commits into from
Aug 29, 2024

Conversation

seanlinsley
Copy link
Member

Utility statements can include database secrets, so this PR classifies utility statements under filter_log_secret = credential (which is enabled by default as of #556), so they're redacted from the logs.

Depends on pganalyze/pg_query_go#116

@seanlinsley seanlinsley requested a review from a team August 21, 2024 16:02
logs/analyze.go Outdated
if m.Kind == state.StatementTextLogSecret {
query := line.Content[m.ByteStart:m.ByteEnd]
normalized, err := pg_query.NormalizeUtility(query)
if err == nil && len(query) != len(normalized) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to ignore errors here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, are we using the length here as a proxy for whether normalization did anything? I guess that's probably safe—if someone has ALTER USER abc WITH PASSWORD '12', they have bigger problems—but a string comparison seems more readable here (which I imagine would optimize based on length, too, no?). Or at least a comment.

Copy link
Member Author

@seanlinsley seanlinsley Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed an error would only happen if it's a syntax/parsing error with the query, which we already filter from the logs as of #556.

I chose to compare string length because this is in a relatively hot path in the collector, and I'd guess that checking the string length is faster. But I didn't benchmark to see if there's a performance difference so 🤷 we could go with query != normalized until there's data to suggest optimization is needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, according to https://github.com/northbright/Notes/blob/master/Golang/string/golang-string-compare-internals.md, Go does do a length check as part of string comparison, so I think it'd be better to leave this as a simple comparison unless this suggests a bottleneck.

If we don't expect errors here, maybe we should panic on them? That's probably overkill, but given that users may rely on this to avoid sending sensitive data to us, I don't think we should fail "open".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've changed the condition to check string equality.

But marking any normalization errors with the credential log secret is causing a number of test failures in analyze_test.go. For example this test fails with the error syntax error at or near "Query".

Diff for anyone who wants to try this locally:

-                       if err == nil && query != normalized {
+                       if err != nil {
+                               fmt.Fprintf(os.Stderr, "======= %s\n%s", err, query)
+                       }
+                       if err != nil || query != normalized {

Copy link
Contributor

@msakrejda msakrejda Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly related, but I was going to ask why you're not using a logger. And then I realized there's no access to a logger here, and no way to return an error. I don't love this API design (the existing code, not your changes).

In this case, though, I think if we don't expect there to be an error anyway, I think we should just panic. The worst case there is we miss monitoring availability, which I think is less bad than potentially leaking credentials if this code starts dealing with new cases we had not considered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would cause our test suite to panic, though? Unless our tests are wrong, it seems we have no choice but to ignore errors at this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. It's not that we'll never see an error because the syntax error happens elsewhere. An error still happens here, but that error is basically something we handle elsewhere, so we can ignore it here. I got the test suite to pass with

diff --git a/logs/analyze.go b/logs/analyze.go
index 238a413e..0aef7503 100644
--- a/logs/analyze.go
+++ b/logs/analyze.go
@@ -2280,7 +2280,13 @@ func markUtilitySecret(line *state.LogLine) {
                if m.Kind == state.StatementTextLogSecret {
                        query := line.Content[m.ByteStart:m.ByteEnd]
                        normalized, err := pg_query.NormalizeUtility(query)
-                       if err == nil && query != normalized {
+                       if err != nil {
+                               if strings.HasPrefix(err.Error(), "syntax error") || strings.HasPrefix(err.Error(), "unterminated quoted identifier") {
+                                       continue
+                               }
+                               panic(fmt.Errorf("Could not normalize utility statement: %s", err))
+                       }
+                       if query != normalized {
                                line.SecretMarkers = append(line.SecretMarkers, state.LogSecretMarker{
                                        ByteStart: m.ByteStart,
                                        ByteEnd:   m.ByteEnd,

but the fact that I had to check for two separate error strings here is not a good sign for the robustness of this approach. We would probably cause panics for legitimate workloads, which is not acceptable.

Given that, I'm out of ideas. Since this patch is tightening redaction, I think the approach here is a worthwhile step in spite of this. We should revisit this, though.

@msakrejda msakrejda mentioned this pull request Aug 27, 2024
@seanlinsley
Copy link
Member Author

Note that pganalyze/pg_query_go#116 and pganalyze/libpg_query#255 also need review.

@seanlinsley seanlinsley merged commit 6ea5a56 into main Aug 29, 2024
3 checks passed
@seanlinsley seanlinsley deleted the normalize_utility branch August 29, 2024 18:49
msakrejda added a commit that referenced this pull request Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants