-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli: adding column redaction to selected system and crdb_internal tables #121448
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @kousiknath)
-- commits
line 2 at r1:
Commit message should contain detailed description of change and a release note.
On CRDB PRs we link to the github issue, not the Jira ticket since this is a public repo. Use the text Part of: #87487
to link it to the issue. I assume there will be more work to be done for that ticket, otherwise use Fixes: #xxx
so that merging the PR closes the ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kousiknath)
pkg/cli/zip_table_registry.go
line 457 at r1 (raw file):
// `env` column can contain sensitive node environment variable values, // such as AWS_ACCESS_KEY. customQueryUnredacted: `SELECT
nit: If customQueryRedacted
is defined, and there's no special query required for the unredacted case, then we don't need to define customQueryUnredacted
.
This is because in the absence of customQueryUnredacted
, the code using this config will simply do a SELECT * FROM <table>
, which should have the same result.
Code quote:
customQueryUnredacted: `SELECT
pkg/cli/zip_table_registry.go
line 480 at r1 (raw file):
FROM crdb_internal.kv_node_status `, customQueryRedacted: `SELECT
nit: In the comments for this table, we note that the env
column is sensitive and redacted.
Since we're now adding a custom redacted query, can we include the env
column similar to how we do for the other redacted columns? e.g. <redacted> AS env
.
Code quote:
customQueryRedacted: `SELECT
pkg/cli/zip_table_registry.go
line 702 at r1 (raw file):
node_id, network, '<redacted>' as address,
nit: Small comments explaining why each field is redacted would be helpful. We can just note that this is motivated by redaction of hostnames.
Code quote:
'<redacted>' as address,
pkg/cli/zip_table_registry.go
line 840 at r1 (raw file):
}, "crdb_internal.node_runtime_info": { customQueryRedacted: `SELECT * FROM (
nit: the indentation looks off in this query. Query looks good, though!
pkg/cli/zip_table_registry.go
line 1333 at r1 (raw file):
"locality" FROM system.sql_instances `,
nit: ditto here, we don't need to define customQueryUnredacted
when we have customQueryRedacted
. This is partially my fault, I could have made this clearer in the comments on the struct.
Code quote:
customQueryUnredacted: `SELECT
"id",
"addr",
"session_id",
"locality"
FROM system.sql_instances
`,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kousiknath)
pkg/cli/zip_table_registry.go
line 480 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: In the comments for this table, we note that the
env
column is sensitive and redacted.Since we're now adding a custom redacted query, can we include the
env
column similar to how we do for the other redacted columns? e.g.<redacted> AS env
.
Done
pkg/cli/zip_table_registry.go
line 702 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: Small comments explaining why each field is redacted would be helpful. We can just note that this is motivated by redaction of hostnames.
Done
pkg/cli/zip_table_registry.go
line 840 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: the indentation looks off in this query. Query looks good, though!
Changed the indentation.
pkg/cli/zip_table_registry.go
line 1333 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: ditto here, we don't need to define
customQueryUnredacted
when we havecustomQueryRedacted
. This is partially my fault, I could have made this clearer in the comments on the struct.
I see there are some extra columns added to table - crdb_region
(BYTES) , binary_version
(STRING) which are omitted in this redacted query. Since we are not using custom unredacted query, those columns will be by default included in the unredacted result. So, shouldn't we include these columns in redacted query too ? @abarganier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking better!
To test these changes, you'll need to re-record the datadriven tests in pkg/cli/zip_test.go
and pkg/cli/zip_tenant_test.go
, and verify that the diffs made to the recorded test files (if any) look correct.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kousiknath)
pkg/cli/zip_table_registry.go
line 1333 at r1 (raw file):
Previously, kousiknath (Kousik Nath) wrote…
I see there are some extra columns added to table -
crdb_region
(BYTES) ,binary_version
(STRING) which are omitted in this redacted query. Since we are not using custom unredacted query, those columns will be by default included in the unredacted result. So, shouldn't we include these columns in redacted query too ? @abarganier
Ah, this is a result of the columns/queries becoming out of date with changes made to the system table schema.
In order to add these columns to the redacted query, we first need to determine if they need redaction or not. That work to audit the table is out of scope for this PR, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/cli/zip_table_registry.go
line 1333 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
Ah, this is a result of the columns/queries becoming out of date with changes made to the system table schema.
In order to add these columns to the redacted query, we first need to determine if they need redaction or not. That work to audit the table is out of scope for this PR, I think.
By the name of these columns, it does not look like the columns need any redaction. These columns will be included in the unredacted query, but not in the redacted query unless we mention here. So, the question is more about parity of the queries here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @kousiknath)
-- commits
line 5 at r3:
commit updates on CRDB are always squashed before merge.
-- commits
line 8 at r3:
This should be a separate PR. The description and message here is too generic and should contain more detail.
pkg/server/status.go
line 2354 at r3 (raw file):
} return resp
lots of changes in this file with no tests. please add them for each modified request.
One thing that might help is testutils.RunTrueAndFalse
which is a helper you can leverage to test with and without redaction.
pkg/server/serverpb/status.proto
line 95 at r3 (raw file):
string node_id = 1; reserved 2; bool redacted = 3;
Field should have docstring on every proto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much to add on top of what David already said. I agree we should split the API stuff into a separate PR to help keep conversations/scope of changes focused.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kousiknath)
pkg/cli/zip_table_registry.go
line 1333 at r1 (raw file):
Previously, kousiknath (Kousik Nath) wrote…
By the name of these columns, it does not look like the columns need any redaction. These columns will be included in the unredacted query, but not in the redacted query unless we mention here. So, the question is more about parity of the queries here.
I think that's fine for now - introducing new columns into the redacted data is out of scope for this PR in my opinion.
pkg/server/serverpb/status.proto
line 95 at r3 (raw file):
string node_id = 1; reserved 2; bool redacted = 3;
nit: can we change this and the others to redact
instead of redacted
?
Code quote:
bool redacted = 3;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian)
Previously, dhartunian (David Hartunian) wrote…
commit updates on CRDB are always squashed before merge.
This PR was supposed be a draft PR, not a final one, my bad I did not convert it to Draft earlier. It's still WIP. I just wanted to get some preliminary feedback so that we don't end up taking a complete detour just in case. I have converted it to Draft PR and will add test cases and other stuffs as necessary.
pkg/server/status.go
line 2354 at r3 (raw file):
Previously, dhartunian (David Hartunian) wrote…
lots of changes in this file with no tests. please add them for each modified request.
One thing that might help is
testutils.RunTrueAndFalse
which is a helper you can leverage to test with and without redaction.
This is a Draft PR, I am working on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian)
pkg/server/serverpb/status.proto
line 95 at r3 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Field should have docstring on every proto.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @kousiknath)
pkg/server/status.go
line 2345 at r3 (raw file):
} func (s *statusServer) redactRangesResponse(resp *serverpb.RangesResponse) *serverpb.RangesResponse {
I think something a bit more structured would help here. If you're working with protobufs anyway, add a custom annotation to the fields that require redaction like is done here for example:
cockroach/pkg/util/log/eventpb/telemetry.proto
Lines 46 to 57 in d0d20af
// The distribution of the DistSQL query plan (local, full, or partial). | |
string distribution = 6 [(gogoproto.jsontag) = ",omitempty", (gogoproto.moretags) = "redact:\"nonsensitive\""]; | |
// The query's plan gist bytes as a base64 encoded string. | |
string plan_gist = 7 [(gogoproto.jsontag) = ',omitempty', (gogoproto.moretags) = "redact:\"nonsensitive\""]; | |
// SessionID is the ID of the session that initiated the query. | |
string session_id = 8 [(gogoproto.customname) = "SessionID", (gogoproto.jsontag) = ",omitempty", (gogoproto.moretags) = "redact:\"nonsensitive\""]; | |
// Name of the database that initiated the query. | |
string database = 9 [(gogoproto.jsontag) = ",omitempty", (gogoproto.moretags) = "redact:\"nonsensitive\""]; |
Then you can iterate over the generic message and redact fields as necessary.
Alternatively, you could have a RedactableProto
interface that has a single method: Redact()
that redacts the fields. There could be a default implementation for each object that's a no-op.
e83cbb1
to
ab5c60d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding testing, I don't see any test files is changed after the changes. They are as they are. Is there any way to actually run this in local? or the better way is to merge to master and then check in stage?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
Previously, dhartunian (David Hartunian) wrote…
This should be a separate PR. The description and message here is too generic and should contain more detail.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abarganier @dhartunian I tested the changes in local and updated the result in the ticket: https://cockroachlabs.atlassian.net/browse/CRDB-19369?focusedCommentId=223182
Could you please approve the PR unless I am not missing anything here.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian and @kousiknath)
pkg/cli/zip_table_registry.go
line 696 at r4 (raw file):
started_at, is_live, ranges, leases
nit: newline for each column name
Code quote:
ranges, leases
DB dump tables contain un-redacted information which might contain crucial customer data like hostname / ip addresses etc. Some of these table data is served to debug zip through APIs. But many self hosted customers would not like to pass these information out of their network. This PR solves this issue and introduces un-redacted query support for few of such tables. Epic: https://cockroachlabs.atlassian.net/browse/CRDB-19369 Release note: None
bors r+ |
DB dump tables contain un-redacted information which might contain crucial customer
data like hostname / ip addresses etc. Some of these table data is served to debug zip
through APIs. But many self hosted customers would not like to pass these information
out of their network. This PR solves this issue and introduces un-redacted query
support for few of such tables.
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-19369
Release note: None