-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql,cli: add redacted sql stmts to debug zip #92263
Conversation
ccd216c
to
7a3d0d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you checked performance after using the new builtin? (vs adding a new column redacted and the debuzip deciding which one to use)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @xinhaoz)
pkg/sql/sem/builtins/builtins.go
line 7357 at r1 (raw file):
}, ), "crdb_internal.redact_sql_constants": makeBuiltin(tree.FunctionProperties{
can you add tests to the new function?
7a3d0d5
to
a6cd9ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't benchmark this but I think this approach is preferred to extra columns because if we were to create a new redacted column for each of these tables, we'd be redacting every time we query when we don't need to and also showing a redundant extra column quite often in the select all. Would parsing and redacting the stmts during debug zip creation add that much overhead?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @maryliag)
pkg/sql/sem/builtins/builtins.go
line 7357 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
can you add tests to the new function?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would parsing and redacting the stmts during debug zip creation add that much overhead
that's my question. I don't know, so I want to be safe that when you're getting a debug zip that already adds some overhead, you wouldn't be adding even more
we'd be redacting every time we query when we don't need to
from the issue, I had the impression we were already redacting and was a matter of using that value on the columns, instead of the non-redacted value, so my assumption is that it wouldn't be adding extra overhead
I'm okay with either approach, but want to make sure there is no performance degradation on either approach
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier)
a6cd9ea
to
6a421b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2, 1 of 2 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @xinhaoz)
pkg/sql/logictest/testdata/logic_test/builtin_function
line 3686 at r4 (raw file):
subtest crdb_internal.redact_sql_constants query T
can you add a few more complex queries, with IN (...)
or something with strings (to confirm is keeping the '
for example)
6a421b6
to
40a319b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @maryliag)
pkg/sql/logictest/testdata/logic_test/builtin_function
line 3686 at r4 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
can you add a few more complex queries, with
IN (...)
or something with strings (to confirm is keeping the'
for example)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
Apologies for the review delay - this looks great! TSE is going to really appreciate this - thank you @xinhaoz!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @maryliag and @xinhaoz)
pkg/cli/zip_table_registry.go
line 670 at r6 (raw file):
"crdb_internal.node_sessions": { // `active_queries` and `last_active_query` columns contain unredacted // SQL statement strings.
nit: here and above, let's remove these comments for columns that are no longer omitted.
Code quote:
// `active_queries` and `last_active_query` columns contain unredacted
// SQL statement strings.
// `client_address` contains unredacted client IP addresses.
40a319b
to
220a245
Compare
220a245
to
3f825ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @maryliag)
pkg/cli/zip_table_registry.go
line 670 at r6 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: here and above, let's remove these comments for columns that are no longer omitted.
Done.
3f825ed
to
152e5b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xinhaoz @maryliag
I have a quick question about the use of the word "redact" in this API. The redaction that's applied here is not the same as our log redaction which uses the special bracket markers etc. Just wondering if that might be confusing. Is there another word we can use? It kinda turns the statement into a fingerprint, right? Is there a reason to use different terminology here? Maybe "anonymize" is better. Not sure. I don't feel super strongly here but just want to get thoughts from folks.
Reviewed 1 of 4 files at r5, 1 of 3 files at r7, 2 of 2 files at r9.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @xinhaoz)
@dhartunian I see, if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the anonymize
option
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian)
152e5b1
to
a4e8b2c
Compare
@maryliag Sounds good, rename has been pushed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz)
TFTR, all! |
Build failed (retrying...): |
Build failed (retrying...): |
bors r- |
Canceled. |
a4e8b2c
to
ed7c286
Compare
This commit adds the builtin, `crdb_internal.anonymize_sql_constants` which takes in a sql string and returns it with constants hidden. This will be used to safely expose columns that are sql stmts in the redacted debug zip. Release note: None
Closes cockroachdb#88823 This commit adds the following fields to the redacted debug zip: crdb_internal.create_statements: - create_statement - create_nofks - alter_statements (each elem is redacted) crdb_internal.create_function_statements: - create_statement crdb_internal.{node,cluster}_distsql_flows: - stmt crdb_internal.{cluster,node}_sessions: - last_active - active_queries crdb_internal.{cluster,node}_queries: - query Release note (cli change): The following fields have been redacted and added to the redacted debug zip: crdb_internal.create_statements: - create_statement - create_nofks - alter_statements (each elem is redacted) crdb_internal.create_function_statements: - create_statement crdb_internal.{node,cluster}_distsql_flows: - stmt crdb_internal.{cluster,node}_sessions: - last_active - active_queries crdb_internal.{cluster,node}_queries: - query
ed7c286
to
1224138
Compare
bors r+ |
Build succeeded: |
@abarganier do you want this backported to 22.2? |
Commit 1
This commit adds the builtin,
crdb_internal.anonymize_sql_constants
which takes in a sql string and returns it with constants redacted.
This will be used to redact columns that are sql stmts in the
redacted debug zip.
Release note: None
Commit 2
Closes #88823
This commit adds the following fields to the redacted
debug zip:
crdb_internal.create_statements:
crdb_internal.create_function_statements:
crdb_internal.{node,cluster}_distsql_flows:
crdb_internal.{cluster,node}_sessions:
crdb_internal.{cluster,node}_queries:
Release note (cli change):
The following fields have been redacted and added to
the redacted debug zip:
crdb_internal.create_statements:
crdb_internal.create_function_statements:
crdb_internal.{node,cluster}_distsql_flows:
crdb_internal.{cluster,node}_sessions:
crdb_internal.{cluster,node}_queries:
Running ycsb, tpcc and movr default workloads for 15 minutes and requesting the debug zip on a fresh node in master vs with new changes:
master
branch