sql,cli: add redacted sql stmts to debug zip #92263

Merged
2 commits merged into cockroachdb:master from redact-sql-debug-zip on Dec 2, 2022

xinhaoz
Member

@xinhaoz xinhaoz commented Nov 21, 2022

Commit 1

This commit adds the builtin, crdb_internal.anonymize_sql_constants
which takes in a sql string and returns it with constants redacted.
This will be used to redact columns that are sql stmts in the
redacted debug zip.
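
To make the intent concrete, here is a minimal sketch of the idea behind the builtin. It is a toy, not the implementation in this PR: `hideConstants`, `isDigit`, and `isIdentChar` are hypothetical names, the `_` / `'_'` placeholder format is an assumption, and the code scans characters rather than working on a parsed statement, so it will mangle things like digits inside quoted identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

func isIdentChar(c byte) bool {
	return c == '_' || isDigit(c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// hideConstants is a toy stand-in (hypothetical): it replaces numeric
// literals with _ and single-quoted string literals with '_'.
// It does not parse SQL, so treat it as an illustration only.
func hideConstants(sql string) string {
	var out strings.Builder
	i := 0
	for i < len(sql) {
		c := sql[i]
		switch {
		case c == '\'':
			// Skip to the closing quote, treating '' as an escaped quote.
			i++
			for i < len(sql) {
				if sql[i] == '\'' {
					if i+1 < len(sql) && sql[i+1] == '\'' {
						i += 2
						continue
					}
					break
				}
				i++
			}
			i++ // step past the closing quote
			out.WriteString("'_'")
		case isDigit(c) && (i == 0 || !isIdentChar(sql[i-1])):
			// A digit that does not continue an identifier starts a number.
			for i < len(sql) && (isDigit(sql[i]) || sql[i] == '.') {
				i++
			}
			out.WriteByte('_')
		default:
			out.WriteByte(c)
			i++
		}
	}
	return out.String()
}

func main() {
	fmt.Println(hideConstants("SELECT * FROM t1 WHERE a = 42 AND b IN ('x', 'it''s')"))
}
```

The statement shape (tables, columns, operators) survives while literal values do not, which is what makes the result safe to include in a redacted debug zip.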

Release note: None

Commit 2

Closes #88823

This commit adds the following fields to the redacted
debug zip:

crdb_internal.create_statements:

  • create_statement
  • create_nofks
  • alter_statements (each elem is redacted)

crdb_internal.create_function_statements:

  • create_statement

crdb_internal.{node,cluster}_distsql_flows:

  • stmt

crdb_internal.{cluster,node}_sessions:

  • last_active
  • active_queries

crdb_internal.{cluster,node}_queries:

  • query

Release note (cli change):
The following fields have been redacted and added to
the redacted debug zip:
crdb_internal.create_statements:

  • create_statement
  • create_nofks
  • alter_statements (each elem is redacted)

crdb_internal.create_function_statements:

  • create_statement

crdb_internal.{node,cluster}_distsql_flows:

  • stmt

crdb_internal.{cluster,node}_sessions:

  • last_active
  • active_queries

crdb_internal.{cluster,node}_queries:

  • query
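
As a rough sketch of how the columns listed above could be pulled into the redacted zip, the query for each table can wrap its SQL-text columns in the new builtin. Everything here is an assumption for illustration: `tableConfig`, `plainCols`, `sqlCols`, and `redactedQuery` are hypothetical names and do not mirror the actual structure of `pkg/cli/zip_table_registry.go`, and the column list in `main` is only an example.

```go
package main

import (
	"fmt"
	"strings"
)

// tableConfig is a hypothetical description of one debug zip table:
// columns that can be copied as-is, and columns holding SQL text that
// must pass through the anonymizing builtin first.
type tableConfig struct {
	plainCols []string
	sqlCols   []string
}

// redactedQuery builds the SELECT used for the redacted debug zip,
// wrapping every SQL-text column in crdb_internal.anonymize_sql_constants.
func redactedQuery(table string, cfg tableConfig) string {
	exprs := make([]string, 0, len(cfg.plainCols)+len(cfg.sqlCols))
	exprs = append(exprs, cfg.plainCols...)
	for _, col := range cfg.sqlCols {
		exprs = append(exprs,
			fmt.Sprintf("crdb_internal.anonymize_sql_constants(%s) AS %s", col, col))
	}
	return fmt.Sprintf("SELECT %s FROM %s", strings.Join(exprs, ", "), table)
}

func main() {
	cfg := tableConfig{
		plainCols: []string{"query_id", "node_id"}, // illustrative column names
		sqlCols:   []string{"query"},
	}
	fmt.Println(redactedQuery("crdb_internal.node_queries", cfg))
}
```

Doing the work in the query means the anonymization cost is paid only when a redacted debug zip is requested, not on every read of the table.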

Running the ycsb, tpcc, and movr default workloads for 15 minutes and requesting the debug zip on a fresh node, on master vs. with the new changes:

master: [screenshot: Pasted Graphic]

branch: [screenshot: Pasted Graphic 2]

@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from ccd216c to 7a3d0d5 Compare November 21, 2022 17:02
@xinhaoz xinhaoz marked this pull request as ready for review November 21, 2022 17:04
@xinhaoz xinhaoz requested a review from a team as a code owner November 21, 2022 17:04
@xinhaoz xinhaoz requested review from a team November 21, 2022 17:04
@maryliag maryliag requested a review from abarganier November 21, 2022 17:11
Contributor

@maryliag maryliag left a comment

Have you checked performance after using the new builtin (vs. adding a new redacted column and having the debug zip decide which one to use)?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @xinhaoz)


pkg/sql/sem/builtins/builtins.go line 7357 at r1 (raw file):

		},
	),
	"crdb_internal.redact_sql_constants": makeBuiltin(tree.FunctionProperties{

can you add tests to the new function?

@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from 7a3d0d5 to a6cd9ea Compare November 21, 2022 19:04
Member Author

@xinhaoz xinhaoz left a comment

I didn't benchmark this, but I think this approach is preferable to extra columns: if we created a new redacted column for each of these tables, we'd be redacting on every query even when we don't need to, and we'd often be showing a redundant extra column in a select all. Would parsing and redacting the stmts during debug zip creation add that much overhead?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @maryliag)


pkg/sql/sem/builtins/builtins.go line 7357 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

can you add tests to the new function?

Done.

Contributor

@maryliag maryliag left a comment

> Would parsing and redacting the stmts during debug zip creation add that much overhead

That's my question. I don't know, so I want to be sure that requesting a debug zip, which already adds some overhead, doesn't add even more.

> we'd be redacting every time we query when we don't need to

From the issue, I had the impression we were already redacting and it was a matter of using that value in the columns instead of the non-redacted value, so my assumption was that it wouldn't add extra overhead.

I'm okay with either approach, but I want to make sure there is no performance degradation with either one.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier)

@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from a6cd9ea to 6a421b6 Compare November 21, 2022 19:45
@xinhaoz xinhaoz changed the title sql: add redacted sql stmts to debug zip sql,cli: add redacted sql stmts to debug zip Nov 21, 2022
@xinhaoz xinhaoz requested a review from a team November 22, 2022 18:37
Contributor

@maryliag maryliag left a comment

Reviewed 2 of 2 files at r1, 1 of 1 files at r2, 1 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @xinhaoz)


pkg/sql/logictest/testdata/logic_test/builtin_function line 3686 at r4 (raw file):


subtest crdb_internal.redact_sql_constants
query T

can you add a few more complex queries, with IN (...) or something with strings (to confirm it keeps the ', for example)?

@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from 6a421b6 to 40a319b Compare November 22, 2022 19:56
Member Author

@xinhaoz xinhaoz left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @maryliag)


pkg/sql/logictest/testdata/logic_test/builtin_function line 3686 at r4 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

can you add a few more complex queries, with IN (...) or something with strings (to confirm it keeps the ', for example)?

Done.

Contributor

@maryliag maryliag left a comment

:lgtm:

Reviewed 4 of 4 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier)

Contributor

@abarganier abarganier left a comment

🔥

Apologies for the review delay - this looks great! TSE is going to really appreciate this - thank you @xinhaoz!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @maryliag and @xinhaoz)


pkg/cli/zip_table_registry.go line 670 at r6 (raw file):

	"crdb_internal.node_sessions": {
		// `active_queries` and `last_active_query` columns contain unredacted
		// SQL statement strings.

nit: here and above, let's remove these comments for columns that are no longer omitted.

Code quote:

		// `active_queries` and `last_active_query` columns contain unredacted
		// SQL statement strings.
		// `client_address` contains unredacted client IP addresses.

@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from 40a319b to 220a245 Compare November 30, 2022 16:42
@xinhaoz xinhaoz requested a review from a team as a code owner November 30, 2022 16:42
@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from 220a245 to 3f825ed Compare November 30, 2022 17:58
@xinhaoz xinhaoz removed the request for review from a team November 30, 2022 17:58
Member Author

@xinhaoz xinhaoz left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @maryliag)


pkg/cli/zip_table_registry.go line 670 at r6 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: here and above, let's remove these comments for columns that are no longer omitted.

Done.

@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from 3f825ed to 152e5b1 Compare November 30, 2022 18:29
@xinhaoz xinhaoz requested a review from a team December 1, 2022 15:24
Collaborator

@dhartunian dhartunian left a comment

@xinhaoz @maryliag
I have a quick question about the use of the word "redact" in this API. The redaction that's applied here is not the same as our log redaction which uses the special bracket markers etc. Just wondering if that might be confusing. Is there another word we can use? It kinda turns the statement into a fingerprint, right? Is there a reason to use different terminology here? Maybe "anonymize" is better. Not sure. I don't feel super strongly here but just want to get thoughts from folks.
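
For readers unfamiliar with the distinction, here is a small sketch of marker-based log redaction, as a contrast to the constant hiding done by the builtin. It assumes the `‹…›` marker convention with `‹×›` as the redacted form; `markedSpan` and `redactMarked` are hypothetical names, and this is not the code path this PR touches.

```go
package main

import (
	"fmt"
	"regexp"
)

// markedSpan matches any text enclosed in the ‹ › redaction markers.
var markedSpan = regexp.MustCompile(`‹[^›]*›`)

// redactMarked replaces every marked span with the ‹×› placeholder,
// leaving unmarked text untouched.
func redactMarked(s string) string {
	return markedSpan.ReplaceAllString(s, "‹×›")
}

func main() {
	fmt.Println(redactMarked("user ‹alice› ran ‹SELECT 1›"))
}
```

Marker-based redaction drops the whole marked value, including the statement's structure, whereas the builtin keeps the statement shape and hides only the constants, which is closer to a fingerprint.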

Reviewed 1 of 4 files at r5, 1 of 3 files at r7, 2 of 2 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @xinhaoz)

@xinhaoz
Member Author

xinhaoz commented Dec 1, 2022

@dhartunian I see, if redact already has a specific meaning, I wouldn't want to tread on that with this. How about just 'hide_sql_constants'? That aligns with the fmt function being used, too.

Contributor

@maryliag maryliag left a comment

I like the anonymize option

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian)

Contributor

@e-mbrown e-mbrown left a comment

:lgtm:

Reviewed all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian)

@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from 152e5b1 to a4e8b2c Compare December 2, 2022 17:09
@xinhaoz
Member Author

xinhaoz commented Dec 2, 2022

@maryliag Sounds good, rename has been pushed

Contributor

@maryliag maryliag left a comment

Reviewed 3 of 4 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz)

Collaborator

@dhartunian dhartunian left a comment

:lgtm: thank you!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz)

@xinhaoz
Member Author

xinhaoz commented Dec 2, 2022

TFTR, all!
bors r+

@craig
Contributor

craig bot commented Dec 2, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Dec 2, 2022

Build failed (retrying...):

@xinhaoz
Member Author

xinhaoz commented Dec 2, 2022

bors r-

@craig
Contributor

craig bot commented Dec 2, 2022

Canceled.

@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from a4e8b2c to ed7c286 Compare December 2, 2022 19:24
This commit adds the builtin, `crdb_internal.anonymize_sql_constants`
which takes in a sql string and returns it with constants hidden.
This will be used to safely expose columns that are sql stmts in the
redacted debug zip.

Release note: None

Closes cockroachdb#88823

This commit adds the following fields to the redacted
debug zip:

crdb_internal.create_statements:
- create_statement
- create_nofks
- alter_statements (each elem is redacted)

crdb_internal.create_function_statements:
- create_statement

crdb_internal.{node,cluster}_distsql_flows:
- stmt

crdb_internal.{cluster,node}_sessions:
- last_active
- active_queries

crdb_internal.{cluster,node}_queries:
- query

Release note (cli change):
The following fields have been redacted and added to
the redacted debug zip:
crdb_internal.create_statements:
- create_statement
- create_nofks
- alter_statements (each elem is redacted)
crdb_internal.create_function_statements:
- create_statement
crdb_internal.{node,cluster}_distsql_flows:
- stmt
crdb_internal.{cluster,node}_sessions:
- last_active
- active_queries
crdb_internal.{cluster,node}_queries:
- query
@xinhaoz xinhaoz force-pushed the redact-sql-debug-zip branch from ed7c286 to 1224138 Compare December 2, 2022 19:49
@xinhaoz
Member Author

xinhaoz commented Dec 2, 2022

bors r+

@craig craig bot merged commit 865ad56 into cockroachdb:master Dec 2, 2022
@craig
Contributor

craig bot commented Dec 2, 2022

Build succeeded:

@maryliag
Contributor

maryliag commented Dec 6, 2022

@abarganier do you want this backported to 22.2?

@xinhaoz xinhaoz deleted the redact-sql-debug-zip branch December 6, 2022 23:32
Successfully merging this pull request may close these issues.

sql: anonymize query strings exposed in various crdb_internal tables.
6 participants