Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate requirement level of db.statement #754

Open
jack-berg opened this issue Feb 16, 2024 · 5 comments
Open

Evaluate requirement level of db.statement #754

jack-berg opened this issue Feb 16, 2024 · 5 comments
Assignees

Comments

@jack-berg
Copy link
Member

jack-berg commented Feb 16, 2024

The db.statement attribute currently has a requirement level of recommended, with a footnote that reads:

Should be collected by default only if there is sanitization that excludes sensitive information.

This requires that we elaborate on the definition of sanitization (and maybe define a standard algorithm for sanitization). Generally, this increases instrumentation complexity.

In the 2/16/2024 db semconv working group, we discussed a number of possible options:

(1) db.statement is conditionally required

  • Conditional on:
    • Instrumentation library performing sanitization prior to capture OR
    • Instrumentation library knows that the statement is template (i.e. prepared statement in SQL-speak) and unlikely to contain sensitive information
  • If condition is false, its opt-in

(2) Capture db.statement OR db.statement_template

  • db.statement_template
    • Captured if instrumentation library knows the statement is a template (i.e. prepared statement in SQL-speak)
    • Conditionally required if template is available
  • db.statement
    • Captures the raw or resolved statement, assuming that a template is not used
    • Opt-in, OR conditionally required if instrumentation library performs sanitization and template is not used

(3) Capture db.statement AND (bool) db.is_parameterized_statement

  • db.is_parameterized_statement is required and reflects whether the the statement is a template / parameterized
  • db.statement
    • Conditionally required if db.is_parameterized_statement = true
    • Opt in if db.is_parameterized_statement = false

The options with a single statement attribute (1 & 3) make it easier to dashboards because the content is always in the same place, but make it harder to express the requirement level rules. Option 2 has simpler rules for requirement levels, but dashboards have to look in two places for the information.

@jack-berg
Copy link
Member Author

I'm in favor of option 3:

  • A consumer which is worried about potentially sensitive data can follow very simple rules to mitigate risk: If db.is_parameterized_statement=true, then db.statement is safe. If db.is_parameterized_statement=false, then the consumer knows that if db.statement is present, the user opted into it. They can accept the risk knowing that it was opted into, or reject if some level of uncertainty is unacceptable, or scrub the statement.
  • Because there is no condition based on whether the statement is has been sanitized, instrumentations don't need to be burdened with sanitization. We can define a db sanitization processor in the collector instead of in all languages. This reduces implementation complexity.
  • Because db.statement is included by default if the statement is sanitized, many users will have a good experience out of the box, since many users will use parameterized statements and having db.statement will be important to many users.

@trask
Copy link
Member

trask commented Feb 21, 2024

it would be nice to give instrumentation the option to sanitize if they want, e.g. Java instrumentation already implements sanitization which is pretty nice for people not using the collector

how could we incorporate this into option 3? e.g. allow instrumentation to sanitize and emit is_parameterized_statement=true even though it wasn't a true "parameterized" statement

similar thought for sanitization that occurs on the collector, should it emit is_parameterized_statement=true after sanitization so that downstream collectors/ingestion are aware that the statement has already been sanitized?

@jack-berg
Copy link
Member Author

Maybe is_parameterized_statement is actually is_sanitized_statement, and is true if the statement is either parameterized or has been sanitized by instrumentation.

@cheempz
Copy link

cheempz commented Feb 23, 2024

it would be nice to give instrumentation the option to sanitize if they want...people not using the collector

Yes this is our use case so +1 on the option

@trask
Copy link
Member

trask commented Apr 24, 2024

I like option 3 as well, though I think we should hold off on implementing it in case a more generalized solution comes out of #128, and also because I think we can add this post-stability

@trask trask moved this to Post Stability in Database Client Semantic Conventions Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Post Stability
Development

No branches or pull requests

4 participants