Evaluate requirement level of db.statement #754

jack-berg · 2024-02-16T21:48:48Z

The db.statement attribute currently has a requirement level of recommended, with a footnote that reads:

Should be collected by default only if there is sanitization that excludes sensitive information.

This requires that we elaborate on the definition of sanitization (and maybe define a standard algorithm for sanitization). Generally, this increases instrumentation complexity.
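For illustration only, here is a minimal sketch of what a sanitization rule could look like. It assumes SQL with single-quoted string literals (a doubled quote `''` is an escape) and plain decimal numbers. The function name `sanitize_sql` and its regex rules are hypothetical and are not a standard algorithm from the semantic conventions. A production sanitizer would likely need a real tokenizer to handle comments, dialect-specific quoting, and similar cases.

```python
import re

# Hypothetical rules, not a standardized algorithm:
# a single-quoted string literal, where '' is an escaped quote inside it.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# A standalone integer or decimal number. Digits inside identifiers such as
# "t1" are not matched because \b requires a word boundary before the digit.
_NUMERIC_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def sanitize_sql(statement: str) -> str:
    """Replace string and numeric literals with the placeholder '?'."""
    # Mask strings first so digits inside them are not handled separately.
    masked = _STRING_LITERAL.sub("?", statement)
    return _NUMERIC_LITERAL.sub("?", masked)


if __name__ == "__main__":
    # prints: SELECT * FROM users WHERE name = ? AND age > ?
    print(sanitize_sql("SELECT * FROM users WHERE name = 'bob' AND age > 30"))
```

Even this small sketch shows why sanitization adds complexity to every instrumentation: each rule has edge cases that must be implemented consistently across languages.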

In the 2/16/2024 db semconv working group, we discussed a number of possible options:

(1) db.statement is conditionally required

Conditional on:
- Instrumentation library performing sanitization prior to capture OR
- Instrumentation library knows that the statement is a template (i.e. prepared statement in SQL-speak) and unlikely to contain sensitive information
If the condition is false, it's opt-in.
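The option 1 rule can be read as a small decision function. This sketch assumes the instrumentation knows two facts about each statement: whether it sanitized the text and whether the text is a template. The names `RequirementLevel`, `db_statement_requirement`, and `capture_db_statement` are hypothetical.

```python
from enum import Enum


class RequirementLevel(Enum):
    CONDITIONALLY_REQUIRED = "conditionally_required"
    OPT_IN = "opt_in"


def db_statement_requirement(sanitized: bool, is_template: bool) -> RequirementLevel:
    """Option 1: conditionally required if sanitized or a template, else opt-in."""
    if sanitized or is_template:
        return RequirementLevel.CONDITIONALLY_REQUIRED
    return RequirementLevel.OPT_IN


def capture_db_statement(sanitized: bool, is_template: bool, user_opted_in: bool) -> bool:
    """Capture when the condition holds, or when the user explicitly opted in."""
    level = db_statement_requirement(sanitized, is_template)
    return level is RequirementLevel.CONDITIONALLY_REQUIRED or user_opted_in
```

Note that the `sanitized` input is exactly the part that requires every instrumentation to define and implement sanitization.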

(2) Capture db.statement OR db.statement_template

db.statement_template
- Captured if instrumentation library knows the statement is a template (i.e. prepared statement in SQL-speak)
- Conditionally required if template is available
db.statement
- Captures the raw or resolved statement, assuming that a template is not used
- Opt-in, OR conditionally required if instrumentation library performs sanitization and template is not used
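This sketch shows how an instrumentation might choose between the two attributes under option 2. It assumes the instrumentation can tell whether a template is available and whether it sanitized the statement itself. `statement_attributes` is a hypothetical helper, and the returned dict stands in for span attributes.

```python
from typing import Dict, Optional


def statement_attributes(
    template: Optional[str],
    statement: str,
    sanitized: bool,
    user_opted_in: bool,
) -> Dict[str, str]:
    """Option 2: prefer db.statement_template; fall back to db.statement."""
    if template is not None:
        # A template is available, so db.statement_template is conditionally
        # required and db.statement is not used.
        return {"db.statement_template": template}
    if sanitized or user_opted_in:
        # No template: db.statement is conditionally required when the
        # instrumentation sanitized it, and opt-in otherwise.
        return {"db.statement": statement}
    return {}
```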

(3) Capture db.statement AND (bool) db.is_parameterized_statement

db.is_parameterized_statement is required and reflects whether the statement is a template / parameterized
db.statement
- Conditionally required if db.is_parameterized_statement = true
- Opt-in if db.is_parameterized_statement = false
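Under option 3, the flag is always emitted, and only the presence of `db.statement` varies. Here is a minimal sketch in which `option3_attributes` is a hypothetical helper and a plain dict stands in for span attributes:

```python
from typing import Dict, Union

AttrValue = Union[str, bool]


def option3_attributes(statement: str, is_parameterized: bool, opt_in: bool) -> Dict[str, AttrValue]:
    """Option 3: always emit the flag; emit db.statement by default only for
    parameterized statements, otherwise only when the user opted in."""
    attrs: Dict[str, AttrValue] = {"db.is_parameterized_statement": is_parameterized}
    if is_parameterized or opt_in:
        attrs["db.statement"] = statement
    return attrs
```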

The options with a single statement attribute (1 & 3) make it easier to build dashboards because the content is always in the same place, but they make the requirement level rules harder to express. Option 2 has simpler rules for requirement levels, but dashboards have to look in two places for the information.
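To make the dashboard trade-off concrete: with a single statement attribute (options 1 and 3), a query reads one key, while with option 2 it has to coalesce two. This sketch operates over a plain attribute mapping, and the function names are hypothetical.

```python
from typing import Mapping, Optional


def dashboard_statement_single(attrs: Mapping[str, object]) -> Optional[str]:
    """Options 1 and 3: the text is always in db.statement (if captured)."""
    value = attrs.get("db.statement")
    return value if isinstance(value, str) else None


def dashboard_statement_option2(attrs: Mapping[str, object]) -> Optional[str]:
    """Option 2: coalesce db.statement_template and db.statement."""
    for key in ("db.statement_template", "db.statement"):
        value = attrs.get(key)
        if isinstance(value, str):
            return value
    return None
```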

jack-berg · 2024-02-21T16:34:43Z

I'm in favor of option 3:

A consumer which is worried about potentially sensitive data can follow very simple rules to mitigate risk: If db.is_parameterized_statement=true, then db.statement is safe. If db.is_parameterized_statement=false, then the consumer knows that if db.statement is present, the user opted into it. They can accept the risk knowing that it was opted into, or reject if some level of uncertainty is unacceptable, or scrub the statement.
Because there is no condition based on whether the statement has been sanitized, instrumentations don't need to be burdened with sanitization. We can define a db sanitization processor in the collector instead of in every language. This reduces implementation complexity.
Because db.statement is included by default if the statement is sanitized, many users will have a good experience out of the box: many users use parameterized statements, and db.statement is important to many of them.
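The consumer-side rules from the first point can be sketched as a filter over span attributes. The `Policy` names and `apply_consumer_policy` are hypothetical. The `scrub` callable stands in for whatever sanitizer the consumer chooses. A missing flag is treated conservatively, as if the statement were not parameterized.

```python
from enum import Enum
from typing import Callable, Dict


class Policy(Enum):
    ACCEPT = "accept"  # accept the risk of opted-in statements
    REJECT = "reject"  # drop any non-parameterized statement
    SCRUB = "scrub"    # sanitize non-parameterized statements


def apply_consumer_policy(
    attrs: Dict[str, object],
    policy: Policy,
    scrub: Callable[[str], str],
) -> Dict[str, object]:
    """Return a copy of the span attributes filtered by the option 3 rules."""
    result = dict(attrs)
    statement = result.get("db.statement")
    if not isinstance(statement, str):
        return result  # nothing captured, nothing to do
    if result.get("db.is_parameterized_statement") is True:
        return result  # parameterized: considered safe
    # Not parameterized, yet db.statement is present: the user opted in.
    if policy is Policy.REJECT:
        del result["db.statement"]
    elif policy is Policy.SCRUB:
        result["db.statement"] = scrub(statement)
    return result
```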

trask · 2024-02-21T17:06:53Z

it would be nice to give instrumentation the option to sanitize if they want, e.g. Java instrumentation already implements sanitization, which is pretty nice for people not using the collector

how could we incorporate this into option 3? e.g. allow instrumentation to sanitize and emit is_parameterized_statement=true even though it wasn't a true "parameterized" statement

similar thought for sanitization that occurs in the collector: should it emit is_parameterized_statement=true after sanitization so that downstream collectors/ingestion are aware that the statement has already been sanitized?

jack-berg · 2024-02-21T17:16:39Z

Maybe is_parameterized_statement is actually is_sanitized_statement, and is true if the statement is either parameterized or has been sanitized by instrumentation.
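This sketch shows how the proposed flag could be set in two places. Instrumentation that sanitizes on its own (as the Java instrumentation already does) can set it, and so can a later pipeline stage such as a collector processor, which lets downstream stages skip work already done. The attribute name `db.is_sanitized_statement` follows the suggestion above but is not an existing convention. The function names and the `sanitize` callable are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical attribute name taken from the proposal above.
SANITIZED_FLAG = "db.is_sanitized_statement"


def instrumentation_attributes(
    statement: str,
    is_parameterized: bool,
    sanitize: Optional[Callable[[str], str]] = None,
    user_opted_in: bool = False,
) -> Dict[str, object]:
    """Instrumentation side: the flag is true for parameterized statements and
    for statements the instrumentation sanitized itself."""
    if is_parameterized:
        return {SANITIZED_FLAG: True, "db.statement": statement}
    if sanitize is not None:
        return {SANITIZED_FLAG: True, "db.statement": sanitize(statement)}
    attrs: Dict[str, object] = {SANITIZED_FLAG: False}
    if user_opted_in:
        # Raw statement, present only because the user opted in.
        attrs["db.statement"] = statement
    return attrs


def downstream_sanitize(
    attrs: Dict[str, object], sanitize: Callable[[str], str]
) -> Dict[str, object]:
    """Pipeline side: sanitize an unsanitized statement once and set the flag
    so that later stages know the work has been done."""
    result = dict(attrs)
    if result.get(SANITIZED_FLAG) is True:
        return result
    statement = result.get("db.statement")
    if isinstance(statement, str):
        result["db.statement"] = sanitize(statement)
        result[SANITIZED_FLAG] = True
    return result
```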

cheempz · 2024-02-23T22:13:54Z

it would be nice to give instrumentation the option to sanitize if they want...people not using the collector

Yes, this is our use case, so +1 on the option.

trask · 2024-04-24T19:29:51Z

I like option 3 as well, though I think we should hold off on implementing it in case a more generalized solution comes out of #128, and also because I think we can add this post-stability

jack-berg added this to Database Client Semantic Conventions Feb 16, 2024

github-actions bot assigned AlexanderWert Feb 16, 2024

trask moved this to Post Stability in Database Client Semantic Conventions Apr 24, 2024
