DB sanitization uniform format #717
Comments
FWIW Java implements the 3rd approach in all DB clients that we currently instrument (SQL and SQL-like, Redis, MongoDB). It is enabled by default and can be disabled by a configuration setting. We attempt to sanitize all variables, while preserving the database schema if possible. I think that the first two approaches do not provide any useful information (the first one simply contains |
If the data is sanitized, and we write some text like "SELECT ? FROM ?" in the attribute value, it could be hard for backends to correctly label this value and process it. For example, if processors try to extract the table name from the statement, they might parse this text and decide that the table name is a valid "?" instead of "missing"/"sanitized". If the value is "SELECT {query information is sanitized}", parsing it as SQL will probably fail, which could mean either invalid behavior of the app (requires attention) or a valid "magic text" that signals lack of data. Another example - I would love to see some deterministic rules that allow processors to differentiate between sanitized statements and actual statements. If that involves parsing or looking for magic text, it should be precisely specified. |
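The parsing concern in the comment above can be illustrated with a minimal Python sketch. The `extract_table` helper and its regex are hypothetical and do not come from any OpenTelemetry processor; they only show how a naive backend rule could treat the placeholder as a real table name.

```python
import re
from typing import Optional

# Hypothetical, deliberately naive rule: the table name is whatever follows FROM.
_FROM_TABLE = re.compile(r"\bFROM\s+(\S+)", re.IGNORECASE)


def extract_table(statement: str) -> Optional[str]:
    """Return the token after FROM, or None if the statement has no FROM clause."""
    match = _FROM_TABLE.search(statement)
    return match.group(1) if match else None


print(extract_table("SELECT name FROM users"))                   # users
print(extract_table("SELECT ? FROM ?"))                          # ?
print(extract_table("SELECT {query information is sanitized}"))  # None
```

With this rule, the question-mark format yields "?" as if it were a table name, while the magic-text format yields no table at all. Neither result tells the backend on its own that the statement was sanitized.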
@open-telemetry/technical-committee can you move this to https://github.com/open-telemetry/semantic-conventions? |
possible language for SQL queries:
|
should normalization be part of this or not? e.g.
or
|
I wouldn't say it is necessarily easy: how do you decide the "main" method of a statement that has multiple methods, such as update and select? Or statements that start with
For this one, I think it is more helpful for the user, but I agree it will be harder to implement and to keep consistent. For example, on CockroachDB the replacements were something as follows: When there was Another thing to consider with this option is the level of sanitization: it is more common for the values to be hidden but not the columns, so it would need options for sanitizing just the values or the values plus the columns. |
thoughts from SIG meeting:
|
from SIG meeting:
|
What are you trying to achieve?
According to the db spec, the db.statement value can be sanitized, but it is not defined how to do so. Currently, sanitization is handled differently in a few places.
I suggest adding a uniform format that describes how to do the sanitization.
(Ideally, this format would apply to all the different DBs and syntaxes.)
Examples of different implementations:
I suggest a few options to replace the value with:
1. Keep the method name, and add a sanitized text.
for example:
db.statement = "SELECT {query information is sanitized}"
advantages: quite easy to implement, easy to keep consistent across libraries.
disadvantages: requires research into whether all the different libraries can handle this format effectively.
2. Simple text that describes that the value is sanitized.
for example:
db.statement = "query information is sanitized"
advantages: easy to implement, easy to keep consistent across different libraries.
disadvantages: doesn't supply basic information about the query that could be useful.
3. Replace the values with question marks.
for example:
db.statement = "SELECT ? FROM ?"
advantages: keeps more information, while still not exposing sensitive or private data.
disadvantages: harder to implement, harder to keep consistent across libraries.
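To make the trade-offs concrete, here is a minimal Python sketch of the three options. The function names and the regex are assumptions made for illustration and are not taken from any existing instrumentation library. The third function replaces only string and numeric literals and keeps table and column names, which is one possible reading of "replace the values".

```python
import re

# Placeholder text copied from the examples in this issue.
SANITIZED_TEXT = "query information is sanitized"

# Assumption: single-quoted SQL strings (with '' escapes) and standalone
# numbers are the only literals; real SQL dialects have more literal forms.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def sanitize_keep_method(statement: str) -> str:
    """Keep the leading keyword and replace everything after it."""
    parts = statement.split(None, 1)
    method = parts[0].upper() if parts else ""
    return f"{method} {{{SANITIZED_TEXT}}}".strip()


def sanitize_fixed_text(statement: str) -> str:
    """Replace the whole statement with a fixed text."""
    return SANITIZED_TEXT


def sanitize_literals(statement: str) -> str:
    """Replace each literal with a question mark, keeping the query shape."""
    return _LITERAL.sub("?", statement)


query = "SELECT name FROM users WHERE id = 42 AND city = 'O''Brien'"
print(sanitize_keep_method(query))  # SELECT {query information is sanitized}
print(sanitize_fixed_text(query))   # query information is sanitized
print(sanitize_literals(query))     # SELECT name FROM users WHERE id = ? AND city = ?
```

The first two functions need no knowledge of the query language, which matches the "easy to implement" advantage. The third depends on a literal grammar that differs between databases, which is where the consistency concern comes from.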
I would like to hear opinions about the suggested solutions, or hear different ideas.
Additional context.
open-telemetry/opentelemetry-specification#3104 - Issue regarding changing the recommendation to sanitize the information by default.
#708 - Issue about missing examples for sanitization in specs.