-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sqlalchemy db.statement sanitization flag #1701
base: main
Are you sure you want to change the base?
Add sqlalchemy db.statement sanitization flag #1701
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! Thank you for adding this
@@ -27,6 +27,16 @@ | |||
from opentelemetry.trace.status import Status, StatusCode | |||
|
|||
|
|||
def _sanitize_query(query): | |||
"""Remove query content, replace with sanitization symbol. | |||
For example `SELECT * FROM table` will sanitize to SELECT ? ?` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this how other SQL db instrumentations sanitize?
Co-authored-by: Srikanth Chekuri <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the _sanitize_query
is correct.
def _sanitize_query(query): | ||
"""Remove query content, replace with sanitization symbol. | ||
For example `SELECT * FROM table` will sanitize to SELECT ? FROM ?` | ||
""" | ||
sanitize_symbol = " ?" | ||
if query and query.split(): | ||
return query.split()[0] + sanitize_symbol + " FROM" + sanitize_symbol | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is incorrect for many cases, right? Just taking the first word and concatenating it with ? FROM ?
because statements may contain multi-word instructions such as INSERT INTO
and ALTER TABLE
, and it's not always guaranteed that there will be FROM
.
for word in query.split(): | ||
if word.upper() not in sql_reserved_dict: | ||
word = "?" | ||
sanitized_query += word + " " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is better than the last change I still find this naive because I can think of some WHERE
clause cases this breaks. I wonder if there is a sqlalchemy does this sanitization work more reliably?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind this is that anything which isn't a reserved word will be sanitized. Can you clarify with an example of where clause case which breaks this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imagine a scenario WHERE
clause filter containing spaces and one of these reserved words. How does it behave for the following query
SELECT * FROM table WHERE column_name="PRIMARY BANKING STOCK INDEX COLLAPSE".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yes, I see what you're saying. This will result with SELECT ? FROM table WHERE ? ? ? INDEX ?
. I still think the SELECT ? ?
option is the safest
@shalevr can this be merged? seems that comments were fixed |
Description
Added an optional query sanitizer to the SQLAlchemy instrumentation.
Usage
SQLAlchemyInstrumentor().instrument(sanitize_query=True)
This will affect the DB_STATEMENT value to contain the original query or sanitized one.
Fixes #1549
Following the specification discussion here
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Does This PR Require a Core Repo Change?
Checklist: