Add sqlalchemy db.statement sanitization flag #1701

Open
wants to merge 18 commits into
base: main

Conversation

nemoshlag
Member

Description

Added an optional query sanitizer to the SQLAlchemy instrumentation.
Usage
SQLAlchemyInstrumentor().instrument(sanitize_query=True)

The flag controls whether the DB_STATEMENT value contains the original query or the sanitized one.

Fixes #1549
Following the specification discussion here

Type of change

Please delete options that are not relevant.

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Test A: db.statement has been sanitized
  • Test B: no side effects occur on a query without a db.statement attribute

Does This PR Require a Core Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@github-actions github-actions bot requested a review from shalevr March 1, 2023 10:13
@nemoshlag nemoshlag marked this pull request as ready for review March 1, 2023 12:52
@nemoshlag nemoshlag requested a review from a team March 1, 2023 12:52
Member

@shalevr shalevr left a comment

Looking great! Thank you for adding this

CHANGELOG.md Outdated
@@ -27,6 +27,16 @@
from opentelemetry.trace.status import Status, StatusCode


def _sanitize_query(query):
    """Remove query content, replace with sanitization symbol.
    For example `SELECT * FROM table` will sanitize to `SELECT ? ?`
Member

Is this how other SQL db instrumentations sanitize?

Member

@srikanthccv srikanthccv left a comment

I don't think the _sanitize_query is correct.

Comment on lines 30 to 37
def _sanitize_query(query):
    """Remove query content, replace with sanitization symbol.
    For example `SELECT * FROM table` will sanitize to `SELECT ? FROM ?`
    """
    sanitize_symbol = " ?"
    if query and query.split():
        return query.split()[0] + sanitize_symbol + " FROM" + sanitize_symbol
    return ""
Member

This logic is incorrect for many cases, right? Just taking the first word and concatenating it with ? FROM ? doesn't work, because statements may contain multi-word instructions such as INSERT INTO and ALTER TABLE, and it's not guaranteed that there will be a FROM.

Comment on lines +104 to +107
for word in query.split():
    if word.upper() not in sql_reserved_dict:
        word = "?"
    sanitized_query += word + " "
Member

While this is better than the last change, I still find it naive, because I can think of some WHERE clause cases that it breaks. I wonder if there is something in sqlalchemy that does this sanitization work more reliably?
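
For comparison only, one alternative to word-by-word filtering is to mask literals (quoted strings and numbers) and leave everything else in place. The sketch below is not the PR's code and not a SQLAlchemy API; `mask_literals` and the regular expression are hypothetical, and the pattern does not handle comments, dialect-specific quoting, or double-quoted identifiers.

```python
import re

# Assumed pattern: quoted strings (doubled quotes count as escapes)
# and standalone integer or decimal literals.
_LITERAL = re.compile(
    r"'(?:[^']|'')*'"      # single-quoted string
    r'|"(?:[^"]|"")*"'     # double-quoted string
    r"|\b\d+(?:\.\d+)?\b"  # standalone number
)


def mask_literals(query):
    """Replace every matched literal with the ? placeholder."""
    return _LITERAL.sub("?", query)


print(mask_literals("SELECT name FROM users WHERE id = 42 AND city = 'New York'"))
# SELECT name FROM users WHERE id = ? AND city = ?
```

Because a quoted value is matched as a whole, spaces and reserved words inside it do not affect the result.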

Member Author

The idea behind this is that anything that isn't a reserved word will be sanitized. Can you clarify with an example of a WHERE clause case that breaks this?

Member

Imagine a WHERE clause filter whose value contains spaces and one of these reserved words. How does it behave for the following query?

SELECT * FROM table WHERE column_name="PRIMARY BANKING STOCK INDEX COLLAPSE". 

Member Author

Hmm yes, I see what you're saying. This will result in SELECT ? FROM table WHERE ? ? ? INDEX ?. I still think the SELECT ? ? option is the safest.
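
The result quoted above can be reproduced with a small sketch of the reserved-word loop from the diff. The contents of `sql_reserved_dict` are not shown in this conversation, so the set below is a hypothetical subset, and the wrapper function name and the final `strip()` are assumptions added to make the loop runnable.

```python
# Hypothetical subset of reserved words; the PR's real sql_reserved_dict
# is not shown in this thread.
sql_reserved_dict = {"SELECT", "FROM", "TABLE", "WHERE", "PRIMARY", "INDEX"}


def sanitize_by_reserved_words(query):
    """Keep whitespace-separated reserved words, replace the rest with ?."""
    sanitized_query = ""
    for word in query.split():
        if word.upper() not in sql_reserved_dict:
            word = "?"
        sanitized_query += word + " "
    return sanitized_query.strip()


query = 'SELECT * FROM table WHERE column_name="PRIMARY BANKING STOCK INDEX COLLAPSE"'
print(sanitize_by_reserved_words(query))
# SELECT ? FROM table WHERE ? ? ? INDEX ?
```

The table name and the word INDEX from inside the filter value both survive, because splitting on whitespace cannot tell a keyword from a word that merely looks like one.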

@arbiv

arbiv commented Aug 9, 2023

@shalevr can this be merged? seems that comments were fixed

Successfully merging this pull request may close these issues.

Implement sensitive data sanitization for sqlalchemy instrumentation
6 participants