-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: robust(er) adhoc query validation #24032
Conversation
requirements/testing.txt
Outdated
pyhive[presto]==0.6.5 | ||
# via apache-superset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this is showing up here now.
superset/connectors/sqla/models.py
Outdated
def _process_sql_expression( | ||
expression: Optional[str], | ||
database_id: int, | ||
schema: str, | ||
template_processor: Optional[BaseTemplateProcessor] = None, | ||
) -> Optional[str]: | ||
if template_processor and expression: | ||
expression = template_processor.process_template(expression) | ||
if expression: | ||
try: | ||
expression = validate_adhoc_subquery( | ||
expression, | ||
database_id, | ||
schema, | ||
) | ||
expression = sanitize_clause(expression) | ||
except (QueryClauseValidationException, SupersetSecurityException) as ex: | ||
raise QueryObjectValidationError(ex.message) from ex | ||
return expression | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method was almost identical to the base class, the only difference is that is handles SupersetSecurityException
. I updated the base class method and removed this implementation.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #24032 +/- ##
===========================================
- Coverage 67.04% 56.19% -10.85%
===========================================
Files 1948 1948
Lines 76062 76036 -26
Branches 8493 8493
===========================================
- Hits 50995 42730 -8265
- Misses 22887 31126 +8239
Partials 2180 2180
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@villebro any thoughts on this? |
b51aec2
to
06b3274
Compare
f3beba7
to
04090ed
Compare
@betodealmeida can you rebase this PR so we can get CI running? I'll be happy to review after that. I think it would be great to get this merged, it's a very useful improvement 👍 |
04090ed
to
cccb623
Compare
Hello, |
Closing due to #26786. |
SUMMARY
The function
has_table_query
is used to detect if a given query is selecting from one or more tables, in order to prevent malicious ad-hoc expressions that bypass RLS. Becausesqlparse
is non-validating the function needs to keep track of a lot of state, and often fails on edge cases.This PR rewrites the function to use
sqloxide
. The advantage is that the function is now more robust, supporting custom dialects and handling more edge cases. There are a couple disadvantages, though:sqloxide
was an optional dependency, with this PR it becomes mandatory.has_table_query
now requires a full SQL query, so for validating expressions they need to be wrapped inf"SELECT {expression}"
.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TESTING INSTRUCTIONS
The unit tests pass mostly unmodified, the only exception is that
table
is usually a reserved keyword and had to be quoted in order tosqloxide
to parse it (sqlparse
is much more lenient). I added a new test to cover another edge case that was found could bypass RLS.ADDITIONAL INFORMATION