feat: robust(er) adhoc query validation #24032

Closed
wants to merge 1 commit into from


betodealmeida
Member

SUMMARY

The function has_table_query detects whether a given query selects from one or more tables, in order to prevent malicious ad-hoc expressions from bypassing RLS. Because sqlparse is non-validating, the function has to keep track of a lot of state, and it often fails on edge cases.

This PR rewrites the function to use sqloxide. The advantage is that the function is now more robust: it supports custom dialects and handles more edge cases. There are a couple of disadvantages, though:

  • sqloxide was an optional dependency; with this PR it becomes mandatory.
  • The function has_table_query now requires a full SQL query, so expressions must be wrapped in f"SELECT {expression}" before they can be validated.
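A minimal sketch of the AST-based check, assuming a parser that returns a JSON-like AST in which table references appear as nodes keyed by "Table" (the way sqlparser-rs enum variants are serialized, which is what sqloxide's parse_sql returns; the exact shape varies between versions). The parser is passed in as a callable so the sketch runs without sqloxide installed. The names Parser, iter_nodes and expression_has_table_query are hypothetical, and this is not the code in this PR.

```python
from typing import Any, Callable, Iterator, Tuple

# Hypothetical alias: any function mapping SQL text to a JSON-like AST,
# e.g. a thin wrapper around sqloxide.parse_sql (assumption).
Parser = Callable[[str], Any]


def iter_nodes(node: Any) -> Iterator[Tuple[str, Any]]:
    """Yield every (key, value) pair found anywhere in a nested dict/list AST."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key, value
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def has_table_query(sql: str, parse: Parser) -> bool:
    """Return True if the parsed statement(s) reference any table.

    Assumes table references appear as nodes keyed by "Table". Identifiers
    are stored as values, not keys, so a column named "table" is not
    mistaken for a table reference.
    """
    ast = parse(sql)
    return any(key == "Table" for key, _ in iter_nodes(ast))


def expression_has_table_query(expression: str, parse: Parser) -> bool:
    """Check an ad-hoc expression by wrapping it in a full SELECT first."""
    return has_table_query(f"SELECT {expression}", parse)
```

With sqloxide installed, the parser could be supplied as something like lambda s: parse_sql(sql=s, dialect="generic"), where the dialect name is an assumption and should match the target database. Walking the whole tree means a table hidden inside a nested subquery is still found, without the state tracking the sqlparse-based version needed.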

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

The unit tests pass mostly unmodified. The only exception is that "table" is usually a reserved keyword and had to be quoted for sqloxide to parse it (sqlparse is much more lenient). I also added a new test covering another edge case that was found to bypass RLS.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Comment on lines 110 to 127
pyhive[presto]==0.6.5
# via apache-superset
Member Author


Not sure why this is showing up here now.

Comment on lines 520 to 1101
def _process_sql_expression(
    expression: Optional[str],
    database_id: int,
    schema: str,
    template_processor: Optional[BaseTemplateProcessor] = None,
) -> Optional[str]:
    if template_processor and expression:
        expression = template_processor.process_template(expression)
    if expression:
        try:
            expression = validate_adhoc_subquery(
                expression,
                database_id,
                schema,
            )
            expression = sanitize_clause(expression)
        except (QueryClauseValidationException, SupersetSecurityException) as ex:
            raise QueryObjectValidationError(ex.message) from ex
    return expression


Member Author


This method was almost identical to the one in the base class; the only difference is that it also handles SupersetSecurityException. I updated the base class method and removed this implementation.

@codecov

codecov bot commented May 12, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (494068b) 67.04% compared to head (cccb623) 56.19%.
Report is 1 commits behind head on master.

File                                  Patch %   Lines
superset/models/helpers.py             38.09%   13 Missing ⚠️
superset/sql_parse.py                  83.33%   4 Missing ⚠️
superset/connectors/sqla/models.py     33.33%   2 Missing ⚠️
superset/models/sql_lab.py              0.00%   1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #24032       +/-   ##
===========================================
- Coverage   67.04%   56.19%   -10.85%     
===========================================
  Files        1948     1948               
  Lines       76062    76036       -26     
  Branches     8493     8493               
===========================================
- Hits        50995    42730     -8265     
- Misses      22887    31126     +8239     
  Partials     2180     2180               
Flag         Coverage Δ
hive         ?
javascript   56.50% <ø> (ø)
mysql        ?
postgres     ?
python       55.86% <59.18%> (-22.58%) ⬇️
sqlite       ?
unit         55.86% <59.18%> (?)

Flags with carried forward coverage won't be shown.


@betodealmeida
Member Author

@villebro any thoughts on this?

@betodealmeida betodealmeida force-pushed the robust_has_table_query branch 7 times, most recently from b51aec2 to 06b3274 October 4, 2023 00:15
@betodealmeida betodealmeida force-pushed the robust_has_table_query branch 2 times, most recently from f3beba7 to 04090ed November 8, 2023 13:00
@villebro
Member

@betodealmeida can you rebase this PR so we can get CI running? I'll be happy to review after that. I think it would be great to get this merged, it's a very useful improvement 👍

@sbelondr

sbelondr commented Feb 1, 2024

Hello,
I'm interested in this PR and am adding this comment so I get notified when it is approved. Thank you in advance if it can be reviewed quickly.
Thanks

@betodealmeida
Member Author

Closing due to #26786.

@rusackas rusackas deleted the robust_has_table_query branch April 16, 2024 16:51

4 participants