Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sqllab): sqllab/execute returns 500 when user only has schema access #28357

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

toniphan21
Copy link

@toniphan21 toniphan21 commented May 6, 2024

SUMMARY

When a user has no admin access and is trying to run a query in SQL Lab, Superset needs to check:

  • Database access
  • Schema access
  • Data source access

To be able to check schema access, Superset needs to know which table the user is trying to execute on and use extract_tables_from_jinja_sql() to get it. The function extract tables in both ways:

  • Extract any tables referenced within the confines of specific Jinja macros.
  • Parse SQL and get tables.

In the line I changed, there is simply a bug that uses template which has type Template. The correct one should be sql, which is a string sent by the user.

TESTING INSTRUCTIONS

Reproduce steps:

  1. Create a new user
  2. Create a role with permission: "schema access on [examples].[main]"
  3. Assign the user to the role and "sql_lab" role
  4. Log in with the new user and go to SQL Lab.
  5. Write a simple query and execute it. You will see that the /api/v1/sqllab/execute/ endpoint returns 500

Impact

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

@@ -1582,7 +1582,7 @@ def extract_tables_from_jinja_sql(sql: str, database: Database) -> set[Table]:
return (
tables
| ParsedQuery(
sql_statement=processor.process_template(template),
Copy link
Member

@john-bodley john-bodley May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toniphan21 I think the current logic (which I added) might be correct, i.e., we first have to render/sanitize the template in order for it to be "proper" SQL so that SQLGlot can actually parse it.

In your PR description you mentioned,

There are 2 ways to extract table set:

  1. Extract any tables referenced within the confines of specific Jinja macros.
  2. Parse SQL and get table.

whereas in actuality we need to do both*, which is what the current code does.

* Actuality for (2) we need to parse "sanitized" SQL, i.e., we the Jinja templates rendered or removed so that SQLGlot is able to parse it.

That said it's interesting that Mypy didn't barf on the fact that the first argument to process_template is of type nodes.Template rather than str. Here's were the argument is processed, where the from_string() method accepts either a node.Template or str (albeit being named to only accept the late).

The TL;DR is I think the code is correct, though the process_template() method could be updated to include a union of str | nodes.Template.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion (I updated the description). From the code, I understand that this function will get tables from both ways. However, the problem lies in ParsedQuery.stripped(), which is supposed to strip whitespace characters out of the given sql. If you pass nodes.Template, it will throw an exception because there is no strip() method in nodes.Template.

Please correct me if I'm wrong, but I understand that in the first part, you already "Extract any tables referenced within the confines of specific Jinja macros." Then, in this part, the given sql should be processed via the processor to create a raw sql_statement, which is then put into the ParsedQuery class to extract tables. If that is the purpose, hence the variable passed to processor.process_template(...) should be a given SQL, which may include Jinja macros.

@toniphan21 toniphan21 changed the title fix(sqllab): execute endpoint returns 500 when user does not have admin access fix(sqllab): execute endpoint returns 500 when user only have schema access May 6, 2024
@toniphan21 toniphan21 changed the title fix(sqllab): execute endpoint returns 500 when user only have schema access fix(sqllab): execute endpoint returns 500 when user only has schema access May 6, 2024
@toniphan21 toniphan21 changed the title fix(sqllab): execute endpoint returns 500 when user only has schema access fix(sqllab): sqllab/execute returns 500 when user only has schema access May 6, 2024
@rusackas
Copy link
Member

If this does indeed resolve #28145 then please feel free to add "Fixes: #28145" do your description block and it'll auto-close that issue when merged.

Meanwhile, running CI 🤞

Copy link

codecov bot commented May 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.56%. Comparing base (76d897e) to head (e85728a).
Report is 853 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #28357       +/-   ##
===========================================
+ Coverage   60.48%   77.56%   +17.07%     
===========================================
  Files        1931      521     -1410     
  Lines       76236    37570    -38666     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    29140    -16974     
+ Misses      28017     8430    -19587     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive 49.03% <ø> (-0.13%) ⬇️
javascript ?
mysql 77.03% <ø> (?)
postgres 77.12% <ø> (?)
presto 53.64% <ø> (-0.16%) ⬇️
python 77.56% <ø> (+14.07%) ⬆️
sqlite 76.59% <ø> (?)
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tseruga
Copy link

tseruga commented May 21, 2024

Relevant to this issue as well: #28218

@rusackas
Copy link
Member

rusackas commented Jul 3, 2024

Looks like there are some failing unit tests. Do you have time to take a look, @toniphan21 ?

@toniphan21 toniphan21 force-pushed the fix/sql-lab-schema-access-error branch from e4d0612 to e85728a Compare July 8, 2024 19:50
Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, in this part, the given sql should be processed via the processor to create a raw sql_statement, which is then put into the ParsedQuery class to extract tables. If that is the purpose, hence the variable passed to processor.process_template(...) should be a given SQL, which may include Jinja macros.

@toniphan21 Given the following:

Actuality for (2) we need to parse "sanitized" SQL, i.e., we the Jinja templates rendered or removed so that SQLGlot is able to parse it.

We can't pass sql to processor.process_template because we need the processed template with the following:

# Replace the potentially problematic Jinja macro with some benign SQL.
node.__class__ = nodes.TemplateData
node.fields = nodes.TemplateData.fields
node.data = "NULL" 

The question that we need to answer is why Template.render is not returning a str in accordance to its signature:

def render(self, *args: t.Any, **kwargs: t.Any) -> str:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query history page is not usable in 3.1.2 and 4.0.0, caused by bug in sql_parse.py
5 participants