feat(SIP-85): OAuth2 for databases #27631
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #27631 +/- ##
==========================================
+ Coverage 69.89% 69.96% +0.07%
==========================================
Files 1911 1916 +5
Lines 75024 75377 +353
Branches 8355 8403 +48
==========================================
+ Hits 52435 52741 +306
- Misses 20539 20571 +32
- Partials 2050 2065 +15
Flags with carried forward coverage won't be shown.
Overall seems solid. My most important point is probably around adding a database index on the new model, the rest are comments/notes.
superset/migrations/versions/2024-03-20_16-02_678eefb4ab44_add_access_token_table.py
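To illustrate the index suggestion: tokens are looked up per user and per database, so a composite index on those two columns lets the database seek straight to the matching row. The sketch below uses an in-memory SQLite database; the column list and the index name `idx_user_id_database_id` are assumptions for illustration, not necessarily what the migration defines.

```python
import sqlite3

# Assumed columns, for illustration only.
SCHEMA = """
CREATE TABLE database_user_oauth2_tokens (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    database_id INTEGER NOT NULL,
    access_token TEXT,
    access_token_expiration TIMESTAMP,
    refresh_token TEXT
);
-- Composite index covering the (user, database) lookup.
CREATE INDEX idx_user_id_database_id
    ON database_user_oauth2_tokens (user_id, database_id);
"""

LOOKUP = (
    "SELECT access_token FROM database_user_oauth2_tokens "
    "WHERE user_id = ? AND database_id = ?"
)


def lookup_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's query-plan description for the token lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (1, 1)).fetchall()
    # The human-readable plan text is the fourth column of each row.
    return " ".join(str(row[3]) for row in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(lookup_plan(conn))  # the plan should mention the index
```

Without the index, the same plan would report a full table scan, which gets more expensive as the number of users and databases grows.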
@betodealmeida do we anticipate that there could be authorization frameworks other than OAuth 2.0? If so I was wondering if there was merit in renaming
Looks good so far! I think there are quite a few cases that could be tested (token delete/insert, the updated engine spec definitions, etc.).
@john-bodley I'm not sure, to be honest. The beauty of OAuth2 is that the same flow is shared across multiple providers: all you need is an access token and a refresh token, so the same foundation works for BigQuery/GSheets/Snowflake/Dremio/Databricks. The one case I can think of is if at some point we wanted users to be able to input their own username/password, but I think that trying to address potential future uses would increase the complexity without clear benefits.
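Because the shared foundation is just an access token plus a refresh token, the provider-independent part can be sketched with the standard refresh-token grant from RFC 6749, section 6. `StoredToken`, `needs_refresh`, and `refresh_request_body` are hypothetical names, and the 30-second leeway is an arbitrary choice. The client credentials go in the request body, which RFC 6749 section 2.3.1 permits; HTTP Basic authentication is the alternative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode


@dataclass
class StoredToken:
    """Hypothetical shape of one user's token for one database."""

    access_token: str
    refresh_token: Optional[str]
    expires_at: datetime  # timezone-aware expiration time


def needs_refresh(
    token: StoredToken,
    now: datetime,
    leeway: timedelta = timedelta(seconds=30),
) -> bool:
    """True if the access token has expired or will expire within `leeway`."""
    return now + leeway >= token.expires_at


def refresh_request_body(token: StoredToken, client_id: str, client_secret: str) -> str:
    """Form-encoded body for a refresh-token grant (RFC 6749, section 6)."""
    if token.refresh_token is None:
        # Without a refresh token the user has to go through OAuth2 again.
        raise ValueError("no refresh token stored")
    return urlencode(
        {
            "grant_type": "refresh_token",
            "refresh_token": token.refresh_token,
            "client_id": client_id,
            "client_secret": client_secret,
        }
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    token = StoredToken("old-access", "my-refresh", now + timedelta(seconds=5))
    if needs_refresh(token, now):
        # This body would be POSTed to the provider's token endpoint.
        print(refresh_request_body(token, "my-client-id", "my-client-secret"))
```

Nothing here depends on the provider beyond the token endpoint URL and the client credentials, which is why the same storage table can serve several databases.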
@craig-rueda I did test inserting/deleting the token in the API test. I'll add tests for the new DB engine spec methods.
@betodealmeida I've been trying to adapt this to work with Trino but I'm a bit stuck. I got the This happens just for opening the sqllab menu (Trino is the first selection so the
Buuut, the FE doesn't show any interaction: no new tabs, no request for permissions for the website missing, etc. It only shows "There was an error loading the schemas". If I do try to execute a query, I get another (unrelated, hopefully) error which I can't get around: The question: any tip on what I could look into?
@joaoferrao yeah, I think this is because we run the query in a separate thread for Trino. I'm away from my computer right now and traveling tomorrow; do you want to hop on a call on Wednesday to discuss this?
Basically, I think we need this fix: #29981
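One general way an error can go missing when a query runs in another thread is that the exception is raised in the worker and never reaches the request handler that should turn it into an OAuth2 redirect. A minimal sketch, assuming a `concurrent.futures` executor; `OAuth2RedirectError` and `run_query_in_thread` are hypothetical, and #29981 may address the problem differently. `Future.result()` re-raises the worker's exception in the caller, whereas a bare `threading.Thread` only passes it to `threading.excepthook`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


class OAuth2RedirectError(Exception):
    """Hypothetical stand-in for the error the frontend needs to see."""


def run_query_in_thread(run_query: Callable[[], T]) -> T:
    """Run `run_query` in a worker thread and surface its result or error.

    Future.result() re-raises any exception from the worker in the calling
    thread, so an OAuth2 error is not silently lost.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_query)
        return future.result()
```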
Thanks for the offer, I dropped you an email.
I will take a look!
@betodealmeida let me know if my email didn't reach you; I sent it to the address on your profile. Meanwhile, good news: I was able to make it work, but I will need some suggestions on a couple of points. I will open a PR and mention you, in case you have the time to provide your input.
SUMMARY
This PR introduces a new table called `database_user_oauth2_tokens`, used for storing personal user tokens associated with a given database. Whenever a SQLAlchemy engine is instantiated, the personal user token (or `None`) will be passed to the `get_url_for_impersonation` method in the DB engine spec, so that a custom URL can be built for the user; Google Sheets is the example implementation.

The change allows users to log in to databases like BigQuery, Snowflake, Dremio, Databricks, Google Sheets, etc. using their own credentials. This makes it easier to set up databases, since service accounts are no longer required, and provides better isolation of data between users. Only support for Google Sheets is implemented in this PR, and it's considered the reference implementation. Note that a newer version of Shillelagh is required, since a change in the Google Auth API introduced a regression.
In order to populate the table with personal access tokens, the DB engine spec checks for a specific exception that signals that OAuth2 should start.
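The detection step can be sketched as a class attribute naming the driver exception, plus a check wrapped around query execution. `UnauthenticatedError`, `OAuth2RedirectNeeded`, `oauth2_exception`, `needs_oauth2`, `EngineSpecSketch`, and `execute` are hypothetical stand-ins; only `start_oauth2_dance` and the `OAUTH2_REDIRECT` error name come from the description, and the URL is a placeholder.

```python
from typing import Callable, NoReturn, Optional, Type, TypeVar

T = TypeVar("T")


class UnauthenticatedError(Exception):
    """Stand-in for the driver error raised when no valid credentials exist."""


class OAuth2RedirectNeeded(Exception):
    """Hypothetical error carrying the URL the frontend should link to."""

    def __init__(self, url: str) -> None:
        super().__init__("OAUTH2_REDIRECT")
        self.url = url


class EngineSpecSketch:
    # Which driver exception means "this user needs to authorize via OAuth2".
    oauth2_exception: Optional[Type[Exception]] = UnauthenticatedError

    @classmethod
    def needs_oauth2(cls, ex: Exception) -> bool:
        return cls.oauth2_exception is not None and isinstance(ex, cls.oauth2_exception)

    @classmethod
    def start_oauth2_dance(cls) -> NoReturn:
        # Placeholder URL; a real implementation builds the provider's
        # authorization URL here.
        raise OAuth2RedirectNeeded("https://provider.example/authorize")


def execute(spec: Type[EngineSpecSketch], run_query: Callable[[], T]) -> T:
    """Run a query, converting the driver's auth error into a redirect error."""
    try:
        return run_query()
    except Exception as ex:
        if spec.needs_oauth2(ex):
            spec.start_oauth2_dance()  # raises OAuth2RedirectNeeded
        raise  # any other error propagates unchanged
```

Keeping the check at the engine spec level is what lets the same logic fire regardless of where the query originated.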
When called, the `start_oauth2_dance` method will return the error `OAUTH2_REDIRECT` to the frontend. The error is captured by the `ErrorMessageWithStackTrace` component, which provides a link to the user so they can start the OAuth2 authentication. Since this is implemented at the DB engine spec level, any query will trigger it, whether in SQL Lab, Explore, or dashboards; see the screenshots below for the UX.

Note that while the current implementation triggers OAuth2 when a query needs authorization, we could also implement affordances in the database UI to trigger OAuth2 manually and store the personal access tokens. This could be done in the future.
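The link shown to the user is a standard OAuth2 authorization-code request (RFC 6749, section 4.1.1). A sketch with Google as the example provider: the endpoint and the `access_type`/`prompt` parameters are Google's documented ones, while the client ID, redirect URI, and the random `state` value are placeholders. What the PR actually encodes in `state` is not specified here.

```python
import secrets
from typing import List
from urllib.parse import urlencode

# Google's OAuth 2.0 authorization endpoint, used as the example provider.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorization_url(
    client_id: str, redirect_uri: str, scopes: List[str], state: str
) -> str:
    """Authorization-code request URL (RFC 6749, section 4.1.1)."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        # Echoed back to the redirect URI; lets the callback reject forged requests.
        "state": state,
        # Google-specific: request a refresh token and force the consent screen.
        "access_type": "offline",
        "prompt": "consent",
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(
        build_authorization_url(
            client_id="my-client-id.apps.googleusercontent.com",  # placeholder
            redirect_uri="https://superset.example.com/oauth2/callback",  # placeholder
            scopes=["https://www.googleapis.com/auth/spreadsheets"],
            state=secrets.token_urlsafe(32),
        )
    )
```

Requesting offline access matters here: without a refresh token, the user would have to repeat the OAuth2 flow every time the short-lived access token expired.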
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
SQL Lab. Note that the query runs automatically once OAuth is completed:
SIP-85.Sql.Lab.mov
Explore. Note that the chart is automatically updated after OAuth:
SIP-85.Explore.mov
Same thing for dashboards:
SIP-85.Dashboard.mov
TESTING INSTRUCTIONS
`superset_config.py` and add the client ID and secret:

ADDITIONAL INFORMATION