-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(OAuth2): refactor for custom OAuth2 clients #27880
Conversation
855499f
to
bf5d4e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor first pass comments. I will need to review with more thought later today, as the changes are quite extensive, but in general this looks ok.
```python | ||
from flask import current_app | ||
|
||
The DB engine should implement logic in either `get_url_for_impersonation` or `update_impersonation_config` to update the connection with the personal access token. See the Google Sheets DB engine spec for a reference implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some forthcoming major version we might want to consider renaming these methods to make it clear they don't only encapsulate impersonation logic. Maybe call it something that clarifies that it updates the URL to include user information (=JWT in the case of OAuth2 or user impersonation in the more traditional approach).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But OAuth2 is still a form of user impersonation, we're running a query on a connection that is built on behalf of the user. Putting this under user impersonation means that the cache is not shared between users. This is what we want most of the time, but not always, which is why I think we should decouple the "user-level cache" from "user-impersonation" and communicate better what's happening in these flows.
Having said that, I agree that the naming of these methods is confusing, and personally I think they could be combined in a single one. Currently the former modifies the URL and the latter modifies the params passed to create_engine
, but they could be consolidated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be splitting hairs, but I perceive user impersonation to be a very specific execution form, which means "query is executed using a service account, but is executed as if a particular user us executing it". In the case of OAuth2, the query is in fact executed by the user, not the service account, and makes it more secure.
I also agree that consolidating these methods could make sense, as having separate methods for mutating the URL and the other for params to create_engine
often causes confusion for me, and are IMO really the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see.
Interestingly, I always understood it is "Superset is impersonating a user", while your perception is that "the service account is impersonating a user". I agree those are two very different scenarios — OAuth2 starts from zero permissions and increases them; traditional impersonation starts with full permissions and lowers them.
But it just confirms your point, the naming is confusing, and we should definitely clarify it and probably make these two separate things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, lots of good improvements here 👍
def __init__( # pylint: disable=too-many-arguments,too-many-locals | ||
def __init__( # pylint: disable=too-many-locals, too-many-arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, was this change automatically applied by one of our linters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think what happened is that pylint
got bumped. When that happens, the new rules are not enforced in the same PR, since it only runs in files that have been modified. So only in the next PR that the rules will be checked. I think @mistercrunch is working on fixing this.
Here I fixed the lint manually, then someone else fixed in master, but using a different order.
SUMMARY
This PR is a small refactor of #27631, addressing a different workflow for OAuth2 suggested by @villebro, where any number of OAuth2 clients can be configured and assigned manually to databases.
The main change is that the DB engine spec no longer inspects the application config to get the OAuth2 client details (ID, secret, etc.). Instead, the OAuth2 related methods in the DB engine spec now receive a configuration object:
The configuration is read from the database model:
The current implementation simply builds the config object from
superset_config.py
, but in the future the database could check first for a custom OAuth2 client associated with it and use that instead. Similarly, to check if OAuth2 is enabled in a given database we now do:The current method also simply checks
superset_config.py
for now.Having a standard configuration being passed to the DB engine spec has helped simplify the OAuth2 related methods. I was able to move the methods from GSheets to the base DB engine spec by adding a few new attributes. All that is needed to support OAuth2 in Gsheets now is:
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION