Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(OAuth2): refactor for custom OAuth2 clients #27880

Merged
merged 2 commits into from
Apr 5, 2024
Merged

Conversation

betodealmeida
Copy link
Member

SUMMARY

This PR is a small refactor of #27631, addressing a different workflow for OAuth2 suggested by @villebro, where any number of OAuth2 clients can be configured and assigned manually to databases.

The main change is that the DB engine spec no longer inspects the application config to get the OAuth2 client details (ID, secret, etc.). Instead, the OAuth2 related methods in the DB engine spec now receive a configuration object:

class OAuth2ClientConfig(TypedDict):
    id: str
    secret: str
    scope: str
    redirect_uri: str
    authorization_request_uri: str
    token_request_uri: str

The configuration is read from the database model:

oauth2_config = database.get_oauth2_config()

The current implementation simply builds the config object from superset_config.py, but in the future the database could check first for a custom OAuth2 client associated with it and use that instead. Similarly, to check if OAuth2 is enabled in a given database we now do:

if database.is_oauth2_enabled():
   pass

The current method also simply checks superset_config.py for now.

Having a standard configuration being passed to the DB engine spec has helped simplify the OAuth2 related methods. I was able to move the methods from GSheets to the base DB engine spec by adding a few new attributes. All that is needed to support OAuth2 in Gsheets now is:

from shillelagh.adapters.api.gsheets.lib import SCOPES

class GSheetsEngineSpec(ShillelaghEngineSpec):

    supports_oauth2 = True
    oauth2_scope = " ".join(SCOPES)
    oauth2_authorization_request_uri = "https://accounts.google.com/o/oauth2/v2/auth"
    oauth2_token_request_uri = "https://oauth2.googleapis.com/token"
    oauth2_exception = UnauthenticatedError

    @classmethod
    def get_url_for_impersonation(
        cls,
        url: URL,
        impersonate_user: bool,
        username: str | None,
        access_token: str | None,
    ) -> URL:
        if impersonate_user and access_token:
            url = url.update_query_dict({"access_token": access_token})

        return url

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@betodealmeida betodealmeida requested a review from villebro April 3, 2024 15:40
@github-actions github-actions bot added the api Related to the REST API label Apr 3, 2024
@betodealmeida betodealmeida force-pushed the sip-85-b branch 4 times, most recently from 855499f to bf5d4e9 Compare April 3, 2024 16:29
@betodealmeida betodealmeida marked this pull request as ready for review April 3, 2024 16:32
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor first pass comments. I will need to review with more thought later today, as the changes are quite extensive, but in general this looks ok.

superset/db_engine_specs/README.md Outdated Show resolved Hide resolved
```python
from flask import current_app

The DB engine should implement logic in either `get_url_for_impersonation` or `update_impersonation_config` to update the connection with the personal access token. See the Google Sheets DB engine spec for a reference implementation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some forthcoming major version we might want to consider renaming these methods to make it clear they don't only encapsulate impersonation logic. Maybe call it something that clarifies that it updates the URL to include user information (=JWT in the case of OAuth2 or user impersonation in the more traditional approach).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But OAuth2 is still a form of user impersonation, we're running a query on a connection that is built on behalf of the user. Putting this under user impersonation means that the cache is not shared between users. This is what we want most of the time, but not always, which is why I think we should decouple the "user-level cache" from "user-impersonation" and communicate better what's happening in these flows.

Having said that, I agree that the naming of these methods is confusing, and personally I think they could be combined in a single one. Currently the former modifies the URL and the latter modifies the params passed to create_engine, but they could be consolidated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be splitting hairs, but I perceive user impersonation to be a very specific execution form, which means "query is executed using a service account, but is executed as if a particular user us executing it". In the case of OAuth2, the query is in fact executed by the user, not the service account, and makes it more secure.

I also agree that consolidating these methods could make sense, as having separate methods for mutating the URL and the other for params to create_engine often causes confusion for me, and are IMO really the same thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see.

Interestingly, I always understood it is "Superset is impersonating a user", while your perception is that "the service account is impersonating a user". I agree those are two very different scenarios — OAuth2 starts from zero permissions and increases them; traditional impersonation starts with full permissions and lowers them.

But it just confirms your point, the naming is confusing, and we should definitely clarify it and probably make these two separate things.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, lots of good improvements here 👍

def __init__( # pylint: disable=too-many-arguments,too-many-locals
def __init__( # pylint: disable=too-many-locals, too-many-arguments
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, was this change automatically applied by one of our linters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think what happened is that pylint got bumped. When that happens, the new rules are not enforced in the same PR, since it only runs in files that have been modified. So only in the next PR that the rules will be checked. I think @mistercrunch is working on fixing this.

Here I fixed the lint manually, then someone else fixed in master, but using a different order.

@betodealmeida betodealmeida merged commit 9377227 into master Apr 5, 2024
40 checks passed
@rusackas rusackas deleted the sip-85-b branch April 5, 2024 16:50
EandrewJones pushed a commit to UMD-ARLIS/superset that referenced this pull request Apr 5, 2024
EnxDev pushed a commit to EnxDev/superset that referenced this pull request Apr 12, 2024
qleroy pushed a commit to qleroy/superset that referenced this pull request Apr 28, 2024
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 4.1.0 labels Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the REST API 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL 🚢 4.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants