fix: Prevent cached bootstrap data from leaking between users w/ same first/last name #26023

jfrag1
Member

@jfrag1 jfrag1 commented Nov 20, 2023

SUMMARY

The caching mechanism for bootstrap data was not working as intended in the case where two users share the same first/last name. The cache key was dependent on the User object passed. When a non-primitive object is used to generate a cache key, flask-caching calls repr() on it to determine the key: https://flask-caching.readthedocs.io/en/latest/index.html#memoization.

The User class comes from flask appbuilder and has the following __repr__:

def get_full_name(self):
    return "{0} {1}".format(self.first_name, self.last_name)

def __repr__(self):
    return self.get_full_name()

Therefore, when two users shared a first and last name, they would generate the same cache key and the cache could leak between the two users. This was noticed because it would result in affected users seeing menu results they shouldn't see, though it potentially had other subtle effects as well, particularly for anyone adding a user-dependent COMMON_BOOTSTRAP_OVERRIDES_FUNC.

This PR fixes the issue by passing the actual user id to the memoized function so that it's included in the cache key.
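
To make the collision concrete, here is a minimal sketch of the failure mode. It does not use flask-caching itself: `memoize_by_repr` is a hypothetical toy memoizer that builds its key from `repr()` of each argument (the behavior described above for non-primitive arguments), and `User` is a stand-in that only copies the FAB `__repr__` quoted earlier. The function names and sample users are illustrative assumptions.

```python
from typing import Any, Callable, Dict, Tuple


class User:
    """Stand-in for FAB's User (assumption): only the name-based __repr__ is copied."""

    def __init__(self, id: int, first_name: str, last_name: str) -> None:
        self.id = id
        self.first_name = first_name
        self.last_name = last_name

    def get_full_name(self) -> str:
        return "{0} {1}".format(self.first_name, self.last_name)

    def __repr__(self) -> str:
        return self.get_full_name()


def memoize_by_repr(func: Callable[..., Any]) -> Callable[..., Any]:
    """Toy memoizer (hypothetical): the cache key is repr() of every argument."""
    cache: Dict[Tuple[str, ...], Any] = {}

    def wrapper(*args: Any) -> Any:
        key = tuple(repr(arg) for arg in args)
        if key not in cache:
            cache[key] = func(*args)
        return cache[key]

    return wrapper


@memoize_by_repr
def bootstrap_keyed_on_user(user: User, locale: str) -> Dict[str, Any]:
    # Before the fix: the key only sees "First Last", never the id.
    return {"user_id": user.id, "locale": locale}


@memoize_by_repr
def bootstrap_keyed_on_id(user_id: int, locale: str) -> Dict[str, Any]:
    # After the fix: the id itself is part of the key.
    return {"user_id": user_id, "locale": locale}


first = User(1, "Alex", "Smith")
second = User(2, "Alex", "Smith")

bootstrap_keyed_on_user(first, "en")
leaked = bootstrap_keyed_on_user(second, "en")  # served from first's cache entry
print(leaked["user_id"])

bootstrap_keyed_on_id(first.id, "en")
isolated = bootstrap_keyed_on_id(second.id, "en")  # separate cache entry
print(isolated["user_id"])
```

In this sketch the second user receives the first user's payload from the name-keyed function, while the id-keyed function keeps the two entries apart.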

codecov bot commented Nov 20, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2499a1c) 69.05% compared to head (6ee1eb8) 69.07%.
Report is 21 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #26023      +/-   ##
==========================================
+ Coverage   69.05%   69.07%   +0.02%     
==========================================
  Files        1938     1937       -1     
  Lines       75835    75835              
  Branches     8427     8427              
==========================================
+ Hits        52366    52385      +19     
+ Misses      21299    21280      -19     
  Partials     2170     2170              
Flag Coverage Δ
hive 53.68% <100.00%> (+<0.01%) ⬆️
mysql 78.16% <100.00%> (-0.03%) ⬇️
postgres 78.26% <100.00%> (-0.03%) ⬇️
presto 53.64% <100.00%> (+<0.01%) ⬆️
python 82.95% <100.00%> (+0.05%) ⬆️
sqlite 76.91% <100.00%> (-0.03%) ⬇️
unit 55.79% <100.00%> (+0.09%) ⬆️


@@ -380,7 +380,9 @@ def menu_data(user: User) -> dict[str, Any]:


 @cache_manager.cache.memoize(timeout=60)
-def cached_common_bootstrap_data(user: User, locale: str) -> dict[str, Any]:
+def cached_common_bootstrap_data(  # pylint: disable=unused-argument
+    user: User, user_id: int | None, locale: str
Member

Why do we need both User and user_id? Surely the former contains the ID which should then be used for the cache key given it's globally unique.

Member Author

While yes, User "contains" the ID, its __repr__ method, which flask-caching uses to generate the cache key, does not include it (see PR description).

The 2 alternatives to this fix as-is would be to:

  1. Change the __repr__ upstream in FAB to include the id (cc @dpgaspar if this is a welcome change).
  2. Subclass FAB's User class, add our own __repr__, and then update the code to use that user class everywhere, which seems like a pretty large change for this small fix.
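
As a rough illustration of what either alternative would change, the sketch below overrides `__repr__` in a subclass so that the id takes part in the string. `BaseUser` is a stand-in for FAB's class and `IdAwareUser` is a hypothetical name; neither reflects actual FAB or Superset code.

```python
class BaseUser:
    """Stand-in for FAB's User (assumption), with its name-only __repr__."""

    def __init__(self, id: int, first_name: str, last_name: str) -> None:
        self.id = id
        self.first_name = first_name
        self.last_name = last_name

    def get_full_name(self) -> str:
        return "{0} {1}".format(self.first_name, self.last_name)

    def __repr__(self) -> str:
        return self.get_full_name()


class IdAwareUser(BaseUser):
    """Hypothetical subclass whose repr also carries the id."""

    def __repr__(self) -> str:
        return "{0} (id={1})".format(self.get_full_name(), self.id)


a = IdAwareUser(1, "Alex", "Smith")
b = IdAwareUser(2, "Alex", "Smith")

# With the id in the repr, a repr-derived cache key no longer collides.
print(repr(a) == repr(b))
```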

Member

@jfrag1 sorry I missed that detail when initially glancing over the PR description. Is a third alternative to just pass in user_id instead of both user and user_id?

@@ -424,8 +426,9 @@ def cached_common_bootstrap_data(user: User, locale: str) -> dict[str, Any]:


 def common_bootstrap_payload(user: User) -> dict[str, Any]:
+    user_id = user.id if hasattr(user, "id") else None
Member

It seems like the FAB User class will always have the id attribute defined and thus simply passing in user.id on line 431 should be safe.

Member

Grokking through the code, it seems like this is always g.user. Rather than passing the global around, it seems clearer to remove the user parameter from said method, so that the logic simply becomes:

from superset.utils.core import get_user_id

def common_bootstrap_payload() -> dict[str, Any]:
    return {
        **cached_common_bootstrap_data(get_user_id(), get_locale()),
        "flash_messages": get_flashed_messages(with_categories=True),
    }

Member Author

Oh great point it is always g.user, I'll make that change.

> It seems like the FAB User class will always have the id attribute defined and thus simply passing in user.id on line 431 should be safe.

The type hint was a bit of a lie, since g.user could also be AnonymousUserMixin. I'd update it, but it's getting removed instead.
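
The `hasattr` check in the diff exists for exactly this case. A minimal sketch, assuming an anonymous user object that simply lacks an `id` attribute; `AnonymousUser`, `LoggedInUser`, and `user_id_or_none` are hypothetical names, not Superset or flask-login APIs:

```python
from typing import Optional


class AnonymousUser:
    """Hypothetical anonymous user: it has no `id` attribute at all."""

    is_authenticated = False


class LoggedInUser:
    """Hypothetical authenticated user carrying a numeric id."""

    is_authenticated = True

    def __init__(self, id: int) -> None:
        self.id = id


def user_id_or_none(user: object) -> Optional[int]:
    # Same effect as `user.id if hasattr(user, "id") else None` in the diff.
    return getattr(user, "id", None)


print(user_id_or_none(LoggedInUser(7)))
print(user_id_or_none(AnonymousUser()))
```

Under these assumptions, anonymous requests all resolve to `None`, so they share a single cache entry per locale rather than raising an `AttributeError`.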

Member

@jfrag1 I think my comment is somewhat of a moot point given that it seems like we're going down the get_user_id() route.

Member

@john-bodley john-bodley left a comment

LGTM

@jfrag1 jfrag1 closed this Nov 21, 2023
@jfrag1 jfrag1 reopened this Nov 21, 2023
@jfrag1 jfrag1 closed this Nov 21, 2023
@jfrag1 jfrag1 reopened this Nov 21, 2023
@jfrag1 jfrag1 closed this Nov 21, 2023
@jfrag1 jfrag1 reopened this Nov 21, 2023
@eschutho eschutho merged commit 630734b into apache:master Nov 21, 2023
119 of 123 checks passed
@eschutho eschutho deleted the jack/fix-bootstrap-memoize-for-when-2-users-share-name branch November 21, 2023 23:39
@michael-s-molina michael-s-molina added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Nov 22, 2023
sadpandajoe pushed a commit to preset-io/superset that referenced this pull request Nov 28, 2023
@sadpandajoe
Member

🏷️ preset:2023.47

michael-s-molina pushed a commit that referenced this pull request Dec 4, 2023
josedev-union pushed a commit to Ortege-xyz/studio that referenced this pull request Jan 22, 2024
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
@mistercrunch mistercrunch added 🍒 3.0.3 🍒 3.0.4 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024
sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch 🍒 3.0.3 🍒 3.0.4 🚢 3.1.0
6 participants