Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] Automatically add relevant Jinja methods to cache key if present #9572

Merged

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Apr 17, 2020

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

The URL parameters were not being used in the cache key resulting in issues per the attached screenshot. This PR adds the URL params (as well as other JInja macros including current_user_id and current_username) to the cache key by default if they are leveraged by the query.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before

Screen Shot 2020-04-17 at 11 52 20 AM

After

Screen Shot 2020-04-17 at 11 51 25 AM

TEST PLAN

Updated unit tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

to: @villebro

@@ -173,7 +173,6 @@ def test_viz_cache_key(self):
viz = slc.viz
qobj = viz.query_obj()
cache_key = viz.cache_key(qobj)
self.assertEqual(cache_key, viz.cache_key(qobj))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary, i.e., it's equivalent to,

self.assertEqual(viz.cache_key(qobj), viz.cache_key(qobj))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what I was thinking there 😄

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is the risk that this will cause lots of unnecessary cache misses? I would almost prefer to use cache_key_wrapper for a case like this, as it should be the main tool for adding non-standard cache keys. I can imagine it's a feature that very few probably use or know about, so perhaps it should be promoted somehow when people use jinja in their queries.

@@ -173,7 +173,6 @@ def test_viz_cache_key(self):
viz = slc.viz
qobj = viz.query_obj()
cache_key = viz.cache_key(qobj)
self.assertEqual(cache_key, viz.cache_key(qobj))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what I was thinking there 😄

@john-bodley
Copy link
Member Author

john-bodley commented Apr 18, 2020

@villebro I wasn't aware of cache_key_wrapper. It seems like that method is used primarily for ad-hoc datasource level caching outside of the constructs of Superset, however url_params is a construct that Superset supports and is part of the form-data, hence why it seems like a viable standard field for the cache parameters.

@villebro
Copy link
Member

@john-bodley the way I see it, url_params is a slightly special property in chart metadata, and should only be regarded as default values that are overridden if the parameter is present in the URL. It's worth keeping in mind that adding url_params to the cache key will invalidate the cache key for every query/chart in a dashboard if a query parameter is added to the URL, even if it is not used by the query. It is for this purpose that cache_key_wrapper was added, ie. making sure that any additional variable that needs to be considered in the query is considered in the cache key. Given the fact that using cache_key_wrapper can solve this problem without impacting other charts/queries, I would propose using that for now.

@john-bodley
Copy link
Member Author

@villebro I agree with your point (and noted) that the presence of a URL parameter does not mean it will be used by the query but per the documentation,

As you create a visualization form this SQL Lab query, you can pass parameters in the explore view as well as from the dashboard, and it should carry through to your queries.

Default values for URL parameters can be defined in chart metadata by adding the key-value pair url_params: {'foo': 'bar'}

I perceive one would expect that if one were using them as defined they wouldn't expect the cached payload to be incorrect, i.e., it feels like a bug if they didn't know they needed to wrap the call with cache_key_wrapper.

Maybe there's a common ground we could agree on (however I wasn't able to get it working but maybe you could shed some light on maybe how to do it). Is there a way of possibly calling cache_key_wrapper from within url_param to register url_params as extra cache keys? That would ensure these are only added to the cache if they're used by the query.

@villebro
Copy link
Member

villebro commented Apr 20, 2020

I agree, users will probably expect url_params to be added to cache keys. Perhaps we default url_params to add the value to extra_cache_keys, but leave an option to opt out:

def url_param(param: str, default: Optional[str] = None, add_to_cache_keys: bool=True) -> Optional[Any]:
   ...

Then we also need to change the regex has_calls_to_cache_key_wrapper in connectors/models/sqla.py to be something along the lines of

regex = re.compile(r"\{\{.*(cache_key_wrapper\(.*\)|url_param\(.*).*\}\}")

to make sure we don't unnecessarily trigger re-evaluation of jinja code. As I added this feature, I can put together a PR for it if you wish.

Oh, the syntax for cache_key_wrapper is fairly simple, just wrap it around whatever JSON serializable value you want to add to keys:

{{ cache_key_wrapper(url_param("key", "my default value")) }}

@villebro
Copy link
Member

Looking more closely at the default jinja functions, it appears they should all default to being added to extra cache keys, except filter_values and form_data:

  • url_param
  • current_user_id
  • current_username
    Then regular users wouldn't need to concern themselves with wrapping the function calls, and cache_key_wrapper would be something that mostly more advanced users need when introducing custom jinja functions.

@john-bodley john-bodley force-pushed the john-bodley--add-url-params-to-cache-key branch from edc8c8c to 145b8ea Compare April 20, 2020 17:44
@pull-request-size pull-request-size bot added size/L and removed size/S labels Apr 20, 2020
@john-bodley john-bodley force-pushed the john-bodley--add-url-params-to-cache-key branch 2 times, most recently from 3b92f5a to 13f7a08 Compare April 20, 2020 18:03
@john-bodley
Copy link
Member Author

john-bodley commented Apr 20, 2020

@villebro I took a brief stab at this (commit) though it's untested.

You have more context here but my sense is that the cache_key_wrapper method may now be redundant. Maybe it would be best if you took over this PR (per your previous suggestion) as you're much more well versed on exactly how this all works.

My one concern is it seems that the extra_cache_keys is merely a list of values, i.e., user_1, user_2 etc. however there's not scope and thus there could be collision. I wonder if it should be a dictionary of values keyed by method with more context, i.e,.

{
    "current_user_id": ["user_1", "user_2",], 
    "url_param": [{"foo": "bar"},],
}

Note in the case of the URL parameters having both the key and value is probably necessary for completeness.

@villebro
Copy link
Member

Thanks, I'll test and review shortly. It would be great if you can see this through, as getting a fresh set of eyes on the old code is always positive. If we can completely drop cache_key_wrapper all the merrier.

@john-bodley john-bodley force-pushed the john-bodley--add-url-params-to-cache-key branch from 13f7a08 to 175ad04 Compare April 20, 2020 18:28
@codecov-io
Copy link

codecov-io commented Apr 20, 2020

Codecov Report

Merging #9572 into master will decrease coverage by 0.00%.
The diff coverage is 47.05%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9572      +/-   ##
==========================================
- Coverage   70.52%   70.52%   -0.01%     
==========================================
  Files         574      574              
  Lines       30152    30161       +9     
  Branches     3060     3066       +6     
==========================================
+ Hits        21265    21271       +6     
- Misses       8776     8779       +3     
  Partials      111      111              
Flag Coverage Δ
#cypress 53.74% <ø> (+0.04%) ⬆️
#javascript 58.76% <ø> (ø)
#python 70.56% <47.05%> (-0.01%) ⬇️
Impacted Files Coverage Δ
superset/jinja_context.py 74.00% <39.28%> (-0.45%) ⬇️
superset/views/tags.py 35.13% <50.00%> (ø)
superset/connectors/sqla/models.py 88.51% <100.00%> (-0.02%) ⬇️
.../src/dashboard/components/gridComponents/Chart.jsx 88.76% <0.00%> (-1.13%) ⬇️
superset-frontend/src/SqlLab/actions/sqlLab.js 66.81% <0.00%> (ø)
...set-frontend/src/dashboard/util/getDropPosition.js 92.06% <0.00%> (ø)
...ontend/src/dashboard/components/dnd/handleHover.js 100.00% <0.00%> (+12.50%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6cedaa...9006bb3. Read the comment docs.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks mostly good, just a couple of small changes that I believe are vital for this functionality. Also, we should probably add a note in UPDATING.md informing current users of cache_key_wrapper that the new default behaviour for url_param, current_username and current_user_id will automatically add the value to extra_cache_keys unless explicitly setting add_to_cache_keys = False.

self.extra_cache_keys = extra_cache_keys
self.extra_cache_keys = extra_cache_keys or []

def current_user_id(self, add_to_cache_keys: bool = True) -> Optional[int]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the references to the moved methods need to be updated in docs/sqllab.rst.

superset/viz.py Outdated
Comment on lines 398 to 403

url_params = self.form_data.get("url_params")

if url_params:
cache_dict["url_params"] = url_params

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be added here, as calling url_params in the Jinja context will now add the relevant values to extra_keys.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be removed.

Comment on lines 439 to 444

url_params = self.form_data.get("url_params")

if url_params:
cache_dict["url_params"] = url_params

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here (thanks for keeping this in sync)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See ^^.

Comment on lines 208 to 216
def test_cache_key_changes_with_url_parameters(self):
self.login(username="admin")
slc = self.get_slice("Girls", db.session)
viz = slc.viz
qobj = viz.query_obj()
cache_key_original = viz.cache_key(qobj)
viz.form_data["url_params"] = {"foo": "bar"}
self.assertNotEqual(cache_key_original, viz.cache_key(qobj))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend adding a test making sure that the cache key does not change when url_params changes, unless some part of the query_obj is referencing one of the keys present in url_params using Jinja. The point being we want to make sure only the url params that are used affect the cache key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@villebro
Copy link
Member

@villebro I took a brief stab at this (commit) though it's untested.

You have more context here but my sense is that the cache_key_wrapper method may now be redundant. Maybe it would be best if you took over this PR (per your previous suggestion) as you're much more well versed on exactly how this all works.

The cache_key_wrapper method is good to leave as-is for cases where users want to add values to extra_cache_keys other than the predefined current_username() etc. However, the documentation probably needs to be updated to reflect the fact that the predefined jinja functions don't need to be wrapped.

My one concern is it seems that the extra_cache_keys is merely a list of values, i.e., user_1, user_2 etc. however there's not scope and thus there could be collision. I wonder if it should be a dictionary of values keyed by method with more context, i.e,.

{
    "current_user_id": ["user_1", "user_2",], 
    "url_param": [{"foo": "bar"},],
}

Note in the case of the URL parameters having both the key and value is probably necessary for completeness.

With regards to extra_cache_keys being a list, I don't think it's a big issue, as the extra cache keys will probably always be added sequentially. So if we're adding two keys key1 and key2, with values value1 and value2 respectively, it probably won't make a big difference if we're using {"key1": "value1", "key2": "value2"} or ["value1", "value2"]. While there is the theoretical risk that the code adds different keys inder different circumstances, to my understanding this has not been an issue during the time this has been available (since last summer).

@john-bodley
Copy link
Member Author

@villebro could you think of an example when cache_key_wrapper would be used given we're wrapping numerous methods with this? I'm trying to determine whether we still need this method.

@john-bodley john-bodley changed the title [fix] Adding URL params to cache key if present [fix] Automatically adding many Jinja methods to cache key if present Apr 23, 2020
@villebro
Copy link
Member

@john-bodley I think any Callable passed to JINJA_CONTEXT_ADDONS in superset_config.py might require wrapping.

@john-bodley john-bodley changed the title [fix] Automatically adding many Jinja methods to cache key if present [fix] Automatically adding relevant Jinja methods to cache key if present Apr 23, 2020
@john-bodley john-bodley changed the title [fix] Automatically adding relevant Jinja methods to cache key if present [fix] Automatically add relevant Jinja methods to cache key if present Apr 23, 2020
@john-bodley john-bodley force-pushed the john-bodley--add-url-params-to-cache-key branch from 175ad04 to 81268b2 Compare April 23, 2020 23:37
@john-bodley
Copy link
Member Author

@villebro thanks for the feedback. I believe this is now ready for re-review.

@john-bodley john-bodley force-pushed the john-bodley--add-url-params-to-cache-key branch 2 times, most recently from e3684e0 to 36df720 Compare April 24, 2020 01:08
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! 👍 If you could also add a note in UPDATING.md to let users know this is now much simpler, ie. builtin Jinja functions don't need to be wrapped anymore, this is ready to go.

@john-bodley john-bodley force-pushed the john-bodley--add-url-params-to-cache-key branch from 36df720 to 9006bb3 Compare April 24, 2020 06:37
@john-bodley
Copy link
Member Author

@villebro I updated UPDATING.md per your suggestion.

@john-bodley john-bodley merged commit 955a4fe into apache:master Apr 24, 2020
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.37.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 0.37.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants