
Filter Box Caching Incorrectly for Multi-Query Use Case #7666

Closed
2 tasks done
smokemonster99 opened this issue Jun 6, 2019 · 19 comments · Fixed by #7896
Assignees
villebro
Labels
!deprecated-label:bug Deprecated label - Use #bug instead

Comments

@smokemonster99

A filter box that generates multiple queries (one per filter control) does not generate the correct cache entries for each filter control. It appears that only the first query gets cached correctly. A correct cache entry should hold a single query and its associated results, but most filter box cache entries contain multiple queries from the filter box. When the filter box is reloaded, it therefore finds only one of the cached queries and has cache misses for the rest; on each successive reload, the cache misses decrease by one. I suspect only the first query is generating a correct cache entry.

I suspect the issue is related to 'class FilterBoxViz' and/or 'run_extra_queries'.
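
To make the expectation concrete, here is a minimal sketch of the caching behavior I would expect (a generic illustration, not Superset's actual code):

    import hashlib
    import json

    def cache_key_for_query(query_dict):
        # One cache entry per query: hash a stable serialization of the
        # query so identical queries always map to identical keys.
        payload = json.dumps(query_dict, sort_keys=True)
        return hashlib.md5(payload.encode('utf-8')).hexdigest()

    # A filter box with N controls should produce N distinct, stable keys.
    controls = [{'groupby': ['country']}, {'groupby': ['region']}]
    keys = [cache_key_for_query(q) for q in controls]
    assert len(set(keys)) == len(controls)  # one key per control
    assert keys == [cache_key_for_query(q) for q in controls]  # stable across reloads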

Expected results

If I have a filter box with 16 filter controls, then a reload of the page within the cache TTL should generate 16 cache hits.

Actual results

Only 1 cache hit and 15 cache misses. I see 15 more log lines like:
2019-06-06 14:09:09,893:INFO:root:Caching 1180 chars at key c4dceda1ab0aef7047ef06ebb6063d88

Screenshots

Here is my filter box (16 controls)
[Screenshot: Screen Shot 2019-06-06 at 9 59 17 AM]

How to reproduce the bug

Set logging to INFO. I am using a Redis cache, but probably any cache backend would do.

  1. Load a dashboard containing a filter box with more than 1 filter control. Note occurrences of
    Caching xxxx chars at key yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
  2. Reload the page and note occurrences of "Caching abcd chars at key ..." and occurrences of "Serving from cache".
  3. Note that there is only 1 cache hit for the filter box and that Superset generated different cache keys than it did in step 1.
  4. Inspect the cache for the cache keys observed in step 1 and note that they contain multiple queries; they should contain a single query (see the inspection sketch after this list).
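
For step 4, a minimal inspection sketch, assuming direct access to the Redis instance backing the cache (the key comes from a 'Caching ... at key ...' log line; any key prefix, serialization, or compression depends on your CACHE_CONFIG):

    import redis

    r = redis.Redis(host='localhost', port=6379)
    # Key taken from a 'Caching 1180 chars at key ...' log line; your
    # cache config may prepend a prefix to it.
    raw = r.get('c4dceda1ab0aef7047ef06ebb6063d88')
    if raw is None:
        print('cache miss')
    else:
        # The payload may be pickled/compressed depending on the cache
        # backend; counting b'SELECT' occurrences is a crude way to spot
        # entries that hold more than one query.
        print(len(raw), raw.count(b'SELECT'))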

Environment

(please complete the following information):

  • superset version: 0.26.3
  • python version: 2.7
  • node.js version: not installed
  • npm version: not installed

Checklist

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included them here as text if there are any.
  • [x] I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.


@issue-label-bot

Issue-Label Bot is automatically applying the label #bug to this issue, with a confidence of 0.95.

issue-label-bot added the !deprecated-label:bug label on Jun 6, 2019
@mistercrunch
Member

Took a quick 👀 at the code and things look OK. 0.26.3 is old; I'm guessing this is addressed on master. Can you or someone else confirm or deny on master?

@smokemonster99
Author

We're looking to upgrade to the latest release in the next few weeks; I will check it then and reply back.

@smokemonster99
Author

I have upgraded to 0.33.rc1 in my test environment and should be able to test today.

@smokemonster99
Author

smokemonster99 commented Jun 21, 2019

@mistercrunch I have verified that the same issue exists. This bug seriously impacts performance when dashboards use filter boxes with multiple queries.

@villebro
Member

I can take a look at this.

@villebro villebro self-assigned this Jun 23, 2019
@villebro
Member

Hmm, I was unable to replicate the bug with three filters on master. Can you rebuild a Filter Box from scratch on a recent build and verify that the same bug still persists?

@smokemonster99
Author

Sure, I reproduced it on v0.33.0. I have a filter box with 8 filters and 5 charts on the same dashboard. I will get you logs shortly. I wonder if more filters are needed to reproduce it...

@smokemonster99
Author

Here are 3 log files.

first-db-load-zero-loaded-from-cache.txt
This one contains logs from a fresh load of the dashboard with no items in the cache. You see 13 'loaded_from_source' entries, as expected.

2nd-db-load-6fromcache-7fromsource.txt
This file covers the next dashboard load; here you see 6 items loaded from cache and 7 from source. All 13 items should have loaded from cache.

3rd-db-load-7fromcache-6fromsource.txt
On this next dashboard load you see 7 items loaded from cache and 6 from source.

If I were to continue, the next load would serve 8 from cache. Again, I have a dashboard with 8 filter queries and 5 charts. I believe the filter box queries are generating different cache keys instead of the same cache key. I am not setting any filter values, just reloading the dashboard.

Dashboard image
[Screenshot: Screen Shot 2019-06-26 at 4 27 07 PM]

@villebro
Member

Thanks @smokemonster99, will take a look.

@villebro
Member

Ok, I think I see the problem. You've defined your filter box to fetch values for the last two hours, without a time grain. This gets baked into the underlying query, which is in practice accurate to the second, hence the cache key changing once per second. The time filter and granularity should answer the question: for what time period and at which granularity should the values shown in the dropdowns be retrieved? Default values are a dashboard property and are set there (you choose the values, then save the dashboard, after which the chosen filters are the defaults).
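
To illustrate (a generic sketch, not Superset's actual time-parsing code): a relative range like "last 2 hours" resolves to concrete timestamps at request time, so the serialized range, and anything hashed from it, changes every second:

    from datetime import datetime, timedelta

    def resolve_time_range():
        # 'Last 2 hours', resolved at request time, accurate to the second.
        now = datetime.now().replace(microsecond=0)
        return ((now - timedelta(hours=2)).isoformat(), now.isoformat())

    # Two reloads a second apart resolve to different tuples, so a cache
    # key derived from the resolved range differs on every reload.
    print(resolve_time_range())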

@villebro
Member

On second thought, that's not it. If you have a dev box where you can easily change the code, adding some extra logging in the following place and collecting a new round of logs should show exactly how those keys are generated:

diff --git a/superset/viz.py b/superset/viz.py
index efbf892c..49989c67 100644
--- a/superset/viz.py
+++ b/superset/viz.py
@@ -357,6 +357,7 @@ class BaseViz(object):
         cache_dict['time_range'] = self.form_data.get('time_range')
         cache_dict['datasource'] = self.datasource.uid
         json_data = self.json_dumps(cache_dict, sort_keys=True)
+        logging.info(json_data)
         return hashlib.md5(json_data.encode('utf-8')).hexdigest()
 
     def get_payload(self, query_obj=None):
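
For context, the key is the md5 of the sorted-key JSON of the cache dict, so any field that differs between otherwise identical loads, such as a 'prequeries' list, yields a different key. A standalone sketch with made-up dict contents:

    import hashlib
    import json

    def key_of(cache_dict):
        payload = json.dumps(cache_dict, sort_keys=True)
        return hashlib.md5(payload.encode('utf-8')).hexdigest()

    base = {'datasource': '1__table', 'groupby': ['country']}
    # The same logical query with a stray prequery hashes differently.
    polluted = dict(base, prequeries=['SELECT country FROM t LIMIT 1000'])
    print(key_of(base) == key_of(polluted))  # False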

@smokemonster99
Author

v33-filterbox-cache-issuewithdebug.txt

I made the change and performed the same test. It is all in one file; search for 'NOTE' to see my notes. As I noticed before, I see multiple queries (SELECT statements) under a single cache key, which might point to the issue in some way, not sure.

To save you time, you can focus on cache key 6910e9e73779023aa4dbcd4e23138aec, which was generated on the first reload.
Thanks!

@mistercrunch
Member

I'm trying to understand why there are prequeries here; that seems wrong.

@smokemonster99
Author

Hi @villebro, do you need me to perform any further tests on this? What is the next step? Thanks for looking at this!

@smokemonster99
Author

I noticed the cache keys with prequeries were the ones with later cache misses. I removed 'prequeries' from the cache key inside the cache_key method using "del cache_dict['prequeries']". Now I get the desired caching behavior: 13 cache hits on the first reload (and no cache misses).

But I am not sure how or why prequeries are getting set, or whether this hacky change could cause issues with caching of non-filter-box charts. Thoughts?
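
For reference, the change in context would look roughly like this (the cache_key method from the diff above; lines outside the shown diff context are approximate, and pop with a default is a slightly safer spelling of the same del):

    import copy
    import hashlib

    def cache_key(self, query_obj):
        cache_dict = copy.copy(query_obj)  # approximate; not shown in the diff
        # Hack: drop prequeries so filter box queries that accidentally
        # carry one still hash to the same key on every load.
        cache_dict.pop('prequeries', None)
        cache_dict['time_range'] = self.form_data.get('time_range')
        cache_dict['datasource'] = self.datasource.uid
        json_data = self.json_dumps(cache_dict, sort_keys=True)
        return hashlib.md5(json_data.encode('utf-8')).hexdigest()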

@villebro
Member

Hmm, I am not familiar with prequeries, and I'm having a very hard time understanding how they are created in this case. But it seems like the query for the first dropdown is being added as a prequery to some of the following dropdowns' query_objs. I will keep looking, but if anyone can add context, that would be helpful.
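
If so, the pattern would be the classic shared-mutable-state bug, along these lines (a generic illustration, not the actual Superset code):

    # One prequeries list accidentally shared across filter controls.
    shared_query_obj = {'prequeries': []}

    def run_control_query(query_obj, sql):
        # Appending to the shared list leaks this control's SQL into
        # every later control's query_obj, and thus into its cache key.
        query_obj['prequeries'].append(sql)
        return dict(query_obj)

    first = run_control_query(shared_query_obj, 'SELECT country ...')
    second = run_control_query(shared_query_obj, 'SELECT region ...')
    print(second['prequeries'])  # both SELECTs: the first one leaked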

@villebro
Member

I think I was able to replicate, will investigate.

@villebro
Member

Very interesting regression introduced by #4163. I have a working fix and will try to put up a PR soon.
