
Filter Box Caching Incorrectly for Multi-Query Use Case #7666

Closed
2 tasks done
smokemonster99 opened this issue Jun 6, 2019 · 19 comments · Fixed by #7896
Assignees
villebro
Labels
!deprecated-label:bug Deprecated label - Use #bug instead

Comments

@smokemonster99

A filter box that generates multiple queries (one per filter control) does not generate the correct cache entries for each filter control. It appears that only the first query gets cached correctly. A correct cache entry should hold a single query and its associated results, but most filter box cache entries contain multiple queries from the filter box. When the filter box is reloaded, it therefore finds only one of the cached queries and has cache misses for the rest; on each successive reload, the cache misses decrease by one. I suspect only the first query is generating a correct cache entry.

I suspect the issue is related to 'class FilterBoxViz' and/or 'run_extra_queries'.
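
To make the expectation concrete, here is a minimal sketch of the caching behavior I would expect (a generic illustration, not Superset's actual code):

    import hashlib
    import json

    def cache_key_for_query(query_dict):
        # One cache entry per query: hash a stable serialization of the
        # query so identical queries always map to identical keys.
        payload = json.dumps(query_dict, sort_keys=True)
        return hashlib.md5(payload.encode('utf-8')).hexdigest()

    # A filter box with N controls should produce N distinct, stable keys.
    controls = [{'groupby': ['country']}, {'groupby': ['region']}]
    keys = [cache_key_for_query(q) for q in controls]
    assert len(set(keys)) == len(controls)  # one key per control
    assert keys == [cache_key_for_query(q) for q in controls]  # stable across reloads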

Expected results

If I have a filter box with 16 filter controls, then a reload of the page within the cache TTL should generate 16 cache hits.

Actual results

Only 1 cache hit and 15 cache misses. I see 15 more log lines like:
2019-06-06 14:09:09,893:INFO:root:Caching 1180 chars at key c4dceda1ab0aef7047ef06ebb6063d88

Screenshots

Here is my filter box (16 controls)
[Screenshot: Screen Shot 2019-06-06 at 9 59 17 AM]

How to reproduce the bug

Set logging to INFO. I am using a Redis cache, but probably any cache backend would do.

  1. Load a dashboard containing a filter box with more than 1 filter control. Note occurrences of
    Caching xxxx chars at key yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
  2. Reload the page and note occurrences of "Caching abcd chars at key ..." and occurrences of "Serving from cache".
  3. Note that there is only 1 cache hit for the filter box and that Superset generated different cache keys than it did in step 1.
  4. Inspect the cache for the cache keys observed in step 1 and note that they contain multiple queries; they should contain a single query (see the inspection sketch after this list).
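
For step 4, a minimal inspection sketch, assuming direct access to the Redis instance backing the cache (the key comes from a 'Caching ... at key ...' log line; any key prefix, serialization, or compression depends on your CACHE_CONFIG):

    import redis

    r = redis.Redis(host='localhost', port=6379)
    # Key taken from a 'Caching 1180 chars at key ...' log line; your
    # cache config may prepend a prefix to it.
    raw = r.get('c4dceda1ab0aef7047ef06ebb6063d88')
    if raw is None:
        print('cache miss')
    else:
        # The payload may be pickled/compressed depending on the cache
        # backend; counting b'SELECT' occurrences is a crude way to spot
        # entries that hold more than one query.
        print(len(raw), raw.count(b'SELECT'))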

Environment

(please complete the following information):

  • superset version: 0.26.3
  • python version: 2.7
  • node.js version: not installed
  • npm version: not installed

Checklist

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included them here as text if there are any.
  • [x] I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.


@issue-label-bot

Issue-Label Bot is automatically applying the label #bug to this issue, with a confidence of 0.95.

issue-label-bot added the !deprecated-label:bug label on Jun 6, 2019
@mistercrunch
Member

Took a quick 👀 at the code and things look OK. 0.26.3 is old; I'm guessing this is addressed on master. Can you or someone else confirm or deny on master?

@smokemonster99
Author

We're looking to upgrade to the latest release in the next few weeks; I will check it then and reply back.

@smokemonster99
Author

I have upgraded to 0.33.rc1 in my test environment and should be able to test today.

@smokemonster99
Author

smokemonster99 commented Jun 21, 2019

@mistercrunch I have verified that the same issue exists. This bug seriously impacts performance when dashboards use filter boxes with multiple queries.

@villebro
Member

I can take a look at this.

@villebro villebro self-assigned this Jun 23, 2019
@villebro
Member

Hmm, I was unable to replicate the bug with three filters on master. Can you rebuild a Filter Box from scratch on a recent build and verify that the same bug still persists?

@smokemonster99
Author

Sure, I reproduced it on v0.33.0. I have a filter box with 8 filters and 5 charts on the same dashboard. I will get you logs shortly. I wonder if more filters are needed to reproduce it...

@smokemonster99
Author

Here are 3 log files.

first-db-load-zero-loaded-from-cache.txt
This one contains logs from a fresh load of the dashboard with no items in the cache. You see 13 'loaded_from_source' entries, as expected.

2nd-db-load-6fromcache-7fromsource.txt
This file covers the next dashboard load; here you see 6 items loaded from cache and 7 from source. All 13 items should have loaded from cache.

3rd-db-load-7fromcache-6fromsource.txt
On this next dashboard load you see 7 items loaded from cache and 6 from source.

If I were to continue, the next load would serve 8 from cache. Again, I have a dashboard with 8 filter queries and 5 charts. I believe the filter box queries are generating different cache keys instead of the same cache key. I am not setting any filter values, just reloading the dashboard.

Dashboard image
[Screenshot: Screen Shot 2019-06-26 at 4 27 07 PM]

@villebro
Member

Thanks @smokemonster99, will take a look.

@villebro
Member

Ok, I think I see the problem. You've defined your filter box to fetch values for the last two hours, without a time grain. This gets baked into the underlying query, which is in practice accurate to the second, hence the cache key changing once per second. The time filter and granularity should answer the question: for what time period and at which granularity should the values shown in the dropdowns be retrieved? Default values are a dashboard property and are set there (you choose the values, then save the dashboard, after which the chosen filters are the defaults).
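
To illustrate (a generic sketch, not Superset's actual time-parsing code): a relative range like "last 2 hours" resolves to concrete timestamps at request time, so the serialized range, and anything hashed from it, changes every second:

    from datetime import datetime, timedelta

    def resolve_time_range():
        # 'Last 2 hours', resolved at request time, accurate to the second.
        now = datetime.now().replace(microsecond=0)
        return ((now - timedelta(hours=2)).isoformat(), now.isoformat())

    # Two reloads a second apart resolve to different tuples, so a cache
    # key derived from the resolved range differs on every reload.
    print(resolve_time_range())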

@villebro
Member

On second thought, that's not it. If you have a dev box where you can easily change the code, adding some extra logging in the following place and collecting a new round of logs should show exactly how those keys are generated:

diff --git a/superset/viz.py b/superset/viz.py
index efbf892c..49989c67 100644
--- a/superset/viz.py
+++ b/superset/viz.py
@@ -357,6 +357,7 @@ class BaseViz(object):
         cache_dict['time_range'] = self.form_data.get('time_range')
         cache_dict['datasource'] = self.datasource.uid
         json_data = self.json_dumps(cache_dict, sort_keys=True)
+        logging.info(json_data)
         return hashlib.md5(json_data.encode('utf-8')).hexdigest()
 
     def get_payload(self, query_obj=None):
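
For context, the key is the md5 of the sorted-key JSON of the cache dict, so any field that differs between otherwise identical loads, such as a 'prequeries' list, yields a different key. A standalone sketch with made-up dict contents:

    import hashlib
    import json

    def key_of(cache_dict):
        payload = json.dumps(cache_dict, sort_keys=True)
        return hashlib.md5(payload.encode('utf-8')).hexdigest()

    base = {'datasource': '1__table', 'groupby': ['country']}
    # The same logical query with a stray prequery hashes differently.
    polluted = dict(base, prequeries=['SELECT country FROM t LIMIT 1000'])
    print(key_of(base) == key_of(polluted))  # False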

@smokemonster99
Author

v33-filterbox-cache-issuewithdebug.txt

I made the change and performed the same test. It is all in one file; search for 'NOTE' to see my notes. As I noticed before, I see multiple queries (SELECT statements) under a single cache key, which might point to the issue in some way, not sure.

To save you time, you can focus on cache key 6910e9e73779023aa4dbcd4e23138aec, which was generated on the first reload.
Thanks!

@mistercrunch
Member

I'm trying to understand why there are prequeries here; that seems wrong.

@smokemonster99
Author

Hi @villebro, do you need me to perform any further tests on this? What is the next step? Thanks for looking at this!

@smokemonster99
Author

I noticed the cache keys with prequeries were the ones with later cache misses. I removed 'prequeries' from the cache key inside the cache_key method using "del cache_dict['prequeries']". Now I get the desired caching behavior: 13 cache hits on the first reload (and no cache misses).

But I am not sure how or why prequeries are getting set, or whether this hacky change could cause issues with caching of non-filter-box charts. Thoughts?
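
For reference, the change in context would look roughly like this (the cache_key method from the diff above; lines outside the shown diff context are approximate, and pop with a default is a slightly safer spelling of the same del):

    import copy
    import hashlib

    def cache_key(self, query_obj):
        cache_dict = copy.copy(query_obj)  # approximate; not shown in the diff
        # Hack: drop prequeries so filter box queries that accidentally
        # carry one still hash to the same key on every load.
        cache_dict.pop('prequeries', None)
        cache_dict['time_range'] = self.form_data.get('time_range')
        cache_dict['datasource'] = self.datasource.uid
        json_data = self.json_dumps(cache_dict, sort_keys=True)
        return hashlib.md5(json_data.encode('utf-8')).hexdigest()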

@villebro
Member

Hmm, I am not familiar with prequeries, and I'm having a very hard time understanding how they are created in this case. But it seems like the query for the first dropdown is being added as a prequery to some of the following dropdowns' query_objs. I will keep looking, but if anyone can add context, that would be helpful.
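
If so, the pattern would be the classic shared-mutable-state bug, along these lines (a generic illustration, not the actual Superset code):

    # One prequeries list accidentally shared across filter controls.
    shared_query_obj = {'prequeries': []}

    def run_control_query(query_obj, sql):
        # Appending to the shared list leaks this control's SQL into
        # every later control's query_obj, and thus into its cache key.
        query_obj['prequeries'].append(sql)
        return dict(query_obj)

    first = run_control_query(shared_query_obj, 'SELECT country ...')
    second = run_control_query(shared_query_obj, 'SELECT region ...')
    print(second['prequeries'])  # both SELECTs: the first one leaked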

@villebro
Member

I think I was able to replicate, will investigate.

@villebro
Member

Very interesting regression introduced by #4163. I have a working fix and will try to put up a PR soon.
