-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter Box Caching Incorrectly for Multi-Query Use Case #7666
Comments
Issue-Label Bot is automatically applying the label Links: app homepage, dashboard and code for this bot. |
Took a quick 👀 at the code and things look ok. |
We're looking to upgrade to latest in next few weeks, I will be checking it then and will reply back. |
I have upgraded to 0.33.rc1 in my test environment and should be able to test today |
@mistercrunch I have verified the same issue exists, this bug seriously impacts performance when dashboards use filter boxes with multiple queries. |
I can take a look at this. |
Hmm, I was unable to replicate the bug with three filters on master. Can you rebuild a Filter Box from scratch on a recent build and verify that the same bug still persists? |
Sure, I reproduced it on v33.0. I have a filter box with 8 filters then i have 5 charts on the same dashboard. I will get you logs shortly. I wonder if you need more filters to reproduce it... |
Here are 3 log files. 2nd-db-load-6fromcache-7fromsource.txt 3rd-db-load-7fromcache-6fromsource.txt If I was to continue then the next load would serve 8 from cache. Again I have a dashboard with 8 filter queries and 5 charts. I believe it is the filter box queries generating different cache keys instead of same cache key. I am not setting any filter values just reloading the db. |
Thanks @smokemonster99 will take a look. |
Ok I think I see the problem. You've defined your filter box to for the last two hours without time grain. This gets baked into the underlying query, which is in practice accurate to the second, hence the once per second changing cache key. The time filter and granularity should answer the question: for what time period and to which granularity should the retrieved values in the dropdowns be set? Default values are a dashboard property, and are set there (you choose the values, then save the dashboard, after which the chosen filters are the defaults). |
On second thought that's not it. If you have a dev box where you can easily change the code, adding some extra logging in the following place and collecting a new round of logs should show exactly how those keys are generated: diff --git a/superset/viz.py b/superset/viz.py
index efbf892c..49989c67 100644
--- a/superset/viz.py
+++ b/superset/viz.py
@@ -357,6 +357,7 @@ class BaseViz(object):
cache_dict['time_range'] = self.form_data.get('time_range')
cache_dict['datasource'] = self.datasource.uid
json_data = self.json_dumps(cache_dict, sort_keys=True)
+ logging.info(json_data)
return hashlib.md5(json_data.encode('utf-8')).hexdigest()
def get_payload(self, query_obj=None): |
v33-filterbox-cache-issuewithdebug.txt I made the change and performed same test. It is all in 1 file but search for 'NOTE' to see my notes. As I noticed before I see multiple queries (SELECT statements) in a single cache key which might point to the issue in some way, not sure. To save you time, you can focus on cache key 6910e9e73779023aa4dbcd4e23138aec which got generated on the first reload. |
I'm trying to understand why there are |
Hi @villebro do you need me to perform any further tests on this? What is next step? Thanks for looking at this! |
I noticed the cache keys with prequeries were the ones with later cache misses. I removed 'prequeries' from the cache key inside method cache_key using "del cache_dict['prequeries']". Now I get the desired behavior caching wise...I get 13 cache hits on first reload (and no cache misses) But I am not sure how or why prequeries are getting set and if this hacky change could cause issues with caching of non-filter box charts. Thoughts? |
Hmm, I am not familiar with prequeries, and I'm having a very hard time understanding how they are created in this case. But tit seems like the query for the first dropdown is being added as a prequery to some of the following dropdowns' |
I think I was able to replicate, will investigate. |
Very interesting regression introduced by #4163. I have a working fix, will try to put through a PR soon. |
A filter box that generates multiple queries (multiple filter controls) does not generate the correct cache entries for each filter control. It appears that only 1 (the first) query gets cached correctly. A correct cache entry should have a single query and associated results. What I see for most filter box cache entries are multiple queries from the filter box. So when the filter box gets reloaded, it only finds one of the cached queries and has cache misses for the rest. In each successive reload, the cache misses decrease by one. I suspect only the first query is generating the correct cache entry.
I am thinking the issue might be related to 'class FilterBoxViz' and/or 'run_extra_queries'
Expected results
If I have a filter box with 16 filter controls, then a reload of the page within the cache TTL should generate 16 cache hits.
Actual results
Only 1 cache hit and 15 cache misses. I see 15 more logs like
2019-06-06 14:09:09,893:INFO:root:Caching 1180 chars at key c4dceda1ab0aef7047ef06ebb6063d88
Screenshots
Here is my filter box (16 controls)
How to reproduce the bug
Set logging to INFO. I am using REDIS cache but probably any cache would do.
Caching xxxx chars at key yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
Environment
(please complete the following information):
Checklist
Make sure these boxes are checked before submitting your issue - thank you!
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: