-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: cache dashboard bootstrap data #11234
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11234 +/- ##
==========================================
- Coverage 61.47% 59.05% -2.42%
==========================================
Files 832 797 -35
Lines 39445 38232 -1213
Branches 3598 3396 -202
==========================================
- Hits 24248 22579 -1669
- Misses 15015 15471 +456
Partials 182 182
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
superset/connectors/base/models.py
Outdated
if is_feature_enabled("DASHBOARD_CACHE"): | ||
from superset.models.dashboard import Dashboard | ||
|
||
Dashboard.clear_cache_for_datasource(datasource_id=self.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to add this to SQLAlchemy events, just like what we did with slices, but it was somehow triggered multiple times (4 or 5), which could have performance risks when a datasource is used in multiple dashboards.
So I moved it to this BaseDatasource
API instead. The drawback is it may not trigger if the datasource is updated from FAB CRUD view (an unlikely use case).
@@ -131,7 +140,7 @@ class Dashboard( # pylint: disable=too-many-instance-attributes | |||
css = Column(Text) | |||
json_metadata = Column(Text) | |||
slug = Column(String(255), unique=True) | |||
slices = relationship("Slice", secondary=dashboard_slices, backref="dashboards") | |||
slices = relationship(Slice, secondary=dashboard_slices, backref="dashboards") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove quotes for consistency.
@@ -145,7 +154,7 @@ class Dashboard( # pylint: disable=too-many-instance-attributes | |||
] | |||
|
|||
def __repr__(self) -> str: | |||
return self.dashboard_title or str(self.id) | |||
return f"Dashboard<{self.slug or self.id}>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__repr__
is used in cache key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -44,6 +45,7 @@ def __init__(self, user: User, model_id: int): | |||
def run(self) -> Model: | |||
self.validate() | |||
try: | |||
Dashboard.clear_cache_for_slice(slice_id=self._model_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't use before_delete
event for Slice
because the association record in the relationship table datasource_slice
is deleted before the Slice
object is deleted.
@@ -108,7 +108,7 @@ def set_dash_metadata( | |||
and obj["meta"]["chartId"] | |||
): | |||
chart_id = obj["meta"]["chartId"] | |||
obj["meta"]["uuid"] = uuid_map[chart_id] | |||
obj["meta"]["uuid"] = uuid_map.get(chart_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will throw 500 errors when chart don't have uuid, which could happen when chart is deleted but the dashboard cache is not purged for some reason.
@@ -50,6 +50,7 @@ | |||
SQL_MAX_ROW = 666 | |||
SQLLAB_CTAS_NO_LIMIT = True # SQL_MAX_ROW will not take affect for the CTA queries | |||
FEATURE_FLAGS = { | |||
**FEATURE_FLAGS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we can override the configs from SUPERSET_CONFIG_PATH
(I've set it to ~/.superset/config.py
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this do anything? FEATURE_FLAGS
is empty by default in the config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But SUPERSET_CONFIG_PATH
(a config file) is loaded before SUPERSET_CONFIG
(a Python module). This allows users to override the feature flags in SUPERSET_CONFIG_PATH
(a file not tracked in Git) when starting superset like this:
SUPERSET_CONFIG=tests.superset_test_config superset run
ebe41c8
to
ee36591
Compare
@@ -14,4 +14,3 @@ | |||
# KIND, either express or implied. See the License for the | |||
# specific language governing permissions and limitations | |||
# under the License. | |||
from . import models, views |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up to break circular imports, because I want to import DruidMetric
and DruidColumn
from dashboard.py
.
@@ -719,8 +718,5 @@ class FavStar(Model): # pylint: disable=too-few-public-methods | |||
|
|||
# events for updating tags | |||
if is_feature_enabled("TAGGING_SYSTEM"): | |||
sqla.event.listen(Dashboard, "after_insert", DashboardUpdater.after_insert) | |||
sqla.event.listen(Dashboard, "after_update", DashboardUpdater.after_update) | |||
sqla.event.listen(Dashboard, "after_delete", DashboardUpdater.after_delete) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a duplicate of https://github.com/apache/incubator-superset/blob/93fdf1d64465b3378de65fb1b058e1a04742a70f/superset/models/dashboard.py#L573
cc @suddjian
Removed so that it's possible to import models.core
from dashboard.py
. It makes more sense to let Dashboard
depends on core
(if needed) than vice versa.
sqla.event.listen(Dashboard, "after_update", event_after_dashboard_changed) | ||
update_thumbnail: OnDashboardChange = lambda _, __, target: target.update_thumbnail() | ||
sqla.event.listen(Dashboard, "after_insert", update_thumbnail) | ||
sqla.event.listen(Dashboard, "after_update", update_thumbnail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some refactor for consistency.
4ba6fc2
to
1183dde
Compare
@@ -1040,7 +1038,7 @@ def copy_dash( # pylint: disable=no-self-use | |||
for value in data["positions"].values(): | |||
if isinstance(value, dict) and value.get("meta", {}).get("chartId"): | |||
old_id = value["meta"]["chartId"] | |||
new_id = old_to_new_slice_ids[old_id] | |||
new_id = old_to_new_slice_ids.get(old_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes a 500 error when duplicating a dashboard with deleted slices.
96590c2
to
19c6395
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add unit tests for the new decorators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
d78ab7c
to
325e5b8
Compare
325e5b8
to
7939b0a
Compare
Added some basic tests. |
ca38c89
to
985fdb0
Compare
SUMMARY
Large and complex dashboards can be very slow to load because they need to read all the related slices and datasources and they all have go through SQL queries with no caching enabled.
A previous attempt to cache the rendered dashboard page had to be reverted because it did not consider user-specific data on that page. Why don't we cache the non-user-specific bootstrap data instead?
This PR extracts the data building logics from the
/dashboard/:id_or_slug
Flask view to theDashboard
model and uses Flack caching decorator to manage the cache. Cache will be cleaned up when any one of the dashboard, the associated slices, datasources, or table columns/metrics is updated.When tested with a dashboard of 300+ slices in our staging environment, this change can save up to more than 3 seconds of page load time.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TEST PLAN
Make sure the dashboard page still work as expected.
ADDITIONAL INFORMATION