-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add hook for dataset health check #11970
feat: add hook for dataset health check #11970
Conversation
@@ -139,6 +139,10 @@ class ControlPanelsContainer extends React.Component { | |||
controls[s].validationErrors.length > 0, | |||
), | |||
); | |||
if (exploreState.dataset_health_check_message) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can populate warning message from here, or from superset-ui-chart-controls
:
https://github.com/apache-superset/superset-ui/blob/1f8a700b953efbed81d2c9970b514d7fe031ad2e/packages/superset-ui-chart-controls/src/shared-controls/index.tsx#L166
but it will need to create another PR in another code base.@ktmud do you have preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This component ControlPanelsContainer
is generic and should stay generic. It should not contain logics related to any specific control field.
What if users change datasource or update datasource to point to a physical datasource in the Edit Datasource modal? |
Codecov Report
@@ Coverage Diff @@
## master #11970 +/- ##
==========================================
- Coverage 66.26% 60.24% -6.02%
==========================================
Files 941 912 -29
Lines 45715 45009 -706
Branches 4395 4061 -334
==========================================
- Hits 30293 27116 -3177
- Misses 15287 17893 +2606
+ Partials 135 0 -135
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@@ -197,6 +197,7 @@ export const controls = { | |||
label: t('Dataset'), | |||
default: null, | |||
description: null, | |||
healthCheckMessage: null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe most if not all control configs in this file are not in use any more. We should clean them up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. this line has no use :(
I don't offer any rule in the PR. the goal for this PR is to offer a hook in the codebase. Each company define their own rules. |
I mean does the error message update when users switch datasources? It seems the check only happens at the first page load. |
No. save dataset data is on a different api. This health check will only happen when opening explore view. |
5f0a0bf
to
5f4f2cc
Compare
5f4f2cc
to
88a96b3
Compare
superset/connectors/sqla/models.py
Outdated
@@ -1467,6 +1469,26 @@ class and any keys added via `ExtraCache`. | |||
extra_cache_keys += sqla_query.extra_cache_keys | |||
return extra_cache_keys | |||
|
|||
@event_logger.log_this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can i add event log this way? want to monitor the duration for health check logic, in case it may affect explore view perf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you care most is the actual check function, the part where we check version and skip the health check (happens for most page views) is known to be pretty trivial. So maybe use this code instead:
with event_logger.log_context(action="datasource_health_check") as log:
message = check(self)
...
6b5315f
to
04309c5
Compare
04309c5
to
420adcc
Compare
@@ -698,6 +703,7 @@ def data(self) -> Dict[str, Any]: | |||
data_["fetch_values_predicate"] = self.fetch_values_predicate | |||
data_["template_params"] = self.template_params | |||
data_["is_sqllab_view"] = self.is_sqllab_view | |||
data_["health_check_message"] = self.health_check_message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of output whole "extra", i saw certified data only offer flattened information. I think it's easier for frontend usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the iterations! One blocking comment (you should set health_check
even when check returns None), otherwise LGTM.
I'm also curious to hear how others think about this feature, though.
@junlincc @mistercrunch do you think this align well with overall Superset roadmap?
superset/views/core.py
Outdated
@@ -650,6 +650,10 @@ def explore( # pylint: disable=too-many-locals,too-many-return-statements | |||
f"datasource_id={datasource_id}&" | |||
) | |||
|
|||
# if feature enabled, run some health check rules for sqla datasource | |||
if hasattr(datasource, "health_check") and conf["DATASET_HEALTH_CHECK"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can skip the conf check here. It will be immediately checked by datasource.health_check()
anyway.
superset/connectors/sqla/models.py
Outdated
extra["health_check"] = { | ||
"version": check.version, | ||
"message": message, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to set the health_check
field even if check
message returns None
or empty. Because otherwise this will run for every "good" datasources that didn't have any health check warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also recommend making the message more structured for scalability (i.e. let DATASET_HEALTH_CHECK return a dict instead of a string). Because in the future, you may need to localize the warning message, then it would be better to store an ERROR_CODE + some metadata instead of just a message string. This also opens the door for more actionable warning message where users may click on the warning icon and open a popup for more details.
Maybe we can even define it as a new SupersetError type and use @etr2460 's error message registry to render it.
But of course, this could also be reserved for future refactor/extension. (We can always update the health check version when that happens!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not as strong as SupesetError. Error and Warning should be treated differently.
Let's keep this PR as simple as it is, and we can reiterate when it is really needed. The impact of this feature is still unknown to me, because it depends a lot on the rule implementation. Let's run the core function with some real examples before putting too much efforts on extras.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agreed. We don't need to go that route if it's too much work. Just laying out my thoughts here in case we want to double down on this feature. Even if the message depends on specific rule implementation, we can enforce a basic structure so that it's possible the override the message renders.
Error and Warning should be treated differently.
This I must humbly disagree. Both are actually a subclass of Alert
in most UI libraries, and we see in a lot of places the concept of "error levels" (error, warning, info). Most of the time, you'd see warnings and errors being rendered by the same component, just different colors or icons. And in case we want to render them differently, the shared infrastructure of custom renderer registry comes handy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. i will keep this work in mind. If we see this feature is useful, I will polish it with localization and other re-usable UI component.
ea3b23b
to
5146afa
Compare
it('should render health check message', () => { | ||
const wrapper = setup(); | ||
const icons = wrapper.find(Icon); | ||
expect(icons.first().prop('name')).toBe('alert-solid'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also make sure the warning message is shown here?
return ( | ||
<Styles className="DatasourceControl"> | ||
<ControlHeader {...this.props} /> | ||
<div> | ||
<div style={{ display: 'flex' }}> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of an inline style, should we use a styled.div
to apply this style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
# It's possible to add a dataset health check logic which is specific to your system. | ||
# It will get executed each time when user open a chart's explore view. | ||
DATASET_HEALTH_CHECK = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the signature about the health check to the comment here?
Or maybe even better, include a default function that always passes. It's actually a bit unclear to me how this should be used (should I return True if the dataset passes? None?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Jesse's comment above, if the dataset passes, it will still add a health_check
section into db record (so that not running check again).
@@ -650,6 +650,10 @@ def explore( # pylint: disable=too-many-locals,too-many-return-statements | |||
f"datasource_id={datasource_id}&" | |||
) | |||
|
|||
# if feature enabled, run some health check rules for sqla datasource | |||
if hasattr(datasource, "health_check"): | |||
datasource.health_check() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@graceguo-supercat why in this instance do we not require to force a check, i.e., datasource.health_check(force=True)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should run check again when user updates dataset, or when health check rule's version is changed. Otherwise the healthy status will not be changed.
This function is triggered at the time explore view is opened from browser, so it doesn't need to force run the check.
SUMMARY
Inside airbnb, we spent a lot of time on Superset dashboard and query performance issues. We create some guidelines for airbnb users, to help them write better queries to create virtual dataset. So we want to add a dataset health check logic when user explore a chart, and show some warning message if health check returns any message.
To decide whether a dataset is good or bad, this logic is probably very specific to each corporation. Some rules applied in airbnb may totally not relevant for Dropbox. So in the open source codebase, we only add an entry point,
DATASET_HEALTH_CHECK = _your function to check dataset quality_
which allow each company to implement and hook their own rules.
For example, after implements a rule that checks if the chart is using a virtual dataset:
I will see a warning message like this:
Note
The health_check result will be saved with datasource record
extra
column. When user update datasource, orDATASET_HEALTH_CHECK.version
is changed, health_check function will be triggered again and update the record in the database. Please see how we discussed different ideas for this feature.TEST PLAN
CI
ADDITIONAL INFORMATION
cc @ktmud @etr2460