chore: add extra field for Datasource control config #856
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/superset/superset-ui/lysq3pakw
Codecov Report
@@ Coverage Diff @@
## master #856 +/- ##
=======================================
Coverage 26.55% 26.55%
=======================================
Files 377 377
Lines 8178 8178
Branches 1117 1117
=======================================
Hits 2172 2172
Misses 5878 5878
Partials 128 128
Continue to review full report at Codecov.
This feels more like a state tied to a specific |
Good idea! I will follow this direction and add a fix.
return {
  datasource,
  healthCheckMessage: dataset_health_check_message,
Should this be an attribute of datasource? I'd imagine the same check may become available to the dataset API endpoint, too.
No. I feel datasource contains data that is stored in the database, while the health check message is an on-the-fly sanity check, and this message won't be persisted in the database.
It's a state associated with a specific datasource. It could be considered an extended read-only prop from the API's standpoint, similar to the various transformations we already do for order_by and verbose_map in datasource.data.
The check may even become an instance method on the BaseDatasource model so it's easier to access.
Imagine we need to access the same info for multiple datasources (e.g. to display the warning icon in the dataset CRUD list view): having the message attached to the datasource would make rendering much easier.
You suggest health_check_message could be part of datasource.data but not stored in the database?
Yes, that would be good because we can access it whenever we call datasource.data. But I do have a small concern: what if this health check becomes heavy and slow? So I think we should keep the sanity check away from the core data model and only trigger it when needed.
Yes, which is why it doesn't have to be inside the datasource.data property but could be an instance method and added to the API output in the view handlers. We can also change datasource.data to be a method instead of a property so each call can choose whether to run these checks or not.
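A rough sketch of that suggestion, for illustration only. The class below is a hypothetical stand-in, not Superset's actual BaseDatasource; the method name get_health_check_message, the include_health_check parameter, and the "no columns" rule are all assumptions made up for this example.

```python
from typing import Any, Dict, List, Optional


class BaseDatasource:
    """Hypothetical stand-in for the model discussed in this thread."""

    def __init__(self, name: str, columns: List[str]) -> None:
        self.name = name
        self.columns = columns

    def get_health_check_message(self) -> Optional[str]:
        # Placeholder rule invented for this sketch: flag empty datasources.
        if not self.columns:
            return f"Datasource {self.name} has no columns"
        return None

    def data(self, include_health_check: bool = False) -> Dict[str, Any]:
        # A method rather than a property, so each caller decides
        # whether to pay for the (possibly slow) health check.
        payload: Dict[str, Any] = {
            "name": self.name,
            "columns": list(self.columns),
        }
        if include_health_check:
            payload["health_check_message"] = self.get_health_check_message()
        return payload
```

With this shape, a list view could call data() without the check, while the explore view handler could call data(include_health_check=True) and pass the message through in the API output.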
Thanks for the new example, I understand your idea better. Is persisting to the db really necessary? If so, then we need to update the db record:
- when the rule changes: check the rule version, or
- when the datasource is updated: how do we make sure check_health gets called whenever the datasource is updated?
I just checked Airbnb's Airflow job: it updates extra periodically, which means that if you want to reuse the same field, the Airflow job and the sanity check may step on each other's toes.
It's not strictly necessary, but as you mentioned earlier, the health check could get expensive. In case we want to cache the results somewhere, this would be the way to do it.
When the datasource is updated, we can simply add a one-liner, datasource.check_health(commit=False, force=True), to the update command. Then we update check_health to allow a force option that skips the version check.
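One way check_health could look, as a sketch under stated assumptions. Only the call signature check_health(commit=False, force=True) comes from this thread. The RULE_VERSION constant, the "health_check" key inside extra, the placeholder rule, and the committed flag (standing in for a real database commit) are hypothetical.

```python
import json
from typing import List, Optional

RULE_VERSION = 2  # hypothetical: bump whenever the health check rules change


class Datasource:
    """Hypothetical stand-in model; extra holds a JSON string."""

    def __init__(self, name: str, columns: List[str], extra: str = "{}") -> None:
        self.name = name
        self.columns = columns
        self.extra = extra
        self.check_runs = 0      # counts how often the rules actually ran
        self.committed = False   # stand-in for a database session commit

    def _run_rules(self) -> Optional[str]:
        # Placeholder rule invented for this sketch.
        self.check_runs += 1
        return None if self.columns else f"{self.name} has no columns"

    def check_health(self, commit: bool = True, force: bool = False) -> Optional[str]:
        extra = json.loads(self.extra or "{}")
        cached = extra.get("health_check", {})
        # Reuse the cached result unless forced or the rule version changed.
        if not force and cached.get("version") == RULE_VERSION:
            return cached.get("message")
        message = self._run_rules()
        # Touch only our own key so other data in extra is preserved.
        extra["health_check"] = {"version": RULE_VERSION, "message": message}
        self.extra = json.dumps(extra)
        if commit:
            self.committed = True
        return message
```

In this sketch an update command would call check_health(commit=False, force=True) so the stale cached message is recomputed, leaving the commit to the command's own transaction.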
I think the Airflow job can be updated pretty easily. In fact, since you don't know what other future features may put things into extra, any update to the extra field should be a merge operation instead of a simple override.
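A minimal sketch of the merge-instead-of-override idea, assuming extra is stored as a JSON object serialized to a string. The helper name merge_extra and the example keys are hypothetical, and the merge here is shallow (top-level keys only).

```python
import json
from typing import Any, Dict, Optional


def merge_extra(current_extra: Optional[str], updates: Dict[str, Any]) -> str:
    """Merge updates into the stored extra JSON instead of replacing it."""
    merged = json.loads(current_extra or "{}")
    # Shallow merge: keys in updates win, all other keys are left untouched.
    merged.update(updates)
    return json.dumps(merged)


# A periodic job writes its own key without clobbering the health check.
stored = json.dumps({"health_check": {"message": "no columns"}})
stored = merge_extra(stored, {"job_metadata": {"owner": "data-team"}})
```

With a plain override, the second write would have dropped the health_check key; with the merge, both writers can share the field as long as they use distinct top-level keys.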
Though the health check could get expensive, it's not a concern right now. If most of the checks turn out to be expensive, I would choose to run the health check asynchronously from the explore view instead of adding the check to the explore request.
We will go in the direction of saving the health_check results into the datasource extra field, so there is no need to change the datasource control config. Closing this PR.
🏆 Enhancements
We want to display a health check message for a datasource: pending PR
This PR adds an extra field to the datasource control config.
@ktmud