Skip to content
This repository has been archived by the owner on Dec 10, 2021. It is now read-only.

chore: add extra field for Datasource control config #856

Conversation

graceguo-supercat
Copy link

@graceguo-supercat graceguo-supercat commented Dec 9, 2020

🏆 Enhancements

We want to display health check message for a datasource: pending PR
This PR is to add extra field in datasource control config.

@ktmud

@graceguo-supercat graceguo-supercat requested a review from a team as a code owner December 9, 2020 18:21
@vercel
Copy link

vercel bot commented Dec 9, 2020

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/superset/superset-ui/lysq3pakw
✅ Preview: https://superset-ui-git-gg-addfieldtocontrolconfig.superset.vercel.app

@codecov
Copy link

codecov bot commented Dec 9, 2020

Codecov Report

Merging #856 (dcba8ac) into master (f45cfaf) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #856   +/-   ##
=======================================
  Coverage   26.55%   26.55%           
=======================================
  Files         377      377           
  Lines        8178     8178           
  Branches     1117     1117           
=======================================
  Hits         2172     2172           
  Misses       5878     5878           
  Partials      128      128           
Impacted Files Coverage Δ
...et-ui-chart-controls/src/shared-controls/index.tsx 43.52% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f45cfaf...dcba8ac. Read the comment docs.

@ktmud
Copy link
Contributor

ktmud commented Dec 9, 2020

This feels more like a state tied to a specific datasource other than a global prop. I'd imagine it being set in mapStateToProps other than component initialization, so it can be updated when the datasource changes.

@graceguo-supercat
Copy link
Author

This feels more like a state tied to a specific datasource other than a global prop. I'd imagine it being set in mapStateToProps other than component initialization, so it can be updated when the datasource changes.

Good idea!! i will follow this direction and add fix.

return {
datasource,
healthCheckMessage: dataset_health_check_message,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an attribute of datasource? I'd imagine the same check may become available to the dataset API endpoint, too.

Copy link
Author

@graceguo-supercat graceguo-supercat Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. i feel datasource contains data that stored in database, while health check message is sanity check on-the-fly, and this message won't persist in database.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a state associated with a specific datasource. It could be considered an extended read-only prop from the API's standpoint---similar to the various transformation we already did for order_by and verbose_map in datasource.data.

The check may even become an instance method for the BaseDatasource model so it's easier to access.

Imagine if we need to access the same info for multi datasources (e.g. display the warning icon in dataset CRUD list view), having the message attached to the datasource would make rendering much easier.

Copy link
Author

@graceguo-supercat graceguo-supercat Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You suggest health_check_message could be part of datasource.data but is not stored in database?

Yes that will be good because we can access it whenever we call datasource.data. But I do have a little concern: what if this health check become heavy and slow? so I think we should keep the sanity check away from core data model, and only trigger it when needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, which is why it doesn't have to be inside the datasource.data property but could be an instance method and added to the API output in the view handlers. We can also change datasource.data to be a method instead of property so each call can choose whether to run these checks or not.

Copy link
Author

@graceguo-supercat graceguo-supercat Dec 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the new example, i understand your idea better. Is persisting to db really necessary? If so, then we need to update db record:

  • when rule changed: check rule version, or
  • when datasource is updated: How to make sure this check_health get called whenever datasource is updated?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked airbnb's airflow job: it will update extra periodically, which means if you want to re-use the same field, airflow jobs and sanity check may step on each other's toe.

Copy link
Contributor

@ktmud ktmud Dec 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not super necessary, but as you mentioned earlier, the health check could get expensive. In case we want to cache the results somewhere, this would be the way to do it.

When datasource is updated, we can simply add a one liner datasource.check_health(commit=False, force=True) to the update command. Then we update check_health to allow a force health check option that skips the version check.

I think the Airflow job can be updated pretty easily. In fact, since you don't know what other future features may put things into extra, any updates on the extra field should be a merge operation instead of a simple override.

Copy link
Author

@graceguo-supercat graceguo-supercat Dec 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though the health check could get expensive., it's not the concern right now. If most of the check is expensive, i will choose to send health check async in the explore view, instead of adding check during the explore request.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will go direction that save health_check results into datasource extra field. So there is no need to change datasource control config. close this PR.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants