This repository has been archived by the owner on Dec 10, 2021. It is now read-only.

chore: add extra field for Datasource control config #856

Closed
@@ -168,9 +168,10 @@ const datasourceControl: SharedControlConfig<'DatasourceControl'> = {
   label: t('Datasource'),
   default: null,
   description: null,
-  mapStateToProps: ({ datasource }) => {
+  mapStateToProps: ({ datasource, dataset_health_check_message }) => {
     return {
       datasource,
+      healthCheckMessage: dataset_health_check_message,
Contributor

Should this be an attribute of datasource? I'd imagine the same check may become available to the dataset API endpoint, too.

Author (@graceguo-supercat), Dec 9, 2020

No. I think datasource holds data that is stored in the database, while the health check message comes from an on-the-fly sanity check and won't be persisted to the database.

Contributor

It's state associated with a specific datasource. From the API's standpoint, it could be considered an extended read-only prop, similar to the various transformations we already apply to order_by and verbose_map in datasource.data.

The check may even become an instance method for the BaseDatasource model so it's easier to access.

Imagine we need to access the same info for multiple datasources (e.g. to display a warning icon in the dataset CRUD list view): having the message attached to each datasource would make rendering much easier.
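A minimal TypeScript sketch of the list-view case, assuming each dataset row returned by the API carries an optional `health_check_message` (a hypothetical field name; neither the endpoint shape nor the field is confirmed by this PR). With the message attached per row, the view can decide whether to show a warning icon without a separate request per dataset.

```typescript
// Hypothetical shape of a dataset row in a CRUD list response;
// health_check_message is an assumed field, used for illustration only.
interface DatasetRow {
  id: number;
  table_name: string;
  health_check_message?: string | null;
}

// What the list view needs in order to render one row.
interface RenderedRow {
  id: number;
  label: string;
  showWarningIcon: boolean;
  warningTooltip?: string;
}

// Map rows to render props: the warning icon appears only when a
// non-empty health check message is attached to the dataset.
function toRenderedRows(rows: DatasetRow[]): RenderedRow[] {
  return rows.map(row => {
    const message = row.health_check_message ?? '';
    const hasWarning = message.length > 0;
    return {
      id: row.id,
      label: row.table_name,
      showWarningIcon: hasWarning,
      warningTooltip: hasWarning ? message : undefined,
    };
  });
}

const rendered = toRenderedRows([
  { id: 1, table_name: 'birth_names' },
  { id: 2, table_name: 'energy_usage', health_check_message: 'Column "ds" is missing' },
]);
console.log(rendered.map(r => `${r.label}: ${r.showWarningIcon}`).join(', '));
// birth_names: false, energy_usage: true
```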

Author (@graceguo-supercat), Dec 9, 2020

So you're suggesting health_check_message could be part of datasource.data but not stored in the database?

That would be nice, because we could access it whenever we call datasource.data. But I have one concern: what if the health check becomes heavy and slow? So I think we should keep the sanity check out of the core data model and only trigger it when needed.

Contributor

Yes, which is why it doesn't have to live inside the datasource.data property; it could be an instance method whose result is added to the API output in the view handlers. We could also change datasource.data from a property to a method, so each caller can choose whether to run these checks.
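One way to read the "method instead of property" suggestion, sketched in TypeScript under stated assumptions: `SketchDatasource`, `data()`, `runHealthCheck`, and the no-columns rule are all hypothetical stand-ins for the Python BaseDatasource model discussed here. Callers that only need metadata skip the check; callers that render warnings opt in.

```typescript
// Hypothetical options for a data() accessor.
interface DataOptions {
  runHealthCheck?: boolean;
}

// Hypothetical serialized shape returned by data().
interface DatasourcePayload {
  id: number;
  columns: string[];
  health_check_message?: string | null;
}

// Stand-in for the Python BaseDatasource model, for illustration only.
class SketchDatasource {
  id: number;
  columns: string[];
  // Counts how often the (possibly expensive) check actually runs.
  healthCheckCalls = 0;

  constructor(id: number, columns: string[]) {
    this.id = id;
    this.columns = columns;
  }

  // Hypothetical rule: a datasource without columns is flagged.
  checkHealth(): string | null {
    this.healthCheckCalls += 1;
    return this.columns.length === 0 ? 'Datasource has no columns' : null;
  }

  // A method rather than a getter, so each caller chooses whether to pay
  // for the health check.
  data(options: DataOptions = {}): DatasourcePayload {
    const payload: DatasourcePayload = { id: this.id, columns: [...this.columns] };
    if (options.runHealthCheck) {
      payload.health_check_message = this.checkHealth();
    }
    return payload;
  }
}

const ds = new SketchDatasource(7, []);
const plain = ds.data(); // metadata only, check skipped
const withCheck = ds.data({ runHealthCheck: true });
console.log('health_check_message' in plain, withCheck.health_check_message, ds.healthCheckCalls);
// false Datasource has no columns 1
```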

Author (@graceguo-supercat), Dec 14, 2020

Thanks for the new example; I understand your idea better now. Is persisting to the DB really necessary? If so, we need to update the DB record:

  • when the rules change: check the rule version; or
  • when the datasource is updated: how do we make sure check_health gets called whenever the datasource is updated?

Author

I just checked Airbnb's Airflow job: it updates extra periodically, which means that if we reuse the same field, the Airflow jobs and the sanity check may step on each other's toes.

Contributor (@ktmud), Dec 14, 2020

It's not super necessary, but as you mentioned earlier, the health check could get expensive. In case we want to cache the results somewhere, this would be the way to do it.

When the datasource is updated, we can simply add a one-liner, datasource.check_health(commit=False, force=True), to the update command, and then update check_health to accept a force option that skips the version check.
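A sketch of the caching scheme described above, written in TypeScript for consistency with the rest of this diff and assuming the cached result records the rule version that produced it. `HEALTH_CHECK_RULE_VERSION`, `checkHealth`, `updateDataset`, and the `health_check` key are hypothetical; the in-memory `extra` object stands in for the persisted column, so the `commit=False` database detail is out of scope here.

```typescript
// Hypothetical: bump whenever the health-check rules change.
const HEALTH_CHECK_RULE_VERSION = 2;

interface HealthCheckRecord {
  version: number;
  message: string | null;
}

interface Dataset {
  columns: string[];
  // In-memory stand-in for the persisted `extra` field.
  extra: { health_check?: HealthCheckRecord };
}

let checksRun = 0;

// Hypothetical rule set: warn when a dataset has no columns.
function runRules(dataset: Dataset): string | null {
  checksRun += 1;
  return dataset.columns.length === 0 ? 'Dataset has no columns' : null;
}

// Reuse the cached result unless it is missing, was produced by an older
// rule version, or the caller forces a fresh check.
function checkHealth(dataset: Dataset, force = false): string | null {
  const cached = dataset.extra.health_check;
  if (!force && cached !== undefined && cached.version === HEALTH_CHECK_RULE_VERSION) {
    return cached.message;
  }
  const message = runRules(dataset);
  dataset.extra.health_check = { version: HEALTH_CHECK_RULE_VERSION, message };
  return message;
}

// Hypothetical update command: after changing the dataset, force a re-check
// because the cached result describes the old state.
function updateDataset(dataset: Dataset, columns: string[]): void {
  dataset.columns = columns;
  checkHealth(dataset, true);
}

const dataset: Dataset = { columns: [], extra: {} };
checkHealth(dataset); // no cached result: rules run
checkHealth(dataset); // same rule version: served from cache
updateDataset(dataset, ['ds']); // forced: rules run again
console.log(checksRun, dataset.extra.health_check?.message);
// 2 null
```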

I think the Airflow job can be updated pretty easily. In fact, since we don't know what future features may put into extra, any update to the extra field should be a merge rather than a simple overwrite.
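The merge-instead-of-overwrite point can be sketched as a shallow merge over the JSON stored in extra. This assumes extra holds a JSON object serialized as a string; `mergeExtra` and the `airflow_sync` key are hypothetical, chosen only to show that a second writer leaves the `health_check` entry intact.

```typescript
// `extra` is assumed to hold a JSON object serialized as a string.
type Extra = Record<string, unknown>;

// Shallow merge: top-level keys in `patch` replace keys of the same name;
// every other key, such as one written by another job, is preserved.
function mergeExtra(current: string | null, patch: Extra): string {
  const parsed: unknown = current ? JSON.parse(current) : {};
  const base: Extra =
    typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Extra)
      : {};
  return JSON.stringify({ ...base, ...patch });
}

// The health check writes its key, then a hypothetical sync job writes its own.
let extra = mergeExtra(null, { health_check: { version: 1, message: null } });
extra = mergeExtra(extra, { airflow_sync: { last_run: '2020-12-14' } });
console.log(extra);
// {"health_check":{"version":1,"message":null},"airflow_sync":{"last_run":"2020-12-14"}}
```

Because the merge is shallow, two writers that share the same top-level key would still replace each other's value; separating keys per writer avoids that.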

Author (@graceguo-supercat), Dec 14, 2020

Although the health check could get expensive, that's not a concern right now. If most of the checks turn out to be expensive, I'll run the health check asynchronously from the Explore view instead of adding it to the Explore request.
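If the checks do become expensive, the asynchronous approach could look roughly like this TypeScript sketch. `loadExplore`, `FetchExplore`, and `FetchHealth` are hypothetical; the two fetchers stand in for whatever API calls Explore would actually make. The Explore payload resolves without waiting for the health check, whose result arrives later through a callback.

```typescript
// Hypothetical payload for the Explore view.
interface ExplorePayload {
  form_data: Record<string, unknown>;
  datasource_id: number;
}

// Hypothetical stand-ins for the two API calls.
type FetchExplore = () => Promise<ExplorePayload>;
type FetchHealth = (datasourceId: number) => Promise<string | null>;

// Resolves as soon as the Explore payload arrives; the health check is
// started afterwards and reports through a callback, so a slow check never
// delays rendering.
async function loadExplore(
  fetchExplore: FetchExplore,
  fetchHealth: FetchHealth,
  onHealthMessage: (message: string | null) => void,
): Promise<ExplorePayload> {
  const payload = await fetchExplore();
  // Deliberately not awaited; a failed check is reported as "no message".
  void fetchHealth(payload.datasource_id)
    .then(onHealthMessage)
    .catch(() => onHealthMessage(null));
  return payload;
}

const events: string[] = [];
const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

void loadExplore(
  async () => ({ form_data: {}, datasource_id: 3 }),
  async () => {
    await delay(50);
    return 'Slow check finished';
  },
  message => {
    events.push(`health: ${message}`);
  },
).then(() => {
  events.push('explore rendered');
});

setTimeout(() => console.log(events.join(' | ')), 100);
// explore rendered | health: Slow check finished
```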

Author

We will go in the direction of saving health_check results into the datasource's extra field, so there is no need to change the datasource control config. Closing this PR.
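Under the direction chosen here, the frontend could read the stored result from the datasource it already receives, which is why the control config no longer needs a new prop. A minimal sketch, assuming the datasource object exposes extra as a JSON string with a `health_check.message` entry; both that key and the helper `getHealthCheckMessage` are hypothetical.

```typescript
// Assumed shape: the datasource already passed to the control exposes
// `extra` as a JSON string; the `health_check.message` key is hypothetical.
interface DatasourceWithExtra {
  id: number;
  extra?: string | null;
}

// Returns the stored health-check message, or null when there is none.
function getHealthCheckMessage(datasource: DatasourceWithExtra | null): string | null {
  if (!datasource?.extra) {
    return null;
  }
  try {
    const parsed = JSON.parse(datasource.extra);
    const message = parsed?.health_check?.message;
    return typeof message === 'string' ? message : null;
  } catch {
    // Malformed JSON in `extra`: show no warning rather than break Explore.
    return null;
  }
}

console.log(getHealthCheckMessage({ id: 1, extra: '{"health_check":{"message":"Stale metadata"}}' }));
// Stale metadata
console.log(getHealthCheckMessage({ id: 2, extra: 'not json' }));
// null
```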

     };
   },
 };
packages/superset-ui-chart-controls/src/types.ts (1 change: 1 addition, 0 deletions)
@@ -78,6 +78,7 @@ export interface DatasourceMeta {
 export interface ControlPanelState {
   form_data: QueryFormData;
   datasource: DatasourceMeta | null;
+  dataset_health_check_message?: string | null;
   controls: ControlStateMapping;
 }
