Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-15] Adding database level python-date-format #8474

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 13 additions & 3 deletions docs/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -954,8 +954,8 @@ cache store when upgrading an existing environment.
entire setup. If not, background jobs can get scheduled multiple times
resulting in weird behaviors like duplicate delivery of reports,
higher than expected load / traffic etc.
* SQL Lab will only run your queries asynchronously if you enable

* SQL Lab will only run your queries asynchronously if you enable
"Asynchronous Query Execution" in your database settings.


Expand Down Expand Up @@ -1306,7 +1306,17 @@ SIP-15

`SIP-15 <https://github.com/apache/incubator-superset/issues/6360>`_ aims to ensure that time intervals are handled in a consistent and transparent manner for both the Druid and SQLAlchemy connectors.

Prior to SIP-15 SQLAlchemy used inclusive endpoints however these may behave like exclusive depending on the time column (refer to the SIP for details) and thus the endpoint behavior could be unknown. To aid with transparency the current endpoint behavior is explicitly called out in the chart time range (post SIP-15 this will be [start, end) for all connectors and databases). One can override the defaults on a per database level via the ``extra``
Prior to SIP-15 SQLAlchemy used inclusive endpoints however these may behave like exclusive for string columns (due to lexicographical ordering) if no formatting was defined and the column formatting did not conform to an ISO 8601 date-time (refer to the SIP for details).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it would be a convenience for the reader to add a link to the SIP issue on Github.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willbarrett there's a link to the GitHub issue on line #1307.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whup, I missed the context. Thanks!


To remedy this rather than having to define the date/time format for every non-IS0 8601 date-time column, once can define a default column mapping on a per database level via the ``extra`` parameter ::

{
"python_date_format_by_column_name": {
"ds": "%Y-%m-%d"
}
}

Additionally to aid with transparency the current endpoint behavior is explicitly called out in the chart time range (post SIP-15 this will be [start, end) for all connectors and databases). One can override the defaults on a per database level via the ``extra``
parameter ::

{
Expand Down
4 changes: 3 additions & 1 deletion superset/assets/src/datasource/DatasourceEditor.jsx
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,9 @@ function ColumnCollectionTable({
you will need to define an expression and type for
transforming the string into a date or timestamp. Note
currently time zones are not supported. If time is stored
in epoch format, put \`epoch_s\` or \`epoch_ms\`.`)}
in epoch format, put \`epoch_s\` or \`epoch_ms\`. If no pattern
is specified we fall back to using the optional defaults on a per
database/column name level via the extra parameter.`)}
</div>
}
control={<TextControl />}
Expand Down
5 changes: 4 additions & 1 deletion superset/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -679,7 +679,10 @@ class CeleryConfig(object):
# SQLALCHEMY_DATABASE_URI by default if set to `None`
SQLALCHEMY_EXAMPLES_URI = None

# Note currently SIP-15 feature is WIP and should not be enabled.
# SIP-15 should be enabled for all new Superset deployments which ensures that the time
# range endpoints adhere to [start, end). For existing deployments admins should provide
# a dedicated period of time to allow chart producers to update their charts before
# mass migrating all charts to use the [start, end) interval.
SIP_15_ENABLED = False
SIP_15_DEFAULT_TIME_RANGE_ENDPOINTS = ["unknown", "inclusive"]
SIP_15_TOAST_MESSAGE = 'Action Required: Preview then save your chart using the new time range endpoints <a href="{url}" class="alert-link">here</a>.'
Expand Down
30 changes: 26 additions & 4 deletions superset/connectors/sqla/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -169,15 +169,19 @@ def get_time_filter(
col = self.get_sqla_col(label="__time")
l = []
if start_dttm:
l.append(col >= text(self.dttm_sql_literal(start_dttm)))
l.append(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is now configuration for inclusive/exclusive start points, should we have another if condition in this block supporting col > <start time literal statement>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willbarrett per SIP-15 the end state is [start, end) i.e., this should not be configurable. The reason for the interim configuration is to allow users to preview their charts using the [start, end) time range endpoints before saving and thus persisting this information in the chart form data.

The time_range_endpoints form-data is merely there as a mechanism to provide both the current and future states simultaneously for a period which enabling an institution to manually fix charts as necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for clarifying!

col >= text(self.dttm_sql_literal(start_dttm, time_range_endpoints))
)
if end_dttm:
if (
time_range_endpoints
and time_range_endpoints[1] == utils.TimeRangeEndpoint.EXCLUSIVE
):
l.append(col < text(self.dttm_sql_literal(end_dttm)))
l.append(
col < text(self.dttm_sql_literal(end_dttm, time_range_endpoints))
)
else:
l.append(col <= text(self.dttm_sql_literal(end_dttm)))
l.append(col <= text(self.dttm_sql_literal(end_dttm, None)))
return and_(*l)

def get_timestamp_expression(
Expand Down Expand Up @@ -218,7 +222,13 @@ def lookup_obj(lookup_column):

return import_datasource.import_simple_obj(db.session, i_column, lookup_obj)

def dttm_sql_literal(self, dttm: DateTime) -> str:
def dttm_sql_literal(
self,
dttm: DateTime,
time_range_endpoints: Optional[
Tuple[utils.TimeRangeEndpoint, utils.TimeRangeEndpoint]
],
) -> str:
"""Convert datetime object to a SQL expression string"""
sql = (
self.table.database.db_engine_spec.convert_dttm(self.type, dttm)
Expand All @@ -231,6 +241,18 @@ def dttm_sql_literal(self, dttm: DateTime) -> str:

tf = self.python_date_format

# Fallback to the default format (if defined) only if the SIP-15 time range
# endpoints, i.e., [start, end) are enabled.
if not tf and time_range_endpoints == (
utils.TimeRangeEndpoint.INCLUSIVE,
utils.TimeRangeEndpoint.EXCLUSIVE,
):
tf = (
self.table.database.get_extra()
.get("python_date_format_by_column_name", {})
.get(self.column_name)
)

if tf:
if tf in ["epoch_ms", "epoch_s"]:
seconds_since_epoch = int(dttm.timestamp())
Expand Down
3 changes: 3 additions & 0 deletions superset/connectors/sqla/views.py
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,9 @@ class TableColumnInlineView(CompactCRUDMixin, SupersetModelView):
"define an expression and type for transforming the string into a "
"date or timestamp. Note currently time zones are not supported. "
"If time is stored in epoch format, put `epoch_s` or `epoch_ms`."
"If no pattern is specified we fall back to using the optional "
"defaults on a per database/column name level via the extra parameter."
""
),
True,
),
Expand Down