-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sql_lab): Add custom timestamp type for literal casting for presto timestamps #13082
Conversation
Codecov Report
@@ Coverage Diff @@
## master #13082 +/- ##
===========================================
+ Coverage 53.06% 68.81% +15.74%
===========================================
Files 489 697 +208
Lines 17314 38411 +21097
Branches 4482 0 -4482
===========================================
+ Hits 9187 26431 +17244
- Misses 8127 11980 +3853
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Used for in-line rendering of TIMESTAMP data type | ||
as Presto does not support automatic casting. | ||
""" | ||
return "TIMESTAMP '%s'" % value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing EOF (CI will fail)
@john-bodley this looks like something you'd be very interested in |
superset/db_engine_specs/presto.py
Outdated
@@ -911,10 +912,14 @@ def where_latest_partition( # pylint: disable=too-many-arguments | |||
if values is None: | |||
return None | |||
|
|||
column_names = {column.get("name") for column in columns or []} | |||
column_names = {column.get("name") : column.get('type') for column in columns or []} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be column_type_by_name
and thus line 918 will be easier to grok. In that case you could roll lines 918 and 919 into one.
superset/db_engine_specs/presto.py
Outdated
for col_name, value in zip(col_names, values): | ||
if col_name in column_names: | ||
query = query.where(Column(col_name) == value) | ||
col_type = column_names.get(col_name) | ||
if col_type == 'TIMESTAMP': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the existing logic is wrong but I wonder if SQLAlchemy has a mechanism for auto-casting between the type of value
and the column type for the comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would be ideal solution or getting Presto/Trino to support auto-casting.
I've looked into it a bit and tried a few things but couldn't get any auto-casting. I've tried to pass in the default SQLAlchemy data types e.g. types.TIMESTAMP
to the Column object as well as using a literal clause but the default data types doesn't seem to know how to auto-cast the type of value. I'm guessing because each DB dialect casts types differently so its up to the user to cast it yourself using one of these approaches to apply SQL-level binding
I have only done this casting for 2 Trino data types TIMESTAMP
and DATE
in this PR since these are the most popular partition column data types for our use case. But all of these data types besides the basic string, integer, boolean types require explicit casting so this is still an issue for other data types.
superset/db_engine_specs/presto.py
Outdated
if col_name in column_names: | ||
query = query.where(Column(col_name) == value) | ||
if col_name in column_type_by_name: | ||
col_type = column_type_by_name.get(col_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col_type = column_type_by_name.get(col_name) |
superset/db_engine_specs/presto.py
Outdated
query = query.where(Column(col_name) == value) | ||
if col_name in column_type_by_name: | ||
col_type = column_type_by_name.get(col_name) | ||
if col_type == 'TIMESTAMP': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if col_type == 'TIMESTAMP': | |
if column_type_by_name.get(col_name) == 'TIMESTAMP': |
@kekwan you'll need to fix a couple of linting issues. |
thanks @john-bodley fixed the linting issues now 😅 |
@john-bodley can i get another look at this when u get a chance? thanks! |
Co-authored-by: John Bodley <[email protected]>
if col_name in column_names: | ||
query = query.where(Column(col_name) == value) | ||
if col_name in column_type_by_name: | ||
if column_type_by_name.get(col_name) == "TIMESTAMP": | ||
query = query.where(Column(col_name, TimeStamp()) == value) | ||
elif column_type_by_name.get(col_name) == "DATE": | ||
query = query.where(Column(col_name, Date()) == value) | ||
else: | ||
query = query.where(Column(col_name) == value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we couldn't just use the existing convert_dttm
here, as we should already have type casting logic in place for these types of conversions?
superset/superset/db_engine_specs/presto.py
Lines 597 to 604 in 8ab45c9
@classmethod | |
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]: | |
tt = target_type.upper() | |
if tt == utils.TemporalType.DATE: | |
return f"""from_iso8601_date('{dttm.date().isoformat()}')""" | |
if tt == utils.TemporalType.TIMESTAMP: | |
return f"""from_iso8601_timestamp('{dttm.isoformat(timespec="microseconds")}')""" # pylint: disable=line-too-long | |
return None |
Something like query = query.where(Column(col_name) == cls.convert_dttm(column_type_by_name.get(col_name), value)
(not tested! thinking out loud)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly possible but the drawbacks I see to re-using this existing method is that:
- We have to explicitly cast the string
value
to a pythondatetime
to call the method only to get it converted back to a string with.isoformat
. from_iso8601_x
doesn't seem to be a recommended function anymore in the latest versions of Presto/Trino https://trino.io/docs/current/language/types.html#date-and-time- This method only works if your date/timestamps are in ISO 8601 format.
- In the future, if more Trino types are to be supported in previewing data with partitions, we will have to create new types like this anyways. Also, I think this is more the sqlalchemy friendly way.
What is the status of this PR? This makes it impossible to work with timeseries data on Trino. :( |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue |
@villebro what do we need to do to push this PR forward? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue |
…o timestamps (apache#13082) * Add custom timestamp type for literal casting for presto timestamps * Remove typo in comment * Use process_bind_params as in sqla docs * Uncommit local superset config * Add DATE literal casting * Fix lint errors and change var name * Get rid of col_type and whitespace * Fix linting * Fix arg type * Fix isort lint error * ran black and isort locally.. * accidentally removed EOF * Dont need eof * Use newer string formatting style from comments Co-authored-by: John Bodley <[email protected]> * Trigger notification * Trigger notification Co-authored-by: Kenny Kwan <[email protected]> Co-authored-by: John Bodley <[email protected]>
…o timestamps (apache#13082) * Add custom timestamp type for literal casting for presto timestamps * Remove typo in comment * Use process_bind_params as in sqla docs * Uncommit local superset config * Add DATE literal casting * Fix lint errors and change var name * Get rid of col_type and whitespace * Fix linting * Fix arg type * Fix isort lint error * ran black and isort locally.. * accidentally removed EOF * Dont need eof * Use newer string formatting style from comments Co-authored-by: John Bodley <[email protected]> * Trigger notification * Trigger notification Co-authored-by: Kenny Kwan <[email protected]> Co-authored-by: John Bodley <[email protected]>
SUMMARY
Described in detail in: #13009 but described here again just for clarity.
When previewing a table in SQL Lab, a
SELECT *
statement is generated based on table data and metadata. This particular bug occur whens previewing tables withTIMESTAMP
partition for Presto/Trino data sources. Unlike some other databases, Presto doesn't automatically convert between varchars and other types. In particular,TIMESTAMP
needs to be explicitly casted. The simplest way is to use the type constructorTIMESTAMP
before the value.Therefore,
where_latest_partition
inpresto.py
needs to cast theTIMESTAMP
when it generates theWHERE
block in the preview table statement. Accomplish this by creating new SQLAlchemy Type which subclasses TypeDecorator adding additional functionality to existing TIMESTAMP type. For each partition column, I check if the type isTIMESTAMP
, and if it is I instantiate the customTimeStamp
class in theColumn
object which will callprocess_bind_params
when the query compiles.Invalid syntax (generated by
/<int:pk>/table/<table_name>/<schema_name>/
->get_table_metadata
->select_star
->where_latest_partition
)Proper syntax
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Before
After
TEST PLAN
ADDITIONAL INFORMATION