Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make timestamp expression native SQLAlchemy element #7131

Merged
merged 10 commits into from
May 30, 2019
Merged

Make timestamp expression native SQLAlchemy element #7131

merged 10 commits into from
May 30, 2019

Conversation

villebro
Copy link
Member

@villebro villebro commented Mar 26, 2019

SUMMARY

The functional part of this PR makes the timestamp expression a native SQLAlchemy element that respects quoting rules for column names (based largely on https://docs.sqlalchemy.org/en/latest/core/compiler.html) . This is a continuation of what was started in #6897, which was really just a hack to make Postgres work, but didn't address e.g. reserved keywords or similar problems that might arise in other dbs. This PR adds a new class TimestampExpression, which can be used in a SQLAlchemy query, which keeps the target column as a native Core element. The column name is only rendered to text once the query is compiled, which ensures that engine specific quoting rules are respected.

Refactoring:

  • Added mypy typing where code was changed.
  • Moved some functional parts of sqla/models/get_timestamp_expression() to db_engine_specs/get_timestamp_expr().
  • Removed models/core/grains() which was no longer needed.
TEST PLAN

Added unit tests that test the central features and tested timeseries graphs using both column and expression, both with and without time grains.

ADDITIONAL INFORMATION
[ ] Has associated issue:
[ ] Changes UI
[ ] Requires DB Migration. Confirm DB Migration upgrade and downgrade tested.
[ ] Introduces new feature or API
[ ] Removes existing feature or API
[x] Fixes bug
[x] Refactors code
[x] Adds test(s)
REVIEWERS

Comments much appreciated @betodealmeida @john-bodley. Also @agrawaldevesh: I had to change the Pinot logic for this PR. Do you have the opportunity to test if this works on your Pinot deployment (I don't have a Pinot installation handy right now)? I also added a few grains while at it, do they work?

@@ -117,16 +138,31 @@ class BaseEngineSpec(object):
max_column_name_length = None

@classmethod
def get_time_expr(cls, expr, pdf, time_grain, grain):
def get_time_expr(cls, col: ColumnClause, pdf: Optional[str],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you also provide a default value if pdf and time_grain are optional? Also in the docstring the Optional word is redundant as that's implied for the type hints.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's my understanding that Optional[str] is synonymous with Union[None, str], not that it is an optional argument. Good point about the docstring.

@agrawaldevesh
Copy link

Thanks @villebro, Happy to "test this in prod" at Uber and report back to you. The code changes look okay to me and I think they should work.

I have just one question: I am wondering if the TimestampExpression should be a good vehicle to fix the TZ issues plaguing superset: #6768 and #1149 for example.

The problem is that Superset is essentially TZ unaware, and it assumes that the stored data is in the local TZ. I was curious if TimestampExpression could have a ".tz" field in it eventually ? And for databases that do support TZ's, that tz field is plumbed in when generating the timestamp expression sql.

def get_timestamp_expression(self, time_grain):
"""Getting the time component of the query"""
def get_timestamp_expression(self, time_grain: Optional[str]) \
-> Union[TimeExpression, Label]:
Copy link
Member

@mistercrunch mistercrunch Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type annotations FTW!

@villebro villebro changed the title Make time expression native SQLAlchemy element Make timestamp expression native SQLAlchemy element Mar 27, 2019
@codecov-io
Copy link

codecov-io commented Mar 27, 2019

Codecov Report

Merging #7131 into master will increase coverage by 0.03%.
The diff coverage is 94.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7131      +/-   ##
==========================================
+ Coverage   65.24%   65.27%   +0.03%     
==========================================
  Files         435      435              
  Lines       21503    21502       -1     
  Branches     2379     2379              
==========================================
+ Hits        14030    14036       +6     
+ Misses       7353     7346       -7     
  Partials      120      120
Impacted Files Coverage Δ
superset/models/core.py 83.64% <ø> (-0.11%) ⬇️
superset/connectors/sqla/models.py 81.98% <100%> (+0.11%) ⬆️
superset/db_engine_specs.py 62.73% <92.3%> (+0.83%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc3b043...e317cde. Read the comment docs.

@kristw kristw added enhancement:request Enhancement request submitted by anyone from the community review labels Mar 27, 2019
@villebro
Copy link
Member Author

villebro commented Mar 30, 2019

Rebased to fix conflict. @agrawaldevesh thanks for offering to help verify this PR! Regarding the timezone issue, I think it's something that runs slightly deeper in the code, so would probably require some additional planning (perhaps a SIP). Perhaps something along the lines of a setting in superset_config.py to specify a timezone in which all timestamps with timezone info would be cast to (naive timestamps would probably be kept unchanged)? Anyway, a really good idea worth investigating (committer guidance would be helpful).

@villebro
Copy link
Member Author

kind reminder @john-bodley @agrawaldevesh for final comments on this PR.


@compiles(TimestampExpression)
def compile_timegrain_expression(element: TimestampExpression, compiler, **kw):
return element.name.replace('{col}', compiler.process(element.col, **kw))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "name" of a ColumnClause ?

Do we need to pass this further via the compiler ? Is this supposed to return a string back or is it supposed to return some sqlalchemy struct, and so we might have to do:

compiler.process(element.name.replace('{col}', compiler.process(element.col, **kw)))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name is the part containing the column name/expression; think SELECT name FROM table. In this case name will be a string containing a token {col} to indicate where the native element col should be compiled. The reason the element is kept in native form is because at construction time we want to build an engine agnostic select statement (SqlaTable.get_sqla_query() in connectors/sqla/models.py). By doing it this way we can get a different value for name depending on the dialect when we are finally compiling the whole statement (SqlaTable.get_query_str_extended() in connectors/sqla/models.py). Example:

  • name = TRUNC(CAST({col} as DATE), 'MI') (string)
  • col = MyMixedCaseCol (Any Core SQLAlchemy element that can be compiled, but in this example a ColumnClause whose name = MyMixedCaseCol)

When compiling this using different dialects we would get different results:

  • MSSQL: TRUNC(CAST([MyMixedCaseCol] as DATE), 'MI')
  • Oracle: TRUNC(CAST("MyMixedCaseCol" as DATE), 'MI')

And so forth. I'm actually working on adding a more generalized version of this to SQLAlchemy, but since it requires fairly big changes to the codebase I don't expect to see it ship in a released version before the end of this year. Hence the simple version above.

@@ -1485,20 +1503,21 @@ class PinotEngineSpec(BaseEngineSpec):
inner_joins = False
supports_column_aliases = False

_time_grain_to_datetimeconvert = {
# Pinot does its own conversion below
time_grain_functions: Dict[Optional[str], str] = {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is renaming this to time_grain_functions is that they are not really behaving like "functions" (like in the rest of the db engine specs).

For example if someone (other than the pinot specific get_timestamp_expr) starts using them, then they wouldn't be getting a function. So I would be wary of changing this.

If anything, we can make time_grain_functions = None (in case they are set in the BaseEngineSpec), so that no one can use them.

And switch this back to its internal name that the below pinot specific get_timestamp_expr can use.

Copy link
Member Author

@villebro villebro Apr 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it all values in time_grain_functions are merely technical strings that are used by the spec to construct the final TimestampExpression. The fact that Pinot does this slightly differently I think is ok; all interaction with them should always be done via the get_timestamp_expr() function anyway, be it Pinot or Postgres. But I agree that the naming is slightly confusing. I would personally stick with my proposal, but I will change it back if you feel it is more intuitive.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. This was a nit anyway :D.

Should we make time_grain_functions private then ? _time_grain_functions ?

The name is a bit of a misnomer now: The values in this dictionary are no longer pure functions.

Or we can just punt on this and perhaps the comment above "time_grain_functions are merely technical strings ...." as a code comment.

Its fine either way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it should be made private as it is. However, I think I'll let this be as it is and keep this in mind next time this time grains are refactored (originally they were a tuple of tuples which are currently emulated via BaseEngineSpec.get_time_grains()). Perhaps next iteration they'll become more final.

@agrawaldevesh
Copy link

Hi @villebro , I left some comments inline. LGTM to me otherwise.

@villebro
Copy link
Member Author

Hi @villebro , I left some comments inline. LGTM to me otherwise.

Thanks @agrawaldevesh , great comments as always! Were you able to get this working? And was the unit test ok?

@agrawaldevesh
Copy link

Hi @villebro , I left some comments inline. LGTM to me otherwise.

Thanks @agrawaldevesh , great comments as always! Were you able to get this working? And was the unit test ok?

Hi Ville ... Everything works. The unit test is proper and also I was able to merge this branch and test it out in my prod environment and superset had no problems. All charts and everything rendered fine. The generated queries remained the same.

So this PR is totally fine from Pinot perspective.

@villebro
Copy link
Member Author

@mistercrunch @john-bodley this one should be ok to ship.

Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a pass on the PR and it LGTM @agrawaldevesh has this been running in production on your side? Which engines are you using there? Presto + Pinot I'm assuming?

@agrawaldevesh
Copy link

Hi Max ... I tested this with Pinot natively and it works. Presto+pinot wouldn't really test the pinot side of superset (since superset would just speak presto-sql then). Having tested it, I am confident that this PR would work with Pinot.

# if epoch, translate to DATE using db specific conf
if pdf == 'epoch_s':
expr = cls.epoch_to_dttm().format(col=expr)
time_expr = time_expr.replace('{col}', cls.epoch_to_dttm())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.format does much more than just replacing strings. There's a whole mini language behind it that people might have used (even though it's unlikely)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the code conditionally "burned" the column name/expression into the epoch expression (later that gets conditionally "burned" into the timegrain expression), which is what this PR tries to avoid. In this proposal the epoch expression is instead conditionally burned into the timestamp expression, making it possible to keep the column object in it's native form until compilation time. I can change the .replace logic to .format, but in this context I don't think it will make a difference.

elif pdf == 'epoch_ms':
expr = cls.epoch_ms_to_dttm().format(col=expr)
time_expr = time_expr.replace('{col}', cls.epoch_ms_to_dttm())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@villebro villebro merged commit 34407e8 into apache:master May 30, 2019
@gogitub
Copy link

gogitub commented Jan 4, 2021

Hi,
Need help on similar requirement.
We have many timestamp with time zone fields in Oracle database which is the source for our Superset (0.36 version) reports.
And we need to show the data in EST or PST or some other time zones based on customer need.

Ex:
Data in Oracle database table is like '2021-01-04 21:48:56.66 +05:30'
If we want to display in EST, report should show it like : 2021-01-04T11:18:56.660000
If we want to display in PST, report should show it like : 2021-01-04T08:18:56.660000

So, is there any configuration parameter that we can modify to display the data in desired time zone (like EST/PST or some other)?

Please advise. Thanks in advance.

@gogitub
Copy link

gogitub commented Jan 5, 2021

Also, we need to show the data displayed in time zone per user profile or preference if there is a way.

@gogitub
Copy link

gogitub commented Jan 5, 2021

It is observed that Superset is automatically converting the timestamp fields to EST time while displaying in the SQL Lab or Chart.
Anyone knows where exactly we can change this setting to show it in different time zone or customize it to work using a config parameter?

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.34.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels enhancement:request Enhancement request submitted by anyone from the community 🚢 0.34.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants