Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgres timestamp when using Include Time in table view #6763

Closed
3 tasks done
kdorsel opened this issue Jan 26, 2019 · 12 comments · Fixed by #6786
Closed
3 tasks done

postgres timestamp when using Include Time in table view #6763

kdorsel opened this issue Jan 26, 2019 · 12 comments · Fixed by #6786
Labels
#bug:cant-reproduce Bugs that cannot be reproduced !deprecated-label:bug Deprecated label - Use #bug instead v0.28 v0.29

Comments

@kdorsel
Copy link

kdorsel commented Jan 26, 2019

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Superset version

I'm running MacOS 10.13 with python 3.7 and postgres 11

I've tried on both version each in a different venv.
0.28.1
0.29.0rc7

Expected results

To correctly scrub time and display time grouping

Actual results

When using a column of type timestamp with timezone I get the following error. This is the same error on both versions of superset.

019-01-25 23:08:53,693:INFO:root:Database.get_sqla_engine(). Masked URL: postgresql+psycopg2://kaass:XXXXXXXXXX@localhost:5432/postgres
2019-01-25 23:08:53,694:INFO:root:SELECT day AS day, DATE_TRUNC('hour', day) AT TIME ZONE 'UTC' AS __timestamp, AVG(value) AS "AVG(value)" 
FROM tryshop GROUP BY day, DATE_TRUNC('hour', day) AT TIME ZONE 'UTC' ORDER BY "AVG(value)" DESC 
 LIMIT 10000
2019-01-25 23:08:53,706:INFO:root:Database.get_sqla_engine(). Masked URL: postgresql+psycopg2://kaass:XXXXXXXXXX@localhost:5432/postgres
2019-01-25 23:08:53,747:DEBUG:root:[stats_logger] (incr) loaded_from_source
2019-01-25 23:08:53,792:ERROR:root:Timestamp subtraction must have the same timezones or no timezones
Traceback (most recent call last):
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/superset/views/base.py", line 96, in wraps
    return f(self, *args, **kwargs)
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/superset/views/core.py", line 1211, in explore_json
    samples=samples,
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/superset/views/core.py", line 1143, in generate_json
    return data_payload_response(*viz_obj.payload_json_and_has_error(payload))
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/superset/viz.py", line 473, in payload_json_and_has_error
    return self.json_dumps(payload), has_error
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/superset/viz.py", line 605, in json_dumps
    return super(TableViz, self).json_dumps(obj)
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/superset/viz.py", line 467, in json_dumps
    sort_keys=sort_keys,
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/simplejson/__init__.py", line 399, in dumps
    **kw).encode(obj)
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/simplejson/encoder.py", line 296, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/simplejson/encoder.py", line 378, in iterencode
    return _iterencode(o, 0)
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/superset/utils/core.py", line 376, in json_int_dttm_ser
    obj = datetime_to_epoch(obj)
  File "/Users/kaass/.envs/superset-029-env/lib/python3.7/site-packages/superset/utils/core.py", line 362, in datetime_to_epoch
    return (dttm - epoch_with_tz).total_seconds() * 1000
  File "pandas/_libs/tslibs/timestamps.pyx", line 320, in pandas._libs.tslibs.timestamps._Timestamp.__sub__
TypeError: Timestamp subtraction must have the same timezones or no timezones

When using a timestamp without time zone I get the following. Again on both versions.

2019-01-25 22:31:33,848:INFO:root:Database.get_sqla_engine(). Masked URL: postgresql+psycopg2://kaass@localhost:5432/postgres
2019-01-25 22:31:33,849:INFO:root:SELECT day::timestamp AS date2, DATE_TRUNC('day', day::timestamp) AT TIME ZONE 'UTC' AS __timestamp, COUNT(value) AS "COUNT(value)" 
FROM shop GROUP BY day::timestamp, DATE_TRUNC('day', day::timestamp) AT TIME ZONE 'UTC' ORDER BY "COUNT(value)" DESC 
 LIMIT 10000
2019-01-25 22:31:33,860:INFO:root:Database.get_sqla_engine(). Masked URL: postgresql+psycopg2://kaass@localhost:5432/postgres
2019-01-25 22:31:33,880:ERROR:root:Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Traceback (most recent call last):
  File "/Users/kaass/.envs/superset-env/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 377, in _convert_listlike
    values, tz = conversion.datetime_to_datetime64(arg)
  File "pandas/_libs/tslibs/conversion.pyx", line 174, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kaass/.envs/superset-env/lib/python3.7/site-packages/superset/viz.py", line 406, in get_df_payload
    df = self.get_df(query_obj)
  File "/Users/kaass/.envs/superset-env/lib/python3.7/site-packages/superset/viz.py", line 229, in get_df
    df[DTTM_ALIAS], utc=False, format=timestamp_format)
  File "/Users/kaass/.envs/superset-env/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 451, in to_datetime
    values = _convert_listlike(arg._values, True, format)
  File "/Users/kaass/.envs/superset-env/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 380, in _convert_listlike
    raise e
  File "/Users/kaass/.envs/superset-env/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 368, in _convert_listlike
    require_iso8601=require_iso8601
  File "pandas/_libs/tslib.pyx", line 492, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 537, in pandas._libs.tslib.array_to_datetime
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

Steps to reproduce

I created a dummy postgres table as follows:

create table shop as
SELECT day::timestamp without time zone, (10 + 7*random())*(row_number() over()) as "value" FROM generate_series
        ( '2017-02-01'::date
        , '2017-04-01'::date
        , '1 hour'::interval) day
create table tryshop as
SELECT day::timestamp with time zone, (10 + 7*random())*(row_number() over()) as "value" FROM generate_series
        ( '2017-02-01'::date
        , '2017-04-01'::date
        , '1 hour'::interval) day
@villebro
Copy link
Member

I believe this was fixed in #6453. I checked that it hasn't made it's way into 0.29rc7, but is present in the 0.30 branch.

@kdorsel
Copy link
Author

kdorsel commented Jan 26, 2019

I got the latest commit (a62a8d3) and tried again.

It seems to have fixed the issue with columns of timestamp with time zone, but still getting errors with columns without time zones.

019-01-26 11:35:41,157:INFO:root:Database.get_sqla_engine(). Masked URL: postgresql+psycopg2://kaass:XXXXXXXXXX@localhost:5432/postgres
2019-01-26 11:35:41,158:INFO:root:SELECT DATE_TRUNC('day', day) AT TIME ZONE 'UTC' AS __timestamp, AVG(value) AS "AVG(value)" 
FROM shop GROUP BY DATE_TRUNC('day', day) AT TIME ZONE 'UTC' ORDER BY "AVG(value)" DESC 
 LIMIT 10000
2019-01-26 11:35:41,166:INFO:root:Database.get_sqla_engine(). Masked URL: postgresql+psycopg2://kaass:XXXXXXXXXX@localhost:5432/postgres
2019-01-26 11:35:41,185:ERROR:root:Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Traceback (most recent call last):
  File "/Users/kaass/.envs/superset-head-env/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 377, in _convert_listlike
    values, tz = conversion.datetime_to_datetime64(arg)
  File "pandas/_libs/tslibs/conversion.pyx", line 174, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kaass/.envs/incubator-superset/superset/viz.py", line 410, in get_df_payload
    df = self.get_df(query_obj)
  File "/Users/kaass/.envs/incubator-superset/superset/viz.py", line 231, in get_df
    df[DTTM_ALIAS], utc=False, format=timestamp_format)
  File "/Users/kaass/.envs/superset-head-env/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 451, in to_datetime
    values = _convert_listlike(arg._values, True, format)
  File "/Users/kaass/.envs/superset-head-env/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 380, in _convert_listlike
    raise e
  File "/Users/kaass/.envs/superset-head-env/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 368, in _convert_listlike
    require_iso8601=require_iso8601
  File "pandas/_libs/tslib.pyx", line 492, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 537, in pandas._libs.tslib.array_to_datetime
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

@villebro
Copy link
Member

Are you on pandas 0.24?

@kdorsel
Copy link
Author

kdorsel commented Jan 27, 2019

Upgrading from pandas 0.23.1 to 0.24 now gives me this error:

/Users/kaass/.envs/superset-head-env/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
Traceback (most recent call last):
  File "/Users/kaass/.envs/superset-head-env/bin/superset", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/Users/kaass/.envs/incubator-superset/superset/bin/superset", line 21, in <module>
    from superset.cli import create_app
  File "/Users/kaass/.envs/incubator-superset/superset/__init__.py", line 214, in <module>
    ConnectorRegistry.register_sources(module_datasource_map)
  File "/Users/kaass/.envs/incubator-superset/superset/connectors/connector_registry.py", line 30, in register_sources
    module_obj = __import__(module_name, fromlist=class_names)
  File "/Users/kaass/.envs/incubator-superset/superset/connectors/sqla/__init__.py", line 18, in <module>
    from . import views  # noqa
  File "/Users/kaass/.envs/incubator-superset/superset/connectors/sqla/views.py", line 30, in <module>
    from superset.connectors.base.views import DatasourceModelView
  File "/Users/kaass/.envs/incubator-superset/superset/connectors/base/views.py", line 21, in <module>
    from superset.views.base import SupersetModelView
  File "/Users/kaass/.envs/incubator-superset/superset/views/__init__.py", line 19, in <module>
    from . import core  # noqa
  File "/Users/kaass/.envs/incubator-superset/superset/views/core.py", line 45, in <module>
    from superset import (
  File "/Users/kaass/.envs/incubator-superset/superset/sql_lab.py", line 32, in <module>
    from superset import app, dataframe, db, results_backend, security_manager
  File "/Users/kaass/.envs/incubator-superset/superset/dataframe.py", line 30, in <module>
    from pandas.core.common import _maybe_box_datetimelike
ImportError: cannot import name '_maybe_box_datetimelike' from 'pandas.core.common' (/Users/kaass/.envs/superset-head-env/lib/python3.7/site-packages/pandas/core/common.py)

@makkus
Copy link

makkus commented Jan 27, 2019

Upgrading from pandas 0.23.1 to 0.24 now gives me this error:

/Users/kaass/.envs/superset-head-env/lib/python3.7/site-packages/psycopg2/__init__.py:144: 
  File "/Users/kaass/.envs/incubator-superset/superset/dataframe.py", line 30, in <module>
    from pandas.core.common import _maybe_box_datetimelike
ImportError: cannot import name '_maybe_box_datetimelike' from 'pandas.core.common' (/Users/kaass/.envs/superset-head-env/lib/python3.7/site-packages/pandas/core/common.py)

Same here...

@zihengCat
Copy link
Contributor

See #6765

@villebro
Copy link
Member

@kdorsel Have you tried on py36? I don't currently have a py37 setup, but was unable to replicate on py36.

@kdorsel
Copy link
Author

kdorsel commented Jan 29, 2019

On py36 there are no errors, correct.

Now I've started looking into the actual timestamp though. The time grain function seems to be changing the outputted times themselves.

Here is an example output of tz and without tz from postgres directly. This information was inserted at 15:17:26 localtime at -05:00 offset (America/Toronto).

shop=# select * from withouttz;
             ts             | val
----------------------------+-----
 2019-01-29 15:17:26.081731 |   5
 2019-01-29 15:17:26.081731 |   6
 2019-01-29 15:17:26.081731 |   7
 2019-01-29 15:17:26.081731 |   8
(4 rows)


shop=# select * from withtz;
              ts               | val
-------------------------------+-----
 2019-01-29 20:17:26.081731-05 |   5
 2019-01-29 20:17:26.081731-05 |   6
 2019-01-29 20:17:26.081731-05 |   7
 2019-01-29 20:17:26.081731-05 |   8
(4 rows)

Here's the same output from superset grouping by ts and with the time grain checked. As can be seen, the time column. It looks like the time grain is applying the tz offset to columns with no tz, first example. And also applying the tz offset a second time to the with tz column.

without tz with time grain set to minute
ts                         | Time (from time grain)    | COUTN(*)
2019-01-29T15:27:56.750068 | 2019-01-29T10:27:00-05:00 | 4

with tz with time grain set to minute
ts                               | Time (from time grain) | COUTN(*)
2019-01-29T20:17:26.081731-05:00 | 2019-01-30T01:17:00    | 4

@villebro
Copy link
Member

Weird, I switched over to python 3.7.2 and can't reproduce the error. Check my pip list below. However, I agree that the AT TIME ZONE 'UTC' really shouldn't be there (in practice it casts a timestamp to timestamptz and vice versa). Can you check if the time grains in #6786 fixes the time shifting problem?

Package                    Version  
-------------------------- ---------
alembic                    1.0.0    
amqp                       2.3.2    
asn1crypto                 0.24.0   
azure-common               1.1.17   
azure-storage-blob         1.4.0    
azure-storage-common       1.4.0    
Babel                      2.6.0    
billiard                   3.5.0.4  
bleach                     3.0.2    
boto3                      1.9.86   
botocore                   1.12.86  
celery                     4.2.0    
certifi                    2018.8.24
cffi                       1.11.5   
chardet                    3.0.4    
click                      6.7      
colorama                   0.3.9    
contextlib2                0.5.5    
croniter                   0.3.26   
cryptography               2.4.2    
decorator                  4.3.0    
defusedxml                 0.5.0    
docutils                   0.14     
Flask                      1.0.2    
Flask-AppBuilder           1.12.1   
Flask-Babel                0.11.1   
Flask-Caching              1.4.0    
Flask-Compress             1.4.0    
Flask-Login                0.4.1    
Flask-Migrate              2.1.1    
Flask-OpenID               1.2.5    
Flask-SQLAlchemy           2.3.2    
Flask-WTF                  0.14.2   
future                     0.17.1   
geopy                      1.11.0   
gunicorn                   19.8.0   
humanize                   0.5.1    
idna                       2.6      
ijson                      2.3      
isodate                    0.6.0    
itsdangerous               0.24     
Jinja2                     2.10     
jmespath                   0.9.3    
kombu                      4.2.1    
Mako                       1.0.7    
Markdown                   3.0      
MarkupSafe                 1.0      
numpy                      1.16.0   
pandas                     0.23.1   
parsedatetime              2.0      
pathlib2                   2.3.0    
pip                        19.0.1   
polyline                   1.3.2    
psycopg2                   2.7.7    
py                         1.7.0    
pyasn1                     0.4.5    
pyasn1-modules             0.2.4    
pycparser                  2.19     
pycryptodomex              3.7.3    
pydruid                    0.5.0    
PyJWT                      1.7.1    
pyOpenSSL                  19.0.0   
python-dateutil            2.6.1    
python-editor              1.0.3    
python-geohash             0.8.5    
python3-openid             3.1.0    
pytz                       2018.5   
PyYAML                     3.13     
requests                   2.20.0   
retry                      0.9.2    
s3transfer                 0.1.13   
selenium                   3.141.0  
setuptools                 40.7.0   
simplejson                 3.15.0   
six                        1.11.0   
snowflake-connector-python 1.7.5    
snowflake-sqlalchemy       1.1.6    
SQLAlchemy                 1.2.2    
SQLAlchemy-Utils           0.32.21  
sqlparse                   0.2.4    
unicodecsv                 0.14.1   
urllib3                    1.22     
vine                       1.1.4    
webencodings               0.5.1    
Werkzeug                   0.14.1   
wheel                      0.32.3   
WTForms                    2.2.1

@kdorsel
Copy link
Author

kdorsel commented Jan 31, 2019

Yes, removing the AT TIME ZONE 'UTC' fixes all the shifting issues.
Let me get back to you for py37. Travelling with another computer currently.

@kdorsel
Copy link
Author

kdorsel commented Feb 1, 2019

Ok, on a different system WSL ubuntu 18.04 vs macOS 10.13 on py37 I also have no errors. That is with and without the UTC fix.

I am using the current master branch though whereas the macOS testing was done with the latest releases. I will not be able to try the master branch on macOS for an extended period of time unfortunately.

@villebro
Copy link
Member

villebro commented Feb 1, 2019

Ok, if you were unable to repro the py37 issue on master I think it's safe to assume it's been fixed, and removing the AT TIME ZONE 'UTC' from grains should close this issue.

@kristw kristw added !deprecated-label:bug Deprecated label - Use #bug instead #bug:cant-reproduce Bugs that cannot be reproduced .postgres v0.29 v0.28 labels Feb 7, 2019
@kristw kristw closed this as completed Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#bug:cant-reproduce Bugs that cannot be reproduced !deprecated-label:bug Deprecated label - Use #bug instead v0.28 v0.29
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants