Skip to content

Commit

Permalink
docs: Add timezone information (apache#19056)
Browse files Browse the repository at this point in the history
Co-authored-by: Evan Rusackas <[email protected]>
  • Loading branch information
john-bodley and rusackas authored Oct 11, 2023
1 parent d796a59 commit 049c4f8
Showing 1 changed file with 50 additions and 0 deletions.
50 changes: 50 additions & 0 deletions docs/docs/miscellaneous/timezones.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
---
title: Timezones
hide_title: true
sidebar_position: 1
version: 1
---

## Timezones

There are four distinct timezone components which relate to Apache Superset,

1. The timezone that the underlying data is encoded in.
2. The timezone of the database engine.
3. The timezone of the Apache Superset backend.
4. The timezone of the Apache Superset client.

where if a temporal field (`DATETIME`, `TIME`, `TIMESTAMP`, etc.) does not explicitly define a timezone it defaults to the underlying timezone of the component.

To help make the problem somewhat tractable—given that Apache Superset has no control on either how the data is ingested (1) or the timezone of the client (4)—from a consistency standpoint it is highly recommended that both (2) and (3) are configured to use the same timezone with a strong preference given to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) to ensure temporal fields without an explicit timestamp are not incorrectly coerced into the wrong timezone. Actually Apache Superset currently has implicit assumptions that timestamps are in UTC and thus configuring (3) to a non-UTC timezone could be problematic.

To strive for data consistency (regardless of the timezone of the client) the Apache Superset backend tries to ensure that any timestamp sent to the client has an explicit (or semi-explicit as in the case with [Epoch time](https://en.wikipedia.org/wiki/Unix_time) which is always in reference to UTC) timezone encoded within.

The challenge however lies with the slew of [database engines](/docs/databases/installing-database-drivers#install-database-drivers) which Apache Superset supports and various inconsistencies between their [Python Database API (DB-API)](https://www.python.org/dev/peps/pep-0249/) implementations combined with the fact that we use [Pandas](https://pandas.pydata.org/) to read SQL into a DataFrame prior to serializing to JSON. Regrettably Pandas ignores the DB-API [type_code](https://www.python.org/dev/peps/pep-0249/#type-objects) relying by default on the underlying Python type returned by the DB-API. Currently only a subset of the supported database engines work correctly with Pandas, i.e., ensuring timestamps without an explicit timestamp are serializd to JSON with the server timezone, thus guaranteeing the client will display timestamps in a consistent manner irrespective of the client's timezone.

For example the following is a comparison of MySQL and Presto,

```
import pandas as pd
from sqlalchemy import create_engine
pd.read_sql_query(
sql="SELECT TIMESTAMP('2022-01-01 00:00:00') AS ts",
con=create_engine("mysql://root@localhost:3360"),
).to_json()
pd.read_sql_query(
sql="SELECT TIMESTAMP '2022-01-01 00:00:00' AS ts",
con=create_engine("presto://localhost:8080"),
).to_json()
```

which outputs `{"ts":{"0":1640995200000}}` (which infers the UTC timezone per the Epoch time definition) and `{"ts":{"0":"2022-01-01 00:00:00.000"}}` (without an explicit timezone) respectively and thus are treated differently in JavaScript:

```
new Date(1640995200000)
> Sat Jan 01 2022 13:00:00 GMT+1300 (New Zealand Daylight Time)
new Date("2022-01-01 00:00:00.000")
> Sat Jan 01 2022 00:00:00 GMT+1300 (New Zealand Daylight Time)
```

0 comments on commit 049c4f8

Please sign in to comment.