Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: tzinfo can be inconsistent across nodes #31978

Open
bobvawter opened this issue Oct 29, 2018 · 12 comments
Open

sql: tzinfo can be inconsistent across nodes #31978

bobvawter opened this issue Oct 29, 2018 · 12 comments
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-semantics C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) docs-done docs-known-limitation T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@bobvawter
Copy link
Contributor

bobvawter commented Oct 29, 2018

CockroachDB relies on each node's tzinfo data for resolving named timezones like America/New_York and for applying daylight-savings rules. This can lead to inconsistent results across a cluster when coercing date/time strings into temporal types.

The desired solution is to store tzinfo data as a system table, to be bootstrapped and maintained by a tzinfo distribution baked into the cockroach binary.

This would also make it easier to implement #31710 to support (custom) timezone abbreviations.

Jira issue: CRDB-4766

@bobvawter bobvawter added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-semantics A-sql-pgcompat Semantic compatibility with PostgreSQL S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors labels Oct 29, 2018
@bobvawter bobvawter added this to the 2.2 milestone Oct 29, 2018
craig bot pushed a commit that referenced this issue Oct 30, 2018
31758: sql: Generalize date/time parsing r=bobvawter a=bobvawter

sql: Generalize date/time parsing

The current date/time parsing code relies on `time.ParseInLocation()`. It does
not support all of the various date/time formats accepted by PostgreSQL and
also requires multiple invocation to try the various date/time formats that we
do accept.

This change updates the date/time parsing code with a new implementation that
does not delegate to `time.ParseInLocation()` and is able to parse all
supported formats in a single pass.

In order to support parsing named timezones like `America/New_York`, we
delegate to `time.LoadLocation()` as we did previously.  `LoadLocation()` is
rather expensive, since it looks for tzinfo files on disk every time it is
invoked. A per-node, in-memory cache has been added to amortize this overhead.
Per #31978, the tzinfo used on each node could already be inconsistent,
depending on the tzinfo files present in the underlying OS.

The following table compares the new `ParseTimestamp()` function to calling
`ParseInLocation()`.  While it is true that `ParseInLocation()` is generally
faster for any given pattern, the current parsing code must call it repeatedly,
trying each supported date format until one succeeds. The test with the named
timezone also shows the significant overhead of calling `LoadLocation()`.

```
2003-06-12/ParseTimestamp-8             10000000               122 ns/op          81.53 MB/s
2003-06-12/ParseInLocation-8            30000000                35.6 ns/op       281.29 MB/s
2003-06-12_01:02:03/ParseTimestamp-8            10000000               163 ns/op         116.45 MB/s
2003-06-12_01:02:03/ParseInLocation-8           30000000                54.4 ns/op       349.16 MB/s
2003-06-12_04:05:06.789-04:00/ParseTimestamp-8          10000000               238 ns/op         121.69 MB/s
2003-06-12_04:05:06.789-04:00/ParseInLocation-8         10000000               161 ns/op         180.05 MB/s
2000-01-01T02:02:02.567+09:30/ParseTimestamp-8           5000000               233 ns/op         124.01 MB/s
2000-01-01T02:02:02.567+09:30/ParseInLocation-8         10000000               158 ns/op         182.41 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseTimestamp-8                3000000               475 ns/op          84.06 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseInLocation-8                200000              7313 ns/op           3.15 MB/s
```

The tests in `parsing_test.go` have an optional mode to cross-check the test
data aginst a PostgreSQL server.  This is useful for developing, but is not
part of the automated build.

Parsing of BC dates is supported, #28099 could then be completed by changing
the date-formatting code to print a BC date.

This change would allow #30697 (incomplete handling of datestyle) to be
re-evaluated, since the parser does allow configuration of YMD, DMY, or MDY
input styles.

Resolves #27500
Resolves #27501
Resolves #31954

Release note (sql change): A wider variety of date, time, and timestamp formats
are now accepted by the SQL frontend.

Release note (bug fix): Prepared statements that bind temporal values now
respect the session's timezone setting. Previously, bound temporal values were
always interpreted as though the session time zone were UTC.

Release note (backward-incompatible change): Timezone abbreviations, such as
EST, are no longer allowed when parsing or converting to a date/time type.
Previously, an abbreviation would be accepted if it were an alias for the
session's timezone.


Co-authored-by: Bob Vawter <[email protected]>
@knz knz changed the title tzinfo can be inconsistent across nodes sql: tzinfo can be inconsistent across nodes Nov 12, 2018
@knz knz added S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. docs-known-limitation and removed S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. labels Nov 12, 2018
@knz
Copy link
Contributor

knz commented Nov 12, 2018

@jseldess until this issue is fixed, we should call out in the docs for operational matters that the OS database should be updated, and all nodes should be restarted when the timezone data changes (e.g. the definition of a timezone changes because of geopolitical matters, or some country decides to change their DST rules).

@knz
Copy link
Contributor

knz commented Nov 12, 2018

@bobvawter can you clarify why you used the label "pgcompat" here? What does PostgreSQL does in this case?

@bobvawter
Copy link
Contributor Author

bobvawter commented Nov 12, 2018 via email

@knz
Copy link
Contributor

knz commented Nov 12, 2018

ok thanks for clarifying.

@bdarnell
Copy link
Contributor

This would also address #32415.

Keyword stuffing for future searches: zoneinfo, tzdata.

@jseldess
Copy link
Contributor

@bdarnell, is cockroachdb/docs#4567 enough here from a docs perspective? Do we need to call this out as a known limitation as well?

@bdarnell
Copy link
Contributor

It's separate from cockroachdb/docs#4567. I'm not sure if it's a known limitation, more of a production best practice ("ensure that all nodes have the same version of the tzdata package; when updating this package roll it out as quickly as possible across all nodes").

While you're thinking about tzdata docs, you could also cover #32415 (which is a known limitation, that location-based time zone names may not resolve for a cockroachdb server running on windows. There is a workaround, which is to install the Go toolchain on the machine(s) running the server).

@knz
Copy link
Contributor

knz commented Apr 16, 2019

The discussion should continue on #36864.

@knz
Copy link
Contributor

knz commented Oct 10, 2020

@otan let's chat here about options to keep this in sync across nodes.

It's not because we may want to store today in the distributed KV that we need KV primitives to access it upon every tx lookup.

They way I could see this work is the way we handle the HBA config already: store the tz data as a cluster setting which takes care of persistence, let it propagate via gossip, and use an in-RAM cache for lookups.

@knz knz removed this from the 19.2 milestone May 4, 2021
@rafiss rafiss added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label May 12, 2021
@rafiss
Copy link
Collaborator

rafiss commented May 19, 2021

i believe #56634 closes this -- @otan @knz please reopen if needed

@rafiss rafiss closed this as completed May 19, 2021
@bdarnell
Copy link
Contributor

#56634 improves the situation, but inconsistency between nodes is still possible at version upgrade time.

@bdarnell bdarnell reopened this May 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-semantics C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) docs-done docs-known-limitation T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

5 participants