sql: tzinfo can be inconsistent across nodes #31978

bobvawter · 2018-10-29T17:42:41Z

CockroachDB relies on each node's tzinfo data for resolving named timezones like America/New_York and for applying daylight-savings rules. This can lead to inconsistent results across a cluster when coercing date/time strings into temporal types.

The desired solution is to store tzinfo data as a system table, to be bootstrapped and maintained by a tzinfo distribution baked into the cockroach binary.

This would also make it easier to implement #31710 to support (custom) timezone abbreviations.

Jira issue: CRDB-4766


31758: sql: Generalize date/time parsing r=bobvawter a=bobvawter

sql: Generalize date/time parsing

The current date/time parsing code relies on `time.ParseInLocation()`. It does not support all of the various date/time formats accepted by PostgreSQL and also requires multiple invocations to try the various date/time formats that we do accept.

This change updates the date/time parsing code with a new implementation that does not delegate to `time.ParseInLocation()` and is able to parse all supported formats in a single pass.

In order to support parsing named timezones like `America/New_York`, we delegate to `time.LoadLocation()` as we did previously. `LoadLocation()` is rather expensive, since it looks for tzinfo files on disk every time it is invoked. A per-node, in-memory cache has been added to amortize this overhead. Per #31978, the tzinfo used on each node could already be inconsistent, depending on the tzinfo files present in the underlying OS.

The following table compares the new `ParseTimestamp()` function to calling `ParseInLocation()`. While it is true that `ParseInLocation()` is generally faster for any given pattern, the current parsing code must call it repeatedly, trying each supported date format until one succeeds. The test with the named timezone also shows the significant overhead of calling `LoadLocation()`.

```
2003-06-12/ParseTimestamp-8                                    10000000    122 ns/op    81.53 MB/s
2003-06-12/ParseInLocation-8                                   30000000   35.6 ns/op   281.29 MB/s
2003-06-12_01:02:03/ParseTimestamp-8                           10000000    163 ns/op   116.45 MB/s
2003-06-12_01:02:03/ParseInLocation-8                          30000000   54.4 ns/op   349.16 MB/s
2003-06-12_04:05:06.789-04:00/ParseTimestamp-8                 10000000    238 ns/op   121.69 MB/s
2003-06-12_04:05:06.789-04:00/ParseInLocation-8                10000000    161 ns/op   180.05 MB/s
2000-01-01T02:02:02.567+09:30/ParseTimestamp-8                  5000000    233 ns/op   124.01 MB/s
2000-01-01T02:02:02.567+09:30/ParseInLocation-8                10000000    158 ns/op   182.41 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseTimestamp-8       3000000    475 ns/op    84.06 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseInLocation-8       200000   7313 ns/op     3.15 MB/s
```

The tests in `parsing_test.go` have an optional mode to cross-check the test data against a PostgreSQL server. This is useful for developing, but is not part of the automated build.

Parsing of BC dates is supported; #28099 could then be completed by changing the date-formatting code to print a BC date.

This change would allow #30697 (incomplete handling of datestyle) to be re-evaluated, since the parser does allow configuration of YMD, DMY, or MDY input styles.

Resolves #27500
Resolves #27501
Resolves #31954

Release note (sql change): A wider variety of date, time, and timestamp formats are now accepted by the SQL frontend.

Release note (bug fix): Prepared statements that bind temporal values now respect the session's timezone setting. Previously, bound temporal values were always interpreted as though the session time zone were UTC.

Release note (backward-incompatible change): Timezone abbreviations, such as EST, are no longer allowed when parsing or converting to a date/time type. Previously, an abbreviation would be accepted if it were an alias for the session's timezone.

Co-authored-by: Bob Vawter <[email protected]>
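The per-node cache mentioned in the commit message can be pictured roughly as follows. This is a minimal sketch of the general technique, not the code from #31758; the `locationCache` type and its methods are hypothetical names, and the `time/tzdata` import (Go 1.15+) is an assumption added only so the example runs without OS tzinfo files.

```go
package main

import (
	"fmt"
	"sync"
	"time"
	_ "time/tzdata" // assumption: embedded fallback so the sketch runs anywhere
)

// locationCache is a hypothetical per-process cache in front of
// time.LoadLocation, so the tzinfo files are consulted at most once per
// successfully resolved zone name.
type locationCache struct {
	mu   sync.RWMutex
	locs map[string]*time.Location
}

func newLocationCache() *locationCache {
	return &locationCache{locs: make(map[string]*time.Location)}
}

// Load returns the cached location for name, falling back to
// time.LoadLocation on a miss. Failed lookups are not cached.
func (c *locationCache) Load(name string) (*time.Location, error) {
	c.mu.RLock()
	loc, ok := c.locs[name]
	c.mu.RUnlock()
	if ok {
		return loc, nil
	}
	loc, err := time.LoadLocation(name)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.locs[name] = loc
	c.mu.Unlock()
	return loc, nil
}

func main() {
	cache := newLocationCache()
	first, err := cache.Load("America/New_York")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	second, _ := cache.Load("America/New_York")
	fmt.Println("served from cache:", first == second)
}
```

A cache of this shape keeps each loaded location for the life of the process, so updated tzinfo files on disk would not be picked up until the process restarts.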

knz · 2018-11-12T11:44:00Z

@jseldess until this issue is fixed, we should call out in the docs for operational matters that the OS database should be updated, and all nodes should be restarted when the timezone data changes (e.g. the definition of a timezone changes because of geopolitical matters, or some country decides to change their DST rules).

knz · 2018-11-12T11:44:19Z

@bobvawter can you clarify why you used the label "pgcompat" here? What does PostgreSQL do in this case?

bobvawter · 2018-11-12T13:30:48Z

I applied "pgcompat" here because this is an example of where CockroachDB and PostgreSQL can produce different results for the same inputs. pg includes its own tzinfo distribution as part of the build process.


knz · 2018-11-12T13:35:20Z

ok thanks for clarifying.

bdarnell · 2019-03-24T15:13:09Z

This would also address #32415.

Keyword stuffing for future searches: zoneinfo, tzdata.

jseldess · 2019-03-24T22:14:20Z

@bdarnell, is cockroachdb/docs#4567 enough here from a docs perspective? Do we need to call this out as a known limitation as well?

bdarnell · 2019-03-25T17:13:08Z

It's separate from cockroachdb/docs#4567. I'm not sure if it's a known limitation, more of a production best practice ("ensure that all nodes have the same version of the tzdata package; when updating this package roll it out as quickly as possible across all nodes").

While you're thinking about tzdata docs, you could also cover #32415, which is a known limitation: location-based time zone names may not resolve for a CockroachDB server running on Windows. There is a workaround, which is to install the Go toolchain on the machine(s) running the server.

knz · 2019-04-16T09:07:12Z

The discussion should continue on #36864.

jseldess · 2019-11-05T18:06:57Z

Documented in several places:

knz · 2020-10-10T15:11:18Z

@otan let's chat here about options to keep this in sync across nodes.

Just because we may want to store the tz data in the distributed KV does not mean we need KV primitives to access it upon every tx lookup.

The way I could see this work is the way we handle the HBA config already: store the tz data as a cluster setting which takes care of persistence, let it propagate via gossip, and use an in-RAM cache for lookups.

rafiss · 2021-05-19T16:38:58Z

I believe #56634 closes this -- @otan @knz please reopen if needed

bdarnell · 2021-05-19T16:41:51Z

#56634 improves the situation, but inconsistency between nodes is still possible at version upgrade time.

bobvawter added this to the 2.2 milestone Oct 29, 2018

bobvawter added the docs-todo label Oct 30, 2018

bobvawter mentioned this issue Oct 30, 2018

sql: Generalize date/time parsing #31758

Merged

bobvawter mentioned this issue Oct 31, 2018

sql: Support the "at tz" operator #32005

Closed

knz changed the title ~~tzinfo can be inconsistent across nodes~~ sql: tzinfo can be inconsistent across nodes Nov 12, 2018

jseldess mentioned this issue Nov 12, 2018

sql: tzinfo can be inconsistent across nodes cockroachdb/docs#4025

Open

knz mentioned this issue Apr 16, 2019

sql: stop relying on time.LoadLocation for time zone data #36864

Closed

knz modified the milestones: 19.1, 19.2 Apr 16, 2019

jseldess added docs-done and removed docs-todo labels Nov 5, 2019

otan mentioned this issue Feb 11, 2020

an active time zone incorrectly changes interval math in UPDATE cockroachdb/django-cockroachdb#124

Closed

otan mentioned this issue Oct 9, 2020

timeutil: embed time zone data into CockroachDB #55377

Closed

knz removed this from the 19.2 milestone May 4, 2021

rafiss added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label May 12, 2021

rafiss closed this as completed May 19, 2021

bdarnell reopened this May 19, 2021
