sql: tzinfo can be inconsistent across nodes #31978
Comments
31758: sql: Generalize date/time parsing r=bobvawter a=bobvawter

sql: Generalize date/time parsing

The current date/time parsing code relies on `time.ParseInLocation()`. It does not support all of the various date/time formats accepted by PostgreSQL and also requires multiple invocations to try the various date/time formats that we do accept.

This change updates the date/time parsing code with a new implementation that does not delegate to `time.ParseInLocation()` and is able to parse all supported formats in a single pass. In order to support parsing named timezones like `America/New_York`, we delegate to `time.LoadLocation()` as we did previously. `LoadLocation()` is rather expensive, since it looks for tzinfo files on disk every time it is invoked. A per-node, in-memory cache has been added to amortize this overhead. Per #31978, the tzinfo used on each node could already be inconsistent, depending on the tzinfo files present in the underlying OS.

The following table compares the new `ParseTimestamp()` function to calling `ParseInLocation()`. While it is true that `ParseInLocation()` is generally faster for any given pattern, the current parsing code must call it repeatedly, trying each supported date format until one succeeds. The test with the named timezone also shows the significant overhead of calling `LoadLocation()`.

```
2003-06-12/ParseTimestamp-8                                  10000000    122 ns/op    81.53 MB/s
2003-06-12/ParseInLocation-8                                 30000000   35.6 ns/op   281.29 MB/s
2003-06-12_01:02:03/ParseTimestamp-8                         10000000    163 ns/op   116.45 MB/s
2003-06-12_01:02:03/ParseInLocation-8                        30000000   54.4 ns/op   349.16 MB/s
2003-06-12_04:05:06.789-04:00/ParseTimestamp-8               10000000    238 ns/op   121.69 MB/s
2003-06-12_04:05:06.789-04:00/ParseInLocation-8              10000000    161 ns/op   180.05 MB/s
2000-01-01T02:02:02.567+09:30/ParseTimestamp-8                5000000    233 ns/op   124.01 MB/s
2000-01-01T02:02:02.567+09:30/ParseInLocation-8              10000000    158 ns/op   182.41 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseTimestamp-8     3000000    475 ns/op    84.06 MB/s
2003-06-12_04:05:06.789_America/New_York/ParseInLocation-8     200000   7313 ns/op     3.15 MB/s
```

The tests in `parsing_test.go` have an optional mode to cross-check the test data against a PostgreSQL server. This is useful for development, but is not part of the automated build.

Parsing of BC dates is supported; #28099 could then be completed by changing the date-formatting code to print a BC date. This change would allow #30697 (incomplete handling of datestyle) to be re-evaluated, since the parser does allow configuration of YMD, DMY, or MDY input styles.

Resolves #27500
Resolves #27501
Resolves #31954

Release note (sql change): A wider variety of date, time, and timestamp formats are now accepted by the SQL frontend.

Release note (bug fix): Prepared statements that bind temporal values now respect the session's timezone setting. Previously, bound temporal values were always interpreted as though the session time zone were UTC.

Release note (backward-incompatible change): Timezone abbreviations, such as EST, are no longer allowed when parsing or converting to a date/time type. Previously, an abbreviation would be accepted if it were an alias for the session's timezone.

Co-authored-by: Bob Vawter <[email protected]>
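For readers unfamiliar with why caching `LoadLocation()` helps, here is a minimal sketch, not the PR's actual code, of what a per-node in-memory cache over `time.LoadLocation()` can look like. The package and function names (`tzcache`, `LoadCached`, `locCache`) are hypothetical, and the sketch memoizes only successful lookups:

```go
package tzcache // hypothetical package name

import (
	"sync"
	"time"
)

// locCache memoizes successful time.LoadLocation results so that repeated
// lookups of the same zone name do not re-read tzinfo files from disk.
var locCache sync.Map // map[string]*time.Location

// LoadCached resolves a named timezone, consulting the OS tzinfo database
// only the first time a given name is seen by this process.
func LoadCached(name string) (*time.Location, error) {
	if v, ok := locCache.Load(name); ok {
		return v.(*time.Location), nil
	}
	loc, err := time.LoadLocation(name) // expensive: scans tzinfo files on disk
	if err != nil {
		return nil, err
	}
	locCache.Store(name, loc)
	return loc, nil
}
```

Note that a cache like this also pins whatever tzinfo the node read first, so OS-level tzdata updates are not picked up until the process restarts, which is part of the consistency concern tracked by this issue.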
@jseldess until this issue is fixed, we should call out in the operational docs that the OS timezone database should be updated, and all nodes restarted, when the timezone data changes (e.g. the definition of a timezone changes for geopolitical reasons, or some country decides to change its DST rules). |
@bobvawter can you clarify why you used the label "pgcompat" here? What does PostgreSQL do in this case? |
I applied "pgcompat" here because this is an example of where CockroachDB
and PostgreSQL can produce difference results for the same inputs. pg
includes its own tzinfo distribution as part of the build process.
|
ok thanks for clarifying. |
This would also address #32415. Keyword stuffing for future searches: zoneinfo, tzdata. |
@bdarnell, is cockroachdb/docs#4567 enough here from a docs perspective? Do we need to call this out as a known limitation as well? |
It's separate from cockroachdb/docs#4567. I'm not sure if it's a known limitation, more of a production best practice ("ensure that all nodes have the same version of the tzdata package; when updating this package, roll it out as quickly as possible across all nodes"). While you're thinking about tzdata docs, you could also cover #32415 (which is a known limitation: location-based time zone names may not resolve for a CockroachDB server running on Windows. There is a workaround, which is to install the Go toolchain on the machine(s) running the server). |
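A side note on the Windows workaround above: since Go 1.15 (which postdates this discussion), a Go binary can embed the IANA database directly, removing the dependence on both OS tzinfo files and an installed Go toolchain. This is shown only to illustrate the option, not as what CockroachDB actually does:

```go
package main

import (
	"fmt"
	"time"

	_ "time/tzdata" // Go 1.15+: links Go's bundled IANA database into the binary
)

func main() {
	// With time/tzdata linked in, this lookup succeeds even on a host with no
	// OS tzinfo files and no Go toolchain installed (e.g. a bare Windows box).
	loc, err := time.LoadLocation("America/New_York")
	if err != nil {
		panic(err)
	}
	fmt.Println(time.Date(2018, time.November, 4, 6, 30, 0, 0, time.UTC).In(loc))
}
```

The embedded copy is whatever tzdata version the Go toolchain shipped with, so it only keeps a cluster consistent if every node runs the same binary.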
The discussion should continue on #36864. |
Documented in several places: |
@otan let's chat here about options to keep this in sync across nodes. Just because we may want to store the data in the distributed KV doesn't mean we need KV primitives to access it on every transaction lookup. The way I could see this working is the way we already handle the HBA config: store the tz data as a cluster setting, which takes care of persistence, let it propagate via gossip, and use an in-RAM cache for lookups. |
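A rough, hypothetical sketch of the pattern described above (generic Go, not CockroachDB internals): the serialized tzinfo lives in a cluster setting, and each node rebuilds an in-RAM map of `*time.Location` whenever the setting changes, so per-statement lookups never touch KV. The type and method names are invented for illustration:

```go
package tzsync // hypothetical package name

import (
	"sync"
	"time"
)

// tzStore is a per-node, in-RAM view of the cluster-wide tzinfo payload.
type tzStore struct {
	mu   sync.RWMutex
	locs map[string]*time.Location
}

// onSettingChange would be invoked by a settings-changed callback whenever a
// new tzinfo payload arrives via gossip. raw maps zone names to their
// TZif-encoded bytes, as they might be stored in the cluster setting.
func (s *tzStore) onSettingChange(raw map[string][]byte) error {
	locs := make(map[string]*time.Location, len(raw))
	for name, tzif := range raw {
		loc, err := time.LoadLocationFromTZData(name, tzif)
		if err != nil {
			return err
		}
		locs[name] = loc
	}
	s.mu.Lock()
	s.locs = locs // swap in the new snapshot
	s.mu.Unlock()
	return nil
}

// lookup serves SQL-level timezone resolution entirely from memory.
func (s *tzStore) lookup(name string) (*time.Location, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	loc, ok := s.locs[name]
	return loc, ok
}
```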
#56634 improves the situation, but inconsistency between nodes is still possible at version upgrade time. |
CockroachDB relies on each node's tzinfo data for resolving named timezones like `America/New_York` and for applying daylight-saving rules. This can lead to inconsistent results across a cluster when coercing date/time strings into temporal types.

The desired solution is to store tzinfo data as a system table, to be bootstrapped and maintained by a tzinfo distribution baked into the cockroach binary.
This would also make it easier to implement #31710 to support (custom) timezone abbreviations.
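A standalone Go illustration of the dependence described above (not CockroachDB code): the result of coercing the same string depends entirely on the tzinfo files installed on the node that evaluates it.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// The location is resolved from the OS tzinfo files on this node. If a
	// jurisdiction changes its DST rules and only some nodes have the updated
	// tzdata package, the same input string maps to different UTC instants on
	// different nodes.
	loc, err := time.LoadLocation("America/New_York")
	if err != nil {
		panic(err)
	}
	t, err := time.ParseInLocation("2006-01-02 15:04:05", "2018-11-04 01:30:00", loc)
	if err != nil {
		panic(err)
	}
	fmt.Println(t.UTC()) // the offset applied here comes from node-local tzdata
}
```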
Jira issue: CRDB-4766