Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timeutil: embed time zone data into CockroachDB #55377

Closed
wants to merge 1 commit into from

Conversation

otan
Copy link
Contributor

@otan otan commented Oct 9, 2020

  • Use 4d63.com/tz to embed timezone data.
  • Modify timeutil.LoadLocation to use this data.
  • Remove cli checks for tzdata.
  • Add lint rule to use timeutil.LoadLocation

The binary size changes by ~1MB here.

full credits to @gliptak for linking.
Informs #36864
Refs #31978

Release note (general change): The timezone data is now built in to the
CockroachDB binary, thus no longer relying on the system tzdata for
locations.

@otan otan requested a review from knz October 9, 2020 15:31
@otan otan requested a review from a team as a code owner October 9, 2020 15:31
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor

knz commented Oct 9, 2020

Sorry I don't get it - why did you opt for a 3rd party package instead of using the facility we get for free in go 1.15?

@otan
Copy link
Contributor Author

otan commented Oct 9, 2020

I recall we're waiting for Go 1.16 before an upgrade.
I don't think that'll happen before v21.1.

@knz
Copy link
Contributor

knz commented Oct 9, 2020

@nvanbenschoten @petermattis do you think we could consider go 1.15 for crdb v21.1 if 1.16 is not there on time?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR as-is is a good intermediate step even if go 1.15 makes it even simpler in the future
:lgtm_strong:

Also @awoods187 @bdarnell motion to backport this to 20.2? The change seems minimal.

Reviewed 14 of 14 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

* Use 4d63.com/tz to embed timezone data.
* Modify timeutil.LoadLocation to use this data.
* Remove cli checks for tzdata.
* Add lint rule to use timeutil.LoadLocation

The binary size changes by ~1MB here.

Release note (general change): The timezone data is now built in to the
CockroachDB binary, thus no longer relying on the system tzdata for
locations.
@nvanbenschoten
Copy link
Member

do you think we could consider go 1.15 for crdb v21.1 if 1.16 is not there on time?

Yes, this is what we had decided upon when we switched back to go 1.13. We're going to switch to go 1.15 early in the next release cycle (a week or two after v20.2 is out the door). go 1.16 won't be released until 2021, so it's not clear whether it will land in time for us to use it in v21.1.

@otan
Copy link
Contributor Author

otan commented Oct 9, 2020

I think i'm still in favour of this solution as it is patchable to allow lower case time zone comparisons (#36847), as well as a good safety net in case go 1.15's partial fix does not work out and we need to go back to 1.13.

(also backportable to 20.2 if we chose to)

@knz
Copy link
Contributor

knz commented Oct 9, 2020

Yes your PR is good to go as-is. Your argument is sound.

@bdarnell
Copy link
Contributor

bdarnell commented Oct 9, 2020

The go 1.15 tzdata package is used as a fallback - if external time zone data is found on this system, it will be used instead. This package uses the embedded data as the sole source. That means you lose the ability to update the tzdata package as updates occur, which is sometimes significant as governments like to change DST rules on short notice. (and it appears that this package tracks the zoneinfo files as embedded in Go, rather than the upstream sources from IANA).

This feels like a significant regression to me, although maybe it's not one in practice since docker makes it impractical to manage tzdata updates in the traditional way and I've not heard of anyone who cares or even notices.

Neither this change nor the go 1.15 solution completely solve #31978. With external tzdata, there is a window of inconsistency when a new version of the tzdata package is rolled out, but this is a low-risk change so it (can be) done rapidly across the fleet. Embedding tzdata in the binary means that your time zone data is inconsistent for the duration of your CRDB canary and upgrade process, which can be quite long.

it is patchable to allow lower case time zone comparisons

But the author does not intend to keep updating it after go 1.16 is released, so this would put us on the hook for future maintenance.

So I'm 👎 on backporting this to 20.2, although it would have the nice side effect of letting us more easily use ubi-minimal in #54812.

@knz
Copy link
Contributor

knz commented Oct 9, 2020

ok so maybe this PR should not be marked to solve #31978 - there would be more work needed afterwards (store the tzdata in-db to ensure it's consistent across all nodes)

@knz
Copy link
Contributor

knz commented Oct 10, 2020

@bdarnell regardless of whether we adopt this PR or adopt go 1.15, what do you think the future looks like?

It seems to be that we are committed to running CockroachDB inside containers, and that goes with keeping the system image immutable at run-time.

So the only way to "pick up the latest IANA changes" is to re-bake the container base image, while keeping the CockroachDB version constant.

So do you think that's going to be our process (in theory)?

  1. every X months, re-bake the system image, pick up OS-level dependency changes
  2. push that new docker image to our users so they can pick up changes

If we were to recommend this, why do you think it's better to depend on system-level tzdata, instead of an archive we'd inject ourself at step (1)?

Finally, what do you recommend should be a good next step for @otan here?

@otan
Copy link
Contributor Author

otan commented Oct 10, 2020

We could introduce an env variable to use a tzdata from a user compiled archive instead of our embedded version if we so choose (or even the system one). I think a user can mount this onto each docker image (or in the case of k8s use a config map) if they wanted to pursue this avenue.

Alternatively, we could make using the system one the default and falling back to the embedded version in 20.2 if we so choose (of course, we should log this in cli/start or demo).

(To fully solve #31978 it sounds like tzdata should be stored in the database as part of a system table, which is also gross as updating it with IANA sounds like a painful primitive and needing to use internal executors everywhere to fetch this metadata. I suppose that solution fixes this too but I'm not sure if this is viable. )

@bdarnell
Copy link
Contributor

regardless of whether we adopt this PR or adopt go 1.15, what do you think the future looks like?

I think in the long-term future, in order to solve #31978 we need to support multiple versions of tzdata and switch between them in a coordinated way across all nodes, which precludes either using the system files or the go 1.5 package. But I think that's a long way off, so let's think about the nearer-term future.

If we were to recommend this, why do you think it's better to depend on system-level tzdata, instead of an archive we'd inject ourself at step (1)?

If we use the system-level data, each time we produce a patch release we get the latest tzdata from redhat automatically (along with security patches for base libraries, etc). If we embed something into the binary, we are either tied to the Go upgrade cycle (they appear to only plan to update the go embedded tzdata every 6 months, although I'm sure that if a major jurisdiction has a late-breaking DST change they'll do a faster release) or we have to do extra work on our side to pick up updates.

Finally, what do you recommend should be a good next step for @otan here?

I think I'd just wait for 21.1 and go 1.15. If we really wanted to backport embedded tzdata into 20.2.x we could add something that defaults to the system tzdata and falls back to the embedded one if it's missing, but this feels like a new feature that wouldn't qualify for a backport.

@otan
Copy link
Contributor Author

otan commented Oct 13, 2020

So I'm 👎 on backporting this to 20.2, although it would have the nice side effect of letting us more easily use ubi-minimal in #54812.

Looks like #55467 takes care of that.

I guess that just leaves the remaining question of whether this is "better" than having the tzdata behaviour depend on the system.

@otan
Copy link
Contributor Author

otan commented Oct 27, 2020

alrighty, since we're going to go1.15 soon, i'll leave this alone

@otan otan closed this Oct 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants