Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: time: handle time zones better in Parse and LoadLocation #63345

Closed
robpike opened this issue Oct 3, 2023 · 13 comments
Closed

proposal: time: handle time zones better in Parse and LoadLocation #63345

robpike opened this issue Oct 3, 2023 · 13 comments
Labels
Milestone

Comments

@robpike
Copy link
Contributor

robpike commented Oct 3, 2023

Consider this unedited typescript running on my Mac with macOS and a recent Go distribution:

% cat x.go
package main

import (
	"fmt"
	"log"
	"time"
)

func main() {
	t, err := time.Parse("2006-01-02 15:04:05 MST", "2023-09-27 18:00:00 CEST")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(t)
}
% go run x.go
2023-09-27 18:00:00 +0000 CEST
% TZ=CEST go run x.go
2023-09-27 18:00:00 +0000 CEST
% TZ=Europe/Berlin go run x.go
2023-09-27 18:00:00 +0200 CEST
% 

There are a few mysteries here for the uninitiated.

First, the time zone in the result - which is printed by time.Time.String so is actually what is held in the time value - always reads CEST. That seems good for starters.

Yet second, only the last of the three runs prints a correct time zone offset for CEST. The first two have it at zero, which is always wrong for CEST.

Third, even when I explicitly set the time zone through the TZ variable, it still doesn't get the offset right.

Fourth, even in the last run, the time zone prints as CEST although one might expect Europe/Berlin.

Fifth, perhaps strangest of all, this is working correctly according to the documentation in the time package.

This is a proposal to address, if not completely clear up, these mysteries.

Allow me to explain. (If you don't know much about time zones and how they work in modern computing, I suggest reading this IETF document first.)

The concept of time zone is not a first-class citizen in the time package. Instead, "location" is paramount. To get the correct rendering of a time, we must in effect ask for the time in Berlin, not in Central European Standard Time. In library terms, we call LoadLocation, not LoadTimeZone. There are a number of reasons for this, and it is confusing, but the fundamental one is that is how the IETF time zone (ha!) database works. That is why the only run above that gets everything right is the one where we set the location, even though that is ironically, incorrectly and confusingly done through the legacy TZ environment variable, whose initials of course stand for "time zone".

What happens in that third run is that because our location is Berlin, we have loaded the data for Berlin, and in that data is a description of the time zone called CEST. There is no time zone file for CEST itself. The only description of CEST is inside the data for Berlin. (Have a look for yourself; the data is in /usr/share/zoneinfo/europe/Berlin on Unix, although it's in binary; you need a program called zdump to examine it.)

In the first run, my location is not in central Europe, the data for CEST is never loaded, and time.Parse humors me but delivers an unsatisfactory result. I'll explain why in a minute. On the other hand, if you happen to live in central Europe, that first run will in fact work properly for you.

In the second run, a reasonable attempt to address this issue through the time-zone-looking TZ variable still fails, because CEST is not a location and therefore cannot be LoadLocationed.

You might ask, "Why doesn't time.Parse give an error then, if CEST is not a valid location?" It's because of this paragraph, the last in the doc comment for time.Parse:

When parsing a time with a zone abbreviation like MST, if the
zone abbreviation has a defined offset in the current location,
then that offset is used. The zone abbreviation "UTC" is recognized
as UTC regardless of location. If the zone abbreviation is unknown,
Parse records the time as being in a fabricated location with the given
zone abbreviation and a zero offset. This choice means that such a time can
be parsed and reformatted with the same layout losslessly, but the exact
instant used in the representation will differ by the actual zone offset.
To avoid such problems, prefer time layouts that use a numeric zone offset,
or use ParseInLocation.

I claim that this is an unfortunate design decision, albeit one made in good faith with a tinge of ignorance back in the early days of Go's development.

(You might also ask, "Why does time.Parse accept time zone strings and not locations?" There the answer is simpler: That's what time values look like, confusing though it is in this context.)

Compatibility concerns might argue we can't fix this behavior, but I believe it's confusing and results in apparently correct but actually wrong results, and should be fixed. But we can't just switch the whole thing to time zones for the very reason that locations were chosen as the fundamental unit. But we can do something about time zones.

Here is the proposal.

If you look in the time zone database manually, you'll find that all the time zone abbreviations like EST and AEST and CEST are unique, or at least nearly all are. In the data we can see the offset, which is a fixed value. For instance, EST is always offset by 5 hours from UTC, while EDT is always offset by 4 hours. I propose:

  1. When building the (misnamed) time zone data into the Go library, we actually include the time zone information as well as the location data for all unique time zone strings. It could just be a small map of names to offsets. There is a historical component we could use, but the data is actually stable enough that I wouldn't worry about it, and just use the most recent offset values.
  2. If Parse or LoadLocation is given a location/time zone that is not loadable, instead of incorrectly using a zero offset, we look up the name in that table. If it exists, we build a nonce Location with the provided name and fixed offset.
  3. If it does not exist, for compatibility we will need to keep the original incorrect behavior by assuming a zero offset. If we are willing to break compatibility, we could give an error. I'm not certain which is the best answer here.

The properties of these nonce locations are worth enumerating:

  1. They will not correctly handle daylight savings time switching. They are a fixed offset. If you ask for EST in the US summer, you will get the five-hour offset not the four-hour one you would if you used America/New_York as your location. But what gets printed will be logical, consistent, and reasonable. It will be a valid time with a matching time zone name and offset.
  2. If you ask for EST while in the location America/New_York in the summer, the time will come back with the correct location that prints as EDT. In other words, the behavior will still depend on your location, just not so confusingly.
  3. All three runs of the program above would always give a two hour offset for CEST (daylight savings time aside), regardless of the location running the program.
  4. Because they are not correct locations, they may actually make things worse in some cases by having people depend more on these nonce locations than they should, but that may be a tradeoff worth making.

I believe this approach is easy, reasonable, less confusing, and worth doing.

(Developed in conversation with @rsc).

@fzipp
Copy link
Contributor

fzipp commented Oct 3, 2023

  1. If Parse or LoadLocation is given a location/time zone that is not loadable, instead of incorrectly using a zero offset, we look up the name in that table. If it exists, we build a nonce Location with the provided name and fixed offset.

An alternative would be:

  • Add a new function LoadFixedZone(abbrev string) (*Location, error) that returns time.FixedZone(abbrev, offset) with the determined offset from that table.

  • LoadLocation continues to return what it returns today.

  • Parse attempts to use the new LoadFixedZone when encountering a zone string, as it's the most specific option and corresponds to what is produced by Time.Format. If that fails, it falls back to using LoadLocation.

This approach keeps the concepts of 'time zone location' and 'time zone' separate within the API, at least at the function level. In the IETF database, some strings (such as "CET") serve as both a time zone location and a time zone. By providing two functions, users can decide whether they want to look up "CET" as a time zone location or "CET" as an actual time zone.

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Oct 3, 2023
@ianlancetaylor
Copy link
Contributor

I'm concerned about the ambiguities. For example, CST is both +0800 (China Standard Time) and -0600 (Central Standard Time in the U.S.). PST is both -0800 (Pacific Standard Time in the U.S.) and +0800 (Philippine Standard Time).

Running a little program over the timezone database turns up these duplicate identifiers:

AMT appears in [Europe/Athens Africa/Asmara America/Asuncion Europe/Amsterdam]
APT appears in [US/Alaska Canada/Atlantic]
AST appears in [US/Alaska Canada/Atlantic]
AWT appears in [US/Alaska Canada/Atlantic]
BMT appears in [Europe/Brussels Europe/Tiraspol Europe/Zurich Africa/Banjul America/Bogota Asia/Baghdad Asia/Bangkok Asia/Jakarta Atlantic/Bermuda]
BST appears in [US/Aleutian America/La_Paz Atlantic/Bermuda GB-Eire]
CDT appears in [US/Indiana-Starke Cuba ROC]
CMT appears in [America/Rosario America/Caracas America/Panama America/La_Paz America/St_Lucia Europe/Tiraspol Europe/Copenhagen]
CST appears in [Cuba ROC US/Michigan]
FMT appears in [Africa/Freetown Atlantic/Madeira]
HDT appears in [US/Aleutian US/Hawaii]
HMT appears in [Cuba Asia/Kolkata Atlantic/Azores Europe/Mariehamn]
IMT appears in [Asia/Irkutsk Turkey]
IST appears in [Europe/Dublin Asia/Kolkata Israel]
JMT appears in [Israel Atlantic/St_Helena]
KMT appears in [Jamaica America/St_Vincent Europe/Zaporozhye Europe/Vilnius]
MMT appears in [America/Montevideo Asia/Colombo W-SU Indian/Maldives Africa/Monrovia America/Managua Asia/Kolkata Asia/Ujung_Pandang Europe/Minsk]
MST appears in [W-SU US/Mountain]
NPT appears in [US/Aleutian Canada/Newfoundland]
NST appears in [US/Aleutian Canada/Newfoundland Europe/Amsterdam]
NWT appears in [US/Aleutian Canada/Newfoundland]
PDT appears in [US/Pacific Asia/Manila]
PMT appears in [Asia/Pontianak Asia/Yekaterinburg Europe/Prague Europe/Paris America/Paramaribo]
PST appears in [US/Pacific Asia/Manila]
RMT appears in [Asia/Yangon Europe/Riga Europe/Vatican]
SAST appears in [Africa/Windhoek Africa/Johannesburg]
SMT appears in [Chile/Continental Singapore Atlantic/Stanley Europe/Simferopol]
TMT appears in [Iran Europe/Tallinn]

@robpike
Copy link
Contributor Author

robpike commented Oct 3, 2023

@ianlancetaylor As I said in the proposal text, the offset would only be created for unique names, although there are more duplicates than I suspected.

@fzipp That does seem like a clean proposal, but in practice I suspect the pattern would be to call LoadLocation and if that fails, LoadFixedZone. Putting the functionality into LoadLocation seems worthwhile.

@ianlancetaylor
Copy link
Contributor

@robpike Yeah, I understood the intended handling of duplicate names, but it troubles me that PST and PDT, which I would expect to be among the more common timezone identifiers that people use, are ambiguous.

@rittneje
Copy link

rittneje commented Oct 3, 2023

I'm not sure what the use case is for this kind of string, but it seems like the caller is the only one in a position to resolve the ambiguity. To that end, maybe there should be another flavor of time.Parse that takes something like func(string) (*time.Location, error), where the parameter to the callback is the timezone from the string being parsed (e.g., "EST").

(Of course, in the simplest case, they could just parse the timezone themselves beforehand and use time.ParseInLocation.)

@robpike
Copy link
Contributor Author

robpike commented Oct 4, 2023

It's also possible to disambiguate the names using the installed location, as hinted by @rittneje but more automatically.

First, though, there needs to be some acknowledgement that there is a problem that needs addressing. To me, that's clear, but maybe not to others, or maybe not of sufficient importance. But that typescript above is not a happy thing.

@ianlancetaylor
Copy link
Contributor

I, at least, agree that there is a problem here. Possibly related: #19694 #34913 #56528

@rittneje
Copy link

rittneje commented Oct 4, 2023

It's also possible to disambiguate the names using the installed location

@robpike can you elaborate on what you mean by this?

@rsc
Copy link
Contributor

rsc commented Oct 27, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Oct 27, 2023
@unixdj
Copy link

unixdj commented Nov 20, 2023

@robpike

I would suggest that Go handles the TZ environment variable like libc/tzcode does, so that it's consistent with the rest of the system. When TZ is set to Europe/Berlin, date(1) prints the timezone abbreviation as CET or CEST, thus I would say that in the third example in your original comment Go is behaving correctly.

$ TZ=Europe/Berlin date
Mon Nov 20 21:40:18 CET 2023

What is currently lacking is parsing POSIX-style TZ strings, as described in the newtzset(3) man page (e.g., CET-1CEST,M3.5.0,M10.5.0/3). I added a call to tzset to initLocal in src/time/zoneinfo_unix.go (and made some other improvements related to handling of type rule) and intend to submit patches.

However, time.tzset parses the string strictly, whilst the parser in tzcode (used by glibc and others) parses as much of the string as it can and discards the rest, thus TZ=CEST has the same effect as TZ=CEST0 (a static timezone named CEST at UTC+0):

$ TZ=UTC date ; TZ=Etc/GMT+0 date ; TZ=CEST date
Mon Nov 20 21:12:33 UTC 2023
Mon Nov 20 21:12:33 GMT 2023
Mon Nov 20 21:12:33 CEST 2023

I kinda understand the desire to handle timezone abbreviations in a best-effort manner in Parse and ParseInLocation, but in my opinion the pitfalls (like "CEST" being a timezone abbreviation but "CET" a location that is in CEST most of the time) are too big for LoadLocation.

My suggestion would be for LoadLocation to do either of:

  • accept only zoneinfo locations + "Local" (current behaviour)
  • accept zoneinfo locations + "Local" and POSIX-style TZ strings

There's a comment by @rsc in src/time/zoneinfo.go, before LoadLocation:

// NOTE(rsc): Eventually we will need to accept the POSIX TZ environment
// syntax too, but I don't feel like implementing it today.

There are a couple of fine points about handling of TZ that I'm not sure about, like whether POSIX-style TZ strings should be handled on OSes other than Unix at all. Should I submit another proposal or keep the discussion here?

@fzipp

  • Parse attempts to use the new LoadFixedZone when encountering a zone string, as it's the most specific option and corresponds to what is produced by Time.Format. If that fails, it falls back to using LoadLocation.

A timezone abbreviation may also be useful in ParseInLocation to disambiguate a timestamp around an autumn transition, e.g., 2023-10-29 02:30 +0200 CEST and 02:30 +0100 CET.

However, if there's no time corresponding to such string exactly, this may be tricky. E.g., Europe/Dublin has three time types named IST, two of which differ only by the value of the DST flag (at UTC+1) and one that was used from 1916-05-21 to 1916-10-01 (at GMT+00:34:39).

@robpike
Copy link
Contributor Author

robpike commented Jan 16, 2024

I tried implementing this proposal, scanning the database to get all the timezone initialisms, and there are far more duplicates than I imagined. Therefore although I'd love to find a solution to the problem, which I think is a problem, my naive suggestion is certainly not it.

@rsc
Copy link
Contributor

rsc commented Jan 19, 2024

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals Jan 19, 2024
@rsc rsc moved this from Likely Decline to Declined in Proposals Jan 26, 2024
@rsc
Copy link
Contributor

rsc commented Jan 26, 2024

No change in consensus, so declined.
— rsc for the proposal review group

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Declined
Development

No branches or pull requests

7 participants