strptime(.., '%c') fails to parse output of strftime('%c', ..) in some locales #53203
The following code:
import locale, time
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
t = time.localtime()
s = time.strftime('%c', t)
time.strptime('%c', s)
raises ValueError: time data '%c' does not match format 'Mer 9 jui 16:14:46 2010' in any locale where the month follows the day in the '%c' format. Note that the attached C code works as expected on my OSX laptop. I wonder if it would make sense to call the platform strptime where available. I also wonder whether platform support for strptime has improved since 2002, when _strptime.py was introduced. |
Adding bpo-8915 as a dependency because deducing the D_T_FMT locale setting from strftime output seems impossible:
>>> locale.nl_langinfo(locale.D_T_FMT)
'%a %b %e %H:%M:%S %Y' |
Victor, You may be interested because your native language is implicated. :-) |
time.strptime(s, '%c' ) ? |
Oh my. It certainly took a long time to recognize a silly mistake! Thanks. |
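For reference, the corrected call order (data string first, format second) can be sketched as below. This is only an illustration: it stays in the default C locale so it does not depend on which locales are installed, the helper name roundtrip_c is made up for this example, and the time tuple is the date from the original report.

```python
import time

def roundtrip_c(t):
    """Format t with '%c', then parse the result back with '%c'."""
    s = time.strftime("%c", t)
    # The string to parse comes first, the format second.
    return time.strptime(s, "%c")

# Wednesday 9 June 2010, 16:14:46 (tm_wday=2, tm_yday=160, tm_isdst=0).
t = (2010, 6, 9, 16, 14, 46, 2, 160, 0)
parsed = roundtrip_c(t)
print(tuple(parsed)[:6])
```

With the arguments in this order, any remaining failure in a given locale points at the format deduction rather than at the caller.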
My tests were wrong but the problem does exist. I am attaching a script that tests strptime(.., '%c') for all locales installed on my system (an unmodified US Mac OS X 10.6.6). The only failing locale that I recognize is Hebrew (he_IL). Eli, what do you think about this?
$ ./python.exe cfmt.py
am_ET [ማክሰ ጃንዩ 11 18:56:18 2011] %A %B %d %H:%M:%S %Y != %a %b %e %H:%M:%S %Y
et_EE [T, 11. jaan 2011. 18:56:18] %a, %d. %B %Y. %H:%M:%S != %a, %d. %b %Y. %T
he_IL [EST 18:56:18 2011 ינו 11 ג'] %Z %H:%M:%S %Y %B %d %a != %Z %H:%M:%S %Y %b %d %a |
On Tue, Jan 11, 2011 at 7:26 PM, Roumen Petrov <[email protected]> wrote:
According to what standard? POSIX defines it as: "%c Replaced by the locale's appropriate date and time representation." http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html and the manual page on my system agrees: "%c is replaced by national representation of time and date." |
On Linux, cfmt.py fails on the fr_FR locale (the only valid locale in the list of tested locales). The problem is the month format: locale.nl_langinfo(locale.D_T_FMT) returns '%a %d %b %Y %T %Z', but _strptime (LocaleTime().LC_date_time) uses '%a %d %B %Y %H:%M:%S %Z', i.e. '%b' vs '%B'. _strptime.LocaleTime.__calc_date_time() calls strftime('%c') and then parses the output to recover the complete format. But it calls strftime('%c') with a date in March, and in French, March is formatted as 'mars' for both month formats (%b *and* %B). _strptime.LocaleTime.__calc_date_time() should detect that the month is formatted identically by %b and %B, and try other timestamps (other months). |
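The ambiguity described above can be reproduced without any locale installed. The following is a simplified model, not the actual _strptime code: guess_month_directive is a hypothetical helper that tries the full month name before the abbreviation, and the French name tables and sample strings are hard-coded assumptions for illustration.

```python
def guess_month_directive(sample, full_name, abbr_name):
    """Guess whether a formatted sample used %B or %b for the month.

    The full name is tried first, so when both names are identical
    the answer is always '%B', whatever the locale really uses.
    """
    if full_name in sample:
        return "%B"
    if abbr_name in sample:
        return "%b"
    return None

# Illustrative French names (assumed, not read from the system locale).
FULL = {1: "janvier", 3: "mars"}
ABBR = {1: "janv.", 3: "mars"}

# What a '%a %d %b %Y %T' locale would print for these two dates.
march_sample = "mer. 17 mars 1999 22:44:55"
january_sample = "dim. 17 janv. 1999 22:44:55"

print(guess_month_directive(march_sample, FULL[3], ABBR[3]))      # misdetected
print(guess_month_directive(january_sample, FULL[1], ABBR[1]))    # detected correctly
```

A format deduced from the March sample then expects 'janvier' and rejects 'janv.', which is the failure seen with cfmt.py.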
Alexander, I get the same error for the he_IL locale. Will look into this |
The problem for Hebrew appears to be the same as the one Victor stated for French. March in Hebrew is also a 3-letter word which means it's equal to its abbreviation. |
I'm attaching a patch for Lib/_strptime.py that handles the month differently in __calc_date_time. It cycles through all months, trying to find one where the full and abbreviated names differ, and matches it against the timestamp created by strftime. This solution is a hack, but so is the whole __calc_date_time function :-) [IMHO] All tests pass, and I also tried it manually with all the problematic locales reported by Alexander; it seems to work correctly. If this looks OK to you guys I can commit and backport. |
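A rough sketch of the month-cycling idea follows. It is not the attached patch: find_month_directive and the render callables are hypothetical, with render standing in for a call to strftime('%c') on a date in the given month, and the French name lists are assumed values used only for illustration.

```python
def find_month_directive(render, full_names, abbr_names):
    """Return '%B' or '%b' using the first month whose names differ.

    render(month) must return the '%c'-style text for a date in that
    month (1..12). Months whose full and abbreviated names are equal
    cannot tell the two directives apart, so they are skipped.
    """
    for month in range(1, 13):
        full = full_names[month - 1]
        abbr = abbr_names[month - 1]
        if full == abbr:
            continue  # ambiguous month, e.g. 'mars' or 'mai' in French
        text = render(month)
        # Check the full name first: the abbreviation may be its prefix.
        if full in text:
            return "%B"
        if abbr in text:
            return "%b"
    return None

# Illustrative French names (assumed, not read from the system locale).
FULL_NAMES = ["janvier", "février", "mars", "avril", "mai", "juin",
              "juillet", "août", "septembre", "octobre", "novembre", "décembre"]
ABBR_NAMES = ["janv.", "févr.", "mars", "avril", "mai", "juin",
              "juil.", "août", "sept.", "oct.", "nov.", "déc."]

# Stand-ins for strftime('%c') in a locale using %b or %B respectively.
render_abbr = lambda m: f"mer. 17 {ABBR_NAMES[m - 1]} 1999 22:44:55"
render_full = lambda m: f"mer. 17 {FULL_NAMES[m - 1]} 1999 22:44:55"

print(find_month_directive(render_abbr, FULL_NAMES, ABBR_NAMES))
print(find_month_directive(render_full, FULL_NAMES, ABBR_NAMES))
```

The same loop shape would apply to weekdays for telling %a from %A; cases where a name collides with a constant part of the format are not covered by this sketch.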
On Sat, Jan 15, 2011 at 2:20 AM, Eli Bendersky <[email protected]> wrote:
I am not sure how to proceed. On one hand, I opened this issue to I made this issue depend on bpo-8915 because I think strptime should I don't think this fix solves all the problems. For example, in most '%a %b %e %H:%M:%S %Y'
>>> LocaleTime().LC_date_time
'%a %b %d %H:%M:%S %Y' This does not seem to be an issue because strptime with %d seems to be On the patch itself:
Eli, what do you think yourself: should we try to perfect the hack or |
Python is not C! |
Alexander,
With understanding of (2) I will be able to also logically reason about the next steps :-) |
You pretty much hit the nail on the head. Some platforms don't have strptime or did not have it at the time this code was written. The locale module is probably more recent than this code as well. |
Alexander, but still - this isn't just an implementation of strptime. AFAIU, strptime gets the format string as a parameter and uses it to parse a date string into a "tm" struct. So why do we need to parse a date string *without* a format string in Python, resorting to heuristics and pseudo-AI instead? |
Eli, Given your last comment, are you still proposing your patch for inclusion or should we take the bpo-8915 approach? |
…GH-124946) (GH-125370) In some locales (like French or Hebrew) the full or abbreviated names of the default month and weekday used in __calc_date_time can be part of another name or of a constant part of the %c format. The month name can also match %m with a constant suffix (like in Japanese). So the code failed to correctly distinguish formats %a, %A, %b, %B and %m. Cycle through all months and all days of the week to find the variable part and distinguish %a from %A and %b from %B or %m. Fixed locales for the following languages: Arabic, Bislama, Breton, Bodo, Kashubian, Chuvash, Estonian, French, Irish, Ge'ez, Gujarati, Manx Gaelic, Hebrew, Hindi, Chhattisgarhi, Haitian Kreyol, Japanese, Kannada, Korean, Marathi, Malay, Norwegian, Nynorsk, Punjabi, Rajasthani, Tok Pisin, Yoruba, Yue Chinese, Yau/Nungon and Chinese. (cherry picked from commit c05f9dd) Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Eli Bendersky <[email protected]>
…GH-124946) (GH-125369) In some locales (like French or Hebrew) the full or abbreviated names of the default month and weekday used in __calc_date_time can be part of another name or of a constant part of the %c format. The month name can also match %m with a constant suffix (like in Japanese). So the code failed to correctly distinguish formats %a, %A, %b, %B and %m. Cycle through all months and all days of the week to find the variable part and distinguish %a from %A and %b from %B or %m. Fixed locales for the following languages: Arabic, Bislama, Breton, Bodo, Kashubian, Chuvash, Estonian, French, Irish, Ge'ez, Gujarati, Manx Gaelic, Hebrew, Hindi, Chhattisgarhi, Haitian Kreyol, Japanese, Kannada, Korean, Marathi, Malay, Norwegian, Nynorsk, Punjabi, Rajasthani, Tok Pisin, Yoruba, Yue Chinese, Yau/Nungon and Chinese. (cherry picked from commit c05f9dd) Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Eli Bendersky <[email protected]>
Is this fixed now? Can it be closed? |
Not yet. Several other locales can be fixed using another approach. I am working on this. |
Fixed most locales that use non-ASCII digits, like Persian, Burmese, Odia and Shan.
#125406 fixes most of the other locales that use non-ASCII digits. |
…H-125406) Fixed most locales that use non-ASCII digits, like Persian, Burmese, Odia and Shan.
…les (pythonGH-125406) Fixed most locales that use non-ASCII digits, like Persian, Burmese, Odia and Shan. (cherry picked from commit 5f4e5b5) Co-authored-by: Serhiy Storchaka <[email protected]>
…ales (GH-125406) (GH-125454) Fixed most locales that use non-ASCII digits, like Persian, Burmese, Odia and Shan. (cherry picked from commit 5f4e5b5) Co-authored-by: Serhiy Storchaka <[email protected]>
…s on many locales (pythonGH-125406) (pythonGH-125454) Fixed most locales that use non-ASCII digits, like Persian, Burmese, Odia and Shan. (cherry picked from commit 5f4e5b5) (cherry picked from commit cbcdf34) Co-authored-by: Miss Islington (bot) <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
Hi, testing all the merged changes on Solaris, I encountered two issues that I wanted to report:
Based on my digging, I believe that the issue is in the LMT; I dumped the I have no idea where that LMT comes from or whether it is Solaris specific though. It's there for old years:
import locale
import time
locale.setlocale(locale.LC_TIME, "de_DE")
print(time.strftime("%c", (1900, 1, 1, 0, 0, 0, 0, 1, 0)))
print(time.strftime("%c", (1800, 1, 1, 0, 0, 0, 0, 1, 0)))
I don't think that the output of %x is standardized (Windows format is also different, although the year seems always 4 digit)? So it's possible that this might be an issue for other platforms as well. |
Thank you @kulikjak. LMT is a standard thing. I think the result of the tests depends on the place where they are run. I do not know what to do with LMT, so I am going to just ignore failures if the result contains LMT. This may be a part of a larger issue.
What locales? We can skip this test on Solaris, but it would be better to keep it running for locales in which it works. |
That makes sense. Thank you.
From those being tested, en_US, de_DE and ar_AE do print only two digits with |
Use fixed timezone. Skip roundtrip tests on locales with 2-digit year.
Use fixed timezone. Skip roundtrip tests on locales with 2-digit year. (cherry picked from commit 9dde463) Co-authored-by: Serhiy Storchaka <[email protected]>