-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-52551: Fix encoding issues in strftime() #125193
Conversation
Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also pythongh-78662 and pythongh-124531.
@vstinner, could you please review this PR? |
Modules/_datetimemodule.c
Outdated
newformat, timetuple, NULL); | ||
Py_DECREF(newformat); | ||
} | ||
if (0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if (0)
is weird. Why not adding a label after _PyUnicodeWriter_Dealloc() and using goto?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would require changing more code. Renaming label Done to Error (there are many goto Done
in not changed code) and adding new Done label. The if (0)
idiom is weird, but used in many places.
Modules/_datetimemodule.c
Outdated
} | ||
if (newformat != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to exit early: if (newformat == NULL) goto ...
.
By the way, only the _PyUnicodeWriter_Finish()
code path can fail, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would require adding more labels. The flow would look more complicated.
format = PyUnicode_EncodeLocale(format_arg, "surrogateescape"); | ||
if (format == NULL) | ||
format_size = PyUnicode_GET_LENGTH(format_arg); | ||
format = PyMem_Malloc((format_size + 1)*sizeof(time_char)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to add an explicit integer overflow check: fail with PyErr_NoMemory() on overflow.
Modules/timemodule.c
Outdated
* will be ahead of time... | ||
*/ | ||
while (1) { | ||
outbuf = (time_char *)PyMem_Realloc(outbuf, bufsize*sizeof(time_char)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may also add an explicit integer overflow check here.
Modules/timemodule.c
Outdated
Py_DECREF(unicode); | ||
} | ||
|
||
j = i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should declare the j
variable here, and I suggest to rename it to start
.
} | ||
format[fmtlen++] = (char)c; | ||
} | ||
if (fmtlen) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is quite big. You might move the code handling format
in a sub-function (the whole if
block).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be inconvenient, because this code changes several outer variables.: outbuf
and bufsize
.
I addressed all comments. Perhaps it is now more difficult to review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I just left minor coding style suggestions that you can ignore.
Thanks for the exhaustive added tests, it's very helpful.
Modules/timemodule.c
Outdated
*outbuf = (time_char *)PyMem_Realloc(*outbuf, | ||
*bufsize*sizeof(time_char)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*outbuf = (time_char *)PyMem_Realloc(*outbuf, | |
*bufsize*sizeof(time_char)); | |
*outbuf = (time_char *)PyMem_Realloc(*outbuf, | |
*bufsize*sizeof(time_char)); |
Modules/timemodule.c
Outdated
@@ -776,27 +776,100 @@ the C library strftime function.\n" | |||
#endif | |||
|
|||
static PyObject * | |||
time_strftime(PyObject *module, PyObject *args) | |||
strftime1(time_char **outbuf, size_t *bufsize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: you might rename the function to time_strftime1()
to avoid any conflict with a potential strftime1()
name from the C library.
Thank you for your review @vstinner. |
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
Sorry, @serhiy-storchaka, I could not cleanly backport this to
|
Sorry, @serhiy-storchaka, I could not cleanly backport this to
|
…5193) Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also pythongh-78662 and pythongh-124531. (cherry picked from commit ad3eac1) Co-authored-by: Serhiy Storchaka <[email protected]>
Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also pythongh-78662 and pythongh-124531. (cherry picked from commit ad3eac1)
GH-125657 is a backport of this pull request to the 3.13 branch. |
…5657) Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also gh-78662 and gh-124531. (cherry picked from commit ad3eac1)
…5193) (pythonGH-125657) Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also pythongh-78662 and pythongh-124531. (cherry picked from commit 08ccbb9) Co-authored-by: Serhiy Storchaka <[email protected]> (cherry picked from commit ad3eac1)
…5657) (GH-125661) Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also gh-78662 and gh-124531. (cherry picked from commit 08ccbb9) (cherry picked from commit ad3eac1) Co-authored-by: Serhiy Storchaka <[email protected]>
Congrats! That's a nice change. Using Unicode instead of bytes for the internal buffer is a good move. |
Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time.
Closes #52551, #78662 and #124531.
datetime.strftime
strings can be terminated by "\x00" literals #124531