Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-30681: Support invalid date format or value in email Date header #22090

Merged
merged 3 commits into from
Oct 27, 2020

Conversation

sim0nx
Copy link
Contributor

@sim0nx sim0nx commented Sep 4, 2020

I am re-submitting an older PR which was abandoned but is still relevant, #10783 by @timb07.

The issue being solved (https://bugs.python.org/issue30681) is still relevant. The original PR #10783 was closed as
the final request changes were not applied and since abandoned.

In this new PR I have re-used the original patch plus applied both comments from the review, by @maxking and @pganssle.

For reference, here is the original PR description:
In email.utils.parsedate_to_datetime(), a failure to parse the date, or invalid date components (such as hour outside 0..23) raises an exception. Document this behaviour, and add tests to test_email/test_utils.py to confirm this behaviour.

In email.headerregistry.DateHeader.parse(), check when parsedate_to_datetime() raises an exception and add a new defect InvalidDateDefect; preserve the invalid value as the string value of the header, but set the datetime attribute to None.

Add tests to test_email/test_headerregistry.py to confirm this behaviour; also added test to test_email/test_inversion.py to confirm emails with such defective date headers round trip successfully.

This pull request incorporates feedback gratefully received from @bitdancer, @brettcannon, @Mariatta and @warsaw, and replaces the earlier PR #2254.

https://bugs.python.org/issue30681

https://bugs.python.org/issue30681

Automerge-Triggered-By: GH:warsaw

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@sim0nx

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@@ -124,7 +124,10 @@ of the new API.
.. function:: parsedate_to_datetime(date)

The inverse of :func:`format_datetime`. Performs the same function as
:func:`parsedate`, but on success returns a :mod:`~datetime.datetime`. If
:func:`parsedate`, but on success returns a :mod:`~datetime.datetime`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at this now during the Python core sprint. Thanks for resurrecting this PR.

I find this description a bit confusing because it's saying that the function either returns None or raises ValueError when parsing an invalid date. That makes it difficult to know when I'll get None and when I'll get an exception. I think you're saying that only some bogus strings will raise ValueError, e.g. those where the hour is invalid. I also don't see where the tests you're adding include the test for None return.

I'm not sure what to suggest, because I'm not sure what the intended semantics should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't see how parsedate_to_datetime would ever return None. It is _parsedate_tz which returns None on bogus input, in which case parsedate_to_datetime raises a TypeError.
This is also covered in the tests, so those should be fine.

In order to continue I suggest to fix the documentation on parsedate_to_datetime, remove the mention of it returning None and replacing it with it possibly returning TypeError in case of an invalid date input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the ticket, I agree with you; parsedate_to_datetime() can't return None, so let's remove that from the documentation. I also suggest checking for None return from _parsedate_tz() and turning that into an explicit raise of ValueError rather than relying on the implicit TypeError from a failing tuple unpack. Then the docs should say that ValueError is raised on an invalid date input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments. I fully agree with your points, that makes it definitely cleaner.
I implemented them in my latest commit.

@warsaw
Copy link
Member

warsaw commented Oct 23, 2020

Also, from the ticket:

Aside: I noticed that on _parseaddr.py:68, there's a bare return. That should really be return None (EIBTI). Can you fix that in your PR?

sim0nx and others added 3 commits October 23, 2020 22:17
Re-submitting an older PR which was abandoned but still relevant.
…it raises a TypeError on invalid input instead of it returning None
- _parsedate_tz should explicitely return None
- parsedate_to_datetime should check for None as return value by _parsedate_tz
   and in that case raise ValueError
- Fix documentation and tests accordingly
@warsaw
Copy link
Member

warsaw commented Oct 23, 2020

@sim0nx This PR looks great now, thanks! Before I approve it, I want to get a ruling from the RMs about backporting. If I don't hear from them in the next couple of days, I'll raise it on the mailing list. Either way, this will for sure land in 3.10.

@warsaw warsaw requested review from ambv and ned-deily October 23, 2020 21:09
@sim0nx
Copy link
Contributor Author

sim0nx commented Oct 23, 2020

@sim0nx This PR looks great now, thanks! Before I approve it, I want to get a ruling from the RMs about backporting. If I don't hear from them in the next couple of days, I'll raise it on the mailing list. Either way, this will for sure land in 3.10.

Great news, thank you!

@warsaw
Copy link
Member

warsaw commented Oct 23, 2020

Actually, only @ambv needs to chime in, since this would only be a candidate for backporting to 3.8 and 3.9.

@ned-deily ned-deily removed their request for review October 26, 2020 04:44
@warsaw
Copy link
Member

warsaw commented Oct 26, 2020

Based on discussions in -committers, we won't back port.

@miss-islington miss-islington merged commit 303aac8 into python:master Oct 27, 2020
shihai1991 added a commit to shihai1991/cpython that referenced this pull request Oct 29, 2020
…lots1

* origin/master: (365 commits)
  bpo-42029: Remove IRIX code (pythonGH-23023)
  bpo-42143: Ensure PyFunction_NewWithQualName() can't fail after creating the func object (pythonGH-22953)
  bpo-34204: Use pickle.DEFAULT_PROTOCOL in shelve (pythonGH-19639)
  bpo-41805: Documentation for PEP 585 (pythonGH-22615)
  bpo-42161: Micro-optimize _collections._count_elements() (pythonGH-23008)
  bpo-42161: Remove private _PyLong_Zero and _PyLong_One (pythonGH-23003)
  bpo-42099: Fix reference to ob_type in unionobject.c and ceval (pythonGH-22829)
  bpo-41659: Disallow curly brace directly after primary (pythonGH-22996)
  bpo-6761: Enhance __call__ documentation (pythonGH-7987)
  bpo-42161: Modules/ uses _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22998)
  bpo-41474, Makefile: Add dependency on cpython/frameobject.h (pythonGH-22999)
  bpo-42157: Rename unicodedata.ucnhash_CAPI (pythonGH-22994)
  bpo-42161: Use _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22995)
  bpo-30681: Support invalid date format or value in email Date header (pythonGH-22090)
  bpo-42161: Add _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22993)
  bpo-42123: Run the parser two times and only enable invalid rules on the second run (pythonGH-22111)
  bpo-42157: Convert unicodedata.UCD to heap type (pythonGH-22991)
  bpo-42157: unicodedata avoids references to UCD_Type (pythonGH-22990)
  bpo-39101: Fixes BaseException hang in IsolatedAsyncioTestCase. (pythonGH-22654)
  bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (pythonGH-22713)
  ...
@sim0nx sim0nx deleted the fix-issue-30681c branch October 30, 2020 09:26
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
…ythonGH-22090)

I am re-submitting an older PR which was abandoned but is still relevant, python#10783 by @timb07.

The issue being solved () is still relevant. The original PR python#10783 was closed as
the final request changes were not applied and since abandoned.

In this new PR I have re-used the original patch plus applied both comments from the review, by @maxking and @pganssle.


For reference, here is the original PR description:
In email.utils.parsedate_to_datetime(), a failure to parse the date, or invalid date components (such as hour outside 0..23) raises an exception. Document this behaviour, and add tests to test_email/test_utils.py to confirm this behaviour.

In email.headerregistry.DateHeader.parse(), check when parsedate_to_datetime() raises an exception and add a new defect InvalidDateDefect; preserve the invalid value as the string value of the header, but set the datetime attribute to None.

Add tests to test_email/test_headerregistry.py to confirm this behaviour; also added test to test_email/test_inversion.py to confirm emails with such defective date headers round trip successfully.

This pull request incorporates feedback gratefully received from @bitdancer, @brettcannon, @Mariatta and @warsaw, and replaces the earlier PR python#2254.

Automerge-Triggered-By: GH:warsaw
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants