Commit

Improve the error message for invalid characters in domain names after Unicode NFC normalization

These cases were previously handled by the call to idna.encode or idna.alabel, but the error message wasn't consistent with similar checks we do for the local part.

See #142.
JoshData committed Jun 19, 2024
1 parent 7f1f281 commit 8051347
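Python's standard `unicodedata` module shows why these characters are caught only after normalization. This is a minimal illustration, not code from the commit. The validator itself normalizes the domain through the `idna` package (the diff mentions `uts46_remap`), and the `ascii_punctuation` helper below is hypothetical. U+037E GREEK QUESTION MARK and U+1FEF GREEK VARIA have singleton canonical decompositions, so NFC replaces them with an ASCII semicolon and a backtick. Neither character is allowed in a hostname, yet neither appears in the raw input.

```python
import unicodedata


def ascii_punctuation(s: str) -> list[str]:
    # Hypothetical helper: ASCII characters other than letters, digits and
    # the label-separating dot.
    return [c for c in s if c.isascii() and not c.isalnum() and c != "."]


# Domain parts taken from this commit's test cases.
for domain in ["\u037e.com", "\u1fef.com"]:
    normalized = unicodedata.normalize("NFC", domain)
    # Before normalization nothing suspicious is visible; afterwards it is.
    print(unicodedata.name(domain[0]), ascii_punctuation(domain), ascii_punctuation(normalized))
# GREEK QUESTION MARK [] [';']
# GREEK VARIA [] ['`']
```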
Showing 3 changed files with 13 additions and 7 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -3,7 +3,7 @@ In Development

 * Email addresses with internationalized local parts could, with rare Unicode characters, be returned as valid but actually be invalid in their normalized form (returned in the `normalized` field). Local parts now re-validated after Unicode NFC normalization to ensure that invalid characters cannot be injected into the normalized address and that characters with length-increasing NFC normalizations cannot cause a local part to exceed the maximum length after normalization.
 * The length check for email addresses with internationalized local parts is now also applied to the original address string prior to Unicode NFC normalization, which may be longer and could exceed the maximum email address length, to protect callers who do not use the returned normalized address.
-* Improved error message for IDNA domains that are too long.
+* Improved error message for IDNA domains that are too long or have invalid characters after Unicode normalization.
 * A new option to parse `My Name <address@domain>` strings, i.e. a display name plus an email address in angle brackets, is now available. It is off by default.

 2.1.2 (June 16, 2024)
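The changelog entries about local parts rest on the fact that NFC can change a string's length in either direction. The following is a hypothetical sketch: `check_length_before_and_after_nfc` is not part of email_validator, and it counts code points for simplicity, which may not match how the library measures length. It only shows why checking the original string and the normalized string separately catches both cases.

```python
import unicodedata


def check_length_before_and_after_nfc(text: str, max_length: int) -> str:
    """Hypothetical helper, not email_validator's API.

    Rejects `text` if either the original string or its NFC-normalized form
    is longer than `max_length` code points; otherwise returns the
    normalized form.
    """
    normalized = unicodedata.normalize("NFC", text)
    for label, value in (("original", text), ("normalized", normalized)):
        if len(value) > max_length:
            raise ValueError(f"The {label} string is too long ({len(value)} > {max_length}).")
    return normalized


# NFC can shrink a string: "e" + COMBINING ACUTE ACCENT composes to one code point.
print(len("e\u0301"), len(check_length_before_and_after_nfc("e\u0301", 2)))  # 2 1

# NFC can also grow a string: U+0958 DEVANAGARI LETTER QA is a composition
# exclusion, so NFC decomposes it into U+0915 U+093C.
try:
    check_length_before_and_after_nfc("\u0958", 1)
except ValueError as e:
    print(e)  # The normalized string is too long (2 > 1).
```

Checking only the normalized form would miss the first kind of input, and checking only the original would miss the second.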
10 changes: 10 additions & 0 deletions email_validator/syntax.py
@@ -476,6 +476,16 @@ def validate_email_domain_name(domain: str, test_environment: bool = False, glob
     except idna.IDNAError as e:
         raise EmailSyntaxError(f"The part after the @-sign contains invalid characters ({e}).") from e

+    # Check for invalid characters after Unicode normalization which are not caught
+    # by uts46_remap (see tests for examples).
+    bad_chars = {
+        safe_character_display(c)
+        for c in domain
+        if not ATEXT_HOSTNAME_INTL.match(c)
+    }
+    if bad_chars:
+        raise EmailSyntaxError("The part after the @-sign contains invalid characters after Unicode normalization: " + ", ".join(sorted(bad_chars)) + ".")
+
     # The domain part is made up dot-separated "labels." Each label must
     # have at least one character and cannot start or end with dashes, which
     # means there are some surprising restrictions on periods and dashes.
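The following self-contained approximation of the new check can be run without the library. Several parts are assumptions. `HOSTNAME_CHAR` stands in for `ATEXT_HOSTNAME_INTL` and `display_char` for `safe_character_display`, since their actual definitions are not shown in this diff. `ValueError` replaces `EmailSyntaxError`. Plain NFC from `unicodedata` replaces the UTS-46 mapping that precedes the check in the real function.

```python
import re
import unicodedata

# Assumed stand-in for ATEXT_HOSTNAME_INTL: ASCII letters, digits, hyphen and
# period, plus any non-ASCII code point.
HOSTNAME_CHAR = re.compile(r"[a-zA-Z0-9\-.\u0080-\U0010FFFF]")


def display_char(c: str) -> str:
    """Hypothetical stand-in for safe_character_display.

    Quotes visible characters and falls back to the Unicode name for
    whitespace and other invisible characters.
    """
    if c.isprintable() and not c.isspace():
        return repr(c)
    return unicodedata.name(c, f"U+{ord(c):04X}")


def check_normalized_domain(domain: str) -> None:
    """Raise ValueError if the (already normalized) domain has disallowed characters."""
    # A set reports each offending character once.
    bad_chars = {display_char(c) for c in domain if not HOSTNAME_CHAR.match(c)}
    if bad_chars:
        # Sort so the message is deterministic regardless of set ordering.
        raise ValueError(
            "The part after the @-sign contains invalid characters after "
            "Unicode normalization: " + ", ".join(sorted(bad_chars)) + "."
        )


for raw in ["\u037e.com", "\u1fef.com", "example.com"]:
    normalized = unicodedata.normalize("NFC", raw)
    try:
        check_normalized_domain(normalized)
        print(f"{normalized!r}: ok")
    except ValueError as e:
        print(f"{normalized!r}: {e}")
```

Sorting the collected display forms makes the message stable. This matters because the tests compare exact strings.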
8 changes: 2 additions & 6 deletions tests/test_syntax.py
@@ -392,12 +392,8 @@ def test_domain_literal() -> None:
         ('me@⒈wouldbeinvalid.com',
          "The part after the @-sign contains invalid characters (Codepoint U+2488 not allowed "
          "at position 1 in '⒈wouldbeinvalid.com')."),
-        ('me@\u037e.com',
-         "The part after the @-sign is invalid (Codepoint U+003B at position 1 "
-         "of ';' not allowed)."),
-        ('me@\u1fef.com',
-         "The part after the @-sign is invalid (Codepoint U+0060 at position 1 "
-         "of '`' not allowed)."),
+        ('me@\u037e.com', "The part after the @-sign contains invalid characters after Unicode normalization: ';'."),
+        ('me@\u1fef.com', "The part after the @-sign contains invalid characters after Unicode normalization: '`'."),
         ('@example.com', 'There must be something before the @-sign.'),
         ('white space@test', 'The email address contains invalid characters before the @-sign: SPACE.'),
         ('test@white space', 'The part after the @-sign contains invalid characters: SPACE.'),
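The three domain cases in this hunk behave differently. Judging from the expected messages, the `⒈` case is still reported through the existing `idna.IDNAError` handler, while the two Greek characters now reach the new check. A plausible reason can be shown with `unicodedata` (an illustration, not the test suite's code). U+2488 DIGIT ONE FULL STOP has only a compatibility decomposition to `1.`, so NFC leaves it alone, and only a compatibility-style mapping such as NFKC (or UTS-46) expands it. U+037E and U+1FEF, by contrast, already change under plain NFC.

```python
import unicodedata

# Domain parts from the three test cases in this hunk.
domains = ["\u2488wouldbeinvalid.com", "\u037e.com", "\u1fef.com"]

for domain in domains:
    first = domain[0]
    nfc = unicodedata.normalize("NFC", domain)
    nfkc = unicodedata.normalize("NFKC", domain)
    # "changed under NFC" is False only for U+2488, whose decomposition is
    # a compatibility (not canonical) one.
    print(f"U+{ord(first):04X} {unicodedata.name(first)}: "
          f"changed under NFC={nfc != domain}, NFKC={nfkc!r}")
```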
