gh-109653: Improve the import time of email.utils #109824

Merged
merged 12 commits into from
Oct 12, 2023
29 changes: 29 additions & 0 deletions Lib/email/_msgid.py
@@ -0,0 +1,29 @@
# This function used to be part of email.utils,
# but has been separated out to speedup the import of that module

import os
import random
import socket
import time

def make_msgid(idstring=None, domain=None):
    """Returns a string suitable for RFC 2822 compliant Message-ID, e.g:

    <142480216486.20800.16526388040877946887@nightshade.la.mastaler.com>

    Optional idstring if given is a string used to strengthen the
    uniqueness of the message id.  Optional domain if given provides the
    portion of the message id after the '@'.  It defaults to the locally
    defined hostname.
    """
    timeval = int(time.time()*100)
    pid = os.getpid()
    randint = random.getrandbits(64)
    if idstring is None:
        idstring = ''
    else:
        idstring = '.' + idstring
    if domain is None:
        domain = socket.getfqdn()
    msgid = '<%d.%d.%d%s@%s>' % (timeval, pid, randint, idstring, domain)
    return msgid
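
The Message-ID layout described in the docstring can be checked with a short sketch. It calls the public `email.utils.make_msgid`, which is the import path callers already use; the `idstring` and `domain` values and the regular expression are illustrative assumptions, not part of the patch.

```python
import re
from email.utils import make_msgid

# Illustrative values; passing domain avoids the socket.getfqdn() lookup.
msgid = make_msgid(idstring="demo", domain="example.com")

# Layout taken from the format string: <timeval.pid.randint.idstring@domain>
pattern = re.compile(r"<(\d+)\.(\d+)\.(\d+)\.demo@example\.com>")
match = pattern.fullmatch(msgid)

# The three numeric fields: centisecond timestamp, process id, 64 random bits.
timeval, pid, randint = (int(group) for group in match.groups())
```

Supplying `domain` explicitly also keeps the call free of a hostname lookup, which can matter in sandboxed or offline environments.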
39 changes: 10 additions & 29 deletions Lib/email/utils.py
@@ -22,11 +22,8 @@
    'unquote',
    ]

import os
import re
import time
import random
import socket
import datetime
import urllib.parse

@@ -36,9 +33,6 @@

from email._parseaddr import parsedate, parsedate_tz, _parsedate_tz

# Intrapackage imports
from email.charset import Charset

COMMASPACE = ', '
EMPTYSTRING = ''
UEMPTYSTRING = ''
@@ -94,6 +88,8 @@ def formataddr(pair, charset='utf-8'):
            name.encode('ascii')
        except UnicodeEncodeError:
            if isinstance(charset, str):
                # lazy import to improve module import time
                from email.charset import Charset
Contributor

Not sure I like this one; I'm only fine with it since formataddr doesn't look too widely used (a lot of the time it's nice to pay these costs upfront, predictable performance is important, e.g. you don't want the first request your webserver serves to be randomly slow).

Member Author

> (a lot of the time it's nice to pay these costs upfront, predictable performance is important

Agreed. On the other hand, the email package leans quite heavily on lazy imports in some other places, so this does seem in keeping with that general philosophy:

# Some convenience routines. Don't import Parser and Message as side-effects
# of importing email since those cascadingly import most of the rest of the
# email package.
def message_from_string(s, *args, **kws):
    """Parse a string into a Message object model.

    Optional _class and strict are passed to the Parser constructor.
    """
    from email.parser import Parser
    return Parser(*args, **kws).parsestr(s)

def message_from_bytes(s, *args, **kws):
    """Parse a bytes string into a Message object model.

    Optional _class and strict are passed to the Parser constructor.
    """
    from email.parser import BytesParser
    return BytesParser(*args, **kws).parsebytes(s)

def message_from_file(fp, *args, **kws):
    """Read a file and parse its contents into a Message object model.

    Optional _class and strict are passed to the Parser constructor.
    """
    from email.parser import Parser
    return Parser(*args, **kws).parse(fp)

def message_from_binary_file(fp, *args, **kws):
    """Read a binary file and parse its contents into a Message object model.

    Optional _class and strict are passed to the Parser constructor.
    """
    from email.parser import BytesParser
    return BytesParser(*args, **kws).parse(fp)

Member Author

I'd be happy to change it so it's imported at the top of the function if you think that'd be better?

Contributor

Nah, that'd be worse: it should be either at module level or here. I'm fine with this as is!
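
The trade-off discussed in this thread (the first call pays the import cost, later calls do not) can be observed with a minimal sketch. `encode_display_name` and `timed` are hypothetical helpers written for illustration; only `email.charset.Charset` and its `header_encode` method come from the code under review.

```python
import sys
import time


def encode_display_name(name, charset="utf-8"):
    # Hypothetical helper mirroring the function-level import in formataddr:
    # email.charset is only imported when this code path actually runs.
    from email.charset import Charset
    return Charset(charset).header_encode(name)


def timed(func, *args):
    # Return the function's result together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# The first call may include the one-off cost of importing email.charset
# (if nothing else has imported it yet); the second call finds the module
# already cached in sys.modules.
first, first_cost = timed(encode_display_name, "Zoë")
second, second_cost = timed(encode_display_name, "Zoë")
charset_is_cached = "email.charset" in sys.modules
```

Comparing `first_cost` with `second_cost` in a fresh interpreter gives a rough idea of how much latency the lazy import moves from startup to the first non-ASCII `formataddr` call.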

                charset = Charset(charset)
            encoded_name = charset.header_encode(name)
            return "%s <%s>" % (encoded_name, address)
@@ -171,29 +167,6 @@ def format_datetime(dt, usegmt=False):
    return _format_timetuple_and_zone(now, zone)


def make_msgid(idstring=None, domain=None):
    """Returns a string suitable for RFC 2822 compliant Message-ID, e.g:

    <142480216486.20800.16526388040877946887@nightshade.la.mastaler.com>

    Optional idstring if given is a string used to strengthen the
    uniqueness of the message id.  Optional domain if given provides the
    portion of the message id after the '@'.  It defaults to the locally
    defined hostname.
    """
    timeval = int(time.time()*100)
    pid = os.getpid()
    randint = random.getrandbits(64)
    if idstring is None:
        idstring = ''
    else:
        idstring = '.' + idstring
    if domain is None:
        domain = socket.getfqdn()
    msgid = '<%d.%d.%d%s@%s>' % (timeval, pid, randint, idstring, domain)
    return msgid


def parsedate_to_datetime(data):
    parsed_date_tz = _parsedate_tz(data)
    if parsed_date_tz is None:
@@ -351,3 +324,11 @@ def localtime(dt=None, isdst=None):
    if dt is None:
        dt = datetime.datetime.now()
    return dt.astimezone()


def __getattr__(attr):
    # lazy import, to speed up module import time
    if attr == "make_msgid":
        from email._msgid import make_msgid
        return make_msgid
    raise AttributeError(f"module {__name__!r} has no attribute {attr!r}")
Member

In general I don't like performance hacks like this, but they do have their place. I can't speak to whether or not this is a worthwhile performance hack, you should seek approval from the maintainers of the impacted modules for that.

That said, the stdlib itself makes no use of make_msgid, and email.utils is not itself considered part of the new email API. Moving it into a separate module and making that part of the non-legacy public API of the email module would actually make some sense. I guess we'd just call it 'msgid'? Then this code should issue a deprecation warning pointing to the new way to import make_msgid. It feels kind of weird to have a module with just one function, but there isn't really anything else related to it that I can think of. (I wonder...maybe make_msgid actually belongs in the UUID module? Probably not. Wrong RFC.)
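
The deprecation idea floated here could look roughly like the sketch below. It is an illustration only: `legacy_utils` is a hypothetical stand-in module built with `types.ModuleType`, the warning text is invented, and the function is fetched from `email.utils` because the proposed public `msgid` module does not exist.

```python
import types
import warnings

# Hypothetical stand-in for a legacy module that used to define make_msgid.
legacy_utils = types.ModuleType("legacy_utils")


def _legacy_getattr(attr):
    # Module-level __getattr__ (PEP 562): only called when normal attribute
    # lookup on the module fails, so it costs nothing for other names.
    if attr == "make_msgid":
        warnings.warn(
            "legacy_utils.make_msgid is deprecated; import it from its "
            "new home instead",  # invented wording, for illustration
            DeprecationWarning,
            stacklevel=2,
        )
        # Stand-in for the relocated function (assumption for this sketch).
        from email.utils import make_msgid
        return make_msgid
    raise AttributeError(
        f"module {legacy_utils.__name__!r} has no attribute {attr!r}"
    )


# Storing the hook in the module's namespace is what makes PEP 562 apply.
legacy_utils.__getattr__ = _legacy_getattr
```

Because the hook does not cache the function in the module's namespace, every access to `legacy_utils.make_msgid` goes through `_legacy_getattr` again, so the warning is raised on each lookup (subject to the active warning filters).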

@@ -0,0 +1,4 @@
Reduce the import time of :mod:`email.utils` by around 46%. This results in
the import time of :mod:`email.message` falling by around 28%, which in turn
reduces the import time of :mod:`importlib.metadata` by around 5%. Patch by
Alex Waygood.
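
Figures like the ones in this news entry can be reproduced approximately with CPython's `-X importtime` option. The sketch below is one way to do that, not the method used for the PR; `cumulative_import_us` is a hypothetical helper, and the numbers it returns vary between machines and runs.

```python
import subprocess
import sys


def cumulative_import_us(module):
    """Return the cumulative import time of `module` in microseconds,
    as reported by a fresh interpreter run with -X importtime."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Report lines go to stderr and look like:
    #   import time:  self [us] | cumulative | imported package
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) == 3 and parts[2].strip() == module:
            return int(parts[1].strip())
    return None  # module was not imported (or was already loaded at startup)


email_utils_us = cumulative_import_us("email.utils")
```

Running the helper on builds before and after the change, and averaging several runs, gives a rough percentage comparison.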