Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-109653: Improve the import time of email.utils #109824

Merged
merged 12 commits into from
Oct 12, 2023

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Sep 25, 2023

This patch reduces the import time of email.utils by around 46%.

Why do I care about email.utils? Well, email.utils is imported by email.message, and email.message is imported by lots of other stdlib modules: urllib.parse, mailbox, and importlib.metadata. Improving the import time of this module has a cascading effect through lots of the rest of the standard library.

A lot of the import cost of email.utils has to do with the make_msgid function, which requires the random and socket modules. This function is used by many third-party libraries, but isn't used at all by any of the stdlib modules that import email.utils. This patch moves the function into a separate submodule.

@AlexWaygood AlexWaygood added type-feature A feature request or enhancement performance Performance or resource usage topic-email 3.13 bugs and security fixes labels Sep 25, 2023
@AlexWaygood AlexWaygood requested a review from a team as a code owner September 25, 2023 10:03
Co-authored-by: Nikita Sobolev <[email protected]>
Lib/email/_msgid.py Outdated Show resolved Hide resolved
Lib/email/_msgid.py Outdated Show resolved Hide resolved
Lib/email/utils.py Outdated Show resolved Hide resolved
Lib/email/utils.py Outdated Show resolved Hide resolved
if attr == "make_msgid":
from email._msgid import make_msgid
return make_msgid
raise AttributeError(f"module {__name__!r} has no attribute {attr!r}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I don't like performance hacks like this, but they do have their place. I can't speak to whether or not this is a worthwhile performance hack, you should seek approval from the maintainers of the impacted modules for that.

That said, the stdlib itself makes no use of make_msgid, and email.utils is not itself considered part of the new email API. Moving it into a separate module and making that part of the non-legacy public API of the email module would actually make some sense. I guess we'd just call it 'msgid'? Then this code should issue a deprecation warning pointing to the new way to import make_msgid. It feels kind of weird to have a module with just one function, but there isn't really anything else related to it that I can think of. (I wonder...maybe make_msgid actually belongs in the UUID module? Probably not. Wrong RFC.)

@hauntsaninja
Copy link
Contributor

This feels a little strange to me. Should we instead just defer the imports? Might even be faster to do so

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Oct 3, 2023

This feels a little strange to me. Should we instead just defer the imports? Might even be faster to do so

Yeah, that might be a better shout. Have been meaning to give it a go — just haven't got round to it yet :)

I'm not really enthused about the idea of a painful deprecation period in order to change the place where people are "meant" to import it from. This function is pretty widely used by third-party packages, and the improvement in import time doesn't seem worth the disruption to me :)

@AlexWaygood
Copy link
Member Author

I've updated the PR. I abandoned the idea of adding a new submodule; now I just defer the problematic random, socket and os imports to inside make_msgid(). The reason I didn't do that originally was because I was concerned it would slow down the make_msgid() function, but local experimentation doesn't show that. (I ran python -m timeit -s "from email.utils import make_msgid" "make_msgid()" both with and without the change to test.) Shows the importance of benchmarking!

Co-authored-by: Adam Turner <[email protected]>
Copy link
Contributor

@hauntsaninja hauntsaninja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -94,6 +88,8 @@ def formataddr(pair, charset='utf-8'):
name.encode('ascii')
except UnicodeEncodeError:
if isinstance(charset, str):
# lazy import to improve module import time
from email.charset import Charset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like this one, only fine with it since formataddr doesn't look too widely used (a lot of the time it's nice to pay these costs upfront, predictable performance is important, e.g. don't want the first request your webserver serves to be randomly slow)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a lot of the time it's nice to pay these costs upfront, predictable performance is important

agreed. On the other hand, though, the email package goes in quite heavily for lazy imports in some other places, so this does seem in keeping with that general philosophy:

# Some convenience routines. Don't import Parser and Message as side-effects
# of importing email since those cascadingly import most of the rest of the
# email package.
def message_from_string(s, *args, **kws):
"""Parse a string into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import Parser
return Parser(*args, **kws).parsestr(s)
def message_from_bytes(s, *args, **kws):
"""Parse a bytes string into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import BytesParser
return BytesParser(*args, **kws).parsebytes(s)
def message_from_file(fp, *args, **kws):
"""Read a file and parse its contents into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import Parser
return Parser(*args, **kws).parse(fp)
def message_from_binary_file(fp, *args, **kws):
"""Read a binary file and parse its contents into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import BytesParser
return BytesParser(*args, **kws).parse(fp)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to change it so it's imported at the top of the function if you think that'd be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah that'd be worse, module level or here. I'm fine with this as is!

Lib/email/utils.py Outdated Show resolved Hide resolved
@hauntsaninja hauntsaninja merged commit aa3f419 into python:main Oct 12, 2023
18 checks passed
@AlexWaygood AlexWaygood deleted the email-message-import branch October 12, 2023 22:26
@AlexWaygood AlexWaygood restored the email-message-import branch February 22, 2024 16:37
@AlexWaygood AlexWaygood deleted the email-message-import branch March 7, 2024 14:35
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes performance Performance or resource usage topic-email type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants