Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-121423: Improve import time of socket by writing socket.errorTab as a constant and lazy import modules #121424

Merged
merged 12 commits into from
Sep 4, 2024

Conversation

Wulian233
Copy link
Contributor

@Wulian233 Wulian233 commented Jul 6, 2024

hyperfine:

$ hyperfine -w 16 './python -c "import socket"'
  Time (mean ± σ):      14.6 ms ±   0.7 ms    [User: 11.9 ms, System: 2.6 ms]
  Range (min … max):    13.4 ms …  16.5 ms    192 runs
 
$ git switch pr/121424 
$ hyperfine -w 16 './python -c "import socket"'
  Time (mean ± σ):      13.9 ms ±   0.6 ms    [User: 11.4 ms, System: 2.4 ms]
  Range (min … max):    12.6 ms …  15.5 ms    197 runs

≈ 30% faster

@Wulian233 Wulian233 changed the title gh-121423: Speed up socket.errotTab and lazy import selectors gh-121423: Speed up socket.errorTab and lazy import selectors Jul 6, 2024
Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that your benchmark is correct. Because you put import socket into your setup part. So, the import time itself is not counted.

Lib/socket.py Outdated Show resolved Hide resolved
@Wulian233
Copy link
Contributor Author

Wulian233 commented Jul 6, 2024

You are right, ↓ benchmark remove -s, slower 0.02us than -s because import

>python -m timeit -n 1000 "import socket" "print(socket.__all__)"
1000 loops, best of 5: 424 usec per loop

>python -m timeit -n 1000 "import socket" "print(socket.__all__)"
1000 loops, best of 5: 371 usec per loop

1.14x faster

@sobolevn sobolevn requested a review from gpshead July 7, 2024 07:24
@Wulian233 Wulian233 changed the title gh-121423: Speed up socket.errorTab and lazy import selectors gh-121423: Improve import time of socket by writing socket.errorTab as a constant and lazy import of selectors Jul 7, 2024
@barry-scott
Copy link
Contributor

The benchmark only imports socket once. the other 999 imports are no-op as socket in in sys.modules.

@Wulian233
Copy link
Contributor Author

Wulian233 commented Aug 31, 2024

I carried out two parameter tests, one is with -s, is in the pr description, 1.15x. One is to do without -s in the comments 1.14x

Am I right? I'm not familiar with benchmark, sorry


I'm wrong

@effigies
Copy link
Contributor

You can use hyperfine to measure the entire Python process, in which case using a python -c pass test is useful as a baseline:

hyperfine -w 16 -u microsecond 'python -c pass'
Benchmark 1: python -c pass
  Time (mean ± σ):     10921.2 µs ± 844.5 µs    [User: 8745.1 µs, System: 2091.4 µs]
  Range (min … max):   9720.3 µs … 15591.1 µs    242 runshyperfine -w 16 -u microsecond 'python -c "import socket; socket.__all__"'
Benchmark 1: python -c "import socket; socket.__all__"
  Time (mean ± σ):     14554.3 µs ± 1154.4 µs    [User: 12061.3 µs, System: 2368.5 µs]
  Range (min … max):   12826.1 µs … 20450.0 µs    206 runs

@hugovk
Copy link
Member

hugovk commented Aug 31, 2024

Using a PGO+LTO build on macOS, so this will just be measuring the import selectors change, and saves about 1.1 ms:

❯ hyperfine --warmup 16 \
--prepare "git checkout socket"                                   './python.exe -c "import socket; socket.__all__"' \
--prepare "git checkout 6239d41527d5977aa5d44e4b894d719bc045860e" './python.exe -c "import socket;  socket.__all__"'
Benchmark 1: ./python.exe -c "import socket; socket.__all__"
  Time (mean ± σ):      16.2 ms ±   0.8 ms    [User: 13.1 ms, System: 2.5 ms]
  Range (min … max):    15.5 ms …  21.7 ms    91 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Benchmark 2: ./python.exe -c "import socket;  socket.__all__"
  Time (mean ± σ):      17.3 ms ±   1.0 ms    [User: 13.9 ms, System: 2.7 ms]
  Range (min … max):    16.6 ms …  25.1 ms    88 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs.

Summary
  ./python.exe -c "import socket; socket.__all__" ran
    1.06 ± 0.08 times faster than ./python.exe -c "import socket;  socket.__all__"

Copy link
Member

@hugovk hugovk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, we have a general purpose issue for import time improvements -- #118761 -- shall we use that as an umbrella issue for this PR as well?


Using tuna to visualise import times (again, on macOS):

./python.exe -X importtime -c "import socket" 2> import.log && tuna import.log

Before: 3ms

image

Current PR: 2ms

image

Also: 1ms

Moving the couple of import arrays into two functions that call it:

image

Lib/socket.py Show resolved Hide resolved
@@ -348,6 +349,9 @@ def makefile(self, mode="r", buffering=None, *,
if hasattr(os, 'sendfile'):

def _sendfile_use_sendfile(self, file, offset=0, count=None):
# Lazy import to improve module import time
import selectors
Copy link
Member

@vstinner vstinner Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use a global variable to avoid the import at each call:

global selectors
if selectors is None:
    import selectors

You can define the selectors variable to None at the top of the file with a comment:

# module imported lazily
selectors = None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked several recent PRs making module loading lazy, but most of them do not use the pattern with a global variable (I suspect for readability reasons).

@vstinner Are there any other reasons besides the small performance improvement for using the global variable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serhiy-storchaka: Do you think that it's still useful in 2024 to use a global variable to avoid import selectors at each function call?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just do the import. It's a mere dict lookup without a conditional when the import has already happened.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just measure. It is more than a mere dict lookup (we also need to check that the module is not partially initialized, this adds 2 more dict lookups or like).

In this case, I think that the difference may be small even in comparison with a single os.fstat() call. The idiom proposed by @vstinner may be used when the whole function is very fast.

@vstinner
Copy link
Member

vstinner commented Sep 2, 2024

Also: 1ms Moving the couple of import arrays into two functions that call it:

Can you also make the array module import lazy in this PR?

@vstinner
Copy link
Member

vstinner commented Sep 2, 2024

I measured that the change saves 0.7 ms on import socket: 1.72 ms => 1.24 ms.

timeit:

$ ./python -m timeit -s 'import sys; state=dict(sys.modules)' 'import socket; del socket; sys.modules.clear(); sys.modules.update(state)' 
200 loops, best of 5: 1.72 msec per loop

$ git switch pr/121424 
$ ./python -m timeit -s 'import sys; state=dict(sys.modules)' 'import socket; del socket; sys.modules.clear(); sys.modules.update(state)' 
200 loops, best of 5: 1.24 msec per loop

hyperfine:

$ hyperfine -w 16 './python -c "import socket"'
  Time (mean ± σ):      14.6 ms ±   0.7 ms    [User: 11.9 ms, System: 2.6 ms]
  Range (min … max):    13.4 ms …  16.5 ms    192 runs
 
$ git switch pr/121424 
$ hyperfine -w 16 './python -c "import socket"'
  Time (mean ± σ):      13.9 ms ±   0.6 ms    [User: 11.4 ms, System: 2.4 ms]
  Range (min … max):    12.6 ms …  15.5 ms    197 runs

@Wulian233
Copy link
Contributor Author

Thank you! I just replaced the incorrect benchmark results in the PR with the correct ones

@hugovk
Copy link
Member

hugovk commented Sep 2, 2024

Also: 1ms Moving the couple of import arrays into two functions that call it:

Can you also make the array module import lazy in this PR?

@Wulian233 Please could you do this as well?

@Wulian233
Copy link
Contributor Author

Wulian233 commented Sep 3, 2024

Of course! I just lazy imported array and haven't done full benchmarking yet. I also mentioned this optimization in 3.14.rst, do you think it should be included? @hugovk

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

What's New entry:

which results in a 30% speed up in standard pyperformance benchmarks.

This sentence can be misunderstood as "everything is 30% faster".

Which pyperformance benchmark is now faster?

Lib/socket.py Outdated Show resolved Hide resolved
@@ -348,6 +349,9 @@ def makefile(self, mode="r", buffering=None, *,
if hasattr(os, 'sendfile'):

def _sendfile_use_sendfile(self, file, offset=0, count=None):
# Lazy import to improve module import time
import selectors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just measure. It is more than a mere dict lookup (we also need to check that the module is not partially initialized, this adds 2 more dict lookups or like).

In this case, I think that the difference may be small even in comparison with a single os.fstat() call. The idiom proposed by @vstinner may be used when the whole function is very fast.

@Wulian233
Copy link
Contributor Author

  • Improve import time of :mod:socket by lazy importing modules and
    writing :data:!socket.errorTab as a constant, which results in
    a 30% speed-up in the import time pyperformance benchmarks.

What do you think of this👀

@hugovk
Copy link
Member

hugovk commented Sep 3, 2024

For What's New, we can follow the example @AlexWaygood wrote for 3.13, which grouped a few import improvements together:

Several standard library modules have had their import times significantly improved. For example, the import time of the typing module has been reduced by around a third by removing dependencies on re and contextlib. Other modules to enjoy import-time speedups include email.utils, enum, functools, importlib.metadata, and threading. (Contributed by Alex Waygood, Shantanu Jain, Adam Turner, Daniel Hollas, and others in gh-109653.)

Let's do the same with #118761 in 3.14. We have two under that issue so far.

I recommend we also group this PR under #118761 as well -> rename this PR title gh-118761: ....

So we can follow something like that now, or leave it out for now and add a grouped summary later.

@serhiy-storchaka
Copy link
Member

Omit details that are not interested to the end user.

I would also remove any mention from Ehat's New -- this is an insignificant change. I am sure that if you measure import time of different modules, you will find larger changes between versions (maybe even between bugfix releases) without anybody noticing.

@Wulian233 Wulian233 changed the title gh-121423: Improve import time of socket by writing socket.errorTab as a constant and lazy import of selectors gh-118761: Improve import time of socket by writing socket.errorTab as a constant and lazy import of selectors Sep 3, 2024
@Wulian233
Copy link
Contributor Author

Wulian233 commented Sep 3, 2024

I recommend we also group this PR under #118761 as well -> rename this PR title gh-118761: ....

Okay, now this pr belongs to 118761. I revert the changes to 3.14 so we can write them together future when there are more modules optimizations :)

@Wulian233 Wulian233 changed the title gh-118761: Improve import time of socket by writing socket.errorTab as a constant and lazy import of selectors gh-118761: Improve import time of socket by writing socket.errorTab as a constant and lazy import modules Sep 3, 2024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to Misc/NEWS.d/next/Library/2024-07-06-12-37-10.gh-issue-118761.vnxrl4.rst (containing gh-issue-118761) and we're ready to merge :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#121423 is the right issue.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vstinner vstinner changed the title gh-118761: Improve import time of socket by writing socket.errorTab as a constant and lazy import modules gh-121423: Improve import time of socket by writing socket.errorTab as a constant and lazy import modules Sep 4, 2024
@vstinner vstinner merged commit 7bd964d into python:main Sep 4, 2024
34 checks passed
@vstinner
Copy link
Member

vstinner commented Sep 4, 2024

Merged. Thanks for your nice enhancement @Wulian233.

@Wulian233 Wulian233 deleted the socket branch September 4, 2024 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.