Slowdown when modifying instance member #11

Open
mdboom opened this issue Jul 26, 2023 · 0 comments


mdboom commented Jul 26, 2023

Bug report

Using the Fibonacci example from the old nogil README, I can see the time per call decrease as the number of threads increases:

```python
import sys
from concurrent.futures import ThreadPoolExecutor

print(f"nogil={getattr(sys.flags, 'nogil', False)}")

def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)

threads = 8
if len(sys.argv) > 1:
    threads = int(sys.argv[1])

with ThreadPoolExecutor(max_workers=threads) as executor:
    for _ in range(threads):
        executor.submit(lambda: print(fib(34)))
```
```
$ time ./python nogil_bench.py 1
0.89user 0.00system 0:00.97elapsed 91%CPU
# 0.97s per call
$ time ./python nogil_bench.py 8
11.75user 0.00system 0:01.97elapsed 595%CPU
# 0.24s per call
```
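The "per call" figures are the wall-clock `elapsed` time divided by the number of threads, since each thread computes one `fib(34)`. A small helper sketch that derives them from GNU time's default summary line (the format shown above); the function names are illustrative, not part of any tool:

```python
import re

# Matches GNU time's default summary line, e.g.
# "0.89user 0.00system 0:00.97elapsed 91%CPU"
TIME_LINE = re.compile(
    r"(?P<user>[\d.]+)user\s+(?P<system>[\d.]+)system\s+"
    r"(?P<elapsed>[\d:.]+)elapsed\s+(?P<cpu>\d+)%CPU"
)


def elapsed_seconds(field: str) -> float:
    """Convert an [hours:]minutes:seconds field such as '0:00.97' or '1:22.25' to seconds."""
    seconds = 0.0
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def per_call_seconds(time_line: str, threads: int) -> float:
    """Wall-clock seconds per call when `threads` calls run concurrently (one per thread)."""
    match = TIME_LINE.search(time_line)
    if match is None:
        raise ValueError(f"not a GNU time summary line: {time_line!r}")
    return elapsed_seconds(match["elapsed"]) / threads


if __name__ == "__main__":
    runs = [
        ("0.89user 0.00system 0:00.97elapsed 91%CPU", 1),
        ("11.75user 0.00system 0:01.97elapsed 595%CPU", 8),
    ]
    for line, threads in runs:
        print(f"{threads} thread(s): {per_call_seconds(line, threads):.3f}s per call")
```

The `elapsed` field uses `[hours:]minutes:seconds`, so a value like `1:22.25` parses to 82.25 s.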

However, when I modify the benchmark to update an instance member, the time per call skyrockets. Note that the instance isn't shared between threads: each call to `fib` creates its own `Fibonacci` instance, so every thread works on a separate object.

```python
import sys
from concurrent.futures import ThreadPoolExecutor

print(f"nogil={getattr(sys.flags, 'nogil', False)}")

class Fibonacci:
    def __init__(self, x):
        self.x = x

    def calculate(self, n):
        # This line doesn't actually matter for the calculation, but this is what
        # causes the nogil threaded performance to drop precipitously.
        self.x += 1

        if n < 2:
            return 1
        return self.calculate(n - 1) + self.calculate(n - 2)

def fib(n):
    f = Fibonacci(1)
    return f.calculate(n)

threads = 8
if len(sys.argv) > 1:
    threads = int(sys.argv[1])

with ThreadPoolExecutor(max_workers=threads) as executor:
    for _ in range(threads):
        executor.submit(lambda: print(fib(34)))
```
```
$ time ./python nogil_bench_slow.py 1
2.24user 0.00system 0:02.44elapsed 92%CPU
# 2.44s per call
$ time ./python nogil_bench_slow.py 8
76.39user 150.70system 1:22.25elapsed 276%CPU
# 10.28s per call
```

Profiling with Linux perf, `_PyObject_GetInstanceAttribute` accounts for 10% of runtime in the slow version and 0.0% in the fast version, so the slowdown appears to be lock contention when getting an instance attribute.
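One way to check that the attribute update, rather than the method calls on the instance, is what stops scaling is to time the same recursion with and without `self.x += 1`. This is a hypothetical sketch (the class and function names are illustrative, not from the original benchmark): it measures wall-clock time in-process with `time.perf_counter()` and defaults to `n=25` so it finishes quickly (pass `34` to match the runs above). Its thread-scaling numbers are only meaningful on a free-threaded build, since a GIL build serializes the threads anyway.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


class WithAttributeWrite:
    """Same shape as the Fibonacci class in the slow benchmark."""

    def __init__(self, x):
        self.x = x

    def calculate(self, n):
        self.x += 1  # instance attribute read + write on every call
        if n < 2:
            return 1
        return self.calculate(n - 1) + self.calculate(n - 2)


class WithoutAttributeWrite:
    """Control: identical recursion through self.calculate, but no attribute update."""

    def __init__(self, x):
        self.x = x

    def calculate(self, n):
        if n < 2:
            return 1
        return self.calculate(n - 1) + self.calculate(n - 2)


def run(cls, n, threads):
    """Run `threads` independent calculations, one instance per task.

    Returns (results, elapsed_seconds). The instance is created inside the
    worker thread, as in the original benchmark.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as executor:
        futures = [executor.submit(lambda: cls(1).calculate(n)) for _ in range(threads)]
        results = [f.result() for f in futures]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 25
    for threads in (1, 8):
        for cls in (WithoutAttributeWrite, WithAttributeWrite):
            _, elapsed = run(cls, n, threads)
            print(f"{cls.__name__:>22} threads={threads}: "
                  f"{elapsed:.2f}s total, {elapsed / threads:.2f}s per call")
```

If only the `WithAttributeWrite` rows degrade at 8 threads, that points at the attribute access path rather than at method calls or thread-pool overhead.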

I do not see this pathological behavior on nogil-3.9, so I'm hoping this is an isolated bug that can be fixed independently. The same slow benchmark on nogil-3.9:

```
$ time ./python nogil_bench_slow.py 1
1.50user 0.00system 0:01.63elapsed 92%CPU
# 1.63s per call
$ time ./python nogil_bench_slow.py 8
18.40user 0.01system 0:02.91elapsed 632%CPU
# 0.36s per call
```
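Expressed as throughput relative to a single thread (8 × single-thread elapsed ÷ 8-thread elapsed, because the 8 threads each complete one call), the reported timings compare as follows. This small sketch only reuses the elapsed times quoted in this issue; the names are illustrative:

```python
# Wall-clock ("elapsed") seconds for 1 thread and 8 threads, as reported in this issue.
REPORTED = {
    "fib (current build)": (0.97, 1.97),
    "Fibonacci class (current build)": (2.44, 82.25),
    "Fibonacci class (nogil-3.9)": (1.63, 2.91),
}
THREADS = 8


def throughput_speedup(t1: float, tn: float, threads: int = THREADS) -> float:
    """Calls completed per second with `threads` threads, relative to one thread.

    One thread finishes 1 call in t1 seconds; `threads` threads finish
    `threads` calls in tn seconds. Ideal scaling gives `threads`; a value
    below 1.0 means the parallel run took longer than running the same
    calls one after another would be expected to.
    """
    return (threads / tn) / (1 / t1)


for name, (t1, tn) in REPORTED.items():
    print(f"{name}: {throughput_speedup(t1, tn):.2f}x with {THREADS} threads")
```

With these numbers the speedups come out to roughly 3.9x, 0.24x and 4.5x: on the current build, eight threads running the attribute-updating benchmark do worse than eight sequential calls would be expected to, while nogil-3.9 scales at least as well as the plain `fib` example.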

Please ignore the fact that the `self.x += 1` line is meaningless for calculating Fibonacci numbers; this is my attempt at reducing pyperformance's raytrace benchmark to a more minimal example. I'm sure you agree that modifying instance members is a pretty common thing to do. :)

Your environment

- OS: Debian Buster
- CPU: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
