Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GoPool's performance in comparison to other goroutine pool implementations #144

Open
alphadose opened this issue Jul 5, 2022 · 12 comments
Labels
question Further information is requested

Comments

@alphadose
Copy link

Question

GoPool seems to perform poorly from the benchmarks I ran

name                   time/op
UnlimitedGoroutines-8   301ms ± 4%
ErrGroup-8              515ms ± 9%
AntsPool-8              582ms ± 9%
GammaZeroPool-8         740ms ±13%
BytedanceGoPool-8       572ms ±18%
ItogamiPool-8           331ms ± 7%

name                   alloc/op
UnlimitedGoroutines-8  96.3MB ± 0%
ErrGroup-8              120MB ± 0%
AntsPool-8             22.4MB ± 6%
GammaZeroPool-8        18.8MB ± 1%
BytedanceGoPool-8      82.2MB ± 2%
ItogamiPool-8          25.6MB ± 2%

name                   allocs/op
UnlimitedGoroutines-8   2.00M ± 0%
ErrGroup-8              3.00M ± 0%
AntsPool-8              1.10M ± 2%
GammaZeroPool-8         1.05M ± 0%
BytedanceGoPool-8       2.59M ± 1%
ItogamiPool-8           1.05M ± 0%

Reference -> https://github.com/alphadose/itogami

Benchmarking code -> https://github.com/alphadose/go-threadpool-benchmarks

@alphadose alphadose added the question Further information is requested label Jul 5, 2022
@PureWhiteWu
Copy link
Collaborator

Could you please try the same test as https://github.com/bytedance/gopkg/blob/develop/util/gopool/pool_test.go?

@alphadose
Copy link
Author

Got these results

2022/07/05 16:20:06.801184 logger.go:190: [Error] GOPOOL: panic in pool: test: test: goroutine 2025 [running]:
runtime/debug.Stack()
	/opt/homebrew/Cellar/go/1.18/libexec/src/runtime/debug/stack.go:24 +0x68
github.com/bytedance/gopkg/util/gopool.(*worker).run.func1.1.1()
	/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:64 +0x80
panic({0x100601700, 0x100623510})
	/opt/homebrew/Cellar/go/1.18/libexec/src/runtime/panic.go:838 +0x204
command-line-arguments.testPanicFunc()
	/Users/alphadose/assignment/go-threadpool-benchmarks/gopool_test.go:27 +0x30
github.com/bytedance/gopkg/util/gopool.(*worker).run.func1.1(0x1400010c2b0?, 0x10060c8a0?)
	/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:69 +0x5c
github.com/bytedance/gopkg/util/gopool.(*worker).run.func1()
	/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:70 +0x1a8
created by github.com/bytedance/gopkg/util/gopool.(*worker).run
	/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:41 +0x64
%!(EXTRA []interface {}=[])
goos: darwin
goarch: arm64
BenchmarkPool-8      	     367	   3841601 ns/op	  179468 B/op	   10227 allocs/op
BenchmarkGo-8        	     100	  11113182 ns/op	  167427 B/op	   10016 allocs/op
BenchmarkItogami-8   	     369	   3109478 ns/op	  174749 B/op	   10073 allocs/op
PASS
ok  	command-line-arguments	4.771s

https://github.com/alphadose/go-threadpool-benchmarks/blob/master/gopool_test.go

@PureWhiteWu
Copy link
Collaborator

Thanks for your tests!
For the first test, I think that's not the correct way to use goroutine pools, it's better to use the native goroutines instead.
For the second test, seems that Itogami outperforms gopool in cpu-bound tasks. I took a quick glance at the code, and found that it makes good use of the runtime's internal funcs, that's great!
I'll investigate if we could optimize the performance of gopool without the usage of runtime internal funcs.

@alphadose
Copy link
Author

alphadose commented Jul 5, 2022

The first test was to mimic 1000s of HTTP requests each of which typically last within milliseconds, but yes there should be some computation and variance between requests for relation to an actual use case

it's better to use the native goroutines instead.

if we remove the limit on itogami then it essentially becomes an infinite goroutine pool but it also saves a lot of memory (due to recycling via the stack) as compared to using native goroutines, hence I think its beneficial to use itogami in this case too with some modifications

@lesismal
Copy link

lesismal commented Oct 16, 2022

I think it's not enough to test cpu cost handler only, in most of the real scenarios, blocking is the much more usual operation than pure cpu cost, eg: http handler, rpc handler, and there is a write syscall at least.

I tried with a short sleep and get a much worse performance from gopool than std:

func testFunc() {
	// DoCopyStack(0, 0)
	time.Sleep(time.Microsecond * 10)
}

result:

BenchmarkPool-8   	       1	13349818686 ns/op	  727840 B/op	   20125 allocs/op
BenchmarkGo-8     	      82	  13735426 ns/op	  968846 B/op	   20042 allocs/op

This much worse result may be because the default pool size is only 8 in my env(runtime.GOMAXPROCS(0)==8).
I change the pool size to 1000 or 10000 and gopool runs much better but still worse than std:

# pool size: 1000
BenchmarkPool-8   	      10	 109975368 ns/op	  466008 B/op	   12162 allocs/op
BenchmarkGo-8     	      85	  13984710 ns/op	  970441 B/op	   20082 allocs/op
# pool size: 10000
BenchmarkPool-8   	      68	  17622188 ns/op	 1421435 B/op	   30038 allocs/op
BenchmarkGo-8     	      86	  14227710 ns/op	  971272 B/op	   20074 allocs/op

@lesismal
Copy link

lesismal commented Oct 16, 2022

it also saves a lot of memory

@alphadose
I think the memory cost for each Go operation can be ignored since it runs 10000 times in each b.N loop, it's a much smaller memory cost for each Go operation:
https://github.com/bytedance/gopkg/blob/develop/util/gopool/pool_test.go#L24

But indeed we need to limit the pool size to control the total memory cost of the process.

@lesismal
Copy link

I took a quick glance at the code, and found that it makes good use of the runtime's internal funcs, that's great!

@alphadose Cool!

@lesismal
Copy link

lesismal commented Oct 16, 2022

@alphadose
I took a look at your test and found that it's unfair because itogami's pool size is 5e4 but gopool size is much smaller. For itogami's pool it's similar to using std goroutine but your implementation performs a little better.
https://github.com/alphadose/go-threadpool-benchmarks/blob/master/gopool_test.go#L89

And the same point with gopool's default cpu cost handler, if I limit the pool size and change to a testFunc with a short sleep, itogami doesn't perform better than std in my env:

# pool size: 1000
BenchmarkPool-8      	      10	 110218422 ns/op	  461439 B/op	   12138 allocs/op
BenchmarkGo-8        	      84	  13746197 ns/op	  969096 B/op	   20052 allocs/op
BenchmarkItogami-8   	      10	 104207091 ns/op	  200136 B/op	   10413 allocs/op

It's only faster than std when the PoolSize >= benchmarkTimes or the blocking time is shorter than b.N calling loop cost, else it's worse than std.

if we remove the limit on itogami then it essentially becomes an infinite goroutine pool but it also saves a lot of memory (due to recycling via the stack) as compared to using native goroutines, hence I think its beneficial to use itogami in this case too with some modifications

It's no use if we don't limit the pool size and I think users prefer to use std goroutine rather than use another implementation like std.

@lesismal
Copy link

@alphadose

I met a SIGSEGV from Itogami when running the benchmark:

goos: linux
goarch: amd64
pkg: github.com/alphadose/go-threadpool-benchmarks
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BenchmarkGo-8         	     328	   3554240 ns/op	  972669 B/op	   20124 allocs/op
BenchmarkPool-8       	     206	   6353972 ns/op	  590759 B/op	   19165 allocs/op
BenchmarkItogami-8    	fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x330 pc=0x4372a7]

......

@alphadose
Copy link
Author

alphadose commented Oct 16, 2022

@lesismal I made itogami as a proof of concept for a better golang scheduler, and as it uses golang runtime internals it is unsafe because of some edge cases here and there. Hence, its also unsafe to use for real use cases.

If you are interested here is the corresponding issue in golang core golang/go#53398
^ the benchmarking in the above case is based on a no-limit version of itogami which can be viewed in this branch https://github.com/alphadose/itogami/tree/golang-scheduler

@lesismal
Copy link

lesismal commented Oct 16, 2022

If you are interested here is the corresponding issue in golang core golang/go#53398 ^ the benchmarking in the above case is based on a no-limit version of itogami which can be viewed in this branch https://github.com/alphadose/itogami/tree/golang-scheduler

I took a quick look at 53398, It's good trying to use lock-free to optimize performance.
But as I know, lock-free is only suitable for a few scenarios such as a queue's push and pop operations. It can't be used to optimize a process with multiple steps which have beyond the atomic operations' ability.

@lesismal I made itogami as a proof of concept for a better golang scheduler, and as it uses golang runtime internals it is unsafe because of some edge cases here and there. Hence, its also unsafe to use for real use cases.

Since it's unsafe, the benchmark result should not be used to compare with std other goroutine pools, it doesn't make sense and would make misleading.

About http performance, I agree with you that fasthttp is really fast, but gnet's http benchmark is not a fully implementation of http protocol, details here, that also makes misleading. I used to opend an issue to advise not to use that benchmark result to claim gnet's performance but wasn't accepted.. But I would still be glad to see if someday the author deletes the http benchmark result, or if he fully supports the protocol.

I have another lib that does these tls/http/websocket things with goroutine num and memory usage under controlled: https://github.com/lesismal/nbio
There's also a sub-package that implements goroutine pools with limited size, it can't perform better than std but can limit pool size, and it seems the benchmark with short sleep performs a little better than I attached in the previous comments, if interested, welcome to try.

@alphadose
Copy link
Author

alphadose commented Oct 16, 2022

@lesismal

But as I know, lock-free is only suitable for a few scenarios such as a queue's push and pop operations. It can't be used to optimize a process with multiple steps which have beyond the atomic operations' ability.

I went through the golang's scheduling code and there is room for improvement by switching to a lock-free stack in some places. I got the idea itself from a gophercon talk delivered by a member of golang core team and I tested it out, got some good results. The major performance improvement by switching to a lock free stack would be scheduling all operations entirely in user-space and avoid lock()/unlock() system call context switches.

Since it's unsafe, the benchmark result should not be used to compare with std other goroutine pools, it doesn't make sense and would make misleading.

Agreed, I will update the readme so that people are aware that its experimental and unsafe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants