Limit amount of channel buffers #22

Merged 1 commit on Feb 2, 2021


oflebbe (Contributor) commented Jan 22, 2021

While running with a high number of processes (>=16) on large OSM exports, I observed very high RAM usage.

Inspecting the code, I found that buffered channels are used and that their capacity scales with the number of goroutines, so the total number of buffer slots grows as N^2. Because the serialization step at the end of the parallel pipeline does not scale well, it creates backpressure: all of these buffers fill up completely and quite fast, leaving the worker goroutines in a "twiddling thumbs" state that accounts for a high and unproductive load.
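To make the N^2 growth concrete, here is a minimal sketch. It assumes one buffered channel per worker, with a capacity equal to the worker count. The names `block`, `newWorkerChannels`, and `totalSlots` are hypothetical and are not taken from the library.

```go
package main

import "fmt"

// block stands in for one decoded batch of OSM elements (hypothetical type).
type block struct{ elements []int }

// newWorkerChannels creates one buffered channel per worker, each with the
// given capacity.
func newWorkerChannels(workers, capacity int) []chan block {
	chans := make([]chan block, workers)
	for i := range chans {
		chans[i] = make(chan block, capacity)
	}
	return chans
}

// totalSlots sums the buffer capacity over all channels.
func totalSlots(chans []chan block) int {
	total := 0
	for _, c := range chans {
		total += cap(c)
	}
	return total
}

func main() {
	// Capacity scales with the worker count, so total slots grow as n*n.
	for _, n := range []int{2, 8, 16} {
		fmt.Printf("workers=%d slots=%d\n", n, totalSlots(newWorkerChannels(n, n)))
	}
}
```

Under these assumptions, 16 workers give 256 slots, and every slot can hold a full batch once the slow final stage stalls the pipeline.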

The proposal is to keep some buffering for a low number of goroutines, so that low processor counts still scale efficiently, and to reduce it down to no buffering for higher numbers of goroutines, since it does not make sense to provide many buffers whose contents cannot be processed by later parts of the system.
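One possible way to express such a policy is sketched below. It is not the code from the merged commit. The constant `slotBudget` and the function `channelCapacity` are hypothetical, and the budget of 8 is an arbitrary value chosen for illustration.

```go
package main

import "fmt"

// slotBudget is a hypothetical cap on the total number of buffered items
// across all worker channels. It is not the value used in the merged commit.
const slotBudget = 8

// channelCapacity returns the per-channel buffer size for a worker count.
// Integer division shrinks the capacity as workers grow, and it reaches 0
// (an unbuffered channel) once workers exceeds slotBudget.
func channelCapacity(workers int) int {
	if workers < 1 {
		workers = 1
	}
	return slotBudget / workers
}

func main() {
	for _, n := range []int{1, 2, 4, 8, 16} {
		ch := make(chan []byte, channelCapacity(n))
		fmt.Printf("workers=%2d capacity=%d total=%d\n", n, cap(ch), n*cap(ch))
	}
}
```

With this rule the total number of slots never exceeds the budget, and `make(chan T, 0)` yields an unbuffered channel, so a sender blocks until the next stage is ready to receive.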

The benchmark does not show the whole effect of this change. Load is decreased as well and, more importantly, RAM usage is decreased by a factor of N, N being the goroutine count (for my use case it was essentially a drop from 16GB to 1.5GB).

benchmark                       old ns/op     new ns/op     delta
BenchmarkLondon-8               268139968     257049707     -4.14%
BenchmarkLondon_nodes-8         188952129     180283800     -4.59%
BenchmarkLondon_ways-8          170449362     161457450     -5.28%
BenchmarkLondon_relations-8     98485345      98436864      -0.05%

benchmark                       old allocs     new allocs     delta
BenchmarkLondon-8               2416812        2416806        -0.00%
BenchmarkLondon_nodes-8         1003846        1003844        -0.00%
BenchmarkLondon_ways-8          1792714        1792716        +0.00%
BenchmarkLondon_relations-8     456776         456774         -0.00%

benchmark                       old bytes     new bytes     delta
BenchmarkLondon-8               954879856     954877660     -0.00%
BenchmarkLondon_nodes-8         649481057     649479861     -0.00%
BenchmarkLondon_ways-8          432819502     432819265     -0.00%
BenchmarkLondon_relations-8     179086988     179085753     -0.00%

paulmach (Owner) commented Feb 2, 2021

This is a good find. N^2 is definitely not good, because every N is 8000 OSM elements in memory.
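As a rough illustration of the scale involved, the sketch below counts elements held in channel buffers. It assumes each buffered item holds up to 8000 elements (the figure from the comment above) and compares a capacity that scales with the worker count against a single slot per channel. The function `bufferedElements` is hypothetical, and the single-slot case is only a comparison point, not the exact behavior of the merged change.

```go
package main

import "fmt"

// elementsPerItem is the figure from the comment above, treated here as an
// upper bound on the elements held by one buffered item (an assumption).
const elementsPerItem = 8000

// bufferedElements estimates how many elements can sit in channel buffers
// when every slot of every worker channel is full.
func bufferedElements(workers, capacityPerChannel int) int {
	return workers * capacityPerChannel * elementsPerItem
}

func main() {
	n := 16
	scaled := bufferedElements(n, n) // capacity grows with the worker count
	single := bufferedElements(n, 1) // one slot per channel
	fmt.Println(scaled, single, scaled/single)
}
```

For 16 workers this is 2,048,000 elements against 128,000, a ratio of 16, which is in line with the factor-of-N reduction in RAM usage reported above.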

paulmach merged commit eeed6ca into paulmach:master on Feb 2, 2021
paulmach mentioned this pull request on Apr 1, 2021