While running with a high number of processes (>=16) for large OSM exports, I observed very high RAM usage.
Inspecting the code, I found that buffered channels are used whose capacity scales with the number of goroutines. Overall this creates N^2 buffer slots, N being the goroutine count. The serialization at the end of the parallel pipeline does not scale well and therefore creates backpressure, so all these buffers fill up completely and quickly. This leaves all the threads "twiddling thumbs" and accounts for a high but unproductive load.
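To make the N^2 growth concrete, here is a minimal sketch of the pattern, assuming one output channel per worker, each buffered with a capacity equal to the worker count. The `block` type and the helper functions are hypothetical stand-ins, not the project's actual code.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a decoded chunk of OSM data;
// the real element type used by the pipeline is not shown here.
type block struct {
	payload []byte
}

// newWorkerChannels builds one channel per worker, each buffered with
// capacity equal to the worker count (assumed layout, for illustration).
func newWorkerChannels(workers int) []chan block {
	chans := make([]chan block, workers)
	for i := range chans {
		chans[i] = make(chan block, workers)
	}
	return chans
}

// totalCapacity sums the buffer slots across all channels. With N
// channels of capacity N this is N*N blocks that can pile up in memory
// while a slow serializing consumer applies backpressure.
func totalCapacity(chans []chan block) int {
	total := 0
	for _, c := range chans {
		total += cap(c)
	}
	return total
}

func main() {
	for _, n := range []int{4, 16, 32} {
		fmt.Printf("workers=%d buffered slots=%d\n", n, totalCapacity(newWorkerChannels(n)))
	}
}
```

Going from 4 to 16 workers multiplies the worker count by 4 but the number of slots that can hold in-flight data by 16.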
The proposal is to keep some buffering for a low number of goroutines, so that low processor counts still scale efficiently, and to reduce buffering down to no buffering for higher goroutine counts. It does not make sense to provide a large number of buffers that the later parts of the system cannot process anyway.
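A sketch of how such a rule could look, assuming a simple cutoff. The PR only speaks of a "low amount" of goroutines, so `bufferThreshold` and its value of 8 are hypothetical, as is the `channelBuffer` helper.

```go
package main

import "fmt"

// bufferThreshold is a hypothetical cutoff; the actual value used by
// the change is not stated in this description.
const bufferThreshold = 8

// channelBuffer returns the capacity for each worker channel. Below the
// threshold, buffer by the worker count so a few workers can run ahead
// of the consumer. At or above it, return 0 so channels are unbuffered:
// a worker blocks on send until its result is taken, which bounds
// in-flight data to roughly one item per worker instead of N per channel.
func channelBuffer(workers int) int {
	if workers < bufferThreshold {
		return workers
	}
	return 0
}

func main() {
	for _, n := range []int{2, 4, 16, 32} {
		b := channelBuffer(n)
		fmt.Printf("workers=%d per-channel buffer=%d total buffered=%d\n", n, b, n*b)
	}
}
```

A threshold keeps the change small. A gradual taper (for example shrinking the capacity as the worker count grows) would be an alternative if the cutoff turns out to be too abrupt.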
The benchmark does not show the whole effect of this change. Load decreases as well and, more importantly, RAM usage decreases by a factor of N, N being the goroutine count (for my use case it was essentially a drop from 16 GB to 1.5 GB).