Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osmpbf: do not preallocate nodes #26

Merged
merged 1 commit into from
Apr 27, 2021
Merged

osmpbf: do not preallocate nodes #26

merged 1 commit into from
Apr 27, 2021

Conversation

paulmach
Copy link
Owner

@paulmach paulmach commented Apr 2, 2021

regarding #25 on high memory usage.

I noticed that pre-allocating the nodes could cause problems if you're saving some of them for later use. If just 1 node out of the whole 8000 node array is saved, the GC can not cleanup the any of them.

Benchmarks are impacted:

benchmark                        old ns/op     new ns/op     delta
BenchmarkLondon-12               235836166     248139745     +5.22%
BenchmarkLondon_nodes-12         163988036     177138150     +8.02%
BenchmarkLondon_ways-12          136805230     135422092     -1.01%
BenchmarkLondon_relations-12     82180166      81767765      -0.50%

benchmark                        old allocs     new allocs     delta
BenchmarkLondon-12               2416813        5145479        +112.90%
BenchmarkLondon_nodes-12         1003846        3732512        +271.82%
BenchmarkLondon_ways-12          1792714        1792718        +0.00%
BenchmarkLondon_relations-12     456772         456772         +0.00%

benchmark                        old bytes     new bytes     delta
BenchmarkLondon-12               954879689     952781068     -0.22%
BenchmarkLondon_nodes-12         649480586     647382889     -0.32%
BenchmarkLondon_ways-12          432818482     432819399     +0.00%
BenchmarkLondon_relations-12     179085842     179085459     -0.00%

@paulmach paulmach mentioned this pull request Apr 2, 2021
@oflebbe
Copy link
Contributor

oflebbe commented Apr 2, 2021

Ok this will introduce a huge Garbage Collection overhead, since we will have 8000x more objects.
My impression the library will be used either to take all objects from a result or to copy away the few needed.
I totally see the issue with #25 , where I proposed a way to use like I am using consuming the lib.

We cannot use sync.Pool as well, since the consumer of the library who has to do Put() after discarding a result.

I am thinking about having a alternative api, where the consumer of the api has to call the Scanner with a function as a predicate to find whether a given node/way/relation should be returned or not. What do you think @paulmach ?

Or even doing the complete processing by a user-supplied function (as I have seen it in a rust osm library).

@paulmach
Copy link
Owner Author

After thinking about this, I think this change is necessary to support the current API. Then we can add something like #27 for anyone that needs the extra performance.

The way I'm thinking about this is objects in the for scanner.Scan() {} loop can have 3ish different lifetimes:

  1. Just within the loop. Maybe you're counting or transforming and saving, or just inserting into a DB.
  2. For a few loops. Like bulk inserting into the DB.
  3. Forever, you're saving a handful of the nodes into a map.

1+2 work today, 3 creates high memory usages as reported in #25. To support all 3 cases with the current API we need this PR.

Performance wise I think we're okay. Comparing v0.1.1 (the release that added this to be reverted change) to this branch I see:

benchmark              old ns/op     new ns/op     delta
BenchmarkLondon-12     355943390     251720242     -29.28%

benchmark              old allocs     new allocs     delta
BenchmarkLondon-12     7732810        5145480        -33.46%

benchmark              old bytes      new bytes     delta
BenchmarkLondon-12     1306695743     952781394     -27.08%

So we're still better than what we had when the change was originally proposed.

@paulmach paulmach merged commit 9df5be8 into master Apr 27, 2021
@paulmach paulmach deleted the no-preallocate branch April 27, 2021 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants