Test different heap sizes for Lucene benchy #37

Open
mikemccand opened this issue Jun 10, 2023 · 5 comments

@mikemccand
Collaborator

Spinoff from this discussion: apache/lucene#12358 (comment)

We should fix the heap size for the JVM running Lucene. That might save some of the cost of the JVM growing, shrinking, and reallocating the heap, etc.?

I'm starting with 4 GB as a random guess, but we should test which heap size minimizes GC cost.

@mikemccand
Collaborator Author

Note that tantivy seems to use 1.5 GB resident:

 203670 mike      20   0   11.9g   1.5g   1.5g R  99.7   0.8   4:57.51 do_query

And it memory-maps its index (like Lucene).

We should compare the RAM requirements of each engine too!

Since Rust "GC" is immediate (as soon as something becomes garbage it is reclaimed, like Python's non-cyclic "collector"), it does not need the overhead of letting garbage accumulate and then having it reclaimed by a complex GC implementation like Java's.

@Tony-X
Owner

Tony-X commented Jun 10, 2023

> Since Rust "GC" is immediate (as soon as something becomes garbage it is reclaimed

well, freeing garbage is not free (lol sorry for the pun). So for Rust, the latencies we see actually include garbage/memory management. So it is actually doing extra work on top of what is strictly needed for finishing the query.

@jainankitk

> So for Rust, the latencies we see actually include garbage/memory management. So it is actually doing extra work on top of what is strictly needed for finishing the query.

Is it possible that Rust only marks the garbage during query execution (which should be quite lightweight)? And asynchronously clears memory blocks and makes them available for allocation?

@mikemccand
Collaborator Author

> well, freeing garbage is not free (lol sorry for the pun).

Ha! Love the pun.

> So for Rust, the latencies we see actually include garbage/memory management. So it is actually doing extra work on top of what is strictly needed for finishing the query.

Yeah that is true -- Rust must still do the memory management work. But it avoids all the cost of crawling all object references, inserting memory barriers, etc. (I think?)

> Is it possible that Rust only marks the garbage during query execution (which should be quite lightweight)? And asynchronously clears memory blocks and makes them available for allocation?

Maybe :) I am far from a Rust GC expert!

I found this recent Reddit discussion, but I remain confused :) It looks like rustc is able to know at compile time the point where an object is no longer referenced, and it de-allocates the object right there. But there is also a reference counting option, which is more dynamic (only at runtime, not at compile time, is it known when it's time to de-allocate).

@Tony-X
Owner

Tony-X commented Jun 20, 2023

@jainankitk what you are describing is basically a version of concurrent GC :), which is not what Rust has.

As @mikemccand pointed out, Rust statically knows when objects go out of scope (or, more precisely, when data no longer has an owner), so it can call the destructor at that point. This helps not only with memory management (working with the allocator) but also with any clean-up actions (e.g. closing a file handle).
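
A minimal sketch of that scope-based destruction, for illustration only. `Scratch`, `run_query`, and `drop_trace` are hypothetical names, not anything from tantivy; the point is just that each destructor runs synchronously on the owning thread at the closing brace, so the clean-up work is part of the measured query time.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a per-query buffer. It records its own
/// destruction in a shared log so the ordering can be observed.
struct Scratch<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Scratch<'_> {
    fn drop(&mut self) {
        // Runs inline on the owning thread, at the point the owner goes away.
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Hypothetical "query": creates two owned values in nested scopes.
fn run_query(log: &RefCell<Vec<String>>) {
    let _outer = Scratch { name: "outer", log };
    {
        let _inner = Scratch { name: "inner", log };
        log.borrow_mut().push("inner scope ends".to_string());
    } // `_inner` is destroyed here, before the next statement runs
    log.borrow_mut().push("query returns".to_string());
} // `_outer` is destroyed here, before control returns to the caller

/// Runs the query and returns the recorded sequence of events.
fn drop_trace() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    run_query(&log);
    log.into_inner()
}

fn main() {
    for line in drop_trace() {
        println!("{line}");
    }
}
```

The compiler decides where each `drop` call goes; nothing is marked for later and nothing is scanned at runtime.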

@mikemccand to your confusion -- those reference-counted objects are really smart pointers. Say you have x: Rc<Data>; x here is a reference (a valid pointer) to some value of type Data. When x goes out of scope, its destructor still runs, and its effect is to decrement the reference count and destroy the Data it references if and only if the count has reached 0. This is not GC :)

Maybe this helps: x only owns the reference (a machine pointer, 4 or 8 bytes); the Data that x refers to can be of arbitrary size for the purpose of this discussion.
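
A small sketch of the `Rc` behavior described above, using only the standard library. The `Data` struct and the two helper functions are hypothetical and exist only for this illustration.

```rust
use std::cell::Cell;
use std::mem::size_of;
use std::rc::Rc;

/// Hypothetical payload type. Its destructor flips a flag so we can see
/// exactly when the value behind the Rc is destroyed.
struct Data<'a> {
    destroyed: &'a Cell<bool>,
}

impl Drop for Data<'_> {
    fn drop(&mut self) {
        self.destroyed.set(true);
    }
}

/// Returns (count with two handles, count with one handle,
/// destroyed after dropping the first handle, destroyed after the last).
fn rc_lifecycle() -> (usize, usize, bool, bool) {
    let destroyed = Cell::new(false);
    let x = Rc::new(Data { destroyed: &destroyed });
    let y = Rc::clone(&x); // copies the pointer and bumps the count; Data is not copied
    let count_with_two = Rc::strong_count(&x);
    drop(y); // decrements the count; Data stays alive
    let count_with_one = Rc::strong_count(&x);
    let destroyed_after_first = destroyed.get();
    drop(x); // count reaches 0, so Data's destructor runs right here
    let destroyed_after_last = destroyed.get();
    (count_with_two, count_with_one, destroyed_after_first, destroyed_after_last)
}

/// Size of an Rc handle versus the size of a 4 KiB payload it could point to.
fn handle_and_payload_sizes() -> (usize, usize) {
    (size_of::<Rc<[u8; 4096]>>(), size_of::<[u8; 4096]>())
}

fn main() {
    let (two, one, before, after) = rc_lifecycle();
    println!("strong counts: {two} then {one}; destroyed: {before} then {after}");
    let (handle, payload) = handle_and_payload_sizes();
    println!("Rc handle: {handle} bytes, payload: {payload} bytes");
}
```

The count is only touched when a handle is cloned or dropped, so there is no background pass over the heap.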
