-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CompileAnalysis contains many redundant strings #547
Comments
Funny that you report now, I was profiling this a week ago and also saw the brutal amount of objects that Zinc was allocating. Aside from the strings I also saw a lot of name hashes created and other intermediary objects that could be removed. I don't know to which extent the protobuf deserialization logic is involved with the creation of these objects. I would expect that the majority of them are created from the bridge when compilation happens. I think the best way to solve this problem is via pooling (as we do in the compiler with term and type names).
I agree this is a superior solution than I wanted to take a shot at reducing the high consumption of name hashes during compilation this weekend (the other main source of allocation). Let me know if we're not stepping on each other's toes! I'm happy to defer these tasks to you if you're already looking into all of them. |
To be clear, I'm not blaming the deserialization logic: since most of the objects created there are short lived, they should get cleaned up quite easily. I was just editing there because that was the easiest place to introduce the interning. Rather, I'm concerned with the memory held by a So it's not about throughput, but rather about total memory usage. |
I could, but it would be a big protobuf schema change... is there appetite for that? |
I would welcome that change. So long as we can keep the old reader and and a new reader/writer for the same version, it's good for me (so that we can reuse previously generated analysis files). The problem we're trying to solve here is pretty fundamental so I see value in it. This would also reduce the size of the analysis file considerably. |
We are interested in this issue. Do you have any update and what is the preferred approach to fix it? Thanks. |
I have some staged changes to fix this, so I'll submit them soon. After the PR, any help regarding further profiling would be welcome. |
I was looking at the #1326 and looks like the intern optimization is gone. Shall first enable benchmark for analysis serialization/deserialization performance (#1400) and then add the optimization back if it does not affect performance (which I doubt it would). Nvm #1326 replaced intern optimization with a constant pool of strings in serialized format. |
While upgrading our repository from zinc
0.13.9
to1.1.7
, we experienced an increase in memory usage. After investigation, it was likely because during the upgrade we switched to using name hashing (by virtue of the previous implementation having been removed). We've found other solutions to this increase, so it isn't currently a blocker.But while inspecting heap dumps during the investigation, I noticed that the object graph below CompileAnalysis contains highly redundant strings. For one module, we saw 2708 distinct occurrences of the string
"java"
(representing ~130KB on the heap), 789 instances of"equals"
, 287 instances of"findByName"
, etc.The distribution of occurrences and their redundancy for that module was:
(ie, there were 17 distinct strings that occurred between 512 and 1024 times in this module alone)
I started in on a patch to intern strings during protobuf deserialization, which I expect would significantly reduce this redundancy.
But while doing that it occurred to me that a more efficient and typesafe solution would be to change the protobuf schema to store the interned string values once (ie, by replacing
string
withmessage InternKey {int32}
). This would be similar to the Java classfile constant pool.The text was updated successfully, but these errors were encountered: