Optimize performance of string interning #7703

sync-by-unito · 2024-05-17T08:36:42Z

Performance of string interning needs to be optimized. Specifically the current implementation needs to load too much state on initialization. Instead we need to build a fast searchable structure in the file. This is in line with the design doc.

sync-by-unito · 2024-05-17T10:01:16Z

➤ PM Bot commented:

Jira ticket: RCORE-2125

sync-by-unito · 2024-08-01T13:14:38Z

➤ nicola-cab commented:

[~[email protected]] it seems we are slower in several cases compared to master, even when we use string IDs. I think this is known, I am just leaving a comment in this Jira, since I believe we are down to implementing the trimming logic and reviewing the string interner.
In particular debugging and testing the code this has come up:

We are grabbing a lock every time we are accessing the string interner, this in my perf measurements accounts for roughly a degradation between 15% and 30% ... Especially for sorting stuff, the constant burden of taking a lock every time we compare 2 strings is killing our perfs. If we don't make lock free the interner, we should consider having some sort of API that allows us to take the lock once for the entire duration of the sorting, or alternatively get an immutable copy of the DS that stores the symbols.
Another thing I have noticed is that, when we load the compressed string view, we end up very often inside

StringInterner::load_leaf_if_needed

... which is again loading the entire state from scratch. I wonder if it is possible to avoid that somehow.

Considering these limitations: We are still better than current master for sorting a lot of dups strings with that are very long.

#7892 (comment)

This should apply to find_first too, but in this case the impacting of locking should be less evident, since we should be taking a lock less often.

sync-by-unito bot assigned finnschiermer May 17, 2024

sync-by-unito bot unassigned finnschiermer Jun 24, 2024

sync-by-unito bot closed this as completed Aug 14, 2024

github-actions bot locked as resolved and limited conversation to collaborators Sep 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize performance of string interning #7703

Optimize performance of string interning #7703

sync-by-unito bot commented May 17, 2024

sync-by-unito bot commented May 17, 2024

sync-by-unito bot commented Aug 1, 2024

Optimize performance of string interning #7703

Optimize performance of string interning #7703

Comments

sync-by-unito bot commented May 17, 2024

sync-by-unito bot commented May 17, 2024

sync-by-unito bot commented Aug 1, 2024