Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize performance of string interning #7703

Closed
sync-by-unito bot opened this issue May 17, 2024 · 2 comments
Closed

Optimize performance of string interning #7703

sync-by-unito bot opened this issue May 17, 2024 · 2 comments

Comments

@sync-by-unito
Copy link

sync-by-unito bot commented May 17, 2024

Performance of string interning needs to be optimized. Specifically the current implementation needs to load too much state on initialization. Instead we need to build a fast searchable structure in the file. This is in line with the design doc.

Copy link
Author

sync-by-unito bot commented May 17, 2024

➤ PM Bot commented:

Jira ticket: RCORE-2125

Copy link
Author

sync-by-unito bot commented Aug 1, 2024

➤ nicola-cab commented:

[~[email protected]] it seems we are slower in several cases compared to master, even when we use string IDs. I think this is known, I am just leaving a comment in this Jira, since I believe we are down to implementing the trimming logic and reviewing the string interner. 
In particular debugging and testing the code this has come up:

  • We are grabbing a lock every time we are accessing the string interner, this in my perf measurements accounts for roughly a degradation  between 15% and 30% ...  Especially for sorting stuff, the constant burden of taking a lock every time we compare 2 strings is killing our perfs. If we don't make lock free the interner, we should consider having some sort of API that allows us to take the lock once for the entire duration of the sorting, or alternatively get an immutable copy of the DS that stores the symbols. 

  • Another thing I have noticed is that, when we load the compressed string view, we end up very often inside

StringInterner::load_leaf_if_needed 

... which is again loading the entire state from scratch. I wonder if it is possible to avoid that somehow.

Considering these limitations: We are still better than current master for sorting a lot of dups strings with that are very long. 

#7892 (comment)

This should apply to find_first too, but in this case the impacting of locking should be less evident, since we should be taking a lock less often.

@sync-by-unito sync-by-unito bot closed this as completed Aug 14, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant