Use separate value stores for identifiers and string literals #4106

chandlerc · 2024-07-03T01:52:51Z

This undoes a previous change that unified them, which I think was made on my advice. =[ Sorry about that; I think I was just wrong.

Specifically, I think I had suggested that it would be more efficient to have a single shared hashtable of strings. The more I look at profiles of the toolchain, the less likely that seems. For identifiers and string literals in particular, it seems especially problematic.

Using a single, joint hashtable is likely a good idea when all of the different querying code paths are equally likely, the strings follow the same distribution of sizes, and either there is no clustering of access to different sets of strings or none of the sets are meaningfully small enough to fit into a lower level of resident cache.

I think essentially none of these predicates actually hold for identifiers vs. string literals:

- Identifiers are *much* more hot.
- They have wildly different size distributions.
- The access patterns are very clustered.
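To make the split concrete, here is a minimal sketch of what two separate canonicalizing stores could look like. It is not the toolchain's actual code: `CanonicalStringStore`, `IdentifierId`, `StringLiteralId`, and `CompileStores` are hypothetical names, and the standard-library containers stand in for whatever hashtable the toolchain really uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical ID wrappers. Distinct types keep an identifier ID from being
// used to index the string literal store by accident.
struct IdentifierId {
  int32_t index;
};
struct StringLiteralId {
  int32_t index;
};

// A sketch of a canonicalizing store: each distinct string is stored once and
// repeated adds return the same ID. One template serves both ID types.
template <typename IdT>
class CanonicalStringStore {
 public:
  auto Add(std::string_view value) -> IdT {
    if (auto it = index_.find(value); it != index_.end()) {
      return IdT{it->second};
    }
    auto index = static_cast<int32_t>(values_.size());
    // std::deque never moves existing elements on push_back/emplace_back, so
    // the string_view key below stays valid as the store grows.
    const std::string& stored = values_.emplace_back(value);
    index_.emplace(std::string_view(stored), index);
    return IdT{index};
  }

  auto Get(IdT id) const -> std::string_view {
    assert(id.index >= 0 &&
           static_cast<std::size_t>(id.index) < values_.size());
    return values_[static_cast<std::size_t>(id.index)];
  }

  auto size() const -> std::size_t { return values_.size(); }

 private:
  std::deque<std::string> values_;
  std::unordered_map<std::string_view, int32_t> index_;
};

// Separate instances: the small, hot identifier table is not diluted by
// large, rarely queried string literals.
struct CompileStores {
  CanonicalStringStore<IdentifierId> identifiers;
  CanonicalStringStore<StringLiteralId> string_literals;
};
```

With this shape, the same text added to both stores gets an independent ID in each, and lookups on the identifier path only ever touch the identifier table.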

Sorry for the misleading advice on that one.

While splitting them, I've worked to simplify the code a bit by building a way for the canonical value stores that hold `StringRef`s to not require specializations, so we get a pretty large code cleanup in the process here.


jonmeow

Nice! The RefType code is a particularly good cleanup.

chandlerc requested a review from jonmeow July 3, 2024 01:52

github-actions bot requested a review from josh11b July 3, 2024 01:53

github-actions bot added the toolchain label Jul 3, 2024

chandlerc mentioned this pull request Jul 3, 2024

Reserve memory for the identifiers hashtable. #4107

Merged

jonmeow approved these changes Jul 3, 2024


jonmeow added this pull request to the merge queue Jul 3, 2024

Merged via the queue into carbon-language:trunk with commit e71e6ca Jul 3, 2024
9 checks passed
