Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use separate value stores for identifiers and string literals #4106

Merged
merged 1 commit into from
Jul 3, 2024

Conversation

chandlerc
Copy link
Contributor

This undoes a previous change to unify them, and I think at my advice. =[ Sorry about that, I think I was just wrong.

Specifically, I think I had suggested that it would be more efficient to have a single shared hashtable of strings. The more I look at profiles of the toolchain, the less likely that seems. Specifically for identifiers and string literals it seems especially problematic.

Using a single, joint hashtable is likely a good idea when all of the different querying code paths are equally likely, the strings follow the same distribution of sizes, and either there is no clustering of access to different sets of strings or none of the sets are meaningfully small enough to fit into a lower level of resident cache.

I think essentially none of these predicates actually hold for identifiers vs. string literals:

  • Identifiers are much more hot
  • They have wildly different size distributions.
  • The access patterns are very clustered

Sorry for the misleading advice on that one.

While splitting them, I've worked to simplify the code a bit by building a way to have the StringRef holding canonical value stores not require specializations, and so we get a pretty large code cleanup in the process here.

This undoes a previous change to unify them, and I think at my advice.
=[ Sorry about that, I think I was just wrong.

Specifically, I think I had suggested that it would be more efficient to
have a single shared hashtable of strings. The more I look at profiles
of the toolchain, the less likely that seems. Specifically for
identifiers and string literals it seems especially problematic.

Using a single, joint hashtable is likely a good idea when all of the
different querying code paths are equally likely, the strings follow the
same distribution of sizes, and either there is no clustering of access
to different sets of strings or none of the sets are meaningfully small
enough to fit into a lower level of resident cache.

I think essentially none of these predicates actually hold for
identifiers vs. string literals:
- Identifiers are *much* more hot
- They have wildly different size distributions.
- The access patterns are very clustered

Sorry for the misleading advice on that one.

While splitting them, I've worked to simplify the code a bit by building
a way to have the `StringRef` holding canonical value stores not require
specializations, and so we get a pretty large code cleanup in the
process here.
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! The RefType code is a particularly good cleanup.

@jonmeow jonmeow added this pull request to the merge queue Jul 3, 2024
Merged via the queue into carbon-language:trunk with commit e71e6ca Jul 3, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants