Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String interning #5121

Merged
merged 6 commits into from
Mar 7, 2018
Merged

String interning #5121

merged 6 commits into from
Mar 7, 2018

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Mar 5, 2018

This builds on the work from #5118. This interns the strings in the part of resolver that gets cloned a lot.

In a test on #4810 (comment)
Before we got to 1700000 ticks in ~(63 to 67) sec from #5118
After we got to 1700000 ticks in ~(42 to 45) sec

The interning code itself would be much better with a leak function that converts a String to a &'static str. Something like:

pub fn leek(s: String) -> &'static str {
    let ptr = s.as_ptr();
    let len = s.len();
    mem::forget(s);
    unsafe {
        let slice = slice::from_raw_parts(ptr, len);
        str::from_utf8(slice).unwrap()
    }
}

but "there is no unsafe in Cargo", and I am not the best at unsafe. So I just to_string and lived with the extra copy. Is there a better way to hand out references?

I assumed that InternedString::new world start appearing in profile result, and that we would want PackageId, and Summary, Et Al. to store the InternedString. That is why I put the interner in a shared folder. So far it is just used in the resolver. It may make sense for a lot more of the Strings to be interned, but with the extra copy... I have not explored it yet.

@rust-highfive
Copy link

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@matklad
Copy link
Member

matklad commented Mar 5, 2018

Mostly a drive-by comment, but I wonder if we should use strings for features at all... A natural idea is to enumerate all features in a package, and then use a number as a feature identifier. That way, a set of features could be stored as a bitset and fit into a single u64 most of the time....

@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 5, 2018

@matklad So I see one small problem with that, If I require winapi = {version = "0.3", features = ["ntdef"]} how do I store the ntdef before we have picked a version of winapi? Maybe different versions of winapi store it as different numbers.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The wins here are quite nice :)

Can you confirm that this doesn't regress the Servo use case much? In the sense of "big pre-resolved dependency graph still executes quickly".

Other than that I'm mildly worried about the memory leaks here. Cargo is often used as a library and I could imagine that if you're doing lots of resolutions of lots of crates the memory could get a little out of hand here over time. I wonder if instead, though, we could have a scheme like:

  • There's an Interner structure which is used to get/set strings
  • Each Config stores an Interner
  • There's a scoped thread local Interner set at the start of each Cargo thread, maybe by Config?

It'd be a bit trickier to get the interner to other threads in Cargo's backend but I think it's plausible in the long run even with an Interner struct. For now I think this'd look like a scoped thread local during resolve (aka destroyed entirely after resolve is done).

Does that make sense though? Or do you think the static/leak strategy would still be prefererable?

RwLock::new((Vec::new(), HashMap::new()));
}

#[derive(Ord, PartialOrd, Eq, PartialEq, Hash, Clone, Copy)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the Ord and PartialOrd references be left out here? I think b/c they're comparing strings rather than contents it may end up being confusing down the road.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 5, 2018

servo>cargo generate-lockfile -Z no-index-update Still runs in ~34 sec most of that in many Updating git repository `https://github.com/servo/...`

I don't really know what I think about having a Interner structure. I don't look forward to trying to thread an argument to the Interner to every place we want to make an InternedString whatever scope we decide should own the Interner. I also don't like having to think about if an InternedString managed to outlive its backing Interner, nor if an InternedString is being used with the correct Interner. On the other hand, the static/leak strategy... well... it leaks witch is not grate, and it is spooky action at a distance. So I see the advantage in "not that".

The current implementation only uses InternedString in the resolver. So the conservative thing to do is to just make the Interner an argument to activate_deps_loop. But, most of our new InternedString come from InternedString::new(x.package_id().name()) so it would be nice for PackageId to already know the InternedString of its name, not that it shows up in profiles.

If we make Interner a part of Config would that help for projects using cargo as a library? Do they create a new Config on a regular basis? Can we be sure that they won't use a result from one config with a different config?

I did not understand your suggestion about scoped thread local. Sorry.
If you are suggesting using thread_local! instead of lazy_static!, dose that help with projects using cargo as a library? Do they create a new thread on a regular basis?

So overall I don't know.

@alexcrichton
Copy link
Member

Ok good to know on the resolve time! Sorry I think I wasn't clear before, but on second thought it's probably not worth it to have destruction here and it's probably best to just have this be a global. We can always work with it later and tweak it as necessary.

If it's global though we can for sure never delete it, so want to go ahead and add a Deref implementation? That should be safe as the strings basically live for the static lifetime anyway.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 6, 2018

Just to confirm you are leaning toward using the global/leek strategy, and you are asking me to add code that uses unsafe to convert an InternedString -> &'static str?

Next time I get a chance I will play with some of the options, and put links hear for us to compare.
Specifically, Adding a Interner owned by activate_deps. It may not be as intrusive as I feare.

@alexcrichton
Copy link
Member

Ah yeah I'm coming around to a global interner which leaks everything you add to it. I do like the idea of InternedString though rather than &'static str because it makes equality testing an integer comparison rather than a full string comparison.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 6, 2018

Note: that all the performance number should be rethought in lite of #5132 If we are cloning less often than improvements to performance of cloning will matter less.

I assume that if two &strs have the same ptr and len then the contents are never checked. But a dedicated InternedString also is fast for !eq tests.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 6, 2018

Pre #5132:
1700000 ticks in ~(63 to 67) sec from #5118 0r 26k ticks/sec
With #5132, still without this pr:
5000000, ticks in ~72 sec, 0r 69k ticks/sec
So that Is a lot better! And the profile is still mostly Cloning activations.
Next to rebase this pr :-)

@alexcrichton
Copy link
Member

Holy cow, nice!

In a test on rust-lang#4810 (comment)
Before we got to 5000000 ticks in ~72 sec
After we got to 5000000 ticks in ~65 sec
In a test on rust-lang#4810 (comment)
Before we got to 5000000 ticks in ~65 sec
After we got to 5000000 ticks in ~52 sec
@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 6, 2018

A nice to you. You are the one that spotted and identified the problem! I didn't notice that the clones I was optimizing where not supposed to be there.

I rebased and clean up this PR. Remarkably, it still pulls its weight!
After this pr, 5000000 ticks in ~52 sec, 0r 96k ticks/sec

Next to try a version with unsafe and a separate version with an Interner so we can decide how to proceed. :-)

@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 6, 2018

I have worked on the scoped/Interner version. The ergonomics of passing the Interner to all the places we need to convert from a &str are abysmal. For example RemainingCandidates.next neads to compre the links attribute of a summary to a hashset of InternedStrings, but summarys out live the Interner so they have to return strings, I.E. RemainingCandidates.next needs to have a Interner argument.

I have threaded it thru to some of the places that need it, and am so bored I just can't finish this experiment. and you can see the result at Eh2406@0bdbc08. Now that it is done it is not so bad. :-)

@alexcrichton
Copy link
Member

Thanks for trying that out! I think it's clear the lazy_static approach will do fine for now. If you additionally don't see lock contention in the profile then I think it's for sure a fine approach.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 7, 2018

The unsafe version is a very small change:
Eh2406@26bfc7d

However the fact that this can hand out &str at 0 cost means we can use it in the things resolver talk to, so next let us see if that can make a difference. Most of the time is in clone so probably not, but let's give it a try. Yes, we can make a difference:
Eh2406@411355a

Witch brings us to: 5000000 ticks in ~48 sec, 0r 104k ticks/sec
If we go this route, I should try Interning all the things related to features, but that was too big for this test.

@alexcrichton
Copy link
Member

Ok sounds good to me! And to be clear it's not that we want no unsafe code in Cargo we just want very little :). Cargo itself has a fair bit of transitive unsafe code

@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 7, 2018

I tried interning the features:
Eh2406@6360f25
But despite all the changes that involved it was not measurably faster.

If we want the speed of Eh2406@411355a and dont want the unsafe we can add a interned_name fealed to Dependency so it can hand out both.

One way or the other, I am feeling ready to move onto the next avenue of attack.

@alexcrichton
Copy link
Member

Ah yeah I'm fine incrementally improving the situation here rather than trying to do it all at once. Did you want to put some more changes in this PR and then I can r+?

use std::mem;
use std::cmp::Ordering;

pub fn leek(s: String) -> &'static str {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing this may actually want to do is call s.into_boxed_str() as that'll call shink_to_fit and make sure we're not hanging on to any lingering bytes, but that can happen in a follow-up!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    let boxed = s.into_boxed_str();
    let ptr = boxed.as_ptr();
    let len = boxed.len();
    mem::forget(boxed);

Yes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed!

Eventually this could even use Box::leak :)

cache.insert(s);
InternedString { ptr: s.as_ptr(), len: s.len() }
}
pub fn to_inner(&self) -> &'static str {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be listed as an implementation of Deref to get all the fun methods on strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thou they do not hash the same. I can use Deref instead of to_inner if you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah I wouldn't worry too much about that

@alexcrichton
Copy link
Member

Looks great! I'd like to get the Deref change in before merging but otherwise looks good to go!

@alexcrichton
Copy link
Member

Er actually, there's also a bunch of test failures on Windows at least :(

@Eh2406
Copy link
Contributor Author

Eh2406 commented Mar 7, 2018

I removed the last optimization as I don't know how it was causing the test failures. And added your suggestions. I think let us merge this, and I'll rework the last commit as a separate PR.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Mar 7, 2018

📌 Commit 98480e8 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Mar 7, 2018

⌛ Testing commit 98480e8 with merge 8fde4e3...

bors added a commit that referenced this pull request Mar 7, 2018
String interning

This builds on the work from #5118. This interns the strings in the part of resolver that gets cloned a lot.

In a test on #4810 (comment)
Before we got to 1700000 ticks in ~(63 to 67) sec from #5118
After we got to 1700000 ticks in ~(42 to 45) sec

The interning code itself would be much better with a `leak` function that converts a `String` to a `&'static str`. Something like:
```rust
pub fn leek(s: String) -> &'static str {
    let ptr = s.as_ptr();
    let len = s.len();
    mem::forget(s);
    unsafe {
        let slice = slice::from_raw_parts(ptr, len);
        str::from_utf8(slice).unwrap()
    }
}
```
but "there is no unsafe in Cargo", and I am not the best at unsafe. So I just `to_string` and lived with the extra copy. Is there a better way to hand out references?

I assumed that `InternedString::new` world start appearing in profile result, and that we would want `PackageId`, and `Summary`, Et Al. to store the `InternedString`. That is why I put the interner in a shared folder. So far it is just used in the resolver. It may make sense for a lot more of the Strings to be interned, but with the extra copy... I have not explored it yet.
@bors
Copy link
Contributor

bors commented Mar 7, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 8fde4e3 to master...

@bors bors merged commit 98480e8 into rust-lang:master Mar 7, 2018
@Eh2406 Eh2406 deleted the string_interning branch March 8, 2018 03:32
bors added a commit that referenced this pull request Mar 8, 2018
restructure `Activations` for better clone

This builds on the work in #5121 When we last met we had:
5000000 ticks in ~48 sec, 0r 104k ticks/sec
This small change brings us to:
5000000 ticks in ~21 sec, 0r 238k ticks/sec
Edit: sorry for the large diff only the last commit is new. The rest are from #5121
@Eh2406 Eh2406 mentioned this pull request Mar 8, 2018
bors added a commit that referenced this pull request Mar 8, 2018
More interning

This is a small part of the unsuccessful last commit of #5121, this part removes `InternedString::new` from the innerest of loops.

This is mostly a resubmission of #5147, that I accidentally deleted while bors was testing. This one has new commits, so github will take the resubition.
@Eh2406 Eh2406 mentioned this pull request Sep 17, 2019
@ehuss ehuss added this to the 1.26.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants