Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite cycle in depgraph resolution #4810

Closed
vorner opened this issue Dec 12, 2017 · 8 comments
Closed

Infinite cycle in depgraph resolution #4810

vorner opened this issue Dec 12, 2017 · 8 comments

Comments

@vorner
Copy link

vorner commented Dec 12, 2017

Hello

When I check out and try to build this repository, cargo gets stuck in this:

$ cargo check --verbose
    Updating registry `https://github.com/rust-lang/crates.io-index`
   Resolving dependency graph...

I tried to look at what it was doing and setting RUST_LOG=debug, it seems it is in an infinite cycle, because these same messages keep repeating forever.

DEBUG:cargo::core::registry: load/precise  registry `https://github.com/rust-lang/crates.io-index`
DEBUG:cargo::core::registry: load/precise  registry `https://github.com/rust-lang/crates.io-index`
DEBUG:cargo::core::registry: load/precise  registry `https://github.com/rust-lang/crates.io-index`
DEBUG:cargo::core::resolver: checking if unicode-xid v0.0.4 is already activated
DEBUG:cargo::core::resolver: checking if synom v0.11.3 is already activated
DEBUG:cargo::core::resolver: checking if quote v0.3.15 is already activated
DEBUG:cargo::core::resolver: checking if syn v0.11.11 is already activated

I was pointed to #4806, but I don't think it is exactly that, since that one mentions pushing forever to a vector and the memory consumption of the process doesn't grow at all. But I didn't get the fix yet from rustup:

$ cargo --version
cargo 0.25.0-nightly (930f9d949 2017-12-05)

This is also happening with beta and stable.

If I comment either slog-json or version-sync from the dev-dependencies, run cargo check, then it works. Afterwards, when I already have Cargo.lock present, I can uncomment it and it continues to work. When I remove Cargo.lock, the problem returns.

@alexcrichton
Copy link
Member

This is most likely to be #4066, but would have to investigate more to see what's going on.

@vorner
Copy link
Author

vorner commented Dec 13, 2017

I'm not sure. That one seems to talk about searching a huge space of options and taking a long time, but getting it done eventually. This one, at least according to the log messages, makes no forward progress and has a true cycle.

I'll try letting this run overnight to see if it gets done or not.

@Marwes
Copy link

Marwes commented Dec 13, 2017

Also getting this bug for this repo https://github.com/paritytech/jsonrpc

@vorner
Copy link
Author

vorner commented Dec 14, 2017

Still running…

@kivikakk
Copy link
Contributor

I'm getting this issue on stable (1.23.0) and nightly (1.25.0-nightly 51b0b3734 2018-01-12) with the following Cargo.toml dependencies:

[dependencies]
piston_window = "0.35.0"
piston2d-graphics = "0.24.0"
pistoncore-glutin_window = "0.43.0"
piston2d-opengl_graphics = "0.50.0"

I've just tried installing 1.20.0 and it's happening there too. (It looks like it's taking forever on Updating registry […], but RUST_LOG=debug shows it's looping in the resolver.)

@alexcrichton
Copy link
Member

@kivikakk you're running into #4066

Eh2406 added a commit to Eh2406/cargo that referenced this issue Mar 2, 2018
In a test on rust-lang#4810 (comment)
Before we got to 1700000 ticks in ~97 sec
After we got to 1700000 ticks in ~92 sec
And query disappears from the flame graph
bors added a commit that referenced this issue Mar 3, 2018
Cache the query result.

Small performance gain.

In a test on #4810 (comment)
Before we got to 1700000 ticks in ~97 sec
After we got to 1700000 ticks in ~92 sec. I just reran and got ~82, so it seems to be unstable.
And query disappears from the flame graph
Eh2406 added a commit to Eh2406/cargo that referenced this issue Mar 4, 2018
In a test on rust-lang#4810 (comment)
Before we got to 1700000 ticks in ~(82 to 97) sec
After we got to 1700000 ticks in ~(63 to 67) sec
bors added a commit that referenced this issue Mar 4, 2018
use more Rc in the part of resolver that gets cloned a lot

In a test on #4810 (comment)
Before we got to 1700000 ticks in ~(82 to 97) sec
After we got to 1700000 ticks in ~(63 to 67) sec
Eh2406 added a commit to Eh2406/cargo that referenced this issue Mar 6, 2018
In a test on rust-lang#4810 (comment)
Before we got to 5000000 ticks in ~72 sec
After we got to 5000000 ticks in ~65 sec
Eh2406 added a commit to Eh2406/cargo that referenced this issue Mar 6, 2018
In a test on rust-lang#4810 (comment)
Before we got to 5000000 ticks in ~65 sec
After we got to 5000000 ticks in ~52 sec
bors added a commit that referenced this issue Mar 7, 2018
String interning

This builds on the work from #5118. This interns the strings in the part of resolver that gets cloned a lot.

In a test on #4810 (comment)
Before we got to 1700000 ticks in ~(63 to 67) sec from #5118
After we got to 1700000 ticks in ~(42 to 45) sec

The interning code itself would be much better with a `leak` function that converts a `String` to a `&'static str`. Something like:
```rust
pub fn leek(s: String) -> &'static str {
    let ptr = s.as_ptr();
    let len = s.len();
    mem::forget(s);
    unsafe {
        let slice = slice::from_raw_parts(ptr, len);
        str::from_utf8(slice).unwrap()
    }
}
```
but "there is no unsafe in Cargo", and I am not the best at unsafe. So I just `to_string` and lived with the extra copy. Is there a better way to hand out references?

I assumed that `InternedString::new` world start appearing in profile result, and that we would want `PackageId`, and `Summary`, Et Al. to store the `InternedString`. That is why I put the interner in a shared folder. So far it is just used in the resolver. It may make sense for a lot more of the Strings to be interned, but with the extra copy... I have not explored it yet.
@Eh2406 Eh2406 mentioned this issue Mar 9, 2018
bors added a commit that referenced this issue Mar 9, 2018
More interning

This is a forward approach to interning. Specifically `Dependency` and `PackageId` store their names as `InternedString`s and leave that value interned as long as possible. The alternative is to make a new `interned_name` function. The advantage of this approach is that a number of places in the code are doing `deb.name() == pid.name()` and are now using the fast pointer compare instead of the string compare, without the code needing to change. The disadvantage is that lots of places need to call `deref` with `&*` to convert to an `&str` and sum need to use `.to_inner()` to get a `&'static str`.

In a test on #4810 (comment)
Before we got to 10000000 ticks in ~48 sec
After we got to 10000000 ticks in ~44 sec
Eh2406 added a commit to Eh2406/cargo that referenced this issue Mar 23, 2018
bors added a commit that referenced this issue Mar 24, 2018
Faster resolver: use a inverse-index to not activate the causes of conflict

This adds a test for #4810 (comment) with two extensions that make it harder. It is the last reproducible and in the wild exponentially slow resolution (that I have found).

The problem in the test is `backtrack_trap0 = "*"` is a lot of ways of saying `constrained = ">=1.1.0, <=2.0.0"` but `constrained= "2.0.1"` is already picked. Only then to try and solve `constrained= "~1.0.0"` which is incompatible. Our parent knows that we have been poisoned, and wont try to activate us again.  Because of the order we evaluate deps we end up backtracking to where `constrained: 1.1.0` is set instead of our parent. And so the poisoning does not help. This is harder then #4810 (comment) because:

1. Having multiple semver compatible versions of constrained in play makes for a lot more bookkeeping. Specifically bookkeeping I forgot when I first submitted this PR.
2. The problematic dependencies are added deep in a combinatorial explosion of possibilities. So if we don't correctly handle caching that `backtrack_trap0 = "*"` is doomed then we will never finish looking thru the different possibilities for `level0 = "*"`

This PR also includes a proof of concept solution for the test, which proves that it does solve #4810 (comment). The added code is tricky to read. It also adds a `O(remaining_deps)` job to every activation on the happy path, slower if the `past_conflicting_activations` is not empty.

I'd like some brainstorming on better solutions.
bors added a commit that referenced this issue Apr 6, 2018
use more Rc in the part of resolver that gets cloned a lot 2

This is the same idea as #5118, I was looking at a profile and noted that ~5% of our time was sent dropping `HashMap<PackageId, HashSet<InternedString>>`. A quick rg and I found the culprit, we are cloning the set of features for every new `Context`. With some Rc we are now just cloning for each time we insert.

To benchmark I commented out line https://github.com/rust-lang/cargo/blob/b9aa315158fe4d8d63672a49200401922ef7385d/src/cargo/core/resolver/mod.rs#L453
the smallest change to get #4810 (comment) not to solve instantly.
Before 17000000 ticks, 90s, 188.889 ticks/ms
After 17000000 ticks, 73s, 232.877 ticks/ms
@Eh2406
Copy link
Contributor

Eh2406 commented Jul 26, 2018

Move to close? do to #5213.

@alexcrichton
Copy link
Member

Definitely!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants