Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG fuzzing found a bug in the resolver, we need a complete set of conflicts to do backjumping #5988

Merged
merged 22 commits into from
Sep 17, 2018

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Sep 6, 2018

As mentioned in #5921 (comment), the new proptest found a live bug! This PR so far tracs my attempt to minimize the problematic input.

The problem turned out to be that we where backjumping on incomplete set of conflicts.

@rust-highfive
Copy link

r? @matklad

(rust_highfive has picked a reviewer for you, use r? to override)

@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 7, 2018

Edit: I was wrong about what can fix the problem.

I came across @alexcrichton's prescient comment #5187 (comment). This is exactly the kind of bug he was worried about. Furthermore, it was found with the randomized/quickcheck testing I suggested would make me feel better.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 7, 2018

commenting out

if frame
.context_backup
.is_conflicting(Some(parent.package_id()), conflicting_activations)
{
continue;
}
will also get the test to pass.

This is a brain dump of all the things I can think of that are relevant, as I don't know how they fit together.

This is code that prunes the remaining_candidates stack to only try candidates that can solve the problem that triggered backtracking. (originally added in #4834, rewritten in #5037. maybe confirm by testing if this bug reproduces all the way back then.) So it must be skipping over something that in fact could solve the problem.

I tried running the test with trace logging and I got (edited to the relevant parts):

after yank
resolver: initial activation: root v1.0.0
resolver: root[0]>g trying 1.0.0
resolver: g[0]>b trying 1.0.0
resolver: g[1]>e trying 1.1.0
resolver: e[0]>c trying 1.1.0
resolver: b[0]>a trying 1.1.0
resolver: g[2]>f trying 1.1.0
resolver: f[0]>d trying 1.0.0
resolver: d[0]>c -- no candidates
resolver::conflict_cache: c adding a skip {PackageId { name: "c", version: "1.1.0"}: Semver}
resolver: g[2]>f trying 1.0.0
resolver: f[0]>to_yank -- no candidates
resolver::conflict_cache: to_yank adding a skip {}
-- this is ware the log ends with the line uncommented
resolver: b[0]>a trying 1.0.0
resolver: g[2]>f trying 1.1.0
resolver: f[0]>d trying 1.0.0
resolver: f[0]>d skipping 1.0.0
resolver: f[0]>d -- no candidates
resolver::conflict_cache: d adding a skip {PackageId { name: "c", version: "1.1.0"}: Semver}
resolver: g[2]>f trying 1.0.0
resolver: g[2]>f skipping 1.0.0
resolver: g[2]>f -- no candidates
resolver: g[1]>e trying 1.0.0
resolver: g[2]>f trying 1.1.0
resolver: f[0]>d trying 1.0.0
resolver: d[0]>c trying 1.0.0
resolver: b[0]>a trying 1.1.0

So this is how I read that:

  • e trying 1.1.0 leads to c trying 1.1.0
  • f trying 1.1.0 leads to d trying 1.0.0 leads to dep_req("c", "=1.0") does not work.
  • backtrack to f trying 1.0.0 leads to dep("to_yank") does not work. With a conflicting_activations == {} i.e. there is nothing that can make dep("to_yank") work.
  • So now we start pruning the remaining_candidates stack for something that will make us work or removes our parent. What will make f 1.0.0 work? Nothing. Our parent is g so we skip all the way back. Skipping right past the thing that would make a nephew (descendant of a sibling) work.

Why don't we keep track of more of the conflicting_activations like info from our siblings and their descendants? One, it is often faster to prove something is wrong by trying it then by rigorous argument" so we tend to YOLO and only get smarter on subsequent attempts (see the skipping later on in the log for example of this working). Two, we aggressively do not add things to the remaining_candidates meening that when backtracking we dont have a good list of what we are backing out.

Why don't we just disable this optimization for now? Well several of the tests hang if we leave it commented out.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 12, 2018

I found an odd hack that gets this one to pass. I am going back to the proptest branch to find a new case to shrink. Mabey looking at another example will help me to see a solution.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 14, 2018

I shrank 2 more examples. I think the dont_yet_know_the_problem_3, may be small eanough to be worth digging into. I will enable logging and see what it is doing.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 14, 2018

Expanding the odd hack got all 3 to pass! I think this is worth merging. Thow the next thing on my todo list is merging this back into my proptest branch and seeing if it finds more bugs.

@Eh2406 Eh2406 changed the title WIP BUG fuzzing found a bug in the resolver, now how to fix it. BUG fuzzing found a bug in the resolver, we need a complete set of conflicts to do backjumping Sep 14, 2018
@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 15, 2018

The merged version of the proptest branch ran smoothly, and the performance related resolver test are not strongly affected. So I think this is ready for review.

@alexcrichton
Copy link
Member

@bors: r+

Looks great to me, thanks so much for tracking this down!

@bors
Copy link
Contributor

bors commented Sep 17, 2018

📌 Commit 9b2295d has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Sep 17, 2018

⌛ Testing commit 9b2295d with merge f2b861dd23abb6ea7f25e7cd46ac4a508bbeec38...

@bors
Copy link
Contributor

bors commented Sep 17, 2018

💔 Test failed - status-appveyor

@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 17, 2018

termcolor is not building with minimal-versions why are we seeing this hear? I did not touch the "toml"?

@ehuss
Copy link
Contributor

ehuss commented Sep 17, 2018

I've been trying to figure out why things changed, but I don't see a good reason. The differences are:

cargo
    home 0.3.2 downgraded to 0.3.0

home:
    scopeguard 0.3.1 -> 0.1.2
    winapi 0.3.4 -> 0.2.8
        +kernel32-sys 0.2.2
        +advapi32-sys 0.2.0

termcolor:
    wincolor: 0.1.5 -> 0.1.1

wincolor 0.1.1:
    winapi 0.3.4 -> 0.2.8
        +kernel32-sys 0.2.2

+advapi32-sys:
    +winapi 0.2.8
    +winapi-build 0.1.1

atty:
    winapi 0.2.7 -> 0.2.8

backtrace:
    winapi 0.2.7 -> 0.2.8

curl:
    winapi 0.2.7 -> 0.2.8

dbghelp-sys:
    winapi 0.2.7 -> 0.2.8

fs2:
    winapi 0.2.7 -> 0.2.8

kernel32-sys
    winapi 0.2.7 -> 0.2.8

same-file
    winapi 0.2.7 -> 0.2.8

userenv-sys
    winapi 0.2.7 -> 0.2.8

The only thing that's changed recently is that termcolor released a 1.0.4 version 4 days ago, but tests were passing yesterday, and a newer version shouldn't affect things. I don't see why the home package suddenly started using an older version (or why it wasn't using that older version in the first place).

@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 17, 2018

We sort dependencies from most constrained to least. So it is possible that the number of unused versions of a crate can change the resolution.

@ehuss
Copy link
Contributor

ehuss commented Sep 17, 2018

I bisected the index and the publishing of curl 0.4.16 is what has made the resolve change. I imagine you're right, that it shifted some of the heuristics.

Well, it seems clear that termcolor 0.3.0's dependency on wincolor 0.1.1 doesn't satisfy the minimal dependencies. How do we typically fix that?

@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 17, 2018

So I don't think we have been doing this for long enough to have a "typically" yet. For the case of trying to get it setup we have some advice #5657 (comment) But this is the first time we have hit this form.

I'd try:

  1. Can we bump the patch version of one of our direct affected dependencies to fix the problem?
  2. One of the other direct dependencies?
  3. Are any of the affected dependencies interested in a PR to work with minimal-versions? (termcolor is a BurntSushi project so yes.)
  4. Adding a synthetic dependency to get things working?

I am working on a complicated patch to try #5921 (comment) but can jump in a little while.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Sep 17, 2018

I could not reproduce on the nightly I was using, but rustup update and I could repro. So that is odd.

> cargo tree -i -p termcolor:0.3.0
    Updating crates.io index
termcolor v0.3.0
└── env_logger v0.5.4
    └── cargo v0.30.0 (file:///C:/Users/finkelman.SEMCOGDOM/Documents/MyProjects/cargo)

So I am trying bumping are env_logger to 0.5.11

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Sep 17, 2018

📌 Commit 682b295 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Sep 17, 2018

⌛ Testing commit 682b295 with merge 8201560...

bors added a commit that referenced this pull request Sep 17, 2018
BUG fuzzing found a bug in the resolver, we need a complete set of conflicts to do backjumping

As mentioned in #5921 (comment), the new proptest found a live bug! This PR so far tracs my attempt to minimize the problematic input.

The problem turned out to be that we where backjumping on incomplete set of conflicts.
@bors
Copy link
Contributor

bors commented Sep 17, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 8201560 to master...

@bors bors merged commit 682b295 into rust-lang:master Sep 17, 2018
@Eh2406 Eh2406 deleted the explore_the_bug branch September 20, 2018 02:04
@ehuss ehuss added this to the 1.31.0 milestone Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants