
If git fetch fails wipe the directory and try again from scratch #1580

Merged
merged 3 commits into commercialhaskell:master on Jan 3, 2016

Conversation

ndmitchell
Contributor

This helps with #1418, certainly recovering from an artificial instance of the problem I was able to reproduce (just deleting the .git directory).

@quyse
Contributor

quyse commented Dec 30, 2015

git fetch may also fail if the remote server is temporarily down, or because of wrong network settings.

Honestly, I would even argue against this fix at all. The limited cache at Appveyor is a fairly rare and specific problem. I would say stack should have reasonable expectations about the underlying filesystem/hardware/etc., and should not contain "repairing" functionality for every possible case of a misbehaving system, as it may make things worse. For example, recursive deletion of a directory should be very carefully tested, so that it doesn't by chance delete the user's home directory or something because of wrong environment variables or spaces in the path, etc.

@ndmitchell
Contributor Author

The current code is non-atomic: if you Ctrl-C at the wrong point, your stack install is irrecoverably trashed unless you find a directory and delete it yourself. I don't think stack should ever get wedged because on-disk structures aren't what it expects. The fact that AppVeyor causes the disk structures to be incorrect is just one way this can go wrong, but it's far from the only way.

I agree recursive delete should be used carefully and tested (and I confirm I have tested it), but in general these things are tricky.

@quyse
Contributor

quyse commented Dec 30, 2015

I mean it's better when the user deals with unexpected breakage, rather than the tool itself. Probably the primary motivation for writing my comment above was the feeling that stack is already way too "intelligent": when you call stack path just to get the path to ghc, it may start downloading a snapshot. I do understand why it does this, but it's certainly not the "Unix way", and it makes stack harder to use in scripts, since it's difficult to say what sort of operations stack is going to perform for a specific command and what kind of things it may break. This fix, with its removal and total re-download of the package index, is an example of such intelligence, IMHO. I would say it's better to print a useful error message and let the user fix things, or provide a better system to operate on.

Anyway, I apologize if my previous comment sounded blunt; I didn't mean it that way. Thanks for all your efforts with stack!

@ndmitchell
Contributor Author

@quyse - no offense taken - it's important to have people push back, otherwise stupid ideas make it in. And my previous contributions to stack have been minimal, so the credit goes entirely to other people :)

For things like the presence or absence of GHC, which a user should be aware of, I agree that information querying commands that suddenly go off and magic the world around them are a bit scary (it has arguments both ways). Here I think it's a little more reasonable - it's a stack internal directory that no user should ever be aware exists, operating on a git bundle of the repo, when users shouldn't be aware git is being used like that behind the scenes. The only thing you could tell a user is "remove this directory and try again", and all the arguments about lack of network access etc mean you can never know if that's the right thing to suggest anyway.

Alternatively, some rsync like protocol where it just mirrored a directory, with no preconceived notion of the start or end point, would totally solve the entire issue. But I don't think that's on the cards at the moment.

PS. If my first comment sounded terse, it's because I tapped it out on a mobile while waiting for my son to fall asleep :)

@mgsloan
Contributor

mgsloan commented Dec 30, 2015

This change seems reasonable to me. It'd also be good to show the user the exception that causes the retry to be necessary. Also, how about factoring this out like this:

let cloneAndFetch = do ...
cloneAndFetch `C.catch` \(ex :: ReadProcessException) -> do
    $logWarn (T.pack (show ex))
    $logStickyDone "Failed to fetch package index, retrying."
    removeTree acfDir
    cloneAndFetch

Can you please update the PR with these changes? Thanks!

While it is reasonable that stack shouldn't try to work around every single failure case, we should try to gracefully handle the cases that happen in practice. Having the requisite intelligence to make stuff "just work" is part of what makes it excellent! Since this folder is an implementation detail, deleting it and starting from scratch is perfectly reasonable.

@kadoban
Collaborator

kadoban commented Dec 30, 2015

I did some thinking about this before, coming from a different place, and my tentative conclusion was: for a perfect answer we need to use a git library that can more easily tell why the command failed. If the local repo is corrupt, yeah it should be wiped. If the network connection didn't succeed it might make sense just to retry instead, without nuking the local repo. If something else failed, there's probably no point in trying again automatically, etc.

Of course, "perfect" isn't always a good thing to wait for, and I'm unsure whether this is better than the current behavior or not. I suspect it is better in general?
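The decision tree described above could be sketched roughly like this (a hypothetical sketch, not stack's actual code; the stderr substrings matched are illustrative guesses, and real git output varies across versions):

```haskell
import Data.List (isInfixOf)

-- Possible interpretations of a failed git command.
data GitFailure = CorruptRepo | NetworkFailure | OtherFailure
  deriving (Eq, Show)

-- What to do for each kind of failure: wipe on corruption, retry on
-- a network error, give up otherwise.
data Recovery = WipeAndReclone | RetryFetch | GiveUp
  deriving (Eq, Show)

-- Classify a failure by inspecting git's stderr output. The matched
-- substrings are illustrative, not exhaustive.
classifyGitFailure :: String -> GitFailure
classifyGitFailure err
  | matches ["not a git repository", "index file corrupt", "bad object"] = CorruptRepo
  | matches ["Could not resolve host", "Connection timed out", "unable to access"] = NetworkFailure
  | otherwise = OtherFailure
  where matches = any (`isInfixOf` err)

recover :: GitFailure -> Recovery
recover CorruptRepo    = WipeAndReclone
recover NetworkFailure = RetryFetch
recover OtherFailure   = GiveUp
```

The key point is that only the corruption case warrants the destructive wipe; a transient network failure should leave the existing repo in place.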

@mgsloan
Contributor

mgsloan commented Dec 30, 2015

That's a good point, @kadoban! We could try to parse out different git error messages. It seems reasonable to me to detect the case where it's a network failure, and avoid deleting the repo. We'd also want to detect this case in order to support a general "--offline" mode for stack, as alluded to here, in the context of stack new.

We could also use gitlib's bindings to libgit2, but I'd rather avoid depending on an additional C library.

@ndmitchell
Contributor Author

@mgsloan - the issue with putting the retry there is that if there is no repo and we git clone, and that fails, we'll retry the git clone, which seems incorrect. Parsing error messages from other people's programs scares me, especially git, which scares me independently. Adding the logging seems like a good idea.
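The shape being discussed might look like this (a hypothetical sketch with made-up names, not the PR's actual code; gitClone and gitFetch stand in for the real process invocations). The first attempt fetches into an existing directory or clones a fresh one; on failure it wipes the directory and clones once from scratch, so a failing fresh clone propagates rather than looping:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (SomeException, catch)
import Control.Monad (when)
import System.Directory (doesDirectoryExist, removeDirectoryRecursive)

-- Hypothetical sketch: update a cached git repo, recovering from a
-- failed fetch by wiping the cache and cloning from scratch.
updateIndexRepo :: (FilePath -> IO ())  -- ^ stand-in for git clone
                -> (FilePath -> IO ())  -- ^ stand-in for git fetch
                -> FilePath -> IO ()
updateIndexRepo gitClone gitFetch dir = do
  exists <- doesDirectoryExist dir
  let firstTry = if exists then gitFetch dir else gitClone dir
  firstTry `catch` \(e :: SomeException) -> do
    -- Log the exception, wipe the possibly-corrupt cache, and clone
    -- from scratch. A failure of this clone propagates; the fresh
    -- clone itself is never retried.
    putStrLn ("git failed, retrying from scratch: " ++ show e)
    stillThere <- doesDirectoryExist dir
    when stillThere (removeDirectoryRecursive dir)
    gitClone dir
```

The structural point is that the retry handler always ends in a clone, never in another fetch, and the handler is not wrapped around itself.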

@ndmitchell
Contributor Author

I now log the exception. @mgsloan, I'm happy to shove in your version if you prefer, but it does have a few downsides. If the code duplication is excessive, I can shove the two commands into a let, but given that they call different command-line functions in the initial attempt and the retry, it's not very neat either way.

@ndmitchell
Contributor Author

(I note while going through the process functions that they are all a bit weirdly inconsistent - with the working directory either a Maybe, or required, and the arguments in different orders. I leave that to you though.)

@mgsloan
Contributor

mgsloan commented Jan 3, 2016

LGTM, thanks!

(I note while going through the process functions that they are all a bit weirdly inconsistent - with the working directory either a Maybe, or required, and the arguments in different orders. I leave that to you though.)

Agreed. The functions there have been created and modified as required by the uses in stack, which is one reason it's still in the stack source tree. It'd be cool to generalize + clean it up + release it as a new package.

mgsloan added a commit that referenced this pull request Jan 3, 2016
If git fetch fails wipe the directory and try again from scratch
@mgsloan mgsloan merged commit ecb056c into commercialhaskell:master Jan 3, 2016