Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CopyFile: Don't touch copyFile target if it is identical. #1501

Merged
merged 2 commits into from
Dec 19, 2013

Conversation

nh2
Copy link
Member

@nh2 nh2 commented Sep 11, 2013

This can prevent a lot of recompilation when using ghc -M / -c.

@23Skidoo
Copy link
Member

Do we want to replace copyFile with copyFileChanged in all situations? Maybe it's better to do this on case-by-case basis?

@23Skidoo
Copy link
Member

I'm leaving this for @dcoutts to review.

@ghost ghost assigned dcoutts Sep 11, 2013
@nh2
Copy link
Member Author

nh2 commented Sep 11, 2013

Do we want to replace copyFile with copyFileChanged in all situations? Maybe it's better to do this on case-by-case basis?

I've thought about that, but I couldn't see the benefits. Touching files in build systems when you don't have to always causes trouble, and the performance impact seems minimal.

(Also, given the amount of helper functions that call copyFile, it is not clear which places call copyFile that matter (I tried to trace that by hand and did not succeed), but as I said above, in the end it always matters.)

@nh2
Copy link
Member Author

nh2 commented Sep 11, 2013

I have not mentioned some example benefits by the way: cabal install now no longer needlessly touches .hi and .o files when installing. This means that cabal builds or plain ghc invocations in projects that depend on them will no longer cause rebuilds.

@dcoutts
Copy link
Contributor

dcoutts commented Sep 18, 2013

Hmm, interesting. I'd also be inclined to only use this in the couple places where it is necessary, but the comment about reinstalls is interesting.

Still, it does make me a tad nervous to do it everywhere. I cannot forsee all the consequences of not updating timestamps.

@liyang
Copy link
Contributor

liyang commented Oct 11, 2013

#1537 includes parts of this request. I have not changed copyFile nor added copyFileChanged; only filesEqual is introduced. (I figured there was no point in copyFileChanged since it's only two lines, and sometimes you actually want to do renameFile rather than copyFile.)

From here on we can gradually work on replacing individual copyFiles to only do so when the contents differ.

@nh2
Copy link
Member Author

nh2 commented Oct 11, 2013

I cannot forsee all the consequences of not updating timestamps.

Why should there? If a file's content has not changed, there is no reason to tell anything in the system the impression that it has. Or is there?

@tibbe
Copy link
Member

tibbe commented Oct 11, 2013

Don't worry about not updating the timestamp.

@liyang
Copy link
Contributor

liyang commented Oct 13, 2013

Here's the kind of problem I see: make(1) uses mtime to indicate that an output file is up-to-date with respect to an input file. When the input changes but its output remains the same, and if we don't update the output's mtime, the next time make is invoked it will build the output again.

Admittedly, not updating the mtime of the output may prevent a cascade of unnecessary rebuilds further down the line (and this behaviour we definitely do want), but something doesn't feel right when the first step is unnecessarily duplicated. Were I writing a Makefile, I might try to work around this as follows:

.PHONY: foo.o
foo.o: foo.o.stamp
foo.o.stamp: foo.c
    gcc -o foo.tmp.o foo.c
    cmp foo.o foo.tmp.o && rm foo.tmp.o || mv foo.tmp.o foo.o
    touch foo.o.stamp

That is, use foo.o.stamp to keep track of whether foo.o is actually up-to-date with respect to foo.c, while the mtime of foo.o is free to lag behind. (I don't know if Shake does something like this internally; I suspect it does. I will find out at some point.)

Of course, this issue is not exclusive to make, which is why we need to be careful here and not replace copyFile wholesale.

@liyang
Copy link
Contributor

liyang commented Oct 13, 2013

tl;dr: lots of tools use mtime to indicate up-to-date-ness WRT its dependencies, even though it ought to be an indication of whether the file has actually changed. We want to keep separate these two attributes of an output file.

@nh2
Copy link
Member Author

nh2 commented Oct 13, 2013

When the input changes but its output remains the same, and if we don't update the output's mtime, the next time make is invoked it will build the output again.

I see what you mean.

I think the difference is the follwoing: There are two types of build-system like things:

  • systems that check current state vs desired state (using mtimes or other properties, e.g. make, shake, ghc --make)
  • task runners (make with PHONY, simple shell scripts, rake, grunt etc.)

Current-vs-desired systems work top-down, i.e. for the output that you demand, they figure out what needs to be built for that, continue recursively and on each step apply their "updatedness metric" like mtime or looking something up in a metadata database (like shake) to avoid duplicate work.

Task runners work bottom-up, executing steps one after the other. They avoid duplicate work by discovering that something is not necessary as they build up, e.g. by seeing that the package that I have just built has the same hash as the package that is already installed (so the install step can be skipped).

Task runners can generally avoid less recompilation than Current-vs-desired systems (which is why projects like ghc-make exist), but they are less complicated.

You probably agree that the problem you mentioned only arises in the first kind of system (like your make example) and that for task runners, behaviour like this is the only way how they can avoid duplicate work.

I think that Cabal is a task runner. cabal install always works in the order preprocessing library, building library with ghc, preprocessing executables, building executables with ghc, copying files to the install location, running haddock, installing haddock files. Any non-task-runner-stuff is done inside ghc --make. You can convince yourself that cabal is a task runner by grepping for moreRecentFile (which compares mtimes): Its usages are not used to skip following compilation steps, so a problem like in your example does not arise.

That being said, yes, replacing copyFile by copyFileChanged "selectively" makes sense. From my reading around in the code, all occurences I found qualify though.

@tibbe
Copy link
Member

tibbe commented Nov 11, 2013

What's the status of this pull request?

@liyang
Copy link
Contributor

liyang commented Nov 11, 2013

I'm not sure. The filesEqual part made it in as of #1537 and is used like so. I felt the two lines for copyFileChanged was too low to cross the Fairbairn threshold, and in any case sometimes you want to renameFile rather than copyFile.

I expect we'll fiddle around with Cabal when we get sufficiently annoyed with long build times, but if anyone knows their way around the library better, please go ahead and add a filesEqual check where appropriate.

@tibbe
Copy link
Member

tibbe commented Nov 11, 2013

@nh2 do you have any particular places in the code where you want to apply this, now when we already avoid relinking? If so, perhaps you could update this pull request to do so. If not, would you be OK with me closing this pull request until we've found a use case? If I'm a bit worried to change all the file copying outright, especially if we don't have a specific case that will be improved (performance-wise) by the change.

@nh2
Copy link
Member Author

nh2 commented Dec 16, 2013

I updated my pull request, making it more conservative as you suggest.

  • I changed filesEqual to check for existence instead of catching exceptions, since that can silence errors when things are wrong
  • I left the copyFiles function as is and added copyFilesChanged so that in future we can cherry-pick those places where we want to use it.

@tibbe tibbe merged commit 72c94a0 into haskell:master Dec 19, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants