Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving location where library build files are generated #3545

Open
ezyang opened this issue Jul 13, 2016 · 24 comments
Open

Moving location where library build files are generated #3545

ezyang opened this issue Jul 13, 2016 · 24 comments

Comments

@ezyang
Copy link
Contributor

ezyang commented Jul 13, 2016

Summary by @ezyang. We considered the following options for restructuring dist/build such that component build directories are not a subdirectory of the library build directory:

  1. Move library directory to dist/build/pkgname. Problem: This will conflict with an executable that is named the same as the package.
  2. Move non-library components to dist/subbuild/compname. Problem: This moves where executables are, and many people (including this author) are used to accessing the executable at dist/build/exename/exename. One way to patch over this is to copy the executables back to the old locations.
  3. Move the library directory to dist/build/lib-pkgname. Problem: This will still conflict with a component named lib-pkgname.
  4. Qualify all components with their type, e.g., dist/build/exe-exename. Problem: This moves where the executables are.
  5. Move the library directory to dist/libbuild. No major problems, although @23Skidoo would prefer all build products be in dist/build.

We are going to go with proposal (5).


Hey guys. I want to propose a BC-breaking change to the dist directory layout. I'll first explain what the change is, and then why I need it. I'd like to know if people are OK with it. It WILL break code which needs to know where Cabal puts build products. CC @mgsloan, @23Skidoo , @dcoutts

Cabal-1.24 behavior. Suppose we are building a Haskell module M.hs. If this module is in the public library, the interface file is placed in dist/build/M.hi. If the module is in any other component (e.g., a test suite) named comp, the interface file is placed in dist/build/comp/M.hi)

Proposed new behavior. I propose that libraries also get their own subdirectory: dist/build/pkgname; thus if we build M.hs in a library named foobar it will be placed in dist/build/foobar/M.hi. Obviously, this will break any scripts which rely on the interface files/object files having been placed in the old location.

Motivation. Cabal supports pre-processors which may generate extra object files that need to be linked into the final libraries/executables we build. This is tested by the PreProcessExtraSources cabal:package-test. For example, if we build Foo.hsc, this results in a Foo_hsc.o in our build directory.

If a preprocessor generates this extra object file, how does Cabal know to link it in? The "obvious" design is that when you run the preprocessor, it tells you what extra files need to be compiled and you go ahead and compile them. But this is not what Cabal does. Instead, what it does is simply look in the build directory for any files which look like ones that the preprocessor generated, and make sure to link them in. (ian-ross@e38cb0c)

Now here is the problem with Cabal-1.24's behavior: the build directories for the test suites/benchmarks/etc are all INSIDE the library's build directory. So clearly the search code will pick up the extra build products: disaster! And so the most obvious fix (without completely changing the preprocessor interface) is to move the directory out of the way.

The bug got uncovered by #3542 although I am not really sure why it didn't manifest earlier.

Alternatives. I think there are two plausible alternatives which avoid moving the location of library files:

  1. Move all non-library components to another directory, e.g., "subbuild" in dist. Still a BC break, because the binary for executables is no longer where you expect it to be.
  2. Keep the current layout, and filter out subdirectories which correspond to other components in the Cabal file. I am morally opposed to this option, since it enshrines a simply incorrect design decision. Look at this Cabal file:
name:                xyz
version:             0.1.0.0
license:             BSD3
author:              Edward Z. Yang
maintainer:          [email protected]
extra-source-files:  ChangeLog.md
cabal-version:       >=1.10

library
  exposed-modules:     Doom.Foo
  build-depends:       base
  default-language:    Haskell2010

test-suite Doom
    main-is: Foo.hs
    type: exitcode-stdio-1.0
    build-depends: base

These two Foos clobber each other! Ouch!

@phadej
Copy link
Collaborator

phadej commented Jul 13, 2016

Could this be done for new-build only, and how it relates to it?

@23Skidoo
Copy link
Member

I propose that libraries also get their own subdirectory: dist/build/pkgname; thus if we build M.hs in a library named foobar it will be placed in dist/build/foobar/M.hi.

Will this still work when there's an executable with the same name as library (example)? Otherwise +1.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 13, 2016

@phadej : No, it affects both new-build and build; it's a modification to how Setup scripts work.

@23Skidoo: Oh right. You'll need to disambiguate the public library somehow. The subbuild alternative plan would do this (at the cost of moving where all the executables are.) Maybe we should do that then.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 13, 2016

OK, I came up with another "more BC" scheme:

  1. The public library continues to be built into dist/build
  2. Other components are built into dist/subbuild/$component_name
  3. However, all components which produce executables have the executable copied into dist/build/$component_name/$exe_name

So, no library files move, and the executables are where you expect them.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 15, 2016

The bug that motivated this is present in master; however it doesn't show up on a fresh build because the benchmarks/test suites get built after the library, avoiding the library linking step from picking up the duplicate objects.

ezyang added a commit to ezyang/cabal that referenced this issue Jul 15, 2016
See the comment in the code.  I don't like it but it's BC!

Signed-off-by: Edward Z. Yang <[email protected]>
@phadej
Copy link
Collaborator

phadej commented Jul 15, 2016

btw, just today I run into situations where two executables in the same package use the same cabal_macros.h, so MIN_VERSION_package flags from other one leaks into other one.

EDIT in particularly I cannot have a source file with "clever" #ifdef MIN_VERSION_foo definitions

So the fix for this is very welcome!

@ezyang
Copy link
Contributor Author

ezyang commented Jul 16, 2016

The cabal_macros.h but is different and already fixed in master, see #1893 and the fixing commit is 90e908b

@23Skidoo
Copy link
Member

I'd prefer not to add a subbuild directory, IMO the build/subbuild distinction is confusing. Can we put library files into e.g. build/lib-foo and leave everything else as before? We'll have to prohibit components that have a name starting with lib-, but I don't think it's a big problem in practice.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 16, 2016

@23Skidoo But your proposal exhibits a displeasing asymmetry where the library is called lib-foo but the test is called foo, and so is the executable. Actually, there is already related nastiness in Cabal where sometimes a String (without a component type tag) is used as a component name (e.g., PDTagged), and thus there needs to be a convention for what the library is called (currently the empty string is used. Ick.)

So if we are rearranging the directory, we might as well do it right, and put the executable in exe-foo, etc.

@23Skidoo
Copy link
Member

So if we are rearranging the directory, we might as well do it right, and put the executable in exe-foo, etc.

I agree that this is cleaner, but people are used to having exes in dist/build/foo/foo. So maybe continue using dist/build/foo for exes and use dist/build/{lib,test,bench}/foo for other components?

@ezyang
Copy link
Contributor Author

ezyang commented Jul 17, 2016

OK... so how about the opposite: let's put the library build files in a different place than dist/build? How about dist/libbuild? (FWIW, if I could do it again I wouldn't allow executables and libraries to have the same name).

@23Skidoo
Copy link
Member

Well, that's basically what I proposed in #3545 (comment). libbuild is a bit more intuitive than subbuild, but IMO it's more logical to keep all build artifacts under dist/build.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 17, 2016

I mean, at the end of the day all the proposals are the same because we're just suggesting different locations to put things, but the original proposal was not so good because it caused there to be a reserved name that other components are not allowed to use. I agree it is certainly logical to keep all build artifacts under dist/build but there is another concern here, one of namespacing, and the point is that the "public library" lives under a different namespace than all other components of a package (you can't have a test-suite foo and a benchmark foo)

@23Skidoo
Copy link
Member

23Skidoo commented Jul 17, 2016

I won't protest too much against dist/libbuild :-)

ezyang added a commit to ezyang/cabal that referenced this issue Jul 17, 2016
See the comment in the code.  I don't like it but it's BC!

Signed-off-by: Edward Z. Yang <[email protected]>
@ezyang
Copy link
Contributor Author

ezyang commented Jul 18, 2016

I summarized the discussion up top.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 19, 2016

I just realized there is another consideration if we go for dist/libbuild: should the value of buildDir in LocalBuildInfo be dist/libbuild or dist/build?

This is quite difficult because either choice will break many setup scripts. Here is some quick grepping on Hackage:

Cabal-ide-backend-1.23.0.0/Distribution/Simple/PreProcess.hs
165:      pre dirs (buildDir lbi) (localHandlers bi)
167:    let exeDir = buildDir lbi </> nm </> nm ++ "-tmp"
178:          preProcessTest test f $ buildDir lbi </> testName test  
181:          let testDir = buildDir lbi </> stubName test
191:          preProcessBench bm f $ buildDir lbi </> benchmarkName bm

Here buildDir is assumed to be the directory containing components.

gstreamer-0.12.5.0/Gtk2HsSetup.hs
32:import Distribution.Simple.LocalBuildInfo (LocalBuildInfo(withPackageDB, buildDir, localPkgDescr, installedPkgs, withPrograms),
204:                                             , buildDir lbi ] }
242:        "--precomp=" ++ buildDir lbi </> precompFile,
263:  mFiles <- mapM (findFileWithExtension' ["chi"] [buildDir lbi] . toFilePath)

buildDir is assumed to be recording where the library goes.

And this repeats over and over again repeatedly all over many packages with custom Setup scripts. This seems to suggest changing this layout is a no-go without a major BC break.

@23Skidoo
Copy link
Member

That's a bummer. My vote is for making this change anyway and setting the default Cabal upper bound for build-type: Custom packages to < 1.26 when there's no custom-setup section. We can also bump the version to 2.0 in the next release, IIRC cabal-install 1.24 already sets the default upper bound to <2.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 20, 2016

OK, well, if we are going to release Cabal 2.0 next release, this affects a number of other BC decisions. E.g., #3574 and @ttuegel's #3572. If we are changing the default upper bound (I like this idea more and more, since the preprocessor interface changed backwards incompatibly and I don't think it's possible to keep BC here), what is our policy for BC-breaking changes? I guess the answer is something like, "Please still try to keep BC, so that it's easier for applications to migrate" but minor incompatibility's like #3572 are still fine.

@23Skidoo
Copy link
Member

I guess the answer is something like, "Please still try to keep BC, so that it's easier for applications to migrate" but minor incompatibility's like #3572 are still fine.

Yes, something like this. Plus there should be a mention in the changelog when a part of the API gets changed.

@mgsloan
Copy link
Collaborator

mgsloan commented Jul 21, 2016

As a data point, in stack we would also prefer to preserve BC here.

I'm don't have the details here. But it seems like part of the topic post suggests a third, potentially viable approach

The "obvious" design is that when you run the preprocessor, it tells you what extra files need to be compiled and you go ahead and compile them. But this is not what Cabal does.

How about making this what Cabal does, instead of hacking around the current design? Deprecate the old interface and transition all the pre-processors to the new system.

@ezyang
Copy link
Contributor Author

ezyang commented Jul 21, 2016

OK, well, I will have to look at the preprocessor interface and propose a design.

ezyang added a commit to ezyang/cabal that referenced this issue Jul 21, 2016
See the comment in the code.  I don't like it but it's BC!

Signed-off-by: Edward Z. Yang <[email protected]>
@ezyang
Copy link
Contributor Author

ezyang commented Jul 21, 2016

OK, so let's say how preprocessors work today. Presently, preprocessors are represented using the following data structure:

data PreProcessor = PreProcessor {
  -- Is the output of the pre-processor platform independent? eg happy output
  -- is portable haskell but c2hs's output is platform dependent.
  -- This matters since only platform independent generated code can be
  -- included into a source tarball.
  platformIndependent :: Bool,

  runPreProcessor :: (FilePath, FilePath) -- Location of the source file relative to a base dir
                  -> (FilePath, FilePath) -- Output file name, relative to an output base dir
                  -> Verbosity -- verbosity
                  -> IO ()     -- Should exit if the preprocessor fails
  }

A preprocessor takes in an input source file, and an output destination, and arranges for something to be written to that file name. In general, the output file name is always Modname.hs of some sort, because preprocessors are assumed to be generating Haskell code.

However, some preprocessors can generate extra files. There are two situations which are handled by Cabal today:

  • hsc2hs: produces extra Modname_hsc.c and Modname_hsc.h files.
  • c2hs produces extra Modname.chs.c, Modname.h and Modname.chi

The way they are handled is via type PreProcessorExtras = FilePath -> IO [FilePath] which takes a path to the build directory and recursively searches the directory for files which match the preprocessor template style.

Intuitively, we should be able to determine precisely what the possible generated files are simple from the list of module names that were preprocessed under some preprocessor.

But where should we return this information?

  • preprocessComponent should return precisely the C files to be built; otherwise we have no idea would ModuleNames and under what preprocessors they ran
  • Ditto preprocessFile, because at this point we don't know what preprocessor will run.
  • Inside preprocessFile, once we find the Preprocessor, we need a way to find out what files we should snoop at to find the extra files.

Since the extra files are preprocessor specific, we need a hook in PreProcessor to do it. There are two possibilities:

  1. Modify runPreProcessor to return a list of extra files which the preprocessor generated. This is more BC assuming mkSimplePreprocessor was used, then the signatures can be adjusted in tandem.
  2. Add a new hook function that given the output file name gives candidate locations. Or the preprocessor can do the probe itself and the output are precisely the files needed. Latter is more efficient; we should just provide helper functions for "speculating". This is more BC, but http://hs01.scs.stanford.edu:8910/search/?q=PreProcessor some clients of PreProcessor construct the data structure directly, so it is not as easy to extend.

Other BC considerations:

OK, so concrete proposal in next comment.

@ezyang ezyang mentioned this issue Jul 21, 2016
38 tasks
@ezyang ezyang modified the milestone: Cabal 2.0 Sep 6, 2016
@gbaz
Copy link
Collaborator

gbaz commented Sep 5, 2021

I'm pretty sure we moved to a much more granular layout similar to this with new-build?

@mpilgrem
Copy link
Collaborator

This topic (the structure under directory build) has been resolved for users of recent versions of Cabal (the tool), but remains an issue for users of Cabal (the library) (the Setup.hs commands) - including users of Stack, which makes use of Cabal (the library) to build. The current structure does not distinguish between, for example, the build artefacts for module Foo.Foo.Something in directory build/Foo/Foo and a package executable named just Foo with path build/Foo/Foo. This affects macOS users in particular because (a) the file system is case-insensitive and (b) executable files are not distinguished with an extension (unlike the use of .exe on Windows).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants