Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

install_data doesn't respect "--prefix" #130

Closed
ghost opened this issue Jan 5, 2014 · 36 comments
Closed

install_data doesn't respect "--prefix" #130

ghost opened this issue Jan 5, 2014 · 36 comments

Comments

@ghost
Copy link

ghost commented Jan 5, 2014

Originally reported by: iElectric (Bitbucket: iElectric, GitHub: Unknown)


Changing following line in https://bitbucket.org/pypa/setuptools/src/735202ca6848d58bc59022f85cde10af64a61a7e/setuptools/command/bdist_egg.py?at=default#cl-155:

self.call_command('install_data', force=0, root=None)

to

self.call_command('install_data', force=0, root=self.bdist_dir)

Makes data installation respect prefix directory


@ghost
Copy link
Author

ghost commented Jan 5, 2014

Original comment by iElectric (Bitbucket: iElectric, GitHub: Unknown):


Actually one would need to extract prefix from self.bdist_dir to property install it into prefix.

@ghost
Copy link
Author

ghost commented Jan 5, 2014

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


It appears as if the original code explicitly set root=None so it seems dangerous to me to change that without a clear understanding of why it does what it does and why it would be better served to change the behavior.

What is the impact/failure if install_data doesn't respect the prefix? Why should it respect the prefix? Can you think of any cases where one might expect this command not to honor --prefix?

@ghost
Copy link
Author

ghost commented Jan 5, 2014

Original comment by iElectric (Bitbucket: iElectric, GitHub: Unknown):


Yes, the use case is you might install under different root than /. For example Nix package manager install all packages in /nix/store/hash-name-version/ as prefix.

@ghost
Copy link
Author

ghost commented Jan 5, 2014

Original comment by iElectric (Bitbucket: iElectric, GitHub: Unknown):


See aarddict/desktop#30

@ghost
Copy link
Author

ghost commented Jan 6, 2014

Original comment by iElectric (Bitbucket: iElectric, GitHub: Unknown):


I have found a workaround:

#!python

    python setup.py install_data --root=$out
    sed -i '/data_files=data_files/d' setup.py
    python setup.py install

The issue is that prefix passed to setup.py install means what to prefix all paths while installing, but install_data confuses this concept with root variable as you can see here https://bitbucket.org/carljm/python-distutils/src/48c42eeaee4410d76675b637bcd401b8919ff19a/command/install_data.py?at=default#cl-63

And setuptools is passing it as None, which means there is no way to prefix data.

@ghost
Copy link
Author

ghost commented Jan 6, 2014

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


I've tried but I don't yet understand the problem well enough to make a judgment on whether this behavior is a defect or intended behavior. Indeed in the github issue, you reference this stackoverflow answer which indicates that data files by design are installed with the egg and not relative to other system locations.

There are a lot of confounding aspects here, but given this behavior has been the same for over 8 years, I'm disinclined to say that the implementation has been broken all this time.

Reading through the distutils code, it appears as if a non-absolute paths will disregard the 'root' attribute and will instead honor the 'install_dir' attribute. Perhaps the fix is for the project to supply non-absolute paths. I can't verify that because the ticket referenced doesn't have any code.

Please provide a simplified example demonstrating the issue you've encountered and how to replicate it in a simple environment.

@ghost
Copy link
Author

ghost commented Jan 7, 2014

Original comment by iElectric (Bitbucket: iElectric, GitHub: Unknown):


I don't agree with argument that setuptools design is perfect because it has been in use for the last 8 years. I believe many have stumbled upon this behaviour and figured out a workaround.

Before I give you an example we have to figure out roles. In this case, it's about handling installation of python packages for a Linux distribution. I can not influence what people put into their setup.py - that is the api provided by setuptools for those providing metadata for the installers. I'm using the installer api: setup.py install.

Let's dive into a real example. I'm using plain Ubuntu 13.10 install. Interesting that searching for data_files on github reveals almost no one is using absolute paths. But I found an example:

$ mkdir tmp
$ sudo easy_install statelessd --prefix `pwd`/tmp

Docs for easy_install:

--prefix                       installation prefix

So by looking at docs and the fact there is no --data-prefix or similar, I'd expect those files are installed into the prefix.

@ghost
Copy link
Author

ghost commented Nov 16, 2015

Original comment by jdemeyer (Bitbucket: jdemeyer, GitHub: jdemeyer):


Related (or perhaps even a duplicate of this bug?): #460

@ghost
Copy link
Author

ghost commented Nov 17, 2015

Original comment by pombredanne (Bitbucket: pombredanne, GitHub: pombredanne):


For references, see also: https://bitbucket.org/pypa/wheel/issues/120/bdist_wheel-creates-a-wheel-that-installs

@ghost
Copy link
Author

ghost commented Dec 2, 2015

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Actually one would need to extract prefix from self.bdist_dir to property install it into prefix.

@iElectric I'm not sure what it means to extract self.bdist_dir. I traced that attribute to the bdist command, which builds that value based on the build command, but I don't see how the prefix is relevant to it. Can you investigate and make a recommendation, perhaps with a test case, that addresses your concern?

@ghost
Copy link
Author

ghost commented Dec 2, 2015

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Issue #460 was marked as a duplicate of this issue.

@ghost
Copy link
Author

ghost commented Dec 8, 2015

Original comment by jdemeyer (Bitbucket: jdemeyer, GitHub: jdemeyer):


Since you "welcome suggestions on how to implement this proposed change", here is a patch which fixes #460:

diff -ru setuptools-18.1/setuptools/command/bdist_egg.py setuptools-18.1-patched/setuptools/command/bdist_egg.py
--- setuptools-18.1/setuptools/command/bdist_egg.py 2015-06-18 14:36:06.000000000 +0200
+++ setuptools-18.1-patched/setuptools/command/bdist_egg.py 2015-12-08 12:47:22.414712218 +0100
@@ -129,7 +129,7 @@

         try:
             log.info("installing package data to %s" % self.bdist_dir)
-            self.call_command('install_data', force=0, root=None)
+            self.call_command('install_data', force=0, root=None, override_dirs=False)
         finally:
             self.distribution.data_files = old

@@ -138,8 +138,9 @@

     def call_command(self, cmdname, **kw):
         """Invoke reinitialized command `cmdname` with keyword args"""
-        for dirname in INSTALL_DIRECTORY_ATTRS:
-            kw.setdefault(dirname, self.bdist_dir)
+        if kw.pop("override_dirs", True):
+            for dirname in INSTALL_DIRECTORY_ATTRS:
+                kw.setdefault(dirname, self.bdist_dir)
         kw.setdefault('skip_build', self.skip_build)
         kw.setdefault('dry_run', self.dry_run)
         cmd = self.reinitialize_command(cmdname, **kw)

If you accept this change, I don't mind writing the relevant documentation.

@ghost
Copy link
Author

ghost commented Dec 11, 2015

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Hi @jdemeyer - would it be just as easy, and more obvious to do the following?

diff -r 04d56f778d05 setuptools/command/bdist_egg.py
--- a/setuptools/command/bdist_egg.py   Fri Dec 11 10:37:09 2015 -0500
+++ b/setuptools/command/bdist_egg.py   Fri Dec 11 10:53:14 2015 -0500
@@ -138,8 +138,9 @@

     def call_command(self, cmdname, **kw):
         """Invoke reinitialized command `cmdname` with keyword args"""
-        for dirname in INSTALL_DIRECTORY_ATTRS:
-            kw.setdefault(dirname, self.bdist_dir)
+        if cmdname != 'install_data':
+            for dirname in INSTALL_DIRECTORY_ATTRS:
+                kw.setdefault(dirname, self.bdist_dir)
         kw.setdefault('skip_build', self.skip_build)
         kw.setdefault('dry_run', self.dry_run)
         cmd = self.reinitialize_command(cmdname, **kw)

All that seems to do is disable the overriding of install directories when installing data files. How can we tell that it does the right thing? What will be the impact on other users?

I think there are several confounding factors here.

First, in Domen's failing example installing statelessd above, the usage is wrong because the --prefix argument is ignored. It's ignored because options to the command must precede positional arguments. The command should have been easy_install --prefix pwd/tmp statelessd (without sudo). After all, sudo shouldn't be required to install into ./tmp. Additionally, that command will fail because ./tmp/lib/pythonX.X/site-packages isn't in the sys.path, so it should be invoked with PYTHONPATH=./tmp/lib/python2.7/site-packages easy_install --prefixpwd/tmp statelessd.

However, even with that, the command fails to install on my machine, and crashes with a MemoryError or similar due to #440. I'll need to focus on that first in order to make statelessd a valid test case for this behavior.

Second, there are two use cases here that are relevant (and elucidated in #460) - data files indicated in relative paths versus absolute paths. I believe relative paths are supported, but the files are installed relative to the distribution, so must be located that way. If the projects are updated to use zip_safe=False and pass relative path targets for data files, is that sufficient to address the issue?

Third, and more importantly, I believe that although both distutils and pip allow installing files outside of the package itself, easy_install (and thus setup.py install when setuptools is present) intentionally added the constraint of disallowing this behavior in order for eggs (and thus any package installed by easy_install) to be encapsulated in a single file. This constraint was created to foster quality through simplicity, allowing an uninstall operation to be as simple as removing the package (directory or file), and eliminating the need to keep a registry of files installed by a package. I've observed similar elegance in the Mac OS X Application folder - uninstalling an application is as trivial as trashing the Application folder, whereas on Windows or Linux, one must rely on an uninstaller to reliably uninstall most applications.

I believe this constraint applies here, regardless if one is using absolute or relative paths. I believe this constraint was intentional and removing it would violate an expectation of users of .eggs (that their system won't have files installed in arbitrary locations, but that everything will be installed to the package only). The constraint also enforces a friendliness to cross-platform support that's often violated when assumptions about system file layouts are built into a distribution's install instructions. There is a workaround, which is to have the package install its data files into the distribution, then have a post-install routine to link or copy the files into other locations on the system. In this model, the package takes responsibility for the changes to the user's system (and any cleanup if appropriate).

If there's a consensus to remove this constraint, it can be lifted, but such a decision can't be made lightly. I think the next steps would be to produce some minimal examples that show the unexpected behavior and describe what the desired behavior would be.

@ghost
Copy link
Author

ghost commented Dec 11, 2015

Original comment by jdemeyer (Bitbucket: jdemeyer, GitHub: jdemeyer):


would it be just as easy, and more obvious to do the following?

Maybe, I don't really care. All I care about is for the issue to be fixed, not how it is fixed.

What will be the impact on other users?

Do you know of any use cases of setuptools + data_files? Given how broken it is (incompatible with both distutils and wheel), I doubt it is actually used the way it is currently implemented.

If the projects are updated to use zip_safe=False and pass relative path targets for data files, is that sufficient to address the issue?

Sorry, I don't get what you mean. The point of this ticket is to be able to install stuff outside of the egg, regardless whether it is zipped or not. So I don't see how zip_safe is relevant.

allowing an uninstall operation to be as simple as removing the package

Uninstall will get more difficult. Not a big deal. I think that working packages are far more important than easy uninstalling.

There is a workaround, which is to have the package install its data files into the distribution, then have a post-install routine to link or copy the files into other locations on the system. In this model, the package takes responsibility for the changes to the user's system (and any cleanup if appropriate).

Right, but this doesn't work with wheel, so it's not really a suitable option either.

@ghost
Copy link
Author

ghost commented Jan 24, 2016

Original comment by dstufft (Bitbucket: dstufft, GitHub: dstufft):


This has come up for me in the past, and again recently. I'd like to push back on this to see if we can get this sorted.

Currently we have two mechanisms to install data files in a python package. One of those is package_data, which works consistently and installs things relative to the distribution that is being installed, and the other one of those is data_files which the behavior is... complicated. As of right now if someone has a setup.py that uses distutils they'll get relative paths in data_files installed relative to sys.prefix. However, if that same person installs that package with easy_install or with pip (w/o wheel) those data_files will stop being installed in that location and will instead be installed relative to the distribution that is being installed. To add even more complexity to this, if someone has a setup.py that uses setuptools they'll get relative paths in data_files installed relative to the distribution. However if that person installs that package with pip (w/ wheel) then those packages will stop being installed in that location and instead be installed relative to sys.prefix.

I think we need to unify the definition of what data_files actually does because the "well sometimes it means X and sometimes it means Y" is very confusing and, I think, user hostile to the point where I recommend people never to use it if they can help it because it's mostly just a footgun at this point. My recommendation for people to not use data_files generally works for the kind of data that people can use package_data for, since that does work consistently and it keeps things contained inside of the installed distribution. However, there is other kinds of data that cannot be installed there because it needs to live in specific, well known locations on the system. One example of this is man pages, man pages need to be placed in specific folders or else the man utility cannot find them by default. Currently if a project is installed with distutils or via wheel they can correctly install a man page, but with setuptools there is no such thing available to them.

To that end, I think what we should do is bring setuptools back in line with distutils and with wheel so that all of them install relative paths in data_files relative to sys.prefix. This would technically be a backwards incompatible change but one I think with little impact since the on by default wheel building in pip has already effectively done this for pip's users and a few projects had to adjust but overall it was pretty painless. I think this makes sense because we cannot change the meaning of data_files in distutils easily, since at the earlier it would land in Python 3.6 and we'd be waiting around for years and years for that to happen. I also think that the distutils defintiion makes more sense in general since we already have the package_data mechanism, having two, slightly different mechanisms for installing data files relative to the installed distribution does not seem like an overall useful thing to me.

@ghost
Copy link
Author

ghost commented Jan 24, 2016

Original comment by iElectric (Bitbucket: iElectric, GitHub: Unknown):


@dstufft Thanks for jumping on this. You're correct, implementations differ between setuptools and distutils.

This issue however talks making sys.prefix configurable when installing data_files, since prefix of Python might not be the same where data files are installed.

Example: Python is custom compiled and lives in /opt/bin/python2.6 (sys.prefix is /opt), but we'd like to install the files into /etc/ (where the prefix really is /).

NOTE: my originally reported snippets talks about self.bdist_dir, although it should really obey --prefix, whereever that variable is stored.

@ghost
Copy link
Author

ghost commented Jan 24, 2016

Original comment by dstufft (Bitbucket: dstufft, GitHub: dstufft):


Presumably once data_files respects sys.prefix it will also respect --prefix since the two are synonymous I believe

@ghost
Copy link
Author

ghost commented Jan 24, 2016

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


So what is the recommendation? Where should data files go when building an egg and installing an egg?

@ghost
Copy link
Author

ghost commented Jan 24, 2016

Original comment by dstufft (Bitbucket: dstufft, GitHub: dstufft):


What's wrong with putting them relative to sys.prefix as distutils does? For the example of say man pages you still want them to go to the place on disk that man will know to look?

@ghost
Copy link
Author

ghost commented Jan 24, 2016

Original comment by jdemeyer (Bitbucket: jdemeyer, GitHub: jdemeyer):


Where should data files go when building an egg and installing an egg?

In the same location as where distutils installs data_files. This can be achieved by a patch that I proposed in this thread. The location is sys.prefix by default, but it can be changed by the --install-data option or if --user is given.

@ghost
Copy link
Author

ghost commented Jan 24, 2016

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


It sounds like what @dstufft is recommending is that eggs be built to support data files and easy_install be adopted to install those data files wherever distutils would have. Is that what your patch does, @jdemeyer?

@ghost
Copy link
Author

ghost commented Jan 25, 2016

Original comment by jdemeyer (Bitbucket: jdemeyer, GitHub: jdemeyer):


It sounds like what Donald Stufft is recommending is that eggs be built to support data files and easy_install be adopted to install those data files wherever distutils would have. Is that what your patch does, Jeroen Demeyer?

Yes.

@ghost
Copy link
Author

ghost commented Mar 5, 2016

Original comment by sdh4 (Bitbucket: sdh4, GitHub: sdh4):


Compounding all of this is that, even with the above patch, "python setup.py install --prefix=..." for an application such as referenced in #460 (attempting to put its data relative to PREFIX) doesn't seem to actually perform the install_data operation.

Doing a separate "python setup.py install_data --prefix=..." doesn't work either because the install_data command doesn't accept a --prefix parameter.

So with the above patch I seem to be able to install into Python's pre-configured PREFIX, but not one that has been specified manually on the command line.

@jaraco
Copy link
Member

jaraco commented May 22, 2016

@jdemeyer: Do you have any comment on @sdh4's concern? Would you be willing to put together a pull request with your patch and the doc updates offered?

@a1an77
Copy link

a1an77 commented Sep 30, 2016

From the setup.py help it seems the correct way of handling data files would be using the --install-data option for the install command.

$ python setup.py --help install | grep install-data
--install-data installation directory for data files

The actual bug is then that that parameter is also not honoured since the files listed in the data_files parameter end up in the package even if the option is specified

just tested with version 28.0.0

@jaraco
Copy link
Member

jaraco commented Sep 30, 2016

What's wrong with putting them relative to sys.prefix as distutils does?

The main problem with allowing easy_install to install files other than to a single location is that violates that explicit expectation. easy_install was designed to put an egg in a single location such that deleting the egg would readily delete the package (much in the same way that deleting a directory in /Applications on a Mac will uninstall that application). There has been the one long-standing exception that scripts would additionally be installed to bin, but even those were wrappers linking to the functionality in the package, such that no meaningful content remains after removing an egg.

I'm happy to allow a package to install arbitrary files to arbitrary directories (perhaps under $prefix) but it should be clear that one cannot expect easy_install to facilitate the uninstall and that one must use pip (or another tool) to keep track of installed files.

@kitterma
Copy link

Since it's been a few years, I thought I'd check in and see if setuptools was usable for a project that needs to install man pages and I see it's not. dstuft's comment from nearly two years ago is exactly right for the problem I have: #130 (comment) - I see all these recommendations to use setuptools instead of distutils, but then this basic requirement is still not supported.

@jdemeyer
Copy link
Contributor

I see all these recommendations to use setuptools instead of distutils, but then this basic requirement is still not supported.

Use pip install instead to install your project. You can still use setuptools inside the setup.py script, but pip changes some default installation directories such that data_files becomes supported just like it was with distutils.

@jaraco
Copy link
Member

jaraco commented Nov 19, 2017

I like that. This is in line with the plan to deprecate easy_install and setuptools' mechanisms for installing packages.

I think that means there's no additional work required for this ticket.

@jaraco jaraco closed this as completed Nov 19, 2017
@jaraco jaraco added the wontfix label Nov 19, 2017
@kitterma
Copy link

Not particularly helpful.

I can use pip to install it myself, but having setup.py install not work is not useful.

I know not everyone cares about things like distribution packaging, but the pip "solution" is useless in that context.

@jaraco
Copy link
Member

jaraco commented Nov 19, 2017

@kitterma What about setup.py install --single_version_externally_managed? That's what pip uses.

@jdemeyer
Copy link
Contributor

I know not everyone cares about things like distribution packaging, but the pip "solution" is useless in that context.

What you are trying to do precisely? As @jaraco said, you can always use setup.py install --single-version-externally-managed --record=/dev/null which installs projects the same way that pip does.

@kitterma
Copy link

I'm trying to do what dstuft used as an example in #130 (comment) - install a man page. setup.py install --single-version-externally-managed --record=/dev/null works, but is certainly not at all intuitive. I don't understand why the insistence on deviating from what distutils does in this case, but at least that sort of solves the problem.

Thanks.

@Alexander-Shukaev
Copy link

Checked again today as well and the issue is still there. This is ridiculous. Simply had to os.path.join the sys.prefix in front of every entry in data_files to achieve consistent behavior between all tools.

@Alexander-Shukaev
Copy link

Alexander-Shukaev commented Jun 18, 2018

Another (and perhaps better) possibility is to modify setup.py:

if __name__ == "__main__":
  try:
    from setuptools     import setup
    opt = '--single-version-externally-managed'
    if not opt in sys.argv:
      try:
        sys.argv.insert(sys.argv.index('install') + 1, opt)
      except ValueError:
        pass
    opt = '--record='
    if not [arg for arg in sys.argv if arg.startswith(opt)]:
      try:
        sys.argv.insert(sys.argv.index('install') + 1, opt + os.devnull)
      except ValueError:
        pass
  except ImportError:
    from distutils.core import setup
  setup(**args)

@Profpatsch
Copy link

Sadly, it looks like we are not going to be able to fix our packaging this way, c.f. NixOS/nixpkgs#4968

bhipple added a commit to bhipple/nixpkgs that referenced this issue Jul 13, 2018
Tool for interacting with S3 buckets, with some performance optimizations over s3cmd.

Skips installing the bash shell completion scripts in /etc, to work around:
NixOS#4968
pypa/setuptools#130
esheldon added a commit to beckermr/pizza-cutter that referenced this issue May 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants