Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All packages that contain non-package data are now likely installed in a broken way since 7.0.0 #2874

Closed
pombredanne opened this issue Jun 4, 2015 · 43 comments
Labels
auto-locked Outdated issues that have been locked by automation

Comments

@pombredanne
Copy link
Contributor

Since 7.0.0 sdist are converted to wheels before being installed. As a result, a package is never setup.py install'ed but always bdist_wheel'ed first.

This means that every package that contains additional files hit the wheel bug https://bitbucket.org/pypa/wheel/issue/120/ or/and https://bitbucket.org/pypa/wheel/issue/92/bdist_wheel-makes-absolute-data_files which are the same issue.

This includes breaking the pip install of a large number of packages that could only be installed from sdists because of this wheel bug such as the reasonably popular selenium, pdfminer, publicsuffix and many more.

And since an sdist is always wheeled before install, there is no work around that is possible anymore: forcing the use of an sdist will always go through the wheels mechanics anyway.

The solution short term is to stay on pip 6.x
The solution must be to fix pip asap to not use wheel always and/or to fix wheel ASAP for these bug: https://bitbucket.org/pypa/wheel/issue/120/ https://bitbucket.org/pypa/wheel/issue/92/

The bug was introduced in pip by #2618 and has not been marked as a breaking change in the release notes, but this is a major breaking change that likely breaks the installation of hundreds of packages starting with pip 7.0

@pombredanne
Copy link
Contributor Author

So FWIW, I have bugs popping all over the places because of this after the upgrade to pip 7 and I must revert to 6.11 for now. This is a rather major blocker for me and eventually would impact many project in a rather sneaky and hard to diagnose way as packages while installing OK suddenly stop to work because of files they cannot find at runtime.
@dstufft I never understood if the https://bitbucket.org/pypa/wheel/issue/120/ and https://bitbucket.org/pypa/wheel/issue/92/ are actual bugs in wheel or part of the wheel spec... but regardless of what they are they break quite a few packages.

@dstufft
Copy link
Member

dstufft commented Jun 4, 2015

You can also use the flag in pip 7 that lets you turn off the wheeling of sdists for particular packages.

pip install --no-binary selenium,pdfminer,publicsuffix selenium

@pombredanne
Copy link
Contributor Author

@dstufft thank you for the good tip and the quick reply!

That can help, but in my case I eventually pip install automatically 100's of packages from multiple requirement files and I cannot easily single out a few packages. For now I have kept as sdist all the ones that were broken because of the wheel issue and that was a decent workaround

@pombredanne
Copy link
Contributor Author

@dstufft actually the wheel bug related comments would be rather for @dholth ... but you guys share the same first name initial hence my confusion :)

@dstufft
Copy link
Member

dstufft commented Jun 5, 2015

You should be able to put that flag into a config file fwiw, and I think into the requirements file too?

@dstufft
Copy link
Member

dstufft commented Jun 5, 2015

FWIW I just took a look at the packages:

  • selenium - uses hacks to poorly replace package_data, should just use package_data instead.
  • pdfminer - doesn't appear to use data_files at all in it's github master branch.
  • publicsuffix - also appears to be using hacks to poorly replace package_data.

@pombredanne
Copy link
Contributor Author

@dstufft I agree with you that these are poorly packaged... I actually forked publicsuffix to publicsuffix2 to fix that...
However poorly packaged these may be, they were installing fine before pip 7.x and they do not now. and there are likely many more like these if we dig a bit more.
I just cannot fix all the poorly packaged bits ... ;)

@dstufft
Copy link
Member

dstufft commented Jun 5, 2015

Sure :)

We knew going into it, some things would have problems :/, the problem is Wheels have been around for almost 2 years now. I don't think giving more time is going to cause the remaining packages to fix themselves, so our choices are to either never make this change, or make it and let those projects work to get themselves fixed. Since this change is a massive improvement (when it works...) to people's lives we decided to go that way but provide the --no-binary option to give people an off switch for projects that install incorrectly with this (and in addition to that, if we actually fail to build the wheel, we'll fall back to the sdist install too).

@benjaoming
Copy link

Oh they're just poorly packed? What a sensible thing to say ;)

Well, actually it turns out that wheel doesn't support the data_files argument properly for setup(). The bug was around even before automatic bdist_wheel was turned on a part of pip install. So now we have a Pip 7 that potentially can't install anything anymore, because it uses Wheel which behaves differently (and brokenly).

https://bitbucket.org/pypa/wheel/issue/92/bdist_wheel-makes-absolute-data_files

Using --no-binary as a work-around? That won't work in requirements.txt, and we can't seriously say that people ( using these hundreds / thousands of packages ) have to discover for themselves that data files have been installed to the wrong locations and then individually deduct "oh I'll just use --no-binary" :)

I would suggest fixing the way that Wheel handles data_files: This is the only way forward, as the current bdist_wheel behavior is already in the wild. Get a fixed Wheel released and increment the Wheel version dependency of a new Pip 7.0.4.

@dholth
Copy link
Member

dholth commented Jun 5, 2015

data_files has always been iffy in virtualenvs. Does the program using
the data_files need to know it's in a virtualenv, and look for paths
relative to it? The bdist_egg source code handles data_files in a
different way, maybe that would be more desirable.

On Fri, Jun 5, 2015, at 06:19 AM, benjaoming wrote:

Oh they're just poorly packed? What a sensible thing to say ;)

Well, actually it turns out that wheel doesn't support the data_files
argument properly for setup(). The bug was around even before
automatic bdist_wheel was turned on as an automatic part of pip
install. So now we have a Pip 7 that potentially can't install
anything anymore, because it uses Wheel which behaves differently (and
brokenly).

https://bitbucket.org/pypa/wheel/issue/92/bdist_wheel-makes-absolute-data_files

Using --no-binary as a work-around? That won't work in
requirements.txt, and we can't seriously say that people ( using these
hundreds / thousands of packages ) have to discover for themselves
that data files have been installed to the wrong locations and then
individually deduct "oh I'll just use --no-binary" :)

It's broken! A bug is in the wild!

I would suggest that fixing the way that Wheel handles data_files is
the only way forward. Get a fixed Wheel released and increment the
Wheel version dependency of a new Pip 7.0.4.

— Reply to this email directly or view it on GitHub[1].

Links:

  1. All packages that contain non-package data are now likely installed in a broken way since 7.0.0 #2874 (comment)

@pombredanne
Copy link
Contributor Author

@dstufft could there be a simpler way to detect packages that are likely problematic before even attempting to wheel them at all? After all the problems are rather narrow from my experience: these are essentially packages that provide data_files and that expect that the data_files will end up installed exactly where they were relative to the Python code in an sdist rather than relative to sys.prefix, which is rather a bug in setuptools than it is in wheel or pip.
It is just that the behaviour has been there for so long that it has been taken for granted by some or many..

Now, wheeling these would rarely if ever fail, so pip would not fall back to a plain setup.py install there.

I wonder if eventually a simple extract and reading the setup.py as a plain file and "grepping" for a "data_files" would suffice to make the decision to wheel or not to wheel?

@pombredanne
Copy link
Contributor Author

@benjaoming you wrote "Oh they're just poorly packed? What a sensible thing to say ;)"

Sorry if that came out poorly... I meant poorly packaged in the sense that they there is a conflict in the distutils and setuptools approach to dealing with the same data_files directive and that is this is a poorly understood issue (I came to grasp it only recently) and that leads to poorly packaged code, not because of the packager:

wheel has been taking the distutil interpretation even though wheels are practically built with a setuptools extension..

pip was neutral until v7 because it would install wheels as wheels and sdist as sdist by running a setup.py install

pip7 now runs setup.py bdist_wheel first which means that the wheel interpretation will be always applied.

So the crux of the issue is the inconsistency between distutils vs setuptools and the distutils side taken by wheel and the relative prominence of setuptools vs distutils.

Personally I was originally split on the issue... There are likely very few packages relying on pure distutils behaviour. In fact it sounds like instead most everyone even when using pure distutils expects their package to work with a setuptools interpretation of where data_files will land, as in the case of publicsuffix or selenium. I think this is likely because when you do install with pip the setuptools behaviour will apply even when the distutils code is used in a setup.py

So after a good thought, I am not longer split I guess... the setuptools behaviour should be preferred.
This should mean that eventually the distutils behaviour should be deprecated at some point and that data_files can never be installed by wheels or sdist outside of the target installation directory. I kinda like that approach which feels safer and is neutral wrt to a system vs virtualenv installation.
There is a risk that a few packages relied on installation in absolute paths or in other places.

@dstufft
Copy link
Member

dstufft commented Jun 5, 2015

Honestly, projects that are using data files and are doing hacks to make it act like package_data, should just use package data. What's the point of having data files and package data if they both act the same?

@benjaoming
Copy link

@pombredanne
+1 for setuptools behavior, I was implementing a data_files application just before pip 7.0 landed, and I was initially targeting distutils to have the lowest level of compatibility... but eventually found that it was broken for precisely the reasons you are linking!

I don't think there's any reason to expand the behavior of setuptools. Ultimately, the ones who implement a setup.py will have to deal with the problems of wanting files in paths that are not relative to sys.prefix, and taking care to use sys.prefix in one's data_files list is critical. For instance, they would have responsibility to raise Exception("No write-access to /etc/xx, maybe you are installing in a virtualenv or you forgot administrative privileges").

@dstufft please consider that not all data files are package-related, they can be /etc/init.d/my-service or /usr/share/icons/blah.

@dstufft
Copy link
Member

dstufft commented Jun 5, 2015

I think the ability to write to arbitrary paths outside of sys.prefix is a misfeature.

@benjaoming
Copy link

@dstufft
I totally disagree. A setup.py is an application of its own, not a configuration file. It should have the freedom to utilize the packaging system to place files in OS-specific places. It's entirely possible to create platform-dependent lists of files for data_files as it was, and removing that possibility would be a regression.

@dstufft
Copy link
Member

dstufft commented Jun 5, 2015

setup.py is bad too, Python packaging is moving in the direction to get rid of it.

@benjaoming
Copy link

@dstufft !? maybe you wanna take that to a different issue.

@dstufft
Copy link
Member

dstufft commented Jun 5, 2015

The key difference in view points is that you're coming at a place that the current solution allows X, so X must be preserved. That's now how most (if any?) of the people pushing Python packaging forward are viewing things. The current solution allows anything, so if we try to preserve that then every change is backwards incompatible and we might as well give up and just let things sit exactly how they are, some folks call this Postel's Tarpit.

Unfortunately, this means that at some point we have to break things for some subset of users. We've been doing this with each release for different subsets of users:

  • We've broken the cases where you want to install from PyPI without having an SSL enabled Python (or a ancient enough OpenSSL).
  • We've broken the cases where someone wanted to host their files off of PyPI but still have it installed by default on PyPI.
  • We've broken the cases where people were relying on specific older markup formats to discover files to download.
  • We've broken people who used exotic looking version numbers or version numbers which did things a little odd (like pytz which used versions like YYYYa, YYYYb, YYYYc).
  • We've broken people relying on the package itself defining extra URLs for the package installer to go and look to satisfy dependencies.
  • We've broken people who want to upload a file to PyPI, delete it, and then upload a different file with the same name.

That's just the stuff that I'm able to think of off the top of my head. Throughout all of this we are attempting to balance what we're breaking with the benefits of breaking it, and looking for use cases that we want to still support that aren't able to be handled by what we've left it with.

For the feature at hand I've seen two situations where people are using it:

  • They are (poorly) reimplementing the package_data feature.
  • They are writing out files like /etc/init/my-service or similar "I'm using Python packages like they were OS packages" situations.

The first of those should be broken, it's silly to poorly reimplement a different feature instead of just using that feature. In my experience this is also the most common use case for the data_files feature and I believe that if we hold up improvements due to it we are doing the entire community a disservice for the benefit of a few projects. These projects have had almost 2 years to try using Wheel for their projects and to fix any issues with installing their project via Wheel. At this point It's unlikely that anything but breaking them is going to push them to fix their packaging.

The second situation is one where I think there is a valid use case here, but I don't think the valid use case involves being able to write to arbitrary paths anywhere on the system. If someone installs your project into a virtual environment, you shouldn't be able to touch files outside of that virtual environment. The fact it is technically possible today because of the nature of setup.py does not mean that it is a good thing or that we're going to wish to support it moving into the future. In this regard, I think rooting data_files in sys.prefix (or some other variable that would scope it to the system in a system install, and the virtualenv root otherwise) is the right thing here and that setuptools got it wrong. Most likely the only reason setuptools does what it did is because of egg installs (another feature that folks are moving away from).

Ultimately projects who rely on something that doesn't work while wheel'd should modify their setup.py to prevent being wheeled. This will be accurate not just for the implicit wheel cache in pip 7, but also manual wheel cached using pip wheel, and also things like the wheel builder for devpi. In the mean time end users can disable this feature either globally using --no-binary :all: or for a particular set of projects by using --no-binary project1,project2,project3 or by simply not upgrading.

As we attempt to "climb out" of Postel's Tarpit, it is inevitable that things break because as it stands, the attempts to do something (anything, no matter how broken or silly it was) given any input has caused a situation where any change breaks things.

@pfmoore
Copy link
Member

pfmoore commented Jun 5, 2015

+1 to all that @dstufft said. To make progress, we have to establish policies which might not match current behaviour. The aim is to meet existing requirements, so that any valid use cases are still supported. But:

  1. People may well have to change how they achieve their aims, to work with the newer versions. The data_files -> package_data change is in this class.
  2. The use cases have to be valid and appropriate. The case of installing into an arbitrary filesystem location is one of these - as it stands, it's not compatible with virtualenvs, user installs, or pretty much anything other than a root-level system Python install. So we need to refine the requirement to cater for these cases. There is work going on to do that, but it's not a simple problem, and until it's done, packages can explicitly not support being made into a wheel. But that should only be a stopgap - we need such projects to enter into the discussion on how to implement the new feature effectively, without just rejecting any change at all. That way, we can be sure that we end up with an implementation that provides what projects need.

So, to cut a long story short - data_files is deprecated. Most use cases can be handled via package_data, and for those that can't, projects should temporarily disable wheel creation, and join in the discussion on how to extend the wheel spec to satisfy their needs alongside the other constraints involved (such as virtualenv and user install friendliness).

@benjaoming
Copy link

Guys, we're not deprecating data_files, we've essentially removed the feature, and packages that are currently using it, have been prompted to re-release with some exception throwing from individual setup.py's if they're called with bdist_wheel. This decision was unfortunately not made in a single move, but is the result of the mix of Wheel not having implemented it properly, and Pip now forcing bdist_wheel.

Do you have suggestions on how Pip / Wheel can help track files installed in '/etc' ?

Or do you encourage that users write their own setup.py-like logic for their applications' first-run? Because I can easily see lots of ugly behaviors being implemented in the application scope if Wheel doesn't want to deal with it.

@dstufft

The key difference in view points is that you're coming at a place that the current solution allows X, so X must be preserved.

No, that's not the difference. I understand the other decisions, but it's a logical fallacy to use them as justification. I wrote about current use cases of data_files. Just because there's justification in removing some behaviors (as in: we're not even deprecating this stuff!), doesn't mean that all behaviors should be removed because they're old and there's an objection. Maybe old discussions and decisions were entirely consistent and well thought through. I understand you're trying to explain things in the rest of the post, but please don't hint that I'm non-progressive just because I try to advocate the preservation of an old feature :)

The second situation is one where I think there is a valid use case here, but I don't think the valid use case involves being able to write to arbitrary paths anywhere on the system. If someone installs your project into a virtual environment, you shouldn't be able to touch files outside of that virtual environment.

This will most likely fail anyways. If someone is installing in a virtualenv and data files are put outside it, usually the virtualenv's pip instance won't have write access. In case it does, well, this IMO is up to the application's setup.py: Does it want to respect other existing files? Other installed versions? This is the sort of freedom that is needed for certain things.

If data_files doesn't do it, you end up having people put this in their setup.py: open('/etc/rc.local', 'a').write('python my_service.py') -- how about that? ;)

@dstufft
Copy link
Member

dstufft commented Jun 5, 2015

We obviously haven't removed all old behaviors nor would we, to say anything else is a strawman. The ability to stick files in arbitrary locations is a misfeature, and supporting features that make that possible isn't something I'm interested in doing. Full stop. All files should be scoped to be within the scope of the environment they are being installed into.

You've correctly pointed out that people can just do it by throwing some arbitrary Python into their setup.py, of course they can do that, but it has a few major differences:

  • It's more obviously the wrong thing to do rather than just subtly the wrong thing to do.
  • If they do that, that's a short term "fix" anyways as we move away from setup.py and towards declarative packages.
  • It's not a "feature" of the packaging toolchain, just a side effect of the fact that setup.py is executable.

I mentioned this issue to an IRC channel I happen to be in that has a number of Python developers in it, and one of them commented (name removed):

<___>   It does seem super obvious that projects in virtualenvs shouldn't be able to write outside their virtualenv
<___>   It hadn't even really occurred to me that they could
<___>   Or that a developer would do that

I suspect that a large majority of developers do not expect that installing a project into a virtual environment (or a --user install) would even attempt to affect their global system (whether they had permission to do it or not).

I also do not believe that it should be up to random authors whether they respect other existing files or other installed versions. You can argue that in some scenario this freedom is needed, and that is true I can probably come up with a scenario where something like that would make things better for that one particular project, but part of designing a sane software ecosystem is being able to look at a particular feature and being able to say "no".

We recognize that there are use cases where this may be useful, and we invite those people to help us come up with a solution that satisfies both their constraints and our constraints. Until then, they have a number of solutions available to them:

  • Modify their setup.py to raise an error when attempting to be wheeled (this isn't a forever solution, long term I hope that pip will only support installing from Wheels and the only thing it ever does with a sdist is build a wheel).
  • Instruct their users to use --no-binary :all: or --no-binary their-project,some-other-project.
  • Instruct their users to downgrade pip to a pip that's older than 7.0.
  • Instruct their users not to install wheel alongside their projects, which will also disable wheel building.

They had the last two years to play with Wheels and be proactive in ensuring that they don't break with them. If they've failed to do that over the last 2 years, we're now at the reactive stage where they have to react to the fact that they need to either fix themselves or put up work arounds. Eventually (another 2 years? Who knows for sure with volunteer timelines) if they only institute work arounds they'll have to react again when their projects stop being installable all together.

@pombredanne
Copy link
Contributor Author

@dstufft you wrote:

I think the ability to write to arbitrary paths outside of sys.prefix is a misfeature.

I could not agree more. I would go even further to say that the ability to write to arbitrary paths outside of where the root package is installed is a misfeature which allow to mistreat Python packages for system packages...

@pombredanne
Copy link
Contributor Author

@benjaoming you wrote:

Do you have suggestions on how Pip / Wheel can help track files installed in '/etc' ?

My 2 cents, if this is the use case you are after, then IMHO Python packages be they wheel, eggs, sdist, bdist are not what you should be looking for. If you are on Linux use the package manager of your distro for this instead.

@benjaoming
Copy link

Well, essentially the packaging system (Pip+Wheels) is creating value by managing what's installed on a system.

If a piece of Python software needs data files outside of the environment (where package data lives), then we all agree that there are valid use cases.

So the question is: Does Wheel want to create this value by accommodating scenarios where distributed files are placed outside of the OS-universal Python environment ?

If not, Pip+Wheels can remain a great tool for certain things, but you will definitely see people reinventing lots of wheels to reimplement an often-need feature for system services and desktop applications. Your mindset seems to be on Python libraries and command line scripts.

So in case you leave this as is, future Python package distributions may start leaving around unmanaged data files and use the same hacks that you were trying to avoid in order to force non-OS-specific behavior. But you can't really force it, when essentially it doesn't exist :)

Q: I want to create a package on PyPi that can be installed with pip install my-software and then it will create desktop short cuts or start up services on all the OS that I've supported in my setup.py
A: You can do this, but there's no way of making Pip+Wheels look after the files that you place on the user's system

Does the above sound correct?

@dstufft
Copy link
Member

dstufft commented Jun 6, 2015

If you want to install desktop shortcuts or start up services then start up a thread on distutils-sig about that so we can collectively figure out if we want to support that, and if we want to support that how we can best do it in a way that will work across all the various platforms as well as virtual environments, user installs, and such. Things that can be explicit designed and ultimate controlled passed on to the end user whether they want that thing or not.

@pombredanne
Copy link
Contributor Author

@benjaoming To me the Python packages "system" is about discovering, provisioning and installing Python libraries used in a larger context of Python-based applications. This is the approach of most if not all language-specific package management systems for Ruby, Perl, JS, Java, etc.

MO is that the provisioning and installing of complete applications including things that deal with system services, desktop integration, databases and more is best left for other tools to handle. In most cases with the OS package tool (debs, rpms, brew, chocolatey, native installers, etc) or with tool dealing with configuration "in the large" such as buildout and Ansible, Salt, Puppet, Chef of the world. And containers.

I would never attempt to use only pip and wheels for this.

@benjaoming
Copy link

For my case, it's fine to have data_files only being allowed to place files in sys.prefix.

But consider what happens if you do not implement a backwards-compatible way of handling data_files... broken packages galore..

My proposal is still that it should work as before, but with your objections intact: It's discouraged to place files outside the userspace Python environment. But we can still embrace the freedom and value of having the ability in Wheels to install and track these files.

Btw data_files is supported by stdeb for creating .deb packages.

@pombredanne
Copy link
Contributor Author

@benjaoming I wonder if the number of actual packages using data_files is not rather small
and if this is possibly a minuscule issue after all?

@benjaoming
Copy link

@pombredanne no way of telling quite yet I suspect :) Stirring the waters to see what happens is inevitably what's happening now...

But this Google does look like there's a substantial bit of setup.pys out there with data_files:

https://encrypted.google.com/search?num=30&hl=en&q=%22data_files%22+%22blob%22+%22setup.py%22+site%3Agithub.com

@dstufft
Copy link
Member

dstufft commented Jun 24, 2015

Personally, I think the distutils behavior makes more sense. If we want something relative to the package dir we already have package_data.

@jdemeyer
Copy link

So the crux of the issue is the inconsistency between distutils vs setuptools

I agree, see also setuptools #460.

@breedlun
Copy link

I just released my first python package, and it is affected by this issue. I would like to avoid absolute paths, as suggested, but I do not know the proper way. Can someone give me a hand?

Here is a link to my stack overflow question that goes into more detail.

anthrotype referenced this issue in fonttools/fonttools Aug 20, 2016
The file contains outdated information and nobody is maintaining it any longer.
Plus it only works on Linux, and only if pip installing as sudo (which is bad!).

We need proper Sphinx+ReadTheDocs kind of documentation.
benoit-pierre added a commit to benoit-pierre/plover that referenced this issue May 25, 2017
benoit-pierre added a commit to benoit-pierre/plover that referenced this issue May 25, 2017
benoit-pierre added a commit to benoit-pierre/plover that referenced this issue May 25, 2017
benoit-pierre added a commit to benoit-pierre/plover that referenced this issue May 26, 2017
benoit-pierre added a commit to benoit-pierre/plover that referenced this issue May 26, 2017
benoit-pierre added a commit to benoit-pierre/plover that referenced this issue May 27, 2017
benoit-pierre added a commit to benoit-pierre/plover that referenced this issue May 27, 2017
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 4, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation
Projects
None yet
Development

No branches or pull requests

10 participants