Removing readline as a dependency on unix to avoid implicating GPL3 licenses #387

Open
EricTheMagician opened this issue Sep 14, 2020 · 31 comments

Comments

@EricTheMagician

EricTheMagician commented Sep 14, 2020

While auditing our python distribution, we discovered that the python version built by conda-forge could be required to be distributed under a GPL3 license, which is not viable for commercial purposes.
This arises from the fact that the python library is built with readline, which is GPL3.

Given that this could create some issues for commercial distribution of conda-forge/python, what are your thoughts on removing it? A possible replacement is libedit

(I know this is easy to do, and I'm trying it out, but for some reason I'm not able to run the Docker image to build locally. So in the meantime, I thought I would ask the community while I try to debug my situation and test this change.)

@scopatz
Member

scopatz commented Sep 14, 2020

This could break a lot of downstream code. I think we have talked about a mutex for readline/noreadline in the past

@EricTheMagician
Author

EricTheMagician commented Sep 14, 2020

@scopatz Can you point me to the discussion?
I don't understand what you are trying to say.

@scopatz
Member

scopatz commented Sep 14, 2020

Basically the idea is that we would have one build output that uses readline and another that doesn't. I think the discussion was in an issue in this repo somewhere.

@EricTheMagician
Author

Alright, I think you are referring to #191, and #192
I'll have a read through those, but I was looking more to replace the readline shared library with libedit, which is licensed under the NetBSD license, which is quite similar to BSD.
So with that change, I think it would be fine and there would be no need to have 2 different versions.

I saw this:
#192 (comment)
I'll definitely make sure that my proposed changes will not break cli interactions with python and ipython.

@scopatz
Member

scopatz commented Sep 15, 2020

libedit is pretty broken relative to readline. It is not a full replacement. There are really three mutex options here:

  • readline (default)
  • libedit
  • noreadline

@mingwandroid
Contributor

"libedit is pretty broken relative to readline"

.. then again, we use it on macOS quite heavily and no one has complained.

@EricTheMagician
Author

@scopatz
What functionality is missing that would break readline?

@mingwandroid
I'm not familiar with the python popularity on macOS, so I checked the downloads from anaconda.org. The macOS downloads of the python builds are high enough that someone would be expected to have found an issue with libedit if it were truly a problem.

@mingwandroid
Contributor

I know @jjhelmus has some issues with editline!

@scopatz
Member

scopatz commented Sep 15, 2020

OK, let me rephrase, then. I am that person. I work on some other projects that use readline/libedit. I can tell you that it is a huge headache, even on Mac, when libedit is used instead. This is because libedit only partially implements the readline API. It is not 1:1 and many APIs are missing. It causes tons of issues for users & developers who have a really hard time figuring out what went wrong. It really doesn't make sense to treat them as equivalent.

In terms of specific missing functionality, it has been a couple of years, but I believe it was a lot of stuff related to tab completions, and that libedit doesn't respect the full key-binding syntax that readline has.
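To illustrate the key-binding point: the Python documentation for the readline module notes that libedit uses a different configuration syntax from GNU readline, so even the basic tab-completion binding has to be written in two ways. The following is a minimal sketch, assuming the standard-library readline module; the helper names `completion_binding` and `enable_tab_completion` are made up for this example and are not part of any library.

```python
def completion_binding(readline_doc):
    """Return the tab-completion binding string for the active backend.

    `readline_doc` is the readline module docstring (readline.__doc__).
    The Python docs suggest looking for "libedit" in it to tell the
    two backends apart.
    """
    if readline_doc and "libedit" in readline_doc:
        # libedit (editline) configuration syntax
        return "bind ^I rl_complete"
    # GNU readline configuration syntax
    return "tab: complete"


def enable_tab_completion():
    """Bind Tab to completion; return False if readline is unavailable."""
    try:
        import readline
    except ImportError:  # e.g. a Python build without the readline module
        return False
    readline.parse_and_bind(completion_binding(readline.__doc__))
    return True
```

Downstream code that hard-codes only the GNU readline string is the kind of code that can silently lose tab completion when the backend is swapped.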

@mingwandroid
Contributor

mingwandroid commented Sep 15, 2020

"This is because libedit only partially implements the readline API. It is not 1:1 and many APIs are missing. It causes tons of issues for users & developers who have a really hard time figuring out what went wrong. It really doesn't make sense to treat them as equivalent."

These issues seem orthogonal to how appropriate readline is as the line editing component of the Python interpreter, unless you can expand on the tangible problems with using it in that context.

I don't care if you select readline for your own projects, that's between you and Richard Stallman!

@mingwandroid
Contributor

And please do refresh your knowledge here! Could be that they've fixed everything you had a problem with since. Can I ask, do you use macOS python? Are you happy with the text entering experience there?

@scopatz
Member

scopatz commented Sep 15, 2020

@mingwandroid - no need to be so aggressive here. I am just saying that the last time I checked, libedit was a broken mess and abandoned.

@scopatz
Member

scopatz commented Sep 15, 2020

I am proposing a perfectly reasonable solution here with having a mutex. I don't think it is that hard

@mingwandroid
Contributor

I am not being aggressive, I'm asking for tangible problems with editline in python. That is all!

@mingwandroid
Contributor

mingwandroid commented Sep 15, 2020

Adding variants isn't so problem-free IMHO. They expand our build matrices a lot, increasing overall package counts. It also increases the amount of variation seen in the wild which will lead to more issues to solve. Overall, if editline in python on Linux is in general considered "good enough" then we should stick with it.

@scopatz
Member

scopatz commented Sep 15, 2020

I feel like you are being aggressive by making this about "Me & Stallman" and telling me to "refresh my knowledge." This personalizes the issue. This really should be a civil discussion about the pros/cons of various packaging strategies.

@mingwandroid
Contributor

mingwandroid commented Sep 15, 2020

Sorry, the Stallman comment was a joke only. But I do think you are personalising the issue by bringing in unrelated things here (API etc). I didn't mean offense. I also need to refresh my knowledge on editline vs readline, as we all should (if it is lacking!) if we want to talk about it reasonably.

@scopatz
Member

scopatz commented Sep 15, 2020

Alright, sorry, I didn't read it as a joke. Sounds like we can just stick to the technical, which works for me.

For reference, here is the editline repo: https://github.com/troglobit/editline The README has many places where it says that it is designed to be smaller and not implement all of the features of readline. There are also some issues, like https://stackoverflow.com/a/7116997, where the actual usage patterns are different between the two libraries. Code downstream of Python, like xonsh, has definitely hit these issues: https://github.com/xonsh/xonsh/pull/93/files.

My point here is that a switch to editline from readline could break other packages, even if everything compiled & linked correctly and the Python test suite passed. An alternative to both of those might be replxx, but that project has only had two releases so it is not clear how mature it is.

@EricTheMagician
Author

EricTheMagician commented Sep 15, 2020

@scopatz
Just for clarity, which library does macOS actually use?
According to the python source code:

It is possible to link the readline module to the readline emulation library of editline/libedit.

which makes me think that it's using this editline/libedit and not the one you linked.
Another interpretation is that it could use either one, but I'm not sure how to read this.
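One way to answer the "which library is this interpreter actually using" question is to ask the running interpreter. The sketch below assumes two things from the Python documentation: the readline module docstring mentions "libedit" when the editline emulation is in use, and Python 3.13+ additionally exposes `readline.backend` with the values "readline" or "editline". The function names are made up for this example.

```python
def classify_backend(doc, backend=None):
    """Map readline module attributes to 'libedit' or 'readline'.

    `backend` is readline.backend (only present on Python 3.13+);
    `doc` is readline.__doc__, used as a fallback on older versions.
    """
    if backend == "editline":
        return "libedit"
    if backend == "readline":
        return "readline"
    return "libedit" if "libedit" in (doc or "") else "readline"


def detect_line_editor():
    """Return 'libedit', 'readline', or 'none' for the running interpreter."""
    try:
        import readline
    except ImportError:  # interpreter built without a readline module
        return "none"
    return classify_backend(readline.__doc__, getattr(readline, "backend", None))


if __name__ == "__main__":
    print(detect_line_editor())
```

Running this with the interpreter from each platform's package would show directly whether a given build is linked against GNU readline or the libedit emulation layer.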

@EricTheMagician
Author

For the readline library, I had originally been targeting python 3.7, as that's what we're looking to distribute. The code for the runtime check is no longer encapsulated in an #ifdef __APPLE__, so if I upgrade our distribution to 3.8, then in theory I could just remove the readline library and have it symlink to libedit, potentially rendering this issue moot?

@mingwandroid
Contributor

mingwandroid commented Sep 15, 2020

That would not work as the ABIs are not compatible. (edit: missed your comment re the emulation lib, the compatibility stuff will elide functionality I expect and Python may not handle that gracefully, still some experiments would be worthwhile).

My personal preference at present is to have python-core (or -base), python-gpl and python-nongpl as top level packages, then python == python-gpl. The others just pull in python-readline-readline and python-readline-editline output packages. We create them all at once from a single package and then make sure python's readline module loader handles switching behavior. Then we slowly migrate all our build deps to python-core (or -base) .. if we can't just agree that editline itself is "good enough" (@scopatz, I think that difference between editline and readline should be handled inside Python really then no packages would need patching, and of course it'd be nice to upstream any such patch).

@EricTheMagician
Author

@mingwandroid
That sounds like a plan.

@mingwandroid
Contributor

Some other opinions:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498857
Summary: PSF license is GPL-compatible, nothing to see here (some back and forth on this).

https://pyoxidizer.readthedocs.io/en/oxidized_importer-0.1/packaging_licensing.html
Summary: A readline that was statically linked to the python executable or the thing you're embedding python into confers strong GPL license requirements. Implication: doing this via dynamic linking does not.

I would not care if we just leave this. The general consensus is that this is not a problem.

@hmaarrfk

hmaarrfk commented Sep 16, 2020

@scopatz i thought i went through the packages you were previously worried about and found that they all stopped using readline because readline's functionality was too limited.

Instead, they all use prompt-toolkit

https://arstechnica.com/cars/2020/09/nikola-admits-prototype-was-rolling-downhill-in-promotional-video/
#192 (comment)

edit: eek wrong link

@scopatz
Member

scopatz commented Sep 16, 2020

Yeah, I think this is less of an issue for the packages we know about. But there may be packages and users out there we don't know about. And yeah, don't get me wrong. I am totally in favor of providing a GPL-free version, and letting deps and packages choose what they need. It just has yet to be anyone's priority, but would be nice to have if it was.

@mingwandroid
Contributor

Two more ideas:

  1. We could contribute to editline some of the missing bits for readline compatibility (API, ABI, file names, file locations (/etc vs ~), quirk translation). Turns out it doesn't support UTF-8! Adding that is a reasonably sized chunk of work (very possible though). From looking at their TODO, the rest looks quite achievable. I would do the TODO stuff before tackling UTF-8 as I think that'd give a more gradual learning curve.

  2. This looks nice: https://github.com/AmokHuginnsson/replxx, apart from ABI/API compat. guarantees (I didn't check yet) my concern is pulling C++ runtime libraries into otherwise C-only things. I wouldn't like python to depend on that really.

@hmaarrfk

I think the underlying problem goes beyond this particular python package.

Linking to FFMPEG could make your whole software GPL for example.

In my mind, it would be good to push some GPL-free feature for a host of other packages.

@mingwandroid
Contributor

I am wary of misunderstandings concerning the GPL (though I'm not an expert).

I do not understand what you mean about FFMPEG, with regards to our ecosystem. How do you "push some GPL-free feature" for such a huge library? Does that not mean someone needs to rewrite it and release it under some license that you are more happy with? Yes, linking to FFMPEG could confer GPL status on the thing you built (depending on the definition of linking - simply importing it via an interface into your python code will not do this to your python code - sure it'll pull some GPL into your env, but well, we all use GPL stuff in our envs each and every day, by and large), but it's the right of the FFMPEG developers to assert this. If you want to use their work under a different license then you'd need to get in touch with them or else find an alternative. For me these are freedoms, not problems.

If you are hoping to 'productize' conda environments in some fashion (otherwise why would you care about GPL?) and licenses are creating restrictions on that then I don't really have a huge amount of sympathy. These are the moral rights of the software's authors and it's important that it is known that you can still sell GPL things and you can sell things that mix GPL things with non-GPL things. SUSE does that all the time.

In many ways python and its module system can be considered a distribution, in fact it is often referred to as that. It is a runtime environment in which the user is free to run the readline module or not. This can, IMHO be considered as being similar to a Linux distro placing GPL software beside non-GPL software. Yes, our Python readline.pyd will dynamically auto-load readline.so, but it will dynamically link to any readline.so that implements the same ABI. If we want to be super-clean about this we should rewrite the readline header (as much as Python needs of it anyway) and release that under BSD, then include that in the python build. That way, the python build would not have ever even seen GPL software and in that instance it'd be impossible to argue that our Python is not GPL-free. That conda install python then ends up with a runtime environment that includes readline is a different issue and we'd be free to provide alternatives at that level for anyone who cared.
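The dynamic-linking point can be checked on a given build by looking at what the readline extension module actually links against. The following is a rough sketch, assuming a Linux system where `ldd` is available and where the readline module is a separate shared object (so `readline.__file__` exists). The helper names and the sample paths in the usage note are illustrative only.

```python
import re
import subprocess


def line_editor_libs(ldd_output):
    """Return sorted names of line-editing libraries found in `ldd` output."""
    found = set()
    for line in ldd_output.splitlines():
        # Matches entries such as "libreadline.so.8" or "libedit.so.0"
        match = re.match(r"\s*(lib(?:readline|edit|editline)\.so[.\d]*)", line)
        if match:
            found.add(match.group(1))
    return sorted(found)


def readline_module_links():
    """Run `ldd` on the readline extension module of this interpreter."""
    import readline

    # __file__ is missing when the module is compiled into the interpreter
    path = getattr(readline, "__file__", None)
    if path is None:
        return []
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return line_editor_libs(result.stdout)
```

For example, feeding `line_editor_libs` a line like `libreadline.so.8 => /opt/conda/lib/libreadline.so.8 (0x...)` (a hypothetical path) yields `['libreadline.so.8']`, which tells you the extension was linked against GNU readline rather than libedit.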

Outside of doing the above, I think since we have a few different tools (mutexes, split outputs, optional constrained dependencies) at hand to improve the categorization of the readline component of Python (and given it does look at GPL stuff during compilation and linking) we probably should continue to explore that.

@hmaarrfk

i'm not trying to change the intent of the original authors. conda is licensed as BSD, and as a user of conda, I would like to respect that.

However, the legal text of GPL is quite long and complex. If I know I don't use any of the GPL features in my system, I would like the peace of mind that I'm not infringing on any of their 17 sections. Their FAQ is even longer....

I think GPL makes it rather clear that import submodule; submodule.call('git') does not make your python module GPL, but it is unclear if the creation of readline at the same time as the python binary has made python as a whole GPL or not. https://www.gnu.org/licenses/gpl-faq.en.html#GPLPlugins

Maybe the GPL is valuable to some programs, but it is a complex document, that is rather dense and hard to understand. A GPL-free way to distribute code would be nice.

If you aren't making a program interactive, why would you want to subject yourself to the ongoing requirements of the GPL?

@EricTheMagician
Author

EricTheMagician commented Sep 21, 2020

Thanks for all the discussion: @scopatz @mingwandroid @hmaarrfk
It's really quite interesting to be part of.

@mingwandroid While it is true that I do have a commercial interest in redistributing python, part of the problem is just packaging. I like your solution of just splitting out the readline module into its own package. In this case, it's really doable. What the downstream consequences are, I don't know, but I can definitely appreciate the complexity and unintended consequences.

I think the discussion between @hmaarrfk and @mingwandroid over ffmpeg brings up a really big concern of mine: how do you really know what licenses you need or are using? ffmpeg is a really peculiar case because ffmpeg on its own is released under LGPL, though it can be compiled to use GPL licensed libraries. On the Linux side at least, those GPL libraries are included, which makes the version built by conda-forge subject to the GPL when distributed. But in this case, how do you split the package in 2? Further discussion on distributing ffmpeg is needed and I know @hmaarrfk has already started it in the right places.
The thing about ffmpeg that makes it really peculiar to me, and which relates back to licenses, is the inclusion of openh264 and x264. While those 2 packages are free as in speech, the H264 license is free as in beer (up to a point). For this very peculiar case, whose responsibility is it to get a license from the MPEG-LA? Is it the developer who uses ffmpeg? Or the one who distributes ffmpeg? Both are equally acceptable answers according to the MPEG-LA. I don't know anything about the internal commercial workings (if any) of conda-forge, and I see no company containing the letters conda on the MPEG-LA licensee website, so this means the burden falls on the developer.

All of that got me to think about this: how do we find a balance of managing licenses for the community (through the distribution) and by the community (who develop packages on top of the distribution)?
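As a partial, community-side answer to "how do you really know what licenses you are using", the declared license metadata of installed Python distributions can be scanned with the standard-library `importlib.metadata` module. This is only a heuristic sketch: the function names are made up, the keyword matching is an assumption rather than a legal test, and it sees only Python distributions with metadata, not conda packages such as readline or ffmpeg themselves.

```python
import re
from importlib import metadata


def classify_license(text):
    """Roughly classify a license string as 'gpl', 'lgpl', or 'other'."""
    text = text or ""
    # GPL/AGPL, but not LGPL: \b stops "GPL" matching inside "LGPL", and
    # the lookbehind skips "Lesser General Public License".
    if re.search(r"\bA?GPL|(?<!Lesser )General Public License", text, re.IGNORECASE):
        return "gpl"
    if re.search(r"LGPL|Lesser General Public", text, re.IGNORECASE):
        return "lgpl"
    return "other"


def audit_environment():
    """Map distribution name -> 'gpl' or 'lgpl' for flagged distributions."""
    report = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        fields = [meta.get("License-Expression"), meta.get("License")]
        fields += meta.get_all("Classifier") or []
        kinds = {classify_license(field) for field in fields if field}
        name = meta.get("Name", "unknown")
        if "gpl" in kinds:
            report[name] = "gpl"
        elif "lgpl" in kinds:
            report[name] = "lgpl"
    return report


if __name__ == "__main__":
    for name, kind in sorted(audit_environment().items()):
        print(f"{name}: {kind}")
```

A report like this is a starting point for a manual review, not a substitute for one: as the ffmpeg case shows, the effective license of a binary can depend on how it was compiled, which package metadata does not capture.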

@hmaarrfk

I just wanted to cross reference an issue with constructor that had been touted as a "solution" or workaround for this kind of problem:
conda/constructor#319
Therefore, building with readline and asking people who package their final output product to simply not include it has become a "second class" citizen in constructor. Not a really fun situation to be in.
