Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pip install #100

Closed
bittremieux opened this issue Jul 14, 2020 · 12 comments
Closed

pip install #100

bittremieux opened this issue Jul 14, 2020 · 12 comments
Labels
package quality Make package easy to install, use, adapt.

Comments

@bittremieux
Copy link

Besides via conda, it would be useful to submit matchms to PyPI as well so that non-conda users can easily install it using pip.

@sverhoeven
Copy link
Member

Conda is used mainly because of rdkit usage in the molecule/smile/inchi functions. If matchms was on pypi you would need to install RDKit without using conda. Installing rdkit without conda is hard so we choose to not have a pypi package.

We could make those rdkit dependent functions dynamically check if rdkit is installed and give an error when rdkit is missing, but this could be surprising and hard to document.

@bittremieux
Copy link
Author

I agree, installing RDKit without Conda is very annoying. However, only a minority of people consistently use Conda as package manager (unfortunately).

At least providing the more general way to install matchms using pip gives those users the option to still be able to install it. They'll have to figure out how to get the RDKit dependency themselves.

@florian-huber
Copy link
Collaborator

florian-huber commented Aug 3, 2020

We can indeed consider to add a "matchms light" version to pip.
Functions that would be affected seem to be:

from filtering

  • add_fingerprint
  • derive_inchikey_from_inchi
  • derive_inchikey_from_smiles
  • derive_smiles_from_inchi
  • repair_inchi_inchikey_smiles

from similarity

  • FingerprintSimilarity (the class itself is fine though, it just expects that fingerprints are added)

The main gain I see is as @bittremieux said the higher accessibility for potential future users.
However, this seems to require an own sprint. In particular also to properly do the packaging and CI.
For the matchms manuscript that won't be do-able I fear.

@florian-huber florian-huber added the package quality Make package easy to install, use, adapt. label Sep 10, 2020
@florian-huber
Copy link
Collaborator

florian-huber commented Sep 16, 2020

This has become a bit more important now, since matchms is about to become a dependency of spec2vec.
So making a more lightweight version will make spec2vec have less dependencies AND will allow to make matchms pip installable (and maybe spec2vec as well by the way).

I see two ways how we could do it:

  1. We keep matchms where it is but remove the 5 filters mentioned above that rely on rdkit.
    Those 5 removed filters will make a separate conda package matchms_chemistry.
    We would than have
    -- matchms on pypi
    -- matchms and matchms_chemistry on anaconda.
    Disadvantage: That makes two packages to take care off.

For the user that would not change too much. The importing might become a bit different, such as

from matchms.filtering import normalize_intensitites
from matchms_chemistry.filtering import repair_inchi_inchikey_smiles
...
  1. The alternative would be to keep everything in matchms, but make rdkit optional. That seems comparatively simple to do, but also comes with a
    Disadvantage: -as Stefan pointed out above- this can lead to undesired behavior. A user pip-installing matchms would get an error (missing rdkit) when running those filter functions.

Let me know if you have any ideas or opinions on this @sverhoeven, @jspaaks, @fdiblen ?
be careful, that's a trap. Whoever answers might be getting more questions once I start working on this...

@fdiblen
Copy link
Collaborator

fdiblen commented Sep 16, 2020

As you know we initially started to make a pip package. As far as I remember rdkit was the only reason why we switched to conda. I am in favor of the first option but keep in mind that this will require quite some work. Another option might be to replace rdkit with another pip package.

@sverhoeven
Copy link
Member

I would go with option 2

In other libraries I have seen methods not working because of missing dependencies and I don't mind it as long it is not the main functionality of the library.

Documenting rdkit requirement and giving a warning when rdkit is missing when trying to access a rdkit powered method, is workable for me, but maybe not for others.

It will make it much easier to publish using the https://github.com/NLeSC/python-template/blob/master/%7B%7Bcookiecutter.project_slug%7D%7D/.github/workflows/pypi.yml workflow.

In the future we can make this repo pip only and have conda package on conda-forge using https://github.com/conda-forge/staged-recipes . We will need to remove bioconda channel as conda-forge package can not depend on other channels, can be fixed by installing pyteomics as pip package. Using a conda-forge recipe will be much less maintenance for us as they will maintain all the CI stuff. We will throw away a lot of work in this repo, but I think we should learn to let it go, let it go.

@bittremieux
Copy link
Author

I agree with @sverhoeven that option 2 would be the better one. The chemistry functionality is relatively minor and would be pretty small to extract into its own package. Also, then you'd have additional overhead by having to manage two packages instead of one.

Giving a relevant error message when a user tries to use functions that require RDKit should be quite clear for most developers.

Alternatively to conda-forge, you can also create a Bioconda package as it allows dependencies from the defaults, conda-forge, and bioconda channels. And if you have a noarch Python package, getting it on Bioconda is pretty trivial.

@florian-huber
Copy link
Collaborator

Thanks for the feedback so far!
How would you implement the optional dependency?
Add rdkit to extras_required in the setup.py and wrap the import with try:?

try:
    import rdkit
except ImportError:
    _has_rdkit = False
else:
    _has_rdkit = True

And then add raise an error when the actual function is called?

def add_fingerprint(....):
    if not _has_rdkit:
        raise ImportError("Conda package 'rdkit' is required for this functionality.")

@bittremieux
Copy link
Author

Yes indeed, that's pretty much what I'd recommend. You can specify RDKit as a dependency for matchms[chemistry] using extras_require. And then you can do the RDKit import as you suggest, or just import it inside of the function definition where you need it.

@florian-huber
Copy link
Collaborator

(slowly) started working on this in PR #148

@florian-huber
Copy link
Collaborator

florian-huber commented Oct 7, 2020

First version should be ready soon #148

@florian-huber
Copy link
Collaborator

Updates were merged (PR #148) and matchms 0.6.0 it is now on pypi!
(now spec2vec can follow soon)
Thanks @sverhoeven, @fdiblen and @bittremieux for comments, reviews etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package quality Make package easy to install, use, adapt.
Projects
None yet
Development

No branches or pull requests

4 participants