
Add automated approach to validate that requirement files are consistent #230

Closed
erikmannerfelt opened this issue Jul 9, 2021 · 4 comments · Fixed by #398
Labels: enhancement (feature improvement or request), test-suite

Comments

@erikmannerfelt
Contributor

Is your feature request related to a problem? Please describe.
As mentioned in #229, there are currently four places where dependencies are specified:

  • environment.yml
  • setup.py
  • dev-requirements.txt
  • meta.yaml

There is a large risk that someone forgets to update all of these when a dependency changes, so there should be an automated way to validate that they are consistent.

Describe the solution you'd like
A test or pre-commit hook to validate the content of all files.

Some pseudocode:

min_python_version = "3.7"
dependencies = ["numpy>=1.17", "scipy", ...]  # This could either be hardcoded in the test or read from a "reference" reqs file.
dev_dependencies = ["pytest", "sphinx"]

for req_file in req_files:
    # Standardize -/_ (e.g. sphinxcontrib_programoutput on PyPI and sphinxcontrib-programoutput on conda-forge)
    # Validate that all dependencies (and dev-dependencies if appropriate) are there
    # Validate the version constraints
    # Check that no additional dependencies exist that are not defined above.

This should fail with a descriptive error if something is inconsistent.
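
A more fleshed-out sketch of the above, assuming the reference lists are hardcoded and the packaging library is available (this only handles pip-style files such as dev-requirements.txt; environment.yml and meta.yaml would additionally need a YAML parser):

from pathlib import Path

from packaging.requirements import Requirement
from packaging.utils import canonicalize_name

# Reference lists (could also be read from a "reference" reqs file).
DEPENDENCIES = ["numpy>=1.17", "scipy"]
DEV_DEPENDENCIES = ["pytest", "sphinx"]


def as_reference(requirements):
    """Map canonicalized names (-/_ standardized) to parsed requirements."""
    parsed = (Requirement(r) for r in requirements)
    return {canonicalize_name(req.name): req for req in parsed}


def check_requirements(req_file, reference):
    """Return a list of inconsistencies between a requirements file and the reference."""
    errors = []
    seen = set()
    for line in Path(req_file).read_text().splitlines():
        line = line.split("#")[0].strip()  # Drop comments and blank lines (naive parsing).
        if not line:
            continue
        req = Requirement(line)
        name = canonicalize_name(req.name)
        seen.add(name)
        if name not in reference:
            errors.append(f"{req_file}: {name} is not a known dependency")
        elif str(req.specifier) != str(reference[name].specifier):
            errors.append(f"{req_file}: {name} has '{req.specifier}', expected '{reference[name].specifier}'")
    # Check that nothing from the reference is missing.
    for name in reference.keys() - seen:
        errors.append(f"{req_file}: {name} is missing")
    return errors


if __name__ == "__main__":
    errors = check_requirements("dev-requirements.txt", as_reference(DEPENDENCIES + DEV_DEPENDENCIES))
    if errors:
        raise SystemExit("\n".join(errors))  # Fail with a descriptive error.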

Describe alternatives you've considered
Perhaps some bot exists to do this for us?

For reference, Pandas has a pre-commit hook for this specifically.
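
Pandas's hook is specific to their setup, but wiring a script of our own into pre-commit as a local hook could look roughly like this (a sketch; the script path is hypothetical):

# .pre-commit-config.yaml (sketch)
repos:
  - repo: local
    hooks:
      - id: check-requirements-consistency
        name: Check that requirement files are consistent
        entry: python .github/check_requirements.py
        language: system
        files: ^(environment\.yml|dev-requirements\.txt|setup\.py)$
        pass_filenames: false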

@rhugonnet
Member

rhugonnet commented Aug 17, 2023

EDIT: just noticing now that @erikmannerfelt already noticed pandas's pre-commit hook 2 years ago! 🙈
Well at least the searching I did on setuptools is still useful and complementary!

Following the discussion started in GlacioHack/xdem#408 with @erikmannerfelt.

Problem = we currently have 4 environment files to synchronize:

  1. environment.yml (for conda),
  2. dev-environment.yml (for devs using conda),
  3. requirements.txt (for pip),
  4. setup.py (for pip).

... and we would add a 5th one for nix! 😱

We currently only check the consistency of environment.yml and dev-environment.yml during CI, since #297, using the custom bit of code here: https://github.com/GlacioHack/geoutils/blob/main/geoutils/misc.py#L153 and the script here: https://github.com/GlacioHack/geoutils/blob/main/.github/get_yml_env_nopy.py.
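
For context, the core of that kind of check fits in a few lines; this is a sketch assuming PyYAML, not the actual misc.py implementation, and it skips the nested pip: section:

import yaml  # PyYAML


def conda_deps(path):
    """Top-level conda dependencies of an environment file, as a set of strings."""
    with open(path) as f:
        env = yaml.safe_load(f)
    # Skip the nested {"pip": [...]} entry, keep plain "package=version" strings.
    return {dep for dep in env["dependencies"] if isinstance(dep, str)}


# dev-environment.yml should contain everything environment.yml does.
missing = conda_deps("environment.yml") - conda_deps("dev-environment.yml")
if missing:
    raise SystemExit(f"dev-environment.yml is missing: {', '.join(sorted(missing))}")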

I looked around a bit. Googling really doesn't work for these types of specific "GitHub-dev" problems, so I did a bit of spying instead 😁.

It looks like setuptools now works mostly with a pyproject.toml instead of the setup.py, and this is the more "future-proof" option. However, the requirements.txt is still needed to pin the exact versions (that info was hard to find! see the recent comment in pypa/packaging.python.org#685 (comment) and the documentation page here: https://pip.pypa.io/en/stable/reference/requirements-file-format/). So the pip-to-conda code in pandas is still relevant and will continue to be in the future!

So this should solve it all! 😀
In short, we only need to maintain a good environment.yml, and then:

  1. Using #297 (Check that environment.yml can import geoutils, and use diff with dev-environment.yml during CI), our CI will check that dev-environment.yml is consistent (could almost write it as a custom pre-commit hook, but maybe that's over-engineering?),
  2. Using pandas's approach, our CI will check that requirements.txt is consistent (here also, pre-commit hook or not?),
  3. Finally, we can remove install_requires from setup.py (and split the info there into setup.cfg + pyproject.toml files to be future-proof; see the sketch after this list).
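
For point 3, one future-proof variant is to declare the metadata in a PEP 621 pyproject.toml (a sketch; the dependencies listed are placeholders, not our actual metadata):

# pyproject.toml (sketch)
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "geoutils"
requires-python = ">=3.8"
# Abstract dependencies, formerly install_requires in setup.py:
dependencies = [
    "numpy",
    "rasterio",
]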

@rhugonnet
Member

Additionally, we have the meta.yaml in the -feedstock repositories' recipe/ folder. These are not updated fully automatically: the bot opens an update PR but asks for a manual check before publishing.
The bot currently works for xdem, but not for geoutils (it looks like it broke when we changed from GlacioHack/GeoUtils to GlacioHack/geoutils). I tried to change all the info and links hoping it would fix the issue (https://github.com/conda-forge/geoutils-feedstock/pull/12/files and following PRs), as it is supposed to detect either PyPI or GitHub releases, but without success.

In more detail, what triggers the PR is the regro-cf-autotick-bot; info in conda-forge's docs here: https://conda-forge.org/docs/maintainer/updating_pkgs.html#how-does-regro-cf-autotick-bot-create-automatic-version-updates
I couldn't find anything in the links to help me fix the issue, but maybe one of you could be more successful @erikmannerfelt @atedstone @adehecq 😅!

@erikmannerfelt
Contributor Author

Great inputs; thanks @rhugonnet!

I guess there are two (potentially non-exclusive) ways to make sure all locations point to the same dependencies:

  1. Validate each dependency location separately
  2. Generate some of the dependency files from a reference file.

@rhugonnet, could you elaborate on your statement that the install_requires part in setup.py is not required? I can't see where the requirements.txt is read, so I don't understand how else the dependency list for a pip install would come in.

Regarding the bot on conda-forge, I remember having spent an eternity trying to get it running again after it stopped. I think we need to give up on it and just implement our own thing.

One small consideration we need to discuss is how blocking such a tester would be. If it's running as part of CI, should it fail or just warn in case of inconsistent dependencies? The obvious best case is to fail, but it may create a catch-22 situation in the conda-forge feedstock when updating: a new version needs to pass (unless forced) in the main repo, but that can only pass if the conda-forge feedstock is updated, which in turn shouldn't be updated before the main repo.

Of course, it would work to just force-merge main on a version increment, then update the meta.yaml and finally rerun CI to give us the nice checkmark. But it would be best if we could figure this out without forcing a failure! Perhaps failures for dependencies in the repo and a warning for meta.yaml?
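
For what it's worth, GitHub Actions can express that split directly with continue-on-error: the repo-internal checks fail hard while the meta.yaml comparison only warns. A sketch (both script names are hypothetical):

# .github/workflows/check-deps.yml (sketch)
name: Check dependency consistency
on: [push, pull_request]

jobs:
  check-dependencies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Check repo requirement files (blocking)
        run: python .github/check_requirements.py
      - name: Compare against the feedstock meta.yaml (warning only)
        run: python .github/check_feedstock_meta.py
        continue-on-error: true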

@rhugonnet
Member

Reading a bit more: actually, install_requires is independent. It is used to declare "abstract" dependencies, while the ones in requirements.txt are "concrete" (pinned).

There is some quick reading out there explaining this distinction.

Despite the nuance, many people still feel it is duplicating effort and use tools to have everything in one requirements.txt file (like pbr: https://opendev.org/openstack/pbr).
Given how much we rely on conda-forge for geospatial dependencies (pip has a great chance of failing), and the fact that we only recommend a conda/mamba install in our docs, I don't think we need to be super cautious with install_requires... We could simply list the "parent" packages, and run a custom function that checks that the packages listed in install_requires exist in requirements.txt (without version check). Or we could use pbr.
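
Such a names-only check is only a few lines with the packaging library; a sketch (the install_requires list is a placeholder that would really be parsed from setup.py):

from packaging.requirements import Requirement
from packaging.utils import canonicalize_name


def names(requirements):
    """Canonicalized package names, ignoring any version specifiers."""
    return {canonicalize_name(Requirement(r).name) for r in requirements}


install_requires = ["numpy", "rasterio"]  # Placeholder; would be parsed from setup.py.
with open("requirements.txt") as f:
    concrete = [line.split("#")[0].strip() for line in f if line.split("#")[0].strip()]

missing = names(install_requires) - names(concrete)
if missing:
    raise SystemExit(f"install_requires entries missing from requirements.txt: {sorted(missing)}")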
What do you think?

To implement our own thing for meta.yaml while avoiding the catch-22 situation: could we add our own additional test which runs directly from the .github of the feedstock repo? That would solve it.
