
Add automated approach to validate that requirement files are consistent #230

Closed
erikmannerfelt opened this issue Jul 9, 2021 · 4 comments · Fixed by #398
Labels: enhancement (feature improvement or request), test-suite

Comments

@erikmannerfelt
Contributor

Is your feature request related to a problem? Please describe.
As mentioned in #229, there are currently four places where dependencies are specified:

  • environment.yml
  • setup.py
  • dev-requirements.txt
  • meta.yaml

There is a large risk that someone forgets to update all of these when a dependency changes, so there should be an automated way to validate that they are consistent.

Describe the solution you'd like
A test or pre-commit hook to validate the content of all files.

Some pseudocode:

min_python_version = "3.7"
dependencies = ["numpy>=1.17", "scipy", ...]  # This could either be hardcoded in the test or read from a "reference" reqs file.
dev_dependencies = ["pytest", "sphinx"]

for req_file in req_files:
    # Standardize -/_ (e.g. sphinxcontrib_programoutput on PyPI and sphinxcontrib-programoutput on conda-forge)
    # Validate that all dependencies (and dev-dependencies if appropriate) are there
    # Validate the version constraints
    # Check that no additional dependencies exist that are not defined above.

This should fail with a descriptive error if something is inconsistent.
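
A more fleshed-out sketch of the above, assuming the reference lists are hardcoded and the packaging library is available (this only handles pip-style files such as dev-requirements.txt; environment.yml and meta.yaml would additionally need a YAML parser):

from pathlib import Path

from packaging.requirements import Requirement
from packaging.utils import canonicalize_name

# Reference lists (could also be read from a "reference" reqs file).
DEPENDENCIES = ["numpy>=1.17", "scipy"]
DEV_DEPENDENCIES = ["pytest", "sphinx"]


def as_reference(requirements):
    """Map canonicalized names (-/_ standardized) to parsed requirements."""
    parsed = (Requirement(r) for r in requirements)
    return {canonicalize_name(req.name): req for req in parsed}


def check_requirements(req_file, reference):
    """Return a list of inconsistencies between a requirements file and the reference."""
    errors = []
    seen = set()
    for line in Path(req_file).read_text().splitlines():
        line = line.split("#")[0].strip()  # Drop comments and blank lines (naive parsing).
        if not line:
            continue
        req = Requirement(line)
        name = canonicalize_name(req.name)
        seen.add(name)
        if name not in reference:
            errors.append(f"{req_file}: {name} is not a known dependency")
        elif str(req.specifier) != str(reference[name].specifier):
            errors.append(f"{req_file}: {name} has '{req.specifier}', expected '{reference[name].specifier}'")
    # Check that nothing from the reference is missing.
    for name in reference.keys() - seen:
        errors.append(f"{req_file}: {name} is missing")
    return errors


if __name__ == "__main__":
    errors = check_requirements("dev-requirements.txt", as_reference(DEPENDENCIES + DEV_DEPENDENCIES))
    if errors:
        raise SystemExit("\n".join(errors))  # Fail with a descriptive error.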

Describe alternatives you've considered
Perhaps some bot exists to do this for us?

For reference, Pandas has a pre-commit hook for this specifically.
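
Pandas's hook is specific to their setup, but wiring a script of our own into pre-commit as a local hook could look roughly like this (a sketch; the script path is hypothetical):

# .pre-commit-config.yaml (sketch)
repos:
  - repo: local
    hooks:
      - id: check-requirements-consistency
        name: Check that requirement files are consistent
        entry: python .github/check_requirements.py
        language: system
        files: ^(environment\.yml|dev-requirements\.txt|setup\.py)$
        pass_filenames: false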

@rhugonnet
Member

rhugonnet commented Aug 17, 2023

EDIT: just noticing now that @erikmannerfelt already noticed pandas's pre-commit hook 2 years ago! 🙈
Well at least the searching I did on setuptools is still useful and complementary!

Following the discussion started in GlacioHack/xdem#408 with @erikmannerfelt.

Problem = we currently have 4 environment files to synchronize:

  1. environment.yml (for conda),
  2. dev-environment.yml (for devs using conda),
  3. requirements.txt (for pip),
  4. setup.py (for pip).

... and we would add a 5th one for nix! 😱

We currently only check the consistency of environment.yml and dev-environment.yml during CI, since #297, using the custom bit of code here: https://github.com/GlacioHack/geoutils/blob/main/geoutils/misc.py#L153 and the script here: https://github.com/GlacioHack/geoutils/blob/main/.github/get_yml_env_nopy.py.
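
For context, the core of that kind of check fits in a few lines; this is a sketch assuming PyYAML, not the actual misc.py implementation, and it skips the nested pip: section:

import yaml  # PyYAML


def conda_deps(path):
    """Top-level conda dependencies of an environment file, as a set of strings."""
    with open(path) as f:
        env = yaml.safe_load(f)
    # Skip the nested {"pip": [...]} entry, keep plain "package=version" strings.
    return {dep for dep in env["dependencies"] if isinstance(dep, str)}


# dev-environment.yml should contain everything environment.yml does.
missing = conda_deps("environment.yml") - conda_deps("dev-environment.yml")
if missing:
    raise SystemExit(f"dev-environment.yml is missing: {', '.join(sorted(missing))}")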

I looked around a bit. Googling really doesn't work for these types of specific "GitHub-dev" problems, so I did a bit of spying instead 😁.

It looks like setuptools now works mostly with a pyproject.toml instead of the setup.py, and this is the more "future-proof" option. However, the requirements.txt is still needed to pin the exact versions (that info was hard to find! see the recent comment in pypa/packaging.python.org#685 (comment) and the documentation page here: https://pip.pypa.io/en/stable/reference/requirements-file-format/). So the pip-to-conda code in pandas is still relevant and will continue to be in the future!

So this should solve it all! 😀
In short, we only need to maintain a good environment.yml, and then:

  1. Using #297 (Check that environment.yml can import geoutils, and use diff with dev-environment.yml during CI), our CI will check that dev-environment.yml is consistent (could almost write it as a custom pre-commit hook, but maybe that's over-engineering?),
  2. Using pandas's approach, our CI will check that requirements.txt is consistent (here also, pre-commit hook or not?),
  3. Finally, we can remove install_requires from setup.py (and split the info there into setup.cfg + pyproject.toml files to be future-proof; see the sketch after this list).
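
For point 3, one future-proof variant is to declare the metadata in a PEP 621 pyproject.toml (a sketch; the dependencies listed are placeholders, not our actual metadata):

# pyproject.toml (sketch)
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "geoutils"
requires-python = ">=3.8"
# Abstract dependencies, formerly install_requires in setup.py:
dependencies = [
    "numpy",
    "rasterio",
]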

@rhugonnet
Member

Additionally, we have the meta.yaml in the -feedstock repositories' recipe/ folder. These are not updated fully automatically: the bot opens an update PR but asks for a manual check before publishing.
The bot currently works for xdem, but not for geoutils (it looks like it broke when we changed from GlacioHack/GeoUtils to GlacioHack/geoutils). I tried to change all the info and links hoping it would fix the issue (https://github.com/conda-forge/geoutils-feedstock/pull/12/files and following PRs), as it is supposed to detect either PyPI or GitHub releases, but without success.

In more detail, what triggers the PR is the regro-cf-autotick-bot; info in conda-forge's docs here: https://conda-forge.org/docs/maintainer/updating_pkgs.html#how-does-regro-cf-autotick-bot-create-automatic-version-updates
I couldn't find anything in the links to help me fix the issue, but maybe one of you could be more successful @erikmannerfelt @atedstone @adehecq 😅!

@erikmannerfelt
Contributor Author

Great inputs; thanks @rhugonnet!

I guess there are two (potentially non-exclusive) ways to make sure all locations point to the same dependencies:

  1. Validate each dependency location separately
  2. Generate some of the dependency files from a reference file.

@rhugonnet, could you elaborate on your statement that the install_requires part in setup.py is not required? I can't see where the requirements.txt is read, so I don't understand how else the dependency list for a pip install would come in.

Regarding the bot on conda-forge, I remember having spent an eternity trying to get it running again after it stopped. I think we need to give up on it and just implement our own thing.

One small consideration we need to discuss is how blocking such a tester would be. If it's running as part of CI, should it fail or just warn in case of inconsistent dependencies? The obvious best case is to fail, but it may create a catch-22 situation in the conda-forge feedstock when updating: a new version needs to pass (unless forced) in the main repo, but that can only pass if the conda-forge feedstock is updated, which in turn shouldn't be updated before the main repo.

Of course, it would work to just force-merge main on a version increment, then update the meta.yaml and finally rerun CI to give us the nice checkmark. But it would be best if we could figure this out without forcing a failure! Perhaps failures for dependencies in the repo and a warning for meta.yaml?
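
For what it's worth, GitHub Actions can express that split directly with continue-on-error: the repo-internal checks fail hard while the meta.yaml comparison only warns. A sketch (both script names are hypothetical):

# .github/workflows/check-deps.yml (sketch)
name: Check dependency consistency
on: [push, pull_request]

jobs:
  check-dependencies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Check repo requirement files (blocking)
        run: python .github/check_requirements.py
      - name: Compare against the feedstock meta.yaml (warning only)
        run: python .github/check_feedstock_meta.py
        continue-on-error: true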

@rhugonnet
Member

Reading a bit more: actually, install_requires is independent. It is used to declare "abstract" dependencies, while the ones in requirements.txt are "concrete" (pinned).

There is some quick reading out there explaining this distinction.

Despite the nuance, many people still feel it is duplicating effort and use tools to have everything in one requirements.txt file (like pbr: https://opendev.org/openstack/pbr).
Given how much we rely on conda-forge for geospatial dependencies (pip has a great chance of failing), and the fact that we only recommend a conda/mamba install in our docs, I don't think we need to be super cautious with install_requires... We could simply list the "parent" packages, and run a custom function that checks that the packages listed in install_requires exist in requirements.txt (without version check). Or we could use pbr.
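
Such a names-only check is only a few lines with the packaging library; a sketch (the install_requires list is a placeholder that would really be parsed from setup.py):

from packaging.requirements import Requirement
from packaging.utils import canonicalize_name


def names(requirements):
    """Canonicalized package names, ignoring any version specifiers."""
    return {canonicalize_name(Requirement(r).name) for r in requirements}


install_requires = ["numpy", "rasterio"]  # Placeholder; would be parsed from setup.py.
with open("requirements.txt") as f:
    concrete = [line.split("#")[0].strip() for line in f if line.split("#")[0].strip()]

missing = names(install_requires) - names(concrete)
if missing:
    raise SystemExit(f"install_requires entries missing from requirements.txt: {sorted(missing)}")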
What do you think?

To implement our own thing for meta.yaml while avoiding the catch-22 situation: could we add our own additional test which runs directly from the .github of the feedstock repo? That would solve it.
