Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure __version__ is retrieved while all dependencies are available #253

Open
flying-sheep opened this issue Mar 1, 2019 · 21 comments
Open

Comments

@flying-sheep
Copy link
Contributor

flying-sheep commented Mar 1, 2019

load_module is deprecated. And exec_module has a distinct advantage:

Since we try to get __version__ from a module that possibly doesn’t have its dependencies installed, we could partially execute the module by ignoring exceptions (ImportErrors would be sufficient). If the module manages to already set its __version__ before an ImportError is raised, flit install will still work.

This allows to only add those dependencies to build-requires which are needed to get the __version__, and let flit do the rest.

import importlib.util

def get_docstring_and_version_via_import(target):
    """
    Return a tuple like (docstring, version) for the given module,
    extracted by importing the module and pulling __doc__ & __version__
    from it.
    """
    log.debug("Loading module %s", target.file)
    spec = importlib.util.spec_from_file_location(target.name, str(target.file))
    m = importlib.util.module_from_spec(spec)
    with _module_load_ctx():
        spec.loader.exec_module(m)
    docstring = m.__dict__.get('__doc__', None)
    version = m.__dict__.get('__version__', None)
    return docstring, version

@contextmanager
def _module_load_ctx():
    """Preserve some global state that modules might change at import time.
    - Handlers on the root logger.
    - Ignores ImportErrors
    """
    logging_handlers = logging.root.handlers[:]
    try:
        yield
    except ImportError:
        pass
    finally:
        logging.root.handlers = logging_handlers
@takluyver
Copy link
Member

Fine on moving away from the deprecated method. I'm less keen on the idea of ignoring import errors: I'm trying to think about how getting the version number could be made properly robust, not just a bit less likely to fail.

@flying-sheep
Copy link
Contributor Author

I think that would simply mean that we don’t try to get the version until we’ve installed all dependencies.

@takluyver
Copy link
Member

Either that, or limiting how packages can specify their version number so we can always get it without loading the whole package.

@flying-sheep
Copy link
Contributor Author

As long as get_version keeps working 😄

@merwok
Copy link
Contributor

merwok commented Nov 3, 2019

I think I have seen AST browsing used to find an assignment to __version__.

@flying-sheep
Copy link
Contributor Author

Sure, but that won’t work if there’s more complex code here. I want to retrieve the version from the latest git tag using setuptools_scm or get_version, and those need to call a function that gets passed __file__ or so.

@flying-sheep flying-sheep changed the title use loader.exec_module() Ensure `__version__ is retrieved while all dependencies are available Nov 4, 2019
@flying-sheep flying-sheep changed the title Ensure `__version__ is retrieved while all dependencies are available Ensure __version__ is retrieved while all dependencies are available Nov 4, 2019
@takluyver
Copy link
Member

I've mentioned before that I'd consider breaking setuptools_scm type approaches. I haven't gone down that route yet, but I still might at some point. I know a lot of people like that kind of feature, but flit has always been an opinionated tool based around the way I like to do packaging, and that isn't part of it.

@flying-sheep
Copy link
Contributor Author

flying-sheep commented Nov 5, 2019

Please give me a heads-up if you do that so I can move all of my packages to Poetry.

@merwok
Copy link
Contributor

merwok commented Nov 5, 2019

I want to retrieve the version from the latest git tag using setuptools_scm or get_version

Ah! that’s our difference, I’m in the camp of people who like having the canonical version in source code and have packaging metadata and VCS tags derived from it.

@flying-sheep
Copy link
Contributor Author

flying-sheep commented Nov 5, 2019

I don’t want to start a discussion here, so let’s start a new issue to discuss this if you want to reply to this comment:

My main reasons to not do that is because of 1. friction and 2. human fallibility: Your approach means to always perfectly execute:

sed -Ei 's/"1\.3\.dev\d+"/"1\.4"' package/__init__.py
git add package/__init__.py
git commit -m 'v1.4'
sed -i 's/"1\.4"/"1\.4\.dev0"' package/__init__.py
git add package/__init__.py
git commit -m 'dev version'
git push
git push --tags

That’s a lot of work and a lot can go wrong compared to my preferred

git tag v1.4
git push --tags

The minor reasons: 3. It pollutes your commit history with pairs of meaningless commits that 4. contain information that’s duplicated in your git tags (no one true source of information). 5. Theoretically you’d also need to bump dev$x to dev${x+1} in every single other commit to correctly convey that the version changed, while I get this for free.

I know there’s tools who do the above dance for you, but 6. I can never remember how their CLI works, and if you need to specify the new version or how to bump the version and if it pushes for you or not or …

@takluyver
Copy link
Member

I have a feeling we've discussed this before, but briefly: inspecting a package's version number is something I absolutely don't want to do magic, to involve shelling out to git or trying to inspect the environment it's installed in. I want it to be stamped into the code of the package itself. Plus doing any clever things to get the version number usually means doing them at import time, which I also dislike

I've also found that none of the available tools to automate changing version numbers quite fit my brain, so for now I live with updating version numbers by hand as part of a release. No, it's not ideal, but on a scale of things that concern me, it's not going to make the top 1000. I'm experimenting with making my own releaser tool that would fit my way of working ("modeltee"), but it's incomplete and I wouldn't recommend anyone try to use it yet.

(To be clear, I'm still not saying I will make break your preferred way of doing this. If I see a way the different patterns can peacefully coexist, great. But if they conflict, I'll naturally prioritise the way I like to do things.)

@merwok
Copy link
Contributor

merwok commented Nov 5, 2019

(Would be curious to read a few notes about your issues with bumpversion and others)

@flying-sheep
Copy link
Contributor Author

OK! Getting back on topic: Is there anything that prevents us from delaying the __version__ retrieval until the deps are installed?

@goodmami
Copy link

Just got bit by this and found my way here. The fix was simple enough: just flit install before flit publish (see goodmami/wn@d09031f)

The linked issue flying-sheep/get_version#8 helped me see that there's another solution. I could move my __version__ assignment to my main package's __init__.py file, as Flit first attempts to parse the file and inspect the AST without executing the file. Currently I import it from a _meta.py file, but from Flit's perspective this is the same as if I had used setuptools_scm or similar: it requires the code to be imported to get the value of the __version__ attribute.

If no proper fix is proposed, and if the current situation is the intended behaviour, I think it should be documented. E.g., something like this:

  1. Make sure that foobar’s docstring starts with a one-line summary of what the module is, and that it has a __version__:

    """An amazing sample package!"""
    
    __version__ = '0.1'

    If __version__ is a simple assignment of a str value, then you're good to go. If it is some other expression (e.g., using setuptools_scm) or not an assignment (e.g., imported from another module), then Flit will need to import the module to get the value of the attribute. Note that importing the module means that dependencies should be satisfied before invoking a command like flit publish, and it also incurs all the side-effects of executing the code necessary to import the module.

That long disclaimer takes some of the fun out of it, though. Even worse, I think (read: have not tested) that AST parsing would do the wrong thing in cases like this, as it would detect the string assignment and ignore the concatenation afterward:

__version__ = '1.0.0'
__version__ += '-rc1'

I'd like to pretend we live in a world where people don't do things like this, but 2020 has all but destroyed my optimism that people will act rationally.

@flying-sheep
Copy link
Contributor Author

A hack I started to use in my projects is to only import things in the root package if we aren’t being imported by flit:

from get_version import get_version


__author__ = "Philipp Angerer"
__version__ = get_version(__file__)


def within_flit():
    import traceback
    for frame in traceback.extract_stack():
        if frame.name == "get_docstring_and_version_via_import":
            return True
    return False


if not within_flit():
    from .submodule import thing
    ...

@goodmami
Copy link

@flying-sheep this hack only reaffirms my statement above about people acting rationally 😉

Here I agree with something @takluyver said in #257:

[...] the sort of solutions people have proposed so far all involve extra complexity inside every package to look up its version number when it gets imported. That's what I object to. Any extra complexity needed should go in developer tooling, not in the packages' runtime code.

Another option, perhaps, is for Flit to have a setting (e.g., version-location) that specifies the module containing __version__, defaulting to the value of module so as to not break backward compatibility. This file can then be loaded separately (similar to option (3) of https://packaging.python.org/guides/single-sourcing-package-version/). Thus it would be able to execute the Python code without necessarily importing the entire package and its dependencies. In addition, with this method the package would not need to be aware of the developer tooling; it would just put something like from mymodule.version import __version__ in its top __init__.py so from the user's point of view mymodule.__version__ works as expected.

@flying-sheep
Copy link
Contributor Author

flying-sheep commented Dec 1, 2020

Once there’s a pure way to do this, I’ll immediately switch to it.
If you instead give me an alternative that doesn’t rely on a hack and allows for the following, I’ll also immediately switch to it:

  • straightforward package metadata all in pyproject.toml (setuptools fails due to MANIFEST.in and other arcane legacy)
  • editable installs (pip install -e for setuptools, flit install -s, poetry fails)
  • a way to have exactly one source of truth for package versions (flit fails, unless my hack is used)

If my research is exhaustive enough to prove no tool does all 3, what I do is therefore the most rational option.

PS: Even if poetry gains editable installs, the poetry plugin for VCS-sourced versions exists, but is hacky itself, as it relies on monkeypatching.

@goodmami
Copy link

goodmami commented Dec 2, 2020

@flying-sheep no offense intended; I guess the winking-face emoji wasn't enough to overcome the inherent coldness of text. To be more serious, your hack is indeed clever but I'm afraid it might be brittle as I don't think there's any guarantee that get_docstring_and_version_via_import will always be available in future Flit versions.

Regarding your criteria, if we assume that Flit satisfies the first two, then all that's left is a single source for the package version. I think this is already accomplished in simple cases where the published package metadata takes the version from (package).__version__. It sounds like you want to get that version from Git instead of from the package itself. I won't get into the arguments for or against that, but I'm still not seeing how Flit fails to get the version from this single source, it just requires the init code to be executed so the version can be retrieved from Git in Python. Your hack is just trying to avoid importing a bunch of stuff irrelevant for getting the version. Or is there something else I'm missing? If not, I think my proposal above would have the same effect.

@flying-sheep
Copy link
Contributor Author

flying-sheep commented Dec 3, 2020

Oh, it’s future-proof: if the function gets renamed or removed, the check will return False and the rest of the module will be executed.

I fully expect @takluyver to be responsible and only remove or rename get_docstring_and_version_via_import in a commit that comes after the commit that contains fixes gh-253 in its description.

It sounds like you want to get that version from Git instead of from the package itself.

Yes: SCM version tags are necessary to be able to link commits to released versions, therefore they are the logical single source of truth.

is there something else I'm missing?

Yes: If your package’s __init__.py transitively imports a 3rd party package that might not be installed, any module next to or below __init__.py will not be importable without that 3rd party package installed, e.g. the following will still fail if third_party is not installed:

foo/__init__.py

from .version import __version__
from .core import business_logic

foo/version.py

from setuptools_scm import get_version
__version__ = get_version(root='..', relative_to=__file__)

foo/core.py

from third_party import ...

def business_logic(...): ...

So what options exist?

  1. only get version after dependencies are available (this issue)
  2. allow get_docstring_and_version_via_import to ignore an ImportError (just try to get __version__ from the partially imported module after one if possible) (Allow getting version despite deps missing #382)
  3. allow a sanctioned alternative to the hack (Allow modules to stub themselves out #374)
  4. allow specifying some entry point to get the version from (RFE: functionality to set __version__ from scm automatically #257, Allow specifying a function that gets the version #271)

@goodmami
Copy link

goodmami commented Dec 3, 2020

any module next to or below __init__.py will not be importable without that 3rd party package installed

There are ways of importing the module without going through the top-level package, such as the exec() method described in Python's packaging guidelines.

@flying-sheep
Copy link
Contributor Author

I think #382 is the more elegant way to do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants