Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import-style prevents vendoring #1165

Closed
jku opened this issue Oct 6, 2020 · 19 comments
Closed

import-style prevents vendoring #1165

jku opened this issue Oct 6, 2020 · 19 comments
Labels
decision record Outcome of this discussion should be tracked in a decision record discussion Discussions related to the design, implementation and operation of the project

Comments

@jku
Copy link
Member

jku commented Oct 6, 2020

I just tried to vendor TUF into pip (in other words embed the tuf sources inside pip sources) and it fails (even with #1160 fixed) because of the import style used in TUF source code itself:

import tuf.download

Using this (instead of relative imports) means tuf needs to be installed to python path for the imports to work.

I've looked at the dozen other projects pip vendors and the system seems to cope with two styles:

  1. Most projects use relative imports like
from . import treebuilders
from .treebuilders.base import Marker
  1. The other method is
from toml.decoder import InlineTableDict

The second style works because vendoring (the tool) modifies import NAME to from vendored.path import NAME and from NAME.module import something to from vendored.path.NAME.module import something. I assume this is mostly done so vendored dependencies of vendored projects work (so e.g. vendored tuf can "import securesystemslib"). In any case the TUF style cannot be automatically modified as far as I can tell.

I'll try to add reproduction instructions that don't require my pip branch...

@jku
Copy link
Member Author

jku commented Oct 6, 2020

Example you can run or have a look at:
git clone https://gist.github.com/22d5f08083c8b7b2f3e552f7ca6c301e.git

@jku
Copy link
Member Author

jku commented Oct 6, 2020

The issue is any "submodule import" really, not just internal submodule imports: import securesystemslib.util will be an issue as well

@joshuagl
Copy link
Member

joshuagl commented Oct 6, 2020

This reads like in order to support vendoring TUF (and securesystemslib) we need to change the way we do imports to either

  1. explicit relative imports (from . import downloads)
  2. importing functions (and other attributes) of a module into the current modules namespace (from tuf.download import safe_download)

All things being equal, I prefer absolute imports – I believe they make the code easier to read and follow. Indeed, I recently rejected a securesystemslib PR which added explicit relative imports (secure-systems-lab/securesystemslib#254), but now I'm wondering if that's the direction we should take?

I certainly prefer explicit relative imports to importing functions from a dependent module into our own namespace.

@joshuagl joshuagl added decision record Outcome of this discussion should be tracked in a decision record discussion Discussions related to the design, implementation and operation of the project labels Oct 6, 2020
@jku
Copy link
Member Author

jku commented Oct 6, 2020

If the important bit is the complete code namespacing, I suppose that would still be possible if package initialization imported the submodules? As in you could do

import tuf
tuf.download.safe_download(...)

if __init__.py actually imported submodules (or a reasonable subset of them) . I don't know much about python library design so maybe there's a reason that isn't being done?

@joshuagl
Copy link
Member

joshuagl commented Oct 7, 2020

Two reasons not to do that today would be clients importing all of the repository logic and everyone who imported tuf would have log handlers modified #1148

If we had a clearer distinction between the client and repository logic, it might make more sense to have the package initialisation import submodules.

@lukpueh
Copy link
Member

lukpueh commented Oct 7, 2020

If we have to choose between relative imports and importing names (functions, globals, etc) directly, I too vote for the former. At least this requirement will make it easy to choose an import style convention. :)

@jku, have you checked if we'd need to patch any of TUF's dependencies or transitive dependencies, except for securesystemslib?

@jku
Copy link
Member Author

jku commented Oct 7, 2020

All of TUF dependencies are already pip dependencies (so succesfully vendored) except SSL and iso8601 (which is a single file library).

If we have to choose between relative imports and importing names (functions, globals, etc) directly, I too vote for the former. At least this requirement will make it easy to choose an import style convention. :)

I'm researching more clever alternatives as both of these options mean a massive PR: ~700 actual code lines would need to change and the possibility of a namespace mistake is real. I'll try to have a possible solution for discussion by tomorrow.

@jku jku mentioned this issue Oct 7, 2020
3 tasks
@jku
Copy link
Member Author

jku commented Oct 7, 2020

There's a possible fix in a PR -- it might look like a large change but 70% of it is test changes that are completely optional: I just did those to make them the same style as the actual source.

I'm also speaking to the vendoring maintainer to make sure this is not a misunderstanding (I don't think it is -- it's just impossible for vendoring to handle the current imports)

@jku
Copy link
Member Author

jku commented Oct 7, 2020

other points that should maybe be bug reports?

  • repository_tool, developer_tool and repository_lib could live somewhere where they would not be confused for library parts (assuming they are not parts of the library like I think): tuf/scripts/ maybe or a directory of their own?
  • unittest_toolbox should just be moved to tests/
  • download and mirrors should probably be moved to tuf/client/

@jku jku changed the title import-style prevents vendoring and running uninstalled import-style prevents vendoring Oct 7, 2020
@jku
Copy link
Member Author

jku commented Oct 9, 2020

I've been looking at and testing the Draft PR and it works: the patch is reasonable size (most changes are in tests and completely optional), all current methods of importing keep working, and vendoring now works (apart from the securesystemslib imports that would also have to be fixed) ... but I'm not entirely happy:

  • the circular import (keydb.py: import tuf, __init__.py: from . import keydb) works but it just looks wrong. There were also issues with previous versions of this in python2: I can't promise the other interpreters will be just fine with this
  • I think defining package default imports(__init__.py) is fine but I'm not as convinced the contents should be defined by the purpose of avoiding code changes within the project source code... This could be avoided by renaming the common import file and doing something like import _internal as tuf

@jku
Copy link
Member Author

jku commented Oct 9, 2020

Documenting the core issue and main options for those who are not familiar with it:

Core issue
I'd like to vendor tuf into pip in the near future. Vendoring using the vendoring tool requires changes to the way tuf & securesystemslib do imports.

Options I see available

  • patch tuf and securesystemslib in pip: Vendoring does support patching so this is a potential stopgap solution for testing (I could use a trimmed down version of this patch). This is not a viable long term solution
  • avoid major code changes by using a common imports file like in my draft PR to (see previous comment) or by other ugly trickery like this:
class Tuf:
  def __init__(self):
    from tuf import exceptions
    self.exceptions = exceptions
tuf=Tuf() # voila, tuf.exceptions now works as expected. Also ewww

@jku
Copy link
Member Author

jku commented Oct 12, 2020

Related findings:

  • pylint does not understand the tuf import style at least in the unused-imports check: as a result we have some unused internal imports in our code.
  • we are not very consistent in imports: lots of files use e.g. securesystemslib modules without importing them (and trust that someone else is importing them)

@joshuagl
Copy link
Member

We've talked elsewhere about adopting, with minimal modifications, the Google Python coding style.

That style guide recommends either:

import foo.bar
badger = foo.bar.BADGER

or

from foo import bar
badger = bar.BADGER

and explicitly recommends against using relative imports.

I think the second pattern above (from foo import bar) is compatible with vendoring but has the downside of being a fairly significant code change. Am I following correctly?

@jku
Copy link
Member Author

jku commented Oct 15, 2020

Yes, that seems correct to me. Bringing the example to TUF context:

Current style

import tuf.download

tuf.download.safe_download(...)

would become

from tuf import download

download.safe_download(...)

That is a big change and cannot be completely automated.

I don't think we have name clashes where a tuf module could be mistaken for another imported package or module... But the change is likely to lead to some local name clashes (in the sense that download might be a local variable already): We might be able to automate parts of the refactor... maybe something like this:

  • find potential name clashes with some clever grep (as in, this hopefully shows the uses of "formats" that should be reviewed for manual fixing: git grep -P '(?<!tuf\.)(?<!securesystemslib\.)formats' -- '*.py')
  • manually fix the name clashes first (+ test)
  • do the import switcheroo with another script
  • manual review step to find any unusual cases (+test)
  • grep + manual fixes for documentation and examples (this is in some ways optional: the current style of importing works, we just wouldn't be using it inside the code base anymore)

@lukpueh
Copy link
Member

lukpueh commented Oct 15, 2020

Thanks for all the digging, @jku! Based on the from toml.decoder import InlineTableDict example in the PR description, I was under the impression that the symbol after import could not be a module in order to work with vendoring.

If that is not the case, and constructs like from tuf.client import updater are indeed compatible with vendoring, then I lean towards "biting the bullet" and refactoring the imports, so that we stay Pythonic/idiomatic and also align with the Google style guide we want to adopt.

I would only go with any of your proposed trickery, if it makes the refactor substantially easier and we make it very clear that it is only a temporary hack that won't survive the upcoming refactor.

@lukpueh
Copy link
Member

lukpueh commented Oct 15, 2020

Regarding the other pain points you mentioned ...

repository_tool, developer_tool and repository_lib could live somewhere where they would not be confused for library parts (assuming they are not parts of the library like I think): tuf/scripts/ maybe or a directory of their own?

Agreed but probably not worth to change before the planned refactor? Definitely something to consider together with #1136 and #881. Maybe this warrants a separate over-arching module architecture design issue? Also #1134 (comment) seems related.

unittest_toolbox should just be moved to tests/

Agreed but IMO low-prio.unittest_toolbox should probably also be subjected to a revision.

download and mirrors should probably be moved to tuf/client/

Agreed but see both comments above and #1135.

pylint does not understand the tuf import style at least in the unused-imports check: as a result we have some unused internal imports in our code.

I know. :/ I've been meaning to at least submit an issue for this for a long time (I will do this right now). FWIW, I've fixed this a long time ago in in-toto and have created a ticket in securesystemslib, which both adopted their linter config from tuf (see secure-systems-lab/securesystemslib#243).

we are not very consistent in imports: lots of files use e.g. securesystemslib modules without importing them (and trust that someone else is importing them)

I know. :/ I think a sensible linter config would catch this (see above).

@jku
Copy link
Member Author

jku commented Oct 15, 2020

Thanks for all the digging, @jku! Based on the from toml.decoder import InlineTableDict example in the PR description, I was under the impression that the symbol after import could not be a module in order to work with vendoring.

right I can see how that was ambiguous. vendoring can cope with any form of from X import Y I believe but the specific limitation is that the local imported name cannot have a dot in it (as from X import Y.Z is not valid). So the local name can refer to anything (module, class, ...) as long as it's just the name "without path".

@lukpueh
Copy link
Member

lukpueh commented Oct 15, 2020

I've been meaning to at least submit an issue for this [pylint configuration] for a long time

--> #1178

@jku jku added client Related to the client (updater) implementation and removed client Related to the client (updater) implementation labels Nov 27, 2020
@jku
Copy link
Member Author

jku commented Apr 15, 2021

well this one's done already

@jku jku closed this as completed Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
decision record Outcome of this discussion should be tracked in a decision record discussion Discussions related to the design, implementation and operation of the project
Projects
None yet
Development

No branches or pull requests

3 participants