Support importing across repos with bzlmod #2088
Comments
Any thoughts on how this should behave if the imported code brings transitive dependencies of its own? Or is the expectation limited to cross-repository dependencies that do not take additional dependencies beyond the leaf repository?
For the simpler cases, one option that would work for you today with bzlmod and the third-party dependency rules in rules_python would be to use a direct url reference, e.g. … That will behave as a simple … For situations that involve native dependencies or complex build backends, it becomes a lot more difficult to build from source through that mechanism. Other simple options to consider are adding a GitHub Action that packages the …
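For illustration, a direct URL reference in a requirements file might look like the following (the package name and URL are hypothetical, not from the original comment):

```
# requirements.txt -- PEP 508 direct URL reference to a source archive
leaf-lib @ https://github.com/example-org/leaf-lib/archive/refs/tags/v1.0.0.tar.gz
```

pip (and tooling built on it) then builds that archive from source at install time, which is why this works best for pure-Python packages.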
I believe that in that scenario, … That being said, sometimes you don't want that, which is where isolated extension usages come in (unfortunately I couldn't find good documentation on them, but see bazelbuild/bazel#20186).
The MVS algorithm from bzlmod applies to the versions of the bazel extensions in BCR themselves, not to any language dependencies of the code inside the root or leaf repositories.
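As a hedged illustration of this point (module versions are hypothetical): if the root and a leaf module both declare a `bazel_dep` on the same Bazel module at different versions, MVS picks the highest requested version of that module, and says nothing about the Python packages either repo uses.

```starlark
# Root MODULE.bazel (hypothetical versions)
bazel_dep(name = "rules_python", version = "0.31.0")

# Leaf MODULE.bazel
bazel_dep(name = "rules_python", version = "0.29.0")

# MVS resolves rules_python to 0.31.0 for the whole build; the Python-level
# dependencies declared by each repo are not part of this resolution.
```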
I think this scenario is more or less solved with bzlmod, because in these scenarios there wouldn't be python code from the root repository importing python code from a leaf repository, right? The extension would provide some sort of toolchain or tool that can be used without leaking at all into the root repository. Agree? Can you clarify a bit about your use-case? Is your use-case an inter-bazel-extension dependency scenario, where you want one bazel extension to be able to import python code from another bazel extension? Or is your use-case an arbitrary source code sharing scenario outside bazel extensions?
That's correct, but if you have multiple module extension usages, each requesting a different set of versions of dependencies, it's up to the module extension to decide what to do. You can give each precisely the version they requested, or attempt to merge them using MVS. Generally we recommend that module extensions perform MVS because that usually works, and if it doesn't, isolated extension usages allow you to ensure that each gets precisely what they asked for.
Generally what happens is: all usages of the extension across modules are merged, the extension sees every requested version, and it resolves them to a single set (typically via MVS).

With isolated extensions, what happens is: each isolated usage is evaluated separately, so each module gets exactly the repositories and versions it asked for.
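A minimal MODULE.bazel sketch of an isolated usage, assuming a hypothetical `deps` extension (`isolate` is an experimental Bazel parameter, gated behind `--experimental_isolated_extension_usages`):

```starlark
# MODULE.bazel -- isolate = True gives this usage its own evaluation of the
# extension, so it receives exactly the versions it asked for.
deps = use_extension("//extensions:deps.bzl", "deps", isolate = True)
deps.package(name = "foo", version = "1.2.3")
use_repo(deps, "foo")
```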
Keep in mind that when using bzlmod, every* external repo comes from a module extension. We create the repository rule for chromite, then we have to use the repo rule in a module extension, and finally you make it visible to the main repo in MODULE.bazel. So although our chromite is just a simple repository rule, the path is still repo-mapped as …

What we want is for the main repo to be able to import python files from other repos (but if you can get it working from the main repo, it's already repo-mapping aware, and you can get it working from any repo pretty trivially). We have a multirepo (not ideal for bazel, but it's a historical decision). As an example, we have …

*Technically, not true. You can invoke repo rules directly in MODULE.bazel files, though it's much more limited in what you can do with it.
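A hedged sketch of the path just described; the file layout and the `chromite_repo` repository rule are assumptions, but the extension-wrapping shape is the standard bzlmod pattern:

```starlark
# extensions/chromite.bzl -- wrap the repository rule in a module extension
load("//repo_rules:chromite.bzl", "chromite_repo")

def _chromite_ext_impl(module_ctx):
    chromite_repo(name = "chromite")

chromite_ext = module_extension(implementation = _chromite_ext_impl)
```

```starlark
# MODULE.bazel -- finally, make the repo visible to the main repo
chromite_ext = use_extension("//extensions:chromite.bzl", "chromite_ext")
use_repo(chromite_ext, "chromite")
```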
The third-party dependencies in rules_python are repositories; they're not module extensions. So …
I think we might be getting confused because we're using different terminology. My original thought was that the solution would be to not special-case pip, but instead have the pip module extension generate a repository …
Thanks for the details. It really seems like the real issue is this: …
Is this supported in Java? I would think you could possibly pull things in the way you wish if you did something like this in MODULE.bazel:
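A sketch of what such a MODULE.bazel setup might look like, assuming the leaf repo is itself a Bazel module (module name, URL, and version are all hypothetical):

```starlark
# Root repo MODULE.bazel -- pull the leaf module from a source archive
bazel_dep(name = "leaf_repo", version = "0.0.0")
archive_override(
    module_name = "leaf_repo",
    urls = ["https://github.com/example-org/leaf_repo/archive/refs/heads/main.tar.gz"],
    strip_prefix = "leaf_repo-main",
)
```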
However, I'm not sure that's much better than having your leaf repos publish source packages, or pulling in the source packages through the pypi rules as direct url references.
The implementation details of how we are currently importing the third-party python language ecosystem to be convenient for bazel is quite a different problem from merging multiple independent source trees through bazel.

Our pip module extension aims to take external packages from the Python ecosystem (sdist and bdist, typically on PyPI, but they can be in private artifact storage too) and make these packages importable by python code in the root repository. These bazel foreign dependencies don't have prefixes on their python imports, so users can write python code as they would outside bazel: `import numpy` is the same inside and outside bazel for numpy users.

Your ask is more around merging non-packaged python code (i.e. no sdist or wheel anywhere) and having special python imports for these. That's quite a different problem space from the pip problem space. It's a bazel source deps / vendoring problem, not a pip/pypi python packaging problem. It's possible to turn the source deps / vendoring problem into a packaging problem if you can expose your source as a package. That's what we did by publishing "runfiles" as a wheel to PyPI. But otherwise these problems are different.
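A hedged sketch of the "expose your source as a package" route, using rules_python's `py_wheel` rule (target and distribution names are hypothetical):

```starlark
# BUILD.bazel in the leaf repo -- package an existing py_library as a wheel
load("@rules_python//python:packaging.bzl", "py_wheel")

py_wheel(
    name = "leaf_lib_wheel",
    distribution = "leaf-lib",
    version = "0.1.0",
    deps = ["//leaf_lib"],  # hypothetical py_library target
)
```

The resulting wheel can then be published and consumed by other repos through their ordinary requirements files.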
I'll try to add my two cents about how I understand the ask. I think we are conflating quite a few problems here.
To summarize: for now bzlmod does not really support the mixing of dependency trees, because in rules_python we don't have a good idea of how to combine the deps of one bazel_dep with those of another bazel_dep. Combining them requires taking both of the requirements files as input and creating a new lock file. With the previous tools that would have been really difficult; with uv it is becoming more tangible, but I am not sure if it is possible. The only way to share code between two python monorepos is to ship code as wheels and consume it via requirements.

Is this something that we should block on before we release 1.0? I am not sure. Is it possible to solve in rules_python? Possibly, but at minimum it would require users to use imports that do not rely on repository names, and we would need to understand what interface we should expose for locking requirements. The only option that I can think of today is: …
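To make the "separate dependency trees" point concrete, a hedged sketch (hub and file names are hypothetical) of how each module today declares its own pip hub from its own lock file:

```starlark
# Root MODULE.bazel
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "root_pypi",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "root_pypi")

# A leaf MODULE.bazel does the same with its own lock file, producing a
# second, independent hub -- nothing merges the two requirement sets.
```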
The main issue I see is that we cannot solve this without modifying the bzlmod workflow. I'll add a 1.0.0 milestone for this reason.
That's correct, yes
I don't believe so. I haven't actually tried it, but since two different MODULE.bazel files could add that exact same line of code, it would have to be namespaced as …
Yeah, I was trying to make them the same problem by going the other way round: turning the packaging problem into a bazel problem. It was mostly done out of curiosity; I don't know of any particular advantage to it other than consistency. I don't think exposing the source as a package is a good long-term solution. Any solution other than just adding the …
SGTM, though like you said, this should probably be in a different bug
That sounds like something that may be taking the bazel core developers out of context. While I haven't heard the statement, I'm guessing that the intention of their statement was that the full repository name is unstable, but that the repo-mapped name is stable (i.e. …
I think that @groodt is right, and that we should leave them as two separate problems here. This is not about requirements or dependencies, but about regular …
I don't think it should be a blocker.
It's convenient for you because you seem to be primarily within a bazel ecosystem, but with a slightly awkward setup where you have a polyrepo of unpackaged python source dependencies. However, we need to recognize that the overwhelming majority of the users of rules_python, and of python in general, are not in that position.

The PyPA standards specify the mechanisms to distribute python packages across different projects and repositories. They can be found in the packaging and distribution standards, and they describe sdist (source distributions) and bdist (binary distributions, aka wheels). Going "with the grain" and publishing packages that align with the standards wouldn't be a hack, I think. It's actually what the vast majority of python developers work with on a daily basis. It would be more consistent with the expectations of the average python developer and wouldn't be rewriting or introducing something at the python import level. Introducing that would be inconsistent with my expectations as a python developer, in a similar way that a Java engineer would turn up their nose at …

There are no language standards for monorepos or source dependencies (that aren't sdist) like there might be for e.g. golang. Golang naturally has the benefit that the language was designed with bazel in mind and the language import structure itself maps to source repository structures.
Agreed that the repo-mapped name is stable. I am struggling to find the Slack/GH thread where I saw this discussion, so since the context is missing, it might be good to leave it there. :)

Just a side note: the reason why I think playing with the … If we look at how a sample_project would be laid out, you will have the following code structure:
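Assuming the standard packaging "src layout" is what is meant here (the exact tree is my illustration), the structure would look roughly like:

```
sample_project/
├── pyproject.toml
├── src/
│   └── sample_project/
│       ├── __init__.py
│       └── module.py
└── tests/
    └── test_module.py
```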
This means that if … Are there drawbacks in asking/forcing monorepo users to adopt the sample project structure outlined above? In the projects I worked on it was not a problem, and I am curious to see examples where this is infeasible. I understand that change is undesirable here, but I would like to better understand the impact of doing nothing.
The project structure described above doesn't inherently solve the problem. We'd still need to work out how to map … I don't think the project structure you describe is a bad structure. My main concern here is to minimize "magic". The policy we have at Google is: …
It feels to me like a repo …
The way it works is that: …
This only works because it's trivial to compute the package for a given import (we look in one specific directory containing all our pypi packages, and otherwise you just use the full import).
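A hedged Python sketch of that mapping rule (the directory name and lookup are hypothetical, not the actual ChromeOS implementation):

```python
import os

# Hypothetical: single directory containing all pypi packages.
PYPI_DIR = "third_party/pypi"

def package_for_import(import_name: str) -> str:
    """Map a Python import to the Bazel package that provides it."""
    top_level = import_name.split(".")[0]
    # Imports of pypi packages resolve to the one pypi directory...
    if os.path.isdir(os.path.join(PYPI_DIR, top_level)):
        return "//{}/{}".format(PYPI_DIR, top_level)
    # ...and for everything else the full dotted import doubles as the path.
    return "//" + import_name.replace(".", "/")

# e.g. package_for_import("numpy.linalg") -> "//third_party/pypi/numpy"
#      package_for_import("foo.bar.baz")  -> "//foo/bar/baz"
```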
Thanks for the context, this is useful. I am still not 100% convinced that …

Coming back on topic to repo mapping and imports, let's say we have dependencies … However, you may still want to depend on … I guess I see the problem with my suggestion of "the import just needs to not include the repo name" and with repo mapping in general, so having a repo-mapping-aware solution for constructing the import path would be better. Let me know if I am seeing a different problem from what you are seeing with diamond dependencies.

IMHO the scope of solving this properly is reasonably large, and given the complexity of Python and what it allows and what it doesn't, I am not sure if it is even feasible to have a robust method of making … Up until now we wanted to separate the two problems of distributing code and using it …

EDIT: I am wondering if we should also tie this together to … Any feature that we add should improve the correctness of the builds and make things unambiguous. The current behaviour of the workspace name being usable in the Python …
I didn't read every word here (I only had about an hour), but I think I read enough to respond to the main point. Let me be clear: having the Bazel repo name decide the Python import name is not sound. The two things are different object models for different things. It was only tenuously working in workspace builds. Under bzlmod, with repo mapping and name munging, it quickly becomes ambiguous and complex.

You might, as a library owner, choose to use the same name for your top-level Python package and your Bazel repo/Bazel module name. That's fine, go for it. The best way to do this today is to use the …

For workspace builds, the key issue is that the end user decides a repo name. It's perfectly valid to download the same code under two different repo names. It might work, or not. For bzlmod builds, the key issue is repo mapping. One module's …
🚀 feature request

Relevant Rules
`py_library` / `py_binary`

Description
If there is a python library `@foo//bar:baz`, how can I import it from the main repository? Pre-bzlmod, you could write `from foo.bar import baz`, as it would be in the directory `external/foo/bar/baz`, and the external directory was in your python path. Apparently this was not documented or intended, but nevertheless people used it, as I don't believe it was ever considered when writing rules_python. In bzlmod, however, you have repo mapping, where the directory `foo` is actually stored at `external/_main~~foo_ext~foo`, and thus `from foo.bar import baz` fails. This is the direct cause of #1679, and if we solve this, we fix that bug for free.

Describe the solution you'd like
I'd personally like to make users prefix their imports to external repositories with the name of the workspace (based on the repo mapping). This would allow two python files in different repositories to `import foo.bar.baz` and each get their own `@foo` in the case of a repo name clash (this is how repo mapping is intended to be used: to allow each repo to have its own namespace of repos). I previously achieved this in ChromeOS via a `sitecustomize.py` (see discussion in #1679); a sketch of the approach follows.
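A hedged sketch of the idea (not the actual ChromeOS `sitecustomize.py`; the `_repo_mapping` parsing and the finder below are my own illustration): register a meta path finder that serves each apparent repo name as a namespace package rooted at the repo-mapped directory.

```python
# sitecustomize.py -- hypothetical sketch: make `import <apparent_name>.<pkg>`
# work under bzlmod by aliasing repo-mapped runfiles directories.
import importlib.machinery
import os
import sys

class _RepoAliasFinder:
    """Serves apparent repo names as namespace packages."""

    def __init__(self, aliases):
        self._aliases = aliases  # apparent repo name -> canonical directory

    def find_spec(self, fullname, path=None, target=None):
        repo_dir = self._aliases.get(fullname)
        if repo_dir is None:
            return None
        # A spec with no loader and search locations makes `fullname` a
        # namespace package rooted at the repo-mapped directory.
        spec = importlib.machinery.ModuleSpec(fullname, None, is_package=True)
        spec.submodule_search_locations = [repo_dir]
        return spec

def _install():
    runfiles_root = os.environ.get("RUNFILES_DIR")
    if not runfiles_root:
        return
    mapping_file = os.path.join(runfiles_root, "_repo_mapping")
    if not os.path.exists(mapping_file):
        return
    aliases = {}
    with open(mapping_file) as f:
        for line in f:
            # Each line: <source canonical repo>,<apparent name>,<target canonical repo>
            source, apparent, target = line.rstrip("\n").split(",")
            if source == "":  # mappings as seen from the main repo
                aliases[apparent] = os.path.join(runfiles_root, target)
    if aliases:
        sys.meta_path.append(_RepoAliasFinder(aliases))

_install()
```

With something like this in place, `from foo.bar import baz` would resolve `foo` through the repo mapping instead of relying on the physical directory name under `external/`.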
Describe alternatives you've considered
I can't really think of a viable alternative. Any solution without including the repo name in the import seems like it will end up in painful corner cases where you have name conflicts.