Support importing across repos with bzlmod #2088

Open
matts1 opened this issue Jul 25, 2024 · 16 comments

Comments

@matts1
Contributor

matts1 commented Jul 25, 2024

🚀 feature request

Relevant Rules

py_library / py_binary

Description

If there is a python library @foo//bar:baz, how can I import it from the main repository?

Pre-bzlmod, you could write from foo.bar import baz, since the file lived at external/foo/bar/baz and the external directory was on your python path. Apparently this was never documented or intended (I don't believe it was ever considered when writing rules_python), but people used it nevertheless.

Under bzlmod, however, repo mapping means the repository foo is actually stored at external/_main~~foo_ext~foo, so from foo.bar import baz fails. This is the direct cause of #1679; if we solve this, we fix that bug for free.

Describe the solution you'd like

I'd personally like to make users prefix their imports to external repositories with the name of the workspace (based on the repo mapping). This would allow two python files in different repositories to import foo.bar.baz and each get their own @foo in the case of a repo name clash (this is how repo mapping is intended to be used, to allow each repo to have their own namespace of repos).

I previously achieved this in ChromeOS via a sitecustomize.py (see discussion in #1679).
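
To make that concrete, here is a minimal sketch of what such a sitecustomize.py could look like (not the ChromeOS implementation). It assumes a bzlmod py_binary runfiles tree with RUNFILES_DIR set and a _repo_mapping manifest whose lines are comma-separated source/apparent/canonical repo names; everything else is made up for illustration.

# sitecustomize.py - a minimal sketch of the approach, not the ChromeOS code.
# Assumptions: we run inside a py_binary's runfiles tree, RUNFILES_DIR is set,
# and each _repo_mapping line is "<source canonical>,<apparent>,<target canonical>".
import os
import sys
import types

def _alias_external_repos():
    runfiles = os.environ.get("RUNFILES_DIR")
    if not runfiles:
        return
    mapping = os.path.join(runfiles, "_repo_mapping")
    if not os.path.exists(mapping):
        return
    with open(mapping, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(",")
            if len(parts) != 3:
                continue
            source, apparent, canonical = parts
            if source or not apparent:
                continue  # only take the main repo's view ("" = main repo)
            repo_dir = os.path.join(runfiles, canonical)
            if not os.path.isdir(repo_dir) or apparent in sys.modules:
                continue
            # Register the apparent name as a package whose __path__ is the
            # canonically named runfiles directory, so that
            # `import <apparent>.pkg.module` resolves.
            pkg = types.ModuleType(apparent)
            pkg.__path__ = [repo_dir]
            sys.modules[apparent] = pkg

_alias_external_repos()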

Describe alternatives you've considered

I can't really think of a viable alternative. Any solution without including the repo name in the import seems like it will end up in painful corner cases where you have name conflicts.

@groodt
Collaborator

groodt commented Jul 25, 2024

Any thoughts on how this should behave if the @foo repository uses third-party dependencies from PyPI?

e.g. pydantic==1.10.17 but the root repository wishes to use pydantic==2.8.2?

Or is the expectation only limited to cross-repository dependencies that do not take additional dependencies beyond the leaf repository?

@groodt
Collaborator

groodt commented Jul 25, 2024

For the simpler cases, one option that would work for you today with bzlmod and the third-party dependency rules in rules_python would be to use a direct URL reference.

e.g. foo @ git+https://github.com/foo/[email protected]#7921be1537eac1e97bc40179a57f0349c2aee67d

That will behave as a simple sdist dependency to the resolution systems (and would work with transitive dependency resolution) and, for most pure-python scenarios, would work well.

For situations that involve native dependencies, or complex build backends, it becomes a lot more difficult to build from source through that mechanism.

Another simple option to consider is adding a GitHub Action that packages the foo dependency as a wheel and publishes it to PyPI, a private repository, or a GitHub artifact; you can then pull that down through the existing rules_python third-party dependency rules as well.

@matts1
Contributor Author

matts1 commented Jul 25, 2024

Any thoughts on how this should behave if the @foo repository uses third-party dependencies from PyPI?

e.g. pydantic==1.10.17 but the root repository wishes to use pydantic==2.8.2?

I believe that in that scenario, @foo's import pydantic should import 1.10.17, while the root repository's import pydantic should resolve to importing 2.8.2 (that isn't usually what you want, but in practice it should be dealt with by the module extension itself).

If @foo's module.bazel declares a dependency on 1.10.17, and the root module declares 2.8.2, it's the responsibility of rules_python to perform MVS and end up with 2.8.2. This means that in practice, they will both end up resolving to the same version.

That being said, sometimes you don't want that, which is where isolated extension usages come in (unfortunately I couldn't find good documentation on it, but it's at bazelbuild/bazel#20186)

@groodt
Collaborator

groodt commented Jul 25, 2024

it's the responsibility of rules_python to perform MVS and end up with 2.8.2. This means that in practice, they will both end up resolving to the same version.

The MVS algorithm from bzlmod applies to the versions of the bazel extensions in BCR themselves, not to any language dependencies of the code inside the root or leaf repositories.

isolated extension usages

I think this scenario is more or less solved with bzlmod, because in these scenarios, there wouldn't be python code from the root repository importing python code from a leaf repository right? The extension would provide some sort of toolchain or tool that can be used without leaking at all into the root repository. Agree?

Can you clarify a bit about your use-case?

Is your use-case an inter-bazel-extension dependency scenario? So where you want one bazel extension to be able to import python code from another bazel extension?

Or is your use-case an arbitrary source code sharing scenario outside bazel extensions?

@matts1
Contributor Author

matts1 commented Jul 25, 2024

The MVS algorithm from bzlmod applies to the versions of the bazel extensions in BCR themselves, not to any language dependencies of the code inside the root or leaf repositories.

That's correct, but if you have multiple module extension usages, each requesting a different set of versions of dependencies, it's up to the module extension to decide what to do. You can give each precisely the version it requested, or attempt to merge them using MVS. Generally we recommend that module extensions perform MVS because that usually works, and if it doesn't, isolated extension usages allow you to ensure that each gets precisely what it asked for.

isolated extension usages

I think this scenario is more or less solved with bzlmod, because in these scenarios, there wouldn't be python code from the root repository importing python code from a leaf repository right? The extension would provide some sort of toolchain or tool that can be used without leaking at all into the root repository. Agree?

Generally what happens is:

  1. root repo requests foo v1.2
  2. other repo requests foo v1.1
  3. Pip module extension receives both of these requests at the same time, performs MVS on the requests, and creates a repo @rules_python~~pip_ext~pip containing foo v1.2
  4. Both of these repos map @pip to @rules_python~~pip_ext~pip, thus both getting foo v1.2

With isolated extensions, what happens is:

  1. Root repo requests foo v1.2 isolated
  2. Pip module extension receives the request, and creates a repo @rules_python~~~main~~pip_ext~pip containing foo v1.2
  3. @pip is mapped in the main repo to @rules_python~~~main~~pip_ext~pip , thus getting foo v1.2
  4. other repo requests foo v1.1
  5. pip module extension receives this request, and creates a repo @rules_python~~pip_ext~pip containing foo v1.1
  6. Other repo maps @pip to @rules_python~~pip_ext~pip, thus getting foo v1.1
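
For reference, a rough MODULE.bazel sketch of what requesting an isolated usage could look like; the isolate attribute is the experimental mechanism from bazelbuild/bazel#20186 (behind --experimental_isolated_extension_usages), and the hub name, python version, and requirements file below are placeholders:

# Root module's MODULE.bazel - illustrative sketch only.
pip = use_extension(
    "@rules_python//python/extensions:pip.bzl",
    "pip",
    isolate = True,  # experimental; this usage is not shared with other modules
)
pip.parse(
    hub_name = "pip",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock.txt",  # requests foo v1.2
)
use_repo(pip, "pip")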

Can you clarify a bit about your use-case?

Is your use-case an inter-bazel-extension dependency scenario? So where you want one bazel extension to be able to import python code from another bazel extension?

Or is your use-case an arbitrary source code sharing scenario outside bazel extensions?

Keep in mind that when using bzlmod, every* external repo comes from a module extension. We write a repository rule for chromite, then use that repo rule in a module extension, and finally make it visible to the main repo in MODULE.bazel. So although our chromite is just a simple repository rule, the path is still repo-mapped as _main~~cros_deps~chromite.

What we want is for the main repo to be able to import python files from other repos (and once it works from the main repo, the solution is already repo-mapping aware, so you can get it working from any repo pretty trivially). We have a multirepo setup (not ideal for bazel, but it's a historical decision). As an example, we have cache.py stored at @chromite//lib:cache, so from the main repo we'd like to be able to write import chromite.lib.cache (a sketch of the desired usage follows the footnote below).

*Technically, not true. You can invoke repo rules directly in the MODULE.bazel files, though it's much more limited in what you can do with it.
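
For illustration, the desired end state could look something like this; the BUILD wiring below is hypothetical, and only the @chromite//lib:cache label comes from the example above:

# BUILD.bazel in the main repo - hypothetical wiring for the example above.
load("@rules_python//python:defs.bzl", "py_binary")

py_binary(
    name = "tool",
    srcs = ["tool.py"],
    # The external library. Under bzlmod its files land in a runfiles
    # directory named after the canonical repo (_main~~cros_deps~chromite),
    # which is why `from chromite.lib import cache` in tool.py fails today.
    deps = ["@chromite//lib:cache"],
)

tool.py would then simply contain from chromite.lib import cache, which is exactly the import that breaks because the runfiles directory is not named chromite.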

@groodt
Collaborator

groodt commented Jul 25, 2024

each requesting a different set of versions of dependencies, it's up to the module extension to decide what to do

The third-party dependencies in rules_python are repositories, they're not module extensions. So import pydantic inside a python source file refers to a python dependency, not a bazel module extension dependency.

@matts1
Contributor Author

matts1 commented Jul 25, 2024

I think we might be getting confused because we're using different terminology. pydantic, in your example, is a repository created by the pip module extension. (@pip//pydantic is actually an alias to something along the lines of @@rules_python~~pip_ext~pydantic~1.2.3//:pydantic).

My original thought was that the solution would be to not special-case pip, but instead the pip module extension would generate a repository @@rules_python~~pip~pydantic which would then be visible via repo-mapping as @pydantic, and so import pydantic.foo would just import foo within the repo visible as @pydantic. I'm not sure how viable it is - bazel's new auto use_repo fixups (bazelbuild/bazel#17908) should help.

@groodt
Collaborator

groodt commented Jul 25, 2024

Thanks for the details. It really seems like the real issue is this:

What we want is for the main repo to be able to import python files from other repos

Is this supported in Java?

I would think you could possibly pull things in the way you wish if you did something like this:

MODULE.bazel

http_archive = use_repo_rule("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(name = "foo", urls = [...])

However, I'm not sure that's much better than having your leaf repos publish source packages, or than pulling in the source packages through the pypi rules as direct URL references.

@groodt
Collaborator

groodt commented Jul 25, 2024

My original thought was that the solution would be to not special-case pip, but instead the pip module extension would generate a repository @@rules_python~~pip~pydantic which would then be visible via repo-mapping as @pydantic, and so import pydantic.foo would just import foo within the repo visible as @pydantic. I'm not sure how viable it is

The implementation details of how we currently import the third-party python ecosystem in a way that is convenient for bazel are quite a different problem from merging multiple independent source trees through bazel.

Our pip module extension aims to take external packages from the Python ecosystem (sdist and bdist, typically on PyPI but possibly in private artifact storage too) and make these packages importable by python code in the root repository. These bazel-foreign dependencies don't have prefixes on their python imports, so users can write python code as they would outside bazel: import numpy is the same inside and outside bazel for numpy users.

Your ask is more about merging non-packaged python code (i.e. no sdist or wheel anywhere) and having special python imports for it. That's quite a different problem space from the pip problem space. It's a bazel source-deps / vendoring problem, not a pip/pypi python packaging problem.

It's possible to change the source deps / vendoring problem into a packaging problem if you can expose your source as a package. That's what we did by publishing "runfiles" as a wheel to PyPI. But otherwise these problems are different.

@aignas
Collaborator

aignas commented Jul 25, 2024

I'll try to add my two cents about how I understand what the ask is. I think we are conflating here quite a few problems.

  1. Right now pip.parse assumes that python dependency trees are isolated and asks the user to provide a unique hub_name, which is an alternative to using isolated module extensions. I think it is likely that we should just move to isolated extensions by default, just like rules_go forces users to isolate their extension usage for third-party dependencies if they do not want their dependency trees merged.

  2. Doing MVS across multiple external dependency trees is something we avoid for now because we don't yet have a stable solution for "Cross compilation" of py_binary/py_image/py_library targets (#260). In our case the merging would not be terribly complex if we just chose the minimum version out of all requirement lock files, but it is also something we haven't experimented with yet. By default many dependency solvers select the highest available version of each dependency, and doing MVS without getting deeper into the weeds is not possible. Because go mod uses MVS, rules_go has an easier time there, I think.

  3. Imports relying on repository names previously worked but broke when bazel moved to the repo mapping model, and it is quite correct to say that you cannot rely on repo names when doing imports. Even python proper does not do that: import names are often different from the wheel or repository names. My stance here would be to not support this ask and just follow what the bazel core developers said - the name of the repo is unstable and should not be relied upon, as repo mapping cannot work otherwise. The solution to "import rules_python doesn't work under bzlmod" (#1679) is not pretty or generic and requires rules_python to move code around, so it is unlikely that it will ever get used.

To summarize, for now bzlmod does not really support mixing dependency trees, because in rules_python we don't have a good idea of how to combine the deps of one bazel_dep with those of another. Combining them requires taking both requirements files as input and creating a new lock file. With the previous tools that would have been really difficult; with uv it is becoming more tangible, but I am not sure it is possible. For now, the only way to share code between two python monorepos is to ship code as wheels and consume it via requirements.

Is it something that we should block on before we release 1.0? I am not sure. Is it possible to solve it in rules_python? Possibly, but at minimum it would require users to use imports that do not rely on repository names. And we would need to understand what interface we should expose for locking requirements. The only option that I can think of today is:

  • users specify their input files to the pip.parse extension
  • the pip.parse extension exposes a target that allows users to lock the requirements. If the extension usage is not isolated, the requirements locker would get all of the input files as constraints.
  • the lock file that gets materialized and subsequently used by rules_python is the result of all of those input files.

The main issue I see is that we cannot solve this without modifying the bzlmod workflow. I'll add a 1.0.0 milestone due to this reason.

@matts1
Contributor Author

matts1 commented Jul 26, 2024

Thanks for the details. It really seems like the real issue is this:

What we want is for the main repo to be able to import python files from other repos

That's correct, yes

I would think you could possibly pull things in the way you wish if you did something like this:

MODULE.bazel

http_archive = use_repo_rule("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(name = "foo", urls = [...])

I don't believe so. I haven't actually tried it, but since two different MODULE.bazel files could add that exact same line of code, it would have to be namespaced as _main~~foo or similar.

It's possible to change the source deps / vendoring problem into a packaging problem if you can expose your source as a package. That's what we did by publishing "runfiles" as a wheel to PyPI. But otherwise these problems are different.

Yeah, I was trying to make them the same problem by going the other way round - turning the packaging problem into a bazel problem. It was mostly done out of curiosity - I don't know of any particular advantage to it other than consistency.

I don't think exposing the source as a package is a good long-term solution. Any solution other than just adding the py_library target as a dep and then importing it is a hack, IMO, because that's the clear winner from a UX and consistency standpoint.

I'll try to add my two cents about how I understand what the ask is. I think we are conflating here quite a few problems.

  1. Right now pip.parse assumes that python dependency trees are isolated and asks the user to provide a unique hub_name, which is an alternative to using isolated module extensions. I think it is likely that we should just move to isolated extensions by default, just like rules_go forces users to isolate their extension usage for third-party dependencies if they do not want their dependency trees merged.
  2. Doing MVS across multiple external dependency trees is something we avoid for now because we don't yet have a stable solution for "Cross compilation" of py_binary/py_image/py_library targets (#260). In our case the merging would not be terribly complex if we just chose the minimum version out of all requirement lock files, but it is also something we haven't experimented with yet. By default many dependency solvers select the highest available version of each dependency, and doing MVS without getting deeper into the weeds is not possible. Because go mod uses MVS, rules_go has an easier time there, I think.

SGTM, though like you said, this should probably be in a different bug

  3. Imports relying on repository names previously worked but broke when bazel moved to the repo mapping model, and it is quite correct to say that you cannot rely on repo names when doing imports. Even python proper does not do that: import names are often different from the wheel or repository names. My stance here would be to not support this ask and just follow what the bazel core developers said - the name of the repo is unstable and should not be relied upon, as repo mapping cannot work otherwise. The solution to "import rules_python doesn't work under bzlmod" (#1679) is not pretty or generic and requires rules_python to move code around, so it is unlikely that it will ever get used.

That sounds like something that may be taking the bazel core developers out of context. While I haven't heard the statement, I'm guessing the intention was that the full repository name is unstable, but that the repo-mapped name is stable (i.e. _main~~foo_ext~foo is unstable, but if it was imported as "foo" in the main repo, then foo is a stable identifier that can be used to refer to that repo from the main repo). So I think it's perfectly reasonable to allow users to use import foo to import the repository visible as foo. If the repo-mapped name were unstable, then depending on @pip//foo would be unstable in your build files, which is obviously not the case.

To summarize, for now bzlmod does not really support mixing dependency trees, because in rules_python we don't have a good idea of how to combine the deps of one bazel_dep with those of another. Combining them requires taking both requirements files as input and creating a new lock file. With the previous tools that would have been really difficult; with uv it is becoming more tangible, but I am not sure it is possible. For now, the only way to share code between two python monorepos is to ship code as wheels and consume it via requirements.

I think that @groodt is right, and that we should leave them as two separate problems here. This is not about requirements or dependencies, but about regular py_library targets. We can package code just fine. We have access to all the information we need (the REPO_MAPPING file and the paths); it's simply a question of hooking into the import system to import it correctly. I already implemented this for ChromeOS with a sitecustomize.py, but I don't know if there's a better way to do it.

Is it something that we should block on before we release 1.0? I am not sure. Is it possible to solve it in rules_python? Possibly, but at minimum it would require users to use imports that do not rely on repository names. And we would need to understand what interface we should expose for locking requirements. The only option that I can think of today is:

  • users specify their input files to the pip.parse extension
  • the pip.parse extension exposes a target that allows users to lock the requirements. If the extension usage is not isolated, the requirements locker would get all of the input files as constraints.
  • the lock file that gets materialized and subsequently used by rules_python is the result of all of those input files.

The main issue I see is that we cannot solve this without modifying the bzlmod workflow. I'll add a 1.0.0 milestone due to this reason.

I don't think it should be a blocker

@groodt
Collaborator

groodt commented Jul 26, 2024

I don't think exposing the source as a package is a good long-term solution. Any solution other than just adding the py_library target as a dep and then importing it is a hack, IMO, because that's the clear winner from a UX and consistency standpoint.

It's convenient for you because you seem to be primarily within a bazel ecosystem, but with a slightly awkward setup where you have a polyrepo of unpackaged python source dependencies.

However, we need to recognize that the overwhelming majority of the users of rules_python, and of python in general, are not in that position. The PyPA standards specify the mechanisms for distributing python packages across different projects and repositories: the packaging and distribution standards describe sdist (source distributions) and bdist (binary distributions, aka wheels).

Going "with the grain" and publishing packages that align to the standards wouldn't be a hack I think. It's actually what the vast majority of python developers work with on a daily basis. It would be more consistent with the expectations of the average python developer and wouldn't be rewriting or introducing something at python import level. Introducing that would be inconsistent to my expectations as a python developer. In a similar way that a Java engineer would turn up their nose at import rules_java.com.google.bazel.blabla;

There are no language standards for monorepos or source dependencies (that aren't sdists) like there might be for e.g. golang. Golang naturally has the benefit that the language was designed with bazel in mind, and its import structure maps directly onto source repository structures.

@aignas
Collaborator

aignas commented Jul 26, 2024

That sounds like something that may be taking the bazel core developers out of context. While I haven't heard the statement, I'm guessing the intention was that the full repository name is unstable, but that the repo-mapped name is stable (i.e. _main~~foo_ext~foo is unstable, but if it was imported as "foo" in the main repo, then foo is a stable identifier that can be used to refer to that repo from the main repo). So I think it's perfectly reasonable to allow users to use import foo to import the repository visible as foo. If the repo-mapped name were unstable, then depending on @pip//foo would be unstable in your build files, which is obviously not the case.

Agreed that the repo-mapped name is stable. I am struggling to find the Slack/GH thread where I saw this discussion, so since the context is missing, it might be good to leave it there. :)

Just a side note: the reason @pip//foo works is that Python has _main~~foo/site-packages on the PYTHONPATH and that site-packages directory has a foo directory. We are not doing anything too clever there; we just ensure that all of the modules are importable, relying on standard Python importlib machinery.

I think playing with the PathFinder and the related importlib machinery could be an interesting academic challenge, and it is somewhat described here. I personally would not be against using the runfiles library and importlib to solve this problem of import <repo_mapped_name>.foo, because it may have its uses. So doing the research to see if it is possible, and whether it fixes anything else, would be welcome in my opinion. But that does not necessarily mean the research would result in merged code - it would give more meat to our discussions and we could be more concrete.
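
To make that research direction concrete, here is a rough sketch of what a repo-mapping-aware finder could look like. It is an alternative to the sitecustomize.py approach sketched earlier in the thread; the _repo_mapping manifest format and runfiles layout are assumptions, and it ignores the imports attribute, namespace-package subtleties, and plenty of other edge cases.

# repo_import.py - sketch only, not a proposed rules_python API.
import importlib.machinery
import os
import sys

def _main_repo_mapping():
    # Maps apparent repo names (as seen by the main repo) to runfiles dirs.
    runfiles = os.environ.get("RUNFILES_DIR", "")
    mapping = {}
    path = os.path.join(runfiles, "_repo_mapping")
    if not os.path.exists(path):
        return mapping
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(",")
            if len(parts) != 3:
                continue
            source, apparent, canonical = parts
            if not source:  # "" = the main repo's own mappings
                mapping[apparent] = os.path.join(runfiles, canonical)
    return mapping

class RepoMappingFinder:
    """Resolves `import <apparent_repo_name>` to the canonical runfiles directory."""

    def __init__(self):
        self._roots = _main_repo_mapping()

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            return None  # submodules are handled by the normal PathFinder
        root = self._roots.get(fullname)
        if root is None or not os.path.isdir(root):
            return None
        # Expose the repo root as a namespace-style package; its submodules
        # are then found under the canonically named directory.
        spec = importlib.machinery.ModuleSpec(fullname, None, is_package=True)
        spec.submodule_search_locations = [root]
        return spec

# Append (not prepend) so stdlib and site-packages always take precedence.
sys.meta_path.append(RepoMappingFinder())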

If we look at how a sample project would be laid out, you would have the following code structure:

<repo_root>
- <foo> or <src/foo>  # This is where the python code resides
- <MODULE.bazel>  # to signify the module root
...

This means that if rules_python decides to support importing things via import <repo_root>.foo, we would be going against the grain of the general Python ecosystem, and that is the main reason I don't think it is a good idea to extend the rules_python supported API even further. To change this opinion, I think I would have to see at least a sketch of a solution.

Are there drawbacks in asking/forcing monorepo users to adopt the sample project structure outlined above? In the projects I worked on, it was not a problem and I am curious to see examples where this is infeasible. I understand that change is undesirable here, but would like to better understand the impact of doing nothing.

@matts1
Contributor Author

matts1 commented Jul 26, 2024

Are there drawbacks in asking/forcing monorepo users to adopt the sample project structure outlined above? In the projects I worked on, it was not a problem and I am curious to see examples where this is infeasible. I understand that change is undesirable here, but would like to better understand the impact of doing nothing.

The project structure described above doesn't inherently solve the problem. We'd still need to work out how to map import foo to @foo//src/foo, which still requires an understanding of repo mapping.

I don't think the project structure you describe is a bad structure. My main concern here is to minimize "magic". The policy we have at google is:

  • Projects which are packaged on pypi are always imported via import <package name>
  • All other projects are always imported by their full path.

It feels to me like a repo @foo//src/bar being accessed via import bar is way too much magic: it makes the code harder for humans to read and harder for tooling to handle correctly. I've always believed that you should be able to work out the bazel package purely from the import path, and work out the import purely from the build label. Internally at google we have the tools build_cleaner and py_strict_(binary|library). Consider the following interaction:

$ blaze build //path/to/package                     
ERROR: Strict deps violations: //path/to/package:server
  In path/to/package/server.py, no direct deps found for imports:
    line 10: from google3.other.path import my_library: Module "google3.other.path" not provided by a direct dep

*** Please fix target dependencies. ***

build_cleaner //path/to/package:server

Target //path/to/package:server failed to build
ERROR: Build did NOT complete successfully
$ build_cleaner //path/to/package:server
...
For target //path/to/package:server, build_cleaner made the following changes:
'deps' attribute:
  Add //other/path:my_library

The way it works is that:

  1. It sees import google3.other.path
  2. It knows that this must either be contained within the @google3//other/path package (which it sees exists), or come from a google3 pypi package (which it realizes doesn't exist).
  3. It finds the python file that exports my_library (my_library.py)
  4. It does a query to work out which python library has my_library.py as its srcs, getting @google3//other/path:my_library

This only works because it's trivial to compute the package for a given import (we look in one specific directory containing all our pypi packages, and otherwise you just use the full import).
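
As a toy illustration of that mapping (the pypi package set, the @pypi hub name, and the label format are placeholders, not the real build_cleaner logic):

# import_to_label.py - toy illustration of the mapping described above.
PYPI_PACKAGES = {"numpy", "pydantic"}  # stands in for "the one pypi directory"

def label_for_import(import_path: str) -> str:
    """Map a full Python import path to the Bazel target that must provide it."""
    parts = import_path.split(".")
    if parts[0] in PYPI_PACKAGES:
        return "@pypi//" + parts[0]  # third-party: dep on the pypi hub repo
    # First-party: the import path *is* the repo + package path, and the dep
    # is the target whose srcs contain the final module.
    repo, *package, module = parts
    return "@" + repo + "//" + "/".join(package) + ":" + module

print(label_for_import("google3.other.path.my_library"))  # @google3//other/path:my_library
print(label_for_import("numpy.linalg"))                   # @pypi//numpy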

@aignas
Collaborator

aignas commented Jul 26, 2024

Thanks for the context, this is useful.

I am still not 100% convinced that @foo//src/bar plus import bar is super magical, because if foo is a repo that provides a number of imports like bar and baz (and maybe foo itself), it still makes sense. There are many python packages which do that; for example @pypi//opencv_python_headless allows you to import cv2, which just shows that the repo/project name does not necessarily map to the package name. I view the bazel workspace name as just that. Maybe one could argue that it would be better to have @pypi//opencv_python_headless:cv2 so that you can import cv2, but the Python ecosystem is unfortunately imperfect and does not map to that paradigm well.

Coming back on topic to repo mapping and imports: let's say we have dependencies A, B and C, where C depends on A v1 and B depends on A v2. I would assume that all of them are bazel modules, have a MODULE.bazel file, and define py_library targets. Then each module has a repo mapping file, so C using targets like @A//foo:bar will resolve to A~v1 and B would in the same way resolve to A~v2. If you introduce a module D that depends on both C and B, then something needs to do MVS in order to resolve a single version of A to be used when you have a target py_binary(..., deps=["@B//foo", "@C//bar"]). I thought that the MVS would be done by bzlmod and that there is nothing rules_python needs to do unless there are PyPI dependencies involved, in which case the pip extension would need to merge the dependency trees.
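
A sketch of that diamond in MODULE.bazel terms, with placeholder module names and versions:

# MODULE.bazel of D - placeholder names, sketching the diamond above.
bazel_dep(name = "B", version = "1.0")  # B itself declares bazel_dep(name = "A", version = "2.0")
bazel_dep(name = "C", version = "1.0")  # C itself declares bazel_dep(name = "A", version = "1.0")
# Bzlmod's MVS resolves a single A (v2 here) for the whole build, unless the
# root module explicitly opts into several copies via multiple_version_override.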

However, you may still want to depend on A~v1 in D and be able to do from A_legacy import foo.bar. In that situation, if A~v1 and A~v2 are mixed in the same closure, we have no way to uniquely identify which one we need, because there is no way to perform import foo against the external dep A and get the right version. bzlmod supports this, but the python runtime would fail to handle it correctly because importlib is not repo-mapping aware, so we would either fail or be non-deterministic at runtime.

I guess I see the problem with my suggestion of "the import just needs to not include the repo name" and with repo mapping in general, so having a repo-mapping-aware way of constructing the import path would be better. Let me know if I am seeing a different problem from what you are seeing with diamond dependencies. IMHO the scope of solving this properly is reasonably large, and given the complexity of Python and what it does and doesn't allow, I am not sure it is even feasible to make imports repo-mapping aware in a robust way. I am also not sure how this maps to projects that are available both as wheels and through bazel (e.g. @rules_python//python/runfiles having different import paths) - should the build system you use influence the import paths?

Up until now we wanted to separate the two problems of distributing code and consuming it with rules_python, and this FR makes that distinction less clear. bzlmod would now be used not only for your build-tool dependency graph but also for your application-code dependency graph, and this expands the scope of rules_python. Someone would have to write a prototype (and probably a design doc), and we should probably discuss this in a Google doc, as GH issues have their limitations as a discussion medium.

EDIT: I am wondering if we should also tie this to the imports attribute of py_library - if imports is specified, we should not use runfiles; if it is not specified, we could use runfiles and repo mapping to import the right things.

Any feature we add should improve the correctness of builds and make things unambiguous. The current behaviour of the workspace name being usable in Python import statements is magical, and I would like to move away from that as far as possible.

@rickeylev
Collaborator

I didn't read every word here (I only had about an hour here), but I think I read enough to respond to the main point.

Let me be clear: having the Bazel repo name decide the Python import name is not sound. The two are different object models for different things. It only tenuously worked under workspace. Under bzlmod, with repo mapping and name munging, it quickly becomes ambiguous and complex. You might, as a library owner, choose to use the same name for your top-level Python package and your Bazel repo/module name. That's fine, go for it. The best way to do this today is to use the imports attribute to ensure that the directory you want, with the names you want, ends up on sys.path.
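
For example, a sketch from the library owner's side, assuming a hypothetical foo module that keeps its python sources under a top-level foo/ directory:

# BUILD.bazel at the root of the foo module - hypothetical layout where the
# sources live at foo/bar/baz.py inside the repo.
load("@rules_python//python:defs.bzl", "py_library")

py_library(
    name = "baz",
    srcs = ["foo/bar/baz.py"],
    # Put the repo root on sys.path, so consumers import the foo/ directory
    # (a name the library owner controls) rather than the repo's canonical name.
    imports = ["."],
)

Consumers then write from foo.bar import baz regardless of what the repo is called on their side.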

For workspace builds, the key issue is that the end user decides a repo name. It's perfectly valid to download the same code under two different repo names. It might work, or not.

For bzlmod builds, the key issue is repo mapping. One module's @foo may not be the same as another module's @foo, and a module may remap a repo to another name. Does the repo mapping even distinguish between apparent names? While we humans might think of @chromite as the "logically canonical" name, to bzlmod it's just another apparent name, just like @remapped_chromite is. They're both just apparent names for the canonical @whatever~bla~bla repo name.

@aignas aignas removed this from the v1.0.0 milestone Nov 14, 2024