Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel breaks python import with __init__.py #55

Closed
mouadino opened this issue Dec 29, 2017 · 17 comments
Closed

Bazel breaks python import with __init__.py #55

mouadino opened this issue Dec 29, 2017 · 17 comments

Comments

@mouadino
Copy link

mouadino commented Dec 29, 2017

System information:

  • Bazel version: 0.9.0
  • Python 3.6
  • Operation System: Debian 9.0
  • Context Runner: Docker

Description

I have few examples of how Bazel introduce __init__.py in places where none was and by doing this it breaks python import, e.g.

Python dependencies that are package namespaces.

An example is zope.interface package, which is a dependency of pyramid library same for zope.deprecation, we have some code that both when tests and one code is run it fails with ModuleNotFoundError: No module named 'zope.interface'.

I spend some time investigating this issue and the result of my investigation lead to two conclusions:

  • Bazel introduce __init__.py files and break namespacing
  • Python namespaces wheels installed in no-site directories wouldn't work

Let's looks at each one of them in details:

Bazel introduce __init__.py files and break namespacing

Looking at what wheel contains (using unzip -l ....whl) and how the directory is layout is in the .cache/bazel/<...>/external/pypi__zope_interface_4_4_0/, we see the same structure:

+ zope
    | + interface
          | - __init__.py
          | - _compat.py
          | ....
- zope.interface-4.4.0-py3.6-nspkg.pth
+ zope.interface-4.4.0.dist-info

While if you look inside the .runfiles of a py_binary or py_test rule you will see some __init__.py files being introduced e.g. in ls bazel-out/k8-fastbuild/bin/<some-repository-path>/default_test.runfiles/pypi__zope_interface_4_4_0/

- __init__.py    <<<<<<<<<<<<<<<<<<
+ zope
  | - __init__.py    <<<<<<<<<<<<<<<<<<<<<
   | + interface
         | - __init__.py
         | - _compat.py
         | ....
- zope.interface-4.4.0-py3.6-nspkg.pth
+ zope.interface-4.4.0.dist-info

Those __init__.py break the logic inside .pth that make sure that a package namespace is created.

Python namespaces wheels installed in no-site directories wouldn't work

The above is not all the story, because even if you remove __init__,py files, still namespace packages like zope.interface will not work as they should be b/c this later relay on the fact that the .pth files are executed which patch python module's __path__ of the given package (and that no __init__.py file exists in the subdirectory hence the first issue), however .pth files (like zope.interface-4.4.0-py3.6-nspkg.pth) are only parsed if the enclosing directory is considered a site directory as mentioned by documentation here:

A path configuration file is a file whose name has the form name.pth and exists in one of the four directories mentioned above; ...

Which refer to the paragraph before where it said.

It starts by constructing up to four directories from a head and a tail part. For the head part, it uses sys.prefix and sys.exec_prefix; empty heads are skipped. For the tail part, it uses the empty string and then lib/site-packages (on Windows) or lib/pythonX.Y/site-packages (on Unix and Macintosh)

Now long story, short, the .pth files are not parsed just by adding a directory to PYTHONPATH which is what the runner here, instead you will want to do add something like:

import site

for dir in python_imports.split(':'):
    if any(fpath.endswith('.pth') for fpath in os.listdir(dir)):
       site.addsitedir(dir)

Running tests with pytest

We are using pytest as a test runner, but for some of our existing code, switching to Bazel is breaking the tests, reason for this is again because of the introduced __init__.py files.

To give more details, our monorepo has the following structure:

+ services
   | + some_api
        | - BUILD.bazel
          - setup.py
          + services
             | - __init__.py
               - some_code.py
         
+ libraries
    | + python
        | + some_library
           | - setup.py
             - BUILD.bazel
             + shared

Inside BUILD.bazel we have rules to run tests by calling a python script that use pytest, again the same issue is that import is broken in some of our code that relies on import side effect (not good but hard to change) b/c same python modules get imported twice with different __name__ one is libraries.python.some_library... and another time with some_library.... and this is due to first pytest autodiscovery magic but mostly because Bazel introduce __init__.py files in libraries/ folder and libraries/python/ folder, which sadly confuse pytest.

Workarounds

So far, we are working around the later issue by misusing --run_under Bazel flag to patch .runfiles and delete __init__.py in those places where they shouldn't be any.

To give an idea of the snippet of the code that we have in the script passed to --run_under here it is:

for entry in $(ls $RUNFILES_DIR); do
  if [[ "$entry" == pypi__* ]]; then
    python_ns_package='no'
    for subentry in $(ls $RUNFILES_DIR/$entry); do
      if [[ "$subentry" == *.pth ]]; then
        python_ns_package='yes'
        break
      fi
    done

    if [[ "$python_ns_package" == 'yes' ]]; then
      rm -f $RUNFILES_DIR/$entry/__init__.py

      for subentry in $(ls $RUNFILES_DIR/$entry); do
        if [ -f $RUNFILES_DIR/$entry/$subentry/__init__.py ]; then
          rm -f $RUNFILES_DIR/$entry/$subentry/__init__.py
        fi
      done

      if [ ! -z $SRV_NAME ]; then
        sed -i "\|#!/usr/bin/env python|a import site; site.addsitedir('$entry')" $RUNFILES_DIR/__main__/services/$SRV_NAME/service_test
      elif [ ! -z $LIB_NAME ]; then
        sed -i "\|#!/usr/bin/env python|a import site; site.addsitedir('$entry')" $RUNFILES_DIR/__main__/libraries/python/$LIB_NAME/default_test
      fi
    fi
  fi
done

It's really ugly but fixes the test. However the workaround is harder to implement when we integrate with py_image rule from docker_rules.

Proposed solution

I still don't understand why inserting __init__.py is needed and in which case, but IMHO such changes mess up with the way Python work and worst create a different environment than the one you see, so my proposal maybe adds a new flag in Bazel test and build to skip the __init__.py creation or maybe better (read local control) add a flag to py_binary and py_test rules to skip auto __init__.py creation, thoughts?

N.B. While Bazel has been great addon to our stack and I am very grateful to the effort the community put in to make it better, I am a bit unhappy with the fact that python rules are written not as Bazel extensions (using Skylar) but in Java, because this way fixing issues like the above is not as easy as forking repository changing code and changing WORKSPACE to use fork :(

@mattmoor
Copy link
Contributor

/cc @ulfjack

@mouadino
Copy link
Author

Let me know whether this a real issue or just misunderstanding on my part if the former I would like to provide a fix for this.

@duggelz
Copy link

duggelz commented Jan 1, 2018

It's a legacy behavior from the internal version of Bazel (Blaze), for Google's particular Python peculiarities. It should definitely be cleaned up in Bazel, but it will be a breaking change, so it would probably need to be mediated by a flag or per-target attribute, as you suggest.

@c4urself
Copy link

c4urself commented Jan 3, 2018

Running into a similar problem with our Python project, the extra __init__.py files break imports.

@raliste
Copy link

raliste commented Jan 12, 2018

We are seeing the same issue with the google namespace when using protobuf and google-cloud-storage. Any workarounds?

protocolbuffers/protobuf#1296

cc @duggelz

@ulfjack
Copy link

ulfjack commented Jan 16, 2018

If someone sends a change to add a flag to Bazel or a tag to py_binary / py_test, I'm happy to review it. Unfortunately, I won't be able to spend any time on this myself.

mouadino added a commit to mouadino/bazel that referenced this issue Jan 17, 2018
Introduce a new attribute to py_binary and py_test to control whether to
create `__init__.py` or not.

Fixes bazelbuild/rules_python#55
@mouadino
Copy link
Author

@ulfjack I have a PR ready for review bazelbuild/bazel#4470, please take a look.

@mouadino
Copy link
Author

mouadino commented Jan 28, 2018

Just to clarify the patch above only fix one issue we still need to introduce the other fix which makes python package namespaces works which is by introducing import site; site.addsitedir(..) to https://github.com/bazelbuild/bazel/blob/ba66e72295d763e43b92179ce6bd46476084a548/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt for the specific libraries, I will work on a patch when the other is ready to be merged or merged.

@fortuna
Copy link

fortuna commented Jan 30, 2018

FYI, I'm also seeing the issue with google.cloud.bigquery. I believe all of google.* and google.cloud.* is broken because of this issue.

@ulfjack
Copy link

ulfjack commented Jan 30, 2018

Lukacs just added a Python integration test in bb9f19d4d819a751fbfbe11b2a7827c05680944d (no idea how to reference that from here).

mouadino added a commit to mouadino/bazel that referenced this issue Feb 10, 2018
Introduce a new attribute to py_binary and py_test to control whether to
create `__init__.py` or not.

Fixes bazelbuild/rules_python#55
bazel-io pushed a commit to bazelbuild/bazel that referenced this issue Feb 14, 2018
*** Reason for rollback ***

Breaks Kokoro and I accidentally submitted the change without presubmit checks.

*** Original change description ***

Make __init__.py files creation optional

Introduce a new attribute to py_binary and py_test to control whether to
create `__init__.py` or not.

Fixes bazelbuild/rules_python#55

Closes #4470.

PiperOrigin-RevId: 185676592
@fortuna
Copy link

fortuna commented Feb 14, 2018

When should we expect the feature to be released?

It seems the commit was rolled back. Should the issue be reopened?

bazel-io pushed a commit to bazelbuild/bazel that referenced this issue Feb 15, 2018
*** Reason for rollback ***

Remove example changes; those need to build with the last Bazel release.

*** Original change description ***

Automated rollback of commit 0f9c6ea.

*** Reason for rollback ***

Breaks Kokoro and I accidentally submitted the change without presubmit checks.

*** Original change description ***

Make __init__.py files creation optional

Introduce a new attribute to py_binary and py_test to control whether to
create `__init__.py` or not.

Fixes bazelbuild/rules_python#55

Closes #4470.

PiperOrigin-RevId: 185806241
@ulfjack ulfjack reopened this Feb 15, 2018
@ulfjack
Copy link

ulfjack commented Feb 15, 2018

It was rolled back, rolled forward, rolled back again, and then rolled forward again.

bazelbuild/bazel@e9c885a will hopefully stick this time?

@ulfjack ulfjack closed this as completed Feb 15, 2018
@duggelz
Copy link

duggelz commented Feb 16, 2018

The second bug around .pth packages seems a little tricky to me. I don't think adding the code suggested to the python_stub_template.txt will help, because that's actually a different Python process than the one actually doing the import. (The user invokes the stub, the stub sets up the PYTHONPATH, then the stub invokes the "real" Python program).

@mouadino
Copy link
Author

It's good that you brought that up because first, the previous bug fix provided doesn't fix fully the namespace issue as mentioned here #55 (comment), maybe we should reopen the issue, as I am planning to provide a fix over the weekend for the namespace issue unless someone beat me to it.

Now concerning the fix itself, fixing python_stub_template.txt, does work (b/c we use that fix with our workaround) however it only works, in this case os.execv is used (here) i.e. when it's not running from a Zip, for the later I am not sure yet how to fix yet.

@fejta
Copy link

fejta commented Jun 13, 2018

So is this merged?

@c4urself
Copy link

@fejta -- yep, you set legacy_create_init = 0 in your py_test or py_binary, see: https://docs.bazel.build/versions/master/be/python.html#py_test.legacy_create_init

@therc
Copy link

therc commented Sep 24, 2018

Is there a plan for the .pth fix? There are a bunch of duplicates/variants of this problem in this repository's issues alone: #14, #65, #93.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants