
Allow using intra-monorepo python dependencies without publishing them to pypi #6455

Closed
sherifnada opened this issue Sep 27, 2021 · 5 comments
Labels
area/connectors Connector related issues area/developer Everything related to build, developer UX build DX lang/python team/extensibility type/enhancement New feature or request

Comments

@sherifnada
Contributor

Tell us about the problem you're trying to solve

Today it is not easy to use intra-repo python dependencies in published docker images because our build system assumes that the Docker build context of a particular connector is everything under the directory of that connector.

For example, let's say airbyte-integrations/source-s3 depends on a helper library airbyte-integrations/source-abstract-blobs. We could theoretically go to source-s3/requirements.txt and insert a line like -e ../source-abstract-blobs and run pip install -r requirements.txt to get that dependency and all would work well locally.
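For concreteness, the requirements.txt addition described above would look something like this (paths taken from the example; the exact file contents are illustrative):

```text
# airbyte-integrations/source-s3/requirements.txt
# Editable install of a sibling package in the monorepo:
-e ../source-abstract-blobs
# ...the connector's other dependencies follow as usual
```

Locally, `pip install -r requirements.txt` then resolves the sibling package via a symlink into the repo checkout.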

However, this is more difficult to do when building source-s3's docker image, because pip install -e creates a symlink rather than copying the package. If you look at the directory structure, source-s3/Dockerfile's build context does not contain the file pointed to by that symlink: the symlink points to airbyte-integrations/source-abstract-blobs, which is outside the airbyte-integrations/source-s3 directory.

Describe the solution you’d like

I want to be able to re-use python packages in other python packages inside the monorepo without having to publish them to pypi.

There are a few things we could do to get around this:

  1. Change our build tooling to make the build context of a Docker image one or two levels higher in the filesystem. This would be a massive pain, as anyone writing a Dockerfile would have to relativize their paths to wherever the build context was, i.e. instead of COPY ./file ./file you would have to write COPY ./airbyte-integrations/source-s3/file ./file. This is pretty unintuitive and creates a fork between the logic a developer runs to test their connector and the one run in CI.
  2. We could use something like pip install -t which would "copy" the package into the venv directory. This could work with some caveats:
    1. It needs to happen before the docker image is built (since, as explained above, the build context isn't changing), meaning we first need to run pip install -t etc. before running any docker commands so the .venv directory is populated with all the right packages
    2. We need to stop excluding the .venv directory from the build context. This is workable as it is the current approach we use for java packages.
    3. If we do the above, then when pip install runs during docker build it will have access to any intra-monorepo dependencies via their presence in the .venv directory. However, we need to be super careful that this creates reproducible builds (holding the git commit fixed).
  3. There is probably another solution I'm not thinking of.
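A rough sketch of what option 2 could look like (the target directory, paths, and image tag are assumptions, not existing build tooling):

```shell
# Hypothetical workflow for option 2, run from the connector directory:
cd airbyte-integrations/source-s3

# `pip install --target` (-t) copies the sibling package -- no symlink --
# into a directory that lives inside this connector's build context:
pip install --target=.venv ../source-abstract-blobs

# With .venv no longer excluded from the build context, the Dockerfile
# can COPY it in, and pip running during `docker build` will find the
# intra-monorepo dependency already present:
docker build -t airbyte/source-s3:dev .
```

The reproducibility caveat from point 3 applies: the contents of the vendored directory must be fully determined by the git commit being built.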
@Phlair
Contributor

Phlair commented Apr 11, 2022

I've found a potential 3rd option, but it's quite specific to the case where connector-y wants to depend on connector-x and connector-x is likely an abstraction (e.g. source-s3 wants to build upon source-files-abstract).

The child connector can replace FROM python:3.7-slim as base with FROM airbyte/source-files-abstract as base at the top of its Dockerfile.
Since the abstract connector will be based on python:X.X, we end up with a base for the child that effectively preinstalls the desired abstract module with its prerequisites on top of the Docker base.
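Concretely, the one-line Dockerfile change would look roughly like this (the image tag is illustrative):

```dockerfile
# Before:
# FROM python:3.7-slim as base

# After: base the child connector on the published abstract connector
# image, which is itself built FROM python:3.7-slim and already has
# source-files-abstract and its prerequisites installed:
FROM airbyte/source-files-abstract:latest as base
```

Everything below that line in the child's Dockerfile stays the same.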

Pros:

  • Easy DX.
    • local development works with the one-line symlink addition to requirements (as Sherif describes in the initial issue comment)
    • docker images / gradle commands will work with the one-line change in the Dockerfile.
  • No need to change any wider build tooling.

Cons:

  • Limited.
    • not sure this could handle multiple inheritance (haven't really tested, may be possible to base off multiple other images?)
  • Requires prerequisite to be published as Docker image.
    • not necessarily a problem so much as a dissonance from the way the cdk is published & used.

@sherifnada wdyt?

@Phlair
Contributor

Phlair commented Apr 11, 2022

After some more tinkering, I think I can extend the above idea into a fully functional solution. Longer write-up coming asap.

@sherifnada
Contributor Author

@Phlair that sounds very intriguing and definitely could work (I believe we used this for the forked file connector). Do you want to propose a spike/experiment for this in backlog grooming tomorrow?

@Phlair
Contributor

Phlair commented Apr 11, 2022

Yeah for sure!

I'll write up where I'm at with it so far in this doc before grooming.

@Phlair
Contributor

Phlair commented Apr 14, 2022

@Phlair Phlair closed this as completed Apr 14, 2022
Repository owner moved this from In Progress - current sprint to Done in API Sources DX Roadmap Apr 14, 2022
@sherifnada sherifnada moved this from Done to Blocked in API Sources DX Roadmap Apr 14, 2022
@sherifnada sherifnada moved this from Blocked to Archive (Completed in previous sprints) in API Sources DX Roadmap Apr 14, 2022