Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk): using component's pip_index_urls for Dockerfile generation. Fixes #8816 #8871

Merged
merged 7 commits into from
Mar 1, 2023

Conversation

b4sus
Copy link
Contributor

@b4sus b4sus commented Feb 20, 2023

As described in issue #8816, pip_index_urls can be used for pip install command inside of generated Dockerfile - handy for containerized python components.
The PR is reusing same code for generating the index url options string, which is used for lightweight python component.

@google-oss-prow
Copy link

Hi @b4sus. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Member

@connor-mccarthy connor-mccarthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@b4sus, this looks great! Thank you for the contribution.

Just a few small nitpicks. Could you also add a release note to ./sdk/RELEASE.md that describes this as a feature to support pip index_urls in Containerized Python Components?

sdk/python/kfp/cli/component.py Outdated Show resolved Hide resolved
if comp.pip_index_urls is not None:
pip_index_urls.extend(comp.pip_index_urls)
if len(pip_index_urls) > 0:
self._pip_index_urls = list(dict.fromkeys(pip_index_urls))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what's happening here? It seems like this converts a list to dict then back to list, but it's possible I'm missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little trick to get rid of possible duplicates preserving order (dict guarantees insertion order since 3.7, see here.
I will remove this as it is a bit too defensive and it is not done for any other list arguments I've seen - to be consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! Thank you for explaining and good thinking.

I actually think it makes sense to keep this deduping. I can imagine containerizing many components that each use the same single URL. We don't need to list that package registry multiple times. This is probably the most realistic use case, since combining the pip_index_urls of components that each use a different registry wouldn't really make sense.

And the order preserving is good, too. There's no logical total ordering across multiple components, but it's easy enough to preserve a partial ordering, so let's do it.

The entries in the requirements file (runtime-requirements.txt) are deduplicated (via set()) and have some of the same concerns/considerations that arise from combining the component parameters for multiple components, so there's one precedent for deduping.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the deduplication is actually necessary when more components in a file share the same pypi url, dockerfile needs it only once - what is I guess expected use-case - company having own pypi index.
I've put it back in :)
Preserving order is nice to have for unit tests, otherwise set would be fine (like in runtime-requirements case).


WORKDIR /usr/local/src/kfp/components
COPY runtime-requirements.txt runtime-requirements.txt
RUN pip install --index-url https://pypi.org/simple --trusted-host https://pypi.org/simple --extra-index-url https://example.com/pypi/simple --trusted-host https://example.com/pypi/simple --no-cache-dir -r runtime-requirements.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not your code; feel free to ignore) Looks like string joiner implementation results in two spaces between the first entry and the second (before the first --extra-index-url).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@connor-mccarthy
Copy link
Member

/ok-to-test

Copy link
Member

@connor-mccarthy connor-mccarthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thank you, @b4sus! This is a valuable contribution.

@connor-mccarthy
Copy link
Member

/remove-lgtm
/remove-approve

Just caught this: looks like the release notes entry is under 2.0.0-beta.13, which was already released yesterday. Can you bump the note to the "Current Version" section?

@google-oss-prow
Copy link

@b4sus: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
kubeflow-pipelines-sdk-execution-tests 0db8d2d link true /test kubeflow-pipelines-sdk-execution-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@connor-mccarthy
Copy link
Member

/lgtm
/approve

Thanks again, @b4sus!

@google-oss-prow google-oss-prow bot added the lgtm label Mar 1, 2023
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: connor-mccarthy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@connor-mccarthy connor-mccarthy merged commit ed66ba3 into kubeflow:master Mar 1, 2023
@b4sus b4sus deleted the feat-pip-urls-dockerfile branch March 2, 2023 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants