feat(Backend + SDK): Update kfp backend and kubernetes sdk to support EmptyDir #10913
Hi @gregsheremeta. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
kubernetes_platform/python/test/snapshot/data/empty_dir_mounts.py
/ok-to-test
Thanks for tackling this! It's an important gap between v1 and v2.
@chensun can I ask for your review, or can you recommend someone to review? Thanks.
@gregsheremeta I can help with the backend review, but for the SDK we have only @connor-mccarthy. I can review both and see how we can proceed with approvals.
Thanks. For what it's worth, the SDK piece is fairly simple, and Chen gave a preliminary LGTM over in #10892. It follows the (already approved and merged) example set in #10410. The key thing to review there is that I included the important fields. When you do an emptyDir mount in Kubernetes, there are four pieces of information to specify (ref: https://kubernetes.io/docs/reference/kubernetes-api/config-and-storage-resources/volume/#local-temporary-directory): name, mount path, medium (optional), and size limit (optional).
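For readers following along, here is a minimal sketch of how those four fields map onto the `volumes` and `volumeMounts` entries of a pod spec. The helper name `empty_dir_fragments` and its parameter names are hypothetical and chosen only for illustration; this is not the SDK or driver code from this PR. Only the Kubernetes field names (`emptyDir`, `medium`, `sizeLimit`, `mountPath`) come from the Kubernetes API reference linked above.

```python
from typing import Any, Dict, Optional, Tuple


def empty_dir_fragments(
    volume_name: str,
    mount_path: str,
    medium: Optional[str] = None,
    size_limit: Optional[str] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build (volume, volume_mount) dicts shaped like pod spec entries.

    Hypothetical helper for illustration only; not part of kfp.
    """
    # Optional fields are left out entirely when unset, so the cluster
    # applies its own defaults (node-backed storage, no size limit).
    empty_dir: Dict[str, str] = {}
    if medium is not None:
        empty_dir["medium"] = medium
    if size_limit is not None:
        empty_dir["sizeLimit"] = size_limit

    # The volume and its mount are tied together by the shared name.
    volume = {"name": volume_name, "emptyDir": empty_dir}
    volume_mount = {"name": volume_name, "mountPath": mount_path}
    return volume, volume_mount


# Example: a memory-backed scratch volume capped at 1Gi.
volume, mount = empty_dir_fragments(
    "scratch", "/mnt/scratch", medium="Memory", size_limit="1Gi"
)
```

Passing only the name and mount path yields an empty `emptyDir: {}` entry, which is the plain form of the volume with both optional fields omitted.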
actually, you may have been referring to the code in
The failing tests are expected. When #10892 is merged, I'll rebase this, and then the tests will work. @chensun @james-jwu @zijianjoy @connor-mccarthy can I ask for your review, or can you recommend someone to review? Thanks.
/rerun-all
kubernetes_platform/python/test/snapshot/data/empty_dir_mounts.yaml
/lgtm
@chensun @connor-mccarthy I am not sure why these tests are failing, but I would appreciate your review of this feature, as not being able to mount If you are wondering why this is important, it's because PyTorch expects there to be a tmpfs mounted at
@gregsheremeta FYI all the GitHub Actions tests are mandatory. Some of the Prow ones may be optional. You can check which of them are optional here.
… EmptyDir Update kfp backend and kubernetes sdk to support mounting EmptyDir volumes to task pods. Inspired by kubeflow#10427 Fixes: kubeflow#10656 Signed-off-by: Greg Sheremeta <[email protected]>
Added some nitpick error handling and logging suggestions for the driver code. The Kubernetes Platform / SDK code and the corresponding tests LGTM.
/lgtm
/lgtm
/lgtm Thanks!
@chensun this is ready for your final review / merge. Can you take a look?
@chensun bumping :)
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chensun, HumairAK
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Congrats on the merge, @gregsheremeta. Thank you for tackling this meaningful contribution! 🔥
This should now be available in KFP 2.3.0 / SDK 2.9.0.
… EmptyDir (#10913) Update kfp backend and kubernetes sdk to support mounting EmptyDir volumes to task pods. Inspired by #10427 Fixes: #10656 Signed-off-by: Greg Sheremeta <[email protected]> Signed-off-by: KevinGrantLee <[email protected]>
Description of your changes:
Update kfp backend and kubernetes sdk to support mounting EmptyDir volumes to task pods.
(Based on #10892; needs a rebase after that PR is merged.)
Inspired by #10427
Fixes: #10656
Checklist: