Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PipelineRun with embedded pipelineSpec and taskSpec #166

Merged
merged 3 commits into from
Jun 24, 2020

Conversation

ckadner
Copy link
Member

@ckadner ckadner commented Jun 1, 2020

Which issue is resolved by this Pull Request:
Resolves #143

Description of your changes:
Convert the YAML produced by the kfp-tekton compiler:

  • from multiple documents (Conditions, Tasks, Pipeline, PipelineRun) into
  • into a single PipelineRun with embedded pipelineSpec and taskSpecs

This will simplify the required work for down-stream integration with the KFP API and KFP UI.

Outstanding Work:
TBD

Environment tested:

  • Python Version (use python --version): 3.7.5
  • Tekton Version (use tkn version): 0.11.0
  • Kubernetes Version (use kubectl version): 0.15
  • OS (e.g. from /etc/os-release): Mac OS

Test cases that still need to be fixed (6/15/2020):

  • big_data_passing.yaml
  • condition.yaml
    • missing super-condition, solved by kubectl -n kubeflow apply -f https://raw.githubusercontent.com/kubeflow/kfp-tekton-backend/master/manifests/kustomize/base/tekton/catalog-condition.yaml
  • input_artifact_raw_value.yaml
    • unknown field "artifacts"
  • katib.yaml
  • pipelineparams.yaml
    • [download : main] sh: results.downloaded-resultoutput.path: not found
  • pipeline_transformers.yaml
    • missing task/pod annotations
  • resourceop_basic.yaml
    • invalid input params: missing values for these params which have no default values: [success-condition failure-condition]
  • volume_op.yaml
    • invalid input params: missing values for these params which have no default values: [success-condition failure-condition]
  • volume_snapshot_op.yaml

Issues to track the outstanding fixes:

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

Review Jupyter notebook visual diffs & provide feedback on notebooks.


Powered by ReviewNB

@kubeflow-bot
Copy link

This change is Reviewable

@ckadner ckadner force-pushed the issue_143_embedded_pipelineSpec branch from f4f64c7 to bd23726 Compare June 1, 2020 22:47


def fix_big_data_passing(workflow: List[Dict[Text, Any]]) -> List[Dict[Text, Any]]: # Tekton change signature
def fix_big_data_passing(workflow: dict) -> dict:
Copy link
Member

@fenglixa fenglixa Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @ckadner , could you please don't change the type of input parma for fix_big_data_passing? as I already hanled it in another PR with List type, and fixed big data params in pipelinerun as well. Thanks ckadner!

'metadata': {
'name': pipeline.name or 'Pipeline',
'annotations': {
'name': sanitize_k8s_name(pipeline.name or 'Pipeline', suffix_space=4) + '-run',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We no longer need the -run at the end because there won't by any duplicated pipeline name in the embedded spec.

'name': pipeline.name or 'Pipeline',
'annotations': {
'name': sanitize_k8s_name(pipeline.name or 'Pipeline', suffix_space=4) + '-run',
'annotations': { # TODO: do annotations apply to PipelineRun?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, annotations in pipelinerun do apply to all tasks which is the same behavior.

@Tomcli
Copy link
Member

Tomcli commented Jun 5, 2020

I will take a deeper look with the api and see any errors within the integration.

for image_pull_secret in pipeline.conf.image_pull_secrets:
service_template['imagePullSecrets'] = [{'name': image_pull_secret.name}]

# TODO: service_template
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today the image secret pull PR is finally merged in Tekton, you can ignore this for now and I can create a separate PR to fix it with the new implementation.

@ckadner
Copy link
Member Author

ckadner commented Jun 5, 2020

tektoncd/pipeline #2767: Unsupported features for embedded pipelineSpec, taskSpec?

Copy link
Member

@Tomcli Tomcli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add comments to the missing parts as I'm running down all the tests.

params:
- name: create-pvc-name
value: $(tasks.create-pvc.results.name)
taskSpec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The volumes from task spec are missing.

tasks:
- name: download
params: []
taskSpec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, missing volumes for taskSpec.

tasks:
- name: download
params: []
taskSpec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sidecar spec is missing in this taskSpec

params: []
taskRef:
name: print-2
tasks:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this transformer example is one of the missing features for task labels/annotations

@ckadner
Copy link
Member Author

ckadner commented Jun 9, 2020

Thanks @Tomcli -- I will need to make more changes to address the missing pieces.

@ckadner
Copy link
Member Author

ckadner commented Jun 9, 2020

/hold -- wait for merge of Add end-to-end compiler tests #174 so we can verify the changes end-to-end

@ckadner ckadner force-pushed the issue_143_embedded_pipelineSpec branch 2 times, most recently from 3bf6234 to 08ef00b Compare June 16, 2020 03:30
@ckadner ckadner requested a review from fenglixa June 16, 2020 03:31
@ckadner
Copy link
Member Author

ckadner commented Jun 16, 2020

@fenglixa -- I tried to update your code in _data_passing_rewriter.py but I must have missed a few places or tripped up over one of the many mapping structures. Some of those mappings are no longer needed (i.e. embedded Tasks names) some need to find their targets in other sub-structures now.

Would you mind taking a look? If this requires a lot more effort than a just a few more small-ish changes, it might take a separate PR after this one gets merged (pending other fixes here).

@ckadner ckadner force-pushed the issue_143_embedded_pipelineSpec branch 2 times, most recently from 21c44ce to 7ac1ca4 Compare June 16, 2020 04:04
workflow = copy.deepcopy(workflow)

tasks = workflow["spec"]["pipelineSpec"]["tasks"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tasks = workflow["spec"]["pipelineSpec"]["tasks"]
tasks = workflow["spec"]["pipelineSpec"].get('tasks', []) + workflow["spec"]["pipelineSpec"].get('finally', [])

template for template in workflow if template['kind'] == 'PipelineRun'
]
# TODO: why multiple pipelineRuns
pipelinerun_template = workflow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to consider the cases of multiple pipelines or pipelineruns in one yaml file? I know it's not the case of current version, but do we need to leave the interfaces for future intergration? why close it?

@fenglixa
Copy link
Member

Hello @ckadner, I am sorry, I don't think such more code diff is needed for big data passing, even though the workflow structure has changed in this PR. Why?

  1. In my opinion, the main changes of this PR is integrate the tasks, pipeline, and pipelinerun together from list to a dict. So, in _data_passing_rewriter, we just need to know which one is task template, pipeline and pipelinerun. Then should be no much code diff inside of _data_passing_rewriter.py.

  2. There is not task name in pipelineSpec['tasks']['taskSpec'], so the task name should be used as pipelineSpec['tasks']['name'], same as the tasks defined for pipeline.

  3. task['taskRef']['name'] should be replaced to task['name'], etc...

After all, big data passing is more complex, not even kfp-tekton, but also in kfp, a little changes may lead to a big modification.

To make this PR go ahead firstly, I think you can ignore big data part, and log another issue to me, I will handle it after your PR commited.

Thanks.
BR
-Feng

@ckadner
Copy link
Member Author

ckadner commented Jun 16, 2020

  1. In my opinion, the main changes of this PR is integrate the tasks, pipeline, and pipelinerun together from list to a dict. So, in _data_passing_rewriter, we just need to know which one is task template, pipeline and pipelinerun. Then should be no much code diff inside of _data_passing_rewriter.py.

Yes, that is what I attempted :-)

  1. There is not task name in pipelineSpec['tasks']['taskSpec'], so the task name should be used as pipelineSpec['tasks']['name'], same as the tasks defined for pipeline.

I think I did that, no?

  1. task['taskRef']['name'] should be replaced to task['name'], etc...

Yup, ... thought I did.

I think where I tripped up, or where I would need much more time is with all of the mapping data structures. Do you want me to keep my changes and you continue, or would you like me to revert everything in _data_passing_rewriter for the scope of this PR?

@fenglixa
Copy link
Member

Do you want me to keep my changes and you continue, or would you like me to revert everything in _data_passing_rewriter for the scope of this PR?

Yes, you could keep your changes, and I will continue on it. Thanks @ckadner

@ckadner ckadner force-pushed the issue_143_embedded_pipelineSpec branch from 7ac1ca4 to 24e57bc Compare June 16, 2020 21:06
@ckadner
Copy link
Member Author

ckadner commented Jun 16, 2020

@Tomcli @animeshsingh

there are some test cases that still need to be fixed. Feng agreed to tackle the big_data_passing in a subsequent PR. I am still trying to fix the others but we could opt to do those in (a) separate PR(s) as well:

  • big_data_passing.yaml (@fenglixa agreed to tackle this in a subsequent PR)
  • condition.yaml (may need help from @drewbutlerbb4 @Tomcli)
    • missing super-condition
    • kubectl -n kubeflow apply -f https://raw.githubusercontent.com/kubeflow/kfp-tekton-backend/master/manifests/kustomize/base/tekton/catalog-condition.yaml
  • input_artifact_raw_value.yaml (might depend on big_data_passing fix)
    • unknown field "artifacts" ... missing stepTemplate with volumeMounts
  • pipelineparams.yaml (might depend on big_data_passing fix)
    • [download : main] sh: results.downloaded-resultoutput.path: not found
  • pipeline_transformers.yaml (Tekton dependency)
    • missing task/pod annotations

@Tomcli
Copy link
Member

Tomcli commented Jun 17, 2020

Thanks @ckadner
/lgtm

@ckadner
Copy link
Member Author

ckadner commented Jun 18, 2020

Thanks for your review and fixes @Tomcli

@animeshsingh -- if you approve to merge this PR, then @fenglixa can start working on the subsequent fixes for big_data_passing (#189)

@animeshsingh
Copy link
Collaborator

@fenglixa just want to get a sense of timing on your side for the related fixes, given big data fix is critical to watson - before i push this in master, since this will break the master for big data fix for a while.

@fenglixa
Copy link
Member

@fenglixa just want to get a sense of timing on your side for the related fixes, given big data fix is critical to watson - before i push this in master, since this will break the master for big data fix for a while.

Consider to the watson part, let me start working on #189 right now, with the trunk of: https://github.com/ckadner/kfp-tekton/tree/issue_143_embedded_pipelineSpec

So let's commit code of this PR with my fixes together later.

Due to the influence of Chinese Dragon Boat Festival these days, I may need 3-5 natural days to finish it. Let me know if you have any concern on it. Thanks.

@animeshsingh
Copy link
Collaborator

thanks - to clarify @fenglixa you are recommending to hold this until your PR comes by the end of the week, or merge it now?

@animeshsingh
Copy link
Collaborator

@ckadner a conflict on this one as well

@ckadner ckadner force-pushed the issue_143_embedded_pipelineSpec branch from e33af01 to 71823e2 Compare June 23, 2020 02:28
@k8s-ci-robot k8s-ci-robot removed the lgtm label Jun 23, 2020
@ckadner
Copy link
Member Author

ckadner commented Jun 23, 2020

@ckadner a conflict on this one as well

Rebased.

@fenglixa
Copy link
Member

@ckadner @animeshsingh ,
PR submitted for big data part modification.
with my self unit test, PR #194 + PR #166, then all the cases should be OK now, including big data. See details of my test result from PR #194

@ckadner, let's commit these 2 PRs now, to support both pipelinerun and bigdata.
Additional, PR #194 changed following 6 files. To avoid any conflicts during merging, it's better you can revert these files in this PR, (it's also OK if you don't want do it, then we may need to resolve conflicts later).

	modified:   sdk/python/kfp_tekton/compiler/_data_passing_rewriter.py
	modified:   sdk/python/tests/compiler/testdata/big_data_passing.py
	modified:   sdk/python/tests/compiler/testdata/big_data_passing.yaml
	modified:   sdk/python/tests/compiler/testdata/input_artifact_raw_value.yaml
	modified:   sdk/python/tests/compiler/testdata/katib.yaml
	modified:   sdk/python/tests/compiler/testdata/pipelineparams.yaml

@animeshsingh
Copy link
Collaborator

@ckadner vis a vis the 6 files mentioned above - what do you want to do?

@ckadner
Copy link
Member Author

ckadner commented Jun 23, 2020

@ckadner vis a vis the 6 files mentioned above - what do you want to do?

I will keep them since they reflect the conversion from individual Task, Pipeline to all-inclusice PipelineRun.

@Tomcli
Copy link
Member

Tomcli commented Jun 23, 2020

/lgtm
Although Feng's PR probably needs to rebase after we merged this PR.

@ckadner
Copy link
Member Author

ckadner commented Jun 23, 2020

/lgtm
Although Feng's PR probably needs to rebase after we merged this PR.

yes. but that should not be a big deal. the testdata gets regenerated and the bigdatafix builds on top of what I had done, and if Feng has no patience he can just override it :-)

@ckadner
Copy link
Member Author

ckadner commented Jun 23, 2020

oh, and, if I were to take those files out, Travis would complain ;-)

@animeshsingh
Copy link
Collaborator

/approve

thanks @ckadner for driving this, and @Tomcli @fenglixa for your help.
cc @Udiknedormin

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: animeshsingh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 196b822 into kubeflow:master Jun 24, 2020
k8s-ci-robot pushed a commit that referenced this pull request Jun 24, 2020
* further big_data_passing fixes after PR #166

* further big_data_passing fixes after PR #166

* address comments from reviewers

* Address comments form Tomcli
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generate PipelineRun with embedded pipelineSpec and taskSpec
6 participants