
Fix #733 the pipeline parameter consumed as file #740

Merged

merged 1 commit into kubeflow:master from huixa:fix_data_passing on Sep 29, 2021

Conversation

@huixa (Contributor) commented Sep 27, 2021

Which issue is resolved by this Pull Request:
Resolves #733

Description of your changes:
Remove the limitation that prevents a pipeline parameter from being consumed as a file.
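
For context, a minimal sketch of the pattern this change unblocks, assuming the KFP v1 SDK (component and pipeline names below are illustrative, not the attached test cases):

import kfp.dsl as dsl
from kfp.components import InputPath, create_component_from_func


def print_text(text_path: InputPath(str)):
    # The component consumes its input as a file on disk, not as a value.
    with open(text_path) as f:
        print(f.read())


print_text_op = create_component_from_func(print_text)


@dsl.pipeline(name='pipeline-param-as-file')
def pipeline_param_as_file(text: str = 'hello world'):
    # Passing a pipeline parameter to a file-typed input previously failed to
    # generate the step that writes the parameter value to a file (#733).
    print_text_op(text)


if __name__ == '__main__':
    from kfp_tekton.compiler import TektonCompiler
    TektonCompiler().compile(pipeline_param_as_file, 'pipeline_param_as_file.yaml')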

Environment tested:

  • Python Version (use python --version): Python 3.8.6
  • Tekton Version (use tkn version):
  • Kubernetes Version (use kubectl version): v1.21.1
  • OS (e.g. from /etc/os-release): RHEL8.4

Checklist:

google-cla bot commented Sep 27, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



@huixa (Contributor, Author) commented Sep 27, 2021

@pugangxa I created this PR to try to resolve #733. Running the case in the attachments, it can now add a step that writes the pipeline parameter content to a file, which is then consumed as a file. But because _data_passing_rewriter.py is shared with the other data-passing cases, there may be some impact, so please help guide me on this PR. Thanks.
1.py.txt
1.yaml.txt

@pugangxa (Contributor)

I took a look, and I think the root cause is that the parameter actually comes from a pipeline parameter rather than from the dependent tasks. So what about matching the pipeline parameter and adding the copy steps, like this:

diff --git a/sdk/python/kfp_tekton/compiler/_data_passing_rewriter.py b/sdk/python/kfp_tekton/compiler/_data_passing_rewriter.py
index 5bd8399..22ccbaa 100644
--- a/sdk/python/kfp_tekton/compiler/_data_passing_rewriter.py
+++ b/sdk/python/kfp_tekton/compiler/_data_passing_rewriter.py
@@ -17,6 +17,7 @@ import json
 import re

 from typing import List, Optional, Set

 from kfp_tekton.compiler._k8s_helper import sanitize_k8s_name
 from kfp_tekton.compiler._op_to_template import _get_base_step, _add_mount_path, _prepend_steps
@@ -300,7 +301,7 @@ def fix_big_data_passing(workflow: dict) -> dict:

     # Use workspaces to tasks if big data passing instead of 'results', 'copy-inputs'
     for task_template in container_templates:
-        task_template = big_data_passing_tasks(task_template,
+        task_template = big_data_passing_tasks(pipelinerun_name, task_template,
                                                pipelinerun_template,
                                                inputs_consumed_as_artifacts,
                                                outputs_consumed_as_artifacts)
@@ -470,7 +471,7 @@ def big_data_passing_pipelinerun(name: str, pr: dict, pw: set):
     return pr, prw


-def big_data_passing_tasks(task: dict, pipelinerun_template: dict,
+def big_data_passing_tasks(prname: str, task: dict, pipelinerun_template: dict,
                            inputs_tasks: set, outputs_tasks: set) -> dict:
     task_name = task.get('name')
     task_spec = task.get('taskSpec', {})
@@ -523,6 +524,10 @@ def big_data_passing_tasks(task: dict, pipelinerun_template: dict,
         if (task_name, task_artifact.get('name')) not in inputs_tasks:
             # add input artifact processes
             task = input_artifacts_tasks(task, task_artifact)
+        if (prname, task_artifact.get('name')) in inputs_tasks:
+            # add input artifact processes for pipeline parameter
+            task_artifact['raw']['data'] = pipelinerun_template['spec']['params'][0]['value']
+            task = input_artifacts_tasks(task, task_artifact)

     # Remove artifacts parameter from params
     task.get("taskSpec", {})['params'] = [

I tested that it can add the copy steps, but there are still some issues in the code:

  • Match the parameter values and add them to the copy task
  • Maybe don't change the function signature; instead, get prname from pipelinerun_template (see the sketch below)
  • Please test whether duplicate items get created when multiple pipeline parameters are consumed under the same name in task_artifact
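
For the second bullet, something like this could work; a sketch assuming the PipelineRun template carries its name under metadata.name, as Kubernetes manifests do:

def big_data_passing_tasks(task: dict, pipelinerun_template: dict,
                           inputs_tasks: set, outputs_tasks: set) -> dict:
    # Derive the PipelineRun name locally instead of threading it through
    # the call chain as a new parameter.
    prname = pipelinerun_template.get('metadata', {}).get('name', '')
    # ... the (prname, artifact_name) membership checks against inputs_tasks
    # from the diff above then work unchanged.
    return task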

Thanks


@Tomcli (Member) commented Sep 27, 2021

/ok-to-test

Review thread on the new code in _data_passing_rewriter.py:

@@ -537,6 +548,24 @@ def big_data_passing_tasks(task: dict, pipelinerun_template: dict,
     return task


 def input_artifacts_tasks1(template: dict, artifact: dict) -> dict:

@pugangxa (Contributor):

Can we manage to reuse input_artifacts_tasks? If that's too messy, we should at least rename it to something like input_artifacts_tasks_pr_params.

@huixa (Contributor, Author):

Thanks Gang. Yes, input_artifacts_tasks and input_artifacts_tasks1 have different behavior, so I will rename input_artifacts_tasks1.
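
For illustration, a rough sketch of what the renamed helper could look like, reusing _get_base_step, which _data_passing_rewriter.py already imports; the destination path, step wiring, and quoting here are assumptions, not the merged implementation:

from kfp_tekton.compiler._op_to_template import _get_base_step


def input_artifacts_tasks_pr_params(template: dict, artifact: dict) -> dict:
    # Prepend a step that writes the pipeline parameter's raw value to the
    # path where the consuming step expects a file (path layout assumed).
    dst = '/tmp/inputs/%s/data' % artifact.get('name')
    # _get_base_step is assumed to return a step dict with a 'script' field.
    copy_step = _get_base_step('copy-inputs')
    copy_step['script'] += 'mkdir -p %s\necho -n "%s" > %s\n' % (
        dst.rsplit('/', 1)[0], artifact['raw']['data'], dst)
    steps = template.get('taskSpec', {}).get('steps', [])
    # The real code could use _prepend_steps from the same module instead.
    template['taskSpec']['steps'] = [copy_step] + steps
    return template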

@pugangxa (Contributor)

Thanks very much @huixa for your hard work.

  • Please sign the CLA
  • Please add a test for this; you can simply reuse your Python file with some small changes
  • Please fix the lint error

@huixa (Contributor, Author) commented Sep 28, 2021

Thanks very much, Gang. I will follow up on these items.

@huixa (Contributor, Author) commented Sep 28, 2021

@googlebot I signed it!

@huixa (Contributor, Author) commented Sep 28, 2021

@Tomcli I have addressed all the comments; could you review again? Thanks.

@Tomcli (Member) commented Sep 28, 2021

Thanks @huixa. Can you run make unit_test GENERATE_GOLDEN_YAML=True to generate the test YAML? It will include the license header and format it correctly.

For the Python lint, you can run make lint to make sure there are no errors in the Python test files.

@huixa (Contributor, Author) commented Sep 29, 2021

Thanks @Tomcli. I have fixed these issues and run both commands successfully. Please review again.

HuiMac:kfp-tekton liuhui$ make unit_test GENERATE_GOLDEN_YAML=True
==================================================================
Optional environment variables to configure unit_test, examples:
  make unit_test GENERATE_GOLDEN_YAML=True
============================================
... ...
test_data_passing_pipeline_param_as_file (compiler.compiler_tests.TestTektonCompiler)
Test compiling a pipeline_param_as_file workflow. ... ok
... ...

----------------------------------------------------------------------
Ran 73 tests in 9.344s

OK
unit_test: OK
HuiMac:kfp-tekton liuhui$ make lint
lint: OK
HuiMac:kfp-tekton liuhui$ 
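
From the log above, the new compiler test presumably takes a shape like this (the helper _test_pipeline_workflow and the file paths are assumptions based on the existing test layout, not the committed code):

# Hypothetical placement inside sdk/python/tests/compiler/compiler_tests.py:
import unittest


class TestTektonCompiler(unittest.TestCase):

    def test_data_passing_pipeline_param_as_file(self):
        """Test compiling a pipeline_param_as_file workflow."""
        from .testdata.pipeline_param_as_file import pipeline_param_as_file
        self._test_pipeline_workflow(pipeline_param_as_file,
                                     'pipeline_param_as_file.yaml')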

@Tomcli (Member) commented Sep 29, 2021

thanks @huixa for your contributions.
/lgtm
/approve

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: huixa, Tomcli

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-robot google-oss-robot merged commit b5c907d into kubeflow:master Sep 29, 2021
@huixa huixa deleted the fix_data_passing branch September 30, 2021 03:17
@huixa huixa changed the title [Only review]Fix #733 the pipeline parameter consumed as file Fix #733 the pipeline parameter consumed as file Sep 30, 2021
@huixa (Contributor, Author) commented Sep 30, 2021

@pugangxa Thank you for helping me analyze the root cause.
@Tomcli Thank you for sharing the commands to resolve the unit test and Python lint issues.
Finally, thanks again to Gang and Tommy for this PR's review, guidance, and step-by-step help.

@pugangxa (Contributor) commented Oct 3, 2021

Congratulations! This is your first PR and a good starting point!

Successfully merging this pull request may close this issue:

Data passing doesn't work if a pipeline parameter is consumed as a file.