Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: pipeline trigger code + cloud function #30

Merged
14 commits merged into from
Aug 18, 2023
Merged

Conversation

ghost
Copy link

@ghost ghost commented Aug 7, 2023

Description

Separated the pipeline trigger code (for make run) from the Cloud Function

  • Cloud Function now located in terraform/modules/cloudfunction/src
  • Pipeline trigger code now located in pipelines/src/pipelines/utils/trigger_pipeline.py
  • Changed PipelineJob display_name option to no longer use the template_path - template_path might be too long in the case of Artifact Registry URIs
  • Added unit test for upload_pipeline
  • pr-checks.yaml CI pipeline now runs unit tests for the util scripts
  • trigger-tests.yaml CI pipeline now runs unit tests only for the Cloud Function
  • Terraform code updated
  • Docs updated
  • Makefile updated
  • E2E tests updated to use new trigger_pipeline function
  • Added a workaround to Specifying artifact regististry images with tags fails googleapis/python-aiplatform#2181 in the Cloud Function - now resolves the AR tag before passing to PipelineJob

How has this been tested?

  • New and existing unit tests
  • Modified make commands have been tested locally
  • E2E tests as part of CI
  • Manually test cloud function deployment and operation
  • test release.yaml CI/CD

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have successfully run the E2E tests
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated any relevant documentation to reflect my changes

@ghost ghost force-pushed the decouple_trigger_code branch 3 times, most recently from 854b2ec to 7351898 Compare August 7, 2023 17:42
@ghost
Copy link
Author

ghost commented Aug 7, 2023

/gcbrun

@ghost ghost force-pushed the decouple_trigger_code branch 2 times, most recently from b25aae5 to d84c7e7 Compare August 7, 2023 18:09
@ghost
Copy link
Author

ghost commented Aug 7, 2023

/gcbrun

@ghost ghost force-pushed the decouple_trigger_code branch from d84c7e7 to eab027f Compare August 7, 2023 18:10
@ghost
Copy link
Author

ghost commented Aug 7, 2023

/gcbrun

@ghost ghost force-pushed the decouple_trigger_code branch 7 times, most recently from bf1118e to 70dad7a Compare August 8, 2023 16:26
@ghost
Copy link
Author

ghost commented Aug 8, 2023

/gcbrun

@ghost ghost force-pushed the decouple_trigger_code branch from 70dad7a to dcaca24 Compare August 8, 2023 16:30
@ghost
Copy link
Author

ghost commented Aug 8, 2023

/gcbrun

@ghost ghost force-pushed the decouple_trigger_code branch from dcaca24 to e58afb1 Compare August 9, 2023 08:41
@ghost
Copy link
Author

ghost commented Aug 9, 2023

/gcbrun

@ghost ghost force-pushed the decouple_trigger_code branch from e58afb1 to f11a92d Compare August 9, 2023 08:49
@ghost
Copy link
Author

ghost commented Aug 9, 2023

/gcbrun

@ghost ghost force-pushed the decouple_trigger_code branch 2 times, most recently from e5d2766 to b973d85 Compare August 9, 2023 09:02
@ghost ghost marked this pull request as ready for review August 9, 2023 09:07
@ghost
Copy link
Author

ghost commented Aug 9, 2023

/gcbrun

@ghost ghost force-pushed the decouple_trigger_code branch from b973d85 to 2fd9d61 Compare August 9, 2023 09:34
Comment on lines 39 to 41
enable_caching = os.environ.get("enable_pipeline_caching", None)
if enable_caching:
enable_caching = bool(strtobool(enable_caching))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enable_caching = os.environ.get("enable_pipeline_caching", None)
if enable_caching:
enable_caching = bool(strtobool(enable_caching))
import os.environ as env
enable_caching = lower(env.get("enable_pipeline_caching")) in ["1", "true"]

also, I'd move this to ArgumentParser()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is not correct. If the env variable is not set, the result should be None, not False

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(equally lower(None) will cause an error)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about something like:

enable_caching = os.environ.get("enable_pipeline_caching")
if enable_caching is not None:
  enable_caching = lower(enable_caching) in ["1", "true"]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will work and default to False if enable_pipeline_caching isn't in the env:

Suggested change
enable_caching = os.environ.get("enable_pipeline_caching", None)
if enable_caching:
enable_caching = bool(strtobool(enable_caching))
enable_caching = lower(env.get("enable_pipeline_caching", "")) in ["1", "true"]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to default to None if not provided

pipeline_root = os.environ["VERTEX_PIPELINE_ROOT"]
service_account = os.environ["VERTEX_SA_EMAIL"]

enable_caching = os.environ.get("enable_pipeline_caching", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use upper case as we do for other env vars?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind - this is an env variable that we set when running the command (make run enable_pipeline_caching=true) rather than one picked up from env.sh

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use upper case to standardise. also, you can change the code to:

Suggested change
enable_caching = os.environ.get("enable_pipeline_caching", None)
enable_caching = os.environ.get("enable_pipeline_caching")

pipelines/src/pipelines/utils/upload_pipeline.py Outdated Show resolved Hide resolved
("true", True),
("False", False),
("false", False),
(None, None), # enable_pipeline_caching env var not set
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add "0" and "1"

@ghost
Copy link
Author

ghost commented Aug 14, 2023

/gcbrun

@ghost
Copy link
Author

ghost commented Aug 14, 2023

Looks like we might be blocked by this issue

@ghost
Copy link
Author

ghost commented Aug 14, 2023

/gcbrun

pipeline_root = os.environ["VERTEX_PIPELINE_ROOT"]
service_account = os.environ["VERTEX_SA_EMAIL"]

enable_caching = os.environ.get("enable_pipeline_caching", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use upper case to standardise. also, you can change the code to:

Suggested change
enable_caching = os.environ.get("enable_pipeline_caching", None)
enable_caching = os.environ.get("enable_pipeline_caching")

Comment on lines 39 to 41
enable_caching = os.environ.get("enable_pipeline_caching", None)
if enable_caching:
enable_caching = bool(strtobool(enable_caching))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will work and default to False if enable_pipeline_caching isn't in the env:

Suggested change
enable_caching = os.environ.get("enable_pipeline_caching", None)
if enable_caching:
enable_caching = bool(strtobool(enable_caching))
enable_caching = lower(env.get("enable_pipeline_caching", "")) in ["1", "true"]

pipelines/tests/utils/test_upload_pipeline.py Outdated Show resolved Hide resolved
Comment on lines +56 to +58
enable_caching = payload.get("enable_pipeline_caching")
if enable_caching:
enable_caching = bool(strtobool(enable_caching))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enable_caching = payload.get("enable_pipeline_caching")
if enable_caching:
enable_caching = bool(strtobool(enable_caching))
enable_caching = lower(payload.get("enable_pipeline_caching", "")) in ["1", "true"]

Comment on lines +61 to +62
encryption_spec_key_name = os.environ.get("VERTEX_CMEK_IDENTIFIER") or None
network = os.environ.get("VERTEX_NETWORK") or None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
encryption_spec_key_name = os.environ.get("VERTEX_CMEK_IDENTIFIER") or None
network = os.environ.get("VERTEX_NETWORK") or None
encryption_spec_key_name = os.environ.get("VERTEX_CMEK_IDENTIFIER")
network = os.environ.get("VERTEX_NETWORK")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or None is needed in case the env variable is set to an empty string -> becomes None instead

terraform/modules/cloudfunction/src/main.py Outdated Show resolved Hide resolved
terraform/modules/cloudfunction/src/requirements.txt Outdated Show resolved Hide resolved
terraform/modules/cloudfunction/test/test_main.py Outdated Show resolved Hide resolved
Co-authored-by: Felix Schaumann <[email protected]>
@ghost
Copy link
Author

ghost commented Aug 15, 2023

this will work and default to False if enable_pipeline_caching isn't in the env:

It needs to default to None if not provided

@ghost
Copy link
Author

ghost commented Aug 17, 2023

/gcbrun

@ghost
Copy link
Author

ghost commented Aug 17, 2023

/gcbrun

Copy link
Collaborator

@felix-datatonic felix-datatonic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ghost ghost merged commit d16d566 into develop Aug 18, 2023
@felix-datatonic felix-datatonic deleted the decouple_trigger_code branch November 13, 2023 17:10
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant