Simplify e2e tests #33

Merged (16 commits) on Aug 25, 2023
22 changes: 9 additions & 13 deletions Makefile
Collaborator: I'm happy with the changes; I just noticed that the `test` task isn't documented in this Makefile.

Reply: I might have missed this during the rebase. @roberta-dt, could you please take a look?

@@ -50,7 +50,7 @@ compile: ## Compile the pipeline to pipeline.yaml. Must specify pipeline=<traini
@cd pipelines/src && \
poetry run kfp dsl compile --py pipelines/${pipeline}/pipeline.py --output pipelines/${pipeline}/pipeline.yaml --function pipeline

targets ?= "training serving"
targets ?= training serving
build: ## Build and push training and/or serving container(s) image using Docker. Specify targets=<training serving> e.g. targets=training or targets="training serving" (default)
@cd model && \
for target in $$targets ; do \
@@ -63,17 +63,18 @@ build: ## Build and push training and/or serving container(s) image using Docker
done
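The change from `targets ?= "training serving"` to `targets ?= training serving` matters because the recipe's `for target in $$targets` loop depends on shell word splitting, which a quoted value defeats. A minimal sketch in plain shell (illustrative only, not code from the repo):

```shell
# With an unquoted expansion the shell splits the value on whitespace,
# so the loop body runs once per target word:
targets="training serving"
for target in $targets; do
  echo "building: $target"
done
# -> building: training
#    building: serving
```

A quoted expansion (`for target in "$targets"`) would instead yield a single iteration with the literal value `training serving`.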


compile ?=true
build ?= true
run: ## Compile or build pipeline and run pipeline in sandbox environment. Set compile=false to skip recompiling the pipeline and set build=false to skip rebuilding container images
@if [ "${compile}" ]; then \
compile ?= true
build ?= true
wait ?= false
run: ## Run pipeline in sandbox environment. Must specify pipeline=<training|prediction>. Optionally specify ENABLE_PIPELINE_CACHING=<true|false> (defaults to default Vertex caching behaviour) and wait=<true|false> (default = false). Set compile=false to skip recompiling the pipeline and set build=false to skip rebuilding container images
@if [ $(compile) = "true" ]; then \
$(MAKE) compile ; \
fi && \
if [ "${build}" ]; then \
if [ $(build) = "true" ]; then \
$(MAKE) build ; \
fi && \
cd pipelines/src \
poetry run python -m pipelines.utils.trigger_pipeline --template_path=pipelines/${pipeline}/pipeline.yaml --display_name=${pipeline}
cd pipelines/src && \
poetry run python -m pipelines.utils.trigger_pipeline --template_path=pipelines/${pipeline}/pipeline.yaml --display_name=${pipeline} --wait=${wait}


test: ## Run unit tests for a component group or for all component groups and the pipeline trigger code.
@@ -93,8 +94,3 @@ test: ## Run unit tests for a component group or for all component groups and th
cd ../.. ;\
done ; \
fi


e2e-tests: ## Perform end-to-end (E2E) pipeline tests. Must specify pipeline=<training|prediction>. Optionally specify ENABLE_PIPELINE_CACHING=<true|false> (defaults to default Vertex caching behaviour).
@ cd pipelines && \
poetry run pytest --log-cli-level=INFO tests/$(pipeline)
19 changes: 6 additions & 13 deletions README.md
@@ -177,13 +177,14 @@ make build target=serving
You can run the training pipeline (for example) with:

```bash
make run pipeline=training
make run pipeline=training [ wait=<true|false> ]
```

This will execute the pipeline using the chosen template on Vertex AI, namely it will:

1. Compile the pipeline using the Kubeflow Pipelines SDK
1. Trigger the pipeline with the help of `pipelines/trigger/main.py`
1. (optional) Wait for pipeline to finish before returning if `wait` is set to `true` (default is false)
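For example, a run that skips recompiling and rebuilding but blocks until the pipeline finishes (using the flags documented in the Makefile):

```bash
make run pipeline=training compile=false build=false wait=true
```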

#### Pipeline input parameters

@@ -193,8 +194,8 @@ When triggering ad hoc runs in your dev/sandbox environment, or when running the

## Testing

Unit tests and end-to-end (E2E) pipeline tests are performed using [pytest](https://docs.pytest.org).
The unit tests for custom KFP components are run on each pull request, as well as the E2E tests. To run them on your local machine:
Unit tests are performed using [pytest](https://docs.pytest.org).
The unit tests for custom KFP components are run on each pull request. To run them on your local machine:

```
make test
@@ -205,12 +206,6 @@ Alternatively, only test one of the component groups by running:
make test GROUP=vertex-components
```

To run end-to-end tests of a single pipeline, you can use:

```
make e2e-tests pipeline=<training|prediction>
```

There are also unit tests for the utility scripts in [pipelines/src/pipelines/utils](/pipelines/src/pipelines/utils/). To run them on your local machine:

```
@@ -241,14 +236,12 @@ vertex-pipelines-end-to-end-samples

Make sure that you give the ML pipeline a unique name in the `@pipeline` decorator.

To run your pipeline, use `make run` as before:
To run your pipeline, use `make run` as before (optionally adding the `wait` parameter to wait until the pipeline has finished before returning; defaults to false):

```bash
make run pipeline=your_new_pipeline
make run pipeline=your_new_pipeline [ wait=<true|false> ]
```

You will also need to add an E2E test - copy and paste the `training` or `prediction` example in [pipelines/tests/](/pipelines/tests/).

Some of the scripts e.g. CI/CD pipelines assume only a training and prediction pipeline. You will need to adapt these to add in the compile, run and upload steps for your new pipeline in [cloudbuild/pr-checks.yaml](/cloudbuild/pr-checks.yaml), [cloudbuild/e2e-test.yaml](/cloudbuild/e2e-test.yaml) and [cloudbuild/release.yaml](/cloudbuild/release.yaml).

### Scheduling pipelines
4 changes: 2 additions & 2 deletions cloudbuild/e2e-test.yaml
@@ -45,8 +45,8 @@ steps:
curl -sSL https://install.python-poetry.org | python3 - && \
export PATH="/builder/home/.local/bin:$$PATH" && \
make install && \
make e2e-tests pipeline=training && \
make e2e-tests pipeline=prediction
make run pipeline=training build=false wait=true && \
make run pipeline=prediction build=false wait=true
Comment on lines +48 to +49
Collaborator: @browningjp-datatonic why don't we build containers when running e2e tests?

Reply: The containers are built in the previous pipeline step. We pass `build=false` here because `make build` would create a separate Cloud Build job, which fails in this case because it relies on the `gcloud` command, which is not available in the CI container image.

env:
- ENABLE_PIPELINE_CACHING=${_TEST_ENABLE_PIPELINE_CACHING}
- VERTEX_LOCATION=${_TEST_VERTEX_LOCATION}
2 changes: 1 addition & 1 deletion docs/PRODUCTION.md
@@ -38,7 +38,7 @@ When you open the Pull Request, the CI pipeline (`pr-checks.yaml`) should be tri

| :bulb: Remember |
|:-------------------|
| Make sure to update any unit tests and end-to-end tests in line with your changes to the pipelines |
| Make sure to update any unit tests in line with your changes to the pipelines |

| :exclamation: IMPORTANT |
|:---------------------------|
18 changes: 18 additions & 0 deletions pipelines/src/pipelines/utils/trigger_pipeline.py
@@ -20,12 +20,14 @@
def trigger_pipeline(
    template_path: str,
    display_name: str,
    wait: bool = False,
) -> aiplatform.PipelineJob:
    """Trigger a Vertex Pipeline run from a (local) compiled pipeline definition.

    Args:
        template_path (str): file path to the compiled YAML pipeline
        display_name (str): Display name to use for the PipelineJob
        wait (bool): Wait for the pipeline to finish running

    Returns:
        aiplatform.PipelineJob: the Vertex PipelineJob object
@@ -67,6 +69,10 @@ def trigger_pipeline(
        network=network,
    )

    if wait:
        # Wait for pipeline to finish running before returning
        pl.wait()

    return pl


@@ -84,10 +90,22 @@ def trigger_pipeline(
type=str,
)

parser.add_argument(
    "--wait",
    help="Wait for the pipeline to finish running",
    type=str,
)
# Get commandline args
args = parser.parse_args()

if args.wait.lower() == "true":
    wait = True
elif args.wait.lower() != "false":
    raise ValueError("wait variable must be 'true' or 'false'")
else:
    wait = False
Collaborator: @becky-dt can you please check for similar logic elsewhere in this repo? Would like to follow the same logic, or even import a reusable function for parsing boolean flags from the command line. Thanks!

Author: @felix-datatonic There are a couple more booleans in the Makefile, but in those cases the logic all runs in the Makefile (unlike above, where the wait param is passed into the Python function).
Do we want to add similar logic in the Makefile too to catch invalid values? And if so, is there a good way to raise errors in bash?
(Let me know if this doesn't make sense and needs more explaining!)
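On the question of raising errors in bash: the usual pattern is to exit non-zero, since Make aborts a target when a recipe command fails. A sketch under that assumption (the `check_bool` helper is hypothetical, not code from this repo):

```shell
# Hypothetical helper: validate a true/false flag inside a shell recipe.
# Returning non-zero is how a recipe "raises" an error; Make then fails
# the target.
check_bool() {
  case "$1" in
    true|false) return 0 ;;
    *) echo "Error: expected 'true' or 'false', got '$1'" >&2; return 1 ;;
  esac
}

check_bool "true" && echo "valid"
```

In a Makefile recipe the equivalent inline form would end with `|| exit 1` so the target stops on an invalid value.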

Author: @roberta-dt and I added this logic to the Makefile in the latest commit :)

Collaborator: @browningjp-datatonic can you please take a look at the change we made to the run command?

Reply: The other place where similar logic is used in Python is for ENABLE_PIPELINE_CACHING; however, this is a bit different, because it's not just True/False but True/False/None (None default).

The compile=true and build=true flags are implemented in the Makefile itself, as you say, and it's not simple to replicate the logic exactly.

For consistency in behaviour, I would suggest modifying your Python logic here to:

    if args.wait == "true":
        wait = True
    elif args.wait != "false":
        raise ValueError("wait variable must be 'true' or 'false'")
    else:
        wait = False

(value from user must be exactly true or false - case sensitive)
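A reusable helper of the kind this thread asks for might look like the following; this is a sketch only, and the name `parse_bool_flag` is hypothetical rather than taken from the repo:

```python
def parse_bool_flag(value: str, flag_name: str = "flag") -> bool:
    """Parse a command-line flag that must be exactly 'true' or 'false'.

    Mirrors the case-sensitive behaviour suggested in the review comment.
    """
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"{flag_name} must be 'true' or 'false'")
```

The `__main__` block could then reduce to a single line such as `wait = parse_bool_flag(args.wait, "wait")`.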


trigger_pipeline(
    template_path=args.template_path,
    display_name=args.display_name,
    wait=wait,
)