Merge pull request #14 from teamdatatonic/feat/combine_assets
feat: combine assets into a single folder
ariadnafer authored May 30, 2023
2 parents 3238801 + caeaa74 commit d2c7c28
Showing 12 changed files with 20 additions and 29 deletions.
8 changes: 4 additions & 4 deletions Makefile
@@ -69,12 +69,12 @@ test-all-components-coverage: ## Run tests with coverage
 	$(MAKE) test-components-coverage GROUP=$$(basename $$component_group) ; \
 	done
 
-sync-assets: ## Sync assets folder to GCS. Must specify pipeline=<training|prediction>
-	@if [ -d "./pipelines/src/pipelines/${PIPELINE_TEMPLATE}/$(pipeline)/assets/" ] ; then \
+sync-assets: ## Sync assets folder to GCS.
+	@if [ -d "./pipelines/assets/" ] ; then \
 	echo "Syncing assets to GCS" && \
-	gsutil -m rsync -r -d ./pipelines/src/pipelines/${PIPELINE_TEMPLATE}/$(pipeline)/assets ${PIPELINE_FILES_GCS_PATH}/$(pipeline)/assets ; \
+	gsutil -m rsync -r -d ./pipelines/assets ${PIPELINE_FILES_GCS_PATH}/assets ; \
 	else \
-	echo "No assets folder found for pipeline $(pipeline)" ; \
+	echo "No assets folder found" ; \
 	fi ;
 
 run: ## Compile pipeline, copy assets to GCS, and run pipeline in sandbox environment. Must specify pipeline=<training|prediction>. Optionally specify enable_pipeline_caching=<true|false> (defaults to default Vertex caching behaviour)
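With the per-pipeline argument gone, one target call now syncs everything. A minimal sketch of the new usage, assuming `PIPELINE_FILES_GCS_PATH` is exported (the bucket path below is hypothetical):

```bash
# Previously run once per pipeline (make sync-assets pipeline=training, ...);
# now a single call syncs the combined folder.
make sync-assets

# Manual equivalent of the target's recipe; -d also deletes remote files that
# no longer exist locally, so GCS ends up mirroring ./pipelines/assets exactly.
export PIPELINE_FILES_GCS_PATH=gs://example-bucket/pipeline-files   # hypothetical
gsutil -m rsync -r -d ./pipelines/assets "${PIPELINE_FILES_GCS_PATH}/assets"
```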
16 changes: 6 additions & 10 deletions README.md
@@ -178,8 +178,8 @@ When triggering ad hoc runs in your dev/sandbox environment, or when running the
 
 ### Assets
 
-In each pipeline folder, there is an `assets` directory (`pipelines/pipelines/<xgboost|tensorflow>/<training|prediction>/assets/`).
-This can be used for any additional files that may be needed during execution of the pipelines.
+The folder `pipelines/assets/` can be used for any additional files that may be needed during execution of the pipelines.
+Most importantly this can include your training scripts.
 This directory is rsync'd to Google Cloud Storage when running a pipeline in the sandbox environment or as part of the CD pipeline (see [CI/CD setup](cloudbuild/README.md)).
 
 ## Testing
@@ -243,14 +243,10 @@ Below is a diagram of how the files are published in each environment in the `e2
 ```
 . <-- GCS directory set by _PIPELINE_PUBLISH_GCS_PATH
 └── TAG_NAME or GIT COMMIT HASH <-- Git tag used for the release (release.yaml) OR git commit hash (e2e-test.yaml)
-    ├── prediction
-    │   ├── assets
-    │   │   └── some_useful_file.json
-    │   └── prediction.json <-- compiled prediction pipeline
-    └── training
-        ├── assets
-        │   └── training_task.py
-        └── training.json <-- compiled training pipeline
+    ├── training.json
+    ├── prediction.json
+    ├── assets
+    │   └── some_useful_file.json
 ```
 
 4. `terraform-plan.yaml` - Checks the Terraform configuration under `terraform/envs/<env>` (e.g. `terraform/envs/test`), and produces a summary of any proposed changes that will be applied on merge to the main branch.
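To make the new layout concrete: after a publish, the flattened structure in the diagram above could be verified with `gsutil ls`. The bucket path and tag here are hypothetical stand-ins:

```bash
# Inspect what a release publishes under its tag (hypothetical path and tag).
gsutil ls gs://example-bucket/releases/v1.2.3/
# Expected with the combined layout:
#   gs://example-bucket/releases/v1.2.3/training.json
#   gs://example-bucket/releases/v1.2.3/prediction.json
#   gs://example-bucket/releases/v1.2.3/assets/
```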
4 changes: 2 additions & 2 deletions cloudbuild/README.md
@@ -21,8 +21,8 @@ limitations under the License.
 There are five CI/CD pipelines
 
 1. `pr-checks.yaml` - runs pre-commit checks and unit tests on the custom KFP components, and checks that the ML pipelines (training and prediction) can compile.
-1. `e2e-test.yaml` - copies the "assets" folders to the chosen GCS destination (versioned by git commit hash) and runs end-to-end tests of the training and prediction pipeline.
-1. `release.yaml` - compiles training and prediction pipelines, then copies the compiled pipelines and their respective "assets" folders to the chosen GCS destination (versioned by git tag).
+1. `e2e-test.yaml` - copies the "assets" folder to the chosen GCS destination (versioned by git commit hash) and runs end-to-end tests of the training and prediction pipeline.
+1. `release.yaml` - compiles training and prediction pipelines, then copies the compiled pipelines and "assets" folder to the chosen GCS destination (versioned by git tag).
 1. `terraform-plan.yaml` - Checks the Terraform configuration under `terraform/envs/<env>` (e.g. `terraform/envs/test`), and produces a summary of any proposed changes that will be applied on merge to the main branch.
 1. `terraform-apply.yaml` - Applies the Terraform configuration under `terraform/envs/<env>` (e.g. `terraform/envs/test`).

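These configs normally run from Cloud Build triggers. As a rough sketch only, a manual run of the e2e config might look like the following; all substitution values are hypothetical, and `COMMIT_SHA` must be supplied by hand because it is only auto-populated for triggered builds:

```bash
gcloud builds submit . \
  --config=cloudbuild/e2e-test.yaml \
  --substitutions=COMMIT_SHA=$(git rev-parse --short HEAD),_PIPELINE_TEMPLATE=xgboost,_PIPELINE_PUBLISH_GCS_PATH=gs://example-bucket/e2e
```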
6 changes: 2 additions & 4 deletions cloudbuild/e2e-test.yaml
@@ -22,10 +22,8 @@ steps:
     args:
     - -c
     - |
-      mkdir -p ${COMMIT_SHA}/training/assets && \
-      mkdir -p ${COMMIT_SHA}/prediction/assets && \
-      cp -r pipelines/src/pipelines/${_PIPELINE_TEMPLATE}/training/assets ${COMMIT_SHA}/training/ && \
-      cp -r pipelines/src/pipelines/${_PIPELINE_TEMPLATE}/prediction/assets ${COMMIT_SHA}/prediction/ && \
+      mkdir -p ${COMMIT_SHA}/assets && \
+      cp -r pipelines/assets ${COMMIT_SHA} && \
       gsutil cp -r ${COMMIT_SHA} ${_PIPELINE_PUBLISH_GCS_PATH}/${COMMIT_SHA}
   # Install Python deps
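The simplified copy step is easy to sanity-check locally before it runs in Cloud Build; a minimal sketch, with a made-up value standing in for the `COMMIT_SHA` that Cloud Build injects:

```bash
COMMIT_SHA=abc1234   # stand-in; Cloud Build injects the real value
mkdir -p ${COMMIT_SHA}/assets
cp -r pipelines/assets ${COMMIT_SHA}
find ${COMMIT_SHA} -type d   # expect ${COMMIT_SHA} and ${COMMIT_SHA}/assets (plus any subfolders)
```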
11 changes: 4 additions & 7 deletions cloudbuild/release.yaml
@@ -21,7 +21,6 @@ steps:
     - -c
     - |
       make setup && \
-      make compile-all-components && \
       make compile-pipeline pipeline=training && \
       make compile-pipeline pipeline=prediction
     env:
@@ -35,12 +34,10 @@
     args:
     - -c
    - |
-      mkdir -p ${TAG_NAME}/training/assets && \
-      mkdir -p ${TAG_NAME}/prediction/assets && \
-      cp pipelines/training.json ${TAG_NAME}/training/training.json && \
-      cp pipelines/prediction.json ${TAG_NAME}/prediction/prediction.json && \
-      cp -r pipelines/pipelines/${_PIPELINE_TEMPLATE}/training/assets ${TAG_NAME}/training/ && \
-      cp -r pipelines/pipelines/${_PIPELINE_TEMPLATE}/prediction/assets ${TAG_NAME}/prediction/ && \
+      mkdir -p ${TAG_NAME}/assets && \
+      cp pipelines/src/training.json ${TAG_NAME}/training.json && \
+      cp pipelines/src/prediction.json ${TAG_NAME}/prediction.json && \
+      cp -r pipelines/assets/* ${TAG_NAME}/assets/ && \
       for dest in ${_PIPELINE_PUBLISH_GCS_PATHS} ; do \
         gsutil cp -r ${TAG_NAME} $$dest ; \
       done
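Note that `make compile-all-components` is dropped from this build, leaving only the two pipeline compilations. Those can be reproduced locally; the output locations below are inferred from the `cp pipelines/src/*.json` sources in the step above:

```bash
make compile-pipeline pipeline=training
make compile-pipeline pipeline=prediction
ls pipelines/src/*.json   # expect training.json and prediction.json
```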
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion pipelines/src/pipelines/tensorflow/training/pipeline.py
@@ -80,7 +80,7 @@ def tensorflow_pipeline(
     valid_table = "valid_data" + table_suffix
     test_table = "test_data" + table_suffix
     primary_metric = "rootMeanSquaredError"
-    train_script_uri = f"{pipeline_files_gcs_path}/training/assets/train_tf_model.py"
+    train_script_uri = f"{pipeline_files_gcs_path}/assets/train_tf_model.py"
     hparams = dict(
         batch_size=100,
         epochs=5,
Empty file.
Empty file.
2 changes: 1 addition & 1 deletion pipelines/src/pipelines/xgboost/training/pipeline.py
@@ -78,7 +78,7 @@ def xgboost_pipeline(
     valid_table = "valid_data" + table_suffix
     test_table = "test_data" + table_suffix
     primary_metric = "rootMeanSquaredError"
-    train_script_uri = f"{pipeline_files_gcs_path}/training/assets/train_xgb_model.py"
+    train_script_uri = f"{pipeline_files_gcs_path}/assets/train_xgb_model.py"
     hparams = dict(
         n_estimators=200,
         early_stopping_rounds=10,
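Both training pipelines now expect their script under the flat `assets/` prefix. A quick way to confirm the scripts landed where the compiled pipelines will look (bucket and version segment are hypothetical):

```bash
# pipeline_files_gcs_path ends in a tag or commit SHA (see the README diagram above).
gsutil ls gs://example-bucket/pipeline-files/abc1234/assets/train_tf_model.py
gsutil ls gs://example-bucket/pipeline-files/abc1234/assets/train_xgb_model.py
```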
