refactor!: Delete BigQuery and CSV EDP simulator variants #1884

Merged (1 commit) on Oct 29, 2024
2 changes: 0 additions & 2 deletions .github/workflows/build-test.yml
@@ -80,8 +80,6 @@ jobs:
build --define edp6_name=dataProviders/foo6
build --define edp6_cert_name=dataProviders/foo6/certificates/bar6
build --define google_cloud_project=example-project
build --define bigquery_dataset=example-dataset
build --define bigquery_table=events
EOF
- name: Check lockfile
1 change: 0 additions & 1 deletion .github/workflows/scan-images.yml
@@ -75,7 +75,6 @@ jobs:
- panel-exchange/gcloud-example-daemon
- panel-exchange/aws-example-daemon
- simulator/synthetic-generator-edp
- simulator/bigquery-edp
- reporting/v2/postgres-internal-server
- duchy/postgres-update-schema
- duchy/gcloud-postgres-update-schema
2 changes: 0 additions & 2 deletions build/variables.bzl
@@ -127,8 +127,6 @@ SIMULATOR_K8S_SETTINGS = struct(
edp5_cert_name = "$(edp5_cert_name)",
edp6_name = "$(edp6_name)",
edp6_cert_name = "$(edp6_cert_name)",
bigquery_dataset = "$(bigquery_dataset)",
bigquery_table = "$(bigquery_table)",
)

# Settings for Grafana Kubernetes deployments.
36 changes: 10 additions & 26 deletions docs/gke/correctness-test.md
@@ -67,37 +67,21 @@ kubectl apply -k src/main/k8s/dev/kingdom
## Deploy EDP simulators

See the [simulator deployment guide](simulator-deployment.md). The test assumes
that there are valid events in the range `[2021-03-15, 2021-03-17]`. The
synthetic generator variant assumes that the event message type is
`wfa.measurement.api.v2alpha.event_templates.testing.TestEvent`, and the
BigQuery variant assumes the event message type is `halo_cmm.uk.pilot.Event`.
that there are valid events in the range `[2021-03-15, 2021-03-17]`. The test
assumes that the event message type is
`wfa.measurement.api.v2alpha.event_templates.testing.TestEvent`.

## Run the correctness test

Run the following, substituting your own values:

* Synthetic generator

```shell
bazel test //src/test/kotlin/org/wfanet/measurement/integration/k8s:SyntheticGeneratorCorrectnessTest
--test_output=streamed \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/Rcn7fKd25C8 \
--define=mc_api_key=W9q4zad246g
```

* BigQuery

```shell
bazel test //src/test/kotlin/org/wfanet/measurement/integration/k8s:BigQueryCorrectnessTest
--test_output=streamed \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/Rcn7fKd25C8 \
--define=mc_api_key=W9q4zad246g \
--define=google_cloud_project=halo-cmm-demo \
--define=bigquery_dataset=demo \
--define=bigquery_table=labelled_events
```
```shell
bazel test //src/test/kotlin/org/wfanet/measurement/integration/k8s:SyntheticGeneratorCorrectnessTest \
--test_output=streamed \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/Rcn7fKd25C8 \
--define=mc_api_key=W9q4zad246g
```
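Instead of editing the command by hand, you can drive the same invocation from environment variables. The following is a minimal sketch and is not part of the repository. The function name `run_synthetic_correctness_test` is hypothetical, as are the variable names `KINGDOM_PUBLIC_API_TARGET`, `MC_NAME` and `MC_API_KEY`, and the `BAZEL` override (for example, `BAZEL=echo` for a dry run). Only the test target and the `--define` keys come from the command above.

```shell
# Hypothetical wrapper (not part of this repository) around the correctness
# test command above. Requires KINGDOM_PUBLIC_API_TARGET, MC_NAME and
# MC_API_KEY to be set. BAZEL defaults to `bazel` and can be overridden,
# e.g. BAZEL=echo to print the arguments instead of running the test.
run_synthetic_correctness_test() {
  local missing=0 var
  for var in KINGDOM_PUBLIC_API_TARGET MC_NAME MC_API_KEY; do
    # Indirect expansion: read the variable whose name is stored in $var.
    if [ -z "${!var:-}" ]; then
      echo "error: ${var} is not set" >&2
      missing=1
    fi
  done
  [ "${missing}" -eq 0 ] || return 1

  "${BAZEL:-bazel}" test \
    //src/test/kotlin/org/wfanet/measurement/integration/k8s:SyntheticGeneratorCorrectnessTest \
    --test_output=streamed \
    "--define=kingdom_public_api_target=${KINGDOM_PUBLIC_API_TARGET}" \
    "--define=mc_name=${MC_NAME}" \
    "--define=mc_api_key=${MC_API_KEY}"
}
```

The function checks for missing values up front. This gives a clear error before Bazel starts a build that would otherwise fail later with an empty `--define`.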

The time the test takes depends on the size of the data set. With the default
synthetic generator configuration, this is about an hour. Eventually, you should
182 changes: 43 additions & 139 deletions docs/gke/simulator-deployment.md
@@ -15,60 +15,13 @@ See [Machine Setup](machine-setup.md).

## Configure event data source

There are two data sources that can be used:

1. Synthetic generator

Events are generated according to
[simulator synthetic data specifications](../../src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto),
consisting of a single `SyntheticPopulationSpec` and a
`SyntheticEventGroupSpec` for each `EventGroup`. There are default
specifications included, but you can replace these with your own after
before you apply the K8s Kustomization.

This data source supports any event message type.

2. BigQuery table

Events are read from a Google Cloud BigQuery table. See the section below on
how to populate the table.

This data source currently only supports the `halo_cmm.uk.pilot.Event`
message type.

### Populate BigQuery table

The BigQuery table schema has the following columns:

* `date`
* Type: `DATE`
* `publisher_id`
* Type: `INTEGER`
* `vid`
* Type: `INTEGER`
* `digital_video_completion_status`
* Type: `STRING`
* Values:
* `0% - 25%`
* `25% - 50%`
* `50% - 75%`
* `75% - 100%`
* `100%`
* `viewability`
* Type: `STRING`
* Values:
* `viewable_0_percent_to_50_percent`
* `viewable_50_percent_to_100_percent`
* `viewable_100_percent`

The `dev` configuration expects a table named `labelled_events` in a dataset
named `demo` in the `us-central1` region. The table can be created in the
[Google Cloud Console](https://console.cloud.google.com/bigquery), specifying a
CSV file with automatic schema detection.

The
[`uk-pilot-synthetic-data-gen` script](https://github.com/world-federation-of-advertisers/uk-pilot-synthetic-data-gen)
may be helpful in generating a CSV file with test events.
Events are generated according to
[simulator synthetic data specifications](../../src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto),
consisting of a single `SyntheticPopulationSpec` and a `SyntheticEventGroupSpec`
for each `EventGroup`. There are default specifications included, but you can
replace these with your own before you apply the K8s Kustomization.

This data source supports any event message type.

## Provision Google Cloud Project infrastructure

@@ -83,7 +36,7 @@ Applying the Terraform configuration will create a new cluster. You can use the
gcloud container clusters get-credentials simulators
```

## Build and push container image (optional)
## Build and push container image (not recommended)

If you aren't using pre-built release images, you can build the image yourself
from source and push them to a container registry. For example, if you're using
@@ -95,99 +48,50 @@ The build target to use depends on the event data source. Assuming a project
named `halo-cmm-demo` and an image tag `build-0001`, run the following to build
and push the image:

* Synthetic generator

```shell
bazel run -c opt //src/main/docker:push_synthetic_generator_edp_simulator_runner_image \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```

* BigQuery

```shell
bazel run -c opt //src/main/docker:push_bigquery_edp_simulator_runner_image \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```
```shell
bazel run -c opt //src/main/docker:push_synthetic_generator_edp_simulator_runner_image \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```

## Generate K8s Kustomization

Run the following, substituting your own values:

* Synthetic generator

```shell
bazel build //src/main/k8s/dev:synthetic_generator_edp_simulators.tar \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=worker1_id=worker1
--define=worker1_public_api_target=public.worker1.dev.halo-cmm.org:8443 \
--define=worker2_id=worker2
--define=worker2_public_api_target=public.worker2.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/TGWOaWehLQ8 \
--define=edp1_name=dataProviders/HRL1wWehTSM \
--define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \
--define=edp2_name=dataProviders/djQdz2ehSSE \
--define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \
--define=edp3_name=dataProviders/SQ99TmehSA8 \
--define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \
--define=edp4_name=dataProviders/TBZkB5heuL0 \
--define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \
--define=edp5_name=dataProviders/HOCBxZheuS8 \
--define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \
--define=edp6_name=dataProviders/VGExFmehRhY \
--define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```

The resulting archive will contain `SyntheticEventGroupSpec` messages in
text format under `src/main/k8s/dev/synthetic_generator_config_files/`.
These can be replaced in order to customize the synthetic generator.

* BigQuery

```shell
bazel build //src/main/k8s/dev:bigquery_edp_simulators.tar \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=worker1_id=worker1
--define=worker1_public_api_target=public.worker1.dev.halo-cmm.org:8443 \
--define=worker2_id=worker2
--define=worker2_public_api_target=public.worker2.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/TGWOaWehLQ8 \
--define=edp1_name=dataProviders/HRL1wWehTSM \
--define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \
--define=edp2_name=dataProviders/djQdz2ehSSE \
--define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \
--define=edp3_name=dataProviders/SQ99TmehSA8 \
--define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \
--define=edp4_name=dataProviders/TBZkB5heuL0 \
--define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \
--define=edp5_name=dataProviders/HOCBxZheuS8 \
--define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \
--define=edp6_name=dataProviders/VGExFmehRhY \
--define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \
--define container_registry=gcr.io \
--define=google_cloud_project=halo-cmm-demo \
--define=bigquery_dataset=demo \
--define=bigquery_table=labelled_events \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```
```shell
bazel build //src/main/k8s/dev:synthetic_generator_edp_simulators.tar \
--define=kingdom_public_api_target=v2alpha.kingdom.dev.halo-cmm.org:8443 \
--define=worker1_id=worker1 \
--define=worker1_public_api_target=public.worker1.dev.halo-cmm.org:8443 \
--define=worker2_id=worker2 \
--define=worker2_public_api_target=public.worker2.dev.halo-cmm.org:8443 \
--define=mc_name=measurementConsumers/TGWOaWehLQ8 \
--define=edp1_name=dataProviders/HRL1wWehTSM \
--define=edp1_cert_name=dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM \
--define=edp2_name=dataProviders/djQdz2ehSSE \
--define=edp2_cert_name=dataProviders/djQdz2ehSSE/certificates/djQdz2ehSSE \
--define=edp3_name=dataProviders/SQ99TmehSA8 \
--define=edp3_cert_name=dataProviders/SQ99TmehSA8/certificates/SQ99TmehSA8 \
--define=edp4_name=dataProviders/TBZkB5heuL0 \
--define=edp4_cert_name=dataProviders/TBZkB5heuL0/certificates/TBZkB5heuL0 \
--define=edp5_name=dataProviders/HOCBxZheuS8 \
--define=edp5_cert_name=dataProviders/HOCBxZheuS8/certificates/HOCBxZheuS8 \
--define=edp6_name=dataProviders/VGExFmehRhY \
--define=edp6_cert_name=dataProviders/VGExFmehRhY/certificates/VGExFmehRhY \
--define container_registry=gcr.io \
--define image_repo_prefix=halo-cmm-demo --define image_tag=build-0001
```

The resulting archive will contain `SyntheticEventGroupSpec` messages in text
format under `src/main/k8s/dev/synthetic_generator_config_files/`. These can be
replaced in order to customize the synthetic generator.
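The six pairs of `edpN_name` and `edpN_cert_name` defines in the command above follow a fixed pattern, so you can generate them instead of typing them. This is a minimal sketch under that assumption. `edp_define_flags` is a hypothetical helper, not part of the repository, and it only prints the flags; it does not invoke Bazel.

```shell
# Hypothetical helper (not part of this repository): prints one --define flag
# per line for each EDP, numbering them edp1, edp2, ... in argument order.
# Arguments are pairs: DATA_PROVIDER_NAME CERTIFICATE_NAME.
edp_define_flags() {
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "usage: edp_define_flags NAME CERT_NAME [NAME CERT_NAME ...]" >&2
    return 1
  fi
  local i=1
  while [ $# -gt 0 ]; do
    # `--` stops printf from treating the leading dashes as options.
    printf -- '--define=edp%d_name=%s\n' "${i}" "$1"
    printf -- '--define=edp%d_cert_name=%s\n' "${i}" "$2"
    shift 2
    i=$(( i + 1 ))
  done
}
```

For example, in Bash 4 or later, `mapfile -t edp_flags < <(edp_define_flags dataProviders/HRL1wWehTSM dataProviders/HRL1wWehTSM/certificates/HRL1wWehTSM)` collects the flags into an array. You can then pass that array to the `bazel build` command as `"${edp_flags[@]}"`.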

Extract the generated archive to some directory.

## Apply K8s Kustomization

From the Kustomization directory, run

* Synthetic generator

```shell
kubectl apply -k src/main/k8s/dev/synthetic_generator_edp_simulators
```

* BigQuery

```shell
kubectl apply -k src/main/k8s/dev/bigquery_edp_simulators
```
```shell
kubectl apply -k src/main/k8s/dev/synthetic_generator_edp_simulators
```
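The extract-and-apply steps can be scripted together. This sketch makes several assumptions:

- The archive is read from Bazel's usual output location for the target, `bazel-bin/src/main/k8s/dev/synthetic_generator_edp_simulators.tar`.
- The destination directory is arbitrary.
- The function name and the `KUBECTL` override (for example, `KUBECTL=echo` for a dry run) are hypothetical.

```shell
# Hypothetical helper (not part of this repository): extracts the generated
# Kustomization archive into DEST_DIR and applies it with kubectl.
# ARCHIVE defaults to Bazel's usual bazel-bin output path (an assumption).
apply_edp_simulators() {
  if [ $# -lt 1 ]; then
    echo "usage: apply_edp_simulators DEST_DIR [ARCHIVE]" >&2
    return 1
  fi
  local dest="$1"
  local archive="${2:-bazel-bin/src/main/k8s/dev/synthetic_generator_edp_simulators.tar}"

  mkdir -p "${dest}" || return 1
  tar -xf "${archive}" -C "${dest}" || return 1

  # Run from the extracted Kustomization directory, as the guide describes.
  (cd "${dest}" && "${KUBECTL:-kubectl}" apply -k src/main/k8s/dev/synthetic_generator_edp_simulators)
}
```

Running `kubectl` in a subshell keeps the caller's working directory unchanged.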
10 changes: 0 additions & 10 deletions src/main/docker/images.bzl
@@ -91,11 +91,6 @@ COMMON_IMAGES = [
image = "//src/main/kotlin/org/wfanet/measurement/loadtest/panelmatchresourcesetup:panel_match_resource_setup_runner_image",
repository = _PREFIX + "/loadtest/panel-match-resource-setup",
),
struct(
name = "csv_edp_simulator_runner_image",
image = "//src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider:csv_edp_simulator_runner_image",
repository = _PREFIX + "/simulator/csv-edp",
),
struct(
name = "synthetic_generator_edp_simulator_runner_image",
image = "//src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider:synthetic_generator_edp_simulator_runner_image",
@@ -141,11 +136,6 @@ GKE_IMAGES = [
image = "//src/main/kotlin/org/wfanet/measurement/duchy/deploy/gcloud/job/mill/shareshuffle:gcs_honest_majority_share_shuffle_mill_job_image",
repository = _PREFIX + "/duchy/honest-majority-share-shuffle-mill",
),
struct(
name = "bigquery_edp_simulator_runner_image",
image = "//src/main/kotlin/org/wfanet/measurement/loadtest/dataprovider:bigquery_edp_simulator_runner_image",
repository = _PREFIX + "/simulator/bigquery-edp",
),
struct(
name = "duchy_gcloud_postgres_update_schema_image",
image = "//src/main/kotlin/org/wfanet/measurement/duchy/deploy/gcloud/postgres/tools:update_schema_image",
25 changes: 0 additions & 25 deletions src/main/k8s/dev/BUILD.bazel
@@ -383,31 +383,6 @@ EDP_SIMULATOR_TAGS = {
"google_cloud_project": GCLOUD_SETTINGS.project,
}

cue_dump(
name = "bigquery_edp_simulator_gke",
srcs = ["bigquery_edp_simulator_gke.cue"],
cue_tags = dict(EDP_SIMULATOR_TAGS.items() + {
"bigquery_dataset": SIMULATOR_K8S_SETTINGS.bigquery_dataset,
"bigquery_table": SIMULATOR_K8S_SETTINGS.bigquery_table,
}.items()),
tags = ["manual"],
deps = [":edp_simulator_gke"],
)

kustomization_dir(
name = "bigquery_edp_simulators",
testonly = True,
srcs = [
"resource_requirements.yaml",
":bigquery_edp_simulator_gke",
],
generate_kustomization = True,
tags = ["manual"],
deps = [
"//src/main/k8s/testing/secretfiles:kustomization",
],
)

cue_dump(
name = "synthetic_generator_edp_simulator_gke",
srcs = ["synthetic_generator_edp_simulator_gke.cue"],
46 changes: 0 additions & 46 deletions src/main/k8s/dev/bigquery_edp_simulator_gke.cue

This file was deleted.

2 changes: 1 addition & 1 deletion src/main/k8s/edp_simulator.cue
@@ -110,7 +110,7 @@ import "list"
"\(deployment._name)": {
_app_label: deployment.spec.template.metadata.labels.app
_egresses: {
// Need to be able to access Kingdom and BigQuery.
// Need to be able to access Kingdom.
any: {}
}
}
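Because this is a breaking change, downstream forks or local configurations may still reference the removed variants. The following is a minimal sketch for finding leftovers. It assumes GNU grep, or another `grep` that supports `-r`, `-E`, `-I` and `--exclude-dir`. The helper name is hypothetical, and the pattern only covers identifiers that appear in this diff.

```shell
# Hypothetical helper (not part of this repository): reports any remaining
# references to the deleted BigQuery/CSV EDP simulator variants under a
# directory (default: current directory).
# Returns 0 if none are found, 1 if some are found, 2 if grep itself fails.
check_no_removed_variants() {
  local root="${1:-.}"
  local pattern='bigquery_(dataset|table)|bigquery[-_]edp|csv[-_]edp|BigQueryCorrectnessTest'
  # -r: recurse, -n: line numbers, -E: extended regex, -I: skip binary files.
  grep -rnEI --exclude-dir=.git "${pattern}" "${root}"
  case $? in
    0) echo "Found references to removed EDP simulator variants." >&2; return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
```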