Bug 1697602 Remove derived tables for AET #1894

Merged: 3 commits, Mar 16, 2021

8 changes: 8 additions & 0 deletions GRAVEYARD.md
@@ -2,6 +2,14 @@

This document records interesting code that we've deleted for the sake of discoverability for the future.

## 2021-03 Account Ecosystem Telemetry (AET) derived tables

- [Removal PR](https://github.com/mozilla/bigquery-etl/pull/1894)

AET was never released except for a short test in the beta population,
and now the project has been decommissioned, so there is no longer
any need for these derived tables.

## 2020-04 Fenix baseline_daily and clients_last_seen

- [Removal PR](https://github.com/mozilla/bigquery-etl/pull/925)
31 changes: 0 additions & 31 deletions bigquery_etl/shredder/config.py
@@ -72,9 +72,6 @@ def fields(self) -> Tuple[str, ...]:
USER_ID = "user_id"
POCKET_ID = "pocket_id"
SHIELD_ID = "shield_id"
ECOSYSTEM_CLIENT_ID = "ecosystem_client_id"
ECOSYSTEM_CLIENT_ID_HASH = f"{ECOSYSTEM_CLIENT_ID}_hash"
DESKTOP_ECOSYSTEM_CLIENT_ID = f"payload.{ECOSYSTEM_CLIENT_ID}"
PIONEER_ID = "pioneer_id"
ID = "id"
CFR_ID = f"COALESCE({CLIENT_ID}, {IMPRESSION_ID})"
@@ -95,14 +92,6 @@ def fields(self) -> Tuple[str, ...]:
f" UNNEST([{CLIENT_ID}, {IMPRESSION_SRC.field}]) AS `_",
field="_",
)
ECOSYSTEM_CLIENT_ID_HMAC_SRC = DeleteSource(
table="account_ecosystem_restricted.ecosystem_client_id_deletion_v1",
field=ECOSYSTEM_CLIENT_ID_HASH,
)
ECOSYSTEM_CLIENT_ID_SRC = DeleteSource(
table="account_ecosystem_restricted.ecosystem_client_id_deletion_v1",
field=ECOSYSTEM_CLIENT_ID,
)
Review comment from the PR author:

We are now dropping any remaining AET messages in the pipeline, so we will not be accruing any new entries in the stable tables, and there should be no further need for processing ecosystem_client_id as part of deletion requests.
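
The effect of dropping these sources can be sketched as follows. This is a minimal illustration, not the actual `bigquery_etl/shredder/config.py` code: the two dataclasses are simplified stand-ins for `DeleteSource` and `DeleteTarget`, the `DELETE_TARGETS` name and the `without_sources` helper are hypothetical, and the `example_dataset.example_v1` target is invented so that one entry survives. The point it shows is that once a source is retired, every target keyed to it must leave the mapping too.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class DeleteSource:
    """Simplified stand-in: the table and field listing IDs to delete."""

    table: str
    field: str


@dataclass(frozen=True)
class DeleteTarget:
    """Simplified stand-in: a table and field that must be scrubbed."""

    table: str
    field: str


# Table and field names below are copied from the diff.
ECOSYSTEM_CLIENT_ID_SRC = DeleteSource(
    table="account_ecosystem_restricted.ecosystem_client_id_deletion_v1",
    field="ecosystem_client_id",
)
FXA_HMAC_SRC = DeleteSource(
    table="firefox_accounts_derived.fxa_delete_events_v1",
    field="hmac_user_id",
)

# Hypothetical mapping name: each target points at the source of its IDs.
DELETE_TARGETS: Dict[DeleteTarget, DeleteSource] = {
    DeleteTarget(
        table="telemetry_stable.account_ecosystem_v4",
        field="payload.ecosystem_client_id",
    ): ECOSYSTEM_CLIENT_ID_SRC,
    DeleteTarget(
        table="firefox_accounts_stable.account_ecosystem_v1",
        field="ecosystem_client_id",
    ): ECOSYSTEM_CLIENT_ID_SRC,
    # Invented target, present only so one entry remains after pruning.
    DeleteTarget(
        table="example_dataset.example_v1",
        field="hmac_user_id",
    ): FXA_HMAC_SRC,
}


def without_sources(
    targets: Dict[DeleteTarget, DeleteSource],
    retired: Set[DeleteSource],
) -> Dict[DeleteTarget, DeleteSource]:
    """Return a copy of the mapping with targets of retired sources removed."""
    return {
        target: source
        for target, source in targets.items()
        if source not in retired
    }


remaining = without_sources(DELETE_TARGETS, {ECOSYSTEM_CLIENT_ID_SRC})
print(sorted(target.table for target in remaining))
```

In the PR itself the entries are simply deleted from the source file; the filter above only mirrors that outcome in executable form.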

FXA_HMAC_SRC = DeleteSource(
table="firefox_accounts_derived.fxa_delete_events_v1", field="hmac_user_id"
)
@@ -138,8 +127,6 @@ def fields(self) -> Tuple[str, ...]:
DESKTOP_SRC,
IMPRESSION_SRC,
CFR_SRC,
ECOSYSTEM_CLIENT_ID_HMAC_SRC,
ECOSYSTEM_CLIENT_ID_SRC,
FXA_HMAC_SRC,
FXA_SRC,
]
@@ -274,21 +261,6 @@ def fields(self) -> Tuple[str, ...]:
user_id_target(
table="firefox_accounts_derived.fxa_users_services_last_seen_v1"
): FXA_SRC,
# account ecosystem telemetry (AET)
DeleteTarget(
table="telemetry_stable.account_ecosystem_v4", field=DESKTOP_ECOSYSTEM_CLIENT_ID
): ECOSYSTEM_CLIENT_ID_SRC,
DeleteTarget(
table="firefox_accounts_stable.account_ecosystem_v1", field=ECOSYSTEM_CLIENT_ID
): ECOSYSTEM_CLIENT_ID_SRC,
DeleteTarget(
table="account_ecosystem_derived.ecosystem_client_id_lookup_v1",
field=ECOSYSTEM_CLIENT_ID_HASH,
): ECOSYSTEM_CLIENT_ID_HMAC_SRC,
DeleteTarget(
table="account_ecosystem_derived.desktop_clients_daily_v1",
field=ECOSYSTEM_CLIENT_ID_HASH,
): ECOSYSTEM_CLIENT_ID_HMAC_SRC,
# legacy mobile
DeleteTarget(
table="telemetry_stable.core_v1",
@@ -369,9 +341,6 @@ def fields(self) -> Tuple[str, ...]:
client_id_target(table="eng_workflow_stable.build_v1"),
# other
DeleteTarget(table="telemetry_stable.pioneer_study_v4", field=PIONEER_ID),
DeleteTarget(
table="telemetry_stable.pre_account_v4", field=DESKTOP_ECOSYSTEM_CLIENT_ID
),
Review comment from the PR author:

I believe pre_account was added here in error. It does indeed have a field called ecosystem_client_id, but it is always set to "unknown". The pre-account ping was associated with a previous incarnation of "ecosystem telemetry" rather than AET.
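
One way to check a claim like this is to count the distinct values of the field. The sketch below only builds the query text and does not run it. The `build_value_check_query` helper is hypothetical, and both the `submission_timestamp` partition column and the example date are assumptions rather than details taken from this PR. The table may also need a project qualifier depending on where the query runs.

```python
def build_value_check_query(table: str, field: str, date: str) -> str:
    """Build a query counting rows per distinct value of `field` for one day.

    Assumes the table has a `submission_timestamp` column to filter on.
    """
    return (
        f"SELECT {field} AS value, COUNT(*) AS n\n"
        f"FROM `{table}`\n"
        f"WHERE DATE(submission_timestamp) = '{date}'\n"
        "GROUP BY value\n"
        "ORDER BY n DESC"
    )


# Table and field path are taken from the removed DeleteTarget above;
# the date is an arbitrary example.
query = build_value_check_query(
    table="telemetry_stable.pre_account_v4",
    field="payload.ecosystem_client_id",
    date="2021-03-01",
)
print(query)
```

If the comment holds, such a query would return a single row whose value is "unknown".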

]
}

12 changes: 0 additions & 12 deletions dags.yaml
@@ -271,18 +271,6 @@ bqetl_fenix_event_rollup:
retries: 2
retry_delay: 30m

bqetl_account_ecosystem:
schedule_interval: 0 2 * * *
description: >-
Tables related to the v1 implementation of Account Ecosystem Telemetry (AET)
which is currently on hold.
default_args:
owner: [email protected]
start_date: "2020-09-17"
email: ["[email protected]"]
retries: 2
retry_delay: 30m

bqetl_stripe:
schedule_interval: daily
default_args:
137 changes: 0 additions & 137 deletions dags/bqetl_account_ecosystem.py

This file was deleted.
