Fix for messed up commit history for adding gcs as backup policy store #81

Merged · 1 commit · Apr 18, 2023
8 changes: 4 additions & 4 deletions README.md
@@ -14,7 +14,7 @@
* [BigQuery Snapshoter](#bigquery-snapshoter)
* [GCS Snapshoter](#gcs-snapshoter)
* [Tagger](#tagger)
* [Design Notes](#design-notes)
* [Assumptions](#assumptions)
* [Deployment](#deployment)
* [Install Maven](#install-maven)
@@ -135,16 +135,16 @@ A cloud scheduler is used to send a BigQuery “Scan Scope” to the dispatcher
The solution uses multiple steps, as explained above, to list, back up, and tag tables. These steps are designed so that each one
of them is responsible for one main task and allows for checkpointing to handle retries in a cost-efficient way, using
PubSub to decouple the steps from each other. For example, if a table is backed up in a single run via the Snapshoter step,
and the Tagger step then fails to update its backup policy (i.e. the last_update_at field)
due to temporary GCP API quotas/limits on the underlying service, the request will not be acknowledged to PubSub, and
it will be retried with exponential backoff. In this case only the tagging logic is retried, not the entire backup operation
that was already successful.
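The ack/retry contract described above can be sketched as follows. This is an illustrative sketch, not the repository's actual code — the class and method names (`TaggerRetrySketch`, `handleMessage`) are hypothetical — but the mechanism is real: a PubSub push subscription redelivers a message with exponential backoff when the endpoint returns a non-success status, and a persisted checkpoint lets the retry skip the step that already completed.

```java
// Sketch: how checkpointing makes PubSub retries re-run only the failed step.
import java.util.HashSet;
import java.util.Set;

public class TaggerRetrySketch {

    // Simulated persistent checkpoint of tables already backed up in this run
    private final Set<String> completedBackups = new HashSet<>();

    /** Returns an HTTP-like status code: 200 acks the message, 500 nacks it. */
    public int handleMessage(String table, boolean taggingSucceeds) {
        if (!completedBackups.contains(table)) {
            completedBackups.add(table); // backup step runs once and is checkpointed
        }
        if (!taggingSucceeds) {
            return 500; // nack: PubSub redelivers with backoff; backup is skipped on retry
        }
        return 200; // ack: message is not redelivered
    }

    public boolean isBackedUp(String table) {
        return completedBackups.contains(table);
    }
}
```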

These steps are implemented as separate Cloud Run services, rather than one service with multiple endpoints, to allow fine-grained control
over the concurrency, CPU, memory, and timeout settings for each step. This is especially useful since each step could use
different settings depending on its traffic pattern and processing requirements. For example:
* The Dispatcher executes once per run while the other services execute once per table, so they could use different concurrency settings (i.e. number of containers, requests per container).
* The Dispatcher uses relatively more memory since it lists all tables in the scan scope, which could number in the thousands.
* The Dispatcher and GCS Snapshoter need relatively longer to finish than the BQ Snapshoter. They could use different timeout settings
to avoid unwanted retries by PubSub.
4 changes: 2 additions & 2 deletions scripts/prepare_backup_storage_projects.sh
@@ -35,9 +35,9 @@ do
echo "Preparing GCS Snapshoter SA permissions on backup storage project ${project} .."

# GCS Snapshoter needs to write to GCS
- # permission: storage.objects.create
+ # permission: storage.objects.create, storage.objects.delete
gcloud projects add-iam-policy-binding "${project}" \
--member="serviceAccount:${SA_SNAPSHOTER_GCS_EMAIL}" \
-     --role="roles/storage.objectCreator"
+     --role="roles/storage.objectAdmin"

done
@@ -19,18 +19,17 @@
import com.google.cloud.Tuple;
import com.google.cloud.pso.bq_snapshot_manager.entities.NonRetryableApplicationException;
import com.google.cloud.pso.bq_snapshot_manager.entities.PubSubEvent;
import com.google.cloud.pso.bq_snapshot_manager.entities.TableSpec;
import com.google.cloud.pso.bq_snapshot_manager.entities.backup_policy.FallbackBackupPolicy;
import com.google.cloud.pso.bq_snapshot_manager.functions.f02_configurator.Configurator;
import com.google.cloud.pso.bq_snapshot_manager.functions.f02_configurator.ConfiguratorRequest;
import com.google.cloud.pso.bq_snapshot_manager.functions.f02_configurator.ConfiguratorResponse;
import com.google.cloud.pso.bq_snapshot_manager.functions.f04_tagger.TaggerRequest;
import com.google.cloud.pso.bq_snapshot_manager.functions.f04_tagger.TaggerResponse;
import com.google.cloud.pso.bq_snapshot_manager.helpers.ControllerExceptionHelper;
import com.google.cloud.pso.bq_snapshot_manager.helpers.LoggingHelper;
import com.google.cloud.pso.bq_snapshot_manager.helpers.TrackingHelper;
import com.google.cloud.pso.bq_snapshot_manager.services.backup_policy.BackupPolicyServiceGCSImpl;
import com.google.cloud.pso.bq_snapshot_manager.services.bq.BigQueryServiceImpl;
import com.google.cloud.pso.bq_snapshot_manager.services.catalog.DataCatalogServiceImpl;
import com.google.cloud.pso.bq_snapshot_manager.services.backup_policy.BackupPolicyService;
import com.google.cloud.pso.bq_snapshot_manager.services.backup_policy.BackupPolicyServiceFireStoreImpl;
import com.google.cloud.pso.bq_snapshot_manager.services.pubsub.PubSubServiceImpl;
import com.google.cloud.pso.bq_snapshot_manager.services.set.GCSPersistentSetImpl;
import com.google.gson.Gson;
@@ -63,7 +62,8 @@ public ConfiguratorController() throws JsonProcessingException {
logger = new LoggingHelper(
ConfiguratorController.class.getSimpleName(),
functionNumber,
-     environment.getProjectId()
+     environment.getProjectId(),
+     environment.getApplicationName()
);

logger.logInfoWithTracker(
@@ -88,7 +88,7 @@ public ConfiguratorController() throws JsonProcessingException {
public ResponseEntity receiveMessage(@RequestBody PubSubEvent requestBody) {


- DataCatalogServiceImpl dataCatalogService = null;
+ BackupPolicyService backupPolicyService = null;

// These values will be updated based on the execution flow and logged at the end
ResponseEntity responseEntity;
@@ -119,12 +119,12 @@ public ResponseEntity receiveMessage(@RequestBody PubSubEvent requestBody) {

logger.logInfoWithTracker(configuratorRequest.isDryRun(), trackingId, configuratorRequest.getTargetTable(), String.format("Parsed Request: %s", configuratorRequest.toString()));

- dataCatalogService = new DataCatalogServiceImpl();
+ backupPolicyService = new BackupPolicyServiceGCSImpl(environment.getGcsBackupPoliciesBucket());

Configurator configurator = new Configurator(
environment.toConfig(),
new BigQueryServiceImpl(configuratorRequest.getTargetTable().getProject()),
-     dataCatalogService,
+     backupPolicyService,
new PubSubServiceImpl(),
new GCSPersistentSetImpl(environment.getGcsFlagsBucket()),
fallbackBackupPolicy,
@@ -152,8 +152,8 @@ public ResponseEntity receiveMessage(@RequestBody PubSubEvent requestBody) {

} finally {

- if (dataCatalogService != null) {
-     dataCatalogService.shutdown();
+ if (backupPolicyService != null) {
+     backupPolicyService.shutdown();
}
}

@@ -29,7 +29,8 @@ public ConfiguratorConfig toConfig (){
getProjectId(),
getBqSnapshoterOutputTopic(),
getGCSSnapshoterOutputTopic(),
-     getBackupTagTemplateId()
+     getBackupTagTemplateId(),
+     getApplicationName()
);
}

@@ -58,4 +59,13 @@ public String getBackupPolicyJson(){
public String getGcsFlagsBucket(){
return Utils.getConfigFromEnv("GCS_FLAGS_BUCKET", true);
}

public String getApplicationName(){
return Utils.getConfigFromEnv("APPLICATION_NAME", true);
}

public String getGcsBackupPoliciesBucket(){
return Utils.getConfigFromEnv("GCS_BACKUP_POLICIES_BUCKET", true);
}

}
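The new getters follow the `getConfigFromEnv(name, required)` pattern used throughout this class. The `Utils` implementation is not part of this diff; a plausible sketch of such a helper, under the assumption that `required` means "throw if the variable is unset" (the class name `EnvUtils` is hypothetical):

```java
// Hypothetical sketch of the getConfigFromEnv(name, required) pattern;
// the repository's real Utils class is not shown in this diff.
public class EnvUtils {

    /** Reads an environment variable; throws if required and unset/empty. */
    public static String getConfigFromEnv(String name, boolean required) {
        String value = System.getenv(name);
        if (required && (value == null || value.isEmpty())) {
            throw new IllegalArgumentException(
                "Missing required environment variable: " + name);
        }
        return value;
    }
}
```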
@@ -56,7 +56,9 @@ public DispatcherController() {
logger = new LoggingHelper(
DispatcherController.class.getSimpleName(),
functionNumber,
-     environment.getProjectId());
+     environment.getProjectId(),
+     environment.getApplicationName()
+ );
}

@RequestMapping(value = "/", method = RequestMethod.POST)
@@ -26,7 +26,8 @@ public DispatcherConfig toConfig(){
getProjectId(),
getComputeRegionId(),
getDataRegionId(),
-     getOutputTopic()
+     getOutputTopic(),
+     getApplicationName()
);
}

@@ -47,4 +48,8 @@ public String getDataRegionId(){
public String getGcsFlagsBucket(){
return Utils.getConfigFromEnv("GCS_FLAGS_BUCKET", true);
}

public String getApplicationName(){
return Utils.getConfigFromEnv("APPLICATION_NAME", true);
}
}
5 changes: 5 additions & 0 deletions services/library/pom.xml
@@ -117,6 +117,11 @@
<artifactId>google-cloud-datacatalog</artifactId>
</dependency>

<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-datastore</artifactId>
</dependency>

<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-cloudresourcemanager</artifactId>

This file was deleted.

@@ -109,4 +109,12 @@ public String toResourceUrl(){
getTable()
);
}

public String toHivePartitionPostfix(){
return String.format("project=%s/dataset=%s/table=%s",
project,
dataset,
table
);
}
}
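The added `toHivePartitionPostfix()` helper builds a Hive-style partition path, presumably used to lay out per-table backup policy files in the GCS bucket. A minimal standalone sketch of the method exactly as shown in the diff; the surrounding class here is illustrative, since only a fragment of `TableSpec` appears above:

```java
// Illustrative stand-in for TableSpec, showing the new helper from the diff.
public class TableSpecSketch {
    private final String project;
    private final String dataset;
    private final String table;

    public TableSpecSketch(String project, String dataset, String table) {
        this.project = project;
        this.dataset = dataset;
        this.table = table;
    }

    // Hive-style partition postfix, e.g. "project=p/dataset=d/table=t"
    public String toHivePartitionPostfix() {
        return String.format("project=%s/dataset=%s/table=%s",
                project,
                dataset,
                table
        );
    }
}
```

For example, a spec for project `p1`, dataset `d1`, table `t1` yields `project=p1/dataset=d1/table=t1`, so each table's policy lands under its own GCS "directory".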