diff --git a/docs/index.md b/docs/index.md index 2b097bf8b..1d90d4ee7 100644 --- a/docs/index.md +++ b/docs/index.md @@ -48,6 +48,10 @@ To launch reverse replication, refer details [here](./reverse-replication/Revers To find out how to monitor your migration, refer [here](./monitoring/MonitoringUserGuide.md). +### Custom transformation + +To find out how to configure custom transformations, refer [here](./transformations/CustomTransformation.md) + ## Supported Sources and Targets - **Schema Migrations**: SMT supports schema migrations for MySQL, PostgreSQL, SQLServer and Oracle. diff --git a/docs/reverse-replication/ReverseReplicationUserGuide.md b/docs/reverse-replication/ReverseReplicationUserGuide.md index dbf0a3a3a..57ca1cbb3 100644 --- a/docs/reverse-replication/ReverseReplicationUserGuide.md +++ b/docs/reverse-replication/ReverseReplicationUserGuide.md @@ -113,6 +113,9 @@ The Dataflow job that writes to source database exposes the following per shard | metadata_file_create_lag_retry_\ | Count of file lookup retries done when the job that writes to GCS is lagging | | mySQL_retry_\ | Number of retries done when MySQL is not reachable| | shard_failed_\ | Published when there is a failure while processing the shard | +| custom_transformation_exception | Number of exception encountered in the custom transformation jar | +| filtered_events_\ | Number of events filtered via custom transformation per shard | +| apply_custom_transformation_impl_latency_ms | Time taken for the execution of custom transformation logic. | These can be used to track the pipeline progress. However, there is a limit of 100 on the total number of metrics per project. So if this limit is exhausted, the Dataflow job will give a message like so: @@ -378,10 +381,13 @@ In order to make it easier for users to customize the shard routing logic, the [ Steps to perfrom customization: 1. Write custom shard id fetcher logic [CustomShardIdFetcher.java](https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/spanner-custom-shard/src/main/java/com/custom/CustomShardIdFetcher.java). Details of the ShardIdRequest class can be found [here](https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/spanner-migrations-sdk/src/main/java/com/google/cloud/teleport/v2/spanner/utils/ShardIdRequest.java). 2. Build the [JAR](https://github.com/GoogleCloudPlatform/DataflowTemplates/tree/main/v2/spanner-custom-shard) and upload the jar to GCS -3. Invoke the reverse replication flow by passing the [custom jar path and custom class path](RunnigReverseReplication.md#custom-jar). +3. Invoke the reverse replication flow by passing the [custom jar path and custom class path](RunnigReverseReplication.md#custom-shard-identification). 4. If any custom parameters are needed in the custom shard identification logic, they can be passed via the *readerShardingCustomParameters* input to the runner. These parameters will be passed to the *init* method of the custom class. The *init* method is invoked once per worker setup. - +### Custom transformations +For cases where a user wants to handle a custom transformation logic, they need to specify the following parameters in the [GCS to Sourcedb](https://github.com/GoogleCloudPlatform/DataflowTemplates/tree/main/v2/gcs-to-sourcedb) template - a GCS path that points to a custom jar, fully classified custom class name of the class containing custom transformation logic and custom parameters which might be used by the jar to invoke custom logic to perform transformation. + +For details on how to implement and use custom transformations, please refer to the [Custom Transformation](../transformations/CustomTransformation.md) section. ## Cost diff --git a/docs/reverse-replication/RunnigReverseReplication.md b/docs/reverse-replication/RunnigReverseReplication.md index 4174b6637..2dedb936b 100644 --- a/docs/reverse-replication/RunnigReverseReplication.md +++ b/docs/reverse-replication/RunnigReverseReplication.md @@ -69,6 +69,10 @@ The script takes in multiple arguments to orchestrate the pipeline. They are: - `vpcHostProjectId`: project ID hosting the subnetwork. If unspecified, the 'projectId' parameter value will be used for subnetwork. - `vpcNetwork`: name of the VPC network to be used for the dataflow jobs - `vpcSubnetwork`: name of the VPC subnetwork to be used for the dataflow jobs. Subnet should exist in the same region as the 'dataflowRegion' parameter. +- `writeFilteredEventsToGcs`: Whether to write filtered events to GCS. Default is false. +- `writerTransformationCustomJarPath`: The GCS path to custom jar for custom transformation logic. +- `writerTransformationCustomClassName`: The fully qualified custom class name for custom transformation logic. +- `writerTransformationCustomParameters`: Any custom parameters to be supplied to custom transformation class. ## Pre-requisites @@ -139,9 +143,9 @@ Launched writer job: id:"<>" project_id:"<>" name:"<>" current_state_time:{} cr ``` -### Custom Jar +### Custom shard identification -In order to specify custom shard identification function, custom jar and class names need to give. The command to do that is below: +In order to specify custom shard identification function, custom jar and class names need to be given. The command to do that is below: ``` go run reverse-replication-runner.go -projectId= -dataflowRegion= -instanceId= -dbName= -sourceShardsFilePath=gs://bucket-name/shards.json -sessionFilePath=gs://bucket-name/session.json -gcsPath=gs://bucket-name/ -readerShardingCustomJarPath=gs://bucket-name/custom.jar -readerShardingCustomClassName=com.custom.classname -readerShardingCustomParameters='a=b,c=d' @@ -154,6 +158,26 @@ gcloud dataflow flex-template run smt-reverse-replication-reader-2024-01-05t10-3 ``` +### Custom transformation + +In order to specify custom transformation, custom jar and class names need to be given. The command to do that is below: + +``` +go run reverse-replication-runner.go -projectId= -dataflowRegion= -instanceId= -dbName= -sourceShardsFilePath=gs://bucket-name/shards.json -sessionFilePath=gs://bucket-name/session.json -gcsPath=gs://bucket-name/ -writerTransformationCustomJarPath=gs://bucket-name/custom.jar -writerTransformationCustomClassName=com.custom.classname -writerTransformationCustomParameters='a=b,c=d' +``` + +If a user wants to write the records filtered via custom transformation to GCS, they can use the command below: +``` +go run reverse-replication-runner.go -projectId= -dataflowRegion= -instanceId= -dbName= -sourceShardsFilePath=gs://bucket-name/shards.json -sessionFilePath=gs://bucket-name/session.json -gcsPath=gs://bucket-name/ -writerTransformationCustomJarPath=gs://bucket-name/custom.jar -writerTransformationCustomClassName=com.custom.classname -writerTransformationCustomParameters='a=b,c=d' -writeFilteredEventsToGcs +``` + +The sample reader job gcloud command for the same + +``` +gcloud dataflow flex-template run smt-reverse-replication-writer-2024-04-17t08-00-37z-8dac-d95f --project= --region= --template-file-gcs-location=