
Track record schema validation errors in Datadog #13114

Closed

Conversation

@alovew (Contributor) commented May 24, 2022

Track record schema validation errors in Segment

@github-actions github-actions bot added the area/platform (issues related to the platform) and area/worker (related to worker) labels May 24, 2022
@alovew alovew requested a review from lmossman May 24, 2022 01:11
@alovew alovew temporarily deployed to more-secrets May 24, 2022 01:12 Inactive
@alovew alovew temporarily deployed to more-secrets May 24, 2022 17:05 Inactive
-    final RecordSchemaValidator recordSchemaValidator) {
+    final RecordSchemaValidator recordSchemaValidator,
+    final UUID workspaceId,
+    final String dockerImage) {
Contributor:
nit: rename this to sourceDockerImage

@jdpgrailsdev (Contributor):
cc: @davinchia This is what was discussed in the dev meeting yesterday and, as you commented, would be a good fit for recording to Datadog/OTEL instead.

@alovew alovew temporarily deployed to more-secrets May 24, 2022 22:07 Inactive

this.cancelled = new AtomicBoolean(false);
this.hasFailed = new AtomicBoolean(false);
}

public DefaultReplicationWorker(final String jobId,
Contributor:
@lmossman do you know if there have been discussions about moving everything to the container orchestrator?

Contributor:
Yes, we have made the container orchestrator the default for kube deployments in OSS, but I found that the container orchestrator logic as written today does not work with docker-compose deployments, and some work will be required to fix that. I created a spike ticket to investigate that further: #13142

Though, this DefaultReplicationWorker class is used by the container orchestrator as well.

@davinchia (Contributor) commented May 25, 2022:
@lmossman nice! Is this from 0.39.0-alpha onwards?

Does that mean the docker deployment is not running the container orchestrator for now?

Contributor:
That's correct - part of the changes in the Migrate OSS to temporal scheduler PR was setting the CONTAINER_ORCHESTRATOR_ENABLED env var to true for all kube deploys. It is still not set in the docker-compose env file, so it will default to false for those deployments.
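To illustrate that default: a minimal sketch (not the actual Airbyte configuration code) of how an env var that is absent from the docker-compose env file ends up evaluating to false.

// Minimal sketch, assuming a plain env-var lookup: when CONTAINER_ORCHESTRATOR_ENABLED
// is not set (as in the docker-compose env file), the fallback "false" is parsed,
// so the orchestrator path stays disabled for those deployments.
final boolean containerOrchestratorEnabled =
    Boolean.parseBoolean(System.getenv().getOrDefault("CONTAINER_ORCHESTRATOR_ENABLED", "false"));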

@davinchia (Contributor) left a comment:
I took a look at the PR since Jonathan tagged me. I noticed one potential issue, around where the workspace id should be injected to keep things clean, that I want to resolve before we merge this in. Details in the comment. TL;DR: we don't want direct db access from a pod involved in the job execution, and probably want to shift workspaceId injection further up the creation chain.

Taking a step back, between the Segment event and the Datadog metric, I feel the DD metric provides more immediate value since it helps us act on this in Cloud. Though the Segment alert is definitely useful, it's also less actionable (we cannot look at OSS users' data to debug) and less urgent (OSS users are likely to open issues when they spot errors).

The DD metric's LOE is also much lower/simpler, as it's a count emission with a connector image tag and doesn't require new information injected throughout the system. If the team is strapped for time, we'd get more value implementing only the DD metric and leaving the Segment event for later. If the team has time and is open to learning, of course implementing both is great!

Happy to discuss @lmossman @alovew
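For reference, the kind of emission being described is just a tagged counter. Below is a minimal sketch using the standalone java-dogstatsd-client; it is not this PR's code or Airbyte's own metrics client, and the metric name, prefix, tags, and agent address are all illustrative assumptions.

import com.timgroup.statsd.NonBlockingStatsDClientBuilder;
import com.timgroup.statsd.StatsDClient;

public class SchemaValidationMetricsSketch {

  // Points at the local Datadog agent; host, port, and prefix are assumptions.
  private static final StatsDClient STATSD = new NonBlockingStatsDClientBuilder()
      .prefix("airbyte.worker")
      .hostname("localhost")
      .port(8125)
      .build();

  // Emit one count per schema validation error, tagged with the connector image,
  // which is all the count-with-image-tag metric described above needs.
  public static void recordSchemaValidationError(final String dockerImage) {
    STATSD.count("record_schema_validation_error", 1, "docker_image:" + dockerImage);
  }
}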

@@ -89,6 +107,25 @@ public Optional<String> runJob() throws Exception {
sourceLauncherConfig.getDockerImage().equals(WorkerConstants.RESET_JOB_SOURCE_DOCKER_IMAGE_STUB) ? new EmptyAirbyteSource()
: new DefaultAirbyteSource(workerConfigs, sourceLauncher);

final FeatureFlags featureFlags = new EnvVariableFeatureFlags();
final String driverClassName = "org.postgresql.Driver";
validationErrors.forEach((stream, errorPair) -> {
if (workspaceId != null) {
Contributor:
As far as I can tell, workspaceId never changes, so this check should be able to be combined with the validationErrors.isEmpty() check on line 367 to simplify things.
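A rough sketch of the simplification being suggested (the tracking call is a hypothetical placeholder; the emptiness check is the existing validationErrors.isEmpty() guard referenced above):

// Since workspaceId is effectively constant for the sync, fold its null-check into
// the existing emptiness guard instead of re-checking it inside the loop.
if (workspaceId != null && !validationErrors.isEmpty()) {
  validationErrors.forEach((stream, errorPair) ->
      trackSchemaValidationError(workspaceId, stream, errorPair)); // hypothetical helper
}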

@@ -89,6 +107,25 @@ public Optional<String> runJob() throws Exception {
sourceLauncherConfig.getDockerImage().equals(WorkerConstants.RESET_JOB_SOURCE_DOCKER_IMAGE_STUB) ? new EmptyAirbyteSource()
: new DefaultAirbyteSource(workerConfigs, sourceLauncher);

final FeatureFlags featureFlags = new EnvVariableFeatureFlags();
Contributor:
The job orchestrator runs in the jobs namespace as part of a job in Cloud. In theory, the orchestrator is fire-and-forget, so I don't think we want to allow direct database access from the orchestrator. Doing so also presents some security risk: today we sandbox the jobs namespace off from the ab namespace, so we would have to make some allowances for the orchestrator pod - not the worst, but not very clean.

I think the right way to do this is to inject the workspace id via the ReplicationActivityImpl (which runs in the ab namespace). This follows how configs are currently propagated to the jobs - as static files - so we can keep the interface consistent.
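A hypothetical sketch of that shape (the class and file name are illustrative, not the actual Airbyte code): the activity side, which runs in the ab namespace and can reach the config database, writes the workspace id into the job's config directory as a static file, and the orchestrator reads it back without ever touching the database.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical sketch: propagate the workspace id the same way other job configs
// reach the orchestrator pod - as a static file - so the orchestrator stays
// sandboxed off from the config database.
public final class WorkspaceIdConfigSketch {

  private static final String WORKSPACE_ID_FILE = "WORKSPACE_ID"; // illustrative file name

  // Called from the activity side (ab namespace), where db access already exists.
  public static void write(final Path jobConfigDir, final UUID workspaceId) throws IOException {
    Files.writeString(jobConfigDir.resolve(WORKSPACE_ID_FILE), workspaceId.toString());
  }

  // Called from the orchestrator side (jobs namespace); no db connection needed.
  public static UUID read(final Path jobConfigDir) throws IOException {
    return UUID.fromString(Files.readString(jobConfigDir.resolve(WORKSPACE_ID_FILE)).trim());
  }

  private WorkspaceIdConfigSketch() {}
}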

Contributor:
Thanks for explaining this, Davin. This is something Anne and I also discussed over Zoom yesterday; I agree that we do not want direct db access in the orchestrator pod, so querying for the workspace ID should be moved higher up the chain.

@alovew alovew temporarily deployed to more-secrets May 25, 2022 20:38 Inactive
@alovew alovew temporarily deployed to more-secrets May 26, 2022 00:34 Inactive
@alovew alovew changed the title from "Track record schema validation errors in Segment" to "Track record schema validation errors in Datadog" May 26, 2022
@alovew alovew temporarily deployed to more-secrets May 26, 2022 19:29 Inactive
@alovew alovew temporarily deployed to more-secrets May 31, 2022 16:58 Inactive
@alovew alovew force-pushed the anne/add-segment-tracking-for-validation-errors branch from 908197f to f0935f9 on May 31, 2022 17:56
@alovew alovew temporarily deployed to more-secrets May 31, 2022 17:58 Inactive
@alovew alovew temporarily deployed to more-secrets May 31, 2022 18:36 Inactive
@alovew alovew closed this Jun 1, 2022