Track record schema validation errors in Datadog #13114
Conversation
final RecordSchemaValidator recordSchemaValidator) {
final RecordSchemaValidator recordSchemaValidator,
final UUID workspaceId,
final String dockerImage) {
nit: rename this to sourceDockerImage
cc: @davinchia Here is what was discussed in the dev meeting yesterday and, as you commented, would be a good fit for recording to DataDog/OTEL instead.

this.cancelled = new AtomicBoolean(false);
this.hasFailed = new AtomicBoolean(false);
}
public DefaultReplicationWorker(final String jobId, |
@lmossman do you know if there have been discussions about moving everything to the container orchestrator?
Yes, we have made container orchestrator the default for kube deployments in OSS, but I found that the container orchestrator logic as written today does not work with docker-compose deployments, and some work will be required to fix that. I created a Spike ticket here to investigate that more: #13142
Though, this DefaultReplicationWorker class is used by container orchestrators as well.
@lmossman nice! this is from 0.39.0-alpha onwards?
Does that mean the docker deployment is not running the container orchestrator for now?
That's correct - part of the changes in the Migrate OSS to temporal scheduler PR was setting the CONTAINER_ORCHESTRATOR_ENABLED env var to true for all kube deploys. It is still not set in the docker-compose env file, so it will default to false for those deployments.
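To illustrate the default-to-false behavior described above, here is a minimal, hypothetical sketch (not the actual Airbyte code) of reading such a boolean flag from the environment; the class and method names are illustrative only.

```java
import java.util.Map;

// Hypothetical sketch: a boolean feature flag read from environment
// variables falls back to false when the variable is unset, which is
// why docker-compose deployments (with no env entry) skip the orchestrator.
public class OrchestratorFlag {

    public static boolean isEnabled(final Map<String, String> env) {
        // getOrDefault handles the unset case; Boolean.parseBoolean
        // returns true only for (case-insensitive) "true".
        return Boolean.parseBoolean(env.getOrDefault("CONTAINER_ORCHESTRATOR_ENABLED", "false"));
    }
}
```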
I took a look at the PR since Jonathan tagged me. I noticed one potential issue, around where the workspace id should be injected to keep things clean, that I want to resolve before we merge this in. Details in the comment. TLDR: we don't want direct db access from a pod involved in the job execution, and we probably want to shift workspaceId injection further up the creation chain.
Taking a step back, between the Segment event and the Datadog metric, I feel the DD metric provides more immediate value since it helps us act on this in Cloud. Though the Segment alert is definitely useful, it's also less actionable (we cannot look at OSS users' data to debug) and less urgent (OSS users are likely to open issues when they spot them).
The DD metric LOE is also much lower/simpler as it's a count emission with a connector image tag and doesn't require new information injected throughout the system. If the team is strapped for time, we'd get more value implementing only the DD metric and leaving this for later. If the team has time and is open to learning, of course implementing both is great!
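The "count emission with a connector image tag" described above could look roughly like the following. This is a hypothetical, self-contained stand-in (the real code would go through a Datadog/StatsD client); the class name, metric name, and tag format are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a metrics client, showing the shape
// of a count emission tagged with the connector docker image. A real
// implementation would delegate to a Datadog/StatsD client instead of a map.
public class SchemaValidationMetrics {

    private final Map<String, Long> counts = new HashMap<>();

    // Key format "metric|tag" mimics a tagged count emission.
    public void countSchemaValidationError(final String dockerImage) {
        counts.merge("record_schema_validation_error|docker_image:" + dockerImage, 1L, Long::sum);
    }

    public long get(final String dockerImage) {
        return counts.getOrDefault("record_schema_validation_error|docker_image:" + dockerImage, 0L);
    }
}
```

Because the only required context is the connector image tag, nothing new has to be threaded through the system, which is the LOE point made above.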
@@ -89,6 +107,25 @@ public Optional<String> runJob() throws Exception {
    sourceLauncherConfig.getDockerImage().equals(WorkerConstants.RESET_JOB_SOURCE_DOCKER_IMAGE_STUB) ? new EmptyAirbyteSource()
        : new DefaultAirbyteSource(workerConfigs, sourceLauncher);

    final FeatureFlags featureFlags = new EnvVariableFeatureFlags();
    final String driverClassName = "org.postgresql.Driver";
validationErrors.forEach((stream, errorPair) -> {
  if (workspaceId != null) {
as far as I can tell, workspaceId never changes, so this should be able to be combined with the validationErrors.isEmpty() check on line 367 to simplify things.
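A sketch of the suggested simplification, hoisting the workspaceId check into a single guard alongside the emptiness check instead of re-testing it inside the per-stream loop. The surrounding class and the trackError call are assumed context from the snippet above; the helper shape here is hypothetical.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the reviewer's suggestion: since workspaceId never
// changes during the loop, fold its null check into the existing
// validationErrors.isEmpty() guard rather than checking it per stream.
public class ValidationReporter {

    public static int report(final Map<String, String> validationErrors, final UUID workspaceId) {
        // Single combined guard: nothing to track if there are no errors
        // or no workspace to attribute them to.
        if (validationErrors.isEmpty() || workspaceId == null) {
            return 0;
        }
        validationErrors.forEach((stream, errorPair) -> {
            // real code would call trackError(stream, errorPair, workspaceId) here
        });
        return validationErrors.size();
    }
}
```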
@@ -89,6 +107,25 @@ public Optional<String> runJob() throws Exception {
    sourceLauncherConfig.getDockerImage().equals(WorkerConstants.RESET_JOB_SOURCE_DOCKER_IMAGE_STUB) ? new EmptyAirbyteSource()
        : new DefaultAirbyteSource(workerConfigs, sourceLauncher);

    final FeatureFlags featureFlags = new EnvVariableFeatureFlags();
the job orchestrator runs in the jobs namespace as part of a job in Cloud. In theory, the orchestrator is fire-and-forget, so I don't think we want to allow direct database access from the orchestrator. Doing so also presents some security risk, since today we sandbox the jobs namespace off from the ab namespace, so we would have to make some allowances for the orchestrator pod - not the worst but not very clean.
I think the right way to do this is to inject the workspace id via the ReplicationActivityImpl (this runs in the ab namespace). This follows how configs are currently propagated to the jobs - as static files - so we can keep the interface consistent.
Thanks for explaining this Davin. This is something Anne and I also discussed over zoom yesterday; I agree that we do not want to have direct db access in the orchestrator pod, so querying for the workspace ID should be moved higher up the chain.
Force-pushed from 908197f to f0935f9
Track record schema validation errors in Segment