Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only run CHECKs after a failing Sync attempt #13459

Closed
Tracked by #12119
evantahler opened this issue Jun 3, 2022 · 3 comments · Fixed by #17999
Closed
Tracked by #12119

Only run CHECKs after a failing Sync attempt #13459

evantahler opened this issue Jun 3, 2022 · 3 comments · Fixed by #17999

Comments

@evantahler
Copy link
Contributor

evantahler commented Jun 3, 2022

After #12281, we now run each connector's CHECK command at the start of the sync. This is great for reliability, but potentially slow, and uses more resources.

This feature was added to catch /transient/ configuration errors, like an API Key becoming invalid over time. We rely on these checks to clearly assign the fault to the FailureReason config-error, rather than the connector code having a problem, e.g. system-error. We also know that a sync which is failing due to config-error should not be retried, which saves some resources.

Solution
If the previous attempt for this connection was not successful, CHECKS should be run. This does not include the first attempt, but it does check if the previous attempt in this job failed, or the previous attempt in the previous job failed. We can skip the first attempt because the user will have just CHECKed as part of creating the connection - and if something goes wrong, the next attempt will CHECK again. The "previous attempt" should consider only the most recent attempt before this one, and that might be an attempt within the same sync, or the previous sync's final attempt. Each sync has up to 3 attempts.

This would allow regularly succeeding jobs to always skip the CHECK step. Jobs with failure will be retried, and if the problem is due to user configuration problems, the job will have a terminal FailureReason of config-error, just like today.

Related Links:

@evantahler evantahler changed the title Allow users to configure how often CHECK is run before each sync Allow users to configure how often CHECK is run before each Sync Jun 3, 2022
@evantahler evantahler changed the title Allow users to configure how often CHECK is run before each Sync Allow users to relax how often CHECK is run before each Sync Jun 3, 2022
@evantahler evantahler changed the title Allow users to relax how often CHECK is run before each Sync Only run CHECKs after a failing Sync attempt Jun 14, 2022
@evantahler evantahler self-assigned this Jun 15, 2022
@evantahler
Copy link
Contributor Author

Blocked until #13779 is merged in

@evantahler evantahler removed their assignment Sep 26, 2022
@evantahler evantahler changed the title Only run CHECKs after a failing Sync attempt Only run CHECKs on new syncs or after a failing Sync attempt Oct 6, 2022
@evantahler evantahler changed the title Only run CHECKs on new syncs or after a failing Sync attempt Only run CHECKs after a failing Sync attempt Oct 7, 2022
@evantahler
Copy link
Contributor Author

Update - I think we can skip CHECKing the first attempt/sync as we will have just CHECKed as part of the setup process for the connector - what do you think?

@evantahler
Copy link
Contributor Author

Some information about Temporal at Airbyte:

  • Temporal's nouns are Workflows (a collection of activities) and Activities (the actual operations in the job)
  • A Sync is run by the ConnnectionManagerWorkflow.
  • A sync has the following main steps: Check Source, Check Destination, Replication, Normalize.
  • You can see that in the ConnectionManagerWorkflow here, the Workflow runs 2 activities for each of the checks. This is probably where your new logic will live, and we already do have come conditional logic around if the check should run or not
  • In the middle of the workflow you won't have database access, as these jobs are run on worker pods. That means the information you need about the previous attempt will need to be loaded into the Job Inputs ahead of time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants