This repository has been archived by the owner on Apr 13, 2023. It is now read-only.
fix: improve error logging for ddbToEs sync #68
I think that this will result in some messages that were processed successfully being sent to the DLQ, since a single failing message makes the whole batch fail, and retrying the same batch will continue to fail.
It is common to use a batch size of 1 to work around this issue. An alternative is to enable BisectBatchOnFunctionError, although I haven't used that setting before and I'm not sure how it interacts with MaximumRetryAttempts.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-lambda-eventsourcemapping.html#cfn-lambda-eventsourcemapping-bisectbatchonfunctionerror
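For reference, a sketch of how these settings sit together on an AWS::Lambda::EventSourceMapping (resource names and the DLQ destination are placeholders, not from this PR):

```yaml
DdbToEsEventSourceMapping:
  Type: AWS::Lambda::EventSourceMapping
  Properties:
    EventSourceArn: !GetAtt ResourceDynamoDbTable.StreamArn   # placeholder stream ARN
    FunctionName: !Ref DdbToEsLambdaFunction                  # placeholder function
    StartingPosition: LATEST
    BatchSize: 15
    BisectBatchOnFunctionError: true    # split a failing batch in half and retry each half
    MaximumRetryAttempts: 4
    DestinationConfig:
      OnFailure:
        Destination: !GetAtt DdbToEsDLQ.Arn                   # placeholder SQS DLQ
```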
Yep, that is exactly the case: some messages will succeed, but the batch will fail if a single message fails.
I worry a batch size of 1 may slow down our sync too much. I looked into BisectBatchOnFunctionError, but as you mentioned I was not sure how it works with MaximumRetryAttempts, and I wasn't able to find documentation around it either. I suspect that it will bisect at most MaxRetry times. These writes are mostly idempotent, but a resource's availability could flip because of this, i.e. 1) "AVAILABLE" write fails and goes to the DLQ, 2) "DELETED" write passes, 3) DLQ redrive changes the ES doc from DELETED -> AVAILABLE.
One thing to note: this DLQ redrive is a manual process, and in reality I suspect that this operation would need a runbook laying out when to redrive the DLQ and when not to.
I think that guaranteeing that only the failed messages go to the DLQ is a very desirable property of the system. Otherwise ops become harder for customers for no good reason (why are there so many DLQ messages? How come only 6% of them actually failed? How can I know which of them actually failed?)
Another desirable property is handling out-of-order messages. Our current implementation does not do that (which is not the same as idempotency). It could be achieved by updating ES only if the vid of the incoming message is higher than the vid of the document in ES. This would make it safe to redrive DLQ messages. I think we can tackle this later as a separate issue.
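The vid comparison could look something like this minimal sketch (helper name and message shape are hypothetical; a dict stands in for the search index, and in real Elasticsearch the same effect can be had with external versioning on the index call):

```python
# Sketch of the "only apply if vid is newer" idea: out-of-order or
# redriven messages carrying a stale vid become no-ops instead of
# overwriting newer state.

def apply_if_newer(index: dict, message: dict) -> bool:
    """Apply `message` to `index` only if its vid is strictly newer.

    Returns True if the document was updated, False if the message was stale.
    """
    doc_id = message["id"]
    stored = index.get(doc_id)
    if stored is not None and stored["vid"] >= message["vid"]:
        return False  # stale or duplicate message: safe to drop
    index[doc_id] = {"vid": message["vid"], "status": message["status"]}
    return True

# Replaying the availability-flip scenario from the comments above:
index = {}
apply_if_newer(index, {"id": "r1", "vid": 2, "status": "DELETED"})     # applied
stale = apply_if_newer(index, {"id": "r1", "vid": 1, "status": "AVAILABLE"})
print(stale, index["r1"]["status"])  # False DELETED
```

With this check in place, redriving the DLQ cannot resurrect a deleted document, because the redriven "AVAILABLE" write carries an older vid.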
IMO sending only the failed messages to the DLQ should be done now (can still be a different PR). I agree that BisectBatchOnFunctionError has scarce documentation, but it is worth testing out. Maybe MaxRetry=4 and BisectBatch=true with our BatchSize=15 will effectively isolate the error to a single record. The cheap alternative is MaxRetry=1.
My intuition tells me the same, but we need data in order to discard that approach.