DynamoDB / AWS Lambda: CrateDB raises DuplicateKeyException after Lambda is resuming CDC operations #301

Open
Tracked by #231
amotl opened this issue Oct 24, 2024 · 6 comments

Comments

amotl commented Oct 24, 2024

About

We are observing a problem with the sync lambda.

Problem

Every once in a while, a task times out, which causes the sync Lambda to retry. Those retries then fail: even though the original task timed out, all items in the timed-out batch had already been stored correctly in CrateDB. Do you have any idea what could be causing this behavior?

/cc @dfeokti

amotl commented Oct 24, 2024

a) We will review the resume logic to see if we can spot any bugs.
b) Another strategy to compensate for such situations is to use upserts or ON CONFLICT clauses on the INSERT statements. We might just employ this strategy here as well.
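
To illustrate strategy b), here is a minimal sketch of an idempotent insert. It uses Python's built-in sqlite3 module purely as a stand-in for CrateDB, because SQLite (3.24 or later) accepts a comparable ON CONFLICT ... DO NOTHING clause. The table name `demo` and the helper `insert_batch` are made up for this example and are not part of the sync Lambda.

```python
import sqlite3


def insert_batch(conn, rows):
    """Insert rows, skipping any whose primary key already exists."""
    conn.executemany(
        "INSERT INTO demo (id, payload) VALUES (?, ?) "
        "ON CONFLICT (id) DO NOTHING",
        rows,
    )
    conn.commit()


# In-memory SQLite database as a stand-in for CrateDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id TEXT PRIMARY KEY, payload TEXT)")

batch = [("a", "1"), ("b", "2")]
insert_batch(conn, batch)
# Simulate the retry after a timeout: the same batch is delivered again.
insert_batch(conn, batch)

row_count = conn.execute("SELECT count(*) FROM demo").fetchone()[0]
```

With a plain INSERT, the second call would raise an integrity error. With the conflict clause, the retry is a no-op and the table still holds two rows.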

/cc @wierdvanderhaar, @hammerhead

amotl commented Oct 24, 2024

a) Relating to that comment,

"""
Implement partial batch response for Lambda functions that receive events from
a Kinesis stream. The function reports the batch item failures in the response,
signaling to Lambda to retry those messages later.
"""

and how error handling is taking place,

except Exception as ex:
    error_message = f"An error occurred processing event: {event_id}"
    logger.exception(error_message)
    if USE_BATCH_PROCESSING:
        # Return failed record's sequence number.
        return {"batchItemFailures": [{"itemIdentifier": cur_record_sequence_number}]}
    if ON_ERROR == "exit":
        # Signal "Input/output error" when error happens while processing data.
        sys.exit(5)
    elif ON_ERROR == "ignore":
        pass
    elif ON_ERROR == "raise":
        raise ex

I guess the regular modus operandi for a Lambda that receives events from a Kinesis stream is that if the Lambda fails for whatever reason, recent in-flight events will be re-delivered. If it's multiple records, it is probably normal that some of them may be redundant, because they have been relayed to CrateDB successfully already.

b) I guess using ON CONFLICT DO NOTHING / DO UPDATE instead will be the right choice.
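
The redelivery behavior described in a) can be sketched as follows. This is a simplified, hypothetical handler, not the actual sync Lambda: `handle_batch` and `flaky_processor` are invented names, and the event only carries the `kinesis.sequenceNumber` field of a real Kinesis record. It assumes partial batch responses are enabled on the event source mapping.

```python
from typing import Any, Callable, Dict, List


def handle_batch(
    event: Dict[str, Any],
    process_record: Callable[[Dict[str, Any]], None],
) -> Dict[str, List[Dict[str, str]]]:
    """Process records in order and report the first failure to Lambda."""
    for record in event.get("Records", []):
        sequence_number = record["kinesis"]["sequenceNumber"]
        try:
            process_record(record)
        except Exception:
            # Report the failed record so that Lambda retries from here.
            return {"batchItemFailures": [{"itemIdentifier": sequence_number}]}
    # An empty list signals that the whole batch succeeded.
    return {"batchItemFailures": []}


processed: List[str] = []


def flaky_processor(record: Dict[str, Any]) -> None:
    """Hypothetical processor that fails on the second record."""
    seq = record["kinesis"]["sequenceNumber"]
    if seq == "2":
        raise TimeoutError("simulated timeout")
    processed.append(seq)


event = {"Records": [{"kinesis": {"sequenceNumber": str(n)}} for n in (1, 2, 3)]}
result = handle_batch(event, flaky_processor)
```

If the failure is a timeout that occurs after the write to CrateDB has already succeeded, the redelivered record is a duplicate from the database's point of view. That is the situation a conflict clause is meant to absorb.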

amotl commented Oct 24, 2024

amotl commented Oct 25, 2024

Investigation » Comparison

Coming from "DMS: How does a replication task handle duplicate data w/o Primary/Unique keys on the table" and "How do I modify the error handling task settings for an AWS DMS task?": AWS DMS employs a dedicated error handling option called FullLoadIgnoreConflicts.

FullLoadIgnoreConflicts – Set this option to true to have AWS DMS ignore "zero rows affected" and "duplicates" errors when applying cached events. If set to false, AWS DMS reports all errors instead of ignoring them. The default is true.

There is also an equivalent option that probably applies to both full-load and CDC operation modes:

ApplyErrorInsertPolicy – Determines what action AWS DMS takes when there is a conflict with an INSERT operation. The default is LOG_ERROR. Possible values are IGNORE_RECORD, LOG_ERROR, SUSPEND_TABLE, STOP_TASK, and INSERT_RECORD.

See also:

Evaluation

If it's multiple records, it is probably normal that some of them may be redundant, because they have been relayed to CrateDB successfully already.

AWS DMS' default setting of FullLoadIgnoreConflicts=true effectively behaves the same as CrateDB's ON CONFLICT DO NOTHING clause.

@amotl amotl changed the title DynamoDB: DuplicateKeyException after resuming CDC DynamoDB / AWS Lambda: CrateDB raises DuplicateKeyException after resuming CDC operation Oct 25, 2024
amotl commented Oct 25, 2024

So, let's merge and release crate/commons-codec#77 as a quick measure, and then follow up with a more elaborate implementation that is closer to what DMS provides in terms of configuration and logging flexibility, also taking @kneth's suggestion into account:

Is it possible to log the conflicts? I could imagine that the user would like to know how often it happens and maybe see if there is a pattern.
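
One conceivable way to log conflicts is sketched below. It is not what crate/commons-codec#77 implements. It inserts row by row and treats a reported row count of zero as a skipped duplicate. SQLite again serves as a stand-in for CrateDB, and whether the CrateDB driver reports row counts the same way is an assumption that would need checking. The logger name `cdc.conflicts`, the table `demo`, and the function name are hypothetical.

```python
import logging
import sqlite3

logger = logging.getLogger("cdc.conflicts")  # hypothetical logger name


def insert_and_log_conflicts(conn, rows):
    """Insert rows one by one and return how many were skipped as duplicates."""
    skipped = 0
    for key, payload in rows:
        cursor = conn.execute(
            "INSERT INTO demo (id, payload) VALUES (?, ?) "
            "ON CONFLICT (id) DO NOTHING",
            (key, payload),
        )
        # A row count of zero means the conflict clause suppressed the insert.
        if cursor.rowcount == 0:
            skipped += 1
            logger.warning("Skipped duplicate key: %s", key)
    conn.commit()
    return skipped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id TEXT PRIMARY KEY, payload TEXT)")

batch = [("a", "1"), ("b", "2")]
first_run = insert_and_log_conflicts(conn, batch)
second_run = insert_and_log_conflicts(conn, batch)  # simulated retry
```

Counting per batch like this would give users the frequency of conflicts, and the logged keys could help reveal a pattern. The trade-off is one statement per row instead of a bulk insert.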

@amotl amotl changed the title DynamoDB / AWS Lambda: CrateDB raises DuplicateKeyException after resuming CDC operation DynamoDB / AWS Lambda: CrateDB raises DuplicateKeyException after Lambda is resuming CDC operations Oct 25, 2024
amotl commented Oct 29, 2024

We received a request to make crate/commons-codec#77 configurable using a feature flag:

Would it be possible to make this (ON CONFLICT DO NOTHING) configurable?

It will be added in the next development iteration.
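
As a rough idea of what such a feature flag could look like, the sketch below appends the conflict clause only when a flag is set. The function `build_insert_statement` and its `ignore_duplicates` parameter are hypothetical and do not reflect the commons-codec API. Identifiers are interpolated without quoting, so this is for illustration only.

```python
from typing import Sequence


def build_insert_statement(
    table: str,
    columns: Sequence[str],
    ignore_duplicates: bool = True,
) -> str:
    """Build a parameterized INSERT, optionally tolerating duplicate keys."""
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    if ignore_duplicates:
        # Hypothetical flag: skip rows whose key already exists.
        sql += " ON CONFLICT DO NOTHING"
    return sql


lenient = build_insert_statement("demo", ["id", "payload"])
strict = build_insert_statement("demo", ["id", "payload"], ignore_duplicates=False)
```

With the flag disabled, duplicates would surface as errors again, which suits users who prefer to be alerted about redelivered records.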

@amotl amotl mentioned this issue Oct 29, 2024
11 tasks