incremental backup and incremental backfill generate different file names #90
Comments
I've realized that …
Confirmed that sorting the key object before generating the id hash resolves this issue.
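A minimal sketch of that sort-before-hash approach, assuming the key object is serialized to a string before hashing; the helper names and the use of JSON.stringify are illustrative and not the library's actual patch:

```js
var crypto = require('crypto');

// Rebuild the key object with its attribute names in a canonical (sorted)
// order, so the serialized form -- and therefore the md5-based id -- is the
// same regardless of how the attributes arrived (stream-event order vs.
// describe_table order).
function canonicalKey(keyObj) {
    var sorted = {};
    Object.keys(keyObj).sort().forEach(function(name) {
        sorted[name] = keyObj[name];
    });
    return sorted;
}

function itemId(keyObj) {
    return crypto.createHash('md5')
        .update(JSON.stringify(canonicalKey(keyObj)))
        .digest('hex');
}

// Both orderings now produce the same id:
// itemId({ myrange: { S: 'b' }, myhash: { S: 'a' } }) ===
// itemId({ myhash: { S: 'a' }, myrange: { S: 'b' } })
```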
btalbot added a commit to iJJi/dynamodb-replicator that referenced this issue on May 9, 2017:
This exposes a bug (previously filed as mapbox#90) which occurs when items with a range key are read from the DDB event stream, an md5 hash of the key is computed, and the item is written to S3. The issue is that the DDB event stream handler does not (and should not) call 'describe_table' to learn which key is the HASH and which is the RANGE, and therefore simply generates the md5 hash of the item keys in whatever order they happen to appear in the stream event. The s3-backfill util does call 'describe_table' and orders the keys by declaration order, which DDB requires to be HASH first, RANGE second. The different ordering of the item keys produces a distinct md5 hash value and a different S3 path/key, so some items appear twice in S3, effectively corrupting the incremental backups since two valid versions are present at the same time.
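The divergence is easy to reproduce in a few lines. This is an illustration of the ordering problem only; the attribute names and the use of JSON.stringify are stand-ins for whatever serialization the library actually applies before hashing:

```js
var crypto = require('crypto');

function md5hex(str) {
    return crypto.createHash('md5').update(str).digest('hex');
}

// The same item key serialized with its attributes in two different orders:
// the order they happened to appear in the stream event, versus the
// HASH-first, RANGE-second order that describe_table reports.
var streamOrder = JSON.stringify({ myrange: { S: '2017-05-09' }, myhash: { S: 'item-1' } });
var schemaOrder = JSON.stringify({ myhash: { S: 'item-1' }, myrange: { S: '2017-05-09' } });

console.log(md5hex(streamOrder));                         // one hex digest
console.log(md5hex(schemaOrder));                         // a different hex digest
console.log(md5hex(streamOrder) === md5hex(schemaOrder)); // false -> two different S3 keys for one item
```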
btalbot added a second commit to iJJi/dynamodb-replicator that referenced this issue on May 9, 2017, with the same commit message as above.
akum32 added a commit to ACloudGuru/dynamodb-replicator that referenced this issue on Sep 7, 2017.
Hi there!
First off, great library. It's super useful and a much better/simpler option (for me) than the whole EMR/Datapipeline situation.
I have this simple lambda function that is subscribed to the tables I want to update:
(the bucket, region, and prefix are set as env variables in the lambda function)
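The handler was roughly along these lines; this is a sketch only, and the export name (`backup`) and the environment variable names are assumptions about the library's interface, not quoted from the issue:

```js
// Illustrative sketch only: `replicator.backup` and the env variable names
// below are assumptions, not taken from the issue or the library's docs.
var replicator = require('dynamodb-replicator');

// Assumes the backup destination is configured via env vars such as
// BackupBucket, BackupPrefix, and BackupRegion on the Lambda function.
module.exports.handler = function(event, context, callback) {
    replicator.backup(event, callback);
};
```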
Then I ran the backfill by importing dynamodb-replicator/s3-backfill and passing it a config object.
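A hedged sketch of that call; the option names expected by s3-backfill are assumptions (check s3-backfill.js for the real interface), and the table and bucket names are placeholders:

```js
var backfill = require('dynamodb-replicator/s3-backfill');

// Option names and values below are illustrative placeholders.
backfill({
    table: 'my-table',
    region: 'us-east-1',
    backup: {
        bucket: 'my-backup-bucket',
        prefix: 'backups/my-table'
    }
}, function(err) {
    if (err) throw err;
    console.log('backfill complete');
});
```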
However, I noticed that when records get updated via the stream/lambda function, they are written to a different file from the one created by the backfill. The formula for generating filenames is slightly different in the two code paths:
https://github.com/mapbox/dynamodb-replicator/blob/master/s3-backfill.js#L46-L48
https://github.com/mapbox/dynamodb-replicator/blob/master/index.js#L130-L132
Does this make any practical difference? Should the restore function work regardless?