Sync to AWS #368
For the sake of simplicity, would it make more sense to use DynamoDB as the only store? Unless we're going to run up against DynamoDB's 400 KB per-item size limit, it seems compelling to only have to configure a single cloud resource instead of two.
I think that limit would be a problem, yes.
OK, got it. I haven't checked out the actual objects that get synced. I'll look at your PR for GCP to better understand what gets sent.
Great! You can find more info on the sync protocol here. Ignore the HTTP bits for the cloud-storage case, but the rest still applies.

There's really no size limit on versions -- if a user puts big chunks of text into their tasks, such as annotations, versions might get quite large. Similarly, if a user does not sync very often, an accumulation of small changes might get large. The former case doesn't really permit any technical solution: nothing prevents a user from putting a 500 KB annotation on a task, and that change would need to be a single operation and thus included in a single version. Snapshots, too, have no size limit, and are proportional to the total number of tasks (of all statuses) a user has. Most users probably have relatively small task sets, but I'm sure there are people out there with thousands of tasks or more.

So I think we need to take advantage of S3's more-or-less unlimited object size. The alternative would be to store versions and snapshots across multiple DynamoDB items, but that seems difficult and would introduce some new failure modes.
As of August 20, AWS S3 now supports conditional writes, so I think this should be doable with bare S3, without needing additional coordination via DynamoDB.
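For reference, here's roughly what that looks like with the aws-sdk-s3 crate, assuming a recent enough SDK that exposes `if_none_match` on `put_object` (bucket and key names are placeholders):

```rust
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::Client;

/// Write `key` only if no object with that name exists yet. If it does,
/// S3 rejects the request with `412 Precondition Failed`, which surfaces
/// as an `Err` from `send()`.
async fn put_if_absent(
    client: &Client,
    bucket: &str,
    key: &str,
    body: Vec<u8>,
) -> Result<(), aws_sdk_s3::Error> {
    client
        .put_object()
        .bucket(bucket)
        .key(key)
        .if_none_match("*") // conditional write: only succeed if the object is absent
        .body(ByteStream::from(body))
        .send()
        .await?;
    Ok(())
}
```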
Wow, new features in an 18-year-old product! It looks like this only supports checking whether a file exists, not whether it has been changed, so it might require some additional work to figure out how to get the update-if-not-changed behavior we need.
I haven't checked the sync protocol -- does it support any sort of pessimistic locking around the sync operation? If so, the AWS integration code could designate an object key that acts as a semaphore, and sync would only succeed if the client can acquire the semaphore by doing a conditional put to that object. (Also, as part of the initial setup, a bucket would need to be configured for the semaphore object to live in, and the bucket should have a lifecycle expiration (TTL) set so the object automatically expires after some amount of time, to recover from the case where a client fails to release the semaphore, i.e. fails to delete the object.)
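A rough sketch of that semaphore (the lock key and bucket handling are made up, not taken from any existing integration): acquiring it is a conditional put, releasing it is a delete.

```rust
use aws_sdk_s3::error::ProvideErrorMetadata;
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::Client;

/// Hypothetical well-known key for the semaphore object.
const LOCK_KEY: &str = "sync-lock";

/// Try to acquire the sync semaphore. Ok(true) means we now hold the lock;
/// Ok(false) means another client already holds it.
async fn try_acquire_lock(client: &Client, bucket: &str) -> Result<bool, aws_sdk_s3::Error> {
    let res = client
        .put_object()
        .bucket(bucket)
        .key(LOCK_KEY)
        .if_none_match("*") // only create the lock object if it doesn't already exist
        .body(ByteStream::from_static(b"locked"))
        .send()
        .await;
    match res {
        Ok(_) => Ok(true),
        Err(err) => {
            // S3 answers 412 Precondition Failed when the object already exists.
            // (Exact error matching may vary between SDK versions.)
            let already_locked = err
                .as_service_error()
                .map_or(false, |svc| svc.code() == Some("PreconditionFailed"));
            if already_locked {
                Ok(false)
            } else {
                Err(err.into())
            }
        }
    }
}

/// Release the semaphore by deleting the lock object.
async fn release_lock(client: &Client, bucket: &str) -> Result<(), aws_sdk_s3::Error> {
    client
        .delete_object()
        .bucket(bucket)
        .key(LOCK_KEY)
        .send()
        .await?;
    Ok(())
}
```

S3 lifecycle expiration is specified in whole days (minimum one), so cleanup of an abandoned lock via a lifecycle rule would be slow but automatic.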
That could work! The only bit that needs synchronization is in taskchampion/taskchampion/src/server/cloud/service.rs, lines 30 to 37 at commit c7c2cde.
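Paraphrasing from memory rather than quoting the exact lines at that commit, the operation is a compare-and-swap method on the cloud `Service` trait, roughly:

```rust
// Rough paraphrase of the relevant part of the Service trait; the real
// signature and error type in service.rs may differ.
trait Service {
    /// Write `new_value` to the object named `name` only if its current
    /// contents are `existing_value` (`None` meaning the object must not
    /// exist yet), returning whether the swap actually happened.
    fn compare_and_swap(
        &mut self,
        name: &[u8],
        existing_value: Option<Vec<u8>>,
        new_value: Vec<u8>,
    ) -> anyhow::Result<bool>;
}
```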
The locking could occur around that operation. |
Oh yeah, we can totally use pessimistic locking to implement compare-and-swap. 👍 Lemme see if I can carve out some time to work on that.
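Something like this, maybe -- reusing the hypothetical `try_acquire_lock` / `release_lock` from the sketch above, with equally made-up `get_bytes` / `put_bytes` wrappers over GetObject / PutObject just to make the flow concrete:

```rust
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::Client;

/// Sketch: compare-and-swap on the "latest" object, serialized by the
/// pessimistic lock. Ok(false) means either the lock was busy or the
/// object no longer matches `existing_value`; the caller can retry.
async fn compare_and_swap(
    client: &Client,
    bucket: &str,
    key: &str,
    existing_value: Option<Vec<u8>>,
    new_value: Vec<u8>,
) -> anyhow::Result<bool> {
    if !try_acquire_lock(client, bucket).await? {
        return Ok(false); // someone else is syncing right now
    }
    let swapped = swap_while_locked(client, bucket, key, existing_value, new_value).await;
    // Release the lock whether or not the swap happened.
    release_lock(client, bucket).await?;
    swapped
}

/// The critical section: only runs while we hold the lock, so no other
/// well-behaved client can interleave a write.
async fn swap_while_locked(
    client: &Client,
    bucket: &str,
    key: &str,
    existing_value: Option<Vec<u8>>,
    new_value: Vec<u8>,
) -> anyhow::Result<bool> {
    if get_bytes(client, bucket, key).await? != existing_value {
        return Ok(false); // the object changed since the caller last read it
    }
    put_bytes(client, bucket, key, new_value).await?;
    Ok(true)
}

/// Read an object's full contents, treating "no such key" as None.
async fn get_bytes(client: &Client, bucket: &str, key: &str) -> anyhow::Result<Option<Vec<u8>>> {
    match client.get_object().bucket(bucket).key(key).send().await {
        Ok(out) => Ok(Some(out.body.collect().await?.into_bytes().to_vec())),
        Err(err) => {
            if err.as_service_error().map_or(false, |e| e.is_no_such_key()) {
                Ok(None)
            } else {
                Err(err.into())
            }
        }
    }
}

/// Plain (unconditional) write of an object's contents.
async fn put_bytes(client: &Client, bucket: &str, key: &str, value: Vec<u8>) -> anyhow::Result<()> {
    client
        .put_object()
        .bucket(bucket)
        .key(key)
        .body(ByteStream::from(value))
        .send()
        .await?;
    Ok(())
}
```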
Similar to the GCP sync implemented in GothenburgBitFactory/taskwarrior#3185, we should be able to sync replicas to AWS's object storage.
The tricky bit here is that, unlike GCS and Azure Blob, S3 does not provide a compare-and-swap operation.
Some reading suggests that the easiest way to accomplish this would be to use DynamoDB as a lock over the "latest" object in the S3 bucket.
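For illustration, that lock would be a conditional PutItem -- the write succeeds only if no lock item already exists, so exactly one client gets to update "latest" at a time (table name and key schema below are made up):

```rust
use aws_sdk_dynamodb::types::AttributeValue;
use aws_sdk_dynamodb::Client;

/// Try to take a DynamoDB-based lock guarding the "latest" object in S3.
/// Ok(true) means we hold the lock; Ok(false) means another client does
/// (the conditional check failed).
async fn try_acquire_dynamo_lock(
    client: &Client,
    table: &str,
) -> Result<bool, aws_sdk_dynamodb::Error> {
    let res = client
        .put_item()
        .table_name(table)
        .item("lock_id", AttributeValue::S("latest".to_string()))
        // Only succeed if no item with this key exists yet.
        .condition_expression("attribute_not_exists(lock_id)")
        .send()
        .await;
    match res {
        Ok(_) => Ok(true),
        Err(err) => {
            let lost_race = err
                .as_service_error()
                .map_or(false, |e| e.is_conditional_check_failed_exception());
            if lost_race {
                Ok(false)
            } else {
                Err(err.into())
            }
        }
    }
}
```

Releasing the lock would be a DeleteItem on the same key, and DynamoDB's native TTL attribute could expire abandoned locks automatically.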