Sync to AWS #368
For the sake of simplicity, would it make more sense to use DynamoDB as the only store? Unless we're going to run up against DynamoDB's 400 KB per-item size limit, it seems compelling to only have to configure a single cloud resource instead of two.
I think that limit would be a problem, yes.
OK, got it. I haven't checked out the actual objects that get synced. I'll look at your PR for GCP to better understand what gets sent.
Great! You can find more info on the sync protocol here. Ignore the HTTP bits for the cloud-storage case, but the rest still applies.

There's really no size limit on versions -- if a user puts big chunks of text into their tasks, such as annotations, versions might get quite large. Similarly, if a user does not sync very often, an accumulation of small changes might get large. The former case doesn't really permit any technical solution: nothing prevents a user from putting a 500 KB annotation on a task, and that change would need to be a single operation and thus included in a single version. Snapshots, too, have no size limit, and are proportional to the total number of tasks (of all statuses) a user has. Most users probably have relatively small task sets, but I'm sure there are people out there with thousands of tasks or more.

So I think we need to take advantage of S3's more-or-less unlimited object size. The alternative would be to store versions and snapshots across multiple DynamoDB items, but that seems difficult and would introduce some new failure modes.
As of August 20, AWS S3 now supports conditional writes, so I think this should be doable with bare S3, without needing additional coordination via DynamoDB.
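For reference, here's roughly what that looks like with the aws-sdk-s3 crate, assuming a recent enough SDK that exposes `if_none_match` on `put_object` (bucket and key names are placeholders):

```rust
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::Client;

/// Write `key` only if no object with that name exists yet. If it does,
/// S3 rejects the request with `412 Precondition Failed`, which surfaces
/// as an `Err` from `send()`.
async fn put_if_absent(
    client: &Client,
    bucket: &str,
    key: &str,
    body: Vec<u8>,
) -> Result<(), aws_sdk_s3::Error> {
    client
        .put_object()
        .bucket(bucket)
        .key(key)
        .if_none_match("*") // conditional write: only succeed if the object is absent
        .body(ByteStream::from(body))
        .send()
        .await?;
    Ok(())
}
```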
Wow, new features in an 18-year-old product! It looks like this only supports checking whether a file exists, not whether it has been changed, so it might require some additional work to figure out how to get the update-if-not-changed behavior we need.
I haven't checked the sync protocol -- does it support any sort of pessimistic locking around the sync operation? If so, the AWS integration code could designate an object key that acts as a semaphore, and sync would only succeed if the client can acquire the semaphore by doing a conditional put to that object. (Also, as part of the initial setup, a bucket would need to be configured for the semaphore object to live in, and the bucket should have a lifecycle expiration (TTL) set so the object automatically expires after some amount of time, to recover from the case where a client fails to release the semaphore, i.e. fails to delete the object.)
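A rough sketch of that semaphore (the lock key and bucket handling are made up, not taken from any existing integration): acquiring it is a conditional put, releasing it is a delete.

```rust
use aws_sdk_s3::error::ProvideErrorMetadata;
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::Client;

/// Hypothetical well-known key for the semaphore object.
const LOCK_KEY: &str = "sync-lock";

/// Try to acquire the sync semaphore. Ok(true) means we now hold the lock;
/// Ok(false) means another client already holds it.
async fn try_acquire_lock(client: &Client, bucket: &str) -> Result<bool, aws_sdk_s3::Error> {
    let res = client
        .put_object()
        .bucket(bucket)
        .key(LOCK_KEY)
        .if_none_match("*") // only create the lock object if it doesn't already exist
        .body(ByteStream::from_static(b"locked"))
        .send()
        .await;
    match res {
        Ok(_) => Ok(true),
        Err(err) => {
            // S3 answers 412 Precondition Failed when the object already exists.
            // (Exact error matching may vary between SDK versions.)
            let already_locked = err
                .as_service_error()
                .map_or(false, |svc| svc.code() == Some("PreconditionFailed"));
            if already_locked {
                Ok(false)
            } else {
                Err(err.into())
            }
        }
    }
}

/// Release the semaphore by deleting the lock object.
async fn release_lock(client: &Client, bucket: &str) -> Result<(), aws_sdk_s3::Error> {
    client
        .delete_object()
        .bucket(bucket)
        .key(LOCK_KEY)
        .send()
        .await?;
    Ok(())
}
```

S3 lifecycle expiration is specified in whole days (minimum one), so cleanup of an abandoned lock via a lifecycle rule would be slow but automatic.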
That could work! The only bit that needs synchronization is in taskchampion/taskchampion/src/server/cloud/service.rs, lines 30 to 37 at commit c7c2cde.
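Paraphrasing from memory rather than quoting the exact lines at that commit, the operation is a compare-and-swap method on the cloud `Service` trait, roughly:

```rust
// Rough paraphrase of the relevant part of the Service trait; the real
// signature and error type in service.rs may differ.
trait Service {
    /// Write `new_value` to the object named `name` only if its current
    /// contents are `existing_value` (`None` meaning the object must not
    /// exist yet), returning whether the swap actually happened.
    fn compare_and_swap(
        &mut self,
        name: &[u8],
        existing_value: Option<Vec<u8>>,
        new_value: Vec<u8>,
    ) -> anyhow::Result<bool>;
}
```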
The locking could occur around that operation. |
Oh yeah, we can totally use pessimistic locking to implement compare-and-swap. 👍 Lemme see if I can carve out some time to work on that.
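Something like this, maybe -- reusing the hypothetical `try_acquire_lock` / `release_lock` from the sketch above, with equally made-up `get_bytes` / `put_bytes` wrappers over GetObject / PutObject just to make the flow concrete:

```rust
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::Client;

/// Sketch: compare-and-swap on the "latest" object, serialized by the
/// pessimistic lock. Ok(false) means either the lock was busy or the
/// object no longer matches `existing_value`; the caller can retry.
async fn compare_and_swap(
    client: &Client,
    bucket: &str,
    key: &str,
    existing_value: Option<Vec<u8>>,
    new_value: Vec<u8>,
) -> anyhow::Result<bool> {
    if !try_acquire_lock(client, bucket).await? {
        return Ok(false); // someone else is syncing right now
    }
    let swapped = swap_while_locked(client, bucket, key, existing_value, new_value).await;
    // Release the lock whether or not the swap happened.
    release_lock(client, bucket).await?;
    swapped
}

/// The critical section: only runs while we hold the lock, so no other
/// well-behaved client can interleave a write.
async fn swap_while_locked(
    client: &Client,
    bucket: &str,
    key: &str,
    existing_value: Option<Vec<u8>>,
    new_value: Vec<u8>,
) -> anyhow::Result<bool> {
    if get_bytes(client, bucket, key).await? != existing_value {
        return Ok(false); // the object changed since the caller last read it
    }
    put_bytes(client, bucket, key, new_value).await?;
    Ok(true)
}

/// Read an object's full contents, treating "no such key" as None.
async fn get_bytes(client: &Client, bucket: &str, key: &str) -> anyhow::Result<Option<Vec<u8>>> {
    match client.get_object().bucket(bucket).key(key).send().await {
        Ok(out) => Ok(Some(out.body.collect().await?.into_bytes().to_vec())),
        Err(err) => {
            if err.as_service_error().map_or(false, |e| e.is_no_such_key()) {
                Ok(None)
            } else {
                Err(err.into())
            }
        }
    }
}

/// Plain (unconditional) write of an object's contents.
async fn put_bytes(client: &Client, bucket: &str, key: &str, value: Vec<u8>) -> anyhow::Result<()> {
    client
        .put_object()
        .bucket(bucket)
        .key(key)
        .body(ByteStream::from(value))
        .send()
        .await?;
    Ok(())
}
```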
Similar to the GCP sync implemented in GothenburgBitFactory/taskwarrior#3185, we should be able to sync replicas to AWS's object storage.
The tricky bit here is that, unlike GCS and Azure Blob, S3 does not provide a compare-and-swap operation.
Some reading suggests that the easiest way to accomplish this would be to use DynamoDB as a lock over the "latest" object in the S3 bucket.
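For illustration, that lock would be a conditional PutItem -- the write succeeds only if no lock item already exists, so exactly one client gets to update "latest" at a time (table name and key schema below are made up):

```rust
use aws_sdk_dynamodb::types::AttributeValue;
use aws_sdk_dynamodb::Client;

/// Try to take a DynamoDB-based lock guarding the "latest" object in S3.
/// Ok(true) means we hold the lock; Ok(false) means another client does
/// (the conditional check failed).
async fn try_acquire_dynamo_lock(
    client: &Client,
    table: &str,
) -> Result<bool, aws_sdk_dynamodb::Error> {
    let res = client
        .put_item()
        .table_name(table)
        .item("lock_id", AttributeValue::S("latest".to_string()))
        // Only succeed if no item with this key exists yet.
        .condition_expression("attribute_not_exists(lock_id)")
        .send()
        .await;
    match res {
        Ok(_) => Ok(true),
        Err(err) => {
            let lost_race = err
                .as_service_error()
                .map_or(false, |e| e.is_conditional_check_failed_exception());
            if lost_race {
                Ok(false)
            } else {
                Err(err.into())
            }
        }
    }
}
```

Releasing the lock would be a DeleteItem on the same key, and DynamoDB's native TTL attribute could expire abandoned locks automatically.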