This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

The sync use case #27

Closed
dfahlander opened this issue May 31, 2016 · 8 comments

Comments

dfahlander commented May 31, 2016

First of all, I'm excited about this proposal because I believe 90% of the use cases with indexedDB involve sync with a remote peer, and this is what is needed to make sync with indexedDB more performant.

I would just like to share my thoughts about a limitation of the proposal. The proposal will still help in the sync use case, but maybe not as much as one would initially think.

Consider the following flow after, let's say, an add() operation:

  1. The txn.observe() callback is triggered with the change. Now we need to store that change in a queue somehow for later sync with the server.
  2. If we store it in-memory, we could lose those changes if the browser closes down before getting online, so we would need to store them persistently. We could have a dedicated ObjectStore for this, 'changes', where we store them. We would then need to start a new readwrite transaction against our 'changes' store and record the changes.

The issue here is that there is a time window while waiting for the readwrite transaction to succeed in storing the changes. There might be other transactions in the queue that need to be executed first. If the browser closes down during that time window, our changes are lost. This makes the sync use case unreliable. In practice, this means that we cannot rely on observe() alone for the sync use case.
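To make the race concrete, here is a sketch of that fragile pattern in plain JavaScript. The `persistForSync` helper and the 'changes' store name are mine, purely for illustration:

```javascript
// Sketch of the fragile flow: the observer sees a committed change,
// then opens a SECOND, separate readwrite transaction to persist it
// for later sync.
function persistForSync(db, change) {
  // At this point the change lives only in memory. If the page is
  // reloaded or the browser crashes before this transaction commits
  // (it may be queued behind other transactions), the change is lost.
  var txn = db.transaction('changes', 'readwrite');
  txn.objectStore('changes').add(change);
  return txn;
}
```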

Maybe this is ok. Reliable sync is already possible with pouchdb and dexie without it. The proposal still gives us the great missing part: being notified when something happens in the DB so we know when we need to sync with our server, which has been one of the trickier parts to deal with.

dmurph commented May 31, 2016

I think I see what you mean. For this to be 100% reliable across crashes and the like, the following case would have to be handled:

  1. A transaction is finished & committing on the backend.
  2. The data is committed, and we're about to start calling our observers.
  3. The browser crashes or is shut down.

The spec doesn't cover how we would be able to receive these changes after the browser is restarted. Is this what you mean?

dfahlander commented May 31, 2016

Yes, and it's not necessarily a crash - it could be a page reload during a long-running transaction that was queued before ours.

But I'm not sure this is an issue, because it's still possible to build sync on the existing indexedDB API. I just want to make it clear that it would be unsafe to build a sync engine on this API alone. Sync could continue to be built by having a lib intercept adds, puts and deletes, put those changes in a dedicated ObjectStore for change tracking, and then use this proposal just to let the sync worker get notified when something is put into that store.

In dexie's Syncable addon, sync is accomplished by making sure every modification is also stored in the oplog / changes store. This is possible if we assume the application only uses our library to access the database - a reasonable precondition. Our lib does the following to ensure waterproof sync:

  1. Always include the 'changes' objectStore in any readwrite transaction of an observed objectStore.
  2. When anything is written to the observed objectStore, a corresponding change is added to 'changes' using the same transaction.

So if the transaction is aborted, the changes are aborted too, and if the transaction commits, we're guaranteed that the 'changes' store commits too.
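A minimal sketch of that same-transaction pattern, assuming out-of-line keys and a hypothetical `writeWithOplog` helper (this is not Dexie's actual API):

```javascript
// Sketch: write a record and its corresponding change entry in the SAME
// IndexedDB transaction, so both commit or abort together.
// Assumes the transaction was opened over both stores, e.g.:
//   db.transaction(['todos', 'oplog'], 'readwrite')
function writeWithOplog(txn, storeName, value, key) {
  // 1. Perform the application write (out-of-line key assumed).
  txn.objectStore(storeName).put(value, key);
  // 2. Record the change in the oplog, inside the same transaction.
  //    If the transaction aborts, this entry is rolled back too.
  txn.objectStore('oplog').add({
    store: storeName,
    type: 'put',
    key: key,
    value: value,
    timestamp: Date.now()
  });
}
```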

I'm not sure this proposal should go too deep. Dexie and other libs would benefit from this proposal as it is today because we would get notified when changes are put into our own 'changes' store, which would enable a more performant and eager sync. (Today I need to trigger the 'storage' event or fall back to polling the db to get notified across windows or workers.)

That said, if there is an intent to solve the sync use case more deeply, I would suggest making the observer receive the original readwrite transaction instead of a new readonly transaction, plus making it possible to always include additional object stores whenever a readwrite transaction is created. Additionally, I would also like to preserve the order of changes even when they occur in different object stores - either by including an incremental counter with each change that is unique across all object stores, or by just using a flat array of change entries instead of a Map.

Example:
Application code:

var trans = db.transaction(['todos'], 'readwrite');

...would implicitly do the following:

var trans = db.transaction(['todos', 'oplog'], 'readwrite');

...in case an observer was initiated the following way:

var control = txn.observe(changes => {
    var oplogStore = changes.transaction.objectStore('oplog');
    changes.records.get('todos').forEach(change => {
        oplogStore.add(change);
    });
}, {
    includeOriginalTransaction: true, // Includes the original transaction before committing it.
    includeValues: true,
    additionalStores: ['oplog']
});
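The cross-store ordering idea could be pictured as a flat change list fed from one shared sequence counter. A small illustrative sketch of that data shape (not part of the proposal's actual API):

```javascript
// Sketch: a flat, ordered change list spanning multiple object stores.
// Each change gets a sequence number from one shared counter, so the
// original interleaving of writes across stores is preserved.
function createChangeLog() {
  let seq = 0;
  const records = [];
  return {
    record(store, type, key, value) {
      records.push({ seq: ++seq, store, type, key, value });
    },
    // Changes come back in the exact order they happened,
    // regardless of which store they touched.
    all() {
      return records.slice();
    }
  };
}
```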

dmurph commented Jun 1, 2016

Does this possibly serve the use case? I might be missing something here:

var txn = db.transaction('oplog', 'readonly');
var ctrl = txn.observe(changes => {
  var changesForNetwork = [];
  changes.records.get('oplog').forEach(change => {
    if (change.type == 'delete') return;
    changesForNetwork.push(change);
  });
  sendChangesToNetwork(changesForNetwork).then(function() {
    var removalTxn = db.transaction('oplog', 'readwrite');
    var store = removalTxn.objectStore('oplog');
    changesForNetwork.forEach(change => store.delete(change.key)); // pseudocode here
  });
});
// Here we catch any changes that we missed due to crashing
// or shutting down.
var readAll = txn.objectStore('oplog').getAll();
readAll.onsuccess = function() {
  sendPendingChangesToNetwork(readAll.result).then(function() {
    var removalTxn = db.transaction('oplog', 'readwrite');
    var store = removalTxn.objectStore('oplog');
    readAll.result.forEach(entry => store.delete(entry.key)); // pseudocode here
  }).catch(function() {
    // couldn't send changes to network.
  });
};

The main idea here is that whoever makes the changes to, say, the 'book' table, adds the change operations to the 'oplog' object store as part of the original transaction. The sync worker can observe that store and send the entries to the network. If there was a crash, the oplog will still be around, and on startup you can read it in and send it all to the server. The server has to handle possible duplicate oplog changes, since the network could have failed or we could have shut down before deleting those records.

This also brings up the nice use case of being able to filter by operation type.

@dfahlander
Author

Yes, your code does exactly what the worker should do. The sync server needs to be idempotent, but that's the usual case.
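Server-side idempotency can be as simple as deduplicating on a client-supplied change id before applying. A sketch under that assumption (illustrative only; a real sync protocol would differ):

```javascript
// Sketch: idempotent application of (possibly re-sent) oplog entries.
// Each entry carries a unique id; entries seen before are skipped, so
// re-sending after a crash or network failure is harmless.
function applyChanges(entries, state) {
  for (const entry of entries) {
    if (state.appliedIds.has(entry.id)) continue; // duplicate: skip
    if (entry.type === 'delete') {
      state.data.delete(entry.key);
    } else {
      state.data.set(entry.key, entry.value); // add/put both upsert
    }
    state.appliedIds.add(entry.id);
  }
  return state;
}
```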

dmurph commented Jun 2, 2016

Ah, I'm now realizing that I suggested exactly what you described above. I understand what you're saying now. The features you mentioned (especially the transaction borrowing), I think, make the feature a bit too heavyweight and specialized, and also much more complex to implement. I'm trying to think of another way to do this without having the application keep an oplog table. @cmumford, do you have any ideas?

It sounds like the perfect combo here would be some sort of trigger support for data modification, which would run as part of the original transaction. Because this would involve a new (or even an existing) readwrite transaction relying on more JavaScript, I'm a little worried about the performance implications here. But it's not super crazy. Definitely a different feature, though.

Here are some other feature requests that came out of this:

  1. Let the changes be either in a flat list or have some sort of order number between object stores. (Issue Make changes a flat list, or add ordering numbers #28)
  2. Allow one to filter by operation type. (Issue Add operation type filter #29)

@dfahlander
Author

I think it's ok to have our own oplog table, because sync can be done in many different ways, and maybe it's best to let the lib decide the strategy instead of implementing too much in the browser.

Pinging @nolanlawson for a comment...

dmurph commented Jun 7, 2016

I think, at least for now, the system accomplishes what we need it to for the server-sync use case. Feel free to re-open if you have other ideas or issues.

@dmurph dmurph closed this as completed Jun 7, 2016

dmurph commented Jun 9, 2016

Also, thanks so much for your input :)
