Asynchronous replication #19

kevstigneev · 2015-05-20T17:07:07Z

The replication process should not affect the operation of the primary Nexus instance that receives the initial artifact upload from a user. If anything goes wrong with the peers, or replication takes too long, it should not impact the primary instance.

Expected results:

A replica instance responds to an artifact upload notification message immediately once it locates a repository to proxy the artifact. The actual artifact download to the proxy happens after that; its success is not reported to the primary instance.
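The expected behaviour could be sketched roughly as follows. This is only an illustration under assumptions: `ReplicaNotificationHandler`, `ProxyRepositories`, `hasProxyFor` and `download` are hypothetical names, not part of Nexus or of the plugin.

```java
import java.util.concurrent.Executor;

// Hypothetical replica-side handler; none of these names come from the plugin.
class ReplicaNotificationHandler {

    // Assumed abstraction over the replica's proxy repositories.
    interface ProxyRepositories {
        boolean hasProxyFor(String artifactPath); // is there a repository that can proxy it?
        void download(String artifactPath);       // blocking fetch into the proxy
    }

    private final ProxyRepositories repositories;
    private final Executor downloads;

    ReplicaNotificationHandler(ProxyRepositories repositories, Executor downloads) {
        this.repositories = repositories;
        this.downloads = downloads;
    }

    // Returns true ("accepted") as soon as a proxy repository is located.
    // The download is only scheduled here; it runs later on the executor.
    boolean onArtifactUploaded(String artifactPath) {
        if (!repositories.hasProxyFor(artifactPath)) {
            return false;
        }
        downloads.execute(() -> {
            try {
                repositories.download(artifactPath);
            } catch (RuntimeException e) {
                // Failure stays local to the replica; the primary is never told.
                System.err.println("Download failed for " + artifactPath + ": " + e);
            }
        });
        return true;
    }
}
```

The return value is what the notification endpoint would send back to the primary; the outcome of the download never reaches it.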

Note: Nexus instances are referred to as "primary" and "replica" with respect to a particular artifact upload.

kevstigneev · 2015-05-20T17:13:04Z

To be discussed: do we need such async replication at all? Or is it better to make the primary node responsible for the replication until it completes?

ctapobep · 2015-05-21T02:34:10Z

Additionally there is an option in between:

The Primary instance sends a notification, and the Replica answers right away.
The Primary instance polls the Replicas for the status of the replication.

This way the Primary node can ask for the status even if it was down for some time. But both the Primary and the Replicas would need to keep replication history for some time (a day?).
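To make the "keep replication history for some time" part concrete, here is a minimal sketch of what a Replica might keep in memory so that it can answer status polls. Everything in it is an assumption for illustration: the `ReplicationHistory` class, the three statuses, and keying by artifact path. A Primary that must survive restarts would need persistent storage instead.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory replication history with a retention period.
class ReplicationHistory {

    enum Status { IN_PROGRESS, SUCCEEDED, FAILED }

    private static final class Entry {
        final Status status;
        final Instant updatedAt;

        Entry(Status status, Instant updatedAt) {
            this.status = status;
            this.updatedAt = updatedAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Duration retention;

    ReplicationHistory(Duration retention) {
        this.retention = retention;
    }

    // Records (or overwrites) the latest known status of an artifact.
    void record(String artifactPath, Status status, Instant now) {
        entries.put(artifactPath, new Entry(status, now));
    }

    // Answers a status poll; empty if the artifact is unknown or its entry expired.
    Optional<Status> statusOf(String artifactPath, Instant now) {
        Entry entry = entries.get(artifactPath);
        if (entry == null || expired(entry, now)) {
            return Optional.empty();
        }
        return Optional.of(entry.status);
    }

    // Drops entries older than the retention period; meant to be called periodically.
    void evictExpired(Instant now) {
        entries.values().removeIf(entry -> expired(entry, now));
    }

    int size() {
        return entries.size();
    }

    private boolean expired(Entry entry, Instant now) {
        return entry.updatedAt.plus(retention).isBefore(now);
    }
}
```

With `new ReplicationHistory(Duration.ofDays(1))` this matches the "a day?" guess above; the right retention period is still an open question.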

Leaving requests synchronous is dangerous for the health of the Primary.

kevstigneev · 2015-05-21T14:03:58Z

@ctapobep I'd say there are 2 realistic options:

Trade resources for consistency - the replication is synchronous. This is the current implementation.
Trade consistency for resources - fire-and-forget notifications. The replica takes care of mirroring (as it does without the plugin); notifications just help it.

The intermediate option with polling is overcomplicated, both implementation-wise and operation-wise.

The thread pool for notification senders has a fixed size, so no resource exhaustion is expected.
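For illustration only, a fire-and-forget sender with a fixed number of threads could look like the sketch below. It is not the plugin's code: `FireAndForgetNotifier` and `Sender` are made-up names, and the bounded queue is an extra assumption. Note that `Executors.newFixedThreadPool` bounds the threads but not the queue, so the sketch uses `ThreadPoolExecutor` directly and drops notifications when the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical fire-and-forget notifier with bounded threads and a bounded queue.
class FireAndForgetNotifier {

    // Assumed abstraction over the actual HTTP call to a replica.
    interface Sender {
        void send(String replicaUrl, String artifactPath);
    }

    private final Sender sender;
    private final ThreadPoolExecutor pool;

    FireAndForgetNotifier(Sender sender, int threads, int queueCapacity) {
        this.sender = sender;
        // Fixed number of threads; the default AbortPolicy rejects work once
        // all threads are busy and the queue is full.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    // Never blocks the caller. Returns false if the notification was dropped.
    boolean notifyReplica(String replicaUrl, String artifactPath) {
        try {
            pool.execute(() -> {
                try {
                    sender.send(replicaUrl, artifactPath);
                } catch (RuntimeException e) {
                    // Fire and forget: log and move on, the replica mirrors on its own.
                    System.err.println("Notification to " + replicaUrl + " failed: " + e);
                }
            });
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Dropping a notification is acceptable in this model only because the replica still mirrors artifacts on its own; the notification is an optimisation, not the source of truth.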

ctapobep · 2015-05-21T14:33:18Z

But these threads themselves might be exhausted. And the normal situation (with large artifacts) will be hardly distinguishable from networking issues, e.g. if TCP packets are simply dropped. If we allow waiting for a long time (in the case of large artifacts), then we also allow long timeouts, which may saturate all the event-listening threads, and we'd need to restart the Primary.
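If the Replica acknowledges right away, the notification call no longer has to wait for the artifact download, so its timeouts can be short and a dropped connection shows up quickly. A rough sketch with the JDK's `java.net.http` client (Java 11+) follows. The timeout values, the plain-text body and the `ReplicaNotifications` class are assumptions for illustration, and the plugin may well use a different HTTP client.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper that builds short-timeout notification requests.
class ReplicaNotifications {

    // Assumed values; they only need to cover an immediate acknowledgement,
    // not the artifact download itself.
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(2);
    static final Duration RESPONSE_TIMEOUT = Duration.ofSeconds(5);

    // Client that gives up quickly when a replica is unreachable.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // Request that fails with a timeout if the replica does not answer in time.
    static HttpRequest newRequest(URI replicaEndpoint, String artifactPath) {
        return HttpRequest.newBuilder(replicaEndpoint)
                .timeout(RESPONSE_TIMEOUT)
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(artifactPath))
                .build();
    }
}
```

With bounds like these a sender thread is occupied for a few seconds at most per notification, regardless of artifact size.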

kevstigneev added the enhancement label May 20, 2015

kevstigneev mentioned this issue May 20, 2015

RESTful webservice to receive notifications of artifact uploads #3

Open
