
Asynchronous replication #19

Open
kevstigneev opened this issue May 20, 2015 · 4 comments

Comments

@kevstigneev
Contributor

The replication process should not affect the operation of the primary Nexus instance that receives the initial artifact upload from a user. If anything goes wrong with the peers, or replication takes too long, it should not impact the primary instance.

Expected results:

  • A replica instance responds to an artifact upload notification message immediately once it locates a repository to proxy the artifact. The actual artifact download to the proxy happens after that, and its success is not reported to the primary instance.

Note: Nexus instances are referred to as "primary" and "replica" relative to a particular artifact upload.
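A minimal sketch of the expected replica behaviour, not the plugin's actual code: the notification handler only checks whether some proxy repository matches, queues the download, and returns. The names `Replica`, `on_upload_notification`, the `proxy_repos` mapping and the `download` callable are all hypothetical stand-ins.

```python
import queue
import threading


class Replica:
    """Hypothetical replica: acknowledges a notification now, downloads later."""

    def __init__(self, proxy_repos, download):
        # Assumed shape: repository name -> list of artifact path prefixes it proxies.
        self.proxy_repos = proxy_repos
        # Stand-in for the real proxy fetch: callable(repo, artifact_path).
        self.download = download
        self.pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def on_upload_notification(self, artifact_path):
        """Return at once: True if a proxy repository was located, else False."""
        repo = self._locate_repo(artifact_path)
        if repo is None:
            return False
        self.pending.put((repo, artifact_path))  # the download itself runs later
        return True

    def _locate_repo(self, artifact_path):
        for repo, prefixes in self.proxy_repos.items():
            if any(artifact_path.startswith(p) for p in prefixes):
                return repo
        return None

    def _drain(self):
        while True:
            repo, path = self.pending.get()
            try:
                self.download(repo, path)
            except Exception:
                pass  # the outcome is deliberately not reported to the primary
            finally:
                self.pending.task_done()
```

The response to the primary depends only on repository lookup, so a slow or failing download cannot delay it.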

@kevstigneev
Contributor Author

To be discussed: do we need such async replication at all? Or is it better to make the primary node responsible for the replication until it completes?

@ctapobep

Additionally there is an option in between:

  • Primary instance sends a notification, Replica answers right away
  • Primary instance polls Replicas for the status of the replication.

This way the Primary node can ask for the status even if it was down for some time. But both the Primary & Replicas would need to keep replication history for some time (a day?).
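A rough sketch of the history a Replica would have to keep so that the Primary can poll it later. The class name `ReplicationHistory`, the status strings, and the one-day default retention are assumptions for illustration only; the injectable `clock` just makes expiry easy to exercise.

```python
import time


class ReplicationHistory:
    """Hypothetical in-memory replication history with a retention window."""

    def __init__(self, retention_seconds=24 * 60 * 60, clock=time.time):
        self.retention = retention_seconds
        self.clock = clock
        self._entries = {}  # artifact path -> (status, time recorded)

    def record(self, artifact_path, status):
        self._entries[artifact_path] = (status, self.clock())

    def status(self, artifact_path):
        """Answer a poll from the Primary; expired or unseen paths are UNKNOWN."""
        self._purge()
        entry = self._entries.get(artifact_path)
        return entry[0] if entry else "UNKNOWN"

    def _purge(self):
        cutoff = self.clock() - self.retention
        expired = [p for p, (_, t) in self._entries.items() if t < cutoff]
        for p in expired:
            del self._entries[p]
```

An in-memory dict would not survive a Replica restart, so a real implementation would presumably need to persist this state, which is part of the extra cost of the polling option.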

Leaving requests synchronous is dangerous for the health of the Primary.

@kevstigneev
Contributor Author

@ctapobep I'd say there are 2 realistic options:

  1. Trade resources for consistency - the replication is synchronous. It's the current implementation.
  2. Trade consistency for resources - fire-and-forget notifications. The replica takes care of mirroring (as it does without the plugin); notifications just help it.

The intermediate option with polling is overcomplicated, both implementation-wise and operation-wise.

The thread pool for notification senders is of fixed size, so no resource exhaustion is expected.
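To make option 2 concrete, here is a minimal fire-and-forget sketch over a fixed-size pool. It is an illustration, not the plugin's sender: `NotificationSender`, `notify_all`, the `send` callable and the pool size of 4 are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class NotificationSender:
    """Hypothetical fire-and-forget sender backed by a fixed-size thread pool."""

    def __init__(self, peers, send, pool_size=4):
        self.peers = peers
        # Stand-in for the real network call: callable(peer, artifact_path).
        self.send = send
        self.pool = ThreadPoolExecutor(max_workers=pool_size)

    def notify_all(self, artifact_path):
        """Queue one notification per peer and return without waiting."""
        for peer in self.peers:
            self.pool.submit(self._safe_send, peer, artifact_path)

    def _safe_send(self, peer, artifact_path):
        try:
            self.send(peer, artifact_path)
        except Exception:
            pass  # forget: a failed peer must not affect the upload path

    def shutdown(self):
        self.pool.shutdown(wait=True)
```

The number of threads is bounded, but in this sketch the executor's internal work queue is not, so a backlog of pending notifications can still grow while peers are slow.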

@ctapobep

But these threads themselves might be exhausted. And the normal situation (with large artifacts) will be hardly distinguishable from networking issues, e.g. when TCP packets are simply dropped. If we allow long waits (for large artifacts), then we also allow long timeouts, which may saturate all the event-listening threads, and we'd need to restart the Primary.
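The saturation concern can be reproduced in a few lines. This is only a toy model: the function name `fresh_send_can_start` is made up, and a blocked `Event.wait` stands in for a synchronous send that hangs on a silent peer.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fresh_send_can_start(pool_size, stalled_sends, wait_seconds=0.2):
    """Report whether a new task gets a worker while stalled sends hold the pool."""
    release = threading.Event()
    started = threading.Event()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(stalled_sends):
            pool.submit(release.wait)   # models a send stuck waiting on a peer
        pool.submit(started.set)        # models the next notification
        could_start = started.wait(timeout=wait_seconds)
        release.set()                   # unblock the workers so the pool can exit
    return could_start
```

With `pool_size=2`, two stalled sends are enough to keep the next notification from starting, while one stalled send still leaves a free worker. A per-request timeout bounds how long a worker is held, but a timeout generous enough for large artifacts keeps workers occupied for just as long.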
