1279 gossiplb inform and decide have bugs - release branch #1349

nlslatt (Collaborator) commented Mar 26, 2021

This PR is the release-branch equivalent of #1348. Please post review comments on #1348.

Includes many bug fixes and improvements to GossipLB. The following LB args were added:

  • trials (default 3): number of times to repeat the requested number of iterations in search of a better imbalance; helps when it is easy to get stuck in a local minimum
  • deterministic (default false): for debugging purposes, make the migration decision deterministic, assuming deterministic loads (for testing on deterministic loads, consider using the driver in development under "Implement general test driver for LB testing" #1265)
  • inform (default SyncInform): choice of gossiping approach
    • SyncInform (0): synchronous; propagates information only after all receives for a round have completed, at the cost of a synchronization (matches the LBAF approach)
    • AsyncInform (1): asynchronous; propagates after the first receive for a round, avoiding the synchronization cost
  • ordering (default Marginal): order in which to evaluate local objects for migration
    • Arbitrary (0): use the unordered_map iteration order
    • ElmID (1): order by ascending element ID
    • Marginal (2): order by descending load, starting with the object of marginal load, then by ascending load for the objects larger than it
  • cmf (default NormByMax): the algorithm used for computing the CMF
    • Original (0): remove processors from the CMF as soon as they exceed the target (e.g., processor-average) load; use a CMF factor of 1.0/x, where x is the target load
    • NormByMax (1): do not remove processors from the CMF that exceed the target load until the next iteration; use a CMF factor of 1.0/x, where x is the maximum of the target load and the most loaded processor in the CMF
    • NormBySelf (2): do not remove processors from the CMF that exceed the target load until the next iteration; use a CMF factor of 1.0/x, where x is the load of the processor that is computing the CMF
  • rollback (default true): whether to roll back to an earlier iteration if it had the best imbalance
  • targetpole (default false): whether to replace the processor-average load with the maximum of that load and the largest object load, effectively redefining overloaded/underloaded in terms of the longest-pole load when it exceeds the processor-average load
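
Of the `ordering` options, `Marginal` is the least self-explanatory. The C++ sketch below illustrates one reading of it and is not GossipLB's code. Beyond what this PR states, it assumes that the object of marginal load is the smallest object whose load exceeds the amount by which this processor is over the target load; objects at or below that load are visited in descending order, and the larger ones follow in ascending order. `ObjID` and `orderMarginal` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an element ID.
using ObjID = std::uint64_t;

// Sketch of the Marginal ordering under the assumptions stated above.
// `objs` maps each local object to its load; `my_load` is this processor's
// current load (assumed to be the sum of the object loads).
std::vector<ObjID> orderMarginal(
  std::unordered_map<ObjID, double> const& objs,
  double my_load, double target_load
) {
  assert(target_load >= 0.0);

  // Amount by which this processor exceeds the target load.
  double const over = my_load - target_load;

  // Marginal load: the smallest object load strictly greater than `over`.
  // If no object qualifies, keep `my_load`, so every object falls in the
  // descending section below.
  double marginal = my_load;
  for (auto const& kv : objs) {
    if (kv.second > over && kv.second < marginal) {
      marginal = kv.second;
    }
  }

  std::vector<ObjID> order;
  order.reserve(objs.size());
  for (auto const& kv : objs) {
    order.push_back(kv.first);
  }

  std::sort(order.begin(), order.end(), [&objs, marginal](ObjID a, ObjID b) {
    double const la = objs.at(a);
    double const lb = objs.at(b);
    bool const a_low = la <= marginal;
    bool const b_low = lb <= marginal;
    if (a_low != b_low) {
      return a_low;  // objects at or below the marginal load come first
    }
    if (a_low) {
      return la > lb;  // descending, beginning with the marginal object
    }
    return la < lb;  // ascending for objects larger than the marginal one
  });
  return order;
}
```

For objects with loads 1 through 5 on a processor at load 15 with target 12, the processor is 3 over, the marginal object is the one with load 4, and the sketch yields the order 4, 3, 2, 1, 5.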
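
The `cmf` options differ mainly in the normalization value x. The following C++ sketch shows how such a CMF could be built; it is an illustration, not GossipLB's implementation. Beyond the 1.0/x factors listed above, it assumes that each candidate processor contributes `1 - load / x` to a running sum that is then normalized to end at 1.0, and that the caller has already removed candidates that are ineligible under `Original`. `CMFType`, `buildCMF`, and `effectiveTarget` (modeling `targetpole`) are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical mirror of the `cmf` option values.
enum class CMFType { Original = 0, NormByMax = 1, NormBySelf = 2 };

// targetpole: raise the target to the largest single object load when that
// exceeds the processor-average load.
double effectiveTarget(double avg_load, double max_obj_load, bool targetpole) {
  return targetpole ? std::max(avg_load, max_obj_load) : avg_load;
}

// Builds a CMF over `candidate_loads` (loads of the processors this one
// knows about). Assumes the caller already dropped ineligible candidates,
// e.g. those above the target under Original.
std::vector<double> buildCMF(
  CMFType type, std::vector<double> const& candidate_loads,
  double target, double my_load
) {
  assert(!candidate_loads.empty());

  // Pick x; the factor is 1.0/x in every variant.
  double x = target;  // Original
  if (type == CMFType::NormByMax) {
    double const max_load =
      *std::max_element(candidate_loads.begin(), candidate_loads.end());
    x = std::max(target, max_load);
  } else if (type == CMFType::NormBySelf) {
    x = my_load;
  }
  assert(x > 0.0);
  double const factor = 1.0 / x;

  // Running sum of 1 - factor * load: lighter candidates get more weight.
  std::vector<double> cmf;
  cmf.reserve(candidate_loads.size());
  double sum = 0.0;
  for (double load : candidate_loads) {
    sum += 1.0 - factor * load;
    cmf.push_back(sum);
  }

  // Normalize so the last entry is 1.0.
  assert(sum > 0.0);
  for (double& c : cmf) {
    c /= sum;
  }
  return cmf;
}
```

A uniform random draw in [0, 1) could then select the first candidate whose CMF entry exceeds it, so lighter candidates are picked more often. With `NormByMax`, x is at least every candidate's load, so no increment is negative; with `NormBySelf`, a candidate more loaded than this processor would contribute a negative increment.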
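
The `trials` and `rollback` options together decide which result is kept. The C++ sketch below is a hypothetical model of that interaction, not the GossipLB code path. It assumes imbalance is measured as max load / average load - 1, that every trial restarts from the initial distribution, and that with `rollback` disabled only the final iteration of each trial is eligible, while with `rollback` enabled any iteration can be kept. One balancing iteration is abstracted as a caller-supplied `IterationFn`; `Loads`, `imbalance`, and `runTrials` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical: one load value per processor.
using Loads = std::vector<double>;

// Assumed imbalance metric: max load / average load - 1.
double imbalance(Loads const& loads) {
  assert(!loads.empty());
  double const total = std::accumulate(loads.begin(), loads.end(), 0.0);
  double const avg = total / static_cast<double>(loads.size());
  assert(avg > 0.0);
  double const max_load = *std::max_element(loads.begin(), loads.end());
  return max_load / avg - 1.0;
}

// Hypothetical stand-in for one balancing iteration: maps the current
// distribution to a new one; `trial` lets it vary its random choices.
using IterationFn = std::function<Loads(Loads const&, int trial)>;

// Runs `trials` independent trials of `iters` iterations each and returns
// the eligible distribution with the lowest imbalance.
Loads runTrials(Loads const& initial, IterationFn const& iterate,
                int trials, int iters, bool rollback) {
  assert(trials > 0 && iters > 0);
  Loads best;
  double best_imb = std::numeric_limits<double>::infinity();
  for (int t = 0; t < trials; ++t) {
    Loads cur = initial;  // each trial restarts from the initial distribution
    for (int i = 0; i < iters; ++i) {
      cur = iterate(cur, t);
      // With rollback, any iteration may be kept; otherwise only a trial's last.
      bool const eligible = rollback || i == iters - 1;
      if (eligible) {
        double const imb = imbalance(cur);
        if (imb < best_imb) {
          best_imb = imb;
          best = cur;
        }
      }
    }
  }
  return best;
}
```

With one trial of three iterations whose imbalances are 0.5, 0.0, and 0.25, the sketch returns the second distribution when `rollback` is true and the third when it is false.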

Closes #1279

lifflander self-requested a review April 5, 2021 19:05

lifflander (Collaborator) left a comment

Looks good for release.

nlslatt marked this pull request as ready for review April 5, 2021 19:10
lifflander merged commit 4cc8812 into 1.0.0-beta.10.4.1-proposed-update Apr 5, 2021