Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elected summarizer logs #6106

Merged
merged 16 commits into from
May 27, 2021
Merged

Elected summarizer logs #6106

merged 16 commits into from
May 27, 2021

Conversation

arinwt
Copy link
Contributor

@arinwt arinwt commented May 11, 2021

Fixes #5617 by adding telemetry for when a single elected client fails to summarize for long enough. This counter resets every time the elected summarizer changes, so it isn't tracking per document as a whole like the one in ContainerRuntime is (those events are "SummaryStatus:Behind" and "SummaryStatus:CaughtUp").

This required different ordered client tracking. Used a specialized two-way linked list to track ordered clients for quick calculation of elected client- including increment and reset, plus a map for quick removal. The adding should also be constant time unless somehow they come out of order. This replaces the QuorumHeap previously used. Added OrderedClients in a new file: summaryManagerUtils.ts with tests around it. Also put the Throttler here.

Also change SummaryCollection to use events instead of op actions.

@arinwt arinwt marked this pull request as ready for review May 12, 2021 17:21
@@ -112,7 +112,7 @@ export function combineAppAndProtocolSummary(appSummary: ISummaryTree, protocolS
export function configurableUrlResolver(resolversList: IUrlResolver[], request: IRequest): Promise<IResolvedUrl | undefined>;

// @public (undocumented)
export function createGenericNetworkError(errorMessage: string, canRetry: boolean, retryAfterSeconds?: number, statusCode?: number): GenericNetworkError | ThrottlingError;
export function createGenericNetworkError(errorMessage: string, canRetry: boolean, retryAfterSeconds?: number, statusCode?: number): ThrottlingError | GenericNetworkError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerbutler any idea why this keeps flip flopping? seen in it my changes as well

Copy link
Member

@tylerbutler tylerbutler May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've noticed it as well. No idea why yet. I'll open an issue. #6158

// this.orderedClients.incrementCurrentClient();
}
});
this.summaryCollection.on(MessageType.SummaryAck, (op) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to track acks, or just ops? I've gone back and forth a bit on this. if one client's ops fail to ack, do we think another client's will?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acks is more important, but both can be valuable: each summarizer client tracks both for itself- attempt vs success. And no, it's not true that summary attempts that are nacked by one client imply that they will definitely be nacked by another client, especially if the internal state tracking of that summarizer client is busted. Though, in most cases simply restarting the client would be sufficient.

return;
}

// Normal case is adding the latest client, which will bypass loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not normal being the initiall population of the list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically I guess, I'm not really sure what case allows them to come in unordered, since they are serialized as an array and already sorted naturally. I did not want to assume that serialization/deserialization rule just yet though.

}

if (this.currentClient === undefined && newClient.nextClient === undefined) {
this.currentClient = newClient;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know what current is yet, but here it looks the be the first last client. which is strange

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revamped it a little, so instead of using overly abstract terms like "current client", "last client", and "non-summarizer clients", it is now "elected client", "youngest client", and "eligible clients". So basically this thing tracks ordered clients for purposes of election only. Eligible is the abstraction over interactive, but exists in case we wanted that to mean something else.


public getCurrentClient(): ITrackedClient | undefined {
return this.currentClient;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be made simpler if this always just returne root.next, and increment was changed to remove root.next. what's the use of keep the invalid clients around

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the reset functionality, which we will likely need when all clients have been marked as bad summarizers we need to start over and probably just blow up or delay cycling through them again or something.

@msfluid-bot
Copy link
Collaborator

msfluid-bot commented May 20, 2021

@fluidframework/base-host: No change
Metric NameBaseline SizeCompare SizeSize Diff
main.js 177.17 KB 177.17 KB No change
Total Size 177.17 KB 177.17 KB No change
@fluid-example/bundle-size-tests: +2.13 KB
Metric NameBaseline SizeCompare SizeSize Diff
container.js 205.99 KB 208.12 KB +2.13 KB
map.js 50.1 KB 50.1 KB No change
matrix.js 146.09 KB 146.09 KB No change
odspDriver.js 196.56 KB 196.56 KB No change
odspPrefetchSnapshot.js 27.39 KB 27.39 KB No change
sharedString.js 162.44 KB 162.44 KB No change
Total Size 788.58 KB 790.7 KB +2.13 KB

Baseline commit: ecbf0d8

Generated by 🚫 dangerJS against e829909

this.summarizerClientId = newSummarizerClientId;
const newSummarizerClientId = this.orderedClients.getElectedClient()?.clientId;
if (newSummarizerClientId !== this.electedClientId) {
this.electedClientId = newSummarizerClientId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.electedClientId

can we get rid of the member this.electedClientId, and just make it a getter that returns this.orderedClients.getElectedClient()?.clientId. I believe this method will only be called when the summarizer has changed. not needed. i just like to reduce state when we can

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no, I think we should keep it for now. Agree with the sentiment of reducing state, but right now it does prevent excess "summarizer" event emitting. We call this function every time any change happens.


/** Returns the currently elected client. */
public getElectedClient(): ITrackedClient | undefined {
return this.electedClient;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this ever return root, seems like it can now. i think in that case it should return undefined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't and cannot right now. It only ever gets set to youngerClient or newClient. And no client has the root as its youngerClient.

  1. Add client will set it to the added client in some cases (new)
  2. Remove client will set it to the younger client (next)
  3. Increment will set it to the younger client (next)
  4. Reset will set it to the root's younger client (next)

};

private readonly removeClient = (clientId: string) => {
const removeClient = this.clientMap.get(clientId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just a normal function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did it because in the constructor I attach addClient and removeClient as event handlers, and I want to make sure I encapsulate the this correctly without having to wrap them in lambdas. Equivalent:

class A {
    public readonly x: string = "x";

    constructor(caller: (f: () => void) => void) {
        caller(this.asFunction);
        caller(this.asProperty);
    }

    public asFunction(): void {
        console.log(`${this?.x ?? 'u'} as function`);
    }

    public readonly asProperty = () => {
        console.log(`${this?.x ?? 'u'} as function`);
    }
}

new A((f) => f());

Prints "u as function" and "x as property" showing that this is correctly captured since it's a property on each instance vs. a function on the prototype or something something JavaScript something something...

Copy link
Contributor

@anthony-murphy anthony-murphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

Copy link
Contributor

@anthony-murphy anthony-murphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@github-actions github-actions bot added area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc labels May 27, 2021
@github-actions github-actions bot added the breaking change This PR or issue would introduce a breaking change label May 27, 2021
@github-actions github-actions bot removed the breaking change This PR or issue would introduce a breaking change label May 27, 2021
@arinwt arinwt merged commit 6cbec3c into microsoft:main May 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Number of unsummarized ops spikes every once in a while
4 participants