-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elected summarizer logs #6106
Elected summarizer logs #6106
Conversation
api-report/driver-utils.api.md
Outdated
@@ -112,7 +112,7 @@ export function combineAppAndProtocolSummary(appSummary: ISummaryTree, protocolS | |||
export function configurableUrlResolver(resolversList: IUrlResolver[], request: IRequest): Promise<IResolvedUrl | undefined>; | |||
|
|||
// @public (undocumented) | |||
export function createGenericNetworkError(errorMessage: string, canRetry: boolean, retryAfterSeconds?: number, statusCode?: number): GenericNetworkError | ThrottlingError; | |||
export function createGenericNetworkError(errorMessage: string, canRetry: boolean, retryAfterSeconds?: number, statusCode?: number): ThrottlingError | GenericNetworkError; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerbutler any idea why this keeps flip flopping? seen in it my changes as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I've noticed it as well. No idea why yet. I'll open an issue. #6158
// this.orderedClients.incrementCurrentClient(); | ||
} | ||
}); | ||
this.summaryCollection.on(MessageType.SummaryAck, (op) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to track acks, or just ops? I've gone back and forth a bit on this. if one client's ops fail to ack, do we think another client's will?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acks is more important, but both can be valuable: each summarizer client tracks both for itself- attempt vs success. And no, it's not true that summary attempts that are nacked by one client imply that they will definitely be nacked by another client, especially if the internal state tracking of that summarizer client is busted. Though, in most cases simply restarting the client would be sufficient.
return; | ||
} | ||
|
||
// Normal case is adding the latest client, which will bypass loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not normal being the initiall population of the list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically I guess, I'm not really sure what case allows them to come in unordered, since they are serialized as an array and already sorted naturally. I did not want to assume that serialization/deserialization rule just yet though.
} | ||
|
||
if (this.currentClient === undefined && newClient.nextClient === undefined) { | ||
this.currentClient = newClient; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't know what current is yet, but here it looks the be the first last client. which is strange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revamped it a little, so instead of using overly abstract terms like "current client", "last client", and "non-summarizer clients", it is now "elected client", "youngest client", and "eligible clients". So basically this thing tracks ordered clients for purposes of election only. Eligible is the abstraction over interactive, but exists in case we wanted that to mean something else.
|
||
public getCurrentClient(): ITrackedClient | undefined { | ||
return this.currentClient; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be made simpler if this always just returne root.next, and increment was changed to remove root.next. what's the use of keep the invalid clients around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the reset functionality, which we will likely need when all clients have been marked as bad summarizers we need to start over and probably just blow up or delay cycling through them again or something.
■ @fluidframework/base-host: No change
⯅ @fluid-example/bundle-size-tests: +2.13 KB
Baseline commit: ecbf0d8 |
this.summarizerClientId = newSummarizerClientId; | ||
const newSummarizerClientId = this.orderedClients.getElectedClient()?.clientId; | ||
if (newSummarizerClientId !== this.electedClientId) { | ||
this.electedClientId = newSummarizerClientId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no, I think we should keep it for now. Agree with the sentiment of reducing state, but right now it does prevent excess "summarizer" event emitting. We call this function every time any change happens.
|
||
/** Returns the currently elected client. */ | ||
public getElectedClient(): ITrackedClient | undefined { | ||
return this.electedClient; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this ever return root, seems like it can now. i think in that case it should return undefined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't and cannot right now. It only ever gets set to youngerClient
or newClient
. And no client has the root as its youngerClient
.
- Add client will set it to the added client in some cases (new)
- Remove client will set it to the younger client (next)
- Increment will set it to the younger client (next)
- Reset will set it to the root's younger client (next)
}; | ||
|
||
private readonly removeClient = (clientId: string) => { | ||
const removeClient = this.clientMap.get(clientId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just a normal function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did it because in the constructor I attach addClient
and removeClient
as event handlers, and I want to make sure I encapsulate the this
correctly without having to wrap them in lambdas. Equivalent:
class A {
public readonly x: string = "x";
constructor(caller: (f: () => void) => void) {
caller(this.asFunction);
caller(this.asProperty);
}
public asFunction(): void {
console.log(`${this?.x ?? 'u'} as function`);
}
public readonly asProperty = () => {
console.log(`${this?.x ?? 'u'} as function`);
}
}
new A((f) => f());
Prints "u as function" and "x as property" showing that this
is correctly captured since it's a property on each instance vs. a function on the prototype or something something JavaScript something something...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #5617 by adding telemetry for when a single elected client fails to summarize for long enough. This counter resets every time the elected summarizer changes, so it isn't tracking per document as a whole like the one in ContainerRuntime is (those events are "SummaryStatus:Behind" and "SummaryStatus:CaughtUp").
This required different ordered client tracking. Used a specialized two-way linked list to track ordered clients for quick calculation of elected client- including increment and reset, plus a map for quick removal. The adding should also be constant time unless somehow they come out of order. This replaces the QuorumHeap previously used. Added OrderedClients in a new file: summaryManagerUtils.ts with tests around it. Also put the Throttler here.
Also change SummaryCollection to use events instead of op actions.