Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting] event log is not "disabled" when initialization fails #68309

Closed
pmuellr opened this issue Jun 4, 2020 · 3 comments · Fixed by #71339
Closed

[Alerting] event log is not "disabled" when initialization fails #68309

pmuellr opened this issue Jun 4, 2020 · 3 comments · Fixed by #71339
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@pmuellr
Copy link
Member

pmuellr commented Jun 4, 2020

When the event log ES initialization fails, the event log itself should be in a "disabled" state where it won't write event documents to the index. But it appears we've only done some of the work on this.

The code below is where the documents are indexed:

async function indexLogEventDoc(esContext: EsContext, doc: unknown) {
esContext.logger.debug(`writing to event log: ${JSON.stringify(doc)}`);
await esContext.waitTillReady();
await esContext.esAdapter.indexDocument(doc);
esContext.logger.debug(`writing to event log complete`);
}

The waitTillReady() call actually returns a boolean indicating whether initialized completed successfully, but we don't check it. We should check the result and not write when it returns false.

The value returned by waitTillReady() is set here:

initialize() {
// only run the initialization method once
if (this.initialized) return;
this.initialized = true;
this.logger.debug('initializing EsContext');
setImmediate(async () => {
try {
await this._initialize();
this.logger.debug('readySignal.signal(true)');
this.readySignal.signal(true);
} catch (err) {
this.logger.debug('readySignal.signal(false)');
this.readySignal.signal(false);
}
});
}
async waitTillReady(): Promise<boolean> {
return await this.readySignal.wait();
}
private async _initialize() {
await initializeEs(this);
}
}

It's only set to false when _initialize() throws an error, but the initializeEs() call in _initialize() (scroll down to the bottom of this snippet) can also return false when an error is detected.

Not immediately clear if we should even bother with these booleans set on the inner calls, when we could just throw an error instead of returning the boolean. Probably worth looking at them a little closer to see if that would be a simplification.

@pmuellr pmuellr added bug Fixes for quality problems that affect the customer experience Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jun 4, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr
Copy link
Member Author

pmuellr commented Jun 4, 2020

I think there's a separate question of whether the event log search should also be "disabled" somehow when the the es initialization fails.

I think the answer is no. Some init failures may end up leaving a working event log index available, and in those cases, it would be nice to have access to whatever is in that log - never know, it may yield a clue as to why the init failed (but seems unlikely).

@mikecote
Copy link
Contributor

Maybe a UI indicator as well

@pmuellr pmuellr self-assigned this Jul 9, 2020
pmuellr added a commit to pmuellr/kibana that referenced this issue Jul 13, 2020
resolves elastic#68309

Previously, if the initialization of the elasticsearch resources failed
during initialization, the event logger would still try to write events.
Which is somewhat of a catastrophic failure, as typically the logger would
try writing to the alias name, but no alias exists, so a new index would
be created with the name of the alias.  Making it impossible to initialize
successfully later until that index was deleted.

The core initialization calls already returned success indicators, so this
PR just responds to those and prevents the logger from writing to the index
if intialization failed.
pmuellr added a commit that referenced this issue Jul 14, 2020
resolves #68309

Previously, if the initialization of the elasticsearch resources failed
during initialization, the event logger would still try to write events.
Which is somewhat of a catastrophic failure, as typically the logger would
try writing to the alias name, but no alias exists, so a new index would
be created with the name of the alias.  Making it impossible to initialize
successfully later until that index was deleted.

The core initialization calls already returned success indicators, so this
PR just responds to those and prevents the logger from writing to the index
if initialization failed.
pmuellr added a commit to pmuellr/kibana that referenced this issue Jul 14, 2020
resolves elastic#68309

Previously, if the initialization of the elasticsearch resources failed
during initialization, the event logger would still try to write events.
Which is somewhat of a catastrophic failure, as typically the logger would
try writing to the alias name, but no alias exists, so a new index would
be created with the name of the alias.  Making it impossible to initialize
successfully later until that index was deleted.

The core initialization calls already returned success indicators, so this
PR just responds to those and prevents the logger from writing to the index
if initialization failed.

# Conflicts:
#	x-pack/plugins/event_log/server/es/context.test.ts
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants