Migrate status service and status page to New Platform #41983
Pinging @elastic/kibana-platform
Hi,
I created the … What are the thoughts about having …? BTW, the stats abbreviation was chosen to be similar to the Elasticsearch endpoints, such as …
I think we should keep these APIs separate and just have the status page UI consume both the status API and the stats API. Migrating the stats API is part of #46563
Here are my initial thoughts on how plugin & core service statuses should work. This is not intended to replace an RFC on the concept, but to organize and document the original thinking before investigating all use cases. Please poke holes in this 😄

**High-level design concepts**

**More expressive status levels**

Right now we have the legacy statuses (e.g. 'red' or 'green'). We could benefit from having status levels with explicit meaning and associated behaviors:

```ts
enum ServiceStatusLevel {
available, // everything is working!
degraded, // some features may not be working
unavailable, // the service/plugin is unavailable, but other functions should work
fatal // block all user functions and show status page
// (reserved for core services?)
}
```

**Statuses reflect dependencies between plugins**

In legacy, it's common for plugins to use the "mirror plugin status" concept to inherit their status from another plugin (most commonly, the Elasticsearch plugin). It seems beneficial for this concept to be baked into the design of the new service:
**Kibana should always try to keep as much functionality working as possible**

In the legacy system, if any plugin changes to "red", pretty much all of Kibana's UI becomes blocked by the status page. This prevents the user from using some of the built-in management and debug tools to diagnose and correct the problem. For instance, if Machine Learning is in an error state, that alone should not block the user from the rest of Kibana. This is the purpose of the distinction between the 'unavailable' and 'fatal' levels.
**Plugin status does not alter which plugins are enabled**

Anything that cannot be recovered from without restarting Kibana completely should be throwing exceptions during 'setup'. Therefore, we should not disable plugins because they are currently unavailable.

**Core services & plugins should use the same status mechanism**

Pretty self explanatory. There should be a single concept that backs the status of different components in the system, and they should easily interop with one another.

**API Design**

```ts
enum ServiceStatusLevel {
available, // everything is working!
degraded, // some features may not be working
unavailable, // the service/plugin is unavailable, but other functions should work
fatal // block all user functions and show status page
// (reserved for core services?)
}
interface ServiceStatus {
level: ServiceStatusLevel;
summary?: string;
detail?: string;
meta?: object;
}

interface StatusSetup {
// Allows a plugin to specify a custom status dependent on its own criteria.
// See calculation section below on how this is combined with dependency statuses
setStatus(status$: Observable<ServiceStatus>): Observable<ServiceStatus>;
// Exposes plugin status for dependencies of current plugin.
// Type could be inferred by the generic type arg provided by plugins.
getPluginStatuses$(): Observable<Record<string, ServiceStatus>>;
// Statuses for all of core's services. Can be used with `inheritStatus` utility
// for expressing dependent statuses on core services.
core$: Observable<{
http: ServiceStatus;
elasticsearch: ServiceStatus;
savedObjects: ServiceStatus;
uiSettings: ServiceStatus;
}>;
}
```

**Status calculation**

```ts
// Utility for merging several statuses together and producing a single status with the
// most severe status up to the maxLevel.
const inheritStatus:
(
statuses$: Array<Observable<ServiceStatus>>,
maxLevel?: ServiceStatusLevel
) => Observable<ServiceStatus>;

// Pseudo-code calculation of a plugin's status
function calculatePluginStatus(requiredDeps: string[], optionalDeps: string[], pluginCustomStatus$?: Observable<ServiceStatus>) {
const requiredDepStatus$ = inheritStatus(
requiredDeps.map(dep => getStatusForPlugin(dep))
);
const optionalDepStatus$ = inheritStatus(
optionalDeps.map(dep => getStatusForPlugin(dep)),
ServiceStatusLevel.degraded // Optional dependencies are capped to 'degraded'
);
return inheritStatus([
requiredDepStatus$,
optionalDepStatus$,
// `pluginCustomStatus$` is only set if plugin called `status.setStatus()`
...(pluginCustomStatus$ ? [pluginCustomStatus$] : [])
]);
}
```
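For illustration only, here is a rough sketch of how an `inheritStatus` utility like the one declared above could be implemented with RxJS. The severity ordering, the use of `combineLatest`, and the exact shapes here are assumptions, not part of the proposal:

```ts
import { combineLatest, Observable } from 'rxjs';
import { map } from 'rxjs/operators';

// Redeclared from the proposal above so this sketch stands alone.
enum ServiceStatusLevel { available, degraded, unavailable, fatal }
interface ServiceStatus { level: ServiceStatusLevel; summary?: string; detail?: string; meta?: object; }

// Assumption: the enum's declaration order doubles as its severity order.
const severity = (level: ServiceStatusLevel): number => level;

const inheritStatus = (
  statuses$: Array<Observable<ServiceStatus>>,
  maxLevel: ServiceStatusLevel = ServiceStatusLevel.fatal
): Observable<ServiceStatus> =>
  combineLatest(statuses$).pipe(
    map((statuses) => {
      // Pick the most severe of the source statuses...
      const worst = statuses.reduce((a, b) => (severity(b.level) > severity(a.level) ? b : a));
      // ...but never report anything more severe than maxLevel
      // (e.g. optional dependencies capped at 'degraded').
      return severity(worst.level) > severity(maxLevel) ? { ...worst, level: maxLevel } : worst;
    })
  );
```

With a shape like this, the `calculatePluginStatus` pseudo-code above could be used as written.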
**Open questions**
I really like the semantic statuses and the status inheritance.
I think the default of degraded if dependency >= degraded makes sense, but there are probably many plugins which would be unavailable if one of their dependencies were unavailable. E.g. I don't think dashboard could do anything useful without the data plugin, so it might want to set its status to "unavailable" if the data plugin is unavailable.
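For example, under the proposed API a plugin like dashboard could express exactly that by registering a custom status derived from the data plugin's status. This is a hedged sketch: `StatusSetup`, `getPluginStatuses$`, and `setStatus` come from the proposal above, while the plugin id `data` and the setup function name are assumptions:

```ts
import { map } from 'rxjs/operators';

// Sketch only; StatusSetup and ServiceStatusLevel refer to the interfaces proposed above.
export function setupDashboardStatus(status: StatusSetup) {
  const dashboardStatus$ = status.getPluginStatuses$().pipe(
    map((plugins) =>
      plugins.data?.level === ServiceStatusLevel.unavailable
        ? { level: ServiceStatusLevel.unavailable, summary: 'The data plugin is unavailable' }
        : { level: ServiceStatusLevel.available }
    )
  );
  status.setStatus(dashboardStatus$);
}
```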
I like this and I think it's the "correct" way. Exceptions are reserved for exceptional circumstances that we didn't anticipate happening and have no idea how to handle, so the only valid response is to crash and see if it might work the second time Kibana starts up. The only risk, which I'm not sure how to mitigate, is that crashing Kibana becomes the default error handling. Many plugins don't have error handling for network exceptions or ES exceptions (usually rate-limiting or response data too large).

I also think we should flesh out how plugin APIs work when a plugin is degraded. One way would be for API methods to throw an exception when that method is degraded. So if APM was unable to create its agent configuration index, trying to call methods that rely on this index would throw an exception. This requires every consumer to know about, catch, and ignore these exceptions; if they don't, calling a degraded API method will crash Kibana. This isn't any worse than in legacy, but it is a challenge to making Kibana resilient and improving uptime.
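One possible shape for that, sketched here with made-up names (`getCurrentLevel`, `AgentConfigUnavailableError`, and the contract factory are hypothetical, not existing APM or Kibana APIs):

```ts
// Hypothetical sketch of a plugin API method that refuses to run while its plugin is degraded.
// ServiceStatusLevel refers to the enum proposed above.
class AgentConfigUnavailableError extends Error {
  constructor() {
    super('Agent configuration is unavailable because its index could not be created');
  }
}

function createApmContract(getCurrentLevel: () => ServiceStatusLevel) {
  return {
    async getAgentConfiguration(serviceName: string) {
      // Every consumer has to catch this; if they don't, a degraded APM plugin
      // turns into an unhandled exception in the caller.
      if (getCurrentLevel() >= ServiceStatusLevel.degraded) {
        throw new AgentConfigUnavailableError();
      }
      // ...read the configuration for `serviceName` from the index...
    },
  };
}
```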
The problem described makes sense to me 👍 I'm unclear on what the first suggested solution means:
This would be an API to allow plugins to register their own status check that would get executed by Core? Maybe we could leverage the …
This may be difficult. From my understanding, it's possible for some APIs in Elasticsearch to return 503 Unavailable while other APIs are working fine. Maybe we could set the ES status to 'degraded' in that case.
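As a rough illustration of that idea (the inputs below are invented signals; there is no such check function in Kibana or the Elasticsearch client):

```ts
// Hypothetical: derive the ES core service status from two invented signals.
// ServiceStatus and ServiceStatusLevel refer to the types proposed above.
function elasticsearchStatus(reachable: boolean, someApisReturning503: boolean): ServiceStatus {
  if (!reachable) {
    return { level: ServiceStatusLevel.unavailable, summary: 'Elasticsearch is unreachable' };
  }
  if (someApisReturning503) {
    return { level: ServiceStatusLevel.degraded, summary: 'Some Elasticsearch APIs are returning 503' };
  }
  return { level: ServiceStatusLevel.available };
}
```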
@azasypkin do you think we need this solved in Core before #65472 can proceed?
As I understand …
Got it, a manual refresh would be easy enough to add. The only complex part would be exposing a new …
Not necessarily, I'd rather wait a bit more to give you all enough time to come up with a reasonable solution. I can try to replicate something similar to what Larry did here and switch to a proper solution as soon as you provide it.
Just a note on something I discovered while working on #67979: in the legacy platform, there was a mechanism to display the status page instead of the app when the server was not ready (see kibana/src/legacy/ui/ui_render/ui_render_mixin.js, lines 212 to 225 in fa93a81).
Probably important to note that this is currently broken, and has been for a long time:
This is not a blocker for #67979, and we did not get any complaints about this broken feature, but we will probably want to be able to display the status page during the server's startup, and I don't see this task in the issue task list.
Closing as this is essentially complete, and the main outstanding task (#72831) is being tracked separately.
Subtasks:
- `unavailableWhen` HTTP route utility from RFC

Original issue content
Status service
From the lifecycle RFC:
Core should expose a global mechanism for core services and plugins to signal their status. This is equivalent to the legacy status API kibana.Plugin.status which allowed plugins to set their status to e.g. 'red' or 'green'. The exact design of this API is outside of the scope of this RFC.
What is important, is that there is a global mechanism to signal status changes which Core then makes visible to system administrators in the Kibana logs and the /status HTTP API. Plugins should be able to inspect and subscribe to status changes from any of their dependencies.
This will provide an obvious mechanism for plugins to signal that the conditions which are required for this plugin to operate are not currently present and manual intervention might be required. Status changes can happen in both setup and start lifecycles e.g.:
[setup] a required remote host is down
[start] a remote host which was up during setup, started returning connection timeout errors.
Status API and page
Kibana currently exposes an `api/status` endpoint and an associated `status_page` which renders this output. The status page app is rendered as a `hiddenUiApp` (see kibana/src/legacy/server/status/routes/page/register_status.js, line 50 in ec48186).

The `api/stats` endpoint is currently created in the same legacy plugin. It won't be migrated to Core, but depends on an equivalent to `kbnServer.metrics` being exposed from Core (see kibana/src/legacy/server/status/collectors/get_ops_stats_collector.js, line 45 in c87e881).