
Migrate status service and status page to New Platform #41983

Closed
11 of 15 tasks
joshdover opened this issue Jul 25, 2019 · 23 comments
Labels
Feature:Legacy Removal Issues related to removing legacy Kibana Feature:New Platform NeededFor:Monitoring Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@joshdover
Contributor

joshdover commented Jul 25, 2019

Subtasks:


Original issue content

Status service

From the lifecycle RFC:
Core should expose a global mechanism for core services and plugins to signal their status. This is equivalent to the legacy status API kibana.Plugin.status which allowed plugins to set their status to e.g. 'red' or 'green'. The exact design of this API is outside of the scope of this RFC.

What is important, is that there is a global mechanism to signal status changes which Core then makes visible to system administrators in the Kibana logs and the /status HTTP API. Plugins should be able to inspect and subscribe to status changes from any of their dependencies.

This will provide an obvious mechanism for plugins to signal that the conditions which are required for this plugin to operate are not currently present and manual intervention might be required. Status changes can happen in both setup and start lifecycles e.g.:

[setup] a required remote host is down
[start] a remote host which was up during setup, started returning connection timeout errors.
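
For reference, a rough sketch of the legacy mechanism this replaces (simplified from memory of legacy plugin code; exact details varied across versions):

export default function (kibana) {
  return new kibana.Plugin({
    init(server) {
      // Signal that a required dependency isn't available yet…
      this.status.yellow('Waiting for the remote host…');
      // …and flip to green once it becomes reachable.
      this.status.green('Ready');
    },
  });
}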

Status API and page

Kibana currently exposes an api/status endpoint and an associated status_page app which renders this output:

name: config.get('server.name'),
uuid: config.get('server.uuid'),
version: {
  number: config.get('pkg.version').replace(matchSnapshot, ''),
  build_hash: config.get('pkg.buildSha'),
  build_number: config.get('pkg.buildNum'),
  build_snapshot: matchSnapshot.test(config.get('pkg.version'))
},
status: kbnServer.status.toJSON(), // https://github.com/elastic/kibana/blob/2a290a14066d4da2b626bb0b4e4e9d0193853230/src/legacy/server/status/server_status.js#L111
metrics: kbnServer.metrics // https://github.com/elastic/kibana/issues/46563 https://github.com/elastic/kibana/blob/ec481861799ed8dcced9cafd8112e5b26e641c54/src/legacy/server/status/lib/metrics.js#L57-L68
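
For reference, the rendered response body looks roughly like this (values are illustrative, not real output):

{
  "name": "kibana",
  "uuid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "version": {
    "number": "7.4.0",
    "build_hash": "...",
    "build_number": 12345,
    "build_snapshot": false
  },
  "status": {
    "overall": { "state": "green" },
    "statuses": [ /* one entry per plugin/service */ ]
  },
  "metrics": { /* latest ops metrics */ }
}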

The status page app is rendered as a hiddenUiApp, and migrating it will be blocked by the following:

The api/stats endpoint is currently created in the same legacy plugin. It won't be migrated to Core, but it depends on an equivalent of kbnServer.metrics being exposed from Core.

...kbnServer.metrics // latest metrics captured from the ops event listener in src/legacy/server/status/index

@joshdover joshdover added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:New Platform labels Jul 25, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-platform

@joshdover joshdover assigned joshdover and rudolf and unassigned joshdover Nov 12, 2019
@rudolf rudolf changed the title Migrate status page to New Platform Migrate status service and status page to New Platform Nov 19, 2019
@joshdover joshdover assigned eliperelman and unassigned rudolf Jan 14, 2020
@tsullivan
Member

tsullivan commented Jan 17, 2020

Hi,

The api/stats endpoint is currently created in the same legacy plugin. It won't be migrated to Core, but depends on an equivalent to kbnServer.metrics being exposed from Core.

I created the /api/stats endpoint with the idea that the stats data isn't suitable for Monitoring consumption, since the values come directly from Hapi.

What are the thoughts on preserving /api/stats and making it a full replacement for /api/status?

BTW, the stats abbreviation was chosen to be similar to the Elasticsearch endpoints, such as _cluster/stats

@joshdover
Contributor Author

I think we should keep these APIs separate and just have the status page UI consume both the status API and the stats API.

Migrating the stats API is part of #46563

@joshdover
Contributor Author

joshdover commented Feb 27, 2020

Here are my initial thoughts on how plugin & core service statuses should work. This is not intended to replace an RFC on the concept, but to organize and document the original thinking before investigating all use cases.

Please poke holes in this 😄

High-level design concepts

More expressive status levels

Right now we have red, yellow, and green. These don't really explain much or have a consistent semantic meaning.

We could benefit from having status levels with explicit meaning and associated behaviors:

enum ServiceStatusLevel {
  available,   // everything is working!
  degraded,    // some features may not be working
  unavailable, // the service/plugin is unavailable, but other functions should work
  fatal        // block all user functions and show status page
               // (reserved for core services?)
}

Statuses reflect dependencies between plugins

In legacy, it's common for plugins to use the "mirror plugin status" concept to inherit their status from another plugin (most commonly, the Elasticsearch plugin).

It seems beneficial for this concept to be baked into the design of the new service:

  • A plugin's status should default to the highest-severity status of its required dependencies
  • A plugin's status should default to degraded if any of its optional dependencies are >= degraded

Kibana should always try to keep as much functionality working as possible

In the legacy system, if any plugin changes to "red", pretty much all of Kibana's UI becomes blocked by the status page.

This prevents the user from using some of the built-in management and debug tools to diagnose and correct the problem. For instance, if Machine Learning is in the unavailable state, the user should still be able to use the Console app, License management, and Elasticsearch management tools to diagnose & fix the issue.

This is the purpose of the distinction between unavailable and fatal. I suspect that very few plugins should ever need to go into a fatal state. The few exceptions:

  • Security may want to trigger the fatal state if authentication is broken in some way (e.g. license expiration)
  • Some key core services may need to block (e.g. Saved Object migrations)

Plugin status does not alter which plugins are enabled

Anything that cannot be recovered from without completely restarting Kibana should throw exceptions during setup or start rather than setting an unavailable or fatal status.

Therefore, we should not disable plugins because they are currently unavailable, since removing plugins from the dependency tree requires an entire restart of the Core lifecycles (essentially an in-place restart of Kibana).

Core services & plugins should use the same status mechanism

Pretty self-explanatory. There should be a single concept that backs the status of the different components in the system, and they should easily interop with one another.

API Design

enum ServiceStatusLevel {
  available,   // everything is working!
  degraded,    // some features may not be working
  unavailable, // the service/plugin is unavailable, but other functions should work
  fatal        // block all user functions and show status page
               // (reserved for core services?)
}

interface ServiceStatus {
  level: ServiceStatusLevel;
  summary?: string;
  detail?: string;
  meta?: object;
}
interface StatusSetup {
  // Allows a plugin to specify a custom status dependent on its own criteria.
  // See calculation section below on how this is combined with dependency statuses
  setStatus(status$: Observable<ServiceStatus>): Observable<ServiceStatus>;

  // Exposes plugin status for dependencies of current plugin.
  // Type could be inferred by the generic type arg provided by plugins.
  getPluginStatuses$(): Observable<Record<string, ServiceStatus>>;

  // Statuses for all of core's services. Can be used with `inheritStatus` utility
  // for expressing dependent statuses on core services.
  core$: Observable<{
    http: ServiceStatus;
    elasticsearch: ServiceStatus;
    savedObjects: ServiceStatus;
    uiSettings: ServiceStatus;
  }>;
}
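
To make the proposed shape concrete, here is a rough sketch of how a plugin might consume this API (illustrative only; StatusSetup and ServiceStatusLevel are the proposed types above, and map is the standard RxJS operator):

import { map } from 'rxjs/operators';

class MyPlugin {
  setup(core: { status: StatusSetup }) {
    // Derive this plugin's status from the Elasticsearch service status.
    core.status.setStatus(
      core.status.core$.pipe(
        map(({ elasticsearch }) =>
          elasticsearch.level === ServiceStatusLevel.available
            ? { level: ServiceStatusLevel.available, summary: 'Ready' }
            : { level: ServiceStatusLevel.degraded, summary: 'Waiting for Elasticsearch' }
        )
      )
    );
  }
}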

Status calculation

// Utility for merging several statuses together and producing a single status with the 
// most severe status up to the maxLevel.
const inheritStatus:
  (
    statuses$: Array<Observable<ServiceStatus>>,
    maxLevel?: ServiceStatusLevel
  ) => Observable<ServiceStatus>;
// Pseudo-code calculation of a plugin's status
function calculatePluginStatus(requiredDeps: string[], optionalDeps: string[], pluginCustomStatus$?: Observable<ServiceStatus>) {
  const requiredDepStatus$ = inheritStatus(
    requiredDeps.map(dep => getStatusForPlugin(dep))
  );
  const optionalDepStatus$ = inheritStatus(
    optionalDeps.map(dep => getStatusForPlugin(dep)),
    ServiceStatusLevel.degraded // Optional dependencies are capped to 'degraded'
  );
  return inheritStatus([
    requiredDepStatus$,
    optionalDepStatus$,
    // `pluginCustomStatus$` is only set if plugin called `status.setStatus()`
    ...(pluginCustomStatus$ ? [pluginCustomStatus$] : [])
  ]);
}
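
A hypothetical usage of the pseudo-code above: suppose the dashboard plugin requires data and optionally depends on ml:

const dashboardStatus$ = calculatePluginStatus(['data'], ['ml']);

dashboardStatus$.subscribe((status) => {
  // If `data` is available but `ml` is unavailable, this reports `degraded`:
  // optional dependency statuses are capped at ServiceStatusLevel.degraded.
  console.log(status.level);
});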

Open questions

  • Do plugins need to be able to override the default inheritance of statuses from their dependencies?
  • Do we need a status service on the frontend as well as the backend?

@rudolf
Contributor

rudolf commented Feb 28, 2020

I really like the semantic statuses and the status inheritance.

Do plugins need to be able to override the default inheritance of statuses from their dependencies?

I think the default of degraded if a dependency is >= degraded makes sense, but there are probably many plugins which would be unavailable if one of their dependencies were unavailable. E.g. I don't think dashboard could do anything useful without the data plugin, so it might want to set its status to "unavailable" if the data plugin is unavailable.

Anything that cannot be recovered from without restarting Kibana completely, should be throwing exceptions during setup or start rather than setting a unavailable or fatal status.

I like this and I think it's the "correct" way. Exceptions are reserved for exceptional circumstances that we didn't anticipate happening and have no idea how to handle, so the only valid response is to crash and see if it might work the second time Kibana starts up.

The only risk, which I'm not sure how to mitigate, is that crashing Kibana becomes the default error handling. Many plugins don't have error handling for network exceptions or ES exceptions (usually rate limiting or response data that's too large).

I also think we should flesh out how plugin APIs work when a plugin is degraded. One way would be for API methods to throw an exception when that method is degraded. So if APM was unable to create its agent configuration index, trying to call methods that rely on this index would throw an exception. This requires every consumer to know about, catch, and ignore these exceptions; if they don't, calling a degraded API method will crash Kibana.

This isn't any worse than in legacy, but it is a challenge to making Kibana resilient and improving uptime.
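
A minimal sketch of that failure mode (all names hypothetical):

// Assumed helper, for illustration only.
declare function fetchConfigFromIndex(serviceName: string): Promise<object>;

class ApmApi {
  private indexReady = false; // e.g. creating the agent configuration index failed

  async getAgentConfiguration(serviceName: string) {
    if (!this.indexReady) {
      // Every consumer has to know to catch this; an uncaught rejection
      // from a degraded API method can crash Kibana.
      throw new Error('APM agent configuration index is unavailable');
    }
    return fetchConfigFromIndex(serviceName);
  }
}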

@joshdover
Contributor Author

The problem described makes sense to me 👍

I'm unclear on what the first suggested solution means:

There was a suggestion for the status service to provide an API allowing plugins to enforce a status check. BTW, the licensing service does this.

This would be an API to allow plugins to register their own status check that would get executed by Core? Maybe we could leverage the ServiceSetup#set mechanism proposed in the original RFC for this (possibly with some changes).

As an alternative, we can intercept all network requests to ES and detect status changes (a 503 status in the response) to update the status. Or even allow plugins to perform a request to ES with retry.

This may be difficult. From my understanding, it's possible for some Elasticsearch APIs to return 503 Unavailable while other APIs keep working fine. Maybe we could set the ES status to degraded when Core's check is passing but some API calls are emitting 503s? I just worry that 503s on some APIs may be irrelevant to many plugins. I think we need to analyze the actual failure behavior in ES and then determine what global behavior may (or may not) make sense.

@joshdover
Contributor Author

@azasypkin do you think we need this solved in Core before #65472 can proceed?

@mshustov
Contributor

mshustov commented May 20, 2020

Maybe we could leverage the ServiceSetup#set mechanism proposed in the original RFC for this (possibly with some changes)

As I understand it, set allows a plugin to specify its own status. But here a plugin needs to force the status service to perform status checks immediately, without waiting for the polling interval to complete. The licensing plugin provides a refresh method that triggers a license re-fetch: https://github.com/elastic/kibana/blob/master/x-pack/plugins/licensing/server/types.ts#L60

@joshdover
Contributor Author

Got it, a manual refresh would be easy enough to add. The only complex part would be exposing a new manualTrigger$ or similar argument to core services, and as part of the API exposed to plugins (once set is introduced).
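
A rough sketch of how such a manual trigger could be merged with the regular polling interval (all names assumed):

import { interval, merge, Subject } from 'rxjs';
import { switchMap } from 'rxjs/operators';

// Assumed helper that runs the actual checks, for illustration only.
declare function runStatusChecks(): Promise<ServiceStatus>;

const manualTrigger$ = new Subject<void>();
const poll$ = interval(30_000); // the regular polling interval

// A check runs on every poll tick *or* whenever a plugin requests a refresh.
const status$ = merge(poll$, manualTrigger$).pipe(
  switchMap(() => runStatusChecks())
);

// What would be exposed to plugins, analogous to licensing's `refresh`:
const refresh = () => manualTrigger$.next();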

@azasypkin
Member

@azasypkin do you think we need this solved in Core before #65472 can proceed?

Not necessarily, I'd rather wait a bit more to give you all enough time to come up with a reasonable solution. I can try to replicate something similar to what Larry did here and switch to a proper solution as soon as you provide it.

@pgayvallet
Contributor

Just a note about something I discovered while working on #67979:

In the legacy platform, there was a mechanism to display the status page instead of the app when the server was not ready:

server.route({
  path: '/app/{id}/{any*}',
  method: 'GET',
  async handler(req, h) {
    const id = req.params.id;
    const app = server.getUiAppById(id);
    try {
      if (kbnServer.status.isGreen()) {
        return await h.renderApp(app);
      } else {
        return await h.renderStatusPage();
      }
    } catch (err) {
      // (error handling truncated in the original snippet)
    }
  },
});
Probably important to note that this is currently broken, and has been for a long time:

  • We now wait for ES to be ready before actually starting the plugins, so the real app is never displayed until the server is green (only the "kibana is not ready yet" body content is shown)
  • Now that Kibana is a SPA (in non-legacy mode), the client-side router would display the app page instead of the status page after core_system's boot.

This is not a blocker for #67979, and we haven't received any complaints about this broken feature, but we will probably want to be able to display the status page during the server's startup, and I don't see this task in the issue's task list.

@lukeelmers
Member

Closing as this is essentially complete, and the main outstanding task (#72831) is being tracked separately.
