
Kibana alerting acts strangely when Elasticsearch and/or Kibana clocks are out of sync #87664

Closed
mikecote opened this issue Jan 7, 2021 · 5 comments
Labels: discuss, estimate:needs-research, Feature:Alerting/RulesFramework, Feature:Alerting, Feature:Task Manager, resilience, Team:ResponseOps

Comments

mikecote (Contributor) commented Jan 7, 2021

There is ongoing work to document that alerting requires the clocks of all Elasticsearch and Kibana instances to be in sync (#81532). It would be nice to mitigate this problem, and also to avoid debugging such scenarios without knowing that clock drift is the cause.

@mikecote added the Feature:Alerting, Feature:Task Manager, and Team:ResponseOps labels Jan 7, 2021
elasticmachine (Contributor) commented:
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

mikecote (Contributor, Author) commented Jan 7, 2021

One way to get the date from Elasticsearch would be to do a call like:

GET */_search
{
  "size": 1, 
  "script_fields": {
    "now": {
      "script": "new Date().getTime()"
    }
  }
}

Some brain dump: I was thinking this could be used on task manager startup, checking that the date returned falls between the start and end times of that request to Elasticsearch; otherwise the clocks are not in sync. This approach would only validate the node that responded, and wouldn't catch the case where one of the other ES nodes is out of sync. For that, I was thinking this script / ES date lookup could be part of every task manager claim query, so we can make sure the responding node's clock is in sync with the Kibana instance claiming tasks.
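A minimal sketch of what that startup check could look like, assuming an @elastic/elasticsearch 7.x client; the helper name and tolerance value are illustrative, not from the Kibana code base:

import { Client } from '@elastic/elasticsearch';

// Illustrative only: compare the responding ES node's clock against the
// window of the request made from Kibana.
async function checkClockDrift(client: Client, toleranceMs = 5000): Promise<void> {
  const requestStart = Date.now();
  const { body } = await client.search({
    index: '*',
    body: {
      size: 1,
      script_fields: {
        now: { script: 'new Date().getTime()' },
      },
    },
  });
  const requestEnd = Date.now();

  // script_fields values come back as arrays on each hit.
  const esNow: number | undefined = body.hits.hits[0]?.fields?.now?.[0];
  if (esNow === undefined) {
    return; // no documents matched, so there is nothing to compare against
  }

  // If the ES timestamp falls outside the request window (plus some slack),
  // the responding node's clock and this Kibana's clock are likely out of sync.
  if (esNow < requestStart - toleranceMs || esNow > requestEnd + toleranceMs) {
    console.warn(
      `Possible clock drift: ES reports ${esNow}, Kibana request window was [${requestStart}, ${requestEnd}]`
    );
  }
}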

pmuellr (Member) commented Jan 13, 2021

I think we'd want to do it on every TM claim query - checking only at Kibana startup would miss too many cases. Although every claim cycle is probably too often, especially if it means an additional HTTP request. Or could we bundle this into one of our existing searches somehow, as an aggregation?
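A hedged sketch of what bundling it in might look like: a script-backed max metric aggregation attached to the existing claim query body (assuming the deployment allows inline scripts and that this script is permitted in an aggregation context; the aggregation name is illustrative):

// Illustrative addition to an existing claim query body, not actual Kibana code.
// The aggregation asks the responding node for its current clock, so no extra
// HTTP request is needed; the value comes back under aggregations.es_clock.value.
aggs: {
  es_clock: {
    max: {
      script: {
        source: 'new Date().getTime()',
      },
    },
  },
},

Kibana could then compare aggregations.es_clock.value against the request's start and end times, the same check as in the startup sketch above.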

And any script-based approach won't work if the customer has disabled scripts. I'd prefer to use a Date header in the HTTP response, but apparently ES doesn't add Date headers to responses. Getting it via a header would be one fewer HTTP request to ES. I wonder if we could add some option to requests (perhaps via a header) to tell ES to add a Date header to its responses.

I suspect we are seeing this in alerting because most other parts of Kibana don't really rely on ES's interpretation of now being tightly aligned with Kibana's clock. That will likely change in the future, which makes this more of a system problem, not just an alerting one.

pmuellr (Member) commented Jan 13, 2021

One "simple" way to fix the original issue is not to use now in our queries, instead replacing it with a literal date computed by Kibana (e.g., via Date.now()). It seems like the critical usages are in this module; here's an example:

must: [
  {
    bool: {
      should: [{ term: { 'task.status': 'running' } }, { term: { 'task.status': 'claiming' } }],
    },
  },
  { range: { 'task.retryAt': { lte: 'now' } } },
],
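For comparison, a hedged sketch of the same clause with now replaced by a timestamp computed on the Kibana side (the variable name is illustrative, not from the Kibana source; an ISO-8601 string works in a range query against a date-mapped field):

// Illustrative only: the timestamp is computed by Kibana instead of being
// resolved by Elasticsearch at query time.
const kibanaNow = new Date().toISOString();

must: [
  {
    bool: {
      should: [{ term: { 'task.status': 'running' } }, { term: { 'task.status': 'claiming' } }],
    },
  },
  { range: { 'task.retryAt': { lte: kibanaNow } } },
],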

It's kind of sweeping the dirt under the rug. You would certainly still see weird behaviour in a multi-Kibana deployment where the Kibana clocks are not in sync, but it would likely fix the problem in a single-Kibana deployment.

@gmmorris added the Feature:Alerting/RulesFramework label Jul 1, 2021
@gmmorris added the loe:needs-research and resilience labels Jul 14, 2021
@gmmorris added the estimate:needs-research label Aug 18, 2021
@gmmorris removed the loe:needs-research label Sep 2, 2021
@kobelb added the needs-team label Jan 31, 2022
@botelastic bot removed the needs-team label Jan 31, 2022
mikecote (Contributor, Author) commented:
Closing this issue, as clocks being out of sync would be a core problem rather than an alerting-specific one, and we haven't seen it happen yet.
