
[BUG] Detector creation gets stuck on clusters with large shards and heavy ingestion #870

Closed
eirsep opened this issue Feb 27, 2024 · 0 comments
Labels: bug

eirsep (Member) commented on Feb 27, 2024

What is the bug?

Detector creation requests get stuck on clusters with large shards and heavy ingestion: their transport tasks keep running for hours, as the task dump below shows.

curl localhost:9200/_cat/tasks?v | less
action                                                     task_id                        parent_task_id                 type      start_time    timestamp running_time ip            node
cluster:admin/opensearch/securityanalytics/detector/write  NS5L3EYoSM2ED7Ivqq1snQ:990205  -                              transport 1706578051254 01:27:31  5.2h         10.212.107.51 5112ba5b511cfd4495
cluster:admin/opensearch/securityanalytics/detector/write  NS5L3EYoSM2ED7Ivqq1snQ:991197  -                              transport 1706578171258 01:29:31  5.2h         10.212.107.51 5112ba5b511cfd4495
cluster:admin/opensearch/securityanalytics/rule/search     wWwwf7eRSD2oo8KulgOF7Q:917083  -                              transport 1706578176167 01:29:36  5.2h         10.212.27.178 a173576e5c9b149d2e
cluster:admin/opendistro/alerting/monitor/write            NS5L3EYoSM2ED7Ivqq1snQ:991834  -                              transport 1706578242304 01:30:42  5.2h         10.212.107.51 5112ba5b511cfd4495
cluster:admin/opensearch/securityanalytics/detector/write  6x4YBILlRNqCh-H5SEGz4g:929275  -                              transport 1706578277667 01:31:17  5.2h         10.212.98.228 f8a85ed4b86db333fc
cluster:admin/opensearch/securityanalytics/mapping/get     NS5L3EYoSM2ED7Ivqq1snQ:992971  -                              transport 1706578360527 01:32:40  5.1h         10.212.107.51 5112ba5b511cfd4495
indices:admin/mappings/get   

How can one reproduce the bug?

There are a few blocking calls (invocations of actionGet()) that cause deadlocks in the detector creation flow (a sketch of the pattern follows below). On clusters with heavy ingestion and large shards the problem is magnified: the cluster chokes up and runs out of resources as threads sit stuck in these deadlocks.
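A minimal sketch of the blocking pattern described above, assuming a hypothetical mappings lookup during detector creation (the class and method names are illustrative, not the actual security-analytics code):

```java
import org.opensearch.action.admin.indices.mapping.get.GetMappingsRequest;
import org.opensearch.action.admin.indices.mapping.get.GetMappingsResponse;
import org.opensearch.client.Client;

public class BlockingMappingLookup {
    // Anti-pattern: actionGet() parks the calling thread until the response arrives.
    // If that thread belongs to a pool the downstream request also needs, or the
    // response is delayed by heavy ingestion on large shards, the task never
    // completes -- matching the multi-hour detector/write tasks in the dump above.
    static GetMappingsResponse getMappingsBlocking(Client client, String logIndex) {
        GetMappingsRequest request = new GetMappingsRequest().indices(logIndex);
        return client.admin().indices().getMappings(request).actionGet(); // blocking
    }
}
```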

What is the expected behavior?
The code should be event-driven, using the listener-based SPIs exposed by the OpenSearch transport client instead of blocking actionGet() calls (see the sketch below).
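For contrast, a sketch of the same lookup rewritten in the listener style (again illustrative, not the actual plugin code; the ActionListener import path varies across OpenSearch versions):

```java
import org.opensearch.action.admin.indices.mapping.get.GetMappingsRequest;
import org.opensearch.action.admin.indices.mapping.get.GetMappingsResponse;
import org.opensearch.client.Client;
import org.opensearch.core.action.ActionListener;

public class AsyncMappingLookup {
    // Event-driven variant: the calling thread is released immediately; the
    // continuation runs when the mappings response (or a failure) arrives.
    static void getMappingsAsync(Client client, String logIndex,
                                 ActionListener<GetMappingsResponse> listener) {
        GetMappingsRequest request = new GetMappingsRequest().indices(logIndex);
        client.admin().indices().getMappings(request, ActionListener.wrap(
                response -> {
                    // continue the detector-creation flow here instead of blocking
                    listener.onResponse(response);
                },
                listener::onFailure));
    }
}
```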

eirsep added the bug and untriaged labels on Feb 27, 2024
riysaxen-amzn pushed a commit to riysaxen-amzn/security-analytics that referenced this issue on Mar 25, 2024:

Fix getAlerts API for standard Alerting monitors (opensearch-project#875)
Signed-off-by: Ashish Agrawal <[email protected]>