
[BUG] [security_analytics_exception] Workflow with xxx not found, cannot stop or delete the Detector Rule #618

Closed
StefanSa opened this issue Sep 26, 2023 · 13 comments
Labels
bug Something isn't working v2.10.0 v2.11.0


@StefanSa

What is the bug?

I successfully created a custom log type with a detector rule. Once the detector went online, it produced quite a few findings.
I now want to stop or delete the detector rule, but neither works.

I get this error message:
[security_analytics_exception] Workflow with wuCa0YoByCThRGzPDLxU is not found

What is the expected behavior?
Detector rules can be created, stopped, and deleted without problems.

What is your host/environment?

  • OpenSearch 2.10

Do you have any additional context?
Note: the rule was created without alert triggers.

@amsiglan
Collaborator

@StefanSa Can you add some details about how you are trying to delete the detector? Is it using the UX or the API? Thanks!

@StefanSa
Author

Hi @amsiglan,
in the UX. I use the UX because no detector is found via the API:

POST /_plugins/_security_analytics/detectors/_search
{
  "size": 30,  
  "query": {
    "nested": {
      "path": "detector",
      "query": {
        "bool": {
          "must": [
            { "match": {"detector.name": "*"} }
          ]
        }
      }
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
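One caveat worth ruling out before reading too much into the empty result: a `match` query analyzes its input instead of treating `*` as a wildcard, so `{"match": {"detector.name": "*"}}` can return zero hits even when detectors exist. If the goal is simply to list every detector, a `match_all` inside the same `nested` wrapper avoids relying on `*`. The sketch below builds that body for `POST /_plugins/_security_analytics/detectors/_search` and pulls the detector IDs out of a response shaped like the one above; the helper names are hypothetical and the code does not talk to a cluster.

```python
import json


def list_all_detectors_body(size: int = 30) -> dict:
    """Build a detector search body that matches every detector.

    Uses match_all inside the nested "detector" path instead of match on "*",
    because match analyzes its input and does not expand "*" as a wildcard.
    """
    return {
        "size": size,
        "query": {
            "nested": {
                "path": "detector",
                "query": {"match_all": {}},
            }
        },
    }


def detector_ids(search_response: dict) -> list:
    """Return the _id of every hit in a detector search response."""
    return [hit["_id"] for hit in search_response.get("hits", {}).get("hits", [])]


if __name__ == "__main__":
    # Paste the printed body into Dev Tools under:
    # POST /_plugins/_security_analytics/detectors/_search
    print(json.dumps(list_all_detectors_body(), indent=2))
```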

But the alerting monitor configuration (Firewalls) still exists:

GET .opendistro-alerting-config/_search
{
  "query": {
    "exists": { "field": "monitor" }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".opendistro-alerting-config",
        "_id": "EN1asYkBR98PEmEvsW7p",
        "_score": 1,
        "_source": {
          "monitor": {
            "type": "monitor",
            "schema_version": 0,
            "name": "test",
            "monitor_type": "query_level_monitor",
            "user": {
              "name": "admin",
              "backend_roles": [
                "admin"
              ],
              "roles": [
                "own_index",
                "all_access"
              ],
              "custom_attribute_names": [],
              "user_requested_tenant": null
            },
            "enabled": true,
            "enabled_time": 1690897658120,
            "schedule": {
              "period": {
                "interval": 1,
                "unit": "MINUTES"
              }
            },
            "inputs": [
              {
                "search": {
                  "indices": [
                    "opensearch_dashboards_sample_data_logs"
                  ],
                  "query": {
                    "size": 0,
                    "query": {
                      "bool": {
                        "filter": [
                          {
                            "range": {
                              "timestamp": {
                                "from": "{{period_end}}||-1h",
                                "to": "{{period_end}}",
                                "include_lower": true,
                                "include_upper": true,
                                "format": "epoch_millis",
                                "boost": 1
                              }
                            }
                          }
                        ],
                        "adjust_pure_negative": true,
                        "boost": 1
                      }
                    }
                  }
                }
              }
            ],
            "triggers": [],
            "last_update_time": 1690897658120,
            "ui_metadata": {
              "schedule": {
                "cronExpression": "0 */1 * * *",
                "period": {
                  "unit": "MINUTES",
                  "interval": 1
                },
                "timezone": null,
                "daily": 0,
                "monthly": {
                  "type": "day",
                  "day": 1
                },
                "weekly": {
                  "tue": false,
                  "wed": false,
                  "thur": false,
                  "sat": false,
                  "fri": false,
                  "mon": false,
                  "sun": false
                },
                "frequency": "interval"
              },
              "search": {
                "searchType": "graph",
                "bucketValue": 1,
                "timeField": "timestamp",
                "bucketUnitOfTime": "h",
                "filters": [],
                "aggregations": [],
                "cleanedGroupBy": []
              },
              "monitor_type": "query_level_monitor"
            },
            "data_sources": {
              "query_index": ".opensearch-alerting-queries",
              "findings_index": ".opensearch-alerting-finding-history-write",
              "findings_index_pattern": "<.opensearch-alerting-finding-history-{now/d}-1>",
              "alerts_index": ".opendistro-alerting-alerts",
              "alerts_history_index": ".opendistro-alerting-alert-history-write",
              "alerts_history_index_pattern": "<.opendistro-alerting-alert-history-{now/d}-1>",
              "query_index_mappings_by_type": {},
              "findings_enabled": false
            },
            "owner": "alerting"
          }
        }
      },
      {
        "_index": ".opendistro-alerting-config",
        "_id": "weCa0YoByCThRGzPArxL",
        "_score": 1,
        "_source": {
          "monitor": {
            "type": "monitor",
            "schema_version": 8,
            "name": "Firewalls",
            "monitor_type": "doc_level_monitor",
            "user": {
              "name": "",
              "backend_roles": [],
              "roles": [],
              "custom_attribute_names": [],
              "user_requested_tenant": null
            },
            "enabled": false,
            "enabled_time": null,
            "schedule": {
              "period": {
                "interval": 1,
                "unit": "MINUTES"
              }
            },
            "inputs": [
              {
                "doc_level_input": {
                  "description": "Firewalls",
                  "indices": [
                    "logs-*-*"
                  ],
                  "queries": [
                    {
                      "id": "UkGV0YoBdPXeaknOL0DI",
                      "name": "UkGV0YoBdPXeaknOL0DI",
                      "query": """(NOT (((id.orig_h: 192.168.*) OR (id.orig_h: 10.*) OR (id.orig_h: 172.16.*) OR (id.orig_h: 172.17.*) OR (id.orig_h: 172.18.*) OR (id.orig_h: 172.19.*) OR (id.orig_h: 172.20.*) OR (id.orig_h: 172.21.*) OR (id.orig_h: 172.22.*) OR (id.orig_h: 172.23.*) OR (id.orig_h: 172.24.*) OR (id.orig_h: 172.25.*) OR (id.orig_h: 172.26.*) OR (id.orig_h: 172.27.*) OR (id.orig_h: 172.28.*) OR (id.orig_h: 172.29.*) OR (id.orig_h: 172.30.*) OR (id.orig_h: 172.31.*) OR (id.orig_h: fd*) OR (id.orig_h: 2620\:83\:800f*)) AND (destination.port: *3389*)))""",
                      "tags": [
                        "high",
                        "firewall",
                        "attack.t1021.001"
                      ]
                    }
                  ]
                }
              }
            ],
            "triggers": [],
            "last_update_time": 1695797069289,
            "data_sources": {
              "query_index": ".opensearch-sap-firewall-detectors-queries",
              "findings_index": ".opensearch-sap-firewall-findings",
              "findings_index_pattern": "<.opensearch-sap-firewall-findings-{now/d}-1>",
              "alerts_index": ".opensearch-sap-firewall-alerts",
              "alerts_history_index": ".opensearch-sap-firewall-alerts-history",
              "alerts_history_index_pattern": "<.opensearch-sap-firewall-alerts-history-{now/d}-1>",
              "query_index_mappings_by_type": {
                "text": {
                  "analyzer": "rule_analyzer"
                }
              },
              "findings_enabled": true
            },
            "owner": "security_analytics"
          }
        }
      }
    ]
  }
}
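In the response above, the second hit is the `Firewalls` `doc_level_monitor` with `"owner": "security_analytics"` and `"enabled": false`, and its `_id` (`weCa0YoByCThRGzPArxL`) differs from the workflow ID in the error (`wuCa0YoByCThRGzPDLxU`). A small Python sketch for picking out monitors owned by Security Analytics from such a response follows. The second helper assumes, without confirmation from this thread, that Alerting stores workflows in the same `.opendistro-alerting-config` index under a top-level `workflow` field; treat that query as a hypothetical check.

```python
def security_analytics_monitors(search_response: dict) -> list:
    """List (monitor_id, name, enabled) for monitors owned by Security Analytics."""
    found = []
    for hit in search_response.get("hits", {}).get("hits", []):
        monitor = hit.get("_source", {}).get("monitor", {})
        if monitor.get("owner") == "security_analytics":
            found.append((hit["_id"], monitor.get("name"), monitor.get("enabled")))
    return found


def workflow_exists_query() -> dict:
    """Hypothetical query for workflow documents (assumes a top-level "workflow" field)."""
    return {"query": {"exists": {"field": "workflow"}}}


# Trimmed copy of the response above, keeping only the fields the helper reads.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "EN1asYkBR98PEmEvsW7p",
             "_source": {"monitor": {"name": "test", "enabled": True, "owner": "alerting"}}},
            {"_id": "weCa0YoByCThRGzPArxL",
             "_source": {"monitor": {"name": "Firewalls", "enabled": False,
                                     "owner": "security_analytics"}}},
        ]
    }
}

print(security_analytics_monitors(sample_response))
# [('weCa0YoByCThRGzPArxL', 'Firewalls', False)]
```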

@lezzago lezzago self-assigned this Sep 28, 2023
@lezzago
Member

lezzago commented Sep 28, 2023

@StefanSa, from the UI, did you first try to disable or delete the detector?

Also did the error message, [security_analytics_exception] Workflow with wuCa0YoByCThRGzPDLxU is not found, come from when you tried to disable the detector?

@StefanSa
Author

StefanSa commented Sep 28, 2023

@lezzago
Hmm, that's a good question that I can't answer directly.
When stopping and when deleting, I get the error message "Workflow not found".

@lezzago
Member

lezzago commented Sep 28, 2023

@StefanSa Thanks for letting me know.

Also, did you enable the filter_by_backend_roles setting for Alerting or Security Analytics?
references:
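Both settings can be checked with `GET _cluster/settings?include_defaults=true&flat_settings=true`. The sketch below reads that response and reports the effective value of each setting, using a simplified precedence of transient, then persistent, then defaults. The setting names `plugins.alerting.filter_by_backend_roles` and `plugins.security_analytics.filter_by_backend_roles` are assumed from the two plugins' documentation and should be verified against the installed version; the helper names are hypothetical.

```python
# Assumed setting names; verify them against your plugin versions.
FILTER_SETTINGS = (
    "plugins.alerting.filter_by_backend_roles",
    "plugins.security_analytics.filter_by_backend_roles",
)


def effective_value(cluster_settings: dict, key: str):
    """Return a flat setting's value, checking transient, then persistent, then defaults.

    Expects the body of GET _cluster/settings?include_defaults=true&flat_settings=true.
    Returns None when the key appears in none of those sections.
    """
    for scope in ("transient", "persistent", "defaults"):
        value = cluster_settings.get(scope, {}).get(key)
        if value is not None:
            return value
    return None


def filter_by_backend_roles_report(cluster_settings: dict) -> dict:
    """Map each filter_by_backend_roles setting to its effective value."""
    return {key: effective_value(cluster_settings, key) for key in FILTER_SETTINGS}


example = {
    "persistent": {"plugins.alerting.filter_by_backend_roles": "true"},
    "defaults": {"plugins.alerting.filter_by_backend_roles": "false"},
}
print(filter_by_backend_roles_report(example))
```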

@StefanSa
Author

@lezzago
No, neither setting has been set here.

@lezzago
Member

lezzago commented Oct 2, 2023

@StefanSa are you able to replicate this behavior or was this a one time issue?

Also, is the detector still showing up in the UI? Or is the problem that it shows as deleted, but you keep seeing findings being created?

@lezzago
Member

lezzago commented Oct 2, 2023

Also, if you don't mind, can you share any error logs from the cluster from around the time you hit the error? Ideally they should be related to Alerting or Security Analytics.
Please also make sure to obfuscate any sensitive information.

@lezzago lezzago removed the untriaged label Oct 3, 2023
@StefanSa
Author

StefanSa commented Oct 5, 2023

Hi @lezzago, yes, the detector is still showing up in the UI.
Here is an error log:

[2023-10-05T16:36:22,643][ERROR][o.o.a.u.AlertingException] [os-client03] Alerting error: OpenSearchStatusException[Workflow not found.]
[2023-10-05T16:36:22,644][ERROR][o.o.s.t.TransportDeleteDetectorAction] [os-client03] Failed to delete detector null
org.opensearch.alerting.util.AlertingException: Workflow not found.
        at org.opensearch.alerting.util.AlertingException$Companion.wrap(AlertingException.kt:70) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.getWorkflow(TransportDeleteWorkflowAction.kt:305) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.access$getWorkflow(TransportDeleteWorkflowAction.kt:112) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler$getWorkflow$1.invokeSuspend(TransportDeleteWorkflowAction.kt)  ~[?:?]
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[?:?]
        at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:233) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:742) ~[?:?]
Caused by: java.lang.Exception: org.opensearch.OpenSearchStatusException: Workflow not found.
        ... 9 more
[2023-10-05T16:36:22,653][ERROR][o.o.s.u.SecurityAnalyticsException] [os-client03] Security Analytics error:
org.opensearch.alerting.util.AlertingException: Workflow not found.
        at org.opensearch.alerting.util.AlertingException$Companion.wrap(AlertingException.kt:70) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.getWorkflow(TransportDeleteWorkflowAction.kt:305) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.access$getWorkflow(TransportDeleteWorkflowAction.kt:112) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler$getWorkflow$1.invokeSuspend(TransportDeleteWorkflowAction.kt)  ~[?:?]
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[?:?]
        at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:233) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:742) ~[?:?]
Caused by: java.lang.Exception: org.opensearch.OpenSearchStatusException: Workflow not found.
        ... 9 more
[2023-10-05T16:36:22,645][ERROR][o.o.a.t.TransportDeleteWorkflowAction] [os-client03] Failed to delete workflow wuCa0YoByCThRGzPDLxU
java.lang.NullPointerException: Cannot invoke "org.opensearch.core.common.bytes.BytesReference.length()" because "bytes" is null
        at org.opensearch.compress.ZstdCompressor.isCompressed(ZstdCompressor.java:53) ~[opensearch-compress-2.10.0.jar:2.10.0]
        at org.opensearch.core.compress.CompressorRegistry.compressor(CompressorRegistry.java:65) ~[opensearch-core-2.10.0.jar:2.10.0]
        at org.opensearch.common.xcontent.XContentHelper.createParser(XContentHelper.java:109) ~[opensearch-2.10.0.jar:2.10.0]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.getWorkflow(TransportDeleteWorkflowAction.kt:310) ~[opensearch-alerting-2.10.0.0.jar:2.10.0.0]
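The `NullPointerException` in this log says `bytes` is null inside `XContentHelper.createParser`, called from `getWorkflow` at `TransportDeleteWorkflowAction.kt:310`, which suggests the handler tried to parse the source of a workflow document that no longer exists. A minimal Python analogue of the missing guard follows; it relies only on the fact that an OpenSearch GET for a missing document returns `"found": false` with no `_source`. `parse_workflow` and `WorkflowNotFound` are hypothetical names, not part of the Alerting plugin.

```python
class WorkflowNotFound(Exception):
    """Hypothetical stand-in for Alerting's "Workflow not found." (HTTP 404) error."""


def parse_workflow(get_response: dict) -> dict:
    """Return the workflow source from a GET-document response, or raise WorkflowNotFound.

    Checking "found" and "_source" first avoids handing a null source to the
    parser, which is what the NullPointerException above appears to show.
    """
    source = get_response.get("_source")
    if not get_response.get("found", False) or source is None:
        raise WorkflowNotFound(f"Workflow {get_response.get('_id')} not found")
    return source


missing = {"_id": "wuCa0YoByCThRGzPDLxU", "found": False}
try:
    parse_workflow(missing)
except WorkflowNotFound as err:
    print(err)  # Workflow wuCa0YoByCThRGzPDLxU not found
```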

@StefanSa
Author

StefanSa commented Nov 4, 2023

@lezzago
any news on this bug?

@lezzago lezzago removed their assignment Nov 8, 2023
@lezzago
Member

lezzago commented Nov 8, 2023

^ @sbcd90 @eirsep

@engechas
Collaborator

Hi @StefanSa - I'm looking into this. If your setup is still intact, could you check if the Alerting monitor associated with the detector still exists?

I was able to get a similar error by deleting the underlying monitor created by the detector. Here was my process:

  1. Create a detector
  2. Go to Alerting > Monitors
  3. Delete the monitor created by the detector (in my case the monitor and detector name were the same)
  4. Attempt to delete the detector
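The reproduction steps above can also be expressed as REST calls, which may make it easier to check whether the detector's monitor still exists. This is a sketch under assumptions: the detector and monitor IDs are placeholders read from the UI, deleting the monitor through the Alerting API is assumed to be equivalent to deleting it from Alerting > Monitors, and the function only builds (method, path) pairs to send from Dev Tools or any HTTP client. `reproduction_requests` is a hypothetical helper.

```python
def reproduction_requests(detector_id: str, monitor_id: str) -> list:
    """(method, path) pairs mirroring steps 2-4 above; the IDs are placeholders.

    Step 1 (creating the detector) is assumed to be done already, in the UI or
    with POST _plugins/_security_analytics/detectors.
    """
    return [
        # Confirm the monitor created by the detector still exists.
        ("GET", f"_plugins/_alerting/monitors/{monitor_id}"),
        # Step 3: delete that monitor directly through the Alerting API.
        ("DELETE", f"_plugins/_alerting/monitors/{monitor_id}"),
        # Step 4: attempt to delete the detector; this is where the error appeared.
        ("DELETE", f"_plugins/_security_analytics/detectors/{detector_id}"),
    ]


for method, path in reproduction_requests("<detector_id>", "<monitor_id>"):
    print(method, path)
```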

With the above steps, I get an exception in the UX saying Failed to delete detector: [security_analytics_exception] Workflow not found.

The backend has these exceptions which look the same as the ones you posted above:

[2023-12-20T15:41:50,433][ERROR][o.o.a.u.AlertingException] [integTest-0] Alerting error: OpenSearchStatusException[Workflow not found.]
[2023-12-20T15:41:50,464][ERROR][o.o.a.t.TransportDeleteWorkflowAction] [integTest-0] Failed to delete workflow x6uZiYwB1vDhLX8YLGla
java.lang.NullPointerException: Cannot invoke "org.opensearch.core.common.bytes.BytesReference.length()" because "bytes" is null
        at org.opensearch.compress.ZstdCompressor.isCompressed(ZstdCompressor.java:66) ~[opensearch-compress-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.core.compress.CompressorRegistry.compressor(CompressorRegistry.java:65) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.xcontent.XContentHelper.createParser(XContentHelper.java:109) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.getWorkflow(TransportDeleteWorkflowAction.kt:310) ~[opensearch-alerting-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.access$getWorkflow(TransportDeleteWorkflowAction.kt:112) ~[opensearch-alerting-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler$getWorkflow$1.invokeSuspend(TransportDeleteWorkflowAction.kt) ~[opensearch-alerting-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) [kotlin-stdlib-1.8.21.jar:1.8.21-release-380(1.8.21)]
        at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:233) [kotlinx-coroutines-core-1.1.1.jar:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594) [kotlinx-coroutines-core-1.1.1.jar:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60) [kotlinx-coroutines-core-1.1.1.jar:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:742) [kotlinx-coroutines-core-1.1.1.jar:?]
[2023-12-20T15:41:50,470][ERROR][o.o.a.u.AlertingException] [integTest-0] Alerting error: java.lang.NullPointerException: Cannot invoke "org.opensearch.core.common.bytes.BytesReference.length()" because "bytes" is null
[2023-12-20T15:41:50,469][ERROR][o.o.s.t.TransportDeleteDetectorAction] [integTest-0] Failed to delete detector null
org.opensearch.alerting.util.AlertingException: Workflow not found.
        at org.opensearch.alerting.util.AlertingException$Companion.wrap(AlertingException.kt:70) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.getWorkflow(TransportDeleteWorkflowAction.kt:305) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.access$getWorkflow(TransportDeleteWorkflowAction.kt:112) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler$getWorkflow$1.invokeSuspend(TransportDeleteWorkflowAction.kt) ~[?:?]
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[?:?]
        at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:233) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:742) ~[?:?]
Caused by: java.lang.Exception: org.opensearch.OpenSearchStatusException: Workflow not found.
        ... 9 more
[2023-12-20T15:41:50,493][ERROR][o.o.s.u.SecurityAnalyticsException] [integTest-0] Security Analytics error:
org.opensearch.alerting.util.AlertingException: Workflow not found.
        at org.opensearch.alerting.util.AlertingException$Companion.wrap(AlertingException.kt:70) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.getWorkflow(TransportDeleteWorkflowAction.kt:305) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler.access$getWorkflow(TransportDeleteWorkflowAction.kt:112) ~[?:?]
        at org.opensearch.alerting.transport.TransportDeleteWorkflowAction$DeleteWorkflowHandler$getWorkflow$1.invokeSuspend(TransportDeleteWorkflowAction.kt) ~[?:?]
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[?:?]
        at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:233) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60) ~[?:?]
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:742) ~[?:?]
Caused by: java.lang.Exception: org.opensearch.OpenSearchStatusException: Workflow not found.
        ... 9 more

I am going to work on a PR to treat a 404 when getting the workflow the same as a successful deletion, so the stop/delete detector logic can continue instead of throwing this exception. Based on the exceptions you posted, I think this solution will work for your case as well, regardless of whether the underlying cause is the same (the monitor being deleted).

It would be good to confirm that the monitor is deleted in your setup as well. If it is not, then there may be another layer to this bug that I have not identified yet.
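The approach described in this comment (treat a 404 on the workflow as an already-completed deletion and keep going) can be sketched in Python. `delete_detector`, `WorkflowNotFound`, and the injected callables are hypothetical stand-ins, not the Security Analytics or Alerting code; only the not-found case is swallowed, so any other error still aborts the deletion.

```python
class WorkflowNotFound(Exception):
    """Hypothetical stand-in for Alerting's "Workflow not found." (HTTP 404) error."""


def delete_detector(detector_id, workflow_ids, delete_workflow, delete_detector_doc):
    """Delete a detector's workflows, then the detector itself.

    A workflow that is already gone is recorded as "already deleted" instead of
    aborting the whole operation; any other exception still propagates.
    """
    outcome = {}
    for workflow_id in workflow_ids:
        try:
            delete_workflow(workflow_id)
            outcome[workflow_id] = "deleted"
        except WorkflowNotFound:
            # 404 on the workflow: nothing left to delete, so keep going.
            outcome[workflow_id] = "already deleted"
    delete_detector_doc(detector_id)
    outcome[detector_id] = "deleted"
    return outcome


def missing_workflow(workflow_id):
    """Simulates the situation in this issue: the workflow no longer exists."""
    raise WorkflowNotFound(f"Workflow {workflow_id} not found")


deleted_detectors = []
print(delete_detector("detector-1", ["wuCa0YoByCThRGzPDLxU"], missing_workflow, deleted_detectors.append))
# {'wuCa0YoByCThRGzPDLxU': 'already deleted', 'detector-1': 'deleted'}
```

Here `detector-1` is a made-up detector ID; the workflow ID is the one from the original report.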

riysaxen-amzn pushed a commit to riysaxen-amzn/security-analytics that referenced this issue Mar 25, 2024
…ate as AckAlertRequest (opensearch-project#618) (opensearch-project#619)

Signed-off-by: Surya Sashank Nistala
(cherry picked from commit 2005185fcbbd36e41dc8344ab5a8c717c3043a1a)

Co-authored-by: Surya Sashank Nistala
@engechas
Collaborator

engechas commented Apr 9, 2024

Resolved by the PRs above.

@engechas engechas closed this as completed Apr 9, 2024