[Security Solution] Runtime field error catching and navigation to data view #124275

Closed
wants to merge 4 commits into from


semd
Contributor

@semd semd commented Feb 1, 2022

Summary

issue: #122990

This PR catches the runtime field error and shows a warning message that informs the user about the workaround, along with a button to navigate to the data view. The flow is as follows:

runtimeFieldError_warning_fix_script.mov

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

| Risk | Probability | Severity | Mitigation/Notes |
| --- | --- | --- | --- |
| Multiple Spaces: unexpected behavior in non-default Kibana Space. | Low | High | Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces. |
| Multiple nodes: Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. | High | Low | Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure. |
| Code should gracefully handle cases when feature X or plugin Y are disabled. | Medium | High | Unit tests will verify that any feature flag or plugin combination still results in our service operational. |
See more potential risk examples

For maintainers

@kibana-ci
Collaborator

kibana-ci commented Feb 2, 2022

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering in pinned tab it should have the attributes isDraggable to be false when timelineId !== "active" and activeTab === "pinned"
  • [job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering in pinned tab it should have the attributes isDraggable to be false when timelineId === "active" and activeTab === "pinned"
  • [job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering it should have the attributes isDraggable to be false when timelineId !== "active" and activeTab === "query"
  • [job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering it should have the attributes isDraggable to be true when timelineId === "active" and activeTab === "query"
  • [job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering it should render the Event Details Panel when the panelView is set and the associated params are set
  • [job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering it should render the Event Details view of the Details Panel in the flyout when the panelView is eventDetail and the eventId is set
  • [job] [logs] Jest Tests #2 / Network Details it renders ipv6 headline
  • [job] [logs] Jest Tests #5 / StatefulTimeline it add attribute data-timeline-id in securitySolutionTimeline__container
  • [job] [logs] Jest Tests #5 / StatefulTimeline on create timeline and timeline savedObjectId: null, sourcerer does not update timeline
  • [job] [logs] Jest Tests #5 / StatefulTimeline renders
  • [job] [logs] Jest Tests #5 / StatefulTimeline sourcerer data view updates and timeline already matches the data view, no updates
  • [job] [logs] Jest Tests #5 / StatefulTimeline sourcerer data view updates, update timeline data view
  • [job] [logs] Jest Tests #5 / useTimelineEvents Correlation pagination is calling search strategy when switching page
  • [job] [logs] Jest Tests #5 / useTimelineEvents happy path query
  • [job] [logs] Jest Tests #5 / useTimelineEvents init
  • [job] [logs] Jest Tests #5 / useTimelineEvents Mock cache for active timeline when switching page
  • [job] [logs] Jest Tests #5 / useTimelineLastEventTime should call search strategy
  • [job] [logs] Jest Tests #5 / useTimelineLastEventTime should init
  • [job] [logs] Jest Tests #5 / useTimelineLastEventTime should set response

Metrics [docs]

Module Count

Fewer modules lead to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 2856 | 2857 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 4.6MB | 4.6MB | +1.7KB |
| timelines | 226.6KB | 228.0KB | +1.4KB |
| total | | | +3.2KB |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| securitySolution | 245.7KB | 245.9KB | +175.0B |
| timelines | 136.5KB | 136.6KB | +139.0B |
| total | | | +314.0B |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@sebelga
Contributor

sebelga commented Feb 2, 2022

It seems that a better UX would be to replace the "Manage Data view" button with an "Edit field" button that opens the runtime field flyout directly inside Security > Alert. WDYT?

@semd
Contributor Author

semd commented Feb 2, 2022

Hi @sebelga,
Yes, we considered that solution. The problem is that the error trace does not include the fieldName of the runtime field that caused the error:

{
  "message": "status_exception",
  "statusCode": 400,
  "attributes": {
    "type": "status_exception",
    "reason": "error while executing search",
    "caused_by": {
      "type": "search_phase_execution_exception",
      "reason": "all shards failed",
      "phase": "query",
      "grouped": true,
      "failed_shards": [
        {
          "shard": 0,
          "index": ".internal.alerts-security.alerts-default-000001",
          "node": "DZnBO2YJTWO6YtjrV3NMmA",
          "reason": {
            "type": "script_exception",
            "reason": "runtime error",
            "script_stack": [
              "org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:568)",
              "org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:584)",
              "emit(doc['process.name'].value + ' is uncool')",
              "                        ^---- HERE"
            ],
            "script": "emit(doc['process.name'].value + ' is uncool')",
            "lang": "painless",
            "position": {
              "offset": 24,
              "start": 0,
              "end": 46
            },
            "caused_by": {
              "type": "illegal_state_exception",
              "reason": "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
            }
          }
        }
      ]
    }
  }
}

At most, we could tell the user which script caused the error.
That said, we are holding off on this PR because another one is implementing a similar solution:
#91346
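
As a rough illustration of the point above, the failing script can be pulled out of the error trace even though the field name cannot. The following is a minimal sketch, not the code in this PR: the interfaces and the `extractScriptFailures` helper are hypothetical names, and they only model the properties visible in the trace shown in this comment.

```typescript
// Hypothetical types: they cover only the properties seen in the error trace above.
interface ShardFailureReason {
  type: string;
  reason?: string;
  script?: string;
  lang?: string;
  caused_by?: { type: string; reason: string };
}

interface ShardFailure {
  shard: number;
  index: string;
  node?: string;
  reason: ShardFailureReason;
}

interface SearchErrorResponse {
  message?: string;
  statusCode?: number;
  attributes?: {
    caused_by?: {
      failed_shards?: ShardFailure[];
    };
  };
}

interface ScriptFailure {
  index: string;
  script: string;
  cause: string | undefined;
}

// Collects one entry per distinct (index, script) pair among the failed shards.
// Shard failures that are not script exceptions are ignored.
function extractScriptFailures(error: SearchErrorResponse): ScriptFailure[] {
  const shards = error.attributes?.caused_by?.failed_shards ?? [];
  const seen = new Set<string>();
  const failures: ScriptFailure[] = [];

  for (const { index, reason } of shards) {
    if (reason.type !== 'script_exception' || reason.script === undefined) {
      continue;
    }
    const key = `${index}\u0000${reason.script}`;
    if (seen.has(key)) {
      continue; // the same script usually fails on every shard of an index
    }
    seen.add(key);
    failures.push({ index, script: reason.script, cause: reason.caused_by?.reason });
  }
  return failures;
}
```

A warning message could then list each returned `script` next to its `index`, which is the most the UI can show until the response also carries the runtime field name.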

@sebelga
Contributor

sebelga commented Feb 2, 2022

I think it is worth bringing this to the ES team and asking them to return the failing runtime field in the error. This would be a huge improvement to the UX. 👍

@semd semd closed this Feb 10, 2022
@semd
Contributor Author

semd commented Feb 10, 2022

Solved in #125178
