[Security Solution] Runtime field error catching and navigation to data view #124275

semd · 2022-02-01T17:04:27Z

Summary

This PR catches runtime field errors and shows a warning message that informs the user about the workaround, along with a button to navigate to the data view. The flow is as follows:

runtimeFieldError_warning_fix_script.mov
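
A rough sketch of the flow described in the summary, not the code in this PR. It assumes the error shape quoted later in this thread, and every name in it (`SearchError`, `WarningToast`, `isRuntimeFieldError`, `toRuntimeFieldWarning`, the toast texts) is hypothetical:

```typescript
// Hypothetical shapes: none of these names come from the PR itself.
interface ShardFailure {
  reason?: { type?: string };
}

interface SearchError {
  attributes?: { caused_by?: { failed_shards?: ShardFailure[] } };
}

interface WarningToast {
  title: string;
  text: string;
  actionLabel: string;
  onAction: () => void;
}

// Assumption: any shard failure of type "script_exception" is treated as a
// runtime field error. Other kinds of script failure would match as well.
function isRuntimeFieldError(error: SearchError): boolean {
  const shards = error.attributes?.caused_by?.failed_shards ?? [];
  return shards.some((shard) => shard.reason?.type === 'script_exception');
}

// Returns a toast description for runtime field errors, or undefined so the
// caller can fall back to its regular error handling.
function toRuntimeFieldWarning(
  error: SearchError,
  navigateToDataView: () => void
): WarningToast | undefined {
  if (!isRuntimeFieldError(error)) {
    return undefined;
  }
  return {
    title: 'Runtime field error',
    text: 'A runtime field script failed. Fix the script in the data view to restore results.',
    actionLabel: 'Manage data view',
    onAction: navigateToDataView,
  };
}
```

Returning a plain description instead of calling a toast service directly keeps the detection logic testable without rendering any UI.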

Checklist

Delete any items that are not applicable to this PR.

Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
Documentation was added for features that require explanation or tutorials
Unit or functional tests were updated or added to match the most common scenarios
Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)
Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)
If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)
This was checked for cross-browser compatibility

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk	Probability	Severity	Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space.	Low	High	Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks.	High	Low	Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled.	Medium	High	Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

This was checked for breaking API changes and was labeled appropriately

kibana-ci · 2022-02-02T09:51:58Z

💔 Build Failed

Failed CI Steps

Test Failures

[job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering in pinned tab it should have the attributes isDraggable to be false when timelineId !== "active" and activeTab === "pinned"
[job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering in pinned tab it should have the attributes isDraggable to be false when timelineId === "active" and activeTab === "pinned"
[job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering it should have the attributes isDraggable to be false when timelineId !== "active" and activeTab === "query"
[job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering it should have the attributes isDraggable to be true when timelineId === "active" and activeTab === "query"
[job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering it should render the Event Details Panel when the panelView is set and the associated params are set
[job] [logs] Jest Tests #5 / Details Panel Component DetailsPanel:EventDetails: rendering it should render the Event Details view of the Details Panel in the flyout when the panelView is eventDetail and the eventId is set
[job] [logs] Jest Tests #2 / Network Details it renders ipv6 headline
[job] [logs] Jest Tests #5 / StatefulTimeline it add attribute data-timeline-id in securitySolutionTimeline__container
[job] [logs] Jest Tests #5 / StatefulTimeline on create timeline and timeline savedObjectId: null, sourcerer does not update timeline
[job] [logs] Jest Tests #5 / StatefulTimeline renders
[job] [logs] Jest Tests #5 / StatefulTimeline sourcerer data view updates and timeline already matches the data view, no updates
[job] [logs] Jest Tests #5 / StatefulTimeline sourcerer data view updates, update timeline data view
[job] [logs] Jest Tests #5 / useTimelineEvents Correlation pagination is calling search strategy when switching page
[job] [logs] Jest Tests #5 / useTimelineEvents happy path query
[job] [logs] Jest Tests #5 / useTimelineEvents init
[job] [logs] Jest Tests #5 / useTimelineEvents Mock cache for active timeline when switching page
[job] [logs] Jest Tests #5 / useTimelineLastEventTime should call search strategy
[job] [logs] Jest Tests #5 / useTimelineLastEventTime should init
[job] [logs] Jest Tests #5 / useTimelineLastEventTime should set response

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`securitySolution`	2856	2857	+1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`securitySolution`	4.6MB	4.6MB	+1.7KB
`timelines`	226.6KB	228.0KB	+1.4KB
total			+3.2KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`securitySolution`	245.7KB	245.9KB	+175.0B
`timelines`	136.5KB	136.6KB	+139.0B
total			+314.0B

History

💚 Build #21133 succeeded 4126647

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

sebelga · 2022-02-02T10:29:24Z

It seems that a better UX would be to replace the "Manage Data view" button with an "Edit field" button that opens the runtime field flyout directly inside Security > Alert. WDYT?

semd · 2022-02-02T11:30:58Z

Hi @sebelga,
Yes, we considered that solution. The problem is that the error trace does not include the fieldName of the runtime field that caused the error:

{
  "message": "status_exception",
  "statusCode": 400,
  "attributes": {
    "type": "status_exception",
    "reason": "error while executing search",
    "caused_by": {
      "type": "search_phase_execution_exception",
      "reason": "all shards failed",
      "phase": "query",
      "grouped": true,
      "failed_shards": [
        {
          "shard": 0,
          "index": ".internal.alerts-security.alerts-default-000001",
          "node": "DZnBO2YJTWO6YtjrV3NMmA",
          "reason": {
            "type": "script_exception",
            "reason": "runtime error",
            "script_stack": [
              "org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:568)",
              "org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:584)",
              "emit(doc['process.name'].value + ' is uncool')",
              "                        ^---- HERE"
            ],
            "script": "emit(doc['process.name'].value + ' is uncool')",
            "lang": "painless",
            "position": {
              "offset": 24,
              "start": 0,
              "end": 46
            },
            "caused_by": {
              "type": "illegal_state_exception",
              "reason": "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
            }
          }
        }
      ]
    }
  }
}

At most, we could inform the user about the script that caused the error.
That said, we are holding off on this PR since another one is implementing a similar solution:
#91346
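
For illustration, pulling the failing script out of a response shaped like the one above could look roughly like this. It is a sketch under assumptions, not code from this PR: the interfaces type only the fields that are read, and `getFailedScripts` and `FailedScript` are hypothetical names.

```typescript
// Subset of the error response above; only the fields read below are typed.
interface ScriptFailureReason {
  type: string;
  script?: string;
  position?: { offset: number; start: number; end: number };
  caused_by?: { reason?: string };
}

interface SearchErrorResponse {
  attributes?: {
    caused_by?: {
      failed_shards?: Array<{ reason?: ScriptFailureReason }>;
    };
  };
}

// Hypothetical result type: what could be shown to the user.
interface FailedScript {
  script: string;
  offset?: number; // character offset of the failure inside the script
  cause?: string; // innermost "caused_by.reason", when present
}

// Collects one entry per distinct failing script across all failed shards.
function getFailedScripts(error: SearchErrorResponse): FailedScript[] {
  const result: FailedScript[] = [];
  for (const shard of error.attributes?.caused_by?.failed_shards ?? []) {
    const reason = shard.reason;
    if (!reason || reason.type !== 'script_exception' || reason.script === undefined) {
      continue;
    }
    const script = reason.script;
    // Several shards usually fail on the same script; report it once.
    if (result.some((entry) => entry.script === script)) {
      continue;
    }
    result.push({
      script,
      offset: reason.position?.offset,
      cause: reason.caused_by?.reason,
    });
  }
  return result;
}
```

This still only identifies the script source, not the runtime field name, which is the limitation described above.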

sebelga · 2022-02-02T13:38:26Z

I think it is worth bringing this to the ES team and asking them to return the runtime field that failed in the error. This would be a huge improvement to the UX. 👍

semd · 2022-02-10T17:01:53Z

Solved here: #125178

semd added 3 commits January 28, 2022 18:12

show reset sort button warning toast on runtime field error

503c6fd

fix toastLifeTimeMs and text updated

601af47

error warning button navigates to manage dataView

4126647

stephmilovic mentioned this pull request Feb 1, 2022

[Security Solution] User is able to create invalid script custom field causing alert page break with 404 error #122117

Closed

catch runtime field error on security solution queries

ca0dac9

semd closed this Feb 10, 2022
