Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RAC] Alert document updates should not spread results of a fields API query #113003

Closed
afgomez opened this issue Sep 23, 2021 · 16 comments · Fixed by #118245
Closed

[RAC] Alert document updates should not spread results of a fields API query #113003

afgomez opened this issue Sep 23, 2021 · 16 comments · Fixed by #118245
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:RAC label obsolete Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge" Theme: rac label obsolete v8.0.0

Comments

@afgomez
Copy link
Contributor

afgomez commented Sep 23, 2021

Summary

Right now to update an alert document, we first read the document from the index using the fields API, make the changes we need, and write everything back.

The fields API doesn't necessarily return what was indexed via the _source.

Consider this simplified example:
# Create the index
PUT /test
{
  "mappings": {
    "properties": {    
      "foo": { "type": "object", "enabled": "false" }
    }
  }
}

# Save a document
POST /test/_doc/1
{
  "foo": {
    "nodeType": "host",
    "criteria": [{ "comparator": "GT", "threshold": 0.1 }, { "comparator": "LT", "threshold": 0.5 }]
  }
}


# Retrieve the document using fields
GET /test/_search
{
  "_source": false,
  # Note the `include_unmapped` here
  "fields": [{ "field": "*", "include_unmapped": "true"}]
}


# ...
{
  ...,
  "hits" : {
    ...,
    "hits" : [
      {
        ...,
        "fields" : {
          "foo.criteria.threshold" : [
            0.1,
            0.5
          ],
          "foo.nodeType" : [
            "host"
          ],
          "foo.criteria.comparator" : [
            "GT",
            "LT"
          ]
        }
      }
    ]
  }
}

As you can see the fields object has no resemblance with the original object in the _source. This is how ES works and it's expected. The potential problem arises when we write back the fields as _source in the document. What happens is that the document is updated with what's in the fields, and not with what the original "_source" looked like.

POST /test/_doc/1
{
  "foo.criteria.threshold" : [
    0.1,
    0.5
  ],
  "foo.nodeType" : [
    "host"
  ],
  "foo.criteria.comparator" : [
    "GT",
    "LT"
  ]
}

Another problem is that if the mapping contains any runtime fields, field aliases, etc. that are not in the _source, they will be returned by the fields API and later saved in _source. This is not something we explicitly intend to do, as far as I'm aware.

Plan for fixing this bug

  1. We can't do this the way we are currently doing it where we spread the results of the Fields API-driven query back into the document. This will produce unintended results in the documents for aliased fields, runtime fields, and certain field types like object which come back from the fields API in a different format than they are intended to be indexed.

  2. We should read from _source for the unmapped fields that we want to read, as well as any object field. "include_unmapped" will likely go away at some point, it sounds like.

  3. We have three options to solve the update issue, listed here from (imo) best to worst:

    1. The Elasticsearch update API would allow us to let the GET + PUT logic be handled on the ES nodes. We would likely need to write the update script in Painless, though.
    2. Create a new object where updates are applied, and only apply those updates back to the document. Don't attempt to spread the entire document back into the update object and do a full document update/replace.
    3. Query from _source and from the fields API, then spread the _source result back onto the full document to do a full document update/replace.

I think we should focus on looking into (i) or (ii) and avoid (iii) if possible.

AC:

  • List all updates we're currently doing to alert documents (even if some of them only happen conditionally)
  • Remove "include_unmapped" and use _source for any object/unmapped fields we need to access
  • Look into options (i) and (ii) above and decide on what's best (with the possibility that we are missing context that makes (iii) desirable, in which case we'll need to read in from _source and the fields API.

Note: The security solution appears to have logic in place (I'm not sure exactly where it's used at the moment) that attempts to merge the Fields API response with the _source response and then write the Fields version back to _source on purpose. I don't see an advantage to that for our use case right now but we should make sure we understand the purpose of doing that before making a final decision on which approach to use here.

@afgomez afgomez added the Feature:RAC label obsolete label Sep 23, 2021
@botelastic botelastic bot added the needs-team Issues missing a team label label Sep 23, 2021
@afgomez afgomez added bug Fixes for quality problems that affect the customer experience and removed needs-team Issues missing a team label labels Sep 23, 2021
@botelastic botelastic bot added the needs-team Issues missing a team label label Sep 23, 2021
@marshallmain
Copy link
Contributor

On the security solution side we've implemented "best-effort" merging of fields into _source which may be useful here. See implementation and readme.

This strategy tries to get the best of both worlds, so we can retain any runtime/const keyword fields that only show up in fields but also retain the original structure of _source.

@marshallmain
Copy link
Contributor

cc @FrankHassanabad implemented it and may have more thoughts.

@timroes
Copy link
Contributor

timroes commented Sep 27, 2021

I am curious: Why are we reading the document in general via fields API, when we want to use it as _source to index again? Wouldn't it make more sense reading the _source here directly?

Sorry just noticed this was already answered in another comment. That we're using runtime fields inside alerting somehow. I change my question then to: When you read from fields you'll get the runtime fields back and will now index them. Is that a desired behavior for those fields?

@afgomez
Copy link
Contributor Author

afgomez commented Sep 27, 2021

@timroes my guess is that we want to also read fields not present in the source, like runtime fields and such.

@jasonrhodes
Copy link
Member

When you read from fields you'll get the runtime fields back and will now index them. Is that a desired behavior for those fields?

No, this is not desired. That's the bug we've identified with this approach here in this ticket.

@jasonrhodes
Copy link
Member

jasonrhodes commented Sep 27, 2021

I've just brought this up with the Elasticsearch "Fix It" bi-weekly meeting and had a discussion about options. I think there are a few things here:

  1. We can't do this the way we are currently doing it where we spread the results of the Fields API-driven query back into the document. This will produce unintended results in the documents for aliased fields, runtime fields, and certain field types like object which come back from the fields API in a different format than they are intended to be indexed.
  2. We should read from _source for the unmapped fields that we want to read, as well as any object field. "include_unmapped" will likely go away at some point, it sounds like.
  3. We have three options to solve the update issue, listed here from best to worst:
    1. The Elasticsearch update API would allow us to let the GET + PUT logic be handled on the ES nodes. We would likely need to write the update script in Painless, though.
    2. Create a new object where updates are applied, and only apply those updates back to the document. Don't attempt to spread the entire document back into the update object and do a full document update/replace.
    3. Query from _source and from the fields API, then spread the _source result back onto the full document to do a full document update/replace.

I think we should focus on looking into (i) or (ii) and avoid (iii) if possible.

@marshallmain are there reasons I'm not aware of that make (iii) more preferable (the full document update)?

@jasonrhodes jasonrhodes added Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services Theme: rac label obsolete labels Sep 27, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Sep 27, 2021
@jasonrhodes jasonrhodes added needs-team Issues missing a team label v7.16.0 labels Sep 27, 2021
@botelastic botelastic bot removed the needs-team Issues missing a team label label Sep 27, 2021
@jasonrhodes jasonrhodes changed the title [RAC] Potential problem when updating an alerting document [RAC] Alert document updates should not spread results of a fields API query Sep 27, 2021
@jasonrhodes
Copy link
Member

I'd like to first understand all/any updates we are currently making to the alert document, so we can decide which of these options is feasible for us. We can change how this works later if the updates become more complicated, but for now, if we can simply use a painless script to have this happen on the ES node, I think it would have a lot of potential advantages.

@marshallmain
Copy link
Contributor

The Elasticsearch update API would allow us to let the GET + PUT logic be handled on the ES nodes. We would likely need to write the update script in Painless, though.

This sounds fine to me.

When you read from fields you'll get the runtime fields back and will now index them. Is that a desired behavior for those fields?

For the security solution where we're creating a new document and copying over the fields from a source index that we don't control, yes, this is desired. We want to ensure that runtime/const_keyword/aliased fields from the source index can be queried in the alerts as well. But it seems that the desired behavior is different here.

@afgomez
Copy link
Contributor Author

afgomez commented Sep 29, 2021

For the security solution where we're creating a new document and copying over the fields from a source index that we don't control, yes, this is desired.

If I understand correctly, what happens there is that we get the runtime fields from the source indices (i.e. foobar*) and store them as regular fields in the .alert* indices, correct?

In our scenario, we fetch the fields from the .alert* indices and store them back as regular fields in the same document. This might cause problems if we ever add a runtime field or other virtual fields to the .alert* indices (which is what we want to prevent).

@weltenwort
Copy link
Member

weltenwort commented Oct 4, 2021

Looks like using the _update API with partial updates would be safest and doesn't require painless. It's not apparent to me whether reading from _source might still be safer than the fields API in this case or not. Until we're actually making use of runtime fields it might be the choice least likely to mess up values while writing back previously read values. 🤔

@jasonrhodes
Copy link
Member

@weltenwort I think I agree on both points here unless anyone of us knows of any reasons to do it differently.

@afgomez
Copy link
Contributor Author

afgomez commented Oct 11, 2021

It's not apparent to me whether reading from _source might still be safer than the fields API in this case or not

It's safer for data that exists already in the source document, since ES doesn't do any transformations in the source itself.


This next bit is a bit stream of consciousness, so forgive me if it feels disconnected

I wonder if we need to use the fields API in the executor at all. Is there a reason to read from the fields API right now? Does it do something for us that reading from the source doesn't or should we apply YAGNI?

Right now we get all fields using the fields API and transform the result into something we can consume. If we query the _source it would already be in a consumable form.

I understand that we want to read possible runtime fields, but even then do we expect to need them in the executor or is that more for the UI queries (e.g to perform searches, to show the values on the table, etc.)

Do we expect the possible future runtime fields to be arbitrary or would we know which fields exist? If that's the case would it make sense to query the source and use the fields API for those specific fields? Is there a way to only a certain type of field using the fields API? (e.g. fields: [{ field: '*', type: 'runtime' }]).

@jasonrhodes jasonrhodes added Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge" v8.0.0 and removed Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.16.0 labels Oct 29, 2021
@simianhacker simianhacker self-assigned this Nov 2, 2021
@simianhacker
Copy link
Member

simianhacker commented Nov 2, 2021

I'm just digging into this so please bear with me...

It looks like the field spread is also problematic for the creation of the new documents as well, correct?

I'm assuming we really only need to update these fields:

[ALERT_DURATION]: (options.startedAt.getTime() - new Date(started).getTime()) * 1000,
[ALERT_INSTANCE_ID]: alertId,
[ALERT_START]: started,
[ALERT_UUID]: alertUuid,
[ALERT_STATUS]: isRecovered ? ALERT_STATUS_RECOVERED : ALERT_STATUS_ACTIVE,
[ALERT_WORKFLOW_STATUS]: alertData?.fields[ALERT_WORKFLOW_STATUS] ?? 'open',
[EVENT_KIND]: 'signal',
[EVENT_ACTION]: isNew ? 'open' : isActive ? 'active' : 'close',
[VERSION]: ruleDataClient.kibanaVersion,
...(isRecovered ? { [ALERT_END]: commonRuleFields[TIMESTAMP] } : {}),

Update: I guess we have to get the _source of the alert too in order to keep them in sync. For example, if someone updates an the name of the alert, we would need to update that too since it changed.

@mgiota
Copy link
Contributor

mgiota commented Nov 3, 2021

@simianhacker A bit of a history regarding this ticket. @afgomez wanted to store the rule params that triggered the alert and he explored storing it as an object but that caused lots of headaches when querying the field using the ES fields API.

In the end he used an alternative for the problem he was trying to solve. He stored the params as a keyword with index: false that contains the serialized params object. More information #113429.

So I am wondering do we actually need to fix this issue, as we used an alternative solution in the end? @jasonrhodes

@weltenwort
Copy link
Member

@simianhacker for new documents there won't be predecessor documents whose fields will be flattened incorrectly due to the usage of the fields API. But in the case of updating existing documents the fields API will cause problems, so I'd agree that we should probably use _source instead:

_source: false,
fields: [{ field: '*', include_unmapped: true }],

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience Feature:RAC label obsolete Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge" Theme: rac label obsolete v8.0.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants