Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REROUTE processor doesn't work for 2nd and later instances of the same integration and drops documents for theese instances. #111828

Closed
SergeyDrachuk opened this issue Aug 13, 2024 · 13 comments
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team

Comments

@SergeyDrachuk
Copy link

SergeyDrachuk commented Aug 13, 2024

Elasticsearch Version

8.15.0, Build: deb/1a77947f34deddb41af25e6f0ddb8e830159c179/2024-08-05T10:05:34.233336849Z, JVM: 22.0.1

Installed Plugins

No response

Java Version

bundled

OS Version

6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux

Problem Description

Trying to split existing integration datastream ("System" integration v1.60.2) in the ingest pipeline logs-system.auth@custom
with "REROUTE" processor.

[
  {
    "reroute": {
      "dataset": [
        "{{data_stream.dataset}}"
      ],
      "namespace": [
        "1y.nbu"
      ]
    }
  }
]

The idea is to keep critical docs (logs) longer then usual docs (logs).

This is a working configuration but only for the first agent policy to which the integration was added.
For example docs routed to the index .ds-logs-system.auth-1y.nbu-2024.08.05-000004

If the integration will be added to other agent policies with different names, the documents from such integrations will be completely lost. If i remove the reroute processor from the ingest pipeline the docs stop getting lost and all ingest pipeline works as expected (System integration documents are saved to the datastream from the all agent policies).
This behavior was observed in previous elasticsearch versions as well.

Steps to Reproduce

Install System integration
Add "Reroute" processor in the ingest pipeline "logs-system.auth@custom":
[
{
"reroute": {
"dataset": [
"{{data_stream.dataset}}"
],
"namespace": [
"1y.nbu"
]
}
}
]
Add "System" integration to more then one agent policy name.
Check logs from hosts in the different agent policies.

Logs (if relevant)

No response

@SergeyDrachuk SergeyDrachuk added >bug needs:triage Requires assignment of a team area label labels Aug 13, 2024
@gbanasiak
Copy link
Contributor

@SergeyDrachuk This is an Elasticsearch bug report but you're referring to Elastic Agent integrations, so it's not clear where the problem is, actually. Is ES misbehaving, or Elastic Agent integrations introduce an unexpected ES configuration? Can you provide reproduction steps without Elastic Agent involved, or at least check the state of ingest pipelines on each Elastic Agent policy interaction?

@gbanasiak gbanasiak added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Aug 14, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Aug 14, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Aug 14, 2024
@SergeyDrachuk
Copy link
Author

SergeyDrachuk commented Aug 14, 2024

@gbanasiak
We expected the doc index .ds-logs-system.auth-default (assigned in first integration ingest pipeline logs-system.auth-1.60.2)
would change its name to something like .ds-logs-system.auth-1y.nbu-xxxx
after treatment in the next ingest pipeline "logs-system.auth@custom" with "Reroute" processor (described above).
image

But pipeline simulation shows the new name will be precisely
.ds-system.auth-1y.nbu

Data IN:

{
  "_id": "YxtLQIoBvMlmAQX6MVtg",
  "_index": ".ds-logs-system.auth-default-2023.01.04-000058",
  "_source": {
    "agent": {
      "name": "FLEET-MASTER-A",
      "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
      "ephemeral_id": "15b46ae1-d892-4d19-8e11-5b9106920953",
      "type": "filebeat",
      "version": "8.9.0"
    },
    "process": {
      "name": "sshd",
      "pid": 5011
    },
    "log": {
      "file": {
        "path": "/var/log/auth.log.1"
      },
      "offset": 0
    },
    "elastic_agent": {
      "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
      "version": "8.9.0",
      "snapshot": false
    },
    "message": "Received disconnect from 10.143.10.33 port 34680:11: disconnected by user",
    "tags": [
      "system-auth"
    ],
    "input": {
      "type": "log"
    },
    "@timestamp": "2023-08-22T12:53:11.546+03:00",
    "system": {
      "auth": {}
    },
    "ecs": {
      "version": "8.0.0"
    },
    "related": {
      "hosts": [
        "SIEM-FLEET-1"
      ]
    },
    "data_stream": {
      "namespace": "default",
      "type": "logs",
      "dataset": "system.auth"
    },
    "host": {
      "hostname": "SIEM-FLEET-1",
      "os": {
        "kernel": "6.1.0-11-amd64",
        "codename": "bookworm",
        "name": "Debian GNU/Linux",
        "type": "linux",
        "family": "debian",
        "version": "12 (bookworm)",
        "platform": "debian"
      },
      "containerized": false,
      "ip": [
        "10.143.10.38"
      ],
      "name": "fleet-master-a",
      "id": "0cb615e4592b487e87233f92553f13d0",
      "mac": [
        "00-50-56-9B-05-32"
      ],
      "architecture": "x86_64"
    },
    "event": {
      "agent_id_status": "verified",
      "ingested": "2023-08-29T07:56:24Z",
      "timezone": "+03:00",
      "kind": "event",
      "dataset": "system.auth"
    }
  }
}

Data out

{
  "_index": ".ds-system.auth-1y.nbu",
  "_id": "YxtLQIoBvMlmAQX6MVtg",
  "_version": "-3",
  "_source": {
    "agent": {
      "name": "FLEET-MASTER-A",
      "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
      "ephemeral_id": "15b46ae1-d892-4d19-8e11-5b9106920953",
      "type": "filebeat",
      "version": "8.9.0"
    },
    "process": {
      "name": "sshd",
      "pid": 5011
    },
    "log": {
      "offset": 0,
      "file": {
        "path": "/var/log/auth.log.1"
      }
    },
    "elastic_agent": {
      "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
      "version": "8.9.0",
      "snapshot": false
    },
    "message": "Received disconnect from 10.143.10.33 port 34680:11: disconnected by user",
    "tags": [
      "system-auth"
    ],
    "input": {
      "type": "log"
    },
    "@timestamp": "2023-08-22T12:53:11.546+03:00",
    "system": {
      "auth": {}
    },
    "ecs": {
      "version": "8.0.0"
    },
    "related": {
      "hosts": [
        "SIEM-FLEET-1"
      ]
    },
    "data_stream": {
      "namespace": "1y.nbu",
      "type": ".ds",
      "dataset": "system.auth"
    },
    "host": {
      "hostname": "SIEM-FLEET-1",
      "os": {
        "kernel": "6.1.0-11-amd64",
        "codename": "bookworm",
        "name": "Debian GNU/Linux",
        "type": "linux",
        "family": "debian",
        "version": "12 (bookworm)",
        "platform": "debian"
      },
      "containerized": false,
      "ip": [
        "10.143.10.38"
      ],
      "name": "fleet-master-a",
      "id": "0cb615e4592b487e87233f92553f13d0",
      "mac": [
        "00-50-56-9B-05-32"
      ],
      "architecture": "x86_64"
    },
    "event": {
      "agent_id_status": "verified",
      "ingested": "2023-08-29T07:56:24Z",
      "dataset": "system.auth",
      "timezone": "+03:00",
      "kind": "event"
    }
  },
  "_ingest": {
    "pipeline": "_simulate_pipeline",
    "timestamp": "2024-08-14T10:11:45.761Z"
  }
}

Does it work as expected?
Is a new datastream created?
How to determine the name of it?

@gbanasiak
Copy link
Contributor

Thank you for additional data.

I have now realized the mechanism described in https://www.elastic.co/guide/en/elasticsearch/reference/8.15/ingest.html#pipelines-for-fleet-elastic-agent is used here. Specifically, Fleet-managed logs-system.auth-1.60.2 ingest pipeline calls logs-system.auth@custom pipeline, if it exists.

logs-system.auth-1.60.2 and other Fleet-managed pipelines contain the following pipeline processors at the end:

[..]
      {
        "pipeline": {
          "name": "global@custom",
          "ignore_missing_pipeline": true,
          "description": "[Fleet] Global pipeline for all data streams"
        }
      },
      {
        "pipeline": {
          "name": "logs@custom",
          "ignore_missing_pipeline": true,
          "description": "[Fleet] Pipeline for all data streams of type `logs`"
        }
      },
      {
        "pipeline": {
          "name": "logs-system.integration@custom",
          "ignore_missing_pipeline": true,
          "description": "[Fleet] Pipeline for all data streams of type `logs` defined by the `system` integration"
        }
      },
      {
        "pipeline": {
          "name": "logs-system.auth@custom", <--- HERE
          "ignore_missing_pipeline": true,
          "description": "[Fleet] Pipeline for the `system.auth` dataset"
        }
      }
[..]

Going back to last update it looks like ingest pipeline simulation does not produce the expected output as type is not preserved, and index name is incorrect:

[..]
  "_index": ".ds-system.auth-1y.nbu",
[..]
    "data_stream": {
      "namespace": "1y.nbu",
      "type": ".ds", <--- HERE
      "dataset": "system.auth"
    },

Reproduction steps in 8.15.0:

PUT _ingest/pipeline/logs-system.auth@custom
{
  "processors": [
    {
      "reroute": {
        "dataset": [
          "{{data_stream.dataset}}"
        ],
        "namespace": [
          "test"
        ]
      }
    }
  ]
}

POST /_ingest/pipeline/logs-system.auth@custom/_simulate
{
  "docs": [
    {
      "_id": "YxtLQIoBvMlmAQX6MVtg",
      "_index": ".ds-logs-system.auth-default-2023.01.04-000058",
      "_source": {
        "agent": {
          "name": "FLEET-MASTER-A",
          "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
          "ephemeral_id": "15b46ae1-d892-4d19-8e11-5b9106920953",
          "type": "filebeat",
          "version": "8.9.0"
        },
        "process": {
          "name": "sshd",
          "pid": 5011
        },
        "log": {
          "file": {
            "path": "/var/log/auth.log.1"
          },
          "offset": 0
        },
        "elastic_agent": {
          "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
          "version": "8.9.0",
          "snapshot": false
        },
        "message": "Received disconnect from 10.143.10.33 port 34680:11: disconnected by user",
        "tags": [
          "system-auth"
        ],
        "input": {
          "type": "log"
        },
        "@timestamp": "2023-08-22T12:53:11.546+03:00",
        "system": {
          "auth": {}
        },
        "ecs": {
          "version": "8.0.0"
        },
        "related": {
          "hosts": [
            "SIEM-FLEET-1"
          ]
        },
        "data_stream": {
          "namespace": "default",
          "type": "logs",
          "dataset": "system.auth"
        },
        "host": {
          "hostname": "SIEM-FLEET-1",
          "os": {
            "kernel": "6.1.0-11-amd64",
            "codename": "bookworm",
            "name": "Debian GNU/Linux",
            "type": "linux",
            "family": "debian",
            "version": "12 (bookworm)",
            "platform": "debian"
          },
          "containerized": false,
          "ip": [
            "10.143.10.38"
          ],
          "name": "fleet-master-a",
          "id": "0cb615e4592b487e87233f92553f13d0",
          "mac": [
            "00-50-56-9B-05-32"
          ],
          "architecture": "x86_64"
        },
        "event": {
          "agent_id_status": "verified",
          "ingested": "2023-08-29T07:56:24Z",
          "timezone": "+03:00",
          "kind": "event",
          "dataset": "system.auth"
        }
      }
    }
  ]
}

# output
[..]
        "_index": ".ds-system.auth-test",
[..]
          "data_stream": {
            "namespace": "test",
            "type": ".ds",
            "dataset": "system.auth"
          },
[..]

But this might only be simulation thing as the following test does work correctly:

PUT logs-system.auth-default/_bulk?pipeline=logs-system.auth@custom
{"create":{}}
{"agent":{"name":"FLEET-MASTER-A","id":"42d17a41-bb3b-402f-b687-5fca2426fbae","ephemeral_id":"15b46ae1-d892-4d19-8e11-5b9106920953","type":"filebeat","version":"8.9.0"},"process":{"name":"sshd","pid":5011},"log":{"file":{"path":"/var/log/auth.log.1"},"offset":0},"elastic_agent":{"id":"42d17a41-bb3b-402f-b687-5fca2426fbae","version":"8.9.0","snapshot":false},"message":"Received disconnect from 10.143.10.33 port 34680:11: disconnected by user","tags":["system-auth"],"input":{"type":"log"},"@timestamp":"2023-08-22T12:53:11.546+03:00","system":{"auth":{}},"ecs":{"version":"8.0.0"},"related":{"hosts":["SIEM-FLEET-1"]},"data_stream":{"namespace":"default","type":"logs","dataset":"system.auth"},"host":{"hostname":"SIEM-FLEET-1","os":{"kernel":"6.1.0-11-amd64","codename":"bookworm","name":"Debian GNU/Linux","type":"linux","family":"debian","version":"12 (bookworm)","platform":"debian"},"containerized":false,"ip":["10.143.10.38"],"name":"fleet-master-a","id":"0cb615e4592b487e87233f92553f13d0","mac":["00-50-56-9B-05-32"],"architecture":"x86_64"},"event":{"agent_id_status":"verified","ingested":"2023-08-29T07:56:24Z","timezone":"+03:00","kind":"event","dataset":"system.auth"}}

# result
{
  "errors": false,
  "took": 24087811880,
  "ingest_took": 7,
  "items": [
    {
      "create": {
        "_index": ".ds-logs-system.auth-test-2024.08.14-000001", <---- HERE
        "_id": "_PsPUZEBsRUWctdqcdH3",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}

However, this test with the payload specified above does not:

PUT logs-system.auth-default/_bulk
[..]

But that's due to a failure earlier in the logs-system.auth-1.60.2 pipeline: "Processor conditional with tag grok-message-header in pipeline logs-system.auth-1.60.2 failed with message: Provided Grok expressions do not match field value: [Received disconnect from 10.143.10.33 port 34680:11: disconnected by user]". When pipeline encountered an error it moved to on_failure section bypassing custom pipeline processors. This Elasticsearch behavior is expected for this definition of the pipeline I think. Perhaps pipeline should be made more robust, but that's an integration issue.

I'll let @elastic/es-data-management comment further. At this point based on the data provided I think there's a concern around ingest pipeline simulation behavior. I think we should adjust the issue title.

@masseyke
Copy link
Member

What is the expected output from the POST /_ingest/pipeline/logs-system.auth@custom/_simulate request? I assume it would be this, right?

[..]
        "_index": ".ds-logs-system.auth-test-2023.01.04-000058",
[..]
          "data_stream": {
            "namespace": "test",
            "type": "logs",
            "dataset": "system.auth"
          },
[..]

I think the problem though is not with the simulate code, but with the fact that the reroute processor assumes that if you don't give it a destination then you have given a data stream (using the data stream naming convention) as the _index field. So it's parsing .ds-logs-system.auth-test-2023.01.04-000058 as if that were a data stream name (it's a backing index behind a data stream though). Maybe we just need to add some validation to the reroute processor to blow up if the index is not a data stream?

@gbanasiak
Copy link
Contributor

I see, so the correct way of using the simulate endpoint in this context would be:

POST /_ingest/pipeline/logs-system.auth@custom/_simulate
{
  "docs": [
    {
      [..]
      "_index": "logs-system.auth-default",
      [..]
    }
  ]
}

Maybe we just need to add some validation to the reroute processor to blow up if the index is not a data stream?

According to documentation this should already be happening?

When the destination option is not set, this processor is in a data stream mode. Note that in this mode, the reroute processor can only be used on data streams that follow the data stream naming scheme. Trying to use this processor on a data stream with a non-compliant name will raise an exception.

Also, exception or rather a pipeline failure? I would expect a failure, so it can be ignored with ignore_failure and handled through on_failure?

@SergeyDrachuk
Copy link
Author

Behind the simulation tests what we expect that Reroute processor provide the possibility to assign a different index template (for example with pattern logs-system.auth-test-) and indexes like .ds-logs-system.auth-test- would have different ILM (keep time) than .ds-logs-system.auth-default-* (some index mask overlapping exists, i'm still not sure this will work).
The solution with Reroute processor (to keep docs different time from any an integration datastream) come from elasticsearch team.

@gbanasiak
Copy link
Contributor

@SergeyDrachuk

Behind the simulation tests what we expect that Reroute processor provide the possibility to assign a different index template (for example with pattern logs-system.auth-test-) and indexes like .ds-logs-system.auth-test- would have different ILM (keep time) than .ds-logs-system.auth-default-* (some index mask overlapping exists, i'm still not sure this will work).

Reroute processor can alter the destination data stream, and destination data stream settings (such as ILM policy and thus data retention) depend on a matching index template. If you follow https://www.elastic.co/guide/en/fleet/8.15/data-streams-ilm-tutorial.html you will be able to differentiate data retention setting by duplicating and modifying the default logs-system.auth index template behavior.

When it comes to this issue I still don't see where the problem is, actually. The _simulate endpoint behavior is meaningless from ingest perspective. As discussed earlier, the proper way to exercise simulate would be as follows:

POST /_ingest/pipeline/logs-system.auth-1.60.2/_simulate
{
  "docs": [
    {
      "_id": "YxtLQIoBvMlmAQX6MVtg",
      "_index": "logs-system.auth-default", <--- data stream name, not index name
      "_source": {
        "agent": {
          "name": "FLEET-MASTER-A",
          "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
          "ephemeral_id": "15b46ae1-d892-4d19-8e11-5b9106920953",
          "type": "filebeat",
          "version": "8.9.0"
        },
        "process": {
          "name": "sshd",
          "pid": 5011
        },
        "log": {
          "file": {
            "path": "/var/log/auth.log.1"
          },
          "offset": 0
        },
        "elastic_agent": {
          "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
          "version": "8.9.0",
          "snapshot": false
        },
        "message": "Received disconnect from 10.143.10.33 port 34680:11: disconnected by user",
        "tags": [
          "system-auth"
        ],
        "input": {
          "type": "log"
        },
        "@timestamp": "2023-08-22T12:53:11.546+03:00",
        "system": {
          "auth": {}
        },
        "ecs": {
          "version": "8.0.0"
        },
        "related": {
          "hosts": [
            "SIEM-FLEET-1"
          ]
        },
        "data_stream": {
          "namespace": "default",
          "type": "logs",
          "dataset": "system.auth"
        },
        "host": {
          "hostname": "SIEM-FLEET-1",
          "os": {
            "kernel": "6.1.0-11-amd64",
            "codename": "bookworm",
            "name": "Debian GNU/Linux",
            "type": "linux",
            "family": "debian",
            "version": "12 (bookworm)",
            "platform": "debian"
          },
          "containerized": false,
          "ip": [
            "10.143.10.38"
          ],
          "name": "fleet-master-a",
          "id": "0cb615e4592b487e87233f92553f13d0",
          "mac": [
            "00-50-56-9B-05-32"
          ],
          "architecture": "x86_64"
        },
        "event": {
          "agent_id_status": "verified",
          "ingested": "2023-08-29T07:56:24Z",
          "timezone": "+03:00",
          "kind": "event",
          "dataset": "system.auth"
        }
      }
    }
  ]
}

What I get in return shows that the reroute did not happen which is expected because pipeline ran into Processor conditional with tag grok-message-header in pipeline logs-system.auth-1.60.2 failed with message: Provided Grok expressions do not match field value: [Received disconnect from 10.143.10.33 port 34680:11: disconnected by user] error.

{
  "docs": [
    {
      "doc": {
        "_index": "logs-system.auth-default", <--- unchanged data stream
        "_version": "-3",
        "_id": "YxtLQIoBvMlmAQX6MVtg",
        "_source": {
          "agent": {
            "name": "FLEET-MASTER-A",
            "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
            "ephemeral_id": "15b46ae1-d892-4d19-8e11-5b9106920953",
            "type": "filebeat",
            "version": "8.9.0"
          },
          "process": {
            "name": "sshd",
            "pid": 5011
          },
          "log": {
            "offset": 0,
            "file": {
              "path": "/var/log/auth.log.1"
            }
          },
          "elastic_agent": {
            "id": "42d17a41-bb3b-402f-b687-5fca2426fbae",
            "version": "8.9.0",
            "snapshot": false
          },
          "error": {
            "message": "Processor conditional with tag grok-message-header in pipeline logs-system.auth-1.60.2 failed with message: Provided Grok expressions do not match field value: [Received disconnect from 10.143.10.33 port 34680:11: disconnected by user]"
          },
          "tags": [
            "system-auth"
          ],
          "input": {
            "type": "log"
          },
          "@timestamp": "2023-08-22T12:53:11.546+03:00",
          "system": {
            "auth": {}
          },
          "ecs": {
            "version": "8.0.0"
          },
          "related": {
            "hosts": [
              "SIEM-FLEET-1"
            ]
          },
          "data_stream": {
            "namespace": "default",
            "type": "logs",
            "dataset": "system.auth"
          },
          "host": {
            "hostname": "SIEM-FLEET-1",
            "os": {
              "kernel": "6.1.0-11-amd64",
              "codename": "bookworm",
              "name": "Debian GNU/Linux",
              "type": "linux",
              "family": "debian",
              "version": "12 (bookworm)",
              "platform": "debian"
            },
            "containerized": false,
            "ip": [
              "10.143.10.38"
            ],
            "name": "fleet-master-a",
            "id": "0cb615e4592b487e87233f92553f13d0",
            "mac": [
              "00-50-56-9B-05-32"
            ],
            "architecture": "x86_64"
          },
          "event": {
            "agent_id_status": "verified",
            "ingested": "2023-08-29T07:56:24Z",
            "original": "Received disconnect from 10.143.10.33 port 34680:11: disconnected by user",
            "timezone": "+03:00",
            "kind": "pipeline_error",
            "dataset": "system.auth"
          }
        },
        "_ingest": {
          "timestamp": "2024-09-02T12:33:38.130868589Z"
        }
      }
    }
  ]
}

ES behavior is correct here. To have a different result, the logs-system.auth-1.60.2 pipeline definition would have to be different, but that's not an ES issue but rather https://github.com/elastic/integrations issue.

I don't see how the behavior you described initially could happen, i.e.:

This is a working configuration but only for the first agent policy to which the integration was added.
If the integration will be added to other agent policies with different names, the documents from such integrations will be completely lost. If i remove the reroute processor from the ingest pipeline the docs stop getting lost and all ingest pipeline works as expected (System integration documents are saved to the datastream from the all agent policies).

Is ES configuration changing as you go through integration policy reconfiguration? We need some reproduction steps at ES level. Can you provide these?

@SergeyDrachuk
Copy link
Author

SergeyDrachuk commented Sep 3, 2024

@gbanasiak

Is ES configuration changing as you go through integration policy reconfiguration?

I don't think an ES config changes were occasionally had impact on the Reroute processor behaviour. We tried to add/remove Reroute processor in a different days several times. The behaviour is the same.

We need some reproduction steps at ES level. Can you provide these?

Sure. What should i do?

I think the way we tested simulation is not 100% right. The doc we tried to put to ingest pipeline simulation was taken after it processed the same pipiline and stored to an output datastream. We did it in such a way bc i don't know how to intercept (catch) the raw doc at point between integration and ingest pipeline. Should we need to add to the integration output config an "empty" ingest pipiline and take raw doc for the simulation or there is an another more easy way?

@gbanasiak
Copy link
Contributor

To get the original document on input to ES as formed by the agent, you can enable debug logs in a single agent. Leave it running for a while to make sure documents were ingested, and then collect diagnostics using elastic-agent diagnostics command. The produced archive will include logs of a respective Beat (among other things). Debug logs are very verbose, so don't keep this log level for too long. The ones I'm interested in include a JSON payload sent to ES (of every single event/document).

Empty ingest pipeline you suggested should also work, but debug logs might be easier.

@SergeyDrachuk
Copy link
Author

elastic-agent-event-log-20240904-144.zip

@gbanasiak
I can't find in the diagnostics archive folder elastic-agent-8.15.0-25075f\events\ most of the following events from the /var/log/auth.log

Image

@gbanasiak
Copy link
Contributor

Thank you for sharing the logs. If you grep for system.auth you can find interesting bits.

First the bit we were looking for - ingested event. The agent log looks like this (after formatting):

{
    "log.level": "debug",
    "@timestamp": "2024-09-04T16:55:10.755+0300",
    "message": "Publish event: {\n  \"@timestamp\": \"2024-09-04T13:55:01.000Z\",\n  \"@metadata\": {\n    \"beat\": \"filebeat\",\n    \"type\": \"_doc\",\n    \"version\": \"8.15.0\",\n    \"raw_index\": \"logs-system.auth-default\",\n    \"input_id\": \"logfile-system-04b6e2f3-72bd-4e2e-a0a3-d8cb413195b9\",\n    \"stream_id\": \"logfile-system.auth-04b6e2f3-72bd-4e2e-a0a3-d8cb413195b9\"\n  },\n  \"event\": {\n    \"dataset\": \"system.auth\",\n    \"timezone\": \"+03:00\",\n    \"original\": \"Sep  4 16:55:01 SIEM-ARCHIVE CRON[3925397]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)\"\n  },\n  \"agent\": {\n    \"ephemeral_id\": \"91adbb8f-f4ab-4135-b972-38e30446c9a3\",\n    \"id\": \"814646e8-37b9-4d52-992f-40bf2e9eb209\",\n    \"name\": \"SIEM-ARCHIVE\",\n    \"type\": \"filebeat\",\n    \"version\": \"8.15.0\"\n  },\n  \"ecs\": {\n    \"version\": \"8.0.0\"\n  },\n  \"log\": {\n    \"offset\": 189281,\n    \"file\": {\n      \"path\": \"/var/log/auth.log\"\n    },\n    \"syslog\": {\n      \"appname\": \"CRON\",\n      \"procid\": \"3925397\",\n      \"hostname\": \"SIEM-ARCHIVE\"\n    }\n  },\n  \"message\": \"pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)\",\n  \"tags\": [\n    \"system-auth\"\n  ],\n  \"input\": {\n    \"type\": \"log\"\n  },\n  \"data_stream\": {\n    \"namespace\": \"default\",\n    \"type\": \"logs\",\n    \"dataset\": \"system.auth\"\n  },\n  \"elastic_agent\": {\n    \"id\": \"814646e8-37b9-4d52-992f-40bf2e9eb209\",\n    \"snapshot\": false,\n    \"version\": \"8.15.0\"\n  },\n  \"host\": {\n    \"containerized\": false,\n    \"ip\": [\n      \"10.143.10.33\"\n    ],\n    \"mac\": [\n      \"00-0C-29-88-2D-37\"\n    ],\n    \"name\": \"siem-archive\",\n    \"hostname\": \"siem-archive\",\n    \"architecture\": \"x86_64\",\n    \"os\": {\n      \"version\": \"11 (bullseye)\",\n      \"family\": \"debian\",\n      \"name\": \"Debian GNU/Linux\",\n      \"kernel\": \"5.10.0-28-amd64\",\n      \"codename\": \"bullseye\",\n      \"type\": \"linux\",\n      \"platform\": \"debian\"\n    },\n    \"id\": \"79782a8b868f45708e81fb1949743325\"\n  }\n}",
    "component": {
        "binary": "filebeat",
        "dataset": "elastic_agent.filebeat",
        "id": "log-default",
        "type": "log"
    },
    "log": {
        "source": "log-default"
    },
    "log.logger": "processors",
    "log.origin": {
        "file.line": 215,
        "file.name": "processing/processors.go",
        "function": "github.com/elastic/beats/v7/libbeat/publisher/processing.debugPrintProcessor.func1"
    },
    "service.name": "filebeat",
    "log.type": "event",
    "ecs.version": "1.6.0",
    "ecs.version": "1.6.0"
}

The event extracted from the message field is as follows:

{
    "@timestamp": "2024-09-04T13:55:01.000Z",
    "@metadata": {
        "beat": "filebeat",
        "type": "_doc",
        "version": "8.15.0",
        "raw_index": "logs-system.auth-default",
        "input_id": "logfile-system-04b6e2f3-72bd-4e2e-a0a3-d8cb413195b9",
        "stream_id": "logfile-system.auth-04b6e2f3-72bd-4e2e-a0a3-d8cb413195b9"
    },
    "event": {
        "dataset": "system.auth",
        "timezone": "+03:00",
        "original": "Sep  4 16:55:01 SIEM-ARCHIVE CRON[3925397]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)"
    },
    "agent": {
        "ephemeral_id": "91adbb8f-f4ab-4135-b972-38e30446c9a3",
        "id": "814646e8-37b9-4d52-992f-40bf2e9eb209",
        "name": "SIEM-ARCHIVE",
        "type": "filebeat",
        "version": "8.15.0"
    },
    "ecs": {
        "version": "8.0.0"
    },
    "log": {
        "offset": 189281,
        "file": {
            "path": "/var/log/auth.log"
        },
        "syslog": {
            "appname": "CRON",
            "procid": "3925397",
            "hostname": "SIEM-ARCHIVE"
        }
    },
    "message": "pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)",
    "tags": [
        "system-auth"
    ],
    "input": {
        "type": "log"
    },
    "data_stream": {
        "namespace": "default",
        "type": "logs",
        "dataset": "system.auth"
    },
    "elastic_agent": {
        "id": "814646e8-37b9-4d52-992f-40bf2e9eb209",
        "snapshot": false,
        "version": "8.15.0"
    },
    "host": {
        "containerized": false,
        "ip": [
            "10.143.10.33"
        ],
        "mac": [
            "00-0C-29-88-2D-37"
        ],
        "name": "siem-archive",
        "hostname": "siem-archive",
        "architecture": "x86_64",
        "os": {
            "version": "11 (bullseye)",
            "family": "debian",
            "name": "Debian GNU/Linux",
            "kernel": "5.10.0-28-amd64",
            "codename": "bullseye",
            "type": "linux",
            "platform": "debian"
        },
        "id": "79782a8b868f45708e81fb1949743325"
    }
}

The ingest simulation for this event with your setup works fine, i.e. document lands in non-default datastream.

POST /_ingest/pipeline/logs-system.auth-1.60.2/_simulate
{
  "docs": [
    {
      "_index": "logs-system.auth-default",
      "_source": {
        [..]
      }
    }
  ]
}

response:
{
  "docs": [
    {
      "doc": {
        "_index": "logs-system.auth-test", <--- HERE
[..]
          "data_stream": {
            "namespace": "test", <--- HERE
            "type": "logs",
            "dataset": "system.auth"
          },
[..]
      }
    }
  ]
}

Agent logs includes this interesting warning though: Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Meta:null, Fields:null, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}, EncodedEvent:(*elasticsearch.encodedEvent)(0xc002f69680)} (status=403): {\"type\":\"security_exception\",\"reason\":\"action [indices:data/write/bulk[s]] is unauthorized for API key id [FBzaMo4B7V8SpG9PYflL] of user [elastic/fleet-server] on indices [logs-system.auth-1y.nbu], this action is granted by the index privileges [create_doc,create,delete,index,write,all]\"}, dropping event!

{
    "type": "security_exception",
    "reason": "action [indices:data/write/bulk[s]] is unauthorized for API key id [FBzaMo4B7V8SpG9PYflL] of user [elastic/fleet-server] on indices [logs-system.auth-1y.nbu], this action is granted by the index privileges [create_doc,create,delete,index,write,all]"
}

This means that API key the agent has is not authorized to write into logs-system.auth-1y.nbu datastream which would explain why documents were completely missing. I don't know why this works initially and then stops working (as per your original description), but that's not an Elasticsearch issue, but Fleet/Elastic Agent setup issue. I would recommend opening a new topic in https://discuss.elastic.co/c/elastic-stack/elastic-agent or contacting Elastic Support.

I've raised #112543 to document the problem discussed earlier, and I'm closing this one. Feel free to open a new issue if evidence for ES bug is found.

@immon
Copy link
Contributor

immon commented Sep 11, 2024

I'm leaving a note here for others who'd like to use reroute.
The root cause of the issue lies in the insufficient permissions associated with the API key generated by the Fleet Server. Specifically, the API key doesn't permit writing to the logs-system.auth-1y.nbu index.

As a temporary solution, you can add the Custom Logs integration to the same Agent policy. This will grant the auto_configure and create_doc privileges to all logs-*-* indices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team
Projects
None yet
Development

No branches or pull requests

5 participants