REROUTE processor doesn't work for 2nd and later instances of the same integration and drops documents for these instances. #111828
Comments
@SergeyDrachuk This is an Elasticsearch bug report, but you're referring to Elastic Agent integrations, so it's not clear where the problem actually is. Is ES misbehaving, or do Elastic Agent integrations introduce an unexpected ES configuration? Can you provide reproduction steps without Elastic Agent involved, or at least check the state of the ingest pipelines after each Elastic Agent policy interaction?
Pinging @elastic/es-data-management (Team:Data Management) |
@gbanasiak But pipeline simulation shows the new name will be precisely Data IN:
Data out
Does it work as expected? |
Thank you for additional data. I have now realized the mechanism described in https://www.elastic.co/guide/en/elasticsearch/reference/8.15/ingest.html#pipelines-for-fleet-elastic-agent is used here. Specifically, Fleet-managed
Going back to the last update, it looks like ingest pipeline simulation does not produce the expected output: the type is not preserved, and the index name is incorrect:
Reproduction steps in 8.15.0:
But this might only be a simulation issue, as the following test does work correctly:
However, this test with the payload specified above does not:
But that's due to a failure earlier in the I'll let @elastic/es-data-management comment further. At this point based on the data provided I think there's a concern around ingest pipeline simulation behavior. I think we should adjust the issue title. |
What is the expected output from the
I think the problem though is not with the simulate code, but with the fact that the reroute processor assumes that if you don't give it a |
I see, so the correct way of using the simulate endpoint in this context would be:
According to documentation this should already be happening?
Also, exception or rather a pipeline failure? I would expect a failure, so it can be ignored with |
Behind the simulation tests, what we expect is that the Reroute processor makes it possible to assign a different index template (for example, one with the pattern logs-system.auth-test-), so that indices like .ds-logs-system.auth-test- would have a different ILM policy (retention time) than .ds-logs-system.auth-default-* (the index patterns overlap to some extent, so I'm still not sure this will work).
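To make the overlap question above concrete, here is a minimal Python sketch. It is a simplified model, not Elasticsearch code: it assumes the target data stream is named `<type>-<dataset>-<namespace>` and that, when several composable index templates match, the one with the highest priority is used. The template names and priority values are hypothetical placeholders, and the reroute options are treated as plain strings (no `{{field}}` resolution or fallback lists).

```python
from fnmatch import fnmatchcase


def target_data_stream(doc_stream, dataset=None, namespace=None):
    """Model the reroute target name: <type>-<dataset>-<namespace>.

    doc_stream holds the document's data_stream.* fields; dataset and
    namespace are optional literal overrides (simplified: no templating).
    """
    ds = dataset if dataset is not None else doc_stream["dataset"]
    ns = namespace if namespace is not None else doc_stream["namespace"]
    return f"{doc_stream['type']}-{ds}-{ns}"


def winning_template(stream_name, templates):
    """Return the name of the matching template with the highest priority."""
    matches = [
        t for t in templates
        if any(fnmatchcase(stream_name, p) for p in t["index_patterns"])
    ]
    if not matches:
        return None
    return max(matches, key=lambda t: t["priority"])["name"]


# Hypothetical templates: names and priorities are placeholders.
templates = [
    {"name": "default-auth", "index_patterns": ["logs-system.auth-*"], "priority": 200},
    {"name": "long-retention-auth", "index_patterns": ["logs-system.auth-1y.nbu*"], "priority": 250},
]

doc_stream = {"type": "logs", "dataset": "system.auth", "namespace": "default"}
rerouted = target_data_stream(doc_stream, namespace="1y.nbu")
print(rerouted, "->", winning_template(rerouted, templates))
print("logs-system.auth-default", "->", winning_template("logs-system.auth-default", templates))
```

Under these assumptions both patterns match the rerouted stream, and the higher-priority template decides its settings, while the default stream only matches the broader pattern.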
Reroute processor can alter the destination data stream, and destination data stream settings (such as ILM policy and thus data retention) depend on a matching index template. If you follow https://www.elastic.co/guide/en/fleet/8.15/data-streams-ilm-tutorial.html you will be able to differentiate data retention setting by duplicating and modifying the default When it comes to this issue I still don't see where the problem is, actually. The
What I get in return shows that the reroute did not happen which is expected because pipeline ran into
ES behavior is correct here. To have a different result, the I don't see how the behavior you described initially could happen, i.e.:
Is ES configuration changing as you go through integration policy reconfiguration? We need some reproduction steps at ES level. Can you provide these? |
I don't think any ES config changes affected the Reroute processor behaviour. We tried adding and removing the Reroute processor several times on different days. The behaviour is the same.
Sure. What should I do? I think the way we tested the simulation is not 100% right. The doc we fed to the ingest pipeline simulation was taken after it had already been processed by the same pipeline and stored in an output data stream. We did it this way because I don't know how to intercept the raw doc between the integration and the ingest pipeline. Should we add an "empty" ingest pipeline to the integration output config and take the raw doc for the simulation from there, or is there an easier way?
To get the original document on input to ES as formed by the agent, you can enable debug logs in a single agent. Leave it running for a while to make sure documents were ingested, and then collect diagnostics using Empty ingest pipeline you suggested should also work, but debug logs might be easier. |
elastic-agent-event-log-20240904-144.zip @gbanasiak |
Thank you for sharing the logs. If you grep for First the bit we were looking for - ingested event. The agent log looks like this (after formatting):
The event extracted from the
The ingest simulation for this event with your setup works fine, i.e. document lands in non-default datastream.
The agent logs include this interesting warning, though:
This means that API key the agent has is not authorized to write into I've raised #112543 to document the problem discussed earlier, and I'm closing this one. Feel free to open a new issue if evidence for ES bug is found. |
I'm leaving a note here for others who'd like to use reroute. As a temporary solution, you can add the Custom Logs integration to the same Agent policy. This will grant the |
Elasticsearch Version
8.15.0, Build: deb/1a77947f34deddb41af25e6f0ddb8e830159c179/2024-08-05T10:05:34.233336849Z, JVM: 22.0.1
Installed Plugins
No response
Java Version
bundled
OS Version
6.1.0-22-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21) x86_64 GNU/Linux
Problem Description
I'm trying to split an existing integration data stream ("System" integration v1.60.2) in the ingest pipeline logs-system.auth@custom
with the "REROUTE" processor.
The idea is to keep critical docs (logs) longer than ordinary docs (logs).
This is a working configuration, but only for the first agent policy to which the integration was added.
For example, docs are routed to the index .ds-logs-system.auth-1y.nbu-2024.08.05-000004
If the integration is added to other agent policies with different names, the documents from those integrations are completely lost. If I remove the reroute processor from the ingest pipeline, the docs stop getting lost and the whole ingest pipeline works as expected (System integration documents from all agent policies are saved to the data stream).
This behavior was observed in previous elasticsearch versions as well.
Steps to Reproduce
Install System integration
Add "Reroute" processor in the ingest pipeline "logs-system.auth@custom":
[
  {
    "reroute": {
      "dataset": [
        "{{data_stream.dataset}}"
      ],
      "namespace": [
        "1y.nbu"
      ]
    }
  }
]
Add the "System" integration to more than one agent policy.
Check logs from hosts in the different agent policies.
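To check the reroute step at the Elasticsearch level without Elastic Agent, the processor above can be fed to the ingest simulate API (`POST /_ingest/pipeline/_simulate`). The Python sketch below only builds the request body; the sample document, its field values, and the helper name `build_simulate_request` are assumptions for illustration, not data from this report. It sets `_index` explicitly to a name that follows the `<type>-<dataset>-<namespace>` scheme, on the assumption that the processor derives the current data stream from the index name.

```python
import json


def build_simulate_request(index_name, source, processors):
    """Build a JSON body for POST /_ingest/pipeline/_simulate.

    The pipeline is supplied inline, and the single test document
    carries an explicit _index so it is not left at a default value.
    """
    return {
        "pipeline": {"processors": processors},
        "docs": [{"_index": index_name, "_source": source}],
    }


# Reroute processor as configured in logs-system.auth@custom above.
reroute = {
    "reroute": {
        "dataset": ["{{data_stream.dataset}}"],
        "namespace": ["1y.nbu"],
    }
}

# Hypothetical input document; real events contain many more fields.
sample_source = {
    "data_stream": {"type": "logs", "dataset": "system.auth", "namespace": "default"},
    "message": "hypothetical auth log line",
}

body = build_simulate_request("logs-system.auth-default", sample_source, [reroute])
print(json.dumps(body, indent=2))
```

The printed body can be pasted into Kibana Dev Tools or sent with any HTTP client; if the reroute applies, the `_index` in the response should differ from the one submitted.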
Logs (if relevant)
No response