-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change deprecation logs data stream name #68737
Change deprecation logs data stream name #68737
Conversation
Add `.default to the end of the deprecation logs data stream name, making it `.logs-deprecation-elasticsearch.default` after advice from the ECS folks. Also broaden the corresponding component template pattern to `.logs-deprecation-*`, so that any stack project can use it.
Pinging @elastic/es-core-infra (Team:Core/Infra) |
@elasticmachine update branch |
@@ -116,7 +116,7 @@ public void testTemplateExists() throws IOException { | |||
request.setOptions(expectWarnings( | |||
"legacy template [template] has index patterns [*] matching patterns from existing composable templates " + | |||
"[.deprecation-indexing-template,.slm-history,.triggered_watches,.watch-history-14,.watches,ilm-history,logs," + | |||
"metrics,synthetics] with patterns (.deprecation-indexing-template => [.logs-deprecation-elasticsearch]," + | |||
"metrics,synthetics] with patterns (.deprecation-indexing-template => [.logs-deprecation-elasticsearch.default]," + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the naming scheme (https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme) we are using here elasticsearch.default
as the namespace which I'm not sure if this is on purpose? Can you point me to the discussion with the ECS folks to better understand the structure here? Is it expected that all stack components ship data to a single data stream or each service will have its own data stream with its own data structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, we're arrived where we are currently in a rather organic (haphazard) fashion. We haven't yet had an explicit discussion around whether to use a single data stream for all components, or a stream per component.
Since we're on the subject, however, I think it would make sense to have a single data stream, since we're trying to move to a place where we can give a user a complete view of the deprecation state of their Elastic Stack deployment. I don't know what benefit we'd get from multiple data streams, so long as it's possible to differentiate where each deprecation message originated, for aggregation and filtering purposes. What do you think? If we go down that road, would we end up with data stream like e.g. .logs-deprecation.elastic-default
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll ignore for now the type and namespace part and only focus on the dataset
bit which above is deprecation.elastic
as I think it is the one that matters here.
I could see a few scenarios for naming. In general the logic I have in mind it goes from generic left to more specific on the right. Example nginx.acess
is nginx access logs.
deprecation.*
. This sounds like a general place for deprecation logs, not only the Elastic Stack.elastic.deprecation.*
. This sounds like deprecation logs specific to the Elastic Stack.kibana.deprecation
: Kibana has several logs, these are the deprecation logs for Kibana.
I don't think we build a "general place" for deprecation logs, so I would throw out deprecation.*
. elastic.deprecation
would allow us to ship all to a single data stream. Will all deprecation logs have the same format and same retention? Or will we clean the deprecation logs for Kibana but not Elasticsearch. I remember a discussion around this that someone wants to "clean" all alerts. If we do, this would indicate we land on elastic.deprecation.kibana
or option 3.
To decide between Option 2 and 3, the question for me is if this is temporary / 1 off thing or if deprecation logs should always be sent there not only for a migration effort. Today we collect Kibana logs into kibana.*
and Elasticsearch logs into elasticsearch.*
. So adding the deprecation logs here as elasticsearch.deprecation
feels very natural, but it is now a public index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the point of this work is to make upgrading the Stack an easier process for users, so I feel that we need to take a Stack-centric view wherever possible.
I'm afraid I get a bit lost with the discussion of namespaces, types, datasets, etc etc. Would you mind restating what values I'd need to put and where?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective, the namespace
is just default. I think where we need to have the discussion is on dataset
and I hope a few chime in with their opinions and pros / cons.
@pugnascotia What is your take on
To decide between Option 2 and 3, the question for me is if this is temporary / 1 off thing or if deprecation logs should always be sent there not only for a migration effort. Today we collect Kibana logs into kibana.* and Elasticsearch logs into elasticsearch.*. So adding the deprecation logs here as elasticsearch.deprecation feels very natural, but it is now a public index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data_stream.dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding you correctly, elasticsearch.deprecation
would be a good option (and what this PR does). So each Stack component would write a document where the data_stream.dataset
would be <component>.deprecation
?
That said, I like the hierarchical structure of elastic.deprecation.*
. Would that option translate to elastic.deprecation.elasticsearch
in the documents written by ES, or is that the literal dataset value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a small important difference from what this PR does and it is .logs
vs logs
and I'm not sure yet which path we should follow.
Today the Elasticsearch Filebeat module already ships deprecation logs: https://github.com/elastic/beats/tree/master/filebeat/module/elasticsearch/deprecation This will go to logs-elasticsearch.deprecation-default
as soon as it is part of Elastic Agent.
The advantage to make it private in our case is that maybe during migration he wants to wipe all the deprecation logs from the UI but does not necessarily really mean wipe what is shipped by Filebeat? What happens if a user enables both? Will the data be duplicated? Or is this actually different data?
@pugnascotia The data_stream.dataset
value in the document must always match the {dataset} part of the data stream, so I think the answer is yes to your second question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruflin I feel like there's a difference between data that a user has chosen to ingest, and data that the Stack has collected for its own purposes. I'd prefer to keep the data that we rely on separate.
This PR has been hanging around for a while now and I'd like to push it over the finish line. We need to decide:
- What is the data stream name for ES logs? This PR changes it to
.logs-deprecation-elasticsearch.default
, but it was previously.logs-deprecation-elasticsearch
- What should the value of
data_stream.dataset
be? It's currentelasticsearch.deprecation
(via DeprecatedMessage.java), but we could change it toelastic.deprecation.elasticsearch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicolas and I synced, and decided on the following.
Given the index naming pattern {type}-{dataset}-{namespace}
, we'll use the following values:
type = .logs
dataset = deprecation.{product} (e.g. `elasticsearch`)
namespace = default
I'll update the PR accordingly.
I've now updated It's a little unfortunate that |
@pugnascotia Can a user still log deprecation logs to file or will this now only go internally? |
Elasticsearch will continue to log deprecation warnings to file (unless a user has changed the log4j configuration). |
Great, so the existing deprecation logging will not change, neither will the content. |
That's correct 👍 |
More fixes to deprecation log indexing so that the data stream name and document contents are more ECS-compatible.
Backport of #68737. More fixes to deprecation log indexing so that the data stream name and document contents are more ECS-compatible.
Add
.default
to the end of the deprecation logs data stream name,making it
.logs-deprecation-elasticsearch.default
after advice fromthe ECS folks.
Also broaden the corresponding component template pattern to
.logs-deprecation-*
, so that any stack project can use it.cc @ruflin FYI