Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change deprecation logs data stream name #68737

Merged
merged 6 commits into from
Apr 13, 2021

Conversation

pugnascotia
Copy link
Contributor

@pugnascotia pugnascotia commented Feb 9, 2021

Add .default to the end of the deprecation logs data stream name,
making it .logs-deprecation-elasticsearch.default after advice from
the ECS folks.

Also broaden the corresponding component template pattern to
.logs-deprecation-*, so that any stack project can use it.

cc @ruflin FYI

Add `.default to the end of the deprecation logs data stream name,
making it `.logs-deprecation-elasticsearch.default` after advice from
the ECS folks.

Also broaden the corresponding component template pattern to
`.logs-deprecation-*`, so that any stack project can use it.
@pugnascotia pugnascotia requested review from ruflin and rjernst February 9, 2021 11:41
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Feb 9, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@pugnascotia
Copy link
Contributor Author

@elasticmachine update branch

@@ -116,7 +116,7 @@ public void testTemplateExists() throws IOException {
request.setOptions(expectWarnings(
"legacy template [template] has index patterns [*] matching patterns from existing composable templates " +
"[.deprecation-indexing-template,.slm-history,.triggered_watches,.watch-history-14,.watches,ilm-history,logs," +
"metrics,synthetics] with patterns (.deprecation-indexing-template => [.logs-deprecation-elasticsearch]," +
"metrics,synthetics] with patterns (.deprecation-indexing-template => [.logs-deprecation-elasticsearch.default]," +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the naming scheme (https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme) we are using here elasticsearch.default as the namespace which I'm not sure if this is on purpose? Can you point me to the discussion with the ECS folks to better understand the structure here? Is it expected that all stack components ship data to a single data stream or each service will have its own data stream with its own data structure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, we're arrived where we are currently in a rather organic (haphazard) fashion. We haven't yet had an explicit discussion around whether to use a single data stream for all components, or a stream per component.

Since we're on the subject, however, I think it would make sense to have a single data stream, since we're trying to move to a place where we can give a user a complete view of the deprecation state of their Elastic Stack deployment. I don't know what benefit we'd get from multiple data streams, so long as it's possible to differentiate where each deprecation message originated, for aggregation and filtering purposes. What do you think? If we go down that road, would we end up with data stream like e.g. .logs-deprecation.elastic-default?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll ignore for now the type and namespace part and only focus on the dataset bit which above is deprecation.elastic as I think it is the one that matters here.

I could see a few scenarios for naming. In general the logic I have in mind it goes from generic left to more specific on the right. Example nginx.acess is nginx access logs.

  • deprecation.*. This sounds like a general place for deprecation logs, not only the Elastic Stack.
  • elastic.deprecation.*. This sounds like deprecation logs specific to the Elastic Stack.
  • kibana.deprecation: Kibana has several logs, these are the deprecation logs for Kibana.

I don't think we build a "general place" for deprecation logs, so I would throw out deprecation.*. elastic.deprecation would allow us to ship all to a single data stream. Will all deprecation logs have the same format and same retention? Or will we clean the deprecation logs for Kibana but not Elasticsearch. I remember a discussion around this that someone wants to "clean" all alerts. If we do, this would indicate we land on elastic.deprecation.kibana or option 3.

To decide between Option 2 and 3, the question for me is if this is temporary / 1 off thing or if deprecation logs should always be sent there not only for a migration effort. Today we collect Kibana logs into kibana.* and Elasticsearch logs into elasticsearch.*. So adding the deprecation logs here as elasticsearch.deprecation feels very natural, but it is now a public index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the point of this work is to make upgrading the Stack an easier process for users, so I feel that we need to take a Stack-centric view wherever possible.

I'm afraid I get a bit lost with the discussion of namespaces, types, datasets, etc etc. Would you mind restating what values I'd need to put and where?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my perspective, the namespace is just default. I think where we need to have the discussion is on dataset and I hope a few chime in with their opinions and pros / cons.

@pugnascotia What is your take on

To decide between Option 2 and 3, the question for me is if this is temporary / 1 off thing or if deprecation logs should always be sent there not only for a migration effort. Today we collect Kibana logs into kibana.* and Elasticsearch logs into elasticsearch.*. So adding the deprecation logs here as elasticsearch.deprecation feels very natural, but it is now a public index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_stream.dataset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding you correctly, elasticsearch.deprecation would be a good option (and what this PR does). So each Stack component would write a document where the data_stream.dataset would be <component>.deprecation?

That said, I like the hierarchical structure of elastic.deprecation.*. Would that option translate to elastic.deprecation.elasticsearch in the documents written by ES, or is that the literal dataset value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a small important difference from what this PR does and it is .logs vs logs and I'm not sure yet which path we should follow.

Today the Elasticsearch Filebeat module already ships deprecation logs: https://github.com/elastic/beats/tree/master/filebeat/module/elasticsearch/deprecation This will go to logs-elasticsearch.deprecation-default as soon as it is part of Elastic Agent.

The advantage to make it private in our case is that maybe during migration he wants to wipe all the deprecation logs from the UI but does not necessarily really mean wipe what is shipped by Filebeat? What happens if a user enables both? Will the data be duplicated? Or is this actually different data?

@pugnascotia The data_stream.dataset value in the document must always match the {dataset} part of the data stream, so I think the answer is yes to your second question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin I feel like there's a difference between data that a user has chosen to ingest, and data that the Stack has collected for its own purposes. I'd prefer to keep the data that we rely on separate.

This PR has been hanging around for a while now and I'd like to push it over the finish line. We need to decide:

  1. What is the data stream name for ES logs? This PR changes it to .logs-deprecation-elasticsearch.default, but it was previously .logs-deprecation-elasticsearch
  2. What should the value of data_stream.dataset be? It's current elasticsearch.deprecation (via DeprecatedMessage.java), but we could change it to elastic.deprecation.elasticsearch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicolas and I synced, and decided on the following.

Given the index naming pattern {type}-{dataset}-{namespace}, we'll use the following values:

type = .logs
dataset = deprecation.{product} (e.g. `elasticsearch`)
namespace = default

I'll update the PR accordingly.

@pugnascotia
Copy link
Contributor Author

I've now updated event.dataset to match data_stream.dataset, and both have the value deprecation.elasticsearch.

It's a little unfortunate that event.dataset for other log4j appenders follow the elasticsearch.* pattern. Nicolas and I discussed elasticsearch.deprecation as an option for data_stream.dataset, but discounted it due to the similarity with the existing logs-elasticsearch.deprecation-default index that Beats can write when ingesting ES logs. In this case, event.dataset is largely an implementation detail and not of particular interest to users.

@pugnascotia pugnascotia requested a review from ruflin April 1, 2021 14:43
@ruflin
Copy link
Contributor

ruflin commented Apr 6, 2021

@pugnascotia Can a user still log deprecation logs to file or will this now only go internally?

@pugnascotia
Copy link
Contributor Author

Elasticsearch will continue to log deprecation warnings to file (unless a user has changed the log4j configuration).

@ruflin
Copy link
Contributor

ruflin commented Apr 6, 2021

Great, so the existing deprecation logging will not change, neither will the content.

@pugnascotia
Copy link
Contributor Author

That's correct 👍

@pugnascotia pugnascotia merged commit fb1921c into elastic:master Apr 13, 2021
@pugnascotia pugnascotia deleted the deprecation-indexing-updates branch April 13, 2021 14:11
pugnascotia added a commit to pugnascotia/elasticsearch that referenced this pull request Apr 13, 2021
More fixes to deprecation log indexing so that the data stream name and document
contents are more ECS-compatible.
pugnascotia added a commit that referenced this pull request Apr 14, 2021
Backport of #68737.

More fixes to deprecation log indexing so that the data stream name and document
contents are more ECS-compatible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Logging Log management and logging utilities >refactoring Team:Core/Infra Meta label for core/infra team v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants