
[Fleet] logs-* and metrics-* index patterns get overwritten on install/removal/upgrade of packages, breaking runtime fields and CCS #120340

Closed
hop-dev opened this issue Dec 3, 2021 · 29 comments
Assignees
Labels
bug Fixes for quality problems that affect the customer experience impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@hop-dev
Contributor

hop-dev commented Dec 3, 2021

Kibana version: > 7.10

Describe the bug:

On package install, upgrade, or removal, the Fleet index patterns are regenerated, meaning any settings the user may have specified on the index pattern (e.g. runtime or scripted fields) are lost.

Steps to reproduce:
Given a Kibana instance with Fleet set up:

  1. Add a scripted field to the metrics-* index pattern

[screenshot]

  2. Install the assets of a package (or remove a package, or upgrade a package)

[screenshot]

  3. Return to view the metrics-* index pattern to see that the scripted field has been removed

[screenshot]

Expected behavior: User changes are preserved.

@hop-dev hop-dev added the bug Fixes for quality problems that affect the customer experience label Dec 3, 2021
@botelastic botelastic bot added the needs-team Issues missing a team label label Dec 3, 2021
@hop-dev hop-dev added the Team:Fleet Team label for Observability Data Collection Fleet team label Dec 3, 2021
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Dec 3, 2021
@joshdover joshdover changed the title [Fleet] logs-* and metrics-* index patterns overwritten on install/removal/upgrade of package [Fleet] logs-* and metrics-* index patterns get overwritten on install/removal/upgrade of packages, breaking runtime fields support Dec 3, 2021
@joshdover
Contributor

Sorta surprised we haven't heard this one before. Though most objects installed by Fleet are considered "managed", I think this is a pretty broken experience for one of the headline features in 7.x: runtime fields + frozen tier storage. I don't think we should be overwriting these objects at all if possible.

The issue that is blocking us from excluding the index patterns from the import is this one: #120312

We could either wait for that to be fixed (which is being considered), or we could work around it by fetching the existing index pattern before the import and re-applying the existing values afterwards. This workaround could still result in lost writes due to a data race between a user updating the index pattern and a concurrent package upgrade.
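For illustration, a minimal sketch of that workaround, assuming a saved-objects-client style API. The preserved attribute names and the helper itself are assumptions for the sketch, not Fleet's actual install code, and the race described above still applies between the read and the write-back:

```typescript
import type { SavedObjectsClientContract } from '@kbn/core/server';

// Attributes a user might have customized on the index-pattern saved
// object. These names are assumptions for the sketch.
const PRESERVED_KEYS = ['runtimeFieldMap', 'fieldFormatMap'] as const;

async function importPreservingDataView(
  soClient: SavedObjectsClientContract,
  dataViewId: string, // e.g. 'logs-*' or 'metrics-*'
  runImport: () => Promise<void>
): Promise<void> {
  // 1. Snapshot the existing index pattern (may not exist on first install).
  const existing = await soClient
    .get<Record<string, unknown>>('index-pattern', dataViewId)
    .catch(() => undefined);

  // 2. Run the package import, which overwrites the data view.
  await runImport();

  // 3. Re-apply the user's customizations on top of the imported object.
  //    A user edit made between steps 1 and 3 is silently lost (the race).
  if (existing) {
    const preserved: Record<string, unknown> = {};
    for (const key of PRESERVED_KEYS) {
      if (key in existing.attributes) {
        preserved[key] = existing.attributes[key];
      }
    }
    await soClient.update('index-pattern', dataViewId, preserved);
  }
}
```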

@joshdover
Contributor

@jen-huang is this a known limitation that we've avoided fixing for any reason?

@jen-huang
Contributor

@joshdover I think this is a known (but undocumented) limitation. AFAIK the only reason for the lack of a fix is just that other things took priority :) I think we've also been under the assumption that these are sort of "managed index patterns" and haven't thought through the UX for how they should work with runtime fields.

@jportner
Contributor

jportner commented Dec 7, 2021

The issue that is blocking us from excluding the index patterns from the import is this one: #120312

We could either wait for that to be fixed (which is being considered)

We are going to fix this for the 8.0 release 👍 working on a PR now.

@mattkime
Contributor

mattkime commented Dec 8, 2021

We should probably come up with a full user story for 'managed' data views. This would be driven by Fleet needs and the expected user experience. App services is happy to help once there's a good definition of what needs to be accomplished.

@nerophon
Contributor

nerophon commented Dec 9, 2021

The following has been proposed by a user:

I was thinking of a solution for the CCS index patterns when deploying from Fleet.

  • Fleet should name the logs and metrics index patterns "logs" and "metrics" explicitly. Don't use random IDs.
  • Fleet shouldn't overwrite the logs and metrics index patterns if they already exist
  • Users delete and re-create the logs and metrics index patterns so they include the remote cluster name (i.e. *-cluster), and use the same logs and metrics name.

If we follow this process the dashboards should work.

@joshdover joshdover added enhancement New value added to drive a business result and removed bug Fixes for quality problems that affect the customer experience labels Mar 14, 2022
@mbudge

mbudge commented Aug 23, 2023

We're an enterprise customer with data in multiple remote clusters.

Users access data from a central search cluster.

We run Fleet in the search cluster to make it easier to deploy and update integration assets. This is because we don't have time to copy them from the remote clusters and use Python scripts to update the configuration.

Please can this get fixed asap?

@joshdover
Contributor

Hi @mbudge, thanks for chiming in here.

One question for you: do you need to have full control over the CCS pattern used or would it be ok if the data view was configured with a wildcard to cover all remote clusters, such as *:metrics-*-*?

@mbudge

mbudge commented Aug 23, 2023

I have no problem having these added by default to fix this:

*:metrics-*
*:logs-*

Thanks in advance for fixing it.

@joshdover
Contributor

@hop-dev How do you think we should handle this? I think we also need to meet the runtime fields requirement. So broadly speaking, I think we should:

  1. Change the index pattern by default to also cover CCS clusters by adding the *: prefix
  2. Avoid overwriting any fields in the data view object that are not the index pattern and name, to preserve any other customizations the user has made (like runtime fields).
  3. Rename the data views to "Logs" and "Metrics" (optional)

So we need a way to solve item (2). My understanding is that we have to import the data views at the same time as the other SOs with the current import logic, and we don't have a way to overwrite some objects but not others in the same import call.

I see two routes here:

  1. Try to work around the issue by reading the data view before the import and writing back any fields after the import
    • slightly hacky, probably some race-condition bugs here
  2. Add a new option to the import that allows us to not overwrite some objects while still overwriting the others (a hypothetical shape is sketched below)
    • This one still isn't perfect because I also don't want users to break dashboards by changing the index pattern to something that doesn't work with our integrations.
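To make option (2) concrete, here is a purely hypothetical shape for such an import option. Nothing like overwriteExclusions exists in Kibana's saved object import API today; this only sketches the idea:

```typescript
// Hypothetical sketch only: `overwriteExclusions` is NOT a real option of
// Kibana's saved object import API. It illustrates a per-object control
// where excluded objects are created if missing but never overwritten.
interface ImportOptionsSketch {
  overwrite: boolean; // today's all-or-nothing behavior
  overwriteExclusions?: Array<{ type: string; id: string }>;
}

// How Fleet might use it to protect the managed data views:
const fleetImportOptions: ImportOptionsSketch = {
  overwrite: true,
  overwriteExclusions: [
    { type: 'index-pattern', id: 'logs-*' },
    { type: 'index-pattern', id: 'metrics-*' },
  ],
};
```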

@mbudge

mbudge commented Aug 23, 2023

I don't think you need to do 3.

Anyone who's built process documentation on Elastic will have to update it.

@mbudge

mbudge commented Sep 18, 2023

When Fleet overwrites the data view on install/removal/upgrade of packages, does the logs-* data view object ID change?

The logs-* data view object ID has changed, which has broken all our dashboards. All the visuals in the dashboards are showing a "Could not find the data view" error, which is having a high impact on the operations we've built on Elastic. We need to get this resolved ASAP.

@mattkime
Contributor

When Fleet overwrites the data view on install/removal/upgrade of packages, does the logs-* data view object ID change?

I don't think it should. I'm pretty sure the data views Fleet creates always have the same ID, so I'm surprised that dependent saved objects would be broken. It would be worthwhile to go to saved object management and inspect the data view and a dependent broken saved object, paying attention to IDs and relationships.

Are you using spaces?

@joshdover joshdover added bug Fixes for quality problems that affect the customer experience impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed enhancement New value added to drive a business result labels Sep 19, 2023
@joshdover
Contributor

joshdover commented Sep 19, 2023

Are you using spaces?

I'm curious about this as well. Spaces is the only situation I can think of where objects may have different IDs. Integration assets don't properly support spaces at the moment, mostly because the underlying assets (dashboards, etc.) don't support multiple spaces yet. So you may have run into a bug/limitation around package upgrades when installed in different spaces. For instance, if the package was installed in one space but then upgraded while the user was in another, the assets in the original space are deleted, which could break a reference.

@mbudge I'm also curious what version of Kibana you were on when this happened. There was an import issue recently affecting 8.8.0-8.9.2 that could have impacted you, depending on the integration that was modified: #164712.

I've also changed this overall issue from an enhancement to a bug to help with properly prioritizing this. Data views from Integrations should work with the rest of the features in the Stack, including CCS and runtime fields.

@joshdover joshdover changed the title [Fleet] logs-* and metrics-* index patterns get overwritten on install/removal/upgrade of packages, breaking runtime fields support [Fleet] logs-* and metrics-* index patterns get overwritten on install/removal/upgrade of packages, breaking runtime fields and CCS Sep 19, 2023
@mbudge

mbudge commented Sep 19, 2023

We are using spaces. v8.9.1 I think.

Different operations teams use different spaces so spaces are very important. Fleet needs to support them.

We use CCS to search data in remote clusters. The Fleet policies+assets are in the remote clusters.

Elastic previously recommended exporting assets and using a Python script to change the data view object IDs in the JSON data before importing them to the search cluster the users access. We can't support this anymore due to the large number of assets we have to transfer when we upgrade.

We tried setting up Fleet in the search cluster to deploy assets to one of the spaces used by the operations teams, without any Elastic Agents connecting to the cluster. It also looks like assets can only be deployed to one space through Fleet, so we have to manually transfer assets between spaces when multiple operations teams need to access them.

Would it not be easier if Fleet created the logs-* and metrics-* data views when they don't exist, but didn't re-create/overwrite them? Or have all the assets use saved object names instead of IDs so Fleet becomes space-aware?

@kpollich
Member

kpollich commented Oct 6, 2023

Change the index pattern by default to also cover CCS clusters by adding the *: prefix

Can we actually change these defaults? It seems like we should instead add *:logs-* and *:metrics-* as new data views to Fleet. I tried changing these in a working branch and it results in no data showing up in Discover when the wildcard-prefixed, CCS-friendly versions of these data views are used (a leading *: pattern only matches remote clusters, so purely local data no longer matches).

@kpollich
Member

kpollich commented Oct 6, 2023

Example of the above:

[screen recording: Screen.Recording.2023-10-06.at.10.52.48.AM.mov]

@joshdover
Contributor

Does it work if you use a pattern like logs-*,*:logs-*?

@joshdover
Contributor

I think we need a single data view so that all of the dashboards will just work. We would also need the id of the index-pattern object not to change, only the underlying index pattern.
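For reference, this could be expressed with Kibana's public Data Views create API (the endpoint, body shape, and override flag are documented; the concrete id/title/name values below just illustrate the proposal and are not what Fleet ships):

```typescript
// Sketch: create the managed logs data view with a stable id and a
// CCS-inclusive pattern via Kibana's documented Data Views create API.
// The id/title/name values are illustrative, not Fleet's actual values.
const KIBANA_URL = 'http://localhost:5601'; // assumption: local dev Kibana

async function createCombinedLogsDataView(): Promise<void> {
  const res = await fetch(`${KIBANA_URL}/api/data_views/data_view`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'kbn-xsrf': 'true', // Kibana requires this header on API writes
    },
    body: JSON.stringify({
      override: false, // fail rather than replace an existing data view
      data_view: {
        id: 'logs-*',             // stable id, so dashboard references keep resolving
        title: 'logs-*,*:logs-*', // local + cross-cluster pattern
        name: 'Logs',             // friendly display name
        timeFieldName: '@timestamp',
      },
    }),
  });
  if (!res.ok) {
    throw new Error(`Failed to create data view: ${res.status} ${await res.text()}`);
  }
}
```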

@kpollich
Member

Does it work if you use a pattern like logs-*,*:logs-*?

Yes, this looks like it works as expected. However, we should definitely do number 3 above and give these a vanity title if we're going to use this pattern. It's quite a bit less presentable than logs-* or metrics-* in the UI, imho.

[screenshot]

I think we need a single data view so that all of the dashboards will just work. We would also need the id of the index-pattern object not to change, only the underlying index pattern.

This is correct - we need to make sure the underlying ID stays the same. Since we'll be passing overwrite: false in the installation process for these data views, we'll want a one-time migration that updates the name and title (the property that holds the actual pattern; see https://www.elastic.co/guide/en/kibana/current/data-views-api-create.html) to conform with this new spec. Then the code can continue working with overwrite: false, which will prevent blowing away users' customizations to these data views.

@kpollich
Member

Hmm actually I'm not sure that we can create a migration for these data views since they aren't registered as Fleet saved objects. Maybe we can do a "just-in-time migration" by detecting a case where a dataview with a given ID (logs-* or metrics-*) already exists, but it has the wrong name. That first save will happen with override: true to set the dataview to the new values following this fix, then we'll use override: false for every save in the future, or just skip the create call entirely if a dataview with the desired ID already exists.
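A sketch of that just-in-time migration flow. The getDataView/saveDataView helpers and the expected values are stand-ins, assuming save semantics like the override flag above:

```typescript
// Sketch of the "just-in-time migration": if the managed data view exists
// but still has the old name, rewrite it once with override: true;
// afterwards (or if it already matches) skip the save so user
// customizations such as runtime fields are preserved.
interface ManagedViewSpec {
  id: string;
  title: string;
  name: string;
}

const EXPECTED: ManagedViewSpec = {
  id: 'logs-*',
  title: 'logs-*,*:logs-*',
  name: 'Logs',
};

async function ensureManagedDataView(
  getDataView: (id: string) => Promise<ManagedViewSpec | undefined>,
  saveDataView: (spec: ManagedViewSpec, opts: { override: boolean }) => Promise<void>
): Promise<void> {
  const existing = await getDataView(EXPECTED.id);

  if (!existing) {
    // First install: create, but never clobber a concurrently created view.
    await saveDataView(EXPECTED, { override: false });
  } else if (existing.name !== EXPECTED.name) {
    // One-time migration: old-style managed view detected, rewrite it once.
    await saveDataView(EXPECTED, { override: true });
  }
  // Otherwise: already migrated, so skip the save entirely.
}
```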

@mbudge

mbudge commented Oct 16, 2023

Just an FYI: we do this to make it easier for users to find log data. When using logs-*, users reported that constantly filtering to find data from a particular Fleet integration was a bit cumbersome. This way they know what data is available, as opposed to using a dashboard/visual to display a table of data_stream.dataset, since datasets disappear from view if there are no logs within the time range they're filtering on or if the feed drops.

[screenshot]

I find users just want to view the data without knowing if it's in a local or remote cluster.

@kpollich
Member

users reported that constantly filtering to find data from a particular Fleet integration was a bit cumbersome

This can definitely be cumbersome, especially if you're creating a data view for every single dataset.

As a potential alternative that might alleviate some pain or prove useful elsewhere, I can recommend adding a filter via the + icon on the event.dataset value to see data limited to a particular integration.


I've filed a PR for the root cause fixes here, though I expect there might be some churn on the implementation specifics. See #169409.

@ruflin
Contributor

ruflin commented Oct 24, 2023

@mbudge There is a feature we are working on that won't solve your CCS issue yet, but should help with the integration/dataset selection. If you are on 8.10.* you can type "Logs Explorer" into the Kibana search bar and a Logs Explorer entry will show up. If you jump there, you'll see an experience similar to Discover, but with an integration drop-down to select your data. There is more to come here, but it would be great to hear if this helps with your selection of datasets.

[screenshots: the Logs Explorer entry in the Kibana search bar, and the Logs Explorer integration drop-down]

@mbudge

mbudge commented Oct 24, 2023

@ruflin yes this looks very good. Our analysts will be very keen to start using this.

To get this working on our search cluster we need:

  • Support for CCS
  • Fleet support for Kibana spaces (the Kibana instance we use has spaces for different business departments, and I don't think Fleet works with spaces yet)

Thanks

@kpollich
Member

@mbudge What does better CCS support look like to you in this context? Fleet isn't "doing" much with its managed logs/metrics data views. Would having a managed data view for cross-cluster logs/metrics be helpful, or would having the pattern on the existing managed data views be preferred?

@mbudge

mbudge commented Oct 24, 2023

@kpollich I find our users just want to search the logs and don't want to know about the remote clusters, so adding it to the existing data view is preferred. Elastic already has a steep learning curve, so we like to keep it simple and keep training to a minimum, and we can add our own data views if required. For us, the Logs Explorer also needs to work with CCS.

@kpollich
Member

kpollich commented Nov 2, 2023

The root issue with overwriting the logs-* and metrics-* data views has been fixed in #170188

@kpollich kpollich closed this as completed Nov 2, 2023
@kpollich kpollich added QA:Needs Validation Issue needs to be validated by QA and removed QA:Needs Validation Issue needs to be validated by QA labels Nov 8, 2023