[Fleet] Default Fleet Server policy missing fleet_server integration #124004

Open
juliaElastic opened this issue Jan 28, 2022 · 16 comments · Fixed by #124968, #124969, #125041 or #125043
Labels
bug (Fixes for quality problems that affect the customer experience), Team:Fleet (Team label for Observability Data Collection Fleet team)

Comments

@juliaElastic
Contributor

juliaElastic commented Jan 28, 2022

Kibana version: 7.16 and potentially others

Elasticsearch version:

Server OS version:

Browser version:

Browser OS version:

Original install method (e.g. download page, yum, from source, etc.): on prem installation

Describe the bug:
Many users have reported that the Default Fleet Server policy is missing the fleet_server integration, which blocks them from enrolling a Fleet Server successfully. The workaround is to add the fleet_server integration to the policy manually.

The root cause is unknown. It could be that something goes wrong during setup, which is why the fleet_server integration fails to be added.

image

Steps to reproduce:

  1. Set a dummy registryUrl in kibana.yml or kibana.dev.yml, so that fleet_server can't be installed on startup

  2. Remove registryUrl (or set to a valid value)

Bug: even though fleet_server is now installed successfully, the Default Fleet Server policy still doesn't have a fleet_server integration.

Expected behavior: fleet_server integration should be added by default to Default Fleet Server policy.

Screenshots (if relevant):

Errors in browser console (if relevant):

Provide logs and/or server output (if relevant):

Any additional context: this bug might go away once we release the removal of Default policies: #108456

Reported issues:
https://discuss.elastic.co/t/why-cant-select-an-agent-policy-when-add-a-fleet-server/293083
https://discuss.elastic.co/t/cannot-add-a-fleet-server/295670

juliaElastic added the bug and Team:Fleet labels Jan 28, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@jen-huang
Contributor

This is sometimes spotted in our test failures too: #123105

In addition to #108456, may also be resolved by #122297

@juliaElastic
Contributor Author

@jen-huang @joshdover
I think the root cause of this bug looks like this issue: #113921
Which was fixed for 8.0 and above here: #119488
I think we should backport the fix for the next minor of 7.16 and 7.17, do you agree?

And with that, this fix should be backported as well: #124363

@jen-huang
Contributor

@juliaElastic Yeah, let's backport those fixes.

@juliaElastic
Contributor Author

juliaElastic commented Feb 9, 2022

@nchaulet I went through this bug again to verify (see steps in description) and found that the fix for missing package_policies in preconfiguration is incomplete: it only works for managed policies.

I found that your fix added a condition for is_managed here: https://github.com/nchaulet/kibana/blob/bcd9488c3fc836a46cb8c1712e857c19dafbb65f/x-pack/plugins/fleet/server/services/preconfiguration.ts#L284

Is there a reason why it was restricted only to managed policies? I think the logic should check for all preconfigured policies if package policies are missing (many reported issues are having this bug for Default Fleet Server policy in on-prem, which is not a managed policy).

I tested the fix locally (to recreate missing package policies for non-managed policies as well), and the caveat is that the fix makes it more likely to encounter this issue: #125105
As a workaround we could add a check not to recreate preconfigured package policies if another one already exists with the same base package name.
WDYT?

cc @joshdover
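
To make the proposed workaround concrete, here is a minimal sketch of the check, not Fleet's actual implementation. The `PackagePolicy` and `PreconfiguredPolicy` shapes and the `packagePoliciesToRecreate` helper are hypothetical and heavily simplified. The idea is to recreate a missing preconfigured package policy only when the existing policy has nothing else installed from the same base package.

```typescript
// Hypothetical, simplified shapes; Fleet's real types carry many more fields.
interface PackagePolicy {
  name: string;
  packageName: string; // base package name, e.g. "fleet_server"
}

interface PreconfiguredPolicy {
  id: string;
  packagePolicies: PackagePolicy[];
}

// Return the preconfigured package policies that are missing from the existing
// agent policy AND whose base package is not already present under another name.
// Whether the policy is managed is deliberately not consulted here.
function packagePoliciesToRecreate(
  preconfigured: PreconfiguredPolicy,
  existing: PackagePolicy[]
): PackagePolicy[] {
  const existingNames = new Set(existing.map((p) => p.name));
  const existingPackages = new Set(existing.map((p) => p.packageName));
  return preconfigured.packagePolicies.filter(
    (p) => !existingNames.has(p.name) && !existingPackages.has(p.packageName)
  );
}

// Usage: an illustrative policy whose only package policy is fleet_server.
const examplePolicy: PreconfiguredPolicy = {
  id: 'example-fleet-server-policy', // illustrative id, not taken from Kibana
  packagePolicies: [{ name: 'fleet_server-1', packageName: 'fleet_server' }],
};

// Nothing exists yet, so fleet_server-1 is selected for recreation.
const whenMissing = packagePoliciesToRecreate(examplePolicy, []);

// A manually added fleet_server integration exists, so nothing is recreated.
const whenAddedManually = packagePoliciesToRecreate(examplePolicy, [
  { name: 'my-fleet-server', packageName: 'fleet_server' },
]);

console.log(whenMissing.length, whenAddedManually.length);
```

The second case matters for the manual workaround from the description: once a user has added the integration by hand, setup would not add a second one next to it.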

@joshdover
Contributor

> As a workaround we could add a check not to recreate preconfigured package policies if another one already exists with the same base package name.
> WDYT?

This makes sense, but I'd also really like to just consider making id required on preconfigured package policies. I think it would greatly simplify our implementation and make this less of a headache to maintain. Problem is that this would be considered a breaking change I believe since we have not documented Fleet's preconfiguration settings as experimental: https://www.elastic.co/guide/en/kibana/current/fleet-settings-kb.html

We have some telemetry on non-default Kibana configurations that are in use. Let me see if I can find any data on how many customers are using preconfiguration. If it is low enough I think we can consider the change.
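
As an illustration of what a required id could look like at validation time, here is a rough sketch. It assumes a simplified entry shape and a hypothetical `findEntriesMissingId` helper. It is not Kibana's config schema, and the thread does not say how the check would actually be implemented.

```typescript
// Hypothetical, simplified shape of one package policy entry listed under a
// preconfigured agent policy. The id is optional here, mirroring the current
// behavior that the proposal would tighten.
interface PreconfiguredPackagePolicyInput {
  id?: string | number;
  name: string;
  package: { name: string };
}

// Collect one error message per entry that does not specify an id.
function findEntriesMissingId(entries: PreconfiguredPackagePolicyInput[]): string[] {
  return entries
    .filter((e) => e.id === undefined || String(e.id).trim() === '')
    .map((e) => `package policy "${e.name}" (${e.package.name}) must specify an id`);
}

// Usage: the first entry has no id and is reported; the second passes.
const errors = findEntriesMissingId([
  { name: 'fleet_server-1', package: { name: 'fleet_server' } },
  { id: 'system-1', name: 'system-1', package: { name: 'system' } },
]);
console.log(errors);
```

Rejecting such entries at startup is exactly the kind of behavior change that would count as breaking for existing kibana.yml files, which is why the usage numbers matter.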

@nchaulet
Member

> Is there a reason why it was restricted only to managed policies? I think the logic should check for all preconfigured policies if package policies are missing (many reported issues are having this bug for Default Fleet Server policy in on-prem, which is not a managed policy).

For non-managed policies, one of the requirements was that the user be able to delete package policies. I'm not sure how that will work if we recreate the package policy on each setup.
It would really help if policies created through preconfiguration were editable only through kibana.yml, so that we have a single source of truth.

@juliaElastic
Contributor Author

Yeah, I thought about that too. I mentioned it here: #125105 (comment)

@joshdover
Contributor

> We have some telemetry on non-default Kibana configurations that are in use. Let me see if I can find any data on how many customers are using preconfiguration. If it is low enough I think we can consider the change.

I checked this out and <0.25% of all clusters are using custom agent policies in preconfiguration. I don't know what % of those are specifying an id but I think this very small level of usage warrants considering making the id required if it allows us to implement this in a safer more stable way.

@juliaElastic
Contributor Author

> I checked this out and <0.25% of all clusters are using custom agent policies in preconfiguration. I don't know what % of those are specifying an id but I think this very small level of usage warrants considering making the id required if it allows us to implement this in a safer more stable way.

Okay, we can make the id required as a separate enhancement.

However as for this bug, should we allow recreating preconfigured non-managed policies? It sounds like it is not the best approach since it is a requirement not to recreate deleted non-managed policies.

Alternatively we can leave this bug as is, and see how much removing the dependency on EPR reduces the issue occurrence.

@joshdover
Contributor

> Alternatively we can leave this bug as is, and see how much removing the dependency on EPR reduces the issue occurrence.

I'm +1 on this for now unless we see this issue more widespread in 8.1+ clusters

@joshdover
Contributor

Dropping priority here since we don't have a reasonable solution that wouldn't require breaking changes to the preconfiguration behavior. Will revisit as needed or consider solving in an IaC feature.

@ruckc

ruckc commented May 10, 2022

I'm seeing the issue on a fresh 8.1.0 installation, with Elasticsearch, Fleet Server, and agents deployed by eck-operator.

@joshdover
Contributor

@ruckc Could you share the xpack.fleet part of your Kibana’s config? It should match something similar to the example here: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-elastic-agent-fleet-quickstart.html

@Duri9292

Hello team,

I have the same problem in 7.17.10. I raised a case (#01387626) with your support team, but so far there has been no response. I added the 'fleet_server policy integration' to the Default Fleet Policy, but then I encountered the second bug, mentioned in #125105.

Is there a workaround that would let us install the Fleet agent?

@juliaElastic
Contributor Author

juliaElastic commented Jun 23, 2023

@Duri9292 I suppose this is on a self-managed cluster? You could try adding a new policy with the Fleet Server integration and installing Fleet Server to that policy, so it doesn't clash with the preconfiguration.
We can look at your cluster details for more specific help on the case.
