fleet-server doesn't boot up in time #359

Closed
mtojek opened this issue May 17, 2021 · 9 comments
Labels
Team:Elastic-Agent Label for the Agent team Team:Fleet Label for the Fleet team

Comments

@mtojek
Contributor

mtojek commented May 17, 2021

After the latest PR, elastic/integrations#988, introduced by @ruflin, we noticed some flakiness:

https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/master/315/

[2021-05-17T06:06:31.685Z] Successfully built 4da27b95fc0f
[2021-05-17T06:06:31.685Z] Successfully tagged elastic-package-stack_package-registry:latest
[2021-05-17T06:06:31.685Z] 2021/05/17 06:06:28 DEBUG running command: /usr/local/bin/docker-compose -f /var/lib/jenkins/workspace/gest-manager_integrations_master/.elastic-package/stack/snapshot.yml -p elastic-package-stack up -d
[2021-05-17T06:06:31.685Z] Creating network "elastic-package-stack_default" with the default driver
[2021-05-17T06:06:31.685Z] Creating elastic-package-stack_package-registry_1 ... 
[2021-05-17T06:06:31.685Z] Creating elastic-package-stack_elasticsearch_1    ... 
[2021-05-17T06:06:33.600Z] 
Creating elastic-package-stack_package-registry_1 ... done
Creating elastic-package-stack_elasticsearch_1    ... done
Creating elastic-package-stack_package-registry_is_ready_1 ... 
[2021-05-17T06:06:56.132Z] 
Creating elastic-package-stack_package-registry_is_ready_1 ... done
Creating elastic-package-stack_elasticsearch_is_ready_1    ... 
[2021-05-17T06:06:56.132Z] Creating elastic-package-stack_kibana_1                    ... 
[2021-05-17T06:07:14.248Z] 
Creating elastic-package-stack_kibana_1                    ... done
Creating elastic-package-stack_elasticsearch_is_ready_1    ... done
Creating elastic-package-stack_kibana_is_ready_1           ... 
[2021-05-17T06:07:14.248Z] Creating elastic-package-stack_fleet-server_1              ... 
[2021-05-17T06:08:21.974Z] 
Creating elastic-package-stack_kibana_is_ready_1           ... done
Creating elastic-package-stack_fleet-server_1              ... done
[2021-05-17T06:08:21.974Z] ERROR: for elastic-agent  Container "bc53fb0a01cd" is unhealthy.
[2021-05-17T06:08:21.974Z] Encountered errors while bringing up the project.
[2021-05-17T06:08:21.974Z] Error: booting up the stack failed: running docker-compose failed: running command failed: running Docker Compose up command failed: exit status 1

which means that the fleet-server didn't start in time. Is it too slow now?

More logs:
https://beats-ci.elastic.co/job/Ingest-manager/job/integrations/job/master/315/artifact/build/elastic-stack-dump/latest/infoblox/logs/kibana.log
https://beats-ci.elastic.co/job/Ingest-manager/job/integrations/job/master/315/artifact/build/elastic-stack-dump/latest/infoblox/logs/fleet-server.log
https://beats-ci.elastic.co/job/Ingest-manager/job/integrations/job/master/315/artifact/build/elastic-stack-dump/latest/infoblox/logs/elasticsearch.log

@mtojek mtojek added Team:Elastic-Agent Label for the Agent team Team:Fleet Label for the Fleet team labels May 17, 2021
@ruflin
Contributor

ruflin commented May 17, 2021

I'm trying to understand whether this is in any way related to the change I made in elastic/integrations#988. It mainly should have fixed the max connection setting so that it is correctly accepted, but it turns out it still doesn't work as expected. As the limit is 50+, this should not have an effect here. I wonder if this flakiness comes from another bugfix that was merged recently. I could not spot anything obvious in the logs that would explain the above. @blakerouse Any ideas?

It is also odd that it worked in "most" cases so I suspect it is a timing issue somehow.

@mtojek
Contributor Author

mtojek commented May 17, 2021

Did you introduce anything heavy last time? Something that can affect timing?

@ruflin
Contributor

ruflin commented May 17, 2021

I managed to reproduce this locally and what I see happening is that the fleet-server gets a new config from Fleet so it restarts the fleet-server. This is not really heavy but I think it is related to the timing. As it reloads it is at first healthy then again not healthy and healthy again afterwards. I wonder if this reloading happened before.

The config in Fleet seems to be fully aligned with the default config setup. Maybe the fix to the field name changes is causing this. Are you setting any custom fleet-policy settings or just taking the defaults created by Fleet?

Is there any option in Docker to say: healthy only if the check passed at least twice in a row?

@mtojek
Contributor Author

mtojek commented May 18, 2021

> As it reloads it is at first healthy then again not healthy and healthy again afterwards. I wonder if this reloading happened before.

The Docker engine considers a container healthy as soon as its health check passes. Once the container is considered "healthy", the engine proceeds to boot up the next container. In this case the engine didn't proceed, which means that every check must have hit the unhealthy state.
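To make that reasoning concrete, here is a minimal sketch of how the health status evolves over a sequence of probe results. It is a simplified model, not the engine's code: the function name `health_timeline` is made up for illustration, `retries=3` is Docker's documented default rather than a value taken from our compose file, and the start period is ignored.

```python
from typing import Iterable, List


def health_timeline(results: Iterable[bool], retries: int = 3) -> List[str]:
    """Return the health status after each probe result, in order.

    Simplified model: a single passing probe marks the container healthy,
    and it takes `retries` consecutive failures to mark it unhealthy.
    """
    status = "starting"
    failures = 0  # number of consecutive failed probes
    timeline = []
    for ok in results:
        if ok:
            failures = 0
            status = "healthy"
        else:
            failures += 1
            if failures >= retries:
                status = "unhealthy"
        timeline.append(status)
    return timeline


# A restart that causes one failed probe between two passing ones
# never leaves the healthy state in this model.
print(health_timeline([True, False, True]))

# The container only goes from starting to unhealthy if every probe fails.
print(health_timeline([False, False, False]))
```

In this model a single passing probe is enough to report "healthy", so a dependent container would have been started at that point. Seeing "unhealthy" before anything else started is consistent with no probe having passed at all.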

> The config in Fleet seems to be fully aligned with the default config setup. Maybe the fix to the field name changes is causing this. Are you setting any custom fleet-policy settings or just taking the defaults created by Fleet?

This is the basic stack boot-up procedure with all default policies.

> Is there any option in Docker to say: healthy only if the check passed at least twice in a row?

No, that's why I opened this issue: elastic/beats#25341
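Since Docker has no such option, the "twice in a row" idea would have to live in the check itself or in a wrapper around it. A rough sketch of that debouncing logic, assuming a hypothetical `probe` callable that returns `True` when the service answers correctly (the name `wait_until_stable` and all defaults are illustrative, not part of any existing tool):

```python
import time
from typing import Callable


def wait_until_stable(
    probe: Callable[[], bool],
    required: int = 2,
    attempts: int = 30,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `probe` has passed `required` times in a row.

    Any failed probe resets the streak. Returns False if the streak is
    not reached within `attempts` probes.
    """
    streak = 0
    for attempt in range(attempts):
        if probe():
            streak += 1
            if streak >= required:
                return True
        else:
            streak = 0  # a failure in between discards earlier successes
        if attempt < attempts - 1:
            sleep(interval)
    return False


# Example: healthy, then unhealthy during a reload, then healthy again.
results = iter([True, False, True, True])
stable = wait_until_stable(lambda: next(results), sleep=lambda _: None)
print(stable)
```

With a sequence like the one @ruflin described (healthy, unhealthy, healthy again), the first passing probe is not trusted on its own. The wrapper only reports success after the service stays healthy across two consecutive probes.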

@ruflin
Contributor

ruflin commented May 19, 2021

Let's properly fix the health check. Have you seen the flakiness again recently?

@mtojek
Contributor Author

mtojek commented May 19, 2021

It happens to me from time to time while starting the stack locally, but so far it has failed on the CI only once.

@andresrc
Contributor

Is this still happening?

@ph
Contributor

ph commented Jan 28, 2022

@mtojek is this still happening?

@mtojek
Contributor Author

mtojek commented Jan 28, 2022

It's definitely stale, we can close it.

@ph ph closed this as completed Jan 31, 2022