fleet-server doesn't boot up in time #359

mtojek · 2021-05-17T13:12:08Z

With the latest PR, elastic/integrations#988, introduced by @ruflin, we noticed some flakiness:

https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Fintegrations/detail/master/315/

[2021-05-17T06:06:31.685Z] Successfully built 4da27b95fc0f
[2021-05-17T06:06:31.685Z] Successfully tagged elastic-package-stack_package-registry:latest
[2021-05-17T06:06:31.685Z] 2021/05/17 06:06:28 DEBUG running command: /usr/local/bin/docker-compose -f /var/lib/jenkins/workspace/gest-manager_integrations_master/.elastic-package/stack/snapshot.yml -p elastic-package-stack up -d
[2021-05-17T06:06:31.685Z] Creating network "elastic-package-stack_default" with the default driver
[2021-05-17T06:06:31.685Z] Creating elastic-package-stack_package-registry_1 ... 
[2021-05-17T06:06:31.685Z] Creating elastic-package-stack_elasticsearch_1    ... 
[2021-05-17T06:06:33.600Z] 
Creating elastic-package-stack_package-registry_1 ... done
Creating elastic-package-stack_elasticsearch_1    ... done
Creating elastic-package-stack_package-registry_is_ready_1 ... 
[2021-05-17T06:06:56.132Z] 
Creating elastic-package-stack_package-registry_is_ready_1 ... done
Creating elastic-package-stack_elasticsearch_is_ready_1    ... 
[2021-05-17T06:06:56.132Z] Creating elastic-package-stack_kibana_1                    ... 
[2021-05-17T06:07:14.248Z] 
Creating elastic-package-stack_kibana_1                    ... done
Creating elastic-package-stack_elasticsearch_is_ready_1    ... done
Creating elastic-package-stack_kibana_is_ready_1           ... 
[2021-05-17T06:07:14.248Z] Creating elastic-package-stack_fleet-server_1              ... 
[2021-05-17T06:08:21.974Z] 
Creating elastic-package-stack_kibana_is_ready_1           ... done
Creating elastic-package-stack_fleet-server_1              ... done
[2021-05-17T06:08:21.974Z] ERROR: for elastic-agent  Container "bc53fb0a01cd" is unhealthy.
[2021-05-17T06:08:21.974Z] Encountered errors while bringing up the project.
[2021-05-17T06:08:21.974Z] Error: booting up the stack failed: running docker-compose failed: running command failed: running Docker Compose up command failed: exit status 1

which means that the fleet-server didn't start in time. Is it too slow now?

More logs:
https://beats-ci.elastic.co/job/Ingest-manager/job/integrations/job/master/315/artifact/build/elastic-stack-dump/latest/infoblox/logs/kibana.log
https://beats-ci.elastic.co/job/Ingest-manager/job/integrations/job/master/315/artifact/build/elastic-stack-dump/latest/infoblox/logs/fleet-server.log
https://beats-ci.elastic.co/job/Ingest-manager/job/integrations/job/master/315/artifact/build/elastic-stack-dump/latest/infoblox/logs/elasticsearch.log


ruflin · 2021-05-17T13:38:29Z

I'm trying to understand whether this is in any way related to the change I made in elastic/integrations#988. It should mainly have fixed the max connection setting so that it is correctly accepted, but it turns out it still doesn't work as expected. As the limit is 50+, this should not have an effect here. I wonder if this flakiness comes from another bugfix that was merged recently. I could not spot anything obvious in the logs that would explain the above. @blakerouse Any ideas?

It is also odd that it worked in "most" cases so I suspect it is a timing issue somehow.

mtojek · 2021-05-17T13:51:37Z

Did you introduce anything heavy last time? Something that can affect timing?

ruflin · 2021-05-17T14:40:39Z

I managed to reproduce this locally. What I see happening is that the fleet-server gets a new config from Fleet, so the fleet-server is restarted. This is not really heavy, but I think it is related to the timing. As it reloads, it is at first healthy, then not healthy, and then healthy again afterwards. I wonder if this reloading happened before.

The config in Fleet seems to be fully aligned with the default config setup. Maybe the fix to the field name changes is causing this. Are you setting any custom fleet-policy settings or just taking the defaults created by Fleet?

Is there any option in Docker to say: healthy only if the health check passed at least twice in a row?

mtojek · 2021-05-18T07:55:29Z

> As it reloads it is at first healthy then again not healthy and healthy again afterwards. I wonder if this reloading happened before.

The Docker engine considers a container healthy as soon as a single health check passes. Once the container is considered "healthy", the engine proceeds to boot up the next container. In this case the engine didn't proceed, which means that every check must have hit the unhealthy state.
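To make the reasoning above concrete, here is a minimal Python sketch of how a container's health status evolves under Docker's HEALTHCHECK rules as I understand them: one passing check flips the status to "healthy", and only `retries` consecutive failures flip it to "unhealthy". The function name `docker_health_status` is hypothetical, and the model is simplified (it ignores the start period and check timeouts).

```python
def docker_health_status(check_results, retries=3):
    """Simplified model of Docker's health status after each check.

    check_results: iterable of booleans, True meaning the check passed.
    retries: consecutive failures needed to report "unhealthy"
             (3 is Docker's documented default).
    Assumption: start_period and timeouts are ignored in this sketch.
    """
    status = "starting"
    failing_streak = 0
    history = []
    for passed in check_results:
        if passed:
            # A single success is enough to be considered healthy.
            failing_streak = 0
            status = "healthy"
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        history.append(status)
    return history


if __name__ == "__main__":
    # One early success makes the container healthy right away.
    print(docker_health_status([False, True, False, True]))
    # Only an unbroken run of failures ends in "unhealthy".
    print(docker_health_status([False, False, False]))
```

Under this model, a dependent service is blocked with an "unhealthy" error only if no check succeeded before the failure streak reached `retries`.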

> The config in Fleet seems to be fully aligned with the default config setup. Maybe the fix to the field name changes is causing this. Are you setting any custom fleet-policy settings or just taking the defaults created by Fleet?

This is the basic stack boot-up procedure with all default policies.

> Is there any option in docker to say: Healthy if at least passed healthy twice in a row?

No, that's why I opened this issue: elastic/beats#25341
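Since Docker has no built-in "healthy N times in a row" option, one possible workaround is to enforce the streak on the caller's side. The sketch below is only an illustration under assumptions: `wait_until_stable` and its parameters are hypothetical names, not part of elastic-package, Docker, or the fix proposed in elastic/beats#25341, and `check` stands for any callable that returns True when the service responds as healthy.

```python
import time


def wait_until_stable(check, required_successes=2, interval=1.0,
                      timeout=60.0, sleep=time.sleep, clock=time.monotonic):
    """Return True once `check` passes `required_successes` times in a row.

    Returns False if the timeout elapses first. `sleep` and `clock` are
    injectable so the function can be exercised without real waiting.
    """
    deadline = clock() + timeout
    streak = 0
    while True:
        # Any failed check resets the streak, so a restart that briefly
        # breaks the service is not mistaken for a stable state.
        streak = streak + 1 if check() else 0
        if streak >= required_successes:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Simulated fleet-server: healthy, then reloading, then healthy again.
    results = iter([True, False, True, True])
    stable = wait_until_stable(lambda: next(results), interval=0.0)
    print("stable:", stable)
```

With a healthy/unhealthy/healthy sequence like the one described above, the first isolated success is ignored and the function returns only after two consecutive passes.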

ruflin · 2021-05-19T07:06:43Z

Let's properly fix the health check. Have you seen the flakiness again recently?

mtojek · 2021-05-19T07:58:31Z

It happens to me from time to time while starting the stack locally, but so far it has failed on the CI only once.

andresrc · 2021-09-22T13:27:25Z

Is this still happening?

ph · 2022-01-28T21:11:56Z

@mtojek still happening?

mtojek · 2022-01-28T21:34:02Z

It's definitely stale, we can close it.

mtojek added the Team:Elastic-Agent and Team:Fleet labels May 17, 2021

ph closed this as completed Jan 31, 2022
