Replace the usage of tianon/true images with docker compose up --wait #1720
Conversation
Thanks @pkoutsovasilis, this change looks interesting!
I added some comments about it. I see that there is at least one behavior difference: the startup process is not retried if the elastic-agent container fails, as it is in main (non-detached mode).
Another concern about this PR is that it would break the current support of elastic-package with docker-compose V1. Should this support be removed as part of the future 1.0 version? @jsoriano
It is also true that Docker itself has dropped that support (docker-compose release notes):
Effective July 2023, Compose V1 stopped receiving updates and is no longer in new Docker Desktop releases.
var allServices []string
for _, aService := range services {
	allServices = append(allServices, aService, fmt.Sprintf("%s_%s", aService, readyServicesSuffix))
}
If docker-compose V1 does not need to be supported, all the _is_ready containers can be removed. It looks like the readyServicesSuffix constant could then be deleted as well.
If that is the case, the code of the Status function in internal/stack/status.go could also be simplified.
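For illustration only, here is a minimal sketch of what that loop could reduce to once the *_is_ready services and the readyServicesSuffix constant are gone (assuming the surrounding code keeps its current shape; this is not code from the PR):

```go
// Without the *_is_ready companions there is no suffixed name to append,
// so the list of services to bring up is just the service names themselves.
allServices := make([]string, 0, len(services))
allServices = append(allServices, services...)
```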
Those container definitions are part of the services required for testing (system tests).
Regarding the status command, it should not be affected, since that command is intended to show information only about the containers related to the Stack (Kibana, Package Registry, Elasticsearch, etc.). It uses the docker-compose project from the profile:
elastic-package/internal/stack/compose.go (line 189 in 62fe1c9):
containerIDs, err := docker.ContainerIDsWithLabel(projectLabelDockerCompose, DockerComposeProjectName(options.Profile))
About the test packages, I am not sure what the best option would be.
Currently, the servicedeployer runs in detached mode, using the -d docker-compose flag (the same flags are set in several places):
ExtraArgs: []string{"--build", "-d"},
About the test packages, if servicedeployer is not updated with the new flags, those containers should be kept. An explicit method is also run to wait for the containers to be ready/healthy:
err = p.WaitForHealthy(ctx, opts)
@jsoriano Should servicedeployer be updated too (Up options) with these new flags? Or should the current implementation be kept? As they are running with -d, it looks safe.
About the test package, it could be removed... but in the integrations repository they would keep using that container. Probably the tianon/true container could be kept, to be sure that case is also tested. WDYT?
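If servicedeployer were updated, the change might be as small as extending those ExtraArgs. This is only a sketch assuming the current options keep their shape; the 600-second timeout simply mirrors what this PR uses for the stack and is not something servicedeployer does today:

```go
// Hypothetical: pass the same wait flags the stack now uses, so docker
// compose blocks until the service containers report healthy.
ExtraArgs: []string{"--build", "-d", "--wait", "--wait-timeout", "600"},
```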
@@ -111,7 +111,7 @@ func dockerComposeUp(options Options) error {
 	var args []string
 	if options.DaemonMode {
-		args = append(args, "-d")
+		args = append(args, "-d", "--wait", "--wait-timeout", fmt.Sprintf("%d", 600))
Testing on the main branch without using detached mode (-d), if the elastic-agent container fails, the docker-compose up is retried:
dependency failed to start: container elastic-package-stack-elastic-agent-1 exited (1)
2024/03/13 19:13:05 DEBUG output command: /usr/bin/docker ps -a --filter label=com.docker.compose.project=elastic-package-stack --format {{.ID}}
2024/03/13 19:13:05 DEBUG output command: /usr/bin/docker inspect d5df6764fb51 09f479d5637e 644fbbaa8557 341a68995249 831674f8c23f 920f6d104b80 25de6ef50f35 8cd0417a9622 afa7b068ae1b 44f10e51526b
Elastic Agent failed to start, trying again in 10s.
2024/03/13 19:13:16 DEBUG running command: /usr/bin/docker compose version --short
2024/03/13 19:13:16 DEBUG Determined Docker Compose version: 2.24.6
2024/03/13 19:13:16 DEBUG running command: /usr/bin/docker compose -f /home/mariorodriguez/.elastic-package/profiles/default/stack/snapshot.yml -p elastic-package-stack up
[+] Running 9/0
With this change, it looks like elastic-package does not get the error (elastic-agent failed to start) and therefore cannot retry the docker-compose up, leaving the scenario with the elastic agent in exited status. It looks like the is_ready containers help in this case, not sure how.
@mrodm this is weird, as the e2e tests in CI have caught one case where this happens. Can you help with an example that hopefully reproduces what you see?
this is weird, as the e2e tests in CI have caught one case where this happens.

Steps in CI use detached mode (-d):
elastic-package stack up -d -v

Can you help with an example that hopefully reproduces what you see?

Sure! Here I was referring to the case where that flag is not used:
elastic-package stack up -v
Not all runs of this command fail with this, so it needs to be repeated until that error is hit.
After some retries running the above command, the elastic-agent-1 container could not start, and elastic-package did not try to restart the container as it would have done before:
elastic-agent-1 | {"log.level":"info","@timestamp":"2024-03-26T11:51:50.649Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":505},"message":"1st enrollment attempt failed, retrying for 10m0s, every 1m0s enrolling to URL: https://fleet-server:8220/","ecs.version":"1.6.0"}
elastic-agent-1 | Error: fail to enroll: fail to execute request to fleet-server: EOF
elastic-agent-1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.12/fleet-troubleshooting.html
elastic-agent-1 | Error: enrollment failed: exit status 1
elastic-agent-1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.12/fleet-troubleshooting.html
elastic-agent-1 exited with code 1
And the status of the cluster:
$ elastic-package stack status
Status of Elastic stack services:
╭──────────────────┬─────────┬───────────────────╮
│ SERVICE │ VERSION │ STATUS │
├──────────────────┼─────────┼───────────────────┤
│ elastic-agent │ 8.12.2 │ exited (1) │
│ elasticsearch │ 8.12.2 │ running (healthy) │
│ fleet-server │ 8.12.2 │ running (healthy) │
│ kibana │ 8.12.2 │ running (healthy) │
│ package-registry │ latest │ running (healthy) │
╰──────────────────┴─────────┴───────────────────╯
Running with the same options on the latest published version, it does retry:
elastic-agent-1 | {"log.level":"info","@timestamp":"2024-03-26T12:01:12.540Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":505},"message":"1st enrollment attempt failed, retrying for 10m0s, every 1m0s enrolling to URL: https://fleet-server:8220/","ecs.version":"1.6.0"}
elastic-agent-1 | Error: fail to enroll: fail to execute request to fleet-server: dial tcp 192.168.192.6:8220: connect: connection refused
elastic-agent-1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.12/fleet-troubleshooting.html
elastic-agent-1 | Error: enrollment failed: exit status 1
elastic-agent-1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.12/fleet-troubleshooting.html
elastic-agent-1 exited with code 1
dependency failed to start: container elastic-package-stack-elastic-agent-1 exited (1)
2024/03/26 13:01:12 DEBUG output command: /usr/bin/docker ps -a --filter label=com.docker.compose.project=elastic-package-stack --format {{.ID}}
2024/03/26 13:01:12 DEBUG output command: /usr/bin/docker inspect 0ebea2626c5e b5873a01c18c 94ec7181f20c 0449f8bf38ef a8c2d3eaf656 cc5633104400 1916c6b5f1dc 25baea050703 b1af9e72d4f2 2f268ae81ff3
Elastic Agent failed to start, trying again in 10s.
2024/03/26 13:01:22 DEBUG running command: /usr/bin/docker compose version --short
2024/03/26 13:01:22 DEBUG Determined Docker Compose version: 2.25.0
2024/03/26 13:01:22 DEBUG running command: /usr/bin/docker compose -f /home/mariorodriguez/.elastic-package/profiles/default/stack/snapshot.yml -p elastic-package-stack up
WARN[0000] /home/mariorodriguez/.elastic-package/profiles/default/stack/snapshot.yml: `version` is obsolete
[+] Running 9/0
✔ Container elastic-package-stack-package-registry-1 Running 0.0s
✔ Container elastic-package-stack-elasticsearch-1 Running 0.0s
✔ Container elastic-package-stack-package-registry_is_ready-1 Created 0.0s
✔ Container elastic-package-stack-elasticsearch_is_ready-1 Created 0.0s
✔ Container elastic-package-stack-kibana-1 Running 0.0s
✔ Container elastic-package-stack-kibana_is_ready-1 Created 0.0s
✔ Container elastic-package-stack-fleet-server-1 Running 0.0s
✔ Container elastic-package-stack-fleet-server_is_ready-1 Created 0.0s
✔ Container elastic-package-stack-elastic-agent-1 Created 0.0s
Attaching to elastic-agent-1, elastic-agent_is_ready-1, elasticsearch-1, elasticsearch_is_ready-1, fleet-server-1, fleet-server_is_ready-1, kibana-1, kibana_is_ready-1, package-registry-1, package-registry_is_ready-1
that's odd 🥲 hmmm I see, I think this is what happens: with the *_is_ready services in place we have dependencies on the actual services, so when elastic-agent fails, the respective _is_ready service can't start because its dependency failed completely, and thus docker compose up returns an error in the sense of "sorry, I couldn't bring up all the services". However, when we remove the *_is_ready services this isn't triggered, because nothing depends on elastic-agent, and thus up considers that everything has been brought up and the user has to deal with any errors visible in the logs... I don't have an immediate fix for that
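One possible direction (not something this PR implements; the helper below and its name are purely hypothetical) would be for elastic-package to inspect the agent container itself after docker compose up --wait returns, so its existing "trying again in 10s" retry could still kick in:

```go
package stack

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// agentExited reports whether the given container ended up in the "exited"
// state with a non-zero exit code, using `docker inspect` (which
// elastic-package already shells out to, as shown in the debug logs above).
func agentExited(ctx context.Context, containerName string) (bool, error) {
	out, err := exec.CommandContext(ctx, "docker", "inspect",
		"--format", "{{.State.Status}} {{.State.ExitCode}}", containerName).Output()
	if err != nil {
		return false, err
	}
	fields := strings.Fields(string(out))
	if len(fields) != 2 {
		return false, fmt.Errorf("unexpected docker inspect output: %q", string(out))
	}
	return fields[0] == "exited" && fields[1] != "0", nil
}
```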
I think we can remove support for v1, it was completely discontinued last year and is not included anymore in Docker Desktop distributions. The migration to v2 is straightforward, and should be transparent for anyone using a recent docker version, since #1592. No need to relate this change to #1158.
💚 Build Succeeded
after speaking with @jsoriano we decided that it is better to introduce a milder change to support native arm64 execution while exploiting docker-compose
After having a look at the code I realised that the tianon/true images, as well as the respective docker compose *_is_ready services, are required only in the detached mode of docker compose up. Specifically, based on the dependencies of services, docker compose will detach when all services are reported as healthy and only the *_is_ready ones are in the Started state. So elastic-package stack up -d -v would look like this image. Notice that Done is being printed while the *_is_ready services are at Started.

This PR substitutes the usage of those services with the built-in docker compose --wait for detached mode, which results in the following image. Notice that Done is being printed when all services are reported Healthy.

As for the compose up attached mode, this essentially remains the same; as before, the invocation will immediately transition to the container logs.

Why should we try to avoid the tianon/true images?! Because many devs use Apple-silicon MacBooks, and this gets us a step closer to running natively without any ISA emulation. This last bit, assuming this PR makes it in, requires only elastic-package-registry and fleet-server to offer arm64 images, and native execution here we go.

As always, I may have missed something in my logic above, so please feel free to correct me 🙂

This PR closes #1708 (PS: @mrodm I told you in EAH that I didn't open an issue because I wanted to open a PR, but joking aside, thanks for capturing this in an issue 😄)