EC2 fleet instance unexpectedly terminated by ec2-fleet-plugin #355

Closed
inickp opened this issue Dec 9, 2022 · 2 comments

inickp commented Dec 9, 2022

Issue Details

Describe the bug
The build fails unexpectedly because the EC2 fleet instance is terminated by ec2-fleet-plugin.

Failing block from the build console:

11:24:49  9833f26c6bda: Layer already exists
11:24:49  8b77e88448a2: Layer already exists
11:24:49  latest: digest: sha256:25118fde3eb250f6b094694d922a57486e7cc645b9338e738b20fb352f1b9203 size: 6601
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withDockerRegistry
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // dir
[Pipeline] }
[Pipeline] // script
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Deploy)
[Pipeline] script
[Pipeline] {
[Pipeline] isUnix
11:24:49  ERROR: Issue with creating launcher for agent i-0b0c5e5c9e0a61c6c. The agent has not been fully initialized yet
11:24:49  ERROR: Issue with creating launcher for agent i-0b0c5e5c9e0a61c6c. The agent has not been fully initialized yet
[Pipeline] withEnv
[Pipeline] {
[Pipeline] sh
11:24:49  cf-spot-agents-arm i-0b0c5e5c9e0a61c6c was marked offline: Connection was broken: java.io.EOFException
11:24:49  	at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2905)
11:24:49  	at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3400)

com.amazon.jenkins.ec2fleet log:

cf-spot-agents-arm [cf-spot-agents-arm] start cloud com.amazon.jenkins.ec2fleet.EC2FleetCloud@3f9cb16e
Dec 09, 2022 1:24:43 PM FINE com.amazon.jenkins.ec2fleet.EC2FleetCloud
cf-spot-agents-arm [cf-spot-agents-arm] Fleet instances: [i-0b0c5e5c9e0a61c6c]
Dec 09, 2022 1:24:43 PM FINE com.amazon.jenkins.ec2fleet.EC2FleetCloud
cf-spot-agents-arm [cf-spot-agents-arm] Described instances: [i-0b0c5e5c9e0a61c6c]
Dec 09, 2022 1:24:43 PM FINE com.amazon.jenkins.ec2fleet.EC2FleetCloud
cf-spot-agents-arm [cf-spot-agents-arm] Jenkins nodes: [i-0b0c5e5c9e0a61c6c]
Dec 09, 2022 1:24:43 PM FINE com.amazon.jenkins.ec2fleet.EC2FleetCloud
cf-spot-agents-arm [cf-spot-agents-arm] setting stats
Dec 09, 2022 1:24:49 PM INFO com.amazon.jenkins.ec2fleet.EC2FleetAutoResubmitComputerLauncher afterDisconnect
DISCONNECTED: cf-spot-agents-arm i-0b0c5e5c9e0a61c6c
Dec 09, 2022 1:24:49 PM INFO com.amazon.jenkins.ec2fleet.EC2FleetAutoResubmitComputerLauncher afterDisconnect
Start retriggering executors for cf-spot-agents-arm i-0b0c5e5c9e0a61c6c
Dec 09, 2022 1:24:49 PM INFO com.amazon.jenkins.ec2fleet.EC2FleetAutoResubmitComputerLauncher afterDisconnect
RETRIGGERING: org.jenkinsci.plugins.workflow.job.WorkflowJob@7179166b[web-deploy] - WITH ACTIONS: [hudson.model.ParametersAction@2010c5d6]

AWS events history for the EC2 instance:

{
    "eventVersion": "1.08",
    "userIdentity": {
        ...
        "userName": "deploy"
    },
    "eventTime": "2022-12-09T10:24:49Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "TerminateInstances",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "...",
    "userAgent": "ec2-fleet-plugin, aws-sdk-java/1.12.246 Linux/5.4.0-132-generic OpenJDK_64-Bit_Server_VM/11.0.15+10 java/11.0.15 groovy/2.4.21 vendor/Eclipse_Adoptium cfg/retry-mode/legacy",
    "requestParameters": {
        "instancesSet": {
            "items": [
                {
                    "instanceId": "i-0b0c5e5c9e0a61c6c"
                }
            ]
        }
    },
    "responseElements": {
        "requestId": "2d67b40e-3b47-42b5-8a16-b26830a12117",
        "instancesSet": {
            "items": [
                {
                    "instanceId": "i-0b0c5e5c9e0a61c6c",
                    "currentState": {
                        "code": 32,
                        "name": "shutting-down"
                    },
                    "previousState": {
                        "code": 16,
                        "name": "running"
                    }
                }
            ]
        }
    },
    "requestID": "2d67b40e-3b47-42b5-8a16-b26830a12117",
    "eventID": "1d4247d8-483b-4692-9184-3d7553ffcfe3",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "...",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "ec2.us-east-1.amazonaws.com"
    }
}
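For anyone triaging similar terminations, the check made by eye on the record above can be sketched in Python. This is a minimal sketch, not part of the plugin: the function name `plugin_terminated_instances` is hypothetical, and it assumes the plugin's requests can be recognized by a `userAgent` that starts with `ec2-fleet-plugin`, as in the event above.

```python
import json

# Assumption: the plugin identifies itself at the start of the userAgent
# string, as seen in the CloudTrail record above.
PLUGIN_AGENT_PREFIX = "ec2-fleet-plugin"


def plugin_terminated_instances(event):
    """Return the instance IDs that one CloudTrail record shows the plugin terminating."""
    if event.get("eventName") != "TerminateInstances":
        return []
    if not event.get("userAgent", "").startswith(PLUGIN_AGENT_PREFIX):
        return []
    request = event.get("requestParameters") or {}
    items = request.get("instancesSet", {}).get("items", [])
    return [item["instanceId"] for item in items]


# Trimmed copy of the record above, keeping only the fields the check reads.
sample = json.loads("""
{
    "eventTime": "2022-12-09T10:24:49Z",
    "eventName": "TerminateInstances",
    "userAgent": "ec2-fleet-plugin, aws-sdk-java/1.12.246",
    "requestParameters": {
        "instancesSet": {"items": [{"instanceId": "i-0b0c5e5c9e0a61c6c"}]}
    }
}
""")

print(plugin_terminated_instances(sample))  # ['i-0b0c5e5c9e0a61c6c']
```

An empty list means the record is either not a `TerminateInstances` call or was issued by some other client, for example an ASG scale-in or a manual action.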

To Reproduce
The behavior did not start after any identifiable event such as a config change or an update, so it is hard to say how to reproduce it. The build usually fails after the stage that builds the application Docker image (a 23-step build): sometimes right after pushing to the registry and proceeding to the next stage, sometimes before pushing, and occasionally at a later stage. It is not observed on quick builds.

  1. Start the build on EC2 fleet agent.
  2. Build fails in the middle.

Environment Details

Plugin Version?
2.5.1

Jenkins Version?
2.346.2

Spot Fleet or ASG?
ASG

Label based fleet?
No

Linux or Windows?
Linux

EC2Fleet Configuration as Code
Unfortunately, I don't have the Configuration as Code plugin installed. Is there another good way to provide the configuration?

Anything else unique about your setup?
No

@inickp inickp added the bug label Dec 9, 2022
@ppodbielski-cloudentity

👍

@pdk27
Collaborator

pdk27 commented Aug 3, 2023

@inickp Looks like the issue here is that the plugin terminated a busy instance. This has been addressed in ec2-fleet-2.7.0. Also, this version surfaces the reason why an instance was terminated by the plugin, one of these. Hope that helps.
See related issue: #363

If your specific issue persists after upgrading, feel free to re-open the issue and include your cloud configuration. If you don't use Configuration as Code, just list the configuration manually or add a screenshot, whichever is easier.
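
To tell quickly whether an installation predates the fix mentioned above, the version strings can be compared numerically. This is a minimal sketch; the helper names are hypothetical, and it assumes plain dotted numeric versions such as 2.5.1 with no suffixes.

```python
FIXED_VERSION = "2.7.0"  # version named in this thread as containing the fix


def parse_version(version):
    """Turn a dotted numeric version such as '2.5.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def predates_fix(installed, fixed=FIXED_VERSION):
    """Return True when the installed plugin version is older than the fixed one."""
    return parse_version(installed) < parse_version(fixed)


print(predates_fix("2.5.1"))  # True: the version reported in this issue
print(predates_fix("2.7.0"))  # False
```

Comparing tuples of integers avoids the string-comparison pitfall where "2.10.0" would sort before "2.7.0".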

@pdk27 pdk27 closed this as completed Aug 3, 2023