This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

Question: Retrieve Deployment logs during deploy #1881

Closed
sepiroth887 opened this issue Jul 25, 2015 · 16 comments

Comments

@sepiroth887

I noticed that there is relatively little feedback during a deployment, which frequently leaves me wondering what's going on. You can look at the Marathon syslogs, but that is not a great way to work if you have many devs trying to deploy things.

Is there something I'm missing that would let me surface deployment details to users, so they know what's wrong and when to stop or roll back a deployment?

@pierluigi

Hi @sepiroth887,

Thanks for your feedback. We are going to concentrate on surfacing vital info regarding deployments in the UI in the coming weeks.

Do you have any specific use case you had in mind?

@gregory90
Contributor

I've got three use cases where I have to guess what's wrong (even the Mesos sandbox logs don't help):

  1. Deploying a Docker container with a nonexistent image name/tag, or when authentication to the Docker registry is required but not provided.
  2. When no offers were received for a given task (e.g. with a wrong constraint).
  3. When the Mesos "executor_registration_timeout" is exceeded.

Surfacing these would really help with troubleshooting user errors.

@pierluigi

@gregory90 thank you very much for your valuable input. We are going to concentrate on a revamp of the deployments UI soon, and these specific use cases definitely help us better understand the possible scenarios.

cc @air @aquamatthias

@sepiroth887
Author

Thanks for the quick reply! On top of the previously mentioned points, it would be great to get a stream of sandbox error logs / Docker logs during a deployment (or some other consumable per-deployment stream that exposes at least parts of this).

As far as the GUI is concerned, I'm looking forward to more functionality, but it would really need to go along with a good permissions model for our use cases. We are planning on using Marathon via the REST API with TeamCity or Jenkins.

@MikeMichel
Contributor

This is related to https://issues.apache.org/jira/browse/MESOS-2035
As the Mesos slave is the executor, it already has all the information, but it has to pass it to Marathon, where it could then be pushed via the event bus, which would be pretty awesome.

@sepiroth887
Author

Maybe a good first step would be to allow Marathon to return the URLs of the sandbox stdout and stderr logs, so we don't have to guess what they might be? (This was probably already covered in another issue, though.)

How granular is the event bus? Can I attach to it and consume only events for a specific app or group?

@kamilchm

See #878 for the sandbox URL.

@MikeMichel
Contributor

Since https://issues.apache.org/jira/browse/MESOS-2020 landed in Mesos 0.23.0, can Marathon work with this information now?

@aquamatthias
Contributor

@MikeMichel you can see the Last Task Failure with the reason now filled in correctly from mesos.
(E.g.: Failed to launch container: Failed to 'docker pull topface_test_wtf:latest': exit status = exited with status 1 stderr = time="2015-11-02T10:20:06Z" level=fatal msg="Error: image library/topface_test_wtf:latest not found")

@MikeMichel
Contributor

@aquamatthias filled in where? In the mesos logfiles? event bus?

@aquamatthias
Contributor

@MikeMichel See Application > Debug Tab > Last Task Failure. Or, from the REST API, see the JSON object App.lastTaskFailure.
Example:

        "lastTaskFailure": {
            "appId": "/frontend/github-pr-assigner/ui-components",
            "host": "srv3.hw.ca1.msg.com",
            "message": "Failed to launch container: Failed to 'docker pull build.msg.com:5000/rafael/github-pr-assigner-ui-components:latest': exit status = exited with status 1 stderr = time=\"2015-10-23T14:31:25Z\" level=fatal msg=\"Error response from daemon: v1 ping attempt failed with error: Get https://build.msg.com:5000/v1/_ping: dial tcp: lookup build.msg.com: no such host. If this private registry supports only HTTP or HTTPS with an unknown CA certificate, please add `--insecure-registry build.msg.com:5000` to the daemon's arguments. In the case of HTTPS, if you have access to the registry's CA certificate, no need for the flag; simply place the CA certificate at /etc/docker/certs.d/build.msg.com:5000/ca.crt\" \n",
            "state": "TASK_FAILED",
            "taskId": "frontend_github-pr-assigner_ui-components.bc00e777-7992-11e5-8b11-56b91e7a505b",
            "timestamp": "2015-10-23T14:31:25.745Z",
            "version": "2015-07-21T20:38:30.692Z",
            "slaveId": "20150618-112946-201330860-5050-2210-S0"
        },
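
For a CI job (TeamCity or Jenkins, as mentioned earlier in the thread), the same object can be read programmatically. The following is a minimal sketch, not Marathon's own client: it assumes the app is fetched from `/v2/apps/{appId}` and that the response may wrap the app definition in an `"app"` key. The helper names `fetch_app`, `last_task_failure`, and `summarize_failure` are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def fetch_app(base_url: str, app_id: str) -> dict:
    """Fetch an app definition as parsed JSON.

    Assumption: the app lives under /v2/apps/<appId>; base_url is
    something like "http://marathon.example:8080" (hypothetical host).
    """
    url = base_url.rstrip("/") + "/v2/apps/" + app_id.lstrip("/")
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def last_task_failure(app_response: dict) -> Optional[dict]:
    """Return the lastTaskFailure object, or None if none is recorded."""
    # Assumption: the response may wrap the app in an "app" key; otherwise
    # treat the whole document as the app definition.
    app = app_response.get("app", app_response)
    return app.get("lastTaskFailure")


def summarize_failure(failure: Optional[dict]) -> str:
    """Build a one-line summary suitable for a build log."""
    if failure is None:
        return "no task failure recorded"
    # The message can be long and multi-line; keep only its first line.
    lines = failure.get("message", "").strip().splitlines()
    first_line = lines[0] if lines else ""
    state = failure.get("state", "UNKNOWN")
    host = failure.get("host", "?")
    timestamp = failure.get("timestamp", "?")
    return f"{state} on {host} at {timestamp}: {first_line}"
```

A build step could call `summarize_failure(last_task_failure(fetch_app(base_url, app_id)))` after a deployment fails and print the result, so developers see the reason without opening the UI.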

@MikeMichel
Contributor

yay, that's a reason to update. thx!

@aquamatthias
Contributor

@gregory90
Use Case 1 should be solved: Please see Last Task Failure in the Debug Tab
Use Case 2 partially solved: We introduced the Waiting state as part of the Deployment in the UI. If your deployment enters the Waiting state, then you know that Marathon is waiting for offers that match the requirements. There are many possible reasons why no offer matches: not enough resources, impossible constraints, resource allocation starvation, etc. We still need to improve on surfacing the root cause, but the problem you described (no offers were received for a given task) is now visible.
Use Case 3 should be solved: if the executor_registration_timeout is exceeded the task launch is marked as failed. You can see this in App.lastTaskFailure or the Debug Tab
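
Outside the UI, a script can at least tell whether an app still has a deployment in flight, which is a rough stand-in for the Waiting state described above. This is a sketch under assumptions: it expects the list of deployments as already-parsed JSON (for example from `/v2/deployments`), with each entry carrying `id`, `affectedApps`, `currentStep`, and `totalSteps`. The function names are hypothetical, and it does not report why a deployment is waiting.

```python
from typing import List


def deployments_for_app(deployments: List[dict], app_id: str) -> List[dict]:
    """Return the in-flight deployments that affect the given app."""
    # Assumption: each deployment lists the app ids it touches under
    # "affectedApps"; entries without that key are ignored.
    return [d for d in deployments if app_id in d.get("affectedApps", [])]


def describe_deployment(deployment: dict) -> str:
    """Format a deployment's progress as '<id>: step <current>/<total>'."""
    dep_id = deployment.get("id", "?")
    current = deployment.get("currentStep", "?")
    total = deployment.get("totalSteps", "?")
    return f"{dep_id}: step {current}/{total}"
```

Polling this periodically and alerting when the same deployment stays on the same step for too long gives users a signal for when to check `lastTaskFailure` or consider a rollback.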

@aquamatthias
Contributor

@sepiroth887 You have access to every task status update via the event stream.
We will add events for accepted and denied resource offers to complete the information.
Please see Event Stream API Doc
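
On the earlier question of consuming only events for a specific app: one option is to filter the stream on the client side. The sketch below assumes the stream is delivered as server-sent events (`event:` / `data:` lines separated by blank lines), that task status updates arrive as `status_update_event`, and that their JSON payload carries an `appId` field. Those names should be checked against the Event Stream API doc, and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], dict]]:
    """Parse server-sent-event lines into (event_type, payload) pairs."""
    event_type: Optional[str] = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line terminates one event.
            if data_lines:
                yield event_type, json.loads("\n".join(data_lines))
            event_type, data_lines = None, []


def status_updates_for_app(lines: Iterable[str], app_id: str) -> Iterator[dict]:
    """Yield only the status update payloads that belong to app_id."""
    for event_type, payload in iter_events(lines):
        # Assumption: event name and "appId" field as described in the lead-in.
        if event_type == "status_update_event" and payload.get("appId") == app_id:
            yield payload
```

The `lines` argument can be any iterable of decoded text lines, for example the lines of an HTTP response opened with an `Accept: text/event-stream` header. Filtering by group would work the same way with a prefix check on `appId`.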

@meichstedt
Contributor

Note: This issue has been migrated to https://jira.mesosphere.com/browse/MARATHON-3485. For more information see https://groups.google.com/forum/#!topic/marathon-framework/khtvf-ifnp8.

@d2iq-archive d2iq-archive locked and limited conversation to collaborators Mar 27, 2017