This repository has been archived by the owner on Dec 5, 2019. It is now read-only.

Record log URI in spark job runs #477

Open
robhudson opened this issue May 18, 2017 · 3 comments
Comments

@robhudson (Member)

This is so that we can more easily associate logs with a specific run.

@robhudson (Member, Author)

See #520 for details on how the EMR logs differ from the Spark job logs created by the batch script.

In the meeting today we decided it would be best to update the batch.sh file to create the log files with a more deterministic name that we can use on the Python side. One idea was the job name + cluster job ID, if possible.

@maurodoglio Would the above cause any problems that you know of? Is it possible to get the cluster job ID into the batch script?
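The deterministic-name idea above could be sketched roughly as follows. This is only an illustration: the `run_log_key` helper, the `logs/` prefix, and the timestamp format are all assumptions for this sketch, not anything decided in the issue.

```python
from datetime import datetime

def run_log_key(job_name, jobflow_id, started_at):
    """Build a deterministic log file key from the job name and the
    EMR cluster's jobflow ID, so the Python side can compute the same
    URI later without having to list the bucket contents."""
    return "logs/{}/{}/{}.log".format(
        job_name, jobflow_id, started_at.strftime("%Y-%m-%dT%H-%M-%S")
    )

print(run_log_key("my-spark-job", "j-1ABC2DEF3GHIJ", datetime(2017, 5, 18, 12, 0, 0)))
```

Both sides (batch.sh when writing the log, the Python app when recording the run) would then derive the same URI from the same two inputs.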

@rafrombrc rafrombrc modified the milestones: m4, m3 Jun 22, 2017
@maurodoglio (Contributor) commented Jun 22, 2017

Would the above cause any problems that you know of?

I don't think so.

Is it possible to get the cluster job ID into the batch script?

I think so. You can probably use the AWS CLI and filter the list of running clusters by some attribute accessible from the machine (maybe the hostname?).

@robhudson (Member, Author)

Here's an example of pulling the jobflow ID out of the running cluster:
https://gist.github.com/robotblake/7b08526b7a411739cd4c344476dd0860

This could be inserted into the job flow steps before batch.sh runs, to pass along the jobflow_id.
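For reference, the core of this approach can be sketched without calling the AWS API at all: EMR cluster nodes carry a node-local job-flow.json file describing the cluster, and its jobFlowId field is the cluster job ID. This is a minimal sketch assuming that file layout; the path may vary across EMR AMI versions, and the linked gist should be treated as the authoritative version.

```python
import json

# Assumed path on EMR cluster nodes; verify against your EMR AMI version.
JOB_FLOW_INFO = "/mnt/var/lib/info/job-flow.json"

def get_jobflow_id(path=JOB_FLOW_INFO):
    """Read the EMR jobflow ID (e.g. "j-1ABC2DEF3GHIJ") from the
    node-local job flow description file."""
    with open(path) as f:
        return json.load(f)["jobFlowId"]
```

batch.sh could invoke something like this (or an equivalent one-liner) and append the resulting ID to the log file name it creates.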

@rafrombrc rafrombrc modified the milestones: m5, m4 Aug 23, 2017