Change the destination of where failed script is written to #1530
Cannot you ...
The suggestion we got from the vendor team is to avoid any direct communication between the VM and S3, especially for frequent file exchanges. That is why their setup writes data only to specific pre-mounted folders, which later get automatically copied to S3. I guess we can try to mount it as you suggest and see whether it is feasible for large embarrassingly parallel workflows that involve many signature checks. I will report back here if it seems too slow.
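For reference, here is a minimal sketch of the staging pattern described above. The staging path and bucket name are made up, and the sync step is only triggered manually here for illustration; in the vendor's setup it runs as a separate automatic process using the standard `aws s3 sync` command:

```python
import subprocess
from pathlib import Path

# Hypothetical pre-mounted staging folder that the vendor's process syncs to S3.
staging = Path("/data/staging")
staging.mkdir(parents=True, exist_ok=True)

# The workflow only writes to the local folder...
(staging / "result.txt").write_text("done\n")

# ...and a separate process pushes it to S3, so the VM never talks to S3
# directly during the run.
subprocess.run(
    ["aws", "s3", "sync", str(staging), "s3://example-bucket/staging"],
    check=True,
)
```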
Let me know if it works for you. It is pretty easy to add this setting from the command line, but the problem is that ...
@BoPeng thank you! Currently we cannot test it properly because our vendor set it up so that the S3 bucket is mounted read-only. The automatic process runs the sync separately from our SoS run, so we do not really write there in real time. I am asking them to reconfigure it and am waiting for their response.
I think all we would need is to move the failed script (the "command log"?) elsewhere. That is what we are interested in. If it were written to a folder that we sync between the VM and S3, that would be best. We should be able to leverage that and test it out.
@BoPeng I'm sorry, it turns out we do need this feature -- to keep ... Can we change the default for SoS anyway to write the command log to the same folder where people set ...?
The problem is that the ...
Let me see if #1533 works.
Thank you @BoPeng, that is a very helpful pointer to where to modify it. I have changed it in https://github.com/vatlab/sos/pull/1533/files and it seems good. For example, for this script:
it says:
but when I set the stderr file explicitly:
the temporary script is written into the current folder properly:
Do you see any obvious issues with this patch? Please improve it as you see fit. I wonder if we can also release a new version to conda so we can pull the changes and apply them to our jobs on AWS. Thanks!
How about a combination of both patches? I think having the script directly in stderr can be useful, especially when stderr is the default (...
If we do that, for consistency and convenience, we should also write the script here:
Line 311 in 23bd5e9
This is because it is sometimes not entirely clear what went wrong with a task when it fails due to a variable replacement problem.
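A minimal sketch of the idea under discussion, assuming a plain text-mode `.err` file; the function name is made up and this is not the actual code at the line referenced above:

```python
def report_failure(err_path: str, script_path: str, message: str) -> None:
    """Append the error message and the location of the failed script to the
    task's .err file, so that a variable-replacement failure is visible from
    the task status without digging through the VM's temporary folders."""
    with open(err_path, "a") as err:  # text mode; see the mode discussion below
        err.write(message.rstrip() + "\n")
        err.write(f"Failed script saved to {script_path}\n")
```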
@BoPeng I brought the other patch back via ffeac9c, instead of writing the entire script, because the script can be very long in many applications. I think that by placing the error message into stderr, it should also be reflected in the task status, so we do not have to modify ... The patch does not work as is, however. The error message is:
I think this is because you opened the stderr file with ...
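The exact `open()` call is elided above, but assuming the file was opened in a binary mode (for example `'ab'`), here is a minimal illustration of the mismatch and two possible fixes:

```python
with open("task.err", "ab") as err:              # binary append mode (assumed)
    try:
        err.write("Failed to execute script\n")  # writing str raises TypeError
    except TypeError:
        err.write(b"Failed to execute script\n")  # bytes is accepted

# Alternatively, open the file in text mode and write str directly:
with open("task.err", "a") as err:
    err.write("Failed to execute script\n")
```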
OK, the patch is updated. It should work with or without the option.
I will clean up the code (pylint) and make a release.
sos 0.24.5 is released.
Thank you @BoPeng. It's not here yet: https://anaconda.org/conda-forge/sos but I guess it will show up soon?
Yes, it should be there a few hours after the PyPI release.
I am not sure that will be the case ... according to the release history, conda should already have version 0.24.4: https://pypi.org/project/sos/#history However, it is still at 0.24.3: https://anaconda.org/conda-forge/sos Perhaps some check failures are preventing it from getting onto conda-forge?
It was posted after I merged the PR.
This is typically what we see when a job fails:

The line

`Rscript /home/aw3600/.sos/97de343d7da3f0ce/susie_twas_1_0_ff982a12.R`

is how we track and try to reproduce the error for debugging. However, long story short, we are working with cloud computing, where `/home/aw3600/.sos/` is a path in the VM that gets destroyed after the command ends. Although it is possible to copy the entire `.sos` folder to a permanent AWS S3 bucket before the VM dies, it is non-trivial to sync the entire folder ... all we care about is this one file, `susie_twas_1_0_ff982a12.R`.

I think this was brought up once before, but I don't remember that we have an option for it yet -- can we specify something on the `sos run` interface to make these temporary scripts get saved to a given folder? I like the behavior of:

which writes the stderr and stdout to where I want them. I wonder if we can add something like:

and only keep the scripts in `/path/to/debug/folder` when there is an issue -- and change the `Failed to execute` prompt to point to this script?
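For illustration only, here is a rough sketch of the kind of behavior being requested. The helper name and the notion of a configurable debug folder are hypothetical and not part of the current SoS interface:

```python
import shutil
from pathlib import Path

def preserve_failed_script(script: Path, debug_dir: Path) -> str:
    """Hypothetical helper (not SoS API): when a task fails, copy its
    temporary script out of the ephemeral ~/.sos folder into a persistent,
    synced folder and return an error prompt that points at the copy."""
    debug_dir.mkdir(parents=True, exist_ok=True)
    kept = debug_dir / script.name
    shutil.copy2(script, kept)
    return f"Failed to execute {kept}"

# Example usage (paths are made up):
# print(preserve_failed_script(
#     Path("/home/aw3600/.sos/97de343d7da3f0ce/susie_twas_1_0_ff982a12.R"),
#     Path("/path/to/debug/folder")))
```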