Regarding the dswork framework. #1
Your workers are behaving as if the file for ds.sample.initInds has not been saved to disk. This information is supposed to travel from the master to the workers via the file [ds.sys.outdir 'ds/sys/distproc/savestate.mat']. If some workers saw that the file exists and others did not, there must be a synchronization issue; i.e., the failing workers are reading an old copy of savestate.mat. Can you post the file [ds.sys.outdir 'ds/sys/distproc/savestate.mat']? Also, can you describe the filesystem you're using?
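The failure mode described above can be illustrated with a minimal Python sketch. This is not dswork's code, which is MATLAB. It assumes a hypothetical JSON file, savestate.json, that stands in for savestate.mat: the master records the names of saved variables there, together with a version counter that it increments on every write. It also assumes that each worker is told through some separate channel which version to expect. A worker that reads an older copy of the file would wrongly conclude that ds.sample.initInds was never saved. Checking the version and re-reading lets the worker wait for the fresh copy instead of failing.

```python
import json
import os
import tempfile
import time

# Hypothetical stand-in for savestate.mat: a JSON file holding the list of
# variables the master has saved plus a version counter bumped on every write.

def master_write_state(path, version, saved_vars):
    """Write the state to a temp file, then rename it over the old file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"version": version, "saved_vars": sorted(saved_vars)}, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    os.replace(tmp, path)  # readers see the old file or the new one, never a partial one


def worker_wait_for_var(path, var, min_version, retries=5, delay=0.1):
    """Re-read the state file until it is at least min_version, then check for var.

    min_version is assumed (hypothetically) to be passed to the worker with its job.
    Trusting a copy older than min_version would make var look unsaved.
    """
    for _ in range(retries):
        with open(path) as f:
            state = json.load(f)
        if state["version"] >= min_version:
            return var in state["saved_vars"]
        time.sleep(delay)  # stale copy: wait and read again
    raise RuntimeError(
        f"state file still older than version {min_version} after {retries} reads"
    )
```

Because the rename replaces the file in one step, a reader never sees a half-written state file. The version check handles the separate problem of a reader that still sees the previous file, which can happen, for example, when a network filesystem client serves cached data.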
Sorry for the delay in replying, and thanks for your reply. I was on leave for a few days. I had kept the code on an NFS filesystem and the output on an XFS filesystem. After you mentioned the filesystem, I moved everything to the XFS filesystem, and now the code is running normally. Thanks
That's odd. Moving the code to the XFS filesystem shouldn't have made any difference, because dswork doesn't write anything to the code directory. I have run dswork without any issues on systems where the code is on NFS and the output directory is on a Lustre filesystem. In any case, a few failing workers shouldn't affect the integrity of the program, because the failed jobs will just be re-run on other workers. The only exception is the reduce phase of dsmapreduce. In that phase, jobs are assigned to specific workers, so one consistently failing worker can make the program get stuck. Let me know if the issue recurs.
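The difference between the two phases can be illustrated with a toy Python model. This is hypothetical and is not dswork's scheduler. In the map phase, any worker may pick up any pending job, so a job that fails on one worker is simply re-queued. In the reduce phase, each job is pinned to one worker, assumed here to be workers[i % len(workers)], so a consistently failing worker leaves its jobs unfinished.

```python
from collections import deque


def run_map_phase(jobs, workers, failing):
    """Map phase (toy model): any worker takes the next pending job; failures are re-queued."""
    if all(w in failing for w in workers):
        raise RuntimeError("no healthy workers")
    pending = deque(jobs)
    done = []
    while pending:
        for w in workers:
            if not pending:
                break
            job = pending.popleft()
            if w in failing:
                pending.append(job)  # another worker will pick it up later
            else:
                done.append(job)
    return sorted(done)


def run_reduce_phase(jobs, workers, failing, max_attempts=10):
    """Reduce phase (toy model): job i may only run on workers[i % len(workers)]."""
    done, stuck = [], []
    for i, job in enumerate(jobs):
        owner = workers[i % len(workers)]
        for _ in range(max_attempts):
            if owner not in failing:
                done.append(job)
                break
        else:
            stuck.append(job)  # every retry hit the same failing worker
    return sorted(done), stuck
```

Suppose there are three workers, ["w1", "w2", "w3"], six jobs, and w2 always fails. The map phase still finishes all six jobs. The reduce phase finishes jobs 0, 2, 3 and 5 but leaves jobs 1 and 4 stuck on w2.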
Currently I am not able to reproduce the same error. If it recurs, I will contact you and also send you the savestate.mat file as you suggested. If the filesystem change was not the actual solution, then I do not know how the problem got solved, but the code is currently working fine. Thanks
I was running the code provided for the Discriminative Mode Seeking paper and encountered an error in one of the log files. I was running it with 6 workers. Most of the workers completed their processes, but a few did not complete the jobs assigned to them and showed an error in their output log files. The last lines of the log are given below. The error occurred while the function sampleRandomPatchesbb() was running. I was not able to resolve the issue. Could you please help me solve it?
ans =
.ds.sample.initInds
ans =
1x1 struct array with no fields.
ismapreducer:1
Reference to non-existent field 'sample'.
file: [1x91 char]
name: 'dsmapredrun'
line: 5
MATLAB:nonExistentField
checkpassed =
Could you please help me resolve the issue?
Thanks,
Praveen