crab submit --dryrun executes cmsRun over and over again #7493
Comments
After a private chat with Stefano: this issue is related to #6544.
I can't reproduce it with CMSSW_10_6_12 [1]. One possibility is that in your case a
ATM my best guess is that the problem is triggered by this line
Indeed, it is not documented anywhere that in dryrun cmsRun has to run for at least 25 seconds. [1]
If my theory is correct, the only fix is to detect the "running time is not increasing" condition in the loop and exit with a message. OTOH it is quite unrealistic that someone bothers to use CRAB for something which never runs longer than 25 seconds per file.
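The "running time is not increasing" check mentioned here could look roughly like the sketch below. It is not the CRABClient code: `calibrate_events`, `run_trial`, and all parameters other than the 25-second threshold are hypothetical names chosen for illustration, and `run_trial` stands in for whatever launches one dry-run cmsRun and returns its running time in seconds.

```python
def calibrate_events(run_trial, max_seconds=25.0, start_events=10,
                     growth=2, max_stalls=2, tolerance=0.0):
    """Increase the number of events until one trial runs for at least max_seconds.

    run_trial(events) -> seconds is a hypothetical callable standing in for
    a single dry-run cmsRun execution.
    Raises RuntimeError if the running time stops increasing even though
    more events are requested (e.g. the input file has no more events).
    """
    events = start_events
    previous = None
    stalls = 0
    while True:
        seconds = run_trial(events)
        if seconds >= max_seconds:
            return events, seconds
        # A trial that is not longer than the previous one counts as a stall.
        if previous is not None and seconds <= previous + tolerance:
            stalls += 1
            if stalls >= max_stalls:
                raise RuntimeError(
                    "Running time is not increasing (last trial: %.1f s with %d "
                    "events); the job never reaches %.1f s" % (seconds, events, max_seconds)
                )
        else:
            stalls = 0
        previous = seconds
        events *= growth
```

Requiring more than one consecutive stall (`max_stalls`) is a design choice in this sketch, so that a single noisy timing does not abort the loop.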
UH! Thanks Stefano! You found the culprit. I am sorry, I did not look at the submit dryrun code before opening this issue, my bad! As I mentioned, in order to speed up the turnaround time when developing the jobwrapper, I am using
Today I tried to replicate this. The first time I tried, running a single dryrun job took 40s and everything worked smoothly. In the following attempts, the job took 15s and it got stuck. The time report showed a much shorter time required to "init" the job. If I have to guess, from the second time onward opening the connection to stream the input file is quicker.
I agree, we do not need to change any logic here.
I also agree with this. Maybe we can simply add something like

  while totalJobSeconds < maxSeconds:
+     if totalJobSeconds != 0:
+         self.logger.info("Last trial took only %s seconds. We are trying now with %s events", totalJobSeconds, events)
      optsList = getCMSRunAnalysisOpts('Job.submit', 'RunJobs.dag', job=1, events=events)

which gives
It does not identify the "issue" we are discussing here, but it is pretty simple and can definitely help a bit. If you want I can open the PR. Otherwise we can close this issue straight away.
Sure, go ahead with the PR, but please cut those numbers to 1 decimal for secs and integer for event,
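One possible way to apply that rounding to the message proposed above, assuming the values are a float number of seconds and a numeric event count. `format_trial_message` is a hypothetical helper used only for illustration; with a logger, the same format string and arguments could be passed directly to `logger.info`.

```python
def format_trial_message(total_job_seconds, events):
    """Format the trial message with 1 decimal for seconds and an integer event count."""
    return "Last trial took only %.1f seconds. We are trying now with %d events" % (
        total_job_seconds,
        int(events),  # truncate to an integer number of events
    )


print(format_trial_message(15.2345, 123))
```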
Of course nobody knows better than you that skipEvents should never be used with CRAB!
The most annoying part of dryrun, IMHO, is indeed that it prints nothing on stdout and keeps you waiting for quite some time. I'd prefer that it print cmsRun's stdout and tell you when it issues cmsRun, so that one can see that the time is spent in CMSSW initialization, not in CRAB things. But I still think that we should rather put our time into unifying with preparelocal.
The printout was fixed in dmwm/CRABClient#5184. Can close.
problem
I submitted a task with --dryrun (adapted from https://github.com/dmwm/CRABServer/blob/master/test/statusTrackingTasks/HC-1kj.py ) to prod with CMSSW 10 from lxplus7. The crab submit --dryrun command gets stuck at [1], and from another shell I notice that it keeps executing the same cmsRun command over and over again [2].
I could replicate the same behavior with CMSSW 12 on lxplus8.
ideal behavior
cmsRun should be run only once.
[1]
[2]