add support for CRAB3_CACHE_FILE=/dev/null #5337

Closed
belforte opened this issue Oct 18, 2024 · 11 comments

@belforte
Member

ref. https://cms-talk.web.cern.ch/t/adapting-usage-of-crab3-file-in-case-multiple-processes-trying-to-access-it/54334/4

current code does

DEBUG 2024-10-18 13:00:15.261 UTC: 	 Executing command: 'status'
DEBUG 2024-10-18 13:00:15.261 UTC: 	 Could not find CRAB cache file /dev/null; creating a new one.
ERROR 2024-10-18 13:00:15.302 UTC: 	 Unhandled Exception!
ERROR 2024-10-18 13:00:15.302 UTC: 	 [Errno 13] Permission denied: '/dev/null.1120937'
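
The `.1120937` suffix in the error suggests that the cache is first written to a sibling temporary file named after the cache file and then moved into place; that is an assumption based on the log only, not something confirmed in this thread. Under that assumption, a minimal sketch of a guard that treats `/dev/null` as "keep no cache" could look like the following. The function name `save_cache` and the JSON format are hypothetical and are not taken from CRABClient.

```python
import json
import os


def save_cache(cache_file, data):
    """Write `data` to `cache_file`; return False when caching is disabled.

    Hypothetical helper for illustration, not the CRABClient implementation.
    """
    # Guard: /dev/null means "do not keep a cache", so never try to create
    # a temporary file such as /dev/null.<pid> next to it.
    if cache_file == os.devnull:
        return False
    # Assumed write pattern: temporary file with a PID suffix, then rename.
    tmp_name = "%s.%d" % (cache_file, os.getpid())
    with open(tmp_name, "w") as fd:
        json.dump(data, fd)
    os.replace(tmp_name, cache_file)
    return True
```

With such a guard, `save_cache(os.devnull, {...})` returns without touching `/dev`, while a regular path is still written through the temporary file.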
@ArturAkh

Hi @belforte,

It seems like there is still some trouble when using CRAB3_CACHE_FILE=/dev/null:

[aakhmets@lxplus917 CMSSWWithCrab]$ crab resubmit -d /afs/cern.ch/work/a/aakhmets/private/htt_data_crab_nanoaod_submission_25-11-2024_prodreleasev12/crab/crab_data_2018UL_tau_Tau_Run2018A --maxmemory 4000
Rucio client intialized for account aakhmets
Found no jobs to resubmit. Only jobs in status failed can be resubmitted. Jobs in status finished can also be resubmitted, but only if the jobids are specified and the force option is set.
[aakhmets@lxplus917 CMSSWWithCrab]$ export CRAB3_CACHE_FILE=/dev/null
[aakhmets@lxplus917 CMSSWWithCrab]$ crab resubmit -d /afs/cern.ch/work/a/aakhmets/private/htt_data_crab_nanoaod_submission_25-11-2024_prodreleasev12/crab/crab_data_2018UL_tau_Tau_Run2018A --maxmemory 4000
Error: Please indicate the CRAB project directory with --dir=<project-directory>.

I think this is somewhat related to the following lines of code:

if self.cmdconf['requiresDirOption']:
    if self.options.projdir is None:
        if len(self.args) > 1:
            msg = "%sError%s:" % (colors.RED, colors.NORMAL)
            msg += " 'crab %s' command accepts at most 1 argument (a path to a CRAB project directory), %d given." % (self.name, len(self.args))
            raise ConfigurationException(msg)
        elif len(self.args) == 1 and self.args[0]:
            self.options.projdir = self.args.pop(0)
        elif self.cmdconf['useCache'] and self.crab3dic.get('crab_project_directory'):
            self.options.projdir = str(self.crab3dic['crab_project_directory'])
    if self.options.cmptask:
        msg = "crab %s requires a projdir. Since you passed a taskname, we will run crab remake"
        self.logger.info(msg, self.name)
        ret = self.remakeLocalCache()
        self.options.projdir = ret["workDir"]
        self.logger.debug("crab remake created %s", self.options.projdir)
    if self.options.projdir is None:
        msg = "%sError%s:" % (colors.RED, colors.NORMAL)
        msg += " Please indicate the CRAB project directory with --dir=<project-directory>."
        ex = MissingOptionException(msg)
        ex.missingOption = "task"
        raise ex
    if not os.path.isdir(self.options.projdir):
        msg = "%sError%s:" % (colors.RED, colors.NORMAL)
        msg += " %s is not a valid CRAB project directory." % (self.options.projdir)
        raise ConfigurationException(msg)

It seems that when the variable is not set to /dev/null, the resubmission works as intended. As soon as I set it, I end up with the missing-option error, even though I have clearly added the option to the command.

Would you please have a closer look?

@ArturAkh

Submission and status query work fine with CRAB3_CACHE_FILE=/dev/null, though

@belforte
Member Author

thanks @ArturAkh for reporting. I guess I did not test all use cases :-(
Will look closer

@belforte belforte reopened this Nov 27, 2024
@belforte belforte removed the Done label Nov 27, 2024
@belforte
Member Author

I can easily reproduce, so should be able to fix.

@ArturAkh

Thanks a lot, @belforte!

@belforte
Member Author

The problem is that resubmit, like other commands (most notably recover), internally calls other commands, in this case status. But the current code relies on the crabcache file existing and pointing to the current task, i.e. the -d <projdir> from the CLI is not propagated. By the way, this opens the door to an interesting race condition when multiple clients run at the same time with the same $HOME, which apparently has not surfaced so far.

I could easily fix resubmit so that it works. But at this point I prefer to make sure that the projdir is correctly propagated in all cases, including when the initial command does not have the -d option (e.g. for crab report).
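
To make the failure mode concrete, here is a minimal sketch of the lookup order described above, assuming the cache is a plain dictionary. The function name `resolve_projdir` is hypothetical; only the `crab_project_directory` key comes from the code quoted earlier in this thread.

```python
def resolve_projdir(cli_projdir, cache):
    """Return the project directory to use, or None if it cannot be found.

    Hypothetical helper for illustration, not the CRABClient implementation.
    """
    # An explicit -d/--dir value always wins.
    if cli_projdir:
        return cli_projdir
    # Otherwise fall back to whatever the cache file recorded, if anything.
    return cache.get("crab_project_directory")


# Outer command: the user passed -d, so the directory is known.
outer = resolve_projdir("/path/to/task", {})

# Inner status call without propagation and with an empty cache (as with
# CRAB3_CACHE_FILE=/dev/null): nothing to fall back on.
inner_not_propagated = resolve_projdir(None, {})

# Inner status call with the outer projdir passed along explicitly.
inner_propagated = resolve_projdir(outer, {})
```

In this sketch only the non-propagated inner call ends up with `None`, which is the situation that produces the "Please indicate the CRAB project directory" error.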

@ArturAkh let me know if you want a recipe for running your own version of CRABClient with just resubmit fixed.

@belforte belforte added bug and removed enhancement labels Nov 27, 2024
@ArturAkh

Ok thanks for the clarification, @belforte!

I agree that having the project directory properly propagated would be the solution to go for.

The issue is not urgent, so there is no need to rush :) I just wanted to bring it up since I stumbled across it.

In any case, I'd appreciate it very much if you could provide me with a recipe for running a custom CRAB client to try things out :)

@belforte
Member Author

Happy to do so!
Instructions to run your own Client are in https://cmscrab.docs.cern.ch/crab-components/crab-client.html#using-crabclient-directly-from-your-github-repository
All you need is to go to your cloned CRABClient repository and apply these changes:

diff --git a/src/python/CRABClient/Commands/resubmit.py b/src/python/CRABClient/Commands/resubmit.py
index 9178495..940a42e 100644
--- a/src/python/CRABClient/Commands/resubmit.py
+++ b/src/python/CRABClient/Commands/resubmit.py
@@ -31,7 +31,7 @@ class resubmit(SubCommand):
 
     def __call__(self):
 
-        statusDict = getMutedStatusInfo(self.logger)
+        statusDict = getMutedStatusInfo(logger=self.logger, projdir=self.options.projdir)
         jobList = statusDict['jobList']
 
         if self.splitting == 'Automatic' and statusDict['dbStatus'] == 'KILLED':
diff --git a/src/python/CRABClient/UserUtilities.py b/src/python/CRABClient/UserUtilities.py
index f6fd71d..47125dc 100644
--- a/src/python/CRABClient/UserUtilities.py
+++ b/src/python/CRABClient/UserUtilities.py
@@ -193,7 +193,7 @@ def setConsoleLogLevel(lvl):
         for h in logging.getLogger('CRAB3.all').handlers:
             h.setLevel(lvl)
 
-def getMutedStatusInfo(logger=None, proxy=None):
+def getMutedStatusInfo(logger=None, proxy=None, projdir=None):
     """
     Mute the status console output before calling status and change it back to normal afterwards.
     """
@@ -202,6 +202,9 @@ def getMutedStatusInfo(logger=None, proxy=None):
     if proxy:
         cmdargs.append("--proxy")
         cmdargs.append(proxy)
+    if projdir:
+        cmdargs.append("-d")
+        cmdargs.append(projdir)
     cmdobj = getattr(mod, 'status')(logger=logger, cmdargs=cmdargs)
     loglevel = getConsoleLogLevel()
     setConsoleLogLevel(LOGLEVEL_MUTE)

@belforte
Member Author

belforte commented Nov 27, 2024

Interestingly, crab recover does not suffer from this problem when checking the task status, because it already passes -d projdir as an argument to status, unlike

def getMutedStatusInfo(logger=None, proxy=None):

So: not so bad!

@ArturAkh

Thanks a lot, Stefano, sounds like a good plan :)

I'll try out the instructions for a customized client.

@belforte
Member Author

belforte commented Dec 4, 2024

Logging in CRABClient is trickier than I thought. Mostly, if I put my hands in there, I would not be able to resist making larger changes to make things clearer and to flush to file more frequently. I had better leave it alone and just add projdir where needed.

belforte added a commit to belforte/CRABClient that referenced this issue Dec 4, 2024
belforte added a commit to belforte/CRABClient that referenced this issue Dec 4, 2024