
replace CRABCache use with CERN S3 #4971

Closed
belforte opened this issue Feb 11, 2021 · 4 comments
@belforte

Most work will have to be done in the server.
CRABClient will have to replace the current upload to the cache with two calls:

  1. get a presigned URL from CRAB REST
  2. upload the file to that URL

All the work of tracking object IDs, checking used space, etc. will be done in CRABServer.
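
For illustration, the client side could look roughly like this (a minimal sketch; the REST resource name, parameters, and response shape are assumptions for illustration, not the actual CRABServer API):

```python
# Sketch only: endpoint name, parameters and response fields are assumptions.
import requests

def upload_via_presigned_url(rest_base, objectname, filepath, cert, key):
    # 1. ask the CRAB REST interface for a presigned S3 upload URL
    #    (hypothetical 'cache' resource)
    resp = requests.get(rest_base + '/cache',
                        params={'subresource': 'upload', 'objectname': objectname},
                        cert=(cert, key))
    resp.raise_for_status()
    presigned_url = resp.json()['result'][0]  # assumed response shape

    # 2. PUT the file straight to S3; the credentials are embedded in the
    #    presigned URL itself, so no further authentication is needed
    with open(filepath, 'rb') as fobj:
        put = requests.put(presigned_url, data=fobj)
    put.raise_for_status()
    return presigned_url.split('?')[0]  # object URL without the signature
```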

The code to be changed is here:

```python
if doupload:
    # uploadLog is executed directly from crab main script, does not inherit from SubCommand
    # so it needs its own RESTServer instantiation
    restClass = CRABClient.Emulator.getEmulator('rest')
    RESTServer = restClass(url=serverurl, localcert=proxyfilename, localkey=proxyfilename,
                           retry=2, logger=logger, verbose=False, version=__version__,
                           userAgent='CRABClient')
    cacheurl = server_info(RESTServer=RESTServer, uriNoApi=baseurl, subresource='backendurls')
    # Encode in ascii because old pycurl present in old CMSSW versions
    # doesn't support unicode.
    cacheurl = cacheurl['cacheSSL'].encode('ascii')
    cacheurldict = {'endpoint': cacheurl, "pycurl": True}
    ufc = UserFileCache(cacheurldict)
    logger.debug("cacheURL: %s\nLog file name: %s" % (cacheurl, logfilename))
    logger.info("Uploading log file...")
    ufc.uploadLog(logpath, logfilename)
    logger.info("%sSuccess%s: Log file uploaded successfully." % (colors.GREEN, colors.NORMAL))
    logfileurl = cacheurl + '/logfile?name=' + str(logfilename)
    if not username:
        from CRABClient.UserUtilities import getUsername
        username = getUsername(proxyFile=proxyfilename, logger=logger)
    logfileurl += '&username=' + str(username)
    logger.info("Log file URL: %s" % (logfileurl))
    return logfileurl
else:
    logger.info('Failed to upload the log file')
    logfileurl = False
```

@belforte

Code that refers to the CRABCache in the crab purge command can simply be commented out/removed, since we will deprecate that command (and remove the need for gsissh to the schedds!)

@belforte

We also need to take care of the dry run:

```python
tmpDir = tempfile.mkdtemp()
os.chdir(tmpDir)
self.logger.info('Creating temporary directory for dry run sandbox in %s' % tmpDir)
ufc.downloadLog('dry-run-sandbox.tar.gz', output=os.path.join(tmpDir, 'dry-run-sandbox.tar.gz'))
for name in ['dry-run-sandbox.tar.gz', 'InputFiles.tar.gz', 'CMSRunAnalysis.tar.gz', 'sandbox.tar.gz']:
    tf = tarfile.open(os.path.join(tmpDir, name))
    tf.extractall(tmpDir)
    tf.close()
```
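
With S3, the ufc.downloadLog() call above would become a plain HTTP GET of a presigned URL, roughly like this (a sketch only; how the presigned URL is obtained from CRAB REST is assumed, as in the sketch above):

```python
# Sketch only: assumes a presigned GET URL was obtained from CRAB REST beforehand.
import requests

def download_from_s3(presigned_url, output):
    resp = requests.get(presigned_url, stream=True)
    resp.raise_for_status()
    with open(output, 'wb') as fobj:
        for chunk in resp.iter_content(chunk_size=1024 * 1024):
            fobj.write(chunk)
```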

@belforte

belforte commented Apr 8, 2021

@ddaina I have started to work on this, since I need at least to submit a task using S3 as the cache, to check what exactly goes into the DB and to make progress on the crabserver side. But I found that it is more work than initially thought, since we need to pass a RESTServer object to the tarball-handling code in

```python
def upload(self, filecacheurl=None):
    """
    Upload the tarball to the File Cache
    """
```

The code I pointed to at the top of this issue is only about uploading log files :-(

So I decided to simplify a bit the way we handle RESTServer communications, introducing a dictionary that tracks the HTTPRequest object and the instance name together:

```python
self.REST = {'server': RESTServer, 'uriNoApi': uriNoApi}
```
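
For example (illustrative only; the get() call below mimics the server_info() style shown earlier in this issue, but is an assumption, not the actual helper code), a helper could then receive a single argument:

```python
# Sketch only: unpack the bundled dictionary inside a helper.
def server_info(REST, subresource):
    server = REST['server']        # the HTTPRequest object
    uriNoApi = REST['uriNoApi']    # the REST instance name / base URI
    return server.get(uri=uriNoApi + '/info', data={'subresource': subresource})
```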

Using this dictionary requires a change in almost all commands, as done earlier in 5951982.

It is easier if I take care of this myself. I will leave the log upload part to you though.

belforte added a commit to belforte/CRABClient that referenced this issue Apr 8, 2021
log upload still to be dealt with

heavy changes everywhere will need extensive validation
@belforte

belforte commented Apr 9, 2021

@ddaina I am fixing the last pylint complaints, then will merge #4983 which, coupled with the latest (not yet merged) changes to the two files imported from CRABServer, manages to submit using S3 as the CRABCache.
I will also merge and tag CRABServer as soon as I have done it on CRABClient.

Of course the submitted task does not work, because the TW does not know how to use S3 yet. I will work on that; in the meanwhile you can, in whichever order:

  • look into uploadlog
  • test that the new client still works with the current server/cache in as much detail as you can/wish. I will be surprised if you do not find something to fix even before extending the checks.

Remember:
currently cmsweb-test1 is configured to use S3 as the cache; all other crabservers point to the old CRABCache.
I will keep cmsweb-test1 up to date with the latest version of the code; you can deploy whatever is more convenient on test2.

If you have time for strategic thinking, we could talk about how to plan a smooth transition, which may mean changing what I just did; but the sooner the better, before we end up in a years-long mess like with the new Publisher.

belforte added a commit that referenced this issue Apr 9, 2021
* allow to use S3 to upload sandboxes. for #4971
log upload still to be dealt with

heavy changes everywhere will need extensive validation

* remove wait option from resubmit

* stop using getUrl and remove it
ddaina added a commit to ddaina/CRABClient that referenced this issue Apr 20, 2021
@ddaina ddaina closed this as completed in 2ae3898 Apr 22, 2021
ddaina added a commit that referenced this issue Apr 22, 2021