
replace CRABCache use with CERN S3 #4971

Closed
belforte opened this issue Feb 11, 2021 · 4 comments
@belforte

Most work will have to be done in the server.
CRABClient will have to replace the current upload to the cache with two calls:

  1. get a presigned URL from CRAB REST
  2. upload the file to that URL

All the work of tracking object IDs, checking used space, etc. will be done in CRABServer.
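
For illustration, the client side could look roughly like this (a minimal sketch; the REST resource name, parameters, and response shape are assumptions for illustration, not the actual CRABServer API):

```python
# Sketch only: endpoint name, parameters and response fields are assumptions.
import requests

def upload_via_presigned_url(rest_base, objectname, filepath, cert, key):
    # 1. ask the CRAB REST interface for a presigned S3 upload URL
    #    (hypothetical 'cache' resource)
    resp = requests.get(rest_base + '/cache',
                        params={'subresource': 'upload', 'objectname': objectname},
                        cert=(cert, key))
    resp.raise_for_status()
    presigned_url = resp.json()['result'][0]  # assumed response shape

    # 2. PUT the file straight to S3; the credentials are embedded in the
    #    presigned URL itself, so no further authentication is needed
    with open(filepath, 'rb') as fobj:
        put = requests.put(presigned_url, data=fobj)
    put.raise_for_status()
    return presigned_url.split('?')[0]  # object URL without the signature
```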

The code to be changed is here:

```python
if doupload:
    # uploadLog is executed directly from crab main script, does not inherit from SubCommand
    # so it needs its own RESTServer instantiation
    restClass = CRABClient.Emulator.getEmulator('rest')
    RESTServer = restClass(url=serverurl, localcert=proxyfilename, localkey=proxyfilename,
                           retry=2, logger=logger, verbose=False, version=__version__,
                           userAgent='CRABClient')
    cacheurl = server_info(RESTServer=RESTServer, uriNoApi=baseurl, subresource='backendurls')
    # Encode in ascii because old pycurl present in old CMSSW versions
    # doesn't support unicode.
    cacheurl = cacheurl['cacheSSL'].encode('ascii')
    cacheurldict = {'endpoint': cacheurl, "pycurl": True}
    ufc = UserFileCache(cacheurldict)
    logger.debug("cacheURL: %s\nLog file name: %s" % (cacheurl, logfilename))
    logger.info("Uploading log file...")
    ufc.uploadLog(logpath, logfilename)
    logger.info("%sSuccess%s: Log file uploaded successfully." % (colors.GREEN, colors.NORMAL))
    logfileurl = cacheurl + '/logfile?name=' + str(logfilename)
    if not username:
        from CRABClient.UserUtilities import getUsername
        username = getUsername(proxyFile=proxyfilename, logger=logger)
    logfileurl += '&username=' + str(username)
    logger.info("Log file URL: %s" % (logfileurl))
    return logfileurl
else:
    logger.info('Failed to upload the log file')
    logfileurl = False
```

@belforte

Code that refers to the CRABCache in the crab purge command can simply be commented out/removed, since we will deprecate that command (and remove the need for gsissh to the schedds!)

@belforte

We also need to take care of the dry run:

```python
tmpDir = tempfile.mkdtemp()
os.chdir(tmpDir)
self.logger.info('Creating temporary directory for dry run sandbox in %s' % tmpDir)
ufc.downloadLog('dry-run-sandbox.tar.gz', output=os.path.join(tmpDir, 'dry-run-sandbox.tar.gz'))
for name in ['dry-run-sandbox.tar.gz', 'InputFiles.tar.gz', 'CMSRunAnalysis.tar.gz', 'sandbox.tar.gz']:
    tf = tarfile.open(os.path.join(tmpDir, name))
    tf.extractall(tmpDir)
    tf.close()
```
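
With S3, the ufc.downloadLog() call above would become a plain HTTP GET of a presigned URL, roughly like this (a sketch only; how the presigned URL is obtained from CRAB REST is assumed, as in the sketch above):

```python
# Sketch only: assumes a presigned GET URL was obtained from CRAB REST beforehand.
import requests

def download_from_s3(presigned_url, output):
    resp = requests.get(presigned_url, stream=True)
    resp.raise_for_status()
    with open(output, 'wb') as fobj:
        for chunk in resp.iter_content(chunk_size=1024 * 1024):
            fobj.write(chunk)
```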

@belforte

belforte commented Apr 8, 2021

@ddaina I have started to work on this, since I need at least to submit a task using S3 as the cache, to check what exactly goes into the DB and to make progress on the crabserver side. But I found that it is more work than initially thought, since we need to pass a RESTServer object to the tarball-handling code in

```python
def upload(self, filecacheurl=None):
    """
    Upload the tarball to the File Cache
    """
```

The code I pointed to at the top of this issue is only about uploading log files :-(

So I decided to simplify a bit the way we handle RESTServer communications, introducing a dictionary that tracks the HTTPRequest object and the instance name together:

```python
self.REST = {'server': RESTServer, 'uriNoApi': uriNoApi}
```
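
For example (illustrative only; the get() call below mimics the server_info() style shown earlier in this issue, but is an assumption, not the actual helper code), a helper could then receive a single argument:

```python
# Sketch only: unpack the bundled dictionary inside a helper.
def server_info(REST, subresource):
    server = REST['server']        # the HTTPRequest object
    uriNoApi = REST['uriNoApi']    # the REST instance name / base URI
    return server.get(uri=uriNoApi + '/info', data={'subresource': subresource})
```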

Using this dictionary requires a change in almost all commands, as done earlier in 5951982.

It is easier if I take care of this myself. I will leave the log upload part to you though.

belforte added a commit to belforte/CRABClient that referenced this issue Apr 8, 2021
log upload still to be dealt with

heavy changes everywhere will need extensive validation
@belforte

belforte commented Apr 9, 2021

@ddaina I am fixing the last pylint complaints, then will merge #4983 which, coupled with the latest (not yet merged) changes to the two files imported from CRABServer, manages to submit using S3 as the CRABCache.
I will also merge and tag CRABServer as soon as I have done it on CRABClient.

Of course the submitted task does not work, because the TW does not know how to use S3 yet. I will work on that; in the meanwhile you can, in whichever order:

  • look into uploadlog
  • test that the new client still works with the current server/cache in as much detail as you can/wish. I will be surprised if you do not find something to fix even before extending the checks.

Remember:
currently cmsweb-test1 is configured to use S3 as the cache; all other crabservers point to the old CRABCache.
I will keep cmsweb-test1 up to date with the latest version of the code; you can deploy whatever is more convenient on test2.

If you have time for strategic thinking, we could talk about how to plan a smooth transition, which may mean changing what I just did; but the sooner the better, before we end up in a years-long mess like with the new Publisher.

belforte added a commit that referenced this issue Apr 9, 2021
* allow to use S3 to upload sandboxes. for #4971
log upload still to be dealt with

heavy changes everywhere will need extensive validation

* remove wait option from resubmit

* stop using getUrl and remove it
ddaina added a commit to ddaina/CRABClient that referenced this issue Apr 20, 2021
@ddaina ddaina closed this as completed in 2ae3898 Apr 22, 2021
ddaina added a commit that referenced this issue Apr 22, 2021