The aim is to be able to submit a job to DiracX with an input sandbox and run it with the existing DIRAC infrastructure.
There are two parts to this, as uploaded sandboxes need to be available via both DISET and DiracX.
How do sandboxes currently work?
Files are uploaded with DISET and the path is SE:ProductionSandboxSE:/Sandbox/u/username.group/hash.tar.gz, where ProductionSandboxSE comes from the CS (LocalSEName) and /Sandbox/ is hard coded.
Data is stored on the service's local disk.
Data is deduplicated for a given user+group using the file's MD5 hash.
Files are downloaded with DISET from local storage.
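As a rough illustration of the path layout and deduplication described above, here is a minimal sketch. The function name is hypothetical, and it assumes that the "u" component is the first letter of the username and that "hash" is the hex MD5 of the tarball; neither detail is spelled out here, and this is not the actual SandboxStoreHandler code.

```python
import hashlib

# "/Sandbox/" is hard coded in the current implementation.
SANDBOX_PREFIX = "/Sandbox/"


def legacy_sandbox_path(se_name: str, username: str, group: str, tarball: bytes) -> str:
    """Build SE:<se_name>:/Sandbox/<u>/<username>.<group>/<md5>.tar.gz (illustrative only)."""
    # Identical content uploaded by the same user+group gives the same MD5,
    # hence the same path, which is what allows deduplication.
    digest = hashlib.md5(tarball).hexdigest()
    # Assumption: the single-letter directory is the first letter of the username.
    initial = username[0]
    return f"SE:{se_name}:{SANDBOX_PREFIX}{initial}/{username}.{group}/{digest}.tar.gz"
```

Two uploads of the same bytes by the same user and group map to one path, while the same bytes uploaded under a different group map to a different one.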
How will sandboxes work in DiracX?
Data will be stored on S3-compatible storage (the default is MinIO from the Helm chart).
Sandbox paths will be SE:ProductionSandboxSE:/S3/u/username.group/hash.tar.gz
Doing POST /jobs/sandbox returns a payload indicating the pre-signed URL which is used to send the data. The body of the POST request includes the SHA-256 of the payload and the payload size.
If the data already exists a JSON blob is returned containing the sandbox ID which should be included in the JDL.
If the data doesn't exist, a JSON blob including an S3 pre-signed URL is returned. This pre-signed URL should use the x-amz-content-sha256 header to make the storage verify the hash of the sandbox and limit the Content-Length.
Data is downloaded by doing GET /jobs/sandbox/SE:ProductionSandboxSE:/S3/u/username.group/hash.tar.gz with an HTTP temporary redirect to a pre-signed URL.
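The client side of the upload flow above could be sketched roughly as follows. The JSON field names (`checksum_algorithm`, `checksum`, `size`, `url`) and the function names are hypothetical, since the request and response schema is not fixed here. The sketch also assumes that the "hash" in the path is the SHA-256 hex digest and that "u" is the first letter of the username.

```python
import hashlib


def sandbox_upload_request(tarball: bytes) -> dict:
    """Body for POST /jobs/sandbox: the SHA-256 of the payload and its size."""
    # Hypothetical field names; only the two pieces of information are given above.
    return {
        "checksum_algorithm": "sha256",
        "checksum": hashlib.sha256(tarball).hexdigest(),
        "size": len(tarball),
    }


def s3_sandbox_pfn(se_name: str, username: str, group: str, checksum: str) -> str:
    """Build SE:<se_name>:/S3/<u>/<username>.<group>/<hash>.tar.gz (illustrative only)."""
    # Assumption: the single-letter directory is the first letter of the username.
    return f"SE:{se_name}:/S3/{username[0]}/{username}.{group}/{checksum}.tar.gz"


def needs_upload(response: dict) -> bool:
    """True when the response carries a pre-signed URL, i.e. the data is not stored yet."""
    # Hypothetical convention: the "url" key is present only if an upload is required.
    return "url" in response
```

When `needs_upload` is true, the client would send the tarball to the returned pre-signed URL; otherwise it can put the returned sandbox ID straight into the JDL.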
For assigning sandboxes to job IDs (implementation tbd):
Input sandboxes are handled by the JobSanity executor
Output sandboxes
Migration
The existing SandboxStoreHandler needs to be modified to:
Proxy data from S3 if the path starts with /S3/.
The existing SandboxStoreHandler should have a flag (UseS3Backend) added to make it upload to S3 instead of the local disk and return the appropriate path.
Clean up sandboxes on S3
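The backend selection implied by these changes could look roughly like the sketch below. The function names and the "s3"/"local" labels are hypothetical, and this is not the real SandboxStoreHandler code; it only shows downloads being routed by the /S3/ path prefix and uploads by the UseS3Backend flag.

```python
def split_sandbox_pfn(pfn: str) -> tuple[str, str]:
    """Split 'SE:<se_name>:<path>' into (se_name, path)."""
    # A PFN with fewer than three ':'-separated parts fails to unpack,
    # which also raises ValueError.
    prefix, se_name, path = pfn.split(":", 2)
    if prefix != "SE" or not path.startswith("/"):
        raise ValueError(f"Not a sandbox PFN: {pfn!r}")
    return se_name, path


def backend_for_download(pfn: str) -> str:
    """Proxy from S3 if the path starts with /S3/, else read from local disk."""
    _, path = split_sandbox_pfn(pfn)
    return "s3" if path.startswith("/S3/") else "local"


def backend_for_upload(use_s3_backend: bool) -> str:
    """New uploads go to S3 only when the UseS3Backend flag is set."""
    return "s3" if use_s3_backend else "local"
```

Routing downloads on the path rather than on the flag means sandboxes uploaded before the flag was enabled stay readable from the local disk.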
Once all sandboxes exist on S3, the legacy adaptor mechanism can be used to make the SandboxStoreClient talk directly to DiracX. At this point the SandboxStoreHandler can be removed.
We don't need to expose old sandboxes via DiracX as we can rely on the UseDiracXBackend flag having been set for a while.
If an installation cares about keeping older sandboxes, a migration script can be created to move sandboxes from the local disk to S3.
Tasks to implement this in DiracX:
SandboxMetadataDB (see Partially adding sandbox metadata db #42 and Update SandboxMetadataDB interface #116)