colorSchema: light
favicon: /public/images/diracx-logo-square.png
color: orange-light
layout: cover
routerMode: hash
title: The neXt Dirac incarnation
theme: neversink
neversink_string: DiracX CHEP
DiracX CHEP

The neXt Dirac incarnation

Federico Stagni

October 23rd 2024, CHEP 2024


layout: section color: lime-light

This is the story of why and how we decided to take a successful project and rewrite its code from scratch


layout: section color: lime-light

DIRAC logo --> DiracX

layout: section color: cyan-light

What is DIRAC?


layout: iframe-left title: DIRAC url: https://dirac.readthedocs.io/en/latest/ class: DIRAC slide_info: false color: gray-light

Juno Belle2 CTA lz ILC LHCb BES3 gridPP pierre-auger EGI na62 t2k weNMR hyperk lz euclid cepc nica

layout: top-title color: gray-light align: c title: Extensions

:: title ::

Action! (and extensions)

:: content ::

A few real-life examples, also reported at this conference:

  • LHCb stores the metadata and provenance of every produced file in an LHCb-specific database (with an Oracle backend)
  • Belle2, a HEP experiment, uses Rucio as its data management solution.
  • CTAO has radically different requirements (compared to HEP experiments) on how to process its data.
  • HERD is an astronomy and particle astrophysics experiment using dHTC for data management.
  • EGI uses DIRAC as its WMS, and EGI-CheckIn as an identity provider. It hosts (among others) WeNMR (structural biology and life science)
DIRAC is designed to be flexible and extensible.

layout: top-title color: gray-light align: c title: history

:: title ::

DIRAC timeline

:: content ::

```mermaid
%%{init: {'theme': 'base', 'timeline': {'disableMulticolor': true}}}%%
timeline
        section LHCb software
          around 2000 : MC production system: bash scripts running at production sites
          2002 : DIRAC2 <br> Rewritten in Python, using xml-rpc, interfacing to EDG
          Data Challenge 04 : First successful grid usage ever.
                            : First use of a pilot-job-based WMS
          2006-2007 : DIRAC3<br> Full rewrite, development of the DISET protocol -- still in use today!
                    : the current DIRAC framework is still based on this work
        section Open sourced, wider adoption
          2008 : Large-ish reshuffling to become multi-VO
               : LHCbDIRAC extension separated from core DIRAC code
          2009 : CLIC community adopts DIRAC
          2011 : France-Grilles is the first multi-VO DIRAC installation
          2012 : Belle2, BES3, CTA adopt DIRAC
          2021 : Python3 full support
```

layout: side-title align: lm-lm color: gray-light title: WMS titlewidth: is-3

:: title ::

Workload Management System

  • Pull model based on pilot jobs
  • A "push" solution is also available for HPCs that do not support pilots (because of limited internet access).
  • Will integrate CWL (Common Workflow Language) as a way of defining jobs (replacing JDL) --> see poster #217

:: content ::

```mermaid
%%{init: { 'theme': 'default' }}%%
flowchart LR;
Jobs["`Users see only **Jobs**`"]
A@{ shape: sl-rect, label: "APIs" }
WMS[("`**Workload
Management
System**`")]
style WMS fill:#bbf
HPC["`High
Performance
Computers`"]
style HPC fill:#A145
clusters["`Computer clusters`"]
style clusters fill:#A145
Grid_Nodes["Grid"]
Pilots["`**Pilots**
administer computing slots, and match (pull) jobs`"]

style HTCondorCE fill:#F23A
style ARC-AREX fill:#F23A
style libcloud fill:#F23A
style SSH fill:#F23A
style Grid_Nodes fill:#A145
style Iaas:Clouds fill:#A145
style HTCondor fill:#F26
style SLURM fill:#F26
style Jobs fill:#FFF
style Pilots fill:#FFF

A-->|jobs|WMS

WMS-->|pilots|libcloud
WMS-->|pilots|HTCondorCE
WMS-->|pilots|ARC-AREX
WMS-. jobs .->HPC
WMS-->|pilots|SSH

libcloud-->|VMs starting pilots|Iaas:Clouds
HTCondorCE-->Grid_Nodes
ARC-AREX-->Grid_Nodes
ARC-AREX-->HPC
SSH-->|pilots|SLURM
SSH-->|pilots|HTCondor
SSH-->|pilots|clusters
SLURM-->HPC
SLURM-->clusters
HTCondor-->clusters
```
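The sketch below is a toy Python model of what "match (pull) jobs" means. A pilot describes the slot it controls, and a matcher returns the first waiting job whose requirements that slot satisfies. The `Job`/`Slot` fields and the `match` function are hypothetical simplifications, not the DIRAC Matcher API. Site names only mimic the style of DIRAC site names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    platform: str                                   # hypothetical requirement: OS/arch tag
    min_memory_mb: int                              # hypothetical requirement: memory needed
    sites: set[str] = field(default_factory=set)    # empty set means "any site"


@dataclass
class Slot:
    """What a running pilot reports about the computing slot it administers."""
    site: str
    platform: str
    memory_mb: int


def match(slot: Slot, queue: list[Job]) -> Job | None:
    """Remove and return the first waiting job the slot can run, or None."""
    for i, job in enumerate(queue):
        if (
            job.platform == slot.platform
            and job.min_memory_mb <= slot.memory_mb
            and (not job.sites or slot.site in job.sites)
        ):
            return queue.pop(i)
    return None  # nothing suitable: the pilot can retry later or simply exit


queue = [
    Job(1, "el9-x86_64", 8000),                       # needs more memory than the slot has
    Job(2, "el9-x86_64", 2000, {"LCG.CERN.cern"}),    # restricted to another site
    Job(3, "el9-x86_64", 2000),
]
slot = Slot(site="LCG.RAL.uk", platform="el9-x86_64", memory_mb=4000)
print(match(slot, queue).job_id)  # 3
```

The pilot asks for work only once it is already running on a slot. So in this model, a job is only ever handed to a slot that exists and has been described. That is the basic appeal of the pull model compared with pushing jobs blindly to a batch queue.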

layout: side-title align: lm-lm color: gray-light title: DMS titlewidth: is-5

:: title ::

Data Management System

It’s about files: placing, replicating, and removing them.

  • there are LFNs (logical file names)
  • LFNs are registered in catalog(s)
    • where are the LFNs? (in the DIRAC File Catalog (DFC), or in Rucio)
    • where is their metadata? (in the DFC, in the LHCb Bookkeeping, or in AMGA)
  • LFNs may have PFNs (physical file names), stored in SEs (Storage Elements) that can be accessed with several protocols.

:: content ::

```mermaid
%%{init: { 'theme': 'default' }}%%
flowchart LR;
A@{ shape: sl-rect, label: "APIs" }
DMS[("`**Data
Management
System**`")]
style DMS fill:#bbf
FC[["`**Catalog**`"]]
style FC fill:#bbf
StorageBase[["`**Storage Base**`"]]
style StorageBase fill:#bbf
DFC[("`DIRAC
File
Catalog`")]
Rucio[("Rucio")]
style Rucio fill:#6001
TS[("`DIRAC
Transformation
System`")]
WebDav@{ shape: lin-cyl, label: "WebDav (http)" }
XRootD@{ shape: lin-cyl, label: "XRootD" }

style WebDav fill:#F23A
style XRootD fill:#F23A

A-->DMS
DMS-->FC
DMS-->StorageBase
FC-->DFC
FC-->Rucio
FC-->TS
StorageBase-->WebDav
StorageBase-->XRootD
```
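The sketch below is a toy Python model of the LFN / catalog / SE relationship described above. The catalog records metadata and which SEs hold a replica of each LFN. A PFN for a given protocol is then built from that SE's base URL plus the LFN. All SE names, hostnames and the "base URL + LFN" rule are assumptions made for illustration. This is not the DFC or StorageElement API.

```python
from __future__ import annotations

# Hypothetical Storage Element definitions: base URL per access protocol.
STORAGE_ELEMENTS: dict[str, dict[str, str]] = {
    "CERN-DISK": {
        "root": "root://xrootd.example.org//eos",
        "https": "https://webdav.example.org/eos",
    },
    "RAL-DISK": {"root": "root://xrootd.ral.example.org//disk"},
}


class ToyFileCatalog:
    """Hypothetical catalog: LFN -> metadata, and LFN -> SEs holding a replica."""

    def __init__(self) -> None:
        self.metadata: dict[str, dict[str, object]] = {}
        self.replicas: dict[str, set[str]] = {}

    def register(self, lfn: str, metadata: dict[str, object] | None = None) -> None:
        self.metadata[lfn] = dict(metadata or {})
        self.replicas.setdefault(lfn, set())

    def add_replica(self, lfn: str, se: str) -> None:
        if lfn not in self.metadata:
            raise KeyError(f"{lfn} is not registered")
        if se not in STORAGE_ELEMENTS:
            raise KeyError(f"unknown storage element {se}")
        self.replicas[lfn].add(se)

    def pfns(self, lfn: str, protocol: str) -> dict[str, str]:
        """Build {SE: PFN} for every replica reachable with `protocol`."""
        return {
            se: STORAGE_ELEMENTS[se][protocol] + lfn
            for se in sorted(self.replicas.get(lfn, set()))
            if protocol in STORAGE_ELEMENTS[se]
        }


fc = ToyFileCatalog()
lfn = "/vo/data/2024/run1234.raw"
fc.register(lfn, {"run": 1234})
fc.add_replica(lfn, "CERN-DISK")
fc.add_replica(lfn, "RAL-DISK")
print(fc.pfns(lfn, "https"))
# {'CERN-DISK': 'https://webdav.example.org/eos/vo/data/2024/run1234.raw'}
```

Because the catalog only stores "which SE", a new access protocol can be added by changing the SE definition, without touching every catalog entry.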

layout: side-title color: gray-light align: lm-lm title: TS

:: title ::

Transformation System

For productions and dataset management

  • A Data Processing transformation (e.g. Simulation, Merge, DataReconstruction...) creates jobs in the WMS (re-submitting them if needed, and eventually removing them).

  • A Data Manipulation transformation replicates, or removes, data from storage elements.

:: content ::

The Transformation System is used to automate common tasks related to production activities. It can handle thousands of productions, millions of files and jobs.

   

```mermaid
%%{init: { 'theme': 'default' }}%%
flowchart LR;
TS[("`**Transformation
System**`")]
style TS fill:#bbf
WMS[("`**Workload
Management
System**`")]
style WMS fill:#bba
RMS[("`**Request
Management
System**`")]
style RMS fill:#bba
DMS[("`**Data
Management
System**`")]
style DMS fill:#bba
PM@{ shape: sl-rect, label: "Productions Management" }
DM@{ shape: sl-rect, label: "DataSets Management" }

PM-->|Productions Definitions|TS
DM-->|DataSets Operations|TS
TS-->|Jobs|WMS
TS-->|Data Operations|RMS
RMS-->DMS
```
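The sketch below shows, in Python, how a data processing transformation turns files into jobs. Input files are attached to the transformation, and unused files are grouped into tasks, each of which would become one WMS job. The fixed `group_size` grouping rule and all names are hypothetical simplifications, not the Transformation System's plugins or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Transformation:
    """Hypothetical, heavily simplified data-processing transformation."""

    name: str
    group_size: int                                   # input files per task (assumed rule)
    unused: list[str] = field(default_factory=list)   # files not yet assigned to a task
    tasks: list[list[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.group_size < 1:
            raise ValueError("group_size must be at least 1")

    def add_files(self, lfns: list[str]) -> None:
        # New files (e.g. from a new data-taking run) are attached as they appear.
        self.unused.extend(lfns)

    def create_tasks(self, flush: bool = False) -> list[list[str]]:
        """Group unused files into tasks; each task would be submitted as one job.

        Incomplete groups wait for more files unless `flush` is True.
        """
        new_tasks = []
        while len(self.unused) >= self.group_size or (flush and self.unused):
            group = self.unused[: self.group_size]
            self.unused = self.unused[self.group_size :]
            new_tasks.append(group)
        self.tasks.extend(new_tasks)
        return new_tasks


t = Transformation("Reco-2024", group_size=2)
t.add_files([f"/vo/raw/run1/file{i}.raw" for i in range(5)])
print(len(t.create_tasks()), t.unused)   # 2 ['/vo/raw/run1/file4.raw']
print(t.create_tasks(flush=True))        # [['/vo/raw/run1/file4.raw']]
```

Keeping leftover files as "unused" reflects the idea that a transformation keeps running as new data arrives, rather than being a one-off submission.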

layout: top-title-two-cols align: c-ct-ct color: gray-light title: Web

:: title ::

Visualizations

:: left ::

DIRAC also provides a WebApp:

:: right ::

Dashboards can be created within the DIRAC Web App:

DIRAC_stack

and/or in Grafana:

DIRAC_stack


layout: top-title align: c color: gray-light title: DIRAC tech

:: title ::

Technicalities

:: content ::

  • DIRAC is written in Python 3 (the Pilot still supports Python 2.7)
  • Services are exposed at URLs like dips://box.some.where:9132/WorkloadManagement/
    • dips stands for "DIRAC protocol"
  • The DIRAC framework also provides "Agents" (~ cron jobs) and "Executors" (~ task execution) that drive the system
  • As backends, and are supported (for different purposes)
  • The Web App is implemented with ExtJS and fully custom Python "bindings"
  • For its internal AuthN/Z, DIRAC understands certificates and proxies
    • VOMS (Virtual Organization Membership Service) is effectively a hard DIRAC dependency

DIRAC_stack
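Agents are described above as being similar to cron jobs. The sketch below is a minimal Python model of that idea: a loop that calls `execute()` once per polling period and survives a failing cycle. `ToyAgent`, `polling_time` and `max_cycles` are hypothetical names. This is not the DIRAC `AgentModule` API.

```python
from __future__ import annotations

import time
from typing import Callable


class ToyAgent:
    """Hypothetical cron-like agent: call `execute` once every `polling_time` seconds."""

    def __init__(self, polling_time: float, max_cycles: int | None = None,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.polling_time = polling_time
        self.max_cycles = max_cycles  # None means "run forever", like a daemon
        self.sleep = sleep            # injectable, so tests need not really wait
        self.cycles = 0

    def execute(self) -> None:
        """One unit of work (e.g. look for stalled jobs); subclasses override this."""
        raise NotImplementedError

    def run(self) -> None:
        while self.max_cycles is None or self.cycles < self.max_cycles:
            started = time.monotonic()
            try:
                self.execute()
            except Exception as exc:  # a failed cycle is reported; the agent keeps going
                print(f"cycle {self.cycles} failed: {exc!r}")
            self.cycles += 1
            # Sleep only for what remains of the polling period.
            self.sleep(max(0.0, self.polling_time - (time.monotonic() - started)))


class CountingAgent(ToyAgent):
    """Example subclass whose 'work' is just counting its own cycles."""

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self.seen = 0

    def execute(self) -> None:
        self.seen += 1


agent = CountingAgent(polling_time=0.01, max_cycles=3)
agent.run()
print(agent.seen)  # 3
```

This periodic polling is exactly the "cron" flavour that the challenges slide later calls an "old"-ish design.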


layout: top-title-two-cols align: c-lm-lt color: gray-light title: world

What is the best way to keep up with these trends? Can we do it within the current framework?

:: title ::

Technology trends

:: left ::

You authenticate with an external "Identity provider":

idp_login

For authorization purposes you are using tokens everywhere:

google_token

:: right ::

(Nicely documented) REST APIs are a de facto standard:

```bash
# "get a tag" from github

curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer <YOUR-TOKEN>" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/OWNER/REPO/git/tags/TAG_SHA
```

layout: top-title align: c color: gray-light title: wlcg

What is the best way to implement these recommendations? Can we do it within the current framework?

:: title ::

Recommendations from WLCG, EGI, etc.

:: content ::

  • VOMS (Virtual Organization Membership Service) has been, for many years, a de facto standard for community management
    • it issues VOMS proxies (short-lived certificates)
    • Outside of WLCG and EGI, proxies are not used
  • --> There are new Identity Providers delivering tokens instead of proxies

 

At this conference:


layout: top-title-two-cols color: gray-light align: c-rt-lt title: requirements

Easy to test (which makes it easier to code), but also modern, fun, and accessible to new developers. We need to ensure business continuity.

:: title ::

Minimum Requirements

:: left ::

Communities/Users requirements

Ease of use, including ease of access

Fast and responsive interfaces

Scalable and flexible

:: right ::

Administrator requirements

Ease of installation and update

Up-to-date documentation

Clear configuration

Ready-to-use dashboards


layout: side-title align: rm-lm color: red titlewidth: is-2 title: issues

:: title ::

DIRAC challenges

:: content ::

  • complex, with a high entry barrier
  • somewhat cumbersome deployment
  • lagging behind “standards”
    • No HTTP services
    • No tokens
    • Old monitoring
  • "old"-ish design (RPC, "cron" agents...)
  • not very developer-friendly: rather unappealing/confusing, especially for new (and young) developers
  • multi-VO, but not designed as such from the beginning
  • a custom interface is needed to interact with a running DIRAC instance
    • meaning that you need to install a DIRAC client to interact with DIRAC

layout: section color: lime-light

We decided that the best way of satisfying the requirements was to code a new Dirac


layout: section color: cyan-light

DiracX, the neXt DIRAC incarnation


layout: side-title side: left color: gray-light titlewidth: is-5 align: rm-lm title: DiracX

:: title ::

What is DiracX?

:: content ::

  • A cloud native app
  • Multi-VO from the get-go
  • Standards-based
Still Dirac, in terms of functionality.

layout: iframe-right title: Web API url: https://diracx-cert.app.cern.ch/api/docs class: webAPI slide_info: false color: gray-light align: lm

DiracX Web API

What is on the right is the certification Web API, loaded live. Use with caution!

DIRAC Web APIs with

  • Nicely documented by
    • --> this is what you see on the right
  • Follows the specification, with the (python) client generated by AutoREST.

layout: top-title color: gray-light align: c title: CLI

:: title ::

CLI Interactions

:: content ::

  1. Logging in (using the diracx cli):
```console
❯ dirac login gridpp
Logging in with scopes: ['vo:gridpp']
Now go to: https://diracx-cert.app.cern.ch/api/auth/device?user_code=XYZXYZXYZ
...Saved credentials to /home/fstagni/.cache/diracx/credentials.json
Login successful!
```
  2. Submitting a job (using Python requests):
```python
import requests

jdl = ['[Executable = "echo"; Arguments = "Hello world";]']  # list of JDLs (payload format assumed)
r = requests.post('https://diracx-cert.app.cern.ch/api/jobs/', headers={'accept': 'application/json', 'Authorization': 'Bearer eyJhbG...', 'Content-Type': 'application/json'}, json=jdl)  # token from `dirac login`, truncated
r.raise_for_status()
```
  3. Getting its status (using curl):
```bash
curl -X 'GET' \
  'https://diracx-cert.app.cern.ch/api/jobs/status?job_ids=8971' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer eyJhbG...'  | jq
```
```json
{
  "8971": {
    "Status": "Done",
    "MinorStatus": "Execution Complete",
    "ApplicationStatus": "Unknown"
  }
}
```
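The curl call above returns a JSON object keyed by job ID. The Python sketch below uses only the standard library to fetch that object and report which jobs have not finished. Two details are assumptions: the set of statuses treated as final, and repeating the `job_ids` query parameter to ask about several jobs.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumption: statuses treated as final here; check the DiracX docs for the real list.
FINAL_STATES = {"Done", "Failed", "Killed"}


def status_url(base_url: str, job_ids: list[int]) -> str:
    # A single ID reproduces the curl call; repeating `job_ids` for several is assumed.
    query = urllib.parse.urlencode([("job_ids", job_id) for job_id in job_ids])
    return f"{base_url}/api/jobs/status?{query}"


def fetch_status(base_url: str, token: str, job_ids: list[int]) -> dict:
    """Same GET request as the curl example, sent with urllib."""
    request = urllib.request.Request(
        status_url(base_url, job_ids),
        headers={"accept": "application/json", "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def pending_jobs(statuses: dict[str, dict[str, str]]) -> list[int]:
    """IDs of jobs whose "Status" is not (yet) a final state."""
    return sorted(int(job_id) for job_id, info in statuses.items()
                  if info.get("Status") not in FINAL_STATES)


# Offline check against the response shown above:
body = json.loads('{"8971": {"Status": "Done", "MinorStatus": "Execution Complete", '
                  '"ApplicationStatus": "Unknown"}}')
print(pending_jobs(body))  # []
```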

layout: iframe-left title: WebApp url: https://diracx-cert.app.cern.ch class: webapp slide_info: false color: gray-light align: lm

DiracX web

We are also rewriting the Web App from scratch.

Software stack:

  • NextJS
  • Material UI
  • TypeScript
What is on the left is the certification WebApp, loaded live. Use with caution!

layout: top-title-two-cols color: gray-light align: c-lm-lm title: Deployments

:: title ::

Deployments

:: left ::

Kubernetes: the standard way to define a distributed system

  • Separates infrastructure from applications
    • "Please IT department(/cloud provider) run this for me"

Helm gives the ability

  • to parameterise
  • to distribute a kubernetes config

:: right ::

  • DiracX Helm chart
    • If your institution provides a kubernetes service: use it
    • If you work with public clouds: use their container services
    • Alternatively, follow these k3s instructions
  • Used for:
    • DiracX testing (GitHub actions)
    • Local development instance
    • Running a demo instance
    • Running the test instance you saw in the previous slides
    • Soon: running production instances
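Helm's "ability to parameterise" means that a chart ships default values and each installation overrides only what differs (e.g. with `helm install -f my-values.yaml`). The Python sketch below roughly illustrates that layering: nested maps are merged key by key, and everything else is replaced. It is not Helm's own code and ignores details such as removing a key by setting it to null. The keys are hypothetical and not taken from the DiracX chart.

```python
from __future__ import annotations

import copy
from typing import Any


def merge_values(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Layer `overrides` on top of `defaults`: nested dicts merge, other values replace."""
    result = copy.deepcopy(defaults)  # never mutate the chart defaults
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical keys, NOT taken from the DiracX chart.
chart_defaults = {"replicaCount": 1, "ingress": {"enabled": True, "host": "localhost"}}
site_values = {"ingress": {"host": "diracx.example.org"}}
print(merge_values(chart_defaults, site_values))
# {'replicaCount': 1, 'ingress': {'enabled': True, 'host': 'diracx.example.org'}}
```

This is why the same chart can serve GitHub Actions tests, a laptop demo and a production instance: only the small override file changes.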

layout: quote color: sky-light quotesize: text-m authorsize: text-s author: 'Some of you out there'

"OK, but there are several communities using DIRAC right now. How do they migrate?"


layout: top-title color: gray-light align: c title: Migration

:: title ::

Business continuity for DIRAC communities is our top priority

Services of DIRAC v9 and DiracX will need to live together for some time

:: content ::

1. DIRAC and DiracX share the databases
2. A legacy adaptor moves traffic from DIRAC to DiracX services
3. DIRAC services can be removed
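The sketch below is a toy Python model of the "legacy adaptor" idea in this migration plan. Callers keep using the same entry point. Methods are switched one by one from the DIRAC implementation to the DiracX one, and both implementations read the same store, as with the shared databases in the first step. `LegacyAdaptor`, `migrate` and the `getJobStatus` method name are hypothetical; this is not how the real adaptor is implemented.

```python
from __future__ import annotations

from typing import Any, Callable

Handler = Callable[..., Any]


class LegacyAdaptor:
    """Hypothetical router: migrated methods go to DiracX, the rest stay on DIRAC."""

    def __init__(self, dirac: dict[str, Handler], diracx: dict[str, Handler]) -> None:
        self.dirac = dirac
        self.diracx = diracx
        self.migrated: set[str] = set()

    def migrate(self, method: str) -> None:
        # Only switch traffic once DiracX actually implements the method.
        if method not in self.diracx:
            raise ValueError(f"DiracX does not implement {method!r} yet")
        self.migrated.add(method)

    def call(self, method: str, *args: Any) -> Any:
        backend = self.diracx if method in self.migrated else self.dirac
        return backend[method](*args)


# Both "implementations" use the same store, standing in for the shared databases.
shared_db: dict[int, str] = {8971: "Done"}
dirac_impl = {"getJobStatus": lambda job_id: ("DIRAC", shared_db[job_id])}
diracx_impl = {"getJobStatus": lambda job_id: ("DiracX", shared_db[job_id])}

adaptor = LegacyAdaptor(dirac_impl, diracx_impl)
print(adaptor.call("getJobStatus", 8971))  # ('DIRAC', 'Done')
adaptor.migrate("getJobStatus")
print(adaptor.call("getJobStatus", 8971))  # ('DiracX', 'Done')
```

Once every method has been migrated, nothing routes to the DIRAC handlers any more. At that point, in this model, they can be deleted without clients noticing.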

layout: top-title color: gray-light align: c title: FutureExtensions

:: title ::

Future action! (and extensions)

:: content ::

By now, we know that it is sometimes necessary to extend any Dirac(X) component.
DiracX aims to provide an easy way to do so.
```toml
# entrypoints in pyproject.toml

[project.entry-points."diracx.db.sql"]
AuthDB = "diracx.db.sql:AuthDB"
JobDB = "<extension>.db.sql:ExtendedJobDB"
```
For DiracX and DiracX-Web we already provide reference extensions

layout: quote color: sky-light quotesize: text-m authorsize: text-s author: 'Again, some of you out there'

"You have shown tokens-based authorizations for DiracX. But the Grid still uses proxies.

VOMS is alive!"


layout: top-title color: gray-light align: c title: tokens

:: title ::

What are proxies and/or tokens needed for?

:: content ::

  • Identity (community membership): "in transition"
  • Submitting pilots: computing elements now prefer tokens
  • Data access: at least in WLCG, still proxies; one day, tokens
  • Verifying a user's identity (internally to Dirac):
    • DiracX uses only tokens (link to security model)
    • DIRAC uses only X509 proxies and certificates to verify identities

layout: top-title-two-cols color: gray-light align: c-cm-cm title: proxies+tokens columns: is-2

:: title ::

More on proxies and tokens

:: left ::

DiracX: Authorization with "standard" Authorization Code Flow redirecting to IdP

```mermaid
%%{init: { 'theme': 'forest' }}%%
sequenceDiagram
    create actor U as User
    create participant DiracX
    U->>DiracX: Login
    DiracX->>U: Redirect
    create participant External_IdP
    U->>External_IdP: 
    destroy External_IdP
    External_IdP->>DiracX: ID token
    DiracX->>U: DiracX token
```
DiracX issues its own tokens; they are not the same tokens used for the Grid endpoints.
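In an Authorization Code Flow, the first step is redirecting the user's browser to an authorization endpoint. The Python sketch below builds that redirect URL using PKCE (RFC 7636, S256 method), which is commonly paired with this flow for clients that cannot keep a secret. The endpoint path, client ID, redirect URI and the use of PKCE by DiracX are assumptions. Only the `vo:gridpp` scope format comes from the CLI output shown earlier.

```python
from __future__ import annotations

import base64
import hashlib
import secrets
import urllib.parse


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method of RFC 7636."""
    verifier = secrets.token_urlsafe(64)  # 86 URL-safe chars, within the 43..128 allowed
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def authorization_url(authorize_endpoint: str, client_id: str, redirect_uri: str,
                      challenge: str, state: str) -> str:
    """URL the browser is sent to (the "Redirect" arrow in the diagram)."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "vo:gridpp",          # scope format as printed by `dirac login gridpp`
        "code_challenge": challenge,   # the verifier itself is only sent later, with the code
        "code_challenge_method": "S256",
        "state": state,                # echoed back; protects against forged callbacks
    }
    return f"{authorize_endpoint}?{urllib.parse.urlencode(params)}"


verifier, challenge = make_pkce_pair()
print(authorization_url("https://diracx.example.org/api/auth/authorize",  # hypothetical
                        "hypothetical-client-id",
                        "https://diracx.example.org/callback",             # hypothetical
                        challenge, secrets.token_urlsafe(16)))
```

After the IdP step, the client would exchange the returned code, together with `verifier`, for a token. That exchange is what ties the final token to the client that started the flow.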

:: right ::

DIRAC+DiracX: working with proxies and tokens

```mermaid
%%{init: { 'theme': 'forest' }}%%
sequenceDiagram
    create actor U as User
    create participant dirac-proxy-init
    U->>dirac-proxy-init: 
    create participant VOMS
    dirac-proxy-init->>VOMS: 
    destroy VOMS
    VOMS->>dirac-proxy-init: VOMS proxy
    create participant DIRAC
    dirac-proxy-init->>DIRAC: exchange proxy for token
    destroy DIRAC
    DIRAC->>dirac-proxy-init: DiracX token
    dirac-proxy-init->>U: proxy+token bundle
    U->>DIRAC_service: proxy
    U->>DiracX: token
```

layout: side-title color: gray-light title: Architecture align: cm-lm titlewidth: is-3

:: title ::

Architecture diagram

:: content ::


layout: top-title color: gray-light align: c title: Versions

:: title ::

Versions

:: content ::

```mermaid
%%{init: { 'logLevel': 'debug', 'theme': 'base', 'timeline': {'disableMulticolor': true}}}%%
timeline
    May 2022 : DIRAC v8.0
    Oct 2023 : EOL DIRAC v7.3
             : First DiracX demo
    Q4 2024  : DIRAC v9.0.0a30
             : DiracX v0.0.1a19
    Q1 2025  : DIRAC v9.0
             : DiracX v0.1
             : can start using DiracX services
```
Current production and only supported version, used by all DIRAC installations. DIRAC v9 and DiracX 0.1 will be released together.

layout: side-title color: gray-light title: Contribute align: cm-lm titlewidth: is-3

:: title ::

"I want to contribute"

:: content ::

The obvious ways:

  • code (github.com/DIRACGrid)
  • tests (as you could see, we have a somewhat open test deployment infrastructure): try something out, and let us know!

Run the demo (on your laptop):

```bash
git clone https://github.com/DIRACGrid/diracx-charts
diracx-charts/run_demo.sh # this is run for each and every commit in Github Actions
```

Discuss:


layout: top-title-two-cols align: cm-cm-lm color: orange-light columns: is-4 title: summary

:: title ::

Summary

:: left ::

:: right ::

  • DiracX is "the neXt Dirac incarnation", ensuring the future of the widely used Dirac
    • We are rewriting the code, but it is still the Dirac you love!
  • DiracX will ease interoperability with Rucio, dask, or any other tool out there
    • DiracX will still have the Data Management part, but its Workload Management functionality will come first
  • The first DiracX release will soon be here
    • It will live together with DIRAC v9 for a while, until it replaces it completely

layout: credits color: navy loop: true speed: 0.5 title: credits/people

People
DiracX is an idea of
Chris Burr CERN, LHCb
Christophe Haen CERN, LHCb
Current Developers, maintainers, supporters
Alexandre Boyer CERN, LHCb
Natthan Piggoux LUPM (FR), CTA
Cedric Serfon Brookhaven National Laboratory (US), Belle2
Ryunosuke O'Neil CERN, LHCb
Jorge Lisa Laborda Univ. of Valencia and CSIC (ES), LHCb
Daniela Bauer Imperial College (UK), GridPP
Simon Fayer Imperial College (UK), GridPP
Janusz Martyniak Imperial College (UK), GridPP
Bertrand Rigaud IN2P3 (FR)
Luisa Arrabito LUPM (FR), CTA
Xiaomei Zhang Beijing, Inst. High Energy Phys. (CN), Juno
André Sailer CERN
Project lead
Federico Stagni CERN, LHCb
Andrei Tsaregorotsev CPPM (FR), EGI and LHCb

               

Questions?

layout: side-title align: rm-lm color: light titlewidth: is-5 title: QR

:: title ::

QR codes for your fun

or just click here (for DiracX web) and here (for the Web API docs)

:: content ::

WebApp:


WebAPI:



layout: section color: cyan-light align: r

Backup


layout: top-title color: gray-light align: c title: FAQ

:: title ::

Q/A

:: content ::

  • I am using {Rucio|dask|another_tool}. I could use DiracX as WMS but do not want to fiddle with DIRAC

--> It will probably be possible, but we do not know when.

  • What is in a DiracX token (is it "special")?

--> It carries the dirac_properties (which are the same as in current DIRAC)

           

  • What did you use to make these slides?

--> Slidev with the Neversink theme; diagrams with Mermaid
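The second answer above says that a DiracX token carries `dirac_properties`. Assuming the token is a JWT (header.payload.signature, base64url-encoded), the Python sketch below reads its claims for inspection only; it does not verify the signature. A service must always verify signatures, with a proper JWT library, before trusting any claim. Apart from `dirac_properties`, the claim names and values in the example token are illustrative.

```python
from __future__ import annotations

import base64
import json
from typing import Any


def unverified_claims(token: str) -> dict[str, Any]:
    """Decode a JWT payload WITHOUT checking its signature (debugging aid only)."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("not a JWT: expected three dot-separated parts") from None
    padded = payload + "=" * (-len(payload) % 4)  # JWTs strip base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(obj: dict[str, Any]) -> str:
    """Helper to build an (unsigned) example token."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


example = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"sub": "user", "dirac_properties": ["NormalUser"]}),  # illustrative claims
    "",
])
print(unverified_claims(example)["dirac_properties"])  # ['NormalUser']
```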


layout: top-title color: gray-light align: c title: tests

:: title ::

Testing

:: content ::

  • we use GitHub Actions extensively
  • our integration tests create a "grid-in-a-box":
    • run DIRAC and DiracX servers, including databases
    • run ancillary services (e.g. IdP, CA)
    • authenticate, submit pilots, match and run jobs, upload files, etc.