Merge pull request #87 from graeme-a-stewart/julia-caching

Add Julia caching project
research-software-collaborations · Feb 26, 2024 · 76e2690
2 parents 54597f1 + 71e0aea
commit 76e2690

Showing 2 changed files with 46 additions and 1 deletion.
diff --git a/projects/agc-physlite.yml b/projects/agc-physlite.yml
@@ -25,7 +25,7 @@ description: >
   The IRIS-HEP Analysis Grand Challenge (AGC) is a realistic environment for investigating how high energy physics data analysis workflows scale to the demands
   of the High-Luminosity LHC (HL-LHC). It captures relevant workflow aspects from data delivery to statistical inference. The AGC has so far been based on
   publicly available Open Data from the CMS experiment. The ATLAS collaboration aims to use a data format called PHYSLITE at the HL-LHC, which slightly differs
-  from the data formats used so far within the AGC. This project involves implementing the capability to analyze PHYSLITE ATLAS data within the similiar to AGC
+  from the data formats used so far within the AGC. This project involves implementing the capability to analyze PHYSLITE ATLAS data within the similar to AGC
   workflow, the columnar analysis prototype, and optimizing the related performance under large volumes of data. In addition to this, the evaluation of systematic
    uncertainties for ATLAS with PHYSLITE is expected to differ in some aspects from what the AGC has considered thus far. This project will also investigate workflows
   to integrate the evaluation of such sources of uncertainty for ATLAS.

diff --git a/projects/julia-caching.yml b/projects/julia-caching.yml
@@ -0,0 +1,45 @@
+---
+name: Enabling Julia code to run at scale with artefact caching
+postdate: 2024-02-23
+categories:
+  - Open science
+durations:
+  - 3 months
+experiments:
+  - Future Colliders
+skillset:
+  - Julia
+status:
+  - Available
+project:
+  - IRIS-HEP
+location:
+  - Any
+commitment:
+  - Full time
+program:
+  - IRIS-HEP fellow
+shortdescription: Develop HEP strategies for artefact caching in Julia to allow large scale running
+description: >
+  Julia is a promising language for high-energy physics as it combines the ease
+  of use and ergonomics of dynamic languages such as Python, with the runtime
+  speed of C or C++. One of Julia's features is that it uses a JIT (or
+  just-ahead-of-time) compiler to target the specific architecture on which it
+  is being run. This, however, comes at the cost of compilation time, meaning
+  that the first pass through the code is slower. If Julia is to be adopted
+  widely in high-energy physics, and run at large scales, then it is important
+  to mitigate this cost by *pre-compiling* the Julia code to be used on the
+  system and avoid the cost of recompiling on every node. This is accomplished
+  by the use of the `DEPOT_PATH` setting. This will first be investigated on
+  cluster systems at CERN, e.g., SWAN and lxbatch. Startup time and runtime will
+  be investigated with increasingly large sets of jobs running. Then we shall
+  extend our investigations to caching Julia code on CVMFS, which would allow
+  scaling to running on the whole grid. Finally, we shall examine the
+  possibility of precompiling artefacts for different microarchitectures, which
+  would allow exploitation of the full power of modern CPUs in large scale
+  heterogeneous systems.
+contacts:
+  - name: Graeme Stewart
+    email: [email protected]
+  - name: Pere Mato
+    email: [email protected]
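
For illustration only, and not part of this change: the `DEPOT_PATH` setting mentioned in the description is normally populated from the `JULIA_DEPOT_PATH` environment variable, a list of directories joined by the platform path separator, with the first entry being the depot Julia writes to. A minimal sketch of how a job wrapper might layer a writable per-job depot in front of a shared, pre-populated one is given below. The helper names and the example paths are hypothetical, and whether cached artefacts in the shared depot are actually reused depends on the Julia version and on matching package environments.

```python
import os


def layered_depot_path(job_depot, shared_depot, sep=os.pathsep):
    """Build a JULIA_DEPOT_PATH value (hypothetical helper).

    The writable per-job depot comes first, because Julia writes new
    artefacts to the first entry; the shared depot follows so that it
    is only read from.
    """
    return sep.join([job_depot, shared_depot])


def julia_env(job_depot, shared_depot, base_env=None):
    """Return a copy of the environment with JULIA_DEPOT_PATH set.

    base_env defaults to the current process environment; the input
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["JULIA_DEPOT_PATH"] = layered_depot_path(job_depot, shared_depot)
    return env


if __name__ == "__main__":
    # Both paths are placeholders, not real locations.
    env = julia_env("/tmp/job-depot", "/cvmfs/example.org/julia/depot")
    print(env["JULIA_DEPOT_PATH"])
```

The resulting mapping could be passed as the `env` argument of `subprocess.run` when launching `julia` from a batch job, which keeps the shared depot read-only from the job's point of view.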