From d0f3726108aefb17863dbeb3670d7115ba9f468b Mon Sep 17 00:00:00 2001
From: Leo Alessandro Bianchi <71835745+LeeoBianchi@users.noreply.github.com>
Date: Fri, 17 May 2024 23:45:16 +0200
Subject: [PATCH] Update paper.md

---
 paper/paper.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/paper/paper.md b/paper/paper.md
index 117e961..79f0c2d 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -24,15 +24,15 @@ bibliography: paper.bib
 # Summary
-Spherical Harmonic Transforms (SHT) can be seen as Fourier Transforms' spherical, two-dimensional counterparts, casting real-space data to the spectral domain and vice versa.
-As in Fourier analysis a function is decomposed into a set of amplitude coefficients, through an SHT, any spherically-symmetric field defined in real space can be decomposed into a set of complex harmonic coefficients $a_{\ell, m}$, commonly referred to as alms, each quantifying the contribution of the corresponding spherical harmonic function.
+Spherical Harmonic Transforms (SHTs) can be seen as Fourier Transforms' spherical, two-dimensional counterparts, casting real-space data to the spectral domain and vice versa.
+As in Fourier analysis a function is decomposed into a set of amplitude coefficients, an SHT allows to decompose any spherically-symmetric field, defined in real space, into a set of complex harmonic coefficients $a_{\ell, m}$, commonly referred to as alms, each quantifying the contribution of the corresponding spherical harmonic function.
 SHTs are important for a wide variety of theoretical and practical scientific applications, including particle physics, astrophysics, and cosmology.
-However, the SHTs are generally computationally expensive operations and thus often constitute the *bottleneck* of the scientific software they are part of.
-For this reason, much effort has been spent over the last couple of decades to obtain fast and efficient SHT implementations.
+However, SHTs are generally computationally expensive operations and thus often constitute the *bottleneck* of the scientific software they are part of.
+For this reason, much effort has been spent over the last couple of decades to obtain fast and efficient SHTs implementations.
 In such a setting, parallel computing naturally comes into play, especially for time-consuming software to be run on large High-Performance Computing (HPC) clusters.
-The Julia package `HealpixMPI.jl` constitutes an extension package of `Healpix.jl` [@Healpix_jl], efficiently parallelizing its SHT functionalities.
+The Julia package `HealpixMPI.jl` constitutes an extension package of `Healpix.jl` [@Healpix_jl], efficiently parallelizing its SHT-related functionalities.
 `Healpix.jl` is a Julia-only implementation of the HEALPix [@HEALPix] library, which provides one of the most used two-sphere tessellation schemes and a series of SHTs-related functions.
 The main goal of the Julia package presented in this paper, `HealpixMPI.jl`, is to efficiently employ a large number of computing cores to perform fast spherical harmonic transforms.
@@ -46,7 +46,7 @@ This paper presents the key features implemented to achieve this, together with
 # Statement of need
 Together with a variety of applications, spherical harmonic transforms are extremely relevant in different cosmological research topics, e.g., @Loureiro_2023 and @euclidcollaboration2023euclid.
-Among those, SHT are essential for the analysis of cosmic microwave background (CMB) radiation, which is one of the most active cosmology research areas.
+Among those, SHTs are essential for the analysis of cosmic microwave background (CMB) radiation, which is one of the most active cosmology research areas.
 CMB radiation is, in fact, very conveniently described as a temperature (and polarization) field on the celestial sphere, making spherical harmonics the most natural mathematical tool for analyzing its measured signal.
 On the other hand, from a computational point of view, CMB field measurements need, of course, to be discretized, requiring a mathematically consistent pixelization of the sphere and the functions defined on it.
 This is exactly the goal HEALPix was targeting when it was released more than two decades ago; it quickly became the standard library for CMB numerical analysis.
@@ -70,7 +70,7 @@ In fact, `DUCC`’s code is derived directly from `libsharp`, but has been signi
 # Hybrid parallelization of the SHT
-To run SHT on a large number of cores, i.e., on an HPC cluster, `HealpixMPI.jl` provides a hybrid parallel design, based on simultaneous usage of multithreading and MPI, for shared- and distributed-memory parallelization respectively, as shown in \autoref{fig:hybrid}.
+To run SHTs on a large number of cores, i.e., on an HPC cluster, `HealpixMPI.jl` provides a hybrid parallel design, based on simultaneous usage of multithreading and MPI, for shared- and distributed-memory parallelization respectively, as shown in \autoref{fig:hybrid}.
 ![Multi-node computing cluster representation. The optimal way to parallelize operations such as the SHTs on a cluster of computers is to employ MPI to share the computation *between* the available nodes, assigning one MPI task per node, and multithreading to parallelize *within* each node, involving as many CPUs as locally available. Figure taken from www.comsol.com. \label{fig:hybrid}](figures/hybrid_parallel.png){width=70%}
@@ -92,7 +92,7 @@ This section shows the results of parallel benchmark tests conducted on `Healpix
 In particular, a strong-scaling scenario is analyzed: given a problem of fixed size, the wall time improvement is measured as the number of cores exploited in the computation is increased.
 To obtain a reliable measurement of massively parallel spherical harmonics wall time is certainly nontrivial, especially for tests employing a high number of cores; intermittent operating system activity (aka, jitter) can significantly distort the measurement of short time scales.
-For this reason, the benchmark tests were carried out by timing a batch of 20 `alm2map` + `adjoint_alm2map` SHT pairs.
+For this reason, the benchmark tests were carried out by timing a batch of 20 `alm2map` + `adjoint_alm2map` SHTs pairs.
 For reference, the scaling shown here is relative to unpolarized spherical harmonics with $\mathrm{N}_\mathrm{side} = 4096$ and $\ell_{\mathrm{max}} = 12287$ and were carried out on the [Hyades cluster](https://www.mn.uio.no/astro/english/services/it/help/basic-services/compute-resources.html) of the University of Oslo.
 The benchmark results are quantified as the wall time multiplied by the total number of cores, shown in a 3D plot (\autoref{fig:bench}) as a function of the number of local threads and MPI tasks (always one per node).