JOSS paper writing #72

Merged on Jul 29, 2023 (20 commits). Showing changes from 1 commit.
2 changes: 1 addition & 1 deletion paper/paper.bib
@@ -16,7 +16,7 @@ @article{Harris2020
year = {2020},
}
@article{Almansi2019,
author = {Mattia Almansi and Renske Gelderloos and Thomas W. n. Haine and Atousa Saberi and Ali H. Siddiqui},
author = {Mattia Almansi and Renske Gelderloos and Thomas W. N. Haine and Atousa Saberi and Ali H. Siddiqui},
doi = {10.21105/JOSS.01506},
issn = {2475-9066},
issue = {39},
18 changes: 9 additions & 9 deletions paper/paper.md
@@ -28,28 +28,28 @@ bibliography: paper.bib

Numerical simulations of the Earth's oceans are becoming more realistic and sophisticated. Their complex layout and sheer volume make it difficult for researchers to access and understand these data, however. Additionally, most ocean models, which are mostly finite-volume models, compute spatially-integrated properties, such as grid-cell averaged temperature or wall-integrated mass flux. On the other hand, in-situ oceanographic observations are effectively collected at points in space and time. This fundamental difference makes the comparison between observations and results from numerical simulation difficult.

In this work, we present seaduck, a Python package that can perform both Eulerian and Lagrangian interpolation on generic ocean datasets with good performance and scalability. This package accesses numerical datasets from the perspective of space-time points. It automatically navigates complex layouts of datasets and transforms discrete information to continuous fields. The value and derivatives of those fields can be access at any points in the domain defined by the user. Similar to fixed and mobile observational platforms, the points can be either stationary (Eulerian) or advected by the flow (Lagrangian).
In this work, we present seaduck, a Python package that can perform both Eulerian and Lagrangian interpolation on generic ocean datasets with good performance and scalability. This package accesses numerical datasets from the perspective of space-time points. It automatically navigates complex dataset layouts (grid topologies) and transforms discrete information to continuous fields. The values and derivatives of those fields can be accessed at any point in the domain defined by the user. Similar to fixed and drifting observational oceanographic instrument platforms, the points can be either stationary (Eulerian) or advected by the flow (Lagrangian).

# Statement of need

The seaduck package is different from other ocean analytical tools (e.g. oceanspy \[@Almansi2019\]) in the sense that it accesses the data from a point's perspective. Users define the points of interest using longitude, latitude, depth, and time, and the package then reads in relavent information from neighboring model grid points in discrete numerical models and constructs the continuous field around the points. Index lookup and space-time interpolation involved in this process is done efficiently with `scipy.spatial.cKDtree`[@Virtanen2020\] and numba\[@Lam2015\] compiled code , respectively. Since the points can be defined arbitrarily in the model domain, accessing discrete numerical output feels like getting values from a continuous field despite complex model layout.
The seaduck package is different from other ocean analytical tools (e.g., oceanspy \[@Almansi2019\]) because it accesses the circulation model data from the perspective of an arbitrary space-time point. Users define the points of interest using longitude, latitude, depth, and time. The package then reads necessary information from nearby model grid points and constructs the continuous (scalar or vector) field around the points. The index lookup and space-time interpolation involved in this process are done efficiently with `scipy.spatial.cKDTree` \[@Virtanen2020\] and numba \[@Lam2015\] compiled code, respectively. As the points can be defined arbitrarily in the model domain, accessing discrete numerical output feels to the user like retrieving values from a continuous field, despite the complex model grid.

The points can be stationary or be advected by a model vector field. Most Lagrangian particle packages (e.g. [@oceanparcel, @individualdisplacement]) that compute particle trajectories by solving initial value problems numerically. Seaduck, instead, uses a mass-conserving analytic scheme based on the assumption of a step-wise steady velocity field similar to that used by TRACMASS\[@tracmass\]. The advection code is largely numba compiled, and the total amount of computation is smaller than solving initial value problems. Furthermore, seaduck also allow users to access the analytical trajectory of the particle rather than interpolated ones. The Lagrangian particle functionality is built based on the above-mentioned interpolation utilities, thus, is automatically able to navigate complex topology of numerical models.
The points can be stationary (fixed in space, or Eulerian) or be advected by a vector velocity field (Lagrangian). Most Lagrangian particle packages (e.g., [@oceanparcel; @individualdisplacement]) compute particle trajectories by solving the initial value problem numerically. Instead, seaduck uses efficient, accurate, mass-conserving analytic formulae, which assume a step-wise steady velocity field similar to that used by TRACMASS \[@tracmass\]. The Lagrangian advection code is largely numba compiled, and the total amount of computation is less than solving the problem numerically. The Lagrangian particle functionality is based on the above-mentioned interpolation utilities, and thus automatically navigates the complex topology of numerical ocean models.
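
To make the analytic scheme in the revised paragraph concrete, here is a minimal 1D sketch. It is not seaduck's actual code. It assumes a single grid cell with a steady velocity that varies linearly between the two walls, so the particle position has a closed-form solution. The function name `advect_in_cell` and all of its parameters are hypothetical.

```python
import math


def advect_in_cell(x, u_left, u_right, dt, x_left=0.0, x_right=1.0):
    """Advance a particle analytically inside one 1D cell.

    Hypothetical helper, not part of seaduck. The velocity is assumed steady
    and linear in x: u(x) = u_left + beta * (x - x_left).
    Returns (new_x, time_used); the particle stops at a wall if it gets
    there before dt has elapsed.
    """
    beta = (u_right - u_left) / (x_right - x_left)
    u_here = u_left + beta * (x - x_left)
    if u_here == 0.0:
        return x, dt  # stagnation point: the particle does not move

    # The particle can only leave through the wall it is moving toward.
    if u_here > 0.0:
        x_wall, u_wall = x_right, u_right
    else:
        x_wall, u_wall = x_left, u_left

    # Along the path u(t) = u_here * exp(beta * t), so the wall is reached
    # when u(t) equals the velocity at that wall.
    if beta == 0.0:
        t_wall = (x_wall - x) / u_here
    elif u_wall / u_here > 0.0:
        t_wall = math.log(u_wall / u_here) / beta
    else:
        t_wall = math.inf  # flow stalls or reverses before the wall

    if t_wall <= dt:
        return x_wall, t_wall
    if beta == 0.0:
        return x + u_here * dt, dt
    # Closed-form position for linearly varying velocity.
    return x + u_here * math.expm1(beta * dt) / beta, dt
```

A full trajectory would repeat this step cell by cell, carrying the leftover time `dt - time_used` into the neighboring cell and updating the wall velocities whenever the step-wise steady field changes.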

Highly customizable interpolation methods is available for both Eulerian (stationary) or Lagrangian points. Users can define all the properties of the kernel or the list of kernel used, including: (1) the shape of the interpolation kernel(s) in both spatial and temporal dimensions, which defines which neighboring points are used, and therefore how the continuous field is estimated. (2) The interpolation weight function, which allows users to calculate derivatives in all four dimensions apart from interpolation. (3) The hierarchical sequence of kernels, namely what is the next smaller kernel to use if some of the interpolation points are land-masked.
Seaduck provides highly customizable interpolation methods for both Eulerian and Lagrangian points. Users can control all the properties of a hierarchy of kernels, including: (1) The shape of the interpolation kernel(s) in both spatial and temporal dimensions, which defines which neighboring points are used, and therefore how the continuous field is estimated. (2) The interpolation weight function, which allows users to calculate generic linear operations on the data, such as differentiation and smoothing, in all four dimensions. The hierarchy of kernels controls behaviour near land-masked points. Specifically, the hierarchy consists of successively more compact kernels that are used depending on the proximity of land points.
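
The point that one kernel can serve both interpolation and differentiation, simply by swapping the weight function, can be illustrated with a small sketch. It assumes a 2x2 kernel on a uniform rectangular cell with bilinear weights. The function names are hypothetical and do not reflect seaduck's API.

```python
def bilinear_weights(rx, ry):
    """Interpolation weights for the corners (0,0), (1,0), (0,1), (1,1).

    rx and ry are the fractional offsets (0 to 1) of the point inside the cell.
    """
    return [(1 - rx) * (1 - ry), rx * (1 - ry), (1 - rx) * ry, rx * ry]


def ddx_weights(ry, dx):
    """Weights that return the x-derivative of the bilinear interpolant.

    Obtained by differentiating bilinear_weights with respect to rx and
    dividing by the cell width dx. The result does not depend on rx.
    """
    return [-(1 - ry) / dx, (1 - ry) / dx, -ry / dx, ry / dx]


def apply_kernel(weights, values):
    """Any linear operation is a weighted sum of the kernel's grid values."""
    return sum(w * v for w, v in zip(weights, values))


# Corner values of f(x, y) = 2x + 3y on a unit cell, in the corner order above.
corner_values = [0.0, 2.0, 3.0, 5.0]
interpolated = apply_kernel(bilinear_weights(0.25, 0.5), corner_values)
slope_in_x = apply_kernel(ddx_weights(0.5, 1.0), corner_values)
```

For this linear test field the weighted sums reproduce both the value and the x-slope of the field. A hierarchy of kernels would add a fallback to a more compact stencil when one of the four corners is land-masked.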

With the above suite of functionalities, seaduck is capable of accomplishing many common tasks in ocean dataset analysis including interpolation, regridding and Lagrangian particle simulation and some new ones including interpolation in Lagrangian label space and analyzing tracer budget from Lagrangian perspective. We also strive to make seaduck an accessible education tool by creating a very simple high-level interface intended for people with little programming background.
With the above functionality, seaduck can accomplish many common tasks in ocean model data analysis, including interpolation, regridding, and Lagrangian particle simulation. Less common tasks are also possible, such as interpolation in Lagrangian label space, and analysis of tracer budgets along Lagrangian trajectories. We also strive to make seaduck an accessible education tool by creating a very simple high-level default interface, which is intended for people with little programming background, and for people who want to quickly try the tool.

# Usage Examples

While some usage examples are presented here, many more can be found in the documentation for seaduck (https://macekuailv.github.io/seaduck/). The notebooks of the following examples run on SciServer[@sciserver], an openly available cloud compute resource. A supplementary GitHub repository (https://github.com/MaceKuailv/seaduck_sciserver_notebook) holds all SciServer notebooks, which is being continuously maintained.
While some usage examples are presented here, many more can be found in the documentation for seaduck (https://macekuailv.github.io/seaduck/). The notebooks of the following examples run on SciServer [@sciserver], an openly available cloud compute resource for scientific data analysis. A supplementary GitHub repository (https://github.com/MaceKuailv/seaduck_sciserver_notebook) holds all SciServer notebooks, and is being continuously maintained.

![Fig.1 (a) Scatterplot with colors showing the sea surface height value near Kangerdlugssuaq Fjord defined in the model and interpolated by seaduck.\label{fig:onlyone}. (b) Streaklines of particle advected by stationary 2D slice of the LLC4320 simulation, colors denotes the current speed.](fig1.png)
Collaborator

Split the two panels into two separate figures so they're larger and easier to read.

![Fig.1 (a) Scatterplot with colors showing the sea surface height value near Kangerdlugssuaq Fjord defined in the model and interpolated by seaduck.\label{fig:onlyone} (b) Streaklines of particle advected by stationary 2D slice of the LLC4320 simulation. Colors denote the current speed.](fig1.png)

## Interpolation / regridding

In this subsection, we are going to explore the interpolation/regridding functionality of the package. As an example, we used a realistic simulation of the Kangerdlugssuaq Fjord [@Fraser2018] as an example. This is an MITgcm simulation with very uneven grid spacings, i.e. grids close or in the fjord is much more densely placed than the rest. For the interpolation on sea surface height field, we use all the center grid points of the datasets as well as another 60,000 points in a rectangular region where the model grid points are sparsely places (between 66.5N to 67N, between 28.5W to 34.5 W, 600 in longitudinal direction and 100 in latitudinal direction). As shown in Fig. 1a. The interpolated field matches the background field very well, even when the interpolation is happening close to land ocean interface.
As an example of seaduck's interpolation/regridding functionality, consider a realistic simulation of the Kangerdlugssuaq Fjord, which is in east Greenland [@Fraser2018]. This is an MITgcm simulation with uneven grid spacing such that grid cells within the fjord are much more densely packed than elsewhere. The goal is to interpolate, and hence regrid, the sea surface height field, $\eta$, to a uniform grid spacing in the southern part of the domain. As shown in Fig. 1a, the interpolated field matches the background field very well, even for points close to land.
Collaborator

Clarify what you mean by "matches the background field very well" (?)

Collaborator

Cite for "MITgcm"


## Global particle simulation on LLC4320

In this example, a stationary 2D slice of the state of the art LLC4320[@llc4320] model is used. LLC4320 is a global kilometer-scale model with complex topology. 150,000 particles were released randomly and evenly on the globe and are simulated for 30 days. This simulation takes about an hour to run on SciServer[@sciserver]. Fig. 1b is the particle trajectories produced in this simulation spanning several tiles of the domain looking from the North pole. The colors denote the current speed.
In this example, a stationary, surface slice of the LLC4320 [@llc4320] simulation is used. LLC4320 is a kilometer-scale model of the global ocean circulation with complex topology. 150,000 Lagrangian particles are released randomly and evenly on the globe, and seaduck computes their trajectories for 30 days. Fig. 1b shows the particle trajectories for the northern hemisphere, which contains around 10$^8$ velocity points. The colors denote the current speed. This simulation takes about an hour to run on SciServer [@sciserver].