Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paper: multinterp: A Unified Interface for Multivariate Interpolation in the Scientific Python Ecosystem #937

Merged
merged 12 commits into from
Sep 25, 2024
Binary file added papers/alan_lujan/banner.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added papers/alan_lujan/figure1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added papers/alan_lujan/figure2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Binary file not shown.
1,416 changes: 1,416 additions & 0 deletions papers/alan_lujan/figures/BilinearInterpolation.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Binary file not shown.
1,883 changes: 1,883 additions & 0 deletions papers/alan_lujan/figures/CurvilinearInterpolation.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Binary file not shown.
1,770 changes: 1,770 additions & 0 deletions papers/alan_lujan/figures/UnstructuredInterpolation.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
195 changes: 195 additions & 0 deletions papers/alan_lujan/figures/figures.ipynb

Large diffs are not rendered by default.

1,466 changes: 1,466 additions & 0 deletions papers/alan_lujan/figures/manim_notebook.ipynb

Large diffs are not rendered by default.

121 changes: 121 additions & 0 deletions papers/alan_lujan/main.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
---
# Ensure that this title is the same as the one in `myst.yml`
title: multinterp
subtitle: A Unified Interface for Multivariate Interpolation in the Scientific Python Ecosystem
abstract: |
Multivariate interpolation is a fundamental tool in scientific computing, yet the Python ecosystem offers a fragmented landscape of specialized tools. This fragmentation hinders code reusability, experimentation, and efficient deployment across diverse hardware. To address this challenge, I've developed the `multinterp` package. It provides a unified interface for regular/rectilinear interpolation, supports serial (NumPy/SciPy), parallel (Numba), and GPU (CuPy, PyTorch, JAX) backends, and includes tools for multivalued interpolation and interpolation of derivatives.
exports:
- format: pdf
---

alanlujan91 marked this conversation as resolved.
Show resolved Hide resolved
## Introduction

The scientific Python ecosystem has a number of diverse tools for multivariate interpolation. However, these tools are scattered across multiple packages, each constructed for a specific purpose that prevents them from being easily used in other contexts.

Currently, there is no unified and comprehensive interface for interpolation with structured data (such as regular, rectilinear, and curvilinear) and unstructured data (as in irregular or scattered) that can be used with different hardware backends (e.g., serial, in parallel, or on a gpu) and software (numpy, numba, cupy, pytorch, jax) backends. The lack of this common platform makes it difficult for users to switch between different interpolation methods and backends, leading to inefficiencies and inconsistencies in their research.

This project aims to develop a comprehensive framework for multivariate interpolation for structured and unstructured data that can be used with different software technologies and hardware backends.

## Grid Interpolation

Functions are powerful mappings between sets of inputs and outputs, indicating how one set of values is related to another. Functions, however, are also infinitely dimensional, in that the inputs can range over an infinite number of values each mapping 1-to-1 (typically) to an infinite number of outputs. This makes it difficult to represent non-analytic functions in a computational environment, as we can only store a finite number of values in memory. For this reason, interpolation is a powerful tool in scientific computing, as it allows us to represent functions with a finite number of values and to approximate the function's behavior between these values.

The set of input values on which we know the function's output values is called a **grid**. A grid (or input grid) of values can be represented in many ways, depending on its underlying structure. The broadest categories of grids are regular or structured grids, and irregular or unstructured grids. Regular grids are those where the input values are arranged in a regular pattern, such as a triangle or a quadrangle. Irregular grids are those where the input values are not arranged in a particularly structured way and can seem to be scattered randomly across the input space.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The introduction of technical terms and concepts could be more gradual and systematic, ensuring the audience can follow the content smoothly. The 2nd paragraph directly jumps to grid interpolation without setting the context before.


As we might imagine, interpolation on regular grids is much easier than interpolation on irregular grids as we are able to exploit the structure of the grid to make predictions about the function's behavior between known values. Irregular grid interpolation is much more difficult, and often requires *regularizing* and/or *regression* techniques to make predictions about the function's behavior between known values. `multinterp` aims to provide a comprehensive set of tools for both regular and irregular grid interpolation, and we will discuss some of these tools in the following sections.

```{list-table} Grids and structures implemented in "multinterp".
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do you have plans to support other interpolations e.g. spline, inverse distance weight, natural neighbor etc.?

:label: tbl:grids
:header-rows: 1
* - Grid
- Structure
- Geometry
* - Rectilinear
- Regular
- Rectangular mesh
* - Curvilinear
- Regular
- Quadrilateral mesh
* - Unstructured
- Irregular
- Random
```

## Rectilinear Interpolation

A *rectilinear* grid is a regular grid where the input values are arranged in a *rectangular* (in 2D) or *hyper-rectangular* (in higher dimensions) pattern. Moreover, they can be represented by the tensor product of monotonically increasing vectors along each dimension. For example, a 2D rectilinear grid can be represented by two 1D arrays of increasing values, such as $x = [x_0, x_1, x_2, \cdots, x_n]$ and $y = [y_0, y_1, y_2, \cdots, y_m]$, where $x_i > x_j$ and $y_i > y_j$ $\forall i > j$, and the input grid is then represented by $x \times y$ of dimensions $n \times m$. This allows for a very simple and efficient interpolation algorithm, as we can easily find and use the nearest known values to make predictions about the function's behavior in the unknown space.

alanlujan91 marked this conversation as resolved.
Show resolved Hide resolved
```{figure} figures/BilinearInterpolation
:label: bilinear
:alt: A non-uniformly spaced rectilinear grid can be transformed into a uniformly spaced coordinate grid (and vice versa).
:align: center

A non-uniformly spaced rectilinear grid can be transformed into a uniformly spaced coordinate grid (and vice versa).
```

### Multilinear Interpolation

`multinterp` provides a simple and efficient implementation of *multilinear interpolation* for various backends (`numpy` (@Harris2020), `scipy` (@Virtanen2020), `numba` (@Lam2015), `cupy` (@Okuta2017), `pytorch` (@Paszke2019), and `jax` (@Bradbury2018)) via its `multinterp` function. From the remaining of this section, `multinterp` refers to the `multinterp` function in `multinterp` package, unless otherwise specified.
alanlujan91 marked this conversation as resolved.
Show resolved Hide resolved

The main workhorse of `multinterp` is `scipy.ndimage`'s `map_coordinates` function. This function takes an array of **input** values and an array of **coordinates**, and returns the interpolated values at those coordinates. More specifically, the `input` array is the array of known values on the coordinate (index) grid, such that `input[i,j,k]` is the known value at the coordinate `(i,j,k)`. The `coordinates` array is an array of fractional coordinates at which we wish to know the values of the function, such as `coordinates[0] = (1.5, 2.3, 3.1)`. This indicates that we wish to know the value of the function between input index $i \in [1,2]$, $j \in [2,3]$, and $k \in [3,4]$. While `map_coordinates` is a powerful tool for coordinate grid interpolation, a typical function in question may not be defined on a coordinate grid. For this reason, we first need to find a mapping between the functional input grid and the coordinate grid, and then use `map_coordinates` to interpolate the function on the coordinate grid.

To do this, the `multinterp` package provides the function `get_coordinates`, which creates a mapping between the functional input grid that may be defined on the real numbers, and the coordinate grid which is defined on the positive integers including zero. In short, `multinterp` consists of two main functions: `get_coordinates` and `map_coordinates`, which together provide a powerful and flexible framework for multilinear interpolation on rectilinear grids.

An additional advantage of `scipy`'s `map_coordinates` is that it has been extended and integrated to various backends, such as `cupy` ([`cupyx.scipy.ndimage.map_coordinates`](https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.ndimage.map_coordinates.html)) and `jax` ([`jax.scipy.ndimage.map_coordinates`](https://jax.readthedocs.io/en/latest/_autosummary/jax.scipy.ndimage.map_coordinates.html)). This allows for easy and efficient interpolation on GPUs and TPUs, which can be orders of magnitude faster than CPU interpolation. For wider compatibility, `multinterp` also provides a `numba` and `pytorch` implementation of `map_coordinates`, which broadens the range of hardware that can be used for interpolation.

```{include} notebooks/Multivariate_Interpolation.ipynb
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multilinear_Interpolation.ipynb comment

def squared_coords(x, y):
    return x**2 + y**2

It will be helpful to provide a short reason for choosing the squared coordinates function. Example: A squared x and y coordinates function is used to draw a figure whose grid geometry looks like a curved sheet (bowl) in 3D projection.

This closed-form solution function, for which all points along the curved surface are known, is used as the baseline model. From this known model, we can draw sample points to approximate similarly shaped unknown functions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multivariate_Interpolation.ipynb: figures comment
fig 1:
missing title, missing axes labels (x, y, z).
suggest adding color shading to increase contrast (center to edges)
screenshot3

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fig 2: interpolated figures
missing title, axes labels (x, y z)
suggest ways to show this figure is interpolated.
Possibly blow up a small area with enhanced pixelation.
screenshot4

### Derivatives

The `multinterp` package also allows for the calculation of derivatives of the interpolated function defined on a rectilinear grid. This is done by using the function `get_grad`, which wraps numpy's `gradient` function to calculate the gradient of the interpolated function at the given coordinates.

```{include} notebooks/Multivariate_Interpolation_with_Derivatives.ipynb
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multivariate_Interpolation_with_Derivatives.ipynb comment
fig1, fig2, fig3, fig4
Axes need labels on all 4 figures (x, y, z) or (dz/dx, dz/dy, f(z)).

Difficult to see the relationship between first group of 2 figures and second group of 2 figures (partial derivatives). Perhaps some lines can be drawn to show an example of the original function and its partial derivatives. I leave this decision up to the author.
screenshot5

### Multivalued Interpolation

Finally, the `multinterp` package allows for multivalued interpolation on rectilinear grids via the `MultivaluedInterp` class.

```{include} notebooks/Multivalued_Interpolation.ipynb
```

## Curvilinear Interpolation

A *curvilinear* grid is a regular grid whose input coordinates are *curved* or *warped* in some regular way, but can nevertheless be transformed back into a regular grid by simple transformations. That is, every quadrangle in the grid can be transformed into a rectangle by a remapping of its verteces. There are two approaches to curvilinear interpolation in `multinterp`: the first requires a "point location" algorithm to determine which quadrangle the input point lies in, and the second requires a "dimensional reduction" algorithm to generate an interpolated value from the known values in the quadrangle.

```{figure} figures/CurvilinearInterpolation
:label: curvilinear
:alt: A curvilinear grid can be transformed into a rectilinear grid by a simple remapping of its vertices.
:align: center

A curvilinear grid can be transformed into a rectilinear grid by a simple remapping of its vertices.
```

```{include} notebooks/Curvilinear_Interpolation.ipynb
```

## Unstructured Interpolation

```{figure} figures/UnstructuredInterpolation
:label: unstructured
:alt: Unstructured grids are irregular and often require a triangulation step which might be computationally expensive and time-consuming.
:align: center

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memo, the alt-text sections in code blocks do not wrap properly when viewed on an iPad/Tablet device. Curvenotes build version works properly. So this issue only comes up when viewed direcly from a GitHub repo. Not sure if this is an issue for proceedings.
(Also alt-text wrap issue in Rectilinear Interpolation and Curvilinear Interpolation sections.)

Unstructured grids are irregular and often require a triangulation step which might be computationally expensive and time-consuming.
```

```{include} notebooks/Unstructured_Interpolation.ipynb
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unstructured_Interpolation.ipynb comment
For figures after the 1st one, (group 1: nearest, linear, cubic, radial basis) and (group 2: original, gaussian process regression), eye-balling the differences in the figures maybe easier with grid lines drawn in white or black ink. Also just a suggestion.

screenshot1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

screenshot2

## Conclusion

Multivariate interpolation is a cornerstone of scientific computing, yet the Python ecosystem (@Oliphant2007) presents a fragmented landscape of tools. While individually powerful, these packages often lack a unified interface. This fragmentation makes it difficult for researchers to experiment with different interpolation methods, optimize performance across diverse hardware, and handle varying data structures (regular, rectilinear, curvilinear, unstructured).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to give a nod to PyTorch and TensorFlow for having call-backs, hooks, and function wrappers to allow a user to swap out an optimization function or module mid-stream? They do not cover all the use cases of the multinterp package, but some effort went into developing a layered API to cover varying use cases.

NumPy also has structured data type, that can be used for custom data type and hierarchial data structures. It can be seen as an attempt to provide a flexible (customizable) user interface, even though its aim and scope is different from 'multinterp.' (NumPy structured datatype's goal seems mostly for C code interface and optimized C module or C numerical recipe interface and explicit memory control or memory layout control.) The general reader may appreciate having some context. This package may be viewed as a further development of previous efforts at a flexible user interface for users of varying data types and data geometries.
(See structured arrays in https://numpy.org/doc/stable/user/basics.rec.html)

The `multinterp` project seeks to change this. Its goal is to provide a unified, comprehensive, and flexible framework for multivariate interpolation in Python. This framework will streamline workflows by offering:

- Unified Interface: A consistent API for interpolation, regardless of data structure or desired backend, reducing the learning curve and promoting code reusability.
- Hardware Adaptability: Seamless support for CPU (NumPy, SciPy), parallel (Numba), and GPU (CuPy, PyTorch, JAX) backends, empowering users to optimize performance based on their computational resources.
- Broad Functionality: Tools for regular/rectilinear interpolation, multivalued interpolation, and derivative calculations, addressing a wide range of scientific problems.

The multinterp package (<https://github.com/alanlujan91/multinterp>) is currently in its beta stage. It offers a strong foundation but welcomes community contributions to reach its full potential. We invite collaboration to improve documentation, expand the test suite, and ensure the codebase aligns with the highest standards of Python package development.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the text mentions documentation and community contributions, specific links or guidelines for how users can contribute or access detailed documentation are not provided. Adding clear references or links to supplementary materials would enhance the text's utility.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curvenoted build version comment
This paper when viewed on-screen, right-bottom screen area shows a list of related files. If possible, files should be listed in its order of appearance.

Supporting Documents
(shown in the order of appearance)

Multivariate_Interpolation.ipynb
Multivariate_Interpolation_with_Derivatives.ipynb
Multivalued_Interpolation.ipynb
Curvilinear_Interpoliation.ipynb
Unstructured_Interpolation.ipynb
manim_notebook.ipynb (figure animation on Curvenotes build)
figures.ipynb (add?)

Loading