Improvements to NonStationaryConvolve2D #466

Open
mrava87 opened this issue Jan 17, 2023 · 7 comments
Comments

mrava87 (Collaborator) commented Jan 17, 2023

This issue collects a few improvements to NonStationaryConvolve2D raised in #465 (plus a couple of others):

  • Refactor bilinear interpolation
    The CUDA bilinear interpolation is identical to the CPU one, so it makes sense to wrap the shared code in a single function. This can be done by writing a pure Python function and then deriving a device function from it. For example:

```python
from numba import cuda

# CPU
def fun(a, b):
    return a + b

# GPU
dev_fun = cuda.jit(device=True)(fun)
```
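To make the idea concrete, here is a minimal, hypothetical sketch of such a shared kernel (names like `bilinear_weights` are illustrative, not the actual PyLops code): a pure Python function that could equally be jitted with `numba.jit` or turned into a device function with `cuda.jit(device=True)`.

```python
# Hypothetical shared bilinear-interpolation kernel: indices of the four
# surrounding filters and their bilinear weights. Pure Python, so the
# same source can be reused on CPU and GPU.

def bilinear_weights(x, z, ox, dx, oz, dz, nx, nz):
    """Return the lower-left filter indices and the four bilinear weights."""
    fx = (x - ox) / dx
    fz = (z - oz) / dz
    ix = min(max(int(fx), 0), nx - 2)   # clamp to the filter grid
    iz = min(max(int(fz), 0), nz - 2)
    wx = fx - ix
    wz = fz - iz
    return ix, iz, (1 - wx) * (1 - wz), (1 - wx) * wz, wx * (1 - wz), wx * wz

# CPU (plain Python/NumPy):
#   interp_cpu = bilinear_weights
# CPU (jitted), if numba is available:
#   interp_numba = numba.jit(nopython=True)(bilinear_weights)
# GPU device function:
#   interp_cuda = numba.cuda.jit(device=True)(bilinear_weights)

print(bilinear_weights(1.5, 0.5, 0.0, 1.0, 0.0, 1.0, 4, 4))
# (1, 0, 0.25, 0.25, 0.25, 0.25)
```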
  • Move CPU functions to an auxiliary file
    Since the GPU functions already live in a separate file, and given the refactor above, it makes sense to move the CPU functions to a separate file as well.

  • Improve thread launch parameters and iteration model
    Currently, there is a limit on the model size, imposed by the fixed number of blocks and threads in the kernel call. There are two approaches to fixing this. The first is to make the total number of threads depend on the model: for example, if the model is 1000 x 1000, launch enough blocks to cover 1024 x 1024 threads. A problem appears for very large models, however: the number of threads required exceeds what the GPU can provide.
    Grid-stride looping helps here, as it allows arbitrarily sized models to be processed by a smaller, fixed number of threads: launch 1024 x 1024 threads and a model of any size can be processed. The issue with naive grid-stride looping is that the thread count may be far from optimal; for example, a 32 x 32 model would still launch 1024 x 1024 threads.
    A solution is to combine both techniques: use an input-dependent number of blocks/threads (rounded to powers of 2), but cap the total number of threads launched. The cap could even be set per GPU, but it is probably fine to hardcode a reasonable number derived from the MAX_GRID_DIM_X attribute (see here).
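The combined strategy can be sketched as follows (helper names are illustrative, and the grid-stride loop is simulated on the CPU for clarity; a real numba.cuda kernel would use `cuda.grid(1)` as the start index and `cuda.gridsize(1)` as the stride):

```python
# Sketch: input-dependent thread count rounded up to a power of two and
# capped, with a grid-stride loop covering whatever the launch leaves out.

def next_pow2(n):
    """Smallest power of two >= n."""
    return 1 << max(n - 1, 0).bit_length()

def launch_threads(n, cap=1 << 20):
    """Total threads to launch for an n-element axis, capped (e.g., by a
    value derived from the device's MAX_GRID_DIM_X attribute)."""
    return min(next_pow2(n), cap)

def grid_stride_visit(n, nthreads):
    """CPU simulation of the indices each thread visits in a grid-stride
    loop over an n-element axis."""
    visited = []
    for tid in range(nthreads):  # one pass per launched thread
        i = tid
        while i < n:             # stride by the total thread count
            visited.append(i)
            i += nthreads
    return visited

# A 1000-element axis launches 1024 threads; a 32-element axis only 32.
print(launch_threads(1000), launch_threads(32))  # 1024 32
# Every element is covered even when n exceeds the number of threads.
print(sorted(grid_stride_visit(10, 4)) == list(range(10)))  # True
```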

  • Create an operator that takes the filters as input
    Currently, our operator takes a generic 2D signal as input and the filters as a parameter. However, an alternative operator that takes the 2D signal as a parameter and the filters as input is desirable, as it could be used to estimate the non-stationary filters that best match a given dataset and input signal. This is roughly equivalent to pylops.avo.prestack.PrestackWaveletModelling in the 1D case. I suggest making a completely new operator and calling it NonStationaryWavelets2D or NonStationaryFilters2D.
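The role swap can be sketched in the simpler 1D stationary setting (class and method names below are hypothetical, not PyLops API): the signal is fixed at construction time and the filter becomes the model, so inverting the operator estimates the filter that best matches given data.

```python
import numpy as np

class WaveletEstimation1D:
    """Hypothetical sketch: full 1D convolution with the signal fixed as
    a parameter and the filter acting as the model."""

    def __init__(self, x, nh):
        self.x = np.asarray(x, dtype=float)  # fixed input signal
        self.nh = nh                         # filter length (model size)

    def _matvec(self, h):
        # model (filter) -> data
        return np.convolve(self.x, h, mode="full")

    def _rmatvec(self, y):
        # data -> model: valid correlation of the data with the signal
        return np.correlate(y, self.x, mode="valid")

# Dot test: <A h, y> == <h, A' y>
rng = np.random.default_rng(0)
Op = WaveletEstimation1D(rng.standard_normal(8), nh=3)
h = rng.standard_normal(3)
y = rng.standard_normal(8 + 3 - 1)
print(np.isclose(np.dot(Op._matvec(h), y), np.dot(h, Op._rmatvec(y))))  # True
```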

mrava87 (Collaborator, Author) commented Jan 17, 2023

@cako can you look at the last point and let me know what you think :)

cako (Collaborator) commented Jan 17, 2023

Yup, that is a good idea. In fact, that is what I first thought the operator did, hahaha. So it would be nice to have both.

mrava87 (Collaborator, Author) commented Jan 21, 2023

@cako I am struggling a bit to implement a bilinear interpolator that is called by all versions (numpy, numba, cuda-numba)...

More specifically, setting cuda-numba aside for a moment, I would like to have one function that performs the interpolation (_interpolate_h) and one that does the matvec/rmatvec (_matvec_rmatvec). Both will be NumPy-based, and then I can create equivalent jitted versions for numba. However, a jitted function can only call other jitted functions, so I cannot have something like this:

```python
from typing import Tuple

from numba import jit, prange
from numpy.typing import NDArray

# jit_message and numba_opts come from pylops' numba setup

def _interpolate_h(hs, ix, iz, ohx, dhx, nhx, ohz, dhz, nhz):
    """Find the closest filters, linearly interpolate between them, and interpolate the psf"""
    ...

if jit_message is None:
    _interpolate_h_numba = jit(**numba_opts)(_interpolate_h)

def _matvec_rmatvec(
    x: NDArray,
    y: NDArray,
    hs: NDArray,
    hshape: Tuple[int, int],
    xdims: Tuple[int, int],
    ohx: float,
    ohz: float,
    dhx: float,
    dhz: float,
    nhx: int,
    nhz: int,
    rmatvec: bool = False,
    numba: bool = False,
) -> NDArray:
    for ix in prange(xdims[0]):
        for iz in range(xdims[1]):

            # find filter
            if numba:
                h = _interpolate_h_numba(hs, ix, iz, ohx, dhx, nhx, ohz, dhz, nhz)
            else:
                h = _interpolate_h(hs, ix, iz, ohx, dhx, nhx, ohz, dhz, nhz)
            ...

if jit_message is None:
    _matvec_rmatvec_numba = jit(**numba_opts)(_matvec_rmatvec)
```

Any smart idea?

cako (Collaborator) commented Jan 24, 2023

How about this:

```python
from math import floor

import numba
import numpy as np

class InterpolateAt:
    # A JIT function cannot be a standard method (takes `self`,
    # which Numba can't handle) or a classmethod (takes `cls`,
    # which Numba can't handle). At most, it can be a
    # staticmethod. So, here are three options:
    #     1. Create a non-JIT and a JIT staticmethod, and bind
    #        the correct one at runtime depending on the engine.
    #     2. Create two classes, e.g., InterpolateAtNumpy and
    #        InterpolateAtNumba, and return the appropriate one
    #        based on the engine upon the call to self.__new__.
    #     3. Bind only the non-JIT functions as staticmethods
    #        when possible, and create all other JIT-ed functions
    #        when calling self.__init__.
    # See: https://santoshk.dev/posts/2022/__init__-vs-__new__-and-when-to-use-them/
    #
    # This class follows approach 3.

    def __init__(self, i, engine="numpy"):
        # Local, possibly JIT-ed interpolation.
        _interpolate_local = (
            numba.jit(nopython=True)(self._interpolate)
            if engine == "numba"
            else self._interpolate
        )

        # This is not necessary but replaces the original
        # staticmethod. The replacement is not a staticmethod,
        # but that probably doesn't matter.
        self._interpolate = _interpolate_local
        # Alternatively, you can apply staticmethod, but
        # you lose the Numba attributes.
        # self._interpolate = staticmethod(_interpolate_local)

        # Creates a function with `i` bound. This allows
        # compilation with Numba.
        def _matvec_rmatvec_local(x):
            return _interpolate_local(x, i)

        # Attaches the possibly JIT-ed `_matvec_rmatvec_local`
        # to the instance, so it can be accessed by external or
        # internal methods.
        self._matvec_rmatvec = (
            numba.jit(nopython=True)(_matvec_rmatvec_local)
            if engine == "numba"
            else _matvec_rmatvec_local
        )

    # This staticmethod decorator can be removed if not
    # required. Alternatively, _interpolate can be defined
    # outside the class if it needs to be used by other classes.
    @staticmethod
    def _interpolate(ary, i):
        ifloor = floor(i)
        w = i - ifloor
        if ifloor < 0 or ifloor + 1 > len(ary) - 1:
            return np.nan
        return (1 - w) * ary[ifloor] + w * ary[ifloor + 1]

    # Cannot be JIT-ed, but is a thin wrapper over a JIT-ed
    # function.
    def _matvec(self, x):
        return self._matvec_rmatvec(x)

print(InterpolateAt._interpolate)
# <function InterpolateAt._interpolate at 0x7ff8fc2b9360>
# Note: InterpolateAt._matvec_rmatvec does not exist yet;
# accessing it at class level raises an AttributeError.

x = np.arange(10)

IOpPy = InterpolateAt(1.5, engine="numpy")
assert (
    IOpPy._matvec_rmatvec(x) == IOpPy._matvec(x) == IOpPy._interpolate(x, 1.5)
)
print(IOpPy._interpolate(x, 1.5))
# 1.5
print(type(IOpPy._matvec_rmatvec))
# <class 'function'>
print(type(IOpPy._interpolate))
# <class 'function'>

IOpNB = InterpolateAt(1.5, engine="numba")
assert (
    IOpNB._matvec_rmatvec(x) == IOpNB._matvec(x) == IOpNB._interpolate(x, 1.5)
)
print(IOpNB._interpolate(x, 1.5))
# 1.5
print(type(IOpNB._matvec_rmatvec))
# <class 'numba.core.registry.CPUDispatcher'>
print(type(IOpNB._interpolate))
# <class 'numba.core.registry.CPUDispatcher'>
print(IOpNB._matvec_rmatvec.signatures)
# [(array(int64, 1d, C),)]
```

mrava87 (Collaborator, Author) commented Jan 24, 2023

Mmmh, I am not sure I understand how this fits into the bigger picture. Are you suggesting making a new class just for the interpolation part, and then feeding it to the main NonStationaryConvolve2D class?

I just worry that, to avoid repeating 20 lines of code between the CPU and GPU implementations, we will end up with 200 more that are very complicated to understand and more likely to break in the future if any change is introduced to numba...

Also, remembering the end goal (a shared interpolate function for the numpy, numba, and numba-cuda implementations), are you sure this can easily be integrated within the numba-cuda code? I have never tried to work with classes in numba-cuda, so it may be ok...

mrava87 (Collaborator, Author) commented Jan 24, 2023

Could you try taking the current nonstatconvolve2d file and show how this would be implemented?

cako (Collaborator) commented Jan 24, 2023

Yup, I can have a look :)

mrava87 added a commit to mrava87/pylops that referenced this issue Jan 27, 2023
First implementation of non-stationary filter estimation operators
(point 4 in PyLops#466).
cako pushed a commit that referenced this issue Feb 17, 2023
* feat: Added NonStationaryFilters1D and NonStationaryFilters2D
First implementation of non-stationary filter estimation operators
(point 4 in #466).

* minor: added tests for nonstatfilters operators

* minor: added checks that filter locations are within dims

* minor: reintroduced parallel in rmatvec