doc: added graphics of cpu-gpu scenarios
mrava87 committed Nov 24, 2024
1 parent 331b14c commit e99a54a
Showing 4 changed files with 169 additions and 147 deletions.
Binary file added docs/source/_static/cupy_diagram.png
Binary file added docs/source/_static/numpy_cupy_bd_diagram.png
Binary file added docs/source/_static/numpy_cupy_vs_diagram.png
316 changes: 169 additions & 147 deletions docs/source/gpu.rst
@@ -29,6 +29,172 @@ provide data vectors to the solvers, e.g., when using
For JAX, apart from following the same procedure described for CuPy, the PyLops operator must
be also wrapped into a :class:`pylops.JaxOperator`.
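
As a minimal sketch of this wrapping step (mirroring the :class:`pylops.MatrixMult`
operator used in the examples below):

.. code-block:: python

    import jax.numpy as jnp
    from pylops import JaxOperator, MatrixMult

    # wrap a standard PyLops operator so that it accepts jax arrays
    Gop = JaxOperator(MatrixMult(jnp.ones((5, 5), dtype=jnp.float32), dtype='float32'))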

See below for a comprehensive list of supported operators and additional functionalities for both the
``cupy`` and ``jax`` backends.


Examples
--------

Let's now briefly look at some use cases.

End-to-end GPU powered inverse problems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First we consider the most common scenario when both the model and data
vectors fit onto the GPU memory. We can therefore simply replace all our
``numpy`` arrays with ``cupy`` arrays and solve the inverse problem of
interest end-to-end on the GPU.

.. image:: _static/cupy_diagram.png
   :width: 600
   :align: center

Let's first write a code snippet using ``numpy`` arrays, which PyLops
will run on your CPU:

.. code-block:: python

    import numpy as np
    from pylops import MatrixMult

    ny, nx = 400, 400
    G = np.random.normal(0, 1, (ny, nx)).astype(np.float32)
    x = np.ones(nx, dtype=np.float32)
    # Create operator
    Gop = MatrixMult(G, dtype='float32')
    # Create data and invert
    y = Gop @ x
    xest = Gop / y

Now we write a code snippet using ``cupy`` arrays, which PyLops will run on
your GPU:

.. code-block:: python

    import cupy as cp
    import numpy as np
    from pylops import MatrixMult

    ny, nx = 400, 400
    G = cp.random.normal(0, 1, (ny, nx)).astype(np.float32)
    x = cp.ones(nx, dtype=np.float32)
    # Create operator
    Gop = MatrixMult(G, dtype='float32')
    # Create data and invert
    y = Gop @ x
    xest = Gop / y

The code is almost unchanged apart from the fact that we now use ``cupy`` arrays;
PyLops detects the array type and runs the computation on the GPU accordingly.
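
If the estimated model is needed back in host memory (e.g., for plotting or
further CPU-side processing), it can be transferred explicitly. A minimal
sketch, assuming the ``cupy`` snippet above has just been run:

.. code-block:: python

    # device-to-host copy of the GPU result
    xest_cpu = cp.asnumpy(xest)
    print(type(xest_cpu))  # <class 'numpy.ndarray'>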

Similarly, we write a code snippet using ``jax`` arrays, which PyLops will run on
your GPU/TPU:

.. code-block:: python

    import jax.numpy as jnp
    import numpy as np
    from pylops import JaxOperator, MatrixMult

    ny, nx = 400, 400
    G = jnp.array(np.random.normal(0, 1, (ny, nx)).astype(np.float32))
    x = jnp.ones(nx, dtype=np.float32)
    # Create operator
    Gop = JaxOperator(MatrixMult(G, dtype='float32'))
    # Create data and invert
    y = Gop @ x
    xest = Gop / y
    # Adjoint via automatic differentiation
    xadj = Gop.rmatvecad(x, y)

Again, the code is almost unchanged apart from the fact that we now use ``jax`` arrays.
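
As with CuPy, the result can be moved back to the host as a NumPy array when
needed; a minimal sketch, assuming the ``jax`` snippet above has just been run:

.. code-block:: python

    # device-to-host transfer of the jax result
    xest_cpu = np.asarray(xest)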


Mixed CPU-GPU powered inverse problems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let us now consider a more intricate scenario where we have access to
a GPU-powered operator; however, the model and/or data vectors are too large
to fit into the GPU memory (VRAM).

For the sake of clarity, we consider a problem where
the operator can be written as a :class:`pylops.BlockDiag` of
PyLops operators. Note how, by simply sandwiching any of the GPU-powered
operators between two :class:`pylops.ToCupy` operators, we are
able to tell PyLops to transfer to the GPU only the part of the model vector
required by a given operator and to transfer the output back to the CPU before
forming the combined output vector (i.e., the output vector of the
:class:`pylops.BlockDiag`).

.. image:: _static/numpy_cupy_bd_diagram.png
   :width: 1000
   :align: center

.. code-block:: python

    import numpy as np
    import cupy as cp
    from pylops import BlockDiag, MatrixMult, ToCupy

    dtype = np.float64  # assumed dtype for this example
    nops, n = 5, 4
    Ms = [np.diag((i + 1) * np.ones(n, dtype=dtype)) for i in range(nops)]
    Ms = [M.T @ M for M in Ms]
    # Create operator
    Mops = []
    for iop in range(nops):
        Mop = MatrixMult(cp.asarray(Ms[iop], dtype=dtype))
        Top = ToCupy(Mop.dims, dtype=dtype)
        Top1 = ToCupy(Mop.dimsd, dtype=dtype)
        Mop = Top1.H @ Mop @ Top
        Mops.append(Mop)
    Mops = BlockDiag(Mops, forceflat=True)
    # Create data and invert
    x = np.ones(n * nops, dtype=dtype)
    y = Mops @ x.ravel()
    xest = Mops / y

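In this configuration the solver only ever sees ``numpy`` arrays, since each
block transfers its own chunk to the GPU and back internally; a quick sanity
check (assuming the snippet above has just been run):

.. code-block:: python

    # both the data and the estimated model live in host (CPU) memory
    assert isinstance(y, np.ndarray)
    assert isinstance(xest, np.ndarray)
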
Finally, let us consider a problem where
the operator can be written as a :class:`pylops.VStack` of
PyLops operators and the model vector can be fully transferred to the GPU.
We can again use the :class:`pylops.ToCupy` operator; however, this
time we will only use it to move the output of each operator to the CPU.
Since we are now in a special scenario where the input of the overall
operator sits on the GPU and the output on the
CPU, we need to inform the :class:`pylops.VStack` operator about this.
This can be easily done using the additional ``inoutengine`` parameter. Let's
see this with an example.

.. image:: _static/numpy_cupy_vs_diagram.png
   :width: 1000
   :align: center

.. code-block:: python

    import numpy as np
    import cupy as cp
    from pylops import MatrixMult, ToCupy, VStack
    from pylops.optimization.basic import cgls as pylops_cgls  # assumed source of pylops_cgls

    dtype = np.float64  # assumed dtype for this example
    nops, n, m = 3, 4, 5
    Ms = [np.random.normal(0, 1, (n, m)) for _ in range(nops)]
    # Create operator
    Mops = []
    for iop in range(nops):
        Mop = MatrixMult(cp.asarray(Ms[iop]), dtype=dtype)
        Top1 = ToCupy(Mop.dimsd, dtype=dtype)
        Mop = Top1.H @ Mop
        Mops.append(Mop)
    Mops = VStack(Mops, inoutengine=("numpy", "cupy"))
    # Create data and invert
    x = cp.ones(m, dtype=dtype)
    y = Mops @ x.ravel()
    xest = pylops_cgls(Mops, y, x0=cp.zeros_like(x))[0]

These features are currently not available for ``jax`` arrays.


.. note::

   More examples for the CuPy and JAX backends can be found at `link1 <https://github.com/PyLops/pylops_notebooks/tree/master/developement-cupy>`_
   and `link2 <https://github.com/PyLops/pylops_notebooks/tree/master/developement/Basic_JAX.ipynb>`_.


Supported Operators
-------------------

In the following, we provide a list of methods in :class:`pylops.LinearOperator` with their current status (available on CPU,
GPU with CuPy, and GPU with JAX):
@@ -195,6 +361,7 @@ Smoothing and derivatives:
- |:white_check_mark:|
- |:white_check_mark:|


Signal processing:

.. list-table::
@@ -322,6 +489,7 @@ Signal processing:
- |:white_check_mark:|
- |:white_check_mark:|


Wave-Equation processing:

.. list-table::
@@ -369,6 +537,7 @@ Wave-Equation processing
- |:red_circle:|
- |:red_circle:|


Geophysical subsurface characterization:

.. list-table::
@@ -408,150 +577,3 @@ Geophysical subsurface characterization:
in point 1 for the :class:`pylops.signalprocessing.Convolve1D` operator employed
when ``explicit=False``.

