Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap grdhisteq #1433

Merged
merged 61 commits into from
Mar 9, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
61 commits
Select commit Hold shift + click to select a range
b48cad9
Add grdhisteq processing function
maxrjones Jul 29, 2021
1cf0a95
Merge branch 'main' into grdhisteq
maxrjones Aug 10, 2021
1dbfff5
Merge branch 'main' into grdhisteq
maxrjones Sep 19, 2021
e248dcd
Apply suggestions from code review
maxrjones Sep 19, 2021
e36c93a
Support table or grid output
maxrjones Sep 19, 2021
67f4e41
Only accept int for divisions
maxrjones Sep 19, 2021
556dc47
Add more code supporting table or grid output
maxrjones Sep 19, 2021
40fbbbe
Update test
maxrjones Sep 19, 2021
e04ef8a
Update docstring
maxrjones Sep 19, 2021
6602098
Improve handling of tabular output
maxrjones Sep 20, 2021
94fd588
Add more tests
maxrjones Sep 20, 2021
cc6fde1
Fix function names
maxrjones Sep 20, 2021
9cd2cbc
Remove -h common option
maxrjones Sep 20, 2021
29f14e7
Merge branch 'main' into grdhisteq
maxrjones Sep 20, 2021
68e23dd
Support outgrid=None
maxrjones Sep 20, 2021
7512225
Update implementation to require outfile or outgrid
maxrjones Sep 20, 2021
b8635ee
Merge branch 'main' into grdhisteq
maxrjones Oct 1, 2021
a021d30
Try out class structure for grid versus table output
maxrjones Oct 1, 2021
d908c99
Merge branch 'main' into grdhisteq
maxrjones Oct 4, 2021
c528c66
Merge branch 'main' into grdhisteq
maxrjones Dec 27, 2021
2078120
Remove workaround for GMT 6.2 bug
maxrjones Dec 27, 2021
a08c724
Apply suggestions from code review
maxrjones Feb 4, 2022
222cf82
Merge branch 'main' into grdhisteq
maxrjones Feb 4, 2022
2b69196
Merge branch 'main' into grdhisteq
maxrjones Feb 25, 2022
e3d0678
Separate out parsing of D and G options
maxrjones Feb 25, 2022
6f3e23f
Update tests to use static_earth_relief
maxrjones Feb 25, 2022
b29ef79
Fix docstring
maxrjones Feb 25, 2022
d985b98
Do not use aliases in public grdhisteq functions
maxrjones Feb 27, 2022
a41423a
Format
maxrjones Feb 27, 2022
b84c745
Add default column names
maxrjones Feb 28, 2022
0338837
Format
maxrjones Feb 28, 2022
ed5a9ff
Add inline examples
maxrjones Feb 28, 2022
02ed8ce
Merge branch 'main' into grdhisteq
maxrjones Feb 28, 2022
0d1e6fb
Format
maxrjones Feb 28, 2022
c6d5368
Format
maxrjones Feb 28, 2022
eec3dc7
Merge branch 'main' into grdhisteq
maxrjones Mar 2, 2022
7031d93
Fix tests
maxrjones Mar 2, 2022
d479a29
Apply suggestions from code review
maxrjones Mar 3, 2022
325f72c
Remove two extra blank lines
maxrjones Mar 3, 2022
d7c22e0
Fix wording
maxrjones Mar 3, 2022
7cccc33
Apply suggestions from code review
maxrjones Mar 3, 2022
d6d8a69
Update dtypes
maxrjones Mar 3, 2022
4077f14
Merge branch 'main' into grdhisteq
maxrjones Mar 3, 2022
76ba9f0
Add note about weighted equalization
maxrjones Mar 3, 2022
32757d8
Try to improve wording in example
maxrjones Mar 3, 2022
938b729
Apply suggestions from code review
maxrjones Mar 3, 2022
71ed9bb
Set index for DataFrame output
maxrjones Mar 3, 2022
492980f
Update docstring example
maxrjones Mar 3, 2022
09f1385
Add test for invalid input
maxrjones Mar 3, 2022
77937cf
Add test for numpy output
maxrjones Mar 3, 2022
c5afdcc
Apply suggestions from code review
maxrjones Mar 4, 2022
245c560
Increase code coverage
maxrjones Mar 4, 2022
08f3d47
Merge branch 'main' into grdhisteq
maxrjones Mar 4, 2022
c5e430f
Remove redundant docs
maxrjones Mar 8, 2022
5017440
Do not pass tmpfile to _grdhisteq
maxrjones Mar 8, 2022
8b5fc49
Make output_type a required argument
maxrjones Mar 8, 2022
ace9916
Merge branch 'main' into grdhisteq
maxrjones Mar 8, 2022
75f61e3
Use pytest-doctestplus for skipping doctests
maxrjones Mar 8, 2022
ad23923
Apply suggestions from code review
maxrjones Mar 8, 2022
25c1a28
Only allow header argument for file output
maxrjones Mar 8, 2022
8df3492
Merge branch 'main' into grdhisteq
maxrjones Mar 8, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/api/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,9 @@ Operations on raster data
grdfill
grdfilter
grdgradient
grdhisteq
grdhisteq.equalize_grid
grdhisteq.compute_bins
weiji14 marked this conversation as resolved.
Show resolved Hide resolved
grdlandmask
grdproject
grdsample
Expand Down
1 change: 1 addition & 0 deletions pygmt/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@
grdfill,
grdfilter,
grdgradient,
grdhisteq,
grdinfo,
grdlandmask,
grdproject,
Expand Down
1 change: 1 addition & 0 deletions pygmt/src/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
from pygmt.src.grdfill import grdfill
from pygmt.src.grdfilter import grdfilter
from pygmt.src.grdgradient import grdgradient
from pygmt.src.grdhisteq import grdhisteq
from pygmt.src.grdimage import grdimage
from pygmt.src.grdinfo import grdinfo
from pygmt.src.grdlandmask import grdlandmask
Expand Down
349 changes: 349 additions & 0 deletions pygmt/src/grdhisteq.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,349 @@
"""
grdhisteq - Perform histogram equalization for a grid.
"""
import warnings

import numpy as np
import pandas as pd
maxrjones marked this conversation as resolved.
Show resolved Hide resolved
from pygmt.clib import Session
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import (
GMTTempFile,
build_arg_string,
fmt_docstring,
kwargs_to_strings,
use_alias,
)
from pygmt.io import load_dataarray

__doctest_skip__ = ["grdhisteq.*"]


class grdhisteq: # pylint: disable=invalid-name
r"""
Perform histogram equalization for a grid.
maxrjones marked this conversation as resolved.
Show resolved Hide resolved

Two common use cases of :meth:`pygmt.grdhisteq` are to find data values
that divide a grid into patches of equal area
(:meth:`pygmt.grdhisteq.compute_bins`) or to write a grid with
statistics based on some kind of cumulative distribution function
(:meth:`pygmt.grdhisteq.equalize_grid`).

Histogram equalization provides a way to highlight data that has most
values clustered in a small portion of the dynamic range, such as a
grid of flat topography with a mountain in the middle. Ordinary gray
shading of this grid (using :meth:`pygmt.Figure.grdimage` or
:meth:`pygmt.Figure.grdview`) with a linear mapping from topography to
graytone will result in most of the image being very dark gray, with the
mountain being almost white. :meth:`pygmt.grdhisteq.compute_bins` can
provide a list of data values that divide the data range into divisions
which have an equal area in the image [Default is 16 if ``divisions`` is
not set]. The :class:`pandas.DataFrame` or ASCII file output can be used to
make a colormap with :meth:`pygmt.makecpt` and an image with
:meth:`pygmt.Figure.grdimage` that has all levels of gray occuring
equally.

:meth:`pygmt.grdhisteq.equalize_grid` provides a way to write a grid with
statistics based on a cumulative distribution function. In this
application, the ``outgrid`` has relative highs and lows in the same
(x,y) locations as the ``grid``, but the values are changed to reflect
their place in the cumulative distribution.
"""

@staticmethod
@fmt_docstring
@use_alias(
C="divisions",
D="outfile",
G="outgrid",
R="region",
N="gaussian",
Q="quadratic",
V="verbose",
seisman marked this conversation as resolved.
Show resolved Hide resolved
h="header",
)
@kwargs_to_strings(R="sequence")
def _grdhisteq(grid, output_type, **kwargs):
r"""
Perform histogram equalization for a grid.

Must provide ``outfile`` or ``outgrid``.

Full option list at :gmt-docs:`grdhisteq.html`

{aliases}

Parameters
----------
grid : str or xarray.DataArray
The file name of the input grid or the grid loaded as a DataArray.
outgrid : str or bool or None
The name of the output netCDF file with extension .nc to store the
grid in.
outfile : str or bool or None
The name of the output ASCII file to store the results of the
histogram equalization in.
output_type: str
Determines the output type. Use "file", "xarray", "pandas", or
"numpy".
divisions : int
Set the number of divisions of the data range [Default is 16].

{R}
{V}
maxrjones marked this conversation as resolved.
Show resolved Hide resolved
{h}

Returns
-------
ret: pandas.DataFrame or xarray.DataArray or None
Return type depends on whether the ``outgrid`` parameter is set:

- xarray.DataArray if ``output_type`` is "xarray""
- numpy.ndarray if ``output_type`` is "numpy"
- pandas.DataFrame if ``output_type`` is "pandas"
- None if ``output_type`` is "file" (output is stored in
``outgrid`` or ``outfile``)

See Also
-------
:meth:`pygmt.grd2cpt`
"""

with Session() as lib:
file_context = lib.virtualfile_from_data(check_kind="raster", data=grid)
with file_context as infile:
arg_str = " ".join([infile, build_arg_string(kwargs)])
lib.call_module("grdhisteq", arg_str)

if output_type == "file":
return None
if output_type == "xarray":
return load_dataarray(kwargs["G"])

result = pd.read_csv(
filepath_or_buffer=kwargs["D"],
sep="\t",
header=None,
names=["start", "stop", "bin_id"],
dtype={
"start": np.float32,
"stop": np.float32,
"bin_id": np.uint32,
},
)
if output_type == "numpy":
return result.to_numpy()

return result.set_index("bin_id")

@staticmethod
@fmt_docstring
def equalize_grid(
grid,
*,
outgrid=True,
divisions=None,
region=None,
gaussian=None,
quadratic=None,
verbose=None,
):
r"""
Perform histogram equalization for a grid.

maxrjones marked this conversation as resolved.
Show resolved Hide resolved
:meth:`pygmt.grdhisteq.equalize_grid` provides a way to write a grid
with statistics based on a cumulative distribution function. The
``outgrid`` has relative highs and lows in the same (x,y) locations as
the ``grid``, but the values are changed to reflect their place in the
cumulative distribution.

Full option list at :gmt-docs:`grdhisteq.html`

Parameters
----------
grid : str or xarray.DataArray
The file name of the input grid or the grid loaded as a DataArray.
outgrid : str or bool or None
The name of the output netCDF file with extension .nc to store the
grid in.
divisions : int
Set the number of divisions of the data range.
gaussian : bool or int or float
*norm*.
Produce an output grid with standard normal scores using
``gaussian=True`` or force the scores to fall in the ±\ *norm*
range.
quadratic: bool
Perform quadratic equalization [Default is linear].
{R}
{V}

Returns
-------
ret: xarray.DataArray or None
Return type depends on the ``outgrid`` parameter:

- xarray.DataArray if ``outgrid`` is True or None
- None if ``outgrid`` is a str (grid output is stored in
``outgrid``)

Example
-------
>>> import pygmt
>>> # Load a grid of @earth_relief_30m data, with an x-range of 10 to
>>> # 30, and a y-range of 15 to 25
>>> grid = pygmt.datasets.load_earth_relief(
... resolution="30m", region=[10, 30, 15, 25]
... )
>>> # Create a new grid with a Gaussian data distribution
>>> grid = pygmt.grdhisteq.equalize_grid(grid=grid, gaussian=True)

See Also
-------
:meth:`pygmt.grd2cpt`

Notes
-----
This method does a weighted histogram equalization for geographic
grids to account for node area varying with latitude.
"""
# Return an xarray.DataArray if ``outgrid`` is not set
with GMTTempFile(suffix=".nc") as tmpfile:
if isinstance(outgrid, str):
output_type = "file"
else:
output_type = "xarray"
outgrid = tmpfile.name
return grdhisteq._grdhisteq(
grid=grid,
output_type=output_type,
outgrid=outgrid,
divisions=divisions,
region=region,
gaussian=gaussian,
quadratic=quadratic,
verbose=verbose,
)

@staticmethod
@fmt_docstring
def compute_bins(
grid,
*,
output_type="pandas",
outfile=None,
divisions=None,
quadratic=None,
verbose=None,
region=None,
maxrjones marked this conversation as resolved.
Show resolved Hide resolved
header=None,
):
r"""
Perform histogram equalization for a grid.

Histogram equalization provides a way to highlight data that has most
values clustered in a small portion of the dynamic range, such as a
grid of flat topography with a mountain in the middle. Ordinary gray
shading of this grid (using :meth:`pygmt.Figure.grdimage` or
:meth:`pygmt.Figure.grdview`) with a linear mapping from topography to
graytone will result in most of the image being very dark gray, with
the mountain being almost white. :meth:`pygmt.grdhisteq.compute_bins`
can provide a list of data values that divide the data range into
divisions which have an equal area in the image [Default is 16 if
``divisions`` is not set]. The :class:`pandas.DataFrame` or ASCII file
output can be used to make a colormap with :meth:`pygmt.makecpt` and an
image with :meth:`pygmt.Figure.grdimage` that has all levels of gray
occuring equally.

Full option list at :gmt-docs:`grdhisteq.html`

maxrjones marked this conversation as resolved.
Show resolved Hide resolved
Parameters
----------
grid : str or xarray.DataArray
The file name of the input grid or the grid loaded as a DataArray.
outfile : str or bool or None
The name of the output ASCII file to store the results of the
histogram equalization in.
output_type : str
Determine the format the xyz data will be returned in [Default is
``pandas``]:

- ``numpy`` - :class:`numpy.ndarray`
- ``pandas``- :class:`pandas.DataFrame`
- ``file`` - ASCII file (requires ``outfile``)
divisions : int
Set the number of divisions of the data range.
quadratic : bool
Perform quadratic equalization [Default is linear].
{R}
{V}
maxrjones marked this conversation as resolved.
Show resolved Hide resolved
{h}

Returns
-------
ret: pandas.DataFrame or None
Return type depends on the ``outfile`` parameter:

- pandas.DataFrame if ``outfile`` is True or None
- None if ``outfile`` is a str (file output is stored in
``outfile``)

Example
-------
>>> import pygmt
>>> # Load a grid of @earth_relief_30m data, with an x-range of 10 to
>>> # 30, and a y-range of 15 to 25
>>> grid = pygmt.datasets.load_earth_relief(
... resolution="30m", region=[10, 30, 15, 25]
... )
>>> # Find elevation intervals that splits the data range into 5
>>> # divisions, each of which have an equal area in the original grid.
>>> bins = pygmt.grdhisteq.compute_bins(grid=grid, divisions=5)
>>> print(bins)
start stop
bin_id
0 179.0 397.5
1 397.5 475.5
2 475.5 573.5
3 573.5 710.5
4 710.5 2103.0

maxrjones marked this conversation as resolved.
Show resolved Hide resolved
See Also
-------
:meth:`pygmt.grd2cpt`

Notes
-----
This method does a weighted histogram equalization for geographic
grids to account for node area varying with latitude.
"""
# Return a pandas.DataFrame if ``outfile`` is not set
if output_type not in ["numpy", "pandas", "file"]:
raise GMTInvalidInput(
maxrjones marked this conversation as resolved.
Show resolved Hide resolved
"Must specify 'output_type' either as 'numpy', 'pandas' or 'file'."
)

if header is not None and output_type != "file":
raise GMTInvalidInput("'header' is only allowed with output_type='file'.")

if isinstance(outfile, str) and output_type != "file":
msg = (
f"Changing 'output_type' from '{output_type}' to 'file' "
"since 'outfile' parameter is set. Please use output_type='file' "
"to silence this warning."
)
warnings.warn(message=msg, category=RuntimeWarning, stacklevel=2)
output_type = "file"
Comment on lines +329 to +336
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should raise an exception here rather than give a warning? Sometimes people just ignore warnings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bit of a discussion about this following this comment #1284 (review). See #1284 (comment) for a specific discussion about raising a warning versus exception. I think we should follow that lead, although we could move this to a helper function to make it more clear that the logic is used for multiple modules.

with GMTTempFile(suffix=".txt") as tmpfile:
if output_type != "file":
outfile = tmpfile.name
return grdhisteq._grdhisteq(
grid,
output_type=output_type,
outfile=outfile,
divisions=divisions,
quadratic=quadratic,
verbose=verbose,
region=region,
maxrjones marked this conversation as resolved.
Show resolved Hide resolved
header=header,
)
Loading