Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pygmt.binstats: Add alias "tiling" for "T" #2409

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from
Draft
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions pygmt/src/binstats.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
I="spacing",
N="normalize",
R="region",
T="tiling",
S="search_radius",
yvonnefroehlich marked this conversation as resolved.
Show resolved Hide resolved
V="verbose",
W="weight",
Expand Down Expand Up @@ -87,6 +88,21 @@ def binstats(data, **kwargs):
Sets the *search_radius* that determines which data points are
considered close to a node. Append the distance unit.
Not compatible with ``tiling``.
tiling : str
**h**\|\ **r**.
Instead of circular, possibly overlapping areas, select
non-overlapping tiling. Choose between **r**\ ectangular and
**h**\ exagonal binning.
For **r**, set bin sizes via ``spacing`` and we write
the computed statistics to the grid file named in ``outgrid``.
For **h**, we write a table with the centers of the hexagons and
the computed statistics to standard output (or to the file named
in ``outgrid``). Here, the ``spacing`` setting is expected to
Comment on lines +98 to +100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using tiling="h" outputs the statistics to a table, which was why this alias was not added in the orginal implementation, see #1652 (comment).

The way we've handle functions that output to either a grid or table was discussed in #1536, which is to use a "Two methods in a single Python class" like pygmt.triangulate and pygmt.grdhisteq. Unfortunately, if we follow this style, that would require breaking changes to pygmt.binstats 🙂

Copy link
Member

@weiji14 weiji14 Mar 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so @maxrjones mentioned at #1652 (review) that the flag to output statistics to a table is actually more similar to #896 (which is for fig.* plotting functions)? But gmtbinstats is not a plotting function, so this is kinda awkward 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @weiji14 for this clarification and background information. As tiling is already used in the docstring for search_radius I thought that adding the doctstring for tiling was simply overlooked...

search_radius (float or str) – Set the search_radius that determines which data points are considered close to a node. Append the distance unit. Not compatible with tiling.

I am unsure how to continue with this PR. Should we leave it open or should we have a separate issue for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure how to continue with this PR. Should we leave it open or should we have a separate issue for this?

Let's leave the PR open for now, the discussion on whether to turn pygmt.binstats into two separate methods (e.g. pygmt.binstats.to_grid or pygmt.binstats.to_table) can happen here. Anything more general can also be discussed on the open thread at #1536.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand this means that it's also not possible to plot the hexagons outlines (polygons )that are used to calculate the stats but to get the center coordinates and stats values for the corresponding hexagon?

Copy link
Member Author

@yvonnefroehlich yvonnefroehlich Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelgrund what is your concrete use-case here? Please note, that hexagonal binning is only available for Cartesian data and that only the y-increment can be given and GMT calculates the x-increment given the geometry (https://docs.generic-mapping-tools.org/latest/gmtbinstats.html#t). Based on the upstream GMT documentation, I feel it is not possible to get the outline of the hexagons directly.

Are you looking for figures like this:

outlines outlines + color-coding
binstats_hexagons_outline binstats_hexagons_outline_cmap

In some cases, one can try to determine the (regular) hexagon based on the center of the hexagon (first two columns of the output file via outgrid) and the spacing between the centers in y-direction (value passed to spacing). The outline can then be plotted (be careful with width and height of the figure) using the centers of the hexagon and style="h" + str(determined_diameter_of_hexagon) + "c". Additionally, the stats output can be used to color-code the fill of the hexagons (please un-comment the corresponding parts in the code below). However, I feel there are other libraries / packages which are more suitable to generate such a visualization.

Code to reproduce the figures shown above:
Test data (Please place it in your work dictionary): test_data_in.txt

import numpy as np
import pygmt as gmt

# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# >>> Adjust for your needs <<<
#
# Give y increment
# Passed to spacing
# In case of hexagon binning only the y increment can be given;
# the x increment is calculated based on the given geometry
# Please see the upsteam GMT documentation
# https://docs.generic-mapping-tools.org/dev/gmtbinstats.html#t
# Last access: 2023/06/01
bin_spacing_y = 1.3

# Give study region
x_min = -1
x_max = 7
y_min = -1
y_max = 5
study_region = [x_min, x_max, y_min, y_max]

# Give size of figure
fig_width = 7

# Give paths and file names
data_in = "test_data_in.txt"
stats_out = "stats_out.txt"
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


#-----------------------------------------------------------------------------
# Be carefulle if you change something here
delta_x = np.abs(x_max - x_min)
delta_y = np.abs(y_max - y_min)

scale_x = fig_width / delta_x

fig_hight = delta_y * scale_x

# Based on the geometry of a regular hexagon
diag_hexagon_temp = ( (bin_spacing_y/2) / np.cos(2*np.pi/360 * 30) ) * 2
diag_hexagon = diag_hexagon_temp * scale_x
bin_spacing_x = diag_hexagon * np.sqrt(3)


#-----------------------------------------------------------------------------
# Bin tabular data based on hexagons
# Calculate stats within hexagons here counts
# Write output to file: center of hexagons and stats
gmt.binstats(
    data=data_in,
    region=study_region,
    spacing=str(bin_spacing_y) + "+e",
    # +e to slightly adjust the max x or y to fit exactly the
    # given increment if needed [Default is to slightly adjust the increment
    # to fit the given domain].
    T="h",  # tiling using hexagons - only for Cartesian data
    statistic="n",  # n counts, a average, m median, z sum
    outgrid=stats_out,
)


#-----------------------------------------------------------------------------
# Plot statistic output
fig = gmt.Figure()

fig.basemap(
    region=study_region,
    projection="X" + str(fig_width) + "c/" + str(fig_hight) + "c",
    frame="a1g1f0.5",
)

# Set up colormap for counts per hexagon
trans_stats = 20
gmt.makecpt(
    cmap="grayC",
    series=[1, 6, 1],
    transparency=trans_stats,
)

# Plot hexagons color-coded by counts
fig.plot(
    data=stats_out,
    style="h" + str(diag_hexagon) + "c",
    pen="0.3p,darkred",
    # cmap=True,    # un-comment for color-coding
    # transparency=trans_stats,  # un-comment for color-coding
)

# Plot centers of hexagons
fig.plot(
    data=stats_out,
    style="p0.15",
    fill="darkred",
)

# Add labels with counts per hexagon
fig.text(
    textfiles=stats_out,
    font="5p",
    offset="0c/-0.2c",
    fill="white@30"
)

# Plot single records
fig.plot(
    data=data_in,
    style="p0.1",
    fill="darkorange",
)

# fig.colorbar(frame="xa1+lcounts within hexagon")  # un-comment for color-coding

fig.show()
# fig.savefig("binstats_hexagons_outline.png")

set the y-increment only and we compute the x-increment given
the geometry. Because the horizontal spacing between hexagon
centers in x and y have a ratio of :math:`sqrt(3)`, we will
automatically adjust *xmax* in ``region`` to fit a whole number
yvonnefroehlich marked this conversation as resolved.
Show resolved Hide resolved
of hexagons. **Note**: Hexagonal tiling requires Cartesian data.
weight : str
Input data have an extra column containing observation point weight.
If weights are given then weighted statistical quantities will be
Expand Down