Refactor the virtualfile_in function to accept more 1-D arrays #2744
pygmt/helpers/utils.py
Outdated
@@ -15,127 +15,133 @@
from pygmt.exceptions import GMTInvalidInput


def _validate_data_input(
    data=None, x=None, y=None, z=None, required_z=False, required_data=True, kind=None
def validate_data_input(
I think it's more useful to pass the list of column names instead, i.e., replacing ncols=2 with names=["x", "y"].
So, for most modules, vectors=[x, y] and names=["x", "y"], or vectors=[x, y, z] and names=["x", "y", "z"].
For more complicated modules like plot or plot3d, the names can be names=["x", "y", "direction_arg1", "direction_arg2", "fill", "size", "symbol", "transparency"].
The column names will be very useful when the GMTInvalidInput exception is raised. For example, instead of "Column 5 can't be None.", we can say "Column 5 ('size') can't be None.". Instead of "data must have at least 8 columns.", we can say:
data must have at least 8 columns:
x y direction_arg1 direction_arg2 fill size symbol transparency
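A minimal sketch of how column names could feed into these messages. The helper names check_vectors and check_ncols are hypothetical, and GMTInvalidInput is a local stand-in for pygmt.exceptions.GMTInvalidInput so the snippet runs on its own; it is not the code from this PR.

```python
class GMTInvalidInput(Exception):
    """Stand-in for pygmt.exceptions.GMTInvalidInput so the sketch runs alone."""


def check_vectors(vectors, names):
    """Hypothetical helper: check a list of 1-D arrays against column names."""
    if len(vectors) < len(names):
        raise GMTInvalidInput(
            f"Requires {len(names)} 1-D arrays but got {len(vectors)}."
        )
    # Column indices are 0-based, matching the "Column 5 ('size')" example.
    for i, (vector, name) in enumerate(zip(vectors, names)):
        if vector is None:
            raise GMTInvalidInput(f"Column {i} ('{name}') can't be None.")


def check_ncols(ncols, names):
    """Hypothetical helper: check the column count of a matrix-like input."""
    if ncols < len(names):
        raise GMTInvalidInput(
            f"data must have at least {len(names)} columns:\n" + " ".join(names)
        )
```

With the eight plot-style names above, passing None as the sixth vector would report the column by both index and name.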
Done in f37413b
if len(vectors) < len(names):
    raise GMTInvalidInput(
        f"Requires {len(names)} 1-D arrays but got {len(vectors)}."
    )
Missing unit test for this if-condition.
if len(data.shape) == 1 and data.shape[0] < len(names):
    raise GMTInvalidInput(msg)
Missing unit test for this if-condition.
vectors, names = [x, y], "xy"
if z is not None:
    vectors.append(z)
Need to append 'z' to names here? Also, need a unit test for this if-condition.
Actually no. The problem is that project only requires two columns, but three or more columns are also accepted. Currently, the names variable is used for two purposes: (1) the names of the passed columns; (2) the number of required columns. So, if we append z to names here, calling pygmt.project(data=data) will fail if data has only two columns. I think we still need to maintain a separate variable for the number of required columns.
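One possible way to keep the two roles apart, sketched under assumptions: the function check_required_columns and its required_cols parameter are hypothetical names, not part of PyGMT, and GMTInvalidInput is a local stand-in so the snippet is self-contained.

```python
class GMTInvalidInput(Exception):
    """Stand-in for pygmt.exceptions.GMTInvalidInput so the sketch runs alone."""


def check_required_columns(ncols, names, required_cols=None):
    """
    Hypothetical check: names labels every column that may be passed, while
    required_cols is the minimum number of columns that must be present.
    """
    if required_cols is None:
        # Default to the old behaviour: every named column is required.
        required_cols = len(names)
    if ncols < required_cols:
        raise GMTInvalidInput(
            f"data must have at least {required_cols} columns:\n"
            + " ".join(names[:required_cols])
        )
```

Under this sketch, a project-like wrapper could pass names=["x", "y", "z"] with required_cols=2, so a two-column table is still accepted while the third column keeps its label.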
pygmt/clib/session.py
Outdated
kind = data_kind(data, required=required_data)
validate_data_input(
    data=data,
    vectors=vectors,
    names=names,
    required_data=required_data,
    kind=kind,
)
The validation checks have been moved from within data_kind to virtualfile_from_data here. But in plot.py, we actually use data_kind on its own here:
Line 217 in 3076ddc
kind = data_kind(data, x, y)
Are we ok with raising GMTInvalidInput much later here in virtualfile_from_data (after all the keyword argument parsing), rather than early on in data_kind?
Closing this PR since it will be superseded by #3369.
Description of proposed changes
Here are the current definitions of the virtualfile_from_data method and the data_kind function:
pygmt/pygmt/clib/session.py
Lines 1473 to 1483 in c9d6147
pygmt/pygmt/helpers/utils.py
Line 110 in c9d6147
When I started issue #2731, I realized the current function definitions have some limitations:
- binstats usually requires 3 columns (x/y/z), but only requires 2 columns (x/y) if -Cn is used, and requires 4 columns (x/y/z/w) if -W is used. I don't think we want to add w=None and required_w=False to these functions. Also, we don't check if the input table has the required number of columns.
- The data_kind function does three things: (1) determines the kind of the input data; (2) checks if the data/x/y/z combinations are valid; and (3) checks if the matrix-type data has 3 columns. The data_kind function is called inside virtualfile_from_data, but sometimes we need to know the data kind when wrapping GMT modules, for example, in Figure.plot and Figure.plot3d. It means the data_kind function is called twice, which is not necessary.
Solutions:
- Change the parameters of the virtualfile_from_data function to something like data=None, vectors=None, names=["x", "y"]. vectors is a list of vectors (e.g., vectors=[x, y]) and names is a list of column names. The wrappers are responsible for preparing the list of 1-D arrays (vectors) and the list of column names (names).
- Have data_kind focus only on determining the data kind, and add a separate function to check if the input data/vectors are valid.
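The proposed split can be sketched roughly as follows. This is a simplified illustration, not the PyGMT implementation: data_kind here only distinguishes a few kinds, the wrapper function is hypothetical, the error wording is made up for the example, and GMTInvalidInput is a local stand-in so the snippet runs on its own.

```python
class GMTInvalidInput(Exception):
    """Stand-in for pygmt.exceptions.GMTInvalidInput so the sketch runs alone."""


def data_kind(data=None, required=True):
    """Simplified: only decide the kind; no checks on x/y/z combinations."""
    if isinstance(data, str):
        return "file"
    if data is None:
        return "vectors" if required else "arg"
    return "matrix"


def validate_data_input(data=None, vectors=None, names=("x", "y"), kind=None):
    """Separate validity checks, driven by a kind that is already known."""
    vectors = list(vectors or [])
    if kind == "vectors":
        if len(vectors) < len(names):
            raise GMTInvalidInput(
                f"Requires {len(names)} 1-D arrays but got {len(vectors)}."
            )
        for i, name in enumerate(names):
            if vectors[i] is None:
                raise GMTInvalidInput(f"Column {i} ('{name}') can't be None.")
    elif any(vector is not None for vector in vectors):
        # Both data and 1-D arrays were given (message wording is illustrative).
        raise GMTInvalidInput("Too much data. Use either data or 1-D arrays.")


def wrapper(data=None, x=None, y=None):
    """Hypothetical module wrapper: classify once, then validate once."""
    kind = data_kind(data)
    validate_data_input(data=data, vectors=[x, y], names=["x", "y"], kind=kind)
    return kind
```

In this arrangement a wrapper such as Figure.plot could reuse the kind it already computed instead of triggering a second data_kind call inside virtualfile_from_data.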