Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The "incols" (-i) parameter doesn't works for vector input correctly #1313

Closed
michaelgrund opened this issue Jun 1, 2021 · 24 comments · Fixed by #1303 or #1692
Closed

The "incols" (-i) parameter doesn't works for vector input correctly #1313

michaelgrund opened this issue Jun 1, 2021 · 24 comments · Fixed by #1303 or #1692
Assignees
Labels
bug Something isn't working upstream Bug or missing feature of upstream core GMT
Milestone

Comments

@michaelgrund
Copy link
Member

michaelgrund commented Jun 1, 2021

Description of the problem

Using the incols parameter does not allow to swap columns (seems like the parameter is ignored at all so far). I believe it's an upstream GMT bug. This was discovered in #1303 (comment).

Full code that generated the error

import pygmt
import numpy as np

# generate sample data
region = [10, 70, -5, 10]
x, y = np.meshgrid(
np.linspace(region[0], region[1]), np.linspace(region[2], region[3])
    )
x = x.flatten()
y = y.flatten()
z = (x - 0.5 * (region[0] + region[1])) ** 2 + 4 * y ** 2
z = np.exp(-z / 10 ** 2 * np.log(2))

# generate dataframe
data = np.array([x, y, z]).T

# correct figure
fig1 = pygmt.Figure()
fig1.contour(data=data,
            projection="X10c",
            frame="a",
            pen=True)

fig1.show()

# generate second dataframe, switch x and y from here onwards to simulate 
# different column order
data = np.array([y, x, z]).T

# wrong figure
fig2 = pygmt.Figure()
fig2.contour(data=data,
            incols=[1, 0, 2], # use incols to assign column order
            projection="X10c",
            frame="a",
            pen=True)

fig2.show()

Full output

fig1: expected fig2: actual (wrong)
correct incorrect

System information

Please paste the output of python -c "import pygmt; pygmt.show_versions()":

PyGMT information:
  version: v0.2.2.dev80+g132757a
System information:
  python: 3.6.12 | packaged by conda-forge | (default, Dec  9 2020, 00:36:02)  [GCC 9.3.0]
  executable: /home/mgrund/anaconda3/envs/pygmt/bin/python
  machine: Linux-4.4.0-19041-Microsoft-x86_64-with-debian-buster-sid
Dependency information:
  numpy: 1.19.4
  pandas: 1.1.5
  xarray: 0.16.2
  netCDF4: 1.5.6
  packaging: 20.8
  ghostscript: 9.53.3
  gmt: 6.2.0rc1
GMT library information:
  binary dir: /home/mgrund/anaconda3/envs/pygmt/bin
  cores: 8
  grid layout: rows
  library path: /home/mgrund/anaconda3/envs/pygmt/lib/libgmt.so
  padding: 2
  plugin dir: /home/mgrund/anaconda3/envs/pygmt/lib/gmt/plugins
  share dir: /home/mgrund/anaconda3/envs/pygmt/share/gmt
  version: 6.2.0rc1

@PaulWessel @meghanrjones could you please check if you can debug it?

@michaelgrund michaelgrund added bug Something isn't working upstream Bug or missing feature of upstream core GMT labels Jun 1, 2021
@michaelgrund michaelgrund added this to the 0.4.0 milestone Jun 1, 2021
@PaulWessel
Copy link
Member

Do not think this is upstream bug. Similar test with switching x,y then passing -i works as expected.

@seisman
Copy link
Member

seisman commented Jun 1, 2021

Yes, I think this is an upstream bug, but it only affects wrappers, that's why the GMT tests pass.

Here is a modified example based on @michaelgrund's example:

import pygmt
import numpy as np

# generate sample data
region = [10, 70, -5, 10]
x, y = np.meshgrid(np.linspace(region[0], region[1]), np.linspace(region[2], region[3]))
x = x.flatten()
y = y.flatten()
z = (x - 0.5 * (region[0] + region[1])) ** 2 + 4 * y ** 2
z = np.exp(-z / 10 ** 2 * np.log(2))

# generate dataframe
data = np.array([x, y, z]).T

# correct figure
fig1 = pygmt.Figure()
fig1.contour(data=data, projection="X10c", frame="a", pen=True)
fig1.savefig("array-xyz.png")

# correct
np.savetxt("xyz.dat", data)
fig3 = pygmt.Figure()
fig3.contour(data="xyz.dat", projection="X10c", frame="a", pen=True)
fig3.savefig("file-xyz.png")


# generate second dataframe, switch x and y from here onwards to simulate
# different column order
data = np.array([y, x, z]).T

# wrong figure
fig2 = pygmt.Figure()
fig2.contour(data=data, projection="X10c", frame="a", pen=True, incols=[1, 0, 2])
fig2.savefig("array-yxz.png")

# correct
np.savetxt("yxz.dat", data)
fig4 = pygmt.Figure()
fig4.contour(data="yxz.dat", projection="X10c", frame="a", pen=True, incols=[1, 0, 2])
fig4.savefig("file-yxz.png")
array-xyz file-xyz array-yxz file-yxz
array-xyz file-xyz array-yxz file-yxz

As you can see, passing "YXZ" data and using incols=[1, 0, 2] works well for file input, but not for an array input.

@PaulWessel
Copy link
Member

Debugging the wrong figure case now. We call gmt info to determine the region. I see that the list of options passed to contour includes the memoryfile, the projection, the -Ba, the -W and then there is a -incols0 string which of course is not parsed properly. Presumably it is pygmt that generates this string of options?

print (*options)->next->next->next->next->option
(char) $4 = 'i'
print (*options)->next->next->next->next->arg
(char *) $3 = 0x00007fc949c7df00 "ncols0"

@maxrjones
Copy link
Member

I think to use incols Paul would need to build from this branch (#1303) and that the master branch uses columns as the alias name. Rather than checkout out that branch and building, you could just change incols= to columns= or i=.

@seisman
Copy link
Member

seisman commented Jun 1, 2021

I think to use incols Paul would need to build from this branch (#1303) and that the master branch uses columns as the alias name. Rather than checkout out that branch and building, you could just change incols= to columns= or i=.

Yes, forgot to mention that #1303 is needed. Sorry about that.

@PaulWessel
Copy link
Member

So I am debugging the C code. I cannot change anything in these strings in the option structure I think. Is that what you are suggesting?

@maxrjones
Copy link
Member

Just suggesting to use this for the debugging if you build from master branch to avoid the pygmt alias parsing:

# wrong figure
fig2 = pygmt.Figure()
fig2.contour(data=data, projection="X10c", frame="a", pen=True, i=[1, 0, 2])
fig2.savefig("array-yxz.png")

@PaulWessel
Copy link
Member

So I think I found the problem, but concerned that this would certainly affect any matrix being passed in as a dataset with -i options. This is the function that determines which column we should read given input values col = 0, 1, 2, 3 ...:

GMT_LOCAL unsigned int gmtapi_pick_in_col_number (struct GMT_CTRL *GMT, unsigned int col) {
	/* Return the next column to be selected on input */
	unsigned int col_pos;
	if (GMT->common.i.select)	/* -i has selected some columns */
        col_pos = GMT->current.io.col[GMT_IN][col].order; /* Which data column to pick */
        //col_pos = GMT->current.io.col[GMT_IN][col].col; /* Which data column to pick */
#if 0
	else if (GMT->current.setting.io_lonlat_toggle[GMT_IN] && col < GMT_Z)	/* Worry about -: for lon,lat */
		col_pos = 1 - col;	/* Read lat/lon instead of lon/lat */
#endif
	else
		col_pos = col;	/* Just goto that column */
	return (col_pos);
}

I have commented out the offending line and replaced it with the one using the member order, like in gmtio_ascii_input for files. I will make a PR for this but please run all tests you have that uses a matrix for dataset input.

@michaelgrund
Copy link
Member Author

michaelgrund commented Jun 1, 2021

Yes, I think this is an upstream bug, but it only affects wrappers, that's why the GMT tests pass.

Here is a modified example based on @michaelgrund's example:

As you can see, passing "YXZ" data and using incols=[1, 0, 2] works well for file input, but not for an array input.

Thanks for adding this @seisman, only tested it with an array as input.

@seisman
Copy link
Member

seisman commented Jun 2, 2021

The upstream bug was fixed in GenericMappingTools/gmt#5289 and will be available in GMT 6.2.0.

@michaelgrund
Copy link
Member Author

The upstream bug was fixed in GenericMappingTools/gmt#5289 and will be available in GMT 6.2.0.

Great! Thanks for your efforts @PaulWessel @meghanrjones @seisman.

@maxrjones
Copy link
Member

There is still an issue with scalings not being applied on vector inputs. Here is a minimal example (will need to update the file path from Users/user/ to get it to work):

import pygmt
import pandas as pd

# Correct (file input)
fig1 = pygmt.Figure()
fig1.plot(data="@mississippi.txt",J="X15cT/7c",B="af",W="0.25p,red",i="0,1+s1e-3")
fig1.savefig("mississippi-file.png")

# Read Mississippi data into a pandas DataFrame (need to update path here)
df = pd.read_table('/Users/user/.gmt/cache/mississippi.txt',skiprows=1,header=None)
df[0] = pd.to_datetime(df.iloc[:,0])

# Incorrect (vector input)
fig2 = pygmt.Figure()
fig2.plot(data=df,J="X15cT/7c",B="af",W="0.25p,red",i="0,1+s1e-3")
fig2.savefig("mississippi-vector.png")
mississippi-file.png mississippi-vector.png
mississippi-file mississippi-vector

@maxrjones
Copy link
Member

I can work on debugging the scaling issue, but may not be able to get to it today.

We should generalize the issue title a bit for tracking, since this applies to virtual file input with incols across methods.

@maxrjones maxrjones self-assigned this Jun 2, 2021
@PaulWessel
Copy link
Member

Tables are either read rec-by-rec or as an entire table. Fixing the above for record reading was simple, and that gets the right -R via gmtinfo which reads records. So the 2nd plot now shares the same y-range as the first. However, the plotting in plot reads the entire table in and since GMT_IS_REFERENCE is passed we simply hook up the vectors to avoid memory duplication. We only do this if input vectors are double precision, which they are, else we switch to GMT_IS_DUPLICATE. Unfortunately, there is no check on -i+s+o+l manipulation of data which also needs to kick us to GMT_IS_DUPLICATE so we can adjust the values...Hence the data are still in the original units and plots off chart.

I will continue working on this issue later today after returning from UH.

PaulWessel added a commit to GenericMappingTools/gmt that referenced this issue Jun 2, 2021
See GenericMappingTools/pygmt#1313 for details.  This PR ensures the record-by-record reading of external vectors have any adjustments set via -i applied.
@maxrjones maxrjones removed their assignment Jun 2, 2021
PaulWessel added a commit to GenericMappingTools/gmt that referenced this issue Jun 3, 2021
* Ensure external vector is scaled via -I on rec-by-rec parsing

See GenericMappingTools/pygmt#1313 for details.  This PR ensures the record-by-record reading of external vectors have any adjustments set via -i applied.

* Update gmt_api.c
@seisman seisman reopened this Jun 25, 2021
@seisman
Copy link
Member

seisman commented Jun 25, 2021

Reopen because the upstream fix GenericMappingTools/gmt#5289 breaks GMT.jl.

See GenericMappingTools/gmt#5384 for discussions.

@maxrjones
Copy link
Member

A second fix was implemented in GenericMappingTools/gmt#5392.

xref #1440 (I will need to check if that is a duplicate issue).

@seisman seisman changed the title contour does not handle incols parameter correctly The "incols" (-i) parameter doesn't works for vector input correctly Aug 29, 2021
@seisman
Copy link
Member

seisman commented Aug 29, 2021

A second fix was implemented in GenericMappingTools/gmt#5392.

xref #1440 (I will need to check if that is a duplicate issue).

The example script in #1313 (comment) works well with the GMT master branch, but #1440 fails. So I think they are not duplicate issues.

As this issue has been fixed, I think we can close it, but we need to add a test before closing the issue. Agree?

@maxrjones
Copy link
Member

A second fix was implemented in GenericMappingTools/gmt#5392.
xref #1440 (I will need to check if that is a duplicate issue).

The example script in #1313 (comment) works well with the GMT master branch, but #1440 fails. So I think they are not duplicate issues.

As this issue has been fixed, I think we can close it, but we need to add a test before closing the issue. Agree?

Thanks for checking. Yes, I agree that we can close this issue after adding a test.

@seisman seisman self-assigned this Aug 30, 2021
@seisman
Copy link
Member

seisman commented Aug 31, 2021

I'm trying to add a test for this issue, but I found there are still bugs with -i option. With GMT 6.2.0, -i1,0 works for PyGMT, but modifiers are ignored. For example, -i1+s10.0,0 doesn't work:

import numpy as np
from pygmt import info

# prepre the data with two columns
x = np.arange(0, 10, 1)
y = np.arange(10, 20, 1)
data = np.array([x, y]).T
np.savetxt("xy.dat", data)

# Correct output without incols
print(info(table=data))
print(info(table="xy.dat"))

# Incorrect with GMT 6.1.1, but now correct with GMT 6.2.0
print(info(table=data, incols="1,0"))
print(info(table="xy.dat", incols="1,0"))

# Incorrect with GMT 6.2.0
print(info(table=data, incols="1+s10,0"))
# Correct with GMT 6.2.0
print(info(table="xy.dat", incols="1+s10,0"))

@maxrjones
Copy link
Member

I suspect the problem is that the fix applied in GenericMappingTools/gmt#5291 wasn't extended to table-reading. @PaulWessel, do you remember if you have any pending work for the GMT_IS_MATRIX case? If not, I could look into this more.

@seisman
Copy link
Member

seisman commented Sep 3, 2021

I suspect the problem is that the fix applied in GenericMappingTools/gmt#5291 wasn't extended to table-reading. @PaulWessel, do you remember if you have any pending work for the GMT_IS_MATRIX case? If not, I could look into this more.

I quickly looked at the source code. I think the info() function always uses GMT_IS_VECTOR not GMT_IS_MATRIX. So it's still a bug for vectors.

Actually, GMT_IS_MATRIX is rarely used in PyGMT. I searched for the virtualfile_from_matrix function in the source code, and only found the contour, meca and surface functions are using it.

If I understand it correctly, using the GMT_IS_MATIRX keyword or calling the GMT_Put_Matrix function means that the data passed to GMT must be a 2D matrix and all the elements in the 2D matrix must have the same data type (for example, a float 2D matrix). However, for contour, meca, surface or any other functions, data with mixed types should be allowed (for example, an integer x array and a float y array).

@maxrjones
Copy link
Member

I suspect the problem is that the fix applied in GenericMappingTools/gmt#5291 wasn't extended to table-reading. @PaulWessel, do you remember if you have any pending work for the GMT_IS_MATRIX case? If not, I could look into this more.

I quickly looked at the source code. I think the info() function always uses GMT_IS_VECTOR not GMT_IS_MATRIX. So it's still a bug for vectors.

Actually, GMT_IS_MATRIX is rarely used in PyGMT. I searched for the virtualfile_from_matrix function in the source code, and only found the contour, meca and surface functions are using it.

If I understand it correctly, using the GMT_IS_MATIRX keyword or calling the GMT_Put_Matrix function means that the data passed to GMT must be a 2D matrix and all the elements in the 2D matrix must have the same data type (for example, a float 2D matrix). However, for contour, meca, surface or any other functions, data with mixed types should be allowed (for example, an integer x array and a float y array).

I stepped through the code and confirmed that virtualfile_from_matrix is used for your example failure due to the following lines of code:

pygmt/pygmt/clib/session.py

Lines 1457 to 1463 in d01da5a

try:
# Just use virtualfile_from_matrix for 2D numpy.ndarray
# which are signed integer (i), unsigned integer (u) or
# floating point (f) types
assert data.ndim == 2 and data.dtype.kind in "iuf"
_virtualfile_from = self.virtualfile_from_matrix
_data = (data,)

@maxrjones
Copy link
Member

I think GenericMappingTools/gmt#5799 fixed the last of the upstream problems reported in this issue. Do we still need more tests for this before closing?

@seisman
Copy link
Member

seisman commented Nov 6, 2021

I think we still need a test like this one #1313 (comment), because we don't have any tests that use -i with modifiers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working upstream Bug or missing feature of upstream core GMT
Projects
None yet
5 participants