Figure.plot: Crash for pd.DataFrame input containing floats when using "data" and "incols" #2637

Open
yvonnefroehlich opened this issue Aug 21, 2023 · 19 comments
Labels
bug Something isn't working

Comments

@yvonnefroehlich
Member

yvonnefroehlich commented Aug 21, 2023

Description of the problem

Under specific circumstances, Figure.plot crashes if a pandas.DataFrame is passed to the data parameter and the columns to use are selected via incols. The issue does not occur if the pd.DataFrame contains only integers. If the desired columns are passed directly to the x and y parameters, the code works. For me, this issue occurs under Windows but not under Linux.

For context, see PR #2515, specifically comment #2515 (comment).

Maybe related to the issues in

Minimal Complete Verifiable Example

import pandas as pd
import pygmt 


size = 5

# Set up random test data
test_dict_int = {
    'a': [ 2,  2, 2, 2],
    'z': [ 8,  6, 7, 3],
    'x': [-3, -1, 1, 3],
    'y': [ 2,  2, 2, 2],
}
test_df_int = pd.DataFrame(data=test_dict_int)


fig = pygmt.Figure()

fig.basemap(
    region=[-size, size, -size, size],
    projection="X" + str(size*2),
    frame=True,
)

fig.plot(
    # data=test_df_int,  # integers -> WORKs
    data=test_df_int.astype(float),  # floats -> FAILs
    incols=[2, 3],
    # verbose="d",
)

# fig.show()
# fig.savefig(fname="bug_MWE.png")
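
The workaround mentioned in the description (passing the desired columns to x and y instead of using data plus incols) can be sketched as follows. This is only an illustration: `columns_by_position` is a hypothetical helper, not part of PyGMT, and the snippet deliberately stops short of calling PyGMT so that it runs with pandas alone.

```python
import pandas as pd


def columns_by_position(df, incols):
    """Return the columns at the given 0-based positions as NumPy arrays.

    Hypothetical helper that mimics the column selection of incols=[2, 3].
    """
    return [df.iloc[:, i].to_numpy() for i in incols]


# Same test data as in the example above, converted to floats
test_df_float = pd.DataFrame(
    {
        "a": [2, 2, 2, 2],
        "z": [8, 6, 7, 3],
        "x": [-3, -1, 1, 3],
        "y": [2, 2, 2, 2],
    }
).astype(float)

x, y = columns_by_position(test_df_float, [2, 3])
print(x, y)
```

With PyGMT installed, the two arrays could then be passed as `fig.plot(x=x, y=y)`, which is the variant reported to work in the description.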

Output of verbose="d"

plot [DEBUG]: Look for file -5/5/-5/5 in C:/Users/Admin/.gmt
plot [DEBUG]: Look for file -5/5/-5/5 in C:/Users/Admin/.gmt/cache
plot [DEBUG]: Look for file -5/5/-5/5 in C:/Users/Admin/.gmt/server
plot [DEBUG]: Got regular w/e/s/n for region (-5/5/-5/5)
plot [INFORMATION]: Processing input table data
plot [DEBUG]: Operation will require 2 input columns [n_cols_start = 2]
plot [DEBUG]: Reset MAP_ANNOT_OBLIQUE to anywhere
plot [DEBUG]: Projected values in meters: -5 5 -5 5
plot [DEBUG]: Computed automatic parameters using dimension scaling: 0.9
plot [INFORMATION]: Map scale is 0.001 km per cm or 1:100.
plot [DEBUG]: Running in PS mode modern
plot [DEBUG]: Use PS filename C:/Users/Admin/.gmt/sessions/gmt_session.20196/gmt_1.ps-
plot [DEBUG]: Append to hidden PS file C:/Users/Admin/.gmt/sessions/gmt_session.20196/gmt_1.ps-
plot [DEBUG]: Got session name as pygmt-session and default graphics formats as pdf
plot [DEBUG]: Basemap order: Frame = above  Grid = below  Tick/Annot = below
plot [DEBUG]: gmtapi_init_import: Passed family = Data Table and geometry = Line
plot [DEBUG]: gmtapi_init_import: Added 1 new sources
plot [DEBUG]: GMT_Init_IO: Returned first Input object ID = 0
plot [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
plot [DEBUG]: gmtapi_import_dataset: Passed ID = -1 and mode = 0
plot [INFORMATION]: Referencing data table from user 4 column arrays of length 4
plot [DEBUG]: Object ID 1 : Registered Data Table Memory Reference 1e0f157bfc0 as an Input resource with geometry Point [n_objects = 2]
plot [DEBUG]: gmtapi_import_dataset processed 1 resources
plot [DEBUG]: GMT_End_IO: Input resource access is now disabled
plot [INFORMATION]: Plotting segment 0
plot [DEBUG]: GMT memory: Initialize 2 temporary column double arrays, each of length : 0

Full error message

Windows fatal exception: code 0xc0000374


Main thread:
Current thread 0x00003654 (most recent call first):
  File "C:\ProgramData\Anaconda3\envs\pygmt_env_dev\Lib\site-packages\pygmt\clib\session.py", line 624 in call_module
  File "C:\ProgramData\Anaconda3\envs\pygmt_env_dev\Lib\site-packages\pygmt\src\plot.py", line 267 in plot
  File "C:\ProgramData\Anaconda3\envs\pygmt_env_dev\Lib\site-packages\pygmt\helpers\decorators.py", line 738 in new_module
  File "C:\ProgramData\Anaconda3\envs\pygmt_env_dev\Lib\site-packages\pygmt\helpers\decorators.py", line 598 in new_module
  File "C:\ProgramData\Anaconda3\envs\pygmt_env_dev\Lib\site-packages\pygmt\helpers\decorators.py", line 818 in new_module
  File "c:\users\admin\c2\eigenedokumente\studium\promotion\e_gmt\00_testing\001_gmt_pygmt\pr_tracksampling\bug_mwe_red.py", line 35 in <module>
  File "C:\ProgramData\Anaconda3\envs\pygmt_env_dev\Lib\site-packages\spyder_kernels\py3compat.py", line 356 in compat_exec
  File "C:\ProgramData\Anaconda3\envs\pygmt_env_dev\Lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 473 in exec_code
  File "C:\ProgramData\Anaconda3\envs\pygmt_env_dev\Lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 615 in _exec_file
  File "C:\ProgramData\Anaconda3\envs\pygmt_env_dev\Lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 528 in runfile
  File "C:\Users\Admin\AppData\Local\Temp\ipykernel_1076\1879108342.py", line 1 in <module>


Restarting kernel...
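
As a reading aid for the error message above: the code 0xc0000374 is the Windows NTSTATUS value STATUS_HEAP_CORRUPTION. The small lookup below is only a sketch with a handful of well-known codes; `describe_ntstatus` and the table are illustrative names, not an existing API.

```python
# A few well-known NTSTATUS codes (not an exhaustive list)
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def describe_ntstatus(code):
    """Map a Windows exception code to its symbolic name, if known."""
    code &= 0xFFFFFFFF  # normalize to an unsigned 32-bit value
    return NTSTATUS_NAMES.get(code, f"unknown (0x{code:08x})")


print(describe_ntstatus(0xC0000374))
```

A heap-corruption status is usually raised when the heap manager notices damaged bookkeeping data, which can happen some time after the write that caused the damage.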

System information

PyGMT information:
  version: v0.9.1.dev125
System information:
  python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 17:59:51) [MSC v.1935 64 bit (AMD64)]
  executable: C:\ProgramData\Anaconda3\envs\pygmt_env_dev\python.exe
  machine: Windows-10-10.0.19045-SP0
Dependency information:
  numpy: 1.24.3
  pandas: 2.0.2
  xarray: 2023.1.1.dev17
  netCDF4: 1.6.2
  packaging: 23.1
  contextily: 1.3.0
  geopandas: 0.13.2
  IPython: 8.14.0
  rioxarray: 0.14.1
  ghostscript: 9.54.0
GMT library information:
  binary version: 6.4.0
  cores: 4
  grid layout: rows
  image layout: 
  library path: C:/ProgramData/Anaconda3/envs/pygmt_env_dev/Library/bin/gmt.dll
  padding: 2
  plugin dir: C:/ProgramData/Anaconda3/envs/pygmt_env_dev/Library/bin/gmt_plugins
  share dir: C:/Program Files (x86)/gmt6/share
  version: 6.4.0
@seisman
Member

seisman commented Nov 23, 2023

I can reproduce the bug on Linux.

If the pd.DataFrame contains integer-only columns, the debug messages are:

plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt/cache
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt/server
plot [DEBUG]: Got regular w/e/s/n for region (-5/5/-5/5)
plot [INFORMATION]: Processing input table data
plot [DEBUG]: Operation will require 2 input columns [n_cols_start = 2]
plot [DEBUG]: Reset MAP_ANNOT_OBLIQUE to anywhere
plot [DEBUG]: Projected values in meters: -5 5 -5 5
plot [DEBUG]: Computed automatic parameters using dimension scaling: 0.9
plot [INFORMATION]: Map scale is 0.001 km per cm or 1:100.
plot [DEBUG]: Running in PS mode modern
plot [DEBUG]: Use PS filename /home/seisman/.gmt/sessions/gmt_session.1478556/gmt_1.ps-
plot [DEBUG]: Append to hidden PS file /home/seisman/.gmt/sessions/gmt_session.1478556/gmt_1.ps-
plot [DEBUG]: Got session name as pygmt-session and default graphics formats as pdf
plot [DEBUG]: Basemap order: Frame = above  Grid = below  Tick/Annot = below
plot [DEBUG]: gmtapi_init_import: Passed family = Data Table and geometry = Line
plot [DEBUG]: gmtapi_init_import: Added 1 new sources
plot [DEBUG]: GMT_Init_IO: Returned first Input object ID = 0
plot [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
plot [DEBUG]: gmtapi_import_dataset: Passed ID = -1 and mode = 0
plot [INFORMATION]: Duplicating data table from user 4 column arrays of length 4
plot [DEBUG]: Object ID 1 : Registered Data Table Memory Copy 560d93e51980 as an Input resource with geometry Point [n_objects = 2]
plot [DEBUG]: gmtapi_import_dataset processed 1 resources
plot [DEBUG]: GMT_End_IO: Input resource access is now disabled
plot [INFORMATION]: Plotting segment 0
plot [DEBUG]: GMT_Destroy_Data: freed memory for a Data Table for object 1
plot [DEBUG]: gmtlib_unregister_io: Unregistering object no 1 [n_objects = 1]
plot [DEBUG]: gmtlib_unregister_io: Object no 1 has non-NULL resource pointer
plot [DEBUG]: Current size of half-baked PS file /home/seisman/.gmt/sessions/gmt_session.1478556/gmt_1.ps- = 23633.

If the pd.DataFrame contains float-type columns, the debug messages are:

plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt/cache
plot [DEBUG]: Look for file -5/5/-5/5 in /home/seisman/.gmt/server
plot [DEBUG]: Got regular w/e/s/n for region (-5/5/-5/5)
plot [INFORMATION]: Processing input table data
plot [DEBUG]: Operation will require 2 input columns [n_cols_start = 2]
plot [DEBUG]: Reset MAP_ANNOT_OBLIQUE to anywhere
plot [DEBUG]: Projected values in meters: -5 5 -5 5
plot [DEBUG]: Computed automatic parameters using dimension scaling: 0.9
plot [INFORMATION]: Map scale is 0.001 km per cm or 1:100.
plot [DEBUG]: Running in PS mode modern
plot [DEBUG]: Use PS filename /home/seisman/.gmt/sessions/gmt_session.1479589/gmt_1.ps-
plot [DEBUG]: Append to hidden PS file /home/seisman/.gmt/sessions/gmt_session.1479589/gmt_1.ps-
plot [DEBUG]: Got session name as pygmt-session and default graphics formats as pdf
plot [DEBUG]: Basemap order: Frame = above  Grid = below  Tick/Annot = below
plot [DEBUG]: gmtapi_init_import: Passed family = Data Table and geometry = Line
plot [DEBUG]: gmtapi_init_import: Added 1 new sources
plot [DEBUG]: GMT_Init_IO: Returned first Input object ID = 0
plot [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
plot [DEBUG]: gmtapi_import_dataset: Passed ID = -1 and mode = 0
plot [INFORMATION]: Referencing data table from user 4 column arrays of length 4
plot [DEBUG]: Object ID 1 : Registered Data Table Memory Reference 55c909f971a0 as an Input resource with geometry Point [n_objects = 2]
plot [DEBUG]: gmtapi_import_dataset processed 1 resources
plot [DEBUG]: GMT_End_IO: Input resource access is now disabled
plot [INFORMATION]: Plotting segment 0
free(): invalid next size (fast)

Here is the diff:

< plot [INFORMATION]: Duplicating data table from user 4 column arrays of length 4
< plot [DEBUG]: Object ID 1 : Registered Data Table Memory Copy 560d93e51980 as an Input resource with geometry Point [n_objects = 2]
---
> plot [INFORMATION]: Referencing data table from user 4 column arrays of length 4
> plot [DEBUG]: Object ID 1 : Registered Data Table Memory Reference 55c909f971a0 as an Input resource with geometry Point [n_objects = 2]
26,29c26
< plot [DEBUG]: GMT_Destroy_Data: freed memory for a Data Table for object 1
< plot [DEBUG]: gmtlib_unregister_io: Unregistering object no 1 [n_objects = 1]
< plot [DEBUG]: gmtlib_unregister_io: Object no 1 has non-NULL resource pointer
< plot [DEBUG]: Current size of half-baked PS file /home/seisman/.gmt/sessions/gmt_session.1478556/gmt_1.ps- = 23633.
---
> free(): invalid next size (fast)

So, for the integer-type case, data is duplicated, but for the float-type case, data is used by reference.
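
To check which of the two paths a given DataFrame is likely to take, it may help to look at the dtype of each column before passing it on. The sketch below assumes (this is not confirmed in the thread) that only columns already stored as 64-bit floats can be referenced in place, while other dtypes have to be converted and are therefore duplicated. `column_report` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def column_report(df):
    """Report the NumPy dtype of each column and whether it is float64."""
    report = {}
    for name in df.columns:
        arr = df[name].to_numpy()
        report[name] = {
            "dtype": str(arr.dtype),
            # Assumption: float64 matches a C double and needs no conversion
            "is_float64": bool(arr.dtype == np.float64),
        }
    return report


df_int = pd.DataFrame({"x": [-3, -1, 1, 3], "y": [2, 2, 2, 2]})
df_float = df_int.astype(float)

print(column_report(df_int))
print(column_report(df_float))
```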

@PaulWessel Need your help.

@PaulWessel
Member

How is the DataFrame passed to GMT? Via matrix? Also, this looks like a bad sign
gmtapi_import_dataset: Passed ID = -1 and mode = 0
since ID = -1 means "not set", so that can't be good.

@seisman
Member

seisman commented Nov 23, 2023

It's passed via GMT_Put_Vectors.

@seisman
Member

seisman commented Nov 23, 2023

Here are the values used in GMT_Open_Virtualfile

family="GMT_IS_DATASET|GMT_VIA_VECTOR"
geometry="GMT_IS_POINT"
direction="GMT_IN|GMT_IS_REFERENCE"

@PaulWessel
Member

Not clear. Might you share a minimal example that (1) loads the data frame, (2) passes it to some simple module like gmtconvert (assuming that also crashes)? Think I need to debug.

@seisman
Member

seisman commented Nov 24, 2023

Might you share a minimal example that (1) loads the data frame, (2) passes it to some simple module like gmtconvert (assuming that also crashes)?

Tried to pass the same dataset to gmtconvert, but it doesn't crash.

import pandas as pd
from pygmt.clib import Session

test_dict_int = {
    'a': [ 2,  2, 2, 2],
    'z': [ 8,  6, 7, 3],
    'x': [-3, -1, 1, 3],
    'y': [ 2,  2, 2, 2],
}
data = pd.DataFrame(data=test_dict_int)

with Session() as lib:
    with lib.virtualfile_from_data(data=data) as vintbl:
        lib.call_module("convert", f"{vintbl} -Vd")

The verbose messages are:

gmtconvert [INFORMATION]: Processing input table data
gmtconvert [DEBUG]: gmtapi_init_import: Passed family = Data Table and geometry = Point
gmtconvert [DEBUG]: gmtapi_init_import: Added 1 new sources
gmtconvert [DEBUG]: GMT_Init_IO: Returned first Input object ID = 0
gmtconvert [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
gmtconvert [DEBUG]: gmtapi_import_dataset: Passed ID = -1 and mode = 0
gmtconvert [INFORMATION]: Referencing data table from user 4 column arrays of length 4
gmtconvert [DEBUG]: Object ID 1 : Registered Data Table Memory Reference 555a4fa95820 as an Input resource with geometry Point [n_objects = 2]
gmtconvert [DEBUG]: gmtapi_import_dataset processed 1 resources
gmtconvert [DEBUG]: GMT_End_IO: Input resource access is now disabled
gmtconvert [DEBUG]: Object ID 2 : Registered Data Table Memory Reference 555a4fad6fe0 as an Input resource with geometry Point [n_objects = 3]
gmtconvert [DEBUG]: Successfully duplicated a Data Table
gmtconvert [DEBUG]: Object ID 3 : Registered Data Table Stream 7f3cdcaae780 as an Output resource with geometry Point [n_objects = 4]
gmtconvert [DEBUG]: gmtapi_begin_io: Output resource access is now enabled [container]
gmtconvert [DEBUG]: gmtapi_export_dataset: Passed ID = 3 and mode = 0
gmtconvert [INFORMATION]: Write Data Table to <stdout>
2	8	-3	2
2	6	-1	2
2	7	1	2
2	3	3	2
gmtconvert [DEBUG]: GMT_End_IO: Output resource access is now disabled
gmtconvert [INFORMATION]: 1 tables concatenated, 4 records passed (input cols = 4; output cols = 4)
gmtconvert [DEBUG]: gmtlib_garbage_collection: Destroying object: C=0 A=0 ID=1 W=Input F=Data Table M=Memory Reference S=Used P=555a4fa95820 N=(null)
gmtconvert [DEBUG]: gmtlib_garbage_collection: Destroying object: C=0 A=0 ID=2 W=Input F=Data Table M=Memory Reference S=Unused P=555a4fad6fe0 N=(null)
gmtconvert [DEBUG]: GMTAPI_Garbage_Collection freed 2 memory objects
gmtconvert [DEBUG]: gmtlib_unregister_io: Unregistering object no 1 [n_objects = 3]
gmtconvert [DEBUG]: gmtlib_unregister_io: Unregistering object no 2 [n_objects = 2]
gmtconvert [DEBUG]: gmtlib_unregister_io: Unregistering object no 3 [n_objects = 1]

ID = -1 appears here as well and gmtconvert does not crash, so that is not the real problem.

@seisman
Member

seisman commented Nov 24, 2023

For the example in #2637 (comment), if I add style="c0.2c" (i.e., -Sc0.2c), the script works. So, it's likely it only crashes when plotting lines.

@PaulWessel
Member

If I try this:

cat <<- EOF > bug.py
# Set up random test data
import pandas as pd
import pygmt 

size = 5

test_dict_int = {
    'a': [ 2,  2, 2, 2],
    'z': [ 8,  6, 7, 3],
    'x': [-3, -1, 1, 3],
    'y': [ 2,  2, 2, 2],
}
test_df_int = pd.DataFrame(data=test_dict_int)


fig = pygmt.Figure()

fig.basemap(
    region=[-size, size, -size, size],
    projection="X" + str(size*2),
    frame=True,
)

fig.plot(
    # data=test_df_int,  # integers -> WORKs
    data=test_df_int.astype(float),  # floats -> FAILs
    incols=[2, 3],
    # verbose="d",
)

fig.show()
fig.savefig(fname="bug_MWE.png")
EOF

and run

python bug.py

I get no errors and this plot

[Image: bug_MWE]

What am I missing?

@seisman
Member

seisman commented Nov 24, 2023

@yvonnefroehlich said "For me, this issue occurs under Windows but not under Linux.". Now I can reproduce the issue under Linux, but you can't reproduce it under macOS.

Need to find out why the behavior is system-dependent.

@PaulWessel
Member

Need a Linux or Win (@joa-quim ) person to run in debug and determine WTF is going on. I cannot.

@joa-quim
Member

I could try to start my Python learning through a debug session, but for that I would need PyGMT to be able to find my gmt.dll (which, ofc, has to be a debug build), as I don't want to mess with Conda and environments stuff.

@seisman
Member

seisman commented Nov 24, 2023

I would need PyGMT to be able to find my gmt.dll (which, ofc, has to be a debug build)

Just set the GMT_LIBRARY_PATH environment variable to the path to the gmt.dll (something like C:\Users\USERNAME\Mambaforge\envs\pygmt\Library\bin\).

@joa-quim
Member

OK, but when I said gmt.dll I was being generic. The true name is gmt_w64.dll, and if only the path is set via GMT_LIBRARY_PATH then the right dll won't be found.

@seisman
Member

seisman commented Nov 24, 2023

OK, but when I said gmt.dll I was being generic. The true name is gmt_w64.dll, and if only the path is set via GMT_LIBRARY_PATH then the right dll won't be found.

PyGMT will try to find gmt.dll, gmt_w32.dll and gmt_w64.dll.
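
The lookup described here can be pictured roughly like this. It is a simplified sketch, not PyGMT's actual loading code: `candidate_library_paths` is a hypothetical function, the directory in the usage line is made up, and the order of the names is an assumption.

```python
from pathlib import PureWindowsPath

# Library file names mentioned in this thread (order is an assumption)
WINDOWS_LIB_NAMES = ["gmt.dll", "gmt_w64.dll", "gmt_w32.dll"]


def candidate_library_paths(env):
    """List the files to try, given a mapping of environment variables.

    If GMT_LIBRARY_PATH is set, it is treated as a directory and each
    known file name is looked up inside it. Otherwise only the bare file
    names are returned, leaving the search to the system loader.
    """
    libdir = env.get("GMT_LIBRARY_PATH")
    if not libdir:
        return list(WINDOWS_LIB_NAMES)
    return [str(PureWindowsPath(libdir) / name) for name in WINDOWS_LIB_NAMES]


# Hypothetical directory holding a debug build of GMT
paths = candidate_library_paths({"GMT_LIBRARY_PATH": r"C:\gmt\bin"})
print(paths)
```

Because several file names are tried inside the same directory, pointing GMT_LIBRARY_PATH at the folder should be enough even when the file is called gmt_w64.dll.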

@joa-quim
Member

Good, thanks.

@seisman
Member

seisman commented Dec 3, 2023

Tried to debug this issue. It seems plot crashes when trying to free the GMT_DATASET object
https://github.com/GenericMappingTools/gmt/blob/7825ff4632c85ef6569acf19192068b977127e07/src/psxy.c#L3008

		if (GMT_Destroy_Data (API, &D) != GMT_NOERROR) {
			Return (API->error);
		}

Actually it crashes in the gmt_free_segment function (https://github.com/GenericMappingTools/gmt/blob/7825ff4632c85ef6569acf19192068b977127e07/src/gmt_io.c#L8875):

	SH = gmt_get_DS_hidden (segment);
	for (col = 0; col < segment->n_columns; col++) {
		if (SH->alloc_mode[col] == GMT_ALLOC_INTERNALLY)	/* Free data GMT allocated */
			gmt_M_free (GMT, segment->data[col]);
	}
	gmt_M_free (GMT, segment->data);	/* CRASHES HERE! */

SH->alloc_mode[col] is GMT_ALLOC_EXTERNALLY for all four columns, so none of the segment->data[col] arrays is freed, but it crashes when freeing segment->data itself.
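
The ownership logic of the quoted C snippet can be modeled in a few lines of Python to make clear what is expected to be released. This is only a toy model: the class names and enum values are placeholders, not GMT's definitions, and Python cannot reproduce the crash itself.

```python
from dataclasses import dataclass
from enum import Enum


class AllocMode(Enum):
    # Placeholder values, not the constants used by GMT
    EXTERNALLY = 0
    INTERNALLY = 1


@dataclass
class Segment:
    data: list        # one entry per column (the column arrays)
    alloc_mode: list  # who owns each column array


def free_segment(segment):
    """Return the names of the allocations the quoted loop would release."""
    freed = []
    for col, mode in enumerate(segment.alloc_mode):
        if mode is AllocMode.INTERNALLY:  # free only what GMT allocated
            freed.append(f"data[{col}]")
    freed.append("data")  # the array of column pointers is always released
    return freed


# The situation described above: four columns, all owned by the caller
seg = Segment(data=[[0.0] * 4 for _ in range(4)],
              alloc_mode=[AllocMode.EXTERNALLY] * 4)
print(free_segment(seg))
```

In this model only the pointer array is released, which matches the observation that the column arrays are left alone. Both glibc's "free(): invalid next size" and the Windows code 0xc0000374 are heap-corruption reports, so the faulty write may have happened earlier than the call that finally aborts.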

@seisman
Member

seisman commented Dec 4, 2023

Ping @PaulWessel Does the above debugging help?

@PaulWessel
Member

Yes, I got the same. Will debug again to see exactly which data is internally allocated. Not sure why that would crash, but it does.

@PaulWessel
Member

Since I cannot reproduce it (works for macOS) and I cannot see why this would depend on the OS, I cannot really help. Someone would need to debug in Linux, but I am not sure what to look for. segment->data is allocated in GMT, so it is fair game to free as long as we don't free the read-only vectors.
