Full support of numpy dtypes #2848
Personally use …
This is an interesting one. I've had to make plots showing how a variable like RMSE changes over time from Day 0 to Day 10. The one below was made using seaborn:
[figure: seaborn plot of RMSE from Day 0 to Day 10]
Note though, the … I think I did try to do this with GMT/PyGMT at first, but couldn't work out whether GMT actually supports plotting relative time. Maybe there's a way to do it by changing the configs at https://docs.generic-mapping-tools.org/6.4/gmt.conf.html#calendar-time-parameters? But I'm not sure how we can convert …
The current issue is that the error message is unhelpful and very confusing, so users don't know that …
>>> import pygmt
>>> import numpy as np
>>> x = np.array([1, 2, 3, 4], dtype=np.float16)
>>> pygmt.info(x)
File ~/OSS/gmt/pygmt/pygmt/clib/session.py:1275, in Session.virtualfile_from_vectors(self, *vectors)
1273 # Use put_vector for columns with numerical type data
1274 for col, array in enumerate(arrays[:columns]):
-> 1275 self.put_vector(dataset, column=col, vector=array)
1277 # Use put_strings for last column(s) with string type data
1278 # Have to use modifier "GMT_IS_DUPLICATE" to duplicate the strings
1279 string_arrays = arrays[columns:]
File ~/OSS/gmt/pygmt/pygmt/clib/session.py:892, in Session.put_vector(self, dataset, column, vector)
889 vector_pointer = (ctp.c_char_p * len(vector))()
890 if gmt_type == self["GMT_DATETIME"]:
891 vector_pointer[:] = np.char.encode(
--> 892 np.datetime_as_string(array_to_datetime(vector))
893 )
894 else:
895 vector_pointer[:] = np.char.encode(vector)
ValueError: Cannot convert a NumPy datetime value other than NaT with generic units
I don't think users should be warned about a memory usage increase. GMT does a lot of memory duplication internally and no one cares about it. Instead, we should warn users if we make any changes (e.g., casting types) to the input data.
After improving the error messages in PR #2856, I'm convinced that it's better to NOT support dtypes like …
Ok, shall we rescope this issue to just focus on supporting …
Here is a GMT CLI script showing how GMT deals with relative times:
To support …
However, in the Python world, converting …
For your case, I think you're expecting a relative-time axis.
>>> import pandas as pd
>>> import numpy as np
>>> data = pd.timedelta_range(start="1 day", periods=10)
>>> data
TimedeltaIndex([ '1 days', '2 days', '3 days', '4 days', '5 days',
'6 days', '7 days', '8 days', '9 days', '10 days'],
dtype='timedelta64[ns]', freq='D')
>>> data.to_numpy()
array([ 86400000000000, 172800000000000, 259200000000000, 345600000000000,
432000000000000, 518400000000000, 604800000000000, 691200000000000,
777600000000000, 864000000000000], dtype='timedelta64[ns]')
>>> data.to_numpy(dtype="timedelta64[D]")
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype='timedelta64[D]')
I feel we should convert …
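Dividing a timedelta64 array by a one-day `np.timedelta64` gives the elapsed time as floating-point days regardless of whether the array is stored in `[ns]`, `[s]` or `[D]`, which is one way to get plain numbers for a relative-time axis. This is a NumPy-only sketch of the conversion, not what PyGMT does internally.

```python
import numpy as np

# Ten durations stored with nanosecond resolution, like pandas' to_numpy() output
data = np.arange(np.timedelta64(1, "D"), np.timedelta64(11, "D")).astype(
    "timedelta64[ns]"
)

# Dividing two timedelta64 values yields a float64 ratio, independent of unit
days = data / np.timedelta64(1, "D")

# Converting the unit instead keeps the timedelta64 dtype (truncating to whole days)
whole_days = data.astype("timedelta64[D]")
print(days)
print(whole_days)
```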
It's straightforward to support … (lines 75 to 88 in 78dfcf1).
Then the following code works:
>>> import pygmt
>>> import numpy as np
>>> data = np.arange(np.timedelta64(1, "D"), np.timedelta64(10, "D"))
>>> data
array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='timedelta64[D]')
>>> pygmt.info(data)
'<vector memory>: N = 9 <1/9>\n'
>>> data = data.astype("timedelta64[s]")
>>> pygmt.info(data)
'<vector memory>: N = 9 <86400/777600>\n'
>>> data = data.astype("timedelta64[ns]")
>>> pygmt.info(data)
'<vector memory>: N = 9 <8.64e+13/7.776e+14>\n'
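The ranges reported above match the raw 64-bit integer counts stored in each array, so their meaning depends on the timedelta64 unit. This NumPy-only sketch reproduces the same minimum and maximum values by reinterpreting the arrays as int64; it assumes, rather than confirms, that this is what the GMT_LONG mapping passes to GMT.

```python
import numpy as np

data = np.arange(np.timedelta64(1, "D"), np.timedelta64(10, "D"))

for unit in ("D", "s", "ns"):
    # view() reinterprets the 8-byte timedelta64 values as raw int64 counts
    counts = data.astype(f"timedelta64[{unit}]").view("int64")
    print(unit, counts.min(), counts.max())

# prints:
# D 1 9
# s 86400 777600
# ns 86400000000000 777600000000000
```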
Ok, started a PR for this at #2884, and managed to recreate my example plot above. Example code:
import numpy as np
import pygmt
fig = pygmt.Figure()
fig.basemap(
projection="X8c/5c",
region=[0, 8, 0, 10],
frame=["WSne", "xaf+lForecast Days", "yaf+lRMSE"],
)
fig.plot(
x=np.arange(np.timedelta64(0, "D"), np.timedelta64(8, "D")),
y=np.geomspace(start=0.1, stop=9, num=8),
style="c0.2c",
pen="1p",
)
fig.show()
This produces:
[figure: PyGMT plot of RMSE against Forecast Days]
However, passing …
fig = pygmt.Figure()
fig.basemap(
projection="X8c/5c",
region=[np.timedelta64(0, "D"), np.timedelta64(8, "D"), 0, 10],
frame=["WSne", "xaf+lForecast Days", "yaf+lRMSE"],
)
fig.plot(
x=np.arange(np.timedelta64(0, "D"), np.timedelta64(8, "D")),
y=np.geomspace(start=0.1, stop=9, num=8),
style="c0.2c",
pen="1p",
)
fig.show()
This produces:
[figure not included]
Not sure why …
Figured out how we could use … Another thing to consider: do we also want to support Python's built-in datetime.timedelta object? Quoting from #2884 (comment):
I guess not.
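If built-in `datetime.timedelta` objects are not accepted directly, users can still convert them with NumPy before plotting. A minimal sketch; the explicit `timedelta64[us]` dtype is a choice made here because it matches the microsecond resolution of `datetime.timedelta`.

```python
import datetime

import numpy as np

durations = [datetime.timedelta(days=d) for d in range(8)]

# Pass the dtype explicitly so the result is timedelta64 rather than dtype=object
arr = np.array(durations, dtype="timedelta64[us]")

# Express the durations in days, e.g. as x values for a relative-time plot
days = arr / np.timedelta64(1, "D")
print(days)
```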
I've updated the top post (#2848 (comment)) based on what I learned recently about NumPy dtypes. After PR #3566, there are still a few unsupported dtypes that can be supported.
For …
Before starting to support PyArrow arrays (#2800), I think we should first ensure that PyGMT has complete support for NumPy dtypes. Currently, the following NumPy dtypes are supported:
pygmt/pygmt/clib/session.py (lines 85 to 102 in e5ecee9)
Here is a simple way to list all the available NumPy dtypes (xref: https://numpy.org/doc/stable/reference/arrays.scalars.html):
numpy.int8: GMT_CHAR
numpy.int16: GMT_SHORT
numpy.int32: GMT_INT
numpy.int64: GMT_LONG
numpy.uint8: GMT_UCHAR
numpy.uint16: GMT_USHORT
numpy.uint32: GMT_UINT
numpy.uint64: GMT_ULONG
numpy.float16: not supported yet
numpy.float32: GMT_FLOAT
numpy.float64: GMT_DOUBLE
numpy.longlong: GMT_LONG, in "Support 1-D/2-D numpy arrays with longlong and ulonglong dtype" #3566
numpy.ulonglong: GMT_ULONG, in "Support 1-D/2-D numpy arrays with longlong and ulonglong dtype" #3566
numpy.longdouble: alias for np.float96 or np.float128 (xref: https://numpy.org/devdocs/reference/arrays.scalars.html#numpy.longdouble)
numpy.complex64: GMT doesn't support complex floating-point data types.
numpy.complex128: GMT doesn't support complex floating-point data types.
numpy.clongdouble: GMT doesn't support complex floating-point data types.
numpy.bool: not supported yet
numpy.bytes_: for bytestrings. Makes no sense in GMT.
numpy.str_: GMT_TEXT
numpy.datetime64: GMT_DATETIME. But we still need to check all time units (https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units)
numpy.timedelta64: GMT_LONG, in "Support timedelta64 dtype as input" #2884
numpy.object_: any Python objects. We should try to convert them to np.datetime64 or np.str_.
numpy.void: for creating a new structured or unstructured void scalar. Makes no sense in GMT.
As you can see, most NumPy types are already supported in PyGMT. It's clear that GMT and PyGMT can't support float128 or any complex floating-point types, but it should at least be possible to support np.float16 (np.half), np.bool_ and np.timedelta64.
np.float16: maybe we have to cast the array into np.float32?
np.bool_: maybe convert to 0 and 1 (e.g., array.astype("int8"))
np.timedelta64: not sure about this yet.
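The first two proposals can be checked with NumPy alone: every float16 value converts to float32 without loss (the round trip back to float16 is exact), and booleans become 0 and 1 with `astype("int8")`, which the list above maps to GMT_CHAR. A minimal sketch; whether PyGMT should do this silently or with a warning is still an open question.

```python
import numpy as np

# float16 -> float32 is lossless, including the largest and smallest float16 values
half = np.array([0.1, 1.5, 65504.0, -(2.0**-24)], dtype=np.float16)
single = half.astype(np.float32)
assert np.array_equal(single.astype(np.float16), half)

# Booleans become 0/1 integers
flags = np.array([True, False, True])
as_int = flags.astype("int8")
print(as_int)  # [1 0 1]
```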