-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pygmt.x2sys_cross: Refactor to use virtualfiles for output tables [BREAKING CHANGE: Dummy times in 3rd and 4th columns now have np.timedelta64 type] #3182
Conversation
295afc0
to
5280524
Compare
bc341f6
to
ff290da
Compare
ff290da
to
58c6ea4
Compare
pygmt/src/x2sys_cross.py
Outdated
# Convert 3rd and 4th columns to datetimes. | ||
# These two columns have names "t_1"/"t_2" or "i_1"/"i_2". | ||
# "t_1"/"t_2" means they are datetimes and should be converted. | ||
# "i_1"/"i_2" means they are dummy times (i.e., floating-point values). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I understanding the output correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never used x2sys, but here is my understanding of the C codes and the output:
- The 3rd and 4th columns are datetimes. They can be either absolute datetimes (e.g.,
2023-01-01T01:23:45.678
or dummy datetimes (i.e., double-precision numbers), depending on whether the input tracks contain datetimes. - Internally, absolute datetimes are also represented as double-precision numbers in GMT. So absolute datetimes and dummy datetimes are the same internally.
- When outputting to a file, GMT will convert double-precision numbers into absolute datetimes, since GMT know if the column has dummy datetimes or not.
- A
GMT_DATASET
container can only contain double-precision numbers and text strings. So when outputting to a virtual file, the 3rd and 4th columns always have double-precision numbers. If the column names aret_1
/t_2
, then we know they're absolute datetimes and should be converted; otherwise, they are just dummy datetimes and should not be converted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little unsure if i_1
/i_2
are actually dummy datetimes. This is a sample output from x2sys_cross
:
# Tag: X2SYS4ivlhlo4
# Command: x2sys_cross @tut_ship.xyz -Qi -TX2SYS4ivlhlo4 ->/tmp/lala.txt
# x y i_1 i_2 dist_1 dist_2 head_1 head_2 vel_1 vel_2 z_X z_M
> @tut_ship 0 @tut_ship 0 NaN/NaN/1357.17 NaN/NaN/1357.17
251.004840022 20.000079064 18053.5647431 13446.6562433 333.339586673 229.636557499 269.996783034 270.023614846 NaN NaN 192.232797243 -2957.22757183
251.004840022 20.000079064 18053.5647431 71783.6562433 333.339586673 1148.20975878 269.996783034 270.023614846 NaN NaN 192.232797243 -2957.22757183
250.534946327 20.0000526811 18053.3762934 66989.0210846 332.869692978 1022.68273972 269.996783034 269.360150109 NaN NaN -57.6485957585 -2686.4268008
250.532033147 20.0000525175 18053.3751251 66988.9936489 332.866779797 1022.67977813 269.996783034 22.0133296951 NaN NaN -64.5973890802 -2682.04812157
252.068705 20.000075 13447.5 71784.5 230.700422496 1149.27362378 269.995072235 269.995072235 NaN NaN 0 -3206.5
It seems like the i_1
/i_2
values vary between rows, but I can't quite remember what they represent... maybe an index of some sort? I might need to inspect the C code to see what's going on, can you point me to where these i_1
/i_2
columns are being output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dummy times are just double-precision indexes from 0 to n (xref: https://github.com/GenericMappingTools/gmt/blob/b56be20bee0b8de22a682fdcd458f9b9eeb76f64/src/x2sys/x2sys.c#L533).
The column name i_1
or t_1
is controlled by the variable t_or_i
in the C code (https://github.com/GenericMappingTools/gmt/blob/b56be20bee0b8de22a682fdcd458f9b9eeb76f64/src/x2sys/x2sys_cross.c#L998). From https://github.com/GenericMappingTools/gmt/blob/b56be20bee0b8de22a682fdcd458f9b9eeb76f64/src/x2sys/x2sys_cross.c#L568, it's clear that, if got_time
is True, then the column is absolute time (GMT_IS_ABSTIME
), otherwise it's double-precision numbers (GMT_IS_FLOAT
).
We can keep the dummy times as double-precision numbers or think them as seconds since unix epoch and then convert them to absolute times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep the dummy times as double-precision numbers or think them as seconds since unix epoch and then convert them to absolute times.
Maybe convert the relative time to pandas.Timedelta
or numpy.timedelta64
? Xref #2848.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Done in 9d12ae1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 main changes happening in this PR:
- Adding the
output_type="numpy"
option - Handling the different dtypes of the
i_1
/i_2
ort_1
/t_2
columns
We can keep this as a single PR since it's hard to separate the two things, but might need to discuss the implementation a bit more.
pygmt/src/x2sys_cross.py
Outdated
def x2sys_cross(tracks=None, outfile=None, **kwargs): | ||
def x2sys_cross( | ||
tracks=None, | ||
output_type: Literal["pandas", "numpy", "file"] = "pandas", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'm not sure if we should support numpy
output type for x2sys_cross
because all 'columns' will need to be the same dtype in a np.ndarray
. If there are datetime values in the columns, they will get converted to floating point (?), which makes it more difficult to use later. Try adding a unit test for numpy
output_type and see if it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are datetime values in the columns, they will get converted to floating point (?)
You're right. Datetimes are converted to floating points by df.to_numpy()
. Will remove the numpy
output type.
pygmt/src/x2sys_cross.py
Outdated
# Convert 3rd and 4th columns to datetimes. | ||
# These two columns have names "t_1"/"t_2" or "i_1"/"i_2". | ||
# "t_1"/"t_2" means they are datetimes and should be converted. | ||
# "i_1"/"i_2" means they are dummy times (i.e., floating-point values). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little unsure if i_1
/i_2
are actually dummy datetimes. This is a sample output from x2sys_cross
:
# Tag: X2SYS4ivlhlo4
# Command: x2sys_cross @tut_ship.xyz -Qi -TX2SYS4ivlhlo4 ->/tmp/lala.txt
# x y i_1 i_2 dist_1 dist_2 head_1 head_2 vel_1 vel_2 z_X z_M
> @tut_ship 0 @tut_ship 0 NaN/NaN/1357.17 NaN/NaN/1357.17
251.004840022 20.000079064 18053.5647431 13446.6562433 333.339586673 229.636557499 269.996783034 270.023614846 NaN NaN 192.232797243 -2957.22757183
251.004840022 20.000079064 18053.5647431 71783.6562433 333.339586673 1148.20975878 269.996783034 270.023614846 NaN NaN 192.232797243 -2957.22757183
250.534946327 20.0000526811 18053.3762934 66989.0210846 332.869692978 1022.68273972 269.996783034 269.360150109 NaN NaN -57.6485957585 -2686.4268008
250.532033147 20.0000525175 18053.3751251 66988.9936489 332.866779797 1022.67977813 269.996783034 22.0133296951 NaN NaN -64.5973890802 -2682.04812157
252.068705 20.000075 13447.5 71784.5 230.700422496 1149.27362378 269.995072235 269.995072235 NaN NaN 0 -3206.5
It seems like the i_1
/i_2
values vary between rows, but I can't quite remember what they represent... maybe an index of some sort? I might need to inspect the C code to see what's going on, can you point me to where these i_1
/i_2
columns are being output?
13d36e4
to
5c7214d
Compare
5c7214d
to
db94b91
Compare
I'll give this a proper review over the weekend, a bit busy this week with some deadlines 🫠 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks also for handling the output differences between macOS and Linux (xref #3194). Pre-approving as the main logic around timedelta conversion checks out ok. Suggestions below are mostly documentation related or minor.
Co-authored-by: Wei Ji <[email protected]>
Co-authored-by: Wei Ji <[email protected]>
Description of proposed changes
This PR refactors the
pygmt.x2sys_cross
function to use virtualfiles for output. Need to note thatx2sys_cross
still uses temporary files in thetempfile_from_dftrack
function.Partially address #3160.
This PR introduces a breaking change: Previously, the dummy times in 3-4 columns (with column names
i_1
/i_2
) were innp.object
type, and now they havenp.timedelta64
type.