
Allow passing in pandas dataframes to x2sys_cross #591

Merged
merged 21 commits into master from x2sys_cross_dataframes
Sep 10, 2020

Conversation

weiji14
Member

@weiji14 weiji14 commented Sep 8, 2020

Description of proposed changes

Run crossover analysis directly on pandas.DataFrame inputs instead of having to write to tab-separated value (TSV) files first!

Example code:

import os
from tempfile import TemporaryDirectory

import pandas as pd
import pygmt

dataframe: pd.DataFrame = pygmt.datasets.load_sample_bathymetry()
dataframe.columns = ["x", "y", "z"]  # longitude, latitude, bathymetry

os.environ["X2SYS_HOME"] = os.getcwd()

with TemporaryDirectory(prefix="X2SYS", dir=os.environ["X2SYS_HOME"]) as tmpdir:
    tag = os.path.basename(tmpdir)
    pygmt.x2sys_init(tag=tag, fmtfile="xyz", suffix="xyz", force=True)
    output: pd.DataFrame = pygmt.x2sys_cross(tracks=[dataframe], tag=tag, coe="i", verbose="i")

This isn't a trivial thing to implement, because:

  • x2sys requires those TSV files to be in quite a specific format (especially for the datetime columns)
  • The TSV files cannot live in an arbitrary directory like /tmp; they must be stored either in the current working directory or in one of the locations listed in the TAG_paths.txt file.
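The datetime requirement in particular can be sketched like this: a track DataFrame has to be serialized with tab separators and an ISO 8601 date format before x2sys can read it. A minimal sketch (the column names, values, and output filename here are made up for illustration):

```python
import pandas as pd

# A made-up two-point track with a datetime column
track = pd.DataFrame(
    {
        "x": [170.1, 170.2],  # longitude
        "y": [-45.1, -45.2],  # latitude
        "t": pd.to_datetime(["2020-09-08T00:00:00", "2020-09-08T00:00:01"]),
    }
)
# x2sys expects tab-separated columns, with datetimes written out in an
# ISO 8601 form it can parse back (same format string this PR uses)
track.to_csv(
    "track.xyz", sep="\t", index=False, date_format="%Y-%m-%dT%H:%M:%S.%fZ"
)
```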

Support for pandas DataFrame inputs to x2sys_cross was left out of the original implementation in #546 because we wanted to wait for GenericMappingTools/gmt#3717. But since that is not a trivial matter, this PR provides an interim solution.

Fixes #

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Implemented by storing the pandas.DataFrame data in a temporary file and passing this intermediate file to x2sys_cross. Need to do some regex parsing to get the right file extension (suffix) for this to work.
@weiji14 weiji14 added the enhancement Improving an existing feature label Sep 8, 2020
So that the tests will pass on macOS and Windows too.
Because Windows (and macOS?) might not support opening the same temporary file twice.
@weiji14 weiji14 marked this pull request as draft September 9, 2020 23:53
@weiji14 weiji14 marked this pull request as ready for review September 9, 2020 23:53
pygmt/x2sys.py Outdated
Comment on lines 296 to 302
)  # e.g. "-Dxyz -Etsv -I1/1"
try:
    # 1st try to match file extension after -E
    suffix = re.search(pattern=r"-E(\S*)", string=lastline).group(1)
except AttributeError:  # 'NoneType' object has no attribute 'group'
    # 2nd try to match file extension after -D
    suffix = re.search(pattern=r"-D(\S*)", string=lastline).group(1)
Member Author
Is there a better way to check if -Exyz is in the string, and if not, fall back to parsing the suffix from -Dxyz?
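One alternative (a sketch only, not necessarily what the PR settled on) is to lean on `or` short-circuiting between the two searches instead of try/except, since `re.search` returns `None` on no match:

```python
import re

lastline = "-Dxyz -Etsv -I1/1"  # assumed shape of the parsed line

# Prefer the extension after -E; fall back to -D. `or` picks the
# first search that actually matched.
match = re.search(r"-E(\S+)", lastline) or re.search(r"-D(\S+)", lastline)
suffix = match.group(1) if match else None
print(suffix)  # tsv
```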

Member

@seisman seisman Sep 10, 2020
What about this one?

lastline = "-Dxyz -Etsv -I1/1"
#lastline = "-Dxyz -I1/1"

for item in lastline.split():
    for key in ['-E', '-D']:
        if item.startswith(key):
            suffix = item[2:]
            break
print(suffix)

Note: the code may not be exactly right, but something along these lines should work.

Member Author
Hmm, this gives me some ideas. I'll play around with it, thanks!

Also rename 'result' to 'table' to prevent pylint complaining about R0914: Too many local variables (16/15) (too-many-locals)
Member Author

@weiji14 weiji14 left a comment
OK, ready for review! I understand that this isn't easy to review properly, but I would really appreciate getting this in for v0.2.0 tomorrow as I'll be using it for my PhD research. There are two unit tests added: one for internal crossovers (i.e. on 1 track) and one for external crossovers (2 tracks). I'll add a tutorial example for this over the weekend to explain things better.

Comment on lines +30 to +40
try:
    tmpfilename = f"track-{unique_name()[:7]}.{suffix}"
    track.to_csv(
        path_or_buf=tmpfilename,
        sep="\t",
        index=False,
        date_format="%Y-%m-%dT%H:%M:%S.%fZ",
    )
    yield tmpfilename
finally:
    os.remove(tmpfilename)
Member Author
The original implementation using GMTTempFile/NamedTemporaryFile didn't work because of some permissions issues (on macOS/Windows), which is why this try-finally block is used.
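Put together, the snippet above reads like a generator-based context manager. A self-contained sketch of what such a helper could look like, using `uuid4` in place of PyGMT's `unique_name` (the helper name and signature here are assumptions for illustration):

```python
import os
from contextlib import contextmanager
from uuid import uuid4

import pandas as pd


@contextmanager
def tempfile_from_dftrack(track: pd.DataFrame, suffix: str):
    """Write a track DataFrame to a uniquely named TSV file, yield the
    filename, and always delete the file afterwards."""
    # A short unique token keeps concurrent runs from clobbering each other
    tmpfilename = f"track-{uuid4().hex[:7]}.{suffix}"
    try:
        track.to_csv(
            path_or_buf=tmpfilename,
            sep="\t",
            index=False,
            date_format="%Y-%m-%dT%H:%M:%S.%fZ",
        )
        yield tmpfilename
    finally:
        os.remove(tmpfilename)
```

The try-finally guarantees cleanup even if the code using the yielded filename raises, which is the same property the PR needs without relying on NamedTemporaryFile's platform-dependent reopen semantics.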

Member
The code quality looks good. As you're the one who develops and uses these functions, we have to trust you. 😄

Just one suggestion: add a comment to the code explaining why you use unique_name here.

That was the first question I had when reading your code, before I saw your comment here.

Member Author
> The code quality looks good. As you're the one who develops and uses these functions, we have to trust you. 😄

It's all Paul's work done a decade ago; I'm just wrapping it in Python so more people can use it easily 😃 You won't believe how many 'crossover analysis' tools have been written again and again, but that's another story.

> Just one suggestion: add a comment to the code explaining why you use unique_name here.

Ok, will do.

@weiji14 weiji14 merged commit 6deb388 into master Sep 10, 2020
@weiji14 weiji14 deleted the x2sys_cross_dataframes branch September 10, 2020 05:08
weiji14 added a commit to weiji14/deepicedrain that referenced this pull request Sep 13, 2020
Bumps [pygmt](https://github.com/GenericMappingTools/pygmt) from 0.1.2-36-g4939ee2a to 0.2.0.
  - [Release notes](https://github.com/GenericMappingTools/pygmt/releases)
  - [Changelog](https://github.com/GenericMappingTools/pygmt/blob/master/doc/changes.rst)
  - [Commits](GenericMappingTools/pygmt@v0.1.2-36-g4939ee2a...v0.2.0)

This includes several enhancements such as 'Sensible array outputs for pygmt info' (GenericMappingTools/pygmt#575) and 'Allow passing in pandas dataframes to x2sys_cross' (GenericMappingTools/pygmt#591) that will make our crossover analysis work and figure generation easier! Also edited the GitHub Actions workflow to only run the Docker build on Pull Requests when ready for review or when a review is requested (i.e. not when the PR is in draft mode).
weiji14 added a commit to weiji14/deepicedrain that referenced this pull request Sep 15, 2020