Create a table-based result data structure #549

jeremykubica · 2024-04-08T16:19:40Z

The goal is to address #535. The Results object provides a wrapper around an astropy table that lets the user easily build it from a list of trajectories and update multiple columns at once. It preserves the tracking of filtered results, along with the ability to revert those filters.

A user can directly access the table with:

my_table = table.results

or access the columns as:

table["x"]

The table can then be used directly (filtered, extended with new columns, etc.) without having to sync it back later.
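The filter tracking described above can be illustrated with a minimal sketch. It assumes a plain dict of equal-length NumPy arrays in place of the astropy table the PR wraps, and the class and method names (FilterTrackingTable, filter_mask, revert_filter) are hypothetical, not the Results API.

```python
import numpy as np


class FilterTrackingTable:
    """Hypothetical minimal stand-in for a Results-style wrapper.

    Columns are equal-length NumPy arrays. Rows removed by a filter are kept
    under the filter's label so the filter can be reverted later."""

    def __init__(self, columns):
        self.columns = {name: np.asarray(values) for name, values in columns.items()}
        self.filtered = {}  # label -> {column name: removed values}

    def __len__(self):
        return len(next(iter(self.columns.values()), []))

    def filter_mask(self, keep, label):
        """Keep rows where `keep` is True and stash the rest under `label`."""
        keep = np.asarray(keep, dtype=bool)
        removed = {name: col[~keep] for name, col in self.columns.items()}
        if label in self.filtered:
            # Accumulate rows removed by repeated filters with the same label.
            previous = self.filtered[label]
            removed = {name: np.concatenate([previous[name], removed[name]]) for name in removed}
        self.filtered[label] = removed
        self.columns = {name: col[keep] for name, col in self.columns.items()}

    def revert_filter(self, label):
        """Append the rows removed under `label` back onto the end of the table."""
        removed = self.filtered.pop(label)
        self.columns = {
            name: np.concatenate([col, removed[name]]) for name, col in self.columns.items()
        }
```

Reverting appends the restored rows at the end, so the original row order is not preserved; storing each row's original index would be one way to recover it.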

jeremykubica · 2024-04-09T13:43:04Z

If we don't care about tracking the filtered rows and providing the ability to revert those, we could simplify this a bunch further.

DinoBektesevic

I think this is OK. Obviously it changes one pretty big thing: jagged array support is removed because of the fundamental change to how the data is represented internally.
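One way to keep the information that jagged columns used to carry is a fixed-width boolean mask. The sketch below assumes, hypothetically, that the jagged data is a per-result list of valid time-step indices; indices_to_mask and mask_to_indices are illustrative helpers, not kbmod functions.

```python
import numpy as np


def indices_to_mask(index_lists, num_times):
    """Convert per-row lists of valid time-step indices (jagged) into a
    fixed-width boolean mask of shape (num_rows, num_times)."""
    mask = np.zeros((len(index_lists), num_times), dtype=bool)
    for row, inds in enumerate(index_lists):
        mask[row, np.asarray(inds, dtype=int)] = True
    return mask


def mask_to_indices(mask):
    """Inverse conversion: recover each row's list of valid indices."""
    return [np.flatnonzero(row).tolist() for row in mask]
```

A mask like this fits in an ordinary 2D table column and is the kind of valid_idxs array the vectorized psi/phi sums later in this review rely on.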

Most of my requests relate to __getitem__ and _repr_html_, because those are what I end up looking at most often.
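For reference, a wrapper only needs to forward these two hooks to the table it holds. This is a hypothetical sketch (TableView is not a kbmod class): __getitem__ passes the key straight through, and _repr_html_, the method Jupyter calls for rich HTML display, reuses the wrapped table's own _repr_html_ when it exists (astropy Tables provide one) and otherwise falls back to a bare HTML table for a dict of columns.

```python
import html


class TableView:
    """Hypothetical wrapper that delegates indexing and notebook display."""

    def __init__(self, table):
        self.table = table  # e.g. an astropy Table, or a dict of equal-length columns

    def __getitem__(self, key):
        # Forward column names, row indices, and masks to the wrapped object.
        return self.table[key]

    def _repr_html_(self):
        # Reuse the wrapped table's own HTML rendering when it has one.
        inner = getattr(self.table, "_repr_html_", None)
        if callable(inner):
            return inner()
        # Fallback for a dict of columns: build a minimal HTML table.
        names = list(self.table.keys())
        header = "".join(f"<th>{html.escape(str(n))}</th>" for n in names)
        rows = zip(*(self.table[n] for n in names))
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in rows
        )
        return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"
```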

This change did bring up a lot of questions for me about the utility and purpose of Trajectory in this particular context. I think there are a lot of structure-of-arrays versus array-of-structures conflicts in this proof of concept, and I'd like to see them cleared up by making a definitive decision to go one way or the other. This double dipping causes some indirection that is hard to follow.
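The two layouts can be contrasted with a small round-trip sketch. Traj is a hypothetical stand-in for kbmod's Trajectory; the point is that holding both a list of Traj objects and a table of columns means converting between them, which is the indirection described above.

```python
from dataclasses import dataclass, fields

import numpy as np


@dataclass
class Traj:
    """Hypothetical stand-in for kbmod's Trajectory: one object per result."""

    x: int
    y: int
    vx: float
    vy: float
    lh: float


FIELD_NAMES = [f.name for f in fields(Traj)]


def to_columns(trjs):
    """Array of structures -> structure of arrays (one NumPy array per field)."""
    return {name: np.array([getattr(t, name) for t in trjs]) for name in FIELD_NAMES}


def to_objects(columns):
    """Structure of arrays -> array of structures."""
    num_rows = len(columns[FIELD_NAMES[0]])
    return [
        Traj(*(columns[name][i].item() for name in FIELD_NAMES)) for i in range(num_rows)
    ]
```

Column operations such as cols["lh"] > 5.0 act on every result at once, while per-object code needs a loop; committing to one layout avoids paying for conversions in both directions.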

I left a bunch of little comments about documentation, but they're not serious; please feel free to ignore them. I understand this is a POC implementation.

src/kbmod/result_table.py

DinoBektesevic · 2024-04-12T21:38:15Z

src/kbmod/result_table.py

+        # Go through each row to update.
+        for row in self.results:
+            if use_valid_indices:
+                inds = row["index_valid"]
+            trj = update_trajectory_from_psi_phi(
+                row["trajectory"], row["psi_curve"], row["phi_curve"], index_valid=inds, in_place=True
+            )
+


It's not always clear what index_valid, inds, and valid_indices refer to, and they keep shifting names, though I have no better naming suggestions.

add_psi_phi calls update_lh, which calls update_trajectory_from_psi_phi. It's all for loops, and (if it's OK that the table has no jagged arrays, as we discussed) it can be replaced with this:

import numpy as np

# Zero out invalid time steps with the boolean mask, then sum along the time axis.
psi_sum = (test["psi"] * test["valid_idxs"]).sum(axis=1)
phi_sum = (test["phi"] * test["valid_idxs"]).sum(axis=1)
test["lh"] = psi_sum / np.sqrt(phi_sum)
test["flux"] = psi_sum / phi_sum
test["n_obs"] = test["valid_idxs"].sum(axis=1)

This is all vectorized, so it's fast, and apart from the temporary arrays created by the masked products it avoids extra memory allocations. You seemed somewhat concerned about that, judging by the comments.
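A quick way to check that the vectorized version matches the per-row loop is to compute both on random data. This sketch assumes lh = psi_sum / sqrt(phi_sum) and flux = psi_sum / phi_sum over the valid time steps; the function names are hypothetical. Neither version guards against rows where phi_sum is zero, which would produce inf or nan.

```python
import numpy as np


def lh_flux_loop(psi, phi, valid):
    """Per-row reference: sum psi/phi over valid time steps, one row at a time."""
    lh, flux, n_obs = [], [], []
    for psi_row, phi_row, valid_row in zip(psi, phi, valid):
        psi_sum = psi_row[valid_row].sum()
        phi_sum = phi_row[valid_row].sum()
        lh.append(psi_sum / np.sqrt(phi_sum))
        flux.append(psi_sum / phi_sum)
        n_obs.append(int(valid_row.sum()))
    return np.array(lh), np.array(flux), np.array(n_obs)


def lh_flux_vectorized(psi, phi, valid):
    """Same quantities for all rows at once; the mask zeroes invalid steps."""
    psi_sum = (psi * valid).sum(axis=1)
    phi_sum = (phi * valid).sum(axis=1)
    return psi_sum / np.sqrt(phi_sum), psi_sum / phi_sum, valid.sum(axis=1)
```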

src/kbmod/result_table.py

DinoBektesevic · 2024-04-12T22:02:08Z

tests/test_result_table.py

+    def test_create(self):
+        table = ResultTable(self.trj_list)
+        self.assertEqual(len(table), self.num_entries)
+        for i in range(self.num_entries):


I feel like putting something into a utils.py in the tests that just mocks out a few of these results could help remove a lot of this repeated code (including the comparison code). But I have to go teach, so I haven't paid that much attention to the tests.
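A shared test helper along those lines might look like the following. This is a hypothetical sketch: FakeTrajectory stands in for kbmod's Trajectory, and the module location (a utils.py under tests/) and helper names are assumptions.

```python
import unittest
from dataclasses import dataclass


@dataclass
class FakeTrajectory:
    """Hypothetical stand-in for kbmod's Trajectory in test fixtures."""

    x: int
    y: int
    vx: float
    vy: float
    flux: float
    lh: float
    obs_count: int


FIELDS = ("x", "y", "vx", "vy", "flux", "lh", "obs_count")


def make_fake_trajectories(num_entries):
    """Deterministic trajectories so every test builds identical inputs."""
    return [
        FakeTrajectory(i, i + 5, i - 2.0, 4.0 - i, 5.0 * i, 1000.0 * i, 2 * i)
        for i in range(num_entries)
    ]


def assert_trajectories_equal(testcase, trj1, trj2):
    """Shared comparison so each test doesn't repeat field-by-field checks."""
    for name in FIELDS:
        testcase.assertAlmostEqual(getattr(trj1, name), getattr(trj2, name), msg=name)
```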

This is mostly for backwards compatibility.

Duplicates the tests. We can remove the extra tests when we remove the ResultList object.

DinoBektesevic

I can't pinpoint anything that I think is worse than, or even just equal to, what we had before. In every aspect it looks better to me.

I would rename some things for clarity, perhaps simplify some things if possible. I am miles away from the actual problems when integrating both of the classes together like this so some of this may not be possible atm, but would definitely be something I'd put on the wall for the final move.

I'd like to look a bit more closely at some of the filtering that was changed when I get a minute, and maybe see if there's somewhere we can cut a corner, but that's a "wishlist" item for now.

src/kbmod/filters/clustering_filters.py

src/kbmod/results.py

DinoBektesevic · 2024-04-22T16:37:48Z

src/kbmod/results.py

+        if not Path(filename).is_file():
+            raise FileNotFoundError(f"File {filename} not found.")


It's fine to have this here, but you could also let the table reader raise the error itself.
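A minimal illustration of the "let the reader raise" pattern, using pathlib in place of the table reader (an assumption: the exact exception a given table reader raises for a missing file may differ, so that is worth confirming before dropping the explicit check). load_lines is a hypothetical helper.

```python
from pathlib import Path


def load_lines(filename):
    """Read a text file, letting the reader report a missing file.

    Path.read_text() raises FileNotFoundError on its own, so no separate
    Path(filename).is_file() check is needed, and there is no window in
    which the file could disappear between a check and the read."""
    return Path(filename).read_text().splitlines()
```

The trade-off is the error message: the explicit check lets you word it yourself, while the reader's own FileNotFoundError already includes the offending path.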

src/kbmod/results.py

src/kbmod/trajectory_utils.py

DinoBektesevic

Looks amazing! Thanks for all the effort.

jeremykubica added 6 commits April 5, 2024 12:56

Basic skeleton

bdc2968

Merge branch 'main' into result_table

f494eaa

Framework of the psi/phi information

032e1b9

Finish psi/phi updates

673bd7e

Fix typo

c444c5b

Add ability to save and load

0776084

jeremykubica mentioned this pull request Apr 9, 2024

Add helper functions to update trajectory statistics. #556

Merged

Merge branch 'main' into result_table

1849512

DinoBektesevic reviewed Apr 12, 2024


jeremykubica added 6 commits April 15, 2024 09:57

Address some PR comments

2a46779

Update comment

d6ec08f

Add helper to compute the likelihood curves

32659f3

Create a vectorized version of sigmaG

f8938f5

Add ability to save in a results file format

ba7dc44

Generalize clustering code so it takes Results objects

c77b95c

jeremykubica marked this pull request as ready for review April 17, 2024 14:07

Fix comment

fd3fe77

jeremykubica mentioned this pull request Apr 18, 2024

Remove the BatchFilter abstract data type #561

Merged

jeremykubica added 9 commits April 18, 2024 16:10

Merge branch 'main' into result_table

26b6755

Fix bad merge

933cf9e

Improve flow of sigmaG filtering for Results object

5fa1334

Add ability to fetch all stamps for a Results object

72ed621

Extend stamp filtering to use Results object

5fe9e62

Fix negative_clipping on sigmaG filtering with a Results object

1b964d4

Allow mismatched joins with an empty table

04e8926

Update constructor

deb3c9f

Update results.py

9e790cf

DinoBektesevic reviewed Apr 22, 2024


Address PR comments

84484cd

DinoBektesevic self-requested a review April 22, 2024 19:16

DinoBektesevic approved these changes Apr 22, 2024


jeremykubica merged commit f0f7787 into main Apr 22, 2024
2 checks passed

jeremykubica deleted the result_table branch April 22, 2024 19:20

jeremykubica mentioned this pull request Apr 23, 2024

Fix small bug #564

Merged
