Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a table-based result data structure #549

Merged
merged 24 commits into from
Apr 22, 2024
Merged

Create a table-based result data structure #549

merged 24 commits into from
Apr 22, 2024

Conversation

jeremykubica
Copy link
Contributor

@jeremykubica jeremykubica commented Apr 8, 2024

The goal is to address #535. The Results object provides a wrapper around an astropy table that lets the user easily build it from a list of trajectories and to update multiple columns at once. It preserves the tracking (and ability to revert) filtering of results.

A user can directly access the table with:

my_table = table.results

or access the columns as:
table["x"]
directly interact with it, filter it, add columns, etc. without having to later sync it back.

@jeremykubica
Copy link
Contributor Author

If we don't care about tracking the filtered rows and providing the ability to revert those, we could simplify this a bunch further.

Copy link
Member

@DinoBektesevic DinoBektesevic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok, obviously it changes one pretty big thing - jagged array support is removed because of the fundamental change to how the data is represented internally.

Mostly my wants were related to __getitem__ and __repr_html_ because that's what I end up looking at most often.

This change did bring up a lot of questions for me about the utility and purpose of Trajectory in this particular context. I think there's a lot of structure of arrays versus array of structures conflicts in this proof of concept that I'd like to see cleared out by making a definitive decision to go one or the other way - this double dipping causes some indirection that are hard to follow.

I left a bunch of little comments for documentation but it's not serious, please feel free to ignore I understand this is a POC implementation.

src/kbmod/result_table.py Outdated Show resolved Hide resolved
src/kbmod/result_table.py Outdated Show resolved Hide resolved
src/kbmod/result_table.py Outdated Show resolved Hide resolved
src/kbmod/result_table.py Outdated Show resolved Hide resolved
src/kbmod/result_table.py Outdated Show resolved Hide resolved
Comment on lines 160 to 167
# Go through each row to update.
for row in self.results:
if use_valid_indices:
inds = row["index_valid"]
trj = update_trajectory_from_psi_phi(
row["trajectory"], row["psi_curve"], row["phi_curve"], index_valid=inds, in_place=True
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not always very clear what's index_valid, inds, valid_indices and they keep shifting names but I have no better naming suggestions though.

add_psi_phi calls update_lh calls update_trj_from_psi_phi, it's all for loops and can be replaced by (if it's ok that there are no jagged arrays in the table like we talked about) with this:

phisum = (test["phi"] * test["valid_idxs"]).sum(axis=1)
psisum = (test["psi"] * test["valid_idxs"]).sum(axis=1)
test["lh"] = phisum/np.sqrt(psisum)
test["flux"] = psi_sum / phi_sum
test["n_obs"] = test["valid_idxs"].sum(axis=1)

and this is all vectorized so pretty fast and most of the time in-place so no extra memory allocations. You seemed to be somewhat concerned about that, judging by the comments.

src/kbmod/result_table.py Outdated Show resolved Hide resolved
src/kbmod/result_table.py Outdated Show resolved Hide resolved
src/kbmod/result_table.py Outdated Show resolved Hide resolved
def test_create(self):
table = ResultTable(self.trj_list)
self.assertEqual(len(table), self.num_entries)
for i in range(self.num_entries):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like maybe dumping something into utils.py in the tests that just mocks out a few of these results could be helpful to remove a lot of this repeated code (including comparison code). But I have to go teach so I hadnt paid that much attention to tests.

@jeremykubica jeremykubica marked this pull request as ready for review April 17, 2024 14:07
Copy link
Member

@DinoBektesevic DinoBektesevic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't pinpoint something that I think is bad or equal to what we had before. In all aspects it looks better to me.

I would rename some things for clarity, perhaps simplify some things if possible. I am miles away from the actual problems when integrating both of the classes together like this so some of this may not be possible atm, but would definitely be something I'd put on the wall for the final move.

I would want to go over some of the filtering that was changed a bit closer when I get a minute, maybe see if there's something where we can cut a corner, but it's "wishlist" atm.

src/kbmod/filters/clustering_filters.py Show resolved Hide resolved
src/kbmod/results.py Outdated Show resolved Hide resolved
src/kbmod/results.py Outdated Show resolved Hide resolved
src/kbmod/results.py Show resolved Hide resolved
Comment on lines +168 to +169
if not Path(filename).is_file():
raise FileNotFoundError(f"File {filename} not found.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to have this here, but you can also let table raise itself.

src/kbmod/results.py Outdated Show resolved Hide resolved
src/kbmod/results.py Outdated Show resolved Hide resolved
src/kbmod/results.py Outdated Show resolved Hide resolved
src/kbmod/results.py Outdated Show resolved Hide resolved
src/kbmod/trajectory_utils.py Outdated Show resolved Hide resolved
@DinoBektesevic DinoBektesevic self-requested a review April 22, 2024 19:16
Copy link
Member

@DinoBektesevic DinoBektesevic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks amazing! Thanks for all the effort.

@jeremykubica jeremykubica merged commit f0f7787 into main Apr 22, 2024
2 checks passed
@jeremykubica jeremykubica deleted the result_table branch April 22, 2024 19:20
@jeremykubica jeremykubica mentioned this pull request Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants