-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a table-based result data structure #549
Conversation
If we don't care about tracking the filtered rows and providing the ability to revert those, we could simplify this a bunch further. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ok, obviously it changes one pretty big thing - jagged array support is removed because of the fundamental change to how the data is represented internally.
Mostly my wants were related to __getitem__
and __repr_html_
because that's what I end up looking at most often.
This change did bring up a lot of questions for me about the utility and purpose of Trajectory
in this particular context. I think there's a lot of structure of arrays versus array of structures conflicts in this proof of concept that I'd like to see cleared out by making a definitive decision to go one or the other way - this double dipping causes some indirection that are hard to follow.
I left a bunch of little comments for documentation but it's not serious, please feel free to ignore I understand this is a POC implementation.
src/kbmod/result_table.py
Outdated
# Go through each row to update. | ||
for row in self.results: | ||
if use_valid_indices: | ||
inds = row["index_valid"] | ||
trj = update_trajectory_from_psi_phi( | ||
row["trajectory"], row["psi_curve"], row["phi_curve"], index_valid=inds, in_place=True | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not always very clear what's index_valid
, inds
, valid_indices
and they keep shifting names but I have no better naming suggestions though.
add_psi_phi
calls update_lh
calls update_trj_from_psi_phi
, it's all for loops and can be replaced by (if it's ok that there are no jagged arrays in the table like we talked about) with this:
phisum = (test["phi"] * test["valid_idxs"]).sum(axis=1)
psisum = (test["psi"] * test["valid_idxs"]).sum(axis=1)
test["lh"] = phisum/np.sqrt(psisum)
test["flux"] = psi_sum / phi_sum
test["n_obs"] = test["valid_idxs"].sum(axis=1)
and this is all vectorized so pretty fast and most of the time in-place so no extra memory allocations. You seemed to be somewhat concerned about that, judging by the comments.
tests/test_result_table.py
Outdated
def test_create(self): | ||
table = ResultTable(self.trj_list) | ||
self.assertEqual(len(table), self.num_entries) | ||
for i in range(self.num_entries): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like maybe dumping something into utils.py
in the tests that just mocks out a few of these results could be helpful to remove a lot of this repeated code (including comparison code). But I have to go teach so I hadnt paid that much attention to tests.
This is mostly for backwards compatibility.
Duplicates the tests. We can remove the extra tests when re remove the ResultList object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't pinpoint something that I think is bad or equal to what we had before. In all aspects it looks better to me.
I would rename some things for clarity, perhaps simplify some things if possible. I am miles away from the actual problems when integrating both of the classes together like this so some of this may not be possible atm, but would definitely be something I'd put on the wall for the final move.
I would want to go over some of the filtering that was changed a bit closer when I get a minute, maybe see if there's something where we can cut a corner, but it's "wishlist" atm.
if not Path(filename).is_file(): | ||
raise FileNotFoundError(f"File {filename} not found.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to have this here, but you can also let table raise itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks amazing! Thanks for all the effort.
The goal is to address #535. The
Results
object provides a wrapper around an astropy table that lets the user easily build it from a list of trajectories and to update multiple columns at once. It preserves the tracking (and ability to revert) filtering of results.A user can directly access the table with:
or access the columns as:
table["x"]
directly interact with it, filter it, add columns, etc. without having to later sync it back.