-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add method to transform catalog to band-named columns #442
Add method to transform catalog to band-named columns #442
Conversation
This adds a method for "un-tidying" the data in a catalog to get an arrangement in which separate passbands are columns.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #442 +/- ##
==========================================
+ Coverage 80.18% 80.26% +0.08%
==========================================
Files 30 30
Lines 3921 3938 +17
==========================================
+ Hits 3144 3161 +17
Misses 777 777 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me as long as it passes the tests.
``mag_error_``. If the catalog already has columns with these names, they will | ||
be overwritten. The input catalog will not be changed. | ||
""" | ||
catalog_passbands = set(self["passband"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, a use for set
that isn't students trying to de-duplicate items in a list by looking up answer in Stack Overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤣
# This makes a MultiIndex for the columns -- "mag" and "mag_error" are the | ||
# top level, and the passbands are the second level. | ||
df = df.pivot( | ||
columns="passband", index=["id", "ra", "dec"], values=["mag", "mag_error"] | ||
) | ||
|
||
# The column names are a MultiIndex, so we flatten them to either "mag_band" | ||
# or "mag_error_band", where "band" is the passband name. | ||
df.columns = df.columns.to_series().str.join("_") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to stare at this a bit to see what it is doing. I think I understand it now. Your creating a pivoted table that lists magnitude and magnitude error with passband a a column, then using that to build mag_band and magerr_band columns... Still a bit of head scratcher for me, but I understand it has to be done this way.
assert set(new_cat["mag_V"]) == set(apass_input["Vmag"]) | ||
assert set(new_cat["mag_error_V"]) == set(apass_input["e_Vmag"]) | ||
assert set(new_cat["mag_SI"]) == set(apass_input["i_mag"]) | ||
assert set(new_cat["mag_error_SI"]) == set(apass_input["e_i_mag"]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a reasonable way to check that the new catalog format holds the same data as the original.
This adds a method for "un-tidying" the data in a catalog to get table in which separate passbands are columns. This briefly made me wonder if going with a tidy format for the catalog was a good decision, but I think it was. This ensures some uniformity across catalogs, hopefully.
This PR is part of the prep for bringing back magnitude transforms.