add: new SDSS V datatype loaders #1107
Conversation
Force-pushed 9966c92 to 0846cc6
This is a great start. One thing we might want to have a think on, or get some feedback from the specutils folks, is whether the multi-loaders should use `SpectrumList` or `Spectrum1D`. If all the spectra in that file are on the same wavelength solution (and shape), we may want to use `Spectrum1D` as the container to take advantage of its numpy array operations. If they really are different, then we can stick with `SpectrumList`, which is just a regular python list.
```python
    """
    # Orderly with an OrderedDict
    common_meta = OrderedDict([])
```
I think dicts are ordered by default now, as of Python 3.7. So this could be `common_meta = {}`.
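As a quick illustration of that claim (insertion order is guaranteed for plain dicts since Python 3.7), a minimal sketch with made-up metadata keys:

```python
# Since Python 3.7, dict preserves insertion order as a language guarantee,
# so OrderedDict is no longer needed just for ordering.
from collections import OrderedDict

common_meta = {}              # plain dict
ordered_meta = OrderedDict()  # the old approach

# Illustrative keys only, not the actual loader metadata.
for key, value in [("telescope", "apo25m"), ("mjd", 59300), ("snr", 42.0)]:
    common_meta[key] = value
    ordered_meta[key] = value

# Both preserve the order in which keys were added.
assert list(common_meta) == list(ordered_meta) == ["telescope", "mjd", "snr"]
```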
Should we keep it for backward compatibility? As per the contributing guidelines, it needs to work on Python 3.4 to 3.6.
We might need to keep it for backward compatibility with older Python versions (3.4 to 3.6) as per the guidelines, but I've changed all `OrderedDict()` metadata calls to `dict()` in b61a668.
I think those docs are out of date. The minimum Python for specutils is 3.8, and Python 3.4 through 3.7 is already end-of-life.
Wow, I didn't realize our contributor docs were so out of date. I'll open a PR to fix that soon.
```python
for key in hdulist[0].header.keys():
    # exclude these keys
    if key.startswith(("TTYPE", "TFORM", "TDIM")) or key in (
        "",
        "COMMENT",
        "CHECKSUM",
        "DATASUM",
        "NAXIS",
        "NAXIS1",
        "NAXIS2",
        "XTENSION",
        "BITPIX",
        "PCOUNT",
        "GCOUNT",
        "TFIELDS",
    ):
        continue

    common_meta[key.lower()] = hdulist[0].header.get(key)  # add key to dict
```
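Run in isolation, the same filtering can be sketched with a plain dict standing in for `hdulist[0].header` (the header values here are made up):

```python
# Keys that carry FITS structural bookkeeping rather than science metadata.
EXCLUDE = ("", "COMMENT", "CHECKSUM", "DATASUM", "NAXIS", "NAXIS1", "NAXIS2",
           "XTENSION", "BITPIX", "PCOUNT", "GCOUNT", "TFIELDS")

# Stand-in for an astropy Header; illustrative values only.
header = {"TTYPE1": "flux", "NAXIS": 2, "TELESCOP": "apo25m", "MJD": 59300}

common_meta = {}
for key, value in header.items():
    # Skip table-column descriptors and structural keywords.
    if key.startswith(("TTYPE", "TFORM", "TDIM")) or key in EXCLUDE:
        continue
    common_meta[key.lower()] = value

assert common_meta == {"telescop": "apo25m", "mjd": 59300}
```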
This isn't documented very well, but the header is expected to be stored within `meta` in a key called `header`. I think we can also just dump the complete primary header here, something like `meta['header'] = hdulist[0].header`. All the other keys you've added can either be added at the top level of `meta`, or if you'd rather they live in the header, you can add them there. I think where they live only matters when writing the `Spectrum1D` object out to a file and reading it back in again.
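A sketch of that layout, using a plain dict to stand in for the astropy `Header` (the specific keys are illustrative, not the actual loader's):

```python
# Hypothetical sketch: store the full primary header under meta['header'],
# and place any extra convenience keys at the top level of meta.
primary_header = {"TELESCOP": "apo25m", "MJD": 59300, "NAXIS": 2}  # stand-in

meta = {"header": primary_header}  # the complete header, untouched
meta["telescope"] = primary_header["TELESCOP"]  # optional top-level copy

assert meta["header"]["NAXIS"] == 2
assert meta["telescope"] == "apo25m"
```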
This should be fine then. I haven't tested the tabular writer `wcs1d_fits_writer`, but I'd assume leaving this header as-is in the metadata would be fine. `mwmVisit/Star` might fail on writing like that though, since it partially stores some metadata within the `data` component of the `BinTableHDU` (SNR, telescope, observation date, MJD), so I've left that in the `meta` dictionary for now.
I'm not too worried about people writing the `Spectrum1D` objects back to files. Currently, both writers would write out files that are not in the original format, and not that useful; many of the SDSS products in here may not write correctly at the moment.

There is a longer ongoing discussion about improving the default writers to produce more FITS-like output. We could write custom writers for our loaders, but I think that's out of scope for this PR.
```python
    spectral_axis=spectral_axis,
    flux=flux,
    uncertainty=e_flux,
    # mask=mask,
```
I would advocate for including the `mask` where possible for all the loaders. The SDSS mask arrays are a bit different from the astropy masked arrays used in specutils, which are basic boolean True/False. See the SDSS `manga` loaders for examples of converting from an SDSS maskbit array to the boolean one that specutils expects. It's not ideal, but it's something.
fd999ce -- I'm not sure if this is how the bitmask works (do we consider 0 valid or invalid?), and also the coadd mask is done with two methods (AND and OR) for BOSS spectra.
For SDSS, a value of 0 is a good pixel (valid) and >0 is a bad pixel (invalid). The Spectrum1D `mask` attribute is an "array where values in the flux to be masked are those that astype(bool) converts to True. (For example, integer arrays are not masked where they are 0, and masked for any other value.)". For SDSS products that have a single mask array, flipping the condition like in the `manga` loader should work.

For the BOSS AND/OR masks, I think we can ask @Sean-Morrison whether we should use one or the other as the default input for the `mask` attribute, or if they should be combined, and if so, how?
I would think for BOSS we want to use OR masks, as that would result in a cleaner plot, but we might want to check with @joelbrownstein for what we did in the old SDSS webapp
I think once we fix the mask arrays, this PR is ready to be marked as ready and reviewed by the larger `specutils` group. They should be able to give us more insight into our other questions!
@rileythai Others should perhaps comment, but I think we need to convert all the mask arrays explicitly to boolean True/False arrays, similar to what is done in `manga`. I believe a fair amount of other specutils methods/functions assume this and do things like `s.flux[~s.mask]` to select good values. If we retain the integer values, this selection won't work, and people would need to explicitly do `s.mask == False`.
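To illustrate the conversion (the flux and maskbit values below are made up), casting the integer maskbits to booleans up front is what makes the `~mask` idiom work:

```python
import numpy as np

flux = np.array([1.0, 2.0, 3.0, 4.0])
sdss_maskbits = np.array([0, 5, 0, 1024])  # SDSS convention: 0 = good, >0 = bad

# Convert to the boolean convention specutils expects (True = masked/bad).
mask = sdss_maskbits > 0

good_flux = flux[~mask]  # selects only the unmasked pixels
assert good_flux.tolist() == [1.0, 3.0]
```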
The main issue is jdaviz. It won't load […]
If it's primarily a jdaviz issue, then it should be fixed there. We can open a new issue in […]
```python
# reduce flux array if 1D in 2D np array
# NOTE: bypasses jdaviz specviz NotImplementedError, but could be the
# expected output for specutils
if flux.shape[0] == 1:
    flux = flux[0]
    e_flux = e_flux[0]
```
Not sure if we want any jdaviz-specific logic in here, since it's a downstream package and this could change the use of this apStar Spectrum1D outside of that context.
Would this be its expected behavior anyway? If there's only a single spectrum within the data file, is the `flux` expected to still be loaded as an array within an array (2D, without this block), or as a 1D array (with this block)? Just wondering what the convention for this is.
I think `Spectrum1D` just takes what you pass in as `flux`, so `np.array([1,2,3,4])` and `[np.array([1,2,3,4])]` retain their shapes of `(4,)` and `(1,4)` respectively. My point is only that we shouldn't put logic in here that is specifically addressing something in jdaviz, but if this is the behaviour we want for the `Spectrum1D` object, then that's certainly ok.
I think reducing a 2D flux array where one of the axes is degenerate to 1D is reasonable; I would expect a 1D flux to come out of the reader for a 1D spectrum.
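That reduction can be sketched as a small helper (the function name is ours, for illustration): squeeze the leading axis only when it is actually of length 1.

```python
import numpy as np

def reduce_degenerate(flux):
    """Drop a leading length-1 axis so a single spectrum comes out 1D."""
    if flux.ndim == 2 and flux.shape[0] == 1:
        return flux[0]
    return flux

assert reduce_degenerate(np.ones((1, 5))).shape == (5,)    # single spectrum
assert reduce_degenerate(np.ones((3, 5))).shape == (3, 5)  # multiple: unchanged
```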
@rileythai I think there's something wrong with the SDSS-V identify functions on the loaders. The following code should correctly identify the format of this file either as […]

This function basically loops over all the identifiers in the formats table […] Examples for MaNGA, and spec-lite for SDSS-IV.

I had it check for an […]
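For reference, an identifier in this style is just a predicate over the file's contents. A hypothetical sketch, with a plain dict standing in for the primary header and an invented `SDSSV_FMT` keyword (not the actual implementation; real specutils identifiers receive `(origin, *args, **kwargs)` and inspect the opened file):

```python
# Hypothetical identifier sketch. The SDSSV_FMT keyword and function name
# are made up for illustration; a real identifier would open the FITS file
# and check actual header cards.
def identify_sdss_v(header):
    return header.get("SDSSV_FMT", "").startswith("mwm")

assert identify_sdss_v({"SDSSV_FMT": "mwmStar"}) is True
assert identify_sdss_v({}) is False
```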
Thanks for this contribution, I'm trying to make some time to review in the next week. In the meantime, would you resolve the conflict with main? Cheers.
Codecov Report

Attention: […]

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1107      +/-   ##
==========================================
+ Coverage   71.11%   72.21%   +1.10%
==========================================
  Files          61       62       +1
  Lines        4248     4427     +179
==========================================
+ Hits         3021     3197     +176
- Misses       1227     1230       +3
```

☔ View full report in Codecov by Sentry.
working on loaders
new helper funcs:
- `_fetch_metadata` to perform grab of common metadata
- `_fetch_flux_unit` to get flux unit from the given HDU and convert it to an astropy unit object
can fully load boss files now. other changes:
- commented out abstract `@` things to make it so ipython autoreloads

notes:
- flux unit where for BOSS files?
- what the heck are spAll, zAll, and zList HDUs?
- does the InverseVariance need a unit?
added mwmVisit and mwmStar loaders. updated demonstration notebook accordingly and output to PDF
del: test.pdf and test.ipynb
del: secret SDSS-V data
able to now load all BOSS spec directly with the same underlying code. required refactoring methods into BOSS_spec loaders
going to now write implementation test add
mwm confirmed working. still todo:
- add HDU not-specified message
- merge the mwm types into 2 loaders
- confirm all other loaders work and add to `__all__`
- refactored BOSS spec methods and mwm spec methods into single functions for simplicity
- all loaders WORKING!! (except apStar multi)
- all the documentation + type hinting (excluding outputs)
- changed variable names to standard types used in specutils
- TODO: the apStar multi-loader is confusing, so it remains unimplemented for now
- CHECK: do I need to clean the files of zero vals?
- TODO: BUNIT pulls for spec and mwm files
- TODO: check with data team what mwm files are needed
- currently non-functional because of zero values in x-axis
- deleted test_implementation.ipynb for policy reasons
- jdaviz hates nan and zero flux, so they have to be removed
- TODO: open issue on jdaviz repo about the nan and zero flux bug; it originates in the x_min/x_max values used for the redshift slider for line_lists (somehow) on nan and zero flux values in the Spectrum object
apStar loader not yet tested because file is of length 1 (no visits). mwm loaders will SKIP any DATASUM=0 because Spectrum1D cannot be instantiated with zero data
fixes a jdaviz issue regarding a 1D flux in a 2D object, where it gets confused and explodes; I will put in an issue for it. this fix is different from the previous one as it keeps all zero and NaN flux points
need someone to help me write a BinTableHDU for mwm files...
still need to write mwm dummy file for the tests there's also a foobar variable check for the metadata
now obtains the header from the PrimaryHDU in the HDUList; any data that was previously accessed through it has been removed too
keeping .jukit in case anyone else uses vim-jukit during dev
Spectrum1D initializer converts any 0 to valid values. I'm assuming that zeroes in the bitmask mean that it's valid, as per manga.py
fix as per @Sean-Morrison 's suggestion in astropy pull req [astropy#1107](astropy#1107) could be reverted in future, in which case this commit can just be deleted
OBSERVAT column not in everything so I changed it; also added another LOGLAM check to the coadd HDU check.
instead of specifying a hdu on Spectrum1D loaders for spec and mwm types, it will now find the first HDU with data, or in the case of spec, just use the coadd. this means that it works directly with jdaviz for those two datatypes correctly now. there are no user-facing methods, and I don't want to break anything, but it should be noted that these datafiles can contain several spectra, which inherently limits this. in theory, I could put everything as a Spectrum1D nD flux object, but I'm pretty sure that sometimes breaks for jdaviz.
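The "first HDU with data" selection described in that commit can be sketched like this (using simple stand-in objects for HDUs; the real loader operates on an astropy `HDUList`, and the helper name is ours):

```python
from types import SimpleNamespace

def first_hdu_with_data(hdulist):
    """Return the index of the first HDU whose data is non-empty, else None."""
    for i, hdu in enumerate(hdulist):
        if hdu.data is not None:
            return i
    return None

# Stand-ins for HDUs: the primary HDU is typically header-only.
hdus = [SimpleNamespace(data=None),       # primary HDU, no data
        SimpleNamespace(data=[1, 2, 3]),  # first extension with data
        SimpleNamespace(data=[4, 5, 6])]

assert first_hdu_with_data(hdus) == 1
```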
- force masks to be boolean prior to entering initializer
- add mwm file tests based on dummy file (credit to @andycasey for the dummy file generators)
- add more mwm file tests for failures
- added checks to see if file is empty for mwm files based on datasum (failsafe)
Force-pushed e548a16 to 1fdb4a2
Looks good to me, thanks (and double thanks for adding tests)!
Within the 5th generation of the SDSS, there are several new data products, and many of the access methods for existing datatypes have been changed.

This pull request adds new default loaders for the new/updated SDSS-V spectra data products in a new file called `sdss_v.py`, along with unit tests in `test_sdss_v.py`, which allow for automatic loading of SDSS-V data into Spectrum1D/List objects, and directly into jdaviz. For `mwm` and `spec` files, the first HDU with data is loaded with the `Spectrum1D` loader, whereas all spectra within a file are loaded with the `SpectrumList` loader.

Remaining questions:

- `mwm` files can contain multiple stacked spectra from APOGEE and BOSS spectrographs. `spec-full` files can contain both the coadd and individual visits. […] `spec` files, and the first HDU with data for `mwm` files.
- `SpectrumList` inherits a basic reading method from the `Spectrum1D` methods (I don't want it to do this because of how some access methods will encounter non-spectra when we want to import everything)
- `jdaviz` does not yet have an implementation for `Spectrum1D` objects of nD flux. Do we want to accommodate that here? Would it be its expected behaviour anyway? (i.e. 1 spectrum in a Spectrum1D object only, not several flux arrays)
- The data in `mwmVisit` files is stacked, similarly to how it is in `apStar` files. If we keep current access methods, this means it would need a double specification for accessing a single spectrum (the observatory/instrument + the specific spectrum from that obs/instrument)