Spectroscopy Notebook Updates #281

afaisst · 2024-05-31T17:18:25Z

The following changes were made:

Include JWST MSA and Slit spectra (main update!)
SPARCL query (DESI search) now includes SDSS-DR16 that can be used for comparison to DR17 search from SDSS archive. Note that according to the SPARCL webpage, DR17 is not yet included.
Imported nddata package (astropy) in desi_functions.py. This was a bug.
Bug fix in HST (and accordingly JWST) search functions: the order of file in the as output in the mast download function is not the same as the order of files in the product input list. This led to a mismatch if there are multiple spectra (especially the case for JWST). This has now been fixed.
The clean_sample() function now has a precision argument where the user can set the precision (as astropy unit) at which coordinates are taken to be the same. This was implemented because the JWST MSA slits were too close together and in the current hard-coded threshold would be counted as overlapping.
The SDSS_get_spec() function now has a data_release argument where the user can choose the SDSS data release. Currently, up to DR17 is supported. DR18 is not supported, yet, due to some inconsistencies on the astroquery backend (an issue has been submitted, the likely culprit is a broken link to DR18).
General updates in the documentation of spectra_generator.md.

…or.md

jkrick

Thanks for working on the JWST spectra and fixing and cleaning up the rest of the notebook.

As a general question: do we want to delete the fits files from the various archives after we have downloaded them and read them into our df_spec? I think the answer is yes, but can also see the reasoning in keeping them so you don't have to re-download every time you make a slight change to your sample. Especially on Fornax, we are going to run into space issues if we don't delete the fits files. @troyraen @bsipocz what do you recommend?

jkrick · 2024-06-05T20:46:32Z

spectroscopy/spectra_generator.md


 ```python
 %%time
-## Get DESI and BOSS spectra with SPARCL
+## Get DESI and BOSS and SDSS spectra with SPARCL


I don't understand why we would want there to be duplicates in the final plot for DR16 and DR17? Of the example spectra in this notebook, the two spectra are mostly overlapping. Does the difference give us some sort of information? Without knowing if the difference is due to a different reduction or a second observation, it is hard to evaluate what we would learn. I vote for no DR16 and just using sparcle for DESI and BOSS.

removed SDSS-DR16 here to avoid duplication.

jkrick · 2024-06-05T21:40:41Z

spectroscopy/spectra_generator.md



-sample_table = clean_sample(coords, labels , verbose=1)
+sample_table = clean_sample(coords, labels, precision=2.0* u.arcsecond , verbose=1)

 print("Number of sources in sample table: {}".format(len(sample_table)))


This line is repetitive because the same information is printed out at the bottom of clean_sample

ok, removed that line.

jkrick · 2024-06-05T21:42:19Z

spectroscopy/spectra_generator.md

line 174 fails for me because there is no directory 'data', can you please add a line to try to make a 'data' directory and catch the error if it already exists?

jkrick · 2024-06-05T21:48:29Z

spectroscopy/spectra_generator.md

+```python
+%%time
+## Get Spectra for HST
+df_jwst = JWST_get_spec(sample_table , search_radius_arcsec = 0.5, datadir = "./data/")


Can we make this less verbose in the default? It has a lot of output that is good for debugging, but not necessary for most users, ie., listing out the fits files it is downloading, plots, etc. None of the other get_spec functions include plots.

The "Downloading URL..." output is coming from the mastquery. I can suppress that.

I added a verbose argument to the function. If set to True the use can enable extra talking. In the other case, I was able to suppress the output of the mastquery function by using:

from contextlib import redirect_stdout trap = io.StringIO() with redirect_stdout(trap): myfun()

The "Downloading URL..." output is coming from the mastquery. I can suppress that.

We should fix that in astroquery, too, it should not generate such noise without explicit opt-in.

(can you please open an issue for it?)

done: astropy/astroquery#3029

jkrick · 2024-06-05T22:02:47Z

spectroscopy/spectra_generator.md

@@ -265,12 +272,8 @@ We show flux in mJy as a function of time for all available bands for each objec
 ```python
 ### Plotting ####
 create_figures(df_spec = df_spec,
-             bin_factor=10,
+             bin_factor=5,


something is wrong with the X-axis wavelengths of the JWST spectra. I am getting the clear prism from 0.0001 to 0.0005 microns. I'm guessing there is a unit problem here.

Also, can the plotting be changed to eliminate the downward excursions on the log plots? ie, are there zeros in the spectra or are those error bars that extend to the bottoms of the log plots? or???

I changed the plotting. Opened a can of worms, the units were somehow wrong. I made it consistent now by using the astropy units throughout.

these are very small values that are making the plot ugly. I can try to sigma-clip them out...

for #2, I sigma clipped and also removed negative fluxes (just for plotting)

bsipocz · 2024-06-11T18:19:57Z

As a general question: do we want to delete the fits files from the various archives after we have downloaded them and read them into our df_spec? I think the answer is yes, but can also see the reasoning in keeping them so you don't have to re-download every time you make a slight change to your sample. Especially on Fornax, we are going to run into space issues if we don't delete the fits files. @troyraen @bsipocz what do you recommend?

I would think the normal workflow is to hoard data one is actively working on, so I would not delete (but I'm a dinosaur of an astronomer). So I would instead propagate this issue upstream to say we/the users/ will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so nobackups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case.
astropy/astroquery has this idea of a cache space, but with all honesty, it's not super reliable, and I would not count on it, especially as none of the VO backended modules use it atm.

afaisst · 2024-06-11T18:41:09Z

As a general question: do we want to delete the fits files from the various archives after we have downloaded them and read them into our df_spec? I think the answer is yes, but can also see the reasoning in keeping them so you don't have to re-download every time you make a slight change to your sample. Especially on Fornax, we are going to run into space issues if we don't delete the fits files. @troyraen @bsipocz what do you recommend?

I would think the normal workflow is to hoard data one is actively working on, so I would not delete (but I'm a dinosaur of an astronomer). So I would instead propagate this issue upstream to say we/the users/ will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so nobackups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case. astropy/astroquery has this idea of a cache space, but with all honesty, it's not super reliable, and I would not count on it, especially as none of the VO backended modules use it atm.

I agree with that.
Somehow limit the disk space and give the user a warning when space is tight? We can also write a clean-up function that clears all the temporary files at some point.

afaisst · 2024-06-11T21:14:59Z

ok, I made the following updates according to the comments above:

removed SDSS DR16 from the DESI/BOSS search
removed redundant print() statement under sample creation
if the "data" directory does not exist, the directory is created
added "verbose" argument to JWST and HST spectrum generator functions. User get more outputs if verbose is set to True. (Along these lines, I opened an astroquery issue to reduce output)
major update in the plotting backend: astropy units are not propagated correctly (avoid doing the 1/1e4 when plotting um instead of Angstroms). Also include sigma-clipping and removal of negative fluxes for nicer looking log-plots. Rewrote the spectral binning (to include astropy units and avoid "empty slices" warnings).

@jkrick : Ready to review again... then I will merge.

jkrick · 2024-06-12T01:18:55Z

As a general question: do we want to delete the fits files from the various archives after we have downloaded them and read them into our df_spec? I think the answer is yes, but can also see the reasoning in keeping them so you don't have to re-download every time you make a slight change to your sample. Especially on Fornax, we are going to run into space issues if we don't delete the fits files. @troyraen @bsipocz what do you recommend?

I would think the normal workflow is to hoard data one is actively working on, so I would not delete (but I'm a dinosaur of an astronomer). So I would instead propagate this issue upstream to say we/the users/ will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so nobackups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case. astropy/astroquery has this idea of a cache space, but with all honesty, it's not super reliable, and I would not count on it, especially as none of the VO backended modules use it atm.

I agree with that. Somehow limit the disk space and give the user a warning when space is tight? We can also write a clean-up function that clears all the temporary files at some point.

I am working on Herschel module and am at 10G and not even done downloading tar files for a single target Arp220 (herschel likes to give you lots of files....too many files.... but I can't control that. I think we should delete tar files.

troyraen · 2024-06-12T21:30:28Z

Since this is a Fornax notebook I think we have to make it usable on the Fonax Console, which means respecting the 10G user disk space.

Has anyone looked to see if there are ways to avoid actually downloading anything(s)?
We should warn the user upfront how much space will be needed.
Does anyone have a sense of how much disk space the full notebook currently requires?

We can also write a clean-up function that clears all the temporary files at some point.

If the full notebook needs less than 5G(?) disk space, then my vote is to write this function and make it an optional thing, so it's available for the to user run or not as they wish. (Choosing 5G to leave space for other things.)

I am working on Herschel module and am at 10G and not even done downloading tar files for a single target Arp220 (herschel likes to give you lots of files....too many files.... but I can't control that. I think we should delete tar files.

Do you know how much disk space would be needed for all the Herschel stuff you want to download?

So I would instead propagate this issue upstream to say we/the users/ will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so nobackups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case.

I agree that's a good thing to push for. I don't think we should count on that to come in time for this notebook to rely on it.

jkrick · 2024-06-13T17:50:26Z

opened an issue (make cleanup parameter for storing/deleting downloaded files for _get_spec functions #285) for what to do with the downloaded data to let that conversation continue but be able to close this PR when the rest is ready.
I am getting a UnitConversionError on the JWST module:

UnitConversionError                       Traceback (most recent call last)
File <timed exec>:2

File /stage/irsa-staff-jkrick/fornax-demo-notebooks/spectroscopy/code_src/mast_functions.py:55, in JWST_get_spec(sample_table, search_radius_arcsec, datadir, verbose)
     53 ## Group
     54 print("Grouping Spectra... ")
---> 55 df_jwst_group = JWST_group_spectra(df_jwst_all, verbose=verbose, quickplot=False)
     56 print("done")
     58 return(df_jwst_group)

File /stage/irsa-staff-jkrick/fornax-demo-notebooks/spectroscopy/code_src/mast_functions.py:238, in JWST_group_spectra(df, verbose, quickplot)
    235 if verbose: print("Units of fluxes for each spectrum: {}".format( ",".join([str(tt) for tt in fluxes_units]) ) )
    237 ## Unit conversion to erg/s/cm2/A (note fluxes are nominally in Jy. So have to do the step with dividing by lam^2)
--> 238 fluxes_stack_cgs = (fluxes_stack * fluxes_units[0]).to(u.erg / u.second / (u.centimeter**2) / u.hertz) * (const.c.to(u.angstrom/u.second)) / (wave_grid.to(u.angstrom)**2)
    239 fluxes_stack_cgs = fluxes_stack_cgs.to(u.erg / u.second / (u.centimeter**2) / u.angstrom)
    241 ## Add to data frame

File /opt/conda/lib/python3.11/site-packages/astropy/units/quantity.py:938, in Quantity.to(self, unit, equivalencies, copy)
    934 unit = Unit(unit)
    935 if copy:
    936     # Avoid using to_value to ensure that we make a copy. We also
    937     # don't want to slow down this method (esp. the scalar case).
--> 938     value = self._to_value(unit, equivalencies)
    939 else:
    940     # to_value only copies if necessary
    941     value = self.to_value(unit, equivalencies)

File /opt/conda/lib/python3.11/site-packages/astropy/units/quantity.py:891, in Quantity._to_value(self, unit, equivalencies)
    888     equivalencies = self._equivalencies
    889 if not self.dtype.names or isinstance(self.unit, StructuredUnit):
    890     # Standard path, let unit to do work.
--> 891     return self.unit.to(
    892         unit, self.view(np.ndarray), equivalencies=equivalencies
    893     )
    895 else:
    896     # The .to() method of a simple unit cannot convert a structured
    897     # dtype, so we work around it, by recursing.
    898     # TODO: deprecate this?
    899     # Convert simple to Structured on initialization?
    900     result = np.empty_like(self.view(np.ndarray))

File /opt/conda/lib/python3.11/site-packages/astropy/units/core.py:1195, in UnitBase.to(self, other, value, equivalencies)
   1193     return UNITY
   1194 else:
-> 1195     return self._get_converter(Unit(other), equivalencies)(value)

File /opt/conda/lib/python3.11/site-packages/astropy/units/core.py:1124, in UnitBase._get_converter(self, other, equivalencies)
   1121             else:
   1122                 return lambda v: b(converter(v))
-> 1124 raise exc

File /opt/conda/lib/python3.11/site-packages/astropy/units/core.py:1107, in UnitBase._get_converter(self, other, equivalencies)
   1105 # if that doesn't work, maybe we can do it with equivalencies?
   1106 try:
-> 1107     return self._apply_equivalencies(
   1108         self, other, self._normalize_equivalencies(equivalencies)
   1109     )
   1110 except UnitsError as exc:
   1111     # Last hope: maybe other knows how to do it?
   1112     # We assume the equivalencies have the unit itself as first item.
   1113     # TODO: maybe better for other to have a `_back_converter` method?
   1114     if hasattr(other, "equivalencies"):

File /opt/conda/lib/python3.11/site-packages/astropy/units/core.py:1085, in UnitBase._apply_equivalencies(self, unit, other, equivalencies)
   1082 unit_str = get_err_str(unit)
   1083 other_str = get_err_str(other)
-> 1085 raise UnitConversionError(f"{unit_str} and {other_str} are not convertible")

UnitConversionError: 'MJy / sr' and 'erg / (Hz s cm2)' (spectral flux density) are not convertible

bsipocz · 2024-06-13T23:53:50Z

UnitConversionError: 'MJy / sr' and 'erg / (Hz s cm2)' (spectral flux density) are not convertible

That looks to be a case for using an equivalency? https://docs.astropy.org/en/stable/units/equivalencies.html#spectral-flux-and-luminosity-density-units

…o HST

afaisst · 2024-06-14T21:15:10Z

should all be resolved now. Issued ticket to astropy/specutils to fix the bug regarding reading in wrong units: astropy/specutils#1143

afaisst · 2024-06-18T19:18:39Z

Ran through it successfully.

The following changes were made: - Include JWST MSA and Slit spectra (main update!) SPARCL query (DESI search) now includes SDSS-DR16 that can be used for comparison to DR17 search from SDSS archive. Note that according to the SPARCL webpage, DR17 is not yet included. - Imported nddata package (astropy) in desi_functions.py. This was a bug. - Bug fix in HST (and accordingly JWST) search functions: the order of file in the as output in the mast download function is not the same as the order of files in the product input list. This led to a mismatch if there are multiple spectra (especially the case for JWST). This has now been fixed. - The clean_sample() function now has a precision argument where the user can set the precision (as astropy unit) at which coordinates are taken to be the same. This was implemented because the JWST MSA slits were too close together and in the current hard-coded threshold would be counted as overlapping. - The SDSS_get_spec() function now has a data_release argument where the user can choose the SDSS data release. Currently, up to DR17 is supported. DR18 is not supported, yet, due to some inconsistencies on the astroquery backend (an issue has been submitted, the likely culprit is a broken link to DR18). - Fixed bug related to specutils, which reads in the wrong unit for the JWST spectrum - General updates in the documentation of spectra_generator.md. c07ad7b

afaisst added 3 commits May 23, 2024 01:57

bug fix in HST search; first implementation of JWST search

f99be51

implemented JWST; added precision argument for cleaning sample

475bdc9

removed print() in desi_functions.py; updated text in spectra_generat…

e25d0c5

…or.md

afaisst requested a review from jkrick May 31, 2024 17:18

afaisst self-assigned this May 31, 2024

jkrick approved these changes Jun 5, 2024

View reviewed changes

bsipocz added the use case: spectroscopy Spectroscopy use case label Jun 11, 2024

bug fixes. Plotting improvement. Astropy Units added.

957d340

afaisst requested a review from jkrick June 11, 2024 21:15

This was referenced Jun 13, 2024

Add herschel_get_spec functionality to spectroscopy notebook #284

Merged

make cleanup parameter for storing/deleting downloaded files for _get_spec functions #285

Closed

afaisst added 2 commits June 14, 2024 20:56

fixed bug with specutils reading in wrong units; added proper error t…

12f839e

…o HST

fix typo

f19a433

fixed typos

202962a

afaisst merged commit c07ad7b into main Jun 18, 2024
3 checks passed

afaisst deleted the spectroscopy-faisst-May24 branch June 18, 2024 19:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spectroscopy Notebook Updates #281

Spectroscopy Notebook Updates #281

afaisst commented May 31, 2024

jkrick left a comment

jkrick Jun 5, 2024

afaisst Jun 11, 2024

jkrick Jun 5, 2024

afaisst Jun 11, 2024

jkrick Jun 5, 2024

afaisst Jun 11, 2024

jkrick Jun 5, 2024

afaisst Jun 11, 2024

afaisst Jun 11, 2024

bsipocz Jun 11, 2024

bsipocz Jun 11, 2024

afaisst Jun 11, 2024

jkrick Jun 5, 2024

afaisst Jun 11, 2024

afaisst Jun 11, 2024

bsipocz commented Jun 11, 2024

afaisst commented Jun 11, 2024

afaisst commented Jun 11, 2024

jkrick commented Jun 12, 2024

troyraen commented Jun 12, 2024

jkrick commented Jun 13, 2024 •

edited by bsipocz

Loading

bsipocz commented Jun 13, 2024

afaisst commented Jun 14, 2024

afaisst commented Jun 18, 2024

Spectroscopy Notebook Updates #281

Spectroscopy Notebook Updates #281

Conversation

afaisst commented May 31, 2024

jkrick left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bsipocz commented Jun 11, 2024

afaisst commented Jun 11, 2024

afaisst commented Jun 11, 2024

jkrick commented Jun 12, 2024

troyraen commented Jun 12, 2024

jkrick commented Jun 13, 2024 • edited by bsipocz Loading

bsipocz commented Jun 13, 2024

afaisst commented Jun 14, 2024

afaisst commented Jun 18, 2024

jkrick commented Jun 13, 2024 •

edited by bsipocz

Loading