Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add parameters to delete downloaded data from spectroscopy queries #347

Merged
merged 8 commits into from
Sep 30, 2024

Conversation

jkrick
Copy link
Contributor

@jkrick jkrick commented Sep 25, 2024

  • added delete downloaded data to JWST, HST and made uniform variable names with existing Herschel function. The other archives don't download any data.
  • cleaned up the order of the cells (JWST had gotten separated from the MAST section)
  • removed intro section on 'open issues' and moved these to actual issues. Its easier not to remember we need to edit this section every time we make changes.
  • updated runtime info for the notebook. It now takes 30s longer than before, which is ~10%. Not sure why, and I don't have the statistics to know if this is only this notebook (which hits many archives some of which could be having a slow day), or if it is the newer system.

This will close #285

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the default for the delete_downloaded_data should be False, but we can iterate on that.

Some minor code cleanup suggestions, otherwise the things I notices should come up in #336, e.g. making more keywords optional, pep8 cleanup.

@@ -39,7 +39,7 @@ def find_max_flux_column(df):
return max_flux_col


def Herschel_get_spec(sample_table, search_radius_arcsec, datadir, delete_tarfiles = False):
def Herschel_get_spec(sample_table, search_radius_arcsec, datadir, delete_downloaded_data = True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def Herschel_get_spec(sample_table, search_radius_arcsec, datadir, delete_downloaded_data = True):
def Herschel_get_spec(sample_table, search_radius_arcsec, datadir, delete_downloaded_data=True):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

spectroscopy/code_src/herschel_functions.py Outdated Show resolved Hide resolved
spectroscopy/code_src/mast_functions.py Outdated Show resolved Hide resolved
spectroscopy/code_src/mast_functions.py Outdated Show resolved Hide resolved
spectroscopy/code_src/mast_functions.py Outdated Show resolved Hide resolved
Comment on lines 102 to 98
```{code-cell}
```{code-cell} ipython3
# Uncomment the next line to install dependencies if needed.
# !pip install -r requirements_spectra_generator.txt
!pip install -r requirements_spectra_generator.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't commit changes to this cell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its hard for me to remember this because I need it uncommented to run, and then forget to go back and comment it out. Is there some easy way to have both work, ie some if statement around it, or???

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I don't have a good solution for how to make it more convenient to commit patches rather than full files when working directly on fornax.

However, I'm surprised that you need to rerun the installs each time, my understanding was that you need to install the extra ones only once and they will be in your persistent storage space.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time I login to Fornax, I need to re-run the pip install cell. Installs are persistent across kernel restarts but not daskhub logouts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, interesting. I'll open a follow-up issue for this over at the image repo as I think it would be a nicer user experience to have persistency. (But also, installing all these requirements file is on the request list)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ie some if statement around it, or

While we did some conditionals and then install steps in the early version, I don't think it's particularly good to ship demo notebooks to end users where installations are hidden away that way. cc @troyraen

As a workflow workaround for you as a notebook developer rather than end user could be to pip install everything from a terminal session freshly after a new login and then propagate the list of missing dependencies as an issue in the fornax-image repo, so we can include it in the kernel.

spectroscopy/spectra_generator.md Outdated Show resolved Hide resolved
@jkrick
Copy link
Contributor Author

jkrick commented Sep 30, 2024

I was thinking the default needs to be True to delete downloaded data because of the initial space limit on Fornax.

Copy link
Contributor Author

@jkrick jkrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at it, all changes made.

@@ -39,7 +39,7 @@ def find_max_flux_column(df):
return max_flux_col


def Herschel_get_spec(sample_table, search_radius_arcsec, datadir, delete_tarfiles = False):
def Herschel_get_spec(sample_table, search_radius_arcsec, datadir, delete_downloaded_data = True):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

spectroscopy/code_src/herschel_functions.py Outdated Show resolved Hide resolved
spectroscopy/code_src/mast_functions.py Outdated Show resolved Hide resolved
spectroscopy/code_src/mast_functions.py Outdated Show resolved Hide resolved
spectroscopy/code_src/mast_functions.py Outdated Show resolved Hide resolved
Comment on lines 102 to 98
```{code-cell}
```{code-cell} ipython3
# Uncomment the next line to install dependencies if needed.
# !pip install -r requirements_spectra_generator.txt
!pip install -r requirements_spectra_generator.txt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its hard for me to remember this because I need it uncommented to run, and then forget to go back and comment it out. Is there some easy way to have both work, ie some if statement around it, or???

spectroscopy/spectra_generator.md Outdated Show resolved Hide resolved
@bsipocz
Copy link
Member

bsipocz commented Sep 30, 2024

I was thinking the default needs to be True to delete downloaded data because of the initial space limit on Fornax.

That would mean a lot of duplicated download when the users are experimenting in the same notebook.
I would just hope some bigger temporary workspace will be available, a space that doesn't have backup or anything fancy, but is suitable for dumping content during a work session.
cc @zoghbi-a - Is that on the radar or do we have to cook up workarounds to operate in a tiny box?

@bsipocz
Copy link
Member

bsipocz commented Sep 30, 2024

@jkrick - are the any more commits to be pushed here? You said for some of the review comments that they have been done, though they have not (if that's easier, I have left them as review comments, so all of those can be committed directly without editing the file manually)

@jkrick
Copy link
Contributor Author

jkrick commented Sep 30, 2024

Sorry @bsipocz I thought I got them all. Now it is telling me 'outdated suggestions cannot be applied'. I did the changes manually to try to help learn from the suggestions, but I must have missed something.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so here is another batch. I'll commit these and will go ahead with the merge, and keep the True or False is better for default for #336, or to be totally separated pending we get some roadmap for a better temporary storage situation.

spectroscopy/code_src/herschel_functions.py Outdated Show resolved Hide resolved
spectroscopy/code_src/mast_functions.py Outdated Show resolved Hide resolved
spectroscopy/code_src/mast_functions.py Outdated Show resolved Hide resolved
spectroscopy/code_src/mast_functions.py Outdated Show resolved Hide resolved
spectroscopy/spectra_generator.md Outdated Show resolved Hide resolved
spectroscopy/spectra_generator.md Outdated Show resolved Hide resolved
spectroscopy/spectra_generator.md Outdated Show resolved Hide resolved
spectroscopy/spectra_generator.md Outdated Show resolved Hide resolved
@bsipocz bsipocz merged commit df06e2f into main Sep 30, 2024
@bsipocz bsipocz deleted the delete_data_files branch September 30, 2024 20:49
github-actions bot pushed a commit that referenced this pull request Sep 30, 2024
add parameters to delete downloaded data from spectroscopy queries df06e2f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
use case: spectroscopy Spectroscopy use case
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make cleanup parameter for storing/deleting downloaded files for _get_spec functions
2 participants