Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CRDS sync instead of informal URL for HST reffile downloads #53

Merged
merged 4 commits into from
Mar 8, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 44 additions & 10 deletions ci_watson/hst_helpers.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,17 @@
"""Helper module for HST tests."""

import os
import glob
import shutil

import crds

__all__ = ['ref_from_image', 'raw_from_asn', 'download_crds']

import shutil

CRDS_SERVER_URL = "https://hst-crds.stsci.edu"
HST_INSTRUMENTS = ['acs', 'wfc3', 'stis', 'cos', 'wfpc2']

def _get_reffile(hdr, key):
"""Get ref file from given key in given FITS header."""
Expand Down Expand Up @@ -108,6 +116,8 @@ def download_crds(refname, timeout=30, verbose=False):

"""
refdir = None
fname = refname
print('Deprecation Warning: `timeout` parameter no longer needed.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit brute force to print this regardless. You could change the signature to take **kwargs and only emit warning if timeout is given. Also better to use warnings.warn(..., DeprecationWarning).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Coming right up...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to using 'warnings.warn' in e00d94c


# Expand IRAF-style dir shortcut.
if '$' in refname:
Expand All @@ -128,15 +138,39 @@ def download_crds(refname, timeout=30, verbose=False):
if refdir is None:
raise ValueError('Unknown HTTP destination for {}'.format(refname))

from ci_watson.artifactory_helpers import check_url, _download
# need to insure CRDS has been cached locally
os.environ['CRDS_SERVER_URL'] = CRDS_SERVER_URL
os.environ['CRDS_PATH']='./'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you ever run the test locally, is this behavior desirable? Won't this download a bunch of stuff into the working dir?

Copy link
Contributor Author

@stsci-hack stsci-hack Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that is the point of using the 'crds sync' call below; namely, only those files which are requested get downloaded whether it is local or remote. I ran this on my Windows system at home and only downloaded the one reference file that was requested without getting any other CRDS file. These definitions, though, are required in order to tell CRDS what server to use and that it should only download the files into the local directory (to start with anyway). I could, though, make this definition dependent on whether or not it was already defined by the user. Would that be safer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was made in b68f309. It was also tested on my desktop Linux workstation at work (freyja) and when 'iref' was defined to point to the local directory, it only copied the desired reference file as requested, while it did not copy anything when 'iref' was pointing to the already available local online cache (as before). It also still works just fine under Windows as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does hstcal even run on Windows? 👀

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just don't want to get into the situation where dev runs this locally, downloads a bunch of files into the working dir (which might be source dir), and then accidentally git add all of them into version control.

make this definition dependent on whether or not it was already defined by the user. Would that be safer?

Yes, I think so.

Copy link
Contributor Author

@stsci-hack stsci-hack Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My tests have been run primarily on Drizzlepac under Windows, not hstcal, as I have not tried to build hstcal C code under Windows. This definition of the URL was made optional under b68f309 and worked as expected only downloading the requested single reference file on my STScI Linux desktop system.

# Make sure expected output directory is present in local directory
tmpbase = os.path.join('references', 'hst')
tmpref = os.path.join(tmpbase, HST_INSTRUMENTS[0])
try:
if not os.path.exists(tmpref):
os.makedirs(tmpref)
for inst in HST_INSTRUMENTS[1:]:
tmppath = os.path.join(tmpbase, inst)
if not os.path.exists(tmppath):
os.mkdir(tmppath)

# run the command to sync this CRDS file with local directory
sync_cmd = crds.sync.SyncScript('sync --files '+ fname)
sync_cmd.sync_explicit_files() # copies file into subdir

# Move the sync'd reference file to locally defined directory now
# We need to find it first, though, since we are not sure what
# instrument that reference file was for...
tmpfile = glob.glob(os.path.join(tmpbase, '*', '*.fits'))[0]
shutil.move(tmpfile, refname)

if verbose:
print('Downloaded {} from {}'.format(refname, url))

# NOTE: For this part to work, jref (for example) must point to
# "." or reference file value in FITS header cannot have "jref$".
url = 'http://ssb.stsci.edu/trds_open/{}/{}'.format(refdir, fname)
if check_url(url):
_download(url, fname, timeout=timeout)
else:
raise ValueError('Invalid URL {}'.format(url))
except Exception:
print(f"Failed to download {fname}")

if verbose:
print('Downloaded {} from {}'.format(refname, url))
finally:
# delete tmp CRDS directories now, if possible.
try:
shutil.rmtree('references')
except Exception:
pass