-
-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent errors from identifier functions for loaders #404
Conversation
Fair point! But ambiguous try/excepts can be dangerous, if we know that they are there to catch improper fits header keywords, we should aim to either test directly for that case (just as you test whether |
I disagree. Identifier functions should be tested to make sure they work, but they should work with whatever users throw at them and not raise Errors. Why would I even have an identifier function, if I can't use |
That's in the developer's interest, but it's totally a show-stopper for a user. Developers can get around that by testing the identifier, see #405. I came across this problem like this: I got a fits file from an optical observer. I don't know the exact format, so I tried Then, I tried to read another spectrum, that someone had given me as a text file. So, I tried again So, I enclosed all readers assuming fits format in try..except blocks. That's the PR you see here. So, I strongly recommend to write identifiers that return |
I still disagree with this approach. If the identifier function is unable to handle the file type gracefully, it should absolutely raise an error instead of obfuscating the reason why the registry didn't choose that loader.
My argument is that arbitrary files should be a handled case, and not swallowed by simple all-encompassing exception. The point of the identifier function is to act as a filter, and a filter is only useful if the both the user and developer can determine its efficacy, otherwise it's impossible to know whether a loader is chosen or not due to the data actually not being a match, or some malformed header that blocks the appropriate loader from being selected. I am agreeing that the try/except is decent means of handling this, I'm simply pointing out that the exceptions should be qualified and handled appropriately. If a file is a fits file and was generated from HST, and specifically from COS or STIS, and yet still fails, the user should be made aware that the appropriate loader is not being selected due to some issue with the data. This is the information that is being swallowed when implementing this type of try/except anti-pattern. Your examples above would benefit from simply having more selective identifier functions, not necessarily blanket exception handling. |
What valid reason do you imagine to have a file that "is a fits file and was generated from HST, and specifically from COS or STIS, and yet still fails"? I suggest that we solicit more opinions on the strategy. If @nmearl view prevails, I'm happy to make this much narrower and only catch the errors that I specifically can produce now. However, I can guarantee that this issue will return with files that fail e.g. the check if it's a fits file in some other way. Potentially, this can be a continuing source of frustration for users - unless we catch essentially all exceptions. |
There is no valid reason to have a file that one knows is of a particular flavor, and yet still fails. The point being that, in your case, the identifier would skip the appropriate loader instead of throwing an error. Either way, the user is left without a properly loaded data file. The difference is whether or not they know why it could not be parsed. |
The solution may be as simple as being sure to log the error in the |
I think logging any error caught is a good solution and I'll see how to do that. If you happen to know offhand how to write to the logger, I appreciate a small note!
Good idea. I'll open an issue in astropy itself. |
@nmearl I rebased and added full exception logging to all catch-all try-except statements that I introduced. |
The more I look at this, the more I think the try/except behavior should be moved to the decorator definition. That way, the user does not need to be responsible for wrapping their identifier function in a try/except block and log the failure, it would be automatically done when the loader and identifier function are added to the registry. I would suggest that in the data loader definition, place your try/except behavior in a nested fuction and wrap the passed in identifier function: def data_loader(label, identifier=None, dtype=Spectrum1D, extensions=None,
priority=0):
"""
...
"""
def identifier_wrapper(ident):
def wrapper(*args, **kwargs):
try:
return ident(*args, **kwargs)
except Exception as e:
logging.debug("Tried to read this as {} file, but could not.".format(label))
logging.debug(e, exc_info=True)
return False
return wrapper
identifier = identifier_wrapper(identifier)
def decorator(func):
io_registry.register_reader(label, dtype, func)
... |
Yes, a decorator sounds good for this case. In fact, once it's done here and works, I think I'll contribute that decorator to astropy, too. However, the fact that this problem has never come up in several years of astropy means that it is not urgent there. I think specutils is the first file that makes extensive use of identifier functions. Most other packages (including astropy itself) only look at the file extension, which does not raise exception. If accepted into astropy, it could then be removed here again. However, I definitely want this in photutils now and not wait for an entire astropy development cycle plus the time until photutils requirements match the most recent version of astropy because I tried reading several different fits files form different sources and they all failed, so I think that this is probably a common problem. I will turn this into a decorator some time this weekend. |
For now, I'm suggesting that this can already be handled in the already existing decorator that is responsible for adding the loader and the identifier function to the registry. That decorator currently exists in specutils, so you would only need to add the functionality to wrap the identifier function (the little code snippet I offered). Certainly though, in the future, the decorator approach could be moved to astropy proper. |
However, the existing identifier functions often make assumptions about the file which may be wrong, e.g. they check the fits header keyword "TELESCOP" and fail with a KeyError when that keyword does not exist or the file is not valid fits. The correct return would be False, as in "this is not a file I can read", which then allows the registry to try the next format. This fixes the problem for the existing default reader and adds a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks @hamogu.
A well-written identifier function will return
True
orFalse
.However, the existing identifier functions often make assumptions
about the file which may be wrong, e.g. they check the fits header
keyword "TELESCOP" and fail with a
KeyError
when that keyword doesnot exist or the file is not valid fits.
The correct return would be
False
, as in "this is not a fileI can read", which then allows the registry to try the next format.
This fixes the problem for the existing default reader and adds a
test.