Prevent errors from identifier functions for loaders #404

hamogu · 2018-11-27T21:32:07Z

A well-written identifier function will return True or False.
However, the existing identifier functions often make assumptions
about the file which may be wrong, e.g. they check the fits header
keyword "TELESCOP" and fail with a KeyError when that keyword does
not exist or the file is not valid fits.
The correct return would be False, as in "this is not a file
I can read", which then allows the registry to try the next format.

This fixes the problem for the existing default reader and adds a
test.

nmearl · 2018-11-27T22:04:51Z

Fair point! But ambiguous try/excepts can be dangerous, if we know that they are there to catch improper fits header keywords, we should aim to either test directly for that case (just as you test whether 'TELESCOPE' exists in the header), or explicitly catch the error. I think it's in our interests to know when an identifier function is incorrect due to some unforeseen issue.

hamogu · 2018-11-27T22:14:58Z

I disagree. Identifier functions should be tested to make sure they work, but they should work with whatever users throw at them and not raise Errors. Why would I even have an identifier function, if I can't use Spectrum1D.read(...) on an arbitrary file?
Any valid file should give True and any invalid file should give False. At that point the user should not have to worry about why it failed - the point is that the reader can't read it.
As a developer, I want all the errors in detail, but as a user, I want Spectrum1D.read(...) to work and not to fail in identify_spec_type_I_dont_care_about some identifier function that's totally unrelated to the data I have.

hamogu · 2018-11-27T23:40:04Z

I think it's in our interests to know when an identifier function is incorrect due to some unforeseen issue.

That's in the developer's interest, but it's totally a show-stopper for a user. Developers can get around that by testing the identifier, see #405.

I came across this problem like this: I got a fits file from an optical observer. I don't know the exact format, so I tried spec = Spectrum1D.read(...) which fails with a a KeyError in the STIS reader. I happen to be involved in astropy and previous specutils programming, so I started to dig; but as a normal user I would have said "Mmm, seems like I can't use this unless I know the exact format.". So, I fixed the STIS reader problem with the keyword and tried again. Failure, because KeyError in the HST/COS file. Ok, so I fix that reader, too.

Then, I tried to read another spectrum, that someone had given me as a text file. So, I tried again spec = Spectrum1D.read(...), knowing that I just fixed the KeyErrors. Failure. "Not a valid fits file". Mmm, I know it's a text file, so that's OK, but seems like I can't use this package unless I know the exact format...

So, I enclosed all readers assuming fits format in try..except blocks. That's the PR you see here.

So, I strongly recommend to write identifiers that return True or False and don't break the chain of trying out the formats because someone uses them on a file format that we did not anticipate.

nmearl · 2018-11-28T03:44:14Z

I still disagree with this approach. If the identifier function is unable to handle the file type gracefully, it should absolutely raise an error instead of obfuscating the reason why the registry didn't choose that loader.

Why would I even have an identifier function, if I can't use Spectrum1D.read(...) on an arbitrary file?

My argument is that arbitrary files should be a handled case, and not swallowed by simple all-encompassing exception. The point of the identifier function is to act as a filter, and a filter is only useful if the both the user and developer can determine its efficacy, otherwise it's impossible to know whether a loader is chosen or not due to the data actually not being a match, or some malformed header that blocks the appropriate loader from being selected.

I am agreeing that the try/except is decent means of handling this, I'm simply pointing out that the exceptions should be qualified and handled appropriately. If a file is a fits file and was generated from HST, and specifically from COS or STIS, and yet still fails, the user should be made aware that the appropriate loader is not being selected due to some issue with the data. This is the information that is being swallowed when implementing this type of try/except anti-pattern.

Your examples above would benefit from simply having more selective identifier functions, not necessarily blanket exception handling.

hamogu · 2018-11-28T10:37:43Z

What valid reason do you imagine to have a file that "is a fits file and was generated from HST, and specifically from COS or STIS, and yet still fails"?
Also, if you already know what you are dealing with, you can set the format=... and your file will read (EDIT: or throw and error that will tell you what's wrong with it).

I suggest that we solicit more opinions on the strategy. If @nmearl view prevails, I'm happy to make this much narrower and only catch the errors that I specifically can produce now.

However, I can guarantee that this issue will return with files that fail e.g. the check if it's a fits file in some other way. Potentially, this can be a continuing source of frustration for users - unless we catch essentially all exceptions.

cc: @sargas @eteq @stscicrawford

nmearl · 2018-11-28T20:20:01Z

What valid reason do you imagine to have a file that "is a fits file and was generated from HST, and specifically from COS or STIS, and yet still fails"?

There is no valid reason to have a file that one knows is of a particular flavor, and yet still fails. The point being that, in your case, the identifier would skip the appropriate loader instead of throwing an error. Either way, the user is left without a properly loaded data file. The difference is whether or not they know why it could not be parsed.

nmearl · 2018-11-28T20:21:25Z

The solution may be as simple as being sure to log the error in the except block, not necessarily raising an exception. But if the true experience were to never raise errors in the identifier functions, then that implementation detail would've been codified upstream in the registry itself. I'm of the mind that catch-all anti-patterns should not be used when we certainly have the ability to be more selective.

hamogu · 2018-11-28T20:27:29Z

I think logging any error caught is a good solution and I'll see how to do that. If you happen to know offhand how to write to the logger, I appreciate a small note!

But if the true experience were to never raise errors in the identifier functions, then that implementation detail would've been codified upstream in the registry itself.

Good idea. I'll open an issue in astropy itself.

hamogu · 2019-02-01T11:53:00Z

@nmearl I rebased and added full exception logging to all catch-all try-except statements that I introduced.

nmearl · 2019-02-01T14:08:44Z

The more I look at this, the more I think the try/except behavior should be moved to the decorator definition. That way, the user does not need to be responsible for wrapping their identifier function in a try/except block and log the failure, it would be automatically done when the loader and identifier function are added to the registry.

I would suggest that in the data loader definition, place your try/except behavior in a nested fuction and wrap the passed in identifier function:

def data_loader(label, identifier=None, dtype=Spectrum1D, extensions=None,
                priority=0):
    """
    ...
    """
    def identifier_wrapper(ident):
        def wrapper(*args, **kwargs):
             try:
                 return ident(*args, **kwargs)
             except Exception as e:
                 logging.debug("Tried to read this as {} file, but could not.".format(label))
                 logging.debug(e, exc_info=True)
                 return False
        return wrapper

    identifier = identifier_wrapper(identifier)

    def decorator(func):
        io_registry.register_reader(label, dtype, func)
        ...

specutils/tests/test_loaders.py

hamogu · 2019-02-01T14:56:16Z

Yes, a decorator sounds good for this case. In fact, once it's done here and works, I think I'll contribute that decorator to astropy, too. However, the fact that this problem has never come up in several years of astropy means that it is not urgent there. I think specutils is the first file that makes extensive use of identifier functions. Most other packages (including astropy itself) only look at the file extension, which does not raise exception.

If accepted into astropy, it could then be removed here again. However, I definitely want this in photutils now and not wait for an entire astropy development cycle plus the time until photutils requirements match the most recent version of astropy because I tried reading several different fits files form different sources and they all failed, so I think that this is probably a common problem.

I will turn this into a decorator some time this weekend.

nmearl · 2019-02-01T15:09:42Z

For now, I'm suggesting that this can already be handled in the already existing decorator that is responsible for adding the loader and the identifier function to the registry. That decorator currently exists in specutils, so you would only need to add the functionality to wrap the identifier function (the little code snippet I offered). Certainly though, in the future, the decorator approach could be moved to astropy proper.

However, the existing identifier functions often make assumptions about the file which may be wrong, e.g. they check the fits header keyword "TELESCOP" and fail with a KeyError when that keyword does not exist or the file is not valid fits. The correct return would be False, as in "this is not a file I can read", which then allows the registry to try the next format. This fixes the problem for the existing default reader and adds a test.

nmearl

This looks great, thanks @hamogu.

hamogu force-pushed the nofailstis branch from e9925f0 to 0de9450 Compare November 27, 2018 21:35

hamogu force-pushed the nofailstis branch from 0de9450 to 72f22dd Compare November 27, 2018 23:28

hamogu force-pushed the nofailstis branch from 72f22dd to 82f6544 Compare November 27, 2018 23:40

hamogu mentioned this pull request Nov 28, 2018

Catch all exceptions in Identify functions in the io.registry? astropy/astropy#8196

Open

hamogu force-pushed the nofailstis branch from 82f6544 to aef9a82 Compare February 1, 2019 11:32

hamogu force-pushed the nofailstis branch from 80a00a4 to b3bf16e Compare February 1, 2019 12:08

nmearl reviewed Feb 1, 2019

View reviewed changes

specutils/tests/test_loaders.py Outdated Show resolved Hide resolved

hamogu force-pushed the nofailstis branch from 2690183 to bf74a13 Compare February 2, 2019 21:24

nmearl approved these changes Feb 4, 2019

View reviewed changes

nmearl merged commit 91e2918 into astropy:master Feb 4, 2019

hamogu deleted the nofailstis branch May 2, 2019 19:36

hamogu mentioned this pull request May 4, 2019

XraySpectrum1D class with loaders #427

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent errors from identifier functions for loaders #404

Prevent errors from identifier functions for loaders #404

hamogu commented Nov 27, 2018

nmearl commented Nov 27, 2018

hamogu commented Nov 27, 2018

hamogu commented Nov 27, 2018

nmearl commented Nov 28, 2018

hamogu commented Nov 28, 2018 •

edited

Loading

nmearl commented Nov 28, 2018

nmearl commented Nov 28, 2018 •

edited

Loading

hamogu commented Nov 28, 2018

hamogu commented Feb 1, 2019

nmearl commented Feb 1, 2019 •

edited

Loading

hamogu commented Feb 1, 2019

nmearl commented Feb 1, 2019

nmearl left a comment

Prevent errors from identifier functions for loaders #404

Prevent errors from identifier functions for loaders #404

Conversation

hamogu commented Nov 27, 2018

nmearl commented Nov 27, 2018

hamogu commented Nov 27, 2018

hamogu commented Nov 27, 2018

nmearl commented Nov 28, 2018

hamogu commented Nov 28, 2018 • edited Loading

nmearl commented Nov 28, 2018

nmearl commented Nov 28, 2018 • edited Loading

hamogu commented Nov 28, 2018

hamogu commented Feb 1, 2019

nmearl commented Feb 1, 2019 • edited Loading

hamogu commented Feb 1, 2019

nmearl commented Feb 1, 2019

nmearl left a comment

Choose a reason for hiding this comment

hamogu commented Nov 28, 2018 •

edited

Loading

nmearl commented Nov 28, 2018 •

edited

Loading

nmearl commented Feb 1, 2019 •

edited

Loading