Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling runtime errors/exceptions #212

Open
nshaffer opened this issue Jun 12, 2020 · 12 comments
Open

Handling runtime errors/exceptions #212

nshaffer opened this issue Jun 12, 2020 · 12 comments
Labels
implementation Implementation in experimental and submission of a PR meta Related to this repository topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...

Comments

@nshaffer
Copy link
Contributor

nshaffer commented Jun 12, 2020

This issue is to discuss approaches to handle runtime errors and exceptions in stdlib. I have in mind scenarios such as

  • An iterative solver algorithm fails to converge
  • A negative number is passed to a routine that requires a nonnegative one
  • Not enough memory for an array allocation

These are conditions which cannot be identified at compile time, but if left alone will either cause user code to crash or be erroneous. Currently in stdlib, we have check, which can print error messages and optionally terminate execution. I think it is not sufficient for general-purpose runtime checking. To me, there are a few major considerations when thinking about runtime checking

  1. Users must be given the opportunity to recover from the error if at all possible. This is especially important for library code, which may be difficult to debug depending on the installation/distribution. It is also generally rude for library code to kill execution without giving the user any say in the matter.

  2. It should be possible to recover from runtime errors without sacrificing purity. If a routine is manifestly pure, then it should not have to sacrifice the pure attribute just to have some error checking. If I write a factorial function, making it pure and handling negative arguments should not have to be an either/or proposition.

  3. Checking and handling errors should not be unduly burdensome to users. If a function call requires one line, and handling its possible error conditions requires ten, users will simply not bother with error checking.

This list of criteria is not exhaustive, but they are the three that are most important to me. That stated, here are the runtime checking approaches I am most familiar with and how they stack up:

  1. Return error code and message as optional out-params.

    • Does not play nicely with pure functions (pure subroutines OK, though)
    • Intrinsic functions do not do this (but statements do, e.g., iostat, iomsg)
    • Passes the buck to the user to interpret and handle the error code/message
  2. Die with error stop

    • As of f2018, can be used in pure functions
    • Kind of rude for library code to kill execution without chance of recovery
  3. Return a special value

    • This is what intrinsics usually do, e.g., index returning -1
    • Some algorithms may not have a natural "obviously wrong" value (maybe NaN?)
    • Works with pure functions/subroutines
  4. Raise an exception

    • Maybe in 2045?

Of these, my preference is strongly toward approach 3 whenever possible.

@epagone
Copy link

epagone commented Jun 12, 2020

Very nice description and, FWIW, I prefer approach 3 as well.

One problem that I have faced a few times in the design of such routines is how this API communicates when nested at multiple levels of depth. Say the user is running a numerical integrator within a root-finding problem that is, in turn, used to solve a system of a differential equations. We might have three levels of depth in this case (admittedly, probably rare and fictitious), but I have personally used (within the scope of stdlib) two levels of depth in several cases. I'd guess OO design would help in this sense (with separate instances of the "error handler class") but I know it is discouraged in stdlib.

@certik
Copy link
Member

certik commented Jun 12, 2020

One thing that we should try to avoid is to force every user to be checking every single function from stdlib to ensure it succeeded. We should figure out how to allow users to check it if they want to and handle the error themselves, but we should also allow users to just call a given stdlib function and it would fail with a nice error message if something goes wrong.

So I would say a combination of 3. and 1. is the way to go.

The option 2. should be combined with the option 1. I.e., if the optional stat argument is not present, then it will fail with error stop. If stat is present, it would allow the user to handle the error. This is great for functions such as open or allocate.

The option 3. is great for functions such as find to find an element in an array or a string, and return -1 if it does not exist.

@jvdp1
Copy link
Member

jvdp1 commented Jun 15, 2020

Thank you for these clear propositions. It is related to #95 too.

The options 1., 2., and 3. could be used for different purposes/procedures. For example, returning -1 or even NaN for one of the statistical procedure when a dim > 15 is provided would make no sense, and would not help the user. In such a case, I think it is better that the program fails with error stop and an informative message.

@wclodius2
Copy link
Contributor

wclodius2 commented Jul 7, 2020

There are two related questions raised by @nshaffer original post:

  1. What infrastructure should the library provide for handling errors?
  2. What coding guidelines should the authors of the library follow to best handle errors?

To address the first question I have written a module, that I have tentatively named STDLIB_ERROR_REPORTING, and a markdown document describing the module's API in some detail. The module consists of a number of named constants and five subroutines.

The module defines 80+ unique named constants to serve as error codes for reporting errors in a consistent manner. The constants can be defined as integers or a derived type, but are currently defined as a derived type named ERRORS, whose sole public component is named CODE. The advantages and disadvantages of the different means of defining the constants are:

  1. Integers
    a. Advantage:

     * It is what the Fortran standard currently uses for reporting errors, and is what its users are used to,
     * It requires less typing to enter `INTEGER` than to enter `TYPE(NAME)`, and
     * They can easily be used in a `SELECT CASE` construct
    

    b. Disadvantage: the users may be tempted to use the processor's STATUS values as their flags.

     * the `STATUS` values the processor returns may duplicate values for the error codes, but with very different meanings
     * the meanings of the processor's `STATUS` values are not readily apparent
    
  2. Derived type with a public component:

    a. Advantage:

     * The name of the type can be intuitive
     * With a public component it can be used in `SELECT CASE` constructs
     * More difficult to confuse with `STATUS` values
    

    b. Disadvantage:

     * It is not what users are used to
     * `TYPE(NAME)` requires more typing than `INTEGER`
    
  3. Derived type with a private component

    a. Advantage

     * The name of the type can be intuitive
     * Very difficult to confuse with STATUS values
    

    b. Disadvantage

     * It is not what users are used to
     * Cannot be used in `SELECT CASE` constructs
     * `TYPE(NAME)` requires more typing than `INTEGER`
    

Four of the five subroutines have ERROR as an optional argument for passing error codes. For the fifth ERROR is mandatory. Three subroutines, have optional MODULE and PROCEDURE arguments for specifying the location where the error was discovered. One of the procedures, ASSERT, is similar in intent to CHECK in STDLIB_ERROR, while another, SEND_STOP, is similar in intent to ERROR_STOP in the same module. A summary of the subroutines is as follows:

ASSERT( TEST, MESSAGE [, MODULE, PROCEDURE, ERROR ] ): If TEST is .TRUE. does nothing, otherwise it writes text to the ERROR_UNIT, with an ERROR dependent string as the stop code.

REPORT_ERROR( MESSAGE [, MODULE, PROCEDURE, STAT, ERRMSG, ERROR ] ): Writes the character string, MESSAGE, to ERROR_UNIT, with an ERROR dependent string as the stop code.

REPORT_IO_ERROR( MESSAGE [, MODULE, PROCEDURE, IOSTAT, IOMSG, ERROR ] ): Writes the character string, MESSAGE, to ERROR_UNIT, with an ERROR dependent string as the stop code.

REPORT_TEXT_ERROR( LINE, START_INDEX, DESCRIP [, FILENAME, LINE_NUMBER, WRITE_UNIT, ERROR ] ): Sends a message to WRITE_UNIT describing an error found in a line of text.

SEND_STOP( ERROR ): Stops processing with an ERROR specific string as the stop code. Note this subroutine would be PURE in F2018, but is not PURE in F2008.

As to the second question as to what guidelines programmers should follow I have a few strong opinions on user and memory allocation errors and pure procedures. Programmers should not report user errors up the call chain. They should simply stop with an informative message, or if purity or elemental attributes are important and F2008 compatibility is important, they should do nothing at all. Programmers should almost never attempt to handle memory allocation problems. With virtual memory, allocation should almost never fail, and with lazy allocation on Linux, failure need not be discovered at the invocation of an ALLOCATE statement. They should only attempt to handle an allocation error if:

  1. A large amount of memory is likely to be allocated,
  2. An alternative less memory intensive algorithm is available, and
  3. Most memory is used near the ALLOCATE statement.

Finally as to purity, there is currently nothing to be done to handle errors in F2018 for ELEMENTAL procedures. For PURE procedures in 2018 you can invoke ERROR STOP, or pass a STATUS flag. For the nominal, elemental factorial function with a negative argument, I would return a NaN.

@wclodius2
Copy link
Contributor

FWIW here is some example code illustrating how I view the module being used:

MODULE EXAMPLE_MOD
  ...
  USE STDLIB_ERROR_REPORTING
  ...
  MODULE_NAME = 'EXAMPLE_MOD'
  ...
CONTAINS
  ...
  SUBROUTINE EXAMPLE_SUB(..., STATUS, ... )
    ...
    TYPE(ERRORS), INTENT(OUT), OPTIONAL :: STATUS
    ...
    SELECT CASE (FLAG)
    ...
    CASE DEFAULT
      IF ( PRESENT(STATUS) ) THEN
        STATUS = CONSISTENCY_FAULT
        RETURN
      ELSE
        CALL REPORT_ERROR('Illegal FLAG value.', &
          MODULE = MODULE_NAME,                  &
          PROCEDURE = 'EXAMPLE_SUB',             &
          ERROR = CONSISTENCY_FAULT )
      END IF
    END SELECT
    ...
    OPEN ( UNIT=LUN, FILE=FILENAME, STATUS='OLD', &
      ERR=99, IOSTAT=STAT, IOMSG=MESSAGE )
    ...
 99 IF (PRESENT(STATUS)) THEN
      STATUS = OPEN_FAULT
      RETURN
    ELSE
      CALL REPORT_IO_ERROR ( TRIM(FILENAME) // &
        " could not be opened as 'OLD'.",      &
        MODULE = MODULE_NAME,                  &
        PROCEDURE = 'EXAMPLE_SUB',             &
        IOSTAT = STAT,                         &
        IOMSG = MESSAGE,                       &
        ERROR = OPEN_FAULT )
    END IF
    ...
100 READ ( UNIT=LUN, END=190, EOR=192, ERR=194,&
           IOMSG=IOMSG, IOSTAT=STAT ) A
    ...
    IF ( PRESENT(STATUS) ) THEN
      STATUS = SUCCESS
    END IF
    RETURN
    ...
190 IF ( PRESENT(STATUS) ) THEN
      STATUS = EOF_FAULT
      RETURN
    END IF
    CALL REPORT_IO_ERROR("End-of-File occurred"//&
      "reading from file '"//TRIM(FILENAME)//"'.",&
      MODULE = MODULE_NAME,                       &
      PROCEDURE = 'EXAMPLE_SUB',                  &
      IOSTAT = STAT,                              &
      IOMSG = MESSAGE,                            &
      ERROR = EOF_FAULT )
192 IF ( PRESENT(STATUS) ) THEN
      STATUS = EOR_FAULT
      RETURN
    END IF
    CALL REPORT_IO_ERROR("End-of-Record occurred reading "//&
      "from file '"//TRIM(FILENAME)//"'.",     &
      MODULE = MODULE_NAME,                    &
      PROCEDURE = 'EXAMPLE_SUB',               &
      IOSTAT = STAT,                           &
      IOMSG = MESSAGE,                         &
      ERROR = EOR_FAULT )
194 IF ( PRESENT(STATUS) ) THEN
      STATUS = READ_FAULT
      RETURN
    END IF
    CALL REPORT_IO_ERROR("Read fault occurred reading "//&
      "from file '"//TRIM(FILENAME)//"'.",      &
      MODULE = MODULE_NAME,                     &
      PROCEDURE = 'EXAMPLE_SUB',                &
      IOSTAT = STAT,                            &
      IOMSG = MESSAGE,                          &
      ERROR = READ_FAULT )
  END SUBROUTINE EXAMPLE_SUB
  ...
END MODULE EXAMPLE_MOD

@nshaffer
Copy link
Contributor Author

@wclodius2 Thank you for the carefully considered post. I hope you can clarify a few things for me about your approach.

  1. I do not fully understand your comments on processor-defined status values, e.g., for the stat= specifier in allocate statements or the iostat= specifier of I/O statements. At least in the case of iostat values, the named constants in iso_fortran_env seem quite suitable.

  2. In your example, is it intended that users only test for error codes that they expect to handle and let all others fall to the default case? What do they do if they think none of the predefined error codes makes sense for them?

  3. It is not clear to me what is gained from this approach compared to defining and testing for my own status codes. It provides pre-defined status codes and a few utility routines, but does this facilitate error handling in user code or just formalize it?

@everythingfunctional
Copy link
Member

@wclodius2 , your example usage looks very similar to a library I developed: erloff

I don't have the routines that output the errors and stop the program, but those are easy to add.

@wclodius2
Copy link
Contributor

@nshaffer thank you for your kind words.

  1. Thank you for reminding me of the error flags in ISO_FORTRAN_ENV. It has given me a few more error
    codes to add to my initial list of eighty two. The one problem with the ISO_FORTRAN_ENV flags is that
    their values can vary arbitrarily between processors, so if you want to pass status flags for multiple
    categories of problems, ensuring that your values are different for each cause requires replacing their
    values with your own any way.

  2. It is expected that the writer of the routine that reports the errors to the user documents what errors
    it passes, that the user decides whether he wants to handle any of them or let failure occur in the
    called routine, and that the user decide which errors he wants to handle or let fail in the calling routine.
    I have tried to be very thorough in identifying categories of errors. If a user decides that none of the
    codes are suitable they can do any of: ask me to add an appropriate error code; if the code were
    added to the standard library he could add them himself, use the generic FAILURE code; or define
    their own (presumably integer) flags and document them.

  3. I think a standard library should formalize its error reporting so that users have a consistent interface.
    As a starting point, I think that having utility routines that report the location of the errors are useful.
    I want to contribute to the library, but don't want each module to have its own error reporting.

@wclodius2
Copy link
Contributor

@everythingfunctional I'm glad I am not the only one that approaches error handling this way.

@arjenmarkus
Copy link
Member

arjenmarkus commented Jul 13, 2020 via email

@wclodius2
Copy link
Contributor

wclodius2 commented Jul 13, 2020

FWIW my version of gfortran (10.1.0 on the Mac) doesn't define the most recent addition to ISO_FORTRAN_ENV, STAT_FAILED_IMAGE, so an enumeration of error codes that included those defined in ISO_FORTRAN_ENV beyond the I/O ones would have to be processor specific. However, there doesn't seem to be much interest in an enumeration of error codes, whether or not they included those in ISO_FORTRAN_ENV. If there is I could post my enumeration as a starting point.

@arjenmarkus
Copy link
Member

arjenmarkus commented Jul 13, 2020 via email

@awvwgk awvwgk added implementation Implementation in experimental and submission of a PR meta Related to this repository topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... labels Sep 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
implementation Implementation in experimental and submission of a PR meta Related to this repository topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...
Projects
None yet
Development

No branches or pull requests

8 participants