Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling and propagating errors in stdlib #224

Open
aradi opened this issue Jul 23, 2020 · 43 comments
Open

Handling and propagating errors in stdlib #224

aradi opened this issue Jul 23, 2020 · 43 comments
Labels
implementation Implementation in experimental and submission of a PR meta Related to this repository

Comments

@aradi
Copy link
Member

aradi commented Jul 23, 2020

Before the file system layer (see #201, #220) in stdlib can be implemented, it would be important to reach some agreement on how to signalize errors within stdlib. I am not talking about logging, as this is an add on we can discuss later. I want to concentrate here on how errors can be passed between routines in stdlib.

Fortran's current approach to report status in I/O commands is to have some optional integer argument, returning the status of the operation. If the argument has not been specified, the code stops in case of errors. I'd propose to generalize this principle, but using derived types to achieve higher robustness and flexibility:

  • The status should become a derived type (type(status)), which methods allowing to set and query it.
  • The status-type has a finalizer. If it goes out of scope with an unhandled error in it (without having been handled by the caller), it calls error stop.
  • The derived has a special field which contains a class(error) item. It can carry arbitrary types derived from a base class, so that arbitrary complex error information can be passed around. This way, the error signaling mechanism is extensible, we could even think to add an error-hierarchy as we have in Python.

I have made a toy project to demonstrate the principle. Please, have a look at it and let me know what you think.

A few more notes:

  • This error passing should be applied to cases, where error reporting via special values in not feasible (e.g. change_dir()).
  • We could of course implement in the error-class to information about how it was propagate (backtrace).

This issue is related to several other error discussons, e.g. #219 , #193, #95. My suggestion is, again, to first reach agreement on the low-level of the error reporting, and we can add extended functionality, such as logging, on top later.

@certik
Copy link
Member

certik commented Jul 23, 2020

Would the idea still be that if the status argument is not provided, that it will error out right away? I think this is the key principle.

I think your approach can work.

The only possible downside that I can see is that you can notice that built-in Fortran functions never return a derived type, but only simple arguments such as integers or arrays. The reason for that is that it does not impose any particular derived type on the user. In here, we would be imposing the type type(status). So that's the main downside compared to returning just an integer.

Regarding speed, the derived type can possibly be a bit slower than an integer. So maybe not a good idea for small functions in hot loops. But for IO functions, the possible overhead is probably small.

Just like integer, type(status) is thread safe, does not have any global state.

And the Pros are: extensible, can convey more rich error information such as a string.

And it is still relatively simple.


If we chose to go with returning just an integer, then we can document the various different errors as different integers, and then provide a function that prints a nice error message based on the integer number.

So I think the key difference to an integer is that type(status) allows to convey information that is not just the error type, but something that changes all the time --- for example which directory failed, or what error some low level command (shell) returned. As is obvious from fpm, we need to have some functionality for running commands, and if they fail, the stderr output can be put into type(status) and propagated to the user. That would be hard to do with just integers.

So it seems to me if just a simple error with no extra information is needed, the integer might be better. But if more information must be attached to the error that is only known at runtime, then I think your type(status) is the way to go.

@aradi
Copy link
Member Author

aradi commented Jul 23, 2020

Would the idea still be that if the status argument is not provided, that it will error out right away? I think this is the key principle.

Actually, originally I wanted to propose to make them mandatory. But to be in accordance with Fortrans usual way of doing things, we could make it optional, and hope, that people would always pass (and handle it) in order to obtain a robust program.

I agree, that Fortran ususally operates with scalar types and arrays of them preventing to impose derived types on the programmer. But in my opinion, this is exaclty what makes many things quite painful, needing a lot of extra-coding for basic stuff.

As for type(status), the extensibility would be very important:

  • We could attach any kind of error information to ease the understanding what happened (similar to Pythons exception)
  • We could extend type(status) with traceback capabilities, so that it can collect the call stack while being propagated upwards.

@certik
Copy link
Member

certik commented Jul 23, 2020

we could make it optional, and hope, that people would always pass (and handle it) in order to obtain a robust program.

Why would you consider a program not robust if it does not handle the error? If the status is not provided, then the function stops the program with a useful error message. I consider this very robust and the desired behavior. If I don't want to handle the error, I want to get a nice error message and a stacktrace. If the user wants to handle the error, then the status is provided. So it seems it is robust either way.

Regarding the stacktrace --- I think it's the compiler's job to create a nice stacktrace when the error occurs. GFortran does it, although I would like it to be formatted better.

What we can do is to try using type(status) in the IO routines and see how it goes.

@milancurcic
Copy link
Member

I overall like it, although I don't think it's feasible to use for everything. For procedures that return a single result we use functions, and if we use an intent(out) argument in functions we can't make them pure. If we trust the long-term vision of Fortran being parallel and running on emerging architectures, then writing pure functions is what we should aim for.

So, without being convinced otherwise, my suggestion would be to limit such status/error propagation to subroutines that modify variables in-place. I/O and file system stuff is I think the perfect candidate for such things, stats functions not so much.

@certik
Copy link
Member

certik commented Jul 23, 2020

@milancurcic I agree. The stat functions will be used in hot loops, so we do not want to have the overhead of more general error handling there. The IO functions will have the overhead of the IO, so we can afford more general error handling there.

@aradi
Copy link
Member Author

aradi commented Jul 23, 2020

Of course, this error handling mechanism is not meant for functions at all. Especially numerical functions have better ways of signalizing errors by returning a specific value (e.g. nan).

I'd propose to use this error-handling mechanism in those cases, where otherwise an error stop would have been called. IMO, a library, which calls an error stop in a deeply nested internal routine instead of propagating the error back to the consumer is a true nightmare and can not really be considered being robust.

@everythingfunctional
Copy link
Member

There are 2 schools of thought I think on having an error stop in a library procedure. On the one hand, the error is unlikely to be helpful to a user of the program, as they're unlikely to be able to deduce why their input caused an error in that procedure without the rest of the context, which isn't available to that procedure to print out.

However, if such an error occurs, it is more likely a bug in the program, the procedure was called with invalid inputs, and that's on the caller. You need to fix your program.

So you could take the stance that, here's the requirements of calling this procedure, if you violate them that's on you, or, you can strive for "total functions", that can gracefully handle all possible inputs, but force the caller to deal with potential errors.

I generally like using libraries that strive towards the latter, as it makes explicit in the interface that things might go wrong, but there are of course performance penalties, and there's nothing wrong with a procedure taking the stance of "if you call me with bad inputs, I crash".

@wclodius2
Copy link
Contributor

I consider a procedure that only calls ERROR STOP when it encounters a problem to be a nightmare, but I also consider one that passes a status flag up to a calling procedure, that represents a problem a calling procedure can't be expected to handle, to also be a nightmare. Instead, a procedure, if it encounters a problem it thinks is unhandleable, should print a detailed error message describing the type of problem, and the location where it was found, and then call ERROR STOP. On rare occasions when the errors are genuinely handleable (e.g. a fast but non-robust method, that discovers it is having problems may pass a flag indicating that a more robust method is needed), it should pass an optional STATUS flag, that, if not present, stops the application with the detailed error message. FWIW I also consider most user errors to be unhadleable. See Herb Sutter’s paper for SC22/WG21, "Zero-overhead deterministic exceptions: Throwing values”, for a discussion of the problems with trying to handle user or allocation errors.

@certik
Copy link
Member

certik commented Jul 24, 2020 via email

@jvdp1
Copy link
Member

jvdp1 commented Jul 24, 2020

Yes, I think we all agree to print detailed error information before calling error stop.

Indeed, I think we agree all on that. However, this means that functions are not allowed to be pure, as currently in stdlib_experimental_stats (see example here.
Any ideas how to resolve that for functions, since this proposition doesn't seem to cover functions?

@aradi
Copy link
Member Author

aradi commented Jul 24, 2020

Of course, error stop with message is better. But often, the error is not caused by a bug, but by wrong user input (e.g. paths, some numerical values). And I find it much nicer, if the simulation package writes out a qualified error message (Error while trying to read the specified parameterization file (input.txt:23)) instead of (Could not read 'something.dat. Or The mixer scheme failed, try to use a different mixer settings versus Unable to solve linearly dependent system of equations.

As for functions: I think, we should only allow for functions which either never fail or can report their failure via a special value of their return type (e.g. NaN).

@jvdp1
Copy link
Member

jvdp1 commented Jul 24, 2020

As for functions: I think, we should only allow for functions which either never fail or can report their failure via a special value of their return type (e.g. NaN).

I am not sure I fully agree with that. For example, for the function mean, the dimension dim can be provided (with 0 < dim < maximum_rank_array). Currently if the user provides a wrong dim, a message is printed and error stop is called.
This is similar to what happen with the intrinsic sum (with gfortran):

Fortran runtime error: Dim argument incorrect in SUM intrinsic: is 3, should be between 1 and 2

I don't think that returning NaN in this case would be correct, but this strategy would allow pure functions.

@certik
Copy link
Member

certik commented Jul 24, 2020

However, this means that functions are not allowed to be pure

It seems error stop can be in pure functions, at least with GFortran. We should check the Fortran Standard on this one.

This function indeed does not compile:

pure integer function f(x)
integer, intent(in) :: x
f = x+1
stop "ok"
end function

with:

a.f90:4:9:

 stop "ok"
         1
Error: STOP statement not allowed in PURE procedure at (1)

But this one compiles:

pure integer function f(x)
integer, intent(in) :: x
f = x+1
error stop "ok"
end function

@aradi
Copy link
Member Author

aradi commented Jul 24, 2020

According to MRC: "Execution of an error stop statement causes execution to cease as soon as possible on all images, so there is no need to disallow it in a pure procedure. It is now allowed ..." (It is apparently a Fortran 2018 thing).

@jvdp1
Copy link
Member

jvdp1 commented Jul 24, 2020

It seems error stop can be in pure functions, at least with GFortran. We should check the Fortran Standard on this one.

Since Fortran 2018, indeed.

@certik
Copy link
Member

certik commented Jul 24, 2020

There you go. So this is an option. One still cannot have any print statements in pure functions, but we can communicate the message via error stop, as in this function:

pure integer function f(x)
integer, intent(in) :: x
character(len=100) :: msg
msg = "some text"
f = x+1
error stop msg
end function

@jvdp1
Copy link
Member

jvdp1 commented Jul 24, 2020

There you go. So this is an option. One still cannot have any print statements in pure functions, but we can communicate the message via error stop, as in this function:

This is indeed an option. However it might limit the number of compilers that support pure procedures with F18 error stop. But for the long term, it is indeed a solution.

@wclodius2
Copy link
Contributor

@aradi the main performance impact I see in your proposal is that I believe the finalizer will be called every time a procedure is called with a STAT argument with INTENT(OUT) whether or not the STAT argument is present. If not present it obviously will be called when the procedure ends. If present I believe it will be called on entry since the corresponding variable will cease to exist. The call of the finalizer will have the cost of the creation of the call stack, the jump to the procedure, the if branch in the procedure, and unwinding the call stack.

If the class(error) can be used to pass arbitrary information the structure of a STAT object can be arbitrarily complex which can make constructing the stack and accessing the object from the stack more costly. I don't know what effect that will have on the finalizer. There is a cost for an arbitrary class hierarchy.

FWIW I have minor objections to the name STATUS for the type, as that is the natural name for the STAT variable, and users will try out that name forgetting that types and variables belong to the same namespace. FWIW I use a type ERRORS for my enumerations of error conditions for that reason.

@jacobwilliams
Copy link
Member

Don't forget about those of us developing GUI applications. For us, an error_stop is never acceptable, and printing a message to the console is useless since the user may not even see that. All errors have to be handled.

I like the type(status) extensible type. But of course, what we really need is proper exception handling in the language. Anything else is just a stop gap solution. (like the current integer and string error messages of some of the intrinsic routines). It's just not enough in 2020.

@aradi
Copy link
Member Author

aradi commented Jul 26, 2020

I agree, the finalization may give an extra overhead, especially due to the intent(out) arguments. Thanks @wclodius2 to point it out! So, I have an alternative suggestion, which should decrease the overhead significantly and is almost as convenient and flexible as the previous suggestion:

  • Error reporting capability for a subroutine is provided by an allocatable, intent(out) dummy argument of a given derived type.
  • At call, the actual argument is unallocated. (Therefore, no finalization is necessary.)
  • If the subroutine finishes without error, it leaves the dummy argument unallocated. If the subroutine finishes with error, it allocates the dummy argument and puts the necessary information into it.
  • The caller can check the success of the subroutine by checking the allocation-status of the actual argument after the call.
  • The derived types for various errors are derived from a base-error type. This base class ensures, that if the error goes out of scope before having been "deactivated", the program is stopped. The error can be deactivated by calling its %deactivate() method.
  • If a subroutine only emits one kind of error, it accepts a given error-type as argument
    subroutine emits_os_error_only(error, ...)
      type(os_error), allocatable, intent(out) :: error
    
    Consequently the calling routine can directly access the publicly available components of the error type:
    type(os_error), allocatable :: error
    call emits_os_es_error_only(error, ...)
    if (allocated(error)) then
      ! Do something with the info found in an os_error type
      ...
    end if
    
  • If a routine can emmit multiple type of errors, it accepts a given error-class as argument
    subroutine emits_multiple_error_types(error, ...)
      class(critical_error), allocatable, intent(out) :: error
    
    In this case, the caller uses select type if it wants to differentiate between the different error types:
    class(critical_error), allocatable :: error
    call emits_multiple_error_types(error, ...)
    if (allocated(error)) then
      select type (error)
      type is (os_error)
        ! handle os-error
       ...
      end select
    end if
    

I have rewritten FX-Error to demonstrate this way of error handling, so have a look at the relevant source files there for further details.

@aradi
Copy link
Member Author

aradi commented Jul 26, 2020

@wclodius2 As for the naming convention, I've opened issue #225 on derived type naming conventions.

@aradi
Copy link
Member Author

aradi commented Jul 26, 2020

I am not sure I fully agree with that. For example, for the function mean, the dimension dim can be provided (with 0 < dim < maximum_rank_array). Currently if the user provides a wrong dim, a message is printed and error stop is called.
This is similar to what happen with the intrinsic sum (with gfortran):
[...]
I don't think that returning NaN in this case would be correct, but this strategy would allow pure functions.

@jvdp1 Yes, returning NaN here were not correct, for sure. However, in my opinion, this case is a clear programming error, as the caller is responsible to set a dimension which is compatible with the data it passes to the function (the same way as the programmer is responsible not to address elements beyond the array sizes). In those cases I'd find stop error with a proper error message acceptable and probably the best/only option we have at the moment, especially if the function is supposed to be pure.

The error-reporting mechanism here is more for cases, where the error is expected by design as the success of the call can not be warranted by the programmer/caller, like

  • Trying to open a directory/file for reading: The directory/file may not exist, or it exists but may not be readable due to missing permissions, etc.
  • Trying to solve a linear system of equations: The caller has probably no knowledge about the linear dependence of the input data. Finding out about linear dependency in advance would take as much time as solving the equations, so it would be highly inefficient. So, we need some way of reporting back the failure, which can be handled by the caller.

@wclodius2
Copy link
Contributor

@aradi your test program, test_fxerror, doesn't appear to have an example of a call of a procedure with an optional STAT argument. When I try to imagine how the code would be set up for an optional argument I don't see how having the finalizer simplifies things. If it is used for the optional STAT argument, it will have to be guarded by IF blocks in such a way that if it exists it never goes out of scope, while if it is used for local variables it will always go out of scope.

@wclodius2
Copy link
Contributor

As near as I can tell the most user friendly way to propagate errors
in Fortran is through an optional argument, typically called STAT or
STATUS. The programmer ensures that if STAT is present the
resulting argument has information about the nature of any error
encountered in executing the subprogram, and if STAT is absent the
program stops if an error is found and reports details about the
nature of the error to either ERROR_UNIT or an appropriate log
file. The main questions to be answered are:

  • What information should STAT convey;
  • What type should STAT have; and
  • What form should the error reporting take?

I will put off discussing the form of error reporting and focus on
the information STAT could convey, and options for the type of
STAT.

There are two categories of information that a STAT argument might
convey: the severity of the error, and the cause of the error.
FLIBS and
Futility
seem to focus on the severity of the error providing values
representing:

  • Information - provides information to the user without stopping;
  • Warning - provides a warning to the user without stopping;
  • Error - that will stop processing only if stop on error is set;
  • Fatal Error - that will always stop processing; and
  • Failure - not well defined

The severity flags seem to be more useful for logging routines
and not for use as a STAT argument. My cursory review indicates that
the FLIBS and Futility library codes focus on error reporting and not
on error handling. As to the cause of the error, in many cases a
library code is well focussed and can have only one cause of
errors. For such codes a simple success/failure flag is sufficient to
identify the cause. But for some codes there can be multiple causes of
failure, e.g., a code that reads a value out of a file may fail
because the file does not exist, the file exists, but does not have
read privileges, an end of file was found before the desired value,
the value did not have the desired form, or the value was outside its
expected range. Each cause has a different possible fix, and each
cause should be distinguished by the STAT argument value. They can
be distinguished by either an ad hoc module specific set of error
codes, or by a library defined set of error codes.

The most obvious type for STAT is a default INTEGER. This is what
the Fortran standard has historically used, and, as a result, what
every Fortran programmer is familiar with. The use of an integer has a
bad rap as the standard did nothing for decades about standardizing
the values, so that all a user could rely on was that a non-zero value
indicated that an error had occurred. Even when it standardized the
meaning of some values in ISO_FORTRAN_ENV it limited the number of
defined values to a handful. However integers could be used for
either an application specific enumeration of error codes, or for a
library wide enumeration of error code values. This allows it to
potentially convey a significant amount of information about the
nature of an error. It has the advantage that no wrapper procedures
are needed for comparisons or setting values, so it can be very
efficient. A problem with an application specific enumeration is that
different applications may use the same numeric value for different
errors, or different numeric values for essentially the same error. A
problem with a library wide enumeration is that it will initially be
incomplete, requiring updates. Another problem is that it will be
large, I suspect with more than a hundred values, so finding the
appropriate code will require a detailed search. FWIW I have created
my own enumeration of error codes. It numbers over 90 entries, though
some are, I believe, relatively useless so trimming might lower the
number of codes to about 80.

The other option is a derived type. Such a derived type can
distinguish between different errors in several ways:

  1. By a component that is an integer that takes on different values
    for different categories of errors; or

  2. By a component that is another intrinsic type that takes on
    different values for different errors; or

  3. By having the derived type be a class and using inheritance to
    create different classes that represent different errors.

An integer component has many of the advantages and disadvantages of
using an integer directly. Unless a library wide enumeration of values
is provided, different applications may use the same value to
represent different sources of errors or different values to
represent the same error. If the component is not made private, then
comparisons and setting values can be very efficient, at the syntactic
cost of consistently referring to the component. If the component is
made private then comparisons and setting values will require the
(implicit) use of simple procedures, that will require inter-module
optimization to reduce the cost.

Using other intrinsic types as component to distinguish different
errors is problematic. I cannot see enumerating errors using a REAL
or COMPLEX, though it can be done. A LOGICAL value can be used to
note whether an error is active, but not to meaningfully distinguish
errors. A CHARACTER string can be used to distinguish errors, but
the comparisons would have a significant performance cost. Still a
CHARACTER string component can be used to supplement an INTEGER
component, by providing an error specific message. The usefulness
of this message depends on whether information about the error should
be specific to where it is first detected, or to where it is
reported. In the first case the string is useful, in the second case
it is much less useful.

Using the class hierarchy to distinguish different errors has one
major advantage: the processor will ensure that different names of
errors will correspond to different values. However the same name can
be used by different applications, and if defined in different modules
the same name can correspond to different values. This problem again
might be addressed by a library wide enumeration of error
codes. However from @aradi's code it seems that ALLOCATE needs to be
used to activate an error and MOVE_ALLOC needs to be used to
associate that specific error with the nominal STAT argument, though
maybe `ALLOCATE( TYPE(error-type-name):: STAT ) might work instead. In
any case the syntax would be unintuitive to a novice.

The above analysis prompts the following questions:

Should the library support STAT arguments? (yes/no)

If it supports STAT arguments should they represent severity or
cause?

If it supports STAT arguments should they be:

A. Integers?

B. A derived type wrapper around integers with the component public?

C. A derived type wrapper around integers with the component private?

D. A derived type that uses the class hierarchy to distinguish errors?

If the library uses a derived type to support STAT arguments, should
it have a CHARACTER string component, e.g., MSG?

If the library uses a derived type to support STAT arguments, should
it follow the integer path of using a special value to indicate that
no error was found or should there be an internal logical(?) flag to
indicate whether an error occurred?

If the library supports STAT arguments, should it provide a
predefined enumeration of error codes, or should it leave it up to the
module's programmer to come up with an ad hoc enumeration of module
specific codes?

FWIW my answers to the above questions are that the library should
support STAT arguments, that they should represent the cause of the
error, that they should be either integers or a derived type wrapper
around integers with the component public, with a mild preference for
simple integers, that the derived type, if it is used, should have a
string component, that it should use a special value to indicate that
no error occurred, and that it should provide a pre-defined
enumeration of error codes.

@certik
Copy link
Member

certik commented Jul 29, 2020 via email

@aradi
Copy link
Member Author

aradi commented Jul 30, 2020

@wclodius2 The finalizer in my approach made sure, that one can not let the error being unhandled: If the subroutine returns an error, it must be handled (and explicitely deactivated) before it goes out of scope. Especially, anti-patterns like

! We pass the optional error argument, but do not hande it afterwards before passing it to the next routine.
call some_routine(error=error, ...)
call other_routine(error=error,...)
...

would give then a run-time error. If that is too much of constraints, we could leave away the finalizer. Then, we were basically back to my original proposal using a derived type as status container. No allocation, no move_alloc necessary then.

I would still argue for using derived types / classes for error classification, as this is the only way can stay flexible with the payload of the error. Some (maybe most) errors only return a simple error message, but sometimes also additiional data could be useful to understand the details of the error (numerical value of the determinant of the matrix, which could not be inverted...). We can make the error msg + integer combination the default case, but it should be possible to go beyond that, if needed.
The following minimal working example demonstrates what I vaguealy have in mind:

module error_handling
  implicit none
  private

  public :: general_error, status, raise_error


  type :: general_error
    character(:), allocatable :: msg
    integer :: error_code
  contains
    procedure :: as_char => general_error_as_char
  end type general_error


  type :: status
    class(general_error), allocatable :: error
  contains
    procedure :: is_ok => status_is_ok
    procedure :: has_failed => status_has_failed
  end type status


  interface general_error
    module procedure construct_general_error
  end interface general_error


contains

  pure function status_is_ok(this) result(isok)
    class(status), intent(in) :: this
    logical :: isok

    isok = .not. allocated(this%error)

  end function status_is_ok


  pure function status_has_failed(this) result(hasfailed)
    class(status), intent(in) :: this
    logical :: hasfailed

    hasfailed = allocated(this%error)

  end function status_has_failed


  function construct_general_error(msg, errorcode) result(this)
    character(:), allocatable :: msg
    integer, intent(in) :: errorcode
    type(general_error) :: this

    this%msg = msg
    this%error_code = errorcode

  end function construct_general_error


  function general_error_as_char(this) result(errorchar)
    class(general_error), intent(in) :: this
    character(:), allocatable :: errorchar

    character(100) :: buffer

    write(buffer, "(' (Error code: ',I0,')')") this%error_code
    errorchar = this%msg // trim(buffer)

  end function general_error_as_char


  subroutine raise_error(stat, error)
    type(status), optional, intent(out) :: stat
    class(general_error), intent(in) :: error

    character(:), allocatable :: errormsg

    if (present(stat)) then
      stat%error = error
    else
      errormsg = error%as_char()
      error stop errormsg
    end if

  end subroutine raise_error

end module error_handling


program test_error_handling
  use error_handling

  type(status) :: stat
  call error_raiser(stat, .true.)
  if (stat%has_failed()) then
    print *, "Error occured: ", stat%error%as_char()
  end if

contains

  subroutine error_raiser(stat, raise)
    type(status), optional, intent(out) :: stat
    logical, intent(in) :: raise

    if (raise) then
      ! It either sets an error in stat, or stops if stat was not present
      call raise_error(stat, general_error("Some error occured", -1))
      return
    end if

  end subroutine error_raiser

end program test_error_handling

Errors with more complicated payload could be derived from type(general_error). If the handler wished to access the additional details, it would have to use select type to access the content of the derived error type.

@wclodius2
Copy link
Contributor

wclodius2 commented Jul 31, 2020 via email

@MarDiehl
Copy link
Contributor

I like the type(status) extensible type. But of course, what we really need is proper exception handling in the language. Anything else is just a stop gap solution. (like the current integer and string error messages of some of the intrinsic routines). It's just not enough in 2020.

I agree that the error handling of Fortran is quite cumbersome. error stop with a fixed error message (i.e no runtime information can be communicated to the user) seems to be the best that the language offers. I have used error stop in the os and os_path implementations and figured out that it gives at least a reasonable stack trace.
For subroutines, the error handling by means of return values is quite reasonable, but for functions with one return value it becomes annoying. If I remember correctly, intent(out) is not even allowed in pure functions.

I would prefer if basic building blocks of the stdlib would use error stop without more elaborated error handling. In addition, stdlib should provide reasonable error handling/logging utilities that allow the user to deal with unexpected situations. Consider the following case:

call chdir('this_dir_does_not_exist') ! will terminate with error stop
if(.not. isdir('this_dir_does_not_exist')) then
  call stdlib_error_handler('this_dir_does_not_exist'// 'does not exist')
else
 call chdir('this_dir_does_not_exist') 
endif

This means extra work for the user, but all intrinsic statements (i.e. open) work like that.

If we would enforce error propagation by means of an error type, we would essentially forbid the use of pure functions!

@wclodius2
Copy link
Contributor

wclodius2 commented Jul 31, 2020 via email

@arjenmarkus
Copy link
Member

arjenmarkus commented Aug 1, 2020 via email

@aradi
Copy link
Member Author

aradi commented Aug 1, 2020

@MarDiehl I'd personally would have very-very strong objections against letting low level I/O-routines die on error with error stop without the possibility of preventing it via error reporting / handling. Think about a Fortran library linked to a big graphical UI. It is a no-go, if an entire UI just crashes / stops without saving user data, just because a badly written Fortran routine at the very depth is not able to handle / propagate errors.

In my opinion, there must be always an alternative to error stop whenever

  • the programmer has not full control over the environment and can therefore not ensure proper action in advance or
  • whenever ensuring the right conditions would be way too elaborate (e.g. finding out whether a system of equation is linearly dependent before passing it to the solver routine).

How, the error is indicated/propagated/handled is of course a matter of taste (and topic of this issue 😉 ) and can even differ in each case, but error stop without an alternative can not be a serious choice for a standard library IMO. We can offer the error stop solution if the caller wants it (e.g. by making the error/status argument optional and stopping on error if not present.) But we also provide an alternative way, so that one can write robust components of big packages in Fortran.

Your proposed error handling for change_directory is not robust IMO. It would be quite elaborate to find out in advance whether you can change into a directory as you would also have to check for the proper access rights for example (which is usually not covered by is_dir()). And even then, it may happen, that a concurrent process deletes the directory between is_dir() and chdir. The simplest and most robust way to know, whether you can change into it is to try it (make the appropriate libc-call), report the error on failure and let the caller decide.

@aradi
Copy link
Member Author

aradi commented Aug 1, 2020

As for pure functions: I think, we only should allow pure functions, where:

  • The function can not fail or
  • the function can report failure with special return values (e.g. -1 or NaN).

Note that I am not referring here to programming errors as mentioned by @jvdp1 . If you pass the wrong dimension to a routine, it is OK if it calls stop error as this a clear programming error.

@MarDiehl
Copy link
Contributor

MarDiehl commented Aug 1, 2020

You have convinced me that error stop is a bad idea and will change os and os_pathaccordingly.
My suggestion would be to follow the Fortran way and use integer return values and/or return special values. This is for example also done in the open statement. The user can then handle errors from stdlib and the core language in a similar way.

In also think that it makes sense to offer a stdlib solution for error handling (a special type/printing messages etc). However, I would like to avoid a strong coupling of individual parts from stdlib. Therefore, I'm opposing to use this error handling routine in basic routines like os and os_path. That would force the user to also use the error handling routine (which is not part of the language, but 'just' a part of the standard library). If it turns out that there is a commonly used pattern of combining file system routines and error handling, this could become a second level module (similar to os.path and Pathlib in python: A more direct, low level library and a more advanced, OO-based approach).

@wclodius2
Copy link
Contributor

In the codes I am attempting to write, I find error handling to be a pervasive problem and think that strong coupling to library defined error handling to be unavoidable. I do not want each module to provide its own logging system. I therefore want the library to provide strong quality error handling routines. For other applications strong coupling to the file handling and os services may be unavoidable.

@certik
Copy link
Member

certik commented Aug 1, 2020 via email

@wclodius2
Copy link
Contributor

Do we want users to have the option of sending the error messages, if the optional STAT is not present, to their log files as well as the ERROR_UNIT of ERROR STOP? If we do then we want the library to consistently use our logging routines for error reporting.

@certik
Copy link
Member

certik commented Aug 1, 2020

What I had in mind for the y=solve(A,x) kind of middle level API is that it would just call error stop with some nice explanation, and that's it. If users want something more sophisticated, then they have to provide the optional stat argument, or use the lower level or more OO oriented API. That way functions like solve can stay pure.

@MarDiehl
Copy link
Contributor

MarDiehl commented Aug 2, 2020

Do we want users to have the option of sending the error messages, if the optional STAT is not present, to their log files as well as the ERROR_UNIT of ERROR STOP? If we do then we want the library to consistently use our logging routines for error reporting.

I would suggest to make it consistent with the language features: exit with error stop if stat is not present and an error occurs, otherwise do not exit and inform via stat. This is at least the case for read and open

@MarDiehl
Copy link
Contributor

MarDiehl commented Aug 2, 2020

What I had in mind for the y=solve(A,x) kind of middle level API is that it would just call error stop with some nice explanation, and that's it. If users want something more sophisticated, then they have to provide the optional stat argument, or use the lower level or more OO oriented API. That way functions like solve can stay pure.

y=solve(A,x) with optional stat argument cannot be pure, it would need to be call solve(A,x,y) or call solve(A,x,y,err). Pure functions can have only intent(in) arguments, unless for pointers. Could make sense to define the a pointer solution for the error handling routines.

But apart of this, I like the approach to have a error handling type for more involved/abstract functions and use plain integer for simple functions.

@wclodius2
Copy link
Contributor

I would suggest to make it consistent with the language features: exit with error stop if stat is not present and an error occurs, otherwise do not exit and inform via stat. This is at least the case for read and open

and write, inquire, close, rewind, backspace, allocate, deallocate, the "atomic" subroutines, and basically anything else that can go wrong that the processor can detect. There are two limitations with error stop by itself for this application: first, in Fortran 08 it cannot accept variable strings as arguments; and, second, it is awkward by itself to send detailed messages. Both of these issues could be dealt with by having a logging system in the standard library.

@wclodius2
Copy link
Contributor

But apart of this, I like the approach to have a error handling type for more involved/abstract functions and use plain integer for simple functions.

There is no simple dividing line between involved/abstract functions and simple functions. I like using the same approach for all procedures.

@wclodius2
Copy link
Contributor

To facilitate discussion of this issue I thought it would be useful to
code up examples for different approaches to error handling where
additional information might be useful. As my example I chose an
an I/O failure on an OPEN statement, where the standard provides
addional data in the form of an IOSTAT and IOMSG. In practice, I
thought an I/O failure would be more amenable to handling, than an
allocation or co-array related failure, where the standard provides
additional information in the form of a STAT (or STATUS) and
ERRMSG. For my examples I choose a simple flattend derived type,
scalar integer codes, and a variant of @aradi's derived type that has a
derived type as an attribute. In all examples I assume the existence
of a module ERROR_CODES with predefined values for the most obvious
error codes. FWIW I have such a module with about 100 error codes
defined.

For my first example of a handling method I chose a flattened derived
type, ERROR_T, with the attributes, CODE, MSG, STAT, and
ERRMSG as one that could handle both IOSTAT with IOMSG and
STAT with ERRMSG. As testing for the value of the CODE value is
both a cheap and relatively intuitive means of determining whether an
error has occured I don't define IS_OK or HAS_FAILED
functions. Additional attributes MODULE and PROCEDURE to indicate
the source of the error, might also be usefully added to ERROR_T.

module error_handling

    use error_codes, only : success
    implicit none
    private

    public :: error_t, error

    type :: error_t
        integer :: code = success
        character(:), allocatable :: msg
        integer :: stat = 0
        character(:), allocatable :: errmsg
    contains
        procedure :: as_char => error_as_char
    end type error_t

    interface error
        module procedure construct_error
    end interface error

contains

    function construct_error(msg, code, errmsg, stat) &
        result(this)
        character(:), intent(in) :: msg
        integer, intent(in) :: code
        character(:), intent(in), optional :: errmsg
        integer, intent(in), optional :: stat
        type(error_t) :: this

        this % msg = msg
        this % code = code
        if ( present(errmsg) ) then
            this % errmsg = errmsg
        end if
        if ( present(stat) ) then
            this % stat = stat
        end if

    end function construct_error


    function error_as_char(this) result(errorchar)
        type(error_t), intent(in) :: this
        character(:), allocatable :: errorchar

        character(100) :: buffer

        write(buffer, "(' (Error code: ',I0,')')") this % code
        errorchar = trim(this % msg) // trim(buffer)

    end function error_as_char

end module error_handling


program test_error_t_handling
    use error_handling
    use error_codes, only: success, missing_file_failure, &
        open_failure

    type(error_t) :: stat

    call error_raiser(stat)
    if (stat % code /= success) then
        print *, "Error occured: ", stat % as_char()
        write(output_unit, '("IOMSG =", a)' ) stat % errmsg
        write(output_unit, '("IOSTAT = ", i0)' ) stat % stat
    end if

contains

    subroutine error_raiser(stat)
        type(error_t), optional, intent(out) :: stat
        logical :: exist
        integer :: iostat, lun
        character(256) :: iomsg

        open( newunit=lun, file='dummy.txt', status='old', &
              form='formatted', action='read', err=999,    &
              iostat=iostat, iomsg=iomsg )
        return

999     inquire( file='dummy.txt' exist=exist )
        if ( exist ) then
! May never happen
            if ( present(stat) ) then
                stat = error( msg='Unable to open existing' // &
                    ' "OLD" file "dummy.txt".', &
                    code = open_failure, errmsg=trim(iomsg), &
                    stat=iostat )
                return
            else
                write(output_unit, '("Procedure: ERROR_RAISER")' )
                write(output_unit, '("IOMSG =", a)' ) iomsg
                write(output_unit, '("IOSTAT = ", i0)' ) iostat
                error stop 'Unable to open existing "OLD" file ' // &
                    '"dummy.txt".'
            end if
        else
! May be due to a lack of READ priviledges for the file
            if ( present(stat) ) then
                stat = error( msg='Missing "OLD" file "dummy.txt".', &
                    code = missing_file_failure, errmsg=trim(iomsg), &
                    stat=iostat )
                return
            else
                write(output_unit, '("Procedure: ERROR_RAISER")' )
                write(output_unit, '("IOMSG =", a)' ) iomsg
                write(output_unit, '("IOSTAT = ", i0)' ) iostat
                error stop 'Missing "OLD" file "dummy.txt".'
            end if
        end if

    end subroutine error_raiser

end program test_error_t_handling

As an alternative, I show the handling with a simple integer error
code. This, of course, lacks the derived type's MSG, STAT, and
ERRMSG attributes. FWIW for the I/O errors I suspect the fixes would
involve either moving a file or changing its priviledges, and for
those actions I suspect the error code is sufficient and the other
attributes are unnecessary.

program test_integer_handling
    use error_codes, only: success, missing_file_failure, &
        open_failure

    integer :: stat
    call error_raiser(stat)
    if (stat /= success) then
        select case ( stat )
        case( open_failure )
            print *, "A failure occurred on an OPEN statement."
        case( missing_file_failure )
            print *, "A missing file failure occured."
        end select

    end if

contains

    subroutine error_raiser(stat)
        integer, optional, intent(out) :: stat
        logical :: exist
        integer :: iostat, lun
        character(256) :: iomsg

        open( newunit=lun, file='dummy.txt', status='old', &
              form='formatted', action='read', err=999,    &
              iostat=iostat, iomsg=iomsg )
        return

999     inquire( file='dummy.txt' exist=exist )
        if ( exist ) then
! May never happen
            if ( present(stat) ) then
                stat =  open_failure
                return
            else
                write(output_unit, '("Procedure: ERROR_RAISER")' )
                write(output_unit, '("IOMSG =", a)' ) iomsg
                write(output_unit, '("IOSTAT = ", i0)' ) iostat
                error stop 'Unable to open existing "OLD"' // &
                    ' file "dummy.txt".'
            end if
        else
! May be due to a lack of READ priviledges for the file
            if ( present(stat) ) then
                stat = missing_file_failure
                return
            else
                write(output_unit, '("Procedure: ERROR_RAISER")' )
                write(output_unit, '("IOMSG =", a)' ) iomsg
                write(output_unit, '("IOSTAT = ", i0)' ) iostat
                error stop 'Missing "OLD" file "dummy.txt".'
            end if
        end if

    end subroutine error_raiser

end program test_integer_handling

Finally I show a modified version of @aradi's code in which I extend
his GENERAL_ERROR with IO_ERROR with the additional attributes
IOSTAT and IOMSG. This also requires defining an additonal
constructor for IO_ERROR. Because defining a new type and associated
constructor for each type of error strikes me as awkward and a pain, I
assume that ERRORCODE is used to distinguish different sub-types of
I/O errors.

module error_handling
    implicit none
    private

    public :: general_error, io_error, status, raise_error

    type :: general_error
        character(:), allocatable :: msg
        integer :: error_code
    contains
        procedure :: as_char => general_error_as_char
    end type general_error

    type, extends(general_error) :: io_error
        character(:), allocatable :: iomsg
        integer :: iostat
    end type io_error

    type :: status
        class(general_error), allocatable :: error
    contains
        procedure :: is_ok => status_is_ok
        procedure :: has_failed => status_has_failed
    end type status

    interface general_error
        module procedure construct_general_error
    end interface general_error

    interface io_error
        module procedure construct_io_error
    end interface io_error

contains

    pure function status_is_ok(this) result(isok)
        class(status), intent(in) :: this
        logical :: isok

        isok = .not. allocated(this%error)

    end function status_is_ok

    pure function status_has_failed(this) result(hasfailed)
        class(status), intent(in) :: this
        logical :: hasfailed

        hasfailed = allocated(this%error)

    end function status_has_failed

    function construct_general_error(msg, errorcode) &
        result(this)
        character(*), intent(in) :: msg
        integer, intent(in) :: errorcode
        class(general_error) :: this

        this%msg = msg
        this%error_code = errorcode

    end function construct_general_error

    function construct_io_error(msg, errorcode, iomsg, iostat) &
        result(this)
        character(*), intent(in) :: msg
        integer, intent(in) :: errorcode
        character(*), intent(in) :: iomsg
        integer, intent(in) :: iostat
        class(io_error) :: this

        this % msg = msg
        this % error_code = errorcode
        this % iomsg = iomsg
        this % iostat = iostat

    end function construct_io_error

    function general_error_as_char(this) result(errorchar)
        class(general_error), intent(in) :: this
        character(:), allocatable :: errorchar

        character(100) :: buffer

        write(buffer, "(' (Error code: ',I0,')')") this%error_code
        errorchar = this%msg // trim(buffer)

    end function general_error_as_char

    subroutine raise_error(stat, error)
        type(status), optional, intent(out) :: stat
        class(general_error), intent(in) :: error

        character(:), allocatable :: errormsg

        if (present(stat)) then
            stat%error = error
            return
        else
            errormsg = error%as_char()
            error stop errormsg ! Legal in F18 but not in F08
        end if

    end subroutine raise_error

end module error_handling


program test_error_handling
    use error_handling
    use error_codes, only: success, missing_file_failure, &
        open_failure

    type(status) :: stat
    call error_raiser(stat)
    if (stat%has_failed()) then
        print *, "Error occured: ", stat%error%as_char()
    end if

contains

    subroutine error_raiser(stat)
        type(status), optional, intent(out) :: stat
        logical :: exist
        integer :: iostat, lun
        character(256) :: iomsg

        open( newunit=lun, file='dummy.txt', status='old', &
              form='formatted', action='read', err=999,    &
              iostat=iostat, iomsg=iomsg )
        return

999     inquire( file='dummy.txt' exist=exist )
        if ( exist ) then
! May never happen
            if ( present(stat) ) then
                call raise_error( stat, io_error( msg='Unable' // &
                    ' to open existing "OLD" file "dummy.txt".', &
                    errorcode = open_failure, errmsg=trim(iomsg), &
                    stat=iostat ) )
                return
            else
                write(output_unit, '("Procedure: ERROR_RAISER")' )
                write(output_unit, '("IOMSG =", a)' ) iomsg
                write(output_unit, '("IOSTAT = ", i0)' ) iostat
                error stop 'Unable to open existing "OLD" file ' // &
                    '"dummy.txt".'
            end if
        else
! May be due to a lack of READ priviledges for the file
            if ( present(stat) ) then
                call raise_error( stat, io_error( msg='Missing' // & 
                    ' "OLD" file "dummy.txt".', &
                    errorcode = missing_file_failure, &
                    errmsg=trim(iomsg), &
                    stat=iostat )
                return
            else
                write(output_unit, '("Procedure: ERROR_RAISER")' )
                write(output_unit, '("IOMSG =", a)' ) iomsg
                write(output_unit, '("IOSTAT = ", i0)' ) iostat
                error stop 'Missing "OLD" file "dummy.txt".'
            end if
        end if

    end subroutine error_raiser

end program test_error_handling

@aradi
Copy link
Member Author

aradi commented Aug 25, 2020

As for the third version, considering our earlier discussions, I'd suggest to use the error type directly, and its allocation status as status flag (not allocated: OK, allocated: error occured). This is more direct and probably also somewhat more efficient. So, it would be something like:

module error_handling
  implicit none
  private

  public :: general_error, io_error

  type :: general_error
    character(:), allocatable :: msg
    integer :: error_code
  contains
    procedure :: as_char => general_error_as_char
  end type general_error

  type, extends(general_error) :: io_error
    character(:), allocatable :: iomsg
    integer :: iostat
  end type io_error

  interface general_error
    module procedure construct_general_error
  end interface general_error

  interface io_error
    module procedure construct_io_error
  end interface io_error

contains

  function construct_general_error(msg, errorcode) result(this)
    character(*), intent(in) :: msg
    integer, intent(in) :: errorcode
    type(general_error) :: this

    this%msg = msg
    this%error_code = errorcode

  end function construct_general_error


  function construct_io_error(msg, errorcode, iomsg, iostat) result(this)
    character(*), intent(in) :: msg
    integer, intent(in) :: errorcode
    character(*), intent(in) :: iomsg
    integer, intent(in) :: iostat
    type(io_error) :: this

    this%msg = msg
    this%error_code = errorcode
    this%iomsg = iomsg
    this%iostat = iostat

  end function construct_io_error


  function general_error_as_char(this) result(errorchar)
    class(general_error), intent(in) :: this
    character(:), allocatable :: errorchar

    character(100) :: buffer

    write(buffer, "(' (Error code: ',I0,')')") this%error_code
    errorchar = this%msg // trim(buffer)

  end function general_error_as_char


end module error_handling


program test_error_handling
  use error_handling
  use error_codes, only: success, missing_file_failure, open_failure

  type(io_error), allocatable :: error

  call ioerror_raiser(error)
  if (allocated(error)) then
    print *, "Error occured: ", error%as_char()
  end if

contains

  subroutine ioerror_raiser(error)
    type(io_error), allocatable, intent(out), optional :: error

    logical :: exist
    integer :: iostat, lun
    character(256) :: iomsg

    open(newunit=lun, file='dummy.txt', status='old', form='formatted',&
        & action='read', err=999, iostat=iostat, iomsg=iomsg)
    return

999 inquire(file='dummy.txt', exist=exist)
    if (exist) then
      ! May never happen
      if (present(error)) then
        error = io_error(msg='Unable to open existing "OLD" file "dummy.txt"',&
            & errorcode=open_failure, iomsg=trim(iomsg), iostat=iostat)
        return
      else
        write(output_unit, '("Procedure: ERROR_RAISER")' )
        write(output_unit, '("IOMSG =", a)' ) iomsg
        write(output_unit, '("IOSTAT = ", i0)' ) iostat
        error stop 'Unable to open existing "OLD" file ' // &
            '"dummy.txt".'
      end if
    else
      ! May be due to a lack of READ priviledges for the file
      if (present(error)) then
        error = io_error(msg='Missing "OLD" file "dummy.txt".', &
            & errorcode = missing_file_failure, iomsg=trim(iomsg),&
            & iostat=iostat)
        return
      else
        write(output_unit, '("Procedure: ERROR_RAISER")' )
        write(output_unit, '("IOMSG =", a)' ) iomsg
        write(output_unit, '("IOSTAT = ", i0)' ) iostat
        error stop 'Missing "OLD" file "dummy.txt".'
      end if
    end if

  end subroutine ioerror_raiser

end program test_error_handling

@jvdp1 jvdp1 mentioned this issue Oct 14, 2020
@awvwgk awvwgk added implementation Implementation in experimental and submission of a PR meta Related to this repository labels Sep 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
implementation Implementation in experimental and submission of a PR meta Related to this repository
Projects
None yet
Development

No branches or pull requests

10 participants