Create a testsuite for the standard #57
@certik this will be tremendously helpful. Do you think GitHub can be used for this? And do you think it will be possible to get a "critical mass" of support from the Fortran committee to engage and contribute to such an effort? For example, to review and confirm the standard-conformance of each of the tests in a suite because such a "blessing" is what will validate the test suite. |
@FortranFan Yes, GitHub or GitLab can be used. Regarding convincing the committee: as we keep discussing and proposing the various ideas in this repository, I think it will become clear which ideas are very popular, and then we should all work towards convincing the committee to do those. |
A thorough suite of positive and negative tests for every feature, requirement, and constraint in the standard would be incredibly valuable to implementations, as well as to commercial customers requiring proof of conformance. |
For FORTRAN 77, there was an official ANSI test suite; vendors would pay the US government annually to come in, run it, and certify the results. The suite was rather simplistic overall and never changed. ANSI/NIST discontinued it.

There have been several attempts to create a more comprehensive test suite for newer revisions. The Hendrickson/Spackman SHAPE95 suite was licensed by multiple vendors, and it was good, but work stopped on it. While I was at Intel I became aware of an Italian user, whose name escapes me, who appeared to be creating a useful test suite covering all the syntax, but maybe not the full semantics. I have no idea what happened to that, as we stopped hearing from him (bug reports) after a while.

This is a far bigger task than you might imagine, and requires serious, ongoing resources. It is not something the Fortran committee, all volunteers taking time from their day jobs, can develop or even specify/manage. As a former vendor, I can tell you that vendors would be happy to pay for a good test suite with ongoing maintenance and support. But who is going to create it? These vendors (and I'll include freeware developers in this) all have their own test suites, largely composed of collected applications, unit and regression tests; but they are far from comprehensive, no matter how well intentioned.

Writing a good test is much harder than you might think. Compiler developers are the wrong people to write tests, as they will test what they think the feature is supposed to do. For a while, Intel had a dedicated team, separate from the compiler group, writing tests based on the documentation; it worked very well and uncovered many bugs, but the team, funded by a different organization, was disbanded and we lost that resource.

The big problem as I see it is that the market for such a test suite is small compared to the cost of creating and maintaining it.
The only way I could see this happening is if some government funded the effort, as they did in the F77 days, and I don't see that happening today, especially for Fortran. So, yes, it would be fantastic if a good, modern Fortran test suite existed and was maintained. Who is going to do it? |
@sblionel Thanks a lot for your feedback, I really appreciate it. My approach is to first figure out ideas that the community would like to see. This seems to be one of the more popular ideas. Once we have established that we want to get this done, we can shift the discussion into how to get it done. I can see several approaches, and there might be more ways to get it done:
So it might be possible to secure and facilitate the funding. The next question is who will do that. Again, there are several approaches here:
Somebody would need to assemble a team and lead this effort and try to get it funded. This comment can serve as some ideas how to do so. |
@certik wrote:
@certik, there might be some people like me who are restricted in terms of what they can do with GitHub (e.g., basic use via browser ok, but not much else) but who can contribute some unit tests for consideration toward inclusion in the "modern Fortran testsuite". If it's possible for you or others on such a team on the "standard" testsuite to set up some kind of infrastructure for such a suite (perhaps on GitHub itself?) and some mechanism for community submissions which can then be reviewed for correctness/suitability, etc., particularly in terms of standard conformance (or lack thereof as intended failures), and approved for inclusion in the suite, that might be another resource this team can start drawing upon.

Over the years, I've assembled quite a few "informal" and small unit tests on a variety of standard features including OO facilities, parameterized derived types, UDDTIO, coarrays, interoperability with C, pure and elemental subprograms, submodules, block constructs, etc., which I can start to "clean up" and share with the team working on the "standard testsuite". And there might be other community members willing to contribute similarly.

To show an example, here's a case I've been grappling with the last couple of weeks: I think the code is conforming per my reading of the current standard; however, it fails with 2 processors I tried, which then makes me wonder if the compilers have bugs in their implementations or whether I'm in the wrong. If it is the former, i.e., compiler bugs, then this might be a possible case for inclusion in the "modern Fortran testsuite":

#include <stdio.h>
#include <assert.h>
#include "ISO_Fortran_binding.h"
void Csub(const CFI_cdesc_t *, size_t);
void Csub(const CFI_cdesc_t * dv, size_t locd) {
CFI_index_t lb[1];
lb[0] = dv->dim[0].lower_bound;
size_t ld = (size_t)CFI_address(dv, lb);
printf("In C function: CFI_address of dv = %lx\n", ld);
assert( ld == locd );
return;
}
Here are the compilation and linking steps using MinGW gfortran, along with program execution, which shows the failure:
|
@FortranFan I will set up the infrastructure very soon and I will update this issue when I do so. Here is my plan so far:

The testsuite would be separate and independent of any compiler. It would contain metadata about what feature is being tested, which files, and any other useful info, and it would also contain a (probably Python) library showing how to load all the tests and process them. There can be lots of backends, either as part of the test suite or separate, for things like an autogenerated CMake build system to test any compiler, or to generate a nice website with the coverage for each compiler, etc.

Then compilers, such as LFortran, can have scripts that take this and process the files and create an LFortran-specific testsuite from it; for example, I'd like to generate parts of this test file. Other compilers could do something similar. For example, for Flang the files in https://github.com/flang-compiler/f18/tree/fdb351ca2afb0d71028785da4687113343e11f54/test/semantics could be generated from such a testsuite. One downside is that the Flang testsuite has compiler-dependent checks for error messages, so it might not be possible to generate such tests from a compiler-independent test suite.

One worry that I have is that if the testsuite files change (for whatever reason), then when the Flang- or LFortran-specific testsuite gets regenerated, it will break (the test won't pass anymore). Or to formulate it another way: to what extent can a compiler testsuite be "outsourced" to a compiler-independent testsuite? Can it even work? I don't have the answer. If it cannot be done, then each compiler still has to maintain its own testsuite, and the compiler-independent "standard" testsuite is then used to automatically check what features are being implemented by each compiler; each compiler can also at least take the standard testsuite as a starting point and adapt it to its needs.

@klausler what are your thoughts on this? |
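The metadata-plus-loader idea in the plan above could look roughly like the following. This is only a sketch under stated assumptions: the JSON layout, the field names (`id`, `feature`, `files`, `expect`), the file names, and the helper names `CaseRecord`, `load_manifest`, and `by_feature` are all hypothetical and not part of any existing testsuite.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """One entry of a hypothetical, compiler-independent test manifest."""
    case_id: str   # identifier of the test
    feature: str   # feature of the standard being exercised
    files: tuple   # source files that make up the test
    expect: str    # assumed labels: "run-ok", "runtime-error", "compile-error"


def load_manifest(text):
    """Parse a JSON manifest (a list of objects) into CaseRecord values."""
    records = []
    for entry in json.loads(text):
        records.append(CaseRecord(
            case_id=entry["id"],
            feature=entry["feature"],
            files=tuple(entry["files"]),
            expect=entry.get("expect", "run-ok"),  # default is an assumption
        ))
    return records


def by_feature(records, feature):
    """Select the tests that exercise one feature."""
    return [r for r in records if r.feature == feature]


# Hypothetical manifest; the file names are made up for illustration.
MANIFEST = """
[
  {"id": "Test-1.F2018-8.7", "feature": "implicit-typing",
   "files": ["implicit_block_1.f90"]},
  {"id": "Test-1.F2018-7.5.6", "feature": "finalization",
   "files": ["final_1.f90"], "expect": "run-ok"}
]
"""

if __name__ == "__main__":
    for record in load_manifest(MANIFEST):
        print(record.case_id, record.feature, ", ".join(record.files))
```

Keeping the manifest as plain data means a compiler project could read it with its own tooling and ignore the loader entirely.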
I think that the tests themselves are what's important, not the test suite infrastructure around them, so I would avoid getting too tightly integrated with tools like Python that change incompatibly over the years. Test cases that can be run manually through a compiler without knowledge of infrastructure are the most useful. In practical terms, I suggest making a distinction between three kinds of tests:
Tests in the first group should be as self-contained as possible and execute in such a way that it's easy to determine from the shell or other tools that they terminated happily; perhaps via

Tests in the second group should terminate with an expected error code at runtime; if they run to completion, they should indicate to the shell or other tools that they failed to detect a required error, perhaps via

Tests in the third group should have their source code marked with comments describing expected error messages. Testing this group against a particular compiler may be best done by capturing error messages from that compiler, validating them manually, and saving them as "known good" compiler output, deviation from which may indicate errors later. If a compiler fails to catch an error and the program somehow survives to execution, it should crash with a message, perhaps via |
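One way to read the three groups is as an expectation on two observable facts: whether the compiler accepted the source, and what exit status the program returned. The sketch below encodes that reading. The labels `run-ok`, `runtime-error`, and `compile-error` are made-up names, and it assumes that a successful run exits with status 0 and a failed one with a nonzero status, which the Fortran standard leaves largely to the processor.

```python
import subprocess


def verdict(kind, compiled, exit_code):
    """Return True when the observed outcome matches the expectation.

    kind      -- "run-ok", "runtime-error" or "compile-error" (assumed labels)
    compiled  -- True if the compiler accepted the source
    exit_code -- exit status of the executable, or None if it was not run
    """
    if kind == "compile-error":
        # The test passes only if the compiler rejected the source.
        return not compiled
    if not compiled:
        return False
    if kind == "run-ok":
        return exit_code == 0
    if kind == "runtime-error":
        return exit_code is not None and exit_code != 0
    raise ValueError(f"unknown test kind: {kind!r}")


def run_case(kind, compile_cmd, run_cmd):
    """Compile and, if that succeeds and makes sense, run one test.

    compile_cmd and run_cmd are argument lists supplied by the caller,
    so no particular compiler or flag is assumed here.
    """
    compiled = subprocess.run(compile_cmd).returncode == 0
    exit_code = None
    if compiled and kind != "compile-error":
        exit_code = subprocess.run(run_cmd).returncode
    return verdict(kind, compiled, exit_code)
```

Because `verdict` only looks at the compile result and the exit status, the same tests can still be run by hand from a shell, without any of this tooling.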
@klausler thanks for the feedback. What would be some examples in category 2? Things like

What's your opinion on whether compilers like Flang could (in principle) use such a testsuite, or whether compilers will have to maintain their own testsuite anyway? |
It's not either/or. I would love to have access to an authoritative suite of tests for the current revision of the standard, so that I wouldn't have to wade my way through all the ambiguity and imprecision in the text itself in an attempt to write tests for it. An authoritative test suite would reduce the size of f18's future test directories as the standard evolves; and if the committee had to develop or subcontract the development of such tests, perhaps ambiguity would be exposed earlier and fixed sooner. As is, we have access to many open- and closed-source test suites as well as applications, and depend heavily on all of them. Oh, one last point that's really more important than it might seem: the Fortran standard updates section, requirement, and constraint numbers in incompatible ways with each revision. It's important for f18 compiler and test source code to refer frequently to the standard document. When Fortran 202x arrives, all of those textual citations are going to have to be updated if the standard continues to gratuitously renumber them. The C++ language standard has moved to using names rather than numbers for these referential purposes. |
@klausler wrote:
I would like to make it abundantly clear that when I used the phrase "set up some kind of infrastructure for" a test suite, I only meant something that can help community members submit cases, like the example I showed upthread, for consideration, and which can then be reviewed by the team for suitability. It'll be useful to have some (basic) indexing scheme to reference the tests (perhaps something that follows the numbering scheme of features in the standard might help?) and preferably some description/comments/results summary to go along with these tests. |
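As an illustration of such an indexing scheme, identifiers of the shape used by the test cases in this thread (for example `Test-1.F2018-8.7`) can be split into a sequence number, a standard revision, and a section. The parser below is a sketch: the exact grammar of the identifiers is an assumption inferred from those examples, and `CaseId`, `parse_id`, and `sort_key` are hypothetical names.

```python
import re
from typing import NamedTuple


class CaseId(NamedTuple):
    number: int     # sequence number within the section
    revision: int   # standard revision year, e.g. 2018
    section: str    # section of that revision, or a word such as "general"


# Assumed shape: Test-<n>.F<year>-<section>
_ID = re.compile(r"^Test-(\d+)\.F(\d{4})-([\w.]+)$")


def parse_id(text):
    """Split an identifier such as 'Test-1.F2018-7.5.6' into its parts."""
    match = _ID.match(text)
    if match is None:
        raise ValueError(f"not a recognised test identifier: {text!r}")
    return CaseId(int(match.group(1)), int(match.group(2)), match.group(3))


def sort_key(case_id):
    """Order by revision, then numerically by section, then by number.

    Non-numeric sections such as 'general' sort before numbered ones.
    """
    pieces = case_id.section.split(".")
    if all(p.isdigit() for p in pieces):
        parts = tuple(int(p) for p in pieces)
    else:
        parts = ()
    return (case_id.revision, parts, case_id.number)
```

Because the revision is part of the identifier, a section number is always qualified by the standard it refers to, so renumbering in a later revision does not silently change what an existing test claims to cover.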
Re: numbering of clauses, constraints, and requirements: In Principles and rules for the structure and drafting of ISO and IEC documents section 5.6 (page 8):
If the series of Fortran standards constitute a "series of associated documents", future revisions should avoid needless renumberings of their contents. |
Again, I'll note that the numbering is largely due to ISO rules for standards documents. These are why the chapter numbers changed by three this time around and notes don't have section numbers. Syntax rules and constraints have to be sequentially numbered and there's no feasible way to keep the numbering consistent without freezing the language, as both get additions and deletions over time. Clauses (chapters) are consistent across versions, subject to ISO constraints, and added clauses (for example, C interoperability.) I have sympathy for code and documentation (including error messages) that want to refer to specific sections of the standard, but I just don't see a way to get there other than qualifying these references with a specific standard and updating them once the implementation fully supports a revision. The DEC/Compaq/Intel compiler had many such references in error messages, but we ended up taking them out. |
Why? Because they're assigned by LaTeX and you don't want to change that, or is there an external requirement from INCITS or ISO that they be so? |
You'd need to ask Malcolm Cohen for details, but I think so. This would be ISO, not INCITS. I think it would be chaos if rules and constraints weren't sequential, and what do you do if you want to add new rules in between old ones? Constraints that apply to syntax rules appear in order of the rule (and come before those that don't apply to rules.) To me, this is like requiring that words in the dictionary always appear on the same page number. |
Here's a case for consideration toward a test suite for the standard:

! Unit Test #: Test-1.F2018-8.7
! Author : FortranFan
! Reference : https://j3-fortran.org/doc/year/18/18-007r1.pdf
!
! Description:
! Section 8.7 IMPLICIT statement in above pdf
! c.f. page 116 NOTE 3:
! Implicit typing is not affected by BLOCK constructs
!
integer :: x
x = fn()
print *, "x = ", x
if ( x /= 42 ) then
error stop "FAILURE: expected function return is 42."
else
stop "SUCCESS"
end if
contains
function fn() result(r)
integer :: r
block
i = 42
end block
r = i
end function
end

One processor I tried works as I expect with this test whereas another doesn't:
|
Here's a variant of the case shown in #57 (comment) for consideration:

! Unit Test #: Test-2.F2018-8.7
! Author : FortranFan
! Reference : https://j3-fortran.org/doc/year/18/18-007r1.pdf
!
! Description:
! Section 8.7 IMPLICIT statement in above pdf
! c.f. page 116 NOTE 3:
! Implicit typing is not affected by BLOCK constructs
!
integer :: x
x = fn()
print *, "x = ", x
if ( x == 42 ) then
error stop "Expected x is some arbitrary processor-dependent value, not 42."
end if
contains
function fn() result(r)
integer :: r
block
integer :: i
i = 42
end block
r = i
end function
end

Both the processors I tried appear to get this case right:
|
A recent discussion at comp.lang.fortran involving a type-bound procedure for a defined assignment and the ELEMENTAL attribute refers to compiler issues. Here's a case which might be useful for a standard test suite; my take is that 2 processors currently get this wrong.

module b_m
type :: b_t
integer :: i = 0
logical :: defined_assignment = .false.
contains
procedure, pass(lhs) :: assign_b_t
generic :: assignment(=) => assign_b_t
end type
contains
elemental subroutine assign_b_t( lhs, rhs )
! Argument list
class(b_t), intent(inout) :: lhs
class(b_t), intent(in) :: rhs
lhs%i = rhs%i
lhs%defined_assignment = .true.
end subroutine
end module
program case1
use b_m, only : b_t
type, extends(b_t) :: e_t
end type
type :: f_t
type(e_t) :: e
end type
type(f_t) :: foo(2), bar(2)
bar = foo
print *, "bar(1)%e%defined_assignment = ", bar(1)%e%defined_assignment, "; expected value is T."
if ( .not. bar(1)%e%defined_assignment ) error stop "Program did not work as expected."
stop "SUCCESS"
end program case1

Upon execution of the program compiled using gfortran, the run-time behavior is:
|
Here's an even simpler scenario involving a derived type containing a component which is a derived type with a defined assignment that one processor appears to treat wrongly:

module b_m
type :: b_t
integer :: i = 0
logical :: defined_assignment = .false.
contains
procedure, pass(lhs) :: assign_b_t
generic :: assignment(=) => assign_b_t
end type
contains
elemental subroutine assign_b_t( lhs, rhs )
! Argument list
class(b_t), intent(inout) :: lhs
class(b_t), intent(in) :: rhs
lhs%i = rhs%i
lhs%defined_assignment = .true.
end subroutine
end module
program case2
use b_m, only : b_t
type :: c_t
type(b_t) :: b
end type
type(c_t) :: foo(2), bar(2)
bar = foo
print *, "bar(1)%b%defined_assignment = ", bar(1)%b%defined_assignment, "; expected value is T."
if ( .not. bar(1)%b%defined_assignment ) error stop "Program did not work as expected."
stop "SUCCESS"
end program case2

The program output unexpectedly is as follows:
|
I haven't read this entire thread, but I like the original idea and have been thinking for some time about repurposing the AdHoc repository to automate and democratize the generation of a standards-conformance table. Because AdHoc stores compiler bug reproducers, its build scripts are designed to continue building the remainder of the repository after a compile-time error occurs. This makes it feasible to generate a standards-conformance table based on a test suite that includes code with features not supported by the involved compilers. The build scripts could automatically generate a table in GitHub Markdown structured something like the following:
Each entry would correspond to a unique subdirectory. Contributors would submit pull requests with tests that enable the build/test scripts to set the value of each entry according to the following rules:
For example, the following directory tree would yield the above table if the tests
For free compilers, the tests could be run at no cost using GitHub continuous integration features. For non-free compilers, another mechanism for running the tests might be required. The degree of comprehensiveness of the test suite would be determined by the community. |
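A table of that kind could be produced from per-test results with very little code. The sketch below makes several assumptions: the status strings (`pass`, `fail`, `untested`), the column layout, and the function name `conformance_table` are illustrative only, and the results are passed in as a plain dictionary rather than collected from a directory tree or from AdHoc's build scripts.

```python
def conformance_table(results, compilers):
    """Render a GitHub Markdown table of features versus compilers.

    results   -- {feature: {compiler: status}}, where status is a short
                 string such as "pass" or "fail" (assumed vocabulary)
    compilers -- column order; compilers with no result show "untested"
    """
    header = "| Feature | " + " | ".join(compilers) + " |"
    rule = "|" + " --- |" * (len(compilers) + 1)
    rows = [header, rule]
    for feature in sorted(results):
        cells = [results[feature].get(c, "untested") for c in compilers]
        rows.append("| " + feature + " | " + " | ".join(cells) + " |")
    return "\n".join(rows)


if __name__ == "__main__":
    # Made-up results, purely to show the output format.
    demo = {
        "submodules": {"compiler-a": "pass", "compiler-b": "fail"},
        "finalization": {"compiler-a": "fail"},
    }
    print(conformance_table(demo, ["compiler-a", "compiler-b"]))
```

Since the output is ordinary Markdown text, a continuous integration job could regenerate it on every pull request and commit it next to the tests.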
A case for consideration in a test suite for the Fortran standard, this one with finalization.

! Unit Test #: Test-1.F2018-7.5.6
! Author : FortranFan
! Reference : https://j3-fortran.org/doc/year/18/18-007r1.pdf
!
! Description:
! Section 7.5.6.3 When finalization occurs in above pdf
! c.f. page 80 paragraph starting line 17:
! When finalization occurs
!
module m
type :: t
character(len=12) :: name = "default"
contains
final :: final_t
end type
interface t
module procedure construct_t
end interface
contains
function construct_t( name ) result(r)
character(len=*), intent(in), optional :: name
type(t) :: r
if ( present(name) ) r%name = name
end function
subroutine final_t( this )
type(t), intent(inout) :: this
print *, "final_t: this%name = ", this%name
return
end subroutine
subroutine sub1()
type(t), allocatable :: foo
foo = t( name="constructor" )
foo%name = "foo"
end subroutine
subroutine sub2()
type(t), allocatable :: foo
allocate( foo )
foo = t( name="constructor" )
foo%name = "foo"
end subroutine
end module
blk1: block
use m, only : sub1
print *, "Block 1: Two lines from final_t are expected"
call sub1()
end block blk1
print *
blk2: block
use m, only : sub2
print *, "Block 2: Three lines from final_t are expected"
call sub2()
end block blk2
end

Consider the program output using
Whereas the output I expect per the Fortran standard is this:
|
Here's a variant of the previous case; in this one the expr on the right-hand side includes a component of the variable on the LHS:

! Unit Test #: Test-2.F2018-7.5.6
! Author : FortranFan
! Reference : https://j3-fortran.org/doc/year/18/18-007r1.pdf
!
! Description:
! Section 7.5.6.3 When finalization occurs in above pdf
! c.f. page 80 paragraph starting line 17:
! When finalization occurs
!
module m
type :: t
character(len=12) :: name = "default"
contains
private
procedure, pass(lhs) :: add_t
procedure, pass(this) :: clone_t
generic, public :: operator(+) => add_t
generic, public :: clone => clone_t
final :: final_t
end type
interface t
module procedure construct_t
end interface
contains
function construct_t( name ) result(r)
character(len=*), intent(in), optional :: name
type(t) :: r
if ( present(name) ) r%name = name
end function
subroutine final_t( this )
type(t), intent(inout) :: this
print *, "final_t: this%name = ", this%name
return
end subroutine
function add_t( lhs, rhs ) result(r)
class(t), intent(in) :: lhs
type(t), intent(in) :: rhs
type(t) :: r
r%name = trim(lhs%name) // "+" // trim(rhs%name)
end function
function clone_t( this ) result(r)
class(t), intent(in) :: this
type(t) :: r
r%name = trim(this%name) // "*"
end function
subroutine sub()
type(t), allocatable :: foo
foo = t( name="constructor" )
print *, "1"
foo%name = "foo"
foo = foo%clone()
print *, "2"
foo = foo + foo
print *, "3"
end subroutine
end module
use m, only : sub
call sub()
end

gfortran program output is
The expected program output is
|
There is now a project idea for Google Summer of Code (GSoC) 2021 for Fortran-lang to create such a testsuite: Obviously it would initially be limited to what can be achieved over the summer and then the community can contribute more tests over time. If anyone knows about a student, please direct them to the page to apply. |
as @certik suggested in #200, scalar expressions used in array shape and pdt declarations should probably be included in any conformance test suite, as the related language features are not well supported in any compiler. providing tests for this sort of language feature is pretty tricky because implementations will necessarily include lots of branching depending on the contents of the expression and the surrounding context. sanity checking and codegen bugs can be hidden in eg support for "specification functions" that are very difficult to uncover with unit testing. property-based testing ala haskell's quickcheck is super helpful for this sort of thing, but that would make implementation of a test suite a lot more complicated. i really have no idea what the ideal approach for testing these features should be, but it seems important to include. |
I think for the array shapes and PDT we simply have to include lots of cases and try to get as many corner cases as we can included. Then as people report bugs in a specific compiler that is not caught by the test suite, we simply add the case in. |
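Short of full property-based testing in the QuickCheck style, many shape cases can be generated mechanically. The sketch below builds small random integer expressions over a dummy argument `n`, evaluates each one in Python to get the expected extent, and emits a Fortran program that declares an automatic array with that bound and checks `size`. The generator is limited to `+` and `*` on small positive integers so that Python and Fortran integer arithmetic agree. The program layout and the names `random_expr` and `shape_test` are assumptions made for illustration.

```python
import random


def random_expr(rng, depth):
    """Build a small integer expression over `n` using only + and *."""
    if depth == 0:
        return rng.choice(["n", str(rng.randint(1, 4))])
    op = rng.choice(["+", "*"])
    left = random_expr(rng, depth - 1)
    right = random_expr(rng, depth - 1)
    return f"({left} {op} {right})"


def shape_test(expr, n):
    """Emit Fortran source checking size() of an array bounded by expr."""
    # expr comes from random_expr, so evaluating it here is safe; the
    # result is the extent the Fortran program is expected to report.
    expected = eval(expr, {"__builtins__": {}}, {"n": n})
    return "\n".join([
        "program shape_case",
        "  implicit none",
        f"  call check({n})",
        "contains",
        "  subroutine check(n)",
        "    integer, intent(in) :: n",
        f"    real :: a({expr})",
        f"    if (size(a) /= {expected}) error stop \"FAILURE: unexpected array size\"",
        "  end subroutine check",
        "end program shape_case",
    ])


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed so the generated cases are reproducible
    print(shape_test(random_expr(rng, 2), n=3))
```

Generated cases like these complement hand-written corner cases and bug reproducers rather than replacing them. Extending the generator to PDT length parameters or specification functions would need considerably more care.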
The following "silly" case shows a memory leak on Windows using gfortran, but not with Intel Fortran, per an in-house proprietary memory checker utility. I use this silly case as a quick test, followed by a host of other validation steps, when a team I work with needs to consider a newer version of the Intel Fortran compiler for production use. Can someone please run

! Unit Test #: Test-0.F2018-general
! Author : FortranFan
! Reference : https://j3-fortran.org/doc/year/18/18-007r1.pdf
!
! Description:
! A highly contrived but general test using various features of the
! Fortran standard including the use of ALLOCATABLE local objects,
! a derived type with a finalizer toward a component with the POINTER
! attribute, and enhanced interoperability with C to check for
! memory leaks
!
module cstring_m
use, intrinsic :: iso_c_binding, only : c_size_t, c_ptr
interface
function strlen( ps ) result(slen) bind(C, name="strlen")
import :: c_size_t, c_ptr
type(c_ptr), intent(in), value :: ps
integer(c_size_t) :: slen
end function
end interface
end module
module foo_m
use, intrinsic :: iso_c_binding, only : c_char, c_size_t, c_ptr, c_f_pointer
use cstring_m, only : strlen
type :: foo_t
private
class(*), pointer :: d => null()
contains
final :: clean_foo
procedure :: set, get
end type
contains
impure elemental subroutine clean_foo( this )
type(foo_t), intent(inout) :: this
if ( associated(this%d) ) then
deallocate(this%d)
end if
this%d => null()
end subroutine
subroutine set( this, ps )
class(foo_t), intent(inout) :: this
type(c_ptr), intent(in), value :: ps
integer(c_size_t) :: slen
slen = strlen(ps)
if ( slen <= 0 ) error stop
block
character(kind=c_char,len=slen), pointer :: cs
character(kind=c_char,len=:), allocatable :: s
call c_f_pointer( cptr=ps, fptr=cs )
s = cs
cs => null()
call clean_foo( this )
allocate( this%d, source=s )
end block
end subroutine
function get( this ) result(r)
class(foo_t), intent(in) :: this
class(*), allocatable :: r
if (.not. associated(this%d)) error stop
allocate( r, source=this%d )
end function
end module
module bar_m
use, intrinsic :: iso_c_binding, only : c_char, c_loc
use foo_m, only : foo_t
type :: bar_t
class(foo_t), allocatable :: foo
end type
interface bar_t
module procedure :: construct_bar
end interface
contains
function construct_bar( msg ) result(r)
character(kind=c_char, len=*), intent(in), target :: msg
type(bar_t) :: r
allocate( foo_t :: r%foo )
call r%foo%set( c_loc(msg) )
end function
end module
module foobar_m
use, intrinsic :: iso_c_binding, only : c_char, c_size_t
interface
#ifndef __GFORTRAN__
! gfortran doesn't yet adequately support enhanced interoperability with C
subroutine getmsg_a( str ) bind(C, name="getmsg_a")
import :: c_char
character(kind=c_char, len=:), allocatable, intent(out) :: str
end subroutine
#endif
subroutine getmsg_p( s, lens ) bind(C, name="getmsg_p")
import :: c_char, c_size_t
character(kind=c_char, len=1), intent(inout) :: s(*)
integer(c_size_t), intent(in), value :: lens
end subroutine
end interface
contains
function msg() result(s)
character(kind=c_char, len=:), allocatable :: s
#ifdef __GFORTRAN__
! Workaround due to gfortran issue with CHARACTER scalar and C interop
allocate( character(kind=c_char, len=14) :: s )
block
integer(c_size_t) :: lens
lens = int( len(s), kind=kind(lens) )
call getmsg_p( s, lens )
end block
#else
call getmsg_a( s )
#endif
end function
end module
block
use, intrinsic :: iso_c_binding, only : c_char
use bar_m, only : bar_t
use foobar_m, only : msg
class(bar_t), allocatable :: bar
class(*), allocatable :: c
bar = bar_t( msg() )
c = bar%foo%get()
select type ( s => c )
type is ( character(len=*) )
print *, s
end select
end block
end

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "ISO_Fortran_binding.h"
const char msg[] = "Hello World!";
// Silly function for a Fortran test
void getmsg_a( CFI_cdesc_t *str ) {
size_t lens = strlen(msg);
int irc = CFI_allocate(str, (CFI_index_t *)0, (CFI_index_t *)0, lens);
if (irc == 0) {
memcpy(str->base_addr, msg, lens);
}
}
// Silly function for a Fortran test
void getmsg_p( char *s, size_t lens ) {
memcpy(s, msg, lens);
}

Expected program behavior:
|
Currently each compiler must develop its own tests for every feature in the standard. One could imagine, down the road, that the committee can maintain a "blessed" set of tests for each feature in the standard, which if the compiler passes, then the feature can be considered "implemented".
One can then maintain an automatic matrix of features and compilers to see which compilers implement which features.
Such a testsuite for each feature can be a nice complement to the standard, giving an example for all the corner cases how a certain feature should behave.
This might seem like a lot of work to do from scratch, but we can start doing it for every new feature from now on. And eventually implement the tests for old features as time allows. Doing it for new features would not be as hard, since the committee spends a lot of time designing each feature carefully. So writing tests for it might even make the process easier.