
Returning array from function #114

Open
leonfoks opened this issue Jan 19, 2020 · 23 comments
Labels
implementation: Implementation in experimental and submission of a PR
meta: Related to this repository

Comments

@leonfoks

I was going to ask this in #113 but didn't want to derail that thread.
This is more of a personal question... since I've actually done very little with returning an allocatable array from a function. I may have flawed logic, so please correct me if I'm wrong.
I've had compile issues in the past returning allocatables from functions (can't remember all the details, but perhaps others have had these too), and I "intuitively" thought there might be a slowdown if those functions need to be called multiple times.

I'll use the snippet from #113

function mean(mat, dim) result(res)
    real(sp), intent(in) :: mat(:,:)
    integer, intent(in), optional :: dim !dim = 1 or dim = 2 
    real(sp), allocatable :: res(:)
end function

If I have a large array, and I call this multiple times, does it reallocate memory for that variable every time? Or is there some cleverness that happens?

e.g. the 1D result is already allocated, the function allocates its own result, and on return that result is copied into the pre-allocated array.

real(sp), allocatable :: x(:), A(:, :)

! A = full of numbers
allocate(x(size(A, dim=2)))

x(:) = mean(A, dim=1)

Is it known what happens specifically, memory-wise? Is it compiler dependent?

I'm asking because it will be slower if allocations keep happening; we might want to be careful and always provide subroutine versions where preallocated memory can be used without any implicit allocations and copies. Am I worrying about this too much? haha
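
For illustration, a minimal sketch of the subroutine alternative described above (mean_sub is a hypothetical name, not an existing stdlib routine; sp is the same real kind parameter as in the snippet). The caller owns the storage, so repeated calls reuse the same memory with no implicit allocation:

subroutine mean_sub(mat, res, dim)
    real(sp), intent(in) :: mat(:,:)
    real(sp), intent(out) :: res(:)       ! preallocated by the caller
    integer, intent(in), optional :: dim
    integer :: d
    d = 1
    if (present(dim)) d = dim
    ! mean along dimension d; res must have the size of the remaining dimension
    res(:) = sum(mat, dim=d) / real(size(mat, dim=d), sp)
end subroutine

! Usage sketch: allocate once, then reuse the same storage across calls, e.g.
! allocate(x(size(A, dim=2)))
! do it = 1, niter
!     call mean_sub(A, x, dim=1)
! end do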

@certik
Member

certik commented Jan 19, 2020 via email

@leonfoks
Author

Phew, I was hoping I wasn't crazy haha

@nncarlson
Member

nncarlson commented Jan 19, 2020

I don't believe there is any performance difference between a function whose array result is automatically sized, for example

function foo(array) result(res)
real, intent(in) :: array(:,:)
real :: res(size(array,dim=2))
! ... compute res ...
end function

versus one whose result is allocatable. In either case the size of the result is determined at run time. So there is absolutely no reason to avoid allocatable results if one is determined to use a function.

Also, in the past I have found that if a function returns an array (non-allocatable), it can be slower than an equivalent subroutine with an intent(out) array; apparently there is an extra copy happening.

This is the real problem with array-valued functions, allocatable or not, and it has nothing to do with the allocatable attribute.
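
For comparison, a sketch of the allocatable-result form of the same function (foo_alloc is a placeholder name); either way, the size of the result is only known at run time:

function foo_alloc(array) result(res)
    real, intent(in) :: array(:,:)
    real, allocatable :: res(:)
    allocate(res(size(array, dim=2)))
    res = 0.0    ! ... compute into res ...
end function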

@milancurcic
Member

So there is absolutely no reason to avoid allocatable results if one is determined to use a function.

What about allocation on the stack vs. the heap? Yes, this isn't defined in the language and is implementation-specific, but I have found that in practice there is a difference:

  • A non-allocatable function result may be allocated on the stack, which is faster;
  • An allocatable function result is always allocated on the heap, which is slower.

I never measured this, but I have anecdotal personal experience of WRF being noticeably slower when compiled with -heap-arrays (the ifort flag that makes all arrays be allocated on the heap).

This is also the caveat with non-allocatable function (or subroutine) results (output arguments): for large arrays, the stack can overflow. This problem doesn't occur with allocatable arrays.
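
For illustration, a sketch of the two kinds of temporaries being compared; where each one actually ends up is compiler- and flag-dependent (e.g. -heap-arrays changes it), so the comments describe typical behaviour only:

subroutine work(n)
    integer, intent(in) :: n
    real :: tmp_auto(n)          ! automatic array: typically on the stack, fast to
                                 ! create, but may overflow the stack for large n
    real, allocatable :: tmp_heap(:)
    allocate(tmp_heap(n))        ! allocatable: typically on the heap, no stack
                                 ! limit, but the allocation itself costs more
    tmp_auto = 0.0
    tmp_heap = 0.0
end subroutine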

@leonfoks changed the title from "Returning allocatable array from function" to "Returning array from function" on Jan 20, 2020
@nncarlson
Member

I can certainly believe there is a performance difference between stack and heap allocated. But in the case of an automatic array result that must be allocated at run time, do you think the compiler inserts code to decide where to allocate (stack or heap) based on the size it determines for a particular invocation of the function? (Note I'm not talking about a size known at compile time.) I suppose that could be the case; do you know? But even if so, I think the far bigger cost is the one @certik pointed out with using a function at all.

@certik
Member

certik commented Jan 20, 2020 via email

@nncarlson
Member

nncarlson commented Jan 20, 2020

We should setup a project to carefully benchmark all these options.

That would be very interesting to see. And there are many other specific performance questions like this. Conclusions from such a project could feed into a "Fortran best practices" guide.

Note that gfortran and Intel can put an allocatable on a stack as well.

Yep. My experience with Intel is that I routinely have to unlimit the stack size to avoid hitting a segfault. I have no such issues with NAG, for example.

@certik
Member

certik commented Jan 20, 2020

Also, I want to set up benchmarks for operations on arrays (y(:) = a(:) + b(:)) compared to an explicit loop. The loop sometimes seems to be faster, and I have seen several codes move away from array expressions to explicit loops because of that.

Those are the things where I want Fortran compilers to do better. And a first step is to have reliable benchmarks showing where the "performance bug" is.
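
A minimal benchmark sketch along these lines, timing with system_clock (the numbers will of course depend on compiler, flags, and whether either form gets vectorised):

program bench_add
    implicit none
    integer, parameter :: n = 10000000
    real, allocatable :: a(:), b(:), y(:)
    integer :: i, t0, t1, rate

    allocate(a(n), b(n), y(n))
    call random_number(a)
    call random_number(b)

    call system_clock(t0, rate)
    y(:) = a(:) + b(:)                  ! whole-array expression
    call system_clock(t1)
    print *, 'array expression:', real(t1 - t0) / real(rate), 's'

    call system_clock(t0)
    do i = 1, n                         ! explicit loop
        y(i) = a(i) + b(i)
    end do
    call system_clock(t1)
    print *, 'explicit do loop:', real(t1 - t0) / real(rate), 's'

    print *, y(1), y(n)                 ! use y so the work cannot be optimised away
end program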

@leonfoks
Author

In that benchmark, a manually unrolled loop version would be cool to see. The compilers should already do a good job of this, but it would be nice for a benchmark to show whether it's actually worth it.
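
A manually unrolled variant could be dropped into the same benchmark sketch above, for example unrolled by four (for brevity this assumes n is a multiple of 4):

call system_clock(t0)
do i = 1, n, 4
    y(i)     = a(i)     + b(i)
    y(i + 1) = a(i + 1) + b(i + 1)
    y(i + 2) = a(i + 2) + b(i + 2)
    y(i + 3) = a(i + 3) + b(i + 3)
end do
call system_clock(t1)
print *, 'unrolled by 4:   ', real(t1 - t0) / real(rate), 's'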

@milancurcic
Member

This looks like a good candidate for a fortran-lang/benchmarks repo. It could include both benchmarks of stdlib and of the more fundamental operations discussed in this thread. It would be nice to have a pipeline there that automatically generates a report for all procedures/programs in the suite. Later on, when we change the implementations of some stdlib procedures that other procedures build on, it would be useful to verify that there is no significant performance regression from changing the implementation of a building block.

Besides these, I'm also curious about the overhead of calling a generic procedure that overloads some 50 or 100 specific procedures.
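
For reference, a sketch of what such a generic looks like (mean_sp and mean_dp are placeholder specific names); the compiler resolves each call to one specific at compile time, based on the actual arguments:

interface mean
    module procedure mean_sp    ! real(sp) version
    module procedure mean_dp    ! real(dp) version
    ! ... potentially dozens more specifics for other kinds and ranks ...
end interface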

@zerothi

zerothi commented Apr 6, 2020

Additionally, one should be careful with allocatable assignment; the code below is perfectly reasonable:

real, allocatable :: a(:)

a = [1, 2, 3]
a = [1, 2, 3, 4]

However, the compiler has to add size checks and do a deallocation/allocation in case the shapes aren't commensurate.
To keep the compiler from adding those checks, one has to write this:

a(:) = [1, 2, 3]

in which case it will segfault if the dimensions don't match.
If this library is low-level enough, it should not rely on automatic (re)allocation AND it should avoid the array-dimension checks (i.e. use a(:) = ).

@certik
Member

certik commented Apr 6, 2020

@zerothi indeed, that seems to be what is needed now. It seems very unfortunate that the natural syntax a = [1, 2, 3] has a runtime overhead, and one must use a(:) = [1, 2, 3] instead.

What's worse, say the code looks like this:

real :: a(3)
...
a = [1, 2, 3]

and then later you decide to change the declaration to allocatable:

real, allocatable :: a(:)
allocate(a(3))
...
a = [1, 2, 3]

Then in the line a = [1, 2, 3] there is now a performance overhead.

I wonder if for this reason one should never use a = [1, 2, 3] and instead always use a(:) = [1, 2, 3], even in:

real :: a(3)
...
a(:) = [1, 2, 3]

So that when the declaration is later changed to allocatable, there is no unexpected overhead.

@nncarlson
Member

The technique of using a(:) instead of a to avoid the reallocation checks, when one knows they aren't needed, is good. However, I'd be uncomfortable recommending a(:) in the case where a is not allocatable, without knowing what object code compilers will generate. Superficially the two are the same thing, but they are technically very different.

@zerothi

zerothi commented Apr 6, 2020

I don't think the compiler distinguishes between a and a(:) for non-allocatable arrays; I have never heard of performance degradation due to that. In any case, it is very easy to check...

Generally I use it, simply because it is an easy documentation reminder of the array dimensions ;)

@milancurcic
Member

What's worse, say the code looks like this:

real :: a(3)
...
a = [1, 2, 3]

and then later you decide to change the declaration to allocatable:

real, allocatable :: a(:)
allocate(a(3))
...
a = [1, 2, 3]

Then in the line a = [1, 2, 3] there is now a performance overhead.

Is the overhead due to a run-time check that the RHS has the same shape as the LHS, or because reallocation on assignment always happens, even if the shape is the same?

If the latter, that's compiler dependent, no? A compiler should be able to check at run-time if the array needs reallocating.

I wonder if for this reason one should never use a = [1, 2, 3] and instead always use a(:) = [1, 2, 3]

What happens if the re-allocation is desired, for example if size(a) == 3 and then you do a(:) = [1, 2, 3, 4]? In this case, I want re-allocation and not a run-time error.

@zerothi

zerothi commented Apr 6, 2020

What's worse, say the code looks like this:

real :: a(3)
...
a = [1, 2, 3]

and then later you decide to change the declaration to allocatable:

real, allocatable :: a(:)
allocate(a(3))
...
a = [1, 2, 3]

Then in the line a = [1, 2, 3] there is now a performance overhead.

Is the overhead due to a run-time check that the RHS has the same shape as the LHS, or because reallocation on assignment always happens, even if the shape is the same?

Basically

..., allocatable :: a
a = [1, 2, 3]

gets translated roughly to this:

logical :: dealloc
integer :: i
dealloc = .not. allocated(a)
do i = 1, rank(a)
    if (.not. dealloc) dealloc = size(a, dim=i) /= size(RHS, dim=i)
end do
if (dealloc) then
    if (allocated(a)) deallocate(a)
    allocate(a, mold=RHS)
end if
a(...) = RHS(...)

If the latter, that's compiler dependent, no? A compiler should be able to check at run-time if the array needs reallocating.

I have seen the same behaviour on both Intel and GCC, but yes, compilers could handle this differently.

I wonder if for this reason one should never use a = [1, 2, 3] and instead always use a(:) = [1, 2, 3]

What happens if the re-allocation is desired, for example if size(a) == 3 and then you do a(:) = [1, 2, 3, 4]? In this case, I want re-allocation and not a run-time error.

Well, then a(:) = is not an option for you; it won't trigger reallocation. That isn't allowed by the standard (as far as I remember).
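
To summarise the two behaviours in one sketch:

real, allocatable :: a(:)

a = [1, 2, 3]
a = [1, 2, 3, 4]         ! shapes differ: the assignment (re)allocates a to size 4
a(:) = [1, 2, 3, 4, 5]   ! shapes differ: no reallocation takes place; this is
                         ! non-conforming and may segfault or corrupt memory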

@milancurcic
Member

@zerothi Okay, got it, that's clear now. So this:

I wonder if for this reason one should never use a = [1, 2, 3] and instead always use a(:) = [1, 2, 3]

applies only when we know reallocation is not intended, which is a connection I didn't make at first.

@zerothi

zerothi commented Apr 6, 2020

applies only when we know reallocation is not intended, which is a connection I didn't make at first.

exactly.

@certik
Member

certik commented Apr 6, 2020

I have seen the same behaviour on both Intel and GCC, but yes, compilers could handle this differently.

@zerothi how could compilers handle it differently?

@zerothi

zerothi commented Apr 6, 2020

There could be flags that disable such checks and force the array dimensions to be the same. This is just a guess, and I don't know whether such a flag exists. Hence "could" ;)

@certik
Member

certik commented Apr 6, 2020

Both GFortran and Intel have such an option, see, e.g.: https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-standard-realloc-lhs. I didn't know if that's what you were talking about, or something else.

@zerothi

zerothi commented Apr 8, 2020

Both GFortran and Intel have such an option, see, e.g.: https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-standard-realloc-lhs. I didn't know if that's what you were talking about, or something else.

Yes.

@awvwgk added the implementation and meta labels on Sep 18, 2021
@MuellerSeb
Contributor

Another important thing regarding loadtxt: if the array were returned as the function result, you couldn't overload the function. For example, gfortran would throw an error like this:

Error: Ambiguous interfaces in generic interface 'loadtxt' for 'loadtxt_dp' at (1) and 'loadtxt_sp' ...

since the signatures would be the same.
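
A sketch of why the function form is ambiguous (the argument list is guessed for illustration, and sp/dp are the usual kind parameters): both specifics would have the same dummy arguments and differ only in the kind of their result, and generic resolution only looks at the dummy arguments, never at the result. With the subroutine form, the intent(out) array is a dummy argument, so its kind disambiguates the specifics.

interface loadtxt
    module procedure loadtxt_sp
    module procedure loadtxt_dp    ! rejected: not distinguishable from loadtxt_sp
end interface

function loadtxt_sp(filename) result(d)
    character(len=*), intent(in) :: filename
    real(sp), allocatable :: d(:,:)
end function

function loadtxt_dp(filename) result(d)
    character(len=*), intent(in) :: filename
    real(dp), allocatable :: d(:,:)
end function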
