
Returning array from function #114

Open
leonfoks opened this issue Jan 19, 2020 · 23 comments
Labels
implementation: Implementation in experimental and submission of a PR
meta: Related to this repository

Comments

@leonfoks

I was going to ask this in #113 but didn't want to derail that thread.
This is more of a personal question... since I've actually done very little with returning an allocatable array from a function. I may have flawed logic, so please correct me if I'm wrong.
I've had compile issues in the past returning allocatables from functions (can't remember all the details, but perhaps others have had these too), and I "intuitively" thought there might be a slowdown if those functions need to be called multiple times.

I'll use the snippet from #113

function mean(mat, dim) result(res)
    real(sp), intent(in) :: mat(:,:)
    integer, intent(in), optional :: dim !dim = 1 or dim = 2 
    real(sp), allocatable :: res(:)
end function

If I have a large array, and I call this multiple times, does it reallocate memory for that variable every time? Or is there some cleverness that happens?

e.g. the 1D result is already allocated, the function allocates its own result, and on return that result is copied into the pre-allocated array.

real(sp), allocatable :: x(:), A(:, :)

! A = full of numbers
allocate(x(size(A, dim=2)))

x(:) = mean(A, dim=1)

Is it known what happens specifically, memory-wise? Is it compiler dependent?

I'm asking because it will be slower if allocations keep happening; we might want to be careful and always provide subroutine versions where preallocated memory can be used without any implicit allocations and copies. Am I worrying about this too much? haha
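
For illustration, a minimal sketch of the subroutine alternative described above (mean_sub is a hypothetical name, not an existing stdlib routine; sp is the same real kind parameter as in the snippet). The caller owns the storage, so repeated calls reuse the same memory with no implicit allocation:

subroutine mean_sub(mat, res, dim)
    real(sp), intent(in) :: mat(:,:)
    real(sp), intent(out) :: res(:)       ! preallocated by the caller
    integer, intent(in), optional :: dim
    integer :: d
    d = 1
    if (present(dim)) d = dim
    ! mean along dimension d; res must have the size of the remaining dimension
    res(:) = sum(mat, dim=d) / real(size(mat, dim=d), sp)
end subroutine

! Usage sketch: allocate once, then reuse the same storage across calls, e.g.
! allocate(x(size(A, dim=2)))
! do it = 1, niter
!     call mean_sub(A, x, dim=1)
! end do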

@certik
Member

certik commented Jan 19, 2020 via email

@leonfoks
Author

Phew, I was hoping I wasn't crazy haha

@nncarlson
Member

nncarlson commented Jan 19, 2020

I don't believe there is any performance difference between a function whose array result is automatically sized, for example

function foo(array) result(res)
real, intent(in) :: array(:,:)
real :: res(size(array,dim=2))
! ... compute res ...
end function

versus one whose result is allocatable. In either case the size of the result is determined at run time. So there is absolutely no reason to avoid allocatable results if one is determined to use a function.

Also, in the past I have found that if a function returns an array (non-allocatable), it can be slower than an equivalent subroutine with an intent(out) array; apparently there is an extra copy happening.

This is the real problem with array-valued functions, allocatable or not, and it has nothing to do with the allocatable attribute.
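
For comparison, a sketch of the allocatable-result form of the same function (foo_alloc is a placeholder name); either way, the size of the result is only known at run time:

function foo_alloc(array) result(res)
    real, intent(in) :: array(:,:)
    real, allocatable :: res(:)
    allocate(res(size(array, dim=2)))
    res = 0.0    ! ... compute into res ...
end function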

@milancurcic
Member

So there is absolutely no reason to avoid allocatable results if one is determined to use a function.

What about allocation on the stack vs. the heap? Yes, this isn't defined in the language and is implementation-specific, but I have found that in practice there is a difference:

  • A non-allocatable function result may be allocated on the stack, which is faster;
  • An allocatable function result is always allocated on the heap, which is slower.

I never measured this, but I have anecdotal personal experience of WRF being noticeably slower when compiled with -heap-arrays (the ifort flag that makes all arrays be allocated on the heap).

This is also the caveat with non-allocatable function (or subroutine) results (output arguments): for large arrays, the stack can overflow. This problem doesn't occur with allocatable arrays.
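
For illustration, a sketch of the two kinds of temporaries being compared; where each one actually ends up is compiler- and flag-dependent (e.g. -heap-arrays changes it), so the comments describe typical behaviour only:

subroutine work(n)
    integer, intent(in) :: n
    real :: tmp_auto(n)          ! automatic array: typically on the stack, fast to
                                 ! create, but may overflow the stack for large n
    real, allocatable :: tmp_heap(:)
    allocate(tmp_heap(n))        ! allocatable: typically on the heap, no stack
                                 ! limit, but the allocation itself costs more
    tmp_auto = 0.0
    tmp_heap = 0.0
end subroutine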

@leonfoks changed the title from "Returning allocatable array from function" to "Returning array from function" on Jan 20, 2020
@nncarlson
Member

I can certainly believe there is a performance difference between stack and heap allocated. But in the case of an automatic array result that must be allocated at run time, do you think the compiler inserts code to decide where to allocate (stack or heap) based on the size it determines for a particular invocation of the function? (Note I'm not talking about a size known at compile time.) I suppose that could be the case; do you know? But even if so, I think the far bigger cost is the one @certik pointed out with using a function at all.

@certik
Member

certik commented Jan 20, 2020 via email

@nncarlson
Member

nncarlson commented Jan 20, 2020

We should setup a project to carefully benchmark all these options.

That would be very interesting to see. And there are many other specific performance questions like this. Conclusions from such a project could feed into a "Fortran best practices" guide.

Note that gfortran and Intel can put an allocatable on a stack as well.

Yep. My experience with Intel is that I routinely have to unlimit the stack size to avoid hitting a segfault. I have no such issues with NAG, for example.

@certik
Member

certik commented Jan 20, 2020

Also, I want to set up benchmarks for operations on arrays (y(:) = a(:) + b(:)) compared to an explicit loop. The loop sometimes seems to be faster, and I have seen several codes move away from array expressions to explicit loops because of that.

Those are the things where I want Fortran compilers to do better. And a first step is to have reliable benchmarks showing where the "performance bug" is.
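
A minimal benchmark sketch along these lines, timing with system_clock (the numbers will of course depend on compiler, flags, and whether either form gets vectorised):

program bench_add
    implicit none
    integer, parameter :: n = 10000000
    real, allocatable :: a(:), b(:), y(:)
    integer :: i, t0, t1, rate

    allocate(a(n), b(n), y(n))
    call random_number(a)
    call random_number(b)

    call system_clock(t0, rate)
    y(:) = a(:) + b(:)                  ! whole-array expression
    call system_clock(t1)
    print *, 'array expression:', real(t1 - t0) / real(rate), 's'

    call system_clock(t0)
    do i = 1, n                         ! explicit loop
        y(i) = a(i) + b(i)
    end do
    call system_clock(t1)
    print *, 'explicit do loop:', real(t1 - t0) / real(rate), 's'

    print *, y(1), y(n)                 ! use y so the work cannot be optimised away
end program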

@leonfoks
Author

In that benchmark, a manually unrolled loop version would be cool to see. The compilers should already do a good job of this, but it would be nice for a benchmark to show whether it's actually worth it.
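
A manually unrolled variant could be dropped into the same benchmark sketch above, for example unrolled by four (for brevity this assumes n is a multiple of 4):

call system_clock(t0)
do i = 1, n, 4
    y(i)     = a(i)     + b(i)
    y(i + 1) = a(i + 1) + b(i + 1)
    y(i + 2) = a(i + 2) + b(i + 2)
    y(i + 3) = a(i + 3) + b(i + 3)
end do
call system_clock(t1)
print *, 'unrolled by 4:   ', real(t1 - t0) / real(rate), 's'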

@milancurcic
Member

This looks like a good candidate for a fortran-lang/benchmarks repo. It could include both benchmarks of stdlib and of the more fundamental operations discussed in this thread. It would be nice to have a pipeline there that automatically generates a report for all procedures/programs in the suite. Later on, when we change the implementations of some stdlib procedures that other procedures build on, it would be useful to verify that there is no significant performance regression from changing the implementation of a building block.

Besides these, I'm also curious about the overhead of calling a generic procedure that overloads some 50 or 100 specific procedures.
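
For reference, a sketch of what such a generic looks like (mean_sp and mean_dp are placeholder specific names); the compiler resolves each call to one specific at compile time, based on the actual arguments:

interface mean
    module procedure mean_sp    ! real(sp) version
    module procedure mean_dp    ! real(dp) version
    ! ... potentially dozens more specifics for other kinds and ranks ...
end interface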

@zerothi

zerothi commented Apr 6, 2020

Additionally, one should be careful with allocatable assignment; the code below is perfectly reasonable:

real, allocatable :: a(:)

a = [1, 2, 3]
a = [1, 2, 3, 4]

However, the compiler has to add size checks and do a deallocation/allocation in case the shapes aren't commensurate.
To keep the compiler from adding those checks, one has to write this:

a(:) = [1, 2, 3]

in which case it will segfault if the dimensions don't match.
If this library is low-level enough, it should not rely on automatic (re)allocation AND it should avoid the array-dimension checks (i.e. use a(:) = ).

@certik
Member

certik commented Apr 6, 2020

@zerothi indeed, that seems to be what is needed now. It seems very unfortunate that the natural syntax a = [1, 2, 3] has a runtime overhead, and one must use a(:) = [1, 2, 3] instead.

What's worse, say the code looks like this:

real :: a(3)
...
a = [1, 2, 3]

and then later you decide to change the declaration to allocatable:

real, allocatable :: a(:)
allocate(a(3))
...
a = [1, 2, 3]

Then in the line a = [1, 2, 3] there is now a performance overhead.

I wonder if for this reason one should never use a = [1, 2, 3] and instead always use a(:) = [1, 2, 3], even in:

real :: a(3)
...
a(:) = [1, 2, 3]

So that when the declaration is later changed to allocatable, there is no unexpected overhead.

@nncarlson
Member

The technique of using a(:) instead of a to avoid the reallocation checks, when one knows they aren't needed, is good. However, I'd be uncomfortable recommending a(:) in the case where a is not allocatable, without knowing what object code compilers will generate. Superficially the two are the same thing, but they are technically very different.

@zerothi

zerothi commented Apr 6, 2020

I don't think the compiler distinguishes between a and a(:) for non-allocatable arrays; I have never heard of performance degradation due to that. In any case, it is very easy to check...

Generally I use it, simply because it is an easy documentation reminder of the array dimensions ;)

@milancurcic
Member

What's worse, say the code looks like this:

real :: a(3)
...
a = [1, 2, 3]

and then later you decide to change the declaration to allocatable:

real, allocatable :: a(:)
allocate(a(3))
...
a = [1, 2, 3]

Then in the line a = [1, 2, 3] there is now a performance overhead.

Is the overhead due to a run-time check that the RHS has the same shape as the LHS, or because reallocation on assignment always happens, even if the shape is the same?

If the latter, that's compiler dependent, no? A compiler should be able to check at run-time if the array needs reallocating.

I wonder if for this reason one should never use a = [1, 2, 3] and instead always use a(:) = [1, 2, 3]

What happens if the re-allocation is desired, for example if size(a) == 3 and then you do a(:) = [1, 2, 3, 4]? In this case, I want re-allocation and not a run-time error.

@zerothi

zerothi commented Apr 6, 2020

What's worse, say the code looks like this:

real :: a(3)
...
a = [1, 2, 3]

and then later you decide to change the declaration to allocatable:

real, allocatable :: a(:)
allocate(a(3))
...
a = [1, 2, 3]

Then in the line a = [1, 2, 3] there is now a performance overhead.

Is the overhead due to a run-time check that the RHS has the same shape as the LHS, or because reallocation on assignment always happens, even if the shape is the same?

Basically

..., allocatable :: a
a = [1, 2, 3]

gets translated roughly to this:

logical :: dealloc
integer :: i
dealloc = .not. allocated(a)
do i = 1, rank(a)
    if (.not. dealloc) dealloc = size(a, dim=i) /= size(RHS, dim=i)
end do
if (dealloc) then
    if (allocated(a)) deallocate(a)
    allocate(a, mold=RHS)
end if
a(...) = RHS(...)

If the latter, that's compiler dependent, no? A compiler should be able to check at run-time if the array needs reallocating.

I have seen the same behaviour on both Intel and GCC, but yes, compilers could handle this differently.

I wonder if for this reason one should never use a = [1, 2, 3] and instead always use a(:) = [1, 2, 3]

What happens if the re-allocation is desired, for example if size(a) == 3 and then you do a(:) = [1, 2, 3, 4]? In this case, I want re-allocation and not a run-time error.

Well, then a(:) = is not an option for you; it won't trigger reallocation. That isn't allowed by the standard (as far as I remember).
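
To summarise the two behaviours in one sketch:

real, allocatable :: a(:)

a = [1, 2, 3]
a = [1, 2, 3, 4]         ! shapes differ: the assignment (re)allocates a to size 4
a(:) = [1, 2, 3, 4, 5]   ! shapes differ: no reallocation takes place; this is
                         ! non-conforming and may segfault or corrupt memory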

@milancurcic
Member

@zerothi Okay, got it, that's clear now. So this:

I wonder if for this reason one should never use a = [1, 2, 3] and instead always use a(:) = [1, 2, 3]

applies only when we know reallocation is not intended, which is a connection I didn't make at first.

@zerothi

zerothi commented Apr 6, 2020

applies only when we know reallocation is not intended, which is a connection I didn't make at first.

exactly.

@certik
Member

certik commented Apr 6, 2020

I have seen the same behaviour on both Intel and GCC, but yes, compilers could handle this differently.

@zerothi how could compilers handle it differently?

@zerothi

zerothi commented Apr 6, 2020

There could be flags that disable such checks and force the array dimensions to be the same. This is just a guess, and I don't know whether such a flag exists. Hence "could" ;)

@certik
Member

certik commented Apr 6, 2020

Both GFortran and Intel have such an option, see, e.g.: https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-standard-realloc-lhs. I didn't know if that's what you were talking about, or something else.

@zerothi

zerothi commented Apr 8, 2020

Both GFortran and Intel have such an option, see, e.g.: https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-standard-realloc-lhs. I didn't know if that's what you were talking about, or something else.

Yes.

@awvwgk added the implementation and meta labels on Sep 18, 2021
@MuellerSeb
Contributor

Another important thing regarding loadtxt: if the array were returned as the function result, you couldn't overload the function. For example, gfortran would throw an error like this:

Error: Ambiguous interfaces in generic interface 'loadtxt' for 'loadtxt_dp' at (1) and 'loadtxt_sp' ...

since the signatures would be the same.
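
A sketch of why the function form is ambiguous (the argument list is guessed for illustration, and sp/dp are the usual kind parameters): both specifics would have the same dummy arguments and differ only in the kind of their result, and generic resolution only looks at the dummy arguments, never at the result. With the subroutine form, the intent(out) array is a dummy argument, so its kind disambiguates the specifics.

interface loadtxt
    module procedure loadtxt_sp
    module procedure loadtxt_dp    ! rejected: not distinguishable from loadtxt_sp
end interface

function loadtxt_sp(filename) result(d)
    character(len=*), intent(in) :: filename
    real(sp), allocatable :: d(:,:)
end function

function loadtxt_dp(filename) result(d)
    character(len=*), intent(in) :: filename
    real(dp), allocatable :: d(:,:)
end function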
