Returning array from function #114
This is really important.
I don't think we should use allocatable at all in the lowest level API, unless the size of the output is truly not known ahead of time, like in loadtxt.
Also, in the past I have found that if a function returns a (non-allocatable) array, it can be slower than an equivalent subroutine with an `intent(out)` array argument. Apparently an extra copy is happening. This is something where Fortran compilers should do better. Ideally there shouldn't be any overhead to using functions instead of subroutines, but unfortunately there seems to be.
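A minimal sketch of the two API styles being compared, using a column mean in the spirit of the `mean` snippet from #113. The module name `mean_api` and the procedures `mean_fn`/`mean_sub` are hypothetical, and `sp` is assumed to be `real32` from `iso_fortran_env`. Whether the function form really costs an extra copy depends on the compiler and optimization level; the sketch only shows the shape of each API.

```fortran
module mean_api
    use, intrinsic :: iso_fortran_env, only: sp => real32
    implicit none
    private
    public :: sp, mean_fn, mean_sub
contains
    ! Function form: automatic-size (non-allocatable) result. The caller
    ! writes x = mean_fn(A), which may go through a temporary before x.
    function mean_fn(mat) result(res)
        real(sp), intent(in) :: mat(:,:)
        real(sp) :: res(size(mat, dim=2))
        res = sum(mat, dim=1) / real(size(mat, dim=1), sp)
    end function mean_fn

    ! Subroutine form: the caller owns the output buffer, so repeated
    ! calls can reuse it without any allocation inside the procedure.
    subroutine mean_sub(mat, res)
        real(sp), intent(in) :: mat(:,:)
        real(sp), intent(out) :: res(:)   ! must have size(mat, dim=2) elements
        res = sum(mat, dim=1) / real(size(mat, dim=1), sp)
    end subroutine mean_sub
end module mean_api
```

Usage would be `x = mean_fn(A)` versus `call mean_sub(A, x)`, with `x` allocated once to `size(A, dim=2)` elements and reused.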
On Sun, Jan 19, 2020, at 12:07 PM, Leon Foks wrote:
I was going to ask this in #113 but didn't want to derail that thread.
This is more of a personal question... since I've actually done very little with returning an allocatable array from a function. I may have flawed logic, so please correct me if I'm wrong.
I've had compile issues in the past returning allocatable arrays from functions (I can't remember all the details, but perhaps others have had these too) and I "intuitively" thought there might be a slowdown if those functions need to be called multiple times.
I'll use the snippet from #113:
function mean(mat, dim) result(res)
real(sp), intent(in) :: mat(:,:)
integer, intent(in), optional :: dim !dim = 1 or dim = 2
real(sp), allocatable :: res(:)
end function
If I have a large array, and I call this multiple times, does it
reallocate memory for that variable every time? Or is there some
cleverness that happens?
E.g. the 1D result is already allocated, the function allocates its own result, and on return that result is copied into the pre-allocated array:
real(sp), allocatable :: x(:), A(:, :)
! A = full of numbers
allocate(x(size(A, dim=2)))   ! mean over dim=1 gives one value per column
x(:) = mean(A, dim=1)
Is it known what happens specifically, memory wise? Is it compiler dependent?
I'm asking because it will be slower if allocations keep happening; we might want to be careful and always make subroutines available where preallocated memory can be used without any implicit allocations and copies. Am I worrying about this too much? haha
Phew, I was hoping I wasn't crazy haha
I don't believe there is any performance difference between an automatically sized array-valued function, for example

```fortran
function foo(array) result(res)
  real, intent(in) :: array(:,:)
  real :: res(size(array,dim=2))
  ! ... compute res ...
end function foo
```

versus one whose result is allocatable. In either case the size of the result is determined at run time. So there is absolutely no reason to avoid allocatable results if one is determined to use a function.
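To make the comparison concrete, here are both result styles side by side; the names `col_sums`, `sums_auto` and `sums_alloc` are hypothetical. Both compute the same values with the result size known only at run time; the sketch says nothing by itself about relative speed, which is compiler dependent.

```fortran
module col_sums
    implicit none
contains
    ! Automatic-size result: the extent comes from the argument at run time.
    function sums_auto(array) result(res)
        real, intent(in) :: array(:,:)
        real :: res(size(array, dim=2))
        res = sum(array, dim=1)
    end function sums_auto

    ! Allocatable result: allocated by the assignment below
    ! (allocation on assignment), also at run time.
    function sums_alloc(array) result(res)
        real, intent(in) :: array(:,:)
        real, allocatable :: res(:)
        res = sum(array, dim=1)
    end function sums_alloc
end module col_sums
```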
The extra copy @certik mentioned is the real problem with array-valued functions, allocatable or not, and it has nothing to do with allocatable.
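> So there is absolutely no reason to avoid allocatable results if one is determined to use a function.

How about allocation on the stack vs. the heap? Yes, this isn't defined in the language and is implementation specific, but I have found that in practice there is a difference:

* A non-allocatable function result may be allocated on the stack, which is faster;
* An allocatable function result is *always* allocated on the heap, which is slower.

I never measured this, but I have anecdotal personal experience of WRF being noticeably slower when compiled with `-heap-arrays` (the ifort flag that makes all arrays be allocated on the heap).

This is also the caveat with non-allocatable function (or subroutine) results (output arguments): for large arrays, the stack can overflow. This problem doesn't occur with allocatable arrays.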
I can certainly believe there is a performance difference between stack and heap allocation. But in the case of an automatic array result that must be allocated at run time, do you think the compiler inserts code to decide where to allocate it (stack or heap) based on the size it determines for a particular invocation of the function? (Note I'm not talking about a size known at compile time.) I suppose that could be the case; do you know? But even if so, I think the far bigger cost is the one @certik pointed out with using a function at all.
We should set up a project to carefully benchmark all these options. Note that gfortran and Intel can put an allocatable on the stack as well.
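A possible starting point for such a benchmark, timing repeated calls of a function with an allocatable result against a subroutine writing into a caller-owned buffer. The module `bench_mean` and its procedures are hypothetical; timing uses the standard `system_clock`. A serious comparison would need repeated runs, several sizes, several compilers and flags (e.g. `-heap-arrays` for ifort) before drawing conclusions.

```fortran
module bench_mean
    use, intrinsic :: iso_fortran_env, only: int64, dp => real64
    implicit none
contains
    function col_mean(mat) result(res)
        real(dp), intent(in) :: mat(:,:)
        real(dp), allocatable :: res(:)
        res = sum(mat, dim=1) / size(mat, dim=1)
    end function col_mean

    subroutine col_mean_sub(mat, res)
        real(dp), intent(in) :: mat(:,:)
        real(dp), intent(out) :: res(:)
        res = sum(mat, dim=1) / size(mat, dim=1)
    end subroutine col_mean_sub

    ! Times `reps` calls of each form on an n-by-n random matrix (seconds).
    subroutine run_bench(n, reps, t_fn, t_sub, checksum)
        integer, intent(in) :: n, reps
        real(dp), intent(out) :: t_fn, t_sub, checksum
        real(dp), allocatable :: A(:,:), x(:), y(:)
        integer(int64) :: t0, t1, rate
        integer :: r

        allocate(A(n, n), x(n), y(n))
        call random_number(A)

        call system_clock(t0, rate)
        do r = 1, reps
            x = col_mean(A)            ! function result, then assignment
        end do
        call system_clock(t1)
        t_fn = real(t1 - t0, dp) / real(rate, dp)

        call system_clock(t0)
        do r = 1, reps
            call col_mean_sub(A, y)    ! caller-owned output buffer
        end do
        call system_clock(t1)
        t_sub = real(t1 - t0, dp) / real(rate, dp)

        ! Consume both results so the loops are less likely to be optimized away.
        checksum = maxval(abs(x - y))
    end subroutine run_bench
end module bench_mean
```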
That would be very interesting to see. And there are many other specific performance questions like this. Conclusions from such a project could feed into a "Fortran best practices" guide.
Yep. My experience with Intel is that I routinely have to unlimit the stack size to avoid hitting a segfault. I have no such issues with NAG, for example.
Also, I want to set up benchmarks for operations on arrays (…). Those are things where I want Fortran compilers to do better, and a first step is to have reliable benchmarks showing where the "performance bug" is.
In that benchmark, a manually unrolled loop version would be cool to see. The compilers should already do a good job of this, but it would be nice to see in a benchmark whether it's worth it.
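For such a benchmark, the two variants could be as simple as the hypothetical `sum_plain` and `sum_unroll4` below: a plain loop versus a loop unrolled by 4 with independent partial sums and a remainder loop. Note that the partial sums change the order of floating-point additions, so results can differ in the last bits for general data.

```fortran
module unroll_demo
    implicit none
contains
    ! Plain loop; compilers typically unroll/vectorize this at -O2/-O3.
    pure function sum_plain(x) result(s)
        real, intent(in) :: x(:)
        real :: s
        integer :: i
        s = 0.0
        do i = 1, size(x)
            s = s + x(i)
        end do
    end function sum_plain

    ! Manually unrolled by 4 with independent partial sums.
    pure function sum_unroll4(x) result(s)
        real, intent(in) :: x(:)
        real :: s, s1, s2, s3, s4
        integer :: i, n, m
        n = size(x)
        m = n - mod(n, 4)          ! last index covered by the unrolled loop
        s1 = 0.0; s2 = 0.0; s3 = 0.0; s4 = 0.0
        do i = 1, m, 4
            s1 = s1 + x(i)
            s2 = s2 + x(i+1)
            s3 = s3 + x(i+2)
            s4 = s4 + x(i+3)
        end do
        s = (s1 + s2) + (s3 + s4)
        do i = m + 1, n            ! remainder loop for the last mod(n, 4) elements
            s = s + x(i)
        end do
    end function sum_unroll4
end module unroll_demo
```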
This looks like a good candidate for a fortran-lang/benchmarks repo. This could include both benchmarks of stdlib and the more fundamental basic operations discussed in this thread. It would be nice to have a pipeline there that automatically generates a report for all procedures/programs in the suite. Later on, when we change the implementation of some stdlib procedures that are used by other procedures, it would be useful to verify that there's no significant performance regression from changing the implementation of a building block. Besides these, I'm also curious what the overhead is of calling a generic procedure that overloads some 50 or 100 specific procedures.
Additionally, one should be careful with allocatable assignments. The code below is perfectly reasonable:

```fortran
real, allocatable :: a(:)
a = [1, 2, 3]
a = [1, 2, 3, 4]
```

However, the compiler has to add checks for the sizes and deallocate/reallocate in case they aren't commensurate. If you know the sizes already match, you can instead write

```fortran
a(:) = [1, 2, 3]
```

in which case it may segfault (or silently overwrite memory) if the dimensions don't match.
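The difference between the two assignment forms can be observed directly. This is a minimal sketch with hypothetical names, assuming Fortran 2003 reallocation-on-assignment semantics are enabled (older ifort versions needed a compiler option for this).

```fortran
module realloc_demo
    implicit none
contains
    ! Reports the size of `a` after each assignment.
    subroutine show_realloc(n1, n2, n3)
        integer, intent(out) :: n1, n2, n3
        real, allocatable :: a(:)
        a = [1, 2, 3]          ! unallocated LHS: allocated with 3 elements
        n1 = size(a)
        a = [1, 2, 3, 4]       ! shape differs: reallocated to 4 elements
        n2 = size(a)
        a(:) = [5, 6, 7, 8]    ! section assignment: never reallocates, shapes must already agree
        n3 = size(a)
    end subroutine show_realloc
end module realloc_demo
```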
@zerothi indeed, that seems to be what is needed now. It seems very unfortunate that the natural syntax `a = [1, 2, 3]` carries this overhead. What's worse, say the code looks like this:

```fortran
real :: a(3)
...
a = [1, 2, 3]
```

and then later you decide to change the declaration to

```fortran
real, allocatable :: a(:)
allocate(a(3))
...
a = [1, 2, 3]
```

Then the line `a = [1, 2, 3]` silently picks up the size check and possible reallocation. I wonder if for this reason one should never use `a = ...` for arrays, but always

```fortran
real :: a(3)
...
a(:) = [1, 2, 3]
```

so that when the declaration is later changed to …
The technique of using …
I don't think the compiler distinguishes between `a = ...` and `a(:) = ...` for a non-allocatable array. Generally I use `a(:)`, simply because it is an easy documentation reminder of the array dimensions ;)
Is the overhead due to a run-time check that the RHS has the same shape as the LHS, or because the reallocation on assignment always happens, even if the shape is the same? If the latter, that's compiler dependent, no? A compiler should be able to check at run time whether the array needs reallocating.
What happens if the reallocation is desired, for example if …
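Basically

```fortran
..., allocatable :: a
a = [1, 2, 3]
```

gets translated to this:

```fortran
! Conceptual expansion of `a = RHS` for an allocatable `a`
logical :: realloc
integer :: i
realloc = .not. allocated(a)
if (.not. realloc) then
    ! compare the extent of every dimension of a with that of RHS
    do i = 1, rank(a)
        realloc = realloc .or. size(a, dim=i) /= size(RHS, dim=i)
    end do
end if
if (realloc) then
    if (allocated(a)) deallocate(a)
    allocate(a, mold=RHS)   ! takes its shape from RHS
end if
a(...) = RHS(...)           ! element-wise copy, no further checks
```

I have seen the same behaviour with both Intel and GCC, but yes, compilers could handle this differently.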
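Spelled out as ordinary Fortran for the rank-1 case, the same logic might look like the hypothetical helper `assign_realloc` below. It illustrates the semantics only; it is not what any particular compiler literally emits.

```fortran
module manual_realloc
    implicit none
contains
    ! Hypothetical helper: does by hand what `a = rhs` implies
    ! for an allocatable rank-1 real array.
    subroutine assign_realloc(a, rhs)
        real, allocatable, intent(inout) :: a(:)
        real, intent(in) :: rhs(:)
        if (allocated(a)) then
            if (size(a) /= size(rhs)) deallocate(a)   ! shape mismatch: drop old storage
        end if
        if (.not. allocated(a)) allocate(a(size(rhs)))
        a(:) = rhs                                     ! plain element copy
    end subroutine assign_realloc
end module manual_realloc
```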
Well, then there is nothing you can do: it won't allow reallocation, since that isn't allowed by the standard (as far as I remember).
@zerothi Okay, got it, that's clear now. So using `a(:) = ...` applies only when we know reallocation is not intended; I didn't make that connection at first.
Exactly.
@zerothi how could compilers handle it differently?
There could be flags that disable such checks and force the array dimensions to be the same. This is just a guess and I don't know if such a flag exists. Hence "could" ;)
Both GFortran (`-fno-realloc-lhs`) and Intel have such an option, see, e.g.: https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-standard-realloc-lhs. I didn't know if that's what you were talking about, or something else.
Yes.
Another important thing regarding …
Since the signature would be the same.