Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement trueloc/falseloc #603

Merged
merged 6 commits into from
Dec 22, 2021
Merged

Conversation

awvwgk
Copy link
Member

@awvwgk awvwgk commented Dec 19, 2021

Closes #568

@awvwgk awvwgk added topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... reviewers needed This patch requires extra eyes labels Dec 19, 2021
Copy link
Contributor

@gareth-nx gareth-nx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- a few simple suggestions here, which should be easy to address.

Copy link
Contributor

@gareth-nx gareth-nx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@milancurcic
Copy link
Member

I have a concern about the implementation.

Since trueloc and falseloc call logicalloc and they're all functions, there will be a double copy of the result array on each call of trueloc and falseloc. It's possible that some of the poor performance shown here can be explained by the double copy. So maybe there is a performance penalty in doing it this way.

Why not implement the functionality of logicalloc in both trueloc and falseloc (and duplicate a few lines of code) and avoid the extra function call and array copy?

And, is falseloc really needed? You could call trueloc(.not. condition) to get that. I understand it's a matter of style and has no right or wrong answer.

FWIW, my preference would be, in this order:

  1. Have only trueloc and avoid calling logicalloc
  2. Have trueloc and falseloc and avoid calling logicalloc
  3. Current implementation.

@awvwgk
Copy link
Member Author

awvwgk commented Dec 20, 2021

Since trueloc and falseloc call logicalloc and they're all functions, there will be a double copy of the result array on each call of trueloc and falseloc. It's possible that some of the poor performance shown here can be explained by the double copy. So maybe there is a performance penalty in doing it this way.

The poor performance results mainly from the function call, evaluating the dimension of the returned array twice gives some penalty, the double copy will probably get optimized away. Inlining the implementation doesn't give a performance benefit, therefore I'll keep logicalloc as a subroutine. I also kept the timer in the unittests.

Implementation in 7b5ffac

[Timing] trueloc: 0.0218s where: 0.0043s ratio:  5.0
[Timing] trueloc: 0.0218s merge: 0.0044s ratio:  4.9
[Timing] falseloc: 0.0223s where: 0.0051s ratio:  4.4
[Timing] falseloc: 0.0233s merge: 0.0053s ratio:  4.4

Using subroutine instead of function (4a4ac22)

[Timing] trueloc: 0.0181s where: 0.0043s ratio:  4.2
[Timing] trueloc: 0.0168s merge: 0.0044s ratio:  3.8
[Timing] trueloc: 0.0180s pack: 0.0095s ratio:  1.9
[Timing] falseloc: 0.0174s where: 0.0050s ratio:  3.5
[Timing] falseloc: 0.0178s merge: 0.0052s ratio:  3.4
[Timing] falseloc: 0.0169s pack: 0.0098s ratio:  1.7

Duplicated inline implementation

[Timing] trueloc: 0.0184s where: 0.0042s ratio:  4.4
[Timing] trueloc: 0.0168s merge: 0.0046s ratio:  3.7
[Timing] falseloc: 0.0177s where: 0.0049s ratio:  3.6
[Timing] falseloc: 0.0177s merge: 0.0053s ratio:  3.3

Implementing logicalloc with pack intrinsic

[Timing] trueloc: 0.0182s where: 0.0042s ratio:  4.3
[Timing] trueloc: 0.0200s merge: 0.0053s ratio:  3.7
[Timing] trueloc: 0.0190s pack: 0.0079s ratio:  2.4
[Timing] falseloc: 0.0197s where: 0.0049s ratio:  4.0
[Timing] falseloc: 0.0202s merge: 0.0044s ratio:  4.6
[Timing] falseloc: 0.0200s pack: 0.0079s ratio:  2.5

@jvdp1
Copy link
Member

jvdp1 commented Dec 20, 2021

Implementation in 7b5ffac

[Timing] trueloc: 0.0218s where: 0.0043s ratio:  5.0
[Timing] trueloc: 0.0218s merge: 0.0044s ratio:  4.9
[Timing] falseloc: 0.0223s where: 0.0051s ratio:  4.4
[Timing] falseloc: 0.0233s merge: 0.0053s ratio:  4.4

Using subroutine instead of function

[Timing] trueloc: 0.0181s where: 0.0043s ratio:  4.2
[Timing] trueloc: 0.0168s merge: 0.0044s ratio:  3.8
[Timing] falseloc: 0.0174s where: 0.0050s ratio:  3.5
[Timing] falseloc: 0.0178s merge: 0.0052s ratio:  3.4

Duplicated inline implementation

[Timing] trueloc: 0.0184s where: 0.0042s ratio:  4.4
[Timing] trueloc: 0.0168s merge: 0.0046s ratio:  3.7
[Timing] falseloc: 0.0177s where: 0.0049s ratio:  3.6
[Timing] falseloc: 0.0177s merge: 0.0053s ratio:  3.3

Nice comparison. However, is this comparison fair for trueloc, since the aim is a bit different than where and merge? Furthermore, trueloc must initialize an `integer array, which is not the case of where and merge, if I understand it well.

@awvwgk
Copy link
Member Author

awvwgk commented Dec 20, 2021

Nice comparison. However, is this comparison fair for trueloc, since the aim is a bit different than where and merge? Furthermore, trueloc must initialize an `integer array, which is not the case of where and merge, if I understand it well.

Indeed, merge and where are both language intrinsic constructs and give the compiler many opportunities for optimization. For trueloc we have to construct an index array (function call, stack allocation, loop over array and increment counter), which can only be slower. With link time optimization the compiler might be able to optimize trueloc away again and regain the performance loss.

I would consider trueloc and falseloc a convenience feature, for actual compute intense task where is preferable (also offers worksharing via OMP).

@awvwgk awvwgk mentioned this pull request Dec 20, 2021
@jvdp1
Copy link
Member

jvdp1 commented Dec 20, 2021

You were faster than me @awvwgk ;) I also tested the pack approach using your branch, and trueloc is 1/0.7 faster than pack (which is for me more equivalent to trueloc than where or merge).
Furthermore, trueloc probably requests less memory than pack as a full integer array does not need to be generated.

@awvwgk
Copy link
Member Author

awvwgk commented Dec 20, 2021

Hmm, now I want to have timers in test-drive and stdlib, do we have an issue already for this?

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thahnk you.
Here are some minor comments

doc/specs/stdlib_array.md Outdated Show resolved Hide resolved
doc/specs/stdlib_array.md Outdated Show resolved Hide resolved
doc/specs/stdlib_array.md Outdated Show resolved Hide resolved
doc/specs/stdlib_array.md Outdated Show resolved Hide resolved
doc/specs/stdlib_array.md Outdated Show resolved Hide resolved
doc/specs/stdlib_array.md Outdated Show resolved Hide resolved
doc/specs/stdlib_array.md Show resolved Hide resolved
src/stdlib_array.f90 Show resolved Hide resolved
@gareth-nx
Copy link
Contributor

I would consider trueloc and falseloc a convenience feature, for actual compute intense task where is preferable (also offers worksharing via OMP).

I suggest that something like this is stated in the specs to help guide users.

@gareth-nx
Copy link
Contributor

Looks like this is good to be merged? @jvdp1, any further comments?

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you @awvwgk

@gareth-nx gareth-nx merged commit 30f5321 into fortran-lang:master Dec 22, 2021
@awvwgk awvwgk deleted the logical-index branch December 22, 2021 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reviewers needed This patch requires extra eyes topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

true_pos
4 participants