Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selection algorithms #500

Merged
merged 53 commits into from
Dec 1, 2021
Merged

Conversation

gareth-nx
Copy link
Contributor

@gareth-nx gareth-nx commented Aug 28, 2021

This pull request implements quick selection algorithms, which can find the kth-smallest entry of an array in time that scales with O(size(array)), so is faster than sorting the whole array.

See discussion in issue 471.

Basically this is useful for efficient computation of the median and other percentiles, and as a building block for problems such as mentioned in issue495 and issue 405 and issue 378.

This is my first time contributing to stdlib, and one thing I am a uncertain about is the structure of the CMake and Makefiles in the test suite,. I copied and modified those from the stats tests, and while it all seems to work, I wonder if it can be simplified?

@awvwgk awvwgk added the reviewers needed This patch requires extra eyes label Aug 28, 2021
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @gareth-nx for these algorithms. My major comment is about the license: is it compatible with the MIT license?

doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
doc/specs/stdlib_selection.md Outdated Show resolved Hide resolved
@awvwgk awvwgk added the topic: algorithms searching and sorting, merging, ... label Sep 18, 2021
@jvdp1
Copy link
Member

jvdp1 commented Oct 2, 2021

@gareth-nx what is the status of this PR? Re: the license?

@gareth-nx
Copy link
Contributor Author

@jvdp1 So far the author of the matlab code didn't reply to me. I plan to just re-write the partition step (which still mirrors the matlab code -- the rest does not). This should be straightforward. So far I haven't found the time, but likely in the next week or two I can do that.

@gareth-nx
Copy link
Contributor Author

So my recent changes introduced bugs on some platforms but not others. It seemed to work on my machine, but I will test with a few different compilation options to try to catch the issue.

src/stdlib_selection.fypp Outdated Show resolved Hide resolved
src/stdlib_selection.fypp Outdated Show resolved Hide resolved
src/stdlib_selection.fypp Outdated Show resolved Hide resolved
Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few more nits to pick.

src/stdlib_selection.fypp Show resolved Hide resolved
src/stdlib_selection.fypp Outdated Show resolved Hide resolved
src/stdlib_selection.fypp Outdated Show resolved Hide resolved
src/stdlib_selection.fypp Outdated Show resolved Hide resolved
src/stdlib_selection.fypp Outdated Show resolved Hide resolved
src/stdlib_selection.fypp Outdated Show resolved Hide resolved
Copy link
Member

@ivan-pi ivan-pi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some minor comments.

@ivan-pi
Copy link
Member

ivan-pi commented Nov 24, 2021

I'm also worried what happens if an array contains NaN's, and how they interact with infinite loops. The R sort routine (also covers partial sorting) has an option to place NaN at the front or the end.

I'm fine with having this PR merged (documenting that no precautions are taken for NaN values) and then discussing potential NaN modifications in a new issue.

@jvdp1
Copy link
Member

jvdp1 commented Nov 24, 2021 via email

@gareth-nx
Copy link
Contributor Author

I'm also worried what happens if an array contains NaN's, and how they interact with infinite loops. The R sort routine (also covers partial sorting) has an option to place NaN at the front or the end.

I'm fine with having this PR merged (documenting that no precautions are taken for NaN values) and then discussing potential NaN modifications in a new issue.

I'm also not sure what would happen if we pass an array with NaN values -- does the loop even exit? Will have to look into it.

@gareth-nx
Copy link
Contributor Author

I'm also worried what happens if an array contains NaN's, and how they interact with infinite loops. The R sort routine (also covers partial sorting) has an option to place NaN at the front or the end.
I'm fine with having this PR merged (documenting that no precautions are taken for NaN values) and then discussing potential NaN modifications in a new issue.

I'm also not sure what would happen if we pass an array with NaN values -- does the loop even exit? Will have to look into it.

I ran a few experiments with NaN values, and the code seems to run fine (it exits).

I'm also worried what happens if an array contains NaN's, and how they
interact with infinite loops. The R sort https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sort routine (also covers partial sorting) has an option to place NaN at the front or the end. A sentence like in stdlib_sorting.fypp could be added: "If both the type of array is real and at least one of the elements is a NaN, then the ordering of the result is undefined." Le mer. 24 nov. 2021 à 21:08, Ivan Pribec @.***> a écrit :

I'm also worried what happens if an array contains NaN's, and how they interact with infinite loops. The R sort https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sort routine (also covers partial sorting) has an option to place NaN at the front or the end. I'm fine with having this PR merged (documenting that no precautions are taken for NaN values) and then discussing potential NaN modifications in a new issue. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#500 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5RO7ABKVILY45276C6B3LUNVA2HANCNFSM5C6RRLPQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

From experiments, the code runs when NaN values are included in the array, but there is not a consistent interpretation given to how they are ordered relative to regular numbers. I have added documentation in the specs to note that NaNs are not supported.

@gareth-nx
Copy link
Contributor Author

I think that I've addressed all the recent comments -- thanks all for the suggestions.

src/stdlib_selection.fypp Outdated Show resolved Hide resolved
@gareth-nx
Copy link
Contributor Author

I believe the recent commit addresses the remaining comments of @ivan-pi -- ready to merge if there are no other issues. Thanks.

@jvdp1
Copy link
Member

jvdp1 commented Nov 28, 2021

@ivan-pi: are you happy with @gareth-nx 's changes? If yes, could you merge this PR, please?

@awvwgk awvwgk dismissed ivan-pi’s stale review November 29, 2021 21:24

Requested changes addressed

@milancurcic
Copy link
Member

Thank you all, let's merge. @ivan-pi if there are further changes needed, let's address them in a follow-up PR.

@milancurcic milancurcic merged commit 22c1be0 into fortran-lang:master Dec 1, 2021
@gareth-nx gareth-nx deleted the pr/quickselect branch December 3, 2021 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: algorithms searching and sorting, merging, ...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants