-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Outer product #432
Outer product #432
Conversation
src/stdlib_linalg.fypp
Outdated
module function outer_product_${t1[0]}$${k1}$(u,v) result(res) | ||
${t1}$, intent(in) :: u(:), v(:) | ||
${t1}$ :: res(size(u),size(v)) | ||
integer :: col |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best to leave col
only in the submodule part with the implementation. The top level module only needs to contain the public interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, I just copy and pasted indiscriminately from the submodule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think it's a great start!
Thanks for doing the benchmark. It would be interesting to add the results of using do_concurrent
(take a look at #429). As far as I can tell the columns of the result matrix are independent.
Thanks for the feedback. Yes, the columns are are independent so some sort of parallelization should provide a nice speedup.
Some caveats about my
|
I don't think this is a good example to test First, it is important that we can't actually test this within stdlib, because the compiler specific optimization for concurrencies is disabled by default. Therefore, it is best to test outside stdlib and make sure to enable vectorization and parallelization of concurrencies in the compiler (in I attempted a test on 8 cores of an AMD EPYC 7742 using
My conclusion from those numbers here is that parallel execution will only burn more CPU time than doing any good for this kind of problem. |
@awvwgk Thank you for the more detailed tests and suggestions. I believe this means my tests probably didn't even enable parallelization based on your comment. I will leave it as a standard I'm working on getting the tests done now, but I am a bit worried that How do you all commit after confirming a change builds properly? Do you |
@ghbrown I highly recommend to build out-of-tree ( |
The tests should now be completed and committed. I will work on the documentation now. @awvwgk Thanks. Yes, I had been using that out-of-tree make command. Even after using
Not a big deal, just curious. Perhaps I hit return too early after entering |
I believe these last three commits should cover the documentation. I made a few assumptions about how the documentation works in Please let me know if there are any outstanding issues to address. Thanks. |
@ivan-pi All comments should be addressed. Thanks for catching my mistakes, apologies there were so many. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing here, this is a useful addition to stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this nice addition. Here are some comments/suggestions.
Co-authored-by: Jeremie Vandenplas <[email protected]>
Co-authored-by: Jeremie Vandenplas <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ghbrown and reviewers! I will wait one more day before merging in case there are any more requests for changes.
I'll merge. Thank oyu @ghbrown |
Hi, all! I have implemented an outer product function (for vectors only)
outer_product
as discussed in #427.This is my first pull request, so I wanted to check in early before I got going too far in the wrong direction. I have done speed tests on different implementations, implemented the most performant version, added the submodule files to the Makefiles, and made sure the project built. I did not see any other tests I could run locally, but perhaps there are additional tools for this.
Known deficiencies of pull request (to be addressed after feedback):
Speed test results
Thank you!