Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Outer product #432

Merged
merged 16 commits into from
Jun 30, 2021
Merged

Outer product #432

merged 16 commits into from
Jun 30, 2021

Conversation

ghbrown
Copy link
Member

@ghbrown ghbrown commented Jun 10, 2021

Hi, all! I have implemented an outer product function (for vectors only) outer_product as discussed in #427.

This is my first pull request, so I wanted to check in early before I got going too far in the wrong direction. I have done speed tests on different implementations, implemented the most performant version, added the submodule files to the Makefiles, and made sure the project built. I did not see any other tests I could run locally, but perhaps there are additional tools for this.

Known deficiencies of pull request (to be addressed after feedback):

  • no tests
  • no documentation

Speed test results

--- for two vectors of size n= 15000 ---
outer_product_spread() :    4.30 s  (spread vectors into matrices of proper size and multiply elementswise)
outer_product_rowwise():    3.33 s  (assign result rowwise)
outer_product_matmul():     0.49 s  (reshape input vectors into mx1 and 1xn matrices, then matmul())
outer_product_columnwise(): 0.33 s  (assign result columnwise, THIS IS THE IMPLEMENTATION USED)

Thank you!

@awvwgk awvwgk added the reviewers needed This patch requires extra eyes label Jun 10, 2021
module function outer_product_${t1[0]}$${k1}$(u,v) result(res)
${t1}$, intent(in) :: u(:), v(:)
${t1}$ :: res(size(u),size(v))
integer :: col
Copy link
Member

@ivan-pi ivan-pi Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to leave col only in the submodule part with the implementation. The top level module only needs to contain the public interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, I just copy and pasted indiscriminately from the submodule.

Copy link
Member

@ivan-pi ivan-pi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think it's a great start!

Thanks for doing the benchmark. It would be interesting to add the results of using do_concurrent (take a look at #429). As far as I can tell the columns of the result matrix are independent.

@ivan-pi ivan-pi marked this pull request as draft June 10, 2021 14:10
@ghbrown
Copy link
Member Author

ghbrown commented Jun 10, 2021

Thanks for the feedback. Yes, the columns are are independent so some sort of parallelization should provide a nice speedup.
I have implemented do concurrent in my tests, and it seems to perform very similarly to the columnwise method using standard do (within 5%, sometimes faster, sometimes slower), and this gap is independent of the optimization level (no optimization flag and -O3 both tested):

n=35000
(I am testing/timing on function at a time now, so can
run larger system since only using 1/3-1/4 the memory)

---no optimization---
columnwise:      3.57 s
columnwise_conc: 3.55 s

--- level -O3 optimization ---
columnwise:      2.09 s
columnwise_conc: 2.17 s

Some caveats about my do concurrent implementation:

  • the issue you linked sent me down the rabbit hole of #62, which gives a few more qualifiers on what do concurrent can do for you in the absence of additional keywords like local(...) and shared(...) and whether it offers much performance boost without them; this is my time messing with do concurrent, so these are my preliminary conclusions from that issue
  • on the topic of local(...) and shared(...), I can't seem to use them without an error from gfortran even though the Intel documentation mentions them; perhaps these keywords are unique Intel ifort features?
  • I'm running on a dual core laptop, so perhaps not the best platform to reap the gains of parallelization (if anything gets parallelized in any way)

@awvwgk
Copy link
Member

awvwgk commented Jun 12, 2021

I don't think this is a good example to test do concurrent.

First, it is important that we can't actually test this within stdlib, because the compiler specific optimization for concurrencies is disabled by default. Therefore, it is best to test outside stdlib and make sure to enable vectorization and parallelization of concurrencies in the compiler (in ifort you have to add -qopenmp for this purpose).

I attempted a test on 8 cores of an AMD EPYC 7742 using ifort -Ofast -qopenmp -march=native -fma (dim=20000, 20 repeats) and another test on 8 cores of an Intel(R) Xeon(R) Gold 6148 usng ifort -Ofast -qopenmp -xHOST (dim=10000, 10 repeats):

implementation average runtime (1.) average runtime (2.)
omp parallel do simd 7.2194s 3.1270s
do concurrent 7.9281s 3.3452s
normal do loop 9.5253s 2.8560s

My conclusion from those numbers here is that parallel execution will only burn more CPU time than doing any good for this kind of problem.

@ghbrown
Copy link
Member Author

ghbrown commented Jun 12, 2021

@awvwgk Thank you for the more detailed tests and suggestions. I believe this means my tests probably didn't even enable parallelization based on your comment. I will leave it as a standard do loop.

I'm working on getting the tests done now, but I am a bit worried that git status informs me of a horde of new files that will be committed. It seems like mainly .f90 files and files inside tests/ with no extensions (executables probably?); I believe these files are from building the project.

How do you all commit after confirming a change builds properly? Do you make clean to clear all the artifacts of the build, or have I just configured something wrong? My assumption was that such files would be excluded by .gitignores, but that doesn't seem to be the case.

@awvwgk
Copy link
Member

awvwgk commented Jun 12, 2021

@ghbrown I highly recommend to build out-of-tree (cmake -B build instead of just cmake, described in https://github.com/fortran-lang/stdlib#build-with-cmake) as this separates the build artifacts from the actual source.

@jvdp1 jvdp1 marked this pull request as ready for review June 12, 2021 19:29
@jvdp1 jvdp1 marked this pull request as draft June 12, 2021 19:32
@milancurcic
Copy link
Member

Thanks @ghbrown! Pending docs and simple tests as suggested by @ivan-pi, I think this is good to go.

@ghbrown
Copy link
Member Author

ghbrown commented Jun 14, 2021

The tests should now be completed and committed. I will work on the documentation now.

@awvwgk Thanks. Yes, I had been using that out-of-tree make command. Even after using cmake --build build --target clean I still have the following new files:

	src/tests/logger/fifth_log_file.txt
	src/tests/logger/first_log_file.txt
	src/tests/logger/fourth_log_file.txt
	src/tests/logger/second_log_file.txt
	src/tests/logger/sixth_log_file.txt
	src/tests/logger/third_log_file.txt
	src/tests/sorting/test_sorting.txt
	src/tests/string/scratch.txt

Not a big deal, just curious. Perhaps I hit return too early after entering cmake, but even so, I guess I expect clean and .gitignores to cover everything.

@ghbrown
Copy link
Member Author

ghbrown commented Jun 14, 2021

I believe these last three commits should cover the documentation. I made a few assumptions about how the documentation works in stdlib_linalg.fypp, but I think it should all be okay.

Please let me know if there are any outstanding issues to address. Thanks.

doc/specs/stdlib_linalg.md Outdated Show resolved Hide resolved
doc/specs/stdlib_linalg.md Outdated Show resolved Hide resolved
doc/specs/stdlib_linalg.md Outdated Show resolved Hide resolved
doc/specs/stdlib_linalg.md Outdated Show resolved Hide resolved
doc/specs/stdlib_linalg.md Outdated Show resolved Hide resolved
@ghbrown
Copy link
Member Author

ghbrown commented Jun 14, 2021

@ivan-pi All comments should be addressed. Thanks for catching my mistakes, apologies there were so many.

Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing here, this is a useful addition to stdlib.

@awvwgk awvwgk marked this pull request as ready for review June 19, 2021 10:53
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this nice addition. Here are some comments/suggestions.

doc/specs/stdlib_linalg.md Show resolved Hide resolved
doc/specs/stdlib_linalg.md Outdated Show resolved Hide resolved
doc/specs/stdlib_linalg.md Outdated Show resolved Hide resolved
src/stdlib_linalg.fypp Outdated Show resolved Hide resolved
src/tests/linalg/test_linalg.f90 Show resolved Hide resolved
src/stdlib_linalg_outer_product.fypp Outdated Show resolved Hide resolved
Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ghbrown and reviewers! I will wait one more day before merging in case there are any more requests for changes.

@jvdp1
Copy link
Member

jvdp1 commented Jun 30, 2021

I'll merge. Thank oyu @ghbrown

@jvdp1 jvdp1 merged commit c598899 into fortran-lang:master Jun 30, 2021
@awvwgk awvwgk removed the reviewers needed This patch requires extra eyes label Jul 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants