Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Tile 8×8 covariance matrix multiplication #1181

Merged
merged 20 commits into from
Mar 4, 2024

Conversation

stephenswat
Copy link
Member

Currently, we are multiplying an 8x8 covariance matrix with an 8x8 transport matrix, and we see that Eigen is failing to optimize this properly, because it is calling a generalized GEMM method rather than an optimized small matrix method. In order to resolve this, we change the code to use a tiled multiplication method which splits the matrices into 4x4 sub-matrices which can be multiplied and added to achieve the desired effect. This has two advantages:

  1. It allows Eigen to use its hand-rolled optimized 4x4 matrix multiplication methods.
  2. It allows us to perform some trickery with matrix identities to reduce the number of floating point operations.

@stephenswat stephenswat added Component - Core Affects the Core module Impact - Minor Nuissance bug and/or affects only a single module Improvement Changes to an existing feature labels Mar 3, 2022
@stephenswat stephenswat added this to the next milestone Mar 3, 2022
@stephenswat stephenswat changed the title perf: Tile 8x8 covariance matrix multiplication perf: Tile 8×8 covariance matrix multiplication Mar 3, 2022
@codecov
Copy link

codecov bot commented Mar 3, 2022

Codecov Report

Attention: Patch coverage is 0% with 13 lines in your changes are missing coverage. Please review.

Project coverage is 48.67%. Comparing base (a1b40bc) to head (b85fe9c).

Files Patch % Lines
Core/include/Acts/Propagator/EigenStepper.ipp 0.00% 1 Missing and 12 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1181      +/-   ##
==========================================
- Coverage   48.69%   48.67%   -0.03%     
==========================================
  Files         493      493              
  Lines       28992    29004      +12     
  Branches    13804    13816      +12     
==========================================
  Hits        14117    14117              
- Misses       4946     4947       +1     
- Partials     9929     9940      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

stephenswat added a commit to stephenswat/acts that referenced this pull request Mar 3, 2022
This commit optimizes some of the Eigen usage in the covariance engine,
specifically in the critical path for the propagation examples. The
first optimisation we make is to introduce a tiled matrix multiplication
method, which takes 2i×2j matrices, and performs four i×j
multiplications instead, which Eigen can optimize far more easily.
Secondly, we reduce the number of floating point operations performed by
working with smaller submatrices wherever possible.

On my machine, the following performance is achieved in the propagation
example before this patch: 53.555595 ms/event. After this patch, we take
43.750143 ms/event. This performance gain is independent from the
performance gain of acts-project#1181.
@stephenswat
Copy link
Member Author

With both this and #1183, my performance for the Eigen stepper becomes 36.595292 ms/event on the generic propagation example, while the performance for the ATLAS stepper is 29.353479 ms/event. Getting closer!

@stephenswat stephenswat force-pushed the perf/cov_matrix_mult_4x4 branch 2 times, most recently from 8732133 to 56bc920 Compare March 3, 2022 17:30
@stephenswat
Copy link
Member Author

Miffed why this fails to be honest.

@stephenswat
Copy link
Member Author

Okay, I really think this is just harmless numerical errors causing hash changes.

@stephenswat stephenswat added the 🚧 WIP Work-in-progress label Mar 4, 2022
stephenswat added a commit to stephenswat/acts that referenced this pull request Mar 15, 2022
This commit optimizes some of the Eigen usage in the covariance engine,
specifically in the critical path for the propagation examples. The
first optimisation we make is to introduce a tiled matrix multiplication
method, which takes 2i×2j matrices, and performs four i×j
multiplications instead, which Eigen can optimize far more easily.
Secondly, we reduce the number of floating point operations performed by
working with smaller submatrices wherever possible.

On my machine, the following performance is achieved in the propagation
example before this patch: 53.555595 ms/event. After this patch, we take
43.750143 ms/event. This performance gain is independent from the
performance gain of acts-project#1181.
kodiakhq bot pushed a commit that referenced this pull request Mar 18, 2022
This commit optimizes some of the Eigen usage in the covariance engine, specifically in the critical path for the propagation examples. The first optimisation we make is to introduce a tiled matrix multiplication method, which takes 2i×2j matrices, and performs four i×j multiplications instead, which Eigen can optimize far more easily. Secondly, we reduce the number of floating point operations performed by working with smaller submatrices wherever possible.

On my machine, the following performance is achieved in the propagation example before this patch: 53.555595 ms/event. After this patch, we take 43.750143 ms/event. This performance gain is independent from the performance gain of #1181.
@stephenswat stephenswat removed the 🚧 WIP Work-in-progress label Mar 18, 2022
@stephenswat stephenswat force-pushed the perf/cov_matrix_mult_4x4 branch 2 times, most recently from 47ebcd7 to ffc6256 Compare March 31, 2022 16:12
@paulgessinger paulgessinger reopened this Nov 19, 2023
@paulgessinger
Copy link
Member

The physmon coverage and configuration is much more robust than it was a year or even half a year ago. I don't see any significant changes, other than the ROOT file hashes, everything is green.

Should we just merge this at this stage?

@andiwand andiwand removed the Stale label Nov 20, 2023
@stephenswat
Copy link
Member Author

image

@andiwand
Copy link
Contributor

andiwand commented Mar 1, 2024

looks like the physmon diff is gone

@andiwand
Copy link
Contributor

andiwand commented Mar 1, 2024

physmon is passing. results here #1181 (comment)

hashes are changing which means it does at least something 😄

should we put this in @paulgessinger ?

@github-actions github-actions bot added Component - Examples Affects the Examples module Changes Performance labels Mar 1, 2024
@andiwand
Copy link
Contributor

andiwand commented Mar 1, 2024

Did a quick performance measurement with https://github.com/andiwand/cern-scripts/blob/main/tmp/full_chain_perf.py

Fatras

on average main x 0.98

image

CKF

on average main x 0.90

image

Copy link
Member

@paulgessinger paulgessinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All green, let's go!

@kodiakhq kodiakhq bot merged commit 6182aef into acts-project:main Mar 4, 2024
54 checks passed
@github-actions github-actions bot removed the automerge label Mar 4, 2024
kodiakhq bot pushed a commit that referenced this pull request Mar 6, 2024
Just to try if this also returns clean outputs, I'm downgrading the `if` added in #1181 to an `assert`. Let's see what the CI says here.
Ragansu pushed a commit to Ragansu/acts that referenced this pull request Mar 7, 2024
Currently, we are multiplying an 8x8 covariance matrix with an 8x8 transport matrix, and we see that Eigen is failing to optimize this properly, because it is calling a generalized GEMM method rather than an optimized small matrix method. In order to resolve this, we change the code to use a tiled multiplication method which splits the matrices into 4x4 sub-matrices which can be multiplied and added to achieve the desired effect. This has two advantages:

  1. It allows Eigen to use its hand-rolled optimized 4x4 matrix multiplication methods.
  2. It allows us to perform some trickery with matrix identities to reduce the number of floating point operations.


Co-authored-by: Andreas Stefl <[email protected]>
Ragansu pushed a commit to Ragansu/acts that referenced this pull request Mar 7, 2024
…ect#3009)

Just to try if this also returns clean outputs, I'm downgrading the `if` added in acts-project#1181 to an `assert`. Let's see what the CI says here.
EleniXoch pushed a commit to EleniXoch/acts that referenced this pull request May 6, 2024
Currently, we are multiplying an 8x8 covariance matrix with an 8x8 transport matrix, and we see that Eigen is failing to optimize this properly, because it is calling a generalized GEMM method rather than an optimized small matrix method. In order to resolve this, we change the code to use a tiled multiplication method which splits the matrices into 4x4 sub-matrices which can be multiplied and added to achieve the desired effect. This has two advantages:

  1. It allows Eigen to use its hand-rolled optimized 4x4 matrix multiplication methods.
  2. It allows us to perform some trickery with matrix identities to reduce the number of floating point operations.


Co-authored-by: Andreas Stefl <[email protected]>
EleniXoch pushed a commit to EleniXoch/acts that referenced this pull request May 6, 2024
…ect#3009)

Just to try if this also returns clean outputs, I'm downgrading the `if` added in acts-project#1181 to an `assert`. Let's see what the CI says here.
asalzburger pushed a commit to asalzburger/acts that referenced this pull request May 21, 2024
Currently, we are multiplying an 8x8 covariance matrix with an 8x8 transport matrix, and we see that Eigen is failing to optimize this properly, because it is calling a generalized GEMM method rather than an optimized small matrix method. In order to resolve this, we change the code to use a tiled multiplication method which splits the matrices into 4x4 sub-matrices which can be multiplied and added to achieve the desired effect. This has two advantages:

  1. It allows Eigen to use its hand-rolled optimized 4x4 matrix multiplication methods.
  2. It allows us to perform some trickery with matrix identities to reduce the number of floating point operations.


Co-authored-by: Andreas Stefl <[email protected]>
asalzburger pushed a commit to asalzburger/acts that referenced this pull request May 21, 2024
…ect#3009)

Just to try if this also returns clean outputs, I'm downgrading the `if` added in acts-project#1181 to an `assert`. Let's see what the CI says here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changes Performance Component - Core Affects the Core module Component - Examples Affects the Examples module Impact - Minor Nuissance bug and/or affects only a single module Improvement Changes to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants