perf: Tile 8×8 covariance matrix multiplication #1181
Conversation
Force-pushed from 2097897 to b290c25
Codecov Report
Attention: Patch coverage is

```
@@           Coverage Diff            @@
##            main    #1181      +/-  ##
=========================================
- Coverage   48.69%   48.67%   -0.03%
=========================================
  Files         493      493
  Lines       28992    29004      +12
  Branches    13804    13816      +12
=========================================
  Hits        14117    14117
- Misses       4946     4947       +1
- Partials     9929     9940      +11
```

☔ View full report in Codecov by Sentry.
This commit optimizes some of the Eigen usage in the covariance engine, specifically in the critical path for the propagation examples. The first optimization is a tiled matrix multiplication method, which takes 2i×2j matrices and performs four i×j multiplications instead, which Eigen can optimize far more easily. Secondly, we reduce the number of floating point operations performed by working with smaller submatrices wherever possible. On my machine, the propagation example runs at 53.555595 ms/event before this patch and at 43.750143 ms/event after it. This performance gain is independent of the performance gain of acts-project#1181.
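The tiling described above can be sketched in plain C++ without Eigen. This is an illustrative reimplementation, not the actual ACTS code: the names `Mat8`, `mac4x4`, and `tiledMultiply` are invented for this sketch, which splits an 8×8 product into four 4×4 output tiles, each a sum of two 4×4 products.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch of the tiled multiply: split two 8x8 row-major
// matrices into 4x4 tiles, so C_(ti,tj) = sum_tk A_(ti,tk) * B_(tk,tj).
using Mat8 = std::array<double, 64>;

// Multiply-accumulate one 4x4 tile product: C tile += A tile * B tile.
// (ar, ac), (br, bc), (cr, cc) are the top-left corners of the tiles.
static void mac4x4(const Mat8& A, std::size_t ar, std::size_t ac,
                   const Mat8& B, std::size_t br, std::size_t bc,
                   Mat8& C, std::size_t cr, std::size_t cc) {
  for (std::size_t i = 0; i < 4; ++i)
    for (std::size_t j = 0; j < 4; ++j) {
      double acc = 0.0;
      for (std::size_t k = 0; k < 4; ++k)
        acc += A[(ar + i) * 8 + (ac + k)] * B[(br + k) * 8 + (bc + j)];
      C[(cr + i) * 8 + (cc + j)] += acc;
    }
}

// C = A * B via four 4x4 output tiles, eight 4x4 tile products in total.
Mat8 tiledMultiply(const Mat8& A, const Mat8& B) {
  Mat8 C{};  // zero-initialized
  for (std::size_t ti = 0; ti < 2; ++ti)      // tile row of C
    for (std::size_t tj = 0; tj < 2; ++tj)    // tile column of C
      for (std::size_t tk = 0; tk < 2; ++tk)  // inner contraction index
        mac4x4(A, 4 * ti, 4 * tk, B, 4 * tk, 4 * tj, C, 4 * ti, 4 * tj);
  return C;
}
```

In the real code the same decomposition would be expressed with Eigen's fixed-size block views, which is what lets Eigen dispatch to its small-matrix kernels instead of the generic GEMM path.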
Force-pushed from b290c25 to a19f558
With both this and #1183, my performance for the Eigen stepper becomes 36.595292 ms/event on the generic propagation example, while the performance for the ATLAS stepper is 29.353479 ms/event. Getting closer!
Force-pushed from 8732133 to 56bc920
Honestly, miffed as to why this fails.
Force-pushed from 56bc920 to 0309ea8
Okay, I really think this is just harmless numerical errors causing hash changes.
Force-pushed from 0309ea8 to 61fb806
Force-pushed from 61fb806 to 2309f35
Force-pushed from 2309f35 to d12cf00
Force-pushed from 47ebcd7 to ffc6256
Force-pushed from ffc6256 to 51e2ff1
Force-pushed from 51e2ff1 to ee2fed6
The physmon coverage and configuration are much more robust than they were a year or even half a year ago. I don't see any significant changes other than the ROOT file hashes; everything is green. Should we just merge this at this stage?
Looks like the physmon diff is gone.
physmon is passing; results here: #1181 (comment). The hashes are changing, which means it does at least something 😄 Should we put this in, @paulgessinger?
Did a quick performance measurement with https://github.com/andiwand/cern-scripts/blob/main/tmp/full_chain_perf.py

- Fatras on average
- CKF on average
All green, let's go!
Just to try if this also returns clean outputs, I'm downgrading the `if` added in #1181 to an `assert`. Let's see what the CI says here.
Currently, we are multiplying an 8x8 covariance matrix with an 8x8 transport matrix, and Eigen fails to optimize this properly because it calls a generalized GEMM method rather than an optimized small-matrix method. To resolve this, we change the code to use a tiled multiplication method which splits the matrices into 4x4 sub-matrices that can be multiplied and added to achieve the same result. This has two advantages:

1. It allows Eigen to use its hand-rolled, optimized 4x4 matrix multiplication methods.
2. It allows us to exploit matrix identities to reduce the number of floating point operations.

Co-authored-by: Andreas Stefl <[email protected]>
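The second advantage mentioned above can be illustrated with a small self-contained sketch: once the product is expressed over 4x4 tiles, any tile that is known to be zero lets us skip whole 4x4 tile products. The zero pattern and the names `Mat8` and `tiledMultiplySkipZeros` below are assumptions for illustration only, not a claim about the actual structure of the ACTS transport matrix.

```cpp
#include <array>
#include <cstddef>

using Mat8 = std::array<double, 64>;  // 8x8 row-major

// C = A * B over 4x4 tiles, skipping every tile product that touches a
// tile of B flagged as all-zero. zeroTile[r][c] marks B's tile (r, c).
// Each skipped product removes 64 multiplies and 64 adds.
Mat8 tiledMultiplySkipZeros(const Mat8& A, const Mat8& B,
                            const bool (&zeroTile)[2][2]) {
  Mat8 C{};  // zero-initialized
  for (std::size_t ti = 0; ti < 2; ++ti)
    for (std::size_t tj = 0; tj < 2; ++tj)
      for (std::size_t tk = 0; tk < 2; ++tk) {
        if (zeroTile[tk][tj]) continue;  // A_(ti,tk) * 0 contributes nothing
        for (std::size_t i = 0; i < 4; ++i)
          for (std::size_t j = 0; j < 4; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < 4; ++k)
              acc += A[(4 * ti + i) * 8 + (4 * tk + k)] *
                     B[(4 * tk + k) * 8 + (4 * tj + j)];
            C[(4 * ti + i) * 8 + (4 * tj + j)] += acc;
          }
      }
  return C;
}
```

If, say, one off-diagonal tile of the transport matrix were structurally zero, two of the eight tile products would vanish, a 25% reduction in multiply work for that product before Eigen's kernel-level gains are even counted.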