refactor!: Refactor CKF branch stopper to allow stop and keep tracks #3102

andiwand · 2024-04-15T13:29:16Z

Changes the CKF branch stopper to allow keeping the tracks after the branch is stopped. This is useful if a track already collected enough measurements but then starts to accumulate holes.

This change should allow us to stop branches more aggressively because we no longer have to worry about throwing out valid track candidates.
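To illustrate the idea, here is a minimal, self-contained sketch of a three-valued branch stopper decision. It is not the ACTS implementation: the enum, the `BranchCounters` and `BranchCuts` structs, and the `decideBranch` function are hypothetical names chosen for this example, and the cut values used with it are arbitrary.

```cpp
#include <cstddef>

// Hypothetical three-valued result (illustrative names, not copied from ACTS).
enum class BranchStopperResult { Continue, StopAndDrop, StopAndKeep };

// Hypothetical per-branch counters accumulated during track finding.
struct BranchCounters {
  std::size_t nMeasurements;
  std::size_t nHoles;
};

// Hypothetical selection cuts.
struct BranchCuts {
  std::size_t minMeasurements;
  std::size_t maxHoles;
};

// Decide what to do with a branch:
//  - within the hole budget: keep extending the branch;
//  - too many holes but enough measurements: stop and keep the track;
//  - too many holes and too few measurements: stop and drop it.
BranchStopperResult decideBranch(const BranchCounters& counters,
                                 const BranchCuts& cuts) {
  if (counters.nHoles <= cuts.maxHoles) {
    return BranchStopperResult::Continue;
  }
  if (counters.nMeasurements >= cuts.minMeasurements) {
    return BranchStopperResult::StopAndKeep;
  }
  return BranchStopperResult::StopAndDrop;
}
```

With a two-valued (bool) stopper, the second and third cases cannot be told apart, which is why a stopped branch previously meant a lost candidate.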

closes

- Improvement: allow CKF to keep stopped branch #2967

blocked by

- #3164
- #3163

github-actions · 2024-04-15T14:01:29Z

📊: Physics performance monitoring for `d8560bf`

Full contents

physmon summary

AJPfleger

lgtm, as we already discussed this morning.

I don't see any problem with keeping the track if it has a lot of "holes" at the end. In most (all?) places we only count holes as such when they appear between the first and last measurements.
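As a small illustration of that convention, the sketch below counts only the holes that lie strictly between the first and the last measurement of a track. The `StateType` enum and `countInnerHoles` function are hypothetical names for this example and do not correspond to any ACTS type flags or helpers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of a track state (not the ACTS type flags).
enum class StateType { Measurement, Outlier, Hole };

// Count holes strictly between the first and last measurement.
// Leading and trailing holes are ignored; a track without any
// measurement has no inner holes by this definition.
std::size_t countInnerHoles(const std::vector<StateType>& states) {
  bool found = false;
  std::size_t first = 0;
  std::size_t last = 0;
  for (std::size_t i = 0; i < states.size(); ++i) {
    if (states[i] == StateType::Measurement) {
      if (!found) {
        first = i;
        found = true;
      }
      last = i;
    }
  }
  if (!found) {
    return 0;
  }
  std::size_t holes = 0;
  for (std::size_t i = first + 1; i < last; ++i) {
    if (states[i] == StateType::Hole) {
      ++holes;
    }
  }
  return holes;
}
```

Under this definition, a track that ends with several holes after its last measurement is counted the same as one that stops right at the last measurement.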

timadye

Looks good.

It's a pity this change can't be made to be compatible, since an explicit cast from bool to BranchStopperResult would return the right result. But I can't see any way to make that work, especially for a Delegate call.
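The compatibility point can be sketched in isolation. The example below assumes an enum whose first two enumerators have the values 0 and 1, so that an explicit cast maps `false` to "continue" and `true` to "stop and drop"; the enum layout, `fromLegacyBool`, and `adaptLegacyStopper` are hypothetical and say nothing about how `Acts::Delegate` behaves. Because a scoped enum has no implicit conversion from `bool`, existing bool-returning callbacks would need an explicit wrapper like this one.

```cpp
#include <functional>

// Hypothetical enum layout: values chosen so that false -> Continue and
// true -> StopAndDrop under an explicit cast. This ordering is an assumption.
enum class BranchStopperResult : int {
  Continue = 0,
  StopAndDrop = 1,
  StopAndKeep = 2
};

// Explicit conversion from the old bool convention (true means "stop").
inline BranchStopperResult fromLegacyBool(bool stop) {
  return static_cast<BranchStopperResult>(stop);
}

// Wrap a legacy bool-returning callback so that it returns the enum.
// A scoped enum cannot be produced implicitly from bool, so without a
// wrapper like this the old callback no longer fits the new signature.
template <typename... Args>
std::function<BranchStopperResult(Args...)> adaptLegacyStopper(
    std::function<bool(Args...)> legacy) {
  return [legacy](Args... args) { return fromLegacyBool(legacy(args...)); };
}
```

A legacy stopper adapted this way can never return `StopAndKeep`, so it reproduces the old behaviour only.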

Invalidated by push of 09cfd26

andiwand · 2024-04-30T12:45:12Z

I am measuring a speedup of 3x which makes me wonder if that can be real 🤔

timadye

Unfortunately, when I run full_chain_itk.py, I get a crash:

Core/include/Acts/Utilities/VectorHelpers.hpp:141: std::array<double, 4> Acts::VectorHelpers::evaluateTrigonomics(const Acts::Vector3&): Assertion `sinTheta != 0 && "VectorHelpers: Vector is parallel to the z-axis " "which leads to division by zero"' failed.

after ~47 events. With main HEAD, it runs OK, at least to >90 events. Of course that difference could just be bad luck.

Core/include/Acts/TrackFinding/CombinatorialKalmanFilter.hpp

andiwand · 2024-04-30T18:40:37Z

> Unfortunately, when I run full_chain_itk.py, I get a crash:
> Core/include/Acts/Utilities/VectorHelpers.hpp:141: std::array<double, 4> Acts::VectorHelpers::evaluateTrigonomics(const Acts::Vector3&): Assertion `sinTheta != 0 && "VectorHelpers: Vector is parallel to the z-axis " "which leads to division by zero"' failed.
> after ~47 events. With main HEAD, it runs OK, at least to >90 events. Of course that difference could just be bad luck.

Hm that looks unrelated. I can try to see if this is easily fixable but this should go into another PR in any case.

andiwand · 2024-05-01T10:04:18Z

@timadye after #3164 and #3163 the problem was gone for me

timadye · 2024-05-01T10:08:11Z

> @timadye after #3164 and #3163 the problem was gone for me

Thanks! That should make testing the performance of this PR much easier. I will do that today.

timadye · 2024-05-01T17:23:56Z

> @timadye after #3164 and #3163 the problem was gone for me

Those fix the problems I had, but my attempt to merge #3164 with this (#3102) led to a crash on the first event:

#5  0x00007f7e66f55b70 in std::vector<unsigned int, std::allocator<unsigned int> >::operator[] (__n=112, this=0x128) at /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/include/c++/13.1.0/bits/stl_vector.h:1143
#6  Acts::detail_vmt::VectorMultiTrajectoryBase::component_impl<true, Acts::VectorMultiTrajectory const> (istate=112, key=4099663144, instance=...) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/VectorMultiTrajectory.hpp:236
#7  Acts::VectorMultiTrajectory::component_impl (istate=112, key=4099663144, this=0x110) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/VectorMultiTrajectory.hpp:454
#8  Acts::MultiTrajectory<Acts::VectorMultiTrajectory>::component<unsigned int, 4099663144u> (istate=112, this=0x110) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/MultiTrajectory.hpp:593
#9  Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false>::component<unsigned int, 4099663144u> (this=0x22c15f50) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/TrackStateProxy.hpp:1083
#10 Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false>::previous (this=0x22c15f50) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/TrackStateProxy.hpp:270
#11 Acts::CombinatorialKalmanFilter<Acts::Propagator<Acts::EigenStepper<Acts::StepperExtensionList<Acts::DefaultExtension>, Acts::detail::VoidAuctioneer>, Acts::Navigator>, Acts::VectorMultiTrajectory>::Actor<Acts::Delegate<std::pair<Acts::SourceLinkAdapterIterator<boost::container::vec_iterator<ActsExamples::IndexSourceLink*, true> >, Acts::SourceLinkAdapterIterator<boost::container::vec_iterator<ActsExamples::IndexSourceLink*, true> > > (Acts::Surface const&), void, (Acts::DelegateType)1>, Acts::GenericBoundTrackParameters<Acts::ParticleHypothesis> >::processSelectedTrackStates(Acts::ContextType const&, __gnu_cxx::__normal_iterator<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false> const*, std::vector<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false>, std::allocator<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false> > > >, __gnu_cxx::__normal_iterator<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false> const*, std::vector<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false>, std::allocator<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false> > > >, Acts::CombinatorialKalmanFilterResult<Acts::VectorMultiTrajectory>&, bool, Acts::CombinatorialKalmanFilterTipState const&, unsigned long&) const (this=0x7ffd73f709f0, gctx=..., begin=..., end=..., result=..., isOutlier=true, prevTipState=..., nBranchesOnSurface=0x7ffd73f6e7d8: 0) at /home/ppd/adye/acts/build/src/Core/include/Acts/TrackFinding/CombinatorialKalmanFilter.hpp:825

Maybe I merged them badly. Did you try with both PRs?

timadye · 2024-05-01T23:32:33Z

For 1000 ttbar+PU200 events, with 8 threads, I get an overall speedup 9.1→8.2 s/event (x1.10) for the whole job, and in the TrackFindingAlgorithm, 2.7→1.9 s/event (x1.43). The efficiency is a bit weird, but I suppose mostly better than main:

timadye · 2024-05-01T23:43:58Z

Some more plots with interesting features:

andiwand · 2024-05-02T07:09:21Z

Just to be sure: these plots come from Athena reconstruction? And all of them are ttbar PU 200?

I am a little bit surprised that the efficiency is sometimes lower. While in phi it looks like it is almost consistently larger?
For efficiency over pT I guess our signal is concentrated on the left where we do see some improvements.
Over eta it looks more symmetrical which is nice.

I believe the increase in efficiency comes from #3164. I found a spot where we stop the branch but never stored the track.

Speedup 1.4 is great! I am surprised that it is so different between ODD Acts and ITk Athena-Acts. Maybe the previous branch aborter was already more effective than the ODD Acts one because of the detector geometry.

I guess you are using the same holes/outlier cuts as before on Athena main?

timadye · 2024-05-02T16:34:01Z

> Just to be sure: these plots come from Athena reconstruction? And all of them are ttbar PU 200?

No, they are ACTS stand-alone with the ITk. All with ttbar PU 200.

I can't easily test it in Athena until the PRs are merged - and even then, the main--ACTS build will fail due to the breaking change - at least to start with.

> I am a little bit surprised that the efficiency is sometimes lower. While in phi it looks like it is almost consistently larger?

That is η dependent. Overall, the efficiency is larger in more η regions than regions where it is smaller.

> For efficiency over pT I guess our signal is concentrated on the left where we do see some improvements.

Right. Most ttbar PU 200 tracks are low p_T.

> Over eta it looks more symmetrical which is nice.

👍

> I believe the increase in efficiency comes from #3164. I found a spot where we stop the branch but never stored the track.

Cool! I'll test this with #3164 on its own to separate the effect of the two PRs.

> Speedup 1.4 is great! I am surprised that it is so different between ODD Acts and ITk Athena-Acts. Maybe the previous branch aborter was already more effective than the ODD Acts one because of the detector geometry.

Did you check this is still the case? You expressed some doubt about the x3 number before, and the lack of spread in times for different events is a little suspicious.

I assume your plot before was for the TrackFindingAlgorithm alone, so comparable with my x1.4 number. Mine was based on timing.tsv which only gives an average, not a histogram. How do you get the timing for each event?

> I guess you are using the same holes/outlier cuts as before on Athena main?

This is acts #3102 vs acts main. I am using the same holes/outlier cuts for both, i.e. both with #3163 merged in.

andiwand · 2024-05-02T17:11:50Z

> Did you check this is still the case? You expressed some doubt about the x3 number before, and the lack of spread in times for different events is a little suspicious.

I will run it again. The output seems to have changed significantly with 2b068ea which is unexpected but the change looks more correct than what was there before.

One suspicion I have is that we end up in the Solenoid quite often in the ODD and just waste a lot of time propagating there. This is now prevented in a lot of cases by stopping the finding earlier.

timadye · 2024-05-02T17:23:20Z

#3164 didn't show any changes, so the efficiency changes must be down to this PR (#3102). I will also run this one again, following your latest updates.

andiwand · 2024-05-02T17:57:54Z

Speedup is still significant - seems to be real.

Here are some stats from the track finding which gives a little bit more insight.

main

19:53:34    TrackFinding   INFO      TrackFindingAlgorithm statistics:
19:53:34    TrackFinding   INFO      - total seeds: 1533560
19:53:34    TrackFinding   INFO      - deduplicated seeds: 363980
19:53:34    TrackFinding   INFO      - failed seeds: 0
19:53:34    TrackFinding   INFO      - failed smoothing: 0
19:53:34    TrackFinding   INFO      - failed extrapolation: 0
19:53:34    TrackFinding   INFO      - failure ratio seeds: 0
19:53:34    TrackFinding   INFO      - found tracks: 1028500
19:53:34    TrackFinding   INFO      - selected tracks: 33380
19:53:34    TrackFinding   INFO      - stopped branches: 141080

changed

19:50:37    TrackFinding   INFO      TrackFindingAlgorithm statistics:
19:50:37    TrackFinding   INFO      - total seeds: 1533560
19:50:37    TrackFinding   INFO      - deduplicated seeds: 288710
19:50:37    TrackFinding   INFO      - failed seeds: 0
19:50:37    TrackFinding   INFO      - failed smoothing: 0
19:50:37    TrackFinding   INFO      - failed extrapolation: 0
19:50:37    TrackFinding   INFO      - failure ratio seeds: 0
19:50:37    TrackFinding   INFO      - found tracks: 50090
19:50:37    TrackFinding   INFO      - selected tracks: 33950
19:50:37    TrackFinding   INFO      - stopped branches: 1196310

stopped branches is ~10x

timadye · 2024-05-02T22:11:46Z

Same test as before, but comparing #3102 after 2b068ea was added against the old main. Similar speedup as before: TrackFindingAlgorithm 2.7→1.9 s/event (x1.42), but even better efficiency improvement:

timadye · 2024-05-02T22:21:35Z

again for the ITk:

`main`:

00:14:50    TrackFinding   INFO      TrackFindingAlgorithm statistics:
00:14:50    TrackFinding   INFO      - total seeds: 20906970
00:14:50    TrackFinding   INFO      - deduplicated seeds: 18113959
00:14:50    TrackFinding   INFO      - failed seeds: 0
00:14:50    TrackFinding   INFO      - failed smoothing: 0
00:14:50    TrackFinding   INFO      - failed extrapolation: 0
00:14:50    TrackFinding   INFO      - failure ratio seeds: 0
00:14:50    TrackFinding   INFO      - found tracks: 4771749
00:14:50    TrackFinding   INFO      - selected tracks: 3037718
00:14:50    TrackFinding   INFO      - stopped branches: 636123

#3102 (`2b068ea`):

18:58:50    TrackFinding   INFO      TrackFindingAlgorithm statistics:
18:58:50    TrackFinding   INFO      - total seeds: 20894532
18:58:50    TrackFinding   INFO      - deduplicated seeds: 17905047
18:58:50    TrackFinding   INFO      - failed seeds: 0
18:58:50    TrackFinding   INFO      - failed smoothing: 0
18:58:50    TrackFinding   INFO      - failed extrapolation: 0
18:58:50    TrackFinding   INFO      - failure ratio seeds: 0
18:58:50    TrackFinding   INFO      - found tracks: 3527696
18:58:50    TrackFinding   INFO      - selected tracks: 3193132
18:58:50    TrackFinding   INFO      - stopped branches: 1946303

So that is x3.1 stopped branches. Compared with the ~x10 that @andiwand found for the ODD, this is in line with the smaller speedup.
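For reference, the ratios can be recomputed directly from the statistics printed in the logs above. The sketch below only does that arithmetic; the `TrackFindingStats` struct and `ratio` helper are hypothetical names for this example, and the numbers are copied from the four log excerpts in this thread.

```cpp
#include <cstdint>

// Hypothetical container for three of the counters printed by the logs.
struct TrackFindingStats {
  std::uint64_t foundTracks;
  std::uint64_t selectedTracks;
  std::uint64_t stoppedBranches;
};

// Ratio of a counter after the change to the same counter before it.
double ratio(std::uint64_t after, std::uint64_t before) {
  return static_cast<double>(after) / static_cast<double>(before);
}

// Values copied from the log excerpts quoted in this thread.
const TrackFindingStats oddMain{1028500, 33380, 141080};
const TrackFindingStats oddChanged{50090, 33950, 1196310};
const TrackFindingStats itkMain{4771749, 3037718, 636123};
const TrackFindingStats itkChanged{3527696, 3193132, 1946303};

// Stopped-branch ratios for the two detectors.
double oddStoppedRatio() {
  return ratio(oddChanged.stoppedBranches, oddMain.stoppedBranches);
}
double itkStoppedRatio() {
  return ratio(itkChanged.stoppedBranches, itkMain.stoppedBranches);
}
```

Evaluated this way, the stopped-branch ratio is about 8.5 for the ODD and about 3.1 for the ITk, so the qualitative conclusion (a much larger increase for the ODD) holds.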

timadye

This is clearly an improvement. Unless you are planning other breaking changes to be included at the same time, is it ready to go? 🚀

acts-project-service · 2024-05-03T08:10:28Z

✅ Athena integration test results [`bed53b2`]

✅ All tests successful

status	job	report
🟢	run_unit_tests
🟢	test_ActsEFTrackFit
🟢	test_ActsPersistifySeeds
🟢	test_ActsBenchmarkWithSpot
🟢	test_ActsAnalogueClustering
🟢	test_ActsConversionWorkflow
🟢	test_ActsWorkflowHeavyIons
🟢	test_ActsWorkflowFastTracking
🟢	test_ActsWorkflowCached
🟢	test_ActsWorkflow
🟢	test_ActsValidateAmbiguityResolution
🟢	test_ActsValidateResolvedTracks
🟢	test_ActsValidateTracks
🟢	test_ActsValidateActsCoreSpacePoints
🟢	test_ActsValidateActsSpacePoints
🟢	test_ActsValidateSeeds
🟢	test_ActsValidateOrthogonalSeeds
🟢	test_ActsValidateClusters
🟢	test_ActsPersistifyEDM
🟢	test_ActsGSFRefitting
🟢	test_ActsKfRefitting
🟢	test_ActsExtrapolationAlgTest
🟢	test_ActsITkTest
🟢	run_workflow_tests_run4_mc
🟢	run_workflow_tests_run2_mc
🟢	run_workflow_tests_run2_data
🟢	run_workflow_tests_run3_mc
🟢	run_workflow_tests_run3_data
🟢	run_art_test: test_data18_13TeV_1000evt
🟢	run_art_test: test_ttbarPU40_reco

…cts-project#3102) Changes the CKF branch stopper to allow keeping the tracks after the branch is stopped. This is useful if a track already collected enough measurements but then starts to accumulate holes. This change should allow us to stop branches more aggressively because we don't have to be worried anymore to throw out valid track candidates. closes - acts-project#2967 blocked by - acts-project#3164 - acts-project#3163

After removing this in #3102 I reintroduce this functionality here to reduce the changes to a minimum. After #3102 we see some bigger track finding changes than expected which I suspect are coming from the fact that we effectively stopped most of the branches when reaching a measurement surface. Having this callback will also be necessary if we want to allow the user to count holes/outliers/measurements in specific geometry regions by themselves.


refactor CKF branch stopper to allow stop and keep tracks

803323c

andiwand added this to the next milestone Apr 15, 2024

andiwand requested a review from timadye April 15, 2024 13:29

andiwand changed the title ~~refactor: Refactor CKF branch stopper to allow stop and keep tracks~~ refactor!: Refactor CKF branch stopper to allow stop and keep tracks Apr 15, 2024

github-actions bot added Component - Core Affects the Core module Track Finding labels Apr 15, 2024

andiwand modified the milestones: next, v35.0.0 Apr 15, 2024

AJPfleger previously approved these changes Apr 15, 2024

timadye previously approved these changes Apr 15, 2024

andiwand mentioned this pull request Apr 17, 2024

Improvement: allow CKF to keep stopped branch #2967

Closed

timadye mentioned this pull request Apr 17, 2024

feat: Branch aborter for track finding in Examples #3098

Merged

Merge branch 'main' into refactor-ckf-branch-stopper-stop-and-keep

292f9d2

andiwand dismissed AJPfleger’s stale review via 292f9d2 April 19, 2024 14:06

andiwand added 2 commits April 24, 2024 09:00

Merge branch 'main' into refactor-ckf-branch-stopper-stop-and-keep

2a901f5

refactor examples branch stopper

09cfd26

github-actions bot added the Component - Examples Affects the Examples module label Apr 24, 2024

Merge branch 'main' into refactor-ckf-branch-stopper-stop-and-keep

c7a19b6

andiwand requested a review from timadye April 30, 2024 12:36

timadye reviewed Apr 30, 2024

Core/include/Acts/TrackFinding/CombinatorialKalmanFilter.hpp (resolved)

Core/include/Acts/TrackFinding/CombinatorialKalmanFilter.hpp (outdated, resolved)

andiwand added 2 commits May 1, 2024 10:22

refactor: Common function to store tracks in Core CKF

92c8648

fix out of bounds

183b792

improve doc

5e601b7

andiwand added 2 commits May 2, 2024 09:22

use storeLastActiveTip

2b068ea

move enum

61b1c7b

Merge branch 'main' into refactor-ckf-branch-stopper-stop-and-keep

2e285c2

andiwand removed the 🛑 blocked This item is blocked by another item label May 2, 2024

update ref

9ba9f9f

github-actions bot added Infrastructure Changes to build tools, continous integration, ... Changes Performance labels May 2, 2024

timadye approved these changes May 2, 2024

andiwand added the automerge label May 3, 2024

Merge branch 'main' into refactor-ckf-branch-stopper-stop-and-keep

d8560bf

kodiakhq bot merged commit bed53b2 into acts-project:main May 3, 2024
51 checks passed

github-actions bot removed the automerge label May 3, 2024

andiwand deleted the refactor-ckf-branch-stopper-stop-and-keep branch May 3, 2024 07:24

acts-project-service added Breaks Athena build This PR breaks the Athena build Fails Athena tests This PR causes a failure in the Athena tests labels May 3, 2024

andiwand mentioned this pull request May 6, 2024

feat: Reintroduce branch stopping on measurement in Core CKF #3172

Merged
