Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: Refactor CKF branch stopper to allow stop and keep tracks #3102

Merged

Conversation

andiwand
Copy link
Contributor

@andiwand andiwand commented Apr 15, 2024

Changes the CKF branch stopper to allow keeping the tracks after the branch is stopped. This is useful if a track already collected enough measurements but then starts to accumulate holes.

This change should allow us to stop branches more aggressively because we don't have to be worried anymore to throw out valid track candidates.

closes

blocked by

@andiwand andiwand added this to the next milestone Apr 15, 2024
@andiwand andiwand requested a review from timadye April 15, 2024 13:29
@andiwand andiwand changed the title refactor: Refactor CKF branch stopper to allow stop and keep tracks refactor!: Refactor CKF branch stopper to allow stop and keep tracks Apr 15, 2024
@github-actions github-actions bot added Component - Core Affects the Core module Track Finding labels Apr 15, 2024
@andiwand andiwand modified the milestones: next, v35.0.0 Apr 15, 2024
AJPfleger
AJPfleger previously approved these changes Apr 15, 2024
Copy link
Contributor

@AJPfleger AJPfleger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, as we already discussed this morning.

I don't see any problems to keep the track, if it has a lot of "holes" in the end. In most (all?) places we consider holes only as such, when they appear etween the first and last measurements.

timadye
timadye previously approved these changes Apr 15, 2024
Copy link
Contributor

@timadye timadye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

It's a pity this change can't be made to be compatible, since an explicit cast from bool to BranchStopperResult would return the right result. But I can't see any way to make that work, especially for a Delegate call.

@acts-policybot acts-policybot bot dismissed timadye’s stale review April 24, 2024 07:23

Invalidated by push of 09cfd26

@github-actions github-actions bot added the Component - Examples Affects the Examples module label Apr 24, 2024
@andiwand andiwand requested a review from timadye April 30, 2024 12:36
@andiwand
Copy link
Contributor Author

I am measuring a speedup of 3x which makes me wonder if that can be real 🤔

image

Copy link
Contributor

@timadye timadye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, when I run full_chain_itk.py, I get a crash:

Core/include/Acts/Utilities/VectorHelpers.hpp:141: std::array<double, 4> Acts::VectorHelpers::evaluateTrigonomics(const Acts::Vector3&): Assertion `sinTheta != 0 && "VectorHelpers: Vector is parallel to the z-axis " "which leads to division by zero"' failed.

after ~47 events. With main HEAD, it runs OK, at least to >90 events. Of course that difference could just be bad luck.

@andiwand
Copy link
Contributor Author

Unfortunately, when I run full_chain_itk.py, I get a crash:

Core/include/Acts/Utilities/VectorHelpers.hpp:141: std::array<double, 4> Acts::VectorHelpers::evaluateTrigonomics(const Acts::Vector3&): Assertion `sinTheta != 0 && "VectorHelpers: Vector is parallel to the z-axis " "which leads to division by zero"' failed.

after ~47 events. With main HEAD, it runs OK, at least to >90 events. Of course that difference could just be bad luck.

Hm that looks unrelated. I can try to see if this is easily fixable but this should go into another PR in any case.

@andiwand
Copy link
Contributor Author

andiwand commented May 1, 2024

@timadye after #3164 and #3163 the problem was gone for me

@timadye
Copy link
Contributor

timadye commented May 1, 2024

@timadye after #3164 and #3163 the problem was gone for me

Thanks! That should make testing the performance of this PR much easier. I will do that today.

@timadye
Copy link
Contributor

timadye commented May 1, 2024

@timadye after #3164 and #3163 the problem was gone for me

Those fix the problems I had, but my attempt to merge #3164 with this (#3102) led to a crash on the first event:

#5  0x00007f7e66f55b70 in std::vector<unsigned int, std::allocator<unsigned int> >::operator[] (__n=112, this=0x128) at /cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/include/c++/13.1.0/bits/stl_vector.h:1143
#6  Acts::detail_vmt::VectorMultiTrajectoryBase::component_impl<true, Acts::VectorMultiTrajectory const> (istate=112, key=4099663144, instance=...) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/VectorMultiTrajectory.hpp:236
#7  Acts::VectorMultiTrajectory::component_impl (istate=112, key=4099663144, this=0x110) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/VectorMultiTrajectory.hpp:454
#8  Acts::MultiTrajectory<Acts::VectorMultiTrajectory>::component<unsigned int, 4099663144u> (istate=112, this=0x110) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/MultiTrajectory.hpp:593
#9  Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false>::component<unsigned int, 4099663144u> (this=0x22c15f50) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/TrackStateProxy.hpp:1083
#10 Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false>::previous (this=0x22c15f50) at /home/ppd/adye/acts/build/src/Core/include/Acts/EventData/TrackStateProxy.hpp:270
#11 Acts::CombinatorialKalmanFilter<Acts::Propagator<Acts::EigenStepper<Acts::StepperExtensionList<Acts::DefaultExtension>, Acts::detail::VoidAuctioneer>, Acts::Navigator>, Acts::VectorMultiTrajectory>::Actor<Acts::Delegate<std::pair<Acts::SourceLinkAdapterIterator<boost::container::vec_iterator<ActsExamples::IndexSourceLink*, true> >, Acts::SourceLinkAdapterIterator<boost::container::vec_iterator<ActsExamples::IndexSourceLink*, true> > > (Acts::Surface const&), void, (Acts::DelegateType)1>, Acts::GenericBoundTrackParameters<Acts::ParticleHypothesis> >::processSelectedTrackStates(Acts::ContextType const&, __gnu_cxx::__normal_iterator<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false> const*, std::vector<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false>, std::allocator<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false> > > >, __gnu_cxx::__normal_iterator<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false> const*, std::vector<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false>, std::allocator<Acts::TrackStateProxy<Acts::VectorMultiTrajectory, 6ul, false> > > >, Acts::CombinatorialKalmanFilterResult<Acts::VectorMultiTrajectory>&, bool, Acts::CombinatorialKalmanFilterTipState const&, unsigned long&) const (this=0x7ffd73f709f0, gctx=..., begin=..., end=..., result=..., isOutlier=true, prevTipState=..., nBranchesOnSurface=0x7ffd73f6e7d8: 0) at /home/ppd/adye/acts/build/src/Core/include/Acts/TrackFinding/CombinatorialKalmanFilter.hpp:825

Maybe I merged them badly. Did you try with both PRs?

@timadye
Copy link
Contributor

timadye commented May 1, 2024

For 1000 ttbar+PU200 events, with 8 threads, I get an overall speedup 9.1→8.2 s/event (x1.10) for the whole job, and in the TrackFindingAlgorithm, 2.7→1.9 s/event (x1.43). The efficiency is a bit weird, but I suppose mostly better than main:
trackeff_vs_eta

@timadye
Copy link
Contributor

timadye commented May 1, 2024

Some more plots with interesting features:
trackeff_vs_pT
trackeff_vs_phi
nOutliers_vs_eta

@andiwand
Copy link
Contributor Author

andiwand commented May 2, 2024

Just to be sure: these plots come from Athena reconstruction? And all of them are ttbar PU 200?

  • I am a little bit surprised that the efficiency is sometimes lower. While in phi it looks like it is almost consistently larger?
  • For efficiency over pT I guess our signal is concentrated on the left where we do see some improvements.
  • Over eta it looks more symmetrical which is nice.

I believe the increase in efficiency comes from #3164. I found a spot where we stop the branch but never stored the track.

Speedup 1.4 is great! I am surprised that it is so different between ODD Acts and ITk Athena-Acts. Maybe the previous branch aborter was already more effective than the ODD Acts one because of the detector geometry.

I guess you are using the same holes/outlier cuts as before on Athena main?

@timadye
Copy link
Contributor

timadye commented May 2, 2024

Just to be sure: these plots come from Athena reconstruction? And all of them are ttbar PU 200?

No, they are ACTS stand-alone with the ITk. All with ttbar PU 200.

I can't easily test it in Athena until the PRs are merged - and even then, the main--ACTS build will fail due to the breaking change - at least to start with.

  • I am a little bit surprised that the efficiency is sometimes lower. While in phi it looks like it is almost consistently larger?

That is η dependent. Overall, the efficiency is larger in more η regions than regions where it is smaller.

  • For efficiency over pT I guess our signal is concentrated on the left where we do see some improvements.

right. Most ttbar PU 200 tracks are low pT.

  • Over eta it looks more symmetrical which is nice.

👍

I believe the increase in efficiency comes from #3164. I found a spot where we stop the branch but never stored the track.

Cool! I'll test this with #3164 on its own to separate the effect of the two PRs.

Speedup 1.4 is great! I am surprised that it is so different between ODD Acts and ITk Athena-Acts. Maybe the previous branch aborter was already more effective than the ODD Acts one because of the detector geometry.

Did you check this is still the case? You expressed some doubt about the x3 number before, and the lack of spread in times for different events is a little suspicious.

I assume your plot before was for the TrackFindingAlgorithm alone, so comparable with my x1.4 number. Mine was based on timing.tsv which only gives an average, not a histogram. How do yet get the timing for each event?

I guess you are using the same holes/outlier cuts as before on Athena main?

This is acts #3102 vs acts main. I am using the same holes/outlier cuts for both, ie. both with #3163 merged in.

@andiwand
Copy link
Contributor Author

andiwand commented May 2, 2024

Did you check this is still the case? You expressed some doubt about the x3 number before, and the lack of spread in times for different events is a little suspicious.

I will run it again. The output seems to have changed significantly with 2b068ea which is unexpected but the change looks more correct than what was there before.

One suspicion I have is that we end up in the Solenoid quite often in the ODD and just waste a lot of time propagating there. This is prohibited in a lot of cases not by stopping the finding earlier.

@timadye
Copy link
Contributor

timadye commented May 2, 2024

#3164 didn't show any changes, so the efficiency changes must be down to this PR (#3102). I will also run this one again, following your latest updates.

@andiwand
Copy link
Contributor Author

andiwand commented May 2, 2024

image

Speedup is still significant - seems to be real.

Here are some stats from the track finding which gives a little bit more insight.

main

19:53:34    TrackFinding   INFO      TrackFindingAlgorithm statistics:
19:53:34    TrackFinding   INFO      - total seeds: 1533560
19:53:34    TrackFinding   INFO      - deduplicated seeds: 363980
19:53:34    TrackFinding   INFO      - failed seeds: 0
19:53:34    TrackFinding   INFO      - failed smoothing: 0
19:53:34    TrackFinding   INFO      - failed extrapolation: 0
19:53:34    TrackFinding   INFO      - failure ratio seeds: 0
19:53:34    TrackFinding   INFO      - found tracks: 1028500
19:53:34    TrackFinding   INFO      - selected tracks: 33380
19:53:34    TrackFinding   INFO      - stopped branches: 141080

changed

19:50:37    TrackFinding   INFO      TrackFindingAlgorithm statistics:
19:50:37    TrackFinding   INFO      - total seeds: 1533560
19:50:37    TrackFinding   INFO      - deduplicated seeds: 288710
19:50:37    TrackFinding   INFO      - failed seeds: 0
19:50:37    TrackFinding   INFO      - failed smoothing: 0
19:50:37    TrackFinding   INFO      - failed extrapolation: 0
19:50:37    TrackFinding   INFO      - failure ratio seeds: 0
19:50:37    TrackFinding   INFO      - found tracks: 50090
19:50:37    TrackFinding   INFO      - selected tracks: 33950
19:50:37    TrackFinding   INFO      - stopped branches: 1196310

stopped branches is ~10x

@andiwand andiwand removed the 🛑 blocked This item is blocked by another item label May 2, 2024
@github-actions github-actions bot added Infrastructure Changes to build tools, continous integration, ... Changes Performance labels May 2, 2024
@timadye
Copy link
Contributor

timadye commented May 2, 2024

Same test as before, but comparing #3102 after 2b068ea was added against the old main. Similar speedup as before: TrackFindingAlgorithm 2.7→1.9 s/event (x1.42), but even better efficiency improvement:
trackeff_vs_eta
trackeff_vs_phi
nOutliers_vs_eta

@timadye
Copy link
Contributor

timadye commented May 2, 2024

again for the ITk:

main:

00:14:50    TrackFinding   INFO      TrackFindingAlgorithm statistics:
00:14:50    TrackFinding   INFO      - total seeds: 20906970
00:14:50    TrackFinding   INFO      - deduplicated seeds: 18113959
00:14:50    TrackFinding   INFO      - failed seeds: 0
00:14:50    TrackFinding   INFO      - failed smoothing: 0
00:14:50    TrackFinding   INFO      - failed extrapolation: 0
00:14:50    TrackFinding   INFO      - failure ratio seeds: 0
00:14:50    TrackFinding   INFO      - found tracks: 4771749
00:14:50    TrackFinding   INFO      - selected tracks: 3037718
00:14:50    TrackFinding   INFO      - stopped branches: 636123

#3102 (2b068ea):

18:58:50    TrackFinding   INFO      TrackFindingAlgorithm statistics:
18:58:50    TrackFinding   INFO      - total seeds: 20894532
18:58:50    TrackFinding   INFO      - deduplicated seeds: 17905047
18:58:50    TrackFinding   INFO      - failed seeds: 0
18:58:50    TrackFinding   INFO      - failed smoothing: 0
18:58:50    TrackFinding   INFO      - failed extrapolation: 0
18:58:50    TrackFinding   INFO      - failure ratio seeds: 0
18:58:50    TrackFinding   INFO      - found tracks: 3527696
18:58:50    TrackFinding   INFO      - selected tracks: 3193132
18:58:50    TrackFinding   INFO      - stopped branches: 1946303

so x3.1 stopped branches. Comparing with ~x10 for the ODD @andiwand found, is in line with the smaller speedup.

Copy link
Contributor

@timadye timadye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clearly an improvement. Unless you are planning other breaking changes to be included at the same time, is it ready to go? 🚀

@kodiakhq kodiakhq bot merged commit bed53b2 into acts-project:main May 3, 2024
51 checks passed
@github-actions github-actions bot removed the automerge label May 3, 2024
@andiwand andiwand deleted the refactor-ckf-branch-stopper-stop-and-keep branch May 3, 2024 07:24
@acts-project-service acts-project-service added Breaks Athena build This PR breaks the Athena build Fails Athena tests This PR causes a failure in the Athena tests labels May 3, 2024
EleniXoch pushed a commit to EleniXoch/acts that referenced this pull request May 6, 2024
…cts-project#3102)

Changes the CKF branch stopper to allow keeping the tracks after the branch is stopped. This is useful if a track already collected enough measurements but then starts to accumulate holes.

This change should allow us to stop branches more aggressively because we don't have to be worried anymore to throw out valid track candidates.

closes
- acts-project#2967

blocked by
- acts-project#3164
- acts-project#3163
kodiakhq bot pushed a commit that referenced this pull request May 7, 2024
After removing this in #3102 I reintroduce this functionality here to reduce the changes to a minimum. After #3102 we see some bigger tack finding changes than expected which I suspect are coming from the fact that we effectively stopped most of the branches when reaching a measurement surface.

Having this callback will also be necessary if we want to allow the user to count holes/outliers/measurements in specific geometry regions by themselves.
asalzburger pushed a commit to asalzburger/acts that referenced this pull request May 21, 2024
…cts-project#3102)

Changes the CKF branch stopper to allow keeping the tracks after the branch is stopped. This is useful if a track already collected enough measurements but then starts to accumulate holes.

This change should allow us to stop branches more aggressively because we don't have to be worried anymore to throw out valid track candidates.

closes
- acts-project#2967

blocked by
- acts-project#3164
- acts-project#3163
asalzburger pushed a commit to asalzburger/acts that referenced this pull request May 21, 2024
…oject#3172)

After removing this in acts-project#3102 I reintroduce this functionality here to reduce the changes to a minimum. After acts-project#3102 we see some bigger tack finding changes than expected which I suspect are coming from the fact that we effectively stopped most of the branches when reaching a measurement surface.

Having this callback will also be necessary if we want to allow the user to count holes/outliers/measurements in specific geometry regions by themselves.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Breaks Athena build This PR breaks the Athena build Changes Performance Component - Core Affects the Core module Component - Examples Affects the Examples module Fails Athena tests This PR causes a failure in the Athena tests Infrastructure Changes to build tools, continous integration, ... Track Finding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants