Index error in L1Trigger/L1TTrackMatch/L1TrackJetEmulatorProducer #43723
Comments
A new Issue was created by @dan131riley Dan Riley. @antoniovilela, @makortel, @smuzaffar, @Dr15Jones, @sextonkennedy, @rappoccio can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign l1 |
New categories assigned: l1 @epalencia,@aloeliger you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Just a note from #43735 (closed, with follow-up in this issue): it is unclear to me why the failure rates differ between very similar geometry versions. For example, I ran 20 jobs of ttbar noPU, 500 events per job with the same GEN events, for both 24834.0 and 25234.0. I don't see any failed jobs from 24834.0 (D98), but 4 jobs fail from 25234.0 (D99). In any case, it should be fixed before the Phase-2 production starts. |
@epalencia @aloeliger @BenjaminRS |
@BenjaminRS Do you know the track group responsible for this producer? |
@SClarkPhysics - we have an issue with the line mentioned above. I see you added this particular line in this commit. Can you have a look into this problem as soon as you can please? |
To reproduce the error with CMSSW_14_0_0_pre2:
with input file: I see the crash at event 209. |
Can maybe @NJManganelli help look at this from the GTT side? |
We are working on a fix. Should this be pushed to master? @srimanob |
Hi @ccahoughton, please. Thanks very much. |
Fix is in #43852. Testing in private production, the issue is gone: I can produce the complete sample with no crash (using 1000 events/lumi). |
Looks like this was fixed by #43852 .... @dan131riley can we close this issue? |
ok to close |
This line (cmssw/L1Trigger/L1TTrackMatch/plugins/L1TrackJetEmulatorProducer.cc, line 298 in 10b8a60) is occasionally producing a -1 bin index, which is subsequently used to index into the stack-allocated epbins array, overwriting earlier stack allocations. On very rare occasions this results in a segfault, but the out-of-bounds access actually happens (without a segfault) fairly often for some workflows. The way to see this is to add
at line 300, and run wf 23234.0, 24834.0, or 25034.999, which should frequently give an assertion failure in step2 or step3. ASAN and UBSAN don't catch this, but valgrind memcheck does:
In all the cases I've checked, L1TrkPtrs_[k]->getTanlWord() at the failure has the value and working through eta_bin_firmwareStyle() confirms this gives a -1 index.
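For illustration only, here is a minimal C++ sketch of the failure mode described above and one way to guard against it. Everything in it is hypothetical: `toyEtaBin`, `checkedBin` and `fillIfInRange` are made-up names, the binning arithmetic is a toy stand-in rather than the real eta_bin_firmwareStyle(), and skipping out-of-range entries is just one possible policy, not necessarily what #43852 does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for a firmware-style binning function. It returns a
// signed bin number and yields a negative bin for a word below the lowest
// bin edge. This is NOT the CMSSW eta_bin_firmwareStyle() implementation.
// Assumes binWidthWord > 0.
inline int toyEtaBin(int etaWord, int etaMinWord, int binWidthWord) {
  const int offset = etaWord - etaMinWord;
  int bin = offset / binWidthWord;  // C++ integer division truncates toward zero
  // Adjust to floor division so that negative offsets map to bins -1, -2, ...
  if (offset < 0 && offset % binWidthWord != 0)
    --bin;
  return bin;
}

// Convert a signed bin number into a valid index for an array of size N,
// or return std::nullopt when the bin lies outside [0, N).
template <std::size_t N>
std::optional<std::size_t> checkedBin(int bin) {
  if (bin < 0 || static_cast<std::size_t>(bin) >= N)
    return std::nullopt;
  return static_cast<std::size_t>(bin);
}

// Add weight to a stack-allocated array of bins only when the bin is in
// range. Returns false (and leaves the array untouched) otherwise, instead
// of writing before or after the array.
template <std::size_t N>
bool fillIfInRange(std::array<int, N>& bins, int bin, int weight) {
  const auto idx = checkedBin<N>(bin);
  if (!idx)
    return false;
  bins[*idx] += weight;
  return true;
}
```

With a `std::array<int, 4> bins{}`, a call such as `fillIfInRange(bins, toyEtaBin(-1, 0, 4), 1)` returns false and leaves `bins` unchanged, whereas indexing a plain array with the same -1 would silently write in front of it. That silent write is the kind of access that valgrind memcheck reports in the workflows above.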