-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Rare Undefined Behavior in PixelThresholdClusterizer #35275
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35275/25270
|
A new Pull Request was created by @OzAmram (Oz Amram) for master. It involves the following packages:
@jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
@cmsbuild , please test |
type bug-fix |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-110411/18609/summary.html Comparison SummarySummary:
|
please test workflow 134.706 for CMSSW_12_1_UBSAN_X |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-110411/18617/summary.html Comparison SummaryThe workflows 140.53 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons @slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:
|
@@ -222,6 +222,11 @@ void PixelThresholdClusterizer::copy_to_buffer(DigiIterator begin, DigiIterator | |||
// std::cout << (doMissCalibrate ? "VI from db" : "VI linear") << std::endl; | |||
} | |||
#endif | |||
|
|||
//avoid undefined behavior | |||
if (end <= begin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be better to essentially assert/crash here, and ensure that copy_to_buffer is not called with incorrect inputs (by fixing the calling code)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong preference here either but calling copy_to_buffer
in https://github.com/cms-sw/cmssw/blob/CMSSW_12_1_0_pre3/RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc#L151 only if end>begin
perhaps would be a more elegant solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, I think it makes sense for a function itself to act in case it encounters an undefined behavior rather than to leave the checking to the caller. So in the end I think I like the current fix more.
In general, I am not a big fan of asserts in the production code. For anything unexpected or undefined, isn't it better to deal with it gracefully and report a LogError? In this particular case it looks like we are encountering a situation where a pixel module has zero digis produced and an empty vector (DetSet) is passed on to the clustering routine. So this I would say is nothing particularly alarming and probably does not even require issuing a LogError. The following commented out line https://github.com/cms-sw/cmssw/blob/CMSSW_12_1_0_pre3/RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc#L135 seems to suggest that this scenario can indeed occur. However, what I am a bit confused about is why these empty DetSets are not simply dropped from the digi collection? Either way, the clusterizer code should be able to handle such cases without any trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this empty case corresponding to (end == begin)
?
or is this a case of end before begin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slava77, end == begin
@OzAmram, I had a quick chat with @tsusa today precisely about this issue of end == begin
which suggests that in the digi collection we have an empty vector of digis stored for a particular detId. The question then is why is this empty vector not dropped from the collection in the first place. It would therefore be good to check how that happens. On the other hand, even if there is some issue in the digi producer which can lead to such situations, the clusterizer should be immune against such cases which is what this PR achieves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I tend to agree that it would be good if the clusterizer does not fail for such a case. Maybe we add LogWarning
message to this PR and followup later to try and track down the upstream issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my view, at least the code comments should be reasonably clear why a function is expected to sometimes get incorrect inputs. Something like "avoid undefined behaviour" may be quite mysterious for a reader later.
So if fixing this at the source is out of scope, how about:
//In rare cases, this function gets called with an empty DetSet.
//This is not expected to be a problem because of XYZ
if (end <= begin)
return;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
begin == end
is a reasonable case for a generic caller and it makes sense to me that if there is some preamble computation in the method to skip it if it's clearly not needed.
However, the part with end < begin
looks bad and better be resolved at the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @slava77. The end < begin
case looks bad but by construction it is not possible in this particular case. In the caller code the begin
and end
are iterators from the same vector so in the worst case end==begin
. So for such cases the method could issue a LogWarning and return.
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35275/25399
|
please test |
@OzAmram , sorry I force pushed a change here in order to get a new commit. The previous commit has reached the max commit statuses limit of 1000 which was causing bot to fail with error message like
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35275/25574
|
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-110411/19166/summary.html Comparison Summary@slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:
|
+reconstruction
|
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2) |
please test workflow 134.706 for CMSSW_12_1_UBSAN_X lets test based on latest UBSAN IB |
-1 Failed Tests: UnitTests Found compilation warnings Unit TestsI found errors in the following unit tests: ---> test EcnaCalculationsExample had ERRORS ---> test testUCTUnpacker had ERRORS Comparison Summary@slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:
|
+1
|
This is a small fix to address the undefined behavior reported in issue #35036. The change is just to check the range is sensible before initializing an array.
No changes in output are expected.
@mmusich @ferencek @tsusa @tvami @czangela