Rewrote checkForModuleDependencyCorrectness #34735

Dr15Jones · 2021-08-02T19:00:03Z

PR description:

The IBs showed that the old algorithm, which used the Boost Graph Library, could hit pathological cases and take more than 10 minutes to run.
The new algorithm simulates how the framework would run the modules and checks whether a deadlock would occur.
New unit tests were added to test the higher-level algorithm interface and to check exception messages.
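
To illustrate the idea of simulating module execution, here is a minimal Python sketch. It is not the CMSSW implementation (which is C++ in FWCore/Framework). The function name `find_stuck_modules`, the dictionary layout, and the simplified scheduling model are all assumptions made for this example: modules on a path run in order, each module runs once, and a module waits until every producer it consumes from that is itself on some path has run.

```python
def find_stuck_modules(paths, consumes):
    """Simulate running every path; return the modules that can never run.

    paths:    dict mapping a path name to its ordered list of module labels.
    consumes: dict mapping a module label to the labels it reads data from.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    scheduled = {m for modules in paths.values() for m in modules}
    finished = set()
    position = {name: 0 for name in paths}

    progress = True
    while progress:
        progress = False
        for name, modules in paths.items():
            # Advance along the path while its next module is able to run.
            while position[name] < len(modules):
                module = modules[position[name]]
                if module not in finished:
                    # Assumption: producers that are on no path run on demand,
                    # so only producers that are on a path can block.
                    blockers = [p for p in consumes.get(module, ())
                                if p in scheduled and p not in finished]
                    if blockers:
                        break
                    finished.add(module)
                position[name] += 1
                progress = True

    # Any path that did not reach its end is stuck at its current module.
    return {modules[position[name]]
            for name, modules in paths.items()
            if position[name] < len(modules)}
```

An empty result means the simulated schedule runs to completion. A non-empty result corresponds to the deadlock case the check is meant to report, for example when module 'a' precedes 'b' on the only path and 'a' consumes from 'b'.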

PR validation:

Code compiles. Framework unit tests (including new ones) pass.

fixes #34633
fixes #31199
fixes cms-sw/framework-team#210

cmsbuild · 2021-08-02T19:08:21Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34735/24382

This PR adds an extra 36KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File FWCore/Framework/test/BuildFile.xml modified in PR(s): Make EventSetup get to throw if called without a token when EDModule consumed any ES product #31746

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34735/24382/code-format.patch
e.g. curl https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34735/24382/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

cmsbuild · 2021-08-02T19:18:19Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34735/24383

This PR adds an extra 36KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File FWCore/Framework/test/BuildFile.xml modified in PR(s): Make EventSetup get to throw if called without a token when EDModule consumed any ES product #31746

cmsbuild · 2021-08-02T19:18:44Z

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

FWCore/Framework (core)

@makortel, @smuzaffar, @cmsbuild, @Dr15Jones can you please review it and eventually sign? Thanks.
@makortel, @wddgit this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy, @perrotta you are the release manager for this.

cms-bot commands are listed here

Dr15Jones · 2021-08-02T19:44:38Z

I tested this change on step2 of workflow 11725.0 under CMSSW_12_0_X_2021-07-28-2300. On the machine I used, the original job took more than 10 minutes to do the job initialization; with this code it took less than 2 minutes. However, the new code gave an error stating that a dependent module appeared later on a path. The cause was that the same modules appear multiple times on the same path, which confused the part of the algorithm that is meant to enforce policy, not the part that tests for runnability.

I'll modify the algorithm to ignore duplicate modules on the same path.
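
As a rough illustration of what ignoring duplicates could look like, here is a short Python sketch that keeps only the first occurrence of each module on a path. The helper name `drop_later_duplicates` and the list-of-labels representation are hypothetical; the actual change is in the C++ framework code and may differ.

```python
def drop_later_duplicates(path_modules):
    """Return the modules of one path, keeping only each first occurrence."""
    seen = set()
    unique = []
    for module in path_modules:
        if module not in seen:
            seen.add(module)
            unique.append(module)
    return unique
```

With this, a path listed as `["a", "b", "a", "c", "b"]` would be treated as `["a", "b", "c"]` for the purpose of the ordering policy check.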

cmsbuild · 2021-08-02T20:16:53Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-34735/24384

This PR adds an extra 40KB to repository
There are other open Pull requests which might conflict with changes you have proposed:
- File FWCore/Framework/test/BuildFile.xml modified in PR(s): Make EventSetup get to throw if called without a token when EDModule consumed any ES product #31746

cmsbuild · 2021-08-02T20:17:15Z

Pull request #34735 was updated. @makortel, @smuzaffar, @cmsbuild, @Dr15Jones can you please check and sign again.

Dr15Jones · 2021-08-02T20:20:12Z

please test

Dr15Jones · 2021-08-02T20:25:17Z

@Martin-Grunewald @fwyzard The new module dependency checking algorithm explicitly enforces some policies which the old algorithm appeared to be enforcing.

One such policy is: if a module 'a' consumes data from module 'b', and module 'b' appears on at least 1 path with module 'a', then 'b' must appear on ALL paths that contain module 'a'. We believe that policy was requested by the HLT.

NOTE: this is strictly a policy enforcement. In the cases mentioned above, the framework could actually schedule the modules properly even if the paths were not completely consistent.

The reason I mention this is I was just testing step2 of workflow 11725.0 and it is failing with

----- Begin Fatal Exception 02-Aug-2021 15:16:31 CDT-----------------------
An exception of category 'ScheduleExecutionFailure' occurred while
   [0] Calling beginJob
Exception Message:
Unrunnable schedule
Paths are non consistent
  module 'hltPixelTrackerHVOn' depends on 'hltScalersRawToDigi' which appears on paths
  HLT_Ele24_eta2p1_WPTight_Gsf_LooseChargedIsoPFTauHPS30_eta2p1_CrossL1_v1 HLT_Ele24_eta2p1_WPTight_Gsf_MediumChargedIsoPFTauHPS30_eta2p1_CrossL1_v1 HLT_Ele24_eta2p1_WPTight_Gsf_TightChargedIsoPFTauHPS30_eta2p1_CrossL1_v1 HLT_Ele24_eta2p1_WPTight_Gsf_LooseChargedIsoPFTauHPS30_eta2p1_TightID_CrossL1_v1 HLT_Ele24_eta2p1_WPTight_Gsf_MediumChargedIsoPFTauHPS30_eta2p1_TightID_CrossL1_v1 HLT_Ele24_eta2p1_WPTight_Gsf_TightChargedIsoPFTauHPS30_eta2p1_TightID_CrossL1_v1 HLT_IsoMu20_eta2p1_LooseChargedIsoPFTauHPS27_eta2p1_CrossL1_v4 HLT_IsoMu20_eta2p1_MediumChargedIsoPFTauHPS27_eta2p1_CrossL1_v1 HLT_IsoMu20_eta2p1_TightChargedIsoPFTauHPS27_eta2p1_CrossL1_v1 HLT_IsoMu20_eta2p1_LooseChargedIsoPFTauHPS27_eta2p1_TightID_CrossL1_v1 HLT_IsoMu20_eta2p1_MediumChargedIsoPFTauHPS27_eta2p1_TightID_CrossL1_v1 HLT_IsoMu20_eta2p1_TightChargedIsoPFTauHPS27_eta2p1_TightID_CrossL1_v1 HLT_IsoMu24_eta2p1_TightChargedIsoPFTauHPS35_Trk1_eta2p1_Reg_CrossL1_v1 HLT_IsoMu24_eta2p1_MediumChargedIsoPFTauHPS35_Trk1_TightID_eta2p1_Reg_CrossL1_v1 HLT_IsoMu24_eta2p1_TightChargedIsoPFTauHPS35_Trk1_TightID_eta2p1_Reg_CrossL1_v1 HLT_IsoMu24_eta2p1_MediumChargedIsoPFTauHPS35_Trk1_eta2p1_Reg_CrossL1_v4 HLT_IsoMu24_eta2p1_MediumChargedIsoPFTauHPS30_Trk1_eta2p1_Reg_CrossL1_v1 HLT_IsoMu27_LooseChargedIsoPFTauHPS20_Trk1_eta2p1_SingleL1_v1 HLT_IsoMu27_MediumChargedIsoPFTauHPS20_Trk1_eta2p1_SingleL1_v1 HLT_IsoMu27_TightChargedIsoPFTauHPS20_Trk1_eta2p1_SingleL1_v1 HLT_HT425_v9 HLT_HT430_DisplacedDijet40_DisplacedTrack_v13 HLT_HT500_DisplacedDijet40_DisplacedTrack_v13 HLT_HT430_DisplacedDijet60_DisplacedTrack_v13 HLT_HT400_DisplacedDijet40_DisplacedTrack_v13 HLT_HT650_DisplacedDijet60_Inclusive_v13 HLT_HT550_DisplacedDijet60_Inclusive_v13 AlCa_LumiPixelsCounts_ZeroBias_v1 HLT_DoubleMediumChargedIsoPFTauHPS30_L1MaxMass_Trk1_eta2p1_Reg_v1 HLT_DoubleTightChargedIsoPFTauHPS35_Trk1_eta2p1_Reg_v1 HLT_DoubleMediumChargedIsoPFTauHPS35_Trk1_TightID_eta2p1_Reg_v1 HLT_DoubleMediumChargedIsoPFTauHPS35_Trk1_eta2p1_Reg_v4 
HLT_DoubleTightChargedIsoPFTauHPS35_Trk1_TightID_eta2p1_Reg_v1 HLT_DoubleMediumChargedIsoPFTauHPS40_Trk1_eta2p1_Reg_v1 HLT_DoubleTightChargedIsoPFTauHPS40_Trk1_eta2p1_Reg_v1 HLT_DoubleMediumChargedIsoPFTauHPS40_Trk1_TightID_eta2p1_Reg_v1 HLT_DoubleTightChargedIsoPFTauHPS40_Trk1_TightID_eta2p1_Reg_v1 HLT_VBF_DoubleLooseChargedIsoPFTauHPS20_Trk1_eta2p1_v1 HLT_VBF_DoubleMediumChargedIsoPFTauHPS20_Trk1_eta2p1_v1 HLT_VBF_DoubleTightChargedIsoPFTauHPS20_Trk1_eta2p1_v1 
but is missing from
  AlCa_LumiPixelsCounts_Random_v1 
----- End Fatal Exception -------------------------------------------------

Therefore strict enforcement of this policy will likely break the IBs as the old algorithm was not catching all the cases.

So we need to know if this policy must actually be enforced and if so, who will clean up the existing problems.
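
The path consistency policy described in this comment can be sketched in Python as follows. This is only an illustration under assumptions: the function name `find_inconsistent_paths` and the dictionary layout are hypothetical, and the real check is C++ code in FWCore/Framework.

```python
def find_inconsistent_paths(paths, consumes):
    """Report consumer/producer pairs that violate the consistency policy.

    paths:    dict mapping a path name to its list of module labels.
    consumes: dict mapping a module label to the labels it reads data from.
    Returns a list of (consumer, producer, present_on, missing_from) tuples.
    """
    problems = []
    for consumer, producers in consumes.items():
        # All paths that contain the consuming module.
        consumer_paths = [n for n, mods in paths.items() if consumer in mods]
        for producer in producers:
            present = [n for n in consumer_paths if producer in paths[n]]
            missing = [n for n in consumer_paths if producer not in paths[n]]
            # A violation needs the producer on some, but not all, of them.
            if present and missing:
                problems.append((consumer, producer, present, missing))
    return problems
```

For the shape of failure shown in the exception above, `present` plays the role of the "which appears on paths" list and `missing` the role of the "but is missing from" list. A producer that shares no path with its consumer is not reported, since the policy as stated only applies once the two appear together on at least one path.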

fwyzard · 2021-08-05T06:29:36Z

Hi Chris,
that's a problem with the mkFit customisation, not with the HLT menu.

@mmasciov @slava77 @makortel may be able to help fix it.

In the meantime, the agreement when it was introduced was that HLT-related failures in that test should not block other PRs from being merged.

Martin-Grunewald · 2021-08-05T06:46:33Z

@fwyzard
Sorry, I do not get your point re mkfit problem in this context, could you please clarify?
I also do not see how module 'hltIter0PFlowCkfTrackCandidates' depends on 'hltIter0PFlowCkfTrackCandidatesMkFitEventOfHits' - the latter is not an InputTag parameter of the former...

fwyzard · 2021-08-05T06:59:03Z

Sure !

The error reported by Chris

Paths are non consistent
  module 'hltIter0PFlowCkfTrackCandidates' depends on 'hltIter0PFlowCkfTrackCandidatesMkFitEventOfHits' which appears on paths
  HLT_AK8PFJet360_TrimMass30_v18 HLT_AK8PFJet380_TrimMass30_v11 HLT_AK8PFJet400_TrimMass30_v12 ...[cut many, many, many paths]
but is missing from
  HLT_IsoTrackHB_v4 HLT_IsoTrackHE_v4

mentions the module hltIter0PFlowCkfTrackCandidatesMkFitEventOfHits, which is not part of the HLT menu itself, but is added by the mkFit customisation at RecoTracker/MkFit/python/customizeHLTIter0ToMkFit.py.

The .7 suffix is the workflow modifier used by runTheMatrix.py to switch on the use of mkFit.

fwyzard · 2021-08-05T07:03:58Z

See #33802 and the discussion around #33802 (comment).

By the way, I should correct myself: the agreement was that the mkFit customisation should not block any HLT-related work from being merged - for the framework changes, it's up to the Core Software group.

Martin-Grunewald · 2021-08-05T07:06:24Z

Thanks Andrea!

Sorry to say, but my first reaction is that this sucks: That customisation is alien to HLT, why is an HLT modification done by Reco? In view of integration into HLT 'some time in the future'? At this stage it may help Reco but is no good for HLT. Who is the proponent to push this into HLT? On what time scale?

Hmm, I would prefer if that modification of HLT would be removed altogether from IB and other 'official' tests, and kept private for now until it gets proposed and approved for integration into HLT.

fwyzard · 2021-08-05T07:12:48Z

> Sorry to say, but my first reaction is that this sucks: That customisation is alien to HLT, why is an HLT modification done by Reco?

Eh... I don't disagree.

> In view of integration into HLT 'some time in the future'? At this stage it may help Reco but is no good for HLT.
> Who is the proponent to push this into HLT? On what time scale?

I would say @mmasciov, @slava77, and @makortel, based on the previous presentations, discussion, and work on PRs.

However it's not clear (to me) if it will be useful for the HLT, at least on the timescale of the beginning of Run 3.

cmsbuild · 2021-08-05T07:29:42Z

-1

Failed Tests: RelVals RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-76ce15/17549/summary.html
COMMIT: 6bf59e3
CMSSW: CMSSW_12_1_X_2021-08-04-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/34735/17549/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

1000.0

----- Begin Fatal Exception 04-Aug-2021 23:15:24 CEST-----------------------
An exception of category 'ScheduleExecutionFailure' occurred while
   [0] Calling beginJob
Exception Message:
Unrunnable schedule
Paths are non consistent
  module 'ALCARECOHcalCalPhisymDQM' depends on 'hbherecoNoise' which appears on paths
  pathALCARECOHcalCalMinBias 
but is missing from
  pathALCARECOHcalCalIterativePhiSym 
----- End Fatal Exception -------------------------------------------------

RelVals-INPUT

1000.0 1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step2_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log
11634.7 11634.7_TTbar_14TeV+2021_trackingMkFit+TTbar_14TeV_TuneCP5_GenSimINPUT+Digi+Reco+HARVEST/step2_TTbar_14TeV+2021_trackingMkFit+TTbar_14TeV_TuneCP5_GenSimINPUT+Digi+Reco+HARVEST.log

Dr15Jones · 2021-08-05T18:16:22Z

please test with #34793, #34784

cmsbuild · 2021-08-05T21:27:54Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-76ce15/17580/summary.html
COMMIT: 6bf59e3
CMSSW: CMSSW_12_1_X_2021-08-05-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/34735/17580/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 4 differences found in the comparisons
DQMHistoTests: Total files compared: 39
DQMHistoTests: Total histograms compared: 2999410
DQMHistoTests: Total failures: 10
DQMHistoTests: Total nulls: 1
DQMHistoTests: Total successes: 2999377
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 173.577 KiB( 38 files compared)
DQMHistoSizes: changed ( 1000.0 ): 24.607 KiB ALCAStreamHcalIterativePhiSym/MBdepth1
DQMHistoSizes: changed ( 1000.0 ): 24.607 KiB ALCAStreamHcalIterativePhiSym/MBdepth2
DQMHistoSizes: changed ( 1000.0 ): 24.607 KiB ALCAStreamHcalIterativePhiSym/MBdepth3
DQMHistoSizes: changed ( 1000.0 ): 24.607 KiB ALCAStreamHcalIterativePhiSym/MBdepth4
DQMHistoSizes: changed ( 1000.0 ): 24.607 KiB ALCAStreamHcalIterativePhiSym/MBdepth5
DQMHistoSizes: changed ( 1000.0 ): 24.607 KiB ALCAStreamHcalIterativePhiSym/MBdepth6
DQMHistoSizes: changed ( 1000.0 ): 24.607 KiB ALCAStreamHcalIterativePhiSym/MBdepth7
DQMHistoSizes: changed ( 1000.0 ): 0.440 KiB ALCAStreamHcalIterativePhiSym/DistrHBHEsize
DQMHistoSizes: changed ( 1000.0 ): 0.438 KiB ALCAStreamHcalIterativePhiSym/DistrHFsize
DQMHistoSizes: changed ( 1000.0 ): 0.438 KiB ALCAStreamHcalIterativePhiSym/DistrHOsize
DQMHistoSizes: changed ( 1000.0 ): ...
Checked 165 log files, 37 edm output root files, 39 DQM output files
TriggerResults: no differences found

Dr15Jones · 2021-08-06T12:41:58Z

+1
requires #34793 and #34784 in order to avoid failures in the RelVals.

cmsbuild · 2021-08-06T12:42:25Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy, @perrotta (and backports should be raised in the release meeting by the corresponding L2)

perrotta · 2021-08-10T08:40:19Z

+1

Both Add additional paths to customizeHLTIter0ToMkFit #34793 and Run3-alca193 Add DQM for HcalIterativePhiSymAlCaReco #34784 are now merged

Dr15Jones added 2 commits August 2, 2021 13:44

Rewrote checkForModuleDependencyCorrectness

baa4d48

The IBs showed that the old algorithm, which used the Boost Graph Library, could hit pathological cases and take more than 10 minutes to run. The new algorithm simulates how the framework would run the modules and checks whether a deadlock would occur.

Removed function edm::throwIfImproperDependencies

c088008

The function is no longer needed as the dependency checks are now done using a different algorithm.

cmsbuild added this to the CMSSW_12_1_X milestone Aug 2, 2021

cmsbuild added code-checks-pending core-pending orp-pending pending-signatures tests-pending labels Aug 2, 2021

cmsbuild added code-checks-rejected and removed code-checks-pending labels Aug 2, 2021

format code

ef5a480

cmsbuild added code-checks-pending and removed code-checks-rejected labels Aug 2, 2021

cmsbuild added code-checks-approved and removed code-checks-pending labels Aug 2, 2021

Ignore duplicate modules later on a Path/EndPath

c56d99e

cmsbuild added code-checks-pending and removed code-checks-approved labels Aug 2, 2021

cmsbuild added code-checks-approved and removed code-checks-pending labels Aug 2, 2021

cmsbuild added tests-started and removed tests-pending labels Aug 2, 2021

cmsbuild added tests-rejected and removed tests-started labels Aug 5, 2021

Dr15Jones mentioned this pull request Aug 5, 2021

Add additional paths to customizeHLTIter0ToMkFit #34793

Merged

cmsbuild added tests-started and removed tests-rejected labels Aug 5, 2021

cmsbuild added tests-approved and removed tests-started labels Aug 5, 2021

cmsbuild added core-approved fully-signed and removed core-pending pending-signatures labels Aug 6, 2021

cmsbuild added orp-approved and removed orp-pending labels Aug 10, 2021

cmsbuild merged commit b6555c2 into cms-sw:master Aug 10, 2021

qliphy mentioned this pull request Aug 11, 2021

Slowness in cmsRun starts #34633

Closed

Dr15Jones deleted the improveUnrunnableScheduledFinder branch August 17, 2021 15:20

missirol mentioned this pull request Jul 22, 2022

HLT menu development for 12_4_X (5/N): HLT V1.3 [12_5_X] #38816

Merged

missirol mentioned this pull request May 1, 2023

remove 2022 HLT menu from CMSSW_13_X_Y #41471

Merged
