[GPU] Workflow failures when running the alpaka customization in presence of a `Fake` menu #44119

mmusich · 2024-02-27T10:46:50Z

Several workflows, {12434,12450}.{402,403,404,412}, fail in the GPU IB tests in CMSSW_14_1_GPU_X_2024-02-26-2300 with the following error:

DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2023,ENDJOB
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
with DB:
entry filelist:step1_dasquery.log
found files:  ['/store/relval/CMSSW_13_0_10/RelValTTbar_14TeV/GEN-SIM/130X_mcRun3_2023_realistic_withEarly2023BS_v1_2023-v1/2590000/4a9c4099-1812-4afd-9c94-6f9409595929.root', '/store/relval/CMSSW_13_0_10/RelValTTbar_14TeV/GEN-SIM/130X_mcRun3_2023_realistic_withEarly2023BS_v1_2023-v1/2590000/99db1b20-ec34-4bff-84df-dfffcbdfb184.root', '/store/relval/CMSSW_13_0_10/RelValTTbar_14TeV/GEN-SIM/130X_mcRun3_2023_realistic_withEarly2023BS_v1_2023-v1/2590000/c388e800-ddaa-408d-a2ec-b40a9b8c7a08.root']
Step: DIGI Spec: ['pdigi_valid']
Step: L1 Spec: 
Step: DIGI2RAW Spec: 
Step: HLT Spec: ['@relval2023']
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/bin/el8_amd64_gcc12/cmsDriver.py", line 40, in <module>
    run()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/bin/el8_amd64_gcc12/cmsDriver.py", line 16, in run
    configBuilder.prepare()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 2310, in prepare
    self.addStandardSequences()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 850, in addStandardSequences
    getattr(self,"prepare_"+stepName)(stepSpec = '+'.join(stepSpec))
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 1670, in prepare_HLT
    self.loadAndRemember('HLTrigger/Configuration/HLT_%s_cff' % stepSpec)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 376, in loadAndRemember
    self.process.load(includeFile)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/FWCore/ParameterSet/python/Config.py", line 761, in load
    module = __import__(moduleName)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/HLTrigger/Configuration/python/HLT_Fake2_cff.py", line 237, in <module>
    fragment = customizeHLTforCMSSW(fragment,"Fake2")
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/HLTrigger/Configuration/python/customizeHLTforCMSSW.py", line 262, in customizeHLTforCMSSW
    (alpaka & run3_common).makeProcessModifier(customizeHLTforAlpaka).apply(process)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/FWCore/ParameterSet/python/Config.py", line 1980, in apply
    self.__func(process)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/HLTrigger/Configuration/python/customizeHLTforAlpaka.py", line 917, in customizeHLTforAlpaka
    process = customizeHLTforAlpakaEcalLocalReco(process)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/HLTrigger/Configuration/python/customizeHLTforAlpaka.py", line 908, in customizeHLTforAlpakaEcalLocalReco
    process.HLTDoFullUnpackingEgammaEcalTask = cms.ConditionalTask(process.HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask, process.HLTPreshowerTask)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02826/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/FWCore/ParameterSet/python/Config.py", line 1656, in __getattribute__
    return getattr(self.__process, name)
AttributeError: 'Process' object has no attribute 'HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask'

This likely comes from the integration of #44026, which moved @relval2023 to @Fake2.


mmusich · 2024-02-27T10:47:02Z

assign hlt, heterogeneous

cmsbuild · 2024-02-27T10:47:06Z

New categories assigned: hlt,heterogeneous

@Martin-Grunewald,@mmusich,@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild · 2024-02-27T10:47:08Z

cms-bot internal usage

cmsbuild · 2024-02-27T10:47:08Z

A new Issue was created by @mmusich.

@smuzaffar, @antoniovilela, @Dr15Jones, @makortel, @rappoccio, @sextonkennedy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

mmusich · 2024-02-27T10:47:19Z

@thomreis FYI

Martin-Grunewald · 2024-02-27T10:52:39Z

The customisation should check whether HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask actually exists, before messing with it.
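
A minimal sketch of the failure mode and of the suggested check, assuming a plain `SimpleNamespace` as a stand-in for a `cms.Process` loaded from a Fake menu (the stand-in and the function names `unguarded` and `guarded` are illustrative only, not CMSSW code):

```python
from types import SimpleNamespace

# Stand-in for a process built from a Fake menu: it carries no ECAL tasks.
# (SimpleNamespace is an assumption for illustration, not the CMSSW class.)
fake_process = SimpleNamespace()

NEEDED = ("HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask", "HLTPreshowerTask")

def unguarded(process):
    # Mirrors the failing line: reads attributes that a Fake menu lacks.
    process.HLTDoFullUnpackingEgammaEcalTask = (
        process.HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask,
        process.HLTPreshowerTask,
    )
    return process

def guarded(process):
    # Touch the tasks only when both exist; otherwise leave the process as is.
    if all(hasattr(process, name) for name in NEEDED):
        process.HLTDoFullUnpackingEgammaEcalTask = tuple(
            getattr(process, name) for name in NEEDED
        )
    return process

try:
    unguarded(fake_process)
    outcome = "ok"
except AttributeError:
    outcome = "AttributeError"

result = guarded(fake_process)
```

With the stand-in above, the unguarded version raises on the first missing attribute, while the guarded one returns the process unchanged.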

thomreis · 2024-02-27T11:09:02Z

What menu was used for this?

mmusich · 2024-02-27T11:10:10Z

> What menu was used for this?

The Fake one; see above. In any case it does not matter: please provide a fix, since the customization needs to run irrespective of the menu.

Martin-Grunewald · 2024-02-27T11:12:02Z

Hmm, alternatively, it may be best to remove alpaka from these (failing) 2023 (HLT) workflows (as those are now using the Fake menus). Testing alpaka on Fake HLT menus does not make much sense!

mmusich · 2024-02-27T11:14:11Z

> Hmm, alternatively, it may be best to remove alpaka from these (failing) 2023 (HLT) workflows (as those are now using the Fake menus). Testing alpaka on Fake HLT menus does not make much sense!

This is what PR #44075 is going to do. On the other hand, the customization should not break under any circumstances, IMHO.

mmusich · 2024-02-27T11:49:50Z

> On the other hand the customization should not break in any circumstance IMHO.

To achieve that, though, all the other customization pieces would need to comply as well. Perhaps it is better to remove all years that use the Fake menu from the alpaka customization.

mmusich · 2024-02-27T11:50:01Z

assign pdmv

see [GPU] Workflow failures when running the alpaka customization in presence of a Fake menu #44119 (comment)

cmsbuild · 2024-02-27T11:50:06Z

New categories assigned: pdmv

@AdrianoDee,@sunilUIET,@miquork you have been requested to review this Pull request/Issue and eventually sign? Thanks

thomreis · 2024-02-27T12:30:43Z

Would adding a condition to this line fix this?

if hasattr(process, 'HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask') and hasattr(process, 'HLTPreshowerTask'):
    process.HLTDoFullUnpackingEgammaEcalTask = cms.ConditionalTask(process.HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask, process.HLTPreshowerTask)

Martin-Grunewald · 2024-02-27T12:36:50Z

This error, yes, I think so.

mmusich · 2024-02-27T12:41:01Z

> Would adding a condition to this line fix this?

It does, but then it fails with:

DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2023,ENDJOB
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
with DB:
entry file:step1.root
Step: DIGI Spec: ['pdigi_valid']
Step: L1 Spec: 
Step: DIGI2RAW Spec: 
Step: HLT Spec: ['@relval2023']
Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/bin/el8_amd64_gcc12/cmsDriver.py", line 40, in <module>
    run()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/bin/el8_amd64_gcc12/cmsDriver.py", line 16, in run
    configBuilder.prepare()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 2310, in prepare
    self.addStandardSequences()
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 850, in addStandardSequences
    getattr(self,"prepare_"+stepName)(stepSpec = '+'.join(stepSpec))
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 1670, in prepare_HLT
    self.loadAndRemember('HLTrigger/Configuration/HLT_%s_cff' % stepSpec)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/Configuration/Applications/python/ConfigBuilder.py", line 376, in loadAndRemember
    self.process.load(includeFile)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/FWCore/ParameterSet/python/Config.py", line 761, in load
    module = __import__(moduleName)
  File "/tmp/musich/CMSSW_14_1_GPU_X_2024-02-26-2300/src/HLTrigger/Configuration/python/HLT_Fake2_cff.py", line 237, in <module>
    fragment = customizeHLTforCMSSW(fragment,"Fake2")
  File "/tmp/musich/CMSSW_14_1_GPU_X_2024-02-26-2300/src/HLTrigger/Configuration/python/customizeHLTforCMSSW.py", line 262, in customizeHLTforCMSSW
    (alpaka & run3_common).makeProcessModifier(customizeHLTforAlpaka).apply(process)
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/FWCore/ParameterSet/python/Config.py", line 1980, in apply
    self.__func(process)
  File "/tmp/musich/CMSSW_14_1_GPU_X_2024-02-26-2300/src/HLTrigger/Configuration/python/customizeHLTforAlpaka.py", line 919, in customizeHLTforAlpaka
    process = customizeHLTforAlpakaPixelReco(process)
  File "/tmp/musich/CMSSW_14_1_GPU_X_2024-02-26-2300/src/HLTrigger/Configuration/python/customizeHLTforAlpaka.py", line 809, in customizeHLTforAlpakaPixelReco
    process = customizeHLTforAlpakaPixelRecoVertexing(process)
  File "/tmp/musich/CMSSW_14_1_GPU_X_2024-02-26-2300/src/HLTrigger/Configuration/python/customizeHLTforAlpaka.py", line 732, in customizeHLTforAlpakaPixelRecoVertexing
    process.hltTrimmedPixelVertices 
  File "/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_GPU_X_2024-02-26-2300/src/FWCore/ParameterSet/python/Config.py", line 1656, in __getattribute__
    return getattr(self.__process, name)
AttributeError: 'Process' object has no attribute 'hltTrimmedPixelVertices'

thomreis · 2024-02-27T12:42:33Z

But that is not an issue of the ECAL customisation anymore. It looks like Pixel in this case.

Martin-Grunewald · 2024-02-27T12:42:59Z

It looks like there are more instances where parts of the alpaka customisation fail on Fake* menus.
#44075 (#44076 bp) would fix it from the workflow use-case side?!
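
One way to see all such instances at once, rather than one traceback per fix, is to run each sub-customisation on its own fresh process and collect the `AttributeError` messages. The sketch below is a diagnostic aid only; `ecal_step`, `pixel_step`, `probe` and the `SimpleNamespace` stand-in are hypothetical and not part of CMSSW:

```python
from types import SimpleNamespace

def ecal_step(process):
    # Hypothetical step that needs an ECAL task from the menu.
    process.ecal = process.HLTPreshowerTask
    return process

def pixel_step(process):
    # Hypothetical step that needs a pixel vertex module from the menu.
    process.pixel = process.hltTrimmedPixelVertices
    return process

def probe(process_factory, steps):
    """Run every step on a fresh process and record any AttributeError message."""
    failures = {}
    for step in steps:
        try:
            # A fresh process per step avoids carrying over half-applied changes.
            step(process_factory())
        except AttributeError as err:
            failures[step.__name__] = str(err)
    return failures

# An empty namespace plays the role of a Fake menu with none of the modules.
failures = probe(SimpleNamespace, [ecal_step, pixel_step])
for name, message in failures.items():
    print(f"{name}: {message}")
```

Each step only reports the first attribute it misses, so a step may need to be probed again after it has been guarded.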

mmusich · 2024-02-27T12:43:49Z

> But that is not an issue of the ECAL customisation anymore. Looks like Pixel in this case.

Right, but it does not solve the issue.

thomreis · 2024-02-27T12:45:35Z

> right, but it does not solve the issue.

Well it would solve this issue. But there seem to be others.

Martin-Grunewald · 2024-02-27T12:46:34Z

I guess it is faster to get the PRs in, rather than making alpaka customisations failsafe - given that the alpaka customisation will be folded into the ConfDb menus within a couple of weeks?
Or are there issues not fixed by the two PRs?

mmusich · 2024-02-27T12:46:48Z

> Well it would solve this issue. But there seem to be others.

I edited the issue title to be more inclusive, so no, unfortunately it's not an adequate fix.

mmusich · 2024-02-27T12:48:14Z

> Or are there issues not fixed by the two PRs?

Getting the PR in will probably remove the failures from the IB tests, but the workflows will remain broken, IIUC.

mmusich · 2024-02-27T12:57:17Z

diff --git a/HLTrigger/Configuration/python/customizeHLTforAlpaka.py b/HLTrigger/Configuration/python/customizeHLTforAlpaka.py
index d1ca276fb3e..a9bdb2feae0 100644
--- a/HLTrigger/Configuration/python/customizeHLTforAlpaka.py
+++ b/HLTrigger/Configuration/python/customizeHLTforAlpaka.py
@@ -190,6 +190,10 @@ def customizeHLTforAlpakaParticleFlowClustering(process):
             pfRecHits = cms.InputTag("hltPFRecHitSoAProducerHCALCPUSerial"),
             )
 
+    ## failsafe for fake menus
+    if(not hasattr(process,'hltParticleFlowClusterHBHE')):
+        return process
+
     process.hltLegacyPFClusterProducer = cms.EDProducer("LegacyPFClusterProducer",
             src = cms.InputTag("hltPFClusterSoAProducer"),
             pfClusterParams = cms.ESInputTag("pfClusterParamsESProducer:"),
@@ -725,6 +729,10 @@ def customizeHLTforAlpakaPixelRecoVertexing(process):
         src = cms.InputTag("hltPixelVerticesCPUSerial")
     )
 
+    ## failsafe for fake menus
+    if(not hasattr(process,'hltTrimmedPixelVertices')):
+        return process
+
     process.HLTRecopixelvertexingTask = cms.ConditionalTask(
         process.HLTRecoPixelTracksTask,
         process.hltPixelVerticesSoA,
@@ -905,7 +913,9 @@ def customizeHLTforAlpakaEcalLocalReco(process):
         if hasattr(process, 'hltEcalUncalibRecHitSoA'):
             delattr(process, 'hltEcalUncalibRecHitSoA')
 
-    process.HLTDoFullUnpackingEgammaEcalTask = cms.ConditionalTask(process.HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask, process.HLTPreshowerTask)
+        ## failsafe for fake menus
+        if hasattr(process, 'HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask') and hasattr(process, 'HLTPreshowerTask'):
+            process.HLTDoFullUnpackingEgammaEcalTask = cms.ConditionalTask(process.HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask, process.HLTPreshowerTask)
 
     return process

this seems to be enough to avoid runtime failures.
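
The same failsafe pattern could also be factored into a small decorator, sketched here under the assumption that a step may be skipped entirely when its inputs are missing. This differs from the diff above, where the early `return` sits in the middle of each function, after some modules have already been added. The names `requires` and `customize_vertexing`, and the `SimpleNamespace` stand-in, are hypothetical:

```python
import functools
from types import SimpleNamespace

def requires(*names):
    """Skip the wrapped customisation step when any of `names` is missing."""
    def decorator(step):
        @functools.wraps(step)
        def wrapper(process):
            if not all(hasattr(process, name) for name in names):
                return process  # failsafe: leave the menu untouched
            return step(process)
        return wrapper
    return decorator

@requires("hltTrimmedPixelVertices")
def customize_vertexing(process):
    # Hypothetical body; the real function rebuilds the vertexing task.
    process.vertexing_customised = True
    return process

# Stand-ins for a Fake menu (no modules) and a full menu (module present).
fake_menu = SimpleNamespace()
full_menu = SimpleNamespace(hltTrimmedPixelVertices=object())

customize_vertexing(fake_menu)
customize_vertexing(full_menu)
```

Keeping the required names next to the function signature makes it easier to see which menu content each step depends on.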

AdrianoDee · 2024-02-27T13:04:53Z

I don't think #44075 will fix this in the IBs, since I didn't remove the 2023 wfs but added 2024 ones (if I understood the issue here correctly). As an alternative to the solution here by @mmusich, one could inhibit the *FakeHLT steps for the Alpaka wfs.

mmusich · 2024-02-27T13:10:57Z

> one could inhibit the *FakeHLT steps for the Alpaka wfs.

This assumes that we are (correctly) running the FakeHLT RECO+DQM sequence in the workflows that run a Fake HLT menu, but this is in general neither guaranteed nor enforced (even though we've been trying to be diligent about it). On the other hand, since the whole customization will be reabsorbed soon, I guess it's an academic discussion.
I would open a PR now with #44119 (comment) to get rid of the failures for the next few weeks and be done with it.

AdrianoDee · 2024-02-27T13:12:16Z

> Alternatively to the solution here by @mmusich one could inhibit the *FakeHLT steps for the Alpaka wfs.

OK, on second thought this could overcomplicate things. I would protect the customizer with the failsafes.

AdrianoDee · 2024-02-27T13:13:46Z

> this assumes that we are (correctly) running the FakeHLT RECO+DQM sequence in the workflows that run a Fake HLT menu, but this is not in general guaranteed nor enforced (even though we've been trying to be diligent with it). On the other hand since all the customization thing will get reabsorbed soon, I guess it's an academic discussion.

Agreed, you just preceded me.

makortel · 2024-02-29T14:51:45Z

+heterogeneous

mmusich · 2024-02-29T14:52:43Z

+hlt

for the record

cmsbuild added hlt-pending pending-signatures heterogeneous-pending labels Feb 27, 2024

cmsbuild added the pdmv-pending label Feb 27, 2024

mmusich changed the title ~~[GPU] Workflow failures from missing HLTDoFullUnpackingEgammaEcalWithoutPreshowerTask~~ [GPU] Workflow failures when running the alpaka customization in presence of a Fake menu Feb 27, 2024

mmusich mentioned this issue Feb 27, 2024

add failsafes for protecting the alpaka customization against Fake HLT menus #44221

Merged

cmsbuild closed this as completed in #44221 Feb 29, 2024

cmsbuild added heterogeneous-approved and removed heterogeneous-pending labels Feb 29, 2024

cmsbuild added hlt-approved and removed hlt-pending labels Feb 29, 2024
