ECAL DQM - Add WF .513 for ECAL GPU vs. CPU validation #37123

Merged: 5 commits merged into cms-sw:master from thomreis:ecal-gpu-reco-dqm-wf on Mar 9, 2022


@thomreis (Contributor) commented Mar 2, 2022

PR description:

This PR adds a new type of workflow (WF), with the suffix .513 (originally .515), for the ECAL GPU vs. CPU validation. It also addresses a configuration issue raised during the review of #36742.
Both changes are tracked in #37025.

The suffix .515 was originally chosen because .513 was thought to be reserved for a GPU WF with CPU fallback. That WF already exists as .512, so the suffix was changed to .513 during the review.

PR validation:

Workflow 11634.513 runs successfully and produces the validation histograms.

@cmsbuild (Contributor) commented Mar 2, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37123/28647

@cmsbuild (Contributor) commented Mar 2, 2022

A new Pull Request was created by @thomreis (Thomas Reis) for master.

It involves the following packages:

  • Configuration/PyReleaseValidation (pdmv, upgrade)
  • DQM/EcalMonitorTasks (dqm)

@jordan-martins, @bbilin, @wajidalikhan, @emanueleusai, @ahmad3213, @cmsbuild, @AdrianoDee, @srimanob, @jfernan2, @kskovpen, @pmandrik, @pbo0, @rvenditti can you please review it and eventually sign? Thanks.
@makortel, @kpedro88, @argiro, @Martin-Grunewald, @missirol, @rchatter, @thomreis, @simonepigazzini, @fabiocos, @slomeo this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@thomreis (Contributor, Author) commented Mar 2, 2022

Addresses the ECAL part of #37075, albeit with a different suffix. The suffix could be changed if desired.

@thomreis (Contributor, Author) commented Mar 2, 2022

@alejands @fwyzard FYI

@srimanob (Contributor) commented Mar 3, 2022

test parameters:

  • workflow = 11634.515
  • enable_test = gpu
  • relvals_opt = --what upgrade,standard,highstats,pileup,generator,extendedgen,production,ged,machine,premix

@srimanob (Contributor) commented Mar 3, 2022

@srimanob (Contributor) commented Mar 3, 2022

@cmsbuild please test

@fwyzard (Contributor) commented Mar 3, 2022 via email

@cmsbuild (Contributor) commented Mar 3, 2022

Pull request has been put on hold by @fwyzard
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

@cmsbuild added the hold label Mar 3, 2022
@fwyzard (Contributor) commented Mar 3, 2022

A GPU workflow with fallback to CPU already exists: it is the .512 one. Can you use .513 here?

@cmsbuild (Contributor) commented Mar 3, 2022

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a1065c/22793/summary.html
COMMIT: adbbc2c
CMSSW: CMSSW_12_3_X_2022-03-02-1100/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37123/22793/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 03-Mar-2022 04:44:37 CET-----------------------
An exception of category 'BadAlloc' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'dqmoffline_step'
   [2] Prefetching for module EcalDQMonitorTask/'ecalMonitorTaskEcalOnly'
   [3] Prefetching for module EcalCPUDigisProducer/'ecalDigis@cuda'
   [4] Prefetching for module EcalRawToDigiGPU/'ecalDigisGPU'
   [5] Calling method for EventSetup module EcalElectronicsMappingGPUESProducer/'ecalElectronicsMappingGPUESProducer'
Exception Message:
A std::bad_alloc exception was thrown.
The job has probably exhausted the virtual memory available to the process.
----- End Fatal Exception -------------------------------------------------

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19811
  • DQMHistoTests: Total failures: 2158
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 17653
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

@cmsbuild (Contributor) commented Mar 4, 2022

Pull request #37123 was updated. @jordan-martins, @makortel, @bbilin, @wajidalikhan, @emanueleusai, @ahmad3213, @AdrianoDee, @srimanob, @jfernan2, @kskovpen, @fwyzard, @pmandrik, @pbo0, @rvenditti can you please check and sign again.

@cmsbuild (Contributor) commented Mar 4, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a1065c/22839/summary.html
COMMIT: f9271b0
CMSSW: CMSSW_12_3_X_2022-03-04-0800/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/37123/22839/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-a1065c/11634.513_TTbar_14TeV+2021_Patatrack_ECALOnlyGPU_Validation+TTbar_14TeV_TuneCP5_GenSim+Digi+RecoNano+HARVESTNano

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19811
  • DQMHistoTests: Total failures: 1579
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 18232
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 1 / 3 workflows
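The DQMHistoTests counts in the two GPU comparison summaries are self-consistent: failures + nulls + successes + skipped equals the total number of histograms compared. Between the first test (commit adbbc2c) and the updated one (commit f9271b0), the fraction of histograms flagged by the comparison drops. The following minimal Python check uses only the numbers reported by the bot. The dictionary layout and the helper names (`is_consistent`, `failure_fraction`) are hypothetical and chosen for illustration; they are not part of cms-bot.

```python
# Hypothetical bookkeeping check of the DQMHistoTests counts reported in the two
# GPU comparison summaries of this PR. Numbers are copied from the bot output;
# the data layout and helper names are illustrative only.
GPU_SUMMARIES = {
    "adbbc2c": {"compared": 19811, "failures": 2158, "nulls": 0, "successes": 17653, "skipped": 0},
    "f9271b0": {"compared": 19811, "failures": 1579, "nulls": 0, "successes": 18232, "skipped": 0},
}


def is_consistent(summary: dict) -> bool:
    """Return True if failures + nulls + successes + skipped equals the total compared."""
    parts = summary["failures"] + summary["nulls"] + summary["successes"] + summary["skipped"]
    return parts == summary["compared"]


def failure_fraction(summary: dict) -> float:
    """Fraction of compared histograms that the comparison flagged as failures."""
    return summary["failures"] / summary["compared"]


if __name__ == "__main__":
    for commit, summary in GPU_SUMMARIES.items():
        print(f"{commit}: consistent={is_consistent(summary)}, "
              f"failure fraction={failure_fraction(summary):.1%}")
```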

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3987741
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3987711
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 204 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@jfernan2 (Contributor) commented Mar 4, 2022

+1

@fwyzard (Contributor) commented Mar 4, 2022

+1

@AdrianoDee (Contributor)

+upgrade

@thomreis (Contributor, Author) commented Mar 9, 2022

@cms-sw/pdmv-l2 do you have any comments?

@kskovpen (Contributor) commented Mar 9, 2022

+pdmv

@cmsbuild (Contributor) commented Mar 9, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta (Contributor) commented Mar 9, 2022

+1

@cmsbuild merged commit c0513d4 into cms-sw:master Mar 9, 2022
@thomreis deleted the ecal-gpu-reco-dqm-wf branch March 9, 2022 17:39
8 participants