Enable .59521 and .59621 wfs for profiling #42061

Merged

srimanob
Contributor

PR description:

This PR is a follow-up to https://indico.cern.ch/event/1297976/#17-update-on-gpu-profiling.
It enables the .59521 and .59621 workflows for profiling.

PR validation:

Test with 11634.59521, all steps run.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

No need of backport.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42061/36041

@cmsbuild
Contributor

A new Pull Request was created by @srimanob (Phat Srimanobhas) for master.

It involves the following packages:

  • Configuration/PyReleaseValidation (pdmv, upgrade)

@bbilin, @cmsbuild, @AdrianoDee, @srimanob, @kskovpen, @sunilUIET can you please review it and eventually sign? Thanks.
@makortel, @kpedro88, @Martin-Grunewald, @missirol, @fabiocos, @slomeo this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@srimanob
Contributor Author

test parameters:

  • workflows = 11834.59521
  • relvals_opt = --what cleanedupgrade,standard,highstats,pileup,generator,extendedgen,production,identity,ged,machine,premix,nano,gpu,2017,2026

@srimanob
Contributor Author

@cmsbuild please test

@srimanob
Contributor Author

FYI @gartung @fwyzard

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-08ab3c/33346/summary.html
COMMIT: 2706dee
CMSSW: CMSSW_13_2_X_2023-06-22-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/42061/33346/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 14 lines to the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3200270
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3200242
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found
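As a quick sanity check on the DQMHistoTests figures above, the per-category counts can be added up and compared with the total. This is only a sketch: it assumes the five categories are disjoint and together cover every compared histogram, which the summary itself does not state.

```python
# Figures copied from the comparison summary; the dict keys are illustrative names.
summary = {
    "compared": 3200270,
    "failures": 6,
    "nulls": 0,
    "successes": 3200242,
    "skipped": 22,
    "missing": 0,
}

# Assumption: every compared histogram falls into exactly one category.
accounted = sum(
    summary[key] for key in ("failures", "nulls", "successes", "skipped", "missing")
)
print(accounted == summary["compared"])
```

Under that assumption the counts are consistent with the reported total.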

@srimanob
Contributor Author

+Upgrade

The new 11834.59521 runs fine.

@fwyzard
Contributor

fwyzard commented Jun 26, 2023

> FYI @gartung @fwyzard

Thanks for adding these workflows.

@srimanob
Contributor Author

Kindly ping @cms-sw/pdmv-l2

@sunilUIET
Contributor

+pdmv

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1

@cmsbuild cmsbuild merged commit 0f56b14 into cms-sw:master Jun 27, 2023
@gartung
Member

gartung commented Jul 5, 2023

@srimanob Should this work with 23834 workflows as well?

@srimanob
Contributor Author

srimanob commented Jul 5, 2023

> @srimanob Should this work with 23834 workflows as well?

Hi @gartung
We need #41011 to extend Patatrack to the newer geometry. I will push for that.

@gartung
Member

gartung commented Jul 11, 2023

@srimanob No gpu kernels were run in the 59621 workflow. Previously I had added
process.options.accelerators = ['gpu-nvidia'], but I don't see this in the generated config files.
The nsys log file does not show any kernels being run.
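One way to check a generated config for the explicit setting mentioned above is a plain text scan. The sketch below is not part of CMSSW; `find_accelerators` is a hypothetical helper, and it assumes the setting appears as a direct assignment to `process.options.accelerators` (a config that sets the option in another form would not be matched).

```python
import re

# Hypothetical helper, not a CMSSW tool: scans config text for a direct assignment.
ACCEL_RE = re.compile(
    r"^\s*process\.options\.accelerators\s*=\s*(\[.*?\])", re.MULTILINE
)

def find_accelerators(config_text):
    """Return the entries assigned to process.options.accelerators, or None if absent."""
    match = ACCEL_RE.search(config_text)
    if match is None:
        return None
    # Pull the quoted entries out of the list literal.
    return re.findall(r"['\"]([^'\"]+)['\"]", match.group(1))

# Two made-up config fragments for illustration.
with_setting = "process.options.accelerators = ['gpu-nvidia']\n"
without_setting = "process.options.numberOfThreads = 8\n"

print(find_accelerators(with_setting))
print(find_accelerators(without_setting))
```

A `None` result only means the direct assignment is missing from the text; it does not by itself show whether GPU kernels ran.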

@srimanob
Contributor Author

@fwyzard @AdrianoDee @makortel
Do you have any clues?

My understanding is that .59621 is a clone of .596, which should use the GPU if one is available, and that the GPU procModifier is enabled.
https://github.com/cms-sw/cmssw/blob/master/Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py#L1422-L1462

'--accelerators': 'gpu-nvidia' forces the job to use the GPU, and we do not expect to need it here. An example is workflow .597, where we run on both CPU and GPU and then compare the results.

Thx.
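The difference between the two setups discussed above can be sketched with plain dictionaries of step options. This is an illustration only: `render_options` is a hypothetical helper, the option values are examples, and the real workflow definitions live in upgradeWorkflowComponents.py.

```python
def render_options(base, overrides):
    """Merge two option dicts and render them as a flat, sorted argument list."""
    merged = dict(base)
    merged.update(overrides)
    args = []
    for flag in sorted(merged):
        args.extend([flag, merged[flag]])
    return args

# Illustrative step options: only the modifier is set, so the GPU is used if available.
opportunistic = {"--procModifiers": "gpu"}

# Adding the explicit accelerator requirement makes the GPU mandatory for the job.
forced = render_options(opportunistic, {"--accelerators": "gpu-nvidia"})

print(render_options(opportunistic, {}))
print(forced)
```

With the explicit accelerator, a machine without a matching GPU is expected to fail at startup rather than fall back to the CPU.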

@srimanob
Contributor Author

Just a note: I don't understand the log. Doesn't it mean that your system has no matching GPU to use?

An exception of category 'UnavailableAccelerator' occurred while
   [0] Constructing the EventProcessor
Exception Message:
The system has no compute accelerators that match the patterns specified in process.options.accelerators:
 gpu-nvidia

The following compute accelerators are available:
 cpu
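The message above describes matching requested patterns against the accelerators found on the machine. A minimal sketch of that idea follows. It is not the CMSSW implementation: `select_accelerators` and the shell-style matching via `fnmatch` are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

class UnavailableAccelerator(RuntimeError):
    """Illustrative stand-in for the exception category shown in the log."""

def select_accelerators(patterns, available):
    """Return the available accelerators that match at least one requested pattern."""
    selected = [
        name for name in available
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]
    if not selected:
        raise UnavailableAccelerator(
            "no accelerators match %s; available: %s" % (patterns, available)
        )
    return selected

# A wildcard request is satisfied by a CPU-only machine.
print(select_accelerators(["*"], ["cpu"]))

# An explicit GPU request on a CPU-only machine fails, as in the log.
try:
    select_accelerators(["gpu-nvidia"], ["cpu"])
except UnavailableAccelerator as exc:
    print("UnavailableAccelerator:", exc)
```

In this picture the exception says that no NVIDIA GPU was visible to the job at that moment, not that the configuration itself was invalid.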

@gartung
Member

gartung commented Jul 11, 2023

It could be an unrecognized nvidia gpu on cms-oc-gpu-01. It has two RTX 2060's.

----- Begin Fatal Exception 11-Jul-2023 23:43:18 CEST-----------------------
An exception of category 'UnavailableAccelerator' occurred while
   [0] Constructing the EventProcessor
Exception Message:
The system has no compute accelerators that match the patterns specified in process.options.accelerators:
 gpu-nvidia

The following compute accelerators are available:
 cpu

@gartung
Member

gartung commented Jul 11, 2023

Oh. Wait... The logs got overwritten when I put the explicit modifier back.

@gartung
Member

gartung commented Jul 11, 2023

I will re-run the job that generates the log.

@fwyzard
Contributor

fwyzard commented Jul 11, 2023

CMSSW seems to recognize the GPUs without issues:

[2023-07-12 00:14:10] fwyzard@cms-oc-gpu-01:/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/slc7_amd64_gcc11/cms/cmssw/CMSSW_13_2_X_2023-07-11-1100$ cudaComputeCapabilities 
   0     7.5    NVIDIA GeForce RTX 2060
   1     7.5    NVIDIA GeForce RTX 2060
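For scripted checks, output in the three-column layout shown above (device index, compute capability, device name) can be parsed with a few lines of Python. This is a sketch; `parse_capabilities` is a hypothetical helper, and the layout is assumed from this one example rather than from any documented format.

```python
def parse_capabilities(text):
    """Parse lines of the form '<index> <capability> <device name>' into tuples."""
    devices = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Skip blank lines and anything that does not start with a device index.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        index, capability, name = parts
        devices.append((int(index), capability, name.strip()))
    return devices

# Sample text copied from the output above.
sample = """\
   0     7.5    NVIDIA GeForce RTX 2060
   1     7.5    NVIDIA GeForce RTX 2060
"""

devices = parse_capabilities(sample)
for index, capability, name in devices:
    print(index, capability, name)
```

An empty list would suggest that no CUDA device was reported on the node.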

@gartung
Member

gartung commented Jul 12, 2023

It worked with the current IB
https://cmssdt.cern.ch/SDT/jenkins-artifacts/profiling/CMSSW_13_2_X_2023-07-12-1100/el8_amd64_gcc11/11834.59621/step3_gpu_nsys.txt
