Fix numeric issues in PFCand scaling, add some debug output #41550

Merged
merged 4 commits into from
May 16, 2023

Conversation

kdlong
Contributor

@kdlong kdlong commented May 5, 2023

PR description:

Fix candidate scaling for cases where the momentum is very large and the mass-aware scaling runs into numerical precision issues. The fix is simple: compute E directly, without any ratios involved. This should fix #41397.
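
To illustrate the idea, here is a minimal standalone sketch of a mass-preserving rescale, assuming the energy is rebuilt from the scaled momentum and the fixed mass. The function name `rescaledEnergy` and its free-function form are hypothetical; this is not the CMSSW member function, only the same formula written out in isolation.

```cpp
#include <cmath>

// Hypothetical helper (not part of CMSSW): energy of a candidate whose
// momentum magnitude p is multiplied by rescaleFactor while its mass m
// is kept fixed.
inline double rescaledEnergy(double p, double m, double rescaleFactor) {
  const double pScaled = p * rescaleFactor;
  // Sum of two non-negative terms: no ratio and no subtraction, so rounding
  // cannot push the argument of the square root below zero.
  return std::sqrt(pScaled * pScaled + m * m);
}
```

For example, `rescaledEnergy(1000.0, 0.13957, 1.1)` returns a value slightly above 1100, and the result stays finite and no smaller than the scaled momentum even when `p` is many orders of magnitude larger than `m`.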

PR validation:

Checked that the crash and the NaN candidate are fixed and that the mass scaling works properly on the failing event from #41397.

@cmsbuild
Contributor

cmsbuild commented May 5, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41550/35424

@cmsbuild
Contributor

cmsbuild commented May 5, 2023

A new Pull Request was created by @kdlong (Kenneth Long) for master.

It involves the following packages:

  • DataFormats/ParticleFlowCandidate (reconstruction)
  • RecoParticleFlow/PFClusterTools (reconstruction)
  • RecoParticleFlow/PFProducer (reconstruction)

@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks.
@mmarionncern, @rovere, @lgray, @missirol, @hatakeyamak, @seemasharmafnal this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@kdlong kdlong mentioned this pull request May 5, 2023
@mmusich
Contributor

mmusich commented May 5, 2023

type pf

@mmusich
Contributor

mmusich commented May 5, 2023

type bug-fix

@cmsbuild
Contributor

cmsbuild commented May 5, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41550/35425

@cmsbuild
Contributor

cmsbuild commented May 5, 2023

Pull request #41550 was updated. @cmsbuild, @mandrenguyen, @clacaputo can you please check and sign again.


float e = std::sqrt(p() * p() * rescaleFactor * rescaleFactor + mass() * mass());

// Protect against invalid values (shouldn't happen, but could)
Contributor

I wonder whether with the new formulation above (no more differences that could lead to numerical instabilities) this check is still needed. Did you try in your test if e can get some invalid value now?

Contributor Author

No, I didn't see invalid values after this fix. I think what happened is that the scaling was applied multiple times: the first application was thrown off by the limited precision, and the second gave a negative value for e^2 because the inconsistent E value effectively implied a negative mass. This check probably isn't needed, or it could raise an exception instead of silently correcting the value. I'm open to suggestions.
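
A rough sketch of the failure mode and of the "raise an exception" alternative, assuming the mass is recovered from a stored (E, p) pair as E^2 - p^2. That recovery step is an assumption about the old code path and is not shown in this thread; both helper names are hypothetical, and `std::runtime_error` stands in for whatever exception type the framework would actually use.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical helper: mass squared implied by an (E, p) pair. If E was
// corrupted so that E < p, the result is negative.
inline double impliedMassSquared(double e, double p) { return e * e - p * p; }

// Hypothetical strict variant: report an inconsistent pair loudly instead of
// silently clamping it to a valid value.
inline double impliedMassOrThrow(double e, double p) {
  const double m2 = impliedMassSquared(e, p);
  if (!(m2 >= 0.0)) {  // written this way so that NaN is rejected as well
    throw std::runtime_error("inconsistent four-momentum: E^2 - p^2 < 0");
  }
  return std::sqrt(m2);
}
```

With a consistent pair such as E = 5, p = 3 the strict variant returns 4. With E slightly below p the implied mass squared is negative, a plain `std::sqrt` would yield NaN, and the strict variant throws, which surfaces the upstream problem rather than hiding it.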

@mandrenguyen
Contributor

I suggest we roll back my hotfix (#41473) in this PR, as this solution should address the problem at the source. Please also prepare a 13_1_X backport for consistency.

@kdlong
Contributor Author

kdlong commented May 7, 2023

Thanks @mandrenguyen, I'll do this ASAP. Should I leave the NaN check in the rescaling function or drop it there as well?

@mandrenguyen
Contributor

I guess we should avoid checks that might end up obscuring problems.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41550/35500

@cmsbuild
Contributor

Pull request #41550 was updated. @cmsbuild, @mandrenguyen, @clacaputo can you please check and sign again.

@clacaputo
Contributor

please test

@cmsbuild
Contributor

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2822e8/32558/summary.html
COMMIT: aef16e0
CMSSW: CMSSW_13_2_X_2023-05-11-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41550/32558/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-INPUT

  • 136.83111: 136.83111_RunJetHT2017FreMINIAODUL/step2_RunJetHT2017FreMINIAODUL.log
  • 136.9: 136.9_RunDoubleMuon2016C/step2_RunDoubleMuon2016C.log
  • 139.001: 139.001_RunMinimumBias2021/step2_RunMinimumBias2021.log

Comparison Summary

Summary:

  • You potentially removed 3 lines from the logs
  • Reco comparison results: 19 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3461906
  • DQMHistoTests: Total failures: 35
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3461849
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@clacaputo
Contributor

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-2822e8/32626/summary.html
COMMIT: aef16e0
CMSSW: CMSSW_13_2_X_2023-05-14-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41550/32626/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 22 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3461906
  • DQMHistoTests: Total failures: 44
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3461840
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@mandrenguyen
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1

  • There are tiny differences in the masses of a very few packedPFCandidates, which appear consistent with the numerical-precision-level changes introduced by this fix.
