
Make AlpakaTest tests to use their dependencies for controlling their execution, and extend module tests to ROCm. #43204

Merged

3 commits merged on Nov 10, 2023


makortel
Contributor

@makortel makortel commented Nov 6, 2023

PR description:

Following the discussion in #41340, this PR makes the HeterogeneousCore/AlpakaTest tests use their dependencies to control their execution on various hardware. It also extends the Alpaka module tests to ROCm and adds a note on the unit tests to the README.
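The idea of dependency-controlled execution can be illustrated with a minimal Python sketch. This is not the actual scram or cms-bot logic: each test declares the dependencies it uses, and a test that depends on a GPU toolkit runs only on a node that has the matching GPU. The `REQUIRED_HARDWARE` mapping, the hardware labels, and the Serial and ROCm test names are hypothetical; only `testHeterogeneousCoreAlpakaTestModulesCUDA` appears in the CI results of this PR.

```python
from dataclasses import dataclass

# Hypothetical mapping from a declared dependency to the hardware it needs.
REQUIRED_HARDWARE = {"cuda": "nvidia_gpu", "rocm": "amd_gpu"}


@dataclass(frozen=True)
class UnitTest:
    name: str
    uses: frozenset = frozenset()  # dependencies declared by the test


def should_run(test: UnitTest, available: set) -> bool:
    """Run a test only if the hardware behind each declared dependency is present."""
    needed = {REQUIRED_HARDWARE[dep] for dep in test.uses if dep in REQUIRED_HARDWARE}
    return needed <= available


def select(tests, available):
    """Names of the tests that would be executed on a node with `available` hardware."""
    return [t.name for t in tests if should_run(t, available)]


# Only the CUDA test name appears in the CI log; the other two are hypothetical.
TESTS = [
    UnitTest("testHeterogeneousCoreAlpakaTestModulesSerial"),
    UnitTest("testHeterogeneousCoreAlpakaTestModulesCUDA", frozenset({"cuda"})),
    UnitTest("testHeterogeneousCoreAlpakaTestModulesROCm", frozenset({"rocm"})),
]

if __name__ == "__main__":
    for node in (set(), {"nvidia_gpu"}, {"amd_gpu"}):
        print(sorted(node), select(TESTS, node))
```

In this model a node without a GPU runs only the serial test, while an NVIDIA node adds the CUDA test and an AMD node adds the ROCm one, which mirrors the validation matrix described below.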

Resolves cms-sw/framework-team#564

PR validation:

Unit tests were run on a machine without a GPU and on a machine with an NVIDIA GPU. I was not able to test on an AMD GPU.

@cmsbuild
Contributor

cmsbuild commented Nov 6, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43204/37528

  • This PR adds an extra 28KB to the repository

@cmsbuild
Contributor

cmsbuild commented Nov 6, 2023

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

  • HeterogeneousCore/AlpakaCore (heterogeneous)
  • HeterogeneousCore/AlpakaTest (heterogeneous)

@fwyzard, @makortel, @cmsbuild can you please review it and eventually sign? Thanks.
@rovere, @missirol this is something you requested to watch as well.
@rappoccio, @antoniovilela, @sextonkennedy you are the release managers for this.

cms-bot commands are listed here

@makortel
Contributor Author

makortel commented Nov 6, 2023

enable gpu

@makortel
Contributor Author

makortel commented Nov 6, 2023

@cmsbuild, please test

@fwyzard
Contributor

fwyzard commented Nov 6, 2023

Is there a way to force running an alpaka-based test even if the corresponding backend is not available?

HeterogeneousCore/AlpakaInterface/test/alpaka/testVec.cc is host-only and should work with any backend.
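One possible shape for such an override, sketched in Python under stated assumptions: a test could carry a hypothetical `host_only` flag that bypasses the hardware check, on the premise that host-only code such as testVec.cc works with any backend. This also assumes the test binary can be loaded on a node without the device. The flag, the `REQUIRED_HARDWARE` mapping and the hardware labels are illustrative only, not an existing CMSSW option.

```python
# Hypothetical mapping from a declared dependency to the hardware it needs.
REQUIRED_HARDWARE = {"cuda": "nvidia_gpu", "rocm": "amd_gpu"}


def should_run(uses, available, host_only=False):
    """Decide whether a test runs on a node whose hardware is listed in `available`.

    `host_only` is a hypothetical per-test flag for tests that never launch
    device code; such a test is run regardless of which GPUs are present.
    """
    if host_only:
        return True
    needed = {REQUIRED_HARDWARE[dep] for dep in uses if dep in REQUIRED_HARDWARE}
    return needed <= set(available)
```

For example, `should_run({"cuda"}, set())` is `False`, while `should_run({"cuda"}, set(), host_only=True)` is `True`.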

@cmsbuild
Contributor

cmsbuild commented Nov 6, 2023

-1

Failed Tests: RelVals-GPU GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-973fa9/35642/summary.html
COMMIT: dc64d75
CMSSW: CMSSW_13_3_X_2023-11-06-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43204/35642/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-GPU

  • 12434.587: 12434.587_TTbar_14TeV+2023_Patatrack_AllTripletsGPU_Validation/step2_TTbar_14TeV+2023_Patatrack_AllTripletsGPU_Validation.log

GPU Unit Tests

I found 1 error in the following unit tests:

---> test testHeterogeneousCoreAlpakaTestModulesCUDA had ERRORS

Comparison Summary

Summary:

  • You potentially removed 99 lines from the logs
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3363010
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3362982
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor Author

makortel commented Nov 7, 2023

The test failures look like the node had no GPU (probably the GPU setup of the node was incompatible with CUDA 12.2?).


@makortel
Contributor Author

makortel commented Nov 8, 2023

@cmsbuild, please test

@cmsbuild
Contributor

cmsbuild commented Nov 9, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-973fa9/35704/summary.html
COMMIT: dc64d75
CMSSW: CMSSW_14_0_X_2023-11-08-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43204/35704/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 176 lines from the logs
  • Reco comparison results: 14 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3363010
  • DQMHistoTests: Total failures: 1397
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3361591
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 48 differences found in the comparisons
  • DQMHistoTests: Total files compared: 3
  • DQMHistoTests: Total histograms compared: 39740
  • DQMHistoTests: Total failures: 1147
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 38593
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (2 files compared)
  • Checked 8 log files, 10 edm output root files, 3 DQM output files
  • TriggerResults: no differences found
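As a quick consistency check on the CPU and GPU summaries of this run, the successes, failures, nulls and skipped histograms add up to the number compared in each case, assuming these four categories partition the compared histograms. The Python snippet below recomputes this and the failure fraction using only the numbers reported by the bot.

```python
# DQMHistoTests counts copied from the CPU and GPU comparison summaries of this run.
SUMMARIES = {
    "CPU": {"compared": 3363010, "successes": 3361591, "failures": 1397, "nulls": 0, "skipped": 22},
    "GPU": {"compared": 39740, "successes": 38593, "failures": 1147, "nulls": 0, "skipped": 0},
}


def is_consistent(s):
    """True if the four outcome categories account for every compared histogram."""
    return s["successes"] + s["failures"] + s["nulls"] + s["skipped"] == s["compared"]


def failure_fraction(s):
    """Fraction of compared histograms that failed the comparison."""
    return s["failures"] / s["compared"]


if __name__ == "__main__":
    for name, s in SUMMARIES.items():
        print(f"{name}: consistent={is_consistent(s)}, failures={failure_fraction(s):.2%}")
```

Both summaries are internally consistent; the failure fraction is roughly 0.04% for the CPU workflows and about 2.9% for the GPU ones.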

@makortel
Contributor Author

makortel commented Nov 9, 2023

The CPU comparison differences are related to #39803.

The GPU comparison differences are in 12434.586 and 12434.587, with 572 differences in each (on a cursory look, limited to pixel tracks or their derivatives). I guess this is within the normal variation of the GPU workflows?

@makortel
Contributor Author

makortel commented Nov 9, 2023

The unit test list shows that both tests in AlpakaTest get run for CPU and GPU.

@makortel
Contributor Author

makortel commented Nov 9, 2023

@fwyzard @smuzaffar Do you have any further comments?

@smuzaffar
Contributor

No @makortel, nothing to add.

@makortel
Contributor Author

+heterogeneous

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio
Contributor

+1

Successfully merging this pull request may close these issues.

Migrate AlpakaTest unit tests to use the new test declarations
5 participants