Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[L0] Enable Counter Based Events by default for immediate command lists #1897

Merged

Conversation

nrspruit
Copy link
Contributor

  • Remove the PVC limitation for enabling counter based events.

@nrspruit nrspruit requested a review from a team as a code owner July 25, 2024 18:11
@github-actions github-actions bot added the level-zero L0 adapter specific issues label Jul 25, 2024
Copy link

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/10099286246

Copy link

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/10099286246
Job status: failure. Test status: skipped.

@pbalcer
Copy link
Contributor

pbalcer commented Jul 25, 2024

Apologies, but all things requiring SYCL (like perf or e2e tests) are broken, waiting for this to merge.

@nrspruit nrspruit added the v0.10.x Include in the v0.10.x release label Jul 26, 2024
Copy link

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/10106902441

Copy link

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/10106902441
Job status: success. Test status: success.

Summary

Name Result %
This PR 100.00%
baseline 101.91%

Benchmark Results

---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl, mean execution time per 10 kernels
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>Imm-CmdLists-OFF

        This PR (20.838 μs)   : crit, 0, 20

        baseline (22.705 μs)   :  0, 22

    -   : 0, 0

    -   : 0, 0

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>Imm-CmdLists-OFF

        This PR (23.536 μs)   : crit, 0, 23

        baseline (23.606 μs)   :  0, 23

    -   : 0, 0

    -   : 0, 0

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>

        This PR (26.697 μs)   : crit, 0, 26

        baseline (23.62 μs)   :  0, 23

    -   : 0, 0

    -   : 0, 0

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)<br>

        This PR (26.543 μs)   : crit, 0, 26

        baseline (25.476 μs)   :  0, 25

    -   : 0, 0

    -   : 0, 0

Loading
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Hashtable
    todayMarker off
    dateFormat  X
    axisFormat %s

    section hashtable<br>Imm-CmdLists-OFF

        This PR (356.39765 M keys/sec)   : crit, 0, 356

        baseline (306.262877 M keys/sec)   :  0, 306

    -   : 0, 0

    -   : 0, 0

    section hashtable<br>

        This PR (358.792274 M keys/sec)   : crit, 0, 358

        baseline (360.15055 M keys/sec)   :  0, 360

    -   : 0, 0

    -   : 0, 0

Loading
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Bitcracker
    todayMarker off
    dateFormat  X
    axisFormat %s

    section bitcracker<br>Imm-CmdLists-OFF

        This PR (35.5768 s)   : crit, 0, 35

        baseline (39.0378 s)   :  0, 39

    -   : 0, 0

    -   : 0, 0

    section bitcracker<br>

        This PR (35.6101 s)   : crit, 0, 35

        baseline (35.6105 s)   :  0, 35

    -   : 0, 0

    -   : 0, 0

Loading
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Easywave
    todayMarker off
    dateFormat  X
    axisFormat %s

    section easywave<br>Imm-CmdLists-OFF

        This PR (478 ms)   : crit, 0, 478

        baseline (606.0 ms)   :  0, 606

    -   : 0, 0

    -   : 0, 0

    section easywave<br>

        This PR (239 ms)   : crit, 0, 239

        baseline (241.0 ms)   :  0, 241

    -   : 0, 0

    -   : 0, 0

Loading
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench QuickSilver
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QuickSilver<br>

        This PR (117.74 MMS/CTT)   : crit, 0, 117

        baseline (110.88 MMS/CTT)   :  0, 110

    -   : 0, 0

    -   : 0, 0

Loading
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Sobel Filter
    todayMarker off
    dateFormat  X
    axisFormat %s

    section sobel_filter<br>Imm-CmdLists-OFF

        This PR (559.986 ms)   : crit, 0, 559

        baseline (609.227 ms)   :  0, 609

    -   : 0, 0

    -   : 0, 0

    section sobel_filter<br>

        This PR (556.645 ms)   : crit, 0, 556

        baseline (548.773 ms)   :  0, 548

    -   : 0, 0

    -   : 0, 0

Loading

Details

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),20.838,20.821,4.72%,20.289,317.022,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.536,23.520,1.86%,22.936,127.050,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),26.697,26.818,4.06%,22.832,276.037,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),26.543,26.452,4.12%,25.212,270.664,[CPU],[us]

hashtable Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.376595 s
356.397650 million keys/second

hashtable

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.374082 s
358.792274 million keys/second

bitcracker Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00500693 s
bitcracker - total time for whole calculation: 35.5768 s

bitcracker

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00441138 s
bitcracker - total time for whole calculation: 35.6101 s

easywave Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) Level-Zero
MAIN: Program successfully completed

easywave

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) Level-Zero
MAIN: Program successfully completed

QuickSilver

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.270620e-01 6.154280e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.700810e-01 7.573490e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.651640e-01 7.722560e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 4.139380e-01 8.341790e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.503300e-01 7.950260e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.461060e-01 7.688270e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.433100e-01 7.674490e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.433390e-01 7.899100e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.448010e-01 7.881430e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.411080e-01 7.626550e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.130e+07 1.130e+07 1.130e+07 0.000e+00 100.00
cycleInit 10 3.645e+06 3.645e+06 3.645e+06 0.000e+00 100.00
cycleTracking 10 7.651e+06 7.651e+06 7.651e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.944e+06 4.944e+06 4.944e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.138e+05 2.138e+05 2.138e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.150e+02 4.150e+02 4.150e+02 0.000e+00 100.00
Figure Of Merit 117.74 [Num Mega Segments / Cycle Tracking Time]

sobel_filter Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0
OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.51463 s
sobelfilter - total time for whole calculation: 0.559986 s

sobel_filter

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1
OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.48726 s
sobelfilter - total time for whole calculation: 0.556645 s

- Remove the PVC limitation for enabling counter based events.

Signed-off-by: Neil R. Spruit <[email protected]>
@nrspruit nrspruit force-pushed the enable_counter_events_imm_all branch from 3d7c787 to 5b1d9b5 Compare July 26, 2024 15:05
nrspruit added a commit to nrspruit/llvm that referenced this pull request Jul 26, 2024
@nrspruit nrspruit added the ready to merge Added to PR's which are ready to merge label Jul 26, 2024
@omarahmed1111 omarahmed1111 merged commit 2e6c1de into oneapi-src:main Jul 29, 2024
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
level-zero L0 adapter specific issues ready to merge Added to PR's which are ready to merge v0.10.x Include in the v0.10.x release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants