Benchmarking Results: Energy Efficiency and Wall Time Analysis #3400
alanakihn started this conversation in Show and tell
-
Nice analysis!
Without knowing more about your AMD CPU I would be very careful with such statements. Right now you are comparing a TGV against a tram. You call it an AMD 23xx CPU, but the only one I can find that matches that description is the Ryzen 3 2300X. How did you measure the energy utilization?
-
I wanted to share a few performance results from an undergraduate research project I conducted, supervised by @BrodiePearson and @amrapallig. I evaluated energy efficiency and wall time across the different hardware available to us: Nvidia A100 GPUs, Nvidia V100 GPUs, and AMD CPUs.
We used the Langmuir turbulence example, varying the simulation size by changing the number of grid points while keeping the resolution the same across simulations. We also performed a similar analysis for a Julia package for spectral simulation of 2D/layered geophysical systems (GeophysicalFlows.jl - some lines included below; @navidcy) and for the convecting plankton Oceananigans example (adjusted to be 3D – results not shown here).
Speed of simulations
Our data indicate that the Nvidia A100 GPUs are markedly faster than the V100s in computational wall time. The A100s have significantly lower normalized wall times at every grid size tested, and they allow larger simulations than the V100s, showcasing the A100's advanced memory and computational capabilities. The wall times are normalized by the fastest simulation (across all hardware) for a given software package. For Oceananigans, the fastest simulation at the largest number of simulated grid points took 247 seconds (about 4 minutes); for GeophysicalFlows it took 100 seconds.
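To make the normalization concrete, here is a minimal Python sketch of how I read it: every wall time is divided by the smallest wall time recorded for the same package, across all hardware and grid sizes. The function name, the dictionary layout, and every number below are hypothetical placeholders, not the measurements or scripts from this benchmark.

```python
def normalize_wall_times(wall_times):
    """Divide each wall time by the fastest run of the same package.

    wall_times maps (package, hardware, grid_points) -> seconds.
    Assumption: the reference is the minimum over all hardware and
    all grid sizes for that package.
    """
    fastest = {}
    for (package, _hardware, _grid_points), seconds in wall_times.items():
        fastest[package] = min(seconds, fastest.get(package, seconds))
    return {key: seconds / fastest[key[0]] for key, seconds in wall_times.items()}


# Placeholder timings for illustration only, NOT benchmark results.
example = {
    ("PackageA", "gpu_x", 64**3): 10.0,
    ("PackageA", "gpu_y", 64**3): 20.0,
    ("PackageA", "cpu", 64**3): 500.0,
    ("PackageB", "gpu_x", 256**2): 4.0,
    ("PackageB", "cpu", 256**2): 8.0,
}
normalized = normalize_wall_times(example)
```

With this convention the fastest run of each package has a normalized wall time of exactly 1, and every other value reads as "how many times slower than the best run".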
Energy Efficiency
In terms of energy efficiency, the A100 GPUs consume less energy than the V100s or the CPUs for a given simulation. This benefit stems from the increased speed of the simulations (shorter simulations require hardware to run for less time), rather than from changes in the rate of hardware energy usage (power consumption) as the number of grid points is changed. The energy consumption per grid point for the A100 GPUs is consistently lower than that for the V100 GPUs and CPUs, highlighting the A100's ability to conduct more energy-efficient (faster!) simulations, despite having the highest power consumption of the three pieces of hardware.
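The relationship between power, run time, and energy per grid point can be sketched in a few lines of Python. This assumes that power is available as a series of timestamped samples in watts and that energy is the time integral of those samples (trapezoidal rule); the post does not describe the actual measurement method, so treat this as one possible approach. The function names and all numbers are hypothetical.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (W) over time (s) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def energy_per_grid_point(timestamps_s, power_w, grid_points):
    """Energy used by one run divided by the number of grid points."""
    return energy_joules(timestamps_s, power_w) / grid_points


# Placeholder samples, NOT measured data: a device drawing more power
# but finishing sooner can still use less total energy.
fast_device = energy_joules([0.0, 5.0, 10.0], [300.0, 300.0, 300.0])   # 300 W for 10 s
slow_device = energy_joules([0.0, 20.0, 40.0], [150.0, 150.0, 150.0])  # 150 W for 40 s
```

In this made-up case the faster device draws twice the power but finishes in a quarter of the time, so it uses half the energy, which is the same mechanism described above.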
Quantitative Insights
For large simulations, the Nvidia A100 GPUs are approximately 1000 times faster and consume roughly 100 times less energy per grid point compared to the AMD CPUs. In addition, the A100 GPUs are about twice as fast and consume nearly half as much energy as the Nvidia V100 GPUs.
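As a rough consistency check on these ratios: if energy is approximately average power times wall time, then the speedup and the energy reduction together imply a ratio of average power draw. The sketch below uses only the rounded figures quoted above and assumes both devices run the same number of grid points; the function name is hypothetical.

```python
def implied_power_ratio(speedup, energy_reduction):
    """Average power of the faster device relative to the slower one.

    From E = P * t:
        P_fast / P_slow = (E_fast / E_slow) / (t_fast / t_slow)
                        = speedup / energy_reduction
    """
    return speedup / energy_reduction


# Rounded figures quoted in the text above.
a100_vs_cpu = implied_power_ratio(speedup=1000, energy_reduction=100)
a100_vs_v100 = implied_power_ratio(speedup=2, energy_reduction=2)
```

Under these assumptions the A100 would draw on the order of 10 times the average power of the CPU and roughly the same power as the V100, in line with the A100 having the highest power consumption of the three.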
Future Directions
We aim to extend this benchmarking effort to include the 2023 Nvidia H100 GPUs and multi-GPU LES benchmarks, and will post updates as we get results. We currently have access to ~8 H100s and a small number of V100s and A100s. We're also expecting some Grace Hopper systems and a larger number of H100s in the medium term.
If anyone has suggestions for new directions/extensions of this work, please let us know! @glwagner has already suggested that we look at diagnostics and explore whether there are ways to speed up the code when a large number of diverse or similar diagnostics are being computed.