This repository contains the steps to reproduce an issue in which EMR cluster setup fails with the following error:
ERROR: An NVIDIA kernel module 'nvidia' appears to already be loaded in your kernel.
This may be because it is in use (for example, by an X server, a CUDA program, or
the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured
without support for module unloading. Please be sure to exit any programs that may be
using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs
are running, you know that your kernel supports module unloading, and you still receive
this message, then an error may have occured that has corrupted an NVIDIA kernel
module's usage count, for which the simplest remedy is to reboot your computer.
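
To check the condition the message describes, the loaded NVIDIA kernel modules and their usage counts can be read from /proc/modules on the affected node. The following is a minimal diagnostic sketch, not part of this repository; the function name loaded_nvidia_modules is hypothetical, and it assumes the standard Linux /proc/modules layout (module name, size, usage count, ...).

```python
from pathlib import Path


def loaded_nvidia_modules(proc_modules_text):
    """Return {module_name: usage_count} for NVIDIA modules.

    Expects text in the /proc/modules layout, where each line starts with
    the module name, its size, and its usage count.
    """
    found = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, use_count = fields[0], fields[2]
        # Match the core 'nvidia' module and companions such as nvidia_uvm.
        if name == "nvidia" or name.startswith("nvidia_"):
            found[name] = int(use_count)
    return found


if __name__ == "__main__":
    path = Path("/proc/modules")
    text = path.read_text() if path.exists() else ""
    print(loaded_nvidia_modules(text))
```

An empty result means no NVIDIA module is loaded. A non-zero usage count for 'nvidia' suggests that something still holds the module, which is consistent with the situation the error message describes.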
The included Audiomack-EMR-DLAMI-NVIDIA-issue-March-2021.pdf file describes the whole scenario.
python/sample_driver.py can be used as the Spark driver program when configuring the EMR cluster. It is a simple no-op; in fact, cluster setup never gets as far as invoking it, because the NVIDIA error described above occurs first.