Hello everyone,
Today the NVIDIA driver on my server stopped working out of nowhere. It worked yesterday, and I didn’t change anything yesterday or today.
I noticed because my Plex container stopped working: it could no longer use the NVIDIA card I use for transcoding, a GTX 1650.
I tried running nvidia-smi and it said: Failed to initialize NVML: Driver/library version mismatch. Next I upgraded the system, since my last upgrade was months ago and I thought it might help. It didn’t. I also rebooted a few times because some sources said that solves the issue, but the error persisted.
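As far as I understand it, this error usually means the NVIDIA kernel module that is currently loaded has a different version than the user-space libraries (for example after a package upgrade replaced the libraries while the old module stayed in memory). Below is a minimal sketch for comparing the two. The package name libnvidia-compute-525-server is an assumption based on my driver branch, and the function names are made up for illustration.

```shell
# Strip the Debian packaging suffix, e.g. "525.125.06-0ubuntu0.22.04.1" -> "525.125.06".
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# Version of the kernel module that is currently loaded (empty if none is loaded).
loaded_module_version() {
    cat /sys/module/nvidia/version 2>/dev/null
}

# Version of an installed user-space library package.
# $1 is the package name (an assumption, adjust it to your driver branch).
userspace_version() {
    upstream_version "$(dpkg-query -W -f='${Version}' "$1" 2>/dev/null)"
}

# Compare the two version strings and print a one-line verdict.
report_mismatch() {
    kernel_ver=$1
    user_ver=$2
    if [ -z "$kernel_ver" ]; then
        echo "no nvidia kernel module loaded"
    elif [ "$kernel_ver" = "$user_ver" ]; then
        echo "match: $kernel_ver"
    else
        echo "mismatch: kernel module $kernel_ver, user space $user_ver"
    fi
}

# Usage (not run automatically):
# report_mismatch "$(loaded_module_version)" "$(userspace_version libnvidia-compute-525-server)"
```

If the two versions differ, a reboot normally loads the matching module, provided a module for the running kernel is actually installed.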
So it was driver reinstall time. I purged the driver with apt purge nvidia* and then installed it again with ubuntu-drivers install --gpgpu nvidia:525-server. After a reboot, nvidia-smi gives this error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
lsmod | grep nvidia shows nothing and /proc/driver/nvidia/version doesn’t exist. I tried starting nvidia-persistenced with systemctl, but it gives this error:
Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 113 has read and write permissions for those files.
/dev/nvidia* doesn’t exist.
I’m very much a beginner when it comes to NVIDIA and Linux. It was a pain to set up initially and I was hoping it would never break, but unfortunately here I am. I don’t really know which logs to show you or which commands to run to troubleshoot, so every tip is appreciated, and I will provide logs and other output if needed.
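For anyone landing here with the same symptoms, this is a rough sketch of the checks that seem most relevant: whether an nvidia kernel module is loaded, and if not, whether one exists on disk for the running kernel. The function names are hypothetical and the messages are only my interpretation of each state, not an official diagnostic.

```shell
# True if the nvidia kernel module is currently loaded.
nvidia_module_loaded() {
    [ -d /sys/module/nvidia ]
}

# True if an nvidia module exists on disk for the running kernel.
nvidia_module_on_disk() {
    modinfo -k "$(uname -r)" nvidia >/dev/null 2>&1
}

# Map the two yes/no answers to a short description.
# $1 = loaded (yes/no), $2 = on disk (yes/no)
classify_state() {
    case "$1:$2" in
        yes:*)  echo "module loaded" ;;
        no:yes) echo "module installed but not loaded" ;;
        *)      echo "no module for this kernel" ;;
    esac
}

# Run both checks and print the result plus the installed NVIDIA packages.
diagnose() {
    loaded=no
    on_disk=no
    nvidia_module_loaded && loaded=yes
    nvidia_module_on_disk && on_disk=yes
    echo "kernel: $(uname -r)"
    classify_state "$loaded" "$on_disk"
    dpkg -l | grep -i nvidia
}

# Usage (not run automatically):
# diagnose
```

In my case the result would have been "no module for this kernel", which matches lsmod showing nothing and /dev/nvidia* being absent.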
System info:
- Ubuntu Server 22.04
- kernel: 5.15.0-76-generic
- theoretically installed nvidia driver: nvidia-driver-525-server
Solution
I was using the ubuntu-drivers utility to install the driver, but it turns out it did not give me a working driver. After installing with the manual method from https://help.ubuntu.com/community/NvidiaDriversInstallation using the command apt install linux-modules-nvidia-${DRIVER_BRANCH}${SERVER}-${LINUX_FLAVOUR}, it’s working again.
I was using the ubuntu-drivers utility that this page mentions too, but it turns out it doesn’t work very well. I have now installed with the manual method from this page using apt install linux-modules-nvidia-${DRIVER_BRANCH}${SERVER}-${LINUX_FLAVOUR} and it’s working. Thank you for the suggestion!
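In case the placeholders in that command are unclear, here is a small sketch of how they expand for my system. The values 525, -server and generic are taken from my setup above (driver branch 525-server, kernel 5.15.0-76-generic). The helper function names are made up for illustration, so check the resulting package name with apt before installing.

```shell
# Build the package name from the three placeholders used on the help page.
# $1 = DRIVER_BRANCH (e.g. 525), $2 = SERVER ("-server" or empty), $3 = LINUX_FLAVOUR (e.g. generic)
nvidia_modules_package() {
    printf 'linux-modules-nvidia-%s%s-%s\n' "$1" "$2" "$3"
}

# Take the flavour from a kernel release string: "5.15.0-76-generic" -> "generic".
kernel_flavour() {
    printf '%s\n' "${1##*-}"
}

# Usage (not run automatically):
# pkg=$(nvidia_modules_package 525 -server "$(kernel_flavour "$(uname -r)")")
# apt-cache policy "$pkg"      # confirm the package exists first
# sudo apt install "$pkg"
```

For my server this gives linux-modules-nvidia-525-server-generic.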