由于 GPU 驱动存在每次重新启动,都需要重新安装的问题,有空予以解决。

Quick Start

# 确认显卡状态
$ nvidia-smi

# 查看驱动版本
$ ls -l /usr/src/
total 16
drwxr-xr-x  2 root root 4096 Jul  4 15:03 annobin
drwxr-xr-x. 2 root root 4096 Feb  9  2022 debug
drwxr-xr-x. 4 root root 4096 Jul  4 15:01 kernels
drwxr-xr-x  8 root root 4096 Jul  4 10:52 nvidia-525.105.17

# 使用 dkms 管理驱动
$ yum install dkms

# 安装指定版本驱动( /usr/src/ 下的驱动版本)
$ sudo dkms install -m nvidia -v 525.105.17

# 确认显卡状态
$ nvidia-smi

History


[root@iZ8vbgjtg291j3aek7yk6bZ ~]# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

[root@iZ8vbgjtg291j3aek7yk6bZ ~]# uname -r
5.10.134-14.1.al8.x86_64


[root@iZ8vbgjtg291j3aek7yk6bZ ~]# ls -l /usr/src/
total 16
drwxr-xr-x  2 root root 4096 Jul  4 15:03 annobin
drwxr-xr-x. 2 root root 4096 Feb  9  2022 debug
drwxr-xr-x. 4 root root 4096 Jul  4 15:01 kernels
drwxr-xr-x  8 root root 4096 Jul  4 10:52 nvidia-525.105.17



[root@iZ8vbgjtg291j3aek7yk6bZ ~]# yum install dkms
Last metadata expiration check: 0:02:28 ago on Tue 11 Jul 2023 09:06:09 PM CST.
Dependencies resolved.
==========================================================================================================================================================================================================================================================
 Package                                                   Architecture                                                Version                                                            Repository                                                 Size
==========================================================================================================================================================================================================================================================
Installing:
 dkms                                                      noarch                                                      3.0.11-1.el8                                                       epel                                                       90 k

Transaction Summary
==========================================================================================================================================================================================================================================================
Install  1 Package

Total download size: 90 k
Installed size: 192 k
Is this ok [y/N]: y
Downloading Packages:
dkms-3.0.11-1.el8.noarch.rpm                                                                                                                                                                                              386 kB/s |  90 kB     00:00
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                                                                     384 kB/s |  90 kB     00:00
Extra Packages for Enterprise Linux 8 - x86_64                                                                                                                                                                            1.6 MB/s | 1.6 kB     00:00
Importing GPG key 0x2F86D6A1:
 Userid     : "Fedora EPEL (8) <epel@fedoraproject.org>"
 Fingerprint: 94E2 79EB 8D8F 25B2 1810 ADF1 21EA 45AB 2F86 D6A1
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-8
Is this ok [y/N]: y
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                                                                                  1/1
  Installing       : dkms-3.0.11-1.el8.noarch                                                                                                                                                                                                         1/1
  Running scriptlet: dkms-3.0.11-1.el8.noarch                                                                                                                                                                                                         1/1
  Verifying        : dkms-3.0.11-1.el8.noarch                                                                                                                                                                                                         1/1

Installed:
  dkms-3.0.11-1.el8.noarch

Complete!


[root@iZ8vbgjtg291j3aek7yk6bZ ~]# sudo dkms install -m nvidia -v 525.105.17
Sign command: /lib/modules/5.10.134-14.1.al8.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub
Certificate or key are missing, generating self signed certificate for MOK...
Creating symlink /var/lib/dkms/nvidia/525.105.17/source -> /usr/src/nvidia-525.105.17

Building module:
Cleaning build area...
'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.10.134-14.1.al8.x86_64 modules................
Signing module /var/lib/dkms/nvidia/525.105.17/build/nvidia.ko
Signing module /var/lib/dkms/nvidia/525.105.17/build/nvidia-uvm.ko
Signing module /var/lib/dkms/nvidia/525.105.17/build/nvidia-modeset.ko
Signing module /var/lib/dkms/nvidia/525.105.17/build/nvidia-drm.ko
Signing module /var/lib/dkms/nvidia/525.105.17/build/nvidia-peermem.ko
Cleaning build area...

nvidia.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.10.134-14.1.al8.x86_64/extra/

nvidia-uvm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.10.134-14.1.al8.x86_64/extra/

nvidia-modeset.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.10.134-14.1.al8.x86_64/extra/

nvidia-drm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.10.134-14.1.al8.x86_64/extra/

nvidia-peermem.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.10.134-14.1.al8.x86_64/extra/
Adding any weak-modules
depmod...

[root@iZ8vbgjtg291j3aek7yk6bZ ~]# nvidia-smi
Tue Jul 11 21:10:47 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:07.0 Off |                    0 |
| N/A   32C    P0    39W / 300W |      0MiB / 16384MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

References