I recently bought an Nvidia GTX 1080 Ti, a very powerful GPU card, and I’m very happy to have my own card for deep learning. To make use of it, I bought a used HP Z420 workstation. Although the Z420 is second-hand, it is powerful enough to pair with the GTX 1080 Ti, and buying used saved me a lot of money on a limited budget. As soon as the card arrived, I was eager to put everything together and get it working. Now that I have a working deep learning environment, I want to share the process in the hope that it benefits others. Good luck!
My operating system is the recently released Ubuntu LTS version, Ubuntu 18.04, so the following steps were carried out in that setting.
Install nvidia driver
There are three ways to install the nvidia driver for your card, which I introduce later. First, an important step is disabling nouveau.
1. Disable nouveau
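For your card to work properly, one important step is disabling nouveau, the open source driver. You can do this by editing the grub config file (/boot/grub/grub.cfg): search for the line containing
quiet splash and add
acpi_osi=linux nomodeset to the end of that line. Note that grub.cfg is regenerated whenever update-grub runs, so to keep the change permanently, make the same edit to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run sudo update-grub.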
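The edit itself is just appending two parameters to a quoted kernel command line. Here is a minimal sketch of that step in Python, assuming the line has the usual GRUB_CMDLINE_LINUX_DEFAULT="..." shape found in /etc/default/grub. The helper name add_kernel_params is made up for illustration, and the sketch only transforms a string; it does not touch any file.

```python
def add_kernel_params(line, params):
    """Return the GRUB command-line setting with any missing params appended.

    `line` is assumed to look like KEY="value"; this is a hypothetical helper,
    not part of grub or Ubuntu.
    """
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError("expected a KEY=\"value\" line")
    # Drop the surrounding quotes and split the value into single parameters.
    current = value.strip().strip("\"'").split()
    for param in params:
        if param not in current:  # keep the edit idempotent
            current.append(param)
    return '{}="{}"'.format(key, " ".join(current))


line = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'
print(add_kernel_params(line, ["acpi_osi=linux", "nomodeset"]))
```

Running the helper twice on the same line gives the same result, so it is safe to apply to a line that was already edited by hand.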
2. Learn about your card
You can check the type of your card with the following command:
ubuntu-drivers devices
This command shows all devices that need drivers and which packages apply to them. On my computer the output is as follows:
== /sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0 ==
modalias : pci:v000010DEd00001B06sv00001458sd0000376Abc03sc00i00
vendor : NVIDIA Corporation
model : GP102 [GeForce GTX 1080 Ti]
driver : nvidia-driver-390 - distro non-free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
From this output we can conclude that the machine has a GPU card made by NVIDIA Corporation, that its model is GP102 (GeForce GTX 1080 Ti), and that nvidia-driver-390 is the recommended driver.
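If you want to read such a report in a script rather than by eye, the recommended driver can be picked out with a few lines of Python. This is only a sketch: the function name recommended_driver is hypothetical, and it assumes the report keeps the "key : value" layout shown above, with the word "recommended" on the relevant driver line.

```python
def recommended_driver(report):
    """Return the package name on the 'driver' line marked as recommended."""
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "driver" and "recommended" in value:
            # The value looks like "nvidia-driver-390 - distro non-free recommended".
            return value.split(" - ")[0].strip()
    return None


SAMPLE = """\
== /sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0 ==
modalias : pci:v000010DEd00001B06sv00001458sd0000376Abc03sc00i00
vendor   : NVIDIA Corporation
model    : GP102 [GeForce GTX 1080 Ti]
driver   : nvidia-driver-390 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
"""

print(recommended_driver(SAMPLE))
```

With the report from my machine this prints nvidia-driver-390.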
3. Install the driver
To install the driver, one easy way is the command
ubuntu-drivers autoinstall, which installs the drivers that are appropriate for automatic installation. A second way is adding a ppa repository and installing from it; you can find plenty of resources on that. Here I mainly describe the third method, the one I used: installing from the executable run file downloaded from the official nvidia website. The official installer lets you install the most recent driver, which may give better performance.
Step 1: download the installer file
Visit the drivers site to download the installer file that matches your card, OS and language preference. In my case, I downloaded the most recent version matching the options shown in the following picture.
Finally, I got a file named
NVIDIA-Linux-x86_64-410.73.run in my Downloads folder.
Step 2: install the driver
To install the driver, change to the Downloads directory and run the following commands in the terminal:
sudo telinit 3
sudo sh NVIDIA-Linux-x86_64-410.73.run
The first command disables the GUI and takes you to a terminal login. All you need to do is log in, run the second command and follow the instructions.
Note: You might need to run
sudo apt install build-essential before running the driver installer, because it needs some build tools, such as cc.
When the installation completes, reboot your machine to make the driver work. Check the installation with the following commands:
- Run the command
lspci | grep -i nvidia and make sure the card is listed in the output.
- An application named
NVIDIA X Server Settings is installed; you can open it and check the settings.
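The lspci check can also be scripted. The sketch below does the same filtering as grep -i nvidia, assuming lspci is on the PATH; if it is not, the script simply prints nothing. The function names are my own, and the script only reads the device list.

```python
import shutil
import subprocess


def nvidia_devices(lspci_text):
    """Return the lspci lines that mention nvidia, ignoring case."""
    return [line.strip() for line in lspci_text.splitlines()
            if "nvidia" in line.lower()]


def read_lspci():
    """Return the output of lspci, or an empty string if it is unavailable."""
    if shutil.which("lspci") is None:
        return ""
    result = subprocess.run(["lspci"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    for device in nvidia_devices(read_lspci()):
        print(device)
```

An empty result after the reboot would suggest that the card is not detected on the PCI bus at all, which is a hardware or BIOS matter rather than a driver one.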
4. Install cuda and cudnn
In this part, we install the cuda toolkit and the cudnn library. An important step is choosing the right cuda version. I want to install tensorflow-gpu, which only supports cuda 9.0 at the time of writing, so this version is the only choice.
ps: I tried cuda 10.0 and cuda 9.2, and neither works with tensorflow-gpu. Knowing this will save you a lot of time.
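Since a version mismatch only shows up later, when tensorflow fails to load, it can help to check the toolkit version explicitly once a toolkit is installed. The sketch below assumes that nvcc --version prints a line containing "release X.Y", as in the sample string; the function names and the default of 9.0 are my own choices for this setup.

```python
import re


def cuda_release(nvcc_output):
    """Extract (major, minor) from text containing 'release X.Y', else None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def matches_required(nvcc_output, required=(9, 0)):
    """True if the reported toolkit release equals the required one."""
    return cuda_release(nvcc_output) == required


# Sample line in the format assumed above, not captured from my machine.
sample = "Cuda compilation tools, release 9.0, V9.0.176"
print(cuda_release(sample), matches_required(sample))
```

To check a real installation, pass the text printed by nvcc --version to matches_required instead of the sample string.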
With the version determined, go to the cuda toolkit downloads site. This page shows the most recent download, cuda toolkit 10.0, so you need to go to the legacy releases to download older versions.
The following screenshots should make the downloads easier:
Choose Legacy Releases
After you click Legacy Releases, you are taken to the cuda toolkit archive site, where you need to select Cuda Toolkit 9.0. The remaining steps are choosing the operating system, architecture, distribution, version and installer type; in my case these are Linux, x86_64, Ubuntu, 16.04 and runfile(local). This version also comes with four additional patches, so download and install them as well.
As before, the following picture shows the choices:
After all the necessary files are downloaded, run each one with the command
sudo sh runfile, in this order: the main runfile first, then the patch 1, patch 2, patch 3 and finally patch 4 runfiles. When all is done, you have cuda toolkit 9.0 installed.
Then add
/usr/local/cuda-9.0/bin to your PATH and
/usr/local/cuda-9.0/lib64 to your LD_LIBRARY_PATH.
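A quick way to confirm that the environment picked up the toolkit is to look for the two cuda directories in the search path variables. This sketch assumes the usual convention of putting the bin directory on PATH and the lib64 directory on LD_LIBRARY_PATH; the function name missing_entries is hypothetical.

```python
import os

CUDA_HOME = "/usr/local/cuda-9.0"


def missing_entries(env):
    """Return {variable: directory} for cuda directories absent from env."""
    wanted = {
        "PATH": CUDA_HOME + "/bin",
        "LD_LIBRARY_PATH": CUDA_HOME + "/lib64",
    }
    missing = {}
    for var, directory in wanted.items():
        # Split the variable into entries and ignore trailing slashes.
        entries = [e.rstrip("/") for e in env.get(var, "").split(os.pathsep) if e]
        if directory not in entries:
            missing[var] = directory
    return missing


for var, directory in missing_entries(os.environ).items():
    print(f"{var} is missing {directory}")
```

If the script prints nothing, both directories are already visible in the current shell. Remember that a change in a shell startup file only applies to shells started after the change.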
To install the cudnn library, you need an nvidia developer account to download it. When the download completes, you get an archive file. Decompress it and you get a folder named cuda. All you need to do is copy the files in this folder to the matching locations of the previously installed cuda toolkit, as follows:
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Congrats! At this point, you have cuda toolkit 9.0 and the cudnn library installed.
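To double check which cudnn version ended up in the toolkit directory, you can read the version macros from the copied header. The sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as cudnn 7 headers do; the sample values are made up for the demonstration and are not the version I installed.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if incomplete."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Hypothetical header excerpt, only to show the expected layout.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 3
#define CUDNN_PATCHLEVEL 1
"""

print(cudnn_version(sample))
```

For the real check, read /usr/local/cuda/include/cudnn.h with open(...).read() and pass the text to cudnn_version.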
5. Install tensorflow-gpu
To install tensorflow, please refer to Install Tensorflow with pip. Following the guide, I created a python virtual environment and installed tensorflow-gpu with pip.
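Before running a full training script, a quicker check is to ask tensorflow which devices it can see. This is a minimal sketch that uses device_lib.list_local_devices from tensorflow.python.client; the wrapper function visible_gpus is my own, and it returns None when tensorflow cannot be imported in the active environment.

```python
def visible_gpus():
    """Return the names of GPU devices tensorflow sees, or None without tensorflow."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [device.name for device in device_lib.list_local_devices()
            if device.device_type == "GPU"]


gpus = visible_gpus()
if gpus is None:
    print("tensorflow is not installed in this environment")
elif gpus:
    print("GPUs visible to tensorflow:", gpus)
else:
    print("tensorflow is installed but sees no GPU")
```

Run it inside the virtual environment where tensorflow-gpu was installed; an empty list usually points back to the driver, cuda or cudnn steps above.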
6. Test tensorflow
Now that we have tensorflow installed, let’s check it with a simple tutorial.
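import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]
model = tf.keras.models.Sequential([  # layers assumed from the TensorFlow beginner tutorial
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)  # produces the final 10000-sample line in the output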
Let’s check the output of running the above code:
(venv) duliqiang@hp-z420-workstation:~/ml$ python tf_minist.py
2018-11-03 14:39:13.246295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.721
totalMemory: 10.92GiB freeMemory: 10.17GiB
2018-11-03 14:39:13.246620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2018-11-03 14:39:17.931728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-03 14:39:17.931773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2018-11-03 14:39:17.931782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2018-11-03 14:39:17.932021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9827 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
60000/60000 [==============================] - 14s 237us/step - loss: 0.2017 - acc: 0.9402
60000/60000 [==============================] - 7s 116us/step - loss: 0.0813 - acc: 0.9747
60000/60000 [==============================] - 7s 120us/step - loss: 0.0532 - acc: 0.9836
60000/60000 [==============================] - 7s 119us/step - loss: 0.0370 - acc: 0.9880
60000/60000 [==============================] - 7s 117us/step - loss: 0.0271 - acc: 0.9917
10000/10000 [==============================] - 1s 60us/step
Nice work! The gpu card is working as expected.