Mean vs Total Squared (or Absolute) Error

A potential confusion is the following: How do we know if we should use the mean or the total squared (or absolute) error?

The total squared error is the sum of the errors at each point, given by the following equation:

T = \sum_{i=1}^m \frac{1}{2} (y_i - \hat{y}_i)^2,

whereas the mean squared error is the average of these errors, given by the following equation, where m is the number of points:

M = \sum_{i=1}^m \frac{1}{2m}(y_i - \hat{y}_i)^2.

The good news is, it doesn’t really matter. As we can see, the total squared error is just a multiple of the mean squared error, since

T = mM.

Therefore, since differentiation is a linear operation, the gradient of T is also m times the gradient of M.

However, the gradient descent step consists of subtracting the gradient of the error times the learning rate \alpha. Therefore, choosing between the mean squared error and the total squared error really just amounts to picking a different learning rate.

In real life, we’ll have algorithms that help us determine a good learning rate to work with. Therefore, whether we use the mean error or the total error, the algorithm will simply end up picking a different learning rate.
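As a quick numerical check that the total squared error is m times the mean squared error, and that the same factor relates their gradients, here is a small sketch (the labels and predictions are toy numbers chosen arbitrarily):

```python
import numpy as np

# Toy labels and predictions (numbers chosen arbitrarily for illustration)
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.7])
m = len(y)

total_error = 0.5 * np.sum((y - y_hat) ** 2)
mean_error = np.sum((y - y_hat) ** 2) / (2 * m)

# The total is m times the mean, and the same holds for their gradients
assert np.isclose(total_error, m * mean_error)
grad_total = -(y - y_hat)       # d(total)/d(y_hat)
grad_mean = grad_total / m      # d(mean)/d(y_hat)
assert np.allclose(grad_total, m * grad_mean)
```

Scaling the gradient by a constant factor is exactly what changing the learning rate does, which is why the two error definitions are interchangeable in practice.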


Build an Environment for Deep Learning

I recently bought an NVIDIA GPU, a GTX 1080 Ti, which is a very powerful card, and I’m very happy to have my own card for deep learning. To host it I bought a used HP Z420 workstation; although the Z420 is second-hand, it is more than powerful enough to drive the GTX 1080 Ti, and buying used also saved me a lot of money on my limited budget. From the moment the card arrived, I was eager to put everything together and get it working. Now that I have a working environment for deep learning, I want to share the process here in the hope that it benefits others. Good luck!

My operating system is the recently released Ubuntu LTS version, Ubuntu 18.04, so the following steps were carried out in that setting.

Install the NVIDIA driver

There are three methods to install the NVIDIA driver for your card, which I will introduce below. But first, an important task is to disable nouveau.

1. Disable nouveau

To make your card work properly, one important thing to do is disable the open-source nouveau driver. You can do this by editing the GRUB config file (/boot/grub/grub.cfg): search for the line containing quiet splash and append acpi_osi=linux nomodeset to the end of that line.
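For illustration, the edited line might look like the following; the kernel path and root device below are placeholders, and your line will differ:

```
# Before (example only — your kernel version and root= entry will differ):
linux /boot/vmlinuz-4.15.0-generic root=UUID=... ro quiet splash
# After:
linux /boot/vmlinuz-4.15.0-generic root=UUID=... ro quiet splash acpi_osi=linux nomodeset
```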

2. Learn about your card

You can check the type of your card with the following command:

 ubuntu-drivers devices

The above command lists all devices that need drivers and the packages that apply to them. On my computer the output is as follows:

== /sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0 ==
modalias : pci:v000010DEd00001B06sv00001458sd0000376Abc03sc00i00
vendor   : NVIDIA Corporation
model    : GP102 [GeForce GTX 1080 Ti]
manual_install: True
driver   : nvidia-driver-390 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin

From this output we can conclude that the machine has a GPU manufactured by NVIDIA Corporation, that its model is GP102 (GeForce GTX 1080 Ti), and that the driver nvidia-driver-390 is recommended for installation.

3. Install the driver

To install the driver, the easiest way is the command ubuntu-drivers autoinstall, which installs the drivers that are appropriate for automatic installation. A second way is to add a PPA repository and install from it; you can find related resources online. Here I mainly describe the third method, which is the one I used: installing from the executable run file downloaded from the official NVIDIA website. Running the officially provided installer lets you install the most recent driver and get better performance.

Step 1: download the installer file

You can visit the drivers site to download the driver installer file that matches your card, OS, and language preference. In my case, I downloaded the most recent version matching the options shown in the following picture.


Finally, I got the installer file in my Downloads folder.

Step 2: install the driver

To install the driver, change to the Downloads directory and run the following commands in the terminal:

sudo telinit 3
sudo sh

The first command disables the GUI and takes you to a terminal login. All you need to do is log in, run the second command (passing the downloaded installer file as its argument), and follow the instructions.

Note: You might need to run sudo apt install build-essential before running the driver installer, because the installer requires a compiler toolchain (cc, make, etc.).

When the installation completes, reboot your machine to make the driver work. Check the installation with the following commands:

  • Run the command lspci | grep -i nvidia and make sure the output looks correct.
  • An application named NVIDIA X Server Settings is installed; you can open it and check the settings.

4. Install cuda and cudnn

In this part, we install the CUDA toolkit and the cuDNN library. An important step is choosing the right CUDA version: I wanted to install tensorflow-gpu, which only supports CUDA 9.0, so that version was the only choice I could take.

P.S.: I tried CUDA 10.0 and CUDA 9.2; neither works with tensorflow-gpu. Knowing this will save you a lot of time.

With the version determined, go to the CUDA toolkit downloads site. That page offers the most recent CUDA Toolkit 10.0 download; you need to go to the legacy releases to download older versions.

Let’s check the pages to ease your downloads:

Choose Legacy Releases


After you click Legacy Releases, you are taken to the CUDA toolkit archive site, where you need to select CUDA Toolkit 9.0. The remaining steps are choosing the operating system, architecture, distribution, version, and installer type; in my case these were Linux, x86_64, Ubuntu, 16.04, and runfile (local). This version also requires four additional patches, so download and install them as well.

As before, check the following picture to ease your choices:

Download CUDA Toolkit 9.0

After all the necessary files are downloaded, install each one with sudo sh runfile. The order matters: the main runfile first, then patch 1, patch 2, patch 3, and finally patch 4. When all are done, you have CUDA Toolkit 9.0 installed.

Note: Add /usr/local/cuda-9.0/bin to your PATH and /usr/local/cuda-9.0/lib64 to your LD_LIBRARY_PATH.
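For example, you can append the following lines to your ~/.bashrc and then run source ~/.bashrc (the paths assume the default CUDA 9.0 install location):

```shell
# Make the CUDA 9.0 compiler and libraries visible to the shell and loader
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```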

To download the cuDNN library, you need an NVIDIA developer account. When the download completes, you get an archive file; decompress it and you get a folder named cuda. All you need to do is copy the files in this folder into the corresponding locations of the CUDA toolkit you just installed; see the following for details:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Congrats! At this point you have CUDA Toolkit 9.0 and the cuDNN library installed.

5. Install tensorflow-gpu

To install TensorFlow, refer to Install TensorFlow with pip. Following the guide, I created a Python virtual environment and installed tensorflow-gpu with pip.

6. Test tensorflow

Now that we have TensorFlow installed, let’s check it with a simple tutorial.

import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Let’s check the output of running the above code:

(venv) duliqiang@hp-z420-workstation:~/ml$ python 
Epoch 1/5
2018-11-03 14:39:13.246295: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.721
pciBusID: 0000:05:00.0
totalMemory: 10.92GiB freeMemory: 10.17GiB
2018-11-03 14:39:13.246620: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2018-11-03 14:39:17.931728: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-03 14:39:17.931773: I tensorflow/core/common_runtime/gpu/]      0 
2018-11-03 14:39:17.931782: I tensorflow/core/common_runtime/gpu/] 0:   N 
2018-11-03 14:39:17.932021: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9827 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
60000/60000 [==============================] - 14s 237us/step - loss: 0.2017 - acc: 0.9402
Epoch 2/5
60000/60000 [==============================] - 7s 116us/step - loss: 0.0813 - acc: 0.9747
Epoch 3/5
60000/60000 [==============================] - 7s 120us/step - loss: 0.0532 - acc: 0.9836
Epoch 4/5
60000/60000 [==============================] - 7s 119us/step - loss: 0.0370 - acc: 0.9880
Epoch 5/5
60000/60000 [==============================] - 7s 117us/step - loss: 0.0271 - acc: 0.9917
10000/10000 [==============================] - 1s 60us/step

Nice work! The GPU card is working as expected.



The idea behind RNNs is to make use of sequential information. In a traditional neural network, we assume that all inputs (and outputs) are independent of each other, but for many tasks that is a very bad idea. For example, if you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a "memory" that captures information about what has been computed so far. In theory, RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps. (why?) A typical RNN looks like this:

A recurrent neural network and the unfolding in time of the computation involved in its forward computation.

The diagram above shows an RNN being unrolled (or unfolded) into a full network. By unrolling, I simply mean that we write out the network for the complete sequence. For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer network, one layer per word.

  • x_t is the input at time step t. For example, x_1 could be a one-hot vector corresponding to the second word of a sentence.
  • s_t is the hidden state at time step t. It is the "memory" of the network, computed from the previous hidden state and the input at the current step: s_t = f(Ux_t + Ws_{t-1}). The function f is usually a nonlinearity such as tanh or ReLU. s_{-1}, which is needed to compute the first hidden state, is typically initialized to zero.
  • o_t is the output at time step t. For example, if we wanted to predict the next word in a sentence, it would be a vector of probabilities, one for each word in the vocabulary: o_t = softmax(Vs_t).


  • You can think of the hidden state s_t as the memory of the network: s_t captures information about what happened in all the previous time steps. The output o_t is computed solely from the memory at time t. In practice things are a bit more complicated, because s_t cannot capture information from too many steps back.
  • Unlike a traditional deep neural network, which uses different parameters at each layer, an RNN shares the same parameters (U, V, W above) across all steps. This reflects the fact that the RNN performs the same task at every step, just with different inputs, and it greatly reduces the number of parameters to learn.
  • The diagram above has an output at every step, but depending on the task some of these outputs may be unnecessary. For example, when predicting whether a sentence expresses positive or negative sentiment, we only care about the final output, not the sentiment after each word. Similarly, inputs are not necessarily needed at every step. The main feature of an RNN is its hidden state.
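To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass. The names U, W, and V match the equations above; the dimensions and random weights are toy values chosen purely for illustration:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Toy dimensions and random weights (illustration only)
vocab_size, hidden_size = 8, 4
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
V = rng.normal(scale=0.1, size=(vocab_size, hidden_size))

# A sequence of 3 one-hot word vectors
xs = np.eye(vocab_size)[[2, 5, 1]]

s = np.zeros(hidden_size)            # initial hidden state, all zeros
outputs = []
for x_t in xs:                       # the same U, W, V are reused at every step
    s = np.tanh(U @ x_t + W @ s)     # s_t = f(U x_t + W s_{t-1})
    outputs.append(softmax(V @ s))   # o_t = softmax(V s_t)
```

Each element of outputs is a probability distribution over the vocabulary, and the loop body makes the parameter sharing across time steps explicit.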




Given a sentence of words, a language model predicts the probability of each word given the words that precede it. Language modeling lets us measure how likely a sentence is, which is an important input for machine translation (since high-probability sentences are typically correct). A side effect of being able to predict the next word is that we obtain a generative model: by sampling from the output probabilities, the model can produce new text, and depending on our training data we can generate all kinds of things. In language modeling, the input is usually a sequence of words (for example, encoded as one-hot vectors) and the output is the sequence of predicted words. When training the network, we set o_t = x_{t+1}, because we want the output at step t to be the actual next word.
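The shift o_t = x_{t+1} amounts to pairing each input word with the word that follows it; a minimal sketch, using an arbitrary toy sentence:

```python
# For language modeling, the training target at step t is the input at step t+1.
sentence = ["the", "cat", "sat", "down"]
inputs = sentence[:-1]    # x_0 .. x_{T-1}
targets = sentence[1:]    # o_t should predict x_{t+1}

pairs = list(zip(inputs, targets))
# Each pair is (current word, word the network should predict next)
```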




RNN for Machine Translation








Deep Visual-Semantic Alignments for Generating Image Descriptions


Training an RNN is similar to training a traditional neural network. We again use the backpropagation algorithm, but with a small twist. Because the parameters are shared across all time steps, the gradient depends not only on the current time step but also on the previous ones. For example, to compute the gradient at t = 4, we need to take the gradients of the previous 3 time steps into account and sum them up. This is called Backpropagation Through Time (BPTT). When training a vanilla RNN with BPTT, it is difficult to learn long-term dependencies (that is, dependencies between steps that are far apart), due to what is called the vanishing/exploding gradient problem. Mechanisms exist to cope with these problems, and certain types of RNNs (such as LSTMs) were specifically designed to address them.
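The summing of gradients over time steps can be sketched with a hypothetical scalar "RNN" s_t = w·s_{t-1} + x_t and loss L = s_T: because w is shared, dL/dw collects one contribution per step, and the chain-rule factor w^(T-t) also hints at why gradients vanish (|w| < 1) or explode (|w| > 1) over long sequences.

```python
# Hypothetical scalar "RNN": s_t = w * s_{t-1} + x_t, loss L = s_T.
# dL/dw = sum_t (ds_T/ds_t) * s_{t-1}, where ds_T/ds_t = w**(T - t).
w = 0.5
xs = [1.0, 2.0, 3.0]

# Forward pass, remembering every state (states[0] is the initial state, 0)
states = [0.0]
for x in xs:
    states.append(w * states[-1] + x)

# BPTT: accumulate one gradient contribution per time step
T = len(xs)
grad = sum(w ** (T - t) * states[t - 1] for t in range(1, T + 1))

# Sanity check against a numerical derivative of the loss
def loss(w_):
    s = 0.0
    for x in xs:
        s = w_ * s + x
    return s

eps = 1e-6
num_grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)
assert abs(grad - num_grad) < 1e-4
```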



Bidirectional RNNs are based on the idea that the output at time t may depend not only on the previous elements in the sequence but also on future elements. For example, to predict a missing word in a sentence, you would look at both the left and the right context.

Bidirectional RNN

Deep (Bidirectional) RNNs

Deep Bidirectional RNN

LSTM networks