## Mean vs Total Squared (or Absolute) Error

A potential confusion is the following: How do we know if we should use the mean or the total squared (or absolute) error?

The total squared error is the sum of the errors at each point, given by the following equation:

$T = \sum_{i=1}^m \frac{1}{2} (y_i - \hat{y}_i)^2$,

whereas the mean squared error is the average of these errors, given by the following equation, where $m$ is the number of points:

$M = \sum_{i=1}^m \frac{1}{2m}(y_i - \hat{y}_i)^2$.

The good news is, it doesn’t really matter. As we can see, the total squared error is just a multiple of the mean squared error, since

$T = mM$.

Therefore, since differentiation is linear, the gradient of $T$ is also $m$ times the gradient of $M$.

However, the gradient descent step consists of subtracting the gradient of the error times the learning rate $\alpha$. Therefore, choosing between the mean squared error and the total squared error really just amounts to picking a different learning rate.

In real life, we’ll have algorithms that help us determine a good learning rate. Therefore, whether we use the mean error or the total error, the algorithm will simply end up picking a different learning rate.
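The equivalence is easy to verify numerically. Below is a minimal sketch using a one-parameter model $\hat{y} = wx$; the data values and the variable names are made up purely for illustration:

```python
import numpy as np

# Illustrative data and a single-parameter linear model y_hat = w * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
m = len(x)
w = 0.5
alpha = 0.01

y_hat = w * x

# Gradient of the total squared error  sum_i 1/2 (y_i - y_hat_i)^2  w.r.t. w:
grad_total = np.sum((y_hat - y) * x)
# Gradient of the mean squared error  sum_i 1/(2m) (y_i - y_hat_i)^2  w.r.t. w:
grad_mean = np.mean((y_hat - y) * x)

# The total-error gradient is exactly m times the mean-error gradient.
assert np.isclose(grad_total, m * grad_mean)

# So a step with (mean error, learning rate alpha) equals a step with
# (total error, learning rate alpha / m): the choice only rescales alpha.
step_mean = w - alpha * grad_mean
step_total = w - (alpha / m) * grad_total
assert np.isclose(step_mean, step_total)
```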

## Build an Environment for Deep Learning

I recently bought an NVIDIA GPU, a GTX 1080 Ti, which is a very powerful card, and I’m happy to finally have my own card for deep learning. To host it, I bought a used HP Z420 workstation. Although the Z420 is second-hand, it is more than capable of driving a GTX 1080 Ti, and buying used saved me a lot of money on my limited budget. Once the card arrived, I was eager to get everything assembled and working. Now that I have a working deep learning environment, I want to share the process in the hope that it benefits others. Good luck!

My operating system is the recently released Ubuntu LTS version, Ubuntu 18.04, so the following steps were carried out in that environment.

## Install the NVIDIA driver

There are three methods to install the NVIDIA driver for your card; I will introduce them below. First, though, an important step is to disable nouveau.

1. Disable nouveau

To make your card work properly, you must first disable the open-source nouveau driver. You can do this by editing the grub config file (/boot/grub/grub.cfg): search for the line containing quiet splash and add acpi_osi=linux nomodeset to the end of that line. (Note that this file is regenerated when grub is updated; to make the change persistent, edit the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub instead and run sudo update-grub.)
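Another common approach, which many guides use instead of the kernel-parameter method above, is to blacklist the nouveau module. The file below would go in /etc/modprobe.d/blacklist-nouveau.conf, followed by sudo update-initramfs -u and a reboot (file path and steps are the standard recipe, not taken from this article):

```
blacklist nouveau
options nouveau modeset=0
```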

2. Check your card

You can check the model of your card with the following command:

```shell
ubuntu-drivers devices
```


The above command lists all devices that need drivers, along with the packages that apply to them. On my computer the output looks like this:

```
== /sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0 ==
modalias : pci:v000010DEd00001B06sv00001458sd0000376Abc03sc00i00
vendor   : NVIDIA Corporation
model    : GP102 [GeForce GTX 1080 Ti]
manual_install: True
driver   : nvidia-driver-390 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
```


From this output we can conclude that the machine has a GPU made by NVIDIA Corporation, that its model is GP102 (GeForce GTX 1080 Ti), and that the recommended driver to install is nvidia-driver-390.

3. Install the driver

To install the driver, one easy way is the command ubuntu-drivers autoinstall, which installs the drivers that are appropriate for automatic installation. A second way is to add a PPA repository and install from it; related resources are easy to find. Here I mainly describe the third method, which is the one I used: installing from the executable .run file downloaded from the official NVIDIA website. The officially provided installer gives you the most recent driver and better performance.

In the end, I had a file named NVIDIA-Linux-x86_64-410.73.run in my Downloads folder.

Step 2: install the driver

To install the driver, change to the Downloads directory and run the following commands in the terminal:

```shell
sudo telinit 3
sudo sh NVIDIA-Linux-x86_64-410.73.run
```


The first command disables the GUI and takes you to a text-mode login. All you need to do is log in, run the second command, and follow the on-screen instructions.

Note: You might need to run sudo apt install build-essential before installing the driver, because the installer requires a compiler toolchain (cc, make, etc.).

When the installation completes, reboot your machine so the driver takes effect. Then verify the installation:

- Run lspci | grep -i nvidia and make sure your card shows up in the output.
- An application named NVIDIA X Server Settings is installed; you can open it and check the settings.

4. Install CUDA and cuDNN

In this part, we install the CUDA toolkit and the cuDNN library. An important step is choosing the right CUDA version: because I want to install tensorflow-gpu, which (at the time of writing) only supports CUDA 9.0, that version is the only choice I can make.

P.S. I tried CUDA 10.0 and CUDA 9.2; neither works with tensorflow-gpu. Knowing this will save you a lot of time.

On the CUDA downloads page, choose Legacy Releases.

After you click Legacy Releases, you are taken to the CUDA toolkit archive, where you need to select CUDA Toolkit 9.0. The remaining steps are choosing the operating system, architecture, distribution, version, and installer type; in my case these are Linux, x86_64, Ubuntu, 16.04, and runfile (local). For this version, you also need to download and install four additional patches.


After all the necessary files are downloaded, install each one with sudo sh <runfile>, in this order: the main runfile first, then patch 1, patch 2, patch 3, and finally patch 4. When that is done, you have CUDA toolkit 9.0 installed.

Note: Add /usr/local/cuda-9.0/bin to your PATH and /usr/local/cuda-9.0/lib64 to your LD_LIBRARY_PATH.
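For example, you could append the following lines to your ~/.bashrc (paths assume the default CUDA 9.0 install location mentioned above):

```shell
# Add the CUDA 9.0 binaries and libraries to the search paths.
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

After sourcing the file (or opening a new terminal), nvcc and the CUDA shared libraries will be found automatically.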

To install the cuDNN library, you need an NVIDIA developer account to download it. The download is an archive file; decompress it and you get a folder named cuda. All you need to do is copy the files in this folder into the corresponding locations of the CUDA toolkit you just installed:

```shell
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```


Congrats! At this point you have CUDA toolkit 9.0 and the cuDNN library installed.

5. Install tensorflow-gpu

To install TensorFlow, refer to the official guide Install TensorFlow with pip. Following the guide, I created a Python virtual environment and installed tensorflow-gpu with pip.

6. Test tensorflow

Now that we have TensorFlow installed, let’s check it with a simple tutorial.

```python
import tensorflow as tf

# Load the MNIST dataset and scale pixel values to [0, 1].
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```


Let’s check the output of running the above code:

```
(venv) duliqiang@hp-z420-workstation:~/ml$ python tf_minist.py
Epoch 1/5
2018-11-03 14:39:13.246295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.721
pciBusID: 0000:05:00.0
totalMemory: 10.92GiB freeMemory: 10.17GiB
2018-11-03 14:39:13.246620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2018-11-03 14:39:17.931728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-03 14:39:17.931773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]      0
2018-11-03 14:39:17.931782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0:   N
2018-11-03 14:39:17.932021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9827 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
60000/60000 [==============================] - 14s 237us/step - loss: 0.2017 - acc: 0.9402
Epoch 2/5
60000/60000 [==============================] - 7s 116us/step - loss: 0.0813 - acc: 0.9747
Epoch 3/5
60000/60000 [==============================] - 7s 120us/step - loss: 0.0532 - acc: 0.9836
Epoch 4/5
60000/60000 [==============================] - 7s 119us/step - loss: 0.0370 - acc: 0.9880
Epoch 5/5
60000/60000 [==============================] - 7s 117us/step - loss: 0.0271 - acc: 0.9917
10000/10000 [==============================] - 1s 60us/step
```

Nice work! The GPU card is working as expected.

## Introduction to RNNs

## What are RNNs?
The idea behind RNNs is to make use of sequential information. In a traditional neural network, we assume that all inputs (and outputs) are independent of each other. But for many tasks that is a very bad idea. For example, if you want to predict the next word in a sentence, you had better know which words came before it. RNNs are called *recurrent* because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been calculated so far. In theory, RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps. (Why?) A typical RNN looks like this:

The figure above shows an RNN being unrolled (or unfolded) into a full network. By unrolling we simply mean that we write out the network for the complete sequence. For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer network, one layer for each word.

- $x_t$ is the input at time step $t$. For example, $x_1$ could be a one-hot vector corresponding to the second word of a sentence.
- $s_t$ is the hidden state at time step $t$. It is the "memory" of the network, calculated from the previous hidden state and the input at the current step: $s_t = f(Ux_t + Ws_{t-1})$. The function $f$ is usually a nonlinearity such as tanh or ReLU. $s_{-1}$, which is needed to calculate the first hidden state, is typically initialized to all zeros.
- $o_t$ is the output at step $t$. For example, if we wanted to predict the next word in a sentence, it would be a vector of probabilities, one for each word in the vocabulary: $o_t = \mathrm{softmax}(Vs_t)$.

A few things to note:

- You can think of the hidden state $s_t$ as the memory of the network. $s_t$ captures information about what happened in all the previous steps, and the output $o_t$ is calculated based solely on the memory at time $t$. In practice things are a bit more complicated, because $s_t$ cannot capture information from too many steps back.
- Unlike a traditional deep neural network, which uses different parameters at each layer, an RNN shares the same parameters ($U$, $V$, $W$ above) across all steps. This reflects the fact that the RNN performs the same task at each step, just with different inputs, and it greatly reduces the number of parameters to learn.
- The diagram above has an output at every step, but depending on the task some of these outputs may be unnecessary. For example, when predicting whether a sentence expresses positive or negative sentiment, we only care about the final output, not the sentiment after each word. Similarly, inputs at every step are not always required. The main feature of an RNN is its hidden state.

## What can RNNs do?

RNNs have shown great success in many NLP tasks. The most commonly used type of RNN is the LSTM, which is much better at capturing long-term dependencies than the vanilla RNN. Here are some example applications of RNNs in NLP.

### Language modeling and generating text

Given a sentence of words, we want to predict the probability of each word given the previous words. Language models let us measure how likely a sentence is, which is an important input for machine translation (since high-probability sentences are typically correct). A side effect of being able to predict the next word is that we get a generative model: by sampling from the output probabilities, the model can generate new text. And depending on our training data, we can generate all kinds of things. In language modeling, the input is typically a sequence of words (encoded as one-hot vectors, for example), and the output is the sequence of predicted words. When training the network, we set $o_t = x_{t+1}$, since we want the output at step $t$ to be the actual next word.

Some research papers on language modeling and text generation:

### Machine translation

Machine translation is similar to language modeling in that the input is a sequence of words in the source language (e.g. German). We want to output a sequence of words in the target language (e.g. English). A key difference is that the output only starts after the complete input has been processed, because the first word of the translated sentence may require information captured from the entire input sequence.

Some research papers on machine translation:

### Speech recognition

Given an input sequence of acoustic signals, predict a sequence of phonetic segments together with their probabilities.

Note: the probabilities above refer to the probability of a given phonetic segment.

Research papers on speech recognition:

### Generating image descriptions

Together with convolutional neural networks (convNets), RNNs have been used as part of models that generate descriptions for unlabeled images. The combined model is even able to align the generated words with features found in the images.

## Training RNNs

Training an RNN is similar to training a traditional neural network: we also use the backpropagation algorithm, but with a twist. Because the parameters are shared across all time steps in an RNN, the gradient at each step depends not only on the current time step but also on the previous ones. For example, to calculate the gradient at $t = 4$ we need to take the gradients of the previous 3 time steps into account and sum them up. This is called Backpropagation Through Time (BPTT). When training a vanilla RNN with BPTT, it is difficult to learn long-term dependencies (e.g. dependencies between steps that are far apart) because of what is called the vanishing/exploding gradient problem. Mechanisms exist to deal with these problems, and certain types of RNNs (such as LSTMs) were specifically designed to get around them.

## RNN extensions

Over the years, researchers have developed more sophisticated types of RNNs to deal with the shortcomings of the vanilla RNN model. Below is a brief overview.

Bidirectional RNNs are based on the idea that the output at time $t$ may depend not only on the previous elements in the sequence, but also on future elements. For example, to predict a missing word in a sentence, you would want to look at both the left and the right context.
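The vanilla RNN forward pass described earlier ($s_t = f(Ux_t + Ws_{t-1})$, $o_t = \mathrm{softmax}(Vs_t)$) can be sketched in a few lines of NumPy. This is an illustrative sketch, not production code: the sizes, the random initialization, and the helper names (`rnn_forward`, `softmax`) are my own choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 8, 4
U = rng.normal(0, 0.1, (hidden_size, vocab_size))   # input -> hidden
W = rng.normal(0, 0.1, (hidden_size, hidden_size))  # hidden -> hidden
V = rng.normal(0, 0.1, (vocab_size, hidden_size))   # hidden -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(word_indices):
    """Run the RNN over a sequence of word indices.

    Returns one output distribution over the vocabulary per time step.
    """
    s = np.zeros(hidden_size)  # s_{-1}, initialized to zeros
    outputs = []
    for t in word_indices:
        x = np.zeros(vocab_size)
        x[t] = 1.0                        # one-hot input x_t
        s = np.tanh(U @ x + W @ s)        # s_t = f(U x_t + W s_{t-1})
        outputs.append(softmax(V @ s))    # o_t = softmax(V s_t)
    return outputs

outs = rnn_forward([1, 3, 2])
```

Note that the same `U`, `W`, `V` are reused at every step, which is exactly the parameter sharing discussed above, and the hidden state `s` is the only thing carrying information forward.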

Deep (Bidirectional) RNNs

LSTM networks

LSTMs do not have a fundamentally different architecture from vanilla RNNs, but they use a different function to compute the hidden state. The memory in an LSTM is organized into cells, which you can think of as black boxes that take the previous hidden state $h_{t-1}$ and the current input $x_t$ as input. Internally, these cells decide what to keep in (and what to erase from) memory, and then combine the previous state, the current memory, and the input. These types of units turn out to be very efficient at capturing long-term dependencies. If you are interested in learning more, this blog post gives an excellent explanation.
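For reference, the gating described above can be written out explicitly. These are the standard textbook LSTM equations (not taken from this article); $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate: what to erase)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate: what to keep)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(new hidden state)}
\end{aligned}
$$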