[Hung-yi Lee ML Notes] 10 Brief introduction of deep learning

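Note:

1 A multi-layer perceptron is essentially the same thing as a DNN.

2 In 2006, a neural network whose initial parameters were found with an RBM was called Deep Learning; without that step it was just the multi-layer perceptron of the 1980s. RBM initialization relied on a number of fairly difficult methods, yet it turned out to help very little.

3 Large deep learning models typically took weeks to train; GPUs later made training much faster.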

Note:

1 The set of functions in Step 1 is the neural network.

Note:

1 How do we connect the different neurons?

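Note:

1 Arrange the neurons in two rows. Each neuron has its own weights and bias, which are learned from the training data.

2 Given the input (1, -1), each layer of neurons applies its weights and bias and then the sigmoid function to produce new values. This transformation repeats layer by layer until the final result.

3 If all parameters of the neural network (every weight and bias) are known, the network is a function: its input is a vector and its output is another vector.

Giving the same network structure different parameters (weights and biases) yields different functions, so a structure defines a function set.

By comparison, the function set of a neural network is much larger than that of a typical machine learning algorithm.

So deciding the structure of a neural network decides a function set.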
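
A minimal sketch of the points above, in plain Python. The weights and biases are made-up illustrative values, not numbers taken from these notes, and `sigmoid` and `layer` are hypothetical helper names. Fixing the parameters gives one function from a vector to a vector; changing them gives a different function from the same structure.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, biases, inputs):
    """One fully connected layer: each row of `weights` belongs to one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Illustrative parameters (assumed): two neurons with two inputs each.
params_a = {"weights": [[1.0, -2.0], [-1.0, 1.0]], "biases": [1.0, 0.0]}
# Same structure, different parameters: another function from the same set.
params_b = {"weights": [[0.5, 0.5], [2.0, -1.0]], "biases": [0.0, -1.0]}

x = [1.0, -1.0]
out_a = layer(params_a["weights"], params_a["biases"], x)
out_b = layer(params_b["weights"], params_b["biases"], x)
print([round(v, 2) for v in out_a])
print([round(v, 2) for v in out_b])
```

The two printed vectors differ even though the input and the structure are identical, which is what "a structure defines a function set" means in practice.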

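Note:

1 There are many rows (layers) of neurons, and each row may contain many neurons. Each circle represents one neuron.

2 The output of every neuron in the first row is fed as input to every neuron in the second row, so the input of layer 2 is the output of layer 1. Because the neurons of two adjacent layers are connected pairwise, this is called a fully connected network.

3 Because the signal passes from layer 1 to layer 2 to layer 3, always moving forward through the layers, it is called a feedforward network.

4 The network as a whole needs an input. Each neuron in layer 1 receives every dimension of the input layer.

The last layer, layer L, directly outputs the values yi and is called the output layer.

How many hidden layers does it take to count as deep learning? AlexNet has 8 layers, VGG has 19, and GoogLeNet has 22; others are as follows: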

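How is a layer of neurons computed? As the figure shows, use matrix operations: on the left is the weight matrix, where each row holds the weights of one neuron; on the right is the input vector; the bias is added at the end. The result then passes through the sigmoid activation function, and every layer of neurons is processed in turn in the same way.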

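In the figure, let W^i denote the weight matrix of layer i (each row holds the weights of one neuron) and b^i the bias vector of layer i.

The input vector x1, x2, ..., xN is written x, the output vector y1, y2, ..., yM is written y, and each layer uses an activation function such as sigmoid.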
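
Writing W^i and b^i for the weight matrix and bias vector of layer i, the whole network can be written as one nested expression. This is a sketch that assumes the sigmoid $\sigma$ is applied element-wise at every one of the $L$ layers, including the last; $a^i$ is a name assumed here for the output of layer $i$.

```latex
a^1 = \sigma(W^1 x + b^1), \qquad
a^i = \sigma(W^i a^{i-1} + b^i) \quad (i = 2, \dots, L), \qquad
y = a^L

y = f(x) = \sigma\bigl(W^L \cdots \sigma\bigl(W^2\, \sigma(W^1 x + b^1) + b^2\bigr) \cdots + b^L\bigr)
```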

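Note:

1 What a neural network does is a chain of operations: multiply a vector by a matrix, add a bias vector, and repeat.

2 Written in matrix form, the computation can be parallelized on a GPU and accelerated. Whenever a matrix operation is needed, it is handed to the GPU, which completes it faster than a CPU.

In practice, a network can be viewed as:

1 The hidden layers replace manual feature engineering with an automatic feature extractor that distills the input into a new set of features.

2 The output layer can be viewed as a multi-class classifier.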

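Example (see the figure):

1 Input vector x: in the image of the digit 2, every pixel with ink is 1 and every pixel without ink is 0, giving 256 dimensions in total.

2 If the output is 10-dimensional, each dimension represents the probability of one digit (1, 2, 3, ..., 0), i.e. the confidence of a digit.

3 We need a function that maps the 256-dimensional input to a 10-dimensional vector of digit probabilities. So we design the network structure (this is crucial), which defines a function set, and then search that set for the best candidate function for handwritten digit recognition.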
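
A rough sketch of the shapes involved: one function from a 256-dimensional input to 10 confidence values. The hidden layer size (16), the random untrained weights, the 16 x 16 image layout, and the softmax at the output are all assumptions made here for illustration; the notes only fix the input and output dimensions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def affine(weights, biases, inputs):
    """Weight matrix times input vector, plus bias vector."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def random_layer(rng, n_out, n_in):
    """Untrained placeholder parameters: n_out neurons with n_in inputs each."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
w1, b1 = random_layer(rng, 16, 256)  # hidden layer; size 16 is an arbitrary choice
w2, b2 = random_layer(rng, 10, 16)   # output layer; one neuron per digit

# A fake binary "image": 1 where there is ink, 0 elsewhere (assumed 16 x 16).
image = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(256)]

hidden = [sigmoid(z) for z in affine(w1, b1, image)]
confidence = softmax(affine(w2, b2, hidden))
predicted = max(range(10), key=lambda i: confidence[i])  # index of the largest confidence
print(len(image), len(confidence), round(sum(confidence), 6), predicted)
```

Because the weights are random, the prediction is meaningless; training is what picks the useful function out of this function set.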

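FAQ (see the figure):

With deep learning there is no need for feature selection or manual transforms; you feed everything in. But a new problem has to be solved: how many layers, and how many neurons in each layer? The answer comes from experience, intuition, and trial and error.

For speech recognition and image recognition, designing a network structure is easier than feature engineering. For a human, hand-crafting a good set of features for classification or regression is the harder task, so it is better to design a network structure and let it extract a good set of features automatically.

For NLP, deep learning (alone) may not beat other algorithms, because humans are strong at processing text: it is controllable, and rules can be specified from prior experience, whereas speech is abstract.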

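Can the network structure be learned automatically? Gradient descent can still be used; the search just becomes more complicated, and otherwise it is much the same as in other, more general machine learning methods.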

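Training the network with the gradient (see the figure): [I do not fully understand this part yet; review it a couple more times!]

Set initial values (0.2, -0.1, 0.3, etc.) and compute, for every parameter w and b, the partial derivative of the total loss with respect to that parameter. Collected together, these partial derivatives are called the gradient. Then update the parameters by subtracting the learning rate times the gradient from each one. Keep repeating this update.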
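
The update rule can be sketched on a toy problem. Everything here is assumed for illustration: a single linear neuron y = w*x + b, a squared-error total loss over three made-up data points, and a learning rate of 0.1. The partial derivatives are written out by hand.

```python
# Toy data generated by y = 2x + 1 (made up for illustration).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

def total_loss(w, b):
    """Sum of squared errors over all data points."""
    return sum((w * x + b - y) ** 2 for x, y in data)

def gradient(w, b):
    """Partial derivatives of the total loss with respect to w and b."""
    dw = sum(2 * (w * x + b - y) * x for x, y in data)
    db = sum(2 * (w * x + b - y) for x, y in data)
    return dw, db

w, b = 0.2, -0.1  # initial values
learning_rate = 0.1
initial_loss = total_loss(w, b)

for step in range(500):
    dw, db = gradient(w, b)
    # Subtract learning rate times gradient from each parameter.
    w -= learning_rate * dw
    b -= learning_rate * db

final_loss = total_loss(w, b)
print(round(w, 3), round(b, 3), final_loss < initial_loss)
```

The learning rate matters: on this toy loss a value much larger than 0.1 makes the updates overshoot and diverge instead of settling.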

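Working out the gradient by hand for gradient descent in a neural network is hard; a toolkit can do it for you.

Backpropagation (BP) is an efficient way to compute the partial derivatives of the total loss with respect to the parameters.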
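
To make the idea concrete, here is a small sketch, not the lecture's derivation. It assumes a tiny network (one input, one hidden sigmoid neuron, one sigmoid output neuron) with a squared-error loss on a single example, and made-up parameter values. `backprop` applies the chain rule from the output back toward the input, and a finite-difference estimate serves as a check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(params, x, target):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)   # hidden neuron
    y = sigmoid(w2 * h + b2)   # output neuron
    return (y - target) ** 2

def backprop(params, x, target):
    """Gradient of the loss w.r.t. (w1, b1, w2, b2) via the chain rule."""
    w1, b1, w2, b2 = params
    # Forward pass: keep the intermediate values.
    h = sigmoid(w1 * x + b1)
    y = sigmoid(w2 * h + b2)
    # Backward pass: propagate the derivative from the output toward the input.
    dz2 = 2 * (y - target) * y * (1 - y)   # dLoss/dz2, where z2 = w2*h + b2
    dz1 = dz2 * w2 * h * (1 - h)           # dLoss/dz1, where z1 = w1*x + b1
    return [dz1 * x, dz1, dz2 * h, dz2]

def numerical_gradient(params, x, target, eps=1e-6):
    """Central finite differences: two extra forward passes per parameter."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads

params = [0.2, -0.1, 0.3, 0.5]  # made-up initial values
bp = backprop(params, x=1.0, target=1.0)
num = numerical_gradient(params, x=1.0, target=1.0)
max_diff = max(abs(a - b) for a, b in zip(bp, num))
print(max_diff < 1e-6)
```

The finite-difference check needs two forward passes per parameter, while backpropagation gets every partial derivative from one forward and one backward pass, which is why it scales to networks with many parameters.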

Question:

More parameters mean a larger function set, hence a smaller bias, hence better performance. So what is so special about going deep?

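One claim goes: any continuous function can be realized by a network with a single hidden layer (given enough hidden neurons), since such a network can represent any function and so achieve good results. Does going deep still matter, then?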
