
Derivation of the Backpropagation Algorithm

    MoModel · 2019-01-19 16:17:45 +08:00

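    Backpropagation (BP), short for "backward propagation of errors", is a common method for training artificial neural networks, used together with an optimization method such as gradient descent. It computes the gradient of the loss function with respect to every weight in the network; the optimizer then uses this gradient to update the weights so as to minimize the loss.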

    Suppose we have the following network:

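    The first layer is the input layer, containing two neurons $i_1$, $i_2$ and a bias (intercept) term $b_1$. The second layer is the hidden layer, containing two neurons $h_1$, $h_2$ and a bias term $b_2$. The third layer is the output layer, with neurons $o_1$, $o_2$. The $w_i$ labeled on each connection are the weights between layers, and the activation function is the sigmoid function throughout.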

    Now assign them initial values, as shown in the figure below:

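    Input data: $i_1 = 0.05$, $i_2 = 0.10$;

    target outputs: $target_{o1} = 0.01$, $target_{o2} = 0.99$;

    initial weights:
    $w_1 = 0.15$, $w_2 = 0.20$, $w_3 = 0.25$, $w_4 = 0.30$; $w_5 = 0.40$, $w_6 = 0.45$, $w_7 = 0.50$, $w_8 = 0.55$;

    biases: $b_1 = 0.35$, $b_2 = 0.60$.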

    Goal: given the inputs $i_1$, $i_2$ (0.05 and 0.10), make the network's outputs as close as possible to the targets for $o_1$, $o_2$ (0.01 and 0.99).

    Forward pass

    1. Input layer ----> hidden layer:

    Compute the weighted input sum of neuron $h_1$:

    $$net_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 \cdot 1$$

    $$net_{h1} = 0.15 \cdot 0.05 + 0.20 \cdot 0.10 + 0.35 \cdot 1 = 0.3775$$
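
A minimal Python sketch of this weighted sum using the example's values. It also computes $net_{h2}$, assuming that $w_3$ and $w_4$ connect $i_1$ and $i_2$ to $h_2$ and that $h_2$ uses the same bias $b_1$. The text does not state this wiring; it is chosen because it reproduces the text's $out_{h2}$.

```python
# Example values: inputs, input->hidden weights, and the hidden-layer bias b1.
# Assumption: w1, w2 feed h1 (as in the text) and w3, w4 feed h2.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
b1 = 0.35

# Weighted input sums; the bias multiplies a constant input of 1
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h2 = w3 * i1 + w4 * i2 + b1 * 1
print(round(net_h1, 4), round(net_h2, 4))  # 0.3775 0.3925
```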
    

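    Compute the output $out_{h1}$ of neuron $h_1$ (this is where the sigmoid activation is applied):

    $$out_{h1} = \frac{1}{1+e^{-net_{h1}}} = \frac{1}{1+e^{-0.3775}} = 0.5933$$

    Similarly, compute the output $out_{h2}$ of neuron $h_2$:

    $$out_{h2} = 0.5969$$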
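
The sigmoid step can be checked in a few lines of Python. This sketch recomputes both weighted sums and prints the outputs to six decimals. $net_{h2}$ again assumes that $w_3$ and $w_4$ feed $h_2$. The four-decimal values in the text agree with these to within 0.0001.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), the activation used throughout."""
    return 1.0 / (1.0 + math.exp(-x))

# Weighted sums of the hidden neurons (net_h2 assumes w3 = 0.25, w4 = 0.30 feed h2)
net_h1 = 0.15 * 0.05 + 0.20 * 0.10 + 0.35
net_h2 = 0.25 * 0.05 + 0.30 * 0.10 + 0.35

out_h1 = sigmoid(net_h1)
out_h2 = sigmoid(net_h2)
print(f"{out_h1:.6f} {out_h2:.6f}")  # 0.593270 0.596884
```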
    
    

    2. Hidden layer ----> output layer:

    $$net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1$$

    $$out_{o1} = \frac{1}{1+e^{-net_{o1}}} = 0.7514$$

    Similarly, compute the output of neuron $o_2$:

    $$out_{o2} = 0.7729$$
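
The same sketch extended to the output layer. Two values here are assumptions the text does not state:

- the output bias $b_2 = 0.60$;
- $w_7$ and $w_8$ connect $h_1$ and $h_2$ to $o_2$.

Both are chosen because they reproduce the text's $out_{o1} = 0.7514$.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hidden-layer outputs (net_h2 assumes w3, w4 feed h2)
out_h1 = sigmoid(0.15 * 0.05 + 0.20 * 0.10 + 0.35)
out_h2 = sigmoid(0.25 * 0.05 + 0.30 * 0.10 + 0.35)

# Assumptions: b2 = 0.60, and w7, w8 connect h1, h2 to o2
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60

net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1
net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)
print(f"{out_o1:.4f} {out_o2:.4f}")  # 0.7514 0.7729
```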
    
    

    Backward pass

    Now we can carry out the backpropagation computation.

    1. Compute the total error

    $$E_{total} = E_{o1} + E_{o2}$$

    Compute the errors of $o_1$ and $o_2$ separately:

    $$E_{o1} = \frac{1}{2}(target_{o1} - out_{o1})^2 = 0.2748$$

    $$E_{o2} = \frac{1}{2}(target_{o2} - out_{o2})^2 = 0.0236$$

    $$E_{total} = E_{o1} + E_{o2} = 0.2984$$
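
The error computation as a self-contained Python sketch. It assumes $b_2 = 0.60$ and that $w_7$, $w_8$ feed $o_2$. The $\frac{1}{2}$ factor is there so it cancels the 2 produced by differentiating the square.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

out_h1 = sigmoid(0.15 * 0.05 + 0.20 * 0.10 + 0.35)
out_h2 = sigmoid(0.25 * 0.05 + 0.30 * 0.10 + 0.35)
# Assumes b2 = 0.60 and that w7, w8 feed o2
out_o1 = sigmoid(0.40 * out_h1 + 0.45 * out_h2 + 0.60)
out_o2 = sigmoid(0.50 * out_h1 + 0.55 * out_h2 + 0.60)

target_o1, target_o2 = 0.01, 0.99
# Half squared error per output neuron, summed into the total
E_o1 = 0.5 * (target_o1 - out_o1) ** 2
E_o2 = 0.5 * (target_o2 - out_o2) ** 2
E_total = E_o1 + E_o2
print(f"{E_o1:.6f} {E_o2:.6f} {E_total:.6f}")  # 0.274811 0.023560 0.298371
```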
    

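    2. Weight updates, hidden layer ----> output layer:

    Take the weight $w_5$ as an example. To find out how much $w_5$ contributes to the total error, take the partial derivative of the total error with respect to $w_5$ using the chain rule:

    $$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$$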
    
    

    The figure below shows more intuitively how the error propagates backward.

    We now compute each factor in turn.

    Compute $\frac{\partial E_{total}}{\partial out_{o1}}$:

    $$E_{total} = \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2$$

    $$\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}) = -(0.01 - 0.7514) = 0.7414$$
    
    

    Compute $\frac{\partial out_{o1}}{\partial net_{o1}}$:

    $$out_{o1} = \frac{1}{1+e^{-net_{o1}}}$$

    $$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.7514 \cdot (1 - 0.7514) = 0.1868$$
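
The identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ is easy to sanity-check numerically. This sketch compares it with a central finite difference evaluated at $net_{o1}$. It assumes $b_2 = 0.60$ and $net_{h2} = 0.3925$ (that is, $w_3$ and $w_4$ feed $h_2$); the text states neither.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x: float) -> float:
    # Closed form: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# net_o1 from the forward pass (assumes b2 = 0.60 and net_h2 = 0.3925)
out_h1, out_h2 = sigmoid(0.3775), sigmoid(0.3925)
net_o1 = 0.40 * out_h1 + 0.45 * out_h2 + 0.60

# Central finite difference: (f(x + h) - f(x - h)) / (2h)
h = 1e-6
numeric = (sigmoid(net_o1 + h) - sigmoid(net_o1 - h)) / (2 * h)
analytic = sigmoid_prime(net_o1)
print(f"{analytic:.4f} {numeric:.4f}")  # 0.1868 0.1868
```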
     
     
    

    Compute $\frac{\partial net_{o1}}{\partial w_5}$:

    $$net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1$$

    $$\frac{\partial net_{o1}}{\partial w_5} = out_{h1} = 0.5933$$
    

    Finally, multiply the three together:

    $$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5} = 0.7414 \cdot 0.1868 \cdot 0.5933 = 0.0822$$
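
The three factors and their product as a minimal Python sketch. It assumes $b_2 = 0.60$ and $net_{h2} = 0.3925$, which the text does not state. Any last-digit differences from the text's four-decimal values are due to rounding.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

out_h1, out_h2 = sigmoid(0.3775), sigmoid(0.3925)
out_o1 = sigmoid(0.40 * out_h1 + 0.45 * out_h2 + 0.60)  # assumes b2 = 0.60
target_o1 = 0.01

dE_dout_o1 = -(target_o1 - out_o1)       # dE_total / d out_o1
dout_dnet_o1 = out_o1 * (1 - out_o1)     # d out_o1 / d net_o1 (sigmoid derivative)
dnet_dw5 = out_h1                        # d net_o1 / d w5
dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5
print(f"{dE_dout_o1:.4f} {dout_dnet_o1:.4f} {dnet_dw5:.4f} {dE_dw5:.4f}")
# 0.7414 0.1868 0.5933 0.0822
```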
    
    

    Looking at the formulas above, we find:

    $$\frac{\partial E_{total}}{\partial w_5} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1}) \cdot out_{h1}$$

    For convenience, let $\delta_{o1}$ denote the error term of the output neuron $o_1$:

    $$\delta_{o1} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} = \frac{\partial E_{total}}{\partial net_{o1}}$$

    $$\delta_{o1} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1})$$

    so that

    $$\frac{\partial E_{total}}{\partial w_5} = \delta_{o1} \cdot out_{h1}$$
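
Written with $\delta$, every hidden-to-output gradient has the same shape: the $\delta$ of the receiving output neuron times the output of the sending hidden neuron. This sketch computes all four gradients at once. Two things are assumptions: the layout of `W2` (row `[w5, w6]` feeds $o_1$, row `[w7, w8]` feeds $o_2$) and $b_2 = 0.60$.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

out_h = [sigmoid(0.3775), sigmoid(0.3925)]   # [out_h1, out_h2]
# Assumed layout: row k holds the weights into output o_(k+1): [[w5, w6], [w7, w8]]
W2 = [[0.40, 0.45], [0.50, 0.55]]
b2 = 0.60                                    # assumed output bias
targets = [0.01, 0.99]

out_o = [sigmoid(sum(w * h for w, h in zip(row, out_h)) + b2) for row in W2]

# delta_ok = -(target_ok - out_ok) * out_ok * (1 - out_ok)
delta_o = [-(t - o) * o * (1 - o) for t, o in zip(targets, out_o)]

# dE/dw for the weight from h_j into o_k is delta_ok * out_hj
grads = [[d * h for h in out_h] for d in delta_o]
print([round(d, 4) for d in delta_o])                  # [0.1385, -0.0381]
print([[round(g, 4) for g in row] for row in grads])   # [[0.0822, 0.0827], [-0.0226, -0.0227]]
```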
    
    

    Update the value of $w_5$:

    $$w_5^+ = w_5 - \eta \cdot \frac{\partial E_{total}}{\partial w_5} = 0.40 - 0.5 \cdot 0.0822 = 0.3589$$

    where $\eta$ is the learning rate, here $\eta = 0.5$.
    
    

    Similarly, update $w_6$, $w_7$, $w_8$:

    $$w_6^+ = 0.4087$$

    $$w_7^+ = 0.5113$$

    $$w_8^+ = 0.5614$$
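
A sketch of this gradient-descent step for all four hidden-to-output weights. The learning rate $\eta = 0.5$ is inferred from $w_5^+ = 0.40 - \eta \cdot 0.0822 = 0.3589$. It, $b_2 = 0.60$, and the wiring of $w_7$ and $w_8$ into $o_2$ are assumptions.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

out_h1, out_h2 = sigmoid(0.3775), sigmoid(0.3925)
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60   # b2 and w7/w8 wiring assumed
eta = 0.5                                           # learning rate (inferred)

out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
delta_o1 = -(0.01 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(0.99 - out_o2) * out_o2 * (1 - out_o2)

# Gradient-descent step: w+ = w - eta * dE/dw, with dE/dw = delta * out_h
w5_new = w5 - eta * delta_o1 * out_h1
w6_new = w6 - eta * delta_o1 * out_h2
w7_new = w7 - eta * delta_o2 * out_h1
w8_new = w8 - eta * delta_o2 * out_h2
print(f"{w5_new:.4f} {w6_new:.4f} {w7_new:.4f} {w8_new:.4f}")
# 0.3589 0.4087 0.5113 0.5614
```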
    

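    3. Weight updates, input layer ----> hidden layer:

    We can compute $w_1$, $w_2$, $w_3$, $w_4$ the same way. The method is almost identical to the one above, but one part has to change.

    When computing the partial derivative of the total error with respect to $w_5$ above, the chain was:

    $out_{o1}$ -> $net_{o1}$ -> $w_5$

    For the input-to-hidden weights, the chain is:

    $out_{h1}$ -> $net_{h1}$ -> $w_1$

    Because $out_{h1}$ feeds both $o_1$ and $o_2$, it receives error from both, and the two contributions must be summed.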

    Compute $\frac{\partial E_{total}}{\partial out_{h1}}$:

    $$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$
    

    First compute $\frac{\partial E_{o1}}{\partial out_{h1}}$:

    $$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}}$$

    $$\frac{\partial E_{o1}}{\partial net_{o1}} = \frac{\partial E_{o1}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} = 0.7414 \cdot 0.1868 = 0.1385$$

    $$net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1$$

    $$\frac{\partial net_{o1}}{\partial out_{h1}} = w_5 = 0.40$$

    $$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}} = 0.1385 \cdot 0.40 = 0.0554$$
    

    Similarly, we obtain

    $$\frac{\partial E_{o2}}{\partial out_{h1}} = -0.0190$$

    Adding the two gives the total:

    $$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}} = 0.0554 - 0.0190 = 0.0364$$
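
The two contributions as a Python sketch. $h_1$ reaches $o_1$ through $w_5$ and, by assumption, reaches $o_2$ through $w_7$, so each output's $\delta$ is weighted by the corresponding connection. $b_2 = 0.60$ is also assumed.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

out_h1, out_h2 = sigmoid(0.3775), sigmoid(0.3925)
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60  # b2 and w7/w8 wiring assumed
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

# dE_ok / d net_ok for each output neuron (the deltas)
delta_o1 = -(0.01 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(0.99 - out_o2) * out_o2 * (1 - out_o2)

# d net_o1 / d out_h1 = w5 and d net_o2 / d out_h1 = w7
dEo1_dout_h1 = delta_o1 * w5
dEo2_dout_h1 = delta_o2 * w7
dE_dout_h1 = dEo1_dout_h1 + dEo2_dout_h1
print(f"{dEo1_dout_h1:.4f} {dEo2_dout_h1:.4f} {dE_dout_h1:.4f}")
# 0.0554 -0.0190 0.0364
```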
    

    Next compute $\frac{\partial out_{h1}}{\partial net_{h1}}$:

    $$out_{h1} = \frac{1}{1+e^{-net_{h1}}}$$

    $$\frac{\partial out_{h1}}{\partial net_{h1}} = out_{h1}(1 - out_{h1}) = 0.2413$$
    

    Then compute $\frac{\partial net_{h1}}{\partial w_1}$:

    $$net_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 \cdot 1$$

    $$\frac{\partial net_{h1}}{\partial w_1} = i_1 = 0.05$$
      
    

    Finally, multiply the three together:

    $$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$$

    $$\frac{\partial E_{total}}{\partial w_1} = 0.0364 \cdot 0.2413 \cdot 0.05 = 0.000439$$
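
The hidden-layer gradient chains several factors, so a finite-difference check on the whole forward pass is a useful safeguard. In this sketch the following are assumptions:

- $b_2 = 0.60$;
- $w_3$ and $w_4$ feed $h_2$;
- $w_7$ and $w_8$ feed $o_2$.

`forward` is a hypothetical helper introduced only for this check.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Example parameters; b2 = 0.60 and the w3/w4, w7/w8 wiring are assumptions
i1, i2, b1, b2 = 0.05, 0.10, 0.35, 0.60
w2, w3, w4 = 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55

def forward(w1: float):
    """Return (out_h1, out_o1, out_o2, E_total) for a given value of w1."""
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    E = 0.5 * (0.01 - out_o1) ** 2 + 0.5 * (0.99 - out_o2) ** 2
    return out_h1, out_o1, out_o2, E

w1 = 0.15
out_h1, out_o1, out_o2, _ = forward(w1)
delta_o1 = -(0.01 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(0.99 - out_o2) * out_o2 * (1 - out_o2)

# Chain rule: dE/dout_h1 * dout_h1/dnet_h1 * dnet_h1/dw1
dE_dout_h1 = delta_o1 * w5 + delta_o2 * w7
analytic = dE_dout_h1 * out_h1 * (1 - out_h1) * i1

# Central finite difference of E_total with respect to w1
eps = 1e-6
numeric = (forward(w1 + eps)[3] - forward(w1 - eps)[3]) / (2 * eps)
print(f"{analytic:.6f} {numeric:.6f}")  # 0.000439 0.000439
```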
     
    

    Now update the value of $w_1$ (again with learning rate $\eta = 0.5$):

    $$w_1^+ = w_1 - \eta \cdot \frac{\partial E_{total}}{\partial w_1} = 0.15 - 0.5 \cdot 0.000439 = 0.1498$$
    
    

    Similarly, update $w_2$, $w_3$, $w_4$:

    $$w_2^+ = 0.1996$$

    $$w_3^+ = 0.2498$$

    $$w_4^+ = 0.2995$$
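
A compact sketch of the updates for all four input-to-hidden weights. Like the text, it uses the original output-layer weights (for example $w_5 = 0.40$, not $w_5^+$) when propagating the error back. The $w_3$/$w_4$ and $w_7$/$w_8$ wiring, $b_2 = 0.60$, and $\eta = 0.5$ are assumptions.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

i = [0.05, 0.10]                     # [i1, i2]
W1 = [[0.15, 0.20], [0.25, 0.30]]   # assumed rows: [w1, w2] -> h1, [w3, w4] -> h2
W2 = [[0.40, 0.45], [0.50, 0.55]]   # assumed rows: [w5, w6] -> o1, [w7, w8] -> o2
b1, b2, eta = 0.35, 0.60, 0.5       # b2 and eta are assumptions
targets = [0.01, 0.99]

out_h = [sigmoid(sum(w * x for w, x in zip(row, i)) + b1) for row in W1]
out_o = [sigmoid(sum(w * h for w, h in zip(row, out_h)) + b2) for row in W2]
delta_o = [-(t - o) * o * (1 - o) for t, o in zip(targets, out_o)]

# Hidden deltas: dE/dout_hj (summed over both outputs) times the sigmoid derivative
delta_h = [sum(delta_o[k] * W2[k][j] for k in range(2)) * out_h[j] * (1 - out_h[j])
           for j in range(2)]

# dE/dw for the weight from input m into h_j is delta_hj * i_m
W1_new = [[W1[j][m] - eta * delta_h[j] * i[m] for m in range(2)] for j in range(2)]
print([[round(w, 4) for w in row] for row in W1_new])
# [[0.1498, 0.1996], [0.2498, 0.2995]]
```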
    

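    This completes one round of error backpropagation. We then rerun the forward pass with the updated weights, backpropagate again, and keep iterating.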
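
Iterating means repeating the forward pass, the backward pass, and the weight update. Below is a minimal sketch of that loop on this single training example. Like the text, it updates only $w_1$ to $w_8$ and keeps the biases fixed. The following are assumptions for illustration:

- $b_2 = 0.60$;
- $\eta = 0.5$;
- the weight wiring;
- the 10,000-step count.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def train(steps: int, eta: float = 0.5):
    """Run `steps` rounds of forward pass + backprop on the single example.

    Biases stay fixed, as in the text. Returns the final weights and the
    total error measured before each update.
    """
    i, targets = [0.05, 0.10], [0.01, 0.99]
    W1 = [[0.15, 0.20], [0.25, 0.30]]   # assumed rows: [w1, w2] -> h1, [w3, w4] -> h2
    W2 = [[0.40, 0.45], [0.50, 0.55]]   # assumed rows: [w5, w6] -> o1, [w7, w8] -> o2
    b1, b2 = 0.35, 0.60                 # b2 is an assumption
    errors = []
    for _ in range(steps):
        # Forward pass
        out_h = [sigmoid(sum(w * x for w, x in zip(row, i)) + b1) for row in W1]
        out_o = [sigmoid(sum(w * h for w, h in zip(row, out_h)) + b2) for row in W2]
        errors.append(sum(0.5 * (t - o) ** 2 for t, o in zip(targets, out_o)))
        # Backward pass: output deltas, then hidden deltas through the current W2
        delta_o = [-(t - o) * o * (1 - o) for t, o in zip(targets, out_o)]
        delta_h = [sum(delta_o[k] * W2[k][j] for k in range(2)) * out_h[j] * (1 - out_h[j])
                   for j in range(2)]
        # Gradient-descent updates; both use gradients from the pre-update weights
        W2 = [[W2[k][j] - eta * delta_o[k] * out_h[j] for j in range(2)] for k in range(2)]
        W1 = [[W1[j][m] - eta * delta_h[j] * i[m] for m in range(2)] for j in range(2)]
    return W1, W2, errors

W1, W2, errors = train(10000)
print(f"first error: {errors[0]:.6f}, last recorded error: {errors[-1]:.6f}")
```

Computing `delta_h` before overwriting `W2` keeps the hidden-layer gradient consistent with the single-step derivation, which also uses the old $w_5$ to $w_8$.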

    Full code (view on a PC): http://www.momodel.cn:8899/#/explore/5b84e0098fe30b727acaa360?type=app

    Mo (website: momodel.cn) is an online AI modeling platform that supports Python and helps you quickly develop, train, and deploy AI applications. We look forward to having you join.

    3 replies · last reply 2019-04-28 09:46:21 +08:00

    1. nical · 2019-01-21 19:23:54 +08:00
    Impressive, very helpful.

    2. MoModel (OP) · 2019-01-21 20:20:33 +08:00
    @nical Sorry, many of the formulas came out garbled. Please open http://www.momodel.cn:8899/#/explore/5b84e0098fe30b727acaa360?type=app directly on a PC to view the source.

    3. MoModel (OP) · 2019-04-28 09:46:21 +08:00