3. Backpropagation

• The backward pass of an addition node passes the upstream value downstream unchanged, because the derivative of addition is 1.
• The backward pass of a multiplication node multiplies the upstream value by the other ("flipped") input signal.
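A minimal numeric sketch of these two rules (plain Python, not from the book):

```python
# Upstream derivative arriving at the node
dout = 2.0

# Addition node z = x + y: dz/dx = dz/dy = 1,
# so the upstream value passes through unchanged
dx_add, dy_add = dout * 1, dout * 1

# Multiplication node z = x * y with inputs x = 3, y = 4:
# each input receives dout times the *other* ("flipped") input
x, y = 3.0, 4.0
dx_mul, dy_mul = dout * y, dout * x

print(dx_add, dy_add)  # 2.0 2.0
print(dx_mul, dy_mul)  # 8.0 6.0
```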

4. Implementing Simple Layers

4.1 The multiplication layer

``````
# A multiplication node, implemented as a class
class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    # After one forward pass, x and y hold the input signals;
    # the output is a single value
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y

        return out

    # Backward produces two outputs; dout is the derivative from upstream
    def backward(self, dout):
        dx = dout * self.y  # flipped: dx uses the other input y
        dy = dout * self.x  # dy uses the other input x

        return dx, dy
``````
• Forward propagation: the output of one layer's forward pass becomes the input signal of the next layer's forward pass.
``````
apple = 100
apple_num = 2
tax = 1.1

# layer
mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple,apple_num)
price = mul_tax_layer.forward(apple_price,tax)

print("forward price:",price)
``````

• Backward propagation: in the apple and tax multiplication nodes, the forward pass has stored x and y in each node object; these are that node's input signals.
``````
# backward
dprice = 1
dapple_price,dtax = mul_tax_layer.backward(dprice)
dapple,dapple_num = mul_apple_layer.backward(dapple_price)

print("dapple:",dapple,",dapple_num:",dapple_num,",dtax:",dtax)
``````
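The backward results can be cross-checked by hand with the chain rule; a quick standalone check that recomputes the derivatives directly, without the layer classes:

```python
apple, apple_num, tax = 100, 2, 1.1

# price = apple * apple_num * tax, so by the chain rule:
dapple = tax * apple_num   # d(price)/d(apple)     = 1.1 * 2
dapple_num = tax * apple   # d(price)/d(apple_num) = 1.1 * 100
dtax = apple * apple_num   # d(price)/d(tax)       = 100 * 2

print(dapple, dapple_num, dtax)  # 2.2 110.00000000000001 200
```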

4.2 The addition layer

``````
class AddLayer:
    # Addition's derivative does not depend on the input signals,
    # so there is nothing to store
    def __init__(self):
        pass

    # simply add
    def forward(self, x, y):
        out = x + y
        return out

    def backward(self, dout):
        dx = dout * 1
        dy = dout * 1
        return dx, dy
``````
``````
apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1

# layers
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
orange_price = mul_orange_layer.forward(orange, orange_num)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)
price = mul_tax_layer.forward(all_price, tax)

# backward
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)
dorange, dorange_num = mul_orange_layer.backward(dorange_price)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

# forward result
print(price)

# backward results: the derivatives for the 5 input signals
print(dapple_num, dapple, dorange, dorange_num, dtax)  # 110 2.2 3.3 165 650
``````

``````
715.0000000000001
110.00000000000001 2.2 3.3000000000000003 165.0 650
``````

5. Implementing Activation Function Layers

5.1 ReLU (Rectified Linear Unit)

• When x > 0, y = x, so the derivative of y with respect to x is 1.
• When x <= 0, y = 0, so the derivative is 0.

• If the input x was greater than 0 in the forward pass, the backward pass sends the upstream value downstream unchanged.
• If x was less than or equal to 0 in the forward pass, the backward signal stops here.

``````
class Relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        # mask marks the elements where x <= 0
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0

        return out

    def backward(self, dout):
        # where the forward input was <= 0, the gradient stops
        dout[self.mask] = 0
        dx = dout

        return dx
``````

``````
import numpy as np

x = np.array([[1.0, -0.5], [-2.0, 3.0]])
print("before-x:", x)

relu = Relu()
x = relu.forward(x)
print("mask:", relu.mask)
print("after-x:", x)
``````

``````
before-x: [[ 1.  -0.5]
 [-2.   3. ]]
mask: [[False  True]
 [ True False]]
after-x: [[ 1.  0.]
 [ 0.  3.]]
``````

``````
# An aside: the same boolean-mask indexing used by Relu, applied to
# image thresholding (requires matplotlib and OpenCV)
import numpy as np
import cv2
import matplotlib.image as mpimg

image = mpimg.imread('./L10/test6.jpg')
thresh = (180, 255)
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
binary = np.zeros_like(gray)
binary[(gray > thresh[0]) & (gray <= thresh[1])] = 1
``````

The ReLU layer acts like a switch in a circuit. During forward propagation, the switch is set to ON if current flows and OFF if it does not. During backward propagation, an ON switch lets the current pass straight through, while an OFF switch lets no current pass.

5.2 The Sigmoid layer

The sigmoid function is y = f(x) = 1/(1+exp(-x)). Its derivative can be written entirely in terms of the output: dy/dx = y(1-y), which is what the backward pass uses.
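The backward rule follows from differentiating the sigmoid directly:

```latex
y = \frac{1}{1+e^{-x}}
\quad\Longrightarrow\quad
\frac{\partial y}{\partial x}
= \frac{e^{-x}}{(1+e^{-x})^2}
= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}
= y\,(1-y)
```

This is why the layer below only needs to remember its own output `out`, not its input.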

``````
class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out

        return out

    def backward(self, dout):
        # uses the saved forward output: dy/dx = y(1 - y)
        dx = dout * (1.0 - self.out) * self.out
        return dx
``````
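A quick self-contained check (not from the book) that the analytic rule y(1-y) matches a central-difference estimate of the derivative:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-1.0, 0.0, 2.0])
y = sigmoid(x)

analytic = y * (1 - y)                                  # backward rule, dout = 1
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference

print(np.allclose(analytic, numeric))  # True
```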

6. Implementing the Affine and Softmax Layers

6.1 The Affine layer

``````
X = np.random.rand(2)
W = np.random.rand(2,3)
B = np.random.rand(3)

Y = np.dot(X,W) + B

print("■ X.shape",X.shape)
print("X:",X)

print("■ W.shape",W.shape)
print("W:",W)

print("■ B.shape",B.shape)
print("B:",B)

print("■ Y.shape",Y.shape)
print("Y:",Y)
``````
``````
■ X.shape (2,)
X: [ 0.86921374  0.91685817]
■ W.shape (2, 3)
W: [[ 0.0615422   0.62958261  0.49322879]
[ 0.81998697  0.87313284  0.40750217]]
■ B.shape (3,)
B: [ 0.42251695  0.66899528  0.16928182]
■ Y.shape (3,)
Y: [ 1.22782203  2.01677612  0.97162475]
``````

6.2 Batch version of the Affine layer

``````
X = np.random.rand(4,2)
W = np.random.rand(2,3)
B = np.random.rand(3)

Y = np.dot(X,W) + B

print("■ X.shape",X.shape)
print("X:",X)

print("■ W.shape",W.shape)
print("W:",W)

print("■ B.shape",B.shape)
print("B:",B)

print("■ Y.shape",Y.shape)
print("Y:",Y)
``````
``````
■ X.shape (4, 2)
X: [[ 0.18474588  0.11781004]
[ 0.09969628  0.22527226]
[ 0.18807723  0.11648619]
[ 0.27385227  0.97771642]]
■ W.shape (2, 3)
W: [[ 0.73291508  0.98707711  0.78022863]
[ 0.04959705  0.95326781  0.10378382]]
■ B.shape (3,)
B: [ 0.94428108  0.32814772  0.90723177]
■ Y.shape (4, 3)
Y: [[ 1.08552715  0.62281067  1.06360257]
[ 1.02852282  0.64130042  1.00839727]
[ 1.08790309  0.62483698  1.06606439]
[ 1.19348339  1.53048662  1.22237029]]
``````

``````
x_dot_W = np.array([[0,0,0],[10,10,10]])
B = np.array([1,2,3])

print(x_dot_W + B)
``````

``````
[[ 1  2  3]
[11 12 13]]
``````

``````
dY = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 sums over the batch dimension -- this is the bias gradient dB
dB = np.sum(dY, axis=0)
print(dB)

# for comparison: axis=1 and the full sum
dB = np.sum(dY, axis=1)
print(dB)

dB = np.sum(dY)
print(dB)
``````

``````
[5 7 9]
[ 6 15]
21
``````
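The pieces above (a matrix product in the forward pass; transposed products and a batch-axis sum in the backward pass) assemble into an Affine layer. A minimal self-contained sketch, with the `dW`/`db` attribute names following the convention of the book's `common.layers`:

```python
import numpy as np

class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        self.x = x
        return np.dot(x, self.W) + self.b

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)       # dL/dX = dY . W^T
        self.dW = np.dot(self.x.T, dout)  # dL/dW = X^T . dY
        self.db = np.sum(dout, axis=0)    # dL/dB: sum over the batch axis
        return dx

layer = Affine(np.random.rand(2, 3), np.random.rand(3))
y = layer.forward(np.random.rand(4, 2))
dx = layer.backward(np.ones_like(y))
print(dx.shape, layer.dW.shape, layer.db.shape)  # (4, 2) (2, 3) (3,)
```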

6.3 The Softmax-with-Loss layer

• Inference: the inference phase of a neural network usually omits the Softmax layer, because inference only needs a single answer and only the largest score matters.
• Training: the Softmax layer is needed during learning.

1. The Softmax layer normalizes its input (a1, a2, a3) and outputs (y1, y2, y3).
2. The Cross-Entropy-Error layer takes the Softmax output (y1, y2, y3) and the labels (t1, t2, t3) and produces the loss L from them.
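This pairing is convenient because the gradient of cross-entropy loss through softmax collapses to a simple difference:

```latex
L = -\sum_k t_k \log y_k,\qquad
y_k = \frac{e^{a_k}}{\sum_j e^{a_j}}
\;\Longrightarrow\;
\frac{\partial L}{\partial a_k} = y_k - t_k
```

which is exactly the (y - t)/batch_size that the backward method returns.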

The Softmax-with-Loss implementation:

``````
class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None  # the loss
        self.y = None     # softmax output
        self.t = None     # supervision data (labels)

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)

        self.loss = cross_entropy_error(self.y, self.t)

        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        # dividing by batch_size passes a per-example gradient downstream
        dx = (self.y - self.t) / batch_size

        return dx
``````

``````
def softmax(a):
    c = np.max(a)          # subtract the max for numerical stability
    exp_a = np.exp(a - c)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a

    return y

def cross_entropy_error(y, t):
    delta = 1e-7  # avoid log(0)
    return -np.sum(t * np.log(y + delta))
``````

7. Implementing Backpropagation

1. Mini-batch: randomly select a portion of the training data.
2. Compute the gradient: compute the gradient of the loss function with respect to each weight parameter.
3. Update the parameters: move the weights a small step along the gradient direction.
4. Repeat the steps above.
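The four steps above can be sketched on a toy problem (fitting y = w*x by SGD; this is an illustrative standalone example, not the book's code, which follows in 7.3):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.random(100)
t_train = 2.0 * x_train          # true weight is 2.0

w = 0.0
learning_rate = 0.1
for i in range(1000):
    # 1. mini-batch: randomly select part of the training data
    batch = rng.choice(100, 10)
    xb, tb = x_train[batch], t_train[batch]

    # 2. gradient of the squared loss with respect to w
    grad = np.mean(2 * (w * xb - tb) * xb)

    # 3. small update along the negative gradient
    w -= learning_rate * grad
    # 4. repeat

print(round(w, 3))  # 2.0
```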

1. params: a dictionary holding the network's parameters, such as weights and biases.
2. layers: an ordered dictionary holding the network's layers, e.g. layers['Affine1'], layers['Relu1'], layers['Affine2'].
3. lastLayer: the last layer of the network, here the SoftmaxWithLoss layer.
4. __init__(): initialization.
5. predict(self, x): inference; the input x is image data.
6. loss(self, x, t): computes the loss; x is the input, t is the correct label.
7. accuracy(self, x, t): computes the recognition accuracy.
7.1 TwoLayerNet code

``````
import sys, os
sys.path.append(os.pardir)
import numpy as np
from common.layers import *
from common.gradient import numerical_gradient
from collections import OrderedDict

class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size,
                 weight_init_std=0.01):
        # initialize the weights
        self.params = {}
        self.params["W1"] = weight_init_std * \
            np.random.randn(input_size, hidden_size)
        self.params["b1"] = np.zeros(hidden_size)
        self.params["W2"] = weight_init_std * \
            np.random.randn(hidden_size, output_size)
        self.params["b2"] = np.zeros(output_size)

        # build the layers
        self.layers = OrderedDict()
        self.layers["Affine1"] = \
            Affine(self.params["W1"], self.params["b1"])
        self.layers["Relu1"] = Relu()
        self.layers["Affine2"] = \
            Affine(self.params["W2"], self.params["b2"])

        self.lastLayer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)

        return x

    def loss(self, x, t):
        y = self.predict(x)
        return self.lastLayer.forward(y, t)

    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)

        if t.ndim != 1:
            t = np.argmax(t, axis=1)

        accuracy = np.sum(y == t) / float(x.shape[0])

        return accuracy

    def numerical_gradient(self, x, t):
        loss_W = lambda W: self.loss(x, t)

        grads = {}
        grads["W1"] = numerical_gradient(loss_W, self.params["W1"])
        grads["b1"] = numerical_gradient(loss_W, self.params["b1"])
        grads["W2"] = numerical_gradient(loss_W, self.params["W2"])
        grads["b2"] = numerical_gradient(loss_W, self.params["b2"])

        return grads

    def gradient(self, x, t):
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.lastLayer.backward(dout)

        # backpropagation just calls the layers in reverse order
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # collect the gradients stored in each layer
        grads = {}
        grads["W1"] = self.layers["Affine1"].dW
        grads["b1"] = self.layers["Affine1"].db
        grads["W2"] = self.layers["Affine2"].dW
        grads["b2"] = self.layers["Affine2"].db

        return grads
``````


``````
# common.gradient's numerical_gradient, called by TwoLayerNet.numerical_gradient
def numerical_gradient(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)

        x[idx] = tmp_val  # restore the original value
        it.iternext()

    return grad
``````


7.2 Gradient check for backpropagation

``````
import sys, os
sys.path.append(os.pardir)
import numpy as np
from dataset.mnist import load_mnist
#from two_layer_net import TwoLayerNet

# load the data
(x_train, t_train), (x_test, t_test) = \
    load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

x_batch = x_train[:3]
t_batch = t_train[:3]

# compare the numerical gradients with the backprop gradients
grad_numerical = network.numerical_gradient(x_batch, t_batch)
grad_backprop = network.gradient(x_batch, t_batch)

for key in grad_numerical.keys():
    diff = np.average(np.abs(grad_backprop[key] - grad_numerical[key]))
    print(key + ":" + str(diff))
``````
``````
b1:5.59066227433e-06
b2:1.39361826149e-07
W2:5.29915498019e-09
W1:5.46059483329e-07
``````

7.3 Training with backpropagation

``````
import sys, os
sys.path.append(os.pardir)
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
#from two_layer_net import TwoLayerNet

# load the data
(x_train, t_train), (x_test, t_test) = \
    load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1
train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # compute the gradient by backpropagation
    grad = network.gradient(x_batch, t_batch)

    # update the parameters
    for key in ("W1", "b1", "W2", "b2"):
        network.params[key] -= learning_rate * grad[key]

    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(train_acc, ",", test_acc)

# plot the accuracy curves
markers = {'train': 'o', 'test': 's'}
x = np.arange(len(train_acc_list))
plt.plot(x, train_acc_list, label='train acc')
plt.plot(x, test_acc_list, label='test acc', linestyle='--')
plt.xlabel("epochs")
plt.ylabel("accuracy")
plt.ylim(0, 1.0)
plt.legend(loc='lower right')
plt.show()
``````
``````
0.1043 , 0.1041
0.904633333333 , 0.9079
0.921 , 0.9236
0.9321 , 0.9338
0.9436 , 0.9426
0.95025 , 0.9494
0.956133333333 , 0.9531
0.960166666667 , 0.9564
0.9638 , 0.959
0.965933333333 , 0.9607
0.9682 , 0.9619
0.970266666667 , 0.9621
0.97075 , 0.9641
0.973583333333 , 0.9669
0.974083333333 , 0.9663
0.975666666667 , 0.9664
0.9777 , 0.9683
``````

8. Summary

1. Forward propagation on a computational graph performs the ordinary computation (e.g. predicted values, softmax values); backward propagation computes the derivative at each node.
2. By implementing the components of a neural network as layers, gradients can be computed efficiently.
3. Comparing the results of numerical differentiation and backpropagation confirms whether the backpropagation implementation is correct (gradient check).