The training process then has three steps: compute the forward pass, then the backward pass, and finally update the weights. The key point here is that the weight update comes last, because weights can be reused across multiple layers and we would rather update them only once all of their gradient contributions have been accumulated.
class Layer:
    def __init__(self):
        self.parameters = []

    def forward(self, X):
        """
        Override me! A simple no-op layer, it passes forward the inputs
        """
        return X, lambda D: D

    def build_param(self, tensor):
        """
        Creates a parameter from a tensor, and saves a reference for the update step
        """
        param = Parameter(tensor)
        self.parameters.append(param)
        return param

    def update(self, optimizer):
        for param in self.parameters: optimizer.update(param)
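To make the contract concrete (forward returns both the output and a backward closure), here is a minimal sketch of a ReLU activation layer written against this Layer base class; it is an illustrative example, not code from the original implementation:

# Illustrative sketch (not from the original article): a ReLU activation layer
# that follows the same forward() contract as the base Layer above.
class ReLU(Layer):
    def forward(self, X):
        mask = X > 0
        def backward(D):
            # The gradient only flows through positions where the input was positive.
            return D * mask
        return X * mask, backward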
The standard practice is to delegate the parameter update to an optimizer, which receives each parameter instance after every batch. The simplest and best-known optimization method is mini-batch stochastic gradient descent (SGD).
class SGDOptimizer():
    def __init__(self, lr=0.1):
        self.lr = lr

    def update(self, param):
        param.tensor -= self.lr * param.gradient
        param.gradient.fill(0)
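To show how these pieces realize the three steps described above (forward pass, backward pass, weight update last), here is a sketch of a single training step; the names train_step, layers, and loss_fn are assumptions made for illustration, with loss_fn assumed to return both the loss and its gradient with respect to the network output:

# Hypothetical sketch of one training step, assuming `layers` is a list of Layer
# instances and `loss_fn(prediction, target)` returns (loss, gradient).
def train_step(layers, optimizer, X, Y, loss_fn):
    # Forward pass: collect each layer's backward closure in order.
    backwards = []
    for layer in layers:
        X, backward = layer.forward(X)
        backwards.append(backward)
    loss, D = loss_fn(X, Y)
    # Backward pass: propagate the gradient in reverse layer order.
    for backward in reversed(backwards):
        D = backward(D)
    # Weight update comes last, once all gradients have been accumulated.
    for layer in layers:
        layer.update(optimizer)
    return loss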
Within this framework, and using the results computed earlier, the linear layer looks like this:
class Linear(Layer):
    def __init__(self, inputs, outputs):
        super().__init__()
        tensor = np.random.randn(inputs, outputs) * np.sqrt(1 / inputs)
        self.weights = self.build_param(tensor)
        self.bias = self.build_param(np.zeros(outputs))

    def forward(self, X):
        def backward(D):
            self.weights.gradient += X.T @ D
            self.bias.gradient += D.sum(axis=0)
            return D @ self.weights.tensor.T
        return X @ self.weights.tensor + self.bias.tensor, backward
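As a quick usage check (illustrative only, assuming the Parameter class shown earlier stores a tensor together with a zero-initialized gradient), we can push a small batch through a Linear layer and confirm that the shapes produced by the backward closure are consistent:

import numpy as np

# Illustrative sanity check: a batch of 4 samples, 3 input features, 2 outputs.
layer = Linear(3, 2)
X = np.random.randn(4, 3)
out, backward = layer.forward(X)   # out has shape (4, 2)
D = np.ones_like(out)              # stand-in for the upstream gradient
dX = backward(D)                   # dX has shape (4, 3)
assert out.shape == (4, 2)
assert dX.shape == (4, 3)
assert layer.weights.gradient.shape == (3, 2)
assert layer.bias.gradient.shape == (2,)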