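Intuitively, a deeper network should be more powerful. Experiments, however, revealed a counterintuitive phenomenon: a 56-layer network had a higher training error than a 20-layer one.
This is not overfitting (the training error itself is higher) but the degradation problem: deeper networks are harder to optimize. In principle, a deep network should be at least as good as a shallow one, since the extra layers could simply learn identity mappings. In practice, SGD struggles to fit an identity mapping with a stack of nonlinear layers.
Kaiming He's brilliant insight: rather than having the network learn H(x), have it learn F(x) = H(x) - x.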
# Plain network: learn the mapping directly
H(x) = target mapping
# Residual network: learn the residual
F(x) = H(x) - x → i.e. H(x) = F(x) + x
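If a layer only needs to perform an identity mapping, the residual network just has to drive F(x) to 0, which is much easier than fitting H(x) = x with a stack of nonlinear layers!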
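The difference between the two formulations can be seen in a toy example. The sketch below is an illustration only, not the paper's implementation: it uses small dense layers in place of convolutions, drops normalization and the final ReLU, and the helper names (`linear`, `stacked_layers`, `plain_block`, `residual_block`) are made up for this example. With all weights set to zero, the plain block collapses to zero, while the residual block already computes the identity.

```python
def linear(weights, x):
    """Dense layer without bias: y[i] = sum_j weights[i][j] * x[j]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def stacked_layers(w1, w2, x):
    """Two weight layers with a ReLU in between (hypothetical stand-in for F or H)."""
    return linear(w2, relu(linear(w1, x)))


def plain_block(w1, w2, x):
    # Plain network: the stacked layers must produce H(x) directly.
    return stacked_layers(w1, w2, x)


def residual_block(w1, w2, x):
    # Residual network: the stacked layers produce F(x); the shortcut adds x back.
    return [f + v for f, v in zip(stacked_layers(w1, w2, x), x)]


dim = 3
zeros = [[0.0] * dim for _ in range(dim)]
x = [1.5, -2.0, 0.25]

print(plain_block(zeros, zeros, x))     # [0.0, 0.0, 0.0]
print(residual_block(zeros, zeros, x))  # [1.5, -2.0, 0.25]
```

For the plain block to reproduce x, the weights would have to be tuned so that the nonlinear stack happens to cancel out; for the residual block, "do nothing" is simply the all-zero solution.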
Input x
→ Conv → BN → ReLU → Conv → BN
→ + x (skip connection / shortcut connection)
→ ReLU
→ Output
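The order of operations in this block can be sketched in plain Python. This is a simplified illustration under several assumptions: a single-channel 1-D signal instead of image feature maps, a `normalize` function that only standardizes one signal (real batch normalization uses batch statistics and a learned scale and shift), and hypothetical helper names throughout.

```python
import math


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def normalize(x, eps=1e-5):
    """Simplified stand-in for BN: zero mean, unit variance over one signal."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, kernel1, kernel2):
    out = conv1d_same(x, kernel1)          # Conv
    out = normalize(out)                   # BN (simplified)
    out = relu(out)                        # ReLU
    out = conv1d_same(out, kernel2)        # Conv
    out = normalize(out)                   # BN (simplified)
    out = [o + v for o, v in zip(out, x)]  # + x (shortcut)
    return relu(out)                       # ReLU


signal = [0.5, 1.0, 2.0, 0.0]
print(residual_block(signal, [0.2, 0.5, 0.2], [0.1, -0.3, 0.1]))
```

Note that the addition happens before the final ReLU, so the shortcut carries x past both convolutions untouched. The addition also requires the branch output and x to have the same shape, which is why the sketch uses same-length padding.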
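Key point: the identity shortcut adds no parameters and only a negligible amount of computation, because it is nothing more than an element-wise addition of x to the block's output.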
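A quick back-of-the-envelope count makes the cost of the shortcut concrete. The sketch below assumes a block of two 3x3 convolutions with 64 channels on a 56x56 feature map, no convolution bias, and a per-channel scale and shift for each BN layer; these shapes and the helper names are illustrative choices, not values taken from the text above.

```python
def conv_params(kernel, c_in, c_out):
    # Weights of a kernel x kernel convolution without bias.
    return kernel * kernel * c_in * c_out


def bn_params(channels):
    # One learned scale and one learned shift per channel.
    return 2 * channels


def conv_macs(kernel, c_in, c_out, height, width):
    # Multiply-accumulates for a stride-1 convolution that keeps the spatial size.
    return conv_params(kernel, c_in, c_out) * height * width


channels, height, width = 64, 56, 56  # assumed example shape

# Cost of the Conv -> BN -> ReLU -> Conv -> BN branch.
block_params = 2 * (conv_params(3, channels, channels) + bn_params(channels))
block_macs = 2 * conv_macs(3, channels, channels, height, width)

# Cost of the identity shortcut: one addition per output element, no weights.
shortcut_params = 0
shortcut_adds = channels * height * width

print(block_params)                # 73984
print(block_macs)                  # 231211008
print(shortcut_params)             # 0
print(shortcut_adds)               # 200704
print(shortcut_adds / block_macs)  # under 0.1% of the branch's multiply-accumulates
```

Under these assumptions the plain and residual versions of the block have exactly the same parameter count, and the extra additions are under a thousandth of the convolution work.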
| Network | Layers | Top-5 error |
|---|---|---|
| VGG-19 | 19 | 7.32% |
| Plain-34 | 34 | 10.02% (degradation!) |
| ResNet-34 | 34 | 5.71% |
| ResNet-101 | 101 | 4.60% |
| ResNet-152 | 152 | 3.57% (ensemble, test set) |
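ResNet-152 is 8x deeper than VGG-19, yet its complexity is lower: the paper reports 11.3 billion FLOPs, versus 19.6 billion for VGG-19. ResNets took first place in five tracks across the ILSVRC 2015 and COCO 2015 competitions: ImageNet classification, detection, and localization, plus COCO detection and segmentation.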
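The influence of residual connections extends far beyond computer vision.
It is fair to say that modern deep learning would not be possible without residual connections.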
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers — 8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task.