ResNet: Paper Summary + Reproduction

ResNet (Residual Network) was proposed by Kaiming He et al. in 2015 in the paper "Deep Residual Learning for Image Recognition", a milestone in deep learning. Its core idea is the residual connection, which solves the degradation problem encountered when training very deep neural networks.

Paper: Deep Residual Learning for Image Recognition (arXiv:1512.03385)

Part I: Core Summary of the Paper

  1. Core problem: degradation

    • As a traditional deep network (e.g. VGG) gets deeper (beyond roughly 20 layers), its performance on both the training set and the test set starts to degrade.
    • This is not caused by overfitting (the training error rises as well), nor by vanishing/exploding gradients (largely handled by batch normalization and related techniques).
    • It indicates that deeper networks are harder to optimize: simply stacking more layers does not automatically improve performance.
  2. Core solution: residual learning

    • Instead of having the layers learn the target mapping H(x) directly, let them learn the residual function F(x) = H(x) - x.

    • The target mapping then becomes H(x) = F(x) + x.

    • Key structure: the residual block

      • The input x is passed directly toward the output through a "shortcut connection".
      • A few stacked nonlinear layers learn the residual mapping F(x).
      • The block's output is F(x) + x.

      (Figure: residual connection diagram)

      # Pseudocode for the core residual block
      def ResidualBlock(x):
          identity = x               # save the original input (the shortcut connection)
          out = Conv2D(x)            # some convolution, BN, and activation layers
          out = ReLU(BN(Conv2D(out)))
          out = ...                  # possibly more layers
          out = out + identity       # the key operation: residual connection (F(x) + x)
          out = ReLU(out)            # final activation
          return out
  3. Advantages of residual learning

    • Solves the degradation problem: if the identity mapping is optimal (i.e. the best thing the extra layers can do is nothing), learning F(x) = 0 is far easier than making a stack of nonlinear layers fit H(x) = x directly. A residual block can recover the identity mapping simply by driving its weights toward zero, so adding layers no longer hurts performance.
    • Mitigates vanishing gradients: gradients flow back through the shortcut connections almost unchanged to shallower layers (see the short derivation after this list), which greatly improves backpropagation and makes networks with hundreds or even a thousand layers trainable.
    • Easier optimization: the residual function is usually a small perturbation around the identity, which is easier to fit than a complete mapping.
    • Better information flow: the shortcut connections act as highways for passing information through the network.
  4. Key technical points

    • Shortcut connections: the core is the identity mapping (y = F(x) + x). When the input and output dimensions do not match (e.g. at downsampling stages), the paper uses:
      • a stride-2 convolution on the F(x) path for downsampling;
      • a stride-2 1x1 convolution on the shortcut (a projection shortcut) for downsampling and channel adjustment.
    • Bottleneck design: to reduce computational cost in deeper networks, a "bottleneck" residual block is used (1x1 conv to reduce channels -> 3x3 conv -> 1x1 conv to restore channels). This greatly reduces the parameter count and computation.
    • Post-activation: in the original paper each residual block uses the "post-activation" order (Conv -> BN -> ReLU), with one more ReLU applied after the addition.
  5. Main contributions and impact

    • Resolved the degradation problem of deep networks: successfully trained networks of up to 152 layers (ImageNet) and even over 1000 layers (CIFAR-10).
    • Significant performance gains: state-of-the-art results at the time on ImageNet classification, COCO object detection, and other computer-vision benchmarks. An ensemble of ResNets reached a 3.57% top-5 error on ImageNet, below the reported human-level error of about 5%.
    • Became a foundational architecture: ResNet and its variants (ResNeXt, Wide ResNet, ResNet in ResNet, DenseNet, ...) are among the most widely used backbone architectures in computer vision and beyond.
    • Highly influential idea: "residual learning" has been adopted across many architectures (e.g. the residual connections in Transformers) and tasks.
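
The "mitigates vanishing gradients" point can be made precise with a short derivation (added here for clarity, in the spirit of the follow-up paper "Identity Mappings in Deep Residual Networks"; it is not part of the original post). Unrolling a stack of residual blocks x_{l+1} = x_l + F(x_l) from layer l to a deeper layer L and differentiating the loss E gives:

    x_L = x_l + \sum_{i=l}^{L-1} F(x_i)

    \frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L} \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i) \right)

The additive 1 means part of the gradient reaches layer l directly through the shortcuts, no matter how small the derivatives along the weighted paths become.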

One-sentence summary:
ResNet introduces residual blocks (output = F(x) + x): the shortcut connections let the layers focus on learning the residual F(x) between the input and the desired output rather than the complete mapping. This effectively solves the degradation problem of very deep networks, makes networks hundreds or thousands of layers deep trainable, and substantially improves model performance.

Its simple, general, and extremely effective design has made it one of the most important cornerstones in the history of deep learning.

Part II: Reproduction (CIFAR-10)

1 Residual Block

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, in_channel, out_channel, stride=1):
        super(ResidualBlock, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(inplace=True),

            nn.Conv2d(out_channel, out_channel, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channel)
        )
        # Projection shortcut (the "dashed" connections in the paper): downsample
        # and adjust channels with a 1x1 conv when the shapes do not match
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channel != out_channel:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channel)
            )

    def forward(self, x):
        out = self.features(x)
        out += self.shortcut(x)   # F(x) + x
        out = F.relu(out)
        return out
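
As a quick sanity check (an added sketch, not part of the original code), push a dummy batch through a downsampling block and confirm the output shape:

# Illustrative smoke test: a stride-2 block halves the spatial size
# and changes the channel count (64 -> 128 here)
block = ResidualBlock(64, 128, stride=2)
x = torch.randn(8, 64, 32, 32)
y = block(x)
print(y.shape)  # expected: torch.Size([8, 128, 16, 16])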

2 Bottleneck Block

class BottleNeck(nn.Module):
    def __init__(self, in_channel, out_channel, stride=1):
        super(BottleNeck, self).__init__()
        # 1x1 conv reduces channels, 3x3 conv processes them, 1x1 conv restores them
        self.features = nn.Sequential(
            nn.Conv2d(in_channel, out_channel // 4, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel // 4),
            nn.ReLU(inplace=True),

            nn.Conv2d(out_channel // 4, out_channel // 4, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channel // 4),
            nn.ReLU(inplace=True),

            nn.Conv2d(out_channel // 4, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel)
        )
        # Projection shortcut for the "dashed" connections (downsampling / channel change)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channel != out_channel:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channel)
            )

    def forward(self, x):
        out = self.features(x)
        out += self.shortcut(x)   # F(x) + x
        out = F.relu(out)
        return out
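
To see concretely why the bottleneck design is cheaper, compare the parameter counts of the two block types at the same width (an illustrative check, not from the original post):

# At 256 channels, the bottleneck block needs far fewer parameters than the
# basic block, because its 3x3 conv operates on only 256 // 4 = 64 channels
def count_params(m):
    return sum(p.numel() for p in m.parameters())

print(count_params(ResidualBlock(256, 256)))  # ~1.18M parameters
print(count_params(BottleNeck(256, 256)))     # ~0.07M parameters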

3 Residual Networks

The residual networks below are implemented according to the layer-configuration table from the paper (figure below).

(Figure: layer-configuration table)
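
For reference, the per-stage block counts used by the classes below (consistent with the configuration table in the paper):

| Variant    | Block type    | Blocks per stage (layer1..layer4) |
|------------|---------------|-----------------------------------|
| ResNet-18  | ResidualBlock | 2, 2, 2, 2                        |
| ResNet-34  | ResidualBlock | 3, 4, 6, 3                        |
| ResNet-50  | BottleNeck    | 3, 4, 6, 3                        |
| ResNet-101 | BottleNeck    | 3, 4, 23, 3                       |
| ResNet-152 | BottleNeck    | 3, 8, 36, 3                       |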

ResNet-18

class ResNet18(nn.Module):
    def __init__(self, ResidualBlock, config):
        super(ResNet18, self).__init__()
        self._config = config
        self.in_channel = 64

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )

        self.layer1 = self.make_layer(ResidualBlock, 64, 2, stride=1)
        self.layer2 = self.make_layer(ResidualBlock, 128, 2, stride=2)
        self.layer3 = self.make_layer(ResidualBlock, 256, 2, stride=2)
        self.layer4 = self.make_layer(ResidualBlock, 512, 2, stride=2)
        self.linear = nn.Linear(512, self._config['num_classes'])

    def make_layer(self, block, out_channel, num_blocks, stride):
        # Only the first block of a stage downsamples, e.g. stride=2, num_blocks=2 -> [2, 1]
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channel, out_channel, stride))
            self.in_channel = out_channel
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)              # 64*32*32
        out = self.layer1(out)           # 64*32*32
        out = self.layer2(out)           # 128*16*16
        out = self.layer3(out)           # 256*8*8
        out = self.layer4(out)           # 512*4*4
        out = F.avg_pool2d(out, 4)       # 512*1*1
        out = out.view(out.size(0), -1)  # 512
        out = self.linear(out)
        return out

    # Model saving and loading
    def save_model(self):
        torch.save(self.state_dict(), self._config['model_name'])

    # map_location selects the device (CPU or a specific GPU) to load the weights onto
    def load_model(self, map_location):
        state_dict = torch.load(self._config['model_name'], map_location=map_location)
        self.load_state_dict(state_dict, strict=False)
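
A quick forward-pass check for the 18-layer network (illustrative; the config keys mirror the resnet_config defined in section 6):

# Hypothetical minimal config for a smoke test
config = {'num_classes': 10, 'model_name': './resnet18.model'}
model = ResNet18(ResidualBlock, config)
x = torch.randn(2, 3, 32, 32)  # two CIFAR-10 images
print(model(x).shape)          # expected: torch.Size([2, 10])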

ResNet-34

class ResNet34(nn.Module):
    def __init__(self, ResidualBlock, config):
        super(ResNet34, self).__init__()
        self._config = config
        self.in_channel = 64

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )

        self.layer1 = self.make_layer(ResidualBlock, 64, 3, stride=1)
        self.layer2 = self.make_layer(ResidualBlock, 128, 4, stride=2)
        self.layer3 = self.make_layer(ResidualBlock, 256, 6, stride=2)
        self.layer4 = self.make_layer(ResidualBlock, 512, 3, stride=2)
        self.linear = nn.Linear(512, self._config['num_classes'])

    def make_layer(self, block, out_channel, num_blocks, stride):
        # Only the first block of a stage downsamples, e.g. stride=2, num_blocks=4 -> [2, 1, 1, 1]
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channel, out_channel, stride))
            self.in_channel = out_channel
        return nn.Sequential(*layers)

    def forward(self, x):                # 3*32*32
        out = self.conv1(x)              # 64*32*32
        out = self.layer1(out)           # 64*32*32
        out = self.layer2(out)           # 128*16*16
        out = self.layer3(out)           # 256*8*8
        out = self.layer4(out)           # 512*4*4
        out = F.avg_pool2d(out, 4)       # 512*1*1
        out = out.view(out.size(0), -1)  # 512
        out = self.linear(out)
        return out

    # Model saving and loading
    def save_model(self):
        torch.save(self.state_dict(), self._config['model_name'])

    # map_location selects the device (CPU or a specific GPU) to load the weights onto
    def load_model(self, map_location):
        state_dict = torch.load(self._config['model_name'], map_location=map_location)
        self.load_state_dict(state_dict, strict=False)

ResNet-50

class ResNet50(nn.Module):
    def __init__(self, ResidualBlock, config):
        super(ResNet50, self).__init__()
        # Note: the block type is hard-coded to BottleNeck below; the ResidualBlock argument is unused
        self._config = config
        self.in_channel = 64

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )

        self.layer1 = self.make_layer(BottleNeck, 256, 3, stride=1)
        self.layer2 = self.make_layer(BottleNeck, 512, 4, stride=2)
        self.layer3 = self.make_layer(BottleNeck, 1024, 6, stride=2)
        self.layer4 = self.make_layer(BottleNeck, 2048, 3, stride=2)
        self.linear = nn.Linear(2048, self._config['num_classes'])

    def make_layer(self, block, out_channel, num_blocks, stride):
        # Only the first block of a stage downsamples, e.g. stride=2, num_blocks=4 -> [2, 1, 1, 1]
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channel, out_channel, stride))
            self.in_channel = out_channel
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)              # 64*32*32
        out = self.layer1(out)           # 256*32*32
        out = self.layer2(out)           # 512*16*16
        out = self.layer3(out)           # 1024*8*8
        out = self.layer4(out)           # 2048*4*4
        out = F.avg_pool2d(out, 4)       # 2048*1*1
        out = out.view(out.size(0), -1)  # 2048
        out = self.linear(out)
        return out

    # Model saving and loading
    def save_model(self):
        torch.save(self.state_dict(), self._config['model_name'])

    # map_location selects the device (CPU or a specific GPU) to load the weights onto
    def load_model(self, map_location):
        state_dict = torch.load(self._config['model_name'], map_location=map_location)
        self.load_state_dict(state_dict, strict=False)

ResNet-101


class ResNet101(nn.Module):
    def __init__(self, ResidualBlock, config):
        super(ResNet101, self).__init__()
        # Note: the block type is hard-coded to BottleNeck below; the ResidualBlock argument is unused
        self._config = config
        self.in_channel = 64

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )

        self.layer1 = self.make_layer(BottleNeck, 256, 3, stride=1)
        self.layer2 = self.make_layer(BottleNeck, 512, 4, stride=2)
        self.layer3 = self.make_layer(BottleNeck, 1024, 23, stride=2)
        self.layer4 = self.make_layer(BottleNeck, 2048, 3, stride=2)
        self.linear = nn.Linear(2048, self._config['num_classes'])

    def make_layer(self, block, out_channel, num_blocks, stride):
        # Only the first block of a stage downsamples
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channel, out_channel, stride))
            self.in_channel = out_channel
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out

    # Model saving and loading
    def save_model(self):
        torch.save(self.state_dict(), self._config['model_name'])

    # map_location selects the device (CPU or a specific GPU) to load the weights onto
    def load_model(self, map_location):
        state_dict = torch.load(self._config['model_name'], map_location=map_location)
        self.load_state_dict(state_dict, strict=False)

ResNet-152

class ResNet152(nn.Module):
    def __init__(self, ResidualBlock, config):
        super(ResNet152, self).__init__()
        # Note: the block type is hard-coded to BottleNeck below; the ResidualBlock argument is unused
        self._config = config
        self.in_channel = 64

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )

        self.layer1 = self.make_layer(BottleNeck, 256, 3, stride=1)
        self.layer2 = self.make_layer(BottleNeck, 512, 8, stride=2)
        self.layer3 = self.make_layer(BottleNeck, 1024, 36, stride=2)
        self.layer4 = self.make_layer(BottleNeck, 2048, 3, stride=2)
        self.linear = nn.Linear(2048, self._config['num_classes'])

    def make_layer(self, block, out_channel, num_blocks, stride):
        # Only the first block of a stage downsamples
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channel, out_channel, stride))
            self.in_channel = out_channel
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out

    # Model saving and loading
    def save_model(self):
        torch.save(self.state_dict(), self._config['model_name'])

    # map_location selects the device (CPU or a specific GPU) to load the weights onto
    def load_model(self, map_location):
        state_dict = torch.load(self._config['model_name'], map_location=map_location)
        self.load_state_dict(state_dict, strict=False)
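
The five classes above differ only in the block type, the per-stage block counts, and the final feature width. A common refactor (a sketch added here, not part of the original post) collapses them into a single parameterized class:

class ResNet(nn.Module):
    """Generic CIFAR-style ResNet; block type and per-stage block counts are parameters."""
    def __init__(self, block, num_blocks, widths, config):
        super(ResNet, self).__init__()
        self._config = config
        self.in_channel = 64

        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        # The first stage keeps the resolution; the other three downsample by 2
        stage_strides = [1, 2, 2, 2]
        stages = [self.make_layer(block, w, n, stride=s)
                  for w, n, s in zip(widths, num_blocks, stage_strides)]
        self.stages = nn.Sequential(*stages)
        self.linear = nn.Linear(widths[-1], config['num_classes'])

    def make_layer(self, block, out_channel, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for s in strides:
            layers.append(block(self.in_channel, out_channel, s))
            self.in_channel = out_channel
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.stages(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        return self.linear(out)

# Equivalent to the classes above, e.g.:
# ResNet(ResidualBlock, [2, 2, 2, 2], [64, 128, 256, 512], config)     # ResNet-18
# ResNet(BottleNeck,    [3, 4, 6, 3], [256, 512, 1024, 2048], config)  # ResNet-50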

4 Dataset Preprocessing

import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader


# Build a DataLoader for a dataset
def Construct_DataLoader(dataset, batch_size):
    return DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True)


# Image preprocessing: light augmentation for training, none for testing
train_transform = transforms.Compose([
    # transforms.Resize(96),
    # transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),

    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])


# Load the CIFAR-10 training and test sets
def LoadCIFAR10(download=False):
    train_dataset = torchvision.datasets.CIFAR10(root='../data', train=True, transform=train_transform, download=download)
    test_dataset = torchvision.datasets.CIFAR10(root='../data', train=False, transform=test_transform, download=download)
    return train_dataset, test_dataset
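
The (0.5, 0.5, 0.5) normalization above is only a rough approximation. The per-channel statistics of the CIFAR-10 training set are often used instead (the values below are the commonly cited ones, added here as a suggestion rather than taken from the original post):

# Commonly cited per-channel statistics of the CIFAR-10 training set
cifar10_mean = (0.4914, 0.4822, 0.4465)
cifar10_std = (0.2470, 0.2435, 0.2616)
# e.g. transforms.Normalize(cifar10_mean, cifar10_std)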

5 Defining the Training Routine


class Train(object):
    # Set up the model, config, optimizer, and loss function
    def __init__(self, model, config):
        self._model = model
        self._config = config
        self._optimizer = torch.optim.SGD(self._model.parameters(), lr=self._config['lr'],
                                          momentum=self._config['momentum'], weight_decay=self._config['weight_decay'])
        # Loss function measuring the gap between predictions and ground-truth labels
        self.loss_func = nn.CrossEntropyLoss()

    def _train_single_batch(self, images, labels):
        y_predict = self._model(images)

        loss = self.loss_func(y_predict, labels)
        # Zero the gradients first; otherwise they would accumulate across mini-batches
        self._optimizer.zero_grad()
        # Backpropagate to compute gradients
        loss.backward()
        # Update the parameters with the optimizer
        self._optimizer.step()
        # Extract the loss as a Python float
        loss = loss.item()

        # y_predict holds one score per class; taking the max over dim=1 returns
        # the highest score and its index (the predicted class) for each sample
        _, predicted = torch.max(y_predict.data, dim=1)
        return loss, predicted

    def _train_an_epoch(self, train_loader, epoch):
        """
        Train for one epoch, i.e. one full pass over the training set
        """
        # Switch to training mode (enables dropout and batch-norm updates)
        self._model.train()
        total = 0
        correct = 0

        # Iterate over mini-batches from the DataLoader
        for batch, (images, labels) in enumerate(train_loader):
            if self._config['use_cuda'] is True:
                images, labels = images.cuda(), labels.cuda()

            loss, predicted = self._train_single_batch(images, labels)

            # Accumulate accuracy statistics
            total += labels.size(0)
            correct += (predicted == labels.data).sum().item()

            # print('[Training Epoch: {}] Batch: {}, Loss: {}'.format(epoch, batch, loss))
        print('Training Epoch: {}, accuracy rate: {}%'.format(epoch + 1, correct / total * 100.0))

    def train(self, train_dataset):
        # Move the model to the GPU if requested
        self.use_cuda()
        for epoch in range(self._config['epoch']):
            print('-' * 20 + ' Epoch {} starts '.format(epoch) + '-' * 20)
            # Build the DataLoader
            data_loader = DataLoader(dataset=train_dataset, batch_size=self._config['batch_size'], shuffle=True)
            # Train for one epoch
            self._train_an_epoch(data_loader, epoch=epoch)

    # Move the model to the GPU; raises if CUDA is unavailable
    def use_cuda(self):
        if self._config['use_cuda'] is True:
            assert torch.cuda.is_available(), 'CUDA is not available'
            torch.cuda.set_device(self._config['device_id'])
            self._model.cuda()

    # Save the trained model
    def save(self):
        self._model.save_model()
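
The paper's training recipe divides the learning rate by 10 when the error plateaus. A minimal sketch of how such a step schedule could be bolted onto Train (torch.optim.lr_scheduler.MultiStepLR is standard PyTorch; the milestone epochs below are illustrative, not from the original post):

# Illustrative: drop the learning rate by 10x at epochs 8 and 12 of a 15-epoch run
from torch.optim.lr_scheduler import MultiStepLR

scheduler = MultiStepLR(trainer._optimizer, milestones=[8, 12], gamma=0.1)
# ... then call scheduler.step() once per epoch inside Train.train's loop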

6 Training


resnet_config = {
    'epoch': 15,                      # number of training epochs
    'batch_size': 64,                 # samples per mini-batch
    'lr': 1e-3,                       # learning rate
    'num_classes': 10,                # number of classes
    'device_id': 0,                   # GPU device id
    'use_cuda': True,                 # whether to use CUDA
    'momentum': 0.9,                  # SGD momentum
    'weight_decay': 1e-4,             # weight decay
    'model_name': './resnet18.model'  # file name for the saved model
}

if __name__ == "__main__":
    ####################################################################################
    # ResNet model
    ####################################################################################
    train_dataset, test_dataset = LoadCIFAR10(True)
    # define the ResNet-18 model
    resnet = ResNet18(ResidualBlock, resnet_config)

    ####################################################################################
    # Training stage
    ####################################################################################
    # Instantiate the trainer
    trainer = Train(model=resnet, config=resnet_config)
    # Train
    trainer.train(train_dataset)
    # Save the trained model
    trainer.save()

    ####################################################################################
    # Testing stage
    ####################################################################################
    resnet.load_model(map_location=torch.device('cpu'))
    resnet.eval()
    if resnet_config['use_cuda']:
        resnet = resnet.cuda()

    correct = 0
    total = 0
    # Predict every sample in the test set and accumulate the accuracy
    with torch.no_grad():
        for images, labels in Construct_DataLoader(test_dataset, resnet_config['batch_size']):
            if resnet_config['use_cuda']:
                images = images.cuda()
                labels = labels.cuda()

            y_pred = resnet(images)
            _, predicted = torch.max(y_pred.data, 1)
            total += labels.size(0)
            correct += (predicted == labels.data).sum().item()
    print('Accuracy of the model on the test images: %.2f%%' % (100.0 * correct / total))
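
A common follow-up (an added sketch, not part of the original post) is to report per-class accuracy; it reuses resnet, test_dataset, and resnet_config from above, and the class names follow the standard CIFAR-10 ordering:

# Per-class accuracy on the test set
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
class_correct = [0] * 10
class_total = [0] * 10
with torch.no_grad():
    for images, labels in Construct_DataLoader(test_dataset, resnet_config['batch_size']):
        if resnet_config['use_cuda']:
            images, labels = images.cuda(), labels.cuda()
        _, predicted = torch.max(resnet(images).data, 1)
        for label, pred in zip(labels, predicted):
            class_total[label] += 1
            class_correct[label] += int(label == pred)
for i in range(10):
    print('%5s: %.2f%%' % (classes[i], 100.0 * class_correct[i] / max(class_total[i], 1)))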
