faster_pytorch Training Pipeline: Code Walkthrough

A detailed walkthrough of the Faster R-CNN training code, best read alongside the faster_pytorch framework.

Faster R-CNN Training Pipeline

1. Load the data, the config file, and the network

```python
# load data, cfg, net
```

2. Set up the optimizer

```python
optimizer = torch.optim.Adam(params[-8:], lr=lr)
```
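Here `params` is presumably the flat list of the network's parameter tensors, so slicing `[-8:]` optimizes only the last eight tensors (the weights and biases of the head layers) and leaves the earlier backbone weights frozen. A minimal sketch, assuming `params = list(net.parameters())`:

```python
# hedged sketch: optimize only the last 8 parameter tensors;
# everything earlier in the backbone receives no updates
params = list(net.parameters())
optimizer = torch.optim.Adam(params[-8:], lr=lr)
```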

3. Initialize the training loop

```python
train_loss = 0
tp, tf, fg, bg = 0., 0., 0, 0
step_cnt = 0
for step in range(start_step, end_step + 1):  # e.g. 0 .. 100000
    # get one batch
    blobs = data_layer.forward()
    im_data = blobs['data']
    im_info = blobs['im_info']
    gt_boxes = blobs['gt_boxes']
    gt_ishard = blobs['gt_ishard']
    dontcare_areas = blobs['dontcare_areas']
```

4. Train

```python
net(im_data, im_info, gt_boxes, gt_ishard, dontcare_areas)
# im_data: shape (1, 598, 1000, 3)
# im_info: [598., 1000., 1.28369701]  (height, width, scale)
# gt_boxes: shape (2, 5)  (x1, y1, x2, y2, class)
# gt_ishard: [0, 0]
```
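The forward pass stores the RPN and detection losses on the network object; the rest of the training step then looks roughly like the following sketch, assembled from the total-loss line quoted at the end of this walkthrough plus standard (assumed) optimizer boilerplate:

```python
loss = net.loss + net.rpn.loss   # total loss, as computed at the end of this walkthrough

optimizer.zero_grad()            # standard PyTorch update step (assumed boilerplate)
loss.backward()
optimizer.step()

train_loss += loss.data[0]       # old-style (pre-0.4) scalar access, matching the framework
step_cnt += 1
```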

Relevant entries from fastrcnn/cfg:

RPN:

```python
# Max number of foreground examples
__C.TRAIN.RPN_FG_FRACTION = 0.5
# Total number of examples
__C.TRAIN.RPN_BATCHSIZE = 256
# Use RPN to detect objects
__C.TRAIN.HAS_RPN = True
# IOU >= thresh: positive example
__C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7
# IOU < thresh: negative example
__C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
```

Fast RCNN:

```python
# Minibatch size (number of regions of interest [ROIs])
__C.TRAIN.BATCH_SIZE = 128

# Fraction of minibatch that is labeled foreground (i.e. class > 0)
__C.TRAIN.FG_FRACTION = 0.25

# Overlap threshold for a ROI to be considered foreground (if >= FG_THRESH)
__C.TRAIN.FG_THRESH = 0.5
__C.TRAIN.BG_THRESH_HI = 0.5
__C.TRAIN.BG_THRESH_LO = 0.1
```
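As a worked example of these settings: the RPN minibatch samples at most 256 × 0.5 = 128 foreground anchors per image, and the Fast R-CNN head samples at most 128 × 0.25 = 32 foreground RoIs; the rest of each minibatch is filled with background samples.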

The forward pass breaks down into the following steps:

RPN

```python
# compute the backbone feature map
features = self.features(im_data)  # [1, 512, 12, 62]
# RPN sliding window (conv on the feature map)
rpn_conv1 = self.conv1(features)  # [1, 512, 12, 62]
# RPN classification scores (2 per anchor, 9 anchors per location) and RPN boxes
rpn_cls_score = self.score_conv(rpn_conv1)  # [1, 18, 12, 62]
rpn_cls_score_reshape = self.reshape_layer(rpn_cls_score, 2)  # [1, 2, 108, 62]
rpn_cls_prob = F.softmax(rpn_cls_score_reshape, dim=1)  # [1, 2, 108, 62]
rpn_cls_prob_reshape = self.reshape_layer(rpn_cls_prob, len(self.anchor_scales) * 3 * 2)  # [1, 18, 12, 62]
# RPN box regression deltas (4 per anchor)
rpn_bbox_pred = self.bbox_conv(rpn_conv1)  # [1, 36, 12, 62]
```
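`reshape_layer` only reshuffles the tensor so that softmax can be taken over a 2-channel (bg/fg) axis. A sketch consistent with the shape annotations above:

```python
@staticmethod
def reshape_layer(x, d):
    # (N, C, H, W) -> (N, d, C*H/d, W): fold the channel dimension down to d
    # so that F.softmax(..., dim=1) normalizes over the bg/fg pair per anchor
    n, c, h, w = x.size()
    return x.view(n, int(d), int(c * h / d), w)

# e.g. [1, 18, 12, 62] -> [1, 2, 108, 62] with d=2, and back again with d=18
```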

proposal layer

```python
rois = self.proposal_layer(...)  # internally:
# 1. generate the base anchors
_anchors = generate_anchors()
# shift the base anchors to every feature-map location,
# producing (w/16) * (h/16) * 9 anchors in total
anchors = _anchors.reshape((1, A, 4)) + \
          shifts.reshape((1, K, 4)).transpose((1, 0, 2))
# 2. generate proposals from bbox deltas and shifted anchors
proposals = bbox_transform_inv(anchors, bbox_deltas)
# 3. clip predicted boxes to the image
proposals = clip_boxes(proposals, im_info[:2])
# 4. remove predicted boxes with either height or width < threshold
keep = _filter_boxes(proposals, min_size * im_info[2])
proposals = proposals[keep, :]
scores = scores[keep]
# 5. sort all (proposal, score) pairs by score from highest to lowest
order = scores.ravel().argsort()[::-1]
# 6. take top pre_nms_topN (e.g. 6000)
order = order[:pre_nms_topN]  # e.g. 6409 boxes -> 6000 boxes
proposals = proposals[order, :]  # 6000 boxes (6000 x 4 = 24000 values)
scores = scores[order]
# 7. apply NMS (e.g. threshold = 0.7)
keep = nms(np.hstack((proposals, scores)), nms_thresh)  # e.g. 1344 boxes survive
# 8. take after_nms_topN (e.g. 300) and return the top proposals (-> RoIs)
keep = keep[:post_nms_topN]  # the 300 highest-scoring
proposals = proposals[keep, :]  # 300 boxes (300 x 4 = 1200 values)
scores = scores[keep]
```
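`bbox_transform_inv` applies the predicted deltas $(t_x, t_y, t_w, t_h)$ to the anchors; it is the inverse of the `bbox_transform` function shown later. A sketch following the standard py-faster-rcnn implementation:

```python
import numpy as np

def bbox_transform_inv(boxes, deltas):
    # anchor widths, heights, and centers
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    dx = deltas[:, 0::4]
    dy = deltas[:, 1::4]
    dw = deltas[:, 2::4]
    dh = deltas[:, 3::4]

    # shift the centers and rescale the sizes
    pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
    pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
    pred_w = np.exp(dw) * widths[:, np.newaxis]
    pred_h = np.exp(dh) * heights[:, np.newaxis]

    # back to (x1, y1, x2, y2)
    pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
    pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
    pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
    pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
    pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
    return pred_boxes
```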

anchor target layer

Generate anchors;

Assign anchors to ground-truth boxes:

  1. Generate all base anchors, i.e. $w/16 \times h/16 \times 9$ of them (e.g. 62x37x9 = 20646).
  2. Discard anchors that cross the image boundary (e.g. 7858 remain).
  3. For each anchor, record the index of the gt_box with the highest overlap and that maximum overlap value (one value per anchor).
  4. For each gt_box, record the index of the anchor with the highest overlap and that maximum overlap value; there may be more indices than gt_boxes, since several anchors can tie for the same overlap.

Label each anchor as foreground, background, or don't-care:

  1. bg: the anchor's maximum overlap with every gt_box is below the 0.3 threshold:

     ```python
     labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
     ```

  2. fg: for each GT, the anchor with the highest overlap; plus every anchor whose maximum overlap with a gt_box reaches the 0.7 threshold:

     ```python
     labels[gt_argmax_overlaps] = 1
     # fg label: above threshold IOU
     labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
     ```

The positive and negative samples are then strictly subsampled to a 1:1 ratio (128 fg, 128 bg), as in the sketch below.
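A sketch of the subsampling, following the standard py-faster-rcnn anchor_target_layer (`npr` is `numpy.random`):

```python
import numpy as np
import numpy.random as npr

# subsample positive labels if we have too many
num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # 0.5 * 256 = 128
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) > num_fg:
    disable_inds = npr.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)
    labels[disable_inds] = -1  # -1 means don't care: excluded from the loss

# subsample negative labels if we have too many
num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
    disable_inds = npr.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)
    labels[disable_inds] = -1
```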

Bounding-box regression targets

  1. For each anchor, compute the offsets to its highest-overlap ground-truth box, i.e. $t_x^*, t_y^*, t_w^*, t_h^*$, which the RPN regression branch learns to predict. The computation lives in the bbox_transform function:

     ```python
     targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
     targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
     targets_dw = np.log(gt_widths / ex_widths)
     targets_dh = np.log(gt_heights / ex_heights)
     ```

  2. Map the inside-image subset of anchors back onto the full set of anchors (see the _unmap sketch below). Return the per-anchor labels and regression targets to the RPN.
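A sketch of the unmapping step, following the standard py-faster-rcnn helper: values computed for the inside-image anchors are scattered back into arrays covering all anchors, with a don't-care fill everywhere else:

```python
def _unmap(data, count, inds, fill=0):
    """Unmap a subset of items (data) back to the original set of size count."""
    if len(data.shape) == 1:
        ret = np.empty((count,), dtype=np.float32)
        ret.fill(fill)
        ret[inds] = data
    else:
        ret = np.empty((count,) + data.shape[1:], dtype=np.float32)
        ret.fill(fill)
        ret[inds, :] = data
    return ret

labels = _unmap(labels, total_anchors, inds_inside, fill=-1)  # fill=-1: don't care
bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
```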

RPN loss

Classification uses a cross-entropy loss; box regression uses a smooth L1 loss.

```python
def build_loss(self, rpn_cls_score_reshape, rpn_bbox_pred, rpn_data):
    # classification loss
    rpn_cls_score = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous().view(-1, 2)
    rpn_label = rpn_data[0].view(-1)

    # keep only anchors labeled 0 or 1; label -1 (don't care) is excluded
    rpn_keep = Variable(rpn_label.data.ne(-1).nonzero().squeeze()).cuda()
    rpn_cls_score = torch.index_select(rpn_cls_score, 0, rpn_keep)
    rpn_label = torch.index_select(rpn_label, 0, rpn_keep)

    fg_cnt = torch.sum(rpn_label.data.ne(0))

    rpn_cross_entropy = F.cross_entropy(rpn_cls_score, rpn_label)

    # box loss: the inside weights zero out the non-foreground anchors
    rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = rpn_data[1:]
    rpn_bbox_targets = torch.mul(rpn_bbox_targets, rpn_bbox_inside_weights)
    rpn_bbox_pred = torch.mul(rpn_bbox_pred, rpn_bbox_inside_weights)

    rpn_loss_box = F.smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets,
                                    size_average=False) / (fg_cnt + 1e-4)

    return rpn_cross_entropy, rpn_loss_box
```

proposal target layer:

The proposal target layer appends the ground-truth boxes to the RoI list. Appended unmodified, they would produce a regression loss of exactly zero, so they are jittered (perturbed slightly) first; see the sketch below.
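A sketch of the jitter, assuming a helper along the lines of the framework's `_jitter_gt_boxes`: each gt box is shifted by a small random fraction of its own width and height:

```python
import numpy as np

def _jitter_gt_boxes(gt_boxes, jitter=0.05):
    """Randomly shift each gt box by up to +/- jitter/2 of its width/height."""
    jittered = gt_boxes.copy()
    ws = jittered[:, 2] - jittered[:, 0] + 1.0
    hs = jittered[:, 3] - jittered[:, 1] + 1.0
    x_off = (np.random.rand(len(jittered)) - 0.5) * jitter * ws
    y_off = (np.random.rand(len(jittered)) - 0.5) * jitter * hs
    jittered[:, 0] += x_off
    jittered[:, 2] += x_off
    jittered[:, 1] += y_off
    jittered[:, 3] += y_off
    return jittered
```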

1. Assign labels to the proposals output by the RPN:

```python
# Select foreground RoIs as those with >= FG_THRESH overlap
# (i.e. find the RoIs whose IoU with a gt box is >= 0.5 and keep their indices)
fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                   (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
```

2. Control the number of foreground and background RoIs in the minibatch:

```python
# Guard against the case when an image has fewer than fg_rois_per_image
# foreground RoIs (with the config above: at most 0.25 * 128 = 32 fg RoIs per image)
num_images = 1
rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images  # 128
fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))  # 32
fg_rois_per_this_image = min(fg_rois_per_image, fg_inds.size)  # e.g. min(32, 15)

# Compute number of background RoIs to take from this image (guarding
# against there being fewer than desired)
bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
```
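The indices are then sampled without replacement and concatenated, and the background RoIs get class label 0 (a sketch following the standard `_sample_rois` logic, with `npr = numpy.random`):

```python
fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False)
bg_inds = npr.choice(bg_inds, size=bg_rois_per_this_image, replace=False)
keep_inds = np.append(fg_inds, bg_inds)

labels = labels[keep_inds]
labels[fg_rois_per_this_image:] = 0  # everything after the fg block is background
rois = all_rois[keep_inds]
```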

No don't-care (dontcare_areas) samples are handled at this point.

3. Compute the offsets between the proposals and the ground-truth boxes, which the final layer of the network learns to regress.

First, compute the regression coefficients $t_x, t_y, t_w, t_h$ between each RoI and its best-matching GT; the result has shape [300, 5]:

```python
# inputs: the RoIs' (x1, y1, x2, y2), the best-matching GTs' (x1, y1, x2, y2),
# and the labels; returns [label, dx, dy, dw, dh] of shape (len(rois), 5) = [300, 5]
bbox_target_data = _compute_targets(
    rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)
```

That is:

```python
def _compute_targets(ex_rois, gt_rois, labels):
    """Compute bounding-box regression targets for an image."""

    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 4

    targets = bbox_transform(ex_rois, gt_rois)  # [300, 4]
    if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
        # Optionally normalize targets by a precomputed mean and stdev
        targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
                   / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
    return np.hstack(
        (labels[:, np.newaxis], targets)).astype(np.float32, copy=False)
```

Then expand the targets into a per-class representation:

```python
bbox_targets, bbox_inside_weights = \
    _get_bbox_regression_labels(bbox_target_data, num_classes)
```

That is:

```python
def _get_bbox_regression_labels(bbox_target_data, num_classes):
    """Bounding-box regression targets (bbox_target_data) are stored in a
    compact form N x (class, tx, ty, tw, th)

    This function expands those targets into the 4-of-4*K representation used
    by the network (i.e. only one class has non-zero targets).
    Expanded to 300 x 4K: each RoI writes its regression coefficients into the
    slot belonging to its own class.
    Returns:
        bbox_target (ndarray): N x 4K blob of regression targets
        bbox_inside_weights (ndarray): N x 4K blob of loss weights
    """

    clss = bbox_target_data[:, 0]  # 300 labels; bbox_target_data: [300, 5]
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)  # [300, 8] when num_classes = 2
    bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    inds = np.where(clss > 0)[0]  # e.g. 15 positives
    for ind in inds:
        cls = int(clss[ind])  # class of the ind-th RoI
        start = 4 * cls       # where this class's coefficients go
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights
```
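A small worked example: with num_classes = 2 (background plus one object class), a foreground RoI whose compact row is [1, dx, dy, dw, dh] expands to:

```python
# bbox_targets[ind]        = [0, 0, 0, 0, dx, dy, dw, dh]   # columns 4:8 belong to class 1
# bbox_inside_weights[ind] = [0, 0, 0, 0, 1., 1., 1., 1.]   # only class-1 slots enter the loss
```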

ROIPooling

Pooling:

```python
self.roi_pool = RoIPool(7, 7, 1.0 / 16)
pooled_features = self.roi_pool(features, rois)  # [300, 512, 7, 7]
x = pooled_features.view(pooled_features.size()[0], -1)  # [300, 25088]
```
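The spatial scale 1/16 maps image coordinates onto the stride-16 conv feature map. A worked example with hypothetical numbers:

```python
# an RoI (x1, y1, x2, y2) = (160, 96, 480, 320) in image coordinates maps to
# (10, 6, 30, 20) on the feature map (multiply by 1/16); that 21x15 region is
# divided into a 7x7 grid and max-pooled per cell -> one 512x7x7 feature per RoI,
# flattened to 512 * 7 * 7 = 25088 values
```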

Computing the classification-head losses

The pooled features pass through two fully connected layers, producing the predicted class scores and the bounding-box regression coefficients.
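A minimal sketch of this step, assuming the two heads are named `score_fc` and `bbox_fc` (hypothetical layer names):

```python
cls_score = self.score_fc(x)   # [300, num_classes]       (hypothetical layer name)
bbox_pred = self.bbox_fc(x)    # [300, 4 * num_classes]   (hypothetical layer name)
```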


Computing the classification loss

```python
def build_loss(self, cls_score, bbox_pred, roi_data):
    # classification loss
    label = roi_data[1].squeeze()  # 300 labels
    # ce_weights re-balances the classes; it is computed elsewhere in the framework
    cross_entropy = F.cross_entropy(cls_score, label, weight=ce_weights)
```

Why does the cross-entropy computation here receive cls_score (the input to softmax) rather than cls_prob (the output of the softmax layer)? See the appendix on the cross-entropy implementation.

Computing the regression loss

```python
# bounding box regression L1 loss
bbox_targets, bbox_inside_weights, bbox_outside_weights = roi_data[2:]
bbox_targets = torch.mul(bbox_targets, bbox_inside_weights)
bbox_pred = torch.mul(bbox_pred, bbox_inside_weights)

loss_box = F.smooth_l1_loss(bbox_pred, bbox_targets, size_average=False) / (fg_cnt + 1e-4)
```

Computing the total loss:

```python
loss = net.loss + net.rpn.loss
```

Appendix: how the cross-entropy loss is implemented

PyTorch computes CrossEntropyLoss in two steps: first a log softmax, then the cross entropy proper (i.e. the negative log likelihood). With CrossEntropyLoss there is no need to append softmax and log layers to the network; the raw fully connected outputs are fed in directly. With NLLLoss, by contrast, the network itself must end with a log-softmax layer.
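A minimal sketch demonstrating the equivalence (modern PyTorch API):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)              # raw FC outputs, N x C = 3 x 5
target = torch.tensor([1, 0, 4])        # class indices in [0, C)

loss_ce = F.cross_entropy(logits, target)
loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
assert torch.allclose(loss_ce, loss_nll)  # identical by construction
```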

log softmax:

```python
def log_softmax(input, dim=None, _stacklevel=3):
    r"""Applies a softmax followed by a logarithm.

    While mathematically equivalent to log(softmax(x)), doing these two
    operations separately is slower, and numerically unstable. This function
    uses an alternative formulation to compute the output and gradient correctly.

    See :class:`~torch.nn.LogSoftmax` for more details.

    Arguments:
        input (Variable): input
        dim (int): A dimension along which log_softmax will be computed.
    """
    if dim is None:
        dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
    return torch._C._nn.log_softmax(input, dim)
```
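The "alternative formulation" is the usual log-sum-exp trick: $\log \mathrm{softmax}(x)_i = x_i - \log \sum_j e^{x_j}$, computed after subtracting the maximum. A sketch:

```python
import torch

def log_softmax_stable(x, dim=-1):
    # log(softmax(x)) = x - logsumexp(x); shifting by the max keeps exp() finite
    m = x.max(dim=dim, keepdim=True).values
    return x - m - (x - m).exp().sum(dim=dim, keepdim=True).log()

x = torch.tensor([[1000.0, 0.0]])       # naive log(exp(x)/sum) would overflow
print(log_softmax_stable(x))            # tensor([[    0., -1000.]])
print(torch.log_softmax(x, dim=-1))     # same result
```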

NLL loss:

```python
def nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True):
    r"""The negative log likelihood loss.

    See :class:`~torch.nn.NLLLoss` for details.

    Args:
        input: :math:`(N, C)` where `C = number of classes` or :math:`(N, C, H, W)`
            in case of 2D Loss, or :math:`(N, C, d_1, d_2, ..., d_K)` where :math:`K > 1`
            in the case of K-dimensional loss.
        target: :math:`(N)` where each value is `0 <= targets[i] <= C-1`,
            or :math:`(N, d_1, d_2, ..., d_K)` where :math:`K >= 1` for
            K-dimensional loss.
        weight (Tensor, optional): a manual rescaling weight given to each
            class. If given, has to be a Tensor of size `C`
        size_average (bool, optional): By default, the losses are averaged
            over observations for each minibatch. If size_average
            is False, the losses are summed for each minibatch. Default: ``True``
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When size_average is
            True, the loss is averaged over non-ignored targets. Default: -100

    Example::

        >>> # input is of size N x C = 3 x 5
        >>> input = autograd.Variable(torch.randn(3, 5))
        >>> # each element in target has to have 0 <= value < C
        >>> target = autograd.Variable(torch.LongTensor([1, 0, 4]))
        >>> output = F.nll_loss(F.log_softmax(input), target)
        >>> output.backward()
    """
    dim = input.dim()
    if torch.is_tensor(weight):
        weight = Variable(weight)
    if dim == 2:
        return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
    elif dim == 4:
        return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
    elif dim == 3 or dim > 4:
        # flatten the trailing spatial dims and fall back to the 2D kernel
        n = input.size(0)
        c = input.size(1)
        out_size = (n,) + input.size()[2:]
        if target.size()[1:] != input.size()[2:]:
            raise ValueError('Expected target size {}, got {}'.format(
                out_size, target.size()))
        input = input.contiguous().view(n, c, 1, -1)
        target = target.contiguous().view(n, 1, -1)
        if reduce:
            return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
        out = torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
        return out.view(out_size)
    else:
        raise ValueError('Expected 2 or more dimensions (got {})'.format(dim))
```