faster_pytorch Training Pipeline: Code Walkthrough

A detailed walkthrough of the Faster R-CNN training code, best read alongside the faster_pytorch framework.

Faster R-CNN Training Pipeline

1. Load the data, the config file, and the network

```python
# load data, cfg, net
```

2. Set up the optimizer

```python
optimizer = torch.optim.Adam(params[-8:], lr=lr)
```
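Here `params` is presumably the flat list of the network's parameter tensors, so slicing `[-8:]` optimizes only the last eight tensors (the weights and biases of the head layers) and leaves the earlier backbone weights frozen. A minimal sketch, assuming `params = list(net.parameters())`:

```python
# hedged sketch: optimize only the last 8 parameter tensors;
# everything earlier in the backbone receives no updates
params = list(net.parameters())
optimizer = torch.optim.Adam(params[-8:], lr=lr)
```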

3. Initialize the training loop

```python
train_loss = 0
tp, tf, fg, bg = 0., 0., 0, 0
step_cnt = 0
for step in range(start_step, end_step + 1):  # e.g. 0 .. 100000
    # get one batch
    blobs = data_layer.forward()
    im_data = blobs['data']
    im_info = blobs['im_info']
    gt_boxes = blobs['gt_boxes']
    gt_ishard = blobs['gt_ishard']
    dontcare_areas = blobs['dontcare_areas']
```

4. Train

```python
net(im_data, im_info, gt_boxes, gt_ishard, dontcare_areas)
# im_data: shape (1, 598, 1000, 3)
# im_info: [598., 1000., 1.28369701]  (height, width, scale)
# gt_boxes: shape (2, 5)  (x1, y1, x2, y2, class)
# gt_ishard: [0, 0]
```
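The forward pass stores the RPN and detection losses on the network object; the rest of the training step then looks roughly like the following sketch, assembled from the total-loss line quoted at the end of this walkthrough plus standard (assumed) optimizer boilerplate:

```python
loss = net.loss + net.rpn.loss   # total loss, as computed at the end of this walkthrough

optimizer.zero_grad()            # standard PyTorch update step (assumed boilerplate)
loss.backward()
optimizer.step()

train_loss += loss.data[0]       # old-style (pre-0.4) scalar access, matching the framework
step_cnt += 1
```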

Relevant entries from fastrcnn/cfg:

RPN:

```python
# Max number of foreground examples
__C.TRAIN.RPN_FG_FRACTION = 0.5
# Total number of examples
__C.TRAIN.RPN_BATCHSIZE = 256
# Use RPN to detect objects
__C.TRAIN.HAS_RPN = True
# IOU >= thresh: positive example
__C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7
# IOU < thresh: negative example
__C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
```

Fast RCNN:

```python
# Minibatch size (number of regions of interest [ROIs])
__C.TRAIN.BATCH_SIZE = 128

# Fraction of minibatch that is labeled foreground (i.e. class > 0)
__C.TRAIN.FG_FRACTION = 0.25

# Overlap threshold for a ROI to be considered foreground (if >= FG_THRESH)
__C.TRAIN.FG_THRESH = 0.5
__C.TRAIN.BG_THRESH_HI = 0.5
__C.TRAIN.BG_THRESH_LO = 0.1
```
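As a worked example of these settings: the RPN minibatch samples at most 256 × 0.5 = 128 foreground anchors per image, and the Fast R-CNN head samples at most 128 × 0.25 = 32 foreground RoIs; the rest of each minibatch is filled with background samples.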

The forward pass breaks down into the following steps:

RPN

```python
# compute the backbone feature map
features = self.features(im_data)  # [1, 512, 12, 62]
# RPN sliding window (conv on the feature map)
rpn_conv1 = self.conv1(features)  # [1, 512, 12, 62]
# RPN classification scores (2 per anchor, 9 anchors per location) and RPN boxes
rpn_cls_score = self.score_conv(rpn_conv1)  # [1, 18, 12, 62]
rpn_cls_score_reshape = self.reshape_layer(rpn_cls_score, 2)  # [1, 2, 108, 62]
rpn_cls_prob = F.softmax(rpn_cls_score_reshape, dim=1)  # [1, 2, 108, 62]
rpn_cls_prob_reshape = self.reshape_layer(rpn_cls_prob, len(self.anchor_scales) * 3 * 2)  # [1, 18, 12, 62]
# RPN box regression deltas (4 per anchor)
rpn_bbox_pred = self.bbox_conv(rpn_conv1)  # [1, 36, 12, 62]
```
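`reshape_layer` only reshuffles the tensor so that softmax can be taken over a 2-channel (bg/fg) axis. A sketch consistent with the shape annotations above:

```python
@staticmethod
def reshape_layer(x, d):
    # (N, C, H, W) -> (N, d, C*H/d, W): fold the channel dimension down to d
    # so that F.softmax(..., dim=1) normalizes over the bg/fg pair per anchor
    n, c, h, w = x.size()
    return x.view(n, int(d), int(c * h / d), w)

# e.g. [1, 18, 12, 62] -> [1, 2, 108, 62] with d=2, and back again with d=18
```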

proposal layer

```python
rois = self.proposal_layer(...)  # internally:
# 1. generate the base anchors
_anchors = generate_anchors()
# shift the base anchors to every feature-map location,
# producing (w/16) * (h/16) * 9 anchors in total
anchors = _anchors.reshape((1, A, 4)) + \
          shifts.reshape((1, K, 4)).transpose((1, 0, 2))
# 2. generate proposals from bbox deltas and shifted anchors
proposals = bbox_transform_inv(anchors, bbox_deltas)
# 3. clip predicted boxes to the image
proposals = clip_boxes(proposals, im_info[:2])
# 4. remove predicted boxes with either height or width < threshold
keep = _filter_boxes(proposals, min_size * im_info[2])
proposals = proposals[keep, :]
scores = scores[keep]
# 5. sort all (proposal, score) pairs by score from highest to lowest
order = scores.ravel().argsort()[::-1]
# 6. take top pre_nms_topN (e.g. 6000)
order = order[:pre_nms_topN]  # e.g. 6409 boxes -> 6000 boxes
proposals = proposals[order, :]  # 6000 boxes (6000 x 4 = 24000 values)
scores = scores[order]
# 7. apply NMS (e.g. threshold = 0.7)
keep = nms(np.hstack((proposals, scores)), nms_thresh)  # e.g. 1344 boxes survive
# 8. take after_nms_topN (e.g. 300) and return the top proposals (-> RoIs)
keep = keep[:post_nms_topN]  # the 300 highest-scoring
proposals = proposals[keep, :]  # 300 boxes (300 x 4 = 1200 values)
scores = scores[keep]
```
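`bbox_transform_inv` applies the predicted deltas $(t_x, t_y, t_w, t_h)$ to the anchors; it is the inverse of the `bbox_transform` function shown later. A sketch following the standard py-faster-rcnn implementation:

```python
import numpy as np

def bbox_transform_inv(boxes, deltas):
    # anchor widths, heights, and centers
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    dx = deltas[:, 0::4]
    dy = deltas[:, 1::4]
    dw = deltas[:, 2::4]
    dh = deltas[:, 3::4]

    # shift the centers and rescale the sizes
    pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
    pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
    pred_w = np.exp(dw) * widths[:, np.newaxis]
    pred_h = np.exp(dh) * heights[:, np.newaxis]

    # back to (x1, y1, x2, y2)
    pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
    pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
    pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
    pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
    pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
    return pred_boxes
```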

anchor target layer

Generate anchors;

Assign anchors to ground-truth boxes:

  1. Generate all base anchors, i.e. $w/16 \times h/16 \times 9$ of them (e.g. 62x37x9 = 20646).
  2. Discard anchors that cross the image boundary (e.g. 7858 remain).
  3. For each anchor, record the index of the gt_box with the highest overlap and that maximum overlap value (one value per anchor).
  4. For each gt_box, record the index of the anchor with the highest overlap and that maximum overlap value; there may be more indices than gt_boxes, since several anchors can tie for the same overlap.

Label each anchor as foreground, background, or don't-care:

  1. bg: the anchor's maximum overlap with every gt_box is below the 0.3 threshold:

     ```python
     labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
     ```

  2. fg: for each GT, the anchor with the highest overlap; plus every anchor whose maximum overlap with a gt_box reaches the 0.7 threshold:

     ```python
     labels[gt_argmax_overlaps] = 1
     # fg label: above threshold IOU
     labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
     ```

The positive and negative samples are then strictly subsampled to a 1:1 ratio (128 fg, 128 bg), as in the sketch below.
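A sketch of the subsampling, following the standard py-faster-rcnn anchor_target_layer (`npr` is `numpy.random`):

```python
import numpy as np
import numpy.random as npr

# subsample positive labels if we have too many
num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)  # 0.5 * 256 = 128
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) > num_fg:
    disable_inds = npr.choice(fg_inds, size=len(fg_inds) - num_fg, replace=False)
    labels[disable_inds] = -1  # -1 means don't care: excluded from the loss

# subsample negative labels if we have too many
num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
    disable_inds = npr.choice(bg_inds, size=len(bg_inds) - num_bg, replace=False)
    labels[disable_inds] = -1
```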

Bounding-box regression targets

  1. For each anchor, compute the offsets to its highest-overlap ground-truth box, i.e. $t_x^*, t_y^*, t_w^*, t_h^*$, which the RPN regression branch learns to predict. The computation lives in the bbox_transform function:

     ```python
     targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
     targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
     targets_dw = np.log(gt_widths / ex_widths)
     targets_dh = np.log(gt_heights / ex_heights)
     ```

  2. Map the inside-image subset of anchors back onto the full set of anchors (see the _unmap sketch below). Return the per-anchor labels and regression targets to the RPN.
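A sketch of the unmapping step, following the standard py-faster-rcnn helper: values computed for the inside-image anchors are scattered back into arrays covering all anchors, with a don't-care fill everywhere else:

```python
def _unmap(data, count, inds, fill=0):
    """Unmap a subset of items (data) back to the original set of size count."""
    if len(data.shape) == 1:
        ret = np.empty((count,), dtype=np.float32)
        ret.fill(fill)
        ret[inds] = data
    else:
        ret = np.empty((count,) + data.shape[1:], dtype=np.float32)
        ret.fill(fill)
        ret[inds, :] = data
    return ret

labels = _unmap(labels, total_anchors, inds_inside, fill=-1)  # fill=-1: don't care
bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
```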

RPN loss

Classification uses a cross-entropy loss; box regression uses a smooth L1 loss.

```python
def build_loss(self, rpn_cls_score_reshape, rpn_bbox_pred, rpn_data):
    # classification loss
    rpn_cls_score = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous().view(-1, 2)
    rpn_label = rpn_data[0].view(-1)

    # keep only anchors labeled 0 or 1; label -1 (don't care) is excluded
    rpn_keep = Variable(rpn_label.data.ne(-1).nonzero().squeeze()).cuda()
    rpn_cls_score = torch.index_select(rpn_cls_score, 0, rpn_keep)
    rpn_label = torch.index_select(rpn_label, 0, rpn_keep)

    fg_cnt = torch.sum(rpn_label.data.ne(0))

    rpn_cross_entropy = F.cross_entropy(rpn_cls_score, rpn_label)

    # box loss: the inside weights zero out the non-foreground anchors
    rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = rpn_data[1:]
    rpn_bbox_targets = torch.mul(rpn_bbox_targets, rpn_bbox_inside_weights)
    rpn_bbox_pred = torch.mul(rpn_bbox_pred, rpn_bbox_inside_weights)

    rpn_loss_box = F.smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets,
                                    size_average=False) / (fg_cnt + 1e-4)

    return rpn_cross_entropy, rpn_loss_box
```

proposal target layer:

The proposal target layer appends the ground-truth boxes to the RoI list. Appended unmodified, they would produce a regression loss of exactly zero, so they are jittered (perturbed slightly) first; see the sketch below.
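A sketch of the jitter, assuming a helper along the lines of the framework's `_jitter_gt_boxes`: each gt box is shifted by a small random fraction of its own width and height:

```python
import numpy as np

def _jitter_gt_boxes(gt_boxes, jitter=0.05):
    """Randomly shift each gt box by up to +/- jitter/2 of its width/height."""
    jittered = gt_boxes.copy()
    ws = jittered[:, 2] - jittered[:, 0] + 1.0
    hs = jittered[:, 3] - jittered[:, 1] + 1.0
    x_off = (np.random.rand(len(jittered)) - 0.5) * jitter * ws
    y_off = (np.random.rand(len(jittered)) - 0.5) * jitter * hs
    jittered[:, 0] += x_off
    jittered[:, 2] += x_off
    jittered[:, 1] += y_off
    jittered[:, 3] += y_off
    return jittered
```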

1. Assign labels to the proposals output by the RPN:

```python
# Select foreground RoIs as those with >= FG_THRESH overlap
# (i.e. find the RoIs whose IoU with a gt box is >= 0.5 and keep their indices)
fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                   (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
```

2. Control the number of foreground and background RoIs in the minibatch:

```python
# Guard against the case when an image has fewer than fg_rois_per_image
# foreground RoIs (with the config above: at most 0.25 * 128 = 32 fg RoIs per image)
num_images = 1
rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images  # 128
fg_rois_per_image = int(np.round(cfg.TRAIN.FG_FRACTION * rois_per_image))  # 32
fg_rois_per_this_image = min(fg_rois_per_image, fg_inds.size)  # e.g. min(32, 15)

# Compute number of background RoIs to take from this image (guarding
# against there being fewer than desired)
bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size)
```
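The indices are then sampled without replacement and concatenated, and the background RoIs get class label 0 (a sketch following the standard `_sample_rois` logic, with `npr = numpy.random`):

```python
fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False)
bg_inds = npr.choice(bg_inds, size=bg_rois_per_this_image, replace=False)
keep_inds = np.append(fg_inds, bg_inds)

labels = labels[keep_inds]
labels[fg_rois_per_this_image:] = 0  # everything after the fg block is background
rois = all_rois[keep_inds]
```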

No don't-care (dontcare_areas) samples are handled at this point.

3. Compute the offsets between the proposals and the ground-truth boxes, which the final layer of the network learns to regress.

First, compute the regression coefficients $t_x, t_y, t_w, t_h$ between each RoI and its best-matching GT; the result has shape [300, 5]:

```python
# inputs: the RoIs' (x1, y1, x2, y2), the best-matching GTs' (x1, y1, x2, y2),
# and the labels; returns [label, dx, dy, dw, dh] of shape (len(rois), 5) = [300, 5]
bbox_target_data = _compute_targets(
    rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)
```

That is:

```python
def _compute_targets(ex_rois, gt_rois, labels):
    """Compute bounding-box regression targets for an image."""

    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 4

    targets = bbox_transform(ex_rois, gt_rois)  # [300, 4]
    if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
        # Optionally normalize targets by a precomputed mean and stdev
        targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
                   / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
    return np.hstack(
        (labels[:, np.newaxis], targets)).astype(np.float32, copy=False)
```

Then expand the targets into a per-class representation:

```python
bbox_targets, bbox_inside_weights = \
    _get_bbox_regression_labels(bbox_target_data, num_classes)
```

That is:

```python
def _get_bbox_regression_labels(bbox_target_data, num_classes):
    """Bounding-box regression targets (bbox_target_data) are stored in a
    compact form N x (class, tx, ty, tw, th)

    This function expands those targets into the 4-of-4*K representation used
    by the network (i.e. only one class has non-zero targets).
    Expanded to 300 x 4K: each RoI writes its regression coefficients into the
    slot belonging to its own class.
    Returns:
        bbox_target (ndarray): N x 4K blob of regression targets
        bbox_inside_weights (ndarray): N x 4K blob of loss weights
    """

    clss = bbox_target_data[:, 0]  # 300 labels; bbox_target_data: [300, 5]
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)  # [300, 8] when num_classes = 2
    bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    inds = np.where(clss > 0)[0]  # e.g. 15 positives
    for ind in inds:
        cls = int(clss[ind])  # class of the ind-th RoI
        start = 4 * cls       # where this class's coefficients go
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
    return bbox_targets, bbox_inside_weights
```
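A small worked example: with num_classes = 2 (background plus one object class), a foreground RoI whose compact row is [1, dx, dy, dw, dh] expands to:

```python
# bbox_targets[ind]        = [0, 0, 0, 0, dx, dy, dw, dh]   # columns 4:8 belong to class 1
# bbox_inside_weights[ind] = [0, 0, 0, 0, 1., 1., 1., 1.]   # only class-1 slots enter the loss
```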

ROIPooling

Pooling:

```python
self.roi_pool = RoIPool(7, 7, 1.0 / 16)
pooled_features = self.roi_pool(features, rois)  # [300, 512, 7, 7]
x = pooled_features.view(pooled_features.size()[0], -1)  # [300, 25088]
```
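The spatial scale 1/16 maps image coordinates onto the stride-16 conv feature map. A worked example with hypothetical numbers:

```python
# an RoI (x1, y1, x2, y2) = (160, 96, 480, 320) in image coordinates maps to
# (10, 6, 30, 20) on the feature map (multiply by 1/16); that 21x15 region is
# divided into a 7x7 grid and max-pooled per cell -> one 512x7x7 feature per RoI,
# flattened to 512 * 7 * 7 = 25088 values
```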

Computing the classification-head losses

The pooled features pass through two fully connected layers, producing the predicted class scores and the bounding-box regression coefficients.
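A minimal sketch of this step, assuming the two heads are named `score_fc` and `bbox_fc` (hypothetical layer names):

```python
cls_score = self.score_fc(x)   # [300, num_classes]       (hypothetical layer name)
bbox_pred = self.bbox_fc(x)    # [300, 4 * num_classes]   (hypothetical layer name)
```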


Computing the classification loss

```python
def build_loss(self, cls_score, bbox_pred, roi_data):
    # classification loss
    label = roi_data[1].squeeze()  # 300 labels
    # ce_weights re-balances the classes; it is computed elsewhere in the framework
    cross_entropy = F.cross_entropy(cls_score, label, weight=ce_weights)
```

Why does the cross-entropy computation here receive cls_score (the input to softmax) rather than cls_prob (the output of the softmax layer)? See the appendix on the cross-entropy implementation.

Computing the regression loss

```python
# bounding box regression L1 loss
bbox_targets, bbox_inside_weights, bbox_outside_weights = roi_data[2:]
bbox_targets = torch.mul(bbox_targets, bbox_inside_weights)
bbox_pred = torch.mul(bbox_pred, bbox_inside_weights)

loss_box = F.smooth_l1_loss(bbox_pred, bbox_targets, size_average=False) / (fg_cnt + 1e-4)
```

Computing the total loss:

```python
loss = net.loss + net.rpn.loss
```

Appendix: how the cross-entropy loss is implemented

PyTorch computes CrossEntropyLoss in two steps: first a log softmax, then the cross entropy proper (i.e. the negative log likelihood). With CrossEntropyLoss there is no need to append softmax and log layers to the network; the raw fully connected outputs are fed in directly. With NLLLoss, by contrast, the network itself must end with a log-softmax layer.
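A minimal sketch demonstrating the equivalence (modern PyTorch API):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)              # raw FC outputs, N x C = 3 x 5
target = torch.tensor([1, 0, 4])        # class indices in [0, C)

loss_ce = F.cross_entropy(logits, target)
loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
assert torch.allclose(loss_ce, loss_nll)  # identical by construction
```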

log softmax:

```python
def log_softmax(input, dim=None, _stacklevel=3):
    r"""Applies a softmax followed by a logarithm.

    While mathematically equivalent to log(softmax(x)), doing these two
    operations separately is slower, and numerically unstable. This function
    uses an alternative formulation to compute the output and gradient correctly.

    See :class:`~torch.nn.LogSoftmax` for more details.

    Arguments:
        input (Variable): input
        dim (int): A dimension along which log_softmax will be computed.
    """
    if dim is None:
        dim = _get_softmax_dim('log_softmax', input.dim(), _stacklevel)
    return torch._C._nn.log_softmax(input, dim)
```
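The "alternative formulation" is the usual log-sum-exp trick: $\log \mathrm{softmax}(x)_i = x_i - \log \sum_j e^{x_j}$, computed after subtracting the maximum. A sketch:

```python
import torch

def log_softmax_stable(x, dim=-1):
    # log(softmax(x)) = x - logsumexp(x); shifting by the max keeps exp() finite
    m = x.max(dim=dim, keepdim=True).values
    return x - m - (x - m).exp().sum(dim=dim, keepdim=True).log()

x = torch.tensor([[1000.0, 0.0]])       # naive log(exp(x)/sum) would overflow
print(log_softmax_stable(x))            # tensor([[    0., -1000.]])
print(torch.log_softmax(x, dim=-1))     # same result
```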

NLL loss:

```python
def nll_loss(input, target, weight=None, size_average=True, ignore_index=-100, reduce=True):
    r"""The negative log likelihood loss.

    See :class:`~torch.nn.NLLLoss` for details.

    Args:
        input: :math:`(N, C)` where `C = number of classes` or :math:`(N, C, H, W)`
            in case of 2D Loss, or :math:`(N, C, d_1, d_2, ..., d_K)` where :math:`K > 1`
            in the case of K-dimensional loss.
        target: :math:`(N)` where each value is `0 <= targets[i] <= C-1`,
            or :math:`(N, d_1, d_2, ..., d_K)` where :math:`K >= 1` for
            K-dimensional loss.
        weight (Tensor, optional): a manual rescaling weight given to each
            class. If given, has to be a Tensor of size `C`
        size_average (bool, optional): By default, the losses are averaged
            over observations for each minibatch. If size_average
            is False, the losses are summed for each minibatch. Default: ``True``
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When size_average is
            True, the loss is averaged over non-ignored targets. Default: -100

    Example::

        >>> # input is of size N x C = 3 x 5
        >>> input = autograd.Variable(torch.randn(3, 5))
        >>> # each element in target has to have 0 <= value < C
        >>> target = autograd.Variable(torch.LongTensor([1, 0, 4]))
        >>> output = F.nll_loss(F.log_softmax(input), target)
        >>> output.backward()
    """
    dim = input.dim()
    if torch.is_tensor(weight):
        weight = Variable(weight)
    if dim == 2:
        return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
    elif dim == 4:
        return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
    elif dim == 3 or dim > 4:
        # flatten the trailing spatial dims and fall back to the 2D kernel
        n = input.size(0)
        c = input.size(1)
        out_size = (n,) + input.size()[2:]
        if target.size()[1:] != input.size()[2:]:
            raise ValueError('Expected target size {}, got {}'.format(
                out_size, target.size()))
        input = input.contiguous().view(n, c, 1, -1)
        target = target.contiguous().view(n, 1, -1)
        if reduce:
            return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
        out = torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
        return out.view(out_size)
    else:
        raise ValueError('Expected 2 or more dimensions (got {})'.format(dim))
```