200字范文 > 【图普科技】边界框的数据增强(二) ——缩放和平移

【图普科技】边界框的数据增强(二) ——缩放和平移

时间：2024-04-18 19:07:52

本文由【图普科技】编译。

这是我们根据目标检测任务调整图像增强技术系列文章的第二部分。在这一部分中,我们将介绍如何实现缩放和平移的数据增强技术, 以及图像增强后，如果出现边界框在图像区域之外的情况该如何处理。

在上一部分中,我们介绍了实现增强的一般方法以及水平翻转的增强技术.

GitHub Repo

本文的所有内容以及所有的增强库都可以在下面的 GitHub Repo中找到。

/Paperspace/DataAugmentationForObjectDetection

文档

可以通过在浏览器中打开docs/build/html/index.html或在此链接中找到此项目的文档。

本系列的第1部分是本文的先决条件, 强烈建议您先阅读它。

本系列包括4个部分。

第1部分：基本设计和水平翻转

第2部分：缩放和平移

第3部分：旋转和裁剪

第4部分：所有技术整合

在这篇文章中, 我们将实现缩放和平移这两种增强技术。

缩放

缩放和平移后图像效果与下图相似。

左:原始图像右: 应用了缩放增强技术。

设计决策

我们首先需要考虑的是缩放增强技术的参数。显而易见，我们要确定的是原始图像尺寸的缩放因子。因此,它必须是大于-1的值，不能缩小到小于其自身的尺寸。我们可以选择通过控制高度和宽度缩放因子的一致来保持一定长宽比。但是, 我们可以允许缩放因子不同, 这不仅会产生缩放的图片增强效果, 而且可以更改图像的长宽比。我们引入了一个布尔变量diff,可以关闭/启用此功能。在使用缩放技术实现随机图像增强时, 我们需要一个从某个时间间隔随机取得的样本缩放因子。我们处理它的方式是，用户需要提供缩放因子scale的采样范围。如果用户只提供一个浮点型的scale，它必须是正的，而缩放因子的采样范围为 (-scale,scale)。

现在让我们来定义__init__方法。

class RandomScale(object):"""Randomly scales an image Bounding boxes which have an area of less than 25% in the remaining in the transformed image is dropped. The resolution is maintained, and the remainingarea if any is filled by black color.Parameters----------scale: float or tuple(float)if **float**, the image is scaled by a factor drawn randomly from a range (1 - `scale` , 1 + `scale`). If **tuple**,the `scale` is drawn randomly from values specified by the tupleReturns-------numpy.ndaarayScaled image in the numpy format of shape `HxWxC`numpy.ndarrayTranformed bounding box co-ordinates of the format `n x 4` where n is number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box"""def __init__(self, scale = 0.2, diff = False):self.scale = scaleif type(self.scale) == tuple:assert len(self.scale) == 2, "Invalid range"assert self.scale[0] > -1, "Scale factor can't be less than -1"assert self.scale[1] > -1, "Scale factor can't be less than -1"else:assert self.scale > 0, "Please input a positive float"self.scale = (max(-1, -self.scale), self.scale)self.diff = diff

这里的一个经验教训是不应采样__init__函数的最终参数。如果你采样了__init__函数的最终参数，在每次调用该函数时，参数的值都相同。如此一来，该增强技术就无法设定随机元素

将其传递给__call__函数将导致在每次应用增强技术时，参数具有不同的值（范围内）。

该增强技术的逻辑

缩放平移的逻辑相当简单。我们使用 opencv 函数cv2.resize来缩放我们的图像内容, 并通过缩放因子来缩放边界框。

但是, 我们将保持图像的大小不变。因此，如果我们缩小的话, 将会产生多余的空白区域。我们将这部分用黑色表示。最终的输出将类似于上面显示的增强图像。

img_shape = img.shapeif self.diff:scale_x = random.uniform(*self.scale)scale_y = random.uniform(*self.scale)else:scale_x = random.uniform(*self.scale)scale_y = scale_xresize_scale_x = 1 + scale_xresize_scale_y = 1 + scale_yimg= cv2.resize(img, None, fx = resize_scale_x, fy = resize_scale_y)bboxes[:,:4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]

首先, 我们创建一个黑色图像，大小与我们的原始图像相同。

canvas = np.zeros(img_shape, dtype = np.uint8)

然后, 我们需要确定缩放后图像的大小。如果超过了原始图像的尺寸（放大时），则需要裁剪超出原始尺寸的部分。然后，我们“粘贴”调整后的图像在画布上。

y_lim = int(min(resize_scale_y,1)*img_shape[0])x_lim = int(min(resize_scale_x,1)*img_shape[1])canvas[:y_lim,:x_lim,:] = img[:y_lim,:x_lim,:]img = canvas

边界框裁剪

放大后，目标可能从图像中被“挤出”了，我们要做的最后一件事就是处理这种情况。例如，放大因子为0.1时，最终图像的尺寸是原始图像的1.1倍。这就使得原图像中的足球被挤出，不再存在于我们的图像中。

在放大的情况下，足球被挤出图像

这样的情况不仅是在放大的时候会出现,在应用平移等其他增强技术的过程中也出现。

因此，我们在辅助文件bbox_utils.py中定义了一个函数clip_box，此函数将根据边界框总面积处于图像边界内的百分比来剪除边界框。此百分比是一个可控制的参数，可在文件bbox_utils.py中定义,

def clip_box(bbox, clip_box, alpha):"""Clip the bounding boxes to the borders of an imageParameters----------bbox: numpy.ndarrayNumpy array containing bounding boxes of shape `N X 4` where N is the number of bounding boxes and the bounding boxes are represented in theformat `x1 y1 x2 y2`clip_box: numpy.ndarrayAn array of shape (4,) specifying the diagonal co-ordinates of the imageThe coordinates are represented in the format `x1 y1 x2 y2`alpha: floatIf the fraction of a bounding box left in the image after being clipped is less than `alpha` the bounding box is dropped. Returns-------numpy.ndarrayNumpy array containing **clipped** bounding boxes of shape `N X 4` where N is the number of bounding boxes left are being clipped and the bounding boxes are represented in theformat `x1 y1 x2 y2` """ar_ = (bbox_area(bbox))x_min = np.maximum(bbox[:,0], clip_box[0]).reshape(-1,1)y_min = np.maximum(bbox[:,1], clip_box[1]).reshape(-1,1)x_max = np.minimum(bbox[:,2], clip_box[2]).reshape(-1,1)y_max = np.minimum(bbox[:,3], clip_box[3]).reshape(-1,1)bbox = np.hstack((x_min, y_min, x_max, y_max, bbox[:,4:]))delta_area = ((ar_ - bbox_area(bbox))/ar_)mask = (delta_area < (1 - alpha)).astype(int)bbox = bbox[mask == 1,:]return bbox

上述函数修改了bboxes数组，并移除了那些在图像增强后有太多区域损失了的边界框。

然后, 我们只需在我们RandomScale中的__call__方法中调用此函数, 以确保所有边界框都已经被剪除。在这里，我们删除的是那些图像边界内的边界框部分占整个边界框面积的比例低于25% 的边界框。

bboxes = clip_box(bboxes, [0,0,1 + img_shape[1], img_shape[0]], 0.25)

为了计算边界框的面积, 我们还定义了一个函数bbox_area。

最后, 我们完整的_ call _方法如下所示:

def __call__(self, img, bboxes):#Chose a random digit to scale by img_shape = img.shapeif self.diff:scale_x = random.uniform(*self.scale)scale_y = random.uniform(*self.scale)else:scale_x = random.uniform(*self.scale)scale_y = scale_xresize_scale_x = 1 + scale_xresize_scale_y = 1 + scale_yimg= cv2.resize(img, None, fx = resize_scale_x, fy = resize_scale_y)bboxes[:,:4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]canvas = np.zeros(img_shape, dtype = np.uint8)y_lim = int(min(resize_scale_y,1)*img_shape[0])x_lim = int(min(resize_scale_x,1)*img_shape[1])print(y_lim, x_lim)canvas[:y_lim,:x_lim,:] = img[:y_lim,:x_lim,:]img = canvasbboxes = clip_box(bboxes, [0,0,1 + img_shape[1], img_shape[0]], 0.25)return img, bboxes

平移

我们接下来要讲的增强技术是平移，效果如下所示。

与缩放时一样，增强参数是原始图像尺寸的平移因子。上述的设计决策也同样适用。

除了确保该平移因子不小于-1，还应确保它不大于1，否则你只能得到一个黑色图片，原始图像内容被整体转移了。

class RandomTranslate(object):"""Randomly Translates the image Bounding boxes which have an area of less than 25% in the remaining in the transformed image is dropped. The resolution is maintained, and the remainingarea if any is filled by black color.Parameters----------translate: float or tuple(float)if **float**, the image is translated by a factor drawn randomly from a range (1 - `translate` , 1 + `translate`). If **tuple**,`translate` is drawn randomly from values specified by the tupleReturns-------numpy.ndaarayTranslated image in the numpy format of shape `HxWxC`numpy.ndarrayTranformed bounding box co-ordinates of the format `n x 4` where n is number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box"""def __init__(self, translate = 0.2, diff = False):self.translate = translateif type(self.translate) == tuple:assert len(self.translate) == 2, "Invalid range" assert self.translate[0] > 0 & self.translate[0] < 1assert self.translate[1] > 0 & self.translate[1] < 1else:assert self.translate > 0 & self.translate < 1self.translate = (-self.translate, self.translate)self.diff = diff

增强技术逻辑

这种增强的逻辑比缩放要棘手得多。所以, 我们要对它做一些解释。

我们首先设置变量。

def __call__(self, img, bboxes): #Chose a random digit to scale by img_shape = img.shape#translate the image#percentage of the dimension of the image to translatetranslate_factor_x = random.uniform(*self.translate)translate_factor_y = random.uniform(*self.translate)if not self.diff:translate_factor_y = translate_factor_x

现在, 当我们平移图像时, 会留下一些尾随空白。我们将把它用黑色表示。你可以在上面的平移演示图像中很容易地找出这些黑色区域。

我们首先初始化一个黑色图像, 其大小与我们原来的图像相同。

canvas = np.zeros(img_shape)

然后，我们有两个任务。首先，如第一个图像所示，确定图像将被粘贴在黑画布的哪部分。

然后，确定图像的哪一部分将被粘贴。

左:图像中将被粘贴的区域。右: 将要粘贴图像的画布区域,

因此, 我们首先要找出将要粘贴到画布图像上的图像区域。(左边图像上的紫色区域)。

#get the top-left corner co-ordinates of the shifted image corner_x = int(translate_factor_x*img.shape[1])corner_y = int(translate_factor_y*img.shape[0])mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]

现在，让我们确定将要“粘贴”图像mask的画布区域。

orig_box_cords = [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = maskimg = canvas

边界框的平移相对简单，只需要偏移边界框的角。我们同样删除那些图像边界内的边界框部分占整个边界框面积的比例低于25% 的边界框。

bboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)

总之, 我们的调用函数如下所示。

def __call__(self, img, bboxes): #Chose a random digit to scale by img_shape = img.shape#translate the image#percentage of the dimension of the image to translatetranslate_factor_x = random.uniform(*self.translate)translate_factor_y = random.uniform(*self.translate)if not self.diff:translate_factor_y = translate_factor_xcanvas = np.zeros(img_shape).astype(np.uint8)corner_x = int(translate_factor_x*img.shape[1])corner_y = int(translate_factor_y*img.shape[0])#change the origin to the top-left corner of the translated boxorig_box_cords = [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = maskimg = canvasbboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)return img, bboxes

测试

如上例所示，您可以在样本图像或您自己的图像上（前提是您已以正确的格式存储它）测试这些增强技术。

from data_aug.bbox_utils import *import matplotlib.pyplot as plt scale = RandomScale(0.2, diff = True) translate = RandomTranslate(0.2, diff = True)img, bboxes = translate(img, bboxes)img,bboxes = scale(img, bboxes)plt.imshow(draw_rect(img, bboxes))

最终的结果可能是这样的。

你可以先尝试做缩放，然后再平移。你可能会发现，即使参数的值相同，结果也是不同的。

在下一部分中, 我们将通过 opencv 的仿射变换功能来实现旋转和裁剪的增强技术，增强效果更加强大。敬请关注！

本内容不代表本网观点和政治立场，如有侵犯你的权益请联系我们处理。

网友评论

网友评论仅供其表达个人看法，并不表明网站立场。