
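SSD-Tensorflow Object Detection (Custom Dataset in VOC Format)

Date: 2022-06-16 21:39:57

Contents

1. Preparation
2. Generating the .tfrecords files
3. Modifying the training model
4. Training
5. Testing and validation
6. Errors and solutions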

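1. Preparation

Set up the SSD framework: download SSD-Tensorflow and unzip it.

Download the Pascal VOC data, and convert your own data to the VOC format (image names do not have to be six-digit numbers; other naming schemes work too).

After extracting the archives, do not merge them into a single folder.

VOCtrainval is used for training and VOCtest for testing.

In VOCtrainval, the JPEGImages folder contains only the training and validation images, and the Main folder contains only trainval.txt, train.txt, and val.txt.

In VOCtest, the JPEGImages folder contains only the test images, and the Main folder contains only test.txt.

Arrange your own files following the same layout.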

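Rename the image files if needed.

Annotate your own data. This step is tedious and takes patience.

Generate the split files train.txt, trainval.txt, test.txt, and val.txt (check the file paths):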

    import os
    import random

    saveBasePath = r"./VOC/ImageSets"  # directory where the txt files are saved
    total_xml = os.listdir(r'./VOC/Annotations')  # annotation files (file_name.xml)

    # Split the dataset into train / val / test = 49% / 21% / 30%
    trainval_percent = 0.7
    train_percent = 0.7
    tv = int(len(total_xml) * trainval_percent)  # number of train+val files (70% of all files)
    tr = int(tv * train_percent)  # number of training files (70% of train+val)

    # Randomly pick the train+val indices, then the training indices among them
    trainval = random.sample(range(len(total_xml)), tv)
    train = random.sample(trainval, tr)

    print("train and val size", tv)
    print("train size", tr)

    # Make sure the Main folder exists before opening the output files
    os.makedirs(os.path.join(saveBasePath, 'Main'), exist_ok=True)
    ftrainval = open(os.path.join(saveBasePath, 'Main/trainval.txt'), 'w')
    ftest = open(os.path.join(saveBasePath, 'Main/test.txt'), 'w')
    ftrain = open(os.path.join(saveBasePath, 'Main/train.txt'), 'w')
    fval = open(os.path.join(saveBasePath, 'Main/val.txt'), 'w')

    for i in range(len(total_xml)):  # iterate over every file_name.xml
        name = total_xml[i][:-4] + '\n'  # file_name without the .xml extension
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)

    ftrainval.close()
    ftrain.close()
    fval.close()
    ftest.close()
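The same split can be wrapped in a function so that it is repeatable and easy to check. This is a minimal sketch rather than the original author's code: `split_names` and its `seed` parameter are hypothetical names introduced here, and it assumes the same 0.7 / 0.7 ratios as the script above.

```python
import random


def split_names(names, trainval_percent=0.7, train_percent=0.7, seed=0):
    """Split sample names into train / val / test lists (hypothetical helper)."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = sorted(names)       # sort first so the input order does not matter
    rng.shuffle(shuffled)
    n_trainval = int(len(shuffled) * trainval_percent)
    n_train = int(n_trainval * train_percent)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_trainval],
        "test": shuffled[n_trainval:],
    }


splits = split_names(["img_%d" % i for i in range(10)])
print({key: len(value) for key, value in splits.items()})
```

Each list can then be written to the matching file under ImageSets/Main, one name per line. Because the three lists are slices of one shuffled list, no sample can end up in two splits.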

Place train.txt, trainval.txt, and val.txt in the training/validation set directory:

    VOCtrainval_06-Nov-\VOCdevkit\VOC\ImageSets\Main\

Place test.txt in the test set directory:

    VOCtest_06-Nov-\VOCdevkit\VOC\ImageSets\Main\

2. Generating the .tfrecords files

Change the training classes to match your own.

The file to edit is SSD-Tensorflow/datasets/pascalvoc_common.py.

Modify it to fit your data:

    # Comment out the original labels and add your own
    VOC_LABELS = {
        'none': (0, 'Background'),
        'aeroplane': (1, 'Vehicle'),
        'bicycle': (2, 'Vehicle'),
        'bird': (3, 'Animal'),
        'boat': (4, 'Vehicle'),
        ... ...
        'person': (15, 'Person'),
        'pottedplant': (16, 'Indoor'),
        'sheep': (17, 'Animal'),
        'sofa': (18, 'Indoor'),
        'train': (19, 'Vehicle'),
        'tvmonitor': (20, 'Indoor'),
    }
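For a custom dataset, a dictionary of the same shape can be generated from a list of class names instead of being typed by hand. The sketch below is illustrative only: `build_voc_labels`, the `"Object"` category string, and the example class names are assumptions, not part of SSD-Tensorflow.

```python
def build_voc_labels(class_names, category="Object"):
    """Build a VOC_LABELS-style dict: name -> (index, category).

    Index 0 is reserved for the background entry 'none'.
    """
    labels = {"none": (0, "Background")}
    for index, name in enumerate(class_names, start=1):
        labels[name] = (index, category)
    return labels


# Hypothetical class names; use the exact strings found in your XML <name> tags.
VOC_LABELS = build_voc_labels(["cat", "dog"])
print(VOC_LABELS)
```

The keys must match the annotation names exactly (including case), because the conversion code looks them up with `VOC_LABELS[label]`.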

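Convert the image data to tfrecords format

The file to edit is SSD-Tensorflow/datasets/pascalvoc_to_tfrecords.py.

Change line 83 to image_data = tf.gfile.FastGFile(filename, 'rb').read() so that the image is read in binary mode;

If your images are not in .jpg format, change the image extension used when the file name is built;

Change line 67, SAMPLES_PER_FILES = 500 (pick your own value), which sets how many .xml samples are packed into one tfrecords file.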
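As a quick sanity check on SAMPLES_PER_FILES, the number of output files is the sample count divided by SAMPLES_PER_FILES, rounded up. The sketch below assumes the output naming pattern `<output_dir>/<name>_<idx>.tfrecord` with a three-digit index, as in the standalone conversion script later in this post; `expected_tfrecord_files` is a hypothetical helper name.

```python
import math


def expected_tfrecord_files(num_samples, samples_per_file,
                            output_dir="./tfrecords_", name="voc_train"):
    """List the .tfrecord file names a conversion run should produce."""
    count = math.ceil(num_samples / samples_per_file)
    return ["%s/%s_%03d.tfrecord" % (output_dir, name, idx) for idx in range(count)]


# 504 annotated images with 500 samples per file
for path in expected_tfrecord_files(504, 500):
    print(path)
```

Comparing this list with the contents of the output folder after a run shows whether the conversion stopped early.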

Generate the .tfrecords files

Open tf_convert_data.py, click Run, then Edit Configurations, enter the following in Parameters, and run tf_convert_data.py. Once the console reports success, the generated .tfrecords files appear in the tfrecords_ folder:

    --dataset_name=pascalvoc
    --dataset_dir=./VOC/
    --output_name=voc__train
    --output_dir=./tfrecords_

Alternatively, create tf_conver_data.sh in the SSD-Tensorflow folder and run it:

    #!/bin/bash
    # Shell script that converts the Pascal VOC dataset to tfrecords
    DATASET_DIR=./VOC/         # folder holding the VOC data (VOC directory layout unchanged)
    OUTPUT_DIR=./tfrecords_    # folder where the tfrecords are saved
    python ./tf_convert_data.py \
        --dataset_name=pascalvoc \
        --dataset_dir=${DATASET_DIR} \
        --output_name=voc__train \
        --output_dir=${OUTPUT_DIR}

Or use the following script directly:

    """Note: check that the paths are correct, and create the "tfrecords_" folder in the project root beforehand."""
    import os
    import sys
    import random

    import tensorflow as tf
    import xml.etree.ElementTree as ET  # XML parsing

    # My label map has only two entries; adapt it to your own images
    VOC_LABELS = {
        'none': (0, 'Background'),
        'aiaitie': (1, 'Product')
    }

    # Folders holding the annotations and the images
    DIRECTORY_ANNOTATIONS = 'Annotations/'
    DIRECTORY_IMAGES = 'JPEGImages/'

    RANDOM_SEED = 4242  # random seed
    SAMPLES_PER_FILES = 3  # number of .xml samples per .tfrecords file


    # Helpers that build int64, float and bytes features
    def int64_feature(value):
        if not isinstance(value, list):
            value = [value]
        return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


    def float_feature(value):
        if not isinstance(value, list):
            value = [value]
        return tf.train.Feature(float_list=tf.train.FloatList(value=value))


    def bytes_feature(value):
        if not isinstance(value, list):
            value = [value]
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))


    # Read one image and its annotation
    def _process_image(directory, name):
        # Read the image file.
        filename = os.path.join(directory, DIRECTORY_IMAGES, name + '.jpg')
        image_data = tf.gfile.FastGFile(filename, 'rb').read()

        # Read the XML annotation file.
        filename = os.path.join(directory, DIRECTORY_ANNOTATIONS, name + '.xml')
        tree = ET.parse(filename)
        root = tree.getroot()

        # Image shape.
        size = root.find('size')
        shape = [int(size.find('height').text),
                 int(size.find('width').text),
                 int(size.find('depth').text)]

        # Find annotations.
        bboxes = []
        labels = []
        labels_text = []
        difficult = []
        truncated = []
        for obj in root.findall('object'):
            # Normalize the box by the image height and width.
            bbox = obj.find('bndbox')
            a = float(bbox.find('ymin').text) / shape[0]
            b = float(bbox.find('xmin').text) / shape[1]
            a1 = float(bbox.find('ymax').text) / shape[0]
            b1 = float(bbox.find('xmax').text) / shape[1]
            # Skip the object before touching any list, so that every list
            # keeps exactly one entry per kept object.
            if abs(a1 - a) >= 1 or abs(b1 - b) >= 1:
                continue
            bboxes.append((a, b, a1, b1))

            label = obj.find('name').text
            labels.append(int(VOC_LABELS[label][0]))
            labels_text.append(label.encode('ascii'))  # encode as ASCII bytes

            # Compare with None: an Element without children is falsy.
            if obj.find('difficult') is not None:
                difficult.append(int(obj.find('difficult').text))
            else:
                difficult.append(0)
            if obj.find('truncated') is not None:
                truncated.append(int(obj.find('truncated').text))
            else:
                truncated.append(0)
        return image_data, shape, bboxes, labels, labels_text, difficult, truncated


    # Build a tf.train.Example from one sample
    def _convert_to_example(image_data, labels, labels_text, bboxes, shape,
                            difficult, truncated):
        xmin = []
        ymin = []
        xmax = []
        ymax = []
        for b in bboxes:
            assert len(b) == 4
            for coords, point in zip([ymin, xmin, ymax, xmax], b):
                coords.append(point)

        image_format = b'JPEG'
        example = tf.train.Example(features=tf.train.Features(feature={
            'image/height': int64_feature(shape[0]),
            'image/width': int64_feature(shape[1]),
            'image/channels': int64_feature(shape[2]),
            'image/shape': int64_feature(shape),
            'image/object/bbox/xmin': float_feature(xmin),
            'image/object/bbox/xmax': float_feature(xmax),
            'image/object/bbox/ymin': float_feature(ymin),
            'image/object/bbox/ymax': float_feature(ymax),
            'image/object/bbox/label': int64_feature(labels),
            'image/object/bbox/label_text': bytes_feature(labels_text),
            'image/object/bbox/difficult': int64_feature(difficult),
            'image/object/bbox/truncated': int64_feature(truncated),
            'image/format': bytes_feature(image_format),
            'image/encoded': bytes_feature(image_data)}))
        return example


    # Append one sample to the open tfrecord file
    def _add_to_tfrecord(dataset_dir, name, tfrecord_writer):
        image_data, shape, bboxes, labels, labels_text, difficult, truncated = \
            _process_image(dataset_dir, name)
        example = _convert_to_example(image_data, labels, labels_text,
                                      bboxes, shape, difficult, truncated)
        tfrecord_writer.write(example.SerializeToString())


    # name is the prefix of the output files
    def _get_output_filename(output_dir, name, idx):
        return '%s/%s_%03d.tfrecord' % (output_dir, name, idx)


    def run(dataset_dir, output_dir, name='voc_train', shuffling=False):
        if not tf.gfile.Exists(dataset_dir):
            tf.gfile.MakeDirs(dataset_dir)

        path = os.path.join(dataset_dir, DIRECTORY_ANNOTATIONS)
        filenames = sorted(os.listdir(path))  # sorted for a stable order
        if shuffling:
            random.seed(RANDOM_SEED)
            random.shuffle(filenames)

        i = 0
        fidx = 0
        while i < len(filenames):
            # Open new TFRecord file.
            tf_filename = _get_output_filename(output_dir, name, fidx)
            with tf.python_io.TFRecordWriter(tf_filename) as tfrecord_writer:
                j = 0
                while i < len(filenames) and j < SAMPLES_PER_FILES:
                    # Print progress to the terminal and flush the buffer
                    sys.stdout.write(' Converting image %d/%d \n' % (i + 1, len(filenames)))
                    sys.stdout.flush()
                    filename = filenames[i]
                    img_name = filename[:-4]
                    _add_to_tfrecord(dataset_dir, img_name, tfrecord_writer)
                    i += 1
                    j += 1
                fidx += 1
        print('\nFinished converting the Pascal VOC dataset!')


    # Source dataset path, output path and output file prefix; adapt to your setup
    dataset_dir = "C:/Users/Admin/Desktop/"
    output_dir = "./tfrecords_"
    name = "voc_train"


    def main(_):
        run(dataset_dir, output_dir, name)


    if __name__ == '__main__':
        tf.app.run()

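3. Modifying the training model

datasets/pascalvoc_.py: update the training data statistics.

Adjust them to your own training data, and set NUM_CLASSES to your number of classes: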

    TRAIN_STATISTICS = {
        'none': (0, 0),
        'aeroplane': (238, 306),  # 238 images, 306 objects
        'bicycle': (243, 353),
        'bird': (330, 486),
        'boat': (181, 290),
        ... ...
        'sheep': (96, 257),
        'sofa': (229, 248),
        'train': (261, 297),
        'tvmonitor': (256, 324),
        'total': (5011, 12608),  # 5011 training images, 12608 objects in total
    }
    TEST_STATISTICS = {
        'none': (0, 0),
        'aeroplane': (1, 1),
        'bicycle': (1, 1),
        'bird': (1, 1),
        ... ...
        'sheep': (1, 1),
        'sofa': (1, 1),
        'train': (1, 1),
        'tvmonitor': (1, 1),
        'total': (20, 20),
    }
    SPLITS_TO_SIZES = {
        'train': 5011,  # number of training samples
        'test': 4952,  # number of test samples
    }
    SPLITS_TO_STATISTICS = {
        'train': TRAIN_STATISTICS,
        'test': TEST_STATISTICS,
    }
    NUM_CLASSES = 20  # number of classes in your data (background not included)
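The per-class counts for a custom dataset can be gathered from the annotation files rather than counted by hand. The following is a rough sketch: `voc_statistics` is a hypothetical helper, and it assumes every .xml file in the folder belongs to the split being counted.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter


def voc_statistics(annotation_dir):
    """Return {class: (number of images, number of objects)} for a VOC Annotations folder."""
    images = Counter()    # images containing at least one object of the class
    objects = Counter()   # total object instances of the class
    num_files = 0
    for fname in sorted(os.listdir(annotation_dir)):
        if not fname.endswith(".xml"):
            continue
        num_files += 1
        root = ET.parse(os.path.join(annotation_dir, fname)).getroot()
        names = [obj.findtext("name") for obj in root.findall("object")]
        objects.update(names)
        images.update(set(names))   # count each class once per image
    stats = {"none": (0, 0)}
    for name in sorted(objects):
        stats[name] = (images[name], objects[name])
    stats["total"] = (num_files, sum(objects.values()))
    return stats
```

Calling `voc_statistics("./VOC/Annotations")` would produce a dictionary in the same form as TRAIN_STATISTICS; the first element of the `total` entry is the value to use in SPLITS_TO_SIZES, and the number of class keys (excluding `none` and `total`) is NUM_CLASSES.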

nets/ssd_vgg_300.py: change the number of classes on lines 96 and 97 to your number of classes + 1:

    img_shape=(300, 300),
    num_classes=21,  # set to your number of classes + 1
    no_annotation_label=21,  # set to your number of classes + 1

eval_ssd_network.py: change the number of classes on line 66 to your number of classes + 1:

    tf.app.flags.DEFINE_integer('num_classes', 21, 'Number of classes to use in the dataset.')

train_ssd_network.py

Line 27: change the data format to 'NHWC';

Line 135: set the number of classes to your number of classes + 1;

Lines 56 to 66: these flags control logging and model saving (listed at the end of this section); adjust them as needed;

Line 154: set the maximum number of training steps, changing None (training runs indefinitely) to, for example, 50000.

    tf.app.flags.DEFINE_integer('log_every_n_steps', 10,
                                'The frequency with which logs are print.')
    tf.app.flags.DEFINE_integer('save_summaries_secs', 600,
                                'The frequency with which summaries are saved, in seconds.')
    tf.app.flags.DEFINE_integer('save_interval_secs', 600,
                                'The frequency with which the model is saved, in seconds.')
    tf.app.flags.DEFINE_float('gpu_memory_fraction', 0.9,
                              'GPU memory fraction to use.')

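4. Training

Option 1: start from VGG and train only some of the layers:

The 300 in ssd_300_vgg means that input images are resized to 300*300. So to fine-tune with ssd_512_vgg, only the layers affected by the image resolution need to be retrained.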

    # Train starting from the pretrained vgg16 model.
    # checkpoint_exclude_scopes lists the layers whose parameters are NOT loaded from the vgg16 checkpoint.
    # trainable_scopes lists the layers that are trained; parameters not listed stay unchanged.
    # If trainable_scopes is commented out, all parameters are trained.
    #
    # Flags used below:
    #   --train_dir                    where the trained model is saved
    #   --dataset_dir                  data path
    #   --dataset_name                 dataset name prefix
    #   --model_name                   name of the model to build
    #   --checkpoint_path              path of the checkpoint to load
    #   --checkpoint_model_scope       scope name used inside the loaded checkpoint
    #   --save_summaries_secs          save the logs every 60 s
    #   --save_interval_secs           save the model every 600 s
    #   --weight_decay                 weight decay coefficient (regularization)
    #   --optimizer                    optimizer to use
    #   --learning_rate                learning rate
    #   --learning_rate_decay_factor   learning rate decay factor
    #   --batch_size                   lower it if you run out of GPU memory
    #   --gpu_memory_fraction          fraction of GPU memory to use
    DATASET_DIR=./tfrecords_/                    # data path
    TRAIN_DIR=./train_model/                     # where the trained model is saved
    CHECKPOINT_PATH=./checkpoints/vgg_16.ckpt    # path of the pretrained model
    python ../train_ssd_network.py \
        --train_dir=${TRAIN_DIR} \
        --dataset_dir=${DATASET_DIR} \
        --dataset_name=pascalvoc_ \
        --dataset_split_name=train \
        --model_name=ssd_300_vgg \
        --checkpoint_path=${CHECKPOINT_PATH} \
        --checkpoint_model_scope=vgg_16 \
        --checkpoint_exclude_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
        --trainable_scopes=ssd_300_vgg/conv6,ssd_300_vgg/conv7,ssd_300_vgg/block8,ssd_300_vgg/block9,ssd_300_vgg/block10,ssd_300_vgg/block11,ssd_300_vgg/block4_box,ssd_300_vgg/block7_box,ssd_300_vgg/block8_box,ssd_300_vgg/block9_box,ssd_300_vgg/block10_box,ssd_300_vgg/block11_box \
        --save_summaries_secs=60 \
        --save_interval_secs=600 \
        --weight_decay=0.0005 \
        --optimizer=adam \
        --learning_rate=0.001 \
        --learning_rate_decay_factor=0.94 \
        --batch_size=24 \
        --gpu_memory_fraction=0.9
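The long comma-separated scope lists are easy to mistype. One way to avoid that is to generate the string from a list of layer names; the sketch below uses the twelve layer names from the command above, while `scopes_flag`, `MODEL_NAME`, and `NEW_LAYERS` are hypothetical names introduced only for illustration.

```python
MODEL_NAME = "ssd_300_vgg"

# Layers that are not loaded from the vgg16 checkpoint and are trained instead,
# copied from the command above.
NEW_LAYERS = [
    "conv6", "conv7", "block8", "block9", "block10", "block11",
    "block4_box", "block7_box", "block8_box",
    "block9_box", "block10_box", "block11_box",
]


def scopes_flag(model_name, layers):
    """Join layer names into the comma-separated 'model/layer' form used by the flags."""
    return ",".join("%s/%s" % (model_name, layer) for layer in layers)


scopes = scopes_flag(MODEL_NAME, NEW_LAYERS)
print("--checkpoint_exclude_scopes=" + scopes)
print("--trainable_scopes=" + scopes)
```

The two printed lines can be pasted into the shell script in place of the hand-written flags.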

Option 2: train your own model from scratch

Comment out the following parameters. With no initial checkpoint, the weights are initialized randomly and the model is trained from scratch:

    #CHECKPOINT_PATH=./checkpoints/vgg_16.ckpt
    #--checkpoint_path=${CHECKPOINT_PATH}
    #--checkpoint_model_scope=ssd_512_vgg
    #--checkpoint_exclude_scopes=ssd_300_vgg/block10...
    #--trainable_scopes=ssd_300_vgg/conv6...

The resulting script:

    #!/bin/bash
    DATASET_DIR=./tfrecords_/    # data path
    TRAIN_DIR=./train_model/     # where the trained model is saved
    CUDA_VISIBLE_DEVICES=0 python ./train_ssd_network.py \
        --train_dir=${TRAIN_DIR} \
        --dataset_dir=${DATASET_DIR} \
        --dataset_name=pascalvoc_ \
        --dataset_split_name=train \
        --model_name=ssd_300_vgg \
        --save_summaries_secs=600 \
        --save_interval_secs=600 \
        --optimizer=adam \
        --learning_rate_decay_factor=0.94 \
        --batch_size=32 \
        --gpu_memory_fraction=0.9

5. Testing and validation

Generate the .tfrecords files by converting the test images to tfrecords:

    #!/bin/bash
    DATASET_DIR=./VOC/test_images/        # folder holding the test images
    OUTPUT_DIR=./tfrecords_/tfrecords/    # output folder for the test .tfrecords files
    python ./tf_convert_data.py \
        --dataset_name=pascalvoc \
        --dataset_dir=${DATASET_DIR} \
        --output_name=voc__test \
        --output_dir=${OUTPUT_DIR}

Run the evaluation:

    #!/bin/bash
    DATASET_DIR=./tfrecords_/tfrecords/
    EVAL_DIR=./ssd_eval_log/
    CHECKPOINT_PATH=./train_model/model.ckpt-5000
    python ./eval_ssd_network.py \
        --eval_dir=${EVAL_DIR} \
        --dataset_dir=${DATASET_DIR} \
        --dataset_name=pascalvoc_ \
        --dataset_split_name=test \
        --model_name=ssd_300_vgg \
        --checkpoint_path=${CHECKPOINT_PATH} \
        --batch_size=1

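Use notebooks/ssd_notebook.ipynb to view the images annotated by the model.

Set ckpt_filename = "path/to/your trained weights file".

Change the image path to where your own images are, or copy the images you want to test into the folder the notebook already reads from.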

6. Errors and solutions

Error 1: ZeroDivisionError: float division by zero. The full output is shown below; here the 117th annotation file is the faulty one:

    >> Converting image 117/504
    Traceback (most recent call last):
      File "D:/AI_target_detection/SSD-Tensorflow/tf_convert_data.py", line 59, in <module>
        tf.app.run()
      File "C:\Anaconda3\envs\AI_tensorflow_GPU\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
        _sys.exit(main(argv))
      File "D:/AI_target_detection/SSD-Tensorflow/tf_convert_data.py", line 54, in main
        pascalvoc_to_tfrecords.run(FLAGS.dataset_dir, FLAGS.output_dir, FLAGS.output_name)
      File "D:\AI_target_detection\SSD-Tensorflow\datasets\pascalvoc_to_tfrecords.py", line 223, in run
        _add_to_tfrecord(dataset_dir, img_name, tfrecord_writer)
      File "D:\AI_target_detection\SSD-Tensorflow\datasets\pascalvoc_to_tfrecords.py", line 182, in _add_to_tfrecord
        _process_image(dataset_dir, name)
      File "D:\AI_target_detection\SSD-Tensorflow\datasets\pascalvoc_to_tfrecords.py", line 121, in _process_image
        bboxes.append((max(float(bbox.find('ymin').text) / shape[0], 0.1),
    ZeroDivisionError: float division by zero

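Error 2: All bounding box coordinates must be in [0.0, 1.0]

Cause and fix: while annotating, an extra mouse click produced an annotation with nothing in it, or a bounding box extends beyond the image.

Find the offending image and its annotation, then either delete the annotation together with its image file or re-annotate it. In addition, go to lines 114-119 of pascalvoc_to_tfrecords.py and change:

    bboxes.append((float(bbox.find('ymin').text) / shape[0],
                   float(bbox.find('xmin').text) / shape[1],
                   float(bbox.find('ymax').text) / shape[0],
                   float(bbox.find('xmax').text) / shape[1]))

to the following, which clamps every coordinate to [0.0, 1.0]:

    bboxes.append((max(float(bbox.find('ymin').text) / shape[0], 0.0),
                   max(float(bbox.find('xmin').text) / shape[1], 0.0),
                   min(float(bbox.find('ymax').text) / shape[0], 1.0),
                   min(float(bbox.find('xmax').text) / shape[1], 1.0)))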
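Both errors can be caught before conversion by scanning the annotation files. The sketch below is not part of SSD-Tensorflow: `check_annotation` is a hypothetical helper, and it assumes that the ZeroDivisionError comes from a zero or missing width/height in the `size` element, which is consistent with the division by `shape[0]` in the traceback.

```python
import xml.etree.ElementTree as ET


def check_annotation(xml_path):
    """Return a list of human-readable problems found in one VOC XML file."""
    problems = []
    root = ET.parse(xml_path).getroot()

    # A zero or missing width/height makes the normalization divide by zero.
    width = int(root.findtext("size/width") or 0)
    height = int(root.findtext("size/height") or 0)
    if width <= 0 or height <= 0:
        problems.append("image size is zero or missing")
        return problems

    objects = root.findall("object")
    if not objects:
        problems.append("no object annotated")

    for obj in objects:
        name = obj.findtext("name") or "?"
        coords = [obj.findtext("bndbox/" + tag)
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        if any(not c for c in coords):
            problems.append("%s: incomplete bndbox" % name)
            continue
        xmin, ymin, xmax, ymax = (float(c) for c in coords)
        if xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append("%s: box outside the image" % name)
        if xmax <= xmin or ymax <= ymin:
            problems.append("%s: empty box" % name)
    return problems
```

Running `check_annotation` over every file in the Annotations folder and printing the non-empty results gives a list of files to delete or re-annotate before calling tf_convert_data.py.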
