Tutorial: Converting Your Own Dataset to TFRecord Format

Updated: 2020-02-17 09:33:51  Author: v1_vivian


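When you train a neural network with TensorFlow, the first problem you face is how to get input data into the network.

This article shows how to convert your own dataset to TFRecord format and feed it into the network. TensorFlow also supports other data formats, but they are not covered here. TFRecord is the recommended choice because the resulting files can later be read through the API with a file queue and multiple threads.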

1. The original dataset

I have two classes of images, xiansu100 and xiansu60, each stored in its own folder and each containing 10 images.
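Before converting, it can help to confirm that the folder layout matches this description. The following is a minimal sketch, assuming each class sits in its own subfolder of a root directory; the helper name `count_images_per_class` and the extension list are illustrative assumptions, not part of the original code.

```python
import os

# Assumed set of image extensions; adjust to match your own files.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')


def count_images_per_class(data_dir):
    """Return {class_folder_name: number_of_image_files} for data_dir."""
    counts = {}
    for name in sorted(os.listdir(data_dir)):
        class_path = os.path.join(data_dir, name)
        if not os.path.isdir(class_path):
            continue  # skip stray files that are not class folders
        images = [f for f in os.listdir(class_path)
                  if f.lower().endswith(IMAGE_EXTENSIONS)]
        counts[name] = len(images)
    return counts
```

For the dataset described here, the result would be expected to list xiansu100 and xiansu60 with 10 images each.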

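2. Converting to TFRecord format

The code below gives every image in the same class folder the same integer label, taken from the position of that class in the class list. This example has only two classes, 0 and 1. To keep track of which folder name corresponds to which label, the mapping is saved to a text file.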

import os

import tensorflow as tf  # TensorFlow 1.x API (tf.python_io, queue runners)
from PIL import Image

# Build an integer feature
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Build a bytes (string) feature
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

# Write the dataset to a TFRecord file
def createTFRecord(filename, mapfile):
    class_map = {}
    data_dir = '/home/wc/DataSet/traffic/testTFRecord/'
    # A list (not a set) keeps the class-to-label order identical on every run
    classes = ['xiansu60', 'xiansu100']

    # filename is the path of the output TFRecord file
    writer = tf.python_io.TFRecordWriter(filename)

    for index, name in enumerate(classes):
        class_path = os.path.join(data_dir, name)
        class_map[index] = name
        for img_name in os.listdir(class_path):
            img_path = os.path.join(class_path, img_name)  # path of each image
            # Force 3 channels so the bytes match the [224, 224, 3] reshape on read
            img = Image.open(img_path).convert('RGB')
            img = img.resize((224, 224))
            img_raw = img.tobytes()  # raw pixel bytes of the image
            example = tf.train.Example(features=tf.train.Features(feature={
                'label': _int64_feature(index),
                'image_raw': _bytes_feature(img_raw)
            }))
            writer.write(example.SerializeToString())
    writer.close()

    # Save the label-to-folder mapping, one "index:name" line per class
    with open(mapfile, 'w') as txtfile:
        for key in class_map:
            txtfile.write(str(key) + ":" + class_map[key] + "\n")

Running this code writes the TFRecord file to the path given by filename and the label mapping to the path given by mapfile.
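One way to sanity-check the generated file without starting a TensorFlow session is to walk its on-disk framing. Each record in a TFRecord file is stored as an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. The sketch below counts records on that basis; the helper name `count_tfrecord_records` is an assumption for illustration, and the CRCs are skipped rather than verified.

```python
import struct


def count_tfrecord_records(path):
    """Count records by walking the TFRecord framing (CRCs are not verified)."""
    count = 0
    with open(path, 'rb') as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError('truncated length header')
            (length,) = struct.unpack('<Q', header)
            f.seek(4, 1)  # skip the CRC of the length field
            payload = f.read(length)
            if len(payload) < length:
                raise ValueError('truncated record payload')
            f.seek(4, 1)  # skip the CRC of the payload
            count += 1
    return count
```

With the two 10-image classes described earlier, a file written by createTFRecord would be expected to hold 20 records.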

3. Reading and parsing the TFRecord data, using a file queue and multiple threads

# Read and parse the examples stored in a TFRecord file such as train.tfrecords
def read_and_decode(filename):
    # Create a reader for the examples in the TFRecord file
    reader = tf.TFRecordReader()
    # Create a queue that holds the list of input files
    filename_queue = tf.train.string_input_producer([filename], shuffle=False, num_epochs=1)
    # Read one example from the file; read_up_to can read several at once
    _, serialized_example = reader.read(filename_queue)

    # Parse the single example; use parse_example to parse several at once
    features = tf.parse_single_example(
        serialized_example,
        features={'label': tf.FixedLenFeature([], tf.int64),
                  'image_raw': tf.FixedLenFeature([], tf.string)})
    # Decode the byte string into the pixel array of the image
    img = tf.decode_raw(features['image_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])  # reshape to a 224*224 image with 3 channels
    img = tf.image.per_image_standardization(img)
    labels = tf.cast(features['label'], tf.int32)
    return img, labels
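To make the decoding steps above concrete, here is a small pure-Python illustration of what they compute: each byte of the raw string becomes one uint8 pixel value, and per-image standardization then rescales the pixels as (x - mean) / max(stddev, 1/sqrt(N)). This is a sketch for intuition only, not TensorFlow's implementation; the names `standardize` and `EXPECTED_BYTES` are assumptions introduced here.

```python
import math

# A 224x224 RGB image stored as raw uint8 bytes should have this many bytes.
EXPECTED_BYTES = 224 * 224 * 3


def standardize(pixels):
    """Apply (x - mean) / max(stddev, 1/sqrt(N)) to a flat list of pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # The lower bound on the divisor avoids division by zero for uniform images.
    adjusted_std = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_std for p in pixels]


# Analogue of decode_raw: every byte of the raw string is one uint8 value.
raw = bytes([0, 64, 128, 255])
pixels = list(raw)
normalized = standardize(pixels)
```

If the byte length of a stored image differs from EXPECTED_BYTES (for example because the image was not RGB when written), the reshape to [224, 224, 3] fails at read time.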

4. Grouping images into batches

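def createBatch(filename, batchsize):
    images, labels = read_and_decode(filename)

    # Minimum number of examples kept in the queue so that shuffling stays effective
    min_after_dequeue = 10
    # Maximum number of examples the queue can hold
    capacity = min_after_dequeue + 3 * batchsize

    image_batch, label_batch = tf.train.shuffle_batch([images, labels],
                                                      batch_size=batchsize,
                                                      capacity=capacity,
                                                      min_after_dequeue=min_after_dequeue)

    # Convert integer labels to one-hot vectors for the two classes
    label_batch = tf.one_hot(label_batch, depth=2)
    return image_batch, label_batch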
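The two small computations in createBatch can be illustrated in plain Python: the queue capacity formula and the one-hot encoding of the labels. This is a sketch for intuition, not a replacement for the TensorFlow ops; the function names `shuffle_capacity` and `one_hot` are assumptions introduced here.

```python
def shuffle_capacity(batch_size, min_after_dequeue=10):
    """Queue capacity as computed in createBatch."""
    return min_after_dequeue + 3 * batch_size


def one_hot(labels, depth):
    """Encode integer labels as rows with 1.0 at the label index and 0.0 elsewhere."""
    return [[1.0 if i == label else 0.0 for i in range(depth)]
            for label in labels]


capacity = shuffle_capacity(2)   # batch size used for training in this tutorial
encoded = one_hot([0, 1], depth=2)
```

With a batch size of 2 the capacity works out to 16, and the labels 0 and 1 become the rows [1.0, 0.0] and [0.0, 1.0].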

5. Main function

if __name__ == "__main__":
    # Train with batches of two images; evaluate all test images in a single batch
    mapfile = "/home/wc/DataSet/traffic/testTFRecord/classmap.txt"
    train_filename = "/home/wc/DataSet/traffic/testTFRecord/train.tfrecords"
    # Uncomment on the first run to generate the training file
    # createTFRecord(train_filename, mapfile)
    test_filename = "/home/wc/DataSet/traffic/testTFRecord/test.tfrecords"
    # Uncomment on the first run to generate the test file
    # createTFRecord(test_filename, mapfile)
    image_batch, label_batch = createBatch(filename=train_filename, batchsize=2)
    test_images, test_labels = createBatch(filename=test_filename, batchsize=20)
    with tf.Session() as sess:
        # num_epochs creates a local variable, so local variables must be initialized too
        initop = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
        sess.run(initop)
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        try:
            step = 0
            while True:
                _image_batch, _label_batch = sess.run([image_batch, label_batch])
                step += 1
                print(step)
                print(_label_batch)
        except tf.errors.OutOfRangeError:
            # Raised once the single epoch of training data is exhausted
            print(" trainData done!")

        try:
            step = 0
            while True:
                _test_images, _test_labels = sess.run([test_images, test_labels])
                step += 1
                print(step)
                print(_test_labels)
        except tf.errors.OutOfRangeError:
            print(" TEST done!")
        coord.request_stop()
        coord.join(threads)

The generated batches can now be fed into the network.
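When inspecting the printed label batches, the class map file written during conversion can be used to turn one-hot rows back into folder names. The following is a minimal sketch, assuming the file contains one "index:name" line per class as written by createTFRecord; the helper names `load_class_map` and `decode_labels` are hypothetical.

```python
def load_class_map(mapfile):
    """Parse lines of the form 'index:name' into {index: name}."""
    class_map = {}
    with open(mapfile) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            key, name = line.split(':', 1)
            class_map[int(key)] = name
    return class_map


def decode_labels(one_hot_rows, class_map):
    """Map each one-hot row to a class name via the index of its largest entry."""
    names = []
    for row in one_hot_rows:
        values = list(row)
        names.append(class_map[values.index(max(values))])
    return names
```

For example, `decode_labels(_label_batch, load_class_map(mapfile))` would return one folder name per image in the batch.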

