【Question Title】: Reading Cifar10 dataset in batches
【Posted】: 2016-09-27 11:51:27
【Question】:

I am trying to read the CIFAR-10 dataset, which is provided in batches, from https://www.cs.toronto.edu/~kriz/cifar.html. I am using pickle to load it into a data frame and read its 'data' part, but I get this error:

KeyError                                  Traceback (most recent call last)
<ipython-input-24-8758b7a31925> in <module>()
----> 1 unpickle('datasets/cifar-10-batches-py/test_batch')

<ipython-input-23-04002b89d842> in unpickle(file)
      3     fo = open(file, 'rb')
      4     dict = pickle.load(fo, encoding ='bytes')
----> 5     X = dict['data']
      6     fo.close()
      7     return dict

KeyError: 'data'

I am using IPython, and this is my code:

def unpickle(file):

    fo = open(file, 'rb')
    dict = pickle.load(fo, encoding ='bytes')
    X = dict['data']
    fo.close()
    return dict

unpickle('datasets/cifar-10-batches-py/test_batch')

【Discussion】:

  • Add print(dict.keys()) to see what is inside.
  • @lejlot dict_keys([b'batch_label', b'data', b'labels', b'filenames']) ..
  • Then try X = dict[b'data']
  • Yes.. just did that, and it works :) ... thanks anyway @lejlot
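As the comments found, pickle.load(..., encoding='bytes') returns the keys that Python 2 wrote as bytes, so the lookup must use b'data'. Below is a minimal sketch of a variant that decodes the keys once, so the rest of the code can keep using plain 'data' and 'labels'. The key-decoding step is an illustrative addition, not something proposed in the thread; values such as the file names stay as bytes.

```python
import pickle


def unpickle(file):
    """Load one CIFAR-10 batch and return its dict with str keys.

    With encoding='bytes', keys pickled by Python 2 come back as bytes
    (b'data', b'labels', ...); they are decoded to str here.
    """
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    # Decode bytes keys such as b'data' -> 'data'; leave any str keys alone
    return {k.decode('ascii') if isinstance(k, bytes) else k: v
            for k, v in batch.items()}
```

Usage with the path from the question: X = unpickle('datasets/cifar-10-batches-py/test_batch')['data'].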

Tags: python-3.x machine-learning computer-vision batch-processing


【Answer 1】:

You can read the CIFAR-10 dataset with the code below; just make sure the code points to the directory where you put the batches.

import os
import platform

import numpy as np
from six.moves import cPickle as pickle  # cPickle on Python 2, pickle on Python 3

# Class names, indexed by label (0-9)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

img_rows, img_cols = 32, 32
input_shape = (img_rows, img_cols, 3)

def load_pickle(f):
    version = platform.python_version_tuple()
    if version[0] == '2':
        return  pickle.load(f)
    elif version[0] == '3':
        return  pickle.load(f, encoding='latin1')
    raise ValueError("invalid python version: {}".format(version))

def load_CIFAR_batch(filename):
    """ load single batch of cifar """
    with open(filename, 'rb') as f:
        datadict = load_pickle(f)
        X = datadict['data']
        Y = datadict['labels']
        # Each row is one 32x32 image: 1024 red, then 1024 green, then 1024 blue values
        X = X.reshape(10000,3072)
        Y = np.array(Y)
        return X, Y

def load_CIFAR10(ROOT):
    """ load all of cifar """
    xs = []
    ys = []
    for b in range(1,6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=10000,
                     cifar10_dir='../input/cifar-10-batches-py/'):
    # Load the raw CIFAR-10 data; point cifar10_dir at your extracted batches
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample: the last num_validation training images form the validation set
    X_val = X_train[num_training:num_training + num_validation]
    y_val = y_train[num_training:num_training + num_validation]
    X_train = X_train[:num_training]
    y_train = y_train[:num_training]
    X_test = X_test[:num_test]
    y_test = y_test[:num_test]

    # Scale pixel values to [0, 1]; the validation set is scaled too so all splits match
    x_train = X_train.astype('float32') / 255
    x_val = X_val.astype('float32') / 255
    x_test = X_test.astype('float32') / 255

    return x_train, y_train, x_val, y_val, x_test, y_test


# Invoke the above function to get our data.
x_train, y_train, x_val, y_val, x_test, y_test = get_CIFAR10_data()


print('Train data shape: ', x_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', x_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', x_test.shape)
print('Test labels shape: ', y_test.shape)
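The answer defines input_shape = (32, 32, 3) but keeps every image as a flat row of 3072 values. According to the dataset page, each row stores the 1024 red values, then the green, then the blue, each channel row by row, so the rows can be turned into 32x32x3 images roughly as sketched here. The helper name to_images is hypothetical.

```python
import numpy as np


def to_images(X):
    """Convert flat CIFAR-10 rows of shape (N, 3072) into images (N, 32, 32, 3).

    Each row is 1024 red, 1024 green, then 1024 blue values, each channel
    stored row by row, so reshape to (N, 3, 32, 32) and move the channel
    axis last.
    """
    return X.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
```

For example, to_images(x_train) turns the (49000, 3072) training array into shape (49000, 32, 32, 3), matching input_shape.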

【Comments】:

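【Answer 2】:

I know the cause! I had the same problem and solved it. The key issue is the encoding. Change

dict = pickle.load(fo, encoding ='bytes')

to

dict = pickle.load(fo, encoding ='latin1')

With 'latin1', the keys written by Python 2 are decoded to str, so dict['data'] works.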
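A minimal sketch of a loader with that change. The Python pickle documentation notes that encoding='latin1' is required for unpickling NumPy arrays pickled by Python 2, which is how the CIFAR-10 batches were written. The function name load_batch is illustrative only.

```python
import pickle


def load_batch(file):
    """Load a CIFAR-10 batch that was pickled under Python 2.

    encoding='latin1' maps every byte of a Python 2 str to one character,
    so keys become 'data', 'labels', ... and raw array bytes survive.
    """
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='latin1')
    return batch['data'], batch['labels']
```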
    

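【Comments】:

【Answer 3】:

I ran into a similar problem in the past.

For future readers: you can find here a Python wrapper that automatically downloads, extracts, and parses the CIFAR-10 dataset.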
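The wrapper mentioned in this answer is not reproduced here. As a rough standard-library sketch of the download-and-extract part, assuming the Python version of the archive is still hosted at https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz and unpacks into a cifar-10-batches-py folder, the extracted batches can then be parsed with any loader in this thread.

```python
import os
import tarfile
import urllib.request

# Assumed location of the Python-version archive listed on the dataset page
CIFAR10_URL = 'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'


def download_and_extract(dest_dir, url=CIFAR10_URL):
    """Download the CIFAR-10 archive (if missing) and extract it into dest_dir.

    Returns the path of the extracted 'cifar-10-batches-py' folder.
    """
    os.makedirs(dest_dir, exist_ok=True)
    archive = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(archive):  # skip the download when already cached
        urllib.request.urlretrieve(url, archive)
    batches_dir = os.path.join(dest_dir, 'cifar-10-batches-py')
    if not os.path.isdir(batches_dir):  # skip extraction when already done
        with tarfile.open(archive, 'r:gz') as tar:
            tar.extractall(dest_dir)
    return batches_dir
```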

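【Comments】:

【Answer 4】:

Try this:

def unpickle(file):
    import cPickle  # Python 2 only; on Python 3 use pickle with encoding='latin1'
    with open(file, 'rb') as fo:
        data = cPickle.load(fo)
    return data

【Comments】:

• Why should the OP "try this"? A good answer always explains what was done and why it was done that way, not only for the OP but also for future visitors to SO.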