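【Posted】: 2020-02-04 21:15:24
【Question】:
I have a list containing 6 different sub-datasets of a single dataset, and I want to run 6-fold cross-validation. In each of the 6 iterations of a for loop, I want to split the data into 2 sets: a training set made up of 5 of the sub-datasets, and a test set made up of the one sub-dataset that was left out. My code is below:
Edit (taking the comments into account):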
sets = ['datasets/1.pickle', 'datasets/2.pickle', ..., 'datasets/6.pickle']
for i in range(len(sets)):
    # every file except the i-th one goes into the training set
    train_set = sets[:i] + sets[i+1:]
    # crossValidFiles returns the data for one sub-sample as two numpy arrays
    test_data, test_lbls = crossValidFiles(sets[i])
    for item in train_set:
        train_set = [(train_data, train_lbls) for crossValidFiles(item) in train_set]
        train_data = np.concatenate([a for (a, b) in train_set], axis=0)
        train_lbls = np.concatenate([b for (a, b) in train_set], axis=0)
        # train_data, train_lbls = crossValidFiles(item)  # this returns one file at a time
How can I combine the files that are returned for the training set?
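A minimal sketch of one way to build the training set for each fold, assuming each file loads into a `(data, labels)` pair of numpy arrays whose shapes agree on every axis except the first. Here `fake_loader`, `leave_one_out_folds`, and the `fold_k` names are hypothetical placeholders for illustration; in the real code the loader would be `crossValidFiles` and the names would be the pickle paths.

```python
import numpy as np

def fake_loader(name):
    """Hypothetical stand-in for crossValidFiles: returns (data, labels)."""
    k = int(name.split("_")[1])
    data = np.full((4, 3), k, dtype=float)   # 4 samples, 3 features, all equal to k
    labels = np.full(4, k % 2)               # 4 labels
    return data, labels

def leave_one_out_folds(names, loader):
    """Yield (train_data, train_lbls, test_data, test_lbls), holding out one file per fold."""
    for i, test_name in enumerate(names):
        test_data, test_lbls = loader(test_name)
        # Load every file except the held-out one, keeping the (data, labels) pairs.
        train_parts = [loader(n) for j, n in enumerate(names) if j != i]
        # Stack the pieces along the sample axis.
        train_data = np.concatenate([d for d, _ in train_parts], axis=0)
        train_lbls = np.concatenate([l for _, l in train_parts], axis=0)
        yield train_data, train_lbls, test_data, test_lbls

names = [f"fold_{k}" for k in range(6)]      # hypothetical file keys
folds = list(leave_one_out_folds(names, fake_loader))
```

The list comprehension calls the loader once per training file and collects the pairs, and the two `np.concatenate` calls run once per fold, after all the pieces have been loaded, rather than inside a loop over the files.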
【Comments】: