[Question title]: How to reshape LSTM input
[Posted]: 2021-11-24 20:52:08
[Question description]:

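I have data with 2833 rows, 6 features and 8 labels. I then split the data into training and test sets by label. When building the LSTM model, I get the following error: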

ValueError: cannot reshape array of size 11874 into shape (1979,64,6)

Here is my code:

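# Imports inferred from the names used below (not shown in the original post);
# the Keras import path is an assumption, and `attention` is a custom layer defined elsewhere
import numpy as np
import pandas as pd
import seaborn as sb
from matplotlib import pyplot
from numpy import mean, std
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

df_raw_primer = pd.read_excel(path_data)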

# Get unique label
unique_labels = df_raw_primer['label'].unique()
split_factor=0.7
train_data = pd.DataFrame(columns=df_raw_primer.columns)
test_data = pd.DataFrame(columns=df_raw_primer.columns)

unique_lengths = {}
for uni in unique_labels:
  unique_lengths[uni] = int(len(df_raw_primer[df_raw_primer.label == uni]) * split_factor)

for uni in unique_labels:
  for _, row in df_raw_primer.iterrows():
    if(row['label'] == uni):
      if(unique_lengths[uni]):  # training rows still left to take for this label
        train_data = train_data.append({'label': row['label'],
                                          'gyro x': row['gyro x'], 
                                          'gyro y': row['gyro y'], 
                                          'gyro z': row['gyro z'], 
                                          'acc x': row['acc x'], 
                                          'acc y': row['acc y'], 
                                          'acc z': row['acc z']},
                                         ignore_index=True)
        unique_lengths[uni] = unique_lengths[uni] - 1  # one fewer training row left for this label
      else:
        test_data = test_data.append({'label': row['label'],
                                          'gyro x': row['gyro x'], 
                                          'gyro y': row['gyro y'], 
                                          'gyro z': row['gyro z'], 
                                          'acc x': row['acc x'], 
                                          'acc y': row['acc y'], 
                                          'acc z': row['acc z']},
                                         ignore_index=True)          
                                         
sliceInput_train_data = train_data.iloc[:, 0:-1]
sliceTarget_train_label = train_data['label']
sliceInput_test_data = test_data.iloc[:, 0:-1]
sliceTarget_test_label = test_data['label']

# fit and evaluate a model
def evaluate_model(trainX, trainy, testX, testy):
    verbose, epochs, batch_size = 0, 15, 64
    n_timesteps, n_features, n_outputs = 64, 6, 8
    model = Sequential() 
    model.add(LSTM(32, input_shape=(n_timesteps,n_features), return_sequences=True))
    model.add(Dropout(0.1))
    model.add(attention(return_sequences=False)) # receive 3D and output 2D
    model.add(Dense(n_outputs, activation='softmax'))
    model.summary()
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit network
    model.fit(trainX, trainy, epochs=epochs, batch_size=batch_size, verbose=verbose)
    # evaluate model
    loss, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
    attention_weights = model.layers[3].get_weights()[0]
    heat_map = sb.heatmap(attention_weights)
    pyplot.show()
    return accuracy

# summarize scores
def summarize_results(scores):
    print(scores)
    m, s = mean(scores), std(scores)
    print('Accuracy: %.3f%% (+/-%.3f)' % (m, s))

# run an experiment
def run_experiment(repeats=10):
    trainX, trainy, testX, testy = sliceInput_train_data, sliceTarget_train_label, sliceInput_test_data, sliceTarget_test_label 
    trainX = np.array(trainX)
    trainX = np.reshape(trainX, (trainX.shape[0], 64, trainX.shape[1]))
    scores = list()
    for r in range(repeats):
        score = evaluate_model(trainX, trainy, testX, testy)
        score = score * 100.0
        print('>#%d: %.3f' % (r+1, score))
        scores.append(score)
    # summarize results
    summarize_results(scores)
    
run_experiment()

I'm really confused: I don't know where 11874 comes from, since my trainX has shape print(trainX.shape) = (1979, 6). Is something wrong with my data?
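For readers with the same confusion: `np.reshape` only rearranges existing values, so the total element count must stay the same. The 11874 in the error is simply the size of `trainX`, 1979 × 6, while the requested shape (1979, 64, 6) would need 1979 × 64 × 6 = 759,936 values. The sketch below checks this on a zero-filled stand-in array (not the real data) and shows one reshape that does preserve the element count: grouping consecutive rows into non-overlapping blocks of 64 and dropping the 59 leftover rows.

```python
import numpy as np

# Zero-filled stand-in with the same shape as trainX in the question (1979 rows, 6 features)
train_x = np.zeros((1979, 6), dtype=np.float32)

print(train_x.size)   # 11874 = 1979 * 6, the number in the error message
print(1979 * 64 * 6)  # 759936 values would be needed for shape (1979, 64, 6)

try:
    np.reshape(train_x, (train_x.shape[0], 64, train_x.shape[1]))
except ValueError as err:
    print("ValueError:", err)  # cannot reshape array of size 11874 into shape (1979,64,6)

# A reshape that keeps the element count: non-overlapping blocks of 64 consecutive rows,
# dropping the rows that do not fill a complete block (1979 is not a multiple of 64)
n_timesteps = 64
n_blocks = train_x.shape[0] // n_timesteps  # 30 blocks use 1920 rows, 59 are dropped
blocks = train_x[: n_blocks * n_timesteps].reshape(n_blocks, n_timesteps, train_x.shape[1])
print(blocks.shape)  # (30, 64, 6)
```

Since the question's split appends rows label by label, such fixed blocks can straddle the boundary between two labels, and each block still needs a single label for training.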

[Question discussion]:

    Tags: python dataframe deep-learning lstm reshape


    [Answer 1]:

    You need to convert the data from DataFrames to NumPy arrays. Check the shape of the data and use the output accordingly...
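As a minimal sketch of this conversion, assuming the column names from the question's code and a tiny made-up DataFrame (random values, hypothetical label names) in place of the real Excel data: selecting the feature columns by name and calling `DataFrame.to_numpy()` gives a plain NumPy array, and passing `dtype=np.float32` is a precaution so the features come out numeric, which Keras needs.

```python
import numpy as np
import pandas as pd

feature_cols = ["gyro x", "gyro y", "gyro z", "acc x", "acc y", "acc z"]

# Tiny made-up stand-in for train_data (the real data comes from the Excel file)
train_data = pd.DataFrame(
    np.random.default_rng(0).normal(size=(5, 6)), columns=feature_cols
)
train_data["label"] = ["walk", "walk", "run", "run", "sit"]  # hypothetical labels

# Select features by name rather than by position, then convert to NumPy
train_x = train_data[feature_cols].to_numpy(dtype=np.float32)
train_y = train_data["label"].to_numpy()

print(type(train_x), train_x.shape, train_x.dtype)  # <class 'numpy.ndarray'> (5, 6) float32
print(train_y.shape)                                # (5,)
```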

    Edit:

    trainX = np.array(trainX)
    
    trainX = np.reshape(trainX, (x, y, z))
    

    x, y and z are placeholders for the shape you want the data to have. You can access individual dimensions, for example:

    trainX = np.reshape(trainX, (trainX.shape[0], 64, trainX.shape[1]))
    

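    That said, input_shape is sometimes given in square brackets [] instead of as a tuple (); Keras accepts either, so that is unlikely to be the cause here. What you definitely need is a NumPy array of training data rather than a pandas DataFrame to have much success with any machine-learning program.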
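Because a reshape cannot invent data, getting 64 timesteps per sample from 1979 rows means cutting windows of consecutive rows rather than reshaping. A minimal NumPy sketch, assuming (as in the question's split) that rows of the same label sit in contiguous blocks, and that each window should take the label of the block it was cut from; `make_windows`, `n_timesteps` and `step` are hypothetical names, and the random data only stands in for `trainX`/`trainy`:

```python
import numpy as np


def make_windows(x, y, n_timesteps=64, step=32):
    """Cut row-wise data of shape (n_rows, n_features) into windows of shape
    (n_windows, n_timesteps, n_features), plus one label per window.

    Windows are cut only inside runs of identical consecutive labels,
    so no window mixes two classes.
    """
    windows, window_labels = [], []
    # Positions where the label changes mark the start of a new run
    boundaries = np.flatnonzero(y[1:] != y[:-1]) + 1
    starts = np.concatenate(([0], boundaries))
    ends = np.concatenate((boundaries, [len(y)]))
    for start, end in zip(starts, ends):
        # Slide a window of n_timesteps rows through this run, `step` rows at a time
        for i in range(start, end - n_timesteps + 1, step):
            windows.append(x[i : i + n_timesteps])
            window_labels.append(y[start])
    if not windows:  # every run was shorter than n_timesteps
        return np.empty((0, n_timesteps, x.shape[1]), dtype=x.dtype), y[:0]
    return np.stack(windows), np.array(window_labels)


# Made-up stand-in data with the same shape as trainX / trainy in the question:
# 1979 rows, 6 features, 8 classes stored as contiguous blocks of rows
rng = np.random.default_rng(0)
x_rows = rng.normal(size=(1979, 6)).astype(np.float32)
y_rows = np.repeat(np.arange(8), [248, 247, 247, 248, 247, 247, 248, 247])

X_win, y_win = make_windows(x_rows, y_rows, n_timesteps=64, step=32)
print(X_win.shape, y_win.shape)  # (48, 64, 6) (48,)
```

The same call would be applied to the test set before `model.evaluate`. `step=32` (50% overlap) is an arbitrary choice for illustration; `step=64` gives non-overlapping windows. Either way the number of samples ends up far smaller than 1979, and `X_win.shape[1:]` then matches the model's `input_shape=(64, 6)`.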

    [Discussion]:

    • Thanks for your answer. I tried your suggestion and changed the method like this: def run_experiment(repeats=10): trainX = np.reshape(trainX, (trainX[0], 64, trainX[1])) trainy = np.reshape(trainy, (trainy[0], 64, trainy[1])) testX = np.reshape(testX, (testX[0], 64, testX[1])) testy = np.reshape(testy, (testy[0], 64, testy[1])) but I get this error: KeyError: 0
    • I just edited my answer. It should be trainX.shape[0].
    • Thanks again for your help, but now I get this error: cannot reshape array of size 11874 into shape (1979,64,6). I don't know where 11874 comes from. Is something wrong with my data?
    • Try converting your training data to an array first: trainX = np.array(trainX), and then reshape.
    • Shapes after splitting into train and test: print(train_data.shape) = (1979, 7) --- print(test_data.shape) = (854, 7) --- shapes after splitting into features and labels: print(trainX.shape) = (1979, 6) --- print(trainy.shape) = (1979,) --- print(testX.shape) = (854, 6) --- print(testy.shape) = (854,)
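The labels need matching treatment: `trainy` has shape (1979,), but a model compiled with `categorical_crossentropy` and ending in `Dense(8, activation='softmax')` expects one one-hot row per sample, i.e. shape (n_samples, 8), where n_samples equals the first dimension of the 3D input. A minimal NumPy-only sketch, assuming one label per sample (window) has already been chosen; the label strings below are made up:

```python
import numpy as np

# Made-up per-sample labels, e.g. strings from the 'label' column of the Excel file
train_labels = np.array(["sit", "walk", "run", "walk", "stand", "sit"])
test_labels = np.array(["walk", "run", "sit"])

# Map each distinct training label to an integer 0..n_classes-1 (classes come back sorted)
classes, y_train_int = np.unique(train_labels, return_inverse=True)
n_classes = len(classes)  # 8 with the question's data, matching n_outputs

# Reuse the training mapping for the test labels so both sets agree
# (assumes every test label also occurs in the training set)
y_test_int = np.searchsorted(classes, test_labels)

# One-hot encode: row i has a 1.0 in column y_int[i]
one_hot = np.eye(n_classes, dtype=np.float32)
y_train = one_hot[y_train_int]
y_test = one_hot[y_test_int]

print(classes)                      # ['run' 'sit' 'stand' 'walk']
print(y_train.shape, y_test.shape)  # (6, 4) (3, 4)
```

With TensorFlow installed, `tf.keras.utils.to_categorical(y_train_int, num_classes=n_classes)` builds the same matrix; alternatively, the integer labels can be used directly with `loss='sparse_categorical_crossentropy'`.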