How to split a list of data frames into two lists? (Answers)

【Question title】: How to split a list of data frames into two lists?
【Posted】: 2020-02-23 22:46:27
【Question】:

Here is my initial data frame df:

col1    col2    col3
  1       0.5     10
  1       0.3     11
  5       1.4     1
  3       1.5     2
  1       0.9     10
  3       0.4     7
  1       1.2     9
  3       0.1     11
  4       0.1     11

I converted it into a list of data frames, list_df:

n = 3 # the value of "n" does not matter
list_df = [df[i:i+n] for i in range(0, df.shape[0],n)]
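For reference, here is a self-contained sketch that rebuilds the sample df from the table above and chunks it the same way. The pd.DataFrame construction is an assumption added only so the snippet runs on its own; the chunking line is the one from the question.

```python
import pandas as pd

# rebuild the example frame shown in the question (assumed construction)
df = pd.DataFrame({
    "col1": [1, 1, 5, 3, 1, 3, 1, 3, 4],
    "col2": [0.5, 0.3, 1.4, 1.5, 0.9, 0.4, 1.2, 0.1, 0.1],
    "col3": [10, 11, 1, 2, 10, 7, 9, 11, 11],
})

n = 3  # rows per chunk; the last chunk is shorter if len(df) is not a multiple of n
# df[i:i+n] slices rows by position, giving consecutive, non-overlapping chunks
list_df = [df[i:i + n] for i in range(0, df.shape[0], n)]

print(len(list_df))               # 3
print([len(d) for d in list_df])  # [3, 3, 3]
```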

list_df

[
  pd.DataFrame(
    col1    col2    col3
      1       0.5     10
      1       0.3     11
      5       1.4     1),
  pd.DataFrame(
    col1    col2    col3
      3       1.5     2
      1       0.9     10
      3       0.4     7),
  pd.DataFrame(
    col1    col2    col3
      1       1.2     9
      3       0.1     11
      4       0.1     11)
]

How can I randomly split this list into two lists of data frames, list_df1 and list_df2, so that list_df1 contains 70% of the data frames and list_df2 contains the rest?

I tried using a mask, but it doesn't work with a list of data frames.
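A boolean mask can still be applied to a plain Python list by pairing each data frame with its mask value via zip. The sketch below builds a hypothetical stand-in list_df of ten one-row frames so it runs on its own, and uses a seeded NumPy generator for reproducibility. Note that a random mask gives roughly 70%, not exactly 70%, because each frame is kept independently.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for list_df: ten one-row data frames
list_df = [pd.DataFrame({"col1": [k]}) for k in range(10)]

rng = np.random.default_rng(0)  # seeded generator for reproducible results
# each entry is True with probability 0.7, so the split is only ~70/30
mask = rng.random(len(list_df)) < 0.7

# a list cannot be indexed by a boolean mask directly, so zip pairs
# every data frame with its mask value instead
list_df1 = [d for d, keep in zip(list_df, mask) if keep]
list_df2 = [d for d, keep in zip(list_df, mask) if not keep]
```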

【Question comments】:

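Do you want to split a list into n=2 partitions? See: *.com/questions/2659900/…
@BelbaharRaouf: Thanks, but I think that's different from what I need. I have a list of data frames. Actually, the value of n (i.e., the number of rows in each data frame) doesn't matter.
Does this help: *.com/a/48561916/1534017? It also works for lists of data frames.
@Cleb: Yes, that seems very close to what I need. How do I define the index at which the list of data frames should be split?
@Cleb: list_df1, list_df2 = np.split(list_df, [6]) doesn't seem to work.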
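One way to pick the split index is to compute it from the list length: shuffle the positions, then cut the shuffled order at int(len(list_df) * 0.7). Working with integer positions instead of calling np.split on the list itself leaves the data frames untouched. This is a sketch under assumptions: split_list is a hypothetical helper name, the seed is arbitrary, and a stand-in list of ten small frames replaces the real list_df.

```python
import numpy as np
import pandas as pd

def split_list(items, frac=0.7, seed=None):
    """Hypothetical helper: randomly split a list into two lists (frac, 1 - frac)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))  # shuffled positions 0 .. len(items)-1
    cut = int(len(items) * frac)         # split point, rounded down
    first = [items[i] for i in order[:cut]]
    second = [items[i] for i in order[cut:]]
    return first, second

# usage with a stand-in list of ten small data frames
list_df = [pd.DataFrame({"col1": [k]}) for k in range(10)]
list_df1, list_df2 = split_list(list_df, frac=0.7, seed=42)
```

Shuffling once and slicing guarantees an exact split size and that every frame ends up in exactly one of the two lists.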

Tags: python pandas

【Solution 1】:

You can use np.random.choice from numpy (with replace=False) to draw a random set of distinct indices to keep, then filter list_df:

import numpy as np
import math

# compute what is 70% of the elements of list_df (rounded down)
n_70pct = math.floor(len(list_df) * 0.7)

# draw n_70pct distinct indices from 0 .. len(list_df)-1 (sampling without replacement,
# so every index is in range and none repeats)
int_sample = np.random.choice(len(list_df), n_70pct, replace=False).tolist()

# keep in list_df1 the data frames whose indices are in int_sample
list_df1 = [list_df[i] for i in int_sample]

# keep in list_df2 the data frames whose indices are not in int_sample
sample_set = set(int_sample)
list_df2 = [list_df[i] for i in range(len(list_df)) if i not in sample_set]
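An alternative that needs only the standard library is random.sample, which also draws distinct positions without replacement. The sketch below assumes a stand-in list_df of ten small frames and a fixed seed purely for reproducibility, and it ends with two sanity checks: the sizes add up, and no data frame lands in both lists. Unlike the numpy version, this keeps list_df1 in the original list order rather than in sampled order.

```python
import random
import pandas as pd

# stand-in for list_df: ten small data frames (assumption for a runnable example)
list_df = [pd.DataFrame({"col1": [k]}) for k in range(10)]

random.seed(0)  # fixed seed only so the example is reproducible
n_70pct = int(len(list_df) * 0.7)

# random.sample returns n_70pct distinct positions, so no index can repeat
keep = set(random.sample(range(len(list_df)), n_70pct))

list_df1 = [d for i, d in enumerate(list_df) if i in keep]
list_df2 = [d for i, d in enumerate(list_df) if i not in keep]

# sanity checks: sizes add up and no data frame lands in both lists
assert len(list_df1) + len(list_df2) == len(list_df)
assert not {id(d) for d in list_df1} & {id(d) for d in list_df2}
```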

【Discussion】: