Split pandas dataframe into multiple dataframes with equal numbers of rows

【Question title】: Split pandas dataframe into multiple dataframes with equal numbers of rows
【Posted】: 2016-02-28 14:33:14
【Question】:

I have a dataframe df:

        a              b          c
0   0.897134    -0.356157   -0.396212
1   -2.357861   2.066570    -0.512687
2   -0.080665   0.719328    0.604294
3   -0.639392   -0.912989   -1.029892
4   -0.550007   -0.633733   -0.748733
5   -0.712962   -1.612912   -0.248270
6   -0.571474   1.310807    -0.271137
7   -0.228068   0.675771    0.433016
8   0.005606    -0.154633   0.985484
9   0.691329    -0.837302   -0.607225
10  -0.011909   -0.304162   0.422001
11  0.127570    0.956831    1.837523
12  -1.074771   0.379723    -1.889117
13  -1.449475   -0.799574   -0.878192
14  -1.029757   0.551023    2.519929
15  -1.001400   0.838614    -1.006977
16  0.677216    -0.403859   0.451338
17  0.221596    -0.323259   0.324158
18  -0.241935   -2.251687   -0.088494
19  -0.995426   0.665569    -2.228848
20  1.714709    -0.353391   0.671539
21  0.155050    1.136433    -0.005721
22  -0.502412   -0.610901   1.520165
23  -0.853906   0.648321    1.124464
24  1.149151    -0.187300   -0.412946
25  0.329229    -1.690569   -2.746895
26  0.165158    0.173424    0.896344
27  1.157766    0.525674    -1.279618
28  1.729730    -0.798158   0.644869
29  -0.107285   -1.290374   0.544023

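I need to split it into multiple dataframes, each containing 10 rows of df, and write each small dataframe to a separate file. So I decided to create a multi-level dataframe, and as a first step I tried to assign a group index to every 10 rows of my df like this: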

df['split'] = df['split'].apply(lambda x: np.searchsorted(df.iloc[::10], x, side='right')[0])

which throws

TypeError: 'function' object has no attribute '__getitem__'

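So, do you know how to fix it? Where does my approach go wrong?

If you have another way to split my dataframe into multiple dataframes of 10 rows each, that is welcome too. This approach was just the first one that came to mind, and I am not sure it is the best.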

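【Question discussion】:

Tags: python pandas dataframe split

【Solution 1】:

You can use a dictionary comprehension to store slices of the dataframe in groups of ten rows, keyed by the position of each slice's first row: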

df_dict = {n: df.iloc[n:n+10, :] 
           for n in range(0, len(df), 10)}

>>> list(df_dict.keys())
[0, 10, 20]

>>> df_dict[10]
           a         b         c
10 -0.011909 -0.304162  0.422001
11  0.127570  0.956831  1.837523
12 -1.074771  0.379723 -1.889117
13 -1.449475 -0.799574 -0.878192
14 -1.029757  0.551023  2.519929
15 -1.001400  0.838614 -1.006977
16  0.677216 -0.403859  0.451338
17  0.221596 -0.323259  0.324158
18 -0.241935 -2.251687 -0.088494
19 -0.995426  0.665569 -2.228848

【Discussion】:

This also works for dataframes that do not have a simple integer index, thanks a lot!
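To illustrate the comment above, here is a minimal sketch of the same dictionary comprehension applied to a frame with string labels. The 25-row frame, the `row_NN` labels, and the `chunk_size` name are made up for the example. It also shows that when the row count is not a multiple of 10, the last slice is simply shorter.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 25 rows labelled "row_00" .. "row_24" instead of 0..24.
labels = ["row_%02d" % i for i in range(25)]
df = pd.DataFrame(np.arange(75).reshape(25, 3), index=labels, columns=list("abc"))

chunk_size = 10  # assumed chunk size, matching the question

# iloc slices by position, so the index labels play no part in the split.
df_dict = {n: df.iloc[n:n + chunk_size, :] for n in range(0, len(df), chunk_size)}

print(list(df_dict))         # [0, 10, 20]
print(len(df_dict[20]))      # 5, the last chunk holds the remaining rows
print(df_dict[10].index[0])  # row_10
```

The dictionary keys are row positions, not index labels, which is why they stay 0, 10, 20 whatever the index looks like.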

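【Solution 2】:

There are many ways to do what you want, and your approach looks overly complicated. A groupby that uses the row positions, integer-divided by 10, as the grouping key will work: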

import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.rand(100, 3), columns=list('ABC'))
# Rows 0-9 get key 0, rows 10-19 get key 1, and so on.
groups = df.groupby(np.arange(len(df.index))//10)
for (frameno, frame) in groups:
    frame.to_csv("%s.csv" % frameno)
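As a rough, self-contained variant of the loop above, the sketch below writes the groups into a temporary directory and records how many rows each file received. The 25-row frame, the `out_dir` location, and the `written` dictionary are assumptions chosen for the illustration, not part of the answer.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical frame whose length is not a multiple of 10.
df = pd.DataFrame(np.random.rand(25, 3), columns=list("ABC"))

out_dir = Path(tempfile.mkdtemp())  # assumed output location for the demo
written = {}
for frameno, frame in df.groupby(np.arange(len(df)) // 10):
    path = out_dir / ("%s.csv" % frameno)
    frame.to_csv(path)
    written[path.name] = len(frame)  # remember the row count per file

print(written)  # {'0.csv': 10, '1.csv': 10, '2.csv': 5}
```

The final group holds whatever rows are left over, so no special handling is needed for the tail.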

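【Discussion】:

Thanks @AlexW, this is exactly what I wanted!
How can I do the same thing in Scala?
Shouldn't / be replaced with //? Otherwise every group contains only one row, because the keys stay unique.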
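The point raised in the last comment can be checked with a small sketch, assuming Python 3, where `/` on an integer array is true division. The 30-row frame and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(30, 3), columns=list("ABC"))
positions = np.arange(len(df))

true_div_keys = positions / 10    # 0.0, 0.1, 0.2, ... every key is distinct
floor_div_keys = positions // 10  # 0 for rows 0-9, 1 for rows 10-19, 2 for rows 20-29

n_groups_true = df.groupby(true_div_keys).ngroups
n_groups_floor = df.groupby(floor_div_keys).ngroups
print(n_groups_true, n_groups_floor)  # 30 3
```

With true division each row ends up in its own group, while floor division yields the intended groups of ten rows.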