Pandas: add multiple columns to a MultiIndex column DataFrame (answers)

[Question title]: Pandas: add multiple columns to a multiindex column dataframe
[Posted]: 2020-03-12 11:24:01
[Question description]:

This question tries to generalize the solution given for this question:

Pandas: add a column to a multiindex column dataframe

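I need to generate one new column for each first-level column label.

The solution provided by spencerlyon2 works when we want to add a single column: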

df['bar', 'three'] = [0, 1, 2]

But I would like to generalize this operation to every first-level column label.

Source DF:

In [1]: df
Out[2]:
first        bar                 baz
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370
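To reproduce the source frame, one possible construction is sketched here. The question only shows the printed output, so the construction itself (the use of `MultiIndex.from_product` and the literal values typed in by hand) is an assumption.

```python
import pandas as pd

# Column MultiIndex with the two levels shown in the printout.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)

# Values copied from the printed source frame.
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)
print(df)
```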

Target DF below. Each three column must be the sum of the one and two columns under the same first-level label.

In [1]: df
Out[2]:
first        bar                           baz
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  1.440740 -2.317647 -0.876907
B       0.488875  0.428836  0.917711 -0.683677  0.345873 -0.337804
C      -0.243064 -0.069446 -0.312510  0.478370  0.266761  0.745131

[Question comments]:

Tags: python pandas

[Solution 1]:

You can use join on two DataFrames that share the same index to create a whole set of columns at once.

First, compute the sums using groupby with axis=1

ndf = df.groupby(df.columns.get_level_values(0), axis=1).sum()

        bar       baz
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796
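Recent pandas releases deprecate the `axis=1` argument of `groupby`. As a hedged alternative, the same per-group sums can be obtained by transposing, grouping on the first index level, and transposing back. The sample frame below is rebuilt from the printed values; the variable names are illustrative only.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

# Transpose so the column levels become row levels, group on the
# first level, sum within each group, and transpose back.
ndf = df.T.groupby(level="first").sum().T
print(ndf)
```

The result has one flat column per first-level label, matching the table shown above.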

(PS: if you have more than two columns per group, you can do

df.loc[:, (slice(None), ['one', 'two'])].groupby(df.columns.get_level_values(0), axis=1).sum()

to first slice out only the 'one' and 'two' columns, and then groupby.)
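The slicing step can be sketched with `pd.IndexSlice`, which is equivalent to the `(slice(None), [...])` tuple. The frame here is hypothetical: it adds a third second-level column named `weight` (not in the original question) to show that it is excluded from the sum. The grouping uses the transpose form rather than `axis=1`.

```python
import pandas as pd

# Hypothetical frame with an extra second-level column ("weight")
# that should not take part in the sum.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two", "weight"]], names=["first", "second"]
)
df = pd.DataFrame(
    [[1, 2, 100, 3, 4, 100], [5, 6, 100, 7, 8, 100]],
    index=["A", "B"],
    columns=columns,
)

# Keep every first-level label, but only 'one' and 'two' on level 2.
idx = pd.IndexSlice
subset = df.loc[:, idx[:, ["one", "two"]]]

# Sum within each first-level group.
sums = subset.T.groupby(level="first").sum().T
print(sums)
```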

Then make its columns match your column index, i.e. turn it into a MultiIndexed DataFrame just like your original one

ndf.columns = pd.MultiIndex.from_product([ndf.columns, ['three']])

        bar       baz
      three     three
A  0.963228  1.910958
B  0.917711  0.729774
C -0.312510 -0.432796
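To see what the relabeling does in isolation, here is a small sketch on a hypothetical two-row frame (the values are made up for illustration). `MultiIndex.from_product` pairs every existing column label with the single new label `'three'`.

```python
import pandas as pd

# Hypothetical flat frame standing in for the per-group sums.
ndf = pd.DataFrame({"bar": [1.0, 2.0], "baz": [3.0, 4.0]}, index=["A", "B"])

# Pair each flat label with 'three' to get two-level column keys.
ndf.columns = pd.MultiIndex.from_product([ndf.columns, ["three"]])
print(ndf.columns.tolist())  # [('bar', 'three'), ('baz', 'three')]
```

After this step the columns can be addressed with tuples such as `ndf[("bar", "three")]`, which is what allows the join with the original frame.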

Finally, df.join

finaldf = df.join(ndf).sort_index(axis=1)

If you really care about the column order, use reindex

finaldf.reindex(['one', 'two', 'three'], axis=1, level=1)

first        bar                           baz                    
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796
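Putting the steps together, a self-contained sketch might look like the following. It assumes the sample frame from the question and swaps the `axis=1` groupby for the transpose form, which avoids the deprecated argument; everything else follows the steps in this answer.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

# Step 1: per-group sums (transpose form instead of axis=1).
ndf = df.T.groupby(level=0).sum().T

# Step 2: relabel the flat columns as (group, 'three').
ndf.columns = pd.MultiIndex.from_product(
    [ndf.columns, ["three"]], names=df.columns.names
)

# Step 3: join on the shared row index and sort the columns.
finaldf = df.join(ndf).sort_index(axis=1)

# Step 4: impose the desired order on the second level.
finaldf = finaldf.reindex(["one", "two", "three"], axis=1, level=1)
print(finaldf)
```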

[Comments]:

Thanks for the reply, rafaelc. For some reason, when using df.columns.get_level_values(0), I got ValueError("Grouper and axis must be same length") from the groupby. I changed it to level=0 and it worked. Any idea what raises the ValueError?
@SO_tourist You probably forgot to pass "axis=1" as an option to groupby
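The error from the comment above can be reproduced in a small sketch, assuming the sample frame from the question. Without `axis=1`, groupby tries to line up the four column labels with the three rows, and the length check fails. Grouping by level avoids passing an array-like grouper at all; the transpose form is used here instead of `axis=1`.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

grouper = df.columns.get_level_values(0)
print(len(grouper), len(df.index))  # 4 labels versus 3 rows

# Grouping along the rows (the default) with a 4-element grouper.
try:
    df.groupby(grouper).sum()
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Grouping by level needs no array-like grouper, so no length check.
by_level = df.T.groupby(level=0).sum().T
```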

[Solution 2]:

I started from your sample input:

first        bar                 baz          
second       one       two       one       two
A      -1.089798  2.053026  0.470218  1.440740
B       0.488875  0.428836  1.413451 -0.683677
C      -0.243064 -0.069446 -0.911166  0.478370

To add a new column under each level-0 label of the column MultiIndex, you can run something like:

for c1 in df.columns.get_level_values('first').unique():
    # New column int index
    cInd = int(df.columns.get_loc(c1).stop)
    col = (c1, 'three')      # New column name
    newVal = df[(c1, 'one')] + df[(c1, 'two')]
    df.insert(loc=cInd, column=col, value=newVal)  # Insert the new column

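In the example above, each new column holds the sum of the one and two columns of its group; change the expression assigned to newVal to suit your own case.

The result of my code (after sorting the columns) is: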

first        bar                           baz                    
second       one       two     three       one       two     three
A      -1.089798  2.053026  0.963228  0.470218  1.440740  1.910958
B       0.488875  0.428836  0.917711  1.413451 -0.683677  0.729774
C      -0.243064 -0.069446 -0.312510 -0.911166  0.478370 -0.432796
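The loop above calls `insert` once per group. As a hedged alternative for frames with many groups, all the sums can be computed in one step with `xs` and appended with a single `concat`. This is a sketch under the assumption that every group has both a `one` and a `two` column; the variable names are illustrative.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [-1.089798, 2.053026, 0.470218, 1.440740],
        [0.488875, 0.428836, 1.413451, -0.683677],
        [-0.243064, -0.069446, -0.911166, 0.478370],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

# Cross-sections: one flat frame per second-level label, with the
# first-level labels (bar, baz) as columns.
one = df.xs("one", axis=1, level="second")
two = df.xs("two", axis=1, level="second")

# Element-wise sum for all groups at once, relabeled as (group, 'three').
three = one + two
three.columns = pd.MultiIndex.from_product(
    [three.columns, ["three"]], names=df.columns.names
)

# Append every new column in one concat, then order columns per group.
target = pd.MultiIndex.from_product(
    [one.columns, ["one", "two", "three"]], names=df.columns.names
)
result = pd.concat([df, three], axis=1).reindex(columns=target)
print(result)
```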

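[Comments]:

Thanks for the reply. I have edited my question slightly, since I may not have explained myself clearly. I need to add one column for each first-level label, where the new column is, for example, the sum of the one and two columns.
I have corrected my answer. It now adds a new column under each name in the top level of the column MultiIndex.
How is the result correct? The OP explicitly stated that three should be the sum of one and two, haha ;p
You are right, I missed that detail at first. The new columns now contain the sums.
Looks correct now! But even though this works, it will be very, very slow ;/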