【Question Title】: setting columns in multiindex pandas
【Posted】: 2026-01-31 02:10:01
【Question Description】:

I have this pandas df imported from a CSV:

df
0  0 apple  banana  orange                        dates apple  banana  orange
1  1      1       1      1     Friday, January 01, 2021      1       1      1
2  2      1       1      1   Saturday, January 02, 2021      2       2      2
3  3      1       1      1     Sunday, January 03, 2021      3       3      3
4  4      1       1      1     Monday, January 04, 2021      4       4      4
5  5      1       1      1    Tuesday, January 05, 2021      5       5      5
6  6      1       1      1  Wednesday, January 06, 2021      6       6      6
7  7      1       1      1   Thursday, January 07, 2021      7       7      7
8  8      1       4      1     Friday, January 08, 2021      8       8      8
9  9      1       1      1   Saturday, January 09, 2021      9       9      9

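Is it possible to group everything to the left of dates under fresh, and everything to the right of dates under spoil, as a MultiIndex on the columns? For example, each group would contain the columns [apple, banana, orange]. I want to do this so that later, when I set dates as the index, there is no confusion, since both sides of the columns have the same names.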
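The confusion described above can be reproduced on a small stand-in frame built from the first row of the data (the two leading 0 columns are left out for brevity, which is an assumption). A minimal sketch: selecting a duplicated label returns every column with that name, as a DataFrame rather than a single Series.

```python
import pandas as pd

# Stand-in for the question's frame: the fruit names appear on both sides of dates.
df = pd.DataFrame(
    [[1, 1, 1, "Friday, January 01, 2021", 1, 1, 1]],
    columns=["apple", "banana", "orange", "dates", "apple", "banana", "orange"],
)

# A duplicated label selects both matching columns, so the result is a
# two-column DataFrame and the fresh/spoil distinction is lost.
picked = df["apple"]
print(type(picked).__name__, picked.shape)  # DataFrame (1, 2)
```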

【Question Discussion】:

    Tags: python pandas multi-index


    【Solution 1】:

    You can try:

    import pandas as pd

    # Get the position of the `dates` column
    dates_loc = df.columns.get_loc('dates')
    
    # Top level: 'fresh' left of dates, '' for dates itself, 'spoil' to the right
    arrays = [['fresh'] * dates_loc + [''] + ['spoil'] * (len(df.columns) - dates_loc - 1), df.columns.tolist()]
    
    df.columns = pd.MultiIndex.from_arrays(arrays)
    
    Output:
    
      fresh                                                        spoil                  
          0   0 apple banana orange                        dates   apple   banana  orange
    0     1   1     1      1      1     Friday, January 01, 2021       1        1        1
    1     2   2     1      1      1   Saturday, January 02, 2021       2        2        2
    2     3   3     1      1      1     Sunday, January 03, 2021       3        3        3
    3     4   4     1      1      1     Monday, January 04, 2021       4        4        4
    4     5   5     1      1      1    Tuesday, January 05, 2021       5        5        5
    5     6   6     1      1      1  Wednesday, January 06, 2021       6        6        6
    6     7   7     1      1      1   Thursday, January 07, 2021       7        7        7
    7     8   8     1      4      1     Friday, January 08, 2021       8        8        8
    8     9   9     1      1      1   Saturday, January 09, 2021       9        9        9
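Solution 1's steps can be tried end to end on a small stand-in frame (the first two rows of the question's data, again assuming the two leading 0 columns are omitted). Once the top level exists, df['fresh'] and df['spoil'] each return a plain three-column frame with apple, banana and orange, so the duplicate names no longer collide.

```python
import pandas as pd

# Stand-in for the question's data (first two rows, without the "0" columns).
df = pd.DataFrame(
    [[1, 1, 1, "Friday, January 01, 2021", 1, 1, 1],
     [1, 1, 1, "Saturday, January 02, 2021", 2, 2, 2]],
    columns=["apple", "banana", "orange", "dates", "apple", "banana", "orange"],
)

# The unique "dates" column splits the frame into a left and a right half.
dates_loc = df.columns.get_loc("dates")
n_right = len(df.columns) - dates_loc - 1

# Top level: "fresh" left of dates, "" for dates itself, "spoil" to the right.
top = ["fresh"] * dates_loc + [""] + ["spoil"] * n_right
df.columns = pd.MultiIndex.from_arrays([top, df.columns.tolist()])

fresh = df["fresh"]  # only the left-hand apple/banana/orange
spoil = df["spoil"]  # only the right-hand apple/banana/orange
```

Because the label list is derived from len(df.columns), it always has the same length as the columns, which is exactly what from_arrays requires.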
    

    【Discussion】:

      【Solution 2】:
      import pandas as pd

      df.columns = pd.MultiIndex.from_arrays([['', '', 'fresh', 'fresh', 'fresh', '', 'spoil', 'spoil', 'spoil'],
                                              df.columns])
      

      Output:

               fresh                                              spoil                  
         0   0 apple banana orange                        dates   apple   banana   orange
      0  1   1     1      1      1     Friday, January 01, 2021       1        1        1
      1  2   2     1      1      1   Saturday, January 02, 2021       2        2        2
      2  3   3     1      1      1     Sunday, January 03, 2021       3        3        3
      3  4   4     1      1      1     Monday, January 04, 2021       4        4        4
      4  5   5     1      1      1    Tuesday, January 05, 2021       5        5        5
      5  6   6     1      1      1  Wednesday, January 06, 2021       6        6        6
      6  7   7     1      1      1   Thursday, January 07, 2021       7        7        7
      7  8   8     1      4      1     Friday, January 08, 2021       8        8        8
      8  9   9     1      1      1   Saturday, January 09, 2021       9        9        9
      

      Note: this is easier if you run set_index('dates') before this operation.
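Following the note above, a sketch of the set_index-first route, assuming the two leading 0 columns have already been removed. Once dates is the index, the remaining six columns are three fresh fruits followed by three spoil fruits, so MultiIndex.from_product can build both levels without any position arithmetic. The fruit list is hard-coded here and would need to match the real column order.

```python
import pandas as pd

# Stand-in for the question's frame without the two leading "0" columns
# (assumption: they have already been dropped).
df = pd.DataFrame(
    [[1, 1, 1, "Friday, January 01, 2021", 1, 1, 1],
     [1, 4, 1, "Friday, January 08, 2021", 8, 8, 8]],
    columns=["apple", "banana", "orange", "dates", "apple", "banana", "orange"],
)

# Move dates out of the columns first...
df = df.set_index("dates")

# ...so the remaining columns are fresh fruits followed by spoil fruits,
# and the two-level header is simply a Cartesian product.
fruits = ["apple", "banana", "orange"]
df.columns = pd.MultiIndex.from_product([["fresh", "spoil"], fruits])
```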

      【Discussion】:

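      • Thanks for your help. When I try to run it I get this error: ValueError: all arrays must be same length
      • Check the number of columns in your real dataframe. Could it be different?
      • Could you tell me how to get rid of the first three "0" columns? Thanks.
      • The first one is the index; it will go away if you set_index('dates'). The other two really are columns; you can remove them with df.iloc[:, 2:] or df.drop('0', axis=1) (if the zeros are strings).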
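For the follow-up comments, a sketch on a hypothetical frame laid out like Solution 2's output, with both leading helper columns assumed to be named the string "0". Dropping them by position with iloc avoids matching the duplicated labels, and passing the full column tuple to set_index moves dates into the index, replacing the default 0..n-1 row labels.

```python
import pandas as pd

# Hypothetical frame in the layout Solution 2 produces: two "0" helper
# columns, then fresh fruits, dates, and spoil fruits.
raw_cols = ["0", "0", "apple", "banana", "orange", "dates", "apple", "banana", "orange"]
top = ["", "", "fresh", "fresh", "fresh", "", "spoil", "spoil", "spoil"]
df = pd.DataFrame(
    [[1, 1, 1, 1, 1, "Friday, January 01, 2021", 1, 1, 1],
     [2, 2, 1, 1, 1, "Saturday, January 02, 2021", 2, 2, 2]],
)
df.columns = pd.MultiIndex.from_arrays([top, raw_cols])

# Drop the two helper columns by position (their labels are duplicated).
df = df.iloc[:, 2:]

# Replace the default 0..n-1 row index with the dates column.
df = df.set_index(("", "dates"))
df.index.name = "dates"
```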

      【Solution 3】:

      This might help:

          df.columns.values[1] = "apple1"
          df.columns.values[2] = "banana1"
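Solution 3 writes new names into the array behind df.columns. pandas Index objects are meant to be immutable (df.columns[1] = ... raises a TypeError), so a variant that assigns a complete new list may be the safer way to get flat, unique names. In this sketch the _spoil suffix is an arbitrary, hypothetical naming choice, not anything pandas requires.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, "Friday, January 01, 2021", 1, 1, 1]],
    columns=["apple", "banana", "orange", "dates", "apple", "banana", "orange"],
)

# Build a brand-new label list instead of writing into df.columns.values.
cols = df.columns.tolist()
dates_loc = cols.index("dates")
# Hypothetical "_spoil" suffix for every column to the right of dates.
new_cols = cols[: dates_loc + 1] + [f"{c}_spoil" for c in cols[dates_loc + 1 :]]
df.columns = new_cols
```

Afterwards df['apple'] and df['apple_spoil'] each return a single Series.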

      【Discussion】: