【Question title】: MultiIndex selecting in pandas
【Posted】: 2018-09-27 06:49:12
【Question】:

I can't figure out how MultiIndex selection works in pandas.

                    0  1  2  3
first second third            
C     one    mean   3  4  2  7
             std    4  1  7  7
      two    mean   3  1  4  7
             std    5  6  7  0
      three  mean   7  0  2  5
             std    7  3  7  1
H     one    mean   2  4  3  3
             std    5  5  3  5
      two    mean   5  7  0  6
             std    0  1  0  2
      three  mean   5  2  5  1
             std    9  0  4  6
V     one    mean   3  7  3  9
             std    8  7  9  3
      two    mean   1  9  9  0
             std    1  1  5  1
      three  mean   3  1  0  6
             std    6  2  7  4

I need to create new rows:

- 'CH' : ['CH',:,'mean'] => ['C',:,'mean'] - ['H',:,'mean']
- 'CH' : ['CH',:,'std'] => (['C',:,'std']**2 + ['H',:,'std']**2)**.5

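When I try to select the rows, I get different kinds of errors, for example: UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'

How should this operation be done?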

import pandas as pd
import numpy as np
iterables = [['C', 'H', 'V'],
          ['one','two','three'],
          ['mean','std']]
midx = pd.MultiIndex.from_product(iterables, names=['first', 'second','third'])
chv = pd.DataFrame(np.random.randint(0,high=10,size=(18,4)), index=midx)
print (chv)
idx = pd.IndexSlice
chv.loc[:,idx['C',:,'mean']]

【Question discussion】:

    Tags: python pandas multi-index


    【Solution 1】:

    You can first select the rows with slicers, then rename the first level and apply the arithmetic, and finally concat the results:

    # sort the index first to avoid UnsortedIndexError
    chv = chv.sort_index()
    
    idx = pd.IndexSlice
    c1 = chv.loc[idx['C',:,'mean'], :].rename({'C':'CH'}, level=0)
    h1 = chv.loc[idx['H',:,'mean'], :].rename({'H':'CH'}, level=0)
    ch1 = c1 - h1
    
    c2 = chv.loc[idx['C',:,'std'], :].rename({'C':'CH'}, level=0)**2
    h2 = chv.loc[idx['H',:,'std'], :].rename({'H':'CH'}, level=0)**2
    ch2 = (c2 + h2)**.5
    
    df = pd.concat([chv, ch1, ch2]).sort_index()
    

    print (df)
                               0         1         2         3
    first second third                                        
    C     one    mean   7.000000  5.000000  8.000000  3.000000
                 std    0.000000  4.000000  4.000000  4.000000
          three  mean   4.000000  2.000000  1.000000  6.000000
                 std    8.000000  7.000000  3.000000  3.000000
          two    mean   1.000000  8.000000  2.000000  5.000000
                 std    2.000000  2.000000  4.000000  2.000000
    CH    one    mean   1.000000  2.000000  1.000000  2.000000
                 std    4.000000  7.211103  4.000000  7.211103
          three  mean   1.000000  0.000000 -4.000000  2.000000
                 std    8.062258  7.071068  4.242641  3.000000
          two    mean  -1.000000  6.000000 -2.000000  3.000000
                 std    9.219544  7.280110  4.123106  2.000000
    H     one    mean   6.000000  3.000000  7.000000  1.000000
                 std    4.000000  6.000000  0.000000  6.000000
          three  mean   3.000000  2.000000  5.000000  4.000000
                 std    1.000000  1.000000  3.000000  0.000000
          two    mean   2.000000  2.000000  4.000000  2.000000
                 std    9.000000  7.000000  1.000000  0.000000
    V     one    mean   9.000000  5.000000  0.000000  5.000000
                 std    7.000000  9.000000  1.000000  1.000000
          three  mean   3.000000  0.000000  3.000000  4.000000
                 std    1.000000  4.000000  9.000000  2.000000
          two    mean   3.000000  6.000000  3.000000  2.000000
                 std    1.000000  3.000000  1.000000  4.000000
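
A minimal sketch of why the sort matters, assuming the same index as in the question. `MultiIndex.from_product` keeps the rows in the order given ('one', 'two', 'three'), which is not lexicographic order, so the index is not fully sorted until `sort_index()` is called. The fixed seed and the name `chv_sorted` are assumptions added for illustration, and whether the unsorted slice actually raises may depend on the pandas version.

```python
import numpy as np
import pandas as pd

iterables = [["C", "H", "V"], ["one", "two", "three"], ["mean", "std"]]
midx = pd.MultiIndex.from_product(iterables, names=["first", "second", "third"])
rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
chv = pd.DataFrame(rng.integers(0, 10, size=(18, 4)), index=midx)

# 'two' precedes 'three' in the rows, so the index is not in sorted order.
print(chv.index.is_monotonic_increasing)

# After sorting, the index is monotonic and slicers can be used safely.
chv_sorted = chv.sort_index()
print(chv_sorted.index.is_monotonic_increasing)

idx = pd.IndexSlice
# Slicers go in the row position of .loc; the trailing ':' selects all columns.
c_mean = chv_sorted.loc[idx["C", :, "mean"], :]
print(c_mean)
```

Note that the slicer is passed as the first argument of `.loc` (rows), not the second (columns) as in the question's attempt.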
    

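The steps in this answer can also be wrapped in a small function and checked against the two formulas from the question. This is only a sketch: `combine_groups` and its parameter names are hypothetical, the seed is an assumption, and the result is verified with `xs` rather than by eye.

```python
import numpy as np
import pandas as pd

iterables = [["C", "H", "V"], ["one", "two", "three"], ["mean", "std"]]
midx = pd.MultiIndex.from_product(iterables, names=["first", "second", "third"])
rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
chv = pd.DataFrame(rng.integers(0, 10, size=(18, 4)), index=midx).sort_index()

idx = pd.IndexSlice


def combine_groups(df, a, b, new):
    """Hypothetical helper: append rows `new` built from groups `a` and `b`."""

    def pick(group, stat):
        # Select one statistic of one group and relabel the first level,
        # so that the rows of both groups align during the arithmetic.
        return df.loc[idx[group, :, stat], :].rename({group: new}, level=0)

    mean = pick(a, "mean") - pick(b, "mean")
    std = (pick(a, "std") ** 2 + pick(b, "std") ** 2) ** 0.5
    return pd.concat([df, mean, std]).sort_index()


out = combine_groups(chv, "C", "H", "CH")

# Verify both formulas from the question on the whole block at once.
c, h, ch = (out.xs(g, level="first") for g in ("C", "H", "CH"))
mean_ok = np.allclose(
    ch.xs("mean", level="third").to_numpy(),
    (c.xs("mean", level="third") - h.xs("mean", level="third")).to_numpy(),
)
std_ok = np.allclose(
    ch.xs("std", level="third").to_numpy(),
    np.hypot(c.xs("std", level="third").to_numpy(),
             h.xs("std", level="third").to_numpy()),
)
print(mean_ok, std_ok)
```

Because the integer input is concatenated with the floating-point `std` rows, all columns of the result become floats, which matches the output printed above.
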
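    【Comments】:

    • Great solution! Thanks.
    • For some reason, in my real case the solution did not work as written: I had to pass columns= explicitly, i.e. rename(columns={'C':'CH'}, level=0). Oddly, it ran fine in the debugger.
    • @Guido - maybe the real DataFrame has a MultiIndex in the columns. In my solution, .rename({'C':'CH'}, level=0) is the same as .rename(index={'C':'CH'}, level=0).
    • Indeed, it is as you say in your comment.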
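
To illustrate the point made in the comments, here is a small sketch on a made-up two-level frame (the names `rows`, `cols`, and the data are assumptions, not the commenter's real DataFrame). A positional mapper in `DataFrame.rename` applies to the index by default, so it only matches the `index=` form; when the MultiIndex sits on the columns, `columns=` has to be given explicitly.

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([["C", "H"], ["mean", "std"]],
                                  names=["first", "third"])
rows = pd.DataFrame(np.arange(8).reshape(4, 2), index=midx)

# A positional mapper targets the index, so these two calls agree.
by_position = rows.rename({"C": "CH"}, level=0)
by_keyword = rows.rename(index={"C": "CH"}, level=0)
print(by_position.equals(by_keyword))

# Transposed frame: the MultiIndex is now on the columns,
# so the mapping must be passed through `columns=`.
cols = rows.T
renamed_cols = cols.rename(columns={"C": "CH"}, level=0)
print(renamed_cols.columns.get_level_values(0).unique().tolist())
```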