【Question title】: How to map values of a pandas MultiIndex DataFrame to another MultiIndex DataFrame with a different shape?
【Posted】: 2020-08-04 09:58:33
【Question】:

I have the following two MultiIndex DataFrames with different shapes. pandas DataFrame 'a':

import numpy as np
import pandas as pd

col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])

which outputs:

    Set 1             Set 2                    
    paa paa jaa paa   jaa jaa jaa paa paa paa
0     3   0   2   1     2   0   3   5   4   3
1     2   1   2   1     0   5   5   5   3   4
2     5   2   1   2     5   1   5   5   0   2

and DataFrame 'b':

col0 = ['Set 1','Set 1','Set 2','Set 2']
col1 = ['P1_1','P1_2','P2_1','P2_2']
b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])

which outputs:

   Set 1      Set 2     
   P1_1 P1_2  P2_1 P2_2
0     2    1     1    2
1     0    0     2    2
2     0    0     1    0

Now I want to combine the two, keeping the MultiIndex of pandas 'a' and the values of pandas 'b'.

Desired output for pandas 'c':

      Set 1                   Set 2                    
      P1_1  P1_2  P1_1  P1_2  P1_1  P1_2  P1_1  P1_2  P1_1  P1_2
0     2     1     2     1     1     2     1     2     1     2
1     0     0     0     0     2     2     2     2     2     2
2     0     0     0     0     1     0     1     0     1     0

Level 0 of pandas 'c' coincides with level 0 of pandas 'a'. Level 1 of 'c' alternates through the columns of pandas 'b'.

This probably requires combining the following somehow:

temp=b.reindex(columns=map(lambda x:(x[0],'P1_1') ,a.columns))
a.groupby(level=0, axis=1)

Any help is appreciated!

【Comments】:

    Tags: pandas replace duplicates multi-index fill


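    【Solution 1】:

    The idea is to match the first level of a and b, then repeat b's second-level column labels until each group is as wide as it is in a, and pass the resulting columns to DataFrame.reindex: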

    import numpy as np
    import pandas as pd

    np.random.seed(123)
        
    col0 = ['Set 1','Set 1','Set 1','Set 1','Set 2','Set 2','Set 2','Set 2','Set 2','Set 2']
    col1 = ['paa','paa','jaa','paa','jaa','jaa','jaa','paa','paa','paa']
    a = pd.DataFrame(data = np.random.randint(6, size=(3, 10)), columns = [col0,col1])
    
    col0 = ['Set 1','Set 1','Set 2','Set 2']
    col1 = ['P1_1','P1_2','P2_1','P2_2']
    b = pd.DataFrame(data = np.random.randint(3, size=(3, 4)), columns = [col0,col1])
    
    
    print (a)
      Set 1             Set 2                    
        paa paa jaa paa   jaa jaa jaa paa paa paa
    0     5   2   4   2     1   3   2   3   1   1
    1     0   1   1   0     0   1   3   5   4   0
    2     0   4   1   3     2   4   2   4   0   5
    
    print (b)
      Set 1      Set 2     
       P1_1 P1_2  P2_1 P2_2
    0     0    1     0    0
    1     0    2     1    1
    2     2    2     2    1
    

    # Repeat list s cyclically and truncate it to exactly `wanted` items
    # https://stackoverflow.com/a/3391105
    def repeat_to_length(s, wanted):
        return (s * (wanted//len(s) + 1))[:wanted]
    
    
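    out = []
    # Build the target column tuples one top-level group at a time
    for lvl in a.columns.levels[0]:
        # second-level labels under this group in a and in b
        colsa = a.xs(lvl, axis=1, level=0).columns.tolist()
        colsb = b.xs(lvl, axis=1, level=0).columns.tolist()
        # cycle b's labels until there are as many as a has columns
        lvl1 = repeat_to_length(colsb, len(colsa))
        out.extend(list(zip([lvl] * len(lvl1), lvl1)))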
    
    print (out)
    [('Set 1', 'P1_1'), ('Set 1', 'P1_2'), ('Set 1', 'P1_1'), 
     ('Set 1', 'P1_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2'), 
     ('Set 2', 'P2_1'), ('Set 2', 'P2_2'), ('Set 2', 'P2_1'), ('Set 2', 'P2_2')]
    
    mux = pd.MultiIndex.from_tuples(out)
    print (mux)
    MultiIndex([('Set 1', 'P1_1'),
                ('Set 1', 'P1_2'),
                ('Set 1', 'P1_1'),
                ('Set 1', 'P1_2'),
                ('Set 2', 'P2_1'),
                ('Set 2', 'P2_2'),
                ('Set 2', 'P2_1'),
                ('Set 2', 'P2_2'),
                ('Set 2', 'P2_1'),
                ('Set 2', 'P2_2')],
               )
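
The reindex step that follows depends on DataFrame.reindex accepting a target index in which the same label appears more than once, provided the source columns themselves are unique. A minimal sketch of that behaviour, using small hypothetical data (the names `small` and `target` are not from the original answer):

```python
import pandas as pd

# Hypothetical toy frame with unique two-level columns
small = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples([("S", "x"), ("S", "y")]),
)

# The target repeats ("S", "x"); reindex copies that column again
target = pd.MultiIndex.from_tuples([("S", "x"), ("S", "y"), ("S", "x")])
repeated = small.reindex(columns=target)
print(repeated)
```

Any tuple in the target that does not exist in the source would come back as a column of NaN, so it is worth checking that every generated label really is present in b.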
    

    c = b.reindex(mux, axis=1)
    print (c)
      Set 1                Set 2                         
       P1_1 P1_2 P1_1 P1_2  P2_1 P2_2 P2_1 P2_2 P2_1 P2_2
    0     0    1    0    1     0    0    0    0    0    0
    1     0    2    0    2     1    1    1    1    1    1
    2     2    2    2    2     2    1    2    1    2    1
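
The same steps can also be wrapped in one function. The following is only a sketch of an alternative, not the answerer's code: the function name `cycle_columns` is hypothetical, it swaps repeat_to_length for itertools.cycle and itertools.islice, and it assumes that every top-level label of a also exists in b and that b's columns are unique.

```python
from itertools import cycle, islice

import numpy as np
import pandas as pd


def cycle_columns(a, b):
    """Lay out b's values with as many columns per top-level group as a has."""
    top_a = a.columns.get_level_values(0)
    tuples = []
    # unique() keeps order of first appearance (levels[0] would be sorted)
    for lvl in top_a.unique():
        n_wanted = int((top_a == lvl).sum())   # columns a has in this group
        labels_b = b[lvl].columns              # b's second-level labels here
        tuples.extend((lvl, lab) for lab in islice(cycle(labels_b), n_wanted))
    return b.reindex(columns=pd.MultiIndex.from_tuples(tuples))


# Same column layout as in the question; a's values do not matter here
col0_a = ['Set 1'] * 4 + ['Set 2'] * 6
col1_a = ['paa', 'paa', 'jaa', 'paa', 'jaa', 'jaa', 'jaa', 'paa', 'paa', 'paa']
a = pd.DataFrame(np.zeros((3, 10), dtype=int), columns=[col0_a, col1_a])

# b holds the values printed above for seed 123
b = pd.DataFrame(
    [[0, 1, 0, 0], [0, 2, 1, 1], [2, 2, 2, 1]],
    columns=[['Set 1', 'Set 1', 'Set 2', 'Set 2'],
             ['P1_1', 'P1_2', 'P2_1', 'P2_2']],
)

c = cycle_columns(a, b)
print(c)
```

If a contains a top-level label that b lacks, `b[lvl]` raises a KeyError, so such groups would need separate handling.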
    

    【Discussion】:
