【Question title】: Convert dictionaries with list of values into a dataframe
【Posted】: 2020-01-18 09:41:48
【Question description】:

Suppose I have three dictionaries:

dictionary_col2

{'MOB': [1, 2], 'ASP': [1, 2], 'YIP': [1, 2]}

dictionary_col3

{'MOB': ['MOB_L001_R1_001.gz',
         'MOB_L002_R1_001.gz'],
 'ASP': ['ASP_L001_R1_001.gz',
         'ASP_L002_R1_001.gz'],
 'YIP': ['YIP_L001_R1_001.gz',
         'YIP_L002_R1_001.gz']}

dictionary_col4

{'MOB': ['MOB_L001_R2_001.gz',
         'MOB_L002_R2_001.gz'],
 'ASP': ['ASP_L001_R2_001.gz',
         'ASP_L002_R2_001.gz'],
 'YIP': ['YIP_L001_R2_001.gz',
         'YIP_L002_R2_001.gz']}

I want to convert the dictionaries above into a dataframe. I tried the following:

df = pd.DataFrame([dictionary_col2, dictionary_col3, dictionary_col4])

The resulting dataframe df looks like this:

                ASP MOB YIP
0   [1, 2]  [1, 2]  [1, 2]
1   [ASP_L001_R1_001.gz, ASP_L002_R1_001.gz]    [MOB_L001_R1_001.gz, MOB_L002_R1_001.gz]    [YIP_L001_R1_001.gz, YIP_L002_R1_001.gz]
2   [ASP_L001_R2_001.gz, ASP_L002_R2_001.gz]    [MOB_L001_R2_001.gz, MOB_L002_R2_001.gz]    [YIP_L001_R2_001.gz, YIP_L002_R2_001.gz]

My goal is a dataframe with the following columns:

    col1  col2 col3              col4 
    MOB   1   MOB_L001_R1_001.gz MOB_L001_R2_001.gz      
    MOB   2   MOB_L002_R1_001.gz MOB_L002_R2_001.gz 
    ASP   1   ASP_L001_R1_001.gz ASP_L001_R2_001.gz 
    ASP   2   ASP_L002_R1_001.gz ASP_L002_R2_001.gz
    YIP   1   YIP_L001_R1_001.gz YIP_L001_R2_001.gz
    YIP   2   YIP_L002_R1_001.gz YIP_L002_R2_001.gz

Any help or suggestions are appreciated!

Tags: python pandas dictionary


【Solution 1】:

With pandas 0.25.0 or later, you can combine concat with explode:

pd.concat([pd.Series(x).explode() for x in [dictionary_col2, dictionary_col3, dictionary_col4]], axis=1)
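
As a fuller sketch of this approach, the snippet below applies it to the three dictionaries from the question and labels the columns. The `columns` mapping and the `col1` to `col4` labels are illustrative choices taken from the desired output in the question, not part of the original answer. `Series.explode` requires pandas 0.25.0 or later.

```python
import pandas as pd

dictionary_col2 = {'MOB': [1, 2], 'ASP': [1, 2], 'YIP': [1, 2]}
dictionary_col3 = {'MOB': ['MOB_L001_R1_001.gz', 'MOB_L002_R1_001.gz'],
                   'ASP': ['ASP_L001_R1_001.gz', 'ASP_L002_R1_001.gz'],
                   'YIP': ['YIP_L001_R1_001.gz', 'YIP_L002_R1_001.gz']}
dictionary_col4 = {'MOB': ['MOB_L001_R2_001.gz', 'MOB_L002_R2_001.gz'],
                   'ASP': ['ASP_L001_R2_001.gz', 'ASP_L002_R2_001.gz'],
                   'YIP': ['YIP_L001_R2_001.gz', 'YIP_L002_R2_001.gz']}

# Hypothetical mapping from target column name to source dictionary.
columns = {'col2': dictionary_col2,
           'col3': dictionary_col3,
           'col4': dictionary_col4}

# Each dict becomes a Series of lists keyed by 'MOB', 'ASP', 'YIP';
# explode() gives every list element its own row and repeats the key.
exploded = [pd.Series(d, name=name).explode() for name, d in columns.items()]

# The exploded Series share the same index, so they line up side by side.
# The index (the dictionary keys) is then moved into a column named 'col1'.
df = (pd.concat(exploded, axis=1)
        .rename_axis('col1')
        .reset_index())

print(df)
```

Note that `explode` leaves `col2` with object dtype; `df['col2'].astype(int)` would convert it if integer values are needed.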


【Solution 2】:
    pd.DataFrame({'col2': pd.DataFrame(dictionary_col2).unstack(),
                  'col3': pd.DataFrame(dictionary_col3).unstack(),
                  'col4': pd.DataFrame(dictionary_col4).unstack()}).reset_index(level=0)


Returns:

      level_0  col2                col3                col4
    0     ASP     1  ASP_L001_R1_001.gz  ASP_L001_R2_001.gz
    1     ASP     2  ASP_L002_R1_001.gz  ASP_L002_R2_001.gz
    0     MOB     1  MOB_L001_R1_001.gz  MOB_L001_R2_001.gz
    1     MOB     2  MOB_L002_R1_001.gz  MOB_L002_R2_001.gz
    0     YIP     1  YIP_L001_R1_001.gz  YIP_L001_R2_001.gz
    1     YIP     2  YIP_L002_R1_001.gz  YIP_L002_R2_001.gz
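
The `level_0` label and the repeating 0/1 index in the output above come from calling `reset_index(level=0)` on an unnamed MultiIndex. One possible variation, sketched below, names the index levels first so that the key column comes out as `col1` and the rows are renumbered. The `named` mapping and the `position` level name are hypothetical helpers introduced only for this illustration.

```python
import pandas as pd

dictionary_col2 = {'MOB': [1, 2], 'ASP': [1, 2], 'YIP': [1, 2]}
dictionary_col3 = {'MOB': ['MOB_L001_R1_001.gz', 'MOB_L002_R1_001.gz'],
                   'ASP': ['ASP_L001_R1_001.gz', 'ASP_L002_R1_001.gz'],
                   'YIP': ['YIP_L001_R1_001.gz', 'YIP_L002_R1_001.gz']}
dictionary_col4 = {'MOB': ['MOB_L001_R2_001.gz', 'MOB_L002_R2_001.gz'],
                   'ASP': ['ASP_L001_R2_001.gz', 'ASP_L002_R2_001.gz'],
                   'YIP': ['YIP_L001_R2_001.gz', 'YIP_L002_R2_001.gz']}

# Hypothetical mapping from target column name to source dictionary.
named = {'col2': dictionary_col2,
         'col3': dictionary_col3,
         'col4': dictionary_col4}

# DataFrame.unstack() on a frame with a flat index returns a Series whose
# MultiIndex is (dictionary key, position in the list).
stacked = pd.DataFrame({name: pd.DataFrame(d).unstack()
                        for name, d in named.items()})

df = (stacked
      .rename_axis(['col1', 'position'])  # name both index levels
      .reset_index(level='col1')          # move the key into a 'col1' column
      .reset_index(drop=True))            # discard 'position', renumber rows

print(df)
```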
    

【Comments】:

• This solution fits my problem best because it also generates the column names. Thanks!

【Solution 3】:
    dict_list = [dictionary_col2, dictionary_col3, dictionary_col4]
    
    df = pd.concat([pd.DataFrame.from_dict(x, orient = 'index').unstack() for x in dict_list], axis = 1)
    

Output:

    >>> df
    
            0   1                   2
    0   MOB 1   MOB_L001_R1_001.gz  MOB_L001_R2_001.gz
        ASP 1   ASP_L001_R1_001.gz  ASP_L001_R2_001.gz
        YIP 1   YIP_L001_R1_001.gz  YIP_L001_R2_001.gz
    1   MOB 2   MOB_L002_R1_001.gz  MOB_L002_R2_001.gz
        ASP 2   ASP_L002_R1_001.gz  ASP_L002_R2_001.gz
        YIP 2   YIP_L002_R1_001.gz  YIP_L002_R2_001.gz
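
The result above keeps a two-level index (list position, then key) and integer column labels. If a flat table like the one in the question is wanted, one way to get there is sketched below. The column names and the `position` level name are assumptions made for illustration, and the rows end up sorted alphabetically by key rather than in the dictionaries' original order.

```python
import pandas as pd

dictionary_col2 = {'MOB': [1, 2], 'ASP': [1, 2], 'YIP': [1, 2]}
dictionary_col3 = {'MOB': ['MOB_L001_R1_001.gz', 'MOB_L002_R1_001.gz'],
                   'ASP': ['ASP_L001_R1_001.gz', 'ASP_L002_R1_001.gz'],
                   'YIP': ['YIP_L001_R1_001.gz', 'YIP_L002_R1_001.gz']}
dictionary_col4 = {'MOB': ['MOB_L001_R2_001.gz', 'MOB_L002_R2_001.gz'],
                   'ASP': ['ASP_L001_R2_001.gz', 'ASP_L002_R2_001.gz'],
                   'YIP': ['YIP_L001_R2_001.gz', 'YIP_L002_R2_001.gz']}

dict_list = [dictionary_col2, dictionary_col3, dictionary_col4]

# Same expression as in the answer: index level 0 is the position in the
# list, index level 1 is the dictionary key.
wide = pd.concat(
    [pd.DataFrame.from_dict(d, orient='index').unstack() for d in dict_list],
    axis=1,
)

# Assumed labels, chosen to match the layout requested in the question.
wide.columns = ['col2', 'col3', 'col4']
wide.index.names = ['position', 'col1']

flat = (wide
        .reset_index(level='col1')      # key becomes the first column
        .sort_values(['col1', 'col2'])  # group the rows by key
        .reset_index(drop=True))        # discard 'position', renumber rows

print(flat)
```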
    


【Solution 4】:

IIUC, you can do this:

      pd.concat([pd.DataFrame(d).stack() for d in (dictionary_col2, dictionary_col3, dictionary_col4)], axis=1)


Output:

             0                   1                   2
      0 MOB  1  MOB_L001_R1_001.gz  MOB_L001_R2_001.gz
        ASP  1  ASP_L001_R1_001.gz  ASP_L001_R2_001.gz
        YIP  1  YIP_L001_R1_001.gz  YIP_L001_R2_001.gz
      1 MOB  2  MOB_L002_R1_001.gz  MOB_L002_R2_001.gz
        ASP  2  ASP_L002_R1_001.gz  ASP_L002_R2_001.gz
        YIP  2  YIP_L002_R1_001.gz  YIP_L002_R2_001.gz
      
