【Question title】: pandas dataframes merging rows into columns
【Posted】: 2019-07-02 04:14:35
【Question description】:

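I have two pandas DataFrames (df1, df2) and I am trying to extract data from them to build a third DataFrame (df3).

df1 has two columns: an id column and a column containing the names of columns in the second DataFrame (df2).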

df1 looks like:
===============
id1      name
---      ----
1        df2_column1_name
5        df2_column2_name
33       df2_column3_name
...
... and so on

df2 looks like:
===============
id2  df2_column1_name   df2_column2_name   df2_column3_name .... and so on
---  ----------------   ----------------   ----------------
12   Jimmy              male               25               .... 
16   Becky              female             30               ....
75   Mike               male               80               ....
....
.... and so on


I am trying to create df3 to look like:
=======================================
column1  Column2  Column3
-------  -------  -------
1        12       Jimmy    
5        12       male 
33       12       25
.
.
1        16       Becky
5        16       female
33       16       30
.
.
1        75       Mike
5        75       male
33       75       80
.
.
.

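The DataFrames can be very large. If possible, I am looking for the most efficient way to do this without a double loop. Please suggest the best approach. Thanks.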

【Question comments】:

    Tags: python pandas


    【Solution 1】:

    stack and merge get you most of the way:

    In [11]: df2.set_index("id2").stack().reset_index(name='value')
    Out[11]:
       id2           level_1   value
    0   12  df2_column1_name   Jimmy
    1   12  df2_column2_name    male
    2   12  df2_column3_name      25
    3   16  df2_column1_name   Becky
    4   16  df2_column2_name  female
    5   16  df2_column3_name      30
    6   75  df2_column1_name    Mike
    7   75  df2_column2_name    male
    8   75  df2_column3_name      80
    
    In [12]: df2.set_index("id2").stack().reset_index(name='value').merge(df1, right_on="name", left_on="level_1")
    Out[12]:
       id2           level_1   value  id1              name
    0   12  df2_column1_name   Jimmy    1  df2_column1_name
    1   16  df2_column1_name   Becky    1  df2_column1_name
    2   75  df2_column1_name    Mike    1  df2_column1_name
    3   12  df2_column2_name    male    5  df2_column2_name
    4   16  df2_column2_name  female    5  df2_column2_name
    5   75  df2_column2_name    male    5  df2_column2_name
    6   12  df2_column3_name      25   33  df2_column3_name
    7   16  df2_column3_name      30   33  df2_column3_name
    8   75  df2_column3_name      80   33  df2_column3_name
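
To try the two steps above on the sample data, here is a minimal self-contained sketch. The frames are reconstructed from the question's tables, assuming the three value columns are named df2_column1_name through df2_column3_name and that ids 1, 5 and 33 map to them in that order. The names `long_df` and `merged` are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed shape; real data is larger).
df1 = pd.DataFrame({
    "id1": [1, 5, 33],
    "name": ["df2_column1_name", "df2_column2_name", "df2_column3_name"],
})
df2 = pd.DataFrame({
    "id2": [12, 16, 75],
    "df2_column1_name": ["Jimmy", "Becky", "Mike"],
    "df2_column2_name": ["male", "female", "male"],
    "df2_column3_name": [25, 30, 80],
})

# Step 1: stack moves every non-id column into its own row.
# The former column names end up in a column called "level_1".
long_df = df2.set_index("id2").stack().reset_index(name="value")

# Step 2: look up the id1 that belongs to each former column name.
merged = long_df.merge(df1, left_on="level_1", right_on="name")

print(merged)
```

Row order after the merge may differ between pandas versions, so it is safer not to rely on it and to sort explicitly in the final step.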
    

    Finally, select just the columns you want and sort:

    In [13]: df2.set_index("id2").stack().reset_index(name='value').merge(df1, right_on="name", left_on="level_1")[["id1", "id2", "value"]].sort_values("id2")
    Out[13]:
       id1  id2   value
    0    1   12   Jimmy
    3    5   12    male
    6   33   12      25
    1    1   16   Becky
    4    5   16  female
    7   33   16      30
    2    1   75    Mike
    5    5   75    male
    8   33   75      80
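
The same result can also be sketched with `melt` in place of `set_index` plus `stack`. This is an alternative to the answer's approach, not the answer's own code. The helper name `rows_to_columns` is hypothetical, and the sample frames are assumed from the question's tables. Sorting on both `id2` and `id1` makes the row order deterministic instead of depending on how the merge happens to order its output.

```python
import pandas as pd

def rows_to_columns(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one output row per (id1, id2) pair."""
    # melt turns each non-id column of df2 into (name, value) rows.
    long_df = df2.melt(id_vars="id2", var_name="name", value_name="value")
    # Attach the id1 registered in df1 for each column name.
    out = long_df.merge(df1, on="name", how="inner")
    # Keep the three requested columns, ordered by id2 and then id1.
    out = out[["id1", "id2", "value"]].sort_values(["id2", "id1"])
    return out.reset_index(drop=True)

# Sample frames rebuilt from the question (assumed shape).
df1 = pd.DataFrame({
    "id1": [1, 5, 33],
    "name": ["df2_column1_name", "df2_column2_name", "df2_column3_name"],
})
df2 = pd.DataFrame({
    "id2": [12, 16, 75],
    "df2_column1_name": ["Jimmy", "Becky", "Mike"],
    "df2_column2_name": ["male", "female", "male"],
    "df2_column3_name": [25, 30, 80],
})

df3 = rows_to_columns(df1, df2)
print(df3)
```

Because the inner merge only keeps names present in both frames, columns of df2 that df1 does not list are dropped from the result.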
    

    【Comments】:

    • Thanks. With a few small changes I was able to make it do what I needed.