【Question title】: Python Pandas Merge Three Dataframes Based on Row Value
【Posted】: 2021-01-13 04:20:23
【Question description】:

Suppose I have a dataframe with this structure:

  T1P1_T0   Count T1P1_T1  Count.1 T1P1_T3  Count.2
0     one  1207.0    four     1936     one    644.0
1     two   816.0     two     1601   seven    414.0
2   three   712.0    five     1457     NaN      NaN
3     NaN     NaN     six     4564     NaN      NaN

The output I want looks like this:

     Element    T1P1_T0  T1P1_T1  T1P1_T3
0        one    1207      NaN    644.0
1        two     816   1601.0      NaN
2      three     712      NaN      NaN
3       four     NaN   1936.0      NaN
4       five     NaN   1457.0      NaN
5        six     NaN   4564.0      NaN
6      seven     NaN      NaN    414.0

What I tried was splitting the initial dataframe into three:

df1 = df.iloc[:,:2]
df2 = df.iloc[:,2:4]
df3 = df.iloc[:,4:]

and then merging the first two, followed by the third, using pd.merge in various ways.

For example:

result = pd.merge(df1, df2, right_on=df.iloc[:,0], left_on=df.iloc[:,0])

But the result is not what I want:

   key_0 T1P1_T0   Count T1P1_T1  Count.1
0    one     one  1207.0    four     1936
1    two     two   816.0     two     1601
2  three   three   712.0    five     1457
3    NaN     NaN     NaN     six     4564

I don't know how to specify the column of element names as the key for the merge operation.

Any suggestions?

Thanks

【Question discussion】:

    Tags: python pandas merge


    【Solution 1】:

    Let's use concat:

    out = pd.concat([x.set_index(x.columns[0]).iloc[:,0].dropna() for x in [df1,df2,df3]],keys=df.columns[::2],axis=1)
           T1P1_T0  T1P1_T1  T1P1_T3
    one     1207.0      NaN    644.0
    two      816.0   1601.0      NaN
    three    712.0      NaN      NaN
    four       NaN   1936.0      NaN
    five       NaN   1457.0      NaN
    six        NaN   4564.0      NaN
    seven      NaN      NaN    414.0
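
The one-liner above relies on df1, df2 and df3 from the question. As a self-contained sketch of the same idea, the snippet below rebuilds the sample frame (the literal values are copied from the question; the names `pieces`, `pair` and `counts` are only illustrative) and walks the columns two at a time, so it should also work if more label/count pairs are present:

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame shown in the question.
df = pd.DataFrame({
    "T1P1_T0": ["one", "two", "three", np.nan],
    "Count": [1207.0, 816.0, 712.0, np.nan],
    "T1P1_T1": ["four", "two", "five", "six"],
    "Count.1": [1936, 1601, 1457, 4564],
    "T1P1_T3": ["one", "seven", np.nan, np.nan],
    "Count.2": [644.0, 414.0, np.nan, np.nan],
})

# Walk the columns in (label column, count column) pairs.
pieces = []
for i in range(0, df.shape[1], 2):
    pair = df.iloc[:, i:i + 2].dropna()                   # drop the NaN padding rows
    counts = pair.set_index(pair.columns[0]).iloc[:, 0]   # Series indexed by element name
    pieces.append(counts)

# keys= names each output column after its label column;
# axis=1 aligns the Series on their index (the element names).
out = pd.concat(pieces, axis=1, keys=df.columns[::2])
print(out)
```

Because the three Series are aligned on their index, elements missing from a given pair simply show up as NaN in that column.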
    

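    【Discussion】:

      【Solution 2】:

      Starting from your data, you can do a bit more wrangling to get it into the required shape; also, instead of merging, try concatenating:

      As an aside, I wonder whether you could receive the data in a better format, so that you would not need this kind of wrangling, where errors can creep in.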

      df1 = df.iloc[:, :2].dropna()
      df1 = (
          df1.set_index(df1.iloc[:, 0].rename("Element"))
          .iloc[:, -1]
          .rename(df1.iloc[:, 0].name)
      )
      df2 = df.iloc[:, 2:4].dropna()
      df2 = (
          df2.set_index(df2.iloc[:, 0].rename("Element"))
          .iloc[:, -1]
          .rename(df2.iloc[:, 0].name)
      )
      df3 = df.iloc[:, 4:].dropna()
      df3 = (
          df3.set_index(df3.iloc[:, 0].rename("Element"))
          .iloc[:, -1]
          .rename(df3.iloc[:, 0].name)
      )
      
      df1
      Element
      one      1207.0
      two       816.0
      three     712.0
      Name: T1P1_T0, dtype: float64
      
      df2
      Element
      four    1936
      two     1601
      five    1457
      six     4564
      Name: T1P1_T1, dtype: int64
      
      df3
      Element
      one      644.0
      seven    414.0
      Name: T1P1_T3, dtype: float64
      

      Now, concatenate:

      pd.concat([df1, df2, df3], axis="columns")

               T1P1_T0  T1P1_T1  T1P1_T3
      Element
      one       1207.0      NaN    644.0
      two        816.0   1601.0      NaN
      three      712.0      NaN      NaN
      four         NaN   1936.0      NaN
      five         NaN   1457.0      NaN
      six          NaN   4564.0      NaN
      seven        NaN      NaN    414.0
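
If you would rather stay with pd.merge, as in the question, one possible approach is to give every piece a shared key column and do an outer merge on it. The attempt in the question passed the same array as both left_on and right_on, which effectively paired the rows by position instead of by element name. The sketch below is only an illustration under that reading: the key name "Element" is taken from the desired output, and `frames` and `merged` are illustrative names.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Rebuild the sample frame shown in the question.
df = pd.DataFrame({
    "T1P1_T0": ["one", "two", "three", np.nan],
    "Count": [1207.0, 816.0, 712.0, np.nan],
    "T1P1_T1": ["four", "two", "five", "six"],
    "Count.1": [1936, 1601, 1457, 4564],
    "T1P1_T3": ["one", "seven", np.nan, np.nan],
    "Count.2": [644.0, 414.0, np.nan, np.nan],
})

frames = []
for i in range(0, df.shape[1], 2):
    pair = df.iloc[:, i:i + 2].dropna()
    label = pair.columns[0]
    # Give every piece the same key column ("Element") and name the
    # count column after the original label column.
    frames.append(pair.set_axis(["Element", label], axis=1))

# how="outer" keeps elements that appear in only some of the pieces.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="Element", how="outer"),
    frames,
)
print(merged)
```

Note that the row order of an outer merge can differ between pandas versions, so it may not match the order shown in the concat result.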
      

      【Discussion】:
