Python Pandas Merge Three Dataframes Based on Row Value: Answers

【Question title】: Python Pandas Merge Three Dataframes Based on Row Value
【Posted】: 2021-01-13 04:20:23
【Question description】:

Suppose I have a dataframe with this structure:

  T1P1_T0   Count T1P1_T1  Count.1 T1P1_T3  Count.2
0     one  1207.0    four     1936     one    644.0
1     two   816.0     two     1601   seven    414.0
2   three   712.0    five     1457     NaN      NaN
3     NaN     NaN     six     4564     NaN      NaN

The output I want looks like this:

     Element    T1P1_T0  T1P1_T1  T1P1_T3
0        one    1207      NaN    644.0
1        two     816   1601.0      NaN
2      three     712      NaN      NaN
3       four     NaN   1936.0      NaN
4       five     NaN   1457.0      NaN
5        six     NaN   4564.0      NaN
6      seven     NaN      NaN    414.0

What I tried was splitting the initial dataframe into three:

df1 = df.iloc[:,:2]
df2 = df.iloc[:,2:4]
df3 = df.iloc[:,4:]

I then tried to merge the first two, and afterwards the third, using pd.merge in different ways.

For example:

result = pd.merge(df1, df2, right_on=df.iloc[:,0], left_on=df.iloc[:,0])

But the result is not what I want:

   key_0 T1P1_T0   Count T1P1_T1  Count.1
0    one     one  1207.0    four     1936
1    two     two   816.0     two     1601
2  three   three   712.0    five     1457
3    NaN     NaN     NaN     six     4564

I don't know how to specify the column holding the element names as the key for the merge operation.

Any suggestions?

Thanks

【Comments】:

Tags: python pandas merge

【Solution 1】:

Let's use concat:

out = pd.concat([x.set_index(x.columns[0]).iloc[:,0].dropna() for x in [df1,df2,df3]],keys=df.columns[::2],axis=1)
       T1P1_T0  T1P1_T1  T1P1_T3
one     1207.0      NaN    644.0
two      816.0   1601.0      NaN
three    712.0      NaN      NaN
four       NaN   1936.0      NaN
five       NaN   1457.0      NaN
six        NaN   4564.0      NaN
seven      NaN      NaN    414.0
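
The one-liner above packs several steps together. As a minimal sketch, it can be unpacked as follows. The literal construction of `df` is an assumption (the post only shows the printed frame), and the loop over column pairs replaces the hand-made `df1`, `df2`, `df3` split from the question:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (assumed construction; the post
# only shows its printed form).
df = pd.DataFrame({
    "T1P1_T0": ["one", "two", "three", np.nan],
    "Count": [1207.0, 816.0, 712.0, np.nan],
    "T1P1_T1": ["four", "two", "five", "six"],
    "Count.1": [1936, 1601, 1457, 4564],
    "T1P1_T3": ["one", "seven", np.nan, np.nan],
    "Count.2": [644.0, 414.0, np.nan, np.nan],
})

pieces = []
for start in range(0, df.shape[1], 2):
    pair = df.iloc[:, start:start + 2]        # one (label, count) column pair
    counts = pair.set_index(pair.columns[0])  # element labels become the index
    counts = counts.iloc[:, 0].dropna()       # keep the counts, drop NaN padding rows
    pieces.append(counts)

# keys= names each output column after its label column
# (T1P1_T0, T1P1_T1, T1P1_T3); axis=1 aligns the pieces on their index.
out = pd.concat(pieces, keys=df.columns[::2], axis=1)
print(out)
```

Because `concat` aligns on the index, elements that are missing from a given pair simply show up as NaN in that column.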

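【Discussion】:

【Solution 2】:

Starting from your data, you can do a bit more wrangling to get it into the required form; also, instead of merging, try concatenating.

As an aside, I wonder whether you could receive the data in a better format, so that you don't have to do this kind of wrangling, where errors can creep in.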

df1 = df.iloc[:, :2].dropna()
df1 = (
    df1.set_index(df1.iloc[:, 0].rename("Element"))
    .iloc[:, -1]
    .rename(df1.iloc[:, 0].name)
)
df2 = df.iloc[:, 2:4].dropna()
df2 = (
    df2.set_index(df2.iloc[:, 0].rename("Element"))
    .iloc[:, -1]
    .rename(df2.iloc[:, 0].name)
)
df3 = df.iloc[:, 4:].dropna()
df3 = (
    df3.set_index(df3.iloc[:, 0].rename("Element"))
    .iloc[:, -1]
    .rename(df3.iloc[:, 0].name)
)

df1
Element
one      1207.0
two       816.0
three     712.0
Name: T1P1_T0, dtype: float64

df2
Element
four    1936
two     1601
five    1457
six     4564
Name: T1P1_T1, dtype: int64

df3
Element
one      644.0
seven    414.0
Name: T1P1_T3, dtype: float64

Now, concatenate:

pd.concat([df1, df2, df3], axis="columns")



         T1P1_T0  T1P1_T1  T1P1_T3
Element
one       1207.0      NaN    644.0
two        816.0   1601.0      NaN
three      712.0      NaN      NaN
four         NaN   1936.0      NaN
five         NaN   1457.0      NaN
six          NaN   4564.0      NaN
seven        NaN      NaN    414.0
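
The three near-identical blocks above could be folded into a loop, and a final `reset_index()` would turn the `Element` index into an ordinary column, as in the output the question asks for. A rough sketch under those assumptions: `pair_to_series` is a hypothetical helper name, and the construction of `df` is assumed from the printed frame:

```python
import numpy as np
import pandas as pd

# Same frame as in the question (assumed construction).
df = pd.DataFrame({
    "T1P1_T0": ["one", "two", "three", np.nan],
    "Count": [1207.0, 816.0, 712.0, np.nan],
    "T1P1_T1": ["four", "two", "five", "six"],
    "Count.1": [1936, 1601, 1457, 4564],
    "T1P1_T3": ["one", "seven", np.nan, np.nan],
    "Count.2": [644.0, 414.0, np.nan, np.nan],
})


def pair_to_series(pair: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: turn one (label, count) column pair into a
    Series indexed by "Element" and named after the label column."""
    pair = pair.dropna()
    labels = pair.iloc[:, 0]
    return (
        pair.set_index(labels.rename("Element"))  # labels become the index
        .iloc[:, -1]                              # keep only the count column
        .rename(labels.name)                      # e.g. "T1P1_T0"
    )


# Walk the columns two at a time instead of slicing each pair by hand.
series = [pair_to_series(df.iloc[:, i:i + 2]) for i in range(0, df.shape[1], 2)]

# Align on the shared "Element" index, then move it into a regular column.
result = pd.concat(series, axis="columns").reset_index()
print(result)
```

This assumes the columns always come in (label, count) pairs, in that order; a frame laid out differently would need a different slicing rule.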

【Discussion】: