【问题标题】:Interleave 2 Dataframes on certain columns在某些列上交错 2 个数据帧
【发布时间】:2020-05-29 22:59:41
【问题描述】:

我有 2 个数据框

df1:

StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed 
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000

df2:

StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort
AB11,RU15,P1,AJ11,RU25,P2
AB12,RU18,P2,AB11,RU35,P2
AB13,RU19,P3,AB11,RU40,P4

我想交错两个数据帧,我尝试了一些选项,但似乎无法让它工作。我已经接近使用以下代码的功能,但它没有加入相应的列

import pandas as pd
from toolz import interleave

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1)

预期输出如下所示

StartLocation,StartDevice,StartPort,EndLocation,EndDevice,EndPort,LinkType,Speed 
DD1,Switch1,P1,AD1,Switch2,P2,MTP,1000
AB11,RU15,P1,AJ11,RU25,P2,nan,nan
DD2,Switch2,P3,AD2,Switch3,P2,MTP,1000
AB12,RU18,P2,AB11,RU35,P2,nan,nan
DD3,Switch3,P5,AD3,Switch4,P6,MTP,1000
AB13,RU19,P3,AB11,RU40,P4,nan,nan

我认为它应该很简单,但我找不到合适的语法。谁能给点意见?

提前感谢您的帮助!

【问题讨论】:

  • 请提供预期的输出

标签: python pandas interleave


【解决方案1】:

如果列名相同,唯一的区别是可以使用其中一个 DataFrame 中的一些新列名:

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
  StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType  \
0           DD1     Switch1        P1         AD1   Switch2      P2      MTP   
1          AB11        RU15        P1        AJ11      RU25      P2     None   
2           DD2     Switch2        P3         AD2   Switch3      P2      MTP   
3          AB12        RU18        P2        AB11      RU35      P2     None   
4           DD3     Switch3        P5         AD3   Switch4      P6      MTP   
5          AB13        RU19        P3        AB11      RU40      P4     None   

    Speed  
0  1000.0  
1     NaN  
2  1000.0  
3     NaN  
4  1000.0  
5     NaN  

适用于任何列名称的更通用解决方案是在之前使用 DataFrame.align 以防止每个 DataFrame 正确对齐列:

print (df1)
  EndDevice EndLocation EndPort LinkType  Speed StartDevice StartLocation  \
0   Switch2         AD1      P2      MTP   1000     Switch1           DD1   
1   Switch3         AD2      P2      MTP   1000     Switch2           DD2   
2   Switch4         AD3      P6      MTP   1000     Switch3           DD3   

  StartPort  
0        P1  
1        P3  
2        P5  

print (df2)
  EndDevice EndLocation EndPort  LinkType  Speed StartDevice StartLocation  \
0      RU25        AJ11      P2       NaN    NaN        RU15          AB11   
1      RU35        AB11      P2       NaN    NaN        RU18          AB12   
2      RU40        AB11      P4       NaN    NaN        RU19          AB13   

  StartPort  
0        P1  
1        P2  
2        P3  

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
  EndDevice EndLocation EndPort LinkType   Speed StartDevice StartLocation  \
0   Switch2         AD1      P2      MTP  1000.0     Switch1           DD1   
1      RU25        AJ11      P2      NaN     NaN        RU15          AB11   
2   Switch3         AD2      P2      MTP  1000.0     Switch2           DD2   
3      RU35        AB11      P2      NaN     NaN        RU18          AB12   
4   Switch4         AD3      P6      MTP  1000.0     Switch3           DD3   
5      RU40        AB11      P4      NaN     NaN        RU19          AB13   

  StartPort  
0        P1  
1        P1  
2        P3  
3        P2  
4        P5  
5        P3  

Index.unionDataFrame.reindex 的另一个想法:

cols = df1.columns.union(df2.columns, sort=False)

df1 = df1.reindex(cols, axis=1)
df2 = df2.reindex(cols, axis=1)
print (df1)
  StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType  \
0           DD1     Switch1        P1         AD1   Switch2      P2      MTP   
1           DD2     Switch2        P3         AD2   Switch3      P2      MTP   
2           DD3     Switch3        P5         AD3   Switch4      P6      MTP   

   Speed  
0   1000  
1   1000  
2   1000  

print (df2)
  StartLocation StartDevice StartPort EndLocation EndDevice EndPort  LinkType  \
0          AB11        RU15        P1        AJ11      RU25      P2       NaN   
1          AB12        RU18        P2        AB11      RU35      P2       NaN   
2          AB13        RU19        P3        AB11      RU40      P4       NaN   

   Speed  
0    NaN  
1    NaN  
2    NaN  

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=cols)
print (df3)
  StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType  \
0           DD1     Switch1        P1         AD1   Switch2      P2      MTP   
1          AB11        RU15        P1        AJ11      RU25      P2      NaN   
2           DD2     Switch2        P3         AD2   Switch3      P2      MTP   
3          AB12        RU18        P2        AB11      RU35      P2      NaN   
4           DD3     Switch3        P5         AD3   Switch4      P6      MTP   
5          AB13        RU19        P3        AB11      RU40      P4      NaN   

    Speed  
0  1000.0  
1     NaN  
2  1000.0  
3     NaN  
4  1000.0  
5     NaN  

【讨论】:

  • 问题已排序。非常感谢耶斯瑞尔!你是个传奇!
猜你喜欢
  • 1970-01-01
  • 2019-09-24
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2022-08-15
  • 2021-05-29
相关资源
最近更新 更多