【Question title】: Python merge where unmatched records also need to be kept
【Posted】: 2023-04-11 10:51:01
【Question】:

I have two files.

input.csv

11/13/2020 07:41:09 TREE count1: id1 green001
11/13/2020 07:43:09 TREE count1: id1 black001
11/13/2020 07:45:09 TREE count1: id2 black001
11/13/2020 07:45:09 PLAN count1: id3 green002
11/13/2020 07:45:09 PLAN count1: id4 green004

lookup.csv

ID,item,message
id1,item1,message 1
id2,item2,message 2
id3,item3,message 3

I am trying to merge these two files and expect the output below. Expected output:

Time,Type,counts,id,item,message,colour
11/13/2020 07:41:09,TREE,count1,id1,item1,message 1,green001
11/13/2020 07:43:09,TREE,count1,id1,item1,message 1,black001
11/13/2020 07:45:09,TREE,count1,id2,item2,message 2,black001
11/13/2020 07:45:09,PLAN,count1,id3,item3,message 3,green002
11/13/2020 07:45:09,PLAN,count1,id4,     ,         ,green004

I can get the merge to work when the ID value exists in the lookup file. Code:

import pandas as pd

# read input and remove spurious : at end of count
input = pd.read_csv("input.csv", sep=' ',
         names=["date","time", "tree","count","ID", "info"])
input["count"] = input["count"].apply(lambda s:s[:-1])

# read lookup and merge
lookup = pd.read_csv("lookup.csv")
merged = input.merge(lookup, on="ID")

# collapse time and date to single column
merged["time"] = merged["date"] + " " + merged["time"]
del merged["date"]

# output
print(merged)
merged.to_csv("testme.csv", index=False)

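The code works fine if every ID value in input.csv also exists in lookup.csv, but it fails to produce the expected output when an ID value is missing from lookup.csv.

Any suggestions would be helpful.

【Comments】:

    Tags: python pandas python-2.7 dataframe merge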


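    【Solution 1】:

    Try changing the merge's `how` argument from 'inner' to 'left' or 'outer'. The default is 'inner', which keeps only the IDs that appear in both DataFrames. You can also set the `indicator` flag, which tells you which DataFrame each record came from.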

    merged = input.merge(lookup, on="ID", how='left')
    
    merged = input.merge(lookup, on="ID", how='outer', indicator=True)
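
The following is a minimal sketch of the left merge applied to the sample data from the question. The names `records`, `unmatched`, and `result`, and the use of `io.StringIO` in place of the two CSV files, are illustrative assumptions and not part of the original answer.

```python
import io
import pandas as pd

# Sample data copied from the question; StringIO stands in for the files on disk.
input_text = """\
11/13/2020 07:41:09 TREE count1: id1 green001
11/13/2020 07:43:09 TREE count1: id1 black001
11/13/2020 07:45:09 TREE count1: id2 black001
11/13/2020 07:45:09 PLAN count1: id3 green002
11/13/2020 07:45:09 PLAN count1: id4 green004
"""
lookup_text = """\
ID,item,message
id1,item1,message 1
id2,item2,message 2
id3,item3,message 3
"""

# Read the space-separated input and strip the trailing ':' from the count field.
records = pd.read_csv(io.StringIO(input_text), sep=" ",
                      names=["date", "time", "Type", "counts", "ID", "colour"])
records["counts"] = records["counts"].str.rstrip(":")
lookup = pd.read_csv(io.StringIO(lookup_text))

# how="left" keeps every row of `records`, matched or not;
# indicator=True adds a "_merge" column naming the source of each row.
merged = records.merge(lookup, on="ID", how="left", indicator=True)

# IDs that were present in the input but missing from the lookup.
unmatched = merged.loc[merged["_merge"] == "left_only", "ID"].tolist()

# Collapse date and time into one column, reorder, and blank out the missing values.
merged["Time"] = merged["date"] + " " + merged["time"]
result = merged[["Time", "Type", "counts", "ID", "item", "message", "colour"]].fillna("")

print(result.to_csv(index=False))
```

With `how="left"` the id4 row is kept and its `item` and `message` fields are left empty, while `unmatched` lists the IDs that had no entry in the lookup.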
    

    【Comments】:
