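【Posted】: 2026-01-31 23:00:01
【Question】:
I have two DataFrames. I want to subset df_1 based on df_2 so that the rows of the resulting DataFrame correspond to the rows of df_2. Here are two example DataFrames: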
import pandas as pd
df_1 = pd.DataFrame({
    "ID": ["Lemon", "Banana", "Apple", "Cherry", "Tomato", "Blueberry", "Avocado", "Lime"],
    "Color": ["Yellow", "Yellow", "Red", "Red", "Red", "Blue", "Green", "Green"]})
df_2 = pd.DataFrame({"Color": ["Red", "Blue", "Yellow", "Green", "Red", "Yellow"]})
My desired output is df_3, whose "Color" column is identical to the one in df_2:
df_3 = pd.DataFrame({
    "ID": ["Apple", "Blueberry", "Lemon", "Avocado", "Cherry", "Banana"],
    "Color": ["Red", "Blue", "Yellow", "Green", "Red", "Yellow"]})
When I merge df_1 and df_2, I get duplicate rows, because most rows in df_2 have multiple matches in df_1.
merged = df_2.merge(df_1, how="left", on="Color")
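Dropping duplicates works for "Yellow", because the ratio of its occurrences in df_2 to its candidates in df_1 is 2:2, but it does not work for "Red" or "Green", which have ratios of 2:3 and 1:2 respectively, leaving extra rows.
no_duplicates = merged.drop_duplicates(subset="ID")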
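Is there a way to subset df_1 so that the first occurrence of "Red" in df_2 picks up the first occurrence of "Red" in df_1, the second occurrence of "Red" in df_2 picks up the second occurrence of "Red" in df_1, and so on? I would rather not use a loop unless I have no other choice. Thanks.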
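To make the "n-th occurrence matches n-th occurrence" requirement concrete, here is a minimal sketch of one possible loop-free approach, not necessarily the only or best one. It assumes that numbering the rows within each color with groupby(...).cumcount() and merging on both the color and that counter captures the intended pairing. The helper column name "occurrence" is made up for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "ID": ["Lemon", "Banana", "Apple", "Cherry", "Tomato", "Blueberry", "Avocado", "Lime"],
    "Color": ["Yellow", "Yellow", "Red", "Red", "Red", "Blue", "Green", "Green"],
})
df_2 = pd.DataFrame({"Color": ["Red", "Blue", "Yellow", "Green", "Red", "Yellow"]})

# Number each row within its color group: 0 for the first "Red", 1 for the second, ...
# "occurrence" is a hypothetical helper column, not part of the original data.
left = df_2.assign(occurrence=df_2.groupby("Color").cumcount())
right = df_1.assign(occurrence=df_1.groupby("Color").cumcount())

# A left merge on (Color, occurrence) keeps df_2's row order and pairs the
# n-th occurrence of a color in df_2 with the n-th occurrence in df_1.
df_3 = (
    left.merge(right, how="left", on=["Color", "occurrence"])
        .drop(columns="occurrence")[["ID", "Color"]]
)
print(df_3)
```

With the sample data, this pairs the two "Red" rows of df_2 with Apple and Cherry and leaves Tomato and Lime unused. If df_2 contained more occurrences of a color than df_1, the left merge would leave ID as NaN for the surplus rows.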
【Discussion】:
Tags: python pandas merge subset