【Posted】: 2020-02-12 18:32:13
【Question】:
Here are my two data frames (DF1 first, then DF2):
structure(list(Author = c("Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Robinson et al.",
"Robinson et al.", "Robinson et al.", "Robinson et al.", "Louca et al.",
"Aquiloni, L., et al.", "Aquiloni, L., et al.", "Barbaresi, S., et al.",
"Barbaresi, S., et al.", "Barbaresi, S., et al.", "Gherardi, F., et al.",
"Gherardi, F., et al.", "Gherardi, F., et al.", "Loughman et al.",
"Loughman et al.", "Hall et al.", "Holsman et al. ", "Holsman et al. ",
"Smith B.D et al.", "Smith B.D et al."), Year = c(2006L, 2006L,
2006L, 2002L, 2002L, 2002L, 2002L, 2004L, 2004L, 2004L, 2000L,
2000L, 2000L, 2000L, 2014L, 2005L, 2005L, 2004L, 2004L, 2004L,
2002L, 2002L, 2002L, 2013L, 2013L, 1991L, 2006L, 2006L, 1991L,
1991L), Purpose = c("Invasive/Endangered Species", "Movement Metrics",
"Movement Metrics", "Invasive/Endangered Species", "Movement Metrics",
"Movement Metrics", "Movement Metrics", "Invasive/Endangered Species",
"Movement Metrics", "Movement Metrics", "Movement Metrics", "Movement Metrics",
"Movement Metrics", "Invasive/Endangered Species", "Human Interaction",
"Invasive/Endangered Species", "Habitat Use", "Invasive/Endangered Species",
"Feeding/Behavior", "Movement Metrics", "Movement Metrics", "Invasive/Endangered Species",
"Feeding/Behavior", "Movement Metrics", "Habitat Use", "Movement Metrics",
"Habitat Use", "Movement Metrics", "Movement Metrics", "Habitat Use"
)), row.names = c(NA, 30L), class = "data.frame")
structure(list(Author = c("Aquiloni, L., et al.", "Aquiloni, L., et al.",
"Barbaresi, S., et al.", "Barbaresi, S., et al.", "Barbaresi, S., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Bubb, D. H., et al.", "Bubb, D. H., et al.", "Bubb, D. H., et al.",
"Gherardi, F., et al.", "Gherardi, F., et al.", "Gherardi, F., et al.",
"Hall et al.", "Holsman et al. ", "Holsman et al. ", "Louca et al.",
"Loughman et al.", "Loughman et al.", "Robinson et al.", "Robinson et al.",
"Smith B.D et al.", "Smith B.D et al."), Year = c(2005L, 2005L,
2004L, 2004L, 2004L, 2002L, 2002L, 2004L, 2004L, 2006L, 2006L,
2002L, 2002L, 2002L, 1991L, 2006L, 2006L, 2014L, 2013L, 2013L,
2000L, 2000L, 1991L, 1991L), Purpose = c("Habitat Use", "Invasive/Endangered Species",
"Feeding/Behavior", "Invasive/Endangered Species", "Movement Metrics",
"Invasive/Endangered Species", "Movement Metrics", "Invasive/Endangered Species",
"Movement Metrics", "Invasive/Endangered Species", "Movement Metrics",
"Feeding/Behavior", "Invasive/Endangered Species", "Movement Metrics",
"Movement Metrics", "Habitat Use", "Movement Metrics", "Human Interaction",
"Habitat Use", "Movement Metrics", "Invasive/Endangered Species",
"Movement Metrics", "Habitat Use", "Movement Metrics"), count = c(1L,
1L, 1L, 1L, 1L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 3L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-24L))
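The first data frame lists the author, year, and purpose of each study (a study can have more than one purpose). However, because of the way I created the data, it contains some duplicates (e.g. Robinson et al. 2000 lists "Movement Metrics" three times, and I only want it listed once).
I would use the duplicated or unique functions, but my original DF has additional columns whose values are not unique.
So I created a second data frame grouped by Author/Year/Purpose, so that every combination of the three variables has a count. Is there a way to say:
If DF2$count > 1, find the matching rows in DF1 and delete count - 1 of them.
An example:
"SomeFunction" identifies the rows in DF2 where count > 1.
"SomeFunction" takes the Author, Year, and Purpose columns from those DF2 rows and matches them against DF1.
"SomeFunction" deletes the duplicate rows, leaving one row for each Author/Year/Purpose combination.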
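The three steps above amount to keeping the first row of each Author/Year/Purpose combination, which can be done without the count table. What follows is a minimal sketch rather than a definitive answer. It assumes base R, and the `Notes` column is a hypothetical stand-in for the extra non-unique columns in the original DF. The names `df1`, `key_cols`, `is_repeat`, and `df1_dedup` are illustrative only.

```r
# Small stand-in for DF1. "Notes" is a hypothetical extra column that
# differs from row to row, like the other columns mentioned in the question.
df1 <- data.frame(
  Author  = c("Robinson et al.", "Robinson et al.", "Robinson et al.",
              "Robinson et al.", "Louca et al."),
  Year    = c(2000L, 2000L, 2000L, 2000L, 2014L),
  Purpose = c("Movement Metrics", "Movement Metrics", "Movement Metrics",
              "Invasive/Endangered Species", "Human Interaction"),
  Notes   = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Columns that define a duplicate.
key_cols <- c("Author", "Year", "Purpose")

# duplicated() applied to only the key columns flags every repeat after
# the first occurrence of each Author/Year/Purpose combination.
is_repeat <- duplicated(df1[, key_cols])

# Keep the first row of each combination; all other columns are retained.
df1_dedup <- df1[!is_repeat, ]
```

Because `duplicated()` only sees the key columns here, differing values in the remaining columns do not prevent a row from being treated as a repeat. Which of the repeated rows survives (the first one) is a design choice in this sketch; if a different row should be kept, the data would need to be sorted accordingly beforehand.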
【Comments】:
Tags: r duplicates