【Posted】: 2015-03-02 07:26:26
【Question】:
I have a data frame like this:
n = c(rep("x", 3), rep("y", 5), rep("z", 2))
s = c("aa", "bb", "cc", "dd", "ee", "aa", "bb", "cc", "dd", "ff")
df = data.frame(n, s)
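For every pair of unique values in df$n (each value paired with itself and with every other value), I want to count how many rows match when the two subsets are joined on df$s. The following works, but it is slow, and my real data sets are large. Is there a faster way to do this?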
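library(dplyr)  # provides filter() and inner_join()

place <- unique(df$n)
# Dummy first row so that rbind() has something to append to; removed at the end.
df_answer <- data.frame(place1 = "test1", place2 = "test2", matches = 2)
for (i in place) {
  for (k in place) {
    # Join the rows for place i with the rows for place k on s and count the result.
    m1 <- inner_join(filter(df, n == i), filter(df, n == k), by = "s")
    m2 <- data.frame(place1 = i, place2 = k, matches = length(m1$s))
    df_answer <- rbind(df_answer, m2)
  }
}
df_answer <- filter(df_answer, place1 != "test1")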
【Discussion】:
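One possible way to avoid the nested loop, offered as a sketch rather than a definitive answer: build a contingency table of s against n with base R's `table()` and take its cross-product with `crossprod()`. Entry [i, k] of the result sums, over every value of s, the number of times it occurs under place i times the number of times it occurs under place k, which should equal the row count of the inner join for that pair. The names `counts`, `m`, and `df_fast` are made up for illustration and do not come from the question.

```r
# Same toy data as in the question.
n <- c(rep("x", 3), rep("y", 5), rep("z", 2))
s <- c("aa", "bb", "cc", "dd", "ee", "aa", "bb", "cc", "dd", "ff")
df <- data.frame(n, s)

# Rows are values of s, columns are values of n; each cell counts occurrences.
counts <- table(df$s, df$n)

# crossprod(counts) computes t(counts) %*% counts, so m[i, k] is the number
# of rows an inner join on s would return for places i and k.
m <- crossprod(counts)

# Reshape to long form; column names are chosen to mirror df_answer.
df_fast <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(df_fast) <- c("place1", "place2", "matches")
```

This replaces the repeated `filter()`, `inner_join()`, and `rbind()` calls with a single matrix product. Note that the table is dense, so with very many distinct values of s and n it may itself become large; whether this is fast enough for the real data is something to measure rather than assume.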