Match columns and lists: answers

【Question title】: Match columns and lists
【Posted】: 2014-05-31 08:40:32
【Question】:

Sorry about the title; I hope it is not too misleading. I have the following data frame, df1:

 id1     clas1    clas2    clas3
 512     ns       abx      NA
 512     ns       or       NA
 512     abx      dm       sup
 845     or       NA       NA
 1265    dd       ivf      NA
 1265    ns       ivf      pts
 9453    col      ns       ivf
 9453    abx      ns       or     
 95635   ns       abx      or

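Then I have df2, which holds the following information. It comes from another data set and has a different number of rows from df1; some of the values in df1$id1 also appear in df2$id2, and vice versa.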

 id2      clas0
 102      ns
 512      ns
 915      ns
 1265     ns
 9453     ns
 10485    ns
 95639    ns
 100348   ns

What I want to do is count how many id1 values have, in any of the clas columns, the same value that clas0 holds for the matching id2 (i.e. "ns").

So I tried this:

 x<-as.numeric(levels(factor(df2$id2)))
 clas<-ls()
 for(i in 1:x){
   for(j in 1:length(df1$id1)){
     if(df1$id1==i){clas[[i]]=append(clas[[i]],c(df1$clas1[j],df1$clas2[j],df1$clas3[j]))}
   }
 }

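What I am trying to do here is build, for each repeated id1, a list containing all of its clas1, clas2 and clas3 values, so that later I can check whether the value in clas0 appears somewhere in that list. But I keep getting the following warning: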

    In if (id1$id1 == i) { ... :
 the condition has length > 1 and only the first element will be used

I am stuck. Could someone point me in the right direction? Many thanks, Marco

【Comments】:

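I don't quite understand what you are trying to do, but the warning probably comes from if(df1$id1==i). if does not accept a vector as its condition; it expects a single TRUE/FALSE value. You are comparing the whole df1$id1 vector with i, which returns a TRUE/FALSE for every element of df1$id1 rather than a single TRUE/FALSE.
Could you give an example of what you want the output to look like?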
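
To illustrate the point about if made in the first comment, here is a minimal sketch. It assumes a small made-up vector ids (not the asker's data) and shows that == returns one logical per element, and how any() or which() can turn that into something if or subsetting can use. Note that in R 4.2.0 and later a condition of length greater than one is an error rather than a warning.

```r
# Hypothetical id vector, for illustration only (not the question's data)
ids <- c(512, 512, 845, 1265)
i <- 512

# `==` is vectorised: it returns one TRUE/FALSE per element of ids
matches <- ids == i
print(matches)

# `if` needs a single logical, so collapse the vector first
if (any(matches)) {
  message("id ", i, " occurs ", sum(matches), " time(s)")
}

# Positions of the matching elements, usable to subset a data frame
hits <- which(matches)
```

With the positions in hits, a loop over row numbers is usually unnecessary: something like df1[hits, ] would select all matching rows at once.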

Tags: r

【Solution 1】:

"What I want to do is count how many id1 values have, in any of the clas columns, the same value that clas0 holds for the matching id2 (i.e. "ns")."

df1 <- read.table(text="id1     clas1    clas2    clas3
 512     ns       abx      NA
 512     ns       or       NA
 512     abx      dm       sup
 845     or       NA       NA
 1265    dd       ivf      NA
 1265    ns       ivf      pts
 9453    col      ns       ivf
 9453    abx      ns       or     
 95635   ns       abx      or", header=TRUE)

df2 <- read.table(text=" id2      clas0
 102      ns
 512      ns
 915      ns
 1265     ns
 9453     ns
 10485    ns
 95639    ns
 100348   ns", header=TRUE)

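# Inner join: keep only the rows of df1 whose id1 also appears in df2$id2
df <- merge(df1, df2, by.x="id1", by.y="id2")
# Count the merged rows in which clas0 equals at least one of clas1, clas2, clas3
sum(apply(df$clas0 == df[, c("clas1", "clas2", "clas3")], 1, any, na.rm = TRUE))
#[1] 5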
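
The count above is per row, so an id that matches on several rows is counted several times. If the aim is the number of distinct ids, the following sketch may be closer to it. It rebuilds the question's data so that it runs on its own; the names row_hit, n_rows, n_ids and clas_by_id are introduced here for illustration and are not part of the original answer. It also builds the per-id list of clas values that the question's loop was aiming for.

```r
# Same data as in the question, rebuilt so the block runs on its own
df1 <- data.frame(
  id1   = c(512, 512, 512, 845, 1265, 1265, 9453, 9453, 95635),
  clas1 = c("ns", "ns", "abx", "or", "dd", "ns", "col", "abx", "ns"),
  clas2 = c("abx", "or", "dm", NA, "ivf", "ivf", "ns", "ns", "abx"),
  clas3 = c(NA, NA, "sup", NA, NA, "pts", "ivf", "or", "or"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id2   = c(102, 512, 915, 1265, 9453, 10485, 95639, 100348),
  clas0 = "ns",
  stringsAsFactors = FALSE
)

df <- merge(df1, df2, by.x = "id1", by.y = "id2")

# TRUE for each merged row where clas0 equals clas1, clas2 or clas3
row_hit <- rowSums(df[, c("clas1", "clas2", "clas3")] == df$clas0,
                   na.rm = TRUE) > 0

# Number of matching rows (what the answer above counts)
n_rows <- sum(row_hit)

# Number of distinct ids with at least one matching row
n_ids <- length(unique(df$id1[row_hit]))

# Per-id collection of all non-missing clas values, as the question intended
clas_by_id <- lapply(
  split(df1[, c("clas1", "clas2", "clas3")], df1$id1),
  function(d) unique(na.omit(unlist(d)))
)
```

On the question's data this gives 5 matching rows and 3 distinct ids (512, 1265 and 9453). A lookup such as "ns" %in% clas_by_id[["512"]] then checks a single id against its collected values.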

【Comments】: