R: generate a dataframe from lists by checking a reference set

【Question title】: R: generate a dataframe from lists by checking a reference set
【Posted】: 2011-05-12 09:36:36
【Question】:

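My colleague Samantha asked a question that was unclear, so I am asking it again here. She has a variable goterms that holds the names of all the data frames to be analyzed.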

goterms <- c('df1','df2','df3')

The interestedGO variable contains, for each goterm, a list of ILMN numbers. So the first list contains the ILMN codes of df1, and so on.

df1 <- c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446", "ILMN_1769383",
         "ILMN_1772387", "ILMN_1783910", "ILMN_1784863")
df2 <- c("ILMN_1651599", "ILMN_1652693", "ILMN_1652825", "ILMN_1653324",
         "ILMN_1655595", "ILMN_1656057", "ILMN_1659077", "ILMN_1659923",
         "ILMN_1659947", "ILMN_1662322", "ILMN_1662619", "ILMN_1664565",
         "ILMN_1665132", "ILMN_1665738", "ILMN_1665859")
df3 <- c("ILMN_1661695", "ILMN_1665132", "ILMN_1716446", "ILMN_1737314",
         "ILMN_1772387", "ILMN_1784863", "ILMN_1796094", "ILMN_1800317",
         "ILMN_1800512", "ILMN_1807074")
interestedGO <- list(df1,df2,df3)
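
Since goterms and interestedGO are only linked by position, one option is to attach the goterm names to the list. The following is a minimal sketch on shortened toy vectors (not the full data above), assuming that named access is wanted; `setNames` is base R.

```r
# Toy vectors standing in for the question's df1 and df2 (shortened for illustration)
df1 <- c("ILMN_1665132", "ILMN_1691487")
df2 <- c("ILMN_1651599", "ILMN_1665132")
goterms <- c("df1", "df2")

# Attach the goterm names so each element can be looked up by name
interestedGO <- setNames(list(df1, df2), goterms)

names(interestedGO)      # the goterm names
interestedGO[["df2"]]    # the ILMN codes of the second goterm
```

With a named list, functions such as `sapply` carry the names through to their result, which helps when the per-goterm results are combined later.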

xx2 is a comparison set. The xx2 variable contains a subset of all possible ILMN numbers.

xx2 <- c("ILMN_1691487", "ILMN_1716446", "ILMN_1769383","ILMN_1832921")

x is a reference set. The x variable contains all possible ILMN numbers.

x <- c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446", "ILMN_1769383", "ILMN_1772387",
       "ILMN_1783910", "ILMN_1784863","ILMN_1651599", "ILMN_1652693", "ILMN_1652825",
       "ILMN_1653324", "ILMN_1655595","ILMN_1656057", "ILMN_1659077", "ILMN_1659923",
       "ILMN_1659947", "ILMN_1662322","ILMN_1662619", "ILMN_1664565", "ILMN_1665132",
       "ILMN_1665738", "ILMN_1665859","ILMN_1661695", "ILMN_1665132", "ILMN_1716446",
       "ILMN_1737314", "ILMN_1772387","ILMN_1784863", "ILMN_1796094", "ILMN_1800317",
       "ILMN_1800512", "ILMN_1807074")

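With all these variables, the aim is to check, for every goterm, whether its ILMN codes are in the comparison set xx2. The match function is used for this check: everything without a match gets a 0, and matching values are replaced with a 1. To get an easy overview of all goterms, I would like to create a loop like the one below, which also checks each gene against the reference set x. The final result must be a single data.frame that compares the result for every goterm.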

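test <- list()
for (i in seq_along(goterms)) {
  # ILMN codes of the i-th goterm as a one-column data frame
  goilmn <- data.frame(x = interestedGO[[i]])
  # position in xx2, or 0 when there is no match
  resultILMN <- match(goilmn$x, xx2, nomatch = 0)
  resultILMN[resultILMN != 0] <- 1
  result <- cbind(goilmn, resultILMN)
  colnames(result) <- c('x', 'result')

  # add the reference codes missing from this goterm and set their result to 0
  zz <- merge(result, data.frame(x = x), all = TRUE)
  zz[is.na(zz)] <- 0
  # keep the merged per-goterm result
  test[[i]] <- zz
}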
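
The matching step described above can be looked at in isolation. This is a small sketch on a shortened toy vector (`go_codes` is a hypothetical name, not from the question) showing that `match(..., nomatch = 0)` followed by collapsing to 0/1 gives the same flags as the `%in%` operator.

```r
# Toy subset of the question's data (shortened for illustration)
go_codes <- c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446", "ILMN_1769383")
xx2 <- c("ILMN_1691487", "ILMN_1716446", "ILMN_1769383", "ILMN_1832921")

# match() returns the position in xx2, or 0 where there is no match
pos <- match(go_codes, xx2, nomatch = 0)
pos
# 0 1 2 3

# Collapse positions to 0/1, as the loop does
flag_match <- as.integer(pos != 0)

# %in% performs the same membership test directly
flag_in <- as.integer(go_codes %in% xx2)

identical(flag_match, flag_in)
# TRUE
```

Using `%in%` avoids the intermediate positions when only membership matters.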

The final output would look something like this:

1  ILMN_1651599      0  0  0
2  ILMN_1652693      0  0  0
3  ILMN_1652825      0  0  0
4  ILMN_1653324      0  0  0
5  ILMN_1655595      0  0  0
6  ILMN_1656057      0  0  0
7  ILMN_1659077      0  0  0
8  ILMN_1659923      0  0  0
9  ILMN_1659947      0  0  0
10 ILMN_1661695      0  0  0
11 ILMN_1662322      0  0  0
12 ILMN_1662619      0  0  0
13 ILMN_1664565      0  0  0
14 ILMN_1665132      0  0  0
15 ILMN_1665132      0  0  0
16 ILMN_1665132      0  0  0
17 ILMN_1665738      0  0  0
18 ILMN_1665859      0  0  0
19 ILMN_1691487      0  0  1
20 ILMN_1716446      1  0  1
21 ILMN_1716446      1  0  1
22 ILMN_1737314      0  0  0
23 ILMN_1769383      0  0  1
24 ILMN_1772387      0  0  0
25 ILMN_1772387      0  0  0
26 ILMN_1783910      0  0  0
27 ILMN_1784863      0  0  0
28 ILMN_1784863      0  0  0
29 ILMN_1796094      0  0  0
30 ILMN_1800317      0  0  0
31 ILMN_1800512      0  0  0
32 ILMN_1807074      0  0  0
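
The repeated rows in this output (for example ILMN_1665132 three times) follow from x itself: as defined above it is the three goterm vectors concatenated, so a code shared by several goterms occurs more than once. A minimal sketch on a shortened toy vector (`x_demo` is a hypothetical stand-in for x) of how the repeats could be inspected or removed, if one row per code is preferred:

```r
# Toy reference vector with a repeated code (shortened; stands in for x)
x_demo <- c("ILMN_1665132", "ILMN_1691487", "ILMN_1665132", "ILMN_1716446")

# Codes that occur more than once
repeated <- unique(x_demo[duplicated(x_demo)])
repeated
# "ILMN_1665132"

# De-duplicated reference set, one entry per code
x_unique <- unique(x_demo)
length(x_unique)
# 3
```

Whether to de-duplicate depends on whether the repeated rows carry meaning for the analysis; the question does not say.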

Can anyone help me with this? Thanks!

【Question discussion】:

Tags: list r loops dataframe

【Solution 1】:

Does this work for you?

data.frame(code = x, sapply(interestedGO, function(curdf) {
  # 1 when the code is in both xx2 and the current goterm, 0 otherwise
  ifelse(x %in% xx2, x %in% curdf, 0)
}))
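
As a variation on the same idea, here is a sketch that names the result columns after the goterms and sorts the rows by code. It runs on shortened toy data rather than the full vectors from the question, and the use of `vapply`, `setNames` and the name `overview` are choices made for this illustration, not part of the answer above.

```r
# Toy subset of the question's data (shortened for illustration)
goterms <- c("df1", "df2")
interestedGO <- list(
  c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446"),
  c("ILMN_1651599", "ILMN_1665132")
)
xx2 <- c("ILMN_1691487", "ILMN_1716446", "ILMN_1832921")
x <- c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446", "ILMN_1651599")

# One 0/1 column per goterm: 1 only when the code is in both xx2 and that goterm
flags <- vapply(
  setNames(interestedGO, goterms),
  function(codes) as.integer(x %in% xx2 & x %in% codes),
  integer(length(x))
)

# Combine with the codes and sort by code, as in the expected output
overview <- data.frame(code = x, flags, stringsAsFactors = FALSE)
overview <- overview[order(overview$code), ]
overview
```

`vapply` is used here instead of `sapply` only so that the column type and length are checked explicitly; the logic is the same membership test.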

【Discussion】:

+1 Nice. I was working on a similar approach, but your solution is very compact.