Simplify conditional table loop without matrix notation in R: answers

【Question title】: Simplify conditional table loop without matrix notation in R
【Posted】: 2015-05-08 03:13:23
【Question description】:

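Using the example below, I would like to know whether there is a more efficient package or function for conditionally counting and tabulating matching string elements, for example the data.table package, the dplyr package, or something like lapply()?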

produce = c("apple", "blueberry", "blueberry", "corn",
            "horseradish", "rutabega", "rutabega", "tomato") # Long list

veggies = c("carrot", "corn", "horseradish", "rutabega") # Short list

basket = matrix(rep(0, length(unique(veggies))*length(unique(produce)) ), nrow = length(unique(veggies)),  
                ncol = length(unique(produce)) )

rownames(basket) <- unique(veggies)
colnames(basket) <- unique(produce)

basket

Output:

#               apple blueberry corn horseradish rutabega tomato
# carrot          0         0    0           0        0      0
# corn            0         0    0           0        0      0
# horseradish     0         0    0           0        0      0
# rutabega        0         0    0           0        0      0

Find the counts of elements shared between the two lists:

for(i in 1:length(veggies)) {

  counter = NULL

  for (j in 1:length(produce)){ 

    if(veggies[i] ==  produce[j]){ 

      basket[i, which( colnames(basket) == produce[j] ) ] <- basket[i, 
                             which( colnames(basket) == produce[j] ) ] + 1

    }

  }

}

basket

The result I am after, ideally from a faster or more elegant method:

#               apple blueberry corn horseradish rutabega tomato
# carrot          0         0    0           0        0      0
# corn            0         0    1           0        0      0
# horseradish     0         0    0           1        0      0
# rutabega        0         0    0           0        2      0

【Comments on the question】:

This may be a good reference for the question.

Tags: r data.table dplyr lapply

【Solution 1】:

Using data.table:

library(data.table)
dcast(data.table(produce), produce~produce)[veggies]

#       produce apple blueberry corn horseradish rutabega tomato
#1:      carrot    NA        NA   NA          NA       NA     NA
#2:        corn     0         0    1           0        0      0
#3: horseradish     0         0    0           1        0      0
#4:    rutabega     0         0    0           0        2      0

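【Discussion】:

Fast and much leaner! How do you get around the NAs here?
carrot is not in produce. You can replace the NAs with 0 using df[is.na(df)] <- 0
It looks like the dcast() function comes from the reshape2 package.
data.table v1.9.5 (installed from GitHub) has its own built-in dcast function.
@Frank Thanks for the pointer. What is the data.table way of replacing NA with 0?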
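
The NA replacement discussed in these comments can be sketched as follows. This is not from the original thread beyond the df[is.na(df)] <- 0 suggestion: `res` is a hand-built stand-in for part of the dcast() result (an assumption, with only three count columns), and the set() loop is one commonly used data.table idiom for updating by reference, offered as a possible answer to the last comment rather than the definitive one. The data.table part is guarded so the snippet still runs when the package is not installed.

```r
# Hypothetical stand-in for part of the dcast() result above (built by hand).
res <- data.frame(
  produce  = c("carrot", "corn", "horseradish", "rutabega"),
  apple    = c(NA, 0L, 0L, 0L),
  corn     = c(NA, 1L, 0L, 0L),
  rutabega = c(NA, 0L, 0L, 2L),
  stringsAsFactors = FALSE
)

# Base R, as suggested in the comments: overwrite every NA with 0.
df <- res
df[is.na(df)] <- 0

# data.table: fill the NAs by reference, one count column at a time.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(res)
  for (col in setdiff(names(dt), "produce")) {
    data.table::set(dt, i = which(is.na(dt[[col]])), j = col, value = 0L)
  }
}
```

The set() version avoids copying the whole table, which matters mostly when the table is large; for a small result like this one the base R line is just as good.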

【Solution 2】:

The least ugly solution I can think of in base R:

newprod <- factor(produce, levels=unique(c(produce,veggies)))
table(newprod,newprod)[veggies,]

#             newprod
#newprod       apple blueberry corn horseradish rutabega tomato carrot
#  carrot          0         0    0           0        0      0      0
#  corn            0         0    1           0        0      0      0
#  horseradish     0         0    0           1        0      0      0
#  rutabega        0         0    0           0        2      0      0

Or all in one ugly line:

do.call(table, replicate(2,factor(produce, levels=unique(c(produce,veggies))),simplify=FALSE))[veggies,]
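
A variation on the same idea, not taken from the original answer: giving the row factor only the veggies levels and the column factor only the produce levels should reproduce the layout requested in the question, without the extra carrot column. The names `rows` and `cols` are illustrative. Elements of produce that are not in veggies become NA in `rows`, and table() leaves NA values out by default.

```r
produce <- c("apple", "blueberry", "blueberry", "corn",
             "horseradish", "rutabega", "rutabega", "tomato")
veggies <- c("carrot", "corn", "horseradish", "rutabega")

# Rows are restricted to the short list, columns to the long list.
rows <- factor(produce, levels = unique(veggies))
cols <- factor(produce, levels = unique(produce))

# Cross-tabulate; pairs where the row factor is NA are not counted.
basket <- table(rows, cols)
basket
```

Because both factors are built from the same vector, only the cells where the row and column labels are equal can be non-zero, which matches what the nested loop in the question computes.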

【Discussion】: