Answers: conditional calculation based on data in other columns in R

【Question title】: Conditional calculation based on data in other columns in R
【Posted】: 2013-09-09 09:00:49
【Question】:

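Newbie here: I have a data table with 3 columns of categorical values, and I want to add a fourth column whose values are computed row by row from the values in the first 3 columns. So far I have: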

tC <- textConnection("Visit1    Visit2  Visit3
yes no  no
yes no  yes
yes yes yes")
data1 <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)
data1["pattern"] <- NA

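Next I want to fill in the fourth column so that if the values of Visit1, Visit2 and Visit3 are, for example, "yes", "no" and "no", the NA in the pattern column for that row is replaced with "1". In other languages this would be a FOR loop with some IF statements. I have looked at the apply family but am still not sure of the best approach and syntax in R. Thoughts appreciated.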

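【Comments】:

What are the other conditions you want? Or is it just the number of "yes" values in each row?
What does the desired output look like for this example?
If we have provided an answer, it would be nice to get some feedback or confirmation.
My post was not very clear. Each row has 6 possible combinations of "yes" and "no" replies (yes yes yes, yes no yes, and so on). I want to create an additional column that holds a single value from 1 to 6 in each row, depending on the combination of yes and no present in that row.
There are actually 8 possible combinations, unless your data does not allow all of them.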

Tags: r

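【Solution 1】:

I'm not sure this is the most efficient way to solve this, but we can find the unique rows and then, for each row in the data.frame, find which unique row it matches. That index is the pattern ID. We do have to collapse each row into a single string element, though, otherwise R's vectorisation gets in the way of what we want. The example below uses slightly extended sample data: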

#  Visit1 Visit2 Visit3
#1    yes     no     no
#2    yes     no    yes
#3    yes    yes    yes
#4     no    yes     no
#5    yes     no    yes

#  Get unique combinations
pats <- unique( data1 )

#  Collapse each row to a single string element
pats <- apply( pats , 1 , paste , collapse = " " )

#do the same to your data and compare with the patterns
data1$pattern <- apply( data1 , 1 , function(x) match( paste( x , collapse = " " ) , pats ) )
#  Visit1 Visit2 Visit3 pattern
#1    yes     no     no       1
#2    yes     no    yes       2
#3    yes    yes    yes       3
#4     no    yes     no       4
#5    yes     no    yes       2
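
The same idea can also be sketched without apply. This is a minimal sketch, not part of the original answer: it rebuilds the extended sample data so it runs on its own, and the names visit_cols and key are made up for illustration. It pastes the three visit columns into one key per row and numbers the keys in order of first appearance.

```r
# Extended sample data from the answer above, rebuilt so the sketch is self-contained
data1 <- data.frame(
  Visit1 = c("yes", "yes", "yes", "no", "yes"),
  Visit2 = c("no", "no", "yes", "yes", "no"),
  Visit3 = c("no", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper names: the columns to combine and the per-row key
visit_cols <- c("Visit1", "Visit2", "Visit3")
key <- do.call(paste, data1[visit_cols])

# The pattern ID is the position of each key among the unique keys
data1$pattern <- match(key, unique(key))
data1$pattern
# → 1 2 3 4 2
```

Selecting the visit columns by name keeps an existing pattern column out of the key, so the numbering does not depend on whether that column was already added.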

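【Discussion】:

Just summing won't work, because I have to be able to distinguish "yes no no" (one pattern) from "no no yes" (another pattern), which both have the same number of yes entries.
@marcel Hold on. Updating. You should have explained this better in the OP.
@SimonO101, maybe I shouldn't have been so fast to give you my vote :)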

【Solution 2】:

Assuming we use @SimonO101's extended sample data, I'd suggest expand.grid and factor.

First, create all combinations of "yes" and "no" for the three columns.

facLevs <- expand.grid(c("yes", "no"), c("yes", "no"), c("yes", "no"))
facLevs
#   Var1 Var2 Var3
# 1  yes  yes  yes
# 2   no  yes  yes
# 3  yes   no  yes
# 4   no   no  yes
# 5  yes  yes   no
# 6   no  yes   no
# 7  yes   no   no
# 8   no   no   no

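Now we combine the columns. We can do that more easily with do.call(paste, ...) than with apply(mydf, ...). We pass the result to factor, using the pasted rows of facLevs as the levels, and convert it with as.numeric to get numeric groups.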

mydf$pattern <- as.numeric(factor(do.call(paste, mydf[1:3]), 
                                  do.call(paste, facLevs)))
mydf
#   Visit1 Visit2 Visit3 pattern
# 1    yes     no     no       7
# 2    yes     no    yes       3
# 3    yes    yes    yes       1
# 4     no    yes     no       6
# 5    yes     no    yes       3

As you can see, pattern = 7 corresponds to the values found in the seventh row of the facLevs data.frame we created.

For convenience, here is mydf:

mydf <- structure(list(Visit1 = c("yes", "yes", "yes", "no", "yes"), 
                       Visit2 = c("no", "no", "yes", "yes", "no"), 
                       Visit3 = c("no", "yes", "yes", "no", "yes")), 
                  .Names = c("Visit1", "Visit2", "Visit3"), 
                  class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
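
The claim that each pattern value points back to a row of facLevs can be checked directly. The following is a minimal sketch under the same sample data; the name lookup is made up for illustration and does not appear in the answer.

```r
# Sample data and level table, rebuilt so the sketch is self-contained
mydf <- data.frame(
  Visit1 = c("yes", "yes", "yes", "no", "yes"),
  Visit2 = c("no", "no", "yes", "yes", "no"),
  Visit3 = c("no", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)
facLevs <- expand.grid(c("yes", "no"), c("yes", "no"), c("yes", "no"))

# Same computation as in the answer
mydf$pattern <- as.numeric(factor(do.call(paste, mydf[1:3]),
                                  do.call(paste, facLevs)))

# Hypothetical name: the facLevs rows that the pattern IDs index into
lookup <- facLevs[mydf$pattern, ]

# Each looked-up combination should equal the original visit values
all(do.call(paste, lookup) == do.call(paste, mydf[1:3]))
# → TRUE
```

Because the levels come from expand.grid rather than from the data, a given combination gets the same number even in a data set where some combinations never occur.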

【Discussion】:

+1 I'd remind you that, since I have edited my answer, you are free to remove your upvote at any time! This seems more sensible :-)

【Solution 3】:

Update

An answer using a for loop:

updateRow <- function(rIndex, data1) {
  # R passes arguments by value, so the modified copy must be returned
  if ((data1[rIndex, 1] == "yes") &&
      (data1[rIndex, 2] == "no") &&
      (data1[rIndex, 3] == "no")) {
    data1[rIndex, 4] <- 1
  }
  data1
}

# Reassign the result on every iteration so the change persists in data1
for (i in seq_len(nrow(data1))) data1 <- updateRow(i, data1)

You can change the if as needed. I hope this is what you want.
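
The same single rule can also be applied without a loop, using logical indexing. This is a minimal sketch, not the answerer's code: it rebuilds the three-row data from the question, and the name is_yes_no_no is made up for illustration.

```r
# Three-row sample data from the question, rebuilt so the sketch is self-contained
data1 <- data.frame(
  Visit1 = c("yes", "yes", "yes"),
  Visit2 = c("no", "no", "yes"),
  Visit3 = c("no", "yes", "yes"),
  stringsAsFactors = FALSE
)
data1$pattern <- NA

# Hypothetical name: TRUE for rows whose visits are yes / no / no
is_yes_no_no <- data1$Visit1 == "yes" &
                data1$Visit2 == "no" &
                data1$Visit3 == "no"

# Assign the pattern ID only to the matching rows
data1$pattern[is_yes_no_no] <- 1
data1$pattern
# → 1 NA NA
```

Each further combination would need its own condition and assignment, which is why the pasted-key approaches in the other answers scale better to all 8 combinations.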

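【Discussion】:

This looks like it should work (thanks!). However, the code does not change the contents of column 4: rm(list = ls(all = TRUE)); tC
I updated the code. You will have to write the other ifs yourself, though.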