R: Apply conditionally over subsets with excluded values

【Question title】: R: Apply conditionally over subsets with excluded values
【Posted】: 2018-09-15 19:23:05
【Question description】:

Below is a sample of my data in R. Each observation in `column A` has an assigned letter in `column B` and an assigned value in `column C`. I want to add a `column D` that records a ratio for each observation in `column A`. Here is how the ratio is calculated for observation "1".

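For each row of observation "1", I want to count all the X's that appear in the table but are not assigned to "1". In this case the count is 2, because two X's are assigned to observation "3". Of those X's not assigned to observation "1", I also want to count the ones whose value in column C is greater than 6. That count is 1, because one of the two X's assigned to "3" has a value greater than 6 in column C. So in column D, the ratio for every row of observation "1" is 1 to 2: 1/2.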
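The arithmetic for observation "1" can be checked directly. This is a minimal sketch in base R, assuming the sample data is held in a data frame called `dat` with columns `A`, `B` and `C` (names chosen here only for illustration):

```r
# Sample data from the question; the names dat, A, B, C are assumptions
dat <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
  B = c("X", "X", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "Y"),
  C = c(7, 8, 3, 3, 3, 8, 5, 7, 6, 7, 8)
)

# X rows that are NOT assigned to observation 1
others <- dat[dat$B == "X" & dat$A != 1, ]

denominator <- nrow(others)       # X's outside observation 1
numerator   <- sum(others$C > 6)  # of those, values greater than 6
ratio_obs1  <- numerator / denominator
```

Here `denominator` is 2 and `numerator` is 1, which matches the 1/2 described above.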

I also want to do the same for the Y's in column B.

data_table
Column A   Column B  Column C
 1           X         7
 1           X         8
 1           X         3
 1           X         3
 2           Y         3
 2           Y         8
 3           X         5
 3           X         7
 4           Y         6
 4           Y         7
 4           Y         8

I would like the resulting table to look like this:

Column A  Column B  Column C Column D
 1           X         7      1/2     # There are two X's assigned to "3", one of which has a value greater than 6 in Column C.
 1           X         8      1/2
 1           X         3      1/2
 1           X         3      1/2
 2           Y         3      2/3
 2           Y         8      2/3
 3           X         5      2/4
 3           X         7      2/4
 4           Y         6      1/2
 4           Y         7      1/2
 4           Y         8      1/2

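This is the code I have come up with so far. However, for each observation in column A, I have not managed to produce nrow counts that skip the X's assigned to that observation.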

    final_df %>% group_by(column_B) %>% 
    mutate(ratio = nrow(filter(final_df, column_C>6))/nrow(final_df))

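Any suggestions on how to modify it so that the X's of a specific observation (column A) are excluded when calculating the proportion of X's greater than 6 (column C)?

Thanks!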
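The attempt above gives every row the same ratio, because `filter(final_df, ...)` and `nrow(final_df)` operate on the whole data frame rather than on the current group. One possible grouped alternative is sketched below. It assumes the dplyr package is installed and that the columns are named `column_A`, `column_B` and `column_C`, as in the snippet. The helper expression inside `mutate()` is only an illustration, not the asker's or answerer's method.

```r
library(dplyr)

# Sample data from the question, using the column names from the attempt
final_df <- data.frame(
  column_A = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
  column_B = c("X", "X", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "Y"),
  column_C = c(7, 8, 3, 3, 3, 8, 5, 7, 6, 7, 8)
)

result <- final_df %>%
  group_by(column_B) %>%
  mutate(
    # For each row, look only at rows of the same letter (the current group)
    # that belong to a different observation, then take the share above 6.
    ratio = vapply(
      column_A,
      function(a) mean(column_C[column_A != a] > 6),
      numeric(1)
    )
  ) %>%
  ungroup()
```

Inside a grouped `mutate()`, `column_A` and `column_C` refer to the rows of the current `column_B` group, so the exclusion happens within each letter. A letter that occurs under only one observation would get `NaN`, since nothing is left to average.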

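【Comments】:

You are asking for the `number of x's that are not assigned to "1"`. In the 4 rows where Column_A == 1, that would be 0, which would be a poor choice of denominator. I think you need to review the description of your problem. As far as I can see, it has little connection to the expected result.
Hi, thanks for your comment. I just edited the description. There are two X's assigned to "2", so 2 is the denominator. One of those X's has a value greater than 6 in column C, so the numerator is 1.
I don't see "two X's assigned to 2". I see two Y's in the rows defined by "2".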

Tags: r subset apply

【Solution 1】:

How about something simple like this?

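## Simulate some data
## id1: 250 random ids, each repeated on 4 consecutive rows (1000 rows)
id1 <- rep(round(runif(250, 0, 1) * 100000000), each = 4)
## id2: 50 random ids, each ending up on 20 consecutive rows (1000 rows)
id2 <- rep(round(runif(50, 0, 1) * 100000000), each = 4)
id2 <- rep(id2, each = 5)
value <- rnorm(1000, mean = 6, sd = 2)
df <- data.frame(id1, id2, value)

## Calculate using a loop
## One output row per unique (id1, id2) pair
output <- data.frame(id1, id2, prop = NA)
output <- output[!duplicated(output), ]
for (i in seq_len(nrow(output))) {
    ## Rows whose id2 differs from this pair's id2: how many have value > 6 ...
    gt6 <- sum(df$value[df$id2 != output$id2[i]] > 6)
    ## ... out of how many such rows in total
    tot <- sum(df$id2 != output$id2[i])
    output$prop[i] <- gt6 / tot
}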
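The loop above runs on simulated data. Applied to the question's sample table, the same idea might look like the sketch below. It assumes the exclusion should be by Column A within each Column B letter, and uses the illustrative names `dat`, `A`, `B`, `C` and `D`, which are not in the original answer.

```r
# Sample data from the question; the names dat, A, B, C are assumptions
dat <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
  B = c("X", "X", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "Y"),
  C = c(7, 8, 3, 3, 3, 8, 5, 7, 6, 7, 8)
)

# One output row per unique (A, B) pair
output <- unique(dat[, c("A", "B")])
output$gt6 <- NA
output$tot <- NA

for (i in seq_len(nrow(output))) {
  # Same letter, but a different observation
  keep <- dat$B == output$B[i] & dat$A != output$A[i]
  output$tot[i] <- sum(keep)             # denominator
  output$gt6[i] <- sum(dat$C[keep] > 6)  # numerator
}

# Unreduced fraction, written the way the question shows it
output$D <- paste(output$gt6, output$tot, sep = "/")

# Attach the per-observation ratio back onto every original row
result <- merge(dat, output[, c("A", "B", "D")], by = c("A", "B"))
```

On this data, `output$D` comes out as "1/2", "2/3", "2/4" and "1/2" for observations 1 to 4, which matches the expected Column D in the question. Keeping the counts separate from the formatted string makes it easy to switch to a numeric ratio with `output$gt6 / output$tot`.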

【Comments】: