Count duplicates based on condition in R (answer)

【Question title】: Count duplicates based on condition in R
【Posted】: 2020-10-15 15:12:17
【Question】:

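I want to count column d where duplicate values occur under the following conditions: 1) a = 3w1 and a = 7w1, 2) a = 3w1 and a = sp2, 3) a = 3W1 and a = 3W1, and so on. In other words, each id in column d should be counted for every pairwise interaction of the values in column a.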

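Alternatively, I could extract or count the duplicates based on overlapping dates, which may make more sense, but I ran into errors with the dates. The d values overlap across 3W1, 7W1, 5W1 and 14W. I used library(dplyr).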

a     b         c         d
3W1  5/11/2020 5/31/2020  1
3W1  5/11/2020 5/31/2020  1
7W1  5/11/2020 6/28/2020  1
5W1  6/1/2020  7/5/2020   1
14W  5/11/2020 8/16/2020  1
3W1  5/11/2020 5/31/2020  2
SP2  6/15/2020 8/16/2020  3
3W1  5/11/2020 5/31/2020  4
3W1  5/11/2020 5/31/2020  4
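The question mentions errors with the dates, but the error itself is not shown. One likely cause (an assumption) is that b and c are month/day/year strings: as.Date() only tries year-first formats by default, so such strings are misread or rejected unless the format is given explicitly. A minimal sketch that rebuilds the table above and converts both columns:

```r
# Rebuild the sample table from the question (b and c are m/d/Y strings)
df <- data.frame(
  a = c("3W1", "3W1", "7W1", "5W1", "14W", "3W1", "SP2", "3W1", "3W1"),
  b = c("5/11/2020", "5/11/2020", "5/11/2020", "6/1/2020", "5/11/2020",
        "5/11/2020", "6/15/2020", "5/11/2020", "5/11/2020"),
  c = c("5/31/2020", "5/31/2020", "6/28/2020", "7/5/2020", "8/16/2020",
        "5/31/2020", "8/16/2020", "5/31/2020", "5/31/2020"),
  d = c(1L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L),
  stringsAsFactors = FALSE
)

# Without a format, as.Date() only tries "%Y-%m-%d" and "%Y/%m/%d",
# so spell out the month/day/year layout explicitly
df$b <- as.Date(df$b, format = "%m/%d/%Y")
df$c <- as.Date(df$c, format = "%m/%d/%Y")

# Sanity check: every start date is on or before its end date
stopifnot(all(df$b <= df$c))
```

Once b and c are real Date columns, comparisons such as b <= c behave chronologically instead of alphabetically.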

【Comments】:

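Please do not post images of code, data, or errors: they cannot be copied or searched (SEO), they break screen readers, and they may not fit well on some mobile devices. Ref: meta.stackoverflow.com/a/285557 (and xkcd.com/2116). Please include the code, console output, or data directly (e.g., dput(head(x)) or data.frame(...)).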
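As a sketch of what the comment asks for: dput() prints an R expression that recreates an object, which others can copy and run (the data at the end of the answer was shared this way). The data frame x below is a hypothetical stand-in for the asker's data.

```r
# Hypothetical stand-in for the asker's data
x <- data.frame(a = c("3W1", "7W1"), d = c(1L, 2L), stringsAsFactors = FALSE)

# This printed text is what you would paste into the question
dput(head(x))

# Evaluating the printed text gives back an equivalent data frame
txt <- paste(capture.output(dput(x)), collapse = "\n")
y <- eval(parse(text = txt))
stopifnot(isTRUE(all.equal(y, x)))
```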
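I don't see 3w1 in that image. Besides data in a format we can copy (see above), your expected output for the sample data would also help. If you have tried code that doesn't quite work, please include it, along with the packages you intend to use. Some good references on asking complete, self-contained, reproducible questions: stackoverflow.com/q/5963269, minimal reproducible example, and stackoverflow.com/tags/r/info.
What is the expected output for the given data?
I need to count the number of d values within overlapping dates.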
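One possible reading of this requirement (an assumption, since no expected output was posted): for each row, count the distinct d values among all rows whose [b, c] date range overlaps that row's range. Two ranges overlap when each one starts on or before the other ends. A base-R sketch; the column name n_overlapping_d is hypothetical:

```r
df <- data.frame(
  a = c("3W1", "3W1", "7W1", "5W1", "14W", "3W1", "SP2", "3W1", "3W1"),
  b = as.Date(c("5/11/2020", "5/11/2020", "5/11/2020", "6/1/2020", "5/11/2020",
                "5/11/2020", "6/15/2020", "5/11/2020", "5/11/2020"),
              format = "%m/%d/%Y"),
  c = as.Date(c("5/31/2020", "5/31/2020", "6/28/2020", "7/5/2020", "8/16/2020",
                "5/31/2020", "8/16/2020", "5/31/2020", "5/31/2020"),
              format = "%m/%d/%Y"),
  d = c(1L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L),
  stringsAsFactors = FALSE
)

# For row i, flag every row whose range overlaps [b[i], c[i]]
# (a row always overlaps itself), then count the distinct d among them
df$n_overlapping_d <- vapply(seq_len(nrow(df)), function(i) {
  overlaps <- df$b <= df$c[i] & df$c >= df$b[i]
  length(unique(df$d[overlaps]))
}, integer(1))

df[, c("a", "d", "n_overlapping_d")]
```

For the sample data, the 5W1 row (6/1 to 7/5) overlaps only the 7W1, 5W1, 14W and SP2 rows, so its count is 2 (d values 1 and 3).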

Tags: r count duplicates

【Solution 1】:

Below is an attempt to count the number of unique d values for each combination of a values. It is not elegant, so feel free to improve it.

library(dplyr)
# Create a table with all possible combo of df$a values
conds <- expand.grid(cond1 = unique(df$a), cond2 = unique(df$a), stringsAsFactors = FALSE)
conds

# Use this to make multiple subsets of df and each time count the number of unique d values
test <- setNames( object = as.data.frame(apply(conds, 1, function(x) df %>% filter(a %in% c(x[1], x[2])) %>% summarise(length(unique(d))))),
                  apply(conds, 1, function(x) paste(x[1], x[2], sep = " & ")) )

# Reshape this to get a pretty printed result
res <- reshape(test,
        varying = colnames(test),
        times = colnames(test),
        timevar = "conditions",
        v.names = "count_of_unique_d",
        direction = "long",
        new.row.names = seq_along(colnames(test)))
res <- res[, c("conditions", "count_of_unique_d")]
res

What is happening?

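apply(conds, 1, function(x) df %>% filter(a %in% c(x[1], x[2])) %>% summarise(length(unique(d)))) subsets df according to each row of conds (your conditions on a) and counts the unique d values in that subset. The results are returned as a list, which as.data.frame() converts to a data frame. setNames() then names each column after its pair of conditions, so you know which conditions were applied.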
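The same per-pair count can also be computed without the list, data-frame and reshape() round trip. A minimal base-R sketch (an alternative to the answer's code, not a replacement for it) that uses mapply() over the two columns of conds and stores the counts directly as a new column:

```r
df <- data.frame(
  a = c("3W1", "3W1", "7W1", "5W1", "14W", "3W1", "SP2", "3W1", "3W1"),
  d = c(1L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L),
  stringsAsFactors = FALSE
)

# All ordered pairs of a values, as in the answer
conds <- expand.grid(cond1 = unique(df$a), cond2 = unique(df$a),
                     stringsAsFactors = FALSE)

# For each pair, keep rows whose a is either value and count the distinct d
conds$count_of_unique_d <- mapply(
  function(x, y) length(unique(df$d[df$a %in% c(x, y)])),
  conds$cond1, conds$cond2,
  USE.NAMES = FALSE
)
conds$conditions <- paste(conds$cond1, conds$cond2, sep = " & ")

head(conds[, c("conditions", "count_of_unique_d")], 5)
```

For the sample data the first five counts are 3, 3, 3, 3 and 4, the same as in the output shown in the answer.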

Output:

> head(res, 5)
  conditions count_of_unique_d
1  3W1 & 3W1                 3
2  7W1 & 3W1                 3
3  5W1 & 3W1                 3
4  14W & 3W1                 3
5  SP2 & 3W1                 4
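Because the filter a %in% c(x, y) is symmetric, "7W1 & 3W1" and "3W1 & 7W1" always give the same count, so the 25-row grid repeats every distinct pair twice. A sketch that keeps each unordered pair once, using combn() for the distinct pairs plus the same-value pairs (res2 is a hypothetical name):

```r
df <- data.frame(
  a = c("3W1", "3W1", "7W1", "5W1", "14W", "3W1", "SP2", "3W1", "3W1"),
  d = c(1L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L),
  stringsAsFactors = FALSE
)

vals <- unique(df$a)
# Same-value pairs (3W1 & 3W1, ...) followed by each distinct pair once
pairs <- rbind(cbind(vals, vals), t(combn(vals, 2)))

# Count distinct d among rows whose a matches either value of the pair
counts <- apply(pairs, 1, function(p) length(unique(df$d[df$a %in% p])))

res2 <- data.frame(conditions = paste(pairs[, 1], pairs[, 2], sep = " & "),
                   count_of_unique_d = counts,
                   stringsAsFactors = FALSE)
res2
```

With five distinct a values this gives 5 + 10 = 15 rows instead of 25.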

Data:

df <- structure(list(
  a = c("3W1", "3W1", "7W1", "5W1", "14W", "3W1", "SP2", "3W1", "3W1"),
  b = c("5/11/2020", "5/11/2020", "5/11/2020", "6/1/2020", "5/11/2020",
        "5/11/2020", "6/15/2020", "5/11/2020", "5/11/2020"),
  c = c("5/31/2020", "5/31/2020", "6/28/2020", "7/5/2020", "8/16/2020",
        "5/31/2020", "8/16/2020", "5/31/2020", "5/31/2020"),
  d = c(1L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 4L)),
  class = "data.frame", row.names = c(NA, -9L))

【Comments】: