R - Number Unique Vals in Col1 of a Data Table when Col2 > Col3 (Answers)

【Question Title】: R - Number Unique Vals in Col1 of a Data Table when Col2 > Col3
【Posted】: 2016-04-15 14:44:53
【Question】:

Sample table:

ID      Score1      Score2
1       100         88
1       96          94
1       94          95
2       100         100
2       98          94
3       77          88

So I want the result to be 2, because 2 distinct people (IDs 1 and 2) each have at least one row where Score1 > Score2.

For reproducibility:

df = data.frame( ID=c(1,1,1,2,2,3), Score1=c(100,96,94,100,98,77), Score2=c(88,94,95,100,94,88) )

I was thinking of:

length( unique( which( df$Score1 > df$Score2 ) ) )

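But that returns 3, because `which()` returns row positions rather than IDs, so `unique()` is applied to row numbers (which are always distinct) instead of to df$ID. How do I count the number of distinct df$ID values instead?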
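To see where the 3 comes from, it can help to print the intermediate values. A minimal base R sketch, rebuilding the sample `df` from the question: `which()` yields row positions, which are unique by construction, and indexing `df$ID` with those positions yields the IDs that actually need deduplicating. The names `rows` and `ids` are introduced here only for illustration.

```r
df <- data.frame(ID     = c(1, 1, 1, 2, 2, 3),
                 Score1 = c(100, 96, 94, 100, 98, 77),
                 Score2 = c(88, 94, 95, 100, 94, 88))

# Row positions where Score1 > Score2: these are row numbers, not IDs
rows <- which(df$Score1 > df$Score2)
print(rows)    # [1] 1 2 5

# Looking up the ID in each of those rows exposes the repeated ID 1
ids <- df$ID[rows]
print(ids)     # [1] 1 1 2

length(unique(rows))  # 3: row numbers never repeat
length(unique(ids))   # 2: number of distinct IDs
```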

【Question Discussion】:

Tags: r data.table unique

【Solution 1】:

I think this is what you're looking for in base R:

length(unique(df$ID[df$Score1 > df$Score2]))
[1] 2
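A different framing, not taken from the answer above, is to compute one TRUE/FALSE flag per ID ("does this person have at least one row with Score1 > Score2?") and then sum the flags. A sketch using base R's `tapply()` with `any()`, assuming the sample `df` from the question; `has_higher` is a hypothetical name. It gives the same count, and it also keeps IDs with no qualifying row visible as FALSE.

```r
df <- data.frame(ID     = c(1, 1, 1, 2, 2, 3),
                 Score1 = c(100, 96, 94, 100, 98, 77),
                 Score2 = c(88, 94, 95, 100, 94, 88))

# Group the row-wise comparison by ID and ask whether any row in each group is TRUE
has_higher <- tapply(df$Score1 > df$Score2, df$ID, any)
print(has_higher)  # named by ID: 1 and 2 are TRUE, 3 is FALSE

# TRUE counts as 1 when summed, so this is the number of qualifying IDs
sum(has_higher)    # 2
```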

Or with data.table:

library(data.table)
setDT(df)[Score1 > Score2, uniqueN(ID)]

Or with dplyr:

library(dplyr)
df %>% filter(Score1 > Score2) %>% { n_distinct(.$ID) }

【Discussion】:

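Yes! That's exactly what I couldn't figure out. Thank you so much. I didn't know how to combine $ID with the which() call... and it turns out I don't need which() at all!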

【Solution 2】:

Building on your code, take unique() on the ID column:

length(unique(df[df$Score1>df$Score2,1]))

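【Discussion】:

Don't use column numbers.
@MichaelChirico, why not?
In general, it isn't robust at all. If columns are later added, removed, or reordered elsewhere in the code, relying on column numbers can introduce bugs that are hard to detect.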
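To make the robustness point concrete, here is a small sketch. The column reordering (`df2`) is a hypothetical later change invented for the demonstration: after it, the positional version from Solution 2 silently counts distinct Score1 values instead of IDs, while indexing by the name "ID" still returns the intended result.

```r
df <- data.frame(ID     = c(1, 1, 1, 2, 2, 3),
                 Score1 = c(100, 96, 94, 100, 98, 77),
                 Score2 = c(88, 94, 95, 100, 94, 88))

keep <- df$Score1 > df$Score2

# Select the ID column by name rather than by position
by_name <- length(unique(df[keep, "ID"]))

# Hypothetical later change: Score1 is moved to the front
df2   <- df[, c("Score1", "ID", "Score2")]
keep2 <- df2$Score1 > df2$Score2

by_position_after <- length(unique(df2[keep2, 1]))     # column 1 is now Score1
by_name_after     <- length(unique(df2[keep2, "ID"]))  # still the ID column

by_name            # 2
by_position_after  # 3: distinct Score1 values 100, 96, 98, no error raised
by_name_after      # 2
```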