R: returning the 5 rows with the highest values

【Question title】: R: returning the 5 rows with the highest values
【Posted】: 2016-02-01 15:37:43
【Question description】:

Sample data

mysample <- data.frame(ID = 1:100, kWh = rnorm(100))

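I am trying to automate returning the rows of a data frame that contain the 5 highest values in a particular column. In the sample data, the 5 highest values in the "kWh" column can be found with: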

(tail(sort(mysample$kWh), 5))

which in my case returns:

[1] 1.477391 1.765312 1.778396 2.686136 2.710494

I would like to create a table containing the rows that have these numbers in column 2. I am trying this code:

mysample[mysample$kWh == (tail(sort(mysample$kWh), 5)),]

This returns:

   ID      kWh  
87 87 1.765312

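I would like it to return the rows that contain the numbers above in the "kWh" column. I'm sure I'm missing something basic, but I can't figure it out.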
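
The comparison in the question returns so few rows because `==` works element by element and recycles the shorter vector: each kWh value is checked against only one of the five top values, chosen by position. A minimal sketch of the difference, assuming a fixed seed (the value 42 is arbitrary and not part of the original post) and distinct kWh values, with `%in%` used as a membership test instead:

```r
set.seed(42)  # arbitrary seed, assumed here only for reproducibility
mysample <- data.frame(ID = 1:100, kWh = rnorm(100))

top5 <- tail(sort(mysample$kWh), 5)   # the 5 largest kWh values, ascending

# `==` recycles top5 along the 100 values, so element i is compared only
# with top5[((i - 1) %% 5) + 1]; most of the top rows are usually missed.
recycled <- mysample[mysample$kWh == top5, ]

# %in% tests each kWh value for membership in top5, regardless of position.
top_rows <- mysample[mysample$kWh %in% top5, ]
nrow(top_rows)
```

The rows in `top_rows` come back in their original order in the data frame, not sorted by kWh.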

【Question discussion】:

Tags: r

【Solution 1】:

We can use rank

mysample$Rank <- rank(-mysample$kWh)
head(mysample[order(mysample$Rank),],5)

If we don't need to create a column, we can use order directly (@Jaap mentioned these three alternatives in the comments)

#order descending and get the first 5 rows
head(mysample[order(-mysample$kWh),],5)
#order ascending and get the last 5 rows
tail(mysample[order(mysample$kWh),],5) 
#or just use sequence as index to get the rows.
mysample[order(-mysample$kWh),][1:5,]
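
The three order-based variants should pick out the same rows. A minimal check, assuming a fixed seed (arbitrary, not from the original post) and distinct kWh values; note the comma in `[1:5, ]`, which selects rows rather than columns:

```r
set.seed(1)  # arbitrary seed, assumed here only for reproducibility
mysample <- data.frame(ID = 1:100, kWh = rnorm(100))

# Sort descending, keep the first 5 rows
by_head  <- head(mysample[order(-mysample$kWh), ], 5)
# Sort ascending, keep the last 5 rows (result is in ascending order)
by_tail  <- tail(mysample[order(mysample$kWh), ], 5)
# Sort descending, index rows 1 to 5 explicitly
by_index <- mysample[order(-mysample$kWh), ][1:5, ]

identical(by_head$ID, by_index$ID)      # same rows, same order
identical(by_head$ID, rev(by_tail$ID))  # same rows, reversed order
```

The only visible difference is ordering: the `tail` variant lists the largest value last.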

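【Discussion】:

Why not just head(mysample[order(-mysample$kWh),],5)?
@Jaap Yes, that is possible, but I thought I read something about creating a new column.
@Jaap After reading this question stackoverflow.com/questions/3692563/… I used tail instead of head
@akrun Maybe you could add these alternatives as well: tail(mysample[order(mysample$kWh),],5) & mysample[order(-mysample$kWh),][1:5,]
Simple enough for my needs: mysample[mysample$Rank<6,] @akrun, the rank column was needed after all. This works on my real data. Very happy.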
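
The Rank filter from the last comment can be sketched as follows. One caveat: rank() defaults to ties.method = "average", so tied values can make `Rank < 6` return more or fewer than 5 rows. The data frame `tied` below is a hypothetical example made up to show this, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, assumed here only for reproducibility
mysample <- data.frame(ID = 1:100, kWh = rnorm(100))

# Rank 1 is the largest kWh value because the values are negated
mysample$Rank <- rank(-mysample$kWh)
top5 <- mysample[mysample$Rank < 6, ]
top5 <- top5[order(top5$Rank), ]   # optional: sort the result by rank

# Hypothetical data with a tie for 5th place (two values of 5)
tied <- data.frame(ID = 1:7, kWh = c(9, 8, 7, 6, 5, 5, 1))

# Default "average": both tied values get rank 5.5, so 6 rows pass the filter
tied_avg   <- tied[rank(-tied$kWh) < 6, ]
# "first": ties are broken by position, so exactly 5 rows pass the filter
tied_first <- tied[rank(-tied$kWh, ties.method = "first") < 6, ]
```

Which behaviour is wanted depends on whether tied rows should all be kept or the result must have exactly 5 rows.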