How to find the first occurrence of a vector of numeric elements within a data frame column? (Answers)

【Question title】: How to find first occurrence of a vector of numeric elements within a data frame column?
【Posted】: 2017-12-20 19:20:34
【Question】:

I have a data frame (min_set_obs) with two columns: the first holds numeric values and is called Treatment, and the second is an id column called seq:

min_set_obs
 Treatment seq
       1   29
       1   23
       3   60
       1   6
       2   41
       1   5
       2   44

Suppose I also have a numeric vector called key:

key
[1] 1 1 1 2 2 3

That is, a vector of three 1s, two 2s and one 3.
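To make the example reproducible, the two objects above could be built as follows. This is only a sketch: the integer types are an assumption (the post shows printed values only, although the tibble output in Solution 2 lists both columns as `<int>`).

```r
# Rebuild the example data from the question.
# Assumption: both columns and key are integers.
min_set_obs <- data.frame(
  Treatment = c(1L, 1L, 3L, 1L, 2L, 1L, 2L),
  seq       = c(29L, 23L, 60L, 6L, 41L, 5L, 44L)
)
key <- c(1L, 1L, 1L, 2L, 2L, 3L)

# How many rows of each Treatment value should be kept
key_counts <- table(key)
print(key_counts)
```

`table(key)` gives the number of rows to keep per Treatment value, which is the quantity both solutions rely on.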

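How would I determine which rows of the min_set_obs data frame contain the first occurrences of the values in the key vector?

I would like my output to look like this: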

Treatment seq
   1   29
   1   23
   3   60
   1   6
   2   41
   2   44

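That is, the sixth row of min_set_obs is "extra" (it is a fourth 1 when there should only be three 1s), so it is removed.

I am familiar with the %in% operator, but I don't think it can tell me where the values of the key vector first occur in the first column of the min_set_obs data frame.

Thanks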


Tags: r dataframe vector subset

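【Solution 1】:

Here is an option with base R. We split 'min_set_obs' by 'Treatment' into a list, take the head of each list element using the corresponding frequency in 'key', and rbind the list elements into a single data.frame.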

res <- do.call(rbind, Map(head, split(min_set_obs, min_set_obs$Treatment), n = table(key)))
row.names(res) <- NULL
res
#   Treatment seq
#1         1  29
#2         1  23   
#3         1   6
#4         2  41
#5         2  44
#6         3  60
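Two things are worth noting about the result above. It comes back grouped by Treatment rather than in the original row order the question asked for, and Map pairs the split groups with table(key) by position, so it relies on both containing the same Treatment values. A possible order-preserving variant, offered as a sketch rather than as part of the original answer, numbers the occurrences of each value with ave and looks the allowed count up by name. The names occurrence, allowed and res_ordered are made up for illustration, and the integer input types are an assumption.

```r
# Example data from the question (integer types assumed)
min_set_obs <- data.frame(
  Treatment = c(1L, 1L, 3L, 1L, 2L, 1L, 2L),
  seq       = c(29L, 23L, 60L, 6L, 41L, 5L, 44L)
)
key <- c(1L, 1L, 1L, 2L, 2L, 3L)

# Running occurrence number of each Treatment value, in original row order
occurrence <- ave(seq_along(min_set_obs$Treatment),
                  min_set_obs$Treatment,
                  FUN = seq_along)

# Allowed number of rows per Treatment value, matched by name, not position
counts  <- table(key)
allowed <- as.vector(counts)[match(as.character(min_set_obs$Treatment),
                                   names(counts))]
allowed[is.na(allowed)] <- 0L  # Treatment values absent from key keep no rows

# Keep a row only while its occurrence number is within the allowed count
res_ordered <- min_set_obs[occurrence <= allowed, ]
print(res_ordered)
```

Because rows are filtered with a logical mask instead of being split and re-bound, the surviving rows keep their original order and row names.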


【Solution 2】:

With dplyr, you can first count the keys with table, then take the corresponding first n rows from each group:

library(dplyr)
m <- table(key)

min_set_obs %>% group_by(Treatment) %>% do({
    # as.character(.$Treatment[1]) returns the treatment for the current group
    # use coalesce to get the default number of rows (0) if the treatment doesn't exist in key
    head(., coalesce(m[as.character(.$Treatment[1])], 0L))
})

# A tibble: 6 x 2
# Groups:   Treatment [3]
#  Treatment   seq
#      <int> <int>
#1         1    29
#2         1    23
#3         1     6
#4         2    41
#5         2    44
#6         3    60
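In more recent dplyr releases do() is marked as superseded, so the same idea could also be written with a join and a grouped filter. The following is a sketch under assumptions, not the answerer's code: key_counts, n_allowed and res_dplyr are hypothetical names, the integer types are assumed, and count(name = ...) needs a reasonably recent dplyr.

```r
library(dplyr)

# Example data from the question (integer types assumed)
min_set_obs <- data.frame(
  Treatment = c(1L, 1L, 3L, 1L, 2L, 1L, 2L),
  seq       = c(29L, 23L, 60L, 6L, 41L, 5L, 44L)
)
key <- c(1L, 1L, 1L, 2L, 2L, 3L)

# One row per distinct key value with the number of rows allowed for it
key_counts <- tibble(Treatment = key) %>%
  count(Treatment, name = "n_allowed")

res_dplyr <- min_set_obs %>%
  inner_join(key_counts, by = "Treatment") %>%  # drops Treatments not in key
  group_by(Treatment) %>%
  filter(row_number() <= n_allowed) %>%         # first n_allowed rows per group
  ungroup() %>%
  select(-n_allowed)

print(res_dplyr)
```

Since inner_join keeps the row order of its left-hand table and filter does not reorder rows, this version returns the rows in their original order.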
