Conditionally determining value of column by looking at last group: answers

【Question title】: Conditionally determining value of column by looking at last group
【Posted】: 2020-07-07 14:21:27
【Question description】:

I have test data that looks like this:

   Group Value
1      a     1
2      a     2
3      a     3
4      a     4
5      b     5
6      b     2
7      b     3
8      c     6
9      c     7
10     c     8
11     c     3
12     c     6
13     d     9
14     d    10
15     e     9

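I am trying to build a vectorized approach, preferably with tidyverse tools, that adds a column recording whether each value is present in the previous group. Here is an example of the desired output: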

   Group Value In_Last_Group
1      a     1         FALSE
2      a     2         FALSE
3      a     3         FALSE
4      a     4         FALSE
5      b     5         FALSE
6      b     2          TRUE
7      b     3          TRUE
8      c     6         FALSE
9      c     7         FALSE
10     c     8         FALSE
11     c     3          TRUE
12     c     5          TRUE
13     d     9         FALSE
14     d    10         FALSE
15     e     9          TRUE

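I have a way to do this with a standard for loop, but my dataset is large and I believe a vectorized version would be much faster. Any help would be greatly appreciated.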

Here is the dput of the test data:

structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("a", "b", "c", "d", 
"e"), class = "factor"), Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7, 
8, 3, 6, 9, 10, 9)), .Names = c("Group", "Value"), row.names = c(NA, 
-15L), class = "data.frame")

【Question discussion】:

Tags: r dplyr tidyverse

【Solution 1】:

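We can nest the data after grouping by 'Group', which gives one element of the 'data' column per group. Dropping the first element of that column in one copy and the last element in another lines each group up with the group before it, so map2 can compare the corresponding elements. The first group has no predecessor, so FALSE is prepended for each of its rows.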

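library(dplyr)
library(tidyr)  # nest() comes from tidyr
library(purrr)

# df1 is the data frame from the dput in the question
df2 <- df1 %>%
         group_by(Group) %>%
         nest()

# .x holds the current group's rows, .y the previous group's rows
flag <-  map2(df2$data[-1], df2$data[-nrow(df2)], ~ 
      .x$Value %in% .y$Value) %>%
      unlist()
# The first group has no previous group, so all of its rows are FALSE
df1$Last_Group <- c(rep(FALSE, nrow(df2$data[[1]])), flag)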
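
To make the shifted pairing easier to see, here is a minimal sketch of the same idea in base R, without nest or map2. It is not the answerer's code. It assumes the rows of each group are contiguous and already in group order, and the column name In_Last_Group is taken from the question's desired output.

```r
# Test data from the question, with Group as a character column
df1 <- data.frame(
  Group = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c", "c", "d", "d", "e"),
  Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7, 8, 3, 6, 9, 10, 9),
  stringsAsFactors = FALSE
)

# One vector of values per group, kept in order of first appearance
vals <- split(df1$Value, factor(df1$Group, levels = unique(df1$Group)))

# Pair group i (cur) with group i - 1 (prev) by dropping the first
# element from one copy of the list and the last from the other
flag <- Map(function(cur, prev) cur %in% prev,
            vals[-1], vals[-length(vals)])

# The first group has no predecessor, so its rows are all FALSE
df1$In_Last_Group <- c(rep(FALSE, length(vals[[1]])),
                       unlist(flag, use.names = FALSE))
```

Because the flags are glued back on by position, this only lines up with the original rows when the data is sorted by group, which is the same assumption the nest/map2 version makes.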

【Discussion】:

【Solution 2】:

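You can use a join to look up each value in the previous group and see whether it exists there. This should be faster than iterating over the groups. I am not familiar with tidyverse, but here is an implementation in data.table (which should also be faster than tidyverse if your data is large enough):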

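library(data.table)
setDT(DF)
# g: run id of each block of consecutive Group values; pg: run id of the block before it
DF[, c("g", "pg") := .(r <- rleid(Group), r - 1L)]
# Update join: set ilg to TRUE where a row's (pg, Value) matches some row's (g, Value)
DF[, ilg := FALSE][DF, on=.(pg=g, Value), ilg := TRUE]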

Output (note that row 12 of the OP's desired output has a typo in Value):

    Group Value g pg   ilg
 1:     a     1 1  0 FALSE
 2:     a     2 1  0 FALSE
 3:     a     3 1  0 FALSE
 4:     a     4 1  0 FALSE
 5:     b     5 2  1 FALSE
 6:     b     2 2  1  TRUE
 7:     b     3 2  1  TRUE
 8:     c     6 3  2 FALSE
 9:     c     7 3  2 FALSE
10:     c     8 3  2 FALSE
11:     c     3 3  2  TRUE
12:     c     6 3  2 FALSE
13:     d     9 4  3 FALSE
14:     d    10 4  3 FALSE
15:     e     9 5  4  TRUE

Data:

DF <- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L), .Label = c("a", "b", "c", "d", 
"e"), class = "factor"), Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7, 
8, 3, 6, 9, 10, 9)), .Names = c("Group", "Value"), row.names = c(NA, 
-15L), class = "data.frame")
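
Since the question asked for tidyverse tools, the same join idea can be sketched with dplyr. This is an illustrative translation and not part of the answer above. The helper table prev and the result name res are made up for the example, and the run id is built with base rle, which needs Group to be a character vector (convert a factor with as.character first).

```r
library(dplyr)

# Test data from the question, with Group as a character column
df1 <- data.frame(
  Group = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c", "c", "d", "d", "e"),
  Value = c(1, 2, 3, 4, 5, 2, 3, 6, 7, 8, 3, 6, 9, 10, 9),
  stringsAsFactors = FALSE
)

# Run id for each block of consecutive identical Group values
runs <- rle(df1$Group)
df1$g <- rep(seq_along(runs$lengths), runs$lengths)

# Distinct (run id, Value) pairs, shifted by one so that each pair
# is keyed on the run that comes *after* the one it was seen in
prev <- df1 %>%
  distinct(g, Value) %>%
  mutate(g = g + 1L, In_Last_Group = TRUE)

# Rows with a match in prev get TRUE; unmatched rows come back as NA
# from the left join and are turned into FALSE
res <- df1 %>%
  left_join(prev, by = c("g", "Value")) %>%
  mutate(In_Last_Group = coalesce(In_Last_Group, FALSE)) %>%
  select(-g)
```

Taking distinct pairs before joining keeps the left join from duplicating rows when a value occurs more than once in the previous group.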

【Discussion】: