
【Question title】: Mutate to obtain values before and after a value
【Posted】: 2015-04-17 18:03:01
【Question】:

I have a dataset formatted like this:

amount | event
------ | ------
 3     |  FALSE
 4     |  FALSE
 6     |  TRUE
 7     |  FALSE
 3     |  FALSE
 4     |  TRUE
 8     |  FALSE

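I would like to split and mutate based on the value of the event column, creating new columns that hold the amount from the row before and the row after, but only where event is TRUE. For example: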

amount | event | before | after
------ | ----- | -----  | -----
 3     | FALSE |  NA    | NA
 4     | FALSE |  NA    | NA
 6     | TRUE  |  4     | 7
 7     | FALSE |  NA    | NA
 3     | FALSE |  NA    | NA
 4     | TRUE  |  3     | 8
 8     | FALSE |  NA    | NA

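I was looking at ddply and mutate, but I am not sure how to access values by offset once the data has been split. Any ideas?

【Comments】:

Tags: r plyr reshape

【Solution 1】:

Using base R, we find the positions of the TRUE values in the 'event' column with which ('indx') and create two NA columns ('before' and 'after'). We then fill those columns at 'indx' with the 'amount' values one position before ('indx - 1') and one position after ('indx + 1').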

indx <- which(df1$event)
df1[c('before','after')] <- NA
df1$before[indx] <- df1$amount[indx-1]
df1$after[indx] <- df1$amount[indx+1]
 df1
 #  amount event before after
 #1      3 FALSE     NA    NA
 #2      4 FALSE     NA    NA
 #3      6  TRUE      4     7
 #4      7 FALSE     NA    NA
 #5      3 FALSE     NA    NA
 #6      4  TRUE      3     8
 #7      8 FALSE     NA    NA
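The index arithmetic above assumes that no TRUE sits in the first or last row: `indx - 1` would then contain 0, which R drops when subsetting, so the replacement would be shorter than the target and the values would no longer line up. A minimal sketch of one way to guard against that, assuming a modified copy of the sample data (the names `df`, `has_prev`, and `has_next` are illustrative and not part of the original answer):

```r
# Question's data, modified so that the first and last rows are also TRUE
df <- data.frame(amount = c(3L, 4L, 6L, 7L, 3L, 4L, 8L),
                 event  = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE))
df[c("before", "after")] <- NA

indx <- which(df$event)
has_prev <- indx > 1          # TRUE rows that have a row above them
has_next <- indx < nrow(df)   # TRUE rows that have a row below them

# Only assign where the neighbouring row actually exists
df$before[indx[has_prev]] <- df$amount[indx[has_prev] - 1]
df$after[indx[has_next]]  <- df$amount[indx[has_next] + 1]
df
```

Rows without a neighbour simply keep NA in the corresponding column.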

Or, using data.table (similar to @Marat Talipov's idea), we can use shift to get the lag and lead values of 'amount' and create the 'before'/'after' columns. We then set those columns to NA in the rows where 'event' is FALSE (!event).

 library(data.table)#data.table_1.9.5
 setDT(df1)[,c('before', 'after'):= list(shift(amount, type='lag'),
    shift(amount, type='lead')) ][(!event), 3:4 := NA][]
 #     amount event before after
 #1:      3 FALSE     NA    NA
 #2:      4 FALSE     NA    NA
 #3:      6  TRUE      4     7
 #4:      7 FALSE     NA    NA
 #5:      3 FALSE     NA    NA
 #6:      4  TRUE      3     8
 #7:      8 FALSE     NA    NA

Data

df1 <- structure(list(amount = c(3L, 4L, 6L, 7L, 3L, 4L, 8L), 
event = c(FALSE, 
FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)), .Names = c("amount", 
"event"), class = "data.frame", row.names = c(NA, -7L))

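【Comments】:

+1 for starting with base R and then showing how to do the same thing with library(data.table). For those of us who are relatively new to R, it is useful to first think about how to accomplish something with base commands and then convert it to data.table or dplyr, since both build on base. My situation is slightly different and I may ask a new question, but the worked example above got me going. Thanks, Chris
@Chris I am updating the answer with some explanation.

【Solution 2】:

You can use this code: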

library(dplyr)

d %>% 
  mutate(before=ifelse(event,lag(amount),NA),
         after =ifelse(event,lead(amount),NA))

#  amount event before after
#1      3 FALSE     NA    NA
#2      4 FALSE     NA    NA
#3      6  TRUE      4     7
#4      7 FALSE     NA    NA
#5      3 FALSE     NA    NA
#6      4  TRUE      3     8
#7      8 FALSE     NA    NA

where d is your sample data set:

d <- structure(list(amount = c(3, 4, 6, 7, 3, 4, 8), event = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)), .Names = c("amount", "event"), row.names = c(NA, -7L), class = "data.frame")

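【Comments】:

While this answer is nicely declarative, I could not quite get it to work because: 1) leads is not defined, and 2) using lag(amount, 1) gives the same result as @987654326@.
1) It should be lead, not leads. 2) I don't see a problem here.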
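One possible explanation for point 2, offered as an assumption because the comment's inline code is missing: if dplyr is not attached, or its `lag` is masked, a call to `lag()` resolves to `stats::lag()`, which only shifts the time-series attribute and leaves the values where they are. The sketch below shows that behaviour next to a hypothetical base R helper, `lag_values`, that shifts the values themselves; the helper is illustrative and not part of dplyr or of the answer.

```r
amount <- c(3, 4, 6, 7, 3, 4, 8)

# stats::lag() adds/shifts the 'tsp' attribute; the values stay in place
base_lag <- stats::lag(amount, 1)
as.vector(base_lag)

# Hypothetical helper: shift values down by n positions, padding with NA
lag_values <- function(x, n = 1L) c(rep(NA, n), head(x, -n))
lag_values(amount)
```

Calling `dplyr::lag(amount)` with an explicit namespace avoids the ambiguity when dplyr is installed.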

【Solution 3】:

Data

df1 <- structure(list(smp = 1:17, x = c(609, 609, 609, 625, 625, 608, 608, 608, 608, 608, 608, 608, 630, 631, 605, 603, 602), y = c(449, 446, 446, 460, 455, 445, 445, 445, 445, 445, 445, 445, 459, 459, 446, 448, 452), blink = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)), .Names = c("smp", "x", "y", "blink"), class = "data.frame", row.names = c(NA, -17L))

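In this data instance, which has runs of several consecutive TRUE values, a different approach to indexing may be needed to actually get the values before and after the condition of interest, because the base approach above will return values from within the condition of interest.

Suppose you need SpatialPoints before and after the condition, and then want to compare the distance from the point before to a given point with the distance from the point after. In that case you want the point (just) before and the point (just) after the condition, and probably not the points in between. Similar to akrun's answer above, this suggests adjusting the index on both the left-hand side (LHS) and the right-hand side (RHS). Adjusting the index on both sides provides the opportunity for a second logical test on the "outsideness" (before or after) of the condition of interest, which the approaches above do not handle when an F is followed by several Ts and then an F, i.e. F, T, T, T, F, F.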

head(df1, n = 17)
#   smp   x   y blink
#1    1 609 449 FALSE
#2    2 609 446 FALSE
#3    3 609 446  TRUE
#4    4 625 460 FALSE
#5    5 625 455 FALSE
#6    6 608 445  TRUE
#7    7 608 445  TRUE
#8    8 608 445 FALSE
#9    9 608 445 FALSE
#10  10 608 445  TRUE
#11  11 608 445  TRUE
#12  12 608 445  TRUE
#13  13 630 459 FALSE
#14  14 631 459 FALSE
#15  15 605 446  TRUE
#16  16 603 448  TRUE
#17  17 602 452 FALSE

df1[c('pre_x', 'pre_y', 'post_x', 'post_y')] <- NA

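In this case pre_x/pre_y and post_x/post_y will eventually be combined with cbind into coordinates and then turned into SpatialPoints; however, that happens after working out what counts as before and after. Your use case may differ, but the logic should hold.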

indx_1 <- which(df1$blink)

indx_1
# [1]  3  6  7 10 11 12 15 16

Then use indx_1 to fill pre_x, pre_y, post_x, and post_y:

# Write x/y into the row just before and the row just after each TRUE
df1$pre_x[indx_1 - 1] <- df1$x[indx_1 - 1]
df1$pre_y[indx_1 - 1] <- df1$y[indx_1 - 1]
df1$post_x[indx_1 + 1] <- df1$x[indx_1 + 1]
df1$post_y[indx_1 + 1] <- df1$y[indx_1 + 1]

head(df1, n = 17)
#   smp   x   y blink pre_x pre_y post_x post_y
#1    1 609 449 FALSE    NA    NA     NA     NA
#2    2 609 446 FALSE   609   446     NA     NA
#3    3 609 446  TRUE    NA    NA     NA     NA
#4    4 625 460 FALSE    NA    NA    625    460
#5    5 625 455 FALSE   625   455     NA     NA
#6    6 608 445  TRUE   608   445     NA     NA
#7    7 608 445  TRUE    NA    NA    608    445
#8    8 608 445 FALSE    NA    NA    608    445
#9    9 608 445 FALSE   608   445     NA     NA
#10  10 608 445  TRUE   608   445     NA     NA
#11  11 608 445  TRUE   608   445    608    445
#12  12 608 445  TRUE    NA    NA    608    445
#13  13 630 459 FALSE    NA    NA    630    459
#14  14 631 459 FALSE   631   459     NA     NA
#15  15 605 446  TRUE   605   446     NA     NA
#16  16 603 448  TRUE    NA    NA    603    448
#17  17 602 452 FALSE    NA    NA    602    452

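The desired values are now written outside the condition of interest and report the before and after values reliably. In addition, the pre index (indx_2) and the post index (indx_3) can be used to select rows for further processing, in my case making coordinates for SpatialPoints.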

indx_2 <- which(!df1$blink & !is.na(df1$pre_x))

indx_3 <- which(!df1$blink & !is.na(df1$post_x))

coords_pre <- cbind(x = df1$pre_x[indx_2], y = df1$pre_y[indx_2])

coords_post <- cbind( x = df1$post_x[indx_3], y = df1$post_y[indx_3])

library(sp)
pre_blink_sp <- SpatialPoints(coords_pre)
summary(pre_blink_sp)
# Object of class SpatialPoints
# Coordinates:
#   min max
# x 608 631
# y 445 459
# Is projected: NA
# proj4string : [NA]
# Number of points: 4
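The rows selected by indx_2 and indx_3 can also be found directly from the boundaries of each run of TRUE values, without the intermediate NA columns. This is a sketch of an alternative, not the method used above; it rebuilds the blink column from the data so that it runs on its own, and the names `run_start`, `run_end`, `pre_rows`, and `post_rows` are illustrative.

```r
# blink column from the data above
blink <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,
           TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
n <- length(blink)

prev_blink <- c(FALSE, blink[-n])  # blink value of the previous row
next_blink <- c(blink[-1], FALSE)  # blink value of the next row

run_start <- which(blink & !prev_blink)  # first TRUE of each run
run_end   <- which(blink & !next_blink)  # last TRUE of each run

pre_rows  <- run_start - 1  # row just before each run
post_rows <- run_end + 1    # row just after each run

# Drop positions that fall off either end of the data
pre_rows  <- pre_rows[pre_rows >= 1]
post_rows <- post_rows[post_rows <= n]
```

With these indices, `cbind(x = df1$x[pre_rows], y = df1$y[pre_rows])` should give the same four pre-blink points as coords_pre.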

Having worked out how to do this in base, tedious as it is, I am now trying to figure out how to accomplish the same in data.table. Would setkey() on df1$smp be the way to go?

【Comments】: