【Question title】: Conditional Filter for 2 Variables with dplyr filter
【Posted】: 2017-11-09 14:25:16
【Question】:

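I am working with some time-series (pupil dilation) data and would like to filter a different time range (Time) depending on the level of a factor variable (SOA).

Sample data: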

library(dplyr)        

Data <- structure(list(Subject = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 
        2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 
        2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("12", "14", 
        "15", "16", "18", "20", "21", "22", "23", "28", "29", "30", "33", 
        "36", "37", "38", "40", "42", "43", "44"), class = "factor"), 
        SOA = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
        2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
        1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Long SOA", "Short SOA"
        ), class = "factor"), Time = c(-66.68, -66.68, -66.68, -66.68, 
        -33.34, -33.34, -33.34, -33.34, 0, 0, 0, 0, 33.34, 33.34, 
        33.34, 33.34, 66.68, 66.68, 66.68, 66.68, 100.02, 100.02, 
        100.02, 100.02, 133.36, 133.36, 133.36, 133.36, 166.7, 166.7, 
        166.7, 166.7), Pcent_Chng = c(0.14391, 0.076759, -0.022377, 
        0.038111, 0.21093, 0.11448, -0.0047064, 0.078232, 0.27924, 
        0.1527, -0.0085276, 0.12385, 0.38328, 0.21299, 0.01988, 0.15626, 
        0.47471, 0.25357, 0.050318, 0.20517, 0.58012, 0.2888, 0.080629, 
        0.20616, 0.65861, 0.33622, 0.12892, 0.20832, 0.75277, 0.38181, 
        0.17921, 0.21789)), class = "data.frame", row.names = c(NA, 
        -32L), .Names = c("Subject", "SOA", "Time", "Pcent_Chng"))

I want to average over a different Time window for SOA = "Short" and for SOA = "Long".

I have tried the following, with filter placed both before and after group_by:

Data %>% 
filter(Time[SOA = "Short SOA"] >= 0 & Time[SOA = "Short SOA"] <= 100,
       Time[SOA = "Long SOA"] >= 0 & Time[SOA = "Long SOA"] <= 150) %>%
group_by(Subject,SOA) %>%
summarize(Word_Avg_Pcent = mean(Pcent_Chng,na.rm=TRUE))

Data %>% 
group_by(Subject,SOA) %>%
filter(Time[SOA = "Short SOA"] >= 0 & Time[SOA = "Short SOA"] <= 100,
       Time[SOA = "Long SOA"] >= 0 & Time[SOA = "Long SOA"] <= 150) %>%
summarize(Word_Avg_Pcent = mean(Pcent_Chng,na.rm=TRUE))

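Both result in an empty data frame; the columns are there, but there is no data. If I leave out the second filter condition, I get a full data frame.

Is there a way to do what I want with pipes and filter inside a dplyr chain?

【Comments】:

  • I see you use a comma in your second filter. That is interpreted as &. Any chance you want "or"?
  • Please make a minimal example. If your question is about filtering, we don't need to see several steps of formatting your variables. Some guidance: stackoverflow.com/questions/5963269/… Also, it is better to make it properly reproducible rather than relying on an external link to your own site or pastebin that could break at any time.
  • From the documentation, it looks like filter takes multiple arguments separated by commas. Is my understanding wrong?
  • The comma means that filter(x, y) is equivalent to filter(x & y), but you want filter(x | y).
  • Yes, but the comma is interpreted as &. If you want either condition to be true, use | instead of a comma. See the last example in ?dplyr::filter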

Tags: r filter dplyr


【Solution 1】:

As mentioned in the comments, you need to OR (|) together the two AND (&) conditions you are looking for.

Your filter looks like this:

filter(Time[SOA = "Short SOA"] >= 0 & Time[SOA = "Short SOA"] <= 1200, 
       Time[SOA = "Long SOA"] >= 0 & Time[SOA = "Long SOA"] <= 3000)

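It contains statements that are not logical tests (for example SOA = "Short SOA", which uses a single = rather than the comparison ==). What you need to do is be more explicit.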
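To illustrate why that filter comes back empty, here is a minimal sketch on a made-up data frame (the name `toy` and its values are assumptions for illustration only). As far as I understand R's `[` operator, it ignores argument names, so `Time[SOA = "Short SOA"]` is treated like `Time["Short SOA"]`. On an unnamed numeric vector that lookup yields NA, and filter() only keeps rows where the condition is TRUE.

```r
library(dplyr)

# Hypothetical toy data, not the asker's real data set
toy <- data.frame(
  SOA  = c("Short SOA", "Long SOA", "Short SOA", "Long SOA"),
  Time = c(0, 0, 66.68, 66.68)
)

# The name `SOA =` is ignored by `[`, so this looks up an element
# called "Short SOA" in an unnamed vector and finds nothing
lookup <- toy$Time[SOA = "Short SOA"]
is.na(lookup)

# The same expression inside filter(): the condition is NA for
# every row, so no rows are kept
empty <- toy %>%
  filter(Time[SOA = "Short SOA"] >= 0 & Time[SOA = "Short SOA"] <= 100)
nrow(empty)
```

So the empty result is less about the comma and more about the condition never evaluating to TRUE in the first place.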

You want to keep the rows where SOA equals "Short SOA" and Time is between 0 and 1200, or where SOA equals "Long SOA" and Time is between 0 and 3000.

(SOA == "Short SOA" and 0 <= Time <= 1200) OR (SOA == "Long SOA" and 0 <= Time <= 3000)
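A small sketch of the difference between the comma (AND) and | (OR), again on made-up data (the `toy` data frame and the 100/150 cut-offs are assumptions chosen to keep the example short). Because no row can be both "Short SOA" and "Long SOA", joining the two conditions with a comma can never match anything:

```r
library(dplyr)

# Hypothetical toy data: two time points for each SOA level
toy <- data.frame(
  SOA  = c("Short SOA", "Long SOA", "Short SOA", "Long SOA"),
  Time = c(50, 50, 140, 140)
)

# Comma between conditions behaves like &: a row would have to be
# Short SOA and Long SOA at the same time
and_rows <- toy %>%
  filter(SOA == "Short SOA" & Time >= 0 & Time <= 100,
         SOA == "Long SOA"  & Time >= 0 & Time <= 150)

# | keeps a row if either per-level condition holds
or_rows <- toy %>%
  filter((SOA == "Short SOA" & Time >= 0 & Time <= 100) |
         (SOA == "Long SOA"  & Time >= 0 & Time <= 150))

nrow(and_rows)  # 0
nrow(or_rows)   # 3 (Short at 50, Long at 50, Long at 140)
```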

You can use between from dplyr for the Time conditions.
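For reference, a quick sketch of what between() does: between(x, left, right) is shorthand for x >= left & x <= right, with both ends included. The vector `times` below is made up for illustration.

```r
library(dplyr)

# Hypothetical time values around a 0 to 100 window
times <- c(-1, 0, 50, 100, 101)

inside <- between(times, 0, 100)
inside  # FALSE TRUE TRUE TRUE FALSE

# Same result written out with explicit comparisons
manual <- times >= 0 & times <= 100
identical(inside, manual)
```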

The full implementation is:

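library(tidyverse)

# Load the full data set from the external link
Data <- eval(parse(file("http://pastebin.com/raw.php?i=VTWCVgCA")))

Data %>% 
  # Reshape the sample columns X.8:X100 from wide to long
  gather(Sample, Prop_Chng, X.8:X100) %>%
  # Turn column names such as "X.8" into the numeric sample index -8
  mutate(Sample = gsub("[.]","-", Sample)) %>%
  mutate(Sample = as.numeric(gsub("X","", Sample))) %>%
  # Convert the sample index to Time and the proportion to a percentage
  mutate(Time = Sample*33.34) %>%
  mutate(Pcent_Chng = Prop_Chng*100) %>%
  filter(Type == "Word") %>% 
  # Keep a different Time window for each SOA level
  filter((SOA == "Short SOA" & between(Time, 0, 1200)) |  (SOA == "Long SOA" & between(Time, 0, 3000))) %>% 
  group_by(Subject, NsCond,Close,SOA) %>%
  summarize(Word_Avg_Pcent = mean(Pcent_Chng,na.rm=TRUE))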
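Since the pipeline above depends on an external link, here is a self-contained sketch of the same filter-then-average pattern. Everything in it is an assumption for illustration: the `toy` data frame is a made-up miniature with one subject, and the windows (0 to 100 for Short SOA, 0 to 150 for Long SOA) are the ones from the question rather than the 1200/3000 used above.

```r
library(dplyr)

# Hypothetical miniature data: one subject, four time points per SOA level
toy <- data.frame(
  Subject    = factor("12"),
  SOA        = rep(c("Short SOA", "Long SOA"), times = 4),
  Time       = rep(c(-33.34, 0, 100.02, 133.36), each = 2),
  Pcent_Chng = c(0.21, 0.11, 0.28, 0.15, 0.58, 0.29, 0.66, 0.34)
)

result <- toy %>%
  # Different Time window per SOA level, combined with |
  filter((SOA == "Short SOA" & between(Time, 0, 100)) |
         (SOA == "Long SOA"  & between(Time, 0, 150))) %>%
  group_by(Subject, SOA) %>%
  summarize(Word_Avg_Pcent = mean(Pcent_Chng, na.rm = TRUE),
            .groups = "drop")

result
```

With this toy data only the Time = 0 row survives for Short SOA (100.02 is just outside the window), so its mean is 0.28, while Long SOA averages the rows at 0, 100.02 and 133.36, giving about 0.26.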

【Comments】:

  • Perfect! Thank you! I didn't know about between.