【Question title】: R: evaluating multiple conditionals multiple times
【Posted】: 2017-11-08 02:32:30
【Question description】:

I have data like this:

df = as.data.frame(cbind(
  event1 = c(88.76,96.04,99.60,88.76,99.60,34.04,96.04,87.03,87.44,87.44),
  time1 = c(0.100,0.033,0.000,0.117,0.000,0.000,0.050,0.500,0.133,0.117),
  event2 = c(NA,99.60,NA,34.04,99.62,88.76,87.44,87.41,88.76,88.76),
  time2 = c(NA,0.050,NA,0.100,0.017,0.083,0.200,0.500,0.133,0.050),
  event100 = c(NA,89.52,NA,34.04,93.93,34.02,88.76,88.01,88.01,87.41),
  time100 = c(NA,0.050,NA,0.100,0.033,0.117,0.300,0.500,0.233,0.300),
  event_88.76_within_0.1 = rep(0,10)
))

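where event1 is the code of the first event that occurred for a subject and time1 is the time until event1 occurred. Each subject has up to 100 events, each with its own time to event.

I am trying to create a variable (event_88.76_within_0.1) that indicates whether event 88.76 occurred within 0.1 minutes. That is, it equals 1 if any of a subject's events equals 88.76 and the corresponding event time is less than or equal to 0.1.

Using this nested for loop: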

for (r in 1:nrow(df)) {            # for each subject
  for (c in seq(1, 5, by = 2)) {   # for each event-code column (1, 3, 5)
    # if the event code is not missing, it is the needed event code, and
    # the next column over (the corresponding time to event) is at most 0.1
    if (!is.na(df[r, c]) && df[r, c] == 88.76 && df[r, c + 1] <= 0.1) {
      df[r, "event_88.76_within_0.1"] = 1
    }
  }
}

I get this, which is what I want:

      event1 time1 event2 time2 event100 time100 event_88.76_within_0.1
 [1,]  88.76 0.100     NA    NA       NA      NA                      1
 [2,]  96.04 0.033  99.60 0.050    89.52   0.050                      0
 [3,]  99.60 0.000     NA    NA       NA      NA                      0
 [4,]  88.76 0.117  34.04 0.100    34.04   0.100                      0
 [5,]  99.60 0.000  99.62 0.017    93.93   0.033                      0
 [6,]  34.04 0.000  88.76 0.083    34.02   0.117                      1
 [7,]  96.04 0.050  87.44 0.200    88.76   0.300                      0
 [8,]  87.03 0.500  87.41 0.500    88.01   0.500                      0
 [9,]  87.44 0.133  88.76 0.133    88.01   0.233                      0
[10,]  87.44 0.117  88.76 0.050    87.41   0.300                      1

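But the data set has thousands of subjects (each with up to 100 possible events), so the nested for loop takes a while to run.

I would like to vectorize the loop above into something like this: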

df$event_88.76_within_0.1 = 0
df$event_88.76_within_0.1[df[ "events that equal 88.76 and occurred within 0.1" ]]=1

But I have had no luck.

Any help would be greatly appreciated.

【Question discussion】:

    Tags: r if-statement vector


    【Solution 1】:

    You can do this:

    ## Define the names of your events and times columns
    events = paste0("event",c(1,2,100))
    times = paste0("time",c(1,2,100))
    ## Check both conditions and multiply the results (TRUE * TRUE gives 1; anything multiplied by FALSE gives 0),
    ## then cap the per-row count at 1 with pmin() so the result is a 0/1 indicator
    df$event_88.76_within_0.1 = pmin(1, rowSums((df[,events]==88.76)*(df[,times]<=0.1), na.rm=TRUE))
    
       event1 time1 event2 time2 event100 time100 event_88.76_within_0.1
    1   88.76 0.100     NA    NA       NA      NA                      1
    2   96.04 0.033  99.60 0.050    89.52   0.050                      0
    3   99.60 0.000     NA    NA       NA      NA                      0
    4   88.76 0.117  34.04 0.100    34.04   0.100                      0
    5   99.60 0.000  99.62 0.017    93.93   0.033                      0
    6   34.04 0.000  88.76 0.083    34.02   0.117                      1
    7   96.04 0.050  87.44 0.200    88.76   0.300                      0
    8   87.03 0.500  87.41 0.500    88.01   0.500                      0
    9   87.44 0.133  88.76 0.133    88.01   0.233                      0
    10  87.44 0.117  88.76 0.050    87.41   0.300                      1
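
With up to 100 event/time pairs in the real data, typing the suffixes by hand gets tedious. A minimal sketch of one way to derive the column names instead, assuming every event column is named `event<digits>` and has a matching `time<digits>` column as in the question (`event_cols`, `suffix`, `time_cols`, and `hit` are illustrative names, not part of the answer above):

```r
# Toy data with the same layout as the question (three event/time pairs).
df <- data.frame(
  event1   = c(88.76, 96.04, 99.60, 88.76, 99.60, 34.04, 96.04, 87.03, 87.44, 87.44),
  time1    = c(0.100, 0.033, 0.000, 0.117, 0.000, 0.000, 0.050, 0.500, 0.133, 0.117),
  event2   = c(NA, 99.60, NA, 34.04, 99.62, 88.76, 87.44, 87.41, 88.76, 88.76),
  time2    = c(NA, 0.050, NA, 0.100, 0.017, 0.083, 0.200, 0.500, 0.133, 0.050),
  event100 = c(NA, 89.52, NA, 34.04, 93.93, 34.02, 88.76, 88.01, 88.01, 87.41),
  time100  = c(NA, 0.050, NA, 0.100, 0.033, 0.117, 0.300, 0.500, 0.233, 0.300)
)

# Assumption: event columns are named "event<digits>", each paired with "time<digits>".
event_cols <- grep("^event[0-9]+$", names(df), value = TRUE)
suffix     <- sub("^event", "", event_cols)
time_cols  <- paste0("time", suffix)
stopifnot(all(time_cols %in% names(df)))  # fail early if a pair is incomplete

# Element-wise test on the paired columns; NA pairs are dropped by na.rm = TRUE.
hit <- (df[, event_cols] == 88.76) & (df[, time_cols] <= 0.1)
df$event_88.76_within_0.1 <- as.integer(rowSums(hit, na.rm = TRUE) > 0)
```

Building `time_cols` from the suffixes of `event_cols` keeps the two sets of columns in the same order, which the element-wise comparison relies on.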
    

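    【Discussion】:

    • Beautiful! Thanks.
    • Bah! You beat me to it ;)
    • @Lamia, great answer! One small suggestion: if the condition is met n times in a row, you will get n instead of 1. I suggest wrapping the final variable in ifelse, e.g. ifelse(rowSums((df[,events]==88.76)*(df[,times]<=0.1),na.rm=T)>0, 1, 0)
    • @MattTyers Sorry.. :)
    • @YannisVassiliadis That's right. I understood from the OP's description that event 88.76 could occur only once per row, since each event is different. If not, you can include the ifelse you suggest, or pmin(1,rowSums((df[,events]==88.76)*(df[,times]<=0.1),na.rm=T)). I will edit my answer accordingly.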
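
To make the point in the comments above concrete, here is a small sketch on a hypothetical two-row data frame (`toy` and `n_hits` are made-up names) in which the first row meets the condition twice. The bare `rowSums` result is a count, while the three wrappers all collapse it to a 0/1 indicator:

```r
# Hypothetical data: row 1 has event 88.76 within 0.1 twice, row 2 never.
toy <- data.frame(
  event1 = c(88.76, 88.76),
  time1  = c(0.05, 0.20),
  event2 = c(88.76, 12.34),
  time2  = c(0.10, 0.05)
)
events <- c("event1", "event2")
times  <- c("time1", "time2")

# Number of qualifying events per row.
n_hits <- rowSums((toy[, events] == 88.76) * (toy[, times] <= 0.1), na.rm = TRUE)
n_hits                      # values: 2 0

# Three equivalent ways to turn the count into an indicator.
ifelse(n_hits > 0, 1, 0)    # values: 1 0
pmin(1, n_hits)             # values: 1 0
as.integer(n_hits > 0)      # values: 1 0
```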
    【Solution 2】:

    How about this ball of duct tape...

    cond1 <- df[,seq(1,6,by=2)]==88.76
    cond2 <- df[,seq(2,6,by=2)]<=0.1
    ## > 0 rather than == 1, so rows that meet the condition more than once are also kept
    vec <- which(rowSums(cond1 & cond2, na.rm=TRUE) > 0)
    
    df[vec,]
    ##    event1 time1 event2 time2 event100 time100
    ## 1   88.76 0.100     NA    NA       NA      NA 
    ## 6   34.04 0.000  88.76 0.083    34.02   0.117 
    ## 10  87.44 0.117  88.76 0.050    87.41   0.300 
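
The row indices can also be used to fill in the 0/1 column the question asks for: initialise the indicator to 0, then set it to 1 at the matching rows. A minimal sketch on four of the question's rows (subjects 1, 4, 6, and 7), assuming event codes sit in the odd-numbered columns and each time sits in the column immediately to the right (`n_pairs`, `event_idx`, and `time_idx` are illustrative names):

```r
# Four rows taken from the question's data (subjects 1, 4, 6, 7).
df <- data.frame(
  event1   = c(88.76, 88.76, 34.04, 96.04),
  time1    = c(0.100, 0.117, 0.000, 0.050),
  event2   = c(NA, 34.04, 88.76, 87.44),
  time2    = c(NA, 0.100, 0.083, 0.200),
  event100 = c(NA, 34.04, 34.02, 88.76),
  time100  = c(NA, 0.100, 0.117, 0.300)
)

# Assumption: columns alternate event, time, event, time, ...
n_pairs   <- ncol(df) / 2
event_idx <- seq(1, 2 * n_pairs, by = 2)
time_idx  <- event_idx + 1

cond1 <- df[, event_idx] == 88.76
cond2 <- df[, time_idx] <= 0.1
vec   <- which(rowSums(cond1 & cond2, na.rm = TRUE) > 0)

# Start from 0 everywhere, then flag the matching rows.
df$event_88.76_within_0.1 <- 0
df$event_88.76_within_0.1[vec] <- 1
```

Computing `n_pairs` before the indicator column is added matters here, because the position-based indexing would otherwise count the new column as part of a pair.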
    

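    【Discussion】:

    • Also a great answer (I like the option of using column numbers instead of @Lamia's naming approach, which makes it more flexible), but Lamia was a minute faster than you.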