[Question title]: Compare different columns in separate rows in R
[Posted]: 2014-01-06 19:52:15
[Question description]:

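I want to check whether there are any gaps in individuals' eligibility status. I define a gap as a date_of_claim that occurs more than 30 days after the last elig_end_date. So what I want to do is check that each date_of_claim is no later than the elig_end_date + 30 days in the immediately preceding row. Ideally, I would like an indicator, 0 for no gap and 1 for a gap, per person, that also shows where the gap occurs. Here is a sample df with the desired result built in as the "gaps" column.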

 names date_of_claim elig_end_date obs gaps
1    tom    2010-01-01    2010-07-01   1    NA
2    tom    2010-05-04    2010-07-01   1    0
3    tom    2010-06-01    2014-01-01   2    0
4    tom    2010-10-10    2014-01-01   2    0
5   mary    2010-03-01    2014-06-14   1    NA
6   mary    2010-05-01    2014-06-14   1    0
7   mary    2010-08-01    2014-06-14   1    0
8   mary    2010-11-01    2014-06-14   1    0
9   mary    2011-01-01    2014-06-14   1    0
10  john    2010-03-27    2011-03-01   1    NA
11  john    2010-07-01    2011-03-01   1    0
12  john    2010-11-01    2011-03-01   1    0
13  john    2011-02-01    2011-03-01   1    0
14   sue    2010-02-01    2010-04-30   1    NA
15   sue    2010-02-27    2010-04-30   1    0
16   sue    2010-03-13    2010-05-31   2    0
17   sue    2010-04-27    2010-06-30   3    0
18   sue    2010-04-27    2010-06-30   3    0
19   sue    2010-05-06    2010-08-31   4    0
20   sue    2010-06-08    2010-09-30   5    0
21  mike    2010-05-01    2010-07-30   1    NA
22  mike    2010-06-01    2010-07-30   1    0
23  mike    2010-11-12    2011-07-30   2    1

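I found this post very helpful: "How can I compare a value in a column to the previous one using R?", but I don't think I can use a loop, because my df has 4 million rows and I have already had a hard time running loops on it.

To that end, I think the code I need is something like this: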

df$gaps<-ifelse(df$date_of_claim>=df$elig_end_date+30,1,0)  ## this doesn't use the preceding row.

I made a clumsy attempt with this:

df$gaps<-df$date_of_claim>=df$elig_end_date[-1,]

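But I get an error message saying that I have an incorrect number of dimensions.

Any help is much appreciated! Thanks.

[Question discussion]:

    Tags: r rows col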


    [Solution 1]:

    With four million observations, I would use data.table:

    DF <- read.table(text="names date_of_claim elig_end_date obs gaps
    1    tom    2010-01-01    2010-07-01   1    NA
    2    tom    2010-05-04    2010-07-01   1    0
    3    tom    2010-06-01    2014-01-01   2    0
    4    tom    2010-10-10    2014-01-01   2    0
    5   mary    2010-03-01    2014-06-14   1    NA
    6   mary    2010-05-01    2014-06-14   1    0
    7   mary    2010-08-01    2014-06-14   1    0
    8   mary    2010-11-01    2014-06-14   1    0
    9   mary    2011-01-01    2014-06-14   1    0
    10  john    2010-03-27    2011-03-01   1    NA
    11  john    2010-07-01    2011-03-01   1    0
    12  john    2010-11-01    2011-03-01   1    0
    13  john    2011-02-01    2011-03-01   1    0
    14   sue    2010-02-01    2010-04-30   1    NA
    15   sue    2010-02-27    2010-04-30   1    0
    16   sue    2010-03-13    2010-05-31   2    0
    17   sue    2010-04-27    2010-06-30   3    0
    18   sue    2010-04-27    2010-06-30   3    0
    19   sue    2010-05-06    2010-08-31   4    0
    20   sue    2010-06-08    2010-09-30   5    0
    21  mike    2010-05-01    2010-07-30   1    NA
    22  mike    2010-06-01    2010-07-30   1    0
    23  mike    2010-11-12    2011-07-30   2    1", header=TRUE)
    
    library(data.table)
    DT <- data.table(DF)
    
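    # Convert both date columns from text to Date so that date arithmetic works
    DT[, c("date_of_claim", "elig_end_date") := list(as.Date(date_of_claim), as.Date(elig_end_date))]
    
    # Per person: compare each claim (from the 2nd row on) with the previous row's
    # elig_end_date + 30 days; the first row has no predecessor, so it gets NA
    DT[, gaps2 := c(NA, date_of_claim[-1] > head(elig_end_date, -1) + 30), by = names]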
    
    #    names date_of_claim elig_end_date obs gaps gaps2
    # 1:   tom    2010-01-01    2010-07-01   1   NA    NA
    # 2:   tom    2010-05-04    2010-07-01   1    0 FALSE
    # 3:   tom    2010-06-01    2014-01-01   2    0 FALSE
    # 4:   tom    2010-10-10    2014-01-01   2    0 FALSE
    # 5:  mary    2010-03-01    2014-06-14   1   NA    NA
    # 6:  mary    2010-05-01    2014-06-14   1    0 FALSE
    # 7:  mary    2010-08-01    2014-06-14   1    0 FALSE
    # 8:  mary    2010-11-01    2014-06-14   1    0 FALSE
    # 9:  mary    2011-01-01    2014-06-14   1    0 FALSE
    # 10:  john    2010-03-27    2011-03-01   1   NA    NA
    # 11:  john    2010-07-01    2011-03-01   1    0 FALSE
    # 12:  john    2010-11-01    2011-03-01   1    0 FALSE
    # 13:  john    2011-02-01    2011-03-01   1    0 FALSE
    # 14:   sue    2010-02-01    2010-04-30   1   NA    NA
    # 15:   sue    2010-02-27    2010-04-30   1    0 FALSE
    # 16:   sue    2010-03-13    2010-05-31   2    0 FALSE
    # 17:   sue    2010-04-27    2010-06-30   3    0 FALSE
    # 18:   sue    2010-04-27    2010-06-30   3    0 FALSE
    # 19:   sue    2010-05-06    2010-08-31   4    0 FALSE
    # 20:   sue    2010-06-08    2010-09-30   5    0 FALSE
    # 21:  mike    2010-05-01    2010-07-30   1   NA    NA
    # 22:  mike    2010-06-01    2010-07-30   1    0 FALSE
    # 23:  mike    2010-11-12    2011-07-30   2    1  TRUE
    #     names date_of_claim elig_end_date obs gaps gaps2
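
The result above is a logical column, while the question asks for a 0/1 indicator and a per-person view. The following is only a sketch of one way to get there, not part of the original answer. It assumes a data.table version that provides `shift()`, and it uses a small made-up table; `per_person` and `any_gap` are names introduced here for illustration.

```r
library(data.table)

# Toy data (hypothetical subset, for illustration only)
DT <- data.table(
  names         = c("sue", "sue", "mike", "mike", "mike"),
  date_of_claim = as.Date(c("2010-02-01", "2010-02-27",
                            "2010-05-01", "2010-06-01", "2010-11-12")),
  elig_end_date = as.Date(c("2010-04-30", "2010-04-30",
                            "2010-07-30", "2010-07-30", "2011-07-30"))
)

# shift() lags elig_end_date by one row within each person;
# the first row of each person has no predecessor and becomes NA.
# as.integer() turns FALSE/TRUE into 0/1 and keeps NA as NA.
DT[, gaps2 := as.integer(date_of_claim > shift(elig_end_date) + 30), by = names]

# One row per person: does at least one gap occur?
per_person <- DT[, .(any_gap = any(gaps2 == 1L, na.rm = TRUE)), by = names]

print(DT)
print(per_person)
```

Rows where `gaps2` equals 1 mark where a gap occurs, and `per_person` flags each person who has at least one gap.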
    

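As a side note on the error in the question: `df$elig_end_date` is a plain vector, so an index with a comma such as `[-1,]` asks for two dimensions and fails. Below is a minimal base R sketch of shifting a vector by one position, using made-up values for a single person. It ignores grouping by person, which the data.table code above handles with `by`.

```r
# Toy data for one person (hypothetical values, for illustration only)
elig_end <- as.Date(c("2010-07-30", "2010-07-30", "2011-07-30"))
claim    <- as.Date(c("2010-05-01", "2010-06-01", "2010-11-12"))

# A vector has a single dimension, so index it without a comma.
# head(x, -1) drops the last element; putting NA in front lines up
# each claim with the end date from the previous row.
prev_end <- c(as.Date(NA), head(elig_end, -1))

# 1 = claim falls more than 30 days after the previous end date
gap <- as.integer(claim > prev_end + 30)
print(gap)
```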