【Question title】: R: How can I find the most recent row with a certain value
【Posted】: 2021-12-01 12:59:33
【Question】:

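Good evening,

I have a very large data set in R, and I'm trying to find the best way to loop through it to work out a few things. Think of the data as historical employee working hours. It looks like this: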

rawTable:

Department      Name      Date         Hours

Engineering     Mary      2021-01-01   8
Engineering     Mary      2021-01-02   8
Engineering     Mary      2021-01-03   0
Engineering     Mary      2021-01-04   6
Sales           Barry     2021-01-01   0
Sales           Barry     2021-01-02   12
Sales           Barry     2021-01-03   12
Sales           Barry     2021-01-04   12    

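I have about 3,200 people on my list, with one row for every day of the year, so the table is obviously large.

I need to add two columns to the table:

The first is LDO, which shows (for each day) the person's last day off.

The second is WSH, which shows how many hours the person has worked since their last day off. It would look like this: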

rawTable:

Department      Name      Date         Hours  LDO          WSH

Engineering     Mary      2021-01-01   8      2020-12-31   8
Engineering     Mary      2021-01-02   8      2020-12-31   16
Engineering     Mary      2021-01-03   0      2021-01-03   0
Engineering     Mary      2021-01-04   6      2021-01-03   6
Sales           Barry     2021-01-01   0      2021-01-01   0
Sales           Barry     2021-01-02   12     2021-01-01   12
Sales           Barry     2021-01-03   12     2021-01-01   24
Sales           Barry     2021-01-04   12     2021-01-01   36

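I tried using a for loop to apply the logic row by row. For each row, if Hours is zero, then LDO = Date and WSH = 0. Otherwise, LDO = the previous row's LDO and WSH = the previous row's WSH + Hours. With a data set this size, it takes forever and a half to run.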
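The row-by-row rule described above can be sketched as follows. This is only an illustration, not the asker's actual loop (which was not posted). It assumes the rows are sorted by person and then by date, and that a person with no earlier day off gets the day before their first record as LDO, as in the expected output. The helper name `first_row` is made up for this sketch.

```r
# Sample rows from the question (only Name, Date and Hours are needed here)
rawTable <- data.frame(
  Name  = rep(c("Mary", "Barry"), each = 4),
  Date  = as.Date(rep(c("2021-01-01", "2021-01-02",
                        "2021-01-03", "2021-01-04"), 2)),
  Hours = c(8, 8, 0, 6, 0, 12, 12, 12)
)

# Preallocate the result vectors instead of growing them inside the loop
n   <- nrow(rawTable)
LDO <- rep(as.Date(NA), n)
WSH <- numeric(n)

for (i in seq_len(n)) {
  # A new person starts whenever the name differs from the previous row
  first_row <- i == 1 || rawTable$Name[i] != rawTable$Name[i - 1]
  if (rawTable$Hours[i] == 0) {
    # Day off: it becomes the last day off and the running total resets
    LDO[i] <- rawTable$Date[i]
    WSH[i] <- 0
  } else if (first_row) {
    # Assumption: no earlier day off is known, so use the day before
    LDO[i] <- rawTable$Date[i] - 1
    WSH[i] <- rawTable$Hours[i]
  } else {
    # Working day: carry the previous values forward
    LDO[i] <- LDO[i - 1]
    WSH[i] <- WSH[i - 1] + rawTable$Hours[i]
  }
}

rawTable$LDO <- LDO
rawTable$WSH <- WSH
```

Writing into plain vectors and attaching them to the data frame once at the end usually costs less than updating the data frame on every iteration, though a loop over more than a million rows may still be slow.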

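Next I created a function that, given a row, uses a copy of the big list and a "which" statement to tell me the row number of the last day that person worked 0 hours on or before the row's date. That also took a very long time, and I haven't even gotten to the WSH part. It looks like this: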

rawLU <- rawTable

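# Given one row x (a character vector, because apply() converts the data
# frame to a character matrix), return the row number of that person's most
# recent zero-hour day on or before the row's date, or 0 if there is none.
LDO = function(x) {
  max(c(0, which((rawLU$Name == x["Name"]) &
                   (rawLU$Hours == 0) & (rawLU$Date <= x["Date"])
  )))
}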

LastOff<-apply(rawTable,1,LDO)

I know there's an easier way to do this, but I can't seem to figure it out.

Can anyone help? Thanks in advance!

Mike

【Comments】:

    Tags: r for-loop apply


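    【Solution 1】:

    Here is a possible solution with dplyr:

    Where Hours == 0, take the value of Date, then use fill to carry the previous day-off date down to the other rows. Rows before a person's first day off get the day before their earliest date. WSH can then be computed with cumsum within each LDO group.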

    library(dplyr)
    library(tidyr)
    
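    rawTable %>%
      mutate(Date = as.Date(Date)) %>%
      group_by(Department, Name) %>%
      # Mark days off with their own date; leave working days as NA
      mutate(LDO = if_else(Hours == 0, Date, as.Date(NA))) %>%
      # Carry the last day off down within each person (fill() is from tidyr)
      fill(LDO) %>%
      # Before the first day off, fall back to the day before the first record
      mutate(LDO = if_else(is.na(LDO), min(Date) - 1, LDO)) %>%
      # Restart the running total at each day off
      group_by(LDO, .add = TRUE) %>%
      mutate(WSH = cumsum(Hours)) %>%
      ungroup()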
    
    #  Department  Name  Date       Hours LDO          WSH
    #  <chr>       <chr> <date>     <int> <date>     <int>
    #1 Engineering Mary  2021-01-01     8 2020-12-31     8
    #2 Engineering Mary  2021-01-02     8 2020-12-31    16
    #3 Engineering Mary  2021-01-03     0 2021-01-03     0
    #4 Engineering Mary  2021-01-04     6 2021-01-03     6
    #5 Sales       Barry 2021-01-01     0 2021-01-01     0
    #6 Sales       Barry 2021-01-02    12 2021-01-01    12
    #7 Sales       Barry 2021-01-03    12 2021-01-01    24
    #8 Sales       Barry 2021-01-04    12 2021-01-01    36
    

    Data

    rawTable <- structure(list(Department = c("Engineering", "Engineering", "Engineering", 
    "Engineering", "Sales", "Sales", "Sales", "Sales"), Name = c("Mary", 
    "Mary", "Mary", "Mary", "Barry", "Barry", "Barry", "Barry"), 
        Date = c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", 
        "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"), 
        Hours = c(8L, 8L, 0L, 6L, 0L, 12L, 12L, 12L)), class = "data.frame", row.names = c(NA, -8L))
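For readers without dplyr, the same idea can be sketched in base R with `ave()`. This is not part of the answer above, only one possible equivalent. It assumes the rows are sorted by person and then by date, and the names `person`, `run`, `key`, `first_date` and `first_hours` are made up for the sketch.

```r
# Same sample data as above, rebuilt here so the sketch runs on its own
rawTable <- data.frame(
  Department = rep(c("Engineering", "Sales"), each = 4),
  Name       = rep(c("Mary", "Barry"), each = 4),
  Date       = rep(c("2021-01-01", "2021-01-02",
                     "2021-01-03", "2021-01-04"), 2),
  Hours      = c(8L, 8L, 0L, 6L, 0L, 12L, 12L, 12L)
)
rawTable$Date <- as.Date(rawTable$Date)

# One key per person
person <- paste(rawTable$Department, rawTable$Name, sep = "|")

# Run counter within each person: goes up by one at every zero-hour day
run <- ave(as.integer(rawTable$Hours == 0), person, FUN = cumsum)

# One key per person and run
key <- paste(person, run, sep = "|")

# First date and first Hours value of each run
first_date  <- ave(as.numeric(rawTable$Date), key, FUN = function(d) d[1])
first_hours <- ave(rawTable$Hours, key, FUN = function(h) h[1])

# A run that starts on a working day has no day off yet: use the day before
rawTable$LDO <- as.Date(first_date - (first_hours > 0), origin = "1970-01-01")

# Running total of hours within each run
rawTable$WSH <- ave(rawTable$Hours, key, FUN = cumsum)
```

Because every step is a vectorised call over whole columns, this avoids the per-row function calls of the `apply` attempt in the question.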
    

    【Comments】:

    • Works great, thank you!
    【Solution 2】:
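    library(dplyr)

    # df1 is assumed to be the same sample data as rawTable in Solution 1
    df1 <- rawTable

    df1 %>%
       group_by(Department, Name, grp = cumsum(Hours == 0)) %>%  # new run at each day off
       mutate(Date = as.Date(Date),
          LDO = first(Date) - (first(Hours) > 0),  # no day off yet: use the day before
          WHS = cumsum(Hours))                     # the question calls this column WSH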
    
    # A tibble: 8 x 7
    # Groups:   Department, Name, grp [3]
      Department  Name  Date       Hours   grp LDO          WHS
      <chr>       <chr> <date>     <int> <int> <date>     <int>
    1 Engineering Mary  2021-01-01     8     0 2020-12-31     8
    2 Engineering Mary  2021-01-02     8     0 2020-12-31    16
    3 Engineering Mary  2021-01-03     0     1 2021-01-03     0
    4 Engineering Mary  2021-01-04     6     1 2021-01-03     6
    5 Sales       Barry 2021-01-01     0     2 2021-01-01     0
    6 Sales       Barry 2021-01-02    12     2 2021-01-01    12
    7 Sales       Barry 2021-01-03    12     2 2021-01-01    24
    8 Sales       Barry 2021-01-04    12     2 2021-01-01    36
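Solution 2 comes without an explanation, so here is a small sketch of the grouping trick it appears to rely on: `cumsum(Hours == 0)` increases by one at every zero-hour row, which labels each stretch of rows that begins with a day off. The vector below is just the Hours column of the sample data. The variable names are made up for the illustration.

```r
# Hours column of the sample data, in row order
Hours <- c(8L, 8L, 0L, 6L, 0L, 12L, 12L, 12L)

# TRUE counts as 1, so the counter steps up at each day off
grp <- cumsum(Hours == 0)
grp
# 0 0 1 1 2 2 2 2

# Summing within each label restarts the running total at each day off
wsh <- unlist(lapply(split(Hours, grp), cumsum), use.names = FALSE)
wsh
# 8 16 0 6 0 12 24 36
```

In the output above, grp keeps counting across people (Barry's rows are labelled 2), so grouping by Department and Name as well is what keeps one person's run from merging into the next. This also relies on the rows being sorted by person and then by date.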
    
