【Question Title】: How do I get the sum of one column based on a TRUE/FALSE in another column
【Posted】: 2019-10-03 08:16:13
【Question】:

I am trying to summarise the data into daily peak and off-peak totals. Certain hours are off-peak.

Date        Time        Value
2019-09-01  00:00:00    0.34
2019-09-01  00:30:00    0.34
2019-09-01  01:00:00    0.34
2019-09-01  01:30:00    0.38
2019-09-01  02:00:00    0.34
2019-09-01  02:30:00    0.34
2019-09-01  03:00:00    0.34
2019-09-01  03:30:00    0.34
2019-09-01  04:00:00    0.34
2019-09-01  04:30:00    0.34
2019-09-01  05:00:00    0.34
2019-09-01  05:30:00    0.34
2019-09-01  06:00:00    0.41
2019-09-01  06:30:00    0.53
2019-09-01  07:00:00    0.56
2019-09-01  07:30:00    0.56
2019-09-01  08:00:00    0.53
2019-09-01  08:30:00    0.66
2019-09-01  09:00:00    1.03
2019-09-01  09:30:00    1.03

I have added a peak TRUE/FALSE column to my data frame using this:

Data$Peak <- Data$Time > "07:00:00" & Data$Time <= "23:00:00" & !grepl("S.+", weekdays(Data$Date))

This almost does what I want. All the values are there, but in one long list.

Day_Summary <- aggregate(Data$Value, by=list(Data$Date, Data$Peak), FUN=sum)

I have also tried summarise and mutate, but did not get what I wanted. Any help would be great.

I would like the data to be displayed like this:

Date, Peak, OffPeak
2019-09-01, 156, 36
2019-09-02, 145, 56
2019-09-03, 180, 0
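Since the question mentions trying summarise and mutate, a minimal dplyr/tidyr sketch of this reshaping looks as follows (the toy data frame below is an editorial stand-in, not the full data set):

```r
library(dplyr)
library(tidyr)

# toy data with a logical Peak column, as built in the question
Data <- data.frame(
  Date  = as.Date(c("2019-09-01", "2019-09-01", "2019-09-02", "2019-09-02")),
  Value = c(0.34, 0.56, 0.41, 1.03),
  Peak  = c(FALSE, TRUE, FALSE, TRUE)
)

Day_Summary <- Data %>%
  mutate(Period = ifelse(Peak, "Peak", "OffPeak")) %>%   # label the TRUE/FALSE
  group_by(Date, Period) %>%
  summarise(Total = sum(Value), .groups = "drop") %>%    # one row per Date/Period
  pivot_wider(names_from = Period, values_from = Total,
              values_fill = 0)                           # Peak and OffPeak columns
```

Dates with no off-peak (or no peak) rows get 0 via `values_fill`, matching the all-peak day in the desired output.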

【Discussion】:

    Tags: r


    【Solution 1】:

    Using data.table, and assuming Mon-Fri 07-23 is peak while the rest of the week is off-peak...

    Sample data

    library(data.table)
    
    #create sample data
    dt <- fread("Date   Time    Value
    2019-09-02  00:00:00    0.34
    2019-09-02  00:30:00    0.34
    2019-09-02  01:00:00    0.34
    2019-09-02  01:30:00    0.38
    2019-09-02  02:00:00    0.34
    2019-09-02  02:30:00    0.34
    2019-09-02  03:00:00    0.34
    2019-09-02  03:30:00    0.34
    2019-09-02  04:00:00    0.34
    2019-09-02  04:30:00    0.34
    2019-09-02  05:00:00    0.34
    2019-09-02  05:30:00    0.34
    2019-09-02  06:00:00    0.41
    2019-09-02  06:30:00    0.53
    2019-09-02  07:00:00    0.56
    2019-09-02  07:30:00    0.56
    2019-09-02  08:00:00    0.53
    2019-09-02  08:30:00    0.66
    2019-09-02  09:00:00    1.03
    2019-09-02  09:30:00    1.03")
    #set Date as IDate and Time as ITime
    dt[, `:=`( Date = as.IDate( Date ),
               Time = as.ITime( Time ) )]
    

    Code

    #NB, in data.table::wday, Sunday = 1 !! 
    #create column with peak/off-peak
    #assuming peak = Mon-Fri 7-23
    #initialise period column, all = "off-peak"
    dt[, period := "off-peak" ]
    #update period-column peak-period entries to "peak"
    dt[ !data.table::wday( Date ) %in% c(1,7) & 
          Time %between% c( as.ITime( "07:00:00" ), as.ITime( "23:00:00" ) ),
        period := "peak"]
    #summarise
    ans <- dt[, .( sum = sum( Value ) ), by = .( Date, period ) ]
    #cast to wide
    dcast( ans, Date ~ period, value.var = "sum", fill = 0 )
    

    Output

    #          Date off-peak peak
    # 1: 2019-09-02     5.06 4.37
    
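    The NB about data.table::wday in the code above can be checked directly. A quick editorial sketch (2019-09-01 was a Sunday):

```r
library(data.table)

# data.table::wday counts with Sunday = 1 ... Saturday = 7
d <- as.IDate(c("2019-09-01", "2019-09-02", "2019-09-07"))  # Sun, Mon, Sat
w <- data.table::wday(d)
```

    Hence `!wday(Date) %in% c(1, 7)` keeps only Monday through Friday.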

    【Discussion】:

      【Solution 2】:

      You could create a detailed time column Time2, i.e. a date-time in "POSIXct" format. Below I made some sample data DF.

      DF$Time2 <- as.POSIXct(sapply(1:nrow(DF), function(x) Reduce(paste, DF[x, c("Date", "Time")])))
      
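      As a side note, paste() is vectorized, so the sapply/Reduce loop above can also be written as one call (an editorial sketch on toy columns, not part of the original answer; it gives the same result when Date and Time are character or factor):

```r
# vectorized alternative: paste the Date and Time columns directly
DF <- data.frame(Date = c("2019-01-01", "2019-01-01"),
                 Time = c("00:00:00", "00:30:00"))
DF$Time2 <- as.POSIXct(paste(DF$Date, DF$Time))
```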

      From Time2 you can apply format() to create an hour-minute-second column hms, using this solution. The trick here is that hms shows just the time without the date, which helps when finding the Peaks.

      DF$hms <- format(as.POSIXct(DF$Time2), "%H:%M:%S")
      DF$Peak <- with(DF, hms > "07:00:00" & hms <= "23:00:00" & !grepl("S.+", weekdays(Time2)))
      
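      One caveat: the grepl("S.+", weekdays(...)) weekend test relies on an English locale (Saturday/Sunday are the only day names starting with "S"). A locale-independent sketch uses the ISO weekday number from format() ("%u": Monday = 1 ... Sunday = 7):

```r
# locale-independent weekend test via ISO weekday numbers
d <- as.POSIXct(c("2019-09-01 12:00:00", "2019-09-02 12:00:00"))  # Sun, Mon
weekend <- as.integer(format(d, "%u")) >= 6  # Sat = 6, Sun = 7
```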

      Finally we do two aggregate()s: in the first we use a similar "trick" as before, but with as.Date to strip off the time; the second one rearranges the result. Nice names are set with setNames(). (We should also wrap a do.call(data.frame, .) around it to get a clean structure, as explained in this answer.)

      a1 <- with(DF, aggregate(Value, list(Peak=Peak, Date=as.Date(Time2)), sum))
      res <- setNames(do.call(data.frame, 
                              aggregate(x ~ Date, a1[-1, ], I)
                              ),
                      c("Date", "OffPeak", "Peak"))[c(1, 3, 2)]
      

      Result

      res
      #         Date  Peak OffPeak
      # 1 2019-01-01 41.38    8.51
      # 2 2019-01-02 49.12   11.41
      # 3 2019-01-03 37.38    6.46
      

      Data

      DF <- structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
      2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
      2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
      2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
      3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2019-01-01", 
      "2019-01-02", "2019-01-03"), class = "factor"), Time = structure(c(1L, 
      2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
      16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 
      29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 
      42L, 43L, 44L, 45L, 46L, 47L, 48L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
      8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
      21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 
      34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 
      47L, 48L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
      13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 
      26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 
      39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L), .Label = c("00:00:00", 
      "00:30:00", "01:00:00", "01:30:00", "02:00:00", "02:30:00", "03:00:00", 
      "03:30:00", "04:00:00", "04:30:00", "05:00:00", "05:30:00", "06:00:00", 
      "06:30:00", "07:00:00", "07:30:00", "08:00:00", "08:30:00", "09:00:00", 
      "09:30:00", "10:00:00", "10:30:00", "11:00:00", "11:30:00", "12:00:00", 
      "12:30:00", "13:00:00", "13:30:00", "14:00:00", "14:30:00", "15:00:00", 
      "15:30:00", "16:00:00", "16:30:00", "17:00:00", "17:30:00", "18:00:00", 
      "18:30:00", "19:00:00", "19:30:00", "20:00:00", "20:30:00", "21:00:00", 
      "21:30:00", "22:00:00", "22:30:00", "23:00:00", "23:30:00"), class = "factor"), 
          Value = c(0.03, 0.04, 0.06, 0.1, 0.2, 0.22, 0.23, 0.28, 0.28, 
          0.31, 0.31, 0.35, 0.35, 0.37, 0.39, 0.39, 0.41, 0.44, 0.47, 
          0.48, 0.5, 0.57, 0.62, 0.66, 0.66, 0.67, 0.71, 0.72, 0.74, 
          0.78, 1.19, 1.2, 1.21, 1.25, 1.25, 1.29, 1.31, 1.34, 1.42, 
          1.46, 1.52, 1.76, 1.9, 2.41, 3.02, 4.17, 4.86, 5, 0.03, 0.03, 
          0.07, 0.13, 0.15, 0.16, 0.16, 0.18, 0.22, 0.22, 0.24, 0.25, 
          0.29, 0.33, 0.4, 0.42, 0.44, 0.45, 0.47, 0.47, 0.49, 0.5, 
          0.51, 0.55, 0.63, 0.64, 0.66, 0.67, 0.91, 1.03, 1.06, 1.12, 
          1.12, 1.13, 1.2, 1.27, 1.34, 1.45, 1.54, 1.57, 1.65, 1.75, 
          2.36, 2.51, 5.71, 6.65, 6.85, 8.46, 0.07, 0.08, 0.09, 0.09, 
          0.09, 0.1, 0.12, 0.17, 0.18, 0.22, 0.3, 0.36, 0.36, 0.38, 
          0.44, 0.46, 0.46, 0.48, 0.49, 0.49, 0.54, 0.55, 0.56, 0.57, 
          0.59, 0.65, 0.73, 0.77, 0.79, 0.8, 0.84, 0.99, 1.04, 1.11, 
          1.27, 1.34, 1.35, 1.42, 1.42, 1.82, 1.88, 1.89, 1.94, 1.96, 
          2.24, 2.85, 3.09, 3.56)), row.names = c(NA, -144L), class = "data.frame")
      

      【Discussion】:

      • Thanks jay.sf. I have been trying the solution for a while, but I cannot get the result I want. For some reason it does not give me the right data. res <- setNames(do.call(data.frame, aggregate(x ~ Date, a1[-1, ], I)), c("Date", "OffPeak", "Peak"))[c(1, 3, 2)] Regards
      • Hi jay.sf, dput(head(Data,10)) structure(list(Date = structure(c(1567296000, 1567296000, 1567296000, 1567296000, 1567296000, 1567296000, 1567296000, 1567296000, 1567296000, 1567296000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Time = c("00:00:00", "00:05:00", "00:10:00", "00:15:00", "00:20:00", "00:25:00", "00:30:00", "00:35:00", "00:40:00", "00:45:00"), kWhDelta = c(0.34, 0.34, 0.38, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34)), .Names = c("Date", "Time", "kWhDelta" ), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame" ))
      • Sorry, I am very new to posting here. Does this help?
      • @MrEMan Yes, that's cool; now I can see that your Date column is already in POSIXct format, did you know that? You may need to replace Value in my solution with kWhDelta, then it should work. I have revised my explanation a bit, see the updated answer!