【Question title】: Assign year ID if start month is other than January [duplicate]
【Posted】: 2016-06-01 07:16:32
【Question】:

I have a data.frame, df, containing daily values spanning 8 calendar years (2001-05-01 to 2008-04-30, i.e. 7 May-to-April years).

# Daily dates for 7 May-to-April years (2557 days), repeated once per site
date <- rep(seq(as.Date("2001-05-01"), as.Date("2008-04-30"), by = "day"), 3)

site <- rep(c("Site_1", "Site_2", "Site_3"), each = 2557)

# Random values per site (no seed is set, so results differ between runs)
value <- c(as.numeric(sample(90:271, 2557, replace = TRUE)),
           as.numeric(sample(125:340, 2557, replace = TRUE)),
           as.numeric(sample(70:173, 2557, replace = TRUE)))

df <- data.frame(date, site, value)

In this case, each year starts in May and ends in April.

I want to get the mean and sd of value for each year at the 3 sites.

I did the following:

library(dplyr)

df1 <- df %>%
  dplyr::mutate(year = ifelse(date < "2002-05-01", "2001-2002",
                              ifelse(date < "2003-05-01", "2002-2003",
                                     ifelse(date < "2004-05-01", "2003-2004",
                                            ifelse(date < "2005-05-01", "2004-2005",
                                                   ifelse(date < "2006-05-01", "2005-2006",
                                                          ifelse(date < "2007-05-01", "2006-2007",
                                                                 ifelse(date < "2008-05-01", "2007-2008", NA )))))))) %>%
  dplyr::select(site, year, value) %>%
  dplyr::group_by(site, year) %>%
  dplyr::summarise_each(funs(
    mean(.),
    sd(.)
  ))

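This gives me what I want. However, with 30 to 50 years of data, writing out the nested ifelse() calls becomes time-consuming. Also, if each new data.frame has a different start month, I have to modify the ifelse() chain every time to assign the year ID before I can group by year and run different calculations.

Is there a direct way to assign a year ID when the start month is any month other than January?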

【Comments】:

    Tags: r dplyr


    【Answer 1】:

    How about:

    library(dplyr)
    df %>% 
      group_by(year=cut(date, seq(as.Date("2001-05-01"), as.Date("2008-05-01"), "1 year"), include.lowest = TRUE), site) %>%
      summarise(sd = sd(value), mean = mean(value)) 
    # Source: local data frame [21 x 4]
    # Groups: year [?]
    # 
    #          year   site       sd     mean
    #        (fctr) (fctr)    (dbl)    (dbl)
    # 1  2001-05-01 Site_1 51.82622 182.5890
    # 2  2001-05-01 Site_2 63.33385 241.1260
    # 3  2001-05-01 Site_3 30.04042 120.1233
    # 4  2002-05-01 Site_1 51.66325 182.6658
    # 5  2002-05-01 Site_2 62.87470 236.4192
    # 6  2002-05-01 Site_3 28.54769 122.2329
    # 7  2003-05-01 Site_1 50.97588 179.0874
    # 8  2003-05-01 Site_2 63.48810 227.1230
    # 9  2003-05-01 Site_3 30.87933 120.4918
    # 10 2004-05-01 Site_1 53.19898 176.5589
    # ..        ...    ...      ...      ...
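The breaks in the answer above are typed in by hand. As a rough sketch only, they could instead be derived from the data, so that the same call works for longer series and other start months. The helper `year_breaks()` and its `start_month` argument are hypothetical names introduced here, not part of the answer or of any package; only base R is used.

```r
# Sketch under assumptions: `year_breaks` and `start_month` are hypothetical
# names used for illustration; they do not come from the original answer.
year_breaks <- function(dates, start_month = 5) {
  yr <- as.integer(format(dates, "%Y"))
  mo <- as.integer(format(dates, "%m"))
  # Calendar year in which each date's seasonal year begins
  start_yr <- ifelse(mo >= start_month, yr, yr - 1L)
  first <- as.Date(sprintf("%d-%02d-01", min(start_yr), start_month))
  # One break more than the number of seasonal years, so that the last
  # year also has a right-hand edge
  seq(first, by = "1 year", length.out = max(start_yr) - min(start_yr) + 2L)
}

dates   <- seq(as.Date("2001-05-01"), as.Date("2008-04-30"), by = "day")
breaks  <- year_breaks(dates)
# Left-closed intervals [start, next start), labelled by their start date
year_id <- cut(dates, breaks, right = FALSE)
```

The resulting `year_id` factor can then be used in `group_by()` in place of the hand-written `seq()` call.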
    

    【Comments】:

    • Thanks, Luke, for your time and help.
    【Answer 2】:

    Using the lubridate package, you can first add a year column like this:

    library(lubridate) 
    df$year <- ifelse(month(ymd(df$date)) < 5, 
                      paste(year(ymd(df$date))-1, year(ymd(df$date)), sep="-"),
                      paste(year(ymd(df$date)), year(ymd(df$date))+1, sep="-"))
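
    library(dplyr)

    # summarise() with named arguments replaces the deprecated
    # summarise_each(funs(...)) call and yields the same mean and sd columns
    df %>%
      group_by(site, year) %>%
      summarise(mean = mean(value), sd = sd(value))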
    
    Source: local data frame [6 x 4]
    Groups: site [1]
    
        site      year     mean       sd
      (fctr)     (chr)    (dbl)    (dbl)
    1 Site_1 2001-2002 178.2055 54.58277
    2 Site_1 2002-2003 176.9342 49.64435
    3 Site_1 2003-2004 177.4153 52.20447
    4 Site_1 2004-2005 179.5370 52.77848
    5 Site_1 2005-2006 180.3671 51.41292
    6 Site_1 2006-2007 179.3616 53.02291
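The answer above hard-codes May as the start month. One possible generalization, sketched here in base R, wraps the same month comparison in a function that takes the start month as an argument. The names `season_year()` and `start_month` are hypothetical and chosen for illustration; the toy data frame is made up to keep the example small.

```r
# Sketch under assumptions: `season_year` and `start_month` are hypothetical
# names, not part of lubridate, dplyr, or the original answer.
season_year <- function(dates, start_month = 5) {
  yr <- as.integer(format(dates, "%Y"))
  mo <- as.integer(format(dates, "%m"))
  # Months before the start month belong to the seasonal year that began
  # in the previous calendar year
  start_yr <- yr - as.integer(mo < start_month)
  if (start_month == 1) {
    as.character(start_yr)                      # plain calendar year
  } else {
    paste(start_yr, start_yr + 1L, sep = "-")   # e.g. "2001-2002"
  }
}

# Toy data (illustrative values only)
toy <- data.frame(
  date  = as.Date(c("2001-05-01", "2002-04-30", "2002-05-01", "2003-01-15")),
  value = c(10, 20, 30, 50)
)
toy$year  <- season_year(toy$date)
toy_means <- aggregate(value ~ year, data = toy, FUN = mean)
```

With the question's data, the same function should slot into a pipeline as `mutate(year = season_year(date, start_month = 5))` before `group_by(site, year)`, so only the `start_month` argument changes from one data set to the next.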
    

    【Comments】:

    • Thanks for your time and help.