【Question title】: Creating list with the same number of values
【Posted】: 2021-06-01 20:47:44
【Question description】:

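I have a dataset with dates, IDs, and coordinates that I would like to split into seasonal months. For example, for winter I assign January to winter1, February to winter2, and March to winter3. I have done the same for the summer months.

I would like to keep only the IDs that have all of these months, so that when I split the data by ID and year, I get lists of the same length.

I wasn't sure how to simulate uneven values for each ID in the sample code below, but in my real data some IDs have only summer1 and no winter1, and the reverse can be true for summer2 and winter2.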

library(lubridate)
library(tidyverse)

date <- rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"),1000)
ID <- rep(seq(1, 5), 100)

df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

df$month <- month(df$date)
df$year <- year(df$date)

df1 <- df %>%
  mutate(season_categ = case_when(month %in% 6 ~ 'summer1',
                                  month %in% 7 ~ 'summer2',
                                  month %in% 8 ~ 'summer3',
                                  month %in% 1 ~ 'winter1',
                                  month %in% 2 ~ 'winter2',
                                  month %in% 3 ~ 'winter3')) %>%
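  group_by(year, ID) %>%
  # keep only ID-year groups with at least one summer and one winter month
  filter(any(month %in% 6:8) &
           any(month %in% 1:3)) %>%
  # drop the grouping so the later group_split(year, ID) calls do not warn
  ungroup()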

summer_list <- df1 %>% 
  filter(season_categ == "summer1") %>% 
  group_split(year, ID)

# Renames the names in the list to AnimalID and year
names(summer_list) <- sapply(summer_list, 
                             function(x) paste(x$ID[1], 
                                               x$year[1], sep = '_'))

# Creates a list for each year and by ID
winter_list <- df1 %>% 
  filter(season_categ == "winter1") %>% 
  group_split(year, ID)

names(winter_list) <- sapply(winter_list, 
                             function(x) paste(x$ID[1], 
                                               x$year[1], sep = '_'))


【Question discussion】:

    Tags: r dplyr tidyverse tidyr


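    【Solution 1】:

    Not sure if this is what you are looking for, but I understood that you want to drop any ID that, in any year, has fewer than all six months (January to March and June to August). If that assumption is wrong, you can modify the filter or the grouping.

    Here is one approach: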

    library(lubridate)
    library(dplyr)
    set.seed(12345)
    
    # random sampling of dates with this seed gives no July date for ID 2 in 2010
    df <- tibble(
      date = sample(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"),
                    1000, replace = TRUE),
      x = runif(length(date), min = 60000, max = 80000),
      y = runif(length(date), min = 800000, max = 900000),
      ID = rep(1:5, 200),
      month = month(date),
      year = year(date)) %>%
      arrange(ID, date)
    
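    df %>%
      # keep only the six seasonal months
      filter(month %in% c(1:3, 6:8)) %>%
      group_by(ID, year) %>%
      # flag ID-year groups that contain all six months
      mutate(complete = n_distinct(month) == 6) %>%
      group_by(ID) %>%
      # keep an ID only if every one of its years is complete
      filter(all(complete)) %>%
      group_by(ID, year) %>%
      group_split()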
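
As a follow-up to the pipeline above, here is a minimal sketch of how the filtered data could be split into summer and winter lists of equal length. It is only an illustration under stated assumptions: the `toy` data and the `split_month()` helper are hypothetical names that do not appear in the original posts, and the filter here keeps complete ID-year groups rather than dropping a whole ID when one of its years is incomplete.

```r
library(dplyr)

# Toy data (hypothetical): one row per ID, year, and seasonal month
toy <- expand.grid(ID = 1:3, year = 2010:2011, month = c(1:3, 6:8)) %>%
  as_tibble() %>%
  # make the data uneven: ID 2 has no January in 2010, ID 3 has no June in 2011
  filter(!(ID == 2 & year == 2010 & month == 1),
         !(ID == 3 & year == 2011 & month == 6))

# Keep only ID-year groups that contain all six seasonal months
complete_groups <- toy %>%
  group_by(ID, year) %>%
  filter(n_distinct(month) == 6) %>%
  ungroup()

# Hypothetical helper: split the rows of one month into a list named "ID_year"
split_month <- function(data, m) {
  pieces <- data %>%
    filter(month == m) %>%
    group_split(year, ID)
  names(pieces) <- vapply(pieces,
                          function(p) paste(p$ID[1], p$year[1], sep = "_"),
                          character(1))
  pieces
}

summer_list <- split_month(complete_groups, 6)  # June, i.e. summer1
winter_list <- split_month(complete_groups, 1)  # January, i.e. winter1

# Both lists now cover the same ID-year combinations
length(summer_list) == length(winter_list)
identical(names(summer_list), names(winter_list))
```

Because incomplete ID-year groups are removed before splitting, every list built from `complete_groups` has one element per remaining ID-year pair, whichever of the six months is selected.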
    

    【Discussion】:

      【Solution 2】:

      It is not entirely clear to me what you are looking for. Before splitting the data into lists, sort the rows by column:

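      # Sort rows by ID and season category (columns are referenced through df1)
      df1 <- df1[order(df1$ID, df1$season_categ), ]
      
      ### Determine which IDs have uneven numbers ###
      df1 %>%
        group_by(ID) %>%
        summarize(month_seq = paste(unique(na.omit(season_categ)), collapse = "_"),
                  # n() takes no arguments, so the original n(season_categ) fails;
                  # assuming the intent was to count distinct seasonal months,
                  # n_distinct() is used instead
                  number_of_months = n_distinct(season_categ, na.rm = TRUE))
      
      ### Remove odd numbers ###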
      

      【Discussion】:

      • I guess basically I want to identify the individuals in my dataset that have all 6 of the months I indicated. The last line of code also throws an error: Error: Problem with `summarise()` input `number_of_months`. x unused argument (season_categ) i Input `number_of_months` is `n(season_categ)`. i The error occurred in group 1: ID = 1.
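
The error quoted in the comment arises because dplyr's `n()` accepts no arguments. A minimal sketch of one way to identify the individuals that have all six seasonal months is shown here, using `n_distinct()` instead. The `toy` data, `season_counts`, and `complete_ids` are hypothetical names for illustration, and the example ignores the year for brevity (adding `year` to `group_by()` would check each ID-year combination).

```r
library(dplyr)

# Toy data (hypothetical): ID 1 has all six seasonal months, ID 2 lacks winter1
toy <- tibble(
  ID = c(rep(1, 6), rep(2, 5)),
  season_categ = c("summer1", "summer2", "summer3",
                   "winter1", "winter2", "winter3",
                   "summer1", "summer2", "summer3",
                   "winter2", "winter3")
)

season_counts <- toy %>%
  group_by(ID) %>%
  summarise(
    month_seq = paste(sort(unique(season_categ)), collapse = "_"),
    # n() would count rows and takes no arguments;
    # n_distinct() counts the unique values of a column
    number_of_months = n_distinct(season_categ),
    .groups = "drop"
  )

# IDs that have all six seasonal months
complete_ids <- season_counts %>%
  filter(number_of_months == 6) %>%
  pull(ID)

# Restrict the data to those IDs before splitting it into lists
toy_complete <- toy %>%
  filter(ID %in% complete_ids)
```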