【Question title】: Tidying Dataset
【Posted】: 2019-11-30 15:55:18
【Question】:

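I have been struggling to tidy a dataset. In brief: each row represents one person, and the columns repeat "Date", "ScoreA", "ScoreB", etc. for every date on which we have scores. I would like all the dates in a single column, with the corresponding scores in adjacent columns. I have tried various combinations of a simple pivot_longer or pivot_wider, but without success so far. The example file is at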

https://github.com/tueland/R_help.git

Thanks for your help!

【Discussion】:

    Tags: r tidyverse


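    【Solution 1】:

    We can use pivot_longer after repairing the column names so that each one has the form name_group. The ".value" entry in names_to then turns the name part into an output column and the group part into a "group" column.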

    library(tidyr)
    library(dplyr)
    
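    # drop the ".1", ".2" suffixes that read.csv adds to duplicated column names
    nm1 <- sub("\\.?\\d+$", "", names(df1)[-1])
    # number each repeated name so that every column becomes <name>_<group>
    names(df1)[-1] <- paste0(nm1, "_", ave(nm1, nm1, FUN = seq_along))
    df1 %>%
        select(-c("X_1", "X_2")) %>%   # drop the unnamed X columns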
        pivot_longer(cols = -ID, names_to = c(".value", "group"), 
                names_sep="_", values_drop_na = TRUE ) %>% 
        na.omit
    # A tibble: 5 x 6
    #     ID group Date   ScoreA ScoreB ScoreC
    #  <int> <chr> <fct>   <dbl>  <dbl>  <dbl>
    #1   123 1     1/1/11    1      2      3  
    #2   123 2     1/2/11    4      5      6  
    #3   123 3     1/4/11    6.1    6.2    6.3
    #4   345 1     2/2/22    7      8      9  
    #5   345 2     2/3/22   10     11     12  
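
To see the renaming step and the ".value" reshaping in isolation, here is a minimal sketch on a small made-up data frame. The `toy` data and its values are hypothetical, chosen only to mimic the `Date`, `Date.1`, ... pattern that read.csv produces for repeated headers; they are not the contents of the linked file.

```r
library(tidyr)

# Hypothetical toy data: one row per person, one Date/ScoreA pair per visit
toy <- data.frame(
  ID       = c(123, 345),
  Date     = c("1/1/11", "2/2/22"),
  ScoreA   = c(1, 7),
  Date.1   = c("1/2/11", NA),
  ScoreA.1 = c(4, NA),
  stringsAsFactors = FALSE
)

# Strip the numeric suffix, then number every repeat of the same base name:
# Date, ScoreA, Date.1, ScoreA.1 -> Date_1, ScoreA_1, Date_2, ScoreA_2
base <- sub("\\.?\\d+$", "", names(toy)[-1])
names(toy)[-1] <- paste0(base, "_", ave(base, base, FUN = seq_along))

# ".value" keeps the part before "_" as a column name;
# the part after "_" goes into the "group" column
long <- pivot_longer(
  toy,
  cols = -ID,
  names_to = c(".value", "group"),
  names_sep = "_",
  values_drop_na = TRUE  # drops the all-NA second visit of ID 345
)
print(long)
```

With this toy input, person 123 contributes two rows and person 345 one, because the second Date/ScoreA pair for 345 is entirely missing.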
    

    Or with map from purrr (this relies on the column names already repaired above):

    library(purrr)
    library(stringr)
    map_dfr(1:3, ~ df1 %>%
                     select(ID, ends_with(as.character(.x))) %>% 
                     rename_all(~ str_remove(., "_\\d+$")))  %>% 
             select(-X) %>% 
             na.omit
    #   ID   Date ScoreA ScoreB ScoreC
    #1  123 1/1/11    1.0    2.0    3.0
    #2  345 2/2/22    7.0    8.0    9.0
    #6  123 1/2/11    4.0    5.0    6.0
    #7  345 2/3/22   10.0   11.0   12.0
    #11 123 1/4/11    6.1    6.2    6.3
    

    Or with melt from data.table:

    library(data.table)
    na.omit(melt(setDT(df1), measure = patterns("^Date", "ScoreA", 
       "ScoreB", "ScoreC", "^X"),
        value.name = c("Date", "ScoreA", "ScoreB", "ScoreC", "X"))[, X := NULL])
    #    ID variable   Date ScoreA ScoreB ScoreC
    #1: 123        1 1/1/11    1.0    2.0    3.0
    #2: 345        1 2/2/22    7.0    8.0    9.0
    #3: 123        2 1/2/11    4.0    5.0    6.0
    #4: 345        2 2/3/22   10.0   11.0   12.0
    #5: 123        3 1/4/11    6.1    6.2    6.3
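
The same idea can be tried on a small made-up table to see how each pattern in patterns() becomes one output column. This is only a sketch: the `toy` table and its values are hypothetical and stand in for the real file.

```r
library(data.table)

# Hypothetical toy data with the suffixes read.csv would give repeated headers
toy <- data.table(
  ID       = c(123, 345),
  Date     = c("1/1/11", "2/2/22"),
  ScoreA   = c(1, 7),
  Date.1   = c("1/2/11", NA),
  ScoreA.1 = c(4, NA)
)

# Each regular expression selects one family of columns; the matching
# entry in value.name names the column that family is stacked into
long <- melt(
  toy,
  id.vars      = "ID",
  measure.vars = patterns("^Date", "^ScoreA"),
  value.name   = c("Date", "ScoreA")
)

# Remove rows where a visit is missing
long <- na.omit(long)
print(long)
```

Here the `variable` column plays the role of the group index, so no renaming of the columns is needed beforehand.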
    

    Data

    df1 <- read.csv("https://raw.githubusercontent.com/tueland/R_help/master/R%20help.csv")
    

    【Comments】:

    • Perfect. Thanks for your help!