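【Question title】: Using pivot_wider with over 100 variables
【Posted】: 2020-05-13 04:37:28
【Question description】:

I need help using pivot_wider. I have a large dataset with more than 100 variables and 300,000 observations. I am trying to spread the information wider based on one key variable, but I may well be using the wrong approach. A more concise example: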

> example <- data.frame(incident_num = c("X1", "X1", "X2", "X3", "X3", "X3", "X4"),
                        unit_num     = c("T1", "E2", "M1", "M3", "T5", "E6", "M5"))
> example
  incident_num unit_num
1           X1       T1
2           X1       E2
3           X2       M1
4           X3       M3
5           X3       T5
6           X3       E6
7           X4       M5

What I want to achieve is:

output
  incident_num unit_num_1 unit_num_2 unit_num_3
1           X1         T1         E2       <NA>
2           X2         M1       <NA>       <NA>
3           X3         M3         T5         E6
4           X4         M5       <NA>       <NA>

This should also carry over all the other variables associated with unit_num. Any help would be greatly appreciated!

【Question discussion】:

    Tags: r dplyr tidyr


    【Solution 1】:
    library(tidyverse)
    
    example %>% 
      group_by(incident_num) %>% 
      mutate(id = seq_along(incident_num)) %>% 
      pivot_wider(id_cols = incident_num,
                  names_from = id,
                  values_from = unit_num,
                  names_prefix = "unit_num_")
    

    Look at the result. Did I understand you correctly?

    df <- data.frame(incident_num = c("X1", "X1", "X2", "X3", "X3", "X3", "X4"),
                     unit_num1     = c("T1", "E2", "M1", "M3", "T5", "E6", "M5"), 
                     unit_num2     = c("A1", "B2", "C1", "C3", "J5", "U6", "B5"))
    
    df %>% 
      group_by(incident_num) %>% 
      mutate(id = row_number()) %>% 
      pivot_wider(id_cols = incident_num,
                  names_from = id,
                  values_from = starts_with("unit_num"))
    

    Please show the result for your example. Is this what you expect?

    # Here df stands for the asker's real data, whose key column is incident_number
    df %>% 
      group_by(incident_number) %>%
      mutate(id = row_number()) %>% 
      pivot_wider(id_cols = incident_number, 
                  names_from = id, 
                  values_from = tidyselect::vars_select(names(df), -incident_number))
    

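    【Discussion】:

    • How do I keep the other variables associated with this?
    • Save the result of the computation in a data frame: df <- example %>% group_by(incident_num) ... etc.
    • I did that; the problem is that I only get two kinds of variables: one incident_num column, and the other columns that come from the spread data. I still have 100 variables that I need to keep.
    • Still hitting the same problem... is my df too large? I have 113 unique variables, e.g. incident_number, unit_call_sign, zip_code, procedure_1, procedure_2, ... I don't understand why this is such a struggle for me!
    • See the new example in the answer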
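
    To make the "keep the other variables" point concrete, here is a minimal sketch of widening every non-key column at once. The data frame `calls` and its values are made up for illustration; only the column names incident_number, unit_call_sign and zip_code come from the comments above. It assumes every column other than the key should be spread, and it builds the list of value columns with `setdiff()` so that 100+ names do not have to be typed out.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the real data: one key column plus
# a couple of per-unit columns (the real data has over 100).
calls <- data.frame(
  incident_number = c("X1", "X1", "X2", "X3", "X3", "X3", "X4"),
  unit_call_sign  = c("T1", "E2", "M1", "M3", "T5", "E6", "M5"),
  zip_code        = c("111", "111", "222", "333", "333", "333", "444"),
  stringsAsFactors = FALSE
)

# Every column except the key is treated as a value column.
value_cols <- setdiff(names(calls), "incident_number")

wide <- calls %>%
  group_by(incident_number) %>%
  mutate(id = row_number()) %>%   # position of each row within its incident
  ungroup() %>%
  pivot_wider(
    id_cols     = incident_number,
    names_from  = id,
    values_from = all_of(value_cols)
  )

names(wide)
```

    With several value columns, pivot_wider names the new columns by pasting the value column name and the id together (unit_call_sign_1, zip_code_1, and so on), so each original variable ends up with one column per position within the incident.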
    【Solution 2】:

    We can create a sequence column grouped by 'incident_num' and then apply pivot_wider

    library(dplyr)
    library(tidyr)
    library(stringr)
    example %>% 
        group_by(incident_num) %>%
        mutate(rn = str_c('unit_num_', row_number())) %>%
        ungroup %>%
        pivot_wider(names_from = rn, values_from = unit_num)
    # A tibble: 4 x 4
    #  incident_num unit_num_1 unit_num_2 unit_num_3
    #  <fct>        <fct>      <fct>      <fct>     
    #1 X1           T1         E2         <NA>      
    #2 X2           M1         <NA>       <NA>      
    #3 X3           M3         T5         E6        
    #4 X4           M5         <NA>       <NA>      
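
    Because id_cols is not given in the call above, pivot_wider uses every column that is not in names_from or values_from as an identifier. A minimal sketch of what that means when extra columns are present, assuming those columns are constant within an incident (the data frame `calls` and its zip_code values are hypothetical):

```r
library(dplyr)
library(tidyr)

# Hypothetical data: zip_code is assumed to be the same for all rows of an incident.
calls <- data.frame(
  incident_number = c("X1", "X1", "X2", "X3", "X3", "X3", "X4"),
  zip_code        = c("111", "111", "222", "333", "333", "333", "444"),
  unit_call_sign  = c("T1", "E2", "M1", "M3", "T5", "E6", "M5"),
  stringsAsFactors = FALSE
)

wide_units <- calls %>%
  group_by(incident_number) %>%
  mutate(rn = paste0("unit_", row_number())) %>%
  ungroup() %>%
  # id_cols is omitted, so incident_number and zip_code both act as identifiers
  pivot_wider(names_from = rn, values_from = unit_call_sign)

names(wide_units)
```

    If an extra column varies within an incident, it would split that incident across several rows, so such columns need to go into values_from instead.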
    
