【Question title】: Batch column aggregation and reordering a dataframe in R
【Posted】: 2021-05-19 18:00:05
【Question】:

I have census data that splits my age variables by sex into male (varname_m) and female (varname_f) values:

Rows: 146,112
Columns: 13
$ tractid    <chr> "01001020100", "01001020100", "01001020200", "01001020200", "01001020300", "01001020300", "01001020400", "01001020400", "01001020500", "01001020500", "0100102060…
$ ag18to19_m <dbl> 37, 57, 24, 15, 49, 27, 87, 33, 293, 159, 57, 40, 19, 41, 18, 56, 143, 86, 25, 155, 41, 7, 40, 0, 35, 0, 99, 25, 190, 420, 61, 157, 63, 110, 37, 127, 67, 45, 198…
$ ag20_m     <dbl> 6, 14, 64, 0, 11, 18, 16, 8, 115, 21, 42, 15, 53, 71, 16, 0, 63, 77, 43, 96, 32, 15, 21, 0, 12, 44, 8, 0, 105, 80, 34, 20, 8, 0, 13, 46, 88, 0, 83, 241, 10, 96, …
$ ag21_m     <dbl> 18, 0, 15, 7, 0, 16, 117, 18, 14, 40, 23, 26, 45, 47, 32, 0, 41, 50, 0, 76, 14, 45, 20, 1, 48, 11, 11, 30, 18, 30, 60, 55, 20, 0, 28, 43, 31, 21, 9, 0, 11, 8, 0,…
$ ag22to24_m <dbl> 48, 64, 109, 45, 25, 62, 65, 41, 224, 531, 28, 51, 31, 60, 0, 24, 132, 96, 59, 98, 27, 45, 111, 30, 113, 58, 71, 61, 46, 114, 11, 86, 116, 99, 28, 158, 72, 135, …
$ ag25to29_m <dbl> 49, 31, 83, 99, 87, 144, 153, 142, 428, 327, 69, 35, 36, 22, 61, 113, 202, 420, 184, 255, 94, 84, 118, 82, 71, 30, 47, 195, 44, 135, 118, 150, 215, 157, 118, 180…
$ ag30to34_m <dbl> 52, 72, 59, 97, 84, 157, 124, 85, 415, 227, 95, 13, 105, 202, 37, 86, 274, 334, 161, 182, 91, 173, 84, 84, 81, 106, 79, 67, 263, 77, 40, 115, 199, 411, 81, 115, …
$ ag18to19_f <dbl> 33, 8, 51, 7, 31, 19, 107, 15, 33, 25, 47, 37, 35, 81, 98, 92, 127, 147, 72, 0, 109, 57, 7, 74, 78, 0, 36, 24, 109, 268, 88, 62, 10, 0, 47, 33, 79, 191, 63, 134,…
$ ag20_f     <dbl> 13, 40, 23, 18, 27, 18, 12, 11, 37, 0, 58, 83, 19, 45, 20, 77, 16, 103, 0, 36, 15, 0, 8, 37, 29, 34, 36, 0, 23, 30, 37, 0, 10, 48, 51, 67, 17, 15, 125, 55, 27, 1…
$ ag21_f     <dbl> 40, 6, 13, 24, 36, 0, 16, 19, 17, 0, 11, 0, 0, 89, 28, 31, 39, 20, 15, 0, 7, 13, 0, 17, 9, 13, 17, 47, 106, 36, 42, 94, 0, 13, 19, 50, 67, 0, 122, 48, 21, 9, 145…
$ ag22to24_f <dbl> 21, 67, 71, 21, 69, 35, 28, 165, 346, 350, 15, 0, 53, 50, 25, 42, 207, 165, 158, 114, 20, 0, 73, 66, 29, 29, 59, 39, 83, 94, 22, 24, 79, 69, 37, 21, 73, 201, 282…
$ ag25to29_f <dbl> 36, 24, 86, 51, 88, 160, 130, 73, 318, 539, 157, 127, 128, 111, 86, 29, 334, 365, 87, 217, 57, 60, 177, 92, 17, 90, 86, 113, 67, 204, 136, 120, 130, 108, 211, 51…
$ ag30to34_f <dbl> 36, 73, 38, 42, 87, 154, 63, 84, 440, 414, 51, 95, 151, 73, 27, 70, 429, 458, 231, 173, 54, 82, 104, 24, 61, 159, 69, 30, 218, 82, 88, 214, 222, 158, 76, 125, 24…

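I want to aggregate each sex-specific variable into one combined variable. For example, I want to add ag18to19_m and ag18to19_f to create ag18to19. I can do this easily with mutate and the following code, using relocate to place each new column just before its male counterpart: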

library(dplyr)

aggregated <- merged %>% 
  mutate(ag18to19 = ag18to19_m + ag18to19_f) %>% 
  relocate(ag18to19, .before = ag18to19_m)  %>% 
  
  mutate(ag20 = ag20_m + ag20_f) %>% 
  relocate(ag20, .before = ag20_m)  %>% 
  
  mutate(ag21 = ag21_m + ag21_f) %>% 
  relocate(ag21, .before = ag21_m)  %>% 
  
  mutate(ag22to24 = ag22to24_m + ag22to24_f) %>% 
  relocate(ag22to24, .before = ag22to24_m)  %>% 
  
  mutate(ag25to29 = ag25to29_m + ag25to29_f) %>% 
  relocate(ag25to29, .before = ag25to29_m)  %>% 
  
  mutate(ag30to34 = ag30to34_m + ag30to34_f) %>% 
  relocate(ag30to34, .before = ag30to34_m)

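I know there is a more efficient way to do this with a loop or the map_df function that would also give me a data frame as output. I have spent the past hour trying to write a function and use map_df, without any success. Does anyone have suggestions?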
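A minimal sketch of the map-based idea, assuming purrr is installed and using a stand-in for `merged` built from the first two rows of the glimpse output (reduced to two age groups). One likely snag is that `map_df()` row-binds its results, whereas each iteration here produces a new column, so the column-binding variant `map_dfc()` is the one that fits. The final `select()` places each total in front of its `_m` and `_f` pair, so columns end up grouped by age rather than by sex as in the mutate/relocate chain.

```r
library(dplyr)
library(purrr)

# Stand-in for `merged`: first two rows of the glimpse output, two age groups only
merged <- tibble(
  tractid    = c("01001020100", "01001020100"),
  ag18to19_m = c(37, 57), ag20_m = c(6, 14),
  ag18to19_f = c(33, 8),  ag20_f = c(13, 40)
)

# Shared prefixes, e.g. "ag18to19", "ag20"
prefixes <- sub("_m$", "", grep("_m$", names(merged), value = TRUE))

# One total per prefix; set_names() makes each prefix the new column name,
# and map_dfc() binds the resulting vectors side by side
totals <- map_dfc(set_names(prefixes),
                  ~ merged[[paste0(.x, "_m")]] + merged[[paste0(.x, "_f")]])

# Column order: tractid, then total, _m, _f for each prefix
col_order <- as.vector(rbind(prefixes, paste0(prefixes, "_m"), paste0(prefixes, "_f")))
aggregated <- bind_cols(merged, totals) %>%
  select(tractid, all_of(col_order))
```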

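More efficient code would be best practice here, and it would help me apply the same data-cleaning steps to several other variables that are split the same way (e.g., income by sex or education by age).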
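To reuse the same step on other split variables, the pairing could be wrapped in a function. The sketch below uses a hypothetical helper `combine_pairs()` (the name and its `suffix_a`/`suffix_b` arguments are illustrative, not from any package). It loops over the columns ending in `suffix_a`, adds the matching `suffix_b` column, and relocates each total just before its `suffix_a` column, which reproduces the layout of the mutate/relocate chain in the question. It only handles two-way pairs; columns without a partner are skipped.

```r
library(dplyr)

# Hypothetical helper: sum each `<prefix><suffix_a>` column with its
# `<prefix><suffix_b>` partner into a new `<prefix>` column, placed just
# before the `<prefix><suffix_a>` column
combine_pairs <- function(df, suffix_a = "_m", suffix_b = "_f") {
  a_cols   <- names(df)[endsWith(names(df), suffix_a)]   # plain suffix match, no regex
  prefixes <- substr(a_cols, 1, nchar(a_cols) - nchar(suffix_a))
  for (p in prefixes) {
    b_col <- paste0(p, suffix_b)
    if (!b_col %in% names(df)) next                       # skip unpaired columns
    df[[p]] <- df[[paste0(p, suffix_a)]] + df[[b_col]]
    df <- relocate(df, all_of(p), .before = all_of(paste0(p, suffix_a)))
  }
  df
}

# Stand-in for `merged`: first two rows of the glimpse output, two age groups only
merged <- tibble(
  tractid    = c("01001020100", "01001020100"),
  ag18to19_m = c(37, 57), ag20_m = c(6, 14),
  ag18to19_f = c(33, 8),  ag20_f = c(13, 40)
)

aggregated <- combine_pairs(merged)
```

The same call would apply to any other table whose columns follow the `<prefix>_m` / `<prefix>_f` pattern; other suffix pairs can be passed through `suffix_a` and `suffix_b`.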

Any help would be greatly appreciated. Thanks.

【Question discussion】:

    Tags: r function tidyverse dplyr


    【Solution 1】:

    Here is an option using the tidyverse:

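    library(dplyr)
    library(stringr)
    merged1 <- merged %>% 
      # add each *_m column to its *_f partner, looked up by name via cur_column()
      mutate(across(ends_with('_m'),
                    ~ . + get(str_replace(cur_column(), '_m$', '_f')),
                    .names = '{.col}_new')) %>%
      # ag18to19_m_new -> ag18to19, etc.
      rename_with(~ str_remove(., '_m_new$'), ends_with('_new')) %>%
      # keep tractid first, then sort the remaining columns alphabetically
      select(tractid, order(names(.)[-1]) + 1)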
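A different technique, not the one used in this answer: reshape to long format with tidyr, sum within each row and age group, and widen again. This is a sketch assuming every column other than `tractid` is named `<group>_<sex>` with exactly one underscore. Because `tractid` values repeat in the glimpse output ("01001020100" appears twice), a temporary row index `.row` (a name chosen here for illustration) keeps rows from the same tract from being summed together.

```r
library(dplyr)
library(tidyr)

# Stand-in for `merged`: first two rows of the glimpse output (same tractid twice)
merged <- tibble(
  tractid    = c("01001020100", "01001020100"),
  ag18to19_m = c(37, 57), ag20_m = c(6, 14),
  ag18to19_f = c(33, 8),  ag20_f = c(13, 40)
)

totals <- merged %>%
  mutate(.row = row_number()) %>%                      # keep duplicate tracts apart
  pivot_longer(-c(.row, tractid),
               names_to = c("group", "sex"), names_sep = "_") %>%
  group_by(.row, tractid, group) %>%
  summarise(value = sum(value), .groups = "drop") %>%  # _m + _f per row and group
  pivot_wider(names_from = group, values_from = value) %>%
  select(-.row)
```

This returns only the totals; if the sex-specific columns are still needed, the result can be combined with `merged` by position using `bind_cols()`.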
    

    【Discussion】:

    • Thanks, @akrun! This works and is exactly what I wanted.