Question title: Purrr, Grouping and Filtering
Posted: 2019-06-19 10:48:55
Question:

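I am taking my first baby steps with purrr and functional programming, and I am probably drowning in a glass of water (i.e. getting stuck on something simple). Consider the list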

zz<-list(structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001, 
2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(22393349.081, 
23000574.372, 21682040.898, 21671102.853, 34361300.338, 35297814.942, 
34745691.204, 35878883.117, 11967951.257, 12297240.57, 13063650.306, 
14207780.264), relation = c("EU28-Algeria", "EU28-Algeria", "EU28-Algeria", 
"EU28-Algeria", "World-Algeria", "World-Algeria", "World-Algeria", 
"World-Algeria", "Extra EU28-Algeria", "Extra EU28-Algeria", 
"Extra EU28-Algeria", "Extra EU28-Algeria"), g_rate = c(0.736046372770467, 
0.0271163231905857, -0.0573261107603093, -0.000504474880914325, 
0.614846575418334, 0.0272549232650638, -0.0156418673197543, 0.0326138831530727, 
0.428272657063707, 0.0275142592018328, 0.0623237165799383, 0.0875811837579971
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001, 
2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(9233346.648, 7869288.171, 
7271485.687, 6395999.102, 21393949.287, 19851236.26, 19449339.887, 
16055014.309, 12160602.639, 11981948.089, 12177854.2, 9659015.207
), relation = c("EU28-Egypt", "EU28-Egypt", "EU28-Egypt", "EU28-Egypt", 
"World-Egypt", "World-Egypt", "World-Egypt", "World-Egypt", "Extra EU28-Egypt", 
"Extra EU28-Egypt", "Extra EU28-Egypt", "Extra EU28-Egypt"), 
 g_rate = c(0.0970653722744164, -0.147731751985664, -0.0759665259436081, 
 -0.120399959882366, 0.124744629514854, -0.0721097823643728, 
-0.0202454077789513, -0.174521376957825, 0.146712116047648, 
 -0.0146912579338002, 0.0163501051368976, -0.206837670383671
)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
)))

I am able to do some very simple things with map, for example iteratively taking the mean of a column

library(dplyr)
library(purrr)

map(zz, function(x) mean(x$tot_i))

or filtering the rows for a given year

map(zz, function(x) filter(x, year==2000))
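For reference, the same two calls can be written with purrr's ~ formula shorthand, where .x stands for the current list element. The sketch below assumes dplyr and purrr are installed, rebuilds zz compactly without the g_rate column to keep it short, and uses map_dbl(), the typed variant of map() that returns a double vector instead of a list.

```r
library(dplyr)
library(purrr)

# Compact rebuild of the question's zz (g_rate dropped for brevity)
zz <- list(
  tibble(
    year = rep(c(2000, 2001, 2002, 2003), times = 3),
    tot_i = c(22393349.081, 23000574.372, 21682040.898, 21671102.853,
              34361300.338, 35297814.942, 34745691.204, 35878883.117,
              11967951.257, 12297240.57, 13063650.306, 14207780.264),
    relation = rep(c("EU28-Algeria", "World-Algeria", "Extra EU28-Algeria"),
                   each = 4)
  ),
  tibble(
    year = rep(c(2000, 2001, 2002, 2003), times = 3),
    tot_i = c(9233346.648, 7869288.171, 7271485.687, 6395999.102,
              21393949.287, 19851236.26, 19449339.887, 16055014.309,
              12160602.639, 11981948.089, 12177854.2, 9659015.207),
    relation = rep(c("EU28-Egypt", "World-Egypt", "Extra EU28-Egypt"),
                   each = 4)
  )
)

# map() always returns a list; map_dbl() returns a plain double vector
mean_tot <- map_dbl(zz, ~ mean(.x$tot_i))

# Same year filter as in the question, written with the ~ shorthand
only_2000 <- map(zz, ~ filter(.x, year == 2000))

mean_tot
only_2000
```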

However, as soon as I try to add a little complexity, I end up banging my head against the wall. For example:

1) I want to iteratively group the data in each element of zz by relation and summarise each group by taking the mean of tot_i.

2) Given a list of years

   ll<-list(c(2000, 2001), c(2001, 2003))

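I want to filter each of the two elements of zz using the corresponding years in ll, i.e. zz[[1]] by ll[[1]] and zz[[2]] by ll[[2]].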
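A minimal sketch for point 1, assuming the goal is one mean of tot_i per relation within each element of zz: the group_by()/summarise() pipeline that would work on a single tibble is simply wrapped in map(). As before, zz is rebuilt without g_rate for brevity, and the output column name mean_tot_i is just an illustrative choice.

```r
library(dplyr)
library(purrr)

# Compact rebuild of the question's zz (g_rate dropped for brevity)
zz <- list(
  tibble(
    year = rep(c(2000, 2001, 2002, 2003), times = 3),
    tot_i = c(22393349.081, 23000574.372, 21682040.898, 21671102.853,
              34361300.338, 35297814.942, 34745691.204, 35878883.117,
              11967951.257, 12297240.57, 13063650.306, 14207780.264),
    relation = rep(c("EU28-Algeria", "World-Algeria", "Extra EU28-Algeria"),
                   each = 4)
  ),
  tibble(
    year = rep(c(2000, 2001, 2002, 2003), times = 3),
    tot_i = c(9233346.648, 7869288.171, 7271485.687, 6395999.102,
              21393949.287, 19851236.26, 19449339.887, 16055014.309,
              12160602.639, 11981948.089, 12177854.2, 9659015.207),
    relation = rep(c("EU28-Egypt", "World-Egypt", "Extra EU28-Egypt"),
                   each = 4)
  )
)

# Apply the single-tibble pipeline to every element of zz.
# .groups = "drop" (dplyr >= 1.0.0) returns an ungrouped tibble.
means_by_relation <- map(zz, ~ .x %>%
  group_by(relation) %>%
  summarise(mean_tot_i = mean(tot_i), .groups = "drop"))

means_by_relation
```

Each element of the result is a three-row tibble, one row per relation.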

I will then perform many other operations on the data, but understanding 1 and 2 would already get me out of my current rut.

Any suggestions are welcome.

Comments:

    Tags: r dplyr purrr


    Answer 1:

    Since each element of zz is subset by the corresponding element of 'll', use map2 to loop over both lists in parallel and filter the rows whose 'year' is %in% .y

    library(dplyr)
    library(purrr)

    map2(zz, ll, ~ .x %>%
                   filter(year %in% .y))
    #[[1]]
    # A tibble: 6 x 4
    #   year     tot_i relation           g_rate
    #  <dbl>     <dbl> <chr>               <dbl>
    #1  2000 22393349. EU28-Algeria       0.736 
    #2  2001 23000574. EU28-Algeria       0.0271
    #3  2000 34361300. World-Algeria      0.615 
    #4  2001 35297815. World-Algeria      0.0273
    #5  2000 11967951. Extra EU28-Algeria 0.428 
    #6  2001 12297241. Extra EU28-Algeria 0.0275
    
    #[[2]]
    # A tibble: 6 x 4
    #   year     tot_i relation          g_rate
    #  <dbl>     <dbl> <chr>              <dbl>
    #1  2001  7869288. EU28-Egypt       -0.148 
    #2  2003  6395999. EU28-Egypt       -0.120 
    #3  2001 19851236. World-Egypt      -0.0721
    #4  2003 16055014. World-Egypt      -0.175 
    #5  2001 11981948. Extra EU28-Egypt -0.0147
    #6  2003  9659015. Extra EU28-Egypt -0.207 
    

    If we use an anonymous function instead of the ~ formula, it takes two arguments (one per list) instead of one

    map2(zz, ll, function(x, y) filter(x, year %in% y))
    

    This is similar to how we would use Map from base R, with subset in place of filter

    Map(function(x, y) subset(x, year %in% y), zz, ll)
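The two points from the question can also be chained in one pass: filter each element of zz by its matching element of ll with map2(), then group and summarise as in point 1. This is a sketch under the same assumptions (compact zz without g_rate, illustrative column name mean_tot_i); the final bind_rows() is optional and just stacks the per-country results into a single tibble.

```r
library(dplyr)
library(purrr)

# Compact rebuild of the question's zz (g_rate dropped for brevity)
zz <- list(
  tibble(
    year = rep(c(2000, 2001, 2002, 2003), times = 3),
    tot_i = c(22393349.081, 23000574.372, 21682040.898, 21671102.853,
              34361300.338, 35297814.942, 34745691.204, 35878883.117,
              11967951.257, 12297240.57, 13063650.306, 14207780.264),
    relation = rep(c("EU28-Algeria", "World-Algeria", "Extra EU28-Algeria"),
                   each = 4)
  ),
  tibble(
    year = rep(c(2000, 2001, 2002, 2003), times = 3),
    tot_i = c(9233346.648, 7869288.171, 7271485.687, 6395999.102,
              21393949.287, 19851236.26, 19449339.887, 16055014.309,
              12160602.639, 11981948.089, 12177854.2, 9659015.207),
    relation = rep(c("EU28-Egypt", "World-Egypt", "Extra EU28-Egypt"),
                   each = 4)
  )
)
ll <- list(c(2000, 2001), c(2001, 2003))

# .x is the tibble, .y the matching year vector from ll
result <- map2(zz, ll, ~ .x %>%
  filter(year %in% .y) %>%
  group_by(relation) %>%
  summarise(mean_tot_i = mean(tot_i), .groups = "drop"))

# Optional: stack both per-country summaries into one tibble
combined <- bind_rows(result)
combined
```

Because map2() walks zz and ll in lockstep, both lists must have the same length (or ll must have length 1).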
    
