【Question title】: How to access/select a nested data frame column with dplyr
【Posted】: 2021-04-09 02:49:41
【Question】:

I have the following data frame:

library(tidyverse)
iris %>% 
  dplyr::select(Species, Petal.Width) %>% 
  as_tibble() 

Then I group by Species and compute mean_se with the following code:

df <- iris %>% 
  dplyr::select(Species, Petal.Width) %>% 
  as_tibble() %>%
  group_by(Species) %>% 
  mutate(ms = mean_se(Petal.Width))

The result looks like this:

> df
# A tibble: 150 x 3
# Groups:   Species [3]
   Species Petal.Width  ms$y $ymin $ymax
   <fct>         <dbl> <dbl> <dbl> <dbl>
 1 setosa          0.2 0.246 0.231 0.261
 2 setosa          0.2 0.246 0.231 0.261
 3 setosa          0.2 0.246 0.231 0.261
 4 setosa          0.2 0.246 0.231 0.261
 5 setosa          0.2 0.246 0.231 0.261
 6 setosa          0.4 0.246 0.231 0.261
 7 setosa          0.3 0.246 0.231 0.261
 8 setosa          0.2 0.246 0.231 0.261
 9 setosa          0.2 0.246 0.231 0.261
10 setosa          0.1 0.246 0.231 0.261

But when I try to select the ms$y and $ymax columns, like this:

> df %>% dplyr::select(Species, ms$y, $ymax)
Error: unexpected '$' in "df %>% dplyr::select(Species, ms$y, $"

it fails. How can I select them?

【Discussion】:

    Tags: r dplyr tidyverse


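    【Answer 1】:

    ms is stored as a nested data frame (a single column that is itself a data frame), which is why select() cannot reach ms$y or ms$ymax. You can convert it to an ordinary data frame: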

    library(tidyverse)
    
    iris %>% 
      select(Species, Petal.Width) %>% 
      as_tibble() %>%
      group_by(Species) %>% 
      mutate(ms = mean_se(Petal.Width)) %>%
      ungroup -> tmp
    
    df <- bind_cols(tmp %>% select(-ms), tmp$ms) 
    df
    
    # A tibble: 150 x 5
    #   Species Petal.Width     y  ymin  ymax
    #   <fct>         <dbl> <dbl> <dbl> <dbl>
    # 1 setosa          0.2 0.246 0.231 0.261
    # 2 setosa          0.2 0.246 0.231 0.261
    # 3 setosa          0.2 0.246 0.231 0.261
    # 4 setosa          0.2 0.246 0.231 0.261
    # 5 setosa          0.2 0.246 0.231 0.261
    # 6 setosa          0.4 0.246 0.231 0.261
    # 7 setosa          0.3 0.246 0.231 0.261
    # 8 setosa          0.2 0.246 0.231 0.261
    # 9 setosa          0.2 0.246 0.231 0.261
    #10 setosa          0.1 0.246 0.231 0.261
    # … with 140 more rows
    

    Then select the columns you need:

    df %>% select(Species, y, ymax)
    
    # A tibble: 150 x 3
    #   Species     y  ymax
    #   <fct>   <dbl> <dbl>
    # 1 setosa  0.246 0.261
    # 2 setosa  0.246 0.261
    # 3 setosa  0.246 0.261
    # 4 setosa  0.246 0.261
    # 5 setosa  0.246 0.261
    # 6 setosa  0.246 0.261
    # 7 setosa  0.246 0.261
    # 8 setosa  0.246 0.261
    # 9 setosa  0.246 0.261
    #10 setosa  0.246 0.261
    # … with 140 more rows
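If you only need a couple of the inner columns, flattening first is not strictly necessary. The sketch below, assuming dplyr 1.0 or later (where mutate() can store a data-frame result as a column) and ggplot2 for mean_se(), first confirms that ms is a data-frame column and then copies the inner columns out with mutate(), since select() only sees top-level names. The object names nested and picked are purely illustrative.

```r
library(dplyr)
library(ggplot2)  # provides mean_se()

nested <- iris %>%
  select(Species, Petal.Width) %>%
  as_tibble() %>%
  group_by(Species) %>%
  mutate(ms = mean_se(Petal.Width)) %>%
  ungroup()

# mean_se() returns a one-row data frame, so `ms` is a data-frame column
is.data.frame(nested$ms)  # TRUE
names(nested$ms)          # "y" "ymin" "ymax"

# Base R `$` chaining reaches an inner column directly
head(nested$ms$ymax)

# select() only sees top-level names (Species, Petal.Width, ms),
# so copy the inner columns out with mutate() and then select them
picked <- nested %>%
  mutate(y = ms$y, ymax = ms$ymax) %>%
  select(Species, y, ymax)
picked
```

By contrast, select(Species, ms) would keep the whole nested column rather than individual inner columns.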
    

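    Another way, without creating the temporary variable tmp, is to wrap the mean_se() result in list() and then unnest() it: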

    iris %>% 
      select(Species, Petal.Width) %>% 
      as_tibble() %>%
      group_by(Species) %>% 
      mutate(ms = list(mean_se(Petal.Width))) %>%
      unnest(ms) %>%
      ungroup
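Two other one-step variants may also work, assuming dplyr 1.0 or later and tidyr 1.0 or later (which provides unpack()). Option A keeps the named ms column and lets tidyr::unpack() lift its inner columns to the top level; option B leaves the mean_se() call unnamed, in which case dplyr splices the returned data frame into separate y, ymin and ymax columns and no nested column is ever created. The names grouped, flat_a and flat_b are illustrative only.

```r
library(dplyr)
library(tidyr)    # provides unpack()
library(ggplot2)  # provides mean_se()

grouped <- iris %>%
  select(Species, Petal.Width) %>%
  as_tibble() %>%
  group_by(Species)

# Option A: keep the named df-column, then unpack() replaces `ms`
# with its inner columns y, ymin, ymax
flat_a <- grouped %>%
  mutate(ms = mean_se(Petal.Width)) %>%
  ungroup() %>%
  unpack(ms)

# Option B: an unnamed data-frame result is spliced into separate
# columns, so no `ms` column is created at all
flat_b <- grouped %>%
  mutate(mean_se(Petal.Width)) %>%
  ungroup()

flat_b %>% select(Species, y, ymax)
```

Option B is the shortest, but the named form in option A makes it explicit where the summary columns came from; unpack(ms, names_sep = "_") would instead produce ms_y, ms_ymin and ms_ymax.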
    

    【Comments】:

    • Thanks. Is there a way to do this in one step, without tmp?
    • Yes, you can store the output in a list and then unnest it. See the updated answer.