How to access/select a nested data frame column with dplyr: answers

[Question title]: How to access/select nested data frame column with dplyr
[Posted]: 2021-04-09 02:49:41
[Question]:

I have the following data frame:

library(tidyverse)
iris %>% 
  dplyr::select(Species, Petal.Width) %>% 
  as_tibble()

Then I group by Species and compute mean_se with this code:

df <- iris %>% 
  dplyr::select(Species, Petal.Width) %>% 
  as_tibble() %>%
  group_by(Species) %>% 
  mutate(ms = mean_se(Petal.Width))

The result looks like this:

> df
# A tibble: 150 x 3
# Groups:   Species [3]
   Species Petal.Width  ms$y $ymin $ymax
   <fct>         <dbl> <dbl> <dbl> <dbl>
 1 setosa          0.2 0.246 0.231 0.261
 2 setosa          0.2 0.246 0.231 0.261
 3 setosa          0.2 0.246 0.231 0.261
 4 setosa          0.2 0.246 0.231 0.261
 5 setosa          0.2 0.246 0.231 0.261
 6 setosa          0.4 0.246 0.231 0.261
 7 setosa          0.3 0.246 0.231 0.261
 8 setosa          0.2 0.246 0.231 0.261
 9 setosa          0.2 0.246 0.231 0.261
10 setosa          0.1 0.246 0.231 0.261

But when I try to select the ms$y and $ymax columns like this:

> df %>% dplyr::select(Species, ms$y, $ymax)
Error: unexpected '$' in "df %>% dplyr::select(Species, ms$y, $"

it fails. How can I select them?

[Discussion]:

Tags: r dplyr tidyverse

[Answer 1]:

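It is stored as a nested data frame: ms is a single data-frame column whose fields are y, ymin and ymax, which is why the header prints as ms$y $ymin $ymax. There is no column called $ymax, and $ymax on its own is not valid R syntax, hence the parse error. You can convert the nested column into ordinary columns: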

library(tidyverse)

iris %>% 
  select(Species, Petal.Width) %>% 
  as_tibble() %>%
  group_by(Species) %>% 
  mutate(ms = mean_se(Petal.Width)) %>%
  ungroup() -> tmp

# Drop the nested column and append its fields (y, ymin, ymax) as regular columns
df <- bind_cols(tmp %>% select(-ms), tmp$ms)
df

# A tibble: 150 x 5
#   Species Petal.Width     y  ymin  ymax
#   <fct>         <dbl> <dbl> <dbl> <dbl>
# 1 setosa          0.2 0.246 0.231 0.261
# 2 setosa          0.2 0.246 0.231 0.261
# 3 setosa          0.2 0.246 0.231 0.261
# 4 setosa          0.2 0.246 0.231 0.261
# 5 setosa          0.2 0.246 0.231 0.261
# 6 setosa          0.4 0.246 0.231 0.261
# 7 setosa          0.3 0.246 0.231 0.261
# 8 setosa          0.2 0.246 0.231 0.261
# 9 setosa          0.2 0.246 0.231 0.261
#10 setosa          0.1 0.246 0.231 0.261
# … with 140 more rows
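To see what "nested data frame" means here, you can inspect tmp before the bind_cols() step. This is a minimal sketch that rebuilds tmp so it runs on its own; it assumes dplyr 1.0 or later (the version that lets mutate() store a data frame result in a single column) and loads ggplot2 explicitly because that is where mean_se() comes from.

```r
library(dplyr)
library(ggplot2)  # mean_se() is exported by ggplot2

tmp <- iris %>%
  select(Species, Petal.Width) %>%
  as_tibble() %>%
  group_by(Species) %>%
  mutate(ms = mean_se(Petal.Width)) %>%
  ungroup()

# ms is one column whose value is itself a data frame
is.data.frame(tmp$ms)  # expected: TRUE
names(tmp$ms)          # expected: "y" "ymin" "ymax"
ncol(tmp)              # expected: 3 (Species, Petal.Width, ms)
```

After bind_cols(), those same three fields become the top-level columns y, ymin and ymax shown above.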

Now select the columns you need from df:

df %>% select(Species, y, ymax)

# A tibble: 150 x 3
#   Species     y  ymax
#   <fct>   <dbl> <dbl>
# 1 setosa  0.246 0.261
# 2 setosa  0.246 0.261
# 3 setosa  0.246 0.261
# 4 setosa  0.246 0.261
# 5 setosa  0.246 0.261
# 6 setosa  0.246 0.261
# 7 setosa  0.246 0.261
# 8 setosa  0.246 0.261
# 9 setosa  0.246 0.261
#10 setosa  0.246 0.261
# … with 140 more rows
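If you only need a couple of fields, flattening everything is optional. select() chooses whole top-level columns, so it cannot pick individual fields out of ms, but inside mutate() data masking makes ms refer to the data-frame column, so ms$y and ms$ymax are ordinary vectors that can be copied into new columns. A minimal sketch, assuming dplyr 1.0 or later:

```r
library(dplyr)
library(ggplot2)  # mean_se() comes from ggplot2

df <- iris %>%
  select(Species, Petal.Width) %>%
  as_tibble() %>%
  group_by(Species) %>%
  mutate(ms = mean_se(Petal.Width)) %>%
  ungroup()

# Inside mutate(), `ms` is the data-frame column, so `$` reaches its fields
res <- df %>%
  mutate(y = ms$y, ymax = ms$ymax) %>%
  select(Species, y, ymax)

res
```

This keeps the original pipeline unchanged and only pulls out the fields you ask for.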

Another way, without creating the temporary variable tmp, is to wrap the result in a list column and unnest it:

iris %>% 
  select(Species, Petal.Width) %>% 
  as_tibble() %>%
  group_by(Species) %>% 
  mutate(ms = list(mean_se(Petal.Width))) %>%  # store each 1-row result in a list column
  unnest(ms) %>%                                  # expand it into columns y, ymin, ymax
  ungroup()
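A third option, not used in the answer above, is tidyr's unpack(), which turns a data-frame column back into ordinary columns without needing the list() wrapper. This sketch assumes tidyr 1.0.0 or later, where unpack() is available; ungroup() is called before unpack() only to keep the example simple.

```r
library(dplyr)
library(tidyr)    # unpack() is available in tidyr >= 1.0.0
library(ggplot2)  # mean_se()

flat <- iris %>%
  select(Species, Petal.Width) %>%
  as_tibble() %>%
  group_by(Species) %>%
  mutate(ms = mean_se(Petal.Width)) %>%  # ms is a data-frame column
  ungroup() %>%
  unpack(ms)                             # replace ms with columns y, ymin, ymax

flat %>% select(Species, y, ymax)
```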

[Discussion]:

Thanks. Is there a way to do this in one step, without tmp?
Yes, you can store the output in a list and unnest it. See the updated answer.