Filter based on data in a nested data frame column using purrr

【Question title】: Filter based on data in a nested data frame column using purrr
【Posted】: 2023-03-19 10:09:02
【Question description】:

I am trying to filter the rows of a data frame based on data in a nested data frame column. Consider the following example:

library(tidyverse)

df  <- structure(list(id = c(47L, 47L, 45L, 45L, 85L, 85L), src = c("bycity", 
         "indb", "bycity", "indb", "bycity", "indb"), lat = c(42.73856678, 
         NA, 39.40803248, 39.40620766, 42.52458775, NA), lon = c(-85.82890251, 
         -85.654987, -88.47774221, -88.50701219, -87.26410992, -83.647894)), .Names = c("id", 
          "src", "lat", "lon"), row.names = c(NA, -6L), class = c("tbl_df", 
         "tbl", "data.frame")
    ) %>% 
  nest(data = -id) %>%  # tidyr >= 1.0 syntax; nest(-id) is deprecated
  mutate(
    anothervar = c(0.077537764, NA, 0.029326812)
  )


# only keep the rows where the lat in the indb row is NA
filtereddf  <- df %>% 
   filter(map(data, ~(.x %>% pluck("lat", 2) %>% is.na )) )

# Error in filter_impl(.data, quo) : 
#   Argument 2 filter condition does not evaluate to a logical vector


# desired output would be the two rows where data[[2,2]] is NA
# A tibble: 2 x 3
     id             data anothervar
  <int>           <list>      <dbl>
1    47 <tibble [2 x 3]> 0.07753776
3    85 <tibble [2 x 3]> 0.02932681

The nested data frames I am filtering on have consistent column names, and I always want to look at the second row only.

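I imagine I could unnest the data frame (giving me two rows per ID where I previously had one), filter that down to a list of IDs that fail my criterion, and use anti_join() to discard the offending rows. But I am more interested in understanding why using map() inside filter() does not work the way I expect.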
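For reference, the unnest-based workaround described above might look roughly like the sketch below. It rebuilds the example data under the hypothetical names `flat` and `nested`, and assumes tidyr 1.0 or later for `nest(data = -id)`. The names `offending` and `kept` are illustrative only.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Rebuild the example data in flat form (hypothetical name: flat).
flat <- tribble(
  ~id, ~src,     ~lat,        ~lon,
  47L, "bycity", 42.73856678, -85.82890251,
  47L, "indb",   NA_real_,    -85.654987,
  45L, "bycity", 39.40803248, -88.47774221,
  45L, "indb",   39.40620766, -88.50701219,
  85L, "bycity", 42.52458775, -87.26410992,
  85L, "indb",   NA_real_,    -83.647894
)

# One row per id, with the remaining columns nested in `data`.
nested <- flat %>% nest(data = -id)

# IDs whose "indb" row has a non-missing lat: these are the rows to discard.
offending <- nested %>%
  unnest(data) %>%
  filter(src == "indb", !is.na(lat)) %>%
  distinct(id)

# Drop the offending IDs from the nested data frame.
kept <- nested %>% anti_join(offending, by = "id")
```

This matches on the `src` label rather than on row position, so it does not rely on the "indb" row always being second.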

Why am I getting this error, and how can I filter on a nested data frame column?

【Comments】:

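It says filter needs a logical vector to evaluate, so maybe map_lgl is what you're after?
Oh my! Looks like that did it. So what is map() returning? Perhaps a list of logical values, whereas filter() wants a vector of logical values, I guess?
Or perhaps indices, but I'm not sure. Not a purrr expert.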

Tags: r dplyr tidyverse purrr

【Solution 1】:

You want to use map_lgl(). map() always returns a list, whereas map_lgl() returns a logical vector, which is what filter() requires as its condition.

filtereddf  <- df %>% 
   filter(map_lgl(data, ~(.x %>% pluck("lat", 2) %>% is.na )) )
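To make the difference concrete, here is a small self-contained sketch comparing the two return types. The `nested` tibble is a trimmed-down stand-in for the question's data (only `src` and `lat`), and `as_list`, `as_lgl`, and `kept` are hypothetical names used for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

# Stand-in for the nested data frame from the question.
nested <- tibble(
  id = c(47L, 45L, 85L),
  data = list(
    tibble(src = c("bycity", "indb"), lat = c(42.73856678, NA)),
    tibble(src = c("bycity", "indb"), lat = c(39.40803248, 39.40620766)),
    tibble(src = c("bycity", "indb"), lat = c(42.52458775, NA))
  )
)

# map() wraps each result in a list element, even when each result is a single TRUE/FALSE.
as_list <- map(nested$data, ~ is.na(pluck(.x, "lat", 2)))
class(as_list)  # "list"

# map_lgl() simplifies the same results into a plain logical vector.
as_lgl <- map_lgl(nested$data, ~ is.na(pluck(.x, "lat", 2)))
class(as_lgl)   # "logical"

# filter() accepts the logical vector as its condition.
kept <- nested %>%
  filter(map_lgl(data, ~ is.na(pluck(.x, "lat", 2))))
```

map_lgl() also fails with an error if any element does not produce a single logical value, so it doubles as a check that the condition is well formed.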
