[Posted]: 2019-04-16 05:46:57
[Question]:
I am working with a tibble like the following:
ex <- structure(list(rowid = c(4L, 5L, 6L, 9L, 10L), timestamp = structure(c(1502480694.03336,
1502480695.44736, 1502480696.03336, 1502480703.99836, 1502480706.19936
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), cat = c(32L,
1L, 1L, 1L, 1L), var1 = structure(c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = "1", class = "factor"),
var2 = c(0, 50, 29.7, 51, 70.8), var3 = c(NA, 26.3, 24, 20.5,
12), order = c(NA, 1L, 1L, 1L, 1L), bfr = list(NA, structure(list(
rowid = integer(0), timestamp = structure(numeric(0), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), cat = integer(0), var1 = structure(integer(0), .Label = "1", class = "factor"),
var2 = numeric(0), var3 = numeric(0), order = integer(0)), class = c("tbl_df",
"tbl", "data.frame"), row.names = integer(0)), structure(list(
rowid = 5L, timestamp = structure(1502480695.44736, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), cat = 1L, var1 = structure(NA_integer_, .Label = "1", class = "factor"),
var2 = 50, var3 = 26.3, order = 1L), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -1L)), structure(list(
rowid = 5:8, timestamp = structure(c(1502480695.44736,
1502480696.03336, 1502480699.03336, 1502480701.03336), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), cat = c(1L, 1L, 1L, 1L), var1 = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = "1", class = "factor"),
var2 = c(50, 29.7, 52.8, 44), var3 = c(26.3, 24, 8.9,
12.4), order = c(1L, 1L, 1L, 1L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L)), structure(list(
rowid = 5:9, timestamp = structure(c(1502480695.44736,
1502480696.03336, 1502480699.03336, 1502480701.03336,
1502480703.99836), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
cat = c(1L, 1L, 1L, 1L, 1L), var1 = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = "1", class = "factor"),
var2 = c(50, 29.7, 52.8, 44, 51), var3 = c(26.3, 24,
8.9, 12.4, 20.5), order = c(1L, 1L, 1L, 1L, 1L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L)))), row.names = c(4L,
5L, 6L, 9L, 10L), class = "data.frame")
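I want to summarise the nested tibbles in the bfr column with map. To avoid unnecessary computation, I would like to use map_if so that a row is skipped when its bfr element contains fewer than 2 rows with cat == 1. However, because the bfr column contains both NAs and empty tibbles, I am struggling to write a proper predicate function. Here is my attempt: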
more_than <- function(df){
if (nrow(df) == 0 | is.na(df)) return(FALSE)
n <- df %>%
summarise(sum(cat == 1)) %>%
as.numeric()
return(n > 2)
}
ex %>%
mutate(mean_var2 = map_if(bfr, more_than,
~.x %>% summarise(mean_var2 = mean(var2))))
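which results in:
Error in if (nrow(df) == 0 | is.na(df)) return(FALSE) : argument is of length zero
How can I handle the NAs and the empty tibbles so that I can write a proper predicate function?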
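For context, here is a minimal sketch of where the zero-length condition likely comes from. It assumes the failing element is the bare NA in the first row of bfr; the names x, cond and m are only illustrative.

```r
# The first element of the list column is a bare NA, not a data frame.
x <- NA

# A plain vector has no dim attribute, so nrow() gives NULL.
nrow(x)

# NULL == 0 is logical(0), and logical(0) | TRUE is still logical(0),
# so if() receives a condition of length zero.
cond <- nrow(x) == 0 | is.na(x)
length(cond)

# For a real data frame, is.na() returns one value per cell (a matrix),
# so it is not a scalar test either.
m <- is.na(data.frame(a = 1:2, b = c(NA, 1)))
dim(m)
```

In both cases the expression inside if() is not a single TRUE or FALSE, which is what if() requires.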
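[Comments]:
- The problem is is.na(df): it performs the NA check on the whole data set, whereas nrow returns a single value.
- Also, in more_than you are doing some other computation that does not end up as output in mean_var2.
- Sorry, I did not understand your first comment. Could you elaborate on your answer? more_than is just a predicate to avoid unnecessary computation on some elements of the bfr column.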
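Building on the first comment, one possible way to write the predicate is to check is.data.frame() first, so that the NA placeholders never reach nrow() or the cat column. This is only a sketch under a few assumptions: bfr below is a small hypothetical stand-in for the real column, min_rows is an assumed parameter (set to 2 to follow the wording "fewer than 2 rows", although the original code tests n > 2), and base mean() replaces the summarise() call so that only purrr is needed.

```r
library(purrr)

# Hypothetical stand-in for the bfr column: an NA, an empty data frame,
# a data frame with one matching row, and one with three matching rows.
bfr <- list(
  NA,
  data.frame(cat = integer(0), var2 = numeric(0)),
  data.frame(cat = 1L, var2 = 50),
  data.frame(cat = c(1L, 1L, 1L), var2 = c(50, 29.7, 52.8))
)

more_than <- function(df, min_rows = 2) {
  # Anything that is not a data frame (e.g. the bare NA) is skipped.
  if (!is.data.frame(df)) return(FALSE)
  # An empty data frame gives sum(logical(0)) == 0, so it is skipped too.
  sum(df$cat == 1, na.rm = TRUE) >= min_rows
}

# Which elements would be processed?
keep <- map_lgl(bfr, more_than)

# Elements failing the predicate are returned unchanged by map_if().
res <- map_if(bfr, more_than, ~ mean(.x$var2))
```

With this stand-in data only the last element passes the predicate, so the first three entries of res are left exactly as they were in bfr.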