[Posted]: 2018-04-16 04:04:57
[Question]:
I'm confused. I have tried several ways to remove NAs from my data.frame/data.table: na.omit, dropNA() (a function I found on StackOverflow), and complete.cases.
Here is dropNA():
library(dplyr)  # provides %>% and filter()

# Drops only the rows in which every column is NA
dropNA <- function(dat) {
  dat %>% filter(rowSums(is.na(.)) != ncol(.))
}
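The filter condition in dropNA() removes a row only when all of its columns are NA, whereas complete.cases() removes a row when any column is NA. The sketch below uses a small hypothetical data frame and a base-R copy of the same condition (so dplyr is not needed) to show the difference.

```r
# Hypothetical toy data: row 2 is partly NA, row 3 is entirely NA
df <- data.frame(x = c(1, NA, NA), y = c("a", "b", NA))

# Same condition dropNA() uses: keep a row unless every column is NA
all_na_dropped <- df[rowSums(is.na(df)) != ncol(df), ]

# complete.cases() keeps only rows with no NA in any column
any_na_dropped <- df[complete.cases(df), ]

nrow(all_na_dropped)  # 2: the partly-NA row survives
nrow(any_na_dropped)  # 1
```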
I tried to remove the NAs with the methods above, but as you can see in the tibble below, the result still contains NAs.
> # drop NAs:
> design_mat4 <- na.omit(design_mat4)
> design_mat4 <- dropNA(design_mat4)
> design_mat4 <- design_mat4[complete.cases(design_mat4), ]
> target_n <- sum(design_mat4$label == 0)
> a <- design_mat4[which(design_mat4$label == 1), ]
> positive_samp = a[sample(x = nrow(design_mat4),
+ size = target_n,
+ replace = TRUE), ]
> positive_samp
# A tibble: 50,447 x 14
email_status score email_is_blacklis~ email_domain_is_bla~ email_domain_blackl~ email_domain_pa~
<fct> <int> <fct> <fct> <fct> <fct>
1 verified 85 0 0 "" not_parked
2 verified 85 1 0 "" not_parked
3 verified 85 0 0 "" not_parked
4 NA NA NA NA NA NA
5 verified 57 1 0 "" not_parked
6 verified 85 0 0 "" no_website_cont~
7 verified 57 1 0 "" not_parked
8 verified 85 0 0 "" not_parked
9 NA NA NA NA NA NA
10 verified 85 0 0 "" not_parked
# ... with 50,437 more rows, and 8 more variables: email_domain_lawsite <fct>, . . ., label <fct>
Is this because a tibble prints a summary of the data in its original state?
Ultimately, I want the NAs removed. Please help!
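One thing worth checking in the transcript: the row indices are drawn with sample(x = nrow(design_mat4), ...), i.e. from 1 to the row count of design_mat4, but they are used to index a, which holds only the label == 1 rows. In a base data.frame, indexing with a row number larger than nrow() returns a row filled with NA, so the NAs may be created by the sampling step rather than surviving the cleanup. The sketch below assumes a hypothetical 6-row stand-in for design_mat4 (d) and uses base data.frame behaviour; tibble versions may handle out-of-range row indices differently.

```r
# Hypothetical stand-in for design_mat4: 6 rows, only 2 with label == 1
d <- data.frame(score = c(85, 57, 85, 70, 90, 60),
                label = c(1, 0, 0, 1, 0, 0))
a <- d[which(d$label == 1), ]   # 2 rows

# Row 5 exists in d but not in a, so it comes back as an all-NA row
idx <- c(1, 2, 5)
bad <- a[idx, ]
print(bad)

# Drawing indices from 1:nrow(a) keeps every index in range
set.seed(42)
good <- a[sample(x = nrow(a), size = 4, replace = TRUE), ]
any(is.na(good))                # FALSE
```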
[Comments]:
-
Have you tried df %>% na.omit?
-
@TonyHellmuth Yes.
-
Is it possible that your NA values are actually strings?
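To illustrate the comment above: is.na() only detects real missing values, not text that happens to read "NA" (or an empty string such as the "" values in the output). The sketch uses a hypothetical character vector. When reading files with read.csv(), the na.strings argument controls which strings are converted to NA on import.

```r
# Hypothetical column in which a missing value was stored as the text "NA"
x <- c("verified", "NA", NA)
is.na(x)                     # FALSE FALSE TRUE: the string "NA" is not missing
x[which(x == "NA")] <- NA    # turn the placeholder string into a real NA
sum(is.na(x))                # 2
```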
-
Then surely you also tried df %>% filter(complete.cases(.)).
-
Yes, there could be several reasons for this, but we would probably need to reproduce the result; could you give us a sample?